Graphical abstract graphic file with name ga1.jpg [43]Open in a new tab Keywords: neonatal hyperbilirubinemia, Gut microbiota, metabolome, Machine learning, causal inference, branched-chain amino acid Abbreviations: BCAA, branched-chain amino acid; NJ, neonatal jaundice; HC, healthy controls; ROC, receiver operating characteristic; AUROC, the area under the ROC; LC-MS, liquid chromatography-mass spectrometry; PCA, the principal component analysis; PLS, partial least-squares regression; OPLS-DA, orthogonal partial least squares-discriminant analysis; MSUD, maple syrup urine disease; KEGG, Kyoto Encyclopedia of Genes and Genomes Abstract Background The gut microbiota plays an important role in the early stages of human life. Our previous study showed that the abundance of intestinal flora involved in galactose metabolism was altered and correlated with increased serum bilirubin levels in children with jaundice. We conducted the present study to systematically evaluate alterations in the meconium metabolome of neonates with jaundice and search for metabolic markers associated with neonatal jaundice. Methods We included 68 neonates with neonatal hyperbilirubinemia, also known as neonatal jaundice (NJ) and 68 matched healthy controls (HC), collected meconium samples from them at birth, and performed metabolomic analysis via liquid chromatography-mass spectrometry. Results Gut metabolites enabled clearly distinguishing the neonatal jaundice (NJ) and healthy control (HC) groups. We also identified the compositions of the gut metabolites that differed significantly between the NJ and HC groups; these differentially significant metabolites were enriched in aminyl tRNA biosynthesis; pantothenic acid and coenzyme biosynthesis; and the valine, leucine and isoleucine biosynthesis pathways. Gut branched-chain amino acid (BCAA) levels were positively correlated with serum bilirubin levels, and the area under the receiver operating characteristic curve of the random forest classifier model based on BCAAs, proline, methionine, phenylalanine and total bilirubin reached 96.9%, showing good potential for diagnostic applications. Machine learning-based causal inference analysis revealed the causal effect of BCAAs on serum total bilirubin and NJ. Conclusions Altered gut metabolites in neonates with jaundice showed that increased BCAAs and total serum bilirubin were positively correlated. BCAAs proline, methionine, phenylalanine are potential biomarkers of NJ. 1. Introduction Neonatal hyperbilirubinemia, also known as neonatal jaundice, Jaundice is common in the neonatal period, occurring in 60%–84% of full- and near-full-term neonates, and hyperbilirubinemia occurs in approximately 8%–11% of newborns [44][1], accounting for 49.1% of hospitalized neonates [45][2]. Severe hyperbilirubinemia can lead to bilirubin encephalopathy and severe sequelae, imposing a heavy burden on society and families [46][3], [47][4]. Studies on neonatal hyperbilirubinemia have shown that the enterohepatic circulation plays an important role in bilirubin excretion [48][5] and that the gut microbiota is involved in the development of several liver diseases via the gut-liver axis [49][6], [50][7]. We previously performed a metagenomic analysis of neonatal jaundice, which showed that changes in the gut microbiotas of patients with jaundice were mainly characterized by significantly decreased abundances of Bifidobacteria and galactose-metabolizing bacteria and suggested that Bifidobacteria may be involved in bilirubin metabolism via the galactose-metabolizing pathway [51][8]. Few studies have been published on neonatal metabolomics, and those studies mainly assessed metabolomics from serum samples [52][9]. Serum metabolite studies in neonates with jaundice have suggested abnormalities in amino acid metabolism in these patients [53][10] and identified biomarkers that can be used for early diagnosis of biliary atresia, a potential cause of jaundice [54][9], [55][11]. Studies on gut metabolomics in neonates with jaundice are lacking. Previous studies suggested that gut metabolites may play important roles in jaundice and that the gut microbiota is associated with serum bilirubin. In this study, we further investigated the possible mechanisms underlying the development of NJ and bilirubin encephalopathy by examining the differences in gut metabolomics between neonates with and without jaundice. 2. Methods 2.1. Participants and sample collection All patients were from a tertiary general hospital in Shenzhen, China. The diagnostic criteria for neonatal hyperbilirubinemia referred to the American Academy of Pediatrics Guidelines for Neonatal Jaundice Intervention [56][12] and the Expert Consensus on the Diagnosis and Treatment of Neonatal Hyperbilirubinemia of the Neonatology Group of the Chinese Medical Association Pediatrics Branch [57][13]. Inclusion criteria were no high-risk factors in the mother before birth and no fetal defecation after birth before enrollment. Exclusion criteria were mothers with high-risk factors, antibiotic use within 2 weeks before delivery, newborns who were younger than gestational age, and newborns with severe infections or congenital malformations confirmed after admission. The first meconium samples (3–5 g) excreted by the newborns included in the study after birth were collected in sterile containers by researchers with gloves, avoided inadvertent pollution, then were placed in a − 80℃ freezer immediately. The neonates were grouped into either the NJ or healthy control (HC) group based on their serum bilirubin levels such as TBIL: total bilirubin (UniCel DxC800 Synchron automatic biochemical analysis instrument from Beckman Coulter Co., LTD) during hospitalization. Specimens from patients whose basic data (e.g., sex, gestational age, birth weight and birth mode) did not significantly differ (P > 0.05) were further analyzed. The hospital’s medical ethics committee approved the study protocol, which was performed in accordance with the Declaration of Helsinki. Each child's parents provided written informed consent. 2.2. Liquid chromatography-mass spectrometry (LC-MS) LC-MS was performed as previously described [58][14], [59][15]. The collected stool samples were freeze-dried to remove water, then approximately 30 mg of the stool was weighted and added to 600 µL of 50% acetonitrile/water extract containing 5 µM chlorosulfonylurea (internal standard), mixed thoroughly and sonicated at room temperature for 30 min. To better remove impurities from the stool, an equal volume of the extract was added to 200 µL of supernatant from the first centrifugation, vortexed and centrifuged at 18,000 r/min for 25 min. The supernatant was then sampled for analysis. An Ultimate 3000 LC system (Thermo Scientific, Waltham, MA, USA) coupled with an Acquity UPLC HSS T3 column (2.1 mm × 100 mm, 1.8 µm; Waters Corporation, Milford, MA, USA) was used to separate the metabolites. MS was then performed using an Orbitrap Elite mass spectrometer (Thermo Scientific) in electron spray ionization-positive and -negative modes (ESI + and ESI − ) per the manufacturer's instructions. 2.3. Metabolomics analysis Regarding the methodology of metabolomic analysis, we mainly refer to our previous publication [60][14], [61][15] and the raw data were preprocessed by Compound Discoverer software (ThermoFisher Scientific, USA) for LC/MS data (detailed in [62]Supplementary Appendix),in short, the extracted data normalized to the sum of the peak area before analysis, multivariate statistical analysis was performed using SIMCA-P Software (Umetrics AB, Umea, Sweden), including PCA analysis, PLS-DA analysis and OPLS-DA analysis. Differential metabolites were screened by OPLS-DA model VIP (variable weight) value > 1 and T-test P value (P < 0.05). After screening the differential compounds, HMDB and KEGG database were used to match the corresponding mass-charge ratio (PPM < 10) to list the candidate compounds. Final matching and identification by the secondary fragment corresponding to the differential compound, Then using KEGG and MetaboAnalyst [63]https://www.metaboanalyst.ca/ [64]https://www.genome.jp/kegg/ commercial database to analyze the metabolite pathways. 2.4. Machine learning and causal inference Machine learning was performed as previously described [65][14], [66][16] to determine which meconium metabolites could be used as neonatal hyperbilirubinemia biomarkers. As the random forest method allows for ranking the importance of the selected features.We used the Random Forest Classifier function of scikit learn (version 0.23.1) to determine the importance of the meconium metabolites. We used the train_test_split function (parameter, test_size = 0.4) to split the samples into training and validation sets, then used Random Forest Classifier to train and validate the model for neonatal hyperbilirubinemia classification and used the roc_curve function to plot the receiver operating characteristic (ROC) curve to obtain the area under the ROC (AUROC). Machine learning-based causal inference was performed using the Microsoft DoWhy ([67]https://github.com/microsoft/dowhy) and EconML ([68]https://github.com/econml/) libraries following the software manual as detailed in another manuscript [69][16] (detailed in [70]Supplementary Appendix). Briefly, meconium metabolites might lead to NJ was encoded into a causal model and represented by a graph, with each arrow in the graph indicating a causal relationship. Second, Dowhy's backdoor.linear_regression method was used to check whether meconium metabolites could estimate thelevel of TBIL. Third, EconML's machine-learning method was used to construct the estimator using gradient-boosting trees to learn the relationship between the outcome and confounders and the relationship between the intervention and confounders and finally compare the residuals between the outcome and intervention. Finally, placebo_treatment_refuter and data_subset_refuter tests were used to evaluate the model’s robustness. 3. Results 3.1. Gut metabolomics clearly distinguished neonates with jaundice from HCs Sixty-eight neonates with hyperbilirubinemia were included in this study: 38 males and 30 females, of whom, 55 were delivered vaginally, and 13 were delivered via cesarean section, mean birth weight of (3144 ± 386) g, mean TBIL of (275.0 ± 64.0) umol/L.Sixty-eight matched HCs were also included: 35 males and 33 females, of whom, 44 were delivered vaginally, and 24 were delivered via cesarean section, mean birth weight of (3096 ± 466) g, mean TBIL of (150.6 ± 40.7) umol/L ([71]Supplementary Table 1). There was no significant difference between the two groups except for serum bilirubin (P < 0.05). The OPLS-DA model was built, and the R^2 and Q^2 values were used to test the overfitting of the model and assess its statistical significance. The original model (R2Y) was closer to 1, indicating that the established model was more consistent with the real situation of the sample data. The original model (Q^2) was close to 0.5, indicating that adding a new sample to the model yielded a more approximate distribution, and the original model better explained the differences between the two sample groups ([72]Fig. 1A). Thus, the original model had good robustness, with no overfitting, and the fecal metabolomic analysis results showed good stability for the samples and instruments in both positive- and negative-ion mode. The metabolites clearly distinguished NJ from HC ([73]Fig. 1B). Fig. 1. [74]Fig. 1 [75]Open in a new tab OPLS-DA model to evaluate metabolomic data. A, Permutation test of the OPLS-DA model for the NJ vs HC groups. The original model (R2Y) was closer to 1, indicating that the established model was more consistent with the real situation of the sample data. The original model (Q^2) was close to 0.5, indicating that adding a new sample to the model yielded a more approximate distribution and that the original model better explained the differences between the two sample groups. B, Scatter plot of the OPLS-DA model scores for the NJ vs. HC groups; the two groups of sample metabolites can be clearly distinguished. Based on the good robustness of the study model and the likelihood that metabolites can clearly distinguish the NJ group from the HC group, we further visualized the differential metabolites. First, we measured the relative levels of the metabolites at the same level based on z-scores ([76]Fig. 2A), and the metabolite groups varied largely across the groups. The z-scores ranged from − 2 to 8 relative to those of the HC group. Fig. 2. [77]Fig. 2 [78]Open in a new tab Visualization of metabolites that differed significantly between the NJ and HC groups. A, z-score plots showing the extent of variation in the differentially significant metabolites between the NJ and HC groups. z-score plots show that the metabolites were highly variable across the groups, with z-scores ranging from − 2 to 8 relative to those of the HCs. B, Volcano diagram showing the metabolites that differed significantly between the NJ and HC groups. Each point represents a metabolite; the horizontal coordinate represents the fold change of the group comparing each substance (taken as the logarithm with a base of 2). The vertical coordinate represents the P-value of the Student's t-test (taken as the negative logarithm with a base of 10), and the scatter size represents the VIP value of the OPLS-DA model, with a larger scatter indicating a larger VIP value. The scatter color represents the final screening results, with significantly upregulated metabolites in red, significantly downregulated metabolites in blue, and non-significantly different metabolites in gray. (For interpretation of the references to color in this figure legend, the