Graphical abstract

   graphic file with name ga1.jpg
   [43]Open in a new tab

   Keywords: neonatal hyperbilirubinemia, Gut microbiota, metabolome,
   Machine learning, causal inference, branched-chain amino acid

   Abbreviations: BCAA, branched-chain amino acid; NJ, neonatal jaundice;
   HC, healthy controls; ROC, receiver operating characteristic; AUROC,
   the area under the ROC; LC-MS, liquid chromatography-mass spectrometry;
   PCA, the principal component analysis; PLS, partial least-squares
   regression; OPLS-DA, orthogonal partial least squares-discriminant
   analysis; MSUD, maple syrup urine disease; KEGG, Kyoto Encyclopedia of
   Genes and Genomes

Abstract

Background

   The gut microbiota plays an important role in the early stages of human
   life. Our previous study showed that the abundance of intestinal flora
   involved in galactose metabolism was altered and correlated with
   increased serum bilirubin levels in children with jaundice. We
   conducted the present study to systematically evaluate alterations in
   the meconium metabolome of neonates with jaundice and search for
   metabolic markers associated with neonatal jaundice.

Methods

   We included 68 neonates with neonatal hyperbilirubinemia, also known as
   neonatal jaundice (NJ) and 68 matched healthy controls (HC), collected
   meconium samples from them at birth, and performed metabolomic analysis
   via liquid chromatography-mass spectrometry.

Results

   Gut metabolites enabled clearly distinguishing the neonatal jaundice
   (NJ) and healthy control (HC) groups. We also identified the
   compositions of the gut metabolites that differed significantly between
   the NJ and HC groups; these differentially significant metabolites were
   enriched in aminyl tRNA biosynthesis; pantothenic acid and coenzyme
   biosynthesis; and the valine, leucine and isoleucine biosynthesis
   pathways. Gut branched-chain amino acid (BCAA) levels were positively
   correlated with serum bilirubin levels, and the area under the receiver
   operating characteristic curve of the random forest classifier model
   based on BCAAs, proline, methionine, phenylalanine and total bilirubin
   reached 96.9%, showing good potential for diagnostic applications.
   Machine learning-based causal inference analysis revealed the causal
   effect of BCAAs on serum total bilirubin and NJ.

Conclusions

   Altered gut metabolites in neonates with jaundice showed that increased
   BCAAs and total serum bilirubin were positively correlated. BCAAs
   proline, methionine, phenylalanine are potential biomarkers of NJ.

1. Introduction

   Neonatal hyperbilirubinemia, also known as neonatal jaundice, Jaundice
   is common in the neonatal period, occurring in 60%–84% of full- and
   near-full-term neonates, and hyperbilirubinemia occurs in approximately
   8%–11% of newborns [44][1], accounting for 49.1% of hospitalized
   neonates [45][2]. Severe hyperbilirubinemia can lead to bilirubin
   encephalopathy and severe sequelae, imposing a heavy burden on society
   and families [46][3], [47][4].

   Studies on neonatal hyperbilirubinemia have shown that the
   enterohepatic circulation plays an important role in bilirubin
   excretion [48][5] and that the gut microbiota is involved in the
   development of several liver diseases via the gut-liver axis [49][6],
   [50][7]. We previously performed a metagenomic analysis of neonatal
   jaundice, which showed that changes in the gut microbiotas of patients
   with jaundice were mainly characterized by significantly decreased
   abundances of Bifidobacteria and galactose-metabolizing bacteria and
   suggested that Bifidobacteria may be involved in bilirubin metabolism
   via the galactose-metabolizing pathway [51][8]. Few studies have been
   published on neonatal metabolomics, and those studies mainly assessed
   metabolomics from serum samples [52][9]. Serum metabolite studies in
   neonates with jaundice have suggested abnormalities in amino acid
   metabolism in these patients [53][10] and identified biomarkers that
   can be used for early diagnosis of biliary atresia, a potential cause
   of jaundice [54][9], [55][11]. Studies on gut metabolomics in neonates
   with jaundice are lacking.

   Previous studies suggested that gut metabolites may play important
   roles in jaundice and that the gut microbiota is associated with serum
   bilirubin. In this study, we further investigated the possible
   mechanisms underlying the development of NJ and bilirubin
   encephalopathy by examining the differences in gut metabolomics between
   neonates with and without jaundice.

2. Methods

2.1. Participants and sample collection

   All patients were from a tertiary general hospital in Shenzhen, China.
   The diagnostic criteria for neonatal hyperbilirubinemia referred to the
   American Academy of Pediatrics Guidelines for Neonatal Jaundice
   Intervention [56][12] and the Expert Consensus on the Diagnosis and
   Treatment of Neonatal Hyperbilirubinemia of the Neonatology Group of
   the Chinese Medical Association Pediatrics Branch [57][13]. Inclusion
   criteria were no high-risk factors in the mother before birth and no
   fetal defecation after birth before enrollment. Exclusion criteria were
   mothers with high-risk factors, antibiotic use within 2 weeks before
   delivery, newborns who were younger than gestational age, and newborns
   with severe infections or congenital malformations confirmed after
   admission. The first meconium samples (3–5 g) excreted by the newborns
   included in the study after birth were collected in sterile containers
   by researchers with gloves, avoided inadvertent pollution, then were
   placed in a − 80℃ freezer immediately. The neonates were grouped into
   either the NJ or healthy control (HC) group based on their serum
   bilirubin levels such as TBIL: total bilirubin (UniCel DxC800 Synchron
   automatic biochemical analysis instrument from Beckman Coulter Co.,
   LTD) during hospitalization. Specimens from patients whose basic data
   (e.g., sex, gestational age, birth weight and birth mode) did not
   significantly differ (P > 0.05) were further analyzed. The hospital’s
   medical ethics committee approved the study protocol, which was
   performed in accordance with the Declaration of Helsinki. Each child's
   parents provided written informed consent.

2.2. Liquid chromatography-mass spectrometry (LC-MS)

   LC-MS was performed as previously described [58][14], [59][15]. The
   collected stool samples were freeze-dried to remove water, then
   approximately 30 mg of the stool was weighted and added to 600 µL of
   50% acetonitrile/water extract containing 5 µM chlorosulfonylurea
   (internal standard), mixed thoroughly and sonicated at room temperature
   for 30 min. To better remove impurities from the stool, an equal volume
   of the extract was added to 200 µL of supernatant from the first
   centrifugation, vortexed and centrifuged at 18,000 r/min for 25 min.
   The supernatant was then sampled for analysis. An Ultimate 3000 LC
   system (Thermo Scientific, Waltham, MA, USA) coupled with an Acquity
   UPLC HSS T3 column (2.1 mm × 100 mm, 1.8 µm; Waters Corporation,
   Milford, MA, USA) was used to separate the metabolites. MS was then
   performed using an Orbitrap Elite mass spectrometer (Thermo Scientific)
   in electron spray ionization-positive and -negative modes (ESI + and
   ESI − ) per the manufacturer's instructions.

2.3. Metabolomics analysis

   Regarding the methodology of metabolomic analysis, we mainly refer to
   our previous publication [60][14], [61][15] and the raw data were
   preprocessed by Compound Discoverer software (ThermoFisher Scientific,
   USA) for LC/MS data （detailed in [62]Supplementary Appendix),in short,
   the extracted data normalized to the sum of the peak area before
   analysis, multivariate statistical analysis was performed using SIMCA-P
   Software (Umetrics AB, Umea, Sweden), including PCA analysis, PLS-DA
   analysis and OPLS-DA analysis. Differential metabolites were screened
   by OPLS-DA model VIP (variable weight) value > 1 and T-test P value
   (P < 0.05). After screening the differential compounds, HMDB and KEGG
   database were used to match the corresponding mass-charge ratio
   (PPM < 10) to list the candidate compounds. Final matching and
   identification by the secondary fragment corresponding to the
   differential compound, Then using KEGG and MetaboAnalyst
   [63]https://www.metaboanalyst.ca/ [64]https://www.genome.jp/kegg/
   commercial database to analyze the metabolite pathways.

2.4. Machine learning and causal inference

   Machine learning was performed as previously described [65][14],
   [66][16] to determine which meconium metabolites could be used as
   neonatal hyperbilirubinemia biomarkers. As the random forest method
   allows for ranking the importance of the selected features.We used the
   Random Forest Classifier function of scikit learn (version 0.23.1) to
   determine the importance of the meconium metabolites. We used the
   train_test_split function (parameter, test_size = 0.4) to split the
   samples into training and validation sets, then used Random Forest
   Classifier to train and validate the model for neonatal
   hyperbilirubinemia classification and used the roc_curve function to
   plot the receiver operating characteristic (ROC) curve to obtain the
   area under the ROC (AUROC).

   Machine learning-based causal inference was performed using the
   Microsoft DoWhy ([67]https://github.com/microsoft/dowhy) and EconML
   ([68]https://github.com/econml/) libraries following the software
   manual as detailed in another manuscript [69][16] (detailed in
   [70]Supplementary Appendix). Briefly, meconium metabolites might lead
   to NJ was encoded into a causal model and represented by a graph, with
   each arrow in the graph indicating a causal relationship. Second,
   Dowhy's backdoor.linear_regression method was used to check whether
   meconium metabolites could estimate thelevel of TBIL. Third, EconML's
   machine-learning method was used to construct the estimator using
   gradient-boosting trees to learn the relationship between the outcome
   and confounders and the relationship between the intervention and
   confounders and finally compare the residuals between the outcome and
   intervention. Finally, placebo_treatment_refuter and
   data_subset_refuter tests were used to evaluate the model’s robustness.

3. Results

3.1. Gut metabolomics clearly distinguished neonates with jaundice from HCs

   Sixty-eight neonates with hyperbilirubinemia were included in this
   study: 38 males and 30 females, of whom, 55 were delivered vaginally,
   and 13 were delivered via cesarean section, mean birth weight of
   (3144 ± 386) g, mean TBIL of (275.0 ± 64.0) umol/L.Sixty-eight matched
   HCs were also included: 35 males and 33 females, of whom, 44 were
   delivered vaginally, and 24 were delivered via cesarean section, mean
   birth weight of (3096 ± 466) g, mean TBIL of (150.6 ± 40.7) umol/L
   ([71]Supplementary Table 1). There was no significant difference
   between the two groups except for serum bilirubin (P < 0.05).

   The OPLS-DA model was built, and the R^2 and Q^2 values were used to
   test the overfitting of the model and assess its statistical
   significance. The original model (R2Y) was closer to 1, indicating that
   the established model was more consistent with the real situation of
   the sample data. The original model (Q^2) was close to 0.5, indicating
   that adding a new sample to the model yielded a more approximate
   distribution, and the original model better explained the differences
   between the two sample groups ([72]Fig. 1A). Thus, the original model
   had good robustness, with no overfitting, and the fecal metabolomic
   analysis results showed good stability for the samples and instruments
   in both positive- and negative-ion mode. The metabolites clearly
   distinguished NJ from HC ([73]Fig. 1B).

Fig. 1.

   [74]Fig. 1
   [75]Open in a new tab

   OPLS-DA model to evaluate metabolomic data. A, Permutation test of the
   OPLS-DA model for the NJ vs HC groups. The original model (R2Y) was
   closer to 1, indicating that the established model was more consistent
   with the real situation of the sample data. The original model (Q^2)
   was close to 0.5, indicating that adding a new sample to the model
   yielded a more approximate distribution and that the original model
   better explained the differences between the two sample groups. B,
   Scatter plot of the OPLS-DA model scores for the NJ vs. HC groups; the
   two groups of sample metabolites can be clearly distinguished.

   Based on the good robustness of the study model and the likelihood that
   metabolites can clearly distinguish the NJ group from the HC group, we
   further visualized the differential metabolites. First, we measured the
   relative levels of the metabolites at the same level based on z-scores
   ([76]Fig. 2A), and the metabolite groups varied largely across the
   groups. The z-scores ranged from − 2 to 8 relative to those of the HC
   group.

Fig. 2.

   [77]Fig. 2
   [78]Open in a new tab

   Visualization of metabolites that differed significantly between the NJ
   and HC groups. A, z-score plots showing the extent of variation in the
   differentially significant metabolites between the NJ and HC groups.
   z-score plots show that the metabolites were highly variable across the
   groups, with z-scores ranging from − 2 to 8 relative to those of the
   HCs. B, Volcano diagram showing the metabolites that differed
   significantly between the NJ and HC groups. Each point represents a
   metabolite; the horizontal coordinate represents the fold change of the
   group comparing each substance (taken as the logarithm with a base of
   2). The vertical coordinate represents the P-value of the Student's
   t-test (taken as the negative logarithm with a base of 10), and the
   scatter size represents the VIP value of the OPLS-DA model, with a
   larger scatter indicating a larger VIP value. The scatter color
   represents the final screening results, with significantly upregulated
   metabolites in red, significantly downregulated metabolites in blue,
   and non-significantly different metabolites in gray. (For
   interpretation of the references to color in this figure legend, the