Abstract

   Insight into the metabolic biosignature of tuberculosis (TB) may inform
   clinical care, reduce adverse effects, and facilitate
   metabolism-informed therapeutic development. However, studies often
   yield inconsistent findings regarding the metabolic profiles of TB.
   Herein, we conducted an untargeted metabolomics study using plasma from
   63 Korean TB patients and 50 controls. Metabolic features were
   integrated with the data of another cohort from China (35 TB patients
   and 35 controls) for a global functional meta-analysis. Specifically,
   all features were matched to a known biological network to identify
   potential endogenous metabolites. Next, pathway-level gene set
   enrichment analysis was conducted for each study, and the resulting
   pathway p-values from the two studies were combined. The
   meta-analysis revealed both known metabolic alterations and novel
   processes. For instance, retinol metabolism and cholecalciferol
   metabolism, which are associated with TB risk and outcome, were altered
   in plasma from TB patients; proinflammatory lipid mediators were
   significantly enriched. Furthermore, metabolic processes linked to the
   innate immune responses and possible interactions between the host and
   the bacillus showed altered signals. In conclusion, our
   proof-of-concept study indicated that a pathway-level meta-analysis
   directly from metabolic features enables accurate interpretation of TB
   molecular profiles.

Introduction

   Tuberculosis (TB) is a devastating infectious disease, and an estimated
   1.7 billion people are latently infected globally [[38]1]. Despite
   extensive efforts, TB remains a leading cause of mortality worldwide,
   especially in countries where it is endemic. According to the World
   Health Organization Global Report (2020), there were around 10 million
   newly diagnosed TB patients in 2019, and approximately 1.4 million
   deaths [[39]2]. TB has a broad pathophysiological spectrum, hampering
   eradication efforts [[40]3]. A holistic model based on high-dimensional
   data is required to describe host-response endotype characteristics in
   general, and the TB immune endotype in particular. Specifically, -omics
   technologies have facilitated the discovery of clinically useful
   biomarkers for risk assessment, diagnosis, and prediction of clinical
   events. For instance, after a comprehensive analysis of plasma samples
   from patients with pulmonary TB, community-acquired pneumonia, or lung
   cancer and from normal controls, Huang et al. proposed xanthine,
   4-pyridoxate, and D-glutamic acid as potential biomarkers [[41]4].
   Sweeney3 (GBP5, DUSP3, KLF2), a host-response-based
   gene signature, met the criteria of the World Health
   Organization/Foundation for Innovative New Diagnostics target product
   profile for a non-sputum-based triage test [[42]5]. Comprehensive
   -omics data and appropriate analytical methods enable investigation of
   drug efficacy, personalized dosing, prediction of relapse-free cure,
   and phenotypic drug susceptibility testing, as aspects of personalized
   precision medicine [[43]6–[44]8].

   Studies of host-response transcriptome biosignatures have achieved
   considerable success in terms of stratifying TB patients for the
   purposes of risk prediction [[45]9], diagnosis [[46]10], treatment
   monitoring, outcome prediction [[47]11], and recurrence prediction
   [[48]12]. Blood metabolic responses have also been tracked based on the
   “blood metabolic signature,” which partially reflects the interaction
   between the human body and Mycobacterium tuberculosis (M. tuberculosis)
   bacilli. The metabolic responses of TB patients may aid predictions of
   risk, diagnosis, and outcomes, as well as treatment monitoring
   [[49]13]. Integrating multi-omics data with clinical information could
   facilitate host-directed therapy for TB; for example, TB meningitis
   [[50]14]. However, the usefulness of the serum and plasma metabolomic
   analysis for tracking the blood metabolic signature of TB across
   populations is unclear. Moreover, the variability in study designs and
   limited guidelines for the use of omics technologies in clinical
   research could lead to less reliable data, complicated analyses, and
   missed biological signals [[51]15]. Therefore, rigorous designs are
   required to ensure the reproducibility of -omics studies.

   In computational functional interpretation, a set of genes or
   metabolites associated with a phenotype of interest is typically
   identified by a statistical test. This set is then compared with a
   predefined database of biological functions, which yields enrichment
   scores for which p-values and/or q-values are calculated. This
   approach is known as over-representation analysis. Gene set
   enrichment analysis (GSEA) utilizes a metric representing the overall
   ranks of features (e.g., t-score or fold-change) to find “significantly
   coordinated reposition” of the association strength based on a database
   of genes or metabolites sharing biological functions [[52]16]. GSEA has
   been used extensively in transcriptomics studies, but comparatively
   infrequently in metabolomics. Despite its ability to obtain profound
   information from samples, untargeted metabolomics has not met
   expectations in terms of providing mechanistic insight into the
   metabolic alterations of phenotypes of interest. This is primarily due
   to the difficulty of compound annotation and identification. Generally,
   a tiny fraction of ions can be assigned to metabolites with an
   acceptable level of confidence. The limited ability to define
   metabolites hampers subsequent functional interpretation. The situation
   has gradually improved since the introduction of the Mummichog
   algorithm [[53]17]. In brief, Mummichog maps all candidate metabolite
   matches for the measured features onto known metabolic networks. It
   then tests for over-representation, on the premise that correctly
   matched metabolites concentrate in truly altered pathways, whereas
   false matches are distributed randomly across the network. This allows rapid
   assessment of potential alterations in a phenotype of interest in a
   hypothesis-generating study using untargeted metabolomics data
   [[54]18].
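
   The over-representation principle described above can be illustrated
   with a hypergeometric tail probability. This is only a minimal sketch:
   the function name and all counts are hypothetical, and Mummichog itself
   estimates its null distribution by resampling features rather than by
   this closed-form test.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which belong to the pathway (one-sided over-representation)."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts, chosen only for illustration.
N = 1000  # candidate metabolites in the background network
K = 40    # of these, members of the pathway being tested
n = 50    # candidates selected as significant
k = 6     # significant candidates that fall in the pathway

expected_hits = n * K / N            # hits expected by chance alone
p_value = hypergeom_tail(k, N, K, n)
print(f"expected hits: {expected_hits:.1f}, observed: {k}, p = {p_value:.4f}")
```

   With these illustrative counts, six observed hits against two expected
   by chance give a small tail probability, which is what an
   over-representation analysis would report as enrichment.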

   Meta-analysis for pathway enrichment analysis or pathway-level
   meta-analysis is a powerful approach for capturing the biological
   signatures of a particular phenotype of interest across studies with
   heterogeneous settings [[55]19]. Pathway-level meta-analysis of
   metabolomics data using GSEA has recently become feasible [[56]20].
   Using a computational method to predict functional activities from
   metabolic features and a pathway-level enrichment meta-analysis using
   GSEA may provide insight into the metabolic biosignature of phenotypes
   of interest.

   Problematic reproducibility and minimal overlap of metabolic features
   across studies have hampered the investigation of the metabolic
   alterations in TB. There is an urgent need to develop a strategy to
   reliably capture the global metabolic biosignature of TB. Herein, we
   conducted a pathway-level GSEA-based meta-analysis of two pulmonary TB
   untargeted metabolomics data sets from South Korea and China. The
   analysis is a proof-of-concept of the ability of metabolomics
   meta-analysis using metabolic features to identify metabolic
   alterations in pulmonary TB.

Results

Data exploration reveals considerable metabolic feature changes in TB
patients

   We performed principal component analysis to examine and visualize the
   untargeted metabolomics data of the two studies in positive and
   negative ion modes. The three-dimensional score plots of cPMTb
   (positive ion mode), ST001231 (positive ion mode), cPMTb (negative ion
   mode), and ST001231 (negative ion mode) suggested comparatively clear
   separation of TB patients from their counterparts ([57]Fig 1A–1D).
   However, TB patients and normal controls (NC) were less clearly
   separated in cPMTb than in ST001231, indicating greater metabolome
   similarity between the two groups in cPMTb.

Fig 1. Data exploration and visualization.

   Principal component analysis of (A) cPMTb positive ion mode, (B)
   ST001231 positive ion mode, (C) cPMTb negative ion mode, and (D)
   ST001231 negative ion mode. TB, tuberculosis; NC, normal control.

   Unpaired t-tests were also conducted, and the features were visualized
   using volcano plots. At a significance level of 0.05, only 218 features
   were upregulated, while 142 were downregulated in the TB group of cPMTb
   (positive ion mode, S1A Fig in [60]S1 File). By contrast, 549
   upregulated features and 412 downregulated features were found in the
   TB group in ST001231 (positive ion mode, S1B Fig in [61]S1 File).
   Notably, few features had a high fold change in cPMTb, whereas many
   features had a high fold change in ST001231. Similar patterns were
   observed in negative ion mode (S1C and S1D Fig in [62]S1 File).

TB patients have distinct metabolome profiles

   Partial least-squares discriminant analysis and random forest analysis
   were used to examine whether the metabolic profiles could be used to
   classify TB patients and controls. In positive ion mode, the partial
   least-squares discriminant analysis models possessed excellent
   discriminatory performance. In particular, the optimal model in the
   cPMTb study contained five principal components with an accuracy,
   goodness-of-fit (R^2), and goodness-of-prediction (Q^2) of 0.90, 0.93,
   and 0.63, respectively ([63]Fig 2A). Likewise, the optimal model in
   ST001231, which had four principal components, had an accuracy, R^2,
   and Q^2 of 1.00, 1.00, and 0.96, respectively ([64]Fig 2B). Similar
   performance was observed in negative ion mode: cPMTb (accuracy, 0.94;
   R^2, 0.96; Q^2, 0.70) and ST001231 (accuracy, 1.00; R^2, 1.00; Q^2,
   0.91) ([65]Fig 2C and 2D). Remarkably, there were marked differences in
   metabolic profiles between TB patients and NCs in ST001231. The
   out-of-bag errors of the four random forest models were 0.12, 0.00,
   0.13, and 0.00 for cPMTb (positive ion mode), ST001231 (positive ion
   mode), cPMTb (negative ion mode), and ST001231 (negative ion mode),
   respectively (S2 Fig in [66]S1 File). These analyses collectively
   indicated that TB patients possess a distinct metabolome profile,
   compared with controls.

Fig 2. Partial least-squares discriminant analysis.

   (A) cPMTb positive ion mode. (B) ST001231 positive ion mode. (C) cPMTb
   negative ion mode. (D) ST001231 negative ion mode. * optimal value of
   Q^2; TB, tuberculosis; NC, normal control.

Profound plasma metabolic alterations of pulmonary TB patients

   Pathway-level meta-analysis was conducted separately in positive and
   negative ion modes. In positive ion mode, the meta-analysis revealed
   that 15 pathways had a combined p-value of < 0.05. They belonged to
   metabolic homeostasis, proinflammatory processes, and vitamin
   metabolism. The five pathways with the lowest combined p-values were
   “carnitine shuttle,” “vitamin A (retinol) metabolism,” “pentose
   phosphate pathway,” “purine metabolism,” and “pentose and glucuronate
   interconversions” ([69]Fig 3A). Notably, only two pathways were
   significant in both individual studies among the significant pathways
   in the meta-analysis: “carnitine shuttle” and “vitamin A (retinol)
   metabolism.” Several pathways, including “pentose and glucuronate
   interconversions,” “hyaluronan metabolism,” and “fructose and mannose
   metabolism,” were enriched only in the cPMTb study (A1pos). By contrast,
   pathways such as “sialic acid metabolism,” “purine metabolism,” and
   “androgen and estrogen biosynthesis and metabolism” were significant
   only in ST001231 (B1pos). The heterogeneity of significant pathways
   among studies might be due to their relatively small sample sizes,
   sample heterogeneity, and use of different LC-MS platforms. More
   details are shown in S1 Table in [70]S2 File.

Fig 3. Pathway meta-analysis by gene set enrichment analysis.

   (A) Positive ion mode. (B) Negative ion mode. The enrichment factor of
   a pathway was calculated by dividing its number of significant hits by
   the expected number of hits.

   Analysis of the data in negative ion mode yielded a greater number of
   significant pathways. Indeed, 24 pathways had a combined p-value of <
   0.05 in the meta-analysis. Similar to the enriched pathways in positive
   ion mode, these belonged to proinflammatory processes, vitamin
   metabolism, metabolic homeostasis, amino acid-related metabolism, and
   some potentially novel pathways. Five of the pathways with the lowest
   combined p-values were “glycolysis and gluconeogenesis,” “pyruvate
   metabolism,” “fructose and mannose metabolism,” “vitamin D3
   (cholecalciferol) metabolism,” and “de novo fatty acid biosynthesis”
   ([73]Fig 3B). Among the significant pathways in the meta-analysis,
   eight were significantly enriched in both studies—the above-mentioned
   five pathways and “bile acid biosynthesis,” “arachidonic acid
   metabolism,” and “vitamin A (retinol) metabolism.” Seven pathways were
   significantly enriched only in cPMTb (A1neg), including “leukotriene
   metabolism,” “galactose metabolism,” “C21-steroid hormone biosynthesis
   and metabolism,” and “sialic acid metabolism.” In contrast, eight
   pathways—including “propanoate metabolism,” several amino acid-related
   pathways and “vitamin B3 (nicotinate and nicotinamide) metabolism”—were
   enriched only in ST001231 (B1neg). More details are shown in S2 Table
   in [74]S2 File.

Discussion

   Meta-analysis enhances statistical power, reliability, and
   generalizability, especially in high-throughput data settings [[75]21].
   A feature-level meta-analysis provides more comprehensive information
   than secondary pooled analyses of a limited number of identified
   metabolites. Moreover, as mentioned above, metabolite identification
   remains a fundamental issue in metabolomics [[76]22]. An analysis that
   forgoes metabolite identification significantly reduces the time (i.e.,
   from days to hours) required to obtain valuable insights and derive
   actionable targets for the phenotype of interest. Therefore, we could
   focus more on the validation of potential biomarkers and the
   performance of experiments to delineate molecular mechanisms of
   disease.

   In this study, a pathway-level GSEA-based meta-analysis of two
   pulmonary TB untargeted metabolomics data sets was conducted. The two
   included data sets had a significant degree of heterogeneity in
   clinical characteristics, which might affect the number of enriched
   pathways. Nevertheless, the meta-analysis provided considerable insight
   into global metabolic alterations in plasma from pulmonary TB patients.
   The results are pathophysiologically consistent with previous findings
   obtained using conventional targeted methods and also include novel
   metabolic alterations. The analysis is capable of suggesting biological processes
   that may be significantly influenced by the clinical characteristics of
   a cohort. Furthermore, the findings suggested that functional
   interpretation of metabolomics data at the pathway level can provide
   insights into the molecular signatures of TB patients. Importantly,
   biological speculations at the level of individual metabolites exhibit
   human-centric bias [[77]23]. Below we discuss some of the most
   important findings.

   “Vitamin A (retinol) metabolism” and “vitamin D3 (cholecalciferol)
   metabolism” were altered in the meta-analysis. Vitamin A deficiency is
   reportedly associated with an increased risk of incident TB among
   household contacts [[78]24]. Vitamin A supplementation may boost
   immunity against TB [[79]25], and vitamin A and zinc co-supplementation
   may improve outcomes [[80]26]. Vitamin D3 deficiency is a risk factor
   for TB. Vitamin D3 supplementation may be associated with immune
   activation, and thus should improve treatment outcomes; however, this
   requires validation [[81]26]. In addition, the “bile acid biosynthesis”
   and “purine metabolism” pathways were significantly altered in our TB
   patients compared to controls. These pathways may also be involved in
   host defense. Indeed, some bile acids inhibit the in vitro growth of M.
   tuberculosis [[82]27]. Bile acid derivatives are also potential anti-TB
   agents [[83]28], and purine metabolism in M. tuberculosis is a target
   for drug development [[84]29, [85]30]. Furthermore, together with lipid
   metabolism, these pathways are reportedly linked to anti-TB
   drug-induced hepatotoxicity [[86]31].

   Notably, we observed significant systemic changes in the host (i.e.,
   disease phenotype) due to TB infection. Proinflammatory lipid mediators
   and pro-resolving lipid mediators are associated with TB and are
   strongly associated with TB accompanied by type 2 diabetes. The arachidonic
   acid-derived leukotriene and prostaglandin families were reported to be
   the most abundant proinflammatory lipid mediators [[87]32]. Our pathway
   analysis revealed significant enrichment of “arachidonic acid
   metabolism,” “leukotriene metabolism,” and “prostaglandin formation
   from arachidonate.” We also found various processes related to
   nutrients and oxidative stress, including “pyruvate metabolism,”
   “fructose and mannose metabolism,” “glycolysis and gluconeogenesis,”
   “de novo fatty acid biosynthesis,” and the metabolism of several amino
   acids. These findings concur with a previous report that metabolic
   processes are involved in adaptations and/or interactions of the host
   and microbe during infection [[88]33]. Medium-chain fatty acids are
   involved in protective immunity against M. tuberculosis [[89]34].
   Additionally, alteration of “pyruvate metabolism” might be linked to
   the increased catabolism and/or energy consumption observed in TB
   patients [[90]35]. “Fructose and mannose metabolism” and “glycolysis
   and gluconeogenesis” in M. tuberculosis are reportedly affected by
   nutrient starvation. In addition, they are linked to central carbon
   metabolism, which is essential for the maintenance of metabolic
   homeostasis in M. tuberculosis [[91]36]. For example, mycobacteria in
   phagosomes took up exogenous pyruvate more efficiently than glucose and
   the pyruvate was used as a carbon source for intracellular growth
   [[92]37].

   We also found some potentially important pathways associated with TB.
   In a study of the innate immune responses to M. tuberculosis using
   macrophages, Blischak et al. found a subset of genes specifically
   involved in infection, including protein-coding genes related to the
   regulation of sialic acid synthesis [[93]38]. We found that “sialic
   acid metabolism” was altered in plasma from TB patients, and Isa et al.
   [[94]39] reported an altered level of sialic acid in urine. Further
   studies are warranted to explore the role of sialic acid metabolism and
   the associated glycoproteins in the immune response, to understand
   susceptibility to TB, and to identify potential therapeutic targets.
   “C21-steroid
   hormone biosynthesis and metabolism” was significantly changed in TB
   patients, and may be associated with pathological processes (e.g., host
   defense against TB infection) [[95]40]. Finally, the roles of other
   pathways showing alterations in TB patients, such as “Vitamin B3
   (nicotinate and nicotinamide) metabolism,” “Propanoate metabolism,” and
   “androgen and estrogen biosynthesis and metabolism,” remain to be
   elucidated.

   This study had some limitations. First, the analysis was conducted with
   only two untargeted metabolomics data sets. The lack of data might
   impede the identification of subtle TB-associated metabolic
   disturbances. Second, similar to a recent study [[96]41], the pathway
   annotations require validation. Nonetheless, the analysis validated the
   available pathological and biological evidence, suggesting its
   reliability. Third, blood-derived metabolomics studies cannot directly
   elucidate in vivo growth mechanisms or the mode of action of anti-TB
   drugs [[97]13]. Instead, they are more suitable for applications
   related to systemic molecular alterations in the host. Finally, post hoc
   metabolite identification and individual quantification are required to
   evaluate the associations of metabolites with clinical TB
   manifestations.

Conclusions

   We showed that pathway meta-analysis of several studies can overcome
   cross-study inconsistency by increasing the power and generalizability
   of the results. In addition, pathologically comparable and novel
   metabolic alterations in plasma from pulmonary TB patients were
   described. Subsequent studies are needed to leverage these findings to
   discover novel diagnostic biomarkers, metabolism-informed clinical
   care, and metabolism-informed therapeutic development.

Materials and methods

Institutional review board statement

   This study was approved by the Institutional Review Board of Korea
   University Guro Hospital (2017GR0012). All investigations were
   conducted in accordance with the principles of the Declaration of
   Helsinki. Informed consent was obtained from all subjects involved in
   the cPMTb study. Patients provided written informed consent for
   analysis of their blood and clinical data.

Korean tuberculosis cohort characteristics

   The samples used in this study were part of a multi-center TB cohort
   entitled Center for Precision Medicine for Tuberculosis (cPMTb). The
   biospecimens and data used for this study were provided by the Biobank
   of Korea University Guro Hospital, a member of Korea Biobank.
   Individuals with human immunodeficiency virus infection, chronic renal
   disease, chronic liver diseases, chronic lung diseases, and malignant
   diseases were excluded from the analysis. Eventually, plasma samples
   from 63 clinically diagnosed pulmonary TB patients and 50 normal
   controls were collected.

   In the TB group, the mean age (± standard deviation) was 55 (± 16)
   years and 27% of the patients were women. Forty-eight patients (76%)
   had positive sputum smears and 14 patients (22%) had cavitation on
   chest X-ray. In the controls, the mean age (± standard deviation)
   was 60 (± 10) years, and 58% of the controls were women.

Chinese tuberculosis cohort characteristics

   We downloaded data from TB patients and NC (Metabolomics Workbench,
   study ID ST001231) for the pathway-level meta-analysis with the cPMTb
   cohort to elucidate the metabolic profiles of pulmonary TB. In brief,
   the study involved 70 plasma samples of pulmonary TB (35 samples) and
   NC (35 samples). In the TB group, the age ranged from 18 to 64 years
   and 49% of the patients were women; of the patients, 86% had positive
   sputum smears and 17% had cavitation on chest X-ray. The age of the
   NC group ranged from 23 to 60 years, and 31% of the controls were
   women. The untargeted metabolomics study was carried out using
   ultra-high-performance liquid chromatography coupled with a Q Exactive
   mass spectrometer in positive and negative ion modes. More details are
   provided in the original publication [[98]4].

Chemicals and reagents

   High-performance liquid chromatography-grade water, methanol, and
   acetonitrile (ACN) were from J.T. Baker (Phillipsburg, NJ, USA).
   Analytical-grade formic acid, ammonium acetate, and the internal
   standard (cholic acid-d5) were purchased from Toronto Research
   Chemicals (Toronto, Canada). Authentic chemicals for establishing the
   in-house database were purchased from Sigma-Aldrich (St. Louis, MO,
   USA).

Sample preparation

   Blood samples were collected routinely on the day of enrollment in the
   overnight-fasted and medication-free state before treatment. Plasma was
   prepared by centrifuging the whole blood for 10 min at 4,500 rpm and
   stored at -80°C until analysis.

   The extraction of metabolites from plasma was conducted in accordance
   with our established protocol [[99]42]. In brief, 50 μL of plasma were
   mixed with 150 μL of ACN containing 5 μg/mL cholic acid-d5 in a
   microcentrifuge Eppendorf tube. The mixture was vigorously vortexed for
   5 min and centrifuged for 10 min at 13,000 rpm at 4°C; the supernatant
   was collected. An equal amount of each sample was collected and mixed
   to create a pooled quality control (QC) sample. All extracts were
   stored at -20°C and subsequently analyzed using a high-performance
   liquid chromatography quadrupole time of flight mass spectrometer.

Instrumental conditions for untargeted metabolomics

   The analysis was conducted as described previously with an Agilent 1200
   series high-performance liquid chromatography system (Agilent
   Technologies, Santa Clara, CA, USA) coupled to a 6530 Q-TOF mass
   spectrometer (Agilent Technologies) [[100]42]. The autosampler was set
   at 4°C for all procedures. In positive ion mode, an ACQUITY UPLC BEH
   C18 column (100 × 2.1 mm, 1.7 μm; Waters) was maintained at 40°C, and
   metabolite separation
   was conducted by binary gradient elution with a flow rate of 0.4
   mL/min. Mobile phase A was water with 0.1% formic acid; mobile phase B
   was ACN with 0.1% formic acid. The gradient was 0 min, 2% B; 1 min, 2%
   B; 3 min, 20% B; 8 min, 90% B; 14 min, 90% B; 14.5 min, 2% B; 18 min,
   2% B. Essential mass spectrometer parameters are given in S3 Table in
   [101]S2 File. In negative ion mode, the ZIC-HILIC column (100 × 2.1 mm,
   3.5 μm; Merck, Darmstadt, Germany) was maintained at 35°C, and
   metabolite separation was conducted by binary gradient elution with a
   flow rate of 0.5 mL/min. Mobile phase A was ACN/water (5:95, v/v) with
   10 mM ammonium acetate; mobile phase B was ACN/water (95:5, v/v) with
   10 mM ammonium acetate. The gradient was 0 min, 99% B; 1 min, 99% B; 15
   min, 50% B; 17 min, 50% B; 17.1 min, 99% B; and 22 min, 99% B. The mass
   spectrometer was operated under conditions equivalent to those used in
   positive ion mode.

Data preprocessing and alignment

   The generated *.d raw files were converted to mzML files using
   ProteoWizard [[102]43]. The mzML files were then submitted to MS-DIAL
   (version 4.60) [[103]44] for peak detection, alignment, and annotation.
   Essential data processing parameters are given in S4 Table in [104]S2
   File. Features whose average signal across samples was less than five
   times the blank average were removed. LOWESS signal correction across
   batches was applied to the aligned data set. Before subsequent
   statistical analyses, features with a relative standard deviation of
   ≥ 20% in QC samples were removed. Features with missing values in
   ≥ 50% of samples were also removed; the remaining missing values were
   imputed using feature-wise k-nearest neighbors. Finally, normalized and
   filtered features were log-transformed and Pareto-scaled.
   Post-processing data treatment was conducted using MetaboAnalyst 5.0
   [[105]20].

   The *.raw files from ST001231 were submitted directly to MS-DIAL
   (version 4.60) for peak detection, alignment, and annotation. Data
   processing parameters are given in S4 Table in [106]S2 File. Because
   there were no blank samples, no feature removal based on blank
   information was applied. Features with missing values in ≥ 50% of the
   samples were removed; the remaining missing values were imputed using
   feature-wise k-nearest neighbors. Features with a relative standard
   deviation of ≥ 20% in QC
   samples were removed. Quantile normalization was employed for
   cross-sample normalization. Finally, the data were log-transformed and
   Pareto-scaled before subsequent analyses.
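
   The filtering and scaling rules described above can be sketched as
   follows. This is a simplified illustration and not the MetaboAnalyst
   implementation: the function names and example intensities are
   hypothetical, the log base (10) is an assumption because the text does
   not state it, and k-nearest-neighbor imputation is omitted.

```python
import math
from statistics import mean, stdev


def relative_sd(values):
    """Relative standard deviation (sample SD divided by the mean)."""
    return stdev(values) / mean(values)


def keep_feature(qc_values, sample_values, max_rsd=0.20, max_missing=0.50):
    """Apply the two filters from the text: drop a feature when its RSD in
    QC samples is >= 20% or when >= 50% of its sample values are missing.
    Missing values are represented here as None (an assumption)."""
    missing_fraction = sum(v is None for v in sample_values) / len(sample_values)
    return relative_sd(qc_values) < max_rsd and missing_fraction < max_missing


def log_pareto(values):
    """Log-transform, then Pareto-scale: mean-centre and divide by the
    square root of the standard deviation."""
    logged = [math.log10(v) for v in values]
    centre = mean(logged)
    scale = math.sqrt(stdev(logged))
    return [(v - centre) / scale for v in logged]


# Hypothetical intensities for a single feature.
qc = [100.0, 105.0, 95.0]
samples = [10.0, 100.0, None, 1000.0]

if keep_feature(qc, samples):
    observed = [v for v in samples if v is not None]  # imputation omitted
    print(log_pareto(observed))
```

   Pareto scaling divides by the square root of the standard deviation
   rather than by the standard deviation itself, so high-variance features
   are down-weighted less aggressively than under unit-variance scaling.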

   Normalized data are provided in S5-S8 Tables in [107]S2 File.

Data exploration and visualization

   Principal component analysis was conducted to reduce data
   dimensionality, thus facilitating exploration and visualization of the
   data. The principal component analysis aims to find an orthogonal basis
   (or new axes) that can explain data variability and project
   observations onto a smaller subspace. In our study, $e_1$, $e_2$, and
   $e_3$ were the new axes (the eigenvectors corresponding to the three
   largest eigenvalues of the sample covariance matrix), and each
   observation $x \in \mathbb{R}^{p}$ was converted to a vector
   $(x^{T}e_1, x^{T}e_2, x^{T}e_3)$ and plotted in three-dimensional
   space.
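
   The projection onto the three leading eigenvectors can be sketched with
   NumPy. This is a minimal illustration on random data: the function
   name, matrix dimensions, and random seed are hypothetical, and the data
   are mean-centred before projection (an assumption consistent with using
   the sample covariance matrix).

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Project the rows of X onto the eigenvectors of the sample covariance
    matrix that have the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    cov = np.cov(Xc, rowvar=False)           # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    E = eigvecs[:, order]                    # columns play the role of e_1, e_2, e_3
    return Xc @ E, eigvals[order]


# Hypothetical data: 20 samples, 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
scores, variances = pca_scores(X)
print(scores.shape, variances)
```

   Each row of `scores` corresponds to one point in the three-dimensional
   score plot, and the variance of each score column equals the
   corresponding eigenvalue.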

Statistical analysis

   Multiple statistical methods were used to analyze the untargeted
   metabolomics data. For univariate analysis, unpaired t-tests were used.
   P-values were adjusted using the Benjamini-Hochberg procedure, and an
   adjusted p-value of < 0.05 (i.e., a false discovery rate of 0.05) was
   considered significant.
   Partial least-squares discriminant analysis and random forest analysis
   (number of trees, 500; number of predictors, 50) were used to examine
   class discrimination (i.e., TB patients versus controls) using the
   metabolomics data. A 10-fold cross-validation procedure was used to
   measure classification performance.
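
   For reference, the Benjamini-Hochberg adjustment can be written in a
   few lines. This is a generic sketch of the standard procedure, not the
   code used in the study; the function name and example p-values are
   hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four feature-wise t-tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
print(adj, significant)
```

   A feature is called significant when its adjusted value falls below the
   chosen false discovery rate, here 0.05.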

Pathway-level meta-analysis of metabolic features

   The normalized and transformed data that contained m/z values,
   retention time (in seconds), and peak intensity for each ion mode were
   subjected to pathway-level meta-analysis. Before the pathway-level
   integration, the following calculations were performed: individual m/z
   statistics (i.e., t-test); putative metabolite annotation (mass
   tolerance, 10 ppm); and pathway prediction. Next, the p-values from
   individual studies were combined using Fisher’s method. Given
   individual p-values $p_i$ from the $i$th hypothesis, $i = 1, \ldots, n$,
   the method aggregates them as

   \[ X^{2} = -2 \sum_{i=1}^{n} \log p_{i}, \]

   which follows the chi-square distribution with $2n$ degrees of freedom
   under the null hypotheses.
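
   Fisher’s method is straightforward to compute because the chi-square
   survival function has a closed form when the degrees of freedom are
   even. The sketch below is illustrative only: the function name and the
   two example p-values are hypothetical and are not taken from the study.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (X2, combined_p), where X2 = -2 * sum(log p_i). For 2n degrees
    of freedom the chi-square tail probability is
    exp(-X2 / 2) * sum_{k=0}^{n-1} (X2 / 2)^k / k!.
    """
    n = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X2 / 2
    term, series = 1.0, 1.0
    for k in range(1, n):
        term *= half_stat / k
        series += term
    return 2.0 * half_stat, math.exp(-half_stat) * series


# Hypothetical pathway p-values from two studies.
x2, combined = fisher_combined_p([0.04, 0.20])
print(f"X^2 = {x2:.3f}, combined p = {combined:.4f}")
```

   In this example the pathway is significant in only one of the two
   hypothetical studies, yet the combined p-value is below 0.05. The same
   mechanism explains why a pathway can be significant in the
   meta-analysis while being significant in only one individual study.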

   GSEA was used as the pathway-level enrichment algorithm. In brief, the
   algorithm ranks all features (genes, in its original formulation) based
   on their t-statistics and compares the ranked list with a prespecified
   set (or pathway), termed S. If the top-ranked features (i.e., those
   with large t-statistics) overlap substantially with S, such that the
   enrichment score increases, S is regarded as an active pathway. The
   Homo sapiens (human) MFN library (combined KEGG, BiGG, and Edinburgh)
   was used as the pathway library for analysis. A pathway with
   a combined p-value of < 0.05 was considered statistically significant.
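
   The running-sum enrichment score behind GSEA can be sketched as
   follows. This is a simplified illustration of the weighted
   Kolmogorov-Smirnov-like statistic from the original GSEA formulation;
   it is not the implementation used in this study, the feature names and
   t-statistics are hypothetical, and the permutation step that yields the
   p-value is omitted.

```python
def enrichment_score(ranked, member_set, weight=1.0):
    """Running-sum enrichment score.

    ranked: list of (feature, statistic) pairs sorted by statistic,
            largest first.
    member_set: features belonging to the pathway S.
    """
    hit_weights = [abs(s) ** weight if f in member_set else 0.0 for f, s in ranked]
    hit_total = sum(hit_weights)
    n_miss = sum(1 for f, _ in ranked if f not in member_set)
    if hit_total == 0.0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses
    running, best = 0.0, 0.0
    for (feature, _), w in zip(ranked, hit_weights):
        if feature in member_set:
            running += w / hit_total   # step up at a pathway member
        else:
            running -= 1.0 / n_miss    # step down otherwise
        if abs(running) > abs(best):
            best = running             # keep the maximum deviation from zero
    return best


# Hypothetical features ranked by t-statistic.
ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0), ("f", -3.0)]
print(enrichment_score(ranked, {"a", "b"}))   # members at the top of the list
print(enrichment_score(ranked, {"e", "f"}))   # members at the bottom of the list
```

   A pathway whose members cluster at the top of the ranked list receives
   a positive score, and one whose members cluster at the bottom receives
   a negative score. In practice, significance is assessed by comparing
   the observed score with scores obtained from permuted data.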

Supporting information

   S1 File. Volcano plots of metabolic features and the random forest
   classification of the two studies.

   (DOCX)
   S2 File. Supplementary method and materials and data.

   (XLSX)

Data Availability

   All relevant data are within the manuscript and its [110]Supporting
   information files.

Funding Statement

   This research was supported in part by the Bio & Medical Technology
   Development Program of the National Research Foundation (NRF) funded by
   the Korean government (MSIT) (No. 2019M3E5D1A01068994), and the
   National Research Foundation of Korea (NRF) grant funded by the Korean
   government (MSIT) (grant No. 2018R1A5A2021242). The funders did not
   influence study design, data collection, data analysis and
   interpretation, or the manuscript's content.

References