Abstract
Insight into the metabolic biosignature of tuberculosis (TB) may inform
clinical care, reduce adverse effects, and facilitate
metabolism-informed therapeutic development. However, studies often
yield inconsistent findings regarding the metabolic profiles of TB.
Herein, we conducted an untargeted metabolomics study using plasma from
63 Korean TB patients and 50 controls. Metabolic features were
integrated with the data of another cohort from China (35 TB patients
and 35 controls) for a global functional meta-analysis. Specifically,
all features were matched to a known biological network to identify
potential endogenous metabolites. Next, a pathway-level analysis based
on gene set enrichment analysis was conducted for each study, and the
resulting pathway p-values from the two studies were combined. The
meta-analysis revealed both known metabolic alterations and novel
processes. For instance, retinol metabolism and cholecalciferol
metabolism, which are associated with TB risk and outcome, were altered
in plasma from TB patients; proinflammatory lipid mediators were
significantly enriched. Furthermore, metabolic processes linked to the
innate immune responses and possible interactions between the host and
the bacillus showed altered signals. In conclusion, our
proof-of-concept study indicated that a pathway-level meta-analysis
directly from metabolic features enables accurate interpretation of TB
molecular profiles.
Introduction
Tuberculosis (TB) is a devastating infectious disease, and an estimated
1.7 billion people are latently infected globally [[38]1]. Despite
extensive efforts, TB remains a leading cause of mortality worldwide,
especially in countries where it is endemic. According to the World
Health Organization Global Report (2020), there were around 10 million
newly diagnosed TB patients in 2019, and approximately 1.4 million
deaths [[39]2]. TB has a broad pathophysiological spectrum, hampering
eradication efforts [[40]3]. A holistic model based on high-dimensional
data is required to describe host-response endotype characteristics in
general, and the TB immune endotype in particular. Specifically, -omics
technologies have facilitated the discovery of clinically useful
biomarkers for risk assessment, diagnosis, and prediction of clinical
events. For instance, after performing a comprehensive analysis of
plasma pulmonary TB samples, and samples from community-acquired
pneumonia patients, lung cancer patients and normal controls, Huang et
al. introduced xanthine, 4-pyridoxate, and d-glutamic acid as potential
biomarkers [[41]4]. Sweeney3 (GBP5, DUSP3, KLF2), a host-response-based
gene signature, met the criteria of the World Health
Organization/Foundation for Innovative New Diagnostics target product
profile for a non-sputum-based triage test [[42]5]. Comprehensive
-omics data and appropriate analytical methods enable investigation of
drug efficacy, personalized dosing, prediction of relapse-free cure,
and phenotypic drug susceptibility testing, as aspects of personalized
precision medicine [[43]6–[44]8].
Studies of host-response transcriptome biosignatures have achieved
considerable success in terms of stratifying TB patients for the
purposes of risk prediction [[45]9], diagnosis [[46]10], treatment
monitoring, outcome prediction [[47]11], and recurrence prediction
[[48]12]. Blood metabolic responses have also been tracked based on the
“blood metabolic signature,” which partially reflects the interaction
between the human body and Mycobacterium tuberculosis (M. tuberculosis)
bacilli. The metabolic responses of TB patients may aid predictions of
risk, diagnosis, and outcomes, as well as treatment monitoring
[[49]13]. Integrating multi-omics data with clinical information could
facilitate host-directed therapy for TB; for example, TB meningitis
[[50]14]. However, the usefulness of the serum and plasma metabolomic
analysis for tracking the blood metabolic signature of TB across
populations is unclear. Moreover, the variability in study designs and
limited guidelines for the use of omics technologies in clinical
research could lead to less reliable data, complicated analyses, and
missed biological signals [[51]15]. Therefore, rigorous designs are
required for the reproducibility of -omics studies.
In computational functional interpretation, a set of genes or
metabolites associated with a phenotype of interest is typically
identified by a statistical test. Next, this set is compared with a
predefined database of biological functions, which returns enrichment
scores for which p-values and/or q-values are calculated. This is the
fundamental principle of over-representation analysis. Gene set
enrichment analysis (GSEA) utilizes a metric representing the overall
ranks of features (e.g., t-score or fold-change) to find “significantly
coordinated reposition” of the association strength based on a database
of genes or metabolites sharing biological functions [[52]16]. GSEA has
been used extensively in transcriptomics studies, but comparatively
infrequently in metabolomics. Despite its ability to obtain profound
information from samples, untargeted metabolomics has not met
expectations in terms of providing mechanistic insight into the
metabolic alterations of phenotypes of interest. This is primarily due
to the difficulty of compound annotation and identification. Generally,
a tiny fraction of ions can be assigned to metabolites with an
acceptable level of confidence. The limited ability to define
metabolites hampers subsequent functional interpretation. The situation
has gradually improved since the introduction of the Mummichog
algorithm [[53]17]. Overall, Mummichog leverages known metabolic
networks to map all potentially relevant metabolites. Hence,
it allows a test of representation in which potentially valid
metabolites are over-represented in a pathway, whereas others are
randomly distributed across the metabolic network. This allows rapid
assessment of potential alterations in a phenotype of interest in a
hypothesis-generating study using untargeted metabolomics data
[[54]18].
Meta-analysis for pathway enrichment analysis or pathway-level
meta-analysis is a powerful approach for capturing the biological
signatures of a particular phenotype of interest across studies with
heterogeneous settings [[55]19]. Pathway-level meta-analysis of
metabolomics data using GSEA has recently become feasible [[56]20].
Using a computational method to predict functional activities from
metabolic features and a pathway-level enrichment meta-analysis using
GSEA may provide insight into the metabolic biosignature of phenotypes
of interest.
Problematic reproducibility and minimal overlap of metabolic features
across studies have hampered the investigation of the metabolic
alterations in TB. There is an urgent need to develop a strategy to
reliably capture the global metabolic biosignature of TB. Herein, we
conducted a pathway-level GSEA-based meta-analysis of two pulmonary TB
untargeted metabolomics data sets from South Korea and China. The
analysis is a proof-of-concept of the ability of metabolomics
meta-analysis using metabolic features to identify metabolic
alterations in pulmonary TB.
Results
Data exploration reveals considerable metabolic feature changes in TB
patients
We performed principal component analysis to examine and visualize the
untargeted metabolomics data of the two studies in positive and
negative ion modes. The three-dimensional score plots of cPMTb
(positive ion mode), ST001231 (positive ion mode), cPMTb (negative ion
mode), and ST001231 (negative ion mode) suggested comparatively clear
separation of TB patients from their counterparts ([57]Fig 1A–1D).
However, the TB patients and normal controls (NC) in cPMTb had more
similar metabolomes than those in ST001231.
Fig 1. Data exploration and visualization.
Principal component analysis of (A) cPMTb positive ion mode, (B)
ST001231 positive ion mode, (C) cPMTb negative ion mode, and (D)
ST001231 negative ion mode. TB, tuberculosis; NC, normal control.
Unpaired t-tests were also conducted, and the features were visualized
using volcano plots. At a significance level of 0.05, only 218 features
were upregulated, while 142 were downregulated in the TB group of cPMTb
(positive ion mode, S1A Fig in [60]S1 File). By contrast, 549
upregulated features and 412 downregulated features were found in the
TB group in ST001231 (positive ion mode, S1B Fig in [61]S1 File).
Notably, few features had a high fold change in cPMTb, whereas many
features had a high fold change in ST001231. Similar patterns were
observed in negative ion mode (S1C and S1D Fig in [62]S1 File).
TB patients have distinct metabolome profiles
Partial least-squares discriminant analysis and random forest analysis
were used to examine whether the metabolic profiles could be used to
classify TB patients and controls. In positive ion mode, the partial
least-squares discriminant analysis models possessed excellent
discriminatory performance. In particular, the optimal model in the
cPMTb study contained five principal components with an accuracy,
goodness-of-fit (R^2), and goodness-of-prediction (Q^2) of 0.90, 0.93,
and 0.63, respectively ([63]Fig 2A). Likewise, the optimal model in
ST001231, which had four principal components, had an accuracy, R^2,
and Q^2 of 1.00, 1.00, and 0.96, respectively ([64]Fig 2B). Similar
performance was observed in negative ion mode: cPMTb (accuracy, 0.94;
R^2, 0.96; Q^2, 0.70) and ST001231 (accuracy, 1.00; R^2, 1.00; Q^2,
0.91) ([65]Fig 2C and 2D). Remarkably, there were marked differences in
metabolic profiles between TB patients and NCs in ST001231. The
out-of-bag errors of the four random forest models were 0.12, 0.00,
0.13, and 0.00 for cPMTb (positive ion mode), ST001231 (positive ion
mode), cPMTb (negative ion mode), and ST001231 (negative ion mode),
respectively (S2 Fig in [66]S1 File). These analyses collectively
indicated that TB patients possess a distinct metabolome profile,
compared with controls.
Fig 2. Partial least-squares discriminant analysis.
(A) cPMTb positive ion mode. (B) ST001231 positive ion mode. (C) cPMTb
negative ion mode. (D) ST001231 negative ion mode. * optimal value of
Q^2; TB, tuberculosis; NC, normal control.
Profound plasma metabolic alterations of pulmonary TB patients
Pathway-level meta-analysis was conducted separately in positive and
negative ion modes. In positive ion mode, the meta-analysis revealed
that 15 pathways had a combined p-value of < 0.05. They belonged to
metabolic homeostasis, proinflammatory processes, and vitamin
metabolism. The five pathways with the lowest combined p-values were
“carnitine shuttle,” “vitamin A (retinol) metabolism,” “pentose
phosphate pathway,” “purine metabolism,” and “pentose and glucuronate
interconversions” ([69]Fig 3A). Notably, only two pathways were
significant in both individual studies among the significant pathways
in the meta-analysis: “carnitine shuttle” and “vitamin A (retinol)
metabolism.” Several pathways, including “pentose and glucuronate
interconversions,” “hyaluronan metabolism,” and “fructose and mannose
metabolism,” were enriched only in the cPMTb study (A1pos). By contrast,
pathways such as “sialic acid metabolism,” “purine metabolism,” and
“androgen and estrogen biosynthesis and metabolism” were significant
only in ST001231 (B1pos). The heterogeneity of significant pathways
among studies might be due to their relatively small sample sizes,
sample heterogeneity, and use of different LC-MS platforms. More
details are shown in S1 Table in [70]S2 File.
Fig 3. Pathway meta-analysis by gene set enrichment analysis.
(A) Positive ion mode. (B) Negative ion mode. The enrichment factor of
a pathway was calculated by dividing its number of significant hits by
the expected number of hits.
Analysis of the data in negative ion mode yielded a greater number of
significant pathways. Indeed, 24 pathways had a combined p-value of <
0.05 in the meta-analysis. Similar to the enriched pathways in positive
ion mode, these belonged to proinflammatory processes, vitamin
metabolism, metabolic homeostasis, amino acid-related metabolism, and
some potentially novel pathways. The five pathways with the lowest
combined p-values were “glycolysis and gluconeogenesis,” “pyruvate
metabolism,” “fructose and mannose metabolism,” “vitamin D3
(cholecalciferol) metabolism,” and “de novo fatty acid biosynthesis”
([73]Fig 3B). Among the significant pathways in the meta-analysis,
eight were significantly enriched in both studies—the above-mentioned
five pathways and “bile acid biosynthesis,” “arachidonic acid
metabolism,” and “vitamin A (retinol) metabolism.” Seven pathways were
significantly enriched only in cPMTb (A1neg), including “leukotriene
metabolism,” “galactose metabolism,” “C21-steroid hormone biosynthesis
and metabolism,” and “sialic acid metabolism.” In contrast, eight
pathways—including “propanoate metabolism,” several amino acid-related
pathways and “vitamin B3 (nicotinate and nicotinamide) metabolism”—were
enriched only in ST001231 (B1neg). More details are shown in S2 Table
in [74]S2 File.
Discussion
Meta-analysis enhances statistical power, reliability, and
generalizability, especially in high-throughput data settings [[75]21].
A feature-level meta-analysis provides more comprehensive information
than secondary pooled analyses of a limited number of identified
metabolites. Moreover, as mentioned above, metabolite identification
remains a fundamental issue in metabolomics [[76]22]. An analysis that
forgoes metabolite identification significantly reduces the time (i.e.,
from days to hours) required to obtain valuable insights and derive
actionable targets for the phenotype of interest. Therefore, we could
focus more on the validation of potential biomarkers and the
performance of experiments to delineate molecular mechanisms of
disease.
In this study, a pathway-level GSEA-based meta-analysis of two
pulmonary TB untargeted metabolomics data sets was conducted. The two
included data sets had a significant degree of heterogeneity in
clinical characteristics, which might affect the number of enriched
pathways. Nevertheless, the meta-analysis provided considerable insight
into global metabolic alterations in plasma from pulmonary TB patients.
The results are pathophysiologically comparable with previous findings
obtained using conventional targeted methods, and they also include
novel metabolic alterations. The analysis is capable of suggesting
biological processes
that may be significantly influenced by the clinical characteristics of
a cohort. Furthermore, the findings suggested that functional
interpretation of metabolomics data at the pathway level can provide
insights into the molecular signatures of TB patients. Importantly,
biological speculations at the level of individual metabolites exhibit
human-centric bias [[77]23]. Below we discuss some of the most
important findings.
“Vitamin A (retinol) metabolism” and “vitamin D3 (cholecalciferol)
metabolism” were altered in the meta-analysis. Vitamin A deficiency is
reportedly associated with an increased risk of incident TB among
household contacts [[78]24]. Vitamin A supplementation may boost
immunity against TB [[79]25], and vitamin A and zinc co-supplementation
may improve outcomes [[80]26]. Vitamin D3 deficiency is a risk factor
for TB. Vitamin D3 supplementation may be associated with immune
activation, and thus should improve treatment outcomes; however, this
requires validation [[81]26]. In addition, the “bile acid biosynthesis”
and “purine metabolism” pathways were significantly altered in our TB
patients compared to controls. These pathways may also be involved in
host defense. Indeed, some bile acids inhibit the in vitro growth of M.
tuberculosis [[82]27]. Bile acid derivatives are also potential anti-TB
agents [[83]28], and purine metabolism in M. tuberculosis is a target
for drug development [[84]29, [85]30]. Furthermore, together with lipid
metabolism, these pathways are reportedly linked to anti-TB
drug-induced hepatotoxicity [[86]31].
Notably, we observed significant systemic changes in the host (i.e.,
disease phenotype) due to TB infection. Proinflammatory lipid mediators
and pro-resolving lipid mediators are associated with TB and strongly
associated with TB comorbid type 2 diabetes. The arachidonic
acid-derived leukotriene and prostaglandin families were reported to be
the most abundant proinflammatory lipid mediators [[87]32]. Our pathway
analysis revealed significant enrichment of “arachidonic acid
metabolism,” “leukotriene metabolism,” and “prostaglandin formation
from arachidonate.” We also found various processes related to
nutrients and oxidative stress, including “pyruvate metabolism,”
“fructose and mannose metabolism,” “glycolysis and gluconeogenesis,”
“de novo fatty acid biosynthesis,” and the metabolism of several amino
acids. These findings concur with a previous report that metabolic
processes are involved in adaptations and/or interactions of the host
and microbe during infection [[88]33]. Medium-chain fatty acids are
involved in protective immunity against M. tuberculosis [[89]34].
Additionally, alteration of “pyruvate metabolism” might be linked to
the increased catabolism and/or energy consumption observed in TB
patients [[90]35]. “Fructose and mannose metabolism” and “glycolysis
and gluconeogenesis” in M. tuberculosis are reportedly affected by
nutrient starvation. In addition, they are linked to central carbon
metabolism, which is essential for the maintenance of metabolic
homeostasis in M. tuberculosis [[91]36]. For example, mycobacteria in
phagosomes took up exogenous pyruvate more efficiently than glucose and
the pyruvate was used as a carbon source for intracellular growth
[[92]37].
We also found some potentially important pathways associated with TB.
In a study of the innate immune responses to M. tuberculosis using
macrophages, Blischak et al. found a subset of genes specifically
involved in infection, including protein-coding genes related to the
regulation of sialic acid synthesis [[93]38]. We found that “sialic
acid metabolism” was altered in plasma from TB patients, and Isa et al.
[[94]39] reported an altered level of sialic acid in urine. Further
studies are warranted to explore the role of sialic acid metabolism and
the associated glycoproteins in the immune response, to understand
susceptibility to TB, and to identify potential therapeutic targets. “C21-steroid
hormone biosynthesis and metabolism” was significantly changed in TB
patients, and may be associated with pathological processes (e.g., host
defense against TB infection) [[95]40]. Finally, the roles of other
pathways showing alterations in TB patients, such as “Vitamin B3
(nicotinate and nicotinamide) metabolism,” “Propanoate metabolism,” and
“androgen and estrogen biosynthesis and metabolism,” remain to be
elucidated.
This study had some limitations. First, the analysis was conducted with
only two untargeted metabolomics data sets. The lack of data might
impede the identification of subtle TB-associated metabolic
disturbances. Second, similar to a recent study [[96]41], the pathway
annotations require validation. Nonetheless, the analysis validated the
available pathological and biological evidence, suggesting its
reliability. Third, blood-derived metabolomics studies cannot directly
elucidate in vivo growth mechanisms or the mode of action of anti-TB
drugs [[97]13]. Instead, they are more suitable for applications
related to host systemic molecular alterations. Finally, post hoc
metabolite identification and individual quantification are required to
evaluate the associations of metabolites with clinical TB
manifestations.
Conclusions
We showed that pathway meta-analysis of several studies can overcome
cross-study inconsistency by increasing the power and generalizability
of the results. In addition, pathologically comparable and novel
metabolic alterations in plasma from pulmonary TB patients were
described. Subsequent studies are needed to leverage these findings to
discover novel diagnostic biomarkers, metabolism-informed clinical
care, and metabolism-informed therapeutic development.
Materials and methods
Institutional review board statement
This study was approved by the Institutional Review Board of Korea
University Guro Hospital (2017GR0012). All investigations were
conducted in accordance with the principles of the Declaration of
Helsinki. Informed consent was obtained from all subjects involved in
the cPMTb study. Patients provided written informed consent for
analysis of their blood and clinical data.
Korean tuberculosis cohort characteristics
The samples used in this study were part of a multi-center TB cohort
entitled Center for Precision Medicine for Tuberculosis (cPMTb). The
biospecimens and data used for this study were provided by the Biobank
of Korea University Guro Hospital, a member of Korea Biobank.
Individuals with human immunodeficiency virus infection, chronic renal
disease, chronic liver diseases, chronic lung diseases, and malignant
diseases were excluded from the analysis. Eventually, plasma samples
from 63 clinically diagnosed pulmonary TB patients and 50 normal
controls were collected.
In the TB group, the mean age (± standard deviation) was 55 (± 16)
years and 27% of the patients were women. Forty-eight patients (76%)
had positive sputum smears and 14 patients (22%) had a cavity on chest
x-ray. In the controls, the mean age (± standard deviation)
was 60 (± 10) years, and 58% of the controls were women.
Chinese tuberculosis cohort characteristics
We downloaded data from TB patients and NC (Metabolomics Workbench,
study ID ST001231) for the pathway-level meta-analysis with the cPMTb
cohort to elucidate the metabolic profiles of pulmonary TB. In brief,
the study involved 70 plasma samples of pulmonary TB (35 samples) and
NC (35 samples). In the TB group, the age ranged from 18 to 64 years
and 49% of the patients were women; of the patients, 86% had positive
sputum smears and 17% had a cavity on chest x-ray. The age of the
NC group ranged from 23 to 60 years, and 31% of the controls were
women. The untargeted metabolomics study was carried out by
ultra-high-performance liquid chromatography coupled with a Q Exactive
mass spectrometer in positive and negative ion modes. More details are
provided in the original publication [[98]4].
Chemicals and reagents
High-performance liquid chromatography-grade water, methanol, and
acetonitrile (ACN) were from J.T. Baker (Phillipsburg, NJ, USA).
Analytical-grade formic acid and ammonium acetate and the internal
standard (cholic acid-d5), were purchased from Toronto Research
Chemicals (Toronto, Canada). Authentic chemicals for establishing the
in-house database were purchased from Sigma-Aldrich (St. Louis, MO,
USA).
Sample preparation
Blood samples were collected routinely on the day of enrollment in the
overnight-fasted and medication-free state before treatment. Plasma was
prepared by centrifuging the whole blood for 10 min at 4,500 rpm and
stored at -80°C until analysis.
The extraction of metabolites from plasma was conducted in accordance
with our established protocol [[99]42]. In brief, 50 μL of plasma was
mixed with 150 μL of ACN containing 5 μg/mL cholic acid-d5 in a
microcentrifuge Eppendorf tube. The mixture was vigorously vortexed for
5 min and centrifuged for 10 min at 13,000 rpm at 4°C; the supernatant
was collected. An equal amount of each sample was collected and mixed
to create a pooled quality control (QC) sample. All extracts were
stored at -20°C and subsequently analyzed using a high-performance
liquid chromatography quadrupole time of flight mass spectrometer.
Instrumental conditions for untargeted metabolomics
The analysis was conducted as described previously with an Agilent 1200
series high-performance liquid chromatography (Agilent Technologies,
Santa Clara, CA, USA) coupled to a 6530 Q-TOF mass spectrometer
(Agilent Technologies) [[100]42]. The autosampler was set at 4°C for
all procedures. In positive ion mode, ACQUITY UPLC BEH C18 (100 × 2.1
mm, 1.7 μm; Waters) was maintained at 40°C, and metabolite separation
was conducted by binary gradient elution with a flow rate of 0.4
mL/min. Mobile phase A was water with 0.1% formic acid; mobile phase B
was ACN with 0.1% formic acid. The gradient was 0 min, 2% B; 1 min, 2%
B; 3 min, 20% B; 8 min, 90% B; 14 min, 90% B; 14.5 min, 2% B; 18 min,
2% B. Essential mass spectrometer parameters are given in S3 Table in
[101]S2 File. In negative ion mode, the ZIC-HILIC column (100 × 2.1 mm,
3.5 μm; Merck, Darmstadt, Germany) was maintained at 35°C, and
metabolite separation was conducted by binary gradient elution with a
flow rate of 0.5 mL/min. Mobile phase A was ACN/water (5:95, v/v) with
10 mM ammonium acetate; mobile phase B was ACN/water (95:5, v/v) with
10 mM ammonium acetate. The gradient was 0 min, 99% B; 1 min, 99% B; 15
min, 50% B; 17 min, 50% B; 17.1 min, 99% B; and 22 min, 99% B. The mass
spectrometer was operated using equivalent conditions to positive ion
mode.
Data preprocessing and alignment
The generated *.d raw files were converted to mzML files using
ProteoWizard [[102]43]. The mzML files were then submitted to MS-DIAL
(version 4.60) [[103]44] for peak detection, alignment, and annotation.
Essential data processing parameters are given in S4 Table in [104]S2
File. Features with sample average signals lower than fivefold above
the blank average were removed. LOWESS signal correction across batches
was applied to the aligned data set. Before subsequent statistical
analyses, features with a relative standard deviation of ≥ 20% in QC
samples were removed. Features with missing values in ≥ 50% of samples
were also removed; the remaining missing values were imputed using
feature-wise k-nearest neighbors. Finally, normalized and filtered
features were log-transformed and Pareto-scaled. Post-processing data
treatment was conducted using MetaboAnalyst 5.0 [[105]20].
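The filtering and scaling steps above can be sketched for a single feature as follows. This is a minimal illustration, not the MS-DIAL or MetaboAnalyst implementation: the function names are hypothetical, the base-10 logarithm is an assumption (the text does not state the base), and missing intensities are represented as None.

```python
import math
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (%) of one feature across QC injections."""
    return 100.0 * stdev(values) / mean(values)


def keep_feature(qc_values, sample_values, rsd_cutoff=20.0, missing_cutoff=0.5):
    """Apply the two filters described above to one feature.

    A feature is dropped if >= 50% of study samples are missing (None)
    or if its RSD in the QC samples is >= 20%.
    """
    missing_fraction = sum(v is None for v in sample_values) / len(sample_values)
    if missing_fraction >= missing_cutoff:
        return False
    return rsd_percent(qc_values) < rsd_cutoff


def log_pareto(values):
    """Log-transform one feature (base 10 assumed), then Pareto-scale it:
    mean-center and divide by the square root of the standard deviation."""
    logged = [math.log10(v) for v in values]
    center = mean(logged)
    scale = math.sqrt(stdev(logged))
    return [(v - center) / scale for v in logged]
```

Pareto scaling divides by the square root of the standard deviation rather than the standard deviation itself, so large-variance features are damped less aggressively than under unit-variance scaling.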
The *.raw files from ST001231 were submitted directly to MS-DIAL
(version 4.60) for peak detection, alignment, and annotation. Data
processing parameters are given in S4 Table in [106]S2 File. Because
there were no blank samples, no feature removal based on blank
information was applied. Features with missing values in ≥ 50% of the
samples were removed; the remaining missing values were imputed using
feature-wise k-nearest neighbors. Features with a relative standard
deviation of ≥ 20% in QC
samples were removed. Quantile normalization was employed for
cross-sample normalization. Finally, the data were log-transformed and
Pareto-scaled before subsequent analyses.
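For readers unfamiliar with quantile normalization, the idea can be sketched as below. This is a generic textbook version under stated assumptions (complete data, ties broken by original order); the function name is hypothetical, and the software used in the study may handle ties and missing values differently.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples.

    samples: list of equal-length lists, one list of feature intensities
    per sample. Each value is replaced by the mean, across samples, of the
    values holding the same rank, so all samples share one distribution.
    """
    n_features = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_features)
    ]
    normalized = []
    for s in samples:
        # Indices of this sample's features, ordered from lowest to highest.
        order = sorted(range(n_features), key=lambda i: s[i])
        out = [0.0] * n_features
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```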
Normalized data are provided in S5-S8 Tables in [107]S2 File.
Data exploration and visualization
Principal component analysis was conducted to reduce data
dimensionality, thus facilitating exploration and visualization of the
data. The principal component analysis aims to find an orthogonal basis
(or new axes) that can explain data variability and project
observations onto a smaller subspace. In our study, $e_1$, $e_2$, and $e_3$
were the new axes (the eigenvectors corresponding to the three largest
eigenvalues of the sample covariance matrix), and each observation
$x \in \mathbb{R}^p$ was converted to a vector
$(x^T e_1, x^T e_2, x^T e_3)$ and plotted in three-dimensional space.
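The projection described above can be illustrated with a small standard-library sketch. It is not the software used in the study: the function names are hypothetical, observations are mean-centered before projection (an assumption), and the eigenvectors are found by power iteration with deflation, which is only adequate for small matrices with well-separated eigenvalues.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    return cov, centered


def top_eigenvectors(cov, k, iterations=500):
    """Leading k eigenvectors by power iteration with deflation."""
    p = len(cov)
    work = [row[:] for row in cov]
    vectors = []
    for _ in range(k):
        # Asymmetric start vector, to reduce the chance of starting
        # orthogonal to the dominant eigenvector.
        norm0 = math.sqrt(sum((j + 1) ** 2 for j in range(p)))
        v = [(j + 1) / norm0 for j in range(p)]
        for _step in range(iterations):
            w = [sum(work[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(
            v[a] * sum(work[a][b] * v[b] for b in range(p)) for a in range(p)
        )
        vectors.append(v)
        # Deflate: remove the component just found from the matrix.
        for a in range(p):
            for b in range(p):
                work[a][b] -= eigenvalue * v[a] * v[b]
    return vectors


def pca_scores(rows, k=3):
    """Project each centered observation x onto e_1..e_k: (x^T e_1, ..., x^T e_k)."""
    cov, centered = covariance_matrix(rows)
    axes = top_eigenvectors(cov, k)
    return [[sum(c[j] * e[j] for j in range(len(e))) for e in axes] for c in centered]
```

The sign of each eigenvector is arbitrary, so score plots produced by different tools may be mirrored along any axis without changing the interpretation.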
Statistical analysis
Multiple statistical methods were used to analyze the untargeted
metabolomics data. For univariate analysis, unpaired t-tests were used.
The adjusted p-value following the Benjamini-Hochberg procedure (i.e.,
a false discovery rate of 0.05) was used as the significance level.
Partial least-squares discriminant analysis and random forest analysis
(number of trees, 500; number of predictors, 50) were used to examine
the class discrimination (i.e., TB and the counterpart) using
metabolomics data. A 10-fold cross-validation procedure was used to
measure classification performance.
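As an illustration of the multiple-testing adjustment mentioned above, the Benjamini-Hochberg procedure can be sketched as follows. The function name is hypothetical, and this is a generic version rather than the code used in the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A feature would then be called significant when its adjusted p-value falls below the chosen false discovery rate (0.05 in this study).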
Pathway-level meta-analysis of metabolic features
The normalized and transformed data that contained m/z values,
retention time (in seconds), and peak intensity for each ion mode were
subjected to pathway-level meta-analysis. Before the pathway-level
integration, the following calculations were performed: individual m/z
statistics (i.e., t-test); putative metabolite annotation (mass
tolerance, 10 ppm); and pathway prediction. Next, the p-values from
individual studies were combined using Fisher’s method. Given
individual p-values $p_i$ from the $i$th hypothesis, $i = 1, \ldots, n$, the
method aggregated them by:
$$X^2 = -2 \sum_{i=1}^{n} \log p_i,$$
which follows the chi-square distribution with $2n$ degrees of freedom
under the null hypotheses.
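A minimal sketch of this combination step is given below. The function name is hypothetical. Because the degrees of freedom (2n) are always even, the chi-square upper-tail probability has a closed form, so no statistics library is needed; p-values must be strictly positive.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (X2, combined_p), where X2 = -2 * sum(log p_i) and combined_p
    is the upper-tail probability of a chi-square with 2n degrees of freedom.
    """
    n = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2n:
    # P(X > x) = exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return statistic, math.exp(-half) * total
```

For example, a pathway with p = 0.05 in each of two studies yields a combined p-value of roughly 0.0175, which shows how consistent moderate evidence across studies is strengthened.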
GSEA was used as the pathway-level enrichment algorithm. In brief, the
algorithm ranked all genes in data based on t-statistics and compared
them to a prespecified gene set (or pathway), termed S. If top-ranked
genes (i.e., large t-statistics) had many overlaps with S, such that
the enrichment score increased, then S was regarded as an active
pathway. The Homo sapiens (human) [MFN] (combined KEGG, BiGG, and
Edinburgh) was used as the pathway library for analysis. A pathway with
a combined p-value of < 0.05 was considered statistically significant.
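The running-sum idea behind the enrichment score can be sketched as follows. This is a generic weighted Kolmogorov-Smirnov-like statistic in the style of the original GSEA description, not the implementation used in the study; the function name and the weight parameter are assumptions, and the permutation step that turns the score into a p-value is omitted.

```python
def enrichment_score(ranked, feature_set, weight=1.0):
    """Running-sum enrichment score for one pathway S.

    ranked: list of (feature_id, statistic) pairs sorted by statistic,
            from largest to smallest.
    feature_set: set of feature ids that belong to the pathway.
    """
    hits = [
        abs(stat) ** weight if fid in feature_set else 0.0
        for fid, stat in ranked
    ]
    hit_total = sum(hits)
    n_miss = sum(1 for fid, _ in ranked if fid not in feature_set)
    if hit_total == 0.0 or n_miss == 0:
        raise ValueError("pathway must contain some, but not all, ranked features")
    running, best = 0.0, 0.0
    for (fid, _), h in zip(ranked, hits):
        if fid in feature_set:
            running += h / hit_total   # step up at a pathway member
        else:
            running -= 1.0 / n_miss    # step down otherwise
        if abs(running) > abs(best):
            best = running             # keep the maximum deviation from zero
    return best
```

A strongly positive score indicates that pathway members cluster at the top of the ranked list, and a strongly negative score indicates clustering at the bottom.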
Supporting information
S1 File. Volcano plots of metabolic features and the random forest
classification of the two studies.
(DOCX)
S2 File. Supplementary method and materials and data.
(XLSX)
Data Availability
All relevant data are within the manuscript and its [110]Supporting
information files.
Funding Statement
This research was supported in part by the Bio & Medical Technology
Development Program of the National Research Foundation (NRF) funded by
the Korean government (MSIT) (No. 2019M3E5D1A01068994), and the
National Research Foundation of Korea (NRF) grant funded by the Korean
government (MSIT) (grant No. 2018R1A5A2021242). The funders did not
influence study design, data collection, data analysis and
interpretation, and the manuscript's content.
References