Abstract
Objective
Human blood metabolites are influenced by a number of lifestyle and
environmental factors. The identification of these factors and the
proper quantification of their relevance provide insights into human
biological and metabolic disease processes, are key to the standardized
translation of metabolite biomarkers into clinical applications, and are
a prerequisite for the comparability of data between studies. However,
so far only limited data exist from large, well-phenotyped human
cohorts, and current analysis methods do not fully account for the
characteristics of these data. The primary aim of this study was to
identify, quantify, and compare the impact of a comprehensive set of
clinical and lifestyle related factors on metabolite levels in three
large human cohorts. To achieve this goal, we improved current
methodology by developing a principled analysis approach that can be
translated to other cohorts and metabolite panels.
Methods
Sixty-three metabolites (amino acids, acylcarnitines) were quantified by
liquid chromatography tandem mass spectrometry in three cohorts (total
N = 16,222). Supported by a simulation study evaluating various
analytical approaches, we developed an analysis pipeline including
preprocessing, identification, and quantification of factors affecting
metabolite levels. We comprehensively identified uni- and multivariable
metabolite associations considering 29 environmental and clinical
factors and performed metabolic pathway enrichment and network
analyses.
Results
Inverse normal transformation of batch-corrected, outlier-removed
metabolite levels followed by linear regression analysis proved to be
the method best suited to these metabolite data. Association analyses
revealed numerous significant uni- and multivariable associations.
Fifteen of the 29 analyzed factors explained >1% of the variance of at
least one metabolite. The strongest factors were application of sex
hormones, reticulocytes, waist-to-hip ratio, sex, hematocrit, and age.
Effect sizes of the factors were comparable across studies.
Conclusions
We introduced a principled approach for the analysis of MS data that
allows the identification and quantification of the effects of clinical
and lifestyle factors on metabolite levels. We detected a number of
known and novel associations, broadening our understanding of the
regulation of the human metabolome. The large heterogeneity observed
between cohorts could almost completely be explained by differences in
the distribution of influencing factors, emphasizing the necessity of
proper confounder analysis when interpreting metabolite associations.
Keywords: Amino acids, Acylcarnitines, Metabolomics, Clinical factors,
Lifestyle factors, Network analysis
Abbreviations: AA, amino acid; AC, acylcarnitine; BMI, body mass index;
T2D, diabetes mellitus type 2; ATC, anatomical therapeutic chemical
(code for medication); LDL, low-density lipoprotein; HDL, high-density
lipoprotein; INT-LR, inverse normal transformation (INT) followed by
linear regression (LR); WHR, waist-to-hip ratio; MS, liquid
chromatography-mass spectrometry; BP, blood pressure
Graphical abstract
Highlights
* Amino acids and acylcarnitines analyzed in three studies with
  >16,000 individuals.
* Development of a generic and adaptable bioinformatics workflow.
* Analysis of the impact of 29 clinical and lifestyle factors on
  blood metabolites.
* Network analysis of factors and metabolites.
* Comparison of results between studies.
1. Introduction
Targeted, high-throughput metabolomics using liquid chromatography-mass
spectrometry (MS) is increasingly gaining momentum in epidemiology.
Important fields of investigation are the molecular basis of
metabolism-related phenotypes and diseases and the study of biomarkers
for diagnostic and prognostic purposes [1], [2], [3], [4].
Furthermore, analyzing metabolomic features in relation to other
molecular layers of the organism, e.g. the genome and transcriptome,
is a promising approach to extend our knowledge of regulatory pathways
and associated pathomechanisms [5], [6], [7].
Proper identification of factors affecting metabolite levels across
multiple studies is highly relevant for the standardized translation of
metabolite biomarkers into clinical applications and for understanding
possible confounders of disease associations. However, only limited
data exist regarding the type, number, and relevance of possible
influencing factors. Furthermore, currently applied analysis methods do
not fully account for the characteristics of MS data. One such
characteristic is zero inflation (a considerable proportion of
measurements below the detection limit), for which few guidelines
exist. Many studies simply exclude these data, which may result in
biased estimates and conclusions.
In this study, we investigated the effects of 29 clinical and lifestyle
related factors on metabolite levels measured by MS in dried whole
blood in three large human studies with different designs, comprising a
total of 16,222 subjects. We developed a generic and adaptable workflow
and made it publicly available so that it can be used for other cohorts
and metabolite panels. We interpreted the discovered associations
biologically by applying pathway-based methods and compared their
strength across studies.
2. Methods
Study design and the flow of our analyses are shown in Supplementary
Figure 1.
2.1. Study characteristics
Three different studies are investigated in the present work:
2.1.1. LIFE-Adult
LIFE-Adult is a population-based study of 10,000 randomly selected
individuals from the city of Leipzig, Germany [8]. Individuals were
phenotyped for several lifestyle diseases and corresponding
lifestyle-associated risk factors. Metabolite and clinical/lifestyle
data were available for 9,481 participants. Blood samples were
collected after an overnight fast.
2.1.2. LIFE-Heart
LIFE-Heart is an observational study of 7,000 patients with suspected
or confirmed coronary artery disease recruited at the Heart Center
Leipzig, Germany (ClinicalTrials.gov No NCT00497887 [9]). Patients
originate mainly from Leipzig and the surrounding area. Combined
metabolite, clinical, and lifestyle data were available for 5,767
patients. Patients were not required to be in a fasting state.
2.1.3. Sorbs study
The Sorbs study is a convenience sample of individuals recruited from
the self-contained population of the Sorbs, an ethnic minority of
Slavic origin residing in the Upper Lusatia region of Eastern Saxony
[10], [11]. Metabolite and clinical/lifestyle data were available for
974 participants. Blood was also collected after an overnight fast.
All studies conform to the ethical standards of the Declaration of
Helsinki and were approved by the ethics committee of the University of
Leipzig (LIFE-Adult: Reg. No 263-2009-14122009, LIFE-Heart: Reg. No
276e2005, Sorbs: Reg. No: 088–2005). Written informed consent was
obtained from all participants.
2.2. Factors studied in relation to blood metabolite levels
We selected a number of parameters that are suspected to affect whole
blood metabolite levels. First, blood composition can be expected to
affect metabolite levels measured in dried whole blood. We therefore
considered hematocrit, hemoglobin, erythrocytes, reticulocytes,
platelets, leucocytes, neutrophils, lymphocytes, monocytes, basophils,
and eosinophils.
Second, we included covariates previously applied in metabolome
association studies. The most frequently considered factors were age,
sex, log-BMI, and smoking status [6], [12], [13], [14], [15], [16],
and, to a lesser extent, type 2 diabetes (T2D) and application of sex
hormones [6], [7], [15], [17].
Third, we included waist-to-hip ratio (WHR) [18], systolic and
diastolic blood pressure (BP) [12], [13], and pulse pressure, defined
as the difference between systolic and diastolic BP.
Additionally, we considered parameters of lipid metabolism (total, LDL-
and HDL-cholesterol), as there is a well-known relation to certain AAs
[13], [14]. Regarding medication, we considered the effects of statins
and other lipid modifying agents (Anatomical Therapeutic Chemical (ATC)
classification category C10) and of sex hormones and modulators of the
reproductive system (ATC G03). Diabetes status was defined as
self-reported intake of T2D-specific medication (ATC A10), a
self-reported diagnosis of T2D, or a measured HbA1c level >6.5%.
Fasting hours were available in LIFE-Adult and LIFE-Heart; in the Sorbs
study, participants were required to have fasted for >8 h.
The distribution of the considered clinical and lifestyle parameters in
the three cohorts is presented in Table 1.
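The two derived factors defined above, pulse pressure and diabetes status, can be written down directly. The following Python sketch is a minimal illustration rather than the authors' R code; it assumes that each participant's medication is available as a list of ATC code strings and that a missing HbA1c value simply does not count towards the definition. The function names are hypothetical.

```python
from typing import List, Optional


def pulse_pressure(systolic: float, diastolic: float) -> float:
    """Pulse pressure: systolic minus diastolic blood pressure (same units)."""
    return systolic - diastolic


def diabetes_status(atc_codes: List[str],
                    self_reported_t2d: bool,
                    hba1c_percent: Optional[float]) -> bool:
    """Diabetes status as the logical OR of the three criteria in the text.

    Assumed (hypothetical) data layout: `atc_codes` holds the ATC codes of all
    self-reported medications; a missing HbA1c (None) does not count.
    """
    # Criterion 1: any medication in ATC group A10 (drugs used in diabetes).
    takes_a10 = any(code.strip().upper().startswith("A10") for code in atc_codes)
    # Criterion 3: HbA1c strictly above 6.5 %.
    high_hba1c = hba1c_percent is not None and hba1c_percent > 6.5
    return takes_a10 or self_reported_t2d or high_hba1c


# Example participants (illustrative values only).
print(pulse_pressure(127, 75))                    # 52
print(diabetes_status(["A10BA02"], False, None))  # True: metformin is coded A10BA02
print(diabetes_status(["C10AA05"], False, 6.5))   # False: 6.5 is not > 6.5
```

Encoding the definition as a plain OR keeps each criterion visible, so a sensitivity analysis that drops one of them only needs one line changed.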
Table 1.
Subject characteristics of the three cohorts. For continuous variables,
the median and interquartile range [IQR] are shown. For binary
variables, counts and percentages are provided.
| Characteristic | LIFE-Adult | LIFE-Heart | Sorbs |
|---|---|---|---|
| Area of collection | Leipzig, Germany | Leipzig, Germany | Upper Lusatia |
| N | 9481 | 5767 | 974 |
| Sex (female/male) | 4952 (52.2%) | 1712 (29.7%) | 574 (58.9%) |
| Age (years) | 57.91 [47.7–68.2] | 63.11 [54.4–71.7] | 48.7 [35.6–60.9] |
| WHR | 0.93 [0.863–0.994] | 0.98 [0.909–1.04] | 0.87 [0.804–0.949] |
| BMI (kg/m^2) | 26.58 [23.9–29.9] | 28.41 [25.7–31.8] | 26.5 [23.3–29.7] |
| Fasting (hours) | 12 [11–14] | 3 [1.67–12.3] | >8 |
| Lipid modifying agents (yes/no) | 1272 (13.4%) | 2066 (35.8%) | 176 (18.1%) |
| Sex hormones (yes/no) | 751 (7.9%) | 52 (0.9%) | 111 (11.4%) |
| Diabetes status (yes/no) | 1090 (11.5%) | 1720 (29.8%) | 86 (8.8%) |
| HbA1c (%) | 5.32 [5.08–5.59] | 5.7 [5.38–6.18] | 5.4 [5.1–5.7] |
| Self-reported diabetes (yes/no) | 996 (10.5%) | 1547 (26.8%) | 71 (7.3%) |
| Diabetes medication (yes/no) | 840 (8.9%) | 1258 (21.8%) | 57 (5.9%) |
| Smoking status (current/previous/never) | 2034 (21.5%) / 2706 (28.5%) / 4483 (47.3%) | 1581 (27.4%) / 2108 (36.6%) / 2063 (35.8%) | 150 (15.4%) / 195 (20%) / 616 (63.2%) |
| Blood pressure, systolic (mmHg) | 127 [117–138] | 136 [125–150] | 132 [121–145] |
| Blood pressure, diastolic (mmHg) | 75 [68.5–81.5] | 83.5 [76–90.5] | 80 [73–87] |
| Pulse pressure (mmHg) | 51 [44–60] | 53 [44–63] | 52 [44–61] |
| Cholesterol (mmol/l) | 5.52 [4.85–6.26] | 5.18 [4.4–6.01] | 5.25 [4.63–5.94] |
| LDL-Cholesterol (mmol/l) | 3.45 [2.84–4.11] | 3.15 [2.48–3.87] | 3.32 [2.71–3.98] |
| HDL-Cholesterol (mmol/l) | 1.57 [1.28–1.9] | 1.22 [1.01–1.48] | 1.57 [1.33–1.89] |
| Blood hemoglobin (mmol/l) | 14 [13.2–15] | 14.3 [13.2–15.3] | 8.8 [8.3–9.3] |
| Erythrocytes (10^12/l) | 4.66 [4.38–4.94] | 4.67 [4.34–4.97] | 4.73 [4.47–4.98] |
| Reticulocytes (per 1000) | 12.1 [9.6–14.8] | 12.9 [10.5–16.1] | 10.6 [8.4–13] |
| Hematocrit (%) | 41 [39.2–43.6] | 42 [39–44] | 42 [39.2–43.8] |
| Platelets (10^9/l) | 237 [204–275] | 230 [194–271] | 229 [201–263] |
| Leucocytes (10^9/l) | 5.94 [5–7.1] | 7.9 [6.4–9.9] | 5.25 [4.4–6.23] |
| Neutrophils (%) | 57.6 [51.9–63.2] | 66.5 [59.9–72.8] | 54.65 [48.7–60.5] |
| Lymphocytes (%) | 30.2 [25.1–35.5] | 22.3 [16.8–28.2] | 33.3 [27.9–38.6] |
| Monocytes (%) | 8 [6.8–9.4] | 8.5 [7.1–10.1] | 8.1 [6.9–9.5] |
| Basophils (%) | 0.6 [0.4–0.8] | 0.3 [0.2–0.5] | 0.03 [0.02–0.04] |
| Eosinophils (%) | 2.5 [1.6–3.6] | 1.4 [0.7–2.5] | 0.14 [0.09–0.21] |
2.3. Metabolite measurement
In LIFE-Adult and LIFE-Heart, 40 μl of EDTA whole blood were spotted on
filter paper WS 903 (Schleicher and Schüll, Germany). In the Sorbs
study, 40 μl of the plasma-free cell suspension obtained after
centrifugation at 3500 ×g for 10 min was spotted.
Spots were dried for 3 h and stored at −80 °C until analysis. To
prepare samples for tandem mass spectrometric analysis, blood spots
3.0 mm in diameter (corresponding to 3 μl of blood) were punched out
and extracted with methanol containing isotope-labeled internal
standards. After butylation, sample derivatives were analyzed by flow
injection analysis with an API 2000 tandem mass spectrometer (Applied
Biosystems, Germany) in 96-well plates. Each plate included two quality
control samples, from which inter-assay coefficients of variation were
estimated. A detailed description of sample preparation and the
measurement method can be found elsewhere [19], [20], [21].
In total, 63 metabolites (27 amino acids (AAs), 34 acylcarnitines
(ACs), free carnitine (C0), and the sum of total ACs; Supplementary
Table 1) were quantified using the software ChemoView 1.4.2 (Applied
Biosystems, Germany).
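As an illustration of the quality-control step, the inter-assay coefficient of variation of a control sample is its standard deviation across plates divided by its mean. The sketch below assumes one measurement per plate for a given control sample and metabolite and uses the sample SD (n − 1 denominator); the function names and the concentrations are hypothetical and not taken from the study.

```python
import statistics
from typing import Dict, List


def inter_assay_cv(per_plate_values: List[float]) -> float:
    """Inter-assay CV (%) of one QC sample across plates: sample SD / mean."""
    if len(per_plate_values) < 2:
        raise ValueError("need measurements from at least two plates")
    mean = statistics.mean(per_plate_values)
    sd = statistics.stdev(per_plate_values)  # n - 1 denominator
    return 100.0 * sd / mean


def cv_table(qc: Dict[str, List[float]]) -> Dict[str, float]:
    """Apply inter_assay_cv to each QC sample (e.g. the two controls per plate)."""
    return {name: inter_assay_cv(values) for name, values in qc.items()}


# Hypothetical concentrations of one metabolite in two QC samples on five plates.
qc = {
    "QC_low": [10.0, 12.0, 11.0, 9.0, 13.0],
    "QC_high": [100.0, 100.0, 100.0, 100.0, 100.0],
}
print(cv_table(qc))
```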
2.4. Statistical analysis of the three cohorts
Metabolites were preprocessed prior to analysis. To stabilize the
regression analysis, outliers were removed cohort-wise by applying a
cutoff of mean + 5 × SD of the log-transformed data; zero values were
excluded from this calculation. Outlier removal discarded at most 0.3%
of measurements per metabolite and cohort. The remaining metabolite
data were inverse-normal-transformed. Effects of known technical
batches (e.g. analysis plate ID) were removed by the non-parametric
empirical Bayes method implemented in the function ‘ComBat’ [22] of the
R package ‘sva’ [23]. We used the plate ID of the mass spectrometer
sample plate (96-well plates including two analytical controls) as the
batch variable, resulting in 71, 68, and 15 batches for LIFE-Adult,
LIFE-Heart, and the Sorbs, respectively. Since the ‘ComBat’ procedure
requires complete data, missing values were mean-imputed using
within-batch data, or all data when a metabolite was completely missing
in a batch. After batch correction, imputed data points were set to
missing again. For asparagine and cis-11,14,17-eicosatrienoic acid
methyl ester (C20:3) in LIFE-Adult and LIFE-Heart, batch effects were
instead removed by residualization via a linear model because of small
batch variance.
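The outlier rule and the impute, correct, re-mask wrapper around the batch correction can be sketched as follows. This is a simplified Python illustration, not the published R pipeline: ComBat itself is not reimplemented, so the correction step is passed in as a function, and the example plugs in plain residualization on batch (the fallback described above for asparagine and C20:3). Adding the grand mean back after residualization is an assumption made here to keep values on their original scale; all function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Value = Optional[float]  # None marks a missing measurement


def remove_outliers(values: List[Value], k: float = 5.0) -> List[Value]:
    """Set values above mean + k * SD of log(non-zero values) to missing.

    Zeros (below the detection limit) are ignored when computing the cutoff
    and are kept unchanged. Only an upper cutoff is applied.
    """
    logs = [math.log(v) for v in values if v is not None and v > 0]
    cutoff = statistics.mean(logs) + k * statistics.stdev(logs)
    return [None if v is not None and v > 0 and math.log(v) > cutoff else v
            for v in values]


def residualize_on_batch(values: List[float], batches: List[str]) -> List[float]:
    """Residuals of a linear model with batch as a factor, plus the grand mean."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    batch_mean = {b: statistics.mean(vs) for b, vs in groups.items()}
    grand_mean = statistics.mean(values)
    return [v - batch_mean[b] + grand_mean for v, b in zip(values, batches)]


def correct_with_temporary_imputation(
        values: List[Value], batches: List[str],
        correct: Callable[[List[float], List[str]], List[float]]) -> List[Value]:
    """Mean-impute missing values, batch-correct, then set them missing again."""
    observed: Dict[str, List[float]] = defaultdict(list)
    for v, b in zip(values, batches):
        if v is not None:
            observed[b].append(v)
    overall = statistics.mean([v for v in values if v is not None])
    # Within-batch mean, or the overall mean if the batch has no observed value.
    filled = [v if v is not None
              else (statistics.mean(observed[b]) if observed[b] else overall)
              for v, b in zip(values, batches)]
    corrected = correct(filled, batches)
    return [None if v is None else c for v, c in zip(values, corrected)]


vals = [1.0, 2.0, None, 11.0, 12.0, 13.0]
plates = ["p1", "p1", "p1", "p2", "p2", "p2"]
print(correct_with_temporary_imputation(vals, plates, residualize_on_batch))
# [6.25, 7.25, None, 5.75, 6.75, 7.75]
```

Passing the correction as a callable keeps the imputation bookkeeping independent of the batch method, so a ComBat call (for example via an R bridge) could replace `residualize_on_batch` without touching the masking logic.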
Following batch correction, relatedness among Sorbs subjects was
accounted for as described elsewhere [7], i.e. by fitting a polygenic
(linear mixed) model as implemented in the ‘polygenic’ function of the
‘GenABEL’ package [24], using a kinship matrix estimated from SNP data
[25].
Prior to association analysis, all continuous clinical and lifestyle
parameters were mean-centered and scaled to unit standard deviation
(SD). For univariable association analysis, the inverse-normal-
transformed, batch-adjusted metabolites were regressed on each clinical
or lifestyle parameter separately by linear regression. For
multivariable analysis, correlated factors were pruned to avoid
collinearity and to ease interpretation (default threshold: Pearson's
|r| > 0.75 in any cohort [26]). The correlation structure between
factors is shown in Supplementary Figures 3–5. Among correlated
factors, we retained the one more often evaluated in clinical practice:
diabetes status over diabetes medication and anamnestic diabetes,
hematocrit over blood hemoglobin and erythrocytes, LDL-cholesterol over
total cholesterol, systolic BP over pulse pressure, and neutrophils
over lymphocytes. To account for multiple testing of all metabolites
and factors, we applied a Bonferroni correction [27] in a hierarchical
way, treating each tested factor as a family of hypotheses regarding
metabolite associations [28], [29].
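The pruning rule can be read as a greedy pass over the factors in order of clinical preference: a factor is kept unless its absolute Pearson correlation with an already kept factor exceeds 0.75 in any cohort. The Python sketch below is one possible implementation under that reading; it assumes complete data per cohort (no pairwise handling of missing values), and the function names and example values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_factors(cohorts: Dict[str, Dict[str, List[float]]],
                  priority: List[str], threshold: float = 0.75) -> List[str]:
    """Keep factors in priority order, dropping any factor whose |r| with an
    already kept factor exceeds `threshold` in at least one cohort."""
    kept: List[str] = []
    for factor in priority:
        collinear = any(
            abs(pearson_r(data[factor], data[other])) > threshold
            for data in cohorts.values()
            for other in kept
            if factor in data and other in data
        )
        if not collinear:
            kept.append(factor)
    return kept


# Hypothetical data: hemoglobin tracks hematocrit closely, age does not.
cohorts = {
    "cohort_1": {
        "hematocrit": [40, 42, 44, 46, 48],
        "hemoglobin": [13.5, 14.0, 14.8, 15.5, 16.2],
        "age": [50, 30, 70, 40, 60],
    }
}
print(prune_factors(cohorts, ["hematocrit", "hemoglobin", "age"]))
```

Putting the preferred factor first in `priority` reproduces the tie-breaking rule in the text, e.g. hematocrit ahead of hemoglobin and erythrocytes.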
Effect sizes of metabolite associations were assessed by the variance
explained (r^2) by the considered factor in a univariable model, or as
the partial explained variance in a multivariable regression model. For
every factor, we quantified the difference in the distribution of r^2
between cohorts by the Friedman test followed by Benjamini-Hochberg
correction for multiple testing. When two distributions were compared,
the Wilcoxon signed-rank test was used. R scripts of our analyses are
available at https://github.com/cfbeuchel/Metabolite-Investigator.
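For a single factor, the cohort comparison treats metabolites as blocks and the three cohorts as treatments: r^2 values are ranked within each metabolite, and the Friedman statistic Q = 12/(n·k·(k+1))·ΣR_j² − 3·n·(k+1) is referred to a chi-square distribution with k − 1 = 2 degrees of freedom, whose survival function is exactly exp(−Q/2). The Python sketch below assumes this layout, applies no tie correction, and then adjusts the per-factor p-values with the Benjamini-Hochberg step-up rule. It is an illustration only; R's `friedman.test` and `p.adjust(method = "BH")` are the standard implementations, and this text does not state which routines the published scripts call.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_three_cohorts(blocks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Friedman statistic and chi-square p-value for k = 3 cohorts.

    Each block (row) holds one metabolite's r^2 in the three cohorts.
    No tie correction; with df = 2 the chi-square survival function is exp(-Q/2).
    """
    k, n = 3, len(blocks)
    rank_sums = [0.0] * k
    for row in blocks:
        if len(row) != k:
            raise ValueError("each block needs exactly three values")
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, min(1.0, math.exp(-q / 2.0))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical r^2 of one factor for five metabolites (rows) in three cohorts.
blocks = [[0.01, 0.02, 0.03]] * 5
q, p = friedman_three_cohorts(blocks)  # q is about 10, p about exp(-5)
print(q, p, benjamini_hochberg([p, 0.04, 0.03, 0.2]))
```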
For every factor, we performed a pathway analysis considering all
metabolites for which the factor explains at least 1% of the
metabolite's variance in at least one cohort. Enrichment was tested
with MetaboAnalyst [30], using the intersection of all representable
metabolites (M = 58) and KEGG metabolic pathways as background.
Bipartite networks connecting metabolite nodes and factor nodes, with
edges representing the partial explained variance, were created using
the R package ‘visNetwork’ [31].
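Over-representation analysis of the kind offered by MetaboAnalyst is commonly based on a one-sided hypergeometric test: given the background of M = 58 representable metabolites, how surprising is it that at least `overlap` of the factor-associated metabolites fall into a given pathway? The Python sketch below computes that tail probability exactly. It is not MetaboAnalyst's code, and the pathway size and counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, background: int,
                           pathway_size: int, selected: int) -> float:
    """One-sided over-representation p-value P(X >= overlap).

    X ~ Hypergeometric: `selected` metabolites drawn without replacement from a
    background of `background` metabolites, `pathway_size` of which belong to
    the pathway.
    """
    upper = min(pathway_size, selected)
    hits = sum(comb(pathway_size, x) * comb(background - pathway_size, selected - x)
               for x in range(overlap, upper + 1))
    return hits / comb(background, selected)


# Hypothetical: background M = 58 as in the text, a pathway with 4 background
# metabolites, 10 metabolites associated with the factor, 3 of them in the pathway.
print(hypergeom_enrichment_p(overlap=3, background=58, pathway_size=4, selected=10))
```

Using exact integer binomial coefficients avoids the rounding issues of summing small floating-point probabilities for small backgrounds such as this one.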
2.5. Simulation study to justify the analysis approach
In our analysis approach, we applied inverse normal transformation of
the metabolite data combined with linear regression analysis (INT-LR
approach); i.e. measurements below the detection limit are not
removed. We conducted a simulation study to compare this approach with
possible alternatives. Specifically, we simulated data mirroring
typical issues of MS data, including zero inflation, skewness (by
assuming a log-normal distribution), and batch effects, and imposed
effects of different sizes of the factors on the simulated metabolite
levels. In the preprocessing step, we considered different data
transformation methods: area sinus hyperbolicus (asinh), inverse
normal transformation, dichotomization (zero vs. non-zero
measurements), categorization (quantile-based or equally spaced
range-based categories), and ranking. Accordingly, hypotheses were
tested by univariable linear regression, binary logistic regression,
proportional odds logistic regression, or Spearman's rank correlation,
depending on the chosen transformation. Performance was rated by the
ability of each method to detect the imposed effect of a factor on a
metabolite and to control the false positive rate at the expected 5%
level. A schematic workflow of the simulations is presented in
Supplementary Figure 2, and a detailed description of the simulations
can be found in the Supplementary Methods.
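A stripped-down version of such a simulation for the INT-LR approach alone is sketched below in Python. It does not reproduce the authors' design (see the Supplementary Methods); several choices here are assumptions: Blom's offset c = 3/8 for the rank-based inverse normal transformation (the text does not specify the offset), zero inflation generated by left-censoring log-normal values at a detection limit, no batch effects, and a normal approximation to the t distribution for the slope test. Ties, including all censored zeros, share one average rank and therefore one transformed value.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence

_STD_NORMAL = NormalDist()


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def inverse_normal_transform(values: Sequence[float], c: float = 3 / 8) -> List[float]:
    """Rank-based INT, Phi^-1((r - c) / (n - 2c + 1)); c = 3/8 is Blom's offset."""
    n = len(values)
    return [_STD_NORMAL.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


def slope_p_value(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided p-value of the OLS slope of y on x (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    residual_var = (syy - slope * sxy) / (n - 2)
    t = slope / math.sqrt(residual_var / sxx)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(t)))


def rejection_rate(beta: float, zero_fraction: float, n: int = 300,
                   reps: int = 400, alpha: float = 0.05, seed: int = 1) -> float:
    """Share of replicates with p < alpha for INT-LR on zero-inflated log-normal data."""
    rng = random.Random(seed)
    # Detection limit: approximate zero_fraction quantile of exp(N(0, 1)).
    lod = math.exp(_STD_NORMAL.inv_cdf(zero_fraction)) if zero_fraction > 0 else 0.0
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [math.exp(beta * xi + rng.gauss(0.0, 1.0)) for xi in x]
        y = [0.0 if v < lod else v for v in y]  # left-censor below the detection limit
        if slope_p_value(x, inverse_normal_transform(y)) < alpha:
            hits += 1
    return hits / reps


print(rejection_rate(beta=0.0, zero_fraction=0.2))  # false positive rate, near 0.05
print(rejection_rate(beta=0.3, zero_fraction=0.2))  # power for a sizeable effect
```

Because zeros are kept and tied rather than removed, the sample size and the information carried by "below detection limit" are preserved, which is the motivation for INT-LR given in the text.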
3. Results
3.1. Justification of metabolite analysis method
To evaluate its performance, we compared our INT-LR approach in a
simulation study with three other analysis strategies (rank
correlation, binary and ordinal logistic regression) and three other
data transformations (categorization, dichotomization, and inverse
hyperbolic sine transformation; see Supplementary Figure 2 for the
design of the simulation study).
We found that INT-LR controlled false positives well: the proportion
of nominally significant associations was close to 5% in all scenarios
without an effect (β = 0; Figure 1 and Supplementary Table 5).
Furthermore, no other approach had better power to identify true
associations of factors with metabolite levels, especially in scenarios
with high zero inflation (see scenarios with β > 0 in Figure 1 and
Supplementary Table 5). As expected, increased zero inflation resulted
in a smaller observed effect size relative to the true effect size
(Supplementary Figure 7).
Figure 1.
Comparison of the INT-LR method with alternatives – selected results of
the simulation study: Shown is the distribution of p-values from the
simulation study comparing the INT-LR approach (linear regression after
inverse normal transformation) with other methodological approaches
(binary/ordinal logistic regression for binary/categorical data,
asinh transformation followed by linear regression, and Spearman's
correlation coefficient for rank data). Results are presented for nine
simulated scenarios differing in the simulated effect β (no effect:
β = 0, small effect: β = 0.02, large effect: β = 0.1) and in the
proportion of measurements below the detection limit (0%, 20%, and
80%). The percentage of nominally significant hypotheses (p < 0.05,
based on 1000 replications) is shown. For scenarios with β = 0, this
proportion should be 0.05 (false positive control); for scenarios with
β > 0, it should be as large as possible (good power). The binary model
is only applicable when zeros are present. Overall, the INT-LR method
performed best. Results of additional scenarios are reported in
Supplementary Figure 6 and Supplementary Table 5.
3.2. Identification and characterization of clinical and lifestyle related factors affecting metabolite levels
We applied the INT-LR approach to determine the effect of 29 individual
clinical and lifestyle related factors on metabolite levels in our
studies.
3.2.1. Univariate analysis
We observed statistically significant associations of all 29 analyzed
parameters with at least one metabolite (p[adjusted] ≤ 0.05 after
correction for multiple testing; Figure 2 and Supplementary Table 2).
The highest explained variances overall were found for sex on C0 and
total ACs (Sorbs; r^2 = 0.25 for both), on leucine/isoleucine
(LIFE-Adult; r^2 = 0.22), and on C5OH + HMG (Sorbs; r^2 = 0.21),
followed by the effect of WHR on leucine/isoleucine (LIFE-Adult;
r^2 = 0.21).
Figure 2.
Heat map of univariable associations between metabolite levels and
clinical or lifestyle-related factors. Explained variance by the single
factor is color-coded (1 ≙ 100%) with direction of effect
(red = positive correlation, blue = negative correlation). Maximum
values across the three cohorts are presented. Asterisks indicate
associations significant after adjusting for multiple testing. Rows and
columns are ordered according to a hierarchical clustering.
Cohort-specific plots can be found in Supplementary Figures 8–10.
The five factors affecting the most metabolites (p[adjusted] ≤ 0.05 and
explained variance ≥1%) were WHR, sex, application of sex hormones,
age, and hematocrit, influencing 44, 41, 40, 38, and 36 metabolites,
respectively. The factors affecting the fewest metabolites at the same
thresholds were smoking status (10), eosinophils (9), cholesterol (8),
fasting hours (6), and basophils (1).
To evaluate how strongly the metabolites are affected by the considered
factors, we averaged the corresponding explained variances over all
factors and cohorts. The five most strongly affected metabolites are
leucine/isoleucine (mean explained variance 3.65%), valine (3.42%),
propionylcarnitine (2.70%), hydroxyproline (2.69%), and total ACs
(2.46%); the metabolites with the lowest explained variance (<0.14%)
comprise nine ACs and the dipeptide carnosine. Of note, these are
low-abundance metabolites with at least 40% of values below the
detection limit in at least one of the cohorts.
3.2.2. Independent effects of clinical and lifestyle related parameters on metabolite levels
Next, we were interested in the variance independently explained by
the clinical and lifestyle related factors. Therefore, we performed
multivariable linear regression analysis considering all parameters
simultaneously for each study. This requires eliminating correlated
parameters to avoid collinearity (see Methods), leaving a total of 22
parameters. Again, all parameters were significantly associated with
at least one metabolite after adjusting for multiple testing. However,
the maximum partial explained variance was approximately halved
compared to univariable association analysis (Figure 3, Supplementary
Table 3).
Figure 3.
Heat map of multivariable association results between clinical and
lifestyle-related factors and metabolite levels. Partial explained
variance (1 ≙ 100%) is color-coded according to the direction of the
effect (positive = red, negative = blue). Maximum values across the
three cohorts are presented. Rows and columns are ordered according to
a hierarchical clustering. To avoid collinearity, strongly correlated
factors were pruned (see methods). Cohort-specific plots can be found
in Supplementary Figures 11–13.
The largest partial explained variances were found for sex hormones on
total ACs (r^2 = 0.13), threonine (r^2 = 0.13), citrulline
(r^2 = 0.12), C0 (r^2 = 0.12), and aminobutyric acid (r^2 = 0.11) in
the Sorbs. This closely resembles the unadjusted analysis, in which all
these associations (with the exception of threonine) were also among
the strongest effects. Application of sex hormones, reticulocytes, WHR,
sex, hematocrit, and age were relevant for the highest numbers of
metabolites, as they independently explained more than 1% of variance
for 58, 18, 14, 11, and 9 metabolites, respectively. Again, this is
similar to the univariable analysis. Conversely, leukocytes and
platelets were the least relevant parameters in multivariable analysis,
each explaining more than 1% of variance for only one metabolite
(hydroxyproline (3.3%) and pipecolic acid (3.2%), respectively).
Overall, the highest percentage of variance explained by the
multivariable models was observed for leucine/isoleucine in two studies
(adjusted-r^2 = 0.37 and 0.38 in LIFE-Adult and Sorbs, respectively).
Additionally, the adjusted-r^2 per metabolite was > 0.3 for six
metabolites: valine (adjusted-r^2 = 0.36, Sorbs), hydroxyproline
(adjusted-r^2 = 0.35, Sorbs), propionylcarnitine (adjusted-r^2 = 0.34,
Sorbs), phenylalanine (adjusted-r^2 = 0.31, Sorbs), citrulline
(adjusted-r^2 = 0.31, Sorbs), and total acylcarnitines
(adjusted-r^2 = 0.31, Sorbs) (Supplementary Table 3). AAs were more
strongly affected by the investigated factors than ACs in terms of mean
explained variance (p = 0.039), but not in terms of median explained
variance (p = 0.19, Wilcoxon test; Supplementary Figure 14).
We selected factors explaining at least 1% of variance in multivariable
analysis for at least one metabolite in one of the cohorts. Fourteen
such factors were identified, resulting in 94 factor-metabolite
relationships involving 39 metabolites. A bipartite network of these
relationships is shown in Figure 4 and interactively online at
https://cfbeuchel.shinyapps.io/interactivefig4/.
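The selection rule for the network (keep a factor-metabolite pair if its partial r^2 reaches 1% in at least one cohort, and weight the edge by the maximum over cohorts, as in the Figure 4 caption) can be sketched as follows. The published figure was drawn with the R package ‘visNetwork’; this Python sketch only builds the node and edge tables that such a tool would consume. The input layout, names, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]


def bipartite_edges(partial_r2: Dict[Tuple[str, str], Dict[str, float]],
                    threshold: float = 0.01) -> Tuple[List[str], List[str], List[Edge]]:
    """Factor-metabolite edges whose maximum partial r^2 over cohorts reaches `threshold`.

    Returns the factor nodes, the metabolite nodes and the weighted edges
    (edge weight = maximum partial r^2 across cohorts), strongest edges first.
    """
    edges: List[Edge] = []
    for (factor, metabolite), by_cohort in partial_r2.items():
        weight = max(by_cohort.values())
        if weight >= threshold:
            edges.append((factor, metabolite, weight))
    edges.sort(key=lambda e: e[2], reverse=True)
    factors = sorted({f for f, _, _ in edges})
    metabolites = sorted({m for _, m, _ in edges})
    return factors, metabolites, edges


# Hypothetical partial r^2 values (not study results).
example = {
    ("factor_A", "metabolite_1"): {"cohort_1": 0.020, "cohort_2": 0.005},
    ("factor_B", "metabolite_1"): {"cohort_1": 0.004, "cohort_2": 0.009},
    ("factor_C", "metabolite_2"): {"cohort_1": 0.011},
}
factor_nodes, metabolite_nodes, weighted_edges = bipartite_edges(example)
print(factor_nodes, metabolite_nodes, weighted_edges)
```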
Figure 4.
Bi-partite network of metabolites (yellow) and factors (blue) based on
multivariable associations explaining at least 1% of variance.
Thickness of edges corresponds to the maximum partial explained
variance over the three cohorts. An interactive version of this plot is
available at https://cfbeuchel.shinyapps.io/interactivefig4/.
To obtain further biological insights, we analyzed which metabolic
pathways are affected by the individual factors. For this purpose, we
selected the same associations as for the bipartite network analysis
and performed formal enrichment analyses with respect to the metabolic
pathways implemented in ‘MetaboAnalyst’ (Supplementary Table 4). The
strongest enrichment was observed for WHR. Among others, WHR is
associated with carnitine, acetylcarnitine, and propionylcarnitine,
resulting in an over-representation of the pathway “oxidation of
branched-chain fatty acids” (p = 2.0 × 10^−4).
3.2.3. Comparison of cohorts
Distributions of effect sizes in the individual studies are shown in
Figure 5 for the 22 factors included in the multivariable analysis. The
effect-size distributions agree more strongly in univariable than in
multivariable analyses: in univariable analyses, 13 factors had effect
sizes >1% explained variance in all three cohorts, whereas this applies
to only five factors in multivariable analysis.
Figure 5.
Distributions of uni- and multivariable explained variances of clinical
and lifestyle-related factors and comparison between cohorts. Boxplots
show the distribution of explained variances (partial explained
variances for multivariable models) across metabolites. The dashed line
represents an exemplary r^2 cutoff (1%) to mark strong effects.
Among the 29 factors, 27 were significantly associated
(p[adjusted] ≤ 0.05) with at least one metabolite in all three studies.
The exceptions were fasting hours and diabetes medication, which are
not available in the Sorbs. We analyzed differences in the effect sizes
of our factors between cohorts by formal interaction analysis with
study as the interaction partner. Only a few such interactions were
significant, and only one explained more than 1% of the variability of
the metabolite, namely the interaction between diabetes and study for
citrulline (partial-r^2 = 0.015, p[adjusted] = 1.5 × 10^−56, Figure 6).
Further interaction effects were found for fasting hours on proline,
tyrosine, and alanine, for log-BMI on sarcosine and tyrosine, and for
sex on hydroxyproline and leucine/isoleucine. These interactions
explain between 0.2% and 0.8% of the respective metabolites' variances.
All interactions are presented in Supplementary Figure 15.
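The interaction test described in the Figure 6 caption compares a full linear model (factor, study, and factor × study terms) with a reduced model without the interaction. For Gaussian linear models fitted by maximum likelihood, the likelihood-ratio statistic is n·ln(RSS_reduced / RSS_full), referred to a chi-square distribution whose degrees of freedom equal the number of interaction terms (two for three studies), and the interaction's partial r^2 is (RSS_reduced − RSS_full) / RSS_reduced. The Python sketch below illustrates this under simplifying assumptions: a single factor, no other covariates, an untransformed outcome, and simulated, hypothetical data. The study's models instead include all 22 pruned factors and inverse-normal-transformed metabolites.

```python
import math
import random
from typing import Dict, List, Sequence


def rss_ols(X: List[List[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of an OLS fit (normal equations, Gauss-Jordan)."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * xij for b, xij in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Chi-square survival function for even df (closed form)."""
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def study_interaction_test(y: Sequence[float], factor: Sequence[float],
                           study: Sequence[int], n_studies: int = 3) -> Dict[str, float]:
    """LRT of the factor x study interaction; study is coded 0 .. n_studies - 1."""
    df = n_studies - 1  # number of interaction terms
    if df % 2:
        raise ValueError("closed-form p-value implemented for even df only")

    def design(with_interaction: bool) -> List[List[float]]:
        rows = []
        for xi, s in zip(factor, study):
            dummies = [1.0 if s == k else 0.0 for k in range(1, n_studies)]
            row = [1.0, xi] + dummies
            if with_interaction:
                row += [xi * d for d in dummies]
            rows.append(row)
        return rows

    rss_reduced = rss_ols(design(False), y)
    rss_full = rss_ols(design(True), y)
    lr = len(y) * math.log(rss_reduced / rss_full)  # -2 log likelihood ratio
    return {"partial_r2": (rss_reduced - rss_full) / rss_reduced,
            "lr": lr, "p": chi2_sf_even_df(lr, df)}


# Hypothetical data: the factor's effect differs in the third study only.
rng = random.Random(7)
study = [i % 3 for i in range(600)]
factor = [rng.gauss(0.0, 1.0) for _ in study]
slopes = [0.0, 0.0, 1.0]
y = [slopes[s] * x + 0.3 * s + rng.gauss(0.0, 1.0) for x, s in zip(factor, study)]
print(study_interaction_test(y, factor, study))
```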
Figure 6.
Heatmap of the partial-r^2 of the interaction effects between study and
each of the 22 factors on the 63 metabolites. Significance is indicated
by an asterisk and was computed via likelihood-ratio tests of
multivariable linear regression models. The full model includes main
effects for each factor and study and their interactions. It is
compared with a reduced model not containing the considered interaction
effect. Correction for multiple testing was applied by a hierarchical
Bonferroni procedure (see methods).
4. Discussion
In this study, we comprehensively analyzed the impact of 29 clinical
and lifestyle related factors on blood levels of 63 metabolites (AAs,
ACs, C0, and the sum of total ACs), measured by the same tandem mass
spectrometric method in three large cohorts over 10 years. For this
purpose, we propose a principled workflow of data preprocessing and
analysis that can be applied to other studies and metabolite panels. A
major finding is that the large heterogeneity of metabolite levels
across cohorts can almost completely be explained by the different
distributions of influencing factors rather than by differences in
their effects on metabolites, i.e. there were almost no interactions
between study and factors. We also detected a number of known and novel
associations broadening our understanding of the regulation of the
human metabolome, which we discuss in the following.
Among the 14 strongest factors in the multivariable analysis (defined
as explaining at least 1% of variance for at least one metabolite), we
could confirm several previously reported AA- and AC-affecting factors.
These included sex [32], [33], [34], [35], [36], medication with sex
hormones (e.g. contraceptives [32], [37], [38]), hematocrit [39], and
medication with lipid modifying agents (e.g. statins [40], [41]).
Additionally, our work provides new evidence on relationships for which
contradictory results have been reported. For example, rising proline
levels during prolonged fasting were reported recently, contradicting
an earlier study [42], [43]. Our results support the earlier study, as
we identified strong negative associations of proline levels with
prolonged fasting in LIFE-Adult (β̂ = −0.07, p[adjusted] = 8.7 × 10^−15)
and LIFE-Heart (β̂ = −0.22, p[adjusted] = 6.4 × 10^−108). This
observation is also in line with research linking proline catabolism
with lipid utilization during fasting [44].
We also identified a number of novel findings. Among the 33 significant
(p[adjusted] ≤ 0.05) associations with sex hormones, negative
associations with glycine (
[MATH: βˆ=−0.57
:MATH]
, p[adjusted] = 2.2 × 10^−77, LIFE-Adult) and arginine (
[MATH: βˆ=−0.20
:MATH]
, p[adjusted] = 7 × 10^−12, LIFE Adult) were observed. Such an
interaction of sex hormones with the creatine formation pathway is
plausible given the role of estrogen in the upregulation of the
l-arginine:glycine amidinotransferase [146][45], [147][46].
Additionally, the negative association of sex hormones with ornithine
($\hat{\beta} = -0.38$, p[adjusted] = 3.46 × 10^−54, LIFE-Adult) is
corroborated by animal studies linking sex hormones to increased
ornithine decarboxylase activity [148][47], [149][48], [150][49], but,
to the best of our knowledge, it has not yet been described in human
cohorts. It must be acknowledged that these associations, however
plausible, do not provide sufficient evidence of causal relationships.
Further experimental validation of promising associations is required
to unravel the underlying causal mechanisms, and the relevance of these
mechanisms for disease pathogenesis should be investigated in
specifically designed studies.
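This section reports p[adjusted] values and a significance threshold of p[adjusted] ≤ 0.05 without restating which multiple-testing correction was used. Assuming, purely for illustration, the Benjamini-Hochberg false discovery rate procedure (the actual method is given in the paper's Methods), adjusted p-values can be computed as follows; the input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four metabolite-factor associations.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p_adj <= 0.05 for p_adj in adj]
```

Here the adjusted values are approximately [0.02, 0.04, 0.04, 0.02], so all four associations pass the 0.05 threshold; a Bonferroni correction would instead multiply each raw value by m and would be stricter.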
Pathway enrichment analysis yielded plausible results, all at the
nominal (unadjusted) significance level ([151]Supplementary Table 4).
For instance, metabolites associated with reticulocyte counts
(carnitine, acetylcarnitine, propionylcarnitine) showed significant
enrichment in the metabolism of branched-chain fatty acids
(p = 0.017). This is in line with knowledge of fatty acid catabolism in
reticulocyte mitochondria [152][50], [153][51]. Moreover, the
associations of carnitine and acetylcarnitine with WHR showed a
borderline enrichment in beta oxidation of very long chain fatty acids
(p = 0.06), in line with the known role of peroxisomes in this pathway
[154][50], [155][51].
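The enrichment test itself is not named in this section. A common choice is an over-representation analysis with a one-sided hypergeometric test, sketched below under that assumption. The function name and all counts are hypothetical (in particular the pathway size K), so the example does not reproduce the reported p = 0.017.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric.

    N: metabolites measured, K: of those annotated to the pathway,
    n: metabolites associated with the factor, k: overlap of the two sets.
    """
    top = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


# Hypothetical counts: 63 metabolites on the panel, 3 associated with a
# factor, and all 3 among 5 metabolites assumed to map to the pathway.
p = enrichment_p(k=3, n=3, K=5, N=63)
```

Because the panel has only 63 metabolites, pathway gene-set sizes are small and the test is sensitive to how metabolites are annotated, which is one reason nominal results like these warrant cautious interpretation.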
Overall, we observed a stronger impact of the considered factors on AAs
than on ACs. However, since ACs show a higher rate of zero-inflation
than AAs ([156]Supplementary Table 1) and higher zero-inflation could
lead to underestimation of the explained variance ([157]Supplementary
Figure 7), we restricted a sensitivity analysis to metabolites with
less than 10% zero-inflation. For this subset, we observed no clear
difference between AAs and ACs in the mean or median explained
variance per factor (mean: p = 0.63; median: p = 0.092; Wilcoxon test,
[158]Supplementary Figure 14).
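The sensitivity analysis combines a zero-inflation filter with a Wilcoxon test. The sketch below shows both steps under stated assumptions: the 10% threshold is from the text, but the section does not say whether the test was paired, so this uses the unpaired rank-sum form with a normal approximation (no tie or continuity correction); a paired signed-rank test across factors would be the alternative. Function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def zero_fraction(values):
    """Share of exact zeros in a metabolite's measurements."""
    return sum(1 for v in values if v == 0) / len(values)


def low_zero_metabolites(data, threshold=0.10):
    """Names of metabolites whose share of zero values is below `threshold`."""
    return [name for name, vals in data.items() if zero_fraction(vals) < threshold]


def midranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for sample x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical usage: keep low-zero metabolites, then compare explained
# variances (e.g. in %) of the retained AAs versus the retained ACs.
data = {"Gly": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "C18:1": [0, 0, 1, 2, 3, 4, 5, 6, 7, 8]}
kept = low_zero_metabolites(data)
p_value = rank_sum_test([1.2, 2.5, 0.8, 3.1], [0.9, 2.2, 1.5, 2.8])
```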
When analyzing the heterogeneity of effects across our three cohorts,
we observed strong similarities: 45 of the 63 metabolites and all but
three of the factors showed significant associations in all three
studies. The smaller sample size, and thus lower statistical power, of
the Sorbs study could explain its lower number of strong associations.
The small number and small effect sizes of study-by-factor interaction
effects in a pooled analysis support the reproducibility of our
findings across cohorts and suggest the excellent between-study
comparability required for mega- or meta-analyses.
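A pooled model with study-by-factor interactions of the kind described above can be written as follows. This is a sketch under assumptions introduced here for illustration: $\tilde{y}_i$ is the inverse-normal-transformed metabolite level of participant $i$, $x_i$ the factor, one cohort serves as reference with indicators $S_{i2}, S_{i3}$ for the other two, and $\mathbf{z}_i$ collects further covariates.

```latex
\tilde{y}_i = \beta_0 + \beta_1 x_i
  + \sum_{s=2}^{3} \gamma_s S_{is}
  + \sum_{s=2}^{3} \delta_s \, x_i S_{is}
  + \boldsymbol{\theta}^{\top} \mathbf{z}_i + \varepsilon_i
```

Here $\beta_1$ is the factor effect in the reference cohort and $\beta_1 + \delta_s$ its effect in cohort $s$; few and small significant $\delta_s$ correspond to the between-study homogeneity reported above.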
The few differences in effect sizes could be explained by differences
in study design. The relevance of sex hormones was highest in the
Sorbs, in line with the younger age of this cohort and its higher
percentage of premenopausal women ([159]Table 1). Another example is
the greater importance of blood parameters in the Sorbs, in line with
the different type of blood specimen used. Whereas dried whole blood
was used in LIFE-Adult and LIFE-Heart, the Sorbs study used the cell
suspension remaining after plasma removal, which reduces the influence
of plasma-specific metabolites and provides a clearer picture of
intracellular metabolism. Accordingly, associations with intracellular
metabolic actors, especially ACs, are stronger in the cohort that used
plasma-free cell suspension as specimen, yielding the strongest
associations of all three studies ([160]Supplementary Table 3).
Finally, the stronger effect of fasting in LIFE-Heart is plausible
because a considerable percentage of patients were not in a fasting
state ([161]Table 1). Hence, for the purpose of selecting relevant
factors, we recommend performing study-specific analyses first, which
can be done efficiently with our preprocessing and analysis tool
provided online.
Funding source declaration
LIFE-Heart and LIFE-Adult are funded by the Leipzig Research Center for
Civilization Diseases (LIFE). LIFE is an organizational unit affiliated
with the Medical Faculty of the University of Leipzig. LIFE is funded
by the European Union, by the European Regional Development Fund, and
by funds of the Free State of Saxony within the framework of the
excellence initiative. Initial funding of LIFE-Heart was provided by
the Roland-Ernst Foundation.
The Sorbs study was supported by grants from the Deutsche
Forschungsgemeinschaft (DFG, German Research Foundation, project number
209933838, SFB 1052; B03, C01; SPP 1629 TO 718/2-1), from the German
Diabetes Association, and from the DHFD (Diabetes Hilfs- und
Forschungsfonds Deutschland). IFB Adiposity Diseases is supported by
the Federal Ministry of Education and Research, Germany, FKZ: 01EO1501
(AD2-060E, AD2-06E95, AD2-7123).
MS received funding from the Federal Ministry of Education and
Research, Germany, FKZ: 01EO1501.AD2-7117.
Availability of data and material
Data of the LIFE studies are available upon reasonable request from the
LIFE Research Centre for Civilization Diseases.
Authors' contributions
Study data collection: AT, MStu, ML, JT, FB, MSch.
Mass spectrometry analysis and assessments: UC, SB, JD.
Data analyses: CB, HK.
Drafting of manuscript: CB, HK, MS.
Critical revision of manuscript: AT, MStu, ML, JT, UC.
Acknowledgements