Abstract
Variation in the blood metabolome is intimately related to human
health. However, few details are known about the interplay between
genetics and the microbiome in explaining this variation on a
metabolite-by-metabolite level. Here, we perform analyses of variance
for each of the 930 blood metabolites robustly detected across a cohort
of 1,569 individuals with paired genomic and microbiome data, while
controlling for a number of relevant covariates. We find that 595 (64%)
of these blood metabolites are significantly associated with either
host genetics or the gut microbiome, with 69% of these associations
driven solely by the microbiome, 15% driven solely by genetics, and 16%
under hybrid genome-microbiome control. Additionally, interaction
effects, where a metabolite-microbe association is specific to a
particular genetic background, are quite common, albeit with modest
effect sizes. This knowledge will help to guide targeted interventions
designed to alter the composition of the human blood metabolome.
Editor summary:
Diener and Dai et al analyse blood metabolites from 1569 individuals
and identify metabolites associated with the microbiome, host genetics,
or under hybrid genetic-microbiome control
Introduction
The human blood metabolome is shaped by a combination of intrinsic and
extrinsic forces and constitutes the primary resource pool for human
metabolism. While the composition of the plasma metabolite pool is
strongly driven by diet, lifestyle, and the ecology of the gut
microbiota, the fate of individual metabolites in the blood is often
tightly regulated by host genetics ^[40]1.
Genetic variants are known to alter the human blood metabolome in
several disease-relevant contexts. For example, certain deleterious
alleles impact cholesterol levels and contribute to
hypercholesterolemia, while other well-known alleles drive
phenylalanine accumulation and lead to phenylketonuria ^[41]2,[42]3.
The majority of genetic variants associated with blood metabolite
levels affect either enzymes or solute carriers, thus directly
influencing an individual’s ability to produce, consume, secrete, or
absorb small molecules ^[43]4,[44]5. Many of the currently identified
metabolome-associated genetic variants are pleiotropic ^[45]6, which
indicates a vast interconnectedness of the metabolome across different
systems of the body.
Recent studies have identified the human gut microbiome as a major
determinant of blood metabolite variability ^[46]1,[47]7. In this prior
work, host genetics and gut microbiome composition were found to
associate with plasma metabolite levels in a largely orthogonal manner,
which is supported by the observation that, for the most part, host
genetics and the gut microbiota do not tend to associate strongly with
one another ^[48]8, even though a number of clear examples of
genome-microbiome associations have been shown ^[49]9.
Despite these broad, multivariate regression results showing a global
correspondence between genomics, the microbiome, and the metabolome,
little is known about how this maps onto individual blood metabolite
levels. While one might expect that the variation in metabolites
produced by bacteria in the human gut is mostly governed by the
microbiota and that metabolites specific to human metabolic processes
are associated more with host genetics, there are examples, as in the
case of microbe-host co-metabolites like conjugated secondary bile
acids ^[50]10, where the story becomes more complicated.
Intestinal signaling, such as the activation of FXR and TGR5 receptors
in the human gut, can regulate glucose, insulin, cholesterol, and bile
acid homeostasis ^[51]11,[52]12. Furthermore, many microbiome-derived
metabolites are modified by hepatic enzymes and converted into a
variety of conjugated compounds, such as hippurate or polyamines
^[53]13,[54]14. These multi-layer filters on microbe-host co-metabolism
make it challenging to map blood metabolites to potential microbial
precursors. For example, blood cholesterol levels are affected by host
genetic variants, but they can also be regulated through intestinal
signaling, involving the production of secondary bile acids by gut
commensals, or through gut microbial cholesterol dehydrogenases that
funnel host and dietary cholesterol into fecal coprostanol
^[55]15,[56]16. Thus, even though host genetics and the microbiome
appear to have largely orthogonal effects on the blood metabolome as a
whole, they nonetheless act on an overlapping set of metabolites,
potentially explaining independent components of the variance of these
compounds.
Additionally, even in the absence of strong heritability of human gut
commensals, there are key examples where genetics and the microbiome
may interact in particular disease conditions, such as cystic fibrosis
^[57]17. This raises the question of whether there are instances where
microbiome-metabolome associations are modulated by the genetic
background of the host in a similar manner as other gene-environment
interactions ^[58]18, where a genetic variant may augment or attenuate
an individual’s risk of developing a particular disease when exposed to
a specific environmental risk factor.
Here, we studied variability in plasma metabolite abundances in a
cohort of 1,569 healthy individuals from the U.S. We find extensive
interplay between the genetic and gut microbial determinants of
individual metabolite levels in the blood, which provides deep insights
into the microbe-host co-metabolisms that govern the composition of the
human blood metabolome.
Results
Metabolites associated with genetics and the microbiome
We performed genome-metabolite and microbiome-metabolite association
tests for a healthy cohort of 1,569 individuals with paired blood and
fecal samples taken within 30 days of each other. The cohort consisted
of individuals in the U.S., mostly from the Pacific West, that had
enrolled in a scientific wellness program of the former company,
Arivale, Inc., between 2016 and 2019. These individuals were generally
healthy, between 18 and 87 years old, predominantly female (63.5%) and
with a mean±SD body mass index (BMI) of 28.2±6.6. Metabolite abundances
across the data set were first adjusted for common confounders and the
residuals were used for further downstream analyses (927 metabolites
associated with at least one confounding variable, FDR-corrected
p<0.05, mean R^2=0.12±0.06). The cohort was split randomly into a
training cohort of 1,000 individuals and a validation cohort of 569
individuals. Microbiome-metabolite and genome-metabolite association
analyses were performed within the training cohort and the resulting
training cohort coefficients were fixed in the validation cohort models
to generate out-of-sample R^2 estimates ([59]Fig. 1A).
Figure 1. Study design and cross-sectional variances in metabolites explained
by host genetics, gut bacterial genera, or both.
Figure 1.
[60]Open in a new tab
(A) The study comprised a metabolome-genome-wide association analysis
and a metabolome-microbiome-wide association analysis (regressions on
centered log-ratio transformed bacterial genus-level abundances)
performed in the Arivale cohort. Explained variance for a specific
blood metabolite can be partitioned into host genetic and microbiome
associated components. The cohort was split into a training cohort of
1,000 individuals and a validation cohort of 569 individuals. (B)
Fraction of variance (R^2) explained in the training cohort by host
genetic or microbiome features across the 595 metabolites with
significant associations with either the genome or the microbiome
(FDR-corrected p<0.05). Specific sub-groups are annotated in the plot,
including metabolites only associated with host genetics, metabolites
only associated with gut genera, or metabolites significantly
associated with both (i.e., hybrid). (C) Percentage of the 595
significant metabolites associated only with host genetics, only with
the microbiome, or with both (hybrid) in the training cohort. (D)
Fraction of variance (R^2) explained in the validation cohort.
Coefficients were fixed to those from the training cohort to estimate
out-of-sample R^2 values. (E) Percentage of the 595 significant
metabolites associated only with host genetics, only with the
microbiome, or with both (hybrid) in the validation cohort. (F)
Comparison of the R^2 values obtained in this study with the R^2 values
obtained in Bar, Korem et al. ^[61]1 for 159 metabolites identified as
significant in both studies. The solid line denotes an ordinary least
squares regression line, and the error bands denotes the 95% confidence
interval of the regression. P-values were obtained by a Pearson
correlation test and “r” denotes the Pearson product-moment correlation
coefficient.
To identify associations between host genetics and individual
circulating blood metabolite levels, we performed genome-wide
association analyses on 7.68 million common variants (Minor Allele
Frequency ≥ 1%) in the 1,000 individuals from the training cohort for
each of the 930 blood metabolites that passed our prevalence threshold.
A total of 182 metabolites of the 930 tested (19.6%) were associated
with one or more of the 297 independent lead variants that passed the
genome-wide significance threshold (p < 5.37 × 10^−11; see Fig. S1). Of
these 297 variants, 79 were in intergenic regions and 218 variants
mapped to 123 genes, including those associated with inherited
metabolic disorders such as ACADS, ALMS1, and GCKR. Potential
loss-of-function consequences were predicted for two variants: stop
gained for rs183603441 in HYKK, and a splice donor variant for
rs61825638 in KMO. Missense mutations were predicted for 25 variants,
affecting 39 metabolites. However, the majority of the
metabolite-associated variants found within genes were synonymous.
In particular, lactosylceramide lactosyl-N-nervonoyl-sphingosine
(d18:1/24:1) was associated with lead variants rs3752246 (P = 9.8 ×
10^−14) and rs3752248 (P = 5.6 × 10^−13) located in ABCA7. rs3752246 is
a missense variant and has been previously identified as a risk variant
for late-onset Alzheimer’s disease ^[62]19,[63]20. Prior studies have
identified altered levels of ceramides in the blood and brain of
individuals with Alzheimer’s Disease, with the d18:1/24:1 ceramide
showing a strong and robust association with Alzheimer’s Disease in a
recent meta-analysis ^[64]21. Our results further underscore a possible
shared genetic architecture underlying lactosyl-N-nervonoyl-sphingosine
(d18:1/24:1) levels and late-onset Alzheimer’s disease.
Associations with more than one metabolite were identified for 70 of
the 297 significant lead variants (23.6%), revealing the extent of
pleiotropy in our cohort. The effect sizes of pleiotropic variants are
generally higher than non-pleiotropic variants (median effect size
difference = 0.08, P = 0.04; two-sided Mann-Whitney U-test), while the
distributions of minor allele frequencies were similar between the two
types of variants. Overall, the 70 pleiotropic lead variants were
associated with 92 metabolites (52.2% of all significant
genetically-associated metabolites). Four variants (rs4149056,
rs1047891, rs148982377, and rs45446698), of which rs4149056 and
rs1047891 are missense variants, were each associated with more than 8
metabolites. rs4149056 in SLCO1B1 (solute transporter in liver) was
associated with 8 metabolites, including primary and secondary bile
acids, conjugates of polyunsaturated fatty acids, and free fatty acids.
rs1047891 in CPS1 (mitochondrial enzyme involved in the urea cycle) was
found to be associated with 8 metabolites, many of which were
conjugated to glycine and glutamine moieties and were highly
correlated. rs148982377 in ZNF789 and rs45446698 in ZSCAN25
(transcription factors) were associated with the same set of 9 steroid
hormone metabolites, primarily conjugates of DHEA and androsterone.
Analysis at the gene level reveals a further degree of pleiotropy as
well as polygenicity. Using annotations from dbSNP, we identified
associations between genes and metabolites. 182 metabolites were
associated with at least one gene, with 123 significant genes
identified in total. Of these, 35 genes (27.8%) were associated with
more than one metabolite. Individual pleiotropic genes tend to be
associated with metabolites with similar biochemical properties,
providing insights into unidentified metabolites. For example, the gene
cluster of UGT1A1, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8,
UGT1A9, and UGT1A10 was associated with 10 metabolites, including
bilirubin, biliverdin, and five unidentified compounds. These genes
encode UDP-glucuronosyltransferase (UGT) enzymes, which metabolize
bilirubin ^[65]22. Consistent with our results, Metabolon later
determined that these five unidentified compounds were degradation
products of bilirubin. In addition to pleiotropy, we also observe
instances of polygenic associations for several metabolites. For
example, ethylmalonate, a breakdown product of butyrate, was
significantly associated with six different genes: ACADS, SPPL3, CABP1,
OASL, DYNLL1, and MLEC.
In order to identify associations between the gut microbiome and
circulating blood metabolite levels, we performed regressions using
centered-log-ratio (CLR) transformed bacterial genus-level abundances
as independent variables, while correcting for sex, age, sex-age
interactions, BMI, and genetic kinship (i.e., the first 5 principal
components of the genetic distance matrix). A majority of the tested
metabolites had significant associations with at least one bacterial
genus in the gut microbiome (508/930 = 54.6%, FDR-corrected p<0.05).
Here, the average fraction of explained variance (R^2) was lower than
genetic associations (mean R^2 of 0.04 and 0.11 for microbiome and host
genetic features in the training set, respectively; 0.02 vs. 0.09 in
the validation set, respectively). However, these R^2 values are not
directly comparable across genomic and microbiome models due to the
different significance thresholds used (Bonferroni for the GWAS and
FDR-correction for the microbiome).
The blood metabolites with the largest fraction of variance explained
by microbial features were dominated by compounds involved in bacterial
metabolism of aromatic and phenolic compounds, such as
cinnamoylglycine, hydrocinnamate, and hippurate (R^2>0.25 in the
training cohort and R^2>0.15 in the validation cohort). Catabolism of
phenylalanine and phenylacetate to cinnamate and benzoate is exclusive
to the microbiome and is conspicuously absent in germ-free mice
^[66]23,[67]24. Hippurate is formed in the liver by conjugating
benzoate to a glycine and blood levels of hippurate have been
positively associated with gut microbiome alpha-diversity ^[68]7,[69]25
and with overall metabolic health ^[70]26.
Additionally, we found several products of bacterial protein
fermentation, such as p-cresol derivatives and the tryptophan breakdown
product indolepropionate, associated with the gut microbiome
([71]Tables S1–[72]S2). Furthermore, we found that all secondary bile
acids identified in plasma were significantly associated with the gut
microbiome (FDR-corrected p<0.05), which is expected because primarily
bile acids are deconjugated into secondary bile acids by bile salt
hydrolases expressed by gut commensal bacteria ^[73]27. Here,
isoursodeoxycholate had the largest fraction of variance explained by
gut microbiome composition (R^2 of 0.36 and 0.3 in the training and
validation cohorts, respectively).
R^2 estimates from the current study showed good agreement with prior
estimates from an Israeli cohort^[74]1 for microbiome-metabolite
associations (Pearson r=0.5, [75]Fig. 1F) and excellent agreement for
genome-metabolite associations (Pearson r=0.9, [76]Fig. 1F). The
largest differences in microbiome-metabolite variances across cohorts
were observed mostly for xanthine derivatives and may be due to
different microbiome sequencing strategies, confounder-adjustments,
different statistical modeling approaches, or differences between serum
(Bar, Korem, et al.) and plasma (this study).
Additive contributions of genetic and microbial factors
We identified a total of 595 out of 930 plasma metabolites (64%) that
were associated with either genetic factors, microbial factors, or both
([77]Fig. 1A–[78]B). Most of these metabolites (508) were significantly
associated with the microbiome. 16.3% and 11.4% of these metabolites
showed “hybrid” associations in the training and validation cohorts,
respectively, meaning they were significantly associated with both
genetic and microbial factors (i.e., >1% of total explained variance
explained by both sets of features, [79]Fig. 1B–[80]E). In particular,
we found that about ⅓ − ½ of all metabolites with a significant genetic
association also showed a microbial association (i.e., 82/161 and
57/160 metabolites associated with genetic variants in the training and
validation cohorts, respectively). Conversely, no more than ⅕ of all
metabolites with a microbiome association also showed a hybrid
association with host genetics across training and validation cohorts.
Consistent with the average explained variances of metabolites reported
above for genetics and the microbiome, hybrid metabolites with
particularly large total explained variances (i.e., >20%) showed a
tendency to have higher genetic R^2 values than microbial R^2 values
(see [81]Fig. 1B,[82]D and [83]Fig. 2A). However, as mentioned above,
these patterns should be interpreted with caution because of the
different significance thresholds used for microbiome and genetic
association tests.
Figure 2. Metabolites associated with both genetics and the microbiome.
Figure 2.
[84]Open in a new tab
(A) The 20 metabolites with the highest total R^2 in the validation
cohort that were significantly associated with both host genetics and
the microbiome in the training cohort. Blue and green colored bars
denote individual R^2 values in genetics-only (blue) and
microbiome-only (green) regression models, respectively. Gray bars
denote R^2 values from regression using joined genetics and microbiome
data. (B) R^2 values obtained by either adding individual contributions
of genetics and the microbiome (additive) or by performing a joint
regression (joint) for all metabolites in the hybrid group with an
identified standard. The difference between the two groups indicates a
very small, but nonetheless significant, overlap in variance explained
by genetics and the microbiome (n=82 metabolites in each group,
two-sided Mann-Whitney test). However, the variances explained by host
genetics and the microbiome were largely additive. Boxes denote the
interquartile range (IQR) and the center line denotes the median.
Whiskers extend up to the largest and smallest point within 1.5 IQRs.
Stars denote significance (*** - p<0.001). Metabolite names ending in
“*” or “**” indicate compounds for which a standard is not available,
but Metabolon is confident in its identity.
In order to test whether genetic and microbial factors contained
redundant information, we compared a joint genetics-microbiome model to
individual genetic and microbial models. If the overlap in variance
explained between genetic and microbial feature sets was large, the
joint model R^2 value would be substantially smaller than the sum of
the individual model R^2 values (see [85]Fig. 2A). Although we found
that the difference in R^2 between the joint model and the sum of the
individual model R^2 values was statistically significant in the
validation cohort across a very large number of models, the magnitude
of this difference was extremely small (median difference in
R^2<0.003), indicating that genetics and the microbiome explain nearly
exclusive components of the variance for a given metabolite. This
result is consistent with prior work showing that the variances in the
blood metabolome explained by the microbiome and host genetics were
largely orthogonal ^[86]1,[87]8. However, here we demonstrate that this
is not only true globally, but at the individual metabolite level as
well.
Genome-microbiome interaction effects influence metabolites
Finally, we investigated whether the genetic background of the host
could modulate microbiome-metabolite associations (see [88]Fig. 3A). To
this end, we tested all genetic variant / bacterial genus / metabolite
triplets across our full cohort (N=1,569), filtering for only those
variant-metabolite pairs that were previously detected in the
genome-wide associations studies (i.e., 23,126 triplets in total). We
employed a strategy similar to gene-environment interaction studies,
using the CLR transformed abundances of microbial genera as the
environmental variable, while adjusting for the covariates mentioned
above ^[89]28. 466 interaction effects were deemed significant under an
FDR-corrected p-value cutoff of 0.05, involving 69 distinct
metabolites. Though gene-microbiome interactions were quite common,
they only explained very small fractions of plasma metabolite
variability (mean R^2=0.005) across all genotypes. However,
gene-microbiome-metabolite interactions often showed appreciable R^2 in
the homozygous minor allele genotypes, where they explained up to 16%
of the metabolite variances ([90]Fig. 4).
Figure 3. Gene-microbiome interactions.
Figure 3.
[91]Open in a new tab
(A) Gene-microbiome interactions occur when the correlation between a
gut bacterial genus and a blood metabolite is itself conditional on a
specific allele pairing. (B) Pathway enrichment analysis for
metabolites involved in significant gene-microbe interactions. (C)
Bacterial family-level enrichment analysis for taxa involved in
significant gene-microbe interactions. (D) Host gene enrichment
analysis for genes involved in significant gene-microbe interactions.
In (B-D) red circles denote prevalence across all tested features
(i.e., background prevalence) and blue triangles denote prevalence in
only those features with significant interaction terms. Stars denote
significantly enriched features under a one-sided hypergeometric test
with a Benjamini-Hochberg correction for multiple testing (* - p<0.05,
** - p<0.01, *** - p<0.001). (B-D) include data from the entire cohort
(n=1,569).
Figure 4. Gene-microbiome interactions explain variation in blood metabolite
levels.
Figure 4.
[92]Open in a new tab
Shown are six selected examples where the genomic background modulates
the associations between bacterial genus-level abundances and
metabolite levels. In (A-F) metabolite abundances were log-transformed
and adjusted for common confounders described above. In (A-F) “ρ”
denotes the Spearman’s Rank-Order correlation coefficient of the
regression and “p” denotes the p-value under a Spearman’s correlation
test. The solid line indicates an ordinary least squares regression
line relating the transformed metabolite abundance with the transformed
genus abundance within each group. The shaded area is the 95%
confidence interval of the regression. Associations were corrected for
sex, age, age^2, sex:age, and sex:age^2 interactions, BMI, microbiome
vendor, metabolomics batch, and the first 5 principal components of
genetic ancestry (N=1,569). Metabolite names ending in “*” or “**”
indicate compounds for which a standard is not available, but Metabolon
is confident in its identity.
Gene-microbiome interactions were significantly enriched in metabolites
involved in fatty acid metabolism, phenylalanine and tyrosine
metabolism, pyrimidine metabolism, and glycine metabolism ([93]Fig. 3C,
one-sided hypergeometric test, FDR-corrected p<0.05) and showed
enrichment in several host genes, such as CPS1, AGXT2, and SLC2A9
([94]Fig. 3E, one-sided hypergeometric test FDR-corrected p<0.05).
Additionally, we observed enrichment for interactions that involved the
Ruminococcaceae family ([95]Fig. 3D). We observed several instances of
clinically relevant metabolites that showed stronger microbe-metabolite
associations in the homozygous minor allele genotype. For example, we
saw interactions with metabolites involved in a variety of disease
conditions, such as hepatic failure, mitohormesis, cancer, and
inflammatory diseases ([96]Tables S3) ^[97]29–[98]31. In particular,
N-acetyltyrosine levels were associated with Ruminococcaceae UCG-002
and Bacteroides, but only in the homozygous minor allele genotypes for
variants in the ALMS1 gene and in an unchracterized locus on chromosome
2 ([99]Fig. 4A, [100]E). Glycine levels switched from a slight negative
correlation with Ruminococcaceae UBA1819 in the homozygous major allele
genotype for a locus in the CPS1 gene to a stronger positive
correlation within the homozygous minor allele genotype ([101]Fig. 4B).
As a final example, sphingomyelin levels were associated with
Holdemania, but only in the homozygous minor allele genotype of a
variant in the SYNE2 gene ([102]Fig. 4C). All of the selected
interactions could also be observed in the unadjusted data (Fig. S2).
Genetics associate with conjugated secondary bile acids
Having mapped plasma metabolite variability to genetic and microbial
factors, we now asked whether the observed partitioning of variances
explained may change along pathways that involve host-microbe
co-metabolisms. To this end, we investigated the metabolism of
secondary bile acids. Secondary bile acids are formed in the large
intestine via microbial deconjugation of primary bile acids, reabsorbed
into the bloodstream via the portal vein, and further metabolized in
the liver ^[103]32. Thus, secondary bile acid levels in the bloodstream
are influenced by both the microbiome and the host. We investigated
whether individual bile acid species variances were predominantly
explained by genetic factors, microbial factors, or both.
While unconjugated secondary bile acids were exclusively associated
with the gut microbiome, 4 out of 6 detected secondary bile acid
conjugates showed strong genetic components of their variances in both
the training and validation cohorts ([104]Fig. 5A). In particular,
plasma deoxycholate and glycochenodeoxycholate abundances showed no
significant genetic contribution, whereas more than 40% of the variance
in the host-conjugated deoxycholate glucuronide and more than 20% of
the variance in host-conjugated glycochenodexycholate glucuronide was
explained by host genetics ([105]Fig. 5B). Other glucuronidated
non-bile-acid compounds, such as p-cresol glucuronide, did not show
this kind of pattern (see [106]Tables S1–[107]S2). Modifications like
glucuronidation or sulfation usually occur in the liver and are used to
mark metabolites for excretion in feces or urine ^[108]33,[109]34.
These results indicate that the blood levels of conjugated secondary
bile acids are under strong genetic control, while blood levels of
unconjugated secondary bile acids are not.
Figure 5. Host genetic and gut microbial associations with secondary bile
acids.
Figure 5.
[110]Open in a new tab
(A-C) Explained variances for unconjugated secondary bile acids and
hepatic secondary bile acid conjugates in the training cohort (A), the
validation cohort (B) and in Bar, Korem et. al.^[111]1 (C). (A-C) Gray
lines connect deoxycholate and glycochenodeoxycholate with their
glucuronidated forms and colored boxes denote unconjugated and
conjugated secondary bile acids. (D) Associations between bacterial
genera and secondary bile acids (both conjugated and unconjugated) in
the training cohort. Only significant associations are shown
(FDR-corrected and confounder-adjusted p<0.05, two-sided Pearson
correlation test). Fill colors denote the Pearson product-moment
correlation coefficients (see legend). (E) Colored cells denote
associations that passed a genome-wide significance threshold in the
GWAS and the color denotes the estimated coefficient in the GWAS
regression. In (A-B, D-E) associations and variances were corrected for
sex, age, age^2, sex:age, and sex:age^2 interactions, BMI, microbiome
vendor, metabolomics batch, and the first 5 principal components of
genetic ancestry. Metabolite names ending in “*” or “**” indicate
compounds for which a standard is not available, but Metabolon is
confident in its identity.
Most hepatically modified secondary bile acids showed associations with
both genetic and microbial features. One exception was
glycochenodeoxycholate glucuronide, which was only associated with
genetics and not with the microbiome, even though the unconjugated
form, glycochenodeoxycholate, was only associated with the microbiome
([112]Fig. 5A). This result was consistent across the training and
validation cohorts and in the Bar, Korem, et al. data set, where the
same transition to genetic association after glucuronidation could be
observed ([113]Fig. 5A–[114]C). The genomic variant rs4149056 was
associated with 3 of the 6 modified secondary bile acids and has been
shown previously to affect the abundance of bile acids in plasma
^[115]35 (see [116]Fig. 5E). This variant is located in the solute
carrier protein SLCO1B1, which is expressed in the liver ^[117]36 and
transports secondary bile acids, with a preference for sulfated bile
acids and bile salts ^[118]37. Several other variants in SLCO1B1 were
associated with secondary bile acid derivatives. The two detected
glucuronidated secondary bile acids were associated with distinctly
different sets of SNPs. Glycochenodeoxycholate glucuronide plasma
abundances were mostly affected by variants in the SLCO1B3-SLC1B7 gene
cluster, whereas we found several variants located on chromosome 4,
many of which were in the TMPRSS11E serine protease-encoding gene (and
a number of intergenic variants), that specifically affected
deoxycholate glucuronide levels ([119]Fig. 5E).
Furthermore, we also identified at least two significant
genome-microbiome-metabolite interactions affecting the levels of
deoxycholate glucuronide and glycochenodeoxycholate glucuronide. An
uncharacterized variant on chromosome 4 was associated with a decreased
level of deoxycholate glucuronide in the homozygous minor allele
genotype, but this effect was mitigated in individuals with higher
levels of Eggerthella in their gut microbiomes ([120]Fig. 4D).
Furthermore, a variant in the SLCO1B1 gene showed a positive
correlation with Ruminoclostrium only in the homozygous major allele
genotype, although this ‘major’ allele was actually less common in our
cohort ([121]Fig. 4E).
Thus, whereas cross-sectional variations in unmodified secondary bile
acid levels were not explained by host genetic variation, plasma
abundances of several secondary bile acid conjugates were significantly
associated with a diverse set of compound-specific genetic variants on
chromosomes 4 and 12. Additionally, we observed that unmodified
deoxycholate and host-conjugated deoxycholate glucuronide where
associated with mutually exclusive sets of bacterial genera, despite
sharing the same secondary bile acid backbone ([122]Fig. 5D).
Sphingosines and ceramides show heterogeneous associations
We next asked whether hybrid associations would also be common in
another disease-relevant class of blood metabolites: ceramides. High
ceramide levels have been shown to be associated with insulin
resistance, hypercholesterolemia, liver steatosis, and the formation of
lipid rafts ^[123]38,[124]39. Furthermore, some classes of ceramides in
the blood increase the risk for late-onset Alzheimer’s disease, as they
are neurotoxic and induce apoptosis ^[125]40,[126]41. Ceramides are the
simplest of the sphingolipids and are formed either by de novo
synthesis from sphingosine or by hydrolysis of sphingomyelin molecules
^[127]42. While ceramides are rarely found in bacteria, many bacteria
in the Bacteroidetes phylum can synthesize sphingolipids, which have
been shown to be taken up and processed by human epithelial cells in
vitro ^[128]43. Thus, we asked whether variation in ceramide and
sphingosine derivatives levels in blood was significantly explained by
microbial genera, host genetic factors, or a combination of both.
We observed a large degree of heterogeneity in variance partitioning in
ceramide and sphingosine molecules ([129]Fig. 6). Variance in
sphingosine itself was only weakly explained by the composition of the
gut microbiome (R^2=0.01), with Streptococcus as the only significant
microbiome-related association. However, other intermediates in
ceramide synthesis, such as sphinganine derivatives, showed stronger
microbiome and genome associations (i.e. up to 5% and 2% of variance
explained in training and validation cohorts, respectively; [130]Fig.
6A–[131]B). Ceramides showed a broad range of explained variances.
Whereas most ceramides were associated with the gut microbiome, a small
subset had additional genetic components to their variances, such as
lactosyl-N-nervonoyl-sphingosine and ceramide (d16:1/24:1, d18:1/22:1),
both of which had more than 5% of their variation explained by genetic
factors in the training and validation cohorts. Both of these lipid
species are examples of very long-chain fatty-acyl sphingolipids. While
shorter fatty acids (≤C18) are prevalent in the human diet ^[132]44,
often serving as preferential substrates for ceramide synthesis by
certain gut microbes, longer chain fatty acids, such as nervonic acid
(24:1), are often the product of elongation by host enzymes before
being incorporated into sphingolipids that are primarily found in brain
tissue ^[133]45. This result was corroborated in Bar, Korem, et al.,
where only short-chain sphingolipids were observed to be associated
with the microbiome ^[134]1. In summary, we observed broad
heterogeneity in the fraction of variance explained by the microbiome
and host genetics, where a difference in the fatty acid chain length
can distinguish circulating ceramides associated with gut microbial
genera (i.e., shorter chains) from those associated with host genetics
(i.e., longer chains).
Figure 6. Host genetic and gut microbial associations with sphingosine and
ceramides.
Figure 6.
[135]Open in a new tab
(A-C) Explained variances for sphingosine, sphingosine intermediates,
and ceramides in the training cohort (A), the validation cohort (B) and
the study from Bar, Korem et. al. (C). (D) Associations between
bacterial genera and sphingosines/ceramides in the training cohort.
Only significant associations are shown (FDR-corrected and
confounder-adjusted p<0.05, two-sided Pearson correlation test). Fill
colors denote the Pearson product-moment correlation coefficients (see
legend). (E) Colored cells denote associations that passed a
genome-wide significance threshold in the GWAS and the color denotes
the estimated coefficient in the GWAS regression. In (A-B, D-E)
associations and variances were corrected for sex, age, age^2, sex:age,
and sex:age^2 interactions, BMI, microbiome vendor, metabolomics batch,
and the first 5 principal components of genetic ancestry. Metabolite
names ending in “*” or “**” indicate compounds for which a standard is
not available, but Metabolon is confident in its identity.
Discussion
In this study, we partition the cross-sectional variance explained in
individual blood metabolite levels into their host genetic and gut
microbiome components, across a population of 1,569 generally healthy
U.S. residents with paired genomic, microbiome, and blood metabolomic
data. Prior studies have established that the plasma metabolome is
intimately connected to host genetics as well as to the gut microbiome,
and that overall genetic and microbial influences on the metabolome
appear to be orthogonal. Similarly, we quantified the overlap between
microbial and genetic variance components for a given metabolite and
found that it was significant but very small (<0.03% of explained
variance). This result has important implications for Mendelian
randomization studies, which rely on this overlap (i.e.,
non-additivity), and explains why previous studies required very large
cohorts to detect very few putatively causal effects ^[136]46.
Until now, it has been unclear as to whether or not genetics and the
microbiome affect the levels of mutually exclusive sets of metabolites
in human blood or act simultaneously on individual metabolites. Most of
the detected blood metabolites (508/930) showed significant
associations with the gut microbiome. We found that hybrid
genome-microbiome models were common and affected about 10–16% of all
metabolites associated with either host genetics or the microbiome.
Putting a more exact estimate to this proportion will require larger
studies than the one presented here because hybrid associations were
enriched in metabolites with small microbiome variance components
(R^2<0.02), many of which we were likely underpowered to detect (e.g.,
as illustrated by the lower proportion of hybrid cases in the smaller
validation cohort). We observed that between one third and one half of
blood metabolites associated with host genetics included significant
hybrid associations with the gut microbiome, while only one in five
metabolites associated with the microbiome showed an additional hybrid
genetic component to their variances. Thus, while both genetic and
microbiome variation were important to explaining variation in the
blood metabolome, the microbiome (and the myriad factors associated
with variation in the microbiome, like diet and lifestyle) appears to
be the dominant driving force behind variation in circulating
metabolites.
Additionally, we found some evidence for gene-microbiome interactions,
where a particular metabolite-microbiome association was modulated by
the genetic background of the host. In general, these interactions only
explained small fractions of metabolite variance (i.e., <2%). However,
these interactions often appear to result from relatively strong
associations within the homozygous minor allele genotype, but we had
limited numbers of these minor allele carriers to assess these
associations robustly. That being said, if we look within the
homozygous minor allele genotypes, we observed up to 16% of a
metabolite’s variance explained by a given bacterial genus. In
particular, N-acetyl-tyrosine, glycine, and sphingomyelin abundances
all showed strong associations with a distinct set of gut bacterial
genera, but only in the homozygous minor allele genotypes. Thus, it
appears that genetically-determined deviations from a particular
quantitative trait may be modulated, or even induced, by a particular
microbial community composition in the gut. These kinds of
genome-microbiome interaction effects could help guide the design of
microbiome-targeted therapeutics that mitigate host genetic disease
risk. For instance, we found that glycine abundance was modulated by 24
gene-microbiome-metabolome interactions ([137]Fig. 4A and [138]Table
S3), mostly involving variants in the CPS1 gene. Glycine-addiction is a
hallmark of cancer proliferation, and restriction of glycine and serine
can reduce tumor growth in murine xenograft and allograft models
^[139]29,[140]47. Thus, there may be benefit to microbiome-targeted
interventions in individuals with the observed variants in the CPS1
gene to lower circulating glycine levels. However, given the generally
low prevalence of the minor allele and the high level of heterogeneity
in gut bacterial community composition, targeted interventional studies
will be required to validate these interactions in vivo.
Prior studies of microbe-host co-metabolites derived from microbial
precursors in blood, like hippurate or p-cresol sulfate (i.e.,
hepatically modified forms of microbially-derived metabolites), hinted
that the cross-sectional variance in these molecules might only be
associated with the microbiome and not with host genetics
^[141]13,[142]25,[143]48. Here we show that those observations do not
extrapolate to all microbe-host co-metabolites. Indeed, we found that
much of the cross-sectional variation in many metabolites derived from
gut microbial precursors was explained by host genetics. For example,
while unconjugated secondary bile acids in the bloodstream are only
associated with the gut microbiome, their glucuronidated or sulfated
derivatives, formed in the liver, were often strongly influenced by
host genetic variation. Most of the variance in deoxycholate
glucuronide levels across the current study cohort could be explained
by a combination of genetic and microbial factors, while unconjugated
deoxycholate was solely associated with the microbiome. Deoxycholate
and deoxycholate glucuronide both showed associations with the
microbiome, but these associations were with different sets of
bacterial genera. Glucuronidation facilitates excretion into urine and
feces and prior work in rats has shown that glucuronidated bile acids
are reabsorbed less efficiently than unmodified secondary bile acids in
the intestine ^[144]49. The gut metagenome encodes a variety of
β-glucuronidases ^[145]50,[146]51 which likely enable gut commensals to
use these bile-acid-conjugated-glucuronides as a carbon source.
Conversely, free secondary bile acids can be toxic to many bacterial
taxa ^[147]52. Thus, bile acids can directly drive changes in the gut
microbiome, either by acting as carbon sources that promote growth or
as toxic compounds that inhibit growth, which may explain the very
different subsets of gut genera associated with conjugated and
unconjugated forms of the same secondary bile acid ([148]Fig. 5D).
Another intriguing finding from our analyses was the variable
association of certain plasma sphingolipids with either host genetics
or the gut microbiome, depending on the fatty-acyl groups comprising
each lipid species. Ceramides with a ≤C18 fatty-acyl group showed
stronger correspondence to the gut microbiome, consistent with the high
prevalence of these fatty acids in the diet ^[149]44 and their
preferential incorporation into ceramides by certain gut taxa ^[150]53.
On the other hand, ceramides with very long chain fatty-acyl groups
(22:1,24:1), that are most abundant in brain tissue and often
synthesized through elongation by host enzymes ^[151]45, showed a
stronger correspondence with host genetics. Importantly, ceramides with
different fatty-acyl chain lengths have been implicated in a number of
human diseases, including Alzheimer’s disease, depression and mood
disorders ^[152]54. Distinguishing between which ceramides are under
the control of diet and the gut microbiome versus genetic
predisposition may aid in the design of new and improved precision
therapeutics.
It should be noted that the current study only included individuals
from the U.S. who were predominantly of European descent. While our
results are consistent with results obtained from cohorts in Israel and
Sweden ^[153]1,[154]55, future studies in more diverse populations will
be required to see whether or not the reported observations replicate
more broadly. Finally, while we included many highly relevant
covariates in our regression analyses (i.e. age, sex, BMI, and genetic
ancestry), many of the observed microbe-metabolite associations are
likely confounded with lifestyle and dietary habits, which were not
comprehensively tracked in the current study population and can
strongly influence the composition of the gut microbiome.
Overall, our analyses show that the plasma metabolome is influenced by
the interplay between genetic and microbial factors, where the
abundances of individual microbially-derived metabolites absorbed in
the gut is often affected additively by both host genetic variation and
by variation in the ecology of the gut. Furthermore, many
microbe-metabolite associations are dependent upon host genetic
background. These hybrid genome-microbiome-metabolite models provide
unique insights into the forces underlying variation in the human blood
metabolome and can suggest possible therapeutic strategies.
Specifically, we suggest that disease-relevant blood metabolites
strongly associated with the microbiome may be modifiable through
dietary, probiotic, prebiotic or lifestyle interventions, whereas
metabolites under strong genetic control may require pharmacological
interventions that target host metabolic pathways. Understanding which
of these circulating small molecules fall predominantly under host
versus microbiome control will help to guide interventions designed to
prevent and/or treat a range of diseases.
Methods
Institutional Review Board Approval for the Study
Procedures for this study were reviewed and approved by the Western
Institutional Review Board with the Institutional Review Board (IRB)
study number 20170658 for the Institute for Systems Biology and 1178906
for Arivale.
Cohort Description
All study participants were subscribers of the Arivale Scientific
Wellness program who provided consent and research authorization
allowing the use of their anonymized, de-identified data in research.
Arivale, Inc. (USA), was a consumer scientific wellness company that
shut down in 2019. The Arivale program is described in detail in Zubair
et. al. ^[155]56. In brief, participants signed up for a comprehensive
deep phenotyping program coupled with personalized data-driven wellness
coaching in order to improve overall health and wellness. Baseline
blood draws were taken by trained phlebotomists at LabCorp or Quest
service centers and paired with at-home fecal sampling (see
methodological details below). The cohort contained 997 individuals
with female sex assigned at birth and 572 individuals with male sex
assigned at birth (self-reported). The average age of the cohort was
49±11 years, and the average BMI was 28.2±6.6.
Metabolomics
Plasma samples were transferred to Metabolon, Inc. (USA), for sample
processing and data acquisition. Untargeted metabolomics was performed
using Metabolon’s Global Metabolomics high-performance liquid
chromatography (HPLC)–mass spectrometry (MS) platform across a total of
11 batches. In brief, EDTA-plasma samples were thawed on ice and
protein was removed using aqueous methanol extraction. The resulting
samples were divided into five fractions (1 backup fraction) and placed
briefly on TurboVap (Zymark) to remove organic solvents. Measurements
were taken using Waters ACQUITY ultra-performance liquid chromatography
(UPLC) and Thermo Scientific Q-Exactive high resolution/accuracy mass
spectrometer interfaced with a heated electrospay ionization (HESI-II)
source and an Orbitrap mass analyzer operated at 35,000 mass
resolution. Each of the four sample fractions was reconstituted for
measurement in one of four methods. One aliquot was measured under
positive ion condition by elution from a C18 column (Waters UPLC BEH
C18–2.1×100 mm, 1.7 μm) using water and methanol with 0.05%
perfluoropentanoic acid and 0.1% formic acid. The second sample was
eluted from the same column but using acetonitrile along with methanol
and water to optimize for more hydrophobic compounds. The third aliquot
was measured under negative ion optimized conditions and gradient
eluted from a C18 column using methanol, water, and 6.5mM ammonium
bicarbonate at pH 8. The last aliquot was measured using negative
ionization after elution from a HILIC column (Waters UPLC BEH Amide
2.1×150 mm, 1.7 μm) using water, acetonitrile, and 10mM ammonium
formate at pH 10.8. MS analyses were run using MS and data-dependent
MSn scans with dynamic exclusion with an approximate range of 70–1000
m/z.
Assay batch correction was performed by running a selected set of
quality control plasma samples in every batch and normalizing the
measured abundances to the quality control samples. Metabolon judges
the confidence level of annotation on a three-tiered scale. The
majority (>80%) of metabolites were annotated by a match to an internal
standard of the metabolite run on the same HPLC/MS setup (no specific
mark added to the name). Some metabolites were annotated only by a
match to a previously published MS spectrum (marked with “*”), and some
metabolites were annotated only by a match to a known chemical formula
(marked with “**”). Metabolites not annotated on any of those tiers,
but detected consistently in different biofluids, are denoted by a “X
-” prefix followed by a unique ID to indicate unknown compounds.
Metabolite not observed in at least 50% of the samples were removed,
resulting in the final list of 930 metabolites.
Genome-wide association analysis
DNA was extracted from whole blood samples by Covance (Redmond, WA)
using a standardized protocol. Whole genome sequencing was performed by
Wuxi, Inc. (Shanghai, China) in a CLIA-certified laboratory for 1,569
individuals. Extracted libraries were sequenced using an HiSeq X
sequencer (Illumina, San Diego, USA) with 150bp libraries and aiming
for >30x coverage. Basecalling and conversion to raw FASTQ files was
performed using the Illumina Basespace software. Raw reads were aligned
to the hg19 human reference genome with BWA 0.7.12. The GATK
HaplotypeCaller with GATK 3.3.0 was used to call individual variants.
This included indel local realignment followed by base quality
recalibration. This yielded a set of around 7.68M measured SNPs.
To account for genetic structure and potential cryptic relatedness in
the cohort, we used a linear mixed model to test for associations
between genetic variants and metabolite levels. For each metabolite, we
performed a genome-wide association study using FastGWA using the hg19
genetic map and the metabolite confounder-adjusted residuals ^[156]57.
Linkage disequilibrium scores were extracted for all variants in the
study and used as an additional covariate in the genome-wide
association study. Metabolome genome-wide significance was called using
the Bonferroni corrected p-value threshold of 5.37e-11 (5e-8 /
#metabolites) where 5e-8 is the commonly used threshold for genome-wide
significance used in the human genomics field ^[157]58. Lead variants
were mapped by two-step clumping strategy based on, linkage equilibrium
(LD scores) where we first clumped all variants within 250kbp fragments
that exceeded LD score cutoff of 0.8, followed by second clumping in
250kbp fragments using an LD score cutoff of 0.3. This resulted in the
final set of 297 independent lead variants.
Gut Microbiome Sequencing
Fecal samples were collected and chemcially preserved using a
proprietary at-home swab kit (OMNIgene Gut, DNAGenotek, USA). 250μL of
homogenized stool from each sample were subsequently used for DNA
extraction, using a MoBio PowerMag Soil DNA isolation kit (QIAGEN,
Germany) and the KingFisher Flex instrument. DNA concentrations in each
sample were quantified with a QuBit (ThermoFisher, USA) and purity was
assessed by measuring the A260/A280 absorbance ratio. Amplification and
library preparation were performed by external providers using custom
and optimized protocols by either amplifying the V4 region
(SecondGenome, USA) or the V3-V4 region (DNAGenotek, USA) of the 16S
gene.
Sequencing was performed using a MiSeq (Illumina, USA) using either a
paired-end 250bp protocol (SecondGenome) or a paired-end 300bp protocol
(DNAGenotek). Basecalling was performed using the Illumina Basespace
platform which removed the added phiX reads and provided the final
FASTQ files used for downstream analysis. Quality of the sequencing
reads was assessed by manual inspection of the error rate across
sequencing cycles and appropriate length cutoffs of 250bp for the
forward reads and 230bp for the reverse reads were chosen based on the
profiles. Reads with more than 2 expected errors under the Illumina
error model were removed from the analysis along with reads containing
ambiguous base calls (“N” nucleotides). More than 97% of reads passed
those filters yielding a mean of around 200,000 reads per sample.
Filtered and truncated reads were then used to infer amplicon sequence
variants using DADA2 ^[158]59. Here error profiles were learned for
each sequencing run separately. The resulting ASVs and respective
counts were merged and chimeras were removed using the “consensus”
strategy implemented in DADA2, which removed around 16% of all reads.
Taxonomy assignment of ASVs was performed by using the naive Bayes
classifier in DADA2 with the SILVA database (version 128). Species
assignments were performed by exact match of the inferred ASVs with the
reference 16S gene in SILVA, where possible. About 90% of reads could
be classified down to the genus levels and 32% of reads could be
classified down to the species level. The total set of samples was then
filtered for those individuals that also had paired microbiome,
metabolomics, and whole genome shotgun sequencing data available. Where
several microbiome time points were available we used the one closest
to the blood draw used for generating the metabolomics data. Low
abundance taxa, with less than 10 reads on average or appearing in less
than 50% of all samples, were removed from the data set prior to
analysis.
Metabolite-microbiome associations and confounder adjustment
Genus-level read counts were scaled using the centered log-ratio (CLR)
transform and centered across the 1,569 samples. Metabolite abundances
were log-transformed and centered which yielded good similarity to a
normal distribution as judged by a QQ-plot.
Transformed and scaled metabolite and bacterial genera abundances were
then regressed against the set of confounders and the residuals were
saved as new dependent variables. For the confounder adjustment we
chose the most common intrinsic factors that would affect blood
metabolite and fecal microbe abundances. Firstly, we corrected for sex,
age, and BMI by including coefficients for biological sex, age, age^2,
sex:age, and sex:age^2 interactions, and BMI. Additionally, we included
covariates for common batch effects, like the season of the year the
samples were obtained, the metabolomics batch (each sample in a batch
was processed together), and the vendor for the microbiome sample (DNA
Genotek or Second Genome). Finally, we included continuous measures of
genetic ancestry as covariates in our analyses, given by the first 5
principal components from an analysis of 100,000 linkage disequilibrium
corrected SNPs (minor allele frequency > 5%) calculated with the PC-AiR
and PC-Relate methods ^[159]60. The residuals were used for all further
analyses as well as visualizations and analyses of variances.
After this we performed individual linear regressions between all
metabolite-genus pairs to find significant associations between
individual metabolites and bacterial genera. P-values were obtained
using F-tests and corrected for false discovery rate using the
Benjamini and Hochberg method. We then retained the significant
associations with an FDR-corrected p<0.05. We also performed the same
analysis with alpha-diversity as the independent variable or with
alpha-diversity as a covariate to the CLR-transformed bacterial genus
abundance to verify that individual genus-level associations were not
due to an association with alpha-diversity alone, which we did not
observe to be the case.
Analysis of explained variances (R^2)
Explained variances (R^2) for each metabolite in the training cohort
were obtained by ordinary least squares models only containing features
that were significantly associated with each metabolite individually in
the training cohort. Ergo, we only include genus abundances for those
genera with a FDR-corrected p<0.05 in the previous metabolite-microbe
associations and only those single nucleotide variants that were
identified as significant in the previous genome-wide association study
for the particular metabolite. This was done to avoid inflation of the
explained variances, as adding random variables to a regression model
tends to increase the explained variance. We performed multilinear
regressions with only microbiome features, only genetic features, and
both genetic and microbiome features to quantify the variance of the
transformed and confounder-adjusted metabolite abundances that was
explained by each feature type individually and jointly. Consequently,
it should be noted that the explained variances in this manuscript
refer to the metabolite abundance variance after removing components of
the variance explained by the covariates listed above. As such, these
R^2 caclulations pose an upper bound of the explained variance of the
raw metabolite abundance but are also independent of common
confounders, such as age, sex, and BMI and thus should be more
generalizable to other populations structures.
Overlap between microbiome and genetic features was quantified as the
difference between the sum of R^2 of the individual feature type models
and the joint model R^2 (R^2[microbiome] + R^2[genetics] − R^2[joint]).
In the case of complete independence of microbiome factors and
genetics, this difference would be zero and would become positive if
there is partial or complete overlap in the variances explained by the
microbiome and host genetics.
In the validation cohort we used the same regression models but as in
the training cohort but fixed all coefficients to the ones from the
training cohort except for the intercept term to account for different
baseline levels of metabolites between the cohorts and avoid negative
R^2 values. This bounds the validation R^2 values to [0, 1] where a
metabolite not associated/validated with the identified features from
the training cohort will intrinsically receive an R^2 of 0.
Genome-microbiome interactions
For gene microbiome interactions, we performed ordinary least squares
regression of the form
[MATH:
mi=β0+β1bj+β2sk+γbj
msub>sk+
mo>δc+ϵi, :MATH]
where m[i] denotes the scaled abundance of metabolite i, b[j] the
scaled abundance of the bacterial genus j, s[k] the ordinal versions of
the allele on variant k (0 - major allele, 1 - heterozygous minor
allele, 2 - homozygous major allele), c the vector of confounder
covariates, and ε[i] a random normally distributed variable with an
expectation of zero. Significance of the interaction was then evaluated
by an F-test comparing the full model to a model with a fixed ɣ=0.
P-values from all tested metabolite / variant / genus triplets were
adjusted for false discovery rate using the Benjamini and Hochberg
method and judged as significant with a FDR cutoff p<0.05. Even though
individual metabolite and bacterial genus abundances had been adjusted
for confounders previously, we still included them in this regression
because the product b[j]*s[k] had not been adjusted for the same
confounders. Also, to make our analysis computationally feasible, we
performed the regressions only for those metabolites, bacterial genera,
and genetic variants that had shown at least one significant
interaction in the prior analyses. The full cohort was includes in the
interaction analysis due to the low prevalence of the minor allele
groups.
Finally, post-hoc tests for microbe-metabolite interactions within
sub-cohorts carrying a specific allele pairing were performed by
Pearson tests on the product-moment correlation within each sub-cohort
and for those triplets that had shown significant interaction effects.
Extended Data
Extended Data Figure 1. Genome-wide association study results.
Extended Data Figure 1.
[160]Open in a new tab
Genome-wide association p-values for variants associated with at least
one metabolite (only p < 1e-5 shown here). Red dashed line denotes
genome-wide significance (p < 5.37e-11). Gray bars denote separation
between chromosomes. N=1,000 (training cohort). The full list of
p-values from the GWAS analysis can be found in [161]Table S4.
Extended Data Figure 2.
Extended Data Figure 2.
[162]Open in a new tab
Gene-microbiome interactions explain variation in blood metabolite
levels (unadjusted data). Shown are six selected examples where the
genomic background modulates the associations between bacterial
genus-level abundances and metabolite levels. In (A-F) metabolite
abundances were log-transformed, but no adjustment for common
confounders, was performed here. In (A-F) “ρ” denotes the Spearman’s
Rank-Order correlation coefficient of the regression and “p” denotes
the p-value under a two-sided Spearman’s correlation test. The solid
line indicates an ordinary least squares regression line relating the
transformed metabolite abundance with the transformed genus abundance
within each group. The shaded area is the 95% confidence interval of
the regression. Associations were corrected for sex, age, age^2,
sex:age, and sex:age^2 interactions, BMI, microbiome vendor,
metabolomics batch, and the first 5 principal components of genetic
ancestry (N=1,569). Metabolite names ending in “*” or “**” indicate
compounds for which a standard is not available, but Metabolon is
confident in its identity.
Supplementary Material
1845232_RS_File
[163]NIHMS1845232-supplement-1845232_RS_File.pdf^ (1.3MB, pdf)
1845232_Sup_Tab_1-5
Table S1. Training cohort R^2 values for genetics and microbiome for
the 595 metabolites with a significant association in the training
cohort (N=1,000). Significant associations with bacterial genera were
obtained by Pearson correlation tests between CLR-transformed genus
abundances and log-transformed metabolite residuals corrected for
common confounders (all FDR-corrected p<0.05). Significant associations
with genomic variants were obtained by the FastGWA mixed-linear model
with a Bonferroni correction across metabolites (p < 5e-8 /
#metabolites).
Table S2. Validation out-of-sample R^2 values for genetics and
microbiome for the 595 metabolites with a significant association in
the training cohort (N=569). Explained variances were calculated by
using the fixed association coefficients from the training set models.
Table S3. Genome-microbiome interactions for blood metabolite
regression models with a significant interaction term (466
interactions, N=1,569). P-values were obtained from a two-sided Wald
test on the interaction term coefficient γ of an ordinary least squares
regression model.
Table S4. Significant results from the genome-wide association analysis
in the training cohort (N=1,000) along with annotations for variants
and metabolites. Significant associations with genomic variants were
obtained by the FastGWA mixed-linear model using a sparse
genetic-relationship matrix with a Bonferroni correction across
metabolites (p < 5e-8 / #metabolites).
Table S5. Enrichment tests for genome-microbiome associations. P-values
were obtained from one-sided hypergeometric tests and corrected for
multiple testing using the Benjamini-Hochberg correction.
[164]NIHMS1845232-supplement-1845232_Sup_Tab_1-5.xlsx^ (4.7MB, xlsx)
Acknowledgments