Abstract Variation in the blood metabolome is intimately related to human health. However, few details are known about the interplay between genetics and the microbiome in explaining this variation on a metabolite-by-metabolite level. Here, we perform analyses of variance for each of the 930 blood metabolites robustly detected across a cohort of 1,569 individuals with paired genomic and microbiome data, while controlling for a number of relevant covariates. We find that 595 (64%) of these blood metabolites are significantly associated with either host genetics or the gut microbiome, with 69% of these associations driven solely by the microbiome, 15% driven solely by genetics, and 16% under hybrid genome-microbiome control. Additionally, interaction effects, where a metabolite-microbe association is specific to a particular genetic background, are quite common, albeit with modest effect sizes. This knowledge will help to guide targeted interventions designed to alter the composition of the human blood metabolome. Editor summary: Diener and Dai et al analyse blood metabolites from 1569 individuals and identify metabolites associated with the microbiome, host genetics, or under hybrid genetic-microbiome control Introduction The human blood metabolome is shaped by a combination of intrinsic and extrinsic forces and constitutes the primary resource pool for human metabolism. While the composition of the plasma metabolite pool is strongly driven by diet, lifestyle, and the ecology of the gut microbiota, the fate of individual metabolites in the blood is often tightly regulated by host genetics ^[40]1. Genetic variants are known to alter the human blood metabolome in several disease-relevant contexts. For example, certain deleterious alleles impact cholesterol levels and contribute to hypercholesterolemia, while other well-known alleles drive phenylalanine accumulation and lead to phenylketonuria ^[41]2,[42]3. The majority of genetic variants associated with blood metabolite levels affect either enzymes or solute carriers, thus directly influencing an individual’s ability to produce, consume, secrete, or absorb small molecules ^[43]4,[44]5. Many of the currently identified metabolome-associated genetic variants are pleiotropic ^[45]6, which indicates a vast interconnectedness of the metabolome across different systems of the body. Recent studies have identified the human gut microbiome as a major determinant of blood metabolite variability ^[46]1,[47]7. In this prior work, host genetics and gut microbiome composition were found to associate with plasma metabolite levels in a largely orthogonal manner, which is supported by the observation that, for the most part, host genetics and the gut microbiota do not tend to associate strongly with one another ^[48]8, even though a number of clear examples of genome-microbiome associations have been shown ^[49]9. Despite these broad, multivariate regression results showing a global correspondence between genomics, the microbiome, and the metabolome, little is known about how this maps onto individual blood metabolite levels. While one might expect that the variation in metabolites produced by bacteria in the human gut is mostly governed by the microbiota and that metabolites specific to human metabolic processes are associated more with host genetics, there are examples, as in the case of microbe-host co-metabolites like conjugated secondary bile acids ^[50]10, where the story becomes more complicated. Intestinal signaling, such as the activation of FXR and TGR5 receptors in the human gut, can regulate glucose, insulin, cholesterol, and bile acid homeostasis ^[51]11,[52]12. Furthermore, many microbiome-derived metabolites are modified by hepatic enzymes and converted into a variety of conjugated compounds, such as hippurate or polyamines ^[53]13,[54]14. These multi-layer filters on microbe-host co-metabolism make it challenging to map blood metabolites to potential microbial precursors. For example, blood cholesterol levels are affected by host genetic variants, but they can also be regulated through intestinal signaling, involving the production of secondary bile acids by gut commensals, or through gut microbial cholesterol dehydrogenases that funnel host and dietary cholesterol into fecal coprostanol ^[55]15,[56]16. Thus, even though host genetics and the microbiome appear to have largely orthogonal effects on the blood metabolome as a whole, they nonetheless act on an overlapping set of metabolites, potentially explaining independent components of the variance of these compounds. Additionally, even in the absence of strong heritability of human gut commensals, there are key examples where genetics and the microbiome may interact in particular disease conditions, such as cystic fibrosis ^[57]17. This raises the question of whether there are instances where microbiome-metabolome associations are modulated by the genetic background of the host in a similar manner as other gene-environment interactions ^[58]18, where a genetic variant may augment or attenuate an individual’s risk of developing a particular disease when exposed to a specific environmental risk factor. Here, we studied variability in plasma metabolite abundances in a cohort of 1,569 healthy individuals from the U.S. We find extensive interplay between the genetic and gut microbial determinants of individual metabolite levels in the blood, which provides deep insights into the microbe-host co-metabolisms that govern the composition of the human blood metabolome. Results Metabolites associated with genetics and the microbiome We performed genome-metabolite and microbiome-metabolite association tests for a healthy cohort of 1,569 individuals with paired blood and fecal samples taken within 30 days of each other. The cohort consisted of individuals in the U.S., mostly from the Pacific West, that had enrolled in a scientific wellness program of the former company, Arivale, Inc., between 2016 and 2019. These individuals were generally healthy, between 18 and 87 years old, predominantly female (63.5%) and with a mean±SD body mass index (BMI) of 28.2±6.6. Metabolite abundances across the data set were first adjusted for common confounders and the residuals were used for further downstream analyses (927 metabolites associated with at least one confounding variable, FDR-corrected p<0.05, mean R^2=0.12±0.06). The cohort was split randomly into a training cohort of 1,000 individuals and a validation cohort of 569 individuals. Microbiome-metabolite and genome-metabolite association analyses were performed within the training cohort and the resulting training cohort coefficients were fixed in the validation cohort models to generate out-of-sample R^2 estimates ([59]Fig. 1A). Figure 1. Study design and cross-sectional variances in metabolites explained by host genetics, gut bacterial genera, or both. Figure 1. [60]Open in a new tab (A) The study comprised a metabolome-genome-wide association analysis and a metabolome-microbiome-wide association analysis (regressions on centered log-ratio transformed bacterial genus-level abundances) performed in the Arivale cohort. Explained variance for a specific blood metabolite can be partitioned into host genetic and microbiome associated components. The cohort was split into a training cohort of 1,000 individuals and a validation cohort of 569 individuals. (B) Fraction of variance (R^2) explained in the training cohort by host genetic or microbiome features across the 595 metabolites with significant associations with either the genome or the microbiome (FDR-corrected p<0.05). Specific sub-groups are annotated in the plot, including metabolites only associated with host genetics, metabolites only associated with gut genera, or metabolites significantly associated with both (i.e., hybrid). (C) Percentage of the 595 significant metabolites associated only with host genetics, only with the microbiome, or with both (hybrid) in the training cohort. (D) Fraction of variance (R^2) explained in the validation cohort. Coefficients were fixed to those from the training cohort to estimate out-of-sample R^2 values. (E) Percentage of the 595 significant metabolites associated only with host genetics, only with the microbiome, or with both (hybrid) in the validation cohort. (F) Comparison of the R^2 values obtained in this study with the R^2 values obtained in Bar, Korem et al. ^[61]1 for 159 metabolites identified as significant in both studies. The solid line denotes an ordinary least squares regression line, and the error bands denotes the 95% confidence interval of the regression. P-values were obtained by a Pearson correlation test and “r” denotes the Pearson product-moment correlation coefficient. To identify associations between host genetics and individual circulating blood metabolite levels, we performed genome-wide association analyses on 7.68 million common variants (Minor Allele Frequency ≥ 1%) in the 1,000 individuals from the training cohort for each of the 930 blood metabolites that passed our prevalence threshold. A total of 182 metabolites of the 930 tested (19.6%) were associated with one or more of the 297 independent lead variants that passed the genome-wide significance threshold (p < 5.37 × 10^−11; see Fig. S1). Of these 297 variants, 79 were in intergenic regions and 218 variants mapped to 123 genes, including those associated with inherited metabolic disorders such as ACADS, ALMS1, and GCKR. Potential loss-of-function consequences were predicted for two variants: stop gained for rs183603441 in HYKK, and a splice donor variant for rs61825638 in KMO. Missense mutations were predicted for 25 variants, affecting 39 metabolites. However, the majority of the metabolite-associated variants found within genes were synonymous. In particular, lactosylceramide lactosyl-N-nervonoyl-sphingosine (d18:1/24:1) was associated with lead variants rs3752246 (P = 9.8 × 10^−14) and rs3752248 (P = 5.6 × 10^−13) located in ABCA7. rs3752246 is a missense variant and has been previously identified as a risk variant for late-onset Alzheimer’s disease ^[62]19,[63]20. Prior studies have identified altered levels of ceramides in the blood and brain of individuals with Alzheimer’s Disease, with the d18:1/24:1 ceramide showing a strong and robust association with Alzheimer’s Disease in a recent meta-analysis ^[64]21. Our results further underscore a possible shared genetic architecture underlying lactosyl-N-nervonoyl-sphingosine (d18:1/24:1) levels and late-onset Alzheimer’s disease. Associations with more than one metabolite were identified for 70 of the 297 significant lead variants (23.6%), revealing the extent of pleiotropy in our cohort. The effect sizes of pleiotropic variants are generally higher than non-pleiotropic variants (median effect size difference = 0.08, P = 0.04; two-sided Mann-Whitney U-test), while the distributions of minor allele frequencies were similar between the two types of variants. Overall, the 70 pleiotropic lead variants were associated with 92 metabolites (52.2% of all significant genetically-associated metabolites). Four variants (rs4149056, rs1047891, rs148982377, and rs45446698), of which rs4149056 and rs1047891 are missense variants, were each associated with more than 8 metabolites. rs4149056 in SLCO1B1 (solute transporter in liver) was associated with 8 metabolites, including primary and secondary bile acids, conjugates of polyunsaturated fatty acids, and free fatty acids. rs1047891 in CPS1 (mitochondrial enzyme involved in the urea cycle) was found to be associated with 8 metabolites, many of which were conjugated to glycine and glutamine moieties and were highly correlated. rs148982377 in ZNF789 and rs45446698 in ZSCAN25 (transcription factors) were associated with the same set of 9 steroid hormone metabolites, primarily conjugates of DHEA and androsterone. Analysis at the gene level reveals a further degree of pleiotropy as well as polygenicity. Using annotations from dbSNP, we identified associations between genes and metabolites. 182 metabolites were associated with at least one gene, with 123 significant genes identified in total. Of these, 35 genes (27.8%) were associated with more than one metabolite. Individual pleiotropic genes tend to be associated with metabolites with similar biochemical properties, providing insights into unidentified metabolites. For example, the gene cluster of UGT1A1, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, and UGT1A10 was associated with 10 metabolites, including bilirubin, biliverdin, and five unidentified compounds. These genes encode UDP-glucuronosyltransferase (UGT) enzymes, which metabolize bilirubin ^[65]22. Consistent with our results, Metabolon later determined that these five unidentified compounds were degradation products of bilirubin. In addition to pleiotropy, we also observe instances of polygenic associations for several metabolites. For example, ethylmalonate, a breakdown product of butyrate, was significantly associated with six different genes: ACADS, SPPL3, CABP1, OASL, DYNLL1, and MLEC. In order to identify associations between the gut microbiome and circulating blood metabolite levels, we performed regressions using centered-log-ratio (CLR) transformed bacterial genus-level abundances as independent variables, while correcting for sex, age, sex-age interactions, BMI, and genetic kinship (i.e., the first 5 principal components of the genetic distance matrix). A majority of the tested metabolites had significant associations with at least one bacterial genus in the gut microbiome (508/930 = 54.6%, FDR-corrected p<0.05). Here, the average fraction of explained variance (R^2) was lower than genetic associations (mean R^2 of 0.04 and 0.11 for microbiome and host genetic features in the training set, respectively; 0.02 vs. 0.09 in the validation set, respectively). However, these R^2 values are not directly comparable across genomic and microbiome models due to the different significance thresholds used (Bonferroni for the GWAS and FDR-correction for the microbiome). The blood metabolites with the largest fraction of variance explained by microbial features were dominated by compounds involved in bacterial metabolism of aromatic and phenolic compounds, such as cinnamoylglycine, hydrocinnamate, and hippurate (R^2>0.25 in the training cohort and R^2>0.15 in the validation cohort). Catabolism of phenylalanine and phenylacetate to cinnamate and benzoate is exclusive to the microbiome and is conspicuously absent in germ-free mice ^[66]23,[67]24. Hippurate is formed in the liver by conjugating benzoate to a glycine and blood levels of hippurate have been positively associated with gut microbiome alpha-diversity ^[68]7,[69]25 and with overall metabolic health ^[70]26. Additionally, we found several products of bacterial protein fermentation, such as p-cresol derivatives and the tryptophan breakdown product indolepropionate, associated with the gut microbiome ([71]Tables S1–[72]S2). Furthermore, we found that all secondary bile acids identified in plasma were significantly associated with the gut microbiome (FDR-corrected p<0.05), which is expected because primarily bile acids are deconjugated into secondary bile acids by bile salt hydrolases expressed by gut commensal bacteria ^[73]27. Here, isoursodeoxycholate had the largest fraction of variance explained by gut microbiome composition (R^2 of 0.36 and 0.3 in the training and validation cohorts, respectively). R^2 estimates from the current study showed good agreement with prior estimates from an Israeli cohort^[74]1 for microbiome-metabolite associations (Pearson r=0.5, [75]Fig. 1F) and excellent agreement for genome-metabolite associations (Pearson r=0.9, [76]Fig. 1F). The largest differences in microbiome-metabolite variances across cohorts were observed mostly for xanthine derivatives and may be due to different microbiome sequencing strategies, confounder-adjustments, different statistical modeling approaches, or differences between serum (Bar, Korem, et al.) and plasma (this study). Additive contributions of genetic and microbial factors We identified a total of 595 out of 930 plasma metabolites (64%) that were associated with either genetic factors, microbial factors, or both ([77]Fig. 1A–[78]B). Most of these metabolites (508) were significantly associated with the microbiome. 16.3% and 11.4% of these metabolites showed “hybrid” associations in the training and validation cohorts, respectively, meaning they were significantly associated with both genetic and microbial factors (i.e., >1% of total explained variance explained by both sets of features, [79]Fig. 1B–[80]E). In particular, we found that about ⅓ − ½ of all metabolites with a significant genetic association also showed a microbial association (i.e., 82/161 and 57/160 metabolites associated with genetic variants in the training and validation cohorts, respectively). Conversely, no more than ⅕ of all metabolites with a microbiome association also showed a hybrid association with host genetics across training and validation cohorts. Consistent with the average explained variances of metabolites reported above for genetics and the microbiome, hybrid metabolites with particularly large total explained variances (i.e., >20%) showed a tendency to have higher genetic R^2 values than microbial R^2 values (see [81]Fig. 1B,[82]D and [83]Fig. 2A). However, as mentioned above, these patterns should be interpreted with caution because of the different significance thresholds used for microbiome and genetic association tests. Figure 2. Metabolites associated with both genetics and the microbiome. Figure 2. [84]Open in a new tab (A) The 20 metabolites with the highest total R^2 in the validation cohort that were significantly associated with both host genetics and the microbiome in the training cohort. Blue and green colored bars denote individual R^2 values in genetics-only (blue) and microbiome-only (green) regression models, respectively. Gray bars denote R^2 values from regression using joined genetics and microbiome data. (B) R^2 values obtained by either adding individual contributions of genetics and the microbiome (additive) or by performing a joint regression (joint) for all metabolites in the hybrid group with an identified standard. The difference between the two groups indicates a very small, but nonetheless significant, overlap in variance explained by genetics and the microbiome (n=82 metabolites in each group, two-sided Mann-Whitney test). However, the variances explained by host genetics and the microbiome were largely additive. Boxes denote the interquartile range (IQR) and the center line denotes the median. Whiskers extend up to the largest and smallest point within 1.5 IQRs. Stars denote significance (*** - p<0.001). Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity. In order to test whether genetic and microbial factors contained redundant information, we compared a joint genetics-microbiome model to individual genetic and microbial models. If the overlap in variance explained between genetic and microbial feature sets was large, the joint model R^2 value would be substantially smaller than the sum of the individual model R^2 values (see [85]Fig. 2A). Although we found that the difference in R^2 between the joint model and the sum of the individual model R^2 values was statistically significant in the validation cohort across a very large number of models, the magnitude of this difference was extremely small (median difference in R^2<0.003), indicating that genetics and the microbiome explain nearly exclusive components of the variance for a given metabolite. This result is consistent with prior work showing that the variances in the blood metabolome explained by the microbiome and host genetics were largely orthogonal ^[86]1,[87]8. However, here we demonstrate that this is not only true globally, but at the individual metabolite level as well. Genome-microbiome interaction effects influence metabolites Finally, we investigated whether the genetic background of the host could modulate microbiome-metabolite associations (see [88]Fig. 3A). To this end, we tested all genetic variant / bacterial genus / metabolite triplets across our full cohort (N=1,569), filtering for only those variant-metabolite pairs that were previously detected in the genome-wide associations studies (i.e., 23,126 triplets in total). We employed a strategy similar to gene-environment interaction studies, using the CLR transformed abundances of microbial genera as the environmental variable, while adjusting for the covariates mentioned above ^[89]28. 466 interaction effects were deemed significant under an FDR-corrected p-value cutoff of 0.05, involving 69 distinct metabolites. Though gene-microbiome interactions were quite common, they only explained very small fractions of plasma metabolite variability (mean R^2=0.005) across all genotypes. However, gene-microbiome-metabolite interactions often showed appreciable R^2 in the homozygous minor allele genotypes, where they explained up to 16% of the metabolite variances ([90]Fig. 4). Figure 3. Gene-microbiome interactions. Figure 3. [91]Open in a new tab (A) Gene-microbiome interactions occur when the correlation between a gut bacterial genus and a blood metabolite is itself conditional on a specific allele pairing. (B) Pathway enrichment analysis for metabolites involved in significant gene-microbe interactions. (C) Bacterial family-level enrichment analysis for taxa involved in significant gene-microbe interactions. (D) Host gene enrichment analysis for genes involved in significant gene-microbe interactions. In (B-D) red circles denote prevalence across all tested features (i.e., background prevalence) and blue triangles denote prevalence in only those features with significant interaction terms. Stars denote significantly enriched features under a one-sided hypergeometric test with a Benjamini-Hochberg correction for multiple testing (* - p<0.05, ** - p<0.01, *** - p<0.001). (B-D) include data from the entire cohort (n=1,569). Figure 4. Gene-microbiome interactions explain variation in blood metabolite levels. Figure 4. [92]Open in a new tab Shown are six selected examples where the genomic background modulates the associations between bacterial genus-level abundances and metabolite levels. In (A-F) metabolite abundances were log-transformed and adjusted for common confounders described above. In (A-F) “ρ” denotes the Spearman’s Rank-Order correlation coefficient of the regression and “p” denotes the p-value under a Spearman’s correlation test. The solid line indicates an ordinary least squares regression line relating the transformed metabolite abundance with the transformed genus abundance within each group. The shaded area is the 95% confidence interval of the regression. Associations were corrected for sex, age, age^2, sex:age, and sex:age^2 interactions, BMI, microbiome vendor, metabolomics batch, and the first 5 principal components of genetic ancestry (N=1,569). Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity. Gene-microbiome interactions were significantly enriched in metabolites involved in fatty acid metabolism, phenylalanine and tyrosine metabolism, pyrimidine metabolism, and glycine metabolism ([93]Fig. 3C, one-sided hypergeometric test, FDR-corrected p<0.05) and showed enrichment in several host genes, such as CPS1, AGXT2, and SLC2A9 ([94]Fig. 3E, one-sided hypergeometric test FDR-corrected p<0.05). Additionally, we observed enrichment for interactions that involved the Ruminococcaceae family ([95]Fig. 3D). We observed several instances of clinically relevant metabolites that showed stronger microbe-metabolite associations in the homozygous minor allele genotype. For example, we saw interactions with metabolites involved in a variety of disease conditions, such as hepatic failure, mitohormesis, cancer, and inflammatory diseases ([96]Tables S3) ^[97]29–[98]31. In particular, N-acetyltyrosine levels were associated with Ruminococcaceae UCG-002 and Bacteroides, but only in the homozygous minor allele genotypes for variants in the ALMS1 gene and in an unchracterized locus on chromosome 2 ([99]Fig. 4A, [100]E). Glycine levels switched from a slight negative correlation with Ruminococcaceae UBA1819 in the homozygous major allele genotype for a locus in the CPS1 gene to a stronger positive correlation within the homozygous minor allele genotype ([101]Fig. 4B). As a final example, sphingomyelin levels were associated with Holdemania, but only in the homozygous minor allele genotype of a variant in the SYNE2 gene ([102]Fig. 4C). All of the selected interactions could also be observed in the unadjusted data (Fig. S2). Genetics associate with conjugated secondary bile acids Having mapped plasma metabolite variability to genetic and microbial factors, we now asked whether the observed partitioning of variances explained may change along pathways that involve host-microbe co-metabolisms. To this end, we investigated the metabolism of secondary bile acids. Secondary bile acids are formed in the large intestine via microbial deconjugation of primary bile acids, reabsorbed into the bloodstream via the portal vein, and further metabolized in the liver ^[103]32. Thus, secondary bile acid levels in the bloodstream are influenced by both the microbiome and the host. We investigated whether individual bile acid species variances were predominantly explained by genetic factors, microbial factors, or both. While unconjugated secondary bile acids were exclusively associated with the gut microbiome, 4 out of 6 detected secondary bile acid conjugates showed strong genetic components of their variances in both the training and validation cohorts ([104]Fig. 5A). In particular, plasma deoxycholate and glycochenodeoxycholate abundances showed no significant genetic contribution, whereas more than 40% of the variance in the host-conjugated deoxycholate glucuronide and more than 20% of the variance in host-conjugated glycochenodexycholate glucuronide was explained by host genetics ([105]Fig. 5B). Other glucuronidated non-bile-acid compounds, such as p-cresol glucuronide, did not show this kind of pattern (see [106]Tables S1–[107]S2). Modifications like glucuronidation or sulfation usually occur in the liver and are used to mark metabolites for excretion in feces or urine ^[108]33,[109]34. These results indicate that the blood levels of conjugated secondary bile acids are under strong genetic control, while blood levels of unconjugated secondary bile acids are not. Figure 5. Host genetic and gut microbial associations with secondary bile acids. Figure 5. [110]Open in a new tab (A-C) Explained variances for unconjugated secondary bile acids and hepatic secondary bile acid conjugates in the training cohort (A), the validation cohort (B) and in Bar, Korem et. al.^[111]1 (C). (A-C) Gray lines connect deoxycholate and glycochenodeoxycholate with their glucuronidated forms and colored boxes denote unconjugated and conjugated secondary bile acids. (D) Associations between bacterial genera and secondary bile acids (both conjugated and unconjugated) in the training cohort. Only significant associations are shown (FDR-corrected and confounder-adjusted p<0.05, two-sided Pearson correlation test). Fill colors denote the Pearson product-moment correlation coefficients (see legend). (E) Colored cells denote associations that passed a genome-wide significance threshold in the GWAS and the color denotes the estimated coefficient in the GWAS regression. In (A-B, D-E) associations and variances were corrected for sex, age, age^2, sex:age, and sex:age^2 interactions, BMI, microbiome vendor, metabolomics batch, and the first 5 principal components of genetic ancestry. Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity. Most hepatically modified secondary bile acids showed associations with both genetic and microbial features. One exception was glycochenodeoxycholate glucuronide, which was only associated with genetics and not with the microbiome, even though the unconjugated form, glycochenodeoxycholate, was only associated with the microbiome ([112]Fig. 5A). This result was consistent across the training and validation cohorts and in the Bar, Korem, et al. data set, where the same transition to genetic association after glucuronidation could be observed ([113]Fig. 5A–[114]C). The genomic variant rs4149056 was associated with 3 of the 6 modified secondary bile acids and has been shown previously to affect the abundance of bile acids in plasma ^[115]35 (see [116]Fig. 5E). This variant is located in the solute carrier protein SLCO1B1, which is expressed in the liver ^[117]36 and transports secondary bile acids, with a preference for sulfated bile acids and bile salts ^[118]37. Several other variants in SLCO1B1 were associated with secondary bile acid derivatives. The two detected glucuronidated secondary bile acids were associated with distinctly different sets of SNPs. Glycochenodeoxycholate glucuronide plasma abundances were mostly affected by variants in the SLCO1B3-SLC1B7 gene cluster, whereas we found several variants located on chromosome 4, many of which were in the TMPRSS11E serine protease-encoding gene (and a number of intergenic variants), that specifically affected deoxycholate glucuronide levels ([119]Fig. 5E). Furthermore, we also identified at least two significant genome-microbiome-metabolite interactions affecting the levels of deoxycholate glucuronide and glycochenodeoxycholate glucuronide. An uncharacterized variant on chromosome 4 was associated with a decreased level of deoxycholate glucuronide in the homozygous minor allele genotype, but this effect was mitigated in individuals with higher levels of Eggerthella in their gut microbiomes ([120]Fig. 4D). Furthermore, a variant in the SLCO1B1 gene showed a positive correlation with Ruminoclostrium only in the homozygous major allele genotype, although this ‘major’ allele was actually less common in our cohort ([121]Fig. 4E). Thus, whereas cross-sectional variations in unmodified secondary bile acid levels were not explained by host genetic variation, plasma abundances of several secondary bile acid conjugates were significantly associated with a diverse set of compound-specific genetic variants on chromosomes 4 and 12. Additionally, we observed that unmodified deoxycholate and host-conjugated deoxycholate glucuronide where associated with mutually exclusive sets of bacterial genera, despite sharing the same secondary bile acid backbone ([122]Fig. 5D). Sphingosines and ceramides show heterogeneous associations We next asked whether hybrid associations would also be common in another disease-relevant class of blood metabolites: ceramides. High ceramide levels have been shown to be associated with insulin resistance, hypercholesterolemia, liver steatosis, and the formation of lipid rafts ^[123]38,[124]39. Furthermore, some classes of ceramides in the blood increase the risk for late-onset Alzheimer’s disease, as they are neurotoxic and induce apoptosis ^[125]40,[126]41. Ceramides are the simplest of the sphingolipids and are formed either by de novo synthesis from sphingosine or by hydrolysis of sphingomyelin molecules ^[127]42. While ceramides are rarely found in bacteria, many bacteria in the Bacteroidetes phylum can synthesize sphingolipids, which have been shown to be taken up and processed by human epithelial cells in vitro ^[128]43. Thus, we asked whether variation in ceramide and sphingosine derivatives levels in blood was significantly explained by microbial genera, host genetic factors, or a combination of both. We observed a large degree of heterogeneity in variance partitioning in ceramide and sphingosine molecules ([129]Fig. 6). Variance in sphingosine itself was only weakly explained by the composition of the gut microbiome (R^2=0.01), with Streptococcus as the only significant microbiome-related association. However, other intermediates in ceramide synthesis, such as sphinganine derivatives, showed stronger microbiome and genome associations (i.e. up to 5% and 2% of variance explained in training and validation cohorts, respectively; [130]Fig. 6A–[131]B). Ceramides showed a broad range of explained variances. Whereas most ceramides were associated with the gut microbiome, a small subset had additional genetic components to their variances, such as lactosyl-N-nervonoyl-sphingosine and ceramide (d16:1/24:1, d18:1/22:1), both of which had more than 5% of their variation explained by genetic factors in the training and validation cohorts. Both of these lipid species are examples of very long-chain fatty-acyl sphingolipids. While shorter fatty acids (≤C18) are prevalent in the human diet ^[132]44, often serving as preferential substrates for ceramide synthesis by certain gut microbes, longer chain fatty acids, such as nervonic acid (24:1), are often the product of elongation by host enzymes before being incorporated into sphingolipids that are primarily found in brain tissue ^[133]45. This result was corroborated in Bar, Korem, et al., where only short-chain sphingolipids were observed to be associated with the microbiome ^[134]1. In summary, we observed broad heterogeneity in the fraction of variance explained by the microbiome and host genetics, where a difference in the fatty acid chain length can distinguish circulating ceramides associated with gut microbial genera (i.e., shorter chains) from those associated with host genetics (i.e., longer chains). Figure 6. Host genetic and gut microbial associations with sphingosine and ceramides. Figure 6. [135]Open in a new tab (A-C) Explained variances for sphingosine, sphingosine intermediates, and ceramides in the training cohort (A), the validation cohort (B) and the study from Bar, Korem et. al. (C). (D) Associations between bacterial genera and sphingosines/ceramides in the training cohort. Only significant associations are shown (FDR-corrected and confounder-adjusted p<0.05, two-sided Pearson correlation test). Fill colors denote the Pearson product-moment correlation coefficients (see legend). (E) Colored cells denote associations that passed a genome-wide significance threshold in the GWAS and the color denotes the estimated coefficient in the GWAS regression. In (A-B, D-E) associations and variances were corrected for sex, age, age^2, sex:age, and sex:age^2 interactions, BMI, microbiome vendor, metabolomics batch, and the first 5 principal components of genetic ancestry. Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity. Discussion In this study, we partition the cross-sectional variance explained in individual blood metabolite levels into their host genetic and gut microbiome components, across a population of 1,569 generally healthy U.S. residents with paired genomic, microbiome, and blood metabolomic data. Prior studies have established that the plasma metabolome is intimately connected to host genetics as well as to the gut microbiome, and that overall genetic and microbial influences on the metabolome appear to be orthogonal. Similarly, we quantified the overlap between microbial and genetic variance components for a given metabolite and found that it was significant but very small (<0.03% of explained variance). This result has important implications for Mendelian randomization studies, which rely on this overlap (i.e., non-additivity), and explains why previous studies required very large cohorts to detect very few putatively causal effects ^[136]46. Until now, it has been unclear as to whether or not genetics and the microbiome affect the levels of mutually exclusive sets of metabolites in human blood or act simultaneously on individual metabolites. Most of the detected blood metabolites (508/930) showed significant associations with the gut microbiome. We found that hybrid genome-microbiome models were common and affected about 10–16% of all metabolites associated with either host genetics or the microbiome. Putting a more exact estimate to this proportion will require larger studies than the one presented here because hybrid associations were enriched in metabolites with small microbiome variance components (R^2<0.02), many of which we were likely underpowered to detect (e.g., as illustrated by the lower proportion of hybrid cases in the smaller validation cohort). We observed that between one third and one half of blood metabolites associated with host genetics included significant hybrid associations with the gut microbiome, while only one in five metabolites associated with the microbiome showed an additional hybrid genetic component to their variances. Thus, while both genetic and microbiome variation were important to explaining variation in the blood metabolome, the microbiome (and the myriad factors associated with variation in the microbiome, like diet and lifestyle) appears to be the dominant driving force behind variation in circulating metabolites. Additionally, we found some evidence for gene-microbiome interactions, where a particular metabolite-microbiome association was modulated by the genetic background of the host. In general, these interactions only explained small fractions of metabolite variance (i.e., <2%). However, these interactions often appear to result from relatively strong associations within the homozygous minor allele genotype, but we had limited numbers of these minor allele carriers to assess these associations robustly. That being said, if we look within the homozygous minor allele genotypes, we observed up to 16% of a metabolite’s variance explained by a given bacterial genus. In particular, N-acetyl-tyrosine, glycine, and sphingomyelin abundances all showed strong associations with a distinct set of gut bacterial genera, but only in the homozygous minor allele genotypes. Thus, it appears that genetically-determined deviations from a particular quantitative trait may be modulated, or even induced, by a particular microbial community composition in the gut. These kinds of genome-microbiome interaction effects could help guide the design of microbiome-targeted therapeutics that mitigate host genetic disease risk. For instance, we found that glycine abundance was modulated by 24 gene-microbiome-metabolome interactions ([137]Fig. 4A and [138]Table S3), mostly involving variants in the CPS1 gene. Glycine-addiction is a hallmark of cancer proliferation, and restriction of glycine and serine can reduce tumor growth in murine xenograft and allograft models ^[139]29,[140]47. Thus, there may be benefit to microbiome-targeted interventions in individuals with the observed variants in the CPS1 gene to lower circulating glycine levels. However, given the generally low prevalence of the minor allele and the high level of heterogeneity in gut bacterial community composition, targeted interventional studies will be required to validate these interactions in vivo. Prior studies of microbe-host co-metabolites derived from microbial precursors in blood, like hippurate or p-cresol sulfate (i.e., hepatically modified forms of microbially-derived metabolites), hinted that the cross-sectional variance in these molecules might only be associated with the microbiome and not with host genetics ^[141]13,[142]25,[143]48. Here we show that those observations do not extrapolate to all microbe-host co-metabolites. Indeed, we found that much of the cross-sectional variation in many metabolites derived from gut microbial precursors was explained by host genetics. For example, while unconjugated secondary bile acids in the bloodstream are only associated with the gut microbiome, their glucuronidated or sulfated derivatives, formed in the liver, were often strongly influenced by host genetic variation. Most of the variance in deoxycholate glucuronide levels across the current study cohort could be explained by a combination of genetic and microbial factors, while unconjugated deoxycholate was solely associated with the microbiome. Deoxycholate and deoxycholate glucuronide both showed associations with the microbiome, but these associations were with different sets of bacterial genera. Glucuronidation facilitates excretion into urine and feces and prior work in rats has shown that glucuronidated bile acids are reabsorbed less efficiently than unmodified secondary bile acids in the intestine ^[144]49. The gut metagenome encodes a variety of β-glucuronidases ^[145]50,[146]51 which likely enable gut commensals to use these bile-acid-conjugated-glucuronides as a carbon source. Conversely, free secondary bile acids can be toxic to many bacterial taxa ^[147]52. Thus, bile acids can directly drive changes in the gut microbiome, either by acting as carbon sources that promote growth or as toxic compounds that inhibit growth, which may explain the very different subsets of gut genera associated with conjugated and unconjugated forms of the same secondary bile acid ([148]Fig. 5D). Another intriguing finding from our analyses was the variable association of certain plasma sphingolipids with either host genetics or the gut microbiome, depending on the fatty-acyl groups comprising each lipid species. Ceramides with a ≤C18 fatty-acyl group showed stronger correspondence to the gut microbiome, consistent with the high prevalence of these fatty acids in the diet ^[149]44 and their preferential incorporation into ceramides by certain gut taxa ^[150]53. On the other hand, ceramides with very long chain fatty-acyl groups (22:1,24:1), that are most abundant in brain tissue and often synthesized through elongation by host enzymes ^[151]45, showed a stronger correspondence with host genetics. Importantly, ceramides with different fatty-acyl chain lengths have been implicated in a number of human diseases, including Alzheimer’s disease, depression and mood disorders ^[152]54. Distinguishing between which ceramides are under the control of diet and the gut microbiome versus genetic predisposition may aid in the design of new and improved precision therapeutics. It should be noted that the current study only included individuals from the U.S. who were predominantly of European descent. While our results are consistent with results obtained from cohorts in Israel and Sweden ^[153]1,[154]55, future studies in more diverse populations will be required to see whether or not the reported observations replicate more broadly. Finally, while we included many highly relevant covariates in our regression analyses (i.e. age, sex, BMI, and genetic ancestry), many of the observed microbe-metabolite associations are likely confounded with lifestyle and dietary habits, which were not comprehensively tracked in the current study population and can strongly influence the composition of the gut microbiome. Overall, our analyses show that the plasma metabolome is influenced by the interplay between genetic and microbial factors, where the abundances of individual microbially-derived metabolites absorbed in the gut is often affected additively by both host genetic variation and by variation in the ecology of the gut. Furthermore, many microbe-metabolite associations are dependent upon host genetic background. These hybrid genome-microbiome-metabolite models provide unique insights into the forces underlying variation in the human blood metabolome and can suggest possible therapeutic strategies. Specifically, we suggest that disease-relevant blood metabolites strongly associated with the microbiome may be modifiable through dietary, probiotic, prebiotic or lifestyle interventions, whereas metabolites under strong genetic control may require pharmacological interventions that target host metabolic pathways. Understanding which of these circulating small molecules fall predominantly under host versus microbiome control will help to guide interventions designed to prevent and/or treat a range of diseases. Methods Institutional Review Board Approval for the Study Procedures for this study were reviewed and approved by the Western Institutional Review Board with the Institutional Review Board (IRB) study number 20170658 for the Institute for Systems Biology and 1178906 for Arivale. Cohort Description All study participants were subscribers of the Arivale Scientific Wellness program who provided consent and research authorization allowing the use of their anonymized, de-identified data in research. Arivale, Inc. (USA), was a consumer scientific wellness company that shut down in 2019. The Arivale program is described in detail in Zubair et. al. ^[155]56. In brief, participants signed up for a comprehensive deep phenotyping program coupled with personalized data-driven wellness coaching in order to improve overall health and wellness. Baseline blood draws were taken by trained phlebotomists at LabCorp or Quest service centers and paired with at-home fecal sampling (see methodological details below). The cohort contained 997 individuals with female sex assigned at birth and 572 individuals with male sex assigned at birth (self-reported). The average age of the cohort was 49±11 years, and the average BMI was 28.2±6.6. Metabolomics Plasma samples were transferred to Metabolon, Inc. (USA), for sample processing and data acquisition. Untargeted metabolomics was performed using Metabolon’s Global Metabolomics high-performance liquid chromatography (HPLC)–mass spectrometry (MS) platform across a total of 11 batches. In brief, EDTA-plasma samples were thawed on ice and protein was removed using aqueous methanol extraction. The resulting samples were divided into five fractions (1 backup fraction) and placed briefly on TurboVap (Zymark) to remove organic solvents. Measurements were taken using Waters ACQUITY ultra-performance liquid chromatography (UPLC) and Thermo Scientific Q-Exactive high resolution/accuracy mass spectrometer interfaced with a heated electrospay ionization (HESI-II) source and an Orbitrap mass analyzer operated at 35,000 mass resolution. Each of the four sample fractions was reconstituted for measurement in one of four methods. One aliquot was measured under positive ion condition by elution from a C18 column (Waters UPLC BEH C18–2.1×100 mm, 1.7 μm) using water and methanol with 0.05% perfluoropentanoic acid and 0.1% formic acid. The second sample was eluted from the same column but using acetonitrile along with methanol and water to optimize for more hydrophobic compounds. The third aliquot was measured under negative ion optimized conditions and gradient eluted from a C18 column using methanol, water, and 6.5mM ammonium bicarbonate at pH 8. The last aliquot was measured using negative ionization after elution from a HILIC column (Waters UPLC BEH Amide 2.1×150 mm, 1.7 μm) using water, acetonitrile, and 10mM ammonium formate at pH 10.8. MS analyses were run using MS and data-dependent MSn scans with dynamic exclusion with an approximate range of 70–1000 m/z. Assay batch correction was performed by running a selected set of quality control plasma samples in every batch and normalizing the measured abundances to the quality control samples. Metabolon judges the confidence level of annotation on a three-tiered scale. The majority (>80%) of metabolites were annotated by a match to an internal standard of the metabolite run on the same HPLC/MS setup (no specific mark added to the name). Some metabolites were annotated only by a match to a previously published MS spectrum (marked with “*”), and some metabolites were annotated only by a match to a known chemical formula (marked with “**”). Metabolites not annotated on any of those tiers, but detected consistently in different biofluids, are denoted by a “X -” prefix followed by a unique ID to indicate unknown compounds. Metabolite not observed in at least 50% of the samples were removed, resulting in the final list of 930 metabolites. Genome-wide association analysis DNA was extracted from whole blood samples by Covance (Redmond, WA) using a standardized protocol. Whole genome sequencing was performed by Wuxi, Inc. (Shanghai, China) in a CLIA-certified laboratory for 1,569 individuals. Extracted libraries were sequenced using an HiSeq X sequencer (Illumina, San Diego, USA) with 150bp libraries and aiming for >30x coverage. Basecalling and conversion to raw FASTQ files was performed using the Illumina Basespace software. Raw reads were aligned to the hg19 human reference genome with BWA 0.7.12. The GATK HaplotypeCaller with GATK 3.3.0 was used to call individual variants. This included indel local realignment followed by base quality recalibration. This yielded a set of around 7.68M measured SNPs. To account for genetic structure and potential cryptic relatedness in the cohort, we used a linear mixed model to test for associations between genetic variants and metabolite levels. For each metabolite, we performed a genome-wide association study using FastGWA using the hg19 genetic map and the metabolite confounder-adjusted residuals ^[156]57. Linkage disequilibrium scores were extracted for all variants in the study and used as an additional covariate in the genome-wide association study. Metabolome genome-wide significance was called using the Bonferroni corrected p-value threshold of 5.37e-11 (5e-8 / #metabolites) where 5e-8 is the commonly used threshold for genome-wide significance used in the human genomics field ^[157]58. Lead variants were mapped by two-step clumping strategy based on, linkage equilibrium (LD scores) where we first clumped all variants within 250kbp fragments that exceeded LD score cutoff of 0.8, followed by second clumping in 250kbp fragments using an LD score cutoff of 0.3. This resulted in the final set of 297 independent lead variants. Gut Microbiome Sequencing Fecal samples were collected and chemcially preserved using a proprietary at-home swab kit (OMNIgene Gut, DNAGenotek, USA). 250μL of homogenized stool from each sample were subsequently used for DNA extraction, using a MoBio PowerMag Soil DNA isolation kit (QIAGEN, Germany) and the KingFisher Flex instrument. DNA concentrations in each sample were quantified with a QuBit (ThermoFisher, USA) and purity was assessed by measuring the A260/A280 absorbance ratio. Amplification and library preparation were performed by external providers using custom and optimized protocols by either amplifying the V4 region (SecondGenome, USA) or the V3-V4 region (DNAGenotek, USA) of the 16S gene. Sequencing was performed using a MiSeq (Illumina, USA) using either a paired-end 250bp protocol (SecondGenome) or a paired-end 300bp protocol (DNAGenotek). Basecalling was performed using the Illumina Basespace platform which removed the added phiX reads and provided the final FASTQ files used for downstream analysis. Quality of the sequencing reads was assessed by manual inspection of the error rate across sequencing cycles and appropriate length cutoffs of 250bp for the forward reads and 230bp for the reverse reads were chosen based on the profiles. Reads with more than 2 expected errors under the Illumina error model were removed from the analysis along with reads containing ambiguous base calls (“N” nucleotides). More than 97% of reads passed those filters yielding a mean of around 200,000 reads per sample. Filtered and truncated reads were then used to infer amplicon sequence variants using DADA2 ^[158]59. Here error profiles were learned for each sequencing run separately. The resulting ASVs and respective counts were merged and chimeras were removed using the “consensus” strategy implemented in DADA2, which removed around 16% of all reads. Taxonomy assignment of ASVs was performed by using the naive Bayes classifier in DADA2 with the SILVA database (version 128). Species assignments were performed by exact match of the inferred ASVs with the reference 16S gene in SILVA, where possible. About 90% of reads could be classified down to the genus levels and 32% of reads could be classified down to the species level. The total set of samples was then filtered for those individuals that also had paired microbiome, metabolomics, and whole genome shotgun sequencing data available. Where several microbiome time points were available we used the one closest to the blood draw used for generating the metabolomics data. Low abundance taxa, with less than 10 reads on average or appearing in less than 50% of all samples, were removed from the data set prior to analysis. Metabolite-microbiome associations and confounder adjustment Genus-level read counts were scaled using the centered log-ratio (CLR) transform and centered across the 1,569 samples. Metabolite abundances were log-transformed and centered which yielded good similarity to a normal distribution as judged by a QQ-plot. Transformed and scaled metabolite and bacterial genera abundances were then regressed against the set of confounders and the residuals were saved as new dependent variables. For the confounder adjustment we chose the most common intrinsic factors that would affect blood metabolite and fecal microbe abundances. Firstly, we corrected for sex, age, and BMI by including coefficients for biological sex, age, age^2, sex:age, and sex:age^2 interactions, and BMI. Additionally, we included covariates for common batch effects, like the season of the year the samples were obtained, the metabolomics batch (each sample in a batch was processed together), and the vendor for the microbiome sample (DNA Genotek or Second Genome). Finally, we included continuous measures of genetic ancestry as covariates in our analyses, given by the first 5 principal components from an analysis of 100,000 linkage disequilibrium corrected SNPs (minor allele frequency > 5%) calculated with the PC-AiR and PC-Relate methods ^[159]60. The residuals were used for all further analyses as well as visualizations and analyses of variances. After this we performed individual linear regressions between all metabolite-genus pairs to find significant associations between individual metabolites and bacterial genera. P-values were obtained using F-tests and corrected for false discovery rate using the Benjamini and Hochberg method. We then retained the significant associations with an FDR-corrected p<0.05. We also performed the same analysis with alpha-diversity as the independent variable or with alpha-diversity as a covariate to the CLR-transformed bacterial genus abundance to verify that individual genus-level associations were not due to an association with alpha-diversity alone, which we did not observe to be the case. Analysis of explained variances (R^2) Explained variances (R^2) for each metabolite in the training cohort were obtained by ordinary least squares models only containing features that were significantly associated with each metabolite individually in the training cohort. Ergo, we only include genus abundances for those genera with a FDR-corrected p<0.05 in the previous metabolite-microbe associations and only those single nucleotide variants that were identified as significant in the previous genome-wide association study for the particular metabolite. This was done to avoid inflation of the explained variances, as adding random variables to a regression model tends to increase the explained variance. We performed multilinear regressions with only microbiome features, only genetic features, and both genetic and microbiome features to quantify the variance of the transformed and confounder-adjusted metabolite abundances that was explained by each feature type individually and jointly. Consequently, it should be noted that the explained variances in this manuscript refer to the metabolite abundance variance after removing components of the variance explained by the covariates listed above. As such, these R^2 caclulations pose an upper bound of the explained variance of the raw metabolite abundance but are also independent of common confounders, such as age, sex, and BMI and thus should be more generalizable to other populations structures. Overlap between microbiome and genetic features was quantified as the difference between the sum of R^2 of the individual feature type models and the joint model R^2 (R^2[microbiome] + R^2[genetics] − R^2[joint]). In the case of complete independence of microbiome factors and genetics, this difference would be zero and would become positive if there is partial or complete overlap in the variances explained by the microbiome and host genetics. In the validation cohort we used the same regression models but as in the training cohort but fixed all coefficients to the ones from the training cohort except for the intercept term to account for different baseline levels of metabolites between the cohorts and avoid negative R^2 values. This bounds the validation R^2 values to [0, 1] where a metabolite not associated/validated with the identified features from the training cohort will intrinsically receive an R^2 of 0. Genome-microbiome interactions For gene microbiome interactions, we performed ordinary least squares regression of the form [MATH: mi=β0+β1bj+β2sk+γbjsk+δc+ϵi, :MATH] where m[i] denotes the scaled abundance of metabolite i, b[j] the scaled abundance of the bacterial genus j, s[k] the ordinal versions of the allele on variant k (0 - major allele, 1 - heterozygous minor allele, 2 - homozygous major allele), c the vector of confounder covariates, and ε[i] a random normally distributed variable with an expectation of zero. Significance of the interaction was then evaluated by an F-test comparing the full model to a model with a fixed ɣ=0. P-values from all tested metabolite / variant / genus triplets were adjusted for false discovery rate using the Benjamini and Hochberg method and judged as significant with a FDR cutoff p<0.05. Even though individual metabolite and bacterial genus abundances had been adjusted for confounders previously, we still included them in this regression because the product b[j]*s[k] had not been adjusted for the same confounders. Also, to make our analysis computationally feasible, we performed the regressions only for those metabolites, bacterial genera, and genetic variants that had shown at least one significant interaction in the prior analyses. The full cohort was includes in the interaction analysis due to the low prevalence of the minor allele groups. Finally, post-hoc tests for microbe-metabolite interactions within sub-cohorts carrying a specific allele pairing were performed by Pearson tests on the product-moment correlation within each sub-cohort and for those triplets that had shown significant interaction effects. Extended Data Extended Data Figure 1. Genome-wide association study results. Extended Data Figure 1. [160]Open in a new tab Genome-wide association p-values for variants associated with at least one metabolite (only p < 1e-5 shown here). Red dashed line denotes genome-wide significance (p < 5.37e-11). Gray bars denote separation between chromosomes. N=1,000 (training cohort). The full list of p-values from the GWAS analysis can be found in [161]Table S4. Extended Data Figure 2. Extended Data Figure 2. [162]Open in a new tab Gene-microbiome interactions explain variation in blood metabolite levels (unadjusted data). Shown are six selected examples where the genomic background modulates the associations between bacterial genus-level abundances and metabolite levels. In (A-F) metabolite abundances were log-transformed, but no adjustment for common confounders, was performed here. In (A-F) “ρ” denotes the Spearman’s Rank-Order correlation coefficient of the regression and “p” denotes the p-value under a two-sided Spearman’s correlation test. The solid line indicates an ordinary least squares regression line relating the transformed metabolite abundance with the transformed genus abundance within each group. The shaded area is the 95% confidence interval of the regression. Associations were corrected for sex, age, age^2, sex:age, and sex:age^2 interactions, BMI, microbiome vendor, metabolomics batch, and the first 5 principal components of genetic ancestry (N=1,569). Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity. Supplementary Material 1845232_RS_File [163]NIHMS1845232-supplement-1845232_RS_File.pdf^ (1.3MB, pdf) 1845232_Sup_Tab_1-5 Table S1. Training cohort R^2 values for genetics and microbiome for the 595 metabolites with a significant association in the training cohort (N=1,000). Significant associations with bacterial genera were obtained by Pearson correlation tests between CLR-transformed genus abundances and log-transformed metabolite residuals corrected for common confounders (all FDR-corrected p<0.05). Significant associations with genomic variants were obtained by the FastGWA mixed-linear model with a Bonferroni correction across metabolites (p < 5e-8 / #metabolites). Table S2. Validation out-of-sample R^2 values for genetics and microbiome for the 595 metabolites with a significant association in the training cohort (N=569). Explained variances were calculated by using the fixed association coefficients from the training set models. Table S3. Genome-microbiome interactions for blood metabolite regression models with a significant interaction term (466 interactions, N=1,569). P-values were obtained from a two-sided Wald test on the interaction term coefficient γ of an ordinary least squares regression model. Table S4. Significant results from the genome-wide association analysis in the training cohort (N=1,000) along with annotations for variants and metabolites. Significant associations with genomic variants were obtained by the FastGWA mixed-linear model using a sparse genetic-relationship matrix with a Bonferroni correction across metabolites (p < 5e-8 / #metabolites). Table S5. Enrichment tests for genome-microbiome associations. P-values were obtained from one-sided hypergeometric tests and corrected for multiple testing using the Benjamini-Hochberg correction. [164]NIHMS1845232-supplement-1845232_Sup_Tab_1-5.xlsx^ (4.7MB, xlsx) Acknowledgments