Abstract Tumour metabolism is controlled by coordinated changes in metabolite abundance and gene expression, but simultaneous quantification of metabolites and transcripts in primary tissue is rare. To overcome this limitation and to study gene–metabolite covariation in cancer, we assemble the Cancer Atlas of Metabolic Profiles of metabolomic and transcriptomic data from 988 tumour and control specimens spanning 11 cancer types in published and newly generated datasets. Meta-analysis of the Cancer Atlas of Metabolic Profiles reveals two classes of gene–metabolite covariation that transcend cancer types. The first corresponds to gene–metabolite pairs engaged in direct enzyme–substrate interactions, identifying putative genes controlling metabolite pool sizes. A second class of gene–metabolite covariation represents a small number of hub metabolites, including quinolinate and nicotinamide adenine dinucleotide, which correlate to many genes specifically expressed in immune cell populations. These results provide evidence that gene–metabolite covariation in cellularly heterogeneous tissue arises, in part, from both mechanistic interactions between genes and metabolites, and from remodelling of the bulk metabolome in specific immune microenvironments. Subject terms: Cancer metabolism, Data integration, Metabolism, Computational biology and bioinformatics __________________________________________________________________ Benedetti et al. produce a Cancer Atlas of Metabolic Profiles from 988 metabolomic and transcriptomic tissue profiles across 15 studies to analyse and identify gene–metabolite interactions that transcend cancer types. Main Coordinated changes of genetically encoded metabolic enzymes and transporters, and the metabolites they act on, underpin diverse cancer-associated phenomena, including tumorigenesis^[46]1, pluripotency^[47]2,[48]3, the onset of drug resistance^[49]4–[50]6 and the modulation of immune responses^[51]7–[52]11. However, despite the high value of joint profiling of metabolites and gene expression/protein levels, previous large-scale studies of tumour metabolism have overwhelmingly focused on the analysis of gene expression data^[53]12. Conversely, the few instances of multimodal metabolomic and transcriptomic profiling of human tumour specimens have largely been performed in disparate, unrelated studies by a multitude of research teams^[54]13–[55]22. Integration of metabolomic datasets produced in different patient cohorts is challenging due to technical batch effects and the semi-quantitative nature of untargeted metabolomic data (reported in arbitrary units of relative abundance). Thus, both the scarcity of multimodal metabolomic/transcriptomic data from tissue specimens and the challenges of harmonizing available datasets fundamentally impede the discovery of recurrent, coordinated changes in metabolic gene expression and metabolite abundance across cancers. Tumours from diverse cancer types differ in their cell-type composition, vascularization and other factors ultimately influencing metabolism. Yet, they share a convergent set of metabolic alterations^[56]23–[57]27. For example, several meta-analyses of the tumour metabolic transcriptome have identified recurrent upregulation of genes in one-carbon metabolism and oxidative phosphorylation across cancer types^[58]26,[59]28,[60]29. Analogously, meta-analyses of metabolomics data have demonstrated that numerous central carbon metabolites (for example, lactate) and effector metabolites (for example, kynurenine) are at higher abundance in tumour tissue compared to normal tissue across many cancer types^[61]23,[62]30. These studies have illustrated the power of meta-analysis for distilling highly recurrent metabolic phenotypes from heterogeneous data but have left unresolved the question of how metabolic gene expression and metabolite abundance are coordinated and ultimately shape tumour physiology. To systematically investigate gene–metabolite covariation in cancer, we assembled, harmonized and integratively analysed metabolomics and transcriptomics profiles from 988 primary tumour and matched adjacent normal tissue collected in 15 independent studies covering 11 cancer types. The preprocessed and harmonized data constitute the Cancer Atlas of Metabolic Profiles (CAMP), representing what is, to our knowledge, the largest harmonized dataset of multimodal metabolomic and transcriptomic data on primary tumour specimens. The CAMP is publicly available for download on Zenodo (10.5281/zenodo.7150252) and can be interactively explored at [63]https://rezniklab.shinyapps.io/CAMP-shiny-app/. Leveraging the diversity of diseases in our dataset, we designed a concordance-based statistical meta-analysis approach to discover instances of gene–metabolite interactions (GMIs) that transcended cancer type. This revealed two distinct classes of GMIs: First, we identified a small number of strong interactions between enzymes and metabolites involved in the same or subsequent reactions (‘proximal’ GMIs), suggesting that these enzymes are the primary determinants of their respective metabolite pool sizes. A second group of GMIs consisted of a small number of metabolites broadly correlated to large numbers of genes. Interestingly, this second class of GMIs was enriched for genes specifically expressed in immune cells, and for metabolites related to nicotinamide adenine dinucleotide (NAD^+), a pleiotropic metabolite that acts both as a central cofactor in metabolism^[64]31 and as a signalling molecule influencing cell identity^[65]32. Taken together, these findings suggest that gene–metabolite covariation in tumours emerges, in part, from two complementary phenomena: the expression of enzymes with strong control over metabolite pool size, and the presence of specific cell populations in the tumour microenvironment with characteristic metabolomic profiles. Results The Cancer Atlas of Metabolic Profiles Since metabolomic profiling has so far been excluded from large multimodal tumour profiling projects (for example, the TCGA^[66]33), there is no unified resource of metabolomic/transcriptomic data in the cancer research field. However, several groups have independently produced and released matched metabolomic/transcriptomic data in diverse cancer types^[67]13–[68]22. We combined these datasets with several in-house studies to create a comprehensive collection of 988 samples (764 tumour samples and 224 adjacent normal samples) across 11 different cancer types, covering 15 datasets, which we called the CAMP (Table [69]1 and Fig. [70]1a). The overall collection includes a total of more than 40,000 unique transcripts and almost 2,500 unique metabolites. To maximize comparability across these heterogeneous studies, we applied a unified workflow to process RNA expression from microarray and RNA-sequencing (RNA-seq) data, harmonize metabolite names and annotations, and standardize data normalization and preprocessing ([71]Methods). The CAMP represents an unprecedented resource to investigate the covariation of metabolite levels and gene expression at scale across diverse lineages of human cancers and normal tissues. To ensure high quality of the data, we evaluated several measures of quality control (QC). These included confirming that changes in metabolite abundance and gene expression between tumour and normal samples recapitulated those from prior work (Extended Data Fig. [72]1a,b), and demonstrating that covariation between pairs of metabolites was strongest between proximal metabolites in the metabolic network (that is, metabolite pairs acted upon by a common enzyme; Extended Data Fig. [73]1c). Table 1. Overview of cancer type, sample size and type of gene expression data in the study Cohort Cancer type Reference Samples (tumour/normal) Gene expression data type BRCA1 Breast cancer Terunuma et al.^[74]19 61/47 Microarray BRCA2 Breast cancer Tang et al.^[75]20 18/– RNA-seq COAD Colon adenocarcinoma Satoh et al.^[76]18 37/39 Microarray DLBCL Diffuse large B cell lymphoma Calvo-Vidal et al.^[77]17 62/– RNA-seq GBM Glioblastoma Wang et al. ^[78]16 74/6 RNA-seq HürthleCC Hürthle cell carcinoma Ganly et al. ^[79]84 28/3 RNA-seq HCC Hepatocellular carcinoma Chaisaingmongkol et al. ^[80]15 54/– Microarray ICC Intrahepatic cholangiocarcinoma Chaisaingmongkol et al. ^[81]15 86/– Microarray OV High-grade serous ovarian cancer Gentric et al. ^[82]14 45/– Microarray PDAC Pancreas adenocarcinoma Zhang et al. ^[83]21 27/12 Microarray PRAD Prostate adenocarcinoma Penney et al. ^[84]13,[85]85 91/46 Microarray ccRCC1 Clear-cell renal carcinoma Hakimi et al. ^[86]22 32/– RNA-seq ccRCC2 Clear-cell renal carcinoma Golkaram et al. ^[87]86 30/– RNA-seq ccRCC3 Clear-cell renal carcinoma Golkaram et al. ^[88]86 67/47 RNA-seq ccRCC4 Clear-cell renal carcinoma Golkaram et al. ^[89]86 52/24 RNA-seq [90]Open in a new tab Fig. 1. Summary of the Cancer Atlas of Metabolic Profiles. [91]Fig. 1 [92]Open in a new tab a, The CAMP integrates metabolomic and transcriptomic data from 15 datasets covering 11 different cancer types, comprising both tumour and normal tissue. BRCA1/BRCA2, breast cancer; COAD, colon adenocarcinoma; DLBCL, diffuse large B cell lymphoma; GMB, glioblastoma; HürthleCC, Hürthle cell carcinoma; OV, high-grade serous ovarian cancer; PDAC, pancreas adenocarcinoma; PRAD, prostate adenocarcinoma; ccRCC1/ccRCC2/ccRCC3/ccRCC4, clear-cell renal carcinoma. b, Overview of non-parametric concordance meta-analysis. For a given transcript (T) and metabolite (M) pair, all measurements from all datasets were considered and weighted according to sample size. The concordance is a non-parametric measure of bivariate association between T and M that can be applied in meta-analysis across multiple datasets. [93]Source data Extended Data Fig. 1. The Cancer Atlas of Metabolic Profiles (CAMP) data recapitulate prior work interactions and demonstrate robust GMIs. [94]Extended Data Fig. 1 [95]Open in a new tab (A)Differential abundance between tumor and normal samples across 7 cancer types. Blue circles indicate metabolites depleted in tumor related to normal tissue and red circles indicate increased in tumor related to normal tissue. Grey circles mean metabolites with no statistically significant change in abundance. (B) Scatterplot of gene expression differential fraction score and differential abundance score in the KEGG metabolic pathways. (C) Boxplot comparing absolute concordance to metabolic distance from 159 metabolites that with KEGG id in the human 1 GEM model (12560 pair-wise distance). The data is displayed as a boxplot, in which the box bounds represent the 25th to the 75th percentile, with the median shown as a line in the center. The bottom whisker extends to the smallest value that is not an outlier, and the upper whisker extends to the largest value that is not an outlier. (D) Barplot of statistically significant GMIs and the number of individual datasets the GMI is significant in after FDR correction. [96]Source data We interrogated the CAMP dataset for recurrent covariation between genes and metabolites across datasets and cancers. Such gene–metabolite covariation could emerge via numerous mechanisms including direct metabolic interactions, for example, via expression changes of a rate-limiting enzyme, or by the accumulation/depletion of metabolites as part of a broader phenotype, such as a cytotoxic immune response^[97]34–[98]36. While each cancer type is likely to demonstrate its own unique pattern of transcriptomic and metabolomic changes, the CAMP enables discovery of metabolomic/transcriptomic covariation that transcends diseases. To identify cancer-type-agnostic metabolite and transcript covariation across CAMP datasets in a statistically principled manner, we developed a concordance-based meta-analysis approach (Fig. [99]1b and [100]Methods). Concordance is a non-parametric measure of correlation, which enables the identification of consistently positive or negative gene–metabolite associations across datasets^[101]37. Our measure of concordance ranges from –1 to 1 and is closely related to non-parametric correlation coefficients such as Kendall’s tau, with a value of –1 corresponding to strong discordance and 1 corresponding to strong concordance ([102]Methods). We focused our analysis on tumour samples only in the CAMP (although analogous analysis could be carried out on normal samples), and focused on the 276 metabolites that were quantified in more than half of the tumour datasets (at least 8 of 15 tumour datasets) and the 16,082 genes that were quantified in all 15 studies. Of all possible gene–metabolite pairs (276 metabolites × 16,082 genes = 4,438,632 pairs), a total of 22,619 pairs (0.51%) were significantly correlated after multiple-testing correction at a false discovery rate (FDR) of 0.01 (Fig. [103]2a). This included 269 metabolites (~97%) and 7,987 genes (~50%) participating in at least one significant association (Supplementary Table [104]1), which we refer to as GMIs. Post-hoc QC confirmed that 7,737/22,619 (78%) of GMIs were statistically significant in two or more individual datasets (Extended Data Fig. [105]1d). Statistically significant GMIs identified in single-study concordance analysis of the BRCA1 (microarray) and BRCA2 (RNA-seq) were highly consistent, indicating that the choice of transcriptomic profiling technology (that is, microarray versus RNA-seq) did not introduce substantial artefacts (Extended Data Fig. [106]2). Importantly, the results of the concordance were not affected by imputation, and an analogous concordance analysis omitting all imputed data reproduced 20,291/22,619 (89.7%) of statistically significant GMIs. Finally, we examined the consistency of GMIs (considering only those measured in >7 distinct datasets) generated from two distinct, non-overlapping subsamples of the full CAMP dataset. This analysis identified a high degree of consistency between estimates of concordance for each GMI (Spearman rho 0.31, P value < 10^−^16), confirming the robustness of the results of our concordance meta-analysis. Fig. 2. Meta-analysis across the CAMP captures lineage-transcending gene–metabolite interactions. [107]Fig. 2 [108]Open in a new tab a, Volcano plot of the GMIs computed between the 16,082 genes present in all datasets, and the 276 metabolites present in at least 8 of our 15 tumour cohorts. The x axis indicates the scaled concordance value, where values above 0 indicate positive association and values below 0 indicate negative association. The y axis represents the corresponding −log[10] FDR-adjusted P value. Two-tailed P values were estimated from the unscaled concordance value’s z-score ([109]Methods) and were corrected for multiple testing using the Benjamini–Hochberg method. The horizontal line indicates the significance cut-off of 0.01 FDR. Light grey dots indicate nonsignificant gene–metabolite pairs, and black dots indicate significant pairs. Three top hits have been highlighted. b, Statistically significant GMIs are enriched for proximal interactions, but proximal interactions nevertheless constitute a minority of all statistically significant GMIs. Of all significant gene–metabolite pairs in our concordance meta-analysis, 3,304/22,619 pairs had a defined distance (~14.61%), but only 565/22,619 (~2.50%) of these were proximal. c, Proximal GMI prioritization. GMIs for the 22 metabolites whose strongest GMI was proximal (distance less or equal 2). For each metabolite, we ranked genes by their statistical significance. Two-tailed P values were estimated from the unscaled concordance value’s z-score ([110]Methods) and were corrected for multiple testing using the Benjamini–Hochberg method. Red and black dots indicate proximal and non-proximal genes significantly associated with the corresponding metabolite, respectively, while grey dots indicate genes with nonsignificant associations. Bold metabolites exhibit a large gap between the dominant GMI and all other GMIs for a metabolite. d,e, Scatterplots of the association between kynurenine levels and two proximal genes (IDO1, d; AFMID, e). Metabolite abundances were scaled within each dataset to be displayed together. Two-tailed P values were estimated from the unscaled concordance value’s z-score ([111]Methods) and were corrected for multiple testing using the Benjamini–Hochberg method. f, CRISPR–CAS9-mediated knockout of IDO1 depleted IDO1 protein levels in HCT116 cells. Western blot was performed once and not repeated. g, Kynurenine levels were depleted upon knockout of IDO1 in HCT116 cells (n = 3 in each condition). Data are presented as mean values ± s.d. h, Scatterplot of the association between tryptophan levels and IDO1 in the CAMP. Two-tailed P values were estimated from the unscaled concordance value’s z-score ([112]Methods) and were corrected for multiple testing using the Benjamini–Hochberg method. i, Tryptophan levels increase upon IDO1 knockout (n = 3 in each condition). Data are presented as mean values ± s.d. j, Scatterplot of oxidized glutathione (GSSG) and GGT1. Two-tailed P values were estimated from the unscaled concordance value’s z-score ([113]Methods) and were corrected for multiple testing using the Benjamini–Hochberg method. k, Validation of the GSSG and GGT1 relationship was based on the study from Priolo et al.^[114]42. The dataset includes four data points for each condition. The two-tailed P value was estimated with a Wilcoxon rank-sum test. a.u., arbitrary units. [115]Source data Extended Data Fig. 2. The statistically significant GMIs identified from different transcriptomic profiling technology (microarray v.s. RNA sequencing) were highly consistent. Extended Data Fig. 2 [116]Open in a new tab Scatterplot of statistically significant GMIs in BRCA1 (microarray) and BRCA2 (RNA sequencing) P-values was estimated from Pearson’s correlation test. Axes represent concordance values. [117]Source data A subset of gene–metabolite covariation represent direct enzyme–substrate interactions Among the statistically significant GMIs (Fig. [118]2a), we noted that the strongest positively correlated GMI (IDO1–kynurenine, adjusted P value = 4.82 × 10^−^35) and the two strongest negatively correlated GMIs, GDA–guanine (adjusted P value = 2.31 × 10^−^20) and CD38–nicotinamide mononucleotide (NMN; adjusted P value = 3.90 × 10^−^16) corresponded to ‘proximal’ metabolic interactions, which are interactions between an enzyme and its direct or nearly direct substrate/product. For example, IDO1 catalyses the catabolism of tryptophan to N-formyl-kynurenine, which is subsequently metabolized to kynurenine^[119]38, and both CD38 and GDA directly degrade the metabolites guanine and NMN, respectively^[120]39,[121]40. We confirmed that statistical significance for these three GMIs was likely driven by several cancer types rather than a single dataset with very strong associations (Extended Data Fig. [122]3). Extended Data Fig. 3. Statistical significance of these 3 GMIs was likely driven by several cancer types. Extended Data Fig. 3 [123]Open in a new tab Box plots of concordance for gene-metabolite pair of CD38-MNM (n=8), GDA-guanine (n=13) and IDO1-kynurenine (n=12) across individual datasets. The data is displayed as a boxplot, in which the box bounds represent the 25th to the 75th percentile, with the median shown as a line in the center. The bottom whisker extends to the smallest value that is not an outlier, and the upper whisker extends to the largest value that is not an outlier. [124]Source data The direct biochemical relationship between the three GMIs above raised the possibility that functional proximity between enzymes and their substrates/products might underlie a large fraction of GMIs. To test this, we systematically computed the biochemical distance between all gene–metabolite pairs using the highly curated Human-1 metabolic network model from Robinson et al.^[125]41. In this framework, a distance of one represents molecules that are involved in the same metabolic reaction, and a distance of two indicates a gene–metabolite pair that take part in subsequent reaction steps ([126]Methods). Although statistically significant GMIs were enriched for proximal interactions relative to nonsignificant gene–metabolite pairs (odds ratio 1.42, Fisher’s exact test P value = 3.09 × 10^−^15), proximal interactions themselves constituted only a small fraction of the total ensemble of statistically significant GMIs (2.5%, 565/22,619; Fig. [127]2b). Thus, while several of the strongest GMIs arose from proximal interactions, gene–metabolite proximity was a weak determinant of the full GMI landscape. To further investigate the above observations about the strength of specific proximal GMIs and the overall relationship between metabolic proximity and GMIs, we investigated the relative strength of different GMIs affecting a common metabolite. We then focused on the 22 metabolites whose strongest GMI was proximal, covering diverse molecules involved in nucleotide metabolism (guanine, cytidine), cofactor metabolism (NAD^+), redox metabolism (cystine, oxidized glutathione (GSSG)) and other pathways (Fig. [128]2c). Interestingly, we found that for 8 of 22 metabolites (kynurenine, guanine, NMN, GSSG, tryptophan, glycerophosphocholine (GPC), cystine and cytidine), a large gap existed between the most significant GMI and the second-highest correlating transcript. This gap suggested that the pool size of these metabolites was strongly controlled, in a lineage-agnostic manner, by a single, dominant gene. Consequently, we hypothesized that targeted genetic knockdown of dominant GMIs for these 8 metabolites would have a higher likelihood of producing significant changes in pool size than for other metabolites with multiple, comparably strong GMIs, each of which might control the pool size of the metabolite (Fig. [129]2c). Similar results were found when relaxing the threshold for calling gene–metabolite proximity (Extended Data Fig. [130]4). Extended Data Fig. 4. Gene-metabolite proximity at relaxed distance thresholds has similar results. [131]Extended Data Fig. 4 [132]Open in a new tab -Log[10] adjusted p-value of GMIs when definition of proximity is 2 (top), and when it is relaxed to 3 (middle) and 4 (bottom). Two-tailed p-values were estimated from the unscaled concordance value’s z-score (see Methods) and were corrected for multiple testing using the Benjamini-Hochberg method. Adjusted p-values are multiplied by concordance sign, thus values below 0 indicate negative covariation. [133]Source data We sought to functionally validate a subset of these predicted, metabolite pool-controlling genes. First, we investigated the association between IDO1 and two metabolites, kynurenine and tryptophan. IDO1 converts tryptophan to N-formylkynurenine, which is subsequently catabolized to kynurenine by AFMID. We observed that kynurenine levels were strongly associated with IDO1 (Fig. [134]2d) but not AFMID (which acts directly to produce kynurenine; Fig. [135]2e) expression across the CAMP. These findings were consistent with independent measurements of the metabolome and transcriptome obtained from the Cancer Cell Line Encyclopedia consortium on ~900 cell lines, where IDO1 expression was associated with kynurenine abundance (P = 8.6 × 10^−^9) but AFMID expression was not (P = 0.8) (ref. ^[136]36) (Extended Data Fig. [137]5a,b). To experimentally test the hypothesis that disruption of IDO1 impacts both tryptophan and kynurenine pool sizes, we used CRISPR–Cas9-mediated knockout with single-guide RNAs (sgRNAs) targeted against IDO1 human colorectal carcinoma HCT116 cells. These experiments corroborated earlier data indicating that knockout of IDO1 depleted kynurenine pools (Fig. [138]2f,g). However, while the association between IDO1 and kynurenine has been widely described in the literature^[139]36, our analysis indicated that IDO1 is also expected to determine the pool size for tryptophan, an amino acid involved in numerous other reactions in the cell, most obviously the synthesis of proteins (Fig. [140]2h). Consistent with this observation, we observed that knockout of IDO1 was sufficient to increase tryptophan levels, indicating that the pool size of this highly connected proteinogenic amino acid could be perturbed in part through disruption of IDO1 activity (Fig. [141]2i). Second, in support of a proximal GMI between GGT1 and GSSG (Fig. [142]2j), we reanalysed existing metabolomic data from a functional knockdown of GGT1 versus control in human embryonic kidney HEK293T cells^[143]42,[144]43. This data confirmed that knockdown of GGT1 was associated with an increase in GSSG levels with respect to mock control (P value = 2.90 × 10^−^2), suggesting that GGT1 is a pool-determining consumer of GSSG (Fig. [145]2k). Taken together, these data demonstrate that lineage-transcending GMIs discovered through pathway-based analysis of the CAMP represent examples of genes exerting strong control over metabolite pool sizes. Extended Data Fig. 5. IDO1-kynurenine and AFMID-kynurenine correlations from the CCLE consortium. [146]Extended Data Fig. 5 [147]Open in a new tab (A) Scatterplots comparing kynurenine abundance to IDO1, and (B) AFMID expression in independent metabolome and transcriptome data obtained from the CCLE consortium. P-value was estimated from Spearman’s rank correlation test. [148]Source data Pathway-level metabolic and transcriptomic changes weakly covary Despite the interesting findings related to proximal GMIs, most (97.5%) statistically significant GMIs represented distant, non-proximal interactions beyond obvious enzyme–substrate metabolic relationships (Fig. [149]2b). One possible implication of such nonlocal covariation is that genes and metabolites in the same metabolic pathway would show asynchronous and uncorrelated changes across different groups of samples, such as tumours and normal tissues. To investigate this hypothesis, we studied the consistency of transcriptional and metabolic differences in tumour versus adjacent normal tissue across cancer types. To this end, we performed differential analysis of metabolite and transcript levels between tumour and normal tissues in the 7 CAMP datasets where both tissues were available (Table [150]1) and aggregated the results into 85 Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways. Of these, we considered 63 pathways with at least one metabolite or gene measured in at least 5 of the 7 CAMP datasets (Supplementary Table [151]2). For each KEGG pathway, we evaluated (using a differential abundance (DA) score; [152]Methods) whether metabolites and transcripts showed synchronous accumulation or depletion patterns in tumours relative to normal tissues. Pathways were biased towards asynchronous changes (276/441, 63%), where increases in metabolite levels coincided with decreases in transcript levels, and vice versa (Fig. [153]3a). Only one pathway (histidine metabolism) demonstrated fully synchronous changes in all datasets, whereas a few others demonstrated uniformly asynchronous changes (for example, primary bile acid biosynthesis). We also assessed whether there was a correlation between the extent of metabolomic versus transcriptomic disruption regardless of the direction (using a differential fraction (DF) score; [154]Methods). A minority of pathways (9/63) demonstrated significant associations (nominal P value < 0.05) between RNA and metabolite DF scores (Fig. [155]3b; see Fig. [156]3c as an example). Interestingly, these 9 pathways belonged to just 2 KEGG pathway classes (Supplementary Table [157]3): amino acid metabolism and carbohydrate metabolism. Enrichment analysis indicated that the class of carbohydrate metabolism pathway (of 8 in total) was significantly over-represented relative to the others (Fisher’s exact test P value: 2.36 × 10^−^2). Thus, most pathways showed no evidence of a correlation between metabolomic and transcriptomic disruption, emphasizing the implications of predominantly distally acting GMIs and prompting the question of which biological phenomena produce these distal GMIs. Fig. 3. Tumour versus normal changes in metabolite and transcript abundance are predominantly asynchronous. [158]Fig. 3 [159]Open in a new tab a, Heat map of metabolite and transcript DA scores across datasets and pathways, capturing the tendency for metabolites and genes to accumulate or deplete in tumours relative to normal tissues. The size of the dots indicates the number of molecules measured in that pathway, while the colour represents the DA score. b, Spearman correlation coefficients of the metabolite and transcript DF scores in KEGG pathways. Red dots indicate nominal significance (P value < 0.05). A minority of pathways showed significant association between metabolomic and transcriptomic disruption across CAMP datasets. P values were estimated from Spearman’s rank correlation test and were corrected for multiple testing using the Benjamini–Hochberg method. c, Example of Spearman correlation calculation: Distribution of the metabolite (x axis) and transcript (y axis) DF scores across datasets for the citrate cycle (tricarboxylic acid (TCA) cycle) pathway. P values were estimated from Spearman’s rank correlation test. [160]Source data Hub metabolites are enriched for immune genes associations Having established that most GMIs do not represent biochemically proximal interactions, we adopted a broader approach to identify the driving factors of metabolite–transcript correlations. First, we investigated the distribution of GMIs across metabolites and genes, observing that GMIs were strongly concentrated in a small number of metabolites (Fig. [161]4a). The top three metabolites with the highest number of GMIs, quinolinate, NMN and 5′-methylthioadenosine alone contributed to 17% (3,823/22,619) of all GMIs in our analysis, and the top ten metabolites covered 35% of the GMIs (8,048/22,619), far higher than the fraction covered by the top eight genes (Fig. [162]4b and Extended Data Fig. [163]6a,b). That is, a small number of metabolites participated in an exceptionally high number of GMIs, acting as ‘hubs’ for strong covariation with gene expression. Interestingly, hub metabolites concentrated in certain metabolic pathways. Among the top ten most correlated metabolites, we found several constituents of the NAD^+ biosynthesis pathway (quinolinate, NMN and NAD^+) and nucleotide metabolism (thymine, uracil and adenine). Fig. 4. Gene–metabolite interactions are concentrated in hub metabolites associated with immune-related genes. [164]Fig. 4 [165]Open in a new tab a, Distribution of significant GMIs across metabolites. The top ten metabolites with the highest number of significant associations are labelled. b, Distribution of significant GMIs across genes. The top eight genes with the highest number of significant associations are labelled. c, KEGG-based gene pathway enrichment analysis results for metabolites with at least one GMI and at least one significantly enriched pathway. The heat map colour represents the strength of the enrichment as the negative log[10](P value) of the pathway enrichment test. Two-sided P values were estimated with a Fisher’s exact test and adjusted for multiple testing using the Benjamini–Hochberg method. Cells coloured in shades of red indicate pathways that were significant after multiple-testing corrections (adjusted P value < 0.01), and grey cells indicate insignificant associations. Individual pathways were classified based on the type of process they describe: metabolism (lilac), immune (green) or other (cream). MTA, 5′-methylthioadenosine. [166]Source data Extended Data Fig. 6. 200 strongest non-proximal GMIs. [167]Extended Data Fig. 6 [168]Open in a new tab Barplot of metabolite involved in the 200 strongest non-proximal GMIs excluding pairs without a distance (A) and including pairs without a distance (B). Pairs without a distance are denoted as NA in Supplementary Table [169]1. Metabolites in black belong to the NAD+ metabolism pathway while all others are in grey. [170]Source data To determine whether the genes correlated with a particular metabolite were enriched for specific cellular functions, we performed an unsupervised pathway enrichment analysis. For each metabolite with at least one GMI, we investigated whether specific pathways and processes were over-represented in the transcripts correlated with that metabolite. Overall, the considered transcripts spanned over 146 KEGG pathways (Supplementary Table [171]4), covering the 85 metabolic processes in Fig. [172]3 as well as cellular processes (for example, cell growth and death), signalling pathways, genetic processing pathways (for example, transcription and translation) and organismal systems pathways (for example, immune, endocrine and sensory systems). A total of 32 unique pathways were over-represented across 40 metabolites (adjusted P value < 0.01; Fig. [173]4c). Interestingly, only 2 of those 32 pathways represented metabolic processes: oxidative phosphorylation, for which the top three most associated metabolites were adenine, NMN and FAD, and the TCA cycle, also associated with FAD. The remaining top-ranked pathways were exclusively non-metabolic ones. Three metabolites in particular, quinolinate, NMN and NAD^+, showed broad enrichment for immune-related cellular processes, including chemokine signalling, as well as B cell and T cell antigen receptor signalling pathways. This indicates that these metabolites are correlated with the expression of a wide array of genes associated with the immune response. NAD^+-related metabolites associate with immune cell infiltration Human tumour tissues are heterogeneous compositions of various cell populations, including tumour cells, immune cells and stromal cells. Bulk and single-cell profiling technologies have established that a large subset of genes are exclusively expressed in immune cells or non-immune cell subpopulations (for example, tumour cells)^[174]44,[175]45. We reasoned that the correlation between NAD^+-related metabolites and immune-related genes (Fig. [176]4c) could therefore arise if NAD^+-related metabolites were at a characteristically higher or lower abundance in immune cells relative to non-immune cells. One implication of this hypothesis is that, while each cancer type might demonstrate its own unique metabolomic changes associated with immune infiltration, NAD^+-related metabolites should be expected to demonstrate consistent effects across many different cancer types. To determine if NAD^+-related metabolites were associated with immune infiltration across tumours, we used single-sample gene-set enrichment analysis (ssGSEA)^[177]46 to compute a previously validated 141-gene RNA signature of overall immune cell infiltration (ImmuneScore) directly from bulk RNA-seq data^[178]47, and identified the individual metabolites correlated with this immune phenotype. Concordance between metabolite levels and the ImmuneScore signature was assessed across all samples in each CAMP dataset (Fig. [179]5a). In general, covariation between specific metabolite pools and ImmuneScore expression was cancer-type specific. For example, of the metabolites significantly associated with ImmuneScore in intrahepatic cholangiocarcinoma (ICC) and hepatocellular carcinoma (HCC; representing the top two cancer types with the highest number of metabolites significantly associated with ImmuneScore), only 3 metabolites were consistently associated with ImmuneScore in both datasets (Fig. [180]5b and Supplementary Table [181]5). We did not observe a significant correlation (Spearman’s rank correlation P value = 0.69) between the percentage of metabolites significantly associated with ImmuneScore and expression of the ImmuneScore signature itself (Fig. [182]5a), suggesting that the extent of immune infiltration did not confound our analyses. Fig. 5. The abundance of NAD^+-related metabolites is shaped by immune infiltration in the tumour microenvironment. [183]Fig. 5 [184]Open in a new tab a, Left, association of metabolite abundance with immune infiltration varied significantly across cancer types. The bar plot indicates the fraction of metabolites significantly associated with the ImmuneScore signature in each dataset. Bar colours code for the sample size of the dataset. Sample size is the number of tumour samples of the dataset in Fig. [185]1a. Middle, box plots indicate the expression range of the ImmuneScore signature within each dataset. Right, plot of scaled concordance calculated between metabolites and ImmuneScore. Red dots indicate metabolites with a positive association with ImmuneScore, blue dots indicate metabolites with negative associations, and grey dots indicate metabolites that were not statistically significant. b, Heat maps of five metabolites—three that were highly correlated with ImmuneScore in the ICC dataset and two that were highly correlated with ImmuneScore in the HCC dataset. Samples were sorted by increasing ImmuneScore. UDP-G, UDP-glucuronate; GPC-16:1, 1-palmitoleoyl-GPC (16:1); GPC-18:2, 1-linoleoyl-GPC (18:2). c, Bar plot indicating the strength of association between metabolites and ImmuneScore from concordance meta-analysis. Two-tailed P values were estimated from the unscaled concordance value’s z-score ([186]Methods) and were corrected for multiple testing using the Benjamini–Hochberg method. Bar length represents the −log[10] adjusted P value. Metabolites related to NAD^+ metabolism are shown in black. d, Scatterplots of the abundance of two NAD^+-related metabolites, quinolinate and NMN, versus ImmuneScore expression across all datasets. Metabolite abundances were scaled within each dataset. e, Bar plot comparing the absolute concordance values of metabolites to ImmuneScore in a pathway compared to all other pathways (one-sided P values were estimated from Wilcoxon rank-sum test and were corrected for multiple testing using the Benjamini–Hochberg method). f, Metabolomic measurements of purified populations of CD45^− tumour cells and CD45^+ T cells isolated from ovarian cancer tumours. NAD^+ was negatively correlated with the ImmuneScore signature in the ovarian cancer dataset (n = 45). NAD^+ was similarly lower in abundance in CD4^+/CD8^+ cells than CD45^− (non-immune) cells in the dataset of purified cell populations (n = 24 in CD4^+/CD8^+; n = 18 in CD45^−). [187]Source data While metabolite concordance with ImmuneScore was generally heterogeneous across CAMP datasets, we observed that several NAD^+-related metabolites, including quinolinate and NMN, were recurrently associated with high or low levels of immune infiltration in numerous disease contexts (Fig. [188]5a). To systematically identify such lineage-agnostic metabolomic correlates of immune infiltration, we again applied concordance meta-analysis, identifying 14 metabolites significantly associated with ImmuneScore across datasets (adjusted P value < 0.05; Fig. [189]5c), with quinolinate and NMN being the strongest hits (Fig. [190]5d). Interestingly, 4/14 significantly associated metabolites (quinolinate, NAD^+, NMN and kynurenine) were members of the NAD^+ biosynthesis pathway. Consistent with this, we identified NAD^+ metabolism as the sole pathway whose metabolites demonstrated significantly stronger association with immune infiltration than all other pathways (adjusted P value = 0.04; Fig. [191]5e). The above analysis suggested that NAD^+-related metabolites were at differential abundance in immune cells relative to non-immune cells, and that this effect produced a consequent accumulation of NAD^+-related metabolites in immune-infiltrated tumours. Some support for this hypothesis can be found in previously published immunohistochemical data indicating that the abundance of quinolinate increases dramatically in diverse immune cell populations in response to Toll-like receptor 4 ligands such as lipopolysaccharide^[192]48. To provide more evidence for this hypothesis, we compared our findings to a recently published study of metabolomic profiles of purified CD45^− tumour cells and CD45^+ (CD8^+ and CD4^+) T cells from ovarian cancer tumours^[193]48. In our data, NAD^+ was negatively correlated with ImmuneScore in ovarian cancer, suggesting that it was at lower abundance in immune cells relative to non-immune cells (Fig. [194]5f). Consistent with this, NAD^+ was at significantly lower abundance in CD45^+ T cells than CD45^− tumour cells in the dataset of purified cell populations (NAD^+ log[2] fold change = −1.22, P value = 1.7 × 10^−^4; Fig. [195]5f). Together, these analyses suggest that the pool sizes of NAD^+-related metabolites are at characteristically different abundance in immune cells relative to other cell types, and that this effect ultimately drives the association of quinolinate and other NAD^+-related metabolites with immune infiltration in bulk tumour data. Kynurenine and histamine levels correlate to specific cell populations Whereas in the previous analysis we investigated the association of metabolite levels with overall immune infiltration, we next turned to investigating the association of metabolite levels and specific immune cell populations (for example, T cells, macrophages and numerous other cell types), each with unique transcriptional phenotypes and immunological functions^[196]49–[197]51. To investigate how these diverse immune cell populations contributed to the observed GMIs, we estimated the abundance of 23 immune cell types from bulk transcriptomics profiles^[198]51 using ssGSEA as shown previously^[199]46, and computed their association with metabolite levels across cancers. We again focused on lineage-agnostic relationships by performing a concordance meta-analysis to calculate associations between metabolite levels and immune cell signatures across tumours from all cancer types. In total, 7.3% of all metabolite–signature pairs (466 of 6,348 pairs) demonstrated statistically significant associations (adjusted P value < 0.05; Fig. [200]6a). Among these, quinolinate was positively associated with almost all immune cell populations (17/23), consistent with prior immunohistochemical data and suggesting that it accumulates in a variety of immune cell types in a cancer-type-agnostic manner^[201]52. Fig. 6. A subset of metabolites associates with specific immune cell lineages. [202]Fig. 6 [203]Open in a new tab a, Volcano plot of cell-type signature–metabolite interactions. The rug plot at the bottom highlights the numerous associations with quinolinate. b, Manhattan plot of adjusted concordance P values between metabolites and cell types. c, Histamine associates with the abundance of mast cells across most datasets of the CAMP. d, Bar plot of adjusted concordance P values of metabolites correlated with the mast cell gene signature. The red dashed line indicates the significance cut-off of 0.05. Metabolites labelled in red are associated with histamine metabolism. e, HDC expression strongly associates with mast cell abundance across the CAMP. f, Kynurenine abundance associates with an aDC signature across the CAMP. In a–f, two-tailed P values were estimated from the z-scores of the unscaled concordance values ([204]Methods) and were corrected for multiple testing using the Benjamini–Hochberg method. [205]Source data Aside from quinolinate, the two strongest associations between immune cell-type signatures and metabolite levels were two comparatively rare cell populations, namely mast cells and histamine and activated dendritic cells (aDCs) and kynurenine (Fig. [206]6b). Mast cells are a myeloid cell population that, when stimulated, mediate the inflammatory process by synthesizing histamine from histidine using the enzyme HDC. We found both histamine and its related metabolite 1-methylhistamine were the two metabolites most significantly associated with the presence of mast cells (Fig. [207]6c,d), and that the association between histamine and mast cell abundance was driven by diverse cancer types in the CAMP (Extended Data Fig. [208]7a). Histamine levels themselves were strongly associated with the expression of HDC across the CAMP (Fig. [209]6e). Importantly, the mast cell signature contains HDC, but when the signature was recalculated without HDC the significant positive concordance was preserved (concordance = 0.28, P value = 3.4 × 10^−^17; Extended Data Fig. [210]7b). Moreover, single-cell data also indicate that HDC expression is strongly elevated in mast cells relative to other cell types (Extended Data Fig. [211]7c). Interestingly, the median variation of histamine across the tumour datasets in the CAMP was ~740-fold, implying that fluctuations in the abundance of an otherwise rare cell population were sufficient to produce large-magnitude changes in histamine abundance in the bulk tumour. Extended Data Fig. 7. Diverse cancer types likely drove the association between histamine and mast cells. [212]Extended Data Fig. 7 [213]Open in a new tab (A) Box plots of concordance of histamine – mast cells across 13 datasets. The data is displayed as a boxplot, in which the box bounds represent the 25th to the 75th percentile, with the median shown as a line in the center. The bottom whisker extends to the smallest value that is not an outlier, and the upper whisker extends to the largest value that is not an outlier. (B) Scatterplots comparing expression of the mast cell signature without HDC to histamine abundance. Two-tailed p-values were estimated from the unscaled concordance value’s z-score (see Methods). (C) Heatmap of single cell gene expression data for HDC across datasets from the TISCH2 database^[214]85,[215]86. Expression is standardized to z-scores across cell types in a dataset. Considering only datasets in which mast cells are identified, mast cells exhibit highest expression of HDC. [216]Source data In contrast to mast cells and their physiological role in producing histamine, DCs are not known to be dedicated sources of kynurenine in the microenvironment, although single-cell data indicate that IDO1 expression is strongly elevated in DCs relative to other cell types (Extended Data Fig. [217]8a). While the aDC signature contains IDO1 (and IDO1 participates in a strong GMI with kynurenine; Fig. [218]2), an aDC signature without IDO1 preserved strong positive concordance with kynurenine (concordance = 0.16, P value = 3.3 × 10^−^5; Extended Data Fig. [219]8b), confirming that this association was not solely dependent on IDO1 itself (Fig. [220]6f). IDO1 and kynurenine corresponded to the strongest GMI in the CAMP, raising two opposing hypotheses on the mechanism underlying the primary source of kynurenine in the tumour microenvironment: either bulk tumour expression of IDO1 is primarily determined by high DC-specific expression or, alternatively, it is driven by a comparatively low expression in far more abundant non-DC-cell populations. Resolving this ambiguity would require single-cell measurements of metabolite levels, which have recently become technically feasible^[221]53. Together, these data suggest that the presence of individual and comparatively rare cell populations is associated with shifts in the abundance of immunomodulatory metabolites in the tumour microenvironment. Extended Data Fig. 8. The association between kynurenine and dendritic cells was not solely dependent on IDO1 itself. [222]Extended Data Fig. 8 [223]Open in a new tab (A) Heatmap of single cell gene expression data for IDO1 across datasets from the TISCH2 database^[224]85,[225]86. Expression is standardized to z-scores across cell types in a dataset. Considering only datasets in which dendritic cells are identified, dendritic cells exhibit the highest expression of IDO1. (B) Scatterplots comparing expression of the aDC signature without IDO1 to kynurenine abundance. Two-tailed p-values were estimated from the unscaled concordance value’s z-score (see Methods). [226]Source data Discussion Metabolism is jointly controlled by genetically encoded enzymes and small-molecule metabolites. To study the interactions between genes and metabolites at scale, we assembled and harmonized a database of metabolomic and transcriptomic data from ~1,000 tumour and normal samples, which we refer to as the CAMP. Although large-scale multimodal measurements of metabolism have previously been produced in bacteria^[227]54 and yeast^[228]55, a comparable resource for human cancers was missing. The CAMP is thus a resource and represents a significant public database for studying the metabolism of complex human tissues and cancers, and for the interrogation of gene–metabolite covariation across different tissue and disease contexts. Our analysis of the CAMP demonstrates that large-scale studies of multimodal metabolic data can reveal fundamental principles of metabolic regulation at the scale of both individual metabolic reactions (Fig. [229]2) and complex human tissues (Figs. [230]5 and [231]6). Our detailed statistical analysis of the CAMP revealed a multitude of gene–metabolite associations that transcended the tissue of origin. We focused on two specific types of gene–metabolite covariation. The first, induced by functional proximity, identified metabolic genes whose activation or inhibition is likely to have a significant effect on the direct substrate or product of the respective reaction (Fig. [232]2). While metabolite pools are likely controlled by a large number of genes and other factors, our data-driven analysis identified a small subset of metabolites whose pool size was strongly associated with a single, proximal gene across tissue lineages. The GMIs identified in Fig. [233]2b offer a data-informed, rational approach for modulating the pool sizes of their metabolite constituents. While each of the highlighted metabolites in Fig. [234]2b may participate in numerous metabolic reactions, data from the CAMP specifically nominates single genes (for example, GGT1 for GSSG, and IDO1 for tryptophan and kynurenine) as those targets whose perturbation is most likely to disrupt the corresponding pool size. The second broad form of gene–metabolite covariation, likely induced by cell-type-specific physiology, corresponded to metabolites whose abundance was associated with the presence of specific immune cells in the microenvironment. Thus, a sizeable fraction of non-proximal GMIs was associated with a small number of metabolites (enriched for NAD^+-related molecules) that were significantly correlated with a large number of genes, and in particular immune-related genes. This apparent association between the abundance of specific immune cells in a tissue specimen and the levels of numerous NAD^+-associated metabolites suggests that immune cells have evolved mechanisms to maintain the concentrations of these metabolites at characteristically different levels relative to other cell types. Because NAD^+ is both the central mediator of redox poise in the cell and a cofactor for numerous metabolic and non-metabolic reactions, understanding the mechanisms by which immune cells achieve differential abundance of NAD^+-related metabolites, and the selective pressure to do so, can provide insights into the metabolic phenotypes underlying both cancer and other diseases involving dysfunctional immune responses^[235]56. Emerging spatial metabolomics (for example, high-resolution MALDI) and single-cell metabolomics (for example, rapid purification with paramagnetic beads, followed by mass spectrometry) technologies hold the promise of revealing the full extent of cell-type-specific metabolomic adaptations. Such approaches can be used alone or in tandem to measure the metabolomic profiles of individual cells in heterogeneous tissue slices^[236]57–[237]60, rendering the full extent of cell-type-specific metabolomic alterations within experimental reach. Our findings here nominate a new paradigm for understanding tumour-associated changes in the metabolome and how it relates to the cellular composition of the tissue microenvironment. Our discovery that the abundances of NAD^+-related metabolites change consistently across tissues as a function of immune cell abundance adds to a limited but pre-existing understanding of how cell-type-specific metabolism enables function (for example, in the accumulation of lipids in adipocytes and clear-cell tumour cells). More importantly, the discovery that the tumour metabolome changes significantly based on immune cell composition has significant implications for the interpretation of metabolomic data in the context of cancer. Prior studies (including those from our own group) have often used adjacent normal tissue as a reference to understand tumour-specific changes in metabolite levels. What the discoveries here reinforce is that changes in the bulk tumour metabolome may not be caused by tumour-cell-intrinsic changes in metabolism, and instead may in some or many cases arise from non-malignant cell populations. Discoveries about the function of cancer genes have emerged from a combination of untargeted, population-scale genomic surveys^[238]61 and mechanistic experiments in specific disease and genetic backgrounds^[239]62. Combining these approaches has proven transformative for the discovery of recurrent and large-effect-size alterations and prompted their characterization in model systems of human disease. In contrast, the field of cancer metabolism has primarily been driven by bottom-up experiments with modest support from large-scale (largely genetic or transcriptomic) datasets^[240]63–[241]65. The CAMP is a counterpoint to these efforts. By assembling and harmonizing in one database metabolomic and transcriptomic data from diverse diseases, the CAMP represents a unique opportunity for de novo discovery of translationally relevant metabolic phenotypes in cancer. Expanding the scope of the CAMP to include both additional multimodal metabolic data as it is published (enabled by open-source code; Data availability and Code availability) and other forms of data, including but not limited to genomic sequencing, epigenetic profiling and proteomic measurements, holds the potential to reveal entirely new and highly recurrent metabolic phenomena in cancer. Methods Collection of Cancer Atlas of Metabolic Profiles We combined 12 published datasets with 3 additional in-house datasets that profiled metabolite and gene expression from the same samples to create a comprehensive collection of 988 samples (764 tumour samples and 224 adjacent normal samples) across 11 different cancer types, covering 15 datasets, which we called the CAMP. Details and references