Abstract

   Tumour metabolism is controlled by coordinated changes in metabolite
   abundance and gene expression, but simultaneous quantification of
   metabolites and transcripts in primary tissue is rare. To overcome this
   limitation and to study gene–metabolite covariation in cancer, we
   assemble the Cancer Atlas of Metabolic Profiles of metabolomic and
   transcriptomic data from 988 tumour and control specimens spanning 11
   cancer types in published and newly generated datasets. Meta-analysis
   of the Cancer Atlas of Metabolic Profiles reveals two classes of
   gene–metabolite covariation that transcend cancer types. The first
   corresponds to gene–metabolite pairs engaged in direct enzyme–substrate
   interactions, identifying putative genes controlling metabolite pool
   sizes. A second class of gene–metabolite covariation represents a small
   number of hub metabolites, including quinolinate and nicotinamide
   adenine dinucleotide, which correlate to many genes specifically
   expressed in immune cell populations. These results provide evidence
   that gene–metabolite covariation in cellularly heterogeneous tissue
   arises, in part, from both mechanistic interactions between genes and
   metabolites, and from remodelling of the bulk metabolome in specific
   immune microenvironments.

   Subject terms: Cancer metabolism, Data integration, Metabolism,
   Computational biology and bioinformatics
     __________________________________________________________________

   Benedetti et al. produce a Cancer Atlas of Metabolic Profiles from 988
   metabolomic and transcriptomic tissue profiles across 15 studies to
   analyse and identify gene–metabolite interactions that transcend cancer
   types.

Main

   Coordinated changes of genetically encoded metabolic enzymes and
   transporters, and the metabolites they act on, underpin diverse
   cancer-associated phenomena, including tumorigenesis^[46]1,
   pluripotency^[47]2,[48]3, the onset of drug resistance^[49]4–[50]6 and
   the modulation of immune responses^[51]7–[52]11. However, despite the
   high value of joint profiling of metabolites and gene
   expression/protein levels, previous large-scale studies of tumour
   metabolism have overwhelmingly focused on the analysis of gene
   expression data^[53]12. Conversely, the few instances of multimodal
   metabolomic and transcriptomic profiling of human tumour specimens have
   largely been performed in disparate, unrelated studies by a multitude
   of research teams^[54]13–[55]22. Integration of metabolomic datasets
   produced in different patient cohorts is challenging due to technical
   batch effects and the semi-quantitative nature of untargeted
   metabolomic data (reported in arbitrary units of relative abundance).
   Thus, both the scarcity of multimodal metabolomic/transcriptomic data
   from tissue specimens and the challenges of harmonizing available
   datasets fundamentally impede the discovery of recurrent, coordinated
   changes in metabolic gene expression and metabolite abundance across
   cancers.

   Tumours from diverse cancer types differ in their cell-type
   composition, vascularization and other factors ultimately influencing
   metabolism. Yet, they share a convergent set of metabolic
   alterations^[56]23–[57]27. For example, several meta-analyses of the
   tumour metabolic transcriptome have identified recurrent upregulation
   of genes in one-carbon metabolism and oxidative phosphorylation across
   cancer types^[58]26,[59]28,[60]29. Analogously, meta-analyses of
   metabolomics data have demonstrated that numerous central carbon
   metabolites (for example, lactate) and effector metabolites (for
   example, kynurenine) are at higher abundance in tumour tissue compared
   to normal tissue across many cancer types^[61]23,[62]30. These studies
   have illustrated the power of meta-analysis for distilling highly
   recurrent metabolic phenotypes from heterogeneous data but have left
   unresolved the question of how metabolic gene expression and metabolite
   abundance are coordinated and ultimately shape tumour physiology.

   To systematically investigate gene–metabolite covariation in cancer, we
   assembled, harmonized and integratively analysed metabolomics and
   transcriptomics profiles from 988 primary tumour and matched adjacent
   normal tissue collected in 15 independent studies covering 11 cancer
   types. The preprocessed and harmonized data constitute the Cancer Atlas
   of Metabolic Profiles (CAMP), representing what is, to our knowledge,
   the largest harmonized dataset of multimodal metabolomic and
   transcriptomic data on primary tumour specimens. The CAMP is publicly
   available for download on Zenodo (10.5281/zenodo.7150252) and can be
   interactively explored at
   [63]https://rezniklab.shinyapps.io/CAMP-shiny-app/. Leveraging the
   diversity of diseases in our dataset, we designed a concordance-based
   statistical meta-analysis approach to discover instances of
   gene–metabolite interactions (GMIs) that transcended cancer type. This
   revealed two distinct classes of GMIs: First, we identified a small
   number of strong interactions between enzymes and metabolites involved
   in the same or subsequent reactions (‘proximal’ GMIs), suggesting that
   these enzymes are the primary determinants of their respective
   metabolite pool sizes. A second group of GMIs consisted of a small
   number of metabolites broadly correlated to large numbers of genes.
   Interestingly, this second class of GMIs was enriched for genes
   specifically expressed in immune cells, and for metabolites related to
   nicotinamide adenine dinucleotide (NAD^+), a pleiotropic metabolite
   that acts both as a central cofactor in metabolism^[64]31 and as a
   signalling molecule influencing cell identity^[65]32. Taken together,
   these findings suggest that gene–metabolite covariation in tumours
   emerges, in part, from two complementary phenomena: the expression of
   enzymes with strong control over metabolite pool size, and the presence
   of specific cell populations in the tumour microenvironment with
   characteristic metabolomic profiles.

Results

The Cancer Atlas of Metabolic Profiles

   Since metabolomic profiling has so far been excluded from large
   multimodal tumour profiling projects (for example, the TCGA^[66]33),
   there is no unified resource of metabolomic/transcriptomic data in the
   cancer research field. However, several groups have independently
   produced and released matched metabolomic/transcriptomic data in
   diverse cancer types^[67]13–[68]22. We combined these datasets with
   several in-house studies to create a comprehensive collection of 988
   samples (764 tumour samples and 224 adjacent normal samples) across 11
   different cancer types, covering 15 datasets, which we called the CAMP
   (Table [69]1 and Fig. [70]1a). The overall collection includes a total
   of more than 40,000 unique transcripts and almost 2,500 unique
   metabolites. To maximize comparability across these heterogeneous
   studies, we applied a unified workflow to process RNA expression from
   microarray and RNA-sequencing (RNA-seq) data, harmonize metabolite
   names and annotations, and standardize data normalization and
   preprocessing ([71]Methods). The CAMP represents an unprecedented
   resource to investigate the covariation of metabolite levels and gene
   expression at scale across diverse lineages of human cancers and normal
   tissues. To ensure high quality of the data, we evaluated several
   measures of quality control (QC). These included confirming that
   changes in metabolite abundance and gene expression between tumour and
   normal samples recapitulated those from prior work (Extended Data Fig.
   [72]1a,b), and demonstrating that covariation between pairs of
   metabolites was strongest between proximal metabolites in the metabolic
   network (that is, metabolite pairs acted upon by a common enzyme;
   Extended Data Fig. [73]1c).

Table 1.

   Overview of cancer type, sample size and type of gene expression data
   in the study
   Cohort Cancer type Reference Samples (tumour/normal) Gene expression
   data type
   BRCA1 Breast cancer Terunuma et al.^[74]19 61/47 Microarray
   BRCA2 Breast cancer Tang et al.^[75]20 18/– RNA-seq
   COAD Colon adenocarcinoma Satoh et al.^[76]18 37/39 Microarray
   DLBCL Diffuse large B cell lymphoma Calvo-Vidal et al.^[77]17 62/–
   RNA-seq
   GBM Glioblastoma Wang et al. ^[78]16 74/6 RNA-seq
   HürthleCC Hürthle cell carcinoma Ganly et al. ^[79]84 28/3 RNA-seq
   HCC Hepatocellular carcinoma Chaisaingmongkol et al. ^[80]15 54/–
   Microarray
   ICC Intrahepatic cholangiocarcinoma Chaisaingmongkol et al. ^[81]15
   86/– Microarray
   OV High-grade serous ovarian cancer Gentric et al. ^[82]14 45/–
   Microarray
   PDAC Pancreas adenocarcinoma Zhang et al. ^[83]21 27/12 Microarray
   PRAD Prostate adenocarcinoma Penney et al. ^[84]13,[85]85 91/46
   Microarray
   ccRCC1 Clear-cell renal carcinoma Hakimi et al. ^[86]22 32/– RNA-seq
   ccRCC2 Clear-cell renal carcinoma Golkaram et al. ^[87]86 30/– RNA-seq
   ccRCC3 Clear-cell renal carcinoma Golkaram et al. ^[88]86 67/47 RNA-seq
   ccRCC4 Clear-cell renal carcinoma Golkaram et al. ^[89]86 52/24 RNA-seq
   [90]Open in a new tab

Fig. 1. Summary of the Cancer Atlas of Metabolic Profiles.

   [91]Fig. 1
   [92]Open in a new tab

   a, The CAMP integrates metabolomic and transcriptomic data from 15
   datasets covering 11 different cancer types, comprising both tumour and
   normal tissue. BRCA1/BRCA2, breast cancer; COAD, colon adenocarcinoma;
   DLBCL, diffuse large B cell lymphoma; GMB, glioblastoma; HürthleCC,
   Hürthle cell carcinoma; OV, high-grade serous ovarian cancer; PDAC,
   pancreas adenocarcinoma; PRAD, prostate adenocarcinoma;
   ccRCC1/ccRCC2/ccRCC3/ccRCC4, clear-cell renal carcinoma. b, Overview of
   non-parametric concordance meta-analysis. For a given transcript (T)
   and metabolite (M) pair, all measurements from all datasets were
   considered and weighted according to sample size. The concordance is a
   non-parametric measure of bivariate association between T and M that
   can be applied in meta-analysis across multiple datasets.

   [93]Source data

Extended Data Fig. 1. The Cancer Atlas of Metabolic Profiles (CAMP) data
recapitulate prior work interactions and demonstrate robust GMIs.

   [94]Extended Data Fig. 1
   [95]Open in a new tab

   (A)Differential abundance between tumor and normal samples across 7
   cancer types. Blue circles indicate metabolites depleted in tumor
   related to normal tissue and red circles indicate increased in tumor
   related to normal tissue. Grey circles mean metabolites with no
   statistically significant change in abundance. (B) Scatterplot of gene
   expression differential fraction score and differential abundance score
   in the KEGG metabolic pathways. (C) Boxplot comparing absolute
   concordance to metabolic distance from 159 metabolites that with KEGG
   id in the human 1 GEM model (12560 pair-wise distance). The data is
   displayed as a boxplot, in which the box bounds represent the 25th to
   the 75th percentile, with the median shown as a line in the center. The
   bottom whisker extends to the smallest value that is not an outlier,
   and the upper whisker extends to the largest value that is not an
   outlier. (D) Barplot of statistically significant GMIs and the number
   of individual datasets the GMI is significant in after FDR correction.

   [96]Source data

   We interrogated the CAMP dataset for recurrent covariation between
   genes and metabolites across datasets and cancers. Such gene–metabolite
   covariation could emerge via numerous mechanisms including direct
   metabolic interactions, for example, via expression changes of a
   rate-limiting enzyme, or by the accumulation/depletion of metabolites
   as part of a broader phenotype, such as a cytotoxic immune
   response^[97]34–[98]36. While each cancer type is likely to demonstrate
   its own unique pattern of transcriptomic and metabolomic changes, the
   CAMP enables discovery of metabolomic/transcriptomic covariation that
   transcends diseases. To identify cancer-type-agnostic metabolite and
   transcript covariation across CAMP datasets in a statistically
   principled manner, we developed a concordance-based meta-analysis
   approach (Fig. [99]1b and [100]Methods). Concordance is a
   non-parametric measure of correlation, which enables the identification
   of consistently positive or negative gene–metabolite associations
   across datasets^[101]37. Our measure of concordance ranges from –1 to 1
   and is closely related to non-parametric correlation coefficients such
   as Kendall’s tau, with a value of –1 corresponding to strong
   discordance and 1 corresponding to strong concordance ([102]Methods).
   We focused our analysis on tumour samples only in the CAMP (although
   analogous analysis could be carried out on normal samples), and focused
   on the 276 metabolites that were quantified in more than half of the
   tumour datasets (at least 8 of 15 tumour datasets) and the 16,082 genes
   that were quantified in all 15 studies. Of all possible gene–metabolite
   pairs (276 metabolites × 16,082 genes = 4,438,632 pairs), a total of
   22,619 pairs (0.51%) were significantly correlated after
   multiple-testing correction at a false discovery rate (FDR) of 0.01
   (Fig. [103]2a). This included 269 metabolites (~97%) and 7,987 genes
   (~50%) participating in at least one significant association
   (Supplementary Table [104]1), which we refer to as GMIs. Post-hoc QC
   confirmed that 7,737/22,619 (78%) of GMIs were statistically
   significant in two or more individual datasets (Extended Data Fig.
   [105]1d). Statistically significant GMIs identified in single-study
   concordance analysis of the BRCA1 (microarray) and BRCA2 (RNA-seq) were
   highly consistent, indicating that the choice of transcriptomic
   profiling technology (that is, microarray versus RNA-seq) did not
   introduce substantial artefacts (Extended Data Fig. [106]2).
   Importantly, the results of the concordance were not affected by
   imputation, and an analogous concordance analysis omitting all imputed
   data reproduced 20,291/22,619 (89.7%) of statistically significant
   GMIs. Finally, we examined the consistency of GMIs (considering only
   those measured in >7 distinct datasets) generated from two distinct,
   non-overlapping subsamples of the full CAMP dataset. This analysis
   identified a high degree of consistency between estimates of
   concordance for each GMI (Spearman rho 0.31, P value < 10^−^16),
   confirming the robustness of the results of our concordance
   meta-analysis.

Fig. 2. Meta-analysis across the CAMP captures lineage-transcending
gene–metabolite interactions.

   [107]Fig. 2
   [108]Open in a new tab

   a, Volcano plot of the GMIs computed between the 16,082 genes present
   in all datasets, and the 276 metabolites present in at least 8 of our
   15 tumour cohorts. The x axis indicates the scaled concordance value,
   where values above 0 indicate positive association and values below 0
   indicate negative association. The y axis represents the corresponding
   −log[10] FDR-adjusted P value. Two-tailed P values were estimated from
   the unscaled concordance value’s z-score ([109]Methods) and were
   corrected for multiple testing using the Benjamini–Hochberg method. The
   horizontal line indicates the significance cut-off of 0.01 FDR. Light
   grey dots indicate nonsignificant gene–metabolite pairs, and black dots
   indicate significant pairs. Three top hits have been highlighted. b,
   Statistically significant GMIs are enriched for proximal interactions,
   but proximal interactions nevertheless constitute a minority of all
   statistically significant GMIs. Of all significant gene–metabolite
   pairs in our concordance meta-analysis, 3,304/22,619 pairs had a
   defined distance (~14.61%), but only 565/22,619 (~2.50%) of these were
   proximal. c, Proximal GMI prioritization. GMIs for the 22 metabolites
   whose strongest GMI was proximal (distance less or equal 2). For each
   metabolite, we ranked genes by their statistical significance.
   Two-tailed P values were estimated from the unscaled concordance
   value’s z-score ([110]Methods) and were corrected for multiple testing
   using the Benjamini–Hochberg method. Red and black dots indicate
   proximal and non-proximal genes significantly associated with the
   corresponding metabolite, respectively, while grey dots indicate genes
   with nonsignificant associations. Bold metabolites exhibit a large gap
   between the dominant GMI and all other GMIs for a metabolite. d,e,
   Scatterplots of the association between kynurenine levels and two
   proximal genes (IDO1, d; AFMID, e). Metabolite abundances were scaled
   within each dataset to be displayed together. Two-tailed P values were
   estimated from the unscaled concordance value’s z-score ([111]Methods)
   and were corrected for multiple testing using the Benjamini–Hochberg
   method. f, CRISPR–CAS9-mediated knockout of IDO1 depleted IDO1 protein
   levels in HCT116 cells. Western blot was performed once and not
   repeated. g, Kynurenine levels were depleted upon knockout of IDO1 in
   HCT116 cells (n = 3 in each condition). Data are presented as mean
   values ± s.d. h, Scatterplot of the association between tryptophan
   levels and IDO1 in the CAMP. Two-tailed P values were estimated from
   the unscaled concordance value’s z-score ([112]Methods) and were
   corrected for multiple testing using the Benjamini–Hochberg method. i,
   Tryptophan levels increase upon IDO1 knockout (n = 3 in each
   condition). Data are presented as mean values ± s.d. j, Scatterplot of
   oxidized glutathione (GSSG) and GGT1. Two-tailed P values were
   estimated from the unscaled concordance value’s z-score ([113]Methods)
   and were corrected for multiple testing using the Benjamini–Hochberg
   method. k, Validation of the GSSG and GGT1 relationship was based on
   the study from Priolo et al.^[114]42. The dataset includes four data
   points for each condition. The two-tailed P value was estimated with a
   Wilcoxon rank-sum test. a.u., arbitrary units.

   [115]Source data

Extended Data Fig. 2. The statistically significant GMIs identified from
different transcriptomic profiling technology (microarray v.s. RNA
sequencing) were highly consistent.

   Extended Data Fig. 2
   [116]Open in a new tab

   Scatterplot of statistically significant GMIs in BRCA1 (microarray) and
   BRCA2 (RNA sequencing) P-values was estimated from Pearson’s
   correlation test. Axes represent concordance values.

   [117]Source data

A subset of gene–metabolite covariation represent direct enzyme–substrate
interactions

   Among the statistically significant GMIs (Fig. [118]2a), we noted that
   the strongest positively correlated GMI (IDO1–kynurenine, adjusted P
   value = 4.82 × 10^−^35) and the two strongest negatively correlated
   GMIs, GDA–guanine (adjusted P value = 2.31 × 10^−^20) and
   CD38–nicotinamide mononucleotide (NMN; adjusted P
   value = 3.90 × 10^−^16) corresponded to ‘proximal’ metabolic
   interactions, which are interactions between an enzyme and its direct
   or nearly direct substrate/product. For example, IDO1 catalyses the
   catabolism of tryptophan to N-formyl-kynurenine, which is subsequently
   metabolized to kynurenine^[119]38, and both CD38 and GDA directly
   degrade the metabolites guanine and NMN, respectively^[120]39,[121]40.
   We confirmed that statistical significance for these three GMIs was
   likely driven by several cancer types rather than a single dataset with
   very strong associations (Extended Data Fig. [122]3).

Extended Data Fig. 3. Statistical significance of these 3 GMIs was likely
driven by several cancer types.

   Extended Data Fig. 3
   [123]Open in a new tab

   Box plots of concordance for gene-metabolite pair of CD38-MNM (n=8),
   GDA-guanine (n=13) and IDO1-kynurenine (n=12) across individual
   datasets. The data is displayed as a boxplot, in which the box bounds
   represent the 25th to the 75th percentile, with the median shown as a
   line in the center. The bottom whisker extends to the smallest value
   that is not an outlier, and the upper whisker extends to the largest
   value that is not an outlier.

   [124]Source data

   The direct biochemical relationship between the three GMIs above raised
   the possibility that functional proximity between enzymes and their
   substrates/products might underlie a large fraction of GMIs. To test
   this, we systematically computed the biochemical distance between all
   gene–metabolite pairs using the highly curated Human-1 metabolic
   network model from Robinson et al.^[125]41. In this framework, a
   distance of one represents molecules that are involved in the same
   metabolic reaction, and a distance of two indicates a gene–metabolite
   pair that take part in subsequent reaction steps ([126]Methods).
   Although statistically significant GMIs were enriched for proximal
   interactions relative to nonsignificant gene–metabolite pairs (odds
   ratio 1.42, Fisher’s exact test P value = 3.09 × 10^−^15), proximal
   interactions themselves constituted only a small fraction of the total
   ensemble of statistically significant GMIs (2.5%, 565/22,619; Fig.
   [127]2b). Thus, while several of the strongest GMIs arose from proximal
   interactions, gene–metabolite proximity was a weak determinant of the
   full GMI landscape.

   To further investigate the above observations about the strength of
   specific proximal GMIs and the overall relationship between metabolic
   proximity and GMIs, we investigated the relative strength of different
   GMIs affecting a common metabolite. We then focused on the 22
   metabolites whose strongest GMI was proximal, covering diverse
   molecules involved in nucleotide metabolism (guanine, cytidine),
   cofactor metabolism (NAD^+), redox metabolism (cystine, oxidized
   glutathione (GSSG)) and other pathways (Fig. [128]2c). Interestingly,
   we found that for 8 of 22 metabolites (kynurenine, guanine, NMN, GSSG,
   tryptophan, glycerophosphocholine (GPC), cystine and cytidine), a large
   gap existed between the most significant GMI and the second-highest
   correlating transcript. This gap suggested that the pool size of these
   metabolites was strongly controlled, in a lineage-agnostic manner, by a
   single, dominant gene. Consequently, we hypothesized that targeted
   genetic knockdown of dominant GMIs for these 8 metabolites would have a
   higher likelihood of producing significant changes in pool size than
   for other metabolites with multiple, comparably strong GMIs, each of
   which might control the pool size of the metabolite (Fig. [129]2c).
   Similar results were found when relaxing the threshold for calling
   gene–metabolite proximity (Extended Data Fig. [130]4).

Extended Data Fig. 4. Gene-metabolite proximity at relaxed distance
thresholds has similar results.

   [131]Extended Data Fig. 4
   [132]Open in a new tab

   -Log[10] adjusted p-value of GMIs when definition of proximity is 2
   (top), and when it is relaxed to 3 (middle) and 4 (bottom). Two-tailed
   p-values were estimated from the unscaled concordance value’s z-score
   (see Methods) and were corrected for multiple testing using the
   Benjamini-Hochberg method. Adjusted p-values are multiplied by
   concordance sign, thus values below 0 indicate negative covariation.

   [133]Source data

   We sought to functionally validate a subset of these predicted,
   metabolite pool-controlling genes. First, we investigated the
   association between IDO1 and two metabolites, kynurenine and
   tryptophan. IDO1 converts tryptophan to N-formylkynurenine, which is
   subsequently catabolized to kynurenine by AFMID. We observed that
   kynurenine levels were strongly associated with IDO1 (Fig. [134]2d) but
   not AFMID (which acts directly to produce kynurenine; Fig. [135]2e)
   expression across the CAMP. These findings were consistent with
   independent measurements of the metabolome and transcriptome obtained
   from the Cancer Cell Line Encyclopedia consortium on ~900 cell lines,
   where IDO1 expression was associated with kynurenine abundance
   (P = 8.6 × 10^−^9) but AFMID expression was not (P = 0.8) (ref.
   ^[136]36) (Extended Data Fig. [137]5a,b). To experimentally test the
   hypothesis that disruption of IDO1 impacts both tryptophan and
   kynurenine pool sizes, we used CRISPR–Cas9-mediated knockout with
   single-guide RNAs (sgRNAs) targeted against IDO1 human colorectal
   carcinoma HCT116 cells. These experiments corroborated earlier data
   indicating that knockout of IDO1 depleted kynurenine pools (Fig.
   [138]2f,g). However, while the association between IDO1 and kynurenine
   has been widely described in the literature^[139]36, our analysis
   indicated that IDO1 is also expected to determine the pool size for
   tryptophan, an amino acid involved in numerous other reactions in the
   cell, most obviously the synthesis of proteins (Fig. [140]2h).
   Consistent with this observation, we observed that knockout of IDO1 was
   sufficient to increase tryptophan levels, indicating that the pool size
   of this highly connected proteinogenic amino acid could be perturbed in
   part through disruption of IDO1 activity (Fig. [141]2i). Second, in
   support of a proximal GMI between GGT1 and GSSG (Fig. [142]2j), we
   reanalysed existing metabolomic data from a functional knockdown of
   GGT1 versus control in human embryonic kidney HEK293T
   cells^[143]42,[144]43. This data confirmed that knockdown of GGT1 was
   associated with an increase in GSSG levels with respect to mock control
   (P value = 2.90 × 10^−^2), suggesting that GGT1 is a pool-determining
   consumer of GSSG (Fig. [145]2k). Taken together, these data demonstrate
   that lineage-transcending GMIs discovered through pathway-based
   analysis of the CAMP represent examples of genes exerting strong
   control over metabolite pool sizes.

Extended Data Fig. 5. IDO1-kynurenine and AFMID-kynurenine correlations from
the CCLE consortium.

   [146]Extended Data Fig. 5
   [147]Open in a new tab

   (A) Scatterplots comparing kynurenine abundance to IDO1, and (B) AFMID
   expression in independent metabolome and transcriptome data obtained
   from the CCLE consortium. P-value was estimated from Spearman’s rank
   correlation test.

   [148]Source data

Pathway-level metabolic and transcriptomic changes weakly covary

   Despite the interesting findings related to proximal GMIs, most (97.5%)
   statistically significant GMIs represented distant, non-proximal
   interactions beyond obvious enzyme–substrate metabolic relationships
   (Fig. [149]2b). One possible implication of such nonlocal covariation
   is that genes and metabolites in the same metabolic pathway would show
   asynchronous and uncorrelated changes across different groups of
   samples, such as tumours and normal tissues. To investigate this
   hypothesis, we studied the consistency of transcriptional and metabolic
   differences in tumour versus adjacent normal tissue across cancer
   types. To this end, we performed differential analysis of metabolite
   and transcript levels between tumour and normal tissues in the 7 CAMP
   datasets where both tissues were available (Table [150]1) and
   aggregated the results into 85 Kyoto Encyclopedia of Genes and Genomes
   (KEGG) metabolic pathways. Of these, we considered 63 pathways with at
   least one metabolite or gene measured in at least 5 of the 7 CAMP
   datasets (Supplementary Table [151]2).

   For each KEGG pathway, we evaluated (using a differential abundance
   (DA) score; [152]Methods) whether metabolites and transcripts showed
   synchronous accumulation or depletion patterns in tumours relative to
   normal tissues. Pathways were biased towards asynchronous changes
   (276/441, 63%), where increases in metabolite levels coincided with
   decreases in transcript levels, and vice versa (Fig. [153]3a). Only one
   pathway (histidine metabolism) demonstrated fully synchronous changes
   in all datasets, whereas a few others demonstrated uniformly
   asynchronous changes (for example, primary bile acid biosynthesis). We
   also assessed whether there was a correlation between the extent of
   metabolomic versus transcriptomic disruption regardless of the
   direction (using a differential fraction (DF) score; [154]Methods). A
   minority of pathways (9/63) demonstrated significant associations
   (nominal P value < 0.05) between RNA and metabolite DF scores (Fig.
   [155]3b; see Fig. [156]3c as an example). Interestingly, these 9
   pathways belonged to just 2 KEGG pathway classes (Supplementary Table
   [157]3): amino acid metabolism and carbohydrate metabolism. Enrichment
   analysis indicated that the class of carbohydrate metabolism pathway
   (of 8 in total) was significantly over-represented relative to the
   others (Fisher’s exact test P value: 2.36 × 10^−^2). Thus, most
   pathways showed no evidence of a correlation between metabolomic and
   transcriptomic disruption, emphasizing the implications of
   predominantly distally acting GMIs and prompting the question of which
   biological phenomena produce these distal GMIs.

Fig. 3. Tumour versus normal changes in metabolite and transcript abundance
are predominantly asynchronous.

   [158]Fig. 3
   [159]Open in a new tab

   a, Heat map of metabolite and transcript DA scores across datasets and
   pathways, capturing the tendency for metabolites and genes to
   accumulate or deplete in tumours relative to normal tissues. The size
   of the dots indicates the number of molecules measured in that pathway,
   while the colour represents the DA score. b, Spearman correlation
   coefficients of the metabolite and transcript DF scores in KEGG
   pathways. Red dots indicate nominal significance (P value < 0.05). A
   minority of pathways showed significant association between metabolomic
   and transcriptomic disruption across CAMP datasets. P values were
   estimated from Spearman’s rank correlation test and were corrected for
   multiple testing using the Benjamini–Hochberg method. c, Example of
   Spearman correlation calculation: Distribution of the metabolite (x
   axis) and transcript (y axis) DF scores across datasets for the citrate
   cycle (tricarboxylic acid (TCA) cycle) pathway. P values were estimated
   from Spearman’s rank correlation test.

   [160]Source data

Hub metabolites are enriched for immune genes associations

   Having established that most GMIs do not represent biochemically
   proximal interactions, we adopted a broader approach to identify the
   driving factors of metabolite–transcript correlations. First, we
   investigated the distribution of GMIs across metabolites and genes,
   observing that GMIs were strongly concentrated in a small number of
   metabolites (Fig. [161]4a). The top three metabolites with the highest
   number of GMIs, quinolinate, NMN and 5′-methylthioadenosine alone
   contributed to 17% (3,823/22,619) of all GMIs in our analysis, and the
   top ten metabolites covered 35% of the GMIs (8,048/22,619), far higher
   than the fraction covered by the top eight genes (Fig. [162]4b and
   Extended Data Fig. [163]6a,b). That is, a small number of metabolites
   participated in an exceptionally high number of GMIs, acting as ‘hubs’
   for strong covariation with gene expression. Interestingly, hub
   metabolites concentrated in certain metabolic pathways. Among the top
   ten most correlated metabolites, we found several constituents of the
   NAD^+ biosynthesis pathway (quinolinate, NMN and NAD^+) and nucleotide
   metabolism (thymine, uracil and adenine).

Fig. 4. Gene–metabolite interactions are concentrated in hub metabolites
associated with immune-related genes.

   [164]Fig. 4
   [165]Open in a new tab

   a, Distribution of significant GMIs across metabolites. The top ten
   metabolites with the highest number of significant associations are
   labelled. b, Distribution of significant GMIs across genes. The top
   eight genes with the highest number of significant associations are
   labelled. c, KEGG-based gene pathway enrichment analysis results for
   metabolites with at least one GMI and at least one significantly
   enriched pathway. The heat map colour represents the strength of the
   enrichment as the negative log[10](P value) of the pathway enrichment
   test. Two-sided P values were estimated with a Fisher’s exact test and
   adjusted for multiple testing using the Benjamini–Hochberg method.
   Cells coloured in shades of red indicate pathways that were significant
   after multiple-testing corrections (adjusted P value < 0.01), and grey
   cells indicate insignificant associations. Individual pathways were
   classified based on the type of process they describe: metabolism
   (lilac), immune (green) or other (cream). MTA, 5′-methylthioadenosine.

   [166]Source data

Extended Data Fig. 6. 200 strongest non-proximal GMIs.

   [167]Extended Data Fig. 6
   [168]Open in a new tab

   Barplot of metabolite involved in the 200 strongest non-proximal GMIs
   excluding pairs without a distance (A) and including pairs without a
   distance (B). Pairs without a distance are denoted as NA in
   Supplementary Table [169]1. Metabolites in black belong to the NAD+
   metabolism pathway while all others are in grey.

   [170]Source data

   To determine whether the genes correlated with a particular metabolite
   were enriched for specific cellular functions, we performed an
   unsupervised pathway enrichment analysis. For each metabolite with at
   least one GMI, we investigated whether specific pathways and processes
   were over-represented in the transcripts correlated with that
   metabolite. Overall, the considered transcripts spanned over 146 KEGG
   pathways (Supplementary Table [171]4), covering the 85 metabolic
   processes in Fig. [172]3 as well as cellular processes (for example,
   cell growth and death), signalling pathways, genetic processing
   pathways (for example, transcription and translation) and organismal
   systems pathways (for example, immune, endocrine and sensory systems).
   A total of 32 unique pathways were over-represented across 40
   metabolites (adjusted P value < 0.01; Fig. [173]4c). Interestingly,
   only 2 of those 32 pathways represented metabolic processes: oxidative
   phosphorylation, for which the top three most associated metabolites
   were adenine, NMN and FAD, and the TCA cycle, also associated with FAD.
   The remaining top-ranked pathways were exclusively non-metabolic ones.
   Three metabolites in particular, quinolinate, NMN and NAD^+, showed
   broad enrichment for immune-related cellular processes, including
   chemokine signalling, as well as B cell and T cell antigen receptor
   signalling pathways. This indicates that these metabolites are
   correlated with the expression of a wide array of genes associated with
   the immune response.

NAD^+-related metabolites associate with immune cell infiltration

   Human tumour tissues are heterogeneous compositions of various cell
   populations, including tumour cells, immune cells and stromal cells.
   Bulk and single-cell profiling technologies have established that a
   large subset of genes are exclusively expressed in immune cells or
   non-immune cell subpopulations (for example, tumour
   cells)^[174]44,[175]45. We reasoned that the correlation between
   NAD^+-related metabolites and immune-related genes (Fig. [176]4c) could
   therefore arise if NAD^+-related metabolites were at a
   characteristically higher or lower abundance in immune cells relative
   to non-immune cells. One implication of this hypothesis is that, while
   each cancer type might demonstrate its own unique metabolomic changes
   associated with immune infiltration, NAD^+-related metabolites should
   be expected to demonstrate consistent effects across many different
   cancer types.

   To determine if NAD^+-related metabolites were associated with immune
   infiltration across tumours, we used single-sample gene-set enrichment
   analysis (ssGSEA)^[177]46 to compute a previously validated 141-gene
   RNA signature of overall immune cell infiltration (ImmuneScore)
   directly from bulk RNA-seq data^[178]47, and identified the individual
   metabolites correlated with this immune phenotype. Concordance between
   metabolite levels and the ImmuneScore signature was assessed across all
   samples in each CAMP dataset (Fig. [179]5a). In general, covariation
   between specific metabolite pools and ImmuneScore expression was
   cancer-type specific. For example, of the metabolites significantly
   associated with ImmuneScore in intrahepatic cholangiocarcinoma (ICC)
   and hepatocellular carcinoma (HCC; representing the top two cancer
   types with the highest number of metabolites significantly associated
   with ImmuneScore), only 3 metabolites were consistently associated with
   ImmuneScore in both datasets (Fig. [180]5b and Supplementary Table
   [181]5). We did not observe a significant correlation (Spearman’s rank
   correlation P value = 0.69) between the percentage of metabolites
   significantly associated with ImmuneScore and expression of the
   ImmuneScore signature itself (Fig. [182]5a), suggesting that the extent
   of immune infiltration did not confound our analyses.

Fig. 5. The abundance of NAD^+-related metabolites is shaped by immune
infiltration in the tumour microenvironment.

   [183]Fig. 5
   [184]Open in a new tab

   a, Left, association of metabolite abundance with immune infiltration
   varied significantly across cancer types. The bar plot indicates the
   fraction of metabolites significantly associated with the ImmuneScore
   signature in each dataset. Bar colours code for the sample size of the
   dataset. Sample size is the number of tumour samples of the dataset in
   Fig. [185]1a. Middle, box plots indicate the expression range of the
   ImmuneScore signature within each dataset. Right, plot of scaled
   concordance calculated between metabolites and ImmuneScore. Red dots
   indicate metabolites with a positive association with ImmuneScore, blue
   dots indicate metabolites with negative associations, and grey dots
   indicate metabolites that were not statistically significant. b, Heat
   maps of five metabolites—three that were highly correlated with
   ImmuneScore in the ICC dataset and two that were highly correlated with
   ImmuneScore in the HCC dataset. Samples were sorted by increasing
   ImmuneScore. UDP-G, UDP-glucuronate; GPC-16:1, 1-palmitoleoyl-GPC
   (16:1); GPC-18:2, 1-linoleoyl-GPC (18:2). c, Bar plot indicating the
   strength of association between metabolites and ImmuneScore from
   concordance meta-analysis. Two-tailed P values were estimated from the
   unscaled concordance value’s z-score ([186]Methods) and were corrected
   for multiple testing using the Benjamini–Hochberg method. Bar length
   represents the −log[10] adjusted P value. Metabolites related to NAD^+
   metabolism are shown in black. d, Scatterplots of the abundance of two
   NAD^+-related metabolites, quinolinate and NMN, versus ImmuneScore
   expression across all datasets. Metabolite abundances were scaled
   within each dataset. e, Bar plot comparing the absolute concordance
   values of metabolites to ImmuneScore in a pathway compared to all other
   pathways (one-sided P values were estimated from Wilcoxon rank-sum test
   and were corrected for multiple testing using the Benjamini–Hochberg
   method). f, Metabolomic measurements of purified populations of CD45^−
   tumour cells and CD45^+ T cells isolated from ovarian cancer tumours.
   NAD^+ was negatively correlated with the ImmuneScore signature in the
   ovarian cancer dataset (n = 45). NAD^+ was similarly lower in abundance
   in CD4^+/CD8^+ cells than CD45^− (non-immune) cells in the dataset of
   purified cell populations (n = 24 in CD4^+/CD8^+; n = 18 in CD45^−).

   [187]Source data

   While metabolite concordance with ImmuneScore was generally
   heterogeneous across CAMP datasets, we observed that several
   NAD^+-related metabolites, including quinolinate and NMN, were
   recurrently associated with high or low levels of immune infiltration
   in numerous disease contexts (Fig. [188]5a). To systematically identify
   such lineage-agnostic metabolomic correlates of immune infiltration, we
   again applied concordance meta-analysis, identifying 14 metabolites
   significantly associated with ImmuneScore across datasets (adjusted P
   value < 0.05; Fig. [189]5c), with quinolinate and NMN being the
   strongest hits (Fig. [190]5d). Interestingly, 4/14 significantly
   associated metabolites (quinolinate, NAD^+, NMN and kynurenine) were
   members of the NAD^+ biosynthesis pathway. Consistent with this, we
   identified NAD^+ metabolism as the sole pathway whose metabolites
   demonstrated significantly stronger association with immune
   infiltration than all other pathways (adjusted P value = 0.04; Fig.
   [191]5e).

   The above analysis suggested that NAD^+-related metabolites were at
   differential abundance in immune cells relative to non-immune cells,
   and that this effect produced a consequent accumulation of
   NAD^+-related metabolites in immune-infiltrated tumours. Some support
   for this hypothesis can be found in previously published
   immunohistochemical data indicating that the abundance of quinolinate
   increases dramatically in diverse immune cell populations in response
   to Toll-like receptor 4 ligands such as lipopolysaccharide^[192]48. To
   provide more evidence for this hypothesis, we compared our findings to
   a recently published study of metabolomic profiles of purified CD45^−
   tumour cells and CD45^+ (CD8^+ and CD4^+) T cells from ovarian cancer
   tumours^[193]48. In our data, NAD^+ was negatively correlated with
   ImmuneScore in ovarian cancer, suggesting that it was at lower
   abundance in immune cells relative to non-immune cells (Fig. [194]5f).
   Consistent with this, NAD^+ was at significantly lower abundance in
   CD45^+ T cells than CD45^− tumour cells in the dataset of purified cell
   populations (NAD^+ log[2] fold change = −1.22, P value = 1.7 × 10^−^4;
   Fig. [195]5f). Together, these analyses suggest that the pool sizes of
   NAD^+-related metabolites are at characteristically different abundance
   in immune cells relative to other cell types, and that this effect
   ultimately drives the association of quinolinate and other
   NAD^+-related metabolites with immune infiltration in bulk tumour data.

Kynurenine and histamine levels correlate to specific cell populations

   Whereas in the previous analysis we investigated the association of
   metabolite levels with overall immune infiltration, we next turned to
   investigating the association of metabolite levels and specific immune
   cell populations (for example, T cells, macrophages and numerous other
   cell types), each with unique transcriptional phenotypes and
   immunological functions^[196]49–[197]51. To investigate how these
   diverse immune cell populations contributed to the observed GMIs, we
   estimated the abundance of 23 immune cell types from bulk
   transcriptomics profiles^[198]51 using ssGSEA as shown
   previously^[199]46, and computed their association with metabolite
   levels across cancers. We again focused on lineage-agnostic
   relationships by performing a concordance meta-analysis to calculate
   associations between metabolite levels and immune cell signatures
   across tumours from all cancer types. In total, 7.3% of all
   metabolite–signature pairs (466 of 6,348 pairs) demonstrated
   statistically significant associations (adjusted P value < 0.05; Fig.
   [200]6a). Among these, quinolinate was positively associated with
   almost all immune cell populations (17/23), consistent with prior
   immunohistochemical data and suggesting that it accumulates in a
   variety of immune cell types in a cancer-type-agnostic manner^[201]52.

Fig. 6. A subset of metabolites associates with specific immune cell
lineages.

   [202]Fig. 6
   [203]Open in a new tab

   a, Volcano plot of cell-type signature–metabolite interactions. The rug
   plot at the bottom highlights the numerous associations with
   quinolinate. b, Manhattan plot of adjusted concordance P values between
   metabolites and cell types. c, Histamine associates with the abundance
   of mast cells across most datasets of the CAMP. d, Bar plot of adjusted
   concordance P values of metabolites correlated with the mast cell gene
   signature. The red dashed line indicates the significance cut-off of
   0.05. Metabolites labelled in red are associated with histamine
   metabolism. e, HDC expression strongly associates with mast cell
   abundance across the CAMP. f, Kynurenine abundance associates with an
   aDC signature across the CAMP. In a–f, two-tailed P values were
   estimated from the z-scores of the unscaled concordance values
   ([204]Methods) and were corrected for multiple testing using the
   Benjamini–Hochberg method.

   [205]Source data

   Aside from quinolinate, the two strongest associations between immune
   cell-type signatures and metabolite levels were two comparatively rare
   cell populations, namely mast cells and histamine and activated
   dendritic cells (aDCs) and kynurenine (Fig. [206]6b). Mast cells are a
   myeloid cell population that, when stimulated, mediate the inflammatory
   process by synthesizing histamine from histidine using the enzyme HDC.
   We found both histamine and its related metabolite 1-methylhistamine
   were the two metabolites most significantly associated with the
   presence of mast cells (Fig. [207]6c,d), and that the association
   between histamine and mast cell abundance was driven by diverse cancer
   types in the CAMP (Extended Data Fig. [208]7a). Histamine levels
   themselves were strongly associated with the expression of HDC across
   the CAMP (Fig. [209]6e). Importantly, the mast cell signature contains
   HDC, but when the signature was recalculated without HDC the
   significant positive concordance was preserved (concordance = 0.28, P
   value = 3.4 × 10^−^17; Extended Data Fig. [210]7b). Moreover,
   single-cell data also indicate that HDC expression is strongly elevated
   in mast cells relative to other cell types (Extended Data Fig.
   [211]7c). Interestingly, the median variation of histamine across the
   tumour datasets in the CAMP was ~740-fold, implying that fluctuations
   in the abundance of an otherwise rare cell population were sufficient
   to produce large-magnitude changes in histamine abundance in the bulk
   tumour.

Extended Data Fig. 7. Diverse cancer types likely drove the association
between histamine and mast cells.

   [212]Extended Data Fig. 7
   [213]Open in a new tab

   (A) Box plots of concordance of histamine – mast cells across 13
   datasets. The data is displayed as a boxplot, in which the box bounds
   represent the 25th to the 75th percentile, with the median shown as a
   line in the center. The bottom whisker extends to the smallest value
   that is not an outlier, and the upper whisker extends to the largest
   value that is not an outlier. (B) Scatterplots comparing expression of
   the mast cell signature without HDC to histamine abundance. Two-tailed
   p-values were estimated from the unscaled concordance value’s z-score
   (see Methods). (C) Heatmap of single cell gene expression data for HDC
   across datasets from the TISCH2 database^[214]85,[215]86. Expression is
   standardized to z-scores across cell types in a dataset. Considering
   only datasets in which mast cells are identified, mast cells exhibit
   highest expression of HDC.

   [216]Source data

   In contrast to mast cells and their physiological role in producing
   histamine, DCs are not known to be dedicated sources of kynurenine in
   the microenvironment, although single-cell data indicate that IDO1
   expression is strongly elevated in DCs relative to other cell types
   (Extended Data Fig. [217]8a). While the aDC signature contains IDO1
   (and IDO1 participates in a strong GMI with kynurenine; Fig. [218]2),
   an aDC signature without IDO1 preserved strong positive concordance
   with kynurenine (concordance = 0.16, P value = 3.3 × 10^−^5; Extended
   Data Fig. [219]8b), confirming that this association was not solely
   dependent on IDO1 itself (Fig. [220]6f). IDO1 and kynurenine
   corresponded to the strongest GMI in the CAMP, raising two opposing
   hypotheses on the mechanism underlying the primary source of kynurenine
   in the tumour microenvironment: either bulk tumour expression of IDO1
   is primarily determined by high DC-specific expression or,
   alternatively, it is driven by a comparatively low expression in far
   more abundant non-DC-cell populations. Resolving this ambiguity would
   require single-cell measurements of metabolite levels, which have
   recently become technically feasible^[221]53. Together, these data
   suggest that the presence of individual and comparatively rare cell
   populations is associated with shifts in the abundance of
   immunomodulatory metabolites in the tumour microenvironment.

Extended Data Fig. 8. The association between kynurenine and dendritic cells
was not solely dependent on IDO1 itself.

   [222]Extended Data Fig. 8
   [223]Open in a new tab

   (A) Heatmap of single cell gene expression data for IDO1 across
   datasets from the TISCH2 database^[224]85,[225]86. Expression is
   standardized to z-scores across cell types in a dataset. Considering
   only datasets in which dendritic cells are identified, dendritic cells
   exhibit the highest expression of IDO1. (B) Scatterplots comparing
   expression of the aDC signature without IDO1 to kynurenine abundance.
   Two-tailed p-values were estimated from the unscaled concordance
   value’s z-score (see Methods).

   [226]Source data

Discussion

   Metabolism is jointly controlled by genetically encoded enzymes and
   small-molecule metabolites. To study the interactions between genes and
   metabolites at scale, we assembled and harmonized a database of
   metabolomic and transcriptomic data from ~1,000 tumour and normal
   samples, which we refer to as the CAMP. Although large-scale multimodal
   measurements of metabolism have previously been produced in
   bacteria^[227]54 and yeast^[228]55, a comparable resource for human
   cancers was missing. The CAMP is thus a resource and represents a
   significant public database for studying the metabolism of complex
   human tissues and cancers, and for the interrogation of gene–metabolite
   covariation across different tissue and disease contexts. Our analysis
   of the CAMP demonstrates that large-scale studies of multimodal
   metabolic data can reveal fundamental principles of metabolic
   regulation at the scale of both individual metabolic reactions (Fig.
   [229]2) and complex human tissues (Figs. [230]5 and [231]6).

   Our detailed statistical analysis of the CAMP revealed a multitude of
   gene–metabolite associations that transcended the tissue of origin. We
   focused on two specific types of gene–metabolite covariation. The
   first, induced by functional proximity, identified metabolic genes
   whose activation or inhibition is likely to have a significant effect
   on the direct substrate or product of the respective reaction (Fig.
   [232]2). While metabolite pools are likely controlled by a large number
   of genes and other factors, our data-driven analysis identified a small
   subset of metabolites whose pool size was strongly associated with a
   single, proximal gene across tissue lineages. The GMIs identified in
   Fig. [233]2b offer a data-informed, rational approach for modulating
   the pool sizes of their metabolite constituents. While each of the
   highlighted metabolites in Fig. [234]2b may participate in numerous
   metabolic reactions, data from the CAMP specifically nominates single
   genes (for example, GGT1 for GSSG, and IDO1 for tryptophan and
   kynurenine) as those targets whose perturbation is most likely to
   disrupt the corresponding pool size.

   The second broad form of gene–metabolite covariation, likely induced by
   cell-type-specific physiology, corresponded to metabolites whose
   abundance was associated with the presence of specific immune cells in
   the microenvironment. Thus, a sizeable fraction of non-proximal GMIs
   was associated with a small number of metabolites (enriched for
   NAD^+-related molecules) that were significantly correlated with a
   large number of genes, and in particular immune-related genes. This
   apparent association between the abundance of specific immune cells in
   a tissue specimen and the levels of numerous NAD^+-associated
   metabolites suggests that immune cells have evolved mechanisms to
   maintain the concentrations of these metabolites at characteristically
   different levels relative to other cell types. Because NAD^+ is both
   the central mediator of redox poise in the cell and a cofactor for
   numerous metabolic and non-metabolic reactions, understanding the
   mechanisms by which immune cells achieve differential abundance of
   NAD^+-related metabolites, and the selective pressure to do so, can
   provide insights into the metabolic phenotypes underlying both cancer
   and other diseases involving dysfunctional immune responses^[235]56.
   Emerging spatial metabolomics (for example, high-resolution MALDI) and
   single-cell metabolomics (for example, rapid purification with
   paramagnetic beads, followed by mass spectrometry) technologies hold
   the promise of revealing the full extent of cell-type-specific
   metabolomic adaptations. Such approaches can be used alone or in tandem
   to measure the metabolomic profiles of individual cells in
   heterogeneous tissue slices^[236]57–[237]60, rendering the full extent
   of cell-type-specific metabolomic alterations within experimental
   reach.

   Our findings here nominate a new paradigm for understanding
   tumour-associated changes in the metabolome and how it relates to the
   cellular composition of the tissue microenvironment. Our discovery that
   the abundances of NAD^+-related metabolites change consistently across
   tissues as a function of immune cell abundance adds to a limited but
   pre-existing understanding of how cell-type-specific metabolism enables
   function (for example, in the accumulation of lipids in adipocytes and
   clear-cell tumour cells). More importantly, the discovery that the
   tumour metabolome changes significantly based on immune cell
   composition has significant implications for the interpretation of
   metabolomic data in the context of cancer. Prior studies (including
   those from our own group) have often used adjacent normal tissue as a
   reference to understand tumour-specific changes in metabolite levels.
   What the discoveries here reinforce is that changes in the bulk tumour
   metabolome may not be caused by tumour-cell-intrinsic changes in
   metabolism, and instead may in some or many cases arise from
   non-malignant cell populations.

   Discoveries about the function of cancer genes have emerged from a
   combination of untargeted, population-scale genomic surveys^[238]61 and
   mechanistic experiments in specific disease and genetic
   backgrounds^[239]62. Combining these approaches has proven
   transformative for the discovery of recurrent and large-effect-size
   alterations and prompted their characterization in model systems of
   human disease. In contrast, the field of cancer metabolism has
   primarily been driven by bottom-up experiments with modest support from
   large-scale (largely genetic or transcriptomic)
   datasets^[240]63–[241]65. The CAMP is a counterpoint to these efforts.
   By assembling and harmonizing in one database metabolomic and
   transcriptomic data from diverse diseases, the CAMP represents a unique
   opportunity for de novo discovery of translationally relevant metabolic
   phenotypes in cancer. Expanding the scope of the CAMP to include both
   additional multimodal metabolic data as it is published (enabled by
   open-source code; Data availability and Code availability) and other
   forms of data, including but not limited to genomic sequencing,
   epigenetic profiling and proteomic measurements, holds the potential to
   reveal entirely new and highly recurrent metabolic phenomena in cancer.

Methods

Collection of Cancer Atlas of Metabolic Profiles

   We combined 12 published datasets with 3 additional in-house datasets
   that profiled metabolite and gene expression from the same samples to
   create a comprehensive collection of 988 samples (764 tumour samples
   and 224 adjacent normal samples) across 11 different cancer types,
   covering 15 datasets, which we called the CAMP. Details and references