Abstract

   Copy number alteration (CNA) profiling of human tumors has revealed
   recurrent patterns of DNA amplifications and deletions across diverse
   cancer types. These patterns are suggestive of conserved selection
   pressures during tumor evolution but cannot be fully explained by known
   oncogenes and tumor suppressor genes. Using a pan‐cancer analysis of
   CNA data from patient tumors and experimental systems, here we show
   that principal component analysis‐defined CNA signatures are predictive
   of glycolytic phenotypes, including ^18F‐fluorodeoxy‐glucose (FDG)
   avidity of patient tumors, and increased proliferation. The primary CNA
   signature is enriched for p53 mutations and is associated with
   glycolysis through coordinate amplification of glycolytic genes and
   other cancer‐linked metabolic enzymes. A pan‐cancer and cross‐species
   comparison of CNAs highlighted 26 consistently altered DNA regions,
   containing 11 enzymes in the glycolysis pathway in addition to known
   cancer‐driving genes. Furthermore, exogenous expression of hexokinase
   and enolase enzymes in an experimental immortalization system altered
   the subsequent copy number status of the corresponding endogenous loci,
   supporting the hypothesis that these metabolic genes act as drivers
   within the conserved CNA amplification regions. Taken together, these
   results demonstrate that metabolic stress acts as a selective pressure
   underlying the recurrent CNAs observed in human tumors, and further
   cast genomic instability as an enabling event in tumorigenesis and
   metabolic evolution.

   Keywords: aneuploidy, DNA copy number alterations, genomic instability,
   glycolysis, metabolism

   Subject Categories: Cancer, Genome-Scale & Integrative Biology,
   Metabolism

Introduction

   Cancer cells differ from normal cells in that they exhibit aberrant
   proliferation, resist apoptosis, and invade other tissues (Hanahan &
   Weinberg, [74]2011). Modern cancer classification relies on molecular
   characterization, including examination of genomic DNA mutations and
   copy number alterations (CNAs; Stuart & Sellers, [75]2009). Although
   individual oncogenes and tumor suppressor genes are preferential
   targets of DNA amplifications and deletions, respectively, the
   recurrent CNA patterns in tumors cannot be fully explained by canonical
   cancer genes (Beroukhim et al, [76]2010; Muller et al, [77]2012; Davoli
   et al, [78]2013). Thus, the unexplained recurrent CNA patterns observed
   in human cancer subtypes are suggestive of additional, not yet fully
   defined, selective pressures that are conserved across patients and
   tumor types (Cahill et al, [79]1999; Sheltzer, [80]2013; Cai et al,
   [81]2016). Reports that the cumulative phenotypic effects of many small
   gene dosage alterations across the genome can impact the resulting
   tumor copy number landscape (Solimini et al, [82]2012; Davoli et al,
   [83]2013) illustrate a need to consider more subtle and combinatorial
   effects to explain the remaining selective forces underlying recurrent
   CNA patterns observed in human cancers.

   One of the fundamental and consequential differences between
   non‐transformed and tumorigenic cells is the reprogramming of cellular
   metabolism (Hanahan & Weinberg, [84]2011). The altered metabolism of
   tumors is thought to benefit transformed cells in several ways.
   Upregulation of glucose metabolism allows proliferating cells to meet
   their energy demand through synthesis of adenosine triphosphate (ATP),
   while increased flux through glycolysis branch pathways provides
   dividing cells with intermediates necessary for biosynthesis of
   nucleotides and fatty acids, as well as reducing agents such as
   glutathione and NADPH (DeBerardinis et al, [85]2008; Cairns et al,
   [86]2011). Moreover, in addition to glucose, cancer cells frequently
   upregulate consumption of other metabolites for energy and biomass
   generation, including glutamine, serine, and glycine (Jain et al,
   [87]2012; Maddocks et al, [88]2013). Notably, several individual
   metabolic enzymes have been directly implicated in tumorigenesis (Kim
   et al, [89]2007; Dang et al, [90]2009; Locasale et al, [91]2011;
   Possemato et al, [92]2011; Patra et al, [93]2013; Li et al, [94]2014;
   Wang et al, [95]2014; Xie et al, [96]2014) and/or immortalization
   (Kondoh et al, [97]2005; Kondoh, [98]2009; Kaplon et al, [99]2013),
   suggesting that altered metabolism is not a passive bystander, but
   rather a driving force of oncogenesis (Yun et al, [100]2009; Zhang
   et al, [101]2012). Using an integrative analysis of CNA data from human
   tumors, mouse models of cancer, cancer cell lines, and a murine
   experimental immortalization system, here we show that the loci of
   metabolic genes impact the recurrent CNA changes observed in
   genomically unstable tumors. Our bioinformatic and experimental results
   support a tumorigenesis model in which copy number changes in metabolic
   genes contribute to an enhanced glycolytic and proliferative state (see
   Fig [102]EV1 for a schematic of our overall approach).

Figure EV1. Integration of patient and model system CNA in a pan‐cancer and
cross‐species bioinformatic and experimental analysis.

   Figure EV1
   [103]Open in a new tab

   Overall workflow for the analyses described in the Results section.

Results

PCA‐defined CNA signatures in human cancers

   To develop an unbiased understanding of DNA copy number alterations
   (CNAs) in cancer, we performed principal component analysis (PCA) of
   gene‐based CNA data derived from comparative genomic hybridization
   (CGH) microarrays from 15 tumor types available from The Cancer Genome
   Atlas (TCGA). This pan‐cancer PCA revealed a high degree of similarity
   in CNA profiles between basal breast invasive carcinoma (BRCA basal),
   lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma
   (OV), and serous uterine corpus endometrial carcinoma (UCEC serous)
   (Fig [104]1A and [105]Appendix Fig S1). In tumor type‐specific PCA,
   analyzing the four tissue‐defined tumor sets of BRCA, LU (lung cancer
   consisting of LUSC and LUAD [lung adenocarcinoma]), OV, and UCEC
   revealed two strong PCA‐based CNA signatures, termed signatures A and
   B, in all four cases. Notably, signature A was highly consistent across
   each tumor type, and reflected the pattern of pan‐cancer PC1 loadings
   (Figs [106]1B and C, and [107]EV2A–C). In BRCA, signature A tumors were
   enriched for the basal subtype, p53 point mutations, high numbers of
   genomic breakpoints, and thus subchromosomal alterations (Figs [108]1B
   and D, and [109]EV2D; P < 0.001, P < 0.001, P = 2 × 10^−4,
   respectively). Signature B BRCA tumors, in contrast, were enriched for
   luminal type tumors (P < 0.001) and exhibited amplifications of the
   oncogenes MYC and MDM2 and deletion of the tumor suppressor CDKN2A
   (Fig [110]1B). Amplification of MDM2 and loss of CDKN2A were generally
   mutually exclusive in signature B tumors (Fig [111]EV2E), reflecting
   alternate mechanisms for disabling the p53/ARF axis (Sherr & Weber,
   [112]2000). In the other tissues, signature A tumors were enriched for
   lung squamous cell carcinomas, the proliferative subtype of ovarian
   cancer (The Cancer Genome Atlas Research Network, [113]2011), and the
   serous subtype of uterine cancer (Fig [114]EV2A–D). Overall, signature
   A tumors demonstrated enrichment of p53 mutations, more genomic
   breakpoints (BRCA, LU, and UCEC), and a higher degree of copy number
   alterations (LU, OV, and UCEC) than signature B tumors
   ([115]Appendix Fig S2A and B, P‐values indicated in figure). In
   signature A tumors, the per tumor average segment size is on the scale
   of 1 × 10^7 base pairs (approximately one‐tenth of a chromosome,
   containing on the order of 100 genes). Overall, the segment sizes in
   signature A span from focal to arm‐length/whole chromosome scale. The
   per tumor average segment size for signature B BRCA, LU, and UCEC
   tumors is statistically larger by 1.3‐fold to sevenfold
   ([116]Appendix Fig S2C–E; P = 2 × 10^−4 or less). Unlike signature A
   tumors, the signature B CNA patterns were quite distinct between tumor
   types, although some commonalities were observed including point
   mutations in oncogenes such as KRAS (LU and UCEC) and amplification of
   MYC (BRCA, LU, and OV). An alternative approach using hierarchical
   clustering confirmed the existence of the shared pan‐cancer CNA
   signatures across multiple tumor types (signature A), as well as
   distinct signature subtypes within each of the BRCA, OV, UCEC, and LU
   tumor types (signature A vs. signature B) (Fig [117]EV3 and
   [118]Appendix Fig S3, P‐values of concordance between PCA and
   clustering approaches indicated in figure). In summary, PCA revealed
   the CNA signature A as a pattern shared across a subset of tumors in
   multiple tissue types, as well as several tumor type‐specific cases of
   more distinct signature B patterns ([119]Appendix Fig S1D).

Figure 1. Principal component analysis (PCA) reveals a shared CNA signature
in breast, lung, ovarian, and uterine carcinomas.

   Figure 1
   [120]Open in a new tab
    1. Pan‐cancer PCA of copy number data from a balanced, random sampling
       of tumors of 15 tumor types from The Cancer Genome Atlas (TCGA).
       The average tumor PC scores for each tumor subtype are shown.
       Pan‐cancer PC1 scores primarily separate diploid from highly
       aneuploid tumors, while PC2 distinguished GBM from the other tumor
       types ([121]Appendix Fig S1A and B). Four tumor subtypes with
       similar PC scores are labeled as “Signature positive (+)”. Tumor
       type abbreviations are as defined by TCGA: ovarian serous
       cystadenocarcinoma (OV), breast (BR), bladder urothelial (BL), and
       thyroid (TH) carcinoma (*CA), uterine corpus endometrial carcinomas
       (UCEC), lung (LU), and head and neck (HN) squamous carcinomas
       (*SC), skin cutaneous melanoma (SKCM), lung (LU), rectal (RE),
       colon (CO), stomach (ST), and prostate (PR) adenocarcinomas (*AD),
       glioblastoma (GBM), low‐grade glioma (LGG), and kidney renal clear
       cell carcinoma (KIRC).
    2. Copy number profiles of 873 breast invasive carcinomas (BRCA).
       Tumors (rows) sorted by tumor‐specific principal component 1 (PC1)
       score with genomic locations listed across the top. PCA identified
       and distinguished two signatures with similar degrees of variance
       in the first component. The triangle marks the transition from
       signature A through diploid samples to signature B. The membership
       of each tumor with known molecular subtypes (e.g., basal/luminal)
       and mutant TP53 status are indicated on the right with red
       horizontal bars and corresponding permutation‐based enrichment
       P‐values. Basal and luminal subtype classifications are from a
       published gene expression‐based determination (The Cancer Genome
       Atlas Research Network, [122]2012b). Signature A was moderately
       enriched for claudin‐low tumors, whereas the HER2‐enriched subtype
       was not significantly associated with either signature A or B
       (Fig [123]EV2D). Similar PCA‐based distinctions were observed in
       the LU, OV, and UCEC datasets (Fig [124]EV2A–C).
    3. Pan‐cancer PC1 loadings and signature A summary profiles for
       ovarian (OV), breast (BRCA), uterine (UCEC), and lung (LU) tumor
       types. Summary signatures are normalized gene loci signal‐to‐noise
       ratios (SNRs) of the top 10% of PC1‐based core signature A tumors
       compared to normal (non‐tumor) samples. Pan‐cancer PC1‐3 loadings
       are shown in [125]Appendix Fig S1C. Consistency signatures of
       conserved amplification and deletions (consistent regions) are
       indicated by their signed absolute minimum consistency score
       (SAMCS). The SAMCS is non‐zero when all CNA summary signatures have
       the same sign across all tumor types, and is derived from the
       absolute value‐based minimum summary metric, and then re‐signed
       positive for amplification or negative for deletion (see
       [126]Materials and Methods). In the bar graph, red and blue denote
       consistently amplified and deleted regions, respectively. The
       number or percentage of consistent regions, genome coverage, and
       gene loci are indicated. The positions of canonical oncogenes,
       proto‐oncogenes, tumor suppressor genes, and telomerase components
       (TERC, TERT) present in these regions are indicated.
    4. Breast signature A tumors have more DNA breakpoints per chromosome
       than signature B tumors, but similar levels of copy number
       alteration (as measured by an integrated CNA score proportional to
       the extent and absolute value magnitude of amplifications and
       deletions). P‐values are indicated (top, Mann–Whitney U‐test;
       bottom, Student's t‐test), and data presented in box (median, first
       and third quartiles) and whisker (extreme value) plots. The number
       of breakpoints is inversely proportional to the mean DNA segment
       length.
    5. KEGG metabolism pathway enrichment analysis based on consistent CNA
       patterns in the core signature A tumors of (A). The OV, BRCA, UCEC,
       and LU signatures from (C) were sequentially added and only
       directionally consistent CNA changes were retained (see
       [127]Materials and Methods). The combined
       glycolysis–gluconeogenesis (glycolysis) and pentose phosphate
       pathway was included based on our prior mRNA work identifying the
       predictive value of this gene set (Palaskas et al, [128]2011). Core
       glycolysis is a KEGG‐defined gene subset ([129]M00001).
       [130]Table EV1 lists enrichment results for all KEGG metabolism
       pathways.
    6. Schematic showing average signal‐to‐noise (SNR) metrics of core
       glycolysis pathway genes plus
       6‐phosphofructo‐2‐kinase/fructose‐2,6‐biphosphatase (PFKFB and
       TIGAR), lactate dehydrogenase (LDH), and pyruvate dehydrogenase
       (PDH) in core signature A tumors from (C) and (E). A more detailed
       listing of the gene names can be found in [131]Materials and
       Methods.
    7. Gene copy number alteration distributions of selected glycolysis
       genes, TIGAR, the tumor suppressor p53, and the p53‐associated
       cancer genes MDM2 and CDKN2A in BRCA tumors. Data are presented in
       box (median, first and third quartiles) and whisker (extreme value)
       plots.

   Data information: In panels (C, E–G), the top 10% of tumors from each
   indicated signature based on PC1 scores were used in the analysis. See
   also Fig [132]EV2 and [133]Appendix Figs S1–S3.

Figure EV2. Principal component analysis of TCGA copy number data reveals a
shared CNA signature enriched for p53 mutation (related to Fig [134]1).

   Figure EV2
   [135]Open in a new tab
     * A–C
       Copy number profiles of 359 lung (LU) squamous carcinomas (LUSC)
       and 368 lung adenocarcinomas (LUAD) (A), 583 ovarian serous
       cystadenocarcinoma (OV) (B), and 492 uterine corpus endometrial
       carcinomas (UCEC) (C) tumors sorted by their PC1 score. Signature A
       tumors are sorted to the top of the heatmap. PC1‐ranked tumors were
       analyzed for enrichment of known mutations or molecular subtypes
       (e.g., basal/luminal in breast, see D for all subtypes tested). The
       red horizontal bars to the right of each tumor type indicate the
       tumor subtype (e.g., LUAD), or the presence of the indicated
       mutation (e.g., TP53). Normalized enrichment score and permutation
       P‐value for each significantly enriched mutation or clinical
       subtype are shown below the tick plots.
     * D
       Enrichment analysis of molecular subtypes in PC‐defined signatures
       A and B. The molecular subtypes tested were defined by the TCGA
       (The Cancer Genome Atlas Research Network, [136]2011,
       [137]2012a,[138]b, [139]2013, [140]2014). Because LUAD and LUSC
       were combined into a single dataset LU, all LUSC and LUAD subtypes
       were associated with signatures A and B, respectively. Within LU,
       signature A was most associated with the classical subtype of LUSC,
       and signature B was most associated with the bronchioid subtype of
       LUAD. Permutation‐based P‐values are indicated.
     * E
       Amplification of MDM2 and deletion of CDKN2A are mutually exclusive
       in signature B human BRCA. Each point represents a single patient
       sample. Samples are colored red or blue according to their
       membership in signatures A or B, respectively, with intermediate
       colors representing intermediate PC1 scores between signatures A
       and B. Samples colored white represent nearly diploid tumors, as
       indicated by the triangle in Fig [141]1B.

Figure EV3. Hierarchical clustering confirms the existence of the shared
pan‐cancer CNA signatures across multiple tumor types, and distinct signature
subtypes within tumor types (related to Fig [142]1).

   Figure EV3
   [143]Open in a new tab

   Pan‐cancer clustering of 3,037 copy number profiles using a balanced,
   random sampling of tumors from 15 tumor types. The dendrogram reveals
   the presence of a cluster containing multiple tumor types and enriched
   in PCA‐defined signature A tumors (hypergeometric
   P‐value = 1.1 × 10^−70 for concordance between the PCA and hierarchical
   clustering results). There is also a cluster enriched in the generally
   less uniform PCA‐defined signature B tumors (hypergeometric
   P‐value = 4.5 × 10^−12), which contains other tumor types as well
   (e.g., SKCM). In the key, the individual tumor types and pan‐cancer PC1
   scores are indicated. Additionally, for the core signature A tumors
   (OV, BRCA, UCEC, and LU), tumor‐specific subtypes (e.g., BRCA basal and
   luminal) and tumor type‐specific PC1 scores are indicated. The gene
   loci CNA levels in the heatmap are ordered based on their chromosomal
   locations (columns).

   Because altered metabolism is a hallmark of human tumors, we next
   tested whether the shared CNA signature A was enriched for genes from
   metabolic pathways. Using CNA‐based gene set enrichment analysis over
   all metabolic pathways defined by KEGG (Kanehisa et al, [144]2014), we
   found that the conserved profile of core signature A tumors (i.e., OV,
   BRCA, UCEC, LU) (Fig [145]1C) was significantly enriched for DNA
   amplifications of core glycolysis pathway genes (Fig [146]1E and F, and
   [147]Table EV1; P = 0.024). For example, BRCA signature A tumors
   exhibited DNA amplification of most genes from the glycolytic pathway,
   as well as amplification of lactate dehydrogenase B, deletion of
   pyruvate dehydrogenase subunits A and B, and amplification of the
   glycolysis‐regulating metabolic enzyme TIGAR (human gene C12orf5;
   Fig [148]1G). Notably, this shared CNA signature was defined by
   genome‐wide patterns, rather than by single gene loci, which were not
   consistently altered in all tumors with a strong signature
   ([149]Appendix Fig S2F). Interestingly, signature A summary profiles
   from breast, ovarian, uterine, and lung tumors all exhibited hexokinase
   2 (HK2) amplification (Fig [150]1F and G), whereas signature B profiles
   had primarily either HK3 (BRCA, LU, OV) or HK1 (UCEC) amplification
   (Fig [151]1G and [152]Appendix Fig S2G). Thus, PCA identified a shared
   signature from breast, lung, ovarian, and uterine carcinomas that was
   enriched for p53 mutations, higher numbers of genomic breakpoints, and
   CNA of genes from the core glycolysis pathway.

Elimination of passenger genes via cross‐species synteny mapping

   While canonical oncogenes and tumor suppressor genes drive some
   recurrent DNA copy number alterations, many recurrent CNA regions
   cannot be fully explained by the presence of known cancer genes
   (Beroukhim et al, [153]2010; Muller et al, [154]2012; Davoli et al,
   [155]2013). The conservation of CNA signature A across breast, lung,
   ovarian, and uterine tumor subsets suggests the existence of one or
   more selective pressures that are potentially shared by additional
   tumor types. In that many of the human signature A conserved regions
   identified in Fig [156]1C still span large, chromosome‐scale regions,
   we hypothesized that passenger genes could be diluting the enrichment
   signal. We subsequently reasoned that a previously reported approach of
   cross‐species comparison of CNA data from human tumors and non‐human
   tumorigenesis models would eliminate passenger genes via synteny
   mapping (Maser et al, [157]2007; Zhang et al, [158]2013; Tang et al,
   [159]2014).

   For the cross‐species analysis, we first empirically determined the
   full set of human signature A‐like tumor tissue types that could be
   included based on CNA similarity as defined by the pan‐cancer PC1
   scores from Fig [160]1A. Namely, we sequentially added each human tumor
   type in order of decreasing pan‐cancer PC1 score and tested whether
   inclusion improved the overall pathway enrichment. In a parallel
   analysis, at each sequential step we also added the corresponding
   tissue‐matched mouse epithelial cancer model signature if it existed.
   Mouse CNA signatures obtained from the literature and public
   repositories included signatures for genetically engineered mouse
   models of mammary (breast) cancer (Brca) (Drost et al, [161]2011;
   Herschkowitz et al, [162]2012), melanoma (Skcm) (Viros et al,
   [163]2014), glioblastoma/high‐grade astrocytoma (Gbm) (Chow et al,
   [164]2011), and prostate cancer (Prad) (Ding et al, [165]2012; Wanjala
   et al, [166]2015), as well as in vitro mouse epithelial cell models of
   bladder (Blca), colorectal (Coad), and kidney (Kirc) cancer
   (Padilla‐Nash et al, [167]2013).

   Using human tumors only, the sequential enrichment analysis showed a
   general improvement through the addition of the COAD tumor type
   (Fig [168]2A, dotted orange line; average enrichment signal of the top
   10 metabolic pathways). Moreover, sequential inclusion of the
   corresponding tissue‐matched mouse cancer model CNA signatures
   substantially enhanced the overall pathway enrichment results with the
   peak occurring at HNSC, a sequentially adjacent tumor type that has
   similar pan‐cancer PC1 score with COAD (Fig [169]2A, solid blue line
   and arrow indicating the peak). Overall pathway enrichment then
   decreased as additional, less copy number similar (lower pan‐cancer PC1
   scores) tumor types were added.

Figure 2. Pan‐cancer and cross‐species analysis reveals enrichment of
glycolytic genes in conserved amplification regions.

   Figure 2
   [170]Open in a new tab
    1. Average pathway enrichment signal strength for the top 10 enriched
       KEGG metabolism pathways based on the consistent CNA patterns
       across multiple tumor types. The four core CNA signature A patterns
       are defined as in Fig [171]1C. CNA signatures for other tissue
       tumor types are defined by the signal‐to‐noise ratios (SNRs) of all
       tumors compared to tissue‐matched normal (non‐tumor) samples.
       Starting with the four core signature A tumor types from Fig [172]1
       (OV, BRCA, UCEC, and LU), additional human (orange) or human and
       mouse (blue) signatures were sequentially added based on their
       decreasing tissue type pan‐cancer PC1 score from Fig [173]1A (see
       [174]Materials and Methods). The resulting sequentially restricted
       consistency signatures are illustrated in (D). A maximal peak
       (indicated by an arrow) in the enrichment signal of this sequential
       analysis is seen after adding tumors through human HNSC and mouse
       Coad (no mouse Hnsc signature was available). Enrichment signal
       strength is based on the negative log frequency of equivalent
       enrichment by chance (i.e., permutation P‐value). Mouse tumor type
       abbreviations are shown with lower case letters but otherwise match
       the human TCGA abbreviations.
    2. Enrichment analysis of metabolism (75 KEGG pathways) and canonical
       pathways (CP; 1,321 MSigDB canonical pathways). Permutation
       P‐values for the consistent CNA patterns from the four human core
       signature A tumors (defined in Fig [175]1C), the human expanded
       signature A (from A), and the human expanded signature A combined
       with corresponding mouse cancer models (from A). FDR calculations
       are described in [176]Materials and Methods. Tables [177]EV1 and
       [178]EV2 list all sequential enrichment results for KEGG metabolism
       and canonical pathways, respectively.
    3. Enrichment score for the glycolysis pathway upon sequential
       addition of human tumors and mouse models (as in A) showing a
       maximal peak in the enrichment signal corresponding to the expanded
       HNSC–Coad signature.
    4. Genome view of the sequential consistency signatures used in the
       enrichment analysis of (A–C). The locations of consistently
       amplified glycolysis genes, canonical oncogenes, and tumor
       suppressors are indicated. Genes listed in parentheses (i.e., RPIA,
       TERC, TERT, APC, and TP53) were consistently altered across the
       human tumors but not consistently altered in the mouse models, due
       in part to the inclusion of p53 genetic knockout mouse models
       (TP53) and known differences in human and mouse telomere
       maintenance (TERC, TERT) (Sherr & DePinho, [179]2000). The synteny
       graph at the bottom indicates the syntenic mouse chromosome number
       and thus the broken synteny regions between human and mouse
       genomes. This genome view chronicles how the sequential pan‐cancer
       and cross‐species analysis of conserved amplification and deletion
       loci greatly reduces the percentage of consistent copy number
       alterations, thereby reducing the percentage of the genome
       implicated as candidate driver regions.

   The improvement upon including the mouse signatures supports the
   hypothesis that the cross‐species analysis eliminates passenger genes
   and reveals pathways whose genes are enriched in the resulting
   cross‐species consistent CNA regions. Taken together, this pan‐cancer
   and cross‐species analysis demonstrates that a combination of nine
   human tumor types (OV, BRCA, UCEC, LU, BLCA, READ, SKCM, COAD, and
   HNSC) and four corresponding mouse tumor models (Brca, Blca, Skcm, and
   Coad) gives the strongest overall enrichment signal strength across all
   metabolic pathways (Fig [180]2A).

   Examining the highly ranked individual pathways from this optimized
   tumor type combination, we found that the carbohydrate metabolism
   pathway “amino sugar and nucleotide sugar metabolism” (hsa00520) was
   ranked first and glycolysis–gluconeogenesis (hsa00010, henceforth
   called glycolysis) was ranked second among KEGG metabolism gene sets
   (Fig [181]2B and [182]Table EV1). As in the average of top 10 pathways
   (Fig [183]2A), the enrichment score for the glycolysis gene set
   improved as more human tumor and mouse models were sequentially
   included up through the additions of HNSC and Coad (Fig [184]2C).
   Compared to our previous enrichment analysis of the four core signature
   A tumors (Fig [185]1E), the permutation P‐value of the glycolysis gene
   set improved from 0.05 to 0.001, reflecting that our pan‐cancer and
   cross‐species analysis has eliminated passenger genes in the
   non‐consistent CNA regions. Thus, we hereafter refer to the consistent
   signature pattern from OV up through and including HNSC and Coad as
   “expanded signature A”. Of particular note, upon expanding our analysis
   to include 1,321 gene sets from the MSigDB Canonical Pathways (CP)
   database, glycolysis remained a top result as the third ranked pathway,
   with the “glutathione‐conjugation” pathway (involved in cell
   detoxification and oxidative stress responses) also scoring strongly
   (Fig [186]2B and [187]Table EV2).

   To visualize the pan‐cancer and cross‐species conserved genomic
   regions, we plotted the consistency profiles of the expanded signature
   A human and mouse tumors (Fig [188]2D). Examination of conserved
   regions revealed that HK2, TPI1, GAPDH, PGAM2, ENO2, and LDHB
   glycolysis genes contribute to the cross‐species consistency signal.
   Among canonical oncogenes, MYC and KRAS were also present in the
   amplification regions of the expanded signature A human tumors and
   mouse models. Due to the well‐documented and clinically relevant role
   of glycolysis and pentose phosphate pathways in tumorigenesis, we chose
   to further examine the recurrent amplification of the set of these gene
   loci in subsequent analyses. Equally important for functional
   validation of these candidate pathways, the activity of glycolysis can
   be directly and indirectly measured by many assays in both patients
   (e.g., FDG‐PET imaging) and experimental systems.

CNA signatures are predictive of glycolysis

   In that the CNA‐consistent region‐defined signature A is enriched for
   core glycolysis genes, we next tested whether signature A patient
   tumors in vivo were associated with increased tumor glycolysis. To
   assign a signature A score to a set of FDG‐PET‐imaged breast cancers
   (Palaskas et al, [189]2011), we projected CNA data from these tumors
   onto a PCA of the four core human tumor tissue types (OV, BRCA, UCEC,
   and LU, Fig [190]1C). We found a strong correlation between the
   strength of CNA signature A and the measured FDG‐PET standardized
   uptake values (SUVs) (Fig [191]3A and B; Pearson rho = 0.94,
   P = 5 × 10^−5). A similar analysis using all nine expanded signature A
   tumor types demonstrated equivalent results ([192]Appendix Fig S4A;
   Pearson rho = 0.92, P = 2 × 10^−4). Thus, the CNA‐defined signature A
   is associated with increased FDG uptake in human primary tumors
   in vivo.

Figure 3. PCA‐based CNA signatures are predictive of breast cancer glycolytic
metabolism in vivo and breast cancer cell line metabolism in vitro .

   Figure 3
   [193]Open in a new tab
     * A
       PC1‐sorted copy number profiles of a balanced, random sampling of
       tumors from the four core CNA‐consistent tumor types (breast, lung
       squamous, ovarian, and uterine carcinomas, Fig [194]1A) along with
       copy number profiles from primary breast carcinoma tumors with
       glycolytic levels imaged in vivo by ^18F‐fluorodeoxy‐glucose
       positron emission tomography (FDG‐PET) (Palaskas et al, [195]2011).
       On the right, red and green values indicate high and low FDG
       standardized uptake values (SUVs), respectively, for FDG‐imaged
       tumors.
     * B
       FDG uptake values in FDG‐PET measured tumors are highly correlated
       with CNA signature A PC1 score (ρ, Pearson rho correlation;
       P‐value = 5.3 × 10^−5).
     * C
       CNA values for genes from glycolysis and pentose phosphate pathways
       have stronger predictive power of FDG‐PET SUV in breast tumors and
       of lactate secretion (sec.) in breast cancer cell lines than other
       metabolism pathway‐based sets of genes. The table indicates the
       correlation between weighted gene voting (WGV)‐based predictions on
       the test sets and the measured metabolic phenotypes. WGV was
       performed with individual training on signature A tumors from each
       of the core four tumor types (breast (BRCA), lung (LU), ovarian
       (OV), and uterine (UCEC); top 10% signature A compared to normal
       (non‐tumor) samples), the voting predictions were averaged, and
       compared to the measured metabolic phenotype. P‐values were
       assessed by permutation analysis. See [196]Table EV4 for all
       pathways.
     * D, E
       Signature A‐based WGV predictions and signature A‐based PC1
       projections (as in B) have stronger correlation with measured
       lactate secretion for basal‐like breast cancer cell lines than for
       luminal‐like lines (D). See [197]Appendix Fig S4B for luminal cell
       line data. Basal and luminal subtype classifications are from a
       published gene expression‐based determination (Neve et al,
       [198]2006). The breast cancer‐based glycolysis and pentose
       phosphate pathway (G & PP) WGV predictions are shown as a
       representative case (E).

   Data information: See also [199]Appendix Fig S4B.

   We next asked whether the CNA signature A‐defined tumors associated
   with high FDG uptake also had RNA‐based signatures of increased
   glycolysis. First, we compared signature A tumors to signature B tumors
   using RNA‐based enrichment analysis (GSEA). We analyzed both BRCA and
   LU tumors because these tumor types show distinct signature A and
   signature B subtypes. (OV signature B is highly similar to the
   pan‐cancer signature A pattern, and thus does not provide a
   differential test (Fig [200]EV2B). UCEC tumors were not included due to
   a lack of sufficient paired RNA and DNA profiling data.) In the
   enrichment analysis, we included a gene set consisting of genes from
   the glycolysis and pentose phosphate pathways that were upregulated in
   FDG‐high BRCA tumors, as defined by our previous work (Palaskas et al,
   [201]2011). In the GSEA, the empirically defined FDG‐high gene set
   ranked number one overall (NES = 2.5, permutation P‐value = 2 × 10^−4),
   confirming that Sig A tumors have glycolysis RNA expression profiles
   matching those of FDG‐high tumors (Fig [202]EV4A). Furthermore, BRCA
   and LU signature A tumors were significantly enriched for genes from
   the full glycolysis and pentose phosphate pathways (overall rank 4^th
   of 76 pathways, NES = 2.0, permutation P‐value = 6.8 × 10^−3), as well
   as other glycolysis‐related pathways ([203]Table EV3). This analysis of
   RNA expression data is consistent with the enrichment of
   glycolysis‐associated pathways at the DNA copy number level
   (Fig [204]1E and [205]2B), and points to glycolysis‐related pathways as
   selection targets for upregulation in signature A tumors.

Figure EV4. CNA‐defined signature A tumors exhibit RNA expression signatures
consistent with increased glycolytic activity (related to Fig [206]3).

   Figure EV4
   [207]Open in a new tab
     * A
       KEGG metabolism pathway enrichment analysis based on mRNA
       expression in BRCA and LU signature A and signature B tumors. RNA
       expression data are from BRCA and LU tumors with paired TCGA DNA
       and RNA data. To evaluate the similarity to BRCA tumors with high
       FDG‐PET uptake, we included a gene set consisting of genes from the
       glycolysis and pentose phosphate pathway that were upregulated in
       FDG‐high BRCA tumors (Palaskas et al, [208]2011). To combine BRCA
       and LU into one enrichment analysis, genes were ordered by their
       average rank in BRCA and LU tumor types. Enrichment scores for all
       KEGG metabolism pathways and for individual tumor types can be
       found in [209]Table EV4.
     * B–D
       RNA expression signatures of high glycolysis (Palaskas et al,
       [210]2011) were used to predict the glycolytic phenotype of BRCA
       (B), LU (C), and OV (D) tumors with paired TCGA DNA and RNA data.
       UCEC tumors were excluded due to a lack of sufficient samples with
       paired RNA and DNA data. The color scale was normalized for each
       tumor type by the mean FDG score for tumors with integrated CNA
       score > 0.2. Non‐normalized values are shown in the color legend
       and in (E).
     * E
       RNA‐based WGV predictions of glycolysis show that BRCA and LU
       signature A tumors (top 10% PC1) are predicted to be more
       glycolytic than the tissue type‐matched signature B tumors (bottom
       10% PC1). Mann–Whitney U‐test P‐values are shown. Data are
       presented in box (median, first and third quartiles) and whisker
       (extreme value) plots.
     * F
       Spearman correlation values between RNA‐based WGV glycolysis
       predictions and CNA‐based PC1 values, calculated for different
       range windows of integrated CNA levels to control for the general
       increase in glycolysis predictions at increased levels of genomic
       instability. P‐values shown are calculated by performing 1 × 10^6
       permutations where the WGV scores were randomly shuffled, and the
       P‐value was calculated as the percentage of random permutations
       that achieved a higher correlation value for any range window than
       did the experimentally observed relationship.

   To further explore the RNA expression data, we predicted the glycolytic
   phenotypes of the core signature A tumor types (BRCA, LU, and OV) using
   RNA‐based weighted gene voting (WGV) (Golub et al, [211]1999) and our
   previously defined FDG prediction model (Palaskas et al, [212]2011).
   UCEC tumors were again excluded because there were not a sufficient
   number of samples with paired RNA and DNA data. We found that RNA‐based
   predictions of high glycolysis were associated with the signature A end
   of tumor type‐specific PC1 for BRCA and LU (Fig [213]EV4B–D). Signature
   A tumors were predicted to be significantly more glycolytic than
   signature B tumors for BRCA and LU (P‐values of 2.4 × 10^−17 and
   5.7 × 10^−18, respectively, Fig [214]EV4E). There was no significant
   differential trend in OV tumors, potentially because almost all of
   these tumors are genomically unstable (integrated CNA scores > 0.2) and
   the signature B of OV is relatively signature A‐like (Fig [215]EV2B and
   [216]Appendix Fig S1D). The OV predictions were consistent with
   predicted high glycolysis across all tumors (Fig [217]EV4E). To control
   for the general increase in glycolysis predictions at increased levels
   of genomic instability, we also analyzed the correlation between
   glycolysis predictions and PCA scores for several range windows of
   integrated CNA scores. This analysis demonstrated that RNA‐based
   predictions of glycolysis were significantly correlated with PC1 scores
   in BRCA and LU tumors with high integrated CNA scores (permutation
   P‐value for BRCA and LU < 1 × 10^−6 each, Fig [218]EV4F). Taken
   together, the DNA‐defined signature A tumors exhibit RNA expression
   patterns consistent with increased FDG uptake. These RNA‐based results
   support the more downstream activity‐based findings of elevated FDG
   uptake in signature A tumors (Fig [219]3A and B).

   Returning to the result of the CNA‐defined signature A being associated
   with increased FDG uptake in human primary tumors in vivo, we next
   asked which sets of metabolic gene loci copy number levels were most
   predictive of glycolytic phenotypes. We performed a CNA‐based weighted
   gene voting (WGV) analysis to predict the glycolytic phenotypes of
   breast tumors and breast cancer cell lines (Neve et al, [220]2006)
   using individual gene sets from the KEGG metabolic pathways database
   (Kanehisa et al, [221]2014). Gene weights were calculated from each of
   our four tumor type CNA “training” signatures. Specifically, we tested
   the ability of individual metabolic pathways to predict (i) FDG uptake
   in patient primary breast tumors and (ii) the lactate secretion of a
   panel of 32 breast cancer cell lines (Hong et al, [222]2016). Averaging
   results across these two test cases revealed that genes from the
   glycolysis and pentose phosphate pathway were most predictive of these
   metabolic phenotypes (Fig [223]3C and [224]Table EV4; P = 0.01).
   Moreover, signature A‐based predictions were predictive of lactate
   secretion for basal cell lines more so than for luminal lines
   (Fig [225]3D and E, and [226]Appendix Fig S4B), consistent with the
   observed basal‐type tumor enrichment in signature A samples
   (Figs [227]1B and [228]EV2D). Thus, consistent with the gene
   expression‐based predictions above, the glycolysis and pentose
   phosphate pathway DNA copy number alterations from signature A are
   predictive of glycolytic phenotypes of primary human breast tumors and
   cancer cell lines.

Experimental recapitulation of tumor CNA signatures

   Having demonstrated that the glycolytic pathway is statistically
   associated with genome‐wide DNA copy number patterns, we sought an
   experimentally tractable system that would allow us to test the
   hypothesis that recurrent patterns of DNA amplification reflect
   metabolic selection pressures. We thus derived a panel of immortalized
   mouse embryonic fibroblasts (MEFs) using the classical 3T9 protocol
   (Todaro & Green, [229]1963). In this experimental system, under
   standard culture conditions (e.g., 21% O[2]), diploid cells undergo a
   crisis‐associated event that increases tolerance for genomic
   instability and allows them to escape senescence and evolve into cells
   with chromosomal instability and genomic aberrations (Fig [230]4A and
   [231]Appendix Fig S5A; Sun & Taneja, [232]2007). This system has been
   used to study core cancer phenotypes such as proliferation,
   anti‐apoptosis, and chromosomal instability (Lowe et al, [233]1993;
   Gupta et al, [234]2007; He et al, [235]2007; Sotillo et al, [236]2007;
   Sun et al, [237]2007; Weaver et al, [238]2007) and is one of the few
   experimentally tractable cancer models involving spontaneous genomic
   instability (Sherr & DePinho, [239]2000). In addition, because this
   system is not driven by strong oncogenes (e.g., KRAS mutation), it
   allows for complex CNA signatures to evolve from a combination of
   individual, presumably weaker, DNA alteration events.

Figure 4. A mouse embryonic fibroblast (MEF) immortalization system
recapitulates the two‐signature CNA patterns and glycolysis gene CNA
enrichment observed in human tumors and mouse models.

   Figure 4
   [240]Open in a new tab
    1. Top: Copy number profiles of 59 samples from 42 independent mouse
       embryonic fibroblasts (MEF) sublines before, during, and after
       senescence recapitulate the two‐signature CNA patterns observed in
       human tumors. Twenty‐two sublines evolved to Sig. A patterns, 13
       sublines exhibited Sig. B, and seven sublines remained diploid‐like
       due to redox protection. PC1 scores for analysis of only signature
       A MEFs (PC1‐Sig A), only signature B MEFs (PC1‐Sig B), or all MEFs
       (PC1‐Sig AB) are indicated on the left. Also indicated are a metric
       of the degree of senescence observed during immortalization
       (senescence score), protection by 3% O[2]‐based hypoxia or catalase
       (+), or both (++) during immortalization, CNA profiling at passage
       1 (P1), exogenous MYC expression (which enabled cells to bypass
       senescence), and Trp53 sequencing status (* indicates a non‐severe
       mutation p.183D > E, blank indicates not sequenced, additional p53
       sequencing information in [241]Appendix Table S1). Signature A MEF
       lines profiled at more than one passage number are indicated by the
       start (earlier passage) and end points (later passage) of the
       upward “evolving signature A” arrows. For comparison to (C), the
       early (25) and late (49) passages of the H1 subline that were
       characterized by spectral karyotyping (SKY) are indicated on the
       left by triangles. The indicated chromosome 6 region includes loci
       for Bpgm, Hk2, Rpia, Eno2, Tpi1, Gapdh, Tigar, Ldhb, and Kras.
       Bottom: Summary signatures of amplification and deletion loci in
       copy number profiles (normalized PC1 loadings) are shown for MEF WT
       and p53^−/− signature A (Sig. A) samples. A paired t‐test analysis
       of signature A MEF lines profiled at more than one passage number
       revealed genomic regions associated with mid‐ to late‐passage CNA
       evolution (log[10] t‐test P‐value signed positive for
       amplifications, negative for deletions; labeled “Evolving”).
       Individual profiles of p53^−/− signature A samples are not shown.
    2. Signature A MEFs have more DNA breakpoints and a larger degree of
       copy number alteration (integrated CNA score) than signature B
       MEFs. Signature A MEFs had a subset of cases with large numbers of
       breakpoints per chromosome (> 10) that were not observed in the
       signature B MEFs (hypergeometric P‐value = 0.02, Student's t‐test
       P‐value = 4 × 10^−5). Data are presented in box (median, first and
       third quartiles) and whisker (extreme value) plots.
    3. Spectral karyotyping (SKY) of the H1 MEF subline at passage 48.
       Whole chromosomal gains of varying extent are observed for all
       chromosomes with some chromosomes experiencing translocation, for
       example, t(3:16), and deletion, for example, del(4). Additional
       chromosome spreads and a summary table for P25 and P48 karyotypes
       are shown in [242]Appendix Fig S8. CGH profiles for H1 cells are in
       (A) (indicated by the triangles in the SKY column) and
       [243]Appendix Fig S6A and B.
    4. The amount of senescence demonstrated during MEF line derivation
       (senescence score, see [244]Materials and Methods) is highly
       correlated with the degree of copy number alterations obtained
       (integrated CNA score) (Spearman rho = 0.66, P‐value = 3 × 10^−8).
       Single protection indicates protection by 3% O[2] culture
       conditions or media supplementation with 250 U/ml catalase. Double
       protection indicates that cells were cultured at 3% O[2] with
       catalase.
    5. Conserved amplification and deletion loci in copy number profiles
       across core and expanded signature A human tumor types (Hs Sig. A),
       corresponding mouse tumorigenesis model signatures (Mm models), and
       the MEF immortalization system (MEF Sig. A). The human and mouse
       signatures are as defined in Fig [245]2, and the wild‐type MEF
       signatures are from (A). In the human consistent regions, the 52.3%
       consistently altered gene loci include bona fide tumor suppressor
       genes (p53), oncogenes (MYC, KRAS), telomere components (TERC,
       TERT), and core glycolysis genes (e.g., HK2). As in Fig [246]2D,
       genes listed in parentheses were consistently altered across the
       human tumors but not consistently altered in the mouse. The
       tissue‐specific, non‐lymphoid oncogenes from the COSMIC database
       that are in a conserved amplification or deletion region (TRRAP,
       CHD4) are listed for completeness (Forbes et al, [247]2015).
    6. Enrichment signal strength (negative of log permutation P‐value)
       for the glycolysis and core glycolysis gene sets improve when MEF
       CNA signatures are added to the human and mouse CNA signatures from
       Fig [248]2. The degree of increased enrichment upon addition of the
       MEF signatures is quantitatively similar to adding mouse
       tumorigenesis model signatures. Adding all signatures into the
       analysis further strengthens the enrichment (last column).
       [249]Table EV2 lists results for the enrichment analysis of the
       remaining MSigDB canonical pathways when human core and expanded
       signature A, mouse models, and MEF immortalization models are
       sequentially combined.

   Data information: See also [250]Appendix Figs S5–S9.

   We profiled the genome‐wide copy number of 42 independent MEF sublines
   by CGH microarray (Fig [251]4A). Most samples were profiled after
   immortalization (post‐senescence), with a few profiled before or during
   senescence. Analysis of this CNA data by PCA revealed that the MEF
   system recapitulated a two‐signature pattern (signatures A and B).
   These two signatures were generally orthogonal, with the exception of a
   few “mixed” samples that had CNA characteristics of both signatures A
   and B ([252]Appendix Fig S5B). As anticipated from prior MEF studies,
   immortalized MEF cells exhibited an increased number of genomic
   breakpoints and a higher degree of copy number alterations than the
   diploid, pre‐senescent MEF cells ([253]Appendix Fig S5C and D).
   Importantly, the MEF‐derived signature A resembled the shared signature
   A pattern derived from human breast, lung, ovarian, and uterine tumors.
   In particular, the MEF‐derived signature A was characterized by p53
   mutation and chromosome loss at the p53 locus, more genomic
   breakpoints, subchromosomal‐sized alterations, and a higher degree of
   copy number alteration (Fig [254]4A and B, and [255]Appendix Table S1).
   Additionally, this experimental system allowed us to profile the same
   MEF lines at subsequent passages, and a paired statistical analysis of
   five evolving signature A lines revealed genomic regions changing from
   mid‐ to late passage (Fig [256]4A and [257]Appendix Fig S6A and B).
   Similar to the shared human tumor signature A, both the MEF signature A
   and the evolving MEF signature demonstrated DNA amplifications of
   glycolysis and glycolysis‐related genes ([258]Table EV2). In
   particular, both human and mouse CNA signatures A included
   amplification of Hk2, Bpgm, Rpia, Tigar (mouse gene 9630033F20Rik),
   Eno2, Tpi1, Gapdh, Ldhb, Kras (mouse chr. 6), and Pgk2 (mouse chr. 17)
   (Figs [259]1G and [260]4A, and [261]Appendix Fig S6C).

   In contrast, the MEF‐derived signature B was characterized by
   amplification of Mdm2 or deletion of Cdkn2a (Ink4a/Arf) and fewer
   overall copy number alterations. Mdm2 amplification and Cdkn2a loss are
   alternative mechanisms for inactivating p53 function in human tumors
   and in MEFs (Sherr & DePinho, [262]2000), but as found here result in a
   distinct CNA signature as compared to the p53 mutation‐associated
   signature A. As in human BRCA (Fig [263]EV2E), amplification of Mdm2
   and loss of Cdkn2a were generally mutually exclusive in signature B
   MEFs ([264]Appendix Fig S6D), reflecting alternate mechanisms for
   disabling the p53/ARF axis (Sherr & Weber, [265]2000). Additionally,
   Mdm2 amplification tends to co‐occur with Hk1 amplification, both loci
   being located on mouse chr. 10. Thus, the signature B cases are
   associated with an alternate HK amplification, similar to our finding
   in signature B human tumors (HK1 or HK3 rather than HK2) (Fig [266]1G
   and Appendix Figs [267]S2G and [268]S6C). Immunoblotting confirmed that
   signature A MEF and signature B MEF lines generally had increased
   expression of Hk2 and Hk1 protein, respectively ([269]Appendix Fig S6E
   and F).

   To further characterize the association between p53 loss and CNA
   signatures, we profiled the CNA patterns of 29 independent p53^−/− MEF
   sublines derived in standard 3T9 culture conditions. Comparison of the
   p53^−/− MEF CNA patterns to wild‐type MEFs by PCA demonstrated that the
   CNA patterns of p53^−/− MEFs resemble the wild‐type MEF signature A
   pattern, with no signature B‐like sublines observed ([270]Appendix Fig
   S7). p53^−/− MEFs do not undergo senescence (Olive et al, [271]2004),
   and consistent with this, we observed that they tended to have less
   strong copy number alterations. In summary, strong p53 functional loss
   (p53 mutation or genetic loss) tends to lead to the CNA signature A
   pattern, which is associated with a higher degree of copy number
   alterations (higher integrated CNA score and more breakpoints) and Hk2
   amplification, while weaker or less complete p53 functional loss (e.g.,
   mediated by Mdm2 amplification or Cdkn2a loss) is associated with an
   alternative signature (signature B).

Numerical and structural chromosomal abnormalities

   Next we sought to understand how CNA signatures revealed by CGH relate
   to numerical and structural aneuploidy (i.e., whole chromosomal and
   subchromosomal gains or losses, respectively). Using propidium
   iodide‐based DNA staining, we found that immortalized MEF cells had
   increased total DNA content compared to pre‐senescent MEFs
   ([272]Appendix Fig S8A–C). Similar results were observed in a
   comparison between a genome‐stable, close to diploid, immortalized
   human mammary epithelial cell line (MCF10A) and a human breast cancer
   cell line with a high degree of DNA copy number alterations (Hs578T).
   In addition, we used spectral karyotyping (SKY) to assess the numerical
   and structural chromosomal aberrations in immortalized MEF cells.
   Profiling a representative signature A MEF subline just after
   immortalization (subline H1; passage 25, P25), SKY revealed increased
   total DNA content (105 ± 36 chromosomes) with nearly all chromosomes
   having experienced whole chromosome gains ([273]Appendix Fig S8D, E and
   H). In addition, there was substantial cell‐to‐cell heterogeneity in
   the number and type of chromosomes at passage 25. Profiling the same
   MEF cell line 23 passages later (P48), SKY revealed that the average
   number of chromosomes had slightly decreased and stabilized
   (84 ± 12 chr), as has been reported previously (Hao & Greider,
   [274]2004), and clonal markers had begun to emerge (e.g., translocation
   (3:16)) (Fig [275]4C and [276]Appendix Fig S8F, G and I). We also
   observed a substantial number of double minutes in some of the cells
   examined at P48. As expected, the most strongly amplified chromosomes
   by SKY were scored as gains in CGH data (e.g., chrs. 3 and 6 in subline
   H1 at P48/49). Since array CGH analysis is normalized by input DNA
   quantity, genomic regions scoring as negative on a log[2] CNA plot can
   be greater than diploid if most other chromosomes have amplified to an
   even greater extent (e.g., trisomy chrs. 1, 2, and 7 in subline H1 at
   P48/49). Similar results have been reported for karyotyping and CGH
   results of human cancer cell lines (Kytölä et al, [277]2000). Taken
   together, these data reveal that immortalized MEFs experience
   substantial numerical and structural chromosomal abnormalities, similar
   to what is observed in human tumors, and further support that selection
   for optimized rearranged genomes occurs.

Senescence‐associated oxidative stress as a selective force for copy number
alterations

   The MEF system allowed us to investigate the selective pressures
   driving copy number changes during immortalization. Because MEFs
   cultured under physiological oxygen conditions (3% oxygen) undergo
   little to no senescence and exhibit less DNA damage (Parrinello et al,
   [278]2003), we tested whether oxidative stress‐induced senescence is a
   selective pressure for copy number alterations. To protect cells from
   oxidative stress, we cultured MEFs in 3% oxygen in media supplemented
   with or without catalase, an enzymatic scavenger of reactive oxygen
   species (ROS) (Halliwell, [279]2003; Graham et al, [280]2012). When
   doubly protected from oxidative stress, MEFs did not undergo senescence
   and maintained relatively diploid genomes (Fig [281]4A and
   [282]Appendix Fig S5A). In addition, we observed a variation in the
   degree of senescence experienced by MEFs derived in atmospheric oxygen
   concentrations (21%) ([283]Appendix Fig S5A). Upon calculating the
   degree of senescence encountered by each subline (senescence metric
   described in [284]Materials and Methods), we found a correlation
   between senescence and the degree of copy number alterations
   (Fig [285]4D, Spearman rho = 0.66, P = 3 × 10^−8; and [286]Appendix Fig
   S9A). Furthermore, p53^−/− cells did not undergo senescence (Olive
   et al, [287]2004) and exhibited less strong copy number alterations
   than wild‐type signature A MEFs ([288]Appendix Fig S7B; P = 9 × 10^−4).
   Taken together, our results implicate senescence‐associated redox
   stress as one of the selective forces driving the copy number
   alterations recurrently observed in human tumors.

MEF CNA signatures recapitulate recurrent patterns from human tumors and
further implicate the glycolysis pathway in shaping the cancer genome

   Our pan‐cancer and cross‐species analysis revealed that the glycolysis
   pathway is highly enriched in conserved CNA regions (Fig [289]2).
   Because the MEF signature A is qualitatively similar to the human
   expanded signature A, we next asked how inclusion of our MEF signatures
   would affect metabolic and canonical pathway enrichment analysis.
   Although the MEF immortalization system utilizes fibroblast cells and
   the mouse tumors are epithelial in origin, both types of models share a
   similar cross‐species consistency with the expanded signature A tumor
   types ([290]Appendix Fig S9B). When the human tumors signatures were
   combined with the MEF signatures, consistent genome regions were
   reduced to 3.6% of the genome spread over 13 conserved regions
   (Fig [291]4E). In cross‐species pathway enrichment analysis, MEF CNA
   signatures added a similar amount of enrichment signal to human tumor
   signatures as do the non‐MEF mouse model signatures, and when combined
   together, the enrichment was even stronger (e.g., glycolysis and core
   glycolysis pathways, Fig [292]4F). When all signatures are used,
   glycolysis was the top‐ranked enriched pathway out of 1,321 MSigDB
   canonical pathways ([293]Table EV2). Thus, including the MEF signatures
   in the cross‐species analysis further implicates the glycolysis pathway
   in shaping the cancer genome.

   Examination of the cross‐species genomic regions conserved in human
   tumors, mouse models of cancer, and the MEF immortalization system
   revealed consistent amplification of five regions containing the COSMIC
   database‐enumerated oncogenes GATA2 (human chr. 3; Fig [294]4E, last
   SAMCS line), TRRAP (chr. 7, region 4), MYC (chr. 8, region 1), CHD4
   (chr. 12, region 1), and KRAS (chr. 12, region 2) (Forbes et al,
   [295]2015). Notably, four of the cross‐species conserved CNA genomic
   regions included genes from the glycolysis pathway: HK2 (chr. 2), GCK
   and PGAM2 (chr. 7, region 1), ENO2, TPI1, and GAPDH (chr. 12, region 1,
   which includes TIGAR), and LDHB (chr. 12, region 2, which also includes
   KRAS). In both human tumors and the MEF immortalization system, the
   genomic region harboring TIGAR–GAPDH–TPI1–ENO2 was separated from the
   LDHB–KRAS region by a deletion‐prone region that includes the tumor
   suppressor CDKN1B ([296]Appendix Fig S9C–E).

   To investigate whether amplification of the DNA cross‐species conserved
   CNA regions results in upregulated RNA expression levels, we examined
   the correlation between DNA copy number and RNA expression levels in
   BRCA, LU, and OV tumors for glycolytic genes and, as a point of
   comparison, known oncogenes ([297]Appendix Fig S10). We examined the
   average correlation across BRCA, LU, and OV signature A tumors for all
   genes in the 12 cross‐species conserved regions. We found that three
   glycolytic genes (LDHB, TPI1, and GAPDH) and one oncogene (KRAS)
   exhibited strong DNA–RNA correlation (r > 0.66), three glycolytic genes
   (HK2, ENO2, and PGAM2) and three oncogenes (MYC, CHD4, and TRRAP)
   exhibited moderate correlation (0.2 < r < 0.5), and only one glycolytic
   gene (GCK) and one oncogene (GATA2) exhibited weak DNA–RNA correlation
   (r < 0.2) ([298]Appendix Fig S10A and B). This analysis indicates that
   gene copy number alterations at the DNA level generally lead to
   increased RNA expression in signature A tumors in BRCA, LU, and OV
   tumors, with a similar degree of correlation observed for both
   glycolysis genes and oncogenes. Finally, we compared the upregulation
   of glycolytic genes and oncogenes on two cross‐species conserved
   regions of human chr. 12 that contain both types of genes
   ([299]Appendix Fig S10C). Within the centromere‐distal region of
   chromosome 12, the glycolytic genes TPI1 and GAPDH show strong
   correlation (Spearman rank correlation = 0.71 and 0.66, respectively)
   while the oncogene CHD4 and the glycolytic gene ENO2 exhibit moderate
   correlate (Spearman rank correlation = 0.42 and 0.32, respectively).
   Within the more centromere‐proximal region, LDHB and KRAS both exhibit
   strong correlation (Spearman rank correlation = 0.68 and 0.66,
   respectively). Importantly, the correlation of RNA expression with DNA
   amplification is not stronger for oncogenes than for glycolytic genes
   within these cross‐species regions. Taken together, these results
   support a model in which the selection pressures shared during
   immortalization and tumorigenesis result in cross‐species conservation
   of the glycolysis gene loci copy number alterations (Fig [300]4E and F,
   and [301]Table EV2).

Alteration of CNA signatures by exogenous expression of metabolic enzymes

   The presence of core glycolysis genes in cross‐species conserved
   amplification regions suggests that these metabolic gene loci drive the
   amplification of these regions. To test this hypothesis, we transduced
   pre‐senescent MEFs with either wild‐type HK2 or HK1, kinase‐dead HK2
   (D209A/D657A) (McCoy et al, [302]2014), or wild‐type ENO2 and allowed
   the cells to senesce and immortalize in the presence of these
   exogenously expressed proteins ([303]Appendix Fig S11). Analyzing the
   signature A set of sublines, we found that the endogenous Hk2 locus
   (chr. 6) was less amplified in cells expressing exogenous wild‐type
   hexokinase (P = 0.048) (Fig [304]5A and B). As a control, a signature A
   MEF cell line expressing kinase‐dead hexokinase did not show reduced
   amplification of the Hk2 locus (P = 2 × 10^−4). In that MEF lines
   exogenously expressing hexokinase still demonstrated positive selection
   for the centromere‐proximal half of chr. 6 (Fig [305]5A), we examined
   the ratio of Hk2 gene locus copy number to the maximal amplification on
   chromosome 6 (Hk2:Chr6 max). In this analysis, we found that cells
   expressing exogenous hexokinase demonstrated significantly reduced
   Hk2:Chr6 max ratios (P = 3 × 10^−3) (Fig [306]5A and C). Additionally,
   a MEF subline expressing exogenous ENO2 exhibited deletion rather than
   amplification of the Eno2 locus on chr. 6 (P = 0.02) (Fig [307]5A and
   D). Analyzing the signature B set of sublines, we found that the
   endogenous Hk1 locus (chr. 10) was copy number neutral, rather than
   amplified, in a cell line expressing exogenous hexokinase (P = 0.17),
   whereas a signature B MEF cell line expressing kinase‐dead hexokinase
   did not show reduced amplification of the Hk1 locus (Fig [308]5E and
   F). Fisher's combined statistical analysis of these results yielded
   P‐values of 0.001 or less (Fig [309]5).

Figure 5. Alteration of CNA signatures by exogenous expression of metabolic
enzymes.

   Figure 5
   [310]Open in a new tab
     * A
       CD1 MEFs expressing exogenous HK1/HK2 or ENO2 exhibit reduced
       amplification of the endogenous Hk2 or Eno2 loci, respectively.
       Chr. 6 copy number profiles from control signature A MEFs
       (untransduced or expressing red fluorescent protein (RFP)) compared
       to signature A MEFs expressing either wild‐type HK1 or HK2,
       kinase‐dead HK2 (HK2 KD, D209A/D657A), or wild‐type ENO2. The
       positions of endogenous Hk2 and Eno2 are indicated. MEF lines
       profiled at more than one passage number are indicated by the start
       (earlier passage) and end points (later passage) of the arrows
       under “Evolving MEFs”.
     * B–D
       Boxplots of Hk2 copy number (B), ratio of Hk2 copy number to
       maximum amplification of chromosome 6 (defined operationally as the
       5^th percentile CNA value across chr. 6 to avoid outlier effects)
       (C), and Eno2 copy number (D). Control MEFs: untransduced (squares)
       or expressing RFP (diamonds); test MEFs expressing wild‐type HK1
       (upside‐down triangles), HK2 (triangles in B and C), kinase‐dead
       HK2 (HK2 KD, D209A/D657A, circles), or wild‐type ENO2 (triangle in
       D). In this analysis, copy number data from samples profiled at
       multiple passages were averaged to prevent overrepresentation of
       these cell lines. P‐values were calculated using Student's t‐test
       (B), Mann–Whitney U‐test (C), and z‐score (B, C) based on criteria
       described in [311]Materials and Methods. Data are presented in box
       (median, first and third quartiles) and whisker (extreme value)
       plots.
     * E
       Signature B MEFs expressing exogenous HK2 exhibit less
       amplification of the Hk1 locus. Chr. 10 copy number profiles from
       control signature B MEFs (untransduced or expressing RFP) compared
       to signature B MEFs expressing wild‐type HK2 or kinase‐dead HK2
       (HK2 KD). The positions of endogenous Hk1 and Mdm2 are indicated.
     * F
       Boxplot of the Hk1 copy number with indicated P‐value (z‐score).
       Data are presented in box (median, first and third quartiles) and
       whisker (extreme value) plots.

   Data information: Fisher's combined P‐values for the alteration of
   endogenous loci CNA by exogenous expression of metabolic enzymes is
   3 × 10^−5 (panels B, D, F) or 7 × 10^−4 (panels C, D, F). For all
   comparisons in this figure, no significant differences were observed
   between untransduced and RFP‐expressing MEFs. See also
   [312]Appendix Fig S11.

   Taken together, these results demonstrate that exogenous expression of
   metabolic enzymes can alter the copy number status of the endogenous
   genomic loci, supporting these metabolic genes as drivers within the
   conserved amplification regions observed in human tumors and mouse
   models. In addition, these results support a model in which the net
   propensity for a chromosomal region to be amplified or deleted is in
   part related to the sum of the fitness effects of the genes present
   (Davoli et al, [313]2013). For example, when HK2 is exogenously
   expressed, the centromere‐distal half of chromosome 6 had a low copy
   number value in early culture but after additional culture demonstrated
   increased copy number, supporting that the other gene loci of this
   region (such as Eno2, Gapdh, and other glycolytic genes) do have a
   remaining pro‐fitness benefit (Fig [314]5A).

Metabolism and growth phenotypes of CNA signatures

   To test whether there exist phenotypic differences between signature A
   and signature B MEFs, we characterized 11 wild‐type MEF lines
   representative of either signature A or B and one mixed signature line.
   We found that signature A MEFs generally had higher rates of glucose
   consumption and lactate production than signature B MEF lines
   (Figs [315]6A and [316]EV5F). Plotting the PC1 score versus glucose
   consumption, we found that CNA‐based signature A was highly predictive
   of glucose consumption in the MEF signature A lines (Fig [317]6B). In
   contrast, signature B was only moderately predictive of the glucose
   consumption of signature B lines, and was not as accurate as signature
   A in predicting the glucose consumption of signature A lines
   (Figs [318]6B and [319]EV5A). In addition, we noted that the signature
   A cell lines generally exhibited significantly higher rates of
   proliferation than the signature B cell lines (Fig [320]6C). Similar to
   glucose consumption, signature A was predictive of the growth rates of
   signature A MEFs and signature B was predictive of the growth rates of
   signature B MEFs, while cross‐signature predictions had less power
   (Figs [321]6D and [322]EV5B). Furthermore, we observed a general
   coevolution of higher growth rates and increased CNA signature strength
   in MEF lines that were profiled at different passage numbers
   (Fig [323]6D). As noted above, the evolving MEF CNA signature pattern
   was enriched for DNA amplifications of genes in the core glycolysis and
   glycolysis‐associated pathways ([324]Table EV2), particularly due to
   amplification of chromosome 6, which contains multiple metabolic gene
   loci including Hk2 and Eno2 (Fig [325]4A and E, and [326]Appendix Fig
   S6B).

Figure 6. Signature A and B MEFs exhibit differential metabolic and
proliferation phenotype strengths.

   Figure 6
   [327]Open in a new tab
    1. PCA‐defined signature A MEFs exhibit higher glucose consumption
       rates than signature B MEFs (Student's t‐test P‐value = 0.009).
       Glucose consumption was measured using a bioanalyzer.
    2. PC1 scores of signature A MEFs are predictive of glucose
       consumption rates in signature A MEFs, while scores from signature
       B MEFs have weaker predictive power in signature B MEFs. In
       cross‐prediction tests, signature A‐based predictions of signature
       A MEF line metabolic phenotypes perform the best (Fig [328]EV5A).
       Glucose consumption was measured using a bioanalyzer.
    3. PCA‐defined signature A MEFs exhibit higher average growth fold
       change, as observed in 3T9 culture, compared to signature B MEFs.
       Signature A MEFs had a subset of cases with high growth rates
       (average fold change in cell number per passage > 4) that were not
       observed in the signature B MEFs (hypergeometric P‐value = 0.02).
    4. Correlation and a general coevolution of higher growth rates and
       increased CNA signature strength. Arrowed lines indicate
       progressing MEF lines profiled at more than one passage number
       (Appendix Figs [329]S6A and B, and [330]S9A). In cross‐prediction
       tests, PC1 scores of signature A MEFs are more predictive of
       average growth fold change in signature A MEFs, while scores from
       signature B MEFs are more predictive in signature B MEFs
       (Fig [331]EV5B).
    5. Metabolomic profiling of 11 wild‐type MEF lines representative of
       either signature A or B cultured for 24 h with [1,2‐^13C]‐labeled
       glucose. The metabolic pathway schematic is colored based on
       differences observed in the percent heavy label for intracellular
       metabolites (defined as percent of metabolite molecules with
       isotopomer mass greater than the monoisotopic molecular weight, M0)
       between signature A and B MEFs using the signal‐to‐noise (SNR)
       metric. Red indicates a higher heavy carbon‐labeling percentage in
       signature A MEFs, and blue indicates a higher heavy carbon‐labeling
       percentage in signature B MEFs. Signature A MEF lines incorporate
       more glucose‐derived carbon in metabolites from the early
       glycolysis steps, the pentose phosphate pathway, and nucleotide
       synthesis, and a smaller fraction of glucose‐derived carbon per
       molecule in metabolites from the later glycolysis steps, the serine
       synthesis pathway, and the TCA cycle. Full metabolomic data can be
       found in [332]Table EV5.
    6. Correlation of fructose‐1,6‐bisphosphate (F1,6BP) and negative
       correlation of 3‐phosphoserine (pSer) with glucose consumption
       rates. Note, the high to low range for percent metabolite molecules
       with incorporated label varies for each metabolite, but generally
       does not extend from 0 to 100%. Signature A and B MEFs are colored
       red and blue, respectively.
    7. Model of copy number selection and fitness gains during
       tumorigenesis. Genomic instability enables fitness gains in tumor
       metabolism. In human tumors, cancer cell lines, and an experimental
       MEF immortalization system, immortalization and tumorigenesis lead
       to multiple CNA signatures that are predictive of the tumor
       phenotypes of metabolism and proliferation. Senescence‐associated
       redox stress and other tumorigenesis‐related constraints select for
       stronger CNA signatures. A shared high glycolysis‐associated
       signature A is observed in breast, lung, ovarian, and uterine
       tumors, additional tumor types, mouse models, and the MEF model
       system, and is linked with a higher range of glycolysis and
       proliferation phenotypes. Genetic manipulation of glycolysis
       enzymes leads to alteration of corresponding CNA signature
       propensities. Signature A and B genomes reflect two distinct
       trajectories from diploidy to tumor aneuploidy. Signature A tumors
       are enriched for mutations in p53 and have smaller sized
       amplification and deletion genomic regions (i.e., have a higher
       number of genomic breakpoints), potentially providing increased
       alternative genome options. Signature A involves amplification of
       several genes in glycolysis‐related pathways (such as HK2, TIGAR,
       TPI1, GAPDH, ENO2, PGAM2, and LDHB). Signature B CNA patterns occur
       in generally less glycolytic and proliferative samples and show
       more variation across different tumor types. In particular,
       signature B is enriched for MDM2 amplification and CDKN2A loss,
       strong MYC amplification, and KRAS mutation and involves alternate
       hexokinase isoforms (HK1, HK3).

   Data information: Data in (A, C) are presented in box (median, first
   and third quartiles) and whisker (extreme value) plots. Error bars
   indicate standard deviations of biological replicates in (A, B, C and
   F). See also Fig [333]EV5.

Figure EV5. Metabolic patterns are predictive of increased glycolysis in MEF
lines (related to Fig [334]6).

   Figure EV5
   [335]Open in a new tab
     * A
       In cross‐prediction tests, signature A‐based predictions of
       signature A MEF line metabolic phenotypes perform better than
       signature B‐based predictions. Glucose consumption was measured
       using a bioanalyzer.
     * B
       In cross‐prediction tests, PC1 scores of signature A MEFs are more
       predictive of average growth fold change in signature A MEFs, while
       scores from signature B MEFs are more predictive in signature B
       MEFs.
     * C–G
       Mass spectrometry‐based metabolite measurements. Samples are
       arranged left to right from the strongest signature B (blue) PC1
       scores to the strongest signature A PC1 (red) scores. Sample D10
       (purple) is a mixed signature A/B line. (C) Relative intracellular
       metabolite levels. (D) Isotopomer distributions for intracellular
       metabolites (M0, monoisotopic molecular weight). (E) Percent label
       for intracellular metabolites defined as percent of metabolite
       molecules with isotopomer mass greater than the monoisotopic
       molecular weight. Note, the high to low range for percent
       metabolite molecules with incorporated label varies for each
       metabolite and generally does not extend from 0 to 100%. (F)
       Relative metabolite consumption from and secretion into media
       normalized to initial media levels (i.e., cellular metabolite
       footprint). Here, positive and negative values indicate metabolite
       secretion into and consumption from the media, respectively. In
       (D–G), the correlation of each metabolic parameter with glucose
       consumption (as measured by a bioanalyzer, and as a positive
       variable, Fig [336]6A) is indicated (Pearson correlation, ρ). In
       the isotopomer cases (D), the summary metric of percent label is
       used for the correlation determination. Notably, cells with high
       glucose consumption rates tend to exhibit high consumption of
       serine, glycine, and glutamine; and high secretion of lactate (with
       concordant lactate secretion results obtained by bioanalyzer‐based
       measurement of the same spent cell culture media). (G) Pathway
       schematics of differences in relative media levels (footprint, from
       F) between signature A and signature B MEFs. Each node is colored
       according the signal‐to‐noise (SNR) ratio. Red indicates higher
       post‐culture media levels (lower consumption or higher secretion
       rates) in signature A MEFs, and blue indicates higher post‐culture
       media levels in signature B MEFs. Error bars indicate standard
       deviations of biological replicates. Footprinting data were
       normalized to the integrated cell number of the time course of
       labeling (24 h), and intracellular metabolite concentrations were
       normalized to the number of cells present at the time of
       extraction. Full metabolomic data can be found in [337]Table EV5.

   To test whether signature A and signature B MEFs differentially use
   glucose, we cultured MEF cells with [1,2‐^13C]‐labeled glucose and
   conducted metabolomic profiling by mass spectrometry (Fig [338]EV5C–G
   and [339]Table EV5) (Metallo et al, [340]2009). In all MEF lines
   tested, we observed a low percentage of single heavy labeled carbon [M1
   isotopomer compared to M2] in pyruvate, lactate, and alanine,
   indicating that the contribution of glucose‐derived carbon from the
   oxidative arm of the pentose phosphate pathway to these metabolites was
   relatively low (Fig [341]EV5D). Nonetheless, the patterns of heavy
   isotope labeling revealed differences in nutrient utilization between
   signature A and B MEF cell lines. On average, metabolites of early
   glycolysis, the pentose phosphate, and nucleotide synthesis pathways
   showed a higher percentage of glucose‐derived heavy carbon labeling in
   signature A MEFs (Figs [342]6E and [343]EV5E). In contrast, signature A
   cells had a lower percentage of glucose‐derived heavy carbon labeling
   in metabolites of the serine synthesis pathway and the TCA cycle. When
   compared to signature B MEFs, signature A cells also tended to exhibit
   increased consumption of serine, glutamine, and other amino acids
   (Fig [344]EV5F and G). The increased glutamine consumption in signature
   A MEFs reflects similarity with basal breast cancer cell lines (also
   signature A associated), which exhibit increased glutamine consumption
   relative to luminal breast cancer cell lines (Timmerman et al,
   [345]2013). In general, the percentage of individual metabolites from
   the early glycolysis, the pentose phosphate, and nucleotide synthesis
   pathways that incorporated heavy, glucose‐derived carbon metabolites
   was positively correlated with glucose consumption rates (e.g.,
   fructose‐1,6‐bisphosphate, Fig [346]6F). Conversely, metabolites from
   the serine synthesis pathway and the TCA cycle showed a negative
   correlation between glucose consumption rates and the percentage of
   each molecule containing a heavy, glucose‐derived carbon (e.g.,
   3‐phosphoserine, Fig [347]6F). While caution must be exercised when
   interpreting metabolite labeling results, we overall observed a
   consistent trend of differences between signature A and B lines. Of
   note, the differences between signature A and B lines were mainly in
   regard to scale, with the strongest signature B lines demonstrating
   similar glycolysis and proliferation rates, as well as similar
   metabolic profiles, as the weakest signature A lines.

   Taken together, these results demonstrate that signature A MEFs, which
   resemble signature A human tumors, exhibit increased glycolysis and to
   a somewhat lesser extent increased proliferation, and have an increased
   relative proportion of glucose‐derived carbon in metabolites of pentose
   phosphate‐associated biosynthetic pathways such as nucleotide
   synthesis. These findings are consistent with published mouse model
   studies in which tumor cells that are channeling glucose toward
   nucleotide biogenesis achieve faster rates of proliferation (Boros
   et al, [348]1998; Ying et al, [349]2012).

   Thus, our cross‐species and pan‐cancer CNA analysis revealed conserved
   amplification regions shared by the majority of tumor types studied
   that are enriched for genes involved in core glycolysis. To aid others
   in pan‐cancer and cross‐species CNA signature comparisons, we have
   created an interactive web‐interface resource available at
   [350]http://systems.crump.ucla.edu/cna_conservation/.

Discussion

   Chromosomal instability and high glycolysis characterize some of the
   most aggressive tumors, and the complexity and plasticity of the
   genomes in aggressive tumors can hinder molecularly targeted therapies
   (Nakamura et al, [351]2011; McGranahan et al, [352]2012; Shi et al,
   [353]2012). While the glycolytic changes associated with tumorigenesis
   were one of the early defining phenotypes of cancer cells (Warburg,
   [354]1956), they have not previously been linked to recurrent DNA copy
   number patterns. Taken together, our experimental and computational
   data support a model in which glycolysis‐linked selective pressures
   encountered during tumorigenesis (e.g., redox stress and senescence)
   shape the highly recurrent DNA copy number alterations found in
   aneuploid human tumors (Fig [355]6G). We found that CNAs in core
   glycolysis enzymes (e.g., HK2) and other cancer‐linked metabolic
   enzymes such as TIGAR are coordinately enriched in tumors with distinct
   CNA signatures. These CNA signatures are predictive of glycolysis,
   including patient FDG‐PET activity and cell line proliferation
   phenotypes. The strong correlation of CNA‐based principle component
   scores to uptake of the glucose analogue FDG in breast cancer patients
   and the predictive power of CNA signatures for breast cancer cell
   metabolism (Fig [356]3) provide support that the CNAs affecting
   metabolic gene loci collectively act as a copy number‐based driver of
   metabolic differences. Importantly, in that exogenous expression of
   hexokinase and enolase enzymes reduced the propensities for
   amplifications of the corresponding endogenous hexokinase and enolase
   loci, these metabolic genes empirically score as driver loci. However,
   we cannot exclude the possibility that the observed metabolic
   differences are in part due to other cancer‐associated regulatory
   changes such as epigenetic events known to affect metabolism (Sebastián
   et al, [357]2012). Combined with the observation that metabolic genes
   can facilitate cellular immortalization (Kondoh et al, [358]2005;
   Kondoh, [359]2009; Kaplon et al, [360]2013), our results implicate
   tumor metabolism as an additional fitness measure linked to how genomic
   instability can enable tumorigenesis.

Chromosomal instability and aneuploidy—positive and negative impact on tumor
cell fitness

   Most solid human tumors exhibit both numerical and structural
   aneuploidy (Holland & Cleveland, [361]2012). Paradoxically, chromosomal
   instability can act either as an oncogene or as a tumor suppressor
   depending on the context (Weaver et al, [362]2007). Moreover, addition
   of a single chromosome in MEF cells induces a stress response that
   impairs proliferation and immortalization (Williams et al, [363]2008).
   However, numerical aneuploidy can lead to chromosomal instability
   (Nicholson et al, [364]2015) which results in subchromosomal gains and
   losses as observed in human tumors, in mouse models of cancer, and in
   immortalized MEF cells. Thus, aneuploidy can cause an initial fitness
   loss due to the costs of dealing with non‐optimized chromosome numbers,
   gene copy numbers, and resulting proteomic imbalances. However, the
   associated state of chromosomal instability enables further evolution
   of the genome toward fitness gains through refinement of gene copy
   number levels. In sum, these fitness gains either outweigh or offset
   the fitness losses due to aneuploidy. In this context, our data support
   a model (Fig [365]6G) in which metabolic selection forces and metabolic
   gene loci contribute to the recurrent patterns of DNA copy number
   alteration observed in human tumors.

Redox stress, biomass accumulation, and associated glycolytic changes in
tumorigenesis

   Tumorigenesis is a complex, multistage process during which cells must
   acquire the capability to maintain redox balance while accumulating the
   macromolecular precursors required for proliferation (DeBerardinis
   et al, [366]2008; Hanahan & Weinberg, [367]2011). Numerous stimuli,
   including RAS mutations, matrix detachment, altered metabolism, and
   hypoxia, induce the accumulation of intracellular ROS (Lee et al,
   [368]1999; Schafer et al, [369]2009; Weinberg et al, [370]2010;
   Anastasiou et al, [371]2011). Because increased ROS levels can trigger
   replicative senescence and subsequent cell cycle arrest (Lee et al,
   [372]1999; Takahashi et al, [373]2006), tumors must maintain pools of
   reduced glutathione using NADPH in part produced via the pentose
   phosphate pathway. Additionally, increased levels of ROS can divert
   glycolytic flux into the pentose phosphate pathway through, for
   example, oxidation and inhibition of the glycolytic enzyme PKM2,
   thereby supplying cells with the reducing power and precursors for
   anabolic processes (Boros et al, [374]1998; Anastasiou et al,
   [375]2011). Consistent with this published knowledge on the role of
   metabolism in tumorigenesis, our study suggests that the metabolic
   stress associated with senescence (Fig [376]4A and D) and the metabolic
   demands of rapid proliferation (Fig [377]6) are components of the
   selective pressures underlying recurrent CNA changes.

Experimentally and computationally deciphering CNA patterns

   Our experimental and bioinformatic approaches complement existing
   approaches for testing hypotheses for the selection pressures
   underlying recurrent CNA patterns observed in human tumors. Other
   CNA‐analysis approaches target different resolutions of the genome.
   Statistical algorithms such as GISTIC (Genomic Identification of
   Significant Targets in Cancer) have identified many strong individual
   driver genes and candidate regions (Mermel et al, [378]2011).
   Integrating CNA data with RNA knockdown screens and gene expression
   data has further identified driver genes missed by statistical analysis
   of CNA data alone (Sanchez‐Garcia et al, [379]2014). RNA
   knockdown‐based analyses have also been used to support a more
   systems‐level model in which the selection for amplification and
   deletion of a particular DNA region is based on the cumulative effects
   of many positive and negative fitness gains from multiple genes within
   that genomic region (Solimini et al, [380]2012; Cai et al, [381]2016).
   Subsequent computational extensions have incorporated somatic mutation
   patterns to infer the cumulative impact of co‐localized genes on
   fitness, and to successfully predict whole chromosome and
   chromosome‐arm resolution‐level CNAs (Davoli et al, [382]2013).

   Our approach using phenotypic data, functional gene sets, and
   cross‐species syntenic mapping has yielded additional insight into the
   selective pressures shaping tumor CNA patterns, namely coordinated
   alteration of genes involved in glycolytic metabolism. Incorporation of
   copy number data from genetically engineered mouse models allowed
   synteny constraints to substantially reduce passenger amplifications
   and define a minimal collection of stringently conserved copy number
   regions. Additionally, to our knowledge, our experimental approach is
   the first reported system in which CNA and associated phenotypes have
   been followed and repeatedly sampled as non‐immortalized cells undergo
   spontaneous genomic instability and proceed from a diploid state to an
   immortalized aneuploid state. Using this approach, we were able to
   validate genomic regions that are (i) associated with increased
   glycolysis and to a lesser but significant extent proliferation, (ii)
   enriched for genes from the core glycolysis pathway, and (iii)
   conserved in both human tumors and mouse models of cancer. Metabolic
   pathways are known to be coordinately regulated by modest changes in
   mRNA expression of functionally related genes (Mootha et al, [383]2003;
   Palaskas et al, [384]2011). The coordinated alterations of metabolic
   genes at the DNA level adds an additional mechanism, namely conserved
   sets of CNA changes, by which glycolysis is dysregulated to promote
   tumorigenesis.

   A strength of our PCA‐based approach is the ability to unbiasedly
   reveal distinct CNA subsignatures within a tumor type. Observed
   subsignatures, confirmed using an independent clustering analysis, were
   found to be associated with previously known pathology‐ or
   profiling‐defined tumor subtypes (e.g., basal/luminal CNA signatures in
   BRCA) (Bergamaschi et al, [385]2006) (Figs [386]1B and [387]EV2A–D).
   Across multiple tumor types, loss of p53 function through p53 mutation
   is associated with the high breakpoint signature A pattern. In
   contrast, loss of the p53/ARF axis via other mechanisms (MDM2
   amplification or CDKN2A deletion) in BRCA human tumors is associated
   with a different CNA signature (signature B) that in general has fewer
   breakpoints. In our experimental follow‐up, the MEF system
   recapitulated a two‐signature pattern. Notably, the two mouse
   signatures and their associated phenotype strengths were defined by the
   initiating loss of tumor suppression event, namely Trp53 mutation
   versus either Mdm2 amplification or Cdkn2a loss. Thus, while the
   consequences of TP53 mutation and “MDM2 amplification/CDKN2A loss” are
   considered functionally similar and therefore mutually exclusive (Wade
   et al, [388]2013), our findings indicate they are not fully equivalent
   in terms of genomic instability and subsequent metabolic evolution. The
   tolerance of more highly disrupted and rearranged genomes upon p53
   mutation appears to allow more flexibility in the evolution of
   aneuploid cancer genomes, thereby resulting in stronger glycolysis and
   somewhat enhanced proliferation. The specific combinations of CNA
   changes occurring in enzyme isoforms defining a metabolic pathway may
   be considered “onco‐metabolic isoenzyme configurations” with
   differential potency, and the sets of combinations possible may be
   limited in part by the degree or specifics of tolerance to genomic
   re‐arrangements.

   In summary, our work illustrates the value of cross‐species comparisons
   in the analysis of DNA copy number data, much as recent pan‐cancer and
   integrated genomic approaches have uncovered novel cancer subtypes and
   driver genes (Gatza et al, [389]2014; Hoadley et al, [390]2014). The
   broken chromosome synteny between human and mouse genomes reduced the
   size of potential driver regions by fivefold on average, and identified
   a relatively small number of amplification and deletion regions highly
   conserved between mouse and human (Figs [391]2D and [392]4E). These
   conserved regions are consistently found across the majority of tumor
   types examined, have subchromosomal scale with a median size on the
   order of 10 megabase pairs, and include the glycolysis pathway enzyme
   CNAs observed in our experimental work.

Therapeutic and diagnostic implications

   The most copy number aberrant tumors tend to have fewer point mutations
   in canonical oncogenes (e.g., KRAS, Fig [393]EV2A and C) and less
   canonical oncogene amplification (e.g., MYC, Fig [394]1B). Hence,
   genomic instability and subsequent coordinate alterations in multiple
   genes within a functional pathway may provide an alternate, more
   complex, pathway to acquisition of aggressive tumor phenotypes—with
   tumor evolution and selection guiding the trajectory (Ciriello et al,
   [395]2013). In that KRAS mutation and MYC amplification can drive
   glycolysis (Ying et al, [396]2012; Dang, [397]2013), the findings that
   signature A tumors are de‐enriched in these events relative to
   signature B tumors and enriched for glycolysis gene loci CNA
   amplifications support that tumor cells can meet their metabolic
   demands through distinct mechanisms, or combinations thereof. Future
   models of the most aggressive cases of cancer will need to incorporate
   aspects of spontaneous genomic instability (mediated by distinct
   instability mechanisms) and resulting copy number alterations.
   Translationally, understanding how CNA patterns alter cancer genomes
   and impact cancer phenotypes will aid in the identification of
   metabolic or other “hard‐wired” vulnerabilities that can be
   therapeutically targeted. Furthermore, the relative stability of DNA
   samples, combined with the growing linkage between highly recurrent
   copy number changes and phenotypes, supports the potential for
   molecular classification and diagnostic tests based on DNA copy number
   patterns (Hieronymus et al, [398]2014; Cai et al, [399]2016).

Materials and Methods

Cell culture and mouse strains

Tissue culture

   CD1 mouse embryonic fibroblasts, E14.5, were purchased from Stem Cell
   Technologies. p53^fl/fl MEFs were obtained at day E14.5 from p53^fl/fl
   (FVB.129P2‐Trp53^tm1Brn/Nci) crossed with C57BL/6‐129/SV mice. MEF
   cells were maintained in Dulbecco's modified Eagle's medium without
   pyruvate and supplemented with 10% FBS and 1% SPF. Cells were lifted
   and re‐plated at a density of 9 × 10^5 viable cells/60‐mm dish every
   3–4 days (i.e., 3T9 protocol; Todaro & Green, [400]1963). Cells were
   grown in atmospheric oxygen unless otherwise indicated. Breast cancer
   cell lines were obtained from the laboratory of Frank McCormick and
   extensively profiled both genomically and transcriptionally by the
   laboratory of Joe Gray (Neve et al, [401]2006; Hong et al, [402]2016).

Genetic engineering

   Overexpression of HK1, HK2, kinase‐dead HK2 (D209A/D657A), or ENO2
   glycolysis enzymes and the MYC oncogene or the control protein RFP in
   CD1 MEFs was accomplished by transduction of non‐immortalized cells
   with pDS‐FB‐neo retrovirus, followed by selection in 600 μg/ml G418.
   Deletion of p53 in p53^fl/fl MEFs was induced by infection of
   non‐immortalized cells with either retroviral Cre‐GFP or Cre‐ERT2 plus
   treatment with 1 μM 4‐OHT.

ROS protection

   To protect cells from reactive oxygen species (ROS), cells were
   cultured either at physiological oxygen concentrations (3% O[2]), with
   250 U/ml catalase from bovine liver (Sigma‐Aldrich), or with both 3%
   O[2] and catalase.

Genome profiling

Array CGH profiling

   Genomic DNA was harvested using the DNeasy kit (Qiagen). DNA from MEF
   lines and reference genomic DNA from C57BL/6J mouse tissue were
   hybridized to Agilent SurePrint G3 Mouse CGH 4 × 180 k CGH microarray
   chips at the UCLA Pathology Clinical Microarray Core. Bioconductor
   analysis tools were used for data processing: Moving minimum background
   correction and print‐tip loess normalization were performed in snapCGH
   package (Smith et al, [403]2009); circular binary segmentation (with a
   minimum of three markers per segment) was performed on smoothed and
   log2‐transformed copy number profiles using DNAcopy package (Seshan &
   Olshen, [404]2016); segmented data were converted into a matrix by
   genes for downstream analyses using mus musculus 9 RefSeq reference
   genome from 2011.08.11 in CNTools (Zhang, [405]2016). Copy number
   profiles are presented using the Integrative Genomics Viewer (IGV)
   (Thorvaldsdóttir et al, [406]2013). CNA public datasets and TCGA tumor
   type abbreviations are available in the [407]Data availability section.

Bioinformatic and statistical analysis

PCA

   Principal component analysis (PCA) was performed using the
   mean‐centered matrix of CNA values per gene locus. Genes with identical
   profiles across samples were collapsed to a single representative gene.
   Techniques such as PCA and the related singular value decomposition
   (SVD) have been applied to copy number data previously
   (Sankaranarayanan et al, [408]2015).

Mutation and tumor subtype enrichment analysis

   To test for enrichment of mutations or tumor subtypes within
   CNA‐defined PC scores, TCGA tumors were queried for mutations and tumor
   subtype designations using the cBioPortal for Cancer Genomics (Gao
   et al, [409]2013). For each tumor type, we included all genes with
   significant q‐values as calculated by MutSigCV (Lawrence et al,
   [410]2013). We also included the gene POLE, a catalytic subunit of DNA
   polymerase epsilon, in our analysis because of its frequent mutation in
   UCEC (The Cancer Genome Atlas Research Network, [411]2013). Molecular
   subtypes tested were as follows: BRCA: basal, luminal, claudin‐low, and
   HER2‐enriched (The Cancer Genome Atlas Research Network, [412]2012b);
   LUAD: bronchioid, magnoid, and squamoid (The Cancer Genome Atlas
   Research Network, [413]2014); LUSC: basal, classical, primitive, and
   secretory (The Cancer Genome Atlas Research Network, [414]2012a); OV:
   proliferative, immunoreactive, differentiated, and mesenchymal (The
   Cancer Genome Atlas Research Network, [415]2011); and UCEC: POLE
   ultramutated, microsatellite instability hypermutated, copy number low,
   and copy number high (The Cancer Genome Atlas Research Network,
   [416]2013). Tumors were sorted by their PC1 score, and we calculated a
   Kolmogorov–Smirnov statistic against the expected distribution of
   mutations or tumor subtypes. The statistical significance of enrichment
   was determined by permutation analysis.

Hierarchical clustering

   The TCGA tumors were hierarchically clustered using the pheatmap
   package in R (Kolde, [417]2015). Prior to clustering, the CNA values
   were filtered for gene loci that had zero values across all tumor
   samples. Non‐centered and non‐scaled tumor samples were then clustered
   using centered Pearson correlation distance and Ward's method (with
   dissimilarities squared before cluster updating). The strength of
   concordance between hierarchical clustering results and PCA‐based
   signatures was assessed using the hypergeometric P‐value.

Consistency signatures

   Consistency signatures of conserved amplification and deletions regions
   were determined using stringent consistency criteria. The signed
   absolute minimum consistency score (SAMCS) was defined as non‐zero when
   all CNA summary signatures have the same sign across all tumor types,
   and the score is derived from the absolute value‐based minimum summary
   metric and then re‐signed positive for amplification or negative for
   deletion. Consistent amplifications or deletions were combined into a
   “consistent region”, when absolute SAMCS values greater than 0.05
   spanned at least 1 Mbp. Any two consistent regions separated by < 1 Mbp
   were combined into a single consistent region.

Enrichment analysis and weighted gene voting (WGV)

   Metabolic pathway enrichment analysis (gene set enrichment analysis,
   GSEA; Subramanian et al, [418]2005) and pathway‐specific weighted gene
   voting (WGV) prediction analysis (Golub et al, [419]1999) were
   performed using 75 metabolic pathways defined by the Kyoto Encyclopedia
   of Genes and Genomes (KEGG) database (Kanehisa et al, [420]2014), using
   pathways with seven or more measured genes. In the RNA‐based enrichment
   analysis, we included a gene set consisting of genes from the
   glycolysis and pentose phosphate pathways that were upregulated in
   FDG‐high BRCA tumors, as defined by our previous work (Palaskas et al,
   [421]2011). For CNA data, an expanded run of enrichment analysis was
   performed using 1,321 canonical pathways (CP) of seven or more measured
   genes as defined by the Broad Institute's Molecular Signatures Database
   (MSigDB). For KEGG‐based metabolic pathway enrichment analysis of CNA
   data, we collapsed metabolic isoenzyme loci (genes with the same enzyme
   activity; Enzyme Commission [EC] numbers) that were within
   100 kilobases from each other into a single representative locus. For
   mRNA expression analysis, metabolic isoenzymes were not collapsed. In
   CNA‐based enrichment analysis using gene‐based versions of the genome
   consistency signatures defined above, enrichment scores were calculated
   through the ranked set of consistently amplified genes since after this
   point the genes that are not consistent across signatures have a
   consistency value of zero and accordingly have tied ranks. Consistency
   regions were stepwise restricted by sequentially adding human, and
   corresponding tissue‐matched mouse model of cancer, signatures based on
   their decreasing tissue type PC1 value from Fig [422]1A. To combine
   BRCA and LU into one enrichment analysis for mRNA data, genes were
   ordered by their average rank in BRCA and LU tumor types. For WGV
   predictions of metabolic phenotypes, t‐scores were used as gene
   weights. To calculate permutation P‐values, we calculated the fraction
   of 1,000 randomly chosen gene sets of equal size that gave average gene
   set rankings (column 2 of Fig [423]3C) better than the true gene set.
   False discovery rate (FDR) q‐values were calculated using the
   Benjamini–Hochberg procedure. To increase statistical stringency, FDR
   values for each individual glycolysis–gluconeogenesis gene set (KEGG
   Glycolysis‐Gluconeogenesis, KEGG Core Glycolysis, KEGG
   Glycolysis‐Gluconeogenesis & Pentose Phosphate Pathway, Reactome
   Gluconeogenesis, Reactome Glucose Metabolism, Reactome Glycolysis,
   Biocarta Glycolysis Pathway) were calculated while removing the other
   glycolysis–gluconeogenesis gene sets. CNA and metabolite changes were
   visualized in the context of metabolic pathway structure using
   Cytoscape (Smoot et al, [424]2011).

Core glycolysis and glycolysis‐associated gene list (from Fig [425]1F)

   HK—hexokinase (HK1, HK2, HK3, HK4/glucokinase/GCK);
   GPI—glucose‐6‐phosphate isomerase; G6PC—glucose‐6‐phosphatase catalytic
   subunit (G6PC, G6PC2); FBP—fructose‐bisphosphatase (FBP1, FBP2);
   PFK—phosphofructokinase (PFKL—liver type, PFKM—muscle, PFKP—platelet);
   PFKFB—6‐phosphofructo‐2‐kinase/fructose‐2,6‐biphosphatase (PFKFB1,
   PFKFB2, PFKFB3, PFKFB4); TIGAR—TP53 induced glycolysis regulatory
   phosphatase; ALDO—aldolase, fructose‐bisphosphate (ALDOA, ALDOB,
   ALDOC); TPI1—triosephosphate isomerase 1;
   GAPDH—glyceraldehyde‐3‐phosphate dehydrogenase (GAPDH,
   GAPDHS—spermatogenic); PGK—phosphoglycerate kinase (PGK1, PGK2);
   PGAM—phosphoglycerate mutase (PGAM1, PGAM2, PGAM4,
   BPGM—bisphosphoglycerate mutase); ENO—enolase (ENO1, ENO2, ENO3);
   PK—pyruvate kinase (PKLR—liver and red blood cell, PKM2—muscle);
   LDH—lactate dehydrogenase (LDHA, LDHB, LDHC, LDHAL6A—LDH A‐like 6A,
   LDHAL6B—LDH A‐like 6B); and PDH—pyruvate dehydrogenase (PDHA1, PDHA2,
   PDHB, DLD—dihydrolipoamide dehydrogenase, DLAT—dihydrolipoamide
   S‐acetyltransferase).

Metrics

   Signal‐to‐noise ratio (SNR) = (μ[1] − μ[2])/(σ[1] + σ[2]), t‐score =
   (μ[1] − μ[2])/sqrt(σ[1] ^2/n [1] + σ[2] ^2/n [2]), where μ = mean,
   σ = standard deviation, n = number of samples.

Correlation of mRNA and CNA

   Copy number alteration and mRNA expression levels for genes found in
   the cross‐species conserved regions in Fig [426]4E were compared by
   calculating the Spearman rank correlation coefficient. BRCA, LU, and OV
   samples with paired mRNA and CNA data were included. UCEC tumors were
   excluded because there were not sufficient UCEC samples with paired RNA
   expression data (26% of all samples with CNA data). Known oncogenes
   were identified by comparison with the Catalogue Of Somatic Mutations
   In Cancer database (COSMIC, [427]http://cancer.sanger.ac.uk/cosmic;
   Forbes et al, [428]2015).

Senescence score

   A summary senescence score for each MEF subline was calculated by
   subtracting the area under the MEF's growth curve, Z(x), in log[2]
   scale, from the area under an ideal growth curve, Y(x) (i.e.,
   consistent growth at the fastest observed rate). The area difference
   was then averaged by dividing by the passage number, p, and
   log[2]‐transformed, resulting in a normally distributed score.
   [MATH:
   <mrow><mi>S</mi><mi>e</mi><mi>n</mi><mi>e</mi><mi>s</mi><mi>c</mi><mi>e
   </mi><mi>n</mi><mi>c</mi><mi>e</mi><mspace
   width="0.166667em"></mspace><mi>S</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi
   >e</mi><mo>=</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>2</mn></msub><
   mfrac><mrow><msubsup><mo>∑</mo><mrow><mi>x</mi><mo>=</mo><mn>2</mn></mr
   ow><mi>p</mi></msubsup><mfenced close=")" open="("
   separators=""><mfrac><mfenced close=")" open="("
   separators=""><mrow><msub><mi>Y</mi><mrow><mo
   stretchy="false">(</mo><mi>x</mi><mo
   stretchy="false">)</mo></mrow></msub><mo>+</mo><msub><mi>Y</mi><mrow><m
   o stretchy="false">(</mo><mrow><mi>x</mi><mo>−</mo><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></msub></mrow></mfenced><mn>2</mn></mfrac
   ><mo>−</mo><mfrac><mfenced close=")" open="("
   separators=""><mrow><msub><mi>Z</mi><mrow><mo
   stretchy="false">(</mo><mi>x</mi><mo
   stretchy="false">)</mo></mrow></msub><mo>+</mo><msub><mi>Z</mi><mrow><m
   o stretchy="false">(</mo><mrow><mi>x</mi><mo>−</mo><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></msub></mrow></mfenced><mn>2</mn></mfrac
   ></mfenced></mrow><mi>p</mi></mfrac></mrow> :MATH]

Integrated CNA

   A genomic instability score termed “integrated CNA” was calculated by
   summation of the Circular Binary Segmentation algorithm‐inferred
   absolute mean copy number of segments multiplied by the length of each
   segment.
   [MATH: <mrow><mtable><mtr><mtd
   columnalign="right"><msub><mtext>Int.CNA</mtext><mi
   mathvariant="normal">sample</mi></msub></mtd><mtd
   columnalign="left"><mo>=</mo></mtd></mtr><mtr><mtd></mtd><mtd
   columnalign="left"><mfrac><mrow><msub><mo>∑</mo><mi
   mathvariant="normal">segments</mi></msub><mrow><mo
   stretchy="false">|</mo><mtext>segment
   end</mtext><mo>−</mo><mtext>segment</mtext><mspace
   width="0.166667em"></mspace><mspace
   width="0.166667em"></mspace><mtext>start</mtext><mo
   stretchy="false">|</mo><mo>×</mo><mo
   stretchy="false">|</mo><mtext>segment</mtext><mspace
   width="0.166667em"></mspace><mspace
   width="0.166667em"></mspace><mtext>mean</mtext><mo
   stretchy="false">|</mo></mrow></mrow><mrow><mo>#</mo><mrow><mtext>base
   pairs in
   sample</mtext></mrow></mrow></mfrac></mtd></mtr></mtable></mrow> :MATH]

Box and whisker plots

   In box and whisker plots, the box represents the median, as well as the
   first and third quartiles, and the whisker indicates the extreme values
   within 1.5 times the inter‐quartile range. In cases where the number of
   samples permitted, individual values are superimposed as jitter plots.

Statistical tests

   Indicated P‐values were calculated using (i) Student's t‐test for
   normally distributed data (with normality confirmed by the P‐value of
   the Shapiro–Wilk test being > 0.05 for both datasets under comparison);
   (ii) Mann–Whitney U‐test for non‐normally distributed data; (iii)
   hypergeometric distribution P‐values; and (iv) permutation‐based
   approaches, as described in the figure legends. For data with a single
   data point in one comparison group, the z‐score was used.

Propidium iodide staining

   Mouse embryonic fibroblasts cells were washed in cold PBS and then
   fixed in ice‐cold 70% ethanol. Fixed cells were washed in PBS and then
   incubated for 15 min in PBS with 20 μg/ml propidium iodide and 0.1%
   (v/v) Triton X‐100. Data were acquired using a FACSCalibur (Becton
   Dickinson) analytic flow cytometer in the UCLA Jonsson Comprehensive
   Cancer Center and Center for AIDS Research Flow Cytometry Core
   Facility. Cells were gated using forward scatter and side scatter to
   remove debris and dead cells, and 10,000 cell events were recorded.

Spectral karyotyping

   Exponentially growing MEF cells were exposed to colcemid (0.04 μg/ml)
   for 1 h at 37°C and to hypotonic treatment (0.075 M KCl) for 20 min at
   room temperature. Cells were fixed in a mixture of methanol and acetic
   acid (3:1 by volume) for 15 min, and then washed three times in the
   fixative. Slides were prepared by dropping the cell suspension onto wet
   slides followed by air‐drying. Slides were processed for spectral
   karyotyping (SKY) according to the manufacturer's protocol with slight
   modifications using mouse paint probes (ASI, Vista, CA). Images were
   captured using Nikon 80i microscope equipped with spectral karyotyping
   software from ASI, Vista, CA; 12–18 metaphases were karyotyped from
   each cell line.

Immunoblot analysis

   Cells were lysed in modified RIPA buffer (50 mM Tris–HCl (pH 7.5), 150
   NaCl, 10 mM β‐glycerophosphate, 1% NP‐40, 0.25% sodium deoxycholate,
   10 mM sodium pyrophosphate, 30 mM sodium fluoride, 1 mM EDTA, 1 mM
   vanadate, 20 μg/ml aprotinin, 20 μg/ml leupeptin, and 1 mM
   phenylmethylsulfonyl fluoride). Whole‐cell lysates were resolved by
   SDS–PAGE on 4–15% gradient gels and blotted onto nitrocellulose
   membranes (Bio‐Rad). Membranes were blocked overnight and then
   incubated sequentially with primary and either HRP‐conjugated (Pierce)
   or IRDye‐conjugated secondary antibodies (Li‐Cor). Blots were imaged
   using the Odyssey Infrared Imaging System (Li‐Cor). Protein levels were
   quantitated using ImageJ ([429]http://imagej.nih.gov/ij/). Primary
   antibodies used for Western blot analysis included hexokinase 1 (2024,
   Cell Signaling Technology), hexokinase 2 (2867, Cell Signaling
   Technology), p53 (NB200‐103, Novus Biologicals), and enolase 2 (8171,
   Cell Signaling Technology).

Glucose consumption and lactate secretion measurements

BRCA cell lines

   Lactate secretion rates of breast cancer cell lines were measured from
   the culture media using a colorimetric assay kit (BioVision) (Hong
   et al, [430]2016).

MEFs

   Glucose consumption and lactate secretion rates of MEFs were measured
   using a BioProfile Basic bioanalyzer (NOVA Biomedical). Data were
   normalized to the integrated cell number, which was calculated based on
   cell counts at the start and end of the time course and an exponential
   growth equation. Because the proliferation rates of MEF sublines vary,
   each cell line was seeded at the appropriate density so as to give an
   integrated cell number of approximately 6.5 × 10^5 cells in a 6‐well
   plate. All samples were run as biological triplicates, and consistent
   results were seen in multiple independent experiments.

Mass spectrometry‐based metabolomic analyses

Sample preparation

   Mouse embryonic fibroblasts sublines were seeded onto 6‐well plates,
   and after 24 h, media was replaced with media containing 4.5 g/l
   [1,2‐^13C]‐labeled glucose. Sample collection occurred after 24 h of
   culture in the labeled glucose media. For intracellular metabolite
   analysis, cells were washed with ice‐cold 150 mM ammonium acetate
   (NH[4]AcO) pH 7.3 and metabolites extracted in 1 ml ice‐cold 80% MeOH.
   The cells were quickly transferred into a microfuge tube, and 10 nmol
   norvaline was added to the cell suspension for use as an internal
   standard. The suspension was subsequently vortexed three times over
   15 min and then spun down at 4°C for 5 min. The supernatant was
   transferred into a glass vial, the cell pellet was re‐extracted with
   200 μl ice‐cold 80% MeOH and spun down and the supernatants were
   combined. Metabolites were dried at 30°C under vacuum and re‐suspended
   in 50 μl of 70% acetonitrile (ACN). For cell culture media metabolite
   analysis (footprint profiling), 20 μl of cell‐free media samples was
   collected. Metabolites were extracted by adding 300 μl ice‐cold 80%
   methanol, followed by vortexing three times over 15 min, and
   centrifugation for 10 min at 13,000 rpm at 4°C. The supernatant was
   transferred to a fresh tube, dried using a vacuum evaporator, and
   re‐suspended in 50 μl of 70% acetonitrile (ACN); 5 μl was used for mass
   spectrometry‐based analysis.

Mass spectrometry runs

   Samples were run on a Q‐Exactive mass spectrometer coupled to an
   UltiMate 3000RSLC UHPLC system (Thermo Scientific). The mass
   spectrometer was run in polarity switching mode (+3.00 kV/−2.25 kV)
   with an m/z window ranging from 65 to 975. Mobile phase A was 5 mM
   NH[4]AcO, pH 9.9, and mobile phase B was ACN. Metabolites were
   separated on a Luna 3 μm NH2 100 Å (150 × 2.0 mm) (Phenomenex) column.
   The flow was kept at 200 μl/min, and the gradient was from 15% A to 95%
   A in 18 min, followed by an isocratic step for 9 min and
   re‐equilibration for 7 min.

Data analysis

   Metabolites were detected and quantified as area under the curve (AUC)
   based on retention time and accurate mass (≤ 3 ppm) using the
   TraceFinder 3.1 (Thermo Scientific) software. Relative amounts of
   metabolites between various conditions, percentage of metabolite
   isotopomers (relative to all isotopomers of that metabolite), and
   percentage of labeled metabolite molecules (isotopomer M1 and greater,
   relative to all isotopomers) were calculated and corrected for
   naturally occurring ^13C abundance (Yuan et al, [431]2008).
   Footprinting data were normalized to the integrated cell number as
   described above, and intracellular metabolite concentrations were
   normalized to the number of cells present at the time of extraction.
   All samples were run as biological triplicates, and consistent results
   were seen in independent experiments. Our analysis focused on
   metabolite level, percent isotopomer, and percent labeled metabolite
   measurements with ANOVA P‐values across the sample panel of < 0.05 in
   individual experiments, and Pearson correlation coefficients across all
   samples and between independent experiments of > 0.5.

Patient tumor samples and quantitative FDG‐PET imaging

   Breast cancer patient samples with imaged FDG uptake within 4 weeks
   prior to surgery, excluding patients with secondary breast cancers and
   recurrent disease, were collected surgically and processed as
   previously described (Palaskas et al, [432]2011). Of eighteen tumors
   collected in the original study for RNA microarrays (Palaskas et al,
   [433]2011), ten samples had sufficient remaining frozen tissue for
   array CGH profiling. None of the patients received systemic therapy or
   radiation prior to imaging. ^18FDG tumor uptake was quantified as
   standardized uptake values (SUVs) and showed the expected wide dynamic
   range (3.8–18.5). There was no significant difference in patient age,
   tumor volume, and lymph node involvement between the groups of FDG‐high
   and FDG‐low breast cancers. Breast cancers with high ^18FDG‐PET SUVs
   frequently lacked expression of the estrogen receptor (ER) and the
   progesterone receptor (PR), but hormone receptor‐negative tumors were
   also represented among the tumors with the lowest FDG uptake (Palaskas
   et al, [434]2011). We excluded lobular breast carcinomas, because they
   have been shown to take up less FDG than ductal carcinomas (Avril
   et al, [435]2001) We excluded large breast carcinomas (> 5 cm) and
   breast carcinomas with multifocal FDG uptake because our protocol did
   not include tissue autoradiography to direct the molecular tissue
   analysis to areas of distinct radiotracer retention. This study was
   approved by the Institutional Review Board (IRB) of Memorial
   Sloan‐Kettering Cancer Center, and all participating patients signed
   the informed consent.

Data availability

CNA dataset

   Copy number profiling data for wild‐type and genetically modified MEF
   samples and FDG‐PET‐imaged human breast tumors are available through
   Gene Expression Omnibus (GEO) accession [436]GSE63306
   ([437]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63306).

CNA conservation web resource

   An interactive website for user‐defined pan‐cancer and cross‐species
   CNA conservation analysis to perform analysis analogous to that in
   Figs [438]1C, [439]2D, and [440]4E using any combination of tens of
   available CNA signatures from human tumors and mouse models (and
   additional signatures as they become available) and/or the inclusion of
   uploaded CNA signatures
   ([441]http://systems.crump.ucla.edu/cna_conservation/). Signatures and
   genome reference files used in the interactive website are additionally
   available through the Biostudies repository, accession number S‐BSST7.

Metabolomics dataset

   Provided in [442]Table EV5.

TCGA CNA dataset

   The Cancer Genome Atlas (TCGA) CNA profiles were downloaded from the
   TCGA portal in September 2012 ([443]https://cancergenome.nih.gov/).
   Copy number profiles obtained were pre‐processed level 3 data based on
   human genome 19, with copy number variations (CNVs) removed. TCGA tumor
   type abbreviations and number of samples analyzed: bladder urothelial
   carcinoma (BLCA, 97 samples), brain lower grade glioma (LGG, 181),
   breast invasive carcinoma (BRCA, 873), colon adenocarcinoma (COAD,
   447), glioblastoma multiforme (GBM, 593), head and neck squamous cell
   carcinoma (HNSC, 308), kidney renal clear cell carcinoma (KIRC, 539),
   lung adenocarcinoma (LUAD, 368), lung squamous cell carcinoma (LUSC,
   359), ovarian serous cystadenocarcinoma (OV, 584), prostate
   adenocarcinoma (PRAD, 171), rectum adenocarcinoma (READ, 168), skin
   cutaneous melanoma (SKCM, 256), stomach adenocarcinoma (STAD, 162),
   thyroid carcinoma (THCA, 333), and uterine corpus endometrial carcinoma
   (UCEC, 492).

TCGA mRNA expression dataset

   The Cancer Genome Atlas mRNA expression data were downloaded from the
   TCGA portal in May 2014. Gene‐based mRNA expression levels were
   pre‐processed, normalized Level 3 RNA Seq V2 RSEM values. TCGA tumor
   type abbreviations and number of samples analyzed: BRCA (865), LUAD
   (357), LUSC (358), OV (263).

Mouse tumor model CNA datasets

   The genetically engineered mouse models with characterized CNA were
   obtained from public datasets: mammary (breast) tumors (Brca, 57
   samples, [444]GSE30710; 62 samples, [445]GSE43997; 44 samples,
   [446]GSE27101) (Drost et al, [447]2011; Herschkowitz et al, [448]2012);
   melanoma (Skcm, 30 samples, [449]GSE58265) (Viros et al, [450]2014);
   glioblastoma/high‐grade astrocytoma (Gbm, 72 samples, [451]GSE22927)
   (Chow et al, [452]2011); and prostate tumors (Prad, 18 samples
   [453]GSE35247; 55 samples, [454]GSE61382) (Ding et al, [455]2012;
   Wanjala et al, [456]2015). Additionally, in vitro epithelial murine
   cell lines modeling human carcinomas were obtained from public
   datasets: transformed colon cells (Coad, seven samples, [457]GSE70790)
   and transformed bladder and kidney cells (Blca and Kirc, 6 and 7
   samples, [458]GSE45128; Padilla‐Nash et al, [459]2013). Abbreviated
   mouse tumor names match to the corresponding tissue‐based human tumor
   abbreviations from the TCGA datasets. The data were obtained from GEO
   and segmented using the algorithm described above. Datasets for which
   no mm9 genome annotation was available on the repository were lifted
   over to mm9 using UCSC web tools (Rosenbloom et al, [460]2015).

Author contributions

   NAG, AM, AL, AC, DB, HRC, IKM, and TGG designed the study. NAG, AM, AL,
   AC, MF, SC, SZ, AD, LB, JtH, DB, and HRC performed experiments. NAG,
   AM, AL, AC, NGB, JG, NP, NS, DB, and TGG performed bioinformatic
   analysis. NAG, AM, AL, CH, CN, RQ, HW, MM, HRC, and TGG provided or
   derived reagents. NAG, AM, AL, DB, HRC, and TGG performed and analyzed
   metabolomics. AM and ASM performed spectral karyotyping. EP, SML, and
   IKM generated patient data. NAG, AM, DB, HRC, IKM, and TGG wrote the
   manuscript. The laboratories of HRC and IKM contributed equally.
   Experiments were performed at UCLA.

Conflict of interest

   The authors declare that they have no conflict of interest.

Supporting information

   Appendix
   [461]Click here for additional data file.^ (3.4MB, pdf)

   Expanded View Figures PDF
   [462]Click here for additional data file.^ (3.2MB, pdf)

   Table EV1
   [463]Click here for additional data file.^ (18.4KB, xlsx)

   Table EV2
   [464]Click here for additional data file.^ (145.7KB, xlsx)

   Table EV3
   [465]Click here for additional data file.^ (19.1KB, xlsx)

   Table EV4
   [466]Click here for additional data file.^ (29.9KB, xlsx)

   Table EV5
   [467]Click here for additional data file.^ (692.1KB, xlsx)

   Review Process File
   [468]Click here for additional data file.^ (1.1MB, pdf)

Acknowledgements