Abstract

   Intra-tumor heterogeneity compromises the clinical value of
   transcriptomic classifications of colorectal cancer. We investigated
   the prognostic effect of transcriptomic heterogeneity and the potential
   for classifications less vulnerable to heterogeneity in a
   single-hospital series of 1093 tumor samples from 692 patients,
   including multiregional samples from 98 primary tumors and 35
   primary-metastasis sets. We show that intra-tumor heterogeneity of the
   consensus molecular subtypes (CMS) is frequent and has poor-prognostic
   associations independently of tumor microenvironment markers.
   Multiregional transcriptomics uncover cancer cell-intrinsic and
   low-heterogeneity signals that recapitulate the intrinsic CMSs proposed
   by single-cell sequencing. Further subclassification identifies
   congruent CMSs that explain a larger proportion of variation in patient
   survival than intra-tumor heterogeneity. Plasticity is indicated by
   discordant intrinsic phenotypes of matched primary and metastatic
   tumors. We conclude that multiregional sampling reconciles the
   prognostic power of tumor classifications from single-cell and bulk
   transcriptomics in the context of intra-tumor heterogeneity, and
   phenotypic plasticity challenges the reconciliation of primary and
   metastatic subtypes.

   Subject terms: Colorectal cancer, Tumour heterogeneity,
   Transcriptomics, Tumour biomarkers
     __________________________________________________________________

   Intratumoral heterogeneity has been documented in multiple cancer
   types, and can be linked to treatment resistance. Here, the authors
   analyse multiregional samples from colorectal cancers and show gene
   expression subtypes which are less vulnerable to heterogeneity and may
   partly contribute to differential patient survival.

Introduction

   Tumor heterogeneity is a main cause of cancer progression and treatment
   failure^[48]1,[49]2. Most solid tumors consist of multiple subclones
   with different genomic profiles, metastatic potentials and responses to
   treatment. Colorectal cancers (CRCs) commonly have polyclonal invasion,
   and genomic heterogeneity of the primary tumor is associated with
   frequent metastasis and a poor patient survival^[50]3,[51]4. However,
   not all tumor subclones have an impact on cancer evolution^[52]5.
   Clonal selection operates on cellular phenotypes, not genotypes, and
   heterogeneity appears to be decoupled at the genomic and transcriptomic
   levels in CRC^[53]6. In fact, it has been proposed that most genomic
   intra-tumor variation of CRCs has no major phenotypic
   consequences^[54]6. This emphasizes the importance of phenotypic
   plasticity^[55]7, but the clinical relevance of intra-tumor
   heterogeneity is less well studied on the transcriptomic level in CRC.

   CRC transcriptomes represent a collection of the four consensus
   molecular subtypes (CMS)^[56]8. This classification reflects tumor
   phenotypes and morphologies, and is associated with patient survival
   and drug sensitivities^[57]9–[58]11. It is generally accepted that the
   CMS framework provides a useful starting point for further
   transcriptomic investigations of primary CRCs. However, the
   classification was developed from single bulk tissue samples of
   individual tumors and is vulnerable to intra-tumor heterogeneity,
   possibly to the point where tumors contain a mixture of all CMS classes
   at different proportions^[59]10,[60]12. This compromises the biomarker
   value and the predictive power of CMS for clinical endpoints^[61]13.
   Single-cell RNA sequencing has illustrated that the diverse cell types
   of the tumor microenvironment contribute strongly to bulk tumor
   transcriptomes, as well as to intra-tumor heterogeneity and the
   definition of tumor subtypes^[62]14–[63]16. Indeed, both the
   classification accuracy and the prognostic value of CMS are confounded
   by the tumor microenvironment^[64]12,[65]13,[66]17. Cancer
   cell-intrinsic expression signals are also shaped by the
   microenvironment^[67]15, but might be less vulnerable to tumor
   heterogeneity^[68]18. Additional classification frameworks such as the
   CRC intrinsic subtypes (CRIS) and the two intrinsic CMS (iCMS) classes
   adhere to this rationale^[69]19,[70]20, but the potentially added
   clinical value of a cancer cell-intrinsic approach has yet to be
   defined. In this context, phenotypic plasticity during metastasis and
   metastatic heterogeneity of the classifications are likely to be
   relevant, as demonstrated with the original CMS^[71]21.

   Single-cell transcriptomics is a powerful technology for mapping of
   tumor heterogeneity. However, the high costs and technical and
   biological variation associated with single-cell analyses are
   challenges that currently limit the application to larger tumor series
   and the integration of datasets^[72]22. We hypothesized that bulk
   transcriptomics of multiple distinct regions of each tumor is a
   complementary approach in this setting, and that multiregional sampling
   can balance the needs to capture intra-tumor and inter-tumor variation.
   This has previously been used to illustrate intra-tumor heterogeneity
   and sampling bias in CRC, although in a limited number of tumors (up to
   25)^[73]6,[74]23,[75]24. Here, we analyze multiregional and single
   samples of primary tumors and liver metastases (n = 1093 samples from
   692 patients) and show that intra-tumor heterogeneity of CMS is
   associated with poor survival in patients with locoregional CRC. We
   further show potential for transcriptomic classifications less
   vulnerable to tumor heterogeneity, based on cancer cell-intrinsic
   signals with uniform expression across tumor regions. This approach
   recapitulates the iCMS from single-cell sequencing, and enables further
   substratification into a congruent CMS framework with prognostic value
   in the context of intra-tumor heterogeneity. We also show plasticity of
   intrinsic subtypes between patient-matched primary tumors and liver
   metastases, and conclude that classifications of primary and metastatic
   CRCs are challenging to reconcile.

Results

Transcriptomic intra-tumor heterogeneity among multiregional samples

   To get an initial overview of transcriptomic intra-tumor heterogeneity
   in CRC, we compared CMS classifications among 2–4 multiregional samples
   from each of 98 primary tumors (n = 286 samples; Fig. [76]1a,
   Supplementary Data [77]1 and Supplementary Fig. [78]1). Intra-tumor CMS
   heterogeneity was found in 40% (seven tumors were undetermined due to
   unclassified samples) and reflected general transcriptomic
   heterogeneity, estimated as the maximum Euclidean distance of principal
   components (PC) 1–3 between any pair of samples per tumor
   (Fig. [79]1b). The level of heterogeneity increased with the number of
   samples per tumor (p = 2 × 10^−4 by Kruskal–Wallis test; Supplementary
   Fig. [80]2b). Both CMS3 and CMS4 were enriched with heterogeneous
   tumors (Fig. [81]1c). CMS4 was most heterogeneous and mixed with other
   subtypes in 84% (n = 26 of 31 tumors with at least one CMS4 sample).
   The most common CMS combinations were CMS2/4 (n = 19, 49% of
   heterogeneous tumors) and CMS1/3 (n = 8, 21%), with CMS4 and CMS3 as
   the minor components, respectively. Combinations of CMS3/4 (n = 3, 8%)
   and CMS1/2 (n = 2, 5%) were rare. Histological cryosections of
   multiregional samples from three selected tumors showed morphological
   differences according to CMS heterogeneity (Supplementary
   Fig. [82]2c–e).

Fig. 1. Landscape of CMS heterogeneity among multiregional samples of primary
CRCs.

   [83]Fig. 1
   [84]Open in a new tab

   a CMS classification of 2–4 multiregional samples from each of 98
   primary tumors ordered according to intra-tumor classification
   heterogeneity. Each column represents one tumor. Spatially separated
   samples are annotated t1–t4 and colored according to the consensus
   molecular subtypes (CMS). MSI, RAS/BRAF and Location indicate the
   microsatellite instability status, mutation status (for BRAF^V600E,
   KRAS and NRAS) and location of each tumor in the large bowel. b Box
   plot of the general transcriptomic heterogeneity of each tumor,
   estimated as the maximum Euclidean distance of PC1–PC3 between any
   pairs of samples per tumor and plotted according to CMS heterogeneity.
   The center line of boxes represents the median, boxes represent the
   interquartile range, and whiskers represent 1.5× the interquartile
   range above the 75th percentile (maxima) or below the 25th percentile
   (minima). p value is two-sided and from Welch’s t-test. Source data are
   provided as a Source Data file. c Frequency of intra-tumor
   heterogeneity of each CMS class. The odds ratio and 95% confidence
   interval for enrichment of heterogeneous tumors are showed above each
   class. Source data are provided as a Source Data file. d Bar plot of
   gene sets sorted according to p values from gene set enrichment
   analysis of tumors with heterogeneous versus homogeneous CMS
   classifications (one random sample per tumor). Enrichment in
   heterogeneous or homogeneous tumors is indicated by the color code.
   Source data are provided as a Source Data file.

   Microsatellite instability (MSI) status and KRAS/NRAS/BRAF^V600E
   mutation status were concordant among all multiregional samples from
   each tumor with CMS heterogeneity (Fig. [85]1a). MSI and BRAF^V600E
   mutations were strongly enriched among tumors with a major CMS1
   component (MSI: 79%, odds ratio [OR] 55.6, 95% confidence interval [CI]
   13.6–303.0, p = 9 × 10^−13; BRAF^V600E: 82%, OR 69.0, 95% CI
   16.2–394.4, p = 9 × 10^−14), and not similarly frequent in CMS1-minor
   tumors (25% of the four tumors with <50% CMS1 samples). KRAS/NRAS
   mutations were most frequent in CMS3 tumors (major or minor) without a
   CMS1 component (88% of the 16 tumors, OR 17.3, 95% CI 1.7–308.7,
   p = 0.005).

Transcriptomic heterogeneity primarily driven by stromal infiltration

   Gene set enrichment analysis of a custom collection of gene sets
   relevant for CRC (n = 54) showed strong enrichment with
   mesenchymal-like and stromal features in tumors with heterogeneous
   compared to homogeneous CMS classifications (Fig. [86]1d and
   Supplementary Data [87]2). Results were similar in subgroup analyses of
   each of the CMS1-3 classes separately (Supplementary Fig. [88]3a and
   Supplementary Data [89]3). Similar results were also found by
   enrichment testing of differentially expressed genes between
   homogeneous and heterogeneous tumors among biological processes in the
   Gene Ontology database (Supplementary Fig. [90]4 and Supplementary
   Data [91]4). Sample-wise estimates of the abundance of
   cancer-associated fibroblasts were higher in heterogeneous tumors, but
   there was no difference in the abundance of cytotoxic lymphocytes
   (Supplementary Fig. [92]5 and Supplementary Data [93]5). This
   highlighted the tumor stroma as a key component of intra-tumor
   transcriptomic heterogeneity, consistent with the frequent
   heterogeneity of CMS4.

   In contrast, homogeneous tumors had strongest enrichment with
   signatures of cell cycle progression and regulation, as well as with
   MYC targets (Fig. [94]1d and Supplementary Fig. [95]3a). This was
   consistent with the large proportion of homogeneous tumors classified
   as CMS2 (48%). Notably, tumors homogeneous for CMS1 or CMS3 had no
   significant enrichments compared to heterogeneous tumors of the
   corresponding subtype, although signals were strongest for immune and
   metabolic processes, respectively (Supplementary Data [96]3). The
   unexpected subset of tumors homogeneous for CMS4 (n = 5 microsatellite
   stable [MSS] tumors not exposed to treatment prior to sampling, two
   with BRAF^V600E or KRAS mutation) were enriched with signatures of
   extracellular matrix organization, the top of colonic crypts and
   inflammatory response (Supplementary Fig. [97]3a). The signature of MYC
   targets was depleted in homogeneous versus heterogeneous CMS4 tumors.
   This was due to high MYC target scores in a subset of tumors with a
   major CMS2 component, and indicated CMS2 admixture also in the samples
   classified as CMS4 (Supplementary Fig. [98]3b). Signatures of
   mesenchymal-like traits and stromal infiltration were high in CMS4
   samples from both heterogeneous and homogenous tumors.

Independent prognostic impact of intra-tumor CMS heterogeneity

   Intra-tumor CMS heterogeneity was not associated with any
   clinicopathological parameters (Supplementary Data [99]5) or 5-year
   relapse-free survival (RFS) in the multiregional sample set (p = 0.7
   from Cox proportional hazards analysis; p = 0.6 from corresponding
   analysis of general transcriptomic heterogeneity as a continuous
   variable). To extend the analyses to a larger patient series, we
   performed computational modeling of intra-tumor CMS heterogeneity in
   single, bulk tissue samples from another 418 primary CRCs
   (Supplementary Data [100]1). The approach is illustrated in
   Supplementary Fig. [101]6 and was based on enrichment scores of
   template gene sets of each CMS class in each sample. The template gene
   sets were identified from differential gene expression analysis of each
   CMS class versus the rest, and the enrichment scores were estimated
   with the R package singscore^[102]25 (further details in “Methods”).
   Tumors with significant enrichments (p < 0.05) for more than one CMS
   class were considered heterogeneous, and the CMS class with the
   strongest enrichment was considered the major subtype. The major
   subtype of each tumor was largely concordant with assignments from the
   original random forest CMSclassifier^[103]8, with an overall accuracy
   of 83% (Cohen’s κ = 0.72 and 0.75 for tumors analyzed on Human
   Transcriptome 2.0 and Human Exon 1.0 ST arrays, respectively;
   Supplementary Fig. [104]7). The majority (88%) of discordances were due
   to more frequent CMS2 classifications with the enrichment analyses. CMS
   heterogeneity was identified in 30% of the tumors. This was less
   frequent than the heterogeneity observed among multiregional samples
   (OR 0.58, 95% CI 0.36–0.96, p = 0.03), and can likely be attributed to
   a combination of limited analytical discriminatory power (the accuracy
   for calling CMS heterogeneity in tumors with multiregional samples was
   72%; Supplementary Fig. [105]8a) and the indication that heterogeneity
   increased with the number of samples analyzed per tumor (Supplementary
   Fig. [106]2b). The distribution of the most common CMS combinations was
   similar between the multiregional and single-sample tumor series, apart
   from a more frequent combination of CMS3 with CMS1 in favor of CMS2
   among multiregional samples (Supplementary Fig. [107]8b), possibly
   related to the enrichment with MSI tumors in this series (Supplementary
   Data [108]1).

   Analysis of the combined tumor series confirmed that intra-tumor CMS
   heterogeneity was associated with a high abundance of cancer-associated
   fibroblasts, but not with any clinicopathological parameter, except for
   frequent CMS heterogeneity among male patients (Supplementary
   Data [109]5). Survival analysis of patients treated by complete
   resection of stage I–III CRC and with determined CMS heterogeneity
   status (n = 387) showed a lower 5-year RFS rate with heterogeneous
   (62.3%, 95% CI 54.2–71.5%) compared to homogeneous tumors (75.8%, 95%
   CI 70.7–81.2%; Fig. [110]2a). Results were similar when excluding
   patients with stage I tumors from the analysis (Supplementary
   Fig. [111]9a). CMS heterogeneity retained prognostic value in a
   multivariable Cox proportional hazards model of all clinicopathological
   and molecular parameters, and was the only molecular marker with a
   significant prognostic association (hazard ratio, HR 1.5, 95% CI
   1.0–2.2, p = 0.05; Supplementary Data [112]6). Notably, CMS
   heterogeneity explained a larger proportion of variation in 5-year RFS
   (11%) than cancer-associated fibroblasts (5%; Fig. [113]2b and
   Supplementary Fig. [114]9b).

Fig. 2. Prognostic value of intra-tumor CMS heterogeneity.

   [115]Fig. 2
   [116]Open in a new tab

   a Relapse-free survival according to intra-tumor heterogeneity of the
   consensus molecular subtypes (CMS) among patients treated by complete
   resection of stage I–III colorectal cancer. Patients with synchronous
   tumors, pre-surgical radiation treatment and undetermined CMS
   heterogeneity were excluded from all analyses. b Explained variation in
   5-year relapse-free survival by each variable in a multivariable Cox
   proportional hazards model, estimated as the percentage of the full
   model. c Survival according to intra-tumor CMS heterogeneity and
   stratified by CMS4 classification. Hazard ratios (HR) and 95%
   confidence intervals (CI) are from Cox proportional hazards analyses
   and p values are two-sided and from Wald tests. CAF cancer-associated
   fibroblasts, CTL cytotoxic lymphocytes, het heterogeneous, hom
   homogeneous, TN stage tumor-node stage.

   A stratified analysis of CMS heterogeneity according to the
   poor-prognostic CMS4 class (CMS4 versus CMS1-3) indicated that
   heterogeneous tumors with a CMS4 component (major or minor) were
   associated with the worst prognosis (Fig. [117]2c and Supplementary
   Fig. [118]9a). Heterogeneous tumors without CMS4 (different
   combinations of CMS1-3) had non-significant associations to worse
   survival relative to homogenous tumors.

Uniform intra-tumor activity of MSI-related and oncogenic processes

   The five CRIS classes derived from cancer cell-intrinsic expression
   signals^[119]19 showed a similar frequency of intra-tumor heterogeneity
   among multiregional samples as CMS (43%, p = 0.5 from Fisher’s exact
   test compared to CMS; Supplementary Data [120]7), although there was no
   significant overlap of tumors with heterogeneity according to CMS and
   CRIS (OR 2.0, 95% CI 0.7–5.4, p = 0.2; Supplementary Fig. [121]10).
   This indicated heterogeneity also within the epithelial cell
   compartment of CRCs and/or a stromal influence on the CRIS
   classification. To further investigate the basis for transcriptomic
   heterogeneity, we categorized protein-coding genes into three groups
   according to an intra-tumor heterogeneity score (ITH-score)
   representing intra-tumor relative to inter-tumor expression variation
   in the multiregional sample set (Fig. [122]3a, Supplementary
   Figs. [123]11 and [124]12, and Supplementary Data [125]8 and [126]9).
   The distribution of the ITH-scores was asymmetrical, with a heavy
   right-sided tail indicating a small subset of genes with high
   intra-tumor heterogeneity (ITH-high: 5% of genes). PC1 of tumor samples
   from principal components analysis (PCA) based on these ITH-high genes
   was most strongly correlated to single-sample enrichment scores of gene
   sets related to stromal and mesenchymal-like features (Fig. [127]3b).
   Similar gene set results were observed for PC1 of ITH-intermediate
   genes (48% of genes; Supplementary Fig. [128]13), supporting that the
   majority of gene expression variation can be attributed to the stromal
   tumor component. In contrast, ITH-low genes (48%) were in a similar
   analysis associated with cancer cell-intrinsic features. PC1 of tumors
   based on ITH-low genes was strongly correlated to MSS/MSI-like
   signatures only, while PC2 was correlated to signatures of the cell
   cycle and proliferation (Fig. [129]3b). Notably, ITH-low genes showed
   less frequent gene set correlations with PC1 than PC2, while the
   opposite was observed for ITH-high and ITH-intermediate genes (OR 0.5,
   95% CI 0.2–0.9, p = 0.02 comparing ITH-low and ITH-high genes;
   Fig. [130]3c). This suggested that genes with uniform expression across
   tumor regions (ITH-low) provided a more subtle tumor characterization
   based on the intrinsic features of cancer cells, compared to the
   dominating contribution from ITH-high genes and the tumor stroma.

Fig. 3. Gene categories and enrichments according to intra-tumor
heterogeneity.

   [131]Fig. 3
   [132]Open in a new tab

   a Upper part shows the proportion of genes (filtered to include only
   genes with inter-sample 10–90th percentile expression range >1)
   categorized as either ITH-low (n = 1540), ITH-intermediate (n = 1549)
   or ITH-high (n = 149) in the multiregional primary tumor set (n = 286
   samples). Lower part shows the density distribution of the
   corresponding ITH-scores, with dashed lines indicating thresholds for
   the three gene categories. The forest plot shows the ITH-score of genes
   designated as colorectal cancer-associated in the Cancer Gene Census
   (n = 13 genes passing the filter for the inter-sample expression
   range). The genes are sorted according to involvement in pathways
   over-represented among ITH-low genes (Wikipathways cancer;
   Supplementary Fig. [133]14). Source data are provided as a Source Data
   file. b Density plots (upper: all gene sets, n = 54) and bar plots
   (lower: 15 top-ranked gene sets) of Spearman’s correlation coefficients
   (absolute values) between PC1 or PC2 of ITH-high (right) or ITH-low
   (left) genes and single-sample enrichment scores of significantly
   correlated gene sets (p < 0.05). Source data are provided as a Source
   Data file. c Number of gene sets (of totally 54) with significant
   Spearman’s correlations (p < 0.05) to PC1 and PC2 of ITH-high and
   ITH-low genes. Source data are provided as a Source Data file. d
   Distribution of ITH-scores among genes in selected signatures. The
   signatures were grouped into six categories, and up to three of the
   top-ranked gene sets according to the correlation analyses in part b
   are plotted per category, in addition to the WNT/β-catenin signature.
   Source data are provided as a Source Data file. CIN chromosomal
   instability, EMT epithelial-mesenchymal transition, ITH intra-tumor
   heterogeneity, MSI microsatellite instability, MSS microsatellite
   stable, PC1 and PC2 principal components 1 and 2, ρ Spearman’s
   correlation coefficient.

   The distribution of ITH-scores among scorable genes in each signature
   supported the results from correlation analyses, showing low ITH-scores
   of most MSI/MSS and cell cycle-related genes relative to
   epithelial-mesenchymal transition genes (Fig. [134]3d). Notably, genes
   involved in hedgehog signaling showed a near bimodal distribution of
   ITH-scores, and this likely accounted for the correlation of this
   signature with both PC1 of ITH-high genes and PC2 of ITH-low genes.
   Genes of WNT/β-catenin signaling and several stem cell signatures were
   predominantly ITH-low, but a small subset of genes in the LGR5 and
   EPHB2 cancer stem cell signatures had high scores, which contributed to
   the correlation of these signatures with PC1 of ITH-high genes.

   Cancer-critical genes, defined by the Cancer Gene Census, were
   underrepresented in the ITH-high category (OR 0.3, 95% CI 0.03–1.0,
   p = 0.05; Supplementary Data [135]9). ITH-low cancer-critical genes
   were enriched in several pathways involved in CRC tumorigenesis, such
   as genomic instability (chromosomal and MSI), WNT signaling and the
   TP53 network (Fig. [136]3a and Supplementary Fig. [137]14). ITH-high or
   ITH-intermediate cancer-critical genes showed no significant
   enrichments in a similar overrepresentation test of the Wikipathway
   cancer collection, suggesting that malignancy processes are not prone
   to intra-tumor heterogeneity on the transcriptomic level.

   The ITH-scores were evaluated in a public single-cell RNA sequencing
   dataset of paired samples from the tumor core and border regions of six
   primary CRCs^[138]15. This confirmed that ITH-high genes had a higher
   expression variation among cells from paired samples than ITH-low genes
   (p < 1 × 10^−10 from Welch’s t-test; Supplementary Fig. [139]15).

Evolution of ITH-low subtypes in primary-metastasis comparisons

   ITH-low genes retained expression variation among tumors in the
   multiregional sample set, and had higher inter-tumor expression ranges
   (10–90th percentiles) than ITH-high (95% CI of the mean difference
   0.56–0.59) or ITH-intermediate genes (95% CI 0.29–0.33; p < 1 × 10^−15
   from Welch’s t-tests; Fig. [140]3a). To investigate the potential for
   transcriptomic classifications less prone to intra-tumor heterogeneity,
   we therefore performed subtype discovery by non-negative matrix
   factorization (NMF) of tumors based on the ITH-low genes. NMF across
   the full sample set (n = 704 samples from 516 primary tumors) at a
   predefined rank of k = 2 clusters resulted in subtypes (denoted k2)
   that were largely concordant with the two iCMS classes previously
   derived from single-cell RNA sequencing of the malignant epithelial
   compartment of CRCs^[141]20 (classification accuracy 90%, Cohen’s
   κ = 0.80; Fig. [142]4a). Subtype characteristics based on gene set
   enrichments were also highly similar between iCMS and k2, and both
   frameworks were primarily distinguished by MSI/MSS-like characteristics
   and immune signatures (Supplementary Fig. [143]16). Both iCMS and k2
   provided largely concordant intra-tumor classifications of
   multiregional primary tumor samples (82% and 99%, respectively;
   Fig. [144]4b). Collectively, this suggested that an average of three
   multiregional samples from each tumor could recapitulate the cancer
   cell-intrinsic subtypes from single-cell sequencing.

Fig. 4. Classification of primary tumors and liver metastases based on
ITH-low and cancer cell-intrinsic genes.

   [145]Fig. 4
   [146]Open in a new tab

   a Alluvial diagrams of classification concordances between iCMS and the
   k2 clusters identified based on ITH-low genes among primary colorectal
   tumors and liver metastases. The sample overlap between classes in iCMS
   and k2 is indicated relative to the total number per subtype. Source
   data are provided as a Source Data file. b Proportion of primary tumors
   with homogeneous and heterogeneous intra-tumor classifications of
   multiregional samples (n = 286 samples) according to the indicated
   frameworks (k2 and k4 are based on the ITH-low genes). Proportion of
   patients with inter-metastatic heterogeneity among liver lesions (2–7
   distinct lesions per patient, total n = 143 metastases) according to
   iCMS and the k2 clusters is plotted to the right. Source data are
   provided as a Source Data file. c iCMS classifications of matched
   primary tumors and liver metastases from 35 patients (n = 179 samples).
   Each column represents one patient. For primary tumors, each square
   represents one multiregional sample numbered with lower case t. For
   liver metastases, each square represents one tumor numbered with upper
   case T and separated by consecutive resections, and diagonal lines
   indicate multiregional samples. Indicated for each patient is the
   platform used for gene expression analysis, diagnosis with synchronous
   versus metachronous metastases and exposure to chemotherapy prior to
   sampling. cCMS congruent consensus molecular subtypes, CRIS colorectal
   cancer intrinsic subtypes, iCMS intrinsic consensus molecular subtypes,
   ITH intra-tumor heterogeneity, k2 and k4 factorization ranks 2 and 4
   from non-negative matrix factorization.

   CRC liver metastases (n = 304 tumor samples from 179 patients) also
   showed concordant classifications between iCMS and the ITH-low k2
   clusters (accuracy 83%, Cohen’s κ = 0.66; Fig. [147]4a). The subtype
   distributions were similar among primary tumors and metastases in both
   frameworks (iCMS: p = 0.8 and k2: p = 0.2 from Pearson’s chi-squared
   tests). Principal components analysis based on ITH-low genes or iCMS
   template genes showed no apparent distinctions according to tumor site
   (colorectum versus liver; Supplementary Fig. [148]17), supporting that
   also ITH-low genes primarily have cancer cell-intrinsic expression and
   that both classifications are directly applicable to metastatic tumors.
   However, the frequency of intra-patient subtype heterogeneity among
   metastatic lesions (n = 2–7 lesions from each of 47 patients) was
   higher than intra-tumor heterogeneity of the primary tumor (iCMS:
   Χ^2 = 6.7, p = 0.008 and k2: Χ^2 = 35.6, p = 3 × 10^−9, both with one
   degree of freedom; Fig. [149]4b). Furthermore, comparisons of
   patient-matched primary tumors and liver metastases (n = 179 samples
   from 35 patients) also showed evidence of phenotypic plasticity. Using
   iCMS for illustration, only 59% of evaluable patients had fully
   concordant classifications (17 of 29 patients, six were not evaluable
   due to unclassified samples; Fig. [150]4c). Subtype switching between
   all or a majority of primary-metastasis samples was observed in eight
   patients (28%). This occurred predominantly from iCMS2 primary tumors
   to iCMS3 liver metastases (six of eight patients, 75%). There was no
   significant association between subtype switching and use of different
   analysis platforms (RNA sequencing versus Human Transcriptome 2.0
   array: Χ^2 = 0.90, p = 0.6), the numbers of samples/tumors per patient
   (p = 0.5 from Wilcoxon’s test), diagnosis with synchronous versus
   metachronous metastases (p > 0.9 from Pearson’s chi-squared test), or
   exposure to chemotherapy prior to sampling (p = 0.9).

Prognostic value of congruent CMS classification based on ITH-low genes

   Subtype discovery based on ITH-low genes was tested with different sets
   of samples and ITH-score thresholds to evaluate a possible impact on
   classification results (Supplementary Figs. [151]18–[152]21; details in
   “Methods”). NMF at k = 4 or k = 5 were identified as the best sample
   clusterings, but k5 included two clusters with similar characteristics,
   and k4 was therefore used for further analyses of the complete primary
   tumor series. This ITH-low classification approach indicated potential
   for an intrinsic classification with a higher resolution than the
   two-state iCMS framework. The k4 clusters ranged in size from 10% to
   53% of samples and subdivided each of the k2 clusters, most prominently
   the cluster corresponding to iCMS3 (Fig. [153]5a, b; the iCMS framework
   was similarly split, Supplementary Fig. [154]22). The
   iCMS3-corresponding cluster was split into one cluster with strong
   immune signals and one with high expression of genes encoding
   extracellular matrix remodeling proteins (FN1 and SPP1^[155]26), while
   the largest and remaining cluster had high relative expression of genes
   involved in maintenance of the secretory intestinal stem cell niche
   (for example, REG4, TFF1, FCGBP and AGR2^[156]27–[157]30; analyzing
   ITH-low genes only; Supplementary Fig. [158]23).

Fig. 5. Prognostic value of the proposed congruent CMS framework.

   [159]Fig. 5
   [160]Open in a new tab

   a Proportion of the four cCMS classes among primary colorectal tumors
   (n = 704 samples from 516 tumors). Source data are provided as a Source
   Data file. b Alluvial diagrams of classification concordances between
   k2 and cCMS (left; representing k2 and k4 classifications based on
   ITH-low genes), as well as cCMS and CMS (right). The sample overlap
   between classes is indicated relative to the total number per subtype
   and illustrated by dashed lines. Source data are provided as a Source
   Data file. c Proportion of MSI tumors across the cCMS classes. Source
   data are provided as a Source Data file. d Relapse-free survival
   according to cCMS among patients treated by complete resection of stage
   I–III CRC (n = 398). Patients with heterogeneous intra-tumor cCMS
   classifications, synchronous tumors, or pre-surgical radiation
   treatment were excluded from analyses. Hazard ratios (HR) and 95%
   confidence intervals (CI) are from Cox proportional hazards analyses
   and p values are two-sided and from Wald tests. e Explained variation
   in 5-year relapse-free survival (n = 398 patients) by each variable in
   a multivariable Cox proportional hazards model, estimated as the
   percentage of the full model. CAF cancer-associated fibroblasts, cCMS
   congruent consensus molecular subtypes, CTL cytotoxic lymphocytes, iCMS
   intrinsic consensus molecular subtypes, ITH intra-tumor heterogeneity,
   k2 and k4 factorization ranks 2 and 4 from non-negative matrix
   factorization, MSI microsatellite instability, MSS microsatellite
   stable, TN stage tumor-node stage.

   The k4 clusters were not independent of the original CMS (Χ^2 = 589,
   nine degrees of freedom, p < 3 × 10^−16), and 67% of samples showed
   concordant classifications with CMS (classification accuracy 68%,
   Cohen’s κ = 0.52; Fig. [161]5b). The strongest discordance was found
   for the cluster corresponding to the original CMS3, and this cluster
   was split between CMS1 and CMS3. Samples with discordant
   classifications were located near the class boundaries in PCA
   (Supplementary Fig. [162]24). Gene set enrichment analyses further
   demonstrated that each of the four sample clusters defined by ITH-low
   genes had similar characteristics to the corresponding CMS class
   (Supplementary Fig. [163]25), and the k4 clusters were therefore termed
   congruent CMS (cCMS). The largest difference was enrichment with
   several signatures in cCMS1 and cCMS2 that in the original CMS
   classification were characteristic of CMS2 only, such as MYC targets
   and cell cycle signatures. This can likely be attributed to
   heterogeneity of the subtypes, tumors, or samples, and cCMS1 and cCMS2
   samples that were not of the corresponding original CMS class were more
   frequently from tumors with CMS heterogeneity (Supplementary
   Fig. [164]25). Intra-tumor classification concordances of multiregional
   samples were higher for cCMS (77% of tumors) than for the original CMS
   (53%, OR 2.4, 95% CI 1.3–4.8) and CRIS frameworks (45%, OR 3.1, 95% CI
   1.6–6.2; Fig. [165]4b), indicating stronger robustness to intra-tumor
   heterogeneity.

   Genomic markers (MSI and BRAF^V600E) and tumor microenvironment markers
   (cancer-associated fibroblasts and cytotoxic lymphocytes) showed
   similar subtype associations in the cCMS and original CMS frameworks,
   with the exception that KRAS mutations were not skewed among cCMS
   classes (Supplementary Data [166]10). Consistent with the strong
   enrichment for MSI-like characteristics among ITH-low genes, MSI status
   was strongly skewed according to cCMS (Χ^2 = 174, three degrees of
   freedom, p < 3 × 10^−16) and enriched in both cCMS1 and cCMS3 (OR 14.7,
   p < 3 × 10^−16 and OR 2.9, p = 4 × 10^−5, respectively; Fig. [167]5c).
   However, repeated subtype discovery of MSS tumors only (based on
   ITH-low genes; Supplementary Fig. [168]18) largely recapitulated the
   cCMS classification (accuracy 90%, Cohen’s κ = 0.82; Supplementary
   Fig. [169]26), indicating that the transcriptomic MSI-like features of
   the ITH-low genes extended beyond the genomic phenotype.

   Clinicopathological associations were also similar between the cCMS and
   original CMS frameworks, although patient age at diagnosis was skewed
   according to cCMS (older age with cCMS1 and younger with cCMS4: OR 4.5,
   p = 1 × 10^−4; Supplementary Data [170]10). Survival analyses of
   patients with concordant intra-tumor classifications (no subtyping
   heterogeneity among multiregional samples) showed that cCMS had strong
   associations to 5-year RFS in stage I–III CRC (n = 398 patients;
   Fig. [171]5d). Higher and lower RFS rates were observed with cCMS1 and
   cCMS4 tumors, respectively, relative to each of the other subtypes.
   These prognostic associations were consistent with cCMS1 consisting
   primarily of an immune-active subset of iCMS3 tumors, and cCMS4 of
   iCMS3 tumors (but also a proportion of iCMS2) with active extracellular
   matrix remodeling, which can promote immune suppression and
   metastasis^[172]15,[173]26,[174]31 (Supplementary Figs. [175]22 and
   [176]23). Results were similar with 5-year overall survival as the
   endpoint, and when excluding patients with stage I tumors
   (Supplementary Fig. [177]27). The cCMS framework retained prognostic
   value when added to the multivariable survival model shown in
   Fig. [178]2b (Table [179]1 and Supplementary Data [180]11), and
   explained a larger proportion of variation in 5-year RFS (21%) than CMS
   heterogeneity (9%) and any other molecular variable (Fig. [181]5e).
   Notably, the original CMS classes had no prognostic value in this
   subset of patients (Supplementary Fig. [182]28).

Table 1.

   Multivariable survival analysis of clinicopathological and molecular
   features in patients with stage I-III CRC
   Variable Patients^a Five-year relapse-free survival^b
   n (%) HR [95% CI] p-value cox.zph p-value
   Total 398 (100) 0.2
   Sex Female 204 (51) Reference 0.2
   Male 194 (49) 1.5 [1.0-2.2] 0.05
   Age (continuous) 398 (100) 1.0 [1.0-1.1] 0.0007 0.009
   Tumor location Right 176 (44) Reference 1.0
   Left 125 (31) 0.9 [0.6-1.5] 0.7
   Rectum 97 (24) 1.0 [0.6-1.7] 0.9
   TNM stage I 95 (24) Reference 0.4
   II 176 (44) 1.2 [0.7-2.2] 0.5
   III 127 (32) 2.1 [1.1-3.9] 0.02
   Adjuvant chemotherapy No 327 (82) Reference 0.4
   Yes 64 (16) 1.2 [0.6-2.2] 0.7
   Unknown 7 (2) 2.4 [0.7-8.1] 0.1
   MSI status MSS 320 (80) Reference 0.6
   MSI 78 (20) 0.8 [0.3-1.7] 0.5
   KRAS status Wild-type 258 (65) Reference 0.06
   Mutation 140 (35) 1.1 [0.7-1.8] 0.6
   BRAF^V600E status Wild-type 328 (82) Reference 0.4
   Mutation 70 (18) 2.6 [1.3-5.3] 0.01
   CTL-score (continuous) 398 (100) 1.2 [0.5-3.2] 0.7 0.6
   CAF-score (continuous) 398 (100) 1.1 [0.8-1.5] 0.7 1.0
   CMS heterogeneity Homogeneous 257 (65) Reference 0.5
   Heterogeneous 115 (29) 1.6 [1.1-2.4] 0.03
   Undetermined 26 (7) 2.2 [1.0-4.7] 0.05
   cCMS cCMS1 70 (18) Reference 0.5
   cCMS2 224 (55) 3.7 [1.6-8.7] 0.002
   cCMS3 76 (19) 2.3 [1.0-5.2] 0.04
   cCMS4 28 (7) 4.3 [1.6-11.3] 0.003
   [183]Open in a new tab

   CAF cancer-associated fibroblasts, CMS consensus molecular subtypes,
   cCMS congruent consensus molecular subtypes, CTL cytotoxic lymphocytes,
   MSI microsatellite instability, MSS microsatellite stable, TNM
   tumor-node-metastasis.

   ^aExcluding patients with synchronous tumors, heterogeneous cCMS
   classification, pre-surgical chemoradiation and residual tumor status 1
   or 2.

   ^bHazard ratios (HR) and 95% confidence intervals (CI) are from a
   multivariable Cox proportional hazards model, p values are two-sided
   and from Wald tests, and cox.zph p values are from tests of the
   proportional hazards assumption. Statistically significant p values are
   highlighted in bold. Results were similar in a stratified analysis
   according to the variable breaking the proportional hazards assumption
   (patient age; Supplementary Data [184]11).

ITH-low classifications of external primary tumor series

   Subtype discovery based on the ITH-low genes was also performed in two
   external datasets for validation purposes (Supplementary Data [185]12).
   As in the in-house series, NMF clustering of tumors in [186]GSE39582
   (n = 566)^[187]32 at a predefined rank of k = 2 was concordant with
   iCMS classification (accuracy 89%, Cohen’s κ = 0.77; Fig. [188]6a). NMF
   at k = 4 failed to distinguish tumors with immune and stromal
   infiltration (Supplementary Fig. [189]29a, b), but clustering at k = 5
   identified subtypes with highly similar characteristics to the k5
   clusters in the in-house series (Fig. [190]6b and Supplementary
   Fig. [191]19b, c). The two clusters from k5 that corresponded to the
   original CMS2 showed no clear distinctions in the custom gene set
   collection for either tumor series (Fig. [192]6d and Supplementary
   Fig. [193]20). However, the consistency of the two clusters in both
   tumor series supported a potential for subclassification of the large
   CMS2 group, and pathway enrichment testing of differentially expressed
   genes between the clusters in the KEGG pathway database indicated
   separation based on signatures of bacterial and viral infection, the
   cell cycle, and several oncogenic or tumor suppressor signaling
   pathways (Supplementary Fig. [194]29c). The k5 clusters also had
   prognostic associations among stage I–III cancers in the [195]GSE39582
   series and identified a subset of mesenchymal-like tumors associated
   with a low 5-year RFS rate (Fig. [196]6e). This subtype (denoted NMF4)
   had only partial overlap with the original CMS4 (Fig. [197]6b) and
   improved the prognostic stratification of tumors relative to the
   original CMS classification (Supplementary Fig. [198]29d). The
   CMS1-corresponding cluster (denoted NMF1) had a higher 5-year RFS rate
   than the other subtypes among stage III cancers, but not among stage II
   (Supplementary Fig. [199]29e).

Fig. 6. Classification of external primary tumor series based on ITH-low
genes.

   [200]Fig. 6
   [201]Open in a new tab

   Alluvial diagrams of classification concordances (a) between iCMS and
   the ITH-low k2 clusters among tumors in the [202]GSE39582 series, (b)
   the ITH-low k2 and k5 clusters and the original CMS in [203]GSE39582
   (top), as well as the iCMS, ITH-low k6 and original CMS in TCGA
   (bottom). The sample overlap is indicated relative to the total number
   per subtype. Source data are provided as a Source Data file. c Pie
   charts of the proportion of tumors in each of the ITH-low k5 and k6
   clusters in [204]GSE39582 and TCGA, respectively, bar charts of the
   proportion of MSI tumors in each cluster, and (d) heat maps of p values
   from gene set enrichment analyses (log10-scale; red: positive; blue:
   negative). Gene sets were from the custom collection and selected to
   include the same as in the corresponding analysis of the in-house tumor
   series (Supplementary Fig. [205]20). Source data are provided as a
   Source Data file. e Five-year relapse-free survival according to the
   ITH-low k5 clusters in patients with stage I–III CRC in [206]GSE39582.
   Hazard ratios (HR) and 95% confidence intervals (CI) are from Cox
   proportional hazards analyses and p values are two-sided and from Wald
   tests. The two-sided log-rank p value across subtypes is also given. f
   Common pathway enrichments in the KEGG database of differentially
   expressed genes between the two CMS2-corresponding subtypes (NMF2 and
   NMF2.5) from ITH-low k5 clustering of the in-house and [207]GSE39582
   series, and ITH-low k6 clustering of the TCGA series. The eight
   pathways with highest max enrichment score among the common pathways of
   the top-50 most significantly enriched in each tumor series were
   selected for plotting. Dot sizes indicate the significance level
   (Benjamini–Hochberg adjusted p value on log10-scale). Source data are
   provided as a Source Data file. cCMS congruent consensus molecular
   subtypes, EMT epithelial-mesenchymal transition, iCMS intrinsic
   consensus molecular subtypes, ITH intra-tumor heterogeneity, k2 and k4
   factorization ranks 2 and 4 from NMF, MSI microsatellite instability,
   MSS microsatellite stable, NMF non-negative matrix factorization, TCGA
   The Cancer Genome Atlas.

   Clustering of tumors in The Cancer Genome Atlas series (TCGA;
   n = 573)^[208]33 based on ITH-low genes at k = 2 segregated a small
   subset of exclusively MSS tumors (9%), and showed little correspondence
   with iCMS (Supplementary Fig. [209]30a). Clustering at k = 5 failed to
   distinguish tumors with immune and stromal infiltration (Supplementary
   Fig. [210]30b), similarly to the k4 clusters in [211]GSE39582, and a
   higher factorization rank was therefore used. Clustering at k = 6
   identified subtypes that showed a similar degree of overlap with the
   original CMS classification (accuracy 58% and Cohen’s κ = 0.41) as the
   k5 clusters in both the in-house and [212]GSE39582 series (in-house:
   accuracy 63% and Cohen’s κ = 0.44; [213]GSE39582: accuracy 67% and
   Cohen’s κ = 0.52; Fig. [214]6b, c and Supplementary Fig. [215]19c). The
   k6 clusters in TCGA included two CMS2-corresponding clusters (termed
   NMF2 and NMF2.5) and two CMS3-corresponding clusters (termed NMF3 and
   NMF3.5; Fig. [216]6d). Notably, merging of the two CMS2-corresponding
   clusters (NMF2 and NMF2.5) versus the rest (NMF1, NMF3, NMF3.5, NMF4)
   provided a two-state classification concordant with iCMS (accuracy 85%
   and Cohen’s κ = 0.69; Fig. [217]6b). Furthermore, comparisons of the
   two CMS2-corresponding clusters by pathway enrichment analyses of
   differentially expressed genes showed several of the same distinctions
   in TCGA as in the in-house and [218]GSE39582 series, providing further
   support for the subclassification of CMS2 based on characteristics such
   as bacterial or viral infections (Fig. [219]6f). The two
   CMS3-corresponding clusters in TCGA (NMF3 and NMF3.5) were primarily
   distinguished based on MSI status and signatures of the bottom versus
   top of colonic crypts (Fig. [220]6c, d). Collectively, these validation
   analyses suggested that the ITH-low genes distinguished tumors in
   independent series according to the same biological and
   clinicopathological characteristics, although with a varying number of
   sample clusters (factorization ranks).

Discussion

   Multiregional tumor transcriptomics represents a feasible approach to
   balance the needs to capture both intra-tumor and inter-tumor gene
   expression variation. We analyzed a large series of multiregional
   samples from primary CRCs and used this to distinguish heterogeneous
   and uniform expression features across tumor regions, while retaining
   information of tumor subtypes (that is, variation across tumors). Three
   bulk samples per tumor could recapitulate cancer cell-intrinsic
   expression patterns and subtypes that were less vulnerable to
   intra-tumor heterogeneity. While single-cell RNA sequencing was needed
   to initially delineate these patterns and define the iCMS
   classification^[221]20, this study showed potential to expand on the
   knowledge and suggested a further substratification of ITH-low
   intrinsic subtypes. This resulted in a split predominantly of the
   subtype corresponding to iCMS3. However, the split was not primarily
   defined by MSI status, as proposed with the refined IMF
   (intrinsic-microsatellite-fibrosis) classification^[222]20. The ITH-low
   subtypes rather converged on having the same discriminatory biological
   features as the original CMS^[223]8, although it has previously been
   shown that the original CMS classifier is depleted of genes with
   uniform expression among tumor glands^[224]6. Nonetheless, the
   convergence is consistent with the assumption that the tumor
   microenvironment is at least partly shaped by malignant epithelial
   cells and that the tumor epithelium can recapitulate the CMS
   classification^[225]16. This was also the premise for the successful
   classification of diverse pre-clinical model systems according to
   CMS^[226]34,[227]35. Overall, this supports that CMS-related features
   provide a bona fide phenotypic stratification of CRCs, but the precise
   cellular interactions defining the subtypes with a rich
   microenvironment component are still to be uncovered. Spatial
   transcriptomics has potential to delineate such interactions, as
   recently shown with the detailed description of the interaction
   networks of immune and malignant cells according to MSI status of the
   tumors^[228]36. In this context, the congruent subtypes proposed in
   this study can be considered as an alternative CMS classification that
   is based on cancer cell-intrinsic template genes and therefore less
   vulnerable to intra-tumor heterogeneity. However, this interpretation
   does not fully account for the stronger prognostic power of the
   congruent subtypes.

   In contrast to the original CMS classification, the congruent CMS
   classes provided substantial prognostic value beyond both intra-tumor
   heterogeneity and the tumor microenvironment components in patients
   with locoregional cancer. However, the two prognostic subtypes (cCMS1
   and cCMS4) constituted only one-fourth of the tumors in total. Both
   prognostic subtypes were dominated by tumors corresponding to iCMS3,
   but included only approximately half of all iCMS3 tumors. This is
   largely consistent with the original publication showing that the
   binary iCMS classification is not prognostic^[229]20. A poor patient
   survival was found with fibrotic iCMS3 tumors only, and this subtype
   constituted ~30% of iCMS3 tumors and 14% in total. Notably, the
   proposed cCMS classification additionally identified a subset of mostly
   iCMS3 tumors with a favorable prognostic association, independently of
   MSI status. This further supports that substratification of iCMS is
   needed in the evaluation of patient prognosis, and the proposed cCMS
   might reconcile the single-cell-derived iCMS and the original bulk
   transcriptomics-derived CMS for this purpose. Application of cCMS to
   additional tumors is not dependent on multiregional sampling and can be
   done based on the ITH-low genes, as illustrated in two external primary
   tumor series. However, different factorization ranks were needed to
   identify corresponding subtypes in the different series, and the
   optimal number of ITH-low subtypes remains inconclusive. It is not
   clear whether this inconsistency is related to technical variation from
   use of different gene expression platforms or to biological differences
   among the series. Nonetheless, subtypes with similar gene expression
   characteristics to the four cCMS classes were found in both external
   series, and a potential for subclassification of the large and
   heterogeneous group of CMS2-corresponding tumors based on
   characteristics such as bacterial or viral infections was supported in
   all the series analyzed. This subclassification is also consistent with
   a microbiome-dependent subtype proposed in a previous study^[230]37.
   Additional translational studies are needed to consolidate the ITH-low
   classification, support the prognostic value and explore additional
   clinical relevance, for example, by associations with drug
   sensitivities.

   Our work also showed prognostic relevance of intra-tumor heterogeneity.
   These results are highly similar to a previous study based on
   computational deconvolution of intra-tumor CMS heterogeneity in single
   samples of stage III colon cancers^[231]38, and supported a poor
   prognosis with a minor CMS4 component in particular. Notably, the CMS
   combinations frequently observed by multiregional sampling, or
   estimated by computational enrichments, were similar to results from
   single-cell sequencing of a smaller tumor series^[232]15. Nonetheless,
   tumors analyzed by the largest number of multiregional samples were
   frequently scored as heterogeneous, and it is likely that CMS
   heterogeneity is underestimated in studies based on bulk
   transcriptomics. Even small tumor subclones can have clinical relevance
   with respect to development of resistance during treatment^[233]39, but
   we cannot conclude on the lower limit of what can be considered
   prognostically relevant transcriptomic heterogeneity, or on the number
   of samples needed to detect this. According to the “big bang” model of
   CRC development, invasive cancers have spatially intermixed subclones
   on the genomic level^[234]40. This supports the potential to capture
   heterogeneity with a small number of samples, although a potential
   caveat is that such clonal intermixing is not necessarily reflected on
   the transcriptomic and phenotypic levels.

   Transcriptomic subtypes based on cancer cell-intrinsic signals have the
   presumed advantage of being applicable to both primary and metastatic
   tumors, without the need to adapt the classification approach. This was
   supported by PCA based on the ITH-low genes detected in this study and
   on the iCMS template genes, both showing intermingling of primary
   tumors and liver metastasis, which is in contrast to results based on
   unselected genes^[235]41. In further support of the appropriateness of
   intrinsic classifications for metastatic tumors, we did not observe any
   subtype depletions or shift in the distribution of iCMS classes between
   primary tumors and liver metastases. This was unexpected based on the
   strong depletion of the original CMS1 and CMS3 classes among
   metastases^[236]21, which would suggest a depletion also of iCMS3.
   Nonetheless, subtype switching of iCMS between matched primary and
   metastatic tumors was observed in almost a third of patients. This was
   noteworthy in particular since iCMS is only a two-state classification.
   Switches of cancer cell-intrinsic classes can be due to either clonal
   evolution or transition of differentiation states. Clonal evolution and
   selection of the minor clone is a possible explanation based on the
   non-exclusivity of iCMS classes among cells in each primary
   tumor^[237]20. However, phenotypic plasticity and cellular
   differentiation and dedifferentiation might be an essential trait for
   cancer metastasis^[238]42. The dynamic cellular states observed in
   models of CRC metastasis^[239]43 open up the possibility for cells to
   even transition between iCMS classes during metastasis and to
   eventually end up in their original iCMS in established metastatic
   tumors. According to this view, the heterogeneity is dependent on the
   timing of sampling and would therefore be underestimated in our study.
   The most frequently observed switch from iCMS2 primary tumors to iCMS3
   liver metastases is consistent with dedifferentiation from an
   LGR5-positive stem cell^[240]20, although our study was not
   sufficiently powered to confirm this predilection. Nonetheless, the
   profound phenotypic plasticity observed in at least a subset of
   patients challenges the potential reconciliation of subtyping schemes
   of primary and metastatic tumors, also of the congruent CMS proposed
   here. This supports the need for a de novo classification of metastases
   based on their in situ cellular states^[241]41.

   In conclusion, we describe transcriptomic features with prognostic
   value independently of the tumor microenvironment and in the context of
   intra-tumor heterogeneity of CRC. Multiregional transcriptomics
   captured cancer cell-intrinsic features with low intra-tumor
   heterogeneity, and identified congruent CMS classes that appeared to
   reconcile the prognostic potential of current classifications derived
   from single-cell and bulk transcriptomics. However, evidence of
   phenotypic plasticity during metastasis, even with a two-state cancer
   cell-intrinsic classification, indicated that reconciliation of primary
   and metastatic subtyping frameworks is challenging.

Methods

Patient material

   The study has been approved by the Regional Committee for Medical and
   Health Research Ethics, South Eastern Norway (REC numbers 1.2005.1629
   and 2010/1805). All patients provided written informed consent, and the
   study was conducted in accordance with the Declaration of Helsinki. All
   patients were treated according to national standard guidelines.
   Patient sex was assigned as registered in the medical records at Oslo
   University Hospital, Norway, and was not considered in the study
   design.

   A total of 1093 fresh frozen primary tumor and liver metastasis samples
   from 692 patients treated surgically for primary and/or metastatic CRC
   at Oslo University Hospital were analyzed for gene expression in the
   study. Samples were taken from surgical specimens at the operating
   theater and prior to pathological examination. Two to four
   multiregional samples (mean of 2.9) were taken from spatially distinct
   areas of each of 98 primary tumors from 96 patients treated in 2015 and
   2016 (n = 286 samples; Supplementary Data [242]1). Tumors with
   multiregional sampling had a diameter of at least 15 mm (median
   diameter of 40 mm, 95% CI 35–40), and multiple samples were not taken
   unless clearly spatially separated. There was no association between
   tumor size and the number of sampled regions from each tumor (p = 0.4
   by Kruskal–Wallis test; Supplementary Fig. [243]2a). RNA and DNA were
   extracted using the Qiagen AllPrep DNA/RNA/miRNA Universal Kit or
   DNA/RNA Mini Kit in accordance with the manufacturer’s protocol (Qiagen
   GmbH, Hilden, Germany). Cryosections of selected samples were stained
   with hematoxylin and eosin and evaluated for histologic tumor grade
   according to the WHO classification (5th edition)^[244]44, as well as
   morphological patterns previously associated with an image-based CMS
   classification^[245]10.

   Molecular data from single primary tumor samples of an additional 418
   patients treated between 2005 and 2013 have previously been published
   (Supplementary Data [246]1)^[247]9. Liver metastasis samples (n = 338)
   were collected from 191 patients treated by hepatic resection between
   2013 and 2018, and molecular data have previously been published for
   the majority (n = 280 samples from 1–7 liver lesions of each of 171
   patients)^[248]41. Patient-matched sets of primary tumor and metastasis
   samples were available from 35 patients (total n = 179 samples). The
   primary tumor from 22 of these patients (n = 51 samples) were included
   for longitudinal comparisons only and not otherwise analyzed in the
   study. Twenty-one (60%) of the patients with primary-metastasis samples
   had synchronous metastatic disease (liver metastases diagnosed within 6
   months of the primary tumor), and 14 (40%) had metachronous metastases.
   Eighteen (51%) received neoadjuvant chemotherapy for the sampled
   metastases, eight (23%) had previously received chemotherapy for
   primary and/or metastatic CRC, and nine (26%) were chemonaive at the
   time of sampling.

   Processed gene expression data and metadata of two external primary
   tumor series were downloaded from the SAGE Bionetworks Synapse platform
   ([249]https://www.synapse.org/#!Synapse:syn2634724) and used for
   validation analyses. This included 566 tumors from the [250]GSE39582
   series and 573 tumors from TCGA (Supplementary
   Data [251]12)^[252]32,[253]33. Processed single-cell RNA sequencing
   data and metadata for totally 17,678 cells from 12 paired samples of
   the tumor core and tumor border regions of each of six primary CRCs
   were downloaded from NCBI’s Gene Expression Omnibus (GEO) with
   accession number [254]GSE144735^[255]15.

MSI and mutation analyses

   MSI status of the multiregional primary tumor series was determined by
   PCR-based analyses of mononucleotide repeat markers using the Promega
   MSI Analysis System in accordance with the manufacturer’s protocol
   (Promega, Madison, WI, USA). Mutational hotspots in KRAS and NRAS exons
   2–4, as well as BRAF exon 15 (including codon 600) were analyzed by
   Sanger sequencing using the Cycle Sequencing Kit and 3730 DNA Analyzer
   (Applied Biosystems, Waltham, MA, USA) as previously described^[256]45.
   One randomly selected sample per tumor and all samples from tumors with
   discordant CMS classifications were analyzed (n = 158 samples).

   Tumor content has been confirmed in the multiregional samples by deep
   sequencing of a custom panel of twenty genes, using matched normal
   colonic mucosa samples as reference. Homogenous somatic single
   nucleotide variants or short insertion or deletions in APC, TP53, KRAS,
   NRAS, BRAF, PIK3CA and/or FBXW7 (same mutation present in all samples
   per tumor) were found in all tumors except one that was not scored, all
   with a mutant allele fraction above 5% (the data and additional details
   will be published elsewhere).

Gene expression profiling and data processing

   All in-house tumor samples have been analyzed for gene expression on
   high-resolution platforms (n = 1093; Supplementary Fig. [257]1).
   Multiregional primary tumor samples were analyzed on Affymetrix Human
   Transcriptome 2.0 arrays (HTA), using 100 ng of total RNA as input and
   following the manufacturer’s protocol (Thermo Fisher Scientific,
   Waltham, MA, USA). The extended single-sample primary tumor set has
   previously been analyzed on HTA (n = 217) or Affymetrix GeneChip Human
   Exon 1.0 ST arrays (HuEx; n = 201)^[258]9. Patient-matched
   primary-metastasis samples were analyzed on HTA (n = 23 patients and
   116 samples) or by total RNA sequencing (n = 12 patients and 63
   samples). The remaining liver metastases samples have been analyzed on
   HTA arrays^[259]41. RNA sequencing was performed in 2 × 101 base-pair
   paired-end mode on the Illumina HiSeq 4000 platform (Illumina, San
   Diego, CA, USA) at the Oslo University Hospital Genomics Core Facility
   to a median depth of 52.6 × 10^6 uniquely mapped read pairs per sample
   (10–90th percentile 40.5 × 10^6–71.6 × 10^6). Sample preparation was
   performed with ribosomal RNA depletion using the Ribo-Zero Gold rRNA
   removal kit and sequence library generation with the TruSeq Stranded
   Total RNA Library Prep Gold kit (Illumina).

   Raw intensity data CEL-files from microarray experiments were processed
   in five separate datasets (multiregional primary tumor samples, primary
   single-sample HTA, primary single-sample HuEx, all liver metastasis
   samples, all patient-matched primary-metastasis samples; Supplementary
   Fig. [260]1) according to the robust multi-array average
   approach^[261]46 using the function justRMA in the R package affy
   (v1.64.0)^[262]47 and custom CDF files from Brainarray
   (hta20hsgencodegcdf_23.0.0 and huex10sthsgencodegcdf_23.0.0). A batch
   effect from different lot numbers of the GeneChip™ WT Plus Reagent Kit
   was corrected among multiregional primary tumor samples with ComBat in
   the R package sva (v.3.36.0)^[263]48 using default parameters. Gene
   annotations were retrieved from GENCODE using the
   gencode.v29.annotation.gtf file. Only protein-coding genes were
   retained and genes on the Y chromosome were excluded. Entrez IDs were
   obtained using the R package org.Hs.eg.db (v.3.10.0) and gene symbols
   were updated with checkGeneSymbols in HGNChelper (v.0.8.1).

   Raw RNA sequencing reads were processed in a bioinformatics pipeline
   implemented with Snakemake (v.6.6.1) and using Python (v.3.9.5), Java
   (v.11.0.2) and PyYAML (v.5.4.1). The pipeline has previously been
   described and included adapter trimming with Trimmomatic (v.0.38), read
   alignment to the human reference genome GRCh38.p13 (v.41) using STAR
   (v.2.7.6a) with 2-pass mapping and the feature annotation file
   gencode.v41.annotation.gtf, quantification of reads mapping to
   protein-coding genes using the HTseq-count tool (v.2.0.2), and
   normalization of gene expression levels by estimation of transcripts
   per million (TPM) for non-overlapping exonic gene lengths^[264]49. The
   TPM values were log2-transformed after adding a pseudocount of 1.

Gene expression classification and enrichment analyses

   Tumor samples were classified according to CMS with the R package
   CMSclassifier (v.1.0.0)^[265]8 and using the function classifyCMS.RF
   with a custom posterior probability threshold of 0.4. The threshold was
   adjusted in the multiregional primary tumor set to lower the number of
   unclassified samples while retaining proportionality in the number of
   tumors with homogeneous and heterogeneous CMS classifications
   (Supplementary Fig. [266]31). CRIS classifications were assigned with
   the function cris_classifier in the R package CRISclassifier
   (v.1.0.0)^[267]19, using the inverse of log2-transformed gene
   expression data and default parameters (false discovery rate [FDR]
   <0.2). iCMS classification was performed using the approach and gene
   template described in the original publication^[268]20. Gene expression
   matrices on log2-scale were normalized to z-scores using ematAdjust and
   classified with the nearest template prediction approach using the
   function ntp and an FDR-threshold <0.05 in the R package CMScaller
   (v.2.0.1)^[269]34.

   Differential gene expression analyses were performed with limma as
   implemented in the function subDEG in CMScaller and with p value
   adjustment by the Benjamini–Hochberg procedure. Tumor-infiltrating
   cancer-associated fibroblasts and cytotoxic lymphocytes were estimated
   using the R package MCPcounter (v.1.2.0)^[270]50 on a combined and
   batch corrected gene expression dataset of all primary tumor samples
   (n = 704). Gene set enrichment analyses were performed with the R
   package topGO (v.2.38.1) using fisher statistics and the weight01
   algorithm, as well as with the WEB-based Gene SeT AnaLysis Toolkit
   (WebGestalt)^[271]51 using default settings, over-representation
   analysis in the Wikipathway cancer database, FDR < 0.05 and the
   complete list of protein-coding genes as reference. Sample group
   comparisons were performed with the subCamera function in CMScaller on
   a custom gene set collection relevant for CRC (n = 54; Supplementary
   Data [272]2) and with FDR adjustment of p values according to the
   Benjamini–Hochberg procedure. One random sample from each tumor in the
   multiregional sample set was selected for comparisons according to CMS
   heterogeneity (the analysis was repeated across all multiregional
   samples and showed highly similar results; Supplementary Data [273]2).
   Single-sample enrichment scores were estimated with gene set variation
   analysis using the gsva function in R package GSVA (v.1.34.0)^[274]52.

Intra-tumor transcriptomic heterogeneity

   For tumors with multiregional samples, intra-tumor heterogeneity was
   evaluated as discordant sample classifications within subtyping
   frameworks (the subtype representing ≥50% of samples per tumor was
   considered the major component) and by general transcriptomic
   heterogeneity. The latter was estimated as the maximum Euclidean
   distance of PC1–PC3 for any pair of samples from each tumor.

   For primary tumors with single samples, intra-tumor CMS heterogeneity
   was estimated based on enrichment scores for each CMS class (the
   approach is illustrated in Supplementary Fig. [275]6). The
   single-sample HTA and HuEx datasets were analyzed separately. First,
   template gene sets for each of the four CMS classes were identified by
   differential gene expression analyses comparing tumors in each class
   with the rest using limma (Benjamini–Hochberg adjusted p value < 0.001
   and log2 fold change > |1.0|; Supplementary Data [276]13 and [277]14).
   Second, enrichment scores for each CMS-specific template gene set in
   each sample were obtained using the gsva function in the R package GSVA
   for up-regulated genes only, and with the functions simpleScore and
   rankGenes in the R package singscore (v.1.6.0)^[278]25 for up- and
   down-regulated genes combined. The CMS enrichment scores were evaluated
   in a similar analysis of the multiregional sample set, and the
   strongest correlations to the posterior probabilities from the original
   random forest CMSclassifier were found for the singscore enrichments
   (Spearman’s ρ > 0.8; Supplementary Fig. [279]32). Singscore also
   provided functions to evaluate statistical significance (generateNull
   and getPvals) and was selected for further analyses. Single-sample
   tumors were considered unclassified if none of the four CMS enrichment
   scores were significant, and classified with intra-tumor CMS
   heterogeneity if more than one was significant (p < 0.05). The approach
   was further evaluated in the multiregional sample set using the CMS
   template gene sets derived from single-sample tumors analyzed on the
   same platform (HTA). The major subtype of each multiregional sample was
   largely concordant with assignments from the original random forest
   CMSclassifier with an overall accuracy of 85% (Cohen’s κ = 0.77), and
   the majority (84%) of misclassified samples were from heterogeneous
   tumors (Supplementary Fig. [280]8a). The accuracy of computational
   intra-tumor CMS heterogeneity classifications (at least one sample
   classified as heterogeneous per tumor) was 72% relative to the
   spatially resolved analysis of multiregional samples (sensitivity of
   73% and specificity of 68%).

Gene-wise intra-tumor heterogeneity

   Intra-tumor heterogeneity of the expression level of each
   protein-coding gene (n = 18,823) was estimated in the multiregional
   sample set using a previously published method^[281]53. In brief, a
   linear mixed effects model was fitted for each gene across all samples
   from all tumors using the function lmer in the R package lme4
   (v.1.1-29)^[282]54 and with “tumor” as the random effect. Intra-class
   correlation coefficients (ICCs) were calculated for each model (gene)
   using the function icc in the R package performance (v.0.10.4)^[283]55:
   [MATH: <mi
   mathvariant="normal">ICC</mi><mo>=</mo><mfrac><mrow><msubsup><mrow><mi
   mathvariant="normal">σ</mi></mrow><mrow><mi>i</mi></mrow><mrow><mn>2</m
   n></mrow></msubsup></mrow><mrow><msubsup><mrow><mi
   mathvariant="normal">σ</mi></mrow><mrow><mi>i</mi></mrow><mrow><mn>2</m
   n></mrow></msubsup><mo>+</mo><msubsup><mrow><mi
   mathvariant="normal">σ</mi></mrow><mrow><mi>ϵ</mi></mrow><mrow><mn>2</m
   n></mrow></msubsup></mrow></mfrac> :MATH]
   1

   Here,
   [MATH: <msubsup><mrow><mi
   mathvariant="normal">σ</mi></mrow><mrow><mi>i</mi></mrow><mrow><mn>2</m
   n></mrow></msubsup> :MATH]
   is the random effects variance, that is, the variance explained by the
   grouping structure (tumor) and
   [MATH: <msubsup><mrow><mi
   mathvariant="normal">σ</mi></mrow><mrow><mi>ϵ</mi></mrow><mrow><mn>2</m
   n></mrow></msubsup> :MATH]
   is the residual variance. An ITH-score for each gene was calculated as:
   [MATH: <msub><mrow><mi mathvariant="normal">ITH</mi></mrow><mrow><mi
   mathvariant="normal">gene</mi></mrow></msub><mo>=</mo><mn>1</mn><mo>−</
   mo><msub><mrow><mi mathvariant="normal">ICC</mi></mrow><mrow><mi
   mathvariant="normal">gene</mi></mrow></msub> :MATH]
   2

   Genes with low expression variation across the dataset (10–90th
   percentile range <1; n = 15,585 genes) were considered non-informative
   and filtered out, retaining 3238 genes (17.2%) for analyses
   (Supplementary Fig. [284]11 and Supplementary Data [285]8). Genes were
   categorized according to the ITH-score using the previously published
   thresholds in four categories^[286]53, or custom thresholds in the
   three categories ITH-low, ITH-intermediate and ITH-high (Supplementary
   Fig. [287]12 and Supplementary Data [288]9). The two different
   thresholds to score ITH-low genes were compared in gene set enrichment
   analyses and showed largely concordant results (Supplementary
   Fig. [289]33). The custom threshold retained the largest number of
   ITH-low genes and was used for further analyses.

Tumor classification based on ITH-low genes

   Subtype discovery based on ITH-low genes was performed by the NMF
   approach implemented in the R package NMF (v.0.23.0)^[290]56 using the
   function nmf with the brunet method^[291]57, predefined ranks 2–10 and
   nrun = 100 on the inverse of log2-transformed gene expression data. The
   cluster number (k) preceding the first, large drop in the silhouette
   width and cophenetic score was selected as the optimal number of
   clusters. To evaluate a potential impact of the use of different gene
   expression platforms and the inclusion of multiregional samples for a
   subset of tumors, NMF was run both for the complete primary tumor
   sample set (n = 704 samples from 516 tumors) and for single, randomly
   selected samples from each of the primary tumors analyzed on HTA
   (n = 315). This resulted in k = 5 and k = 4 optimal sample clusters,
   respectively (Supplementary Fig. [292]18). There was a near perfect
   concordance in sample clustering between the two runs at k = 4
   (considering overlapping samples between the two sets only; accuracy
   97%, Cohen’s κ = 0.96; Supplementary Fig. [293]19a). In the full sample
   set, the largest sample cluster from k = 4 was subdivided into two
   clusters at k = 5 (Supplementary Fig. [294]19b, c), but gene set
   enrichment analyses showed little discrimination between the two
   clusters (Supplementary Fig. [295]20). The full tumor series and NMF at
   k = 4 was therefore used for further analyses, to strengthen the
   biological and statistical rigor. NMF classification was also tested
   using ITH-low genes defined by the previously published scoring
   threshold as a template (ITH-score 0–0.2; n = 396 genes)^[296]53. This
   resulted in only two sample clusters differentiated mainly based on
   MSI/MSS-like gene expression characteristics (Supplementary
   Fig. [297]21). Classification of liver metastases by NMF was also based
   on genes identified as ITH-low in primary tumors. Alluvial diagrams
   were plotted using the R package ggalluvial (v.0.12.4).

Classification of external tumor series based on ITH-low genes

   The ITH-low gene set was filtered prior to analyses of two external
   tumor series, due to variation in gene expression platforms. Tumors in
   [298]GSE39582 (n = 566) were analyzed on Affymetrix Human Genome U133
   Plus 2.0 Arrays, and probe sets were mapped to unique gene symbols
   using the function collapseRows in the R package WGCNA (v.1.72-5) with
   default settings^[299]58. Genes with low median expression (<4 on
   log2-scale) or variance (<0.1) across the tumors were filtered out,
   retaining 1217 (79%) of the ITH-low genes. Filtering with the same
   thresholds in TCGA (n = 573 tumors analyzed by RNA sequencing) retained
   1387 (90%) of the ITH-low genes. The two tumor series were classified
   separately according to the same approach as in the in-house series,
   using NMF with predefined ranks 2–6 on the filtered set of ITH-low
   genes. Pathway enrichment analysis of differentially expressed genes
   between NMF subtypes (limma: Benjamini–Hochberg adjusted p
   value < 0.001 and log2 fold change > |1.0|) was performed with the R
   package pathfindR (v.2.3.0) using default settings, including testing
   of the KEGG pathway database and Benjamini–Hochberg adjustment of p
   values^[300]59.

Statistical analyses

   All statistical analyses were performed in R v.4.2.2^[301]60. Two-sided
   p values < 0.05, or adjusted p values as specified, were considered
   significant. Principal components analysis was performed with the
   prcomp function in the package stats (v.4.2.2) based on the genes with
   highest cross-sample variance (n = 1000). Pearson’s and Spearman’s
   correlations were calculated and visualized using the functions cor,
   cor.mtest and corrplot in the R package corrplot (v.0.92), and with
   conf.level = 0.95. Odds ratios and 95% CIs were estimated with Fishers’
   exact test (fisher.test), and were together with Pearson’s chi-squared
   test (chisq.test) and Welch’s two sample t-test (t.test) used to
   evaluate associations between clinicopathological parameters and sample
   groups according to molecular characteristics. Classification accuracy
   and Cohen’s κ were estimated with the function confusionMatrix in the
   package caret (v.6.0-93). The center line of box plots represents the
   median, boxes represent the interquartile range, and whiskers represent
   1.5× the interquartile range above the 75th percentile (maxima) or
   below the 25th percentile (minima).

   Survival analyses were performed for patients with stage I–III CRC
   (unless otherwise stated) treated by complete tumor resection (residual
   tumor status R0) and with no pre-surgical chemoradiation or synchronous
   tumors. Five-year RFS was the primary endpoint and estimated as time
   from surgery to relapse or death from any cause. Patients with no
   events were censored after 5 years or at last follow-up. Overall
   survival was evaluated as the time from surgery to death from any
   cause. Multivariable and univariable Cox proportional hazards models
   were estimated using the coxph function in the survival package
   (v.3.4-0) with p values from Wald tests. The proportional hazards
   assumption was assessed for all models using the cox.zph function, and
   all variables met the assumption, except for patient age or KRAS
   mutation status in multivariable models including gene expression
   subtypes. Stratification of models according to these variables did not
   have a strong impact on the results (Supplementary Data [302]6 and
   [303]11). Kaplan–Meier plots were generated with the ggsurvplot
   function in the survminer package (v.0.4.9), with p values from Wald
   test. The proportion of explained variation in 5-year RFS by each
   variable in multivariable models was calculated using rsq in the
   survMisc package (v.0.5.6)^[304]61, and bootstrapped with 5000
   iterations and sampling with replacement. Survival analyses of the
   [305]GSE39582 series were also performed for patients with stage I–III
   cancers and with 5-year RFS as the endpoint (n = 493 patients with
   follow-up data). Survival analysis of the TCGA series was not performed
   due to short follow-up time of the majority of patients (70% were lost
   to follow-up during the first 12 months).

Reporting summary

   Further information on research design is available in the [306]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [307]Supplementary Information^ (2.5MB, pdf)
   [308]Peer Review File^ (856.5KB, pdf)
   [309]Reporting Summary^ (1.6MB, pdf)
   [310]41467_2024_48706_MOESM4_ESM.pdf^ (108.1KB, pdf)

   Description of Additional Supplementary Files
   [311]Supplementary Data 1-14^ (1.5MB, xlsx)

Source data

   [312]Source Data^ (2.8MB, xlsx)

Acknowledgements