Graphical abstract graphic file with name fx1.jpg [53]Open in a new tab Highlights * • An integrative omics study of early-onset colorectal cancer in a Chinese cohort * • Omics architecture supports early-onset tumorigenesis and therapeutic development * • Mutational landscape varies by ancestry, and LMTK3 mutation has Chinese specificity * • LMTK3 works at DNA repair process, immunophenotype, and drug target __________________________________________________________________ The molecular characterization of early-onset colorectal cancer remains undetermined. We report a multi-omics landscape of young colorectal tumors in a Chinese cohort. These profiles reveal distinct molecular alterations and immunophenotypes in early-onset tumorigenesis and provide an ancestry-specific LMTK3 as a modulator, biomarker, and therapeutic target for precision oncology. Introduction Colorectal cancer is estimated to be the third most common cancer and the second leading cause of cancer death worldwide.[54]^1 Although the incidence has generally declined in high-income countries, mainly due to cancer screenings,[55]^2 the increasing rate of patients with early-onset colorectal cancer (EOCRC), who are diagnosed before 50 years of age, is of great concern, especially in China.[56]^3^,[57]^4 Currently, the reasons for this widespread increase in EOCRC remain elusive, but accumulating studies have revealed several salient characteristics, such as greater consumption of a Western-style diet, obesity, and physical inactivity as well as molecular alterations in driver genes of APC, TP53, and KRAS,[58]^5^,[59]^6^,[60]^7 indicating that these established risk factors for late-onset colorectal cancer (LOCRC) also play important roles in EOCRC. However, notably, compared with patients with LOCRC, patients with EOCRC are more likely to be male, have less environmental exposure accumulation, have a history of inflammatory bowel disease or a family history of colorectal cancer, and have left-sided and advanced-stage tumors with genetic and molecular heterogeneity.[61]^5^,[62]^6^,[63]^8 A recent report described a higher tumor mutation burden (TMB) in African American patients than in White patients as well as different mutation frequencies in driver genes between diverse populations of patients with EOCRC and LOCRC.[64]^9^,[65]^10 These findings supported a differential mutational landscape across both different ages of onset and diverse ancestries. In addition, to address the extensive molecular complexity of this disease, previous studies of colorectal cancer cohorts of The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium have described multi-omics features related to colorectal tumorigenesis, progression, and therapeutic development.[66]^11^,[67]^12 However, these studies focused more on overall colorectal cancer in non-Asian populations, generating a knowledge gap in age-of-onset-specific etiological processes of colorectal tumorigenesis. In this molecular characterization study, we profiled the genome, epigenome, transcriptome, and proteome of EOCRC tumors based on a Chinese colorectal cancer cohort. Identifying the distinct multi-omics architecture of these EOCRC tumors compared with LOCRC tumors would improve the understanding of age-of-onset disparities in colorectal cancer etiology and provide valuable insight into the development of precision therapies. Results Overview of population features and multi-omics profiling The demographic characteristics of patients with colorectal cancer are summarized in [68]Table S1. A total of 79 patients with colorectal cancer were enrolled from the Nanjing Colorectal Cancer (NJCRC) cohort ([69]Figure 1A), namely, 30 (37.97%) patients with EOCRC and 49 (62.03%) patients with LOCRC. All patients provided tumor tissues and normal adjacent tissues (NATs) for multi-omics profiling ([70]Figure 1B). Principal-component analysis clearly separated tumors and NATs among the epigenomic, transcriptomic, and proteomic spectra ([71]Figure S1A). For age-related features, we observed a strong positive correlation between chronological age at diagnosis and DNA methylation age (DNAmAge) in tissues (r[Tumor] = 0.566, p[Tumor] = 7.64e−5; r[NAT] = 0.910, p[NAT] < 2.2e−16; [72]Figure S1B). Furthermore, the pathway enrichment analysis for the transcriptomic comparison between EOCRC and LOCRC showed that the top 10 significant Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in either tumors or NATs included pathways related to aging-associated diseases ([73]Figure S1C). These findings highlight the ability to identify EOCRC-specific molecular features at multi-omics layers. Figure 1. [74]Figure 1 [75]Open in a new tab Overview of population features and multi-omics profiling (A) Multi-omics profiling for designedly recruited NJCRC patients. The histogram plot indicates the age distribution of candidate patients with colorectal cancer for omics analyses; the density plot presents the age distribution of overall patients with colorectal cancer in NJCRC. For the definition of age-of-onset-subgroup-specific molecules, omics molecules located in the yellow pie plot with dashed lines are specific to EOCRC. BioRender supports the icon. (B) Illustration of 79 paired colorectal cancer samples used in the individual omics measurements. The genome, transcriptome, epigenome (DNA methylation), and proteome are colored blue/green (WGS/WES), orange, yellow, and gray, respectively. For each omics dataset, the top panel represents tumor tissues, and the bottom panel represents NATs from the same donor. Genomic alterations in EOCRC tumors There were a total of 19,251 somatic protein altering events, namely 17,442 somatic single-nucleotide variants (SNVs) and 1,809 somatic insertions or deletions (indels), in 9,815 genes ([76]Table S2A) according to a combination of whole-genome sequencing (WGS) and whole-exome sequencing (WES). At the gene level, we identified 11 significantly mutated genes (SMGs) in NJCRC tumors ([77]Figure 2A; [78]Table S2B), including 7 well-defined driver genes of colorectal cancer[79]^13 (i.e., TP53 [65.8%], APC [63.3%], KRAS [39.2%], FBXW7 [11.4%], RNF43 [10.1%], TCF7L2 [10.1%], and ACVR2A [6.3%]; [80]Figures 2A, [81]S2A, and S2B; [82]Table S2B). Of note, NJCRC tumors showed significantly lower mutation frequencies of APC and FBXW7 than those from the TCGA European cohort (APC: 63.3% versus 90.7%, p = 1.68e−7; FBXW7: 11.4% versus 35.8%, p = 2.49e−5; [83]Figure S2B) and marginally lower mutation frequencies of TP53 and KRAS (TP53: 65.8% versus 77%, p = 0.054; KRAS: 39.2% versus 50.4%, p = 0.091). In addition, BRAF and PTEN were found to have extremely low mutation frequencies in NJCRC tumors (BRAF: 3.8% and PTEN: 5.1%; [84]Table S2A). These results were consistent with a previous ancestry comparison between Asian and European subjects.[85]^14 In addition, although no additional SMGs were identified separately in EOCRC and LOCRC tumors ([86]Figure 2B; [87]Table S2B), three well-known driver genes (APC, TP53, and KRAS) were shared across subgroups.[88]^5^,[89]^6 The statistical power for detecting SMGs across each group was more than 80% ([90]Table S2B). Figure 2. [91]Figure 2 [92]Open in a new tab Genomic alteration in EOCRC tumors from NJCRC (A) Mutation spectrum of 11 SMGs. (B) Venn diagrams presenting the overlap of SMGs among groups of all samples, EOCRC, and LOCRC. (C) Differential mutational features between age-of-onset subgroups of EOCRC and LOCRC, including TMB, proportions of six mutational subtypes, and mutational signatures by COSMIC. p value was calculated via Wilcoxon rank-sum test. (D and E) Somatic copy-number alterations in colorectal tumors. UpSetR plots show the overlap of somatic copy-number alterations (SCNAs) at the arm level (D) and the focal level (E, left) across groups of all samples, EOCRC, and LOCRC. The bar plot shows the mutation rate of EOCRC-specific SCNAs at the focal level (E, right). SCNA includes amplification (Amp) and deletion (Del). In terms of the global genomic alterations, we observed a significantly higher TMB in EOCRC tumors than in LOCRC tumors (median = 3.92/Mb versus 2.63/Mb, p = 0.031; [93]Figure 2C) but a non-significant difference in microsatellite instability (MSI) status or hypermutation status ([94]Table S2C). For mutational subtypes of transition and transversion in [95]Figure 2C and [96]Table S2C, EOCRC tumors frequently had more mutations of the T > G subtype (p = 2.74e−4), but fewer of the C > A (p = 0.029) and C > G subtypes (p = 0.040), which could possibly contribute to mutational signatures in the Catalog of Somatic Mutations in Cancer (COSMIC) with increased proportions of signatures 3 (associated with failure of DNA repair), 9 (attributed to polymerase η), 17 (unknown), and 28 (unknown) but decreased proportions of signature 1 (related to age of cancer diagnosis). Regarding somatic copy-number alterations (SCNAs), 11 amplifications and 7 deletions at the arm level were identified across 79 tumors ([97]Figures 2D and [98]S2C; [99]Table S2D), including previously reported arm amplifications of 1q, 7p and q, 8p and q, 13q, and 20p and q, as well as deletions of 17p and q and 18p and q.[100]^11 In age-of-onset subgroups, we observed distinct SCNAs between EOCRC and LOCRC tumors ([101]Figures 2D and [102]S2C; [103]Table S2D), including some arm-level SCNAs prone to appear in young tumors, such as two well-documented deletions (14q and 15q).[104]^15^,[105]^16 Similarly, focal SCNA events displayed differential patterns across each group ([106]Figures 2E, [107]S2D, and S2E; [108]Table S2E). EOCRC-specific molecular events across epigenomic, transcriptomic, and proteomic levels At a global view of DNA methylation, we found a higher but non-significant occurrence of the CpG island methylator phenotype (CIMP) in EOCRC tumors (66.7%) than in LOCRC tumors (47.4%; [109]Table S3), similar to previous studies.[110]^5 Furthermore, we noted a dramatic discordance of DNA methylation in EOCRC tumors that the correlation of DNAmAge and chronological age at diagnosis remained significant within LOCRC tissues (r[Tumor] = 0.579, p[Tumor] = 0.009; r[NAT] = 0.717, p[NAT] = 5.45e−4) and EOCRC NATs (r[NAT] = 0.837, p[NAT] = 3.52e−7) but diminished within EOCRC tumors (r[Tumor] = −0.124, p[Tumor] = 0.565; [111]Figure 3A). Further examination of DNA methylation patterns revealed evident and site-specific differentially methylated positions ([112]Figures S3A and S3B) and regions ([113]Figure S3C) derived from each group, and the following pathway enrichment analysis disclosed 52 significant KEGG pathways harboring six pathways specific to EOCRC that included two cancer pathways (colorectal cancer and endometrial cancer; [114]Figure S3D). Figure 3. [115]Figure 3 [116]Open in a new tab EOCRC-specific molecular events across epigenomic, transcriptomic, and proteomic levels (A) Correlations between chronological age at diagnosis of colorectal cancer and DNAmAge across tumors and NATs stratified by age-of-onset subgroups. Pearson’s correlation test was used to calculate correlation coefficients (r) and p values. (B) Differences in abundances of six immune cell types between EOCRC and LOCRC. p value was calculated via Wilcoxon rank-sum test. (C) Heatmap of TMB and TIME in four clusters at “hot,” “pre-hot,” “cold,” and “high TMB” immunophenotypes. K-means clustering by column is 4, accompanied by hierarchical clustering. (D) Differential distributions of four immunophenotypes in age-of-onset subgroups. (E) Pathway enrichment analysis for the age-related proteome. A total of 1,192 proteins were assigned to age-related proteins via differential expression analysis of the proteome between EOCRC and LOCRC tumors at p < 0.05. (F) Differential expression pattern of platinum drug resistance-related genes. All genes existed in both transcriptomic and proteomic comparisons between EOCRC and LOCRC tumors. ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, calculated via t test; ns, not significant; FC, fold change in LOCRC versus EOCRC. The tumor immune microenvironment (TIME) decomposed from transcriptomic profiles showed significantly higher immune infiltration in EOCRC tumors than in LOCRC tumors, which was probably attributed to immunosenescence[117]^17 ([118]Figure 3B; [119]Table S3). Considering that both TMB and TIME have been explored as independent biomarkers of immunotherapy efficacy,[120]^18^,[121]^19 we performed hierarchical clustering to refine tumor subtypes and identified four distinct immunophenotypes, namely, “hot,” “pre-hot,” “cold,” and “high TMB” ([122]Figure 3C), as well as a dramatically high proportion of hot tumors in EOCRC (60%; [123]Figure 3D). Notably, these immunophenotypes could dramatically stratify 5-year survival in patients with EOCRC (p = 0.003) as validated in the TCGA cohort ([124]Figures S3E–S3G). In addition, we observed a similar distribution of consensus molecular subtypes (CMS) subgroups at 17.7%, 40.5%, 19%, 21.5%, and 1.3% for CMS1, CMS2, CMS3, CMS4, and CMS1/CMS2, respectively, as reported previously,[125]^12 but no significant differences between EOCRC and LOCRC tumors were found ([126]Table S3), which differed from a recent study.[127]^20 Subsequently, among differentially expressed genes generated by comparing tumors versus NATs across each group ([128]Figure S3H), 130 genes were specifically assigned to EOCRC (47 upregulated and 83 downregulated; [129]Figure S3I), and the corresponding enrichment analysis indicated six EOCRC-specific KEGG pathways ([130]Figure S3J), including fatty acid degradation (FAD), mismatch repair, and cell adhesion molecules (CAMs). Regarding the proteome, more than 100 differentially expressed proteins between tumors and NATs distinguished each group ([131]Figure S3K) and included four downregulated proteins with EOCRC specificity ([132]Figure S3L). Intriguingly, among the 23 significant KEGG pathways enriched in differential proteins ([133]Figure S3M), two EOCRC-specific KEGG pathways were consistent with those found in the transcriptomic comparison (CAMs and FAD; [134]Figure S3J), sharing two downregulated genes in EOCRC tumors (ADH1B in FAD, p[mRNA] = 1.96e−10 and p[Protein] = 6.67e−6; CADM3 in CAMs, p[mRNA] = 2.06e−13 and p[Protein] = 6.52e−6; [135]Figure S3N). In addition, we profiled the age-related proteome by comparing EOCRC and LOCRC tumors, and as shown in [136]Figure 3E, the longevity regulating pathway was enriched, similar to the transcriptomic findings ([137]Figure S1C). Notably, 13 genes linked to a clinical management pathway of platinum drug resistance ([138]Figure 3E) displayed inconsistent expression patterns, and most genes were more highly expressed at the mRNA level but less expressed at the protein level in EOCRC tumors relative to LOCRC tumors ([139]Figure 3F; [140]Table S3). Integrative omics architecture for EOCRC and clinical therapeutics Next, we evaluated cis- and trans-effects of SCNAs on both mRNA and protein abundances. For cis-effects of SCNAs on mRNA abundance, we observed strong positive correlations along the diagonal ([141]Figure S4A). Most diagonal signals corresponded to arm-level changes, consistent with previous studies (e.g., chromosomes 7, 13, and 20; [142]Table S2D).[143]^11^,[144]^21 Additionally, we confirmed a previously reported trans-acting SCNA hotspot on chromosome 20q at the mRNA level ([145]Figure S4A). In contrast, both cis- and trans-effects of SCNAs on protein abundance were considerably sparse ([146]Figure S4A), probably due to the limited number of proteomic samples. Notably, each age-of-onset subgroup displayed similar correlation patterns, but cis-/trans-effects were apparently dampened in EOCRC tumors ([147]Figure 4A). Furthermore, in the annotation of the most amplified region of 20q13.12 specific to EOCRC ([148]Figure 2E), we found a correlation hotspot at the mRNA level that disappeared at the protein level ([149]Table S2F). Figure 4. [150]Figure 4 [151]Open in a new tab Integrative omics architecture for EOCRC and clinical therapeutics (A) Correlations between SCNAs and the abundance of mRNA and protein across age-of-onset subgroups of EOCRC (left) and LOCRC (right). Significant positive and negative correlations are indicated in red and green, respectively (p < 0.01). SCNA-driven cis-effects appear as the red diagonal line; trans-effects appear as vertical red and green lines. The accompanying histograms show the number of significant cis- and trans-events (p < 0.01) corresponding to the indicated genomic loci (blue bars upward) as well as the overlap between SCNA-mRNA and SCNA-protein events (black bars downward). (B) Gene-wise correlation of mRNA and protein abundances in tumors and NATs across age-of-onset subgroups of EOCRC (left) and LOCRC (right). Correlation coefficient was calculated via Spearman’s correlation analysis. (C) Pathway enrichment annotation for significant positive mRNA-protein pairs within EOCRC (left) and LOCRC (right) tumors. (D) Profiling of differential expression patterns of immune-related genes. A total of 223 of 770 genes were shared in both transcriptomic and proteomic datasets. Volcano plots indicate the comparison between tumors of EOCRC and LOCRC at transcriptomic (left) and proteomic (right) levels. Brown lines represent criteria of p < 0.05 (calculated via t test) and log2FC >1 for significantly differential genes, where upregulated genes in LOCRC tumors are colored red and upregulated genes in EOCRC tumors are colored blue. Other genes are colored gray. To understand the regulatory relationship between transcriptomic and proteomic processes, we calculated gene-wise (inter-sample) and sample-wise (intra-sample) correlations. Gene-wise mRNA-protein correlations displayed striking differences within tumors (r[mean] = 0.229) and NATs (r[mean] = 0.026) ([152]Figure S4B), consistent with previous studies,[153]^11^,[154]^21 and each age-of-onset subgroup had a similar correlation (EOCRC: r[Tumor] = 0.256 versus r[NAT] = 0.035; LOCRC: r[Tumor] = 0.291 versus r[NAT] = 0.108; [155]Figure 4B), which indicated more active translational efficiency or less degradation of proteins underlying posttranscriptional mechanisms that occurred in colorectal tumors. In addition, there was a consistently positive correlation in fold changes of mRNA versus protein across each subgroup ([156]Figure S4C). Furthermore, 29.36% and 41.59% of mRNA-protein pairs had significant positive Spearman’s correlations in EOCRC and LOCRC tumors, respectively (p < 0.05; [157]Figure 4B), along with distinct network clusters where DNA damage repair pathways and metabolic networks were markedly enriched in EOCRC tumors ([158]Figures 4C and [159]S4D). In terms of sample-wise mRNA-protein correlations, we detected a fairly consistent correlation in tumors (r = 0.294) and NATs (r = 0.221), but these correlations were not associated with age of onset ([160]Figure S4E). For potential therapeutic targets of colorectal cancer, we attempted to assess gene expression patterns for cancer testis (CT) antigens,[161]^22 pan-essential genes,[162]^23 and a PanCancer Immune Profiling panel[163]^24 using omics datasets of this study. Interestingly, some genes had differential expression at both mRNA and protein levels in comparisons of tumors versus NATs and EOCRC versus LOCRC tumors ([164]Figures S4F–S4H and [165]4D). In particular, one gene from the immune panel, CCL21, was consistently significantly increased in EOCRC tumors compared with LOCRC tumors at both mRNA and protein levels, despite its lower expression in tumors compared with NATs ([166]Figure 4D). LMTK3 mutation specific to EOCRC independently triggers molecular alterations in tumors In addition to SMGs, we found nine differentially mutated genes (DMGs) specifically in EOCRC tumors compared with LOCRC tumors (p < 0.01; [167]Figure 5A; [168]Table S4), which had a significant performance in distinguishing the two age-of-onset subgroups with an area under the receiver operating characteristic curve (AUC) of 0.893 ([169]Figure S5A). Importantly, LMTK3 (p = 0.016) and GPR98 (p = 0.019) were also mutated in EOCRC tumors (versus LOCRC tumors) in the International Cancer Genome Consortium (ICGC) Chinese cohort ([170]Figure S5B; [171]Table S4) but not in Europeans ([172]Figure 5B; [173]Table S4). Accordingly, five genes exhibited ancestry-specific higher mutation rates in EOCRC tumors from the NJCRC cohort than the TCGA European cohort; one of these genes was LMTK3 (20% in NJCRC versus 2.78% in TCGA, p = 0.041; [174]Figure 5C). Collectively, we focused on the consequences of LMTK3 mutation on molecular phenotypes and clinical outcomes. Figure 5. [175]Figure 5 [176]Open in a new tab Identification of LMTK3 specifically mutated in EOCRC tumors alongside the multi-omics annotation (A–C) Differential mutation rates of 9 DMGs between age-of-onset subgroups in the NJCRC (A) and TCGA European colorectal cancer cohorts (B) and between EOCRC tumors of the two cohorts (C). p values (∗∗p < 0.01) and odds ratios were calculated via Fisher’s exact test. NA, not available because of zero mutation in the TCGA cohort. (D) Mutational lolliplot of LMTK3 mutations in colorectal tumors of NJCRC (upward) and ICGC (downward) cohorts. Mutations are shaped in circles and colored differently. (E) Significantly co-occurring genes with LMTK3 in NJCRC tumors. DDR, DNA damage response. p value was derived from Fisher’s exact test. (F) A cartoon comparison plot of LMTK3 protein structure between wild type and mutant. Wild-type LMTK3 is colored green, and the mutant is colored light blue. The red stick indicates the amino acid with/without LMTK3 mutations. (G–K) Differences in TMB (G), somatic signature (H), immune infiltration (I), immunophenotype (J), and LMTK3 expression (K) between LMTK3 mutant and wild-type tumors in NJCRC. p value was calculated via Wilcoxon rank-sum test. Seven mutations (six missense and one frameshift) existed in LMTK3 derived from six EOCRC tumors harboring over 90% statistical power ([177]Figure 5D; [178]Table S2B). They were mutually mutated with 157 genes (p < 0.05; [179]Table S5A), including five colorectal cancer driver genes[180]^13 ([181]Figure 5E), and altered LMTK3 protein structure ([182]Figure 5F). LMTK3 mutant tumors had a significantly higher TMB (p = 0.036; [183]Figure 5G; [184]Table S5B) and increased proportions of signatures 3, 9, 17, and 28 ([185]Figure 5H; [186]Table S5B) but decreased signature 1 ([187]Table S5B). No significant differences in MSI status or SCNAs were elicited by mutant LMTK3 ([188]Table S5B). Given that signature 3 is strongly associated with the failure of DNA repair by homologous recombination,[189]^25 we further evaluated co-occurring mutation patterns of 450 DNA damage response (DDR) genes.[190]^26 Among them, both SHPRH (p = 0.014) and POLE (p = 0.027) were concurrently mutated with LMTK3 ([191]Figure 5E; [192]Table S5A), but their mRNA levels had no correlation and were not affected by their own mutations ([193]Figures S5C and S5D). Regarding the epigenome, LMTK3 mutant tumors had an increased but non-significant proportion of low CIMPs, from 56.8% to 66.7% ([194]Table S5B). In addition, LMTK3 mutations did not alter the difference between DNAmAge and chronological age at diagnosis in EOCRC tumors ([195]Figure S5E). Among the 37 CpGs in LMTK3 ([196]Figure S5F), cg13996617 located in the gene body was significantly hyper-methylated in tumors, especially in EOCRC tumors (p = 2.64e−11; [197]Figure S5G), but it was not correlated with LMTK3 expression or mutation ([198]Figure S5H). In the transcriptome, LMTK3 mutations were independently associated with increased proportions of CD8^+ T cells and macrophages (p = 0.012 and 0.008, respectively; [199]Figure 5I; [200]Table S5B) as well as tumors with a hot immunophenotype ([201]Figure 5J) but not the expression of corresponding immune cell markers ([202]Figure S5C). In addition, LMTK3 mutant tumors had an increased but non-significant prevalence of CMS1 (33.3%; referred to as a microsatellite unstable/immune subtype) relative to the wild type (16.7%; [203]Figure S5I). Furthermore, among 5,887 mRNA-protein pairs in tumors, 229 were significantly affected by LMTK3 mutations, intriguingly including 161 coordinately and 68 exclusively expressed genes ([204]Figure S5J). These coordinate genes were enriched into 25 significant pathways, including four well-defined oncogenic pathways involving colorectal cancer development and immunotherapy, namely antigen processing and presentation,[205]^27 the transforming growth factor β (TGF-β) signaling pathway,[206]^28 the interleukin-17 (IL-17) signaling pathway,[207]^29 and the mRNA surveillance pathway,[208]^30 and clustered into five informative networks ([209]Figure S5K). Moreover, LMTK3 expression was dramatically elevated via LMTK3 mutations (p = 0.007; [210]Figure 5K) but not via SCNA ([211]Figure S5L) or DNA methylation ([212]Figure S5H) in LMTK3. LMTK3 differs in gene expression as a promising biomarker and target in clinical implications LMTK3 was significantly upregulated in the majority of tumors ([213]Figure S6A). In colorectal cancer, regardless of onset age, both mRNA and protein abundances of LMTK3 were highly expressed in tumors ([214]Figures 6A and 6B) but were not associated with 5-year survival (as was mutation; [215]Figure S6B) or tumor stage ([216]Figure S6C). Similar to a previous study in breast cancer,[217]^31 we observed significant mutual correlations among cascade genes of LMTK3, PRKCA-AKT1-FOXO3 signaling, and ESR1 in NATs (r[LMTK3-AKT1] = 0.56, p = 1.07e−7; r[LMTK3-FOXO3] = 0.42, p = 1.43e−4; r[LMTK3-ESR1] = 0.34, p = 1.92e−3; [218]Figure S6D) that were attenuated in tumors; however, LMTK3 mutations did not alter the gene expression pattern of PRKCA-AKT1-FOXO3 signaling ([219]Figure S5C). We further dissected the LMTK3 expression pattern in TIME of colorectal tumors at the single-cell level ([220]Figure S6E) and intuitively observed that LMTK3 was predominantly expressed in epithelial cells and significantly increased in EOCRC tumors, especially within B cells and stromal cells ([221]Figure 6C), but not in corresponding cell subtypes ([222]Figure S6F). Figure 6. [223]Figure 6 [224]Open in a new tab Differential expression pattern and clinical implication of LMTK3 (A) Differential expression of LMTK3 between colorectal tumors and NATs. p value was calculated via Wilcoxon rank-sum test. (B) Semiquantitative analysis of the immunohistochemical staining intensity of LMTK3 protein between colorectal cancer tissues and normal mucosal tissues. p value was calculated via Wilcoxon rank-sum test. Representative immunohistochemical images of LMTK3 protein expression in colorectal tumors (Ts; left) and normal mucosal tissues (Ns; right). (C) LMTK3 expression pattern in the single-cell transcriptome of colorectal cancer tissues. The t-distributed stochastic neighbor embedding (tSNE) shows 47,285 cells of colorectal cancer tissues between age-of-onset subgroups across six major cell types (left). Cells expressing LMTK3 are colored purple in the tSNE plot using the normalized value after canonical correlation analysis (CCA) correction (right). Scatterplots indicate differential expression of LMTK3 in tumors between age-of-onset subgroups across six cell types (bottom). ∗p < 0.05 and ∗∗∗p < 0.001 were calculated via Wilcoxon rank-sum test. ns, no significance. Discussion The study characterizes the molecular landscape with a focus on patients of Chinese ancestry with EOCRC and indicates the value of multi-omics integration in disclosing molecular heterogeneity in age-of-onset-specific tumorigenesis and demonstrates the utility of omics measurements in precision therapeutic development. We initially observed profound aggressive features for EOCRC tumors underlying the mutational signature 1, oncogenic SCNAs (14/15q and 20q13.12), DNAmAge, and carcinogenic pathway enrichment across each molecular layer, which indicated the feasibility and necessity of identifying molecular phenotypes in EOCRC tumorigenesis. To fully utilize the omics information, we launched transcriptomic and proteomic analyses and found predominant signaling processes of FAD, CAMs, and DNA repair involved in EOCRC development. ADH1B, a key metabolic enzyme of alcohol prioritized in FAD pathways, was concordantly downregulated at both mRNA and protein levels in EOCRC tumors, which might underlie the inverse association of the fatty acid metabolic signature capturing alcohol consumption alongside colorectal cancer development.[225]^32^,[226]^33 CADM3, a key protein prioritized in the CAM pathway, was decreased in EOCRC tumors, consistent with its suppressive role in the tumorigenicity of colon cancer cells.[227]^34 A significant increase in signature 3, related to DNA repair, was observed in EOCRC tumors, followed by pathway enrichment and network clustering of DNA repair processes, increased immune infiltration, and a hot immunophenotype, indicating the prominent inter-play of immune-inflammation response and DDR in the young tumor microenvironment.[228]^35^,[229]^36 These molecular phenotypes help to enrich our etiological understanding of EOCRC. These distinct omics profiles provide valuable insights into the development of new therapeutic strategies for EOCRC. TMB is an important independent genomic biomarker indicating the likelihood of a response to immunotherapy, including in colorectal cancer.[230]^37^,[231]^38 EOCRC tumors have a significantly high TMB and remarkably unique pathway enrichment in mismatch repair, suggesting a therapeutic preference for immunotherapy.[232]^39 In addition, we profiled the gene expression spectrum for platinum drug resistance, CT antigens, pan-essential genes, and an immune panel to identify potential drug targets. Importantly, most genes displayed inconsistent expression patterns at both mRNA and protein levels in the comparison between age-of-onset subgroups, which may be attributed to tighter posttranscriptional regulation in young tumors. For example, CCL21 may play important roles in clinical implications of EOCRC, as previous evidence supports that cancer therapy, by combining CCL21, serves as an effective strategy for colorectal cancer immunotherapy through inducing antitumor immune responses.[233]^40^,[234]^41 Hence, these discrepant genes in young tumors prompted us to take a precision approach to drug development, treatment selection, and therapeutic management for EOCRC. LMTK3 varied in mutation frequency across age of onset and ancestry, which potentially triggered molecular alterations in EOCRC tumorigenesis. We found that the LMTK3 mutation not only co-occurred with driver genes but also upregulated gene expression (including its cognate gene) in defined oncogenic pathways, consistent with agonistic features of EOCRC tumors. Intriguingly, SHPRH and POLE, two DNA repair genes, were concurrently mutated with LMTK3, consistent with the increased signature 3 by mutant LMTK3. Specifically, POLE mutations have also been reported with increased frequency in Chinese patients and younger tumors.[235]^42^,[236]^43 Moreover, pathogenic characteristics of large T cell infiltration and a higher TMB driven by POLE mutation[237]^42^,[238]^43 were in accordance with those driven by LMTK3 mutation. Notably, LMTK3 is a druggable target for cancer therapy (including colorectal cancer), independent of its RNA expression, acting as a protein kinase modulating ERα by or not by PRKCA-AKT1-FOXO3 signaling[239]^31 and producing a small molecule underlying the latest druggable genome evidence at the tier 3 priority for cancer treatments.[240]^44^,[241]^45 Collectively, mutation-relevant findings could be used to interpret the biological mechanism of DNA repair and immune-inflammation response in early-onset colorectal tumorigenesis and further support the discovery of new therapeutic targets. In summary, we performed a molecular characterization of colorectal cancer through genomic, epigenomic, transcriptomic, and proteomic analyses, when integrated appropriately, to delineate the molecular complexity underlying differences in the age of onset and ancestry. These discoveries bring insights that could be used to mitigate disparities among patients with colorectal cancer via the prospective exploration of precision oncology in drug development and therapeutic modalities. Limitations of the study We acknowledge several limitations in this study. First, we noted some obvious but not significant molecular alterations, such as increased MSI and low CIMP driven by the early onset or LMTK3 mutation. These results are probably due to the limited number of sequenced samples. Second, we did not profile posttranslational modifications, such as the phosphoproteome, and did not perform trans-omics-based subtype analysis because of a low number of, consistency of, and overlapping of samples of DNA methylation and proteomic profiling. These limitations restrict the synergy of multi-omics analysis to provide a unified view of the molecular heterogeneity of EOCRC. Third, there was a lack of survival analysis in NJCRC due to the short-term follow up, and the absence of experimental validation of LMTK3 mutation or expression involving age of onset and colorectal tumor aggressiveness. Therefore, larger sample sizes that increase the power of identifying molecular events; more in-depth profiling that includes the test in the single-cell level; deep clinical phenotyping that provides clinical insights; novel analytic frameworks that enrich the knowledge of cancer etiology; and functional experiments that elaborate cancer pathophysiology are warranted in future work. STAR★Methods Key resources table REAGENT or RESOURCE SOURCE IDENTIFIER Antibodies __________________________________________________________________ Mouse monoclonal anti-LMTK3 Antibody (I-17) Santa Cruz Biotechnology Cat# sc-100418; RRID:AB_2234692 __________________________________________________________________ Biological samples __________________________________________________________________ Tumor and normal adjacent tissues from human colorectal cancer patients Nanjing ColoRectal Cancer (NJCRC) cohort This paper Tissue microarray of colorectal cancer Shanghai Outdo Biotech Co., Ltd. Cat# HColA180Su19 __________________________________________________________________ Critical commercial assays __________________________________________________________________ QIAamp DNA Mini Kit QIAGEN Cat# 51306 TruSeq Nano DNA HT Library Prep Kit Illumina Cat# FC-121-4003 SureSelectXT Reagent Kit Agilent Cat# G9641B SureSelectXT Human All Exon V6 Kit Agilent Cat# 5190-8865 Infinium HumanMethylation EPIC BeadChip Kit Illumina Cat# WG-317-1001 EZ DNA Methylation Kit Zymo Research Cat# D5001 AllPrep DNA/RNA Mini Kit QIAGEN Cat# 80204 __________________________________________________________________ Deposited data __________________________________________________________________ Colorectal cancer whole-genome sequencing data This study [242]https://db.cngb.org/cnsa/Accession number: CNP0004009 Colorectal cancer whole-exome sequencing data This study [243]https://db.cngb.org/cnsa/Accession number: CNP0004009 Colorectal cancer whole-genome DNA methylation data This study [244]https://db.cngb.org/cnsa/Accession number: CNP0004009 Colorectal cancer whole-transcriptome sequencing data This study [245]https://db.cngb.org/cnsa/Accession number: CNP0004009 Colorectal cancer mass-spectrometry-based proteome data This study [246]https://db.cngb.org/cnsa/Accession number: CNP0004009 Colorectal cancer single-cell transcriptome data Lee et al.[247]^35 [248]https://www.ncbi.nlm.nih.gov/geo/GEO: [249]GSE132465 ICGC mutation data ICGC [250]https://dcc.icgc.org TCGA mutation data TCGA [251]https://portal.gdc.cancer.gov TCGA expression data TCGA [252]https://portal.gdc.cancer.gov Pan cancer expression data PanCanAtlas [253]https://gdc.cancer.gov/about-data/publications/pancanatlas __________________________________________________________________ Software and algorithms __________________________________________________________________ BWA (version 0.7.17-r1188) Li et al.[254]^46 [255]http://bio-bwa.sourceforge.net SAMtools (version 0.1.18) Li et al.[256]^47 [257]http://www.htslib.org Picard (version 2.9.0) Broad Institute [258]http://broadinstitute.github.io/picard GATK (version 3.8.0) DePristo et al.[259]^48 [260]https://gatk.broadinstitute.org/hc/en-us MuTect2 (version 1.1.4) Cibulskis et al.[261]^49 [262]https://gatk.broadinstitute.org/hc/en-us Strelka (version 1.0.13) Saunders et al.[263]^50 [264]https://github.com/Illumina/strelka ANNOVAR Wang et al.[265]^51 [266]https://annovar.openbioinformatics.org/en/latest MutSigCV (version 1.41) Lawrence et al.[267]^52 [268]https://software.broadinstitute.org/cancer/cga/mutsig MSMuTect (version 1.0) Maruvka et al.[269]^53 [270]https://github.com/getzlab/MSMuTect Control-FREEC (version 6.2) Boeva et al.[271]^54 [272]http://boevalab.inf.ethz.ch/FREEC/ GISTIC2 (version 2.0.23) Mermel et al.[273]^55 [274]https://github.com/broadinstitute/gistic2 iScan System Illumina [275]https://www.illumina.com/systems/array-scanners/iscan.html DNA Methylation Age Calculator Horvath[276]^56 [277]http://dnamage.genetics.ucla.edu HISAT2 (version 2.0.4) Kim et al.[278]^57 [279]http://daehwankimlab.github.io/hisat2 StringTie (version 1.3.4) Pertea et al.[280]^58 [281]https://ccb.jhu.edu/software/stringtie Cufflinks (version 2.1.1) Roberts et al.[282]^59 [283]http://cole-trapnell-lab.github.io/cufflinks TIMER Li et al.[284]^60 [285]http://cistrome.org/TIMER ESTIMATE Yoshihara et al.[286]^61 [287]https://bioinformatics.mdanderson.org/estimate Proteome Discoverer 2.2 Thermo Fisher [288]https://assets.thermofisher.com/TFS-Assets/CMD/manuals/Man-XCALI-9 7808-Proteome-Discoverer-User-ManXCALI97808-EN.pdf OmicsX Pan et al.[289]^62 [290]http://bioinfo.wilmer.jhu.edu/OmicsX/ R package deconstructSigs Rosenthal et al.[291]^63 [292]https://github.com/raerose01/deconstructSigs R package ChAMP Tian et al.[293]^64 [294]https://www.bioconductor.org/packages/release/bioc/vignettes/ChAMP /inst/doc/ChAMP.html R package ConsensusClusterPlus Wilkerson et al.[295]^65 [296]https://bioconductor.org/packages/release/bioc/html/ConsensusClust erPlus.html R package CMSclassifier Guinney et al.[297]^12 [298]https://github.com/Sage-Bionetworks/CMSclassifier R package corrplot N/A [299]https://cran.r-project.org/web/packages/corrplot/index.html R package clusterProfiler Yu et al.[300]^66 [301]https://bioconductor.org/packages/release/bioc/html/clusterProfile r.html R package factoextra N/A [302]https://cran.r-project.org/web/packages/factoextra/index.html R package Seurat Butler et al.[303]^67 [304]https://satijalab.org/seurat R package RCAv2 Li et al.[305]^68 [306]https://github.com/prabhakarlab/RCAv2 R package cancerSeqStudy Bailey et al.[307]^69 [308]https://github.com/KarchinLab/cancerSeqStudy R package UpSetR Conway et al.[309]^70 [310]https://github.com/hms-dbmi/UpSetR ProteinPaint St. Jude Children’s Research Hospital. [311]https://proteinpaint.stjude.org/ PyMOL Schrödinger, Inc. [312]https://pymol.org/2/ [313]Open in a new tab Resource availability Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Meilin Wang ([314]mwang@njmu.edu.cn). Materials availability This study did not generate new, unique reagents. Experimental model and subject details Specimens and clinical data The study protocol was reviewed and approved by the Ethics Committee of Nanjing Medical University. Colorectal cancer samples for omics measurements were designedly selected from the Nanjing ColoRectal Cancer (NJCRC) cohort.[315]^71^,[316]^72 The NJCRC cohort belongs to the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) and comprises 1,316 cases in total with 366 cases of early-onset colorectal cancer (EOCRC; 27.81%). Briefly, the NJCRC cohort included newly diagnosed, untreated colorectal cancer patients undergoing surgical resection. Basic patient demographics were collected via a face-to-face interview with subjects who provided written informed consent; and the corresponding clinical information regarding the status of surgery and relevant diagnostic information were collected using case report forms. In this study, 79 paired tumor tissues and normal adjacent tissues (NATs) were designedly selected for omics detection; the tumor tissues were from sporadic tumors without hereditary syndromes and histopathologically confirmed by pathologists and verified to have at least 70% tumor cell nuclei. Publicly available datasets We requested available colorectal cancer omics datasets from two databases, namely, International Cancer Genome Consortium (ICGC; [317]https://dcc.icgc.org/, version 2019_08_31) and The Cancer Genome Atlas (TCGA; [318]https://portal.gdc.cancer.gov/, version 2016_01_28), for global or candidate molecular evaluation as appropriate. In terms of the evaluation of ancestral disparity, we retained samples with explicit ancestry information specific to Asian (Chinese) or European populations. Therefore, 325 Chinese individuals with colorectal cancer cases (COCA-CN from ICGC) and 226 European individuals with colorectal cancer (COAD and READ combined from TCGA) were included for further analysis. Demographic characteristics are shown in [319]Table S1. It is noteworthy that the proportion of EOCRC patients in the overall NJCRC (27.81%), ICGC (27.69% in Chinese), and TCGA (15.93% in Europeans) cohorts was over 10% less than that recruited in this study (37.97%), which indicated adequate statistical power to observe molecular features in consideration of cost-effectiveness.[320]^73 In addition, the Pan-Cancer Atlas initiative ([321]https://gdc.cancer.gov/about-data/publications/pancanatlas) provided transcriptomic datasets across 32 cancer types for the evaluation of candidate gene expression patterns. Method details Whole-genome sequencing (WGS) and whole-exome sequencing (WES) We carried out both WGS and WES paired-end sequencing reactions for 27 and 52 pairs of tumors and NATs, respectively, in collaborations with Novogene Co., Ltd. (Beijing, China). Briefly, DNA was extracted from fresh frozen tissue blocks using a QIAGEN QIAamp DNA Mini Kit. For WGS, library preparation was performed using at least 0.8 μg of genomic DNA per sample with a high molecular weight (>20 kb single band) via the TruSeq Nano DNA HT Library Prep Kit. Pooled DNA libraries were sequenced on an Illumina HiSeq X Ten platform with 150-bp paired-end sequencing. For WES, library construction and whole-exome capture of genomic DNA were performed using the SureSelectXT Reagent Kit (Agilent) and SureSelectXT Human All Exon V6 Kit (Agilent). The captured DNA was then sequenced on an Illumina HiSeq 2500 sequencing platform with 150-bp paired-end sequencing. Genomic data processing The FastQC package was used to assess the quality-score distribution of the sequencing reads. All raw Illumina sequence data were demultiplexed and converted to fastq files, with the subsequent trimming of adaptors, contamination, and low-quality nucleotides to obtain clean data. Paired-end sequencing reads were aligned to the human reference genome (GRCh37) using Burrows-Wheeler Aligner (BWA; version 0.7.17-r1188);[322]^46 then, SAMtools (version 0.1.18),[323]^47 Picard (version 2.9.0; [324]http://broadinstitute.github.io/picard/), and Genome Analysis Toolkit (GATK; version 3.8.0)[325]^48 were used to sort BAM files and perform duplicate marking, local realignment, and base quality recalibration to generate final BAM files for computing the sequence coverage and depth. The average sequence coverage was 56× for 27 tumors and 35× for 27 NATs for WGS, as well as 188× for 52 tumors and 115× for 52 NATs for WES, which were utilized for downstream analyses of somatic mutation, microsatellite instability (MSI), tumor mutation burden (TMB), and somatic copy number alteration (SCNA). Somatic mutation detection MuTect2 (version 1.1.4)[326]^49 and Strelka (version 1.0.13)[327]^50 were used to call somatic single nucleotide variants (SNVs) and small insertions/deletions (INDELs) from the comparison between tumors and NATs, respectively. Somatic calls were then subjected to quality control using the following exclusion criteria: (1) they were found in a panel of normal tissues built by matched normal tissues; (2) they were located in the segmental duplication region marked by the UCSC browser ([328]http://genome.ucsc.edu/); and (3) they were found in the 1000 Genomes Project or Exome Aggregation Consortium with the same mutation direction. The filtered somatic mutations were then annotated with ANNOVAR[329]^51 based on GENCODE v19 for protein altering events. Analyses of significantly mutated gene (SMG) and differentially mutated gene (DMG) The significance of candidate mutations was evaluated using MutSigCV (version 1.41),[330]^52 which considered the global mutation situation, mutation frequency of genes near the mutation site, whether the site was located in a region where the chromosome was easy to open, and other parameters to find genes with a mutation rate higher than the background mutation rate. Genes with a false discovery rate (FDR; q value) less than 0.1 were considered SMGs. DMGs were defined as genes with a significantly differential mutation rate between EOCRC and LOCRC underlying a p value less than 0.01 using Fisher’s exact test. A mutational lolliplot was generated with ProteinPaint ([331]https://proteinpaint.stjude.org/), and the Pfam database ([332]https://pfam.xfam.org/) supports protein domains for candidate genes (e.g., LMTK3). PyMOL software was used to visualize LMTK3 protein structure with or without LMTK3 mutations. Microsatellite instability (MSI) status prediction MSMuTect (version 1.0) was applied to infer the somatic microsatellite level by comparing somatic microsatellite INDEL (MS INDEL) calling.[333]^53 MSMuTect used the Kolmogorov-Smirnov test to identify microsatellite sites with different alleles between tumor and normal pairs. Patients with MS INDELs greater than 40 were grouped into the MSI-high (MSI-H) group, while the others were grouped into the microsatellite stability (MSS) group. Tumor mutation burden (TMB) and hypermutation status calculations TMB, an emerging biomarker of immunotherapy responses,[334]^74 is defined as the number of nonsynonymous somatic mutations (including SNVs and INDELs) per mega base (Mb) of genome examined. The hypermutation status of the tumor was categorized by the threshold TMB > 12/Mb.[335]^75 Mutational signature analysis All mutation data from WES and WGS datasets were converted into a matrix composed of 96 features comprising six substitution patterns (C > A, C > G, C > T, T > A, T > C, and T > G) using each possible 5' (C, A, G, and T) and 3' (C, A, G, and T) context for all samples. Furthermore, the R package deconstructSigs[336]^63 was applied to delineate the proportion of 30 known mutational signatures derived from Catalogue Of Somatic Mutations In Cancer (COSMIC; [337]http://cancer.sanger.ac.uk/cosmic/signatures; v2, March 2015; [338]Table S2) assigned to each tumor tissue. Somatic copy number alteration (SCNA) analysis SCNAs were identified using Control-FREEC (version 6.2),[339]^54 of which amplified or deleted genomic regions were identified using GISTIC2 (version 2.0.23).[340]^55 For the focal level analysis, G-scores were calculated for significant genomic and gene-coding regions on the basis of the frequency and amplitude of amplification or deletion affecting each gene. A default q value threshold (0.25) of GISTIC2 was used to define highly amplified or deleted regions for both arm- and focal-level analysis. Whole-genome DNA methylation detection DNA methylation profiling for 43 pairs of tumors and NATs (including 24 EOCRC and 19 LOCRC) was performed with CapitalBio (Beijing, China) using the Illumina Infinium HumanMethylation EPIC BeadChip, which covered over 850,000 CpG loci across the whole human genome at single-nucleotide resolution. Briefly, the EZ DNA Methylation Kit (Zymo Research) was used to perform sodium bisulfite-treated arrays for genomic DNA. To minimize batch effects, pairs of tumor and normal tissues were randomly distributed to different arrays. The bisulfite-converted genomic DNA was isothermally amplified overnight and fragmented enzymatically. Precipitated DNA using isopropyl alcohol was resuspended in hybridization buffer and dispensed onto BeadChips (8 samples/chip). The hybridization procedure was performed at 48°C overnight using an Illumina hybridization oven. After hybridization, the free DNA was washed away, and BeadChips were processed through a single nucleotide extension followed by immunohistochemistry staining. Finally, we downloaded the manifest files of each array in advance and imaged BeadChips using an Illumina iScan. DNA methylation profile processing The methylation level of each CpG is represented as β values (M/[M+U], ranging from 0 to 1), where M represents the methylated intensity and U represents the unmethylated intensity. In this study, we processed all raw data of the Illumina array following the pipeline of R package ChAMP[341]^64 to perform quality control of the dataset and differential expression analysis as needed. Briefly, low-quality probes were excluded if probes (1) had a detection p value > 0.1; (2) had bead counts < 3 in at least 5% of samples; (3) were non-CpG labeled; (4) were related to single nucleotide polymorphisms; (5) were aligned to multiple locations; and (6) were located on chromosomes X and Y. Ultimately, 744,713 CpG probes were included for type-II probe normalization,[342]^76 which were used for downstream analyses. DNA methylation age estimation The DNA methylation age (DNAmAge), also known as the epigenetic age, was estimated based on DNA methylation profiles using a freely available web-based tool (DNA Methylation Age Calculator; [343]http://dnamage.genetics.ucla.edu/).[344]^56 Briefly, DNA methylation profiles originating from tumors and NATs were reduced to datasets with 28,587 CpGs, which were then analyzed based on DNA sources subjected to “colon”. The output variable of “DNAmAge” was used to determine the correlation with chronological age at diagnosis of colorectal cancer using Pearson’s correlation test. CpG island methylator phenotype (CIMP) calculation The CIMP status of each sample was predicted using CpGs derived from promoter regions with a high standard deviation (SD > 0.2) of the methylation level in tumor tissues and a low methylation level (mean β value < 0.05) in normal tissues.[345]^77 The R package ConsensusClusterPlus was used to perform consensus clustering analysis on the basis of the K-means algorithm.[346]^65 In this study, as limited samples were available for DNA methylation profiling, we grouped samples with hypermethylation patterns as CIMP-H and all others as CIMP-L. Analyses of differentially methylated position/region For the differentially methylated position analysis, R package limma (implemented in ChAMP package) was used to calculate the p value for differential methylation using a linear model, and CpG probes meeting the criteria of log2-fold change (FC) >1 and Bonferroni adjusted p value <0.05 (i.e., p < 0.05/744,713) were considered to be significantly differential CpGs. Differentially methylated region algorithms implemented Bumphunter (deposited in ChAMP package), which first grouped all probes into small clusters (or regions) and then applied a random permutation method to estimate significantly differential candidate regions accompanied by a Bonferroni adjusted p value <0.05 (i.e., p < 0.05/5,674 for all samples, 0.05/4,432 for EOCRC, 0.05/4,294 for LOCRC). Whole-transcriptome sequencing and data processing The transcriptomes of 79 pairs of tumors and NATs were generated in collaboration with Novogene Co., Ltd. (Beijing, China) using the Illumina HiSeq 2500 platform with 150-bp sequencing based on the Standard Illumina RNA sequencing (RNA-Seq) protocol. RNA was extracted from fresh frozen tissue blocks using the QIAGEN AllPrep DNA/RNA Mini Kit following the manufacturer’s instructions. RNA reads were generated, aligned to the GENCODE v19 genome assembly with HISAT2 (version 2.0.4),[347]^57 and quantified with StringTie (version 1.3.4).[348]^58 Cufflinks (version 2.1.1)[349]^59 was then used to quantify the gene expression abundance and calculate the standardized gene expression (represented as fragments per kilobase per million mapped reads [FPKM]) for each sample. In this study, a total of 18,094 protein-coding RNAs were initially identified in expression profiles. When performing quality control underlying criteria of call rate >80% of samples and gene expression level >0.5 FPKM, we included 13,070 genes for the general differential analysis. However, when focusing on evaluating expression or correlation patterns in candidate genes, we retained all available genes, which allowed us to capture low-abundance genes for further analysis, such as LMTK3, which exhibited low expression at the mRNA level but high expression at the protein level. Immune cell infiltration estimation The Tumor IMmune Estimation Resource (TIMER) algorithm was employed to estimate the infiltrations of diverse immune cell types in tumor tissues based on tumor RNA-Seq profiles.[350]^60 A leukocyte gene signature matrix was used as the reference data, including 2,271 signature genes overexpressed in the immune lineage. Batch effects between in-house data and the external reference datasets were removed using the ComBat function from R package sva. Six types of immune infiltrates were estimated in this study, including B cells, CD4^+ T cells, CD8^+ T cells, Neutrophils, Macrophages, and Dendritic cells. The Estimation of STromal and Immune cells in MAlignant Tumours using Expression data (ESTIMATE) algorithm used the transcriptome to calculate the estimated levels of infiltrating stromal and immune cells and estimated tumor purity.[351]^61 The ESTIMATE score infers the tumor purity of tumor tissues. Consensus clustering for transcriptomic data We employed the consensus molecular subtypes (CMS) classifier, exploited by random forest algorithm implemented in the R package CMSclassifier ([352]https://github.com/Sage-Bionetworks/CMSclassifier),[353]^12 to assign the CMS for each sample based on the gene expression matrix. Using a default posterior probability of 0.5 as a threshold for sample classification, we assigned 79 prospective tumors to four CMS subtypes, i.e., CMS1-4.[354]^12 Immunophenotype identification Tumor immunophenotypes were defined by combining TMB and immune infiltration underlying K-means = 4, accompanied by hierarchical clustering within each cluster. The “Hot” immunophenotype was characterized by CD8^+ T cell infiltration, followed by “pre-Hot” with CD4^+ T cell infiltration; in contrast, “Cold” tumors had less immune cell infiltration, and “high TMB” tumors were enriched by the high value of TMB. Differentially expressed gene analysis A two-sided paired t test was performed for each gene expression comparison between two subgroups, such as tumors versus NATs across age-of-onset subgroups, as well as EOCRC versus LOCRC across tumors and NATs. Differentially expressed genes were defined as differential genes with absolute log2FC > 1 and Bonferroni adjusted p value <0.05 (i.e., p < 0.05/13,070) in each comparison. Correlation pattern analysis The R package corrplot was utilized to analyze and visualize the correlation pattern of candidate genes, such as genes from the cascade of LMTK3, PRKCA-AKT1-FOXO3, and ESR1, as well as co-occurring mutant genes of SHPRH, POLE, and LMTK3. The correlation coefficient and p value were calculated via Spearman’s correlation analysis. Tandem mass tag (TMT) detection and processing TMT is a chemical label used for mass spectrometry (MS)-based identification and quantitation of proteins. In this study, we collaborated with Novogene Co., Ltd. (Beijing, China) to detect the proteome in 37 pairs of tumors and NATs. Briefly, a BSA quantitative kit was used to quantify extracted tissue proteins, followed by quality control using SDS-PAGE and Coomassie brilliant blue R-250 staining. Eligible protein samples were labeled using TMT labeling reagents, which were then mixed in equal volumes. For multiple labeled groups, a common reference was created by pooling an equal quantity of each sample. Fractions were separated using an L3000 HPLC system (Rigol). Shotgun proteomics analyses were performed using an EASY-nLC™ 1200 UHPLC system (Thermo Fisher) coupled with a Q Exactive HF (X) mass spectrometer (Thermo Fisher) operating in the data-dependent acquisition (DDA) mode. The raw data of MS detection were named “.raw”. In terms of protein identification and quantitation, Proteome Discoverer 2.2 (Thermo Fisher) was used to retrieve the detected spectra from each run separately against the Homo sapiens database (112,443 sequences). To improve the quality of available proteins, proteins were retained if they met the criteria of Peptide Spectrum Matches (PSMs) with a credibility >99%, contained >1 unique peptide, and had an FDR <0.01. Finally, a total of 9,772 proteins were identified in the TMT platform, and after filtering based on the criteria of call rate greater than 50% in both tumors and NATs, 6,621 proteins (assigned by 6,356 genes) were retained for further analysis. Differentially expressed protein analysis A two-sided paired t test was performed for each protein expression comparison between two groups, such as tumors and NATs across age-of-onset subgroups and EOCRC versus LOCRC across tumors and NATs. Differentially expressed proteins were regarded as differential proteins when the absolute log2FC > 1 and Bonferroni adjusted p value <0.05 (i.e., p < 0.05/6,621) in each comparison. SCNA-driven cis- and trans-effects Correlations between SCNA and the abundance of mRNA and protein were determined via 5,861 common genes present among omics datasets of SCNA, mRNA, and protein. Correlation coefficient and p value were calculated via Spearman’s correlation analysis. SCNA-driven cis- and trans-effects for a given gene were determined with statistically significant (p < 0.01) positive or negative correlations. Correlation of mRNA and protein expression A total of 5,887 genes with mRNA (13,070 genes) and protein (6,621 proteins) data across 37 pairs of colorectal tumors and NATs were used to measure gene-wise and sample-wise mRNA and protein correlations. We followed the pipeline of OmicsX ([355]http://bioinfo.wilmer.jhu.edu/OmicsX/)[356]^62 to perform the omics correlation analysis. Pathway enrichment analysis Pathway enrichment analysis was conducted separately for each omics dataset and each comparison. The champ.GSEA function implemented in R package ChAMP was used to enrich genes from differentially methylated regions (from the epigenome) specifically for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways using Fisher’s exact test. The R package clusterProfiler[357]^66 was applied to enrich and cluster differentially expressed genes or proteins (from tumors versus NATs, EOCRC versus LOCRC, and Mutants versus Wild-types, as appropriate) into KEGG pathways, and the deposited function of enrichKEGG was used to enrich significantly positively correlated genes from mRNA-protein pairs underlying background genes of the positive correlation-ranked list of genes. Principal component analysis (PCA) We performed PCA on all omics samples to illustrate the global epigenomic (DNA methylation), transcriptomic (mRNA), and proteomic (protein) differences between tumors and NATs ([358]Figure S1A). PCA using prcomp function in R package stats was implemented for unsupervised clustering analysis on the centered and scaled expression matrix of each omics dataset. The R package factoextra was used to visualize the PCA output. Immunocytochemistry of LMTK3 A tissue microarray including 93 surgical colorectal cancer specimens was purchased from Shanghai Outdo Biotech Co., Ltd. (HColA180Su19). The primary antibody applied for immunohistochemical detection of LMTK3 protein expression was anti-LMTK3 (sc-100418, Santa Cruz Biotechnology). Two experienced pathologists scored the staining results in a blinded manner. The immunostaining intensity was scored as 1 (low, +), 2 (medium, ++) or 3 (high, +++) ([359]Figure S6G), and the stained percentage was scored semiquantitatively as 1 (0-25%), 2 (26-50%), 3 (51-75%) or 4 (76-100%). Both the cell nucleus and cytoplasm were scored separately. Multiplication of the intensity and percentage scores, followed by the summation for those from the nucleus and cytoplasm, resulted in a score ranging from 1 to 24 for each tissue. Single-cell transcriptome analysis We retrieved single-cell 3′ mRNA sequencing data from the Gene Expression Omnibus (GEO) database under accession code [360]GSE132465,[361]^35 which included 23 colorectal cancer patients with 23 primary colorectal cancer and 10 matched normal mucosa samples. We followed the pipeline by the author to process the dataset[362]^35 and divided it into the comparison between EOCRC and LOCRC. Briefly, the sequencing data were aligned to the human reference genome (GRCh38) and processed using the CellRanger 2.1.0 pipeline (10x Genomics). The R package Seurat[363]^67 was used to filter the raw gene expression matrix according to the following criteria, namely, cells with >1,000 unique molecular identifier (UMI) counts, >200 genes and <6,000 genes, and <20% mitochondrial gene expression in UMI counts, which generated a filtered gene expression matrix with 67,296 cells. This filtered matrix was then normalized to the total UMI counts per cell and transformed to the natural log scale. The multiple canonical correlation analysis (multiCCA) implemented in Seurat was used for batch correction. To eliminate cells of ambiguous identity, multiCCA and reference component analysis (RCA; R package RCAv2)[364]^68 were combined for initial clustering and cell type identification, and discordant cells were removed from the downstream analysis. After defining the global cell type, cells with a number of genes exceeding the outliers were removed to eliminate potential doublets, which generated a final gene expression matrix with 63,689 cells available in this study. The t-distributed stochastic neighbor embedding (tSNE) deposited in R package Seurat was used to cluster cells in low-dimensional space. Quantification and statistical analysis Statistical power of SMG identification was calculated through a binomial model implemented in R package cancerSeqStudy[365]^69 based on the parameters of sample size, effect size (i.e., population mutation rate), basepair mutation rate (per base), and significance level (at p = 5e-6). The Wilcoxon rank-sum test, t test, and Kruskal-Wallis test were used for quantitative data comparisons, and Fisher’s exact test and χ^2 test were used to assess the difference in qualitative data, as appropriate. The R package UpSetR[366]^70 was used to visualize set intersections or overlap components across subgroups. The area under the receiver operating characteristic curve was used to assess the ability to distinguish between EOCRC and LOCRC using DMGs. The Kaplan-Meier survival curve was generated to assess the 5-year survival rate, and the log-rank test was used to test for significant differences. All reported p values were two-sided. Acknowledgments