Abstract Colorectal cancer (CRC) is a frequently lethal disease, with stage II/III CRC accounting for ≈70%. Metabolic reprogramming plays a pivotal role in deciphering cancer heterogeneity and progression. Here, 9 datasets and 83 machine learning algorithm combinations are leveraged‌ to develop the Machine Learning‐based Metabolic gene Prognostic Signature (MALMPS) model. The MALMPS model outperformed traditional clinical traits and molecular features in predicting prognosis for stage II/III CRC patients across training and validation datasets. COX7B, a key gene in MALMPS, is shown to promote CRC malignancy through multi‐omics analysis and in vitro assays. CRC patients are stratified into high‐ and low‐risk groups based on the median cutoff of MALMPS. Notably, the high‐risk subgroup exhibited poor prognosis, activated inflammation, and enriched carbohydrate, glycosaminoglycan, and lipid metabolism, with therapeutic potential for IGF‐1R and Wnt/β‐catenin inhibitor. In contrast, the low‐risk group displayed a TGF‐β pathway inactivating mutation and enriched in nucleotides, cofactors, and amino acids metabolism. Metabolite profiling in the in‐house SDCRC dataset validated the distinct metabolic alterations between the two groups. These findings indicate that MALMPS is a valuable instrument for predicting the recurrence risk of stage II/III colorectal cancer, particularly for identifying individuals at high risk. Keywords: colorectal cancer, machine learning, metabolic‐related gene, prognosis, tumor metabolism __________________________________________________________________ Chen et al. identified a metabolic gene prognostic signature with machine‐learning framework and assessed the prognostic value in independent CRC cohorts. The metabolic signature subtypes are characterized by specific metabolic and proteomic characteristics, and associated with distinct molecular pathway and therapeutic potential. graphic file with name ADVS-12-e01333-g009.jpg 1. Introduction Colorectal cancer (CRC) is the second‐most and third‐most common cancer in women and men, respectively. Moreover, it is still the second leading cause of death related to cancer, which causes a heavy disease burden.^[ [48]^1 ^] The TNM stage is a key prognostic factor for making therapy decisions, and stage II/III colorectal cancer accounts for ≈70% of all cases. However, despite advances in radiotherapy and chemotherapy, the overall survival rates for stage II/III colorectal cancer are still ≈70% and 50%, respectively.^[ [49]^2 , [50]^3 ^] The prognosis of stage II/III CRCs varies widely, and patients who routinely received adjuvant chemotherapy after surgery did not respond equally.^[ [51]^4 ^] These phenomena highlight the substantial heterogeneity among patients in stage II/III CRCs. The development of novel prognostic signatures for stage II/III CRCs represents a critical unmet need in clinical oncology. Several studies have developed molecular models to predict stage CRC prognosis, including lncRNA, hallmark, and immune‐based signatures etc.^[ [52]^5 , [53]^6 , [54]^7 , [55]^8 ^] Although the prediction performances varied across different studies, there was a notable lack of emphasis on sufficient validation and exploration of diverse modeling algorithms. Metabolism reprogramming plays a significant role in the initiation, growth, and spread of CRC, and metabolic adaptations are closely linked to the pathways that lead to cancer.^[ [56]^9 , [57]^10 ^] Meanwhile, more refinement in modeling algorithm combination and signature validation is needed to improve the credibility of the model. With advances in sequencing technology, machine learning is increasingly being used to predict prognosis and patient survival based on genomic, metabolomic, and transcriptomic data.^[ [58]^11 , [59]^12 ^] Clinicians and researchers are now turning to machine learning techniques, such as neural networks, ensemble approaches, and deep learning, for patient outcomes classification and prediction. Recent advancements in multi‐omics approaches, such as genomic, proteomic, metabolomic, scRNA‐seq, and spatial transcriptomics, have provided deeper insights into the molecular mechanisms underlying intratumoral heterogeneity and tumor‐associated cellular reprogramming.^[ [60]^12 , [61]^13 ^] Mass spectrometry imaging‐based spatially resolved metabolomics (SM) enables in situ screening of metabolic biomarkers associated with tumor initiation, progression, and metastasis, thereby facilitating the characterization of the metabolic architecture of tumors and their surrounding microenvironment.^[ [62]^14 ^] Under the background above, the study aims to screen metabolic‐related genes associated with clinical significance through a combined 10‐fold or bootstrap machine learning framework in stage II/III CRCs. The MAchine Learning‐based Metabolic Gene Prognostic Signature (MALMPS) model was validated in five other independent datasets. The biological activities, molecular mechanism, tumor microenvironment remodeling, metabolic reprogramming, and chemotherapy response across different MALMPS subgroups were further explored in multi‐omics approaches and in vitro assays. Finally, we established and validated a nomogram based on the integration of MALMPS and clinical and molecular features. 2. Results 2.1. Development of a Machine Learning‐Based Metabolic‐Related Gene Signature Model The overall study design workflow is presented in Figure [63]1 . A total of 1,053 overlapping genes, curated from the KEGG and REACTOME metabolic pathway, were subjected to univariate Cox regression analysis, which identified 149 metabolic‐related prognostic genes in the training meta‐cohort (Table [64]S1, Supporting Information). The expression profiles of the genes were subjected to the machine learning‐based modeling framework. We fitted 83 kinds of prediction models based on 10‐fold CV or bootstrap resampling via ten machine learning algorithms and further calculated the C‐index of each model across all validation datasets. The most robust model with the highest mean C‐index in the five validation datasets was the combination of random survival forest (RSF) and generalized boosted regression modeling (GBM) (Figure [65]2A), which showed a range of C‐index from 0.62 to 0.86 (training dataset: 0.862 [95%CI: 0.841‐0.882], [66]GSE17536: 0.698 [95%CI: 0.635‐0.760], [67]GSE29621: 0.673 [95%CI: 0.548‐0.798], [68]GSE92921: 0.740 [95%CI: 0.607‐0.871], [69]GSE143985: 0.705 [95%CI: 0.614‐0.797], [70]GSE161158: 0.619 [95%CI: 0.556‐0.682], Figure [71]2B). In the identified model combination, we obtained the optimistic mtry and nodesize parameters with the lowest out‐of‐sample error under the number of ntree 845. The appropriate ntree number was picked according to the error rate plot, and features were selected according to the variable importance via parameter‐adjusted RSF analysis (Figure [72]S1A, Supporting Information). The selected features were further subjected to GBM to fit the predictive model with the optimal number of iterations (Figure [73]S1B, Supporting Information). Finally, the risk score for each patient was calculated using the expression of 21 gene features, weighted by the relative importance in the GBM model, to generate the MAchine Learning‐based Metabolic gene Prognostic Signature (MALMPS, Figure [74]S1C and Table [75]S2, Supporting Information) model. We compared the prognostic efficacy of the MALMPS with that of commonly used clinical and molecular features, including age, sex, ethnicity, AJCC stage, grade, as well as driver gene mutations (KRAS, TP53, and BRAF) and microsatellite instability (MSI), etc. Intriguingly, the MALMPS score consistently outperformed all other clinical and molecular features in terms of the C‐index across all independent validation datasets. All variables were statistically significant in the training dataset (P < 0.001), and some variables such as age, race, gender, BRAF mutation, and MSI status were statistically significant in the validation datasets (P < 0.05, P < 0.01, or P < 0.001), highlighting the superior diagnostic precision of our constructed signature (Figure [76]2C). Figure 1. Figure 1 [77]Open in a new tab Overall workflow of the study. Figure 2. Figure 2 [78]Open in a new tab The identification of the best performance signature. A) C‐indices of 83 combinations of machine learning prediction models in five independent validation cohorts. B) C‐indices of MALMPS across all datasets. C) C‐index comparisons between clinical and molecular variables and MALMPS scores in [79]GSE161158, training meta‐cohort, [80]GSE17536, [81]GSE92921, [82]GSE29621, and [83]GSE143985. * P < 0.05, ** P < 0.01, *** P < 0.001. 2.2. Evaluation of MALMPS Model Performance and Prognostic Value We utilized the ROC analysis to measure the PFS discrimination of the MALMPS model (Figure [84]3A). The average AUC for the training dataset was 0.893, and the model showed a consistently decent level of 3‐year AUCs across the validation datasets (average of 0.699). In addition, we systematically collected and re‐analyzed five prognosis molecular signatures in the CRC transcriptomic dataset (Table [85]S3, Supporting Information). Our MALMPS scores achieved the highest C‐index among five published molecular signatures in the TCGA (training set) and additional independent validation datasets (Figure [86]3B). In summary, our MALMPS model could be a promising biomarker for predicting the PFS risk of stage II/III CRC patients. Figure 3. Figure 3 [87]Open in a new tab Evaluation of MALMPS performance. A) Time‐dependent ROC analysis for predicting PFS at 1, 3, and 5 years across the training meta‐cohort and all validation datasets. B) C‐index comparisons between MALMPS scores and 5 published signatures in TCGA‐CRC, [88]GSE17536, [89]GSE29621, [90]GSE92921, [91]GSE143985, and [92]GSE161158. C) Kaplan‒Meier survival curve of PFS between patients with a high MALMPS score and those with a low MALMPS in all datasets. * P < 0.05, ** P < 0.01, *** P < 0.001. The samples in our independent datasets were further stratified into high‐risk and low‐risk groups based on the median predicted MALMPS score. Patients in the high‐risk group had a significantly shorter PFS time than those in the low‐risk group in the training meta‐cohort (P < 0.0001), and similar trends were also observed in the validation datasets (P < 0.01 for three datasets, with [93]GSE29621 and [94]GSE92921 only marginally missing the statistical cut‐off, possibly due to their small sample sizes, with P values of 0.073 and 0.082, respectively. Figure [95]3C). The multifaceted evaluation demonstrated that the MALMPS model performed exceptionally well in identifying colorectal cancer patients with distinct clinical outcomes. 2.3. COX7B Promotes Malignant Behaviors in CRC COX7B emerged as the most relatively important gene in MALMPS, we further performed prognostic analysis and found that higher mRNA expression of COX7B was associated with worse PFS in stage II/III CRC patients (Figure [96]4A,B). Spatial transcriptomic (ST) data of paired colorectal cancer and liver metastasis samples from two untreated patients and two neoadjuvant chemoradiotherapy patients showed that COX7B was highly expressed in tumor tissues both in colorectal and liver metastases (Figure [97]4C; Figure [98]S2A, Supporting Information). We further analyzed the cell proliferation dynamics of the 21 genes in CRC cell lines based on Chronos, a model that infers gene fitness effects from CRISPR knockout screens. We observed that the Chronos score in COX7B knockout cell lines was all below zero, indicating its significance in the viability of tumor cells (Figure [99]S2B, Supporting Information). Furthermore, proteomic analysis revealed that elevated COX7B protein levels were associated with poor prognosis in colorectal cancer (CRC) and correlated with increased levels of the tumor marker CA19‐9 (Figure [100]S2C,D, Supporting Information). We then performed KEGG pathway enrichment analysis, which revealed that high COX7B expression is associated with key metabolic and cellular processes, including glycolysis, pyruvate, fatty acid metabolism, and the cell cycle (Figure [101]S2E, Supporting Information). Accordingly, we initiated comprehensive cytological assays to investigate the role of COX7B in regulating cellular behavior and characteristics. DLD‐1 cells were selected for siRNA infection, and the efficiency was verified by using qPCR (Figure [102]4D). Afterward, flow cytometry was utilized to assess the biological impact of COX7B knockdown, which showed an increase in apoptotic cells and the percentage of cells in the G0/G1 phase, as well as a decrease in the S phase (Figure [103]4E,F). Cell proliferation experiments, including CCK‐8 and colony formation assays, indicated that COX7B promoted cell proliferation (Figure [104]4G,H). In addition, Trans‐well and cell wound scratch assays also demonstrated that COX7B promoted the invasion and migration of colorectal cells (Figure [105]4I; Figure [106]S2F, Supporting Information). In summary, these findings suggest that COX7B may act as an oncogenic factor in the tumorigenesis of CRC. Figure 4. Figure 4 [107]Open in a new tab COX7B promotes the CRC malignant behaviors. A) The importance calculated by the GBM algorithm of the 21 most valuable metabolic genes determined by RSF algorithm. B) Kaplan‐Meier survival curve of PFS between patients with high COX7B and low COX7B mRNA expression. C) Heatmap and violin plot showing the unsupervised clustering analysis distribution and COX7B expression of ST among different cell types in CRC. D) Barplot showing the COX7B expression levels within COX7B‐scr, shCOX7B#1 and shCOX7B#2 groups. E) Scatter plots and barplot demonstrating the trends in the proportions of apoptotic cells. F) Histograms and barplot exhibiting the proportion of cells in different cell cycle phases. G) Line chart showing the proliferation fold increase within the three groups over time. H) Representative staining images and barplot of colony number among the three groups. I) Representative staining images and barplot of invasion and migration transwell number among the three groups. Scale bar, 100 µm. * P < 0.05, ** P < 0.01, *** P < 0.001. 2.4. scRNA‐seq Analysis Revealing the Tumor Microenvironment Concerning the MALMPS Recent advances in single‐cell RNA‐seq (scRNA‐seq) have provided an avenue to explore the tumor microenvironment and metabolic reprogramming at a cellular resolution.^[ [108]^15 , [109]^16 ^] Here, we utilized the integrated scRNA‐seq and bulk RNA‐seq data from 13 stage II/III CRC patients to portray the cellular heterogeneity and metabolic remodeling among different MALMPS risk subgroups. Graph‐based clustering of merged and normalized cells identified robust, discrete clusters of epithelial cells, fibroblasts, endothelial cells, T cells, B cells, and myeloid cells based on canonical marker genes in CRC (Figure [110]5A). Totally, we observed a significant clustering of epithelial cells in the high‐risk group, with a notably higher number of cell cycles in the G1 and G2M phases compared to the low‐risk group (Figure [111]S3A, Supporting Information). The clinical and molecular annotations of the 13 patients at the single‐cell level are shown in Figure [112]5B. Specifically, the high‐risk group had an elevated count of endothelial cells (mainly state‐like cells), epithelial cells (mainly stem‐like cells), fibroblast cells (mainly CAF‐S1 subtype) and CD4^+T cells (mainly CD4‐AXNA1, CD4‐memory), whereas the low‐risk group had a heightened number of B cells (mainly plasma cells), myeloid cells and CD8 ^+T (mainly CD8 effector cells) (Figure [113]5C). Furthermore, we observed high expression of COX7B in epithelial malignant cells that clustered within the high‐risk group, providing further evidence of the malignant behavior of COX7B (Figure [114]S3B, Supporting Information). Figure 5. Figure 5 [115]Open in a new tab Single‑cell characterization of tumor microenvironments and metabolic activity between MALMPS subgroup. A) t‐SNE of CRC samples colored by six cell types (B cells, endothelial, epithelial, fibroblast, myeloid and T cells) and MALMPS group. B) Bar plot showing the clinical and molecular characteristics of stage II/III CRC sample in different MALMPS subgroup. C) t‐SNE of B cells, endothelial, epithelial, fibroblast, myeloid, T cells colored by cell subtypes and bar plots showing the proportions and counts of the six cell types. D) Heatmap showing the median of the differential score of metabolic pathways calculated by high versus low‐risk MALMPS in each cell subtypes. The size of the circle indicates ‐log[10] (P value) and the color indicated the differential enrichment score in KEGG metabolic pathway. To understand the metabolic landscape of CRC with MALMPS, we evaluated the metabolic activity score of KEGG metabolic pathways at single cell level. Next, we computed the median of the differential metabolic score of each cell subtype (fibroblast‐normal and endothelial‐pericytes were excluded as cell counts <10) between high and low MALMPS risk groups (Figure [116]5D). The metabolic activity scores of cellular subtypes in the oxidative phosphorylation pathway exhibited significant variation between the high‐risk and low‐risk groups, showing an up‐regulation in epithelial and T cells, and down‐regulation in fibroblasts (Figure [117]S3C, Supporting Information). Specifically, the oxidative phosphorylation activity in CD4‐Th17, CD4‐memory, CD4‐Treg, memory B cells, and goblet‐like epithelial cells was notably enriched in the high‐risk group. Moreover, taurine and hypotaurine metabolism were significantly reduced in CD4‐memory T cells, CD8‐effector T cells, memory B cells, and naïve B cells of the high‐risk subgroup, suggesting that impaired taurine metabolism in immune cells suppresses the antitumor activity within the tumor microenvironment (Figure [118]S3D–F, Supporting Information). 2.5. Deciphering of Genetic Alteration Profiles with the MALMPS The mutational landscape analysis revealed that the tumor driver genes related to the TGF‐β (Transforming Growth Factor‐beta) signaling pathway, including ARID1A, ACVR2A, and ACVR1B, exhibited a higher mutation frequency in the low MALMPS risk subgroups (P = 0.0122, 0.0093, and 0.0239, respectively, Figure [119]S4A, Supporting Information). Intriguingly, mutation subtypes in these genes primarily consisted of loss of function mutations, including frameshift deletions/insertions, nonsense mutation, missense mutation, and splice‐site mutations (Figure [120]S4B, Supporting Information), suggested the attenuated TGF‐β signaling in the low‐risk group. We further compared the tumor mutation burden (TMB) and somatic copy number alteration (SCNA) levels of stage II/III CRC and found that the normalized SCNA level in the high‐risk group was significantly higher than that in the low‐risk group (Figure [121]S4C, Supporting Information). Subsequently, we extracted three mutational signatures from the genomic data with varying mutational activities and annotated them against the COSMIC‐V3 nomenclature by cosine similarity analysis (Figure [122]S4D–F, Supporting Information). The extracted mutational signatures included polymerase epsilon exonuclease domain mutations (COSMIC‐10), spontaneous or enzymatic deamination of 5‐methylcytosine (COSMIC‐1), DNA mismatch repair deficiency‐related signatures (COSMIC‐6) (Figure [123]S4G, Supporting Information). Mutational activities attributed to the COSMIC‐6 signature have a significant ascendance in the high MALMPS subgroup (P < 0.05, Figure [124]S4H,I, Supporting Information). 2.6. Potential Biological Pathway Associated with the MALMPS To elucidate the potential biological mechanisms of MALMPS, we identified the top positively and negatively enriched pathways associated with the MALMPS score in the GO, KEGG, and REACTOME datasets, using the normalized enrichment score (NES) from GSEA based on RNA‐seq data. The extracellular matrix and some organelle activities related to metabolism (e.g., mitochondria and ribosome) were found to be positively and negatively correlated with the MALMPS score, respectively, against the GO reference datasets. Similarly, the KEGG and REACTOME gene sets showed enrichment of processes related to the extracellular matrix, ribosomes, and mitochondria (Figure [125]S5A, Supporting Information). Several oncogenic signaling pathways, such as the MAPK, JAK, and TGF‐β signaling pathways, were upregulated in the high‐risk group, whereas cancer suppression signaling pathways, such as the P53 pathway and DNA repair, were enhanced in the low‐risk group (Figure [126]S5B, Supporting Information). In the realm of proteomics analysis, kinases related to Wnt/β‐catenin signaling (GSK3B and PKD1) and MAPK signaling (PBK and MAPK15) were enriched in the high‐risk group, while TCA cycle metabolism (PDK2/3/4) and cell‐cycle signaling (AURKA, CDK1/2) were enriched in the low‐risk group (Figure [127]S6A, Supporting Information). Certain phosphorylation pathways, such as the IGF1 and MAPK signaling pathways, were found to be enriched in the high‐risk group, while mTOR/MTOR was observed to be enriched in the low‐risk group (Figure [128]S6B, Supporting Information). IGF‐1 can promote the growth and survival of cancer cells, as well as stimulate the formation of new blood vessels to support tumor growth. The mTOR is involved in the regulation of autophagy, a cellular process that degrades and recycles damaged or unnecessary cellular components. Our exploration successfully identified 46 distinct proteins showing differential expression, with 21 proteins (PCSK1, PLIN, CCL18, LDHAL6B, MAP3K8, and SERPINB5 et al.) were found to be up‐regulated and 25 (ITLN1, S1, TKTL2, and CASP5 et al.) down‐regulated when comparing the high‐risk and low‐risk groups (Figure [129]S6C, Supporting Information). 2.7. MALMPS Model Prediction in Drug Sensitivity Drug sensitivity analysis against GDSC2‐derived antineoplastic drugs was performed in different MALMPS risk groups with the ‘oncoPredict’ package (Figure [130]6A). Patients with low risk were found to exhibit greater sensitivity to primary anticancer drugs commonly used in CRC therapy, such as 5‐fluorouracil and oxaliplatin. Additionally, cell cycle‐targeting drugs like MK‐1775 also demonstrated increased sensitivity in the low‐risk group, aligning with the findings from the GSEA results. Conversely, drugs such as BMS‐754807 (IGF‐1R inhibitors) and XAV939 (Wnt/β‐catenin inhibitors) were found to be more sensitive in the high‐risk group (Figure [131]6B). Our proteomic analysis also revealed dysregulation of IGF1 and Wnt/β‐catenin signaling in the high‐risk group (Figure [132]S6A,B, Supporting Information). To further validate these findings, we selected seven CRC cell lines to perform drug sensitivity assays based on the risk grouping of MALMPS (high‐risk group: DLD1, HCT116, SW480, LS180; low‐risk group: RKO, HCT‐15, SW620). Cell lines from the high‐risk group displayed significantly lower colony numbers and reduced cell viability with treatment with XAV‐939 and BMS‐754807 than low‐risk group cell lines, while 5‐fluorouracil, oxaliplatin, and MK‐1775 showed the opposite effect (Figure [133]6C–E). Figure 6. Figure 6 [134]Open in a new tab Drug sensitivity prediction and validation of MALMPS. A) Workflow of drug sensitivity prediction and validation. B) Relative distribution of drug sensitivity in fluorouracil, oxaliplatin, MK‐1775, BMS‐754807 and XAV939. C) Line charts showing cell viability in high‐ versus low‐risk groups with the identified five drugs and control drug. D) Bar plot showing the colony number in high‐ versus low‐risk groups with the identified five drugs and control drug. E) Representative staining images of colony number between risk groups among different cell lines. * P < 0.05, ** P < 0.01, *** P < 0.001. 2.8. Metabolic Biological Activities, Metabolites Alteration, and Proteomic Difference Between MALMPS Groups Metabolism biological activities, and metabolite profiles play important roles in understanding the metabolic reprogramming of CRC.^[ [135]^17 , [136]^18 ^] We utilized the paired metabolome and transcriptomics data of the 73 stage II/III CRC patients from our in‐house SDCRC cohort to identify the association between metabolite alteration and MALMPS risk groups (Figure [137]7A,B ; Table [138]S4, Supporting Information). A total of 2633 metabolites (1244 positive and 1389 negative) with different molecule category was identified through the untargeted LC‐MS/MS approach (Figure [139]S7A, Supporting Information). Initial correlation analyses between metabolite levels and the expression profiles of the MALMPS genes revealed dynamic bidirectional regulatory patterns between metabolic reprogramming‐associated genes (e.g., ARSA, SMPD1, COX7B) and key metabolites (Figure [140]S7B, Supporting Information). Furthermore, GSEA analysis of the enriched REACTOME pathway among different MALMPS subgroups further supported findings from independent public datasets. We observed the enrichment of glycosaminoglycan (GAGs), carbohydrates, and a spectrum of lipid‐related pathways, notably encompassing phospholipid, fatty acids, and sphingolipid metabolism, were significantly pronounced in the high‐risk group. Concurrently, the metabolic pathways of nucleotides, cofactors, and amino acids emerged as enriched beacons within the low‐risk group (Figure [141]7C). In terms of metabolite abundance, phosphatidylethanolamine (PE), phosphatidylserine (PS), phosphatidic acid (PA), and sphingomyelin (SM), which were related to the lipid metabolism, also exhibited elevated relative levels in the high‐risk group. Metabolites involved in nucleotides (such as guanosine monophosphate, hypoxanthine, and uracil), and amino acids (such as proline, valine, and L‐serine) had an elevated score in the low‐risk group (Figure [142]7D). Figure 7. Figure 7 [143]Open in a new tab Metabolic characteristics of MALMPS. A) Scheme plot of metabolites profile in terms of MALMPS in SDCRC cohort. B) Clinical annotations of stage II/III colorectal cancer in SDCRC cohort. C) Metabolic pathway enrichment analysis between MALMPS risk groups against the REACTOME database. D) Forest plot illustrating metabolites alternation between risk groups in SDCRC cohort. E) Spatial metabolomic analysis further validated the metabolic reprogramming of tumor region in different MALMPS subgroups. Spatial metabolomics (SM) provides a visualization solution to explore the intratumor metabolic heterogeneity and cell metabolic interactions in the tumor microenvironment.^[ [144]^19 , [145]^20 ^] Here, we adopted the paired transcriptomic and spatial metabolomics to dissect the MALMPS subgroup‐related tumor metabolic dependency and heterogeneity. Two CRC samples were separately stratified into the low and high‐risk subgroup, and then subjected to MALDI‐MS‐based SM analysis (Method section, Table [146]S5, Supporting Information). H&E images and spatial metabolites data‐driven PCA analysis, which was built based on region‐specific metabolite fingerprints, effectively stratified the two samples into distinct subgroups (Figure [147]S7C, Supporting Information). Furthermore, we compared the metabolite level among the different MALMPS subgroups and found malic acid, hypoxanthine, and proline were enriched in the low‐risk group, whereas arachidonic acid (FA‐C20:4), phosphatidylethanolamine (PE‐C38:4) and glucose were enriched in the high‐risk group (Figure [148]7E). In summary, both metabolomics and spatial metabolomics analyses revealed distinct metabolic profiles across different MALMPS subgroups, highlighting the subgroup‐specific metabolic reprogramming pattern. 2.9. Development and Validation of the Prognostic Nomogram Model To provide a potential clinical tool for recurrence risk prediction within a certain period, we attempted to establish a nomogram model consisting of the MALMPS score and molecular features. We curated the common biomarkers recommended by the NCCN guidelines in the collected datasets, including MSI, BRAF, and KRAS mutations, to fit the Biomarker model (termed Bio) in the TCGA training set. These biomarkers were integrated with MALMPS to develop a nomogram model, referred to as BioMALMPS (Figure [149]8A), which was subsequently validated using the [150]GSE92921 and [151]GSE143985 datasets. The continuous form of the MALMPS and BioMALMPS models showed a significantly improved estimation of PFS compared to the Bio model in the RMS analysis. The average slope of BioMALMPS was found to be larger than that of Bio and MALMPS (Figure [152]8B). Furthermore, we utilized ROC analysis to test the performance of the 1‐year, 3‐year, and 5‐year survival predictions derived by Bio, MALMPS, and BioMALMPS. There is a significant increase compared to Bio, with the highest value of BioMALMPS observed (Figure [153]8C). Net benefit was also apparent in the BioMALMPS model compared to the Bio and MALMPS models with a threshold less than 0.25 in DCA analysis (Figure [154]8D). Figure 8. Figure 8 [155]Open in a new tab Combinations of MALMPS with molecular features. A) Prognostic nomogram predicting the probability of 1‐, 3‐, and 5‐year PFS in TCGA dataset. B) RMS plot of Bio, MALMPS and BioMALMPS in [156]GSE92921 and [157]GSE143985. C) AUCs of ROC curves of Bio, MALMPS and BioMALMPS for 1‐, 3‐, and 5‐year PFS probability. D) DCA of Bio, MALMPS and BioMALMPS model in terms of PFS for 1‐, 3‐, and 5‐year PFS prediction. 3. Discussion This study developed the MALMPS model with 21 metabolic‐related prognostic genes through a combined approach using RSF and GBM, achieving the highest average C‐index in the independent validation datasets. Compared to traditional survival analysis methods, RSF has the advantages of handling high‐dimensional data and preventing overfitting.^[ [158]^21 ^] GBM combines the advantages of the Boosting algorithm and generalized regression models to gradually improve the predictive performance of the model.^[ [159]^22 ^] Among the genes in MALMPS, COX7B was identified as a relatively important gene in the MALMPS model, combined with the Chronos scores and prognostic analyses indicating its significance in tumor malignancy. Furthermore, an in vitro assay demonstrated the invasive and migratory effects of this gene on tumor cells. COX7B is a component of Complex IV in the mitochondrial respiratory chain, playing a crucial role in cellular respiration and energy production. As a part of the mitochondrial respiratory chain, it may promote cancer through its impact on mitochondrial function and metabolism, such as metabolic reprogramming and oxidative stress.^[ [160]^23 ^] Meanwhile, MALMPS had the highest C‐index level in our comparisons with common clinical and molecular features, and the AUCs of the 3‐year PFS were robust across all datasets. These findings demonstrated that our signature could be a promising biomarker for predicting the high‐risk recurrence population of stage II/III CRC in clinical practice. Our findings showed that the low‐risk group had a higher frequency of inactivating mutation in genes belonging to TGF‐β signaling, suggesting reduced TGF‐β signaling activity in the low MALMPS subgroup. TGF‐β plays a critical role in tumor immune regulation by suppressing effector T cells, promoting regulatory T cells, and driving epithelial‐mesenchymal transition (EMT), which enhances tumor invasiveness and resistance to PD‐1 immunotherapy.^[ [161]^24 ^] Cao et al. demonstrated that taurine deficiency in the immune microenvironment resulted the T cell exhaustion and tumor progression,^[ [162]^25 ^] which aligns with our findings from scRNA‐derived metabolic landscape alteration in the high‐risk subgroup. On the other hand, the low‐risk group showed enrichment in cell cycle processes, and subsequent drug sensitivity analysis indicated their sensitivity to drugs targeting the cell cycle pathway. For example, 5‐fluorouracil (5‐FU) is an essential component of systemic chemotherapy for CRC, while XAV‐939 can disrupt Wnt signaling, affecting cellular proliferation and survival.^[ [163]^26 , [164]^27 ^] Our metabolic pathway enrichment revealed that sphingolipid (including glycosphingolipid and phospholipid) metabolism was enriched in the high‐risk group, with the abundance of PE, PS, PA, and SM acquired by LC‐MS. GAGs affect cancer progression by modulating growth factor signaling, extracellular matrix properties, and cancer cell spread, presenting opportunities for therapeutic targeting and drug delivery to enhance antitumor responses.^[ [165]^28 , [166]^29 ^] Phospholipids are essential components of cell membranes, which are involved in cellular signaling and can influence tumor growth and metastasis.^[ [167]^30 , [168]^31 ^] Cancer cells robustly incorporate exogenous fatty acids, which are subsequently remodeled into various lipid species, including phospholipids such as PA, PE, lysophosphatidylethanolamines (LPE), PI, and PS. Neutral lipids like triacylglycerols (TAG) and diacylglycerols (DAG) are formed, along with sphingolipids such as SM.^[ [169]^32 ^] Recent findings indicate that sphingolipid signaling can control intracellular complement activation, leading to inflammasome‐mediated metastasis, presenting a promising strategy for cancer therapy.^[ [170]^33 , [171]^34 ^] We found that glutamate is specifically enriched in the high‐risk group, which may suggest its significant role in the process of carcinogenesis and metastasis. Proliferating cancer cells predominantly depend on glutamine for their survival and proliferation.^[ [172]^35 ^] Glutamine functions as a crucial carbon source for the biosynthesis of lipids and metabolites through the tricarboxylic acid (TCA) cycle, while also providing nitrogen for the synthesis of amino acids and nucleotides.^[ [173]^36 ^] The observed negative correlation between L‐Homoserine, Phosphoserine, L‐Glutamate, and metabolic‐epigenetic regulatory gene clusters (e.g., ARSA, SMPD1, LHPP) likely stems from bidirectional feedback mechanisms at the metabolite‐enzyme interface. Metabolites negatively regulate the activity of epigenetic enzymes by disrupting cofactor availability (e.g., SAM, α‐KG) or through substrate competition, triggering cellular metabolic reprogramming to adapt to microenvironmental stress.^[ [174]^37 ^] This ultimately establishes a dynamic equilibrium between metabolic homeostasis and epigenetic stability. Some limitations must be underscored with the current study, even though the results of our investigation were profound. First, although a total of 1442 patients were included with both microarray and RNA‐seq platforms, the follow‐up cohorts have reached conclusions with outcomes that have already materialized and new cases of our in‐house SDCRC cohort did not have adequate follow‐up outcomes, offering a foundation for future exploration. Second, several metabolic‐related genes from MALMPS in stage II/III CRC, and the linkage of MALMPS and other biological mechanisms, remain to be elucidated. Third, using the combination of metabolites and mRNAs might generate a more robust signature. Last, the effectiveness of the nomogram can be re‐validated in the future using our prospective cohort. Collectively, our analysis established a stable and powerful signature based on machine learning algorithms by metabolic‐related genes. The findings indicate that MALMPS might be a valuable instrument for predicting the recurrence risk of stage II/III colorectal cancer within a specific timeframe, particularly for identifying individuals at high risk. 4. Experimental Section Study Design and Public Data Preprocessing for Prognostic Signature Establishment The gene expression data and corresponding clinical features of CRC samples were collected from publicly available datasets of the NCBI GEO database (https://www.ncbi.nlm.nih.gov/geo/) and TCGA (https://cancergenome.nih.gov/). Transcriptomic and drug stativity data of CRC cell lines were downloaded from the Depmap database^[ [175]^38 ^] (https://depmap.org/portal/download/). The proteomic data of colon cancer samples were curated from the CPTAC dataset^[ [176]^39 ^] (https://www.cbioportal.org/study/summary?id = coad_cptac_2019). A total of 1442 stage II/III CRC samples with RNA‐seq or microarray test, distributed in nine independent datasets, were used to establish the prognostic signature. A large retrospective follow‐up dataset from the Guinney et al. study,^[ [177]^40 ^] which includes [178]GSE14333, [179]GSE37892, [180]GSE39582, and TCGA‐CRC, served as the training dataset for developing a recurrence prognostic prediction model. Independent validation datasets ([181]GSE17536, [182]GSE29621, [183]GSE92921, [184]GSE143985, [185]GSE161158) from different nations were employed to select the optimal model combination and evaluate its predictive performance. In this analysis, the data by excluding samples with zero survival time was refined, individuals who received preoperative chemotherapy and/or radiotherapy, and those with missing recurrence prognosis, staging, or transcriptome data. Only individuals with stage II/III colorectal cancer were retained, and survival time was standardized in months to ensure dataset homogeneity. A total of 1,000 samples were included in the training set to develop the prognostic model. Furthermore, 111, 40, 141, 59, and 91 samples were included in the five independent test cohorts to evaluate the effectiveness of the model. The processed data of the combined training set and five independent validation datasets are provided in the supplementary materials and Table [186]S4 (Supporting Information). Collections and Preparation of SDCRC Cohort Clinical Specimen 73 stage II/III CRC samples from the in‐house Shandong colorectal cancer (SDCRC) cohort were collected and subjected to transcriptomics and metabolomics analysis. Another two CRC samples were utilized for paired transcriptomic and spatial metabolomics analysis. This cohort comprised patients who underwent surgical resection at the Shandong Hospital between August 2021 and June 2024. Post‐surgery, tissue samples were promptly snap‐frozen in liquid nitrogen within 30 min and subsequently stored at −80 °C until further analysis. The inclusion of these tissue samples in the study was sanctioned following the approval from the hospital ethics committee (Approval No. 2021‐176). Before their participation, each patient provided written informed consent. Importantly, all patients in this cohort were newly diagnosed with CRC and had not undergone any prior treatments for the disease. Comprehensive clinical information for each patient, such as age, sex, tumor location, tumor‐node metastasis, and AJCC staging, as well as the constructed MALMPS risk scores and respective groupings, are cataloged in Table [187]S4 (Supporting Information). The transcriptomic data in TPM format was also log‐2 transformed and normalized into a Z score for further analysis. The detailed methodologies for sample collection and preparation, along with the processes involved in transcriptome sequencing, LC‐MS/MS metabolomic, spatial metabolomic analysis, and subsequent data processing specific to the SDCRC cohort, are thoroughly documented in the supplementary methods. Metabolic‐Related Gene Signature Establishment In this study, the outcome variables are survival information about recurrence, specifically focusing on the duration from surgery to either recurrence or follow‐up, along with the status of recurrence. The predictor variables in the signature were the expression level of the metabolic genes that meticulously screened. Since low‐expressed or nonvarying genes usually represent noise, the transcriptome data in the training dataset consisted of ≈6000 genes with the largest median absolute deviation (MAD). These genes were measured by at least one probeset in all datasets, and each gene was represented by the probe‐set with the largest MAD. 84 and 82 metabolic‐related pathways were extracted in the KEGG and REACTOME databases with 1715 and 2679 genes, respectively (Table [188]S6, Supporting Information). A total of 1053 overlapping genes were utilized for further analysis. The intersection of the genes and performed univariate Cox regression analysis was performed with these genes. A total of 149 metabolic‐related prognostic genes were enrolled in the machine learning framework. Ten separate machine learning algorithms and their combinations composed the machine learning framework. The ten algorithms included random survival forest (RSF), elastic network (Enet), Lasso, Ridge, stepwise Cox, CoxBoost, partial least squares regression for Cox (plsRcox), supervised principal components (SuperPC), generalized boosted regression modeling (GBM), and survival support vector machine (survival‐SVM). Six algorithms, including RSF, Enet, Lasso, Ridge, Stepwise Cox, and CoxBoost, could function with feature selection. A total of 83 combined algorithms based on 10‐fold cross‐validation (CV) or bootstrap resampling methods were utilized to select the optimal performance model. The implementation of the machine learning algorithm framework and corresponding hyperparameter optimization are shown in Table [189]S7 (Supporting Information). The algorithm with the highest average C‐index across all validation datasets was regarded as the optimal model to generate the signature. Construction and Validation of the Nomogram A novel nomogram for recurrence prediction of stage II/III CRC patients was constructed by the ‘rms’ package. The 2021 National Comprehensive Cancer Network guidelines recommend testing 7 biomarkers (ie, KRAS, NRAS, BRAF, microsatellite instability [MSI], mismatch repair [MMR], ERBB2 amplification, and NTRK fusion) to determine the optimal clinical management.^[ [190]^41 ^] There are three biomarkers present in both TCGA and validation datasets ([191]GSE92921 and [192]GSE143985). Through Cox proportional hazards regression analysis, these molecular features were systematically integrated with the established signature to develop a comprehensive prognostic model, which was subsequently visualized as a clinically applicable nomogram. The decision curve analysis (DCA) results could be performed to obtain the clinical net benefit of different models and all and no strategies.^[ [193]^42 ^] The prognostic performance of the comprehensive model was also compared in terms of the AUC and revealed by the restricted mean survival (RMS) curve. Statistical Analysis The data processing, statistical analysis, and plotting were generated in R 4.2.2 software. Correlations between two continuous variables were evaluated via Spearman correlation coefficients. The Wilcoxon rank‐sum test or T test was applied to compare the difference between two groups for quantitative data. Analysis of variance or the Kruskal‐Wallis test was used to compare differences among four groups. Two‐sided Fisher exact tests were used to analyze categorical variables. The Cox proportional hazards model and Kaplan‐Meier analysis were performed with the ‘survival’ package. The receiver operating characteristic (ROC) curve was used to assess the prognosis classification performance of MALMPS. The area under the curve (AUC) was calculated via the ‘timeROC’ package. The comparisons between clinical and molecular traits and risk scores were implemented by the ‘compareC’ package. All statistical tests were two‐sided. P < 0.05 was considered statistically significant. The length of error bars represents 95% confidence intervals. Other relevant bioinformatics analyses and in vitro assays are detailed in the supplementary methods and Table [194]S8 (Supporting Information). Availability of Data and Materials The datasets supporting the conclusions of this article are available in TCGA ([195]https://cancergenome.nih.gov/; [196]https://www.cbioportal.org/), DepMap ([197]https://depmap.org/portal/), GEO ([198]https://www.ncbi.nlm.nih.gov/geo/), and CPTAC ([199]https://cptac‐data‐portal.georgetown.edu/cptac) database. The raw transcriptomic data of the 73 CRC samples from SDCRC dataset have been deposited in the Genome Sequence Archive ([200]https://ngdc.cncb.ac.cn/gsa/) under accession number HRA006800. Raw mass spectrometry data of metabolome of CRC samples in SDCRC have been deposited in the OMIX database of the National Genomics Data Center under accession number OMIX005924. The remaining data are available within the supplementary materials and reasonable request to corresponding author. Code Availability Essential scripts for implementing machine learning‐based identification and validation procedure in multiple independent datasets are available on the Github website (https://github.com/chenhao‐qilu/machine‐learning). Conflict of Interest The authors declare no conflict of interest. Author Contributions H.C., Z.W., C.S. contributed equally to this work. W.C., M.L., F.F.T., and L.P.L. have conceived and designed the study. H.C., C.L.S., and Z.W. has contributed to the development of methodology. Z.W., Y.Z., Y.L., T.C.Z., Y.Z., Y.K.L., and Y.L. supported in acquisition, analysis and interpretation of data. Z.W., Y.L., M.L., F.F.T., W.C., L.P.L., and H.C. participated in writing, review, and/or revision of the manuscript. H.C., Z.W., F.F.T., and C.L.S. undertook the administrative, technical, or material support. All the authors read and approved the final version of the review. Supporting information Supporting Information [201]ADVS-12-e01333-s002.docx^ (5.8MB, docx) Supporting Information [202]ADVS-12-e01333-s001.xlsx^ (92.1KB, xlsx) Acknowledgements