Abstract This study develops a prognostic model to predict metastasis and prognosis in colorectal cancer liver metastases by identifying distinct macrophage subsets. Using scRNA-seq data from primary colorectal cancer and liver metastases, we dissected the cellular landscape to find unique macrophage subpopulations, particularly EEF1G + macrophages, which were prevalent in liver metastases. The study leveraged data from [34]GSE231559, TCGA, and GEO databases to construct an 8-gene risk model named EMGS, based on the EEF1G + macrophage gene signature. Patients were divided into high-risk and low-risk groups using the median EMGS score, with the high-risk group showing significantly worse survival. This group also demonstrated upregulated pathways associated with tumor progression, such as epithelial-mesenchymal transition and angiogenesis, and downregulated metabolic pathways. Moreover, the high-risk group presented an immunosuppressive microenvironment, with a higher TIDE score indicating lower effectiveness of immunotherapy. The study identifies potential drugs targeting the high-risk group, suggesting therapeutic avenues to improve survival. Conclusively, the EMGS score identifies colorectal cancer patients at high risk of liver metastases, highlighting the role of specific macrophage subsets in tumor progression and providing a basis for personalized treatment strategies. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-024-77248-2. Keywords: scRNA-seq, Colorectal cancer, Macrophage, Metastatic, Prognosis, Metabolic, Immunosuppressive Subject terms: Gastrointestinal cancer, Tumour biomarkers, Tumour immunology Introduction Colorectal cancer (CRC) is among the most common malignant tumors and the second leading cause of cancer-related deaths. In the United States, the estimated number of new cases and deaths for CRC in 2023 are 153,020 and 52,550, respectively^[35]1. Metastasis is the primary cause of death in CRC patients. with liver metastasis occurring in over half of the cases^[36]2. Despite advances in surgical resection of liver metastases, the utilization of immunotherapy, targeted molecular therapies, and third-line chemotherapeutic agents, which have significantly improved survival outcomes for advanced CRC^[37]3, the five-year survival rate for stage IV CRC remains at only 25–30%^[38]4,[39]5. In contrast, early-stage CRCs can achieve survival rates exceeding 80%^[40]6. Therefore, there is an urgent need for a superior biomarker to predict the risk of metastasis in CRC patients and to stratify prognosis accordingly. Tumor immune microenvironment (TME) plays a critical role in the metastatic process of CRC. Studies have discovered that CRC liver metastases possess a highly immunosuppressive phenotype, primarily characterized by the enrichment of specific tumor-associated macrophages (TAMs)^[41]7. TAMs display considerable plasticity and heterogeneity. Various components of TME, such as cytokines, chemokines, and exosomes, recruit TAMs to the tumor area and drive TAMs toward a polarization state on a spectrum between anti-tumor (M1-like) and pro-tumor (M2-like) phenotypes. The prevalence of pro-tumorigenic TAMs is a critical factor in tumor progression, chemotherapy resistance, and suboptimal responses to immune checkpoint blockade (ICB)^[42]8. Numerous studies have utilized single-cell RNA sequencing (scRNA-seq) to dissect pro-tumorigenic TAM populations in CRC, uncovering key functional subgroups that go beyond the conventional M1/M2 paradigm. For instance, Sathe et al. have identified a distinct SPP1 + macrophage cell state in the mCRC with fibrogenic properties and altered metabolism^[43]9. Meanwhile, Bao et al. developed an immunometabolism subtyping system and found that S100A9 + macrophage population contributes to the C3 subtype, which exhibits the poorest prognosis. A combination treatment involving PD-1 blockade and the S100A9 inhibitor has been shown to reverse the dysfunctional immunotherapy response in the C3 subtype^[44]10. Moreover, studies have noted metabolic activity variations among different TAM subgroups. M1 macrophages tend to favor glycolysis secreting large amounts of lactate for rapid energy supply. In contrast, M2 macrophages often show enhanced fatty acid oxidation capabilities, generating substantial ATP as a vital energy source^[45]11,[46]12. Metabolic reprogramming has emerged as one of the significant breakthroughs in macrophage-targeted interventions and enhancing anti-tumor efficacy. A recent study suggests that the expression of the metabolic enzyme IRG1 in TAM subsets can enhance their immunosuppressive function. Targeting IRG1 may induce TAMs to enter an anti-tumor mode and improve responses to anti-PD-(L)1 immunotherapy^[47]13. Therefore, elucidating the role of TAMs in CRC metastasis and immune metabolic reprogramming could aid in prognosis prediction and the development of promising therapeutic strategies. In this study, we delineate the cellular landscape of primary colorectal cancer (CRC) lesions and liver metastases, identifying a metastasis-specific EEF1G + TAM subgroup based on scRNA-seq datasets. Next, we constructed a novel prognostic signature comprising eight EEF1G + TAM-related genes (EMGS), which can significantly and independently predict overall survival. We validated the prognostic value of EMGS in three external cohorts. Furthermore, we explored the tumor biology, metabolic characteristics, immune microenvironment, and immunotherapy responses in EMGS-high and EMGS-low patients. Our results provide new insights into TAM functions during metastasis, leading to the discovery of precise therapeutic targets and improved outcomes for CRC patients. Results Cellular landscape of primary CRC and liver metastasis To understand the single-cell level landscape of primary colorectal cancer (CRC) and liver metastases, we reanalyzed the scRNA-seq dataset [48]GSE231559^[49]14. This dataset initially comprised 26 samples from 18 patients with colorectal cancer. Following quality filtering of the single-cell sequencing data, the final analysis included 24 samples from 17 patients. These samples consisted of 4 colorectal cancer tissues (Col_T), 3 adjacent normal tissues (Col_N), 9 liver metastasis tissues (Liv_T), and 8 liver normal samples (Liv_N). For data processing, principal component analysis (PCA) was performed with n_comps = 50, followed by the construction of neighborhood graphs with the number of neighbors set to 10. Using the Louvain algorithm, we identified 23 unique cell clusters, which were relatively evenly distributed across all samples (Fig. [50]1A, Supplementary File Figure [51]S1A), suggesting that there is no significant batch effect interference among the samples. Cell types were annotated using CellTypist based on known marker genes, categorizing 56,163 cells into 9 major cell types (Fig. [52]1B). T cells and myeloid cells had the highest cell numbers, with 36,566 and 6,549 cells, respectively (Supplementary File Table [53]S1). T cells, NK cells, myeloid cells, mast cells, B cells, plasma cells, epithelial cells, fibroblast cells, and endothelial cells exhibited distinguishing canonical marker gene expression patterns (Fig. [54]1C, Supplementary File Figure [55]S1B-D). Notably, CD3D, NKG7, S100A9, CPA3, MS4A1, MZB1, KRT18, DCN, and CLDN5 were highly specifically expressed in their respective cell types (Fig. [56]1D, Supplementary File Figure [57]S1B). These results demonstrate that we have clearly delineated the distinct cell lineages in CRC liver metastasis patient samples. Fig. 1. [58]Fig. 1 [59]Open in a new tab Analysis of the colorectal cancer single-cell atlas. (A) Uniform Manifold Approximation and Projection (UMAP) visualization of 56,163 cells categorized into 23 distinct clusters. (B) UMAP delineation of nine principal cell lineages (T cells, NK cells, Myeloid cells, Mast cells, B cells, Plasma cells, Epithelial cells, Fibroblasts, and Endothelial cells) derived from CRC patient samples. (C) Violin plots depicting the distribution of a selected canonical marker gene expression across each of the major cell lineages. (D) UMAP representations of the nine principal cell lineages, each highlighted by the expression of its respective canonical marker gene. Characterization of different myeloid cell subsets Immunosuppressive myeloid cells are an indispensable driver in tumor progression and immunotherapy resistance in CRC. Therefore, we further analyzed different myeloid cell subsets to identify key subgroups associated with metastasis. A total of 6,627 myeloid cells were included in the analysis and further classified into 9 subtypes (Fig. [60]2A-B). These myeloid cells were primarily composed of four cell types: macrophages, monocytes, dendritic cells, and a small number of mast cells. Distinct marker gene expression differences were observed among the four myeloid cell subsets (Fig. [61]2C-D, Supplementary File Figure S2A-C), and UMAP visualization of marker gene expression also revealed subset-specific expression patterns (Fig. [62]2E). We assessed the proportion of different myeloid cell subsets across various sample types. Notably, macrophages were significantly enriched in liver metastasis lesions, while they were less prevalent in normal liver tissue, with monocytes showing the opposite pattern (Fig. [63]2F). This prominent presence of macrophages in liver metastases highlights their immunosuppressive effect, aiding tumor metastasis and colonization. Collectively, characterizing tumor-associated macrophages in CRC may provide valuable insights for understanding tumor metastasis and predicting patient prognosis. Fig. 2. [64]Fig. 2 [65]Open in a new tab Gene expression profiling of myeloid cell clusters in CRC patients. (A) A UMAP visualization of myeloid cells delineated into nine distinct clusters. (B) A detailed UMAP plot of 6,549 myeloid cells, categorized by four myeloid cell phenotypes. (C) Violin plots depicting the expression levels of three canonical marker genes for each myeloid cell type (Macrophages: APOE, C1QA, C1QB; Monocytes: S100A9, FCN1, S100A8; Dendritic Cells: IDO1, XCR1, CLEC9A; Mast Cells: CPA3, KIT, MS4A2). (D) A heatmap revealing the expression patterns of three representative genes across macrophages, monocytes, dendritic cells, and mast cells. (E) UMAP plot of the same representative genes highlight their expression within macrophages, monocytes, dendritic cells, and mast cells. (F) Proportional distribution of four myeloid cell types in each sample. Construction of prognostic signature based on metastasis-specific macrophages To identify key macrophage subgroups promoting tumor metastasis, we performed sub-cluster analysis on 4,319 macrophage cells. This analysis yielded four macrophage phenotypes, labeled as EEF1G+, C1QC+, SEPT2+, and S100A8 + macrophages (Fig. [66]3A-B, Supplementary File Figure S3A-B). Notably, EEF1G + macrophages were predominantly found in liver metastases (Fig. [67]3A), with their marker genes significantly more expressed in liver metastases than in primary lesions (Fig. [68]3C). C1QC + macrophage marker genes had similar expression levels in both liver metastases and primary lesions. SEPT2 + and S100A8 + macrophages were primarily expressed in primary lesions and reduced in liver metastases (Fig. [69]3C). Additionally, we compared our four macrophage subtypes with the previously reported seven pan-cancer TAM subtypes^[70]15. We found that EEF1G + macrophages predominantly expressed marker genes associated with Inflam-TAMs and IFN-TAMs, while also specifically exhibiting high expression of the Angio-TAMs marker gene VEGFA. Notably, TAM-derived VEGFA can facilitate metastasis by promoting various aspects of tumor progression. In contrast, C1QC + macrophages showed dominant expression of LA-TAMs marker genes, indicating a potential role in lipid metabolism and tissue remodeling. The SEPT2 + and S100A8 + macrophages exhibited moderate expression of marker genes from various oncogenic TAM subtypes (Fig. [71]3D). Taken together, we consider EEF1G + macrophages to be the primary metastasis-specific macrophage subgroup. We further explored currently available single-cell datasets for colorectal cancer and reanalyzed [72]GSE178318, which comprises paired primary and liver metastasis samples from six patients (Supplementary File Figure S3C and S3D). In this dataset, we observed a significantly higher prevalence of the EEF1G + macrophage subpopulation in liver metastases compared to primary tumors, thereby validating the existence of the liver metastasis-specific macrophage subpopulation identified in this study. Differentially expressed genes (DEGs) analysis for EEF1G + macrophages, compared to other macrophage cell types, revealed 1,264 up-regulated and 107 down-regulated genes (Fig. [73]3E). We assessed the prognostic value of these DEGs using the expression and survival data of TCGA-COAD cohort. Then, we applied the LASSO-Cox proportional hazards regression model to these prognostic DEGs, ultimately identifying a prognostic signature composed of 8 genes. We defined a risk model based on the EEF1G + macrophage-related gene signature as the EMGS (EEF1G + Macrophage Gene Signature). Subsequently, for each patient, the EMGS was calculated using the formula: EMGS = (0.002 × HIF1A-AS3 expression) + (0.359 × [74]AL645608.7 expression) + (0.022 ×INO80B expression) + (0.154 × HSPA1A expression) + (0.181 × SNAI1 expression) + (0.424 × IFITM10 expression) + (0.248 × ZNF736 expression) + (0.296 × SNHG7 expression) (Fig. [75]3F). Fig. 3. [76]Fig. 3 [77]Open in a new tab Gene expression profiling of macrophage cell cluster in CRC patients. (A) UMAP visualization categorizes macrophage cells into normal colon (Col_N), colon tumor (Col_T), normal liver (Liv_N), and liver tumor (Liv_T) groups. (B) A refined UMAP plot of 4,319 macrophage cells, classified into four macrophage phenotypes (Mph-EEF1G, Mph-C1QC, Mph-SEPT2, and Mph-S100A8). (C) Violin plots illustrate the expression profiles of three key marker genes within the Col_T and Liv_T macrophage populations. (D) The stacked violin plot shows the expression levels of the marker genes, such as IL1B, CCL3, ISG15, CXCL8, ARG1, CD274, MKI67, CDK1, LYVE1, HES1, VEGFA, SPP1, APOC1, and APOE, associated with specific functional states like inflammatory TAMs (Inflam-TAMs), IFN-TAMs, regulatory TAMs (Reg-TAMs), proliferative TAMs (Prolif-TAMs), resident TAMs (RTM-TAMs), angiogenic TAMs (Angio-TAMs), and lipid-associated TAMs (LA-TAMs). Each row represents different macrophage subtypes identified in the study, including Mph-EEF1G, Mph-C1QC, Mph-SEPT2, and Mph-S100A8. (E) A volcano plot highlights differentially expressed genes between the Mph-EEF1G phenotype and other macrophage cell types. (F) Forest plots present hazard ratios (HRs, depicted by pink circles) and 95% confidence intervals (represented by horizontal lines) from Cox regression survival analyses, correlating the overall survival with eight macrophage signature genes. Performance of the EMGS Signature in training and validation cohorts After calculating the EMGS for each patient, we used the median EMGS as a cutoff to divide patients into EMGS-high and EMGS-low groups. In the TCGA-COAD training set, the EMGS-high group demonstrated significantly higher mortality risk and lower overall survival rates (Fig. [78]4A, HR = 2.5, 95% CI: 1.77–3.54, P < 0.0001). Time-dependent ROC curves were employed to estimate the prognostic accuracy of the EMGS model, revealing 1-, 3-, and 5-year AUC values of 0.762, 0.682, and 0.693, respectively (Fig. [79]4B). Additionally, an overview illustrating risk score, grouping, survival time and status, along with the expression levels of model genes, showed that an increase in risk score was associated with a higher number of patient deaths and an increase in the expression levels of model genes (Fig. [80]4C). Furthermore, to validate the generalizability of the prognostic model, we calculated EMGS for patients in three independent GEO cohorts. The EMGS-high group had a significantly lower overall survival rate compared to the EMGS-low group in all three datasets: [81]GSE29621 (Fig. [82]4D, HR: 2.49, 95% CI: 1.29–4.82, P = 0.027), [83]GSE17536 (Fig. [84]4E, HR: 2.04, 95% CI: 1.38–3, P = 0.0024), and [85]GSE39582 (Fig. [86]4F, HR: 1.4, 95% CI: 1.1–1.78, P = 0.022). Finally, we performed multivariate Cox proportional hazards analysis using EMGS along with clinical factors such as age, gender, and tumor stage. The results indicated that EMGS is an independent predictor of overall survival (OS) in the TCGA-COAD cohort (P = 0.01, Fig. [87]4G). In summary, these results demonstrate the superior efficacy of the EMGS risk score as a prognostic signature. Fig. 4. [88]Fig. 4 [89]Open in a new tab Survival analysis in high-risk and low-risk groups based on EMGS. (A) Kaplan–Meier curves compared the overall survival of TCGA-COAD patients between high-risk and low-risk groups. (B) ROC curves of the EMGS score for predicting the risk of death at 1, 3, and 5 years. (C) The distribution of risk score (top), survival status (middle), and expression (bottom) of the identified eight EMGS genes. Validation of EMGS in (D) [90]GSE29621 (n = 65) cohort, (E) [91]GSE17536 (n = 177) cohort and (F) [92]GSE39582 (n = 556) cohort. (G) Forest plot shows HRs and 95% confidence intervals (horizontal ranges) derived from multivariate cox regression analyses including EMGS group, age, gender, and stage. Functional pathways and metabolic alterations associated with the EMGS signature To elucidate the biological mechanisms for the poorer prognosis associated with a high EMGS score, we initiated a differential expression analysis (DEG) between the high and low-risk groups. We found significant upregulation of 2779 genes and downregulation of 16 genes in the high-risk group (Supplementary File Table S2). Gene Ontology annotation of the DEGs revealed enrichment in biological processes associated with external encapsulating structure organization and extracellular matrix organization, with cellular components and molecular functions closely linked to the cell membrane and extracellular matrix (Fig. [93]5A). Gene Set Enrichment Analysis (GSEA) indicated that the high-risk group predominantly enriches in pathways related to epithelial-mesenchymal transition (EMT) and angiogenesis, key processes involved in metastasis, while downregulating pathways like the interferon alpha signaling pathway (Supplementary File Fig. [94]5B-C). Additionally, using gene set variation analysis (GSVA) to calculate sample-wise gene set enrichment scores, we observed significantly enhanced activity of the EMT and TGFβ pathways in the high-risk group compared to the low-risk group (Fig. [95]5D). These results suggest a greater metastatic potential in the high-risk group. Furthermore, we calculated the metabolic fluxes for 13,082 reactions and 142 metabolic pathway activities using METAFlux (METAbolic Flux balance analysis). We identified 166 fluxes with significant differences between the high and low-risk groups (Supplementary File Table S3), involving processes such as glycolysis, purine metabolism, nucleotide metabolism, and beta oxidation of fatty acids. By integrating fluxes of reactions associated with specific pathways, we found that at the metabolic pathway level, the high-risk group significantly downregulated beta oxidation of fatty acids, folate metabolism, and sulfur metabolism, while upregulating inositol phosphate metabolism and ether lipid metabolism (Fig. [96]5E, Supplementary File Table S4). These findings highlight the association between the EMGS signature and lipid metabolism reprogramming, reflecting the potential role of targeting metabolic pathways in cancer treatment. Fig. 5. [97]Fig. 5 [98]Open in a new tab Functional differences between EMGS high and low groups. (A) Gene Ontology analysis of differentially expressed genes between EMGS high and low groups. (B) GSEA analysis of hallmark pathways. (C) GSEA enrichplot displaying significantly upregulated and downregulated top pathways in the EMGS high group. (D) GSVA activity analysis of hallmark EMT and TGF-β signaling pathway. (E) Differences in metabolic pathways between EMGS high and low groups. Immune microenvironment shaped by EMGS To determine the differences in the immune microenvironment between the high and low EMGS score groups, we first deconvoluted bulk expression matrices to infer the proportions of various immune and stromal cell types. CIBERSORT indicated a notably higher proportion of M0 and M2 macrophages compared to the low-risk group, reflecting a positive association between the EMGS score and M2-type macrophages. Conversely, the low-risk group exhibited an elevated percentage of CD8 + T cell (Fig. [99]6A). MCPcounter also found that the high-risk group was significantly enriched in monocytic lineage, neutrophils, endothelial cells, and fibroblasts (Fig. [100]6B). Consistently, ESTIMATE indicated that the high-risk group had significantly higher stromal and ESTIMATE scores and lower tumor purity, although there were no significant differences in the immune score between the two groups (Supplementary File Figure S4A). Additionally, we characterized several gene signatures associated with immune evasion through T-cell exclusion. Our analysis revealed that patients in the high-EMGS group had notably elevated Cancer Associated Fibroblasts (CAF) scores (Fig. [101]6C), T-cell exclusion characteristics (Fig. [102]6D), and a higher TIDE score (Fig. [103]6E). These observations were consistent across three other independent validation datasets from GEO (Supplementary File Figure S4B-D). Collectively, these findings suggest that the higher EMGS score correlates with an immunosuppressive tumor microenvironment, which may contribute to reduced responsiveness to immune checkpoint blockade therapies and poorer clinical outcomes. Fig. 6. [104]Fig. 6 [105]Open in a new tab Differences in the immune microenvironment between EMGS high and low groups. (A) Proportional distribution of 22 immune cell types obtained through CIBERSORT deconvolution. (B) Proportional distribution of eight immune and two stromal cell populations calculated by MCPcounter. For EMGS high and low groups, scores calculated using the ssGSEA algorithm for (C) CAF score, (D) various immune exclusion signature scores, and (E) TIDE score. Prediction of drug response in EMGS-high patients Patients with a high EMGS score, due to their greater metastatic potential and poorer prognosis, urgently require effective drug treatments. Here, we predicted drugs with differential responses between high and low EMGS patients based on drug sensitivity data from colorectal cancer cell lines. The analysis revealed that the EMGS-high group is more sensitive to three anticancer drugs compared to the EMGS-low group: docetaxel (Fig. [106]7A), a chemotherapy drug used to treat multiple metastatic and non-resectable tumor types, and two targeted cancer drugs, pazopanib and sorafenib (Fig. [107]7B-C), which are widely used in the treatment of hepatocellular carcinoma and kidney cancer patients, respectively. Additionally, we utilized the DrugMap database ([108]http://drugmap.idrblab.net/) to search for Drug Therapeutic Targets among genes related to EMGS. We discovered that for the gene HSPA1A, which encodes Heat shock protein 70, several drugs are currently in development or clinical trials (Fig. [109]7D). These drugs may bring clinical benefits for the long-term survival of EMGS-high patients. Fig. 7. [110]Fig. 7 [111]Open in a new tab Predicted effective anticancer drugs for the EMGS high group. The IC50 values of three anticancer drugs were predicted by the pRRophetic algorithm and showed significant differences in sensitivity between EMGS high and low groups: (A) Docetaxel, (B) Pazopanib, (C) Sorafenib. (D) Targeted drugs for the EMGS gene HSPA1A identified using the DrugMap database. Discussion Macrophages are one of the most abundant immune cells in colorectal cancer. However, their function remains controversial due to the presence of various phenotypes. While several studies have identified specific functional subgroups of macrophages using single-cell sequencing techniques, few have linked these findings to patient prognosis or therapeutic target development. In this study, using scRNA-seq datasets from primary colorectal cancer lesions and liver metastases, we identified four unique macrophage subgroups. Notably, EEF1G + macrophages were predominantly found in liver metastases, with their marker genes significantly overexpressed compared to primary lesions. Therefore, we propose that EEF1G + macrophages may be the primary cell subgroup promoting liver metastasis. Based on the specific expression of genes in EEF1G + macrophages, we constructed a novel prognostic signature comprising eight EEF1G + macrophage-related genes (EMGS), which can significantly and independently predict overall survival. Recent advancements in single-cell sequencing have enabled a deeper understanding of the functional roles and potential applications of different cell types in colorectal cancer (CRC). For example, a study profiling three left-sided and three right-sided CRC patients using single-cell RNA sequencing revealed that M2-like macrophages were more prevalent in left-sided CRC and were associated with a favorable prognosis in CRC^[112]16. Liu et al. explored the immune phenotypic linkage between CRC and liver metastasis through single-cell analyses, identifying that SPP1 + macrophages were predominant in liver metastases and played a pro-metastatic role^[113]17. Additionally, a recent study found that tumor-associated macrophages and granulocytic myeloid-derived suppressor cells were significantly increased in CRC compared to their counterparts in normal tissue, displaying strong immune-inhibitory signatures. Further research suggested that Sirpa−/− macrophages exhibited enhanced phagocytosis and antigen presentation, promoting T cell activation and proliferation, indicating that Sirpa blockade could be a promising strategy to enhance cancer immunotherapy^[114]18.Beyond macrophages, epithelial cell subpopulation analyses have also identified novel metastatic and malignant traits. For instance, Joanito et al. integrated single-cell and bulk transcriptome sequencing to identify two intrinsic subtypes of epithelial tumor cells (iCMS2 and iCMS3), discovering that the consensus molecular subtype (CMS) classification 4 cancers with iCMS3 epithelium had poorer prognoses^[115]19. Another study focusing on primary CRC and liver or ovarian metastases identified a stem-like epithelial cell cluster with high PTPRO and ASCL2 expression. This cluster drives CRC metastasis, with distinct subpopulations contributing to organ-specific metastasis^[116]20. Moreover, research has also characterized CRC-specific fibroblast populations, including F3 + fibroblasts enriched in primary tumors and MCAM + fibroblasts enriched in liver metastases, which have opposing prognostic impacts. Additionally, CD8_CXCL13 and CD4_CXCL13 subpopulations were significantly increased in liver metastasis samples, with increased CXCL13 expression correlating with better prognosis in CRC^[117]21. These findings provide a multidimensional understanding of CRC metastasis and progression through the lens of diverse cell type. EEF1G encodes the Eukaryotic translation elongation factor 1 gamma, playing a pivotal role in the translation elongation process. It has been reported to be overexpressed in various cancers, including gastric carcinoma^[118]22, colon adenocarcinoma^[119]23, and pancreatic cancer^[120]24. A pan-cancer analysis of seven elongation factors, using public databases (Oncomine and TCGA), indicated that the transcription level of EEF1G and its prognostic impact are cancer-specific^[121]25. For instance, EEF1G expression is downregulated in breast and gastric cancers, yet its increased transcription level predicted favorable survival outcomes in breast cancer and was associated with poor prognosis in gastric cancer. Additionally, high EEF1G expression is linked to worse prognosis in lung, pancreatic, and liver cancers, while its elevated levels predicted better survival in colorectal, brain and CNS, ovarian, and kidney cancers. In this study, we observed that EEF1G is significantly overexpressed in liver metastases compared to primary lesions, suggesting its adverse prognostic role, which contrasts with the results from bulk RNA-Seq, indicating the expression specificity of this gene in different cell types. Leveraging single-cell sequencing technology, our findings reveal the potential role of EEF1G in promoting metastasis. Our analysis indicates that patients with a high EMGS score, compared to those with a low score, exhibit tumor biology enriched in gene ontology related to the extracellular matrix, as well as cancer hallmarks such as epithelial-mesenchymal transition (EMT) and angiogenesis. These findings align with previous studies revealing that tumor-associated macrophages induce the EMT process by secreting a multitude of cytokines and growth factors, like transforming growth factor-beta (TGF-β), thereby promoting tumor invasion and metastasis^[122]26,[123]27. This suggests that EMGS score reflects the pro-tumorigenic characteristics of TAMs and may serve as a predictive factor for the liver metastasis potential in colorectal cancer patients. Moreover, we identified notable metabolic differences between high and low EMGS groups, particularly in fatty acid, phosphatidylinositol, and ether lipid metabolism. Prior studies have shown that lipid metabolism reprogramming in TAMs can regulate their phenotype and function, thus manifesting pro-tumorigenic or anti-tumorigenic effects^[124]28,[125]29. In addition, a significant reduction in folate metabolism activity was observed in the high EMGS group. Studies have proposed variations in folate uptake and accumulation between macrophages of different polarization states^[126]30,[127]31. These results indicate that the EMGS high and low groups represent distinct macrophage states with clear metabolic differences. Targeting specific metabolic pathways to shift TAM polarization towards an M1-like phenotype could offer a potent approach in cancer treatment strategies. In the EMGS high group, an abundance of monocytes including M0 and M2 macrophages, along with endothelial and fibroblast cells, was detected via deconvolution algorithms like CIBERSORT and MCPcounter. Moreover, this group exhibited notably elevated stromal scores and reduced tumor purity, while the immune scores remained comparable between the two groups, as validated by ESTIMATE. These results suggest that immune cells do not play a primary role in the differing biological behaviors between the high and low-risk groups. The high-risk tumors are primarily influenced by various regulatory mechanisms of stromal cells, including providing growth factors, promoting angiogenesis, and altering the composition and stiffness of the extracellular matrix. Consistently, the high-risk group also exhibited significantly higher scores for CAF and various immune exclusion signatures, calculated using the ssGESA algorithm. Additionally, the high-risk group showed a substantially elevated TIDE score, an indicator used to predict tumor response to immune checkpoint blockade treatments. Overall, these results suggest that tumors in the high-risk group may have stronger tumor microenvironment-driven progression mechanisms, including active involvement of CAFs, enhanced immune evasion, and potential resistance to current immunotherapeutic strategies. Therefore, a high EMGS score derived from the 8-gene risk model may identify a subset of colorectal cancer patients who are less likely to respond to immunotherapy, even within the dMMR/MSI-H subtype. Additionally, in these patients, the efficacy of combining VEGF inhibitors with ICI therapy could be explored. Recent studies have reported that crosstalk between CAFs and TAMs regulates the immune microenvironment, promoting tumor progression and immune escape^[128]32–[129]34. Modulating the interaction and/or differentiation of CAFs and TAMs may significantly enhance the development of future targeted therapeutic approaches. Nevertheless, the current study has some limitations. Firstly, the metastasis-specific EEF1G + macrophages were identified from scRNA-Seq datasets of only four colorectal cancer tissues and nine liver metastasis tissues, necessitating validation of this subgroup’s functional characteristics in a larger sample size. Additionally, while our study emphasized macrophage-related prognostic features, it also indicated potential interactions with other immune and stromal cells like CAFs. Therefore, combining other tumor microenvironment components may further enhance the model’s performance. Lastly, the drugs identified for the high-risk group should be substantiated through further clinical trials. In summary, the EMGS score developed in this study demonstrated superior prognostic performance in TCGA and three independent datasets. The enrichment of metastasis-related pathways and the formation of an immunosuppressive microenvironment in the EMGS high group also underscore the role of EMGS in predicting patients’ metastatic potential. This study thus contributes valuable insights into the key cellular components and molecular mechanisms of the colorectal cancer metastatic process. Methods and materials Collection of single-cell RNA sequencing and GEO datasets The original scRNA-seq dataset from fresh colorectal cancer (CRC) tissues and adjucent normal tissues was sourced from [130]GSE231559^[131]14. This dataset comprised data from 17 patients, totaling 24 samples. These included 4 fresh colorectal cancer tissues (Col_T), 3 adjucent normal samples (Col_N), 9 liver metastasis samples (Liv_T) and 8 liver normal samples (Liv_N). Gene expression profiles for colorectal cancer (CRC) cohorts were obtained from The Cancer Genome Atlas (TCGA) portal, accessible at [132]https://portal.gdc.cancer.gov/. We accessed raw read counts and associated clinical information including age, gender, cancer stage, overall survival (OS), and vital status of CRC patients from the UCSC Xena website ([133]https://xenabrowser.net/datapages/). To further validate the utility of the prognostic model, we utilized microarray data and clinical details from the NCBI Gene Expression Omnibus (GEO) database, specifically for [134]GSE29621 (n = 65), [135]GSE17536 (n = 177), and [136]GSE39582 (n = 556). Single cell analysis and cell clustering The single-cell dataset matrix was imported into Scanpy (version 1.9.1) [PMID: 29409532] for comprehensive analysis. We performed principal component analysis (PCA) using the ‘sc.tl.pca’ function, setting the number of components to 50 (n_comps = 50). Neighborhood graphs were constructed using ‘sc.pp.neighbors’ with the number of neighbors set to 10 (n_neighbors = 10). Dimensionality reduction was carried out on highly variable genes, and cell clusters were identified using the Louvain algorithm through the ‘sc.tl.louvain’ module, setting the resolution parameter at 0.4. These cell clusters were visualized using the uniform manifold approximation and projection (UMAP) technique via the ‘sc.pl.umap’ module. Cell cluster annotations were determined based on the expression of known marker genes, utilizing CellTypist ([137]https://github.com/Teichlab/celltypist)^[138]35. Differential expressed gene analysis We compared the expression of individual genes in each cluster with the remaining cells using the ‘sc.tl.rank_genes_groups’ module, employing the Wilcoxon rank sum test. A gene was classified as up-regulated or down-regulated based on a significance threshold of P < 0.05. The cutoff criteria for determining up-regulation or down-regulation were set at a log(fold change) ≥ 1 and ≤ − 1, respectively. Construction and validation of the prognostic signature To ascertain the prognostic value of marker genes ranging from EEF1G + macrophages to others in TCGA-COAD patients, we conducted a univariate Cox regression analysis focusing on overall survival (OS). Genes with a P-value less than 0.05 were identified as prognostic genes. Subsequently, these genes were assessed using the least absolute shrinkage and selection operator (LASSO) method in Cox proportional hazards regression, employing the ‘glmnet’ package in R. Based on this analysis, we developed a risk model by linearly combining the mRNA expression levels of these genes with their respective risk coefficients. This process identified eight genes as key candidates for further prognosis-related investigation (EMGS model, EEF1G + Macrophage Gene Signature). Patients were divided into two groups based on the median EMGS score: high-risk and low-risk group. To validate the prognostic significance of the EMGS score, we generated receiver operating characteristic (ROC) curves^[139]36 and calculated the area under the curve (AUC) using the ‘survivalROC’ package in R, thereby evaluating the accuracy of the EMGS model. Survival analysis Kaplan-Meier (KM) curves were employed to display the cumulative survival rates between the high and low EMGS groups in the TCGA-COAD cohort and three independent GEO datasets. Additionally, Multivariable Cox regression analysis was performed to assess the independence of the EMGS prognostic model. The aforementioned analyses were conducted using the ‘survival’ and ‘survminer’ R packages. Pathway enrichment analysis The DESeq2 R package was utilized to identify differentially expressed genes between the high and low EMGS groups. Reference gene sets were sourced from c5.go.v2023.2.Hs.symbols.gmt and h.all.v2023.2.Hs.symbols.gmt from the Molecular Signatures Database (MSigDB). Gene Ontology and Gene Set Enrichment Analysis (GSEA) were conducted using the ‘clusterProfiler’ R package (v3.18.1). Gene Set Variation Analysis (GSVA) was employed to differentiate the activity of enriched pathways between the EMGS-high and EMGS-low groups. Using the ssGSEA algorithm within the ‘GSVA’ R package, we calculated the enrichment scores for each gene set in the TCGA-COAD samples. Metabolic fluxes and pathway activities were calculated and normalized from gene expression data using METAFlux^[140]37. Immune infiltration analysis The composition of immune cells was inferred using the CIBERSORT and MCPcounter algorithms through the Immuno-Oncology Biological Research (IOBR) R package^[141]38. The ESTIMATE algorithm was employed to calculate the proportions of immune and stromal components within the tumor microenvironment (TME) of each sample^[142]39. This calculation produces three scores: ImmuneScore, StromalScore, and ESTIMATEScore, corresponding to the immune component, stromal component, and their combined total, respectively. A higher score indicates a greater proportion of the respective component in the TME. Additionally, the Tumor Immune Dysfunction and Exclusion (TIDE) algorithm was utilized to predict potential responses to immune checkpoint blockade (ICB) therapies^[143]40. Patients with higher TIDE scores are typically more susceptible to antitumor immune evasion, often correlating with lower ICB treatment response rates. The prediction of response to targets agents We evaluated the response of each sample to targeted agents by calculating the half-maximal inhibitory concentration (IC50) using the ‘pRRophetic’ R package, which utilizes data from the Genomics of Drug Sensitivity in Cancer (GDSC) database ([144]https://www.cancerrxgene.org/). Our investigation focused on identifying variations in sensitivity to these targeted agents between high-risk and low-risk groups. Statistics analysis We assessed differences of statistical significance using a two-tailed Student’s t-test or Wilcoxon test on the R platform (version 4.0.3). Multivariate analysis was conducted employing the Cox proportional hazards model, using the R packages ‘survival’, ‘survminer’, and ‘forestplot’. This analysis aimed to identify independent factors associated with overall survival (OS) in both TCGA-COAD and GEO cohorts. P values were adjusted for multiple comparisons using the false discovery rate (FDR) method, with FDR values less than 0.05 considered statistically significant. Electronic supplementary material Below is the link to the electronic supplementary material. [145]Supplementary Material 1^ (14.4MB, pdf) Acknowledgements