Abstract

   This study develops a prognostic model to predict metastasis and
   prognosis in colorectal cancer liver metastases by identifying distinct
   macrophage subsets. Using scRNA-seq data from primary colorectal cancer
   and liver metastases, we dissected the cellular landscape to find
   unique macrophage subpopulations, particularly EEF1G + macrophages,
   which were prevalent in liver metastases. The study leveraged data from
   [34]GSE231559, TCGA, and GEO databases to construct an 8-gene risk
   model named EMGS, based on the EEF1G + macrophage gene signature.
   Patients were divided into high-risk and low-risk groups using the
   median EMGS score, with the high-risk group showing significantly worse
   survival. This group also demonstrated upregulated pathways associated
   with tumor progression, such as epithelial-mesenchymal transition and
   angiogenesis, and downregulated metabolic pathways. Moreover, the
   high-risk group presented an immunosuppressive microenvironment, with a
   higher TIDE score indicating lower effectiveness of immunotherapy. The
   study identifies potential drugs targeting the high-risk group,
   suggesting therapeutic avenues to improve survival. Conclusively, the
   EMGS score identifies colorectal cancer patients at high risk of liver
   metastases, highlighting the role of specific macrophage subsets in
   tumor progression and providing a basis for personalized treatment
   strategies.

Supplementary Information

   The online version contains supplementary material available at
   10.1038/s41598-024-77248-2.

   Keywords: scRNA-seq, Colorectal cancer, Macrophage, Metastatic,
   Prognosis, Metabolic, Immunosuppressive

   Subject terms: Gastrointestinal cancer, Tumour biomarkers, Tumour
   immunology

Introduction

   Colorectal cancer (CRC) is among the most common malignant tumors and
   the second leading cause of cancer-related deaths. In the United
   States, the estimated number of new cases and deaths for CRC in 2023
   are 153,020 and 52,550, respectively^[35]1. Metastasis is the primary
   cause of death in CRC patients. with liver metastasis occurring in over
   half of the cases^[36]2. Despite advances in surgical resection of
   liver metastases, the utilization of immunotherapy, targeted molecular
   therapies, and third-line chemotherapeutic agents, which have
   significantly improved survival outcomes for advanced CRC^[37]3, the
   five-year survival rate for stage IV CRC remains at only
   25–30%^[38]4,[39]5. In contrast, early-stage CRCs can achieve survival
   rates exceeding 80%^[40]6. Therefore, there is an urgent need for a
   superior biomarker to predict the risk of metastasis in CRC patients
   and to stratify prognosis accordingly.

   Tumor immune microenvironment (TME) plays a critical role in the
   metastatic process of CRC. Studies have discovered that CRC liver
   metastases possess a highly immunosuppressive phenotype, primarily
   characterized by the enrichment of specific tumor-associated
   macrophages (TAMs)^[41]7. TAMs display considerable plasticity and
   heterogeneity. Various components of TME, such as cytokines,
   chemokines, and exosomes, recruit TAMs to the tumor area and drive TAMs
   toward a polarization state on a spectrum between anti-tumor (M1-like)
   and pro-tumor (M2-like) phenotypes. The prevalence of pro-tumorigenic
   TAMs is a critical factor in tumor progression, chemotherapy
   resistance, and suboptimal responses to immune checkpoint blockade
   (ICB)^[42]8. Numerous studies have utilized single-cell RNA sequencing
   (scRNA-seq) to dissect pro-tumorigenic TAM populations in CRC,
   uncovering key functional subgroups that go beyond the conventional
   M1/M2 paradigm. For instance, Sathe et al. have identified a distinct
   SPP1 + macrophage cell state in the mCRC with fibrogenic properties and
   altered metabolism^[43]9. Meanwhile, Bao et al. developed an
   immunometabolism subtyping system and found that S100A9 + macrophage
   population contributes to the C3 subtype, which exhibits the poorest
   prognosis. A combination treatment involving PD-1 blockade and the
   S100A9 inhibitor has been shown to reverse the dysfunctional
   immunotherapy response in the C3 subtype^[44]10. Moreover, studies have
   noted metabolic activity variations among different TAM subgroups. M1
   macrophages tend to favor glycolysis secreting large amounts of lactate
   for rapid energy supply. In contrast, M2 macrophages often show
   enhanced fatty acid oxidation capabilities, generating substantial ATP
   as a vital energy source^[45]11,[46]12. Metabolic reprogramming has
   emerged as one of the significant breakthroughs in macrophage-targeted
   interventions and enhancing anti-tumor efficacy. A recent study
   suggests that the expression of the metabolic enzyme IRG1 in TAM
   subsets can enhance their immunosuppressive function. Targeting IRG1
   may induce TAMs to enter an anti-tumor mode and improve responses to
   anti-PD-(L)1 immunotherapy^[47]13. Therefore, elucidating the role of
   TAMs in CRC metastasis and immune metabolic reprogramming could aid in
   prognosis prediction and the development of promising therapeutic
   strategies.

   In this study, we delineate the cellular landscape of primary
   colorectal cancer (CRC) lesions and liver metastases, identifying a
   metastasis-specific EEF1G + TAM subgroup based on scRNA-seq datasets.
   Next, we constructed a novel prognostic signature comprising eight
   EEF1G + TAM-related genes (EMGS), which can significantly and
   independently predict overall survival. We validated the prognostic
   value of EMGS in three external cohorts. Furthermore, we explored the
   tumor biology, metabolic characteristics, immune microenvironment, and
   immunotherapy responses in EMGS-high and EMGS-low patients. Our results
   provide new insights into TAM functions during metastasis, leading to
   the discovery of precise therapeutic targets and improved outcomes for
   CRC patients.

Results

Cellular landscape of primary CRC and liver metastasis

   To understand the single-cell level landscape of primary colorectal
   cancer (CRC) and liver metastases, we reanalyzed the scRNA-seq dataset
   [48]GSE231559^[49]14. This dataset initially comprised 26 samples from
   18 patients with colorectal cancer. Following quality filtering of the
   single-cell sequencing data, the final analysis included 24 samples
   from 17 patients. These samples consisted of 4 colorectal cancer
   tissues (Col_T), 3 adjacent normal tissues (Col_N), 9 liver metastasis
   tissues (Liv_T), and 8 liver normal samples (Liv_N). For data
   processing, principal component analysis (PCA) was performed with
   n_comps = 50, followed by the construction of neighborhood graphs with
   the number of neighbors set to 10. Using the Louvain algorithm, we
   identified 23 unique cell clusters, which were relatively evenly
   distributed across all samples (Fig. [50]1A, Supplementary File Figure
   [51]S1A), suggesting that there is no significant batch effect
   interference among the samples. Cell types were annotated using
   CellTypist based on known marker genes, categorizing 56,163 cells into
   9 major cell types (Fig. [52]1B). T cells and myeloid cells had the
   highest cell numbers, with 36,566 and 6,549 cells, respectively
   (Supplementary File Table [53]S1). T cells, NK cells, myeloid cells,
   mast cells, B cells, plasma cells, epithelial cells, fibroblast cells,
   and endothelial cells exhibited distinguishing canonical marker gene
   expression patterns (Fig. [54]1C, Supplementary File Figure [55]S1B-D).
   Notably, CD3D, NKG7, S100A9, CPA3, MS4A1, MZB1, KRT18, DCN, and CLDN5
   were highly specifically expressed in their respective cell types
   (Fig. [56]1D, Supplementary File Figure [57]S1B). These results
   demonstrate that we have clearly delineated the distinct cell lineages
   in CRC liver metastasis patient samples.

Fig. 1.

   [58]Fig. 1
   [59]Open in a new tab

   Analysis of the colorectal cancer single-cell atlas. (A) Uniform
   Manifold Approximation and Projection (UMAP) visualization of 56,163
   cells categorized into 23 distinct clusters. (B) UMAP delineation of
   nine principal cell lineages (T cells, NK cells, Myeloid cells, Mast
   cells, B cells, Plasma cells, Epithelial cells, Fibroblasts, and
   Endothelial cells) derived from CRC patient samples. (C) Violin plots
   depicting the distribution of a selected canonical marker gene
   expression across each of the major cell lineages. (D) UMAP
   representations of the nine principal cell lineages, each highlighted
   by the expression of its respective canonical marker gene.

Characterization of different myeloid cell subsets

   Immunosuppressive myeloid cells are an indispensable driver in tumor
   progression and immunotherapy resistance in CRC. Therefore, we further
   analyzed different myeloid cell subsets to identify key subgroups
   associated with metastasis. A total of 6,627 myeloid cells were
   included in the analysis and further classified into 9 subtypes
   (Fig. [60]2A-B). These myeloid cells were primarily composed of four
   cell types: macrophages, monocytes, dendritic cells, and a small number
   of mast cells. Distinct marker gene expression differences were
   observed among the four myeloid cell subsets (Fig. [61]2C-D,
   Supplementary File Figure S2A-C), and UMAP visualization of marker gene
   expression also revealed subset-specific expression patterns
   (Fig. [62]2E). We assessed the proportion of different myeloid cell
   subsets across various sample types. Notably, macrophages were
   significantly enriched in liver metastasis lesions, while they were
   less prevalent in normal liver tissue, with monocytes showing the
   opposite pattern (Fig. [63]2F). This prominent presence of macrophages
   in liver metastases highlights their immunosuppressive effect, aiding
   tumor metastasis and colonization. Collectively, characterizing
   tumor-associated macrophages in CRC may provide valuable insights for
   understanding tumor metastasis and predicting patient prognosis.

Fig. 2.

   [64]Fig. 2
   [65]Open in a new tab

   Gene expression profiling of myeloid cell clusters in CRC patients. (A)
   A UMAP visualization of myeloid cells delineated into nine distinct
   clusters. (B) A detailed UMAP plot of 6,549 myeloid cells, categorized
   by four myeloid cell phenotypes. (C) Violin plots depicting the
   expression levels of three canonical marker genes for each myeloid cell
   type (Macrophages: APOE, C1QA, C1QB; Monocytes: S100A9, FCN1, S100A8;
   Dendritic Cells: IDO1, XCR1, CLEC9A; Mast Cells: CPA3, KIT, MS4A2). (D)
   A heatmap revealing the expression patterns of three representative
   genes across macrophages, monocytes, dendritic cells, and mast cells.
   (E) UMAP plot of the same representative genes highlight their
   expression within macrophages, monocytes, dendritic cells, and mast
   cells. (F) Proportional distribution of four myeloid cell types in each
   sample.

Construction of prognostic signature based on metastasis-specific macrophages

   To identify key macrophage subgroups promoting tumor metastasis, we
   performed sub-cluster analysis on 4,319 macrophage cells. This analysis
   yielded four macrophage phenotypes, labeled as EEF1G+, C1QC+, SEPT2+,
   and S100A8 + macrophages (Fig. [66]3A-B, Supplementary File Figure
   S3A-B). Notably, EEF1G + macrophages were predominantly found in liver
   metastases (Fig. [67]3A), with their marker genes significantly more
   expressed in liver metastases than in primary lesions (Fig. [68]3C).
   C1QC + macrophage marker genes had similar expression levels in both
   liver metastases and primary lesions. SEPT2 + and S100A8 + macrophages
   were primarily expressed in primary lesions and reduced in liver
   metastases (Fig. [69]3C). Additionally, we compared our four macrophage
   subtypes with the previously reported seven pan-cancer TAM
   subtypes^[70]15. We found that EEF1G + macrophages predominantly
   expressed marker genes associated with Inflam-TAMs and IFN-TAMs, while
   also specifically exhibiting high expression of the Angio-TAMs marker
   gene VEGFA. Notably, TAM-derived VEGFA can facilitate metastasis by
   promoting various aspects of tumor progression. In contrast,
   C1QC + macrophages showed dominant expression of LA-TAMs marker genes,
   indicating a potential role in lipid metabolism and tissue remodeling.
   The SEPT2 + and S100A8 + macrophages exhibited moderate expression of
   marker genes from various oncogenic TAM subtypes (Fig. [71]3D). Taken
   together, we consider EEF1G + macrophages to be the primary
   metastasis-specific macrophage subgroup. We further explored currently
   available single-cell datasets for colorectal cancer and reanalyzed
   [72]GSE178318, which comprises paired primary and liver metastasis
   samples from six patients (Supplementary File Figure S3C and S3D). In
   this dataset, we observed a significantly higher prevalence of the
   EEF1G + macrophage subpopulation in liver metastases compared to
   primary tumors, thereby validating the existence of the liver
   metastasis-specific macrophage subpopulation identified in this study.
   Differentially expressed genes (DEGs) analysis for EEF1G + macrophages,
   compared to other macrophage cell types, revealed 1,264 up-regulated
   and 107 down-regulated genes (Fig. [73]3E). We assessed the prognostic
   value of these DEGs using the expression and survival data of TCGA-COAD
   cohort. Then, we applied the LASSO-Cox proportional hazards regression
   model to these prognostic DEGs, ultimately identifying a prognostic
   signature composed of 8 genes. We defined a risk model based on the
   EEF1G + macrophage-related gene signature as the EMGS
   (EEF1G + Macrophage Gene Signature). Subsequently, for each patient,
   the EMGS was calculated using the formula: EMGS = (0.002 × HIF1A-AS3
   expression) + (0.359 × [74]AL645608.7 expression) + (0.022 ×INO80B
   expression) + (0.154 × HSPA1A expression) + (0.181 × SNAI1 expression)
   + (0.424 × IFITM10 expression) + (0.248 × ZNF736 expression) + (0.296 ×
   SNHG7 expression) (Fig. [75]3F).

Fig. 3.

   [76]Fig. 3
   [77]Open in a new tab

   Gene expression profiling of macrophage cell cluster in CRC patients.
   (A) UMAP visualization categorizes macrophage cells into normal colon
   (Col_N), colon tumor (Col_T), normal liver (Liv_N), and liver tumor
   (Liv_T) groups. (B) A refined UMAP plot of 4,319 macrophage cells,
   classified into four macrophage phenotypes (Mph-EEF1G, Mph-C1QC,
   Mph-SEPT2, and Mph-S100A8). (C) Violin plots illustrate the expression
   profiles of three key marker genes within the Col_T and Liv_T
   macrophage populations. (D) The stacked violin plot shows the
   expression levels of the marker genes, such as IL1B, CCL3, ISG15,
   CXCL8, ARG1, CD274, MKI67, CDK1, LYVE1, HES1, VEGFA, SPP1, APOC1, and
   APOE, associated with specific functional states like inflammatory TAMs
   (Inflam-TAMs), IFN-TAMs, regulatory TAMs (Reg-TAMs), proliferative TAMs
   (Prolif-TAMs), resident TAMs (RTM-TAMs), angiogenic TAMs (Angio-TAMs),
   and lipid-associated TAMs (LA-TAMs). Each row represents different
   macrophage subtypes identified in the study, including Mph-EEF1G,
   Mph-C1QC, Mph-SEPT2, and Mph-S100A8. (E) A volcano plot highlights
   differentially expressed genes between the Mph-EEF1G phenotype and
   other macrophage cell types. (F) Forest plots present hazard ratios
   (HRs, depicted by pink circles) and 95% confidence intervals
   (represented by horizontal lines) from Cox regression survival
   analyses, correlating the overall survival with eight macrophage
   signature genes.

Performance of the EMGS Signature in training and validation cohorts

   After calculating the EMGS for each patient, we used the median EMGS as
   a cutoff to divide patients into EMGS-high and EMGS-low groups. In the
   TCGA-COAD training set, the EMGS-high group demonstrated significantly
   higher mortality risk and lower overall survival rates (Fig. [78]4A,
   HR = 2.5, 95% CI: 1.77–3.54, P < 0.0001). Time-dependent ROC curves
   were employed to estimate the prognostic accuracy of the EMGS model,
   revealing 1-, 3-, and 5-year AUC values of 0.762, 0.682, and 0.693,
   respectively (Fig. [79]4B). Additionally, an overview illustrating risk
   score, grouping, survival time and status, along with the expression
   levels of model genes, showed that an increase in risk score was
   associated with a higher number of patient deaths and an increase in
   the expression levels of model genes (Fig. [80]4C). Furthermore, to
   validate the generalizability of the prognostic model, we calculated
   EMGS for patients in three independent GEO cohorts. The EMGS-high group
   had a significantly lower overall survival rate compared to the
   EMGS-low group in all three datasets: [81]GSE29621 (Fig. [82]4D, HR:
   2.49, 95% CI: 1.29–4.82, P = 0.027), [83]GSE17536 (Fig. [84]4E, HR:
   2.04, 95% CI: 1.38–3, P = 0.0024), and [85]GSE39582 (Fig. [86]4F, HR:
   1.4, 95% CI: 1.1–1.78, P = 0.022). Finally, we performed multivariate
   Cox proportional hazards analysis using EMGS along with clinical
   factors such as age, gender, and tumor stage. The results indicated
   that EMGS is an independent predictor of overall survival (OS) in the
   TCGA-COAD cohort (P = 0.01, Fig. [87]4G). In summary, these results
   demonstrate the superior efficacy of the EMGS risk score as a
   prognostic signature.

Fig. 4.

   [88]Fig. 4
   [89]Open in a new tab

   Survival analysis in high-risk and low-risk groups based on EMGS. (A)
   Kaplan–Meier curves compared the overall survival of TCGA-COAD patients
   between high-risk and low-risk groups. (B) ROC curves of the EMGS score
   for predicting the risk of death at 1, 3, and 5 years. (C) The
   distribution of risk score (top), survival status (middle), and
   expression (bottom) of the identified eight EMGS genes. Validation of
   EMGS in (D) [90]GSE29621 (n = 65) cohort, (E) [91]GSE17536 (n = 177)
   cohort and (F) [92]GSE39582 (n = 556) cohort. (G) Forest plot shows HRs
   and 95% confidence intervals (horizontal ranges) derived from
   multivariate cox regression analyses including EMGS group, age, gender,
   and stage.

Functional pathways and metabolic alterations associated with the EMGS
signature

   To elucidate the biological mechanisms for the poorer prognosis
   associated with a high EMGS score, we initiated a differential
   expression analysis (DEG) between the high and low-risk groups. We
   found significant upregulation of 2779 genes and downregulation of 16
   genes in the high-risk group (Supplementary File Table S2). Gene
   Ontology annotation of the DEGs revealed enrichment in biological
   processes associated with external encapsulating structure organization
   and extracellular matrix organization, with cellular components and
   molecular functions closely linked to the cell membrane and
   extracellular matrix (Fig. [93]5A). Gene Set Enrichment Analysis (GSEA)
   indicated that the high-risk group predominantly enriches in pathways
   related to epithelial-mesenchymal transition (EMT) and angiogenesis,
   key processes involved in metastasis, while downregulating pathways
   like the interferon alpha signaling pathway (Supplementary File
   Fig. [94]5B-C). Additionally, using gene set variation analysis (GSVA)
   to calculate sample-wise gene set enrichment scores, we observed
   significantly enhanced activity of the EMT and TGFβ pathways in the
   high-risk group compared to the low-risk group (Fig. [95]5D). These
   results suggest a greater metastatic potential in the high-risk group.
   Furthermore, we calculated the metabolic fluxes for 13,082 reactions
   and 142 metabolic pathway activities using METAFlux (METAbolic Flux
   balance analysis). We identified 166 fluxes with significant
   differences between the high and low-risk groups (Supplementary File
   Table S3), involving processes such as glycolysis, purine metabolism,
   nucleotide metabolism, and beta oxidation of fatty acids. By
   integrating fluxes of reactions associated with specific pathways, we
   found that at the metabolic pathway level, the high-risk group
   significantly downregulated beta oxidation of fatty acids, folate
   metabolism, and sulfur metabolism, while upregulating inositol
   phosphate metabolism and ether lipid metabolism (Fig. [96]5E,
   Supplementary File Table S4). These findings highlight the association
   between the EMGS signature and lipid metabolism reprogramming,
   reflecting the potential role of targeting metabolic pathways in cancer
   treatment.

Fig. 5.

   [97]Fig. 5
   [98]Open in a new tab

   Functional differences between EMGS high and low groups. (A) Gene
   Ontology analysis of differentially expressed genes between EMGS high
   and low groups. (B) GSEA analysis of hallmark pathways. (C) GSEA
   enrichplot displaying significantly upregulated and downregulated top
   pathways in the EMGS high group. (D) GSVA activity analysis of hallmark
   EMT and TGF-β signaling pathway. (E) Differences in metabolic pathways
   between EMGS high and low groups.

Immune microenvironment shaped by EMGS

   To determine the differences in the immune microenvironment between the
   high and low EMGS score groups, we first deconvoluted bulk expression
   matrices to infer the proportions of various immune and stromal cell
   types. CIBERSORT indicated a notably higher proportion of M0 and M2
   macrophages compared to the low-risk group, reflecting a positive
   association between the EMGS score and M2-type macrophages. Conversely,
   the low-risk group exhibited an elevated percentage of CD8 + T cell
   (Fig. [99]6A). MCPcounter also found that the high-risk group was
   significantly enriched in monocytic lineage, neutrophils, endothelial
   cells, and fibroblasts (Fig. [100]6B). Consistently, ESTIMATE indicated
   that the high-risk group had significantly higher stromal and ESTIMATE
   scores and lower tumor purity, although there were no significant
   differences in the immune score between the two groups (Supplementary
   File Figure S4A). Additionally, we characterized several gene
   signatures associated with immune evasion through T-cell exclusion. Our
   analysis revealed that patients in the high-EMGS group had notably
   elevated Cancer Associated Fibroblasts (CAF) scores (Fig. [101]6C),
   T-cell exclusion characteristics (Fig. [102]6D), and a higher TIDE
   score (Fig. [103]6E). These observations were consistent across three
   other independent validation datasets from GEO (Supplementary File
   Figure S4B-D). Collectively, these findings suggest that the higher
   EMGS score correlates with an immunosuppressive tumor microenvironment,
   which may contribute to reduced responsiveness to immune checkpoint
   blockade therapies and poorer clinical outcomes.

Fig. 6.

   [104]Fig. 6
   [105]Open in a new tab

   Differences in the immune microenvironment between EMGS high and low
   groups. (A) Proportional distribution of 22 immune cell types obtained
   through CIBERSORT deconvolution. (B) Proportional distribution of eight
   immune and two stromal cell populations calculated by MCPcounter. For
   EMGS high and low groups, scores calculated using the ssGSEA algorithm
   for (C) CAF score, (D) various immune exclusion signature scores, and
   (E) TIDE score.

Prediction of drug response in EMGS-high patients

   Patients with a high EMGS score, due to their greater metastatic
   potential and poorer prognosis, urgently require effective drug
   treatments. Here, we predicted drugs with differential responses
   between high and low EMGS patients based on drug sensitivity data from
   colorectal cancer cell lines. The analysis revealed that the EMGS-high
   group is more sensitive to three anticancer drugs compared to the
   EMGS-low group: docetaxel (Fig. [106]7A), a chemotherapy drug used to
   treat multiple metastatic and non-resectable tumor types, and two
   targeted cancer drugs, pazopanib and sorafenib (Fig. [107]7B-C), which
   are widely used in the treatment of hepatocellular carcinoma and kidney
   cancer patients, respectively. Additionally, we utilized the DrugMap
   database ([108]http://drugmap.idrblab.net/) to search for Drug
   Therapeutic Targets among genes related to EMGS. We discovered that for
   the gene HSPA1A, which encodes Heat shock protein 70, several drugs are
   currently in development or clinical trials (Fig. [109]7D). These drugs
   may bring clinical benefits for the long-term survival of EMGS-high
   patients.

Fig. 7.

   [110]Fig. 7
   [111]Open in a new tab

   Predicted effective anticancer drugs for the EMGS high group. The IC50
   values of three anticancer drugs were predicted by the pRRophetic
   algorithm and showed significant differences in sensitivity between
   EMGS high and low groups: (A) Docetaxel, (B) Pazopanib, (C) Sorafenib.
   (D) Targeted drugs for the EMGS gene HSPA1A identified using the
   DrugMap database.

Discussion

   Macrophages are one of the most abundant immune cells in colorectal
   cancer. However, their function remains controversial due to the
   presence of various phenotypes. While several studies have identified
   specific functional subgroups of macrophages using single-cell
   sequencing techniques, few have linked these findings to patient
   prognosis or therapeutic target development. In this study, using
   scRNA-seq datasets from primary colorectal cancer lesions and liver
   metastases, we identified four unique macrophage subgroups. Notably,
   EEF1G + macrophages were predominantly found in liver metastases, with
   their marker genes significantly overexpressed compared to primary
   lesions. Therefore, we propose that EEF1G + macrophages may be the
   primary cell subgroup promoting liver metastasis. Based on the specific
   expression of genes in EEF1G + macrophages, we constructed a novel
   prognostic signature comprising eight EEF1G + macrophage-related genes
   (EMGS), which can significantly and independently predict overall
   survival.

   Recent advancements in single-cell sequencing have enabled a deeper
   understanding of the functional roles and potential applications of
   different cell types in colorectal cancer (CRC). For example, a study
   profiling three left-sided and three right-sided CRC patients using
   single-cell RNA sequencing revealed that M2-like macrophages were more
   prevalent in left-sided CRC and were associated with a favorable
   prognosis in CRC^[112]16. Liu et al. explored the immune phenotypic
   linkage between CRC and liver metastasis through single-cell analyses,
   identifying that SPP1 + macrophages were predominant in liver
   metastases and played a pro-metastatic role^[113]17. Additionally, a
   recent study found that tumor-associated macrophages and granulocytic
   myeloid-derived suppressor cells were significantly increased in CRC
   compared to their counterparts in normal tissue, displaying strong
   immune-inhibitory signatures. Further research suggested that Sirpa−/−
   macrophages exhibited enhanced phagocytosis and antigen presentation,
   promoting T cell activation and proliferation, indicating that Sirpa
   blockade could be a promising strategy to enhance cancer
   immunotherapy^[114]18.Beyond macrophages, epithelial cell subpopulation
   analyses have also identified novel metastatic and malignant traits.
   For instance, Joanito et al. integrated single-cell and bulk
   transcriptome sequencing to identify two intrinsic subtypes of
   epithelial tumor cells (iCMS2 and iCMS3), discovering that the
   consensus molecular subtype (CMS) classification 4 cancers with iCMS3
   epithelium had poorer prognoses^[115]19. Another study focusing on
   primary CRC and liver or ovarian metastases identified a stem-like
   epithelial cell cluster with high PTPRO and ASCL2 expression. This
   cluster drives CRC metastasis, with distinct subpopulations
   contributing to organ-specific metastasis^[116]20. Moreover, research
   has also characterized CRC-specific fibroblast populations, including
   F3 + fibroblasts enriched in primary tumors and MCAM + fibroblasts
   enriched in liver metastases, which have opposing prognostic impacts.
   Additionally, CD8_CXCL13 and CD4_CXCL13 subpopulations were
   significantly increased in liver metastasis samples, with increased
   CXCL13 expression correlating with better prognosis in CRC^[117]21.
   These findings provide a multidimensional understanding of CRC
   metastasis and progression through the lens of diverse cell type.

   EEF1G encodes the Eukaryotic translation elongation factor 1 gamma,
   playing a pivotal role in the translation elongation process. It has
   been reported to be overexpressed in various cancers, including gastric
   carcinoma^[118]22, colon adenocarcinoma^[119]23, and pancreatic
   cancer^[120]24. A pan-cancer analysis of seven elongation factors,
   using public databases (Oncomine and TCGA), indicated that the
   transcription level of EEF1G and its prognostic impact are
   cancer-specific^[121]25. For instance, EEF1G expression is
   downregulated in breast and gastric cancers, yet its increased
   transcription level predicted favorable survival outcomes in breast
   cancer and was associated with poor prognosis in gastric cancer.
   Additionally, high EEF1G expression is linked to worse prognosis in
   lung, pancreatic, and liver cancers, while its elevated levels
   predicted better survival in colorectal, brain and CNS, ovarian, and
   kidney cancers. In this study, we observed that EEF1G is significantly
   overexpressed in liver metastases compared to primary lesions,
   suggesting its adverse prognostic role, which contrasts with the
   results from bulk RNA-Seq, indicating the expression specificity of
   this gene in different cell types. Leveraging single-cell sequencing
   technology, our findings reveal the potential role of EEF1G in
   promoting metastasis.

   Our analysis indicates that patients with a high EMGS score, compared
   to those with a low score, exhibit tumor biology enriched in gene
   ontology related to the extracellular matrix, as well as cancer
   hallmarks such as epithelial-mesenchymal transition (EMT) and
   angiogenesis. These findings align with previous studies revealing that
   tumor-associated macrophages induce the EMT process by secreting a
   multitude of cytokines and growth factors, like transforming growth
   factor-beta (TGF-β), thereby promoting tumor invasion and
   metastasis^[122]26,[123]27. This suggests that EMGS score reflects the
   pro-tumorigenic characteristics of TAMs and may serve as a predictive
   factor for the liver metastasis potential in colorectal cancer
   patients. Moreover, we identified notable metabolic differences between
   high and low EMGS groups, particularly in fatty acid,
   phosphatidylinositol, and ether lipid metabolism. Prior studies have
   shown that lipid metabolism reprogramming in TAMs can regulate their
   phenotype and function, thus manifesting pro-tumorigenic or
   anti-tumorigenic effects^[124]28,[125]29. In addition, a significant
   reduction in folate metabolism activity was observed in the high EMGS
   group. Studies have proposed variations in folate uptake and
   accumulation between macrophages of different polarization
   states^[126]30,[127]31. These results indicate that the EMGS high and
   low groups represent distinct macrophage states with clear metabolic
   differences. Targeting specific metabolic pathways to shift TAM
   polarization towards an M1-like phenotype could offer a potent approach
   in cancer treatment strategies.

   In the EMGS high group, an abundance of monocytes including M0 and M2
   macrophages, along with endothelial and fibroblast cells, was detected
   via deconvolution algorithms like CIBERSORT and MCPcounter. Moreover,
   this group exhibited notably elevated stromal scores and reduced tumor
   purity, while the immune scores remained comparable between the two
   groups, as validated by ESTIMATE. These results suggest that immune
   cells do not play a primary role in the differing biological behaviors
   between the high and low-risk groups. The high-risk tumors are
   primarily influenced by various regulatory mechanisms of stromal cells,
   including providing growth factors, promoting angiogenesis, and
   altering the composition and stiffness of the extracellular matrix.
   Consistently, the high-risk group also exhibited significantly higher
   scores for CAF and various immune exclusion signatures, calculated
   using the ssGESA algorithm. Additionally, the high-risk group showed a
   substantially elevated TIDE score, an indicator used to predict tumor
   response to immune checkpoint blockade treatments. Overall, these
   results suggest that tumors in the high-risk group may have stronger
   tumor microenvironment-driven progression mechanisms, including active
   involvement of CAFs, enhanced immune evasion, and potential resistance
   to current immunotherapeutic strategies. Therefore, a high EMGS score
   derived from the 8-gene risk model may identify a subset of colorectal
   cancer patients who are less likely to respond to immunotherapy, even
   within the dMMR/MSI-H subtype. Additionally, in these patients, the
   efficacy of combining VEGF inhibitors with ICI therapy could be
   explored. Recent studies have reported that crosstalk between CAFs and
   TAMs regulates the immune microenvironment, promoting tumor progression
   and immune escape^[128]32–[129]34. Modulating the interaction and/or
   differentiation of CAFs and TAMs may significantly enhance the
   development of future targeted therapeutic approaches.

   Nevertheless, the current study has some limitations. Firstly, the
   metastasis-specific EEF1G + macrophages were identified from scRNA-Seq
   datasets of only four colorectal cancer tissues and nine liver
   metastasis tissues, necessitating validation of this subgroup’s
   functional characteristics in a larger sample size. Additionally, while
   our study emphasized macrophage-related prognostic features, it also
   indicated potential interactions with other immune and stromal cells
   like CAFs. Therefore, combining other tumor microenvironment components
   may further enhance the model’s performance. Lastly, the drugs
   identified for the high-risk group should be substantiated through
   further clinical trials. In summary, the EMGS score developed in this
   study demonstrated superior prognostic performance in TCGA and three
   independent datasets. The enrichment of metastasis-related pathways and
   the formation of an immunosuppressive microenvironment in the EMGS high
   group also underscore the role of EMGS in predicting patients’
   metastatic potential. This study thus contributes valuable insights
   into the key cellular components and molecular mechanisms of the
   colorectal cancer metastatic process.

Methods and materials

Collection of single-cell RNA sequencing and GEO datasets

   The original scRNA-seq dataset from fresh colorectal cancer (CRC)
   tissues and adjucent normal tissues was sourced from
   [130]GSE231559^[131]14. This dataset comprised data from 17 patients,
   totaling 24 samples. These included 4 fresh colorectal cancer tissues
   (Col_T), 3 adjucent normal samples (Col_N), 9 liver metastasis samples
   (Liv_T) and 8 liver normal samples (Liv_N). Gene expression profiles
   for colorectal cancer (CRC) cohorts were obtained from The Cancer
   Genome Atlas (TCGA) portal, accessible at
   [132]https://portal.gdc.cancer.gov/. We accessed raw read counts and
   associated clinical information including age, gender, cancer stage,
   overall survival (OS), and vital status of CRC patients from the UCSC
   Xena website ([133]https://xenabrowser.net/datapages/). To further
   validate the utility of the prognostic model, we utilized microarray
   data and clinical details from the NCBI Gene Expression Omnibus (GEO)
   database, specifically for [134]GSE29621 (n = 65), [135]GSE17536
   (n = 177), and [136]GSE39582 (n = 556).

Single cell analysis and cell clustering

   The single-cell dataset matrix was imported into Scanpy (version 1.9.1)
   [PMID: 29409532] for comprehensive analysis. We performed principal
   component analysis (PCA) using the ‘sc.tl.pca’ function, setting the
   number of components to 50 (n_comps = 50). Neighborhood graphs were
   constructed using ‘sc.pp.neighbors’ with the number of neighbors set to
   10 (n_neighbors = 10). Dimensionality reduction was carried out on
   highly variable genes, and cell clusters were identified using the
   Louvain algorithm through the ‘sc.tl.louvain’ module, setting the
   resolution parameter at 0.4. These cell clusters were visualized using
   the uniform manifold approximation and projection (UMAP) technique via
   the ‘sc.pl.umap’ module. Cell cluster annotations were determined based
   on the expression of known marker genes, utilizing CellTypist
   ([137]https://github.com/Teichlab/celltypist)^[138]35.

Differential expressed gene analysis

   We compared the expression of individual genes in each cluster with the
   remaining cells using the ‘sc.tl.rank_genes_groups’ module, employing
   the Wilcoxon rank sum test. A gene was classified as up-regulated or
   down-regulated based on a significance threshold of P < 0.05. The
   cutoff criteria for determining up-regulation or down-regulation were
   set at a log(fold change) ≥ 1 and ≤ − 1, respectively.

Construction and validation of the prognostic signature

   To ascertain the prognostic value of marker genes ranging from
   EEF1G + macrophages to others in TCGA-COAD patients, we conducted a
   univariate Cox regression analysis focusing on overall survival (OS).
   Genes with a P-value less than 0.05 were identified as prognostic
   genes. Subsequently, these genes were assessed using the least absolute
   shrinkage and selection operator (LASSO) method in Cox proportional
   hazards regression, employing the ‘glmnet’ package in R. Based on this
   analysis, we developed a risk model by linearly combining the mRNA
   expression levels of these genes with their respective risk
   coefficients. This process identified eight genes as key candidates for
   further prognosis-related investigation (EMGS model, EEF1G + Macrophage
   Gene Signature). Patients were divided into two groups based on the
   median EMGS score: high-risk and low-risk group. To validate the
   prognostic significance of the EMGS score, we generated receiver
   operating characteristic (ROC) curves^[139]36 and calculated the area
   under the curve (AUC) using the ‘survivalROC’ package in R, thereby
   evaluating the accuracy of the EMGS model.

Survival analysis

   Kaplan-Meier (KM) curves were employed to display the cumulative
   survival rates between the high and low EMGS groups in the TCGA-COAD
   cohort and three independent GEO datasets. Additionally, Multivariable
   Cox regression analysis was performed to assess the independence of the
   EMGS prognostic model. The aforementioned analyses were conducted using
   the ‘survival’ and ‘survminer’ R packages.

Pathway enrichment analysis

   The DESeq2 R package was utilized to identify differentially expressed
   genes between the high and low EMGS groups. Reference gene sets were
   sourced from c5.go.v2023.2.Hs.symbols.gmt and
   h.all.v2023.2.Hs.symbols.gmt from the Molecular Signatures Database
   (MSigDB). Gene Ontology and Gene Set Enrichment Analysis (GSEA) were
   conducted using the ‘clusterProfiler’ R package (v3.18.1). Gene Set
   Variation Analysis (GSVA) was employed to differentiate the activity of
   enriched pathways between the EMGS-high and EMGS-low groups. Using the
   ssGSEA algorithm within the ‘GSVA’ R package, we calculated the
   enrichment scores for each gene set in the TCGA-COAD samples. Metabolic
   fluxes and pathway activities were calculated and normalized from gene
   expression data using METAFlux^[140]37.

Immune infiltration analysis

   The composition of immune cells was inferred using the CIBERSORT and
   MCPcounter algorithms through the Immuno-Oncology Biological Research
   (IOBR) R package^[141]38. The ESTIMATE algorithm was employed to
   calculate the proportions of immune and stromal components within the
   tumor microenvironment (TME) of each sample^[142]39. This calculation
   produces three scores: ImmuneScore, StromalScore, and ESTIMATEScore,
   corresponding to the immune component, stromal component, and their
   combined total, respectively. A higher score indicates a greater
   proportion of the respective component in the TME. Additionally, the
   Tumor Immune Dysfunction and Exclusion (TIDE) algorithm was utilized to
   predict potential responses to immune checkpoint blockade (ICB)
   therapies^[143]40. Patients with higher TIDE scores are typically more
   susceptible to antitumor immune evasion, often correlating with lower
   ICB treatment response rates.

The prediction of response to targets agents

   We evaluated the response of each sample to targeted agents by
   calculating the half-maximal inhibitory concentration (IC50) using the
   ‘pRRophetic’ R package, which utilizes data from the Genomics of Drug
   Sensitivity in Cancer (GDSC) database
   ([144]https://www.cancerrxgene.org/). Our investigation focused on
   identifying variations in sensitivity to these targeted agents between
   high-risk and low-risk groups.

Statistics analysis

   We assessed differences of statistical significance using a two-tailed
   Student’s t-test or Wilcoxon test on the R platform (version 4.0.3).
   Multivariate analysis was conducted employing the Cox proportional
   hazards model, using the R packages ‘survival’, ‘survminer’, and
   ‘forestplot’. This analysis aimed to identify independent factors
   associated with overall survival (OS) in both TCGA-COAD and GEO
   cohorts. P values were adjusted for multiple comparisons using the
   false discovery rate (FDR) method, with FDR values less than 0.05
   considered statistically significant.

Electronic supplementary material

   Below is the link to the electronic supplementary material.
   [145]Supplementary Material 1^ (14.4MB, pdf)

Acknowledgements