Abstract

Background
Colonic adenocarcinoma (COAD) is the most common pathological type of colon cancer. The tumor microenvironment (TME) plays an important role in the occurrence and development of COAD, but no studies have yet specifically addressed the mechanism of action of the TME in COAD patients.

Methods
The proportions of tumor-infiltrating immune cells (TICs) in 512 COAD cases from The Cancer Genome Atlas (TCGA) database were calculated using CIBERSORT and ESTIMATE. Weighted gene coexpression network analysis (WGCNA) was performed to find highly correlated modules of differentially expressed genes (DEGs), followed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses to determine the function of distant metastasis (M)-stage-related modules. Pathway enrichment analysis, protein–protein interaction (PPI) network construction, Cox regression analysis, and Kaplan–Meier survival analysis were performed on the DEGs to select the most critical genes. The associations of SIGLEC1 expression with TME status and with immune checkpoints in COAD were examined using gene set enrichment analysis (GSEA) and Pearson correlation coefficients.

Results
Differential expression analysis identified 12,342 DEGs between the high and low immune score groups, and WGCNA screening yielded 209 key genes associated with M stage. GO and KEGG analyses revealed that the DEGs were primarily involved in pathways such as Th1 and Th2 cell differentiation and cell adhesion molecules. The SIGLEC1 gene was identified by univariate Cox regression, PPI network construction, and survival analysis. GSEA showed that genes in the high SIGLEC1 expression group were mainly enriched in immune-related activities, whereas genes in the low SIGLEC1 expression group were enriched in MYC targets.
CIBERSORT analysis of TIC proportions showed that SIGLEC1 was positively correlated with macrophages (M0, M2), CD8+ T cells, and immune checkpoint-related genes, suggesting that SIGLEC1 may help maintain the immune-dominant state of the TME. Immunohistochemical and prognostic analyses showed that patients with higher SIGLEC1 expression had more severe lesions and a worse prognosis than those with lower SIGLEC1 expression.

Conclusions
SIGLEC1 is a distant metastasis-related gene that affects the survival prognosis of COAD patients and provides additional insight into the treatment of COAD.

Supplementary Information
The online version contains supplementary material available at 10.1007/s12672-025-02093-2.

Keywords: Colon cancer, Tumor microenvironment (TME), Distant metastasis, ESTIMATE, Algorithm, WGCNA analysis

Introduction
Colon adenocarcinoma (COAD) is a malignant neoplasm arising from glandular epithelial cells. It is one of the most common pathological types of colon cancer, with an extremely high incidence in many regions and countries and millions of patients worldwide [1, 2], and its mortality rate increases with age [3]. However, current surgical and adjuvant treatments offer only limited improvement in patient prognosis and no longer meet patients' treatment needs [4, 5]. There is therefore an urgent need to explore effective immunotherapies and accurate biomarkers for this disease. The tumor microenvironment (TME) is a highly complex ecosystem [6] that includes fibroblasts, immune cells, adipocytes, vascular endothelial cells, and extracellular matrix. During tumor development, the TME interacts with tumor cells and jointly mediates immune tolerance to the tumor, which affects the clinical outcome of immunotherapy [7].
Furthermore, the TME has been shown to substantially enhance tumor immune effects [8–10], and numerous immune cells infiltrate tumors and secrete various factors, which are closely related to patient prognosis [11]. A recent study elucidated the important role of CSF-1R in the TME and as an immunotherapeutic target in COAD, suggesting that the immune status of the TME is highly relevant to the treatment and prognosis of COAD [12]. ESTIMATE, an approach that infers tumor purity from gene expression signatures, can be used to estimate the proportions of stromal and immune cells, which play significant roles in tumor tissue. In addition, weighted gene co-expression network analysis (WGCNA), a systems biology approach that describes patterns of gene association and identifies candidate biomarker genes or therapeutic targets from highly co-varying gene sets, has been widely used in immunotherapy studies [13, 14]. Although the impact of the TME on the efficacy of immunotherapy has become a hot research topic, the underlying mechanisms at the level of specific genes remain to be investigated. Therefore, the present study used these two methods to analyze the TME of patients with colon adenocarcinoma in the TCGA database and to identify differentially expressed genes with prognostic value.

Materials and methods

Data sources
Transcriptome data and clinical information for a total of 512 samples (471 tumor samples and 41 normal samples) were obtained from the TCGA-COAD dataset via the public UCSC Cancer Genomics Browser (https://xenabrowser.net/datapages/).
Generation of ESTIMATE scores and survival analysis
Based on the ESTIMATE algorithm, scores reflecting the immune and stromal components of the TME were estimated for each sample [10] and reported as three scores: ImmuneScore, StromalScore, and ESTIMATEScore. For each score, COAD cases were divided into high and low groups at the median value. Survival analysis was performed with the R packages survival and survminer.

Differentially expressed genes and WGCNA analysis
Differentially expressed genes (DEGs) with p < 0.05 were identified using the R package “limma” and visualized using the R package “pheatmap” [15, 16]. Co-expression networks were then constructed from the DEGs using WGCNA, and the DEGs and clinical data were analyzed at an appropriate soft-threshold power. To refine the modules, module dissimilarity was calculated and similar modules were merged using a merge cutoff of 0.2. Finally, genes in the significant module were used to build a PPI network with the STRING database [17], which was then reconstructed in Cytoscape. To identify hub genes, univariate Cox regression was performed with the R package survival.

Functional analysis
To characterize gene function, Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analyses were performed using the R packages “ggplot2,” “org.Hs.eg.db,” “enrichplot,” and “clusterProfiler” [18].

GSEA analysis
Hallmark and C7 gene collections were downloaded from the Molecular Signatures Database (http://www.gsea-msigdb.org/gsea/index.jsp). Using SIGLEC1 expression level as the phenotype annotation, COAD patients in the TCGA cohort were divided into low and high expression groups, and only gene sets with NOM p < 0.05 and FDR q < 0.25 were considered significant.
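The median split and survival comparison described above were run in R with survival and survminer. As a hedged illustration of the same logic, the following Python sketch labels samples high or low relative to the median score and computes a Kaplan–Meier (product-limit) curve for each group. The function names and inputs are hypothetical; placing samples that equal the median in the high group is our assumption, since the text does not say how ties were handled, and the sketch omits the log-rank test and confidence bands.

```python
from statistics import median


def median_split(scores):
    """Label each sample 'high' or 'low' relative to the median score.

    Assumption: samples exactly at the median are labeled 'high'.
    """
    cut = median(scores)
    return ["high" if s >= cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Product-limit estimate as a list of (time, survival) at each event time.

    times:  follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all samples sharing this time point (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # everyone at time t leaves the risk set afterwards
    return curve


def km_by_group(scores, times, events):
    """Split samples at the median score and fit one curve per group."""
    groups = median_split(scores)
    curves = {}
    for label in ("high", "low"):
        idx = [i for i, g in enumerate(groups) if g == label]
        curves[label] = kaplan_meier([times[i] for i in idx],
                                     [events[i] for i in idx])
    return curves
```

For example, `km_by_group(immune_scores, os_days, os_status)` would return one survival curve per ImmuneScore group; these variable names are placeholders, not names from the study.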
Tumor-infiltrating immune cell (TIC) profile
The CIBERSORT algorithm was used to estimate the proportions of infiltrating immune cells in all tumor samples, and samples with p < 0.05 were retained for subsequent analysis after quality filtering. GEPIA2 (http://gepia2.cancer-pku.cn/) [19], an updated version of Gene Expression Profiling Interactive Analysis, was used to analyze the association of SIGLEC1 with immune checkpoints.

Immunohistochemistry (IHC) staining
Tumor and paracancerous tissue microarrays from the Affiliated Tumor Hospital of Nantong University were used as the validation cohort. Immunohistochemical staining was performed using a primary anti-SIGLEC1 antibody (1:200; 55427-1-AP, Proteintech, China). The density of positive staining was measured, and the H score was evaluated by two independent pathologists blinded to the clinicopathological information. IHC sections were scanned with a microscopy system (Nikon, Japan).

Statistical analysis
All statistical analyses were performed in R 4.2.3, and p < 0.05 was considered statistically significant.

Results

Immune scores showed greater prognostic value in COAD patients
The overall workflow of this study is shown in Fig. 1. To assess the association between ESTIMATE scores and survival, we calculated the immune score, stromal score, and ESTIMATE score with the ESTIMATE algorithm and performed Kaplan–Meier survival analysis for each. Neither the stromal score nor the ESTIMATE score was significantly associated with overall survival (Fig. 2A and C). In contrast, immune scores were positively associated with overall survival: COAD patients with low immune scores had poorer survival outcomes (Fig. 2B). These results implied that the immune component is more suitable for indicating the prognosis of COAD patients.

Fig. 1. The overall workflow of this study

Fig. 2. ESTIMATE scores are associated with the survival of COAD patients. A Kaplan–Meier curves of COAD patients with high and low stromal scores. B Kaplan–Meier curves of COAD patients with high and low immune scores. C Kaplan–Meier curves of COAD patients with high and low ESTIMATE scores.

Additionally, we investigated the relationships of immune and stromal scores with clinical features, including pathological stage and AJCC TNM stage. Immune scores were negatively correlated with pathological stage and TNM stage, including the M (distant metastasis) classification. In contrast, and consistent with the survival analysis above, stromal and ESTIMATE scores did not differ significantly across any clinical feature. These results suggested that the immune component is linked to COAD progression, such as metastasis and pathological stage (Fig. 3).

Fig. 3. Correlation of ImmuneScore, StromalScore, and EstimateScore with clinicopathological features. A–D Distribution of StromalScore by pathological stage and TNM stage. E–H Distribution of ImmuneScore by pathological stage and TNM stage. I–L Distribution of EstimateScore by pathological stage and TNM stage.

To identify gene expression changes associated with the immune component of COAD, we divided the 471 COAD patients into high- and low-score groups based on the median immune score. As shown in the heatmap, 12,342 differentially expressed genes (DEGs) were identified between these groups (Fig. 4A).

Fig. 4. Screening of DEGs by heatmap and WGCNA analysis. A Heatmap of DEGs comparing the high and low ImmuneScore groups. B Analysis of the scale-free fit index for candidate soft-threshold powers (β).
At β = 5, the scale-free topology fit index reaches 0.89. C Analysis of mean connectivity for candidate soft-threshold powers. D Clustering dendrogram of DEGs and WGCNA module colors. E Correlation between modules and clinical features; P values are shown. Scatter plots for the F brown, G green, H greenyellow, and I midnightblue modules.

The DEGs were comprehensively analyzed and classified into modules by WGCNA, and cluster analysis was used to remove abnormal data. A soft threshold (β = 5, scale-free R² = 0.89) was used to construct a scale-free network, which yielded seventeen modules (Fig. 4B and C). Next, the overall gene expression of each module was correlated with clinical data, including the M classification of TNM stage (Fig. 4D–E). The darkred, green, darkgreen, greenyellow, and midnightblue modules were significantly correlated with M stage and were designated M stage-related modules (Fig. 4F–I). These modules were therefore chosen for further analysis, and 209 critical genes associated with M stage were screened using a correlation threshold.

Functional analysis of genes
GO analysis was performed to characterize the function of the M stage-related modules (Fig. 5A and Additional file 1: Table S1). These genes were predominantly associated with immune-related GO terms, such as receptor activity, cytokine receptor activity, MHC protein complex binding, and T cell receptor binding.

Fig. 5. A Gene Ontology enrichment analysis. B Kyoto Encyclopedia of Genes and Genomes enrichment analysis.

Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis also revealed enrichment of cell adhesion molecules, Th1 and Th2 cell differentiation, Th17 cell differentiation, and hematopoietic cell lineage (Fig. 5B and Additional file 2: Table S2).
Overall, the functions of the M stage-related module genes appear to be linked to immune-related activities, suggesting that immune involvement is a major feature of the TME in COAD.

Intersection analysis of PPI network and univariate Cox regression
To further explore associations among the module genes, a PPI network based on the STRING database was constructed using Cytoscape software (Fig. 6A). Univariate Cox regression of COAD patient prognosis identified only two hub genes, CYHR1 and SIGLEC1 (Fig. 6B). Kaplan–Meier survival analysis then retained only SIGLEC1 (Fig. 6C and D).

Fig. 6. Protein–protein interaction network and univariate Cox regression. A Gene network of the module genes identified by WGCNA. B Univariate Cox regression analysis of prognosis. C, D Overall survival analysis of the CYHR1 and SIGLEC1 genes in colon cancer based on the Kaplan–Meier plotter.

SIGLEC1 is a potential indicator of TME modulation
To identify SIGLEC1-associated signaling pathways, we performed GSEA comparing the high and low SIGLEC1 expression groups. Genes in the high SIGLEC1 expression group were mainly enriched in hallmark gene sets such as inflammatory response, allograft rejection, epithelial mesenchymal transition, and IL6-STAT3 signaling (Fig. 7A), whereas genes in the low SIGLEC1 expression group were enriched in MYC targets (Fig. 7B). For the MSigDB C7 immunologic collection, the gene sets enriched in the high and low SIGLEC1 expression groups are shown in Fig. 7C and D, respectively. These results suggested that SIGLEC1 might be a potential indicator of TME status in COAD. ROC analysis also supported the diagnostic value of SIGLEC1 for M1 stage (AUC = 0.904) and for COAD (AUC = 0.891) (Figure S1).
Taken together, the GSEA findings implied that SIGLEC1 expression was correlated with immune-related signals in COAD.

Fig. 7. GSEA of samples with high and low SIGLEC1 expression. A HALLMARK gene sets enriched in samples with high SIGLEC1 expression. B HALLMARK gene sets enriched in samples with low SIGLEC1 expression. C Enriched immunologic gene sets from the C7 collection in the high SIGLEC1 expression group. D Enriched immunologic gene sets from the C7 collection in the low SIGLEC1 expression group.

Correlation of SIGLEC1 with TICs
To further confirm the correlation between SIGLEC1 expression and the immune microenvironment, we estimated the proportions of the 22 immune cell subtypes in each sample (Fig. 8A). Hierarchical clustering showed clear differences in the distribution of immune cells between COAD tumor samples and adjacent samples. Neutrophils and activated mast cells showed a marked positive correlation, whereas resting and activated mast cells were clearly negatively correlated (Fig. 8B).

Fig. 8. Profile of tumor-infiltrating immune cells in tumor samples and correlation analysis. A Bar plot showing the fractions of 21 types of tumor-infiltrating immune cells in COAD samples. B Heatmap showing the correlations among the 22 types of tumor-infiltrating immune cells.

Notably, comparison of TIC levels between the high and low SIGLEC1 expression groups showed higher levels of M0 macrophages and lower levels of naive B cells in the high SIGLEC1 expression group (Fig. 9A). Correlation analysis showed that SIGLEC1 was positively correlated with macrophages (M0, M2), resting mast cells, monocytes, neutrophils, activated NK cells, and CD8+ T cells, and negatively correlated with activated mast cells, resting memory CD4+ T cells, and activated memory CD4+ T cells (Fig. 9B).
The correlations between SIGLEC1 and immune cell subsets further suggest that SIGLEC1 levels are associated with the immune activity of the TME. We therefore examined whether SIGLEC1 was correlated with immune checkpoints such as PD-L1, which are also crucial for predicting the efficacy of immunotherapy in COAD. The immune checkpoint-related genes CTLA-4, LGALS9 (GAL9), LAG-3, PDCD1 (PD-1), PDCD1LG2 (PD-L2), CD274 (PD-L1), TIGIT, and HAVCR2 (TIM3) were selected for further analysis [20]. CTLA-4, LAG-3, PD-1, PD-L2, PD-L1, TIGIT, and TIM-3 were all positively associated with SIGLEC1 expression in COAD (Fig. 9C).

Fig. 9. Significant correlation between SIGLEC1 expression and the proportions of tumor-infiltrating immune cells. A Scatter plots showing the correlation between the proportions of 10 types of tumor-infiltrating immune cells and SIGLEC1 expression (all p < 0.05). B Mendelian randomization analysis. C Correlation between immune checkpoint genes and SIGLEC1 expression.

Clinical COAD samples validate the SIGLEC1 signature
A summary of clinicopathological characteristics by SIGLEC1 expression level, based on tumor tissue microarrays from Nantong University Cancer Hospital, showed that N stage (p < 0.001), M stage (p = 0.042), and pathologic stage (p = 0.018) were significantly associated with SIGLEC1 expression in COAD (Table 1). We performed IHC staining on tissue sections from surgical specimens for prognostic analysis. SIGLEC1 staining was heterogeneous but distinguished between high- and low-expression samples (Fig. 10A). We then evaluated patient prognosis according to SIGLEC1 expression level and observed that patients with high SIGLEC1 expression had significantly lower survival than those with low expression (p = 0.003) (Fig. 10B).

Table 1.
Patients’ characteristics and clinicopathological parameters of the study group (percentages are of the total cohort, n = 215)

Characteristic            Low expression of SIGLEC1   High expression of SIGLEC1   p value
n                         96                          119
T stage, n (%)                                                                     0.983
  T1                      11 (5.1%)                   13 (6%)
  T2                      10 (4.7%)                   14 (6.5%)
  T3                      43 (20%)                    51 (23.7%)
  T4                      32 (14.9%)                  41 (19.1%)
N stage, n (%)                                                                     0.001
  N0                      66 (30.7%)                  55 (25.6%)
  N1                      13 (6%)                     46 (21.4%)
  N2                      16 (7.4%)                   18 (8.4%)
  N3                      1 (0.5%)                    0 (0%)
M stage, n (%)                                                                     0.042
  M0                      87 (40.5%)                  96 (44.7%)
  M1                      9 (4.2%)                    23 (10.7%)
Pathologic stage, n (%)                                                            0.018
  I                       20 (9.3%)                   22 (10.2%)
  II                      44 (20.5%)                  34 (15.8%)
  III                     26 (12.1%)                  56 (26%)
  IV                      6 (2.8%)                    7 (3.3%)
Age, median (IQR)         64 (55, 73)                 66 (56, 73)                  0.537

Fig. 10. Clinical COAD samples validate the SIGLEC1 signature. A Immunohistochemical (IHC) staining of clinical sample tissue microarrays showing high and low expression of SIGLEC1. B Kaplan–Meier analysis by SIGLEC1 expression level.

Discussion
COAD is one of the common malignancies arising from glandular epithelial cells. In the present study, starting from DEGs derived from the immune component, we identified SIGLEC1 and validated its association with immune checkpoint-related genes in COAD samples, suggesting that SIGLEC1 expression may be an effective predictor of prognosis in COAD patients and a clue to altering TME status. Using WGCNA, we identified genes associated with M stage, which numerous studies have linked to cancer development [21–23]. In addition, univariate Cox regression and Kaplan–Meier analysis indicated that SIGLEC1 is a key gene associated with M stage. SIGLEC1, also known as CD169, encodes a type I transmembrane protein expressed on macrophages. Cassetta et al.
found that SIGLEC1 plays an important role in cancer regulatory pathways [24], and other investigators have identified SIGLEC1 as a potential prognostic biomarker and a basis for immunotherapy using the ESTIMATE algorithm [25]. Numerous studies confirm that TME remodeling is a potential regulator of cancer progression and a source of therapeutic targets [26–29]. Immunotherapy targeting the TME has become a hot topic in cancer treatment [30], but whether dynamic regulation of the TME contributes to prognostic indicators in COAD remains unknown. Additionally, previous studies found that macrophages, one of the major immune cell types in the TME, are associated with poor prognosis in patients with breast cancer and head and neck squamous cell carcinoma (SCC) [31, 32]. Nevertheless, the prognostic value of SIGLEC1 in the TME of patients with COAD has not been verified.

To explore the prognostic value of SIGLEC1 in the TME of COAD patients, we investigated its potential mechanisms by GSEA. High SIGLEC1 expression was associated with immune-related pathways such as inflammatory response, allograft rejection, epithelial mesenchymal transition, and IL6-STAT3 signaling, and earlier studies have confirmed the role of SIGLEC1 in inflammatory responses [33–35]. Jing et al. also observed that SIGLEC1 on monocytes was associated with tumor progression and inhibited the STAT3 signaling pathway [31]. To further confirm the correlation between SIGLEC1 expression and the immune microenvironment, we compared TIC levels between the high and low SIGLEC1 expression groups and found that high SIGLEC1 expression coincided with increased M0 macrophage levels and decreased naive B cell levels. SIGLEC1 has been shown to induce homologous activation of B cells and to stimulate the expression of the molecule [36], which is highly expressed in M2 macrophages [37].
SIGLEC1 expression levels in monocytes have been shown to correlate with disease severity [38, 39], and SIGLECs can regulate T cells, macrophages, and neutrophils in the immune response [40, 41]; in line with this, our results showed that SIGLEC1 was positively correlated with macrophages (M0, M2), monocytes, neutrophils, and T cells and negatively correlated with resting memory CD4+ T cells and activated memory CD4+ T cells. Mendelian randomization provides an alternative to randomized clinical trials for investigating causal associations between an exposure and an outcome [42, 43]. Mendelian randomization analysis indicated that SIGLEC1 was positively related to macrophages (M0, M2), resting mast cells, monocytes, and neutrophils. Immune checkpoints are closely linked to tumor proliferation, invasion, metastasis, and patient prognosis; they help stabilize immune function and are good targets for tumor therapy. In addition, SIGLEC1 and SIGLEC2 are potential biomarkers that are considered sentinels of disease activity or drug response in autoimmune diseases [44]. In the present study, we found a significant association between SIGLEC1 expression and immune checkpoints such as CTLA-4, LAG-3, PD-1, PD-L2, PD-L1, TIGIT, and TIM-3, and SIGLECs have been shown to serve as immune checkpoint targets for enhancing anti-tumor immune responses [45]. Furthermore, numerous studies indicate that SIGLEC1 is associated with tumor metastasis [24, 46, 47], which is consistent with our finding that SIGLEC1 is associated with distant metastasis (M stage).

In this study, we identified a predictive biomarker, SIGLEC1, and used TCGA COAD transcriptome sequencing data to calculate immune, stromal, and ESTIMATE scores in COAD patients and to investigate the correlation of these scores with the clinical characteristics and overall survival of the patients.
We then explored the role of SIGLEC1 in the TME and its association with immune checkpoint genes, and found that SIGLEC1 may be involved in the progression of COAD and could serve as a prognostic marker and a new therapeutic target. However, the specific role of SIGLEC1 in the development and progression of COAD requires further investigation, including in vivo and in vitro experiments to confirm its role in the TME of colon cancer.

Limitation
This study was based mainly on bioinformatic analyses, and the sample size directly affects the reliability of the results. In addition, our findings require further experimental validation.

Supplementary Information
Supplementary Material 1 (11.9 KB, docx)
Supplementary Material 2 (14.1 KB, docx)
Supplementary Material 3 (588.3 KB, tif)

Acknowledgements