Abstract Background Sialyltransferases are enzymes involved in the addition of sialic acid to glycoproteins and glycolipids, influencing various physiological and pathological processes. The expression and function of sialyltransferases in tumors, particularly in kidney renal clear cell carcinoma (KIRC) remained underexplored. This study aimed to develop a prognostic model based on sialyltransferase-related genes (SRGs) to predict the prognosis and treatment response of patients with KIRC. Methods We utilized RNA-Seq data of KIRC from The Cancer Genome Atlas (TCGA) database, selecting samples with survival data and clinical outcomes. Somatic mutation and neoantigen data were analyzed using the "maftools" package, and genes involved in the sialylation process were identified through the Molecular Signatures Database. Validation cohorts of KIRC samples were obtained from the International Cancer Genome Consortium (ICGC) database. Single-cell RNA sequencing (scRNA-seq) data were downloaded from the Gene Expression Omnibus (GEO) platform, and preprocessing, normalization, and dimensionality reduction analyses were conducted using the "Seurat" package. Differentially expressed sialylation genes were identified using the "limma" package, and their functional enrichment was assessed via Gene Ontology GO and KEGG analyses. Consensus clustering analysis was performed to identify molecular subtypes of KIRC based on sialylation, and drug sensitivity of different subtypes was evaluated using the "pRRophetic" package. A risk signature model comprising 5 SRGs was constructed through univariate and multivariate Cox regression analyses and validated in both the TCGA and ICGC cohorts. The "estimate" package was utilized to calculate immune and stromal scores for each KIRC sample, assessing the tumor immune microenvironment characteristics of different subtypes. Results Analysis of scRNA-seq data identified 25 cell subtypes, categorized into 9 cell types. CD4 + memory cells exhibited the highest potential interactions with other cell subtypes. We identified 14 differentially expressed sialylation genes and confirmed their enrichment in various biological pathways through GO and KEGG analyses. Consensus clustering analysis based on sialylation identified 2 molecular subtypes: C1 and C2. The C2 subtype demonstrated higher sialylation scores and poorer prognosis. Drug sensitivity analysis indicated that the C1 subtype had better responses to Dasatinib and Lapatinib, whereas the C2 subtype was more sensitive to Epothilone B and Vinorelbine. The risk signature model, constructed with five distinct SRGs, exhibited strong predictive accuracy, as indicated by Area Under the Curve (AUC) values of 0.68, 0.69, and 0.70 for 1-, 3-, and 5-year survival, respectively, across both the TCGA and ICGC validation cohorts. Immune microenvironment analysis revealed that the C1 subtype exhibited higher immune and stromal scores, while the C2 subtype showed significantly enhanced expression of immune checkpoint genes. Conclusion This study successfully developed a prognostic model based on SRGs, effectively predicting the prognosis and drug response of KIRC patients. The model demonstrated significant predictive performance and potential clinical application value. Furthermore, the study highlighted the critical role of sialylation in KIRC, offering new insights into its underlying mechanisms in tumor biology. These findings could guide personalized treatment strategies for KIRC patients, emphasizing the importance of sialylation in cancer prognosis and therapy. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-025-02566-4. Keywords: Kidney renal clear cell carcinoma, Sialyltransferase, Prognostic signature, Immunotherapy, Biomarker Introduction The incidence of malignant tumors of the urinary tract has been rising annually [[40]1]. Among these, renal cell carcinoma (RCC) is one of the most lethal malignancies within the urinary system [[41]2]. Notably, 75 to 80% of RCC cases are histologically classified as clear cell renal cell carcinoma (KIRC) [[42]3]. KIRC stands out as the predominant and formidable subtype among renal cell carcinomas, characterized by its propensity for metastasis and resistance to conventional therapies, with about 30% of cases presenting metastasis at the time of initial diagnosis [[43]4]. Despite the swift advancement of novel therapeutic strategies for KIRC, the carcinoma exhibits limited therapeutic efficacy to limited therapeutic efficacy [[44]5]. Given the multifaceted etiology of KIRC, further exploration into its molecular intricacies is essential to unravel its underlying mechanisms and pave the way for personalized treatment regimens tailored to individual patients'needs. In recent years, glycosylation, a critical post-translational modification involving the attachment of sugar moieties to proteins and lipids, has emerged as a crucial factor in cancer biology [[45]6–[46]8]. Among the various glycosylation processes, sialylation, the addition of sialic acids to glycoproteins and glycolipids mediated by sialyltransferases (SiaTs), played a pivotal role in modulating tumor progression, immune evasion, and metastasis [[47]9–[48]11]. Moreover, recent studies highlighted the potential of targeting sialylation pathways as a therapeutic strategy, which could disrupt sialic acid interactions were being investigated for their ability to enhance immune responses against tumors and reduce metastatic potential [[49]12–[50]14]. Furthermore, the differential expression of sialylated molecules on cancer cells compared to normal cells offered promising avenues for developing diagnostic biomarkers and personalized treatment approaches. In this study, we developed a prognostic model based on SiaTs-related genes (SRGs) to predict the prognosis and treatment response of KIRC patients. Utilizing RNA-Seq data from The Cancer Genome Atlas (TCGA) [[51]15] and the International Cancer Genome Consortium (ICGC), alongside single-cell RNA sequencing (scRNA-seq) data, we identified differentially expressed sialylation genes. We classified KIRC into molecular subtypes and constructed a robust prognostic model. Additionally, we examined the tumor immune microenvironment and drug sensitivity associated with these subtypes. Our model incorporated clinical data and was evaluated for predictive performance through survival analyses. We also explored implications for immunotherapy and targeted therapies via immune infiltration analyses and drug sensitivity tests. By elucidating the prognostic and therapeutic significance of sialylation in KIRC, this study provided valuable insights into the molecular mechanisms driving KIRC progression and response to treatment. The identification of SRGs as potential biomarkers and therapeutic targets opened new avenues for personalized treatment strategies, aiming to improve clinical management and outcomes for KIRC patients. Our findings underscored the importance of integrating multi-omics data to advance our understanding of cancer biology and enhance the precision of cancer prognostication and therapy. Materials and methods Data collection and processing RNA-Seq data for KIRC were collected from the TCGA database ([52]https://portal.gdc.cancer.gov/projects/TCGA). Samples without survival data and outcome status were excluded, resulting in a final dataset of 538 tumor samples and 72 adjacent normal samples. Somatic mutation and neoantigen data for KIRC were obtained from the TCGA database and analyzed using the"maftools"package. Genes involved in the biological sialylation process were identified through the Molecular Signatures Database (MSigDB) [[53]16], which includes SiaTs, transporters, and neuraminidases. A validation cohort of KIRC samples was downloaded from the ICGC database. [54]GSE121636 including 6 KIRC samples was downloaded from the GEO platform. Detailed processing of the scRNA-seq data is as follows: (1) The"Seurat"package was used for preprocessing the scRNA-seq data; the PercentageFeatureSet function was employed to determine the proportion of mitochondrial genes, and correlation analysis was performed to investigate the relationship between sequencing depth and mitochondrial gene sequences and/or total intracellular sequences. (2) Cells expressing fewer than 200 genes were removed, and each gene was set to be expressed in at least three cells. (3) Gene expression in each cell was limited to between 2000 and 4000 genes, each cell had a UMI count of at least 200, and the mitochondrial gene proportion was less than 20%. After data filtering, scRNA-seq data were normalized using the LogNormalize method. scRNA-seq analysis Following data filtration, scRNA-seq data underwent normalization using the LogNormalize method. The optimal principal component count was determined via an ElbowPlot, and t-distributed stochastic neighbor embedding (tSNE) analysis was employed to visualize inter-cluster relationships. Established marker genes were utilized to delineate broad cellular classifications. To identify subtype-specific marker genes, the FindAllMarkers function was applied with a log fold change threshold of 1. Genes exhibiting |avg_log[2] FC|> 1 and min.pct = 0.25 were selected as distinct markers for each cell subtype. Dimensionality reduction via tSNE was performed using the"RunTSNE"function in R. The scRNA-seq data for KIRC underwent re-analysis utilizing the"Seurat"package. Batch effects were mitigated using the FindIntegrationAnchors function. Non-linear dimensionality reduction was executed using the Uniform Manifold Approximation and Projection (UMAP) method. Subsequently, individual cells were clustered into distinct subgroups utilizing the FindNeighbors and FindClusters functions. The tSNE embedding was further refined using the RunTSNE function. Marker genes were discerned using the FindAllMarkers function, followed by Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis (KEGG) and Gene Ontology Enrichment Analysis (GO) via the"clusterProfiler"package, with notable enrichment observed in pathways governing the regulation of T cell activation. Analysis of ligand–receptor interactions CellPhoneDB is a publicly available repository curated for receptors, ligands, and their interactions. Both ligands and receptors include subunit structures, accurately representing heteromeric complexes. The ligand–receptor database within CellPhoneDB integrates data from UniProt, Ensembl, PDB, IUPHAR, and others, encompassing 978 proteins. This resource facilitates comprehensive and systematic analysis of intercellular signaling molecules, enabling the study of communication networks between different cell types. We conducted a significance analysis of ligand–receptor relationships in the feature space of single-cell expression profiles using the statistical_analysis function of the CellphoneDB software package. Cluster labels for all cells were randomly permuted 1000 times, and the mean expression levels of receptors within clusters and the mean expression levels of ligands within interacting clusters were determined. This process generated a null distribution (also known as a Bernoulli distribution or two-point distribution) for each receptor-ligand pair in every pairwise comparison between two cell types. Finally, we selected several ligand–receptor pairs of interest to illustrate their relationships. Selection of differentially expressed genes related to sialylation and enrichment analysis We performed data processing and statistical analysis of expression files using R. The"limma"package was utilized for data normalization and identification of Differentially Expressed Genes (DEGs), where genes with a P-value < 0.05 and |log[2] FC|> 1 were considered significant DEGs. A volcano plot illustrating the DEGs was generated using the"ggplot2"package. Additionally, the log2-transformed mRNA expression data were arranged into a heatmap using the"pheatmap"package in R. Consensus clustering algorithm, gene set variation analysis, and gene set enrichment analysis From 363 KIRC samples, we obtained expression patterns of 14 differentially sialylated genes along with clinical data. Subsequently, ConsensusClusterPlus was employed to cluster these differentially sialylated genes using the following parameters: reps = 50, pItem = 0.8, pFeature = 1, clusterAlg ="pam", distance ="spearman". Here,"pam"and"spearman"were used as the clustering algorithm and distance metric, respectively. The"GSVA"package in R was utilized to compute sialylation scores for each KIRC patient, serving as an indicator of sialylation. The"wilcox.test"function in R was then employed to compare differences in sialylation scores between different clusters. In addition to sialylation scores, enrichment scores for 50 hallmark signaling pathways were computed using the"GSVA"package. Subsequently, potential differences in signaling pathways between different clusters were compared using a similar approach. Cluster analysis of tumor immune microenvironment The"estimate"package in R was used to compute ImmuneScore, StromalScore, EstimateScore, and tumor purity for each KIRC sample. The"ggpubr"package in R was utilized to visualize these results. Furthermore, the TIMER2.0 database ([55]http://timer.cistrome.org/) provides comprehensive immune features of tumor-infiltrating cells in various tumor samples from the TCGA database, based on algorithms including TIMER, CIBERSORT, EPIC, and MCPCOUNTER. The"pheatmap"package in R was used to illustrate the infiltration of different immune cells in each KIRC sample. Subsequently, the"limma"package in R was employed to determine statistically significant changes in immune cell infiltration between subgroups C1 and C2. Immune cell subsets with p-values < 0.05 were separated and stored. The activation of Immune Checkpoint Genes (ICGs) played a crucial role in tumor immune evasion and malignant progression by inhibiting anti-tumor immune responses. It was widely acknowledged that ICGs play an irreplaceable role in regulating immune cell function and disease progression. Therefore, we further explored differences in the expression levels of ICGs between subgroups C1 and C2. Tumor Immune Dysfunction and Exclusion (TIDE) is a vital biomarker for immune therapy response. Drug sensitivity analysis Using the largest pharmacogenomics database, the GDSC Cancer Drug Sensitivity Pharmacogenomics database ([56]https://www.cancerrxgene.org/), we utilized the R package"pRRophetic"to predict the chemotherapy sensitivity of each tumor sample. Regression methods were employed to obtain estimated IC[50] values for specific chemotherapy drugs, and tenfold cross-validation was performed using the GDSC training set to assess the accuracy of regression and prediction. Default parameters were chosen for all analyses, including"combat"for batch effect removal and averaging duplicate gene expressions. Construction and validation of sialylation risk model In the survival package, single-factor Cox regression analysis was utilized to further identify prognostic-related genes with p < 0.05. To reduce the number of genes, we conducted Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression analysis, followed by stepwise regression for multi-factor Cox regression analysis. Based on the results of the multi-factor Cox model, we constructed a risk feature with the formula: Risk Score = ∑βi*Expi. Here, i represents genes in the risk feature, expi denotes the expression of gene i, and βi represents the coefficient of gene i in the multi-factor Cox model. After zero-mean normalization, patients were divided into high-risk and low-risk groups. The timeROC package was used for Receiver Operating Characteristic (ROC) curve analysis to evaluate the predictive performance of the risk feature. ROC curves were plotted based on risk scores, and the Area Under the Curve (AUC) was calculated to assess the accuracy of the risk model. The maximum sum of sensitivity and specificity was determined as the optimal cutoff point for stratifying high-risk and low-risk populations. Kaplan–Meier analysis was conducted on high-risk and low-risk groups to demonstrate that the risk score could serve as an independent clinical prognostic indicator. Construction of risk feature and nomogram To construct a clinically applicable nomogram model, we initially conducted univariate and multivariate Cox regression analyses on clinical pathology and risk features. Variables with p < 0.05 in the multivariate Cox model were used to construct a nomogram for predicting the prognosis of KIRC using the rms package. Calibration curves were generated to assess the predictive accuracy of the model. Decision Curve Analysis (DCA) was employed to evaluate the reliability of the model. Gene mutation analysis and gene set enrichment analysis The"GSVA"package was utilized to compute potential differences in enrichment scores for HALLMARKER signaling pathways and KEGG pathways between high and low-risk groups. Additionally, analysis of Single Nucleotide Variation (SNV) mutations and Copy Number Variation (CNV) alterations in genes of the risk model was performed. Response to immune checkpoint blockade We obtained transcriptomic and matched clinical data from the IMvigor210 cohort of KIRC patients treated with anti-PD-L1 therapy (atezolizumab) ([57]http://research-pub.gene.com/IMvigor210CoreBiologies). Additionally, the [58]GSE78220 cohort includes transcriptomic data from pre-treated melanoma patients receiving anti-PD-1 checkpoint inhibition therapy. Data were also downloaded to ascertain the potential value of risk feature scores in predicting responsiveness to immune checkpoint blockade (ICB). Statistical analysis All statistical analyses were conducted using R software (version 4.2.3) employing Cox univariate and multivariate regression analyses to determine independent prognostic factors for overall survival. Survival analysis was performed using Cox univariate regression analysis. ROC curve analysis was employed to measure the extent of prediction of the overall survival prognosis model outcome.In enrichment analysis, in order to avoid false positive results, multiple test correction is required, strictly following the standard of p-value < 0.05. Wilcoxon test was utilized to compare tumor immune infiltrating cells. Visualization was performed using the"ggplot2","pheatmap", and"forestplot"packages. A significance level of p < 0.05 was used to determine statistical significance. Results Quality control, normalization, and analysis of scRNA-seq data The single-cell data were filtered to ensure that each gene was expressed in at least three cells, and each cell expressed at least 200 genes. The"PercentageFeatureSet"function was used to calculate the proportion of mitochondria and rRNA, ensuring gene expression levels ranged from 200 to 4000 in each cell, with at least 200 UMIs per cell. This process yielded 25,066 cells. Positive correlations were observed between UMI/mRNA counts and mitochondrial gene content. Subsequently, 2000 variable genes were plotted on a scatter plot, and the optimal number of principal components (PCs) was observed to be 30 through ElbowPlot analysis. PCA dimensionality reduction analysis revealed different score values across dimensions, with the optimal PC number determined to be 30 through ElbowPlot analysis (Fig. [59]1A). Finally, 25 subtypes were obtained via tSNE, revealing significant expression differences for numerous genes among these subtypes in Fig. [60]1B. Expression levels of the top 10 genes with the greatest expression differences among subtypes were displayed in (Fig. [61]1C). Cell types were annotated using known marker genes, with 25 clusters assigned to 9 cell categories in (Fig. [62]1D): CD4 + memory cells, NK cells, Monocytes, B cells, CD8 + T cells, M2 macrophages, Regulatory T cells, Endothelial cells, and Cancer stem cells. Marker genes for each cell subtype were identified using the FindAllMarkers function (logFC = 1, minpct = 0.25, adj.p < 0.05), resulting in 1930 subtype-specific marker genes. Enrichment analysis of these marker genes revealed significant enrichment in GO terms such as B cell activation and immune response-regulating cell surface receptor in (Fig. [63]1E). Analysis of ligand–receptor relationships in the single-cell expression profile was conducted using the"CellphoneDB"package. Selected ligand–receptor pairs were highlighted, with high interaction scores observed for interactions between CD4 + memory cells and MIF- (CD74 + CXCR4). Finally, the number of ligand–receptor gene pairs corresponding to each cell group was tallied, revealing CD4 + memory cells to have the highest potential interaction with other cell subtypes in Fig. [64]1F. Additionally, Fig. [65]1G provides further evidence supporting this conclusion, illustrating the extensive network of interactions centered around CD4 + memory cells. Fig. 1. [66]Fig. 1 [67]Open in a new tab Different subtypes in single-cell sequencing. A Results of the ElbowPlot. B t-SNE plot of the 25 identified main cell types in KIRS lesions. C Top 10 genes with the greatest differential expression in each cell subtype. D Enrichment analysis of marker genes. E Interaction between CD4 + memory cells and MIF- (CD74 + CXCR4). F Interaction network of ligand–receptor genes Identification of differentially expressed genes in sialylation From TCGA data, 109 sialylation-related genes were extracted from KIRC samples for differential analysis with criteria of P < 0.05 and log[2] |FC|> 1. Among these, 14 genes were identified as differentially expressed sialylation genes. Visualization of DEGs was performed using volcano plots and heatmaps (Fig. [68]2A, B). GO and KEGG analyses were conducted for the identified differentially expressed sialylation genes. GO analysis, pathways such as carbohydrate binding were primarily identified in Fig. [69]2C. In KEGG enrichment analysis, enriched pathways mainly included Mucin type O-glycan biosynthesis in Fig. [70]2D. The chromosomal locations of differentially expressed sialylation genes in humans are depicted in the chromosomal map. All differentially expressed sialylation genes between the two groups showed significant differences (P < 0.05) (Fig. [71]2E). Further analysis of differentially expressed sialylation genes in ARDS revealed multiple correlations between genes, represented by the r-value in Fig. [72]2F. Additionally, Fig. [73]2G showed copy number variations (CNVs) of prognostic differentially expressed sialylation genes were observed in KIRC samples. Fig. 2. [74]Fig. 2 [75]Open in a new tab Identification of differentially expressed genes (DEGs) in SRGs. A Volcano plot of DEGs. B Heatmap of DEGs. C GO enrichment of DEGs. D KEGG enrichment of DEGs. E Chromosomal view of DEGs. F Correlation analysis of DEGs. G CNV analysis of DEGs Identification of sialylation-based molecular subtypes in KIRC using consensus clustering analysis To uncover sialylation-based molecular subtypes in KIRC, consensus clustering analysis was performed using KIRC samples.By increasing the clustering variable (k) from 2 to 9 (Supplementary Table S1), we found that when k = 2, the intragroup correlations were the highest and the intergroup correlations were low.From k = 2 to k = 9, there were significant differences in the relative changes of the area under the CDF curve. Considering the cumulative distribution function (CDF) curve and delta area, k = 2 was selected as the number of unique and non-overlapping subtypes in Fig. [76]3A, B. To uncover sialylation-based molecular subtypes in KIRC, consensus clustering analysis was performed using KIRC samples. Considering the cumulative distribution function (CDF) curve and delta area, k = 2 was selected as the number of unique and non-overlapping subtypes in Fig. [77]3A, B. From k = 2 to k = 9, there were significant differences in the relative changes of the area under the CDF curve..Thus, two sialylation-based molecular subtypes of KIRC were constructed in Fig. [78]3C, with C1 subtype comprising 270 cases and C2 subtype comprising 258 cases. Patients in C2 exhibited higher sialylation scores compared to C1 subtype, and distinct differences in the expression levels of differentially expressed sialylation-related genes were observed between C1 and C2 subtypes, with most genes being overexpressed in the C2 subtype (Fig. [79]3D). Findings from pathway-based single-sample gene set enrichment analysis (ssGSEA) indicated that the C2 subtype was primarily enriched in pathways such as ADIPOGENESIS and FATTY_ACID_METABOLISM, suggesting a close association between sialylation and the aforementioned typical tumor-related pathways in Fig. [80]3E. Fig. 3. [81]Fig. 3 [82]Open in a new tab Identification of two gene clusters. A Two gene clusters according to the consensus clustering matrix (k = 2). B Relative change in area under the CDF curve when k = 2. C Violin plot of sialic acidification score. D Heatmap of differential expression of 12 SRGs in different subtypes, with red indicating upregulation and blue indicating downregulation. E Heatmap of ssGSEA Drug sensitivity analysis based on subtypes Considering the widespread use of molecular targeted drugs in treating KIRC, the"pRRophetic"package in R was employed to systematically evaluate chemotherapy responses based on different subtypes. Our data indicated that drugs such as Dasatinib and Lapatinib were expected to benefit the C1 subtype. Conversely, the C2 subtype appeared to benefit from drugs like Epothilone.B and Vinorelbine (Fig. [83]4A). Fig. 4. [84]Fig. 4 [85]Open in a new tab Drug sensitivity analysis plot Clustering analysis of tumor immune microenvironment Transcriptomic data were utilized, and the"ESTIMATE"package was employed to assess differences in immune features between C1 and C2 subtypes (based on StromalScore, ImmuneScore, ESTIMATEScore, and Tumorpurity). Figure [86]5A demonstrated that the C1 subtype exhibited enhanced levels of ImmuneScore, StromalScore, and ESTIMATEScore, while tumor purity levels decreased. These findings suggested a positive correlation between prognosis in KIRC and immune and stromal components. To further explore the abundance of immune cell infiltration in the tumor microenvironment, various algorithms were applied to estimate the percentage of immune cell infiltration in C1 and C2 subtypes. As shown in Fig. [87]5B, based on TIMER, CIBERSORT-ABS, EPIC, and MCPCOUNTER algorithms, the C1 subtype shows an increase in proportions of immune cells such as Endothelial and Neutrophils, with most immune cells increasing in proportion in subtype C2. Immune checkpoint genes (ICGs) are determinants of immune function in immune cells. Similarly, our study results indicate enhanced expression of ICGs in the C2 subtype (Fig. [88]5C). Fig. 5. [89]Fig. 5 [90]Open in a new tab Immune infiltration analysis. A Violin plot of immune scores in different subtypes. B The distribution traits of immunocyte infiltration in SRG-based clusters. C Boxplot of immune checkpoint genes (ICGs). ns not significant, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 Construction of sialylation risk model The prognostic value of 14 differential sialylation genes was assessed through univariate Cox regression analysis, with 6 genes showing prognostic significance (Fig. [91]6A, P < 0.05). Lasso-Cox regression analysis was performed on the prognostic genes, with λ = 0.0067 (Fig. [92]6B). Eventually, 5 genes were obtained and incorporated into the risk features after conducting multivariate Cox regression analysis using a stepwise regression method (Fig. [93]6C). The final 5-sialylation-associated gene signature formula is as follows: Riskscore ="−0.244*ST8SIA4+0.39*SIGLEC10 −0.559*GCNT4- 0.206*SELP −0.135*SLC17 A3". We calculated the risk score for each sample and categorized them into high-risk and low-risk groups after z-score normalization. The AUC values for the 1, 3, and 5-year survival models in the TCGA cohort ranged from 0.68, 0.69, to 0.70, respectively. Kaplan–Meier survival analysis showed that compared to low-risk patients in the TCGA cohort, high-risk patients had significantly worse survival outcomes (Fig. [94]6D, E). Similar results were observed in the ICGC validation cohort (Fig. [95]6F, G). Fig. 6. [96]Fig. 6 [97]Open in a new tab Construction of SRGs clinical model. A Univariate Cox regression. B Cross-validation in LASSO regression. C Multivariate Cox regression. D Kaplan–Meier survival analysis of high and low-risk scores in TCGA. E Survival model for 1, 3, and 5 years in TCGA. F Kaplan–Meier survival analysis of high and low-risk scores in ICGC. G Survival model for 1, 3, and 5 years in ICGC Identification of independent risk factors and construction of nomogram To enhance the predictive performance of the risk features, we integrated clinical-pathological characteristics and risk scores through univariate and multivariate Cox regression analyses. Univariate analysis indicated that Age, risk features, Stage, Grade, and TNM were significant independent prognostic factors for the risk model; multivariate analysis revealed that the risk features and M were important independent prognostic factors for the risk model (Fig. [98]7A, B). Consequently, a nomogram combining Stage and risk score was constructed, and calibration plots demonstrated that the nomogram could effectively predict actual survival outcomes (Fig. [99]7C–E). Fig. 7. [100]Fig. 7 [101]Open in a new tab Construction of forest plots. A Forest plot of univariate Cox analysis. B Forest plot of multivariate Cox analysis. C Nomogram based on risk score and clinical characteristics. D Calibration plot for internal validation of the nomogram. E DCA curve Analysis of the risk model on the tumor immune microenvironment Our data indicate significant positive correlations between ST8SIA4, SIGLEC10, SELP, SLC17 A3, and Stromal Score, Immune Score, and Estimate Score, while GCNT4 shows significant negative correlations with Stromal, Immune, and Estimate Scores (Fig. [102]8A). Subsequently, based on the median expression of each gene, correlation analysis of hub genes was conducted to investigate their relationships in KIRC (Fig. [103]8B). We simultaneously investigated the expression differences of immune scores of 5 SRGs between high and low-risk groups (Fig. [104]8C–G). Fig. 8. [105]Fig. 8 [106]Open in a new tab Impact of SRGs model on the immune microenvironment. A Heatmap of StromalScore, ImmuneScore, and ESTIMATEScore in risk model. B Correlation analysis of hub genes. (C-G) Boxplot of immune scores of hub genes in high and low-risk groups. ns not significant, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 Pathway analysis of hub genes and mutation analysis We also conducted an analysis of potential pathways associated with each risk gene, a total of 5 pathways were significantly correlated with these 5 genes (Fig. [107]9A). Next, we examined the SNV mutations of the 5 genes in the risk signal. The study revealed that ST8SIA4, SIGLEC10, GCNT4, SELP, and SLC17 A3 had SNV mutations in more samples (Fig. [108]9B). We analyzed the co-occurrence probability of these key genes and the top 10 genes with the most mutations. Significant co-occurrence probabilities of mutations were found among these 5 genes, with only a very small number of samples showing CNV gains/losses among these 5 genes (Fig. [109]9C). To further elucidate the relationship between risk genes and KIRC, we analyzed the correlation between these genes and several molecular features of KIRC (Fig. [110]9D). The results showed that ST8SIA4, SIGLEC10, GCNT4, SELP, and SLC17 A3 were significantly negatively correlated with non-ploidy score, homologous recombination deficiency, partial alteration, fragment count, and non-silent mutation rate. Additionally, we also analyzed potential pathways associated with each risk gene (Fig. [111]9E). As shown in Fig. [112]9F, a total of 12 pathways were significantly associated with these 5 genes, including HISTIDINE_METABOLISM, AUTOIMMUNE_THYROID_DISEASE, and GRAFT_VERSUS_HOST_DISEASE. Fig. 9. [113]Fig. 9 [114]Open in a new tab Mutation analysis of hub genes. A Waterfall plot of mutations. B SNV mutations. C Gain/loss of CNV. D Correlation analysis between hub genes and KIRC molecular features. E Correlation heatmap of 5 key genes with homologous recombination defects, non-diploid scores, fragment counts, score changes, and non-silent mutation rates. F Pathway heatmap of hub genes Significant value of risk feature scores in predicting immune checkpoint blockade response Through the analysis of the IMvigor210 and [115]GSE78220 datasets (Fig. [116]10A–F), we found that risk feature scores had significant value in predicting the response to ICB. In the IMvigor210 dataset, the survival rate of the high-risk group was significantly lower than that of the low-risk group (p = 0.00028) (Fig. [117]10A). Although there was no significant difference in risk scores between the CR/PR (complete response/partial response) and PD/SD (progressive disease/stable disease) groups, the proportion of CR/PR was higher in the low-risk group (27% vs. 18%) (Fig. [118]10B, C). In the [119]GSE78220 dataset, the survival rate of the high-risk group was also significantly lower than that of the low-risk group (p = 0.015). Similarly, although there was no significant difference in risk scores between the PD (progressive disease) and PR/CR (partial response/complete response) groups, the proportion of PR/CR was higher in the low-risk group (33% vs. 67%) (Fig. [120]10D–F). These results indicated that patients in the low-risk group responded better to ICB therapy, and risk feature scores could serve as important biomarkers for predicting the efficacy of ICB. Fig. 10. [121]Fig. 10 [122]Open in a new tab Significant values in predicting immune checkpoint blockade response. A Survival curves of high and low-risk groups in IMvigor210. B Scores of different treatment groups. C Proportions of different treatment groups. D Survival curves of high and low-risk groups in [123]GSE78220. E Scores of different treatment groups. F Proportions of different treatment groups Discussion The comprehensive analysis presented in this study delves into the intricate relationship between SRGs and KIRC, offering insights into the prognostic implications and therapeutic avenues associated with these molecular mechanisms. Despite improvements in KIRC detection methods, prognosis prediction, and drug treatment selection, patients still fail to achieve favorable outcomes due to high recurrence or distant metastasis rates and drug resistance [[124]17–[125]19]. Early diagnosis, personalized treatment, and regular follow-up are key to achieving and improving patient prognosis. Therefore, the urgent need in KIRC is to discover new biomarkers to improve survival rates and assess drug responses. By unraveling the molecular intricacies of KIRC through the lens of sialylation, this research underscores the multifaceted nature of tumor biology and its implications for precision medicine. SiaTs, as a crucial class of enzyme molecules, play a pivotal regulatory role in tumor development by modulating various events such as cell proliferation, metastasis, angiogenesis, and chemotherapy resistance [[126]20–[127]23]. They are instrumental in regulating the growth, metastasis, and therapeutic resistance of tumor cells. In our study, we screened 5 SRGs (ST8SIA4, SIGLEC10, GCNT4, SELP, and SLC17 A3) and developed prognostic features associated with these SRGs to classify patients into high-risk and low-risk groups. Survival analysis indicated that patients in the low-risk group exhibited better prognoses and longer survival times. Furthermore, we evaluated the prognostic value of these features in predicting 1-year, 3-year, and 5-year survival rates using time-dependent ROC curves, demonstrating their robust predictive capability, further supported by constructed calibration plots. These findings underscore the effectiveness of the newly established SRGs-related prognostic features in assessing prognosis, thereby serving as complementary tools for outcome evaluation in KIRC patients. ST8SIA4 transfers sialic acid (located at the end positions of N- and O-linked polysaccharides) to the end positions of sugars [[128]32, [129]33]. A previous study found that lncRNA HOTAIR/miR-124/ST8SIA4 enhances the malignancy of renal cell carcinoma [[130]34]. The IFN-γ response related features of seven genes containing ST8SIA4 can effectively predict the prognosis of ccRCC [[131]35]. Siglecs is a receptor family that binds to sialic acid containing polysaccharides. SIGLEC10 plays an important role in cancer-related processes and can inhibit the proliferation and function of tumor infiltrating CD8 T cells in vitro through the Akt/P38/Erk signaling pathway [[132]34]. GCNT4 may play a role as an oncogene in a variety of tumors, such as pancreatic cancer, breast cancer and lung cancer; However, its potential role in OS has never been reported before [[133]36]. SELP plays an important role in cancer-related thrombosis, promoting cancer metastasis through platelet cancer cell interactions and mediating cancer immunity [[134]37]. We further found that the SRG gene is significantly associated with 186 pathways, while protective genes and risk genes clearly have different pathway characteristics. Protective genes are significantly positively correlated with allogeneic rejection, complement and coagulation, and cell apoptosis, while risk genes are significantly correlated with fatty acid metabolism and adipogenesis. The changes in fatty acid metabolism play an important role in KIRC, and the prognostic value of fatty acid metabolism related genes in KIRC has also been revealed [[135]41]. Apoptosis is a crucial cellular process for controlling tumor proliferation, spread, and future outcomes [[136]42, [137]43], and changes in metabolic pathways exist in many diseases, including tumors. Metabolic reprogramming is crucial for maintaining abnormal proliferation of tumor cells [[138]44]. Therefore, these data provide direction for us to further investigate the regulation of these risk genes in KIRC. SiaTs were capable of promoting cancer progression by influencing immune cell infiltration, with immune cell infiltration playing a crucial role in cancer prognosis [[139]24–[140]26]. The infiltration of immune cells was essential for regulating the tumor microenvironment, controlling tumor growth, and managing metastasis [[141]27–[142]29]. Immune cells, including T lymphocytes and natural killer cells, could recognize and eliminate abnormal cells, thus suppressing tumor growth and spread [[143]30]. Significantly, our research elucidates the intricate interplay between sialylation and the tumor immune microenvironment in KIRC. We discern distinct immune infiltration patterns across specific KIRC subtypes, implying disparate responses to immunotherapeutic modalities. Given the immunogenicity of KIRC, the use of PD-1 and PD-L1 as frontline treatments for metastatic disease was increasingly dominant based on ICB [[144]31]. The differential expression of immune related genes may also be another factor contributing to the good prognosis of C2 subtypes. Compared with C1 subtype, the immune suppressive factors PDCD1. The decreased expression of CTLA and LAG3, as well as the decrease in immunostimulatory factors [[145]38, [146]39], suggest that the C2 subtype may have a better prognosis. The differential distribution of gene mutations, immune cell and immune regulatory gene expression in C2 subtype compared to other subtypes may be the mechanism leading to a good prognosis for C2 subtype. In summary, these findings highlight the methods of patient stratification and targeted therapy. Our research also found that risk scores based on SRGs had significant value in predicting responsiveness to ICB. These revelations hold profound implications for devising personalized treatment regimens encompassing immunotherapy and targeted approaches tailored to exploit the susceptibilities of distinct KIRC subtypes and circumvent therapeutic resistance.Lapatinib: When EGFR and HER2 hormone therapy are equally effective, lapatinib has better tolerability [[147]40]. While our study advances our understanding of sialylation's role in KIRC, it is not devoid of limitations. The retrospective nature of our analysis precludes causal inference, necessitating prospective studies for validation. Firstly, our prognostic model was constructed and validated using retrospective data from public databases. More forward-looking real-world data is needed to validate its clinical efficacy. Secondly, the inherent weakness of constructing prognostic models solely based on a single marker is inevitable, as many prominent prognostic genes in KIRC may have been excluded.The prognostic model was developed and validated using the TCGA dataset, while ICGC was applied for external validation analysis. TCGA data removes samples without survival data and outcome status and performs corrective processing, while ICGC directly conducts external validation analysis. The differences in these data may affect the prognosis of SRG, and future basic experiments are needed to validate our findings and elucidate the functional role of SRG in KIRC and development. Additionally, elucidating the functional significance of individual SRGs and their intricate interplay in KIRC pathogenesis mandates further exploration through in vitro and in vivo experimentation. Furthermore, the clinical utility of our prognostic model warrants validation in independent patient cohorts to assess its generalizability and robustness across diverse populations. In conclusion, our study underscores the paramount importance of sialylation in shaping the landscape of KIRC, providing a comprehensive framework for elucidating the molecular mechanisms underpinning tumor progression and therapeutic response. Through the integration of multi-omics data and advanced computational methodologies, we have identified novel prognostic markers, delineated molecular subtypes, and elucidated the tumor immune microenvironment of KIRC. These findings offer pivotal insights for the development of precision medicine strategies aimed at enhancing patient outcomes in KIRC and beyond. Conclusions In conclusion, our study sheds light on the intricate relationship between SRGs and KIRC, unveiling distinct molecular subtypes and their implications for prognosis and treatment. By elucidating the role of sialylation dysregulation in KIRC progression and its interaction with the tumor immune microenvironment, our findings offer insights into potential therapeutic targets and strategies for personalized medicine in KIRC management. Moving forward, collaborative efforts are crucial to validate and translate these findings into clinical practice, ultimately improving patient outcomes and advancing precision medicine in KIRC. Supplementary Information [148]Supplementary Material 1^ (6.5MB, zip) Acknowledgements