Abstract

Background

   Sialyltransferases are enzymes involved in the addition of sialic acid
   to glycoproteins and glycolipids, influencing various physiological and
   pathological processes. The expression and function of
   sialyltransferases in tumors, particularly in kidney renal clear cell
   carcinoma (KIRC) remained underexplored. This study aimed to develop a
   prognostic model based on sialyltransferase-related genes (SRGs) to
   predict the prognosis and treatment response of patients with KIRC.

Methods

   We utilized RNA-Seq data of KIRC from The Cancer Genome Atlas (TCGA)
   database, selecting samples with survival data and clinical outcomes.
   Somatic mutation and neoantigen data were analyzed using the "maftools"
   package, and genes involved in the sialylation process were identified
   through the Molecular Signatures Database. Validation cohorts of KIRC
   samples were obtained from the International Cancer Genome Consortium
   (ICGC) database. Single-cell RNA sequencing (scRNA-seq) data were
   downloaded from the Gene Expression Omnibus (GEO) platform, and
   preprocessing, normalization, and dimensionality reduction analyses
   were conducted using the "Seurat" package. Differentially expressed
   sialylation genes were identified using the "limma" package, and their
   functional enrichment was assessed via Gene Ontology GO and KEGG
   analyses. Consensus clustering analysis was performed to identify
   molecular subtypes of KIRC based on sialylation, and drug sensitivity
   of different subtypes was evaluated using the "pRRophetic" package. A
   risk signature model comprising 5 SRGs was constructed through
   univariate and multivariate Cox regression analyses and validated in
   both the TCGA and ICGC cohorts. The "estimate" package was utilized to
   calculate immune and stromal scores for each KIRC sample, assessing the
   tumor immune microenvironment characteristics of different subtypes.

Results

   Analysis of scRNA-seq data identified 25 cell subtypes, categorized
   into 9 cell types. CD4 + memory cells exhibited the highest potential
   interactions with other cell subtypes. We identified 14 differentially
   expressed sialylation genes and confirmed their enrichment in various
   biological pathways through GO and KEGG analyses. Consensus clustering
   analysis based on sialylation identified 2 molecular subtypes: C1 and
   C2. The C2 subtype demonstrated higher sialylation scores and poorer
   prognosis. Drug sensitivity analysis indicated that the C1 subtype had
   better responses to Dasatinib and Lapatinib, whereas the C2 subtype was
   more sensitive to Epothilone B and Vinorelbine. The risk signature
   model, constructed with five distinct SRGs, exhibited strong predictive
   accuracy, as indicated by Area Under the Curve (AUC) values of 0.68,
   0.69, and 0.70 for 1-, 3-, and 5-year survival, respectively, across
   both the TCGA and ICGC validation cohorts. Immune microenvironment
   analysis revealed that the C1 subtype exhibited higher immune and
   stromal scores, while the C2 subtype showed significantly enhanced
   expression of immune checkpoint genes.

Conclusion

   This study successfully developed a prognostic model based on SRGs,
   effectively predicting the prognosis and drug response of KIRC
   patients. The model demonstrated significant predictive performance and
   potential clinical application value. Furthermore, the study
   highlighted the critical role of sialylation in KIRC, offering new
   insights into its underlying mechanisms in tumor biology. These
   findings could guide personalized treatment strategies for KIRC
   patients, emphasizing the importance of sialylation in cancer prognosis
   and therapy.

Supplementary Information

   The online version contains supplementary material available at
   10.1007/s12672-025-02566-4.

   Keywords: Kidney renal clear cell carcinoma, Sialyltransferase,
   Prognostic signature, Immunotherapy, Biomarker

Introduction

   The incidence of malignant tumors of the urinary tract has been rising
   annually [[40]1]. Among these, renal cell carcinoma (RCC) is one of the
   most lethal malignancies within the urinary system [[41]2]. Notably, 75
   to 80% of RCC cases are histologically classified as clear cell renal
   cell carcinoma (KIRC) [[42]3]. KIRC stands out as the predominant and
   formidable subtype among renal cell carcinomas, characterized by its
   propensity for metastasis and resistance to conventional therapies,
   with about 30% of cases presenting metastasis at the time of initial
   diagnosis [[43]4]. Despite the swift advancement of novel therapeutic
   strategies for KIRC, the carcinoma exhibits limited therapeutic
   efficacy to limited therapeutic efficacy [[44]5]. Given the
   multifaceted etiology of KIRC, further exploration into its molecular
   intricacies is essential to unravel its underlying mechanisms and pave
   the way for personalized treatment regimens tailored to individual
   patients'needs.

   In recent years, glycosylation, a critical post-translational
   modification involving the attachment of sugar moieties to proteins and
   lipids, has emerged as a crucial factor in cancer biology
   [[45]6–[46]8]. Among the various glycosylation processes, sialylation,
   the addition of sialic acids to glycoproteins and glycolipids mediated
   by sialyltransferases (SiaTs), played a pivotal role in modulating
   tumor progression, immune evasion, and metastasis [[47]9–[48]11].
   Moreover, recent studies highlighted the potential of targeting
   sialylation pathways as a therapeutic strategy, which could disrupt
   sialic acid interactions were being investigated for their ability to
   enhance immune responses against tumors and reduce metastatic potential
   [[49]12–[50]14]. Furthermore, the differential expression of sialylated
   molecules on cancer cells compared to normal cells offered promising
   avenues for developing diagnostic biomarkers and personalized treatment
   approaches.

   In this study, we developed a prognostic model based on SiaTs-related
   genes (SRGs) to predict the prognosis and treatment response of KIRC
   patients. Utilizing RNA-Seq data from The Cancer Genome Atlas (TCGA)
   [[51]15] and the International Cancer Genome Consortium (ICGC),
   alongside single-cell RNA sequencing (scRNA-seq) data, we identified
   differentially expressed sialylation genes. We classified KIRC into
   molecular subtypes and constructed a robust prognostic model.
   Additionally, we examined the tumor immune microenvironment and drug
   sensitivity associated with these subtypes. Our model incorporated
   clinical data and was evaluated for predictive performance through
   survival analyses. We also explored implications for immunotherapy and
   targeted therapies via immune infiltration analyses and drug
   sensitivity tests.

   By elucidating the prognostic and therapeutic significance of
   sialylation in KIRC, this study provided valuable insights into the
   molecular mechanisms driving KIRC progression and response to
   treatment. The identification of SRGs as potential biomarkers and
   therapeutic targets opened new avenues for personalized treatment
   strategies, aiming to improve clinical management and outcomes for KIRC
   patients. Our findings underscored the importance of integrating
   multi-omics data to advance our understanding of cancer biology and
   enhance the precision of cancer prognostication and therapy.

Materials and methods

Data collection and processing

   RNA-Seq data for KIRC were collected from the TCGA database
   ([52]https://portal.gdc.cancer.gov/projects/TCGA). Samples without
   survival data and outcome status were excluded, resulting in a final
   dataset of 538 tumor samples and 72 adjacent normal samples. Somatic
   mutation and neoantigen data for KIRC were obtained from the TCGA
   database and analyzed using the"maftools"package. Genes involved in the
   biological sialylation process were identified through the Molecular
   Signatures Database (MSigDB) [[53]16], which includes SiaTs,
   transporters, and neuraminidases. A validation cohort of KIRC samples
   was downloaded from the ICGC database. [54]GSE121636 including 6 KIRC
   samples was downloaded from the GEO platform. Detailed processing of
   the scRNA-seq data is as follows: (1) The"Seurat"package was used for
   preprocessing the scRNA-seq data; the PercentageFeatureSet function was
   employed to determine the proportion of mitochondrial genes, and
   correlation analysis was performed to investigate the relationship
   between sequencing depth and mitochondrial gene sequences and/or total
   intracellular sequences. (2) Cells expressing fewer than 200 genes were
   removed, and each gene was set to be expressed in at least three cells.
   (3) Gene expression in each cell was limited to between 2000 and 4000
   genes, each cell had a UMI count of at least 200, and the mitochondrial
   gene proportion was less than 20%. After data filtering, scRNA-seq data
   were normalized using the LogNormalize method.

scRNA-seq analysis

   Following data filtration, scRNA-seq data underwent normalization using
   the LogNormalize method. The optimal principal component count was
   determined via an ElbowPlot, and t-distributed stochastic neighbor
   embedding (tSNE) analysis was employed to visualize inter-cluster
   relationships. Established marker genes were utilized to delineate
   broad cellular classifications. To identify subtype-specific marker
   genes, the FindAllMarkers function was applied with a log fold change
   threshold of 1. Genes exhibiting |avg_log[2] FC|> 1 and min.pct = 0.25
   were selected as distinct markers for each cell subtype. Dimensionality
   reduction via tSNE was performed using the"RunTSNE"function in R. The
   scRNA-seq data for KIRC underwent re-analysis utilizing
   the"Seurat"package. Batch effects were mitigated using the
   FindIntegrationAnchors function. Non-linear dimensionality reduction
   was executed using the Uniform Manifold Approximation and Projection
   (UMAP) method. Subsequently, individual cells were clustered into
   distinct subgroups utilizing the FindNeighbors and FindClusters
   functions. The tSNE embedding was further refined using the RunTSNE
   function. Marker genes were discerned using the FindAllMarkers
   function, followed by Kyoto Encyclopedia of Genes and Genomes pathway
   enrichment analysis (KEGG) and Gene Ontology Enrichment Analysis (GO)
   via the"clusterProfiler"package, with notable enrichment observed in
   pathways governing the regulation of T cell activation.

Analysis of ligand–receptor interactions

   CellPhoneDB is a publicly available repository curated for receptors,
   ligands, and their interactions. Both ligands and receptors include
   subunit structures, accurately representing heteromeric complexes. The
   ligand–receptor database within CellPhoneDB integrates data from
   UniProt, Ensembl, PDB, IUPHAR, and others, encompassing 978 proteins.
   This resource facilitates comprehensive and systematic analysis of
   intercellular signaling molecules, enabling the study of communication
   networks between different cell types. We conducted a significance
   analysis of ligand–receptor relationships in the feature space of
   single-cell expression profiles using the statistical_analysis function
   of the CellphoneDB software package. Cluster labels for all cells were
   randomly permuted 1000 times, and the mean expression levels of
   receptors within clusters and the mean expression levels of ligands
   within interacting clusters were determined. This process generated a
   null distribution (also known as a Bernoulli distribution or two-point
   distribution) for each receptor-ligand pair in every pairwise
   comparison between two cell types. Finally, we selected several
   ligand–receptor pairs of interest to illustrate their relationships.

Selection of differentially expressed genes related to sialylation and
enrichment analysis

   We performed data processing and statistical analysis of expression
   files using R. The"limma"package was utilized for data normalization
   and identification of Differentially Expressed Genes (DEGs), where
   genes with a P-value < 0.05 and |log[2] FC|> 1 were considered
   significant DEGs. A volcano plot illustrating the DEGs was generated
   using the"ggplot2"package. Additionally, the log2-transformed mRNA
   expression data were arranged into a heatmap using the"pheatmap"package
   in R.

Consensus clustering algorithm, gene set variation analysis, and gene set
enrichment analysis

   From 363 KIRC samples, we obtained expression patterns of 14
   differentially sialylated genes along with clinical data. Subsequently,
   ConsensusClusterPlus was employed to cluster these differentially
   sialylated genes using the following parameters: reps = 50, pItem
   = 0.8, pFeature = 1, clusterAlg ="pam", distance ="spearman".
   Here,"pam"and"spearman"were used as the clustering algorithm and
   distance metric, respectively. The"GSVA"package in R was utilized to
   compute sialylation scores for each KIRC patient, serving as an
   indicator of sialylation. The"wilcox.test"function in R was then
   employed to compare differences in sialylation scores between different
   clusters. In addition to sialylation scores, enrichment scores for 50
   hallmark signaling pathways were computed using the"GSVA"package.
   Subsequently, potential differences in signaling pathways between
   different clusters were compared using a similar approach.

Cluster analysis of tumor immune microenvironment

   The"estimate"package in R was used to compute ImmuneScore,
   StromalScore, EstimateScore, and tumor purity for each KIRC sample.
   The"ggpubr"package in R was utilized to visualize these results.
   Furthermore, the TIMER2.0 database ([55]http://timer.cistrome.org/)
   provides comprehensive immune features of tumor-infiltrating cells in
   various tumor samples from the TCGA database, based on algorithms
   including TIMER, CIBERSORT, EPIC, and MCPCOUNTER. The"pheatmap"package
   in R was used to illustrate the infiltration of different immune cells
   in each KIRC sample. Subsequently, the"limma"package in R was employed
   to determine statistically significant changes in immune cell
   infiltration between subgroups C1 and C2. Immune cell subsets with
   p-values < 0.05 were separated and stored. The activation of Immune
   Checkpoint Genes (ICGs) played a crucial role in tumor immune evasion
   and malignant progression by inhibiting anti-tumor immune responses. It
   was widely acknowledged that ICGs play an irreplaceable role in
   regulating immune cell function and disease progression. Therefore, we
   further explored differences in the expression levels of ICGs between
   subgroups C1 and C2. Tumor Immune Dysfunction and Exclusion (TIDE) is a
   vital biomarker for immune therapy response.

Drug sensitivity analysis

   Using the largest pharmacogenomics database, the GDSC Cancer Drug
   Sensitivity Pharmacogenomics database
   ([56]https://www.cancerrxgene.org/), we utilized the R
   package"pRRophetic"to predict the chemotherapy sensitivity of each
   tumor sample. Regression methods were employed to obtain estimated
   IC[50] values for specific chemotherapy drugs, and tenfold
   cross-validation was performed using the GDSC training set to assess
   the accuracy of regression and prediction. Default parameters were
   chosen for all analyses, including"combat"for batch effect removal and
   averaging duplicate gene expressions.

Construction and validation of sialylation risk model

   In the survival package, single-factor Cox regression analysis was
   utilized to further identify prognostic-related genes with p < 0.05. To
   reduce the number of genes, we conducted Least Absolute Shrinkage and
   Selection Operator (LASSO) Cox regression analysis, followed by
   stepwise regression for multi-factor Cox regression analysis. Based on
   the results of the multi-factor Cox model, we constructed a risk
   feature with the formula: Risk Score = ∑βi*Expi. Here, i represents
   genes in the risk feature, expi denotes the expression of gene i, and
   βi represents the coefficient of gene i in the multi-factor Cox model.
   After zero-mean normalization, patients were divided into high-risk and
   low-risk groups. The timeROC package was used for Receiver Operating
   Characteristic (ROC) curve analysis to evaluate the predictive
   performance of the risk feature. ROC curves were plotted based on risk
   scores, and the Area Under the Curve (AUC) was calculated to assess the
   accuracy of the risk model. The maximum sum of sensitivity and
   specificity was determined as the optimal cutoff point for stratifying
   high-risk and low-risk populations. Kaplan–Meier analysis was conducted
   on high-risk and low-risk groups to demonstrate that the risk score
   could serve as an independent clinical prognostic indicator.

Construction of risk feature and nomogram

   To construct a clinically applicable nomogram model, we initially
   conducted univariate and multivariate Cox regression analyses on
   clinical pathology and risk features. Variables with p < 0.05 in the
   multivariate Cox model were used to construct a nomogram for predicting
   the prognosis of KIRC using the rms package. Calibration curves were
   generated to assess the predictive accuracy of the model. Decision
   Curve Analysis (DCA) was employed to evaluate the reliability of the
   model.

Gene mutation analysis and gene set enrichment analysis

   The"GSVA"package was utilized to compute potential differences in
   enrichment scores for HALLMARKER signaling pathways and KEGG pathways
   between high and low-risk groups. Additionally, analysis of Single
   Nucleotide Variation (SNV) mutations and Copy Number Variation (CNV)
   alterations in genes of the risk model was performed.

Response to immune checkpoint blockade

   We obtained transcriptomic and matched clinical data from the
   IMvigor210 cohort of KIRC patients treated with anti-PD-L1 therapy
   (atezolizumab)
   ([57]http://research-pub.gene.com/IMvigor210CoreBiologies).
   Additionally, the [58]GSE78220 cohort includes transcriptomic data from
   pre-treated melanoma patients receiving anti-PD-1 checkpoint inhibition
   therapy. Data were also downloaded to ascertain the potential value of
   risk feature scores in predicting responsiveness to immune checkpoint
   blockade (ICB).

Statistical analysis

   All statistical analyses were conducted using R software (version
   4.2.3) employing Cox univariate and multivariate regression analyses to
   determine independent prognostic factors for overall survival. Survival
   analysis was performed using Cox univariate regression analysis. ROC
   curve analysis was employed to measure the extent of prediction of the
   overall survival prognosis model outcome.In enrichment analysis, in
   order to avoid false positive results, multiple test correction is
   required, strictly following the standard of p-value < 0.05. Wilcoxon
   test was utilized to compare tumor immune infiltrating cells.
   Visualization was performed using the"ggplot2","pheatmap",
   and"forestplot"packages. A significance level of p < 0.05 was used to
   determine statistical significance.

Results

Quality control, normalization, and analysis of scRNA-seq data

   The single-cell data were filtered to ensure that each gene was
   expressed in at least three cells, and each cell expressed at least 200
   genes. The"PercentageFeatureSet"function was used to calculate the
   proportion of mitochondria and rRNA, ensuring gene expression levels
   ranged from 200 to 4000 in each cell, with at least 200 UMIs per cell.
   This process yielded 25,066 cells. Positive correlations were observed
   between UMI/mRNA counts and mitochondrial gene content. Subsequently,
   2000 variable genes were plotted on a scatter plot, and the optimal
   number of principal components (PCs) was observed to be 30 through
   ElbowPlot analysis. PCA dimensionality reduction analysis revealed
   different score values across dimensions, with the optimal PC number
   determined to be 30 through ElbowPlot analysis (Fig. [59]1A). Finally,
   25 subtypes were obtained via tSNE, revealing significant expression
   differences for numerous genes among these subtypes in Fig. [60]1B.
   Expression levels of the top 10 genes with the greatest expression
   differences among subtypes were displayed in (Fig. [61]1C). Cell types
   were annotated using known marker genes, with 25 clusters assigned to 9
   cell categories in (Fig. [62]1D): CD4 + memory cells, NK cells,
   Monocytes, B cells, CD8 + T cells, M2 macrophages, Regulatory T cells,
   Endothelial cells, and Cancer stem cells. Marker genes for each cell
   subtype were identified using the FindAllMarkers function (logFC = 1,
   minpct = 0.25, adj.p < 0.05), resulting in 1930 subtype-specific marker
   genes. Enrichment analysis of these marker genes revealed significant
   enrichment in GO terms such as B cell activation and immune
   response-regulating cell surface receptor in (Fig. [63]1E). Analysis of
   ligand–receptor relationships in the single-cell expression profile was
   conducted using the"CellphoneDB"package. Selected ligand–receptor pairs
   were highlighted, with high interaction scores observed for
   interactions between CD4 + memory cells and MIF- (CD74 + CXCR4).
   Finally, the number of ligand–receptor gene pairs corresponding to each
   cell group was tallied, revealing CD4 + memory cells to have the
   highest potential interaction with other cell subtypes in Fig. [64]1F.
   Additionally, Fig. [65]1G provides further evidence supporting this
   conclusion, illustrating the extensive network of interactions centered
   around CD4 + memory cells.

Fig. 1.

   [66]Fig. 1
   [67]Open in a new tab

   Different subtypes in single-cell sequencing. A Results of the
   ElbowPlot. B t-SNE plot of the 25 identified main cell types in KIRS
   lesions. C Top 10 genes with the greatest differential expression in
   each cell subtype. D Enrichment analysis of marker genes. E Interaction
   between CD4 + memory cells and MIF- (CD74 + CXCR4). F Interaction
   network of ligand–receptor genes

Identification of differentially expressed genes in sialylation

   From TCGA data, 109 sialylation-related genes were extracted from KIRC
   samples for differential analysis with criteria of P < 0.05 and log[2]
   |FC|> 1. Among these, 14 genes were identified as differentially
   expressed sialylation genes. Visualization of DEGs was performed using
   volcano plots and heatmaps (Fig. [68]2A, B). GO and KEGG analyses were
   conducted for the identified differentially expressed sialylation
   genes. GO analysis, pathways such as carbohydrate binding were
   primarily identified in Fig. [69]2C. In KEGG enrichment analysis,
   enriched pathways mainly included Mucin type O-glycan biosynthesis in
   Fig. [70]2D. The chromosomal locations of differentially expressed
   sialylation genes in humans are depicted in the chromosomal map. All
   differentially expressed sialylation genes between the two groups
   showed significant differences (P < 0.05) (Fig. [71]2E). Further
   analysis of differentially expressed sialylation genes in ARDS revealed
   multiple correlations between genes, represented by the r-value in
   Fig. [72]2F. Additionally, Fig. [73]2G showed copy number variations
   (CNVs) of prognostic differentially expressed sialylation genes were
   observed in KIRC samples.

Fig. 2.

   [74]Fig. 2
   [75]Open in a new tab

   Identification of differentially expressed genes (DEGs) in SRGs. A
   Volcano plot of DEGs. B Heatmap of DEGs. C GO enrichment of DEGs. D
   KEGG enrichment of DEGs. E Chromosomal view of DEGs. F Correlation
   analysis of DEGs. G CNV analysis of DEGs

Identification of sialylation-based molecular subtypes in KIRC using
consensus clustering analysis

   To uncover sialylation-based molecular subtypes in KIRC, consensus
   clustering analysis was performed using KIRC samples.By increasing the
   clustering variable (k) from 2 to 9 (Supplementary Table S1), we found
   that when k = 2, the intragroup correlations were the highest and the
   intergroup correlations were low.From k = 2 to k = 9, there were
   significant differences in the relative changes of the area under the
   CDF curve. Considering the cumulative distribution function (CDF) curve
   and delta area, k = 2 was selected as the number of unique and
   non-overlapping subtypes in Fig. [76]3A, B. To uncover
   sialylation-based molecular subtypes in KIRC, consensus clustering
   analysis was performed using KIRC samples. Considering the cumulative
   distribution function (CDF) curve and delta area, k = 2 was selected as
   the number of unique and non-overlapping subtypes in Fig. [77]3A, B.
   From k = 2 to k = 9, there were significant differences in the relative
   changes of the area under the CDF curve..Thus, two sialylation-based
   molecular subtypes of KIRC were constructed in Fig. [78]3C, with C1
   subtype comprising 270 cases and C2 subtype comprising 258 cases.
   Patients in C2 exhibited higher sialylation scores compared to C1
   subtype, and distinct differences in the expression levels of
   differentially expressed sialylation-related genes were observed
   between C1 and C2 subtypes, with most genes being overexpressed in the
   C2 subtype (Fig. [79]3D). Findings from pathway-based single-sample
   gene set enrichment analysis (ssGSEA) indicated that the C2 subtype was
   primarily enriched in pathways such as ADIPOGENESIS and
   FATTY_ACID_METABOLISM, suggesting a close association between
   sialylation and the aforementioned typical tumor-related pathways in
   Fig. [80]3E.

Fig. 3.

   [81]Fig. 3
   [82]Open in a new tab

   Identification of two gene clusters. A Two gene clusters according to
   the consensus clustering matrix (k = 2). B Relative change in area
   under the CDF curve when k = 2. C Violin plot of sialic acidification
   score. D Heatmap of differential expression of 12 SRGs in different
   subtypes, with red indicating upregulation and blue indicating
   downregulation. E Heatmap of ssGSEA

Drug sensitivity analysis based on subtypes

   Considering the widespread use of molecular targeted drugs in treating
   KIRC, the"pRRophetic"package in R was employed to systematically
   evaluate chemotherapy responses based on different subtypes. Our data
   indicated that drugs such as Dasatinib and Lapatinib were expected to
   benefit the C1 subtype. Conversely, the C2 subtype appeared to benefit
   from drugs like Epothilone.B and Vinorelbine (Fig. [83]4A).

Fig. 4.

   [84]Fig. 4
   [85]Open in a new tab

   Drug sensitivity analysis plot

Clustering analysis of tumor immune microenvironment

   Transcriptomic data were utilized, and the"ESTIMATE"package was
   employed to assess differences in immune features between C1 and C2
   subtypes (based on StromalScore, ImmuneScore, ESTIMATEScore, and
   Tumorpurity). Figure [86]5A demonstrated that the C1 subtype exhibited
   enhanced levels of ImmuneScore, StromalScore, and ESTIMATEScore, while
   tumor purity levels decreased. These findings suggested a positive
   correlation between prognosis in KIRC and immune and stromal
   components. To further explore the abundance of immune cell
   infiltration in the tumor microenvironment, various algorithms were
   applied to estimate the percentage of immune cell infiltration in C1
   and C2 subtypes. As shown in Fig. [87]5B, based on TIMER,
   CIBERSORT-ABS, EPIC, and MCPCOUNTER algorithms, the C1 subtype shows an
   increase in proportions of immune cells such as Endothelial and
   Neutrophils, with most immune cells increasing in proportion in subtype
   C2. Immune checkpoint genes (ICGs) are determinants of immune function
   in immune cells. Similarly, our study results indicate enhanced
   expression of ICGs in the C2 subtype (Fig. [88]5C).

Fig. 5.

   [89]Fig. 5
   [90]Open in a new tab

   Immune infiltration analysis. A Violin plot of immune scores in
   different subtypes. B The distribution traits of immunocyte
   infiltration in SRG-based clusters. C Boxplot of immune checkpoint
   genes (ICGs). ns not significant, *P < 0.05, **P < 0.01, ***P < 0.001,
   ****P < 0.0001

Construction of sialylation risk model

   The prognostic value of 14 differential sialylation genes was assessed
   through univariate Cox regression analysis, with 6 genes showing
   prognostic significance (Fig. [91]6A, P < 0.05). Lasso-Cox regression
   analysis was performed on the prognostic genes, with λ = 0.0067
   (Fig. [92]6B). Eventually, 5 genes were obtained and incorporated into
   the risk features after conducting multivariate Cox regression analysis
   using a stepwise regression method (Fig. [93]6C). The final
   5-sialylation-associated gene signature formula is as follows:
   Riskscore ="−0.244*ST8SIA4+0.39*SIGLEC10 −0.559*GCNT4- 0.206*SELP
   −0.135*SLC17 A3". We calculated the risk score for each sample and
   categorized them into high-risk and low-risk groups after z-score
   normalization. The AUC values for the 1, 3, and 5-year survival models
   in the TCGA cohort ranged from 0.68, 0.69, to 0.70, respectively.
   Kaplan–Meier survival analysis showed that compared to low-risk
   patients in the TCGA cohort, high-risk patients had significantly worse
   survival outcomes (Fig. [94]6D, E). Similar results were observed in
   the ICGC validation cohort (Fig. [95]6F, G).

Fig. 6.

   [96]Fig. 6
   [97]Open in a new tab

   Construction of SRGs clinical model. A Univariate Cox regression. B
   Cross-validation in LASSO regression. C Multivariate Cox regression. D
   Kaplan–Meier survival analysis of high and low-risk scores in TCGA. E
   Survival model for 1, 3, and 5 years in TCGA. F Kaplan–Meier survival
   analysis of high and low-risk scores in ICGC. G Survival model for 1,
   3, and 5 years in ICGC

Identification of independent risk factors and construction of nomogram

   To enhance the predictive performance of the risk features, we
   integrated clinical-pathological characteristics and risk scores
   through univariate and multivariate Cox regression analyses. Univariate
   analysis indicated that Age, risk features, Stage, Grade, and TNM were
   significant independent prognostic factors for the risk model;
   multivariate analysis revealed that the risk features and M were
   important independent prognostic factors for the risk model
   (Fig. [98]7A, B). Consequently, a nomogram combining Stage and risk
   score was constructed, and calibration plots demonstrated that the
   nomogram could effectively predict actual survival outcomes
   (Fig. [99]7C–E).

Fig. 7.

   [100]Fig. 7
   [101]Open in a new tab

   Construction of forest plots. A Forest plot of univariate Cox analysis.
   B Forest plot of multivariate Cox analysis. C Nomogram based on risk
   score and clinical characteristics. D Calibration plot for internal
   validation of the nomogram. E DCA curve

Analysis of the risk model on the tumor immune microenvironment

   Our data indicate significant positive correlations between ST8SIA4,
   SIGLEC10, SELP, SLC17 A3, and Stromal Score, Immune Score, and Estimate
   Score, while GCNT4 shows significant negative correlations with
   Stromal, Immune, and Estimate Scores (Fig. [102]8A). Subsequently,
   based on the median expression of each gene, correlation analysis of
   hub genes was conducted to investigate their relationships in KIRC
   (Fig. [103]8B). We simultaneously investigated the expression
   differences of immune scores of 5 SRGs between high and low-risk groups
   (Fig. [104]8C–G).

Fig. 8.

   [105]Fig. 8
   [106]Open in a new tab

   Impact of SRGs model on the immune microenvironment. A Heatmap of
   StromalScore, ImmuneScore, and ESTIMATEScore in risk model. B
   Correlation analysis of hub genes. (C-G) Boxplot of immune scores of
   hub genes in high and low-risk groups. ns not significant, *P < 0.05,
   **P < 0.01, ***P < 0.001, ****P < 0.0001

Pathway analysis of hub genes and mutation analysis

   We also conducted an analysis of potential pathways associated with
   each risk gene, a total of 5 pathways were significantly correlated
   with these 5 genes (Fig. [107]9A). Next, we examined the SNV mutations
   of the 5 genes in the risk signal. The study revealed that ST8SIA4,
   SIGLEC10, GCNT4, SELP, and SLC17 A3 had SNV mutations in more samples
   (Fig. [108]9B). We analyzed the co-occurrence probability of these key
   genes and the top 10 genes with the most mutations. Significant
   co-occurrence probabilities of mutations were found among these 5
   genes, with only a very small number of samples showing CNV
   gains/losses among these 5 genes (Fig. [109]9C). To further elucidate
   the relationship between risk genes and KIRC, we analyzed the
   correlation between these genes and several molecular features of KIRC
   (Fig. [110]9D). The results showed that ST8SIA4, SIGLEC10, GCNT4, SELP,
   and SLC17 A3 were significantly negatively correlated with non-ploidy
   score, homologous recombination deficiency, partial alteration,
   fragment count, and non-silent mutation rate. Additionally, we also
   analyzed potential pathways associated with each risk gene
   (Fig. [111]9E). As shown in Fig. [112]9F, a total of 12 pathways were
   significantly associated with these 5 genes, including
   HISTIDINE_METABOLISM, AUTOIMMUNE_THYROID_DISEASE, and
   GRAFT_VERSUS_HOST_DISEASE.

Fig. 9.

   [113]Fig. 9
   [114]Open in a new tab

   Mutation analysis of hub genes. A Waterfall plot of mutations. B SNV
   mutations. C Gain/loss of CNV. D Correlation analysis between hub genes
   and KIRC molecular features. E Correlation heatmap of 5 key genes with
   homologous recombination defects, non-diploid scores, fragment counts,
   score changes, and non-silent mutation rates. F Pathway heatmap of hub
   genes

Significant value of risk feature scores in predicting immune checkpoint
blockade response

   Through the analysis of the IMvigor210 and [115]GSE78220 datasets
   (Fig. [116]10A–F), we found that risk feature scores had significant
   value in predicting the response to ICB. In the IMvigor210 dataset, the
   survival rate of the high-risk group was significantly lower than that
   of the low-risk group (p = 0.00028) (Fig. [117]10A). Although there was
   no significant difference in risk scores between the CR/PR (complete
   response/partial response) and PD/SD (progressive disease/stable
   disease) groups, the proportion of CR/PR was higher in the low-risk
   group (27% vs. 18%) (Fig. [118]10B, C). In the [119]GSE78220 dataset,
   the survival rate of the high-risk group was also significantly lower
   than that of the low-risk group (p = 0.015). Similarly, although there
   was no significant difference in risk scores between the PD
   (progressive disease) and PR/CR (partial response/complete response)
   groups, the proportion of PR/CR was higher in the low-risk group (33%
   vs. 67%) (Fig. [120]10D–F). These results indicated that patients in
   the low-risk group responded better to ICB therapy, and risk feature
   scores could serve as important biomarkers for predicting the efficacy
   of ICB.

Fig. 10.

   [121]Fig. 10
   [122]Open in a new tab

   Significant values in predicting immune checkpoint blockade response. A
   Survival curves of high and low-risk groups in IMvigor210. B Scores of
   different treatment groups. C Proportions of different treatment
   groups. D Survival curves of high and low-risk groups in [123]GSE78220.
   E Scores of different treatment groups. F Proportions of different
   treatment groups

Discussion

   The comprehensive analysis presented in this study delves into the
   intricate relationship between SRGs and KIRC, offering insights into
   the prognostic implications and therapeutic avenues associated with
   these molecular mechanisms. Despite improvements in KIRC detection
   methods, prognosis prediction, and drug treatment selection, patients
   still fail to achieve favorable outcomes due to high recurrence or
   distant metastasis rates and drug resistance [[124]17–[125]19]. Early
   diagnosis, personalized treatment, and regular follow-up are key to
   achieving and improving patient prognosis. Therefore, the urgent need
   in KIRC is to discover new biomarkers to improve survival rates and
   assess drug responses. By unraveling the molecular intricacies of KIRC
   through the lens of sialylation, this research underscores the
   multifaceted nature of tumor biology and its implications for precision
   medicine.

   SiaTs, as a crucial class of enzyme molecules, play a pivotal
   regulatory role in tumor development by modulating various events such
   as cell proliferation, metastasis, angiogenesis, and chemotherapy
   resistance [[126]20–[127]23]. They are instrumental in regulating the
   growth, metastasis, and therapeutic resistance of tumor cells. In our
   study, we screened 5 SRGs (ST8SIA4, SIGLEC10, GCNT4, SELP, and SLC17
   A3) and developed prognostic features associated with these SRGs to
   classify patients into high-risk and low-risk groups. Survival analysis
   indicated that patients in the low-risk group exhibited better
   prognoses and longer survival times. Furthermore, we evaluated the
   prognostic value of these features in predicting 1-year, 3-year, and
   5-year survival rates using time-dependent ROC curves, demonstrating
   their robust predictive capability, further supported by constructed
   calibration plots. These findings underscore the effectiveness of the
   newly established SRGs-related prognostic features in assessing
   prognosis, thereby serving as complementary tools for outcome
   evaluation in KIRC patients.

   ST8SIA4 transfers sialic acid (located at the end positions of N- and
   O-linked polysaccharides) to the end positions of sugars [[128]32,
   [129]33]. A previous study found that lncRNA HOTAIR/miR-124/ST8SIA4
   enhances the malignancy of renal cell carcinoma [[130]34]. The IFN-γ
   response related features of seven genes containing ST8SIA4 can
   effectively predict the prognosis of ccRCC [[131]35]. Siglecs is a
   receptor family that binds to sialic acid containing polysaccharides.
   SIGLEC10 plays an important role in cancer-related processes and can
   inhibit the proliferation and function of tumor infiltrating CD8 T
   cells in vitro through the Akt/P38/Erk signaling pathway [[132]34].
   GCNT4 may play a role as an oncogene in a variety of tumors, such as
   pancreatic cancer, breast cancer and lung cancer; However, its
   potential role in OS has never been reported before [[133]36]. SELP
   plays an important role in cancer-related thrombosis, promoting cancer
   metastasis through platelet cancer cell interactions and mediating
   cancer immunity [[134]37]. We further found that the SRG gene is
   significantly associated with 186 pathways, while protective genes and
   risk genes clearly have different pathway characteristics. Protective
   genes are significantly positively correlated with allogeneic
   rejection, complement and coagulation, and cell apoptosis, while risk
   genes are significantly correlated with fatty acid metabolism and
   adipogenesis. The changes in fatty acid metabolism play an important
   role in KIRC, and the prognostic value of fatty acid metabolism related
   genes in KIRC has also been revealed [[135]41]. Apoptosis is a crucial
   cellular process for controlling tumor proliferation, spread, and
   future outcomes [[136]42, [137]43], and changes in metabolic pathways
   exist in many diseases, including tumors. Metabolic reprogramming is
   crucial for maintaining abnormal proliferation of tumor cells
   [[138]44]. Therefore, these data provide direction for us to further
   investigate the regulation of these risk genes in KIRC.

   SiaTs were capable of promoting cancer progression by influencing
   immune cell infiltration, with immune cell infiltration playing a
   crucial role in cancer prognosis [[139]24–[140]26]. The infiltration of
   immune cells was essential for regulating the tumor microenvironment,
   controlling tumor growth, and managing metastasis [[141]27–[142]29].
   Immune cells, including T lymphocytes and natural killer cells, could
   recognize and eliminate abnormal cells, thus suppressing tumor growth
   and spread [[143]30]. Significantly, our research elucidates the
   intricate interplay between sialylation and the tumor immune
   microenvironment in KIRC. We discern distinct immune infiltration
   patterns across specific KIRC subtypes, implying disparate responses to
   immunotherapeutic modalities. Given the immunogenicity of KIRC, the use
   of PD-1 and PD-L1 as frontline treatments for metastatic disease was
   increasingly dominant based on ICB [[144]31]. The differential
   expression of immune related genes may also be another factor
   contributing to the good prognosis of C2 subtypes. Compared with C1
   subtype, the immune suppressive factors PDCD1. The decreased expression
   of CTLA and LAG3, as well as the decrease in immunostimulatory factors
   [[145]38, [146]39], suggest that the C2 subtype may have a better
   prognosis. The differential distribution of gene mutations, immune cell
   and immune regulatory gene expression in C2 subtype compared to other
   subtypes may be the mechanism leading to a good prognosis for C2
   subtype. In summary, these findings highlight the methods of patient
   stratification and targeted therapy. Our research also found that risk
   scores based on SRGs had significant value in predicting responsiveness
   to ICB. These revelations hold profound implications for devising
   personalized treatment regimens encompassing immunotherapy and targeted
   approaches tailored to exploit the susceptibilities of distinct KIRC
   subtypes and circumvent therapeutic resistance.Lapatinib: When EGFR and
   HER2 hormone therapy are equally effective, lapatinib has better
   tolerability [[147]40].

   While our study advances our understanding of sialylation's role in
   KIRC, it is not devoid of limitations. The retrospective nature of our
   analysis precludes causal inference, necessitating prospective studies
   for validation. Firstly, our prognostic model was constructed and
   validated using retrospective data from public databases. More
   forward-looking real-world data is needed to validate its clinical
   efficacy. Secondly, the inherent weakness of constructing prognostic
   models solely based on a single marker is inevitable, as many prominent
   prognostic genes in KIRC may have been excluded.The prognostic model
   was developed and validated using the TCGA dataset, while ICGC was
   applied for external validation analysis. TCGA data removes samples
   without survival data and outcome status and performs corrective
   processing, while ICGC directly conducts external validation analysis.
   The differences in these data may affect the prognosis of SRG, and
   future basic experiments are needed to validate our findings and
   elucidate the functional role of SRG in KIRC and development.
   Additionally, elucidating the functional significance of individual
   SRGs and their intricate interplay in KIRC pathogenesis mandates
   further exploration through in vitro and in vivo experimentation.
   Furthermore, the clinical utility of our prognostic model warrants
   validation in independent patient cohorts to assess its
   generalizability and robustness across diverse populations.

   In conclusion, our study underscores the paramount importance of
   sialylation in shaping the landscape of KIRC, providing a comprehensive
   framework for elucidating the molecular mechanisms underpinning tumor
   progression and therapeutic response. Through the integration of
   multi-omics data and advanced computational methodologies, we have
   identified novel prognostic markers, delineated molecular subtypes, and
   elucidated the tumor immune microenvironment of KIRC. These findings
   offer pivotal insights for the development of precision medicine
   strategies aimed at enhancing patient outcomes in KIRC and beyond.

Conclusions

   In conclusion, our study sheds light on the intricate relationship
   between SRGs and KIRC, unveiling distinct molecular subtypes and their
   implications for prognosis and treatment. By elucidating the role of
   sialylation dysregulation in KIRC progression and its interaction with
   the tumor immune microenvironment, our findings offer insights into
   potential therapeutic targets and strategies for personalized medicine
   in KIRC management. Moving forward, collaborative efforts are crucial
   to validate and translate these findings into clinical practice,
   ultimately improving patient outcomes and advancing precision medicine
   in KIRC.

Supplementary Information

   [148]Supplementary Material 1^ (6.5MB, zip)

Acknowledgements