Abstract Regulatory T cells (Tregs) have been found to be related to immune therapeutic resistance in kidney cancer. However, the potential Tregs-related genes still need to be explored. Our study found that patients with high Tregs activity show poor prognosis. Through co-expression and differential expression analysis, we screened several Tregs-related genes (KTRGs) in kidney renal clear cell carcinoma. We further conducted the univariate Cox regression analysis and determined the prognosis-related KTRGs. Through the machine learning algorithm—Boruta, the potentially important KTRGs were screened further and submitted to construct a risk model. The risk model could predict the prognosis of RCC patients well, high risk patients show a poorer outcomes than low risk patients. Multivariate Cox regression analysis reveals that risk score is an independent prognostic factor. Then, the nomogram model based on KTRG risk score and other clinical variables was further established, which shows a high predicted accuracy and clinical benefit based on model validation methods. In addition, we found EMT, JAK/STAT3, and immune-related pathways highly enriched in high risk groups, while metabolism-related pathways show a low enrichment. Through analyzing two other external immune therapeutic datasets, we found that the risk score could predict the patient's immune therapeutic response. High-risk groups represent a worse therapeutic response than low-risk groups. In summary, we identified several Tregs-related genes and constructed a risk model to predict prognosis and immune therapeutic response. We hope these organized data can provide a theoretical basis for exploring potential Tregs' targets to synergize the immune therapy for RCC patients. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-025-01787-x. Keywords: Tregs, RCC, Machine learning, Clinical model, Immune therapy Introduction Renal cell carcinoma (RCC), as one of the most commonly diagnosed urological cancer, lead to over 179000 deaths every year in the world [[32]1]. RCC comprises several subtypes with unique characteristics. Clear-cell RCC is the most common subtype, accounting for nearly 70% of the cases [[33]2, [34]3]. RCC is known to be a tumour with dysfunction of the immune system, in which the potential mechanisms are supposed to be the block of anti-tumour immune response via immune suppressive factors, such as checkpoint molecules [[35]4]. Targeting immune checkpoints results in clinical responses in some patients with RCC, but a large proportion of patients do not benefit from this treatment [[36]4, [37]5]. Therefore, the identification of potential biomarks for immune therapeutic response is essetial to improve the clinical efficacy of these therapies. It has been reported that RCC tumours are severely infiltrated by T cells. However, the T cells could not mount effective anti-tumour efficacy [[38]4], which might probably be due to the suppressive function of regulatory T cells (Tregs) and myeloid cells. Regulatory T cells (Tregs) are an indispensable subset of T lymphocytes, crucial for maintaining immune homeostasis and preventing autoimmunity by suppressing excessive immune responses [[39]6]. In recent years, the common immunological basis for Treg-mediated suppression of autoimmunity and tumour immunity has been extensively explored. The research suggests that the mounting infiltration levels of Tregs were related to worse clinical outcomes in multiple tumours [[40]7]. The depletion of Tregs could elicit effective anti-tumour immunity by eliminating immunological unresponsiveness to syngeneic tumours, although it may also result in autoimmunity, especially if Treg cells are depleted systemically [[41]7, [42]8]. In the immune microenvironment of RCC, Tregs play a nonnegligible role in RCC progression. For instance, CD4 + FOXP3 + Tregs are involved with an increased recurrence of RCC [[43]9]. The PD1 expression on Tregs is correlated with resistance to anti-PD1 in metastatic clear cell RCC (mccRCC) [[44]10]. In addition, high infiltration of Tregs is also associated with resistance to antiangiogenic therapy in mRCC [[45]11]. These results suggest that Tregs-targeted therapy might synergize with anti-PD1 therapy for patients with RCC. So far, research has explored the potential biomarkers of Tregs in RCC [[46]12, [47]13]. However, few studies excavated the potential Tregs-related genes, and there is a shortage of clinical models based on Tregs to predict the immune therapeutic efficacy. In our study, we screened several potentially important Tregs-related genes in RCC using machine learning algorithms and constructed the risk model called the KTRGs risk model. Our model could predict prognosis and turn out to be an independent prognostic factor for patients with RCC. Mechanism analyses reveal that the KTRG risk model might be mainly related to EMT, JAK/STAT3, immune, and metabolism-related pathways or biological processes. Furthermore, KTRGs risk model could predict the immune therapeutic response well. The genes in the model might be the potential targets of Tregs, which could be used to synergize immune therapy. All in all, our research provided a theoretical foundation for treating Tregs against patients with RCC. Methods Data acquisition We selected datasets from patients diagnosed with RCC for our study, including TCGA-KIRC, TCGA-KIRP, and E-MTAB-1980 datasets. Patients were eligible for this study if their RCC samples included mRNA expression data and clinical data (at least including age, gender, histological grade, pathological stage, overall survival time, and overall survival status). The mRNA experssion data of KIRC and KIRP datasets (version: 07–19-2019) were obtained from the UCSC Xena website based on TCGA database ([48]http://xena.ucsc.edu/). KIRC contains contains 537 tumours and 72 normal samples, KIRP contains 291 tumours and 88 normal tissue in KIRP. The mRNA expression of the E-MTAB-1980 dataset (Release Date: 16 October 2013) was collected from the ArrayExpress database ([49]https://www.ebi.ac.uk/arrayexpress), which includes 100 tumour samples. The immune therapeutic dataset—Alexandra cohort, was obtained from previous research [[50]14], which comprises 25 tumour samples; [51]GSE78820 was obtained from the GEO database ([52]https://www.ncbi.nlm.nih.gov/gds), which contains 25 tumour samples. KIRC, KIRP, Alexandra, and [53]GSE78820 mRNA expression format was transformed into TPM units. The E-MTAB-1980 dataset contains the normalized mRNA expression profile that was to be downloaded. The KIRC dataset was primarily used to construct the risk score and explore molecular mechanisms. The KIRP and E-MTAB-1980 datasets were used to validate the practicability of the risk score. The immune therapeutic datasets, Alexandra and GEO cohorts, were used to explore the predictive accuracy of the risk score on immune therapeutic response. Pearson correlation analysis For two successive variables, the Pearson method was used to estimate the correlation between them by calculating the Pearson correlation coefficient (PCC) based on “rcorr” R function from “Hmisc” R package. The p-value was adjusted by false discovery rate (FDR) based on “p.adjust” R function. The absolute PCC value > 0.4 and adjusted p-value < 0.05 were considered to be a threshold to prove that the two variables have a positive or negative correlation. Differential genes screening The tumour and normal mRNA expression profiles were marched, and differential expression analysis was conducted based on the “limma” R-package (version: 3.52.1). The Benjamini–Hochberg method was used to adjust the P-value. The absoluted logFC (log2 fold change) > 2, and adjust p-value < 0.05 were regarded as a threshold to estimated whether a gene differently expressed between tumor and normal or not. Survival analysis Gene expression data per sample was matched with their own clinical overall survival (OS) data based on their own unique patient ID, and then the univariate Cox regression analysis was further performed using the “coxph” R function from “survival” R package. The univariate Cox regression analysis is a proportional hazards model, which produce the hazard ratio (HR) to estimate whether the expression of a gene is a risky or protective factor. Generally speaking, the variables with HR > 1 and p-value < 0.05 are regarded as risky factors, while the variables with HR < 1 and p-value < 0.05 are refered to as protective factors. Then, the sample was separated into high and low-expression groups based on single gene expression value, the ideal cut-off value per gene was calculated by “surv_cutpoint” R function from the “survminer” R package; the Kaplain-Meier survival curves were depicted to demonstrate the survival difference between the two groups, and a log-rank test was performed to evaluate the statistical significance based on the “survival” R-package. Gene set variateion analysis Cancer-related hallmark gene sets were acquired from the MSigDB database ([54]https://www.gsea-msigdb.org/gsea/downloads.jsp), and the progery pathway signature was obtained from previous research [[55]15]. The gene sets and whole-genome expression profile of TCGA-KIRC were presented to “GSVA” R package, and pathway activities per sample were finally calculated. Activity scores of pathways and gene expression profiles were subsequently merged by matching with the same sample names. Then then Pearson correlation coefficients (PCC) between gene expressions and pathway activities were calculated using the “rcorr” function from the “Hmisc” R-package. The P-value was adjusted by false discovery rate and transformed to -Log10(adjusted p-value) format to represent significance. The more signficant the correlation between two variables, the larger the size of bubble in heatmap. Identification of important genes and KTRGs risk score construction The KTRGs expression profile merged with clinical overall survival data was submitted to the “Boruta” algorithm from “Boruta” R pakage. Then, three types of KTRGs were determined, including confirmed (i.e., important) genes, tentative genes (i.e., probably important), and rejected (i.e., unimportant) genes. The KIRC(TCGA) data was divided into training and testing cohorts by a ratio of 7:3. The confirmed and tentative genes were further utilized to construct a risk model through lasso-cox regression analysis in the training cohort based on “glmnet” R package. Five genes were further determined, and the risk score was calculated using the lasso coefficient per gene times mRNA expression: [MATH: Risk score=Coefgene1Expressiongene1 +...+Coefgene5Expressiongene5 :MATH] Then, the risk score was further utilized to predict the overall survival of patients. Based on the optimal cut-off value of risk score, the high and low-risk groups were further determined in the training cohort, and then the Kaplain-Meier survival curve was portrayed to illustrate the survival difference between these two groups. ROC curve represents the accuracy of riskscore on predicting 3-/5-/7-year prognosis of the patients. Then, the same analyses were conducted in the internal testing cohort and the other two external validation cohorts—E-MTAB-1980 and KIRP(TCGA). Risk model evaluation We downloaded clinical data for KIRC, encompassing additional pharmaceutical therapy, radiation therapy, surgery (locoregional and metastatic procedures), age, gender, neoadjuvant treatment history, dimensions (intermediate and longest), laterality, grade, and stage, along with therapeutic outcomes. We matched these variables with risk scores and performed univariate Cox regression and data missing rate analyses. Variables with high missing rates or impractical results from the regression were excluded from further analysis. The risk score merged other clinical variables, including age, gender, laterality, grade, and stage, and then the multivariate Cox regression was further carried out. The factor with p-value < 0.05 during multivariate Cox regression was regarded as an independent prognostic factor. Then the data were separated into different clinical subgroups (i.e., age, age ≤ 60 & age > 60; gender, female & male; grade, grade I-II & grade III-IV; stage, stage I-II & stage III-IV). The risk score of each subgroup was computed and the high and low risk groups were established in the light of optimal cut-off value. Kaplain Meier survival curve was portrayed to demostrate the difference between high and low risk groups across sub-groups. Nomogram model established and model evaluation The variables risk score, age, grade, and stage were conducted using multivariable Cox regression analysis, and then stepwise regression analysis was conducted utilizing the “step” R function. The risk score, age, grade, and stage were finally screened to construct a nomogram model. The calibration curve, time-dependent ROC curve, and decision curve analysis (DCA) were depicted to evaluate model accuracy and clinical benefit based on “rms”, “survivalROC”, “ggDCA” R package, respectively. Kaplain-Meier survival curves were portrayed to demonstrate the survival difference between high and low-risk scores constructed by the nomogram model. Mechanism analyses The GSEA software (Version: GSEA-4.0.3) was downloaded from the GSEA website ([56]https://www.gsea-msigdb.org/gsea/downloads.jsp). The mRNA expression profile of the KIRC(TCGA) dataset to gene set enrichment analysis (GSEA) software. The cancer-related hallmark pathway gene sets and gene ontology—biological progression gene sets were downloaded from a public website ([57]https://www.gsea-msigdb.org/gsea/index.jsp). The metabolism-related gene sets were downloaded from previous research [[58]16]. Then, the tumor samples with whole genome experssion profile was ranked by risk groups, high risk group was arranged in front of the low risk group. The expression profile and gene sets were submitted to GSEA software to estimate potentially differentially enriched pathways. For the gene set variation analysis (GSVA), we first compute the hallmark pathway activity score, Then the pathway activity score and risk score were further used to calculate the Pearson correlation based on the Pearson method. Tumor immune infiltration analyses and immune therapeutic estimation The gene sets representing twenty-four types of immune cell were acquired from previous study [[59]17]. Then we applied the single-sample gene set enrichment (ssGSEA) method based on “GSVA” R-package to calculated the immune activity score of cells per sample. The proportion of 22 types of immune cell per samples was evaluated by Cell Type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT). The immune cell activity score and immune cell proportions were further used to explore the correlation with risk score. For the immune therapeutic estimation, we first using the tumour immune dysfunction and exclusion (TIDE) analysis, which is a public website merged with large scale omics data and biomarkers based on the immune checkpoint blockade (ICB) trials [[60]18] ([61]http://tide.dfci.harvard.edu). Based on the TIDE score, the patients was referred to as responder and non-responder, which were further used to analyze the relation to risk score. Statistic analysis We utilized the unpaired t-test to estimate the difference between the two continuous variables using “t.test” R function. The chi-square test was used for contingency table data to examine the distribution distinction utilizing “chisq.test” R function. The Kaplain-Meier curve was depicted to portray the survival difference between the two groups, and the log-rank test was applied to assess the significance using “coxph” R function from “survival” R package. We estimated the correlation between the two continuous variables by calculating the Pearson correlation coefficient (PCC) using “rcorr” R function from “Hmisc” R package. For all analyses, a p-value less than 0.05 was regarded as significant. *P-value < 0.05; **P-value < 0.01; ***P-value < 0.001; ****P-value < 0.0001. Results Identification of KIRC Tregs-related genes (KTRGs) Through the ssGSEA analysis, we figured out the Tregs activity score of KIRC. Based on the optimal cut-off value, the KIRC samples were separated into high- and low-Treg activity groups. We observed a poor prognosis that occurs in high Tregs activity groups (Fig. [62]1A), which lures us to further explore the potential reason and mechanism of Tregs on KIRC prognosis. The first thing we did was to determine the Tregs-related genes. To do this, we calculated the Pearson correlation coefficients (PCC) between each gene expression and Tregs score based on the whole genome expression profile of KIRC(TCGA). Those with absolute PCC > 0.4 and P-value < 0.05 were identified as Treg-related genes (Fig. [63]1B). Then we carried out differential expression analysis by comparing the tumour with the normal expression profile of KIRC with a filtering threshold based on absolute log2(Fold Change) > 2 and adjusted P-value < 0.05, differential expression genes (DEGs) were finally screened (Fig. [64]1B). 76 KIRC Tregs-related genes (KTRGs) were eventually determined by overlapping Treg-related genes and DEGs. The heatmap revealed that these genes' expressions could rightly discriminate between tumour and normal samples (Fig. [65]1C). When the KTRGs expression profile was conducted dimensionality reduction by t-distributed Stochastic Neighbor Embedding (t-SNE), we also observed tumour and normal samples harboured in different areas in a two-dimension axis, indicating that the expression of these genes could distinguish the tumour and the normal tissues well (Fig. [66]1D). We then presented the mRNA expression profile of KTRGs to clinical survival data, and a univariate Cox regression analysis was subsequently performed. 22 genes of 76 KTRGs were estimated to be associated with the overall survival of KIRC. Among these, SPI1, TYMP, LILRB1, IL20RB, ITGAX, C1QL1, FCER1G, SLAMF8, TGFBI, CCL5, and CXCL13 were risky genes to the prognosis of KIRC patients, high expression of which represent a worse survival probability (Fig. [67]1E and Figure S1). While OGDHL, MTURN, HSD11B2, PPARGC1A, LDHD, FREM1, NE3C2, TRIM2, SLAMF8, AGPAT9, and WDR72 were protective genes to OS of the patients and low expression of which represent a worse clinical outcome. In the protein–protein interaction (PPI) network, HSD11B2, NR3C2, and PPARGC1A show a protein interaction, SLAMF8, FCER1G, ITGAX, LILRB1, SPI1, CCL5, and CXCL13 show a protein interaction, while others do not show a relationship (Figure S2A). In mRNA levels, WDR72, MTURN, NR3C2, AGPAT9, TRIM2, PPARGC1A, OGFHL, LDHD, FREM1, and HSD11B2 show positive correlation with each other and negative correlation with other genes (Figure S2B). We further analyzed the epigenetic characterization, and we found that the mutation rate of all KTRGs is low (i.e., mutation rate < 5%), but the copy number variation (CNV) of some is exuberant (Figure S2C and D). Among these, TGFBI, SLAMF8, MTURN, LILRB1, LDHD, ITGAX, IL20RB, and HSD11B2 show high amplification (Amp) rate, TRIM2, OGDHL, NR3C2, IL20RB, FCER1G, and ALDH6A1 show high deletion (Del) rate. The expression of IL20RB, TRIM2, NR3C2, CXCL13, SLAMF8, and FCER1G are affected by copy number variation, while other genes do not show a significant effect. Fig. 1. [68]Fig. 1 [69]Open in a new tab Identification of KIRC Tregs-related genes. A Kaplain-Meier survival curve reveals the survival difference of the patients in high and low Tregs activity groups. B Schematic shows the process about screening KTRGs. C Heatmap shows the expression difference of KTRGs between tumor and normal groups. D The t-SNE analysis based on the expression of KTRGs between normal and tumor tissues. E The forest plot shows the prognosis-related KTRGs determined by the univariate Cox regression analysis Clinical relevance and mechanisms of KTRGs Apart from the relationship between prognosis and KTRGs mentioned, we matched the clinical stage and grade data and mRNA expression profile of KTRGs. Analysis of variance (ANOVA) was performed to estimate the difference of genes in different groups (i.e., Grade I—IV). Gene expression levels were shown by heatmap. We found OGDHL, PPARGC1A, LDHD, AGPAT9, ALDH6A1, WDR72, HSD11B2, FREM1, MTURN, NR3C2, and TRIM2 decrease little by little with the advance of grade and stage, whereas TGFBI, IL20RB, C1QL1, CCL5, CXCL13, TYMP, ITGAX, LILRB1, SLAMF8, SPI1, and FCER1G, gradually increase with development of disease (Fig. [70]2A and B). Through analyzing the correlation between 50 hallmark pathway activities and KTRGs expression, we found several genes (i.e., TGFBI, SPI1, SLAMF8, LILRB1, etc.) expression positively correlated with cancer-related pathways such as G2M checkpoint, E2F targets, Hedgehog signalling, Wnt pathways, P53 pathways, etc. PPARGC1A, OGDHL, ALDH6A1, and AGPAT9 have a positive correlation with adipogenesis, fatty acid metabolism, heme metabolism, bile acid metabolism, androgen response (Fig. [71]2C). PROGENy algorithm (Pathway RespOnsive GENes) concludes several cell-specific signalling pathways [[72]15]. Consistently, TGFBI, SPI1, SLAMF8, LILRB1, FCER1G, CXCL13, CCL5, and C1QL1 show a positive correlation with most cancer-related pathways. WDR72, TRIM2, PPARGC1A, FREM1, ALDH6A1, and AGPAT9 show a negative correlation with most cancer pathways but a positive correlation with androgen pathways (Fig. [73]2D). All in all, these KTRG expressions are related to tumour progression and might function differently by regulating specific metabolisms. Fig. 2. [74]Fig. 2 [75]Open in a new tab Clinical relevance and mechanism analysis of KTRGs. A The heatmap shows the expression of KTRGs across different grades. B The heatmap shows the expression of KTRGs across different stages. C The heat map shows the correlation between KTRGs and activity scores in cancer-related hallmark pathways. D The heat map shows the correlation between KTRGs and the activity score of the progery pathway signature. The red bubble represents a positive correlation; the blue bubble represents a negative correlation. The size of the bubble shows significance; the larger the size, the more significance it has. *P-value < 0.05; **P-value < 0.01; ***P-value < 0.001; ****P-value < 0.0001 KTRGs risk signature construction To further identify the potential important KTRGs, we matched the KTRGs expression profile with KIRC clinical data, and then the data was submitted to the Boruta algorithm. Three types of KTRGs, including confirmed (i.e., important) genes, tentative genes (i.e., probably important), and rejected (i.e., unimportant) genes, were finally determined. Among these, CXCL13, WDR72, TRIM2, ALDH6A1, NR3C2, PPARGC1A, and IL20RB were defined as confirmed genes, SLAMF8, FCER1G, LDHD, MTURN, TYMP, and OGDHL were defined as tentative genes, while others were considered as rejected genes (Fig. [76]3A). Then, we separated the KIRC(TCGA) datasets into two cohorts, training and testing cohorts, according to the ratio of 7:3. We dismissed the rejected genes and abstracted the expression profile of confirmed and tentative genes from the training cohort to conduct lasso-cox regression analysis. Five genes were finally screened, including NR3C2, WDR72, ALDH6A1, CXCL13, and IL20RB. Interestingly, the genes identified by the lasso-cox algorithm are all confirmed genes, indicating the importance of these genes for prognosis under these two algorithms. Based on the lasso coefficient per gene, a risk score was finally calculated according to the equation: [MATH: KTRGs risk score=0.01755748CXCL13< /mtext>-0.15617289WDR72-0.04032272ALDH6A1 -0.16222042NR3C2+0.01788896IL20R B :MATH] Fig. 3. [77]Fig. 3 [78]Open in a new tab KTRGs risk model construction. A Schematic shows the process about constructin KTRGs risk model. B and C Kaplain-Meier survival curve shows the overall survival distinction between high and low KTRGs risk model in KIRC(TCGA) training and testing cohort. Time-dependent ROC curve shows predictive accuracy of risk model in 3-/5-/7-year prognosis. Heatmap shows the KTRGs risk signature expression in high and low KTRGs risk groups According to the ideal cut-off value estimated by the surv_cutpoint function from the survminer R-package, the KIRC training cohort was divided into high and low-expression groups. Survival curve analysis shows that high risk group patients are faced with the problem of low survival probability (Fig. [79]3B). From ROC curve analysis, we perceived the area under curves (AUC) at 3-/5-/7-year were 0.6915, 0.7025, and 0.7050, indicating that the KTRG risk score is highly accurate in predicting the prognosis of KIRC patients. The heat map shows that NR3C2, WDR72, and ALDH6A are highly enriched in the low-risk group, while CXCL13 and IL20RB are highly enriched in the high-risk group. The same analysis was performed in KIRC testing cohort, and we observed a similar result (i.e., For ROC, AUC at 3 years = 0.7212, AUC at five years = 0.7392, AUC at 7 years = 0.7107) (Fig. [80]3C). We utilized other two RCC datasets to further validation, and the results bear resemblance to KIRC datasets (i.e., For E-MTAB-1980 ROC, AUC at three years = 0.7309, AUC at five years = 0.7455, AUC at 7 years = 0.6919; For KIRP ROC, AUC at three years = 0.5940, AUC at five years = 0.6196, AUC at seven years = 0.6097) (Figure S3A and B). These results revealed that the KTRG risk score is practical and accurate for predicting outcomes of patients with kidney cancer. Clinical relevance and robustness of KTRGs risk score To further explore the relationship between KTRG risk score and clinical variables, we matched the clinical variables (i.e., pathological stage, histological grade, Gender, and Age) with a 5-gene KTRG expression profile in two RCC cohorts. We found that KTRG risk groups were not correlated with patients' age in the KIRC cohort but correlated with stage and grade, appearing that more advanced stage and grade harboured in high-risk groups (Fig. [81]4A). CXCL13 and IL20RB are highly enriched in high-risk groups, indicating that the expression of these two genes might promote disease progression. We then divided the clinical data into different clinical sub-groups and matched them with the corresponding mRNA expression profile of KTRGs. Intriguingly, we found that, whatever the subgroup is, patients' outcomes in the high-risk group are always poorer, indicating that the KTRG risk score is robust and steady, which could predict the prognosis of patients in every sub-classes of kidney cancer (Figure S4). The same analysis was performed in the E-MTAB-1980 cohort, and we observed the same results (Fig. [82]4B and Figure S5). Fig. 4. [83]Fig. 4 [84]Open in a new tab Clinical relevance of risk model. A Heatmap shows the five gene expressions in high and low KTRG risk groups and the distribution of clinical variables (including pathological stage, histological grade, gender, age, OS status, and OS time) in high and low KTRGs risk group in KIRC(TCGA) dataset. B Heatmap shows the 5 gene expression in high and low KTRG risk groups and the distribution of clinical variables (including pathological stage, histological grade, gender, age, OS status, and OS time) in high and low KTRGs risk group in the E-MTAB-1980 dataset. *P-value < 0.05; **P-value < 0.01; ***P-value < 0.001; ****P-value < 0.0001 Clinical nomogram model construction To develop the clinical nomogram, we initially analyzed the correlation between various clinical variables and the prognosis of patients with KIRC (Figure S7A). Our analysis revealed that additional pharmaceutical therapy, additional radiation therapy, age, history of neoadjuvant treatment, tumor size, grade, stage, chemotherapy outcomes (Progressive Disease [PD], Stable Disease [SD], Complete Response [CR], Partial Response [PR]), laterality, and risk score are significantly associated with patient prognosis. However, we observed that patients who received additional pharmaceutical therapy, radiation therapy, and neoadjuvant treatment exhibited a higher risk for poor outcomes, which contradicted our expectations. Upon further investigation into missing clinical data, we identified several variables with a high rate of data absence (> 20%) (Figure S7B). To ensure data integrity and reflect clinical reality, we selected a refined set of variables for further analysis, including age, risk score, gender, grade, laterality, and stage. In a multivariate Cox regression analysis, we identified the Kidney Cancer Risk Group (KTRG) risk score, grade, age, and stage as independent prognostic factors (Figure S7C). The E-MTAB-1980 dataset, which only included age, gender, grade, and stage variables, was used to perform a multivariate Cox regression analysis, reaffirming that the KTRG risk score remained an independent prognostic factor (Figure S7D). Subsequently, we subjected the variables (age, stage, grade, and KTRG risk score) from the KIRC dataset to stepwise regression analysis. Based on the lowest Akaike information criterion (AIC) value within the KIRC (TCGA) dataset, these four variables were deemed suitable for constructing a clinical nomogram model to predict the 3-, 5-, and 7-year survival probabilities (Fig. [85]5A). The calibration curve and ROC curve revealed that the model possesses a high discriminative accuracy (AUC at three years = 0.8081, AUC at five years = 0.761, AUC at seven years = 0.7394) (Fig. [86]5B and C). The nomogram risk score was further calculated based on the established model, and the high and low model risk scores showed extremely significant differences in overall survival probability (i.e., after 100 months, the survival probability of patients with a high model risk score is proximately zero, while in low model risk score are fifty per cent or so) (Fig. [87]5D). The result reveals that the nomogram risk model could stratify the patients with different prognoses more effectively and discriminatively. This is indeed the case when observing decision curve analysis (DCA). The nomogram model shows a maximum benefit in predicting a 3-/5-/7-year prognosis of patients compared to other single variables (Fig. [88]5E). The same analysis recurred in E-MTAB-1980 datasets, and the results are consistent with KIRC(TCGA) dataset (i.e., for ROC curve, AUC at three years = 0.8974, AUC at five years = 0.8805, AUC at seven years = 0.8241; for survival curve, the survival rate of patients in high-risk group is forty per cent while in low-risk group are eighty per cent about, after fifty mouths) (Figure S6). Fig. 5. [89]Fig. 5 [90]Open in a new tab Nomogram model construction and model evaluation in the KIRC(TCGA) dataset. A The nomogram model (including risk score, age, stage, and grade) predicts 3-/5-/7-year survival probability in the KIRC(TCGA) dataset. B and C Calibration and time-dependent ROC curve of nomogram model. D The survival curve shows the prognostic difference between the high and low-risk groups based on the nomogram model. E Decision curve analysis compares the predictive clinical benefit of risk score, age, gender, grade, stage, and nomogram model for a 3-/5-/7-year prognosis Differentially enriched pathways between high and low KTRGs risk groups Given the differential outcomes of the high and low KTRG risk groups, we try to explore the potential distinct mechanisms between these two groups. We first submitted the mRNA expression profile of KIRC(TCGA) to gene set enrichment analysis (GSEA) software, and then the potential cancer-related hallmark pathways were further explored. We found several pathways highly enriched in the high KTRGs risk group, and the top three are the EMT pathway, Allograft rejection, and IL6 JAK STAT3 signalling (Fig. [91]6A). Several metabolism-related pathways are highly enriched in the low KTRGs risk group, such as heme metabolism, fatty acid metabolism, and adipogenesis (Fig. [92]6B). Further analysis combining the pathways activity score and risk score to conduct Pearson correlation coefficient, the results represent that KTRGs risk score highly positively correlated with MYC TARGET V2, IL6 JAK STAT3, ALLOGRAFT REJECTION, DNA REPAIR, and UNFOLDED PROTEIN RESPONSE, and negatively correlated with FATTA_ACID_METABOLISM, PROTEIN_SECRETION, ADIPOGENESIS, ANDROGEN_RESPONSE, HEME METOBOLISM, etc. (Figure S8). When analyzing the biological process, we found several immune processes highly enriched in high KTRG risk groups, including antimicrobial humoral response, antibacterial humoral response, and regulation of acute inflammatory response to antigenic stimulus (Fig. [93]6C). Furthermore, the metabolism process, in terms of lipid modification, lipid oxidation, and negative regulation of the amide metabolic process, is mainly enriched in low KTRG risk groups (Fig. [94]6D). Considering the metabolism-related pathways and processes consistently differential enrichment, we collected some metabolism-related signatures such as energy, amino acid, vitamin, nucleotide, carbohydrates, and TCA and submitted them onto GSEA software. The results are consistent, and all these metabolism signatures are highly enriched in low KTRG risk groups (Fig. [95]6E and F). These results suggested several hallmark pathways, metabolism signatures, and some immune mechanisms occurred differentially enrichment between the two risk groups. Fig. 6. [96]Fig. 6 [97]Open in a new tab Pathways enrichment analysis. A-D Top 6 enriched cancer-related hallmark pathways and biological process in high KTRGs group. E and F Metabolism-related pathways enrichments between high and low KTRGs groups Immune relevance and immune therapeutic response of risk score Previous research pointed to the effect of Tregs on the immune system and even on immune therapeutic response. Respecting the differential enrichment of the immune process mentioned above we had analyzed, we further explored the relevance of immune and KTRG risk scores. We found the fraction of some anti-tumor immune cells, such as T cells CD8 and T cells CD4, is higher, while B cell naive, NK cell activated, and M1 macrophages are lower in the high KTRGs risk group. Needless to say, the Tregs fraction is higher in the high-risk group. However, the pro-tumor immune cells—M2 macrophage fraction decreased in the high-risk group (Fig. [98]7A). For immune cell activity, interestingly, B cells, CD8 + T cells, NK cells, Cytotoxic cells, and Tregs consistently increase in high-risk groups. There is no doubt that the KTRG risk score shows the highest positive correlation with the Tregs activity score but shows a less positive correlation with other cells, such as CD8 T cells and NK cells (Fig. [99]7B). In addition, the KTRG risk score shows a negative correlation with several immune cell fractions, including monocytes, NK cells resting, and M1/M2 macrophages (Figure S9A). Besides Tregs, the risk score also shows a low positive correlation with the fraction of CD8 T cells and T cells CD4 memory activated. In addition, the immune checkpoint, CTLA4 and PDCD1 (PD1) are highly expressed in the high-risk group, but CD274 (PDL1) is highly expressed in the low-risk group (Figure S9B). The risk score is highly positively correlated with mRNA expression of PDCD1 and CTLA4 but negatively correlated with CD274 (Figure S9C). All in all, the relationship between the KTRG risk score and other immune cells is extremely complex, and we can not predict the potential immune therapeutic response among these results. Therefore, we conducted immune therapeutic response prediction by TIDE analysis. Notably, we found that most non-responders were distributed in the high-risk group, whereas most of the respondents were located in the low-risk group (Figure S9D). In addition, the risk score is higher in no-responders than that in responders (Figure S9E). From this humble evidence, we could speculate that the KTRG risk score could act as a potential predictor for cancer patients who receive immune therapy. We obtained two external immune therapy datasets (Alexandra and [100]GSE78820). After merging the risk score based on their own KTRGs mRNA expression profile and clinical variables (such as survival data and immune therapeutic response data), the subsequent analyses were further performed. Similar to the results above, the high-risk group showed a low survival probability in these two datasets (Fig. [101]8A and E). Risk score also shows high predicted accuracy (for the Alexandra cohort, AUC at one year = 0.6410, AUC at three years = 0.7868; for the [102]GSE78820 cohort, AUC at one year = 0.7072, AUC at three years = 0.7229) (Fig. [103]8B and F). Furthermore, the patients that prove to be non-responders (SD/PD) show a higher KTRGs risk score (Fig. [104]8C and G). In the same way, the patients in the high-risk group primarily harboured in the no-responder group, while those in the low-risk group were mainly distributed in the responder (CR/PR) group (Fig. [105]8D and H). These results provide potential evidence for our hypothesis that our KTRG risk score could be employed to act as a predictor for predicting immune therapeutic outcomes. Fig. 7. [106]Fig. 7 [107]Open in a new tab Correlation between KTRGs and immune system. A The boxplot shows the difference in immune cell fraction and activity score in high and low KTRG risk groups. B The heatmap shows a correlation between mRNA expression of KTRGs and immune cell activity scores. *P-value < 0.05; **,P-value < 0.01; ***,P-value < 0.001; ****P-value < 0.0001 Fig. 8. [108]Fig. 8 [109]Open in a new tab Immune therapeutic response prediction. A and E Kaplain-Meier survival curve reveals the survival difference in the high KTRGs risk group in two immune therapeutic cohorts. B and F Time-dependent ROC curve of KTRGs risk score on predicting 1-/3-year survival. C and G KTRGs risk score difference between CR/PR and SD/PD cohorts in two immune therapeutic cohorts. D and H The distribution of high and low-risk groups in CR/PR and SD/PD cohorts in two immune therapeutic cohorts. *,P-value < 0.05; ***,P-value < 0.001; ****,P-value < 0.0001 Discussion The most common treatment strategy based on cancer immunology is the blockade of co-inhibitory molecules, also known as immune checkpoints (ICs). The key IC targets are Cytotoxic T lymphocyte antigen-4 (CTLA-4), programmed cell death 1 (PD-1, also known as PDCD1) and its ligand PD-L1 (also known as CD274) [[110]19–[111]21]. The transferation of T cells to effector T cells needs T cell receptor (TCR) activation and CD28 co-activation. However, CTLA4 and PD1 block this process with their own pathways [[112]22]. Therefore, antibodies against CTLA4 and PD1 are used as a strategy for sustaining anti-tumour response. Though the immune checkpoint blockade (ICB) achieved unparalleled success, more than 50% of cancer patients, also including RCC patients, could not benefit from these therapies due to low or no response to them [[113]23]. Thus, it is urgent to explore the potential mechanism of therapeutic resistance and search for a combination treatment strategy. CD4 + regulatory T cells (Tregs), which express CD25 and CTLA4 on the cell surface and the transcription factor Forkhead box P3 (Foxp3) in the nucleus, play essential roles in maintaining peripheral tolerance and preventing autoimmunity. However, the accumulation of Tregs could suppress anti-tumour immunity [[114]7, [115]24]. Tregs secrete various immunosuppressive cytokines to lead to T cell dysfunction [[116]25]. In kidney cancer, research shows that Tregs could effectively inhibit activated T cell proliferation [[117]26]. High infiltration of Tregs is associated with resistance to immune therapy [[118]10]. RCC, especially ccRCC, is featured by rich leukocyte infiltrates, such as CD8 + T cells, CD4 + T cells and natural killer (NK) cells [[119]27]. It has been found that a high proportion of tumour-infiltrating CD8 + T cells was related to prolonged patient outcomes. However, the abundance of intratumoral CD8 + and CD4 + T cells has been correlated with worse patient survival, suggesting that the status of these cells is an essential determinant of effective anti-tumour activity [[120]28]. The anti-tumour function of these tumour-infiltrating lymphocytes (TILs) is usually inhibited by Tregs, macrophages, and neutrophils in the tumour microenvironment (TME). Therefore, exploring the potential mechanism of the inability of anti-tumour immune based on Tregs and developing the potential therapeutic targets on Tregs in RCC is necessary. So far, there are possible mechanisms related to how Tregs suppress the anti-tumour immune system. For instance, IL35 has been found to be a potentially important Treg suppressive mechanism in tumour models [[121]29]. T cell Ig and ITIM domain (TIGIT) is upregulated on Tregs [[122]30, [123]31], which marks a Tregs subpopulation that is highly suppressive against Th1, Th17, and CD8 + T cell function [[124]32]. These findings provide opportunities to develop novel Treg-targeted interventions. As it had come to be known [[125]7], in our study, we also find that high infiltration of Tregs is associated with poor prognosis of RCC patients, thus luring us to excavate the Tregs-related genes further. Through Pearson methods and differential differential expression analysis, we finally identified Tregs-related genes associated with KIRC. Multivariate Cox regression and two machine learning methods, boruta and Lasso Cox regression, were further performed and finally determined five potentially important KTRGs, including NR3C2, WDR72, ALDH6A1, CXCL13, and IL20RB. Among these, several genes have been found to be associated with the function of Tregs. The tumour-infiltrating CXCL13-producing T follicular helper cells (TFH cells) differentiation plays a key role in converting Treg-mediated immune suppression to de novo activation of adaptive antitumor humoral responses in breast cancer [[126]33]. In gastric cancer, CXCL13 + CD8 + T cells represented an exhausted CD8 + T cell subset, which predicted poor overall survival and was associated with immunoevasive contexture with increased Tregs [[127]34]. Cancer-associated fibroblasts could increase the production of CXCL13 and reduce the availability of IL-2 by expanding a population of activated Treg [[128]35]. In our study, CXCL3 shows a positive correlation with Tregs activity and poor prognosis of KIRC, revealing that Tregs might synergize with CXCL3 to regulate RCC biological behaviours. IL20RB has been reported as positively correlated with the number of Tregs [[129]36], while the correlation of other genes with Tregs has not been reported. Therefore, there is still a need to explore the function and mechanism of these KTRG genes on Tregs in RCC. Based on the mRNA expression of these five KTRGs, a risk model was established, and the model was effective in predicting prognosis. It also showed independent prognostic factors and higher robustness by analyzing multiple RCC datasets. These results indicate that the 5-gene risk model was available and practical. EMT pathways, allograft rejection, and JAK/STAT3 pathways were found to be highly enriched in the high KTRGs risk group. In gastric cancer, miR-192-5p/RB1 could promote EMT and induce Treg cell differentiation by regulating IL-10 secretion [[130]37]. In prostate cancer, STAT3 inhibition by GPB730 could enhance the antitumoral activity of anti-CTLA4 and decrease the intratumoral Treg frequency [[131]38]. Previous research revealed that disruption of the CXCL13/CXCR5 axis by gene knockdown impaired the production and recruitment of Tregs [[132]39], indicating that CXCL13 could act as a target to eliminate the abundance of Tregs. In addition, we found multiple metabolism-related pathways highly enriched in low KTRG risk groups. Intratumoral Tregs could maintain the suppressive functions of low-glucose, hypoxic, and acidic environment on effector T cell functions [[133]40], which might explain the reason for the dysfunction of metabolism between high and low KTRGs risk group. Finally, we found that the high KTRG risk group showed a worse immune therapeutic response, and the KTRG risk score showed a high correlation with Tregs activity and cell fraction, revealing that risk score could act as a marker of precaution about whether a patient could benefit from immune therapy and whether it could be necessary to change the treatment strategy for him. All in all, therapies targeting Treg cells are under active exploration. A combination of Treg cell attenuation with the activation of tumour-specific effector T cells may mutually enhance each individual treatment [[134]7]. We identified some potential Tregs-related genes, and some of them might develop to the target treating Tregs to combine with the anti-tumour system. We wish these organized data could be investigated by further pre-clinical and clinical research. Limitation Although the KTRG risk signature shows a robust ability to predict kidney cancer patients' prognosis and immune response, several limitations remain. The construction and validation of the KTRG risk signatures rely on public databases, which might contain inaccuracies, inconsistencies, or biases. Furthermore, whether the signature can serve as an independent predictor of prognosis and immune therapeutic response for RCC patients still requires prospective studies. The functions and molecular mechanisms of the identified genes are based on bioinformatic analysis and need further pre-clinical experiments for validation. Supplementary Information [135]12672_2025_1787_MOESM1_ESM.pdf^ (1.9MB, pdf) Additional file 1: Figure S1 Based on the cut-off value per gene expression, the KIRC data was seperated into high and low expression groups, the kaplain-survival curve was depicted to show the survival probability between the two groups. Figure S2 Cross-talk and epigenetic alteration of KTRGs.Protein-protein interaction network of KTRGs.mRNA expression correlation diagram among KTRGs.Mutation characterization of KTRGs in the KIRC dataset.Copy number variation characterization of KTRGs in the KIRC dataset. Boxplot shows the correlation between copy number variation and mRNA expression of corresponding genes. *, P-value < 0.05; **, P-value < 0.01; ***, P-value < 0.001; ****, P-value < 0.0001. Figure S3 Risk model validation.Kaplain-Meier survival curve shows the overall survival distinction between high and low KTRGs risk model in E-MTAB-1980 and KIRP cohort. Time-dependent ROC curve shows predictive accuracy of risk model in 3-/5-/7-year prognosis. Heatmap shows the KTRGs risk signature expression in high and low KTRGs risk groups. Figure S4 KIRCdataset was separated into different clinical subgroups, the Kaplain-Meier survival curve was depicted to demonstrate the survival difference. Figure S5 E-MTAB-1980 dataset was separated into different clinical subgroups, and the Kaplain-Meier survival curve was depicted to demonstrate the survival difference. Figure S6 Nomogram model construction and model evaluation in RCCdataset.The nomogram modelpredicts 3-/5-/7-year survival probability in the RCCdataset.Calibration and time-dependent ROC curve of nomogram model.The survival curve shows the prognostic difference between the high and low-risk groups based on the nomogram model.Decision curve analysis compares the predictive clinical benefit of risk score, age, gender, grade, stage, and nomogram model for a 3-/5-/7-year prognosis. Figure S7Univariate Cox regression analysis based on multiple clinical variables.Barplot shows the data missing rate.Multivariate Cox regression analysis based on variables including age, gender, grade, stage, and KTRGs risk score in KIRCand RCCdatasets. Figure S8 Barplot shows the Pearson correlation coefficient between KTRGs risk score and activity of cancer-related hallmark pathways. Figure S9The heatmap shows a correlation between mRNA expression of KTRGs and immune cell fraction.The boxplot shows the mRNA expression difference of CTLA4, PDCD1, and CD274 in the high and low KTRG risk groups.Correlation coefficient diagram between risk score and CTLA4, PDCD1, and CD274 expression.Distribution of a number of responders and non-responders in high and low-risk groups.Boxplot shows the KTRGs risk score of respondents and non-responders. *, P-value < 0.05; **, P-value < 0.01; ***, P-value < 0.001; ****, P-value < 0.0001 Acknowledgements