Abstract

   Regulatory T cells (Tregs) have been found to be related to immune
   therapeutic resistance in kidney cancer. However, the potential
   Tregs-related genes still need to be explored. Our study found that
   patients with high Tregs activity show poor prognosis. Through
   co-expression and differential expression analysis, we screened several
   Tregs-related genes (KTRGs) in kidney renal clear cell carcinoma. We
   further conducted the univariate Cox regression analysis and determined
   the prognosis-related KTRGs. Through the machine learning
   algorithm—Boruta, the potentially important KTRGs were screened further
   and submitted to construct a risk model. The risk model could predict
   the prognosis of RCC patients well, high risk patients show a poorer
   outcomes than low risk patients. Multivariate Cox regression analysis
   reveals that risk score is an independent prognostic factor. Then, the
   nomogram model based on KTRG risk score and other clinical variables
   was further established, which shows a high predicted accuracy and
   clinical benefit based on model validation methods. In addition, we
   found EMT, JAK/STAT3, and immune-related pathways highly enriched in
   high risk groups, while metabolism-related pathways show a low
   enrichment. Through analyzing two other external immune therapeutic
   datasets, we found that the risk score could predict the patient's
   immune therapeutic response. High-risk groups represent a worse
   therapeutic response than low-risk groups. In summary, we identified
   several Tregs-related genes and constructed a risk model to predict
   prognosis and immune therapeutic response. We hope these organized data
   can provide a theoretical basis for exploring potential Tregs' targets
   to synergize the immune therapy for RCC patients.

Supplementary Information

   The online version contains supplementary material available at
   10.1007/s12672-025-01787-x.

   Keywords: Tregs, RCC, Machine learning, Clinical model, Immune therapy

Introduction

   Renal cell carcinoma (RCC), as one of the most commonly diagnosed
   urological cancer, lead to over 179000 deaths every year in the world
   [[32]1]. RCC comprises several subtypes with unique characteristics.
   Clear-cell RCC is the most common subtype, accounting for nearly 70% of
   the cases [[33]2, [34]3]. RCC is known to be a tumour with dysfunction
   of the immune system, in which the potential mechanisms are supposed to
   be the block of anti-tumour immune response via immune suppressive
   factors, such as checkpoint molecules [[35]4]. Targeting immune
   checkpoints results in clinical responses in some patients with RCC,
   but a large proportion of patients do not benefit from this treatment
   [[36]4, [37]5]. Therefore, the identification of potential biomarks for
   immune therapeutic response is essetial to improve the clinical
   efficacy of these therapies.

   It has been reported that RCC tumours are severely infiltrated by T
   cells. However, the T cells could not mount effective anti-tumour
   efficacy [[38]4], which might probably be due to the suppressive
   function of regulatory T cells (Tregs) and myeloid cells. Regulatory T
   cells (Tregs) are an indispensable subset of T lymphocytes, crucial for
   maintaining immune homeostasis and preventing autoimmunity by
   suppressing excessive immune responses [[39]6]. In recent years, the
   common immunological basis for Treg-mediated suppression of
   autoimmunity and tumour immunity has been extensively explored. The
   research suggests that the mounting infiltration levels of Tregs were
   related to worse clinical outcomes in multiple tumours [[40]7]. The
   depletion of Tregs could elicit effective anti-tumour immunity by
   eliminating immunological unresponsiveness to syngeneic tumours,
   although it may also result in autoimmunity, especially if Treg cells
   are depleted systemically [[41]7, [42]8]. In the immune
   microenvironment of RCC, Tregs play a nonnegligible role in RCC
   progression. For instance, CD4 + FOXP3 + Tregs are involved with an
   increased recurrence of RCC [[43]9]. The PD1 expression on Tregs is
   correlated with resistance to anti-PD1 in metastatic clear cell RCC
   (mccRCC) [[44]10]. In addition, high infiltration of Tregs is also
   associated with resistance to antiangiogenic therapy in mRCC [[45]11].
   These results suggest that Tregs-targeted therapy might synergize with
   anti-PD1 therapy for patients with RCC. So far, research has explored
   the potential biomarkers of Tregs in RCC [[46]12, [47]13]. However, few
   studies excavated the potential Tregs-related genes, and there is a
   shortage of clinical models based on Tregs to predict the immune
   therapeutic efficacy.

   In our study, we screened several potentially important Tregs-related
   genes in RCC using machine learning algorithms and constructed the risk
   model called the KTRGs risk model. Our model could predict prognosis
   and turn out to be an independent prognostic factor for patients with
   RCC. Mechanism analyses reveal that the KTRG risk model might be mainly
   related to EMT, JAK/STAT3, immune, and metabolism-related pathways or
   biological processes. Furthermore, KTRGs risk model could predict the
   immune therapeutic response well. The genes in the model might be the
   potential targets of Tregs, which could be used to synergize immune
   therapy. All in all, our research provided a theoretical foundation for
   treating Tregs against patients with RCC.

Methods

Data acquisition

   We selected datasets from patients diagnosed with RCC for our study,
   including TCGA-KIRC, TCGA-KIRP, and E-MTAB-1980 datasets. Patients were
   eligible for this study if their RCC samples included mRNA expression
   data and clinical data (at least including age, gender, histological
   grade, pathological stage, overall survival time, and overall survival
   status). The mRNA experssion data of KIRC and KIRP datasets (version:
   07–19-2019) were obtained from the UCSC Xena website based on TCGA
   database ([48]http://xena.ucsc.edu/). KIRC contains contains 537
   tumours and 72 normal samples, KIRP contains 291 tumours and 88 normal
   tissue in KIRP. The mRNA expression of the E-MTAB-1980 dataset (Release
   Date: 16 October 2013) was collected from the ArrayExpress database
   ([49]https://www.ebi.ac.uk/arrayexpress), which includes 100 tumour
   samples. The immune therapeutic dataset—Alexandra cohort, was obtained
   from previous research [[50]14], which comprises 25 tumour samples;
   [51]GSE78820 was obtained from the GEO database
   ([52]https://www.ncbi.nlm.nih.gov/gds), which contains 25 tumour
   samples. KIRC, KIRP, Alexandra, and [53]GSE78820 mRNA expression format
   was transformed into TPM units. The E-MTAB-1980 dataset contains the
   normalized mRNA expression profile that was to be downloaded. The KIRC
   dataset was primarily used to construct the risk score and explore
   molecular mechanisms. The KIRP and E-MTAB-1980 datasets were used to
   validate the practicability of the risk score. The immune therapeutic
   datasets, Alexandra and GEO cohorts, were used to explore the
   predictive accuracy of the risk score on immune therapeutic response.

Pearson correlation analysis

   For two successive variables, the Pearson method was used to estimate
   the correlation between them by calculating the Pearson correlation
   coefficient (PCC) based on “rcorr” R function from “Hmisc” R package.
   The p-value was adjusted by false discovery rate (FDR) based on
   “p.adjust” R function. The absolute PCC value > 0.4 and adjusted
   p-value < 0.05 were considered to be a threshold to prove that the two
   variables have a positive or negative correlation.

Differential genes screening

   The tumour and normal mRNA expression profiles were marched, and
   differential expression analysis was conducted based on the “limma”
   R-package (version: 3.52.1). The Benjamini–Hochberg method was used to
   adjust the P-value. The absoluted logFC (log2 fold change) > 2, and
   adjust p-value < 0.05 were regarded as a threshold to estimated whether
   a gene differently expressed between tumor and normal or not.

Survival analysis

   Gene expression data per sample was matched with their own clinical
   overall survival (OS) data based on their own unique patient ID, and
   then the univariate Cox regression analysis was further performed using
   the “coxph” R function from “survival” R package. The univariate Cox
   regression analysis is a proportional hazards model, which produce the
   hazard ratio (HR) to estimate whether the expression of a gene is a
   risky or protective factor. Generally speaking, the variables with
   HR > 1 and p-value < 0.05 are regarded as risky factors, while the
   variables with HR < 1 and p-value < 0.05 are refered to as protective
   factors. Then, the sample was separated into high and low-expression
   groups based on single gene expression value, the ideal cut-off value
   per gene was calculated by “surv_cutpoint” R function from the
   “survminer” R package; the Kaplain-Meier survival curves were depicted
   to demonstrate the survival difference between the two groups, and a
   log-rank test was performed to evaluate the statistical significance
   based on the “survival” R-package.

Gene set variateion analysis

   Cancer-related hallmark gene sets were acquired from the MSigDB
   database ([54]https://www.gsea-msigdb.org/gsea/downloads.jsp), and the
   progery pathway signature was obtained from previous research [[55]15].
   The gene sets and whole-genome expression profile of TCGA-KIRC were
   presented to “GSVA” R package, and pathway activities per sample were
   finally calculated. Activity scores of pathways and gene expression
   profiles were subsequently merged by matching with the same sample
   names. Then then Pearson correlation coefficients (PCC) between gene
   expressions and pathway activities were calculated using the “rcorr”
   function from the “Hmisc” R-package. The P-value was adjusted by false
   discovery rate and transformed to -Log10(adjusted p-value) format to
   represent significance. The more signficant the correlation between two
   variables, the larger the size of bubble in heatmap.

Identification of important genes and KTRGs risk score construction

   The KTRGs expression profile merged with clinical overall survival data
   was submitted to the “Boruta” algorithm from “Boruta” R pakage. Then,
   three types of KTRGs were determined, including confirmed (i.e.,
   important) genes, tentative genes (i.e., probably important), and
   rejected (i.e., unimportant) genes. The KIRC(TCGA) data was divided
   into training and testing cohorts by a ratio of 7:3. The confirmed and
   tentative genes were further utilized to construct a risk model through
   lasso-cox regression analysis in the training cohort based on “glmnet”
   R package. Five genes were further determined, and the risk score was
   calculated using the lasso coefficient per gene times mRNA expression:
   [MATH: <mrow><mrow><mtext>Risk score</mtext><mspace
   width="0.333333em"></mspace></mrow><mo>=</mo><msub><mrow><mspace
   width="0.333333em"></mspace><mtext>Coef</mtext></mrow><mtext>gene1</mte
   xt></msub><mspace
   width="0.166667em"></mspace><mrow></mrow><mo>∗</mo><mspace
   width="0.166667em"></mspace><msub><mtext>Expression</mtext><mtext>gene1
   </mtext></msub><mo>+</mo><mspace
   width="0.166667em"></mspace><mo>.</mo><mo>.</mo><mo>.</mo><mspace
   width="0.166667em"></mspace><mo>+</mo><msub><mrow><mspace
   width="0.333333em"></mspace><mtext>Coef</mtext></mrow><mtext>gene5</mte
   xt></msub><mspace
   width="0.166667em"></mspace><mrow></mrow><mo>∗</mo><mspace
   width="0.166667em"></mspace><msub><mtext>Expression</mtext><mtext>gene5
   </mtext></msub></mrow> :MATH]

   Then, the risk score was further utilized to predict the overall
   survival of patients. Based on the optimal cut-off value of risk score,
   the high and low-risk groups were further determined in the training
   cohort, and then the Kaplain-Meier survival curve was portrayed to
   illustrate the survival difference between these two groups. ROC curve
   represents the accuracy of riskscore on predicting 3-/5-/7-year
   prognosis of the patients. Then, the same analyses were conducted in
   the internal testing cohort and the other two external validation
   cohorts—E-MTAB-1980 and KIRP(TCGA).

Risk model evaluation

   We downloaded clinical data for KIRC, encompassing additional
   pharmaceutical therapy, radiation therapy, surgery (locoregional and
   metastatic procedures), age, gender, neoadjuvant treatment history,
   dimensions (intermediate and longest), laterality, grade, and stage,
   along with therapeutic outcomes. We matched these variables with risk
   scores and performed univariate Cox regression and data missing rate
   analyses. Variables with high missing rates or impractical results from
   the regression were excluded from further analysis. The risk score
   merged other clinical variables, including age, gender, laterality,
   grade, and stage, and then the multivariate Cox regression was further
   carried out. The factor with p-value < 0.05 during multivariate Cox
   regression was regarded as an independent prognostic factor. Then the
   data were separated into different clinical subgroups (i.e., age,
   age ≤ 60 & age > 60; gender, female & male; grade, grade I-II & grade
   III-IV; stage, stage I-II & stage III-IV). The risk score of each
   subgroup was computed and the high and low risk groups were established
   in the light of optimal cut-off value. Kaplain Meier survival curve was
   portrayed to demostrate the difference between high and low risk groups
   across sub-groups.

Nomogram model established and model evaluation

   The variables risk score, age, grade, and stage were conducted using
   multivariable Cox regression analysis, and then stepwise regression
   analysis was conducted utilizing the “step” R function. The risk score,
   age, grade, and stage were finally screened to construct a nomogram
   model. The calibration curve, time-dependent ROC curve, and decision
   curve analysis (DCA) were depicted to evaluate model accuracy and
   clinical benefit based on “rms”, “survivalROC”, “ggDCA” R package,
   respectively. Kaplain-Meier survival curves were portrayed to
   demonstrate the survival difference between high and low-risk scores
   constructed by the nomogram model.

Mechanism analyses

   The GSEA software (Version: GSEA-4.0.3) was downloaded from the GSEA
   website ([56]https://www.gsea-msigdb.org/gsea/downloads.jsp). The mRNA
   expression profile of the KIRC(TCGA) dataset to gene set enrichment
   analysis (GSEA) software. The cancer-related hallmark pathway gene sets
   and gene ontology—biological progression gene sets were downloaded from
   a public website ([57]https://www.gsea-msigdb.org/gsea/index.jsp). The
   metabolism-related gene sets were downloaded from previous research
   [[58]16]. Then, the tumor samples with whole genome experssion profile
   was ranked by risk groups, high risk group was arranged in front of the
   low risk group. The expression profile and gene sets were submitted to
   GSEA software to estimate potentially differentially enriched pathways.
   For the gene set variation analysis (GSVA), we first compute the
   hallmark pathway activity score, Then the pathway activity score and
   risk score were further used to calculate the Pearson correlation based
   on the Pearson method.

Tumor immune infiltration analyses and immune therapeutic estimation

   The gene sets representing twenty-four types of immune cell were
   acquired from previous study [[59]17]. Then we applied the
   single-sample gene set enrichment (ssGSEA) method based on “GSVA”
   R-package to calculated the immune activity score of cells per sample.
   The proportion of 22 types of immune cell per samples was evaluated by
   Cell Type Identification by Estimating Relative Subsets of RNA
   Transcripts (CIBERSORT). The immune cell activity score and immune cell
   proportions were further used to explore the correlation with risk
   score. For the immune therapeutic estimation, we first using the tumour
   immune dysfunction and exclusion (TIDE) analysis, which is a public
   website merged with large scale omics data and biomarkers based on the
   immune checkpoint blockade (ICB) trials [[60]18]
   ([61]http://tide.dfci.harvard.edu). Based on the TIDE score, the
   patients was referred to as responder and non-responder, which were
   further used to analyze the relation to risk score.

Statistic analysis

   We utilized the unpaired t-test to estimate the difference between the
   two continuous variables using “t.test” R function. The chi-square test
   was used for contingency table data to examine the distribution
   distinction utilizing “chisq.test” R function. The Kaplain-Meier curve
   was depicted to portray the survival difference between the two groups,
   and the log-rank test was applied to assess the significance using
   “coxph” R function from “survival” R package. We estimated the
   correlation between the two continuous variables by calculating the
   Pearson correlation coefficient (PCC) using “rcorr” R function from
   “Hmisc” R package. For all analyses, a p-value less than 0.05 was
   regarded as significant. *P-value < 0.05; **P-value < 0.01;
   ***P-value < 0.001; ****P-value < 0.0001.

Results

Identification of KIRC Tregs-related genes (KTRGs)

   Through the ssGSEA analysis, we figured out the Tregs activity score of
   KIRC. Based on the optimal cut-off value, the KIRC samples were
   separated into high- and low-Treg activity groups. We observed a poor
   prognosis that occurs in high Tregs activity groups (Fig. [62]1A),
   which lures us to further explore the potential reason and mechanism of
   Tregs on KIRC prognosis. The first thing we did was to determine the
   Tregs-related genes. To do this, we calculated the Pearson correlation
   coefficients (PCC) between each gene expression and Tregs score based
   on the whole genome expression profile of KIRC(TCGA). Those with
   absolute PCC > 0.4 and P-value < 0.05 were identified as Treg-related
   genes (Fig. [63]1B). Then we carried out differential expression
   analysis by comparing the tumour with the normal expression profile of
   KIRC with a filtering threshold based on absolute log2(Fold Change) > 2
   and adjusted P-value < 0.05, differential expression genes (DEGs) were
   finally screened (Fig. [64]1B). 76 KIRC Tregs-related genes (KTRGs)
   were eventually determined by overlapping Treg-related genes and DEGs.
   The heatmap revealed that these genes' expressions could rightly
   discriminate between tumour and normal samples (Fig. [65]1C). When the
   KTRGs expression profile was conducted dimensionality reduction by
   t-distributed Stochastic Neighbor Embedding (t-SNE), we also observed
   tumour and normal samples harboured in different areas in a
   two-dimension axis, indicating that the expression of these genes could
   distinguish the tumour and the normal tissues well (Fig. [66]1D). We
   then presented the mRNA expression profile of KTRGs to clinical
   survival data, and a univariate Cox regression analysis was
   subsequently performed. 22 genes of 76 KTRGs were estimated to be
   associated with the overall survival of KIRC. Among these, SPI1, TYMP,
   LILRB1, IL20RB, ITGAX, C1QL1, FCER1G, SLAMF8, TGFBI, CCL5, and CXCL13
   were risky genes to the prognosis of KIRC patients, high expression of
   which represent a worse survival probability (Fig. [67]1E and Figure
   S1). While OGDHL, MTURN, HSD11B2, PPARGC1A, LDHD, FREM1, NE3C2, TRIM2,
   SLAMF8, AGPAT9, and WDR72 were protective genes to OS of the patients
   and low expression of which represent a worse clinical outcome. In the
   protein–protein interaction (PPI) network, HSD11B2, NR3C2, and PPARGC1A
   show a protein interaction, SLAMF8, FCER1G, ITGAX, LILRB1, SPI1, CCL5,
   and CXCL13 show a protein interaction, while others do not show a
   relationship (Figure S2A). In mRNA levels, WDR72, MTURN, NR3C2, AGPAT9,
   TRIM2, PPARGC1A, OGFHL, LDHD, FREM1, and HSD11B2 show positive
   correlation with each other and negative correlation with other genes
   (Figure S2B). We further analyzed the epigenetic characterization, and
   we found that the mutation rate of all KTRGs is low (i.e., mutation
   rate < 5%), but the copy number variation (CNV) of some is exuberant
   (Figure S2C and D). Among these, TGFBI, SLAMF8, MTURN, LILRB1, LDHD,
   ITGAX, IL20RB, and HSD11B2 show high amplification (Amp) rate, TRIM2,
   OGDHL, NR3C2, IL20RB, FCER1G, and ALDH6A1 show high deletion (Del)
   rate. The expression of IL20RB, TRIM2, NR3C2, CXCL13, SLAMF8, and
   FCER1G are affected by copy number variation, while other genes do not
   show a significant effect.

Fig. 1.

   [68]Fig. 1
   [69]Open in a new tab

   Identification of KIRC Tregs-related genes. A Kaplain-Meier survival
   curve reveals the survival difference of the patients in high and low
   Tregs activity groups. B Schematic shows the process about screening
   KTRGs. C Heatmap shows the expression difference of KTRGs between tumor
   and normal groups. D The t-SNE analysis based on the expression of
   KTRGs between normal and tumor tissues. E The forest plot shows the
   prognosis-related KTRGs determined by the univariate Cox regression
   analysis

Clinical relevance and mechanisms of KTRGs

   Apart from the relationship between prognosis and KTRGs mentioned, we
   matched the clinical stage and grade data and mRNA expression profile
   of KTRGs. Analysis of variance (ANOVA) was performed to estimate the
   difference of genes in different groups (i.e., Grade I—IV). Gene
   expression levels were shown by heatmap. We found OGDHL, PPARGC1A,
   LDHD, AGPAT9, ALDH6A1, WDR72, HSD11B2, FREM1, MTURN, NR3C2, and TRIM2
   decrease little by little with the advance of grade and stage, whereas
   TGFBI, IL20RB, C1QL1, CCL5, CXCL13, TYMP, ITGAX, LILRB1, SLAMF8, SPI1,
   and FCER1G, gradually increase with development of disease (Fig. [70]2A
   and B). Through analyzing the correlation between 50 hallmark pathway
   activities and KTRGs expression, we found several genes (i.e., TGFBI,
   SPI1, SLAMF8, LILRB1, etc.) expression positively correlated with
   cancer-related pathways such as G2M checkpoint, E2F targets, Hedgehog
   signalling, Wnt pathways, P53 pathways, etc. PPARGC1A, OGDHL, ALDH6A1,
   and AGPAT9 have a positive correlation with adipogenesis, fatty acid
   metabolism, heme metabolism, bile acid metabolism, androgen response
   (Fig. [71]2C). PROGENy algorithm (Pathway RespOnsive GENes) concludes
   several cell-specific signalling pathways [[72]15]. Consistently,
   TGFBI, SPI1, SLAMF8, LILRB1, FCER1G, CXCL13, CCL5, and C1QL1 show a
   positive correlation with most cancer-related pathways. WDR72, TRIM2,
   PPARGC1A, FREM1, ALDH6A1, and AGPAT9 show a negative correlation with
   most cancer pathways but a positive correlation with androgen pathways
   (Fig. [73]2D). All in all, these KTRG expressions are related to tumour
   progression and might function differently by regulating specific
   metabolisms.

Fig. 2.

   [74]Fig. 2
   [75]Open in a new tab

   Clinical relevance and mechanism analysis of KTRGs. A The heatmap shows
   the expression of KTRGs across different grades. B The heatmap shows
   the expression of KTRGs across different stages. C The heat map shows
   the correlation between KTRGs and activity scores in cancer-related
   hallmark pathways. D The heat map shows the correlation between KTRGs
   and the activity score of the progery pathway signature. The red bubble
   represents a positive correlation; the blue bubble represents a
   negative correlation. The size of the bubble shows significance; the
   larger the size, the more significance it has. *P-value < 0.05;
   **P-value < 0.01; ***P-value < 0.001; ****P-value < 0.0001

KTRGs risk signature construction

   To further identify the potential important KTRGs, we matched the KTRGs
   expression profile with KIRC clinical data, and then the data was
   submitted to the Boruta algorithm. Three types of KTRGs, including
   confirmed (i.e., important) genes, tentative genes (i.e., probably
   important), and rejected (i.e., unimportant) genes, were finally
   determined. Among these, CXCL13, WDR72, TRIM2, ALDH6A1, NR3C2,
   PPARGC1A, and IL20RB were defined as confirmed genes, SLAMF8, FCER1G,
   LDHD, MTURN, TYMP, and OGDHL were defined as tentative genes, while
   others were considered as rejected genes (Fig. [76]3A). Then, we
   separated the KIRC(TCGA) datasets into two cohorts, training and
   testing cohorts, according to the ratio of 7:3. We dismissed the
   rejected genes and abstracted the expression profile of confirmed and
   tentative genes from the training cohort to conduct lasso-cox
   regression analysis. Five genes were finally screened, including NR3C2,
   WDR72, ALDH6A1, CXCL13, and IL20RB. Interestingly, the genes identified
   by the lasso-cox algorithm are all confirmed genes, indicating the
   importance of these genes for prognosis under these two algorithms.
   Based on the lasso coefficient per gene, a risk score was finally
   calculated according to the equation:
   [MATH: <mrow><mrow><mtext>KTRGs risk score</mtext><mspace
   width="0.333333em"></mspace></mrow><mo>=</mo><mspace
   width="0.166667em"></mspace><mn>0.01755748</mn><mspace
   width="0.166667em"></mspace><mrow></mrow><mo>∗</mo><mrow><mtext>CXCL13<
   /mtext><mspace width="0.333333em"></mspace></mrow><mo>-</mo><mspace
   width="0.166667em"></mspace><mn>0.15617289</mn><mspace
   width="0.166667em"></mspace><mrow></mrow><mo>∗</mo><mrow><mtext>WDR72</
   mtext><mspace width="0.333333em"></mspace></mrow><mo>-</mo><mspace
   width="0.166667em"></mspace><mn>0.04032272</mn><mspace
   width="0.166667em"></mspace><mrow></mrow><mo>∗</mo><mrow><mtext>ALDH6A1
   </mtext><mspace width="0.333333em"></mspace></mrow><mo>-</mo><mspace
   width="0.166667em"></mspace><mn>0.16222042</mn><mspace
   width="0.166667em"></mspace><mrow></mrow><mo>∗</mo><mrow><mtext>NR3C2</
   mtext><mspace width="0.333333em"></mspace></mrow><mo>+</mo><mspace
   width="0.166667em"></mspace><mn>0.01788896</mn><mspace
   width="0.166667em"></mspace><mrow></mrow><mo>∗</mo><mrow><mspace
   width="0.333333em"></mspace><mtext>IL2</mtext></mrow><mn>0</mn><mtext>R
   B</mtext></mrow> :MATH]

Fig. 3.

   [77]Fig. 3
   [78]Open in a new tab

   KTRGs risk model construction. A Schematic shows the process about
   constructin KTRGs risk model. B and C Kaplain-Meier survival curve
   shows the overall survival distinction between high and low KTRGs risk
   model in KIRC(TCGA) training and testing cohort. Time-dependent ROC
   curve shows predictive accuracy of risk model in 3-/5-/7-year
   prognosis. Heatmap shows the KTRGs risk signature expression in high
   and low KTRGs risk groups

   According to the ideal cut-off value estimated by the surv_cutpoint
   function from the survminer R-package, the KIRC training cohort was
   divided into high and low-expression groups. Survival curve analysis
   shows that high risk group patients are faced with the problem of low
   survival probability (Fig. [79]3B). From ROC curve analysis, we
   perceived the area under curves (AUC) at 3-/5-/7-year were 0.6915,
   0.7025, and 0.7050, indicating that the KTRG risk score is highly
   accurate in predicting the prognosis of KIRC patients. The heat map
   shows that NR3C2, WDR72, and ALDH6A are highly enriched in the low-risk
   group, while CXCL13 and IL20RB are highly enriched in the high-risk
   group. The same analysis was performed in KIRC testing cohort, and we
   observed a similar result (i.e., For ROC, AUC at 3 years = 0.7212, AUC
   at five years = 0.7392, AUC at 7 years = 0.7107) (Fig. [80]3C). We
   utilized other two RCC datasets to further validation, and the results
   bear resemblance to KIRC datasets (i.e., For E-MTAB-1980 ROC, AUC at
   three years = 0.7309, AUC at five years = 0.7455, AUC at
   7 years = 0.6919; For KIRP ROC, AUC at three years = 0.5940, AUC at
   five years = 0.6196, AUC at seven years = 0.6097) (Figure S3A and B).
   These results revealed that the KTRG risk score is practical and
   accurate for predicting outcomes of patients with kidney cancer.

Clinical relevance and robustness of KTRGs risk score

   To further explore the relationship between KTRG risk score and
   clinical variables, we matched the clinical variables (i.e.,
   pathological stage, histological grade, Gender, and Age) with a 5-gene
   KTRG expression profile in two RCC cohorts. We found that KTRG risk
   groups were not correlated with patients' age in the KIRC cohort but
   correlated with stage and grade, appearing that more advanced stage and
   grade harboured in high-risk groups (Fig. [81]4A). CXCL13 and IL20RB
   are highly enriched in high-risk groups, indicating that the expression
   of these two genes might promote disease progression. We then divided
   the clinical data into different clinical sub-groups and matched them
   with the corresponding mRNA expression profile of KTRGs. Intriguingly,
   we found that, whatever the subgroup is, patients' outcomes in the
   high-risk group are always poorer, indicating that the KTRG risk score
   is robust and steady, which could predict the prognosis of patients in
   every sub-classes of kidney cancer (Figure S4). The same analysis was
   performed in the E-MTAB-1980 cohort, and we observed the same results
   (Fig. [82]4B and Figure S5).

Fig. 4.

   [83]Fig. 4
   [84]Open in a new tab

   Clinical relevance of risk model. A Heatmap shows the five gene
   expressions in high and low KTRG risk groups and the distribution of
   clinical variables (including pathological stage, histological grade,
   gender, age, OS status, and OS time) in high and low KTRGs risk group
   in KIRC(TCGA) dataset. B Heatmap shows the 5 gene expression in high
   and low KTRG risk groups and the distribution of clinical variables
   (including pathological stage, histological grade, gender, age, OS
   status, and OS time) in high and low KTRGs risk group in the
   E-MTAB-1980 dataset. *P-value < 0.05; **P-value < 0.01;
   ***P-value < 0.001; ****P-value < 0.0001

Clinical nomogram model construction

   To develop the clinical nomogram, we initially analyzed the correlation
   between various clinical variables and the prognosis of patients with
   KIRC (Figure S7A). Our analysis revealed that additional pharmaceutical
   therapy, additional radiation therapy, age, history of neoadjuvant
   treatment, tumor size, grade, stage, chemotherapy outcomes (Progressive
   Disease [PD], Stable Disease [SD], Complete Response [CR], Partial
   Response [PR]), laterality, and risk score are significantly associated
   with patient prognosis. However, we observed that patients who received
   additional pharmaceutical therapy, radiation therapy, and neoadjuvant
   treatment exhibited a higher risk for poor outcomes, which contradicted
   our expectations. Upon further investigation into missing clinical
   data, we identified several variables with a high rate of data absence
   (> 20%) (Figure S7B). To ensure data integrity and reflect clinical
   reality, we selected a refined set of variables for further analysis,
   including age, risk score, gender, grade, laterality, and stage. In a
   multivariate Cox regression analysis, we identified the Kidney Cancer
   Risk Group (KTRG) risk score, grade, age, and stage as independent
   prognostic factors (Figure S7C). The E-MTAB-1980 dataset, which only
   included age, gender, grade, and stage variables, was used to perform a
   multivariate Cox regression analysis, reaffirming that the KTRG risk
   score remained an independent prognostic factor (Figure S7D).
   Subsequently, we subjected the variables (age, stage, grade, and KTRG
   risk score) from the KIRC dataset to stepwise regression analysis.
   Based on the lowest Akaike information criterion (AIC) value within the
   KIRC (TCGA) dataset, these four variables were deemed suitable for
   constructing a clinical nomogram model to predict the 3-, 5-, and
   7-year survival probabilities (Fig. [85]5A). The calibration curve and
   ROC curve revealed that the model possesses a high discriminative
   accuracy (AUC at three years = 0.8081, AUC at five years = 0.761, AUC
   at seven years = 0.7394) (Fig. [86]5B and C). The nomogram risk score
   was further calculated based on the established model, and the high and
   low model risk scores showed extremely significant differences in
   overall survival probability (i.e., after 100 months, the survival
   probability of patients with a high model risk score is proximately
   zero, while in low model risk score are fifty per cent or so)
   (Fig. [87]5D). The result reveals that the nomogram risk model could
   stratify the patients with different prognoses more effectively and
   discriminatively. This is indeed the case when observing decision curve
   analysis (DCA). The nomogram model shows a maximum benefit in
   predicting a 3-/5-/7-year prognosis of patients compared to other
   single variables (Fig. [88]5E). The same analysis recurred in
   E-MTAB-1980 datasets, and the results are consistent with KIRC(TCGA)
   dataset (i.e., for ROC curve, AUC at three years = 0.8974, AUC at five
   years = 0.8805, AUC at seven years = 0.8241; for survival curve, the
   survival rate of patients in high-risk group is forty per cent while in
   low-risk group are eighty per cent about, after fifty mouths) (Figure
   S6).

Fig. 5.

   [89]Fig. 5
   [90]Open in a new tab

   Nomogram model construction and model evaluation in the KIRC(TCGA)
   dataset. A The nomogram model (including risk score, age, stage, and
   grade) predicts 3-/5-/7-year survival probability in the KIRC(TCGA)
   dataset. B and C Calibration and time-dependent ROC curve of nomogram
   model. D The survival curve shows the prognostic difference between the
   high and low-risk groups based on the nomogram model. E Decision curve
   analysis compares the predictive clinical benefit of risk score, age,
   gender, grade, stage, and nomogram model for a 3-/5-/7-year prognosis

Differentially enriched pathways between high and low KTRGs risk groups

   Given the differential outcomes of the high and low KTRG risk groups,
   we try to explore the potential distinct mechanisms between these two
   groups. We first submitted the mRNA expression profile of KIRC(TCGA) to
   gene set enrichment analysis (GSEA) software, and then the potential
   cancer-related hallmark pathways were further explored. We found
   several pathways highly enriched in the high KTRGs risk group, and the
   top three are the EMT pathway, Allograft rejection, and IL6 JAK STAT3
   signalling (Fig. [91]6A). Several metabolism-related pathways are
   highly enriched in the low KTRGs risk group, such as heme metabolism,
   fatty acid metabolism, and adipogenesis (Fig. [92]6B). Further analysis
   combining the pathways activity score and risk score to conduct Pearson
   correlation coefficient, the results represent that KTRGs risk score
   highly positively correlated with MYC TARGET V2, IL6 JAK STAT3,
   ALLOGRAFT REJECTION, DNA REPAIR, and UNFOLDED PROTEIN RESPONSE, and
   negatively correlated with FATTA_ACID_METABOLISM, PROTEIN_SECRETION,
   ADIPOGENESIS, ANDROGEN_RESPONSE, HEME METOBOLISM, etc. (Figure S8).
   When analyzing the biological process, we found several immune
   processes highly enriched in high KTRG risk groups, including
   antimicrobial humoral response, antibacterial humoral response, and
   regulation of acute inflammatory response to antigenic stimulus
   (Fig. [93]6C). Furthermore, the metabolism process, in terms of lipid
   modification, lipid oxidation, and negative regulation of the amide
   metabolic process, is mainly enriched in low KTRG risk groups
   (Fig. [94]6D). Considering the metabolism-related pathways and
   processes consistently differential enrichment, we collected some
   metabolism-related signatures such as energy, amino acid, vitamin,
   nucleotide, carbohydrates, and TCA and submitted them onto GSEA
   software. The results are consistent, and all these metabolism
   signatures are highly enriched in low KTRG risk groups (Fig. [95]6E and
   F). These results suggested several hallmark pathways, metabolism
   signatures, and some immune mechanisms occurred differentially
   enrichment between the two risk groups.

Fig. 6.

   [96]Fig. 6
   [97]Open in a new tab

   Pathways enrichment analysis. A-D Top 6 enriched cancer-related
   hallmark pathways and biological process in high KTRGs group. E and F
   Metabolism-related pathways enrichments between high and low KTRGs
   groups

Immune relevance and immune therapeutic response of risk score

   Previous research pointed to the effect of Tregs on the immune system
   and even on immune therapeutic response. Respecting the differential
   enrichment of the immune process mentioned above we had analyzed, we
   further explored the relevance of immune and KTRG risk scores. We found
   the fraction of some anti-tumor immune cells, such as T cells CD8 and T
   cells CD4, is higher, while B cell naive, NK cell activated, and M1
   macrophages are lower in the high KTRGs risk group. Needless to say,
   the Tregs fraction is higher in the high-risk group. However, the
   pro-tumor immune cells—M2 macrophage fraction decreased in the
   high-risk group (Fig. [98]7A). For immune cell activity, interestingly,
   B cells, CD8 + T cells, NK cells, Cytotoxic cells, and Tregs
   consistently increase in high-risk groups. There is no doubt that the
   KTRG risk score shows the highest positive correlation with the Tregs
   activity score but shows a less positive correlation with other cells,
   such as CD8 T cells and NK cells (Fig. [99]7B). In addition, the KTRG
   risk score shows a negative correlation with several immune cell
   fractions, including monocytes, NK cells resting, and M1/M2 macrophages
   (Figure S9A). Besides Tregs, the risk score also shows a low positive
   correlation with the fraction of CD8 T cells and T cells CD4 memory
   activated. In addition, the immune checkpoint, CTLA4 and PDCD1 (PD1)
   are highly expressed in the high-risk group, but CD274 (PDL1) is highly
   expressed in the low-risk group (Figure S9B). The risk score is highly
   positively correlated with mRNA expression of PDCD1 and CTLA4 but
   negatively correlated with CD274 (Figure S9C). All in all, the
   relationship between the KTRG risk score and other immune cells is
   extremely complex, and we can not predict the potential immune
   therapeutic response among these results. Therefore, we conducted
   immune therapeutic response prediction by TIDE analysis. Notably, we
   found that most non-responders were distributed in the high-risk group,
   whereas most of the respondents were located in the low-risk group
   (Figure S9D). In addition, the risk score is higher in no-responders
   than that in responders (Figure S9E). From this humble evidence, we
   could speculate that the KTRG risk score could act as a potential
   predictor for cancer patients who receive immune therapy. We obtained
   two external immune therapy datasets (Alexandra and [100]GSE78820).
   After merging the risk score based on their own KTRGs mRNA expression
   profile and clinical variables (such as survival data and immune
   therapeutic response data), the subsequent analyses were further
   performed. Similar to the results above, the high-risk group showed a
   low survival probability in these two datasets (Fig. [101]8A and E).
   Risk score also shows high predicted accuracy (for the Alexandra
   cohort, AUC at one year = 0.6410, AUC at three years = 0.7868; for the
   [102]GSE78820 cohort, AUC at one year = 0.7072, AUC at three
   years = 0.7229) (Fig. [103]8B and F). Furthermore, the patients that
   prove to be non-responders (SD/PD) show a higher KTRGs risk score
   (Fig. [104]8C and G). In the same way, the patients in the high-risk
   group primarily harboured in the no-responder group, while those in the
   low-risk group were mainly distributed in the responder (CR/PR) group
   (Fig. [105]8D and H). These results provide potential evidence for our
   hypothesis that our KTRG risk score could be employed to act as a
   predictor for predicting immune therapeutic outcomes.

Fig. 7.

   [106]Fig. 7
   [107]Open in a new tab

   Correlation between KTRGs and immune system. A The boxplot shows the
   difference in immune cell fraction and activity score in high and low
   KTRG risk groups. B The heatmap shows a correlation between mRNA
   expression of KTRGs and immune cell activity scores. *P-value < 0.05;
   **,P-value < 0.01; ***,P-value < 0.001; ****P-value < 0.0001

Fig. 8.

   [108]Fig. 8
   [109]Open in a new tab

   Immune therapeutic response prediction. A and E Kaplain-Meier survival
   curve reveals the survival difference in the high KTRGs risk group in
   two immune therapeutic cohorts. B and F Time-dependent ROC curve of
   KTRGs risk score on predicting 1-/3-year survival. C and G KTRGs risk
   score difference between CR/PR and SD/PD cohorts in two immune
   therapeutic cohorts. D and H The distribution of high and low-risk
   groups in CR/PR and SD/PD cohorts in two immune therapeutic cohorts.
   *,P-value < 0.05; ***,P-value < 0.001; ****,P-value < 0.0001

Discussion

   The most common treatment strategy based on cancer immunology is the
   blockade of co-inhibitory molecules, also known as immune checkpoints
   (ICs). The key IC targets are Cytotoxic T lymphocyte antigen-4
   (CTLA-4), programmed cell death 1 (PD-1, also known as PDCD1) and its
   ligand PD-L1 (also known as CD274) [[110]19–[111]21]. The transferation
   of T cells to effector T cells needs T cell receptor (TCR) activation
   and CD28 co-activation. However, CTLA4 and PD1 block this process with
   their own pathways [[112]22]. Therefore, antibodies against CTLA4 and
   PD1 are used as a strategy for sustaining anti-tumour response. Though
   the immune checkpoint blockade (ICB) achieved unparalleled success,
   more than 50% of cancer patients, also including RCC patients, could
   not benefit from these therapies due to low or no response to them
   [[113]23]. Thus, it is urgent to explore the potential mechanism of
   therapeutic resistance and search for a combination treatment strategy.

   CD4 + regulatory T cells (Tregs), which express CD25 and CTLA4 on the
   cell surface and the transcription factor Forkhead box P3 (Foxp3) in
   the nucleus, play essential roles in maintaining peripheral tolerance
   and preventing autoimmunity. However, the accumulation of Tregs could
   suppress anti-tumour immunity [[114]7, [115]24]. Tregs secrete various
   immunosuppressive cytokines to lead to T cell dysfunction [[116]25]. In
   kidney cancer, research shows that Tregs could effectively inhibit
   activated T cell proliferation [[117]26]. High infiltration of Tregs is
   associated with resistance to immune therapy [[118]10]. RCC, especially
   ccRCC, is featured by rich leukocyte infiltrates, such as CD8 + T
   cells, CD4 + T cells and natural killer (NK) cells [[119]27]. It has
   been found that a high proportion of tumour-infiltrating CD8 + T cells
   was related to prolonged patient outcomes. However, the abundance of
   intratumoral CD8 + and CD4 + T cells has been correlated with worse
   patient survival, suggesting that the status of these cells is an
   essential determinant of effective anti-tumour activity [[120]28]. The
   anti-tumour function of these tumour-infiltrating lymphocytes (TILs) is
   usually inhibited by Tregs, macrophages, and neutrophils in the tumour
   microenvironment (TME). Therefore, exploring the potential mechanism of
   the inability of anti-tumour immune based on Tregs and developing the
   potential therapeutic targets on Tregs in RCC is necessary. So far,
   there are possible mechanisms related to how Tregs suppress the
   anti-tumour immune system. For instance, IL35 has been found to be a
   potentially important Treg suppressive mechanism in tumour models
   [[121]29]. T cell Ig and ITIM domain (TIGIT) is upregulated on Tregs
   [[122]30, [123]31], which marks a Tregs subpopulation that is highly
   suppressive against Th1, Th17, and CD8 + T cell function [[124]32].
   These findings provide opportunities to develop novel Treg-targeted
   interventions.

   As it had come to be known [[125]7], in our study, we also find that
   high infiltration of Tregs is associated with poor prognosis of RCC
   patients, thus luring us to excavate the Tregs-related genes further.
   Through Pearson methods and differential differential expression
   analysis, we finally identified Tregs-related genes associated with
   KIRC. Multivariate Cox regression and two machine learning methods,
   boruta and Lasso Cox regression, were further performed and finally
   determined five potentially important KTRGs, including NR3C2, WDR72,
   ALDH6A1, CXCL13, and IL20RB. Among these, several genes have been found
   to be associated with the function of Tregs. The tumour-infiltrating
   CXCL13-producing T follicular helper cells (TFH cells) differentiation
   plays a key role in converting Treg-mediated immune suppression to de
   novo activation of adaptive antitumor humoral responses in breast
   cancer [[126]33]. In gastric cancer, CXCL13 + CD8 + T cells represented
   an exhausted CD8 + T cell subset, which predicted poor overall survival
   and was associated with immunoevasive contexture with increased Tregs
   [[127]34]. Cancer-associated fibroblasts could increase the production
   of CXCL13 and reduce the availability of IL-2 by expanding a population
   of activated Treg [[128]35]. In our study, CXCL3 shows a positive
   correlation with Tregs activity and poor prognosis of KIRC, revealing
   that Tregs might synergize with CXCL3 to regulate RCC biological
   behaviours. IL20RB has been reported as positively correlated with the
   number of Tregs [[129]36], while the correlation of other genes with
   Tregs has not been reported. Therefore, there is still a need to
   explore the function and mechanism of these KTRG genes on Tregs in RCC.
   Based on the mRNA expression of these five KTRGs, a risk model was
   established, and the model was effective in predicting prognosis. It
   also showed independent prognostic factors and higher robustness by
   analyzing multiple RCC datasets. These results indicate that the 5-gene
   risk model was available and practical. EMT pathways, allograft
   rejection, and JAK/STAT3 pathways were found to be highly enriched in
   the high KTRGs risk group. In gastric cancer, miR-192-5p/RB1 could
   promote EMT and induce Treg cell differentiation by regulating IL-10
   secretion [[130]37]. In prostate cancer, STAT3 inhibition by GPB730
   could enhance the antitumoral activity of anti-CTLA4 and decrease the
   intratumoral Treg frequency [[131]38]. Previous research revealed that
   disruption of the CXCL13/CXCR5 axis by gene knockdown impaired the
   production and recruitment of Tregs [[132]39], indicating that CXCL13
   could act as a target to eliminate the abundance of Tregs. In addition,
   we found multiple metabolism-related pathways highly enriched in low
   KTRG risk groups. Intratumoral Tregs could maintain the suppressive
   functions of low-glucose, hypoxic, and acidic environment on effector T
   cell functions [[133]40], which might explain the reason for the
   dysfunction of metabolism between high and low KTRGs risk group.
   Finally, we found that the high KTRG risk group showed a worse immune
   therapeutic response, and the KTRG risk score showed a high correlation
   with Tregs activity and cell fraction, revealing that risk score could
   act as a marker of precaution about whether a patient could benefit
   from immune therapy and whether it could be necessary to change the
   treatment strategy for him.

   All in all, therapies targeting Treg cells are under active
   exploration. A combination of Treg cell attenuation with the activation
   of tumour-specific effector T cells may mutually enhance each
   individual treatment [[134]7]. We identified some potential
   Tregs-related genes, and some of them might develop to the target
   treating Tregs to combine with the anti-tumour system. We wish these
   organized data could be investigated by further pre-clinical and
   clinical research.

Limitation

   Although the KTRG risk signature shows a robust ability to predict
   kidney cancer patients' prognosis and immune response, several
   limitations remain. The construction and validation of the KTRG risk
   signatures rely on public databases, which might contain inaccuracies,
   inconsistencies, or biases. Furthermore, whether the signature can
   serve as an independent predictor of prognosis and immune therapeutic
   response for RCC patients still requires prospective studies. The
   functions and molecular mechanisms of the identified genes are based on
   bioinformatic analysis and need further pre-clinical experiments for
   validation.

Supplementary Information

   [135]12672_2025_1787_MOESM1_ESM.pdf^ (1.9MB, pdf)

   Additional file 1: Figure S1 Based on the cut-off value per gene
   expression, the KIRC data was seperated into high and low expression
   groups, the kaplain-survival curve was depicted to show the survival
   probability between the two groups. Figure S2 Cross-talk and epigenetic
   alteration of KTRGs.Protein-protein interaction network of KTRGs.mRNA
   expression correlation diagram among KTRGs.Mutation characterization of
   KTRGs in the KIRC dataset.Copy number variation characterization of
   KTRGs in the KIRC dataset. Boxplot shows the correlation between copy
   number variation and mRNA expression of corresponding genes. *, P-value
   < 0.05; **, P-value < 0.01; ***, P-value < 0.001; ****, P-value <
   0.0001. Figure S3 Risk model validation.Kaplain-Meier survival curve
   shows the overall survival distinction between high and low KTRGs risk
   model in E-MTAB-1980 and KIRP cohort. Time-dependent ROC curve shows
   predictive accuracy of risk model in 3-/5-/7-year prognosis. Heatmap
   shows the KTRGs risk signature expression in high and low KTRGs risk
   groups. Figure S4 KIRCdataset was separated into different clinical
   subgroups, the Kaplain-Meier survival curve was depicted to demonstrate
   the survival difference. Figure S5 E-MTAB-1980 dataset was separated
   into different clinical subgroups, and the Kaplain-Meier survival curve
   was depicted to demonstrate the survival difference. Figure S6 Nomogram
   model construction and model evaluation in RCCdataset.The nomogram
   modelpredicts 3-/5-/7-year survival probability in the
   RCCdataset.Calibration and time-dependent ROC curve of nomogram
   model.The survival curve shows the prognostic difference between the
   high and low-risk groups based on the nomogram model.Decision curve
   analysis compares the predictive clinical benefit of risk score, age,
   gender, grade, stage, and nomogram model for a 3-/5-/7-year prognosis.
   Figure S7Univariate Cox regression analysis based on multiple clinical
   variables.Barplot shows the data missing rate.Multivariate Cox
   regression analysis based on variables including age, gender, grade,
   stage, and KTRGs risk score in KIRCand RCCdatasets. Figure S8 Barplot
   shows the Pearson correlation coefficient between KTRGs risk score and
   activity of cancer-related hallmark pathways. Figure S9The heatmap
   shows a correlation between mRNA expression of KTRGs and immune cell
   fraction.The boxplot shows the mRNA expression difference of CTLA4,
   PDCD1, and CD274 in the high and low KTRG risk groups.Correlation
   coefficient diagram between risk score and CTLA4, PDCD1, and CD274
   expression.Distribution of a number of responders and non-responders in
   high and low-risk groups.Boxplot shows the KTRGs risk score of
   respondents and non-responders. *, P-value < 0.05; **, P-value < 0.01;
   ***, P-value < 0.001; ****, P-value < 0.0001

Acknowledgements