Abstract
Regulatory T cells (Tregs) have been found to be related to immune
therapeutic resistance in kidney cancer. However, the potential
Tregs-related genes still need to be explored. Our study found that
patients with high Tregs activity show poor prognosis. Through
co-expression and differential expression analysis, we screened several
Tregs-related genes (KTRGs) in kidney renal clear cell carcinoma. We
further conducted the univariate Cox regression analysis and determined
the prognosis-related KTRGs. Through the machine learning
algorithm—Boruta, the potentially important KTRGs were screened further
and submitted to construct a risk model. The risk model could predict
the prognosis of RCC patients well, high risk patients show a poorer
outcomes than low risk patients. Multivariate Cox regression analysis
reveals that risk score is an independent prognostic factor. Then, the
nomogram model based on KTRG risk score and other clinical variables
was further established, which shows a high predicted accuracy and
clinical benefit based on model validation methods. In addition, we
found EMT, JAK/STAT3, and immune-related pathways highly enriched in
high risk groups, while metabolism-related pathways show a low
enrichment. Through analyzing two other external immune therapeutic
datasets, we found that the risk score could predict the patient's
immune therapeutic response. High-risk groups represent a worse
therapeutic response than low-risk groups. In summary, we identified
several Tregs-related genes and constructed a risk model to predict
prognosis and immune therapeutic response. We hope these organized data
can provide a theoretical basis for exploring potential Tregs' targets
to synergize the immune therapy for RCC patients.
Supplementary Information
The online version contains supplementary material available at
10.1007/s12672-025-01787-x.
Keywords: Tregs, RCC, Machine learning, Clinical model, Immune therapy
Introduction
Renal cell carcinoma (RCC), as one of the most commonly diagnosed
urological cancer, lead to over 179000 deaths every year in the world
[[32]1]. RCC comprises several subtypes with unique characteristics.
Clear-cell RCC is the most common subtype, accounting for nearly 70% of
the cases [[33]2, [34]3]. RCC is known to be a tumour with dysfunction
of the immune system, in which the potential mechanisms are supposed to
be the block of anti-tumour immune response via immune suppressive
factors, such as checkpoint molecules [[35]4]. Targeting immune
checkpoints results in clinical responses in some patients with RCC,
but a large proportion of patients do not benefit from this treatment
[[36]4, [37]5]. Therefore, the identification of potential biomarks for
immune therapeutic response is essetial to improve the clinical
efficacy of these therapies.
It has been reported that RCC tumours are severely infiltrated by T
cells. However, the T cells could not mount effective anti-tumour
efficacy [[38]4], which might probably be due to the suppressive
function of regulatory T cells (Tregs) and myeloid cells. Regulatory T
cells (Tregs) are an indispensable subset of T lymphocytes, crucial for
maintaining immune homeostasis and preventing autoimmunity by
suppressing excessive immune responses [[39]6]. In recent years, the
common immunological basis for Treg-mediated suppression of
autoimmunity and tumour immunity has been extensively explored. The
research suggests that the mounting infiltration levels of Tregs were
related to worse clinical outcomes in multiple tumours [[40]7]. The
depletion of Tregs could elicit effective anti-tumour immunity by
eliminating immunological unresponsiveness to syngeneic tumours,
although it may also result in autoimmunity, especially if Treg cells
are depleted systemically [[41]7, [42]8]. In the immune
microenvironment of RCC, Tregs play a nonnegligible role in RCC
progression. For instance, CD4 + FOXP3 + Tregs are involved with an
increased recurrence of RCC [[43]9]. The PD1 expression on Tregs is
correlated with resistance to anti-PD1 in metastatic clear cell RCC
(mccRCC) [[44]10]. In addition, high infiltration of Tregs is also
associated with resistance to antiangiogenic therapy in mRCC [[45]11].
These results suggest that Tregs-targeted therapy might synergize with
anti-PD1 therapy for patients with RCC. So far, research has explored
the potential biomarkers of Tregs in RCC [[46]12, [47]13]. However, few
studies excavated the potential Tregs-related genes, and there is a
shortage of clinical models based on Tregs to predict the immune
therapeutic efficacy.
In our study, we screened several potentially important Tregs-related
genes in RCC using machine learning algorithms and constructed the risk
model called the KTRGs risk model. Our model could predict prognosis
and turn out to be an independent prognostic factor for patients with
RCC. Mechanism analyses reveal that the KTRG risk model might be mainly
related to EMT, JAK/STAT3, immune, and metabolism-related pathways or
biological processes. Furthermore, KTRGs risk model could predict the
immune therapeutic response well. The genes in the model might be the
potential targets of Tregs, which could be used to synergize immune
therapy. All in all, our research provided a theoretical foundation for
treating Tregs against patients with RCC.
Methods
Data acquisition
We selected datasets from patients diagnosed with RCC for our study,
including TCGA-KIRC, TCGA-KIRP, and E-MTAB-1980 datasets. Patients were
eligible for this study if their RCC samples included mRNA expression
data and clinical data (at least including age, gender, histological
grade, pathological stage, overall survival time, and overall survival
status). The mRNA experssion data of KIRC and KIRP datasets (version:
07–19-2019) were obtained from the UCSC Xena website based on TCGA
database ([48]http://xena.ucsc.edu/). KIRC contains contains 537
tumours and 72 normal samples, KIRP contains 291 tumours and 88 normal
tissue in KIRP. The mRNA expression of the E-MTAB-1980 dataset (Release
Date: 16 October 2013) was collected from the ArrayExpress database
([49]https://www.ebi.ac.uk/arrayexpress), which includes 100 tumour
samples. The immune therapeutic dataset—Alexandra cohort, was obtained
from previous research [[50]14], which comprises 25 tumour samples;
[51]GSE78820 was obtained from the GEO database
([52]https://www.ncbi.nlm.nih.gov/gds), which contains 25 tumour
samples. KIRC, KIRP, Alexandra, and [53]GSE78820 mRNA expression format
was transformed into TPM units. The E-MTAB-1980 dataset contains the
normalized mRNA expression profile that was to be downloaded. The KIRC
dataset was primarily used to construct the risk score and explore
molecular mechanisms. The KIRP and E-MTAB-1980 datasets were used to
validate the practicability of the risk score. The immune therapeutic
datasets, Alexandra and GEO cohorts, were used to explore the
predictive accuracy of the risk score on immune therapeutic response.
Pearson correlation analysis
For two successive variables, the Pearson method was used to estimate
the correlation between them by calculating the Pearson correlation
coefficient (PCC) based on “rcorr” R function from “Hmisc” R package.
The p-value was adjusted by false discovery rate (FDR) based on
“p.adjust” R function. The absolute PCC value > 0.4 and adjusted
p-value < 0.05 were considered to be a threshold to prove that the two
variables have a positive or negative correlation.
Differential genes screening
The tumour and normal mRNA expression profiles were marched, and
differential expression analysis was conducted based on the “limma”
R-package (version: 3.52.1). The Benjamini–Hochberg method was used to
adjust the P-value. The absoluted logFC (log2 fold change) > 2, and
adjust p-value < 0.05 were regarded as a threshold to estimated whether
a gene differently expressed between tumor and normal or not.
Survival analysis
Gene expression data per sample was matched with their own clinical
overall survival (OS) data based on their own unique patient ID, and
then the univariate Cox regression analysis was further performed using
the “coxph” R function from “survival” R package. The univariate Cox
regression analysis is a proportional hazards model, which produce the
hazard ratio (HR) to estimate whether the expression of a gene is a
risky or protective factor. Generally speaking, the variables with
HR > 1 and p-value < 0.05 are regarded as risky factors, while the
variables with HR < 1 and p-value < 0.05 are refered to as protective
factors. Then, the sample was separated into high and low-expression
groups based on single gene expression value, the ideal cut-off value
per gene was calculated by “surv_cutpoint” R function from the
“survminer” R package; the Kaplain-Meier survival curves were depicted
to demonstrate the survival difference between the two groups, and a
log-rank test was performed to evaluate the statistical significance
based on the “survival” R-package.
Gene set variateion analysis
Cancer-related hallmark gene sets were acquired from the MSigDB
database ([54]https://www.gsea-msigdb.org/gsea/downloads.jsp), and the
progery pathway signature was obtained from previous research [[55]15].
The gene sets and whole-genome expression profile of TCGA-KIRC were
presented to “GSVA” R package, and pathway activities per sample were
finally calculated. Activity scores of pathways and gene expression
profiles were subsequently merged by matching with the same sample
names. Then then Pearson correlation coefficients (PCC) between gene
expressions and pathway activities were calculated using the “rcorr”
function from the “Hmisc” R-package. The P-value was adjusted by false
discovery rate and transformed to -Log10(adjusted p-value) format to
represent significance. The more signficant the correlation between two
variables, the larger the size of bubble in heatmap.
Identification of important genes and KTRGs risk score construction
The KTRGs expression profile merged with clinical overall survival data
was submitted to the “Boruta” algorithm from “Boruta” R pakage. Then,
three types of KTRGs were determined, including confirmed (i.e.,
important) genes, tentative genes (i.e., probably important), and
rejected (i.e., unimportant) genes. The KIRC(TCGA) data was divided
into training and testing cohorts by a ratio of 7:3. The confirmed and
tentative genes were further utilized to construct a risk model through
lasso-cox regression analysis in the training cohort based on “glmnet”
R package. Five genes were further determined, and the risk score was
calculated using the lasso coefficient per gene times mRNA expression:
[MATH: Risk score=Coefgene1∗Expressiongene1
+...+Coefgene5∗Expressiongene5
:MATH]
Then, the risk score was further utilized to predict the overall
survival of patients. Based on the optimal cut-off value of risk score,
the high and low-risk groups were further determined in the training
cohort, and then the Kaplain-Meier survival curve was portrayed to
illustrate the survival difference between these two groups. ROC curve
represents the accuracy of riskscore on predicting 3-/5-/7-year
prognosis of the patients. Then, the same analyses were conducted in
the internal testing cohort and the other two external validation
cohorts—E-MTAB-1980 and KIRP(TCGA).
Risk model evaluation
We downloaded clinical data for KIRC, encompassing additional
pharmaceutical therapy, radiation therapy, surgery (locoregional and
metastatic procedures), age, gender, neoadjuvant treatment history,
dimensions (intermediate and longest), laterality, grade, and stage,
along with therapeutic outcomes. We matched these variables with risk
scores and performed univariate Cox regression and data missing rate
analyses. Variables with high missing rates or impractical results from
the regression were excluded from further analysis. The risk score
merged other clinical variables, including age, gender, laterality,
grade, and stage, and then the multivariate Cox regression was further
carried out. The factor with p-value < 0.05 during multivariate Cox
regression was regarded as an independent prognostic factor. Then the
data were separated into different clinical subgroups (i.e., age,
age ≤ 60 & age > 60; gender, female & male; grade, grade I-II & grade
III-IV; stage, stage I-II & stage III-IV). The risk score of each
subgroup was computed and the high and low risk groups were established
in the light of optimal cut-off value. Kaplain Meier survival curve was
portrayed to demostrate the difference between high and low risk groups
across sub-groups.
Nomogram model established and model evaluation
The variables risk score, age, grade, and stage were conducted using
multivariable Cox regression analysis, and then stepwise regression
analysis was conducted utilizing the “step” R function. The risk score,
age, grade, and stage were finally screened to construct a nomogram
model. The calibration curve, time-dependent ROC curve, and decision
curve analysis (DCA) were depicted to evaluate model accuracy and
clinical benefit based on “rms”, “survivalROC”, “ggDCA” R package,
respectively. Kaplain-Meier survival curves were portrayed to
demonstrate the survival difference between high and low-risk scores
constructed by the nomogram model.
Mechanism analyses
The GSEA software (Version: GSEA-4.0.3) was downloaded from the GSEA
website ([56]https://www.gsea-msigdb.org/gsea/downloads.jsp). The mRNA
expression profile of the KIRC(TCGA) dataset to gene set enrichment
analysis (GSEA) software. The cancer-related hallmark pathway gene sets
and gene ontology—biological progression gene sets were downloaded from
a public website ([57]https://www.gsea-msigdb.org/gsea/index.jsp). The
metabolism-related gene sets were downloaded from previous research
[[58]16]. Then, the tumor samples with whole genome experssion profile
was ranked by risk groups, high risk group was arranged in front of the
low risk group. The expression profile and gene sets were submitted to
GSEA software to estimate potentially differentially enriched pathways.
For the gene set variation analysis (GSVA), we first compute the
hallmark pathway activity score, Then the pathway activity score and
risk score were further used to calculate the Pearson correlation based
on the Pearson method.
Tumor immune infiltration analyses and immune therapeutic estimation
The gene sets representing twenty-four types of immune cell were
acquired from previous study [[59]17]. Then we applied the
single-sample gene set enrichment (ssGSEA) method based on “GSVA”
R-package to calculated the immune activity score of cells per sample.
The proportion of 22 types of immune cell per samples was evaluated by
Cell Type Identification by Estimating Relative Subsets of RNA
Transcripts (CIBERSORT). The immune cell activity score and immune cell
proportions were further used to explore the correlation with risk
score. For the immune therapeutic estimation, we first using the tumour
immune dysfunction and exclusion (TIDE) analysis, which is a public
website merged with large scale omics data and biomarkers based on the
immune checkpoint blockade (ICB) trials [[60]18]
([61]http://tide.dfci.harvard.edu). Based on the TIDE score, the
patients was referred to as responder and non-responder, which were
further used to analyze the relation to risk score.
Statistic analysis
We utilized the unpaired t-test to estimate the difference between the
two continuous variables using “t.test” R function. The chi-square test
was used for contingency table data to examine the distribution
distinction utilizing “chisq.test” R function. The Kaplain-Meier curve
was depicted to portray the survival difference between the two groups,
and the log-rank test was applied to assess the significance using
“coxph” R function from “survival” R package. We estimated the
correlation between the two continuous variables by calculating the
Pearson correlation coefficient (PCC) using “rcorr” R function from
“Hmisc” R package. For all analyses, a p-value less than 0.05 was
regarded as significant. *P-value < 0.05; **P-value < 0.01;
***P-value < 0.001; ****P-value < 0.0001.
Results
Identification of KIRC Tregs-related genes (KTRGs)
Through the ssGSEA analysis, we figured out the Tregs activity score of
KIRC. Based on the optimal cut-off value, the KIRC samples were
separated into high- and low-Treg activity groups. We observed a poor
prognosis that occurs in high Tregs activity groups (Fig. [62]1A),
which lures us to further explore the potential reason and mechanism of
Tregs on KIRC prognosis. The first thing we did was to determine the
Tregs-related genes. To do this, we calculated the Pearson correlation
coefficients (PCC) between each gene expression and Tregs score based
on the whole genome expression profile of KIRC(TCGA). Those with
absolute PCC > 0.4 and P-value < 0.05 were identified as Treg-related
genes (Fig. [63]1B). Then we carried out differential expression
analysis by comparing the tumour with the normal expression profile of
KIRC with a filtering threshold based on absolute log2(Fold Change) > 2
and adjusted P-value < 0.05, differential expression genes (DEGs) were
finally screened (Fig. [64]1B). 76 KIRC Tregs-related genes (KTRGs)
were eventually determined by overlapping Treg-related genes and DEGs.
The heatmap revealed that these genes' expressions could rightly
discriminate between tumour and normal samples (Fig. [65]1C). When the
KTRGs expression profile was conducted dimensionality reduction by
t-distributed Stochastic Neighbor Embedding (t-SNE), we also observed
tumour and normal samples harboured in different areas in a
two-dimension axis, indicating that the expression of these genes could
distinguish the tumour and the normal tissues well (Fig. [66]1D). We
then presented the mRNA expression profile of KTRGs to clinical
survival data, and a univariate Cox regression analysis was
subsequently performed. 22 genes of 76 KTRGs were estimated to be
associated with the overall survival of KIRC. Among these, SPI1, TYMP,
LILRB1, IL20RB, ITGAX, C1QL1, FCER1G, SLAMF8, TGFBI, CCL5, and CXCL13
were risky genes to the prognosis of KIRC patients, high expression of
which represent a worse survival probability (Fig. [67]1E and Figure
S1). While OGDHL, MTURN, HSD11B2, PPARGC1A, LDHD, FREM1, NE3C2, TRIM2,
SLAMF8, AGPAT9, and WDR72 were protective genes to OS of the patients
and low expression of which represent a worse clinical outcome. In the
protein–protein interaction (PPI) network, HSD11B2, NR3C2, and PPARGC1A
show a protein interaction, SLAMF8, FCER1G, ITGAX, LILRB1, SPI1, CCL5,
and CXCL13 show a protein interaction, while others do not show a
relationship (Figure S2A). In mRNA levels, WDR72, MTURN, NR3C2, AGPAT9,
TRIM2, PPARGC1A, OGFHL, LDHD, FREM1, and HSD11B2 show positive
correlation with each other and negative correlation with other genes
(Figure S2B). We further analyzed the epigenetic characterization, and
we found that the mutation rate of all KTRGs is low (i.e., mutation
rate < 5%), but the copy number variation (CNV) of some is exuberant
(Figure S2C and D). Among these, TGFBI, SLAMF8, MTURN, LILRB1, LDHD,
ITGAX, IL20RB, and HSD11B2 show high amplification (Amp) rate, TRIM2,
OGDHL, NR3C2, IL20RB, FCER1G, and ALDH6A1 show high deletion (Del)
rate. The expression of IL20RB, TRIM2, NR3C2, CXCL13, SLAMF8, and
FCER1G are affected by copy number variation, while other genes do not
show a significant effect.
Fig. 1.
[68]Fig. 1
[69]Open in a new tab
Identification of KIRC Tregs-related genes. A Kaplain-Meier survival
curve reveals the survival difference of the patients in high and low
Tregs activity groups. B Schematic shows the process about screening
KTRGs. C Heatmap shows the expression difference of KTRGs between tumor
and normal groups. D The t-SNE analysis based on the expression of
KTRGs between normal and tumor tissues. E The forest plot shows the
prognosis-related KTRGs determined by the univariate Cox regression
analysis
Clinical relevance and mechanisms of KTRGs
Apart from the relationship between prognosis and KTRGs mentioned, we
matched the clinical stage and grade data and mRNA expression profile
of KTRGs. Analysis of variance (ANOVA) was performed to estimate the
difference of genes in different groups (i.e., Grade I—IV). Gene
expression levels were shown by heatmap. We found OGDHL, PPARGC1A,
LDHD, AGPAT9, ALDH6A1, WDR72, HSD11B2, FREM1, MTURN, NR3C2, and TRIM2
decrease little by little with the advance of grade and stage, whereas
TGFBI, IL20RB, C1QL1, CCL5, CXCL13, TYMP, ITGAX, LILRB1, SLAMF8, SPI1,
and FCER1G, gradually increase with development of disease (Fig. [70]2A
and B). Through analyzing the correlation between 50 hallmark pathway
activities and KTRGs expression, we found several genes (i.e., TGFBI,
SPI1, SLAMF8, LILRB1, etc.) expression positively correlated with
cancer-related pathways such as G2M checkpoint, E2F targets, Hedgehog
signalling, Wnt pathways, P53 pathways, etc. PPARGC1A, OGDHL, ALDH6A1,
and AGPAT9 have a positive correlation with adipogenesis, fatty acid
metabolism, heme metabolism, bile acid metabolism, androgen response
(Fig. [71]2C). PROGENy algorithm (Pathway RespOnsive GENes) concludes
several cell-specific signalling pathways [[72]15]. Consistently,
TGFBI, SPI1, SLAMF8, LILRB1, FCER1G, CXCL13, CCL5, and C1QL1 show a
positive correlation with most cancer-related pathways. WDR72, TRIM2,
PPARGC1A, FREM1, ALDH6A1, and AGPAT9 show a negative correlation with
most cancer pathways but a positive correlation with androgen pathways
(Fig. [73]2D). All in all, these KTRG expressions are related to tumour
progression and might function differently by regulating specific
metabolisms.
Fig. 2.
[74]Fig. 2
[75]Open in a new tab
Clinical relevance and mechanism analysis of KTRGs. A The heatmap shows
the expression of KTRGs across different grades. B The heatmap shows
the expression of KTRGs across different stages. C The heat map shows
the correlation between KTRGs and activity scores in cancer-related
hallmark pathways. D The heat map shows the correlation between KTRGs
and the activity score of the progery pathway signature. The red bubble
represents a positive correlation; the blue bubble represents a
negative correlation. The size of the bubble shows significance; the
larger the size, the more significance it has. *P-value < 0.05;
**P-value < 0.01; ***P-value < 0.001; ****P-value < 0.0001
KTRGs risk signature construction
To further identify the potential important KTRGs, we matched the KTRGs
expression profile with KIRC clinical data, and then the data was
submitted to the Boruta algorithm. Three types of KTRGs, including
confirmed (i.e., important) genes, tentative genes (i.e., probably
important), and rejected (i.e., unimportant) genes, were finally
determined. Among these, CXCL13, WDR72, TRIM2, ALDH6A1, NR3C2,
PPARGC1A, and IL20RB were defined as confirmed genes, SLAMF8, FCER1G,
LDHD, MTURN, TYMP, and OGDHL were defined as tentative genes, while
others were considered as rejected genes (Fig. [76]3A). Then, we
separated the KIRC(TCGA) datasets into two cohorts, training and
testing cohorts, according to the ratio of 7:3. We dismissed the
rejected genes and abstracted the expression profile of confirmed and
tentative genes from the training cohort to conduct lasso-cox
regression analysis. Five genes were finally screened, including NR3C2,
WDR72, ALDH6A1, CXCL13, and IL20RB. Interestingly, the genes identified
by the lasso-cox algorithm are all confirmed genes, indicating the
importance of these genes for prognosis under these two algorithms.
Based on the lasso coefficient per gene, a risk score was finally
calculated according to the equation:
[MATH: KTRGs risk score=0.01755748∗CXCL13<
/mtext>-0.15617289∗WDR72
mtext>-0.04032272∗ALDH6A1
-0.16222042∗NR3C2
mtext>+0.01788896∗IL20R
B :MATH]
Fig. 3.
[77]Fig. 3
[78]Open in a new tab
KTRGs risk model construction. A Schematic shows the process about
constructin KTRGs risk model. B and C Kaplain-Meier survival curve
shows the overall survival distinction between high and low KTRGs risk
model in KIRC(TCGA) training and testing cohort. Time-dependent ROC
curve shows predictive accuracy of risk model in 3-/5-/7-year
prognosis. Heatmap shows the KTRGs risk signature expression in high
and low KTRGs risk groups
According to the ideal cut-off value estimated by the surv_cutpoint
function from the survminer R-package, the KIRC training cohort was
divided into high and low-expression groups. Survival curve analysis
shows that high risk group patients are faced with the problem of low
survival probability (Fig. [79]3B). From ROC curve analysis, we
perceived the area under curves (AUC) at 3-/5-/7-year were 0.6915,
0.7025, and 0.7050, indicating that the KTRG risk score is highly
accurate in predicting the prognosis of KIRC patients. The heat map
shows that NR3C2, WDR72, and ALDH6A are highly enriched in the low-risk
group, while CXCL13 and IL20RB are highly enriched in the high-risk
group. The same analysis was performed in KIRC testing cohort, and we
observed a similar result (i.e., For ROC, AUC at 3 years = 0.7212, AUC
at five years = 0.7392, AUC at 7 years = 0.7107) (Fig. [80]3C). We
utilized other two RCC datasets to further validation, and the results
bear resemblance to KIRC datasets (i.e., For E-MTAB-1980 ROC, AUC at
three years = 0.7309, AUC at five years = 0.7455, AUC at
7 years = 0.6919; For KIRP ROC, AUC at three years = 0.5940, AUC at
five years = 0.6196, AUC at seven years = 0.6097) (Figure S3A and B).
These results revealed that the KTRG risk score is practical and
accurate for predicting outcomes of patients with kidney cancer.
Clinical relevance and robustness of KTRGs risk score
To further explore the relationship between KTRG risk score and
clinical variables, we matched the clinical variables (i.e.,
pathological stage, histological grade, Gender, and Age) with a 5-gene
KTRG expression profile in two RCC cohorts. We found that KTRG risk
groups were not correlated with patients' age in the KIRC cohort but
correlated with stage and grade, appearing that more advanced stage and
grade harboured in high-risk groups (Fig. [81]4A). CXCL13 and IL20RB
are highly enriched in high-risk groups, indicating that the expression
of these two genes might promote disease progression. We then divided
the clinical data into different clinical sub-groups and matched them
with the corresponding mRNA expression profile of KTRGs. Intriguingly,
we found that, whatever the subgroup is, patients' outcomes in the
high-risk group are always poorer, indicating that the KTRG risk score
is robust and steady, which could predict the prognosis of patients in
every sub-classes of kidney cancer (Figure S4). The same analysis was
performed in the E-MTAB-1980 cohort, and we observed the same results
(Fig. [82]4B and Figure S5).
Fig. 4.
[83]Fig. 4
[84]Open in a new tab
Clinical relevance of risk model. A Heatmap shows the five gene
expressions in high and low KTRG risk groups and the distribution of
clinical variables (including pathological stage, histological grade,
gender, age, OS status, and OS time) in high and low KTRGs risk group
in KIRC(TCGA) dataset. B Heatmap shows the 5 gene expression in high
and low KTRG risk groups and the distribution of clinical variables
(including pathological stage, histological grade, gender, age, OS
status, and OS time) in high and low KTRGs risk group in the
E-MTAB-1980 dataset. *P-value < 0.05; **P-value < 0.01;
***P-value < 0.001; ****P-value < 0.0001
Clinical nomogram model construction
To develop the clinical nomogram, we initially analyzed the correlation
between various clinical variables and the prognosis of patients with
KIRC (Figure S7A). Our analysis revealed that additional pharmaceutical
therapy, additional radiation therapy, age, history of neoadjuvant
treatment, tumor size, grade, stage, chemotherapy outcomes (Progressive
Disease [PD], Stable Disease [SD], Complete Response [CR], Partial
Response [PR]), laterality, and risk score are significantly associated
with patient prognosis. However, we observed that patients who received
additional pharmaceutical therapy, radiation therapy, and neoadjuvant
treatment exhibited a higher risk for poor outcomes, which contradicted
our expectations. Upon further investigation into missing clinical
data, we identified several variables with a high rate of data absence
(> 20%) (Figure S7B). To ensure data integrity and reflect clinical
reality, we selected a refined set of variables for further analysis,
including age, risk score, gender, grade, laterality, and stage. In a
multivariate Cox regression analysis, we identified the Kidney Cancer
Risk Group (KTRG) risk score, grade, age, and stage as independent
prognostic factors (Figure S7C). The E-MTAB-1980 dataset, which only
included age, gender, grade, and stage variables, was used to perform a
multivariate Cox regression analysis, reaffirming that the KTRG risk
score remained an independent prognostic factor (Figure S7D).
Subsequently, we subjected the variables (age, stage, grade, and KTRG
risk score) from the KIRC dataset to stepwise regression analysis.
Based on the lowest Akaike information criterion (AIC) value within the
KIRC (TCGA) dataset, these four variables were deemed suitable for
constructing a clinical nomogram model to predict the 3-, 5-, and
7-year survival probabilities (Fig. [85]5A). The calibration curve and
ROC curve revealed that the model possesses a high discriminative
accuracy (AUC at three years = 0.8081, AUC at five years = 0.761, AUC
at seven years = 0.7394) (Fig. [86]5B and C). The nomogram risk score
was further calculated based on the established model, and the high and
low model risk scores showed extremely significant differences in
overall survival probability (i.e., after 100 months, the survival
probability of patients with a high model risk score is proximately
zero, while in low model risk score are fifty per cent or so)
(Fig. [87]5D). The result reveals that the nomogram risk model could
stratify the patients with different prognoses more effectively and
discriminatively. This is indeed the case when observing decision curve
analysis (DCA). The nomogram model shows a maximum benefit in
predicting a 3-/5-/7-year prognosis of patients compared to other
single variables (Fig. [88]5E). The same analysis recurred in
E-MTAB-1980 datasets, and the results are consistent with KIRC(TCGA)
dataset (i.e., for ROC curve, AUC at three years = 0.8974, AUC at five
years = 0.8805, AUC at seven years = 0.8241; for survival curve, the
survival rate of patients in high-risk group is forty per cent while in
low-risk group are eighty per cent about, after fifty mouths) (Figure
S6).
Fig. 5.
[89]Fig. 5
[90]Open in a new tab
Nomogram model construction and model evaluation in the KIRC(TCGA)
dataset. A The nomogram model (including risk score, age, stage, and
grade) predicts 3-/5-/7-year survival probability in the KIRC(TCGA)
dataset. B and C Calibration and time-dependent ROC curve of nomogram
model. D The survival curve shows the prognostic difference between the
high and low-risk groups based on the nomogram model. E Decision curve
analysis compares the predictive clinical benefit of risk score, age,
gender, grade, stage, and nomogram model for a 3-/5-/7-year prognosis
Differentially enriched pathways between high and low KTRGs risk groups
Given the differential outcomes of the high and low KTRG risk groups,
we try to explore the potential distinct mechanisms between these two
groups. We first submitted the mRNA expression profile of KIRC(TCGA) to
gene set enrichment analysis (GSEA) software, and then the potential
cancer-related hallmark pathways were further explored. We found
several pathways highly enriched in the high KTRGs risk group, and the
top three are the EMT pathway, Allograft rejection, and IL6 JAK STAT3
signalling (Fig. [91]6A). Several metabolism-related pathways are
highly enriched in the low KTRGs risk group, such as heme metabolism,
fatty acid metabolism, and adipogenesis (Fig. [92]6B). Further analysis
combining the pathways activity score and risk score to conduct Pearson
correlation coefficient, the results represent that KTRGs risk score
highly positively correlated with MYC TARGET V2, IL6 JAK STAT3,
ALLOGRAFT REJECTION, DNA REPAIR, and UNFOLDED PROTEIN RESPONSE, and
negatively correlated with FATTA_ACID_METABOLISM, PROTEIN_SECRETION,
ADIPOGENESIS, ANDROGEN_RESPONSE, HEME METOBOLISM, etc. (Figure S8).
When analyzing the biological process, we found several immune
processes highly enriched in high KTRG risk groups, including
antimicrobial humoral response, antibacterial humoral response, and
regulation of acute inflammatory response to antigenic stimulus
(Fig. [93]6C). Furthermore, the metabolism process, in terms of lipid
modification, lipid oxidation, and negative regulation of the amide
metabolic process, is mainly enriched in low KTRG risk groups
(Fig. [94]6D). Considering the metabolism-related pathways and
processes consistently differential enrichment, we collected some
metabolism-related signatures such as energy, amino acid, vitamin,
nucleotide, carbohydrates, and TCA and submitted them onto GSEA
software. The results are consistent, and all these metabolism
signatures are highly enriched in low KTRG risk groups (Fig. [95]6E and
F). These results suggested several hallmark pathways, metabolism
signatures, and some immune mechanisms occurred differentially
enrichment between the two risk groups.
Fig. 6.
[96]Fig. 6
[97]Open in a new tab
Pathways enrichment analysis. A-D Top 6 enriched cancer-related
hallmark pathways and biological process in high KTRGs group. E and F
Metabolism-related pathways enrichments between high and low KTRGs
groups
Immune relevance and immune therapeutic response of risk score
Previous research pointed to the effect of Tregs on the immune system
and even on immune therapeutic response. Respecting the differential
enrichment of the immune process mentioned above we had analyzed, we
further explored the relevance of immune and KTRG risk scores. We found
the fraction of some anti-tumor immune cells, such as T cells CD8 and T
cells CD4, is higher, while B cell naive, NK cell activated, and M1
macrophages are lower in the high KTRGs risk group. Needless to say,
the Tregs fraction is higher in the high-risk group. However, the
pro-tumor immune cells—M2 macrophage fraction decreased in the
high-risk group (Fig. [98]7A). For immune cell activity, interestingly,
B cells, CD8 + T cells, NK cells, Cytotoxic cells, and Tregs
consistently increase in high-risk groups. There is no doubt that the
KTRG risk score shows the highest positive correlation with the Tregs
activity score but shows a less positive correlation with other cells,
such as CD8 T cells and NK cells (Fig. [99]7B). In addition, the KTRG
risk score shows a negative correlation with several immune cell
fractions, including monocytes, NK cells resting, and M1/M2 macrophages
(Figure S9A). Besides Tregs, the risk score also shows a low positive
correlation with the fraction of CD8 T cells and T cells CD4 memory
activated. In addition, the immune checkpoint, CTLA4 and PDCD1 (PD1)
are highly expressed in the high-risk group, but CD274 (PDL1) is highly
expressed in the low-risk group (Figure S9B). The risk score is highly
positively correlated with mRNA expression of PDCD1 and CTLA4 but
negatively correlated with CD274 (Figure S9C). All in all, the
relationship between the KTRG risk score and other immune cells is
extremely complex, and we can not predict the potential immune
therapeutic response among these results. Therefore, we conducted
immune therapeutic response prediction by TIDE analysis. Notably, we
found that most non-responders were distributed in the high-risk group,
whereas most of the respondents were located in the low-risk group
(Figure S9D). In addition, the risk score is higher in no-responders
than that in responders (Figure S9E). From this humble evidence, we
could speculate that the KTRG risk score could act as a potential
predictor for cancer patients who receive immune therapy. We obtained
two external immune therapy datasets (Alexandra and [100]GSE78820).
After merging the risk score based on their own KTRGs mRNA expression
profile and clinical variables (such as survival data and immune
therapeutic response data), the subsequent analyses were further
performed. Similar to the results above, the high-risk group showed a
low survival probability in these two datasets (Fig. [101]8A and E).
Risk score also shows high predicted accuracy (for the Alexandra
cohort, AUC at one year = 0.6410, AUC at three years = 0.7868; for the
[102]GSE78820 cohort, AUC at one year = 0.7072, AUC at three
years = 0.7229) (Fig. [103]8B and F). Furthermore, the patients that
prove to be non-responders (SD/PD) show a higher KTRGs risk score
(Fig. [104]8C and G). In the same way, the patients in the high-risk
group primarily harboured in the no-responder group, while those in the
low-risk group were mainly distributed in the responder (CR/PR) group
(Fig. [105]8D and H). These results provide potential evidence for our
hypothesis that our KTRG risk score could be employed to act as a
predictor for predicting immune therapeutic outcomes.
Fig. 7.
[106]Fig. 7
[107]Open in a new tab
Correlation between KTRGs and immune system. A The boxplot shows the
difference in immune cell fraction and activity score in high and low
KTRG risk groups. B The heatmap shows a correlation between mRNA
expression of KTRGs and immune cell activity scores. *P-value < 0.05;
**,P-value < 0.01; ***,P-value < 0.001; ****P-value < 0.0001
Fig. 8.
[108]Fig. 8
[109]Open in a new tab
Immune therapeutic response prediction. A and E Kaplain-Meier survival
curve reveals the survival difference in the high KTRGs risk group in
two immune therapeutic cohorts. B and F Time-dependent ROC curve of
KTRGs risk score on predicting 1-/3-year survival. C and G KTRGs risk
score difference between CR/PR and SD/PD cohorts in two immune
therapeutic cohorts. D and H The distribution of high and low-risk
groups in CR/PR and SD/PD cohorts in two immune therapeutic cohorts.
*,P-value < 0.05; ***,P-value < 0.001; ****,P-value < 0.0001
Discussion
The most common treatment strategy based on cancer immunology is the
blockade of co-inhibitory molecules, also known as immune checkpoints
(ICs). The key IC targets are Cytotoxic T lymphocyte antigen-4
(CTLA-4), programmed cell death 1 (PD-1, also known as PDCD1) and its
ligand PD-L1 (also known as CD274) [[110]19–[111]21]. The transferation
of T cells to effector T cells needs T cell receptor (TCR) activation
and CD28 co-activation. However, CTLA4 and PD1 block this process with
their own pathways [[112]22]. Therefore, antibodies against CTLA4 and
PD1 are used as a strategy for sustaining anti-tumour response. Though
the immune checkpoint blockade (ICB) achieved unparalleled success,
more than 50% of cancer patients, also including RCC patients, could
not benefit from these therapies due to low or no response to them
[[113]23]. Thus, it is urgent to explore the potential mechanism of
therapeutic resistance and search for a combination treatment strategy.
CD4 + regulatory T cells (Tregs), which express CD25 and CTLA4 on the
cell surface and the transcription factor Forkhead box P3 (Foxp3) in
the nucleus, play essential roles in maintaining peripheral tolerance
and preventing autoimmunity. However, the accumulation of Tregs could
suppress anti-tumour immunity [[114]7, [115]24]. Tregs secrete various
immunosuppressive cytokines to lead to T cell dysfunction [[116]25]. In
kidney cancer, research shows that Tregs could effectively inhibit
activated T cell proliferation [[117]26]. High infiltration of Tregs is
associated with resistance to immune therapy [[118]10]. RCC, especially
ccRCC, is featured by rich leukocyte infiltrates, such as CD8 + T
cells, CD4 + T cells and natural killer (NK) cells [[119]27]. It has
been found that a high proportion of tumour-infiltrating CD8 + T cells
was related to prolonged patient outcomes. However, the abundance of
intratumoral CD8 + and CD4 + T cells has been correlated with worse
patient survival, suggesting that the status of these cells is an
essential determinant of effective anti-tumour activity [[120]28]. The
anti-tumour function of these tumour-infiltrating lymphocytes (TILs) is
usually inhibited by Tregs, macrophages, and neutrophils in the tumour
microenvironment (TME). Therefore, exploring the potential mechanism of
the inability of anti-tumour immune based on Tregs and developing the
potential therapeutic targets on Tregs in RCC is necessary. So far,
there are possible mechanisms related to how Tregs suppress the
anti-tumour immune system. For instance, IL35 has been found to be a
potentially important Treg suppressive mechanism in tumour models
[[121]29]. T cell Ig and ITIM domain (TIGIT) is upregulated on Tregs
[[122]30, [123]31], which marks a Tregs subpopulation that is highly
suppressive against Th1, Th17, and CD8 + T cell function [[124]32].
These findings provide opportunities to develop novel Treg-targeted
interventions.
As it had come to be known [[125]7], in our study, we also find that
high infiltration of Tregs is associated with poor prognosis of RCC
patients, thus luring us to excavate the Tregs-related genes further.
Through Pearson methods and differential differential expression
analysis, we finally identified Tregs-related genes associated with
KIRC. Multivariate Cox regression and two machine learning methods,
boruta and Lasso Cox regression, were further performed and finally
determined five potentially important KTRGs, including NR3C2, WDR72,
ALDH6A1, CXCL13, and IL20RB. Among these, several genes have been found
to be associated with the function of Tregs. The tumour-infiltrating
CXCL13-producing T follicular helper cells (TFH cells) differentiation
plays a key role in converting Treg-mediated immune suppression to de
novo activation of adaptive antitumor humoral responses in breast
cancer [[126]33]. In gastric cancer, CXCL13 + CD8 + T cells represented
an exhausted CD8 + T cell subset, which predicted poor overall survival
and was associated with immunoevasive contexture with increased Tregs
[[127]34]. Cancer-associated fibroblasts could increase the production
of CXCL13 and reduce the availability of IL-2 by expanding a population
of activated Treg [[128]35]. In our study, CXCL3 shows a positive
correlation with Tregs activity and poor prognosis of KIRC, revealing
that Tregs might synergize with CXCL3 to regulate RCC biological
behaviours. IL20RB has been reported as positively correlated with the
number of Tregs [[129]36], while the correlation of other genes with
Tregs has not been reported. Therefore, there is still a need to
explore the function and mechanism of these KTRG genes on Tregs in RCC.
Based on the mRNA expression of these five KTRGs, a risk model was
established, and the model was effective in predicting prognosis. It
also showed independent prognostic factors and higher robustness by
analyzing multiple RCC datasets. These results indicate that the 5-gene
risk model was available and practical. EMT pathways, allograft
rejection, and JAK/STAT3 pathways were found to be highly enriched in
the high KTRGs risk group. In gastric cancer, miR-192-5p/RB1 could
promote EMT and induce Treg cell differentiation by regulating IL-10
secretion [[130]37]. In prostate cancer, STAT3 inhibition by GPB730
could enhance the antitumoral activity of anti-CTLA4 and decrease the
intratumoral Treg frequency [[131]38]. Previous research revealed that
disruption of the CXCL13/CXCR5 axis by gene knockdown impaired the
production and recruitment of Tregs [[132]39], indicating that CXCL13
could act as a target to eliminate the abundance of Tregs. In addition,
we found multiple metabolism-related pathways highly enriched in low
KTRG risk groups. Intratumoral Tregs could maintain the suppressive
functions of low-glucose, hypoxic, and acidic environment on effector T
cell functions [[133]40], which might explain the reason for the
dysfunction of metabolism between high and low KTRGs risk group.
Finally, we found that the high KTRG risk group showed a worse immune
therapeutic response, and the KTRG risk score showed a high correlation
with Tregs activity and cell fraction, revealing that risk score could
act as a marker of precaution about whether a patient could benefit
from immune therapy and whether it could be necessary to change the
treatment strategy for him.
All in all, therapies targeting Treg cells are under active
exploration. A combination of Treg cell attenuation with the activation
of tumour-specific effector T cells may mutually enhance each
individual treatment [[134]7]. We identified some potential
Tregs-related genes, and some of them might develop to the target
treating Tregs to combine with the anti-tumour system. We wish these
organized data could be investigated by further pre-clinical and
clinical research.
Limitation
Although the KTRG risk signature shows a robust ability to predict
kidney cancer patients' prognosis and immune response, several
limitations remain. The construction and validation of the KTRG risk
signatures rely on public databases, which might contain inaccuracies,
inconsistencies, or biases. Furthermore, whether the signature can
serve as an independent predictor of prognosis and immune therapeutic
response for RCC patients still requires prospective studies. The
functions and molecular mechanisms of the identified genes are based on
bioinformatic analysis and need further pre-clinical experiments for
validation.
Supplementary Information
[135]12672_2025_1787_MOESM1_ESM.pdf^ (1.9MB, pdf)
Additional file 1: Figure S1 Based on the cut-off value per gene
expression, the KIRC data was seperated into high and low expression
groups, the kaplain-survival curve was depicted to show the survival
probability between the two groups. Figure S2 Cross-talk and epigenetic
alteration of KTRGs.Protein-protein interaction network of KTRGs.mRNA
expression correlation diagram among KTRGs.Mutation characterization of
KTRGs in the KIRC dataset.Copy number variation characterization of
KTRGs in the KIRC dataset. Boxplot shows the correlation between copy
number variation and mRNA expression of corresponding genes. *, P-value
< 0.05; **, P-value < 0.01; ***, P-value < 0.001; ****, P-value <
0.0001. Figure S3 Risk model validation.Kaplain-Meier survival curve
shows the overall survival distinction between high and low KTRGs risk
model in E-MTAB-1980 and KIRP cohort. Time-dependent ROC curve shows
predictive accuracy of risk model in 3-/5-/7-year prognosis. Heatmap
shows the KTRGs risk signature expression in high and low KTRGs risk
groups. Figure S4 KIRCdataset was separated into different clinical
subgroups, the Kaplain-Meier survival curve was depicted to demonstrate
the survival difference. Figure S5 E-MTAB-1980 dataset was separated
into different clinical subgroups, and the Kaplain-Meier survival curve
was depicted to demonstrate the survival difference. Figure S6 Nomogram
model construction and model evaluation in RCCdataset.The nomogram
modelpredicts 3-/5-/7-year survival probability in the
RCCdataset.Calibration and time-dependent ROC curve of nomogram
model.The survival curve shows the prognostic difference between the
high and low-risk groups based on the nomogram model.Decision curve
analysis compares the predictive clinical benefit of risk score, age,
gender, grade, stage, and nomogram model for a 3-/5-/7-year prognosis.
Figure S7Univariate Cox regression analysis based on multiple clinical
variables.Barplot shows the data missing rate.Multivariate Cox
regression analysis based on variables including age, gender, grade,
stage, and KTRGs risk score in KIRCand RCCdatasets. Figure S8 Barplot
shows the Pearson correlation coefficient between KTRGs risk score and
activity of cancer-related hallmark pathways. Figure S9The heatmap
shows a correlation between mRNA expression of KTRGs and immune cell
fraction.The boxplot shows the mRNA expression difference of CTLA4,
PDCD1, and CD274 in the high and low KTRG risk groups.Correlation
coefficient diagram between risk score and CTLA4, PDCD1, and CD274
expression.Distribution of a number of responders and non-responders in
high and low-risk groups.Boxplot shows the KTRGs risk score of
respondents and non-responders. *, P-value < 0.05; **, P-value < 0.01;
***, P-value < 0.001; ****, P-value < 0.0001
Acknowledgements