Abstract
Clear cell renal cell carcinoma (ccRCC) is the most prevalent type of
renal cell carcinoma. However, our understanding of ccRCC risk genes
remains limited. This gap in knowledge poses challenges to the
effective diagnosis and treatment of ccRCC. To address this problem, we
propose a deep reinforcement learning-based computational approach
named RL-GenRisk to identify ccRCC risk genes. Distinct from
traditional supervised models, RL-GenRisk frames the identification of
ccRCC risk genes as a Markov Decision Process, combining the graph
convolutional network and Deep Q-Network for risk gene identification.
Moreover, a well-designed data-driven reward is proposed for mitigating
the limitation of scant known risk genes. The evaluation demonstrates
that RL-GenRisk outperforms existing methods in ccRCC risk gene
identification. Additionally, RL-GenRisk identifies eight potential
ccRCC risk genes. We successfully validated epidermal growth factor
receptor (EGFR) and piccolo presynaptic cytomatrix protein (PCLO),
corroborated through independent datasets and biological
experimentation. This approach may also be used for other diseases in
the future.
Subject terms: Machine learning, Tumour biomarkers, Renal cell
carcinoma
__________________________________________________________________
The genetic risk of clear cell renal cell carcinoma is complex and the
genes related are unclear. Here, the authors propose a deep
reinforcement based model to identify risk genes.
Introduction
Renal cell carcinoma (RCC), one of the most common cancers worldwide,
is a type of kidney cancer that initiates in the lining of the proximal
convoluted tubule^[44]1,[45]2. Clear cell renal cell carcinoma (ccRCC)
constitutes 80% of all RCC cases and is particularly aggressive due to
its high immune infiltration^[46]3–[47]5. In addition, over 30% of
ccRCC patients suffer from metastasis, which is a significant factor
leading to death in ccRCC patients^[48]6–[49]8. Although several drugs
have been utilized for the treatment of ccRCC, the efficacy is still
limited due to the heterogeneity of ccRCC^[50]9–[51]11. Therefore, it
is necessary to understand the pathogenesis and identify risk genes of
ccRCC, which may be beneficial for early diagnosis and treatment of
ccRCC^[52]12–[53]14.
Cancer is a complex genetic disorder. Its occurrence and progression
are associated with the accumulation of driver genetic mutations that
provide a selective growth advantage to cells^[54]15,[55]16.
Consequently, one class of methods to identify cancer risk genes is
based on mutation data. In the past years, several cancer sequencing
projects have generated mutation data from thousands of cancer
patients, enhancing the identification of cancer risk
genes^[56]15,[57]17. Traditional statistical approaches focus on genes
with a higher mutation frequency in the patient cohort than the control
cohort^[58]18. Youn et al.^[59]19 employed the functional impact of
mutations on proteins, variations in background mutation rates among
tumors, and the redundancy of the genetic code in tumor genome
sequencing data to identify key genes in non-small cell lung cancer.
Methods like MuSiC^[60]20, OncodriveCLUST^[61]21, and MutSigCV^[62]22
identified cancer risk genes by comparing the observed gene mutations
with the predefined background mutation frequency. By now,
frequency-based methods have identified many cancer risk genes and
enhanced cancer diagnosis and therapy^[63]23. However, the genetic
foundations of cancer are highly diverse. Except for genes mutated
across a large number of patients, some key genes in tumor initiation
and progression are observed to be mutated in only a few
patients^[64]24. For example, PIK3CA, which has been validated to be a
ccRCC risk gene by previous studies^[65]25–[66]27, is mutated in no
more than 5% of ccRCC patients^[67]27. It is difficult for purely
frequency-based approaches to identify these genes with low mutation
frequency but high risk.
To address the drawback of frequency-based methods, the interactions
among proteins are introduced for cancer risk gene identification,
since genes involved in the same signaling and regulatory pathways as
well as protein complexes may interact to exert their effects together.
Muffinn^[68]28 identified cancer risk genes through network
propagation, taking into account mutations not only in individual genes
but also in their neighbors within the protein-protein interaction
(PPI) network. DiSCaGe^[69]29 calculated a gene mutation score using an
asymmetric spreading strength based on the type of mutations and the
PPI network, then produced a ranking of prioritized cancer risk genes.
HotNet2^[70]30 used an insulated heat diffusion process to identify
cancer risk genes by propagating heat through the PPI network.
nCOP^[71]31 employed a heuristic search method to select connected
subnetworks from the PPI network based on the mutation data of cancer
patients, and then ranked cancer risk genes based on the frequencies of
genes appearing in these subnetworks. Both aforementioned methods are
unsupervised, which may suffer from the highly diverse genetic
foundations of cancer or the noise in the PPI network^[72]32. Recently,
several supervised methods have emerged as potentially valuable tools
for predicting cancer risk genes^[73]23,[74]33–[75]36. For example,
Agajanian Steve et al.^[76]34 trained a random forest classifier to
identify cancer risk genes based on the known cancer-driver mutations.
DeepDriver^[77]35 used gene mutation types as features, constructed a
K-nearest neighbor graph based on the Pearson correlation coefficient,
and trained a convolutional neural network to identify cancer risk
genes. Nevertheless, different from unsupervised
methods^[78]28,[79]29,[80]31,[81]37, supervised methods require a
substantial amount of known high-confidence risk genes as labeled data
for model training^[82]23. Unfortunately, the number of known
high-confidence ccRCC risk genes is currently limited^[83]38,[84]39.
For example, there are only 44 ccRCC risk genes in the IntOGen
database^[85]40. Owing to the reliance on labels, predicting ccRCC risk
genes using supervised methods is challenging.
To overcome the limitations of existing methods, we propose a deep
reinforcement learning-based approach for ccRCC risk gene
identification, named RL-GenRisk (Reinforcement Learning-based GENe
RISK). The reinforcement learning-based model leverages environmental
interactions for optimization^[86]41, tackling the challenge of scant
known risk genes. Specifically, RL-GenRisk models the PPI network as
the environment and utilizes a graph convolutional network^[87]42 to
learn state representations. It also incorporates the Deep Q-Network
(DQN)^[88]43 to combine reinforcement learning with deep neural
networks for ccRCC risk gene identification. Moreover, a data-driven
reward is designed to facilitate a straightforward method for
identifying ccRCC risk genes. By focusing on a sampled subgraph with
node features, the data-driven reward effectively leverages information
from both the PPI network and gene mutation data. This not only ensures
the accurate identification of genes with high mutation frequencies but
also enables RL-GenRisk to identify potential risk genes with low
mutation frequencies that functionally interact with genes having high
mutation frequencies. Extensive experiments demonstrate that RL-GenRisk
outperforms the existing methods in the identification of ccRCC risk
genes. Furthermore, several potential risk genes are revealed and
validated in independent datasets. Specifically, we validated two
top-rank genes EGFR and PCLO through statistical and biological
experiments. Statistical analyses show significant upregulation of EGFR
at both bulk and single-cell levels among ccRCC patients, with a
significant association between overexpression of the protein encoded
by EGFR and poor survival in ccRCC patients. The in vitro experimental
results show that decreased EGFR expression promotes ccRCC cell
apoptosis as well as suppresses colony formation and migration, and the
use of the EGFR inhibitor erlotinib effectively augments apoptosis and
inhibits migration. Moreover, knocking down PCLO expression in vitro
significantly inhibited ccRCC progression. Additionally, the in vivo
experimental results show that both the erlotinib and EGFR
downregulation can significantly repress the growth of ccRCC tumors in
mice.
Results
RL-GenRisk framework
We propose a deep reinforcement learning-based approach to identify
ccRCC risk genes (Fig. [89]1), named RL-GenRisk. Fundamentally
different from existing supervised deep learning-based methods,
RL-GenRisk incorporates the reinforcement learning paradigm.
Specifically, RL-GenRisk frames the ccRCC risk gene identification as a
sequence decision-making process, formulated as a Markov Decision
Process^[90]44. This enables RL-GenRisk to integrate reinforcement
learning algorithms seamlessly, thereby effectively addressing the
inherent challenge of scant known risk genes. RL-GenRisk takes PPI
network and gene mutation data as input (Fig. [91]1A). The PPI network
is represented as an undirected graph, with nodes representing genes
and edges representing interactions between genes. Gene mutation data
includes details on the presence of mutations in ccRCC patients for
each gene. RL-GenRisk treats the PPI network with gene mutation
information as the environment. The state includes a sampled subgraph
with node features. The action is selecting a node directly connected
to the sampled subgraph and adding it to this sampled subgraph. Thus,
the risk gene identification is framed as a Markov Decision Process of
node selection within the PPI network.
Fig. 1. Overview of workflow.
[92]Fig. 1
[93]Open in a new tab
A Input data of RL-GenRisk, including a Protein-Protein Interaction
(PPI) network and gene mutation data of ccRCC patients from The Cancer
Genome Atlas (TCGA). B Overview of the training and identification
process. During the training process, RL-GenRisk starts by selecting a
node randomly and appending this node to a sampled subgraph (indicated
by the red dashed circle). Nodes colored orange are included in the
sampled subgraph. Nodes interacting with the sampled subgraph are
colored blue, indicating their inclusion in the action space. The
policy comprises two graph convolutional layers and a node evaluation
network. It takes the current state as input and selects an action
based on Q values of nodes in the action space. Subsequently, the state
is updated, and the reward is obtained. This process repeats until the
sampled subgraph reaches its maximum size. Following training,
RL-GenRisk computes Q values for all genes based on an empty subgraph,
resulting in a ranking list of risk genes ordered by Q values. C
Experiment validation on external data for identified ccRCC risk genes.
Central to RL-GenRisk is a policy that is represented by a neural
network and interacts with the environment. The policy takes the
current state as input and predicts Q values, representing the
probability distribution of all possible actions. The policy of
RL-GenRisk consists of two main components: a Graph Convolutional
Network (GCN) for learning state representation and a node evaluation
network for computing action probability (Fig. [94]1B top). To enhance
state representation, RL-GenRisk employs the GCN^[95]42, an inductive
graph representation learning method, to capture node representations
in the PPI network. Node initial features are derived from the PPI
network’s topology information and ccRCC patients’ mutation
information. The policy of RL-GenRisk is trained to select the optimal
actions by maximizing the reward. In this study, we designed a
data-driven reward that focused on the sampled subgraph, considering
both information from the PPI network and gene mutation data. The DQN
algorithm is employed to update the policy’s parameters. Throughout the
training, we employed the ϵ-greedy strategy to choose actions based on
Q values, thus boosting RL-GenRisk’s exploratory potential. In the
identification phase (Fig. [96]1B bottom), RL-GenRisk starts with an
empty subgraph, incorporates all nodes into the action space, and
utilizes the trained policy to calculate Q values for each node.
Subsequently, ccRCC risk genes are ranked by Q values, with higher
values indicating greater risk. Further details about the RL-GenRisk
can be found in the “Methods” section.
RL-GenRisk shows superior performance for ccRCC risk gene prioritization over
existing methods
To assess the performance of RL-GenRisk, we utilized RL-GenRisk and
eight other existing methods to identify ccRCC genes with gene mutation
data of ccRCC patients from The Cancer Genome Atlas (TCGA)^[97]45 and
five different PPI networks. Six of these eight existing methods are
specifically designed for cancer risk gene identification, including
nCOP^[98]31, DiSCaGe^[99]29, Hierarchical HotNet^[100]46,
HotNet2^[101]30, Muffinn^[102]28, and MutSigCV^[103]22. The other two
are SVM^[104]47 and Random Forest^[105]48, which are widely used
supervised machine learning methods. The implementation details of SVM
and Random Forest are provided in the [106]Supplementary Materials. The
PPI networks used in our study included HPRD^[107]49, STRING^[108]50,
Multinet^[109]51, IRefIndex^[110]52, and HumanNet^[111]53. Known ccRCC
risk genes were retrieved from the IntOGen^[112]40 cancer-specific
database, the Network of Cancer Genes (NCG) database^[113]54, the
Cancer Gene Census (CGC) database^[114]55, and a cancer risk gene set
extracted from the recent and extensive study conducted by Bailey et
al.^[115]56. In particular, the ccRCC risk genes included in CGC and
Baylei et al. datasets are limited, containing only nine and twelve
genes respectively. Consequently, we evaluated different methods using
three datasets: IntOGen, NCG, and a “Merged” dataset that combined data
from IntOGen, NCG, Baylei et al. and CGC. The genes included in these
datasets are provided in Supplementary Data [116]1. To assess the
performance of the different methods, we used discounted cumulative
gain (DCG) as one of the primary evaluation metrics, consistent with
previous studies^[117]29,[118]57. Specifically, the DCG scores were
calculated based on the top 100 genes identified by each method.
Moreover, the DCG curves of different methods on the IntOGen, NCG, and
the Merged ccRCC risk gene dataset are shown in Fig. [119]2 and
Supplementary Fig. [120]2. Following this, we calculated the normalized
DCG (N-DCG), the area under the DCG curve (DCG-AUC), and the average
precision (AP) to further evaluate these methods’ performance (as shown
in Fig. [121]2D–F, and Supplementary Fig. [122]3). Evaluation results
indicated that RL-GenRisk outperformed the eight other established
methods in identifying ccRCC risk genes, achieving the highest DCG,
N-DCG, DCG-AUC, and AP on all used datasets (as shown in Supplementary
Fig. [123]1). As anticipated, MutsigCV showed the lowest performance
among the six existing methods specifically designed for cancer risk
gene identification due to its reliance solely on mutation data,
lacking integration with biological network insights. Muffinn performed
well with STRING but showed inferior performance with other PPI
networks, highlighting its instability across different PPI networks
(Supplementary Fig. [124]3). While nCOP and Hierarchical HotNet
performed well across different datasets and showed more stability
across various PPI networks, their performance was inferior to
RL-GenRisk (Supplementary Fig. [125]1 and Supplementary Fig. [126]3).
Additionally, SVM showed the lowest performance among the compared
methods. Random Forest performed better than SVM and MutSigCV. However,
its performance was inferior to RL-GenRisk. RL-GenRisk achieved the
best performance on HPRD (as shown in Supplementary Fig. [127]1).
Therefore, the subsequent analyses were grounded on the high-confidence
risk genes (HRGs) identified by the best-performing method, RL-GenRisk
with the HPRD network.
Fig. 2. Performance comparison and perturbation analysis.
[128]Fig. 2
[129]Open in a new tab
A–C Discounted Cumulative Gain (DCG) curves of the top 100 identified
genes with RL-GenRisk and other compared methods on the HPRD network,
evaluated on the IntOGen, NGC, and the Merged datasets. D–F Boxplot of
Average Precision (AP) on different PPI networks across different
methods, evaluated on the IntOGen, NGC, and the Merged datasets.
Boxplots display the 25th, 50th (median), and 75th percentiles as box
bounds, with whiskers extending to minima and maxima within
1.5 × interquartile range. Each box includes 5 data points,
corresponding to the results on 5 different PPI networks. G, H Mutation
frequency of the top 20 genes identified by RL-GenRisk (G. Before
perturbation; H. After perturbation.). Each dot represents a gene. The
dots in red indicate that the gene is included in the IntOGen database,
and the blue indicates that the gene is not included in the IntOGen
database. And the mutation frequency of the genes enclosed by the
orange box is less than 10%. Source data are provided as a Source Data
file.
Biological network facilitates the identification of low-frequency mutated
ccRCC risk genes
Although gene mutation frequency is crucial for assessing cancer
associations, not all cancer-related genes have high mutation
frequencies^[130]58. Therefore, relying solely on mutation data might
overlook low mutation frequency, high-risk cancer genes. To evaluate
the capability of RL-GenRisk in identifying ccRCC risk genes with low
mutation frequency but high risk, we utilized the “maftools”
package^[131]59 to analyze the mutation frequencies of the top 20 HRGs
(Supplementary Table [132]1) identified by RL-GenRisk in ccRCC patients
from TCGA (Supplementary Fig. [133]5A). Generally, 317 (77.13%) of the
411 ccRCC patients from TCGA exhibited somatic mutations in the top 20
HRGs. Missense mutations were predominant, followed by frameshift
mutations, among the identified somatic mutation types. Notably, the
mutation frequencies of four genes, including VHL, PBRM1, SETD2, and
BAP, exceeded 10%. These genes are already recognized as ccRCC risk
genes and included in the IntOGen database. Genes with high mutation
frequencies are more likely to be detected by methods that solely rely
on mutation data. However, it’s important to note that not all cancer
genes have high mutation frequencies^[134]58. Among the top 20 HRGs,
seven recognized ccRCC risk genes in the IntOGen database had mutation
frequencies below 5%. For example, only 5 of 411 ccRCC patients carry
mutations in PIK3CA, a known ccRCC risk gene in the IntOGen database,
indicating variable mutation frequencies among cancer genes. To assess
the contribution of the biological network in RL-GenRisk for
identifying low mutation frequency, high-risk ccRCC genes, we
progressively increased the percentage of edges randomly swapped
between node pairs from 25% up to 50%, 75%, and 100%. Perturbing the
PPI network disrupts its internal information, with more extensive
perturbation causing greater information loss. As the perturbation
ratio increased, the rankings of known ccRCC genes that initially had
high ranks and mutation frequencies below 5% gradually dropped (as
shown in Supplementary Fig. [135]5B). We then compared the mutation
frequencies of the top 20 HRGs identified by RL-GenRisk before and
after PPI network perturbation and created dot plots for visualization
(Fig. [136]2G, H). We found that after network perturbation, three
known ccRCC risk genes, TP53, PIK3CA, and SPEN, previously identified
in the top 20 with mutation frequencies below 5%, dropped out of the
top 20. This indicates that incorporating PPI network knowledge
improves the detection of low mutation frequency ccRCC genes with
significant risk implications.
Biological function analysis of high-confidence risk genes
To delve into the biological function of HRGs, we conducted pathway
enrichment analysis using the top 20 HRGs and the WikiPathways
database^[137]60, a comprehensive resource for pathway-based data
analysis. Enrichment analysis was performed using g:Profiler^[138]61.
Figure [139]3. A illustrated the top 10 significantly enriched pathways
(FDR p-value < = 1.21e-4). The results demonstrated significant
enrichment in various cancer-related pathways, with “Clear cell renal
cell carcinoma pathways” being the most prominently enriched (FDR
p-value = 1.48e-8). The Human Phenotype Ontology (HPO)
database^[140]62, which provides a standardized vocabulary of
phenotypic abnormalities associated with human diseases, highlighted
“Renal neoplasm” and “Renal cell carcinoma” as the most significantly
enriched phenotypes (FDR p-value < 4.05e-5, Supplementary
Fig. [141]6). Furthermore, Gene Ontology (GO) enrichment analysis
indicated that top 20 HRGs were notably enriched in various
cancer-related biological processes, including “cell adhesion” (FDR
p-value = 2.69e-4), “regulation of cell population proliferation” (FDR
p-value = 3.64e-4), “cell population proliferation” (FDR p-value =
6.85e-4), and “cell migration” (FDR p-value = 2.39e-3). These
observations showed a comprehensive molecular landscape linked to HRGs
within the scope of ccRCC, emphasizing the critical roles of these
genes in ccRCC-related mechanisms.
Fig. 3. Independent datasets analysis.
[142]Fig. 3
[143]Open in a new tab
A Top 10 enriched pathways of the Wikipathways based on the top 20
high-confidence genes (HRGs). B The expression levels of EGFR in tumor
(n = 523) and normal tissues (n = 72) of ccRCC patients from TCGA,
quantified using Fragments Per Kilobase Million (FPKM), “***” indicates
p-value < 0.001. The boxplot displays the 25th, 50th (median), and
75th percentiles as box bounds, with whiskers extending to minima and
maxima within 1.5 × interquartile range. p-value = 2.07e-38, using the
Wilcoxon signed-rank test. C, D Survival curves of EGFR. ccRCC patients
from TCGA were divided into two groups based on the top 25% (n = 108)
and others (n = 324) of expression levels of protein encoded by EGFR,
using Progression-Free Survival (PFS) and Disease-Specific Survival
(DSS) respectively. The red curve corresponds to the high expression
group, while the blue curve corresponds to the low expression group. E
UMAP plot of clustering of single-cell RNA-seq data from kidneys of
ccRCC patients. DC dendritic cell, NK natural killer cell, NKT natural
killer T cell, TAM tumor-associated macrophage, T-Reg regulatory T
cell. F Gene expression of EGFR in different cells of ccRCC patients,
with colors corresponding to the expression values. G Violin plot
displaying the normalized gene expression of EGFR in different cells of
ccRCC patients in the single-cell RNA-seq data. Source data are
provided as a Source Data file.
In addition, to further explore whether other top-performing methods
identified similar pathways albeit with different genes, we performed
an analysis based on the pathways enriched by the top 20 genes
identified by top-performing methods, including RL-GenRisk, nCOP,
Hierarchical HotNet, HotNet2, DiSCaGe, Muffinn, and Random Forest. We
first calculated the Jaccard similarity coefficients^[144]63 between
the top 20 gene lists identified by these methods (pairwise comparisons
among these 7 methods, totaling 21 pairs). The results showed that 76%
(16 out of 21 pairs) of method pairs have a Jaccard similarity
coefficient below 0.5 for their identified top 20 genes, indicating
that the top 20 gene lists identified by these methods do not exhibit a
high degree of overlap overall (Supplementary Fig. [145]22). After
that, we analyzed the intersection of pathways enriched by the top 20
genes identified by top-performing methods. The result revealed that
two pathways were significantly enriched (FDR p-value < 0.05) across
these methods, including “Clear cell renal cell carcinoma pathway” and
“Type 2 papillary renal cell carcinoma”. Notably, the top 20 genes
identified by RL-GenRisk exhibited the most significant enrichment in
the “Clear cell renal cell carcinoma pathway” among all these methods
(as illustrated in Supplementary Fig. [146]7). The detailed results of
enrichment analysis for different methods are provided in Supplementary
Data [147]3.
Differential expression analysis revealed significant differential gene
expression of EGFR at both bulk and single-cell levels
Among the top 20 HRGs identified by RL-GenRisk, 8 HRGs are not included
in these ccRCC risk gene databases. We then performed differential
expression analysis on these 8 HRGs using RNA-seq data from ccRCC
patients in TCGA. Differential expression analysis was performed using
Limma^[148]64, and the significance threshold was FDR p-value < 0.05.
Notably, EGFR and PCLO showed significant differential expression
between tumor tissues and normal tissues in TCGA among these 8 HRGs
(FDR p-value = 2.07e-38 for EGFR and FDR p-value = 1.04e-14 for PCLO,
Fig. [149]3B and Supplementary Fig. [150]8). The differential
expression results for these 8 HRGs are shown in Supplementary
Fig. [151]8. EGFR was notably upregulated in ccRCC tumor tissues
compared to normal tissues. EGFR is activated as a homodimer or
heterodimer, thereby regulating multiple signaling pathways, including
the RAS/RAF/MAPK, AKT, and JAK/STAT pathways, which play essential
roles in cell migration, proliferation, and survival^[152]65–[153]67.
These pathways are intricately involved in driving cell proliferation
and conferring resistance to apoptosis^[154]67. The overexpression of
EGFR leads to an excess of receptors on the cell surface, fostering
uncontrolled cell growth and division. This dysregulation can drive the
transformation of normal cells into tumor cells, creating a favorable
environment for sustained tumor cell survival^[155]68. Additionally,
the protein encoded by PCLO is a component of the presynaptic
cytoskeletal matrix, which is involved in establishing active synaptic
zones and in synaptic vesicle trafficking^[156]69. Recent research has
identified the expression level of PCLO as a prognostic biomarker for
esophageal squamous cell carcinoma^[157]70, and a notable high mutation
frequency (47.9%) of PCLO has been observed in large Central European
cohorts with gastric cancer^[158]71. In addition, to further explore
the transcriptional change of these 8 HRGs across tumor and normal
tissues, we conducted differential expression analysis using ccRCC
patient data from the Gene Expression Omnibus (GEO). We used two mRNA
expression profiles of ccRCC patients from [159]GSE46699^[160]72 and
[161]GSE36895^[162]73 for analysis. The [163]GSE46699 contains data
from 67 tumor tissues and 63 normal tissues. The [164]GSE36895 contains
data from 29 tumor tissues and 23 normal tissues. We found that among
these 8 HRGs, 5 genes showed significantly differential expression in
both [165]GSE46699 and [166]GSE36895 datasets (FDR p-value < 0.05, as
shown in Supplementary Fig. [167]9). In particular, MUC4 shows high
expression in tumor tissues in the [168]GSE36895 dataset, but low
expression in tumor tissues in the [169]GSE46699 dataset. Existing
studies reported that MUC4 expression is an independent prognostic
factor for overall survival in ccRCC patients^[170]74, and MUC4
mutation is associated with an exophytic growth pattern of
ccRCC^[171]75. Then, we compared the differential expression levels of
the top 20 ccRCC risk genes identified by RL-GenRisk and other methods
using the [172]GSE46699 and [173]GSE36895 datasets. The results
indicated that the top 20 genes identified by RL-GenRisk included a
higher number (14) of genes that were significantly differentially
expressed in both [174]GSE46699 and [175]GSE36895 (FDR p-value < 0.05).
Further details can be found in Supplementary Data [176]4.
To further investigate the expression patterns of the top 20 HRGs in
tumor cells and normal cells of ccRCC patients, we performed an
analysis utilizing single-cell RNA-seq data from ccRCC patients
reported in a recent work^[177]76 (Supplementary Table [178]2). Uniform
manifold approximation and projection (UMAP) was used to visualize the
distribution of 31,856 cells from kidneys in these patients in a
two-dimensional plane (Fig. [179]3E). We used UMAP to visualize the
expression of the top 20 HRGs in different cells (Supplementary
Fig. [180]10) and plotted the distribution of expression of these genes
in different cell types (Supplementary Fig. [181]11). Subsequently, we
assessed the expression levels of EGFR across different cell types
within these patients (Fig. [182]3F and Supplementary Fig. [183]11).
Our investigation showed a significant difference in EGFR expression
between tumor cells and other cells, (FDR p-value < = 1.21e-4,
Wilcoxon signed-rank test, Supplementary Fig. [184]12), with displaying
higher expression levels in tumor cells (Fig. [185]3G). Therefore, the
significant upregulation of EGFR at both bulk and single-cell levels in
ccRCC patients suggested its potential as a biomarker for ccRCC.
Expression level of protein encoded by the EGFR is significantly correlated
with the prognosis of ccRCC patients
To reveal whether EGFR affects the prognosis of ccRCC patients, the
survival analysis is conducted utilizing clinical data and expression
of EGFR-encoded protein obtained from the TCGA dataset. We obtained
clinical data and reverse-phase protein array (RPPA) data of ccRCC
patients from TCGA. Progression-Free Survival (PFS)^[186]77 and
Disease-Specific Survival (DSS)^[187]78 were utilized to assess the
relationship between the expression levels of protein encoded by EGFR
and the survival of ccRCC patients. The ccRCC patients from TCGA were
categorized into two groups (top 25% and others) based on the
expression levels of protein encoded by EGFR. We analyzed the ten-year
survival rate following cancer diagnosis and plotted Kaplan-Meier
survival curves to illustrate the impact of EGFR encoded protein
expression on the prognosis of patients (Fig. [188]3C, D). The results
of the survival analysis revealed a significant association between the
expression levels of protein encoded by EGFR and the survival time of
ccRCC patients, with higher EGFR encoded protein expression being
correlated with poorer survival outcomes (KM log-rank p-value = 0.0018,
PFS; KM log-rank p-value = 0.021, DSS). Our observations revealed that
the overexpression of protein encoded by EGFR may play a critical role
in cancer progression and could potentially serve as a prognostic
biomarker for ccRCC patients.
EGFR effectively promotes ccRCC progression in vitro and in vivo
To verify the effect of EGFR expression on ccRCC progression, the
stable cells harboring EGFR knockdown were obtained using ACHN
(Fig. [189]4A, B) and 786-O cells (Fig. [190]4G, H). The protein and
mRNA expression of EGFR was quantitated by western blotting and qPCR,
respectively, and short hairpin RNA (shRNA)-2/3 showed promising
knockdown epointing out this issuefficacies for EGFR silencing
(Fig. [191]4A, B, G, H). The downregulation of EGFR significantly
inhibits the cell viability (CCK8 assay, Fig. [192]4C-ACHN,
Fig. [193]4I-786-O) and migration (transwell assay, Fig. [194]4E-ACHN,
Fig. [195]4K-786-O). Furthermore, decreased EGFR expression markedly
promotes the ccRCC cell apoptosis (flow cytometry assay,
Fig. [196]4D-ACHN, Fig. [197]4J-786-O) and represses cell colony
formation in vitro (colony assay, Fig. [198]4F-ACHN,
Fig. [199]4L-786-O). Also, the EGFR overexpression was detected using
qPCR (Fig. [200]4M) and western blotting (Fig. [201]4N), and it
promotes the migration of 786-O cells significantly (Fig. [202]4O).
Moreover, the EGFR inhibitor, erlotinib, was further used to inhibit
its activity, and the data indicated that erlotinib can effectively
inhibit ccRCC cell migration (Fig. [203]5A middle panel) and growth
(Fig. [204]5A lower panel) as well as promotes the apoptosis
(Fig. [205]5A upper panel) in vitro. In vivo, both erlotinib and EGFR
downregulation can markedly repress the tumor growth (Fig. [206]5B, C).
Taken together, these results suggested that EGFR can significantly
promote ccRCC cell progression as a risk factor.
Fig. 4. EGFR dysregulation significantly affect ccRCC cell progression in
vitro.
[207]Fig. 4
[208]Open in a new tab
A, B The protein and mRNA expression of EGFR in stable cells of
ACHN-shEGFR. n = 4 independent samples. C. The cell viability of
ACHN-shEGFR cells. n = 10 from three biological replicates. D–F The
assays of apoptosis (n = 4 independent experiments), migration (n = 5
independent experiments), and colony formation (n = 3 independent
experiments) in ACHN-shEGFR cells. G, H The protein and mRNA expression
of EGFR in stable cells of 786-O-shEGFR. n = 4 independent samples. I
The cell viability of 786-O-shEGFR cells. n = 10 from three biological
replicates. J–L The assays of apoptosis (n = 3 independent
experiments), migration (n = 5 independent experiments), and colony
formation (n = 3 independent experiments) in 786-O-shEGFR cells. M, N
Overexpression of EGFR in 786-O cells at mRNA and protein levels. n = 4
independent samples. O The migration assay of 786-O-EGFR cells. n = 5
independent experiments. NC negative control, ns no significant. The
data were shown as the mean ± SEM. One-way ANOVA (B–F, H–K).
Two-tailed unpaired Student’s t-test (L–O). Source data are provided as
a Source Data file.
Fig. 5. EGFR inhibition represses ccRCC cell progression in vitro and in
vivo.
[209]Fig. 5
[210]Open in a new tab
A Erlotinib markedly promoted apoptosis and inhibited migration and
colony formation of ccRCC cells. n = 3 independent experiments. B, C
Both erlotinib (B) (n = 3 mice per group) and EGFR knockdown (C) (n = 3
mice per group) significantly reduced tumor growth in vivo. The data
were shown as the mean ± SEM. Two-tailed unpaired Student’s t-test
(A). Two-way ANOVA (B, C). Source data are provided as a Source Data
file.
The PCLO knockdown significantly impaired the ccRCC progression
To investigate the effect of PCLO on the progression of ccRCC in vitro,
we utilized four distinct shRNAs to knock down PCLO expression in 293T
cells, identifying shPCLO-1 as the most effective (Fig. [211]6A, B).
Subsequently, we obtained the PCLO knockdown 786-O cells using shPCLO-1
(Fig. [212]6C). The downregulation of PCLO significantly inhibited cell
viability (Fig. [213]6D) and promoted apoptosis (Fig. [214]6E).
Additionally, the reduction of PCLO expression significantly suppressed
cell colony formation (Fig. [215]6F), migration (Fig. [216]6G), and
invasion (Fig. [217]6H). To further investigate the effects of PCLO
knockdown on the morphology and motility of ccRCC cells, a high-content
imaging analysis system was employed. The results revealed that PCLO
knockdown significantly reduced both the length and width of the cells
at multiple time points (Fig. [218]6I), as well as their area and
perimeter (Fig. [219]6J). Notably, a reduction in cell movement speed
was observed in 786-O-shPCLO cells (Fig. [220]6K), leading to a
shortened accumulated distance (Fig. [221]6L). In addition, we
performed the same intervention on PCLO in another renal cancer cell
line, ACHN, and conducted related assays to assess proliferation and
metastatic potential. The results showed that stable knockdown of PCLO
in ACHN cells (Supplementary Fig. [222]24A) significantly reduced their
proliferation (Supplementary Fig. [223]24B), colony formation
(Supplementary Fig. [224]24D), migration (Supplementary Fig. [225]24E),
and invasion (Supplementary Fig. [226]24F) abilities, while increasing
the apoptosis index (Supplementary Fig. [227]24C). High-content imaging
analysis system also indicated that knockdown of PCLO in ACHN not only
decreased cell perimeter but also impaired the cells’ motility
(Supplementary Fig. [228]24G–I). Overall, the PCLO knockdown markedly
inhibited the progression of ccRCC.
Fig. 6. Loss of PCLO significantly impaired the ccRCC progression.
[229]Fig. 6
[230]Open in a new tab
A, B The protein and mRNA expression of PCLO after transient
transfection of shPCLO lentivirus in 293T cells. n = 3 independent
samples. C The protein expression of PCLO in cells of 786-O-shPCLO. D
The cell viability of 786-O-shPCLO cells. n = 10 from three biological
replicates. E–H The assays of apoptosis, colony formation, migration,
and invasion in 786-OshPCLO cells. n = 3 independent experiments. I, J
Average morphology of 786-O-shPCLO cells per well, including cell
width, length, area, and perimeter, was evaluated at multiple time
points using a high-content cell imaging analysis system. n = 10
independent samples. K, L Average movement speed and accumulated
distance of 786-O-shPCLO cells per well. n = 10 independent samples. NC
negative control, ns no significant. The data were shown as the mean
± SEM. One-way ANOVA (B). Two-tailed unpaired Student’s t-test (D–L).
Source data are provided as a Source Data file.
Discussion
In this study, we developed RL-GenRisk, an approach utilizing deep
reinforcement learning to enhance ccRCC risk gene identification by
integrating network knowledge with gene mutation data. By considering
the risk gene identification as a node selection problem, we model the
ccRCC risk gene identification as a Markov Decision Process, reducing
the dependency on labeled data. Furthermore, we designed a data-driven
reward and employed the DQN algorithm to optimize RL-GenRisk.
RL-GenRisk exhibits a substantial improvement in the task of ccRCC risk
gene identification and reveals several potential risk genes.
RL-GenRisk successfully identified known ccRCC risk genes. Among the
top 20 ccRCC risk genes identified by RL-GenRisk, 12 genes are listed
in the ccRC known risk gene datasets (IntOGen, NCG, and the Merged
dataset) and recognized as ccRCC risk genes in prior research,
including VHL^[231]79,[232]80, PBRM1^[233]81,[234]82,
SETD2^[235]83,[236]84, BAP1^[237]85,[238]86, MTOR^[239]87,[240]88,
ATM^[241]89,[242]90, SPEN^[243]91, TP53^[244]92, KMT2D (also known as
MLL2)^[245]93, SMARCA4^[246]94,[247]95, and PTEN^[248]96,[249]97. This
demonstrates that RL-GenRisk can effectively identify ccRCC risk genes.
Interestingly, 8 of the top 20 identified ccRCC risk genes are not
listed in the known ccRCC risk gene datasets, including MUC4, DST,
PABPC1, PCLO, PDE4DIP, USH2A, EGFR, and FLG. We visualized the top 20
genes identified by RL-GenRisk in the PPI network using the STRING-db.
These genes were significantly interconnected within the PPI network
(one-sided t-test, FDR p-value = 2.72e-07). To illustrate the
interactions between known ccRCC risk genes and the potential risk
genes, we used a bipartite graph (see Supplementary Fig. [250]16).
Among the potential risk genes, EGFR exhibited the most interactions
with known ccRCC risk genes, followed by MUC4 and PCLO. Notably, both
EGFR and PCLO were validated in our biological experiments. In
addition, some recent studies have found that PDE4DIP, FLG, and USH2A
are associated with ccRCC. For example, methylation levels of PDE4DIP
were found to be associated with reduced overall survival in ccRCC
patients^[251]98. FLG was found to be specifically mutated in specific
subtypes of ccRCC^[252]99. USH2A was found to have a significant
co-mutation with well-known high-confidence ccRCC risk genes VHL and
PBRM1 in ccRCC patients^[253]100,[254]101. While the genes within these
identified candidates have been shown a correlation with ccRCC, their
molecular mechanism within ccRCC needs to be further investigated.
Furthermore, we validated two top ccRCC risk genes, EGFR and PCLO,
identified by RL-GenRisk through independent datasets and biological
experiments. The results indicated that EGFR exhibits significant
upregulation at both bulk and single-cell levels in ccRCC patients. The
overexpression of the protein encoded by EGFR is significantly
associated with poor survival of ccRCC patients. To explore whether the
few EGFR mutations detected in ccRCC patients are functionally
relevant, we analyzed EGFR protein expression levels between ccRCC
patients with and without EGFR mutations, as proteins are the primary
functional molecules in cellular processes. We use the ccRCC patient
data from TCGA to perform analysis, dividing them into groups with and
without EGFR mutations to assess whether the protein expression levels
significantly differ between these two groups. The results showed that
EGFR protein expression levels were higher in ccRCC patients carrying
EGFR mutations (p-value = 0.017, Wilcoxon signed-rank test,
Supplementary Fig. [255]17). Moreover, we visualized the interacting
genes of EGFR using STRING-db (Supplementary Fig. [256]20). It is shown
that EGFR interacts with many genes. To further explore the effect of
these genes, we conducted pathway enrichment analysis on these genes.
The results showed that one ccRCC-related pathway and three
cancer-related pathways are enriched, including the ErbB signaling
pathway (FDR p-value = 4.99e-2), the PI3K Akt signaling pathway (FDR
p-value = 3.41e-7), MAPK signaling pathway (FDR p-value = 2.99e-8), and
the CKAP4 signaling pathway map (FDR p-value = 3.54e-7). The ErbB
signaling pathway has been found to play a critical role in the
initiation and progression of ccRCC^[257]102. The PI3K Akt signaling
pathway plays a crucial role in various cellular processes and is
aberrantly activated in cancers, contributing to the occurrence and
progression of tumors^[258]103. The MAPK signaling pathway is one such
complex interconnected signaling cascade with frequent involvement in
oncogenesis, tumor progression, and drug resistance^[259]104. The CKAP4
signaling pathway map has been found to be involved in regulating the
progression of various cancers^[260]105. The enrichment analysis
results suggested that the genes interacting with EGFR in the PPI
network were significantly enriched in pathways related to the
initiation and progression of ccRCC, as well as tumor initiation and
progression. Furthermore, through comprehensive biological experiments,
we validated the impact of EGFR and PCLO on ccRCC progression. The
results demonstrated that decreased EGFR expression promotes ccRCC cell
apoptosis and suppresses colony formation. To confirm the status of
EGFR expression in ccRCC cells, we performed additional experiments. We
found that the mRNA and protein expression levels of EGFR in the 786-O
and ACHN cells are significantly higher than that in the human
embryonic kidney (293T) cells using qPCR and western blotting analyses
(as shown in Supplementary Fig. [261]23). Accordingly, these results
showed that EGFR is overexpressed in both 786-O and ACHN cell lines.
Recent studies have indicated that the downregulation of certain
receptor tyrosine kinases (RTKs) members can prevent the progression of
RCC. For instance, silencing AXL and MET using shRNA may overcome the
resistance to long-term sunitinib treatment in metastatic RCC^[262]106.
Additionally, inhibition of EphA2 can suppress tumor growth both in
vitro and in vivo, and restore the sensitivity of sunitinib-resistant
tumor cells to sunitinib^[263]107. Therefore, we proposed that other
RTKs could produce the same effect, even these RTKs would play a
synergistic role with EGFR in ccRCC. However, their roles in ccRCC
progression need to be further validated in the future. Additionally,
the use of the EGFR inhibitor erlotinib effectively inhibits ccRCC cell
migration and growth in mice. While previous studies have associated
EGFR expression with RCC, functional studies directly examining EGFR’s
role in RCC cell lines have been limited. Lee et al. utilized shRNA to
knock down EGFR on 786-O cells, suggesting that EGFR knockdown inhibits
the invasiveness of RCC cells in vitro and tumorigenicity in
vivo^[264]108. However, their conclusions were primarily based on cell
proliferation and branching morphogenesis assays, as well as tumor
xenograft experiments. Similarly, Wen et al. found that EGFR knockdown
reversed ADAMTS1-induced prometastatic characteristics of RCC, this was
only verified by CCK8 and Matrigel invasion assay^[265]109. Therefore,
comprehensive experiments are essential to further elucidate the role
of EGFR in ccRCC cell lines. In our study, we stably knocked down EGFR
expression in both 786-O and ACHN cells using shRNA. We conducted a
series of experiments, including CCK8, apoptosis assays, transwell
assays, colony formation assays, and in vivo experiments, which
comprehensively explored the function of EGFR in ccRCC. Our results
demonstrate that knocking down EGFR significantly inhibits both the in
vitro proliferation and metastasis of ccRCC cells, as well as tumor
growth in vivo. These results not only demonstrate RL-GenRisk’s
capability in identifying ccRCC risk genes but also provide further
insights into the role of EGFR in ccRCC cell lines. Moreover, the
biological experimental results of PCLO showed that knocking down PCLO
expression in vitro significantly inhibited ccRCC progression. These
biological experimental results highlighted the potential therapeutic
significance of our findings.
In addition, we explored incorporating mutation type information and
pathogenicity scores into the feature vectors of RL-GenRisk. For
mutation type, we utilized the TCGA data to calculate the proportion of
each mutation type relative to the total number of mutations for each
gene. For the pathogenicity score, we computed the CADD score^[266]110
for each mutation, and the average CADD score across all mutations in a
gene was used as the pathogenicity score feature. By adding different
types of features, we evaluate three variations of RL-GenRisk,
including mutation type only, pathogenicity score only, and both. The
results show that after adding mutation type information separately to
the feature vector, RL-GenRisk showed improvement in AP on the NCG
dataset but showed decreases in DCG, N-DCG, and DCG-AUC on the NCG
dataset, as well as decreases in AP, DCG, N-DCG, and DCG-AUC on the
IntOGen and the Merged dataset (as shown in Supplementary
Fig. [267]13). After adding pathogenicity scores separately to the
feature vector, RL-GenRisk showed improvements in AP, DCG, and N-DCG on
the NCG and decreases in DCG-AUC on the NCG and the Merged dataset, as
well as decreases in AP and DCG-AUC on the IntOGen dataset (as shown in
Supplementary Fig. [268]13). Moreover, after adding both mutation type
information and pathogenicity scores to the feature vector, RL-GenRisk
showed a decrease in AP, DCG, N-DCG, and DCG-AUC across the IntOGen,
NCG, and the Merged dataset (as shown in Supplementary Fig. [269]13).
These results indicated that adding more clinically informative
features might improve the performance of the method, but further
exploration is needed.
Furthermore, we analyzed the overlap between the results of different
methods by calculating the Jaccard similarity coefficient for the top
100 genes identified by any two methods (Supplementary Fig. [270]18).
The result indicated that the top 100 genes identified by Hierarchical
HotNet and HotNet2 have the highest degree of overlap (with Jaccard
similarity coefficient = 0.39), as they are similar methods. In
general, there was a low overlap between the top 100 genes identified
by different methods (Supplementary Fig. [271]18). Only one gene VHL is
included in all the top 100 genes from different methods. VHL is a
known risk gene for ccRCC and with the highest mutation frequency in
TCGA ccRCC patients. In addition, seven genes were identified in the
top 100 genes by at least six out of the nine methods, including VHL,
PBRM1, SETD2, BAP1, MTOR, SPEN, and ATM. These six genes are all
well-established risk genes for ccRCC. These results suggested that
while different methods for ccRCC risk gene identification exhibit
different performances, few genes like VHL, consistently emerge across
multiple methods. Then, we conducted an analysis to determine the
minimum number of genes required to encompass all known ccRCC risk
genes for each method. Since nCOP only outputs a ranked list of genes
and does not allow users to adjust the list length by setting different
thresholds, we are unable to assess the minimum number of genes
required to encompass all known ccRCC risk genes. For other methods,
the results show that all tested methods require at least more than
8000 genes to encompass all known ccRCC risk genes (Supplementary
Table [272]7). This leads to a very low proportion of known risk genes
in the identified gene list ( <1%). Including all known risk genes
introduces numerous false positives, which significantly decreases the
precision of the results. A potential future direction could involve
developing methods that can identify all (or most) known risk genes
while maintaining high precision.
Since the interpretability of most reinforcement learning systems
remains a significant challenge, to enhance understanding, we have
incorporated several cues in the output to explain why certain genes
are more or less likely to be associated with ccRCC. Our hypothesis is
that genes with high ranks are likely to be in close proximity to known
ccRCC-related genes. We have provided two types of supplementary
information to measure the closeness between the predicted genes and
known high-risk ccRCC genes. First, other than gene rank, we also
output the average shortest path length between each gene and all known
risk genes in the PPI network, as well as the FDR p-values obtained by
the one-sided t-test that measures whether the average shortest path
length between each gene and the known risk genes was significantly
shorter. Second, we output the average cosine similarity between each
gene’s feature embeddings and the feature embeddings of known risk
genes, as well as the FDR p-values obtained by the one-sided t-test
that measures whether the average cosine similarity of feature
embeddings between each gene and the known risk genes was significantly
higher. Supplementary Table [273]6 shows an example of the output of
the top genes identified by RL-GenRisk. To further strengthen these
analyses and facilitate intuitive interpretation, RL-GenRisk can output
the comparison of the top K genes with K randomly selected genes
(randomly sampled 100 times) in terms of their average shortest path
lengths to known risk genes in the PPI network, as well as their
average cosine similarities of feature embeddings with known risk
genes. Users can evaluate the reasonableness of the top K predicted
genes. For illustration purposes, we set K = 20 in this example. The
results show that the average shortest path lengths between the top 20
genes identified by RL-GenRisk and known risk genes within the PPI
network are significantly shorter (FDR p-value = 2.73e-4, one-sided
t-test, Supplementary Fig. [274]21A), and the average cosine
similarities of the feature representations between the top 20 genes
and known risk genes are significantly higher (FDR p-value = 4.61e-8,
one-sided t-test, Supplementary Fig. [275]21B). These results indicate
that the top 20 genes identified by RL-GenRisk are closely connected to
known risk genes in both the network structure and feature space.
To illustrate whether the top 20 genes identified by each method are
included in the known ccRCC risk gene datasets, we provided heatmaps in
Supplementary Fig. [276]4. The results indicated that RL-GenRisk
identified the highest number of known risk genes (12 genes) among the
top 20 genes, followed by nCOP (11 genes), highlighting RL-GenRisk’s
capability in identifying ccRCC risk genes. Additionally, we provided
the runtime of various methods in Supplementary Table [277]5. The
results showed that HotNet2 and Hierarchical HotNet had the longest
runtimes among the methods tested. In comparison, RL-GenRisk was
completed in approximately 8 hours. Although some methods had shorter
runtimes than RL-GenRisk, their overall performance was inferior to
that of RL-GenRisk.
RL-GenRisk can also be applied to identify risk genes for other types
of cancer. We evaluated RL-GenRisk and other methods using additional
tumor datasets, specifically focusing on Bladder Urothelial Carcinoma
(BLCA) and Glioblastoma Multiforme (GBM) data from the TCGA dataset.
The known risk genes for BLCA and GBM were extracted from the same
source as ccRCC in our study. The results indicated that RL-GenRisk
outperformed other methods on both BLCA and GBM (as shown in
Supplementary Figs. [278]14 and [279]15), highlighting the potential of
RL-GenRisk for application in other cancer types.
Methods
Our research complies with all relevant ethical regulations and
guidelines. Animal handling and experimental procedures were approved
by the Ethical Review Committees of West China Hospital, Sichuan
University. The animal studies were authorized by the Animal Ethics
Review Committees of the West China Hospital, China (No. 20230214007).
All animal experiments were strictly implemented in compliance with the
NIH Guide for the Care and Use of Laboratory Animals.
Data preparation
Consistent with previous studies^[280]29,[281]31, we used the
information from the PPI network together with gene mutation data from
patients. The feature matrix H for nodes is constructed based on the
topological information of genes in the PPI network, the mutation
frequency of genes in ccRCC patients, and the genes’ length.
PPI networks
We collected protein-protein interactions from HPRD^[282]49,
STRING-db^[283]50, Multinet^[284]51, IRefIndex^[285]52, and
HumanNet^[286]53. Then, following the previous study^[287]31, we
performed a two-step preprocess on these PPI networks. First, we
excluded the nine longest genes (TTN, MUC16, SYNE1, NEB, MUC19,
CCDC168, FSIP2, OBSCN, GPR98) as they tend to acquire numerous
mutations by chance and cover many patients^[288]31. Second, to
mitigate the noise due to the dense connectivity in PPI networks, we
applied the diffusion state distance (DSD)^[289]111 metric on these PPI
networks.
Gene mutation data
We collected somatic mutation of genes in ccRCC patients from The
Cancer Genome Atlas. Each gene corresponds to a patient list containing
patients who carry mutations in that gene.
Feature representation
The initial feature of gene v is represented as
[MATH: hv∈R1×
k :MATH]
, where k is the dimension. In RL-GenRisk, the node feature is flexible
and can have different dimensions. We set k = 3 in RL-GenRisk.
Therefore, the initial feature of gene v can be represented as
[MATH: hv=[hv
(0)
mrow>,hv(1)
,hv<
/mi>(2)] :MATH]
, which respectively measures the topological importance of v in the
PPI network, the mutation frequency in patients, and the length
information of the gene. We first use the degree of v as the first
dimension of h[v] since it can represent the topological importance of
v in the PPI network:
[MATH:
hv
(0)=Degree(v
)μd<
/mi>egree, :MATH]
1
where Degree(v) represents the node degree of gene v in the PPI
network. μ[degree] is a hyperparameter for normalization. We set
μ[degree] equal to the maximum degree of nodes in the PPI network.
Then, the second dimension of h[v] considers the mutation frequency of
gene v in ccRCC patients:
[MATH:
hv
(1)=Np<
mrow>vNp, :MATH]
2
where
[MATH:
Npv
:MATH]
represents the number of patients carrying mutation on v and N[p]
represents the total number of patients. Moreover, the initial feature
should consider the length information of the gene, since the longer
genes tend to include more mutations by change. To dilute this type of
effect, the third dimension of h[v] is designed as follows:
[MATH:
hv
(2)=Length(v
)αNpv,
:MATH]
3
where Length(v) represents the gene length of v, and α is a
normalization parameter. We set α equal to the node number of a
connected subgraph in the PPI network that each patient carried
mutations on at least one gene in this subgraph. Finally, the initial
feature of gene was defined as
[MATH: hv=[hv
(0)
mrow>,hv(1)
,hv<
/mi>(2)] :MATH]
, and the initial feature matrix for all genes was represented as
[MATH: H∈Rn×
k :MATH]
, where n is the number of genes.
Key elements in RL-GenRisk
In RL-GenRisk, the identification of ccRCC risk genes is framed as a
Markov Decision Process. At each step, the policy of RL-GenRisk
receives the current state as input and selects an action. The action
is selecting a node that connects with the sampled subgraph and
appending this node to the sampled subgraph. Thus, this sampled
subgraph adds a new node at each step. A reward is obtained after
taking an action. Therefore, there are three key elements in
RL-GenRisk, including state, action, and reward. These three key
elements are defined as follows:
State
The state s at step t is represented as s[t], which consists of the
feature matrix H and the current subgraph G[t] sampled from the PPI
network. In the first step, RL-GenRisk creates an empty subgraph and
randomly adds a node to it. The design of incorporating the sampled
subgraph into the state allows RL-GenRisk to delve into localized
network structures. This enables RL-GenRisk to focus on the interaction
relationships relevant to the current sampled subgraph at each step.
Therefore, the subgraph information in the state enables RL-GenRisk to
accurately estimate its current environment and adjust its actions
accordingly.
Action
At step t, the action a[t] is selecting a node that connects with the
sampled subgraph G[t] and appending this node to G[t]. The action space
at step t is represented as A[t], which contains nodes that connect
with G[t]. In detail, after getting Q values for all possible actions,
RL-GenRisk uses an ϵ-greedy strategy to select an action a[t]:
[MATH:
at=randoma∈Atifp<ϵargmaxa∈At
msub>Q(st;a;θ)otherwise, :MATH]
4
where Q(s[t]; a; θ) represents the Q value of action a calculated by
the policy based on the current state s[t], θ stands for the parameters
of the policy. A[t] represents the action space at step t. With a
probability p not exceeding ϵ, we select an action in action space
randomly. Otherwise, the action with the highest Q value is selected.
The ϵ-greedy strategy can enhance the exploratory nature of the policy,
effectively preventing the model from getting stuck in local optimal
during training. Same with the previous study^[290]112, we set ϵ equal
to 0.95. To balance exploration and exploitation, ϵ is decreased
gradually during the training process.
Reward
In reinforcement learning algorithms, the design of the reward is
crucial to the performance of the algorithm. In this study, we designed
a specific data-driven reward based on the current sampled subgraph.
The sampled subgraph G[t] at step t is expected to cover more patients
so that risk genes are more likely to appear in it. Instead of focusing
on an individual gene, RL-GenRisk focuses on a sampled subgraph G[t]
with node feature matrix H. This makes RL-GenRisk effectively leverage
information from both the PPI network and gene mutation data. More
importantly, this design of the reward ensures the accurate
identification of genes with high mutation frequencies, and also
identifies potential risk genes with low mutation frequencies but
functionally interacting with genes that have high mutation
frequencies. However, only considering the number of patients covered
by the sampled subgraph G[t] is not enough. Longer genes are more
likely to mutate by chance in patients. Therefore, shorter genes with
high mutation frequencies are more likely to be risk genes^[291]31. To
take gene length into account, we integrate gene length information
when designing the reward. Thus, the single-step reward is higher when
the sampled subgraph covers more patients and the genes in the subgraph
have shorter lengths. The single-step reward r[t] at step t is designed
as:
[MATH:
rt=Rt−Rt+1,Rt=δ1−NpGtNp+(1−δ)∑v∈G
tLength(v
)Npv, :MATH]
5
where r[t] represents the single-step reward at step t. δ is a weight
hyperparameter. R[t] provides an evaluation score of the current state,
with
[MATH:
NpGt
:MATH]
representing the number of patients that carry at least one mutation on
the genes in the sampled subgraph G[t],
[MATH:
Npv
:MATH]
represents the number of patients carrying mutations on gene v, and
N[p]represents the total number of patients. Length(v) represents the
length of gene v. The cumulative reward is defined as the sum of all
single-step rewards.
Policy network
In RL-GenRisk, the policy takes the current state as input and outputs
Q values for all possible actions. The policy of RL-GenRisk is
represented by a neural network which is usually referenced as the
policy network. Specifically, the policy of RL-GenRisk comprises two
main components: a graph convolutional network (GCN) and a node
evaluation network. We also use three multi-layer perceptrons (MLP) to
perform dimensionality transformation. The GCN aggregates neighborhood
information to get the representation for each node by multiplying the
graph laplacian^[292]42 with the node feature matrix. Given the feature
matrix
[MATH: H∈Rn×
k :MATH]
, n is the number of nodes and k is the dimension, the hidden
representation of the l-th graph convolutional layer is calculated as:
[MATH: Hl=σ(LHl−1Wl), :MATH]
6
where H^l represents the hidden representation of the l-th graph
convolutional layer.
[MATH: L=D^<
mrow>−12A^D^<
mrow>−12 :MATH]
represents the normalized graph laplacian, which is used to aggregate
neighborhood information. GCN preserves the original node signal by
adding self-connections:
[MATH: A^=A~+I :MATH]
,
[MATH: A~ :MATH]
represents the adjacency matrix and I represents the identity matrix.
[MATH: D^ :MATH]
represents the degree matrix for
[MATH: A^ :MATH]
and
[MATH: D^<
mrow>ii=∑
jA^<
mrow>ij :MATH]
, W^l represents the trainable matrix of the l-th graph convolutional
layer. σ represents the non-linear activation function.
In RL-GenRisk, we use two graph convolutional layers. Before the first
graph convolutional layer receives data, we use an MLP to perform
dimensionality transformation on the initial features:
[MATH: H0=W1H, :MATH]
7
where H represents the initial feature matrix, W[1] represents a
trainable projection matrix. H^0 represents the input feature matrix of
the first graph convolutional layer.
Inspired by residual networks^[293]113, we concatenate the hidden
representation matrices of two graph convolutional layers with H^0, and
use another MLP for dimension transformation to obtain the final
representations matrix
[MATH: H^ :MATH]
:
[MATH: H^=W2(Concat(H0,H1,H2)), :MATH]
8
where H^1, H^2 represent the hidden representation matrices of the two
graph convolutional layers, W[2] represents a trainable projection
matrix, and Concat( ⋅ ) represents the concatenation operator.
At step t, after getting the representations matrix by GCN, RL-GenRisk
calculates the representation of the sampled subgraph through the
averaging pooling operation and uses the MLP for dimension
transformation:
[MATH: H^<
mrow>Gt
=W3∑v∈G
tH^<
mrow>vSize(G
mi>t)
, :MATH]
9
where
[MATH: H^<
mrow>Gt
:MATH]
represents the representation of sampled subgraph G[t],
[MATH: H^<
mrow>v :MATH]
represents the representation of gene v, Size( ⋅ ) represents the
number of nodes in the sampled subgraph, and W[3] represents a
trainable projection matrix.
Then, we use a two-layer MLP as the node evaluation network to
calculate the Q values for each action in action space:
[MATH: Q(st;a;θ)=
W5(σ(W4(Concat(H^<
mrow>Gt
,H^<
mrow>v))<
/mrow>)), :MATH]
10
where Q(s[t]; a; θ) represents the Q value for action a based on the
current state s[t], θ stands for the parameters of the policy.
[MATH: H^<
mrow>Gt
:MATH]
represents the representation of sampled subgraph G[t], v represents
the gene that selected by action a, and
[MATH: H^<
mrow>v :MATH]
represents the representation of gene v. W[4] and W[5] represent two
trainable projection matrices. σ represents the non-linear activation
function. By using a two-layer MLP that can continuously update
parameters during the training process as the node evaluation network,
RL-GenRisk can better predict Q values for actions. More details about
the policy network can be found in Supplementary Table [294]3.
Policy training for high-confidence risk gene identification
Consistent with the previous study^[295]31, we sampled the training
data, randomly collecting 85% samples from 379 ccRCC patients before
the training process started. In the training process, since we model
the ccRCC risk gene identification as a Markov Decision Process, the
policy of RL-GenRisk iteratively receives the current state as input,
selects an action, and obtains a reward. The DQN algorithm^[296]43,
which is widely used in reinforcement learning methods, is employed to
train the policy of RL-GenRisk. DQN uses two sets of Q values
calculated by an online network and a target network, respectively. The
online network in our study is the neural network of RL-GenRisk. The
target network is designed to prevent the online network from
overestimating Q values. The architecture of the target network is the
same as that of the online network, but their parameters are different.
In RL-GenRisk, the loss function is defined as follows:
[MATH: L=E[(rt
+γmax<
mrow>at+1∈At+1Q(st+1;at+
1;θ′)−Q
(st;at;<
mi>θ))<
mn>2], :MATH]
11
where
[MATH: L
:MATH]
represents the loss to be minimized. r[t] stands for the reward
received at the step t, and γ is the discount factor.
[MATH:
maxat+1∈At+1Q(st+1;at+
1;θ′) :MATH]
represents the target network estimate of the maximum expected Q value
for the next state s[t+1] and possible action a[t+1]. A[t+1] represents
the action space. Q(s[t]; a[t]; θ) represents the Q value calculated by
the online network based on the current state s[t] and action a[t]. θ
and
[MATH: θ′
:MATH]
represent the parameters of the online network and target network,
respectively. θ is updated through gradient backward based on the loss
function. Then
[MATH: θ′
:MATH]
is updated through soft updates based on the parameters of the online
network as follows:
[MATH:
θ′=τθ+(1−<
mi>τ)θ′, :MATH]
12
where τ represents a hyperparameter that controls the proportion of
each update. More details about the hyperparameters in RL-GenRisk are
shown in Supplementary Table [297]4. Moreover, we provided recommended
ranges for each hyperparameter in Supplementary Table [298]4. Users can
select hyperparameters within the recommended ranges. Additionally,
users who wish to try a wider range of hyperparameter values can also
use the grid search^[299]114 to select hyperparameter values.
In the identification process, RL-GenRisk differs from existing methods
that predict outcomes following the same procedure as the training
process^[300]23,[301]29,[302]31. Instead, capitalizing on the advantage
of RL-GenRisk combined with reinforcement learning, we have designed a
concise and effective identification process. Specifically, the model
with the highest cumulative reward during the training process is
selected as the best model and loaded. The sampled subgraph is
initialized as empty, and all nodes in the PPI network are included in
the action space at the beginning. The policy calculates Q values for
all genes, and subsequently, the final ranking list of ccRCC risk genes
is ordered based on these Q values. The higher Q value indicates higher
risk. We analyzed the top 20 high-confidence risk genes in our study:
[MATH:
HRGto<
/mi>p20={
g∈G∣Q(g)∈Qt
mi>op20}, :MATH]
13
where HRG[top20] represents the top 20 high-confidence risk genes. g
represents a gene and G represents the PPI network. Q(g) represents the
Q value of gene g and Q[top20] represents the top 20 highest Q values
in the set of all Q values calculated at the identification process.
Researchers can select the number of top genes for verification based
on actual verification costs.
Performance evaluation
Known ccRCC risk genes were sourced from the IntOGen cancer-specific
database^[303]40, the Network of Cancer Genes (NCG) database^[304]54,
the Cancer Gene Census (CGC) database^[305]55, and a cancer risk gene
set derived from a recent comprehensive study by Bailey et al.^[306]56.
The known ccRCC risk genes used in our study are provided in
Supplementary Data [307]1. Due to the limited number of genes in the
CGC (nine genes) and Bailey et al. (twelve genes) lists, we ultimately
used three datasets to evaluate different methods: IntOGen, NCG, and a
“Merged” dataset that combined IntOGen, NCG, Bailey et al., and CGC.
First, the performance was evaluated using the Discounted Cumulative
Gain (DCG), which has been used for evaluating cancer risk gene
identification in the previous study^[308]29. The DCG is calculated as:
[MATH: DCG=∑i=1
Nrelgi<
/mi>lo
g2(i+1), :MATH]
14
where N represents the number of the identified ccRCC risk genes.
[MATH:
rel
gi :MATH]
is equal to 1 if the i-th gene g[i] is contained in the known risk gene
database, and 0 otherwise. Therefore, the higher the ranking of known
ccRCC risk genes in the prediction results, the higher the DCG score.
Following the previous study^[309]31, the top 100 identified ccRCC risk
genes were considered in the performance evaluation. The top 100 genes
identified by different methods are listed in Supplementary
Data [310]2. Additionally, to provide a more comprehensive evaluation
of different methods, we presented DCG curves and calculated the area
under the DCG curve. We also used the normalized DCG (N-DCG) and the
average precision (AP) as evaluation metrics. The N-DCG is calculated
as:
[MATH:
N-DCG=DCGI-DCG,I-DCG=
∑i=1Nrelgiideallog2(i+1) :MATH]
15
where
[MATH:
rel<
mi>giideal :MATH]
is the relevance score of the item at g[i] in the ideal ranking. The AP
is calculated as:
[MATH: AP=1<
mi>N
∑i=1100
mn>P(i)⋅rel<
/mrow>gi
, :MATH]
16
where N represents the number of the ccRCC risk genes. P(i) represents
the precision for the top i genes, which is calculated by dividing the
number of known risk genes in identified genes by the total number of
identified genes.
[MATH:
rel
gi :MATH]
is equal to 1 if the i-th gene g[i] is contained in the known risk gene
database, and 0 otherwise.
Threshold selection for top K genes
To determine the optimal number of top K genes for downstream analysis,
we evaluated the proportion of known ccRCC risk genes across various
values of K (where K = 20, 30, 40, …, 100). We observed that when K was
set to 20, the average proportion of known ccRCC risk genes exceeded
50% (Supplementary Fig. [311]19). As K increased, the proportion of
known ccRCC risk genes decreased (Supplementary Fig. [312]19). Based on
this observation, we suggested that selecting K = 20 generates a gene
set with a higher likelihood of including unknown ccRCC risk genes.
Additionally, considering the potential downstream analysis, selecting
K = 20 kept the analysis cost within a manageable range in our study.
Statistical analysis
Gene set enrichment analysis
We used the g:Profiler^[313]61 for running functional enrichment
analysis of the top 20 high-confidence risk genes identified by
RL-GenRisk. g:Profiler maps genes to known functional information
sources and detects statistically significantly enriched terms. We
performed enrichment analysis on Gene Ontology, Human Phenotype
Ontology, and WikiPathways. We used FDR p-value < 0.05 as the
significance threshold.
Survival analysis
Clinical data of ccRCC patients and protein expression data of EGFR
were obtained from TCGA. We used cSurvival^[314]115 to perform
progression-free survival and disease-specific survival analysis on
these data. Progression-free survival utilizes the time from
randomization or initiation of treatment to the occurrence of disease
progression or death^[315]77. Disease-specific survival refers to
deaths caused specifically by a particular disease^[316]78. Patients
were categorized into quartiles based on the expression levels of
protein encoded by EGFR. The Kaplan-Meier estimator was used to
generate survival curves, and the difference was assessed using the
log-rank test^[317]116.
In vitro and in vivo experiments
Cell lines
ACHN and 786-O cells were purchased from Shanghai Zhong Qiao Xin Zhou
Biotechnology Co., Ltd. (Shanghai, China), and were cultured in DMEM
and RPMI-1640 medium (HyClone, Utah, USA), respectively, supplemented
with 10% fetal bovine serum (FBS, Gibco, Australia) and 1% antibiotics
(penicillin and streptomycin, HyClone) in a humidified incubator
containing 5% CO[2] at 37 °C. The stable cell lines of ACHN-shEGFR,
786-O-shEGFR, 786-O-EGFR, ACHN-shPCLO, and 786-O-shPCLO were obtained
as described previously^[318]117.
High-content screening analysis
Cells were inoculated into a 96-well plate at an optimal density to
ensure ideal growth and interaction. A high-content live-cell imaging
system was utilized to monitor the cells at multiple time points,
assessing various parameters. This imaging system, equipped with
fluorescence microscopy capabilities, allows for real-time
visualization of cell morphology and dynamics. At each time point,
images were automatically captured, focusing on specific areas of
interest within each well. The analysis included measuring cell
perimeter, calculating movement speed and accumulated distance. The
above analysis was done using Opera Phenix Plus and Harmony 5.2.
Western blot analysis
Total protein was extracted using TPER solution from Thermo Fisher
Scientific. The protein concentration was measured using a BCA kit from
Pierce. The proteins were separated using 10% SDS-PAGE. The proteins
were transferred onto a PVDF membrane, and the membrane was blocked
with 5% skimmed milk for 1 h at room temperature (RT). Then, the
membrane was incubated with primary rabbit anti-human EGFR antibodies
([319]R22778, ZenBio, WB: 1:1000), rabbit anti-PCLO antibody
(HPA015858, Sigma-Aldrich, WB: 1:1000) and mouse anti-GAPDH antibody
(250133, ZenBio, WB: 1:5000) at 4 °C overnight. After three washes with
TBST for 10 min, the membrane was incubated with horseradish peroxidase
(HRP)-conjugated goat antirabbit secondary antibodies (458, MBL) for 1
h at RT. Finally, the protein bands were detected using an HRP
substrate. All experiments were performed in triplicate.
Quantitative polymerase chain reaction (qPCR)
Total RNA was extracted from the cells using a QIAGEN kit (74104), and
the RNA concentration was determined using a NanoPhotometer from Implen
(München). cDNA was synthesized using a RevertAid first-strand cDNA
synthesis kit from Thermo Fisher Scientific (Waltham). PCR was
performed using the following protocol: 98 °C for 2 min and 40 cycles
at 98 °C for 5 s and 60 °C for 10 s. The relative gene expression was
calculated using 2^−ΔΔCt. β-Actin was used as an internal control. All
experiments were performed more than triplicate.
Cell counting kit-8 assay
The cell counting kit-8 (CCK-8) assay was performed using a kit from
Dojindo. A total of 3000 cells/well were cultured in a 96-well plate
for 24 h, and 10 μL of CCK-8 solution was added to the wells. Then, the
plate was incubated for 2 h at 37 °C, and the optical density (OD)
value was detected at 450 nm using a microplate reader. All experiments
were performed in triplicate.
Cell apoptosis
An Annexin V-Alexa Fluor 647/PI Kit was purchased from 4A Biotech
(Suzhou, China). The cells were digested and washed twice with cold
phosphate-buffered saline (PBS). Next, the 1 × binding buffer was used
to suspend cells to a concentration of 1−5 × 10^6 cells/mL. Then 5 μL
of Annexin V/Alexa Fluor 647 was added to 100 μL of cells, and the
mixture was incubated at RT for 5 min in the dark. Finally, the flow
cytometry assay was performed after adding 10 μL of 20 μg/mL propidium
iodide (PI) and 400 μL of PBS. Figures illustrating the gating strategy
are provided in the [320]Supplementary Materials.
Transwell assay
Transwell migration assays were conducted using a Transwell chamber
from Corning (REF 3422, Arizona, USA). Briefly, Transwell chambers were
placed on a 24-well plate. Fresh medium containing 10% FBS in 600 μL
was added to the lower chambers, and (2–5) × 10^4 cells in 200 μL of
medium without FBS were added to the upper chamber. The 24-well plate
was incubated at 37 °C for 48 h. Cells that invaded through the chamber
were washed, fixed (20 min with 4% paraformaldehyde), and stained
(30 min with crystal violet). Then, the upper chambers were washed,
photographed, and preserved under an inverted fluorescence OBSERVER
D1/AX10 cam HRC microscope (Zeiss). Transferred cells were analyzed
using ImageJ software.
Colony formation assay
The cells (500/well) were seeded into a 6-well plate and cultured at
37 °C for 7–10 days. Then, the clones were imaged using a Celigo
imaging cytometer from Nexcelom (Lawrence), and the clones were counted
using ImageJ software. All experiments were performed in triplicate.
Animal study
Four-week-old male BALB/C-nu/nu mice were purchased from GemPharmatech.
All these mice were maintained in pathogen-free conditions at 24 °C/50%
humidity, with a light/dark cycle for 12 h, and given a free supply of
food (reproductive diet, catalog number: F010201) and water. A total of
5 × 10^6 cells were inoculated into the right flank of mice, and the
tumor volume was recorded every 3 days starting from day 17 after
injection of the tumor cells. Mice were administered with erlotinib
(S1023, Selleck, Shanghai, China) at a dose of 50 mg/kg. The tumor
volume was calculated using the following equation: L × W^2 × 0.5236,
where L is tumor length and W is tumor width^[321]117. The maximum
tumor diameter allowed by the Ethics Committee is 2 cm. We ensured that
each time mice were sacrificed the maximal body weight loss did not
exceed the limit of 20%. The animal procedures were approved by the
ethics committee of West China Hospital, Sichuan University.
Statistical analysis
All data are presented as the mean ± standard error of the mean (SEM)
or standard deviation (SD). Statistical significance for the comparison
of multiple groups (>3) and between the groups was determined using
analysis of variance (ANOVA) and Student’s paired t-test, respectively,
in GraphPad Prism 9.0. p-value < 0.05 was considered statistically
significant.
Reporting summary
Further information on research design is available in the [322]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[323]Supplementary Information^ (29.4MB, pdf)
[324]Transparent Peer Review file^ (4.8MB, pdf)
[325]Reporting Summary^ (2.4MB, pdf)
[326]41467_2025_58439_MOESM4_ESM.pdf^ (170.6KB, pdf)
Description of Additional Supplementary Files
[327]Supplementary Data 1^ (10.4KB, xlsx)
[328]Supplementary Data 2^ (46.3KB, xlsx)
[329]Supplementary Data 3^ (442.2KB, xlsx)
[330]Supplementary Data 4^ (16.5KB, xlsx)
Source data
[331]Source Data^ (6.2MB, xlsx)
Acknowledgements