Abstract

Background

   Prostate cancer is one of the common malignant tumors in men. Recent
   studies have reported that non-invasive liquid biopsy is of great
   significance in tumor diagnosis. We hope to find relevant detection
   genes to establish a diagnostic and prognostic model for prostate
   cancer.

Methods

   This study was based on the RNA expression data of prostate cancer
   patients from the The Cancer Genome Atlas (TCGA) database and the data
   of exosome-related genes from the GeneCards website. Key
   exosome-related differential genes were identified through cluster
   modeling, univariate and multivariate regression analyses. The roles of
   these genes in the occurrence and prognosis of prostate cancer were
   assessed using ROC curves and survival analysis. Validation was
   performed using prostate cancer patient data from the Gene Expression
   Omnibus (GEO) database.

Results

   Firstly, we obtained 117 exosome-related differential genes (ERDEGs)
   from the RNA expression data of prostate cancer patients in the TCGA
   database. Next, through Least Absolute Shrinkage and Selection Operator
   (LASSO) regression analysis, univariate Cox regression analysis, and
   multivariate Cox regression analysis of the ERDEGs, we obtained three
   genes that were significantly associated with prognosis (AQP2, H4C2,
   ZNF114) and calculated the risk score accordingly. Patients were
   divided into high-risk and low-risk groups based on this score, with
   significant differences in overall survival between the groups. At the
   same time, we conducted an immunological infiltration analysis on
   prostate cancer patients and Weighted correlation network analysis
   (WGCNA) on the ERDEGs. Finally, we used the GEO database ([42]GSE69223,
   [43]GSE229904) for verification and found that AQP2 and ZNF114 had good
   predictive value for the occurrence of prostate cancer.

Conclusion

   Exosome-related genes such as AQP2 and ZNF114 exhibit good performance
   as non-invasive biomarkers in predicting the status and prognosis of
   prostate cancer to avoid the issues of high invasiveness associated
   with invasive examinations.

Supplementary Information

   The online version contains supplementary material available at
   10.1007/s12672-025-03485-0.

   Keywords: Prostate cancer, Exosome-related genes, Prognostic model,
   Cancer status, Survival analysis

Background

   Prostate cancer, one of the most prevalent malignancies in the male
   genitourinary system, has shown a rising incidence globally. The
   increasing aging population and advancements in diagnostic techniques
   have contributed to a year-on-year increase in prostate cancer
   diagnosis rates, posing a significant threat to men’s health and
   quality of life [[44]1]. While prostate biopsy remains a gold standard
   for diagnosis, it is inherently invasive and can cause physical harm
   [[45]2]. Therefore, there is a pressing need for non-invasive and
   efficient diagnostic methods for the majority of patients.
   Additionally, recurrence and progression following first-line treatment
   for advanced and castration-resistant prostate cancer are common,
   highlighting the urgent need for early diagnosis and more targeted
   therapeutic strategies.

   Liquid biopsy has emerged as a promising diagnostic technique, enabling
   disease detection through the analysis of biomarkers in body fluids
   such as blood, urine, and cerebrospinal fluid. Circulating tumor cells
   (CTCs), circulating tumor DNA (ctDNA), and exosomes in the blood can
   serve as indicators of cancer presence [[46]3]. This approach offers
   the advantage of early cancer detection, often before symptom onset,
   while avoiding the trauma associated with surgical biopsies.
   Identifying diagnostic and prognostic biomarkers for early detection
   and risk stratification of prostate cancer is crucial. By measuring
   specific prognostic biomarkers, diagnostic sensitivity can be enhanced,
   potentially improving patient survival outcomes.

   Extracellular vesicles (EVs), which are small, lipid bilayer-structured
   vesicles actively secreted by cells, range in diameter from 30 to
   150 nm. These vesicles, derived from various cell types, contain a
   diverse array of bioactive molecules, including proteins, nucleic acids
   (such as miRNA, mRNA, and lncRNA), and lipids, and play a pivotal role
   in intercellular communication and signal transduction [[47]4].
   Exosomes and their cargo may be implicated in cancer pathology. The
   specific proteins, nucleic acids, and other substances within EVs hold
   promise as cancer biomarkers, warranting further investigation for
   early diagnosis, prognosis evaluation, and treatment monitoring. This
   could ultimately facilitate early diagnosis, timely treatment, and
   continuous monitoring of cancer patients [[48]5, [49]6]. Moreover,
   exosomes can function as carriers for drug or gene therapy, enabling
   precise delivery of therapeutic agents to target cells or tissues,
   thereby enhancing therapeutic efficacy and minimizing side effects.
   Exosomes also influence the behavior of infiltrating immune cells,
   playing a complex role in both tumor immune evasion and immune
   infiltration, which underscores their potential in enhancing
   immunotherapy efficacy and modulating tumor progression [[50]7].

   In our experimental approach, we analyzed RNA expression data from
   prostate cancer patients obtained from the TCGA database,
   cross-referencing these with exosome-related genes from the GeneCards
   database. Through cluster model construction, univariate and
   multivariate regression analyses, and immune infiltration analysis, we
   identified key exosome-related differential genes. Subsequent ROC curve
   and survival analysis curve construction allowed us to better delineate
   the roles of these target genes in prostate cancer development and
   prognosis. Finally, we validated the feasibility of our target
   selection using prostate cancer patient data from the GEO database.

Methods

Collection of microarray data

   Gene expression data and clinical information of patients of TCGA-PRAD
   dataset was downloaded from the UCSC Xena database
   ([51]https://xena.ucsc.edu/) [[52]8]. cBioPortal for Cancer Genomics
   ([53]https://www.cbioportal.org/) provided tumor mutation information.
   Gene transcription data of 502 prostate cancer samples and 52 normal
   samples, clinical data of 572 prostate cancer patients were obtained.
   Exosome-related genes were obtained from geneCards
   ([54]https://www.genecards.org/) and we used the genes with a relevance
   score higher than 2. After excluding duplicate genes, 117 different
   exosome-related genes were identified [[55]9].

   Next, mRNA microarray data were downloaded from the Gene Expression
   Omnibus (GEO) database ([56]http://www.ncbi.nlm.nih.gov/geo/) under the
   accession number [57]GSE221709. In this dataset, there are 12 C4-2 cell
   samples, which are divided into four groups and treated with vehicle
   and enzalutamide respectively. RNA sequencing data of C4-2 cells and
   matching EVs are obtained, and gene expression profiling analysis is
   performed based on this. Samples included in this dataset were
   sequenced by the platform of [58]GPL20795.

   Finally, we selected [59]GSE69223 and [60]GSE229904 as the validation
   sets. These datasets were obtained from the Gene Expression Omnibus
   (GEO) database. The [61]GSE69223 dataset contains 30 samples, with 15
   samples taken from primary prostate cancer and 15 samples from adjacent
   normal prostate tissues. The [62]GSE229904 dataset includes 175
   samples, among which 140 samples were obtained from prostate cancer
   tissues and 35 samples from adjacent normal prostate tissues. The
   [63]GSE69223 was sequenced using the platform [64]GPL570, and the
   [65]GSE229904 was sequenced using the platform [66]GPL18573.(Table
   [67]1,  [68]2).

Table 1.

   Comparison of baseline characteristics between prostate cancer samples
   and adjacent normal Prostate tissue samples in TCGA-PRAD database
            All        Tumor                    Normal                   P
   Number   554        502                      52
   OS. status
   Alive    542 (97.8) 490 (97.6)               52 (100)                 0.26
   Dead     12 (2.2)   12 (2.4)                 0 (0)
   OS. time 554        924.00 (524.00, 1461.75) 951.00 (733.75, 1374.25) 0.561
   Age      554        61.00 (56.00, 66.00)     61.00 (54.25, 66.00)     0.639
   Gleason
   3        172 (31)   153 (30.5)               19 (36.5)                0.008
   4        269 (48.6) 238 (47.4)               31 (59.6)
   5        113 (20.4) 111 (22.1)               2 (3.8)
   T
   T2       219 (39.5) 190 (37.8)               29 (55.8)                0.049
   T3       315 (56.9) 294 (58.6)               21 (40.4)
   T4       13 (2.3)   11 (2.2)                 2 (3.8)
   N
   N0       505 (91.2) 457 (91.0)               48 (92.3)                0.842
   N1       3 (0.5)    3 (0.6)                  0 (0)
   M
   M0       394 (71.1) 348 (69.3)               46 (88.5)                0.008
   M1       81 (14.6)  80 (15.9)                1 (1.9)
   AQP2     554        1.00 (0.00, 2.00)        4.00 (2.00, 6.88)        < 0.001
   H4C2     554        2.32 (1.00, 3.46)        1.00 (0.00, 1.90)        < 0.001
   ZNF114   554        3.32 (2.58, 4.00)        4.52 (3.93, 5.04)        < 0.001
   [69]Open in a new tab

Table 2.

   List of information of selected datasets
   TCGA-PRAD [70]GSE229904 [71]GSE69223 [72]GSE221709
   Samples of prostate cancer we select 502 140 15 6
   Samples of normal prostate tissue we select 52 35 15 0
   Samples treated with DMSO / / / 3
   Samples treated with EZ / / / 3
   Species Homo sapiens Homo sapiens Homo sapiens Homo sapiens
   Platform / [73]GPL18573 [74]GPL570 [75]GPL20795
   Role training validation validation supplement
   Overall design gene expression RNAseq in primary tumors or the adjacent
   prostate tissue for patients with prostate cancer mRNA profiles of
   primary tumors or the adjacent prostate tissue for patients with
   prostate cancer mRNA microarray expression analysis was performed by
   using 30 matched malignant and non-malignant prostate tissue samples
   from 15 prostate cancer patients. Gene expression profiling analysis of
   RNA-seq data for Evs from C4-2 cells treated with Dimethyl sulfoxide
   (DMSO) or Enzalutamide (EZ)
   [76]Open in a new tab

Differential expression analysis

   To identify differentially expressed genes (DEGs) between different
   experimental conditions, we used package “limma” (Version 3.60.6) in R
   to calculate P.Value and fold change [[77]10]. This method calculated
   fold changes and statistical significance for each gene and genes with
   an adjusted P.Value less than threshold value (P.Value<0.05) and a
   certain fold change cutoff ( |log2(fold change)|>1 or 0.585 ) were
   considered as differentially expressed.

   Volcano plots, heatmaps and boxplots were applied to visualize the
   distribution of DEGs. Volcano plots displayed the relationship between
   fold change and statistical significance, allowing for identification
   of highly significant DEGs. Heatmaps and boxplots were generated to
   provide a visual representation of the differences in interested gene
   expression.

Functional enrichment analysis of the DEGs

   Package “clusterProfiler” (version 4.12.6) and “org.Hs.eg.db” (version
   3.19.1) were used to obtain the specific genes and matched Entrez ID
   which is utilized to identify enriched Gene Ontology (GO) terms,
   including biological processes, molecular functions, and cellular
   components [[78]11]. Additionally, we analyzed functional enrichment in
   Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Next, Gene Set
   Enrichment Analysis (GSEA) was performed on three hub genes (AQP2,
   H4C2, ZNF114).

   All in all, the enrichment analysis was performed by comparing the list
   of differentially expressed genes to a background gene set and enriched
   terms were identified based on statistical significance.

Calculation of the relevant score

   Package “GSVA” (version 1.52.3) was used to calculate each sample’s
   exosome-related enrichment scores by the analysis of single-sample gene
   set enrichment (ssGSEA) [[79]12]. We evaluated the correlation between
   the gene expression of each module and exosome-related genes using the
   exosome-related gene score which was on the basis of the selected gene
   expression levels.

   All tumor samples were divided into high-risk and low-risk group based
   on the risk scores. The dividing line between two groups was the median
   risk scores.

Weighted gene co-expression network analysis (WGCNA)

   Weighted correlation network analysis (WGCNA) was conducted to assess
   the co-expression relationships and categorizing co-expressed genes
   into different modules based on WGCNA package in R (Version 1.73)
   [[80]13]. Firstly, we calculated the soft threshold through the
   scale-free graph and obtained the optimal soft threshold as 6. Next, we
   divided the co-expressed genes into multiple gene modules (with no less
   than 200 genes in each module). There were significant differences
   among the various modules. Subsequently, we calculated the correlation
   between the gene expression of each module and the enrichment score of
   exosome-related genes, and selected the modules with relatively high
   correlations. Key genes were extracted from the overlap between the
   module genes and the differential expressed genes (DEGs), giving
   priority to those genes that had the greatest correlation with the
   exosome score.

Analysis of immune cell infiltration

   Firstly, we used the CIBERSORT package to integrate the LM22 signature
   gene matrix set and the GDC-TCGA-PRAD gene expression matrix.
   Single-sample gene set enrichment analysis (ssGSEA) estimated the
   scores of specific immune cell types at the single-sample level, and
   the “GSVA” package was employed to evaluate the relative abundances of
   22 immune cell types [[81]12]. The stacked histogram displayed the
   proportion of immune cell infiltration in the samples of the
   GDC-TCGA-PRAD dataset, while the box plot illustrated the differences.
   Finally, we selected the immune cells with significant differences and
   calculated their correlations.

Construction of risk model and prognosis model

   The gene expression matrix of ERDEGs was extracted from the TCGA-PRAD
   data, and the survival time corresponding to each patient was also
   obtained. Univariate Cox proportional hazards regression analysis was
   used to calculate the association between the patients’ overall
   survival time and the gene expression levels. Eight genes were found to
   be significantly associated with prognosis. Subsequently, we carried
   out regularization using the glmnet package. LASSO regression combined
   with 10-fold cross-validation was used, with “12345” as the selected
   random seed number, to analyze the eight key genes. We selected the key
   genes to construct a risk model, which was later used to construct the
   prognostic score for overall survival. Subsequently, we adopted
   multivariate Cox proportional hazards regression analysis to screen out
   the exosome-related genes as independent prognostic factors [[82]14].
   Finally, according to the critical value of the median risk score, the
   patients were divided into high-risk and low-risk groups. The
   calculation method of the risk score is as follows:
   graphic file with name d33e831.gif

   Finally, we included a multivariate Cox regression analysis of
   different clinical indicators and risk scores to explore the prognostic
   indicators that can predict the survival outcome of patients and drew
   the Receiver Operating Characteristic (ROC) curve and Kaplan-Meier
   curve by using timeROC package (version 0.4), survival package (version
   3.7.0) and surveminer package (version 0.5).

Protein-protein interaction network

   The protein-protein interaction (PPI) network consists of individual
   proteins that interact with each other to participate in various
   pathways. We imported the significantly differentially expressed genes
   obtained by comparing the high-risk group with the low-risk group into
   the STRING database ([83]https://string-db.org/) to determine the
   interaction relationships among the genes [[84]15]. Then, we imported
   this data into the Cytoscape and the network was drawn through
   Cytoscape (version 3.9.1, [85]https://cytoscape.org/). The package of
   “cytohubba” was used to identify the top 20 hub genes.

Results

Screening for hub genes through differential analysis and WGCNA

   To investigate the alterations in exosome-related genes using a novel
   prostate cancer dataset, we obtained the TCGA-PRAD prostate cancer data
   from the UCSC Xena website. The dataset comprised RNA-related gene
   expression data from 554 samples, including 502 cancer samples and 54
   adjacent normal tissue samples. Additionally, we downloaded survival
   and phenotypic data for 572 patients, providing a robust foundation for
   exosome-related research.

   Initially, we employed the limma package (Version 3.60.6) in R to
   identify differentially expressed genes (DEGs) between tumor tissues
   and adjacent normal tissues in the TCGA-PRAD dataset. Each gene’s
   expression level was standardized, resulting in the identification of
   3,345 DEGs, including 715 upregulated genes (logFC > 1 and p < 0.05)
   and 2,630 downregulated genes (logFC < -1 and p < 0.05). A
   corresponding volcano plot was generated to visualize these findings
   (Fig. [86]1A). We then retrieved the exosome-related gene library from
   the GeneCards website, filtering for 861 exosome-related genes (ERGs)
   with a relevance score greater than 2. By intersecting the DEGs and
   ERGs, we identified 117 exosome-related differentially expressed genes
   (ERDEGs), whose expression patterns were depicted in a heatmap
   (Fig. [87]1B).

Fig. 1.

   [88]Fig. 1
   [89]Open in a new tab

   Differential expression analysis and WGCNA screening of key genes A
   Volcano plot showing different expression genes between tumor group and
   the adjacent normal group. Statistically significant upregulated and
   downregulated genes are depicted in red and blue respectively. B
   Heatmap describing the expression of ERDEGs. C Result of weighted
   correlation network analysis (WGCNA) soft power selection. D All gene
   cluster dendrogram based on different metrics. E Heatmap showing the
   correlation between each gene module identified by WGCNA and the
   exosome-related score. F Mutation analysis diagram of key genes. G
   Mutation analysis based on TCGA-PRAD dataset

   Using the expression levels of these 117 ERDEGs, we calculated the
   exosome-related score for each sample in the cohort using the ssGESA
   package in R. Following weighted gene co-expression network analysis
   (WGCNA) on the standardized TCGA cohort data, we determined the optimal
   soft-thresholding power to be 6, based on the scale-free network
   (Fig. [90]1C). All genes in the cohort were clustered into 16 modules
   (Fig. [91]1D), with the brown module exhibiting the highest correlation
   (0.86), followed by the magenta module (0.71) and the greenyellow
   module (0.43) (Fig. 1E). By intersecting the ERDEGs with the genes in
   these three modules, we identified 15 key genes.

   Through the cBioPortal website, we analyzed the mutation status of
   these 14 hub genes (excluding the EPCIP gene) in the cohort and found
   that 10 genes exhibited mutations, with NXPE4 showing the highest
   mutation rate at 2.5% (Fig. [92]1F). The predominant mutation types
   were missense mutations and deep deletions. Analysis of the overall
   gene mutations in the TCGA cohort revealed that missense mutations were
   the most common somatic mutations. Additionally, among all cancer
   patients, single nucleotide variations were most frequently C-T,
   followed by T-C and C-A (Fig. [93]1G).

Construction of clustering model

   We utilized the ConsensusClusterPlus package (Version 1.60.1) in R to
   perform molecular subtyping based on the key genes. The maximum value
   of k was set to 6. By evaluating the classification of each subtype
   across k values ranging from 2 to 6, we determined that the most
   distinct classification was achieved when k = 2 (Fig. [94]2A).
   Consequently, it is reasonable to categorize all samples into two
   groups based on the expression levels of the key genes.

Fig. 2.

   [95]Fig. 2
   [96]Open in a new tab

   Clustering molecular subtype. A Heatmap showing the clustering
   molecular subtype (k = 2) in TCGA-PRAD dataset. B Principal component
   analysis (PCA) plot of transcriptomes from cluster 1 (red) and cluster
   2 (blue). C Heatmap describing the expression of 15 key genes in
   cluster 1 and cluster 2. D Boxplot illustrating the comparison of key
   genes in different subtypes

   The 554 samples were divided into two clusters, with cluster 1
   comprising 273 samples and cluster 2 comprising 281 samples. Principal
   component analysis (PCA) of the two clusters revealed significant
   differences in their compositional profiles (Fig. [97]2B). The
   expression patterns of the key genes across the clusters were
   visualized using heatmaps and boxplots (Fig. [98]2C ~ D). Notably,
   KRT13, OLFM4, and ITGB4 exhibited significant differential expression
   between the two clusters, with all three genes showing higher
   expression levels in cluster 2.

Construction of risk model

   Based on the survival time and survival status statistics of each
   patient sample in the TCGA database, we conducted a univariate Cox
   regression analysis on the 117 exosome-related differentially expressed
   genes (ERDEGs) identified from the differential analysis between the
   tumor group and the adjacent normal tissue group. This analysis
   revealed eight genes significantly associated with prognosis (p < 0.05)
   (Fig. [99]3A). The expression levels of these eight genes in the tumor
   group and the adjacent normal tissue group were visualized using a
   heatmap and boxplot (Fig. [100]3B ~ C).

Fig. 3.

   [101]Fig. 3
   [102]Open in a new tab

   Regression analysis to select hub genes. A Results of univariate Cox
   regression analysis of the 117 screened exosome-related differential
   genes. We present eight genes that are significantly associated with
   prognosis. B Heatmap describing the expression of the 8 screened
   related genes in the tumor group and the adjacent normal group. C
   Boxplot of the expression of the eight screened related genes in the
   tumor group and the adjacent normal group. D Least absolute shrinkage
   and selection operator (LASSO) analysis and LASSO coefficient of eight
   exosome-related genes associated with prognosis. E Tenfold
   cross-validation of the LASSO Cox model. F Results of multivariate Cox
   regression analysis of the eight screened related genes. We present
   three genes that are significantly associated with prognosis

   We further performed Least Absolute Shrinkage and Selection Operator
   (LASSO) regression analysis on these eight genes to identify key genes
   for constructing the prognostic model (Fig. [103]3D ~ E). The analysis
   identified six genes (AQP2, CA4, H4C2, RAB19, RRAS, and ZNF114) as key
   variables. Subsequently, multivariate Cox regression analysis of the
   eight genes from the univariate analysis highlighted three genes
   significantly associated with prognosis (p < 0.05): AQP2, H4C2, and
   ZNF114 (Fig. [104]3F). Notably, these three key genes were included in
   the six genes identified by the LASSO regression analysis. Using these
   three genes, we established a risk scoring system. The prognostic risk
   for each prostate cancer patient was quantified using the calculated
   prognostic score, and patients were stratified into high-risk and
   low-risk groups based on the median prognostic score. The baseline
   characteristics of the selected data are summarized in Table [105]1.

   Finally, we generated survival curves for the three key genes (Figure
   S2A ~ C). Among them, the AQP2 gene showed a significant association
   with prognosis (p = 0.018). Although the H4C2 gene did not exhibit a
   significant prognostic association, the survival curve clearly
   demonstrated its ability to distinguish between high-risk and low-risk
   patient groups.

Summary of survival status, differential analysis, and enrichment analysis in
the high-risk and low-risk groups

   Firstly, we analyzed the survival status of the high-risk and low-risk
   groups and observed a significant difference in overall survival rates
   between the two groups, as evidenced by the Kaplan-Meier curve
   (p < 0.001) (Fig. [106]4 ~ B).

Fig. 4.

   [107]Fig. 4
   [108]Open in a new tab

   Survival analysis and differential expression analysis between
   high-risk and low-risk group. A Survival analysis results of different
   risk groups based on the risk score from the expression of three key
   genes. B Risk factor linkage diagram describing the survival status of
   each sample and the risk score. C Volcano plot showing different
   expression genes between high-risk group and low-risk group.
   Statistically significant upregulated genes are depicted in red. D Gene
   oncology (GO) functional analysis results illustrating the
   significantly enriched terms for the differentially expressed
   exosome-related genes, with the size of the bubbles representing the
   number of genes and the color indicating the significance level. E Gene
   concept network of the main pathway from GO functional analysis

   Next, we conducted a differential analysis on the gene expression data
   from the TCGA dataset. The results revealed that, compared to the
   high-risk group, the low-risk group exhibited 362 upregulated genes
   (logFC > 0.585 and p < 0.05), with no downregulated genes identified.
   These findings were visualized using a volcano plot (Fig. [109]4C).

   Gene Ontology (GO) enrichment analysis of these differentially
   expressed genes indicated that they were predominantly enriched in
   pathways related to humoral immune response and antibody binding, with
   significant presence in the extracellular matrix and immune complexes
   (Fig. [110]4D). The differential genes involved in these common
   enrichment pathways are displayed in Fig. [111]4E.

   Finally, we performed Gene Set Enrichment Analysis (GSEA) on the three
   key prognostic genes (Figure S3A ~ C). AQP2 was primarily associated
   with basal cell carcinoma and cytoskeleton composition in muscle cells.
   H4C2 was mainly linked to basal cell carcinoma and EGFR tyrosine kinase
   inhibitor resistance. ZNF114 was related to axon guidance and focal
   adhesion. The analysis of these three gene groups may provide valuable
   insights for further treatment strategies in prostate cancer patients.

Exploration of the effect of enzalutamide on EV-RNA in prostate cancer
patients

   After identifying exosome-related genes strongly associated with the
   diagnosis and prognosis of prostate cancer patients, we sought to
   investigate the correlation between enzalutamide treatment for prostate
   cancer and these genes. The Gene Expression Omnibus (GEO), a public
   functional genomics data repository, provides access to individual DNA
   or RNA profiles from extracellular vesicles (EVs) in blood samples
   collected from prostate cancer patients. The primary gene expression
   data used in this study were obtained from the [112]GSE221709 dataset,
   which includes 12 samples of C4-2 cells derived from prostate cancer
   epithelial cells. We analyzed 6 samples to examine changes in EV-RNA in
   prostate cancer samples following treatment with either dimethyl
   sulfoxide (DMSO) or enzalutamide (EZ). Notably, the 6 samples in the
   dataset were pre-screened and standardized. EV-RNA from the
   cell-conditioned media was extracted using the ExoRNeasy kit, and the
   gene probe used for these samples was [113]GPL20795.

   After applying a logarithmic transformation, the overall variation in
   EV-RNA content across the six sample groups was minimal, allowing for
   further analysis (Fig. [114]5A). Based on predefined criteria for
   differentially expressed genes, we identified 3080 downregulated genes
   and 103 upregulated genes in the EZ treatment group compared to the
   DMSO control group. Statistical analysis of the differences in EV-RNA
   content between the EZ and DMSO treatment groups was visualized using a
   volcano plot (Fig. [115]5B).

Fig. 5.

   [116]Fig. 5
   [117]Open in a new tab

   Differential expression analysis based on [118]GSE221709 dataset. A Box
   plots describing the EV-RNA content of each sample after data
   standardization. B Volcano plot showing DEG about EV-RNA in CRPC
   patients who received DMSO or EZ. Statistically significant upregulated
   and downregulated EV-RNA are depicted in red and blue respectively. Top
   upregulated and downregulated EV-RNA are highlighted. C Gene oncology
   (GO) functional analysis results illustrating the significantly
   enriched terms for the differentially expressed EV-RNA, with the size
   of the bubbles representing the number of genes and the color
   indicating the significance level. D Kyoto Encyclopedia of Genes and
   Genomes (KEGG) pathway enrichment analysis performed on genes
   upregulated or downregulated in EV-RNA from CRPC patients treated by
   DMSO or EZ

   Enrichment analysis was performed on all differentially expressed
   EV-RNAs between the enzalutamide and control groups, and the results
   were presented in a bubble plot (Fig. [119]5C). The analysis revealed
   that differentially expressed genes were significantly associated with
   pathways such as transmembrane transport proteins, ion channel complex
   formation, and chemotaxis. This suggests that enzalutamide’s mechanism
   of action in treating prostate cancer, through androgen receptor
   inhibition, may mediate changes in ion channels and transmembrane
   proteins on the surface of tumor cells. The KEGG enrichment analysis
   (Fig. [120]5D) further indicated that downregulated EV-RNA genes were
   predominantly involved in androgen and estrogen metabolism pathways and
   renin secretion pathways, aligning with enzalutamide’s therapeutic
   mechanism.

   Further investigation revealed that among the 103 upregulated genes, 10
   genes (MSN, ANXA6, PRNP, CAV1, ITGB3, TUBB6, KRT1, CAMP, ITGB4,
   ANGPTL1) were identified as exosome-related differentially expressed
   genes (ERDEGs). This implies that enzalutamide may enhance the
   expression of these genes in exosomes through various pathways while
   targeting the androgen receptor, thereby inhibiting prostate cancer
   proliferation. Additionally, by calculating the expression changes of
   the three key genes, we found that the log fold change of AQP2 was 2.
   However, the p-value for AQP2 was 0.1, potentially due to the small
   sample size, which does not preclude AQP2’s role in enzalutamide’s
   therapeutic effect on prostate cancer.

Analysis of protein-protein interaction network and immune infiltration

   We compiled a list of 362 differentially expressed genes (DEGs). These
   genes were imported into the STRING database to calculate their
   interaction relationships, and the resulting data were subsequently
   integrated into Cytoscape software. Using Cytoscape (Version 3.9.1,
   [121]www.cytoscape.org/), we constructed a protein-protein interaction
   (PPI) network for the 362 DEGs to explore gene interaction
   relationships and identify common protein domains (Figure S4A). The
   genes were sorted based on their degree of connectivity, with the color
   depth of each gene positively correlated to its degree. Following this,
   we employed the “cytohubba” package to identify the top 20 hub genes
   (Figure S4B). These 20 genes were divided into two groups: one group
   comprised histone-related genes, including H4C2, H3C13, and H2AC8,
   while the other group consisted of keratin-related genes, such as KRT14
   and KRT5. This finding suggests that keratin and histone may serve as
   potential evaluation indicators for prostate cancer risk
   stratification. Notably, the proteins KRT15 and H3C13 exhibited a
   relatively high degree of connectivity with other genes among the DEGs.

   Finally, we calculated the infiltration abundances of 22 immune cell
   types to investigate differences in immune infiltration between the
   high-risk and low-risk patient groups. The proportions of each immune
   cell type in every sample were visualized using a bar chart
   (Fig. [122]6A). Additionally, we applied the Mann-Whitney U test to
   statistically analyze the differences in immune cell infiltration
   between the two groups, with the results displayed in boxplots
   (Fig. [123]6B). The analysis revealed significant differences in the
   infiltration abundances of four immune cell types: CD8 + T cells, M1
   macrophages, eosinophils, and neutrophils. We further calculated the
   correlations among these four immune cell types in both the high-risk
   and low-risk groups. Notably, in the high-risk group, the correlation
   between eosinophils and CD8 + T cells was particularly significant
   (Fig. 6[124]6C~ D).

Fig. 6.

   [125]Fig. 6
   [126]Open in a new tab

   Analysis of immune infiltration. A Stacked histogram of CIBERSORT
   immune infiltration analysis in high-risk group and low-risk group in
   TCGA-PRAD dataset. B Boxplot describing comparison about immune
   infiltration analysis between high-risk group and low-risk group. C ~ D
   Correlation analysis of the infiltration abundance describing the four
   immune cells with significant differences between high-risk group and
   low-risk group

Establishment of clinical risk model

   Through multivariate Cox regression analysis of the patient’s Gleason
   score, TNM stage, risk score, and age, we found that only the risk
   score was significantly associated with the patient’s overall survival
   rate (p = 0.039) (Fig. [127]7A). Subsequently, we used the timeROC
   package in R to construct a time-dependent receiver operating
   characteristic (ROC) curve for the TCGA-PRAD data risk model. The
   results showed that the area under the curve (AUC) for the three-year
   overall survival rate based on the risk score was 0.892, which was
   significantly higher than that of other clinical characteristics
   (Fig. [128]7B). These findings indicate that the risk score-based model
   for stratifying patients into high- and low-risk groups is clinically
   valuable for prognosis and survival period prediction. Combined with
   the results from Fig. [129]4A, it is evident that the overall survival
   rate of prostate cancer patients in the high-risk group is
   significantly lower. Finally, we generated a heatmap to visualize the
   distribution of clinical characteristics in patients from the high-risk
   and low-risk groups (Fig. [130]7C). The heatmap revealed that all
   patients with distant metastasis belonged to the high-risk group.
   Additionally, within the low-risk group, the proportions of patients
   with lymph node metastasis and those over 65 years old were slightly
   smaller compared to the high-risk group.

Fig. 7.

   [131]Fig. 7
   [132]Open in a new tab

   Risk model for prediction of prognosis. A Multivariate Cox regression
   analysis results of clinical indexes and the risk-score of three key
   genes. B The receiver operating characteristic (ROC) curves of clinical
   indexes and the risk-score of three key genes. C Heatmaps of
   correlations among different clinical characteristics and the
   risk-score of three key genes

Prediction of cancer status

   In the TCGA-PRAD dataset, multivariate Cox regression analysis
   identified three exosome-related differentially expressed genes
   (ERDEGs) significantly associated with prognosis: AQP2 (p = 0.018),
   H4C2 (p = 0.047), and ZNF114 (p = 0.031). The ROC curves for these
   three genes are presented in the figure, with AUC values of 0.826 for
   AQP2 (Fig. [133]8A), 0.742 for H4C2 (Fig. [134]8B), and 0.802 for
   ZNF114 (Fig. [135]8C). These results indicate that all three genes are
   closely related to the development of prostate cancer.

Fig. 8.

   [136]Fig. 8
   [137]Open in a new tab

   ROC analysis of validation for cancer status prediction in TCGA-PRAD
   Dataset and [138]GSE69223 and [139]GSE229904 dataset. ROC Curves for
   Verification of Cancer Status Prediction in the TCGA-PRAD Dataset A
   AQP2 B H4C2 C ZNF114 ROC Curves for Verification of Cancer Status
   Prediction in the GEO Dataset ([140]GSE69223) D AQP2 E ZNF114 ROC
   Curves for Verification of Cancer Status Prediction in the GEO Dataset
   ([141]GSE229904) F AQP2 G H4C2 H ZNF114

   To validate these findings, we utilized two additional datasets,
   [142]GSE69223 and [143]GSE229904. The [144]GSE69223 dataset comprises
   30 samples, including 15 from primary prostate cancer tissues and 15
   from adjacent normal prostate tissues. Since the gene expression matrix
   of this dataset does not include the H4C2 gene, we only calculated and
   plotted the ROC curves for AQP2 and ZNF114. The results showed AUC
   values of 0.809 for AQP2 (Fig. [145]8D) and 0.804 for ZNF114
   (Fig. [146]8E), confirming that both genes can effectively predict the
   occurrence of prostate cancer. The [147]GSE229904 dataset includes 175
   samples, with 140 from prostate cancer tissues and 35 from adjacent
   normal prostate tissues. The AUC values for this dataset were 0.736 for
   AQP2 (Fig. [148]8F), 0.446 for H4C2 (Fig. [149]8G), and 0.762 for
   ZNF114 (Fig. [150]8H). These results suggest that AQP2 and ZNF114 can
   reliably distinguish prostate cancer from normal prostate tissues,
   while H4C2 exhibits limited discriminatory ability.

Discussion

   Prostate cancer (PCa) is the most prevalent malignant tumor in the male
   genitourinary system. In Europe and America, the incidence of prostate
   cancer has consistently remained high. In China, with the accelerating
   aging population, the gradual adoption of Western lifestyles, and
   continuous advancements in diagnostic techniques, the detection rate of
   prostate cancer is also steadily increasing [[151]16]. Early diagnosis,
   early treatment, and early prediction of prostate cancer have become
   critical objectives in current research efforts.

   In recent years, there has been a growing focus on the application of
   liquid biopsy in prostate cancer, as it serves as a gold standard for
   non-invasive examination and precise diagnosis and treatment of cancers
   and related diseases [[152]17–[153]21]. Among these studies, two
   reports have explored genomic analysis following drug treatment for
   prostate cancer. These studies not only identified genomic drivers of
   resistance to first-line androgen receptor (AR)-directed therapy in
   castration-resistant prostate cancer (CRPC) to obtain drug resistance
   biomarkers but also investigated changes in the immune microenvironment
   of CRPC after first-line drug treatment, providing valuable insights
   for further therapeutic strategies [[154]18, [155]20]. Additionally, a
   previous study conducted whole genome sequencing analysis on plasma
   ctDNA from patients with metastatic castration-resistant prostate
   cancer, revealing the evolutionary history, subclonal dynamics, and
   treatment resistance mechanisms of metastatic cancer [[156]19]. This
   study also explored transcriptional factor signaling activity in
   metastatic cancer, offering considerations for treatment selection.

   Currently, research on ctDNA is relatively mature. In liquid biopsy,
   beyond using ctDNA to establish predictive diagnostic models,
   fragmentomics, methylation changes, and nucleosome positioning also
   provide innovative approaches for liquid biopsy and simplify its
   diagnostic process. It has been discovered that extracellular vesicles
   (EVs) in plasma carry DNA and RNA as cargo. Some studies have
   demonstrated methods for specifically enriching RNA cargo of
   protein-coding transcripts in EVs, enabling transcriptomic analysis of
   protein-coding genes and detecting the expression of metastatic
   prostate cancer-related genes in circulating EV-RNA [[157]22–[158]24].
   Recently, a novel study proposed an explainable artificial
   intelligence-based PCa screening system that integrates a highly
   sensitive dual-gate field-effect transistor-based multi-marker
   biosensor [[159]24]. This system significantly enhances diagnostic
   accuracy by analyzing biomarkers in exosomes derived from urine.

   Based on the TCGA-PRAD database, we analyzed the gene expression
   profiles of prostate cancer samples and adjacent normal prostate tissue
   samples. We identified differentially expressed exosome-related genes
   (ERDEGs) and investigated their association with prostate cancer
   survival. In addition to establishing a risk scoring system, we
   identified three genes significantly correlated with prognosis. We also
   performed differential expression and enrichment analyses on prostate
   cancer patients stratified into high-risk and low-risk groups to
   explore common pathways associated with the differentially expressed
   genes. Furthermore, we conducted survival analysis on patients in
   different risk groups to evaluate the predictive accuracy of the risk
   scoring system for prostate cancer diagnosis and prognosis. Finally, we
   validated this model using two datasets from the GEO database,
   revealing that the proteins AQP2 and ZNF114 exhibit strong predictive
   value for prostate cancer diagnosis.

   As a key member of the aquaporin family, AQP2 primarily regulates water
   reabsorption in the kidneys, thereby controlling urine concentration
   [[160]25]. AQP2, a target of vasopressin, plays a critical role in
   maintaining the body’s water balance under vasopressin regulation. The
   AQP2 water channel undergoes various post-translational modifications,
   such as phosphorylation, ubiquitination [[161]26], and
   glutathionylation [[162]27], which influence its localization,
   stability, and function. Studies have shown that AQP2 upregulation
   promotes bladder fibrosis, leading to bladder outlet obstruction
   [[163]28]. Additionally, AQP2 in urine is predominantly found in
   extracellular vesicles [[164]29] and serves as a diagnostic marker for
   urinary system diseases [[165]30]. As a cargo in exosomes, AQP2
   downregulation is significantly associated with chronic renal failure
   [[166]31]. Our findings suggest that exosome-related AQP2 expression is
   strongly linked to urinary system diseases, and its inclusion in the
   prostate cancer diagnostic and prognostic model enhances diagnostic
   accuracy. ZNF114, a transcription factor in the zinc finger protein
   family, plays a vital role in gene expression regulation, cell
   differentiation, embryonic development, and disease progression. In one
   study, ZNF114 was incorporated into a three-gene diagnostic model for
   oral cancer diagnosis and prognosis prediction [[167]32]. Zinc finger
   proteins also play a dual role in tumorigenesis, with some promoting
   cancer progression, such as ZNF574 in ovarian cancer [[168]33] and
   ZNF488 in pancreatic cancer [[169]34], while others, like ZNF331,
   inhibit tumor proliferation in head and neck cancers [[170]35]. Based
   on our results, we hypothesize that ZNF114 plays a pivotal role in
   prostate cancer progression.

   To explore potential mechanisms underlying castration-resistant
   prostate cancer (CRPC) and drug resistance, we investigated changes in
   circulating EV-RNA following androgen receptor signaling inhibitor
   treatment. Our findings suggest that AQP2 may be involved in pathways
   related to enzalutamide treatment for prostate cancer.

   Additionally, we performed immunological analysis on the TCGA-PRAD
   data. GO enrichment analysis revealed that differentially expressed
   genes between high-risk and low-risk groups are enriched in antibody
   binding and humoral immune response pathways, with these genes commonly
   found in the extracellular matrix and immune complexes. Given the
   immune system’s role in tumor suppression, we examined immune cell
   infiltration abundances between high-risk and low-risk groups and
   identified differences in CD8 + T cells, M1 macrophages, eosinophils,
   and neutrophils. One study has highlighted the importance of the tumor
   immune microenvironment in cancer progression [[171]36], suggesting
   that biomarker-based treatment strategies could be combined with
   immunotherapy to optimize tumor treatment outcomes.

   Finally, we conducted weighted gene co-expression network analysis on
   the TCGA-PRAD data. Genes were clustered into multiple modules, and the
   three modules most strongly correlated with exosome-related genes were
   selected. Through this analysis, we identified 15 key genes, analyzed
   their mutation status, and used them to classify all samples. By
   comparing Cluster 1 and Cluster 2, we evaluated the expression patterns
   of these key genes, further underscoring the significance of
   exosome-related gene research.

   In summary, our findings highlight the potential of minimally invasive
   biomarkers, particularly given the challenges associated with repeated
   tissue biopsies in clinical practice. The model we established not only
   predicts prostate cancer occurrence and patient prognosis but also
   provides a foundation for developing new combination therapies using
   these biomarkers.

Limitations of the study

   Firstly, both the number of samples in the dataset used in the early
   stage of our experiment and that in the final validation set are
   limited. Caution is required since the differentially expressed genes
   obtained from the analysis may be affected by errors from sampling,
   detection and other factors. Therefore, it is necessary to expand the
   sample size to rule out the sample bias in the enrichment analysis.

   Secondly, the gene counts in the TCGA public database were used in our
   training set. If we could conduct research using animal tumor models or
   human tumor samples, the obtained results would be more convincing. We
   need to further acquire clinical samples for analysis.

   Finally, we only used exosome-related genes to establish the diagnostic
   and prognostic model. Although some typical biomarkers may be missed in
   this way, it also enhances our understanding of exosome-related marker
   genes and their utility in diagnosis and prognosis.

Conclusion

   The characteristics of exosome-related genes have a good effect in
   predicting the status and prognosis of prostate cancer. Specifically, a
   risk model based on the expression levels of AQP2, H4C2 and ZNF114 can
   predict the prognosis of prostate cancer relatively well. AQP2 and
   ZNF114 can be used to evaluate the cancer status of prostate cancer. In
   addition, AQP2 may be related to the inhibitory effect of enzalutamide
   on prostate cancer. Therefore, AQP2 and ZNF114 can serve as
   non-invasive biomarkers for the early diagnosis and prognostic
   assessment of prostate cancer patients, avoiding the issues of high
   invasiveness and numerous complications associated with invasive
   examinations.

Supplementary Information

   Below is the link to the electronic supplementary material.
   [172]Supplementary Material 1^ (2.7MB, zip)

Acknowledgements