Abstract Background Prostate cancer is one of the common malignant tumors in men. Recent studies have reported that non-invasive liquid biopsy is of great significance in tumor diagnosis. We hope to find relevant detection genes to establish a diagnostic and prognostic model for prostate cancer. Methods This study was based on the RNA expression data of prostate cancer patients from the The Cancer Genome Atlas (TCGA) database and the data of exosome-related genes from the GeneCards website. Key exosome-related differential genes were identified through cluster modeling, univariate and multivariate regression analyses. The roles of these genes in the occurrence and prognosis of prostate cancer were assessed using ROC curves and survival analysis. Validation was performed using prostate cancer patient data from the Gene Expression Omnibus (GEO) database. Results Firstly, we obtained 117 exosome-related differential genes (ERDEGs) from the RNA expression data of prostate cancer patients in the TCGA database. Next, through Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis, univariate Cox regression analysis, and multivariate Cox regression analysis of the ERDEGs, we obtained three genes that were significantly associated with prognosis (AQP2, H4C2, ZNF114) and calculated the risk score accordingly. Patients were divided into high-risk and low-risk groups based on this score, with significant differences in overall survival between the groups. At the same time, we conducted an immunological infiltration analysis on prostate cancer patients and Weighted correlation network analysis (WGCNA) on the ERDEGs. Finally, we used the GEO database ([42]GSE69223, [43]GSE229904) for verification and found that AQP2 and ZNF114 had good predictive value for the occurrence of prostate cancer. Conclusion Exosome-related genes such as AQP2 and ZNF114 exhibit good performance as non-invasive biomarkers in predicting the status and prognosis of prostate cancer to avoid the issues of high invasiveness associated with invasive examinations. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-025-03485-0. Keywords: Prostate cancer, Exosome-related genes, Prognostic model, Cancer status, Survival analysis Background Prostate cancer, one of the most prevalent malignancies in the male genitourinary system, has shown a rising incidence globally. The increasing aging population and advancements in diagnostic techniques have contributed to a year-on-year increase in prostate cancer diagnosis rates, posing a significant threat to men’s health and quality of life [[44]1]. While prostate biopsy remains a gold standard for diagnosis, it is inherently invasive and can cause physical harm [[45]2]. Therefore, there is a pressing need for non-invasive and efficient diagnostic methods for the majority of patients. Additionally, recurrence and progression following first-line treatment for advanced and castration-resistant prostate cancer are common, highlighting the urgent need for early diagnosis and more targeted therapeutic strategies. Liquid biopsy has emerged as a promising diagnostic technique, enabling disease detection through the analysis of biomarkers in body fluids such as blood, urine, and cerebrospinal fluid. Circulating tumor cells (CTCs), circulating tumor DNA (ctDNA), and exosomes in the blood can serve as indicators of cancer presence [[46]3]. This approach offers the advantage of early cancer detection, often before symptom onset, while avoiding the trauma associated with surgical biopsies. Identifying diagnostic and prognostic biomarkers for early detection and risk stratification of prostate cancer is crucial. By measuring specific prognostic biomarkers, diagnostic sensitivity can be enhanced, potentially improving patient survival outcomes. Extracellular vesicles (EVs), which are small, lipid bilayer-structured vesicles actively secreted by cells, range in diameter from 30 to 150 nm. These vesicles, derived from various cell types, contain a diverse array of bioactive molecules, including proteins, nucleic acids (such as miRNA, mRNA, and lncRNA), and lipids, and play a pivotal role in intercellular communication and signal transduction [[47]4]. Exosomes and their cargo may be implicated in cancer pathology. The specific proteins, nucleic acids, and other substances within EVs hold promise as cancer biomarkers, warranting further investigation for early diagnosis, prognosis evaluation, and treatment monitoring. This could ultimately facilitate early diagnosis, timely treatment, and continuous monitoring of cancer patients [[48]5, [49]6]. Moreover, exosomes can function as carriers for drug or gene therapy, enabling precise delivery of therapeutic agents to target cells or tissues, thereby enhancing therapeutic efficacy and minimizing side effects. Exosomes also influence the behavior of infiltrating immune cells, playing a complex role in both tumor immune evasion and immune infiltration, which underscores their potential in enhancing immunotherapy efficacy and modulating tumor progression [[50]7]. In our experimental approach, we analyzed RNA expression data from prostate cancer patients obtained from the TCGA database, cross-referencing these with exosome-related genes from the GeneCards database. Through cluster model construction, univariate and multivariate regression analyses, and immune infiltration analysis, we identified key exosome-related differential genes. Subsequent ROC curve and survival analysis curve construction allowed us to better delineate the roles of these target genes in prostate cancer development and prognosis. Finally, we validated the feasibility of our target selection using prostate cancer patient data from the GEO database. Methods Collection of microarray data Gene expression data and clinical information of patients of TCGA-PRAD dataset was downloaded from the UCSC Xena database ([51]https://xena.ucsc.edu/) [[52]8]. cBioPortal for Cancer Genomics ([53]https://www.cbioportal.org/) provided tumor mutation information. Gene transcription data of 502 prostate cancer samples and 52 normal samples, clinical data of 572 prostate cancer patients were obtained. Exosome-related genes were obtained from geneCards ([54]https://www.genecards.org/) and we used the genes with a relevance score higher than 2. After excluding duplicate genes, 117 different exosome-related genes were identified [[55]9]. Next, mRNA microarray data were downloaded from the Gene Expression Omnibus (GEO) database ([56]http://www.ncbi.nlm.nih.gov/geo/) under the accession number [57]GSE221709. In this dataset, there are 12 C4-2 cell samples, which are divided into four groups and treated with vehicle and enzalutamide respectively. RNA sequencing data of C4-2 cells and matching EVs are obtained, and gene expression profiling analysis is performed based on this. Samples included in this dataset were sequenced by the platform of [58]GPL20795. Finally, we selected [59]GSE69223 and [60]GSE229904 as the validation sets. These datasets were obtained from the Gene Expression Omnibus (GEO) database. The [61]GSE69223 dataset contains 30 samples, with 15 samples taken from primary prostate cancer and 15 samples from adjacent normal prostate tissues. The [62]GSE229904 dataset includes 175 samples, among which 140 samples were obtained from prostate cancer tissues and 35 samples from adjacent normal prostate tissues. The [63]GSE69223 was sequenced using the platform [64]GPL570, and the [65]GSE229904 was sequenced using the platform [66]GPL18573.(Table [67]1, [68]2). Table 1. Comparison of baseline characteristics between prostate cancer samples and adjacent normal Prostate tissue samples in TCGA-PRAD database All Tumor Normal P Number 554 502 52 OS. status Alive 542 (97.8) 490 (97.6) 52 (100) 0.26 Dead 12 (2.2) 12 (2.4) 0 (0) OS. time 554 924.00 (524.00, 1461.75) 951.00 (733.75, 1374.25) 0.561 Age 554 61.00 (56.00, 66.00) 61.00 (54.25, 66.00) 0.639 Gleason 3 172 (31) 153 (30.5) 19 (36.5) 0.008 4 269 (48.6) 238 (47.4) 31 (59.6) 5 113 (20.4) 111 (22.1) 2 (3.8) T T2 219 (39.5) 190 (37.8) 29 (55.8) 0.049 T3 315 (56.9) 294 (58.6) 21 (40.4) T4 13 (2.3) 11 (2.2) 2 (3.8) N N0 505 (91.2) 457 (91.0) 48 (92.3) 0.842 N1 3 (0.5) 3 (0.6) 0 (0) M M0 394 (71.1) 348 (69.3) 46 (88.5) 0.008 M1 81 (14.6) 80 (15.9) 1 (1.9) AQP2 554 1.00 (0.00, 2.00) 4.00 (2.00, 6.88) < 0.001 H4C2 554 2.32 (1.00, 3.46) 1.00 (0.00, 1.90) < 0.001 ZNF114 554 3.32 (2.58, 4.00) 4.52 (3.93, 5.04) < 0.001 [69]Open in a new tab Table 2. List of information of selected datasets TCGA-PRAD [70]GSE229904 [71]GSE69223 [72]GSE221709 Samples of prostate cancer we select 502 140 15 6 Samples of normal prostate tissue we select 52 35 15 0 Samples treated with DMSO / / / 3 Samples treated with EZ / / / 3 Species Homo sapiens Homo sapiens Homo sapiens Homo sapiens Platform / [73]GPL18573 [74]GPL570 [75]GPL20795 Role training validation validation supplement Overall design gene expression RNAseq in primary tumors or the adjacent prostate tissue for patients with prostate cancer mRNA profiles of primary tumors or the adjacent prostate tissue for patients with prostate cancer mRNA microarray expression analysis was performed by using 30 matched malignant and non-malignant prostate tissue samples from 15 prostate cancer patients. Gene expression profiling analysis of RNA-seq data for Evs from C4-2 cells treated with Dimethyl sulfoxide (DMSO) or Enzalutamide (EZ) [76]Open in a new tab Differential expression analysis To identify differentially expressed genes (DEGs) between different experimental conditions, we used package “limma” (Version 3.60.6) in R to calculate P.Value and fold change [[77]10]. This method calculated fold changes and statistical significance for each gene and genes with an adjusted P.Value less than threshold value (P.Value<0.05) and a certain fold change cutoff ( |log2(fold change)|>1 or 0.585 ) were considered as differentially expressed. Volcano plots, heatmaps and boxplots were applied to visualize the distribution of DEGs. Volcano plots displayed the relationship between fold change and statistical significance, allowing for identification of highly significant DEGs. Heatmaps and boxplots were generated to provide a visual representation of the differences in interested gene expression. Functional enrichment analysis of the DEGs Package “clusterProfiler” (version 4.12.6) and “org.Hs.eg.db” (version 3.19.1) were used to obtain the specific genes and matched Entrez ID which is utilized to identify enriched Gene Ontology (GO) terms, including biological processes, molecular functions, and cellular components [[78]11]. Additionally, we analyzed functional enrichment in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Next, Gene Set Enrichment Analysis (GSEA) was performed on three hub genes (AQP2, H4C2, ZNF114). All in all, the enrichment analysis was performed by comparing the list of differentially expressed genes to a background gene set and enriched terms were identified based on statistical significance. Calculation of the relevant score Package “GSVA” (version 1.52.3) was used to calculate each sample’s exosome-related enrichment scores by the analysis of single-sample gene set enrichment (ssGSEA) [[79]12]. We evaluated the correlation between the gene expression of each module and exosome-related genes using the exosome-related gene score which was on the basis of the selected gene expression levels. All tumor samples were divided into high-risk and low-risk group based on the risk scores. The dividing line between two groups was the median risk scores. Weighted gene co-expression network analysis (WGCNA) Weighted correlation network analysis (WGCNA) was conducted to assess the co-expression relationships and categorizing co-expressed genes into different modules based on WGCNA package in R (Version 1.73) [[80]13]. Firstly, we calculated the soft threshold through the scale-free graph and obtained the optimal soft threshold as 6. Next, we divided the co-expressed genes into multiple gene modules (with no less than 200 genes in each module). There were significant differences among the various modules. Subsequently, we calculated the correlation between the gene expression of each module and the enrichment score of exosome-related genes, and selected the modules with relatively high correlations. Key genes were extracted from the overlap between the module genes and the differential expressed genes (DEGs), giving priority to those genes that had the greatest correlation with the exosome score. Analysis of immune cell infiltration Firstly, we used the CIBERSORT package to integrate the LM22 signature gene matrix set and the GDC-TCGA-PRAD gene expression matrix. Single-sample gene set enrichment analysis (ssGSEA) estimated the scores of specific immune cell types at the single-sample level, and the “GSVA” package was employed to evaluate the relative abundances of 22 immune cell types [[81]12]. The stacked histogram displayed the proportion of immune cell infiltration in the samples of the GDC-TCGA-PRAD dataset, while the box plot illustrated the differences. Finally, we selected the immune cells with significant differences and calculated their correlations. Construction of risk model and prognosis model The gene expression matrix of ERDEGs was extracted from the TCGA-PRAD data, and the survival time corresponding to each patient was also obtained. Univariate Cox proportional hazards regression analysis was used to calculate the association between the patients’ overall survival time and the gene expression levels. Eight genes were found to be significantly associated with prognosis. Subsequently, we carried out regularization using the glmnet package. LASSO regression combined with 10-fold cross-validation was used, with “12345” as the selected random seed number, to analyze the eight key genes. We selected the key genes to construct a risk model, which was later used to construct the prognostic score for overall survival. Subsequently, we adopted multivariate Cox proportional hazards regression analysis to screen out the exosome-related genes as independent prognostic factors [[82]14]. Finally, according to the critical value of the median risk score, the patients were divided into high-risk and low-risk groups. The calculation method of the risk score is as follows: graphic file with name d33e831.gif Finally, we included a multivariate Cox regression analysis of different clinical indicators and risk scores to explore the prognostic indicators that can predict the survival outcome of patients and drew the Receiver Operating Characteristic (ROC) curve and Kaplan-Meier curve by using timeROC package (version 0.4), survival package (version 3.7.0) and surveminer package (version 0.5). Protein-protein interaction network The protein-protein interaction (PPI) network consists of individual proteins that interact with each other to participate in various pathways. We imported the significantly differentially expressed genes obtained by comparing the high-risk group with the low-risk group into the STRING database ([83]https://string-db.org/) to determine the interaction relationships among the genes [[84]15]. Then, we imported this data into the Cytoscape and the network was drawn through Cytoscape (version 3.9.1, [85]https://cytoscape.org/). The package of “cytohubba” was used to identify the top 20 hub genes. Results Screening for hub genes through differential analysis and WGCNA To investigate the alterations in exosome-related genes using a novel prostate cancer dataset, we obtained the TCGA-PRAD prostate cancer data from the UCSC Xena website. The dataset comprised RNA-related gene expression data from 554 samples, including 502 cancer samples and 54 adjacent normal tissue samples. Additionally, we downloaded survival and phenotypic data for 572 patients, providing a robust foundation for exosome-related research. Initially, we employed the limma package (Version 3.60.6) in R to identify differentially expressed genes (DEGs) between tumor tissues and adjacent normal tissues in the TCGA-PRAD dataset. Each gene’s expression level was standardized, resulting in the identification of 3,345 DEGs, including 715 upregulated genes (logFC > 1 and p < 0.05) and 2,630 downregulated genes (logFC < -1 and p < 0.05). A corresponding volcano plot was generated to visualize these findings (Fig. [86]1A). We then retrieved the exosome-related gene library from the GeneCards website, filtering for 861 exosome-related genes (ERGs) with a relevance score greater than 2. By intersecting the DEGs and ERGs, we identified 117 exosome-related differentially expressed genes (ERDEGs), whose expression patterns were depicted in a heatmap (Fig. [87]1B). Fig. 1. [88]Fig. 1 [89]Open in a new tab Differential expression analysis and WGCNA screening of key genes A Volcano plot showing different expression genes between tumor group and the adjacent normal group. Statistically significant upregulated and downregulated genes are depicted in red and blue respectively. B Heatmap describing the expression of ERDEGs. C Result of weighted correlation network analysis (WGCNA) soft power selection. D All gene cluster dendrogram based on different metrics. E Heatmap showing the correlation between each gene module identified by WGCNA and the exosome-related score. F Mutation analysis diagram of key genes. G Mutation analysis based on TCGA-PRAD dataset Using the expression levels of these 117 ERDEGs, we calculated the exosome-related score for each sample in the cohort using the ssGESA package in R. Following weighted gene co-expression network analysis (WGCNA) on the standardized TCGA cohort data, we determined the optimal soft-thresholding power to be 6, based on the scale-free network (Fig. [90]1C). All genes in the cohort were clustered into 16 modules (Fig. [91]1D), with the brown module exhibiting the highest correlation (0.86), followed by the magenta module (0.71) and the greenyellow module (0.43) (Fig. 1E). By intersecting the ERDEGs with the genes in these three modules, we identified 15 key genes. Through the cBioPortal website, we analyzed the mutation status of these 14 hub genes (excluding the EPCIP gene) in the cohort and found that 10 genes exhibited mutations, with NXPE4 showing the highest mutation rate at 2.5% (Fig. [92]1F). The predominant mutation types were missense mutations and deep deletions. Analysis of the overall gene mutations in the TCGA cohort revealed that missense mutations were the most common somatic mutations. Additionally, among all cancer patients, single nucleotide variations were most frequently C-T, followed by T-C and C-A (Fig. [93]1G). Construction of clustering model We utilized the ConsensusClusterPlus package (Version 1.60.1) in R to perform molecular subtyping based on the key genes. The maximum value of k was set to 6. By evaluating the classification of each subtype across k values ranging from 2 to 6, we determined that the most distinct classification was achieved when k = 2 (Fig. [94]2A). Consequently, it is reasonable to categorize all samples into two groups based on the expression levels of the key genes. Fig. 2. [95]Fig. 2 [96]Open in a new tab Clustering molecular subtype. A Heatmap showing the clustering molecular subtype (k = 2) in TCGA-PRAD dataset. B Principal component analysis (PCA) plot of transcriptomes from cluster 1 (red) and cluster 2 (blue). C Heatmap describing the expression of 15 key genes in cluster 1 and cluster 2. D Boxplot illustrating the comparison of key genes in different subtypes The 554 samples were divided into two clusters, with cluster 1 comprising 273 samples and cluster 2 comprising 281 samples. Principal component analysis (PCA) of the two clusters revealed significant differences in their compositional profiles (Fig. [97]2B). The expression patterns of the key genes across the clusters were visualized using heatmaps and boxplots (Fig. [98]2C ~ D). Notably, KRT13, OLFM4, and ITGB4 exhibited significant differential expression between the two clusters, with all three genes showing higher expression levels in cluster 2. Construction of risk model Based on the survival time and survival status statistics of each patient sample in the TCGA database, we conducted a univariate Cox regression analysis on the 117 exosome-related differentially expressed genes (ERDEGs) identified from the differential analysis between the tumor group and the adjacent normal tissue group. This analysis revealed eight genes significantly associated with prognosis (p < 0.05) (Fig. [99]3A). The expression levels of these eight genes in the tumor group and the adjacent normal tissue group were visualized using a heatmap and boxplot (Fig. [100]3B ~ C). Fig. 3. [101]Fig. 3 [102]Open in a new tab Regression analysis to select hub genes. A Results of univariate Cox regression analysis of the 117 screened exosome-related differential genes. We present eight genes that are significantly associated with prognosis. B Heatmap describing the expression of the 8 screened related genes in the tumor group and the adjacent normal group. C Boxplot of the expression of the eight screened related genes in the tumor group and the adjacent normal group. D Least absolute shrinkage and selection operator (LASSO) analysis and LASSO coefficient of eight exosome-related genes associated with prognosis. E Tenfold cross-validation of the LASSO Cox model. F Results of multivariate Cox regression analysis of the eight screened related genes. We present three genes that are significantly associated with prognosis We further performed Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis on these eight genes to identify key genes for constructing the prognostic model (Fig. [103]3D ~ E). The analysis identified six genes (AQP2, CA4, H4C2, RAB19, RRAS, and ZNF114) as key variables. Subsequently, multivariate Cox regression analysis of the eight genes from the univariate analysis highlighted three genes significantly associated with prognosis (p < 0.05): AQP2, H4C2, and ZNF114 (Fig. [104]3F). Notably, these three key genes were included in the six genes identified by the LASSO regression analysis. Using these three genes, we established a risk scoring system. The prognostic risk for each prostate cancer patient was quantified using the calculated prognostic score, and patients were stratified into high-risk and low-risk groups based on the median prognostic score. The baseline characteristics of the selected data are summarized in Table [105]1. Finally, we generated survival curves for the three key genes (Figure S2A ~ C). Among them, the AQP2 gene showed a significant association with prognosis (p = 0.018). Although the H4C2 gene did not exhibit a significant prognostic association, the survival curve clearly demonstrated its ability to distinguish between high-risk and low-risk patient groups. Summary of survival status, differential analysis, and enrichment analysis in the high-risk and low-risk groups Firstly, we analyzed the survival status of the high-risk and low-risk groups and observed a significant difference in overall survival rates between the two groups, as evidenced by the Kaplan-Meier curve (p < 0.001) (Fig. [106]4 ~ B). Fig. 4. [107]Fig. 4 [108]Open in a new tab Survival analysis and differential expression analysis between high-risk and low-risk group. A Survival analysis results of different risk groups based on the risk score from the expression of three key genes. B Risk factor linkage diagram describing the survival status of each sample and the risk score. C Volcano plot showing different expression genes between high-risk group and low-risk group. Statistically significant upregulated genes are depicted in red. D Gene oncology (GO) functional analysis results illustrating the significantly enriched terms for the differentially expressed exosome-related genes, with the size of the bubbles representing the number of genes and the color indicating the significance level. E Gene concept network of the main pathway from GO functional analysis Next, we conducted a differential analysis on the gene expression data from the TCGA dataset. The results revealed that, compared to the high-risk group, the low-risk group exhibited 362 upregulated genes (logFC > 0.585 and p < 0.05), with no downregulated genes identified. These findings were visualized using a volcano plot (Fig. [109]4C). Gene Ontology (GO) enrichment analysis of these differentially expressed genes indicated that they were predominantly enriched in pathways related to humoral immune response and antibody binding, with significant presence in the extracellular matrix and immune complexes (Fig. [110]4D). The differential genes involved in these common enrichment pathways are displayed in Fig. [111]4E. Finally, we performed Gene Set Enrichment Analysis (GSEA) on the three key prognostic genes (Figure S3A ~ C). AQP2 was primarily associated with basal cell carcinoma and cytoskeleton composition in muscle cells. H4C2 was mainly linked to basal cell carcinoma and EGFR tyrosine kinase inhibitor resistance. ZNF114 was related to axon guidance and focal adhesion. The analysis of these three gene groups may provide valuable insights for further treatment strategies in prostate cancer patients. Exploration of the effect of enzalutamide on EV-RNA in prostate cancer patients After identifying exosome-related genes strongly associated with the diagnosis and prognosis of prostate cancer patients, we sought to investigate the correlation between enzalutamide treatment for prostate cancer and these genes. The Gene Expression Omnibus (GEO), a public functional genomics data repository, provides access to individual DNA or RNA profiles from extracellular vesicles (EVs) in blood samples collected from prostate cancer patients. The primary gene expression data used in this study were obtained from the [112]GSE221709 dataset, which includes 12 samples of C4-2 cells derived from prostate cancer epithelial cells. We analyzed 6 samples to examine changes in EV-RNA in prostate cancer samples following treatment with either dimethyl sulfoxide (DMSO) or enzalutamide (EZ). Notably, the 6 samples in the dataset were pre-screened and standardized. EV-RNA from the cell-conditioned media was extracted using the ExoRNeasy kit, and the gene probe used for these samples was [113]GPL20795. After applying a logarithmic transformation, the overall variation in EV-RNA content across the six sample groups was minimal, allowing for further analysis (Fig. [114]5A). Based on predefined criteria for differentially expressed genes, we identified 3080 downregulated genes and 103 upregulated genes in the EZ treatment group compared to the DMSO control group. Statistical analysis of the differences in EV-RNA content between the EZ and DMSO treatment groups was visualized using a volcano plot (Fig. [115]5B). Fig. 5. [116]Fig. 5 [117]Open in a new tab Differential expression analysis based on [118]GSE221709 dataset. A Box plots describing the EV-RNA content of each sample after data standardization. B Volcano plot showing DEG about EV-RNA in CRPC patients who received DMSO or EZ. Statistically significant upregulated and downregulated EV-RNA are depicted in red and blue respectively. Top upregulated and downregulated EV-RNA are highlighted. C Gene oncology (GO) functional analysis results illustrating the significantly enriched terms for the differentially expressed EV-RNA, with the size of the bubbles representing the number of genes and the color indicating the significance level. D Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis performed on genes upregulated or downregulated in EV-RNA from CRPC patients treated by DMSO or EZ Enrichment analysis was performed on all differentially expressed EV-RNAs between the enzalutamide and control groups, and the results were presented in a bubble plot (Fig. [119]5C). The analysis revealed that differentially expressed genes were significantly associated with pathways such as transmembrane transport proteins, ion channel complex formation, and chemotaxis. This suggests that enzalutamide’s mechanism of action in treating prostate cancer, through androgen receptor inhibition, may mediate changes in ion channels and transmembrane proteins on the surface of tumor cells. The KEGG enrichment analysis (Fig. [120]5D) further indicated that downregulated EV-RNA genes were predominantly involved in androgen and estrogen metabolism pathways and renin secretion pathways, aligning with enzalutamide’s therapeutic mechanism. Further investigation revealed that among the 103 upregulated genes, 10 genes (MSN, ANXA6, PRNP, CAV1, ITGB3, TUBB6, KRT1, CAMP, ITGB4, ANGPTL1) were identified as exosome-related differentially expressed genes (ERDEGs). This implies that enzalutamide may enhance the expression of these genes in exosomes through various pathways while targeting the androgen receptor, thereby inhibiting prostate cancer proliferation. Additionally, by calculating the expression changes of the three key genes, we found that the log fold change of AQP2 was 2. However, the p-value for AQP2 was 0.1, potentially due to the small sample size, which does not preclude AQP2’s role in enzalutamide’s therapeutic effect on prostate cancer. Analysis of protein-protein interaction network and immune infiltration We compiled a list of 362 differentially expressed genes (DEGs). These genes were imported into the STRING database to calculate their interaction relationships, and the resulting data were subsequently integrated into Cytoscape software. Using Cytoscape (Version 3.9.1, [121]www.cytoscape.org/), we constructed a protein-protein interaction (PPI) network for the 362 DEGs to explore gene interaction relationships and identify common protein domains (Figure S4A). The genes were sorted based on their degree of connectivity, with the color depth of each gene positively correlated to its degree. Following this, we employed the “cytohubba” package to identify the top 20 hub genes (Figure S4B). These 20 genes were divided into two groups: one group comprised histone-related genes, including H4C2, H3C13, and H2AC8, while the other group consisted of keratin-related genes, such as KRT14 and KRT5. This finding suggests that keratin and histone may serve as potential evaluation indicators for prostate cancer risk stratification. Notably, the proteins KRT15 and H3C13 exhibited a relatively high degree of connectivity with other genes among the DEGs. Finally, we calculated the infiltration abundances of 22 immune cell types to investigate differences in immune infiltration between the high-risk and low-risk patient groups. The proportions of each immune cell type in every sample were visualized using a bar chart (Fig. [122]6A). Additionally, we applied the Mann-Whitney U test to statistically analyze the differences in immune cell infiltration between the two groups, with the results displayed in boxplots (Fig. [123]6B). The analysis revealed significant differences in the infiltration abundances of four immune cell types: CD8 + T cells, M1 macrophages, eosinophils, and neutrophils. We further calculated the correlations among these four immune cell types in both the high-risk and low-risk groups. Notably, in the high-risk group, the correlation between eosinophils and CD8 + T cells was particularly significant (Fig. 6[124]6C~ D). Fig. 6. [125]Fig. 6 [126]Open in a new tab Analysis of immune infiltration. A Stacked histogram of CIBERSORT immune infiltration analysis in high-risk group and low-risk group in TCGA-PRAD dataset. B Boxplot describing comparison about immune infiltration analysis between high-risk group and low-risk group. C ~ D Correlation analysis of the infiltration abundance describing the four immune cells with significant differences between high-risk group and low-risk group Establishment of clinical risk model Through multivariate Cox regression analysis of the patient’s Gleason score, TNM stage, risk score, and age, we found that only the risk score was significantly associated with the patient’s overall survival rate (p = 0.039) (Fig. [127]7A). Subsequently, we used the timeROC package in R to construct a time-dependent receiver operating characteristic (ROC) curve for the TCGA-PRAD data risk model. The results showed that the area under the curve (AUC) for the three-year overall survival rate based on the risk score was 0.892, which was significantly higher than that of other clinical characteristics (Fig. [128]7B). These findings indicate that the risk score-based model for stratifying patients into high- and low-risk groups is clinically valuable for prognosis and survival period prediction. Combined with the results from Fig. [129]4A, it is evident that the overall survival rate of prostate cancer patients in the high-risk group is significantly lower. Finally, we generated a heatmap to visualize the distribution of clinical characteristics in patients from the high-risk and low-risk groups (Fig. [130]7C). The heatmap revealed that all patients with distant metastasis belonged to the high-risk group. Additionally, within the low-risk group, the proportions of patients with lymph node metastasis and those over 65 years old were slightly smaller compared to the high-risk group. Fig. 7. [131]Fig. 7 [132]Open in a new tab Risk model for prediction of prognosis. A Multivariate Cox regression analysis results of clinical indexes and the risk-score of three key genes. B The receiver operating characteristic (ROC) curves of clinical indexes and the risk-score of three key genes. C Heatmaps of correlations among different clinical characteristics and the risk-score of three key genes Prediction of cancer status In the TCGA-PRAD dataset, multivariate Cox regression analysis identified three exosome-related differentially expressed genes (ERDEGs) significantly associated with prognosis: AQP2 (p = 0.018), H4C2 (p = 0.047), and ZNF114 (p = 0.031). The ROC curves for these three genes are presented in the figure, with AUC values of 0.826 for AQP2 (Fig. [133]8A), 0.742 for H4C2 (Fig. [134]8B), and 0.802 for ZNF114 (Fig. [135]8C). These results indicate that all three genes are closely related to the development of prostate cancer. Fig. 8. [136]Fig. 8 [137]Open in a new tab ROC analysis of validation for cancer status prediction in TCGA-PRAD Dataset and [138]GSE69223 and [139]GSE229904 dataset. ROC Curves for Verification of Cancer Status Prediction in the TCGA-PRAD Dataset A AQP2 B H4C2 C ZNF114 ROC Curves for Verification of Cancer Status Prediction in the GEO Dataset ([140]GSE69223) D AQP2 E ZNF114 ROC Curves for Verification of Cancer Status Prediction in the GEO Dataset ([141]GSE229904) F AQP2 G H4C2 H ZNF114 To validate these findings, we utilized two additional datasets, [142]GSE69223 and [143]GSE229904. The [144]GSE69223 dataset comprises 30 samples, including 15 from primary prostate cancer tissues and 15 from adjacent normal prostate tissues. Since the gene expression matrix of this dataset does not include the H4C2 gene, we only calculated and plotted the ROC curves for AQP2 and ZNF114. The results showed AUC values of 0.809 for AQP2 (Fig. [145]8D) and 0.804 for ZNF114 (Fig. [146]8E), confirming that both genes can effectively predict the occurrence of prostate cancer. The [147]GSE229904 dataset includes 175 samples, with 140 from prostate cancer tissues and 35 from adjacent normal prostate tissues. The AUC values for this dataset were 0.736 for AQP2 (Fig. [148]8F), 0.446 for H4C2 (Fig. [149]8G), and 0.762 for ZNF114 (Fig. [150]8H). These results suggest that AQP2 and ZNF114 can reliably distinguish prostate cancer from normal prostate tissues, while H4C2 exhibits limited discriminatory ability. Discussion Prostate cancer (PCa) is the most prevalent malignant tumor in the male genitourinary system. In Europe and America, the incidence of prostate cancer has consistently remained high. In China, with the accelerating aging population, the gradual adoption of Western lifestyles, and continuous advancements in diagnostic techniques, the detection rate of prostate cancer is also steadily increasing [[151]16]. Early diagnosis, early treatment, and early prediction of prostate cancer have become critical objectives in current research efforts. In recent years, there has been a growing focus on the application of liquid biopsy in prostate cancer, as it serves as a gold standard for non-invasive examination and precise diagnosis and treatment of cancers and related diseases [[152]17–[153]21]. Among these studies, two reports have explored genomic analysis following drug treatment for prostate cancer. These studies not only identified genomic drivers of resistance to first-line androgen receptor (AR)-directed therapy in castration-resistant prostate cancer (CRPC) to obtain drug resistance biomarkers but also investigated changes in the immune microenvironment of CRPC after first-line drug treatment, providing valuable insights for further therapeutic strategies [[154]18, [155]20]. Additionally, a previous study conducted whole genome sequencing analysis on plasma ctDNA from patients with metastatic castration-resistant prostate cancer, revealing the evolutionary history, subclonal dynamics, and treatment resistance mechanisms of metastatic cancer [[156]19]. This study also explored transcriptional factor signaling activity in metastatic cancer, offering considerations for treatment selection. Currently, research on ctDNA is relatively mature. In liquid biopsy, beyond using ctDNA to establish predictive diagnostic models, fragmentomics, methylation changes, and nucleosome positioning also provide innovative approaches for liquid biopsy and simplify its diagnostic process. It has been discovered that extracellular vesicles (EVs) in plasma carry DNA and RNA as cargo. Some studies have demonstrated methods for specifically enriching RNA cargo of protein-coding transcripts in EVs, enabling transcriptomic analysis of protein-coding genes and detecting the expression of metastatic prostate cancer-related genes in circulating EV-RNA [[157]22–[158]24]. Recently, a novel study proposed an explainable artificial intelligence-based PCa screening system that integrates a highly sensitive dual-gate field-effect transistor-based multi-marker biosensor [[159]24]. This system significantly enhances diagnostic accuracy by analyzing biomarkers in exosomes derived from urine. Based on the TCGA-PRAD database, we analyzed the gene expression profiles of prostate cancer samples and adjacent normal prostate tissue samples. We identified differentially expressed exosome-related genes (ERDEGs) and investigated their association with prostate cancer survival. In addition to establishing a risk scoring system, we identified three genes significantly correlated with prognosis. We also performed differential expression and enrichment analyses on prostate cancer patients stratified into high-risk and low-risk groups to explore common pathways associated with the differentially expressed genes. Furthermore, we conducted survival analysis on patients in different risk groups to evaluate the predictive accuracy of the risk scoring system for prostate cancer diagnosis and prognosis. Finally, we validated this model using two datasets from the GEO database, revealing that the proteins AQP2 and ZNF114 exhibit strong predictive value for prostate cancer diagnosis. As a key member of the aquaporin family, AQP2 primarily regulates water reabsorption in the kidneys, thereby controlling urine concentration [[160]25]. AQP2, a target of vasopressin, plays a critical role in maintaining the body’s water balance under vasopressin regulation. The AQP2 water channel undergoes various post-translational modifications, such as phosphorylation, ubiquitination [[161]26], and glutathionylation [[162]27], which influence its localization, stability, and function. Studies have shown that AQP2 upregulation promotes bladder fibrosis, leading to bladder outlet obstruction [[163]28]. Additionally, AQP2 in urine is predominantly found in extracellular vesicles [[164]29] and serves as a diagnostic marker for urinary system diseases [[165]30]. As a cargo in exosomes, AQP2 downregulation is significantly associated with chronic renal failure [[166]31]. Our findings suggest that exosome-related AQP2 expression is strongly linked to urinary system diseases, and its inclusion in the prostate cancer diagnostic and prognostic model enhances diagnostic accuracy. ZNF114, a transcription factor in the zinc finger protein family, plays a vital role in gene expression regulation, cell differentiation, embryonic development, and disease progression. In one study, ZNF114 was incorporated into a three-gene diagnostic model for oral cancer diagnosis and prognosis prediction [[167]32]. Zinc finger proteins also play a dual role in tumorigenesis, with some promoting cancer progression, such as ZNF574 in ovarian cancer [[168]33] and ZNF488 in pancreatic cancer [[169]34], while others, like ZNF331, inhibit tumor proliferation in head and neck cancers [[170]35]. Based on our results, we hypothesize that ZNF114 plays a pivotal role in prostate cancer progression. To explore potential mechanisms underlying castration-resistant prostate cancer (CRPC) and drug resistance, we investigated changes in circulating EV-RNA following androgen receptor signaling inhibitor treatment. Our findings suggest that AQP2 may be involved in pathways related to enzalutamide treatment for prostate cancer. Additionally, we performed immunological analysis on the TCGA-PRAD data. GO enrichment analysis revealed that differentially expressed genes between high-risk and low-risk groups are enriched in antibody binding and humoral immune response pathways, with these genes commonly found in the extracellular matrix and immune complexes. Given the immune system’s role in tumor suppression, we examined immune cell infiltration abundances between high-risk and low-risk groups and identified differences in CD8 + T cells, M1 macrophages, eosinophils, and neutrophils. One study has highlighted the importance of the tumor immune microenvironment in cancer progression [[171]36], suggesting that biomarker-based treatment strategies could be combined with immunotherapy to optimize tumor treatment outcomes. Finally, we conducted weighted gene co-expression network analysis on the TCGA-PRAD data. Genes were clustered into multiple modules, and the three modules most strongly correlated with exosome-related genes were selected. Through this analysis, we identified 15 key genes, analyzed their mutation status, and used them to classify all samples. By comparing Cluster 1 and Cluster 2, we evaluated the expression patterns of these key genes, further underscoring the significance of exosome-related gene research. In summary, our findings highlight the potential of minimally invasive biomarkers, particularly given the challenges associated with repeated tissue biopsies in clinical practice. The model we established not only predicts prostate cancer occurrence and patient prognosis but also provides a foundation for developing new combination therapies using these biomarkers. Limitations of the study Firstly, both the number of samples in the dataset used in the early stage of our experiment and that in the final validation set are limited. Caution is required since the differentially expressed genes obtained from the analysis may be affected by errors from sampling, detection and other factors. Therefore, it is necessary to expand the sample size to rule out the sample bias in the enrichment analysis. Secondly, the gene counts in the TCGA public database were used in our training set. If we could conduct research using animal tumor models or human tumor samples, the obtained results would be more convincing. We need to further acquire clinical samples for analysis. Finally, we only used exosome-related genes to establish the diagnostic and prognostic model. Although some typical biomarkers may be missed in this way, it also enhances our understanding of exosome-related marker genes and their utility in diagnosis and prognosis. Conclusion The characteristics of exosome-related genes have a good effect in predicting the status and prognosis of prostate cancer. Specifically, a risk model based on the expression levels of AQP2, H4C2 and ZNF114 can predict the prognosis of prostate cancer relatively well. AQP2 and ZNF114 can be used to evaluate the cancer status of prostate cancer. In addition, AQP2 may be related to the inhibitory effect of enzalutamide on prostate cancer. Therefore, AQP2 and ZNF114 can serve as non-invasive biomarkers for the early diagnosis and prognostic assessment of prostate cancer patients, avoiding the issues of high invasiveness and numerous complications associated with invasive examinations. Supplementary Information Below is the link to the electronic supplementary material. [172]Supplementary Material 1^ (2.7MB, zip) Acknowledgements