Abstract Background In the context of prostate cancer (PCa), the occurrence of biochemical recurrence (BCR) stands out as a pivotal factor significantly impacting prognosis, potentially leading to metastasis and mortality. However, the early detection of BCR poses a substantial challenge for PCa patients. There is an urgent need to pinpoint hub genes that can serve as predictive indicators for BCR in PCa patients. Methods Our primary goal was to identify cell differentiation trajectory-related gene signature in PCa patients by pseudo-time trajectory analysis. We further explored the functional enrichment of overlapped marker genes and probed clinically relevant modules and BCR-related genes using Weighted Gene Co-expression Network Analysis (WGCNA) in PCa patients. Key genes predicting recurrence-free survival were meticulously identified through univariate and multivariate Cox regression analyses. Subsequently, these genes were utilized to construct a prognostic gene signature, the expression, predictive efficacy, putative functions, and immunological landscape of which were thoroughly validated. Additionally, we employed immunohistochemistry (IHC) and a western blotting assay to quantify the expression of PYCR1 in clinical samples. Results Our single-cell RNA (scRNA) sequencing analysis unveiled three subgroups characterized by distinct differentiation trajectories, and the marker genes associated with these groups were extracted from PCa patients. These marker genes successfully classified the PCa sample into two molecular subtypes, demonstrating a robust correlation with clinical characteristics and recurrence-free survival. Through WGCNA and Lasso analysis, we identified four hub genes (KLK3, CD38, FASN, and PYCR1) to construct a risk profile of prognostic genes linked to BCR. Notably, the high-risk patient group exhibited elevated levels of B cell naive, Macrophage M0, and Macrophage M2 infiltration, while the low-risk group displayed higher levels of T cells CD4 memory activated and monocyte infiltration. Furthermore, IHC and western blotting assays confirmed the heightened expression of PYCR1 in PCa tissues. Conclusion This study leveraged the differentiation trajectory and genetic variability of the microenvironment to uncover crucial prognostic genes associated with BCR in PCa patients. These findings present novel perspectives for tailoring treatment strategies for PCa patients on an individualized basis. Keywords: Prostate cancer, Single-cell analysis, Biochemical recurrence, Immune landscape 1. Introduction Prostate cancer (PCa) stands out as one of the most prevalent malignant tumors affecting older men, now recognized as a global public health concern [[33][1], [34][2], [35][3]]. In the previous year alone, there were a staggering 268,490 new cases and 34,500 deaths attributed to PCa, as reported by USA cancer statistics [[36]4]. Individuals with advanced PCa often face a grim prognosis, frequently developing metastases post-radical prostatectomy and chemoradiotherapy. In contrast, those with early-stage localized PCa exhibit a relatively favorable prognosis [[37]4,[38]5]. The prognostic landscape is significantly influenced by biochemical recurrence (BCR) in PCa patients, amplifying the risk of clinical metastasis and death [[39]6,[40]7]. Patients diagnosed with prostate cancer may experience BCR months to years after their initial treatment [[41]8,[42]9]. The timing of this recurrence is contingent upon the cancer's aggressiveness and the specific treatment modality employed. Consequently, treatment strategies for patients in different stages diverge, considering factors such as the site and extent of recurrence, the initial treatment received, and the overall health status of the patient [[43]10,[44]11]. Early detection and intervention are paramount strategies to enhance patient prognosis [[45]12,[46]13]. However, the molecular intricacies of prostate cancer largely elude our understanding. PCa patients may confront local recurrence or metastasis post-surgery, necessitating intensive interventions to extend recurrence-free survival [[47]14]. Recent strides in biomarker development and associated gene models have brought forth novel insights into early diagnosis and treatment for PCa [[48]15]. The advent of single-cell RNA sequencing (scRNA-seq) technology has significantly enriched cancer research by unveiling intricate gene activities at the cellular level [[49]16]. Previous studies, exploring cell development trajectories to identify differentially expressed genes across individual cells, have illuminated the cellular heterogeneity within the tumor microenvironment (TME) [[50]17,[51]18]. These findings provide a fresh perspective on the interrelation of TME heterogeneity in PCa. The exploration of cell differentiation-related marker genes gains prominence, as they may play a pivotal role in the recurrence process of PCa [[52]19]. In this study, we delved into the genetic variation within the TME through the lens of cell differentiation trajectory, aiming to identify key genes associated with BCR in PCa patients. Our findings contribute unique insights for subsequent personalized treatment approaches. 2. Materials and methods 2.1. Data processing We obtained scRNA-seq data from the Gene Expression Omnibus (GEO) database, specifically the [53]GSE176031 dataset, which comprises 6 prostate biopsies from 3 distinct patients, 4 radical prostatectomies with tumor-only samples from 4 patients, and 4 radical prostatectomies with matched normal samples from 4 patients ([54]https://www.ncbi.nlm.nih.gov/geo/) [[55]20]. To avoid the difference between multiple data, we selected “Radical Prostatectomy Tumor/Paired Tumor” samples for subsequent analysis. Download data of the The Cancer Genome Atlas Program-Prostate adenocarcinoma (TCGA-PARD) and [56]GSE116918 datasets was obtained, including mRNA expression and clinical data [[57]21]. Expression data for cell lines were acquired from the Cancer Cell Line Encyclopedia (CCLE) ([58]https://sites.broadinstitute.org/ccle) database [[59]22]. 2.2. Single-cell data analysis The scRNA-seq data underwent quality control and statistical analysis using Seurat packages (version 4.3.0) [[60]23], adhering to the following quality control criteria: (a) exclusion of genes with detection in fewer than 3 cells and cells with fewer than 50 detected genes; (b) exclusion of cells with a total detected gene count less than 200; (c) exclusion of cells with mitochondrial gene expression equal to or exceeding 25%. Correction for variation regression was applied to “percent.mt,” “nCount_RNA,” and “nFeature_RNA” using the “ScaleData” function in Seurat. After stringent quality assurance and filtering, 1500 genes with well-defined roles were retained, and subsequent data normalization was carried out using the LogNormalize method. The initial dimensionality reduction analysis of the single-cell RNA-seq data employed principal component analysis (PCA). Principal clusters were then identified by visualizing the high-dimensional data through the t-distributed random neighbor embedding (tSNE) approach. The data processing procedures closely followed the method described previously [[61]24]. 2.3. Cell annotation and pseudotime trajectory analysis The Seurat package's FindNeighbors and FindClusters functions facilitated unsupervised cluster analysis of cells, employing a nonlinear dimensionality reduction method. Subsequently, using the “SingleR” software package, the cluster was stratified into eight distinct cell types based on marker genes [[62]25]. Leveraging single-cell trajectory analysis through the “Monocle” package [[63]26], we identified marker genes within each trajectory. Differential expression analysis between the main branches, employing a cutoff value of |log2FC| ≥ 1 and P < 0.01, yielded cell differentiation trajectory-related genes, considered marker genes. 2.4. Function enrichment analysis Functional enrichment analysis of the marker genes of the three subgroups, including gene ontology (GO) and Kyoto Encyclopedia of genes and genomes (KEGG), was carried out using the “clusterProfiler” package [[64]27]. 2.5. Identification of molecular subtypes and clinical features analysis The intersection of three subgroups was gathered for subsequent cluster analysis through the utilization of the “ConsensusClusterPlus” package [[65]28]. The optimal cluster size (Kmax = 9) was determined by employing the K-means algorithm and the cumulative distribution function (CDF). The exploration of recurrence-free survival in the two subtypes was based on the “survival” and “survminer” packages. The analysis of clinical traits for the two subtypes was conducted using the “ggplot2” packages. 2.6. Construction of WGCNA and screening of the prognostic marker genes The TCGA-PRAD dataset retrieved from the TCGA database was employed in the construction of the Weighted Gene Co-expression Network Analysis (WGCNA) [[66]29]. Subsequently, clinically significant module genes were identified for in-depth analysis. The association between marker genes and recurrence-free survival in prostate cancer patients from the TCGA validation cohort was evaluated through univariate Cox regression analysis. Prognostic genes were then identified using LASSO and Cox multiple regression analysis, facilitated by the “glmnet” and “survival” packages refer to the previous report [[67]30]. 2.7. Validation of the predictive capability of the risk signature The risk score was calculated using the following equation: Risk score = (coefficient1 * mRNA expression level of gene1) + … + (coefficientX * mRNA expression level of geneX). Additionally, to validate the accuracy of the risk signature, we conducted Kaplan-Meier survival analysis, ROC analysis, and examined the distribution of risk scores as well as survival statuses. These analyses were performed utilizing the “survival” and “survivalROC” packages. 2.8. Mutation profiles, RNA stemness score (RNAss), and GSVA of the risk signature The analysis of gene mutation data was performed using the “Maftools” software, and comparisons were made with previous methodologies to assess the correlation between the risk score and RNA stem score [[68]31]. Additionally, Gene Set Variation Analysis (GSVA) was carried out using the “limma” “GSEABase” and “GSVA” packages [[69]32]. 2.9. Immune landscape and immunotherapeutic responses of the risk signature The immune cell infiltration status in prostate cancer patients was evaluated using the CIBERSORT algorithm [[70]33]. Building upon previously published work, we constructed the immune landscape in PCa patients [[71]30,[72]31]. Additionally, we obtained immunophenoscore (IPS) data from the Cancer Immunome Atlas (TCIA) and investigated associations between risk genes and immune checkpoint molecules [[73]34]. 2.10. Collecting clinical samples The study received approval from the Ethics Committee of the Second Affiliated Hospital of Anhui Medical University (Approve no. SZR2021031) and adhered to the standards outlined in the Declaration of Helsinki. Upon obtaining written informed consent from the patients, tissue samples and clinical data were collected. A total of 41 matched benign prostatic hyperplasia (BPH) and prostate cancer tissues were collected at our medical center between March 2021 and August 2022. 2.11. Immunohistochemistry (IHC) The tissue samples were fixed, and subsequently, 5 μm sections were cut. In the process of dewaxing, rehydration, and antigen extraction, the tissues were subjected to treatment with PYCR1 and Ki-67 antibodies from Affinity Biosciences (USA). The IHC staining results were independently evaluated by two senior pathologists. In previous studies, we have described in detail the steps of IHC scoring [[74]35]. 2.12. Western blotting Western blotting was conducted following standard experimental procedures [[75]36]. In brief, tissue proteins were extracted, separated on a 10% SDS-PAGE gel, and subsequently transferred to PVDF membranes. After blocking with 5% skim milk powder, primary antibodies were incubated overnight at 4 °C on PVDF membranes. β-actin from Affinity Biosciences served as the internal reference protein. 2.13. Statistical analysis Data analysis was performed using R (version 4.2.1), a statistical software application with extensively documented statistical methods in the literature [[76]24,[77]37]. The comparison of data between the two groups was executed using the t-test, with a threshold for statistical significance set at P < 0.05. 3. Results 3.1. Quality control, clustering, and markers of scRNA-seq data The scRNA-seq data from the [78]GSE176031 dataset were obtained for single-cell analysis. After quality control and normalization ([79]Fig. S1A), our analysis revealed a positive correlation between sequencing depth and both mitochondrial content and total intracellular sequence ([80]Fig. S1B). A total of 9,958 genes were functionalized in all samples, and the top 1,500 genes with the most significant functions were selected for further investigation ([81]Fig. S1C). Preliminary dimensionality reduction analysis of scRNA data was then conducted using the PCA method ([82]Fig. 1A; [83]Fig. S1D–F). The high-dimensional scRNA data were visualized through the tSNE algorithm, resulting in the successful classification of cells into 13 clusters ([84]Fig. 1B). The heatmap illustrated the identification of 3,907 marker genes in each cluster ([85]Fig. 1C). These thirteen clusters were annotated for eight distinct cell types, namely Epithelial cells, Monocytes, T cells, Endothelial cells, Macrophages, Chondrocytes, Smooth muscle cells, and B cells ([86]Fig. 1D). Subsequently, trajectory analysis were performed on the scRNA data ([87]Fig. 1E). Furthermore, three primary branches were identified, and genes associated with the cell differentiation trajectory of each branch were extracted through differential expression analysis ([88]Fig. 1F). These genes were then utilized as marker genes for subsequent analysis. Fig. 1. [89]Fig. 1 [90]Open in a new tab Identification of prostate cancer subsets through single-cell trajectory analysis. (A) Principal component analysis (PCA) of scRNA-seq data for initial dimensionality reduction. (B) The t-distributed stochastic neighbor embedding (tSNE) is utilized for clustering PCa cells. (C) Heatmap depicting marker genes in each cluster. (D) Cell type annotation of clusters. (E) Single-cell pseudo-time analysis of three subsets. (F) Trajectory analysis of cell types. 3.2. Function enrichment analysis of cell differentiation trajectory-related marker genes A total of 810, 465, and 682 marker genes were identified in branch I, branch II, and branch III, respectively. Collectively, 1957 marker genes were utilized for functional enrichment. The functional enrichment study, encompassing GO and KEGG analysis, was conducted to elucidate the biological roles of these marker genes in the three subtypes. Specifically, these marker genes in subset I exhibited higher expression in the binding of the MHC class II protein complex, endocytic vesicles, and the regulation of cell-cell adhesion ([91]Fig. 2A). Within subset II, these marker genes were notably enriched in the structural components of the extracellular matrix, collagen-containing extracellular matrix, and urogenital system development ([92]Fig. 2B). In subgroup III, these marker genes exhibited higher expression in the binding of the MHC class II protein complex, endocytic vesicles, and the regulation of cell-cell adhesion ([93]Fig. 2C). Furthermore, in the three subsets related to diabetic complications, KEGG pathways were enriched, including Phagosome, Epstein-Barr virus infection, Human T-cell leukemia virus-1 infection, Focal adhesion, PI3K-Akt signaling pathway, Human papillomavirus infection, Th17 cell differentiation, NF-kappa B signaling pathway, and the AGE-RAGE signaling pathway ([94]Fig. 2D–F). Fig. 2. [95]Fig. 2 [96]Open in a new tab Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses in three subsets. (A–C) GO functional analysis of marker genes in subsets I–III. (D–F) KEGG pathway enrichment analysis of marker genes in subsets I–III. 3.3. Identification of molecular subtypes and risk signature in prostate PCa patients The intersection of the marker genes within the three subgroups was obtained from the [97]GSE116918 dataset, involving the download of mRNA expression and clinical data. Our analysis revealed the presence of two molecular subgroups among these PCa patients ([98]Fig. S2A–C). The Kaplan-Meier plot demonstrated better recurrence-free survival in the C1 subtype compared to the C2 subtype, highlighting a significant association between molecular subtype and clinical outcomes in PCa patients ([99]Fig. S2D). Subsequently, we conducted correlation analysis to explore the relationship between various clinical characteristics and molecular subtypes. The results indicated significant differences in Gleason scores among different molecular subtypes, while no distinctions were observed in terms of age, clinical T stage, and BCR ([100]Fig. S2E). The three subcategories of these marker genes were subjected to WGCNA to identify modules associated with prognosis in PCa patients. The effectiveness of WGCNA's soft threshold was evaluated using the scale-free R^2 ([101]Fig. 3A). The genes displaying differential expression were hierarchically clustered based on various indicators, resulting in a dendrogram ([102]Fig. 3B). Four co-expression modules (Blue, Grey, Red, and Turquoise) were identified and merged when similar gene modules were found. The modules MEturquoise and MEgrey were found to be associated with multiple clinical traits of PCa patients ([103]Fig. 3C). In total, 376 genes were identified from these prognosis-related modules. For further validation, we utilized the TCGA-PRAD dataset, employing a heatmap and a volcano plot to visualize the marker genes from WGCNA modules ([104]Fig. S3A and B). Six marker genes with predictive value for recurrence-free survival were identified through univariate Cox regression analysis ([105]Fig. 3D; [106]Table 1). Subsequently, Lasso regression analysis was applied to further assess these six prognostic genes ([107]Fig. 3E). KLK3, CD38, FASN, and PYCR1 were ultimately identified as the four genes for constructing risk models through multivariate Cox regression analysis ([108]Table 2). A risk signatures were created, and risk scores were generated based on the coefficient values and expression levels of these four risk genes. Individuals with prostate cancer were categorized into low-risk and high-risk groups based on the median risk scores. Notably, the overall survival rate in the TCGA-PRAD cohort was significantly lower in the high-risk group compared to the low-risk group ([109]Fig. 3F). The Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve for predicting one-, three-, and five-year survival rates was “1.000,” “0.941,” and “0.966,” respectively ([110]Fig. 3G), indicating a high degree of predictive accuracy. The heatmap and risk score distribution in [111]Fig. 3H and I illustrate the association of high and low risks with PCa patients. Furthermore, the risk signature was validated using the [112]GSE116918 dataset ([113]Fig. S3C–F). Fig. 3. [114]Fig. 3 [115]Open in a new tab Identification, evaluation, and validation of the gene signature. (A) Cluster analysis and soft-thresholding powers. (B) Weighted Gene Co-expression Network Analysis (WGCNA) module colors. (C) Correlations of thirteen modules with clinical variables and clusters. (D) Forest plot illustrating the results of univariate analysis of marker genes. (E) Lasso coefficient profiles of four key marker genes. (F) Kaplan-Meier (KM) analysis of Overall Survival (OS) based on the four marker genes. (G) Receiver Operating Characteristic (ROC) analysis of the risk signature in predicting OS. (H) Distribution of survival status and risk score in the high-risk and low-risk groups. (I) Expression of risk genes in the high-risk and low-risk groups. Table 1. Univariate Cox regression analysis of six marker genes in prostate cancer dataset. Gene HR HR.95L HR.95H P-value KLK2 0.537 0.297 0.972 0.040 KLK3 0.416 0.211 0.817 0.011 CD38 0.369 0.145 0.939 0.036 ALDH1A3 0.429 0.203 0.907 0.027 FASN 3.035 1.291 7.133 0.011 PYCR1 31.997 2.279 449.147 0.010 [116]Open in a new tab Table 2. Mnivariate Cox regression analysis of four marker genes in prostate cancer dataset. Gene Coefficient HR HR.95L HR.95H P-value KLK3 −0.741 0.477 0.249 0.914 0.026 CD38 −1.200 0.301 0.067 1.344 0.116 FASN 0.982 2.670 0.743 9.587 0.132 PYCR1 2.419 11.232 0.458 275.280 0.138 [117]Open in a new tab 3.4. Mutation profiles, RNAss, and GSVA of the risk signature We conducted an in-depth investigation into genetic alterations to elucidate the impact of genetic mutations. Our findings indicate that risk genes exhibit either no mutation or an extremely low mutation frequency ([118]Fig. 4A). Furthermore, a noteworthy correlation was observed between RNAss and the risk score ([119]Fig. 4B). According to our findings, patients in the high-risk group demonstrated a higher tumor mutational burden (TMB) ([120]Fig. 4C). To gain insights into the potential pathways and better comprehend the functionality of risk genes, we performed GSVA between tow risk group. These pathways were found to be enriched in the high-risk group, including pentose and glucuronate interconversions, porphyrin and chlorophyll metabolism, glyoxylate and dicarboxylate metabolism, citrate cycle TCA cycle, RNA polymerase, cell cycle, and RNA degradation. Conversely, signaling pathways such as Retinol metabolism, drug metabolism-cytochrome P450, metabolism of xenobiotics by cytochrome P450, arachidonic acid metabolism, linoleic acid metabolism, and beta-Alanine metabolism were enriched in the low-risk group ([121]Fig. 4D). Fig. 4. [122]Fig. 4 [123]Open in a new tab Mutation profiles, stemness, tumor mutational burden (TMB) correlation, and gene set variation analysis (GSVA) of the gene signature. (A) Mutation profiles of the four genes. (B) Association between the risk score and RNAss. (C) Differences in TMB between the high-risk and low-risk groups. (D) GSVA depicting potential signaling pathways enriched in high-risk and low-risk groups. 3.5. Immune landscape and immunotherapy response of the risk signature We delved deeper into the immune landscape associated with the risk signature, aiming to assess the impact of immune infiltration. Notably, B cell naive, Macrophage M0, and Macrophage M2 infiltration levels were elevated in the high-risk group of patients, while T cells CD4 memory activated and monocyte infiltration levels were higher in the low-risk group ([124]Fig. 5A). The distinctive characteristics of the immune landscape are elucidated in [125]Fig. 5B and C. Subsequently, we explored the relationship between ICIs and four risk genes. Positive correlations were observed between multiple immune checkpoint inhibitors and CD38, whereas KLK3, FASN, and PYCR1 exhibited negative correlations ([126]Fig. 5D). Furthermore, our observations revealed that patients within the low-risk group, exhibiting higher IPS, demonstrated a more favorable prediction for immunotherapy response ([127]Fig. 5E). Fig. 5. [128]Fig. 5 [129]Open in a new tab Association of the risk signature with the immune landscape. (A) Differences in immune cell infiltration between high-risk and low-risk groups. (B) Correlations between the risk signature and various immune functions. (C) Characteristics of the immune landscape based on immune subtypes. (D) Correlation between immune checkpoint inhibitors and the four risk genes. (E) Correlation between immunophenoscore and risk signature. 3.6. Expression levels and survival analysis of the risk genes Firstly, scRNA-seq data was employed for a visual inspection to determine the expression status of the four identified risk genes. Notably, these genes exhibited predominant expression in epithelial cells within PCa tissues ([130]Fig. 6A; [131]Fig. S4A). Secondly, an evaluation of the expression patterns of the four genes in PCa and surrounding tissues was conducted using data from the TCGA-PRAD cohort. In tumor tissues, KLK3, FASN, and PYCR1 demonstrated a significant up-regulation, whereas CD38 exhibited a noteworthy down-regulation ([132]Fig. 6B). Complementary to this analysis, we cross-referenced the expression of the four risk genes in PCa and healthy tissues through the GEPIA and Human Protein Atlas websites, revealing consistent findings ([133]Fig. S4B and C). Furthermore, survival analysis indicated that elevated expressions of FASN and PYCR1 were associated with improved survival in the TCGA cohort, while heightened expressions of KLK3 and CD38 predicted poorer survival in PCa patients ([134]Fig. 6C). This observation was validated in the GEO cohort, confirming the robustness of these prognostic associations ([135]Fig. 6D). Fig. 6. [136]Fig. 6 [137]Open in a new tab Expression levels and survival analysis of the risk genes. (A) Expression of the four risk genes in different cell subtypes. (B) Expression of the four risk genes in paired tumor and adjacent tissues in the TCGA database. (C) Kaplan-Meier survival plot for the four risk genes in the TCGA-PRAD cohort. (D) Kaplan-Meier survival plot for the four risk genes in the [138]GSE116918 dataset. (E) Representative images of Immunohistochemistry (IHC) staining of PYCR1 and Ki-67 staining in paired benign BPH tissues and PCa tissues (magnification, ×200). (F) The protein expression level of PYCR1 in six representative clinical specimens detected by western blotting, with β-actin as an endogenous control. (G) PYCR1 expression in PCa cell lines and normal prostate cell lines based on the Cancer Cell Line Encyclopedia (CCLE) database. Given the extensive prior research on KLK3, CD38, and FASN in PCa [[139]19,[140]38,[141]39], our focus in subsequent validation analyses was on PYCR1 expression. A set of 41 paired specimens from BPH and PCa was collected for this purpose ([142]Table S1). Notably, PYCR1 demonstrated substantial expression in cancer tissues, as evidenced by IHC results ([143]Fig. 6E). Consistent outcomes were observed in Western blotting experiments conducted on six pairs of tumor samples and six benign samples ([144]Fig. 6F; [145]Fig. S5). Additionally, exploration of PYCR1 expression in prostate cell lines using data from the CCLE database highlighted MDAPCA2B as having the highest expression level compared to other cell lines ([146]Fig. 6G). 4. Discussion The well-being of men is facing an escalating threat due to prostate cancer [[147]40,[148]41]. Following radical prostatectomy or chemoradiotherapy, a substantial proportion of patients—ranging from 27% to 53%—have been documented to experience BCR [[149]5]. Elevated PSA levels may indicate the recurrence or persistence of prostate cancer cells, with recurrence potentially confined to the prostate or extending to distant metastases in other organs, such as the bones or lymph nodes. Castration-resistant prostate cancer (CRPC) emerges as a common complication in advanced cases marked by the development of androgen resistance [[150]42,[151]43]. Therefore, to identify key genes involved in cell differentiation in prostate cancer patients, we combined multiple omics data for a comprehensive analysis. The prognosis and treatment trajectory for cancer patients can be significantly influenced by the co-evolution of immune cells and stromal cells engaged in tumor crosstalk within solid tumors [[152]44,[153]45]. Our exploration of cell differentiation trajectories enabled the identification of prognostic genes linked to this intricate process. These revelations provide novel insights for uncovering molecular markers associated with prostate cancer, ultimately contributing to advancements in treatment strategies and prognosis for cancer patients [[154]46,[155]47]. To identify subgroup-dependent phenotypic genes, we divided PCa cells into three subgroups based on their differentiation phases. These marker genes predominantly govern the binding of the MHC class II protein complex, positive regulation of cell adhesion, and cell-cell adhesion. Furthermore, the abnormal activation of KEGG pathways such as Phagosome, Epstein-Barr virus infection, Human T-cell Leukemia virus-1 infection, Focal adhesion, PI3K-Akt signaling pathway, Human papillomavirus infection, Th17 cell differentiation, and NF-kappa B signaling pathways was observed among these marker genes. Notably, the activation of oncogenic signals, including the PI3K-Akt/NF-kappa B pathway, stands out as one of the pivotal pathogenic mechanisms in cancer. These signaling pathways play a crucial role in the rapid development and dissemination of tumors [[156]48]. The prognosis of prostate cancer patients is influenced by clinical and pathological factors such as PSA, pathological T stage, and Gleason score [[157]49]. Despite similar clinical and pathological characteristics, the outcomes of identical treatment regimens for PCa patients have been shown to vary, with some experiencing BCR and others achieving tumor control [[158]50,[159]51]. Our hypothesis posits that these divergent outcomes may be linked to the extensive genetic variability observed in malignancies. Utilizing scRNA-seq data from PCa patients, our study delved into marker genes that contribute to gene heterogeneity in the TME through cell differentiation trajectories. Numerous effective clinical prediction models have been established in previous research, offering the potential for enhanced early diagnosis and prognosis for cancer patients [[160]31,[161][52], [162][53], [163][54]]. Employing WGCNA, we identified six crucial marker genes influencing recurrence-free survival in PCa patients. Subsequently, a risk signature, comprising four genes, was developed using lasso regression analysis for prognosis prediction. Our findings revealed that patients in the high-risk category exhibited markedly worse overall survival rates compared to those in the low-risk group. The risk score from the marker genes model demonstrated high accuracy in predicting overall survival, as confirmed by ROC curve analysis. Furthermore, we established a risk model based on the four prognostic genes to predict recurrence-free survival in PCa patients. Regrettably, due to the limitations imposed by a small sample size, our model did not reveal a discernible difference among these PCa patients. The immune landscape plays a pivotal role in shaping the clinical outcomes of cancer patients [[164]55,[165]56]. To evaluate immune cell infiltration in prostate cancer (PCa) patients, we employed the CIBERSORT algorithm. The results revealed significant differences between the high-risk and low-risk groups. Specifically, the high-risk group exhibited enrichment in B cells, M0, and M2 macrophages, while the low-risk group displayed enrichment in T cells CD4 memory activated, and monocytes. Macrophages, being the primary immune-invading cells in tumor tissues, are closely implicated in tumor immune escape and inflammatory responses [[166]57]. Reports indicate that macrophage infiltration is associated with tumor development and aggressive phenotypic polarization [[167]58]. The substantial reduction in immune activities observed in the high-risk group suggests a potential immunosuppressive phenotype in these patients. Furthermore, the higher IPS observed in the low-risk group compared to the high-risk group may be attributed to these findings. This implies that low-risk patients might exhibit a more favorable response to immunotherapy, aligning with the notion that their immune activities are comparatively more robust. The aberrant expression of the identified risk genes in PCa tissue holds the potential for predicting the survival of these patients. The KLK3 gene encodes a PSA-like protease found in semen and is a member of the KLK family. Our investigation revealed a prominent elevation of KLK3, particularly in PCa epithelial cells, correlating with its ability to predict overall patient survival. Notably, serum PSA, commonly used for monitoring tumor progression, may not comprehensively reflect the disease state [[168]59]. Previous research has associated elevated KLK3 mRNA levels in the blood with shorter progression-free survival in PCa patients [[169]60]. Our findings suggest that heightened KLK3, influenced by gene variations, the TME, and sample size, may serve as a protective factor for PCa patients [[170]61]. Initially identified as a marker of lymphocyte activation, CD38 exhibits extensive-expression across various immune cells [[171]39]. In our study, we observed elevated levels of FASN and PYCR1 in PCa tissue, particularly in epithelial cells. We further validated the expression of PYCR1 in PCa tissue samples. The dysregulation of FASN, a critical enzyme in adipogenesis, facilitates rapid cancer cell development and tumorigenesis [[172]38]. Overstimulation of the proline synthesis pathway, induced by an amino acid shortage, may lead to metabolic reprogramming and aberrant tumor biological activity. PYCR1, a crucial enzyme in this pathway, demonstrates significant up-regulation in PCa tissues and has been implicated in controlling cell proliferation and metastasis [[173]62]. However, it is essential to acknowledge certain limitations in our study. Further validation through in vivo and in vitro investigations, utilizing larger sample sizes, is warranted to enhance the robustness of our findings, given the current constraints posed by the small sample size. 5. Conclusion In summary, our comprehensive investigation delved into the intricate relationship between genes associated with the cell differentiation locus and the prognosis of prostate cancer patients. The genetic signature we identified demonstrates enhanced predictive capabilities for patient outcomes and responses to immunotherapy. Funding None. Data availability statement Data included in article/supplementary material/referenced in article. Additional information All data related to the results of this study are available within the article. CRediT authorship contribution statement Liangxue Sun: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Data curation, Conceptualization. Zhouting Tuo: Writing – review & editing, Writing – original draft, Visualization, Validation, Software. Xin Chen: Writing – review & editing, Writing – original draft, Visualization, Validation, Conceptualization. Huming Wang: Writing – review & editing, Writing – original draft, Visualization, Validation. Zhaojie Lyu: Writing – review & editing, Writing – original draft, Visualization. Guangyuan Li: Writing – review & editing, Writing – original draft, Visualization, Validation. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgements