Abstract

Prostate cancer (PCa) is the most common malignancy of the male genitourinary system. New biomarkers are needed to facilitate its management. The role of the pinin protein (encoded by the PNN gene) in PCa has not yet been thoroughly explored. Using The Cancer Genome Atlas (TCGA-PCa) dataset, validated with Gene Expression Omnibus (GEO) datasets and protein expression data retrieved from the Human Protein Atlas, we studied the prognostic and diagnostic value of PNN. Genes highly co-expressed with PNN (HCEG) were identified for pathway enrichment analysis and drug prediction. A prognostic signature based on the methylation status of HCEG was constructed. Gene set enrichment analysis (GSEA) and the TISIDB database were utilised to analyse the associations between PNN and tumour-infiltrating immune cells. PNN expression was upregulated in PCa at both the transcript and protein levels, suggesting its potential as an independent prognostic factor of PCa. Analyses of the PNN co-expression network indicated that PNN plays a role in RNA splicing and spliceosomes. The prognostic methylation signature demonstrated good performance for progression-free survival. Overall, our results show that the PNN gene is involved in splicing-related pathways in PCa and identify it as a potential biomarker for PCa.

Keywords: prognosis signature, PNN, immune infiltration, drug prediction, methylation status, prostate cancer

Introduction

Prostate cancer (PCa) is the third most common cancer overall (Pan et al., 2017) and the most common malignant tumour in the male genitourinary system (Ren et al., 2017; Caggiano et al., 2019; Jambor et al., 2019). Its prevalence and mortality vary greatly depending on race and geographic location (Lindberg et al., 2013). At present, PCa is usually screened and diagnosed through digital rectal examination (DRE), prostate-specific antigen (PSA) value, Gleason score by prostate biopsy, and magnetic resonance imaging (MRI) of the prostate (Patil and Gaitonde, 2016).
New biomarkers, combined with techniques such as liquid biopsy and imaging, have also been used for clinical diagnosis (Kim et al., 2016; Li et al., 2018; Law et al., 2020). Nevertheless, metastatic PCa remains incurable despite promising advances in biomedical research. Therefore, a good prognosis currently depends on early detection. Conventional non-surgical options for PCa therapy include androgen deprivation therapy (ADT), radiotherapy (RT), ablation therapy, chemotherapy, and emerging immunotherapy. However, although the effectiveness of drugs such as abiraterone and enzalutamide has been established clinically, it is limited and temporary. Developing new biomarkers for diagnosis and treatment requires a deeper exploration of the underlying mechanisms. In the past two decades, several mechanisms of PCa have been continuously reported, including novel associations of androgen signalling (Caggiano et al., 2019; Cioni et al., 2020), TP53 signalling (Ecke et al., 2010; Liu et al., 2021), and the Wnt signalling pathway (Murillo-Garzón et al., 2018; Datta et al., 2020) with the disease. In fact, it is now believed that various cytokines and intercellular signals regulate PCa during its development (Cucchiara et al., 2017). Thus, many potential mechanisms of PCa remain to be explored, and their elucidation may lead to new diagnostic techniques or therapeutic strategies, especially for metastatic PCa.

The pinin protein, reported as a desmosome-associated protein encoded by the PNN gene, is a phosphoprotein rich in serine and arginine with a molecular size of 140 kDa. Recently, it has been suggested that pinin is associated with cell adhesion (Tang et al., 2020; Yao and Ma, 2020). It serves as a putative tumour promoter by reversing the expression of E-cadherin (Simon et al., 2015).
The upregulation of pinin has been reported to enhance metastasis in colorectal cancer (Wei et al., 2016), triple-negative breast cancer cells (Kang et al., 2020), pancreatic cancer (Yao and Ma, 2020), and nasopharyngeal carcinoma cells (Tang et al., 2020). As an oncogenic factor, PNN can protect hepatocellular carcinoma cells from apoptosis (Yang et al., 2016) and promote cell adhesion in ovarian cancer (Zhang et al., 2016), as well as renal cell carcinoma (Jin et al., 2021). These studies indicate the critical role of PNN in metastasis; thus, it could be a potential biomarker for some tumours. However, the role of pinin in PCa progression has not yet been thoroughly studied. Since the tumour microenvironment (TME) has emerged as a critical factor in metastasis (Yin et al., 2019; Yuan et al., 2022), there may also be a functional linkage between TME and PNN in PCa, but this hypothesis remains to be investigated.

Since the PNN gene has not been comprehensively deciphered in PCa, we conducted a series of studies on its roles in patients’ survival and prognosis, as well as in immune infiltration in PCa, through various bioinformatic approaches. We explored the expression pattern of the PNN gene and its potential prognostic value for PCa. We also investigated the relationship between PNN and the tumour immune microenvironment (TIME), which could facilitate understanding the mechanism of immunotherapy for PCa and lead to the discovery of a prognosis signature or novel therapeutic targets.

Materials and methods

To illustrate the function of PNN in PCa, we conducted a comprehensive bioinformatic analysis using multiple datasets. The whole analysis pipeline is displayed in Figure 1.

FIGURE 1. Analysis pipeline of PNN performed in this study.

Data source

The transcriptome data [level 3 mRNA expression data (FPKM), normalized as $\log_2(\mathrm{FPKM}+1)$] of normal tissues (52 cases) and tumour tissues with complete clinical information (379 cases) were extracted from the prostate adenocarcinoma (PRAD) dataset of The Cancer Genome Atlas (TCGA) database. The mRNA expression profiles contained in the GSE116918 (Jain et al., 2018), GSE29079 (Börno et al., 2012), and GSE6956 (Wallace et al., 2008) datasets, which were normalized by their corresponding providers, were downloaded from the Gene Expression Omnibus (GEO) database. A total of 248 PCa samples with clinical information were included in the GSE116918 dataset. The GSE29079 dataset contained 48 normal samples and 47 PCa samples, while the GSE6956 dataset had 18 normal samples and 69 PCa samples. However, neither GSE29079 nor GSE6956 contains clinical information. The BioGRID database offered 253 unique interactors of pinin with experimental evidence (Oughtred et al., 2021). TSVdb offered the expression of PNN splicing variants (Sun et al., 2018). For PNN expression in pan-cancer, we downloaded the standardised pan-cancer dataset TCGA TARGET GTEx (PANCAN, N = 19131, G = 60499) from the UCSC Xena (https://xenabrowser.net/) database and extracted the expression data of the PNN gene in each sample. In addition, we filtered out samples with zero expression, transformed each expression value as $\log_2(x + 0.001)$, and finally excluded cancer types with fewer than three samples.

Protein expression analysis with the Human Protein Atlas database

The Human Protein Atlas (HPA) provides the protein expression of pinin in normal prostate (via https://www.proteinatlas.org/ENSG00000100941-PNN/tissue/prostate) and tumour tissues (via https://www.proteinatlas.org/ENSG00000100941-PNN/pathology/prostate+cancer) (Uhlén et al., 2015).
All tissue images in the HPA database are stained by immunohistochemistry. We extracted the immunohistochemistry images directly from the HPA database.

Independent prognostic analysis

Correlation analysis of PNN expression and clinicopathological characteristics was performed. The expression of PNN was compared between subgroups defined by the following clinicopathological features: age (<60 or ≥60 years old), N stage (N0, N1), M stage (M0, M1), T stage (T2, T3, T4), surgical margin (R0, R1, R2, RX), prostate-specific antigen (PSA) level (<10 or ≥10 ng/mL), and Gleason score (6, 7, 8, 9, 10). Univariate and multivariate Cox regression analyses were implemented to identify independent predictors of survival in the TCGA-PRAD and GSE116918 datasets.

Expression profiles of PNN gene in primary and metastatic prostate cancer

We downloaded the GSE38241 (Aryee et al., 2013) and GSE25136 (Sun and Goodison, 2009) datasets (normalised by the original authors) from GEO. To merge these datasets, we used the ComBat method (Johnson et al., 2007), implemented in the R package inSilicoMerging (Taminau et al., 2012), to obtain the expression matrix. Finally, PNN expression was compared using the Kruskal-Wallis test.

Construction of the PNN co-expression network

We calculated the Pearson correlation of all genes (RNA-seq) in the TCGA dataset with PNN using the LinkedOmics database (http://www.linkedomics.org/) and selected the genes with correlation coefficients > 0.8 and p < 0.05 as PNN co-expressed genes.

Functional and pathway enrichment analysis

The “clusterProfiler” R package was utilised to conduct Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis (Yu et al., 2012). GO enrichment analysis mainly described the biological processes (BP), cellular components (CC), and molecular functions (MF) correlated with genes.
The threshold for significant enrichment was set as a p-value < 0.05 or FDR < 0.05, as stated. Single sample gene set enrichment analysis (ssGSEA) enrichment scores were calculated in each sample using the “GSVA” package of R (Hänzelmann et al., 2013).

Identification of potential drugs

In this research, potential drugs (or molecules) were predicted using the Drug Signatures database (DSigDB) via Enrichr (https://maayanlab.cloud/Enrichr/) based on the PNN gene as well as the genes positively co-expressed with PNN (correlation coefficient > 0.8 and p < 0.05) (Chen et al., 2013; Kuleshov et al., 2016; Xie et al., 2021).

DNA methylation analysis and construction of the prognostic signature

The CpG sites in the promoter of PNN and PNN’s co-expressed genes were obtained from the MEXPRESS database (Koch et al., 2015; Koch et al., 2019). A univariate Cox analysis in R was used to determine the association between the methylation level at each CpG site and progression-free survival (PFS), and p < 0.01 was considered statistically significant. Candidate prognostic CpG sites were selected using the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm. Based on the candidate CpG sites generated by this algorithm, a multivariate Cox regression model was used to construct a prognostic signature. The RiskScore of each patient was calculated using the following formula:

$$\mathrm{RiskScore} = \sum_{i=1}^{n} \beta_i \times \mathrm{Meth}_i$$

where $\beta_i$ refers to the coefficient and $\mathrm{Meth}_i$ refers to the level of methylation. Patients were divided into the high-risk ($\mathrm{RiskScore} \geq \mathrm{median}$) and low-risk ($\mathrm{RiskScore} < \mathrm{median}$) groups in the TCGA dataset. Then, we performed ROC analysis using the R software package pROC (version 1.17.0.1) to obtain the AUC. The R package “survival” was used to perform Kaplan-Meier (KM) survival analysis of the two risk groups.
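
The risk-score calculation and median split described above can be illustrated with a short sketch. This is not the R pipeline used in the study: it is a minimal Python illustration in which the coefficients and methylation values are made-up placeholders, the function names `risk_score` and `assign_risk_groups` are hypothetical, and the high-risk group is assumed to be the one with a RiskScore at or above the median.

```python
from statistics import median


def risk_score(meth_levels, coefficients):
    """Weighted sum of methylation levels: sum_i beta_i * Meth_i."""
    if len(meth_levels) != len(coefficients):
        raise ValueError("one coefficient is required per CpG site")
    return sum(b * m for b, m in zip(coefficients, meth_levels))


def assign_risk_groups(scores):
    """Split patients at the median score (assumption: >= median is high-risk)."""
    cutoff = median(scores)
    groups = ["high" if s >= cutoff else "low" for s in scores]
    return groups, cutoff


# Placeholder Cox coefficients for three hypothetical CpG sites.
beta = [1.0, -2.0, 0.5]

# Placeholder methylation levels: one row per patient, one column per CpG site.
meth = [
    [0.8, 0.1, 0.4],
    [0.2, 0.6, 0.2],
    [0.5, 0.5, 0.5],
    [0.9, 0.2, 0.6],
]

scores = [risk_score(row, beta) for row in meth]
groups, cutoff = assign_risk_groups(scores)

for patient, (score, group) in enumerate(zip(scores, groups), start=1):
    print(f"patient {patient}: RiskScore={score:.3f} group={group}")
```

In practice, the coefficients would come from the fitted multivariate Cox model, and the resulting group labels would feed the Kaplan-Meier and ROC analyses.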

Gene set enrichment analysis

To inspect the signalling pathways that differ between the PNN low- and high-expression groups in the TCGA-PRAD dataset, Gene Set Enrichment Analysis (GSEA) was conducted with the “clusterProfiler” package in R (Subramanian et al., 2005). Pathways with a p-value < 0.05 were considered significantly enriched.

TISIDB database

The Tumor and Immune System Interaction Database (TISIDB) (http://cis.hku.hk/TISIDB) was utilised to analyse the associations between PNN and tumour-infiltrating lymphocytes (TIL), immunosuppressors, and chemokines (Ru et al., 2019).

Statistical analysis

Statistical analysis was performed using R software (version 3.6.1). The differential mRNA expression of PNN between tumour tissues and normal controls was compared using Student’s t-test. The expression of PNN among the clinicopathological parameter groups was compared using Student’s t-test and ANOVA. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was utilised to determine the diagnostic ability of PNN and was calculated using the “pROC” R package (Malone et al., 2015). KM curves of disease-free survival (DFS) or PFS were generated with the “survival” R package, using the median expression of PNN as the cut-off. The log-rank test was used to assess statistical differences, and a p-value < 0.05 was deemed statistically significant.

Results

Prognostic and diagnostic value of PNN in prostate cancer

The expression levels of PNN in PCa and control samples were compared in the TCGA-PRAD dataset, and the result was validated with the GSE29079 and GSE6956 datasets. As shown in the violin plots, the mRNA expression level of PNN was significantly higher in the PCa group in all datasets (Figures 2A–C). Next, we used the same datasets to evaluate the diagnostic value of the PNN gene.
The accuracy of the diagnostic model was evaluated by ROC curve analysis (Figure 2D). The AUC of the PNN diagnostic model was greater than 0.7 in all three datasets, indicating that the PNN gene can be used to discriminate cancer from normal tissues. Moreover, we observed that the abundance of pinin protein was higher in PCa tissue than in normal tissue (Figures 2E,F).

FIGURE 2. PNN expression profile and its diagnostic value in prostate cancer (PCa). (A–C) Comparison of PNN expression levels in the TCGA-PRAD, GSE29079, and GSE6956 datasets. (D) The diagnostic value of PNN as evaluated by ROC curve. (E,F) Immunohistochemistry results of normal (two cases) and PCa tissue (four cases) from the HPA database.

To explore the relationship between PNN expression and the clinicopathological characteristics in PCa, we compared the PNN expression levels according to sample clinical information. Higher PNN expression was found in advanced-stage PCa (Figure 3B), and the Gleason score was strongly and positively correlated with the PNN expression level in PCa patients in both the TCGA-PRAD dataset (p = $6.3\times10^{-9}$) and the GSE116918 dataset (p = 0.001) (Figures 3E,I). PNN expression differed among the surgical margin categories (R0/R1/R2/RX) (Figure 3D). The PNN gene expression level was significantly higher in metastatic tumours than in primary tissue (Figure 3J; data processing in Supplementary Figure S1), suggesting that this gene can be used for diagnostics in metastatic patients. Age (Figures 3A,F), T stage (Figures 3C,H), and PSA level (Figure 3G) were not significantly correlated with PNN expression.

FIGURE 3.
Comparison of PNN expression and clinical information in TCGA: (A) Age, (B) N stage, (C) T stage, (D) Surgical margin, and (E) Gleason score. Comparison of PNN expression and clinical information in GSE116918: (F) Age, (G) PSA level, (H) T stage, and (I) Gleason score. The t-test was used to evaluate the difference between two groups, and analysis of variance (ANOVA) was used to compare data divided into more than two groups. (J) Comparison of PNN gene expression between primary and metastatic PCa using the GSE38241 and GSE25136 datasets following batch effect removal.

Univariate and multivariate Cox analyses were conducted to investigate the independent prognostic factors in the TCGA-PRAD dataset and validated with the GSE116918 dataset. The univariate analysis in the TCGA-PRAD dataset indicated that the surgical margin, T stage, N stage, Gleason score, and PNN expression were associated with the prognosis of PCa patients (Figure 4A). In contrast, multivariate Cox regression analysis in the same dataset demonstrated that only the Gleason score could be used independently to predict the prognosis of patients (Figure 4B). Similarly, the PSA level, Gleason score, T stage, and PNN expression were found to be significant risk factors by univariate Cox analysis in the GSE116918 dataset (Figure 4C). In the same dataset, multivariate Cox regression analysis demonstrated that T stage and PNN expression could be used independently to predict the prognosis of patients (Figure 4D). We then validated these findings by analysing the DFS curves of the PNN high- and low-expression groups, which showed that the PNN high-expression group had remarkably worse survival rates than the low-expression group in both the TCGA-PRAD and the GSE116918 datasets (Figures 4E,F). The hazard ratio of PNN was greater than 1 in both datasets.
Taken together, these results suggest that PNN is a risk factor for the prognosis of PCa. However, the independent prognostic value of PNN needs further investigation and confirmation.

FIGURE 4. PNN prognostic value in the TCGA-PRAD and the GSE116918 cohorts. Forest plots of univariate and multivariate Cox regression analysis for the TCGA cohort, (A) univariate and (B) multivariate, and the GSE116918 cohort, (C) univariate and (D) multivariate. (E,F) DFS curves plotted according to the KM method for the TCGA-PRAD and GSE116918 cohorts using the log-rank test.

PNN co-expression network and potential drug targets in prostate cancer

To identify candidate pharmaceutical molecules with the DSigDB database and to further uncover the biological processes in which PNN participates, we explored the co-expression pattern of PNN in PCa. All co-expressed genes are listed in Supplementary Table S2. BioGRID hosted 243 proteins interacting with pinin, extracted from the published literature. A total of 368 genes were co-expressed with pinin under the criteria of r > 0.6 and p < 0.05; of these, twenty-five genes overlapped with the 243 interacting proteins of pinin (25UC for short). The 25UC genes were enriched in RNA splicing and RNA/mRNA processing based on GO enrichment analysis (Figure 5A) and in the spliceosome, mRNA surveillance pathway, and RNA transport based on KEGG enrichment analysis (Figure 5B). These results suggest that PNN is mainly linked to RNA processing and RNA transport in PCa. PNISR, RBM39, DDX39B, SF3B1, SRSF11, CPSF6, CLK2, and SNRPB2 function in RNA splicing or processing, whereas ACIN1 and NKTR participate in cell apoptosis and the immune response. The protein-protein interaction network is shown in Figure 5C.

FIGURE 5. Co-expression network of PNN. Enrichment results (filtered with FDR < 0.05) for the 25 genes that both interact and are co-expressed with PNN, based on (A) GO and (B) KEGG.
(C) Protein-protein interaction (PPI) network constructed using Cytoscape 3.8.2 based on PNN and 25UC.

To explore the potential therapeutic targets in PCa, we focused on the genes that were strongly positively correlated (r > 0.8 and p < 0.05) with upregulated PNN, including FNBP4, TCERG1, RBM39, DDX39B and DMTF1. Ten possible pharmaceutical molecules were identified from the DSigDB database using Enrichr, based on their p-values. Table 1 lists the suggested drugs from the DSigDB database for PCa.

TABLE 1. List of the suggested drugs for PCa patients with PNN expression.

Drug | p-value | Drug indication | Drug stage (approved or not) | Targeted gene | References