Abstract Prostate cancer (PCa), a multifocal clinically heterogeneous disease, is the most commonly diagnosed non-cutaneous neoplasm in men worldwide. The epigenome of PCa is a typical representation of catastrophic model of epigenetic alterations during tumorigenesis and its progression. Alterations in methylation patterns in tumor suppressors, cell cycle, oncogenes and metabolism-related genes are the most commonly observed epigenetic alterations in PCa. In this study, we have developed a computational strategy to identify methylated biomarker signature panels as potential targets of PCa by screening >160 genes reported to be epigenetically dysregulated, and shortlisted 26 differentially methylated genes (DMGs) that significantly contribute to oncogenesis. The gene ontology and functional enrichment analysis were performed, which showed that identified DMGs contribute to cellular and metabolic processes such as cell communication, cell cycle, response to drugs, apoptosis and p53 signaling. The top hub genes AR, CDH13, CDKN2A, DAPK1, GSTP1, CD44 and RASSF1 identified from protein–protein interaction network construction using Search Tool for the Retrieval of Interacting Genes contributed to hormonal response, inflammatory response, cell cycle, reactive oxygen species activity and receptor kinase activity, which are related to hallmarks of cancer as revealed by their functional enrichment analysis by Cytoscape. These genes were further scrutinized for CpG islands, transcription start sites and positions of methylated cytosines to study their methylation profiles. Our analysis revealed high negative correlation values between methylation frequencies and gene expressions of the hub genes, namely, AR, CDH13, CDKN2A, DAPK1, CD44, GSTP1 and RASSF1, which can be used as potential diagnostic biomarkers for PCa. Keywords: prostate cancer, CpG islands, epigenetic modification, DNA methylation Introduction Prostate cancer (PCa) accounts for being the most commonly diagnosed non-cutaneous neoplasm and third leading cause of cancer mortalities in men worldwide.[27]^1 Reports suggest that men aged >40 years are ~30-fold more susceptible to developing PCa.[28]^2 Although a number of genetic, surface and intracellular markers have been identified to study cancer progression such as CD133, CD49f, Trop2, p63 and PSA, PSA is the only established biomarker for the detection of PCa.[29]^3 However, PSA lacks in specificity and sensitivity;[30]^3 thus, there exists an urgent need to identify biomarkers that are more specific. Epigenetic alterations in the form of DNA methylation and histone modifications have emerged as a major focus area for cancer treatment considering their contribution in modulation of crucial gene expressions in various malignancies.[31]^4 Tumor suppressor genes and oncogenes play crucial roles in cancer by regulating functions such as cell cycle control, DNA repair, cell adhesion and apoptosis and are known to be differentially methylated in their promoter regions in several types of cancers.[32]^5^,[33]^6 However till date, there is a limited knowledge of the specific functional mechanisms at the genome level that are being altered by epigenetic dysregulation during tumor progression in PCa.[34]^7 Given their key role in various stages of carcinogenesis by causing abnormal gene expression of tumor suppressor genes and oncogenes, histone modifications and hypermethylation status of genes are being explored as potential biomarkers of cancer progression and prognosis.[35]^8 Nowadays with the advancement of chromatin immunoprecipitation (ChIP) technology along with microarray profiling, scientists can screen thousands of differentially expressed genes (DEGs), analyze their mRNA profiles and methylation profiles computationally and subsequently scrutinize their involvement in various molecular and biological regulatory functions during various stages of tumor initiation and progression.[36]^9 The present study aims to explore the contribution of epigenetic modulation of genes involved in progression of PCa using computational approach. We screened >160 differentially methylated genes (DMGs), which are reported to be epigenetically dysregulated, and identified 26 genes that significantly contribute to oncogenesis in prostate gland. This was followed by screening the identified DMGs for gene ontology (GO) and their pathway enrichment. The protein–protein interaction (PPI) and functional enrichment analysis further shed light on the crucial roles played by the identified DMGs during tumor initiation and progression. Our analysis showed a high negative correlation value between methylation frequencies and gene expressions of identified hub genes, which have the potential to act as early prognostic and diagnostic biomarkers of PCa. Methods Identification of DMGs A comprehensive literature survey was performed using PubMed and Google Scholar. The keywords used were “Prostate cancer” and “DNA methylation”. The literature survey was limited to full text articles available in English. A preliminary abstract review was performed to decide relevance of these research articles to our methylation study. Once the literature survey was updated, an extensive data extraction exercise was carried out to retrieve genes that are differentially methylated in PCa using online databases. The Cancer Genome Atlas (TCGA) ([37]cancergenome.nih.gov/),[38]^10 Cancer Genetics Web ([39]www.cancerindex.org/geneweb/)[40]^11 and MethyCancer ([41]methycancer.psych.ac.cn/)[42]^12 were used for identifying the DMGs in PCa. GO and functional enrichment analysis Databases such as GeneCodis[43]^13 and Panther[44]^14 were used for functional enrichment analysis. GO of identified DMGs included categories such as cellular component (CC), biological process (BP) and molecular function (MF) terms. The parameters were set as p-value <0.01 for considering the results that were statistically significant. The enriched pathways were identified using Kyoto Encyclopedia of Genes and Genomes (KEGG)[45]^15 analysis. To further analyze the interrelationship of identified hub genes in regulating cellular pathways, gene set enrichment analysis module of Cytoscape software[46]^16 was used. PPI network construction by search tool for the retrieval of interacting genes (STRING) The interactive relationships between DMGs were identified using STRING database.[47]^17 A combined score of >0.7 (high) of only experimentally validated interactions was considered to be statistically significant. Epigenetic analysis The FASTA sequences of identified differentially methylated hub genes in PCa were retrieved from nucleotide database of National Center for Biotechnology Information ([48]www.ncbi.nlm.nih.gov/). CpGProD ([49]doua.prabi.fr/software/cpgprod)[50]^18 and EMBOSS CpGPlot ([51]www.ebi.ac.uk/Tools/seqstats/emboss_cpgplot)[52]^19 were used to monitor the presence of CpG islands if any. Information related to GC%, G + C skew, A + T skew, etc. was also retrieved from these online programs. Methylation and Expression database of Normal and Tumor tissues ([53]MENT; mgrc.kribb.re.kr:8080/MENT/)[54]^20 is an online integrative database to study DNA methylation and its correlation with gene expression in paired samples gathered from TCGA and Gene Expression Omnibus (GEO). MENT was used to identify the effect of methylation on the gene expression of the identified DMGs in PCa. Analysis of DNA methylation with exon–intron structures, genomic loci of epigenetic changes, positions of methylated cytosines, CpG islands and transcription start sites (TSS) using a graphical description was carried out using Prostate Epigenetic Database (PEpiD; [55]wukong.tongji.edu.cn/pepid).[56]^21 The identified hub genes after epigenetic analysis were further classified using extensive literature survey based on their specificity and sensitivity for determining their performance as prognostic and diagnostic biomarkers for PCa. The literature survey covered published reports in which the specificity and sensitivity of DMGs were determined in cell lines and also bodily samples such as serum, ejaculates, post-massage urine and biopsy samples using methylation-specific polymerase chain reaction (MSP-PCR), ChIP and microarray analysis.[57]^22^–[58]^45 Results Identification of DMGs involved in PCa initiation and progression A consolidated list of >160 genes from literature survey and online databases with evidences of differential methylation during PCa progression was prepared ([59]Table S1). We short-listed 26 candidate DMGs that were associated with metastatic and recurrent stages of the disease with at least two publications and were used for further analysis ([60]Table 1). Table 1. Differentially methylated genes (DMGs) shortlisted for analysis S no Shortlisted DMGs associated with PCa 1 Adenomatous polyposis coli (APC) 2 Caveolin 1 (CAV1) 3 Cluster of differentiation 44 (CD44) 4 Cadherin 13 (CDH13) 5 Cyclin-dependent kinase inhibitor 1C (CDKN1C) 6 Death-associated protein kinase 1 (DAPK1) 7 Endothelin receptor type B (EDNRB) 8 Estrogen receptor 2 (ESR2) 9 Glutathione peroxidase 3 (GPX3) 10 Hypermethylated in cancer 1 (HIC1) 11 Prostaglandin-endoperoxide synthase 2 (PTGS2) 12 Ras association domain family member 1 (RASSF1) 13 Stratifin (SFN) 14 Androgen receptor (AR) 15 Cyclin D2 (CCND2) 16 Cadherin 1 (CDH1) 17 Cyclin-dependent kinase inhibitor 1B (CDKN1B) 18 Cyclin-dependent kinase inhibitor 2A (CDKN2A) 19 Dickkopf WNT signaling pathway inhibitor (DKK3) 20 Estrogen receptor 1 (ESR1) 21 Fragile histidine triad (FHIT) 22 Glutathione S-transferase pi 1 (GSTP1) 23 O-6-methylguanine-DNA methyltransferase (MGMT) 24 Retinoic acid receptor beta (RARβ) 25 S100 calcium-binding protein A6 (S100A6) 26 TIMP metallopeptidase inhibitor 3 (TIMP3) [61]Open in a new tab Functional annotation and pathway enrichment analysis The altered biological function and MF of the identified DMGs were then analyzed through GO using GeneCodis and Panther. A p-value of <0.01 was considered as significant during the analysis. The enrichment GO terms were categorized into MF, BP and CC ([62]Table 2). During the MF analysis, we found that majority of DMGs contributed to protein binding (16 genes), zinc ion binding (seven genes), metal ion binding (eight genes), DNA binding (six genes) and receptor activity (six genes). Other regulated MFs included enzyme binding, sequence-specific DNA binding and calcium ion binding. The CC analysis revealed that majority of differentially methylated candidate genes were associated with cytoplasm (20 genes), nucleus (16 genes), plasma membrane (10 genes), nucleoplasm (seven genes) and cytosol (six genes). In the BP ontology, the most significantly regulated processes included response to drugs (six genes), negative regulation of cell proliferation (six genes), signal transduction (seven genes) and negative regulation of apoptotic process (five genes). The other regulated processes included response to estradiol stimulus, cell cycle arrest and negative regulation of MAPK cascade. Thus, the most significantly enriched pathways identified using KEGG analysis include pathways in cancer (10 genes), cell cycle (five genes), p53 signaling (three genes) and PCa (three genes; [63]Table 2). Functional gene set enrichment analysis using gene set enrichment analysis (GSEA) module of Cytoscape showed that the identified DMGs most commonly contributed in hormonal response, inflammatory response, cell cycle, reactive oxygen species (ROS) activity and receptor kinase activity, which are all related to hallmarks of oncogenesis ([64]Table S2). Table 2. Gene ontology (GO) analysis of candidate differentially methylated genes (DMGs) associated with prostate cancer (PCa) Annotation Name Gene count p-value Genes involved Molecular functions (MFs) GO:0005515 Protein binding 16 1.0633E–06 S100A6, TIMP3, CD44, CDKN1C, CAV1, ESR1, CDKN1B, RASSF1, CDH1, SFN, ESR2, AR, DAPK1, FHIT, GSTP1, HIC1 GO:0008270 Zinc ion binding 7 0.0042104 S100A6, ESR1, RASSF1, ESR2, AR, HIC1, RARβ GO:0046872 Metal ion binding 8 0.00454845 TIMP3, ESR1, RASSF1, ESR2, PTGS2, AR, HIC1, RARβ GO:0004872 Receptor activity 6 0.00461516 CD44, EDNRB, ESR1, ESR2, AR, RARβ GO:0003677 DNA binding 6 0.00644885 ESR1, ESR2, CD44, EDNRB, AR, RARβ Cellular localization GO:0005737 Cytoplasm 20 4.1853E–10 S100A6, TIMP3, CD44, CDKN1C, CAV1, ESR1, CDKN1B, RASSF1, CDH13, CDH1, SFN, CDKN2A, APC, PTGS2, AR, DAPK, FHIT, CCND2, GSTP1, RARβ GO:0005634 Nucleus 16 6.62952E–06 S100A6, CDKN1C, ESR1, CDKN1B, RASSF1, SFN, ESR2, MGMT, APC, PTGS2, AR, FHIT, CCND2, GSTP1, HIC1, RARβ GO:0005654 Nucleoplasm 7 8.58777E–05 ESR1, CDKN1B, ESR2, CDKN2A, MGMT, AR, RARβ GO:0005886 Plasma membrane 10 0.00182238 S100A6, CD44, EDNRB, ESR1, RASSF1, CDH13, CDH1, FHIT, GSTP1, CAV1 GO:0005829 Cytosol 6 0.019808 S100A6, CAV1, CDKN1B, FHIT, CCND2, GSTP1 Biological process (BP) GO:0042493 Response to drugs 6 1.86731E–05 GPX3, CAV1, CDH1, MGMT, APC, PTGS2 GO:0008285 Negative regulation of cell proliferation 6 1.82718E–07 CDKN1C, CAV1, CDKN1B, CDH13, SFN, PTGS2 GO:0043066 Negative regulation of apoptotic process 5 0.000106969 CD44, CDKN1B, APC, AR, GSTP1 GO:0007050 Cell cycle arrest 4 0.000128057 CDKN1C, CDKN1B, RASSF1, CDKN2A GO:0032355 Response to estradiol stimulus 4 5.21837E–05 ESR1, PTGS2, CCND2, GSTP1 GO:0014070 Response to organic cyclic compound 4 0.000109315 TIMP3, GPX3, PTGS2, CCND2 GO:0007165 Signal transduction 7 0.00051767 S100A6, ESR1, SFN, ESR2, AR, DAPK1, RARβ KEGG-enriched pathways (KEGG)05200 Pathways in cancer 10 1.00997E–12 RARβ, AR, CDKN1B, APC, GSTP1, RASSF1, DAPK1, CDKN2A, CDH1, PTGS2 (KEGG)04110 Cell cycle 5 4.91434E–07 CCND2, CDKN1B, SFN, CDKN2A, CDKN1C (KEGG)04115 p53 signaling 3 0.000129557 CCND2, SFN, CDKN2A (KEGG)05215 PCa 3 0.000251702 AR, CDKN1B, GSTP1 (KEGG)04310 Wnt signaling 2 0.0189158 CCND2, APC (KEGG)04510 Focal adhesion 2 0.0300763 CCND2, CAV1 [65]Open in a new tab PPI network The PPI networks between DMGs were investigated using STRING server. In total, 26 nodes with 90 edges and 6.92 as average node degree were analyzed ([66]Figure 1 and [67]Table 3). In the PPI network, seven node proteins, namely, GSTP1, CDH13, CDKN2A, RASFF1, CD44, AR and DAPK1, showed maximum degree of association with the other proteins (>5), thus indicating that these genes (proteins) have higher hub degrees. Thus, these hub genes (proteins) may play a crucial role in PCa initiation and progression. Figure 1. [68]Figure 1 [69]Open in a new tab Protein–protein interaction (PPI) network of identified differentially methylated genes (DMGs). Table 3. Associated genes of identified seven differentially methylated hub genes Hub gene Connected genes AR ESR1, CAV1, ESR2, CDH1, RARβ, CDKN1B CDH13 CDH1, RASSF1, DAPK1, GSTP1, CDKN2A, RARβ, FHIT, MGMT, TIMP3, HIC1 CDKN2A CCND2, CDKN1B, RASSF1, FHIT, CDH1, MGMT, GSTP1, ESR1, TIMP3, HIC1, RARβ, SFN, AR, ESR2 DAPK1 RASSF1, TIMP3, RARβ, MGMT, HIC1, FHIT, CDH1 GSTP1 GPX3, TIMP3, RASSF1, DAPK1, MGMT, CDH1, RARβ, HIC1, ESR1, FHIT, SFN, PTGS2, AR, CCND2, ESR2 RASSF1 FHIT, SFN, RARβ, MGMT, HIC1, TIMP3, CDH1, CCND2, ESR1, ESR2 CD44 AR, CDKN1B, CAV1, ESR1, CDKN2A, TIMP3, PTGS2 [70]Open in a new tab Epigenetic analysis Evidences have shown that the cytosines in the CpG dinucleotides are often methylated to 5-methylcytosines leading to change in gene expression.[71]^46 Almost 70%–80% cytosines in CpG mammals are seen to be methylated in mammals and are often located near the promoter regions.[72]^47 CpGProd and EMBOSS CpGPlot were used to analyze positions of CpG islands, CpGo/e ratio, G + C skew and A + T skew ([73]Table S3). The positions of methylated cytosines along with positions of TSS of the identified hub DMGs were also retrieved using PEpID (a prostate epigenetic database in mammals; [74]Table S4). Once the CpG islands and the methylated cytosines positions were retrieved, MENT database was used to scrutinize the correlation between methylation frequencies and gene expressions. The MENT launched in 2012 is an integrated database of DNA methylation and gene expressions. Datasets containing information regarding DNA methylation and gene expression in paired samples were retrieved from TCGA and GEO. We opted for gene search tool to predict the correlation between methylation values and gene expressions of our hub genes. The Illumina methylation datasets consists of Human Methylation27 and Golden Gate Methylation Cancer Panel I, which is composed of 125 prostate tumor samples and 92 normal prostate samples for analysis. Based on the correlation values between mean methylation frequencies and gene expression, it was observed that expressions of hub genes GSTP1, CDH13, CDKN2A, RASFF1, AR, CD44 and DAPK1 were seen to be significantly affected by their hypermethylation during PCa progression due to an observed negative correlation of as high as −0.7 as shown in [75]Table 4. Table 4. Methylation and Expression database of Normal and Tumor tissues (MENT) data showing correlation between mean methylation values and gene expressions Gene name Methylation Probe Correlation CDH13 ILMN_1766925 cg 13759328 −0.53224 DAPK1 ILMN_1708340 cg 08797471 −0.75606 RASSF1 ILMN_1808066 cg 06821120 −0.44059 AR ILMN_1792540 cg 05786601 −0.56984 CDKN2A ILMN_1717714 cg 11653709 −0.68581 GSTP1 ILMN_1679869 cg 05244766 −0.55374 CD44 ILMN_1778625 cg 04125208 −0.39674 [76]Open in a new tab These identified hub genes, ie, AR, CDH13, CDKN2A, DAPK1, GSTP1, CD44 and RASSF1, have been reported by various other research groups to show high specificity and sensitivity as independent biomarkers in prostatic intraepithelial neoplasia and different stages of PCa, including primary and metastatic tumors ([77]Table 5). These genes have been reported to be differentially regulated due to hypermethylation in PCa cells, ie, PC3, DU145 and LNCaP, as well as bodily samples, including serum, ejaculates, post-massage urine and biopsy samples.[78]^22^–[79]^45 Table 5. Specificity and sensitivity of hub genes identified in our study that are frequently hypermethylated as potential biomarkers for prostate cancer (PCa) S no Gene name Sample type Sensitivity Specificity Significance/stage References