Abstract Prostate cancer, the second most common cancer in men, has been rarely explored by integrating mRNA and miRNA expression profiles. In this study, we combined two mRNA expression datasets and six documented miRNAs to uncover the comprehensive molecular mechanism of prostate cancer. Results showed that a total of 30 genes were significantly differentially expressed in 49 tumor samples by comparing with 22 normal samples. Importantly, all samples from the two datasets can be clearly classified into two groups, tumor group and normal group, based on the selected differentially expressed genes (DEGs). The miRNA–mRNA regulation network indicated that 4 out of 30 DEGs can be regulated by three miRNAs. In addition, prognostic performance validation using in silico databases showed that the DEGs can significantly differentiate between low-risk and high-risk prostate cancer. In summary, multiple biological processes are probably involved in the development and progression of prostate cancer. First, dysregulation of SV2 can regulate transporter and transmembrane transporter activity and then provide the necessary nutrients for tumor cell proliferation. Second, HOXD10 can induce cell proliferation via TCF-4. Finally, dysregulation of CACNA1D can further suppress tumor apoptosis in prostate cancer. The identification of critical genes and valuable biological processes can be useful for the understanding of the molecular mechanism of prostate cancer. Keywords: mRNA, DEGs, miRNA, prognostic Introduction Prostate cancer, as the second most common cancer in men and fourth most common tumor type worldwide, is estimated to be diagnosed in 220,800 men in the USA, and almost 12% (27,540) died of their disease.[33]^1^,[34]^2 The World Health Organization (WHO)’s report in 2012 shows that the incidence rates were estimated to be 10.5% and 4.5% in east and south-central Asia, respectively.[35]^3 In recent years, however, the incidence and mortality rate of prostate cancer have experienced a steady increase in some Asian countries.[36]^3 The primary prostate cancer has a 5-year survival rate of nearly 100%, and the distant metastasis has a 5-year survival rate of 28%.[37]^4 Various factors contribute to the incidence of prostate cancer including genetic, age, lifestyle and race.[38]^5 In the current era, several prostate cancer biomarkers have been approved by the US Food and Drug Administration (FDA). One of these is PCa antigen 3 (PCA3) which is specifically upregulated in tumor tissues.[39]^6 Another one is PSA isoform (ProPSA) which can distinguish tumor tissues from normal tissues.[40]^7 In recent years, molecular and genetic profiles have been widely used for the screening of therapeutic targets. The fusion of androgen-regulated promoters with ERG and E26 transformation-specific (ETS) family members has been identified as the most common alteration in prostate cancer genome.[41]^8 Tomlins et al’s[42]^8 study shows that approximately 40%–50% of prostate cancers have TMPRSS2-ERG fusion. MALAT-1 fragment named as MD-miniRNA has been reported to differentiate prostate tumor and normal samples with the sensitivity of 56.6% and specificity of 84.8%.[43]^9 Somatic point mutations such as SPOP, TP53, FOXA1 and PTEN also have been frequently identified in prostate cancer.[44]^10 In addition, gene expression signatures of prostate cancer have been widely revealed based on oligonucleotide or cDNA microarray technology. One gene that encodes the serine protease hepsin was identified to be upregulated in malignant prostate tumor samples.[45]^11 The protein level of two genes, hepsin and pim-1, was significantly correlated with clinical outcome based on tissue microarray.[46]^12 In addition, the overexpression of genes that participated in multiple signal pathways especially cell cycle regulation, DNA replication and DNA repair was significantly related to proliferation rate of metastatic prostate cancer.[47]^13 Although various studies have been carried out based on microarray or next-generation sequencing (NGS) technologies, studies by integrating mRNA and miRNA expression profiles are rare in prostate cancer. More and more publicly available datasets were submitted to Gene Expression Omnibus (GEO) database with the development of technologies. In this study, to gain further insight into the molecular mechanism of prostate cancer, we comprehensively characterized the expression profiles of 49 prostate tumor samples. First, the significant common differentially expressed genes (DEGs) from two datasets were identified using advanced bioinformatic algorithms. Then, the common DEGs were subject to gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Further, the regulation network between mRNA and six reported miRNAs was constructed. Finally, prognostic performance of the common DEGs was validated in silico using SurvExpress online database. Materials and methods mRNA and miRNA expression profiles download A lot of publicly available gene expression profiles have been submitted to GEO or ArrayExpress database in recent years. In this study, we carefully searched the GEO database and downloaded two representative mRNA expression profiles of prostate cancer. One dataset, with GEO accession number [48]GSE46602, was deposited by Mortensen et al,[49]^14 and this dataset consisted of 36 tumor tissues and 14 normal tissues. Another dataset named [50]GSE55945 was submitted by Arredouani et al[51]^15 which contains 13 tumor tissues and eight normal tissues. The two datasets were obtained from Affymetrix Human Genome U133 Plus 2.0 Array. Detailed sample information and experiment design can be obtained from previous published papers. Identification of DEGs To explore the significant DEGs in prostate cancer, two datasets were carefully analyzed with in-house pipeline based on R language and public annotation databases. Briefly, mRNA expression values from the two datasets were subject to background correction to reduce noise. Then, normalization and log2 transformation of the expression values were carried out using GeneChip Robust Multi-array Analysis (GC-RMA) algorithm.[52]^16 In addition, control probe sets or uninformative probe sets were removed, and the mean expression value was calculated for the genes with multiple probes. Finally, DEGs between tumor and normal tissues were screened out using algorithm within Limma (linear models for microarray analysis) package.[53]^17 Criteria for DEG selection were set to adjust P-value ≤0.05 and absolute log2 fold change ≥2 which can significantly reduce false positive. In addition, common DEGs between the two datasets were figured out based on Venn diagram, and the expression pattern of the common DEGs in all samples was constructed by using heatmap. 2 method within ggplot package.[54]^18 GO and KEGG pathway annotation To explore biological functions of the DEGs, the identified common DEGs were further subject to GO and KEGG pathway enrichment using online tool of DAVID (Database for Annotation, Visualization and Integrated Discovery).[55]^19 All parameters were set as default, and the GO terms consisted of biological process (BP), cellular component (CC) and molecular function (MF). mRNA–miRNA regulation network construction More and more investigations have shown that miRNA can regulate tumor-related mRNA expression and play an important role in the whole process of tumor.[56]^20 In the study by Alhasan et al,[57]^21 six miRNAs, including miR-605-5p, miR-135a-3p, miR-495-5p, miR-433-5p, miR-371a-3p and miR-106a-5p, were demonstrated to be significantly up/down-expressed in prostate cancer. Blind studies also validated that the six miRNAs can differentiate prostate cancer patients from healthy people with 89% accuracy.[58]^21 Based on the six miRNAs, we constructed the regulation network using CyTargetLinker plugin in Cytoscape.[59]^22 In brief, target genes for the six miRNAs were predicted using microCosm, mirTarbase and TargetScan databases. Then, intersection between the common DEGs and target genes were selected and subject to Cytoscape. Prognostic performance of the common DEGs It is critical to validate the clinical outcomes of identified common DEGs for the diagnosis or treatment of prostate cancer. In this study, in silico validation was conducted using SurvExpress online tool that is based on publicly available mRNA expression profiles with clinical outcomes.[60]^23 Two datasets, [61]GSE16560 and [62]GSE40272, were used, and parameters were carefully selected with the instruction from the developer. In brief, we need to input our list of genes, select tissues, datasets, and set survival analysis options. The censored parameter was set to survival_month. All the samples were used as train set and test set for the estimation of beta coefficient using Cox model and the estimation of prognostic index. The optional weight parameter was estimated as the beta coefficient from the Cox model. Other parameters were left as default. Detailed sample information for the two datasets can be obtained from previous reports.[63]^24^,[64]^25 Results Identification of DEGs in prostate cancer To identify significant DEGs, the gene expression values were subject to preprocessing first. [65]Figure 1 shows that the values are almost at the same level for the two datasets after normalization. Then, significant DEGs were figured out by applying robust empirical Bayesian method. Results showed that a total of 286 and 51 genes in [66]GSE46602 and [67]GSE55945, respectively, were significantly dysregulated in tumor samples. In [68]GSE46602, 90 genes (31.5%) were upregulated and 196 genes were downregulated (68.5%), and in [69]GSE55945, 32 genes (62.7%) were upregulated and 19 genes were down-regulated (37.3%). In addition, 30 genes were found to be simultaneously differentially expressed in both [70]GSE46602 and [71]GSE55945 ([72]Figure 2A), and the fold change and adjusted P-value are listed in [73]Table 1. In addition, fold change correlation of the common DEGs was 0.78 ([74]Figure 2B), which indicates that the expression level of the common DEGs was consistent between the two datasets. Further, the expression pattern of the common DEGs in all samples was constructed and subject to hierarchical clustering. [75]Figure 3 shows that tumor (red) and normal (blue) samples can be clearly classified into two different subgroups except for Normal_18 and Normal_1. Error assignment for these two samples was probably due to tumor heterogeneity or variation in expression values. Figure 1. [76]Figure 1 [77]Open in a new tab Boxplots for gene expression of each sample in [78]GSE46602 (A) and [79]GSE55945 (B). Figure 2. [80]Figure 2 [81]Open in a new tab Venn diagram of the DEGs in [82]GSE46602 and [83]GSE55945 (A) and fold change correlation between the two datasets (B). Abbreviation: DEGs, differentially expressed genes. Table 1. The identified 30 common DEGs in [84]GSE46602 and [85]GSE55945 Gene symbol [86]GSE46602 __________________________________________________________________ [87]GSE55945 __________________________________________________________________ Adjusted P-value Log FC Adjusted P-value Log FC ACSM1 AMACR 1.16E–05 2.88 0.00014732 3.22 AOC1 1.26E–09 −3.05 0.00356 −2.67 C1QTNF3-AMACR 9.23E–09 3.79 7.60E–06 3.38 CACNA1D 0.000226 2.08 5.58E–05 2.30 CCK 1.82E–07 −2.38 0.00125 −2.06 CD177 1.89E–12 −6.06 0.0298 −2.30 CYP3A5 1.31E–13 −3.73 0.001324667 −2.28 DLX1 3.35E–08 4.41 3.87E–08 4.16 DNAH5 2.61E–05 2.04 0.000519 2.04 F5 0.000165767 3.00 0.00329 2.16 HOXC4 1.51E–09 2.51 2.24E–06 2.56 HOXC6 2.91E–08 3.10 5.86E–07 3.33 HOXD10 8.37E–09 −2.30 0.00146 −2.05 HPN 2.97E–14 2.60 2.75E–07 2.36 INSM1 0.00204 2.15 0.000294 2.14 KCNG3 0.00233 2.06 0.0028 2.54 LINC00844 6.18E–12 −4.45 0.000129 −2.36 LOC100287413 1.58E–08 3.19 8.63E–08 2.40 LSAMP 5.33E–11 −2.75 6.56E–05 −2.10 MS4A8 0.000133 2.04 0.000398 2.82 OR51E2 0.0029 2.60 0.00765 2.26 PCA3 1.07E–06 4.19 4.70E–05 3.58 PDLIM5 1.41E–08 2.49 0.0004 2.36 PENK 2.25E–05 −2.26 8.98E–06 −2.39 PRCAT47 0.00012 3.07 5.97E–05 3.87 SIM2 3.80E–11 3.52 2.50E–05 2.10 SLC2A5 8.13E–11 −2.35 0.00025 −2.52 SV2B 6.19E–13 −3.16 0.000577 −2.30 TMEM45B 0.000505 2.42 0.001 2.49 [88]Open in a new tab Abbreviations: DEGs, differentially expressed genes; FC, fold change. Figure 3. [89]Figure 3 [90]Open in a new tab Heat map showing the expression pattern of 30 common DEGs in all tumor and normal samples. Notes: The x-axis represents samples and y-axis represents genes. The bar on the top indicates the type of sample using red (tumor) and blue (normal). Abbreviation: DEGs, differentially expressed genes. GO and KEGG pathway annotation To obtain the corresponding biological functions of the common DEGs, functional annotations were carried out using DAVID online tool that is based on GO and KEGG pathway database. As listed in [91]Table 2, SLC2A5 and CACNA1D can be enriched into carbohydrate digestion and absorption KEGG pathway. GO annotation results indicated that the common DEGs were mainly related to the BP of xenobiotic metabolic process, anterior/posterior pattern specification, locomotory exploration behavior, proximal/distal pattern formation and behavioral fear response with P-value of <0.05 ([92]Table 2). In addition, four MF terms including sequence-specific DNA binding, transcription factor activity, sequence-specific DNA binding and neuropeptide hormone activity were enriched with P-value of <0.05 ([93]Table 2). CC annotation showed that HPN, SLC2A5, F5, etc mainly located on plasma membrane ([94]Table 2). Table 2. KEGG pathway and GO enrichment results for the common DEGs Type ID Term Genes P-value KEGG hsa04973 Carbohydrate digestion and absorption SLC2A5, CACNA1D 0.0649 BP GO:0006805 Xenobiotic metabolic process CYP3A5, ACSM1, AOC1 0.0046 GO:0009952 Anterior/posterior pattern specification HOXC6, HOXC4, HOXD10 0.0049 GO:0035641 Locomotory exploration behavior PENK, LSAMP 0.0156 GO:0009954 Proximal/distal pattern formation DLX1, HOXD10 0.0310 GO:0001662 Behavioral fear response CCK, PENK 0.0373 CC GO:0005886 Plasma membrane HPN, SLC2A5, F5, PENK, CD177, LSAMP, SV2B, KCNG3, OR51E2, CACNA1D, AOC1 0.0387 MF GO:0043565 Sequence-specific DNA binding HOXC6, DLX1, HOXC4, HOXD10 0.0287 GO:0003700 Transcription factor activity, sequence-specific DNA binding HOXC6, INSM1, HOXC4, HOXD10, SIM2 0.0336 GO:0005184 Neuropeptide hormone activity CCK, PENK 0.0384 [95]Open in a new tab Abbreviations: BP, biological process; CC, cellular component; DEGs, differentially expressed genes; GEO, Gene Expression Omnibus; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular function. miRNA–mRNA regulation network Further dysregulation mechanism of the common DEGs was explored by constructing miRNA–mRNA regulation network. Prediction results showed that the six miRNAs can target 15, 1,252 and 2,046 genes in miRTarBase, TargetScan and MicroCosm databases, respectively. In addition, four out of the predicted target genes can be identified in the common DEGs. Based on the miRNA–mRNA pairings constructed using Cytoscape, we can find that miR-106a-5p can regulate the expression of PDLIM5 and SV2B ([96]Figure 4). Also, miR-135a-3p and miR-371a-3p can regulate the expression of HOXD10 and CACNA1D, respectively ([97]Figure 4). Figure 4. Figure 4 [98]Open in a new tab Regulation network between the identified common DEGs and reported miRNAs. Note: Pink hexagons represent DEGs and green circles represent miRNAs. Abbreviation: DEGs, differentially expressed genes. Prognostic performance of the common DEGs Clinical outcome validation is critical for the identification of biomarker in cancer research. In this study, two independent publicly available datasets, [99]GSE16560 and [100]GSE40272, were used to validate the prognostic performance of the identified common DEGs. Survival analysis based on clinical information from the two datasets showed that high-/low-risk prostate cancer group can be significantly differentiated in the two datasets with P-value 1.9E–7 and 1.0E–5, respectively ([101]Figure 5). In addition, higher concordance index (CI) indicated that the common DEGs can be used for the prediction of prognostic performance ([102]Table 3). Figure 5. [103]Figure 5 [104]Open in a new tab Kaplan–Meier curves of [105]GSE16560 dataset (A) and [106]GSE40272 (B). Notes: Censoring samples are marked with “+”. The x-axis represents time to event and y-axis represents percentage. The number of samples, censored number, and CI are shown in the insets. High- and low-risk groups are labeled with red and green curves, respectively. The red data represent high-risk patients, and the green data represent low-risk patients. Abbreviation: CI, concordance index. Table 3. Virtual validation results of prognostic performance Dataset Samples Genes found Risk groups (P-value) DEGs between risk groups __________________________________________________________________ CI DEGs (n) DEGs [107]GSE16560 281 19 1.90E–07 66.6 9 CACNA1D, CCK, F5, HOXC4, HOXC6, LSAMP, MS4A7, PENK, SLC2A5 [108]GSE40272 98 26 1.00E–05 89 6 CYP3A5, HOXC4, HOXC6, KCNG3, LSAMP, PENK [109]Open in a new tab Abbreviations: CI, concordance index; DEGs, differentially expressed genes. Discussion The comprehensive analysis based on mRNA and miRNA expression profiles advanced the understanding of the molecular mechanism of prostate cancer. A total of 30 DEGs were identified in 49 tumor samples by comparing with normal samples. Importantly, all tumor and normal samples can be clearly classified into two groups based on the identified DEGs. KEGG pathway annotation showed that SLC2A5 and CACNA1D participated in the carbohydrate digestion and absorption pathway. The protein encoded by SLC2A5 is a fructose transporter responsible for fructose uptake,[110]^26 and the protein encoded by CACNA1D is the alpha-1D subunit of calcium channel.[111]^27 Enhanced growth and proliferation of tumor cells place increased demand for nutrients for the synthesis of DNA and protein.[112]^28 The dysregulation of transporter and calcium channel probably increase the availability of nutrients.[113]^29 The complex molecular mechanism of prostate cancer was further unveiled by miRNA–mRNA regulation network. The regulation network showed that miR-106a-5p can regulate the expression of SV2B ([114]Figure 4). This gene encodes a member of the synaptic vesicle protein 2 family which participates in the regulation of vesicle trafficking and exocytosis.[115]^30 Expressed sequence tag (EST) analysis results showed that SV2B was differentially expressed in prostate cancer.[116]^31 GO annotations related to this gene included transporter activity and transmembrane transporter activity. Report also showed that SV2 can recruit glucose-evoked granules to the plasma membrane.[117]^32 Due to high demand of nutrient for tumor generation and metastasis, the dysregulation of SV2 in prostate cancer can provide necessary nutrient. Another critical gene is HOXD10 which is the target of miR-135a-3p ([118]Figure 4). This gene belongs to homeobox (Hox) superfamily and encodes transcription factors that control cell differentiation and morphogenesis during development.[119]^33^,[120]^34 Documents showed that tumor growth can be suppressed with ectopic expression of HOXB13 via downregulation of β-catenin–TCF pathway.[121]^35 Jung et al’s[122]^34 study also demonstrated that HOXB13 can negatively regulate the expression of TCF-4 and then suppress cell proliferation in prostate cancer. Expression study also shows that HOXD10 is commonly downregulated in gastric tumor tissue compared with normal stomach tissue. Functional analysis revealed that HOXD10 can upregulate IGFBP3, which can activate caspase-3 and caspase-8 and subsequently induce cell apoptosis. All the results suggest that HOXD10 probably functions as a tumor suppressor in prostate cancer. In addition, CACNA1D was significantly dysregulated and can be regulated by miR-371a-3p ([123]Figure 4). This gene encodes the alpha-1D subunit of calcium channel. Calcium channel has been demonstrated to contribute to cell proliferation, differentiation and apoptosis. Calcium channel blocker has an important regulatory role and can affect malignant transformation in prostate cancer.[124]^36 It has also been documented that Ca2+ signaling participated in androgen-induced PSA secretion. Prostate cancer cell proliferation and androgen receptor-mediated gene expression are significantly suppressed with the addition of nifedipine and tetrandrine.[125]^37 In addition, knocking down of CACNA1D by siRNA can abrogate androgen receptor transactivation.[126]^37 Apart from the abovementioned genes, several novel or rarely reported genes such as PDLIM5, HOXC4 and PENK were also identified in prostate cancer. Conclusion Multiple biological processes probably involved in the development and progression of prostate cancer. First, the dysregulation of SV2 can regulate transporter and transmembrane transporter activity and then provide necessary nutrient for tumor cell proliferation. Then, HOXD10 can induce cell proliferation via TCF-4. Finally, the dysregulation of CACNA1D can further suppress tumor apoptosis. Acknowledgments