Abstract Objective: This work proposes to predict target genes and pathways for uveal melanoma (UM) based on an ensemble method and pathway analyses. Methods: The ensemble method integrated a correlation method (Pearson correlation coefficient, PCC), a causal inference method (IDA) and a regression method (Lasso) utilizing the Borda count election method. Subsequently, to validate the performance of PIL method, comparisons between confirmed database and predicted miRNA targets were performed. Ultimately, pathway enrichment analysis was conducted on target genes in top 1000 miRNA-mRNA interactions to identify target pathways for UM patients. Results: Thirty eight of the predicted interactions were matched with the confirmed interactions, indicating that the ensemble method was a suitable and feasible approach to predict miRNA targets. We obtained 50 seed miRNA-mRNA interactions of UM patients and extracted target genes from these interactions, such as ASPG, BSDC1 and C4BP. The 601 target genes in top 1,000 miRNA-mRNA interactions were enriched in 12 target pathways, of which Phototransduction was the most significant one. Conclusion: The target genes and pathways might provide a new way to reveal the molecular mechanism of UM and give hand for target treatments and preventions of this malignant tumor. Keywords: uveal melanoma, miRNA, mRNA, target, gene, pathway 1. Introduction Uveal melanoma (UM) is the most frequent and aggressive ocular primary tumor that arises from neural crest-derived melanocytes of the uveal tract of the eye in adults [[27]1], with an incidence rate of up to 8 per 1,000,000 person years in Europe [[28]2, [29]3]. The fatality rate of UM is high, since patients are at risk of developing metastases up to 20 years after the initial diagnosis, and 80% of metastatic patients die within one year and 92% within 2 years of the diagnosis of metastases [[30]4, [31]5]. However, no effective adjuvant therapy is available to prevent metastases, neither is there any effective treatment once metastases have developed at present [[32]3]. With the development of gene expression related analyses, target treatments could provide new insights for effective therapy to large extent and potentially improve patient survival [[33]6]. Besides, understanding the molecular characteristics and mechanisms of UM is critical for the creation of a treatment for this tumor. It has been demonstrated that intratumoral discordance in gene expression profile is associated with intratumoral heterogeneity based upon histopathologic features in UM [[34]7]. Furthermore, several gene signatures underlying UM have been uncovered, such as Gα[q] stimulatory subunit GNAQ and BAP1 [[35]8, [36]9]. However, mutated genes do not play roles individually and similar genes often work together to complete certain biological functions. What’s more, those correlated genes might be regulated by one microRNA (miRNA) whose signatures may be promising biomarkers for the classification or outcome prediction of large number of human cancers [[37]10]. Therefore, investigating miRNAs offers an excellent way to elucidate the complex pathological mechanisms underlying malignant tumors, and gives a hand to the design of drugs for treatments. In the present study, we proposed to predict targets of miRNAs in UM based on an ensemble method produced by Le et al. [[38]11]. It could solve the inconsistent results problem resulting from individual methods by including complementary results [[39]12]. Specifically, it merged a correlation method (Pearson correlation coefficient, PCC), a causal inference method (IDA) and a regression method (Lasso) utilizing the Borda count election method. Subsequently, the predicted miRNA targets were validated by matching them with the known confirmed databases. Ultimately, pathway enrichment analysis was conducted on target genes to identify target pathways for UM patients. The target genes and pathways might light a new lamp for revealing molecular mechanism of UM and give a hand for target treatments and preventions of this malignant tumor. 2. Materials and methods 2.1. Preparation of miRNA and mRNA data MiRNA and mRNA expression data for UM patients were downloaded from the Cancer Genome Atlas (TCGA) ([40]http://cancergenome.nih.gov/), respectively. Only 80 samples which were existed in both miRNA and mRNA expression data were reserved for the following analysis. Subsequently, the miRNAs or mRNAs with expression values = 0 were removed. Then the residual expression values were converted into log2 forms and normalized using a Global Variance Stabilizing Normalization (VSN) method [[41]13]. Consequently, 793 miRNAs and 19,511 mRNAs were obtained in the expression data. For purpose of making the data more confident and reliable, the PCC method was utilized to compute the correlations between miRNA and mRNA. If the absolute PCC value of a pair of miRNA and mRNA was more than 0.7, it would be remained. Finally, a total of 107 miRNAs and 904 mRNAs were obtained for subsequent analyses. Ethical approval The conducted research is not related to either human or animals use. 2.2. Prediction of miRNA targets Using the miRNA and mRNA data, the ensemble method which integrated three methods (PCC, IDA and Lasso) based on Borda count election method, was applied to predict miRNA targets for UM. This process was comprised of three steps: Firstly, the PCC, IDA and Lasso method was used to predict miRNA targets on the basis of miRNA and mRNA data, and then these miRNA targets were ranked, respectively. Only the top k (k = 100) ranked targets were left to perform the followed analysis. Secondly, Borda rank election method was employed to integrate top k ranks of each miRNA from PCC, IDA and Lasso method, and to produce a single ranking list of elected mRNAs with respect to the miRNA. Here, Borda rank election is a good approach to merge orderly appraising results from several separated methods [[42]14]. A z-score was assigned to the candidate across all voters through the average points. The higher the z-score was, the more significant the prediction results were. At last, we ranked the predicted miRNA targets according to their z-scores and obtain the top k ranked genes from the merged list as the final output, i.e. the potential target genes for the given miRNA of UM. 2.3. Validations of predicted miRNA targets To validate the feasibility and confidence of the predicted miRNA targets in UM patients, we compared our results with the union of four popular databases, miRTarbase v4.5 [[43]15], Tarbase v6.0 [[44]16], miRecords v2013 [[45]17] and miRWalk v2.0 [[46]18]. Briefly, miRTarbase provides the most current and comprehensive information of experimentally validated miRNA-mRNA target interactions [[47]19]. While TarBase is the first resource to provide experimentally verified miRNA target interactions by surveying pertinent literature [[48]20]. As for miRecords, it accumulates experimentally validated miRNA targets and computationally predictes miRNA targets [[49]17]. Last but not least, miRWalk is an available comprehensive resource that hosts the predicted as well as experimentally validated miRNA target interaction pairs [[50]18]. After removing the duplicated interactions, we could obtain a union of known interactions and referred them to confirmed interactions in the paper. If a miRNA target interaction was involved in confirmed interactions, we thought that the predicted miRNA target was validated. 2.4. Pathway enrichment analysis In order to investigate biological functions of miRNA targets enriched in the top k miRNA-mRNA interactions, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was carried out based on the Database for Annotation, Visualization, and Integrated Discovery (DAVID, [51]https://david.ncifcrf.gov/) tool [[52]21].Here, the KEGG database ([53]http://www.genome.jp/kegg/) is a collection of manually drawn pathway maps for metabolism, genetic information processing, environmental information processing [[54]22]. Besides, the Fisher’s exact test was employed to identify significant pathways between UM patients and normal controls [[55]23]. The threshold of significance was defined as P < 0.01 which were adjusted by false discovery rate (FDR) based on Benjamini & Hochberg method [[56]24]. 3. Results 3.1. Predicted miRNA targets In the current study, a total of 107 miRNAs and 904 mRNAs of UM were prepared from the TCGA database for the subsequent analyses. Based on these expression data, miRNA targets were predicted by PCC, IDA and Lasso method respectively, and the top 100 targets from the three individual methods were integrated by the Borda rank election method. For each miRNA, only its top 100 targets were computed. During this process, a z-score was calculated for each miRNA-mRNA interaction. All interactions were ordered in descending order of z-scores, and the top 50 interactions were regarded as seed miRNA-mRNA interactions for UM patients, as displayed in [57]Table 1. Table 1. Seed miRNA-mRNA interactions for UM patients ID miRNA mRNA z-score ID miRNA mRNA z-score 1 hsa-mir-203 ASPG 3204 26 hsa-mir-3166 LMAN1 873 2 hsa-mir-195 BSDC1 3179 27 hsa-mir-3612 MC2R 851 3 hsa-mir-3915 C4BPA 3007 28 hsa-mir-335 MEST 822 4 hsa-mir-30a C6orf155 2972 29 hsa-mir-155 MIR155HG 809 5 hsa-mir-1253 C6orf191 2748 30 hsa-mir-186 MKNK1 774 6 hsa-mir-511-2 CD209 2530 31 hsa-mir-92b MMP11 748 7 hsa-mir-150 CD96 2484 32 hsa-mir-501 NEDD9 729 8 hsa-mir-3927 DEFB109P1B 2218 33 hsa-mir-142 NLRP1 713 9 hsa-mir-1247 DIO3 2104 34 hsa-mir-708 ODZ4 710 10 hsa-mir-221 EXTL1 2007 35 hsa-mir-935 OGG1 705 11 hsa-mir-887 FBXL7 1986 36 hsa-mir-143 OR51E1 703 12 hsa-mir-504 FGF13 1863 37 hsa-mir-3200 OSBP2 701 13 hsa-mir-105-1 GABRA3 1853 38 hsa-mir-139 PDE2A 700 14 hsa-mir-1185-2 GPX5 1794 39 hsa-let-7b SEC22C 697 15 hsa-mir-1185-1 HECW1 1766 40 hsa-mir-383 SGCZ 693 16 hsa-mir-196b HOXA10 1735 41 hsa-mir-584 SH3TC2 689 17 hsa-mir-196a-1 HOXC10 1684 42 hsa-mir-134 SLIT3 682 18 hsa-mir-196a-2 HOXC11 1507 43 hsa-mir-181a-1 SORBS2 680 19 hsa-mir-10b HOXD8 1436 44 hsa-mir-513b TBC1D22B 679 20 hsa-mir-3614 ISG15 1332 45 hsa-mir-199a-1 TGFBI 679 21 hsa-mir-874 KLHL3 1105 46 hsa-mir-140 NFATC4 678 22 hsa-mir-2861 KRT39 1082 47 hsa-mir-24-2 PAIP2B 672 23 hsa-mir-511-1 LILRB5 973 48 hsa-mir-532 PCBP4 670 24 hsa-mir-618 LIN7A 927 49 hsa-mir-216b PDC 669 25 hsa-mir-873 LINGO2 904 50 hsa-mir-151 PYCRL 668 [58]Open in a new tab We found that among the 50 interactions, 10 of them had z-score > 2,000, especially 3 ones with z-score > 3,000, while the z-score of 12 interactions ranged from 1,000 to 2,000. In details, the pair of hsa-mir-203-ASPG obtained the highest z-score of 3,204. The other two interactions with z-score > 3,000 were hsa-mir-195-BSDC1 (z-score = 3,179), and hsa-mir-3915-C4BPA (z-score = 3,007). The followed two miRNA-mRNA interactions were hsa-mir-30a-C6orf155 (z-score = 2972), and hsa-mir-1253-C6orf191 (z-score = 2748). Interestingly, HOXA10 was regulated by two miRNAs (hsa-mir-196b and hsa-mir-196a-1) at the same time. 3.2. Validations of predicted miRNA targets With an attempt to validate miRNA targets predicted by the ensemble method, we took a comparison of our results with confirmed miRTarBase, Tarbase, miRecords and miRWalk database. In short, miRTarbasev4.5 contains 37,372 miRNA-mRNA interactions (covering 576 miRNAs). There were 20,095 interactions with 228 miRNAs in Tarbase v6.0. A total of 21,590 interactions representing 195 miRNAs were found in miRecords v2013. And miRWalk v2.0 covers 1,710 miRNA-mRNA interactions involved 226 miRNAs. By removing the duplicated interactions, we obtained total 62,858 confirmed interactions for validations.When comparing our predicted miRNA-mRNA interactions with confirmed interactions, 38 interactions were matched, which further indicated that our method was an available and valuable method for predicting miRNA targets. 3.3. Pathway enrichment analysis After prediction and validation for miRNA targets obtained from the ensemble method, we aimed to identify significant functional gene sets of miRNA targets. Due to the too large scale of miRNA targets, we selected genes enriched in the top 1, 000 ranked interactions which might be more important than the others for UM as study objects. Thus, KEGG pathway enrichment analysis was conducted on 601 targets in the top 1,000 miRNA-mRNA interaction based on the DAVID tool. When setting the cut-off as p-value < 0.05 (adjusted by Benjamini–Hochberg (BH) method), a total of 12 target pathways were detected ([59]Table 2). The top five significant pathways were Phototransduction (P = 1.85E-06), Chemokine signaling pathway (P = 4.36E-05), Ribosome (P = 7.13E-04), Phenylalanine metabolism (P = 2.25E-03), and Cytokine-cytokine receptor interaction (P = 5.02E-03). Particularly, Phototransduction was comprised of 9 targets including CNGB1, GNAT1, GNAT2, GNGT1, GUCA1A, GUCY2F, RCVRN, RHO and GUCA1C. Meanwhile, the Chemokine signaling pathway consisted of 21 targets (ADCY1, GNB3, GNGT1, HCK, ITK, PRKCD, CCL4, CCL5, CXCL11, VAV2, CXCL14, CXCR6, GNG13, RPL10A, RPL3, RPL11, RPL22, RPL35A, RPS8, RPS23 and RPS27A). Table 2. Target pathways in top 1000 miRNA-mRNA interactions ID Pathway miRNA targets P value 1 Phototransduction CNGB1;GNAT1;GNAT2;GNGT1;GUCA1A;GUCY2F;RCVRN;RHO;GUCA1C 1.85E-06 2 Chemokine signaling pathway ADCY1;GNB3;GNGT1;HCK;ITK;PRKCD;CCL4;CCL5;CXCL11;VAV2;CXCL14;CXCR6;GNG13 ; 4.36E-05 3 Ribosome RPL10A;RPL3;RPL11;RPL22;RPL35A;RPS8;RPS23;RPS27A 7.13E-04 4 Phenylalanine metabolism DDC;HPD;MAOB 2.25E-03 5 Cytokine-cytokine receptor interaction TNFRSF8;CSF2RB;CTF1;IL2RB;IL12RB1;LTB;NGFR;CCL4;CCL5;CXCL11;TNFRSF1B;CX CL14;CXCR6;TNFRSF19;RELT 5.02E-03 6 Long-term depression GRIA1;GRIA3;GRID2;GRM5;IGF1;RYR1 2.33E-02 7 Primary immunodeficiency LCK;PTPRC;TAP1;ZAP70 3.74E-02 8 Cell adhesion molecules (CAMs) HLA-F;PECAM1;PTPRC;SDC2;SIGLEC1;CNTNAP1;CADM1;CNTNAP2;CADM3 3.85E-02 9 Amyotrophic lateral sclerosis (ALS) DAXX;GRIA1;MAPK12;TNFRSF1B;DERL1 3.91E-02 10 Tyrosine metabolism DDC;HPD;MAOB;HEMK1 4.77E-02 11 Glycosaminoglycan biosynthesis - heparan sulfate / heparin EXT1;EXTL1;NDST4 4.79E-02 12 Neuroactive ligand-receptor interaction CHRNA3;CHRNA4;CHRNB3;EDNRB;GABRA1;GABRA3;GABRG2;GRIA1;GRIA3;GRID2;GRIK1 ;GRM5;HTR2B;MC2R 4.91E-02 [60]Open in a new tab The p-values have been corrected based on Benjamini & Hochberg method. P<0.01 was considered as the threshold of significance. 4. Discussion MiRNAs, a family of small non-coding RNA molecules, regulate expressions of genes by promoting mRNA degradation and repressing translation [[61]25]. Their roles and functions in tumors have attracted more and more attentions from researchers, and the possible inferences are that miRNA participate in cancer-related processes, including proliferation, metabolism, differentiation, apoptosis and even cancer development and progression [[62]26]. But there have been few studies to uncover miRNA targets in UM systemically. Hence, in this paper, we predicted target genes and pathways for UM patients based on the ensemble method that was an integration of PCC, IDA and Lasso methods. Briefly, PCC is the commonly used correlation method for the strength between a pair of variables [[63]27]. But it often leads to negative rank of miRNA-mRNA correlations due to down-regulation of miRNAs for mRNAs [[64]11]. In addition, the PCC would not be greatly reduced if the data were in the non-linear distribution [[65]28]. Meanwhile, IDA is a causal inference method that counts the causal effects between two variables [[66]29, [67]30]. And the miRNA-mRNA correlations predicted by the IDA method have parts of overlap with outcomes of the follow-up gene knockdown experiments [[68]31]. As for the Lasso, it minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients [[69]32]. Like the limitation of PCC method, the miRNA-mRNA pairs identified by Lasso have negative effects are ranked at the top of the ranking list to favor the down regulation. Moreover, the ensemble method captured confirmed interactions in the incomplete ground truth that existing individual methods fail to discover, although there is no complete ground truth of miRNA target prediction [[70]11]. Therefore, we employed Borda count election method to integrate the above three methods together, and obtained the ensemble method. Generally speaking, great challenges have been occurred on validating our predicted results, because the amount of experimentally confirmed miRNA targets is still limited and there is no complete authority for accessing and comparing different computational methods [[71]33]. Hence the feasibility of our predicted results has been validated by comparing them with confirmed interactions. Results of the ensemble method showed that hsa-mir-203-ASPG, hsa-mir-195-BSDC1 and hsa-mir-3915-C4BPA were the most important miRNA-mRNA interactions, and consequently ASPG, BSDC1 and C4BPA were more critical target genes for UM than the others predicted. However, there have still been no studies to investigate the regulatory mechanisms of hsa-mir-203-ASPG, hsa-mir-195-BSDC1, and hsa-mir-3915-C4BPA. miR-203 has been reported to be overexpressed in pancreatic adenocarcinoma cells [[72]34], while it also has been suggested as a tumor-inhibitory miRNA in hepatocellular carcinoma [[73]35]. The abnormal of miR-195 in many cancers has also been reported by many researchers. It increased in breast cancer and chronic lymphocytic leukemia while decreased in gastric cancer, hepatocellular carcinoma, colorectal carcinoma and bladder cancer[[74]36]. So far, study on miR-3915 was still limited. ASPG (asparaginase, also known as 60-kDa lysophospholipase) catalyzes the hydrolysis of L-asparagine to L-aspartate and ammonia [[75]37]. It is used for remission induction and intensification treatment in all pediatric regimens and in the majority of adult treatment protocols [[76]38]. C4BPA (complement component 4 binding protein alpha) a member of a super-family of proteins composed predominantly of tandemly arrayed short consensus repeats of approximately 60 amino acids [[77]39]. It had been reported that the C4BPA locus was a new susceptibility locus for venous thrombosis visa protein S regulation, opening a new research area focusing on C4BP regulatory pathway [[78]40]. It is the first time to uncover the relations between the target genes and UM, and further experimental validations would be finished as soon as possible. As mentioned above, KEGG pathway enrichment analysis for 601 target genes in top 1,000 miRNA-mRNA interactions were performed, and 12 target pathways with P < 0.05 were identified. Importantly, Phototransduction and Chemokine signaling pathway were the most ones for UM compared with normal controls. The definition for Phototransduction in KEGG pathway database is a biochemical process by which the photoreceptor cells generate electrical signals in response to captured photons. Aguila et al revealed that heat shock protein 90 inhibition on visual function are likely to relate to essential its client proteins in the phototransduction pathway in the retina and potentially elsewhere in the eye [[79]41]. Hence target pathway Phototransduction was related to UM tightly. In conclusion, we have successfully predicted miRNA target genes and pathways for UM patients based on the ensemble method. The findings in this study might shed new light on uncovering the molecular mechanism underlying UM, and provide potential target signatures for prevention and treatment of this tumor. Moreover, whether the predicted miRNA targets are indeed involved in the development of UM, need to be confirmed by experiments urgently. Acknowledgement