Abstract Somatic mutations in 3′-untranslated regions (3′UTR) do not alter amino acids and are considered to be silent in cancers. We found that such mutations can promote tumor progression by altering microRNA (miRNA) targeting efficiency and consequently affecting miRNA–mRNA interactions. We identified 67,159 somatic mutations located in the 3′UTRs of messenger RNAs (mRNAs) which can alter miRNA–mRNA interactions (functional somatic mutations, funcMutations), and 69.3% of these funcMutations (the degree of energy change > 12 kcal/mol) were identified to significantly promote loss of miRNA-mRNA binding. By integrating mRNA expression profiles of 21 cancer types, we found that the expression of target genes was positively correlated with the loss of absolute affinity level and negatively correlated with the gain of absolute affinity level. Functional enrichment analysis revealed that genes carrying funcMutations were significantly enriched in the MAPK and WNT signaling pathways, and analysis of regulatory modules identified eighteen miRNA modules involved with similar cellular functions. Our findings elucidate a complex relationship between miRNA, mRNA, and mutations, and suggest that 3′UTR mutations may play an important role in tumor development. Keywords: miRNA polymorphism, drug response alteration, drug target, functional somatic mutation, gene expression 1. Introduction MicroRNAs (miRNAs) are a class of small noncoding RNAs consisting of 19–25 nucleotides that play important roles in regulating mRNA expression at the post-transcriptional level. By binding to 3′-untranslated regions (3′UTR) of cytosolic messenger RNAs (mRNAs), miRNAs either reduce the translation or increase the degradation of transcripts [[48]1]. Cumulative studies have demonstrated the importance of miRNAs in targeting pivotal genes to promote tumor progression [[49]2,[50]3,[51]4]. Somatic mutations play crucial rules in tumor initiation and progression [[52]5,[53]6,[54]7]. A small number of mutations are identified to be “driver mutations” that yield a selective clonal growth advantage to cancer cells. Meanwhile, the large majority of mutations are “passenger mutations” which do not explicitly yield a growth advantage but may nonetheless provide useful information about the evolutionary forces within the cancer genome [[55]8]. Studies have demonstrated that the accumulation of somatic mutations influence the regulation of several important signaling pathways including the P53, TGF-β, and WNT signaling pathways [[56]9,[57]10]. Importantly, different mutational processes may generate different mutation types (or signatures) [[58]11]. For instance, the UV-induced mutation was strongly associated with C>T and CC>TT alterations [[59]12], and it was recently suggested that temozolomide (TMZ) treatment may also increase the C>T alterations in glioblastoma multiforme (GBM) patients [[60]13]. Non-synonymous gene mutations which alter the amino acids of protein products may result in significant functional changes and accelerate the progression of tumors. For example, somatic mutation at R132 of IDH1, a gene encoding NADP+-dependent isocitrate dehydrogenase 1, was found to be associated with early gliomagenesis [[61]14]. By changing the function of the enzyme to produce 2-hydroxyglutarate, the IDH1 mutation remodeled the environment to fuel tumor cell development [[62]15]. Synonymous mutations, meanwhile, do not alter amino acids and are largely considered to be “silent mutations”. Yet, Supek et al. present a compelling analysis that suggests synonymous mutations in cancer can be oncogenic by altering transcript splicing and thereby affecting protein function [[63]16]. Others have also implicated mutations in cancer [[64]17]. Akdeli et al. found a 3′UTR polymorphism that can influence PLK1 mRNA stability and may be a factor for response to PLK1 inhibitors [[65]18], suggesting that the mutations residing in the 3′UTR may alter miRNA binding efficiency and consequently trigger loss/gain of gene function. Cui et al. found 11 mutations in 3′UTR that alter miRNA target sites in cancer-related genes [[66]19]. A number of methods and tools have been developed for compiling the compendiums of functional genomic variations in miRNA target sites, including PolymiRTS [[67]20], Patrocles [[68]21], MicroSNiPer [[69]22], and dbSMR [[70]23]. SomamiR is a database for collecting somatic mutations in miRNAs and their target sites that potentially alter the interactions between RNAs [[71]24]. Systematic characterization of the functional implications of mutations in cancers remains to be done. In the present study we analyzed 744,270 3′UTR mutations among which we identified 67,159 mutations located in the 3′UTR of mRNAs that can alter miRNA–mRNA interactions, and 69.3% of these functional somatic mutations (funcMutations, the degree of energy change > 12 kcal/mol) were found to result in the loss of miRNA-mRNA binding. G>T mutations were observed with the highest proportion (about 22.9%) among funcMutations. The C and G wild-type mutation was prone to result in loss of miRNA binding while the A and T wild-type mutation may promote the ability for genes to bind with miRNAs. By integrating mRNA expression profiles of 21 cancer types, we found that the expression of target genes showed a positive correlation with the loss of absolute affinity level and a negative correlation with the gain of absolute affinity level. Functional enrichment analysis revealed that the funcMutations may regulate gene expression and affect cancer-related signaling pathways (e.g., WNT signaling pathway). Analysis of regulatory modules identified eighteen miRNA modules co-regulating similar cellular processes. Our findings suggest that 3′UTR mutations may perturb miRNA–mRNA interactions and consequently have important implications for tumor development. 2. Materials and Methods 2.1. Functional Somatic Mutation Identification Sequences of mature miRNA were acquired from miRBase version 21 [[72]25]. Somatic mutations located in the 3′UTRs were obtained from The Cancer Genome Atlas (TCGA, [73]https://portal.gdc.cancer.gov, annotated data) [[74]26]. Variants calling results (VCF files) of whole-exome target sequencing (WES) datasets of 10,429 tumors for 33 cancer types were obtained from TCGA which detected 1,648,474 potential single nucleotide variants in 3′UTRs. The 3′UTR position annotation analysis for mutation data were conducted by reference file derived from Ensembl ([75]https://www.ensembl.org/index.html, Version GRCh38.p10) [[76]27]. The sequences of mRNA 3′UTRs were obtained from the UCSC Genome browser (UCSC Genome Browser, [77]http://genome.ucsc.edu/, version: GRCh38, Dec. 2013) [[78]28]. The mRNAs with or without 3′UTR mutations were considered as the mutant or the wild-type mRNA, respectively. The sequences of mRNAs (either mutant or wild-type) were extended 30 bp on both upstream and downstream of variant position. Finally, these paired sequences were analyzed by Probability of Interaction by Target Accessibility (PITA) [[79]29] (with default parameters) to assess the change of free energy and predict potential miRNA target sites. We defined that miRNA and mRNA had a binding event if the absolute value of the PITA score > 10 kcal/mol (ΔΔG < −10). The somatic mutations changing the free energy between miRNA and mRNA sequences were defined as functional somatic mutations (funcMutations). According to the binding of miRNA to the mRNA 3′UTR, we assigned the potential funcMutations to one of the four classes: “complete gain”, the mRNA acquires a new miRNA binding site through the wild-type somatic mutation into variant-type somatic mutation; “complete loss”, mRNA loses a predicted miRNA binding site through the wild-type somatic mutation into variant-type somatic mutation; “partial gain”, mRNA acquires more stable miRNA binding site than that without the somatic mutation; “partial loss”, mRNA target site turns into unstable miRNA binding site with the somatic mutation. The degree of binding is quantified by the change of the PITA score, which was defined as follows: [MATH: absolute affinity=|Tvar< /mi>Twt< /mrow>| :MATH] where [MATH: Twt :MATH] represents the score of miRNA binding to the wild-type 3′UTR sequences and [MATH: Tvar :MATH] represents the score of miRNA binding to the variant-type 3′UTR sequences. Absolute affinity represents a strengthened miRNA regulation ability from wild-type allele to the variant-type. To analyze the position feature of somatic mutations in 3′UTR of genes, we calculated the distance between somatic mutations and the start position of 3′UTR of genes and the length of the localized 3′UTR of genes. Finally, we calculated the relative position of somatic mutations in 3′UTR. 2.2. Gene Expression Analysis and Tumor Hallmarks Enrichment Analysis mRNA expression data of multiple cancer types were obtained from TCGA (RNA-SeqV2). The expression profiles were normalized level 3 data across twenty-one kinds of cancer types (total 6078 samples), including bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), cholangiocarcinoma (CHOL), colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), kidney chromophobe (KICH), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pancreatic adenocarcinoma (PAAD), pheochromocytoma and paraganglioma (PCPG), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), thymoma (THYM), and uterine corpus endometrial carcinoma (UCEC). In order to reduce the influence of the batch effect of expressions across different cancer types, the gene expression value was adjusted by the average expression within the same cancer type. The ratio for each gene was denoted as the ratio of the gene expression in mutated sample to the mean of gene expression level in all samples. To interpret the influence of functional somatic mutations on tumors, we applied the affected genes (designated as tarGenes) to enrichment analysis, including subcellular localization (from The Human Protein Atlas, [80]https://www.proteinatlas.org/) [[81]30,[82]31] and KEGG pathway enrichment analysis. In addition, tumor hallmarks (50 gene sets) were available from GSEA dataset (Gene Set Enrichment Analysis, [83]http://software.broadinstitute.org/gsea/index.jsp) [[84]32,[85]33] and the relationship between tarGenes and cancer hallmarks were explored. The network was visualized by Cytoscape ([86]https://www.cytoscape.org) [[87]34]. 3. Results 3.1. Identification of Functional Somatic Mutations in 3′UTRs We obtained a total of 2048 sequences of miRNA from miRBase and a total of 25,539 effective 3′UTR sequences of transcripts of human protein-coding genes from the UCSC Genome browser. We integrated 744,270 somatic mutations located in 3′UTR from TCGA. Through mapping somatic mutations into 3′UTR of transcripts, 451,002 out of 744,270 somatic mutations were mapped into the 3′UTR sequences of transcripts obtained from the UCSC Genome browser. For each somatic mutation in human mRNA 3′UTR regions, we assessed whether its two alleles would cause different miRNA-mRNA target binding. Then, we obtained 451,002 pairs of the length of 61 bp of wild-type and variant-type mRNA sequences. After calculating the PITA score, 67,159 funcMutations were detected and these somatic mutations causing the change of 3,314,904 pairs of miRNA-mRNA binding, as shown in [88]Supplement Table S1. In addition, 9107 funcMutations were selected for subsequent analysis, which most significantly impacted 33,385 pairs of miRNA–mRNA interactions and the absolute affinity (degree of energy change) > 12 kcal/mol, as shown in [89]Supplement Figure S1. On the other hand, we integrated and compared the results with experimentally validated miRNA–mRNA interactions found in miRTarBase [[90]35]. We found that 54,244 of 3,314,904 pairs of miRNA–mRNA interactions were experimentally validated, as shown in [91]Supplement Table S2. 3.2. Position Feature and Base Substitution Characteristics of 3′UTR Mutations In the present study, all mutations located in the 3′UTRs were collected from TCGA. To explore the positional features of these somatic mutations, we conducted a statistical analysis of the positional distribution of somatic mutations in 3′UTR. Intriguingly, the position of somatic mutations concentrated on 5′ ends of 3′UTR, which is closer to the gene body, as shown in [92]Figure 1A. This finding may be caused by data bias as nearly all samples are WES data. The Illumina TruSeq Exome Enrichment Kit (Illumina, San Diego, CA, USA) is used by the TCGA project and covers 62 Mb DNA sites. Somatic mutations located in 3′UTR can be captured by WES since the enriched sequences include ~28 Mb of UTR. In addition, somatic mutations in TCGA cases were identified by Mutect2, which is a sensitive method to detect somatic mutations. Mutect2 will provide at least 80% power to detect mutations with an allelic fraction of 0.3 if the coverage of depth is more than 14-fold in tumors and 8-fold in normal samples [[93]36,[94]37]. In order to confirm whether the capture kit for WES can provide efficient power to detect mutations in 3′UTR, we analyzed the coverage of depth of 3′UTR in both tumor and normal samples. We found that 85.3% of 3′UTRs have more than 14-fold coverage in tumors, as shown in [95]Supplement Figure S2A, and 88.5% of 3′UTRs have more than 8-fold coverage in normal samples, as shown in [96]Supplement Figure S2B. The analysis demonstrates that the coverage of depth captured by the targeted sequencing kit utilized in the TCGA project provides an effective capability to detect mutations in 3′UTR. Figure 1. [97]Figure 1 [98]Open in a new tab (A) The distribution of somatic mutations in 3′-untranslated regions (3′UTR). Horizontal axis represents the relative position of somatic mutations in 3′UTR from 5′ to 3′. Vertical axis represents the density of somatic mutations located in 3′UTR. (B) Contributions of base substitutions in 3′UTR mutations. To assess the functional implications of different base substitutions located in the 3′UTR of genes, we analyzed the different base substitutions and their effect on binding status. We found that G>T mutations (~22.9%) occurred most frequently. Intriguingly, C and G wild-type mutations tended to result in loss of miRNA binding. Meanwhile, A and T wild-type mutations were more likely to result in gain of miRNA binding, as shown in [99]Figure 1B. Among the 30 mutational signatures collected in COSMIC (Catalogue of Somatic Mutations in Cancer, [100]https://cancer.sanger.ac.uk/cosmic/) [[101]38], signatures 11 and 23 exhibit strong transcriptional strand-bias for C>T mutation. For example, signature 11 has been found in melanomas and glioblastomas and exhibits a mutational pattern resembling that of alkylating agents. Patient histories have revealed an association between treatment with the alkylating agent TMZ and Signature 11 mutations [[102]13]. Accordingly, we speculate that some mutations can impact cancer development by altering miRNA targeting efficiency and thereby affecting miRNA-mRNA binding status. 3.3. The Impact of 3′-Untranslated Regions Mutations on Gene Expression The function of a gene depends on its expression and may be regulated by miRNAs. To further explore the implications of 3′UTR mutations for gene expression, we integrated mRNA expression across 21 cancer types (normalized by average expression) and analyzed their funcMutations. We focused on the genes having gain or loss of miRNA binding for which the frequency detected in cancer types was greater than two (common mutation) and the average absolute affinity was greater than 1 kcal/mol (explicit loss or gain status). As expected, we found that the expression of tarGenes exhibited a positive correlation with the loss of absolute affinity level, as shown in [103]Figure 2A. In contrast, in gaining status, expression of tarGenes significantly declined with the increase of absolute affinity level, as shown in [104]Figure 2B. These findings demonstrate the importance of funcMutations on the regulation of gene expression. Figure 2. [105]Figure 2 [106]Open in a new tab (A,B) Correlation between the expression of tarGene and miRNA (microRNA) binding absolute affinity (A: loss status, B: gain status). Vertical coordinate represents the ratio of the gene expression in mutated sample to the mean of gene expression level in all samples. Three colors represent different degrees of the absolute affinity level. Statistical testing was performed using the Wilcoxon rank-sum test. 3.4. 3′UTR Mutations Promote Tumor Progression To evaluate the influence of 3′UTR mutations in tumors, we performed enrichment analysis of the tarGenes by using subcellular localization and KEGG pathway enrichment analysis. Most of the organelles were enriched with a slight proportion (~0.3) and only five were found with a proportion ranging from 0.33 to 0.50. Moreover, Fisher’s exact test analysis revealed that tarGenes were significantly correlated with expression in the cytosol, vesicles, nucleoplasm, nucleus, mitochondria, and nucleoli (Benjamini–Hochberg adjustment, q-value < [MATH: 103 :MATH] ), as shown in [107]Figure 3A. The enrichment analysis based on the KEGG pathway indicated that tarGenes have a number of cancer-related functions including Pathways in cancer, Basal cell carcinoma, MAPK signaling pathway, Calcium signaling pathway, WNT signaling pathway, Ras signaling pathway, and Rap1 signaling pathway, as shown in [108]Figure 3B. Figure 3. [109]Figure 3 [110]Open in a new tab (A) Subcellular localization. Blue line denotes the hit proportion of tarGenes across subcellular gene sets. Red line signifies the odd ratio of tarGenes across subcellular gene sets. The bottom bar plot represents the log-transformed q-value (Benjamini–Hochberg adjustment) for enrichment analysis of each subcellular gene set. (B) KEGG pathway enrichment analysis, the false discovery rate <0.05. We also explored the relationship between tarGenes and cancer hallmarks. As expected, the number of tarGene mutations decreased with increasing number of hallmarks, as shown in [111]Figure 4A. By leveraging permutation analysis, we selected some of the tarGenes which exhibited large numbers of mutations relative to genes involved in an equal number of hallmarks, as shown in [112]Supplement Table S3, as we speculated they may be important for tumor progression. For example, PRX (permutation q-value < 0.001, mutations = 18) is a protein coding gene that contains PSD95 (post synaptic density protein) and DlgA (Drosophila disc large tumor suppressor) domains. The 3′UTR mutations of PRX generally resulted in a loss of binding affinity to miRNAs, as shown in [113]Figure 4B. The miRNAs may otherwise up-regulate PRX expression and thereby inhibit tumor progression. HIPK2 (permutation q-value = 0.001, mutations = 10), a tumor suppressor gene [[114]39], was found to be involved in two tumor hallmarks. The 3′UTR mutations in HIPK2 generally resulted in a gain of binding affinity to miRNAs, as shown in [115]Figure 4C. The miRNAs may down-regulate the expression of HIPK2 and thereby promote tumor progression. The 3′UTR mutations in PPARD (permutation p-value = 0.001, mutations = 10) were associated with loss of miRNA binding, as shown in [116]Figure 4D. The binding of miRNAs may otherwise inhibit PPARD expression as studies have demonstrated that PPARD expression promotes tumor progression and metastasis [[117]40]. These findings together indicate that mutations in the 3′UTR of tarGenes may have a complex impact on tumor progression by changing binding affinity of a spectrum of miRNAs. Figure 4. [118]Figure 4 [119]Open in a new tab (A) Number of hallmarks versus number of mutations across tarGenes. Each tarGene was subjected to permutation analysis for significance evaluation. (B–D) Differential analysis of affinity changes between loss and gain of 3′UTR mutation in PRX, HIPK2, and PPARD. Blue and red lines represent the absolute affinity change of loss and gain, respectively. 3.5. microRNA Modules Co-Regulate the Tumor Hallmarks Based on the analysis above, we next explored the involvement of miRNAs in co-regulating tumor hallmarks. In general, the 3′UTR mutations in cancers contributed to a loss of binding affinity (69.3% of the funcMutations were identified triggering the loss of miRNA-mRNA binding), as shown in [120]Figure 5A. To explore the function of changes in binding ability (designated as a loss and gain status), we applied the tumor hallmark enrichment analysis of the corresponding tarGenes to each of the miRNAs. In the loss status, a number of potential tumor suppressor miRNAs were free from mutated binding sites, including hsa-miR-4763-3p, and hsa-miR-328-5p [[121]41]. Meanwhile, tumor progression hallmarks that support the invasion and metastasis of tumors, including inflammatory response, DNA repair, and IL6/JAK/STAT3 signaling, were activated, as shown in [122]Figure 5B. A number of miRNAs (e.g., hsa-miR-6803-5p, etc.) that re-gained the tarGene binding site may inhibit pathways otherwise involved in tumor suppression (e.g., the P53 pathway) or activate pathways involved with oncogenesis (e.g., the IL6/JAK/STAT3 pathway), as shown in [123]Figure 5C. We investigated the latent correlation between the miRNAs that gained binding ability to new tarGenes (e.g., the number of tarGenes > 100). By conducting Fisher’s exact test (q-value < 0.01, odds ratio > 10) and MCODE cluster analysis (default parameter), we found eighteen modules consisting of 98 miRNAs, as shown in [124]Figure 5D and [125]Supplement Figure S3. For example, module3 was derived from the miRNAs having similar functions. Previous studies have demonstrated that the hsa-miR-331-3p [[126]42], hsa-miR-4486 [[127]43], hsa-miR-378g [[128]44], hsa-miR-6860 [[129]45], hsa-miR-615-3p [[130]46], and hsa-miR-5189-5p [[131]47] were involved in the proliferation, invasion, and metastasis of cancers. Herein we provide a new method for analysis of miRNA modules co-regulating tumor pathways, and our findings contribute to a better understanding of the implications of 3′UTR mutations in tumor biology. Figure 5. [132]Figure 5 [133]Open in a new tab (A) Influence of miRNAs on genes. Red bar represents the number of genes bound by miRNA which re-gained binding sites. Blue bar represents the number of genes that were free from miRNA due to the loss of binding sites; (B) Connections between tumor hallmarks and miRNAs in the loss status; (C) Connections between tumor hallmarks and miRNAs in the gain status; (D) The module of miRNAs in gain status. 4. Discussion Mutations of 3′UTR may affect tumor progression. In this article, we analyzed the mutation position feature, base substitution characteristics, tumor progression, and gene regulation to explore a broad landscape of miRNA–mRNA interactions. Theoretically, the greater change of free energy between miRNA and mRNA sequences indicates an increasing or decreasing impact of somatic mutations on miRNA-mRNA binding which may, in turn, result in a negative or positive regulation of mRNA expression. Most miRNAs primarily target the 3′UTR regions of mRNAs, and accordingly, the present study focused on that aspect in an effort to promote confidence of the analysis results. Future studies will integrate more functional somatic mutations in different regions of both miRNA and their target sites to further explore the influence of mutations on miRNA-mediated gene regulation. To explore how mutations may affect tumor progression, we assessed the position feature and base substitution characteristics of 3′UTR mutations. We found that some 3′UTR mutations alter miRNA targeting efficiency and thereby can impact tumor development. By analyzing the position feature, however, we found that sequencing technology may introduce data bias. In future studies, we need unbiased whole genome sequencing data to confirm our findings. To further evaluate the influence of mutations in tumors, we next integrated subcellular localization and KEGG pathway enrichment analysis to functionally explore the genes with 3′UTR somatic mutations. The analysis revealed significant enrichment of a number of cancer-related processes including MAPK and WNT signaling pathways. In addition, we identified that regulation of some of the tarGenes (e.g., PRX, HIPK2, and PPARD) may be important for tumor progression. Overall, the differential analysis demonstrated that the 3′UTR mutations in cancers generally caused an increasing loss of miRNA binding ability. The miRNAs with either loss or gain status appeared to regulate a series of tarGenes to activate or deactivate the tumor-related pathways in a parallel fashion which, together, may result in coordinated and stable effects in tumor development. Specifically, we identified eighteen miRNA modules co-regulating similar functions. We also found that the expression of tarGenes positively correlated with the loss of absolute affinity level and negatively correlated with the gain of absolute affinity level in 21 cancer types. Our findings together demonstrate that 3′UTR mutations may play an important role in tumor development and the phenomenon should be further investigated. Acknowledgments