Abstract
Background
While microRNAs (miRNAs) were widely considered to repress target genes
at mRNA and/or protein levels, emerging evidence from in vitro
experiments has shown that miRNAs can also activate gene expression in
particular contexts. However, this counterintuitive observation has
rarely been reported or interpreted in in vivo conditions.
Methods
We systematically explored the positive correlation between miRNA and
gene expression and its potential implications in tumorigenesis, based
on 8375 patient samples across 31 major human cancers from The Cancer
Genome Atlas (TCGA).
Findings
We found that positive miRNA-gene correlations are surprisingly
prevalent and consistent across cancer types, and show patterns
distinct from those of negative correlations. The top-ranked positive
correlations are significantly involved in the immune cell
differentiation and cell membrane signaling related processes, and
display strong power in stratifying patients in terms of survival rate.
Although intragenic miRNAs generally tend to co-express with their host
genes, a substantial portion of miRNAs show no obvious correlation
with their host gene, plausibly due to non-conservation. A miRNA can
upregulate a gene by inhibiting its upstream suppressor, or can share
transcription factors with that gene; both lead to positive
correlation. The miRNA/gene sites associated with the top-ranked
positive correlations are more likely to form super-enhancers than
randomly chosen pairs. Wet-lab experiments revealed that positive
correlations are only partially retained under in vitro conditions.
Interpretation
Our study brings new insights into the critical role of miRNA in gene
regulation and the complex mechanisms underlying miRNA functions, and
reveals both biological and clinical significance of miRNA-associated
gene activation.
Keywords: Pan-cancer miRNA, miRNA activation, Intragenic miRNA,
Super-enhancer
Abbreviations: miRNA, microRNA; 3′UTR, 3′-Untranslated region; AGO,
Argonaute protein; miRISC, miRNA-induced silencing complex; microRNPs,
micro-ribonucleoprotein; SE, super-enhancer; GO, gene ontology; KEGG,
Kyoto encyclopedia of genes and genomes; TCGA, The Cancer Genome Atlas;
ENCODE, encyclopedia of DNA elements; qRT-PCR, quantitative real-time
PCR
__________________________________________________________________
Research in context.
Evidence before this study
miRNAs have long been known to repress target genes at either the mRNA
or the protein level, thereby keeping gene expression in a cell at
a physiologically favorable balance. Recent evidence from in vitro
studies indicates that miRNAs can also promote gene expression in
particular conditions. However, there is a lack of investigation of
this counterintuitive phenomenon in clinical samples. In addition, its
implications in tumorigenesis, and the general mechanisms underlying
this phenotype have not been systematically explored.
Added value of this study
By integrative analysis of the miRNA and gene expression profiles of
8375 patient samples across 31 major human cancers from The Cancer
Genome Atlas (TCGA), we showed for the first time that miRNA-associated
gene activation, rather than suppression, is surprisingly prevalent and
conserved across multiple cancer types. We further confirmed the
biological and clinical significance of the positive correlations in
terms of the essential biological processes they participate in and the
cancer hallmarks they present. Additionally, we proposed and explored
four potential molecular mechanisms that can explain the observed gene
upregulation associated with miRNA.
Implications of all the available evidence
The present study established a striking phenomenon regarding
miRNA-associated gene activation in human cancer samples, corroborated
its biological and clinical meaning, and partially explained the
underlying molecular basis. Our work sheds new light on the complex
miRNA-mRNA interaction and its implications in tumorigenesis.
1. Introduction
Cancer is caused by uncontrolled cell growth reflective of multiple
established hallmarks [1]. Underlying the aberrant cell proliferation
is the activation of critical oncogenes and the inactivation of tumor
suppressor genes, resulting from multiple genetic and epigenetic
alterations in a cancer tissue-specific manner [[2], [3], [4], [5],
[6]]. Among the factors involved, microRNAs (miRNAs) are a class of
small (~22 nucleotides), non-protein-coding RNAs known as important
post-transcriptional regulators of gene expression [7,8]. miRNAs exert
regulatory functions by base-pairing with complementary sequences,
typically in the 3′-untranslated region (3′UTR) of mRNAs, to target
them for degradation or prevent their translation [7,9]. It is
estimated that >60% of human protein-coding genes are under selective
pressure to maintain pairing to miRNAs and over one third of human
genes appear to be conserved miRNA targets [10,11]. This indicates that
miRNAs can influence almost every critical signaling pathway in a cell
[12].
A canonical miRNA is transcribed from a miRNA gene by RNA polymerase II
(pol II) as a primary miRNA (pri-miRNA). The pri-miRNA transcript is
first cleaved in the nucleus by the microprocessor, which contains a
nuclear RNase III called Drosha and its cofactor DGCR8, into a
hairpin-structured precursor miRNA (pre-miRNA). The pre-miRNA is then
exported to the cytoplasm through the action of Exportin 5 and RAN-GTP,
and further processed by the endonuclease Dicer to generate the miRNA
duplex, which contains the miRNA paired to its passenger strand, usually
called miRNA* (miRNA star). One strand of the miRNA duplex, the mature
miRNA, is loaded into an Argonaute protein (AGO) to form a
miRNA-induced silencing complex (miRISC), whereas the other strand, the
miRNA*, is degraded. Once loaded into the silencing complex, the miRNA
pairs to complementary sites within mRNAs or other transcripts and the
Argonaute protein exerts the posttranscriptional repression [13].
Considering that ~2000 miRNA genes have been identified in the human
genome [14], that a miRNA can target hundreds or even thousands of
different mRNAs, and that an individual mRNA might be influenced by
multiple miRNAs [15], the miRNA biogenesis pathway plays an essential
role in shaping gene regulatory networks. miRNA biogenesis is therefore
tightly maintained in a favorable balance under physiological
conditions, but can be severely impaired in cancer, resulting in
differential expression of critical miRNAs compared to normal tissues
[[16], [17], [18]].
Based on the canonical miRNA biogenesis pathway and mechanism of
function, miRNAs have long been believed to elicit their effects
exclusively through mRNA degradation and/or translation inhibition.
Recently, evidence has accumulated to suggest that miRNAs can also
promote gene expression in particular contexts via various mechanisms.
For instance, under cell cycle arrest, the human miRNA miR-369-3 was
reported to activate tumor necrosis factor-α (TNFα) translation by
directing micro-ribonucleoproteins (microRNPs) including AGO and FXR1
to a special type of miRNA binding site called AU-rich elements (AREs)
in the 3′UTR of TNFα, although it was also acknowledged that this
translational activation can be switched to repression in proliferating
cells [19]. In most cases, however, miRNAs were reported to activate
gene expression by binding to the promoter region, followed by
recruitment of transcription factors to the miRNA binding sites.
Striking examples include: miR-373 activates E-cadherin (CDH1) and
cold-shock domain-containing protein C2 (CSDC2) by binding to their
promoter regions [20]; miR-205 induces the expression of the tumor
suppressor genes interleukin 24 (IL24) and IL32 by targeting specific
sites in their promoters [21]; miR-744/1186/466d-3p induces Ccnb1
expression in mouse cell lines by targeting promoter elements [22];
miR-589 binds the promoter RNA and activates cyclooxygenase-2 (COX-2)
transcription [23]; and let-7i binds to the TATA-box of IL2 and
activates it at both the mRNA and protein levels [24]. Recently, a new
mechanism of miRNA activation has been proposed, showing that
miR-24-1 activates the FBP1 and FANCC genes by targeting their
enhancers [25]. In addition, these studies consistently showed that
proteins related to miRNA biogenesis or function, such as AGO and
Dicer, or transcription-related enzymes are significantly enriched
around the binding sites during gene activation. Collectively,
these previous findings, mainly observed in vitro, established that
miRNAs can also upregulate gene expression by directly binding to
transcriptional regulatory regions.
While miRNA-mediated gene activation has been well studied in vitro
and in animal experiments, it has rarely been explored in human
samples. To address this gap, we leveraged The Cancer Genome Atlas
(TCGA) and conducted an unprecedentedly comprehensive analysis of the
miRNA-gene interaction profiles in 8375 patient samples across 31 major
human cancer types. We examined the correlations between all 1046
miRNAs and 20,531 genes annotated in TCGA, and found that positive
miRNA-gene correlation is surprisingly prevalent and consistent across
human cancers, even when the gene bears conserved binding sites for the
miRNA. Moreover, the positive correlations display patterns distinct
from those of the negative correlations. We performed a series of
stringent bioinformatics analyses to investigate whether this positive
correlation has any biological or clinical implication, especially in
the context of human oncogenesis. We found that the miRNA-gene pairs
with positive correlation are extensively involved in many biological
processes pertaining to immune response, cell membrane signaling, cell
cycle control and other cancer hallmarks. In addition, the top-ranked
miRNAs and genes can stratify patients in terms of overall survival
rate based on their single or combined expression levels. These
results support the biological and clinical significance of the
widespread positive correlations between miRNAs and genes across human
cancers.
We further investigated the molecular mechanisms underlying the
observed miRNA-gene positive correlation. Most of the positive
correlations (~87%) can be explained by one or more of our four
proposed indirect-regulation hypotheses: miRNA-host gene
co-expression, gene activation by inhibition of an upstream suppressor,
co-regulation by shared transcription factors, and co-activation by
common histone modifications. Because co-expression driven by shared
genetic or epigenetic factors cannot be viewed as a causative
relationship, our study stresses that although some positive
correlations are implicated in tumorigenesis, the expression levels of
the miRNA and the gene in such pairs are independent of each other. On
the other hand, the mechanisms of miRNA-host gene co-expression and
indirect activation of a gene by inhibition of its upstream suppressor
involve causation, in the sense that the expression level of one
influences that of the other. We further hypothesized that at least
part of the remaining positive correlations (~13%) can be explained by
direct binding of the miRNA to particular transcriptional regulatory
regions of the partner gene, as reviewed above. Our wet-lab experiments
in the corresponding cancer cell lines only partially recapitulated the
positive correlations observed in human patient samples, implying that
miRNA-gene interactions differ dramatically between in vitro and in
vivo conditions. This in turn highlights the significance of a
comprehensive analysis of miRNA-directed gene activation in human
samples.
2. Materials and methods
2.1. TCGA data acquisition, quality control and preprocessing
The miRNA and gene expression data, as well as the clinical information
for each cancer, were downloaded from the TCGA data portal
(https://portal.gdc.cancer.gov/). Three TCGA projects, FPPP
(FFPE Pilot Phase II), GBM (glioblastoma multiforme) and LAML (acute
myeloid leukemia), were excluded due to small sample size or platform
inconsistency. The pan-cancer analyses eventually consisted of 31
cancer types: ACC, BLCA, BRCA, CESC, CHOL, COAD, DLBC, ESCA, HNSC,
KICH, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, MESO, OV, PAAD, PCPG, PRAD,
READ, SARC, SKCM, STAD, TGCT, THCA, THYM, UCEC, UCS and UVM. Sample
size ranged from 36 (CHOL) to 1102 (BRCA) (see Table S1). A total of
1046 miRNAs and 20,531 genes (protein coding and noncoding) were
included in the TCGA IlluminaHiSeq miRNASeq and IlluminaHiSeq RNASeqV2
data, respectively. We used level 3 expression data for both miRNA and
gene. miRNA expression was quantified as the count of reads aligning to
the corresponding precursor, while gene expression was derived from the
reads per kilobase of transcript per million reads mapped (RPKM). The
miRNA and gene expression values were logarithmically transformed (base
2) prior to further analysis. Pearson correlation analysis was
performed to assess the co-expression between miRNAs and genes across
all 31 cancer types. A correlation was deemed significant in a cancer
if its absolute correlation coefficient |R| > 0.1 (R > 0.1 for positive
correlation and R < −0.1 for negative correlation) and Hochberg
adjusted p-value adj.P < .05 unless otherwise stated.
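The correlation screen described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pseudocount of 1 in the log2 transform, the function names and the toy values are hypothetical and do not come from the study, and the p-values are taken as given rather than computed from the data.

```python
import math

def log2_transform(values, pseudocount=1.0):
    """Log2-transform expression values (the pseudocount is an assumption)."""
    return [math.log2(v + pseudocount) for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def hochberg_adjust(pvals):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    order = sorted(range(len(pvals)), key=lambda k: pvals[k], reverse=True)
    adjusted = [0.0] * len(pvals)
    running_min = 1.0  # caps adjusted values at 1
    for rank, idx in enumerate(order, start=1):
        # the largest p is multiplied by 1, the second largest by 2, ...
        running_min = min(running_min, rank * pvals[idx])
        adjusted[idx] = running_min
    return adjusted

def is_significant(r, adj_p, r_cut=0.1, p_cut=0.05):
    """Thresholds from the text: |R| > 0.1 and adjusted P < .05."""
    return abs(r) > r_cut and adj_p < p_cut

# Toy data (hypothetical): one miRNA and one gene measured in five samples.
mirna = log2_transform([3, 7, 15, 31, 63])
gene = log2_transform([1, 3, 7, 15, 31])
r = pearson_r(mirna, gene)
adj = hochberg_adjust([0.01, 0.04, 0.03, 0.005])
```

In a real screen the adjustment would be applied to the p-values of all tested miRNA-gene pairs within a cancer type, which is why the adjustment function takes the whole list at once.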
2.2. CCLE data curation and correlation analysis
We downloaded cell line data for gene and miRNA expression from the
Cancer Cell Line Encyclopedia (CCLE) database [26] for verification
of the TCGA results. In summary, the miRNA data involve 654 human
miRNAs and 954 cell lines spanning 25 cancer types, and the RNA-seq
data contain 18,361 genes and 1019 cell lines across 26 cancer types.
We calculated the miRNA-gene correlation for each pair in each of the
25 overlapping cancers. Because the sample size for each cancer type is
relatively small compared to TCGA, at this targeted validation step we
imposed a looser p-value cut-off (P < .01) for significance in CCLE.
To compensate for this, we used the more conservative Spearman's rank
correlation instead of Pearson correlation.
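For readers unfamiliar with the difference, Spearman's rank correlation is simply the Pearson correlation computed on ranks, which makes it insensitive to outliers and to monotone transformations. A minimal standard-library sketch follows; the function names and toy inputs are illustrative assumptions and not part of the study's pipeline.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotone but non-linear relationship still gives rho = 1.
rho_monotone = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
tied_ranks = average_ranks([10, 20, 20, 30])
```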
2.3. Functional characterization of positive correlation associated miRNAs
and genes
To explore the biological significance of the top-ranked pairs, we
conducted gene ontology (GO) and KEGG [27] signaling pathway
enrichment analyses on the genes involved in the top-ranked significant
pairs, using the R package clusterProfiler [28]. Enrichment
profiles were examined for genes at different pan-levels
(PanCan10/15/20/25) separately, with a Fisher exact test p-value < .05
deemed significant. To further investigate the clinical relevance of
these pairs, we performed survival analysis on patient groups defined
by their miRNA and gene expression profiles. Briefly, patients were
first divided into non-overlapping groups in three ways: 1) into three
groups, high-middle-low (Hi-Mi-Lo), by miRNA expression; 2) into three
groups, high-middle-low (Hi-Mi-Lo), by gene expression; and 3) into two
groups, high-low (Hi-Lo), by gene:miRNA ratio. We then used the
log-rank test to compare the overall survival probability of patients
from different groups, categorized by single or combined stratification
indexes. For example, for a particular miRNA~gene pair, the HiHi+Hi
group refers to patients in the high miRNA expression, high gene
expression and high gene:miRNA ratio groups. The stratification power
in patient survival of miRNAs/genes from different pan-level groups was
compared by the Kolmogorov-Smirnov test (K-S test) performed on the
density distribution of their log-rank test p-values.
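The grouping scheme above can be illustrated with a short Python sketch. The text does not state the cut-offs used, so tertiles for the Hi-Mi-Lo split and the median for the Hi-Lo split are assumptions here, as are the function names and the toy values. The log-rank comparison itself is not shown.

```python
import statistics

def tertile_labels(values):
    """Label each value Lo/Mi/Hi using tertile cut-offs (an assumption)."""
    ranked = sorted(values)
    n = len(ranked)
    lo_cut, hi_cut = ranked[n // 3], ranked[(2 * n) // 3]
    return ["Lo" if v < lo_cut else "Hi" if v >= hi_cut else "Mi"
            for v in values]

def median_labels(values):
    """Label each value Hi/Lo relative to the median (an assumption)."""
    mid = statistics.median(values)
    return ["Hi" if v > mid else "Lo" for v in values]

def combined_labels(mirna, gene):
    """Build labels such as 'HiHi+Hi': miRNA level, gene level, ratio level."""
    ratio = [g / m for g, m in zip(gene, mirna)]
    return [a + b + "+" + c
            for a, b, c in zip(tertile_labels(mirna),
                               tertile_labels(gene),
                               median_labels(ratio))]

# Toy expression values (hypothetical) for six patients.
mirna = [1, 2, 3, 4, 5, 6]
gene = [1, 4, 3, 8, 5, 12]
labels = combined_labels(mirna, gene)
```

Patients sharing a label form one stratum, and the survival curves of the strata would then be compared with a log-rank test from a survival analysis package.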
To further investigate the implications of the detected positive
correlations in tumorigenesis, we associated the top-ranked miRNAs and
genes with well-known cancer hallmark traits. Briefly, we checked the
association of the PanCan10 miRNAs with the 10 cancer hallmarks
proposed by Hanahan and Weinberg in 2011 [1], as done in a
previous study [29]. We also examined the enrichment of the
top-ranked genes of different pan-levels in 50 well-established
hallmark gene sets related to critical biological processes regarding
cellular component, development, DNA damage, immune, metabolic,
pathway, proliferation and signaling [30], which are presumably
highly relevant to cancer initiation and progression.
2.4. Identification of miRNAs with host genes
We downloaded the gene annotations (hg38) of 27,423 genes (including
19,902 protein-coding genes and 7521 long non-coding RNAs, lncRNAs)
from GENCODE (GTF v25) [31], and annotations of 1881 human miRNAs
from miRBase (hsa.gff3) [14]. We determined whether a miRNA is
embedded in another gene (protein-coding or noncoding RNA) according to
their genomic coordinates (locations) in the annotation files. A
total of 591 miRNAs were found to be located inside a specific gene,
termed the "host" gene. Of these, 451 pairs were covered in the TCGA
data. The 591 host genes consisted of 474 protein-coding genes and 117
lncRNAs (see Table S4). We investigated the preference of a miRNA to
co-express with its host gene by a hypergeometric test. The
conservation information for each miRNA was taken from TargetScanHuman
v7.2 (miR_Family_Info.txt). A positive conservation score
(Conservation? = 1, 2) of a miRNA indicates conservation, while a
negative score (−1) indicates non-conservation; the remaining miRNAs,
with a zero score, were ignored (Table S4).
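A coordinate-based host search and a hypergeometric tail probability can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the decision to ignore strand, the gene and miRNA names, and the mapping of the test's parameters onto the study's pair counts are all assumptions.

```python
from math import comb

def find_hosts(mirnas, genes):
    """Map each miRNA to the genes on the same chromosome that fully
    contain it. Records are (name, chrom, start, end); strand is
    ignored here, which is an assumption."""
    hosts = {}
    for m_name, m_chrom, m_start, m_end in mirnas:
        for g_name, g_chrom, g_start, g_end in genes:
            if m_chrom == g_chrom and g_start <= m_start and m_end <= g_end:
                hosts.setdefault(m_name, []).append(g_name)
    return hosts

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from a
    population of N items, K of which are 'successes'."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total

# Toy annotation records (hypothetical names and coordinates).
mirnas = [("mir-A", "chr1", 150, 230), ("mir-B", "chr2", 500, 580)]
genes = [("GENE1", "chr1", 100, 1000), ("GENE2", "chr2", 600, 900)]
hosts = find_hosts(mirnas, genes)
p_tail = hypergeom_upper_tail(5, 10, 5, 5)
```

One plausible reading of the test is that N is the number of all miRNA-gene pairs, K the number of positively correlated pairs, n the number of miRNA-host pairs and k the number of positively correlated miRNA-host pairs; the text does not spell this out.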
2.5. Detection of double-negative patterns underlying positive correlation
To validate the hypothesis that a miRNA can upregulate a gene by
inhibiting its upstream suppressor, we attempted to detect
double-negative patterns. For a significant positive miRNA-gene pair,
we first detected all the intermediate (IM) genes that negatively
correlate with both the miRNA and the gene in the pair across multiple
cancers (R < −0.1, adj.P < .05, cancer coverage ≥ 5). We then narrowed
down the intermediate genes to predicted targets of the miRNA based on
TargetScanHuman v7.2 [15]. We downloaded the predicted targets
(TAR) with context++ and weighted context++ scores and applied a
series of preprocessing steps, including extraction of human records,
parsing/trimming of miRNA names and removal of duplicates. This yielded
198,312 high-confidence targeting records (i.e., with a positive
probability of conserved targeting, P_CT), involving 321
different miRNAs and 13,035 genes.
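The double-negative filter amounts to two set intersections, as the following sketch shows. The data structures (a dictionary of negative-correlation cancer coverage and a dictionary of predicted targets) and all names in the toy example are hypothetical and chosen only for illustration.

```python
def negative_partners(coverage, node, min_coverage=5):
    """Genes negatively correlated with `node` in at least `min_coverage`
    cancers. `coverage` maps (node, gene) to the number of cancers in
    which R < -0.1 and adj.P < .05 (assumed to be precomputed)."""
    return {g for (n, g), c in coverage.items()
            if n == node and c >= min_coverage}

def double_negative_intermediates(coverage, mirna, gene, predicted_targets):
    """Intermediates negatively correlated with both the miRNA and the
    gene, restricted to predicted targets of the miRNA."""
    shared = negative_partners(coverage, mirna) & negative_partners(coverage, gene)
    return shared & predicted_targets.get(mirna, set())

# Toy inputs (hypothetical names and counts).
coverage = {
    ("miR-A", "SUP1"): 8, ("miR-A", "SUP2"): 6, ("miR-A", "SUP3"): 5,
    ("GENE_X", "SUP1"): 7, ("GENE_X", "SUP2"): 3, ("GENE_X", "SUP3"): 9,
}
predicted_targets = {"miR-A": {"SUP1", "OTHER"}}
candidates = double_negative_intermediates(
    coverage, "miR-A", "GENE_X", predicted_targets)
```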
2.6. Detection of double-positive patterns underlying positive correlation
To validate the hypothesis that a positively correlated miRNA-gene pair
might be regulated by shared transcription activators, we tried to
detect double-positive patterns. For a significant positive
miRNA-gene pair, we first detected all the intermediate (IM) genes that
positively correlate with both the miRNA and the gene in the pair
across multiple cancers (R > 0.1, adj.P < .05, cancer coverage ≥ 5). We
then narrowed down the genes in two steps. First, we restricted the IMs
to general transcription factors (gTF). We downloaded transcription
factors (TF) and their targets with the R data package tftargets
(https://github.com/slowkow/tftargets). This dataset includes human
TF information curated from six published databases: TRED [32],
ITFP [33], ENCODE [34], Neph2012 [35], TRRUST [36] and
Marbach2016 [37]. We mapped the Entrez gene IDs to gene symbols
using two R packages: annotate (10.18129/B9.bioc.annotate) and
org.Hs.eg.db (10.18129/B9.bioc.org.Hs.eg.db). After integration
and removal of duplicates, we obtained 2705 gTFs with their target
genes, based on which we removed the intermediate genes of each pair
that were not included in the gTF set. In the second step, we further
narrowed down the gTFs to specific transcription factors (sTF) by
removing gTFs whose target genes do not include the gene in the
miRNA-gene pair under investigation.
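The two-step narrowing from intermediates to gTFs and then to sTFs can be sketched as below. The dictionary layout of the TF-target table and the names in the toy example are assumptions for illustration and do not reflect the actual format of the tftargets package.

```python
def narrow_to_transcription_factors(intermediates, tf_targets, gene):
    """Return (gTF, sTF) for one miRNA-gene pair.

    intermediates: genes positively correlated with both miRNA and gene
    tf_targets:    maps each transcription factor to its set of targets
    gene:          the gene of the miRNA-gene pair under investigation
    """
    # Step 1: keep intermediates that are known transcription factors.
    gtf = {g for g in intermediates if g in tf_targets}
    # Step 2: keep those whose annotated targets include the gene.
    stf = {tf for tf in gtf if gene in tf_targets[tf]}
    return gtf, stf

# Toy inputs (hypothetical names).
intermediates = {"TF1", "TF2", "GENE_Z"}
tf_targets = {
    "TF1": {"GENE_X", "GENE_Q"},
    "TF2": {"GENE_Q"},
    "TF3": {"GENE_X"},
}
gtf, stf = narrow_to_transcription_factors(intermediates, tf_targets, "GENE_X")
```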
2.7. Super-enhancer (SE) identification from ENCODE H3K27ac ChIP-seq data
We downloaded the H3K27ac ChIP-seq data (bed files for narrowPeak) of
64 samples from 15 human tissues from ENCODE (blood, lung, liver,
kidney, brain, large intestine, stomach, pancreas, esophagus, prostate
gland, adrenal gland, breast, ovary, thymus and urinary bladder
tissues; see Table S5) [34]. To detect the super-enhancer formation
profile, we investigated the H3K27ac signal surrounding a miRNA or gene
site based on the ROSE (Rank Ordering of Super-enhancers) pipeline [38]
with minor modifications. Briefly, we first stitched the detected peaks
together if they were within a certain distance (forming the "region"),
and then calculated the average input-subtracted H3K27ac signal
intensity within the region by a revised ROSE score:
$$\mathrm{ROSE}_{score} = \frac{1}{L} \sum_{i=1}^{N} \mathrm{width}_{peak_i} \times \mathrm{intensity}_{peak_i}$$
In this formula, N is the number of peaks detected in the region, and L
is the length (in bp) of the region in question. In this study, L =
100 kbp, centered at the transcription start site (TSS) of each miRNA
or gene. The width and intensity of each peak (peak_i) were obtained
from its genomic coordinates (width = chromEnd - chromStart) and signal
intensity (signalValue), respectively (see the ENCODE narrowPeak format
description). We employed the scaled peak intensity ("score" in the bed
file) instead of signalValue for better visualization. As in the
original ROSE pipeline, we adopted a promoter exclusion zone of 4 kbp,
i.e., if a peak was entirely contained within a window of ±2 kbp
around the TSS, the peak was excluded from the calculation. It should
be noted that, in contrast to the original ROSE pipeline, we removed
the limit of 12.5 kbp on the maximum distance between two constituent
enhancers, to focus on the general H3K27ac intensity surrounding the
miRNA/gene TSS. Under our framework, a region with a ROSE score ≥ 10
in at least 11 out of 15 tissues was considered a super-enhancer (SE).
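The revised score can be made concrete with a small Python sketch that applies the formula to a list of peaks around one TSS. It is a simplified illustration under assumptions: peaks are given as (start, end, intensity) tuples, only peaks lying entirely inside the 100 kbp window are counted, and the function names and toy coordinates are hypothetical.

```python
def rose_score(peaks, tss, region_len=100_000, exclusion_half_width=2_000):
    """Revised ROSE score: sum of width x intensity over peaks in the
    window of length `region_len` centered at `tss`, divided by
    `region_len`. Peaks entirely inside the promoter exclusion zone
    (tss +/- exclusion_half_width) are skipped."""
    half = region_len // 2
    region_start, region_end = tss - half, tss + half
    total = 0.0
    for start, end, intensity in peaks:
        if start < region_start or end > region_end:
            continue  # not fully inside the window (handling is an assumption)
        if start >= tss - exclusion_half_width and end <= tss + exclusion_half_width:
            continue  # promoter exclusion zone
        total += (end - start) * intensity
    return total / region_len

def is_super_enhancer(scores_by_tissue, score_cut=10, min_tissues=11):
    """Call an SE when the score reaches the cut-off in enough tissues."""
    return sum(s >= score_cut for s in scores_by_tissue) >= min_tissues

# Toy peaks (hypothetical): promoter peak, distal peak, out-of-window peak.
tss = 1_000_000
peaks = [
    (999_000, 1_000_500, 800),    # inside the +/- 2 kbp exclusion zone
    (1_010_000, 1_012_000, 500),  # counted: 2000 bp x 500
    (1_200_000, 1_201_000, 900),  # outside the 100 kbp window
]
score = rose_score(peaks, tss)
```

With these toy values only the distal peak contributes, giving 2000 × 500 / 100,000 = 10.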
2.8. Cell culture
BJ human foreskin fibroblasts were maintained in minimum essential
medium supplemented with 10% fetal calf serum, nonessential amino
acids, and antibiotics. 293T, MDA-MB-231 and Huh7 cells were grown in
Dulbecco's modified Eagle medium supplemented with 10% fetal calf
serum, glutamine, and antibiotics. HeLa, Huh7 and T47D cells were
cultured in EMEM, DMEM and RPMI-1640, respectively; all basal media
were supplemented with 10% FCS and 1% antibiotics.
2.9. Plasmids
The expression vectors for miRNAs and the 3′UTR dual-luciferase
reporter plasmid (pmirGLO) were purchased from Biosettia, Inc. (San
Diego, CA). To construct the target gene 3′UTR dual-luciferase
reporters (pmirGLO-CTLA4-3′UTR, pmirGLO-IGFBP5-3′UTR,
pmirGLO-ITK-3′UTR, pmirGLO-PDGFRA-3′UTR, pmirGLO-PIK3CG-3′UTR,
pmirGLO-TGFBI-3′UTR, pmirGLO-IL7R-3′UTR), target gene 3′UTR exons
containing miRNA seed sequences were amplified by PCR from the genomic
DNA of 293T cells. The PCR primers are described in the supplemental
Materials. All 3′UTR fragments were inserted into pmirGLO using NheI-HF
and XhoI.
The pGL3-basic plasmid was purchased from Promega Corporation. To
construct the pGL3-IL7R/PIK3CG-3Kb + 3′UTR reporters, a 3 kb promoter
sequence from the IL7R or PIK3CG gene was amplified from the genomic
DNA of 293T cells and inserted into the pGL3-basic plasmid using NheI
and XhoI (IL7R promoter), or MluI and BglII (PIK3CG promoter). Then, the
PCR product for the 3′UTR of IL7R or PIK3CG (still containing the miRNA
seed sequence) was digested by XbaI and inserted into the
corresponding pGL3-IL7R/PIK3CG-3Kb reporter, downstream of luc+. The
PCR primers are described in the supplemental Materials.
2.10. Lentivirus-based gene transduction
Recombinant lentiviruses (pLV-miR-ctrl, pLV-miR-21, pLV-miR-142,
pLV-miR-155 and pLV-miR-214) were packaged in 293T cells in the
presence of helper plasmids (pMDLg, pRSV-REV, and pVSV-G) using
Lipofectamine 2000 (Invitrogen). BJ or MDA-MB-231 cells (1 × 10^5/well)
were seeded into 6-well plates, grown overnight, infected with 300 μL
virus in 3 mL fresh medium containing 8 μg/mL polybrene, and spun for
1 h at 1600 to 1800 rpm. Transduced cells were selected with 1.2 μg/mL
of puromycin.
2.11. RNA isolation and quantitative Real-Time PCR
RNA was isolated from cells using TRIzol (Thermo Fisher Scientific)
according to the manufacturer's protocol. 500 ng of RNA was reverse
transcribed to cDNA with iScript™ Reverse Transcription
Supermix (Bio-Rad). Quantitative real-time PCR was performed in
triplicate with gene-specific primers and SsoAdvanced™ SYBR Green
Supermix (Bio-Rad) on a Bio-Rad CFX96 Real-Time System following the
manufacturer's protocols. GAPDH was used as the internal control to
normalize the mRNA input for each gene. qPCR primers are described in
supplemental Table S7.
2.12. Dual-luciferase reporter assay
Target gene 3′UTR activity was analyzed in both 293T and
MDA-MB-231 cells by transient transfection of luciferase reporter
constructs. On the first day, 6 × 10^5/well 293T or 1.5 × 10^5/well
MDA-MB-231 cells were seeded into 12-well plates. On the next day, the
cells were transfected with 0.17 μg pmirGLO reporter vector, 1.43 μg
pLV-miR-ctrl/pLV-miR-142 vector and 4.0 μL Lipofectamine 2000 according
to the manufacturer's instructions. 48 h after transfection,
cell lysates were collected using Passive Lysis Buffer (E1941,
Promega). Firefly and Renilla luciferase activity was detected using
the Dual-Luciferase Reporter Assay System (E1960, Promega) on a
GloMax®-Multi+ Microplate Multimode Reader (Promega).
For the IL7R and PIK3CG 3 kb promoter + 3′UTR reporter analysis,
MDA-MB-231 cells were transiently transfected with 0.16 μg
pGL3-IL7R/PIK3CG-3Kb + 3′UTR plasmid, 1.36 μg pLV-miR-ctrl/pLV-miR-142
vector and 0.08 μg of the control Rluc vector driven by the β-actin, TK
or CMV promoter, using 4 μL Lipofectamine 2000 according to the
manufacturer's instructions. Other procedures were the same as for the
3′UTR dual-luciferase assay.
3. Results
3.1. Positive miRNA-gene correlations are prevalent and consistent across
human cancers
By an integrative miRNA-gene correlation analysis of 1046 miRNAs and
20,531 genes across 31 TCGA cancers (Table S1), we detected a total
of 2,842,030 pairs that were significantly positively correlated
(R > 0.1, adj.P < .05) in at least one TCGA cancer type. We then ranked
these positive pairs according to the number of cancer types (called
cancer coverage) in which their correlation is positive
(R > 0.1) and significant (adj.P < .05). In most of the subsequent
analyses, we focus on the top-ranked pairs with cancer coverage ≥10,
totaling 18,996 miRNA-gene pairs and involving 348 miRNAs and 3074
genes (Table S2). Each of the top 56 significant positive pairs
covered at least 27 cancer types (Fig. 1A). Three pairs,
miR-196b~HOXA10, miR-335~MEST and miR-483~IGF2, covered all 31 cancers
under investigation. Interestingly, miR-196b and HOXA10 have been
reported to co-express, and their overexpression characterized poor
prognosis in patients with gastric cancer [39]. The IGF2 intronic
miR-483 has been widely recognized as an oncogenic miRNA that
transcriptionally upregulates its host gene IGF2 [40,41], while
the long non-coding RNA (lncRNA) H19 intragenic miR-675 was shown to be
the most highly conserved feature of H19 and to serve as the functional
regulatory unit of this lncRNA [42]. These previous studies
corroborate the significant biological implications of the positive
correlations that are prevalent across multiple cancer types.
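The ranking by cancer coverage described in this section can be sketched as a simple counting step. The input layout (one record per cancer type and miRNA-gene pair), the function names and the toy records below are assumptions for illustration; the numbers are not taken from the study.

```python
from collections import Counter

def cancer_coverage(results, r_cut=0.1, p_cut=0.05):
    """Count, per miRNA-gene pair, the cancer types with a significant
    positive correlation. `results` holds one record per cancer and
    pair: (cancer, mirna, gene, r, adj_p)."""
    coverage = Counter()
    for cancer, mirna, gene, r, adj_p in results:
        if r > r_cut and adj_p < p_cut:
            coverage[(mirna, gene)] += 1
    return coverage

def top_pairs(coverage, min_coverage=10):
    """Pairs with coverage >= min_coverage, highest coverage first."""
    kept = [(pair, c) for pair, c in coverage.items() if c >= min_coverage]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Toy records (hypothetical): one pair positive in 12 cancers,
# another positive in only 3.
results = [(f"C{i}", "miR-A", "GENE1", 0.5, 0.001) for i in range(12)]
results += [(f"C{i}", "miR-B", "GENE2", 0.3, 0.01) for i in range(3)]
results += [("C3", "miR-B", "GENE2", 0.05, 0.01)]  # R too small, not counted
coverage = cancer_coverage(results)
top = top_pairs(coverage)
```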
Fig. 1.
Overview of miRNA-gene positive correlation landscape in human cancers.
(A) Heat map showing top positively correlated miRNA-gene pairs
covering ≥27 cancer types. TCGA cancer names and the corresponding
sample sizes used for correlation calculation are shown on the left.
Pairs are shown in the format miRNA~gene at the bottom, with the human
miRNA prefix (hsa-mir-) omitted for better visualization. A positive
correlation with Pearson correlation coefficient R > 0.1 and Hochberg
adjusted p-value <0.05 was deemed significant. n.s.: non-significant.
(B) Top miRNAs ranked by the number of their positively correlated
genes. For better visualization, only miRNAs with ≥100 targets across
≥10 cancer types are shown.
(C) Detailed correlation profiles of a representative positive pair
hsa-mir-15 ~ITK across 31 TCGA cancers. The sample size of each cancer,
Pearson correlation coefficient and p-value are also shown. Red color
indicates significant positive correlation.
See also Fig. S1–3. (For interpretation of the references to color in