Abstract
DNA copy number aberrated regions in cancer are known to harbor cancer
driver genes and the short non-coding RNA molecules, i.e., microRNAs.
In this study, we integrated the multi-omics datasets such as copy
number aberration, DNA methylation, gene and microRNA expression to
identify the signature microRNA-gene associations from frequently
aberrated DNA regions across pan-cancer utilizing a LASSO-based
regression approach. We studied 7294 patient samples associated with
eighteen different cancer types from The Cancer Genome Atlas (TCGA)
database and identified several cancer-specific and common
microRNA-gene interactions enriched in experimentally validated
microRNA-target interactions. We highlighted several oncogenic and
tumor suppressor microRNAs that were cancer-specific and common in
several cancer types. Our method substantially outperformed the five
state-of-art methods in selecting significantly known microRNA-gene
interactions in multiple cancer types. Several microRNAs and genes were
found to be associated with tumor survival and progression. Selected
target genes were found to be significantly enriched in cancer-related
pathways, cancer hallmark and Gene Ontology (GO) terms. Furthermore,
subtype-specific potential gene signatures were discovered in multiple
cancer types.
Subject terms: Cancer, Computational biology and bioinformatics
Introduction
MICRORNAs (miRNAs) are small non-coding RNAs that act as modulators of
the target genes' expression either by inhibiting translation or
promoting RNA degradation^[28]1. Several studies found miRNAs to be the
regulators of cancer driver genes that promote tumor initiation,
progression and proliferation^[29]2–[30]4.
Several state-of-the-art methods utilize miRNA and gene expression data
to infer miRNA-gene regulatory networks. Among these, ARACNe^[31]5 and
ProMISe^[32]6 use mutual information-based algorithms and
HiddenICP^[33]7, idaFast^[34]8 and jointIDA^[35]9 use invariant causal
relationships, i.e., direct or indirect effects of miRNAs on targets to
infer miRNA-gene regulatory networks.
Several studies found that DNA copy number aberrated areas, i.e.,
amplification and deletion regions harbor cancer-driving
genes^[36]10,[37]11 and miRNAs^[38]12–[39]14.
Several studies integrated copy number data, DNA methylation and gene
expression to compute miRNA-gene regulatory networks in
cancer^[40]15,[41]16 using regression-based approaches. These studies,
however, mined miRNAs and target genes from the entire genomic
locations.
In our previous study, we developed a computational pipeline called
miRDriver based on the hypothesis that copy number data from cancer
patient samples can be utilized to discover driver miRNAs of
cancer^[42]17. miRDriver assumes that miRNAs located within an
aberrated region regulate the expression of the genes outside the
aberration, extending the aberration effects across the genome and
beyond the aberrated region. Since other factors can influence the
expression of the genes outside the aberration, miRDriver integrates
DNA methylation and copy number aberration (CNA) of these genes,
transcription factors (TFs) and the expression of the genes located
inside an aberration along with the miRNAs to select the regulatory
miRNAs for these genes^[43]17. We computed frequently aberrated
chromosomal copy number regions, namely, GISTIC regions, among tumor
patient samples (see Materials and Methods). Then, for each GISTIC
region, we computed differentially expressed (DE) genes between the
tumor samples with the aberration and the samples that did not have the
aberration. Afterward, we computed DE trans genes (genes outside of
aberrated areas) and cis genes (genes inside of aberrated areas) for
each GISTIC region. Finally, we applied a LASSO-based^[44]18 regression
model to select miRNAs regulating DE genes' expression (Fig. [45]1).
Figure 1.
[46]Figure 1
[47]Open in a new tab
The overview of algorithmic steps used within the miRDriver
computational pipeline: GISTIC step, Differential Expression step,
REGULATOR step and LASSO step with R functions running on pan-cancer.
miRDriver outperformed ARACNe, ProMISe, Hidden-ICP, ICP-PAM50, idaFast
and jointIDA in retrieving significantly enriched miRNA-gene
interactions with the known miRNA-gene interactions. miRDriver
discovered several potentially novel interactions in multiple cancer
types. Several oncogenic and tumor suppressor miRNAs and genes were
found to be enriched in the computed miRNA-gene networks. Several
miRNAs were found to be associated with patients' survival and disease
progression. Selected target genes were found to be significantly
enriched in cancer-related biological pathways and GO terms^[48]19.
Furthermore, subtype-specific gene signatures were discovered in
multiple cancer types.
In our previous publication, we have demonstrated miRDriver’s
statistical robustness by applying it to two different cancer types.
This study has unique contributions. In the current study, we present
miRDriver as an R software package with various options for users to
run our workflow. We have also demonstrated its application and
biological importance by running miRDriver on eighteen different cancer
types. We have presented extensive results on these cancer types that
were not present in our prior publication. We have also presented
pan-cancer-wide findings and their relevance to cancer. We have put
together a resource of pan-cancer miRNA-gene interactions that will be
useful to biologists, clinicians and scientists working on cancer
research.
Results
In this study, we integrated CNA, DNA methylation, TF-gene
interactions, gene, and miRNA expression datasets in the miRDriver tool
to compute miRNA-gene interactions based on DNA copy number aberrated
regions in eighteen different cancer types from TCGA. Table [49]1 shows
the cohort sizes for each data modality, the number of all GISTIC
regions, the count of trans genes in the LASSO step, and the computed
miRNA-gene interactions in eighteen different cancer types.
Table 1.
TCGA cancer types in the study with cohort sizes in different data
modalities and results of miRDriver.
TCGA cancer
type name Abbreviation RSeq^a CNA^b 450K^c 27K^d miRNA^e GISTIC regions
trans genes^f Interactions
(genes-miRNAs)^g Fisher p-value^j
Adrenocortical carcinoma ACC 79 180 80 NA 79 59 4683 308 (253–33)
3.38e−13
Bladder Urothelial Carcinoma BLCA 411 810 437 NA 429 126 5466 578
(416–125) 4.28e−09
Breast invasive carcinoma BRCA 1102 1103 895 343 1165 66 10,494 1852
(776–187) 17.9e−11
Cervical squamous cell carcinoma and endocervical adenocarcinoma CESC
304 586 312 NA 311 91 4515 558 (349–86) 1.15e−05
Lymphoid Neoplasm Diffuse Large B-cell Lymphoma DLBC 48 98 48 NA 47 42
3697 384 (288–31) 6.06e−16
Esophageal carcinoma ESCA 161 373 202 NA 195 119 4961 738 (521–92)
4.35e−10
Head and Neck squamous cell carcinoma HNSC 500 1090 580 NA 565 105 2591
326 (205–75) 2.47e−08
Kidney renal clear cell carcinoma KIRC 534 1067 483 NA 570 100 3995 586
(501–29) 1.45e−06
Acute Myeloid Leukemia LAML 151 397 194 418 188 46 3593 590 (431–21)
1.15e−04
Brain Lower Grade Glioma LGG 511 1021 534 NA 528 87 1653 226 (151–45)
1.26e−08
Liver hepatocellular carcinoma LIHC 371 767 430 NA 421 107 3593 316
(224–71) 1.37e−10
Lung adenocarcinoma LUAD 524 1110 507 150 555 131 4602 1172 (747–142)
1.8e−4
Lung squamous cell carcinoma LUSC 501 1038 412 161 511 131 2735 449
(266–105) 1.91e−05
Ovarian serous cystadenocarcinoma OV 374 573 10 613 486 64 3117 1347
(548–147) 0.03
Pancreatic adenocarcinoma PAAD 177 368 195 NA 182 75 2918 530 (371–55)
2.81e−12
Prostate adenocarcinoma PRAD 498 1038 553 NA 544 95 4016 266 (239–43)
9.51e−03
Thyroid carcinoma THCA 502 1025 571 NA 569 75 1138 204 (204–2) 1.58e−04
Uterine Corpus Endometrial Carcinoma UCEC 547 1098 485 118 556 174 6106
1118 (688–152) 5.73e−09
[50]Open in a new tab
NA not available.
Cohort sizes in ^aGene expression; ^bCopy number aberration; ^c450K DNA
methylation; ^d27K DNA methylation; ^emiRNA expression datasets. ^fNo.
of DE trans genes used in miRDriver's LASSO step; ^gNo. of selected
interactions with no. of selected DE trans genes and no. of selected
miRNAs in the parenthesis; ^jp-value of two-sided Fisher's exact test
for enrichment of oncogenic miRNAs in each cancer type.
Computed miRNAs were significantly enriched in the experimentally-validated
oncogenic miRNAs
We performed a two-sided Fisher's exact test to check the association
between the cancer-related miRNAs in OncomiRDB (see Materials and
Methods) and the computed miRNAs by miRDriver. For each cancer type,
the background set in the Fisher's exact test consisted of all TCGA
miRNAs used in the LASSO step (see Materials and Methods) for that
cancer type. For all cancer types, computed miRNAs were significantly
enriched (Fisher's exact test p-value < 0.05) in the oncogenic miRNAs
in OncomiRDB (Table [51]1).
Computed miRNA-gene interactions were enriched in the known miRNA-gene
interactions
To check if the miRNA-gene interactions computed by miRDriver were
significantly enriched in the known miRNA-gene interactions, we
performed a hypergeometric test for each miRNA's computed target genes
in each cancer type. We considered only those miRNAs that had at least
one known target in the ground truth data (i.e., known miRNA-gene
interactions) (see Materials and Methods) from the computed target
list. We labeled them as "Eligible miRNAs" for the hypergeometric test.
The background set, i.e., the hypergeometric test universe, was the set
of all the trans genes in the HGNC symbol^[52]20 that were common to
the ground truth data. For fourteen cancer types, at least 50% of the
"Eligible miRNAs" had significant enrichment (p-value < 0.05) (Table
[53]2). The entire list of the computed miRNAs with individual
hypergeometric p-values for all eighteen cancer types can be accessed
in Supplemental Table [54]S1.
Table 2.
Target enrichment.
Cancer type Eligible miRNAs^a Significant miRNAs^b (%)
ACC 4 0
BLCA 6 67
BRCA 59 63
CESC 8 88
DLBC 6 83
ESCA 5 60
HNSC 3 67
KIRC 3 67
LAML 7 43
LGG 2 100
LIHC 7 43
LUAD 4 50
LUSC 3 100
OV 27 89
PAAD 7 57
PRAD 1 100
THCA 1 0
UCEC 11 55
[55]Open in a new tab
For fourteen different cancer types, at least 50% of the "Eligible
miRNAs" had significantly enriched computed targets in the ground truth
data (p-value < 0.05).
^aNo. of "Eligible miRNAs" for hypergeometric test for the enrichment
of known targets; ^bpercentage of miRNAs with hypergeometric
p-values < 0.05.
miRDriver outperformed five state-of-the-art methods in inferring significant
miRNA-gene interactions
We compared miRDriver with five state-of-the-art methods, namely,
ARACNe, ProMISe, HiddenICP, idaFast and jointIDA, by running them on
eighteen different cancer types from TCGA. For all these methods, we
used gene expression data to compute miRNA-gene interaction networks
for our comparison (see Materials and Methods). We performed the
hypergeometric test to measure each miRNA's computed targets'
enrichment significance in the known miRNA-gene interaction data. We
selected only "Eligible miRNAs" (i.e., miRNAs with at least one known
target in the ground truth data) for this test. We computed the
overlapping "Eligible miRNAs" for miRDriver and each comparable method.
We checked if the count of the "Significant miRNAs" (i.e., miRNAs with
target enrichment test p-value < 0.05) in miRDriver was more (i.e.,
miRDriver won), less (i.e., miRDriver lost), or equal (i.e., there was
a draw) than the other method in the overlap. miRDriver had more
"Significant miRNAs" than all other methods for most of the cancer
types. For ACC, LUSC and THCA, miRDriver and the different methods had
no common "Eligible miRNAs"; hence, we eliminated these three cancer
types from this test. Table [56]3 summarizes the comparison results in
all the cancer types. Table [57]4 presents the comparison results for
ovarian cancer (OV) in detail with the number of "Eligible miRNAs" and
"Significant miRNAs" in all the methods. For a detailed comparison with
all the cancer types, see Supplemental Table [58]S2. We also compared
miRDriver with sequence-based competing endogenous RNA (ceRNA)
prediction tool, Cupid^[59]21 for BRCA. miRDriver outperformed Cupid as
well. Cupid predicts miRNAs that are also predicted to "mediate" ceRNA
interactions. For TCGA BRCA, the authors of Cupid predicted 299K
candidate miRNA–target interactions. We filtered this list with 6504
input genes and 255 miRNAs, the same inputs we used in miRDriver for
BRCA. We considered the top 2437 (top 1 percentile) of miRNA-gene
interactions based on Cupid reported scores to get highly confident
interactions for our comparison. The count of the "Significant miRNAs"
in miRDriver was higher than Cupid in the overlap (see Supplemental
Table [60]S2).
Table 3.
Comparison of miRDriver with other methods. We computed the overlapping
miRNAs computed by miRDriver and each comparable method.
graphic file with name 41598_2022_7628_Tab3_HTML.jpg
[61]Open in a new tab
We checked if the count of the "Significant miRNAs" (i.e., miRNAs with
target enrichment test p-value < 0.05) in miRDriver was more (i.e.,
miRDriver won), less (i.e., miRDriver lost), or equal (i.e., there was
a draw) than the other method in the overlap. miRDriver had more
"Significant miRNAs" than all other methods for most of the cancer
types.
Green—miRDriver won; Red— miRDriver lost; Black—draw.
Table 4.
Comparison results of miRDriver with five other methods in ovarian
cancer.
Method Input miRNAs Input genes Computed miRNAs Selected genes Eligible
miRNAs^a Overlapping eligible miRNAs^b Method's computed miRNAs in
overlap miRDriver's computed miRNAs in the overlap
miRDriver 198 2114 147 354 27 NA NA NA
ARACNe 198 2114 196 791 59 27 1 24
ProMise 198 2114 57 1938 34 22 0 17
hiddenICP 198 2114 198 2100 47 21 0 16
idaFast 50 1500 50 1194 32 22 0 17
jointIDA 50 1500 50 1294 32 22 0 17
[62]Open in a new tab
^aEligible miRNAs had at least one known target in the ground truth
data; ^bOverlapping eligible miRNAs were with respect to miRDriver. For
miRDriver, the number of significant miRNAs in every overlap with other
methods was much higher. NA means not applicable.
Computed genes were enriched in biological pathways, cancer hallmark and GO
terms
To evaluate the functional roles of the computed target genes by
miRDriver for each cancer type, we checked whether these genes were
enriched in the biological pathways and GO terms^[63]19. For this
purpose, we performed pathway enrichment analysis with the pathways in
REACTOME^[64]22 and KEGG^[65]23 databases. For REACTOME pathway
enrichment, we used R package Pathfinder^[66]24 and for KEGG pathways,
hallmark gene set from the MSigDB^[67]25,[68]26 database and GO
enrichment, we used R package clusterProfiler^[69]27. We selected the
pathways and GO terms with significant enrichment (multiple testing
corrected, i.e., adjusted p-value < 0.05). We found 213 unique REACTOME
pathways spanning over seventeen cancer types, twelve unique KEGG
pathways in twelve cancer types and 224 unique enriched GO terms
spanning over fifteen cancer types. Table [70]5 shows the enriched
pathways and GO terms that were common in multiple cancer types. We
provided the entire list of enriched pathways and GO terms for all the
cancer types in Supplemental Table [71]S3. Among these pathways,
"Immune System" related pathways were found to play essential roles in
cancer^[72]28,[73]29. The G protein-coupled receptors (GPCRs)-related
REACTOME pathways such as "Signaling by GPCR", "GPCR ligand binding"
and "GPCR downstream signalling", which were implicated in several
cancer-related studies, were found to be enriched in the computed
target genes in more than ten cancer types in our study. These pathways
were found to play crucial roles in tumor development, invasion,
migration, survival, and metastasis^[74]30,[75]31. The GO terms, such
as "receptor ligand activity" and "receptor regulator activity",
enriched in at least five cancer types, were highlighted in several
cancer studies for playing roles in drug toxicity, cell function, tumor
growth^[76]32–[77]34. The computed target genes in each cancer type
were also enriched in the cancer hallmark gene set (Table [78]6).
Table 5.
Enriched pathways and GO terms in pan-cancer.
REACTOME^a KEGG^b GO terms^c
Immune System
Metabolism
Signal Transduction
Innate Immune System
Hemostasis
Transport of small molecules
Developmental Biology
Signaling by GPCR
Class A/1 (Rhodopsin-like receptors)
GPCR ligand binding
GPCR downstream signalling
G alpha (i) signalling events
Neuronal System
Neuroactive Ligand Receptor Interaction
Metabolism of Xenobiotics by Cytochrome P450
Steroid Hormone Biosynthesis
Retinol Metabolism
Drug Metabolism Cytochrome P450
Cytokine-Cytokine Receptor Interaction
Systemic Lupus Erythematosus
Receptor ligand activity
Receptor regulator activity
Ion gated channel activity
Gated channel activity
Cation channel activity
Substrate-specific channel activity
Passive transmembrane transporter activity
Extracellular matrix
Ion channel activity
Nucleosome
DNA packaging complex
Nuclear nucleosome
Protein-DNA complex
Hormone activity
[79]Open in a new tab
The pathways that appeared in more than four cancer types are in bold.
^aREACTOME pathways, ^bKEGG pathways and ^cGO terms that were found to
be enriched in at least two cancer types.
Table 6.
Enriched cancer hallmark terms in pan-cancer for computed target genes.
Cancer type Cancer Hallmark Terms p-value
ACC Epithelial Mesenchymal Transition 0.013
BRCA Estrogen Response Late 0.003
BRCA Estrogen Response Early 0.017
CESC KRAS Signaling DN 0.022
CESC HEDGEHOG Signaling 0.031
DLBC KRAS Signaling DN 0.013
ESCA Myogenesis 0.005
ESCA Coagulation 0.007
HNSC Myogenesis 0.009
KIRC E2F Targets 0.000
KIRC G2M Checkpoint 0.000
LAML KRAS Signaling UP 0.002
LUAD KRAS Signaling DN 0.007
PRAD Myogenesis 0.017
[80]Open in a new tab
Furthermore, miRDriver computed 22 common miRNAs that were shared in at
least eight different cancer types among eighteen total cancer types
used in the study (Table [81]7). The targets of these miRNAs could
regulate the common biological processes in cancer. Hence, we performed
a GO enrichment test with 1161 computed genes targeted by at least one
of these 22 miRNAs among eighteen cancer types and found 49 GO terms
with significant enrichment. Table [82]8 shows a few of these GO terms
with their cancer-related citations; the entire list can be found in
Supplemental Table [83]S4.
Table 7.
Twenty two common miRNAs computed by miRDriver in multiple cancer
types.
graphic file with name 41598_2022_7628_Tab7_HTML.jpg
[84]Open in a new tab
Table 8.
Enriched GO terms with the cancer-related citations in the targets of
the common miRNAs in Table [85]7.
GO-ID Description Adjusted p-value
GO:000633 DNA replication-dependent nucleosome assembly^[86]4,[87]45
5.25e−07
GO:0006342 Chromatin silencing^[88]47,[89]48 1.17e−03
GO:0006323 DNA packaging^[90]49,[91]50 5.29e−05
GO:0045814 Negative regulation of gene expression,
epigenetic^[92]51,[93]52 2.0894e−03
GO:0060964 Regulation of gene silencing by miRNA^[94]53,[95]54
2.362e−02
GO:0060147 Regulation of post-transcriptional gene
silencing^[96]55,[97]56 2.767e−02
GO:0048018 Receptor ligand activity^[98]32–[99]34 3.377e−03
[100]Open in a new tab
Although there were common miRNAs across multiple cancer types, there
were not many common miRNA-gene interactions due to a much higher
number of trans genes than the miRNAs in this pan-cancer analysis.
Table [101]9 presents fourteen common gene-miRNA interactions shared in
two cancer types among 11,548 selected interactions from pan-cancer.
Among these, RSPO3 and miR-22 interaction have been selected in LAML
(leukemia) and LUAD (lung cancer). Interestingly, RSPO3 was found to
play a role in leukemia^[102]35 and promote tumors in lung
cancer^[103]36. miR-22 was found to play the anti-tumor role with
therapeutic potential in acute myeloid leukemia^[104]37 and found to
have roles in lung cancer via CNAs^[105]38. Another interaction PAX5
with miR-5699 was found in BLCA (bladder cancer) and OV (ovarian).
Interestingly, PAX5 was found to have a role in bladder cancer^[106]39
and ovarian cancer^[107]40 as a co-regulator of PAX8. miR-5699 has a
proven role in ovarian cancer treatment's oxidative response^[108]41.
There are some miRNA-long noncoding RNA (lncRNA) interactions in Table
[109]9. lncRNAs are known to have binding sites for miRNAs, also
lncRNAs can be direct–indirect targets of miRNAs^[110]42,[111]43.
Several lncRNAs were found to be prevalent in cancer^[112]44. In our
case, LINC01833- miR-1226, was found in BRCA (breast cancer) and LGG
(brain cancer). LINC01833 was listed in the top five lncRNAs according
to the prioritization of variation in ER-negative-associated lncRNAs in
breast cancer^[113]45. miR-1266 was found to regulate the expression of
the mucin 1 oncoprotein and induce cell death in a breast cancer
study^[114]46.
Table 9.
miRNA-gene interactions computed by miRDriver in multiple cancer types.
Cancer type column shows in which cancer types the interactions are
present.
Gene miRNA Cancer type
RSPO3 miR-22 LAML,LUAD
PAX5 miR-5699 BLCA,OV
LINC01833 miR-1226 BRCA,LGG
LINC01697 miR-5703 HNSC,UCEC
HIST1H4L miR-3613 BLCA,LUAD
LINC02489 miR-375 CESC,OV
NR0B1 miR-346 HNSC,KIRC
GABRG2 miR-744 PAAD,UCEC
PLAC8 miR-6510 CESC,HNSC
BPIFC miR-4469 LUSC, UCEC
RTL3 miR-26b CESC,UCEC
SLC17A2 miR-5699 LUSC,PAAD
[115]Open in a new tab
Several cancer-related terms and pathways were enriched in the targets of the
computed miRNAs
We checked the involvement of the computed miRNAs in cancer-related
pathways. For this analysis, we collected all 556 miRNAs that were
computed by miRDriver in at least one of the cancer type. We collected
the computed target genes for each of these miRNAs from all the cancer
types where that miRNA was present. We performed cancer hallmark gene
set enrichment with these collected target genes of each miRNA. We
found 38 unique enriched cancer hallmark terms (adjusted
p-value < 0.05) for 134 miRNAs (Supplemental Table [116]S5).
We also performed REACTOME pathway enrichment analysis with these
collected target genes of each miRNA. We found 240 unique enriched
REACTOME pathways (adjusted p-value < 0.05) for 69 miRNAs with these
target genes (Supplemental Table [117]S5). Eleven of these enriched
pathways, such as, "Epithelial-Mesenchymal Transition", "Hypoxia",
"Inflammatory Response", "KRAS Signaling Up", "p53 Pathway", "P13 AKT
MTOR Signaling", "Xenobiotic Metabolism", "Apoptosis", "DNA Repair" and
"Immune" were present in nineteen experimentally-validated
cancer-related pathways for miRNAs^[118]57.
Furthermore, we performed an analysis to find cancer-driving miRNAs
(i.e., tumor-suppressor, oncogenes or both) using the enriched cancer
hallmark terms (Supplemental Table [119]S5). We hypothesized that a
miRNA could be a candidate cancer-driving miRNA if its target genes
that were found to be enriched in the cancer hallmark terms could also
be enriched in the known cancer-driving genes. Hence, for each of the
enriched cancer hallmark terms, we gathered all the miRNAs with their
target genes for which that term was enriched (Table [120]10). We
downloaded a list of 83 cancer-driving genes found to be frequently
mutated in different cancer types from the Catalogue Of Somatic
Mutation In Cancer (COSMIC) database from the cancer gene census
project^[121]58. We performed a hypergeometric test for the overlapping
target genes with the 83 cancer-driving genes for each cancer hallmark
term. The background gene set for this test was all 5604 target genes
computed by miRDriver in pan-cancer. We considered the miRNAs related
to the hypergeometric p-value < 0.05 as the candidate miRNAs to be
evaluated as cancer-driving miRNAs since their targets were enriched in
known cancer-driving genes. Furthermore, considering the fact that the
up- or down-regulation of a miRNA causes the inverse regulation of its
target genes^[122]59–[123]61, we specifically checked the target genes
of these candidate miRNAs for different cancer types that were found to
have negative LASSO regression coefficient computed by miRDriver (Table
[124]11). Interestingly, all of the target genes in this group (Table
[125]11), except OLIG2, were found to be oncogene in the previous
studies^[126]62–[127]68. OLIG2 was found to be working as a
tumor-suppressor gene (TSG) in human glioblastoma^[128]69. All the
miRNAs except miR-5001 and miR-2276 were found to act as TSGs in cancer
in several studies^[129]70–[130]74. miR-5001 and miR-2276 were found to
have evidence of working as oncogenes in endometrial cancer and
colorectal cancer, respectively^[131]75,[132]76. These studies support
the findings of miRDriver in terms of connecting miRNAs and genes that
were related inversely, having a possibility to be working as drivers
in pairs of TSG-oncogene in different cancer types.
Table 10.
Hallmark term-related target enrichment in cancer driver genes.
Hallmark miRNAs^a Targets^b Overlap^c p-value^d
Complement 5 42 3 0.018
E2F Targets 2 85 4 0.026
MTORC1 Signaling 1 12 2 0.011
Myogenesis 12 44 3 0.020
P53 Pathway 1 12 2 0.011
TNFA Signaling via NFKB 2 17 2 0.021
Pancreas Beta Cells 2 48 3 0.026
[133]Open in a new tab
^aNo. of miRNAs in cancer hallmark term; ^bNo. of targets in the term;
^cNo. of overlapping targets in the cancer driver genes;
^dHypergeometric p-value of the overlap.
Table 11.
miRNA targets with negative LASSO coefficient in different cancer
types.
Cancer type Target miRNA
KIRC COL1A1 miR-4728
KIRC CYSLTR2 miR-346
KIRC CYSLTR2 miR-4728
KIRC ETV4 miR-4728
CESC ISX miR-5001
UCEC ISX miR-2276
UCEC ISX miR-4733
UCEC ISX miR-6842
PAAD KCNJ5 miR-5699
KIRC NTRK1 miR-4728
HNSC OLIG2 miR-5699
[134]Open in a new tab
Computed target genes revealed the subtype-specific expression signature in
multiple cancer types
We checked the subtype-specific association of gene expression of
computed target genes in BRCA, LGG, LUSC and PAAD. We used the R
package TCGAbiolinks^[135]77 to download the different subtype labels
for the different cancer types. Since TPM (transcript per million
reads) values are normalized and comparable across samples, for this
analysis, we utilized RNA-Seq data in TPM of TCGA samples whose subtype
labels were available. We applied log2(TPM + 1) transformation from
Cancer Dependency Map [[136]https://depmap.org]. For all these cancer
types, we performed unsupervised clustering using gene expression of
these target genes and compared these clusters with baseline (i.e.,
known) subtype clusters using Rand Index (RI) and Uniform Manifold
Approximation and Projection (UMAP)^[137]78 plots.
For BRCA, we computed a UMAP plot using around 1000 BRCA samples and
106 high-degree genes (i.e., computed genes targeted by more than three
miRNAs) to check the PAM50 gene-based subtypes^[138]79. These subtypes
were, Basal-like (BL), HER2-enriched (HER2+), LuminalA (LA), LuminalB
(LB) and Normal-like (NL) (Fig. [139]2A). We also computed the UMAP
plot using the PAM50 genes with PAM50 gene-based subtypes
(Fig. [140]2B). These UMAP plots show a clear separation between
different subtype-specific clusters. We also performed an unsupervised
clustering (k-means) (with R base package Stats with k = 5 and all
other parameters as default) on the BRCA cohort with high-degree target
genes (Fig. [141]2C) and with PAM50 genes (Fig. [142]2D). The computed
RIs between five known subtype labels with the five predicted clusters
by computed high-degree target genes and PAM50 genes were 0.74 and
0.82, respectively. This result shows that both the computed
high-degree target genes and PAM50 gene set were able to detect subtype
structure in BRCA samples with high accuracy.
Figure 2.
[143]Figure 2
[144]Open in a new tab
UMAP plots and confusion matrices are summarizing the classification
and clustering of the cancer samples. (A, B) UMAP plots with
high-degree target genes in BRCA with baseline and k-means clustering
labels, respectively; (C, D) UMAP plots with PAM50 genes in BRCA with
baseline and k-means clustering labels, respectively; (E, F) Confusion
matrices of subtype-classification in BRCA with F1 scores with respect
to the baseline labels, using high-degree target genes and PAM50 genes,
respectively. Accuracy and F1 score were closer in both cases; (G) UMAP
plot with all target genes using transcriptome-based baseline labels in
LGG; (H) UMAP plots with high-degree target genes using
expression-based baseline labels in LUSC; (I) UMAP plots with
high-degree target genes using mRNA-based clusters^[145]81 as a
baseline in PAAD.
Furthermore, we used the high-degree genes to classify the BRCA cohort
into five different classes. For this purpose, we used R package
keras^[146]80 ([147]https://github.com/rstudio/keras) implementation of
the Random Forest classifier with 80% samples for training with 10-fold
cross-validation where 20% of data was held out to test the performance
of the model. We achieved a high classification accuracy of 0.86. The
same sample cohort was classified with PAM50 genes and achieved a
classification accuracy of 0.89. Figure [148]2E,F present the confusion
matrices for both cases with F1 scores. The F1 scores for the
classification with high-degree target genes were comparable to F1
scores of the PAM50-based classification, which suggests that these
high-degree target genes can serve as potential markers for PAM50-based
subtype signatures in BRCA.
For the other cancer types except for LGG, we computed UMAP plots to
check the baseline subtype clusters with the selected high-degree
target genes. For these cancer types, since there was a fewer number of
genes targeted by more than three miRNAs, we defined high-degree genes
as the genes targeted by more than two miRNAs. For LGG, we used 402
samples with all 151 computed target genes since no gene was targeted
by multiple miRNAs (Fig. [149]2G). For LUSC, we used 178 patient
samples with 75 high-degree target genes (Fig. [150]2H), and in PAAD,
we used 150 patient samples with 101 selected high-degree target genes
(F[151]ig. [152]2I). We also performed k-means clustering for all these
cancer types. For LGG, LUSC and PAAD, the computed RIs between known
subtype clusters with the predicted clusters were 0.71, 0.62 and 0.70,
respectively. For LGG and PAAD in which we achieved high RI values, we
visualized clear separation among the known subtype-specific clusters
based on UMAP plots. For LUSC, although we achieved a lower RI value,
the "Basal" cluster was separated from other clusters (Fig. [153]2H).
These results showed that the computed high-degree target genes could
reveal subtype-specific expression signatures in multiple cancer types.
Computed miRNAs were found to be potential biomarkers for patients' survival
and progression of the disease in each cancer type
We performed survival analysis with the computed miRNAs to assess the
miRNAs' prognostic relevance as clinical biomarkers for patients'
survival (Fig. [154]3). For each miRNA, we divided the patient cohort
of each cancer type into two groups, such as high expression and low
expression for that miRNA. We considered the available clinical
variables among age, race, gender, stage, and grade as independent
variables (see Materials and Methods). To remove the confounding effect
of multiple factors, we used the Adjusted Kaplan–Meier Estimator and
computed adjusted survival curves by weighting the individual
contributions by the inverse probability weighting (IPW) using the R
package IPWsurvival^[155]82. We considered four different survival
endpoints, namely, Overall Survival (OS), Progression Free Interval
(PFI), Disease Specific Survival (DSS) and Disease Free Interval (DFI)
(see Materials and Methods). We found several prognostic miRNAs
(adjusted log-rank test p-value < 0.05) based on Adjusted Kaplan–Meier
survival plots in multiple cancer types. Figure [156]3 shows the
survival plots for the common miRNAs in different cancer types. Among
22 common miRNAs (Table [157]7), eighteen had significant survival
differences in high and low miRNA expression patient groups in at least
one cancer type (Fig. [158]3). We provided the survival plots for all
miRNAs for eighteen cancer types in Supplemental Figure
[159]S1–[160]S18.
Figure 3.
[161]Figure 3
[162]Open in a new tab
Adjusted Kaplan–Meier plots with adjusted log-rank test p-value for 18
common miRNAs in high and low expression groups, (A) let-7a-3 in OV
with OS; (B) let-7b in PAAD with OS; (C) miR-149 in ACC with PFI; (D)
miR-210 in BRCA with OS; (E) miR-31 in KIRC with OS; (F) miR-3187 in
HNSC with OS; (G) miR-3664 in PRAD with OS; (H) miR-4777 in LUAD with
DFI; (I) miR-4786 in LIHC with OS; (J) miR-3136 in BLCA with PFI; (K)
miR-34b in ESCA with PFI; (L) miR-3667 in LUSC with PFI; M) miR-4662a
in UCEC with PFI; (N) miR-548k in PRAD with PFI; (O) miR-6510 in PAAD
with PFI; (P) miR-4762 in LUSC with DFI; (Q) miR-486 in HNSC with DFI;
(R) miR-675 in ACC with PFI.
miRDriver discovered several cancer-specific miRNAs
In this study, miRDriver discovered 240 cancer-specific miRNAs, i.e.,
these miRNAs were selected in only one cancer type. We used the R
package OncoScore^[163]83 to measure these miRNAs' association with
cancer based on citation frequencies in cancer-related biomedical
literature. Fifty percent of these miRNAs (i.e., 121) were found to be
cited in cancer-related studies (Supplemental Table [164]S6). Moreover,
several of these miRNAs were found to be prognostic, i.e., associated
with patients' survival based on Adjusted Kaplan–Meier survival
analysis (adjusted log-rank test p-value < 0.05) (Table [165]12).
Table 12.
Cancer-specific miRDriver miRNAs with citation frequency.
Cancer type miRNA^a Citation^b Cancer type miRNA^a Citation^b
LIHC miR-1288 2 BLCA miR-3677 4
HNSC miR-134 56 LUSC miR-3934 1
KIRC miR-194-1 1 BLCA miR-4791 1
UCEC miR-195 197 BLCA miR-5003 1
KIRC miR-215 69 UCEC miR-552 12
LIHC miR-3170 1 HNSC miR-561 6
LUAD miR-3651 1 PAAD miR-6875 3
[166]Open in a new tab
^aThese miRNAs were prognostically significant in survival analysis;
^bOncoScore citation frequency.
The copy number changes of the computed miRNAs were predictive of their
expressions
We computed the Spearman correlation values between copy number and
expression across all the samples of the computed miRNAs of miRDriver
in eighteen different cancer types (Supplemental Figure [167]S19). As
expected, we observed that most miRNAs had a positive correlation
between their copy number and expression. There were also some negative
correlations, but this is not surprising as miRNA expression is
dependent on regulatory factors beyond copy number events, too. Despite
this, the positive median distribution of correlations across all
cancer types supports our hypothesis that miRNA expression in copy
number areas may be predictive of DE trans gene expression variation.
Selected high-degree genes were highly significant as potential biomarkers to
predict prognosis in cancer patients than low-degree genes in several cancer
types
We computed the hazard ratio (HR) of the selected high-degree target
genes as the genes targeted by four or more miRNAs and low-degree
target genes as the genes targeted by only one miRNA to get the
optimized list of high-degree and low-degree genes. We performed the
multivariate Cox regression analysis^[168]84 using these genes. Due to
the low sample size of the high-degree target genes, we computed effect
size using the r-value of the Mann–Whitney test with |ln (HR)|. Higher
|ln (HR)| implies a higher association with an event's risk with an
increase or decrease of gene expression. The r-value was negative if
the |ln (HR)| values in the high-degree group were higher than the
low-degree group and positive otherwise. We used OS, PFI, DSS and DFI
as clinical endpoints in this analysis. We ran this analysis on fifteen
different cancer types omitting the cancer types with no high-degree
target gene (THCA and PRAD) and no clinical endpoint (LAML). In our
previous work^[169]17 with BRCA and OV, we discussed the significance
of high-degree target genes; hence, we omitted these two cancer types
as well, leaving us thirteen cancer types for this analysis. Although
the Wilcoxon rank-sum test p-values for the comparison between the
boxplots of the two groups were insignificant (p-value > 0.05), we
found negative r-values in most of the cancer types (see Fig. [170]4).
The hazard ratio boxplots of all thirteen cancer types with r-values in
different clinical endpoints can be found in Supplemental Figure
[171]S20–[172]S23. Table [173]13 shows the high-degree target genes
with OS in seven cancer types that had negative r-values. These genes
were found to be cited in cancer-related work in a high percentage
(≥ 50%) among total citations in biomedical literature by OncoScore.
The entire list of high-degree genes with OncoScore frequencies has
been provided in Supplemental Table [174]S7.
Figure 4.
[175]Figure 4
[176]Open in a new tab
Boxplots of absolute values of the natural logarithm of hazard ratios
in high-degree and low-degree genes with an r-value of Mann–Whitney
test, (A) LUSC with OS, (B) BLCA with DSS, (C) ESCA with DFI, (D) HNSC
with OS, (E) LGG with OS, (F) PAAD with OS. These plots show that
computed high-degree genes were having higher |ln (Hazard Ratio)
(r-value < 0) to predict disease survival and prognosis in cancer
patients than low-degree genes.
Table 13.
Cancer types with negative r-values from the ^aMann-Whitney test
between low-degree and high-degree gene groups; ^bHighly cited
high-degree genes in these cancer types in cancer-related literature.
Cancer type ^ar-value ^bHigh degree genes
BLCA − 0.76 BTNL3, HNF1A-AS1, MIR1205, NAA11, NOL4, OR10H5, PDZD3
ESCA − 0.26 ANKRD26P3, C17orf64, CCDC60, FAM81B, LIN28A, MYLKP1
HNSC − 1.24 BTBD17, DNM3OS, KLHL33, SMCO1
KIRC − 0.03 HOTTIP
LGG − 0.83 C20orf85, C7orf65
PAAD − 0.79 ARHGAP36, C1QTNF1-AS1, TMPRSS15
[177]Open in a new tab
Materials and methods
All the experiments were conducted in accordance with relevant
guidelines and regulations.
Running miRDriver on pan-cancer
In this study, we conducted a pan-cancer analysis where we applied the
miRDriver R package to identify copy-number derived miRNA-gene
interactions. We integrated gene expression, CNA, DNA methylation,
TF-gene interactions and miRNA expression data from eighteen different
cancer types (Table [178]1). miRDriver has four computational steps:
GISTIC Step, DE Step, REGULATOR Step, and LASSO Step. In the following
paragraphs, we described the miRDriver R functions to run these steps.
The entire pipeline of miRDriver running on pan-cancer is illustrated
in Fig. [179]1.
To mine miRNAs that reside in the aberrated chromosomal regions of
cancer patients, in the first step (i.e., GISTIC Step), we computed
frequently aberrated chromosomal regions, namely, GISTIC regions, for
eighteen different cancer cohorts. We utilized segmented chromosomal
copy number profiles of each cancer cohort as inputs in GISTIC
2.0^[180]85 tool in GenePattern^[181]86 webserver and computed
chromosomal regions that were frequently aberrated within each patient
cohort using a confidence interval of 0.90. The GISTIC regions with a
[MATH: log2 :MATH]
ratio above 0.1 were considered amplified and the GISTIC regions with a
[MATH: log2 :MATH]
ratio below −
[MATH: 0.1 :MATH]
were considered deleted. We further processed the GISTIC regions of
each cancer type using the getRegionWiseGistic function in the
miRDriver R package to gather patients from each region with their
aberration status (i.e., aberrated and non-aberrated).
In the second step (i.e., the DE Step), we computed DE genes for each
GISTIC region. We computed these DE genes between frequently aberrated
and non-aberrated patient sample groups in each cancer type cohort
using getDifferentiallyExpressedGenes function in miRDriver with
default parameters. This function employed edgeR^[182]87 package in R
utilizing mRNA raw counts to compute DE genes among these two groups
using absolute log fold change (logFC) ≥ 1 and adjusted p-value < 0.05.
Using the makingCisAndTransGenes function, we annotated DE genes
located inside the GISTIC region as cis genes and DE genes outside of
the GISTIC region as trans genes. This step also retrieves the miRNAs
(i.e., cis miRNAs) in each GISTIC region. Since the number of cis
miRNAs per GISTIC region was extremely low, to avoid reducing the
sensitivity and precision of our findings, we did not further filter
cis miRNAs based on differential expression. The counts of trans genes,
cis genes and cis miRNAs for each GISTIC region in eighteen different
cancer types can be accessed from Supplemental Table [183]S8.
In the REGULATOR Step (i.e., the third step) of miRDriver, we collected
all the potential predictors, namely, cis genes, cis miRNAs,
gene-centric copy number data, gene-centric DNA methylation beta values
and TFs in each cancer type that could influence each DE trans gene's
expression. We used the getTransGenePredictorFile function to gather
all the predictors. This function only considered those trans genes
that had at least one cis miRNA as a possible predictor.
In the LASSO Step, we computed the potential cis miRNAs that regulate
the DE trans genes' expression variation. We used the
lassoParallelForTransGene function in the miRDriver R package that
utilized R package glmnet^[184]88 to perform LASSO to compute miRNA
regulators of the DE trans genes. This function considered the
gene-centric copy number, gene-centric DNA methylation, TFs, miRNA
expression as independent variables and the trans gene's expression as
the response variable. For each trans gene, out of all its candidate
predictors (i.e., independent variables), LASSO selected a set of
non-zero coefficient predictors. Since the independent variables
selected by LASSO have been shown to be inconsistent, especially when
the sample size gets large^[185]89, we ran LASSO 100 times for each
trans gene and kept the cis miRNAs selected by LASSO at least 70 times.
We found that miRNAs with threshold 70 to be the most consistent set of
potential regulator miRNAs to be considered in the computed miRNA-gene
interaction networks in each cancer type cohort (Supplemental Fig.
[186]S24). To optimize the regularization parameter λ of LASSO, for
each of 100 runs, we applied 10-fold cross-validation and picked λ that
provided the simplest model with the minimum cross-validation error.
Although miRNAs typically cause the inverse regulation of their target
genes^[187]59–[188]61, miRDriver considers both positively and
negatively correlated miRNA-target pairs for each cancer type. Since
miRDriver computes miRNA-gene interactions that could be direct or
indirect interactions, a positive correlation between them is also
possible. Furthermore, a positive correlation between miRNAs and their
direct targets is also possible^[189]90–[190]93. The computed
miRNA-gene interactions in eighteen different cancer types can be
accessed from Supplemental Table [191]S9.
Running state-of-the-art-methods
We compared miRDriver with five state-of-the-art methods, namely,
ARACNe^[192]5, ProMISe^[193]6, HiddenICP^[194]7, idaFast^[195]8 and
jointIDA^[196]9 by running them on datasets from eighteen cancer types
in TCGA. Since these methods can only utilize gene expression data, we
used gene expression data to compute miRNA-gene interaction networks
for our comparison For ARACNe, ProMIse and hiddenICP, we used the same
number of input genes and miRNAs that we used in miRDriver for each
cancer type. Since idaFast and jointIDA methods have high computational
complexity and therefore are not scalable to large datasets, we run
these two methods with ≤ 50 top miRNAs and ≤ 1500 top genes selected by
Feature Selection Based on The Most Variant Median Absolute Deviation
(FSbyMAD)^[197]94 for each cancer type. After running ARACNe, we
selected all of the miRNA-gene interactions that had non-zero scores to
be compared with our method. For ProMIse, hiddenICP, idaFast and
jointIDA, we considered the top 3, 3, 3.5 and 3.5 percentile of
miRNA-gene interactions based on reported scores, respectively. Based
on our previous work with the breast cancer cohort, these thresholds
were chosen to get highly confident gene-miRNA interactions for
comparison and were used for all eighteen different cancer types. The
details of running these methods can be found in our previous
publication^[198]17.
Datasets to run miRDriver on pan-cancer
In this study, we utilized gene expression, CNA, DNA methylation,
TF-gene interaction and miRNA expression data from eighteen different
cancer types. We used the R Bioconductor package TCGAbiolinks^[199]77
to download the genomic data of cancer patient samples from TCGA. We
retrieved gene expression quantification data for raw count (Illumina
HiSeq) and RNA sequencing data with FPKM (Fragments Per Kilobase of the
transcript, per Million, mapped reads) for all the cancer types. TCGA
gene expression data consist of mRNAs (i.e., messenger RNAs), lncRNAs,
and pseudogenes. Thus, our analysis considered all these RNAs.
We downloaded miRNAs' gene quantification expression with file type
hg19.mirbase20.mirna and isoform gene quantification data with file
type hg19.mirbase20.isoform from the legacy data of TCGA. For each
cancer type, we used the miRNAs that have ≥ 0.01 RPM (reads per million
mapped reads) value across ≥ 30% of the cohort.
We retrieved masked copy number variation (Affymetrix SNP Array 6.0)
and computed the gene-centric copy number value compatible with hg38
using the R Bioconductor package CNTools^[200]95.
We downloaded DNA methylation data of Infinium HumanMethylation27
Bead-Chip (27K) and Infinium HumanMethylation450 Bead-Chip (450K)
platforms from TCGA. Gene-specific beta values were calculated
separately for both platforms. For the 450K platform, the average beta
value for promoter-specific probes was considered due to their role in
transcriptional silencing^[201]96. Given lower coverage in the 27K
platform, we utilized all the probes. In this case, we set the DNA
methylation of a gene as the average beta values of all its probes.
We downloaded experimentally-validated TF-gene interactions from TRED
and TRRUST databases to incorporate TF-gene interactions in the LASSO
step. Table [202]1 shows the sample sizes of different data modalities
used in this study for eighteen different cancer types from TCGA.
Datasets to evaluate miRDriver
To check the correlation between copy number and expression across all
the samples of the computed miRNAs of miRDriver, we used TCGA's masked
copy number variation (Affymetrix SNP Array 6.0) data. We utilized the
R Bioconductor package CNTools^[203]95 to compute the miRNA-centric
copy number value by giving miRNA coordinates extracted from the TCGA's
legacy data file type hg19.mirbase20.isoform.
To evaluate if the miRNAs computed by miRDriver were enriched in
cancer-related miRNAs, we downloaded a list of 351 known oncogenic
miRNAs from the oncomiRDB database^[204]97. Each miRNA listed in
oncomiRDB is involved in at least one cancer-related phenotype or
cellular process. We harmonized the names of oncomiRDB miRNAs regarding
the miRBase^[205]98 database.
To check if the miRNA-gene interactions computed by miRDriver were
significantly enriched in the known miRNA-gene interactions, we
performed a hypergeometric test for the computed target genes of each
miRNA. For this purpose, we compiled a list of experimentally-validated
miRNA-gene interactions from miRTarBasev6.1, TarBasev7.0 and miRWalk
databases^[206]99 as our ground truth data. Considering that miRDriver
could compute direct targets and the indirect downstream targets (i.e.,
targets of the direct targets), we included potential indirect targets
to the ground truth dataset. Hence, for each miRNA-gene interaction
where the gene was a known TF, we included the experimentally-validated
targets of this TF obtained from TRED and TRRUST databases.
To assess the prognostic relevance of the miRDriver-selected miRNAs as
clinical biomarkers, we performed multivariate survival
analysis^[207]82 and multivariate Cox regression^[208]84. We downloaded
the clinical data for eighteen different cancer types using
TCGAbiolinks^[209]77. We considered the available clinical variables
from age, race, gender, stage, and grade as independent variables
whenever available (see Table [210]14).
Table 14.
Availability of clinical variables in TCGA.
[211]graphic file with name 41598_2022_7628_Tab14_HTML.jpg
[212]Open in a new tab
Green—Available; Black—Unavailable.
We considered four different endpoints, namely, OS, PFI, DSS and DFI.
In OS, patients who were dead from any cause were considered as dead,
otherwise censored. In PFI, patients having new tumor event whether it
was a progression of the disease, local recurrence, distant metastasis,
new primary tumor event, or died with cancer without new tumor event,
including cases with a new tumor event whose type is N/A were
considered as "event occurred" and all other patients were censored.
DFI was similar to PFI with the inclusion of censored patients with new
primary tumors in other organs; patients who were dead with tumors
without new tumor event and patients with stage IV were excluded. In
DSS, disease-specific survival time in days, last contact days, or
death days, whichever was larger, was used to identify "event occurred"
versus censored patients^[213]100.
We checked the subtype-specific association of gene expression of
computed target genes in BRCA, LGG, KIRC, LUSC and PAAD. We used the R
package TCGAbiolinks^[214]77 to download the different subtype labels
for the different cancer types.
Discussion
We developed a computational pipeline called miRDriver, which
integrates multi-omics datasets such as CNA, DNA methylation, TFs,
gene, and miRNA expression to infer copy number-derived miRNA-gene
interactions in cancer. In the current study, we extended the use of
miRDriver with an R package and carried out a comprehensive and
rigorous analysis of the pan-cancer characterization of TCGA samples to
infer miRNA-gene interaction networks integrating multi-omics datasets.
We focused on DNA aberration regions of 7294 cancer samples associated
with eighteen different cancer types uncovering the tissue-specific
omics interplay in establishing the miRNA–gene associations. miRDriver
outperformed several existing methods in all different cancer types
used in the study. In each case, miRDriver was able to select many
miRNA-gene interactions enriched in known miRNA-target databases. We
observed that selected miRNAs by miRDriver were significantly enriched
in the known cancer-related miRNAs.
Several cancer-related biological pathways and GO terms were found to
be enriched in the computed genes. Among these, GPCR-related pathways,
which play crucial roles in tumor development, invasion, migration,
survival, and metastasis, were enriched in ten or more cancer types.
More than 40% of the total computed genes were cited in cancer-related
studies based on OncoScore frequency. Among these, at least 50% of
genes had more than ten cancer-related citations.
We highlighted 22 common miRNAs that were frequently selected in
multiple cancer types and explored their prognostic roles. Several of
these miRNAs had significant survival differences in high and
low-expression patient sample groups. Among these, miRNAs belonging to
the let-7 family were found to act as both tumor suppressors and
oncogene in several studies^[215]101. miR-100, miR-149, miR-210,
miR-31, miR-346, miR-34b, miR-486 and miR-675 were cited in
cancer-related studies with high OncoScore frequency. We found several
enriched GO terms with the computed targets of these 22 common miRNAs.
Among these, GO terms such as "Regulation of gene silencing by miRNA"
and "Regulation of post-transcriptional gene silencing" were implicated
in several cancer-related studies explaining the miRNAs' roles in
cancer initiation and progression ^[216]53,[217]102. The GO term
"Chromatin silencing" was involved in cancer ^[218]49,[219]103. The GO
term "DNA replication-dependent nucleosome assembly" has been studied
concerning cell fate and differentiation regulation and suggested to be
explored in cancer in a recent study^[220]104.
We also assessed these common miRNAs as non-invasive biomarkers, such
as the presence of these miRNAs as the circulating miRNAs that can be
detected in organic liquids effectively after getting discharged by the
tumor cells. For this purpose, we submitted these 22 miRNAs to the
MiRandola^[221]105 database as a knowledge base for extracellular
circulating miRNAs for inferring their relevance as non-invasive
biomarkers. We found ten out of 22 common miRNAs, namely let-7b,
miR-100, miR-1249, miR-149, miR-210, miR-31, miR-346, miR-34b, miR-486
and miR-675, to be as potent non-invasive biomarkers.
Although there were common miRNAs across multiple cancer types, there
were not many common miRNA-gene interactions. Only fourteen common
interactions were shared in at least two cancer types among ~ 10,000
computed interactions. Considering the much higher number of target
genes than the miRNAs used in this analysis, these findings were not
surprising. We discussed several of these interactions that were found
to be in experimental studies.
We identified several cancer driver genes targeted by multiple miRNAs
(i.e., high-degree genes) across different cancer types. Also,
high-degree target genes have been shown to have a strong association
with the molecular subtypes in multiple cancer types, such as BRCA,
LGG, LUSC and PAAD. Specifically, in BRCA, 106 high-degree genes (three
genes were common with PAM50 genes) were found to serve as
subtype-specific gene signatures with high classification accuracy with
respect to the baseline PAM50 gene-based subtypes. We compared the
prognostic significance of low-degree target genes with high-degree
target genes in the disease progression and survival hazards. We
discovered high-degree genes to be more significant prognostic factors
than low-degree genes. These findings point out that multiple miRNAs in
coordination can impact the gene expression stronger than a single
miRNA.
The presented pan-cancer-wide analysis discovering copy
number-aberration-influenced miRNA-target associations may be used in
future experimental work to validate the roles of the miRNAs in
context-specific gene regulation to derive even greater confidence in
their tissue-specific associations. We integrated several potential
co-regulators such as CNA, DNA methylation, miRNA expression and TFs,
that can influence trans gene's expression in the LASSO step. Other
potential regulators such as histone modification and chromatin
accessibility (such as ATAC-seq) could also be integrated. miRDriver
outperformed the existing sequence-based ceRNA inference tool, Cupid.
This analysis reveals that this work can be further examined by taking
into account the presence of recognized target sites that contribute to
gene regulation, as well as utilizing ceRNA interactions to improve the
inferred miRNA-gene networks. miRDriver does compute both direct and
indirect targets of miRNAs, which helps decipher the downstream
biological processes and pathways regulated by these miRNAs. To
identify the direct targets of these selected miRNAs, one could utilize
sequence-based filtering.
Finally, in this study, we established miRDriver as an R software
package and provided users with a variety of options for running our
workflow with their preferred settings. Users can, for example, utilize
the tool exclusively with up or down-regulated genes from amplified or
deleted regions, or both. However, in these cases, the context in which
miRNA-gene interactions are discovered will limit their detection. To
receive the most comprehensive list of miRNA-gene interactions, we
propose that users evaluate all of the directions. In the software, we
have also included the flexibility to utilize user-defined TF-targets
with evidence-based confidence levels filtering options for
cancer-related TF-target interactions from the DoRothEA gene set
resource^[222]106. In this study, however, we used only the highly
confident TF-target interactions from TRED and TRRUST in the LASSO step
as using many predictors in LASSO could affect its performance, and
cause false positive and false negative interactions. Furthermore,
considering gene expression is controlled at multiple levels, including
transcriptional regulation and post-transcriptional regulation, our
software provides the flexibility to run the LASSO step in two phases.
In the first run, only the transcriptional predictors could be utilized
to explain the expression variation. In the second run,
post-transcriptional predictors and the residual of the first LASSO run
can be utilized as the independent and dependent variables,
respectively. Alternatively, if the user has the transcriptional and
post-transcriptional expression change data, both LASSO runs can be
performed in any order. The details of all these options can be
accessed in the vignette of the miRDriver R package.
Supplementary Information
[223]Supplementary Information 1.^ (73.5KB, xlsx)
[224]Supplementary Information 2.^ (20.6KB, xlsx)
[225]Supplementary Information 3.^ (107.9KB, xlsx)
[226]Supplementary Information 4.^ (15.7KB, xlsx)
[227]Supplementary Information 5.^ (38.6KB, xlsx)
[228]Supplementary Information 6.^ (16.2KB, xlsx)
[229]Supplementary Information 7.^ (22KB, xlsx)
[230]Supplementary Information 8.^ (78.7KB, xlsx)
[231]Supplementary Information 9.^ (276.4KB, xlsx)
[232]Supplementary Information 10.^ (159.5KB, pdf)
[233]Supplementary Information 11.^ (536.6KB, pdf)
[234]Supplementary Information 12.^ (667.7KB, pdf)
[235]Supplementary Information 13.^ (330.1KB, pdf)
[236]Supplementary Information 14.^ (193.5KB, pdf)
[237]Supplementary Information 15.^ (340.8KB, pdf)
[238]Supplementary Information 16.^ (382.7KB, pdf)
[239]Supplementary Information 17.^ (234.6KB, pdf)
[240]Supplementary Information 18.^ (168.8KB, pdf)
[241]Supplementary Information 19.^ (277.5KB, pdf)
[242]Supplementary Information 20.^ (372.1KB, pdf)
[243]Supplementary Information 21.^ (595.4KB, pdf)
[244]Supplementary Information 22.^ (449.6KB, pdf)
[245]Supplementary Information 23.^ (859.4KB, pdf)
[246]Supplementary Information 24.^ (275.8KB, pdf)
[247]Supplementary Information 25.^ (227.5KB, pdf)
[248]Supplementary Information 26.^ (158.5KB, pdf)
[249]Supplementary Information 27.^ (513.6KB, pdf)
[250]Supplementary Information 28.^ (119.2KB, pdf)
[251]Supplementary Information 29.^ (146.8KB, pdf)
[252]Supplementary Information 30.^ (152KB, pdf)
[253]Supplementary Information 31.^ (151.9KB, pdf)
[254]Supplementary Information 32.^ (150.4KB, pdf)
[255]Supplementary Information 33.^ (173.6KB, pdf)
[256]Supplementary Information 34.^ (219.1KB, zip)
Acknowledgements