Abstract

   MicroRNA is responsible for the fine-tuning of fundamental cellular
   activities and human disease development. The altered availability of
   microRNAs, target mRNAs, and other types of endogenous RNAs competing
   for microRNA interactions reflects the dynamic and conditional property
   of microRNA-mediated gene regulation that remains under-investigated.
   Here we propose a new integrative method to study this dynamic process
   by considering both competing and cooperative mechanisms and
   identifying functional modules where different microRNAs co-regulate
   the same functional process. Specifically, a new pipeline was built
   based on a meta-Lasso regression model and the proof-of-concept study
   was performed using a large-scale genomic dataset from ~4,200 patients
   with 9 cancer types. In the analysis, 10,726 microRNA-mRNA interactions
   were identified to be associated with a specific stage and/or type of
   cancer, which demonstrated the dynamic and conditional miRNA regulation
   during cancer progression. On the other hands, we detected 4,134
   regulatory modules that exhibit high fidelity of microRNA function
   through selective microRNA-mRNA binding and modulation. For example,
   miR-18a-3p, −320a, −193b-3p, and −92b-3p co-regulate the
   glycolysis/gluconeogenesis and focal adhesion in cancers of kidney,
   liver, lung, and uterus. Furthermore, several new insights into dynamic
   microRNA regulation in cancers have been discovered in this study.

Introduction

   Mature microRNAs (miRNAs) are typically 20–25 nucleotides long. They
   hybridize with complementary sequences in the 3′-untranslated regions
   (UTRs) in messenger RNAs (mRNAs) and silence protein-coding genes
   through destabilizing mRNA or preventing translation of mRNA^[32]1. In
   humans, it was estimated that 2,588 miRNAs regulate over 60% of human
   genes and participate in every aspect of cellular activities in cell
   growth and cell death^[33]2. During the past decade, numerous studies
   have disclosed the regulatory roles of miRNAs in both the fundamental
   cellular processes^[34]3–[35]9 and the development of complex human
   diseases such as cancer^[36]10–[37]14.

   Functional study of miRNA largely depends on the reliable
   identification of miRNA-mRNA regulatory interactions. Most
   state-of-the-art computational prediction methods, such as
   miRanda^[38]15, RNA22^[39]16, DIANA-microT^[40]17 and
   Targetscan^[41]18, are focused on primary search for nucleotide
   sequences complementary to the miRNA seed region (2^nd–8^th bases on
   the 5′ end)^[42]18 and suffer from high false prediction because of the
   dramatic complexity in miRNA binding^[43]19,[44]20. Recent
   sequencing-based interactome data provide different views in this
   issue. For examples, through the crosslinking, ligation, and sequencing
   of hybrids (CLASH) analysis^[45]21, 18,514 miRNA-mRNA interactions were
   detected and only ~22% were associated with seed region, via either
   contiguous or gapped complementary RNA base pairing. The same study
   also unveiled that ~60% of the binding sites are within coding region
   of mRNA, as opposed to the 3′UTR centric search by the existing
   algorithms. Similar findings were observed using the covalent ligation
   of endogenous Argonaute–bound RNAs (CLEAR)-CLIP in human hepatoma
   (Huh7.5) cells, where ~26% of the interactions are seed-associated and
   ~57% are non-3′UTR interactions^[46]22. In addition, compelling
   evidence also shows the stochastic nature of miRNA-mRNA interactions
   that 1) multiple miRNAs can bind to the same mRNA sequence or different
   copies of the same transcript - combinatorial
   interactions^[47]23–[48]26 and 2) multiple different mRNAs, possibly
   along with other long non-coding RNAs and circular RNAs^[49]27–[50]29,
   can compete for binding to the same miRNA - competitive
   interactions^[51]23,[52]30–[53]32, which is very similar to
   transcription factor (TF) regulation^[54]33,[55]34. Other factors,
   including genetic mutations^[56]35–[57]38, the competition with other
   RNA binding proteins^[58]39,[59]40 and the conditional expression of
   miRNA and mRNA^[60]41 can also affect the status of miRNA-mRNA
   interactions. Each of these mechanisms stresses the dynamic miRNA
   regulation from a different perspective and many of them are
   conditional (e.g., associated with certain phenotypes or development
   stages). It is notable the conditional information can be captured by
   genomic data collected under associated conditions. However, to the
   best of our knowledge, currently there is no systematic assessment
   available on the dynamic miRNA regulation that assesses the changes
   across conditions. The first such attempt, reported in^[61]24, only
   briefly touched the combinatorial module that a set of miRNA can
   regulate a common set of targets in a specific condition, which hinders
   the practical use as a dynamic system.

   We notice that the actual challenges in tackling dynamic miRNA
   regulation involve several major trends. First, while the reliable
   stratification of gene regulation networks has been lagging, the
   associations between miRNAs and TFs were reported in different
   scenarios^[62]26,[63]42–[64]44. For example, one TF gene, CTCF,
   controls the expression of HOXC5, which is the host gene of
   miR-615-3p^[65]45,[66]46 while miR-615-3p interacts with CTCF
   transcript in CALSH experiment^[67]21. This type of bi-directed
   regulation between TF and miRNA may introduce the feedback loops to the
   whole gene regulation network. Likewise, in order to determine the
   miRNA-mediated regulation on gene expression, one should also consider
   other major gene regulatory factors, such as copy number variation
   (CNV) and DNA methylation^[68]47. Secondly, evidence also shows that
   multiple miRNAs can regulate a set of functionally relevant genes
   instead of the same gene in a cooperative manner, e.g., miR-17-5p,
   -18a-5p, -19a-3p, and -92a-3p co-regulate 44 functionally related
   genes, such as CCND2, TNRC6B, and PHF12^[69]48. Although our knowledge
   of cooperative miRNA regulation is extremely limited, it is envisioned
   that the mechanisms underlying TF non-complex
   co-regulation^[70]49,[71]50 can be adopted in miRNA study.

   In this study, we propose a new method to identify dynamic miRNA-gene
   interactions, particularly focusing on the conditional interactions
   that are associated with tumor progression, based on the integration of
   heterogeneous genomic information on multiple human cancers. Other gene
   expression regulators such as TF, CNV, and DNA methylation have also
   been considered when determining miRNA-mediated gene regulatory
   networks in each developmental stage of cancer. Moreover, we
   demonstrated the discovery of modularized miRNA regulation based on the
   identified miRNA-gene interactions.

Results

   The workflow of this study is demonstrated in Fig. [72]1 with the
   detailed methodologies explained in Methods. We applied the proposed
   pipeline on cancers from seven tissues including lung, kidney, stomach,
   breast, liver, uterine, and pancreas, and have identified miRNA-mRNA
   interactions associated with each specific cancer. Note that the
   interactions were identified in the regulator detection step
   (Fig. [73]1B), which was performed for each cancer and each stage.
   While the complete results were archived in our online database
   ([74]http://sbbi.unl.edu/miRDR), main discoveries were presented in the
   following sections. Particularly, we focused on the kidney and lung
   cancers and compare our discoveries between two well-defined subtypes,
   i.e., Kidney Renal Clear Cell Carcinoma (KIRC) versus Kidney Renal
   Papillary Cell Carcinoma (KIRP), and Lung Adenocarcinoma (LUAD) versus
   Lung Squamous Cell Carcinoma (LUSC).

Figure 1.

   [75]Figure 1
   [76]Open in a new tab

   Discovery of dynamic and modularized miRNA regulation during cancer
   progression. (A) The identification pipeline of conditional miRNA
   regulatory interactions; (B) Meta-Lasso Regression approach to the
   detection of microRNA regulators of each gene in each cancer stage.

Transcript availability and binding affinity are not the only factors in
miRNA regulation

   After reanalyzing CLASH data (details in Methods), 17,400 miRNA-mRNA
   interactions resulting from 17,651 high-confidence binidng sites have
   passed the quality-filtering step and remained for the downstream
   analysis. Figure [77]2A shows the abundance distribution of these
   interactions and Fig. [78]2B shows the distribution of corresponding
   corrected p-values from the Binomial test. 95.3% of the interactions
   have their counts significantly higher compared to random interaction
   (p-value ≤ 0.05 after Bonferroni multiple test adjustment). Detailed
   statistics were provided in Supplementary Table [79]S2.

Figure 2.

   [80]Figure 2
   [81]Open in a new tab

   Overview of the interactions detected by the CLASH analysis. (A)
   Distribution of the miRNA-gene interactions (y-axis) with respect to
   their abundances (x-axis). (B) Distribution of the corrected p-values
   of each interaction from Binomial test for quality filtering. (C)
   Projection of each interaction based on the abundance of the
   corresponding miRNA (x-axis), mRNA (y-axis), and interaction (z-axis).
   (D) The relationship between the abundances of each miR-92a-3p
   interaction and the corresponding mRNA. The colored points indicated
   the three levels of binding affinities based on MFE values: High (red,
   [−31.2, −21.8]], Medium (green, [−21.8, 12.3]), and Low (blue, [−12.3,
   −2.9]).

   We also observed weak correlations of miRNA regulatory interaction
   abundance with both mRNA and miRNA abundance (Fig. [82]2C). For
   example, 57.9% of miRNAs has Pearson correlation coefficient r ≤ 0.03
   between the abundance of their target mRNAs and the number of
   interactions, contradicting the generally-believed assumption that
   miRNA binds to its different mRNA targets proportionally with respect
   to their sequence availability. Figure [83]2D shows that the most
   “versatile” miRNA, i.e., miR-92a-3p, regulates 956 different mRNAs
   through 1,065 different interaction sites. The interaction counts range
   from 63 to 23,870 (Mean = 296.5), significantly correlated with the
   corresponding mRNA abundance (r = 0.26). Similar weak correlations were
   also observed between binding affinities, e.g., minimum free energy
   (MFE)^[84]51, and interaction abundances (r = 0.01).

   In summary, these findings suggest that in addition to mRNA sequence
   availability and binding affinity, there must be some other factors
   impacting the miRNA-mRNA binding, which makes miRNA regulation a highly
   selective process. For example, one possible factor can be mRNA
   structure that determines the accessibility for miRNA binding.

MiRNA-mRNA interactions can be specific to cancer type and stage

   The initial analysis of genomic data has identified differentially
   expressed (DE) genes and miRNAs in different stages of cancer (summary
   statistics are in Supplementary Table [85]S3 while more details are
   available online). Many changes were found to be stage-specific. For
   example, there are 4,273 DE-genes with fold-change (FC) over 2 (fold)
   in at least one stage of KIRC, but only 21.2% (908) of them have more
   than 2-fold changes in all four stages. Similar observations were made
   on miRNAs, i.e., 174 DE-miRNAs were identified in KIRC and only 43.7%
   (76) were consistent across all stages. This analysis provides a list
   of 15,901 genes and 419 miRNAs that are cancer-associated for
   downstream analysis. However, this simple correlation analysis each
   time only checks one pair of miRNA and mRNA. It fails to acknowledge
   the fact that in reality there can be multiple miRNA regulators for one
   individual gene. Considering the fact of simultaneous multiple
   interactions, we turn to our proposed pipeline to solve this issue.

   Using our meta-Lasso regression approach (Methods), we predicted
   interactions between miRNA and miRNA-targetable genes under each cancer
   developmental stage. Overall, there were 14,554 unique miRNA-mRNA
   interactions identified across all conditions. Similar to DE genes and
   miRNAs, those interactions tend to be cancer stage- or type- specific.
   Figure [86]3A shows the statistics of interactions identified in each
   cancer and each stage where the blue bars indicate the portion specific
   to the corresponding stage within that cancer type. First, we observed
   in most cancers (8 out 9), miRNA interactions are more abundant in
   early stages (Stage I + II) compared to the advanced stages, which
   highlights the miRNA’s implication in the early cancer progression. In
   total, 73.7% of the interactions are detected in only one stage of that
   cancer. In different stages of a cancer, the same miRNA may regulate
   different genes while the same gene can be regulated by different
   miRNAs. For example, a kidney developing gene, Homeobox A11 (HOXA11),
   was regulated by three miRNAs in KIRC and suppressed expression was
   observed during progression (3-fold, miR-769-3p (Stage I); 2.9-fold,
   miR-484 (Stage II); 2.1-fold, miR-30d-5p (Stage III); 2.2-fold, none
   (Stage IV)) (Fig. [87]4A). Similar observation was made on Kinesin
   Family Member 1 C (KIF1C) gene: five miRNAs interacted with KIF1C in
   three stages of KIRC (1.8-fold, miR-484 and miR-769-3p (Stage I);
   1.9-fold, miR-1260b, let-7b-5p, miR-149-5p, miR-769-3p and miR-484
   (Stage II); 2.0-fold, miR-769-3p (Stage III); 2.2-fold, none (Stage
   IV)) (Fig. [88]4B). This may indicate that different miRNAs can play a
   similar role in cancer development by targeting the same target.

Figure 3.

   [89]Figure 3
   [90]Open in a new tab

   Overview of the miRNA-mRNA interactions identified in nine cancers. (A)
   Number of interactions detected in a specific stage (Blue) versus
   multiple stages (Orange), in each cancer. (B) Pie chart shows the
   percentages of interactions shared by different cancers.

Figure 4.

   [91]Figure 4
   [92]Open in a new tab

   Illustration of the dynamic miRNA-mediated gene regulation (miRNA: pink
   diamonds and genes: purple nodes). (A) Three miRNAs regulate HOXA11
   gene in different stages of KIRC. (B) Five miRNAs repress the same
   KIF1C gene in KIRC while the miR-484 and miR-769-3p (pink diamonds with
   red border) consistently interact with KIF1C across stages.

   In addition to the low consistency across stages of the same cancer, we
   also observed limited common miRNA regulatory interactions across
   cancers. Up to 92.7% (12,742) of the interactions were shared by less
   than five types of cancer and 22.8% (3,142) were identified
   specifically in one cancer type (Fig. [93]3B). Interestingly, even
   within the same tissue type, the number of common interactions across
   cancer subtypes was much lower than expected, e.g., 17.8% (1,264 out of
   7,118) between KIRC and KIRP and 16.2% (1,051 out of 6,496) between
   LUAD and LUSC. This may be explained by the distinct expression
   profiles of genes and miRNAs in different subtypes of cancer. For
   example, KIRC and KIRP only share 34.5% of 6,332 DE genes and 23.5% of
   136 DE miRNAs. The distinct miRNA regulatory patterns among stages and
   cancers demonstrate again the conditional feature of miRNA regulation.
   Moreover, it signifies that some miRNAs can only contribute to certain
   types or phases of tumor developments.

Functional implication of miRNA regulation in kidney cancers

   To assess the functional impacts of miRNA regulation in cancer
   progression, we further examined the interactions identified in two
   subtypes of kidney cancer (KIRC and KIRP), particularly focusing on
   those associated with cancer driver genes reported by Vogelstein et
   al.^[94]52. Per our identification, 19 cancer genes were regulated by
   miRNAs in both early-stage KIRC and KIRP (purple nodes in the upper
   panel, Figure [95]S1), where only 16 out of the 118 related
   interactions are commonly shared by both cancers (red edges) and 12
   genes are regulated by totally different miRNAs. For example,
   Neurogenic Locus Notch Homolog Protein 2 gene (NOTCH2) was regulated by
   miR-744-5p, -25-3p, -92a-3p, and -27b-3p in KIRC and by miR-744-5p,
   -106a-5p, -130b-5p, and let-7c-5p in KIRP. Another critical cancer
   suppressor gene TP53 was regulated by miR-454-3p in KIRC, and by
   miR-324-5p and -222-3p in KIRP. On the other hand, we observed that
   common miRNAs regulated different targets in two cancer subtypes. Among
   27 common miRNAs, 65.0% of the interactions were linked to different
   cancer genes in both early-stage tumors.

   Given the distinct mechanisms underlying the development of these two
   subtypes of cancer^[96]53,[97]54, we are also interested in exploring
   how miRNA regulation contributes to the general or divergent growth of
   kidney cancers. Pathway and gene ontology (GO) enrichment analysis was
   performed using The Database for Annotation, Visualization and
   Integrated Discovery (DAVID v6.8)^[98]55 based on the suppressed genes
   in the early-stage cancers. Table [99]1 lists the top enriched KEGG
   pathways that are either common or specific in KIRC and KIRP (selected
   according to p-values and the complete list of GO and pathways are
   provided in Supplementary Table [100]S4). Among the three common
   pathways (PI3K-ATK signaling pathway, Cell cycle, and Thyroid hormone
   signaling pathway), 22 genes were miRNA-regulated in the early-stage
   kidney cancer and 9 of them occurred in both subtypes. Only 7
   miRNA-gene interactions were observed in both subtypes, which indicates
   that the same functional regulation can be realized by miRNAs
   interacting with different gene targets.

Table 1.

   Illustration of the common and cancer-specific pathways and the
   miRNA-mRNA interactions involved in KIRC and KIRP.
   Enriched Pathways Interactions in KIRC Interactions in KIRP
   Common PI3K-ATK Signaling Pathway MYC-miR-17-5p; TSC1-miR-320a;
   PPP2R1A-miR-106a-5p/ -941; TP53-miR-454-3p; JAK1-miR-15b-5p;
   JAK2-miR-195-5p; PTEN-miR-181b-5p MYC-miR-17-5p/ -320a/ -423-5p/
   -744-5p/ let-7b-5p; TSC1-miR-320a; PPP2R1A-miR-17-5p/ -222-3p/ -92a-3p;
   TP53-miR-222-3p/ -324-5p; CCND1-miR-15a-5p/ -193b-3p/ -196a-5p/
   -92a-3p; FGFR2-miR-186-5p; MAP2K1-miR-93-3p; NRAS-miR−877-3p;
   STK11-miR-17-5p
   Cell Cycle CREBBP-let-7e-5p/miR-1301-5p; MYC-miR-17-5p;
   TP53-miR-454-3p; EP300-miR-30c-5p; ATM-miR-3144-3p/ -92a-3p;
   CDKN2A-miR-455-3p; STAG2-miR-193b-3p; ABL1-let-7b-5p CREBBP-let-7e-5p/
   miR-1301-5p/ -935; EP300-let-7b-5p/ -193b-3p/ -92a-3p; MYC-miR-17-5p/
   -320a/ -423-5p/ -744-5p/let-7b-5p; TP53-miR-222-3p/ -324-5p;
   CDKN2A-miR-320a; CCND1-miR-15a-5p/ -193b-3p/ -196a-5p/ -92a-3p
   Thyroid Hormone Signaling Pathway CREBBP-let-7e-5p/miR-1301-5p;
   MYC-miR-17-5p; NCOA3-miR-20a-5p; NOTCH2-miR-744-5p/ -27b-3p/ -25-3p/
   -92a-3p; TP53-miR-454-3p; EP300-miR-30c-5p; CTNNB1-miR-744-5p
   CREBBP-let-7e-5p/miR-1301-5p/ -935; MYC-miR-17-5p/ -320a/ -423-5p/
   -744-5p/ let-7b-5p; NCOA3-miR-20a-5p; NOTCH2-miR-744-5p/ -106a-5p/
   -130b-5p/ let-7c-5p; TP53-miR-222-3p/ -324-5p; EP300-let-7b-5p/
   -193b-3p/ -92a-3p; CCND1-miR-15a-5p/ -193b-3p/ -196a-5p/ -92a-3p;
   NRAS-miR-877-3p; MAP2K1-miR-93-3p; MED12-miR-454-3p
   KIRC-Specific Herpes Simplex Infection CREBBP-let-7e-5p/miR-1301-5p;
   TP53-miR-454-3p; EP300-miR-30c-5p; JAK1-miR-15b-5p; JAK2-miR-195-5p;
   PTEN-miR−181b-5p; PTPN11-miR-130b-3p/ −183-5p/ −193b-3p
   Transcriptional Misregulation MYC-miR-17-5p; TP53-miR-454-3p;
   ATM-miR-3144-3p/ −92a-3p; CEBPA-miR-744-5p; H3F3A-miR-100-5p;
   MEN1-miR-17-5p
   Jak-STAT Signaling Pathway CREBBP-let-7e-5p/ miR-1301-5p;
   MYC-miR-17-5p; EP300-miR-30c-5p; JAK1-miR-15b-5p; JAK2-miR-195-5p;
   PTEN-miR-181b-5p; PTPN11-miR-130b-5p/ -183-5p/ -193b-3p
   KIRP-Specific MAPK Signaling Pathway MYC-miR-17-5p/ -320a/ -423-5p/
   -744-5p / let-7b-5p; TP53-miR-222-3p/ -324-5p; BRAF-miR-191-5p /
   -744-5p; FGFR2-miR-186-5p; MAP3K1-let-7g-5p/ miR-423-3p;
   MAP2K1-miR-93-3p; NRAS-miR-877-3p
   Rap1 Signaling Pathway GNAS-miR-331-3p; BRAF-miR-191-5p/ -744-5p;
   FGFR2-miR-186-5p; MAP2K1-miR-93-3p; NRAS-miR-877-3p
   Central Carbon Metabolism MYC-miR-17-5p/ -320a/ -423-5p/ -744-5p/
   let-7b-5p; TP53-miR-222-3p/ -324-5p; FGFR2-miR-186-5p;
   MAP2K1-miR-93-3p; NRAS-miR-877-3p
   [101]Open in a new tab

   *Underline indicates the common interactions between KIRC and KIRP and
   Bold denotes the same genes or miRNAs involved in KIRC and KIRP.

   Figure [102]5 demonstrates how miRNAs regulated PI3K-Akt signaling
   pathway (in 5 A) and cell cycle pathway (in 5B) in early-stage KIRC and
   KIRP through various conditional interactions. For example, the common
   genes (purple nodes), e.g., TSC1, MYC, and CDKN2A, were targeted by
   different miRNAs in two cancer subtypes while the common miRNA
   regulators (pink diamonds with red border), e.g., miR-320a, -17-5p, and
   92a-3p, can interact with different targets, which leads to a different
   set of interactions that control the same functional pathway. Moreover,
   within the cancer-specific pathways, the distinct miRNA regulation
   patterns (Table [103]1) also provide new insights into cancer subtyping
   in the miRNA regulation perspective. In addition, similar observations
   were found by comparing LUAD and LUSC (Supplementary Table [104]S5).

Figure 5.

   [105]Figure 5
   [106]Open in a new tab

   MiRNA-mediated gene regulation involved in two functional pathways in
   the early stage of KIRC and KIRP (miRNAs: pink diamonds; genes:
   ellipses). (A) PI3K-Akt Signaling Pathway; (B) Cell Cycle Pathway.
   Purple nodes denote the common genes between KIRC and KIRP; pink
   diamonds with red border indicate the common miRNA regulators; green
   ellipses are the condition-specific cancer genes.

Prediction of miRNA regulatory modules

   Based on consistent observations in all nine cancers, we hypothesized
   that miRNAs can exert a regulatory role in certain biological functions
   through interacting gene targets that may be different across
   conditions. Moreover, multiple miRNAs can effectively cooperate in the
   same functional pathway by repressing the same or associated targets.
   Therefore we conducted a module detection based on the miRNA-gene
   interactions derived from the previous steps (Methods). As a result,
   4,134 miRNA modules were identified and each included a set of 4–5
   miRNAs consistently co-regulating the same set of pathways across more
   than two cancer conditions. Table [107]2 lists the summary statistics
   on the derived modules and details are provided in Supplementary
   Table [108]S6.

Table 2.

   Statistics of the miRNA regulatory modules.
   Module sizes (k) # of k-miRNA Modules # of modules reflecting
   significant pathways # of consistent modules across >= 2 conditions
   Total Cancer-Specific General
   In 2 stages of a cancer In > 2 stages of a cancer In 2 cancers In > 2
   cancers
   1 285 4 — — — — —
   2 6,276 641 — — — — —
   3 149,449 15,383 — — — — —
   4 229,801 23,044 2,349 2,127 — 219 3
   5 1,711,669 83,711 1,785 1,728 — 57 —
   [109]Open in a new tab

   We observed that most miRNA modules play their regulatory roles in
   different stages of cancer, and that only a small proportion is common
   across cancers. As expected, we observed in each single module that
   miRNAs coregulate one or multiple module, miRNAs co-regulate one or
   more biological processes through co-binding to the same target or
   different targets in the same pathway across conditions. Table [110]3
   provides an example of a 4-miRNA module that regulates two pathways
   (glycolysis and focal adhesion) in three cancers (KIRC, LIHC, and
   UCEC). Under different conditions, miRNAs tend to target different
   genes but miR-320a always interacts with the same gene (PKM2) when
   regulating the glycolysis pathway. Figure [111]6 illustrates how the
   same miRNA module regulates critical signaling transduction genes in
   the focal adhesion pathway. For example, MET and LAMB1/COL4A1/LAMA5
   were regulated in different stages. Since those genes are the initial
   activators and triggers in focal adhesion, the regulatory model is
   capable of controlling the whole process more efficiently by each miRNA
   member targeting different genes under different stages.

Table 3.

   An illustration on a 4-miRNA module (miR-18a-3p, -320a, -193b-3p and
   -92b-3p) that regulates two signaling pathways in three conditions.
   Gene targets are listed for each pathway and each condition.
   Pathways MicroRNAs Conditions (Cancer/Stage)
   KIRC LIHC UCEC
   Stage 1 Stage 2 Stage 1
   Glycolysis / Gluconeogenesis Pathway miR-18a-3p PGK1, PKM2 PKM2 PKM2
   miR-320a PKM2 PKM2 PKM2
   miR-193b-3p GPI, PGAM1 GPI, ALDH3A2, ALDH7A1 GPI, ALDH3A2, TPI1
   miR-92b-3p PGAM1 ALDOA PGAM1
   Pathways MicroRNAs KIRC LUSC UCEC
   Stage 1 Stage 1 Stage 1
   Focal Adhesion miR-18a-3p PAK4, PIK3R3 COL4A1 COL4A1
   miR-320a ACTB, CCND2 PAK1, CCND2 LAMA5, MET
   miR-193b-3p LAMB1 FLNA, RAPGEF1, VAV2 CCND1, PTEN, PAK2, PIP5K1C,
   COL4A1
   miR-92b-3p ACTN4, PARVG, PPP1CB FLNB, MAPK1 ACTN4, FLNA, MAPK1, PARVG,
   TLN1
   [112]Open in a new tab

Figure 6.

   [113]Figure 6
   [114]Open in a new tab

   Illustration of the cooperative regulation of a miRNA module that
   contains four miRNAs: miR-18a-3p, -320a, -193b-3p, and -92b-3p (pink
   diamonds). This miRNA module consistently regulates the Focal Adhesion
   pathway in KIRC, LUSC, and UCEC. The target genes are color-coded (each
   color represents one cancer condition and the bi-colored nodes indicate
   the corresponding genes have been regulated by miRNAs in two
   conditions). The detailed miRNA-gene interactions are listed in
   Table [115]3. The network is visualized using Cytoscape 3.2.0 ^[116]72
   with KEGGscape^[117]73 plugin.

Validation of identified regulatory interaction in cancers

   Through comparison, 2,014 miRNA-gene interactions of our prediction
   have been annotated as experimentally validated entries in
   miRTarBase^[118]56, which are reported by various assays, such as
   Luciferase reporter assay, qRT-PCR or Western blot analysis with miRNA
   transfections (Supplementary Table [119]S7).

   To further validate the interations in cancer, we also performed an
   extensive literature search on all miRNA/cancer-related publications
   during 2001–2017 in PubMed database by utilizing the NCBI API tool.
   First, we searched for articles that mentioned the predicted miRNA-
   mRNA pairs in the corresponding cancer type, e.g., let-7a-5p-HMGA1 in
   breast cancer. Then we examined all 27,586 returned articles to confirm
   if the experimental evidence for a direct interaction is strong or not
   based on the experimental assays that have been used. Note that the
   negative expression correlation is not considered as strong evidence of
   direct interaction in our analysis. As a result, 43 miRNA-mRNA
   interactions associated with the nine types of cancer have been
   validated (Table [120]4). It is not surprising that those interactions
   are related to a few well-known cancer-associated miRNAs such as
   let-7a, miR-181a, and 200c, which reflects the reality that most of
   current experimental efforts are still on the assessment of
   well-reported targets in various cancer conditions. We believe
   computational prediction like ours can bring more new targets in this
   direction.

Table 4.

   Detailed information of validated interactions from literatures.
     microRNAs     Genes         Cancers        Techniques
   hsa-let-7a-5p   DICER1  BRCA^[121]74,[122]75 1, 4
                   HMGA1                        2, 4
   hsa-miR-10a-5p  EPHA4   LIHC^[123]76         2, 4
   hsa-miR-17-5p   HBP1    BRCA^[124]77,[125]78 1, 2, 4
                   MYC
                   UBE2C   STAD^[126]79
   hsa-miR-181a-5p DDX3X   UCEC^[127]80         2, 4
   hsa-miR-193b-3p CCND1   PAAD^[128]81         1, 2, 4
                   YWHAZ
                   CCND1   STAD^[129]82         3, 4
   hsa-miR-196a-5p FOXO1   LUAD^[130]83         1, 4
                   HOXA7   LUAD^[131]84         3, 4
                   HOXB8   LUAD/LUSC^[132]84
                   HOXC8
   hsa-miR-196b-5p HOXC8   BRCA^[133]85         1, 3, 4
                           LUSC^[134]85
   hsa-miR-200c-3p ZEB1    BRCA^[135]86         1, 2, 3, 4
                           KIRC/KIRP^[136]87
                           LIHC^[137]88
                           LUAD/LUSC^[138]89
                           PAAD^[139]90
                           STAD^[140]91
                           UCEC^[141]92
   hsa-miR-210-5p  ISCU    LIHC^[142]93         1, 3, 4
                           LUAD/LUSC^[143]94
   hsa-miR-221-3p  ATXN1   BRCA^[144]95–[145]97 1, 3, 4
                   BCL2L11
   hsa-miR-222-3p  PPP2R2A LUSC^[146]98,[147]99 3, 4
   hsa-miR-23b-3p  PSAP    LUSC^[148]100        1, 2, 3, 4
   hsa-miR-26a-5p  HSPA8   BRCA^[149]101        2, 4
   hsa-miR-30c-5p  MTDH    LUAD/LUSC^[150]102
                   RAB18   LUAD^[151]103
   hsa-miR-30d-5p  HOXA11  UCEC^[152]104        2, 4
   hsa-miR-320a    MYC     LIHC^[153]105        1, 2, 4
                   VDAC1   UCEC^[154]106
   hsa-miR-424-5p  WEE1    KIRC^[155]107        1, 2, 4
   hsa-miR-744-5p  TGFB1   KIRC/ KIRP^[156]108  2, 4
   [157]Open in a new tab

   Validation Techniques: (1) Luciferase reporter assay, (2) quantitative
   RT-PCR, (3) Western blot analysis, (4) miRNA transfection.

Model Quality Assessment

   Robustness of the proposed method was assessed based on the consistency
   of regulator detection across datasets with various sample sizes. Take
   the 251 samples from stage I of KIRC as an example. For each size i
   (5<=i<= 251), we randomly generated or each integer i from 5 to 251
   (maximum sample size), we randomly generated 1000 testing data sets
   with the sample size. We conducted regulator selection for each gene
   based on the final model in the meta-Lasso regression analysis on each
   dataset. The consistency across models with different sample sizes was
   evaluated by comparing the similarity of the selected regulators.
   Figure [158]7A shows the comparisons on these selected regulators based
   on three genes (CARD11, JAK2, and HMGA1), and a high-level stability of
   our model with the increasing sample size was observed. Moreover, for
   miRNA regulator identification (Fig. [159]7B), we found that our method
   has reached the high-level consistency for CARD11 and HMGA1 (JAK2 were
   not regulated by miRNAs in this stage).

Figure 7.

   [160]Figure 7
   [161]Open in a new tab

   Consistency assessments of regulator selection regarding difference
   sample sizes of three genes: CARD11 (red, 28 potential regulators),
   JAK2 (blue, 73), and HMGA1 (green, 109). The lines denote the
   transition Jaccard’s similarity of the regulators selected at different
   sample sizes of these genes. (A) Consistency of all regulator selection
   across difference sample sizes. (B) Consistency of microRNA regulator
   selection across difference sample sizes.

   To further demonstrate the performance of miRDR, we also conducted a
   peer comparison with two other well-established methods: RACER^[162]57
   and MCMG^[163]58, on miRNA target prediction using expression data. The
   test dataset was collected from 332 patients in LIHC, available at the
   TCGA website^[164]59, along with 34,986 experimental miRNA-target
   interactions detected by CLEAR-CLIP in Hepatocellular carcinoma cell
   (Huh-7.5)^[165]22. We utilized three methods to predict the
   miRNA-target interactions in each stage of LIHC. First, we found that
   RACER and MCMG predicted significant more interactions than miRDR
   (average number of predicted interactions: miRDR 1,634; RACER 10,371;
   MCMG 212,171) but less experimental validated interactions among the
   top ranked ones. When taking a close look at each stage, we ranked the
   predicted interactions from RACER and MCMG according to the score and
   compare with top-n predictions from miRDR at 100-intervals
   (Fig. [166]8A). It is clear that miRDR outperformed two other methods
   by identifying significantly more validated interactions in each stage
   (p-value of one-sided Wilcoxon singed rank test is shown in the
   figure). Moreover, unlike RACER and MCMG, miRDR’s performance was not
   affected by the input sample size. For example, RACER and MCMG
   predicted only half validated targets in smaller sets (in stage II
   (82), III (80) and IV (6)) than what they did in a bigger set (stage I
   (164)) among top-1000 predictions. At another evaluation, we compared
   the distribution of miRNA target count predicted by three methods,
   using the experimental data from CLEAR-CLIP as reference. As
   Fig. [167]8B shows, the distribution of miRDR is very close to the
   experimental one, while RACER and MCMG tend to predict too many more
   targets for most miRNAs.

Figure 8.

   [168]Figure 8
   [169]Open in a new tab

   (A) Method comparison on the identified miRNA-target interactions on
   the test set. MiRDR, RACER and MCMG were applied on the Hepatocellular
   carcinoma (LIHC) data. The x-axis represents the top ranked interaction
   predicted by miRDR while the y-axis represents the count of validated
   interactions. The p-values were calculated by one-sided Wilcoxon singed
   rank test; (B) Method comparison based on the counts of identified
   miRNA targets. X-axis: number of target genes; y-axis: miRNA
   percentage. Outer plot shows the overall distribution and the inner
   shows the zoom-in view of partial distribution in the range of [0,
   200].

   Although Lasso regression method is capable of handling the problem of
   more variable than samples^[170]60, we still recommend a minimum number
   of samples for each gene for using our meta-Lasso regression model in
   order to reach high confidence in the inferred miRNA regulators. Note
   these thresholding values also depend on the numbers of possible
   regulators each gene can have (Supplementary Table [171]S8).

Software and database access

   We have developed a microRNA Dynamic Regulation Database (miRDR) that
   archives all the identified miRNA regulatory modules in each cancer
   online at [172]http://sbbi.unl.edu/miRDR. An R package implementing the
   proposed method is available for download from our miRDR webpage.

Discussion

   In this study, we presented two important characteristics of miRNA
   regulation. Dynamic regulation refers to the fact that the same miRNAs
   interact with different targets under different conditions.
   Theoretically speaking, factors contributing to the dynamic property of
   miRNA regulation may include the distinct expression of miRNA and their
   target, potential SNPs involved in the binding site, DNA translocation
   in the gene region, competitive binding between miRNA and other RNA
   binding proteins. On the other hand, modularized regulation describes
   that miRNAs may cooperate when they target to the same gene or same
   functional process across different conditions. Intuitively speaking,
   when multiple miRNAs target the same process, especially when multiple
   genes are controlled (Fig. [173]6, Table [174]5 and Supplementary
   Table [175]S6), they render higher efficacy in terms of functional
   regulation.

Table 5.

   Detailed statistics of samples from nine cancer types.
   Cancer Types Normal Stage I Stage II Stage III Stage IV
   Breast invasive carcinoma BRCA 104 181 601 242 20
   Kidney renal clear cell carcinoma KIRC 71^a 251 55 125 80
   Kidney renal papillary cell carcinoma KIRP 71^a 169 22 49 14
   Liver hepatocellular carcinoma LIHC 50 164 82 80 6
   Lung adenocarcinoma LUAD 58^b 276 122 84 25
   Lung squamous cell carcinoma LUSC 58^b 227 151 80 6
   Pancreatic adenocarcinoma PAAD 4 21 147 4 4
   Stomach adenocarcinoma STAD 32 55 119 174 42
   Uterine Corpus Endometrial Carcinoma UCEC 33 331 48 121 28
   [176]Open in a new tab

   ^aThe same set of normal samples were used in two type of Kidney
   cancers;

   ^bThe same set of normal samples were used in two type of Lung cancers.

   Through heterogeneous genomic data integration, modeling, and
   adjustments of gene expression based on other regulatory mechanisms
   such as TF and CNV, we have detected 14,554 conditional miRNA-gene
   interactions in 9 types of cancer. The discrepancy between the inferred
   interactions from our pipeline and the CLASH-reported interactions may
   be explained by the fact that the former reported conditional
   interactions highly associated with cancers while the latter reported
   interactions associated with AGO1 protein in HEK293 cell. To generalize
   the CLASH reported miRNA-gene interactions to different cell types, we
   also designed a filtering process to minimize the possibility of
   overuse. Gene expression and miRNA expression have been assessed to
   ensure both molecules have a meaningful expression in the cell of
   interest. We also checked if any somatic mutation occurs at the CLASH
   reported binding site on mRNA sequence in the new cell line. The design
   of miRNA regulatory score enables an efficient differentiation among
   miRNA-mRNA interactions. In contrast to the existing methods that
   evaluate the interactions based on simple features such as binding
   affinity or complementary sequences, our miRNA scoring system learned
   and considered more features directly from experimental data, which is
   expected to be more reliable.

   Comparing with two leading algorithms for miRNA target prediction based
   on expression profiles, miRDR has showed its advantageous predictive
   power with two major improvements. First, comparing with other methods,
   miRDR’s predictions tend to include more validated interactions while
   not introducing significant high false prediction. This is an important
   aspect given the widely-accepted critique on high false positive rate
   in current target prediction. Another advantage was the stable
   performance of miRDR on fairly limited sample size. The meta-analysis
   design minimized the impacts of sample variation on the target
   prediction, which seems promising to facilitate the detection of weak
   signals between miRNA and targets with insufficient samples.

   Overall, our knowledge of modularized miRNA regulation is still very
   limited. Along this line, Ding et al. identified 181 modules based on a
   simple statistical analysis on CLASH reported interactions^[177]24. In
   this study, we have identified 4,134 functional miRNA modules based on
   the consistent miRNA co-regulation across different cancer conditions,
   where all previously reported modules are covered. Note the previous
   study focused on miRNAs that target the same gene while our method
   assesses comprehensively all modules involved in similar functional
   processes, which therefore can be more general. For example, one novel
   5-miRNAs module, namely miR-196a-5p, -196b-5p, -365a-3p, -378a-5p and
   -30b-5p, has been identified to regulate cell cycle and MAPK/Wnt
   singling pathways in kidney and uterine cancers in our study. It was
   also conceived as a functional module by Trajkovski et al. where they
   uncovered that these miRNAs cooperatively promote the brown
   adipogenesis pathway in mice^[178]61.

   In addition, since new sequencing technologies such as CLASH and CLIP
   can produce non-overlapped data on miRNA interactome, future inclusion
   of interactions detected in different platforms and cell types is
   expected. The concern of minimum sample size for reliable detection of
   regulators for each gene using our meta-lasso regression approach is
   actually a common concern of all similar analysis. We envision
   controlling the initial set of regulators including both TFs and miRNAs
   may alleviate this concern.

Conclusion

   In summary, we proposed an integrative computational method to identify
   reliable miRNA-gene interactions in a given biological condition based
   on the integration of heterogeneous genomic data. In order to
   stratifying complex gene regulation networks, our approach considers
   other gene-expression regulatory factors including TF regulation, CNV,
   and DNA methylation. Applying the meta-lasso regression approach to 9
   types of cancer, we have examined in-depth all possible miRNA-mRNA
   interactions and regulatory modules that are involved in different
   phase of cancer progression. The observed properties of the identified
   interactions further demonstrated the dynamic feature of miRNA
   regulation during tumor progression. We hope the method developed in
   this study can provide the community a promising solution to overcoming
   scientific limitations in the study of dynamic miRNA regulation in
   complex systems by generating novel insights in dynamic targeting and
   dramatically reducing the extensive lab-load in related research areas.

Methods

Data preparation

   Data used in this study are all publically available, including (1) raw
   sequencing reads of miRNA-mRNA interactions from the CLASH
   experiment^[179]21 ([180]GSE50452); (2) DNA binding data of 150 TFs
   from large-scale ChIP-Seq analysis by the Encyclopedia of DNA Elements
   (ENCODE) Consortium^[181]45; (3) multi-level genomic data from 4,206
   patients and 9 types of cancer from Cancer Genome Atlas (TCGA)^[182]62
   including miRNA and mRNA expressions, CNVs, and DNA methylation data
   (Table [183]5). In addition, for each cancer patient, stage and subtype
   information was also retrieved from TCGA.

CLASH data reanalysis

   To assess the detection confidence level of CLASH reported binding
   sites, we first reanalyzed the raw sequencing data from CLASH and
   performed reads mapping against human reference genome (HG19) and human
   mature miRNA sequences database (miRBase^[184]63, version 21). Then,
   three types of reads were examined for each miRNA-mRNA binding site in
   the CLASH experiment including (1) chimeric reads that represent
   miRNA-mRNA binding sites; (2) miRNA single reads that represent free
   miRNAs, which are unbound with any mRNAs; (3) mRNA single reads that
   represent unbound mRNA sequences, which are unbound with any miRNAs.

   Binomial test was applied on each binding site to infer the detection
   confidence level of a specific binding site k between miRNA i and the
   mRNA j. Specifically, the chimeric reads count of interactions between
   miRNA i and mRNA j at the binding site k, C [ijk] was assumed to follow
   a binomial distribution
   [MATH:
   <msub><mrow><mi>C</mi></mrow><mrow><mi>i</mi><mi>j</mi><mi>k</mi></mrow
   ></msub><mo>∼</mo><mi>B</mi><mi>i</mi><mi>n</mi><mi>o</mi><mi>m</mi><mi
   >i</mi><mi>a</mi><mi>l</mi><mspace width=".25em"></mspace><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>C</mi></mrow><mrow><mi>a</m
   i><mi>l</mi><mi>l</mi></mrow></msub><mo>,</mo><mspace
   width=".25em"></mspace><msub><mrow><mi>P</mi></mrow><mrow><mi>i</mi><mi
   >j</mi><mi>k</mi></mrow></msub></mrow><mo stretchy="true">)</mo></mrow>
   :MATH]
   under the null hypothesis, where P [ijk] (equation ([185]2)) represents
   the probability of a binding occurred between miRNA i and mRNA j at the
   binding site k. Binomial test was performed and the p-value for each
   miRNA-mRNA binding site was calculated by the formula
   [MATH:
   <mi>p</mi><mo>−</mo><mi>v</mi><mi>a</mi><mi>l</mi><mi>u</mi><msub><mrow
   ><mi>e</mi></mrow><mrow><mi>i</mi><mi>j</mi><mi>k</mi></mrow></msub><mo
   >=</mo><munderover><mo>∑</mo><mrow><msub><mrow><mi>C</mi></mrow><mrow><
   mi>i</mi><mi>j</mi><mi>k</mi></mrow></msub></mrow><mrow><msub><mrow><mi
   >C</mi></mrow><mrow><mi>a</mi><mi>l</mi><mi>l</mi></mrow></msub></mrow>
   </munderover><mo stretchy="false">(</mo><mtable columnspacing="1em"
   rowspacing="4pt"><mtr><mtd><msub><mrow><mi>C</mi></mrow><mrow><mi>a</mi
   ><mi>l</mi><mi>l</mi></mrow></msub></mtd></mtr><mtr><mtd><msub><mrow><m
   i>C</mi></mrow><mrow><mi>i</mi><mi>j</mi><mi>k</mi></mrow></msub></mtd>
   </mtr></mtable><mo
   stretchy="false">)</mo><msubsup><mrow><mi>P</mi></mrow><mrow><mi>i</mi>
   <mi>j</mi><mi>k</mi></mrow><mrow><mi>C</mi></mrow></msubsup><mo
   stretchy="false">(</mo><mn>1</mn><mo>−</mo><msub><mrow><mi>P</mi></mrow
   ><mrow><mi>i</mi><mi>j</mi><mi>k</mi></mrow></msub><msup><mo
   stretchy="false">)</mo><mrow><msub><mrow><mi>C</mi></mrow><mrow><mi>a</
   mi><mi>l</mi><mi>l</mi></mrow></msub><mo>−</mo><mi>C</mi></mrow></msup>
   :MATH]
   1

   where C [all] is the count of chimeric reads supporting all
   interactions and the probability of a binding occurred between miRNA i
   and mRNA j at the binding site k is estimated by
   [MATH:
   <msub><mrow><mi>P</mi></mrow><mrow><mi>i</mi><mi>j</mi><mi>k</mi></mrow
   ></msub><mo>=</mo><mfrac><mrow><msub><mrow><mi>C</mi></mrow><mrow><mi>i
   </mi><mi>j</mi><mi>k</mi></mrow></msub></mrow><mrow><mi>S</mi><mi>u</mi
   ><msub><mrow><mi>m</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow></mf
   rac><mo>×</mo><mfrac><mrow><msub><mrow><mi>C</mi></mrow><mrow><mi>i</mi
   ><mi>j</mi><mi>k</mi></mrow></msub></mrow><mrow><mi>S</mi><mi>u</mi><ms
   ub><mrow><mi>m</mi></mrow><mrow><mi>j</mi></mrow></msub></mrow></mfrac>
   :MATH]
   2

   where Sum [i] and Sum [j] represent the total reads counts (chimeric
   reads and single reads) associated with miRNA i and mRNA j,
   respectively. Binding sites with Bonferroni-multiple-test-adjusted p
   values less than 0.05 were considered to be significant and included in
   the further analysis.

Regulatory score (RS) calculation of each miRNA-mRNA pair

   To assess the likelihood of interactions between a potential target
   mRNA and a miRNA, we designed a probability-based regulatory score. As
   introduced above, we first estimated the probability of an interaction
   event between miRNA i and mRNA j at the binding site k by the equation
   ([186]2), which we denoted as P [ijk].

   Then, raw RS of each miRNA-mRNA pair was calculated by the aggregation
   of the binding probability and the binding affinity, e.g., represented
   by the MFE^[187]51 of the corresponding interactions, as follows:
   [MATH:
   <mi>R</mi><msub><mrow><mi>S</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow
   ></msub><mo>=</mo><mfrac><mn>1</mn><mi>K</mi></mfrac><munderover><mo>∑<
   /mo><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow><mi>K</mi></munderover><
   mrow><mo stretchy="true">(</mo><mrow><mrow><mo
   stretchy="true">|</mo><mrow><mi>M</mi><mi>F</mi><msub><mrow><mi>E</mi><
   /mrow><mrow><mi>k</mi></mrow></msub></mrow><mo
   stretchy="true">|</mo></mrow><mo>×</mo><msub><mrow><mi>P</mi></mrow><mr
   ow><mi>i</mi><mi>j</mi><mi>k</mi></mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow> :MATH]
   3

   where MFE [k] represents the minimum free energy of the binding site k
   and K is the total number of different binding sites between miRNA i
   and mRNA j.

   To mimic the situation of competition among mRNAs on the same miRNA,
   raw RS was normalized by the total number of potential targets
   associated with miRNA i:
   [MATH:
   <mi>R</mi><msup><msub><mrow><mi>S</mi></mrow><mrow><mi>i</mi><mi>j</mi>
   </mrow></msub><mo>′</mo></msup><mo>=</mo><mi>R</mi><msub><mrow><mi>S</m
   i></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>/</mo><mi>R</mi><m
   sub><mrow><mi>S</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>.</mo>
   :MATH]
   4

   where RS [i]. denotes the set of raw RS between all target j and miRNA
   i.

Regulatory score (RS) calculation of each TF-gene pair

   Similar to miRNAs, a TF may also have more than one binding site on a
   target gene. The RS for the TF t and gene j based on the binding site k
   was estimated based on the distance (d [tjk]) between the binding site
   k and the transcription start site (TSS)^[188]47,[189]64, as follows:
   [MATH:
   <mi>R</mi><msub><mrow><mi>S</mi></mrow><mrow><mi>t</mi><mi>j</mi><mi>k<
   /mi></mrow></msub><mo>=</mo><mi>exp</mi><mrow><mo
   stretchy="true">(</mo><mrow><mo>−</mo><mrow><mo
   stretchy="true">(</mo><mrow><mfrac><mn>1</mn><mn>2</mn></mfrac><mo>+</m
   o><mfrac><mrow><mn>4</mn><msub><mrow><mi>d</mi></mrow><mrow><mi>t</mi><
   mi>j</mi><mi>k</mi></mrow></msub></mrow><mrow><msup><mrow><mn>10</mn></
   mrow><mn>5</mn></msup></mrow></mfrac></mrow><mo
   stretchy="true">)</mo></mrow></mrow><mo stretchy="true">)</mo></mrow>
   :MATH]
   5

   Because a smaller distance (d [tjk]) is associated with a higher
   regulatory potential of TF, RS for all binding sites of a TF and gene
   pair can be aggregated as follows^[190]47:
   [MATH:
   <mi>R</mi><msub><mrow><mi>S</mi></mrow><mrow><mi>t</mi><mi>j</mi></mrow
   ></msub><mo>=</mo><mn>1</mn><mo>−</mo><munderover><mo>∏</mo><mrow><mi>k
   </mi><mo>=</mo><mn>1</mn></mrow><mi>K</mi></munderover><mrow><mo
   stretchy="true">(</mo><mrow><mn>1</mn><mo>−</mo><mi>R</mi><msub><mrow><
   mi>S</mi></mrow><mrow><mi>t</mi><mi>j</mi><mi>k</mi></mrow></msub></mro
   w><mo stretchy="true">)</mo></mrow> :MATH]
   6

   To mimic the situation that the same TF can bind to more than one gene,
   raw RS was normalized by the formula
   [MATH:
   <mi>R</mi><msup><msub><mrow><mi>S</mi></mrow><mrow><mi>t</mi><mi>j</mi>
   </mrow></msub><mo>′</mo></msup><mo>=</mo><mi>R</mi><msub><mrow><mi>S</m
   i></mrow><mrow><mi>t</mi><mi>j</mi></mrow></msub><mo>/</mo><mi>R</mi><m
   sub><mrow><mi>S</mi></mrow><mrow><mi>t</mi></mrow></msub><mo>.</mo>
   :MATH]
   where
   [MATH:
   <mi>R</mi><msub><mrow><mi>S</mi></mrow><mrow><mi>t</mi></mrow></msub><m
   o>.</mo> :MATH]
   is the sum of
   [MATH:
   <mi>R</mi><msub><mrow><mi>S</mi></mrow><mrow><mi>t</mi><mi>j</mi></mrow
   ></msub> :MATH]
   for a fixed t over all j’s.

Identification of the Conditional Gene Regulators through the Lasso
Regression

   In this study, we considered each stage of the nine types of cancer as
   an individual condition D [sc] (stage: s = 1,2,3,4; cancers:
   c = 1,2,…,9). For each gene under a specific condition, four matrices
   were constructed for a lasso regression approach to identify the
   conditional TF and miRNA regulators (Fig. [191]9) including the
   following:
     * The expression vector Y of size (n × 1) represents the expression
       changes of gene j in each of the n cancer patients associated with
       this condition versus healthy control. Here, in sample n, the
       expression changes of gene j and miRNA i were calculated by
       comparing with the average expression in the normal control group
       as follows:

   [MATH: <mi
   mathvariant="normal">Δ</mi><msub><mrow><mi>g</mi></mrow><mrow><mi>j</mi
   ><mi>n</mi></mrow></msub><mo>=</mo><msub><mrow><mi>log</mi></mrow><mrow
   ><mn>2</mn></mrow></msub><mrow><mo
   stretchy="true">(</mo><mrow><mrow><msub><mrow><mi>g</mi></mrow><mrow><m
   i>j</mi><mi>n</mi></mrow></msub></mrow><mo>/</mo><mrow><mspace
   width=".5em"></mspace><mover accent="true"><mrow><mspace
   width="-.25em"></mspace><msub><mrow><mi>g</mi></mrow><mrow><mi>j</mi><m
   space
   width=".5em"></mspace><mi>c</mi><mi>o</mi><mi>n</mi><mi>t</mi><mi>r</mi
   ><mi>o</mi><mi>l</mi></mrow></msub></mrow><mrow><mo
   stretchy="true">¯</mo></mrow></mover></mrow></mrow><mo
   stretchy="true">)</mo></mrow> :MATH]
       7

   [MATH: <mi
   mathvariant="normal">Δ</mi><msub><mrow><mi>m</mi></mrow><mrow><mi>i</mi
   ><mi>n</mi></mrow></msub><mo>=</mo><msub><mrow><mi>log</mi></mrow><mrow
   ><mn>2</mn></mrow></msub><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>m</mi></mrow><mrow><mi>i</m
   i><mspace width="-.25em"></mspace><mspace
   width=".1em"></mspace><mi>n</mi></mrow></msub><mo>/</mo><mspace
   width=".5em"></mspace><mover accent="true"><mrow><mspace
   width="-.25em"></mspace><msub><mrow><mi>m</mi></mrow><mrow><mi>i</mi><m
   space
   width=".5em"></mspace><mi>c</mi><mi>o</mi><mi>n</mi><mi>t</mi><mi>r</mi
   ><mi>o</mi><mi>l</mi></mrow></msub></mrow><mrow><mo
   stretchy="true">¯</mo></mrow></mover></mrow><mo
   stretchy="true">)</mo></mrow> :MATH]
       8
          + where
            [MATH: <mspace width=".5em"></mspace><mover
            accent="true"><mrow><mspace
            width="-.25em"></mspace><msub><mrow><mi>g</mi></mrow><mrow><mi
            >j</mi><mspace
            width=".5em"></mspace><mi>c</mi><mi>o</mi><mi>n</mi><mi>t</mi>
            <mi>r</mi><mi>o</mi><mi>l</mi></mrow></msub></mrow><mrow><mo
            stretchy="true">¯</mo></mrow></mover> :MATH]
            and
            [MATH: <mspace width=".5em"></mspace><mover
            accent="true"><mrow><mspace
            width="-.25em"></mspace><msub><mrow><mi>m</mi></mrow><mrow><mi
            >i</mi><mspace
            width=".5em"></mspace><mi>c</mi><mi>o</mi><mi>n</mi><mi>t</mi>
            <mi>r</mi><mi>o</mi><mi>l</mi></mrow></msub></mrow><mrow><mo
            stretchy="true">¯</mo></mrow></mover> :MATH]
            denote the average expression levels of g [j] and m [i] among
            the normal control samples.
     * The background matrix X [1] of size n × 2 includes values of the
       copy number change and methylation of the gene j in each cancer
       compared to the healthy control;
     * The regulator matrix P = [P ^1,…, P ^n]^T of size n × (r [1] + r
       [2]) lists the expression changes of all regulators, including r
       [1]TFs and r [2] miRNAs, that interact with gene j, in each patient
       compared to the normal control;
     * The regulatory score matrix R of size 1 × (r [1] + r [2]) lists the
       RS of each corresponding regulator to the gene j derived from
       equations ([192]4) and ([193]6).

Figure 9.

   Figure 9
   [194]Open in a new tab

   Four matrices constructed for each gene j and each condition, i.e., a
   specific cancer stage. Y represents the vector of expression changes of
   gene j in the corresponding cancer patients under this condition; X [1]
   represents the matrix of background adjustment factors (CNV and DNA
   methylation); R represents a RS vector that contains the regulatory
   scores of all r [1] TF- and r [2] miRNA- regulators of gene j; P is the
   matrix that includes all the expression changes of each regulator in
   each patient under the corresponding condition.

   Based on the four matrices, a linear system is created as equation
   ([195]9), where X [2] = [P ^1 × R, …, P ^n × R]^T (inner products of R
   and each row in P) represents the regulatory effects by TFs and miRNAs.
   The regulatory scores in X [2] denote the impacts of corresponding
   regulator (column) to the gene in a specific sample (row). Before we
   solve the linear system in equation ([196]9), the Frisch–Waugh–Lovell
   (FWL) method^[197]47,[198]65,[199]66 was applied to adjust for the gene
   expression change caused by the background factors (CNV and
   methylation) through a linear transformation, and convert the linear
   system into equation ([200]10). Then, Lasso regression^[201]60,[202]67
   was used to identify the TFs and miRNAs that regulated the gene under
   the given condition.
   [MATH:
   <mi>Y</mi><mo>=</mo><msub><mrow><mi>β</mi></mrow><mrow><mn>1</mn></mrow
   ></msub><msub><mrow><mi>X</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>
   +</mo><msub><mrow><mi>β</mi></mrow><mrow><mn>2</mn></mrow></msub><msub>
   <mrow><mi>X</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>+</mo><mi>ε</m
   i> :MATH]
   9
   [MATH:
   <msub><mrow><mi>M</mi></mrow><mrow><msub><mrow><mi>X</mi></mrow><mrow><
   mn>1</mn></mrow></msub></mrow></msub><mi>Y</mi><mo>=</mo><msub><mrow><m
   i>M</mi></mrow><mrow><msub><mrow><mi>X</mi></mrow><mrow><mn>1</mn></mro
   w></msub></mrow></msub><msub><mrow><mi>β</mi></mrow><mrow><mn>2</mn></m
   row></msub><msub><mrow><mi>X</mi></mrow><mrow><mn>2</mn></mrow></msub><
   mo>+</mo><mrow><mi>ε</mi><mo>′</mo></mrow><mo>,</mo><mspace
   width=".25em"></mspace><mi>w</mi><mi>h</mi><mi>e</mi><mi>r</mi><mi>e</m
   i><mspace
   width=".25em"></mspace><msub><mrow><mi>M</mi></mrow><mrow><msub><mrow><
   mi>X</mi></mrow><mrow><mn>1</mn></mrow></msub></mrow></msub><mo>=</mo><
   mi>I</mi><mo>−</mo><msub><mrow><mi>X</mi></mrow><mrow><mn>1</mn></mrow>
   </msub><msup><mrow><mo
   stretchy="false">(</mo><msup><msub><mrow><mi>X</mi></mrow><mrow><mn>1</
   mn></mrow></msub><mi>T</mi></msup><msub><mrow><mi>X</mi></mrow><mrow><m
   n>1</mn></mrow></msub><mo
   stretchy="false">)</mo></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msup><
   msup><msub><mrow><mi>X</mi></mrow><mrow><mn>1</mn></mrow></msub><mi>T</
   mi></msup> :MATH]
   10

Meta-Lasso regression for robust regulator identification

   To minimize the bias caused by limited cancer samples from TCGA and
   enhance the robustness in regulator selection, meta-Lasso
   regression^[203]68 was adopted on regulator identification of each gene
   under each condition. The workflow is shown in Fig. [204]1B where a
   hierarchical sampling and model integration was performed. Li et al.
   demonstrated that the meta-Lasso method has the good consistency of
   selection and low false predication rate on several high-dimensional
   gene expression data^[205]68. Here, we adopted the objective function
   of their meta-Lasso regressions to detect the most likely regulators of
   each gene with the consideration of consistency across of multiple
   “individual” dataset:
   [MATH:
   <munder><mrow><mi>max</mi></mrow><mrow><msub><mrow><mi>β</mi></mrow><mr
   ow><mn>1</mn></mrow></msub><mo>,</mo><mi>g</mi><mo>,</mo><mi>ζ</mi></mr
   ow></munder><mrow><mo
   stretchy="true">{</mo><mrow><munderover><mo>∑</mo><mrow><mi>m</mi><mo>=
   </mo><mn>1</mn></mrow><mi>M</mi></munderover><mrow><msub><mrow><mi>l</m
   i></mrow><mrow><mi>m</mi></mrow></msub><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>β</mi></mrow><mrow><msub><m
   row><mi>m</mi></mrow><mrow><mn>0</mn></mrow></msub></mrow></msub><mo>,<
   /mo><mi>g</mi><mo>,</mo><msub><mrow><mi>ζ</mi></mrow><mrow><mi>m</mi></
   mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow><mo>−</mo><mi>w</mi><munderover><mo>∑</mo>
   <mrow><mi>p</mi><mo>=</mo><mn>1</mn></mrow><mi>P</mi></munderover><mrow
   ><mo
   stretchy="false">|</mo><msub><mrow><mi>g</mi></mrow><mrow><mi>p</mi></m
   row></msub><mo
   stretchy="false">|</mo><mo>−</mo><mi>λ</mi><munderover><mo>∑</mo><mrow>
   <mi>p</mi><mo>=</mo><mn>1</mn></mrow><mi>P</mi></munderover><mrow><mund
   erover><mo>∑</mo><mrow><mi>m</mi><mo>=</mo><mn>1</mn></mrow><mi>M</mi><
   /munderover><mrow><mo
   stretchy="false">|</mo><msub><mrow><mi>ζ</mi></mrow><mrow><mi>m</mi><mi
   >p</mi></mrow></msub><mo
   stretchy="false">|</mo></mrow></mrow></mrow></mrow></mrow><mo
   stretchy="true">}</mo></mrow> :MATH]
   11

   where
   [MATH:
   <msub><mrow><mi>l</mi></mrow><mrow><mi>m</mi></mrow></msub><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>β</mi></mrow><mrow><msub><m
   row><mi>m</mi></mrow><mrow><mn>0</mn></mrow></msub></mrow></msub><mo>,<
   /mo><mi>g</mi><mo>,</mo><msub><mrow><mi>ζ</mi></mrow><mrow><mi>m</mi></
   mrow></msub></mrow><mo stretchy="true">)</mo></mrow> :MATH]
   is the log-likelihood function of the m-th dataset; M denotes the
   number of individual datasets; g [p] is the effect of the p-th
   regulator (out of P regulators) at the overall condition; and ζ [mp] is
   the effect of the p-th regulator at the m-th dataset (out of M
   datasets).

   Specifically, among the samples under one cancer condition, two rounds
   of 5-fold data separation were performed to generate 25 individual
   subsets that each contains a randomly selected 64% (80% × 80%) of the
   entire samples (Fig. [206]1B). The final LASSO regression model was
   achieved by applying the meta-LASSO method to the hierarchical sampled
   subsets. Any none-zero regression coefficients in the final LASSO model
   indicate the interactions occurred between the corresponding regulators
   and genes.

   Subsequently, the same analysis was conducted for every human gene j in
   36 conditions across all cancers (4 × 9). As shown in the workflow
   (Fig. [207]1), we followed a hierarchical framework to identify
   cancer-specific regulations (based on the consistency across different
   stages of that cancer) and common cancer regulation (based on the
   consistency across all cancers) through similar model selection
   procedure as described above.

Assessment of the model stability through a robustness test

   To assess the robustness of this model, particularly its dependence on
   the sample size, we performed regulator identification with different
   sample sizes from p (the total number of potential regulators) to the
   total number of patients under that condition. We randomly generated
   1,000 sets of data to perform the analysis at each sample size n, then
   evaluate the consistency of the derived models by calculating the
   Jaccard’s similarity coefficient between selected regulators (L)
   between model F [n] (regulator selection using n samples) and its
   neighbor F [n-1] (using n-1 samples), as follows:
   [MATH: <mi>J</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>F</mi></mrow><mrow><mi>n</mi></m
   row></msub><mo>,</mo><mspace
   width=".25em"></mspace><msub><mrow><mi>F</mi></mrow><mrow><mi>n</mi><mo
   >−</mo><mn>1</mn></mrow></msub><mo
   stretchy="false">)</mo><mo>=</mo><mfrac><mrow><mo
   stretchy="true">|</mo><mrow><mi>L</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>F</mi></mrow><mrow><mi>n</mi></m
   row></msub><mo stretchy="false">)</mo><mspace
   width=".25em"></mspace><mo>∩</mo><mrow><mi>L</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>F</mi></mrow><mrow><mi>n</mi><mo
   >−</mo><mn>1</mn></mrow></msub><mo
   stretchy="false">)</mo></mrow></mrow><mo
   stretchy="true">|</mo></mrow><mrow><mo
   stretchy="true">|</mo><mrow><mi>L</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>F</mi></mrow><mrow><mi>n</mi></m
   row></msub><mo stretchy="false">)</mo><mspace
   width=".25em"></mspace><mo>∪</mo><mrow><mi>L</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>F</mi></mrow><mrow><mi>n</mi><mo
   >−</mo><mn>1</mn></mrow></msub><mo
   stretchy="false">)</mo></mrow></mrow><mo
   stretchy="true">|</mo></mrow></mfrac> :MATH]
   12

Assessment of potential miRNA-gene interactions in a specific cancer

   Although the reported miRNA-gene interactions, e.g., 37,751 from
   miRTarBase^[208]56 database were validated in a particular experiment
   on a certain type of cell, the same physical binding between the
   participating miRNA and gene can still be valid in other cells as long
   as their sequences and expressions are conserved. With appropriate
   processing, we believe the interactions obtained from one experiment
   can be conditionally generalized to other cell types. Specifically, for
   each interaction, we conducted the following checkup when applying it
   to a new cell type:
     * both miRNA and gene must have meaningful expressions in the cell
       type of interest to maintain the activity level of the interaction;
     * the annotated binding sites shouldn’t show any mutations in the new
       cancer cell. Otherwise, we would no longer consider this binding
       site in the regulatory score calculation step.

   In this proof-of-concept study, we focused on the
   experimental-validated binding sites from CLASH due to the availability
   of raw sequencing read information that is required in our model for
   regulatory score calculation.

Detections of miRNA regulatory module based on functional involvement in
cancers

   Based on the conditional miRNA-gene interactions derived from all
   cancers and stages, we explored the modularized miRNA regulation by
   examining their involvement in known functional pathways, i.e., 171
   fundamental pathways from the KEGG database^[209]69. We exhaustively
   searched for all k-miRNA (k = 1–5) candidate modules where miRNAs
   consistently regulated the same set of pathways across multiple
   conditions and evaluated the significance in the following steps:
     * i.
       Under each condition, the miRNAs were linked to a set of KEGG
       pathways through their target genes. Then, the miRNAs were grouped
       by the commonality on the same pathway as candidate modules
       (k = 1–5);
     * ii.
       For each candidate module under one condition, a pathway enrichment
       analysis was conducted to ensure the involvement of each pathway is
       statistically significant through a test based on the
       hypergeometric distribution^[210]70:

   [MATH: <mi>Pr</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>r</mi></mrow><mrow><mi>w</mi></m
   row></msub><mo
   stretchy="false">|</mo><msub><mrow><mi>N</mi></mrow><mrow><mi>D</mi></m
   row></msub><mo>,</mo><msub><mrow><mi>p</mi></mrow><mrow><mi>w</mi></mro
   w></msub><mo>,</mo><mi>n</mi><mo
   stretchy="false">)</mo><mo>=</mo><mfrac><mrow><mrow><mo
   stretchy="true">(</mo><mrow><mtable><mtr><mtd><mrow><msub><mrow><mi>p</
   mi></mrow><mrow><mi>w</mi></mrow></msub><msub><mrow><mi>N</mi></mrow><m
   row><mi>D</mi></mrow></msub></mrow></mtd></mtr><mtr><mtd><mrow><msub><m
   row><mi>r</mi></mrow><mrow><mi>w</mi></mrow></msub></mrow></mtd></mtr><
   /mtable></mrow><mo stretchy="true">)</mo></mrow><mrow><mo
   stretchy="true">(</mo><mrow><mtable><mtr><mtd><mrow><mo
   stretchy="false">(</mo><mn>1</mn><mo>−</mo><msub><mrow><mi>p</mi></mrow
   ><mrow><mi>w</mi></mrow></msub><mo
   stretchy="false">)</mo><msub><mrow><mi>N</mi></mrow><mrow><mi>D</mi></m
   row></msub></mrow></mtd></mtr><mtr><mtd><mrow><mi>n</mi><mo>−</mo><msub
   ><mrow><mi>r</mi></mrow><mrow><mi>w</mi></mrow></msub></mrow></mtd></mt
   r></mtable></mrow><mo stretchy="true">)</mo></mrow></mrow><mrow><mo
   stretchy="true">(</mo><mrow><mtable><mtr><mtd><mrow><msub><mrow><mi>N</
   mi></mrow><mrow><mi>D</mi></mrow></msub></mrow></mtd></mtr><mtr><mtd><m
   i>n</mi></mtd></mtr></mtable></mrow><mo
   stretchy="true">)</mo></mrow></mfrac> :MATH]
       13
       where N [D] denotes the total number of miRNA-regulated genes in
       the condition D, P [w] represents the percentage of human genes
       associated with pathway w, n indicates the total number of target
       genes associated with the module and r [w] is the number of genes
       belong to pathway w.
     * iii.
       A Binomial test was applied to assess the statistical significance
       of the involvement of |w|-pathways in a miRNA module M

   [MATH:
   <msub><mrow><mi>w</mi></mrow><mrow><mi>M</mi><mi>D</mi></mrow></msub><m
   o>∼</mo><mi>B</mi><mi>i</mi><mi>n</mi><mi>o</mi><mi>m</mi><mi>i</mi><mi
   >a</mi><mi>l</mi><mo stretchy="false">(</mo><mo
   stretchy="false">|</mo><mi>W</mi><mo
   stretchy="false">|</mo><mo>,</mo><mspace
   width=".5em"></mspace><msub><mrow><mi>P</mi></mrow><mrow><mi>W</mi><mi>
   D</mi></mrow></msub><mo stretchy="false">)</mo> :MATH]
       14

   [MATH:
   <msub><mrow><mi>P</mi></mrow><mrow><mi>W</mi><mi>D</mi></mrow></msub><m
   o>=</mo><mfrac><mrow><munderover><mo>∑</mo><mrow><mi>w</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mn>171</mn></mrow></munderover><msub><mrow><mo
   stretchy="false">(</mo><mi>T</mi><mi>o</mi><mi>t</mi><mi>a</mi><mi>l</m
   i><mspace
   width="0.25em"></mspace><mi>n</mi><mi>u</mi><mi>m</mi><mi>b</mi><mi>e</
   mi><mi>r</mi><mspace
   width="0.25em"></mspace><mi>o</mi><mi>f</mi><mspace
   width="0.25em"></mspace><mi>M</mi><mi>o</mi><mi>d</mi><mi>u</mi><mi>l</
   mi><mi>e</mi><mi>s</mi><mspace
   width="0.25em"></mspace><mi>w</mi><mi>i</mi><mi>t</mi><mi>h</mi><mspace
   width="0.25em"></mspace><mi>s</mi><mi>i</mi><mi>z</mi><mi>e</mi><mspace
   width="0.25em"></mspace><mi>k</mi><mo
   stretchy="false">)</mo></mrow><mrow><mi>w</mi><mi>D</mi></mrow></msub><
   /mrow><mrow><mrow><mo stretchy="false">|</mo></mrow><mi>W</mi><mrow><mo
   stretchy="false">|</mo></mrow><mo stretchy="false">(</mo><mtable
   columnspacing="1em"
   rowspacing="4pt"><mtr><mtd><mi>K</mi></mtd></mtr><mtr><mtd><mi>k</mi></
   mtd></mtr></mtable><mo stretchy="false">)</mo></mrow></mfrac> :MATH]
       15
       where k denotes the number of miRNAs in the module while K is the
       total number of miRNAs in this analysis, i.e., 371; P [WD] denotes
       the overall probability of a k-miRNA group co-involves a common
       pathway under condition D; w [MD] represents the number of enriched
       pathways associated with the k-miRNA module M under condition D; |
       W | is the total number of functional pathways.
     * iv.
       Another Binomial test was utilized to examine the statistical
       significance of the recurrence of the candidate module M across
       conditions.

   [MATH:
   <msub><mrow><mi>d</mi></mrow><mrow><mi>M</mi></mrow></msub><mo>∼</mo><m
   i>B</mi><mi>i</mi><mi>n</mi><mi>o</mi><mi>m</mi><mi>i</mi><mi>a</mi><mi
   >l</mi><mo stretchy="false">(</mo><mo
   stretchy="false">|</mo><mi>D</mi><mo
   stretchy="false">|</mo><mo>,</mo><mspace
   width=".5em"></mspace><msub><mrow><mi>P</mi></mrow><mrow><mi>k</mi></mr
   ow></msub><mo stretchy="false">)</mo> :MATH]
   16
   [MATH:
   <msub><mrow><mi>P</mi></mrow><mrow><mi>k</mi></mrow></msub><mo>=</mo><m
   frac><mrow><munderover><mo>∑</mo><mrow><mi>D</mi><mo>=</mo><mn>1</mn></
   mrow><mrow><mn>36</mn></mrow></munderover><msub><mrow><mo
   stretchy="false">(</mo><mi>T</mi><mi>o</mi><mi>t</mi><mi>a</mi><mi>l</m
   i><mspace
   width="0.25em"></mspace><mi>n</mi><mi>u</mi><mi>m</mi><mi>b</mi><mi>e</
   mi><mi>r</mi><mspace
   width="0.25em"></mspace><mi>o</mi><mi>f</mi><mspace
   width="0.25em"></mspace><mi>M</mi><mi>o</mi><mi>d</mi><mi>u</mi><mi>l</
   mi><mi>e</mi><mi>s</mi><mspace
   width="0.25em"></mspace><mi>w</mi><mi>i</mi><mi>t</mi><mi>h</mi><mspace
   width="0.25em"></mspace><mi>s</mi><mi>i</mi><mi>z</mi><mi>e</mi><mspace
   width="0.25em"></mspace><mi>k</mi><mo
   stretchy="false">)</mo></mrow><mrow><mi>D</mi></mrow></msub></mrow><mro
   w><mrow><mo stretchy="false">|</mo></mrow><mi>D</mi><mrow><mo
   stretchy="false">|</mo></mrow><mo stretchy="false">(</mo><mtable
   columnspacing="1em"
   rowspacing="4pt"><mtr><mtd><mi>K</mi></mtd></mtr><mtr><mtd><mi>k</mi></
   mtd></mtr></mtable><mo stretchy="false">)</mo></mrow></mfrac> :MATH]
   17

   where k and K are defined in (14); P [k] denotes the overall
   probability of a k-miRNA module associated with common pathways across
   conditions; d [M] represents the number of conditions where the k-miRNA
   module M is associated with common pathways; |D| is the total number of
   conditions.

   Finally, we calculated False Discovery Rate (FDR) q-values from
   p-values using Benjamini’s method^[211]71. Module candidates with
   q-values < 0.05 were reported as miRNA regulatory modules, i.e. we
   controlled FDR at 0.05 in our integrative report of all miRNA module
   candidates with different k’s.

Electronic supplementary material

   [212]miRDR code^ (11.5KB, txt)
   [213]Supplementary Information^ (416.6KB, pdf)
   [214]Supplementary Table S2^ (2.3MB, xls)
   [215]Supplementary Table S4^ (66.5KB, xls)
   [216]Supplementary Table S5^ (68.5KB, xls)
   [217]Supplementary Table S6^ (715.5KB, xls)
   [218]Supplementary Table S8^ (18.5MB, xls)
   [219]Supplementary Table S9^ (189.5KB, xls)

Acknowledgements