Abstract

Background

   MicroRNAs (miRNAs) are small non-coding RNAs that bind messenger RNAs
   and promote their degradation or repress their translation. There is
   increasing evidence of miRNAs playing an important role in alcohol
   related disorders. However, the role of miRNAs as mediators of the
   genetic effect on alcohol phenotypes is not fully understood. We
   conducted a high-throughput sequencing study to measure miRNA
   expression levels in alcohol naïve animals in the LXS panel of
   recombinant inbred (RI) mouse strains. We then combined the sequencing
   data with genotype data, microarry gene expression data, and data on
   alcohol-related behavioral phenotypes such as ’Drinking in the dark’,
   ’Sleep time’, and ’Low dose activation’ from the same RI panel.
   SNP-miRNA-gene triplets with strong association within the triplet that
   were also associated with one of the 4 alcohol phenotypes were selected
   and a Bayesian network analysis was used to aggregate results into a
   directed network model.

Results

   We found several triplets with strong association within the triplet
   that were also associated with one of the alcohol phenotypes. The
   Bayesian network analysis found two networks where a miRNA mediates the
   genetic effect on the alcohol phenotype. The miRNAs were found to
   influence the expression of protein-coding genes, which in turn
   influences the quantitative phenotypes. The pathways in which these
   genes are enriched have been previously associated with alcohol-related
   traits.

Conclusion

   This work enhances association studies by identifying miRNAs that may
   be mediating the association between genetic markers (SNPs) and the
   alcohol phenotypes. It suggests a mechanism of how genetic variants are
   affecting traits of interest through the modification of miRNA
   expression.

Electronic supplementary material

   The online version of this article (10.1186/s12864-018-5004-3) contains
   supplementary material, which is available to authorized users.

   Keywords: microRNA, Ethanol, Bayesian network, Systems genetics

Background

   Non-coding RNAs are defined as biologically functional RNAs that are
   not translated into proteins. They have a variety of functions
   including the regulation of the expression of protein coding genes.
   microRNAs (miRNA) are small non-coding RNAs that bind messenger RNAs
   (mRNA) and promote degradation or repress translation by
   post-transcriptional silencing of the target mRNA [[39]1]. The seed
   region of miRNAs often binds to the 3’ untranslated region (UTR) of the
   target transcripts to down-regulate their translation [[40]2], but in
   some cases miRNAs can also positively regulate the mRNA expression
   [[41]3, [42]4].

   miRNAs have been found to have important roles for many complex traits
   [[43]5]. The role of miRNAs in alcohol-related phenotypes and addiction
   is an emerging research area. There is evidence of miRNA expression
   being associated with both alcohol dependence and alcohol tolerance
   [[44]6–[45]8] and miRNAs have been suggested to be ‘master regulators’
   of alcohol abuse [[46]9]. However, little is known about how miRNAs
   regulate such complex traits. The influence of genetic background on
   alcohol-related traits, shown by several behavioral Quantitative Trait
   Loci (bQTL) studies [[47]10, [48]11], and on miRNA expression shown by
   miRNA expression Quantitative Trait Loci (eQTL) studies [[49]12,
   [50]13], is well recognized. It is reasonable to propose that some
   miRNAs may have a mediating role in the genetic predisposition to
   alcohol-related phenotypes. It can be extremely useful to understand
   such mediation for developing miRNA-based therapy [[51]14]. The area of
   miRNA therapeutics is fairly new, but rapidly developing
   [[52]14–[53]18]. Prediction models that can estimate or infer the trait
   level based on levels of expression of related miRNAs could help
   identify therapeutic targets.

   There has been some general work on miRNA-mediated effects in the
   context of a liver gene regulatory network [[54]19], but there is a
   lack of research on this systems genetics aspect to understand the
   nature of miRNA-mediated effects in alcohol research. A study of
   miRNA-mediated genetic effects requires a statistical framework to
   combine genotype data, miRNA expression data and phenotype data. It is
   also reasonable to include data on expression of protein-coding genes,
   since there is increasing evidence of association of both miRNA and
   gene expression levels with the alcohol-related phenotypes [[55]6,
   [56]20–[57]27]. In this work, we propose a method to integrate these
   different data types to understand the mediating role of miRNAs.

   We have used data from a recombinant inbred (RI) mouse panel known as
   LXS (Inbred Long Sleep × Inbred Short Sleep) [[58]28]. This panel,
   originally developed to study differential sensitivity to ethanol’s
   sedative effects, has previously been used for bQTL analysis of various
   alcohol related traits. We used genotype, miRNA expression, expression
   of protein coding genes and multiple alcohol-related phenotypes from
   the LXS mouse panel to understand the nature of the miRNA-mediated
   genetic effect on predisposition to alcohol-related phenotypes. Because
   both the miRNA and gene expression level were measured on naïve mice
   (no ethanol exposure), associations represent predisposing factors
   rather than responses to alcohol. For the sake of simplicity, we will
   use the term ‘gene’ to mean ‘protein coding gene’ for the rest of the
   paper.

   The traditional methods [[59]29] to test mediation with potentially
   multiple mediators (miRNA and gene) have several drawbacks. The most
   important problem is that it is not possible to learn from the data
   whether the mediation is serial or parallel, and if assumed serial,
   what the order of the mediators should be. For example, for a
   correlated miRNA-gene pair, we do not know for certain if miRNA
   expression influences gene expression or it is the other way around.
   The situation can be more complicated when there are multiple miRNAs
   and/or multiple genes. Bayesian network models can be used to overcome
   such limitations of a frequentist mediation approach [[60]30]. We have
   used a Bayesian network analysis to learn the network structure and
   make predictions. This analysis included necessary adjustments to the
   usual structure learning methods to make them compatible with our data.
   The results from such analysis contribute to a better understanding of
   the miRNA mediated genetic effect.

   We report multiple miRNA mediated pathways from genotype to phenotype
   using Bayesian network analysis. In addition we investigated the genes
   targeted by the miRNAs and the pathways in which they are enriched.
   Finally, we have included the predicted changes in the associated
   phenotypes when the miRNA expression level changes. Such predictions
   were possible due to the nature of our statistical modeling and can be
   helpful for future in silico testing and targeted drugs.

Methods

Animals

   The LXS RI panel reported by Markel et al. [[61]31] originally
   consisted of 77 strains. This sample size is larger than most RI
   panels, which helps to achieve higher statistical power. The panel was
   generated from crosses between the ILS (Inbred Long Sleep, abbreviated
   to L) and ISS (Inbred Short Sleep, abbreviated to S) strains of mice.
   ILS and ISS mice were reciprocally intercrossed and the F1 mice are
   intercrossed to produce F2 progeny. Pairs of the F2 mice are then
   repeatedly inbred to produce the inbred lines (See [[62]32] for
   details). RI panels have the advantage of being virtually renewable in
   the sense that they can be genotyped once and used for behavioral,
   physiological, and molecular phenotyping repeatedly over many
   generations since mice within a strain can be considered to be
   genetically identical [[63]33]. This enables us to use existing
   phenotype data from different sources as long as they correspond to the
   same LXS strains. However, none of the original datasets from the
   different sources had all the 77 strains. The genotype data, mRNA
   expression data, and miRNA expression data consisted of 66, 60, and 59
   strains, respectively, with 59 strains belonging to all three.
   Therefore, when conducting the analysis for each phenotype, we only
   used the strains that were present in all genomic datasets (i.e. miRNA,
   mRNA and SNPs) as well as that phenotype data. The list of the strains
   available for each dataset is reported in Additional file [64]1: Table
   S1.

Alcohol related phenotypes

   We have used existing data on three different alcohol-related
   phenotypes for this study. For the rest of the paper the term
   ‘phenotype’ denotes ’alcohol-related phenotype’. Descriptive figures
   showing the distribution of the phenotypes across the strains and the
   pairwise associations between the three phenotypes are included in the
   Supplementary Materials (Additional file [65]1: Figures S1 and S2).

Drinking in the dark (DID)

   In a free-choice ethanol consumption paradigm, mice are given limited
   access (for two hours) to 20% ethanol during the early phase of their
   circadian dark cycle for four consecutive days. The previously
   published DID phenotype was measured in grams of ethanol per kilogram
   of body weight in male mice on the third day of alcohol consumption
   [[66]25]. This phenotype is related to ‘binge drinking’. The DID data
   consisted of 38 strains, but had 33 strains that were also in the three
   genomic datasets.

Low dose activation (LDA)

   For LDA, the mice (male) were given ethanol or saline injections on
   different days. Difference in total distance traveled in centimeters
   between the day when ethanol injection (1.8 g/kg) is given and the day
   when an injection of saline is given was measured as LDA [[67]34]. This
   phenotype is generally thought to be a measure of sensitivity to low
   dose of ethanol. The LDA data consisted of 72 strains, but had 57
   strains that were also in the three genomic datasets.

Loss of righting reflex (LORR)

   LORR, also known as Sleep Time, is the phenotype the LS (Long Sleep)
   and SS (Short Sleep) mice were originally selected for. Mice were given
   an intraperitoneal dose of ethanol and placed on their backs in a
   v-shaped tray. LORR was measured by the difference in time (minutes)
   between the regain of the righting reflex and the time of the initial
   loss of the righting reflex [[68]35]. Both male and female mice were
   used for this experiment and the strain means were reported. LORR is a
   measure of the sensitivity of an animal to the hypnotic effects of a
   high dose of ethanol. The LORR data consisted of 76 strains, but had 58
   strains that were also in the three genomic datasets.

miRNA expression data

   The miRNA expression dataset used was obtained from a subset of the
   panel with multiple mice per strain. Animal breeding was conducted in
   the specific pathogen-free facility at the Institute for Behavioral
   Genetics, Boulder, CO. A total of 175 male mice (59 LXS strains, 2–3
   biological replicates per strain) were sacrificed using CO[2]
   inhalation followed by decapitation during the light phase and had
   total RNA extracted from whole brain tissues and fragments between
   20–35 bp were selected during the library preparation. Libraries were
   sequenced on the Illumina HiSeq 2500 platform using single-end 50 base
   pair reads [[69]36]. The trimmed reads were mapped using a novel k-mer
   matching method [[70]37] to quantify the number of sequencing reads per
   individual miRNA. Using this method SNPs for each strain were accounted
   for in the individual reference miRNAs so that there would not be a
   mapping bias against those miRNAs. In addition, we used miRDeep2 to
   identify 362 putative novel miRNAs using the clipped reads and mouse
   miRBase v20. Following mapping and quantitation, filtering of miRNAs
   with low counts (less than 5 samples having at least 10 counts),
   normalization and batch correction were performed [[71]36]. The
   filtered data consisted of 881 miRNAs including 86 novel miRNAs. A
   variance stabilizing transformation (VST) [[72]38] was used to
   transform the read count data to address heteroscedasticity. For our
   analysis, we collapsed the observations within a strain using the
   average VST expression across the biological replicates.

Genotype data

   Existing genotype data on the same 59 LXS strains were available from
   Yang et al. [[73]39]. The genotype data identified approximately 40000
   different Single Nucleotide Polymorphisms (SNPs), but only 34642 were
   informative with different alleles for the parental strains and with no
   missing data. Many of the 34642 SNPs had the same Strain Distribution
   Pattern (SDP). Two SNPs are defined to have the same SDP if they are in
   complete linkage disequilibrium, i.e., there is no recombination in any
   of the strains between the two SNPs. The data were summarized to 1416
   unique SDPs. We used these SDPs for all the statistical analysis but
   reported the original physical locations of the SNPs corresponding to
   each SDP. The list of all SDPs and summary of the information on SNPs
   corresponding to them is reported in Additional file [74]2: Table S2.

Gene expression data

   For measures of brain mRNA expression levels, the public data set on 60
   LXS strains and the two parental strains (n=4 to 6 male mice per
   strain) was downloaded as Affymetrix Mouse Exon 1.0 ST Array
   (Affymetrix, Santa Clara, CA) CEL files from the PhenoGen website
   ([75]http://phenogen.ucdenver.edu; [[76]11, [77]40]). The probe mask
   described previously in Vanderlinden et al. [[78]11] was used to
   eliminate low integrity probes, i.e., probes that that did not align
   uniquely to the mm10 version of the mouse genome or aligned to a region
   of the genome that harbored a sequence polymorphism between either
   parental strain and the C57BL/6J reference strain. The remaining probe
   sets were compared to the Ensembl GRCm38/mm10 version of the
   transcriptome in mouse. Probe sets targeting the same Ensembl gene were
   aggregated into a single expression estimate on the log base 2 scale
   for each sample using the rma-sketch pipeline for normalization and
   aggregation in Affymetrix Power Tools [[79]41, [80]42]. Normalized
   expression estimates were adjusted for batch effects using ComBat
   [[81]43] and all results are reported at the gene level. For our
   analysis, we selected the 59 strains for which miRNA data are available
   and collapsed the RNA expression levels of the individual mice within
   strains using the average expression level for each strain.

Statistical analysis

Identification of ‘cohesive’ quadruples

   The summary of the statistical analysis is shown in Fig. [82]1. We
   first selected candidate SDP-miRNA-gene triplets for which all the
   pairwise Pearson correlations between the three variables are strong
   (nominal p-value <10^−3, step ([83]a) in Fig. [84]1). We call them
   ‘cohesive’ triplets. Then we calculated the correlation of the
   phenotypes with all three variables of the selected candidate triplets.
   SDP-miRNA-gene-phenotype quadruples for which the phenotype is
   significantly associated (nominal p-value <0.05, step ([85]b) in
   Fig. [86]1) with each of the three components of a chosen triplet are
   selected as ‘cohesive’ quadruples. We used a more stringent p-value
   threshold for the molecular traits since a more direct relationship is
   expected between them. The list of all ‘cohesive’ quadruples is
   reported in Additional file [87]3: Table S3 and the distribution of the
   correlation between the phenotypes and the ‘cohesive’ triplets is shown
   by Additional file [88]1: Figure S3.

Fig. 1.

   Fig. 1
   [89]Open in a new tab

   Analytical pipeline to identify miRNA-mediated netowrks associated with
   alcohol-related phenotypes. The various steps in the flowchart are a
   Select triplets for which all 3 variables are strongly correlated
   (p<10^−3) with each other; b Select quadruples for which the phenotype
   is significantly associated (p<0.05) with each of the 3 components of a
   chosen ‘cohesive’ triplet; c Bayesian Network Analysis separately for
   each quadruple: Select the quadruple for next step if the best network
   using the quadruple has a miRNA mediating the effect of the SNP on the
   phenotype (See details in Fig. [90]2); d Advanced Bayesian networks:
   miRNAs and genes that were associated with the same phenotype and an
   SDP from the same region of the genome were combined into larger
   networks

   Due to the large number of correlated variables involved in the
   analysis, it is extremely difficult to adjust for multiple testing
   within this filtering step, and it remains an open problem how to make
   such adjustment. However, we argue that our use of nominal p-values in
   the selection of cohesive quadruples is unlikely to result in many
   false positives in the final results due to the stringent threshold
   used in the later stages. We did not use standard genome-wide cutoff
   since our focus is on identifying miRNA mediating networks, and this is
   used merely the first step to choose interesting ‘cohesive’ triplets
   for the subsequent filtering. We evaluated the QQ-plots for all the 3
   pairwise associations within a triplet to determine where the points
   deviated from the diagonal line and chose a significance threshold of
   10^−3, which strikes a balance across all 3 pairwise associations
   (Additional file [91]1: Figure S4).

Initial Bayesian Network Analysis

   Next, Bayesian Network Analysis (BNA) was performed to learn network
   structure and discover potential miRNA mediated networks (step ([92]c)
   in Fig. [93]1). We performed BNA separately for each quadruple. When
   directing edges within the network we forced any edge with the SDP to
   be directed away from the SDP and every edge with the phenotype to be
   directed towards the phenotype. Since SDPs are genetic elements encoded
   in the mouse genome, it is realistic to use them as ‘causal anchors’ by
   the principle of Mendelian randomization [[94]44, [95]45]. The
   phenotype, on the other hand, can only be a response and cannot affect
   the miRNA or gene expression since the expression data are obtained
   from naive mice. There are 96 possible network structures for each
   quadruple satisfying these properties.

   A Bayesian Information Criterion (BIC) score-based network learning
   procedure was adopted. The BIC score is defined as
   [MATH: <mtext
   mathvariant="italic">BIC</mtext><mo>=</mo><mo>log</mo><mo>(</mo><mtext
   mathvariant="italic">Likelihood</mtext><mo>)</mo><mo>−</mo><mfrac><mrow
   ><mi>d</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mo>log</mo><mo>(</mo>
   <mi>n</mi><mo>)</mo> :MATH]
   1

   where n is the sample size and d is the number of parameters of the
   whole network. A higher value of BIC indicates greater support for the
   model.

   For each quadruple, we performed an exhaustive score-based search
   across all 96 possible network structures to determine whether the
   network structure with the highest score has the miRNA as a mediator
   between the SDP and the phenotype. In order to have a high confidence
   about the mediating role of the miRNA, we also compared it to the
   highest score among networks where the miRNA is not a mediator. We
   chose the SDP, miRNA and phenotype from the current quadruple for the
   next step only if the BIC clearly favored the model with the miRNA as
   the mediator, i.e., the difference between the two scores is greater
   than 2 (Fig. [96]2, Step 1). It is typically recommended that a
   difference of 1 in the BIC scores is needed to claim one model to be
   better than another [[97]46] (Note that BIC as defined in the
   referenced paper [[98]46] differs from our definition by a constant).
   If the quadruple met this criterion, we also evaluated the inclusion of
   the gene within the path from SDP to phenotype. For this comparison, we
   were more permissive with the difference in BIC scores (Fig. [99]2,
   Step 2). The intent of our analysis was to identify miRNA-mediated
   effects, therefore our criterion for the mediating effect of the miRNA
   was conservative. With genes, we were more concerned with missing a
   biologically relevant member of a pathway that has met our initial
   stringent BIC criteria. For a gene to be included, we compared the
   highest BIC among networks with both the miRNA and the gene in the path
   between the SDP and the phenotype to the highest BIC among networks
   with the miRNA in the path from SDP to phenotype but not the gene.
   Using a non-inferiority framework, we require the difference in BIC to
   be greater than −1 for the inclusion of the gene in the final model
   (Fig. [100]2, Step 2). These choices of SDP, miRNA, gene and phenotype
   are not dependent on any optimization algorithm since we conducted
   exhaustive search among all possible networks.

Fig. 2.

   [101]Fig. 2
   [102]Open in a new tab

   Identification of miRNA mediated quadruples (Step c in Fig. [103]1).
   For each ‘cohesive’ SDP-miRNA-gene-phenotype quadruple, we perform the
   following steps. Step 0: Compute the BIC scores for all possible
   network structures that satisfy scientific assumptions. The network
   structure with the highest score can be considered as the ’most
   probable’ network for this quadruple. Step 1: We choose the SDP, miRNA
   and Phenotype for building larger network if the quadruple passes the
   threshold for the BIC-difference in this step. Step 2: Also choose the
   gene for building larger network if the quadruple passes the threshold
   for the BIC-difference in this step

   We used methods for Gaussian Bayesian Network within the R-package
   bnlearn [[104]47] for the purpose of structure learning. It is a common
   practice to assume that the normalized gene expression and phenotypes
   follow normal distribution. Also, the miRNA expression obtained from
   sequencing was transformed using a variance stabilizing transformation
   [[105]38] which can then be treated as Gaussian. The variable SDP is
   binary and cannot be transformed to a normal random variable.
   Therefore, we modified the network learning methods to accommodate
   hybrid Bayesian networks [[106]48]. The modification of the likelihood
   is simple due to the fact that the binary random variable SDP is always
   the causal anchor and the likelihood only involves its unconditional
   density.

Combining chosen quadruples to obtain larger network

   Finally, we combined all miRNAs and genes which are associated with the
   same phenotype and also associated with SDPs from the same region of
   the chromosome (within 40 megabases from each other) into bigger
   networks (step ([107]d) in Fig. [108]1). The SDPs physically located
   near one another were combined since SNPs that are physically close
   with each other often show a high Linkage Disequilibrium (LD) pattern
   and associations of a trait (behavioral or molecular) with multiple
   such SNPs may just be due the fact that the SNPs are highly correlated
   with each other. Therefore, we believe that such physically close SDPs
   are likely to be part of the same larger network. However, the miRNAs
   and genes did not necessarily have to be physically close to the SDPs.
   At this final stage the more complex networks are learned using a
   hill-climbing algorithm [[109]49]. To ensure that the learned
   structures are stable, we used bootstrapping to repeat the structure
   learning for 500 bootstraps and used network averaging to combine the
   results [[110]50, [111]51]. Network averaging retains an edge and a
   direction if it appears in more than 50% of the 500 cases. It is
   possible for some nodes to be not connected to any other nodes in the
   network, such disconnected nodes are not presented as part of the final
   network.

Finding pathways enriched for genes with binding sites for the mediating
miRNAs

   Using predicted/validated target databases, we also examined the miRNAs
   from the final networks to obtain the pathways in which their
   predicted/validated target genes are enriched. We used the tools
   multiMiR [[112]52], miRmap [[113]53] and DIANA-miRPath v3.0 [[114]54]
   for the analysis.

Results

   We obtained 2916 candidate ‘cohesive’ quadruples for which the SDP,
   miRNA, gene and phenotype were strongly associated with each other.
   Thirty nine quadruples included DID, 2231 included LDA and 646 included
   LORR. Figure [115]3 illustrates the association of the four variables
   (SDP, miRNA, gene and phenotype) with each other for two such
   quadruples. In the first example, LDA is negatively correlated with the
   expression of the gene Ano5 (Anoctamin 5) and the miRNA miR-7057-5p,
   and positively associated with the ISS allele (Fig. [116]3[117]a). In
   the second example, LORR is negatively correlated with the expression
   of the gene Terf2 (Telomeric repeat binding factor 2) and the ISS
   allele, but positively correlated with the novel miRNA
   (Fig. [118]3[119]b).

Fig. 3.

   [120]Fig. 3
   [121]Open in a new tab

   Quantitative relationships between SNP-miRNA-gene-phenotypes quadruples
   contained within the final network models. Scatter plots of the gene
   expression (in log base 2 scale) and miRNA expression in the causal
   pathway with a LDA b LORR. The color of the points represent the ISS
   (red) or ILS (blue) alleles for the associated SDP. The value of the
   correlation coefficient is printed on the top of each scatter plot, the
   p-values being smaller than the thresholds shown by Fig. [122]1 in each
   case

   Three networks were obtained from our final step of the analysis, of
   which one was not miRNA-mediated (Additional file [123]1: Figure S1),
   when all SDP, miRNA, and genes were included in the model. Of the two
   miRNA-mediated networks, the first network involves the miRNA
   miR-7057-5p (Fig. [124]4), targeting the gene, Ano5, which in turn
   influences the level of the phenotype LDA. The network also involves
   the gene Nell1 (Neural EGFL Like 1). The miRNA, the SDP, and the genes
   are all located on chromosome 7. Predicted target sites for miR-7057-5p
   in the 3’ UTRs of both Ano5 and Nell1 are found using miRmap [[125]53].
   The pathway enrichment analysis found two pathways, Cell Adhesion
   Molecules and Extracellular Matrix Receptor Interaction, in which the
   genes predicted to be targeted by the miRNA miR-7057-5p are enriched
   (Table [126]1).

Fig. 4.

   [127]Fig. 4
   [128]Open in a new tab

   Bayesian network for Low Dose Activation (LDA). a A Bayesian network
   for miRNA mediated genetic effect on LDA. The thickness of the arrows
   represent the proportion (to scale) of bootstraps for which the edge is
   present (ranging from 0.53 to 0.99) and the darkness of the arrows
   represent the proportion of bootstraps for which the edge has the same
   direction (ranging from 0.54 (gray) to 1 (black)). b Illustration of
   the relative locations of the SDPs, genes and miRNA in the mouse genome
   (not to scale). The start position of the genes and miRNA are reported.
   The location of an SDP indicates the range of the original physical
   locations of the SNPs with the same SDP

Table 1.

   Pathways enriched for genes with known binding sites to miR-7057-5p
   miRNA       Pathway                 FDR    Genes targeted by miRNA
   miR-7057-5p Cell adhesion molecules 0.0031 Itga6, Itgb2, Cd34, Itgav,
                                              Cadm3, Itgb1, Pdcd1
               Extracellular Matrix    0.0063 Itga6, Sv2c, Itgav, Itgb1
               Receptor Interaction
   [129]Open in a new tab

   DIANA-miRPath v3.0 was used for the analysis. No pathways were enriched
   for the novel miRNA in Fig. [130]5

   The second network (Fig. [131]5) involves a novel miRNA (mature
   sequence: CGGGACACCTGAGCTGCCTCTCCT) targeting the gene Terf2, which in
   turn influences the level of the phenotype LORR. The novel miRNA was
   not found to be homologous to any other known miRNA upon a homology
   search using SSEARCH and BLASTN in miRbase (E-value cutoff 1). The
   miRNA, the SDP, and the gene are all located near one another on
   chromosome 8 indicating local regulation (i.e. eQTL) of both the miRNA
   and the mRNA. The proportions of the bootstrap samples where the edges
   were detected are particularly high (>0.85 for presence and >0.99 for
   direction of the edges) in the network involving LORR which provides a
   high confidence about the learned network. The exact values of the
   proportions are reported in Additional file [132]1: Figures S6 and S7.
   For both networks, there are some SDPs that are not connected with any
   other node in the network, and they are not shown in the final
   networks.

Fig. 5.

   [133]Fig. 5
   [134]Open in a new tab

   Bayesian network for Loss Of Righting Reflex (LORR). a A Bayesian
   network for miRNA mediated genetic effect on LORR. The thickness of the
   arrows represent the proportion (to scale) of bootstraps for which the
   edge is present (ranging from 0.85 to 1) and the darkness of the arrows
   represent the proportion of bootstraps for which the edge has the same
   direction (ranging from 0.99 to 1). b Illustration of the relative
   locations of the SDPs, genes and miRNA in the mouse genome (not to
   scale). The start position of the genes and miRNA are reported. The
   location of an SDP indicates the range range of the original physical
   of the SNPs with the same SDP

   Using the fitted Bayesian network models, we determined the predicted
   changes in the phenotypes for 2-quartile or 4-quartile change in the
   normalized miRNA or gene expression (Fig. [135]6). We note that the
   activation of the corresponding genes reduces the magnitude of both LDA
   or LORR, i.e. decreases the alcohol sensitivity. However, the
   corresponding miRNA miR-7057-5p is positively associated with the gene
   Ano5 for LDA while the novel miRNA is negatively associated with the
   gene Terf2 for LORR. Therefore, the effect of the mediating miRNA on
   the phenotype is positive for LORR and negative for LDA.

Fig. 6.

   [136]Fig. 6
   [137]Open in a new tab

   Illustration of prediction based on the fitted Bayesian network models.
   The figure illustrates the changes in the phenotypes when the
   expression of the gene or miRNA is increased from the first quartile to
   the third quartile (2-Quartile difference) or from the minimum to
   maximum (4-Quartile difference). The bar indicates the change in the
   phenotype. a LDA: The miRNA is miR-7057-5p and the gene is Ano5. b
   LORR. The miRNA is a novel miRNA and the gene is Terf2

   It is important to note that all of these network structures were
   learned, and we did not force the phenotype (or any other variable) to
   have a connected edge in the network. It is possible to not have any
   miRNA-mediation in the final Bayesian network model even when all the
   variables are strongly associated. For example, for some of the
   candidate genes and miRNAs, when the cohesive quadruples were combined
   for a larger network, the larger network did not have any connected
   edge with the corresponding phenotype, LORR (Additional file [138]1:
   Figure S5).

Discussion

   With the increase in the volume of high-throughput omics data and the
   advent of less expensive sequencing technologies, more researchers are
   looking to integrate data of different types to learn more about
   functional mechanisms. Our statistical framework based on BNA is an
   effective way to incorporate the different types of data in an unified
   analysis instead of conducting separate analyses such as miRNA eQTL,
   gene eQTL and bQTL for alcohol phenotypes. BNA can be considered as a
   way of decomposing a large joint probability distribution, but it can
   also serve as a causal probability network model [[139]55]. The method
   enables us to determine miRNA-mediated predisposition, and the use of
   BNA ensures that we learn the network structure rather than making
   prior assumption about the direction of the edges. Although in our
   final models, the genes, the miRNA, and the SNPs were co-localized,
   this co-localization was not forced in the procedure and the method
   should be equally able to detect distal regulations, when present.

   Special care was taken to identify the networks for which we have a
   strong evidence about the mediating role of the miRNA. The thresholds
   used for BIC differences are arbitrary but commonly used. We used a
   more stringent threshold for including the mediating miRNA in the final
   model to make sure that the miRNA is indeed playing a mediating role,
   while we used a permissive threshold for including a gene to make sure
   we do not exclude a relevant gene. The use of bootstrap for learning
   the structure of the larger network minimizes the effect of sampling
   fluctuation and potential outliers on the final models. We also used a
   random starting network approach where we used different networks as
   starting points instead of starting from an empty network [[140]50,
   [141]51], and the results were similar (data not shown). The use of
   bootstrap was not necessary for the final network obtained in
   Fig. [142]5 since we could use the exhaustive search and therefore have
   more confidence about the network learned.

   We obtained two different miRNA-mediated networks involving the
   phenotypes LDA and LORR. No miRNA-mediated networks involving DID was
   found, which is likely to be due to the much smaller sample size for
   the DID data (only 33 strains compared to 57 and 58 for LDA and LORR,
   respectively). Ano5, the gene targeted by the miRNA miR-7057-5p and
   associated with LDA, encodes a protein which is likely a calcium
   activated chloride channel (CACC). CACCs are known to be associated
   with hypnotic ethanol responses in rats after a high dose of ethanol
   [[143]56]. LDA is also a measure of sensitivity to alcohol, although
   the effect of high dose and low dose of alcohol could be different. It
   has been shown by other studies that LORR (sensitivity to high dose)
   and LDA (sensitivity to low dose) have an inverse relationship, and it
   is possible that they have similar genetic sources of variation
   [[144]34]. The predicted target genes for miR-7057-5p are enriched in
   the pathways ‘Cell Adhesion Molecules’ (CAM) and ‘Extracellular Matrix
   (ECM) receptor Interaction’. CAMs and ECM Receptors are known to be
   associated with diseases or nervous system and brain, and addiction
   including alcohol use disorders [[145]57–[146]60]. The second network
   involved the gene Terf2 targeted by a novel miRNA and associated with
   LORR. Terf2 have been shown to have important role in telomere
   homeostasis and brain development in mice [[147]61]. We have also
   reported the predicted change in the phenotypes for a fixed change in
   the normalized miRNA expression or the gene expression (Fig. [148]6).
   The direction and magnitude of the predicted change can be helpful for
   developing targeted drugs.

   The results have the same limitations of any in vitro study. However,
   the results from this paper provide strong candidates for future
   validation. The stringent threshold for the BIC difference implies high
   confidence about the mediating role of the miRNAs we reported. For the
   LORR results, we must assume that the genetic sources of variation in
   the phenotype are the same in male and female mice since the data
   contained average measurement for male and female mice while every
   other dataset were based on male mice only. There is also an amount of
   uncertainty (of causal SNP) introduced by linkage disequilibrium. The
   causal SNP may not be uniquely identifiable since many SNPs have the
   same SDP, and it may require additional studies for more accurate
   mapping.

Conclusions

   This work enhances association studies by identifying miRNAs that may
   be mediating the association between SNPs and an alcohol phenotype. We
   proposed a statistical approach that can identify different mechanisms
   of how genetic variants are affecting traits of interest through the
   modification of miRNA expression. In particular, we have identified two
   miRNA-mediated networks. We also incorporated gene expression data to
   better understand the functional mechanism and to evaluate alternative
   drug targets. The ability to detect miRNA-mediated effects and to
   predict the level of an alcohol related trait based on the miRNA
   expression provides an opportunity to identify targets for miRNA
   therapeutics.

Additional files

   [149]Additional file 1^ (1.1MB, pdf)

   Supplementary Materials. (PDF 1105 KB)
   [150]Additional file 2^ (82.5KB, csv)

   Supplementary Table 2. (CSV 83 KB)
   [151]Additional file 3^ (469.5KB, csv)

   Supplementary Table 3. (CSV 470 KB)

Acknowledgments