Abstract Background Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion. Results We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p <10^−100), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately −9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (−9435 to −8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions. Conclusion We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0460-0) contains supplementary material, which is available to authorized users. Keywords: Cardiac differentiation, Network inference, Logistic regression, Time-varying dynamic Bayesian model, Data integration, Gene regulatory network Background Decoding the temporal control of gene expression patterns is essential to understand the complex mechanism of developmental regulatory events during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions [[31]1-[32]6]. For example, in several studies, RNA-seq and histone modification ChIP-seq experiments were performed to profile the changes in global gene expression and the chromatin state at distinct stages of cardiac differentiation from ESCs to cardiomyocytes in human and mouse [[33]1,[34]3]. In these reports, the authors reported changes in chromatin modification patterns associated with gene activation and identified stage specific distal enhancer elements. He et al., outlined the candidate binding sites of five known cardiac transcription factors (TFs) (Gata4, Nkx2-5, Tbx5, Srf and Mef2a), which were identified using ChIP-seq [[35]2]. Moreover, Schlesinger et al. knocked down each of the four key cardiac transcription factors (Gata4, Mef2a, Nkx2-5 and Srf) in HL-1 cells using RNA interference, followed by the profiling of the changes in global gene expression [[36]4]. Although these studies presented a novel and global perspective for the examination of the chromatin status and the prediction of transcriptional regulation, they were limited in the types of data that were integrated [[37]1,[38]3] and they based their initial screening on a small set of candidate TFs [[39]2,[40]4]. As large-scale multi-dimensional data are accumulating at an unprecedented pace, there is a pressing need to develop systematic methods to integrate these data from individual studies and infer the dynamic gene regulatory networks (GRN) during cardiac differentiation in an unbiased manner. Time series expression profiles based on microarray and/or more recently RNA-seq data have been widely used to reconstruct the static networks, that is, networks with invariant topology over a given set of genes [[41]7-[42]11]. However, because the GRN at a particular time point depends on a specific biological context, it can undergo systematic rewiring rather than being invariant over time. Therefore, recent research has focused on inferring the dynamic (time-varying) networks over the time course [[43]1-[44]4,[45]12-[46]15]. A key technical hurdle to precisely reconstruct dynamic networks based solely on temporal expression data is that there are too many unknown variables to be estimated (i.e. (T-1)p ^2 network edges). Some attempts have been made to circumvent this difficulty including: factorizing gene-gene regulatory relationships into modular effects [[47]1,[48]3,[49]11,[50]14], deconvolving the observed indirect effects into direct effects [[51]2,[52]16], or smoothing the edge weight between the networks of neighboring time points [[53]4,[54]13,[55]17]. However, the overall performance of reconstructing GRN based solely on temporal expression profiles is still limited [[56]1,[57]3,[58]18]. One widely used strategy to infer the causal relationship in GRN is to over-express or repress the key TFs and measure the change in global expression. The significantly up- or down-regulated genes may be either directly or indirectly regulated by the perturbed TFs. This strategy has been successfully utilized and several examples include: the GRN in sea urchin embryonic development [[59]2,[60]4,[61]19], the early response of GRN in embryonic stem cells (ESC) [[62]7-[63]11,[64]20], and the cardiac GRN involving several key cardiac genes [[65]4]. Perturbation-based methods can, in theory, greatly improve the prediction accuracy for downstream targets, as compared with the methods solely based on temporal expression profiles [[66]18]. The limitation of this strategy is that it is unrealistic to perturb all TFs in the mammalian genome in a specific context and it is not easy to distinguish direct effects from indirect effects in the readout. The most common strategy used to discover the direct regulatory relationship is to combine the TF information and temporal expression profiles [[67]2,[68]12,[69]21-[70]23]. The general assumption is that a gene can be regulated by a TF if its promoter or enhancer regions are occupied by the TF. The TF binding sites (TFBS) within the putative regulatory region of a gene are identified by either scanning the known positional weight matrix (PWM) representing a relatively short (5–20 nucleotides) degenerative sequence motif recognized by a TF, or by TF ChIP-seq experiments. Although PWMs have been defined for the TFBSs of more than 500 TFs in vertebrates by various techniques [[71]24-[72]32], the sensitivity and specificity are generally low when used to predict putative binding sites [[73]33]. Alternatively, TF ChIP followed by sequencing or microarray analyses emerged as the standard approach to directly determine the bona fide TFBS. However, because ChIP-seq experiments are still relatively expensive and labor-intensive, and the TFBSs tend to vary in distinct biological contexts, for example, only 7.14% of enhancers identified in ESCs are overlapped with the enhancers in heart [[74]34-[75]37], the number of available TF ChIP-seq datasets is still limited. Moreover, for most TFs in the genome, there are no ChIP-seq datasets available. For example, in ChIPBase, only 12 and 5 TFs have corresponding ChIP-seq data in ESCs and cardiomyocyte HL-1 cells, respectively [[76]38]. At present, there is no consensus regarding whether ChIP-seq data obtained in one cell type can be readily applied to predict TFBS in another cell type. Moreover, it is unclear whether or not we can adapt the information from the available ChIP-seq results and predict the binding sites of TFs with only PWM information in a specific biological context (e.g. cell types or developmental stages). In lieu of profiling the binding sites of individual TFs, the general enhancers or regulatory regions have also been mapped by DNaseI hypersensitive sequencing experiments as well as ChIP-seq with p300, histone H3 Lys4 mono-methylation (H3K4me1), histone H3 Lys27 acetylation (H3K27ac) in a wide range of cell types [[77]39-[78]44] including mouse ESCs and the heart [[79]34-[80]36,[81]45]. The genomic loci defined by these marks, however, typically span several hundred or thousand bases, and are generally too broad to define the specific DNA sequences mediating promoter or enhancer functions. It has been proposed that local depletion in the ChIP signal intensity (dip) is indicative of TF binding sites [[82]41]. Thus, several studies have used the structural change of these active marks to discover the functional TFBS among the enhancer regions, either by heuristic methods [[83]1], or by more sophisticated approaches, such as an integrated hidden Markov model [[84]46], logistic regression [[85]47,[86]48], or a hierarchical mixture model [[87]49]. However, these studies usually focused on individual cell types. Moreover, they focus on static regulatory relations and do not fall under the framework of inferring dynamic gene regulatory networks. While each of the aforementioned strategies has its own merits, they also have limitations in the inability to capture the dynamic networks. An integrated approach for network inference, which combines the strengths of all these methods is highly desirable. In this study, we presented a framework to integrate available four-dimensional data: (1) temporal RNA-seq, (2) temporal histone ChIP-seq, (3) TF ChIP-seq and (4) perturbation studies to reconstruct the dynamic networks during cardiac differentiation. Our method not only infers the time-varying networks between distinct stages of heart development, but also identifies the TF binding sites on the promoter or enhancer of the genes being regulated. Results Overview We developed a two-step strategy to infer the dynamic GRN during cardiac differentiation (Figure [88]1). In the first step, based on 17 TFs whose ChIP-seq data are available for either mouse ESCs or cardiomyocyte HL-1 cells (Table [89]1), we trained a logistic regression model to predict the probability for any base being bound by any TFs with known PWMs, at a specific differentiation stage. The model included the context independent features that do not change during differentiation (e.g. base conservation) and context dependent features such as the expression levels of nearby genes, the intensity of histone modifications within defined distances, as well as histone modification changes between adjacent time points. This concept was modified from the work by Ernst et al. that infers a score quantifying the general binding preferences of TFBS [[90]48]. However, it should be noted that,