Abstract

Background

   Knowledge on the molecular targets of diseases and drugs is crucial for
   elucidating disease pathogenesis and mechanism of action of drugs, and
   for driving drug discovery and treatment formulation. In this regard,
   high-throughput gene transcriptional profiling has become a leading
   technology, generating whole-genome data on the transcriptional
   alterations caused by diseases or drug compounds. However, identifying
   direct gene targets, especially in the background of indirect
   (downstream) effects, based on differential gene expressions is
   difficult due to the complexity of gene regulatory network governing
   the gene transcriptional processes.

Results

   In this work, we developed a network analysis method, called
   DeltaNeTS+, for inferring direct gene targets of drugs and diseases
   from gene transcriptional profiles. DeltaNeTS+ uses a gene regulatory
   network model to identify direct perturbations to the transcription of
   genes using gene expression data. Importantly, DeltaNeTS+ is able to
   combine both steady-state and time-course expression profiles, as well
   as leverage information on the gene network structure. We demonstrated
   the power of DeltaNeTS+ in predicting gene targets using gene
   expression data in complex organisms, including Caenorhabditis elegans
   and human cell lines (T-cell and Calu-3). More specifically, in an
   application to time-course gene expression profiles of influenza A H1N1
   (swine flu) and H5N1 (avian flu) infection, DeltaNeTS+ shed light on
   the key differences of dynamic cellular perturbations caused by the two
   influenza strains.

Conclusion

   DeltaNeTS+ is a powerful network analysis tool for inferring gene
   targets from gene expression profiles. As demonstrated in the case
   studies, by incorporating available information on gene network
   structure, DeltaNeTS+ produces accurate predictions of direct gene
   targets from a small sample size (~ 10 s). Integrating static and
   dynamic expression data with transcriptional network structure
   extracted from genomic information, as enabled by DeltaNeTS+, is
   crucial toward personalized medicine, where treatments can be tailored
   to individual patients. DeltaNeTS+ can be freely downloaded from
   [33]http://www.github.com/cabsel/deltanetsplus.

   Keywords: Gene expression, Gene regulatory network, Gene targets, Drug
   discovery, Mechanism of action

Background

   Analyzing the molecular mechanism of drugs and diseases is a central
   task in drug discovery. For drugs, this task involves identifying the
   molecular targets whose interaction with the compound is associated
   with its pharmacological activity. The knowledge of the molecular
   mechanism of action (MOA) of a drug is important in ascertaining not
   only the therapeutic efficacy of this drug, but also any potential
   toxicity and side effects. On the other hand, insights into the
   molecular mechanism of a disease may lead to a better understanding of
   its pathogenesis and possible to new and better treatment formulations.
   In this regard, gene transcriptional profiling has emerged as a viable
   high-throughput platform for drug discovery and drug target
   identification [[34]1, [35]2] and for studying disease mechanism
   [[36]3]. However, determining the direct molecular targets through
   which a drug or a disease exert its effects using gene transcriptional
   profiles remains a major bioinformatics challenge. Changes in the gene
   expression caused by a drug or a disease may arise directly from the
   action of the drug or disease, or indirectly as downstream or secondary
   effects. Delineating direct and indirect gene targets from gene
   transcriptional profiles is further complicated by the fact that gene
   expression is a highly regulated process that involves a complex and
   context-specific gene regulatory network (GRN).

   Many strategies have been proposed for inferring gene targets from gene
   transcriptional data [[37]4]. In general, these strategies fall in two
   main categories: comparative analysis and network analysis methods. The
   former involves comparing the transcriptional profiles of interest with
   a library of reference profiles with known targets [[38]1, [39]2,
   [40]5, [41]6], under the assumption that a likeness in the
   transcriptional profiles is indicative of similarity in the molecular
   targets. The latter class of methods uses a model of GRN to account for
   gene transcriptional regulations when analyzing gene expression data. A
   number of network analysis methods, notably causal analysis [[42]7,
   [43]8] and DeMAND [[44]9], employs graph models of the GRNs with nodes
   representing the genes and edges representing gene–gene interactions.
   Here, gene target identification is typically formulated as a
   statistical hypothesis test using the gene transcriptional profiles.
   Another group of network analysis methods, including network
   identification by multiple regression (NIR) [[45]10], mode of action by
   network identification (MNI) [[46]11], sparse simultaneous equation
   model (SSEM) [[47]12], and DeltaNet [[48]13], relies on a mechanistic
   model of the gene transcriptional regulatory network. By invoking a
   pseudo-steady-state assumption, the inference of gene targets from gene
   expression data is recasted as a regression problem. Generally
   speaking, a gene is scored highly as a potential target when its
   expression deviates significantly from what the GRN model predicts
   based on the expression of its transcription factors (TFs).

   Here, we present a much-improved network analysis method of our
   previous algorithm DeltaNet [[49]13] to address two shortcomings:
   analysis of time-series data and incorporation of prior information on
   the GRN structure. As we have demonstrated earlier [[50]14], DeltaNet
   performed poorly when using time-series transcriptional data due to the
   invalidity of the underlying steady-state assumption. Such an issue
   likely applies to similar algorithms employing pseudo steady-state
   assumption, such as NIR, MNI and SSEM. There is a clear need to
   accommodate time-series expression data, as they constitute an
   important class of gene expression datasets. A notable example is the
   Connectivity Map (CMap), which includes over 1.5 million time-series
   gene expression profiles from 9 different cell types across roughly
   5000 small-molecule compounds and 3000 genetic reagents.

   The second shortcoming of any network analysis methods is the
   uncertainty in the gene network model employed in the analysis. In
   DeltaNet, as well as in NIR, MNI and SSEM, the GRN is reconstructed
   from the gene expression data as an intermediate step or implicitly in
   the inference. But, as we and many others have shown [[51]15, [52]16],
   such GRN reconstructions constitute an undetermined problem and thus
   the reconstructed GRN often has high uncertainty. In parallel, we have
   seen a tremendous progress over the past decade in the mapping of
   transcriptional regulatory elements [[53]17], the measurements of
   promoter/enhancer activity [[54]18], and the identification of TF
   binding motifs and binding sites [[55]19]. Such information has enabled
   the reconstruction of GRN graphs for ~ 300 cell types in human
   [[56]20]. Thus, there is an obvious opportunity and necessity to
   combine diverse datasets on the GRN structure with the gene
   transcriptional data within the network analysis paradigm for gene
   target inference.

   In this work, we developed DeltaNeTS+. DeltaNeTS+ is capable of
   combining steady-state and time-series gene transcriptional data for
   gene target scoring. DeltaNeTS+ is further able to incorporate prior
   information of the structure of the GRN for the target inference. We
   demonstrated the superiority of DeltaNeTS+ over well-established
   procedures, including TSNI (Time Series Network Identification)
   [[57]21], DeMAND [[58]9], and differential expression analysis, using
   gene transcriptional datasets of complex organisms, including C.
   elegans and human. Notably, the application of DeltaNeTS+  to gene
   expression datasets from human lung Calu-3 cells infected by influenza
   strains H5N1 (avian flu) and H1N1 (swine flu) reveals key differences
   in the cellular responses to the two strains.

Methods

DeltaNeTS+

   DeltaNeTS+ is a network analysis method for inferring causal gene
   targets from time series gene expression data. DeltaNeTS+ relies on an
   ordinary differential equation (ODE) model of gene transcriptional
   process, as follows [[59]22]:
   [MATH:
   <mrow><mfrac><mrow><mi>d</mi><msub><mi>r</mi><mi>k</mi></msub></mrow><m
   row><mi
   mathvariant="italic">dt</mi></mrow></mfrac><mo>=</mo><msub><mi>u</mi><m
   i>k</mi></msub><munderover><mo
   movablelimits="false">∏</mo><mrow><mrow><mtable><mtr><mtd><mrow><mi>j</
   mi><mo>=</mo><mn>1</mn></mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow>
   <mrow><mi>j</mi><mo>≠</mo><mi>k</mi></mrow></mrow></mtd></mtr></mtable>
   </mrow></mrow><mi>n</mi></munderover><msubsup><mi>r</mi><mrow><mi>j</mi
   ></mrow><msub><mi>a</mi><mrow><mi
   mathvariant="italic">kj</mi></mrow></msub></msubsup><mo>-</mo><msub><mi
   >d</mi><mi>k</mi></msub><msub><mi>r</mi><mi>k</mi></msub></mrow> :MATH]
   1

   where r[k] denotes the concentration of gene k transcripts (mRNA), u[k]
   and d[k] denote the transcription and degradation rate constants of
   gene k, respectively, a[kj] denotes the regulatory control of gene j on
   gene k, and n is the number of genes. The magnitude and sign of a[kj]
   indicate the strength and mode of the regulatory interaction,
   respectively. A positive a[kj] indicates activation, while a negative
   a[kj] indicates repression. As commonly done, a[jj]’s are assumed to be
   zero (i.e., no self-regulatory loops). Taking pseudo steady-state
   assumption and logarithm transformation, Eq. ([60]1) can be simplified
   into,
   [MATH: <mrow><msub><mi>c</mi><mrow><mi
   mathvariant="italic">ki</mi></mrow></msub><mo>=</mo><munderover><mo
   movablelimits="false">∑</mo><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow>
   <mi>n</mi></munderover><msub><mi>a</mi><mrow><mi
   mathvariant="italic">kj</mi></mrow></msub><msub><mi>c</mi><mrow><mi
   mathvariant="italic">ji</mi></mrow></msub><mo>+</mo><msub><mi>p</mi><mr
   ow><mi mathvariant="italic">ki</mi></mrow></msub></mrow> :MATH]
   2

   where c[ki] = log[2](r[ki]/r[kb]) denotes the log-twofold change
   (log2FC) of mRNA levels of gene k between treatment sample i and
   corresponding control b, and p[ki] = log((u[ki]/d[ki]) / (u[kb] /
   d[kb])) denotes the effects of treatment in sample i on gene k. The
   variable p[ki] is the unknown variable of interest. The formulation in
   Eq. ([61]2) shows that the expression change of gene k by the treatment
   arises from the indirect effects through its transcriptional regulators
   (the summation in Eq. ([62]2)) and the direct effect (the last term in
   Eq. ([63]2)). More specifically, the variable p[ki] gives the amount of
   fold-change expression of gene k that cannot be explained by the
   fold-change expression of its transcriptional regulators. Here, we use
   p[ki] as a measure of the direct perturbations on gene k by the
   treatment. A positive (negative) p[ki] indicates that the treatment
   causes a higher (lower) level of gene k transcription than what is
   expected from the changes in the transcription factors or regulators of
   the gene. In general, the larger the magnitude of p[ki], the stronger
   is the perturbations to gene k in the sample i.

   As we often have more than one samples for a given treatment (e.g.,
   technical repeats, multiple time points), we may assign the same
   perturbations to all of the related samples. For M different treatments
   among m samples (M < m), the following equation in a matrix–vector
   format can be considered:
   [MATH: <mrow><mi mathvariant="bold">C</mi><mo>=</mo><mi
   mathvariant="bold">AC</mi><mo>+</mo><mi
   mathvariant="bold">PZ</mi></mrow> :MATH]
   3

   where C is the n × m matrix of log2FCs of n genes in m samples, A is
   the n × n matrix of a[kj]’s describing the GRN, P is the n × M matrix
   of perturbation coefficients, and Z is an M × m {0, 1} mapping matrix
   that assigns the specific perturbation variable p to the appropriate
   sample. Note that Z becomes an m × m identity matrix when the
   perturbations are treated as distinct among the entire m samples.

   For time-series data, we assume that mRNA concentrations vary between
   sampling time points such that
   [MATH: <mrow><mtext>log</mtext><mfenced close=")"
   open="("><msubsup><mi>r</mi><mrow><mi>k</mi></mrow><mrow><mi>l</mi><mo>
   +</mo><mn>1</mn></mrow></msubsup></mfenced><mo>=</mo><msubsup><mi>s</mi
   ><mrow><mi>k</mi></mrow><mi>l</mi></msubsup><mi>t</mi><mo>+</mo><mtext>
   log</mtext><mfenced close=")"
   open="("><msubsup><mi>r</mi><mrow><mi>k</mi></mrow><mi>l</mi></msubsup>
   </mfenced><mspace width="0.277778em"></mspace><mtext>for</mtext><mspace
   width="0.277778em"></mspace><msub><mi>t</mi><mi>l</mi></msub><mo>≤</mo>
   <mi>t</mi><mo>≤</mo><msub><mi>t</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn
   ></mrow></msub></mrow> :MATH]
   4

   where
   [MATH: <msub><mi>t</mi><mi>l</mi></msub> :MATH]
   denotes the l-th time point,
   [MATH: <msubsup><mi>r</mi><mrow><mi>k</mi></mrow><mi>l</mi></msubsup>
   :MATH]
   is the mRNA concentration of gene k at the l-th time point, and
   [MATH: <msubsup><mi>s</mi><mrow><mi>k</mi></mrow><mi>l</mi></msubsup>
   :MATH]
   is the slope between the two time points. Using Eq. ([64]1), the first
   order derivative of the logarithm of r[k] can be rewritten as the
   following:
   [MATH: <mrow><mfrac><mrow><mi>d</mi><mtext>log</mtext><mfenced
   close=")"
   open="("><msub><mi>r</mi><mi>k</mi></msub></mfenced></mrow><mrow><mi
   mathvariant="italic">dt</mi></mrow></mfrac><mo>=</mo><mfrac><msub><mi>u
   </mi><mi>k</mi></msub><msub><mi>r</mi><mi>k</mi></msub></mfrac><mundero
   ver><mo
   movablelimits="false">∏</mo><mrow><mrow><mtable><mtr><mtd><mrow><mi>j</
   mi><mo>=</mo><mn>1</mn></mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow>
   <mrow><mi>j</mi><mo>≠</mo><mi>k</mi></mrow></mrow></mtd></mtr></mtable>
   </mrow></mrow><mi>n</mi></munderover><msubsup><mi>r</mi><mrow><mi>j</mi
   ></mrow><msub><mi>a</mi><mrow><mi
   mathvariant="italic">kj</mi></mrow></msub></msubsup><mo>-</mo><msub><mi
   >d</mi><mi>k</mi></msub><mo>=</mo><msub><mi>u</mi><mi>k</mi></msub><mo>
   exp</mo><mfenced close=")" open="("><mrow><munderover><mo
   movablelimits="false">∑</mo><mrow><mrow><mtable><mtr><mtd><mrow><mi>j</
   mi><mo>=</mo><mn>1</mn></mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow>
   <mrow><mi>j</mi><mo>≠</mo><mi>k</mi></mrow></mrow></mtd></mtr></mtable>
   </mrow></mrow><mi>n</mi></munderover><msub><mi>a</mi><mrow><mi
   mathvariant="italic">kj</mi></mrow></msub><mo>log</mo><mfenced
   close=")"
   open="("><msub><mi>r</mi><mi>j</mi></msub></mfenced><mo>-</mo><mo>log</
   mo><mfenced close=")"
   open="("><msub><mi>r</mi><mi>k</mi></msub></mfenced></mrow></mfenced><m
   o>-</mo><msub><mi>d</mi><mi>k</mi></msub></mrow> :MATH]
   5

   By substituting Eq. ([65]4) for r[k] in Eq. ([66]5), the following
   equation can be derived for
   [MATH:
   <mrow><msub><mi>t</mi><mi>l</mi></msub><mo>≤</mo><mi>t</mi><mo>≤</mo><m
   sub><mi>t</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub></mrow>
   :MATH]
   :
   [MATH: <mrow><mfrac><mrow><mi>d</mi><mtext>log</mtext><mfenced
   close=")"
   open="("><msub><mi>r</mi><mi>k</mi></msub></mfenced></mrow><mrow><mi
   mathvariant="italic">dt</mi></mrow></mfrac><mo>=</mo><msub><mi>u</mi><m
   i>k</mi></msub><mo>exp</mo><mfenced close=")"
   open="("><mrow><munderover><mo
   movablelimits="false">∑</mo><mrow><mrow><mtable><mtr><mtd><mrow><mi>j</
   mi><mo>=</mo><mn>1</mn></mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow>
   <mrow><mi>j</mi><mo>≠</mo><mi>k</mi></mrow></mrow></mtd></mtr></mtable>
   </mrow></mrow><mi>n</mi></munderover><msub><mi>a</mi><mrow><mi
   mathvariant="italic">kj</mi></mrow></msub><mfenced close=")"
   open="("><mrow><msubsup><mi>s</mi><mrow><mi>j</mi></mrow><mi>l</mi></ms
   ubsup><mi>t</mi><mo>+</mo><mtext>log</mtext><mfenced close=")"
   open="("><msubsup><mi>r</mi><mrow><mi>j</mi></mrow><mi>l</mi></msubsup>
   </mfenced></mrow></mfenced><mo>-</mo><mfenced close=")"
   open="("><mrow><msubsup><mi>s</mi><mrow><mi>k</mi></mrow><mi>l</mi></ms
   ubsup><mi>t</mi><mo>+</mo><mtext>log</mtext><mfenced close=")"
   open="("><msubsup><mi>r</mi><mrow><mi>k</mi></mrow><mi>l</mi></msubsup>
   </mfenced></mrow></mfenced></mrow></mfenced><mo>-</mo><msub><mi>d</mi><
   mi>k</mi></msub></mrow> :MATH]
   6

   The derivative of Eq. ([67]6) (i.e. the second derivative of log(r[k]))
   is zero under the pseudo steady-state assumption made in Eq. ([68]4).
   [MATH:
   <mrow><mfrac><mrow><msup><mi>d</mi><mn>2</mn></msup><mtext>log</mtext><
   mfenced close=")"
   open="("><msub><mi>r</mi><mi>k</mi></msub></mfenced></mrow><mrow><mi>d<
   /mi><msup><mi>t</mi><mn>2</mn></msup></mrow></mfrac><mo>=</mo><mn>0</mn
   ><mo>=</mo><msub><mi>u</mi><mi>k</mi></msub><mo>exp</mo><mfenced
   close=")" open="("><mrow><munderover><mo
   movablelimits="false">∑</mo><mrow><mrow><mtable><mtr><mtd><mrow><mi>j</
   mi><mo>=</mo><mn>1</mn></mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow>
   <mrow><mi>j</mi><mo>≠</mo><mi>k</mi></mrow></mrow></mtd></mtr></mtable>
   </mrow></mrow><mi>n</mi></munderover><msub><mi>a</mi><mrow><mi
   mathvariant="italic">kj</mi></mrow></msub><mfenced close=")"
   open="("><mrow><msubsup><mi>s</mi><mrow><mi>j</mi></mrow><mi>l</mi></ms
   ubsup><mi>t</mi><mo>+</mo><mtext>log</mtext><mfenced close=")"
   open="("><msubsup><mi>r</mi><mrow><mi>j</mi></mrow><mi>l</mi></msubsup>
   </mfenced></mrow></mfenced><mo>-</mo><mfenced close=")"
   open="("><mrow><msubsup><mi>s</mi><mrow><mi>k</mi></mrow><mi>l</mi></ms
   ubsup><mi>t</mi><mo>+</mo><mtext>log</mtext><mfenced close=")"
   open="("><msubsup><mi>r</mi><mrow><mi>k</mi></mrow><mi>l</mi></msubsup>
   </mfenced></mrow></mfenced></mrow></mfenced><mfenced close=")"
   open="("><mrow><munderover><mo
   movablelimits="false">∑</mo><mrow><mrow><mtable><mtr><mtd><mrow><mi>j</
   mi><mo>=</mo><mn>1</mn></mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow>
   <mrow><mi>j</mi><mo>≠</mo><mi>k</mi></mrow></mrow></mtd></mtr></mtable>
   </mrow></mrow><mi>n</mi></munderover><msub><mi>a</mi><mrow><mi
   mathvariant="italic">kj</mi></mrow></msub><msubsup><mi>s</mi><mrow><mi>
   j</mi></mrow><mi>l</mi></msubsup><mo>-</mo><msubsup><mi>s</mi><mrow><mi
   >k</mi></mrow><mi>l</mi></msubsup></mrow></mfenced></mrow> :MATH]
   7

   Here, u[k] and d[k] are assumed to be the same between the l-th and
   l + 1-th time points, i.e. the perturbations are constant in this time
   window. Since u[k] and the exponential term in Eq. ([69]7) are always
   positive, the remaining term should be zero. Therefore, the following
   relationship between the variable
   [MATH: <msubsup><mi>s</mi><mrow><mi>k</mi></mrow><mi>l</mi></msubsup>
   :MATH]
   and the network coefficient a[kj] can be obtained:
   [MATH:
   <mrow><msubsup><mi>s</mi><mrow><mi>k</mi></mrow><mi>l</mi></msubsup><mo
   >=</mo><munderover><mo
   movablelimits="false">∑</mo><mrow><mrow><mtable><mtr><mtd><mrow><mi>j</
   mi><mo>=</mo><mn>1</mn></mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow>
   <mrow><mi>j</mi><mo>≠</mo><mi>k</mi></mrow></mrow></mtd></mtr></mtable>
   </mrow></mrow><mi>n</mi></munderover><msub><mi>a</mi><mrow><mi
   mathvariant="italic">kj</mi></mrow></msub><msubsup><mi>s</mi><mrow><mi>
   j</mi></mrow><mi>l</mi></msubsup></mrow> :MATH]
   8

   The matrix form of Eq. ([70]8) for n genes and m samples is the same as
   following:
   [MATH: <mrow><mi mathvariant="bold">S</mi><mo>=</mo><mi
   mathvariant="bold">AS</mi></mrow> :MATH]
   9

   where S is the n × m matrix of the slopes calculated from time series
   log2FCs data. In DeltaNeTS+, the slopes of the time series gene
   expression profiles were calculated using 2nd-order accurate finite
   difference approximations at each sampling time point [[71]23]. For the
   first and last time points, forward and backward finite difference were
   used, respectively, while for middle time points, a centered difference
   approximation was used. At the minimum, only two time points are needed
   to compute the time slopes, but having more time points enables
   implementing finite differencing with a higher order of accuracy.

   Combining Eq. ([72]3) and Eq. ([73]9), DeltaNeTS+ calculates the
   unknown variables A and P row by row using the following equation:
   [MATH: <mrow><mfenced close="]"
   open="["><mrow><mtable><mtr><mtd><msubsup><mover><mi
   mathvariant="bold">C</mi><mo>¯</mo></mover><mrow><mi>k</mi></mrow><mtex
   t>T</mtext></msubsup></mtd></mtr><mtr><mtd><mrow><mrow></mrow><msubsup>
   <mover><mi
   mathvariant="bold">S</mi><mo>¯</mo></mover><mrow><mi>k</mi></mrow><mtex
   t>T</mtext></msubsup></mrow></mtd></mtr></mtable></mrow></mfenced><mo>=
   </mo><mfenced close="]"
   open="["><mrow><mtable><mtr><mtd><msup><mrow><mover><mi
   mathvariant="bold">C</mi><mo>¯</mo></mover></mrow><mtext>T</mtext></msu
   p></mtd><mtd><mi
   mathvariant="bold">Z</mi></mtd></mtr><mtr><mtd><mrow><mrow></mrow><msup
   ><mrow><mover><mi
   mathvariant="bold">S</mi><mo>¯</mo></mover></mrow><mtext>T</mtext></msu
   p></mrow></mtd><mtd><mn
   mathvariant="bold">0</mn></mtd></mtr></mtable></mrow></mfenced><mfenced
   close="]" open="["><mrow><mtable><mtr><mtd><msubsup><mi
   mathvariant="bold">A</mi><mrow><mi>k</mi></mrow><mtext>T</mtext></msubs
   up></mtd></mtr><mtr><mtd><mrow><mrow></mrow><msubsup><mi
   mathvariant="bold">P</mi><mrow><mi>k</mi></mrow><mtext>T</mtext></msubs
   up></mrow></mtd></mtr></mtable></mrow></mfenced></mrow> :MATH]
   10

   where
   [MATH: <msubsup><mover><mi
   mathvariant="bold">C</mi><mo>¯</mo></mover><mrow><mi>k</mi></mrow><mrow
   ></mrow></msubsup> :MATH]
   ,
   [MATH: <msubsup><mover><mi
   mathvariant="bold">S</mi><mo>¯</mo></mover><mrow><mi>k</mi></mrow><mrow
   ></mrow></msubsup> :MATH]
   , A[k], and P[k] are the row vectors of the matrices
   [MATH: <mover><mi mathvariant="bold">C</mi><mo>¯</mo></mover> :MATH]
   ,
   [MATH: <mover><mi mathvariant="bold">S</mi><mo>¯</mo></mover> :MATH]
   , A, and P for gene k and 0 is the m × M zero matrix. The matrices
   [MATH: <mover><mi mathvariant="bold">C</mi><mo>¯</mo></mover> :MATH]
   and
   [MATH: <mover><mi mathvariant="bold">S</mi><mo>¯</mo></mover> :MATH]
   are the normalized log2FC and slope matrices
   [MATH: <mi mathvariant="bold">C</mi> :MATH]
   and
   [MATH: <mi mathvariant="bold">S</mi> :MATH]
   such that each of the matrices has a 2-norm equal to the square root of
   the number of samples in the matrix (i.e. the number of columns in the
   matrix). The normalization is set to balance the contributions from the
   two matrices in determining A and P.

   In DeltaNeTS+, when information on the structure of the GRN is
   available—for each gene k, we have information on its transcription
   factors—we can reduce the dimension of the problem stated in
   Eq. ([74]10) by restricting the inference of A only for the subset of
   genes related to the TFs of gene k:
   [MATH: <mrow><mfenced close="]"
   open="["><mrow><mtable><mtr><mtd><msubsup><mover><mi
   mathvariant="bold">C</mi><mo>¯</mo></mover><mrow><mi>k</mi></mrow><mtex
   t>T</mtext></msubsup></mtd></mtr><mtr><mtd><mrow><mrow></mrow><msubsup>
   <mover><mi
   mathvariant="bold">S</mi><mo>¯</mo></mover><mrow><mi>k</mi></mrow><mtex
   t>T</mtext></msubsup></mrow></mtd></mtr></mtable></mrow></mfenced><mo>=
   </mo><mfenced close="]"
   open="["><mrow><mtable><mtr><mtd><msubsup><mover><mi
   mathvariant="bold">C</mi><mo>¯</mo></mover><mrow><msub><mtext>TF</mtext
   ><mi>k</mi></msub></mrow><mtext>T</mtext></msubsup></mtd><mtd><mi
   mathvariant="bold">Z</mi></mtd></mtr><mtr><mtd><mrow><mrow></mrow><msub
   sup><mover><mi
   mathvariant="bold">S</mi><mo>¯</mo></mover><mrow><msub><mtext>TF</mtext
   ><mi>k</mi></msub></mrow><mtext>T</mtext></msubsup></mrow></mtd><mtd><m
   n
   mathvariant="bold">0</mn></mtd></mtr></mtable></mrow></mfenced><mfenced
   close="]" open="["><mrow><mtable><mtr><mtd><msubsup><mi
   mathvariant="bold">A</mi><mrow><msub><mtext>TF</mtext><mi>k</mi></msub>
   </mrow><mtext>T</mtext></msubsup></mtd></mtr><mtr><mtd><mrow><mrow></mr
   ow><msubsup><mi
   mathvariant="bold">P</mi><mrow><mi>k</mi></mrow><mtext>T</mtext></msubs
   up></mrow></mtd></mtr></mtable></mrow></mfenced></mrow> :MATH]
   11

   where the subscript TF[k] refers to the subset of genes corresponding
   to the TFs of gene k. Equation ([75]11) is solved using ridge
   regression to give
   [MATH: <msubsup><mi
   mathvariant="bold">A</mi><mrow><msub><mtext>TF</mtext><mi>k</mi></msub>
   </mrow><mrow></mrow></msubsup> :MATH]
   and
   [MATH: <msubsup><mi
   mathvariant="bold">P</mi><mrow><mi>k</mi></mrow><mrow></mrow></msubsup>
   :MATH]
   matrices for the entire GRN [[76]24]. When GRN structure is not
   provided or not available, DeltaNeTS+ solves Eq. ([77]10) by using
   LASSO regularization to predict a sparse matrix A[k] and P[k] [[78]14].
   We implemented LASSO [[79]25] and ridge regression in DeltaNeTS+ using
   the GLMNET algorithm with k-fold cross-validation (default k = 10)
   [[80]26]. Consequently, the smallest number of samples that DeltaNeTS+
   can handle is k samples.

Gene expression data

   We applied DeltaNeTS+ to time-series gene expression data of C. elegans
   embryo [[81]27], human cord blood CD4+T cells [[82]28], and human lung
   cancer Calu-3 cells [[83]29–[84]31]. For C. elegans embryo, log2
   intensity data which were normalized by robust multi-array average
   (RMA) method were obtained from Gene Expression Omnibus (GEO) [[85]32]
   (accession number: [86]GSE2180 and [87]GSE51162). Log2FC of gene
   expressions and its statistical significance (Benjamini–Hochberg
   adjusted p value) between gene knockout and wild-type conditions at
   each time point were calculated using a linear fit model and empirical
   Bayes method in the limma package of Bioconductor. The probe sets were
   mapped to the official gene symbols in celegans.db. In the case of
   multiple probe sets mapping to a gene symbol, we take the log2FC from
   the probe set with the smallest average adjusted p value over the
   samples.

   For human T cells, log2 intensity data by quantile normalization were
   obtained from GEO (accession number: [88]GSE17851). In the same way as
   C. elegans, log2FC values were calculated between gene knockout and
   wild-type conditions at each time point using limma. The probe sets
   were mapped to the gene symbols from the Illumina human-6 v2.0
   expression beadchip data in GEO (accession number: [89]GPL6102), and
   for the multiple probe sets mapping to the same gene, the probe set
   with the smallest average adjusted p value across all samples was
   chosen.

   For Calu-3 data, we compiled the raw Agilent Whole Human Genome
   4 × 44 K microarray data from [90]GSE33264 for IFN-α and IFN-γ
   experiments [[91]31] and from [92]GSE37571 and [93]GSE33142 for H1N1
   and H5N1 experiments [[94]29, [95]30]. The raw data were
   background-corrected and normalized using normexp and quantile methods
   in limma package of Bioconductor. The log2FCs between virus (or
   interferon) and mock samples at each time point were calculated using
   limma, with their statistical p value adjusted by Benjamini-Hockberg
   method. The probe sets were mapped to the official gene symbols in
   hgug4112a.db package. For a gene with multiple probe sets, we chose the
   data from the probe set with the smallest average adjusted p value.

   During preprocessing of time-series data, we substituted any
   time-series log2FC gene expression data that were not statistically
   significant with linearly interpolated values using adjacent time
   points with statistically significant log2FCs. Note that the
   computation of statistical significance (p value) requires at least
   three repeats, and thus, the linear interpolation using adjacent time
   points above is done only when 3 or more repeats are available. Unless
   mentioned differently, we used Benjamini–Hochberg adjusted p value
   < 0.05 to establish statistical significance. If the log2FC values of a
   gene were not statistically significant at any time point, we set the
   log2FCs to zero.

Gene regulatory networks

   The GRN for C. elegans embryo data set was obtained from TF-gene
   interactions for C. elegans in TF2DNA database [[96]33], where TF
   binding motifs and their regulated genes were identified by calculating
   the binding affinity based on the known 3D structure of TF-DNA
   complexes. The GRN for C. elegans was composed of 355,080 edges from 48
   TFs to 15,738 genes.

   Meanwhile, the GRNs for human T-cell and Calu-3 data sets were obtained
   from TF-gene interactions specific for human cord blood-derived cells
   and human epithelium lung cancer cells, respectively, available in
   Regulatory Circuit database [[97]20]. We only used TF-gene interactions
   with a confidence score greater than 0.1 in the Regulatory Circuit
   database. The GRNs for human T-cells and Calu-3 consisted of 11,955
   edges pointing from 438 TFs to 2,385 genes and 42,145 edges pointing
   from 515 TFs to 7,125 genes, respectively.

TSNI and DeMAND implementation

   For TSNI [[98]21], we downloaded the MATLAB subroutine from
   [99]https://dibernardo.tigem.it/softwares/time-series-network-identific
   ation-tsni and applied it to C. elegans, human T-cell, and Calu-3
   interferon data. The parameter of the number of principle components in
   TSNI was optimized to nPC = 1—i.e. using only the first principal
   component. For DeMAND [[100]9], we installed the DeMAND R-package from
   Bioconductor [[101]34] ([102]https://www.bioconductor.org) and applied
   the method using the same GRNs used in DeltaNeTS+ analysis.

Enrichment analysis of Calu-3 data

   For interferon case study, the top 50 genes ranked by each method
   (DeltaNeTS+, TSNI, and log2FC) were used for Gene Ontology (GO) and
   Reactome pathway enrichment analysis using Enrichr [[103]35]. For
   influenza A viral infection case study, the averaged P values of
   DeltaNeTS+ for each phase (phase 1: 0 to 7 h, phase 2: 7 to 18 h, phase
   3: > 18 h) were used for the gene set enrichment analysis (GSEA) of
   Reactome pathways [[104]36] using ReactomePA package [[105]37] in R.
   Before the enrichment analysis, genes were sorted based on the average
   P for each time phase, and genes with no perturbation score were
   excluded during the GSEA. Afterwards, the significance score (− log[10]
   p value) of the enriched pathways was calculated among the pathways
   with positive enrichment score from GSEA. The illustration of the
   results in Fig. [106]2 only shows the highest level of pathway
   information in the Reactome hierarchy
   ([107]https://reactome.org/PathwayBrowser), while the significance
   score is the best significance score (highest − log[10] p value) of the
   sub-pathways.

Fig. 2.

   [108]Fig. 2
   [109]Open in a new tab

   Summary of key Reactome pathways from gene set enrichment analysis of
   DeltaNeTS+ predictions. The size and color of dots indicate the score
   (negative logarithm-10 of p values) for the enriched Reactome terms.
   The influenza infection period is divided into three time phases:
   Phase1 = 0–7 h, phase 2 = 7–18 h, and phase 3 ≥ 18 h post-infection

Weighted gene co-expression network analysis (WGCNA)

   WGCNA [[110]38] was applied to the DeltaNeTS+ score of both H1N1
   (influenza A/CA/04/09) and H5N1 (influenza A/VN/1203/04) samples. The
   H1N1 data were from 9 time points (0, 3, 7, 12, 18, 24, 30, 36, and
   48 h) and the H5N1 data were from 6 time points (0, 3, 7, 12, 18, and
   24 h). In WGCNA analysis, we computed modules with a minimum of 200
   genes using the signed network option (soft-thresholding power = 18).
   Afterwards, GO and Reactome enrichment analysis were also performed for
   each module using ReactomePA and clusterProfiler packages in R.

Results

Predicting genetic perturbations

   We tested the performance of DeltaNeTS+ by inferring gene perturbations
   from time series C. elegans and human T-cell gene expression profiles
   from RNA interference (RNAi) experiments. DeltaNeTS+ generates gene
   perturbation scores p[ki] for all genes using the entire GRN. p[ki]
   indicates the strength of perturbations to the expression of gene k in
   sample i. In the following, we used the magnitude of the perturbation
   scores to rank the gene targets predicted by DeltaNeTS+. The C. elegans
   dataset comprises three genetic perturbation experiments [[111]27],
   each of which provides gene expression data across 10 time points after
   skn-1 and pal-1 knockdowns by shRNA on mex-3 or pie-1 mutated cells.
   The human T-cell dataset includes time-series gene expression data
   measured over 4 time points after STAT6 knockdown by shRNA [[112]28]
   (see Table [113]1). When applying DeltaNeTS+ on these data, gene
   regulatory network graphs for C. elegans and human T-cells were used as
   prior information (see [114]Methods).

Table 1.

   Rank prediction of gene targets in C. elegans and human T-cell data
   sets by DeltaNeTS+, TSNI, DeMAND, and log2FC magnitudes. The ranking is
   computed based on the magnitude of target scores for all genes in the
   GRN from each algorithm (see [115]Methods)
               Experiment 1  Experiment 2  Experiment 3
   C. elegans data set
    Methods    mex-3 skn-1   pie-1 pal-1   pie-1
    DeltaNeTS+ 9     44      9     2       2
    TSNI       11    5086    13    7       13
    DeMAND     38    15,150  23    3       38
    log2FC     3     8623    4     304     3
               Experiment 1 Experiment 2
   Human T-cell data set
    Methods    STAT6        STAT6
    DeltaNeTS+ 1            1
    TSNI       1            22
    DeMAND     1            2
    log2FC     1            4
   [116]Open in a new tab

   For comparison purposes, we generated gene target predictions using
   TSNI [[117]21] and DeMAND [[118]9] and those based on the magnitudes of
   log2FC values. For log2FC analysis, the gene target candidates were
   ranked in decreasing order of the absolute value of the log2FCs, while
   for DeMAND, the genes were ranked in increasing order of the
   statistical p value of dysregulation [[119]9]. For TSNI, we used the
   first principal component to generate the gene target predictions as
   this setting gave the best accuracy among the trials using principle
   components between 1 and 3. We also compared the accuracy of the gene
   target predictions under two different scenarios; the first is where
   the gene perturbations are time invariant, and the other is where the
   gene perturbations are allowed to vary over time. Because of the small
   number of samples in the above datasets, when the GRN structure is not
   used, the application of DeltaNeTS+ using LASSO (see [120]Methods) and
   a previous algorithm DeltaNet resulted in an empty target prediction,
   where the perturbation scores P were 0.

   Table [121]1 shows the ranking of the genes targeted by shRNA or
   mutation in C. elegans and human T-cell data with the gene
   perturbations set to be time-invariant, i.e. the gene perturbations are
   the same for all samples from the same treatment/condition. Except for
   skn-1, DeltaNeTS+ placed the known targets (mex-3, pal-1, pie-1, and
   STAT6) among the top 10 of the candidate gene targets. Notably, in
   almost all instances, DeltaNeTS+ ranked the known targets higher than
   TSNI, DeMAND, and log2FC analysis. The transcription factor skn-1
   regulates critical biological pathways related to oxidative stress
   responses and lifespan of C. elegans [[122]39]. Silencing skn-1 alters
   the transcription of a large number of genes (> 10,000), which
   complicates the identification of the true gene perturbation in these
   samples. Here, a significant fraction of gene targets predicted by
   DeltaNeTS+ with higher ranks than skn-1 were genes regulated by skn-1
   (see Additional file [123]1). Nevertheless, DeltaNeTS+ was able to put
   skn-1 at a much higher rank than TSNI, DeMAND, and log2FC.

   When using time-varying perturbation (i.e. the targets can vary across
   different time samples of the same treatment/condition), DeltaNeTS+
   again performed better than the three methods TSNI, DeMAND, and log2FC,
   as illustrated in Fig. [124]1. The log2FC of samples from the early
   time points actually gave a reasonably accurate indication of the
   direct gene perturbations (see Additional file [125]2: Tables S1–S2).
   But, log2FC magnitude became drastically a less accurate indicator for
   the direct gene targets for the later time points. This trend is
   expected because the downstream effects of a gene perturbation will
   progressively mask the true gene target identity over time. In
   comparison, DeltaNeTS+ prediction accuracy degraded much more mildly
   over the sampling times, demonstrating its higher robustness with
   respect to the choice of time samples (see Additional file [126]2:
   Tables S1–S2). Note that TSNI and DeMAND only provide an overall target
   prediction, and not for different time points.

Fig. 1.

   [127]Fig. 1
   [128]Open in a new tab

   Ranks predictions for knock-out targets. The distribution of box plot
   indicates the ranks from all the time points from the same experiment.
   The ranks of gene targets by DeltaNeTS+ are colored in red and those by
   TSNI, DeMAND, and log2FC analysis are colored in blue, green, and grey,
   respectively. *p value < 0.1 and **p value < 0.01 by Wilcox signed rank
   test between DeltaNeTS and other methods

Predicting the mechanism of interferon-α and interferon-γ actions

   Next, we tested the ability of DeltaNeTS+ in differentiating the
   specific activity of two related compounds from the same family of
   proteins: interferon-α (IFN-α) and interferon- γ (IFN-γ). This task is
   challenging as these two signalling proteins trigger a common response
   of the immune system but through different signalling pathways. IFN-α
   and IFN-γ are cytokines that induce innate immune response against
   viral infection through separate signalling pathways, called type I and
   type II signalling, respectively. The IFN-α signal goes through type I
   IFN receptor and is followed by the formation of the complex IRF9,
   STAT1 and STAT2 that activates the transcription of IFN-α/β stimulated
   genes (ISG). Meanwhile, the IFN-γ signal is received by type II IFN
   receptor which leads to the transcription of genes containing gamma
   interferon activation sites (GAS) [[129]40, [130]41]. The gene
   expression data came from a previous study using human lung cancer
   cells Calu-3 [[131]31]. We employed the human epithelial lung cancer
   cell GRN structure as prior information for DeltaNeTS+ (see
   [132]Methods). To examine cellular pathways that are perturbed by each
   of the two interferon treatments, we performed GO (Biological
   Processes) and Reactome pathway enrichment analysis for the 50 highest
   ranked genes from each method [[133]35].

   Table [134]2 summarizes the enriched GO terms and Reactome pathways of
   the top gene target predictions from DeltaNeTS+, DeMAND and log2FC
   analysis (adjusted p value < 0.01; detailed results in Additional file
   [135]3). The top 50 gene targets predicted by TSNI did not show any
   statistically significant enrichment for neither GO nor Reactome
   pathways. The enriched GO and Reactome terms of the DeltaNeTS+
   perturbation analysis correctly point to the distinct signalling
   pathways through which the two interferons act. Besides
   interferon-specific signalling, pathways related to major
   histocompatibility complex (MHC) class II antigen, which is induced by
   an IFN-γ stimulated gene CIITA [[136]42], are also enriched in the
   DeltaNeTS+ analysis for IFN-γ data, but not for IFN-α data. On the
   other hand, the enrichment analysis of top log2FC genes shows more
   diverse over-represented pathways, which is expected as the log2FC
   expressions reflect both direct and indirect effects of the two
   interferons. While the distinct signalling pathways related to IFN-α
   and IFN-γ are among the top enriched GO and Reactome pathways for
   log2FC, the signalling pathways specific to each interferon are
   cross-listed in the enrichment results for the two interferons. For
   DeMAND, no interferon signalling was detected for IFN-γ data although
   type I and II interferon signalling pathways were enriched from the
   DeMAND prediction for IFN-α data. In comparison to DeltaNeTS+, log2FC
   and DeMAND were less capable in differentiating the specific signalling
   pathways through which IFN-α and IFN-γ act.

Table 2.

   Reactome pathway enrichment analysis for Interferon-α and -γ treatments
   Interferon-α enrichment analysis
   DeltaNeTS+ GO terms Log2FC Reactome pathways DeMAND GO terms
   Type I interferon signaling pathway Interferon alpha/beta signaling
   Defense response to virus
   Negative regulation of viral genome replication Interferon
   Signaling_Homo sapiens Type I interferon signaling pathway
   Negative regulation of viral life cycle Cytokine Signaling in Immune
   system Cellular response to type I interferon
   Cellular response to type I interferon Immune System_Homo sapiens
   Interferon-gamma-mediated signaling pathway
   Regulation of viral genome replication Antiviral mechanism by
   IFN-stimulated genes Response to type I interferon
   Reactome pathways ISG15 antiviral mechanism Reactome pathways
   Interferon signaling Interferon gamma signaling Interferon alpha/beta
   signaling
   Interferon alpha/beta signaling RIG-I/MDA5 mediated induction of
   IFN-alpha/beta pathways Interferon gamma signaling
   TRAF3-dependent IRF activation pathway Interferon signaling
   Regulation of IFNA signaling
   TRAF6 mediated IRF7 activation
   Negative regulators of RIG-I/MDA5 signaling
   Interferon-γ enrichment analysis
   DeltaNeTS+ GO terms Log2FC GO terms DeMAND No enriched term
   Antigen processing and presentation of exogenous peptide antigen via
   MHC class II Interferon-gamma-mediated signaling pathway
   Cellular response to interferon-gamma
   Antigen processing and presentation of exogenous peptide antigen
   Cytokine-mediated signaling pathway
   Reactome pathways
   Antigen processing and presentation of peptide antigen via MHC class II
   Interferon signaling
   Interferon gamma signaling
   Cellular response to interferon-gamma Cytokine signaling in immune
   system
   Interferon-gamma-mediated signaling pathway Immune system
   Reactome pathways Interferon alpha/beta signaling
   Interferon signaling Translocation of ZAP-70 to immunological synapse
   Interferon gamma signaling Phosphorylation of CD3 and TCR zeta chains
   MHC class II antigen presentation PD-1 signaling
   Cytokine signaling in immune system MHC class II antigen presentation
   Generation of second messenger molecules
   [137]Open in a new tab

Network perturbation analysis during influenza viral infections

   Finally, we applied DeltaNeTS+ to analyze gene expression data from
   H1N1 and H5N1 influenza virus infected Calu-3 cells with the goal of
   elucidating the similarities and differences between the two important
   influenza viral strains. H1N1 strain is the cause of the 2009 swine flu
   outbreak and is known for its high transmissibility among humans. On
   the other hand, H5N1 strain is an avian influenza subtype that is known
   for its severe virulence with a high mortality rate of 60%. This high
   pathogenicity results in growing attention to understand the causal
   molecular mechanism [[138]43]. Following DeltaNeTS+ analysis, we
   performed a GSEA to find over-represented Reactome pathways [[139]37]
   and WGCNA [[140]38] analysis to identify gene modules with similar
   dynamic perturbations, using the gene perturbation scores produced by
   DeltaNeTS+.

   Figure [141]2 depicts the Reactome pathways enriched (adjusted p value
   < 0.01, see [142]Methods) across the three phases of the infection:
   early (phase 1: 0–7 h), middle (phase 2: 7–18 h) and late (phase 3:
   > 18 h) (full results in Additional file [143]4). H1N1 and H5N1 trigger
   many of the same cellular pathways but often in different phases.
   Expectedly, immune response, such as cytokine signaling and innate
   immune system, is triggered by both viral infections with roughly the
   same trend over time. Other pathways however are modulated with
   different timings. Among the pathways significantly enriched only in
   H5N1 infection are those associated with G-protein coupled receptor
   (GPCR) signaling and actin cytoskeleton (muscle contraction), both of
   which are known to be hijacked for viral entry processes (Fig. [144]2)
   [[145]18, [146]44, [147]45]. On the other hand, programmed cell death
   and DNA damage (chromatin organization and DNA repair) are
   significantly enriched only in H1N1 data from the early through
   mid-phase of the infection. A previous study reported that death
   signaling molecules are downregulated in Calu-3 cells infected by H5N1
   [[148]46], a finding that is consistent with the absence of enrichment
   for programmed cell death in the result of DeltaNeTS+ analysis for
   H5N1. Other pathways enriched in the late phase of H1N1 infection are
   NOTCH, WNT, and RHO GTPase signaling, pathways that are relevant to
   cellular proliferation [[149]47, [150]48]. These pathways are related
   to Histone cluster 1 families, and based on DeltaNeTS+ perturbation
   scores, are predicted to have been negatively perturbed (see Additional
   file [151]5). The negative perturbations signify a repression of
   cellular proliferation in H1N1 infection.

   In addition to enrichment analysis, we applied WGCNA to the DeltaNeTS+
   gene perturbation predictions to group the genes based on the pattern
   of perturbations over time. In this case, WGCNA finds clusters of genes
   whose perturbation scores given by the P matrix are highly similar
   based on a topological overlap derived from the gene–gene correlations.
   In the WGCNA results, the time-course perturbation values in each group
   can be represented by the first principal component in that group,
   called eigengene. Figure [152]3 shows the eigengene perturbation
   profiles of seven groups identified by WGCNA, and Table [153]3 provides
   the summary of GO and Reactome pathways enriched in each group (see
   details in Additional files 6 and 7). For the smallest group M7, no
   pathway was found to be significantly enriched.

Fig. 3.

   [154]Fig. 3
   [155]Open in a new tab

   Eigengene profiles from WGCNA applied to DeltaNeTS+ perturbation scores
   of H1N1 and H5N1 influenza A infections (red: H1N1, blue: H5N1)

Table 3.

   WGCNA modules of DeltaNeTS+ perturbation scores
   Module Size^a Summary of enriched pathways^c DB^d
   M1 2863 DNA replication, cell cycle GO & RP
   M2 2803 Influenza Viral RNA Transcription and Replication, viral mRNA
   Translation GO & RP
   M3 2645 Defense response to virus, immune response, interferon
   signaling GO & RP
   M4 1427 Viral entry (GPCR binding, amine transport) GO & RP
   M5 888 Crosslinking of collagen fibrils RP
   M6 762 Apoptosis (TNFR2 non-canonical NF-kB pathway) RP
   M7 397 N/A –
   [156]Open in a new tab

   RP reactome pathway database, GO gene ontology

   ^aThe number of genes clustered in each module

   ^bSummary of enriched pathways (q-value < 0.10) in each module

   ^cPathway databases used for enrichment analysis

   Groups M1, M4, and M5 are related to cell cycle arrest, viral entry,
   and cellular matrix barrier respectively. For these groups, the signs
   of the perturbations are the same for H1N1 and H5N1, but the magnitudes
   of the perturbations are larger for H5N1. For both strains, the
   infection acts to repress cell cycle arrest and cellular matrix barrier
   (negative perturbations) and to activate viral entry pathways (positive
   perturbations). For groups M2 and M3, the signs of the perturbations
   for the two influenza A strains are the opposite of each other. In H5N1
   infection, viral RNA transcription, translation and replication (group
   M2) are induced, whereas viral defense and immune mechanisms (group M3)
   are strongly inhibited. In contrast, in H1N1 infection, viral RNA
   replication processes are repressed, but genes for antiviral mechanisms
   are moderately induced. Taken together, GSEA and WGCNA of DeltaNeTS+
   for H1N1 and H5N1 infection in Calu3 indicates that in comparison to
   H1N1, the more virulent H5N1 infection shows increased activity of
   viral entry process and increased repression of cell cycle arrest, as
   well as inhibition of the immune system and programmed cell death.

Discussion

   DeltaNeTS+ is a network perturbation analysis tool for the analysis of
   gene expression data, which generates predictions on the direct gene
   expression perturbation in a given treatment or sample. DeltaNeTS+
   builds on our previous algorithm DeltaNeTS which is able to utilize
   time-series datasets of gene expression, and adds the capability to
   take advantage of information on the GRN structure. The incorporation
   of GRN structure information enables accurate gene target predictions
   from a small number of samples, an advantage over our previous methods
   DeltaNet and DeltaNeTS. The direct gene perturbation of a gene is
   computed based on the difference between the measured differential
   expression of the gene and the differential expression predicted by the
   GRN model based on the expression of its transcription factors. We
   compared DeltaNeTS+ to three strategies, including (1) differential
   expression analysis, (2) a target inference method TSNI which relies on
   a model of the GRN [[157]21], and (3) a network perturbation analysis
   method DeMAND which relies on statistical hypothesis testing via
   Kullback–Leibler divergence [[158]9]. The comparative methods were
   selected since they are among the most commonly used and well-cited
   methods for gene target inference and are able to handle time-series
   data. DeltaNeTS+ is most similar to TSNI [[159]21], as both methods
   rely on a linear regression formulation and time-series data. While
   TSNI uses principal component analysis (PCA) to reduce the dimension of
   the unknown GRN matrix A and perturbation matrix P, DeltaNeTS+ employs
   a regularization strategy (ridge regression or LASSO, depending on the
   availability of the GRN structure). TSNI is unable to accommodate
   information on GRN structure.

   Adding GRN structural information to DeltaNeTS+ is highly beneficial
   and improves the gene target predictions, especially when the dataset
   contains only a few samples. The application of DeltaNeTS+ and a
   previous algorithm DeltaNet to C. elegans, T cell, and Calu-3 datasets
   without incorporating the GRN structure returned a null gene target
   prediction (P is 0; data not shown). Here, by using the available GRN
   structure, DeltaNeTS+ is implemented using ridge regression in which
   the dimensionality of the unknown GRN matrix A is reduced to known
   TF-gene regulations. As shown in C. elegans and human T-cell case
   study, DeltaNeTS+ was able to use the GRN structure to predict genetic
   perturbations with high accuracy more reliably than the comparative
   methods. Besides, gene targets highly ranked by DeltaNeTS+ were closely
   associated with the true gene targets – for example, genes that are
   directly regulated by the true gene targets (see Additional file
   [160]1). The ability to utilize GRN structure information is important
   high-quality cell type specific gene regulatory networks become more
   available in recent times, thanks to the success of large-scale
   projects such as ENCODE [[161]17] and FANTOM5 [[162]49]. Besides,
   advanced network inference tools such as ARACNE [[163]50] and MINDy
   [[164]51] allow us to reverse-engineer a context-specific network from
   the gene expression data for a cell of our interest.

   Note that DeltaNeTS+ is able to combine time-series and steady-state
   transcriptomic datasets for improving the accuracy of gene target
   predictions. For example, when we added a steady-state transcriptomic
   dataset ([165]GSE51162) to the training of C. elegans GRN model in
   DeltaNeTS+, we were able to improve the rank of the predicted gene
   targets for pie-1 mutation (see Additional file [166]2: Table S3). Vice
   versa, the gene target predictions for the C. elegans steady-state
   dataset were better upon using the GRN model from the combination of
   time-series and steady-state data than using that from steady-state
   data alone (see Additional file [167]2: Figure S1). While incorporating
   the GRN structure information enables DeltaNeTS+ to predict gene
   targets using a small set of samples, the result above demonstrates
   that a larger number of samples improves accuracy. The capability of
   DeltaNeTS+ to integrate both time-series and steady-state data
   seamlessly in the gene target prediction means that DeltaNeTS+ will be
   able to take advantage of a larger library of available transcriptomic
   datasets and avoid running separate analyses for each kind of datasets.

   The application of DeltaNeTS+ to the analysis of gene perturbation
   targets associated with H1N1 and H5N1 infection led to insights on the
   differences in the mechanism of actions between the two influenza A
   strains. In general, the more pathogenic strain H5N1 induces stronger
   and swifter perturbations than the less pathogenic but more infectious
   H1N1 (Figs. [168]2, [169]3). Notably, H5N1 infection shows a strong
   induction of viral entry process and viral replication and a repression
   of cell cycle arrest, all of which are strategies that would ensure a
   successful proliferation of the virus. The pathogenic H5N1 infection
   also demonstrates inhibition of viral defense mechanism. Furthermore,
   the less severity of H1N1 infection is also associated with a
   successful activation of cell death and repression of cell
   proliferation, pathways that are crucial in curtailing viral
   progression.

Conclusions

   DeltaNeTS+ is an effective network analysis method for inferring gene
   transcriptional perturbations from gene expression dataset. The ability
   of DeltaNeTS+ to integrate both steady-state and time-course gene
   expression data and available information of gene regulatory network
   structure enables accurate prediction of gene targets from limited
   number of samples (~ 10 s). The application of DeltaNeTS+ to influenza
   A infection datasets in human Calu-3 give insights into the key
   functional perturbations that differ between highly virulent H5N1 avian
   flu and highly transmissible H1N1 swine flu. Insights on the molecular
   mechanisms of drugs and diseases from gene target predictions provide
   important information for drug discovery and treatment formulations.
   The ability to marry gene expression profiles with transcriptional
   network structures derived from genomic information, as done in
   DeltaNeTS+, is crucial for understanding diseases and for formulating
   individualized treatments in the era of personalized medicine.

Supplementary Information

   [170]12859_2021_4046_MOESM1_ESM.xlsx^ (16.9KB, xlsx)

   Additional file 1. Genes with higher ranking than the true targets in
   Caenorhabditis elegans dataset
   [171]12859_2021_4046_MOESM2_ESM.docx^ (106.8KB, docx)

   Additional file 2: Table S1. Gene target ranking by DeltaNeTS+ and
   log2FC analysis for each time point in C. elegans datasets. Table S2.
   Gene target ranking by DeltaNeTS+ and log2FC analysis for each time
   point in STAT6 siRNA experiments of human T-cell data. Table S3. Gene
   target prediction of DeltaNeTS+ for time-series C. elegans data
   (GSE2180) upon using time-series data alone and using a combination of
   time-series and steady-state (GSE51152) dataset. Figure S1. Gene target
   ranking by DeltaNeTS+ for steady-state C. elegans data (GSE51162) upon
   using the GRN model learned on steady-state dataset alone and using the
   GRN model trained on steady-state and time-series (GSE2180) datasets
   (*p value < 0.05 by Wilcox signed rank test)
   [172]12859_2021_4046_MOESM3_ESM.xlsx^ (250.2KB, xlsx)

   Additional file 3. Enriched GO terms and Reactome pathways of the top
   gene target predictions from DeltaNeTS+, DeMAND and log2FC analysis
   [173]12859_2021_4046_MOESM4_ESM.xlsx^ (1.1MB, xlsx)

   Additional file 4. Reactome pathway enrichment analysis across the
   three phases of H1N1 and H5N1 influenza-A infection in Calu-3: early
   (phase 1: 0-7hr), middle (phase 2: 7-18hrs) and late (phase 3: >18hrs)
   [174]12859_2021_4046_MOESM5_ESM.xlsx^ (1.9MB, xlsx)

   Additional file 5. DeltaNeTS+ analysis of H1N1 and H5N1 influenza-A
   infection in Calu-3: early (phase 1: 0-7hr), middle (phase 2: 7-18hrs)
   and late (phase 3: >18hrs)
   [175]12859_2021_4046_MOESM6_ESM.xlsx^ (1.7MB, xlsx)

   Additional file 6. GO enrichment analysis of WGCNA modules derived from
   DeltaNeTS+ analysis of H1N1 and H5N1 influenza-A infection in Calu-3
   (terms with adjusted p-value < 0.5 are highlighted in red)
   [176]12859_2021_4046_MOESM7_ESM.xlsx^ (489.7KB, xlsx)

   Additional file 7. Reactome pathway enrichment analysis of WGCNA
   modules derived from DeltaNeTS+ analysis of influenza-A infection in
   Calu-3 (pathway with q-value < 0.1 are highlighted in red)

Acknowledgements