Abstract

Background

   Along with the development of precision medicine, individual
   heterogeneity is attracting more and more attentions in clinical
   research and application. Although the biomolecular reaction seems to
   be some various when different individuals suffer a same disease (e.g.
   virus infection), the final pathogen outcomes of individuals always can
   be mainly described by two categories in clinics, i.e. symptomatic and
   asymptomatic. Thus, it is still a great challenge to characterize the
   individual specific intrinsic regulatory convergence during dynamic
   gene regulation and expression. Except for individual heterogeneity,
   the sampling time also increase the expression diversity, so that, the
   capture of similar steady biological state is a key to characterize
   individual dynamic biological processes.

Results

   Assuming the similar biological functions (e.g. pathways) should be
   suitable to detect consistent functions rather than chaotic genes, we
   design and implement a new computational framework (ABP: Attractor
   analysis of Boolean network of Pathway). ABP aims to identify the
   dynamic phenotype associated pathways in a state-transition manner,
   using the network attractor to model and quantify the steady pathway
   states characterizing the final steady biological sate of individuals
   (e.g. normal or disease). By analyzing multiple temporal gene
   expression datasets of virus infections, ABP has shown its
   effectiveness on identifying key pathways associated with phenotype
   change; inferring the consensus functional cascade among key pathways;
   and grouping pathway activity states corresponding to disease states.

Conclusions

   Collectively, ABP can detect key pathways and infer their consensus
   functional cascade during dynamical process (e.g. virus infection), and
   can also categorize individuals with disease state well, which is
   helpful for disease classification and prediction.

Introduction

   The general gene-set or gene module can be used to find combinatory
   biomarkers or signatures to indicate particular phenotypic change of a
   biological system, e.g. the disease development or worsening
   [[35]1–[36]3]. However, such ab initio predictions usually have less
   interpretability in biological researches [[37]4]. Different from these
   conventional methods, the pathway centered analysis can provide a new
   way to balance the discovery of interpretable biological functions and
   the detection of new pathway elements. Thus, these approaches could
   highlight the conditional importance of prior-known pathways [[38]5] in
   a phenotype change/ transition [[39]6], and also uncover its new
   candidate underlying functions [[40]7].

   Biological pathway consists of a set of interactive genes or other
   biomolecules, which is well-known to execute a series of functional
   cascades for particular cellular response/outcome [[41]8]. Nowadays,
   there are many carefully curated pathways available to represent
   creditable functional compositions and interactions [[42]9], which are
   expected to targetedly capture the permutation of established
   biological functions involved in the phenotype changes [[43]10,
   [44]11].

   Indeed, pathway centered models and methods have been widely applied in
   diverse biological and clinical studies. For instance, the well-known
   gene set enrichment analysis (GSEA) [[45]12] can recognize
   dys-regulated pathway according to the measurement of status change of
   a pathway. Similarly, many pathway-level aggregation methods have been
   designed to investigate the biological signatures of different
   phenotypes on the pathway activity level [[46]13]. PAGODA can reveal
   multiple overlapping aspects of transcriptional heterogeneity for
   coordinated variability amongst tested cells, by evaluating biological
   pathways as persistent cell-type specific features or transient
   processes [[47]14]. The mendelian randomization-based pathway
   enrichment analysis (MRPEA), has developed a pathway association
   analysis method for correcting the genetic confounding effects of
   environmental exposures during the genetic studies of human complex
   diseases [[48]15]. Pathifier is a principle curve based algorithm to
   infer pathway deregulation scores for each tumor sample on the basis of
   expression data by transforming gene-level information into
   pathway-level information [[49]16]. A silico Pathway Activation Network
   Decomposition Analysis (iPANDA) is designed for robust biomarker
   identification from gene expression data, which estimates the pathway
   activation scores based on the degree of differential gene expression
   and pathway topology decomposition [[50]17]. There are many further
   improvements on these pathway-centered computational approaches and
   their applications by machine learning ideas and technologies
   [[51]18–[52]20]. A network-based pathway-expanding approach is applied
   to take the topological structures of biological networks into account
   [[53]21]; especially, the interaction between internal and external
   genes of the pathway and between pathways, can accurately and reliably
   identify significant pathways related to the corresponding disease
   [[54]22]. And a multiple kernel learning has been proposed to feature
   selection by separately per data type and by pathway membership, and
   this maximizes the amount of information used to build effective
   prognostic prediction due to its usage of all available data [[55]23].

   Actually, there exists some pathway-related studies based on the
   time-course experiment and data, e.g. a pathway-based phylogenetic
   approach [[56]24], a web-based software program Network Painter
   [[57]25], and a new Wnt signaling time-fit model [[58]26]. However,
   there is still urgent requirement on the computational characterization
   of interactive pathways and their phenotype-associated states, which
   should provide a new viewpoint of network of networks for a biological
   system [[59]27], and should also supply new strategy to state
   prediction for the phenotype change of a biological system.

   Here, as shown in Fig. [60]1, we present a composite computational
   framework (ABP: Attractor analysis of Boolean network of Pathway) to
   investigate the dynamical biological process (e.g. state transition of
   biological systems). ABP reconstructs the Boolean network (BN) of
   pathways by the pathway activity profiles, and then applies the BN
   attractors to indicate the steady pathway states, which can
   quantitatively characterize the biological system status corresponding
   to different conditions or phenotypes (e.g. normal and disease). By
   wide evaluation on time-course gene expression datasets of virus
   infections, ABP has shown its efficiency and accuracy on identifying
   key phenotype-associated pathways subjected to virus infection; and
   rebuilding these key pathways’ consensus functional cascade during
   individual specific virus infection; and especially categorizing
   individuals with disease state well, which should be helpful for
   disease classification and prediction in personalized medicine.

Fig. 1.

   [61]Fig. 1
   [62]Open in a new tab

   The framework ABP (Attractor analysis of Boolean network of Pathway) of
   pathway-centered dynamical network analysis. It includes several steps:
   transforming highthrough-put data into pathway score by GSEA; selecting
   dys-regulated pathways along time point; inferring network based on
   BoolNet; and finally computing attractors for each individual to
   describe the steady state of disease

Methods

   Our proposed ABP framework consists of a few main steps and parameters
   as illustrated in Fig. [63]1:
    1. Normalization of gene expression data produced by high-throughput
       technologies, e.g. RNA sequencing or microarray;
    2. Quantification of pathway activity according to pathway measuring
       approaches, i.e. GSVA using the expression data of each sample and
       gene-sets from KEGG database;
    3. Selection of key phenotype-associated pathways through differential
       significance test (P < 0.05), where the number of finally selected
       pathways is dependent on the maximal number of significantly
       dys-regulated pathways at different time points;
    4. Extraction of individual specific time-course pathway activity
       profile (i.e. every individual would have multiple samples
       collected at consecutive time points), thus, the organization of
       pathway activity profile is a cubic dataset, including key
       pathways, individuals, and time points; of note, the number of key
       pathways should be less than the number of time points, which is
       required by the follow-up BN construction;
    5. Reconstruction of individual specific Boolean network based on
       corresponding time-course pathway activity profile, which is
       calculated by BoolNet method, using binarizeTimeSeries function
       with “scanStatistic” method, windowSize with intervals from 0.1 to
       0.2, sign.level with intervals from 0.05 to 0.15; and applying
       reconstructNetwork function with “bestfit” method and default
       arguments.
    6. Extraction of individual specific attractors corresponding to each
       Boolean network, which represent the individual specific steady
       pathway state so as to distinguish dissimilar phenotypes by
       unsupervised (i.e. HCL) or supervised (i.e. SVM with radial kernel)
       methods.

   These key steps applied in above framework will be introduced in
   details in below.

The calculation of pathway activity scores

   Due to the reduced costs of high-throughput technologies, omics data is
   being produced under more complex experimental designs with multiple
   phenotypic and/or clinical samples. The Gene Set Variation Analysis
   (GSVA) [[64]28] allows to assess the underlying pathway activity
   variation by transforming the gene by sample matrix into a gene set by
   sample matrix, which just provide the relative enrichment (scores) of
   pathways across the samples to support follow-up traditional analytical
   methods in a pathway focused manner, rather than the only qualitative
   enrichment with respect to a phenotype change. In this study, we apply
   GSVA to obtain the activity scores of KEGG pathways at multiple time
   points, which will be further used in following network and dynamic
   analysis on pathway level rather than gene level.

   Notably, several approaches have proposed some pathway-level
   aggregation methods, but, it still remains unclear how they compare
   with one another due to limited evaluations. One recent benchmarking
   investigation has pointed that there is actually necessity for further
   improving pathway-level aggregation [[65]29]. As known, there were
   remarkable outcome differences for different pathway tools even when
   the same data input. For example, the results of a tested approach were
   typically consistent across different datasets in cancer studies, yet
   different between methods [[66]30].

   Thus, any alternative kinds of activity score can also be used in this
   step if necessary in particular analysis routines, which would supply
   tunable analysis strategies dependent on researcher’s objective and
   experience.

The rank of dys-regulated pathways

   With an assumption of that the subjects / individuals with the same
   clinical symptoms should have the similar molecular / pathway reaction
   mechanism, the pathway activity may be various in individuals, but the
   final steady state of a network among pathways should be similar among
   the same symptom group or distinguishing between different symptom
   groups. Thus, we rank the differentially activated pathways between
   different phenotypic individual groups at each time point, which are
   dependent on the difference test of the activity scores by Wilcoxon
   rank sum test.

The selection of key pathways during dynamical biological process

   Then, according to the number of high-ranked dys-regulated pathways at
   each time point, we determine one critical time point, and the pathways
   significantly enriched (e.g. observed) at this time point are key
   pathways. These selected key pathways and their activities across
   consecutive time points can be regarded as the state indices which
   could trace the active pathway during a dynamical process, even in each
   subject. Noted, the final state or converged state (rather than
   currently observed state) of one pathway is characterized and
   quantified by the steady sate of this pathway in a subject, and this
   steady sate is modeled by network attractor and deduced from following
   Boolean network analysis.

The re-construction of Boolean network among key pathways

   To characterize the dynamical features of a biological system among
   above filtered key pathways, a network of pathways is constructed based
   on Boolean network model. At the starting point for a Boolean Network
   Model, N kinds of interacting nodes (i.e. pathways or meta-genes) are
   collected, the states of which are modeled as either “on” (active, or
   up-regulated) or “off” (inactive, or down-regulated). Then, at any
   given time, the system of such N key pathways is just in a system- or
   network-state. Along with a dynamical process, the system will change
   from one state to another depending on the interactions between these
   pathways. Thus, from any start state, the system is assumed to happen a
   confirmed sequence of state changes and ends up in a stable state
   (known as an attractor), and this sequence or trajectory of such system
   state change is defined as a Boolean process corresponding to
   discretized time-course data.

   In practice, the Boolean network model is applied to convert the gene
   or pathway expression snapshots over a time course into a Boolean form
   by noting which genes or pathways are active or not. The dynamics of a
   Boolean network (BN) model (simply using the current state to determine
   the next state) can be described as follows:
   [MATH: <msub><mi mathvariant="normal">s</mi><mi
   mathvariant="normal">i</mi></msub><mfenced close=")" open="("><mrow><mi
   mathvariant="normal">t</mi><mo>+</mo><mn>1</mn></mrow></mfenced><mo>=</
   mo><mfenced close="" open="{"><mtable><mtr><mtd><mn>1</mn><mspace
   width="0.25em"></mspace><munder><mo>∑</mo><mi
   mathvariant="normal">j</mi></munder><msub><mi
   mathvariant="normal">a</mi><mi>ji</mi></msub><msub><mi
   mathvariant="normal">s</mi><mi
   mathvariant="normal">j</mi></msub><mfenced close=")" open="("><mi
   mathvariant="normal">t</mi></mfenced><mo>></mo><mn>0</mn></mtd></mtr><m
   tr><mtd><mn>0</mn><mspace width="0.25em"></mspace><munder><mo>∑</mo><mi
   mathvariant="normal">j</mi></munder><msub><mi
   mathvariant="normal">a</mi><mi>ji</mi></msub><msub><mi
   mathvariant="normal">s</mi><mi
   mathvariant="normal">j</mi></msub><mfenced close=")" open="("><mi
   mathvariant="normal">t</mi></mfenced><mo><</mo><mn>0</mn></mtd></mtr><m
   tr><mtd><msub><mi mathvariant="normal">s</mi><mi
   mathvariant="normal">i</mi></msub><mfenced close=")" open="("><mi
   mathvariant="normal">t</mi></mfenced><mspace
   width="0.25em"></mspace><munder><mo>∑</mo><mi
   mathvariant="normal">j</mi></munder><msub><mi
   mathvariant="normal">a</mi><mi>ji</mi></msub><msub><mi
   mathvariant="normal">s</mi><mi
   mathvariant="normal">j</mi></msub><mfenced close=")" open="("><mi
   mathvariant="normal">t</mi></mfenced><mo>=</mo><mn>0</mn></mtd></mtr></
   mtable></mfenced> :MATH]

   where, the s[i](t) represents the state of some variable (e.g. a node
   in network) i at time point t; a[ji] points the influence weight of
   another variable j for this variable i; thus, the state of variable i
   at time point t + 1 would be determined by all other variables’ states
   at the time point ahead, i.e. at time point t.

   In this study, the BoolNet package is used for the construction and
   evaluation of Boolean networks, which has been successfully applied in
   many biological and biomedical researches [[67]31]. In particular,
   BoolNet is developed for the reconstruction and analysis of binary
   gene-regulatory networks: (i) the Network reconstruct function is used
   for inferring Boolean networks from temporal gene expression or pathway
   activity profiles by popular reconstruction algorithms; and (ii) the
   binarize Time Series function is used for binarizing the real-valued
   time series from these reconstruction algorithms. Noted, the tuning
   parameters used in BoolNet are searched by grid method and determined
   by the follow-up state clustering performance. Of course, some
   alternative Boolean network analysis methods are worthy of future
   benchmark investigations.

The biological state clustering based on attractors of temporal network of
key pathways

   As assumed, the attractors represent a final steady state of a
   biological system, e.g. the success-infection or failure-infection
   states during virus infection. Obviously, these final states are
   expected to classify or even predict the particular phenotypes of
   individuals. Here, it is relevant to evaluate the indicative power of
   those state features rather than expression features, so that, the
   clustering of those states and expressions would be applicable and
   comparable, e.g. hierarchical clustering of pathway states and gene
   expressions respectively. However, in math terms, a state of an
   attractor is just a binary vector (attractor with simple structure) or
   a group of vectors (attractor with loop structure), so the conventional
   clustering method is not applied before we can supply the distance
   matrix among states. To generally measure the distance between two
   states (attractors), the canonical-correlation analysis (CCA) is used
   to infer the association information from cross-covariance matrices,
   which can indicate the association (distance) between two groups of
   vectors, i.e. two attractors.

   Given two matrix X and Y, canonical-correlation analysis is to seek two
   vectors a and b to maximize the correlation between a ′ X and b ′ Y,
   i.e.
   [MATH: <munder><mo>max</mo><mrow><mi
   mathvariant="normal">a</mi><mo>,</mo><mi
   mathvariant="normal">b</mi></mrow></munder><mi
   mathvariant="normal">ρ</mi><mfenced close=")" open="("
   separators=","><mrow><msup><mi
   mathvariant="normal">a</mi><mo>′</mo></msup><mi
   mathvariant="normal">X</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>′</mo><mi
   mathvariant="normal">Y</mi></mrow></mfenced><mo>=</mo><munder><mo>max</
   mo><mrow><mi mathvariant="normal">a</mi><mo>,</mo><mi
   mathvariant="normal">b</mi></mrow></munder><mfrac><mrow><mi
   mathvariant="normal">a</mi><mo>′</mo><msub><mi
   mathvariant="normal">Σ</mi><mi>XY</mi></msub><mi
   mathvariant="normal">b</mi></mrow><mrow><msqrt><mrow><mi
   mathvariant="normal">a</mi><mo>′</mo><msub><mi
   mathvariant="normal">Σ</mi><mi>XX</mi></msub><mi
   mathvariant="normal">a</mi></mrow></msqrt><msqrt><mrow><mi
   mathvariant="normal">b</mi><mo>′</mo><msub><mi
   mathvariant="normal">Σ</mi><mi>YY</mi></msub><mi
   mathvariant="normal">b</mi></mrow></msqrt></mrow></mfrac> :MATH]

   where, Σ[XX] = Cov(X, X), Σ[YY] = Cov(Y, Y), Σ[XY] = Cov(X, Y), and
   Cov(., .) is covariance matrix.

   Then, the distance matrix converted from CCA matrix between multiple
   attractors corresponding to multiple individual Boolean networks, is
   used to build the hierarchical tree. Two clusters are divided,
   expecting one corresponds to Sx group and the other one corresponds to
   Asx group. And the clustering accuracy, defined as the application
   efficiency, is evaluated by the index Acc [[68]32]. Given samples in Sx
   and Asx groups that are:
   [MATH: <msubsup><mfenced close="}" open="{"><msub><mi
   mathvariant="normal">S</mi><mi
   mathvariant="normal">i</mi></msub></mfenced><mrow><mi
   mathvariant="normal">i</mi><mo>=</mo><mn>1</mn></mrow><mn>2</mn></msubs
   up> :MATH]

   where the state-based individuals clustering also provide two
   individual clusters:
   [MATH: <msubsup><mfenced close="}" open="{"><msub><mi
   mathvariant="normal">C</mi><mi
   mathvariant="normal">j</mi></msub></mfenced><mrow><mi
   mathvariant="normal">j</mi><mo>=</mo><mn>1</mn></mrow><mn>2</mn></msubs
   up> :MATH]

   Then, the application efficiency of network reconstruction is
   calculated as:
   [MATH: <mi>Acc</mi><mo>=</mo><mfrac><mrow><munder><mo>min</mo><mrow><mi
   mathvariant="normal">τ</mi><mfenced close=")" open="("><mfenced
   close="]" open="["
   separators=","><mn>1</mn><mn>2</mn></mfenced></mfenced></mrow></munder>
   <munderover><mo>∑</mo><mrow><mi
   mathvariant="normal">j</mi><mo>=</mo><mn>1</mn></mrow><mn>2</mn></munde
   rover><mo>∣</mo><msub><mi mathvariant="normal">C</mi><mi
   mathvariant="normal">j</mi></msub><mo>⊓</mo><msub><mi
   mathvariant="normal">S</mi><mrow><mi
   mathvariant="normal">τ</mi><mfenced close=")" open="("><mi
   mathvariant="normal">j</mi></mfenced></mrow></msub><mo>∣</mo></mrow><mr
   ow><munderover><mo>∑</mo><mrow><mi
   mathvariant="normal">i</mi><mo>=</mo><mn>1</mn></mrow><mn>2</mn></munde
   rover><mo>∣</mo><msub><mi mathvariant="normal">S</mi><mi
   mathvariant="normal">i</mi></msub><mo>∣</mo></mrow></mfrac> :MATH]

   Besides, the performance measurement of SVM adopts the AUC with random
   5-fold cross-validation, which is used to evaluate the robustness of
   SVM features (i.e. pathway states as indicators).

Results and discussion

Instruction of datasets and experimental steps

   To assess ABP, we collected and analyzed serially sampled gene
   expression data from a challenge study [[69]27], noted as Rhinovirus
   UVA data. This data mainly contains 20 human volunteers (subjects)
   inoculated with live human rhinovirus (HRV), and each subject was
   serially sampled for a few days quantifying temporal whole blood gene
   expression by Affymetrix GeneChips technology, clinical symptom scores
   self-reported over 8–10 symptoms, and viral shedding from periodic
   nasopharyngeal titrations [[70]33]. Each subject has different samples
   on 15 time points, one time point before viral Inoculum and the other
   14 time points after inoculation. Each subject was designated as a
   symptomatic subject (Sx) or an asymptomatic subject (Asx) by a modified
   Jackson score based on these clinical symptoms self-reported [[71]33].
   The analyzed HRV UVA subjects were divided in to Sx group (10
   individuals) and Asx group (6 individuals). Thus, actually, there were
   (10 + 6)*15 = 240 gene expression profiles used in this data study.

   Based on the pathway activities across multiple samples, we determined
   the critical time point and corresponding selected pathways. Seeing
   Fig. [72]2, the number of differentially activated pathways achieved
   maximum at 12[th] time point (i.e. 48 h after inoculation), and 13
   pathways significantly changed at this time point suggest a critical
   point between two groups of subjects. These selected pathways in
   Table [73]1 were applied as the activity indices aiming to trace the
   state change of pathways across consecutive time points in each
   subject, and the final state of pathways was determined by the steady
   sate of a Boolean network corresponding to each subject. Noted, at 14
   [th] time point, there were also nine dys-regulated pathways observed,
   only two of which were also observed at 12 [th] time point (i.e. Fatty
   acid degradation and Glycine, serine and threonine metabolism), and the
   remains had less significance relevant to the studied disease. Thus,
   the selected 13 pathways at 12[th] time point should be more pathogen
   informative in following analysis.

Fig. 2.

   [74]Fig. 2
   [75]Open in a new tab

   The number of dysfunctional pathways observed in different time points.
   The vertical axis of the histogram represents the number of
   dys-regulated pathways observed at particular time point; and the
   horizontal axis of the histogram represents the time points

Table 1.

   The key pathways detected in HRV dataset
   Metagene
   ID Kegg
   ID Pathway
   ID
      Mgene 1  hsa00062 Fatty acid elongation
      Mgene 2  hsa00260 Glycine, serine and threonine metabolism
      Mgene 3  hsa00630 Glyoxylate and dicarboxylate metabolism
      Mgene 4  hsa03320 PPAR signaling pathway
      Mgene 5  hsa04115 p53 signaling pathway
      Mgene 6  hsa04725 Cholinergic synapse
      Mgene 7  hsa04750 Inflammatory mediator regulation of TRP channels
      Mgene 8  hsa04920 Adipocytokine signaling pathway
      Mgene 9  hsa04972 Pancreatic secretion
      Mgene 10 hsa05143 African trypanosomiasis
      Mgene 11 hsa05146 Amoebiasis
      Mgene 12 hsa05160 Hepatitis C
      Mgene 13 hsa05164 Influenza A
   [76]Open in a new tab

Key pathways associated with individual phenotypes

   Rhinovirus (RV) leads the majority of common colds, and also causes
   exacerbations in patients with asthma and chronic obstructive pulmonary
   disease. In the detected key pathways by ABP, many signaling pathways
   were efficiently detected and could be productive entry pathways for
   HRV. By examining the effects of tiotropium on RV infection and RV
   infection-induced airway inflammation, RV14 titres, RNA and cytokine
   concentrations would be reduced, along with the reduction of the
   expression of intercellular adhesion molecule and the number of
   cellular acidic endosomes modulating airway inflammation in rhinovirus
   infection [[77]34]. Influenza virus or HRV infections of the upper
   airway can cause colds and the flu, and they can also activate
   exacerbations of lower airway diseases so as to induce chronic
   obstructive pulmonary disease and asthma, where a systems approach has
   identified the temporally changing patterns of host gene expression
   from these viruses [[78]35], and a transmembrane protein RNASEK is
   needed for the replication of HRV, influenza A virus, and dengue virus
   [[79]36]. In addition, a global study has also confirmed that RSV is an
   important respiratory pathogen in the elderly, and preventative
   measures such as vaccination could decrease severe respiratory
   illnesses and complications in the elderly [[80]37]. Effective drugs
   and modification of pre-existing drugs or regimens would be improve to
   increase effectiveness of antiviral therapy, Pleconaril has some
   activity against enteroviruses and some efficacy against rhinoviruses
   in ongoing trials [[81]38], e.g. it has been suggested that quercetin
   inhibits RV endocytosis and replication in airway epithelial cells at
   multiple stages of the RV life cycle [[82]39].

   In addition, a few our detected biological processes and functions are
   also associated with virus infection [[83]40–[84]42], such as HRV
   infection, as discovered in Table [85]1 (Noted, the metagene id in this
   table will be used to label the corresponding pathway in following
   figures for convenience). For example, the transient receptor potential
   (TRP) channel family are potential candidates for sensing physical and
   chemical stimuli, and TRP channels +may be novel therapeutic targets
   for controlling virus-induced cough [[86]43]; Amoebiasis, the condition
   of harbouring the protozoan parasite Entamoebahistolytica, is a major
   health problem throughout the world [[87]44], and parasite virulence
   can implicate multiple amoebic and host factors through complex
   host-parasite interactions [[88]45]; and HCV infection is able to
   induce autophagy and downstream UPR molecules regulating key autophagic
   gene expression, which just can similarly promote human rhinovirus
   infection via the autophagic pathway [[89]46, [90]47].

Temporal module network charactering the dynamical process of individual
phenotype appearance

   For each subject / individual, the topological structure of his / her
   Boolean network is displayed in Fig. [91]3. Obviously, these structures
   can’t classify the Sx and Asx groups directly, although there were some
   group specific edges observed in the networks. For instance, the
   regulation association between “Hepatitis C” and “Influenza A”
   pathways, has many observations in the Sx-individual specific networks
   (5 in 10) rather than in the Asx-individual specific networks (0 in 6),
   which is represented by the edge (meta-)Gene12 - > (meta-)Gene13; or
   the regulation association between “Amoebiasis” and “Glycine, serine
   and threonine metabolism” pathways, also has some observations in the
   Sx-individual specific networks (4 in 10) but none in the
   Asx-individual specific networks (0 in 6), which is represented by the
   edge (meta-)Gene11 - > (meta-)Gene2. Besides, there is a
   cross-reactivity between hepatitis C virus and Influenza A virus
   reported, and the host responses to an infectious agent can be
   influenced by such cross-reactive memory cells [[92]48]. Thus, the
   “Hepatitis C” and “Influenza A” pathways, along with their involved
   genes / proteins, and even their interaction could all play important
   roles in HRV infection through a transferable manner.

Fig. 3.

   [93]Fig. 3
   [94]Open in a new tab

   The topological structure of Boolean network corresponding to each
   subject. The Sx individual is labeled with red box; the Asx individual
   is labeled with green box; and the individuals labeled with grey box
   have not determinate clinical phenotype evidences in original report.
   Noted, each gene id is actually a metagene id which represents a
   particular pathway listed in Table [95]1

Attractor of module network indicating the pathways states corresponding to
phenotypes

   In addition to the aforementioned comparison and discussion about
   topological structures of individual specific BN, we further compared
   the individual specific final states of BN represented by the network
   attractors of networks. As illustrated in Fig. [96]4, there were
   several obvious common pathways observed in active states for Sx or Asx
   subjects, e.g. Hepatitis C and Influenza A, indicating the shared
   molecular mechanism suffering from virus infection. And most other
   pathways will be active or not for different subjects, but not
   distinguish for Sx and Asx groups separately. By contrast, integrating
   all pathways’ states, a simple clustering (i.e. HCL) can distinguish
   the Sx and Asx individuals with an accuracy larger than 80%, and would
   be more efficient than that based on topological structures in a
   parameter space, as displayed in Fig. [97]5a. Next, the AUC of SVM with
   5-fold cross-validation had also been used to evaluate the robustness
   of our adopted network attractor based classification model. The
   results in Fig. [98]5b illustrated again the classification model
   trained from pathways’ states can achieve higher AUC than the
   classification model trained from pathways’ topological structures. In
   addition, in a bootstrap manner, by removing the samples at each time
   point, the attractor based model had been re-built and its performances
   kept well (i.e. with robustness) as shown in Fig. [99]5c. These results
   supported again the merit of this systems biology research, which
   provided discriminative systematical features (i.e. network features)
   to characterize the phenotype diversity, and also revealed the
   interpretable mechanisms underlying individual specific phenotypic
   changes. It should be noted that, although there are only 20
   individuals in this data, the number of samples is actually larger than
   200. And in the efficiency evaluation by hierarchical clustering or
   SVM, there are indeed 13 features used, which is less than the number
   of observations of model. Thus, the performance obtained here is not
   trivial. Of course, the evaluation on more individuals with multiple
   temporal samples should be further carried on in future work.

Fig. 4.

   [100]Fig. 4
   [101]Open in a new tab

   The attractor states of Boolean network corresponding to each subject.
   The Sx individual is labeled with red box; the Asx individual is
   labeled with green box; and the individuals labeled with grey box have
   not determinate clinical phenotype evidences in original report

Fig. 5.

   [102]Fig. 5
   [103]Open in a new tab

   The efficiency for discriminating Sx and Asx individuals by the state
   and structure of Boolean network respectively, with statistic of Acc
   under different parameters. a The comparison between pathway activity
   based model and pathway network based model by Acc from clustering
   performance. b The comparison between pathway activity based model and
   pathway network based model by AUC from classification performance. c
   The robustness evaluation of pathway activity based model in a
   bootstrap manner

Replicate discovery on independent dataset

   To further validate the robustness of ABP, we used another gene
   expression data also inoculated with live human rhinovirus (HRV)
   [[104]27], and this dataset was referred to Rhinovirus Duke data. This
   new dataset includes serially sampled gene expression data from 19
   human volunteers (subjects). Each subject has different samples in 19
   time points, one time point before the viral Inoculum and the remaining
   18 time points behind inoculation. That means there are actually
   19*19 = 361 expression profiles here. According to a modified Jackson
   score computed from the self-reported clinical symptoms, the analyzed
   HRV Duke subjects were also divided in to Sx group (9 individuals) and
   Asx group (3 individuals) and others [[105]27]. We used the same key
   pathways with an assumption that the pathway reaction mechanism would
   be similar in the same disease, and more importantly this can also
   validate key pathways in an independent test. From the results on this
   dataset, we actually observed again that Boolean network model can
   mimic well a dynamical process that the biological system changes from
   one state to another. Especially, the steady state of the network
   attractors can distinguish Ax and Asx individuals efficiently
   (Fig. [106]6), which was also robust on parameters than that based on
   network structures. Notably, in the unsupervised evaluation as
   clustering in Fig. [107]6a, the steady state of the network attractors
   obviously outperformed the structure of network. Meanwhile, in the
   supervised evaluation as classification in Fig. [108]6b, the steady
   state of the network attractors tended to have satisfactory AUC
   performance (e.g. > 0.8) on larger parameter scopes than the structure
   of network, although the median performance of the structure of network
   seemed to be higher possibly caused by sample unbalance.

Fig. 6.

   [109]Fig. 6
   [110]Open in a new tab

   The efficiency for discriminating Sx and Asx individuals by the state
   and structure of Boolean network respectively on independent dataset,
   with statistic of Acc under different parameters. a The comparison
   between pathway activity based model and pathway network based model by
   Acc from clustering performance. b The comparison between pathway
   activity based model and pathway network based model by AUC from
   classification performance

Conclusions

   In this study, we designed and implemented a computational framework
   ABP to investigate the dynamic of biological process for individual
   comparisons, especially on the network of pathways rather than genes.
   Based on the quantification of pathways activity (i.e. estimated
   activity score based on expression), the Boolean networks are
   reconstructed for each individual, so that the important network
   attractor can be obtained and applied to infer the final stable
   biological system state of individuals after particular phenotypic
   change (e.g. virus infection). Noted, some other pathway score like
   Pathifier has also been tested, however, the results show it is weak to
   obtain reasonable Boolean network, which would be caused by these
   scores being limited into a numeric region that would greatly affect
   the following network inference. Thus, the optimized combination of
   pathways activity estimation and Boolean network construction should be
   deserved to future study. Actually, both the structure and the
   attractor of the Boolean network can reveal indicative features (e.g.
   pathway associations or pathway activations) distinguishing Sx
   individuals or its sub-group from the Asx individuals. They would
   provide candidate disease identification or prediction approaches when
   there are enough individuals with temporal samples available [[111]49].

   Indeed, our proposed ABP has shown its efficiency and accuracy on
   identifying phenotype-associated pathways. These identified pathways
   could explain the phenotypic change in a state-transition manner, which
   is strongly supported by wide analysis and evaluation on multiple
   temporal gene expression datasets of HRV infections. A future direction
   is to further investigate how to build attractor-focused disease
   prediction model [[112]50], and especially to release the model
   implementation as a web service for wide bioinformatics and biomedical
   study and application.

Acknowledgments