Abstract

   The study of systems genetics is changing the way the genetic and
   molecular basis of phenotypic variation, such as disease susceptibility
   and drug response, is being analyzed. Moreover, systems genetics aids
   in the translation of insights from systems biology into genetics. The
   use of systems genetics enables greater attention to be focused on the
   potential impact of genetic perturbations on the molecular states of
   networks that in turn affects complex traits. In this study, we
   developed models to detect allele-specific perturbations on
   interactions, in which a genetic locus with alternative alleles exerted
   a differing influence on an interaction. We utilized the models to
   investigate the dynamic behavior of an integrated molecular network
   undergoing genetic perturbations in yeast. Our results revealed the
   complexity of regulatory relationships between genetic loci and
   networks, in which different genetic loci perturb specific network
   modules. In addition, significant within-module functional coherence
   was found. We then used the network perturbation model to elucidate the
   underlying molecular mechanisms of individual differences in response
   to 100 diverse small molecule drugs. As a result, we identified
   sub-networks in the integrated network that responded to variations in
   DNA associated with response to diverse compounds and were
   significantly enriched for known drug targets. Literature mining
   results provided strong independent evidence for the effectiveness of
   these genetic perturbing networks in the elucidation of small-molecule
   responses in yeast.

Introduction

   Elucidation of the molecular and genetic basis of phenotypic variation
   has been a longstanding goal in genetics, including all aspects of
   morphology, physiology, behavior, disease susceptibility, and drug
   response. Genome-wide association studies (GWAS) mapping quantitative
   trait loci (QTLs) have enabled the identification of regions of the
   genome in which genetic variations are associated with phenotypic
   variation. However, the computational understanding of the biological
   mechanisms underlying genetic variations remains unclear. Numerous
   previous studies have dissected the downstream effects of genetic
   perturbations on RNA intermediates, proteins, metabolites and other
   molecular endophenotypes [43][1], [44][2], [45][3], [46][4], [47][5].
   Whole-genome expression QTL (eQTL) analysis in yeast, mice, and humans
   has demonstrated that gene expression traits are highly inheritable and
   exhibit surprisingly complex underlying genetic architecture [48][6],
   [49][7], [50][8]. By layering gene expression phenotypes as
   intermediate phenotypes, many studies have combined eQTL and disease
   GWAS to identify causal relationships between genes and disease
   (reviewed by Ertekin-Taner [51][5]). Therefore, further elucidation of
   changes in molecular states that directly respond to changes in DNA
   could potentially fill in the information gaps left by GWAS, and serves
   as an excellent first step to understanding the drivers of a complex
   phenotype.

   However, most proteins perform their functions through interactions
   with other proteins, or as part of biochemical pathways and networks.
   Thus, some genes may respond as groups due to their membership in
   networks. As an alternative to the studies assuming that genes act
   independently, certain previous studies have utilized an effective
   approach that assessed higher-order a priori defined gene network
   responses caused by genetic variation [52][9], [53][10], [54][11],
   [55][12], [56][13], [57][14], [58][15], [59][16]. However, these
   studies only considered the variation of the synthetic expression of
   genes in a priori defined networks, and failed to model how genetic
   variations are mediated by a network of molecular interactions in the
   cell. In addition, the association of these networks that are affected
   by genetic variations with the resultant phenotypic variation remains
   poorly understood.

   The use of systems genetics is changing the future of genetics.
   Moreover, it is a corollary to the basic insight of systems biology in
   that most complex traits of living things are properties generated by
   dynamic networks of interacting genes and molecules [60][17], [61][18].
   In recent years, our ability to interpret the phenotypic variation in
   model systems, and ultimately in humans, has benefited from systems
   genetics approaches [62][17], [63][18], [64][19], [65][20], [66][21],
   [67][22], [68][23]. Chen et al. [69][21] constructed co-expression
   networks through the combination of gene expression and genotype data,
   and uncovered components of the co-expression networks that respond to
   genetic variations associated with disease-associated traits. This
   study confirmed that complex traits, such as obesity, are potentially
   emergent properties of molecular networks modulated by complex genetic
   loci and environmental factors. Zhong et al. [70][23] proposed a model
   to describe the effect of disease-causing mutations in Mendelian
   disorders on systems or interactome properties. Equipped with the tools
   emerging from the genomics revolution, the study of the effects of
   genetic perturbations on molecular networks will serve as an important
   aspect of systems genetics research.

   Various biochemical or biophysical interaction(s) are the building
   blocks of biological functions and processes and are also the basic
   units of molecular networks; thus, it is important to investigate the
   effects of genetic variations on molecular interactions. Recently,
   several reports have explored the effects of genetic variations on
   protein-DNA interactions [71][24] or protein-protein interactions
   [72][23] from the perspective of the structural properties of physical
   binding. In this study, through integrating gene expression data,
   genotype data, and molecular interactions, we developed models to
   detect two types of allele-specific perturbations on interactions,
   termed allele-specific co-perturbation (ASCP) and allele-specific
   dys-perturbation (ASDP). In both cases, a genetic locus exerted a
   different influence on the transcriptional program of interactions for
   cases in which the locus exhibited distinct (alternative) alleles.

   In this study, using a well-controlled system, we aimed to investigate
   how molecular networks respond to DNA perturbations based on the
   interaction perturbation models (ASCP and ASDP), and thus, drive
   variations in physiological states associated with phenotypic
   variation. Therefore, we used a panel of 112 genotyped and
   expression-profiled yeast strains [73][6] which were also treated with
   a collection of 100 diverse compounds, termed small-molecule
   perturbagens (SMPs) [74][25], to investigate the effect of genetic
   variations (such as Single Nucleotide Polymorphisms, SNPs) on the
   dynamic behavior of an integrated interactome of yeast, and to
   elucidate the molecular mechanisms underlying individual differences in
   response to small-molecule drugs in yeast based on networks that
   exhibit genetic perturbation.

Results

EQTL Mapping

   We performed eQTL mapping using gene expression and SNP genotyping data
   on 112 segregants generated in a cross between laboratory (BY) and wild
   (RM) strains of Saccharomyces cerevisiae [75][6]. We sought to identify
   an initial set of potential associations between eQTLs and their target
   genes for further analysis. In order to determine the statistical
   significance more accurately, we merged adjacent markers that exhibited
   the same genotype profile to obtain a total of 1118 representative
   markers and selected 4500 transcripts with significantly high
   heritability ( Inline graphic >0.669 at a false discovery rate, FDR of
   0.05) (see [76]Materials and Methods). Finally, we performed linkage
   calculations between 1118 genetic markers and 4500 transcript levels
   using the Student’s t-test, and assessed significance via permutations
   (see eQTL mapping section in [77]Materials and Methods for algorithmic
   details). As a result, we detected 30,793 associated transcript-locus
   pairs (FDR<0.05). There were 3175 transcripts (3141 ORFs) linked to at
   least one QTL and an average of 9.7 QTLs per transcript. A total of
   1112 markers were linked to at least one target transcript. To
   determine whether the loci identified by linkage act in cis or in
   trans, we investigated transcripts whose levels were linked to markers
   within 10 kb of their own gene. We found that 656 (21%) of the 3175
   transcripts fell into the cis-acting category and most expression
   differences mapped to trans-acting loci. Our findings are in
   concordance with other reports in which trans-acting loci appear to be
   responsible for most differences in gene expression between yeast
   strains [78][26], [79][27].

Determining Allele-specific Perturbation of Interactions

   To assess how variations at the genetic level that affected gene
   expression levels (eQTLs) would alter molecular interaction states in
   the cell, we introduced allele-specific perturbation of interaction; we
   defined this as: a genetic locus with alternative alleles that exerts a
   different influence on an interaction. Two models were developed to
   detect the allele-specific behavior of interactions: allele-specific
   co-perturbation (ASCP) and allele-specific dys-perturbation (ASDP) (a
   detailed description can be found in the [80]Materials and Methods
   section). ASCP denotes that an eQTL synchronously regulated the
   expression of two interacting genes. ASDP denotes that an eQTL
   triggered significant changes in expression correlation between two
   interacting genes. Each interaction in an integrated interactome,
   containing protein-protein interactions (PPI), protein-DNA interactions
   (PDI), kinase-protein interactions (KPI), and enzyme-enzyme
   interactions (EEI) (a detailed description of the interaction types can
   be found in the [81]Materials and Methods section), was then analyzed
   to determine which demonstrated allele-specific behavior based on the
   two models. As a result, we identified 10440 ASCP relationships between
   408 eQTLs and 2184 interactions, and 5774 ASDP relationships between
   758 eQTLs and 2416 interactions (P<0.001, FDR<0.082). A total of 70.86%
   (788/1112) of the eQTLs perturbed at least one interaction. In
   addition, we analyzed the cis-acting or trans-acting regulation between
   the genetic loci and their perturbed ASCP and ASDP interactions. Five
   different conditions were investigated and detailed information is
   shown in [82]Table S1. Interestingly, we found that many interacting
   genes in ASCP and ASDP interactions are located on the same
   chromosomes, even located adjacent to their regulators (eQTLs). This
   phenomenon may reflect the linear arrangement and aggregation of yeast
   genes on the same chromosomes which are required for coordinate
   expression of genes involved in related metabolic or regulatory
   pathways [83][28], [84][29], [85][30]. In addition, among the 3141
   target genes associated with at least one eQTL, approximately 65%
   exhibited allele-specific perturbation of interactions with their
   partners in the interactome. It has been suggested that for a
   considerable portion of target genes, the eQTLs that disturbed their
   expression levels would also cause a degree of dynamic variation in
   their local network. Various biochemical or biophysical interaction(s)
   are the basic units of a molecular network and also the building blocks
   of biological functions and processes; thus, exploration of the
   allele-specific behavior of interactions makes it possible to test the
   downstream effects of genetic perturbations on molecular networks.

Comparison of the Effect of ASCP and ASDP on Networks

   [86]Table 1 shows the overall statistics of perturbed interactions
   detected by ASCP and ASDP models. Firstly, we found that 13.53%
   (4443/32831) of the interactions between target genes (of eQTLs) and
   their partners in the integrated network are perturbed by at least one
   eQTL. This finding indicates that many of the interactions (∼86%) may
   behave consistently across individuals with different genotypes and
   represent a cellular network ‘backbone’. Secondly, we found that PPIs
   in complexes exhibited ASCP more frequently. When annotated to 305
   protein complexes obtained from the CYC database [87][31], 162 of the
   939 (17.3%) ASCP PPIs were found to be in the same complexes, whereas
   only 56 of the 819 (6.8%) ASDP PPIs were located in the same complexes
   (Fisher’s exact test, P = 1.184e–11). For PDIs, regulatory
   relationships between TFs (TF-TF) were also observed more frequently in
   ASCP (Fisher’s exact test, P = 0.03809), whereas phosphorylation events
   between two kinases (K-K) were observed more frequently in ASDP
   (P = 0.03245) ([88]Figure S1A).

Table 1. Summary of the global effect of ASCP and ASDP on the four studied
molecular interaction networks and the integrated interactome.

   NETWORK  Edges ASCP ASDP Both Percentage
   PPI      12067 939  819  1703 14.11%
   PDI      6815  524  616  1056 15.50%
   KPI      13532 550  842  1336 9.87%
   EEI      1912  261  237  421  22.02%
   COMBINED 32831 2184 2416 4443 13.53%
   [89]Open in a new tab

   The second column denotes the total number of interactions detected for
   allele-specific behavior. The third and the fourth column represent the
   number of ASCP and ASDP interactions detected respectively. The fifth
   column represents the total number of perturbed interactions when
   considering both ASCP and ASDP, while the proportion they occupy in all
   interactions tested is listed in the sixth column.

   In order to facilitate analysis, we constructed two networks called CPN
   and DPN that contained all of the ASCP and ASDP interactions,
   respectively. As shown in [90]Figure S1B, most interactions in CPN are
   connected and form a large connecting component that contains ∼90% of
   all nodes; the same is observed in DPN (∼93% of all nodes). These
   results indicate that the transmission of the effect of genetic
   variations is mediated by network continuity. By merging these two
   types of perturbed interactions, a larger system is formed, thus, the
   coordination of these two disturbance effects is evident. We evaluated
   the degree distribution of genes and observed a power-law in both CPN
   and DPN ([91]Figure S1C). This suggests that both models of genetic
   perturbing networks display scale-free characteristics [92][32]. CPN
   and DPN shared approximately 39.8% (910/2286) proteins but only
   approximately 3.53% edges, indicating that a large quantity of
   interactions is rewired between CPN and DPN ([93]Figure S1D). Thus,
   these two models represent various genetic perturbing actions on a
   molecular network. Hence, ASCP and ASDP could potentially reveal the
   condition-specific dynamic information hidden among otherwise common
   static interactions.

The Interactome Exhibits a Modular Changing Pattern Under Allele-specific
Context

   As discussed above, when genetic variations occurred on DNA sequences,
   some gene expression levels are affected (eQTL mapping), and cause the
   alteration of interaction states (allele-specific perturbation on
   interactions). Next, dynamic changes in networks would be triggered. In
   this study, we introduced an allele-specific context to capture the
   changing patterns of a network from one allele state to another. Due to
   the linkage disequilibrium (LD) of adjacent eQTLs, we identified 263
   haplotype LD blocks (B1–B263) defined by the non-random association of
   genetic variants at two or more loci using Haploview software [94][33].
   We then assembled all ASCP and ASDP interactions associated with the
   eQTLs located in the same block region. As a result, we found 246
   blocks associated with at least one interaction; of these, 180 were
   associated with at least three edges. Next, we built an association
   matrix that represented the allele-specific perturbation relationships
   among all of the blocks and their perturbed interactions, and performed
   hierarchical clustering on the perturbed interactions. [95]Figure 1A
   shows the clustering results of four sub-matrices extracted from the
   overall association matrix. These sub-matrices represent the
   associations between 32 blocks and 1446 PPIs, 30 blocks and 937 PDIs,
   23 blocks and 1078 KPIs, and 14 blocks and 366 EEIs, respectively. Our
   findings indicated that for all four types of networks, each block
   primarily regulated a small portion of the network and that distinct
   parts of the network are primarily modulated by a few particular
   blocks. Thus, a complicated combination is evident, in terms of both
   target specificity and cooperativity of blocks. Moreover, the overall
   integrated network presented the same changing pattern under an
   allele-specific context ([96]Figure S2).

Figure 1. The changing pattern of molecular networks under allele-specific
context determined by LD blocks.

   [97]Figure 1
   [98]Open in a new tab

   (A) Hierarchical clustering on the perturbed interactions based on the
   association matrix between blocks and their perturbing interactions in
   PPI, PDI, KPI, and EEI networks, respectively, in which rows represent
   blocks and columns represent interactions. In each of the four
   sub-matrices, blocks associated with less than ten interactions are
   filtered out. The columns are reordered according to hierarchical
   clustering. The rows colored correspond to B20. (B) The allele-specific
   sub-network associated with B20. The sub-network is composed of 142
   PPIs, 54 PDIs, 93 KPIs, and 26 EEIs, which are marked with green, red,
   blue and orange, respectively. Nodes marked with red are proteins
   constituting a GO cellular component (nuclear lumen), and PPIs among
   these genes are regulated by B20. Nodes marked with green are enzymes
   from a KEGG pathway (DNA-directed RNA polymerase activity), and their
   EEIs are regulated by B20.

   To further analyze the role that individual blocks play in the
   regulation of the integrated network, we mapped an allele-specific
   sub-network for each block. Each sub-network corresponded to a minimal
   component in the original integrated network that contained all the
   genes consisting of the interactions associated with a block. It is
   guaranteed that all of the interactions perturbed by the block are
   located in this network component. We investigated the sub-graph
   properties of these sub-networks to determine their topological
   features in the integrated network. A summary of the results is listed
   in [99]Table 2 and [100]Figure 2A. Computer simulation results
   demonstrated that the topological features of these sub-networks were
   significantly different from randomized networks (details of the
   randomization test are provided in [101]Materials and Methods). The
   results indicated that the characteristic path length between genes in
   the same sub-network was significantly shorter, and the average density
   of these sub-networks was significantly greater. Moreover, the average
   in-degree ratio of the sub-networks was significantly higher,
   indicating that proteins composing the allele-specific sub-network
   jointly exhibit significant modularity in the integrated network. The
   in-degree ratio is defined as the ratio (R) of in-degree to out-degree
   of a sub-network, in which “in-degree” represents the number of its
   connections within the sub-network, and “out-degree” represents its
   connections outside sub-network [102][34]. In total, 162 blocks (89.5%
   of 180 blocks investigated) were demonstrated to perturb interactions
   located in a module of the integrated network (FDR<0.05). In other
   words, the alteration of allele state of a block potentially leads to
   changes inside a network module. In addition, modules in the integrated
   network are composite modules; thus elucidation of the complex
   relationships underlying multiple biological interaction types will
   further the understanding of complex cell processes. For example, we
   found that B20 on chromosome II regulates 318 interactions among 246
   genes, including 93 KPIs, 54 PDIs, 142 PPIs, and 26 EEIs ([103]Figure
   1B). Most interactions (92%) perturbed by B20 were connecting
   components gathered in a network module, moreover, various types of
   interactions were gathered in certain areas; the PPIs in the area
   marked by orange constitute a cellular component called nuclear lumen,
   and the EEIs in the area marked by green are derived from a pathway
   that controls DNA-directed RNA polymerase activity. These findings
   suggest that the information exchange and collaboration occurs among
   multi-layer molecular networks.

Table 2. Summary of topological properties of the allele-specific
sub-networks and their Z-scores.

                    All   ASN                     Random
                    Mean  Mean   <Z>    p-value   Mean
   Char.path length 2.857 2.0548 −12.34 5.57e–035 4.27
   Density          0.004 0.2226 21.51  1.18e–102 0.046
   In-degree ratio  N.A.  0.0175 18.42  8.43e–076 0.0065
   [104]Open in a new tab

   Topological properties of the yeast integrated network (All), the
   allele-specific sub-networks associated with blocks (ASN), and the
   random sub-networks (Random). The average Z-scores and the p-values for
   the observed properties of allele-specific sub-networks are also
   listed, respectively. N.A. not applicable.

Figure 2. A summary of the sub-graph properties of the sub-networks
associated with LD blocks.

   [105]Figure 2
   [106]Open in a new tab

   (A) The characteristic path length among genes in the same
   allele-specific sub-networks is significantly shorter compared to a
   randomization test. Moreover, the density and in-degree ratio of these
   sub-networks are significantly greater compared to a randomization
   test. Boxes of light color represent the distribution of the sub-graph
   properties (characteristic path length, density, in-degree ratio) of
   allele-specific sub-networks, and the black boxes correspond to random
   ones. (B) The panel shows the difference in CSR values between block
   pairs on the same chromosomes and on different chromosomes. P-values
   are calculated using the Wilcoxon rank-sum test.

Proximal Blocks Regulating Similar Network Components

   It has been observed previously that different chromosomal regions (LD
   blocks) regulate common network modules together. Here, we found a
   total of 423 pairs of blocks associated with overlapping
   allele-specific sub-networks. We introduced a score termed Component
   Share Ratio (CSR) to measure the degree of overlap between two network
   components regulated by two distinct blocks ([107]Materials and
   Methods). This value ranges from 0 to 1; the higher the score, the more
   overlap between two network components, with 1 indicating that two
   blocks regulate identical network components.

   We divided 423 pairs of blocks that exhibited a CSR value into two
   groups: 211 block pairs were located on the same chromosome and 212 on
   different chromosomes. Next, we calculated the significance of the
   differences between the two groups and found that block pairs on the
   same chromosome exhibited significantly higher CSR values
   (log-transformed) compared to block pairs on different chromosomes
   (P = 4.9165e-051, [108]Figure 2B). In this study, a higher CSR value
   reflects a higher gene composition and function similarity between two
   blocks’ perturbed network modules. It is observed that the
   regulator-pairs (LD block-pairs) on the same chromosome have much
   higher CSR values. Moreover, the network modules regulated by the
   blocks on the same chromosomes, especially on adjacent regions, are
   enriched in metabolic processes, signaling pathways and enzyme
   catalyzed processes. It is suggested that LD blocks on the same
   chromosomes, especially adjacent blocks tend to involve in the
   regulation of common or correlated functions. This phenomenon may
   reflect the coordinated regulation requirement of genes involved in
   related metabolic or regulatory processes [109][29], [110][30]. In
   addition, some network modules perturbed by block-pairs on separate
   chromosomes also have high CSR values; this may imply the universality
   and genetic complexity of the yeast genome in the regulation of
   molecular networks.

Biological Features of the Allele-specific Sub-networks Associated with
Blocks

   We evaluated the biological features of these allele-specific
   sub-networks. We classified 2286 genes from all of the sub-networks
   into 85 functionally related gene groups (or classes) (F1–F85) with
   enrichment score > = 1.3 (equivalent to non-log scale 0.05) using the
   Functional Classification Tool provided by DAVID [111][35], [112][36],
   with 833 genes not included in any group. Next, we performed functional
   enrichment analysis using hypergeometric distribution and detected 158
   enrichment relationships (FDR<0.05 P<0.0039) between 87 sub-networks
   and 48 functional groups. Using blocks to represent their associated
   sub-networks, these enrichment relationships are presented as a
   bipartite graph between 87 blocks (triangle) and 48 functional groups
   (circle) ([113]Figure S3).

   To create an intuitive display of the role that different chromosome
   segments (LD blocks) played in molecular network regulation or
   perturbation, we filtered out 57 relationships between 36 blocks and 16
   functional groups (57 red edges in [114]Figure S3). For each enrichment
   relationship between a block and a functional group, we extracted the
   perturbed interactions associated with the block and belonging to the
   enriched functional group. We re-constructed a network, shown in
   [115]Figure 3, by combining all of the interactions extracted; the
   functional modular regulation mode of genetic variations on the
   multi-layer integrated molecular network can be seen. Different
   chromosomal regions (blocks) that control or affect different parts or
   modules of the network exhibit intuitive biological features and
   trigger the modification of specific cellular functions. For instance,
   blocks on chromosome III (B29–B31) and X (B144) (yellow rectangles) are
   associated with a network module containing 14 enzymes (black box).
   This module is generally annotated as the “methionine biosynthetic
   process” (enrichment significance: P = 8.0E–27), and nine enzymes are
   located in the metabolism pathway-“sulfur metabolism: reduction and
   fixation”. Moreover, some functional groupings are found to be under
   the control of common blocks. For example, blocks on chromosome XIV
   (B219–B222) are associated with the cellular component “mitochondrial
   ribosome” and also regulate other functional classes, such as
   glycolysis, ATP binding, transcription, and the cell cortex. The
   mitochondrial ribosome is responsible for the biosynthesis of protein
   components crucial to the generation of ATP in the eukaryotic cell.
   Similar cases are the perturbation of blocks B237–B238 (simultaneously
   influencing oxidative phosphorylation and mitochondrial inner
   membrane), B111 (nuclear chromatin and nuclear division), and B29–B30
   (methionine biosynthetic process and branched chain family amino acid
   biosynthetic processes). Most of the joint perturbations by blocks
   reflect the coordination among different biological processes,
   molecular functions, and cellular components in a certain cellular
   function, and ATP (ATP binding) is one of the most generic sites
   (targeted by B179–B184, B228, etc.) in these processes ([116]Figure 3).

Figure 3. Biological features of the allele-specific sub-networks associated
with LD blocks.

   [117]Figure 3
   [118]Open in a new tab

   A reconstructed network created by combining all of the allele-specific
   perturbed interactions associated with the 36 blocks and belonging to
   their enriched 16 functional groups. The genes belonging to different
   functional groups are marked with the same colors as in [119]figure S3;
   also, chromosome blocks are marked using the same colors as their
   regulating functional classes. Functional annotation was conducted for
   each dyed gene module using DAVID based on GO and KEGG, and the most
   enriched functional item covering total genes (100%), along with its
   significance level (p-value), is presented near the dyed gene module.
   The gene module marked yellow is mainly associated with blocks on
   chromosome III (B29–B31) and X (B144); the lower right corner is a
   schematic representation of the metabolic pathway: sulfur metabolism,
   reduction and fixation, and nine enzymes in the module annotated to the
   pathway are marked with red rectangles.

Application in Understanding Small-molecule Drug Response in Yeast

   We utilized the model of genetic perturbing networks discussed above to
   gain a systematic understanding of small molecule perturbagen (SMP)
   response in genetically distinct yeast individuals. Firstly, we
   conducted linkage analysis between 324 phenotypes (segregant final
   yields of 100 SMPs at multiple time points and concentrations) and
   2,956 genetic markers previously genotyped in the segregants [120][6],
   [121][25]. We identified 1470 QTLs with an absolute value t statistic
   score >3.28 (FDR<0.05) using a method identical to eQTL mapping, and
   201 QTLs were in concordance with previous studies [122][25], covering
   nearly 92% of the original result (219 QTLs with a logarithm of the
   odds (lod) score Inline graphic 4). Because all time points and
   concentrations of each SMP were used as independent phenotypes, we
   combined QTLs associated with the same SMP but different concentrations
   and time points, and obtained QTLs for 92 SMPs. The position
   distribution for the QTLs along the genome is shown in [123]Figure 4A.
   Furthermore, for 91 SMPs, we obtained a sub-network perturbed by QTLs
   associated with each SMP response trait. Among these SMP response
   associated sub-networks, 16 were significantly enriched for known drug
   targets based on the entire chemical–protein interactions set obtained
   from the STITCH database [124][37], such as hydrogen peroxide (H2O2)
   (P = 5.70E–06) and menadione (P = 3.50E–07) ([125]Figure 4B and
   [126]Table 3). In total, 22 high-confidence (SCORE Inline graphic 700)
   yeast chemical–protein interactions were found in SMP associated
   networks, such as hydrogen peroxide, which targets the proteins (SOD2,
   CYB2); rapamycin, which targets the proteins (TOR2, LST8); and
   staurosporine, which targets the protein (YPL236C). This suggests that
   these sub-networks potentially reflect the molecular mechanisms of the
   drug action. Taking hydrogen peroxide (H2O2) as an example, the
   sub-network response to it is highly enriched for the GO biological
   processes ‘ergosterol metabolic process’, ‘lipid biosynthetic process’,
   and ‘oxidation reduction’ (P<5.7E–24, P<2.1E–19, and P<2.1E–7). The
   expression levels of the majority of the genes annotated to these GO
   terms were down-regulated in response to H2O2 in wild type ([127]Figure
   5, green triangles in left branch under allele state from RM). This
   observation is in concordance with the report by Pedroso and Folmer
   [128][38], [129][39] in which the adaptation to H2O2 was observed to
   repress the expression of genes (ERG3, ERG6, ERG7, ERG25, FEN1) coding
   for enzymes involved in both ergosterol biosynthesis and lipid
   metabolism in wild strains. The QTLs associated with H2O2 are located
   on chromosome XII and chromosome XIII. A LD block on chromosome XII
   contains the candidate gene HAP1 (YLR256W), which encodes a
   heme-responsive zinc-finger transcription factor whose expression
   varies according to oxygen levels in the cell, thus regulating oxygen
   dependent gene expression. As shown in [130]Figure 5, HAP1 serves as an
   important regulator and it regulates most of the proteins associated
   with H2O2 response. It has been previously reported that HAP1 with a
   Ty1 insertion induces expression changes of sterol biosynthesis genes
   [131][40]. Additionally, it is a known drug target of H2O2 and is
   included in the STITCH database. By utilizing our proposed model, we
   were able to identify molecular network components that respond to
   small-molecule drugs; this provided insight into how networks states
   changed from wild-type to mutant-type when responding to SMPs
   ([132]Figure 5). Characterizing molecular network states that underlie
   complex traits, such as drug response, potentially provides a
   comprehensive view, which in turn could potentially lead to the direct
   identification of genes or pathways underlying drug response processes
   and aid in the elucidation of the functional roles of these genes with
   respect to drug response. Thus, based on the modeling of genetic
   perturbing networks that respond to SMP, we investigated the potential
   biological application of drug function and drug target prediction, and
   investigated whether our results would uncover new and possibly
   non-intuitive relationships between biochemical pathways.

Figure 4. The QTLs distribution and drug targets enrichment of SMPs.

   [133]Figure 4
   [134]Open in a new tab

   (A) The QTLs distribution along the whole genome for 53 small-molecule
   drugs. QTLs associated with different SMPs are marked with various
   colors. (B) The sub-networks responding to the SMP response of hydrogen
   peroxide and menadione. Red nodes are known drug targets included in
   the STITCH database.

Table 3. Summary of enrichment analysis of known drug targets in the
sub-networks associated with 16 small-molecule drug responses.

   Compound response m   k    x  N    p           q (<0.05)
   hexylresorcinol   1   96   1  6063 0           0
   rapamycin         265 966  76 5799 2.59E–08    2.59E–07
   menadione         149 637  36 5915 3.50E–07    2.10E–06
   staurosporine     123 755  35 5941 4.21E–07    2.10E–06
   cycloheximide     188 365  28 5876 2.04E–06    8.15E–06
   hydrogenperoxide  314 205  26 5750 5.70E–06    1.90E–05
   anisomycin        33  450  10 6031 1.44E–05    4.11E–05
   ascomycin         10  1007 6  6054 0.000257504 0.000643761
   tamoxifen         33  147  4  6031 0.001073139 0.002384754
   doxorubicin       109 123  7  5955 0.001565578 0.003131156
   cerulenin         57  391  9  6007 0.003105336 0.005646066
   mastoparan        10  206  2  6054 0.003885136 0.005977133
   niguldipine       4   611  2  6060 0.003767185 0.005977133
   tomatine          9   238  2  6055 0.004206531 0.00600933
   gliotoxin         24  124  2  6040 0.01234541  0.016460546
   clotrimazole      26  70   1  6038 0.035718741 0.044648426
   [135]Open in a new tab

   x represents the number of known targets for the SMP drawn from the
   sub-network associated with the SMP; k represents the number of genes
   in the sub-network associated with the SMP; m represents the number of
   known drug targets for the SMP in the background gene set; n represents
   the number of genes in the background gene set excluded the known drug
   targets for the SMP; p-value is calculated as 1-phyper(x, m, n, k),
   where phyper is a function used in R.

Figure 5. The state of the sub-networks in response to hydrogen peroxide in
segregants inherited alleles from RM and BY.

   [136]Figure 5
   [137]Open in a new tab

   Solid edges indicate correlated expression between proteins in one
   allele state, whereas that organization is lost in another allele state
   (dotted line). Edge colors represent diverse interaction types between
   proteins, whereas node colors represent changes in gene expression
   between wild and mutant groups; green denotes down-regulation and red
   denotes up-regulation. Triangle nodes present genes annotated in
   ‘ergosterol metabolic process’, ‘lipid biosynthetic process’, and
   ‘oxidation reduction’. Nodes with dark edges are known drug targets of
   hydrogen peroxide based on Stitch2 database.

   In the SMP associated networks, we observed many perturbed interactions
   (ASCP and ASDP) between known drug targets and other uncharacterized
   genes in drug response, allowing for potential drug target predictions.
   Comparing between known drug target genes and other genes, we found
   that, in response to SMPs, ASCP or ASDP interactions tend to occur
   among drug target genes. Therefore, we predicted candidate drug targets
   based on their connecting degree with known drug targets in SMP
   associated networks. The candidate genes listed in [138]Table S2 were
   all connected to Inline graphic known drug targets. Among them, 59%
   (10/17) were found to exhibit resistance to the chemicals analysed in
   this study [139][41], [140][42], [141][43], [142][44], [143][45],
   [144][46], [145][47], [146][48], [147][49]. INO4, as an example, which
   is a transcription factor associated with phospholipids synthesis,
   shows differential interactions with nine genes targeted by eight
   drugs. Of these drugs, rapamycin, cycloheximide, lycorine, cerulenin,
   nocodazole, and resveratrol are classical cell proliferation inhibitors
   that are extensively utilized in cancer therapy and cell
   synchronization in molecular biology experiments [148][41], [149][42],
   [150][46], [151][47]. It has also been reported that gene INO4 may be
   affected by bleomycin and fenpropimorph, and the mutant type will
   produce resistance. These two drugs are both antibiotics and have been
   considered for use in cancer treatment. INO4 is seen as a potential
   drug target, particularly because it potentially plays a role in cell
   growth inhibition, and even cancer pathology. Regarding the other
   candidate genes, RPN4, STE12, FKH2 and CIN5 are also transcription
   factors that regulate important biological processes, such as RNA
   transcriptional elongation, MAPK signaling pathway, and proteasome
   genes expression. YCK3, PKP2, PKP1, and SKM1 are important kinases and
   exhibit serine/threonine kinase, casein kinase and mitochondrial
   protein kinase activity. FBA1 is fructose 1,6-biophosphate aldolase,
   which is required for glycolysis and gluconeogenesis. These genes
   exhibit certain functional features as drug targets, and published
   reports demonstrate that they exhibit reactions with certain known
   drugs and compounds; the natural or induced mutations will lead to
   mutant phenotypes. These annotations suggest that our genetic
   perturbing network results may be effective in drug target prediction.

   Moreover, we conducted pathway enrichment analysis for each SMP
   associated network using DAVID based on KEGG. We detected 207
   enrichment relationships (Beniamini<0.05) between 45 SMPs and 22 KEGG
   pathways ([152]Figure S4). It has been demonstrated that multiple
   pathways are involved in a SMP response. This suggests that cooperation
   among pathways may exist in the process of small molecular drug
   responses and that there is the possibility of discovering new and
   non-intuitive relationships between biochemical pathways. We found that
   124 pairs of pathways were involved in at least one common SMP
   response. Literature mining results based on PubMatrix [153][50] have
   provided strong independent evidence for the correlation among these
   pathways, 58% (72/124) of the pathway pairs co-occurred in at least one
   reference (detailed information for pathway pairs involved in >9 common
   SMPs is listed in [154]Table S3). Some pathway pairs are already
   recognized as correlated pathways based on KEGG. Ten SMPs
   (cycloheximide, parthenlide, menadione and rapamycin et al.) clustered
   in the center of SMP-pathway graph (red rectangle area in [155]Figure
   S4) targeted approximately the same pathways. These pathways all relate
   to protein biosynthesis, and are in concordance with the protein
   translation inhibition functional role of these SMPs. Some internal
   mechanisms of the SMP effects can be indicated in the SMP-pathway
   graph. For instance, glycolysis/gluconeogenesis is co-targeted by
   valinamycin, rapamycin, tunicamycin and fccp. However, the main effect
   of these four drugs is not regulation of the glucose distribution. A
   number of references suggest valinamycin plays a role in the glycolysis