Graphical abstract graphic file with name fx1.jpg [43]Open in a new tab Highlights * • We propose a gray box framework with Boolean networks and a black box optimizer * • GREY is a learned optimizer specialized in Boolean network optimization * • Gray box framework can predict drug responses and reveal underlying mechanisms Motivation The challenge of predicting cellular responses to perturbations amid the complex non-linearities of molecular interactions has spurred the development of machine learning-based models. However, interpreting these models in terms of molecular regulatory dynamics remains difficult. Conversely, logical network models like Boolean networks offer interpretability but struggle with large-scale networks due to high-dimensional search spaces. To overcome these hurdles, we introduce a scalable derivative-free optimizer, trained via meta-reinforcement learning, for Boolean network models. This approach enables prediction of anti-cancer drug responses in cancer cell lines while offering valuable insights into their molecular regulatory mechanisms. __________________________________________________________________ Kim et al. present a gray box framework that combines a white box logical model with a black box optimizer, addressing challenges in interpreting molecular regulatory dynamics. The gray box framework successfully predicts anti-cancer drug responses of cancer cells, while shedding light on the underlying molecular regulatory mechanisms. Introduction Controlling cell fate with molecular regulatory perturbations is an important challenge in various areas of biology. However, the control effect of a perturbation on specific molecules is often counterintuitive due to the complexity of the cellular network arising from non-linear molecular regulatory interactions.[44]^1 Molecular regulatory scale network structures, such as feedback loops and crosstalk, cause perturbation responses of the network to be highly dynamic processes. Therefore, predicting cellular response to a perturbation from static information alone, such as mutation profiles incorporated into the cellular network, obscures the underlying mechanism.[45]^2 In contrast, dynamic information regarding network state changes over time after a perturbation reveals the key regulatory mechanisms of the network response and aids the precision of response predictions. In this view, dynamic models of molecular regulation capable of simulating cellular responses to perturbations provide critical insight.[46]^3^,[47]^4^,[48]^5 Various machine learning (ML) models, including deep learning, have shown high accuracy in predicting non-linear dynamic processes of cellular networks, such as responses to molecular regulatory perturbations.[49]^6^,[50]^7^,[51]^8^,[52]^9^,[53]^10 However, ML-based models cannot provide biological rationale for their predictions since interpreting the internal workings of the model is difficult. To rectify this “black box” nature of ML models, some interpretable methods such as random forest and linear regression have been applied to predict cellular responses to perturbations,[54]^11^,[55]^12^,[56]^13^,[57]^14^,[58]^15^,[59]^16 which provide biological features highly correlated to the predictions. More recently, explainable artificial intelligence[60]^17^,[61]^18 and deep neural networks with architectures that mimic the relationships of biological entities, such as hierarchies of biological pathways,[62]^19 subsystems,[63]^20^,[64]^21 or molecular interactions,[65]^22 have been utilized. These methods suggest biological features related to the decision of the models and identify their relationships. However, due to the lack of dynamic information provided by these suggestions from the ML-based models, a dynamic model of molecular regulation is needed to simulate and predict perturbation responses. In systems biology, a cellular network is considered as a non-linear dynamical system of interacting cellular components, such as genes and transcription factors.[66]^23^,[67]^24 Thus, the network can be abstracted to a system model containing molecular components and their regulatory parameters represented by nodes and links. Continuous dynamic models, such as ordinary differential equation network models, can be constructed by integrating multiple sources of biological data and a priori knowledge, and therefore can successfully recapitulate complex dynamic processes such as cellular responses to perturbations.[68]^25^,[69]^26 Reconstructing and simulating a dynamic network model reproduces molecular regulatory mechanisms, which not only predicts the perturbation responses but also enables interpretation of molecular regulation dynamics such as perturbation signal flows related with those responses. This approach is called “white box” modeling due to the interpretable nature of the predicted responses driven by the system. Despite this, estimating every kinetic parameter of the continuous dynamic model is only feasible for small and well-investigated networks. As the network size increases, accurately estimating parameters becomes more difficult due to the increase in the dimensionality of the search space. Discrete logic-based models, such as Boolean network models, can partially overcome these problems by abstracting the network dynamics into simple but essential regulatory functions such as logical equations. These functions can be represented by network parameters or edge weights. The influence of neighboring nodes on a specific node in the network can be determined by considering the states of these neighboring nodes and the weights of the directed edges connecting them to the specific node. This influence is quantified by a weighted sum, calculated by multiplying the state value of each neighboring node by the corresponding weight of the edge connecting it to the specific node. If this weighted sum is greater than 0, the specific node becomes activated, influenced by the states of its neighboring nodes. Conversely, if the sum is less than 0, the node is deactivated, again due to the influence of the states of its neighboring nodes. Several studies have been made to estimate regulatory functions of these discrete models based on information theory[70]^27^,[71]^28^,[72]^29 or meta-heuristic optimization[73]^30^,[74]^31 due to the gradient-free characteristics of discrete model optimization, but such approaches still have scalability problems with high computational complexity. Moreover, the growing demand for discrete dynamic models of in silico drug discovery for precision medicine requires more efficient optimization algorithms, due to the complexity of estimating network parameters for each unique in silico model that predicts diverse drug response dynamics of patient samples. Therefore, a more sophisticated optimization algorithm is needed to efficiently estimate regulatory functions and scale up to larger networks. The learning to optimize (L2O)[75]^32 framework is a methodology that utilizes ML to train an optimization algorithm. Whereas traditional ML trains a particular model, L2O trains an optimizer that can be subsequently applied to optimize several related models. By using L2O to train a specialized optimizer for a specific type of optimization problem, it has the potential to outperform classical optimization algorithms in terms of computational complexity.[76]^33^,[77]^34 This is because the trained optimizer tries to learn the characteristics of the search space for the class of problems at hand, which enables it to efficiently explore the space of a particular one of those problems. In this study, we present a “gray box framework” that optimizes discrete dynamic models of molecular regulatory networks using an optimizer trained for discrete models with the L2O approach. This gray box framework combines the interpretability of a white box model with the scalability of a black box optimizer, allowing it to be both interpretable and able to scale up to larger networks ([78]Figure 1). We trained the black box optimizer, which we call GREY for gradient-free optimizer trained by meta-reinforcement learning to yield edge weights of Boolean networks, using synthetic optimization problems that imitate white box models consisting of Boolean networks. Figure 1. [79]Figure 1 [80]Open in a new tab A schematic representation of the gray box framework In the gray box framework, the white box model provides topological information of intracellular networks to the black box model, and the black box model optimizes the edge weights of intracellular networks from biological data. More specifically, the black box optimizer is trained on synthetic environments generated by random networks with perturbation fold change simulations. By blending these modeling frameworks, the gray box model can predict precise drug responses that are biologically interpretable. GREY can learn a derivative-free exploration strategy to find an optimal solution in the high-dimensional discontinuous search space of discrete dynamical models such as Boolean networks. To validate the optimization ability of GREY for real biological perturbation data, we applied it to drug response prediction. We constructed a drug response core network (DRCN) structure for each individual drug extracted from a pan-cancer core network (PCCN) integrated from public databases. The DRCN optimized by GREY can successfully predict the drug responses of cancer cell lines by utilizing the area under the curve (AUC) values derived from the viability curve data in the database Genomics of Drug Sensitivity in Cancer 2 (GDSC2): [81]https://www.cancerrxgene.org/.[82]^35 GREY was able to optimize the logical parameters of the DRCN more efficiently than a state-of-the-art derivative-free optimization algorithm. Further mechanism analysis showed that the optimized network reveals key network motifs related with perturbation response dynamics, which were utilized to infer drug response biomarkers that match basket trial marker genes reported from previous experiments. Results A gray box framework that optimizes a white box logical model using a black box optimizer We developed a gray box framework by combining a black box optimizer with a white box modeling approach that generates biologically interpretable predictions. The framework constructs a Boolean network to model drug response at the cellular level, and optimizes the edge weights of the network to match existing biological data, such as cellular responses to drug perturbation. The gray box framework can be broken into three parts ([83]Figure 2): (1) an extraction module for the DRCN structure, (2) a preprocessing module for the cell line profiles and drug response data, and (3) a gray box module that optimizes the logical relationships within the DRCN. Figure 2. [84]Figure 2 [85]Open in a new tab Overall procedure for constructing the gray box framework The extraction module (green dotted box) constructs the pan-cancer core network (PCCN) based on pan-cancer related networks gathered from cancer-related genes and network topology data and extracts drug response core networks (DRCNs). The preprocessing module (blue dotted box) gathers mutation profiles and drug response data of cell lines, and projects them onto the DRCN, generating DRCNs specific to each drug. The gray box module (black dotted box) optimizes edge weights using GREY based on the DRCN and preprocessed data. The ensemble of optimized DRCNs is utilized by the drug response dynamic network simulator to predict drug responses and generate mechanistic interpretations. The DRCN is constructed to distill existing knowledge of regulatory interactions between genes and proteins related to drug responses in cancer into a minimal network. It begins with a pan-cancer related network that isolates genes contained in cancer-related biological networks, frequently mutated genes in cancer, and target genes of anti-cancer drugs. The links from a structural network database of intracellular signaling among genes and proteins, Omnipath,[86]^36 that have both a direction (identified source and target nodes) and a sign (whether the source node activates or inhibits the target node) are chosen to connect the selected genes into the pan-cancer related network. The PCCN was then extracted by removing irrelevant inter-nodes in the pan-cancer related network that are not contained in a path between cancer-related genes. Next, we extracted the DRCN from the constructed PCCN. The DRCN is required to keep the network size as small as possible to minimize the computational requirements of the Boolean network simulations, while retaining all drug response-related genes and signaling pathways. To accomplish these requirements, a minimum number of genes were selected that were significantly related to the drug response of cancer cell lines in GDSC2. Then, a minimum number of links and intermediate nodes were selected by Edmonds’ algorithm, and utilized to connect the selected genes to the DRCN. Finally, phenotypic “proliferation” and “apoptosis” nodes, and their links from the CancerGeneNet,[87]^37 were added to the extracted DRCN as phenotypic markers. As a result, the constructed DRCNs incorporate both drug response-related genes and proteins as nodes, recapitulating the fundamental molecular regulatory dynamics of the selected drug responses (see [88]STAR Methods). The ability for the proposed gray box framework to predict drug responses was evaluated across three different DRCNs to ensure robustness across a variety of network structures. We constructed three different DRCNs of US Food and Drug Administration-approved targeted cancer drugs from Molecular Analysis for Therapy Choice (MATCH) basket trials run by the National Cancer Institute (NCI-MATCH).[89]^38 We selected afatinib (an inhibitor of EGFR and ERBB2), trametinib (an inhibitor of MAP2K1 and MAP2K2, also known as MEK1 and MEK2, respectively), and palbociclib (an inhibitor of CDK4 and CDK6). Each of these drugs inhibits well-known and unique cancer-related machinery, and together they engage a large fraction of the nodes covering the PCCN structure (see [90]STAR Methods). The afatinib DRCN comprised 57 nodes, whereas the trametinib and palbociclib DRCNs comprised 66 nodes and 192 nodes, respectively. While the constructed three different DRCNs that GREY optimizes recapitulate essential regulatory network structures related with drug responses, each drug response of each individual cancer cell line can be simulated by applying the mutational profiles of each cell line from the GDSC2 database to the DRCN of each of the three drugs. To apply mutations from a cell line to the DRCN of a given drug, the corresponding nodes in the DRCN whose mutations correspond to the mutation profiles of that cell line in GDSC2, are set to a specific invariant state. The invariant node state is set to 1 if the mutation is either activation, amplification, or gain of function and set to −1 if the mutation is suppression, deletion, or loss of function. Mutations that are in nodes outside of the DRCN are treated as a constant input signal that activates or inactivates any downstream nodes in the DRCN. The mutation profiles of each cell line were obtained from the Catalogue of Somatic Mutations in Cancer (COSMIC)[91]^39 database and the effect of each mutation was determined from their oncogenic or tumor suppressive q values (see [92]STAR Methods).[93]^40 In this way, we mapped numerous mutation profiles of cell lines to a small number of nodes in our DRCNs. For example, 716 cell lines were represented identically by the afatinib DRCN composed of 57 nodes. Having incorporated mutation profiles and drug responses, the edge weights of the resulting DRCNs are then ready to be optimized by GREY. Construction and optimization procedure of GREY We developed GREY to optimize edge weights of DRCNs for the mutation profiles and the drug response data from GDSC2. We formalized DRCNs using weighted sum Boolean logic (see [94]STAR Methods), and the edge weights of the DRCN were assigned to each link. In weighted sum Boolean logic, the activation of each node is determined by a weighted sum of values from its incoming connections, where the weights represent the regulatory influence of each link. Due to the non-continuous fitness space that results from the discrete states of the nodes, and the time-consuming attractor simulation processes of Boolean networks, we needed a derivative-free and sample-efficient optimizer. Thus, we trained GREY, which is an optimizer specialized for optimizing the edge weights of Boolean networks, through L2O, which employs multi-agent meta-reinforcement learning[95]^41^,[96]^42^,[97]^43 for its implementation. An agent in reinforcement learning is an entity that learns to make optimal decisions by interacting with an environment and receiving feedback in the form of rewards. In multi-agent reinforcement learning, multiple agents must learn how to cooperate with each other to achieve their collective objectives. In the current setting, the environment is an optimization problem (optimizing the edge weights of Boolean networks), and there is one agent for each edge weight. The agents explore the search space of the optimization problem (all possible edge weight values) to find good solutions that have high fitness. The edge weight values of each solution are utilized to simulate a Boolean network. Fitness is then defined as the Pearson’s correlation coefficient between the simulated fold change of the node values and the ground truth extracted from a database or biological experiment. The reward for agents is proportional to the difference between its current fitness and its previous best fitness, and the agents are trained to maximize reward. In traditional reinforcement learning, an agent learns an exploration strategy to maximize its reward within a specific environment. For our problem, each drug corresponds to a single DRCN structure. These different DRCNs represent different environments for the optimizer. To accommodate multiple environments in the form of multiple different DRCN structures, we adopted meta-reinforcement learning, which allows agents to learn adaptable exploration strategies by training across a variety of different environments. In a given environment, GREY receives the network structure of the environment with edge weights and reward as input, and produces new edge weights as output. Then, fitness is computed using these new edge weights, followed by a reward. The entire sequence from entering input to GREY to obtaining a reward constitutes a single step. The new edge weights and the reward obtained from the preceding step are used as input for the subsequent step. In the initial step, edge weights and reward used as input are initialized to random values and zero, respectively. An episode is a sequence of interactions between GREY and an environment, i.e., a sequence of steps, and GREY is trained from the cumulative experiences gained throughout the episode. GREY was trained using 100,000 synthetic environments with random networks composed of 20–40 nodes, random edge weights, and a generated ground truth dataset (see [98]STAR Methods; [99]Figures 3A and [100]S1). The random network was formalized by weighted sum Boolean logic, and the ground truth dataset comprises the fold change of the average activity for nodes between the first attractors attained from random initial states and the second attractors reached from the first attractor after fixing some nodes to 1 or –1. The resulting GREY trained on randomly generated synthetic environments can be applied to optimize the edge weights of biological networks with any given structure and corresponding data ([101]Figure 3B). Figure 3. [102]Figure 3 [103]Open in a new tab Overall architecture of GREY and training procedure (A) An illustration of the training procedure. [MATH: Aˆ :MATH] is the true edge weights of the environment, and [MATH: At :MATH] is the new edge weights at step [MATH: t :MATH] . [MATH: ft :MATH] is the fitness at step [MATH: t :MATH] , and [MATH: Rt :MATH] is the observed improvement (OI) of the fitness (PCC, Pearson’s correlation coefficient) as a reward at step [MATH: t :MATH] . [MATH: mt :MATH] includes hidden states. After generating an episode by repeating 80 steps, the trajectory is stored in the buffer. Neural network parameters of GREY ( [MATH: θG :MATH] ) are updated whenever 64 episodes are saved, and then the buffer is cleared. Supervised pre-training to predict the true edge weights is performed until the initial 8,000 episodes are used and, from then on, reinforcement learning is performed with the DPPO algorithm to maximize the cumulative discounted rewards. (B) Trained GREY can optimize the edge weights of the white box Boolean network model regardless of network structure. During each step of optimization, trained GREY iteratively receives network structure with preceding edge weights and its reward as input, and then attempts to find new edge weights with better fitness and returns those as output. (C) Inputs are processed using graph neural networks and node, edge, and global features are generated. (D) For each edge, a reorganized feature is formed by concatenating its edge feature, the node features of sender and receiver nodes, and the global feature. (E) LSTMs are assigned to each edge. While the parameters of these LSTMs are shared, they maintain independent hidden states. The architecture of GREY consists of three layers: the input, the submodule, and the main layer ([104]Figure S1A). The input and the submodule layers serve an auxiliary role of providing information to the main layer. The input layer is composed of several graph neural networks (GNNs),[105]^44 which are a type of neural network designed to process and learn from data structured as graphs by capturing relationships between nodes. GNNs in the input layer produce node, edge, and global (graph-level) features by aggregating information from neighboring nodes, edges, and the entire graph, and they represent underlying dependencies within the graph ([106]Figure 3C). Those features are then reorganized for each edge by concatenating the edge feature with the node features of sender and receiver nodes, as well as the global features ([107]Figure 3D). These reorganized features are then passed to the submodule and the main layers and processed by long short-term memory (LSTM) ([108]Figure 3E). In the submodule and the main layers, one LSTM is assigned to each edge, and LSTMs in the main layer each act as one agent. Each agent produces a new weight for its corresponding edge, and the new weights from all agents collectively determine fitness, since fitness depends on the dynamics of the network as a whole. The submodule layer has a context estimator, which helps the agent adapt quickly to the environment by estimating information about the environment, and a global state estimator, which helps the agent assigned to each edge to cooperate by estimating the behavior of the other agents. Application of GREY to synthetic and biological environments We assessed the optimization performance of GREY against unseen environments. In each application, GREY produces five new candidate edge weight sets for each step (i.e., generating five outputs at each step) and randomly replaces elements (e.g., 50%) from each candidate edge weight set by the edge weights with the best fitness from all previous steps. Then, we evaluated the fitness of each candidate edge weight set (i.e., five evaluations per step) and used the candidate with the best fitness as an input for the next step. First, we checked that GREY can perform optimization on an example synthetic environment not used in training. As more evaluations were performed, GREY found edge weights with higher fitness ([109]Figure 4A). Random agents, which are untrained GREY, showed no meaningful behavior. Thus, we confirmed that GREY can perform optimization properly. Figure 4. [110]Figure 4 [111]Open in a new tab Performance of GREY on synthetic environments and DRCNs For synthetic environments and DRCNs, we used 128 and 1,000 random initial states, respectively. Initial states are sampled for every step during optimization. During each step, 5 sets of edge weights are generated and evaluated, and the set of edge weights with the highest fitness becomes an input for the next step. (A) GREY optimization for a synthetic environment (50 nodes, 94 links) that was not used in training. The larger line plot shows the best fitness (y axis) across evaluations (x axis), while the inset plot shows fitness at each evaluation. Results of five random agents are shown for comparison. Shaded areas have 95% confidence intervals over three independent runs. (B and C) Comparison of the results of CMA-ES on various scales. Initial standard deviation ( [MATH: σ0 :MATH] ) of CMA-ES was chosen among (0.1, 0.3, 0.5, 1.0, 2.0) by selecting that which exhibited the highest average fitness. The number of evaluations per step (population size) of CMA-ES are ( [MATH: 4+3ln(n etworksize ) :MATH] ), which is the default setting, in (B) and 5 in (C). The results depict the average maximum fitness after 100 evaluations. Error bars are 95% confidence intervals over 20 random networks (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, Student’s t test, two-sided). (D) Optimization performance for DRCNs of afatinib, trametinib, and palbociclib. We randomly selected 300 cell lines for each drug. Shaded areas are 95% confidence intervals over three independent runs. (E) Median and median absolute deviations of F1 scores for 21 drugs in four methods. Next, we compared the optimization performance of GREY to covariance matrix adaptation evolution strategy (CMA-ES),[112]^45 which is considered the state-of-the-art derivative-free optimization method for challenging settings, such as non-convex and non-continuous problems. During each optimization step, evaluating fitness incurs the greatest computational cost for both GREY and CMA-ES, because of the time-consuming attractor simulation processes of Boolean networks. Thus, we compared the performance in terms of sample efficiency, aiming for higher fitness with fewer evaluations. GREY achieved statistically significant higher fitness for all scales of networks (20–200 links with 20 networks for each size interval) that were not used during training ([113]Figure 4B). Furthermore, GREY outperformed CMA-ES when the number of evaluations per iteration step was the same ([114]Figure 4C). We then applied GREY to optimizing the DRCNs of three drugs, afatinib, trametinib, and palbociclib, with respect to their drug response data from GDSC2 ([115]Figure 4D). When optimizing the edge weights of the DRCN for a certain drug, the fitness of the solution is the accuracy of DRCN drug-response predictions. More specifically, the DRCN predicts drug responses by calculating the difference between the average activity of phenotype nodes between the prior attractor and the new attractor. The prior attractor is obtained from random initial states with the given mutation profile for each cell line, and the new attractor is obtained from the prior attractor while maintaining drug target nodes at −1. The fitness is then defined as the Pearson’s correlation coefficient between the predicted drug responses from DRCN and the ground truth, as before. However, unlike with the simulated datasets, the ground truth drug response of each node is not known. Instead, only the activity change of the phenotype nodes are measured and compared with the ground truth to determine fitness. The ground truth of the drug response is the measured area under the viability curve (AUC) value collected from GDSC2 (see [116]STAR Methods). Compared with CMA-ES, GREY consistently achieved better fitness when optimizing DRCNs for all three drugs. For the trametinib DRCN, GREY reached a fitness of 0.3 after an average of 403 evaluations, which is approximately 3 times more sample-efficient than an average of 1,102 evaluations of CMA-ES. GREY successfully optimized the palbociclib DRCN consisting of 655 edge weights, despite the search space size being significantly larger than that of the synthetic environments used in the training of GREY. These results suggest that GREY is sample efficient and highly scalable. We then compared prediction performance of optimized DRCNs over 21 drugs that have various modes of action ([117]Table S1) with ML models SRMF[118]^46 and DeepCDR,[119]^10 and a logical prediction model LOBICO.[120]^47 Each of the 21 drugs represents a corresponding group of drugs targeting the specific pathway according to the labels in the GDSC2 database. For each drug, 300 cell lines were randomly selected to form a training set and the remainder formed a test set. All comparison models were trained using the same set of training data with default settings. Drug-response values were binarized according to the rule defined by Knijnenburg et al.[121]^47 Predicted drug-response values ([122]Figure S2) from DRCNs are binarized using an optimal threshold, determined based on the highest F1 scores observed across the training set cell lines. F1 score ([123]Equation 2) was calculated for each drug. Medians and median absolute deviations of all drugs are summarized in [124]Figure 4E. [MATH: Precision=TruepositiveTrue positive+Falsepositive,Recall=TruepositiveTrue positive+Falsenegative :MATH] (Equation 1) [MATH: F1score=2×Recall×Precision Recall+Precision :MATH] (Equation 2) The prediction performances from the optimized DRCNs using GREY (GREY opt.) are lower than that of the ML models such as SRMF or DeepCDR. However, GREY opt. showed better performance than the logical prediction model LOBICO, and it has further strengths in terms of interpretability. For example, a marker predicted for afatinib by LOBICO is an ERBB2 mutation,[125]^47 which is a direct target of afatinib, but it does not explain the underlying dynamics of how the mutation affects the phenotype. On the other hand, our model can suggest molecular regulatory dynamics by simulating the optimized DRCN with sensitive and resistant cell lines, as we describe in the following sections. Successful prediction of drug responses and molecular regulatory dynamics with the optimized DRCNs Interpretable drug responses depend on accurate inference of both the phenotypic response and the underlying molecular regulatory dynamics. We explored whether the DRCNs optimized through GREY can predict not only the phenotype-level drug responses but also the changes represented by fold changes in gene expression. To ensure the applicability of our gray box framework throughout the pan-cancer, we selected two cell lines, one sensitive and one resistant to afatinib treatments, respectively, for each of three frequently occurring cancer types in humans: breast, colon, and stomach cancer. Each of the six selected cell lines has been carefully chosen to serve as a representative model, demonstrating either a sensitive or resistant response to the selected drug, afatinib, based on the AUCs of the GDSC2 database ([126]Figure 5A). However, within each cancer type, the sensitive and resistant cell lines share identical genetic markers for drug response predictions, as suggested by the MATCH basket trials conducted by the National Cancer Institute (NCI-MATCH). Therefore, classifying drug response differences between sensitive and resistant cell lines requires interpretation of the underlying molecular dynamics, which remain obfuscated by classical marker-based methods (see [127]STAR Methods). Figure 5. [128]Figure 5 [129]Open in a new tab In-house experimental validation of simulated changes in gene expression (A) Relative afatinib responses of six selected cell lines from the GDSC2 drug responses database. Three afatinib-sensitive (in green) and three afatinib-resistant (in red) cell lines were well separated with respect to the area under the curve (AUC) of cell viability after various dosages of afatinib treatments given in the GDSC2 database. Two breast cancer cell lines used for further analysis are colored with higher saturations. (B) Simulated afatinib responses of six cell lines. Note that the first two graphs show a decrease in the average activity of the proliferation node (y axis < 0) and the third graph shows an increase in average activity of the proliferation node (y axis > 0) (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, Student’s t test, two-sided). (C) In-house afatinib response data of six cell lines, quantified by relative confluence. Representative microscopic images for one pair of cell lines are shown (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, Student’s t test, two-sided). (D) The pathway enrichment analysis results of DEGs before and after each drug treatment. The most enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway of DEGs in afatinib-treated BT-474 is “decreasing cell cycle,” The most enriched pathway in afatinib-treated BT-20 is “increasing DNA replication.” (E) Min-max normalized log2-fold change values of each gene of the in-house experimental data compared with min-max normalized average activity changes of each afatinib DRCN node computed by the drug treatment simulation. Parameter sets are optimized by a random agent and GREY, each repeated three times. Min-max normalized fold changes in the iterative simulation results of phenotype nodes and STAT1 (downstream node of EGFR) computed by the optimized parameter sets found by GREY and a random agent are compared with experimental data in histograms. With the selected six cell lines, we firstly confirmed that GREY-optimized DRCNs are capable of reproducing the expected phenotypes in response to drug perturbations. To do so, we compared our in-house experimental data of drug responses in six cell lines with the drug responses simulated by optimized DRCNs. We mapped the genetic profile of each cell line onto the basal node activities of the afatinib DRCN, and GREY optimized the logical parameters to reproduce the viability of the cell lines in a robust manner. Using the optimized afatinib response dynamic network simulator, we simulated the proliferation response of each cell line, calculated as the average change of the proliferation node from 34 repeats ([130]Figure 5B). Average activities of the proliferation nodes after afatinib treatment in two of the sensitive cell lines (BT-474 and SW48) were significantly decreased compared with those of two of the resistant cell lines (BT-20 and SW620). For the pair of stomach cancer cell lines, the average activity of the proliferation nodes did not decrease following afatinib treatment in the simulations. However, the average increase of activity in the sensitive stomach cancer cell line (NCI-N87) was smaller than that of the resistant stomach cancer cell line (MKN45). Thus, all of the predicted drug responses were consistent with the relative afatinib responses from the GDSC2 database. We assessed the effect of afatinib on proliferation of the cell lines in culture. The experimental results of afatinib responses were consistent with the response data from the GDSC2 database and with the simulated predictions of drug responses through DRCNs ([131]Figure 5C). Detailed in-house experimental data of drug treatment in cancer cells are presented in [132]Figure S3. After confirming that the optimized model reproduces the expected phenotypic responses, we compared the model predictions of each gene in response to the drug perturbations with both existing databases and in-house biological data. To determine plausible gene sets that distinguish between sensitive and resistant pairs of cell lines at the cellular level, we first performed a pathway enrichment analysis with the web-based gene set analysis toolkit (WebGestalt)[133]^48 based on the Kyoto Encyclopedia of Genes and Genomes pathways.[134]^49^,[135]^50^,[136]^51 We selected BT-474 and BT-20 cell lines, because experimentally these cells had the most significantly different proliferation responses to afatinib. We calculated the log2-fold changes of differentially expressed genes (DEGs) before and after drug treatments in both cell lines by RNA sequencing data and analyzed the pathways with altered expressions ([137]Figure 5D). The pathway with the highest enrichment score of DEGs in the afatinib-treated BT-474 was “decreasing cell cycle” (hsa04110), which is consistent with the response of this sensitive cell line. The pathway with the highest enrichment score of DEGs in the afatinib-treated BT-20 was “increasing DNA replication” (hsa03030), which is known to be related to drug resistance mechanisms,[138]^52^,[139]^53 but did not meet the threshold for false discovery rate, indicating that there were no significant changes in gene expression induced by afatinib in these cells. We then preprocessed the expression fold change data of BT-20 and BT-474 cells to enable comparison with simulated afatinib treatment in GREY-optimized DRCNs. Although the expression fold changes in both BT-20 and BT-474 cell lines display a sufficiently large number of preprocessed DEGs exceeding threshold values to ensure quality control, the two cell lines exhibit a different number of significant DEGs. BT-20 cells had fewer genes with a base mean count over 100 (5,190 genes) than the BT-474 cells (15,583 genes), of which only 21 of 55 nodes were matched with the afatinib GREY-optimized DRCN (excluding the two phenotypic nodes). In the case of BT-474, 45 of 55 nodes in the afatinib DRCN were matched with the measured genes with a base mean count over 100. Both in-house expression data and DRCN simulation data are calculated in the form of log2-fold changes and min-max normalized. The log2-fold change values of each gene of the in-house experimental data and the average activity changes of each afatinib DRCN node computed by the drug treatment simulation were min-max normalized and compared. As the afatinib DRCN is an essential regulatory network of intricately interconnected genes and proteins that collectively influence drug response, it is imperative to validate the correlation between changes in gene expression levels and the resulting shifts in the activities of proteins within this network.[140]^54 Consequently, the activity changes of proteins represented by nodes in the afatinib DRCN can be compared and validated with the RNA expression fold changes measured by our in-house experiments. The insights into drug response differences between sensitive and resistance cell lines based on the in-house data were then compared with the predictions of GREY-optimized DRCNs. The optimized afatinib DCRN was simulated, and the average change in node activity due to drug perturbation was compared with the gene expression fold changes of the experimental data. The computed average activity change of each node was an average value of three simulations from three parameter sets optimized by GREY or by a random agent, which had the same structure as GREY but with randomly initialized weights. To check the consistency of fold changes in each node computed by the optimized parameter sets from GREY, we compared the results of 30 iterative afatinib treatment simulations, each with ten repetitions and different initial states for each of the three parameter sets, against the corresponding 30 simulation results computed by three parameter sets optimized by the random agent. Genes with moderate fold changes in experimental data and nodes with inconsistent fold changes, which depended on the parameter sets or the initial state sets, were filtered according to log2-fold changes and fold change variance over 30 simulations, respectively. As a result, the fold change tendencies of 14 nodes simulated by the parameters of GREY-optimized afatinib DRCN and mutational profiles of BT-474 projected on the network were in accord with afatinib treatment experiments. Whereas in the random agent-optimized parameters, only 5 of 42 results matched the experiments. In BT-20, 7 of the simulated node fold changes optimized by GREY showed the same tendency as the experimental data, while only 1 node fold change optimized by the random agent was well in accord with the experiments ([141]Figure 5E). The correlation between the simulated node fold changes optimized by GREY and the experimental data is further validated with a statistical test and compared with the data of randomly shuffled node labels ([142]Figure S4). The major difference between the networks with parameters optimized by GREY and the random agent was that the former depends both on network topology and logic, whereas the latter depends on only network topology. Although random agent-optimized parameters were able to correctly predict some of the node fold changes, the simulated network converged into a single attractor. Note that the iterative fold change results of phenotypic nodes and STAT1 (downstream node of EGFR) simulated by GREY-optimized afatinib DRCNs were more spread out, indicating various attractors, while those simulated by random agent-optimized afatinib DRCNs were more concentrated in a few rescaled fold change intervals, indicating a single attractor (histograms in [143]Figure 5E). The node states of the converged attractor with random agent-optimized afatinib DRCNs were predictable by topologically estimating the effect of the afatinib perturbation with the network diffusion algorithm suggested by Hofree et al.[144]^55 The result infers that the dynamics of the network are simple, lacking any complex feedback or crosstalk dynamics, and hence inadequate for reproducing the behavior of a biological network. Moreover, the single attractor converged to by each of the optimized afatinib DRCNs using the random agent differed depending on the parameter sets. This implied that with additional optimized parameter sets, the simulated network dynamics would not converge into a single distribution. As a result, increasing the number of parameters sets cannot guarantee prediction performance improvement. In stark contrast, the networks with the parameters optimized by GREY converged into a higher number of attractors, corresponding to varied regulatory dynamics, which exhibited similar phenotypic predictions. These results suggest that the networks optimized by GREY comprise not only topological information, but also dynamic properties including feedbacks and crosstalk, which are known to be important features to understand in order to overcome drug resistance mechanisms in cancer therapy.[145]^56 Moreover, even though average responses of GREY and the random agent-optimized DRCNs are similar, GREY-optimized networks can be said to more properly recapitulate the biological nature of perturbation responses of regulatory networks. Biology regulatory networks exhibit stochasticity in their gene expressions, molecular activities, and probability of chemical reactions, which tend to produce a range of distinct attractors.[146]^57^,[147]^58^,[148]^59^,[149]^60 The multiple attractors exhibited by GREY-optimized DRCNs are hence more realistic. Moreover, the distributions of attractors from the network optimized by different parameter sets from GREY were more highly clustered than those of the random agents. This illustrates that the prediction performance of the drug-response networks optimized by GREY will not be derailed by a larger number of parameter sets. Thus, the networks optimized by GREY are more suitable than the networks optimized by the random agent for the purpose of cancer drug-response predictions. Interpreting regulatory mechanisms of drug resistance with the gray box framework After establishing the ability of GREY to infer regulatory dynamics, we explored if those cellular-level insights could be interpreted to reveal underlying mechanisms of drug resistance. We analyzed the simulated responses of BT-20 and BT-474 cells to discover network motifs behind the differences in afatinib response. We calculated the difference between the average activity fold change of afatinib-treated BT-20 and that of BT-474 by subtracting fold change values of the BT-474 afatinib core network from that of the BT-20 afatinib core network ([150]Figure 6A). Note that some nodes had significantly different fold change values. With an assumption that the nodes with significant difference in fold change between the sensitive and the resistant cell lines cause the difference in afatinib responses, we selected the five nodes with the largest positive difference (PTPN6, AR, PDPK1, IGF1R, and SGK1, in decreasing order of absolute values) and five with the largest negative difference (MAP3K3, LCK, NFKBIA, PPARA, and STK3, in decreasing order of absolute values) in fold changes of simulated node activities. Figure 6. [151]Figure 6 [152]Open in a new tab Mechanistic analysis of afatinib treatment to identify the most influential nodes for predicting and controlling drug resistance (A) Afatinib DRCN. The color of each node represents the predicted relative activity changes of resistant and sensitive cell line models, and the color of the arrows denotes whether the signal flow increases (red) or decreases (blue) after afatinib treatment. (B) Activity difference sub-network for afatinib response from BT-20 and BT-474 cells. This sub-network was extracted from the afatinib response network in (A). (C) Identification of context-dependent flows of afatinib treatment effects toward the proliferation node. The propagated mutational profile of BT-20 on the activity difference sub-network contained two nodes with activating mutations (EGFR and PTPN6), and three inputs from mutations outside this sub-network that activated (NGKB1, SYK, and AR), and one that is inactivated (PPARA). The propagated mutational profile of BT-474 on the activity difference sub-network had three nodes with activating mutations (EGFR, PPARA, and NFKB1), two nodes activated by input from mutations outside this sub-network (STAT1 and ERBB2), and one node inactivated by external mutation (CDK1). Signal flow and node activity changes in activity difference sub-network of resistant cells and sensitive cells upon perturbation of EGFR and ERBB2 (afatinib treatment) are illustrated. (D) The balance between mutational effects on two essential negative feedback loops is primarily responsible for determining the resistance to afatinib treatment. To explore the interactions between these nodes in the context of drug resistance, we extracted an activity difference sub-network from the afatinib DRCN. This activity difference sub-network was formed by connecting the selected ten nodes with links and inter-nodes by PathLinker,[153]^61^,[154]^62 which explored up to 50 paths from the drug target nodes (EGFR, ERBB2) to the phenotypic node (proliferation) through the selected 10 nodes ([155]Figure 6B). We found that the ERBB2 cascade and AR-AKT1 negative feedback are activated in BT-20, whereas the EGFR cascade and NFKBIA-NFKB1 negative feedback are activated in BT-474 after afatinib treatment. To interpret the mechanism underlying the different afatinib responses with respect to EGFR and ERBB2 inhibition signals using the activity difference sub-network, mutational effects and afatinib perturbation signal flows were analyzed ([156]Figure 6C). Signal flows from a node are defined by adding the incoming signal flow to the product of the node activity change with the sign (positive or negative) of the outgoing links.[157]^63 The effect of a mutation on a node inside the activity difference sub-network was treated as fixing that node to a state, such that incoming links were deleted and a constant signal flows out from that node. The color of the arrows denotes whether the signal flow increases (red) or decreases (blue) after afatinib treatment. The effect of a mutation from outside the activity difference sub-network was incorporated by propagating the change in signal flow caused by a mutation through exclusive simple paths computed by the maximum matching algorithm. The propagated mutational profiles of two cell lines were incorporated into the sub-network as the indirect influence of mutation on the sub-network nodes (see [158]STAR Methods). The activity-difference sub-network reveals how drug responses differently affect each cell line. In BT-20, the decreasing signal flows from the afatinib-inhibited EGFR node through the STAT1-PPARA-NFKBIA cascade and the ESR1-AR cascade were offset by the increasing signal flows into the NFKB1 node and the AR node, respectively. The NFKB1 node and AR node were indirectly influenced by mutations in nodes outside the afatinib sub-network, but within the PCCN. The increasing signal from the afatinib-inhibited ERBB2 node flows through the CDK1-AR cascade and is further enhanced by the increasing signal flow at the indirectly activated node, AR, increasing the proliferation node activity. The increasing signal flowing into SYK from the mutations outside the afatinib DRCN also inactivated LCK, which dampened the ability for afatinib-inhibited EGFR to decrease the flow into STAT1 and NFKBIA. The net outcome of the inhibitory effect of afatinib on EGFR and ERBB2 is predicted to be an increase in proliferation node activity in BT-20. In BT-474, the decreasing signal flow from the afatinib-inhibited EGFR node into the STAT1 node is blocked by the PPARA mutation, but the afatinib-inhibited EGFR decreases AR node activity by reducing the signal flow through the ESR1-AR cascade. The increasing signal from the afatinib-inhibited ERBB2 node is offset by decreasing the signal flow from the CDK1 node, which is indirectly inhibited by mutations. As a result, the activity of the AR node is decreased after afatinib treatment, resulting in decreased activity of the proliferation node. Together, the signal flow analyses of the activity difference sub-networks predicted a reason why BT-20 and BT-474 have different responses to afatinib treatment: the signal flows in the two cell lines differently affect the NFKB1-NFKBIA and the AR-AKT1 negative feedback regulations ([159]Figure 6D). The underlying reason for the resistance to afatinib treatment for BT-20 can be summarized as the following: the mutational profile activates AR-AKT negative feedback regulation and inactivates NFKB1-NFKBIA negative feedback regulation. The genetic marker for the NCI-MATCH afatinib basket trial was mutated or amplified EGFR.[160]^64 Both BT-20 and BT-474 cells have gain-of-function mutations in EGFR. Thus, using only that single genetic marker, one would expect the same response to afatinib for these two cell lines. On the other hand, our framework not only predicted the different responses of these two cell lines but also revealed their underlying mechanisms. The collection of genes involved in the two relevant feedbacks, NFKB1-NFKBIA and AR-AKT1, may constitute a new type of dynamic drug-response biomarker. A biomarker composed of a group of genes and their holistic dynamics, which respond differentially to drug treatment, may predict drug responsiveness more effectively than the classical mutation biomarkers of afatinib. Owing to the interpretable and simulatable afatinib DRCN, we can properly predict the drug response, distinguishing resistant cell lines that need additional treatment. Moreover, by interpreting the resistance mechanisms, such as feedback and crosstalk, we can identify efficient combinatorial drug targets to compensate for those resistances. Additional validations for suggesting dynamic drug-response biomarkers are in [161]Figure S5. Detailed signal flow analyses of trametinib-treated SW620 and SW48 cell lines are discussed in [162]Figure S6. Discussion Predicting cellular response to a perturbation is of great interest in systems biology and ML.[163]^4^,[164]^9^,[165]^20^,[166]^46 Systems biology uses a white box approach based on mechanistic modeling of intracellular molecular regulatory networks, and can provide interpretable insights into regulatory dynamics, but is difficult to scale up to larger networks. ML uses a black box approach by modeling input-output relationships, and can offer high prediction performance, but it is difficult to interpret its underlying rationale. There are several works[167]^14^,[168]^15^,[169]^20^,[170]^22^,[171]^47^,[172]^65 that have attempted to merge these approaches to make interpretable predictions, but they fall short of achieving detailed insight into molecular regulatory dynamics about cellular responses after a perturbation. Therefore, to provide interpretations of molecular regulation dynamics, we constructed a gray box framework by combining the white box and the black box L2O approaches to model molecular regulatory dynamics. We devised a black box optimizer, GREY, to tackle the high computational complexity of parameter optimization for DRCNs. GREY learned efficient derivative-free search rules by identifying the relationship between the structure of a network and its dynamical response. To accomplish this, we adopted a GNN[173]^44 to represent attributes of the network components (i.e., nodes and edges), and used meta-reinforcement learning to learn different dynamic responses to perturbations for each network. Remarkably, GREY optimized the edge weights of the DRCN for trametinib three times more efficiently than the state-of-the-art derivative-free optimization method, CMA-ES, with respect to sample efficiency. Moreover, GREY optimized the DRCN for palbociclib to a fitness value of 0.5, which is difficult to achieve by inferring the regulatory logic of a network of such scale (∼600 edges) using any other optimization method. We applied GREY to optimize the DRCNs of 21 drugs. Constructing a DRCN for each drug involves numerous steps that influence the number of nodes and edges, and the optimal settings may vary for each drug. However, we applied the same settings to construct all DRCNs to allow for a systematic comparison of the prediction performance of optimized DRCNs. Consequently, certain drugs inevitably have a suboptimal DRCN, influencing our results. Investigating the impact of drug-specific settings on the prediction performance of DRCNs would be an intriguing avenue for further research. To demonstrate the efficacy of the gray box framework on recapitulating cellular dynamics, we applied it to the drug-response predictions in cancer networks. Our results have shown the broad applicability of the framework with a wide range of cancer types and drug targets. As the mechanistic interpretation of DRCNs is not detached from drug-response prediction performance, a question whether the molecular mechanism interpretation of afatinib drug responses is perfectly correct or cannot arise. However, since the Boolean networks with optimized weights not only fully reproduce the quantitative responses for various cell lines of several drugs, but also reproduce the activity changes after those drug treatments, we can assume that the regulatory logic recapitulated by the weighted Boolean logic of the optimized DRCNs accurately models the underlying molecular regulatory mechanism, at least for the scope of drug response. Moreover, the optimized DRCNs not only correctly predicted drug responses of BT-20 and BT-474 cell lines, but also suggested a biomarker based on regulatory dynamics that was directly related with drug-response mechanisms and thus more accurately predict the resistance. This result is well in accord with the research of Choi et al.,[174]^4 where classifying cancer cell lines with similar changes in expression levels induced by drug treatment exhibits more similar drug responses than classifying cancer cell lines with similar mutation profiles. Therefore, we believe that the interpretability of the model can provide insight into biological mechanisms, such as drug resistance, which can subsequently help to overcome these hurdles in real biological applications. Cancer is a patient-specific as well as tissue-specific disease, but there are recurrent regulatory architectures common to all cancer types.[175]^66^,[176]^67 Based on these recurrent architectures, we hypothesized the existence of a PCCN and developed a method for reconstructing such a core regulatory network from large datasets. Feltes et al.[177]^68 also tried to curate core regulatory networks relevant to all cancer types.[178]^69^,[179]^70 However, these approaches might contain indirect interactions as curated links depend on the data. Due to the risk that few incorrect links in the network structure can vastly change the dynamics of the model, we instead focused on using a well-known network structure database such as Omnipath and cross-checked every link in the PCCN with other network databases: Pathway Commons, DoRothEA, and Cui’s human signaling network.[180]^71^,[181]^72^,[182]^73 Each drug has a different mode of action at the cellular level and thereby affects a different part of the intracellular network. Thus, we extracted the core structure responsible for each drug and optimized the parameters separately. The gray box framework can be applicable to readily accessible perturbation data, while enabling deep interpretation of molecular regulatory dynamics. Single-cell sequencing of drug-induced cellular state transitions provides detailed dynamic information of drug responses. Using these data, a number of studies predict and interpret the response dynamics of molecular regulatory perturbations.[183]^74^,[184]^75 However, aside from the high financial costs, producing such data is a difficult and time-consuming task that requires experimentally well-prepared samples of cells undergoing specific dynamics and properly selected pre-processing and analyzing tools.[185]^76 Under these limitations, the most commonly accessible data of molecular regulatory perturbations are drug responses of cancer cell lines in terms of cell viabilities. It is notable that discrete DRCNs optimized by GREY have successfully interpreted the drug-resistance mechanisms of afatinib in terms of molecular regulatory dynamics without explicit molecular regulatory experimental data, but only with cell viability. Owing to the vastness of public drug-response viability data in various diseases caused by malfunctioning cell networks, the suggested framework can also be applicable to controlling fates of pluripotent stem cells[186]^77^,[187]^78 or macrophage differentiation.[188]^79 Moreover, prediction performance and interpretation of molecular regulatory mechanisms of the developed gray box framework can be expected to further improve if data on molecular regulatory dynamics are also explicitly incorporated. Currently, GREY is trained with scale-free random networks for the specific task of optimizing biological networks, although many biological networks are not scale free.[189]^80 Thus, there is room for improving the optimization performance of GREY by training it with more biologically plausible random networks. Furthermore, in the future, GREY could be trained with other types of random networks with different structural characteristics to optimize various discrete dynamic models. Limitations of the study The DRCN formulates an "essential structure" connecting biological molecules tied to drug responses in the PCCN, aiming to minimize computational burdens of the optimizer by limiting number of links extracted between molecules in the PCCN. While this method efficiently identifies core network structures among molecules associated with drug responses, it overlooks long-distance bypass regulations involving multiple connections. Such pathways, despite potentially lower chances of affecting drug response dynamics, are critical but might be missed by our methodology. Dynamics influenced by these bypasses tend to be embedded through optimized parameters in shorter pathways among the selected molecules, omitting a few significant long-distance interactions. We utilized GREY to optimize DRCNs for 21 drugs. However, the prediction performance for certain drugs did not reach significance. Although constructing each DRCN involves varying optimal settings, we employed uniform settings for systematic performance comparison, potentially leading to suboptimal DRCNs for some drugs. In addition, GREY does not incorporate regularization in its optimization procedure for DRCNs. Incorporating regularization in future studies could enhance performance. STAR★Methods Key resources table REAGENT or RESOURCE SOURCE IDENTIFIER Chemicals, peptides, and recombinant proteins __________________________________________________________________ Trametinib APExBIO Cat#A3018 Afatinib Selleckchem Cat#S1011 Palbociclib Selleckchem Cat#S4482 __________________________________________________________________ Critical commercial assays __________________________________________________________________ RNA-spin kit INTRON Cat#17211 QuantSeq 3′ mRNA-Seq Library Prep Kit Lexogen Cat#015.24 __________________________________________________________________ Deposited data __________________________________________________________________ Gene expression data This paper [190]GSE184731 Harmonizome Rouillard et al.[191]^81 SCR_016176 Kyoto Encyclopedia of Genes and Genomes Kanehisa et al.[192]^49 SCR_012773 Genomics of Drug Sensitivity in Cancer 2 Iorio et al.[193]^35 SCR_011956 Catalog Of Somatic Mutations In Cancer Tate et al.[194]^39 SCR_002260 Pathway Commons Cerami et al.[195]^72 SCR_002103 __________________________________________________________________ Experimental models: Cell lines __________________________________________________________________ SW48 ATCC CCL-231 SW620 ATCC CCL-227 BT20 Korean Cell Line Bank(KCLB) KCLB #: 60061 NCI-N87 Korean Cell Line Bank(KCLB) KCLB #: 60113 MKN45 Korean Cell Line Bank(KCLB) KCLB #: 80103 BT474 Korean Cell Line Bank(KCLB) KCLB #: 60062 __________________________________________________________________ Software and algorithms __________________________________________________________________ GRAY This paper [196]https://github.com/hanyh0807/GREY [197]https://doi.org/10.5281/zenodo.10989715 Tensorflow version 1.15.5 Google [198]https://www.tensorflow.org/ Graph_nets python package Peter et al.[199]^39^,[200]^44 [201]https://github.com/deepmind/graph_nets/tree/master SRMF Wang et al.[202]^46 [203]https://github.com/linwang1982/SRMF DeepCDR Liu et al.[204]^10 [205]https://github.com/kimmo1019/DeepCDR LOBICO Knijnenburg et al.[206]^47 [207]https://github.com/TianxiaoNYU/LOBICO-Matlab MATLAB R2019b MathWorks [208]https://kr.mathworks.com/ pycma v3.1.0 Nikolaus et al.[209]^45 [210]https://github.com/CMA-ES/pycma STAR Dobin et al.[211]^82 [212]https://github.com/alexdobin/STAR DESeq2 Love et al.[213]^83 [214]http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html Signal flow analysis Lee et al.[215]^63 [216]https://github.com/dwgoon/sfa GDSCTools Cokelaer et al.[217]^84 [218]http://github.com/CancerRxGene/gdsctools Hierarchical HotNet Reyna et al.[219]^85 [220]https://github.com/raphael-group/hierarchical-hotnet The PathLinker app Gil et al.[221]^62 [222]https://apps.cytoscape.org/apps/pathlinker [223]Open in a new tab Resource availability Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Kwang-Hyun Cho (ckh@kaist.ac.kr) Materials availability This study did not generate new unique reagents. Data and code availability * • Gene expression data have been deposited at GEO and are publicly available as of the date of publication. Accession number is listed in the [224]key resources table. * • All original code has been deposited at GitHub and is publicly available as of the date of publication. The GitHub repository and DOI are listed in the [225]key resource table. * • Any additional information required to reanalyze the data reported in this paper is available from the [226]lead contact upon request. Experimental model and study participant details Setup for in-house experimental validations The relevant cell cultures, SW48 (Colon), SW620 (Colon), BT20 (Breast), NCI-N87 (Breast), MKN45 (Stomach), and BT474 (Stomach), were obtained from Korean Cell Line Bank and the American Type Culture Collection (ATCC). All cell lines were cultured in Dulbecco’s modified Eagle’s medium (WelGENE Inc., Gyeongsan, Republic of Korea) with 10% fetal bovine serum (FBS, WelGENE Inc.) and antibiotics (100 U/ml of penicillin, 100 μg/mL streptomycin, and 0.25 μg/mL of Fungizone) (Life Technologies Corp., Carlsbad, CA) at 37°C in a humidified atmosphere containing 5% CO2. To obtain reagents, trametinib (an inhibitor of MAP2K1 and MAP2K2, also known as MEK1 and MEK2, respectively) was purchased from APExBIO (Boston, MA, USA). afatinib (an inhibitor of EGFR and ERBB2) and palbociclib (an inhibitor of CDK4 and CDK6) were obtained from Selleckchem (Houston, TX, USA). To conduct cell growth assays, cells were seeded into a 96-well plate at a density of 3–7×103 cells/well in growth medium, incubated for 24 h and then treated with the indicated drugs. Following incubation of the plates for 72–120 h, relative cell viability was measured. After seeding, cells were imaged using IncuCyte ZOOM. To assess cell growth, the average area of each cell was determined at each time point using the IncuCyte ZOOM analysis software. Images were captured at 3 h intervals from three separate regions per well with a 20× objective. To conduct RNA isolation and bulk RNA-seq analysis, total RNA was extracted using an RNA-spin kit (INTRON), and RNA quality was assessed by Agilent 2100 bioanalyzer using the RNA 6000 Nano Chip (Agilent Technologies). RNA quantification was performed using ND-2000 Spectrophotometer (Thermo, Inc.), and 500 ng of RNA per sample were used for library construction. The library construction was performed using the QuantSeq 3′ mRNA-Seq Library Prep Kit (Lexogen, Inc.), and the library was single-end sequenced on an Illumina NextSeq 500 (Illumina, Inc.). The reads were aligned using STAR,[227]^82 and the read counts were estimated by featureCounts. Differentially expressed gene (DEG) analysis was performed by using significance analysis of microarrays method implemented in the DESeq2 package. We calculated the differentially expressed genes in log2-fold changes of expression levels measured in both sensitive and resistant cell lines for each type of cancer (colon, breast and stomach) using the DESeq2[228]^83 R package. The cutoff values were set with the ‘adjusted p-value’ under 0.05 and the base mean over 100. Method details Constructing DRCNs To efficiently elucidate the construction of the Pan-Cancer Core Network (PCCN) and the Drug Response Core Network (DRCN), we integrated a comprehensive list of cancer-related genes from multiple high-throughput data sources. These included oncogenic genes and compound targets from ten major signaling pathways identified in The Cancer Genome Atlas[229]^66 (TCGA), such as RTK/RAS, Nrf2, PI3K, TGFβ, Wnt, Myc, p53, Cell cycle, Notch and Hippo. Additionally, we incorporated frequently amplified or deleted mutations,[230]^71 the NCI-MATCH basket trial markers,[231]^38 notably co-occurring or mutually exclusive mutations,[232]^86 and genes from ‘Pathways in cancer’ (hsa05200) of the KEGG pathway.[233]^49^,[234]^50^,[235]^51 Through these integrations, we identified a total of 2,822 pan-cancer genes. Using the PathLinker algorithm,[236]^61^,[237]^62 these pan-cancer genes were each used as a source node to connect to all other genes, employing interconnecting pathways derived from the Omnipath network structure databse.[238]^36 The connected pathways, comprising 3,615 nodes after exclusion of paths with overlapping sources and targets, represented comprehensive interconnections among the pan-cancer related genes. To refine this network and extract the core structure, we pruned leaf nodes and redundant inter-nodes, smoothing the network using a network propagation algorithm,[239]^87 and eventually extracted a core structure with 2,574 nodes and 5,992 links. For the construction of the DRCN, we applied hierarchical HotNet[240]^85 and Edmonds’ algorithm[241]^88 to extract a core network that captures the essential genes and pathways involved in drug responses. This involved gathering genomic features such as mutations, copy number alterations, and expression levels of genes from various databases including the Cell Model Passport,[242]^89 OncoKB, and COSMIC. These features were used to compute z-scores in analysis of variance (ANOVA) of the GDSCTools[243]^84 package, which then served as vertices in the hierarchical HotNet algorithm, identifying the most statistically significant clusters of genes. We connected these clusters using Edmonds’ algorithm with link weights derived from inverse values of joint similarity scores calculated through HotNet. This process culminated in a DRCN that minimized network size while retaining critical drug response-related genes and pathways, further enriched with phenotypic markers such as ’proliferation’ and ‘apoptosis’ nodes from the CancerGeneNet.[244]^37 To extract a core structure from the pan-cancer related network, we took the following four steps: first, the nodes that only have incoming links or outgoing links were pruned. Second, pan-cancer related genes were smoothed across the network by the network propagation algorithm shown in the [245]Equation 3 suggested by Vanunu et al.[246]^90 [MATH: Pt+1=(1r)WPt+rP0 :MATH] (Equation 3) Where [MATH: Pt :MATH] is a probability vector at time step [MATH: t :MATH] in which the [MATH: i :MATH] -th element holds the probability of a random walker that will be placed at that time step. The initial probability vector, [MATH: P0 :MATH] , is a vector with a probability of 1 for the cancer related genes and 0 for the other genes, which were normalized by a number of pan-cancer related genes, such that the vector sums to 1. [MATH: W :MATH] is a column-normalized adjacency matrix of the pan-cancer related network. [MATH: r :MATH] is a probability of restarting from the pan-cancer related genes at every time step, which was set as 0.3 in our extraction. Third, an iterative random walker with a limited number of transitions walked around the network. The number of iterations was set to 10,000 and the maximum number of transitions was set to 30. Fourth, the pan-cancer core network was constructed by selecting a sub-network that was visited by a larger number of walkers than a specific cutoff value, which was set to 12. As a result, the pan-cancer core network was constructed with 2,574 nodes and 5,992 links. Weighted sum Boolean logics To simulate drug responses of a given DRCN, we adopted “weighted sum Boolean logic” by modifying the simple majority rule suggested by Font-Clos et al.[247]^91 A state of a node [MATH: k :MATH] in a network with [MATH: N :MATH] nodes, [MATH: sk :MATH] , can be either 1 (activated) or −1 (inactivated). The state of a network at a discrete time point, [MATH: t :MATH] , can be defined as a state vector, [MATH: Vst :MATH] , which is composed of the [MATH: N :MATH] states of its nodes, such as [MATH: skt :MATH] , at a certain time point. The tendency of each node state without any incoming signals can be represented as a basal level vector, [MATH: Vb :MATH] , which usually has a value of zero. Relative influences of each incoming link to a node can be a value in a specific range, which includes integers between −5 and 5 in our work. Positive numbers represent activating regulations and negative numbers refer to inhibitory regulations. A relation matrix can be defined as a [MATH: N :MATH] -by- [MATH: N :MATH] connectivity matrix of the network, wherein each element [MATH: (MR)ij :MATH] represents the relative influence of a node [MATH: j :MATH] on a node [MATH: i :MATH] . The network state at the next time point, [MATH: Vst+1 :MATH] , can be decided by the sign of element-wise multiplication of a weighted sum vector, [MATH: Vw :MATH] , computed by the matrix operation shown in [248]Equation 4. If some elements of the weighted sum vector are zero, the next state of the corresponding nodes follows the current state of each node. Network nodes are synchronously updated at each discrete time point: [MATH: Vst+1=sign(Vw)=sign(MR×Vst+Vb) :MATH] (Equation 4) The attractors, converged steady states of the network, can represent the biological phenotypes by analyzing average states of each node within the attractor states.[249]^92^,[250]^93 Average activity of each molecule represented by a node state of an attractor is computed as suggested by Kim et al.[251]^94 The average state of each node is computed as the average of the node state within a specific attractor multiplied by the proportional ratio of initial states which converged into that attractor. If an attractor does not converge into a single state and cycles a set of states, the average state of each node in that attractor is computed as a sum of node states in the cycle divided by the period of the cycle. The converging ratio of each attractor is approximately measured by repeated enumeration of the network state starting from a random initial network state to a particular attractor. Synthetic environments for training GRAY We designed a synthetic environment to train GRAY for Boolean network optimization through meta-reinforcement learning. A randomly generated scale-free network[252]^95 of 20∼40 nodes, with edge weights varying within [-5,5] except 0, were assigned to each environment. We generated 100,000 synthetic environments. At the start of each episode, one of 100,000 environments was randomly selected, and its virtual perturbation data was generated. The virtual perturbation data contains the fold change of the average activity for each node between the first attractors that were reached from 128 random initial states, and the second attractors that were reached starting from the first attractor after fixing several random nodes to 1 or -1 to imitate a drug treatment. By varying the fixed nodes to compute the second attractor, 16 ground truth datasets were generated for each environment. The fitness is defined as the average of the Pearson’s correlation coefficients (PCC) between the ground truth datasets and the fold change values computed by applying the edge weights ( [MATH: At :MATH] , obtained edge weights at step [MATH: t :MATH] ) obtained from GRAY onto the network. The fixed node sets used to compute fold change values for [MATH: At :MATH] were the same as the ground truth datasets. The reward was given as the observed improvement[253]^33 as follows: [MATH: Rt=max{ft maxi<t(fi),0} :MATH] (Equation 5) Here, [MATH: ft :MATH] is the fitness and [MATH: Rt :MATH] is the reward at step [MATH: t :MATH] . The goal of GRAY against a given environment is to maximize the discounted cumulative sum of reward. Neural network structure of GRAY GRAY consists of three components: the input, the submodule, and the main layer ([254]Figure S1A). The input layer has a graph module for embedding networks and a message module for communications between agents. The graph module consists of two graph neural networks (GNN), which is a class of neural networks tailored for processing input represented as graphs. The first ( [MATH: G :MATH] ) is for the main layer and the global state estimator of the submodule layer, and the second ( [MATH: GC :MATH] ) is for the context estimator of the submodule layer. The message module ( [MATH: Gm :MATH] ) is also a GNN and is used for all modules in both the submodule and the main layer. In each step, GNNs in the graph module receives eight centrality measures for nodes (betweenness,[255]^96 degree,[256]^97 eigenvector,[257]^98 katz,[258]^99 closeness,[259]^100 load,[260]^101 harmonic,[261]^102 pagerank[262]^103) and three values for edges (weight, sign of each edge in the network of the environment, and edge betweenness centrality[263]^96) as input, and updates nodes, edges, and the global feature by iteratively aggregating information from neighboring nodes, edges, and the entire graph. The weight in the input for the edge is from the output of the preceding step. The global feature represents a graph-level attribute. The message module initially receives a zero-vector input for all nodes and edges at the beginning of an episode, and generates nodes, edges, and the global feature. From the second step, the message module receives nodes, edges, global features, and edge weights generated from the preceding step as input, and updates its features. The outputs of the GNNs in the graph module and the message module are reorganized for each edge by concatenating the global feature, the edge feature, and the features of sender and receiver nodes of each edge ([264]Figure 3D). Each reorganized feature represents the state of each edge. We use this as the feature vector to be fed into the subsequent modules. The reorganized feature of the ith edge from each of the three GNNs ( [MATH: G,GC,Gm :MATH] ) are as follows: [MATH: Li=(ni_< mi>s,ni_< /mo>t,li ,g),LiC=(ni_sC,ni_tC,< msubsup>liC,gC),Lim=(ni_sm,ni_tm,< msubsup>lim,gm) :MATH] (Equation 6) [MATH: L=L1:k,LC=L1:kC,Lm< /mi>=L1:km :MATH] (Equation 7) Here, [MATH: Li,LiC,Lim :MATH] are the reorganized features of i^th edge from [MATH: G,GC,Gm :MATH] respectively. [MATH: ni_s,ni_t< /msub> :MATH] are the node features of the source and target node of i^th edge, respectively. [MATH: li :MATH] is the edge feature of i^th edge. [MATH: g :MATH] is the global feature of a network. [MATH: L,LC,Lm :MATH] are the reorganized features for all edges (1 to [MATH: k :MATH] ) in the network from each module, [MATH: G,GC,Gm :MATH] , respectively. Features without any superscript are from [MATH: G :MATH] , and features with superscript C and m are from [MATH: GC :MATH] and [MATH: Gm :MATH] , respectively. Modules in the submodule and main layer receive the reorganized features as an input ([265]Figure S1A). A submodule layer consists of a context estimator module and a global state estimator module. Both modules are one-layer LSTMs with layer normalization.[266]^104 In each module, there is one LSTM for each reorganized feature (i.e., one LSTM for each edge). LSTMs in each module share their parameters (weights of LSTM) but maintain independent hidden states. The context estimator is a module that helps the agent to quickly adapt to the selected environment by estimating information about the environment.[267]^105 During training, the true edge weights of the synthetic network from the selected environment in each episode are known. Thus, we can use the true edge weights as a target for the context estimator. [MATH: LC :MATH] , [MATH: Lm :MATH] , and the edge weight and the reward from the preceding step are set as inputs to the LSTM, and the 11-dimensional logits (edge weights are between −5 and 5) are generated through two dense layers with exponential linear unit (ELU) activation functions.[268]^106 Supervised training is done using cross entropy loss with respect to the true edge weight of the network from the selected environment in each episode. The global state estimator is a module used to provide information that can help learn cooperative policy in order to encourage the agents to cooperate and perform efficient exploration in high-dimensional search space, inspired by Foerster et al.,[269]^107 Das et al.,[270]^108 and Kim et al.[271]^109 Since the output (edge weight) of an agent (main layer) for each edge collectively determines the fitness, the output of all the agents needs to be well-organized. We trained the communication protocol to gather [MATH: L :MATH] , [MATH: Lm :MATH] , and the edge weight of the current step into the global state estimator module, and then predict the feature ( [MATH: L :MATH] ) of the next step. Since the input for the subsequent step is determined by the outputs of all agents, it is necessary to infer the plans of other agents in order to predict the features of the subsequent step. Therefore, the global state estimator provides inferred information of the outputs of the other agents. The global state estimator module predicts the average and standard deviation for the features of the subsequent step and is trained to increase the log probability for the actual value of that feature through supervised training. In the main layer (agent), [MATH: L :MATH] , [MATH: Lm :MATH] , the edge weight and the reward from the preceding step, and the activations of the second-to-last layer of the two submodules are set as inputs to a two-layer LSTMs with layer normalization. Similar to the submodule layer, there is a two-layer LSTM for each reorganized feature. LSTMs in this layer share parameters, while maintaining independent hidden states. Output and value are generated through an independent dense layer with an ELU activation function. The output is 11-dimensional logits (to represent −5 to 5) which estimates the next weight of an edge assigned to the agent, and the value is an expected discounted reward of the current input. Training procedure To train a derivative-free search rule for optimizing the edge weights of Boolean networks, GRAY uses meta-reinforcement learning.[272]^32^,[273]^42 GRAY is trained on various synthetic environments, where the environment involves optimizing edge weights of random synthetic networks for ground truth datasets. At the beginning of each episode, an environment is sampled out of 100,000 synthetic environments and fixed until the end of the episode. Each episode consists of 80 steps. Hidden states of all LSTM in GRAY are set to the zero vector at the beginning of the episode. The initial state ( [MATH: S1 :MATH] ) of the episode is a combination of a random parameter vector ( [MATH: A0 :MATH] ) and zero ( [MATH: R0 :MATH] ). The sign of each parameter in [MATH: A0 :MATH] is kept the same as that of the corresponding link in the network of the selected task. All episodes have 80 steps and each iteration of the input and output processes of an episode is saved to episode buffer when the episode is finished. GRAY is trained using 10 workers and one master, where each worker spends 8,000 episodes in the supervised pretraining phase and 12,000 episodes in the reinforcement learning phase. Each worker independently collects episodes and computes gradients. The master collects gradients from each worker and then averages them. The averaged gradients are applied to the neural network of the master, and the updated parameters are redistributed to each worker. The training procedure for the main layer consists of two phases: a supervised pre-training phase based on directly predicting the true edge weights of the selected environment and a reinforcement learning phase for maximizing the discounted cumulative sum of rewards using the distributed proximal policy optimization (DPPO) algorithm.[274]^110 In the supervised pretraining phase, we randomly replaced 50% elements of the new parameters with the true parameters that have a probability of 0.05 for each step. Whenever 64 episodes are collected, GRAY is trained using Adam on 4 minibatch samples (4 episodes) for one or three epochs, and the buffer is cleared. In the supervised pretraining phase, all neural networks are trained using supervised learning. However, the parameters of G^C and the submodule layer are updated for three epochs for every 64 episodes and the remaining parameters (G, G^m, main layer) are updated for one epoch for every 64 episodes. In the reinforcement learning phase, G^C and submodule layer are trained using supervised learning for one epoch for every 64 episodes. Moreover, the remaining parameters (G, G^m, main layer) are updated using the DPPO algorithm for three epochs for every 64 episodes. Detailed hyperparameters are summarized in [275]Table S2. Selecting six cell lines for in-house experimental validations To validate the molecular-scale interpretability of the DRCNs constructed by the gray box framework ensuring that the DRCNs can effectively capture significant drug response dynamics across a diverse range of cancer cell lines, with an affordable number of experimental validations, we have designed experiments by selecting specific cell lines based on a set of well-defined criteria that allows us to represent the broader landscape of drug response dynamics in pan-cancer settings. Our selection criteria was carefully designed to ensure that the chosen cell lines could serve as representative samples for validating the predictive capabilities of our DRCNs. These criteria include the following four considerations. First, we selected three of the most commonly occurring cancer types—colon, breast, and stomach cancers. Each of these cancer types has unique gene families associated with them, which makes each cancer type having distinctive molecular machinery related with drug responses to others. Second, we chose three representative U.S. Food and Drug Administration (FDA)-approved targeted cancer drugs—afatinib (EGF signaling), trametinib (MEK signaling), and palbociclib (CDK signaling). These drugs act on distinct signaling pathways that are most commonly altered in various cancers, including our three selected types. Third, we selected two groups of cell lines corresponding to those ranked in the top 15 for sensitivity and resistance, respectively, to each of the three selected drugs. These rankings were based on the area under the dose-response curve (AUC) values from the GDSC2 database. Fourth, among the selected sensitive and resistant cell lines, we chose one cell line from each group that shared the same genetic markers as suggested by the Molecular Analysis for Therapy Choice (MATCH) basket trials conducted by the National Cancer Institute (NCI-MATCH). As a result, we have used the mutational drug response markers from the EAY131 NCI-MATCH screening trial as follows: EGFR activating mutations and ERBB2 activating mutations for afatinib, BRAF fusions, BRAF V600E or V600K mutations, GNAQ or GNA11 mutations and NF1 mutations for trametinib, CCND1 or CCND2 or CCND3 amplifications, CDK4 or CDK6 amplifications for Palbociclib. For the six selected cancer cell lines, both BT-20 and BT-474 have the same mutation profiles for EGFR and ERBB2 (mutated EGFRs and not mutated ERBB2s), while they show different afatinib responses (sensitive BT-474 and resistant BT-20). Both NCI-87 and MKN45 have the same mutation profiles of no EGFR mutations, while they show different afatinib responses (sensitive NCI-N87 and resistant MKN45). Both SW48 and SW620 have the same mutation profiles of no ERBB2 mutations, while they show different afatinib responses (sensitive SW48 and resistant SW620). Both SW48 and SW620 have no mutations on BRAF, GNA11, GNAQ and NF1, while they show different Trametinib responses (sensitive SW48 and resistant SW620). Both SW48 and SW620 have no mutations on CCND1, CDK4 and CDK6, while they show different Palbociclib responses (sensitive SW48 and resistant SW620). This approach allowed us to address the challenge of classifying drug response differences and interpret the resulting molecular dynamics, which remain obfuscated by classical marker-based methods. By adhering to these selection criteria, we aimed to ensure that our experiments could effectively represent and validate the predictive capabilities of the DRCNs optimized by the gray box framework. These experiments serve as representative examples of how our framework can predict drug responses at the molecular level across diverse cancer types and targeted drug categories, surpassing the predictions made by traditional marker-based methods. In summary, we have been able to demonstrate the applicability of the gray box framework by strategically selecting representative drugs and cell lines for validation. This approach was chosen to avoid the need for experimental validation across all 300 cell lines used in our training set, as it would not provide additional insights beyond what has already been achieved with our carefully designed experiments. Integrating cell line mutation profiles with functional effect The mutational profile of each cell line is curated from the preprocessed COSMIC database, which was processed by Béal et al.[276]^111 The functional effect of each mutation is decided by comparing oncogenic and tumor suppressive q-values suggested by Tokheim et al.[277]^40 If the oncogenic q-value of a specific gene is smaller than the tumor suppressive q-value, then the functional effect of the mutation on that gene is considered oncogenic. Therefore, the functional effect of the mutation on the gene is treated as a gain of function. In the case that the tumor suppressive q-value is smaller than the oncogenic q-value, the functional effect of the mutation on the gene is treated as a loss of function. These functional effects of mutations are then cross-validated with functionally associated genes in Harmonizome.[278]^81 Maximum matching-based mutation data projection Several mutated genes of cancer cell lines exist within the pan-cancer network but not in the DRCN. Thus, we needed to approximate the effects of each mutation on the DRCN by leveraging structural information. Hu et al.[279]^112^,[280]^113 defined the structural control configuration (SCC) of a network model to approximate the effects of certain drugs on the downstream genes. We employed this method to approximate the effects of mutations. An SCC of a network model is a maximally matched graph within the network, although there exist many different SCCs within a large network. SCC consists of elementary paths, which are cascades connected by maximum links, and elementary cycles, which are cycles connected by maximum links. Additional links indicate those that connect elementary paths and elementary cycles. We generated 1,000 SCCs until the proportion of new maximum matching links reached zero. Next, we extracted all possible simple paths consisting of maximum matching links and additional links from mutated genes to other genes within the DRCN of each SCC. We utilized the sign information of each link from the paths among the mutated genes as well as the genes within the DRCN to obtain the effects of each mutation on its downstream genes. Incorporating mutations and drugs onto the simulation During simulations of drug responses of the core network, the effect of a mutation on a node inside the core network was reflected by fixing the state of the corresponding node to either 1 or -1 for a gain of function or a loss of function mutation, respectively. The effects of mutations on a node outside of the core network were projected into the core network using maximum matching. These effects were then applied onto their corresponding elements of the basal level vector [MATH: Vb :MATH] , 1 or −1 for a gain of function or a loss of function mutation, respectively. To estimate the inhibition effect of drugs on the core network, we applied a two-step drug response simulation. First, the network converged into attractors with the given mutation profile. Second, the network converged again from the prior attractor states to new attractors while the drug target nodes were additionally fixed to −1. If some of the prior attractor states were cycles, the first state of each cycle was treated as the prior attractor state in the second step of the drug response simulation. The difference between the average activity of the phenotype node in the two states (the new attractor and the prior attractor) were compared with the measured area under the viability curve (AUC) value collected from GDSC2. Signal flow analysis of DRCN perturbations Signal flows after the DRCN perturbations are analyzed by an algorithm suggested by Lee et al.[281]^63 Signal flows from a node are defined by adding the incoming signal flow to the product of the node activity change with the sign (positive or negative) of the outgoing links. Effects of mutations on signal flows are incorporated differently depending on whether the mutations were on a node inside the activity difference sub-network or not. The effect of a mutation on a node inside the activity difference sub-network was treated as pinning that node to a state, such that incoming links were deleted and a constant signal flows out from that node. The effect of a mutation from outside the activity difference sub-network was incorporated by propagating the change in signal flow caused by a mutation through exclusive simple paths computed by the maximum matching algorithm. Quantification and statistical analysis For comparison between two groups, two-sided Student’s t test was used. All pathway enrichment analyses were conducted using WebGestalt[282]^48 and KEGG.[283]^49^,[284]^50^,[285]^51 ANOVA was conducted using GDSCTools.[286]^84 Statistical details can be found in the respective figure legends, results, and the [287]STAR Methods section. Acknowledgments