Graphical abstract
graphic file with name fx1.jpg
[43]Open in a new tab
Highlights
* •
We propose a gray box framework with Boolean networks and a black
box optimizer
* •
GREY is a learned optimizer specialized in Boolean network
optimization
* •
Gray box framework can predict drug responses and reveal underlying
mechanisms
Motivation
The challenge of predicting cellular responses to perturbations amid
the complex non-linearities of molecular interactions has spurred the
development of machine learning-based models. However, interpreting
these models in terms of molecular regulatory dynamics remains
difficult. Conversely, logical network models like Boolean networks
offer interpretability but struggle with large-scale networks due to
high-dimensional search spaces. To overcome these hurdles, we introduce
a scalable derivative-free optimizer, trained via meta-reinforcement
learning, for Boolean network models. This approach enables prediction
of anti-cancer drug responses in cancer cell lines while offering
valuable insights into their molecular regulatory mechanisms.
__________________________________________________________________
Kim et al. present a gray box framework that combines a white box
logical model with a black box optimizer, addressing challenges in
interpreting molecular regulatory dynamics. The gray box framework
successfully predicts anti-cancer drug responses of cancer cells, while
shedding light on the underlying molecular regulatory mechanisms.
Introduction
Controlling cell fate with molecular regulatory perturbations is an
important challenge in various areas of biology. However, the control
effect of a perturbation on specific molecules is often
counterintuitive due to the complexity of the cellular network arising
from non-linear molecular regulatory interactions.[44]^1 Molecular
regulatory scale network structures, such as feedback loops and
crosstalk, cause perturbation responses of the network to be highly
dynamic processes. Therefore, predicting cellular response to a
perturbation from static information alone, such as mutation profiles
incorporated into the cellular network, obscures the underlying
mechanism.[45]^2 In contrast, dynamic information regarding network
state changes over time after a perturbation reveals the key regulatory
mechanisms of the network response and aids the precision of response
predictions. In this view, dynamic models of molecular regulation
capable of simulating cellular responses to perturbations provide
critical insight.[46]^3^,[47]^4^,[48]^5
Various machine learning (ML) models, including deep learning, have
shown high accuracy in predicting non-linear dynamic processes of
cellular networks, such as responses to molecular regulatory
perturbations.[49]^6^,[50]^7^,[51]^8^,[52]^9^,[53]^10 However, ML-based
models cannot provide biological rationale for their predictions since
interpreting the internal workings of the model is difficult. To
rectify this “black box” nature of ML models, some interpretable
methods such as random forest and linear regression have been applied
to predict cellular responses to
perturbations,[54]^11^,[55]^12^,[56]^13^,[57]^14^,[58]^15^,[59]^16
which provide biological features highly correlated to the predictions.
More recently, explainable artificial intelligence[60]^17^,[61]^18 and
deep neural networks with architectures that mimic the relationships of
biological entities, such as hierarchies of biological pathways,[62]^19
subsystems,[63]^20^,[64]^21 or molecular interactions,[65]^22 have been
utilized. These methods suggest biological features related to the
decision of the models and identify their relationships. However, due
to the lack of dynamic information provided by these suggestions from
the ML-based models, a dynamic model of molecular regulation is needed
to simulate and predict perturbation responses.
In systems biology, a cellular network is considered as a non-linear
dynamical system of interacting cellular components, such as genes and
transcription factors.[66]^23^,[67]^24 Thus, the network can be
abstracted to a system model containing molecular components and their
regulatory parameters represented by nodes and links. Continuous
dynamic models, such as ordinary differential equation network models,
can be constructed by integrating multiple sources of biological data
and a priori knowledge, and therefore can successfully recapitulate
complex dynamic processes such as cellular responses to
perturbations.[68]^25^,[69]^26 Reconstructing and simulating a dynamic
network model reproduces molecular regulatory mechanisms, which not
only predicts the perturbation responses but also enables
interpretation of molecular regulation dynamics such as perturbation
signal flows related with those responses. This approach is called
“white box” modeling due to the interpretable nature of the predicted
responses driven by the system. Despite this, estimating every kinetic
parameter of the continuous dynamic model is only feasible for small
and well-investigated networks. As the network size increases,
accurately estimating parameters becomes more difficult due to the
increase in the dimensionality of the search space.
Discrete logic-based models, such as Boolean network models, can
partially overcome these problems by abstracting the network dynamics
into simple but essential regulatory functions such as logical
equations. These functions can be represented by network parameters or
edge weights. The influence of neighboring nodes on a specific node in
the network can be determined by considering the states of these
neighboring nodes and the weights of the directed edges connecting them
to the specific node. This influence is quantified by a weighted sum,
calculated by multiplying the state value of each neighboring node by
the corresponding weight of the edge connecting it to the specific
node. If this weighted sum is greater than 0, the specific node becomes
activated, influenced by the states of its neighboring nodes.
Conversely, if the sum is less than 0, the node is deactivated, again
due to the influence of the states of its neighboring nodes. Several
studies have been made to estimate regulatory functions of these
discrete models based on information theory[70]^27^,[71]^28^,[72]^29 or
meta-heuristic optimization[73]^30^,[74]^31 due to the gradient-free
characteristics of discrete model optimization, but such approaches
still have scalability problems with high computational complexity.
Moreover, the growing demand for discrete dynamic models of in silico
drug discovery for precision medicine requires more efficient
optimization algorithms, due to the complexity of estimating network
parameters for each unique in silico model that predicts diverse drug
response dynamics of patient samples. Therefore, a more sophisticated
optimization algorithm is needed to efficiently estimate regulatory
functions and scale up to larger networks.
The learning to optimize (L2O)[75]^32 framework is a methodology that
utilizes ML to train an optimization algorithm. Whereas traditional ML
trains a particular model, L2O trains an optimizer that can be
subsequently applied to optimize several related models. By using L2O
to train a specialized optimizer for a specific type of optimization
problem, it has the potential to outperform classical optimization
algorithms in terms of computational complexity.[76]^33^,[77]^34 This
is because the trained optimizer tries to learn the characteristics of
the search space for the class of problems at hand, which enables it to
efficiently explore the space of a particular one of those problems. In
this study, we present a “gray box framework” that optimizes discrete
dynamic models of molecular regulatory networks using an optimizer
trained for discrete models with the L2O approach. This gray box
framework combines the interpretability of a white box model with the
scalability of a black box optimizer, allowing it to be both
interpretable and able to scale up to larger networks ([78]Figure 1).
We trained the black box optimizer, which we call GREY for
gradient-free optimizer trained by meta-reinforcement learning to yield
edge weights of Boolean networks, using synthetic optimization problems
that imitate white box models consisting of Boolean networks.
Figure 1.
[79]Figure 1
[80]Open in a new tab
A schematic representation of the gray box framework
In the gray box framework, the white box model provides topological
information of intracellular networks to the black box model, and the
black box model optimizes the edge weights of intracellular networks
from biological data. More specifically, the black box optimizer is
trained on synthetic environments generated by random networks with
perturbation fold change simulations. By blending these modeling
frameworks, the gray box model can predict precise drug responses that
are biologically interpretable.
GREY can learn a derivative-free exploration strategy to find an
optimal solution in the high-dimensional discontinuous search space of
discrete dynamical models such as Boolean networks. To validate the
optimization ability of GREY for real biological perturbation data, we
applied it to drug response prediction. We constructed a drug response
core network (DRCN) structure for each individual drug extracted from a
pan-cancer core network (PCCN) integrated from public databases. The
DRCN optimized by GREY can successfully predict the drug responses of
cancer cell lines by utilizing the area under the curve (AUC) values
derived from the viability curve data in the database Genomics of Drug
Sensitivity in Cancer 2 (GDSC2):
[81]https://www.cancerrxgene.org/.[82]^35 GREY was able to optimize the
logical parameters of the DRCN more efficiently than a state-of-the-art
derivative-free optimization algorithm. Further mechanism analysis
showed that the optimized network reveals key network motifs related
with perturbation response dynamics, which were utilized to infer drug
response biomarkers that match basket trial marker genes reported from
previous experiments.
Results
A gray box framework that optimizes a white box logical model using a black
box optimizer
We developed a gray box framework by combining a black box optimizer
with a white box modeling approach that generates biologically
interpretable predictions. The framework constructs a Boolean network
to model drug response at the cellular level, and optimizes the edge
weights of the network to match existing biological data, such as
cellular responses to drug perturbation. The gray box framework can be
broken into three parts ([83]Figure 2): (1) an extraction module for
the DRCN structure, (2) a preprocessing module for the cell line
profiles and drug response data, and (3) a gray box module that
optimizes the logical relationships within the DRCN.
Figure 2.
[84]Figure 2
[85]Open in a new tab
Overall procedure for constructing the gray box framework
The extraction module (green dotted box) constructs the pan-cancer core
network (PCCN) based on pan-cancer related networks gathered from
cancer-related genes and network topology data and extracts drug
response core networks (DRCNs). The preprocessing module (blue dotted
box) gathers mutation profiles and drug response data of cell lines,
and projects them onto the DRCN, generating DRCNs specific to each
drug. The gray box module (black dotted box) optimizes edge weights
using GREY based on the DRCN and preprocessed data. The ensemble of
optimized DRCNs is utilized by the drug response dynamic network
simulator to predict drug responses and generate mechanistic
interpretations.
The DRCN is constructed to distill existing knowledge of regulatory
interactions between genes and proteins related to drug responses in
cancer into a minimal network. It begins with a pan-cancer related
network that isolates genes contained in cancer-related biological
networks, frequently mutated genes in cancer, and target genes of
anti-cancer drugs. The links from a structural network database of
intracellular signaling among genes and proteins, Omnipath,[86]^36 that
have both a direction (identified source and target nodes) and a sign
(whether the source node activates or inhibits the target node) are
chosen to connect the selected genes into the pan-cancer related
network. The PCCN was then extracted by removing irrelevant inter-nodes
in the pan-cancer related network that are not contained in a path
between cancer-related genes. Next, we extracted the DRCN from the
constructed PCCN. The DRCN is required to keep the network size as
small as possible to minimize the computational requirements of the
Boolean network simulations, while retaining all drug response-related
genes and signaling pathways. To accomplish these requirements, a
minimum number of genes were selected that were significantly related
to the drug response of cancer cell lines in GDSC2. Then, a minimum
number of links and intermediate nodes were selected by Edmonds’
algorithm, and utilized to connect the selected genes to the DRCN.
Finally, phenotypic “proliferation” and “apoptosis” nodes, and their
links from the CancerGeneNet,[87]^37 were added to the extracted DRCN
as phenotypic markers. As a result, the constructed DRCNs incorporate
both drug response-related genes and proteins as nodes, recapitulating
the fundamental molecular regulatory dynamics of the selected drug
responses (see [88]STAR Methods).
The ability for the proposed gray box framework to predict drug
responses was evaluated across three different DRCNs to ensure
robustness across a variety of network structures. We constructed three
different DRCNs of US Food and Drug Administration-approved targeted
cancer drugs from Molecular Analysis for Therapy Choice (MATCH) basket
trials run by the National Cancer Institute (NCI-MATCH).[89]^38 We
selected afatinib (an inhibitor of EGFR and ERBB2), trametinib (an
inhibitor of MAP2K1 and MAP2K2, also known as MEK1 and MEK2,
respectively), and palbociclib (an inhibitor of CDK4 and CDK6). Each of
these drugs inhibits well-known and unique cancer-related machinery,
and together they engage a large fraction of the nodes covering the
PCCN structure (see [90]STAR Methods). The afatinib DRCN comprised 57
nodes, whereas the trametinib and palbociclib DRCNs comprised 66 nodes
and 192 nodes, respectively.
While the constructed three different DRCNs that GREY optimizes
recapitulate essential regulatory network structures related with drug
responses, each drug response of each individual cancer cell line can
be simulated by applying the mutational profiles of each cell line from
the GDSC2 database to the DRCN of each of the three drugs. To apply
mutations from a cell line to the DRCN of a given drug, the
corresponding nodes in the DRCN whose mutations correspond to the
mutation profiles of that cell line in GDSC2, are set to a specific
invariant state. The invariant node state is set to 1 if the mutation
is either activation, amplification, or gain of function and set to −1
if the mutation is suppression, deletion, or loss of function.
Mutations that are in nodes outside of the DRCN are treated as a
constant input signal that activates or inactivates any downstream
nodes in the DRCN. The mutation profiles of each cell line were
obtained from the Catalogue of Somatic Mutations in Cancer
(COSMIC)[91]^39 database and the effect of each mutation was determined
from their oncogenic or tumor suppressive q values (see [92]STAR
Methods).[93]^40 In this way, we mapped numerous mutation profiles of
cell lines to a small number of nodes in our DRCNs. For example, 716
cell lines were represented identically by the afatinib DRCN composed
of 57 nodes. Having incorporated mutation profiles and drug responses,
the edge weights of the resulting DRCNs are then ready to be optimized
by GREY.
Construction and optimization procedure of GREY
We developed GREY to optimize edge weights of DRCNs for the mutation
profiles and the drug response data from GDSC2. We formalized DRCNs
using weighted sum Boolean logic (see [94]STAR Methods), and the edge
weights of the DRCN were assigned to each link. In weighted sum Boolean
logic, the activation of each node is determined by a weighted sum of
values from its incoming connections, where the weights represent the
regulatory influence of each link. Due to the non-continuous fitness
space that results from the discrete states of the nodes, and the
time-consuming attractor simulation processes of Boolean networks, we
needed a derivative-free and sample-efficient optimizer. Thus, we
trained GREY, which is an optimizer specialized for optimizing the edge
weights of Boolean networks, through L2O, which employs multi-agent
meta-reinforcement learning[95]^41^,[96]^42^,[97]^43 for its
implementation. An agent in reinforcement learning is an entity that
learns to make optimal decisions by interacting with an environment and
receiving feedback in the form of rewards. In multi-agent reinforcement
learning, multiple agents must learn how to cooperate with each other
to achieve their collective objectives.
In the current setting, the environment is an optimization problem
(optimizing the edge weights of Boolean networks), and there is one
agent for each edge weight. The agents explore the search space of the
optimization problem (all possible edge weight values) to find good
solutions that have high fitness. The edge weight values of each
solution are utilized to simulate a Boolean network. Fitness is then
defined as the Pearson’s correlation coefficient between the simulated
fold change of the node values and the ground truth extracted from a
database or biological experiment. The reward for agents is
proportional to the difference between its current fitness and its
previous best fitness, and the agents are trained to maximize reward.
In traditional reinforcement learning, an agent learns an exploration
strategy to maximize its reward within a specific environment. For our
problem, each drug corresponds to a single DRCN structure. These
different DRCNs represent different environments for the optimizer. To
accommodate multiple environments in the form of multiple different
DRCN structures, we adopted meta-reinforcement learning, which allows
agents to learn adaptable exploration strategies by training across a
variety of different environments.
In a given environment, GREY receives the network structure of the
environment with edge weights and reward as input, and produces new
edge weights as output. Then, fitness is computed using these new edge
weights, followed by a reward. The entire sequence from entering input
to GREY to obtaining a reward constitutes a single step. The new edge
weights and the reward obtained from the preceding step are used as
input for the subsequent step. In the initial step, edge weights and
reward used as input are initialized to random values and zero,
respectively. An episode is a sequence of interactions between GREY and
an environment, i.e., a sequence of steps, and GREY is trained from the
cumulative experiences gained throughout the episode.
GREY was trained using 100,000 synthetic environments with random
networks composed of 20–40 nodes, random edge weights, and a generated
ground truth dataset (see [98]STAR Methods; [99]Figures 3A and
[100]S1). The random network was formalized by weighted sum Boolean
logic, and the ground truth dataset comprises the fold change of the
average activity for nodes between the first attractors attained from
random initial states and the second attractors reached from the first
attractor after fixing some nodes to 1 or –1. The resulting GREY
trained on randomly generated synthetic environments can be applied to
optimize the edge weights of biological networks with any given
structure and corresponding data ([101]Figure 3B).
Figure 3.
[102]Figure 3
[103]Open in a new tab
Overall architecture of GREY and training procedure
(A) An illustration of the training procedure.
[MATH: Aˆ
:MATH]
is the true edge weights of the environment, and
[MATH: At :MATH]
is the new edge weights at step
[MATH: t :MATH]
.
[MATH: ft :MATH]
is the fitness at step
[MATH: t :MATH]
, and
[MATH: Rt :MATH]
is the observed improvement (OI) of the fitness (PCC, Pearson’s
correlation coefficient) as a reward at step
[MATH: t :MATH]
.
[MATH: mt :MATH]
includes hidden states. After generating an episode by repeating 80
steps, the trajectory is stored in the buffer. Neural network
parameters of GREY (
[MATH: θG :MATH]
) are updated whenever 64 episodes are saved, and then the buffer is
cleared. Supervised pre-training to predict the true edge weights is
performed until the initial 8,000 episodes are used and, from then on,
reinforcement learning is performed with the DPPO algorithm to maximize
the cumulative discounted rewards.
(B) Trained GREY can optimize the edge weights of the white box Boolean
network model regardless of network structure. During each step of
optimization, trained GREY iteratively receives network structure with
preceding edge weights and its reward as input, and then attempts to
find new edge weights with better fitness and returns those as output.
(C) Inputs are processed using graph neural networks and node, edge,
and global features are generated.
(D) For each edge, a reorganized feature is formed by concatenating its
edge feature, the node features of sender and receiver nodes, and the
global feature.
(E) LSTMs are assigned to each edge. While the parameters of these
LSTMs are shared, they maintain independent hidden states.
The architecture of GREY consists of three layers: the input, the
submodule, and the main layer ([104]Figure S1A). The input and the
submodule layers serve an auxiliary role of providing information to
the main layer. The input layer is composed of several graph neural
networks (GNNs),[105]^44 which are a type of neural network designed to
process and learn from data structured as graphs by capturing
relationships between nodes. GNNs in the input layer produce node,
edge, and global (graph-level) features by aggregating information from
neighboring nodes, edges, and the entire graph, and they represent
underlying dependencies within the graph ([106]Figure 3C). Those
features are then reorganized for each edge by concatenating the edge
feature with the node features of sender and receiver nodes, as well as
the global features ([107]Figure 3D). These reorganized features are
then passed to the submodule and the main layers and processed by long
short-term memory (LSTM) ([108]Figure 3E). In the submodule and the
main layers, one LSTM is assigned to each edge, and LSTMs in the main
layer each act as one agent. Each agent produces a new weight for its
corresponding edge, and the new weights from all agents collectively
determine fitness, since fitness depends on the dynamics of the network
as a whole. The submodule layer has a context estimator, which helps
the agent adapt quickly to the environment by estimating information
about the environment, and a global state estimator, which helps the
agent assigned to each edge to cooperate by estimating the behavior of
the other agents.
Application of GREY to synthetic and biological environments
We assessed the optimization performance of GREY against unseen
environments. In each application, GREY produces five new candidate
edge weight sets for each step (i.e., generating five outputs at each
step) and randomly replaces elements (e.g., 50%) from each candidate
edge weight set by the edge weights with the best fitness from all
previous steps. Then, we evaluated the fitness of each candidate edge
weight set (i.e., five evaluations per step) and used the candidate
with the best fitness as an input for the next step. First, we checked
that GREY can perform optimization on an example synthetic environment
not used in training. As more evaluations were performed, GREY found
edge weights with higher fitness ([109]Figure 4A). Random agents, which
are untrained GREY, showed no meaningful behavior. Thus, we confirmed
that GREY can perform optimization properly.
Figure 4.
[110]Figure 4
[111]Open in a new tab
Performance of GREY on synthetic environments and DRCNs
For synthetic environments and DRCNs, we used 128 and 1,000 random
initial states, respectively. Initial states are sampled for every step
during optimization. During each step, 5 sets of edge weights are
generated and evaluated, and the set of edge weights with the highest
fitness becomes an input for the next step.
(A) GREY optimization for a synthetic environment (50 nodes, 94 links)
that was not used in training. The larger line plot shows the best
fitness (y axis) across evaluations (x axis), while the inset plot
shows fitness at each evaluation. Results of five random agents are
shown for comparison. Shaded areas have 95% confidence intervals over
three independent runs.
(B and C) Comparison of the results of CMA-ES on various scales.
Initial standard deviation (
[MATH: σ0 :MATH]
) of CMA-ES was chosen among (0.1, 0.3, 0.5, 1.0, 2.0) by selecting
that which exhibited the highest average fitness. The number of
evaluations per step (population size) of CMA-ES are (
[MATH: 4+3ln(n
etworksize
) :MATH]
), which is the default setting, in (B) and 5 in (C). The results
depict the average maximum fitness after 100 evaluations. Error bars
are 95% confidence intervals over 20 random networks (∗p < 0.05,
∗∗p < 0.01, ∗∗∗p < 0.001, Student’s t test, two-sided).
(D) Optimization performance for DRCNs of afatinib, trametinib, and
palbociclib. We randomly selected 300 cell lines for each drug. Shaded
areas are 95% confidence intervals over three independent runs.
(E) Median and median absolute deviations of F1 scores for 21 drugs in
four methods.
Next, we compared the optimization performance of GREY to covariance
matrix adaptation evolution strategy (CMA-ES),[112]^45 which is
considered the state-of-the-art derivative-free optimization method for
challenging settings, such as non-convex and non-continuous problems.
During each optimization step, evaluating fitness incurs the greatest
computational cost for both GREY and CMA-ES, because of the
time-consuming attractor simulation processes of Boolean networks.
Thus, we compared the performance in terms of sample efficiency, aiming
for higher fitness with fewer evaluations. GREY achieved statistically
significant higher fitness for all scales of networks (20–200 links
with 20 networks for each size interval) that were not used during
training ([113]Figure 4B). Furthermore, GREY outperformed CMA-ES when
the number of evaluations per iteration step was the same
([114]Figure 4C).
We then applied GREY to optimizing the DRCNs of three drugs, afatinib,
trametinib, and palbociclib, with respect to their drug response data
from GDSC2 ([115]Figure 4D). When optimizing the edge weights of the
DRCN for a certain drug, the fitness of the solution is the accuracy of
DRCN drug-response predictions. More specifically, the DRCN predicts
drug responses by calculating the difference between the average
activity of phenotype nodes between the prior attractor and the new
attractor. The prior attractor is obtained from random initial states
with the given mutation profile for each cell line, and the new
attractor is obtained from the prior attractor while maintaining drug
target nodes at −1. The fitness is then defined as the Pearson’s
correlation coefficient between the predicted drug responses from DRCN
and the ground truth, as before. However, unlike with the simulated
datasets, the ground truth drug response of each node is not known.
Instead, only the activity change of the phenotype nodes are measured
and compared with the ground truth to determine fitness. The ground
truth of the drug response is the measured area under the viability
curve (AUC) value collected from GDSC2 (see [116]STAR Methods).
Compared with CMA-ES, GREY consistently achieved better fitness when
optimizing DRCNs for all three drugs. For the trametinib DRCN, GREY
reached a fitness of 0.3 after an average of 403 evaluations, which is
approximately 3 times more sample-efficient than an average of 1,102
evaluations of CMA-ES. GREY successfully optimized the palbociclib DRCN
consisting of 655 edge weights, despite the search space size being
significantly larger than that of the synthetic environments used in
the training of GREY. These results suggest that GREY is sample
efficient and highly scalable.
We then compared prediction performance of optimized DRCNs over 21
drugs that have various modes of action ([117]Table S1) with ML models
SRMF[118]^46 and DeepCDR,[119]^10 and a logical prediction model
LOBICO.[120]^47 Each of the 21 drugs represents a corresponding group
of drugs targeting the specific pathway according to the labels in the
GDSC2 database. For each drug, 300 cell lines were randomly selected to
form a training set and the remainder formed a test set. All comparison
models were trained using the same set of training data with default
settings. Drug-response values were binarized according to the rule
defined by Knijnenburg et al.[121]^47 Predicted drug-response values
([122]Figure S2) from DRCNs are binarized using an optimal threshold,
determined based on the highest F1 scores observed across the training
set cell lines. F1 score ([123]Equation 2) was calculated for each
drug. Medians and median absolute deviations of all drugs are
summarized in [124]Figure 4E.
[MATH: Precision=TruepositiveTrue
positive+Falsepositive,Recall=TruepositiveTrue
positive+Falsenegative
:MATH]
(Equation 1)
[MATH: F1score=2×Recall×Precision
Recall+Precision :MATH]
(Equation 2)
The prediction performances from the optimized DRCNs using GREY (GREY
opt.) are lower than that of the ML models such as SRMF or DeepCDR.
However, GREY opt. showed better performance than the logical
prediction model LOBICO, and it has further strengths in terms of
interpretability. For example, a marker predicted for afatinib by
LOBICO is an ERBB2 mutation,[125]^47 which is a direct target of
afatinib, but it does not explain the underlying dynamics of how the
mutation affects the phenotype. On the other hand, our model can
suggest molecular regulatory dynamics by simulating the optimized DRCN
with sensitive and resistant cell lines, as we describe in the
following sections.
Successful prediction of drug responses and molecular regulatory dynamics
with the optimized DRCNs
Interpretable drug responses depend on accurate inference of both the
phenotypic response and the underlying molecular regulatory dynamics.
We explored whether the DRCNs optimized through GREY can predict not
only the phenotype-level drug responses but also the changes
represented by fold changes in gene expression. To ensure the
applicability of our gray box framework throughout the pan-cancer, we
selected two cell lines, one sensitive and one resistant to afatinib
treatments, respectively, for each of three frequently occurring cancer
types in humans: breast, colon, and stomach cancer. Each of the six
selected cell lines has been carefully chosen to serve as a
representative model, demonstrating either a sensitive or resistant
response to the selected drug, afatinib, based on the AUCs of the GDSC2
database ([126]Figure 5A). However, within each cancer type, the
sensitive and resistant cell lines share identical genetic markers for
drug response predictions, as suggested by the MATCH basket trials
conducted by the National Cancer Institute (NCI-MATCH). Therefore,
classifying drug response differences between sensitive and resistant
cell lines requires interpretation of the underlying molecular
dynamics, which remain obfuscated by classical marker-based methods
(see [127]STAR Methods).
Figure 5.
[128]Figure 5
[129]Open in a new tab
In-house experimental validation of simulated changes in gene
expression
(A) Relative afatinib responses of six selected cell lines from the
GDSC2 drug responses database. Three afatinib-sensitive (in green) and
three afatinib-resistant (in red) cell lines were well separated with
respect to the area under the curve (AUC) of cell viability after
various dosages of afatinib treatments given in the GDSC2 database. Two
breast cancer cell lines used for further analysis are colored with
higher saturations.
(B) Simulated afatinib responses of six cell lines. Note that the first
two graphs show a decrease in the average activity of the proliferation
node (y axis < 0) and the third graph shows an increase in average
activity of the proliferation node (y axis > 0) (∗p < 0.05, ∗∗p < 0.01,
∗∗∗p < 0.001, Student’s t test, two-sided).
(C) In-house afatinib response data of six cell lines, quantified by
relative confluence. Representative microscopic images for one pair of
cell lines are shown (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, Student’s t
test, two-sided).
(D) The pathway enrichment analysis results of DEGs before and after
each drug treatment. The most enriched Kyoto Encyclopedia of Genes and
Genomes (KEGG) pathway of DEGs in afatinib-treated BT-474 is
“decreasing cell cycle,” The most enriched pathway in afatinib-treated
BT-20 is “increasing DNA replication.”
(E) Min-max normalized log2-fold change values of each gene of the
in-house experimental data compared with min-max normalized average
activity changes of each afatinib DRCN node computed by the drug
treatment simulation. Parameter sets are optimized by a random agent
and GREY, each repeated three times. Min-max normalized fold changes in
the iterative simulation results of phenotype nodes and STAT1
(downstream node of EGFR) computed by the optimized parameter sets
found by GREY and a random agent are compared with experimental data in
histograms.
With the selected six cell lines, we firstly confirmed that
GREY-optimized DRCNs are capable of reproducing the expected phenotypes
in response to drug perturbations. To do so, we compared our in-house
experimental data of drug responses in six cell lines with the drug
responses simulated by optimized DRCNs. We mapped the genetic profile
of each cell line onto the basal node activities of the afatinib DRCN,
and GREY optimized the logical parameters to reproduce the viability of
the cell lines in a robust manner. Using the optimized afatinib
response dynamic network simulator, we simulated the proliferation
response of each cell line, calculated as the average change of the
proliferation node from 34 repeats ([130]Figure 5B). Average activities
of the proliferation nodes after afatinib treatment in two of the
sensitive cell lines (BT-474 and SW48) were significantly decreased
compared with those of two of the resistant cell lines (BT-20 and
SW620). For the pair of stomach cancer cell lines, the average activity
of the proliferation nodes did not decrease following afatinib
treatment in the simulations. However, the average increase of activity
in the sensitive stomach cancer cell line (NCI-N87) was smaller than
that of the resistant stomach cancer cell line (MKN45). Thus, all of
the predicted drug responses were consistent with the relative afatinib
responses from the GDSC2 database. We assessed the effect of afatinib
on proliferation of the cell lines in culture. The experimental results
of afatinib responses were consistent with the response data from the
GDSC2 database and with the simulated predictions of drug responses
through DRCNs ([131]Figure 5C). Detailed in-house experimental data of
drug treatment in cancer cells are presented in [132]Figure S3.
After confirming that the optimized model reproduces the expected
phenotypic responses, we compared the model predictions of each gene in
response to the drug perturbations with both existing databases and
in-house biological data. To determine plausible gene sets that
distinguish between sensitive and resistant pairs of cell lines at the
cellular level, we first performed a pathway enrichment analysis with
the web-based gene set analysis toolkit (WebGestalt)[133]^48 based on
the Kyoto Encyclopedia of Genes and Genomes
pathways.[134]^49^,[135]^50^,[136]^51 We selected BT-474 and BT-20 cell
lines, because experimentally these cells had the most significantly
different proliferation responses to afatinib. We calculated the
log2-fold changes of differentially expressed genes (DEGs) before and
after drug treatments in both cell lines by RNA sequencing data and
analyzed the pathways with altered expressions ([137]Figure 5D). The
pathway with the highest enrichment score of DEGs in the
afatinib-treated BT-474 was “decreasing cell cycle” (hsa04110), which
is consistent with the response of this sensitive cell line. The
pathway with the highest enrichment score of DEGs in the
afatinib-treated BT-20 was “increasing DNA replication” (hsa03030),
which is known to be related to drug resistance
mechanisms,[138]^52^,[139]^53 but did not meet the threshold for false
discovery rate, indicating that there were no significant changes in
gene expression induced by afatinib in these cells.
We then preprocessed the expression fold change data of BT-20 and
BT-474 cells to enable comparison with simulated afatinib treatment in
GREY-optimized DRCNs. Although the expression fold changes in both
BT-20 and BT-474 cell lines display a sufficiently large number of
preprocessed DEGs exceeding threshold values to ensure quality control,
the two cell lines exhibit a different number of significant DEGs.
BT-20 cells had fewer genes with a base mean count over 100 (5,190
genes) than the BT-474 cells (15,583 genes), of which only 21 of 55
nodes were matched with the afatinib GREY-optimized DRCN (excluding the
two phenotypic nodes). In the case of BT-474, 45 of 55 nodes in the
afatinib DRCN were matched with the measured genes with a base mean
count over 100. Both in-house expression data and DRCN simulation data
are calculated in the form of log2-fold changes and min-max normalized.
The log2-fold change values of each gene of the in-house experimental
data and the average activity changes of each afatinib DRCN node
computed by the drug treatment simulation were min-max normalized and
compared. As the afatinib DRCN is an essential regulatory network of
intricately interconnected genes and proteins that collectively
influence drug response, it is imperative to validate the correlation
between changes in gene expression levels and the resulting shifts in
the activities of proteins within this network.[140]^54 Consequently,
the activity changes of proteins represented by nodes in the afatinib
DRCN can be compared and validated with the RNA expression fold changes
measured by our in-house experiments.
The insights into drug response differences between sensitive and
resistance cell lines based on the in-house data were then compared
with the predictions of GREY-optimized DRCNs. The optimized afatinib
DCRN was simulated, and the average change in node activity due to drug
perturbation was compared with the gene expression fold changes of the
experimental data. The computed average activity change of each node
was an average value of three simulations from three parameter sets
optimized by GREY or by a random agent, which had the same structure as
GREY but with randomly initialized weights. To check the consistency of
fold changes in each node computed by the optimized parameter sets from
GREY, we compared the results of 30 iterative afatinib treatment
simulations, each with ten repetitions and different initial states for
each of the three parameter sets, against the corresponding 30
simulation results computed by three parameter sets optimized by the
random agent. Genes with moderate fold changes in experimental data and
nodes with inconsistent fold changes, which depended on the parameter
sets or the initial state sets, were filtered according to log2-fold
changes and fold change variance over 30 simulations, respectively. As
a result, the fold change tendencies of 14 nodes simulated by the
parameters of GREY-optimized afatinib DRCN and mutational profiles of
BT-474 projected on the network were in accord with afatinib treatment
experiments. Whereas in the random agent-optimized parameters, only 5
of 42 results matched the experiments. In BT-20, 7 of the simulated
node fold changes optimized by GREY showed the same tendency as the
experimental data, while only 1 node fold change optimized by the
random agent was well in accord with the experiments ([141]Figure 5E).
The correlation between the simulated node fold changes optimized by
GREY and the experimental data is further validated with a statistical
test and compared with the data of randomly shuffled node labels
([142]Figure S4).
The major difference between the networks with parameters optimized by
GREY and the random agent was that the former depends both on network
topology and logic, whereas the latter depends on only network
topology. Although random agent-optimized parameters were able to
correctly predict some of the node fold changes, the simulated network
converged into a single attractor. Note that the iterative fold change
results of phenotypic nodes and STAT1 (downstream node of EGFR)
simulated by GREY-optimized afatinib DRCNs were more spread out,
indicating various attractors, while those simulated by random
agent-optimized afatinib DRCNs were more concentrated in a few rescaled
fold change intervals, indicating a single attractor (histograms in
[143]Figure 5E). The node states of the converged attractor with random
agent-optimized afatinib DRCNs were predictable by topologically
estimating the effect of the afatinib perturbation with the network
diffusion algorithm suggested by Hofree et al.[144]^55 The result
infers that the dynamics of the network are simple, lacking any complex
feedback or crosstalk dynamics, and hence inadequate for reproducing
the behavior of a biological network. Moreover, the single attractor
converged to by each of the optimized afatinib DRCNs using the random
agent differed depending on the parameter sets. This implied that with
additional optimized parameter sets, the simulated network dynamics
would not converge into a single distribution. As a result, increasing
the number of parameters sets cannot guarantee prediction performance
improvement.
In stark contrast, the networks with the parameters optimized by GREY
converged into a higher number of attractors, corresponding to varied
regulatory dynamics, which exhibited similar phenotypic predictions.
These results suggest that the networks optimized by GREY comprise not
only topological information, but also dynamic properties including
feedbacks and crosstalk, which are known to be important features to
understand in order to overcome drug resistance mechanisms in cancer
therapy.[145]^56 Moreover, even though average responses of GREY and
the random agent-optimized DRCNs are similar, GREY-optimized networks
can be said to more properly recapitulate the biological nature of
perturbation responses of regulatory networks. Biology regulatory
networks exhibit stochasticity in their gene expressions, molecular
activities, and probability of chemical reactions, which tend to
produce a range of distinct
attractors.[146]^57^,[147]^58^,[148]^59^,[149]^60 The multiple
attractors exhibited by GREY-optimized DRCNs are hence more realistic.
Moreover, the distributions of attractors from the network optimized by
different parameter sets from GREY were more highly clustered than
those of the random agents. This illustrates that the prediction
performance of the drug-response networks optimized by GREY will not be
derailed by a larger number of parameter sets. Thus, the networks
optimized by GREY are more suitable than the networks optimized by the
random agent for the purpose of cancer drug-response predictions.
Interpreting regulatory mechanisms of drug resistance with the gray box
framework
After establishing the ability of GREY to infer regulatory dynamics, we
explored if those cellular-level insights could be interpreted to
reveal underlying mechanisms of drug resistance. We analyzed the
simulated responses of BT-20 and BT-474 cells to discover network
motifs behind the differences in afatinib response. We calculated the
difference between the average activity fold change of afatinib-treated
BT-20 and that of BT-474 by subtracting fold change values of the
BT-474 afatinib core network from that of the BT-20 afatinib core
network ([150]Figure 6A). Note that some nodes had significantly
different fold change values. With an assumption that the nodes with
significant difference in fold change between the sensitive and the
resistant cell lines cause the difference in afatinib responses, we
selected the five nodes with the largest positive difference (PTPN6,
AR, PDPK1, IGF1R, and SGK1, in decreasing order of absolute values) and
five with the largest negative difference (MAP3K3, LCK, NFKBIA, PPARA,
and STK3, in decreasing order of absolute values) in fold changes of
simulated node activities.
Figure 6.
[151]Figure 6
[152]Open in a new tab
Mechanistic analysis of afatinib treatment to identify the most
influential nodes for predicting and controlling drug resistance
(A) Afatinib DRCN. The color of each node represents the predicted
relative activity changes of resistant and sensitive cell line models,
and the color of the arrows denotes whether the signal flow increases
(red) or decreases (blue) after afatinib treatment.
(B) Activity difference sub-network for afatinib response from BT-20
and BT-474 cells. This sub-network was extracted from the afatinib
response network in (A).
(C) Identification of context-dependent flows of afatinib treatment
effects toward the proliferation node. The propagated mutational
profile of BT-20 on the activity difference sub-network contained two
nodes with activating mutations (EGFR and PTPN6), and three inputs from
mutations outside this sub-network that activated (NGKB1, SYK, and AR),
and one that is inactivated (PPARA). The propagated mutational profile
of BT-474 on the activity difference sub-network had three nodes with
activating mutations (EGFR, PPARA, and NFKB1), two nodes activated by
input from mutations outside this sub-network (STAT1 and ERBB2), and
one node inactivated by external mutation (CDK1). Signal flow and node
activity changes in activity difference sub-network of resistant cells
and sensitive cells upon perturbation of EGFR and ERBB2 (afatinib
treatment) are illustrated.
(D) The balance between mutational effects on two essential negative
feedback loops is primarily responsible for determining the resistance
to afatinib treatment.
To explore the interactions between these nodes in the context of drug
resistance, we extracted an activity difference sub-network from the
afatinib DRCN. This activity difference sub-network was formed by
connecting the selected ten nodes with links and inter-nodes by
PathLinker,[153]^61^,[154]^62 which explored up to 50 paths from the
drug target nodes (EGFR, ERBB2) to the phenotypic node (proliferation)
through the selected 10 nodes ([155]Figure 6B). We found that the ERBB2
cascade and AR-AKT1 negative feedback are activated in BT-20, whereas
the EGFR cascade and NFKBIA-NFKB1 negative feedback are activated in
BT-474 after afatinib treatment.
To interpret the mechanism underlying the different afatinib responses
with respect to EGFR and ERBB2 inhibition signals using the activity
difference sub-network, mutational effects and afatinib perturbation
signal flows were analyzed ([156]Figure 6C). Signal flows from a node
are defined by adding the incoming signal flow to the product of the
node activity change with the sign (positive or negative) of the
outgoing links.[157]^63 The effect of a mutation on a node inside the
activity difference sub-network was treated as fixing that node to a
state, such that incoming links were deleted and a constant signal
flows out from that node. The color of the arrows denotes whether the
signal flow increases (red) or decreases (blue) after afatinib
treatment. The effect of a mutation from outside the activity
difference sub-network was incorporated by propagating the change in
signal flow caused by a mutation through exclusive simple paths
computed by the maximum matching algorithm. The propagated mutational
profiles of two cell lines were incorporated into the sub-network as
the indirect influence of mutation on the sub-network nodes (see
[158]STAR Methods).
The activity-difference sub-network reveals how drug responses
differently affect each cell line. In BT-20, the decreasing signal
flows from the afatinib-inhibited EGFR node through the
STAT1-PPARA-NFKBIA cascade and the ESR1-AR cascade were offset by the
increasing signal flows into the NFKB1 node and the AR node,
respectively. The NFKB1 node and AR node were indirectly influenced by
mutations in nodes outside the afatinib sub-network, but within the
PCCN. The increasing signal from the afatinib-inhibited ERBB2 node
flows through the CDK1-AR cascade and is further enhanced by the
increasing signal flow at the indirectly activated node, AR, increasing
the proliferation node activity. The increasing signal flowing into SYK
from the mutations outside the afatinib DRCN also inactivated LCK,
which dampened the ability for afatinib-inhibited EGFR to decrease the
flow into STAT1 and NFKBIA. The net outcome of the inhibitory effect of
afatinib on EGFR and ERBB2 is predicted to be an increase in
proliferation node activity in BT-20.
In BT-474, the decreasing signal flow from the afatinib-inhibited EGFR
node into the STAT1 node is blocked by the PPARA mutation, but the
afatinib-inhibited EGFR decreases AR node activity by reducing the
signal flow through the ESR1-AR cascade. The increasing signal from the
afatinib-inhibited ERBB2 node is offset by decreasing the signal flow
from the CDK1 node, which is indirectly inhibited by mutations. As a
result, the activity of the AR node is decreased after afatinib
treatment, resulting in decreased activity of the proliferation node.
Together, the signal flow analyses of the activity difference
sub-networks predicted a reason why BT-20 and BT-474 have different
responses to afatinib treatment: the signal flows in the two cell lines
differently affect the NFKB1-NFKBIA and the AR-AKT1 negative feedback
regulations ([159]Figure 6D). The underlying reason for the resistance
to afatinib treatment for BT-20 can be summarized as the following:
the mutational profile activates AR-AKT negative feedback regulation
and inactivates NFKB1-NFKBIA negative feedback regulation.
The genetic marker for the NCI-MATCH afatinib basket trial was mutated
or amplified EGFR.[160]^64 Both BT-20 and BT-474 cells have
gain-of-function mutations in EGFR. Thus, using only that single
genetic marker, one would expect the same response to afatinib for
these two cell lines. On the other hand, our framework not only
predicted the different responses of these two cell lines but also
revealed their underlying mechanisms. The collection of genes involved
in the two relevant feedbacks, NFKB1-NFKBIA and AR-AKT1, may constitute
a new type of dynamic drug-response biomarker. A biomarker composed of
a group of genes and their holistic dynamics, which respond
differentially to drug treatment, may predict drug responsiveness more
effectively than the classical mutation biomarkers of afatinib. Owing
to the interpretable and simulatable afatinib DRCN, we can properly
predict the drug response, distinguishing resistant cell lines that
need additional treatment. Moreover, by interpreting the resistance
mechanisms, such as feedback and crosstalk, we can identify efficient
combinatorial drug targets to compensate for those resistances.
Additional validations for suggesting dynamic drug-response biomarkers
are in [161]Figure S5. Detailed signal flow analyses of
trametinib-treated SW620 and SW48 cell lines are discussed in
[162]Figure S6.
Discussion
Predicting cellular response to a perturbation is of great interest in
systems biology and ML.[163]^4^,[164]^9^,[165]^20^,[166]^46 Systems
biology uses a white box approach based on mechanistic modeling of
intracellular molecular regulatory networks, and can provide
interpretable insights into regulatory dynamics, but is difficult to
scale up to larger networks. ML uses a black box approach by modeling
input-output relationships, and can offer high prediction performance,
but it is difficult to interpret its underlying rationale. There are
several works[167]^14^,[168]^15^,[169]^20^,[170]^22^,[171]^47^,[172]^65
that have attempted to merge these approaches to make interpretable
predictions, but they fall short of achieving detailed insight into
molecular regulatory dynamics about cellular responses after a
perturbation. Therefore, to provide interpretations of molecular
regulation dynamics, we constructed a gray box framework by combining
the white box and the black box L2O approaches to model molecular
regulatory dynamics. We devised a black box optimizer, GREY, to tackle
the high computational complexity of parameter optimization for DRCNs.
GREY learned efficient derivative-free search rules by identifying the
relationship between the structure of a network and its dynamical
response. To accomplish this, we adopted a GNN[173]^44 to represent
attributes of the network components (i.e., nodes and edges), and used
meta-reinforcement learning to learn different dynamic responses to
perturbations for each network. Remarkably, GREY optimized the edge
weights of the DRCN for trametinib three times more efficiently than
the state-of-the-art derivative-free optimization method, CMA-ES, with
respect to sample efficiency. Moreover, GREY optimized the DRCN for
palbociclib to a fitness value of 0.5, which is difficult to achieve by
inferring the regulatory logic of a network of such scale (∼600 edges)
using any other optimization method. We applied GREY to optimize the
DRCNs of 21 drugs. Constructing a DRCN for each drug involves numerous
steps that influence the number of nodes and edges, and the optimal
settings may vary for each drug. However, we applied the same settings
to construct all DRCNs to allow for a systematic comparison of the
prediction performance of optimized DRCNs. Consequently, certain drugs
inevitably have a suboptimal DRCN, influencing our results.
Investigating the impact of drug-specific settings on the prediction
performance of DRCNs would be an intriguing avenue for further
research.
To demonstrate the efficacy of the gray box framework on recapitulating
cellular dynamics, we applied it to the drug-response predictions in
cancer networks. Our results have shown the broad applicability of the
framework with a wide range of cancer types and drug targets. As the
mechanistic interpretation of DRCNs is not detached from drug-response
prediction performance, a question whether the molecular mechanism
interpretation of afatinib drug responses is perfectly correct or
cannot arise. However, since the Boolean networks with optimized
weights not only fully reproduce the quantitative responses for various
cell lines of several drugs, but also reproduce the activity changes
after those drug treatments, we can assume that the regulatory logic
recapitulated by the weighted Boolean logic of the optimized DRCNs
accurately models the underlying molecular regulatory mechanism, at
least for the scope of drug response. Moreover, the optimized DRCNs not
only correctly predicted drug responses of BT-20 and BT-474 cell lines,
but also suggested a biomarker based on regulatory dynamics that was
directly related with drug-response mechanisms and thus more accurately
predict the resistance. This result is well in accord with the research
of Choi et al.,[174]^4 where classifying cancer cell lines with similar
changes in expression levels induced by drug treatment exhibits more
similar drug responses than classifying cancer cell lines with similar
mutation profiles. Therefore, we believe that the interpretability of
the model can provide insight into biological mechanisms, such as drug
resistance, which can subsequently help to overcome these hurdles in
real biological applications.
Cancer is a patient-specific as well as tissue-specific disease, but
there are recurrent regulatory architectures common to all cancer
types.[175]^66^,[176]^67 Based on these recurrent architectures, we
hypothesized the existence of a PCCN and developed a method for
reconstructing such a core regulatory network from large datasets.
Feltes et al.[177]^68 also tried to curate core regulatory networks
relevant to all cancer types.[178]^69^,[179]^70 However, these
approaches might contain indirect interactions as curated links depend
on the data. Due to the risk that few incorrect links in the network
structure can vastly change the dynamics of the model, we instead
focused on using a well-known network structure database such as
Omnipath and cross-checked every link in the PCCN with other network
databases: Pathway Commons, DoRothEA, and Cui’s human signaling
network.[180]^71^,[181]^72^,[182]^73 Each drug has a different mode of
action at the cellular level and thereby affects a different part of
the intracellular network. Thus, we extracted the core structure
responsible for each drug and optimized the parameters separately.
The gray box framework can be applicable to readily accessible
perturbation data, while enabling deep interpretation of molecular
regulatory dynamics. Single-cell sequencing of drug-induced cellular
state transitions provides detailed dynamic information of drug
responses. Using these data, a number of studies predict and interpret
the response dynamics of molecular regulatory
perturbations.[183]^74^,[184]^75 However, aside from the high financial
costs, producing such data is a difficult and time-consuming task that
requires experimentally well-prepared samples of cells undergoing
specific dynamics and properly selected pre-processing and analyzing
tools.[185]^76 Under these limitations, the most commonly accessible
data of molecular regulatory perturbations are drug responses of cancer
cell lines in terms of cell viabilities. It is notable that discrete
DRCNs optimized by GREY have successfully interpreted the
drug-resistance mechanisms of afatinib in terms of molecular regulatory
dynamics without explicit molecular regulatory experimental data, but
only with cell viability. Owing to the vastness of public drug-response
viability data in various diseases caused by malfunctioning cell
networks, the suggested framework can also be applicable to controlling
fates of pluripotent stem cells[186]^77^,[187]^78 or macrophage
differentiation.[188]^79 Moreover, prediction performance and
interpretation of molecular regulatory mechanisms of the developed gray
box framework can be expected to further improve if data on molecular
regulatory dynamics are also explicitly incorporated. Currently, GREY
is trained with scale-free random networks for the specific task of
optimizing biological networks, although many biological networks are
not scale free.[189]^80 Thus, there is room for improving the
optimization performance of GREY by training it with more biologically
plausible random networks. Furthermore, in the future, GREY could be
trained with other types of random networks with different structural
characteristics to optimize various discrete dynamic models.
Limitations of the study
The DRCN formulates an "essential structure" connecting biological
molecules tied to drug responses in the PCCN, aiming to minimize
computational burdens of the optimizer by limiting number of links
extracted between molecules in the PCCN. While this method efficiently
identifies core network structures among molecules associated with drug
responses, it overlooks long-distance bypass regulations involving
multiple connections. Such pathways, despite potentially lower chances
of affecting drug response dynamics, are critical but might be missed
by our methodology. Dynamics influenced by these bypasses tend to be
embedded through optimized parameters in shorter pathways among the
selected molecules, omitting a few significant long-distance
interactions. We utilized GREY to optimize DRCNs for 21 drugs. However,
the prediction performance for certain drugs did not reach
significance. Although constructing each DRCN involves varying optimal
settings, we employed uniform settings for systematic
performance comparison, potentially leading to suboptimal DRCNs for
some drugs. In addition, GREY does not incorporate regularization in
its optimization procedure for DRCNs. Incorporating regularization in
future studies could enhance performance.
STAR★Methods
Key resources table
REAGENT or RESOURCE SOURCE IDENTIFIER
Chemicals, peptides, and recombinant proteins
__________________________________________________________________
Trametinib APExBIO Cat#A3018
Afatinib Selleckchem Cat#S1011
Palbociclib Selleckchem Cat#S4482
__________________________________________________________________
Critical commercial assays
__________________________________________________________________
RNA-spin kit INTRON Cat#17211
QuantSeq 3′ mRNA-Seq Library Prep Kit Lexogen Cat#015.24
__________________________________________________________________
Deposited data
__________________________________________________________________
Gene expression data This paper [190]GSE184731
Harmonizome Rouillard et al.[191]^81 SCR_016176
Kyoto Encyclopedia of Genes and Genomes Kanehisa et al.[192]^49
SCR_012773
Genomics of Drug Sensitivity in Cancer 2 Iorio et al.[193]^35
SCR_011956
Catalog Of Somatic Mutations In Cancer Tate et al.[194]^39 SCR_002260
Pathway Commons Cerami et al.[195]^72 SCR_002103
__________________________________________________________________
Experimental models: Cell lines
__________________________________________________________________
SW48 ATCC CCL-231
SW620 ATCC CCL-227
BT20 Korean Cell Line Bank(KCLB) KCLB #: 60061
NCI-N87 Korean Cell Line Bank(KCLB) KCLB #: 60113
MKN45 Korean Cell Line Bank(KCLB) KCLB #: 80103
BT474 Korean Cell Line Bank(KCLB) KCLB #: 60062
__________________________________________________________________
Software and algorithms
__________________________________________________________________
GRAY This paper [196]https://github.com/hanyh0807/GREY
[197]https://doi.org/10.5281/zenodo.10989715
Tensorflow version 1.15.5 Google [198]https://www.tensorflow.org/
Graph_nets python package Peter et al.[199]^39^,[200]^44
[201]https://github.com/deepmind/graph_nets/tree/master
SRMF Wang et al.[202]^46 [203]https://github.com/linwang1982/SRMF
DeepCDR Liu et al.[204]^10 [205]https://github.com/kimmo1019/DeepCDR
LOBICO Knijnenburg et al.[206]^47
[207]https://github.com/TianxiaoNYU/LOBICO-Matlab
MATLAB R2019b MathWorks [208]https://kr.mathworks.com/
pycma v3.1.0 Nikolaus et al.[209]^45
[210]https://github.com/CMA-ES/pycma
STAR Dobin et al.[211]^82 [212]https://github.com/alexdobin/STAR
DESeq2 Love et al.[213]^83
[214]http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html
Signal flow analysis Lee et al.[215]^63
[216]https://github.com/dwgoon/sfa
GDSCTools Cokelaer et al.[217]^84
[218]http://github.com/CancerRxGene/gdsctools
Hierarchical HotNet Reyna et al.[219]^85
[220]https://github.com/raphael-group/hierarchical-hotnet
The PathLinker app Gil et al.[221]^62
[222]https://apps.cytoscape.org/apps/pathlinker
[223]Open in a new tab
Resource availability
Lead contact
Further information and requests for resources and reagents should be
directed to and will be fulfilled by the lead contact, Kwang-Hyun Cho
(ckh@kaist.ac.kr)
Materials availability
This study did not generate new unique reagents.
Data and code availability
* •
Gene expression data have been deposited at GEO and are publicly
available as of the date of publication. Accession number is listed
in the [224]key resources table.
* •
All original code has been deposited at GitHub and is publicly
available as of the date of publication. The GitHub repository and
DOI are listed in the [225]key resource table.
* •
Any additional information required to reanalyze the data reported
in this paper is available from the [226]lead contact upon request.
Experimental model and study participant details
Setup for in-house experimental validations
The relevant cell cultures, SW48 (Colon), SW620 (Colon), BT20 (Breast),
NCI-N87 (Breast), MKN45 (Stomach), and BT474 (Stomach), were obtained
from Korean Cell Line Bank and the American Type Culture Collection
(ATCC). All cell lines were cultured in Dulbecco’s modified Eagle’s
medium (WelGENE Inc., Gyeongsan, Republic of Korea) with 10% fetal
bovine serum (FBS, WelGENE Inc.) and antibiotics (100 U/ml of
penicillin, 100 μg/mL streptomycin, and 0.25 μg/mL of Fungizone) (Life
Technologies Corp., Carlsbad, CA) at 37°C in a humidified atmosphere
containing 5% CO2.
To obtain reagents, trametinib (an inhibitor of MAP2K1 and MAP2K2, also
known as MEK1 and MEK2, respectively) was purchased from APExBIO
(Boston, MA, USA). afatinib (an inhibitor of EGFR and ERBB2) and
palbociclib (an inhibitor of CDK4 and CDK6) were obtained from
Selleckchem (Houston, TX, USA).
To conduct cell growth assays, cells were seeded into a 96-well plate
at a density of 3–7×103 cells/well in growth medium, incubated for 24 h
and then treated with the indicated drugs. Following incubation of the
plates for 72–120 h, relative cell viability was measured. After
seeding, cells were imaged using IncuCyte ZOOM. To assess cell growth,
the average area of each cell was determined at each time point using
the IncuCyte ZOOM analysis software. Images were captured at 3 h
intervals from three separate regions per well with a 20× objective.
To conduct RNA isolation and bulk RNA-seq analysis, total RNA was
extracted using an RNA-spin kit (INTRON), and RNA quality was assessed
by Agilent 2100 bioanalyzer using the RNA 6000 Nano Chip (Agilent
Technologies). RNA quantification was performed using ND-2000
Spectrophotometer (Thermo, Inc.), and 500 ng of RNA per sample were
used for library construction. The library construction was performed
using the QuantSeq 3′ mRNA-Seq Library Prep Kit (Lexogen, Inc.), and
the library was single-end sequenced on an Illumina NextSeq 500
(Illumina, Inc.). The reads were aligned using STAR,[227]^82 and the
read counts were estimated by featureCounts.
Differentially expressed gene (DEG) analysis was performed by using
significance analysis of microarrays method implemented in the DESeq2
package. We calculated the differentially expressed genes in log2-fold
changes of expression levels measured in both sensitive and resistant
cell lines for each type of cancer (colon, breast and stomach) using
the DESeq2[228]^83 R package. The cutoff values were set with the
‘adjusted p-value’ under 0.05 and the base mean over 100.
Method details
Constructing DRCNs
To efficiently elucidate the construction of the Pan-Cancer Core
Network (PCCN) and the Drug Response Core Network (DRCN), we integrated
a comprehensive list of cancer-related genes from multiple
high-throughput data sources. These included oncogenic genes and
compound targets from ten major signaling pathways identified in The
Cancer Genome Atlas[229]^66 (TCGA), such as RTK/RAS, Nrf2, PI3K, TGFβ,
Wnt, Myc, p53, Cell cycle, Notch and Hippo. Additionally, we
incorporated frequently amplified or deleted mutations,[230]^71 the
NCI-MATCH basket trial markers,[231]^38 notably co-occurring or
mutually exclusive mutations,[232]^86 and genes from ‘Pathways in
cancer’ (hsa05200) of the KEGG pathway.[233]^49^,[234]^50^,[235]^51
Through these integrations, we identified a total of 2,822 pan-cancer
genes.
Using the PathLinker algorithm,[236]^61^,[237]^62 these pan-cancer
genes were each used as a source node to connect to all other genes,
employing interconnecting pathways derived from the Omnipath network
structure databse.[238]^36 The connected pathways, comprising 3,615
nodes after exclusion of paths with overlapping sources and targets,
represented comprehensive interconnections among the pan-cancer related
genes. To refine this network and extract the core structure, we pruned
leaf nodes and redundant inter-nodes, smoothing the network using a
network propagation algorithm,[239]^87 and eventually extracted a core
structure with 2,574 nodes and 5,992 links.
For the construction of the DRCN, we applied hierarchical
HotNet[240]^85 and Edmonds’ algorithm[241]^88 to extract a core network
that captures the essential genes and pathways involved in drug
responses. This involved gathering genomic features such as mutations,
copy number alterations, and expression levels of genes from various
databases including the Cell Model Passport,[242]^89 OncoKB, and
COSMIC. These features were used to compute z-scores in analysis of
variance (ANOVA) of the GDSCTools[243]^84 package, which then served as
vertices in the hierarchical HotNet algorithm, identifying the most
statistically significant clusters of genes. We connected these
clusters using Edmonds’ algorithm with link weights derived from
inverse values of joint similarity scores calculated through HotNet.
This process culminated in a DRCN that minimized network size while
retaining critical drug response-related genes and pathways, further
enriched with phenotypic markers such as ’proliferation’ and
‘apoptosis’ nodes from the CancerGeneNet.[244]^37
To extract a core structure from the pan-cancer related network, we
took the following four steps: first, the nodes that only have incoming
links or outgoing links were pruned. Second, pan-cancer related genes
were smoothed across the network by the network propagation algorithm
shown in the [245]Equation 3 suggested by Vanunu et al.[246]^90
[MATH: Pt+1=(1−r)WPt+rP0
:MATH]
(Equation 3)
Where
[MATH: Pt :MATH]
is a probability vector at time step
[MATH: t :MATH]
in which the
[MATH: i :MATH]
-th element holds the probability of a random walker that will be
placed at that time step. The initial probability vector,
[MATH: P0 :MATH]
, is a vector with a probability of 1 for the cancer related genes and
0 for the other genes, which were normalized by a number of pan-cancer
related genes, such that the vector sums to 1.
[MATH: W :MATH]
is a column-normalized adjacency matrix of the pan-cancer related
network.
[MATH: r :MATH]
is a probability of restarting from the pan-cancer related genes at
every time step, which was set as 0.3 in our extraction. Third, an
iterative random walker with a limited number of transitions walked
around the network. The number of iterations was set to 10,000 and the
maximum number of transitions was set to 30. Fourth, the pan-cancer
core network was constructed by selecting a sub-network that was
visited by a larger number of walkers than a specific cutoff value,
which was set to 12. As a result, the pan-cancer core network was
constructed with 2,574 nodes and 5,992 links.
Weighted sum Boolean logics
To simulate drug responses of a given DRCN, we adopted “weighted sum
Boolean logic” by modifying the simple majority rule suggested by
Font-Clos et al.[247]^91 A state of a node
[MATH: k :MATH]
in a network with
[MATH: N :MATH]
nodes,
[MATH: sk :MATH]
, can be either 1 (activated) or −1 (inactivated). The state of a
network at a discrete time point,
[MATH: t :MATH]
, can be defined as a state vector,
[MATH: Vst
:MATH]
, which is composed of the
[MATH: N :MATH]
states of its nodes, such as
[MATH: skt
:MATH]
, at a certain time point. The tendency of each node state without any
incoming signals can be represented as a basal level vector,
[MATH: Vb :MATH]
, which usually has a value of zero. Relative influences of each
incoming link to a node can be a value in a specific range, which
includes integers between −5 and 5 in our work. Positive numbers
represent activating regulations and negative numbers refer to
inhibitory regulations. A relation matrix can be defined as a
[MATH: N :MATH]
-by-
[MATH: N :MATH]
connectivity matrix of the network, wherein each element
[MATH: (MR)ij
mrow> :MATH]
represents the relative influence of a node
[MATH: j :MATH]
on a node
[MATH: i :MATH]
. The network state at the next time point,
[MATH:
Vst+1
:MATH]
, can be decided by the sign of element-wise multiplication of a
weighted sum vector,
[MATH: Vw :MATH]
, computed by the matrix operation shown in [248]Equation 4. If some
elements of the weighted sum vector are zero, the next state of the
corresponding nodes follows the current state of each node. Network
nodes are synchronously updated at each discrete time point:
[MATH: Vst+1=sign(Vw)=sign(MR×Vst+Vb) :MATH]
(Equation 4)
The attractors, converged steady states of the network, can represent
the biological phenotypes by analyzing average states of each node
within the attractor states.[249]^92^,[250]^93 Average activity of each
molecule represented by a node state of an attractor is computed as
suggested by Kim et al.[251]^94 The average state of each node is
computed as the average of the node state within a specific attractor
multiplied by the proportional ratio of initial states which converged
into that attractor. If an attractor does not converge into a single
state and cycles a set of states, the average state of each node in
that attractor is computed as a sum of node states in the cycle divided
by the period of the cycle. The converging ratio of each attractor is
approximately measured by repeated enumeration of the network state
starting from a random initial network state to a particular attractor.
Synthetic environments for training GRAY
We designed a synthetic environment to train GRAY for Boolean network
optimization through meta-reinforcement learning. A randomly generated
scale-free network[252]^95 of 20∼40 nodes, with edge weights varying
within [-5,5] except 0, were assigned to each environment. We generated
100,000 synthetic environments. At the start of each episode, one of
100,000 environments was randomly selected, and its virtual
perturbation data was generated. The virtual perturbation data contains
the fold change of the average activity for each node between the first
attractors that were reached from 128 random initial states, and the
second attractors that were reached starting from the first attractor
after fixing several random nodes to 1 or -1 to imitate a drug
treatment. By varying the fixed nodes to compute the second attractor,
16 ground truth datasets were generated for each environment. The
fitness is defined as the average of the Pearson’s correlation
coefficients (PCC) between the ground truth datasets and the fold
change values computed by applying the edge weights (
[MATH: At :MATH]
, obtained edge weights at step
[MATH: t :MATH]
) obtained from GRAY onto the network. The fixed node sets used to
compute fold change values for
[MATH: At :MATH]
were the same as the ground truth datasets. The reward was given as the
observed improvement[253]^33 as follows:
[MATH: Rt=max{ft−
maxi<t(fi),0} :MATH]
(Equation 5)
Here,
[MATH: ft :MATH]
is the fitness and
[MATH: Rt :MATH]
is the reward at step
[MATH: t :MATH]
. The goal of GRAY against a given environment is to maximize the
discounted cumulative sum of reward.
Neural network structure of GRAY
GRAY consists of three components: the input, the submodule, and the
main layer ([254]Figure S1A). The input layer has a graph module for
embedding networks and a message module for communications between
agents. The graph module consists of two graph neural networks (GNN),
which is a class of neural networks tailored for processing input
represented as graphs. The first (
[MATH: G :MATH]
) is for the main layer and the global state estimator of the submodule
layer, and the second (
[MATH: GC :MATH]
) is for the context estimator of the submodule layer. The message
module (
[MATH: Gm :MATH]
) is also a GNN and is used for all modules in both the submodule and
the main layer.
In each step, GNNs in the graph module receives eight centrality
measures for nodes (betweenness,[255]^96 degree,[256]^97
eigenvector,[257]^98 katz,[258]^99 closeness,[259]^100 load,[260]^101
harmonic,[261]^102 pagerank[262]^103) and three values for edges
(weight, sign of each edge in the network of the environment, and edge
betweenness centrality[263]^96) as input, and updates nodes, edges, and
the global feature by iteratively aggregating information from
neighboring nodes, edges, and the entire graph. The weight in the input
for the edge is from the output of the preceding step. The global
feature represents a graph-level attribute. The message module
initially receives a zero-vector input for all nodes and edges at the
beginning of an episode, and generates nodes, edges, and the global
feature. From the second step, the message module receives nodes,
edges, global features, and edge weights generated from the preceding
step as input, and updates its features. The outputs of the GNNs in the
graph module and the message module are reorganized for each edge by
concatenating the global feature, the edge feature, and the features of
sender and receiver nodes of each edge ([264]Figure 3D). Each
reorganized feature represents the state of each edge. We use this as
the feature vector to be fed into the subsequent modules. The
reorganized feature of the ith edge from each of the three GNNs (
[MATH:
G,GC,Gm :MATH]
) are as follows:
[MATH: Li=(ni_<
mi>s,ni_<
/mo>t,li
,g),LiC=(ni_sC,ni_tC,<
msubsup>liC,g
mi>C),Lim=(ni_sm,ni_tm,<
msubsup>lim,g
mi>m)
:MATH]
(Equation 6)
[MATH: L=L1:k,LC=L1:kC,Lm<
/mi>=L1:km :MATH]
(Equation 7)
Here,
[MATH:
Li,LiC,Lim
:MATH]
are the reorganized features of i^th edge from
[MATH:
G,GC,Gm :MATH]
respectively.
[MATH:
ni_s,ni_t<
/msub> :MATH]
are the node features of the source and target node of i^th edge,
respectively.
[MATH: li :MATH]
is the edge feature of i^th edge.
[MATH: g :MATH]
is the global feature of a network.
[MATH:
L,LC,Lm :MATH]
are the reorganized features for all edges (1 to
[MATH: k :MATH]
) in the network from each module,
[MATH:
G,GC,Gm :MATH]
, respectively. Features without any superscript are from
[MATH: G :MATH]
, and features with superscript C and m are from
[MATH: GC :MATH]
and
[MATH: Gm :MATH]
, respectively. Modules in the submodule and main layer receive the
reorganized features as an input ([265]Figure S1A).
A submodule layer consists of a context estimator module and a global
state estimator module. Both modules are one-layer LSTMs with layer
normalization.[266]^104 In each module, there is one LSTM for each
reorganized feature (i.e., one LSTM for each edge). LSTMs in each
module share their parameters (weights of LSTM) but maintain
independent hidden states. The context estimator is a module that helps
the agent to quickly adapt to the selected environment by estimating
information about the environment.[267]^105 During training, the true
edge weights of the synthetic network from the selected environment in
each episode are known. Thus, we can use the true edge weights as a
target for the context estimator.
[MATH: LC :MATH]
,
[MATH: Lm :MATH]
, and the edge weight and the reward from the preceding step are set as
inputs to the LSTM, and the 11-dimensional logits (edge weights are
between −5 and 5) are generated through two dense layers with
exponential linear unit (ELU) activation functions.[268]^106 Supervised
training is done using cross entropy loss with respect to the true edge
weight of the network from the selected environment in each episode.
The global state estimator is a module used to provide information that
can help learn cooperative policy in order to encourage the agents to
cooperate and perform efficient exploration in high-dimensional search
space, inspired by Foerster et al.,[269]^107 Das et al.,[270]^108 and
Kim et al.[271]^109 Since the output (edge weight) of an agent (main
layer) for each edge collectively determines the fitness, the output of
all the agents needs to be well-organized. We trained the communication
protocol to gather
[MATH: L :MATH]
,
[MATH: Lm :MATH]
, and the edge weight of the current step into the global state
estimator module, and then predict the feature (
[MATH: L :MATH]
) of the next step. Since the input for the subsequent step is
determined by the outputs of all agents, it is necessary to infer the
plans of other agents in order to predict the features of the
subsequent step. Therefore, the global state estimator provides
inferred information of the outputs of the other agents. The global
state estimator module predicts the average and standard deviation for
the features of the subsequent step and is trained to increase the log
probability for the actual value of that feature through supervised
training.
In the main layer (agent),
[MATH: L :MATH]
,
[MATH: Lm :MATH]
, the edge weight and the reward from the preceding step, and the
activations of the second-to-last layer of the two submodules are set
as inputs to a two-layer LSTMs with layer normalization. Similar to the
submodule layer, there is a two-layer LSTM for each reorganized
feature. LSTMs in this layer share parameters, while maintaining
independent hidden states. Output and value are generated through an
independent dense layer with an ELU activation function. The output is
11-dimensional logits (to represent −5 to 5) which estimates the next
weight of an edge assigned to the agent, and the value is an expected
discounted reward of the current input.
Training procedure
To train a derivative-free search rule for optimizing the edge weights
of Boolean networks, GRAY uses meta-reinforcement
learning.[272]^32^,[273]^42 GRAY is trained on various synthetic
environments, where the environment involves optimizing edge weights of
random synthetic networks for ground truth datasets. At the beginning
of each episode, an environment is sampled out of 100,000 synthetic
environments and fixed until the end of the episode. Each episode
consists of 80 steps. Hidden states of all LSTM in GRAY are set to the
zero vector at the beginning of the episode. The initial state (
[MATH: S1 :MATH]
) of the episode is a combination of a random parameter vector (
[MATH: A0 :MATH]
) and zero (
[MATH: R0 :MATH]
). The sign of each parameter in
[MATH: A0 :MATH]
is kept the same as that of the corresponding link in the network of
the selected task. All episodes have 80 steps and each iteration of the
input and output processes of an episode is saved to episode buffer
when the episode is finished.
GRAY is trained using 10 workers and one master, where each worker
spends 8,000 episodes in the supervised pretraining phase and 12,000
episodes in the reinforcement learning phase. Each worker independently
collects episodes and computes gradients. The master collects gradients
from each worker and then averages them. The averaged gradients are
applied to the neural network of the master, and the updated parameters
are redistributed to each worker.
The training procedure for the main layer consists of two phases: a
supervised pre-training phase based on directly predicting the true
edge weights of the selected environment and a reinforcement learning
phase for maximizing the discounted cumulative sum of rewards using the
distributed proximal policy optimization (DPPO) algorithm.[274]^110 In
the supervised pretraining phase, we randomly replaced 50% elements of
the new parameters with the true parameters that have a probability of
0.05 for each step. Whenever 64 episodes are collected, GRAY is trained
using Adam on 4 minibatch samples (4 episodes) for one or three epochs,
and the buffer is cleared. In the supervised pretraining phase, all
neural networks are trained using supervised learning. However, the
parameters of G^C and the submodule layer are updated for three epochs
for every 64 episodes and the remaining parameters (G, G^m, main layer)
are updated for one epoch for every 64 episodes. In the reinforcement
learning phase, G^C and submodule layer are trained using supervised
learning for one epoch for every 64 episodes. Moreover, the remaining
parameters (G, G^m, main layer) are updated using the DPPO algorithm
for three epochs for every 64 episodes. Detailed hyperparameters are
summarized in [275]Table S2.
Selecting six cell lines for in-house experimental validations
To validate the molecular-scale interpretability of the DRCNs
constructed by the gray box framework ensuring that the DRCNs can
effectively capture significant drug response dynamics across a diverse
range of cancer cell lines, with an affordable number of experimental
validations, we have designed experiments by selecting specific cell
lines based on a set of well-defined criteria that allows us to
represent the broader landscape of drug response dynamics in pan-cancer
settings.
Our selection criteria was carefully designed to ensure that the chosen
cell lines could serve as representative samples for validating the
predictive capabilities of our DRCNs. These criteria include the
following four considerations. First, we selected three of the most
commonly occurring cancer types—colon, breast, and stomach cancers.
Each of these cancer types has unique gene families associated with
them, which makes each cancer type having distinctive molecular
machinery related with drug responses to others. Second, we chose three
representative U.S. Food and Drug Administration (FDA)-approved
targeted cancer drugs—afatinib (EGF signaling), trametinib (MEK
signaling), and palbociclib (CDK signaling). These drugs act on
distinct signaling pathways that are most commonly altered in various
cancers, including our three selected types. Third, we selected two
groups of cell lines corresponding to those ranked in the top 15 for
sensitivity and resistance, respectively, to each of the three selected
drugs. These rankings were based on the area under the dose-response
curve (AUC) values from the GDSC2 database. Fourth, among the selected
sensitive and resistant cell lines, we chose one cell line from each
group that shared the same genetic markers as suggested by the
Molecular Analysis for Therapy Choice (MATCH) basket trials conducted
by the National Cancer Institute (NCI-MATCH).
As a result, we have used the mutational drug response markers from the
EAY131 NCI-MATCH screening trial as follows: EGFR activating mutations
and ERBB2 activating mutations for afatinib, BRAF fusions, BRAF V600E
or V600K mutations, GNAQ or GNA11 mutations and NF1 mutations for
trametinib, CCND1 or CCND2 or CCND3 amplifications, CDK4 or CDK6
amplifications for Palbociclib. For the six selected cancer cell lines,
both BT-20 and BT-474 have the same mutation profiles for EGFR and
ERBB2 (mutated EGFRs and not mutated ERBB2s), while they show different
afatinib responses (sensitive BT-474 and resistant BT-20). Both NCI-87
and MKN45 have the same mutation profiles of no EGFR mutations, while
they show different afatinib responses (sensitive NCI-N87 and resistant
MKN45). Both SW48 and SW620 have the same mutation profiles of no ERBB2
mutations, while they show different afatinib responses (sensitive SW48
and resistant SW620). Both SW48 and SW620 have no mutations on BRAF,
GNA11, GNAQ and NF1, while they show different Trametinib responses
(sensitive SW48 and resistant SW620). Both SW48 and SW620 have no
mutations on CCND1, CDK4 and CDK6, while they show different
Palbociclib responses (sensitive SW48 and resistant SW620). This
approach allowed us to address the challenge of classifying drug
response differences and interpret the resulting molecular dynamics,
which remain obfuscated by classical marker-based methods. By adhering
to these selection criteria, we aimed to ensure that our experiments
could effectively represent and validate the predictive capabilities of
the DRCNs optimized by the gray box framework. These experiments serve
as representative examples of how our framework can predict drug
responses at the molecular level across diverse cancer types and
targeted drug categories, surpassing the predictions made by
traditional marker-based methods.
In summary, we have been able to demonstrate the applicability of the
gray box framework by strategically selecting representative drugs and
cell lines for validation. This approach was chosen to avoid the need
for experimental validation across all 300 cell lines used in our
training set, as it would not provide additional insights beyond what
has already been achieved with our carefully designed experiments.
Integrating cell line mutation profiles with functional effect
The mutational profile of each cell line is curated from the
preprocessed COSMIC database, which was processed by Béal
et al.[276]^111 The functional effect of each mutation is decided by
comparing oncogenic and tumor suppressive q-values suggested by Tokheim
et al.[277]^40 If the oncogenic q-value of a specific gene is smaller
than the tumor suppressive q-value, then the functional effect of the
mutation on that gene is considered oncogenic. Therefore, the
functional effect of the mutation on the gene is treated as a gain of
function. In the case that the tumor suppressive q-value is smaller
than the oncogenic q-value, the functional effect of the mutation on
the gene is treated as a loss of function. These functional effects of
mutations are then cross-validated with functionally associated genes
in Harmonizome.[278]^81
Maximum matching-based mutation data projection
Several mutated genes of cancer cell lines exist within the pan-cancer
network but not in the DRCN. Thus, we needed to approximate the effects
of each mutation on the DRCN by leveraging structural information. Hu
et al.[279]^112^,[280]^113 defined the structural control configuration
(SCC) of a network model to approximate the effects of certain drugs on
the downstream genes. We employed this method to approximate the
effects of mutations. An SCC of a network model is a maximally matched
graph within the network, although there exist many different SCCs
within a large network. SCC consists of elementary paths, which are
cascades connected by maximum links, and elementary cycles, which are
cycles connected by maximum links. Additional links indicate those that
connect elementary paths and elementary cycles. We generated 1,000 SCCs
until the proportion of new maximum matching links reached zero. Next,
we extracted all possible simple paths consisting of maximum matching
links and additional links from mutated genes to other genes within the
DRCN of each SCC. We utilized the sign information of each link from
the paths among the mutated genes as well as the genes within the DRCN
to obtain the effects of each mutation on its downstream genes.
Incorporating mutations and drugs onto the simulation
During simulations of drug responses of the core network, the effect of
a mutation on a node inside the core network was reflected by fixing
the state of the corresponding node to either 1 or -1 for a gain of
function or a loss of function mutation, respectively. The effects of
mutations on a node outside of the core network were projected into the
core network using maximum matching. These effects were then applied
onto their corresponding elements of the basal level vector
[MATH: Vb :MATH]
, 1 or −1 for a gain of function or a loss of function mutation,
respectively. To estimate the inhibition effect of drugs on the core
network, we applied a two-step drug response simulation. First, the
network converged into attractors with the given mutation profile.
Second, the network converged again from the prior attractor states to
new attractors while the drug target nodes were additionally fixed to
−1. If some of the prior attractor states were cycles, the first state
of each cycle was treated as the prior attractor state in the second
step of the drug response simulation. The difference between the
average activity of the phenotype node in the two states (the new
attractor and the prior attractor) were compared with the measured area
under the viability curve (AUC) value collected from GDSC2.
Signal flow analysis of DRCN perturbations
Signal flows after the DRCN perturbations are analyzed by an algorithm
suggested by Lee et al.[281]^63 Signal flows from a node are defined by
adding the incoming signal flow to the product of the node activity
change with the sign (positive or negative) of the outgoing links.
Effects of mutations on signal flows are incorporated differently
depending on whether the mutations were on a node inside the activity
difference sub-network or not. The effect of a mutation on a node
inside the activity difference sub-network was treated as pinning that
node to a state, such that incoming links were deleted and a constant
signal flows out from that node. The effect of a mutation from outside
the activity difference sub-network was incorporated by propagating the
change in signal flow caused by a mutation through exclusive simple
paths computed by the maximum matching algorithm.
Quantification and statistical analysis
For comparison between two groups, two-sided Student’s t test was used.
All pathway enrichment analyses were conducted using WebGestalt[282]^48
and KEGG.[283]^49^,[284]^50^,[285]^51 ANOVA was conducted using
GDSCTools.[286]^84 Statistical details can be found in the respective
figure legends, results, and the [287]STAR Methods section.
Acknowledgments