Abstract

   High-throughput screening of drug sensitivity of cancer cell lines
   (CCLs) holds the potential to unlock anti-tumor therapies. In this
   study, we leverage such datasets to predict drug response using cell
   line transcriptomics, focusing on models’ interpretability and
   deployment on patients’ data. We use large language models (LLMs) to
   match drug to mechanisms of action (MOA)-related pathways. Genes
   crucial for prediction are enriched in drug-MOAs, suggesting that our
   models learn the molecular determinants of response. Furthermore, by
   using only LLM-curated, MOA-genes, we enhance the predictive accuracy
   of our models. To enhance translatability, we align RNAseq data from
   CCLs, used for training, to those from patient samples, used for
   inference. We validated our approach on TCGA samples, where patients’
   best scoring drugs match those prescribed for their cancer type. We
   further predict and experimentally validate effective drugs for the
   patients of two highly lethal solid tumors, i.e., pancreatic cancer and
   glioblastoma.

   Subject terms: Machine learning, Predictive medicine, Targeted
   therapies, Cancer genomics
     __________________________________________________________________

   Potential anti-tumor therapies remain to be discovered in cancer cell
   line high-throughput screening datasets. Here, the authors develop a
   machine learning approach to infer cancer cell drug sensitivity from
   transcriptomics data and to explore drug mechanisms of action, and
   predict effective drugs for pancreatic cancer and glioblastoma.

Introduction

   Landmark cancer genomic projects such as The Cancer Genome Atlas (TCGA)
   have provided an unprecedented, multimodal picture of the complex
   genetic and molecular landscape characterizing major tumor types^[64]1.
   The possibility to define tumors at a molecular level is changing
   cancer treatment through the development of single- or
   combinatorial-targeted therapies for patients characterized by specific
   genetic makeups. Unfortunately, effective treatments are still lacking
   for many patients and the emergence of resistance further limits the
   clinical benefit of therapies.

   In recent years, the availability of large-scale pharmacogenomics
   databases, including the Cancer Cell Line Encyclopedia (CCLE)^[65]2,
   the Genomics of Drug Sensitivity in Cancer v1 and v2 (GDSC)^[66]3, the
   Cancer Therapeutics Response Portal v2 (CTRPv2)^[67]4, the Profiling
   relative inhibition simultaneously in mixtures (PRISM)^[68]5 and
   DepMap^[69]6, has fostered the development of personalized oncology
   strategies.

   Initial landmark work by Iorio, Garnett & coworkers highlighted the
   genomic alterations that sensitize to drugs as well as the contribution
   of different genome sequencing data types for drug sensitivity
   predictions^[70]3. Since then, several computational forecast
   methodologies have been proposed to leverage the information in CCLE
   and GDSC to predict cancer cell lines drug-sensitivity (reviewed in
   refs. ^[71]7–[72]9) and best practices to develop predictive models
   have been issued ^[73]10. Models “primed” with prior knowledge have
   also been developed. In general, these models exploit representations
   of biological processes instead of individual genes as features to
   reduce the number of input, independent variables, while leading to an
   increase of model performance and interpretability^[74]11–[75]16.
   However, a systematic investigation of whether drug sensitivity models
   are able to learn, in an unbiased fashion, the biological processes
   associated with drugs’ mechanism of action (MOA) is still missing from
   the literature. Moreover, to what extent models trained on expression
   data in cancer cell lines can be exploited to interpret patients’ bulk
   RNAseq data is still a largely unsolved issue, although earlier studies
   tried to tackle this fundamental point^[76]11,[77]17.

   In this work, to forecast drug sensitivity in cancer cell lines we
   developed a machine learning algorithm trained on large cancer cell
   lines drug sensitivity datasets, i.e., GDSC and PRISM, which represent
   the largest screening datasets of oncological and non-oncological
   drugs, respectively. We focused on the interpretability and deployment
   of the predictive models, which are both critical to generate new
   testable hypotheses (Fig. [78]1A). We created a pipeline, that we
   called CellHit, by combining the predictive models with
   Celligner^[79]18, an unsupervised alignment strategy that allows to
   identify cell lines whose transcriptomics profile most closely match
   the bulk RNAseq from patient tumors. We employed our pipeline to infer
   the best-scoring drugs for the entire TCGA cohort based on patient
   transcriptomics profile, as well as on samples from PDAC and GBM
   patients, which were experimentally validated.

Fig. 1. Training a machine learning framework for predicting cancer cell
sensitivity to drugs.

   [80]Fig. 1
   [81]Open in a new tab

   A Schematic workflow of the pipeline from dataset acquisition and model
   training through to benchmarking and external validation. Leveraging
   large-scale transcriptomics datasets, several machine learning models
   are trained on Genomics of Drug Sensitivity in Cancer (GDSC) and
   Profiling Relative Inhibition Simultaneously in Mixtures (PRISM)
   viability screening datasets. A Large Language Model (LLM) assists the
   curation of drug Mechanisms of Action (MOA), enhancing model
   interpretability and facilitating feature selection. Three model types
   are trained: one using cell transcriptomics and drug features, another
   using only transcriptomics, and a third using a selected subset of
   transcriptomic data informed by pathways identified by the LLM. These
   models undergo benchmarking across 20 distinct train/validation/test
   splits. The best-performing model is then applied in inference on The
   Cancer Genome Atlas (TCGA) bulk RNA-seq data and on external patient
   datasets for pancreatic ductal adenocarcinoma (PDAC) and glioblastoma
   multiforme (GBM). Predictions are externally validated on TCGA data
   using National Cancer Institute (NCI) cancer drug indications to assess
   the recovery of known information, as well as experimentally on primary
   (GBM dataset) and commercial (PDAC dataset) cancer cell lines; B Bar
   plot comparing the performance of different model architectures (MLP,
   XGBoost and literature baselines) and input feature representations
   (cell features and drug features) in terms of Pearson correlation with
   observed drug sensitivities. Different colors denote different learning
   algorithms (e.g., light blue XGBoost and purple MLP). Etched bars
   highlight models using only transcriptomic data (no drug
   featurizations). Results are obtained by averaging results across 20
   distinct test splits. Error bars represent SD of the results; C Bar
   plot depicting Mean Squared Error (MSE) for the same models and
   features as in (B). Also in this case results are obtained by averaging
   results across 20 distinct test splits. Error bars represent SD of the
   results; D Histogram of the distribution of Pearson correlation
   coefficients for drug-specific models using all genes, indicating the
   median, mean, and standard deviation; E Box plots illustrating the
   variability in the distribution of Pearson correlation coefficients
   across 20 different random training/testing splits. Each box plot
   represents a specific model and displays the median correlation
   (central line), interquartile range (box edges), and variability
   outside the upper and lower quartiles (whiskers). Source data are
   provided as a [82]Source Data file.

Results

An interpretable model for cell line drug response predictions with or
without drug representations

   We developed an interpretable machine learning framework for the
   prediction of drug sensitivity of cancer cell lines by using the latest
   version of GDSC and PRISM datasets. Given the lower dimensionality of
   GDSC, we tested alternative modeling strategies on this dataset and
   then transferred optimal settings to PRISM. After GDSC data
   pre-processing, we obtained the profiling of 286 unique drugs in 686
   cell lines. We first implemented an integrated model by jointly
   considering representations of both drugs and cell lines to predict
   IC50 values as target variables (Fig. [83]1A). We employed different
   strategies to numerically represent drugs chemical structures and cell
   line expression profiles to provide inputs to ML algorithms (Methods).
   As for cell lines, we first transformed the RNAseq data using
   Celligner^[84]18 along with TCGA bulk RNAseq samples. This
   preprocessing step is required to match the cell line closest to TCGA
   samples by aligning their RNAseq data, enabling the application of our
   ML model, which was trained on cancer cell lines, to patient samples
   (see below). We employed drug and cellular representations as features
   to train a supervised regression model to predict IC50 values (i.e.,
   “Joint feature” models, see Fig. [85]1A). We used a tissue-dependent,
   cell line stratification strategy to generate training, validation and
   testing splits to avoid data leakages (see Methods). We found that
   XGBoost, in combination with all-gene expression vectors and one-hot
   encodings of the molecules, achieved the best performance (Pearson
   correlation coefficient ρ = 0.89, Mean Square Error MSE = 1.55;
   Fig. [86]1B, C), being able to outperform other architectures, such as
   Multi-Layer Perceptron (MLP), as well as other competitive methods
   available from the literature (Chen & Zhang, 2021) (Fig. [87]1B, C;
   Supplementary Fig. [88]1; Supplementary data [89]1).

   The fact that the simplest representation of the drug, i.e., one-hot
   encoding, achieved the best results, suggested that the model leverages
   drugs identifiers, disregarding molecular properties of individual
   drugs. Since our main goal was to learn the transcriptional programs
   responsible for drug responses for post-hoc interpretation, we
   generated drug-specific models by employing only gene expression as
   input features (i.e., “All-genes” models, see Fig. [90]1A). We trained
   the models by repeating the same procedure used for the joint
   representation. Overall, aggregated IC50 predictions have correlations
   with experimental values very close to the joint model (ρ = 0.88,
   MSE = 1.73; Fig. [91]1B, C; Supplementary data [92]1). By evaluating
   performances on a held-out testing set for each of the 286
   drug-specific models, we obtained a median ρ = 0.40 (Fig. [93]1D). The
   best performance was achieved by Venetoclax (ρ = 0.72, Fig. [94]1E), a
   small molecule that increases apoptosis by inhibiting BCL2^[95]19. A
   total of 73 drug models (25%) had correlation ρ > 0.5 (Supplementary
   data [96]2). In summary, we developed reliable models for drug
   sensitivity based on just cell lines transcriptomics data.

Models’ interpretation reveals convergence between important genes and known
drug-targets

   For each drug-specific model, we inspected the genes most important for
   prediction and checked whether they were either known targets, or
   members of pathways associated with the known MOA of that drug. Gene
   importance, defined as the contribution of individual gene expression
   to model predictiveness, was evaluated in two ways. First, we inspected
   whether genes gave a positive or negative contribution to the final
   IC50 prediction via a game theory approach (i.e., Shapley Additive
   exPlanations^[97]20). Second, we used an importance permutation method,
   randomly shuffling genes and evaluating the effect of the perturbation
   on test set metrics. We considered only those genes deemed important
   with both approaches.

   We found that 39% of the drug-specific models trained on GDSC
   identified the known target among important genes in at least one model
   out of 20 random splits (Fig. [98]2A, Supplementary data [99]2).
   Remarkably, models for BCL2 inhibitors, such as Venetoclax, Navitoclax
   and ABT737, consistently recovered their target in the majority of the
   trained models (Fig. [100]2A, B). Several other drug models recovered
   the corresponding targets in more than 50% of the splits (e.g.,
   Gefitinib-EGFR, Nutlin-3a(-)-MDM2, or Linsitinib-IGF1R; Fig. [101]2A).
   We assessed the statistical validity of these results by estimating the
   drug-specific recovery rate of a random gene across the 20 splits (see
   Methods). We found that 70% of the targets are found at or above the
   90th percentiles of the background distributions, suggesting a
   significant positive recovery rate for most models (Fig. [102]2A,
   Supplementary data [103]2). Analysis of the importance scores, for
   example for the top performing model (i.e., Venetoclax), showed that
   higher values are attributed to the corresponding target, i.e., BCL2.
   This connection is seen both as a strong negative contribution to the
   predicted IC50 value (i.e., SHAP importance; Fig. [104]2B teal) and in
   test metrics (i.e., Correlation delta; Fig. [105]2B orange). By
   plotting for each cell line the target gene expression, Venetoclax’s
   experimental IC50s, as well as Venetoclax’s model SHAP values, it is
   possible to understand how the Venetoclax model leverages the
   expression levels of its target (i.e., BLC2) to successfully predict
   IC50 (Fig. [106]2C). Indeed, cell lines having a higher expression of
   BCL2 are characterized by lower IC50 as well as strongly negative SHAP
   values (i.e., higher efficacy of Venetoclax as a BCL2 inhibitor)
   (Fig. [107]2C).

Fig. 2. GDSC model interpretability.

   [108]Fig. 2
   [109]Open in a new tab

   A Target recovery for the top 25 ligand-target pairs. The length of the
   bars on the left indicates the fraction of recovery (# of times the
   drug-specific model identifies the putative gene as important out of 20
   train/test splits). Red lines represent the 95th percentile of the Hit
   Fraction distribution across all genes for a given drug. The length of
   the bars on the right shows the median pearson correlation for each
   drug-specific model; B SHAP (teal) and correlation delta (orange)
   importances for the Venetoclax drug. Permutation importance reflects
   the decrease in the model’s prediction accuracy when a feature’s values
   are shuffled, indicating its importance (greater drops signify higher
   importance). SHAP importance represents a feature’s contribution to the
   model’s prediction, with larger absolute values indicating greater
   importance; C An integrated assessment of the Venetoclax model across
   various cell lines (X axis). The top plot (in black) shows the
   experimental IC50 z-scores, while the second plot (in gray) depicts the
   predicted IC50 values, providing a comparison of model performance
   against experimental data. The third and fourth plots (in teal and red)
   respectively represent the SHAP values and expression levels of BCL2.
   Overall, the figure shows how lower IC50 values (higher drug efficacy)
   are associated with higher BCL2 expression levels and correctly
   identified impact (negative SHAP value) of the gene on predicted IC50.
   Source data are provided as a [110]Source Data file.

MOA processes and gene dependencies are learnt by the drug sensitivity models

   We further inspected whether the models learned about MOA-related
   biological processes and pathways. The information of biological
   pathways associated to the action of a certain drug is sporadically
   annotated in chemical bioactivity or biological pathway knowledge
   bases. We therefore systematically curated the association of GDSC
   drugs to the pathways of a reference knowledgebase, Reactome^[111]21.
   We first retrieved a total of 66 GDSC drugs that are annotated to
   Reactome pathways through Guide to Pharmacology/Pubchem^[112]22
   mappings. We then considered all pathways containing the drugs’
   putative targets, finding matches for 233/287 (81%) unique GDSC drugs.
   To further extend the drug-MOAs’ pathway coverage, we leveraged
   an open-source Large Language Model (LLMs), i.e., Mixtral Instruct
   8x7b, to generate detailed descriptions of drug-related processes by
   inputting the generic name of the drug (or available synonyms) and
   available annotations from the GDSC compound information table. The
   resulting annotations informed a second prompt designed to identify
   from Reactome the semantically closest pathways for each drug
   (Fig. [113]3A; see Methods). Through this approach, we were able to
   retrieve detailed information about MOA and associated pathways for
   253/287 (88%; Supplementary data [114]3) GDSC drugs (Fig. [115]3B,
   top), therefore increasing the coverage of annotated drugs
   (Fig. [116]3B, bottom). We finally combined the three sets of
   drug-pathway associations, for a total of 5662 instances, 253 unique
   drugs and 138 unique pathways (Supplementary data [117]3).

Fig. 3. Analysis of Drug MOAs and Gene Essentiality via GDSC Models.

   [118]Fig. 3
   [119]Open in a new tab

   A Workflow depicting the use of a large language model (LLM) for
   generating drug MOAs and identifying semantically relevant pathways.
   Starting from the drug’s available metadata, an LLM is repeatedly
   tasked with specialized prompts to generate a drug textual description.
   In parallel, PubMed is queried programmatically with the drug name to
   retrieve abstracts related to the drug. The information is integrated
   in a final textual description. The obtained drug description is used
   by a “Guided” LLM to choose which are the Reactome pathways which are
   most likely to modulate drug efficacy. This last procedure is repeated
   5 different times and only pathways selected at least two times are
   retained; B Venn diagram showing the different drugs(top) and pathways
   (bottom) recovered using the LLM procedure as compared to pathway match
   based on drugs’ and target’s names; C Heatmap of significant
   MOA-pathways for various drug models, filtered by a correlation
   threshold ρ > 0.5. Drug names and involved pathways are labeled along
   the x-axis and y-axis, respectively. Starred squares highlight pathways
   linked to drugs via at least one annotation criterion. Adjacent bar
   plots show the count of significantly enriched elements per row/column
   in light gray, with those annotated by the pipeline in dark gray. A
   vertical dashed line highlights the presence of a group of drugs that
   most frequently recover pathways and known MOAs; D number of
   significantly enriched MOA-pathways obtained from different annotation
   criteria; E tissue-wise statistics of number of cell lines (top),
   number of core essential genes (middle), recall of essential genes at
   different important genes (SHAP) stringencies (top k 10, 20, 50, 100);
   F STRING PPI network of lung core essential genes recovered by SHAP
   importances. Nodes’ have diameters proportional to node degree and are
   colored according to SHAP values (the brighter the more important).
   Source data are provided as a [120]Source Data file.

   We performed pathway enrichment analysis on important genes from
   All-genes drug specific models to check whether known MOAs for the
   corresponding drug were identified. We found a total of 114 GDSC drug
   models with at least one enriched pathway (FDR < 0.1). Out of these, 65
   (57%) unique drugs have at least one MOA-pathway significantly enriched
   when considering the aggregated list of MOA-pathways (Fig. [121]3D;
   Supplementary data [122]4), with the LLM-derived ones being the single
   list providing the highest recovery of enriched pathways (Fig. [123]3D;
   Supplementary data [124]4). The fraction of drug models with at least
   one significantly enriched MOA-pathway increased upon considering
   models with a correlation ρ > 0.5 (Fig. [125]3D; Supplementary
   data [126]4). We then clustered pathway enrichments of the
   best-performing drug models (ρ > 0.5) having at least one significant
   MOA-pathway (Fig. [127]3C). We found that certain processes, such as
   “Apoptosis”, “Cellular responses to stress”, “Biological Oxidations”,
   “MAPK family signaling cascades”, were widely enriched across drugs,
   often consistently with drugs’ MOAs, and clustered apart
   (Fig. [128]3C).

   We evaluated whether important genes also recovered information about
   gene dependencies of cancer cell lines from different tissues.  To this
   end, we retained drug-cell line instances yielding the most significant
   predictions, ranked the top k most important genes based by SHAP
   values, and pooled them on the basis of the tissue of origin of the
   cell lines. We then evaluated the recall of the top k important genes
   to identify core essential genes from an updated dependency map across
   27 cancer tissues^[129]23. Remarkably, when aggregating the top 100
   genes by SHAP importance, we identified core essential genes with a
   recall greater than 0.9 in several tissues (Fig. [130]3E, Supplementary
   data [131]5). These essential, prediction-important genes are often
   found in highly connected protein-protein interaction networks (e.g.,
   STRING network of important, essential genes in lung, Fig. [132]3F). By
   ranking genes based on their average SHAP importance across drug
   models, we found BCL2L1 (Bcl2-like 1) and YAP1 (Yes1 Associatd
   Transcriptional Regulator) as the top 2 most important genes (brighter
   nodes, Fig. [133]3F; Supplementary data [134]6). Overall, this analysis
   suggests that drug sensitivity is also realized through the modulation
   of genes that are essential for cell survival and proliferation.

Extending explainable drug sensitivity predictions to the PRISM dataset

   We employed a similar strategy to train drug-specific, All-genes models
   for 6337 drugs and 887 cell lines available in the PRISM
   database^[135]5. We obtained a total of 762 drug models with a
   correlation ρ = 0.2 (Fig. [136]4A; Supplementary data [137]7; see
   Methods). The aggregated IC50 predictions of these models with
   experimental values achieved a ρ = 0.80 and MSE = 1.18 (Fig. [138]4B).
   Drugs targeting kinases are by far the category with the highest number
   of models with a correlation ρ > 0.2, being also more effective in
   recovering the corresponding targets among important genes
   (Fig. [139]4C). Other recurrent drug target categories are generic
   Enzymes, followed by Epigenetic enzymes and GPCR targets
   (Fig. [140]4C). Kinases also stand out when normalizing the number of
   models for the total number of drugs considered in PRISM (Supplementary
   Fig. [141]2A, B), or considering the number of drugs achieving a
   certain LFC threshold (Supplementary Fig. [142]2C).

Fig. 4. PRISM model performance and interpretation.

   [143]Fig. 4
   [144]Open in a new tab

   A scatter plot of PRISM drug-specific model correlations and MSEs. Dots
   are colored according to IQR values, ranging from blue to red for
   increasing values of IQR. The density plots located at the top and left
   of figure  compares the distribution of correlation and MSE values
   between all models (in blue) and models with an IQR greater than 1 (in
   red); B scatter plot of predicted vs experimental IC50 values from
   models with correlation ρ > 0.2; C barplot statistics of PRISM models
   with corr ρ > 0.2 (salmon) and corr ρ > 0.2 and target recovered,
   stratified by putative target protein families (red); D fraction of
   target recovery for the top 25 ligand-target pairs (same plot as 2 A
   but for PRISM). Red lines represent the 95th percentile of the Hit
   Fraction distribution across all genes for a given drug. The length of
   the bars on the right shows the median pearson correlation for each
   drug-specific model; E heatmap showing significant MOA-pathways (rows)
   for drug models (columns) with corr ρ > 0.5 (same plot as 3E but for
   PRISM); F Venn diagram comparing MOA-pathway significantly enriched on
   PRISM’s and GDSC’s drug model important genes. Source data are provided
   as a [145]Source Data file.

   Inspection of the most important genes revealed that 62% (339 out of
   547) of the models for drugs with annotated target information
   identified one corresponding target among the important genes across at
   least one train/test split (Fig. [146]4D; Supplementary data [147]8).
   Also for the PRISM dataset, we successfully identified 73.7% of drug
   targets at or above the 90th percentile threshold of the background
   recovery distribution (Fig. [148]4D). STF-31 and CGM097 were the two
   drug models that recovered their corresponding targets (i.e., the
   regulator of intracellular NAD+ pool Nicotinamide
   phosphoribosyltransferase, NAMPT; and the p53 ubiquitin ligase MDM2,
   respectively) in 100% of the training/testing splits (Fig. [149]4D).
   Among the most supported models, we also found a few non-oncological
   drugs, such as PARDOPRUNOX (targeting the Serotonin 5-HT7 receptor
   HTR7), which is approved as a treatment for Parkinson.

   Similarly to the GDSC drug datasets, we curated MOA-pathways for 6305
   PRISM drugs (Supplementary data [150]9) and performed pathway
   enrichment on the most important genes of All-genes models. To
   determine whether the LLM was identifying drug-specific biological
   pathways, and not those broadly associated with cancer, we leveraged
   available PRISM drugs’ categorization into chemotherapeutic agents,
   targeted therapies, and non-oncological drugs. We analyzed the
   proportion of biological pathways assigned to each drug type and found
   that different classes of drugs are associated with distinct sets of
   pathways (Supplementary Fig. [151]3). We also performed pathway
   analysis on important genes for each drug model and similarly displayed
   enrichments for the best-performing instances (i.e., models with
   ρ > 0.5). Certain pathways, e.g., “Cell Cycle, Mitotic”, “Cellular
   responses to stress”, “Apoptosis”, “Cell Cycle Checkpoints”, “Cytokine
   Signaling in Immune System” and “Signaling by Receptor Tyrosine
   Kinases”, were significantly enriched in a larger set of drugs,
   consistently with their MOAs (Fig. [152]4E, Supplementary
   data [153]10). Notably, these pathways are largely overlapping with
   recurrently enriched pathways in GDSC drug models (Fig. [154]3D).
   Intriguingly, while several enriched MOA-pathways also match processes
   that are recurrently enriched in GDSC oncology drugs, we found a bigger
   proportion of pathways exclusively enriched only for PRISM drugs,
   suggesting a broader diversity of mechanisms of actions (Fig. [155]4F,
   Supplementary Fig. [156]3A).

MOA-primed models improve drug-sensitivity predictions

   We leveraged the information of curated MOA-pathways to select the most
   informative variables (i.e., gene expression) for a given drug to
   develop knowledge-driven models (hereinafter referred to as MOA-primed
   models).

   For model priming, we considered MOA-pathways generated via GPT-4 as
   well as through the freely available Mixtral Instruct 8x7b model.
   Models obtained through the latter approach achieved higher
   performances (Fig. [157]1B, C) and are discussed further below.

   On GDSC, MOA-primed models performed overall better than their
   All-genes counterparts, with median ρ = 0.5 (Fig. [158]5A). Overall,
   aggregated IC50 predictions have a high correlation with experimental
   values (ρ = 0.89) and the lowest MSE (1.52) (Fig. [159]5B). Certain
   drug models, such as Bicalutamide, showed a remarkable increase in
   their performance, by almost doubling their correlations when
   considering only MOA-pathway genes as dependent variables
   (Fig. [160]5E). On the other hand, a few drug models, such as BX795,
   Gemcitabine or Savolitinib displayed a reduction in performance of the
   MOA-primed models compared to All-genes models (Supplementary
   Fig. [161]4).

Fig. 5. MOA-driven models.

   [162]Fig. 5
   [163]Open in a new tab

   A comparison of the correlation distribution of all-genes (red) vs
   MOA-primed (blue) GDSC models; B scatter plot of predicted vs
   experimental IC50 values from GDSC MOA-primed models. The coloring of
   the dots on the scatterplot indicates the density of points around a
   particular area; C comparison of the correlation distributions of
   all-genes (red) vs MOA-primed (blue) PRISM models (considering only
   models with IQR > 1); D scatter plot of predicted vs experimental IC50
   values from PRISM MOA-primed models with correlation ρ > 0.2. The
   coloring of the dots on the scatterplot indicates the density of points
   around a particular area; E boxplot of correlation distributions for
   GDSC models showing the greatest correlation improvement of the
   MOA-primed vs all-genes models. Each box plot represents a specific
   drug model and displays the median correlation (central line),
   interquartile range (box edges), and variability outside the upper and
   lower quartiles (whiskers); F boxplot of correlation distributions for
   PRISM models showing the greatest correlation improvement of the
   MOA-primed vs all-genes models. Boxplots follow the structure of (5E).
   Source data are provided as a [164]Source Data file.

   Also for PRISM, we employed MOA-pathways to select genes to train
   MOA-primed models, which outperformed All-genes models. Notably, we
   obtained almost twice the models with ρ > 0.2 compared to All-genes
   counterparts (1254 vs 762; Supplementary data [165]7). When considering
   only drugs with IQR > 1, we obtained a median ρ = 0.32 (Fig. [166]5C)
   and a correlation of aggregated IC50 predictions with experimental
   values ρ = 0.93 (Fig. [167]5D). Also in this case, several drugs (e.g.,
   Rolapitant) displayed great improvement in the performance of the
   MOA-primed model with respect to the All-genes counterpart
   (Fig. [168]5F).

CellHit inference on aligned bulk RNAseq from TCGA patients data recovers
cancer type-specific mono and combination therapies

   We deployed our model to forecast effective drug treatments for TCGA
   tumors based on their transcriptomic profiles. We first compiled a list
   of FDA drugs approved for specific cancer types (i.e.,
   [169]https://www.cancer.gov/about-cancer/treatment/drugs/cancer-type)
   and matched them to the corresponding cancers in TCGA (Supplementary
   data [170]11). This yielded a total of 41 GDSC drugs approved for 23
   cancer types. For each drug, we ranked the top k predicted clinical
   samples according to two criteria: either predicted log IC50 or
   quantile score (Fig. [171]6A; Supplementary data [172]12). The latter
   is a quantitative measure that trades-off between efficacy and
   selectivity of a drug, i.e., how much a given drug is predicted to be
   potent for a particular sample relative to all the other samples
   inferred (see Methods). We determined the number of top k samples to
   consider (i.e., 600) as the one that optimized metrics related to the
   binary classification of drug prediction matching cancer type
   prescriptions (see Methods; Supplementary Fig. [173]5). In general,
   both metrics worked well in prioritizing patients with matching cancer
   types (Fig. [174]6B). For certain drugs, such as Cytarabine, Venetoclax
   and 5-azacytidine, we achieved excellent recall statistics for the
   cancer type for which the drug is prescribed (Fig. [175]6B). Overall,
   we found that 37 out 41 (90%) GDSC drugs’ models found, among the top
   600 ranked patient samples, at least one from a cancer type for which
   the corresponding drug is approved (Fig. [176]6C). For several drugs,
   most of predicted samples are derived from cancer types for which that
   drug has been developed. For instance, Fulvestrant (an anti-estrogen)
   in Breast Cancer (BRCA), BCL2 inhibitors and Cyclophosphamide in Breast
   Cancer or Acute Myeloid Leukemia (LAML), the mutated BRAF inhibitor
   Dabrafenib and the MEK1/2 inhibitor Trametinib in skin cutaneous
   melanoma (SKCM) (Fig. [177]6C). Since it is known that Dabrafenib is
   only effective against BRAF V600E mutated tumors, we looked if genes
   involved in the drug’s MOA were deemed important by the model. However,
   consistent with the notion that is the mutation in BRAF and not its
   absolute expression level to determine sensitivity to Dabrafenib, we
   didn’t find BRAF among the important genes. Moreover, none of the
   BRAF-associated pathways were found among the significantly enriched
   ones. On the other hand, we found that BRAF mutations are the most
   recurrent among the top 600 patients scored by the Dabrafenib model
   (Supplementary Fig. [178]6A), confirming that the model can infer the
   mutational status from gene expression signatures. While the majority
   of the best scored samples with BRAF mutations are affected by
   melanoma, as expected, we also ranked several BRAF mutated samples from
   other tumors, such as thyroid carcinoma (THCA) or Diffuse Large B-cell
   Lymphoma (DLBC) (Supplementary Fig. [179]6B), confirming the
   repurposing potential of Dabrafenib in more rarely BRAF-mutated
   tumors^[180]24. We also observed in most other drug models that among
   the top 600 ranked samples, several came from cancer types for which
   the drug is not currently prescribed (Supplementary Fig. [181]7),
   suggesting additional possible repurposing candidates.

Fig. 6. Drug sensitivity predictions on TCGA samples.

   [182]Fig. 6
   [183]Open in a new tab

   A Schema of a drug response prediction workflow leveraging Bulk RNA
   sequencing data from TCGA patients, as well from new PDAC and GBM
   cohorts. The process begins with data harmonization using Celligner,
   followed by drug inhibitory concentration (IC50) predictions through
   CellHit. Patients are then ranked by their predicted predIC50 values
   and quantile score to assess drug efficacy. Validation involves
   comparing TCGA predictions with NCI cancer drug metadata and refining
   tumor-specific predictions by clustering patient responses within
   cancer subtypes for experimental validation.; B recall of the recovered
   drug indications, from NCI cancer drugs, for the TCGA best ranked
   samples (top 600), according to either predicted IC50 (predIC50) or
   quantile score metrics; C stacked barplot of the GDSC drugs scoring
   among the top 600 samples patients with cancer types matching the
   prescription according to NCI drugs. The height of the barplot’s stacks
   corresponds to the number of unique samples and the color of the
   specific cancer type; D circle plot showing drugs predicted for the
   same pool of patients, i.e., suggesting combination therapies. Circle
   diameter is proportional to the number of unique samples, among the top
   600, best scoring for both drug models. Colors indicate the level of
   support for that combination, i.e., approved (red) or sharing
   indication for the same cancer type (dark green); E Inference on TCGA
   data for the 20 best performing non-oncological drug models in the
   PRISM dataset. The height of the barplot’s stacks corresponds to the
   number of unique samples and the color of the specific cancer type
   (highlighting potential drug repurposing opportunities). Colors are
   shared with panel C. Source data are provided as a Source Data file.

   A total of 10.5k samples across 33 different cancer types were ranked
   among the top 600 scoring ones by multiple drug models, suggesting the
   potential for combination therapies (Fig. [184]6D; Supplementary
   data [185]13). We inspected the predicted drug combinations and ranked
   them according to the number of predicted samples. This analysis showed
   that many of the top-ranking combinations are already approved, such as
   Trametinib and Dabrafenib in SKCM, Venetoclax and Cytarabine or
   5-azacytidine in LAML, Fulvestrant and AZD5363, Alpelisib, Ribociclib
   or Palbociclib in BRCA, as well as Oxaliplatin and 5-Fluorouracil in
   colon adenocarcinoma (COAD) (Fig. [186]6D; Supplementary data [187]14).
   We found additional predicted combinations characterized by shared
   indications, highlighting the potential for combination therapy
   (Fig. [188]6D; Supplementary data [189]14).

   Likewise, we used the models trained on the PRISM dataset to infer drug
   sensitivities for each TCGA sample by considering the top 600 samples
   ranked by each drug model. We inlcuded only the top 20 best performing
   models for non-oncological drugs, including 8 for Enzymes and 6 for
   GPCRs (Fig. [190]6E). The analysis revealed cancer-type specific
   patterns among the best scoring samples for each drug. For instance,
   two Adenosine Receptors antagonists, i.e., CGS-15943 and MRS-1220,
   which have been recently proposed as effective therapies against
   multiple cancers^[191]25,[192]26, are indeed predicted for multiple
   samples of breast (BRCA), liver (LIHC), prostate (PRAD) as well as
   gastric (STAD) cancers (Fig. [193]6E).

   Hence, we demonstrated that the CellHit models can reliably predict
   anti-cancer therapies based on patients’ transcriptomic signatures.

CellHit predictions identified drugs selective for distinct PDAC subtypes

   Next, we determined whether CellHit could infer possible drugs with
   specific effects against selected pancreatic ductal adenocarcinoma
   (PDAC) subtypes recently identified using a laser microdissection-based
   spatial transcriptomics approach^[194]27.

   We first verified the projection of the PDAC samples with the cancer
   cell lines from CCLE database. As expected, PDAC samples were mapped to
   the tissue of origin at the transcriptome level (Supplementary
   Fig. [195]8) while they displaying different profiles in terms of
   cancer cell lines responsiveness to drugs (Fig. [196]7A). PDAC samples
   assigned to the well-differentiated Glandular (GL) subtype were mainly
   associated with the esophagogastric adenocarcinoma cell lines and only
   in a minor fraction to the pancreatic adenocarcinoma ones. Conversely,
   samples of the Transitional (TR) subtype, which is characterized by
   gene expression programs suggestive of epithelial-mesenchymal
   transition, were mainly associated with invasive breast carcinoma and
   head and neck squamous carcinoma. This result suggests that PDAC cancer
   cell lines may exhibit heterogenous responses to drugs^[197]28 and that
   the closest cancer cell lines are rarely associated with the pancreatic
   lineage.

Fig. 7. CellHit predictions on distinct PDAC subtypes and experimental
validation.

   [198]Fig. 7
   [199]Open in a new tab

   A Enrichment of predicted cancer cell lines that react most similarly
   to the available drugs for the Glandular (GL) and Transitional (TR)
   subtypes. Classification of the tissue types of cell lines (oncotree
   system) is shown; B Heatmap of predicted IC50 (predIC50) of GDSC drugs
   derived by CellHit prediction in PDAC samples. K-means clustering
   (n_clusters=2, method=Euclidean distance) was performed. Subtype
   annotations are shown for each sample; C Violin plot showing the
   predicted IC50 (predIC50) of Gemcitabine for the Glandular (GL) and
   Transitional (TR) subtypes; D Euclidean distance derived from Celligner
   between each PDAC cell line (CFPAC-1, PANC-1) and each selected tumor
   type (Pancreatic, Esophagogastric, Invasive Breast, Head and Neck); E
   Violin plot showing the predicted IC50 (predIC50) of Irinotecan and
   Etoposide for the Glandular (GL) and Transitional (TR) subtypes; F
   percentage of cell viability of CFPAC-1 and PANC-1 cells treated with
   increasing concentrations of Irinotecan or Etoposide at 24, 48 and
   72 h. Individual values represent the average of three independent
   experiments ±SD. Source data are provided as a [200]Source Data file.

   We next applied the GDSC-trained version of CellHit to the PDAC samples
    to predict drugs for which PDAC subtypes exhibited differential
   sensitivities. Hierarchical clustering based on the predicted IC50 of
   the available drugs showed that PDAC samples segregated into two main
   groups according to their subtype of origin (Fig. [201]7B;
   Supplementary data [202]15). Moreover, it also revealed two main
   clusters of drugs to which PDAC samples showed different sensitivities.
   In cluster 1, TR samples were associated with resistance to the
   treatments compared to the GL ones. This finding aligns with the
   behavior and the poor prognosis of PDACs, in which the Transitional
   subtype is the most abundant tumor component^[203]27. Cluster 2 was
   instead enriched in drugs with different effects on both subtypes
   likely due to the co-existence of endodermal gene programs in these two
   PDAC variants^[204]27.

   Standard-of-care chemotherapeutic drugs, such as Gemcitabine (cluster
   1), were previously described to act mainly on classical epithelial
   (glandular) cells^[205]28. As expected, CellHit predicted that the GL
   subtype was more sensitive to Gemcitabine, since these cells have
   epithelial gene expression programs, compared with the TR subtype which
   displays quasi-mesenchymal phenotype (Fig. [206]7C). This analysis
   confirms the subtype-specific responses to this drug^[207]28.

   To test these drugs for their repurposing potential, we sought the PDAC
   cell lines that most closely resembled the cancer cell lines shown by
   CellHit to have similar drug response profiles to those inferred from
   patients’ samples (Fig. [208]7A). Celligner revealed that CFPAC-1 was
   transcriptionally closest to both the pancreatic and the
   esophagogastric adenocarcinomas while PANC-1 was associated to the
   invasive breast and the head and neck squamous carcinomas
   (Fig. [209]7D), lineages that were previously associated by CellHit to
   GL and TR subtypes, respectively. Among the drugs enriched in cluster 1
   and predicted to be more effective in the GL subtype compared to the TR
   one (Fig. [210]7C, E), we also found two topoisomerases inhibitors,
   such as Irinotecan and Teposide (or its analogous Etoposide), that are
   approved and used in clinics for cancer therapy^[211]29,^[212]30. To
   validate our predictions, we treated CFPAC-1 and PANC-1 cells with
   these drugs at different drug concentrations to test the cell viability
   over three days (Fig. [213]7F). Both treatments were preferentially
   active in the CFPAC-1 cell line with respect to PANC-1 at the dose
   corresponding to its Cmax, the maximal concentration that can be
   reached in patients’ blood. Overall, the experimental validation was in
   concordance with CellHit predictions confirming the robustness of the
   model to infer drug responses in different tumor subtypes and
   underlining the capacity to repurpose FDA-approved drugs for one of the
   most lethal solid tumors for which there are no efficient drugs
   available.

Validation on primary cells from GBM patient’s samples the specific response
profiles predicted by CellHit

   To further assess CellHit’s capabilities in a real case scenario, we
   employed the model trained on the GDSC dataset to infer the drug
   sensitivity profiles for 64 samples obtained from patients with
   Glioblastoma Multiforme (GBM). We employed the inferred drug
   sensitivity profiles to cluster patients, which identified two main
   groups displaying different sensitivity profiles to drugs
   (Supplementary Fig. [214]9). We chose for experimental validation
   primary cell cultures obtained from the samples Gb130 and Gb107,
   considered as representatives of the two groups (Supplementary
   Fig. [215]9). We tested two compounds, the Mcl-1-specific inhibitor
   AZD5991 and the E3 ubiquitin-protein ligase XIAP inhibitor, AZD5582, as
   we found them to be respectively more and less sensitive relative to
   the median values across samples. When considering only the predicted
   lnIC50, we predicted for both samples Gb130 and Gb107 higher
   sensitivities for AZD5582 than AZD5991 (Fig. [216]8A), which we
   experimentally confirmed (Fig. [217]8C, E). Indeed, experimental lnIC50
   are in line with the predicted ones, particularly for the Gb130 sample.
   Predictions of the Gb107 sample differ more compared to experimental
   results, which anyway confirmed that AZD5582 is more effective in
   inhibiting the growth of the cancer cell line. Such discrepancy might
   likely be due to highly specific transcriptional program characterizing
   this sample and its derived primary cell culture, which is closest to a
   Leiomyosarcoma cell line based on its response patterns (Supplementary
   data [218]16).

Fig. 8. GBM patient samples inference and experimental validation.

   [219]Fig. 8
   [220]Open in a new tab

   A predicted lnIC50 of drugs AZD5991 (blue) and AZD5582 (red) for
   samples Gb130 and Gb107. Horizontal dashed lines indicate the median of
   the experimental lnIC50 of each drug on GDSC cell lines. Error bars are
   obtained by computing SD across models in the ensemble to estimate the
   overall model uncertainty; B predicted quantile scores of drugs AZD5991
   (blue) and AZD5582 (red) for samples Gb130 and Gb107; Dose-response
   curves for the two patient-derived primary cultures of GBM, C Gb130 and
   E Gb107 using the Crystal Violet assay. The curves illustrate the
   response to treatment in terms of reduction of cell viability with
   AZD5991 and AZD5582, measured 72 h post-treatment for Gb107 (C) and
   Gb130 (E). The IC50 values for both drugs are provided. The
   dose-response curves are also compared across the two cell lines for
   AZD5991 (D) and AZD5582 (F). The AZD5991 curve is represented by blue
   lines, while red lines are used for AZD5582. The profiles for Gb107 are
   depicted with dashed lines and triangles marking the points, while
   those for Gb130 are represented with solid lines and circles marking
   the points. The threshold for a 50% reduction in cell viability is
   indicated by a dark dashed line. Error bars on plots (C, F) are
   computing considering data from assys performed three times in
   triplicates. Source data are provided as a [221]Source Data file.

   By considering the deviation of the predicted lnIC50 values from the
   median of the distribution of experimental lnIC50 across cell lines, it
   is possible to observe that AZD5991 had a predicted lnIC50 lower than
   the median lnIC50 (i.e., 4.591) for Gb130 and higher for Gb107, while
   AZD5582 had a predicted lnIC50 higher than the median lnIC50 (i.e.,
   2.014) for both Gb130 and Gb107 samples (Fig. [222]8B). Such different
   extents of deviation from the median are also associated to differences
   in quantile scores for these drugs for the inferred samples. Indeed,
   while AZD5991 has a bigger difference of quantile scores among cell
   lines, with higher values for Gb130, AZD5582 is characterized by more
   comparable values. We therefore proceeded to test the difference in
   response of the same drug on the two primary cultures simultaneously
   (see Methods). We observed a greater reduction in viability in Gb130
   compared to Gb107 in response to the treatment with AZD5991
   (Fig. [223]8D). This result was consistent with the model, which
   predicted a lower lnIC50 for GB130 (3.05) than for GB107 (5.57). For
   AZD5582, we did not observe a clear difference in cell viability
   reduction following treatment (Fig. [224]8F).

Discussion

   In this study, we have developed a ML framework to predict and
   interpret the sensitivity of cancer cell lines to drug treatments and
   proposed a strategy to perform robust inference with this model on bulk
   RNAseq obtained from patients’ samples.

   Our analysis confirmed that transcriptomics data are a critical
   component to predict cell line drug sensitivity. Indeed, drug-specific
   models, developed by considering as features only the cell line
   expression data, showed similar performances to top performing
   predictors entailing a joint representation of drug and cell lines. The
   usage of drug-specific models provided several advantages: it is less
   memory intensive during training, a crucial factor when dealing with
   big datasets such as PRISM and can be easily adapted to new datasets
   and drugs. Most importantly, our XGBoost-based model for individual
   drugs provides an easy mean to interpret the predictions and explain
   which genetic expression features are leveraged by the model. As
   highlighted by ref. ^[225]5, numerous drugs display selective activity
   profiles. Variations in the efficacy of drugs might reveal key
   molecular mechanisms that underpin specific cancer types. Insights
   gained from the interpretability of our models shed light on these
   nuances, enabling a deeper understanding of drug interactions with
   genetic expression profiles. This knowledge can be pivotal in the
   development of targeted therapies within a personalized medicine
   framework, where understanding the particularities of drug responses is
   essential for tailoring treatments to individual patient profiles. The
   dual criterion that we have employed for determining feature
   importance, i.e., SHAP and permutation importance, sets a very
   stringent requirement for the identification of important expression
   signatures. Despite this, we found that many drug models are indeed
   learning the mechanisms responsible for cell-line drug sensitivity
   solely based on cell-line basal transcriptomics data.

   We employed a strategy based on a freely available LLM, i.e., Mixtral
   Instruct 8x7b, to improve the MOA description for each drug and use it
   to associate semantically closest pathways from a reference knowledge
   base. Through this resource, we assessed that many of the All-genes
   models are characterized by significantly enriched pathways matching
   the MOA for the corresponding drug, in addition to frequently
   identifying the nominal targets. Overall, 135 GDSC drug models out of
   253 can recover either the target or MOA-related pathways among
   important genes, suggesting that drug-specific models leverage known
   biological mechanisms to carry out their prediction task. We also
   demonstrated that the important genes of the model successfully
   recapitulated cancer tissue essential genes from a recently published
   compendium^[226]23. Our models’ interpretation clearly suggests that
   drug sensitivity emerges from the interplay of drug MOAs and genes
   essential for cell survival. We speculate that the higher the degree of
   interaction between MOAs and essential genes, the more effective a drug
   is in inhibiting cancer cell growth. Our purely data-driven approach to
   explain drug sensitivity mechanisms can be considered as a
   complementation of previous approaches, either based on simple
   correlation of sensitivity and basal gene expression^[227]4, or
   integrated with PPI network and pathway analysis^[228]31.

   The proposed LLM pipeline extracts features in a human-readable and
   interpretable manner, and represents an example of how to use powerful
   LLMs by leveraging relevant domain knowledge. The growing ‘reasoning’
   capabilities of future models could further improve the capabilities of
   the proposed approach by leveraging multi-modal contents, such as
   images and knowledge graphs and similar applications of LLMs are
   already appearing in the literature (e.g.,
   [229]https://biochatter.org/^[230]32). The usage of novel strategies to
   mitigate LLMs’ hallucination (e.g., ref. ^[231]33) will be critical to
   systematically assess the predicted outputs and reliably incorporate
   them in knowledgebase annotation procedures.

   In our study, we first tailored the training and interpretability
   strategies to model the GDSCv2 dataset, which has also been extensively
   tackled by a multitude of previous methods^[232]7–[233]9. We then
   applied the same strategy to the PRISM dataset, which consists of a
   much larger panel of drugs screened against cancer cell lines. Although
   the PRISM dataset poses specific challenges, such as many drugs showing
   little or no effect at all on cancer cells, we showed that several
   models achieved good performances and recovered the information of
   targets and associated MOAs, which encompass a broader range of
   biological processes compared to GDSC drugs. This ML model tackled drug
   sensitivity predictions on both GDSC and PRISM through an explainable
   framework. However, it is possible that the kind of regression model
   that we are employing here might not be best suited for many of the
   drug candidates that we have observed in PRISM, characterized by highly
   specific activity profiles (in other words, showing activity on a
   limited number of cells). Considering the high anti-cancer potential of
   many non-oncological drugs against certain targets (e.g., GPCR
   drugs)^[234]26,[235]27,[236]34, it will be interesting to explore in
   the future alternative ML frameworks to model the sensitivity of those
   drugs for which we obtained low performances due to high specificity
   and low variability of the response.

   To deploy the models in real world scenarios, i.e., on bulk RNAseq data
   obtained from patients, we employed Cellligner^[237]18 to align bulk
   RNAseq from patients to those from cell lines upon which the model has
   been trained. We used Celligner’s transformed RNAseq data both to train
   the model (i.e., CCLE) as well as to perform inference on more than 10k
   patients’ samples from TCGA to predict an IC50 for each drug. Best
   predicted drugs for each patient often matched mono- and
   combination-therapies approved for the corresponding cancer type, such
   as Venetoclax AML, Fulvestrant in BRCA and Dabrafenib in SKCM, along
   with multiple combination drugs whose usage has been already approved.
   These results support the high potential for translation of our model
   predictions, as we find many more predicted mono- or
   combination-therapies for many TCGA cancer types, with some evidence of
   indications for combined usage, which might represent new repositioning
   opportunities.

   To further validate our strategy, we transformed RNAseq data from
   different morpho-biotypes recently identified for PDAC^[238]27 and
   inferred the most likely drugs for the samples of distinct subtypes. We
   showed that Irinotecan and Etoposide, predicted to be more effective
   against the “GL” than “TR” biotypes, indeed displayed differential
   sensitivity on cell lines more closely resembling the different tumor
   types reacting most similarly to the two PDAC subtypes, corroborating
   our predictions. The high intratumor heterogeneity, namely the
   coexistence in the same patient of heterogeneous groups of tumor cells
   with distinct morphological and transcriptional profiles, may lead to
   the adaptation of these cells to therapy. Thus, this approach could be
   important to design ad hoc combinatorial therapies based on the tumor’s
   subtype composition. We also demonstrated on GBM patients’ tumor
   samples the capability of our model to exploit predicted drug
   sensitivity profiles to cluster samples based on their similarities in
   drug sensitivity profiles. Through this approach we have identified
   representative samples with specific sensitivities, which we
   experimentally validated using match patient-derived tumor primary cell
   lines. These additional validations not only strengthen the reliability
   of our model but also highlights its potential translatability in
   clinical practice providing the therapeutic field of glioblastoma,
   which has remained stagnant for years, with a broad spectrum of
   potential new therapeutic possibilities.

   These results pave the way for future exploitation of the CellHit
   pipeline for inference using larger sets of drugs (i.e., PRISM) on
   alternative patient cohorts. In this respect, it will be critical to
   develop faster algorithms to align bulk RNAseq for inference with the
   model. The current strategy, based on Celligner, requires an initial
   alignment between CCLE, TCGA and any additional input RNAseq dataset,
   and subsequent retraining of the model on the transformed CCLE data. We
   plan to employ deep learning architectures, such as Variational Auto
   Encoders (e.g., Mober^[239]35), to improve this preliminary alignment
   step which is of critical importance to effectively deploy cell
   line-based models on patient samples. We will provide our models via a
   webapp for fast drug-sensitivity inference of bulk RNAseq data from
   patient samples provided as input. This will allow to analyze and
   compare inputted samples based on the similarity of their
   “responsiveness” profile, in addition to the one based on
   transcriptomic profile, which will speed up the hypothesis generation
   process to find new personalized treatments against cancer.

Methods

   The research was conducted in compliance with the principles outlined
   in the Declaration of Helsinki, and the protocol for sample collection
   received approval from the Ethics Committee of the University Hospital
   of Pisa (787/2015).

Datasets

Transcriptomics, IC50s and LFCs

   We obtained data model’s training from two comprehensive
   high-throughput studies: the Genomics of Drug Sensitivity in Cancer
   (GDSC)^[240]3 and the Profiling Relative Inhibition Simultaneously in
   Mixtures (PRISM)^[241]5. We downloaded the GDSC2 dataset, release
   version from 24 July 2022, from its official website
   ([242]https://www.cancerrxgene.org/). This dataset consists of 969
   cancer cell lines profiled for their responses to a panel of 286 drugs.
   Our primary focus within this dataset are the half-maximal inhibitory
   concentrations (IC50) values, which serve as target values for
   predictive modeling. We also incorporated the PRISM Repurposing Public
   23Q2 dataset from the Dependency Map (DepMap, Broad Institute) portal,
   considering log-fold change (LFC) in cell viability measurements and
   consisting of 919 cell lines and 6.415 compounds. We also retrieved
   drugs metadata available both in GDSC and PRISM regarding drug putative
   targets and mechanism of action. We obtained RNA sequencing (RNASeq)
   data from the Cancer Cell Lines Encyclopaedia (CCLE)^[243]2 as
   available from DepMap. We considered log2-transformed transcripts per
   million plus one (TPM + 1) data for protein-coding genes. We mapped
   CCLE cell lines data to GDSC by using available COSMIC to DepMap
   identifier mappings which allowed us to link cell lines to additional
   resources available on the DepMap portal such as mutations, gene
   essentiality, and additional metadata. We employed DepMap IDs to sort
   cell lines into different tissues and disease categories via the
   OncoTree classification system^[244]36.

TCGA data

   We integrated into our pipeline RNA bulk transcriptomic data from
   TCGA^[245]1 which we used for initial validation of the designed models
   on actual patient-derived samples. Specifically, our study employs the
   Tumor Compendium v11 Public PolyA dataset, released in April 2020,
   which amalgamates data from various publicly accessible repositories,
   including the TCGA and Therapeutically Applicable Research to Generate
   Effective Treatments (TARGET) projects. This data, sourced from the
   UCSC Treehouse Public Data platform, is downloaded in the RSEM
   log2(TPM + 1) normalized format.

Gene essentiality data

   We retrieved data from the 23Q4 CRISPR Gene Dependency dataset from the
   DepMap portal. This dataset contains synthetic lethality experimental
   data across 1100 unique cell lines and 18,444 genes. We refined our
   analysis to a subset of 17,425 genes, which are also represented in
   both the CCLE and TCGA databases. Matching and integration of this
   dataset with cell lines from GDSC and PRISM is made through DepMapID.

Mutations data

   We obtained somatic mutations data from the 23Q4 release of the DepMap
   portal. This dataset includes mutation data for 1750 unique cell lines.
   We considered the mutations of 693 high-consensus oncogenes and tumor
   suppressor genes as curated in the OncoKB database^[246]37. We
   binarized oncodriver genes as mutated or not mutated regardless of the
   specific mutation type. We cross-referenced this dataset with cell
   lines from GDSC and PRISM via DepMapID.

Reactome

   Pathway data from the Reactome database^[247]21 is incorporated into
   our analysis. Notably, we leveraged the directed acyclic graph
   representing the hierarchical structure of the pathways alongside a
   comprehensive list of pathways and their associated genes. Data
   extraction from Reactome is executed through two main methods: file
   dumps from Reactome, which provide us with the list of pathways and
   their hierarchical organization, and the REST API, which is used to
   obtain information on pathway-associated genes and the drugs that have
   been manually annotated to these pathways. The API is queried
   programmatically using the “get” function from Python’s built-in
   “requests” library. We employed topological sorting to organize the
   nodes (i.e., pathways) of the Reactome hierarchy, considered as a
   directed acyclic graph (DAG). We carried out the following steps: 1. We
   performed a topological sort on the Reactome graph using NetworkX’s
   ‘topological_sort‘ function; 2. we iterated through the sorted nodes,
   and assigned a layer number to each node based on its position relative
   to its predecessors; 3. The layer assignment follows this logic:
     * If a node has no incoming edges (no predecessors), it’s assigned to
       layer 0.
     * Otherwise, the node is assigned to the layer immediately following
       the minimum layer of its predecessors.

   Through this process, we systematically categorized nodes into
   hierarchical “layers”, designed to only have incoming edges from the
   layer above and outgoing edges to the layer below, hence maintaining
   the hierarchical structure of the Reactome pathways. Crucially, the use
   of topological sorting guarantees that when we move to lower level of
   the DAG, nodes predecessors already have an assigned layer (since
   assignment happens recursively by looking at them). We focused on
   pathways within the “1st Layer,” selecting a subset of 169 unique
   pathways from Reactome. Processing of this data is carried out by
   leveraging Python’s networkx library, specifically its
   “topological_sort” function.

NCI Cancer drugs

   We developed a natural language processing pipeline to programmatically
   identify drugs’ clinical indications for the different types of cancer
   as defined in TCGA. In more depth, this pipeline works by extracting
   information from textual data. Utilizing the Beautiful Soup
   ([248]https://www.crummy.com/software/BeautifulSoup/bs4/doc/) package,
   we retrieved links to drugs listed on the National Cancer Institute’s
   website ([249]https://www.cancer.gov/about-cancer/treatment/drugs). We
   then combined Python’s requests package with Beautiful Soup to extract
   textual descriptions of each drug. The extracted text is used as input
   to create a custom textual prompt. This prompt is then fed to an LLM
   with generation constrained by the Guidance package (see “Mixtral
   pipeline” section). The goal was to determine whether the text provided
   any evidence that the drug was prescribed for any of the 37 cancer
   categories from TCGA. Given the potential discrepancies in drug naming
   conventions between NCI and GDSC, we recover PubChem IDs from free-text
   drug names using PubChemPy, a tool developed by the curators of
   PubChem^[250]38 to programmatically retrieve information about chemical
   compounds. This step was vital in aligning these drugs with their
   counterparts in the GDSC, ultimately allowing us to obtain clinical
   indications for a total of 41 drugs. All data obtained through this
   pipeline is reported in Supplementary data [251]11.

PRISM drugs’ metadata and IQR analysis

   We obtained PRISM drugs categories, i.e., Chemotherapeutics, Targeted
   Oncology Drugs, and Non-oncology Drugs, from the secondary screening
   metadata provided by the 2019 PRISM project data release, available
   through the DepMap portal ([252]https://depmap.org/repurposing/). We
   transferred the classification to the drugs in the 2023 release of
   PRISM by matching their BROAD IDs. This yielded a final set of 835
   drugs, including 435 targeted therapies, 346 general oncology
   treatments, and 54 chemotherapeutics. We calculated the Interquartile
   Range (IQR) for the Log Fold Change (LFC) data across all 6337 drugs
   represented in the PRISM dataset. It is important to note that, due to
   the properties of the logarithm in base 2, an IQR greater than 1
   implies that the viability counts of the cell line at the 75th
   percentile after a given drug are at least twice as high as those
   observed for the cell line at the 25th percentile.

PDAC subtype data

   For the drug prediction in PDAC subtypes (Glandular, GL; Transitional,
   TR; Undifferentiated, UN), data were obtained from the transcriptional
   profiles of multiple morphological distinguishable tumor areas isolated
   by laser micro-dissection (LMD) in primary PDACs of treatment-naïve
   patients^[253]27.

Data pre-processing

Standardization and filtering

   We first selected a subset of common genes among the datasets, using
   the Human Gene Nomenclature Committee (HGNC)
   ([254]https://www.genenames.org/) system. We removed genes exhibiting
   zero standard deviation in either of the datasets (CCLE or TCGA),
   yielding a final set of 18.174 genes. We standardized transcriptomic
   data by subtracting the mean and dividing by the standard deviation
   across the whole dataset. This normalization is not particularly
   important for tree-based models but is important for the stability of
   neural networks and other models (used as baselines in this work). For
   the response values (Ys), we conducted a drug-by-drug standardization,
   by removing for each drug the mean and standard deviation computed
   specifically for that drug. This standardization ensured that the
   model’s predictions were not skewed by the inherent differences in the
   drug response scales but were instead sensitive to the nuances in
   response patterns specific to each drug, considering that different
   drugs can have varying ranges and distributions of response values. The
   standardization was performed only on the training set. The splitting
   strategy used throughout the analysis is described in the “Model
   testing” section. At the end of the preprocessing step, the GDSC
   dataset comprised 686 unique cell lines, 286 unique drugs and a total
   of 169.208 drug-cell line pairs (IC50 values). The PRISM dataset
   comprised 887 unique cell lines, 6337 unique drugs and a total of
   3.810.028 drug-cell line pairs (LFC values).

Alignment of patients’ bulk RNAseq with Celligner

   To harmonize representations of bulk RNAseq transcriptomic profiles
   from cancer cell lines and patient-derived samples, we adopted the
   pipeline proposed by the Celligner method^[255]18. We employed the data
   from the CCLE and the Tumor Compendium v11 Public PolyA (see “Datasets”
   section). Unlike the Celligner publication, in our study we also
   integrated, together with TCGA, bulk RNASeq data obtained from two
   additional patient cohorts with pancreatic ductal adenocarcinoma (PDAC)
   and glioblastoma multiforme (GBM)(see below). This integration is
   achieved by subsetting all data based on a common set of available
   genes, concatenating experimental samples with TCGA data, and then
   executing the cell alignment. We used the Python version of Celligner,
   available through its GitHub repository
   ([256]https://github.com/broadinstitute/celligner/tree/master).

Drugs and cells featurization

   We featurized the drugs using their two-dimensional structures. Since
   the GDSC dataset does not provide the structural identifiers for the
   tested drugs, we used PubChemPy to retrieve the SMILES representations
   for each tested compound, for a total of 229 unique drugs. We employed
   the Extended-Connectivity Fingerprints (ECFP)^[257]39 and ChemBerta
   fingerprints^[258]40 for drug featurization. Both ECFP and ChemBerta
   fingerprints were computed using the MolFeat
   ([259]https://molfeat.datamol.io/) library. As a control, we also
   introduced a OneHot embedding technique. This approach produced a
   229-dimensional vector for each drug, with the ith position marked as
   ‘1’ for the ith drug and ‘0’ in all other positions. In parallel, we
   also explored alternative featurization strategies for the cell lines.
   On one hand, we considered the log2 + 1 normalized expression values of
   a total of 18.174 genes. We also explored a dimensionality reduction
   technique, specifically Principal Component Analysis (PCA), retaining
   eigenvectors describing 90% of the total variance (395 components). PCA
   analysis is performed through the “PCA” function from Scikit-learn
   library (ref. ^[260]41 and fitted on the log2 + 1 normalized expression
   values.

MOA-pathway annotation

   To identify pathways potentially involved in the mechanism of action
   (MOA) of a certain drug, we implemented a pipeline with three different
   attribution criteria: an LLM attribution criterion, a pathway
   membership criterion and a manual Reactome pathway annotation
   criterion. Below we provide detailed explanations for each of these
   three criteria.

Extending drug MOA with LLM

   We employed two distinct Large Language Models (LLMs) for the
   intelligent extraction and interpretation of drug-related information:
   GPT-4, a proprietary model developed by OpenAI
   ([261]https://openai.com/chatgpt), and Mistral Instruct, which is
   freely available and developed by MistralAI ([262]https://mistral.ai/).
   Information extracted from the LLMs was leveraged to identify
   biological pathways likely influencing drug efficacy, which can then be
   used to select on which genes a model should be trained. The full list
   of curated pathways for each criterion is reported in Supplementary
   data [263]3 and [264]S9 for GDSC and PRISM respectively.

GPT-4 based pipeline

   The GPT-4 pipeline begins with the extraction of GDSC’s drug metadata.
   These primarily include short textual labels describing the mechanism
   of action or the putative target of each drug (253 drugs out of the 286
   available in GDSC). Utilizing a specialized prompt, we engaged GPT-4 to
   expand this basic metadata into a comprehensive textual description,
   which elucidates the drug’s mechanism of action and its metabolic
   pathways in greater detail. We then used the expanded drug description
   as a second specialized prompt to task GPT-4 with the identification of
   the top 15 biological pathways likely to modulate the drug’s efficacy
   as well as to provide a reasoned explanation for the selection of each
   pathway. The model’s ability to elucidate its reasoning offers valuable
   insights into the complex interplay between drugs and biological
   systems, significantly augmenting our understanding of drug responses
   and offering an entry point to possibly debug the pipeline. Both of
   these steps are implemented using the OpenAI API, with a specific
   emphasis on the “function calling” feature introduced in July 2023.
   This feature is pivotal as it enables us to receive responses in a
   structured JSON format, adhering to a pre-specified schema and allowing
   integration in the data pipeline. The selection of pathways by GPT-4 is
   constrained and informed by the Reactome knowledge base, specifically
   it is forced to select pathways only among “Level 1” pathways (see
   “Reactome pre-processing”). This hierarchical approach not only
   provided a comprehensive overview of biological interactions at various
   levels of complexity but also allowed us to effectively control the
   scope of pathways among which GPT-4 operates, balancing the complexity
   and operative cost of our pipeline.

Mixtral pipeline

   We introduce an AI curation strategy employing Mixtral
   Instruct^[265]42, a freely available instruction-tuned 8 × 7 billion
   parameter mixture of experts LLM. Notably, the usage of a freely
   available architecture allowed us to devise a cost-effective and
   reproducible methodology, addressing computational and accessibility
   challenges. Our approach guarantees predictive accuracy on par with
   GPT-4 (or even exceeding) by leveraging three core strategies:
   structured Chain-of-Thought prompts^[266]43 for detailed reasoning,
   Self-Consistency^[267]44 procedures for the minimization false
   positives, and Retrieval Augmented Generation (RAG)^[268]45 for the
   integration of validated scientific literature.

   Operatively the pipeline unfolds across three phases: initial drug
   description generation based on metadata, refinement of these
   descriptions with RAG leveraging PubMed abstracts, and the selection of
   biological pathways through a self-consistency approach. Initially,
   drug descriptions are generated using detailed prompts that elicit
   step-by-step reasoning from Mixtral Instruct, mirroring the GPT-4
   pipeline but with enhanced specificity. To further diminish the
   likelihood of model hallucinations, descriptions are refined by
   integrating a RAG strategy through the Entrez module of the biopython
   python library^[269]46. This involves systematically retrieving and
   synthesizing information from the top 10 PubMed abstracts (sorted by
   relevance) related to each drug. Subsequently, the LLM is tasked with
   refining these initial descriptions, specifically prioritizing
   information from the abstracts in the event of conflicting statements
   and seamlessly integrating any relevant, missing knowledge. In the
   pathway selection phase, we introduce a self-consistency methodology
   where Mixtral Instruct is tasked multiple times (with different random
   seeds) to identify the most relevant biological pathways influencing
   drug efficacy. By considering pathways identified at least twice across
   multiple iterations, we significantly diminished the risk of false
   positives. Due to the absence of function calling capabilities in
   Mixtral Instruct, we interleave “generate” and “choice” functions from
   the Guidance library ([270]https://github.com/guidance-ai/guidance) to
   impose constraints on the LLM’s generative process and obtain
   structured outputs from the LLM. The use of Guidance not only allows
   for a controlled selection among predefined pathways but also ensures
   that the rationale behind each choice is generated before the pathway
   itself. This enhances the causal coherence of the model’s output and
   leverages its autoregressive nature.

   Operationally, we deploy a GPTQ 4-bit quantized version of Mixtral
   Instruct, sourced from the Huggingface model Hub
   ([271]https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ).
   This quantized model configuration strikes a balance between model
   precision and inference speed, significantly reducing computational
   costs. Crucially, its compatibility with NVIDIA V100 GPU cards,
   requiring less than 30 GB of VRAM, enabled us to conduct extensive
   deployment within our High-Performance Computing (HPC) cluster
   efficiently. Furthermore, the use of the vLLM^[272]47 library
   facilitates continuous batching and maximizes GPU utilization during
   inference for drug description generation and refinement. For the task
   of pathway selection, we integrated the Huggingface transformers
   library^[273]48 with the Guidance framework to adhere to Guidance
   requirements. All prompts utilized throughout the Mixtral Instruct
   pipeline, alongside the project’s code, will be publicly released.

Extending drug MOA with target pathways and drug annotations in Reactome

   We also linked drugs with potential Reactome pathways based on their
   putative targets. Utilizing the Reactome API, specifically the
   “referenceEntities” endpoint, we extracted entities classified under
   the ReferenceGeneProduct class for each pathway of “Level 1” (see
   Reactome Preprocessing). This process yielded a comprehensive list of
   gene products associated with each identified pathway. A pathway was
   considered relevant to a drug if its putative target was among the
   pathway’s gene products. We extracted known ligand associations to
   Reactome’s pathways, by using the Reactome API’s “referenceEntities”
   endpoint. Specifically, we retrieved for each of the “Level 1” pathways
   the objects assigned to the ReferenceTherapeutic class. This step
   provided a nested list of compounds along with their common names and
   corresponding identifiers from either PubChem or the Guide to
   Pharmacology for each pathway.

   To merge annotations, we re-employed the PubChemPy pipeline introduced
   in the “Drugs and cells featurization” chapter. For those instances
   where the Guide to Pharmacology was the initial source, this pipeline
   converted identifiers to PubChem format. Since we had previously
   acquired PubChem IDs for molecules within the GDSC dataset, this
   streamlined the integration process. Through this approach, we
   successfully associated pathway information with 66 distinct drugs
   present in the GDSC. We compared retrieved pathways using the three
   distinct attribution criteria using the Python matplotlib_venn library
   ([274]https://github.com/konstantint/matplotlib-venn).

LLM pathway recovery probability

   We assessed the specificity of the LLM in identifying distinct pathways
   for various drug categories by utilizing metadata from the PRISM 2019
   release (refer to “Metadata on PRISM Drug Categorization”). Our
   examination spanned different drug categories, focusing on the pathways
   highlighted by the LLM. For each drug category, we calculated the
   probability of an L1 pathway (see Data pre-processing) to be selected
   as relevant by dividing the number of times that pathways appear in
   that drug category by the total number of drugs within that category.
   To consolidate and visually represent our findings, we compiled the 25
   most commonly identified pathways across all categories, presenting
   them in an annotated heatmap depicted in Supplementary Fig. [275]S3.

Model training

   We employed a dual-strategy training methodology for our predictive
   models, each addressing unique aspects of the drug-cell line
   interaction landscape. The first strategy involved a joint drug and
   cell-line featurization approach under the hypothesis that a dual
   representation of both drug molecules and cells would lead to a more
   nuanced and accurate prediction of how various cancer cell lines
   respond to different drugs. As a second strategy, we employed a
   drug-by-drug modeling approach, where we focused on examining the
   effects of individual drugs on cancer cell lines, with particular
   attention to the transcriptional responses of the cells. This method
   allowed us to delve into the specific mechanisms through which each
   drug influences cell behavior, thereby offering insights into
   drug-specific interactions and responses.

Validation and hyperparameter optimization

   We describe a rigorous approach for validation and hyperparameter
   optimization essential for extracting maximum performance from deployed
   models. To obtain a fair comparison between methods, this process is
   uniformly applied across various models in our study.

   The proposed model-based hyperparameter search procedure leverages the
   Optuna framework^[276]49. Optuna is a cutting-edge tool for automating
   hyperparameter optimization and, specifically, deploys a
   Multi-Objective Tree Parzen Estimator (MO TPE)^[277]50. This estimator
   simultaneously optimizes two key metrics: correlation and mean squared
   error, both evaluated on the validation dataset. By optimizing these
   metrics concurrently, we ensure a balanced approach to model
   performance, focusing on both absolute predictive accuracy and the
   strength of the relationship between predicted and observed values. MO
   TPE leverages a Bayesian-inspired strategy that is initialized with the
   evaluation of a specified budget of random hyperparameters.

   For the models optimized using Optuna in our study, we adhered to a
   structured budget for evaluations. This included 100 initial random
   evaluations, serving as a broad exploration of the hyperparameter
   space. We then conducted an additional 200 evaluations, which were more
   focused or “greedy.” This approach, totaling 300 trials, is designed to
   strike a balance between exploring a wide range of possibilities and
   honing in on the most effective hyperparameters. We ensure transparency
   and reproducibility of our methodology by providing detailed
   information about the prior spaces for the tuned hyperparameters of
   each model.

Baselines from the literature

   Our investigation includes a benchmarking of various models that have
   been recognized as state-of-the-art in the literature^[278]7.

Kernel Ridge Regression (KRR)

   KRR emerges as a pivotal machine learning technique, offering a
   sophisticated blend of ridge regression’s regularization capabilities
   with the kernel trick’s ability to operate in higher-dimensional
   spaces. Fundamentally, KRR extends linear ridge regression by
   incorporating a kernel function, thus enabling the modeling of
   non-linear relationships without explicitly transforming data into a
   high-dimensional space. KRR stands as a benchmark alternative for
   feature selection, distinct from the approaches proposed via Large
   Language Models (LLMs). This method solely utilizes transcriptomic
   data, constructing a separate model for each drug without incorporating
   drug featurizations. We implemented KRR through the Scikit-Learn
   library.

Similarity-Regularized Matrix Factorization (SRMF)

   The Similarity-Regularized Matrix Factorization (SRMF) method^[279]7 is
   an innovative approach for predicting anticancer drug responses in cell
   lines. It leverages the inherent similarities between drugs and cell
   lines to enhance prediction accuracy. Specifically, SRMF incorporates
   chemical structure similarities of drugs and gene expression profile
   similarities of cell lines as regularization terms in the matrix
   factorization model. We implemented this model by modifying the
   original Matlab code available at
   ([280]https://github.com/linwang1982/SRMF).

Full joint models

   Following methodologies that are widely recognized in the literature,
   we developed multiple predictive pipelines intended to effectively
   extrapolate and harness meaningful information from both the drugs and
   the cell lines. Crucially, models in this chapter make use of the
   featurization described in the “Drugs and cells featurization” chapter.

Multi-layer perceptron

   We implemented a customizable multi-layer perceptron (MLP) model
   architecture. The model dynamically constructs its architecture based
   on specified hyperparameters, including the number of input features,
   the number of hidden layers, the number of neurons in each hidden
   layer, and the dropout rate to mitigate overfitting. Each hidden layer
   is normalized using batch normalization to enhance stability and
   employs the ReLU activation function to introduce non-linearity, with
   an optional dropout applied based on the specified rate. The MLP is
   optimized using the AdamW optimizer, leveraging hyperparameters such as
   learning rate and weight decay for regularization. Training involves a
   customizable loss criterion, defaulting to Mean Squared Error Loss for
   regression tasks, with early stopping implemented based on a patience
   parameter to prevent overfitting by halting training if the validation
   metric does not improve for a specified number of epochs. Optimal
   hyperparameters for this model are fixed following the procedure
   described in the “Validation and hyperparameter optimization” chapter.
   The full list of hyperparameters and the prior space of the
   hyperparameters is provided in the supplementary material. The MLP is
   implemented using the PyTorch library^[281]51.

XGBoost

   XGBoost^[282]52 is a highly efficient machine learning algorithm based
   on gradient boosting, using decision trees to iteratively improve
   predictions. It includes innovations like regularization to prevent
   overfitting, efficient tree pruning, scalable handling of missing data,
   and optimized splitting algorithms for large-scale datasets. Crucially,
   given the high dimensionality of our dataset, we leverage the GPU
   acceleration included in XGBoost from version 2.1.0. Optimal
   hyperparameters for this model are fixed following the procedure
   described in the “Validation and hyperparameter optimization” chapter.
   XGBoost is implemented through the official library
   ([283]https://xgboost.readthedocs.io/en/stable/).

Drug-by-drug models

   We delineate the methodology employed for developing a model to predict
   cancer cell line responses to drugs, focusing exclusively on
   transcriptomic data. The following methodologies were applied to both
   the GDSC and the PRISM dataset, resulting in the development of 286
   models for GDSC and 6337 models for PRISM. Crucially, we develop two
   different types of drug-by-drug models: all genes models and MOA-primed
   models.

All genes models

   Given the consistently superior performance of the XGBoost algorithm,
   as evidenced by our results and a multitude of studies in the
   literature on tabular data, we have selected it as our primary tool for
   developing drug-specific models. Details about this model type can be
   seen in the dedicated “XGBoost” chapter. For the PRSIM dataset, we
   found that the majority of trained models were characterized by overall
   low performances (Fig. [284]4A, blue curve and dots; median correlation
   ρ = 0.04), likely due to moderate and tissue-specific responses of
   cancer cell lines to the many non-oncological drugs present in PRISM.
   On the other hand, the drug models characterized by dispersed
   experimental log-fold changes (LFCs) in their dataset, i.e., with
   InterQuartile Range (IQR) greater than 1, are characterized by a
   substantially higher average performance (median ρ = 0.24,
   Fig. [285]4A, red curve), yielding a total of 713 out of 6337 drug
   models. Based on this result, we decided to restrict the downstream
   analysis only to those models with ρ > 0.2, as they represent most of
   those with IQR > 1 (Fig. [286]4A). A cutoff of ρ > 0.2 is moreover
   consistent with earlier modeling efforts on a previous release of the
   PRISM dataset^[287]5.

MOA-Primed models

   MOA-primed models differ from regular full-gene models in two key
   aspects: firstly, they are trained exclusively on a subset of genes
   identified as relevant to the drug’s MOA by the three criteria
   described in the chapter “MOA pathway annotation.” Reducing model
   training to this subset of genes leads to a significant reduction in
   the number of genes from 18.174 to an average of 4.117, achieving a
   factor of reduction of 4.4. Secondly, this narrower gene focus allows
   for a more comprehensive optimization process within a reasonable
   computation time. In more detail, the MOA-primed model consists of an
   ensemble of three XGBoost models (each comprising five parallel trees)
   trained on different subsets of the training data. This approach, along
   with the reduced number of genes used, greatly decreases the
   overfitting problem both during hyperparameter optimization and
   training. As for other models, optimal hyperparameters are fixed
   following the procedure described in the “Validation and hyperparameter
   optimization” chapter. Differently from other models, we leverage a
   3-fold cross-validation strategy during hyperparameter search.
   Crucially, each XGBoost model belonging to the ensemble is trained on a
   different train-validation split.

Model testing

Data splitting procedure

   To guarantee a robust evaluation of the predictive performance of our
   models downstream, we have implemented a strategic approach to
   partitioning the dataset. This partitioning is guided by the OncoTree
   classification system, which offers a detailed categorization of cell
   lines across different tissues involved in cancer. We selected two
   distinct cell lines from each tissue type specified in the OncoTree
   classification. This diverse selection ensures a representative
   cross-section of cancer types, aiding in the generalizability of the
   model. Subsequently, all drug-cell line pairs related to these selected
   cell lines were systematically assigned to either the training,
   validation, or test sets. Such an approach ensures that each set
   reflects a broad spectrum of tissue types and their corresponding
   responses. To ensure a reliable estimate of our trained models’
   performance, we repeated the train-validation-test procedure described
   earlier 20 times, each with a different random seed. We use random
   seeds from 0 to 19. The median values of the metrics obtained, along
   with their standard deviations, are reported.

Aggregated evaluation

   We conducted a comprehensive comparison of all trained models by
   calculating the Mean Squared Error (MSE) and Pearson correlation
   coefficient for the predicted values across all drug-cell line pairs
   (encompassing all drugs and cell lines) in both the GDSC and PRISM
   datasets. It is important to note that baselines, full joint models,
   and drug-by-drug models adhere to the same standardization schema
   outlined in the section “Data Pre-processing.” As such, before
   evaluating the metrics on the test set, we adjusted both the
   experimental and predicted values (Ys) to their original scales on a
   drug-by-drug basis. The cumulative predictions of the drug-by-drug
   models resulted from pooling forecasts made by individual models.

Drug specific evaluation

   We assessed the models’ local performances by computing the median and
   standard deviation of both MSE and Pearson correlation coefficient
   across 20 random splits for each drug-specific model. This evaluation
   is conducted without reverting the standardized data to its original
   scale. Crucially, this approach assessed a model’s predictive power
   around its mean activity value (either IC50 or LFC). Additionally, we
   derived a final summary metric by calculating the median and standard
   deviation of all these single-model performances, providing a concise
   overview of model efficacy in a drug-centric context.

Model interpretability

Importance computations

   We have adopted a stringent criterion to identify genes that are
   pivotal in our models’ predictions of cancer cell line responses to
   drugs. To determine the importance of genes, we utilized two distinct
   but complementary methods^[288]53: SHAP (SHapley Additive
   exPlanations)^[289]20 importance and Permutation Importance^[290]54. We
   imposed that a gene is relevant if both methods yielded importance
   values greater than zero. Notably, SHAP values provide insights into
   how each gene contributes to the model’s final prediction. This method
   effectively quantifies the impact of each variable (gene) in the
   context of the model’s decision-making process. On the other hand,
   Permutation Importance evaluates the influence of each gene on the
   overall performance of the model. This is achieved by measuring the
   degradation in model performance when the values of a gene are randomly
   shuffled, thereby disrupting the gene’s original relationship with the
   response variable. In the results discussed above we chose as reference
   metric for permutation importance a drop in correlation (we provide in
   the released code also computations for MSE). TreeSHAP, as implemented
   in the Python SHAP library, was employed to ascertain the importance of
   each gene ([291]https://shap.readthedocs.io/). Notably, this method
   offers a fast and exact computation of otherwise costly SHAP
   importances for tree-based models. Given the high dimensionality of our
   dataset, encompassing almost 19k genes, we designed a computational
   pipeline to expedite the computation of Permutation Importance.
   Initially, we employed XGBoost built-in feature importance methods
   based on impurity decrease, a standard computation in tree-based
   methods, to significantly narrow down the list of potential genes of
   interest. This approach allowed us to reduce the number of genes to be
   evaluated for importance by at least a factor of ten. To further
   accelerate the process, we batched multiple variable computations by
   designing a Numba-enhanced routine and then performed joint fast
   inference leveraging the GPU-accelerated implementation of XGBoost.
   Feature importance computations were conducted individually for each of
   the 20 random seeds. While the SHAP mean importance values were
   computed exactly, the permutation feature importances were derived by
   randomly shuffling a gene’s values and observing the resultant drop in
   model performance. A gene’s permutation importance, for a given drug
   and a fixed seed, is obtained by averaging importance values obtained
   across three independent repetitions of the shuffling procedure. This
   methodology allowed us to mitigate aleatoric uncertainties inherent in
   the shuffling process, ensuring a more reliable identification of genes
   that are genuinely influential in predicting drug responses in cancer
   cell lines.

Putative and pathway gene recovery

   In evaluating the soundness of our models, we investigated their
   ability to identify known putative targets and associated genes (i.e.,
   those within the same pathway as the putative targets). For each random
   seed, we selected genes with non-zero importance value (as outlined in
   the “Importance Computation” section) and then calculated the frequency
   with which each putative gene appears in this subset of significant
   genes. Additionally, we conducted a parallel analysis involving genes
   functionally linked to putative targets through pathways. For each
   random seed and corresponding putative target, we aggregated all genes
   from the pathways involving the targets. We then determined the overlap
   between this aggregated set and the significant genes, documenting both
   the quantity and specific identities of the genes identified. Similar
   to the approach for putative target identification, the rigor of the
   selection criteria can be adjusted by focusing on pathway genes that
   consistently emerged across various random seeds. We refined our
   evaluation by calculating the empirical distribution of recovery rates
   for each drug, starting with the importation and binarization of gene
   importance scores—SHAP values (converted to 1 if greater than 0) and
   permutation importances (set to 1 if less than 0). By intersecting
   genes with both metrics equal to one, we identified those of dual
   significance. A baseline distribution of recovery frequencies across
   splits was established by averaging these intersected, binarized
   importances. This drug-specific procedure allowed us to derive critical
   quantiles (90th, 95th) as benchmarks for significant recovery. We then
   compared the putative targets’ recovery rates to these percentiles.

Pathway enrichment

   We performed pathway enrichment on the list of important genes (see
   above section) identified by each All-Genes model using the GSEAPY (v
   1.0.6) package^[292]55. We performed an over-representation analysis
   using the enrichr function, considering as categories the
   “Reactome_2022” pathways from the MsigDB library^[293]56, version
   2023.1.Hs. We considered all the pathways with FDR < 0.1. To check if
   an enriched pathway represents a mechanism of action of the
   corresponding drug, we checked if it is connected to the MOA-pathways
   that we compiled for each drug by traversing the graph of Reactome
   pathway hierarchy using the python-igraph (v 0.9.10) library. If the
   enriched pathway is either a parent or a child of any MOA-pathway for
   that drug, it is considered to match the MOA for that drug. Enrichment
   analysis was performed for both GDSC and PRISM. Pathway enrichments
   obtained through this analysis are reported in Supplementary
   datas [294]4 and [295]10 for GDSC and PRISM respectively. Enrichment
   data obtained is also used to assess the proportion of MOA pathways
   that are overrepresented across the three criteria (refer to the “MOA
   Pathway Annotation” section). To generate Fig. [296]3D, we only
   considered the models exhibiting a correlation exceeding ρ > 0.5.

   We compared significantly enriched pathways on GDSC and PRISM models
   using the Python matplotlib_venn library
   ([297]https://github.com/konstantint/matplotlib-venn).

Clustering analysis of drug and MOA-related pathways

   To elucidate the intricate relationships between drug MOA and their
   biological pathways, we clustered pathway enrichments and applied
   stringent statistical filters to identify and visualize significant
   drug-pathway associations. We focused on drug models with a correlation
   greater than 0.5 and the pool of MOA pathways found enriched, with a
   FDR below 0.1, in at least one drug model (see “Pathway enrichment”).
   For the clustering approach, we employed hierarchical clustering to
   organize both the drugs and the MOA pathways based on their patterns of
   association. This method allows for the grouping of drugs with similar
   pathway profiles and, conversely, pathways commonly influenced by a
   cohort of drugs. The clustering was conducted using the Ward method
   within the clustermap function from the seaborn python library
   ([298]https://seaborn.pydata.org/, v0.12.2). The Ward method minimizes
   variance within clusters, thereby enhancing the interpretability of
   complex relationships by showcasing clusters of drugs and pathways with
   similar biological effects.

Correlating IC50s, predicted IC50s, Gene Expressions and SHAP values

   We aimed to elucidate the relationship between IC50 values, predicted
   IC50s, gene expressions, and SHAP values to better understand the
   dynamics of drug response in cell lines. For a given drug, we sorted
   the IC50 values from the lowest to the highest. This sorting strategy
   enabled simultaneous comparison of actual versus predicted IC50 values,
   gene expression levels, and SHAP values corresponding to the gene
   expression across cell lines. The result of this analysis is depicted
   in Fig. [299]2C, specifically focusing on the interaction between the
   Venetoclax drug and the BCL2 gene. This figure serves to visually
   convey the complex interplay and correlations among the variables of
   interest, providing insights into the predictive model’s predictions
   and the gene’s influence on drug sensitivity. To improve the graphical
   representation and clarity of trends within the gene expression and
   SHAP value data, both datasets were subjected to smoothing using a
   5-lag moving window.

Gene essentiality analysis

   We assessed the potential of local SHAP values, representing gene
   impacts on drug-cell line pair predictions, to identify genes involved
   in gene essentiality, as evidenced by CRISPR knock-out screenings.
   Utilizing a dataset from Pacini et al.^[300]23, which lists essential
   genes across tissues and CRISPR Gene dependencies (see “Datasets”), we
   tailored this analysis to include only those cell lines and tissues
   mentioned therein. This required harmonizing tissue nomenclature
   differences between our dataset and Pacini et al., notably merging
   Esophagus and Stomach categories into a single group and mapping other
   tissues’ names to match our dataset’s structure (the mapping will be
   released together with the code to reproduce the analysis). Our
   analysis was further refined to drug-cell line pairs exhibiting the
   highest sensitivity, based on IC50 values and quantile score (see
   “Quantile Score”). This results in retaining 266 unique drugs and
   selecting 43,174 “top pairs.” For these pairs, we examined the top k
   most negative SHAP values to identify influential genes, with k set at
   10, 20, 50, and 100 thresholds. We remark on how negative SHAP values
   indicate genes whose inhibition likely enhances drug efficacy, aligning
   with the biological outcomes expected from CRISPR knockouts that lead
   to reduced cell viability. By aggregating these key genes and comparing
   them against the catalog of essential genes from the reference genes,
   we aimed to ascertain the extent to which local importance estimates
   could recover genes critical for cell survival and implicated in gene
   essentiality. We constructed protein-protein interaction networks using
   STRING^[301]57 within the Cytoscape platform^[302]58. Nodes within the
   network were annotated with degree information and a cumulative SHAP
   value pooled from the set SHAP values over ‘top pairs.’

Responsiveness similarity analysis

   After GDSC model training, we characterize cell lines through a
   286-dimensional vector representation derived from the aggregated
   predictions of these models. This defines a “response space” which
   complements transcriptomic similarity analyses and can be used to
   compare samples based on their drug responsiveness similarity. This
   approach enables two key applications: clustering of cell lines to
   identify groups with similar drug sensitivities and the use of the
   Python library faiss for efficient nearest neighbor searches in the
   response space.

Model inference on TCGA

Quantile score

   IC50 values are a commonly used measure of drug potency. However, raw
   values may not fully capture the nuances of drug efficacy and
   specificity, particularly for cytotoxic drugs, which could feature
   systematically lower IC50 levels. We propose a scoring system aimed at
   providing a more comprehensive assessment of a drug’s potential.

   Our approach integrates two dimensions: drug efficacy and drug
   specificity. Drug efficacy is defined as the likelihood of identifying
   an alternative drug within the dataset that possesses a higher IC50
   value than the current drug in question. This measure reflects the
   drug’s ability to inhibit or kill cancer cells at lower concentrations,
   thereby indicating its potency. On the other hand, drug specificity
   evaluates the probability of finding another cell line upon which the
   given drug exhibits a higher IC50 value. This dimension captures the
   drug’s targeted action, ensuring that it effectively inhibits the
   growth of specific cancer cells with minimal effects on others, thereby
   reducing potential side effects.

   To synthesize these dimensions into a singular, coherent score, we
   employed the harmonic mean of the two probabilities. The harmonic mean,
   a type of average, is particularly suited for our purpose because it
   tends to skew towards the lower of its inputs, ensuring that both high
   efficacy and high specificity are necessary for a drug to achieve a
   high score. As such, our metric ranges between 0 and 1, where a score
   closer to 1 indicates a drug that is both effective and specific. This
   property of the harmonic mean serves to penalize drugs that are highly
   effective against a broad range of cell lines (potentially indicating
   high toxicity) or highly specific but with limited overall efficacy.
   [MATH:
   <mi>S</mi><mi>p</mi><mi>e</mi><mi>c</mi><mi>i</mi><mi>f</mi><mi>i</mi><
   mi>c</mi><mi>i</mi><mi>t</mi><mi>y</mi><mo>=</mo><mi>P</mi><mfenced
   close=")"
   open="("><mrow><msub><mrow><mi>Y</mi></mrow><mrow><mi>d</mi></mrow></ms
   ub><mspace width="0.25em"></mspace><mo>></mo><mspace
   width="0.25em"></mspace><msubsup><mrow><mi>y</mi></mrow><mrow><mi>d</mi
   ></mrow><mrow><mi>c</mi></mrow></msubsup></mrow></mfenced> :MATH]
   1
   [MATH:
   <mi>E</mi><mi>f</mi><mi>f</mi><mi>i</mi><mi>c</mi><mi>a</mi><mi>c</mi><
   mi>y</mi><mo>=</mo><mi>P</mi><mrow><mo>(</mo><mrow><msub><mrow><mi>Y</m
   i></mrow><mrow><mi>c</mi></mrow></msub><mspace
   width="0.25em"></mspace><mo>></mo><mspace
   width="0.25em"></mspace><msubsup><mrow><mi>y</mi></mrow><mrow><mi>d</mi
   ></mrow><mrow><mi>c</mi></mrow></msubsup></mrow><mo>)</mo></mrow>
   :MATH]
   2
   [MATH: <mi>Q</mi><mi>S</mi><mo>=</mo><mfrac><mrow><mn>2</mn><mspace
   width="0.25em"></mspace><mo>*</mo><mrow><mo>(</mo><mrow><mi>S</mi><mi>p
   </mi><mi>e</mi><mi>c</mi><mi>i</mi><mi>f</mi><mi>i</mi><mi>c</mi><mi>i<
   /mi><mi>t</mi><mi>y</mi><mo>*</mo><mi>E</mi><mi>f</mi><mi>f</mi><mi>i</
   mi><mi>c</mi><mi>a</mi><mi>c</mi><mi>y</mi></mrow><mo>)</mo></mrow></mr
   ow><mrow><mi>S</mi><mi>p</mi><mi>e</mi><mi>c</mi><mi>i</mi><mi>f</mi><m
   i>i</mi><mi>c</mi><mi>i</mi><mi>t</mi><mi>y</mi><mo>+</mo><mi>E</mi><mi
   >f</mi><mi>f</mi><mi>i</mi><mi>c</mi><mi>a</mi><mi>c</mi><mi>y</mi></mr
   ow></mfrac> :MATH]
   3

TCGA inference and cancer-specific drug prescription validation

   We aimed to validate our models using the TCGA dataset. This allowed us
   to assess whether the proposed pipeline can accurately identify
   tumor-specific clinical drug prescriptions annotated in the GDSC
   through the use of processed NCI indications (refer to the “Dataset”
   section). A key step in our approach was aligning TCGA data with CCLE
   using the Celligner method (detailed in “Alignment of patients’ bulk
   RNAseq with Celligner”), which enabled model training on CCLE data and
   subsequent deployment to TCGA. Moreover, an important part of this
   validation was to stratify TCGA patients by cancer type, utilizing the
   extensive metadata available within the TCGA dataset. Our analysis
   encompassed 9,805 TCGA samples for 41 clinically relevant drugs, with
   models’ predictions ranked by logIC50 values and quantile scores.

   We have performed additional analyses to validate the statistical
   significance of our model’s predictions on TCGA dataset and defined an
   optimal number of k sample to consider for downstream analysis. First,
   we assessed for different top k samples the proportions of TCGA samples
   predicted for a drug prescribed for a specific cancer type, relative to
   the baseline frequency of the corresponding cancer type on TCGA. We
   assessed the extent of the model to predict samples from prescribed
   cancer types via classical machine learning metrics, considering as
   negatives all the cancer types for which a given drug is not
   prescribed. We proceeded as follows:
    1. We fixed a number k of TCGA samples to prioritize starting from our
       model’s predictions. Specifically, we selected the union of the k
       predictions with the lowest predicted IC50 values and the k
       predictions with the highest quantile scores.
    2. We computed the relative proportions of tumor types in the entire
       TCGA dataset to obtain a baseline proportions vector.
    3. For each drug, we calculated the relative proportions of tumor
       types within the extracted samples.
    4. For each drug, we compared the predicted proportions with the
       baseline distribution. Given that we have ground truth indications
       for several drugs (considering as negatives all the cancer types
       for which a given drug is not prescribed):
          + True Positives (TP): Tumor types where the drug is actually
            prescribed and are relatively more abundant in the predicted
            proportions.
          + False Positives (FP): Tumor types where the drug is not
            prescribed but are predicted as more abundant.
          + True Negatives (TN): Tumor types where the drug is not
            prescribed and are not predicted as more abundant.
          + False Negatives (FN): Tumor types where the drug is prescribed
            but are not predicted as more abundant.
       This classification allows us to construct a 2 × 2 confusion matrix
       for each drug.
    5. Using the confusion matrix, we compute standard performance metrics
       including accuracy, precision, recall, specificity, and F1-score.

   We repeated the analysis over a range of k values, from 100 to 2500 in
   increments of 100. As expected, we found that higher k Values led to
   higher recall rates in both analyses due to the inclusion of more
   samples, but resulted in lower precision. On the other hand, lower k
   values offered higher precision by focusing on the most confident
   predictions, but reduced recall. We determined that k = 600 is the best
   tradeoff, since it maximized the F1-score, effectively balancing
   precision and recall.

   A similar analysis is carried out also on the PRISM dataset. We first
   ranked the models by performance, then curated a list of the top 20
   non-oncological drug models. For each model, predictions were ordered
   by Log Fold Change (LFC) from lowest to highest, and the initial 600
   samples were selected. These samples were stratified by tumor type
   within the TCGA patient dataset to identify potential drug repurposing
   opportunities.

   In conclusion, utilizing the same pipeline, we identified and
   quantified patients within the TCGA dataset where specific drug pairs
   are concurrently recommended, indicating potential synergies. We
   corroborated our findings by cross-referencing manually with the
   approved drug combinations in DrugBank^[303]59, successfully
   identifying both known and approved combinations.

Experimental validation on PDAC samples

Ranking PDAC cell lines for selected tumor types with Celligner

   To determine which PDAC cell lines most closely resemble the tumor
   types (Pancreatic adenocarcinoma, Esophagogastric adenocarcinoma,
   Invasive breast carcinoma and head and neck squamous carcinoma) derived
   from Fig. [304]7A, Celligner
   ([305]https://depmap.org/portal/celligner/) was used. For each of these
   tumors, we ranked and selected PDAC cell lines based on their Euclidean
   distance from that tumor type. The Euclidean distance was calculated as
   the average distance between a cell line and all TCGA tumor samples of
   that tumor type.

Cell lines, culture and treatments

   The following human PDAC cell lines were used: CFPAC1 (KRAS G12V; ATCC
   CRL-1918) and PANC1 (KRAS G12D; ATCC CRL-1469), were used for the
   following experiments. CFPAC1 cells were maintained in Iscove’s
   Modified Dulbecco’s Medium (IMDM; Euroclone) + 10% fetal bovine serum
   (FBS) while PANC1 were maintained in Dulbecco’s Modified Eagle Medium
   (DMEM; Euroclone) + 10% fetal bovine serum (FBS). Media were all
   supplemented with 2 mM L-glutamine. The cell line was authenticated by
   the IEO Tissue Culture Facility using the GenePrint10 System (Promega)
   and was routinely screened for Mycoplasma contamination.Irinotecan
   (Sigma-Aldrich, I1406) and Etoposide (Sigma-Aldrich, E1383) were
   diluted in DMSO and used at 3 different concentrate ions for the cell
   viability experiments. No commonly misidentified cell lines were used
   in the study.

Cell viability assays

   5–10,000 cells per well were plated in 96-well plates. One day after,
   cells were treated with the drug (Irinotecan or Etoposide). The cell
   viability was measured after 72 h of treatment using CellTiter-Glo
   Luminescent Cell Viability Assay (Promega, G9242) and GloMax®
   (Promega). The assay was performed three times in triplicates.

Experimental validation on GBM samples

Human Subjects

   The research was conducted in compliance with the principles outlined
   in the Declaration of Helsinki, and the protocol for sample collection
   received approval from the Ethics Committee of the University Hospital
   of Pisa (787/2015).

   Tumor specimens were obtained from 64 patients who underwent surgical
   resection of histologically confirmed GBM after providing informed
   consent. Samples were acquired from the Unit of Neurosurgery of Livorno
   Civil Hospital. All patients had a GBM diagnosis without a prior
   history of brain neoplasia and did not exhibit R132 IDH1 or R172 IDH2
   mutations. Resected tumors were preserved in MACS tissue storage
   solution (Miltenyi Biotec, Bergisch Gladbach, Germany, cod.
   130-100-008) at 4 °C for 2–4 h. Each tumor specimen was rinsed with
   Dulbecco’s phosphate-buffered saline (DPBS) (Gibco, Carlsbad, CA, USA,
   cod. 14190094) within a sterile dish and divided into ~0.5–1 mm^2
   pieces under a biological hood. Biopsies were cryopreserved in 90%
   fetal bovine serum (FBS, Gibco, Carlsbad, CA, USA, cod. 26140079) and
   1% dimethyl sulfoxide (DMSO, Invitrogen, Waltham, USA, cod.
   [306]D12345) at −140 °C until further preparations. We replaced the
   exact age of the patients with a categorical variable indicating
   whether the patient’s age is above or below the threshold of 55 years.
   This threshold is widely used in clinical diagnostics for glioblastoma
   and maintains the clinical relevance of the data while eliminating the
   possibility of identifying individuals. The GBM samples were collected
   after the participants signed the informed consent form. The sex of the
   participants was determined based on self-report All clinical and
   molecular data of GBM samples are provided in Supplementary
   data [307]17.

RNA Isolation

   RNA extraction from GBM tissues was performed using the Maxwell 16 LEV
   Simply RNA Tissue Kit (Promega, Madison, WI, USA, cod. AS1270)
   according to the manufacturer’s protocol. The concentration of RNA was
   assessed using the Qubit Fluorometer (Thermo Fisher Scientific,
   Waltham, MA, USA, cod. [308]Q10210), and quality was evaluated using
   the Agilent 2200 Tapestation (Agilent Technologies, Santa Clara, CA,
   G2964AA) system.

Whole Transcriptome RNA Analysis Libraries

   RNA-Seq was performed on the NextSeq 500 platform (Illumina, San Diego,
   CA, USA). Libraries were prepared using the Illumina Stranded Total RNA
   Prep with Ribo-Zero Plus kit (Illumina, cod. 20040525), starting from
   200 ng of total RNA, following the manufacturer’s protocol. The
   libraries were quantified using Qubit reagents (Thermo Fisher
   Scientific, Waltham, MA, USA, cod. [309]Q10210) and analyzed for
   validation using TapeStation (Agilent Technologies, Santa Clara, CA,
   G2964AA). Up to 10 libraries were loaded onto the NextSeq High Output
   cartridge (150 cycles) (Illumina, San Diego, CA, USA, cod. 20024908).

Human Subjects for GBM Primary Cell Cultures

   Tumor specimens were obtained from 2 patients who underwent surgical
   resection of histologically confirmed GBM after providing informed
   consent. Samples were acquired from the Unit of Neurosurgery of Livorno
   Civil Hospital. All patients had a GBM diagnosis without a prior
   history of brain neoplasia and did not exhibit R132 IDH1 or R172 IDH2
   mutations. The patients included one female and one male, aged 37 and
   76 years, respectively. Resected tumors were preserved in MACS tissue
   storage solution (Miltenyi Biotec, Bergisch Gladbach, Germany, cod.
   130-100-008) at 4 °C for 2–4 hours. Each tumor specimen was rinsed with
   Dulbecco’s phosphate-buffered saline (DPBS) (Gibco, Carlsbad, CA, USA,
   cod. 14190094) within a sterile dish and divided into ~ 0.5–1 mm^2
   pieces under a biological hood. Biopsies were cryopreserved in 90%
   fetal bovine serum (FBS; Gibco, Carlsbad, CA, USA, cod. 26140079) and
   1% dimethyl sulfoxide (DMSO; Invitrogen, Waltham, USA, cod.
   [310]D12345) at −140 °C until further preparations.

GBM primary cell cultures

   Patient-derived GB living tissues were dissociated into single-cell
   suspensions using mechanical dissociation combined with enzymatic
   degradation, utilizing the Brain Tumor Dissociation Kit (Myltenyi
   Biotech, Bergisch Gladbach, Germany, cod. 130-095-942), following the
   manufacturer’s instructions. The culture medium consisted of DMEM/F12
   medium (Gibco, Carlsbad, CA, USA, cod. 12634010), supplemented with 10%
   FBS (Gibco, Carlsbad, CA, USA, cod. 26140079), 100 U/mL penicillin, and
   0.1 mg/mL streptomycin (Gibco, Carlsbad, CA, USA, cod. 15070063), 1%
   Amphotericin B (Gibco, Carlsbad, CA, USA, cod. 15290026), 1% G-5
   supplement (Gibco, Carlsbad, CA, USA, cod. 17503012), and 1% Glutamax
   (Gibco, Carlsbad, CA, USA, cod. 35050061). Cultures were maintained in
   a 5% CO[2] atmosphere at 37 °C. The medium was replaced the day after
   culture. Upon reaching confluency, cells were seeded for viability
   assays, WST-1, and Crystal Violet, as described below. No commonly
   misidentified cell lines were used in the study.

Treatments of GBM primary cell cultures

   AZD5591 (Aurogene S.r.l, Rome, Italy, cod. S8643) and AZD5582 (Sigma,
   St. Louis, MI, USA, cod. SML2900) were dissolved in DMSO (Invitrogen,
   Waltham, USA, cod. [311]D12345) and water, respectively, to create
   stock concentrations of 200 mM and 4.8 mM, respectively. Dilutions of
   these drugs for treatment were prepared with the cell medium. We used
   concentrations of 2, 5, 10, 20, 50, and 100 uM of AZD5991 or AZD5582,
   or an equivalent volume of vehicle (DMSO or water) for control groups.

Viability and cytotoxicity assays

   We used the Crystal Violet (CV) assay to evaluate cytotoxicity of the
   same chemicals on the 2 different cell types. For this assay, 2500
   cells were seeded in 48-well plates (3 wells per experimental
   condition). After 72 h post-treatment, cells were fixed with 4% PFA
   (Thermo Fisher Scientific, Waltham, MA, USA, cod. Q10210cod. J19943.K2)
   and stained using a crystal violet solution (0.1% crystal violet, 20%
   methanol, in water). Following staining, the excess crystal violet was
   washed with tap water, and plates were dried. Cells were de-stained
   using a 10% acetic acid solution, and the absorbance of the solution
   was then measured at 590 nm. IC50 values were computed using python’s
   SciPy, library reimplementing the procedure of Corsello (2020).

Software

   We employed customized scripts in python (version 3.8.11), using
   matplotlib (v3.6.0), seaborn (v0.11.1), and biopython (v1.78)
   libraries.

Reporting summary

   Further information on research design is available in the [312]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [313]Supplementary Information^ (1.7MB, pdf)
   [314]Peer Review File^ (3.6MB, pdf)
   [315]41467_2025_56827_MOESM3_ESM.pdf^ (87KB, pdf)

   Description of Additional Supplementary Files
   [316]Supplementary Data 1–17^ (42.3MB, zip)
   [317]Reporting Summary^ (736.9KB, pdf)

Source data

   [318]Source Data^ (17.8MB, xlsx)

Acknowledgements