Abstract Cancer dependency maps have accelerated the discovery of tumor vulnerabilities that can be exploited as drug targets when translatable to patients. The Cancer Genome Atlas (TCGA) is a compendium of ‘maps’ detailing the genetic, epigenetic and molecular changes that occur during the pathogenesis of cancer, yet it lacks a dependency map to translate gene essentiality in patient tumors. Here, we used machine learning to build translational dependency maps for patient tumors, which identified tumor vulnerabilities that predict drug responses and disease outcomes. A similar approach was used to map gene tolerability in healthy tissues to prioritize tumor vulnerabilities with the best therapeutic windows. A subset of patient-translatable synthetic lethalities were experimentally tested, including PAPSS1/PAPSS12 and CNOT7/CNOT78, which were validated in vitro and in vivo. Notably, PAPSS1 synthetic lethality was driven by collateral deletion of PAPSS2 with PTEN and was correlated with patient survival. Finally, the translational dependency map is provided as a web-based application for exploring tumor vulnerabilities. Subject terms: Cancer genomics, Machine learning, Target identification __________________________________________________________________ Shi et al. present a hybrid dependency map based on machine-learning analysis of gene essentiality data from the DEPMAP database, translated to data from TCGA. This application can be used to visualize other gene essentiality data. Main The rapid expansion of genomic technologies to characterize healthy and diseased patient populations has provided unprecedented resolution to the pathophysiological drivers of cancer and many other diseases. In 2018, TCGA completed a 10-year study of 33 tumor types across ~11,000 patients, which has broadly illuminated the genetic underpinnings of cancer^[58]1. Building on the success of TCGA, multiple other initiatives have been launched to explore aspects of cancer initiation, evolution, metastasis and response to therapy^[59]2–[60]6, with the hope that the deepening molecular characterization of cancer will improve diagnosis, treatment and prevention; however, a critical step toward fully leveraging patient data to eradicate cancer is to assign functionality to the observations made in TCGA that translate putative tumor dependencies to life-saving therapies. One approach to understanding tumor dependencies is through genome-wide genetic and chemical perturbation datasets (for example, DEPMAP^[61]7,[62]8, Project SCORE^[63]9 and Connectivity Map^[64]10) that have been paired with thousands of deeply characterized cancer models (for example, Cancer Cell Line Encyclopedia^[65]11, Cancer Cell Line Factory^[66]12 and Human Cancer Models Initiative^[67]13). Multiple studies have demonstrated the ability of DEPMAP to translate gene essentially to therapeutic targets^[68]14–[69]18 and a broader functional understanding of tumor dependencies^[70]19,[71]20. Compared to TCGA, a differentiating strength of the ‘dependency maps’ is that hypotheses can be readily tested, replicated and refined in different contexts, whereas patient datasets are typically not amenable to functional experimentation; however, the dependency maps also pose limitations when compared to the translatability of TCGA, as homogeneous cell lines in culture dishes do not replicate the pathophysiological complexities of the intact tumor microenvironment^[72]21. Further, the current experimental models do not completely recapitulate the genetic drivers that are present in the patient population^[73]22, and experimental outcomes of genetic perturbation screens do not capture most aspects of disease outcome and patient survival. To address the unique challenges posed by TCGA and DEPMAP, we built a hybrid dependency map (TCGA[DEPMAP]) via machine learning of gene essentiality in the cell-based DEPMAP that was then translated to TCGA patient tumors. As such, TCGA[DEPMAP] leverages the experimental strengths of DEPMAP, while enabling patient-relevant translatability of TCGA. A systematic analysis of TCGA[DEPMAP] revealed tumor vulnerabilities that predicted treatment response and patient outcomes, including lineage dependencies, oncogenes and synthetic lethalities. The flexible machine-learning framework was also used to assemble maps that captured other aspects of patient-relevant features, including translating dependencies to drug responses in the Patient-Derived Xenograft (PDX) Encyclopedia (PDXE[DEPMAP]) and tolerability within healthy tissues of the Genotype-Tissue Expression project (GTEX[DEPMAP]). Combined with a user-friendly and freely available web-based application, these data provide a resource for identifying patient-relevant tumor vulnerabilities that can be exploited as drug targets. Results Predictive modeling of gene essentiality To begin building the translational dependency maps, predictive models of gene essentiality were trained on genome-wide CRISPR-Cas9 knockout screens from the DEPMAP^[74]8 using elastic-net regularization for feature selection and modeling^[75]23 (Fig. [76]1a). Genome-wide gene essentiality scores for DEPMAP cancer cell models (n = 897) were estimated by CERES^[77]24, which measures the essentiality of each gene relative to the distribution of effect sizes for common essential and nonessential genes within each cell line^[78]25. Because many genes do not impact cell viability, elastic-net models were attempted only for genes with at least five dependent and nondependent cell lines, which included 7,260 out of 18,119 genes (40%) with gene essentiality scores in the DEPMAP. In addition to gene essentiality scores, the input variables for elastic-net predictive modeling included genome-wide gene expression, mutation and copy number profiles for each cancer cell model. Based on previous evidence that predictive modeling of gene essentiality with RNA expression performed comparably to similar modeling that also included DNA features^[79]26,[80]27, two sets of elastic-net models were compared using RNA alone (expression only) or combined with mutation and copy number profiles (multi-omics). Finally, the best fitting elastic-net models were selected by a tenfold cross-validation to identify models with the minimum error, while balancing the predictive performance with the number of features selected ([81]Methods). Fig. 1. Predictive modeling of gene essentiality in the DEPMAP. [82]Fig. 1 [83]Open in a new tab a, Schematic of the elastic-net models for predictive modeling of gene essentiality in the DEPMAP using expression-only data or multi-omics data. Note the broad overlap in cross-validated models using expression-only or multi-omics data. b, Distribution of the number features per multi-omics model. c, Distribution of the number of features per expression-only model. d, Number of features per multi-omics model that passed (n = 2,045) or failed (n = 5,215) cross-validation based on a correlation coefficient of 0.2 threshold. e, Number of features per expression-only model that passed (n = 1,966) or failed (5,294) cross-validation based on a correlation coefficient of 0.2 threshold. For d and e, the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the fifth and 95th percentiles. f, Rank of the target gene (self) as a feature in the cross-validated multi-omics models. g, Rank of the target gene (self) as a feature in the cross-validated expression-only models. h, Comparison of model performance (correlation coefficients) of cross-validated models from multi-omics and expression-only data. Note for b–h that the performance and characteristics of multi-omics and expression-only models are very similar. P values indicated on graphs were determined by the Wilcoxon rank-sum test for two-group comparison (d and e). [84]Source data The elastic-net models for predicting essentiality of the 7,260 genes (as described above) were compared by tenfold cross-validation (Pearson’s r > 0.2; false discovery rate (FDR) < 1 × 10^−3) when considering expression-only or multi-omics data as input variables (Supplementary Tables [85]1 and [86]2). The distribution of features per model skewed higher in the multi-omics models (3–510 features, median of 98) (Fig. [87]1b) compared to the expression-only models (3–369 features, median of 80) (Fig. [88]1c) and the performance of both improved with the number of features per model (Fig. [89]1d,e). Of the 7,260 models, cross-validation confirmed 1,966 expression-only models and 2,045 multi-omics models, of which most cross-validated models overlapped (n = 1,797) (Supplementary Table [90]3). The incidence of self-inclusion of the target gene in the cross-validated models was also similar between multi-omics dataset (31% of models) (Fig. [91]1f) and expression-only dataset (26% of models) (Fig. [92]1g). The majority of cross-validated models (76%) performed comparably (within a correlation coefficient of 0.05) using either expression-only or multi-omics data. Likewise, 86 out of 103 annotated oncogenes (84%) with cross-validated models performed similarly using either expression-only or multi-omics datasets (for example, HER2, BRAF and PIK3CA), with a few notable examples that included the oncogenes: NRAS, FLT3 and ARNT (Fig. [93]1h and Extended Data Fig. [94]1a–e). Collectively, these data demonstrate that predictive models of gene essentiality with expression-only (Supplementary Table [95]1) and multi-omics (Supplementary Table [96]2) data as input variables perform comparably in detecting selective vulnerabilities of cancer in most cases (Supplementary Table [97]3). Extended Data Fig. 1. The characteristics of gene essentiality models before and after transcriptional alignment cell models and patient tumor biopsies. [98]Extended Data Fig. 1 [99]Open in a new tab (a) The performances of expression-only and multi-omics models of gene essentiality were compared across 103 annotated oncogenes. Note the strong correlation of expression-only and multi-omics models with a few notable outliers, such as NRAS, FLT3 and ARNT. (b) The distribution of the number of features for the multi-omics models for the 103 annotated oncogenes. (c) The number of features per multi-omics model for the 103 annotated oncogenes that passed (n = 95) or failed (n = 102) cross-validation. (d) The distribution of the number of features per expression-only models for the 103 annotated oncogenes. (e) The number of features per expression-only model for the 103 annotated oncogenes that passed (n = 101) or failed (n = 96) cross-validation. Note similarities in the characteristics and performances of multi-omics and expression-only models, and that only 7% of the multi-omics models significantly outperformed the expression-only models in the cross-validation while 84% were comparable when applying a cutoff of 0.05 correlation coefficient difference between models as a meaningful improvement in performance. As a reference using the same criteria 15% of multi-omics models outperformed expression-based models and 76% were comparable when we used the whole set of 2,211 models. (f, g) The heatmaps show the Pearson correlation between the gene expression of DepMap and TCGA before (f) and after (g) expression alignment by identification and removal of the most variant signatures (cPC1–4; that is, stromal signatures) before elastic-net ML. The rows are TCGA lineages and columns are DepMap lineages. (h) Shows that the correlation of expression for the same lineage (n = 22) in TCGA and DepMap is significantly improved by our expression alignment pipeline. (i) Comparison of expression-only elastic-net models for gene essentiality and gene mutational status (n = 890). To make performance metrics (AUC) comparable with binary mutational status, the essentiality scores were binarized using a –0.5 essentiality score as a cutoff. To calculate the accuracy of predicting dependencies and mutations, elastic-net machine learning was run to predict mutations and essentiality using the same settings and expression data for 891 genes with mutations at >2% prevalence in TCGA[DEPMAP] patients. Of note, the elastic-net models were allowed to select the most informative predictive features for mutation and essentiality for each gene, as the best predictors for essentiality may not be the best features to predict mutation. For (C,E,H,I), the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the 5th and 95th percentiles. The two-sided Wilcoxon rank test was used for (C,E,H) and for (I) ****P < 0.0001 by Student unpaired t-test. [100]Source data Constructing TCGA[DEPMAP] TCGA[DEPMAP] was built using the expression-only elastic-net models of gene essentiality, based on the evidence here (Fig. [101]1) and elsewhere^[102]26,[103]27 that the performance of most models was comparable to those including genomic features. Moreover, as genetic information is withheld from the expression-only elastic-net models, the transposed essentiality scores can be correlated with genetic drivers in TCGA[DEPMAP] patients who might otherwise be missed in cancer cell models. Finally, expression-based predictive modeling of essentiality can also be extended to non-oncological studies (for example, GTEX), which do not have somatic mutations and copy number changes^[104]28. As outlined in Fig. [105]2a, the expression-based predictive models of DEPMAP dependencies were transposed using the transcriptomic profiles of 9,596 TCGA patients, following alignment to account for differences between the expression profiles of cell lines and tumor biopsies with varying stromal content. The importance of transcriptional alignment was evident from the strong correlation of the 1,966 cross-validated gene essentiality models with the tumor purity of TCGA samples (Fig. [106]2b). To overcome this issue, expression data from DEPMAP and TCGA were quantile normalized and transformed by contrastive principal-component analysis (cPCA), which is a generalization of the PCA that detects correlated variance components that differ between two datasets. The removal of the top four principal components (cPC1–4) between the DEPMAP and TCGA transcriptomes significantly reduced the correlation of tumor dependencies with tumor purity (Fig. [107]2b) and improved the alignment of the expression-based dependency models (Fig. [108]2c,d and Extended Data Fig. [109]1f–h). Enrichment analysis of gene essentiality scores with correlation coefficients that changed the most between the pre- and post-aligned models revealed a significant enrichment of pathways related to the stroma (Supplementary Table [110]4). Combined, these data demonstrate that without transcriptional alignment, the predicted gene essentialities in patient samples were strongly correlated with tumor purity, which should not be the case when one considers that these dependency models were generated using cultured cancer cell lines without stroma. Fig. 2. Building a translational dependency map: TCGA[DEPMAP]. [111]Fig. 2 [112]Open in a new tab a, Schematic of gene essentiality model transposition from DEPMAP to TCGA, following alignment of genome-wide expression data to account for differences in homogeneous cultured cell lines and heterogenous tumor biopsies with stroma. b, Coefficient of determination (R^2) of the cross-validated gene essentiality models and tumor purity before (n = 1,966) and after transcriptional alignment (n = 1,966). The center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the fifth and 95th percentiles. A two-sided Wilcoxon rank-sum test was performed to test for statistical significance. c, Uniform Manifold Approximation and Projection (UMAP) visualization of normalization of genome-wide transcriptomes improves alignment between cultured cells and patient tumor biopsies with contaminating stroma. d, Correlation coefficients of essentiality profiles of different lineages of cultured cell models and TCGA patient tumors. e, Unsupervised clustering of predicted gene essentiality scores across TCGA[DEPMAP] revealed strong lineage dependencies. Blue indicates genes with stronger essentiality and red indicates genes with less essentiality. f, KRAS dependency was enriched in TCGA[DEPMAP] lineages (n = 9,593) with high frequency of KRAS GOF mutations, including colon adenocarcinoma (COAD), LUAD, STAD, READ, esophageal carcinoma (ESCA) and PAAD. g, KRAS essentiality correlated with KRAS mutations in all TCGA[DEPMAP] lineages (n = 532 for KRAS^mut and n = 7,049 for KRAS^wt). h, BRAF dependency in TCGA[DEPMAP] (n = 9,593) was enriched in SKCM, which has a high frequency of GOF mutations in BRAF. i, BRAF essentiality correlated with BRAF mutations in all TCGA[DEPMAP] lineages (n = 559 for BRAF^mut and n = 7,022 for BRAF^wt). For f–i, the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the fifth and 95th percentiles. For g–i, a two-sided Wilcoxon rank-sum test was performed to test for statistical significance. j, Scatter-plot of model selectivity in TCGA[DEPMAP] and DEPMAP, as determined by normality likelihood (NormLRT). k, Ranking of model selectivity between in TCGA[DEPMAP] and DEPMAP, as determined by the NormLRT scores. ***P < 0.001, as determined by the Wilcoxon rank-sum test for two-group comparison and Kruskal–Wallis followed by Wilcoxon rank-sum test with multiple test correction for the multi-group comparison. CNS, central nervous system; PNS, peripheral nervous system; ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; CESC, cervical and endocervical cancers; CHOL, cholangiocarcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LGG, lower-grade glioma; LIHC, liver hepatocellular carcinoma; MESO, mesothelioma; OV, ovarian serous cystadenocarcinoma; PRAD, prostate adenocarcinoma; SARC, sarcoma; TGCT, testicular germ cell tumors; THCA, thyroid carcinoma; THYM, thymoma; UCEC, uterine corpus endometrial carcinoma; UCS, uterine carcinosarcoma; UVM, uveal melanoma. [113]Source data To further benchmark the accuracy of TCGA[DEPMAP], we tested whether gene essentiality in patient tumors could predict tumor lineages and oncogene dependencies, as has been reported in the cell-based dependency maps^[114]8. The predicted negative values indicate higher predicted essentiality. Unsupervised clustering of gene essentialities across TCGA[DEPMAP] revealed striking lineage dependencies (Fig. [115]2e and Supplementary Table [116]5), including well-known oncogenes such as KRAS (Fig. [117]2f,g) and BRAF (Fig. [118]2h,i). For example, KRAS essentiality was markedly stronger in KRAS-mutant stomach adenocarcinoma (STAD), rectal adenocarcinoma (READ), pancreatic adenocarcinoma (PAAD) and colon adenocarcinoma (COAD) lineages (Fig. [119]2f,g), whereas BRAF essentiality was strongest in BRAF-mutant skin cutaneous melanoma (SKCM) (Fig. [120]2h,i). We more broadly compared oncogene essentiality in TCGA patients with or without a gain-of-function (GOF) event (mutation or amplification), using the list of 100 cross-validated models for oncogenes from the Cosmic Cancer Gene Census ([121]https://cancer.sanger.ac.uk/census). Of the 100 oncogenes, a total of 85 gene essentialities predicted stronger dependencies in patients with a GOF event (Supplementary Table [122]6). To ensure that the associations between dependencies and mutations were not due to the same underlying predictive features, the accuracy of elastic-net models to predict essentiality and somatic mutations in the same genes were compared. The comparison was restricted to genes with cross-validated models of essentiality and somatic mutations with >2% prevalence (n = 891 models). The elastic-net models were allowed to select the most informative predictive features for mutation and essentiality for each gene, as the best predictors for essentiality may not be the best features to predict mutation. Comparison of the area under the curve (AUC) of the two model sets revealed that transcriptomic features were significantly more predictive of gene essentiality compared to mutational status (Extended Data Fig. [123]1i). Considering that the expression-only models of essentiality did not include genomic features, these data further demonstrate that the essentiality scores in TCGA[DEPMAP] can be independently correlated with genomic features in patient tumors. Combined with the evidence that cross-validated gene essentiality models accurately predict cancer lineages, these data suggest that the cross-validated gene essentiality models are accurate and interpretable across a wide range of biological contexts, including oncogenic dependencies. Selective dependencies in TCGA[DEPMAP] Strongly selective dependencies (SSDs) have been characterized in cell-based maps using the normality likelihood ratio test (NormLRT) to rank whether an essentiality fits a normal or t-skewed distribution (selective) across the cohort^[124]20,[125]29. A strength of this approach is the ability to rank SSDs regardless of the underlying mechanisms of dependency (for example, lineage, genetic and expression). To compare the SSDs in patients with cancer and cell models, NormLRT was applied to gene effect scores for the cross-validated essentiality models in TCGA[DEPMAP] and DEPMAP, respectively. Most SSDs (NormLRT > 100) correlated well between TCGA[DEPMAP] and DEPMAP (r = 0.56, P < 0.0001), including KRAS, BRAF, MYCN and many other known SSDs (Fig. [126]2j and Supplementary Table [127]7). Although most SSDs correlated well between TCGA[DEPMAP] and DEPMAP, there were several examples where the SSDs differed between patients and cell models (Fig. [128]2j,k). Notably, the druggable oncogenes (for example, FLT3 and PTPN11) were more prominent SSDs in TCGA[DEPMAP] patients than DEPMAP cell lines, whereas other notable SSDs in the DEPMAP (for example, ATP6V0E1) were less noticeable in TCGA[DEPMAP] (Fig. [129]2j,k). The top predictive features for essentiality of FLT3 (self-expression) and ATPV6V0E1 (paralog expression) did not differ between DEPMAP and TCGA[DEPMAP], yet the distribution and prevalence of strong dependency scores varied across lineages between patients and cell lines (Extended Data Fig. [130]2a–d). Likewise, the dependency on PTPN11 (SHP2) was noticeably more selective in TCGA[DEPMAP] than DEPMAP (Fig. [131]2j,k), which was reflected by greater essentiality in a subset of patients with breast cancer (BRCA) (Extended Data Fig. [132]2e) that was absent from BRCA cell lines (Extended Data Fig. [133]2f). A Fisher’s exact test of the genetic drivers that were enriched in TCGA[DEPMAP] patients with BRCA that were most dependent on PTPN11 included TP53 mutations and HER2/ERBB2 amplifications (Extended Data Fig. [134]2g), whereas FAT3 deletions and GATA3 mutations were depleted in these patients (Extended Data Fig. [135]2h). Particularly in the case of HER2, which signals through SHP2 and the RAS pathway, these data fit with the observation that RAS pathway inhibition, including SHP2 inhibitors, are more potent in the three-dimensional (3D) versus two-dimensional (2D) context^[136]30,[137]31. Thus, the presence of TCGA[DEPMAP] patients with BRCA that were highly dependent on PTPN11 is likely due to the 3D context of patient tumors, whereas DEPMAP BRCA cell lines with similar genetic drivers are not PTPN11 dependent due to the 2D context of cultured cells. Collectively, these data demonstrate that identifying SSDs can be impacted by different prevalence and distributions of the underlying drivers in patients and cell models, which can be overcome by patient-relevant dependency maps, such as TCGA[DEPMAP]. Extended Data Fig. 2. Examples of dependencies with different selectivity profiles across TCGA[DEPMAP] and DEPMAP cohorts. [138]Extended Data Fig. 2 [139]Open in a new tab (a) FLT3 was classified as a strongly selective dependency (SSD) with markedly higher dependency in blood lineage cancers of TCGA[DEPMAP] (blue bar, n = 7,021), (b) whereas FLT3 showed higher dependency in some blood lineage cancers but does not meet the threshold of an SSD in DEPMAP (n = 810). (c) ATPV6V0E1 essentiality scores varied widely across TCGA[DEPMAP] (n = 7,021), (d) while ATPV6V0E1 was classified as an SSD that was restricted to only a few lineages in DEPMAP (blue bars, n = 810). (e) PTPN11 was classified as an SSD with very strong dependencies in a subset of breast cancer patients in TCGA[DEPMAP] (blue bar, n = 7,021), (f) whereas no selectivity of PTPN11 essentiality was detected in DEPMAP (n = 810). For (A-F), the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the 5th and 95th percentiles (g) Top cancer driver mutations enriched in TCGA[DEPMAP] breast cancer patients that were highly dependent on PTPN11. (h) Top cancer driver mutations depleted in TCGA[DEPMAP] breast cancer patients that were highly dependent on PTPN11. For (g, h), ***FDR < 0.01, **P < 0.01, and *P < 0.05, as determined by Fisher exact test. [140]Source data Clinical phenotypes and outcomes in TCGA[DEPMAP] Another strength of translational tumor dependency maps is the ability to assess the impact of gene essentiality on clinically relevant phenotypes, such as molecular subtyping, therapeutic response and patient outcomes. To evaluate the utility of TCGA[DEPMAP] for therapy-relevant patient stratification, an unsupervised clustering of the 100 most variable gene dependencies was performed using the TCGA[DEPMAP] BRCA cohort (Fig. [141]3a). The 100-dependency signature (DEP100) performed comparably to the established PAM50 signature^[142]32 in classifying BRCA subtypes (AUC > 0.8 for most subtypes), despite only three overlapping genes between PAM50 and DEP100 (Fig. [143]3b). Dependency subtyping with DEP100 predicted significantly higher ESR1 essentiality in ER-positive tumors (Fig. [144]3c) and higher HER2 essentiality in HER2-amplified tumors (Fig. [145]3d). Finally, due to the limited accessibility of therapeutic response data in TCGA^[146]33, we identified nine clinical datasets for molecular therapeutics of tumor dependencies for which we had accurate models and sufficient statistical power^[147]34–[148]36. Of these nine datasets, we found seven out of nine dependency models significantly predicted clinical responses and performed better or comparable to the target gene expression in predicting therapeutic responses (Fig. [149]3e–h and Supplementary Table [150]8). Of the two nonsignificant datasets, both trended in the correct direction and would likely reach statistical significance with larger cohort sizes. Taken together, these data establish the physiological relevance of TCGA[DEPMAP] to associate dependencies with common clinicopathological features, such as molecular subtyping and therapeutic response. Fig. 3. Translating TCGA[DEPMAP] to clinically relevant phenotypes and outcomes. [151]Fig. 3 [152]Open in a new tab a, Unsupervised clustering of the top 100 dependencies in TCGA breast cancer patients. b, A ROC–AUC analysis was used to test the accuracy of calling breast cancer subtypes using the top 100 dependencies. c, ESR1 dependencies are strongest in ER-positive luminal BRCA (n = 96 for basal-like, n = 57 for HER2^+, n = 231 for luminal A, n = 126 for luminal B and n = 7 for normal-like). d, HER2 dependencies are strongest in HER2-amplified BRCA (n = 96 for basal-like, n = 57 for HER2^+, n = 231 for luminal A, n = 126 for luminal B and n = 7 for normal-like) e, HER2 dependency predicts trastuzumab response in patients with BRCA (n = 6 for no response, n = 33 for partial response and n = 9 for complete response). f, BRAF dependency predicts sorafenib response in patients with hepatocellular cancer (n = 46 for non-responder and n = 21 for responder). g, EGFR dependency predicts cetuximab response in patients with head and neck cancer (n = 26 for non-responder and n = 14 for responder). For c–g, *P < 0.05, **P < 0.01 and ***P < 0.001, as determined by the Wilcoxon rank-sum test for two-group comparison and Kruskal–Wallis test followed by a Wilcoxon rank-sum test with multiple test correction for the multi-group comparison. For boxplots in c–g, the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the 5th and 95th percentiles. h, AUC values for drug response predictions based on essentiality, expression and random essentiality scores generated via random sampling (control). i, Top gene essentialities associated with the PFI by univariate Cox proportional hazard regression model across multiple lineages in TCGA[DEPMAP] (Benjamini–Hochberg, FDR < 0.2). j, HRs of the top essentialities across TCGA[DEPMAP]. Blue indicates a greater dependency associated with worse outcome and red indicates a greater dependency is associated with better outcome. P values and HRs are shown in Supplementary Table [153]9. [154]Source data The ability to associate gene essentiality with patient survival is a unique strength of TCGA[DEPMAP], which is not accessible using cell-based dependency maps. Moreover, outcomes driven by perturbations of oncogenic pathways and genetic drivers of human cancers are likely not captured by gene expression alone and rather require a readout of gene essentiality. To test this possibility, the cross-validated gene essentiality models (n = 1,966) were tested for association with the progression-free interval (PFI) in TCGA[DEPMAP]. Among 29 cancer lineages that are well powered for PFI analysis^[155]33, 105 known genetic drivers of human cancer were significantly associated with the PFI of TCGA patients (Supplementary Table [156]9), including 29 that were prognostic in at least four cancer lineages (Fig. [157]3i,j). For example, a stronger dependency on the druggable oncogene, STAT3 (ref. ^[158]35), was significantly associated with a shortened time to disease progression of six different cancers (Fig. [159]3i,j). Likewise, multiple other prevalent genetic drivers of human malignancies were associated with a significantly shorter PFI, including PAX5 and PDGFRA (Fig. [160]3i,j). Both proteins have been investigated previously as prognostic indicators of poor outcomes by expression analysis in patient biopsies^[161]37,[162]38 and this study shows that dependency on these oncogenes is associated with worse outcome in patients using a translational dependency map. Synthetic lethalities in TCGA[DEPMAP] In addition to illuminating lineage and oncogenic dependencies, the DEPMAP has dramatically expanded the list of potential synthetic lethalities (the loss of a gene sensitizes tumor cells to inhibition of a functionally redundant gene within the same pathway)^[163]6,[164]16,[165]17,[166]39,[167]40; however, one of the current limitations of the DEPMAP is that the available cancer cell models do not yet fully recapitulate the genetic and molecular diversity of TCGA patients^[168]25. Thus, we assessed the landscape of predicted synthetic lethalities with loss-of-function (LOF) events (damaging mutations or deletions) in TCGA[DEPMAP]. Lasso regression analysis of gene essentiality profiles and 25,026 LOF events detected in TCGA[DEPMAP] yielded 633,232 synthetic lethal candidates (FDR < 0.01) (all candidates added as an R object to a figshare repository), which were too numerous to experimentally validate by current methods. To prioritize the synthetic lethal candidates, the gene interaction scores were correlated with the mutual exclusivity of corresponding mutations in TCGA[DEPMAP], which narrowed the list to 28,609 candidates (FDR < 0.01). Multiple additional criteria were then applied to refine the list further by enriching for predicted paralogs with close phylogenic distance to prioritize candidates with redundant functions due to sequence homology. All told, this approach identified many known synthetic lethal pairs (for example, STAG1/STAG2, SMARCA2/SMARCA4 and EP300/CREBBP)^[169]41–[170]43 and previously untested synthetic lethal candidates, demonstrating that TCGA[DEPMAP] is well powered to predict synthetic lethal relationships with LOF events in patient tumor biopsies (Extended Data Fig. [171]3a–d and Supplementary Table [172]10). Extended Data Fig. 3. Characterization of synthetic lethalities. [173]Extended Data Fig. 3 [174]Open in a new tab (a) STAG1 synthetic lethality with STAG2 mutation (n = 163 for STAG2^MUT and n = 7,418 for STAG2^WT), (b) SMARCA2 synthetic lethality with SMARCA4 mutation (n = 223 for SMARCA4^MUT and n = 7,358 for SMARCA4^WT), (c) CREBBP synthetic lethality with EP300 mutation (n = 937 for EP300^DEL and n = 6,644 for EP300^WT), and (d) CNOT7 synthetic lethality with CNOT8 deletion are examples of synthetic lethalities that were detected by TCGA[DEPMAP]. (n = 550 for CNOT8^DEL and n = 7,031 for SMARCA4^WT) ***P < 0.001, as determined by the Wilcoxon rank-sum test. (E-I) Comparison of multiplexed CRISPR/Cas12 screens performed using AsCas12a and EnAsCas12a enzymes. Analysis was performed using a Pearson’s correlation and coefficients (r) are displayed on the graphs. (j) Simple Western blots of protein expression of CNOT7, CNOT8 and housekeeping control Beta-Actin of nontargeting (NT) control, single (KO) and dual (DKO) knockout cells 3 days after CRISPR/RNP electroporation. (k) Plots showing the protein abundance ratio of CNOT8 (Y-axis) and copy number status of CNOT7 (X-axis) in the CPTAC Lung Adenocarcinoma (LUAD) and Breast Cancer (BRCA) cohorts showing a significant upregulation of CNOT8 protein in tumors with CNOT7 copy number loss (shallow and deep deletions) compared to diploid and gain tumors (for LUAD n = 7 for gain, n = 51 for diploid and n = 55 for shallow deletion; for BRCA n = 22 for gain, n = 33 for diploid and n = 67 for shallow deletion). For (A-D and K), the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the 5th and 95th percentiles. The two-sided Wilcoxon rank test was used for (A-D) ***p < 0.001 and ***p < 0.001 as determined by Student’s unpaired, two-tailed t-test for (K). [175]Source data Synthetic lethalities that were predicted with LOF events in the TCGA[DEPMAP] (n = 604 pairs) were experimentally tested using a multiplexed CRISPR/AsCas12a screening approach across representative cell models of five cancer lineages (Fig. [176]4a,b). Additional pairs (n = 261 controls) were added to the library to control for screen performance, including essential paralog pairs and nonessential pairs of tumor suppressor genes (TSGs) and interacting partners (Supplementary Table [177]10). An initial pilot screen was performed using five cancer cell models, which experimentally validated 69 TCGA[DEPMAP] synthetic lethalities in at least one representative cell model (Supplementary Table [178]11). As these data were being generated, an enhanced AsCas12a (enAsCas12a) enzyme was reported to be compatible with CRISPR/AsCas12a libraries^[179]44, enabling replication of the initial pilot screens and expansion to a total of 16 cancer cell models. Notably, the replication of the initial screens was highly concordant across the five cell models in common (average r = 0.69) (Extended Data Fig. [180]3e–i), as well as detection of increased depletion of essential controls and synthetic lethal partners compared to nonessential controls (Fig. [181]4c). In addition to novel pairs, multiple previously reported synthetic lethalities (HSP90AA1/HSP90AB1 (ref. ^[182]45), DDX19A/DDX19B^[183]45, HDAC1/HDAC2 (refs. ^[184]45,[185]46), SMARCA2/SMARCA4 (refs. ^[186]45,[187]46), EP300/CREBBP^[188]43, STAG1/STAG2 (refs. ^[189]42,[190]46) and CNOT7/8 (ref. ^[191]47)) were replicated across multiple cell lines in both cohorts (Supplementary Table [192]11), demonstrating the robustness of the multiplex CRISPR/Cas12a screening platform to test synthetic lethalities. Notably, as observed elsewhere^[193]39,[194]41,[195]46, the sensitivity to synthetic lethalities varied between cell models and lineages, implicating the prevalence of unknown modifiers of synthetic lethality that manifest in different cellular contexts and are yet to be fully understood. Fig. 4. Using TCGA[DEPMAP] to translate synthetic lethalities in human cancer. [196]Fig. 4 [197]Open in a new tab a, Schematic of the CRISPR/Cas12 library multiplexed guide arrays targeting one or two genes per array. b, Schematic of the synthetic lethality screening approach using the CRISPR/Cas12 library. All CRISPR screens were performed as n = 3 biological replicates per cell line. c, Violin plots of target-level CRISPR of the average log[2] fold change (FC) across all tested cell lines for nontargeting (NT) guide (neg CTRL), single knockout guides targeting essential genes (single KO CTRL), DKO guides targeting essential genes (DKO CTRL), single knockout guides of TCGA[DEPMAP] candidates (single KO) and DKO guides of TCGA[DEPMAP] candidates (DKO). d, Rank plot of target-level gene interaction (GI) scores averaged across n = 14 cell lines in the CRISPR/Cas12 multiplexed screening (A549, DETROIT562, FADU, H1299, H1703, HCT116, HSC2, HSC3, HT29, MDAMB231, MIAPACA2, PANC1, PC3M and SNU1), including the top five synthetic lethalities (table insert). The black line indicates the mean and gray error bars show ±s.e.m. e, Distribution of synthetic lethal candidates from TCGA[DEPMAP] with experimental evidence of synthetic lethality in the CRISPR/Cas12 multiplexed screening across 14 cancer cell lines. A blue box indicates a GI score < −2. f,g, Cell viability assessed by CellTiterGlo (CTG) luminescence at 7 days after single (KO) or dual (DKO) CNOT7/CNOT8 knockouts, normalized to NT controls in five cell lines grown in 2D monolayers (f) or 3D spheroids (g); n = 3 biological replicates per cell model per condition with the exception of n = 5 biological replicates for Hs578T grown in 2D monolayer. Error bars are mean ± s.d. h, Crystal violet staining of CNOT7^−/− clones C1 and C2 stably expressing nontargeting (sgNT) or CNOT8-targeting (sgCNOT8) dox-inducible guide constructs, following 7 days of dox treatment ([198]Methods). i, Tumor xenograft studies of HT29 clones grown in mice fed dox-containing food from day 0 (gray and green lines) or beginning on day 19 (blue lines). n = 5 mice per group. Error bars are ±s.d. Asterisks in f, g and i reflect two-tailed, unpaired Student’s t-test P values; *P < 0.05; **P < 0.01; ***P < 0.001. [199]Source data Of the 604 synthetic lethalities predicted by TCGA[DEPMAP], a total of 78 (13%) were experimentally validated in at least one representative cell model (Fig. [200]4d,e and Supplementary Table [201]11). For example, double knockout (DKO) of CNOT7/8 was synthetic lethal in 11 out of 14 cell lines that were screened (Fig. [202]4e) and was orthogonally validated in five cell models by DKO using ribonucleoprotein (RNP) in both 2D monolayer and 3D spheroid assays (Fig. [203]4f,g). Likewise, doxycycline (dox)-inducible loss of CNOT8 was synthetic lethal in HT29 cells that lacked CNOT7 in both in vitro 2D monolayers (Fig. [204]4h) and in vivo mouse xenograft studies (Fig. [205]4i). Notably, loss of CNOT7 in single knockout (KO) cells coincided with elevated CNOT8 protein (Extended Data Fig. [206]3j), fitting with previous observations that loss of CNOT7 increases integration of CNOT8 into the CCR4–NOT complex^[207]48. Likewise, CNOT8 protein levels were inversely correlated with CNOT7 copy numbers in patients with lung adenocarcinoma (LUAD) and BRCA in the NCI Clinical Proteomic Tumor Analysis Consortium cohort (Extended Data Fig. [208]3k). Collectively, these observations demonstrate the power of TCGA[DEPMAP] to detect patient-relevant synthetic lethal mechanisms, which can be orthogonally validated and provide therapeutic targets for drug discovery. Another discovery using TCGA[DEPMAP] was the prediction of PAPSS1 synthetic lethality with deletion of PAPSS2 and the neighboring tumor suppressor, PTEN, which were frequently co-deleted in TCGA patient tumors (43% co-incidence) yet were largely unaffected in cancer cell lines (Extended Data Fig. [209]4a–g). PAPSS1/PAPSS2 are functionally redundant enzymes essential for synthesis of 3′-phosphoadenosine 5′-phosphosulfate (PAPS), which is required for all sulfonation reactions^[210]49, suggesting that loss of PAPSS1/PAPSS2 is synthetic lethal due to the inability to sulfonate proteins. To test this hypothesis, PAPSS1/PAPSS2 were targeted in H1299 spheroids by RNP, followed by measurement of spheroid growth and sulfonation levels of heparan sulfate (HS) proteoglycan (HSPG) chains on the cell surface by flow cytometry. Confirming the CRISPR/Cas12 screen data (Fig. [211]5a), dual loss of PAPSS1 and PAPSS2 significantly reduced H1299 spheroid growth compared to controls (Fig. [212]5b and Extended Data Fig. [213]4h,i), which coincided with loss of HSPG sulfonation (Fig. [214]5c). Likewise, targeting PAPSS1 by RNP in UMUC3 cells, which endogenously lack PAPSS2 and PTEN, also significantly depleted HSPG sulfonation and coincided with significant spheroid growth reduction, which could be rescued by addition of exogenous heparan sulfate (Fig. [215]5d and Extended Data Fig. [216]4h,j). Finally, PAPSS1/PAPSS2 synthetic lethality was confirmed in vivo, as demonstrated by a significant tumor growth reduction of UMUC3 tumors without PAPSS1 and PAPSS2 compared to control tumors lacking only PAPSS2 (Fig. [217]5e and Extended Data Fig. [218]4k). Taken together, these data demonstrate that translational dependency maps, such as the TCGA[DEPMAP] are powerful tools to uncover previously underrepresented synthetic interactions in cancer models that are likely to be patient relevant. Extended Data Fig. 4. Supporting evidence of PAPSS1/2 synthetic lethality. [219]Extended Data Fig. 4 [220]Open in a new tab a, b) PAPSS1 is a novel synthetic lethality in the context of PAPSS2 deletion, which is not detectable in (a) DEPMAP cell lines (n = 905) and is only detectable in (b) TCGA[DEPMAP] patient samples (n = 7,581). (c, d) Likewise, PAPSS1 is not synthetic lethal with PTEN deletion in DEPMAP cell lines (c, n = 905) and is only detectable in TCGA[DEPMAP] patient samples (d, n = 7,581). For (A-D), the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the 5th and 95th percentiles. (e) Unlike cultured cell models, PAPSS2 is frequently co-deleted with PTEN in TCGA patients. (f) PAPSS2 is a closely neighboring gene of PTEN. (g) A schematic representation summarizing the hypothesized synthetic lethality of PAPSS1 that is driven by collateral deletion of PAPSS2 with the tumor suppressor gene (TSG), PTEN, in patients but not cell lines. ***P < 0.001, as determined by the Wilcoxon rank-sum test. (h) Endogenous expression by Simple Western of PAPSS1, PAPSS2, and PTEN in the model cell lines UMUC3 and NCI-H1299. (i,j) Validation of PAPSS1 and PAPSS2 single (KO) and double (dKO) knockouts by RNP in spheroid experiments for NCI-H1299 (i) and UMUC3 (j). (k) Validation of PAPSS1 knockout in the UMUC3 xenograft experiment tumors (n = 5 tumors per condition from n = 1 independent experiment). Molecular weight marker lanes are shown in kDa. Data shown in (h-j) are representative from at least 3 independent experiments. The two-sided Wilcoxon rank test was used for (A-D), ***P < 0.001. [221]Source data Fig. 5. PAPSS1 and PAPSS2 are novel synthetic lethal paralogs detected by TCGA[DEPMAP]. [222]Fig. 5 [223]Open in a new tab a, Rank plot of target-level GI scores in H1299 cells, including the top ten synthetic lethalities (table insert). The novel synthetic lethality, PAPSS1/PAPSS2, is highlighted in blue. All CRISPR screens were performed as n = 3 biological replicates per cell line. b, Spheroid size of H1299 cells with single or dual PAPSS1 and PAPSS2 knockouts, normalized to NT control spheroids; n = 4 biological replicates per condition. Data show mean ± s.d. *P < 0.05 and **P < 0.01 as per unpaired, two-tailed t-test. c, Flow cytometry histogram overlay plots of viable H1299 and UMUC3 cells (DAPI^−) showing expression of cell surface sulfonated HSPGs as measured by antibody clone 10E4-FITC. Dual loss of PAPSS1/PAPSS2 leads to total loss of sulfonation comparable to heparinase III treatment (HepIII*) which specifically cleaves sulfonated HS chains. d, Growth defects of UMUC3 spheroids following deletion of PAPSS1 (yellow bars) were partially rescued by the addition of 10 μg ml^−1 and 50 μg ml^−1 of exogenous HS as compared to NT control spheroids (green bars); n = 4 biological replicates for the untreated control and n = 3 biological replicates per treated condition. Data are mean ± s.d. *P < 0.05 as per unpaired, two-tailed t-test. e, Diagram showing tumor volumes over time (d, days) after in vivo implantation of 1 × 10^6 UMUC3 NT or PAPSS1-KO cells in SCID/beige mice. Each dot represents an individual mouse (n = 5 mice per condition); ***P < 0.001, as determined by unpaired, two-tailed t-test of the final data point. f, Kaplan–Meier plot of TCGA[DEPMAP] patients with a predicted PAPSS1/PAPSS2 synthetic lethality has a worse outcome compared to the rest of the cohort, as determined by a Cox log-rank test. DAPI, 4,6-diamidino-2-phenylindole. [224]Source data TCGA[DEPMAP] is unique in its ability to uncover potential synthetic lethalities that can be related to patient outcomes, enabling the prioritization of the experimentally validated synthetic lethalities that correlate with the worst outcome and therefore likely to have the greatest clinical impact if druggable. To test this possibility, a Cox log-rank test was used to assess overall survival (OS) of TCGA patients who correlated with predicted gene essentiality by TCGA[DEPMAP] and LOF events (mutation, deletion or both) of the putative synthetic lethal partner. After controlling for tumor lineage, PAPSS1 dependency in TCGA[DEPMAP] was correlated with significantly worse OS (hazard ratio (HR) = 0.61, P = 0.0004) in patients with PAPSS2 deletion (Fig. [225]5f), demonstrating that PAPSS1 is a synthetic lethality target with potentially high translational impact. Collectively, these data demonstrate that translational dependency maps can enable the discovery, validation and translation of synthetic lethalities. Constructing PDXE[DEPMAP] In addition to building TCGA[DEPMAP], a similar approach was applied to generating an orthogonal translational dependency map using the PDX Encyclopedia (PDXE[DEPMAP])^[226]50. As outlined in Fig. [227]6a, PDXE[DEPMAP] was assembled by transferring the cross-validated 1,966 expression-only models from the DEPMAP to the PDXE (n = 191 tumors) using the aligned genome-wide expression profiles from the PDXE (Supplementary Table [228]12). Unsupervised clustering of gene essentialities across five well-represented lineages in PDXE[DEPMAP] confirmed that lineage is a key driver of gene dependencies (Fig. [229]6b), fitting with the observations made in TCGA[DEPMAP] (Fig. [230]2e). PDXE[DEPMAP] also detected markedly stronger KRAS essentiality in KRAS-mutant PDX of pancreatic ductal carcinoma (PDAC) and colorectal carcinoma (CRC) lineages (Fig. [231]6c,d), whereas BRAF essentiality was strongest in BRAF-mutant PDX of cutaneous melanoma (CM) (Fig. [232]6e,f). These data collectively demonstrate that the PDXE[DEPMAP] performed comparably to TCGA[DEPMAP] and is well powered to detect gene essentiality signals in PDX models. Fig. 6. Building a translational dependency map in patient-derived xenografts: PDXE[DEPMAP]. [233]Fig. 6 [234]Open in a new tab a, Schematic of gene essentiality model transposition from DEPMAP to PDXE, following alignment of genome-wide expression data to account for differences in homogeneous cultured cell lines and PDX samples with contaminating stroma. b, Unsupervised clustering of predicted gene essentiality scores across five lineages in PDXE[DEPMAP] confirmed similar lineage drivers of gene dependencies, as observed in TCGA[DEPMAP]. Blue indicates genes with stronger essentiality and red indicates genes with less essentiality. c, KRAS dependency was enriched in PDXE[DEPMAP] lineages with high frequency of KRAS GOF mutations, including CRC and PDAC. n = 43 for BRCA, n = 51 for CRC, n = 27 for NSCLC, n = 39 for PDAC and n = 32 for CM. d, KRAS essentiality correlated with KRAS mutations in all PDXE[DEPMAP] lineages (n = 74 for KRAS^mut and n = 117 for KRAS^wt). e, BRAF dependency in PDXE[DEPMAP] was enriched in CM, which has a high frequency of GOF mutations in BRAF. n = 43 for BRCA, n = 51 for CRC, n = 27 for NSCLC, n = 39 for PDAC and n = 32 for CM. f, BRAF essentiality correlated with BRAF mutations in all TCGA[DEPMAP] lineages (n = 32 for BRAF^mut and n = 159 for BRAF^wt). For c–f, the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the fifth and 95th percentiles. g, Top correlated gene essentiality models that correlate with PDX response to erlotinib in PDXE[DEPMAP]. h, Top correlated gene essentiality models that correlate with PDX response to cetuximab in PDXE[DEPMAP]. ***P < 0.001, as determined by the Wilcoxon rank-sum test for two-group comparison (d and f) and Kruskal–Wallis test followed by a Wilcoxon rank-sum test with multiple test correction for a multi-group comparison (c and e). NSCLC, non-small cell lung cancer. [235]Source data In addition to orthogonal validation of TCGA[DEPMAP], a unique strength of PDXE[DEPMAP] is the ability to assess gene essentiality in the context of therapeutic responses across five cancer lineages and 15 molecular therapies^[236]50. To test the ability of gene essentiality to predict the response to corresponding targeted therapies, the change in PDX burden from baseline to experimental end point was correlated with target gene essentiality. This revealed that 80% of drugs (12 of 15) were significantly correlated (P < 0.05) with the predicted essentiality of the target gene (Supplementary Table [237]13). For example, trastuzumab response in the PDXE[DEPMAP] was strongly predicted by HER2 dependency (R = 0.4849, P = 0.002, AUC = 0.75), in line with the predictive power of HER2 dependency on trastuzumab responsiveness in patients with HER2-amplified BRCA (Fig. [238]3e). Other examples, such as erlotinib (R = 0.4937, P = 0.01, AUC = 0.78) and cetuximab (R = 0.2293, P = 0.06, AUC = 0.83), which target the same gene (EGFR), provide the opportunity to explore dependency mechanisms of therapeutic resistance across modalities. Comparisons of PDX responses to erlotinib or cetuximab revealed dependencies within two common pathways: the SWI/SNF complex (SMARCA2 and SMARCD1) and protein trafficking (EMC4, EMC6, VPS39 and MAPK14) (Fig. [239]6g,h). Notably, components of both pathways have been implicated in resistance to EGFR inhibitors^[240]51,[241]52, suggesting that targeting these dependencies would likely improve patient outcomes. Taken together, these data demonstrate the ability of gene essentiality to predict therapeutic response and highlight the translatability of PDX modeling to patient-relevant clinical outcomes. Translating gene tolerability in GTEX[DEPMAP] A final objective of this study was to define gene essentiality in the context of healthy tissues, which would provide a resource for prioritizing tumor dependencies with the best predicted tolerability. To achieve this objective, the expression-based dependency models from DEPMAP were transposed using the aligned expression data from GTEX (GTEX[DEPMAP]), a compendium of deeply phenotyped normal tissues collected from postmortem healthy donors (n = 948)^[242]28 (Fig. [243]7a and Supplementary Table [244]14). To assess the sensitivity of GTEX[DEPMAP] to dependencies with low tolerability, the molecular targets of drugs with reported toxicities in the liver and blood (n = 241) were compared across GTEX[DEPMAP] (Supplementary Table [245]15). This revealed that the average essentiality was higher in liver and blood than other normal tissues (Fig. [246]7b). Likewise, unsupervised clustering of the 1,966 cross-validated gene essentiality models revealed strong tissue-of-origin dependencies in healthy organs (Fig. [247]7c), suggesting that tissue-specific biological context also contributes to gene essentiality in normal physiological settings. Taken together, these data demonstrate that GTEX[DEPMAP] is sensitive to known toxicities, which cluster around different healthy organ types. Fig. 7. Building a translational dependency map in normal tissues: GTEX[DEPMAP]. [248]Fig. 7 [249]Open in a new tab a, Schematic of gene essentiality model transposition from DEPMAP to GTEX, following alignment of genome-wide expression data to account for differences in homogeneous cultured cell lines and healthy tissue biopsies. b, Average gene essentiality profile across healthy tissues of GTEX[DEPMAP] (n = 17,382) for molecular targets with known liver and blood toxicities (in blue). c, Unsupervised clustering of predicted gene essentiality scores across healthy tissues. Blue indicates genes with stronger essentiality and red indicates genes with less essentiality. d, KRAS essentiality is significantly higher in PAAD with GOF mutations compared to healthy pancreas in GTEX[DEPMAP] (n = 146 for cancer with n = 106 KRAS^mut and n = 40 KRAS^wt, n = 328 for normal) e, BRAF essentiality is significantly higher in SKCM with GOF mutations compared to normal skin GTEX[DEPMAP] (n = 319 for cancer with n = 165 BRAF^mut and n = 154 BRAF^wt, n = 1,809 for normal) For b, d, and e, the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the fifth and 95th percentiles. f, Global differences between the predicted target efficacy score (TCGA[DEPMAP]) and the healthy tissue-of-origin tolerability score (GTEX[DEPMAP]). g, STRING network analysis of the top 100 LUAD targets with the greatest predicted tolerability in healthy lung reveals significant connectivity (P < 1 × 10^−16) and gene ontology enrichment oxidative phosphorylation (blue-colored spheres; P = 5.8 × 10^−11) and mitochondrial translation (red-colored spheres; P = 2.9 × 10^−20). ***P < 0.001, as determined by a Wilcoxon rank-sum test for two-group comparison and Kruskal–Wallis test followed by a Wilcoxon rank-sum test with multiple test correction for a multi-group comparison (d and e). [250]Source data Comparing essentiality scores of known druggable oncogenes in TCGA[DEPMAP] with GTEX[DEPMAP] revealed greater dependency in malignant tissues versus a healthy tissue of origin. For example, KRAS and BRAF essentialities seem to be concomitantly dependent on lineage and genetic drivers, as the healthy tissues of origin were predicted to be significantly less affected in the GTEX[DEPMAP] compared to TCGA[DEPMAP](Fig. [251]7d,e). Likewise, similar observations were made for other oncogenic drivers that are approved therapeutic targets in patients with cancer, such as HER2-amplified BRCA (Extended Data Fig. [252]5a). In contrast, there was markedly less separation in the predicted essentialities of malignant tumors and healthy tissues of origin for molecular therapies that have yet to be successful in clinical trials (Supplementary Table [253]16). To refine the list of oncogenic pathways with significant differences in tumor efficacy and healthy tissue-of-origin tolerability, we compared dependency (TCGA[DEPMAP]) and tolerability (GTEX[DEPMAP]) scores across all genes and tissues (Fig. [254]7f). Pathway analysis of the strongest tumor dependencies with the least tissue-of-origin toxicity revealed enrichment of multiple oncogenic pathways and pathophysiological processes (Supplementary Table [255]17), including dysregulation of oxidative phosphorylation (P = 5.8 × 10^−11) and mitochondrial translation (P = 2.9 × 10^−20) pathways that were enriched in LUAD compared to healthy lung (Fig. [256]7g and Extended Data Fig. [257]5b). Combined, these observations suggest that predicted gene essentiality in the context of a driver mutation and correspondingly low essentiality within the healthy tissue of origin is likely to identify efficacious drug targets with acceptable tolerability. Extended Data Fig. 5. Essentiality profiles of genes in cancer versus normal tissues. [258]Extended Data Fig. 5 [259]Open in a new tab (a) ERBB2 essentiality is significantly higher in malignant breast cancer with ERBB2 amplifications (TCGA[DEPMAP], n = 137 for ERBB2^AMP and n = 932 for ERBB2^WT) compared with normal breast (GTEX[DEPMAP], n = 459). ***P < 0.001, as determined by the Wilcoxon rank-sum test. For the boxplot, the center horizontal line represents the median (50th percentile) value. The box spans from the 25th to the 75th percentile. The whiskers indicate the 5th and 95th percentiles. (b) STRING network analysis of the top 100 LUAD targets with the greatest predicted tolerability in normal lung reveals significant connectivity (p < 1 × 10^−16) and gene ontology enrichment for oxidative phosphorylation (blue colored spheres; p = 5.8 × 10^−11) and mitochondrial translation (red-colored spheres; p = 2.9 × 10^−20). [260]Source data Tool for visualizing translational dependencies To enable visualization of the data, we have provided an interactive web-based application ([261]https://xushiabbvie.shinyapps.io/TDtool/) for exploring the data within TCGA[DEPMAP], PDXE[DEPMAP] and GTEX[DEPMAP]. Discussion Cancer dependency maps have accelerated the discovery of tumor vulnerabilities, yet translating these findings to predict the therapeutic window of potential drug targets in patients remains challenging. Here, we used machine learning to build translational dependency maps in patient tumors and normal tissue biopsies that would enable tumor vulnerabilities to be studied in the context of a drug target’s efficacy, tolerability and outcome. The translational dependency maps were built using elastic-net models of transcriptomic features to predict gene essentiality. As the predictive models of essentiality did not include genomic features, the dependency scores could be independently tested for associations with genetic drivers in patient tumors. Moreover, these expression-only models of gene essentiality could be applied to healthy tissues that do not have appreciable levels of the somatic alterations that are observed in malignant tissues^[262]28. To illustrate how these data can be integrated to predict a target’s therapeutic window, we showed that KRAS and BRAF dependencies were elevated in patient tumors with GOF mutations (TCGA[DEPMAP] and PDXE[DEPMAP]), which was far less pronounced in normal tissue biopsies lacking these driver mutations (GTEX[DEPMAP]). Combined, these new translational dependency maps offer a unique and clinically relevant aspect to gene essentiality that is not currently accessible in the traditional cell-based dependency maps. Finally, we made the dependency maps freely accessible in a user-friendly and interactive web-based application for exploring and visualizing the data. During the completion of this study, Chiu et al.^[263]27 took a complementary approach to building a translational dependency map (DeepDEP) using deep learning and the genomic, epigenomic and transcriptomic profiles of TCGA patients and DEPMAP cell lines. Here, we used elastic-net regularized regression models of expression data for predicting gene essentiality and tolerability, as these expression-based models performed comparably to multi-omics models and can be applied to malignant tissue (TCGA[DEPMAP] and PDXE[DEPMAP]) and nonmalignant tissue (GTEX[DEPMAP]). The DeepDEP authors also highlighted that a simplified deep-learning model using expression only (Exp-DeepDEP) performed comparably well to DeepDEP^[264]27, suggesting that both approaches are dominated by expression data^[265]27. For lack of other ground truths, we compared the predicted tumor dependencies of TCGA[DEPMAP] and DeepDEP by pan-cancer lineage and BRCA subtypes, as these were annotated by TCGA and DEPMAP. Compared to DeepDEP, the predicted dependencies by TCGA[DEPMAP] were comparable in identifying cancer lineages and BRCA subtypes (Extended Data Fig. [266]6). Thus, the collective data demonstrated that the elastic-net models underlying TCGA[DEPMAP], PDXE[DEPMAP] and GTEX[DEPMAP] performed well compared to DeepDEP. As additional studies become available, more in-depth benchmarking of approaches for translating dependencies is warranted, including the ability to detect genetic drivers, synthetic lethalities and other patient-relevant features. Extended Data Fig. 6. TCGA[DEPMAP] outperforms DeepDEP. [267]Extended Data Fig. 6 [268]Open in a new tab a) Precision-recall analysis of pan-cancer lineage predictions by the AUC values are significantly higher for TCGA[DEPMAP] in predicting cancer lineages based on top 100 variable dependencies compared with DeepDEP. (b) The ROC curves for predicting the breast cancer subtypes based on the top 100 variable gene dependencies. The TCGA[DEPMAP] significantly outperforms DeepDEP in predicting any of the breast cancer subtypes (TCGA[DEPMAP] continuous line; DeepDEP dotted line). [269]Source data A strength of translational dependency maps is the ability to recapitulate patient tumor context, therapeutic responses and many aspects of disease outcomes. Fitting with observations that the tissue of origin dominates the molecular landscape of cancer^[270]53, TCGA[DEPMAP] and PDXE[DEPMAP] revealed that tumor vulnerabilities were tightly correlated with disease lineage and subtype. Oncogenic dependencies were also predictive of response to molecularly targeted therapeutics in both TCGA[DEPMAP] and PDXE[DEPMAP], as would be expected based on the response rates for molecular therapeutics targeting oncogenic drivers in patients. In total, 85% of oncogenic dependencies had a GOF event associated with increased dependency in patient tumors and 28% could be associated with PFI, including some that predicted better or worse outcomes depending on the cancer lineage. These data fit with the observation that ~10% cancer-driver genes have evidence for both oncogenic and suppressive characteristics depending on tumor context. The selectivity of some oncogenic dependencies also differed between patients and cell models, including FLT3, ATPV6V0E1 and PTPN11. Some of these discrepancies seemed to be attributed to cohort-specific distributions of the underlying drivers of SSDs (for example, FLT3 and ATPV6V0E1), whereas others were likely attributable to different pathophysiological contexts, such as the 3D contexts of intact tumors versus the 2D contexts of cultured cells (for example, PTPN11). Taken together, these data highlight the complexities of interpreting gene essentiality in patient-relevant contexts, and future studies are warranted to further translate the underlying mechanisms of novel tumor dependencies that impact patient outcomes. TCGA[DEPMAP] detected multiple known synthetic lethalities (for example, STAG1/STAG2, SMARCA2/SMARCA4 and EP300/CREBBP)^[271]42,[272]43,[273]45,[274]46, as well as synthetic lethalities that are less well characterized (for example, CNOT7/CNOT78 and PAPSS1/PAPSS2). As reported elsewhere^[275]39,[276]41,[277]46, synthetic lethal interactions varied widely when tested across different cancer cell models, suggesting that the currently available models are insufficient to account for all patient-relevant contexts. Nonetheless, both a commonly shared synthetic lethality (CNOT7/CNOT78) and a more selective synthetic lethality (PAPSS1/PAPSS2) were validated in vitro and in vivo. CNOT7/CNOT78 are paralogous subunits of the CCR4–NOT complex that mediates messenger RNA stability^[278]47, fitting with the observation that loss of both subunits was broadly synthetic lethal. PAPSS1/PAPSS2 are paralogous synthases of PAPS, which is required for sulfonation reactions^[279]49. We hypothesized that loss of PAPSS2 is likely driven by its proximity to PTEN and is an example of collateral deletion in patient tumors^[280]54. This observation was confirmed by the synthetic lethal interaction of PAPSS1 in UMUC3 cells that lacked PAPPS2 and PTEN, which coincided with the inability of these cells to sulfonate proteins. Notably, the unique ability of TCGA[DEPMAP] to detect and associate synthetic lethal mechanisms with patient outcomes revealed a worse OS of patients with an endogenous loss of PAPSS2 and a predicted synthetic lethality with PAPSS1 dependency. Thus, these data collectively highlight the benefits of translational dependency maps that closely match the pathophysiological contexts of intact patient tumors and the diversity of patient genomic datasets to identify clinically relevant mechanisms^[281]1,[282]55. A unique aspect of this study was the ability to systematically compare gene essentiality associated with somatic mutations in TCGA[DEPMAP] with the healthy tissue-of-origin tolerability profiles in GTEX[DEPMAP]. Systematically expanding this analysis across all gene essentiality models in TCGA[DEPMAP] and GTEX[DEPMAP] revealed wide variability in the predicted tolerability windows, implicating the existence of other dependencies with strong genetic drivers that are likely to be more tolerable as therapeutic targets; however, when interpreting these data, we also recommend exercising caution, as the tolerability windows predicted by comparing tissue-of-origin gene essentiality between TCGA[DEPMAP] and GTEX[DEPMAP] likely does not yet fully capture the other dose-limiting toxicities that pose challenges to clinical drug development^[283]56. As such, future efforts to model gene essentiality in healthy tissues should expand to incorporate systems approaches to integrating tolerability signals across multi-organ physiological pathways and systems. The translational dependency maps presented in this study provide insights into gene essentiality and tolerability in the clinical context of patient tumors and healthy tissues. The ability of these maps to accurately translate dependencies to patients is reliant on the ability to build predictive models from cell-based mapping, which is still at the early stages and is expected to require 20× more data to fully predict gene essentiality^[284]7. Further, the observations that cell-based dependencies vary between 2D and 3D settings^[285]57 and are impacted by crosstalk with the tumor microenvironment^[286]58, suggests that gene essentiality is contextual and requires models with greater relevance to intact tumors, such as organoids. Likewise, it is equally plausible that accurately interpreting translational dependencies will require a deeper understanding of clonal heterogeneity with patient tumors that is lacking from homogenous cancer cell lines. To reach the full potential of translational dependency mapping, the catalog of patient genomic datasets will also likely require expansion to capture various stages of disease progression, including tumorigenesis^[287]2, metastasis^[288]3,[289]59 and therapeutic resistance^[290]3,[291]4,[292]59. Furthermore, as precision cancer clinical trials continue to expand (for example, MSK-IMPACT)^[293]4, it will be increasingly possible to refine translational dependency maps by testing outcomes of molecular therapeutics with predicted target essentiality. The utility of translational ‘tolerability’ maps in healthy tissues (for example, GTEX[DEPMAP]) remains to be fully explored and will likely benefit from further refinements to better capture aspects of dose-limiting toxicities that impact drug development. To this end, we postulate that modeling gene tolerability could be best assessed in normal cell types by pairing CRISPR perturbations with single-cell RNA sequencing^[294]60,[295]61 to broadly capture the alterations of pathways required for healthy tissue homeostasis. Ultimately, we postulate that predictive modeling of dependency and tolerability in patients will increase the success of drug discovery by preemptively prioritizing targets with the best therapeutic index (high dependency and tolerability). Methods Predictive modeling of gene essentiality using DEPMAP data Two sets of elastic-net regression models were generated to predict gene essentiality from the DEPMAP (n = 897 cell lines) with RNA alone (expression only) or combined with mutation and copy number profiles (multi-omics). Gene effect scores were estimated by CERES^[296]24, which measures the dependency probability of each gene relative to the distribution of effect sizes for common essential and nonessential genes within each cell line^[297]25. Because many genes do not impact cell viability (CERES < −0.5), elastic-net models were attempted only for genes with at least five dependent and nondependent cell lines, which included 7,260 out of 18,119 genes (40%) with effects scores in the DEPMAP (1Q21 release). Genome-wide datasets (19,005 genes) for RNA-seq, mutations and copy number variants (log[2] relative to ploidy + 1) for the 897 cell lines were downloaded directly from the DEPMAP (1Q21; [298]https://depmap.org/portal/). The ‘glmnet’ package (v.4.1.3)^[299]23 was used to build elastic-net regularized regression models with balanced weights for L1 and L2 norm regularization. The α values were kept constant at 0.5 for all models. Models were tenfold cross-validated using ‘lambda.min’ from cv.glmnet from the glmnet R package (100 lambdas tested per model by default) to select the lambda showing the minimum error balanced with the prediction performance and the number of features selected, as described previously^[300]61. The performance of the optimal model was then assessed by Pearson’s correlation coefficient (R), with a ‘pass’ threshold of R > 0.2 and FDR < 0.001 to correct for multiple hypothesis testing. The cross-validated models were also compared to models generated using the DepMap confounders dataset as a null distribution, including sex, cas9 activity, age, lineage, primary or metastasis, growth pattern, library, screen quality and cancer type. As shown in Extended Data Fig. [301]7, the expression-only gene essentiality models significantly outperformed the models built on confounders, with the 0.2 cross-validation threshold corresponding to P < 0.03 in the confounder distribution (~7,000 models). Cross-validation confirmed 1,966 expression-only models and 2,045 multi-omics models, of which the majority of cross-validated models overlapped (n = 1,797) between the two datasets (Supplementary Table [302]3). Extended Data Fig. 7. Comparison of cross-validated models with models generated using the DepMap confounders dataset as a null distribution, including sex, cas9 activity, age, lineage, primary or metastasis, growth pattern, library, screen quality and cancer type. [303]Extended Data Fig. 7 [304]Open in a new tab (a) Distribution of model performance across expression-only and confounder models. (b) The expression-only gene essentiality models significantly outperformed the models built on confounders, with the 0.2 cross-validation threshold corresponding to p < 0.03 in the confounder distribution (~7000 models). [305]Source data Model transposition following transcriptional alignment of DEPMAP to TCGA, PDXE and GTEX datasets to build TCGA[DEPMAP], PDXE[DEPMAP] and GTEX[DEPMAP] The translational dependency maps TCGA[DEPMAP], PDXE[DEPMAP] and GTEX[DEPMAP] were built using expression-only models of gene essentiality, based on relatively marginal performance gains in the multi-omics models of gene essentiality, as reported elsewhere^[306]26,[307]27. To enable transposition of the cross-validated expression-only models (n = 1,966) from the DEPMAP to TCGA (n = 9,596 tumors), PDXE (n = 191 tumors) and GTEX (n = 17,382 tissues across 54 tissues and 948 donors), the genome-wide gene expression datasets were downloaded for TCGA ([308]https://xenabrowser.net/datapages/), PDXE^[309]50 and GTEX ([310]https://gtexportal.org/home/datasets). For TCGA data, if multiple samples were collected from the same patient, only the primary tumor biopsy was included in TCGA[DEPMAP]. For GTEX, the potential biases introduced by sampling multiple organ tissues from each individual was assessed by Uniform Manifold Approximation and Projection (UMAP) analysis of the gene expression profiles across GTEX samples, which revealed that GTEX samples are clustered by tissue types rather than by individuals. Likewise, no evidence of clustering was observed based on other patient-specific clinical variables (for example, cause of death and age), suggesting that the tissue-specific effects are the predominant drivers of gene expression in healthy tissues. Unsupervised cluster analyses by UMAP dimension reduction were used to evaluate the similarities in expression profiles of the DEPMAP cell lines compared to the tissue biopsies from TCGA, PDXE and GTEX. As reportedly previously^[311]56, DEPMAP and TCGA expression profiles do not cluster well by UMAP alignment due to contaminating transcriptional profiles of stromal and immune cells, which would impact expression-based predictive modeling of gene essentiality. Likewise, UMAP clustering of expression profiles from the DEPMAP cell line data compared to PDXE and GTEX samples revealed that transcriptional alignment of these data was equally problematic. To overcome this issue, expression data from DEPMAP and TCGA were quantile normalized and transformed by cPCA, which is a generalization of the PCA that detects correlated variance components that differ between two datasets. When comparing the transcriptional profiles of the DepMap cell lines and TCGA patient tumors, the top contrastive principal components (cPC1–4) derived from the stromal contamination in TCGA, which were then removed followed by multiple-batch correction to normalize the expression data by matching the corresponding clusters in TCGA and DEPMAP. To assess transcriptional alignment on model transposition, all pre- and post-aligned TCGA[DEPMAP] gene essentiality models were compared to tumor purity, which revealed a strong correlation between gene essentiality and tumor purity that was removed by transcriptional alignment. An identical approach was utilized for aligning PDXE expression data, with the slight modification that only cPC1–3 required removal, as PDX models grown in immunocompromised mice lack the adaptive immune system and typically have lower stromal contamination. For aligning DEPMAP and GTEX data, a slightly different approach was used to combine quantile normalization and ComBat^[312]62 to remove potential batch effects without using cPCA, as GTEX data only includes nonmalignant tissue. Finally, the observed (DepMap) and predicted (TCGA[DEPMAP], PDXE[DEPMAP] and GTEX[DEPMAP]) gene essentiality scores were aligned by linear regression, whereby the slopes of each model were fitted using a constant to make the absolute value comparable to the measured essentiality values. Notably, because this approach used a scaling factor, the pattern of gene essentiality scores was not affected. All data are available on figshare^[313]63. Characterization of TCGA[DEPMAP] The distribution of the cross-validated expression-only models of gene essentiality (n = 1,966) across lineages was assessed by unsupervised cluster analysis (Ward.D2 method) and visualized using the ComplexHeatmap R package (v.2.6.2). A similar approach was used for unsupervised cluster analysis and heatmap visualization for molecular subtyping of the BRCA cohort of TCGA[DEPMAP] using the DEP100 across BRCA cohort only. For lack of other ground truths, the performance of TCGA[DEPMAP] to classify molecular subtypes of BRCA was benchmarked using a linear discriminant analysis with leave-one-out cross-validation performed using the MASS package (v.7.3.51.4) for R and the CV = TRUE option in the function. Predictions for each cancer type and subtype was evaluated separately and the AUC values were determined using the function ‘roc’ from the pROC (v.1.18.0) package for R and compared to the molecular typing and subtyping reported by the TCGA ([314]https://www.cbioportal.org/)^[315]64. In addition to BRCA molecular subtypes, a distinct subset of the 100 most variable dependencies from the pan-cancer TCGA[DEPMAP] dataset was used to benchmark TCGA[DEPMAP] more broadly, using an identical linear discriminant analysis with leave-one-out cross-validation, as described above. Finally, both analyses were repeated with the DeepDEP gene essentiality values reported by Chiu et al.^[316]27 and the receiver operating characteristic (ROC) AUC values were compared between TCGA[DEPMAP] and DeepDEP predictions of cancer lineages and BRCA cancer subtypes. Associations of dependencies with genomic features (somatic mutations and copy number variants) in TCGA[DEPMAP] were assessed using a Wilcoxon rank-sum differential test as implemented using stat_compare_means function of ggpubr R package (v.0.4.0). The ability of expression features to predict essentiality and mutational status of same gene by elastic-net modeling was compared using the glmnet R package (v.4.1) with the same parameters for both model sets. The elastic-net models were allowed to select the most informative predictive features for mutation and essentiality for each gene, as the best predictors for essentiality may not be the best features to predict mutation. For AUC evaluation, we used −0.5 as the cutoff for gene essentiality scores to determine sensitive and resistant cells for gene models. The AUC values are calculated using pROC R package (v.1.16.2). To characterize SSDs, a normality likelihood ratio test (NormLRT)^[317]29 was performed with slight modifications to rescale the larger NormLRT values observed in TCGA[DEPMAP] due to a tenfold larger cohort size (n = 9,596) compared to DEPMAP (n = 897). A bootstrapping of the DEPMAP gene effect scores was performed to estimate how the NormLRT scores change when scaling up from the DEPMAP cohort size (n = 897 cell models) to the cohort size of TCGA (9,596). A linear fitting was performed to estimate the slope between DEPMAP and bootstrapped equivalent, which was as a scaling factor (0.07) to rescale TCGA NormLRT scores. Notably, outliers were identified based on the ranking NormLRT scores within each cohort, which therefore was not affected by the rescaling TCGA NormLRT scores. For TCGA patients with BRCA (n = 765), we divided the patients into PTPN11 dependent and nondependent groups. The PTPN11-dependent patients (77 patients) were selected as the top 10% patients with BRCA with the lowest PTPN11 essentiality scores. Among all the variants, we applied Fisher’s exact test for mutations with more than 5% frequency (12 mutations), deletions with more than 10% frequency (4,891 deletions) and amplifications with more than 10% frequency (4,831 amplifications). The test was performed using the fisher.test function in the stats (v.4.0.3) R package with options ‘alternative = greater’ to calculate P values for enrichment of variants for PTPN11 dependent and nondependent groups. The gene models (890 models) used for mutation predictions are selected from 1,966 cross-validated expression-only essentiality models with a mutation frequency >2%. Associating clinical outcomes with tumor dependencies in TCGA[DEPMAP] Owing to the limited accessibility of therapeutic response data in TCGA^[318]33, the association of HER2 essentiality with response to trastuzumab (anti-HER2 antibody) was tested in a recent trastuzumab clinical trial of 50 HER2^+ patients with BRCA with pre- and post-treatment biopsies that were analyzed by microarray^[319]34. The microarray expression data were downloaded from NCBI GEO (accession code [320]GSE76360) and patient responses were defined by the study authors^[321]34. Differences in predicted HER2 essentiality in patients with different clinical responses were tested using ggpubr R package (v.0.4.0), followed by a Wilcoxon rank-sum test using the stat_compare_means function in the package. Correlation of HER2 essentiality and HER2 expression after treatment was tested by a Pearson’s correlation, as calculated by the stat_cor function ggpubr R package (v.0.4.0). For predicting essentiality response to sorafenib, although it is a multi-kinase inhibitor (BRAF, CRAF, VEGFR2, VEGFR3, PDGFRB, FLT3 and cKIT), its role in treating hepatocellular carcinoma (HCC) is widely attributed to inhibiting oncogenic RAF signaling. This combined with the observation that BRAF essentiality model performance (R = 0.71) was far better than the other target models (R = 0.2 to 0.45), led us choose the BRAF essentiality model to predict sorafenib response in the HCC cohort. Additionally, the correlation of TCGA[DEPMAP] dependencies with the PFI of TCGA patients was performed, excluding the acute myeloid leukemia (AML), diffuse large B-cell lymphoma (DLBC), kidney chromophobe (KICH) and pheochromocytoma and paraganglioma (PCPG) cohorts based on the recommendations of Liu et al.^[322]33. The PFI data were directly downloaded from Liu et al.^[323]33 and the maximally selected rank statistics from the ‘maxstat’ R package was used to determine the optimal cutoff point for dichotomization (high versus low) of dependency scores (n = 1,966 cross-validated models). The prognostic value of the resulting dichotomized dependency scores was evaluated using the log-rank test with FDR correction (Benjamini–Hochberg adjusted) to account for multiple hypothesis testing. The data were visualized by Kaplan–Meier curves and are interpreted as HR > 1 indicating a worse expected outcome in patients with a higher dependency score at an FDR < 0.2. Predicting synthetic lethality relationships in TCGA[DEPMAP] Multiple approaches were integrated to predict and prioritize synthetic lethality relationships with LOF events (defined as a predicted copy number loss or damaging mutation) in TCGA[DEPMAP]. Lasso regression was used to identify gene essentialities (n = 7,260 expression-only models) with increased dependencies associated with 25,026 LOF events in TCGA, as annotated by Bailey et al.^[324]65. For each model, the lambda value was selected as the lowest error by fivefold cross-validation and the resulting models with coefficients >0.3 were further evaluated by a t-test. The lasso regression analysis identified 633,232 predicted synthetic lethal candidates (FDR < 0.01), which were too numerous to experimentally validate and required further prioritization. First, UNCOVER^[325]66 was used to prioritize synthetic lethal candidates predicted by TCGA[DEPMAP] that correlated with endogenous mutual exclusivity of LOF mutations (3–70% prevalence) in TCGA, with the hypothesis that these candidates would have greater translational relevance. UNCOVER was ran in greedy mode (UNCOVER_greedyv2.py) to identify negative association with a mutated gene sets of maximum ten genes. To evaluate the confidence of association, we set the number of permutations as 100 to compute P values and applied a threshold of P < 0.01. Of the 633,232 predicted synthetic lethal candidates predicted by TCGA[DEPMAP], 28,609 pairs also had evidence of mutually exclusive mutation rates in TCGA. The candidate list was then refined further by prioritizing paralogs using the biomaRt paralog database (v.2.28.0) R package. We additionally included pairs characterized by phylogenetic distance with threshold less than 1.5, as described previously^[326]67,[327]68. The candidate list received a final filtering based on overall patient prevalence of LOF events, protein–protein interactions with TSGs^[328]69,[329]70, previous experimental evidence of gene–gene interactions^[330]6,[331]16,[332]17,[333]39,[334]40 and manual curation to include essential and nonessential controls. The final list of gene pairs that were prioritized for experimental validation included 601 synthetic lethality candidates from the original lasso regression of TCGA[DEPMAP] and an additional 264 pairs that were retained as library controls. The list of all synthetic lethal pairs that were predicted by TCGA[DEPMAP], as well as annotations of mutual exclusivity and phylogenetic distance, is provided as an R object in the figshare repository ([335]https://figshare.com/s/a76d338a425273b42c8b)^[336]71. Multiplexed screening synthetic lethalities using AsCas12a (AsCpf1) and enAsCas12a (enAsCpf1) Guides were designed using the TTTV PAM for AsCas12a and synthesized into four-guide arrays with direct repeats (DR)−0, −1, −2 and −3 preceding each guide, followed by cloning into a guide-only lentiviral vector (pRDA_052), as described previously^[337]45,[338]46. A DKO construct was designed with two guides × two genes (n = 4 guides total per construct) for each pair of synthetic lethal candidates. Single KO constructs were also designed two guides × one gene + two nontargeting (NT) guides (n = 4 guides total per construct) for each pair of synthetic lethal candidates. For some pairs, multiple single KOs were used to assess overall library variance and were collapsed to the median values for downstream gene interaction analysis. A total of 500 constructs with four NT guides were also included in the library as negative controls. An initial set of pilot screens were performed in triplicate using A549 (ATCC), NCI-H1299 (ATCC), MDA-MB-231 (ATCC), PC3M (MD Anderson) and DETROIT562 (ATCC) that stably express AsCas12a, as described previously^[339]46. An enhanced AsCas12a (enAsCas12a) enzyme was recently reported that is compatible with CRISPR/AsCas12a libraries^[340]44, enabling an independent replication of the initial pilot screens and expansion to a total 14 total cancer cell models. The subsequent screens using enAsCas12a were performed in triplicate using A549 (ATCC), NCI-H1299 (ATCC), MDA-MB-231 (ATCC), NCI-H1703 (ATCC), PC3M (MD Anderson), DETROIT562 (ATCC), HT29 (ATCC), HCT116 (ATCC), PANC1 (ATCC), MIAPACA2 (ATCC), SNU1 (ATCC), HSC2 (JCRB), HSC3 (JCRB) and FADU (ATCC). For all screens, cells were infected at a multiplicity of infection of 0.3 and cultured for 14 days while continuously maintaining 500× coverage, followed by DNA extraction and PCR-barcoding using the p5 Agon and p7 Kermit primers^[341]46. The PCR-barcoded libraries were single-end sequenced using an Illumina HiSeq4000 (300× cycle), followed by demultiplexing of sequencing reads (bcl2fastq, Illumina) and quantification of guide array abundance across all samples was performed with a custom Perl script. Sequences between the flanking sequences or by location were extracted and compared to a database of sgRNA for each library. Only perfectly matched sgRNA sequences were kept and used in the generation of count matrix. Normalization between all samples was conducted using the ‘TMM’ method^[342]72 implemented in the edgeR R Bioconductor package. The log[2] fold changes (L2FCs) of guide array abundance were calculated by comparing day 14 libraries with the plasmid library using limma-voom^[343]73. GIs were calculated by comparing the expected and observed L2FC of double and single KO constructs, as described previously^[344]39,[345]45. In brief, the expected L2FC for DKO constructs is calculated as a sum (LF2C) of the individual knockout (sgRNA + NT). Synthetic lethal and buffering interactions are defined for DKO in which the observed double knockout L2FC is significantly greater or less than that of the expected L2FC, respectively. No statistical methods were used to predetermine sample sizes but our sample sizes are similar to those reported in previous publications that have used multiplexed CRISPR to screen synthetic lethal interactions^[346]39,[347]45. Experimental validation of PAPSS1/2 and CNOT7/8 synthetic lethality CRISPR/Cas12 KOs of PAPSS1, PAPSS2, CNOT7 and CNOT8 were performed with Cas12 Ultra (Integrated DNA Technologies, 10007804) according to the manufacturer’s instructions by Neon electroporation of RNPs (Invitrogen). Guides were designed using the Broad Institute CRISPick algorithm and the two best-performing guides for each gene were used in combination (Supplementary Data). Protein expression was quantified by Simple Western (ProteinSimple, BioTechne) using the following antibodies; PAPSS1 clone 1F4 (Abnova, H00009061-M05) at 1:100 dilution, PAPSS2 (Cell Signaling Technology, 70638) at 1:50 dilution, PTEN (Cell Signaling Technology, 9552) at 1:100 dilution, CNOT7 (Santa Cruz, sc-101009) at 1:10 dilution, CNOT8 (LSBio, LS-C99242-400) at 1:1,000 dilution with β-actin clone 8H10D10 (Cell Signaling Technology, 3700), 1:1000 GAPDH clone 14C10 (Cell Signaling Technology, 2118) or 1:1,000 α-tubulin (Cell Signaling Technology, 2144) as loading controls. Flow cytometry analysis of sulfonated HSPGs was performed with the 10E4 antibody conjugated to FITC and used at 1:200 dilution (US Biological Life Sciences, H1890-10) (Extended Data Fig. [348]8). Bacteroides heparinase III was obtained from New England Biolabs (P0737L) and used as per manufacturer’s protocol by treating cells for 1 h in reaction buffer at 30 °C before FACS analysis. Spheroid cultures were performed on ultra-low attachment 96-well plates (Corning, 7007), growth was tracked on Incucyte S3 (Sartorius) and CellTiterGlo (CTG) readouts were performed for viability measurements (Promega, G9681). For rescue experiments, HS was used at 10–50 μg ml^−1 (Sigma, H7640). For CNOT7-null single-clone generation, HT29 (ATCC) cells were transduced with pFUN_104 Cas9 plasmid (Broad Institute), CNOT7-KO was performed with CRISPR/Cas12 RNP electroporation, CNOT7-KO single clones were isolated and expanded and clones sc2 and sc7 were transduced with the Cellecta pRSGTEP-U6Tet-sg-EF1-TetRep-2A-Puro vector containing the CNOT8-targeting sgRNA (sgCNOT8; 5′-CCAGGTTATCTGTGAAGTGT-3′ (CVCRC-PX, 98847-3P) or NT control (sgNT; 5′-GGCAGTCGTTCGGTTGATAT-3′ SGCTL-NT-pRSGTEP). Cells were then cultured in medium containing Tet System Approved FBS (TakaraBio, 631101) and dox was used at 1 μg ml^−1 for in vitro experiments. For in vivo experiments, 1 × 10^6 UMUC3 (ATCC) or HT29 cells were reconstituted in Hanks balanced salt solution, mixed 1:1 with Matrigel (Corning, 356235) and 200 μl inoculated in the right flank (n = 5 mice per condition). Female CB17/SCID and SCID/beige at 6–8 weeks of age were obtained from Charles River. In vivo experiments were conducted in compliance with AbbVie’s Institutional Animal Care and Use Committee and the National Institutes of Health guidelines in the Health Guide for Care and Use of Laboratory Animals. Tumor measurements of length (L) and width (W) were obtained using calipers and volume (V) calculated using the formula V = (L × W^2)/2. A maximum of 2,000 mm^3 tumor volume was allowed as per institutional guidelines. PAPSS1/PAPSS2 tumors were extracted at day 22, mechanically dissociated with scalpels and single-cell suspensions were made using Liberase and DNase I (Millipore Sigma, 05401127001 and 11284932001, respectively) incubated at 37 °C for 1 h and mouse cells were magnetically depleted on LS columns using mouse cell depletion cocktail (Miltenyi, 130-104-694 and 130-042-401). No statistical methods were used to predetermine sample sizes but our sample sizes are similar to those reported in previous publications that have tested tumor vulnerabilities and synthetic lethalities^[349]16,[350]42,[351]43,[352]74,[353]75. Extended Data Fig. 8. Gating strategy for Flow Cytometry plots. [354]Extended Data Fig. 8 [355]Open in a new tab Cells were first gated by FSC-A/SSC-A (~95%), and single cells by FSC-A/FSC-H (~98%). DAPI staining was used to gate viable cells (~98%). Unstained cells and/or Heparinase III treated cells were used for establishing the positive 10E4-FITC gate. [356]Source data Characterization of PDXE[DEPMAP] The distribution of the cross-validated expression-only models of gene essentiality (n = 1,966) across lineages was assessed by unsupervised cluster analysis (Ward.D2 method) and visualized using the ComplexHeatmap R package (v.2.6.2). Associations of dependencies with genomic features were assessed using a Wilcoxon rank-sum differential test as implemented using stat_compare_means function of the ggpubr R package (v.0.4.0). To test the ability of gene essentiality to predict the response to corresponding targeted therapies, the change in PDX burden from baseline to experimental end point was correlated with target gene essentiality in PDXE[DEPMAP] using a Pearson’s correlation test and FDR correction of P values for multiple hypothesis testing. ROC AUC analysis was performed using the pROC R package (v.1.18.0) to assess the accuracy of drug responses predicted by the target gene essentiality scores. Only drugs with at least 20 treated PDX models were evaluated and the metrics are reported in Supplementary Table [357]13. Characterization of GTEX[DEPMAP] The distribution of the cross-validated expression-only models of gene essentiality (n = 1,966) across healthy tissues was assessed by unsupervised cluster analysis (Ward.D2 method) and visualized using the ComplexHeatmap R package (v.2.6.2). Differences in gene essentiality in healthy and malignant tissues, as well as malignant tissues with genomic features, were assessed using a Wilcoxon rank-sum differential test as implemented using stat_compare_means function of ggpubr R package (v.0.4.0). Notably, the distributions of dependencies between TCGA[DEPMAP] and GTEX[DEPMAP] by PCA revealed that that the predicted dependency scales are similar between the two datasets (Extended Data Fig. [358]9) and thus any differences in gene essentiality are due to underlying biological mechanisms that differ between healthy and malignant tissues. To evaluate the sensitivity and specificity of GTEX[DEPMAP] to genes associated with tissue-specific toxicities, we profiled GTEX[DEPMAP] genes associated with both blood disorders and drug-induced liver toxicity using the Cortellis OFF-X database ([359]https://targetsafety.info/). The OFF-X database is a drug and target safety intelligence database that predicts potential associations based on both preclinical and clinical safety data alerts from peer-reviewed journals, company communications, clinical trials and regulatory agency communications. These blood and liver toxicity associations were further evaluated to identify overlapping or unique genes for each toxicity and annotated with the frequency of associated safety alerts. In total, the Cortellis OFF-X database identified drug targets associated with potential toxicities in blood (n = 82), liver (n = 85) or blood and liver (n = 74), which were then compared across healthy tissue lineages in GTEX[DEPMAP]. To compare gene essentiality between malignant and healthy tissues, TCGA[DEPMAP] and GTEX[DEPMAP] samples were matched based on the tissue of origin and a Student’s t-test was applied to differential analysis between the dependency profiles of tumor and healthy tissue of the same lineages. The t-statistic was used to characterize the dependency difference between the tumor and corresponding healthy tissue with a negative t-statistic value corresponding to a higher dependency in tumor as compared to the healthy tissue. Gene set enrichment analysis was performed across all paired malignant and healthy tissues of origin. The list of genes for the lung network was generated using the top 100 genes showing the largest differentiation in gene essentiality between cancer compared to healthy tissue in lung based on the negative t-statistic values. Network connectivity and gene ontology enrichment were calculated using STRING ([360]https://string-db.org/), as described previously^[361]76. Extended Data Fig. 9. The GTEX and TCGA expression profiles were aligned and normalized independently to the same DepMap expression profile and the same models (genes and coefficients) were used for both datasets. [362]Extended Data Fig. 9 [363]Open in a new tab (a) Overall range of effect sizes for both datasets was investigated using a PCA, which demonstrates that the dependency distributions show that the predicted dependency scale is very similar for the two datasets. (b) The distribution of gene essentiality scores is similar between TCGA[DEPMAP] and TCGA[DEPMAP]. [364]Source data Statistics and reproducibility All data used for the machine learning and translation of gene essentiality are from publicly available consortia with detailed methodologies for data collection, blinding, randomization and protection. Because the essentiality profiles have a long tail distribution, we have used the nonparametric Wilcoxon test, which does not require a particular probability distribution of the dependent variable in the analysis. Therefore, no tests were required for the normality assumption. No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment. Reporting summary Further information on research design is available in the [365]Nature Portfolio Reporting Summary linked to this article. Supplementary information [366]Reporting Summary^ (2.4MB, pdf) [367]Supplementary Data 1^ (288.3KB, xlsx) Sequence for the CRISPR guide and SL library. [368]Supplementary Table 1^ (74.5MB, csv) Supplementary Table 1: expression-only elastic-net model coefficients of gene essentiality predicted from the DEPMAP. The rows denote features used in modeling and columns are corresponding gene models. [369]Supplementary Table 2^ (262.9MB, csv) Supplementary Table 2: multi-omics elastic-net model coefficients of gene essentiality predicted from the DEPMAP. The rows denote features used in modeling and columns are corresponding gene models. [370]Supplementary Tables 3–17^ (2MB, xlsx) Supplementary Table 3: list of cross-validated genes from expression-only and multi-omics models. Supplementary Table 4: pathway enrichment analysis of dependencies most changed by the transcriptional alignment, as defined by gene essentiality scores with the least correlation (r < 0.5) before and after alignment of DEPMAP and TCGA transcriptomes. Supplementary Table 5: gene essentiality scores across TCGA[DEPMAP]. The row labels denote the gene model and the column labels correspond to TCGA patient sample ID ([371]https://figshare.com/projects/TCGADEPMAP_Mapping_Translational_De pendencies_and_Synthetic_Lethalities_within_The_Cancer_Genome_Atlas/130 193). Supplementary Table 6: association of oncogene essentiality with GOF events. Supplementary Table 7: SSDs across DEPMAP and TCGA[DEPMAP], as assessed by the NormLRT. Supplementary Table 8: predicting therapeutic response in clinical datasets based on the target’s dependency profiles. Supplementary Table 9: Cox log-rank analysis of gene essentiality associated with PFI in TCGA[DEPMAP]. The row labels denote the gene models and the column heads correspond to the HRs and P values, respectively. Supplementary Table 10: annotation of synthetic lethality candidates that were experimentally tested by multiplexed CRISPR/Cas12 screening. The row labels denote gene pairs and column labels correspond to whether a pair was predicted by TCGA[DEPMAP] and whether the pair was annotated as paralogs or TSGs. Supplementary Table 11: GI scores from CRISPR/Cas12 screens using AsCas12a and enAsCas12a across 14 cell models. The row labels denote the gene pairs and the column labels correspond the to the enzyme used (AsCas12a or enAsCas12a), the expected (exp.) effect (summed z-score of the single KO), the observed (obs.) effect of DKO (z-score transformed), the difference (Diff.) of expected and observed effects and the z-score transformation of the difference (Diff_Z). Supplementary Table 12: gene essentiality scores across PDXEDEPMAP. The row labels denote the gene model and the column labels correspond to the PDX sample ID ([372]https://figshare.com/projects/TCGADEPMAP_Mapping_Translational_De pendencies_and_Synthetic_Lethalities_within_The_Cancer_Genome_Atlas/130 193). Supplementary Table 13: predicting drug response based on gene essentiality in PDX models. Supplementary Table 14: gene essentiality scores across GTEX[DEPMAP]. The row labels denote the gene model and the column labels correspond to the GTEX sample ID ([373]https://figshare.com/projects/TCGADEPMAP_Mapping_Translational_De pendencies_and_Synthetic_Lethalities_within_The_Cancer_Genome_Atlas/130 193). Supplementary Table 15: list of genes associated with both blood disorders and drug-induced liver toxicity that were curated from the Cortellis OFF-X database. The OFF-X database is a drug and target safety intelligence database that predicts potential associations based off both preclinical and clinical safety data alerts from peer-reviewed journals, company communications, clinical trial and regulatory agency communications. The rows denote the genes associated with potential toxicities in blood (n = 82), liver (n = 85) or blood and liver (n = 74). The column labels correspond to the different evidence classes of safety alerts. Supplementary Table 16: differences between predicted essentiality between malignant tissue and the healthy tissue of origin. The rows denote the gene model and the column labels correspond to the malignant (TCGA[DEPMAP]) and healthy (GTEX[DEPMAP]) tissue types that were matched based on the tissue of origin. A Student’s t-test was applied to differential analysis between the dependency profiles of tumor and healthy tissue of the same lineages. The t-statistic was used to characterize the dependency difference between the tumor and corresponding healthy tissue with a negative t-statistic value corresponding to a higher dependency in tumor as compared to the healthy tissue. Supplementary Table 17: pathway analysis of the strongest tumor dependencies with the least normal tissue-of-origin toxicity. The rows denote the pathway and the column labels correspond to the enrichment P value, FDR-corrected P value (Padj), normalized enrichment score, number of genes included in the gene set (size), the gene models comprising the leading edge (leadingEdge) and the paired malignant and healthy tissues (type). Source data [374]Source Data Fig. 1^ (366.4KB, xlsx) Source data for Fig. 1. [375]Source Data Fig. 2^ (1.9MB, xlsx) Source data for Fig. 2. [376]Source Data Fig. 3^ (796.9KB, xlsx) Source data for Fig. 3. [377]Source Data Fig. 4^ (192.2KB, xlsx) Source data for Fig. 4. [378]Source Data Fig. 5^ (197.8KB, xlsx) Source data for Fig. 5. [379]Source Data Fig. 6^ (36.6KB, xlsx) Source data for Fig. 6. [380]Source Data Fig. 7^ (1.5MB, xlsx) Source data for Fig. 7. [381]Source Data Fig. 4^ (107KB, pdf) Uncropped image for Fig. 4h. [382]Source Data Extended Data Fig. 1^ (67.6KB, xlsx) Source data for Extended Data Fig. 1. [383]Source Data Extended Data Fig. 2^ (476.9KB, xlsx) Source data for Extended Data Fig. 2. [384]Source Data Extended Data Fig. 3^ (1.3MB, xlsx) Source data for Extended Data Fig. 3. [385]Source Data Extended Data Fig. 4^ (386.5KB, xlsx) Source data for Extended Data Fig. 4. [386]Source Data Extended Data Fig. 5^ (69.1KB, xlsx) Source data for Extended Data Fig. 5. [387]Source Data Extended Data Fig. 6^ (10.2KB, xlsx) Source data for Extended Data Fig. 6. [388]Source Data Extended Data Fig. 7^ (327.1KB, xlsx) Source data for Extended Data Fig. 7. [389]Source Data Extended Data Fig. 8^ (10.1KB, xlsx) Source data for Extended Data Fig. 8. [390]Source Data Extended Data Fig. 9^ (89.5KB, xlsx) Source data for Extended Data Fig. 9. [391]Source Data Extended Data Fig. 10^ (444.6KB, pdf) Uncropped images for Extended Data Fig. 10. Acknowledgements