Graphical abstract graphic file with name fx1.jpg [42]Open in a new tab Highlights * • Computational approaches for detecting paired targets using CRISPR screen data * • Validation of paralog gene pairs' effect on T cell killing using CRISPR double knockouts * • A machine learning model for predicting paralog gene pairs that modulate immunotherapy The bigger picture Immunotherapies leverage the patient’s own immune system to fight cancer, and emerging therapies in this domain have revolutionized cancer treatment; however, resistance and relapse remain challenges. Recent efforts to identify combinatorial targets, or pairs of genes that could be used together for cancer treatment, are hindered by limitations in current high-throughput screening approaches. This type of screening often misses promising target pairs. In this study, we leverage public genomic datasets to identify and validate gene pairs that synergistically enhance effector T cell function, ultimately promoting immune-mediated tumor eradication. Our findings underscore the overlooked potential of paralog gene pairs in driving functional synergy and offer a framework for designing more effective combinatorial screen libraries. While this study focuses on cancer immunotherapy, the methods we developed for identifying combinatorial targets may be broadly applicable to other viability functional screening asks, where large-scale datasets are more readily available. __________________________________________________________________ Combinatorial targeting has shown promise in cancer therapies; however, the experimental detection of paired functional targets remains challenging due to the vast number of possible combinations. In this study, the authors introduce a method to repurpose single-gene screen data to identify functional paralog pairs that enhance immunotherapy, extending the search through machine learning approaches. The authors further demonstrate experimentally that synergistic paralog pairs can improve immunotherapy efficacy, even when neither individual target is effective, a scenario often underestimated. Introduction Emerging immunotherapies, especially immune checkpoint blockade (ICB) and adoptive cell therapy, have revolutionized cancer treatment for multiple cancer types; however, a significant fraction of patients fail to respond to immunotherapies or relapse following treatment.[43]^1 Recent efforts have focused on using CRISPR knockout (KO) or activation screening to identify targets that enhance T cell effector function and augment immune killing capability.[44]^2 Nevertheless, overcoming resistance by manipulating a single gene remains challenging due to the compensatory effects of other genes. Combination therapies have been proposed as a promising strategy to overcome tumor resistance to monotherapy when single agents are ineffective.[45]^3 Paralogs, genes that originate from the same ancestors and share similar functions, often work in concert to promote cell survival. Numerous focused studies have utilized genomic data and pooled screening techniques to explore how targeting paralogous genes can impact cancer cell viability.[46]^4^,[47]^5^,[48]^6 Recent research has shown that targeting these genes simultaneously in cancer cells can enhance the effectiveness of immunotherapy by preventing the cells from evading T cell-mediated destruction. For instance, Park et al. demonstrated that treating mice with both anti-PD-L1 and anti-PD-L2 antibodies significantly reduced tumor growth in specific mouse models.[49]^7 Additionally, signals from both PARP-1 and PARP-2 are essential for initiating an effective T cell response against breast cancer in mice.[50]^8 While pooled CRISPR screening is popular for identifying single genes that modulate specific responses, including those relevant to immunotherapy, double-knockout screens aiming to identify paired targets remain rare. Recent efforts in double-knockout screens, covering dozens to hundreds of gene pairs, have been conducted to identify novel gene combinations that lead to synthetic lethality and boost T cell function.[51]^5^,[52]^9^,[53]^10 However, a comprehensive, high-throughput strategy to screen all potential paralog pairs related to immunotherapy is challenging because a very large screening library that includes hundreds of thousands of gene combinations would be both costly and unfeasible. To solve this problem, we have developed a computational approach to identify the most promising paralog gene combinations, which can then be used to build a focused and manageable library for experimental validation. Here, we used an single guide RNA (sgRNA) set enrichment method to identify cancer-intrinsic paralog pairs that enhance T cell killing from prior pooled screening data. We show that our method identifies novel synergistic paralog pairs that are successfully validated experimentally. We have also constructed an ensemble learning XGBoost classifier to predict true-positive paralog pairs enhancing immunotherapy by incorporating features such as gene characteristics, sequence and structural similarity, protein-protein interaction (PPI) networks, and gene co-evolution data. To enhance the reliability of the XGBoost classifier predictions, we conducted further experimental validations on the highest-ranked paralog pairs identified. This study is expected to uncover novel therapeutic targets and inspire new combination immunotherapies to aid cancer treatment. Results Identifying cancer-intrinsic paired paralogs that enhance T cell-mediated killing using CRISPR screen data A large proportion of genes have paralogs that perform redundant functions. Inactivation of functionally important paralog pairs can lead to synthetic lethality, as well as more efficient modulation of biological pathways.[54]^9^,[55]^11 Unfortunately the identification of these paralog pairs is limited in single-gene perturbation screens due to compensation between paralogous genes,[56]^12 and dual gene screening is often unfeasible due to the large size of CRISPR libraries required to encompass all potential combinations of paralogous gene pairs. Current double-knockout screens cover only a limited number of genes and require a substantial number of gRNA pairs ([57]Table S1). Even the largest paired gene screens to date cover less than 0.1% of all possible combinations.[58]^5^,[59]^9^,[60]^12^,[61]^13^,[62]^14^,[63]^15^,[64]^1 6^,[65]^17^,[66]^18 To overcome this limitation, we proposed an unbiased computational approach to filter and predict functional gene pairs from prior published genome-wide CRISPR screen data. We initially attempted to search for immunotherapy-aiding paralog pairs using conventional MAGeCK results in Lawson’s CRISPR screen dataset,[67]^19 which was originally aimed at identifying genes involved in evading T cell-mediated cytotoxicity across six murine tumor cell lines. Based on a strict dual-hit gene criterion, we observed that few or no suitable pairs identified ([68]Figure S1A); however, relaxing the criteria to include single hit genes leads to a significant increase in uncertain pairs. To overcome this, we developed an enrichment-based methodology named paralog sgRNA set enrichment analysis (pSSEA), which integrates the sgRNAs of gene pairs to enhance the identification of high-potential paralog pairs. We provide a schematic of the pSSEA algorithm in [69]Figure 1B. Briefly, the process begins with the results of differential expression analysis (using DESeq2, MAGeCK, etc.) as input. The pSSEA evaluates the enrichment score by employing a Kolmogorov-Smirnov (KS)-like random walk statistic, a method similar to those used in GSVA[70]^20 and ASSESS.[71]^21 The detailed steps include [MATH: ν(l)=i=1l|ri|Ι(giγ) i=1p|ri|Ι(giγ)i=1l|ri|Ι(giγ)p|γ|< /mrow> :MATH] (Equation 1) where [MATH: ri :MATH] is the rank order of sgRNAs, sorted by the log fold changes; [MATH: γ :MATH] is the sgRNA set that belongs to the paralog gene pair; [MATH: Ι(giγ) :MATH] is the indicator function on whether the i-th sgRNA is in paralog sgRNA set [MATH: γ :MATH] , [MATH: |γ| :MATH] is the number of sgRNAs belonging to the gene pairs, and p is the number of total sgRNAs in the library. [MATH: ES=ν[argmaxl=1,.. .,p|ν(l)|] :MATH] (Equation 2) Figure 1. [72]Figure 1 [73]Open in a new tab Identification of paralog pairs that enhance T cell killing using pSSEA (A) Protein-coding paralogous genes retrieved from Ensembl Compara database using bioMart. (B) Workflow for identifying paralog pair hits using the paralog sgRNA set enrichment analysis (pSSEA). (C) Identification of paralog pairs that enhance T cell killing in six cancer models using pSSEA. CRISPR screen data were taken from GEO: [74]GSE149933 . The volcano plot of pSSEA outputs, with the x axis representing the enrichment score and the y axis showing the negative logarithm (base 10) of the p value from pSSEA, is shown. (D) Scatterplot illustrating the characteristics of identified T cell-enhancing paralog pairs using a single-gene enrichment model. The x and y axes represent the negative logarithm (base 10) of the p values from the enrichment results of gene1 and gene2, respectively. The enrichment scores for paralog pairs were calculated by identifying the maximum deviation from zero during a random walk analysis. To assess the significance of enrichment, we created an empirical null distribution by randomly selecting an equivalent number of non-targeting control (NTC) sgRNAs and repeating this process 10,000 times to ensure robustness. The significance of enrichment for each gene pair was then determined by comparing their sgRNA distributions against both the positive and negative tails of the NTC null distribution. We re-analyzed Lawson’s CRISPR screen dataset[75]^19 to identify cancer-intrinsic paralog pairs that enhance T cell-mediated killing function using the pSSEA method. From the Ensembl Compara database (Ensembl, v.102), a total of 139,654 protein-coding gene paralog pairs were retrieved, covering 16,202 protein-coding genes, as shown in [76]Figure 1A. First, DESeq2 was used to quantify the selective depletion of sgRNAs within tumor populations under immune selection from co-cultured OT-1 T cells. Subsequently, using the pSSEA framework, we identified paralogous gene pairs that potentially influence T cell-mediated cytotoxicity based on the differential sgRNA results obtained from DESeq2. Our analysis identified 7,310 significant paralog pairs (5.2% of 139,654) with p values below 0.05 in B16F10 melanoma cell data. Among these, 5,632 pairs were positively enriched, suggesting vulnerability to T cell killing, and 1,678 pairs were negatively enriched, suggesting resistance to T cell killing ([77]Figure 1D). Using these same methods, we identified paralog pairs associated with vulnerability and resistance to T cell killing for CT26, MC38, 4T1, EMT6, and Renca cell lines, as shown in [78]Figure 1C. To investigate the properties of paralog pairs identified as enriched through pSSEA, we assessed individual gene enrichment using pSSEA in single-gene mode. Among the 5,632 pairs associated with vulnerability to T cell killing in B16F10 melanoma cells, 3,620 pairs (64.3%) were single hit, with only one gene showing significance; 520 pairs (9.2%) were dual hits, with both genes significant; and the remaining 1,492 pairs (26.5%) showed no hits of either of the genes within the identified paralog pair ([79]Figure 1D; [80]Table S2). Notably, we observed that over a quarter of the paralog pairs exhibited a synergistic effect despite no hits at the single-gene level. This observation was confirmed using data from Parrish et al., which includes both single knockout (SKO) and double knockout (DKO) screens for tumor viability. By analyzing DKO alongside SKO screens using MaGeck, we found that out of 354 synthetic lethality pairs identified in the DKO screen, 95 were not significant as individual genes in the SKO screen (26.8%; [81]Figure S1B). We also observed that pSSEA was more effective in identifying synthetic gene pairs compared to selecting pairs based on the significance of individual genes or both genes in SKO screens ([82]Figure S1C). This aligns with our previous findings in [83]Figure 1D that many potential synergistic gene pairs may not be significant as individual genes, highlighting a broad spectrum of potential targets that might be overlooked by obtaining gene pairs according to individual gene results. Validation of synergistic cancer-intrinsic paralog pairs that, together, boost T cell killing To determine if the gene pairs identified by pSSEA truly function synergistically, we focused on the subset of paralog pairs that exhibited a synergistic effect despite neither gene being individually significant. This subset constituted over 25% of the functional paralog pairs identified ([84]Figure 1D). For our experimental validation using the B16F10 murine melanoma cell model, we first screened for paralog pairs in the B16F10 model, where neither gene showed a significant effect individually but demonstrated a significant effect when both genes were perturbed together. This ensured that we could efficiently validate potential gene pairs in this tumor model, Then, to enhance the robustness of our selection, we applied pSSEA to five additional tumor models, including CT26, MC38, 4T1, EMT6, and Renca. We aimed to identify paralog pairs that consistently enhanced T cell killing across multiple models, thereby reducing the likelihood that observed effects were model-specific anomalies. We observed that no paralog pair was significant across all six models. However, some pairs appeared frequently, being significant in up to four models ([85]Figure 2A; [86]Table S3). Ultimately, four paralog pairs, Rbm45+Rbms2, Ppp2r2a+Ppp2r2d, Elf2+Etv6, and Adam10+Adam15, were selected for further experimental assessment of their potential to enhance T cell killing efficiency ([87]Figures S1D–S1G). Figure 2. [88]Figure 2 [89]Open in a new tab Synergistic paralog pairs that, together, boost T cell killing function in B16F10 tumor model (A) The T cell synergistic paralog pairs identified in the B16F10 melanoma model and the intersection across the other 5 tumor models. T cell-enhanced paralog pairs were identified using pSSEA, and synergistic pairs were denoted as pairs where neither of the two genes could reach significance using a single-gene enrichment approach. Four paralog pairs were identified in 4 out of 6 tumor models. Left: UpSet plot of overlap synergistic paralog pairs identified from the B16F10 melanoma model that were also significant in the 5 other cancer cell models in the Lawson study. (B) Experimental design for the validation of selected paralog pairs predicted to enhance T cell killing. Naive CD8a+ T cells were isolated from the spleen of OT-I mice and stimulated with anti-mouse CD3/CD28 antibody for 2 days. Then, B16F10-Cas9-mCherry-Ova gene knockout cells were co-cultured with OT-I T cells at an E:T = 1:1 ratio. Tumor cell killing were tested at 24 h by flow cytometry. (C) Bar chart of mean viability of B16F10 cells with Adam10 and Adam15 single- or double-gene knockout(s) following culture with or without OT-1 T cells. Individual data points are shown as black dots. The p values come from two-sided heteroscedastic t tests (ns, p > 0.05, ∗p ≤ 0.05, ∗∗p ≤ 0.01, and ∗∗∗p ≤ 0.001). (D) Bar chart of mean viability of Ppp2r2a and Ppp2r2d knockout B16F10 cells following culture with or without OT-1 T cells. (E) Bar chart of mean viability of Elf2 and Etv6 knockout B16F10 cells following culture with or without OT-1 T cells. (F) Bar chart of mean viability of Rbm45 and Rbms2 knockout B16F10 cells following culture with or without OT-1 T cells. To perform the paired gene KO, we utilized the same gRNAs from the original gRNA-library pool used in the CRISPR screen. For the first gene within the paralog pair, the targeting gRNA was cloned into a GFP-expressing construct, while for the second gene within the paralog pair, the targeting gRNA was inserted into a puromycin-expressing construct ([90]Figure 2B). To evaluate the synergistic function of each gene pair, we established four experimental groups with targeted KOs: NTC paired with NTC (NTC-NTC), gene1 KO paired with NTC (gene1KO-NTC), gene2 KO paired with NTC (gene2KO-NTC), and a double knockout (DKO) of both genes (gene1KO-gene2KO). To generate cells with these specific gene targets, we performed dual transduction of B16F10-Cas9-mCherry-OVA tumor cells, which express the OVA antigen, using lentiviruses carrying the gRNAs with GFP or puromycin markers. GFP-positive cells were sorted and then selected for puromycin-resistant cells to obtain either single- or double-knockout cells. Subsequently, gene editing was verified, and KO efficiency was evaluated using the T7E1 assay. To investigate the role of the paired genes in T cell-mediated cytotoxicity, we co-cultured B16F10-Cas9-mCherry-OVA tumor cells from the four experimental groups described above with OT-I T cells, whose T cell receptor (TCR) recognizes the SIINFEKL peptide presented by Ova antigen-expressing tumor cells, and assessed tumor cell survival. In these co-culture experiments, we found that simultaneous inactivation of Adam10 and Adam15 significantly increased T cell-mediated tumor cell death, while single knockouts of Adam15 or Adam10 did not affect T cell activity ([91]Figures 2C and [92]S1H). Similarly, Ppp2r2a+Ppp2r2d double knockouts displayed significantly greater vulnerability to T cell cytotoxicity, whereas individual knockouts of Ppp2r2a or Ppp2r2d showed only modestly enhanced T cell killing ([93]Figures 2D and [94]S1I). The Elf2 and Etv6 combination showed significantly enhanced T cell killing compared to the NTC control but not compared to the single-gene Elf2 and Etv6 KO cells ([95]Figures 2E and [96]S1J). The Rbm45/Rbms2 pair showed no discernible impact on T cell efficiency in either single- or double-knockout cells compared to the NTC group ([97]Figures 2F and [98]S1K). Furthermore, we evaluated the synergistic effects of combination targets in response to T-cells mediated killing, synergy analysis showed that Adam10+Adam15, Ppp2r2a+Ppp2r2d, and Elf2+Etv6 demonstrated positive synergy scores across the HSA, Loewe, and delta log fold change (dLFC) models (see the [99]methods section and [100]Table S4). In total, three out of four predicted paralog pairs were successfully confirmed to exhibit increased T cell vulnerability, highlighting previously overlooked synergistic gene pairs. These findings also provide fundamental experimental evidence for the accuracy of using the pSSEA method to effectively identify paralog pairs that enhance T cell killing efficacy. Building an ensemble XGBoost classifier to predict paralog pairs that enhance T cell-mediated cytotoxic killing Using a CRISPR screen dataset, we have demonstrated that paralog pairs can work together to enhance T cell-mediated cytotoxicity. However, this CRISPR screen dataset was derived from in vitro co-culture of cancer cell lines with cytotoxic T cells, an assay with significant limitations and potential experimental biases. Therefore, we developed a system for deeper characterization of paralog pairs and prediction of their potential for enhancing cancer immunotherapy. Utilizing T cell-enhancing paralog pairs identified in B16F10 melanoma data with pSSEA as true positives and those identified as insignificant as false positives, we compiled 32 features representing various aspects of paralogous gene interactions. These features include sequence characterization, expression abundance and correlation, shared PPIs, complex membership, co-evolution, tumor microenvironment (TME) associations, perturbation similarities, and combined survival associations in patients with cancer ([101]Figure 3A; [102]Table S5). Then, an ensemble classifier was trained, utilizing the identified 32 features, to predict potential paralog pairs. In order to ensure the quality of model training and testing, the paralog pairs were filtered using several criteria: paralog pairs identified as T cell-enhancing pairs or insignificant pairs in B16 data; a sequence BLAST identity larger than 30% with an e-value < 1e−5; measurements in structural similarity, synthetic lethality, and survival associations; and paralog pairs with family sizes of 20 or fewer to prevent overrepresentation of specific ortholog families. A total of 3,204 paralog pairs were selected from the initial set of 139,654 paralog pairs. Furthermore, 80% of these 3,204 pairs were randomly assigned to the training dataset (n = 2,547), which includes negative pairs (n = 2,274) and positive pairs (n = 273). The remaining 20% were saved for internal testing later ([103]Figures S2A and S2B). Prior to building the ensemble model, our analysis revealed that certain individual features exhibited a stronger predictive effect in the training data. The top influential feature was gene expression abundance with the highest ROC AUC (area under receiver operating characteristic curve) values, indicating that gene expression itself could be a critical indicator of a paralog pair ([104]Figure 3B). Notably, the features ranked second to fifth were four PPI-network features ([105]Figure 3A), highlighting the crucial role of gene networks in determining whether two genes could interact in an immunotherapy context. Synthetic lethality was identified as a mid-level feature, suggesting that the collaborative roles of paralogs of cancer cells in response to immune context might differ from their roles in cell viability. Figure 3. [106]Figure 3 [107]Open in a new tab Development and evaluation of XGBoost classifier for predicting T cell enhancing synergistic paralog pairs (A) Schematic for constructing a machine learning classifier to predict T cell-enhancing paralog pairs. (B) ROC AUC and PRC AUC measures of 32 paralog features for predicting paralog pairs that boost T cell killing. (C) ROC curves of XGBoost and top individual features in the internal test dataset. The XGBoost classifier outperformed the traditional individual features. (D) Precision recall curves of the XGBoost classifier and individual features. Next, the XGBoost classifier, an ensemble-boosted tree learner, was trained, utilizing 5-fold cross-validation for hyperparameter tuning. We then applied the optimal parameters to re-train the classifier using the entire training dataset (details in [108]methods). Variable importance analysis revealed that complex, co-localization, and PPI features were the top-ranking features ([109]Figure S2C). We then evaluated the XGBoost classifier’s performance using testing data. Notably, the XGBoost classifier significantly outperformed all single features used as a baseline for comparison ([110]Figure 3C). Specifically, the classifier achieved an AUC of 0.743, surpassing the best-performing individual-feature average paralog gene expression in cancer cell lines, which had an AUC of 0.612. Considering the imbalanced nature of the training dataset, with more negative pairs than positive ones, we also assessed the classifier’s performance using the precision-recall curve (PRC; [111]Figure 3D). Similarly, the XGBoost classifier showed better performance compared to all individual features. Overall, the results underscore the enhanced predictive capability of our XGBoost classifiers in effectively capturing interactions between features and non-linear relationships, thereby enhancing T cell killing potential. Furthermore, we evaluated the classifier’s performance on paralog pairs in cells other than B16F10 cells. ROC analysis revealed that the classifier trained on B16F10 cell data could not predict paralog pairs enhancing T cell killing in other cell types, except for CT26 colon cells ([112]Figures S3A–S3E), indicating inherent heterogeneity among cancer cell models. Interpreting the prediction of immunotherapy-favorable paralog pairs from the machine learning classifier Using the XGBoost classifier, we comprehensively analyzed the probability of enhancing T cell cytotoxicity for additional paralog pairs outside the training and internal testing sets. By filtering with criteria including measurements in structural similarity, synthetic lethality, and survival associations and excluding pairs from the training and testing sets, we obtained 15,414 paralog pairs out of the total 139,654 pairs for prediction. Each paralog pair was calculated for its potential impact, assigned a probability score, and ranked accordingly. The Shapley additive explanations (SHAP) tree explainer method assessed the contribution of each feature to the final predictions. The paralog pairs were ranked by their prediction scores, and subsequent pathway enrichment analysis revealed that genes in the top 5% of paralog pairs were significantly enriched in functions related to protein phosphorylation and ubiquitination modification ([113]Figure S4A). The top three ranked predicted paralog pairs were Rapgef1+Papgef2, Cct8+Cct3, and Syk+Itk ([114]Figure 4A; [115]Table S5). Notably the expression of Syk in cancer cells has shown to be associated with both tumor promotion and suppression.[116]^22 Itk has been reported to enhance ICB response in solid tumors,[117]^23 and Rapgef1 was identified as essential for melanocyte growth through a CRISPR proliferation screen.[118]^24 Analysis of The Cancer Genome Atlas (TCGA) cohort revealed that four genes (Rapgef1, Rapgef2, Itk, and Syk) had mutation rates exceeding 1% in the pan-cancer cohort. At the individual cancer-type level, four genes (Cct3, Rapgef1, Rapgef2, and Syk) had mutation rates exceeding 5% in highly mutated endometrial carcinoma, while in melanoma, three genes (Rapgef1, Rapgef2, and Syk) had mutation rates exceeding 3%; Itk had a rate of 7.3% ([119]Figure 4B). The SHAP profile revealed differential feature attributions among the top pairs. For example, sequence identity and paralog family size showed negative contributions to the prediction in Rapgef1+Rapgef2 ([120]Figure 4C) and Itk+Syk ([121]Figure 4D) but positive contributions in Cct3+Cct8 ([122]Figure S4B). Figure 4. [123]Figure 4 [124]Open in a new tab Visualization and validation of top 3 predicted paralogs from XGBoost classifier (A) Paralog pairs with the top 3 prediction scores are displayed in a scatterplot. (B) Mutation profile of top 3 paralog pairs RAPGEF1/RAPGEF2, SYK/ITK, and CCT8/CCT3 in The Cancer Genome Atlas (TCGA) cohorts. (C and D) SHAP profile for RAPGEF1/RAPGEF2 (C) and SYK/ITK (D). (E and F) Bar chart of mean viability of Rapgef1/Rapgef2 (E) and Itk/Syk (F) knockout treated B16F10 melanoma cells following culture with or without OT-1 T cells. The p values come from two-sided heteroscedastic t tests. (ns, p > 0.05, ∗p ≤ 0.05, ∗∗p ≤ 0.01, and ∗∗∗p ≤ 0.001). (G and H) Gene sets enrichment analysis (GSEA) revealed that patients with low expression of RAPGEF1/RAPGEF2 (G) and SYK/ITK (H) were associated with downregulation of PD-L1/PD-1 signaling pathway. The GSEA was performed using the differential expression output between patients with low (<25%) expression of both paralog genes versus all remaining patients. To evaluate the synergistic effect of the paralog genes identified by the XGBoost classifier, we conducted DKO-B16F10-OT1 T cell co-culture assays. Knocking out Rapgef1 alone did not affect T cell-mediated tumor cell killing, while knocking out Rapgef2 alone increased tumor cell resistance to T cell killing. However, a combined KO of Rapgef1 and Rapgef2 significantly improved T cell efficiency in killing tumor cells ([125]Figures 4E and [126]S4C). A similar pattern was observed with the Syk+Itk gene pair: knocking out individual genes increased resistance to T cell-mediated killing, but a double knockout enhanced T cell killing activity ([127]Figures 4F and [128]S4D). In contrast, double knockout of the Cct8+Cct3 gene pair impaired T cell killing ability against tumor cells ([129]Figures S4E and S4F). Synergy analysis revealed that Rapgef1+Papgef2 and Syk+Itk exhibited positive synergy effects (see the [130]methods section and [131]Table S4). To further investigate why Cct3/Cct8 might fail while the other two pairs work, we performed gene set enrichment analysis (GSEA) on TCGA-SKCM data, comparing patients with low expression in both paired genes (≤25%) to those with high expression (≥75%). We observed significant downregulation of the PD-L1/PD-1 signaling pathway in patients with melanoma with low Rapgef1+Rapgef2 ([132]Figure 4G) and Itk+Syk ([133]Figure 4H) expression. Downregulation of the PD-L1/PD-1 signaling pathway augments T cell killing by interrupting inhibitory signals that suppress T cell activity. Conversely, patients with low Cct3+Cct8 expression ([134]Figure S4G) showed insignificant upregulation of this pathway compared to others. This finding suggests that the failure of Cct3+Cct8 to enhance T cell cytotoxicity may be due to the activation of other immunosuppressive pathways and the elevated expression of inhibitory molecules. Discussion Although numerous CRISPR screen studies aim to identify novel cancer cell targets for enhancing cancer immunotherapies, few have focused on functional gene pairs, thus limiting the development of combination strategies. To address this, we developed a computational enrichment-based approach, pSSEA, to identify potential paralog gene pairs using genome-wide CRISPR screen datasets by combining sgRNAs from two paralogs. We demonstrated that pSSEA enables the identification of cancer-intrinsic paralog pairs that can synergistically enhance T cell killing. Subsequently, we constructed an ensemble-learning XGBoost classifier to predict additional cancer-intrinsic, immunotherapy-aiding paralog pairs and experimentally tested the top predictions. Notably, all analyses in this study were based on CRISPR screen data. We observed that only a small subset of the screen-derived T cell-enhancing paralog pairs demonstrated a combined effect in predicting ICB responses when using patient-derived transcriptomic data (data not shown). This suggests that these two data types may capture different aspects of immunotherapy-related signals. We developed an enrichment-based approach pSSEA to identify key paralog gene pairs, building on two main considerations: (1) single-gene screens well correlated with combinatorial screens effects from previously studies[135]^9^,[136]^14^,[137]^16 and (2) although combinatorial targeting remains a compelling area of interest, designing a library that comprehensively covers all possible gene pairs is technically unfeasible. Limitations of the study Limitations of this study should also be acknowledged. First, the paralog pairs selected and used for modeling were not derived from actual combinatorial screens targeting immunotherapy. The absence of ground-truth data may influence the accurate identification of true paired hits, potentially impacting the model’s performance. Secondly, we did not incorporate a synergistic effect module, as used in other combinatorial screening studies,[138]^13^,[139]^25^,[140]^26 which defines synergy as the difference between a gene pair’s observed effects and the expected sum of their individual effects. While concerns remain about the comparability of enrichment scores generated from sgRNA sets of varying sizes, we plan to further investigate this issue and evaluate the feasibility of integrating similar functionality. Nevertheless, this study has the potential to offer valuable insights into the rapidly advancing field of immunotherapy and harness the power of well-established single-gene screens to uncover meaningful gene pair interactions. In summary, we envision our work providing novel methods to leverage genome-wide screen data for selecting combination targets in cancer immunotherapy. We believe that the computational framework in our study could be adapted to prioritize non-paralogous protein-coding gene pairs, broadening its applicability beyond paralogs. Methods Paralog pair information Information on protein-coding paralog pairs was obtained from the Ensembl Compara database via biomaRt (hsapiens_gene_ensembl, v.102) using custom R scripts. Duplicated paralog pairs were filtered out. A total of 139,654 protein-coding gene paralog pairs, covering 16,202 protein-coding genes, were used for the analyses in this study. Collect features for paralog pairs Sequence similarity Amino acid sequence of paralog genes were obtained from Ensembl (release 102).[141]^27 Paralog sequence similarities were measured using BLASTP command line tools.[142]^28 The longest translated peptide was used for genes with multiple transcripts. Gene pair sequence identity, paralog family size, and whether they are the closest pair in the same gene family were included as sequence-related features for paralog pairs. We also used high-resolution structure PDB files from [143]https://alphafold.ebi.ac.uk/.[144]^29 The structural similarity of paralogous proteins was measured using template modeling score (TM-score) software.[145]^30 The TM-score and root-mean-square deviation (RMSD) were included as additional sequence-related features. Shared gene network, complex, and co-localization We collected shared PPI features to characterize the role of paralogs in the gene interaction network, including total PPIs, shared PPIs, and two statistical measures: the Jaccard index and Fisher’s exact test −log10(p value) for shared PPIs. Gene subcellular location data were obtained from the Human Protein Atlas (v.22.0, file: subcellular_location.tsv). The co-localization of paralogous genes was also measured using the Jaccard index. The complex membership of paralog pairs and gene-protein complex membership data were obtained from CORUM[146]^31 via [147]https://maayanlab.cloud/Harmonizome/. Gene co-evolution Gene ages were obtained from ProteinHistorian ([148]https://proteinhistorian.docpollard.org/),[149]^32 and we used the average age of the two paralogs as the age for the pair (Wagner age reconstruction algorithm). The conservation level of an individual gene was calculated as the number of species with orthologs (BLASTP e-value < 1-e−5, identity > 30%) out of 269 species from Ensembl genomes (v.102). The shared homologs of a gene pair were calculated as the number of species having both gene homologs. The ratio of the number of non-synonymous substitutions per non-synonymous site to the number of synonymous substitutions per synonymous site (Ka/Ks) for gene pairs was calculated using the codeml program in the PAML package (v.4.10.6).[150]^33^,[151]^34 We adopted the phylogenetic distance method developed by Tabach et al. to measure the co-evolution of a pair of genes.[152]^35 A non-negative matrix factorization (NMF)-derived distance considering phylogenetic tree information was calculated using the phylogenetic profiles of the paralogous genes.[153]^36^,[154]^37 Essentiality and synthetic lethality Paralogs with available evidence of essentiality from S. pombe and S. cerevisiae ortholog data from OGEE v.3 were marked as essential pairs.[155]^38 We adopted Kegel’s method to identify synthetic lethality pairs[156]^6 using DepMap CRISPR data ([157]https://depmap.org/portal, 23Q2 release). Paralog pairs were considered synthetic if the A1 gene was an essential gene (Chronos score[158]^39 < −0.6 in 1% or 10% of cell lines), and its dependency was significantly associated with A2 loss status with a p value less than 0.05. Co-expression pattern Gene pair co-expression was measured using Spearman correlation coefficient ρ at multiple levels: cancer cell lines, normal tissues, and cancer scenarios. Cancer Cell Line Encyclopedia (CCLE) gene expression data were downloaded from DepMap.[159]^40 Tissue-specific gene expression profiles from healthy donors were downloaded from Genotype-Tissue Expression (GTEx; [160]https://gtexportal.org/) public releases.[161]^41 Cancer-specific gene expression profiles for 33 cancer types were downloaded from TCGA GDC Portal ([162]https://portal.gdc.cancer.gov/).[163]^42 Perturbation similarity Due to the limited number of gene perturbation experiments in Connectivity Map (L1000, [164]https://clue.io/data/),[165]^43 we conducted pseudo perturbation using TCGA mRNA sequence data to mimic the transcriptomic gene expression changes associated with single-gene manipulation. Practically, the differential expression profile of specific gene inhibition was calculated as the log fold change in gene expression between patients with expression levels in the lowest 25th percentile versus those in the highest 75th percentile of the query gene. The perturbation similarity of paralogous genes was measured using recovery AUC with the paralog gene differential expression profiles. Microenvironment associations The TME plays an important role in cancer progression and treatment outcome. We attempted to uncover features quantifying the association of the paralogs with multiple microenvironment indicators. CD8^+ T cell infiltration was measured by the single-stranded GSEA (ssGSEA) method.[166]^20 We calculated T cell exclusion and dysfunction scores using another software, the TIDE algorithm developed by Jiang et al.[167]^44 The association with gene pairs and microenvironmental characteristics was calculated by Kendall’s rank correlation test.[168]^45 Survival associations To determine gene pairs with a synergistic effect on survival outcomes, we calculated the survival associations to assess whether a paralog pair acts in concert to influence cancer prognosis. First, patients were categorized into three groups based on the expression levels of the paired paralog genes: groups were designated as 1 for both genes showing low expression, 3 for both showing high expression, and 2 for any other expression combination. The survival association for paralog pairs was then calculated using the formula [MATH: survivalassociation=log10(p) sign(β_gene1β_gene2) . :MATH] In this formula, p refers to the p value from the Cox regression analysis of the paralog group, adjusted for sex and age, while β_gene1 and β_gene2 represent the exponentiated coefficients of the individual genes from the Cox proportional hazards model. Clinical data and genomics data for TCGA-SKCM were downloaded from the GDC Portal ([169]https://portal.gdc.cancer.gov/).[170]^46 XGBoost ensemble classifier The popular supervised-learning algorithm XGBoost—a gradient boosting ensemble learner with regularization parameters—was used in this study (implementation in the XGBoost package).[171]^47 We performed a grid search based on 5-fold cross-validation of our paralog pair dataset to explore the space of potential hyperparameters for the XGBoost classifier. The parameters used in the final model are as follows: objective = reg: logistic, n_estimators = 400, max_depth = 10, min_child_weight = 1, min_split_loss = 0.1, subsample = 1, colsample_bytree = 1, learning_rate = 0.01, and random_state = 8 (for reproducibility). All parameters not listed used default settings. Performance evaluation The ROC AUC and PRC AUC for the feature values as well as the classifiers were computed to evaluate the performance of individual features and trained models. Lentivirus purification For lentivirus production, 20 μg of lenti-U6-sgRNA-EFS-GFP-WPRE or 20 μg of lenti-U6-sgRNA-EFS-Puro-WPRE, 10 μg of pMD2.G, and 15 μg of psPAX2 were co-transfected into LentiX-293 cells plated in a 150 mm dish at 80%–90% confluency using 130 μg polyethyleneimine (PEI). 6 h later, the media were replaced with fresh DMEM+10% FBS. Virus supernatant was collected 48 h post-transfection and centrifuged at 1,500g for 15 min to remove the cell debris. The virus supernatant was concentrated by Lenti-X concentrator at 4°C for 30 min (1 vol of Lenti-X concentrator with 3 vol of supernatant), followed by centrifugation at 1,500g for 45 min at 4°C. Finally, the virus was resuspended in DMEM, aliquoted, and stored at −80°C. Generation of single-gene-knockout or double-gene-knockout cells B16F10 cells stably expressing Cas9 were generated by transducing B16F10 cells with lentiviral EF1a-NLS-Cas9-2A-Blast-WPRE, followed by 5 days of selection under 20 μg/mL blasticidin. B16F10-Cas9 cells were further transfected with lentiviral EF1a-mCherry-2A-OVA-WPRE and sorted for mCherry-positive cells to generate B16f10-Cas9-mCherry-OVA cells. The two gRNAs targeting each gene within the paralog pair were transduced into the B16F10-Cas9-mCherry-Ova cells simultaneously. gRNA virus-infected cells were cultured at 37°C for more than 24 h, and then GFP-positive cells were sorted and selected under 5 μg/mL puromycin for 3 days to generate the double-knockout cells. T7E1 assay T7E1 was used to estimate gRNA cutting efficiency. In brief, gDNA was extracted by using Genomic DNA prep with QuickExtract (QE) buffer. PCR amplification of the genomic regions flanking the CRISPR RNA (crRNA) was performed using the primers listed in [172]Table S3. Using Phusion Flash High Fidelity Master Mix (Thermo Fisher Scientific), the thermocycling parameters for PCR were 98°C for 1 min; 35 cycles of 98°C for 1 s, 60°C for 5 s, and 72°C for 15 s); and 72°C for 2 min. The PCR amplicons were then used for T7E1 assays according to the manufacturer’s protocol. Naive OT-I CD8a+ T cell isolation and co-culture assay The naive OT-I CD8a+ T cells were isolated using the Naive CD8a+ T Cell Isolation Kit (Miltenyi Biotec) following the manufacturer’s instructions. Naive CD8a+ T cells were isolated from the spleen of OT-I mice and stimulated with anti-mouse CD3/CD28 antibody for 2 days. For the tumor cell and OT-I T cell co-culture assay, B16F10-Cas9-mCherry-Ova gene KO cells were co-cultured with OT1 cells at effector-to-target cell ratios (E:T) = 1:1 ratios. Tumor cell killing was tested at 24 h by flow cytometry. LIVE/DEAD near-infrared (IR) was diluted at 1:1,000 to identify the dead cells. Tumor cells were identified as mCherry-positive cells. Synergy analysis Statistical analysis for evaluating synergistic interactions between paralog pairs was calculated by the SynergyFinder R package[173]^48 using four models, including the highest single agent (HSA), Bliss, Loewe, and zero interaction potency (ZIP) models. The p values for the average synergy score were computed using bootstrapping of the dose-response matrix. We also adopted the dLFC method to measure the additive effect of gene pairs.[174]^13 The dLFC was calculated as the observed mean relative killing rate of the DKO minus the expected sum of the relative killing rates of the individual SKO. The relative killing rate was determined by subtracting the values from the NTC group. Resource availability Lead contact Requests for information and resources used in this article should be addressed to the lead contact, Sidi Chen (sidi.chen@yale.edu). Materials availability This study did not generate new unique reagents. Data and code availability The source code for processed data and modeling is available under GitHub ([175]https://github.com/cpdong/imParalog) and Zenodo ([176]https://doi.org/10.5281/zenodo.14759518)[177]^49 and included in this article and its [178]supplemental information files. An interactive web application of this study is deployed at [179]https://sidichenlab.shinyapps.io/imparalog for free visiting and querying. Any additional information required to re-analyze the data reported in this paper is available from the lead contact upon request. Acknowledgments