Abstract

   Chemotherapy agents can cause serious adverse effects by attacking both
   cancer tissues and normal tissues. Therefore, we proposed a synthetic
   lethality (SL) concept-based computational method to identify specific
   anticancer drug targets. First, a 3-step screening strategy
   (network-based, frequency-based and function-based screening) was
   proposed to identify the SL gene pairs by mining 697 cancer genes and
   the human signaling network, which had 6306 proteins and 62937
   protein-protein interactions. The network-based screening was composed
   of a stability score constructed using a network information centrality
   measure (the average shortest path length) and the distance-based
   screening between the cancer gene and the non-cancer gene. Then, the
   non-cancer genes were extracted and annotated using drug-target
   interaction and drug description information to obtain potential
   anticancer drug targets. Finally, the human SL data in SynLethDB, the
   existing drug sensitivity data and text-mining were utilized for target
   validation. We successfully identified 2555 SL gene pairs and 57
   potential anticancer drug targets. Among them, CDK1, CDK2, PLK1 and
   WEE1 were verified by all three aspects and could be preferentially
   used in specific targeted therapy in the future.

Introduction

   Synthetic lethality (SL) was first described by Calvin Bridges in
   1922^1, who noticed that some combinations of gene mutations in the
   model organism Drosophila melanogaster conferred lethality. This term
   now refers to the genetic interaction between two or more genes where
   only their co-alteration (e.g., by mutations, amplifications or
   deletions) can result in severe loss of viability or death of the cell,
   although the cell remains viable when the individual genes are
   altered^2. The term “SL” was coined in 1946 by Theodosius
   Dobzhansky, a geneticist and evolutionary biologist, who described a
   lethal genetic interaction that arose when two independently viable
   homologous chromosomes were allowed to recombine in Drosophila
   pseudoobscura^3. In 1997, Hartwell et al. first proposed to apply
   the concept of SL and used chemical and genetic screening methods to
   develop selective anticancer drugs and anticancer drug targets^[47]4.
   Since then, SL has become a valuable concept that has led to an
   innovative approach for identifying specific anticancer drug
   targets^[48]5,[49]6.

   Serious adverse drug reactions are some of the main problems with
   cancer treatment. Conventional cancer chemotherapy that does not
   exploit the genetic differences between cancer tissues and normal
   tissues tends to produce toxic effects on normal cells. To solve the
   problem, targeted therapy has emerged as a hot spot in anticancer drug
   research and development. In addition, the discovery of “SL” creates
   new hope in discovering an anticancer drug target for targeted
   therapeutics^[50]7. Cancer is caused by the inactivation or mutation of
   particular genes in normal cells. If specific mutant genes are involved
   in cancer, it is possible to specifically kill cancer cells without
   harming healthy cells by inhibiting the SL partner gene with anticancer
   drugs. Even if the distribution of the SL partner gene is not specific,
   it will not cause a serious impact on normal cells according to the
   concept of SL. A major breakthrough in the targeted therapy of
   BRCA1-mutant cancers was the finding that cells with BRCA1/2 mutations
   were exquisitely sensitive to poly (ADP-ribose) polymerase (PARP)
   inhibitors^8,9, which demonstrated the great utility of SL. In addition,
   targeted therapy achieved a milestone success via the targeting of the
   PARP-1 enzyme by Olaparib in ovarian cancer patients carrying a tumor
   BRCA1/2 mutation^[53]10,[54]11.

   To identify SL interactions that could be efficacious in treating
   cancer, many approaches have been proposed. Current screening methods
   for potential SL gene pairs can be summarized in three categories. The
   first is based on model organisms (such as yeast or fruit flies). Their
   genomes are small and can be easily mutated and matched; therefore,
   gene silencing techniques are easier to conduct in model organisms.
   However, as with all homology-based inference from model organisms,
   most genes in the SL gene pairs of model organisms do not have
   homologous genes in the human genome. Even when homologous genes can
   be found in the human genome, their functions have often changed
   greatly, so the pairs cannot be directly converted into human SL gene
   pairs^12. The second screening method is gene silencing in mammals,
   for which two types of methods have been developed. One is based on
   speculation from a priori knowledge^13. The potential SL gene pairs
   contained two
   kinds of genes, namely, mutant cancer genes and SL partner genes.
   Therefore, the SL partner genes should be directly knocked down and
   tested one by one. The other is based on high-throughput experimental
   techniques for unbiased screening of the whole genome^[57]14.
   Ultimately, siRNA and CRISPR screenings proved to be the most reliable
   methods for detecting SL gene pairs^[58]15. However, compared to model
   genetic systems, human cell systems face greater challenges for
   genome-wide siRNA or CRISPR screening. Moreover, these approaches are
   considerably more expensive, labor-intensive and time-consuming, and
   many of the essential genes so identified turn out to be either
   restricted to these cell-line models or infrequently overexpressed in
   cancers^16. For these reasons, the third category, computational
   screening, has attracted more and more attention.

   Computational approaches, which can help to identify and prioritize
   potential SL gene pairs for further experimental validation, represent
   an attractive alternative compared to genome-wide siRNA or CRISPR-based
   human cell line screening approaches. These methods include human
   orthologous gene pairs inference from yeast SL genes^[60]7,[61]17; the
   use of robustness features in the cancer PPI network to evaluate the
   importance of gene pairs^[62]18; a mutual exclusivity calculation using
   statistical models from gene mutation/transcriptional expression
   data^[63]19,[64]20; data-driven detection of SL (DAISY) that combined
   somatic copy number alteration, siRNA screening and cell survival and
   gene co-expression information and achieved a promising performance in
   data-driven SL gene pair identification^[65]21; and a learning-based
   pipeline for training and prediction, which combined the three features
   of mutation coverage, driver mutation probability and network
   information centrality into a manifolds ranking model to generate a
   ranking list of potential SL pairs^[66]16.

   However, the methods mentioned above either are not based on the human
   biological system or cannot adequately simulate the complex,
   interwoven environment of human cells. Cells employ signaling pathways and
   networks to drive biological processes in which genomic alterations
   might result in malignant signaling, which then leads to cancer
   phenotypes^[67]22. In this article, the human system was abstracted
   into a human signaling network. The specific mutant gene was defined as
   a cancer gene and its SL partner gene was defined as the non-cancer
   gene. Then, we proposed a computational method using a 3-step screening
   strategy to identify SL gene pairs from the perspective of a network
   system. Next, according to the SL gene pairs we identified, we
   extracted non-cancer genes to obtain anticancer drug targets. Finally,
   we used 3 different aspects of data to validate parts of our results.
   Overall, the SL strategy contributes to the identification of
   anticancer drug targets and drug redirection.

Results

Human cancer signaling network

   This study focused on high-frequency non-cancer genes that have a
   greater impact on biological systems. Thus, the frequencies of all
   non-cancer genes were counted according to the genes passing through
   the shortest path between all cancer gene and non-cancer gene pairs in
   the human signaling network (Fig. [68]1(a)). All of the nodes in the
   human signaling network were sorted by frequency in descending order.
   Then, the top 30% (740) of non-cancer genes were obtained to construct
   a network named the human cancer signaling network (HCSN) for further
   research. As shown in Fig. [69]1(b), HCSN includes 6153 proteins and
   56976 protein-protein interactions, and 697 cancer genes were
   successfully mapped. Thus, non-cancer genes were paired with cancer
   genes to form 515780 (740 × 697) gene pairs, which were used as input
   data for the following 3-step screening strategy for identifying SL
   gene pairs.
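
   The frequency count described above can be illustrated with a minimal
   sketch. It is not the authors' implementation: it assumes an undirected
   toy network, follows a single breadth-first shortest path per gene pair,
   and counts only the non-cancer genes lying strictly between the two
   endpoints. The gene names and the helper functions `bfs_parents` and
   `path_frequency` are hypothetical.

```python
from collections import Counter, deque


def bfs_parents(graph, source):
    """Return a BFS parent map from `source` (one shortest path per reachable node)."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):  # sorted for deterministic tie-breaking
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return parents


def path_frequency(graph, cancer_genes):
    """Count how often each non-cancer gene lies inside a shortest path
    between a cancer gene and a non-cancer gene (assumption: interior nodes only)."""
    cancer = set(cancer_genes)
    freq = Counter()
    for c in sorted(cancer):
        parents = bfs_parents(graph, c)
        for target in parents:
            if target in cancer:
                continue
            # Walk back from the non-cancer target towards the cancer gene.
            node = parents[target]
            while node is not None and node != c:
                if node not in cancer:
                    freq[node] += 1
                node = parents[node]
    return freq


# Toy undirected network; C1 and C2 play the role of cancer genes.
toy = {
    "C1": {"A"},
    "A": {"C1", "B"},
    "B": {"A", "D", "C2"},
    "D": {"B"},
    "C2": {"B"},
}
freq = path_frequency(toy, ["C1", "C2"])
ranked = [gene for gene, _ in freq.most_common()]  # descending frequency
```

   In this toy example, the hub-like gene B is ranked first because most
   shortest paths between cancer and non-cancer genes pass through it.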

Figure 1.


   The illustration of the network. (a) The human signaling network. (b)
   The human cancer signaling network (HCSN). Blue nodes denote non-cancer
   genes; yellow nodes denote cancer genes; and edges represent
   protein-protein interactions. A larger node indicates a greater degree.

SL gene pairs

   We designed a 3-step screening strategy to predict the SL gene pairs in
   the HCSN, and the results are described herein.

   First, we chose the network-based screening method to obtain the SL
   gene pairs. According to the stability score and 1000 randomized
   networks (P < 0.05), we obtained the significant SL gene pairs. Then,
   we screened the gene pairs based on the distance between non-cancer
   genes and cancer genes. The average distance between a non-cancer gene
   and a cancer gene was 2.90; therefore, we kept only the gene pairs with
   distances no more than 2. After the first screening step, 9241 gene
   pairs were obtained.
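
   The two filters of this step can be sketched as follows. This is an
   illustrative sketch under stated assumptions, not the authors' code: the
   stability scores and the scores from the randomized networks are taken
   as already computed, the toy example uses 100 random scores instead of
   1000, and the function names (`empirical_p`, `network_screen`) and gene
   names are hypothetical. The P value is the fraction of randomized
   networks whose score exceeds the observed one.

```python
from collections import deque


def shortest_distance(graph, source, target):
    """Breadth-first shortest path length, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph[node]:
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def empirical_p(s_obs, s_random):
    """Fraction of randomized-network scores greater than the observed score."""
    return sum(1 for s in s_random if s > s_obs) / len(s_random)


def network_screen(graph, scored_pairs, alpha=0.05, max_dist=2):
    """Keep pairs that are significant (p < alpha) and at most max_dist apart."""
    kept = []
    for (cancer, partner), (s_obs, s_random) in scored_pairs.items():
        if empirical_p(s_obs, s_random) >= alpha:
            continue
        dist = shortest_distance(graph, cancer, partner)
        if dist is not None and dist <= max_dist:
            kept.append((cancer, partner))
    return kept


# Toy chain C1 - A - B - D with made-up scores.
toy = {"C1": {"A"}, "A": {"C1", "B"}, "B": {"A", "D"}, "D": {"B"}}
null_scores = [0.1] * 98 + [1.0] * 2
scored = {
    ("C1", "A"): (0.05, null_scores),  # not significant
    ("C1", "B"): (0.9, null_scores),   # significant, distance 2
    ("C1", "D"): (0.9, null_scores),   # significant, but distance 3
}
kept = network_screen(toy, scored)
```

   Only the pair (C1, B) passes both filters in the toy data.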

   Second, we chose the frequency-based screening method. We plotted the
   cumulative frequency percentage of the non-cancer genes to obtain a
   reasonable frequency threshold (Fig. 2). As seen from the figure, the
   curve rose fastest over the first 50% of the cumulative frequency,
   which was reached by the 122 most frequent non-cancer genes.
   Therefore, we focused on these 122 high-frequency non-cancer genes. As
   a result, 4788 gene pairs were obtained.
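
   A cumulative-share cutoff of this kind can be sketched as below. The
   function name `top_genes_by_cumulative_share` and the toy counts are
   hypothetical; the sketch simply returns the smallest set of top-ranked
   genes whose frequencies add up to the requested share of the total.

```python
def top_genes_by_cumulative_share(freq, share=0.5):
    """Return the most frequent genes that together reach `share` of the total frequency."""
    total = sum(freq.values())
    # Sort by descending frequency; gene name breaks ties deterministically.
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    selected, running = [], 0
    for gene, count in ranked:
        if running >= share * total:
            break
        selected.append(gene)
        running += count
    return selected


# Made-up frequencies for four placeholder genes (total = 10).
toy_freq = {"a": 4, "b": 3, "c": 2, "d": 1}
selected = top_genes_by_cumulative_share(toy_freq, share=0.5)
```

   With the toy counts, genes a and b together account for 7 of 10
   occurrences and are therefore the first set to reach the 50% mark.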

Figure 2.


   The cumulative percentage of frequency. The X-axis is the number of
   non-cancer genes. The Y-axis is the cumulative percentage of
   frequency. The point (122, 0.5) indicates that the 122 most frequent
   genes account for 50% of the cumulative frequency of all genes.

   Third, the function-based screening method was performed. The 4788 gene
   pairs from the second screening contained 749 genes and these genes
   were significantly enriched in 47 pathways (Fig. [74]3). These pathways
   could be divided into seven biological process categories, namely, cell
   growth and death, cell motility, signal transduction, endocrine system,
   immune system, cell community and growth. Many biological pathways in
   our results were found to be closely related to SL. For example, the
   HIF-1 signaling pathway, which activated the transcription of genes
   involved in angiogenesis, cell survival, glucose metabolism and
   invasion, was used as a screening pathway for the discovery of SL gene
   pairs^[75]23. The PI3K-AKT signaling pathway^[76]24, the RAS signaling
   pathway^[77]25, the P53 signaling pathway^[78]26, and the mTOR
   signaling pathway^[79]27 were also widely considered promising pathways
   for SL recognition and have attracted the interest of many researchers.

Figure 3.


   The significant enrichment pathways. Different colors denoted different
   pathway categories.

   In addition, after function-based screening, we obtained 395
   significantly enriched genes that formed 2555 SL gene pairs, which
   included 81 non-cancer genes and 314 cancer genes (Fig. 4). The
   average degrees of the light blue nodes (non-cancer genes) and red
   nodes (cancer genes) were 31.54 and 8.14, respectively, that is, 2555
   pairs divided by 81 and by 314 genes. According to the concept of SL,
   we think that these 81 non-cancer genes are potential and specific
   anticancer drug targets. Designing drugs against these non-cancer genes
   in cancers with specific cancer gene mutations could improve the
   therapeutic efficiency and reduce side effects. However, many aspects,
   such as molecular weight, polarity and tissue distribution in the body,
   need to be considered before a protein can be used as a drug target.
   Therefore, we combined the existing drug target information with the
   non-cancer genes in our SL gene pairs to explore suitable anticancer
   drug targets.

Figure 4.


   SL gene pairs. Light blue nodes denote non-cancer genes; red nodes
   denote cancer genes. A larger node indicates a greater degree.

Potential anticancer drug targets

   We used the existing drug-target interaction data and 81 non-cancer
   genes in SL gene pairs to extract specific anticancer targets and
   drugs, which might be used in cancer treatment. After we annotated the
   81 non-cancer genes with the drug-target information, 57 known drug
   targets (Table [84]1) were identified, of which 27 had been used as
   anticancer drug targets in clinical treatment. Using these 27 targets,
   we expect that specific and low-risk cancer therapies can be achieved.
   In our opinion, the remaining 30 targets, which are closely related to
   the occurrence and progression of cancer (such as the immune-related
   and anti-inflammatory targets), have the potential to become anticancer
   drug targets and could be used in anticancer drug repositioning.

Table 1.

   The potential anticancer drug targets and the corresponding non-cancer
   genes.

   Non-cancer Gene   Target (UniProt accession)   Target Type
   CCL2              P13500                       Anticancer drug target
   CDK1              P06493                       Anticancer drug target
   CDK2              P24941                       Anticancer drug target
   CDK5              Q00535                       Anticancer drug target
   CSF1              P09603                       Anticancer drug target
   CSNK2A1           P68400                       Anticancer drug target
   E2F1              Q01094                       Anticancer drug target
   F2                P00734                       Anticancer drug target
   FGF2              P09038                       Anticancer drug target
   HDAC1             Q13547                       Anticancer drug target
   HGF               P14210                       Anticancer drug target
   IL6               P05231                       Anticancer drug target
   LYN               P07948                       Anticancer drug target
   MAPK3             P27361                       Anticancer drug target
   MMP9              P14780                       Anticancer drug target
   NFKB1             P19838                       Anticancer drug target
   NOS3              P29474                       Anticancer drug target
   PRKCZ             Q05513                       Anticancer drug target
   PTGS2             P35354                       Anticancer drug target
   PTK2B             Q14289                       Anticancer drug target
   TNF               P01375                       Anticancer drug target
   TNFRSF1B          P20333                       Anticancer drug target
   VEGFA             P15692                       Anticancer drug target
   XIAP              P98170                       Anticancer drug target
   YES1              P07947                       Anticancer drug target
   PLK1              P53350                       Anticancer drug target
   WEE1              P30291                       Anticancer drug target
   IL18              Q14116                       Anti-Inflammatory target
   VCAM1             P19320                       Anti-Inflammatory target
   CD4               P01730                       Immune-related target
   CSF2              P04141                       Immune-related target
   GRB2              P62993                       Immune-related target
   IL10              P22301                       Immune-related target
   ITGB1             P05556                       Immune-related target
   TH                P07101                       Immune-related target
   APAF1             O14727                       other
   ARAF              P10398                       other
   ARF6              P62330                       other
   ATF2              P15336                       other
   ATF4              P18848                       other
   BDNF              P23560                       other
   CASP3             P42574                       other
   CD40              P25942                       other
   CDC42             P60953                       other
   CXCL12            P48061                       other
   EDN1              P05305                       other
   F2R               P25116                       other
   GJA1              P17302                       other
   IGF1              P05019                       other
   INS               P01308                       other
   KAT2B             Q92831                       other
   NPR2              P20594                       other
   NPY               P01303                       other
   PIK3CG            P48736                       other
   SGK1              O00141                       other
   PPARA             Q07869                       Anti-Inflammatory target; Analgesics drug target; Immune-related target
   PRKAB1            Q9Y478                       Analgesics drug target; Anti-Inflammatory target

   In addition, the average degree of the 57 drug targets was 33.81, which
   indicates that those nodes interacted with more red nodes in the
   network (Fig. 4). Meanwhile, some light blue nodes, such as PAK1 and
   IL4, showed a large degree but were not known drug targets. The
   frequencies of PAK1 and IL4 were 269 and 60, respectively. PAK1 encodes
   a family member of the serine/threonine p21-activating kinases, also
   known as the PAK proteins. This specific family member regulates cell
   motility and morphology. In addition, PAK1 could be mapped into many
   promising SL recognized pathways such as the MAPK signaling pathway,
   focal adhesion, and the ErbB signaling pathway. The protein encoded by
   the IL4 gene is a pleiotropic cytokine produced by activated T cells.
   This cytokine is a ligand for the interleukin 4 receptor. In addition,
   it could be mapped into the T cell receptor signaling pathway and the
   Fc epsilon RI signaling pathway. Therefore, light blue nodes with a
   large degree may also prove valuable in specific anticancer therapy
   when combined with the SL gene pairs we identified.

Validation of the anticancer drug target

   To verify the results, three aspects of the data were used. The first
   was SynLethDB^[144]28, which contained SL pairs information collected
   from biochemical assays, computational predictions, text mining results
   and other related databases. We used the genes shared between
   SynLethDB and our predicted anticancer drug targets to validate the
   results. Because the SL gene pair data are limited, 20 of the 57 known
   drug targets that we found were not included in SynLethDB. Of the
   remaining 37 drug targets, 18 were validated as SL partner genes in
   this database. These targets, together with their corresponding cancer
   genes, constitute 35 SL gene pairs in our predicted results
   (Supplementary Table S1). The second was the known drug sensitivity
   data. Of the drug targets in that data set, 13 overlapped with our
   results. In different cancer cell lines, a smaller
   IC50 value indicates higher drug sensitivity and the corresponding drug
   target tends to have better effects in cancer therapy. More information
   is shown in Supplementary Table [146]S2 (only IC50 values less than 0
   are shown). Finally, we conducted text-mining to determine the
   relationship between the anticancer drug targets that we found and the
   genes related to cancer (or SL). The results showed that 52 of the 81
   non-cancer genes had been shown to be significantly associated with
   cancer (p < 0.05) and 16 of the 81 non-cancer genes had been shown to
   be significantly correlated with SL. Furthermore, 12 anticancer drug
   targets were closely associated with both SL and cancer (see
   Supplementary Table S3). In total, 27 of the 57 anticancer drug
   targets were verified by at least one of the three aspects, and 4
   targets were verified by all three aspects of the data, as shown in
   Fig. 5(a).

Figure 5.


   Illustration of our validations. (a) Anticancer drug targets validated
   by three aspects of the data. In the SynLethDB validation, drug
   sensitivity validation and text-mining validation, we validated 18, 13
   and 12 anticancer drug targets, respectively. In addition, 4 targets
   could be validated using all three aspects. (b) The Venn diagram was
   drawn based on the overlap of the predicted SL gene pairs in four
   previous reports and our results. The methods with extremely low
   concordance of the results are not shown in the figure, which was drawn
   with the online tool
   http://bioinformatics.psb.ugent.be/webtools/Venn/.

   In particular, the four overlapping non-cancer genes (CDK1, CDK2, PLK1
   and WEE1), which were validated by all three data resources, are all
   known anticancer drug targets and clinical trial targets. Furthermore,
   CDK1, CDK2, PLK1 and WEE1 were also predicted to be promising
   anticancer targets in BRCA2-ovarian cancers by Bueno’s research^29.
   Therefore, we focused on the analysis of these four genes. CDK1 and
   CDK2, which are promising specific anticancer targets, are both
   members of the serine/threonine protein kinase family and participate
   in cell cycle regulation. Firstly,
   CDK1 can be the SL partner gene of the cancer genes KRAS and MYC. As
   reported, KRAS mutations have been found in approximately 20% of human
   cancers, but there is currently no therapy targeting them^[153]30.
   Thus, targeting the SL partner gene CDK1 in ovarian cancer patients
   carrying a KRAS mutation could be a good choice in anticancer drug
   research and development. Although the cancer gene MYC is a very
   attractive therapeutic target in the treatment of breast cancer, the
   direct inhibition of the MYC gene is still a great challenge and has
   not yet provided a clinically effective drug to target it^[154]31. In
   the MYC-dependent breast cancer, another alternative is to target MYC’s
   SL partner gene CDK1, as reported in some small interfering RNA (siRNA)
   experiments^31. Secondly, CDK2 was predicted to be an SL partner gene
   of p53 and MYCN by RNA interference techniques^32,33. In
   p53 defective cells, CDK2 can separate mitogenic from anti-apoptotic
   signaling for SL^[158]33. The SL relationship between CDK2 and MYCN
   indicates CDK2 inhibitors as potential MYCN-selective cancer
   therapeutics^[159]32. Furthermore, CDK1 and CDK2 are both drug targets
   of the investigational drug Alvocidib which is a synthetic flavonoid
   based on an extract from an Indian plant for the potential treatment of
   cancer. It works by inhibiting CDK, arresting cell division and causing
   apoptosis in non-small cell lung cancer cells^34. According to the
   concept of SL, using Alvocidib to target CDK1 may selectively kill
   tumor cells carrying specific gene mutations. Next, PLK1, a drug target
   studied in acute myeloid leukemia, non-small cell lung cancer and
   pancreatic cancer^34, could be an SL partner gene of many cancer genes
   in our results. In the drug sensitivity validation, some cell lines
   were sensitive to drugs targeting PLK1, which suggests that PLK1 may
   be involved in various cancers by forming SL gene pairs with many
   cancer genes. Furthermore, a previous study identified PLK1 as a gene
   whose depletion was particularly detrimental to the viability of
   PIM1-overexpressing prostate cancer cells; these cells were
   particularly sensitive to PLK1 inhibition, which suggests that PIM1
   might be used as a marker for identifying patients who will benefit
   from PLK1 inhibitor treatment^35. Finally, WEE1 kinase could regulate
   CDK1 and CDK2
   activity to facilitate DNA replication during S-phase and prevent
   unscheduled entry into mitosis, and cancers with defects in the FA and
   HR pathways may be targeted by WEE1 inhibition, which provides a basis
   for a novel SL strategy for cancers harboring FA/HR defects^[163]36. In
   addition to the four intersection genes, many of the other non-cancer
   genes that we identified have already been predicted as the anticancer
   drug targets. For example, in the drug sensitivity experiment, IL6,
   which could be the SL partner gene of CDKN2A, RB1, STK11 and TP53, was
   a specific anticancer drug target in the prostate cancer DU-145 cell
   line when targeted by VX-702^[164]37.

Discussion

   With the development of molecular biology, biological research has
   entered the post-genome era and has made it possible to understand the
   function of the organism from an overall level. Synthetic biological
   systems (human protein interaction networks) are complex, and each
   protein element is a node in the complex network that accomplishes each
   biological process by synergizing the interactions of the nodes. Thus,
   the biological network can be abstractly seen as a human biological
   system and provides pre-screening for in vitro and in vivo follow-up
   anticancer drug targets screening. It can also save financial and
   material resources and time.

   An existing approach, which also used networks to identify SL gene
   pairs, was proven to be effective^18. However, it only took into
   account the efficiency change caused by knocking out two nodes in the
   network. Since this change may sometimes be caused by knocking out a
   single gene node rather than the pair, we improved the method by
   considering the knockout of both a single node and two nodes, which is
   more reliable in our opinion. Furthermore, we adopted a multi-step
   screening strategy from several perspectives to obtain the SL gene
   pairs, which may yield better results.
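
   The idea of comparing the double knockout with each single knockout
   can be sketched with the stability score from the Methods,
   S = (2D_{m,n} - D_m - D_n) / D_0. The code below is only an
   illustration on a toy network, not the authors' implementation (the
   paper reports using the R package igraph). It assumes an undirected
   network and averages path lengths over reachable node pairs only; the
   node names and the functions `avg_shortest_path` and `stability_score`
   are hypothetical.

```python
from collections import deque


def avg_shortest_path(graph, removed=()):
    """Average shortest path length over reachable pairs, ignoring removed nodes."""
    removed = set(removed)
    nodes = [n for n in graph if n not in removed]
    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in removed and nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1  # ordered pairs reachable from this source
    if pairs == 0:
        raise ValueError("no connected node pairs left in the network")
    return total / pairs


def stability_score(graph, m, n):
    """S = (2*D_mn - D_m - D_n) / D_0 for knocking out nodes m and n."""
    d0 = avg_shortest_path(graph)
    d_m = avg_shortest_path(graph, [m])
    d_n = avg_shortest_path(graph, [n])
    d_mn = avg_shortest_path(graph, [m, n])
    return (2 * d_mn - d_m - d_n) / d0


# Toy network: S and T are linked by two short routes (via M or N)
# and one longer detour (via X and Y).
toy = {
    "S": {"M", "N", "X"},
    "M": {"S", "T"},
    "N": {"S", "T"},
    "X": {"S", "Y"},
    "Y": {"X", "T"},
    "T": {"M", "N", "Y"},
}
score_mn = stability_score(toy, "M", "N")  # redundant pair of routes
score_mx = stability_score(toy, "M", "X")  # less redundant pair
```

   In the toy network, removing M or N alone barely changes the average
   path length, whereas removing both forces all traffic through the
   detour, so the pair (M, N) receives the higher score. Subtracting the
   single-knockout terms is what keeps a pair from scoring highly merely
   because one of its members is important on its own.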

   Although this study has many advantages, there are some shortcomings.
   The most significant one concerns the data resources we used. On the
   one hand, the input data are limited. Although we integrated the
   cancer gene data and drug-target interaction data from different
   databases, more data should be included in the future to improve the
   accuracy of our results and reduce data limitations. On the other
   hand, the validation data are limited. The numbers of genes and drugs
   in the drug sensitivity experiment are relatively small, so we could
   only validate the genes shared between the existing data and our
   study. The SynLethDB database, which we used for validation, included
   16976 SL gene pairs composed of 5157 genes. Only 7088 SL gene pairs
   (7088/16976 = 41.75%), composed of 2174 genes (2174/5157 = 42.16%),
   were found in our network data. We also compared all 5157 genes in
   SynLethDB with our 697 input cancer genes: only 369 genes overlapped
   (369/697 = 52.94%), and they constituted 8582 SL gene pairs
   (8582/16976 = 50.55%) in SynLethDB. As can be seen from the above,
   the data contained in SynLethDB were very different from our input
   data. As a result, we could only validate the part shared between
   SynLethDB and our data. We also tried to make a comparison with other
   state-of-the-art computational SL finding methods. However, various
   computational methods provided potential SL gene pairs from different
   data resources and perspectives, such as the correlation of gene
   expression with mutations, gene co-expression in related biological
   processes, robustness in the cancer network or human conserved SL gene
   interactions, which may be the reason for the low coincidence rate of
   the SL gene pairs obtained from different computational methods. We
   compared the 2555 predicted SL gene pairs (81 non-cancer genes and 314
   cancer genes) with the results of seven previous computational
   methods^7,16,18,20,21,38,39. As shown in Fig. 5(b), SL gene pairs
   shared among these methods were very rare (the details are shown in
   Supplementary Table S4). This was the case not only for our results
   but also for those of the other methods. The results from the
   different methods were complementary to each other in predicting the
   SL gene pairs^16.

   The 57 known drug targets that we found might be targets for anticancer
   drugs and could be used in drug re-positioning. Focusing on these
   targets can accelerate the development of anticancer drugs. The other
   non-cancer genes, which have not been drug targets previously, may also
   have potential in cancer therapy. Moreover, in different cancer cells,
   mutations in the same cancer gene can also lead to various functions;
   therefore, our follow-up study will focus on the different mutation
   types of the same gene, with the aim of finding more specific
   anticancer drug targets and the corresponding sensitive drugs by
   applying the SL strategy.

Materials and Methods

Data sources

   In this paper, the human signaling network, including 6306 proteins and
   62937 protein-protein interactions, was collected and curated manually
   by Zaman^[176]22 from previous studies^[177]40–[178]42. The cancer
   genes were downloaded from the F-Census^[179]43 and Cancer Gene
   Census^[180]44. We obtained 697 cancer genes after removing the
   redundant ones. Drug-target interaction data were collected from the
   DrugBank^45, Therapeutic Targets Database (TTD)^34 and PROMISCUOUS^46
   databases. In addition, we obtained 16976 human SL gene pairs from the
   SynLethDB database^28. The drug sensitivity data and the gene mutation
   backgrounds of 639 cancer cell lines were gathered from a previous
   study^37, which covered 88 cancer genes and 130 drugs under clinical
   and preclinical investigation.

SL screening

   The overall workflow of our method is shown in Fig. 6. First,
   we constructed the human cancer signaling network (HCSN). Next, a
   3-step screening strategy was used to obtain the SL gene pairs. Then we
   extracted the non-cancer genes from the SL gene pairs and analyzed them
   with the drug-target interactions to find the targets that were suited
   for anticancer drugs. Finally, we conducted the validation with prior
   data.

Figure 6.


   The workflow of anticancer drug target identification. The human
   cancer signaling network (HCSN) was constructed to obtain SL gene pairs
   using a 3-step screening strategy. The non-cancer genes were combined
   with drug-target interaction data to identify the anticancer drug
   targets, and the results were then validated.

Construction of HCSN

   To obtain the HCSN, we removed the orphan nodes, peripheral
   interactions, self-loops and redundant interactions from the human
   signaling network and mapped the cancer genes onto it. The human
   signaling network and HCSN
   could be explored using the freely available Cytoscape software
   (version 3.3.0)^[188]47. Nodes represent proteins and edges represent
   protein-protein interactions.

Obtainment of SL gene pairs

   In this study, we designed a computational approach to predict SL gene
   pairs in the HCSN. It consists of a 3-step screening strategy:
   network-based screening, frequency-based screening and function-based
   screening.

Network-based screening of gene pairs.

    1. Calculation of the stability score
       Herein, the stability score was defined as the change in the
       stability of the HCSN when a pair of nodes is knocked out, relative
       to knocking out each node alone. According to the concept of SL,
       gene pairs with higher stability scores are more likely to be SL
       gene pairs. Because a stability change may be caused by just one
       node rather than by the combined effect of the gene pair, we
       proposed a network information centrality-based approach that
       knocks out both the pair of nodes and each single node. The
       network information centrality-based stability score S was
       calculated using formula (1):

       $$ S = \frac{2D_{m,n} - D_{m} - D_{n}}{D_{0}} \qquad (1) $$

       where D_0 was the average shortest path length of HCSN; D_m and
       D_n represent the average shortest path length of HCSN after
       removing the cancer gene node m and the non-cancer gene node n,
       respectively; and D_{m,n} was the average shortest path length of
       HCSN after removing both the cancer gene node m and the non-cancer
       gene node n. D was the average shortest path length of the network
       (calculated by the closeness in R package igraph^48) and it
       was defined as follows in formula (2):

   $$D = \frac{1}{\frac{1}{2}N(N-1)} \sum_{i>j} d_{ij} \tag{2}$$
       Here, $d_{ij}$ is the shortest path length between nodes i and j,
       and N is the total number of nodes in the network.
    2. Network randomization
       To evaluate significance, we calculated a probability value p for
       each gene pair using 1000 degree-preserving randomized networks
       (constructed by R package tnet^[190]49). The p value was
       calculated as follows:

   $$p = \frac{N_{S_{obs} < S_{random}}}{1000} \tag{3}$$
       where $S_{obs}$ is the S score obtained from HCSN, $S_{random}$ is
       the S score obtained from a randomized network, and
       $N_{S_{obs} < S_{random}}$ is the number of randomized networks in
       which the S score was larger than that in HCSN.
    3. Distance-based screening of the gene pairs

   Distance-based SL screening played a vital role in the network analysis
   in our study. We considered the human signaling network to be very
   important in tumorigenesis and cancer progression. In the network,
   proteins next to each other may have similar functions and participate
   in similar biological processes. In other words, two proteins are more
   likely to be SL partners if they are closer to each other in the
   network. Therefore, we calculated the distance between every non-cancer
   gene and cancer gene and computed the average of these distances. We
   then discarded the pairs whose distance was larger than the average.
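
   The three network-based steps above can be sketched on a toy graph as
   follows. This is an illustration under stated assumptions, not the
   implementation used in the study: it replaces the R packages igraph and
   tnet with a plain breadth-first search, treats the network as
   undirected and unweighted, averages path lengths over reachable node
   pairs only (a knockout may disconnect the graph), and takes the mean
   distance over all cancer/non-cancer pairs. All function names, the toy
   graph and the placeholder random scores are hypothetical.

```python
from collections import deque


def bfs_lengths(graph, source, removed=frozenset()):
    """Hop counts from source to each reachable node, ignoring removed nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def avg_path_length(graph, removed=frozenset()):
    """Formula (2), averaged over reachable pairs only (an assumption)."""
    total = pairs = 0
    for s in graph:
        if s in removed:
            continue
        for t, d in bfs_lengths(graph, s, removed).items():
            if t != s:
                total += d
                pairs += 1
    # Each unordered pair is visited twice, which cancels in the ratio.
    return total / pairs if pairs else 0.0


def stability_score(graph, m, n):
    """Formula (1): S = (2 * D_mn - D_m - D_n) / D_0."""
    d0 = avg_path_length(graph)
    d_m = avg_path_length(graph, frozenset([m]))
    d_n = avg_path_length(graph, frozenset([n]))
    d_mn = avg_path_length(graph, frozenset([m, n]))
    return (2 * d_mn - d_m - d_n) / d0


def empirical_p(s_obs, s_random):
    """Formula (3): share of randomized networks whose S exceeds S_obs."""
    return sum(s > s_obs for s in s_random) / len(s_random)


def distance_filter(graph, cancer, non_cancer):
    """Keep (cancer, non-cancer) pairs no farther apart than the mean distance."""
    dist = {}
    for m in cancer:
        lengths = bfs_lengths(graph, m)
        for n in non_cancer:
            if n in lengths:
                dist[(m, n)] = lengths[n]
    if not dist:
        return set()
    mean = sum(dist.values()) / len(dist)
    return {pair for pair, d in dist.items() if d <= mean}


# Toy graph (hypothetical): A and E are linked through B, through C,
# and through the longer route F-G.
toy = {
    "A": {"B", "C", "F"},
    "B": {"A", "E"},
    "C": {"A", "E"},
    "E": {"B", "C", "G"},
    "F": {"A", "G"},
    "G": {"F", "E"},
}
s_bc = stability_score(toy, "B", "C")
# Placeholder scores standing in for S values from randomized networks.
p_bc = empirical_p(s_bc, [0.10, 0.30, 0.20, 0.50])
kept = distance_filter(toy, {"B"}, {"A", "C", "E", "F", "G"})
print(s_bc, p_bc, sorted(kept))
```

   In the toy graph, removing B or C alone leaves a short route between A
   and E, whereas removing both forces the longer route through F and G,
   so S for the pair (B, C) is positive (5/23). With 1000 randomized
   networks, `len(s_random)` equals the denominator of formula (3).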

Frequency-based screening of gene pairs

   The development of cancer is often complex and usually involves
   multiple genes and pathways. We defined the nodes with high degree in
   HCSN as high frequency genes and assumed that higher frequency
   non-cancer genes in HCSN are more important in biological processes.
   Therefore, we used the frequency of non-cancer genes as a filter for
   further screening. According to the cumulative frequency percentage, we
   filtered out the low frequency non-cancer genes and kept the high
   frequency ones for further analysis.
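
   One possible reading of this filter is sketched below. The text does
   not state the cumulative percentage that was used, so the `cutoff`
   parameter, the function name and the toy graph are hypothetical; the
   sketch ranks non-cancer genes by degree and keeps the highest ranked
   genes until they account for the chosen share of the summed degree.

```python
def high_frequency_genes(graph, non_cancer, cutoff=0.8):
    """Keep the highest-degree non-cancer genes up to a cumulative share.

    cutoff is a hypothetical parameter: genes are added in order of
    decreasing degree until the genes already kept account for at least
    this share of the summed degree of all non-cancer genes.
    """
    degree = {gene: len(graph[gene]) for gene in non_cancer}
    total = sum(degree.values())
    if total == 0:
        return []
    kept, running = [], 0
    # Sort by decreasing degree; ties are broken alphabetically.
    for gene in sorted(degree, key=lambda g: (-degree[g], g)):
        if running / total >= cutoff:
            break
        kept.append(gene)
        running += degree[gene]
    return kept


# Toy graph (hypothetical): H is a hub, T plays the role of a cancer gene.
toy_graph = {
    "H": {"P", "Q", "R", "T"},
    "P": {"H", "T"},
    "Q": {"H"},
    "R": {"H"},
    "T": {"H", "P"},
}
high_freq = high_frequency_genes(toy_graph, {"H", "P", "Q", "R"}, cutoff=0.7)
print(high_freq)
```

   Here the degrees are 4, 2, 1 and 1, so H and P together cover 75% of
   the summed degree and the two low-degree genes are filtered out.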

Function-based screening of gene pairs

   The occurrence and progression of cancer are closely related to cell
   survival, signal transduction, cell growth and death, etc. Because SL
   genes are closely associated with cancer, we expected them to play
   important roles in these cancer-related functions. To further
   identify SL gene pairs, we submitted the genes from the above step to
   pathway enrichment analysis with DAVID Bioinformatics Resources
   6.8^[191]50. This yielded the final SL gene pairs and some significant
   pathways that supported the identification of SL gene pairs.

Identification and validation of anticancer drug targets

The identification of anticancer drug targets

   We assumed that an anticancer drug target is a protein that can be
   targeted by at least one anticancer drug. To identify potential
   anticancer drug targets, we used drug-target interaction and drug
   description information to annotate the non-cancer genes in the SL
   gene pairs identified above.

The validation of the anticancer drug targets

   We validated the identified anticancer drug targets with three data
   sources. Firstly, the human SL gene pairs in the SynLethDB database were
   used. Secondly, the SL gene pairs were validated against the drug
   sensitivity experiment results of Garnett et al. An SL gene pair can be
   seen as a specific mutated cancer gene and a drug-targeted non-cancer
   gene. A cell line carrying the specific mutated cancer gene should
   survive poorly when drugs are added to target the SL partner of that
   gene; that is, the cell line should be highly sensitive to the drug.
   Thus, the drug sensitivity experiment of Garnett et al. was used to
   validate the anticancer drug targets we obtained. Thirdly, text-mining
   validation was applied. For gene G (the non-cancer gene in the SL
   pair), the number of studies in PubMed that mentioned gene G was K, the
   number of cancer-related (or SL-related) studies was M, and the total
   number of studies in PubMed was N. Using the hypergeometric test, we
   calculated the probability that at least x of the K articles mentioning
   gene G associated it with cancer (or SL):
   $$P = 1 - \sum_{i=0}^{x-1} \frac{\binom{M}{i}\binom{N-M}{K-i}}{\binom{N}{K}} \tag{4}$$

   The significance threshold was set to 0.05, and genes with a P-value of
   less than 0.05 were considered verified as cancer-related (or
   SL-related) genes.
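
   The hypergeometric test in formula (4) can be evaluated directly with
   exact integer arithmetic, as in the minimal sketch below. The function
   name and the small counts in the example are hypothetical and serve
   only to show the calculation; real PubMed counts would be far larger.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(x, K, M, N):
    """Formula (4): probability that at least x of K articles are cancer-related.

    N is the total number of articles, M the number of cancer-related (or
    SL-related) articles, and K the number of articles mentioning the gene.
    """
    denominator = comb(N, K)
    # Sum P(X = i) for i = 0 .. x-1; i cannot exceed K.
    lower = sum(
        Fraction(comb(M, i) * comb(N - M, K - i), denominator)
        for i in range(min(x, K + 1))
    )
    return float(1 - lower)


# Toy counts (hypothetical): 10 articles in total, 4 cancer-related,
# 3 mention the gene, and 2 of those 3 are cancer-related.
p_value = hypergeom_upper_tail(x=2, K=3, M=4, N=10)
print(p_value, p_value < 0.05)
```

   For these toy counts the lower tail is 20/120 + 60/120, so P = 1/3,
   which is above the 0.05 threshold. Exact fractions avoid rounding
   errors in the individual terms before the final conversion to a float.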

Electronic supplementary material

   [192]Dataset 1^ (12KB, xlsx)
   [193]Dataset 2^ (96.8KB, xlsx)
   [194]Dataset 3^ (11.4KB, xlsx)
   [195]Dataset 4^ (198.1KB, xlsx)

Acknowledgements