Abstract

   Cancer is the second deadliest disease listed by the World Health
   Organization (WHO). One of the major causes of cancer is tobacco
   consumption, possibly due to one of its main components,
   4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK). Many studies
   have sought to decipher the association of NNK with other diseases;
   however, it is most strongly linked with cancer development. Despite
   these studies, the molecular mechanism of NNK and its impact on
   system-level networks remain unclear. In the present study, systems
   biology tools were employed to understand the key regulatory
   mechanisms and the perturbations that NNK causes in cellular
   processes. To investigate the system-level influence of the
   carcinogen, an NNK-rewired protein–protein interaction network (PPIN)
   was generated from 544 reported proteins drawn from 1317 articles
   retrieved from PubMed. Noise was removed from the PPIN by
   modularization: Molecular COmplex DEtection (MCODE) was used to
   generate 19 modules containing 115 seed proteins. Gene ontology (GO)
   enrichment was performed on the seed proteins extracted from the
   modules to find the pathways most affected by these genes/proteins.
   The targeted biomolecules were further scrutinized by graph theory and
   molecular docking. GO enrichment analysis revealed that mostly cell
   cycle regulatory proteins were affected by NNK.

   Keywords: NNK, cancer, systems biology, protein–protein interaction
   network, topological analysis, gene ontology

1. Introduction

   Cancer is one of the major non-communicable diseases [1] and is
   accountable for millions of deaths per year worldwide. According to the
   World Health Organization (WHO), cancer is the second leading cause of
   death, with an estimated 9.6 million deaths in 2018 [2]. Cancer is a
   multistage process caused by aberrations in cellular processes. It is
   caused not only by a mutation in a single gene but also by the
   accumulation of mutations in multiple genes, a phenomenon described as
   ‘oncogene addiction’ [3]. According to the WHO, three main factors
   lead to these aberrations, with tobacco consumption heading the list;
   it alone is responsible for around 22% of cancer deaths globally [4].

   A great deal is now known about how tobacco consumption is directly
   implicated in cancer, especially lung, head and neck, stomach, liver,
   and pancreatic cancers [5,6].
   4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is one of the main
   components of tobacco and plays a major role in the causation of
   cancer [6]. NNK and its derivative,
   4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL), bind to DNA and
   form DNA adducts, which may lead to genetic mutations followed by the
   deregulation of normal cellular processes [6,7].

   NNK is not only responsible for causing cancer but also has serious
   implications in other diseases. Earlier studies have shown that NNK
   has a significant impact on, for example, steatohepatitis [8],
   Alzheimer’s disease [9], and tuberculosis [10].

   In this study, a systems biology approach was used to investigate the
   overall impact of the tobacco-derived carcinogen NNK on molecular
   systems.

   Systems biology is an interdisciplinary field that combines
   mathematics, computer science, and biology [11]. It is important
   because it provides a holistic view of the connections between
   biomolecules and an anti-reductionist approach to the involvement of
   different biomolecular components in a variety of biological systems.
   It focuses on developing an interactome of the affected targets and
   then analyzing it using mathematical models [12]. Graph theory is the
   core tool for the topological analysis of the interactome, which aims
   to identify hub biomolecular targets based on their clustering
   coefficients, degrees, and betweenness centralities, whereas
   clustering and gene ontology (GO) enrichment analysis categorize the
   biomolecules by function to give further insight into complex
   networks. This study aims to find the most probable biomolecular
   targets of the tobacco-associated carcinogen NNK, along with their
   interactions and the associated pathways that are perturbed through
   various cellular mechanisms and lead to cancer development. The most
   probable key targets of NNK are identified on the basis of their
   bottleneck scores and the thermodynamics of their interactions with
   NNK, calculated by molecular docking simulations.

2. Materials and Methods

   The full methodology is shown schematically in Figure 1.

Figure 1.

   Schematic diagram of the adopted methodology.

2.1. Construction and Visualization of Protein–Protein Interaction Network

   In total, 544 biomolecular targets were found to be affected by NNK in
   approximately 1320 studies retrieved from PubMed. A protein–protein
   interaction network was built using the STRING database, version 10.5
   [13]. The network was evidence-based and was built at the highest
   confidence score cutoff (0.9), with 50 interactors in each of the
   first and second shells.
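
   As a minimal sketch of the confidence filter described above (not the
   STRING implementation itself), the snippet below keeps only
   interactions whose combined score is at least 0.9. The edge list, its
   scores, and the function name are hypothetical and serve only as an
   illustration.

```python
# Hypothetical edge list: (protein A, protein B, combined score in [0, 1]).
# The scores are illustrative placeholders, not values taken from STRING.
EDGES = [
    ("CDK1", "CCNB1", 0.99),
    ("TP53", "CHEK1", 0.95),
    ("TP53", "FGF9", 0.42),
    ("CDK2", "CCNA1", 0.90),
]


def filter_by_confidence(edges, cutoff=0.9):
    """Keep only edges whose combined score meets the cutoff."""
    return [(a, b, s) for a, b, s in edges if s >= cutoff]


high_confidence = filter_by_confidence(EDGES)
# Nodes that survive are those still touched by at least one kept edge.
nodes = {p for a, b, _ in high_confidence for p in (a, b)}
print(len(high_confidence), "edges,", len(nodes), "nodes")
```

   Raising the cutoff trades coverage for reliability: fewer interactions
   are retained, but each is better supported.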

2.2. Protein–Protein Interaction Network (PPIN) Analysis

   Cytoscape (version 3.6.1) [14] was used to analyze the protein–protein
   interaction network (PPIN). NetworkAnalyzer, a built-in plug-in of
   Cytoscape, was used to analyze the topological properties of the
   NNK-modulated PPIN [15]. The topological properties of a network
   provide deep insight into complex biological networks [16,17].
   Topological analysis also helps in reducing noise in the data and
   offers reliable information about the network [18]. Node properties
   such as degree distribution, shortest path length, average clustering
   coefficient, betweenness centrality, and closeness centrality were
   analyzed.
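
   To make these node properties concrete, the following sketch computes
   the degree, the local and average clustering coefficients, and the
   characteristic path length for a small undirected graph. It is a
   plain-Python illustration under simple assumptions (unweighted edges,
   nodes with fewer than two neighbors given a clustering coefficient of
   0), not the NetworkAnalyzer code; the toy network and all function
   names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Number of links attached to `node`."""
    return len(adj[node])


def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected node pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy network (hypothetical): a triangle A-B-C with a pendant node D on C.
toy = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(degree(toy, "C"), local_clustering(toy, "C"))
print(average_clustering(toy), characteristic_path_length(toy))
```

   In the toy graph, node C has the highest degree but the lowest nonzero
   clustering coefficient, because its pendant neighbor D is not linked
   to the others.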

2.3. Protein Interaction Network Modular Analysis and Pathway Enrichment

   Clusters or modules are sets of closely connected nodes that form a
   dense sub-network [19]. Analyzing them yields detailed information
   about the PPIN. Molecular COmplex DEtection (MCODE), a plug-in for
   Cytoscape, was used for the cluster analysis. Clusters are scored on
   the basis of size and density (a high score means a large, dense
   cluster), while gene ontology (GO) analysis validates that a cluster
   belongs to a specific function [20]. Enrichment of the clusters thus
   helps in enriching the pathways by providing additional external genes
   that are not present in the dataset. GO analysis provides detailed
   information about the biological process underlying each cluster. For
   GO functional enrichment analysis, the ClueGO plug-in (version 2.5.1)
   of Cytoscape was used [21]. The analysis used a p value threshold of
   < 0.05. A two-sided hypergeometric test was used for the statistical
   analysis, with the Bonferroni correction method where applied.
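
   The statistics behind such an enrichment can be sketched as follows.
   This is a simplified, one-sided (over-representation) hypergeometric
   test followed by a Bonferroni correction; ClueGO's two-sided variant
   is not reproduced here, and the function names and example counts are
   assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favorable / comb(population, sample)


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: 20 genes in total, 5 annotated to a term,
# a cluster of 4 genes containing 2 annotated ones.
p_example = hypergeom_enrichment_p(20, 5, 4, 2)
adjusted = bonferroni([0.01, 0.04, 0.5])
print(p_example, adjusted)
```

   A term would be reported as enriched only if its corrected p value
   stayed below the chosen threshold (0.05 in this study).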

2.4. Molecular Docking Analysis

   Molecular docking is one of the most widely used methods for finding
   the orientation of two molecules when they form a complex. Docking
   simulation also explores the thermodynamic stability of the complexes
   by providing the binding energy and the inhibition constant (Ki), and
   helps in finding the best binding modes or orientations of a ligand
   with its target biomolecule. The docking parameters were based on
   those published in [22]. The AutoDock 4.0 MGL suite [23] was used for
   the docking simulations. The simulations were performed on a
   Hewlett-Packard (HP) machine with an AMD E1-6015 APU processor
   (1.4 GHz CPU) and 4 GB of RAM.
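
   The link between binding energy and inhibition constant can be
   illustrated with the standard thermodynamic relation
   Ki = exp(ΔG / (R·T)). The sketch below assumes a temperature of
   298.15 K and R = 1.98719 cal/(mol·K); the source does not state how
   its Ki values were derived, so this is an illustration rather than a
   description of the authors' procedure.

```python
from math import exp

R_KCAL = 1.98719e-3  # gas constant in kcal/(mol*K)
T_KELVIN = 298.15    # assumed temperature; not stated in the text


def inhibition_constant(delta_g_kcal_per_mol, temperature=T_KELVIN):
    """Ki in mol/L from a binding free energy, via Ki = exp(dG / (R*T))."""
    return exp(delta_g_kcal_per_mol / (R_KCAL * temperature))


# Binding energy reported for CDK7 in this study: -5.93 kcal/mol.
ki_cdk7 = inhibition_constant(-5.93)
print(f"Ki ~ {ki_cdk7 * 1e6:.1f} micromolar")
```

   Under these assumptions the result falls in the tens-of-micromolar
   range, close to the 45.31 μM listed for CDK7 in Table 4; small
   differences are expected because the tabulated energy is rounded.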

3. Results

3.1. Construction of the Network

   In total, 544 biomolecular targets were found through a literature
   survey to be affected by NNK. A protein–protein interaction network
   (PPIN) was built using the STRING database, version 10.5. The network,
   built at a confidence score of 0.9 with 50 interactors in each of the
   first and second shells, comprised 534 nodes and 2909 edges. The
   average node degree was 10.09 and the average local clustering
   coefficient was 0.501. The PPI enrichment p value was <1 × 10^−16.
   Figure 2 shows the NNK-rewired PPIN. Figure 3A,B depicts the number of
   biomolecular targets involved in various processes and pathways,
   respectively.

Figure 2.

   STRING generated NNK rewired protein–protein interaction network with
   534 nodes and 2909 edges.

Figure 3.

   (A) Processes enriched by NNK rewired PPIN. (B) Pathways enriched by
   NNK rewired PPIN.

3.2. Topological Properties of the Network

   The PPIN was further analyzed using Cytoscape. The topological
   properties calculated with the NetworkAnalyzer plug-in included the
   average shortest path length, neighborhood connectivity distribution,
   clustering coefficient, node degree distribution, betweenness
   centrality, and closeness centrality. The shortest path length is the
   minimum number of edges connecting two nodes. Figure 4 shows the path
   length distribution; the characteristic path length was 2.971, meaning
   that any two nodes are, on average, about three edges apart. The
   degree of a node describes its connectivity with other nodes; it is
   the total number of links that either reach or start from that node
   [24]. The node degree distribution (Figure 5) is one of the most
   important topological properties of a network. It fits a power law,
   which indicates the presence of hubs in the network [25]. Nodes that
   lie close to the power-law line and have a high degree can be
   considered hubs. The average number of neighbors was 15.940 in this
   study, and Figure 6 shows the average neighborhood connectivity
   distribution, i.e., the average connectivity of the neighbors of each
   node. This parameter helps in understanding the density of a network.
   The clustering coefficient of a network depicts the tendency of a
   graph to be divided into clusters [26]. The local clustering
   coefficient of a node is the fraction of possible edges among its
   neighbors that are actually present, whereas the average clustering
   coefficient is the mean of the local coefficients over the whole
   network [25]. In this study, the average clustering coefficient was
   found to be 0.597 (Figure 7).
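
   A power-law fit of the form y = a·x^b, as quoted in the captions of
   Figures 5 to 7, can be obtained by ordinary least squares on
   logarithmized values. The sketch below shows one common way to do
   this; it is not the NetworkAnalyzer routine, and the data are
   synthetic and noise-free, generated only to check that the fit
   recovers known parameters.

```python
from math import exp, log


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b on log-log axes.

    Returns (a, b, r_squared), with R-squared computed on the
    logarithmized values.
    """
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    syy = sum((y - my) ** 2 for y in ly)
    b = sxy / sxx
    a = exp(my - b * mx)
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return a, b, r_squared


# Synthetic data generated from y = 60 * x**-0.9 (hypothetical values).
degrees = [1, 2, 4, 8, 16, 32]
counts = [60 * d ** -0.9 for d in degrees]
a, b, r2 = fit_power_law(degrees, counts)
print(a, b, r2)
```

   With real degree distributions the points scatter around the fitted
   line, so R-squared values well below 1, such as those reported in the
   figure captions, are to be expected.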

Figure 4.

   Path length distribution; characteristic path length: 2.971.

Figure 5.

   Node degree distribution following power law fitting y = 61.323x^−0.861
   (R-squared 0.722).

Figure 6.

   Average (Avg.) neighborhood connectivity distribution following power
   law fitting y = 19.838x^0.137 (R-squared 0.253). The average number of
   neighbors is 15.940.

Figure 7.

   Average (Avg.) clustering coefficient distribution following power law
   fitting y = 1.326x^−0.277 (R-squared 0.329). The average clustering
   coefficient is 0.597.

3.3. Clustering and GO Enrichment Analysis

   For modular analysis and pathway enrichment, the MCODE plug-in of
   Cytoscape was used. Modules or clusters were created from the network
   and scored on the basis of their size and density. A high score
   indicates a denser and tighter cluster. Forming clusters also helps in
   reducing noise and in better understanding the genes involved in them.
   Figure 8 shows the clusters generated by MCODE. The red nodes
   represent the seed proteins, and the yellow nodes are the connectors.
   Seed proteins are proteins previously reported to be either
   upregulated or downregulated by the action of NNK; connector proteins
   are associated with the seed proteins in the transfer of information
   but have not been reported to have any direct relation with NNK. The
   seed proteins were identified in all the clusters and are presented in
   bold in Table 1, while the remaining (non-highlighted) proteins are
   the connectors. The top-ranked cluster had the highest score, 29.862,
   with 30 nodes and 433 edges. Cluster 1 had 19 seed proteins and 11
   connectors, and across all clusters there were 115 seeds and 88
   connectors.

Figure 8.

   Network modules: red circles show the seed proteins and yellow circles
   are the connectors.

Table 1.

   Generated clusters with MCODE scores, seeds (bold letters), and
   connector (non-bold letters) proteins.
   Cluster Score Nodes Edges Seeds Connectors Node IDs
   1 29.862 30 433 19 11 PLK1, CASC5, BUB3, CDCA8, STAG1, BUB1B, CCNB2,
   CCNB1, BUB1, ESPL1, CENPF, KIF2C, CDCA5, MAD2L1, CDK1, PDS5B, WAPAL,
   APITD1, SKA2, RAD21, SMC1A, KNTC1, CDC20, CENPA, CENPE, SMC3, STAG2,
   MAD1L1, AURKB, INCENP
   2 19.3 41 386 22 19 CDKN1A, FBXO31, CDC6, CDK2, RNF144B, KLHL42, CCNA1,
   DBF4, CDC7, MCM4, POLA2, PRIM2, CDC34, POLA1, SKP2, ORC3, MCM2, MCM5,
   ORC2, ORC6, MCM6, MCM3, ORC4, CHEK1, AURKA, PTTG1, CDC45, PRIM1, MCM10,
   CDT1, ORC1, HUWE1, ORC5, RBBP6, NEK2, MCM7, ANAPC16, ANAPC13, CDKN1B,
   RPA1, CCNA2
   3 14.652 47 337 9 38 ERCC2, E2F1, FBXO5, CDC23, ANAPC5, FANCE, FANCC,
   FANCG, BLM, ANAPC4, ANAPC1, BRCA1, RB1, FANCF, FANCL, C17orf70, CCNE1,
   C19orf40, FANCD2, FANCI, ANAPC10, FANCM, FANCB, GTF2H5, CCND2, FZR1,
   CDC26, RMI1, TOP3A, FANCA, GTF2H4, ANAPC11, CDC27, ANAPC2, GTF2H2,
   ANAPC7, GTF2H1, STRA13, CDC16, CCNH, C1orf86, ERCC3, ATM, CCND3, BARD1,
   GTF2H3, CCND1
   4 10.762 22 113 13 9 FAS, RIPK1, CASP8, IKBKG, BID, HDAC1, TNFRSF10B,
   TRADD, TP53, TFDP1, TRAF2, TNFSF10, FADD, FASLG, TNFRSF10A, CASP10,
   E2F2, TOPBP1, IKBKB, CDK4, CCNE2, CDK6
   5 7 7 21 7 0 BBC3, BCL2A1, BCL2L11, BCL2, MCL1, PMAIP1, BCL2L1
   6 5 5 10 2 3 IGFBP1, LGALS1, IGFBP4, QSOX1, IGFBP5
   7 4.333 7 13 5 2 NCKAP1, WASF2, CYFIP2, PRKCA, PTK2, PRKCB, BCAR1
   8 4 4 6 4 0 H1F0, HMGA2, ASF1A, HIRA
   9 4 4 6 1 3 KIAA1429, WTAP, CBLL1, ZC3H13
   10 3.333 4 5 3 1 EAF2, GTF2H2C, CDK7, MNAT1
   11 3.333 4 5 4 0 OAS1, IFI27, OAS2, IFIT3
   12 3.333 7 10 5 2 STAG3, CEP70, FKBP6, SMC1B, NEDD1, HSP90AA1, TPX2
   13 3 3 3 3 0 MAP3K5, TRAF1, BIRC3
   14 3 3 3 3 0 PSMC3IP, MND1, DMC1
   15 3 3 3 3 0 OIP5, MIS18A, NPM1
   16 3 3 3 3 0 TNFRSF11A, TNFRSF12A, TNFSF11
   17 3 3 3 3 0 RHOB, RHOC, CDC25C
   18 3 3 3 3 0 SDCBP, PYCARD, CPPED1
   19 3 3 3 3 0 FGF2, FGFR3, FGF9

   Nodes in “bold text” represent the seed proteins and the nodes in
   “normal” text represent the connector proteins.
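
   The cluster scores in Table 1 are consistent with a simple
   "density × size" rule, where density is the number of edges present
   divided by the number of edges possible among the cluster's nodes. The
   sketch below applies that rule to the node and edge counts of a few
   clusters from Table 1. The function names are hypothetical, and the
   formula is an observation about the tabulated values rather than a
   description of MCODE's internals.

```python
def cluster_density(nodes, edges):
    """Edges present divided by edges possible among `nodes` (no loops)."""
    possible = nodes * (nodes - 1) / 2
    return edges / possible if possible else 0.0


def cluster_score(nodes, edges):
    """Density multiplied by the number of nodes in the cluster."""
    return cluster_density(nodes, edges) * nodes


# (nodes, edges) pairs taken from Table 1 for clusters 1, 2, 3, and 5.
for n, e in [(30, 433), (41, 386), (47, 337), (7, 21)]:
    print(n, e, round(cluster_score(n, e), 3))
```

   A fully connected cluster therefore scores exactly its node count, as
   cluster 5 does (7 nodes, 21 edges, score 7).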

   The seed proteins were identified in each cluster. A new PPIN was then
   generated from them using the STRING database at a confidence score of
   0.9, with 50 interactors in the first shell and another 50 in the
   second. This PPIN was further enriched using the ClueGO plug-in.

3.4. PIN Construction and Topological and GO Analysis of Final Selected Seed Proteins

   After modularization, 115 seed proteins were obtained and used to
   create a further PPIN (Figure 9) with 100 connectors at the highest
   confidence score of 0.9. The resulting PPIN had 213 nodes and 2509
   edges, with an average node degree of 23.6 and an average clustering
   coefficient of 0.761. The PPI enrichment p value was less than
   1 × 10^−16.

Figure 9.

   Protein–protein interaction network (PPIN) of final seed proteins.

   The network was analyzed using cytoHubba, a plug-in of Cytoscape [27].
   Each node of the network was scored on various parameters, such as
   degree, closeness centrality, clustering coefficient, betweenness,
   bottleneck, and stress. The proteins were first filtered on the basis
   of the clustering coefficient and then on the basis of bottleneck.
   Proteins with a clustering coefficient of less than 0.5 were selected;
   those with a clustering coefficient of more than 0.5 were rejected
   because they are highly clustered and were taken to have no further
   space for the attachment of other molecules. Nodes with a clustering
   coefficient of less than 0.5 and a high degree were considered highly
   significant: such nodes are important in various connecting networks
   and were also taken to have binding space available on their surface
   for other molecules. Once the nodes had been filtered by clustering
   coefficient, the next most important parameter was the bottleneck.
   Proteins with a high bottleneck score are considered the most critical
   proteins in a PPIN. The median of the bottleneck scores of the
   selected proteins was 3. All proteins with a bottleneck score of 2 or
   more were finally selected. Table 2 lists the final selected proteins,
   which have clustering coefficients of less than 0.5 and are sorted by
   bottleneck score. CHEK1, with the highest bottleneck score of 29,
   ranked first, followed by TP53 with a bottleneck score of 27. These
   proteins were also analyzed with ClueGO for pathway enrichment.
   Figure 10 shows a pie chart of the enriched pathways in the form of
   groups, in which the cell cycle occupies the largest area (43.64%).
   The cell cycle is therefore the most enriched pathway, having the
   maximum number of biomolecules. The proteins involved in each group
   are listed in Table 3, and a detailed graph displaying the
   sub-pathways in each group, along with the number of genes in each, is
   given in Supplementary Figure S1. Figure 11 shows that the maximum
   number of genes are associated with the cell cycle, followed by the
   cellular macromolecule metabolic process.
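
   The selection rule described above (clustering coefficient below 0.5,
   bottleneck score of at least 2, then ranking by bottleneck) can be
   sketched as a simple filter. The rows below are a few entries copied
   from Table 2 plus one clearly hypothetical node added to show a
   rejection; the function and variable names are assumptions.

```python
# (name, bottleneck, clustering coefficient, degree).
# The first four rows come from Table 2; the last row is a made-up node
# that fails both criteria and exists only to exercise the filter.
CANDIDATES = [
    ("CDK7", 2, 0.49733, 34),
    ("TP53", 27, 0.19394, 55),
    ("CHEK1", 29, 0.36501, 52),
    ("BRCA1", 23, 0.26346, 65),
    ("HYPOTHETICAL_X", 1, 0.80, 10),
]


def select_key_proteins(rows, max_cc=0.5, min_bottleneck=2):
    """Keep rows passing both thresholds, ranked by bottleneck score."""
    kept = [r for r in rows if r[2] < max_cc and r[1] >= min_bottleneck]
    return sorted(kept, key=lambda r: r[1], reverse=True)


selected = select_key_proteins(CANDIDATES)
names = [r[0] for r in selected]
print(names)
```

   Applying the clustering-coefficient filter first and ranking second
   mirrors the order of operations described in the text.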

Table 2.

   Key proteins with their bottleneck, clustering coefficient, and degree
   scores.
     Name   Betweenness Bottleneck Closeness Clustering Coefficient Degree
    CHEK1    835.45994      29     116.61667        0.36501           52
     TP53   8007.14223      27     126.41667        0.19394           55
    BRCA1   2686.48081      23      127.45          0.26346           65
     CDK1   3705.28949      19     140.41667        0.32157           85
     CDK4    935.26901      14     112.66667        0.42521           35
   HSP90AA1 3657.32532      13     101.66667        0.27273           22
     RPA2   1523.40238      9      125.78333        0.33978           64
     ATM    1803.73545      8      115.36667        0.33718           40
    TFDP1    426.33754      6      111.78333        0.46702           34
    CDKN1B  1218.04331      4      120.41667        0.44245           50
    CASP8   1842.98595      4      90.11667         0.44853           17
    PYCARD      796         3      62.18333         0.33333           3
    CCNA1    894.78122      3        125.2          0.39548           60
    CCNB1   1208.39103      3      127.26667        0.40665           69
     RPA1   1657.28981      2       126.45          0.33269           65
     CDK2   1887.40725      2        134.2          0.34035           76
    CHEK2    126.13416      2        95.95           0.3619           15
     BID     577.03969      2      90.78333          0.4269           19
     RB1     608.84911      2      107.66667        0.44138           30
     PLK1   1284.13385      2       121.25          0.49545           56
     CDK7    488.15015      2      108.78333        0.49733           34

Figure 10.

   ClueGO results of gene ontology (GO) functional enrichment of key
   proteins.

Table 3.

   Pathway enrichment of the modulated seed proteins using ClueGO.
   Function (Groups), followed by Group Genes

   Cell cycle (Group12)
   ANAPC1|ASF1A|ATM|AURKA|AURKB|BARD1|BBC3|BCAR1|BCL2|BCL2A1|BCL2L1|
   BCL2L11|BIRC3|BRCA1|BUB1|BUB1B|BUB3|CASP8|CCNA1|CCNA2|CCNB1|CCNB2|
   CCNE2|CDC20|CDC25C|CDC34|CDC45|CDC6|CDC7|CDCA5|CDCA8|CDK1|CDK2|CDK6|
   CDKN1A|CENPA|CENPE|CENPF|CEP70|CHEK1|CYFIP2|DBF4|DMC1|E2F1|E2F2|EAF2|
   FANCC|FANCD2|FAS|FASLG|FBXO31|FBXO5|FGF2|FGF9|FGFR3|FKBP6|H1F0|HDAC1|
   HIRA|HMGA2|HSP90AA1|HUWE1|KIF2C|KNTC1|LGALS1|MCL1|MCM2|MCM4|MCM5|MCM6|
   MCM7|MDM2|MIS18A|MNAT1|MND1|NEDD1|NEK2|NPM1|OIP5|PDS5B|PLK1|PMAIP1|
   POLA1|PRKCA|PRKCB|PSMC3IP|PTK2|PYCARD|RBBP6|RHOB|RHOC|RNF144B|SDCBP|
   SKA2|STAG1|STAG3|TFDP1|TNFRSF10A|TNFRSF10B|TNFRSF10C|TNFRSF12A|TOPBP1|
   TP53|TRAF1

   Cellular senescence (Group06)
   ANAPC1|ASF1A|ATM|BBC3|BCL2|BCL2L1|BCL2L11|BIRC3|BRCA1|CASP8|CCNA1|
   CCNA2|CCNE2|CDC20|CDC25C|CDCA5|CDK1|CDK2|CDK6|CDKN1A|CHEK1|E2F1|E2F2|
   FAS|FASLG|FGF2|FGF9|FGFR3|H1F0|HDAC1|HIRA|HMGA2|HSP90AA1|MAP3K5|MCL1|
   MDM2|PMAIP1|PRKCA|PRKCB|PTK2|TFDP1|TP53|TRAF1

   DNA conformation change (Group07)
   ASF1A|ATM|BBC3|BCL2L11|BIRC3|CASP8|CCNB1|CDC45|CDCA5|CDK1|CENPA|CENPE|
   CENPF|DMC1|FANCC|FAS|FBXO5|H1F0|HDAC1|HIRA|HMGA2|HSP90AA1|KNTC1|MCM2|
   MCM4|MCM5|MCM6|MCM7|MDM2|MIS18A|MNAT1|NPM1|OAS1|OIP5|PMAIP1|POLA1|
   PRKCA|PTK2|PYCARD|RHOC|TNFSF11|TP53|TRAF1

   DNA metabolic process (Group08)
   ASF1A|ATM|AURKA|AURKB|BARD1|BBC3|BCL2|BCL2A1|BCL2L1|BCL2L11|BRCA1|BUB1|
   BUB1B|BUB3|CCNA1|CCNA2|CCNB1|CCNE2|CDC25C|CDC34|CDC45|CDC6|CDC7|CDCA5|
   CDK1|CDK2|CDK6|CDKN1A|CENPF|CHEK1|DBF4|DMC1|E2F1|EAF2|FANCC|FANCD2|FAS|
   FBXO31|FBXO5|FKBP6|H1F0|HDAC1|HMGA2|HSP90AA1|HUWE1|KNTC1|MCL1|MCM2|MCM4|
   MCM5|MCM6|MCM7|MDM2|MIS18A|MNAT1|MND1|NEK2|NPM1|PDS5B|PLK1|PMAIP1|POLA1|
   PRKCB|PSMC3IP|PYCARD|RBBP6|TFDP1|TNFRSF10A|TNFRSF10B|TNFRSF10C|TOPBP1|
   TP53

   DNA repair (Group10)
   ANAPC1|ASF1A|ATM|AURKA|AURKB|BARD1|BBC3|BCL2|BCL2L11|BRCA1|CASP8|CCNA1|
   CCNA2|CCNB1|CCNB2|CCNE2|CDC20|CDC25C|CDC34|CDC45|CDC6|CDC7|CDCA5|CDK1|
   CDK2|CDK6|CDKN1A|CENPF|CEP70|CHEK1|DBF4|DMC1|E2F1|E2F2|EAF2|FANCC|
   FANCD2|FAS|FASLG|H1F0|HDAC1|HIRA|HMGA2|HSP90AA1|HUWE1|MAP3K5|MCM2|MCM4|
   MCM5|MCM6|MCM7|MDM2|MNAT1|MND1|NEDD1|NEK2|NPM1|PDS5B|PLK1|PMAIP1|POLA1|
   PRKCA|PRKCB|RBBP6|TFDP1|TNFRSF10A|TNFRSF10B|TNFRSF10C|TOPBP1|TP53|TRAF1

   G2/M checkpoints (Group04)
   ASF1A|ATM|AURKA|AURKB|BARD1|BBC3|BCL2|BCL2A1|BCL2L1|BCL2L11|BRCA1|CCNA1|
   CCNA2|CCNB1|CCNB2|CCNE2|CDC25C|CDC34|CDC45|CDC6|CDC7|CDCA5|CDK1|CDK2|
   CDKN1A|CENPF|CHEK1|DBF4|DMC1|E2F1|FANCC|FANCD2|FBXO31|FKBP6|H1F0|HDAC1|
   HMGA2|HSP90AA1|HUWE1|MCL1|MCM2|MCM4|MCM5|MCM6|MCM7|MDM2|MIS18A|MNAT1|
   MND1|NEK2|NPM1|PDS5B|PLK1|PMAIP1|POLA1|PSMC3IP|PYCARD|RBBP6|TFDP1|
   TOPBP1|TP53

   Immune system (Group01)
   ANAPC1|BCL2|BCL2L1|BIRC3|CASP8|CDC20|CDC34|CDK1|CDKN1A|CENPE|CPPED1|
   CYFIP2|FASLG|FBXO31|FGF2|FGF9|FGFR3|HSP90AA1|HUWE1|IFI27|IFIT3|KIF2C|
   MCL1|OAS1|OAS2|PRKCB|PTK2|PYCARD|QSOX1|RBBP6|RNF144B|SDCBP|TNFRSF11A|
   TNFRSF12A|TNFSF11|TP53

   Measles (Group00)
   BBC3|CCNE2|CDK2|CDK6|FAS|FASLG|OAS1|OAS2|TNFRSF10A|TNFRSF10B|TNFRSF10C|
   TP53

   Resolution of sister chromatid cohesion (Group11)
   ANAPC1|ATM|AURKA|AURKB|BARD1|BBC3|BCL2|BCL2L11|BIRC3|BRCA1|BUB1|BUB1B|
   BUB3|CASP8|CCNA1|CCNA2|CCNB1|CCNB2|CCNE2|CDC20|CDC25C|CDC34|CDC6|CDC7|
   CDCA5|CDCA8|CDK1|CDK2|CDKN1A|CENPA|CENPE|CENPF|CEP70|CHEK1|CYFIP2|DMC1|
   E2F1|FANCD2|FAS|FASLG|FBXO31|FBXO5|FKBP6|HMGA2|HSP90AA1|HUWE1|IFI27|
   KIF2C|KNTC1|MAP3K5|MDM2|MNAT1|MND1|NEDD1|NEK2|NPM1|PDS5B|PLK1|PMAIP1|
   PRKCA|PRKCB|PSMC3IP|PTK2|PYCARD|RBBP6|RNF144B|SDCBP|SKA2|STAG1|STAG3|
   TFDP1|TNFRSF10A|TNFRSF10B|TOPBP1|TP53

   Cellular macromolecule metabolic process (Group02)
   ANAPC1|ASF1A|ATM|AURKA|AURKB|BARD1|BBC3|BCL2|BCL2L11|BIRC3|BRCA1|BUB1|
   BUB1B|BUB3|CASP8|CCNA1|CCNA2|CCNB1|CCNE2|CDC20|CDC25C|CDC34|CDC45|CDC6|
   CDC7|CDCA5|CDCA8|CDK1|CDK2|CDK6|CDKN1A|CENPE|CENPF|CHEK1|CPPED1|CYFIP2|
   DBF4|DMC1|E2F1|E2F2|EAF2|FANCC|FANCD2|FAS|FASLG|FBXO31|FBXO5|FGF2|FGF9|
   FGFR3|FKBP6|H1F0|HDAC1|HIRA|HMGA2|HSP90AA1|HUWE1|IFI27|LGALS1|MAP3K5|
   MCM2|MCM4|MCM5|MCM6|MCM7|MDM2|MIS18A|MNAT1|MND1|NEK2|NPM1|OAS1|OAS2|
   PDS5B|PLK1|PMAIP1|POLA1|PRKCA|PRKCB|PSMC3IP|PTK2|PYCARD|QSOX1|RBBP6|
   RNF144B|SDCBP|STAG1|TFDP1|TNFRSF10A|TNFRSF10B|TNFRSF10C|TNFRSF11A|
   TNFSF11|TOPBP1|TP53|TRAF1|WTAP

   Nuclear division (Group09)
   ANAPC1|ATM|AURKA|AURKB|BRCA1|BUB1|BUB1B|BUB3|CCNA1|CCNA2|CCNB1|CCNE2|
   CDC20|CDC25C|CDC6|CDCA5|CDCA8|CDK1|CDK2|CENPE|CENPF|CHEK1|DMC1|FANCD2|
   FBXO5|FKBP6|KIF2C|KNTC1|MND1|NEK2|PDS5B|PLK1|PSMC3IP|STAG1|STAG3

   Protein–DNA complex assembly (Group03)
   ASF1A|ATM|AURKA|AURKB|BBC3|BCL2|BCL2L11|BIRC3|BRCA1|CASP8|CCNB1|CDC20|
   CDC34|CDC45|CDK1|CDK2|CENPA|CENPE|CENPF|CEP70|DMC1|FANCC|FAS|FBXO5|H1F0|
   HDAC1|HIRA|HMGA2|HSP90AA1|KNTC1|MCM2|MCM7|MDM2|MIS18A|MNAT1|NEDD1|NEK2|
   NPM1|OAS1|OIP5|PLK1|PMAIP1|PRKCA|PTK2|PYCARD|RHOC|SDCBP|STAG1|STAG3|
   TNFSF11|TP53|TRAF1

   Regulation of cell cycle G2/M phase transition (Group05)
   ANAPC1|ATM|AURKA|AURKB|BARD1|BRCA1|BUB1B|BUB3|CCNA1|CCNA2|CCNB1|CCNB2|
   CDC20|CDC25C|CDC7|CDK1|CDK2|CDK6|CDKN1A|CENPF|CEP70|CHEK1|FBXO5|HMGA2|
   HSP90AA1|MDM2|MNAT1|NEDD1|NEK2|NPM1|PLK1|PRKCA|PRKCB|TOPBP1|TP53

Figure 11.

   Representation of the pathways enriched and the total number of genes
   associated with them.

3.5. Molecular Docking Simulation of NNK with Key Proteins

   AutoDock 4.0 was used to dock the final 20 target proteins with NNK.
   CDK7 showed the most favorable (most negative) binding energy,
   −5.93 kcal/mol, followed by CCNA1 (−5.6 kcal/mol). The binding
   energies of the proteins with NNK help to refine the selection of the
   best-suited target proteins of NNK: the more negative the binding
   energy, the stronger the interaction between the ligand and the
   protein. The binding energies, Ki values, and the H-bonds formed,
   along with their distances, for all 20 target proteins are listed in
   Table 4. Figure 12 shows the top three interactions of NNK with its
   target biomolecules, namely CDK7, CCNA1, and CDKN1B. Table 5 shows the
   key biomolecular targets of NNK and their roles in cell cycle
   regulation.

Table 4.

   Final 20 target proteins docked with NNK.
   Each entry gives the S.No, protein, ligand, binding energy (kcal/mol),
   and Ki, followed by the binding residues and the H-bonds with their
   distances in parentheses.

   1. CDK7; ligand NNK; binding energy −5.93; Ki 45.31 μM
      Binding residues: Leu18, Val26, Ala39, Lys41, Ile75, Phe91, Asp92,
      Phe93, Met94, Glu95, Thr96, Asp97, Leu144, Asp155
      H-bonds (distance):
      CDK7:MET94:N - :NNK:O7 (2.86224)

   2. CCNA1 (connector); ligand NNK; binding energy −5.60; Ki 79.09 μM
      Binding residues: Cys97, Gly98, Gln99, Gly100, Val164, Asp165,
      Thr166, Gly167, Thr168, Leu169, Lys170, Leu173, Tyr218
      H-bonds (distance):
      :GLY98:H - :NNK:O7 (1.83311)
      :GLY167:H - :NNK:N10 (1.98057)
      :THR168:H - :ASP165:O (2.11393)
      :LYS170:H - :NNK:N14 (2.36928)
      :LYS170:H - :NNK:O15 (1.99807)
      :NNK:N14 - :THR168:O (3.02246)

   3. CDKN1B; ligand NNK; binding energy −5.42; Ki 106.27 μM
      Binding residues: His573, Lys574, Pro575, Leu576, Glu581, Trp582,
      Gln583, Glu584
      H-bonds (distance):
      CDKN1B:GLN583:N - :NNK:N14 (2.84311)
      CDKN1B:GLN583:N - :NNK:O15 (2.79833)
      :NNK:N14 - CDKN1B:GLN583:O (2.96186)

   4. CASP8; ligand NNK; binding energy −5.35; Ki 119.75 μM
      Binding residues: Lys2224, Tyr2226, Gln2227, Asp2308, Gly2350,
      Lys2351, Pro2352, Asp2398, Arg2471, Lys2472
      H-bonds (distance):
      CASP8:LYS2351:HZ1 - :NNK:N10 (1.9527)
      CASP8:ARG2471:HH21 - :NNK:N2 (2.48624)
      CASP8:ARG2471:HH21 - :NNK:N14 (2.19123)
      CASP8:ARG2471:HH21 - :NNK:O15 (2.25672)
      CASP8:LYS2472:HZ1 - :NNK:O7 (1.82255)

   5. CHEK2 (connector); ligand NNK; binding energy −5.35; Ki 120.25 μM
      Binding residues: Ser228, Gly229, Ala230, Cys231, Gly232, Val234,
      Lys249, Leu301, Thr367, Asp368
      H-bonds (distance):
      CHEK2:CYS231:N - :NNK:N10 (3.11295)
      CHEK2:GLY232:N - :NNK:N10 (3.11412)
      CHEK2:LYS249:NZ - :NNK:N14 (3.20168)
      CHEK2:ASP368:N - :NNK:O15 (2.86381)

   6. PLK1; ligand NNK; binding energy −5.21; Ki 152.98 μM
      Binding residues: Lys413, Trp414, Val415, Asp416, Leu490, Asn533,
      Lys540
      H-bonds (distance):
      PLK1:TRP414:N - :NNK:O7 (2.97424)
      PLK1:ASP416:N - :NNK:N14 (2.94596)
      PLK1:ASP416:N - :NNK:O15 (2.71702)
      PLK1:ASN533:ND2 - :NNK:N10 (3.10323)

   7. BID (connector); ligand NNK; binding energy −5.13; Ki 174.27 μM
      Binding residues: Leu21, Phe24, Gly25, Gln28, Leu39, Asp40, Leu42,
      Gly43, Arg86, Ala89, Arg90, Phe173
      H-bonds (distance):
      :NNK:N14 - BID:GLN28:OE1 (3.29101)
      BID:PHE24:HA - :NNK:O15 (2.94776)
      BID:ARG86:HA - :NNK:N10 (2.82375)
      :NNK:C3 - BID:GLN28:OE1 (3.41587)
      :NNK:C1 - BID:LEU39:O (3.00214)
      :NNK:C11 - BID:ARG86:O (3.32997)
      :NNK:O15 - BID:PHE24 (3.70686)

   8. HSP90AA1; ligand NNK; binding energy −5.10; Ki 183.46 μM
      Binding residues: Leu48, Asn51, Ser52, Ala55, Asp93, Ile96, Gly97,
      Met98, Asn106, Phe138, Thr184, Val186
      H-bonds (distance):
      HSP90AA1:ASN51:ND2 - :NNK:N14 (3.12471)
      HSP90AA1:ASN51:ND2 - :NNK:O15 (2.99653)
      HSP90AA1:THR184:OG1 - :NNK:O7 (2.69827)

   9. BRCA1; ligand NNK; binding energy −5.08; Ki 187.68 μM
      Binding residues: Val1654, Ser1655, Gly1656, Leu1657, Thr1658,
      Pro1659, Phe1662, Thr1700, Leu1701, Lys1702
      H-bonds (distance):
      BRCA1:SER1655:OG - :NNK:N14 (3.19252)
      BRCA1:GLY1656:N - :NNK:O7 (2.75464)
      BRCA1:LEU1657:N - :NNK:O7 (2.77131)
      BRCA1:LEU1701:N - :NNK:O15 (3.00037)
      BRCA1:LYS1702:N - :NNK:O15 (2.75018)
      :NNK:N14 - BRCA1:SER1655:OG (3.19252)

   10. CDK1; ligand NNK; binding energy −5.00; Ki 217.06 μM
      Binding residues: Lys88, Leu91, Asp92, Ile94, Pro95, Pro96, Glu196,
      Lys200
      H-bonds (distance):
      CDK1:LYS200:NZ - :NNK:N14 (3.0124)
      CDK1:LYS200:NZ - :NNK:O15 (2.96559)
      :NNK:N14 - CDK1:ILE94:O (2.89466)

   11. CDK2; ligand NNK; binding energy −4.90; Ki 255.02 μM
      Binding residues: Val29, Glu81, Phe82, Leu83, His84, Ile135, Asn136,
      Thr137
      H-bonds (distance):
      CDK2:PHE82:N - :NNK:N10 (2.88593)
      CDK2:HIS84:N - :NNK:N14 (2.95506)
      CDK2:HIS84:N - :NNK:O15 (2.76842)
      CDK2:HIS84:ND1 - :NNK:N14 (3.18956)
      :NNK:O15 - CDK2:ILE135:O (2.91961)

   12. CCNB1; ligand NNK; binding energy −4.86; Ki 274.97 μM
      Binding residues: Ile253, Lys256, Tyr257, Glu285, Leu289, Phe294,
      Gly295, Leu296, Gly297
      H-bonds (distance):
      CCNB1:TYR257:N - :NNK:N10 (3.14377)
      CCNB1:LEU296:N - :NNK:N14 (2.86319)
      CCNB1:LEU296:N - :NNK:O15 (2.81795)
      CCNB1:GLY297:N - :NNK:O15 (2.91169)
      :NNK:N14 - CCNB1:PHE294:O (3.10697)

   13. CHEK1; ligand NNK; binding energy −4.80; Ki 303.50 μM
      Binding residues: Val23, Val37, Ile39, Glu55, Asn59, Leu82, Phe149
      H-bonds (distance):
      CHEK1:ILE39:N - :NNK:N10 (2.8265)
      CHEK1:ASN59:ND2 - :NNK:N14 (2.71952)
      CHEK1:ASN59:ND2 - :NNK:O15 (3.02158)
      CHEK1:PHE149:N - :NNK:O15 (3.15867)
      :NNK:N14 - CHEK1:GLU55:OE2 (3.05383)
      :NNK:O15 - CHEK1:GLU55:OE2 (2.67183)
      :NNK:O15 - CHEK1:PHE149:N (3.15867)

   14. RPA2 (connector); ligand NNK; binding energy −4.70; Ki 358.90 μM
      Binding residues: Cys49, Thr50, Ile76, Val77, Asp96, Met97, Tyr125,
      Phe155, His158, Ile159
      H-bonds (distance):
      RPA2:VAL77:N - :NNK:O15 (3.09031)
      :NNK:O15 - RPA2:CYS49:O (3.06724)
      :NNK:O15 - RPA2:VAL77:O (3.18087)
      :NNK:O15 - RPA2:HIS158:NE2 (2.98064)

   15. ATM; ligand NNK; binding energy −4.48; Ki 523.49 μM
      Binding residues: Thr2059, Ala2062, Gly2063, Ile2065, Gln2066,
      Gln2069, Leu2077, Tyr2080, Leu2081, Leu2084, Glu2094, Leu2095,
      Leu2098
      H-bonds (distance):
      ATM:GLN2066:N - :NNK:O7 (3.06063)

   16. CDK4; ligand NNK; binding energy −4.35; Ki 650.98 μM
      Binding residues: Val44, Leu54, Pro55, Thr58, Val59, Val62, Ala63,
      Arg66, Val82, Ile92, Val94
      H-bonds (distance):
      CDK4:PRO55:CD - :NNK:O15 (2.9883)
      CDK4:VAL59:CA - :NNK:O7 (2.92285)

   17. TFDP1; ligand NNK; binding energy −4.20; Ki 833.96 μM
      Binding residues: Val264, Phe285, Asn286, Phe287, Phe291
      H-bonds (distance):
      TFDP1:PHE287:N - :NNK:O15 (2.78538)
      :NNK:O15 - TFDP1:PHE287:O (3.08744)

   18. TP53; ligand NNK; binding energy −4.14; Ki 927.72 μM
      Binding residues: Gln136, Leu137, Ala138, His179, Cys182, Asp184,
      Asn239, Cys275, Ala276
      H-bonds (distance):
      TP53:LEU137:N - :NNK:O15 (2.93164)
      TP53:ASN239:ND2 - :NNK:N14 (3.07617)
      TP53:ASN239:ND2 - :NNK:O15 (2.87469)
      :NNK:O15 - TP53:CYS275:O (2.82627)

   19. RB1; ligand NNK; binding energy −4.02; Ki 1.14 mM
      Binding residues: Val434, Gly435, Gln436, Cys438, Asn505, Leu506,
      Asp507, Ser508, Gly509, Thr510
      H-bonds (distance):
      RB1:GLN436:HN - :NNK:O15 (1.93726)
      RB1:SER508:HN - :NNK:O7 (2.36625)
      RB1:GLY509:HN - :NNK:O7 (1.90514)

   20. RPA1 (connector); ligand NNK; binding energy −3.67; Ki 2.03 mM
      Binding residues: Val375, Asn402, Pro403, Ala408, Tyr409, Arg412,
      Gly413
      H-bonds (distance):
      RPA1:ARG412:NH1 - :NNK:N14 (3.06161)
      :NNK:O15 - RPA1:ALA408:O (2.78707)

Figure 12.

   Binding interactions of the top three proteins with NNK: (a) CDK7
   (−5.93 kcal/mol), (b) CCNA1 (−5.6 kcal/mol), (c) CDKN1B (−5.42
   kcal/mol).

Table 5.

   Key proteins and their roles in cell cycle regulation.
   S.No Protein Role in Cell Cycle References