Abstract Besides its anti-inflammatory, analgesic and anti-pyretic properties, aspirin is used for the prevention of cardiovascular disease and various types of cancer. The multiple activities of aspirin likely involve several molecular targets and pathways rather than a single target. Therefore, systematic identification of these targets of aspirin can help us understand the underlying mechanisms of the activities. In this study, we identified 23 putative targets of aspirin in the human proteome by using binding pocket similarity detecting tool combination with molecular docking, free energy calculation and pathway analysis. These targets have diverse folds and are derived from different protein family. However, they have similar aspirin-binding pockets. The binding free energy with aspirin for newly identified targets is comparable to that for the primary targets. Pathway analysis revealed that the targets were enriched in several pathways such as vascular endothelial growth factor (VEGF) signaling, Fc epsilon RI signaling and arachidonic acid metabolism, which are strongly involved in inflammation, cardiovascular disease and cancer. Therefore, the predicted target profile of aspirin suggests a new explanation for the disease prevention ability of aspirin. Our findings provide a new insight of aspirin and its efficacy of disease prevention in a systematic and global view. Keywords: Target, Cancer, Molecular docking, Aspirin, Cardiovascular disease, Binding site Introduction Aspirin, also known as acetylsalicylic acid, is a nonsteroidal anti-inflammatory drug (NSAID). The primary molecular mechanism of aspirin is the selective acetylaton of Ser-530 of cyclooxygenase-1 (COX-1) ([30]Alfonso et al., 2014; [31]Dovizio et al., 2013; [32]Ghooi, Thatte & Joshi, 1995; [33]Vane, 1971; [34]Vane & Botting, 2003), thereby inhibiting prostaglandin synthesis. This was the basis for its anti-inflammatory, antipyretic, and analgesic effects ([35]Vane & Botting, 2003). In addition, recent studies revealed that phospholipases A2 (PLA2) is functionally coupled with cyclooxygenase-1 and 2 pathway and part of its anti-inflammatory effects of aspirin may be due to its binding with PLA2 ([36]Balsinde et al., 1999; [37]Singh et al., 2005; [38]Touqui & Alaoui-El-Azher, 2001). Besides its the anti-inflammatory, analgesic, and anti-pyretic properties, aspirin is used for the prevention of cardiovascular disease and various types of cancer ([39]Alfonso et al., 2014; [40]Dovizio et al., 2013). Aspirin prevents myocardial infarction, and ischemic stroke when used in the primary prevention of cardiovascular disease ([41]Berger, Brown & Becker, 2008; [42]Nemerovski et al., 2012; [43]Raju et al., 2011; [44]Schnell, Erbach & Hummel, 2012; [45]Younis et al., 2010). Furthermore, aspirin is also highly effective in preventing several common cancers ([46]Avivi et al., 2012; [47]Burn et al., 2011; [48]Hassan et al., 2012; [49]Rothwell et al., 2010; [50]Rothwell et al., 2012; [51]Thun, Jacobs & Patrono, 2012). Taking aspirin daily reduced risk of distant metastasis by 30–40% and reduced the risk of metastatic adenocarcinoma by almost a half ([52]Rothwell et al., 2012). Although it has been convincingly shown that aspirin can prevent cardiovascular disease and several cancer types, the molecular mechanisms underlying these effects of aspirin are not entirely clear. The multiple activities of aspirin cannot be attributed wholly to a single target and most likely involve several molecular targets and pathways ([53]Deng & Fang, 2012; [54]Din et al., 2012; [55]Sclabas et al., 2005; [56]Singh et al., 2005). Therefore, systematic identification of molecular targets of aspirin can help in understanding the mechanisms underlying the activities and adverse reactions of aspirin. Unfortunately, studies on the proteome-wide target profile of aspirin are very limited. In this study, we predicted the targets of aspirin whole proteome-wide by combining structural bioinformatics and systems biology approaches. Starting with the binding sites of aspirin (BSiteAs), the potential targets of aspirin were discovered by using contact matrix based local structural alignment algorithm (CMASA) which was developed in our lab ([57]Li & Huang, 2010). Then, molecular docking and free energy calculation were applied to filter the improper targets to which the aspirin can not bind. We also analyzed the diversity of the putative targets and binding modes of aspirin. Finally, we performed the pathway analysis for the putative targets. We found several new targets for aspirin which are enriched in the pathways that are strongly involved in inflammatory, cardiovascular disease and cancer, such as vascular endothelial growth factor (VEGF), mitogen-activated protein kinase(MAPK), Fc epsilon RI signaling and arachidonic acid metabolism signaling pathways. Methods Overview of pipeline for proteome-wide prediction of aspirin targets The pipeline for proteome-wide prediction of aspirin targets is outlined in [58]Fig. 1. Firstly, we constructed the structural database of human proteins (17,425 non-redundant structures), and the binding sites of aspirin (BSiteAs) were used to search against this database using the program CMASA. Secondly, the binding abilities of aspirin to these putative off-targets were estimated using molecular docking. If aspirin docked unsuccessfully to the predicted binding pocket of a particular protein, this protein was removed from the target list of aspirin. Thirdly, the remaining putative targets were subject to molecular mechanics Poisson–Boltzmann surface area (MM-PBSA) free energy calculation. Finally, we performed the pathway enrichment analysis for these putative targets. Figure 1. The pipeline of the structural proteome-wide prediction of aspirin targets. [59]Figure 1 [60]Open in a new tab Starting with the binding sites of aspirin (BSiteAs), the pipeline integrates local structure detecting, molecular docking, free energy calculation, and pathway analysis. The structure database of human proteins The query database was built by integrating three available sources of protein structure. The first source is the Protein Data Bank managed by Research Collaboratory for Structural Bioinformatics (RCSB PDB) ([61]http://www.rcsb.org) ([62]Rose et al., 2011), which contains 6,983 structures of human proteins. The other two repositories are structure models built by homology modeling techniques from Structure Atlas of Human Genome ([63]Motono et al., 2011) ([64]http://bird.cbrc.jp/sahg) and GPCR Research Database (GPCR-RD) ([65]Zhang & Zhang, 2010), which have 16,529 and 1,028 structures, respectively. This procedure yielded a total of 24,540 human protein structures. The three available sources of protein structure were integrated and removed the redundancy using CD-HIT Suite ([66]Huang et al., 2010). The identity cut-off was set to 0.95. There are totally 17,425 structures in our self-built ‘non-redundant’ structural database of human proteins. The structural database represents a relatively complete and accurate library of human proteins. To detect and compare the aspirin binding sites in the structure database We used all representative aspirin-protein complexes in our study. There are a total of six aspirin-protein complex structures currently available (1OXR, 1TGM, 2QQT, 3GCL, 4NSB and 3IAZ). Both 1OXR and 1TGM are structures of phospholipase A2, we only used 1OXR in this study ([67]Fig. 2A). 2QQT and 3GCL are bovine lactoperoxidase, and we used 3GCL in this study. 4NSB and 3IAZ are structures of buffalo chitinase-3-like protein 1 and bovine lactotransferrin, respectively. The two structures were also used in our study. In addition, 1PTH, the complex structure of salicylic acid and COX-1, was also used in this study ([68]Fig. 2A). Therefore, a total of five complex structures (1OXR, 3GCL, 4NSB, 3IAZ and 1PTH) were used in this study. The binding sites of aspirin (BSiteAs) were defined as amino acid residues lying within 5 Å from atoms of the aspirin. The BSiteAs were used to search against 17,425 non-redundant structures using the program CMASA ([69]http://bsb.kiz.ac.cn/CMASA_flex) which was developed in our lab. The accuracy and sensitivity of this tool were 0.96 and 0.86, respectively. The workflow of CMASA was as following. First, the CMASA parsed the query binding site and the database to be searched. Second, for each structure in database, based on the query binding site, the CMASA enumerated all possible combination of residues in the structure using amino acid substitute. The blocks substitution matrix 62 (BLOSUM62, cutoff = 1) was used. These residue combinations are similar to that in the query binding site. Each residue combination forms a possible local structure. Third, the CMASA used contact matrix average deviation (CMAD) between the query binding site and these local structures to filter the candidate matches. Then, the CMASA calculated the RMSD and the RMSD based P-value if the CMAD < cutoff. At last, the CMASA ranked all of the hits and add their information. Hits are considered significant if the CMASA E-value <0.01. Figure 2. Structural diversity of the putative targets. [70]Figure 2 [71]Open in a new tab (A) The structural similarity network of the putative targets and the structures of 1OXR and 1PTH are also shown. We analyzed the structural similarity of the 23 putative targets by structural alignment. The RMSD between two linked proteins in the network is smaller than 4 Å. The primary targets of aspirin are colored with green, and the newly identified targets are colored with red. (B) Structural alignment of the putative binding sites of aspirin (BSiteAs) from two proteins PLA2G1B (green) and CDK13 (red). Molecular docking The hit proteins have similar local structures with BSiteAs and potential to bind to aspirin. However, it does not mean that aspirin can certainly bind to these proteins. To assess whether aspirin can bind to these proteins, flexible ligand-rigid protein docking was performed using CHARMm-based DOCKER (CDOCKER) ([72]Wu et al., 2003) in the Discovery Studio v3.1. The following steps are included in the CDOCKER protocol. (1) A set of ligand conformations are generated using high-temperature molecular dynamics with different random seeds. (2) Random orientations of the conformations are produced by translating the center of the ligand to a specified location within the receptor active site, and performing a series of random rotations. A softened energy is calculated and the orientation is kept if the energy is less than a specified threshold. This process continues until either the desired number of low-energy orientations is found, or the maximum number of bad orientations have been tried. (3) Each orientation is subjected to simulated annealing molecular dynamics. The temperature is heated up to a high temperature then cooled to the target temperature. (4) A final minimization of the ligand in the rigid receptor using non-softened potential is performed. (5) For each final pose, the CHARMm energy (interaction energy plus ligand strain) and the interaction energy alone are calculated. The poses are sorted by CHARMm energy and the top scoring (most negative, thus favorable to binding) poses are retained. In this study, we generated 10 random conformations for each ligand. The parameters of the dynamic steps, the dynamic target temperature and include electrostatics are set to 1,000 steps, 1,000 K and True, respectively, which is the default setting in CDOCKER. The binding sphere of CDOCKER is defined around the local structure detected by CMASA. The center of binding sphere is set as the center of the local structure. And the radius of binding sphere is set as 10 Å, which allows the free rotation of aspirin. If aspirin docked successfully to a particular protein, the binding poses of aspirin showing the lowest energy were retained and used for MM-PBSA ([73]Kollman et al., 2000; [74]Kuhn et al., 2005) free energy calculation. The 3-D structures of docked complexes were visualized using PyMol v1.5. MM-PBSA free energy calculation and entropy change estimation The complex structures of aspirin and putative targets were further used to obtain more accurate estimate of binding free energy. Binding free energies (ΔG[bind]) of aspirin at the binding site on these target proteins were calculated by using the MM-PBSA free energy calculation protocol in Pipeline pilot v8.5 ([75]http://accelrys.com/products/collaborative-science/biovia-pipeline -pilot/) as follows: [MATH: ΔGbindingPB= G complexG receptor< mi>Gligand :MATH] (1) where G[complex], G[receptor], G[ligand] are the free energies of the complex, receptor and ligand respectively. The free energy of each molecule on the right hand side can be considered as the sum of molecular mechanics energy in gas phase(E[MM]) and solvation free energy(Δsol). As the entropy change (TΔS) is the most time-intensive part, it was not included in the above calculation. We estimate the entropy change in another way. In the study of [76]Chang & Gilson (2004), they computed the entropy changes for different receptors upon same ligand binding. The results showed the entropy changes for different receptors were similar. Therefore, we can estimate an approximate entropy change. The process of entropy change estimation is as following: (1) Based on the formula:—RTlnK = ΔG = ΔH − TΔS (where R is the gas constant, T is the absolute temperature), the entropy changes (TΔS) is equal to the sum of ΔH and RTlnK; (2) The experiment study have shown that aspirin binds to PLA2 enzyme (PDBID: 1OXR) specifically with a binding constant (K) of 1.56 × 10^5M^−1 ([77]Singh et al., 2005). For this complex, the enthalpy change (ΔH) was calculated as −2.327 kcal/mol using the MM-PBSA and RTln K was calculated as 7.055 kcal/mol. So the entropy change (TΔS) is 4.728 kcal/mol (equal to −2.327 kcal/mol + 7.055 kcal/mol) for this complex. 3) Based on above assumption and computation, we set the entropy change in the process of aspirin binding to various putative targets as approximate value 4.728 kcal/mol. Pathway enrichment analysis and interaction network construction To analyze the significance of KEGG pathways associated with our predicted targets, we collected UniProtKB accession number (AC) of these targets and performed KEGG pathway annotation using the DAVID tool ([78]http://david.abcc.ncifcrf.gov/). The significantly over-represented KEGG pathways were identified based on the Bonferroni-adjusted P value (P < 0.01) ([79]Huang, Sherman & Lempicki, 2009). In addition, based on these pathways, an integrated targets-cellular effect interaction network was constructed. Results Identification of putative targets of aspirin in human proteome We presented a proteome-wide prediction of aspirin targets using structural bioinformatics and system biology approaches. We used comparison of BSiteAs to recognize putative targets and further refined by docking and MM-PBSA in structural bioinformatics part whereas pathway enrichment analysis and interaction network construction were performed in system biology section. The steps in our pipeline for proteome-wide prediction of aspirin-binding proteins are shown in [80]Fig. 1. Firstly, the binding sites of aspirin (BSiteAs) were used as queries to search against 17,425 non-redundant structures of human proteins in our self-build structure database using the program CMASA. Totally, 79 proteins with putative BSiteAs were identified ([81]Table S1). Of these proteins, the top 10 ranked proteins are members of the phospholipase A2. cyclooxygenase, lactoperoxidase and Chitotriosidase families, which are the primary targets of aspirin. The remaining 69 proteins have different structural folds from the primary targets. The hit proteins have similar local structures with BSiteAs and potential to bind to aspirin. However, it does not mean that aspirin can certainly bind to these proteins. In the second step, molecular docking was used to assess whether aspirin can bind to these proteins. CDOCKER in the Discovery Studio v3.1 was used to dock aspirin to the predicted binding site on these proteins. Proteins that failed to dock aspirin were removed from the target list. Only 26 proteins were considered for further analysis after filtering by molecular docking, 10 proteins of which are the primary targets of aspirin ([82]Table S1). Finally, MM-PBSA free energy calculation was performed for the lowest-energy protein-aspirin complex obtained in the docking step. In total, 23 proteins bind to aspirin with binding free energies (ΔG[bind]) < 0 listed in [83]Table 1 and selected as the putative targets of aspirin. Next, we analyzed the binding modes and affinities of aspirin to these targets, the structural similarity and pathway enrichment of these targets and clinical outcomes of aspirin. Table 1. The putative targets with binding free energies calculated by MM-PBSA. Gene name UniProtKB entry Family [MATH: ΔGbindingPB :MATH] (kcal/mol) EXOSC3 [84]Q9NQT5 RRP40 family −33.0 MAPK12 [85]P53778 Kinase family −28.6 ITGAL [86]O43746 Integrin alpha chain family −28.0 PTGS2 [87]P35354 Prostaglandin G/H synthase family −20.2 PTGS1 [88]P23219 Prostaglandin G/H synthase family −27.6 PLA2G10 [89]O15496 Phospholipase A2 family −25.7 FBP1 [90]P09467 FBPase class 1 family −25.3 CUL4B [91]Q13620 Cullin family −23.0 MMP12 [92]P39900 Peptidase M10A family −18.6 CDK13 [93]Q14004 kinase family −18.4 TNFAIP6 [94]P98066 Hyaluronan-binding protein family −16.8 PLA2G3 [95]Q9NZ20 Phospholipase A2 family −14.7 HLA-A [96]O19619 MHC class I family −12.1 MOCS3 [97]O95396 HesA/MoeB/ThiF family −12.0 AIDA [98]Q96BJ3 AIDA family −11.1 RAC1 [99]P63000 Rho family −11.0 PLA2G5 [100]P39877 Phospholipase A2 family −10.8 PLA2G1B [101]P04054 Phospholipase A2 family −10.6 PLA2G2A [102]P14555 Phospholipase A2 family −10.6 TNFSF14 [103]O43557 Tumor necrosis factor family −10.3 CHIT1 [104]Q13231 Chitotriosidase family −10.0 EGFLAM [105]Q63HQ2 Pikachurin family −9.2 PLA2G2D [106]Q9UNK4 Phospholipase A2 family −6.0 [107]Open in a new tab Structural diversity of the putative targets In order to analyze the structural similarity of the 23 putative targets, each pair of these targets were structurally aligned using the program Combinatorial Extension (CE) ([108]Shindyalov & Bourne, 2001). The targets were considered to be similar if the root-mean-square deviation (RMSD) between two structures is less than 4 Å. A network of targets was generated by linking structurally similar targets ([109]Fig. 2A). The primary targets of aspirin are colored with green, and the newly identified targets are colored with red. Only two proteins HLA class I histocompatibility antigen, A-2 alpha chain (HLA-A) and axin interactor, dorsalization-associated protein (AIDA) clustered with the phospholipase A2 family. Proteins cyclooxygenase-1 and 2 are linked together but not similar with the other proteins. Therefore, the overall structures of the newly identified targets and the primary targets were not similar. It indicates the structural diversity of the 23 putative targets. However, the 23 targets have similar local structures. For example, proteins group IB phospholipase A2 (PLA2G1B) and cyclin-dependent kinase 13 (CDK13) which belong to different protein families have very similar binding sites of aspirin (BSiteAs) ([110]Fig. 2B). Diverse binding modes of aspirin to the putative targets In our study, aspirin was docked to the predicted binding sites on putative targets. Results of these docking experiments reveal diverse binding modes of aspirin to these targets ([111]Fig. 3). Some examples are given follows. As a first example, aspirin binds to protein CDK13 with a novel mode compared to ATP-analog inhibitor of the kinase ([112]Fig. 3A). ATP-analog inhibitors exhibit inhibitory activity of kinase by competitive binding to its ATP binding site. In contrast, aspirin binds to CDK13 in the vicinity of the ATP binding site and interact with loops (L1 and L2) which are important for ATP binding. Another example is that aspirin binds to protein ras-related C3 botulinum toxin substrate 1 (RAC1), which is very different from GTP binding ([113]Fig. 3B). Aspirin binds to the other side of RAC1 and interact with the N-terminal part of switch II (sequence ^56 WDTAG), which is crucial for interaction between RAC1 and protein Arfaptin ([114]Tarricone et al., 2001). The Arfaptin mediates cross-talk between Rac and Arf signaling pathways. The last example is that aspirin binds to protein integrin alpha-L (ITGAL) in the binding site of its inhibitor BQM ([115]Guckian et al., 2008) ([116]Fig. 3C). The analysis of the binding mode of aspirin to the other 19 putative targets is shown in the [117]Figs. S1–[118]S23. The statistic of 23 putative binding pockets of aspirin is shown in the [119]Fig. S24. The coordinates of 23 putative binding pockets are shown in [120]Table S2. The size of the 23 binding pockets is ranging from 10 to 26 residues, if we define the residues lying within 5 Å from aspirin as binding pockets. The pocket size of aspirin binding to phospholipase A2 (PDBID: 1OXR) and cyclooxygenase-1 (PDBID: 1PTH) is fall in this range. The most used amino acids in pockets are leucine (L), cysteine (C), valine (V), glycine (G), tyrosine (Y), isoleucine (I), alanine (A) and phenylalanine (F). The eight amino acids appeared in the pockets more than 20 times. All the eight amino acids are with hydrophobic side chain except cysteine (C). It suggests that hydrophobic interaction is important for aspirin binding. Additionally, the H-bond is also important for aspirin binding because 78% (18/23) binding pockets formed H-bonds with aspirin. The most used amino acids involved in H-bond formation are aspartic acid (D), lysine (K) and histidine (H). Therefore, the combinations of these charged amino acids (D, K and H) and hydrophobic amino acids (L, V, G, Y, I, A and F) may form the pocket that aspirin prefers to bind to. Figure 3. Diverse binding modes of aspirin to the putative targets. [121]Figure 3 [122]Open in a new tab The docking experiments reveal diverse binding modes of aspirin to these targets (A) Aspirin binding to protein CDK13 (3LQ5.pdb). (B) Aspirin binding to protein RAC1 (1RYH.pdb). © Aspirin binding to protein ITGAL (3BQM.pdb). The overview and close-up view of the binding mode of aspirin to their putative targets are shown in A-C and D-F, respectively. The close-up view (D-F) show all amino acids in the vicinity of aspirin. Aspirin and the known ligands of the three proteins are colored with green and red, respectively (A-C). The residues involved in binding to both aspirin and the ligands are shown as sticks and colored with blue (A-C). SLQ (A), GNP (B) and BQM © are ATP-analog inhibitor of the kinase CDK13, the substrate of the protein RAC1 and the inhibitor of protein ITGAL, respectively. The binding affinities of aspirin to putative targets To obtain accurate estimate of the binding energy of different putative targets with aspirin, MM-PBSA free energy calculation protocol was used. In the combination of the experimental study ([123]Singh et al., 2005) and MM-PBSA free energy calculation, we obtained the estimated entropy change (TΔS = 4.728 kcal/mol) upon aspirin binding. The entropy changes do not have large fluctuations when the same ligand binds to a different acceptor based on the study of [124]Chang & Gilson (2004). Therefore, the entropy changes when aspirin binds to various putative targets was assumed as 4.728 kcal/mol to compare free energies associated with different aspirin binding putative targets. The binding free energies including entropy change for the 23 proteins binding to aspirin were calculated and listed in [125]Table 1. The binding free energies of the 23 proteins with aspirin are varied from −6.0 (group IID secretory phospholipase A2, PLA2G2D) to −33.0 (exosome component 3, EXOSC3) kcal/mol. Overall, the binding free energies for newly identified targets (the average −18.4 kcal/mol) are comparable to that for the primary targets (the average −15.3 kcal/mol). Pathway enrichment and interaction network of putative targets Using the DAVID tool, we find that our predicted targets are significantly overrepresented for several pathways (p < 0.01) ([126]Table 2). Some of these pathways are strongly involved in inflammation, cardiovascular disease and cancer, such as VEGF signaling, Fc epsilon RI signaling, arachidonic acid metabolism, gonadotropin-releasing hormone (GnRH) signaling and MAPK signaling. To illustrate the relationship between the putative targets and their cellular effect, an integrated interaction network of targets-cellular effect based on their associated pathways was constructed ([127]Fig. 4). The interactions between predicted targets and the major effects involved in cancer development, inflammation and cardiovascular disease were present in this network. Represented by green circles in the network, the predicted targets regulate VEGF, epsilon RI signaling, arachidonic acid metabolism, and MAPK pathways through interactions with other proteins (gray circles) connecting the pathways. Inhibition of predicted targets is expected to down-regulate these pathways, and then prevent inflammation and decrease the risk of cardiovascular disease and cancer. Table 2. The pathways significantly overrepresented by our predicted targets (p < 0.01). Pathway Count Percentage Adjust P-value[128]^a Reference[129]^b VEGF signaling pathway 9 40.9 4.60E−10 [130]Cross et al. (2003) Arachidonic acid metabolism 8 36.4 2.00E−09 [131]Spector et al. (2004) Fc epsilon RI signaling pathway 8 36.4 1.50E−08 [132]Kawakami & Galli (2002) Alpha-Linolenic acid metabolism 6 27.3 1.10E−08 [133]Hamberg et al. (2003) Linoleic acid metabolism 6 27.3 1.00E−07 [134]Shureiqi et al. (2003) Ether lipid metabolism 6 27.3 2.80E−07 [135]Nagan & Zoeller (2001) GnRH signaling pathway 7 31.8 1.40E−06 [136]Ruf & Sealfon (2004) Glycerophospholipid metabolism 6 27.3 6.40E−06 [137]Racenis et al. (1992) Long-term depression 6 27.3 6.10E−06 [138]Ito (2002) MAPK signaling pathway 8 36.4 2.30E−05 [139]Dent et al. (2003) Vascular smooth muscle contraction 6 27.3 5.50E−05 [140]Kim et al. (2008) [141]Open in a new tab Notes. ^a P-value was adjusted using Benjamini & Hochberg method. ^b The literature references link the refined putative targets with