Abstract

Objective

   The primary goal of this study is to unravel the mechanism of action of
   the bioactives of Curcuma longa L. at the molecular level using a
   protein–protein interaction network.

Results

   We used the target proteins to create a protein–protein interaction
   network (PPIN) and identified significant node and edge attributes of
   the PPIN. We identified clusters of proteins in the PPIN, which were
   used to identify enriched pathways. Closeness centrality and the
   Jaccard score were identified as the most important node and edge
   attributes of the PPIN, respectively. The enriched pathways of the
   various clusters overlapped, suggesting a synergistic mechanism of
   action. The three pathways common to three clusters were the
   Gonadotropin-releasing hormone receptor pathway, the Endothelin
   signaling pathway, and Inflammation mediated by chemokine and cytokine
   signaling pathway.

   Keywords: Markov clustering, Protein clusters, Centrality measure,
   Synergistic mechanism

Introduction

   Curcuma longa L. has been studied for its anti-inflammatory and
   anticancer effects [1], but the exact mechanism remains largely
   unexplored. Studies using protein–protein interaction networks (PPINs)
   have shown that bioactives exert multi-component, multi-target
   effects. A target protein usually carries out its function by
   regulating other molecules; thus, studying the PPIN helps to
   understand, in a systematic way, the relationships between target
   proteins and the proteins they interact with. An earlier study showed
   that target proteins indeed have special topological features that
   differ significantly from those of other proteins [2]. We therefore
   compared a true PPIN with a false PPIN to identify discriminating
   topological attributes, and then used those attributes to select
   important nodes and edges in the PPIN.

Main text

Methods

   Four bioactive compounds of C. longa, namely curcumin,
   desmethoxycurcumin, bisdemethoxycurcumin and turmerone, were studied.
   We used the similarity ensemble approach (SEA; http://sea.bkslab.org/)
   to identify the potential target proteins of all four bioactive
   compounds [3]. We then queried StringDB (human protein interaction
   database) with the target proteins to retrieve all listed interactions
   involving them. A small set of target proteins (TP) was found to
   interact (biologically or physicochemically) with many other
   interacting proteins (IP). We used the NetworkX library in Python to
   build and study the true and false PPIN.

   To create the true PPIN, we built an undirected graph in which each
   edge represents an interaction between TP and IP obtained from
   StringDB.

   The false PPIN was created by forming an edge for every non-existent
   interaction among the TP and IP.
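   The two networks can be sketched in plain Python, storing each graph
   as a dictionary that maps a protein to the set of its neighbours. This
   is a minimal sketch, not the authors' code: the function names are
   hypothetical, and keeping only non-interacting pairs that involve at
   least one TP is our reading of "non-existent interactions for TP and
   IP". With NetworkX, the same graphs could be built from
   nx.Graph(edge_list) and nx.non_edges(G).

```python
from itertools import combinations


def build_true_ppin(interactions):
    """Undirected graph as {protein: set of neighbours} from (protein_a, protein_b) pairs."""
    adj = {}
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def build_false_ppin(true_adj, target_proteins):
    """Connect every pair of nodes that does NOT interact in the true PPIN.

    Assumption: only pairs involving at least one target protein are kept.
    """
    false_adj = {node: set() for node in true_adj}
    for a, b in combinations(sorted(true_adj), 2):
        if b in true_adj[a]:
            continue  # a real interaction, so not part of the false network
        if a in target_proteins or b in target_proteins:
            false_adj[a].add(b)
            false_adj[b].add(a)
    return false_adj


# Toy example with hypothetical protein names
true_adj = build_true_ppin([("TP1", "IP1"), ("TP1", "IP2"), ("TP2", "IP2")])
false_adj = build_false_ppin(true_adj, {"TP1", "TP2"})
# false_adj["TP2"] == {"IP1", "TP1"}
```

   Because almost every possible pair is a non-interaction, a false PPIN
   built this way is much denser than the true PPIN; the source does not
   say whether it was subsampled.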

   We used a Markov clustering library in Python to identify protein
   clusters in the network. We used the PANTHER statistical
   overrepresentation test for pathways (http://pantherdb.org/), with the
   human reference genome, to identify significant pathways associated
   with each cluster of the network.

Calculation

   We created the true PPIN using TP and IP as nodes and StringDB
   interactions as edges. For the false PPIN, TP and IP were used as
   nodes with non-existent interactions as edges. We calculated edge
   properties of both networks with four link prediction algorithms
   (jaccard_score, preferential_attachment score, common_neighbors score
   and resource_allocation_index score) using the NetworkX library in
   Python. We also calculated node topological properties of both
   networks: degree, eigenvector centrality, betweenness centrality,
   closeness centrality, local clustering coefficient and eccentricity.

   We identified the best edge attribute and node attribute using
   statistical analysis. The code is available on Github.

Results and discussion

Similarity Ensemble Approach

   This chemical-centric method can exploit the pharmacological
   relationships among protein targets in addition to their biological
   relationships [4]. Querying curcumin yielded 193 associated human
   target proteins. Desmethoxycurcumin was mapped to 166 human target
   proteins, bisdemethoxycurcumin to 71 and turmerone to 2. After
   removing overlapping target proteins, 219 unique target proteins
   remained for the PPIN study (Fig. 1).

Fig. 1.


   Schematic representation of workflow used in this study

Network formation and property

   We downloaded human protein interaction data (scored links between
   proteins) from StringDB and retrieved all interactions involving any
   of the 219 target proteins. This yielded 208,125 interactions with
   interaction scores ranging from 150 to 999. Removing edges with a score
   below 300 left 58,482 interactions as edges and 11,979 proteins
   (TP + IP) as nodes: 219 TP and 11,760 IP.
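   The retrieval and score filter can be sketched as follows, assuming
   STRING's whitespace-separated protein.links layout (a header line
   "protein1 protein2 combined_score", then one scored link per line).
   The function names are hypothetical, and dropping links between two
   IP is an assumption based on keeping only interactions "in which any
   of the 219 target proteins were involved".

```python
import io


def filter_string_links(lines, target_proteins, min_score=300):
    """Keep undirected links that involve a target protein and score >= min_score."""
    edges = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[0] == "protein1":
            continue  # skip the header and blank or malformed lines
        a, b, score = fields[0], fields[1], int(fields[2])
        if score < min_score:
            continue  # below the confidence cut-off
        if a not in target_proteins and b not in target_proteins:
            continue  # IP-IP link: no target protein involved
        key = tuple(sorted((a, b)))  # "A B" and "B A" collapse to one undirected edge
        edges[key] = max(score, edges.get(key, 0))
    return edges


def split_nodes(edges, target_proteins):
    """Return (TP nodes, IP nodes) present in the filtered network."""
    nodes = {p for pair in edges for p in pair}
    tp = nodes & set(target_proteins)
    return tp, nodes - tp


# Toy input with hypothetical identifiers
sample = io.StringIO(
    "protein1 protein2 combined_score\n"
    "TP1 IP1 999\n"
    "IP1 TP1 999\n"
    "TP1 IP2 150\n"
    "IP3 IP4 800\n"
    "TP2 IP3 300\n"
)
edges = filter_string_links(sample, {"TP1", "TP2"})
# edges == {("IP1", "TP1"): 999, ("IP3", "TP2"): 300}
```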

Biological interaction network (true PPIN) vs. false interaction network (false PPIN)

   To compare the network properties of the two networks (true PPIN and
   false PPIN), we calculated four edge attributes (scores) for every
   edge in both networks using link prediction algorithms implemented in
   the NetworkX library in Python:

     * Preferential attachment score: the preferential attachment
       algorithm assumes that the more connected a node is, the more
       likely it is to receive new links. Thus, an edge connecting two
       nodes that are themselves highly connected to other nodes has a
       high score.
     * Common neighbors score: the common neighbors algorithm captures
       the idea that two strangers who have a friend in common are more
       likely to be introduced than two who have no friends in common.
       Thus, an edge connecting two nodes that share many neighbors has a
       high score.
     * Jaccard score: the Jaccard score measures the closeness of two
       nodes based on their shared neighbors and their degrees. A high
       Jaccard score for an edge shows that the two nodes share many
       neighbors but are not themselves highly connected to other nodes.
     * Resource allocation score: the resource allocation score measures
       the closeness of two nodes based on their shared neighbors and the
       degrees of those shared neighbors. A high resource allocation
       score for an edge shows that the two nodes share many neighbors
       and that those shared neighbors are not highly connected to other
       nodes.
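   For two nodes u and v with neighbour sets Γ(u) and Γ(v), the four
   scores are: preferential attachment = |Γ(u)|·|Γ(v)|, common neighbors
   = |Γ(u) ∩ Γ(v)|, Jaccard = |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|, and resource
   allocation = the sum of 1/|Γ(w)| over every shared neighbour w.
   NetworkX exposes these as nx.preferential_attachment,
   nx.jaccard_coefficient and nx.resource_allocation_index (each taking a
   graph and an iterable of node pairs) and nx.common_neighbors(G, u, v).
   The pure-Python sketch below (hypothetical helper link_scores) only
   spells the definitions out on a toy graph.

```python
def link_scores(adj, u, v):
    """Four neighbourhood-based link prediction scores for the node pair (u, v).

    adj maps each node of an undirected graph to the set of its neighbours.
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        # hubs connected to hubs score high
        "preferential_attachment": len(nu) * len(nv),
        # raw count of shared neighbours
        "common_neighbors": len(common),
        # shared neighbours relative to all neighbours of either node
        "jaccard": len(common) / len(union) if union else 0.0,
        # each shared neighbour contributes less the more connected it is
        "resource_allocation": sum(1 / len(adj[w]) for w in common),
    }


# Toy graph (hypothetical): triangle A-B-C plus a pendant node D on C
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
scores_ab = link_scores(adj, "A", "B")
```

   For the edge A-B, the only shared neighbour is C (degree 3), so both
   the Jaccard score (1 shared out of 3 in the union) and the resource
   allocation score equal 1/3.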

   For the true PPIN, we calculated the Pearson correlation coefficient
   between each edge attribute and the interaction score obtained from
   StringDB. Each topological edge attribute correlated poorly with the
   interaction score, with coefficients ranging from 0.076 to 0.31. Thus,
   none of the topological edge attributes reflected the biological
   interaction between two protein nodes. We then tested whether each
   edge attribute differed significantly between the two groups (true
   PPIN and false PPIN). The t test identified the Jaccard score as the
   most significant edge attribute. The t test results are available on
   Github in the folder named Edge_attributes_hypothesis_testing.
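   Both statistics can be written out directly. The source does not name
   the t test variant, so the sketch below assumes Welch's
   unequal-variance two-sample t statistic and ranks attributes by |t|;
   the helper names and toy values are hypothetical. For p-values,
   scipy.stats.ttest_ind(a, b, equal_var=False) computes the same
   statistic, and scipy.stats.pearsonr computes the correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def welch_t(a, b):
    """Welch's two-sample t statistic and its Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)  # unbiased sample variances
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def rank_edge_attributes(true_scores, false_scores):
    """Attribute names ordered by |t| between true-PPIN and false-PPIN edges, largest first."""
    stats = {name: welch_t(true_scores[name], false_scores[name])[0] for name in true_scores}
    return sorted(stats, key=lambda name: abs(stats[name]), reverse=True)


# Hypothetical edge scores for two attributes in each network
ranking = rank_edge_attributes(
    {"jaccard": [0.5, 0.6, 0.7], "preferential_attachment": [1, 2, 3]},
    {"jaccard": [0.1, 0.2, 0.3], "preferential_attachment": [2, 3, 1]},
)
```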

Difference in centrality measures

   Next, we studied the node attributes of the two networks and
   calculated the following centrality measures: degree, closeness
   centrality, eigenvector centrality, betweenness centrality, local
   clustering coefficient and eccentricity.
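   NetworkX provides all six measures (nx.degree, nx.closeness_centrality,
   nx.eigenvector_centrality, nx.betweenness_centrality, nx.clustering and
   nx.eccentricity). As a sketch of what three of them compute on an
   unweighted graph, the hypothetical helpers below use breadth-first
   search; the closeness formula includes the Wasserman-Faust scaling
   that nx.closeness_centrality applies by default, which matters here
   because a large PPIN may be disconnected.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree(adj, u):
    """Number of neighbours of u."""
    return len(adj[u])


def closeness_centrality(adj, u):
    """(r / sum of distances) * (r / (n - 1)), where r = nodes reachable from u.

    The second factor (Wasserman-Faust) stops nodes in small components
    from receiving inflated closeness values.
    """
    dist = bfs_distances(adj, u)
    reachable = len(dist) - 1
    total = sum(dist.values())
    if total == 0 or len(adj) <= 1:
        return 0.0
    return (reachable / total) * (reachable / (len(adj) - 1))


def local_clustering(adj, u):
    """Fraction of pairs of u's neighbours that are themselves connected."""
    nbrs = list(adj[u])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


# Toy path graph A-B-C-D (hypothetical)
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
```

   In the path graph, B reaches the other three nodes in 1 + 1 + 2 = 4
   hops, so its closeness is 3/4, while the end node A needs
   1 + 2 + 3 = 6 hops and gets 3/6.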

   We calculated the pairwise correlation coefficients between all
   centrality measures for the true PPIN and the false PPIN. For the true
   PPIN, degree and betweenness centrality were very strongly correlated
   (0.95), which indicates that high-degree nodes control the information
   flow in the network by lying on many shortest paths and may contribute
   to multiple pathways.

   For the false PPIN, degree was very strongly correlated with
   eigenvector centrality (0.93) but only poorly with betweenness
   centrality (0.56). This indicates that, unlike in the true PPIN,
   high-degree nodes do not control the information flow in the network.
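   The degree-betweenness comparison can be reproduced on a toy graph.
   The sketch below, with hypothetical helper names, implements Brandes'
   algorithm for unweighted, undirected graphs (the algorithm behind
   nx.betweenness_centrality), normalised the way NetworkX normalises by
   default, and then correlates betweenness with degree. In a star graph
   every shortest path between two leaves passes through the hub, so the
   two measures correlate perfectly.

```python
from collections import deque
import math


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph, normalised to [0, 1]."""
    n = len(adj)
    bc = dict.fromkeys(adj, 0.0)
    if n <= 2:
        return bc
    for s in adj:
        # Breadth-first search: shortest-path counts (sigma) and predecessors
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1.0, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back towards s
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so dividing by
    # (n - 1)(n - 2) equals dividing the single count by (n - 1)(n - 2) / 2 pairs.
    scale = 1 / ((n - 1) * (n - 2))
    return {v: c * scale for v, c in bc.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Star graph (hypothetical): hub H connected to four leaves
star = {"H": {"a", "b", "c", "d"}, "a": {"H"}, "b": {"H"}, "c": {"H"}, "d": {"H"}}
bc = betweenness_centrality(star)
nodes = sorted(star)
r = pearson([len(star[v]) for v in nodes], [bc[v] for v in nodes])
```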

   We then used machine learning algorithms (logistic regression and
   random forest) to select the node attribute that best discriminates
   between the true PPIN and the false PPIN. Closeness centrality was
   identified as the best classifier: nodes in the true PPIN have
   relatively higher closeness centrality values.
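   The source does not name the library used for logistic regression and
   random forest; scikit-learn's LogisticRegression and
   RandomForestClassifier (with its feature_importances_ attribute) would
   be one common route. The hypothetical sketch below swaps in something
   simpler to show the idea: for each node attribute on its own, it fits
   a one-feature logistic regression by gradient descent and ranks the
   attributes by accuracy. Labels (1 = node from the true PPIN, 0 = node
   from the false PPIN) and the toy values are assumptions, and accuracy
   is measured on the training data only; a real comparison should use
   held-out data.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1 / (1 + math.exp(-t))
    e = math.exp(t)
    return e / (1 + e)


def fit_logistic_1d(x, y, lr=0.1, epochs=2000):
    """Fit p(y = 1 | x) = sigmoid(w * z + b) on the standardised feature z."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n) or 1.0
    z = [(v - mean) / sd for v in x]
    w = b = 0.0
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b
        gw = gb = 0.0
        for zi, yi in zip(z, y):
            p = sigmoid(w * zi + b)
            gw += (p - yi) * zi
            gb += p - yi
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b, mean, sd


def accuracy(x, y, model):
    """Fraction of nodes whose predicted class (p > 0.5) matches the label."""
    w, b, mean, sd = model
    preds = [1 if w * (v - mean) / sd + b > 0 else 0 for v in x]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def rank_attributes(features, labels):
    """features: {attribute name: values per node}; returns [(name, accuracy)] best first."""
    scores = {name: accuracy(vals, labels, fit_logistic_1d(vals, labels))
              for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical node values: four true-PPIN nodes followed by four false-PPIN nodes
labels = [1, 1, 1, 1, 0, 0, 0, 0]
ranked = rank_attributes(
    {"closeness": [0.5, 0.6, 0.7, 0.8, 0.1, 0.2, 0.3, 0.4],
     "noise": [1, 2, 1, 2, 1, 2, 1, 2]},
    labels,
)
```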

   Based on these findings, we sparsified the true PPIN by removing
   insignificant edges and nodes. We removed edges with a Jaccard score
   above the 75th percentile of the true PPIN, and nodes with a closeness
   centrality below the 25th percentile of the true PPIN. The resulting
   network had 1900 nodes and 4637 edges (Fig. 1).
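   The pruning rule can be sketched as below. The helper names are
   hypothetical, and the sketch assumes that both cut-offs are computed
   on the unpruned true PPIN, that an edge is also dropped when either
   endpoint is removed, and that percentiles use linear interpolation
   (NumPy's default rule).

```python
def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def sparsify(edges, jaccard, closeness):
    """Drop edges above the 75th Jaccard percentile and nodes below the 25th
    closeness percentile.

    edges: list of (u, v); jaccard: {(u, v): score}; closeness: {node: score}.
    """
    j_cut = percentile([jaccard[e] for e in edges], 75)
    c_cut = percentile(closeness.values(), 25)
    kept_nodes = {n for n, c in closeness.items() if c >= c_cut}
    # An edge survives only if its score passes and both endpoints survive
    return [(u, v) for u, v in edges
            if jaccard[(u, v)] <= j_cut and u in kept_nodes and v in kept_nodes]


# Hypothetical 4-cycle with precomputed scores
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
jaccard = {("a", "b"): 0.1, ("b", "c"): 0.2, ("c", "d"): 0.3, ("d", "a"): 0.9}
closeness = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
kept = sparsify(edges, jaccard, closeness)
```

   Here edge d-a exceeds the Jaccard cut-off (0.45) and node d falls
   below the closeness cut-off (0.55), so only a-b and b-c remain.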

Protein cluster identification

   We used the Markov cluster (MCL) algorithm for protein cluster
   identification. The MCL algorithm is noise-tolerant and effective at
   identifying high-quality protein clusters [5]. MCL is an unsupervised
   clustering algorithm for graphs that identifies protein clusters by
   manipulating transition probabilities. Although protein clusters
   generally overlap extensively, MCL is a hard clustering algorithm, so
   the clusters it produces do not share proteins. The underlying premise
   of protein cluster identification is that two interacting proteins are
   more likely to share the same function (pathway) than two
   non-interacting proteins. The MCL algorithm identified 6 clusters
   within the true PPIN (Fig. 2).
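   The transition-probability manipulation behind MCL can be sketched in
   a few lines of NumPy: add self-loops, make the adjacency matrix
   column-stochastic, then alternate expansion (matrix power, which
   spreads random-walk flow) and inflation (element-wise power plus
   renormalisation, which strengthens strong flows) until the matrix
   stops changing. This is a minimal, hypothetical implementation with
   illustrative parameter values; the library the authors used is not
   named precisely (the markov_clustering package, for example, offers
   run_mcl and get_clusters).

```python
import numpy as np


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Markov clustering of a symmetric adjacency matrix into hard clusters."""
    m = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # self-loops
    m /= m.sum(axis=0)  # column-stochastic transition matrix
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: longer random walks
        m = m ** inflation                         # inflation: favour strong flows
        m /= m.sum(axis=0)                         # renormalise each column
        if np.abs(m - prev).max() < tol:
            break
    # Rows with a non-zero diagonal are attractors; each attracts one cluster
    clusters, seen = [], set()
    for i in range(len(m)):
        if m[i, i] > tol:
            members = frozenset(np.nonzero(m[i] > tol)[0].tolist())
            if members not in seen:
                seen.add(members)
                clusters.append(sorted(members))
    return clusters


# Two separate triangles (hypothetical toy network): nodes 0-2 and 3-5
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1
clusters = mcl(A)
```

   Raising the inflation parameter generally yields more, smaller
   clusters; the value used in this study is not reported.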

Fig. 2.


   Module identification of PPI network using MCL clustering and target
   protein of each cluster

Pathway enrichment analysis

   Identifying targets and the synergistic interactions among multiple
   targets is important for unraveling the pharmacological mechanism of
   action of bioactives. The target proteins belonging to each cluster
   were searched against the Gene Ontology database
   (http://pantherdb.org/webservices/go/overrep.jsp): we uploaded the
   protein list of each cluster and selected the statistical
   overrepresentation test.
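   PANTHER runs this test itself (it offers Fisher's exact and binomial
   tests, with optional false discovery rate correction). As a sketch of
   the underlying idea only, the one-sided Fisher's exact test for a
   single pathway reduces to a hypergeometric upper tail: the probability
   of seeing at least k pathway proteins in a cluster of n proteins drawn
   from a reference of N genes, K of which belong to the pathway. The
   helper names and numbers are hypothetical; Benjamini-Hochberg
   adjustment is included because many pathways are tested per cluster.

```python
from math import comb


def overrep_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N reference genes, K in pathway, n drawn)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """FDR-adjusted p-values (Benjamini-Hochberg), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical: 2 of 2 cluster proteins fall in a pathway of 3 genes (reference of 10)
p = overrep_pvalue(k=2, n=2, K=3, N=10)
```

   In this toy case p = C(3,2)/C(10,2) = 3/45 = 1/15.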

   A detailed table listing each cluster number with its TP, IP and
   pathways is available on the Github page as
   cluster_proteins_pathways.xlsx. The three clusters involved in the
   largest number of pathways were clusters 2, 4 and 5, with 25, 35 and
   38 pathways, respectively. Three pathways were shared by all three
   clusters: the Gonadotropin-releasing hormone receptor pathway, the
   Endothelin signaling pathway, and Inflammation mediated by chemokine
   and cytokine signaling pathway. Earlier studies [6] showed a
   connection between the presence of the gonadotropin-releasing hormone
   receptor in extra-pituitary tissues and the progression of some
   cancers, which provides indirect evidence for the anticancer activity
   of C. longa.
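   Selecting the clusters with the most enriched pathways and finding the
   pathways they share is a ranking followed by a set intersection. In
   the sketch below the helper names are hypothetical, and the
   per-cluster pathway sets are invented toy data (placeholders such as
   "pathway_Q" stand in for the real lists in
   cluster_proteins_pathways.xlsx), so the set sizes do not match the
   reported 25, 35 and 38.

```python
def top_clusters(cluster_pathways, n=3):
    """The n clusters enriched in the largest number of pathways."""
    return sorted(cluster_pathways, key=lambda c: len(cluster_pathways[c]), reverse=True)[:n]


def shared_pathways(cluster_pathways, clusters):
    """Pathways enriched in every one of the given clusters."""
    return set.intersection(*(set(cluster_pathways[c]) for c in clusters))


GNRH = "Gonadotropin-releasing hormone receptor pathway"
ENDO = "Endothelin signaling pathway"
INFL = "Inflammation mediated by chemokine and cytokine signaling pathway"

# Hypothetical toy data: cluster number -> enriched pathways
example = {
    1: {"pathway_P"},
    2: {GNRH, ENDO, INFL, "pathway_Q"},
    4: {GNRH, ENDO, INFL, "pathway_Q", "pathway_R"},
    5: {GNRH, ENDO, INFL, "pathway_R", "pathway_S", "pathway_T"},
}
best = top_clusters(example)
common = shared_pathways(example, best)
```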

Limitations

     * No experimental study was included, which would have been ideal
       for assessing pathway enrichment.
     * Complete information about the target proteins and their
       interactions is lacking.

Acknowledgements