Abstract
Purpose
Graph coloring approach has emerged as a valuable problem-solving tool
for both theoretical and practical aspects across various scientific
disciplines, including biology. In this study, we demonstrate the graph
coloring’s effectiveness in computational network biology, more
precisely in analyzing protein–protein interaction (PPI) networks to
gain insights about the viral infections and its consequences on human
health. Accordingly, we propose a generic model that can highlight
important hub proteins of virus-associated disease manifestations,
changes in disease-associated biological pathways, potential drug
targets and respective drugs. We test our model on SARS-CoV-2
infection, a highly transmissible virus responsible for the COVID-19
pandemic. The pandemic took significant human lives, causing severe
respiratory illnesses and exhibiting various symptoms ranging from
fever and cough to gastrointestinal, cardiac, renal, neurological, and
other manifestations.
Methods
To investigate the underlying mechanisms of SARS-CoV-2
infection-induced dysregulation of human pathobiology, we construct a
two-level PPI network and employed a differential evolution-based graph
coloring (DEGCP) algorithm to identify critical hub proteins that might
serve as potential targets for resolving the associated issues.
Initially, we concentrate on the direct human interactors of SARS-CoV-2
proteins to construct the first-level PPI network and subsequently
applied the DEGCP algorithm to identify essential hub proteins within
this network. We then build a second-level PPI network by incorporating
the next-level human interactors of the first-level hub proteins and
use the DEGCP algorithm to predict the second level of hub proteins.
Results
We first identify the potential crucial hub proteins associated with
SARS-CoV-2 infection at different levels. Through comprehensive
analysis, we then investigate the cellular localization, interactions
with other viral families, involvement in biological pathways and
processes, functional attributes, gene regulation capabilities as
transcription factors, and their associations with disease-associated
symptoms of these identified hub proteins. Our findings highlight the
significance of these hub proteins and their intricate connections with
disease pathophysiology. Furthermore, we predict potential drug targets
among the hub proteins and identify specific drugs that hold promise in
preventing or treating SARS-CoV-2 infection and its consequences.
Conclusion
Our generic model demonstrates the effectiveness of DEGCP algorithm in
analyzing biological PPI networks, provides valuable insights into
disease biology, and offers a basis for developing novel therapeutic
strategies for other viral infections that may cause future pandemic.
Supplementary Information
The online version contains supplementary material available at
10.1186/s12859-024-05690-0.
Keywords: Differential evolution, Protein–protein interaction networks,
Drug target, Generic model, Hub proteins, KEGG pathway, Respiratory
disorder
Background
COVID-19, a global pandemic that emerged in late 2019, is caused by the
highly transmissible severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2). Belonging to the Coronaviridae family and
Orthocoronavirinae subfamily, SARS-CoV-2 is a single-stranded
positive-sense RNA (+ssRNA) virus, which belongs to Betacoronavirus
genus among four classified coronavirus genera: Alphacoronavirus,
Betacoronavirus, Gammacoronavirus, and Deltacoronavirus [[31]1]. Within
its genomic makeup, SARS-CoV-2 encodes sixteen non-structural proteins
(Nsp1-16), four structural proteins (S, E, M, and N), and some
accessory proteins (ORF) [[32]2]. SARS-CoV-2-infected individuals
commonly experience severe respiratory symptoms, such as fever, cough,
and loss of taste and smell at the initial stage of infection. However,
the impact of the virus extends beyond the respiratory system,
resulting in additional manifestations such as nausea, intestinal
obstruction, lung injury, and various complications affecting organs
such as the kidney, gut, brain, heart, and other body parts [[33]3,
[34]4].
Till now, this global threat has taken almost seven-million human lives
and infected more than seven hundred million people worldwide.[35]1
During the pandemic period, numerous studies underscored the critical
role of protein–protein interactions (PPIs) between SARS-CoV-2 virus
proteins and human host factors in the initiation and progression of
the infection, leading to the development of pathogenesis and
associated human pathophysiological responses [[36]5–[37]9]. PPIs are
the physical connections between two or more proteins. The collection
of various protein–protein interactions forms a biological network
called protein–protein interaction network (PPIN). Many researchers
predicted these PPIs between SARS-CoV-2 virus and human proteins to
point out the underlying mechanism of SARS-CoV-2 proteins-mediated
disease propagation [[38]5, [39]6, [40]10, [41]11]. The authors also
predicted the affected pathways, target host proteins, associated
disease progression, underlying cause of COVID-19 pathobiology,
anti-viral drug repurposing and preventive therapeutic measures in a
discrete way [[42]6–[43]8, [44]12]. But the actual host cellular
targets of the SARS-CoV-2 virus, the precise pathways of virus
transmission in humans, and the relative importance of each of those
proteins and pathways are yet to be understood fully. Also, the
long-term impacts of SARS-CoV-2 infection in humans still not fully
explored. Furthermore, no generic model has been proposed yet that can
reveal the underlying mechanism, transmission, effects, disease
manifestations, and effective therapeutic strategies in a single
framework for all viral infections. Therefore, it suggests an immense
need to better understand viral infections and consequent dysregulation
of human biological homeostasis in a comprehensive manner to offer new
treatment or preventive strategies for COVID-19 like pandemic which may
occur in future. Biological network establishment using virus-human
protein–protein interactions, analysing the network for extrapolation
to the next level of human–human protein–protein interaction network,
identification of the crucial hub proteins at different levels in those
biological networks along with bioinformatic analysis of those hub
proteins to identify associated disease manifestations might have the
potential to fulfill this purpose.
In Discrete Mathematics and Computer Science, any network is commonly
represented using a graph. A graph is a collection of a set of vertices
called nodes, and a set of links or connections between the nodes,
called edges. Based on the direction of the edges or the weight
associated with the edges, a graph can be directed, undirected, or
weighted. Likewise, based on the functionalities and characteristics of
the network, any biological network can be mapped mathematically to a
directed, undirected, or weighted graph. Hence, biological
protein–protein interaction network can also be mathematically
represented as an undirected graph. In the mathematical representation
of a PPI network, the proteins are represented as the vertices of the
graph, and the interactions between the proteins are represented as the
edges of the graph. Similarly, hub protein identification in a PPI
network can also be mapped to the problem of vertex identification in a
graph, which can be done in various ways, and most of them are
successfully applied to numerous real-life combinatorial optimization
problems which are NP-hard or NP-complete in nature [[45]13, [46]14].
One of a very promising method to identify the vertices of a graph is
graph coloring, which is also a well-known NP-complete combinatorial
optimization problem [[47]15]. Graph coloring is a fundamental concept
in graph theory, involving the assignment of colors to the vertices
(nodes) of a graph so that no two neighboring vertices, connected by an
edge, share the same color. The least number of colors required to
generate such coloring is known as the chromatic number of the graph.
However, determining the chromatic number or finding an optimal
coloring is considered an NP-complete problem, which is computationally
challenging and time-consuming [[48]16]. Consequently, meta-heuristic
algorithms, including evolutionary algorithms and approximation
methods, are commonly employed to find theoretical and practical
solutions efficiently [[49]17].
Graph coloring has wide applications in solving various optimization
and allocation problems in real-world scenarios. Its successful
implementation extends to diverse fields, including computational and
network biology [[50]18, [51]19]. Specifically, in biology, graph
coloring has been proven helpful in modeling gene regulation networks,
protein–protein interaction networks, disease-gene associations, drug
target identification, functional annotation, disease subtyping,
pathway analysis, and numerous other biological processes
[[52]20–[53]24]. It is a straightforward and intuitive method that does
not require complex algorithms or extensive computational resources and
needs less memory consumption [[54]22]. The approach is highly scalable
and can be scaled to networks of varying sizes, making it suitable for
small-scale and large-scale protein–protein interaction networks. As
the network size increases, the graph coloring algorithm can
efficiently handle the increased computational complexity without
sacrificing accuracy [[55]22, [56]25]. In addition, it is a flexible
method that can be adapted to incorporate additional information or
constraints, such as functional annotations or experimental data, to
enhance the accuracy of hub protein identification [[57]26].
In this study, we explore the application of an evolutionary graph
coloring algorithm, namely DEGCP algorithm, in the context of a generic
mathematical modelling of a biological protein–protein interaction
network. The main objective is to propose a generic mathematical model
for analysing a complex biological PPI network to uncover pivotal
mediators associated with viral diseases and their effects on human
pathophysiology. By utilizing DEGCP, we aim to analyze the complex PPI
network efficiently and identify crucial hub proteins that play
essential roles in the disease’s progression and manifestation. This
investigation holds the potential to contribute valuable insights into
the underlying biological mechanisms of viral diseases and may pave the
way for novel therapeutic strategies in combating the disease and
related conditions in future that may cause global threat. The proposed
model (Fig. [58]1) consists of identifying pivotal hub proteins, their
interactions with various DNA & RNA viral families, gene ontology and
pathway analyses of those hub proteins, transcription factors
identification, association of the hub proteins with various disease
symptoms, and identification of their druggable targets and respective
drugs. To validate our proposed model, we used the experimental data
associated with COVID-19 disease, a pandemic that occurred all of the
world in the recent past. The experimental outcomes aim that the
proposed model can also be applied to other viral infections.
Fig. 1.
[59]Fig. 1
[60]Open in a new tab
Schematic view of establishing SARS-CoV-2-human protein–protein
interaction network and application of DEGCP algorithm to unravel the
deeper insight of SARS-CoV-2 infection
Related work
Over the last few decades, literature surveys revealed various
experiments that were conducted to analyze and predict the PPI for
understanding different biological functions and identifying the
crucial target proteins in protein–protein interaction networks
[[61]27–[62]29]. An effective graph coloring-based integrative
statistical algorithm has been designed for essential protein
prediction [[63]22]. The authors proposed hybridization methods
consisting of graph coloring and artificial neural network (ANN) for
finding the target proteins in infectious diseases [[64]25, [65]26].
Graph theoretic approach was also proposed to find the infected
pathways in viral disease [[66]30]. PPIs between virus and host
proteins were studied to highlight the underlying mechanism of
SARS-CoV-2 proteins-mediated disease propagation [[67]5]. A recent
study exhaustively examined the molecular pathways, including
pathways-based therapeutic targets for COVID-19 [[68]6]. A
comprehensive analysis was made using unsupervised machine learning
method to highlight COVID-19-related affected pathways [[69]7].
Bioinformatics analysis was conducted to understand the underlying
molecular mechanism of advancement of SARS-CoV-2 infection by receiver
operating characteristic (ROC) curve analysis [[70]8]. Network
topological analysis was performed to identify the potential target hub
genes and affected pathways of COVID-19 [[71]9]. SARS-CoV-2-induced
pathways and corresponding drug repurposing strategies were identified
by artificial neural network analysis using random walk with restart
(RWR) method [[72]12]. Computational identification of host genomic
biomarkers influencing SARS-CoV-2 infections was made using statistical
R-packages [[73]31]. Underlying causes of COVID-19 pathobiology and
prediction of potential therapeutic targets and effective drugs were
examined extensively through virus-host protein interaction network
study [[74]32–[75]35]. The authors attempted to develop combinatorial
treatment strategies targeting both host factors and viral enzymes
through comprehensive mapping of interactions between SARS-CoV-2
proteins and human proteins [[76]36, [77]37]. An extensive
computational investigation of the interactome of SARS-CoV-2 and human
proteins was conducted for identifying possible virus-affected
processes and potential protein binding sites [[78]10]. The common
pathways and molecular biomarkers in COVID-19 were identified through
PPI network analysis which can cause pulmonary fibrosis and lung cancer
[[79]11].
Motivation and contribution
Despite the enormous research, our understanding about the underlying
mechanism and host targets of viral infections including SARS-CoV-2,
effects of viral infections on host biological pathways, mechanism of
development of disease pathology and its long-term impacts on hosts are
still limited. The development of effective treatment options to
prevent the disease and resolve associated manifestations are still an
open challenge. Moreover, no single generic model exists till date that
can address the issues associated with SARS-CoV-2 and other viral
infections as well. These limitations motivated us to propose a new
model that can enhance the deeper understanding of the mechanism of
such viral infections and consequent pathophysiology. The understanding
of viral-host interactions and their consequences is of utmost
importance in comprehending the disease’s complex pathophysiology.
Thus, in this study, we delve into the protein–protein interaction
network of SARS-CoV-2 and human host proteins to identify crucial
mediators that might shed light on the mechanisms underlying COVID-19
infection and its associated impact on human health. By elucidating
these molecular interactions, we aim to contribute to the development
of targeted therapeutic strategies and improve patient outcomes in this
ongoing global health crisis, as well as for any other pandemic that
can occur in future. We believe, our present research is a wise effort
in this context.
Our proposed model, rooted in an evolutionary graph coloring algorithm,
presents numerous advantages over traditional degree centrality
criteria in discerning hub proteins within PPI networks. In contrast to
degree centrality, which predominantly assesses the number of direct
interactions a protein has, our model integrates the topological
context of the entire network. It explores protein interactions
comprehensively, encompassing both direct and indirect connections. The
employed graph coloring methodology adheres to the principle that
adjacent nodes (proteins) should not share the same color, signifying
the absence of direct interactions. However, proteins with the same
color may possess indirect connections through shared intermediates,
indicating potential functional relationships. This approach recognizes
that proteins involved in a pathway may contribute to the pathway’s
activity without necessitating direct physical interactions. Degree
centrality, reliant on the count of direct interactions, may be
sensitive to the overall network size. This size of the networks or
graphs keep on changing in realistic situation, and thus the networks
or graphs are dynamic in nature. Any changes in network structure, such
as node addition or removal, can significantly influence degree
centrality and alter the identification of hub proteins. In contrast,
our model demonstrates greater robustness to structural changes,
adapting to alterations in network connectivity through the assigned
colors reflecting local context connectivity, and thus is well-suited
for dynamic graphs or dynamic network structures.
While our model does not entirely dismiss the significance of degree
centrality, it incorporates it alongside additional weightings. The hub
protein identification process involves two primary steps. Firstly, the
PPI network graph undergoes chromatic coloring with minimal number of
colors using a combination of differential evolution and sequential
coloring algorithms. Differential evolution algorithm optimizes the
ordering of the nodes so as to minimize the number of colors and the
sequential algorithm assigns valid colors based on the degree and
adjacency of the nodes. Subsequently, weights are assigned to human
proteins based on color class, interactions with viral proteins, direct
interactions with other proteins in the graph, and interactions with
other human proteins outside the network. The incorporation of these
diverse criteria, including but not limited to degree centrality,
contributes to a more comprehensive hub protein identification process.
The calculated Z-score, considering all aforementioned weightings, aids
in establishing essential hub proteins based on a defined threshold.
This multi-criteria approach enhances the probability of identifying
crucial hub proteins compared to methods solely reliant on degree
centrality criteria.
Our proposed work model demonstrates the applicability of graph
coloring in computational network biology. We propose a generic
mathematical model using graph coloring that can extract the potential
human hub proteins associated with different viral infections. We have
tested our models on SARS-CoV-2 infection occured in the recent past.
It shows the importance of two levels of protein–protein interaction
networks to understand the underlying mechanism and highlight the
associated mediators of SARS-CoV-2 infection-induced disease
manifestations. Furthermore, the experimental findings highlight the
importance of hub proteins in COVID-19. It also conjectures the hub
proteins associated biological pathways as the probable underlying
mechanisms of SARS-Cov-2-mediated human pathophysiology. The
bioinformatic analysis-based validation of our obtained results also
identifies some essential transcription factors that might play an
important role in altering biological signaling pathways. Finally, the
proposed model also underscored the interconnection between SARS-CoV-2
infection and the long-term effects of the disease and accordingly
highlighted some of the probable drug targets and respective drugs that
might be beneficial to prevent the disease and resolve associated
disorders. Furthermore, the experimental results also show that the
proposed model can also be applied for other pandemic like COVID-19
that may occur in future.
Materials and methods
Mathematical formulation
Following are the notations used for the mathematical formulation of
the proposed model.
[MATH: Vp: :MATH]
set of virus proteins,
[MATH: Hp: :MATH]
set of human proteins,
[MATH: G=(VG,EG): :MATH]
an undirected graph G.,
[MATH: VG: :MATH]
set of vertices,
[MATH: EG: :MATH]
set of edges whose elements are of the form
[MATH: (vi,vj) :MATH]
where
[MATH:
vi,vj<
/mi>∈VG :MATH]
,
[MATH: deg(vi): :MATH]
degree of
[MATH: vi. :MATH]
[MATH: χ(G): :MATH]
chromatic number of a graph G.
[MATH: Ci=C1,C2,C3,
⋯Cn:
:MATH]
set of color classes.
[MATH: C(vi): :MATH]
color of
[MATH: vi. :MATH]
[MATH: VIC(x): :MATH]
number of virus protein interactors of x.
[MATH: TOTHIC(x): :MATH]
total number of human protein interactors of x.
[MATH:
INTERHIC(x): :MATH]
number of inter human protein interactors of x.
[MATH:
INTRAHIC(x): :MATH]
number of intra human protein interactors of x.
L-1, L-2 graphs:
[MATH: G=(VG,EG) :MATH]
where
[MATH: VG=vi:vi∈Hp
, :MATH]
and
[MATH: EG=(vi,vj) :MATH]
where
[MATH:
vi,vj<
/mi>∈Hp,
:MATH]
[MATH: i≠j, :MATH]
and
[MATH: vi :MATH]
is interacting with
[MATH: vj. :MATH]
[MATH:
HUBp:
mrow> :MATH]
set of human hub proteins.
Z(x) : Z-score of x., X(x) : raw score of x.,
[MATH: μ: :MATH]
mean,
[MATH: σ: :MATH]
standard deviation.
For a given graph
[MATH: G=(VG,EG) :MATH]
, graph coloring problem can be mathematically defined as follows:
[MATH: min∑k=1mck
:MATH]
[MATH: s.t.∑k=
1muvk=1
mtd> :MATH]
1
[MATH: uvk+uwk≤ck∀(v,w)∈EG,k∈1,2,⋯m
mfenced> :MATH]
2
[MATH: uvk,ck∈0,1∀v∈VG
mi>,k∈1,2,⋯m
mfenced> :MATH]
3
where m is the upper bound of the chromatic number
[MATH: χ(G) :MATH]
,
[MATH:
ck=1
:MATH]
for all used color k, and
[MATH: uvk=1
:MATH]
if
[MATH: C(v)=k :MATH]
[MATH:
∀v∈VG
mrow> :MATH]
. Equation (2) prevents two vertices to be assigned the same color
values.
The hub human proteins are extracted at different levels based on the
color values of the vertices of the graph and weightages given for
different types of interactions. The first set of hub proteins at
first-level are calculated as
[MATH: HUBp(L-1)=vi:vi∈Hp,
Z(vi)≥θ :MATH]
4
where
[MATH: Z(vi)=X(vi)-μσ :MATH]
5
[MATH: X(vi)=X1(vi)+X2(vi)+X3(vi)
:MATH]
6
[MATH: X1
(vi)=w1ifC(vi)⩽n100
mfrac>·χ(G)andVIC(vi)⩾ϕw1
2ifC(vi)⩽n100
mfrac>·χ(G)andVIC(vi)<ϕw1
4ifC(vi)>n100
mfrac>·χ(G)andVIC(vi)⩾ϕ-w1
mn>ifC(vi)>n100
mfrac>·χ(G)andVIC(vi)<ϕ
mtable> :MATH]
7
[MATH: X2
(vi)=w2ifC(vi)⩽n100
mfrac>·χ(G)andINTRAHIC(vi)⩾ψw2
2ifC(vi)⩽n100
mfrac>·χ(G)andINTRAHIC(vi)<ψw2
4ifC(vi)>n100
mfrac>·χ(G)andINTRAHIC(vi)⩾ψ-w2
mn>ifC(vi)>n100
mfrac>·χ(G)andINTRAHIC(vi)<ψ
mtable> :MATH]
8
[MATH: X3
(vi)=w3ifC(vi)⩽n100
mfrac>·χ(G)andINTERHIC(vi)⩾ηw3
2ifC(vi)⩽n100
mfrac>·χ(G)andINTERHIC(vi)<ηw3
4ifC(vi)>n100
mfrac>·χ(G)andINTERHIC(vi)⩾η-w3
mn>ifC(vi)>n100
mfrac>·χ(G)andINTERHIC(vi)<η
mtable> :MATH]
9
where n is the percentages of color classes to be chosen,
[MATH: X1(vi) :MATH]
is the score of the node calculated based on the the color classes and
number of virus interactor proteins,
[MATH: X2(vi) :MATH]
is the score of the node calculated based on the the color classes and
number of intra-human interactor proteins,
[MATH: X3(vi) :MATH]
is the score of the node calculated based on the the color classes and
number of inter-human interactor proteins, and
[MATH:
w1,w2<
/mn>,w3,θ
mi>,ϕ,ψ,η
:MATH]
are different threshold values.
The second set of hub proteins at second-level are calculated as
[MATH: HUBp(L-2)=vi:vi∈Hp,
Z(vi)≥λ :MATH]
10
where
[MATH: Z(vi)=X4(vi)-μσ :MATH]
11
[MATH: X4
(vi)=w4ifC(vi)⩽m100
mfrac>·χ(G)andINTERHIC(vi)⩾ζw4
2ifC(vi)⩽m100
mfrac>·χ(G)andINTERHIC(vi)<ζw4
4ifC(vi)>m100
mfrac>·χ(G)andINTERHIC(vi)⩾ζ-w4
mn>ifC(vi)>m100
mfrac>·χ(G)andINTERHIC(vi)<ζ
mtable> :MATH]
12
where m is the percentage of color classes to be considered,
[MATH: X4(vi) :MATH]
is the score of the node calculated based on the the color classes and
number of inter-human interactor proteins, and
[MATH:
w4,λ,ζ :MATH]
are different threshold values.
Development of DEGCP algorithm
DEGCP algorithm (Algorithm 1) was developed based on the existing
Modified Discrete Differential Evolution (MDDE) algorithm [[80]38]. Two
modifications were made in the existing MDDE algorithm. Firstly, we
generated the initial populations with some guidance for producing
promising vectors where the graph’s vertices were first sorted
according to the degree in descending order. Then, the vertices of
equal degree positions were interchanged randomly to generate different
populations. Lastly, instead of a two-point crossover, the ordered
crossover technique was incorporated in DEGCP algorithm.
Algorithm 1.
[81]Algorithm 1
[82]Open in a new tab
DEGCP Algorithm
Algorithm 2.
[83]Algorithm 2
[84]Open in a new tab
Sequential Coloring Algorithm
First level human–human PPI network construction
Thirty SARS-CoV-2 virus proteins (E, M, N, S, Nsp1-Nsp-16, ORF10,
ORF14, ORF3A, ORF3B, ORF6, ORF7A, ORF7B, ORF8, ORF9B, ORF9C) including
spike, envelop and accessory proteins and their direct human
interactors were extracted from BioGRID repository [[85]39, [86]40].
The human interactors were included in our network contained all the
experimentally identified physical interactors of virus proteins
specified in the repository. Based on this criteria, 5225 unique human
interactor proteins were selected for the first level network
construction. The intra-interactions amongst these 5225 human proteins
were extracted based on the same experimentally identified physical
interactions criteria from BioGRID human–human protein interactome
repository. Accordingly, 209,945 unique interactions were identified
amongst 5225 human proteins which were further employed as the size and
order respectively to create an undirected graph. This first-level of
this PPI network was named PPIN1, and the undirected graph named L-1.
Implementation of DEGCP algorithm and degree centrality criteria on L-1 graph
DEGCP algorithm was used to color the L-1 (5225, 209,945) graph with
the parameters
[MATH: P=50 :MATH]
,
[MATH: ω=0.6 :MATH]
,
[MATH:
Pc=0.75
:MATH]
,
[MATH:
Max_Ite
ration=<
/mo>2000 :MATH]
. Consequently, a total of 60 disjoint color classes were identified.
Then, we executed Eqs. (4)–(9) with different threshold values to
extract first level potential human hub proteins. After several trails,
we choose
[MATH: n=25 :MATH]
, and the other threshold values as
[MATH:
ϕ=5,ψ=1
0,η=25,w1=5,w2<
/mn>=1,w3=1 :MATH]
, and
[MATH: θ=1 :MATH]
to obtain a legitimate number of potential hub human proteins in the
first level.
Second level human–human PPI network construction
The second level of PPI network was constructed using the second level
human protein interactors of 1082 L1 hub proteins. Accordingly, a total
of 10,371 unique human interactors of L-1 hub proteins were extracted
from BioGRID repository based on same experimentally identified
physical interaction criteria. We then constructed a PPI network
(PPIN2) and undirected graph L-2 based on 268,738 unique
inter-interactions amongst 11,453 (10,371
[MATH: + :MATH]
1082) human proteins.
Implementation of DEGCP algorithm and degree centrality criteria on L-2 graph
DEGCP algorithm was used similarly to color the L-2 (11,453, 268,738)
graph with the parameters
[MATH: P=50 :MATH]
,
[MATH: ω=0.6 :MATH]
,
[MATH:
Pc=0.75
:MATH]
,
[MATH:
Max_Ite
ration=<
/mo>2000 :MATH]
. Consequently, a total of 80 disjoint color classes were identified.
After that, we executed Eqs. (10)–(12) with
[MATH: m=25 :MATH]
, and the different threshold values to extract second level potential
human hub proteins. After several trails, we set the threshold values
[MATH:
ζ=100,w4
mn>=5 :MATH]
, and
[MATH: λ=1 :MATH]
to obtain the potential hub human proteins in the second level.
Gene ontology analysis of the identified human hub proteins
Gene ontology (GO) analysis of the identified L-1 and L-2 hub proteins
were performed using the DAVID Functional Annotation Tool [[87]41,
[88]42]. The Entrez Gene ID of the hub proteins were uploaded and the
functional annotation were performed using DAVID. Biological Process
(BP), Cellular Component (CC), and Molecular Function (MF) were
extracted for the respective hub proteins. A modified Fisher Exact
p-value, also known as the threshold of EASE Score, was used for gene
enrichment analysis. Enrichment was performed by setting an EASE Score
0.01 and the minimum number of gene count as 5 for stringent gene
enrichment.
KEGG pathway analysis of the identified human hub proteins
Involvement of hub proteins in biological pathways were identified
through KEGG Pathway Analysis using DAVID Functional Annotation Tool
[[89]41, [90]42]. The Entrez Gene ID of the hub proteins were uploaded
and functional annotation were done by selecting the Pathways option.
Only KEGG-PATHWAY was selected for specified pathway analysis. Like GO
Analysis, the threshold value of the EASE Score for pathways analysis
has also been set to 0.01, but unlike GO, the minimum number of gene
count has been set to 15 to identify the most crucial pathways.
Results
Implementation of the DEGCP algorithm on undirected graph instances exhibits
optimal/best-known results
To assess the applicability of the DEGCP algorithm to biological
protein–protein interaction networks, first, we evaluated its
performance on multiple undirected graph instances. Accordingly, this
algorithm was tested on conventional DIMACS benchmark instances widely
employed and originally suggested for graph coloring problems. The
proposed DEGCP algorithm was tested fifty times independently on some
of the different small, medium, and large-size graphs of varying
complexity. Our experimental outcomes for respective graph instances
with corresponding edges and vertices (columns 1–3, Table [91]1)
demonstrated that the results produced by the proposed algorithm
(column 6, Table [92]1) were the same as the optimal (column 4, Table
[93]1) or best-known results (column 5, Table [94]1). Furthermore,
DEGCP algorithm-derived results achieved optimal or best-known results
with a 100% success rate for 17 graph instances (first 17 instances of
column 8, Table [95]1), and the success rates for the other instances
were within an acceptable range (60–80% for 18–20th instances of column
8, Table [96]1). Notably, our proposed algorithm produced the optimal
results within an acceptable time limit (column 7, Table [97]1), which
further suggested its efficiency and applicability for graphs with
larger order and size. Altogether, the results demonstrated the
effectiveness of the proposed algorithm on undirected graphs.
Subsequently, we applied the DEGCP algorithm on SARS-CoV-2-Human
protein–protein interaction network to delineate a deeper insight into
viral infection and its consequences on human pathobiology.
Table 1.
Results obtained by DEGCP Algorithm with time and success rate
DIMACS graph instances Edges Vertex Known optimal
[MATH: χ(G) :MATH]
Best known result for
[MATH: χ(G) :MATH]
= ? Results obtained by DEGCP DEGCP time (S) DEGCP success rate
homer 1629 561 13 13 13 0.12 50/50
fpsol2.i.1 11,654 496 65 65 65 0.59 50/50
inithx.i.1 18,707 864 54 54 54 1.64 50/50
zeroin.i.1 4100 211 49 49 49 0.11 50/50
le450_25a 8260 450 25 25 25 24.81 50/50
le450_25b 8263 450 25 25 25 8.28 50/50
myciel7 2360 191 8 8 8 0.02 50/50
1-Insertions_6 6337 607 ? 7 7 0.16 50/50
2-Insertions_5 3936 597 ? 6 6 0.12 50/50
3-Insertions_5 9695 1406 ? 6 6 0.66 50/50
4-Insertions_4 1795 475 ? 5 5 0.71 50/50
1-FullIns_5 3247 282 ? 6 6 0.03 50/50
2-FullIns_5 12201 852 ? 7 7 0.27 50/50
3-FullIns_5 33,751 2030 ? 8 8 1.20 50/50
4-FullIns_5 77,305 4146 ? 9 9 5.44 50/50
5-FullIns_4 11,395 1085 ? 9 9 0.35 50/50
wap05a 43,081 905 ? 50 50 118.31 50/50
school1 19,095 385 ? 14 14 3145.21 30/50
school1_nsh 14,612 352 ? 14 14 2459.35 35/50
will199GPIA 7065 701 ? 7 7 924.26 40/50
[98]Open in a new tab
PPI network establishment using human interactors of SARS-CoV-2 and
graph-coloring algorithm implementation predicts crucial first-level hub
proteins
To identify crucial human protein facilitators through which SARS-CoV-2
impacts the biological changes in infected individuals, we first
constructed a protein–protein interaction network (PPIN). To create
this PPIN, human interactors of thirty SARS-CoV-2 proteins were
extracted from the BioGRID database (Fig. [99]2A). After that, the PPIN
was constructed using 5225 human interactors of thirty SARS-CoV-2
proteins and their associated 2,09,945 interactions from the BioGRID
human protein interactome database and visualized through a connected
network (Fig. [100]2B). In this network, the human interactors of
SARS-CoV-2 were denoted as different colored nodes based on the
chromatic coloring approach of using minimum number (not necessarily
optimum) of colors to designate adjacent connecting nodes with
different colors. The edges represented direct interactions among these
proteins, devoid of any indication of regulatory relationships,
functional similarities, or directionality of regulation or signal or
upstream-downstream demarcation. Thereafter, to find out the important
hub proteins among those 5225 interactors, a graph coloring algorithm
was implemented on PPIN and the first 25% of color classes were
extracted. Additionally, weightage to the degree centrality of the
nodes was added to extract potential hub proteins based on three
criteria: interaction
[MATH: ≥ :MATH]
5 with SARS-CoV-2 proteins, intra-interaction
[MATH: ≥ :MATH]
10 within 5225 proteins, and inter-interaction
[MATH: ≥ :MATH]
25 with next-level human interactors. The first criterion was used to
add extra weightage based on the higher interaction probability with
viral proteins, the second criterion was to add the additional impact
of the viral interaction on the first-level interactors, and the third
criterion was to add the higher likelihood of impacting on infected
patients. It yielded 1082 potential hub proteins (Additional file
[101]2: Table: Sheet S1) that were designated as L-1 hub proteinsand
visualized through a network (Fig. [102]2C).
Fig. 2.
[103]Fig. 2
[104]Open in a new tab
First-level protein–protein interaction network and graph coloring
algorithm implemented filtration of HUB proteins. A Occurrence of human
interactors of individual SARS-Cov-2 proteins. B PPI network of 5225
total interactors and visualization using chromatic graph coloring
approach. C Highlighted PPI network of 1082 L-1 hub proteins and
visualization using chromatic graph coloring approach. D Subcellular
localization analysis of 1082 L-1 hub proteins. E Interaction frequency
of 1082 L-1 hub proteins with 27 different groups of virus families. F
Categorization of 27 interacting different groups of virus families
based on genome content
Evaluation of cellular localization of L-1 hub proteins showed 21% were
plasma membrane, 21% were ER membrane, and 14% were golgi
membrane-localized proteins whereas 35% were cytosolic proteins
(Fig. [105]2D). It suggests that most of the essential direct
interactors of SARS-CoV-2 were either membrane or cytosolic proteins.
It also supports their higher possibility to interact with SARS-CoV-2
proteins which indirectly validates the potential of our graph coloring
approach and weightage degree centrality-based hub protein
identification method. The literature and HVIDB database [[106]43]
mapping of L-1 hub protein revealed that 80.4% (870 out of 1082) of
them were previously reported as possible interactors of other than
SARS-CoV-2 viruses (Additional file [107]2: Sheet S3), which belong to
27 different viral families (Fig. [108]2E). Notably, the hub proteins
interacting with viral proteins from the same family may or may not
fall into the same category. This is because of our undirected
protein–protein interaction networking model, where color codes are not
assigned based on functional similarity. As SARS-CoV-2 is an RNA virus
family member, categorization based on the genetic content of
interacting virus families indicates 33.33% of them were DNA virus
families, whereas 66.67% were RNA virus families (Fig. [109]2F). It is
noteworthy that the majority of the identified hub proteins, associated
with both RNA and DNA viruses, may contribute to the manifestations of
viral diseases. However, one-third of the DNA-virus family interactors
may have distinctive roles in the context of SARS-COVID-19 responses.
The complexity and distinct manifestations of COVID-19 compared to
other viral diseases prompt the consideration that these hub proteins
might play unique roles in the development of the complex disease
associated with SARS-CoV-2. Alternatively, these hub proteins may act
as common mediators for both DNA and RNA viral infections. This further
suggests our method identified essential hub proteins with a higher
probability of interacting with RNA viruses.
PPI network construction of the second-level interactors of the first-level
hub proteins and application of graph coloring algorithm predicts the
important second-level hub proteins
To characterize the second level of important human proteins through
which L-1 hub proteins influence the biological alterations in infected
individuals, a second protein–protein interaction network (PPIN2) was
constructed. To generate the PPIN2 network the second-level interactors
of 1082 L1-hub proteins were extracted from the BioGRID database
(Fig. [110]3A). Subsequently, PPIN2 was created using 1082 L1-hub
proteins and their second-level 2,68,738 unique inter-interactions with
the rest of the 10,371 human interactors from the BioGRID human protein
interactome database and presented through a network highlighting the
L-1 proteins (Fig. [111]3B). Similar to PPIN1 network, nodes were
colored based on the chromatic coloring approach and the edges denoted
as direct interactions among the nodes, devoid of any indication of
regulatory relationships, functional similarities, or directionality of
regulation or signal or upstream-downstream demarcation. The chromatic
coloring of nodes were applied based on the similar criteria of PPIN
and the first 25% of color classes were extracted as the significant
nodes. Furthermore, weightage to the degree centrality of the nodes was
added to extract most potential hub proteins based on the criteria:
intra-interaction
[MATH: ≥ :MATH]
100 within a total of 11,453 interactors. This criterion was added to
filter out the second-level hub proteins with a higher possibility of
impacting infected patients. It yielded 1922 potential hub proteins
(Additional file [112]2: Sheet S4) that were designated as L-2 hub
proteins and highlighted in the total network (Fig. [113]3C).
Fig. 3.
[114]Fig. 3
[115]Open in a new tab
Second-level human–human protein–protein interaction network and graph
coloring algorithm implemented filtration of L-2 hub proteins. A
Frequency of human interactors of individual L-1 hub proteins. B PPI
network of 1082 L-1 hub proteins and 10,371 total interactors
highlighting L-1 hub proteins. C Highlighted PPI network of 1922 L-2
hub proteins and visualization. D Subcellular localization analysis of
1922 L-2 hub proteins. E Interaction frequency of 1922 L-2 hub proteins
with 32 different groups of virus families. F Categorization of 32
interacting different groups of virus families based on DNA or RNA
genome content
Cellular localization analysis of L-2 hub proteins corresponds to 44%
of nucleus-localized proteins and 50% of Cytosolic proteins and no or
minimal plasma membrane, ER, or Golgi membrane proteins (Fig. [116]3D).
It suggested the possibility that the majority of second-level
interactors might be directly or indirectly involved in gene regulation
which is also expected from the second-level interactors. Hence, it
also validates the potential of our method to extract the important hub
proteins. HVIDB database [[117]43] mapping for L-2 hub proteins for the
interaction information with other viruses revealed that 59% (1134 out
of 1982) of them were previously reported as possible interactors for
non-SARS-CoV-2 viruses (Additional file [118]2: Sheet S6). Grouping of
these viruses according to different families showed they belonged to
32 different viral families (Fig. [119]3E). Similar to L-1 hub
proteins, the identified L-2 hub proteins interacting with viral
proteins from the same family may or may not fall into the same
category because of our proposed protein–protein interaction networking
structure. Additionally, genetic content-based categorization of these
32 virus families appeared as 34.38% of them were DNA virus families
whereas 65.62% were RNA virus families (Fig. [120]3F). As explained
earlier, one-third of the DNA-virus family interactors may have
distinctive roles in the context of SARS-COVID-19 responses and might
play unique roles in the development of the complex disease associated
with SARS-CoV-2. This further suggests our method was able to find out
important hub proteins which have a higher probability to interact with
RNA viruses like SARS-CoV-2.
Collective PPI network-graph coloring model identifies important biological
and functional consequences of SARS-CoV-2 infection
To demonstrate the efficiency of our proposed model for identifying
SARS-CoV-2 infection-linked important hub proteins and to establish
their biological implications, L-1 and L-2 hub proteins associated
biological and functional consequences and their potentiality to alter
the human gene expression in favor of them were explored. Therefore,
transcription factor database [[121]44] mapping and DAVID functional
annotation tool-based analysis of L-1 and L-2 hub proteins were
performed. Cellular localization analysis of L-1 and L-2 hub proteins
(Fig. [122]2D, Additional file [123]2: Sheet S2, Fig. [124]3D,
Additional file [125]2: Sheet S5) showed that the majority of L-1 hub
proteins were cytosolic whereas most of the L-2 hub proteins were
nuclear and cytosolic. In an extension of those findings, transcription
factor (TF) database mapping on this combined PPI network of L-1 and
L-2 proteins demonstrates that none of the L-1 hub proteins were TF
whereas 206 out of 1,922 L-2 hub proteins were TF (Fig. [126]4A,
Additional file [127]2: Sheet S9). Collectively this result
substantiates the possibility that SARS-CoV-2 proteins interact with
key first level interactor (L-1 hub proteins) which further interacts
with the vital second-level proteins (L-2 hub proteins), which have the
potential to enter inside the nucleus and alter gene expression to
establish the disease manifestations. Furthermore, it signifies the
potential of our proposed model to identify the crucial proteins
related to SARS-CoV-2 infection to develop COVID-19 disease.
Fig. 4.
[128]Fig. 4
[129]Open in a new tab
Biological and functional consequences of PPI network-Graph coloring
model identified L-1 and L-2 hub proteins. A L-1 and L-2 hub proteins
highlighted second level PPI network and identified transcription
factors (TF) amongst those hub proteins after TF database mapping. B
L-1 and L-2 proteins associated 50 biological processes and their gene
count in each process. C KEGG pathway enrichment analysis of L-1 and
L-2 hub proteins and 55 represented pathways and their gene count in
each pathway
To further investigate the all-inclusive involvement of L-1 and L-2 hub
proteins in the disease manifestation of COVID-19, DAVID functional
analysis for analyzing their involvement in biological process were
performed (Fig. [130]4B). The result revealed that a good number of of
L-1 and L-2 hub proteins identified by our proposed model were
associated with important biological process like regulation of
transcription positively and negatively through the alteration of
RNA-Pol-II or DNA template or regulating the mRNA splicing and
processing as well as protein translation, which might be the possible
ways of alteration of host gene expression in favor of the SARS-CoV-2
infection and disease manifestation. Also, a good number of proteins
were involved in regulation of apoptosis, cell division and cell cycle,
cell proliferation and cell migration, chromatin organization process
which might be the possible ways SARS-CoV-2 alter the host cell fates
towards the death to induce the injury in different organs mainly in
the first infected sight, lung. Some of those identified L-1 and L-2
proteins were engaged in protein phosphorylation and their transport in
different organellar location and degradation through ubiquitin pathway
that might be the possible ways through which SARS-CoV-2 alters the
hosts’ cellular signaling to support their favorable conditions. Our
result suggested that another group of proteins were involved in
endocytosis and more importantly in accordance to response in hypoxic
conditions which might be the important one to support their survival
and activity in less oxygen environment which is the common
manifestation of SARS-CoV-2 infected hosts (Fig. [131]4B, Additional
file [132]2: Sheet S7).
Furthermore, to delineate the L-1 and L-2 hub proteins’ involvement in
biological pathways through which SARS-CoV-2 essentially alter the
homeostasis of host cellular pathways, the KEGG pathways enrichment
analysis was performed (Fig. [133]4C). The result showed that a large
number of proteins were involved in important cellular signaling
pathways like MAPK signaling, cAMP signaling, cGMP-PGK, FoxO and ErbB
signaling pathways which might be responsible for altering the host
cellular functions to support the infection and disease progression
(Fig. [134]4C). A good number of hub proteins were involved in HIF-1
signaling pathways responsible to alter the pathways related to hypoxic
condition, suggest that through this alteration SARS-CoV-2 might
support their progression at less oxygen conditions. Likewise, proteins
involved in p53 signaling pathway were the possible moderator of
SARS-CoV-2 through which virus change the host cell death or cell
proliferation of the infected organs. Similarly, large number of hub
proteins were involved in different cancer related signaling pathways
including lung cancer, suggested that those are responsible to change
the large number of cellular pathways in favor of infection
establishment and disease manifestations (Fig. [135]4C, Additional file
[136]2: Sheet S8).
This Proposed model underscores the connections between SARS-CoV-2 infection
and its patho-physiology
To investigate the biological connection between SARS-CoV-2 infection
and the development of its pathophysiology, the association of the L-1
and L-2 hub proteins with NIH defined COVID-19 manifestations were
evaluated. Therefore, well established COVID-19 symptoms or
pathophysiology as per NIH report were listed from the published
literature search [[137]45, [138]46]. The four major symptoms
categories were identified as most established manifestations of
COVID-19 as respiratory, cardiovascular, hematologic, and
neuropsychiatric disorders. Based on the literature search (References