Abstract Purpose Graph coloring approach has emerged as a valuable problem-solving tool for both theoretical and practical aspects across various scientific disciplines, including biology. In this study, we demonstrate the graph coloring’s effectiveness in computational network biology, more precisely in analyzing protein–protein interaction (PPI) networks to gain insights about the viral infections and its consequences on human health. Accordingly, we propose a generic model that can highlight important hub proteins of virus-associated disease manifestations, changes in disease-associated biological pathways, potential drug targets and respective drugs. We test our model on SARS-CoV-2 infection, a highly transmissible virus responsible for the COVID-19 pandemic. The pandemic took significant human lives, causing severe respiratory illnesses and exhibiting various symptoms ranging from fever and cough to gastrointestinal, cardiac, renal, neurological, and other manifestations. Methods To investigate the underlying mechanisms of SARS-CoV-2 infection-induced dysregulation of human pathobiology, we construct a two-level PPI network and employed a differential evolution-based graph coloring (DEGCP) algorithm to identify critical hub proteins that might serve as potential targets for resolving the associated issues. Initially, we concentrate on the direct human interactors of SARS-CoV-2 proteins to construct the first-level PPI network and subsequently applied the DEGCP algorithm to identify essential hub proteins within this network. We then build a second-level PPI network by incorporating the next-level human interactors of the first-level hub proteins and use the DEGCP algorithm to predict the second level of hub proteins. Results We first identify the potential crucial hub proteins associated with SARS-CoV-2 infection at different levels. Through comprehensive analysis, we then investigate the cellular localization, interactions with other viral families, involvement in biological pathways and processes, functional attributes, gene regulation capabilities as transcription factors, and their associations with disease-associated symptoms of these identified hub proteins. Our findings highlight the significance of these hub proteins and their intricate connections with disease pathophysiology. Furthermore, we predict potential drug targets among the hub proteins and identify specific drugs that hold promise in preventing or treating SARS-CoV-2 infection and its consequences. Conclusion Our generic model demonstrates the effectiveness of DEGCP algorithm in analyzing biological PPI networks, provides valuable insights into disease biology, and offers a basis for developing novel therapeutic strategies for other viral infections that may cause future pandemic. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-024-05690-0. Keywords: Differential evolution, Protein–protein interaction networks, Drug target, Generic model, Hub proteins, KEGG pathway, Respiratory disorder Background COVID-19, a global pandemic that emerged in late 2019, is caused by the highly transmissible severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Belonging to the Coronaviridae family and Orthocoronavirinae subfamily, SARS-CoV-2 is a single-stranded positive-sense RNA (+ssRNA) virus, which belongs to Betacoronavirus genus among four classified coronavirus genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus [[31]1]. Within its genomic makeup, SARS-CoV-2 encodes sixteen non-structural proteins (Nsp1-16), four structural proteins (S, E, M, and N), and some accessory proteins (ORF) [[32]2]. SARS-CoV-2-infected individuals commonly experience severe respiratory symptoms, such as fever, cough, and loss of taste and smell at the initial stage of infection. However, the impact of the virus extends beyond the respiratory system, resulting in additional manifestations such as nausea, intestinal obstruction, lung injury, and various complications affecting organs such as the kidney, gut, brain, heart, and other body parts [[33]3, [34]4]. Till now, this global threat has taken almost seven-million human lives and infected more than seven hundred million people worldwide.[35]1 During the pandemic period, numerous studies underscored the critical role of protein–protein interactions (PPIs) between SARS-CoV-2 virus proteins and human host factors in the initiation and progression of the infection, leading to the development of pathogenesis and associated human pathophysiological responses [[36]5–[37]9]. PPIs are the physical connections between two or more proteins. The collection of various protein–protein interactions forms a biological network called protein–protein interaction network (PPIN). Many researchers predicted these PPIs between SARS-CoV-2 virus and human proteins to point out the underlying mechanism of SARS-CoV-2 proteins-mediated disease propagation [[38]5, [39]6, [40]10, [41]11]. The authors also predicted the affected pathways, target host proteins, associated disease progression, underlying cause of COVID-19 pathobiology, anti-viral drug repurposing and preventive therapeutic measures in a discrete way [[42]6–[43]8, [44]12]. But the actual host cellular targets of the SARS-CoV-2 virus, the precise pathways of virus transmission in humans, and the relative importance of each of those proteins and pathways are yet to be understood fully. Also, the long-term impacts of SARS-CoV-2 infection in humans still not fully explored. Furthermore, no generic model has been proposed yet that can reveal the underlying mechanism, transmission, effects, disease manifestations, and effective therapeutic strategies in a single framework for all viral infections. Therefore, it suggests an immense need to better understand viral infections and consequent dysregulation of human biological homeostasis in a comprehensive manner to offer new treatment or preventive strategies for COVID-19 like pandemic which may occur in future. Biological network establishment using virus-human protein–protein interactions, analysing the network for extrapolation to the next level of human–human protein–protein interaction network, identification of the crucial hub proteins at different levels in those biological networks along with bioinformatic analysis of those hub proteins to identify associated disease manifestations might have the potential to fulfill this purpose. In Discrete Mathematics and Computer Science, any network is commonly represented using a graph. A graph is a collection of a set of vertices called nodes, and a set of links or connections between the nodes, called edges. Based on the direction of the edges or the weight associated with the edges, a graph can be directed, undirected, or weighted. Likewise, based on the functionalities and characteristics of the network, any biological network can be mapped mathematically to a directed, undirected, or weighted graph. Hence, biological protein–protein interaction network can also be mathematically represented as an undirected graph. In the mathematical representation of a PPI network, the proteins are represented as the vertices of the graph, and the interactions between the proteins are represented as the edges of the graph. Similarly, hub protein identification in a PPI network can also be mapped to the problem of vertex identification in a graph, which can be done in various ways, and most of them are successfully applied to numerous real-life combinatorial optimization problems which are NP-hard or NP-complete in nature [[45]13, [46]14]. One of a very promising method to identify the vertices of a graph is graph coloring, which is also a well-known NP-complete combinatorial optimization problem [[47]15]. Graph coloring is a fundamental concept in graph theory, involving the assignment of colors to the vertices (nodes) of a graph so that no two neighboring vertices, connected by an edge, share the same color. The least number of colors required to generate such coloring is known as the chromatic number of the graph. However, determining the chromatic number or finding an optimal coloring is considered an NP-complete problem, which is computationally challenging and time-consuming [[48]16]. Consequently, meta-heuristic algorithms, including evolutionary algorithms and approximation methods, are commonly employed to find theoretical and practical solutions efficiently [[49]17]. Graph coloring has wide applications in solving various optimization and allocation problems in real-world scenarios. Its successful implementation extends to diverse fields, including computational and network biology [[50]18, [51]19]. Specifically, in biology, graph coloring has been proven helpful in modeling gene regulation networks, protein–protein interaction networks, disease-gene associations, drug target identification, functional annotation, disease subtyping, pathway analysis, and numerous other biological processes [[52]20–[53]24]. It is a straightforward and intuitive method that does not require complex algorithms or extensive computational resources and needs less memory consumption [[54]22]. The approach is highly scalable and can be scaled to networks of varying sizes, making it suitable for small-scale and large-scale protein–protein interaction networks. As the network size increases, the graph coloring algorithm can efficiently handle the increased computational complexity without sacrificing accuracy [[55]22, [56]25]. In addition, it is a flexible method that can be adapted to incorporate additional information or constraints, such as functional annotations or experimental data, to enhance the accuracy of hub protein identification [[57]26]. In this study, we explore the application of an evolutionary graph coloring algorithm, namely DEGCP algorithm, in the context of a generic mathematical modelling of a biological protein–protein interaction network. The main objective is to propose a generic mathematical model for analysing a complex biological PPI network to uncover pivotal mediators associated with viral diseases and their effects on human pathophysiology. By utilizing DEGCP, we aim to analyze the complex PPI network efficiently and identify crucial hub proteins that play essential roles in the disease’s progression and manifestation. This investigation holds the potential to contribute valuable insights into the underlying biological mechanisms of viral diseases and may pave the way for novel therapeutic strategies in combating the disease and related conditions in future that may cause global threat. The proposed model (Fig. [58]1) consists of identifying pivotal hub proteins, their interactions with various DNA & RNA viral families, gene ontology and pathway analyses of those hub proteins, transcription factors identification, association of the hub proteins with various disease symptoms, and identification of their druggable targets and respective drugs. To validate our proposed model, we used the experimental data associated with COVID-19 disease, a pandemic that occurred all of the world in the recent past. The experimental outcomes aim that the proposed model can also be applied to other viral infections. Fig. 1. [59]Fig. 1 [60]Open in a new tab Schematic view of establishing SARS-CoV-2-human protein–protein interaction network and application of DEGCP algorithm to unravel the deeper insight of SARS-CoV-2 infection Related work Over the last few decades, literature surveys revealed various experiments that were conducted to analyze and predict the PPI for understanding different biological functions and identifying the crucial target proteins in protein–protein interaction networks [[61]27–[62]29]. An effective graph coloring-based integrative statistical algorithm has been designed for essential protein prediction [[63]22]. The authors proposed hybridization methods consisting of graph coloring and artificial neural network (ANN) for finding the target proteins in infectious diseases [[64]25, [65]26]. Graph theoretic approach was also proposed to find the infected pathways in viral disease [[66]30]. PPIs between virus and host proteins were studied to highlight the underlying mechanism of SARS-CoV-2 proteins-mediated disease propagation [[67]5]. A recent study exhaustively examined the molecular pathways, including pathways-based therapeutic targets for COVID-19 [[68]6]. A comprehensive analysis was made using unsupervised machine learning method to highlight COVID-19-related affected pathways [[69]7]. Bioinformatics analysis was conducted to understand the underlying molecular mechanism of advancement of SARS-CoV-2 infection by receiver operating characteristic (ROC) curve analysis [[70]8]. Network topological analysis was performed to identify the potential target hub genes and affected pathways of COVID-19 [[71]9]. SARS-CoV-2-induced pathways and corresponding drug repurposing strategies were identified by artificial neural network analysis using random walk with restart (RWR) method [[72]12]. Computational identification of host genomic biomarkers influencing SARS-CoV-2 infections was made using statistical R-packages [[73]31]. Underlying causes of COVID-19 pathobiology and prediction of potential therapeutic targets and effective drugs were examined extensively through virus-host protein interaction network study [[74]32–[75]35]. The authors attempted to develop combinatorial treatment strategies targeting both host factors and viral enzymes through comprehensive mapping of interactions between SARS-CoV-2 proteins and human proteins [[76]36, [77]37]. An extensive computational investigation of the interactome of SARS-CoV-2 and human proteins was conducted for identifying possible virus-affected processes and potential protein binding sites [[78]10]. The common pathways and molecular biomarkers in COVID-19 were identified through PPI network analysis which can cause pulmonary fibrosis and lung cancer [[79]11]. Motivation and contribution Despite the enormous research, our understanding about the underlying mechanism and host targets of viral infections including SARS-CoV-2, effects of viral infections on host biological pathways, mechanism of development of disease pathology and its long-term impacts on hosts are still limited. The development of effective treatment options to prevent the disease and resolve associated manifestations are still an open challenge. Moreover, no single generic model exists till date that can address the issues associated with SARS-CoV-2 and other viral infections as well. These limitations motivated us to propose a new model that can enhance the deeper understanding of the mechanism of such viral infections and consequent pathophysiology. The understanding of viral-host interactions and their consequences is of utmost importance in comprehending the disease’s complex pathophysiology. Thus, in this study, we delve into the protein–protein interaction network of SARS-CoV-2 and human host proteins to identify crucial mediators that might shed light on the mechanisms underlying COVID-19 infection and its associated impact on human health. By elucidating these molecular interactions, we aim to contribute to the development of targeted therapeutic strategies and improve patient outcomes in this ongoing global health crisis, as well as for any other pandemic that can occur in future. We believe, our present research is a wise effort in this context. Our proposed model, rooted in an evolutionary graph coloring algorithm, presents numerous advantages over traditional degree centrality criteria in discerning hub proteins within PPI networks. In contrast to degree centrality, which predominantly assesses the number of direct interactions a protein has, our model integrates the topological context of the entire network. It explores protein interactions comprehensively, encompassing both direct and indirect connections. The employed graph coloring methodology adheres to the principle that adjacent nodes (proteins) should not share the same color, signifying the absence of direct interactions. However, proteins with the same color may possess indirect connections through shared intermediates, indicating potential functional relationships. This approach recognizes that proteins involved in a pathway may contribute to the pathway’s activity without necessitating direct physical interactions. Degree centrality, reliant on the count of direct interactions, may be sensitive to the overall network size. This size of the networks or graphs keep on changing in realistic situation, and thus the networks or graphs are dynamic in nature. Any changes in network structure, such as node addition or removal, can significantly influence degree centrality and alter the identification of hub proteins. In contrast, our model demonstrates greater robustness to structural changes, adapting to alterations in network connectivity through the assigned colors reflecting local context connectivity, and thus is well-suited for dynamic graphs or dynamic network structures. While our model does not entirely dismiss the significance of degree centrality, it incorporates it alongside additional weightings. The hub protein identification process involves two primary steps. Firstly, the PPI network graph undergoes chromatic coloring with minimal number of colors using a combination of differential evolution and sequential coloring algorithms. Differential evolution algorithm optimizes the ordering of the nodes so as to minimize the number of colors and the sequential algorithm assigns valid colors based on the degree and adjacency of the nodes. Subsequently, weights are assigned to human proteins based on color class, interactions with viral proteins, direct interactions with other proteins in the graph, and interactions with other human proteins outside the network. The incorporation of these diverse criteria, including but not limited to degree centrality, contributes to a more comprehensive hub protein identification process. The calculated Z-score, considering all aforementioned weightings, aids in establishing essential hub proteins based on a defined threshold. This multi-criteria approach enhances the probability of identifying crucial hub proteins compared to methods solely reliant on degree centrality criteria. Our proposed work model demonstrates the applicability of graph coloring in computational network biology. We propose a generic mathematical model using graph coloring that can extract the potential human hub proteins associated with different viral infections. We have tested our models on SARS-CoV-2 infection occured in the recent past. It shows the importance of two levels of protein–protein interaction networks to understand the underlying mechanism and highlight the associated mediators of SARS-CoV-2 infection-induced disease manifestations. Furthermore, the experimental findings highlight the importance of hub proteins in COVID-19. It also conjectures the hub proteins associated biological pathways as the probable underlying mechanisms of SARS-Cov-2-mediated human pathophysiology. The bioinformatic analysis-based validation of our obtained results also identifies some essential transcription factors that might play an important role in altering biological signaling pathways. Finally, the proposed model also underscored the interconnection between SARS-CoV-2 infection and the long-term effects of the disease and accordingly highlighted some of the probable drug targets and respective drugs that might be beneficial to prevent the disease and resolve associated disorders. Furthermore, the experimental results also show that the proposed model can also be applied for other pandemic like COVID-19 that may occur in future. Materials and methods Mathematical formulation Following are the notations used for the mathematical formulation of the proposed model. [MATH: Vp: :MATH] set of virus proteins, [MATH: Hp: :MATH] set of human proteins, [MATH: G=(VG,EG): :MATH] an undirected graph G., [MATH: VG: :MATH] set of vertices, [MATH: EG: :MATH] set of edges whose elements are of the form [MATH: (vi,vj) :MATH] where [MATH: vi,vj< /mi>VG :MATH] , [MATH: deg(vi): :MATH] degree of [MATH: vi. :MATH] [MATH: χ(G): :MATH] chromatic number of a graph G. [MATH: Ci=C1,C2,C3,Cn: :MATH] set of color classes. [MATH: C(vi): :MATH] color of [MATH: vi. :MATH] [MATH: VIC(x): :MATH] number of virus protein interactors of x. [MATH: TOTHIC(x): :MATH] total number of human protein interactors of x. [MATH: INTERHIC(x): :MATH] number of inter human protein interactors of x. [MATH: INTRAHIC(x): :MATH] number of intra human protein interactors of x. L-1, L-2 graphs: [MATH: G=(VG,EG) :MATH] where [MATH: VG=vi:viHp , :MATH] and [MATH: EG=(vi,vj) :MATH] where [MATH: vi,vj< /mi>Hp, :MATH] [MATH: ij, :MATH] and [MATH: vi :MATH] is interacting with [MATH: vj. :MATH] [MATH: HUBp: :MATH] set of human hub proteins. Z(x) : Z-score of x., X(x) : raw score of x., [MATH: μ: :MATH] mean, [MATH: σ: :MATH] standard deviation. For a given graph [MATH: G=(VG,EG) :MATH] , graph coloring problem can be mathematically defined as follows: [MATH: mink=1mck :MATH] [MATH: s.t.k= 1muvk=1 :MATH] 1 [MATH: uvk+uwkck(v,w)EG,k1,2,m :MATH] 2 [MATH: uvk,ck0,1vVG,k1,2,m :MATH] 3 where m is the upper bound of the chromatic number [MATH: χ(G) :MATH] , [MATH: ck=1 :MATH] for all used color k, and [MATH: uvk=1 :MATH] if [MATH: C(v)=k :MATH] [MATH: vVG :MATH] . Equation (2) prevents two vertices to be assigned the same color values. The hub human proteins are extracted at different levels based on the color values of the vertices of the graph and weightages given for different types of interactions. The first set of hub proteins at first-level are calculated as [MATH: HUBp(L-1)=vi:viHp, Z(vi)θ :MATH] 4 where [MATH: Z(vi)=X(vi)-μσ :MATH] 5 [MATH: X(vi)=X1(vi)+X2(vi)+X3(vi) :MATH] 6 [MATH: X1 (vi)=w1ifC(vi)n100·χ(G)andVIC(vi)ϕw1 2ifC(vi)n100·χ(G)andVIC(vi)<ϕw1 4ifC(vi)>n100·χ(G)andVIC(vi)ϕ-w1ifC(vi)>n100·χ(G)andVIC(vi)<ϕ :MATH] 7 [MATH: X2 (vi)=w2ifC(vi)n100·χ(G)andINTRAHIC(vi)ψw2 2ifC(vi)n100·χ(G)andINTRAHIC(vi)<ψw2 4ifC(vi)>n100·χ(G)andINTRAHIC(vi)ψ-w2ifC(vi)>n100·χ(G)andINTRAHIC(vi)<ψ :MATH] 8 [MATH: X3 (vi)=w3ifC(vi)n100·χ(G)andINTERHIC(vi)ηw3 2ifC(vi)n100·χ(G)andINTERHIC(vi)<ηw3 4ifC(vi)>n100·χ(G)andINTERHIC(vi)η-w3ifC(vi)>n100·χ(G)andINTERHIC(vi)<η :MATH] 9 where n is the percentages of color classes to be chosen, [MATH: X1(vi) :MATH] is the score of the node calculated based on the the color classes and number of virus interactor proteins, [MATH: X2(vi) :MATH] is the score of the node calculated based on the the color classes and number of intra-human interactor proteins, [MATH: X3(vi) :MATH] is the score of the node calculated based on the the color classes and number of inter-human interactor proteins, and [MATH: w1,w2< /mn>,w3,θ,ϕ,ψ,η :MATH] are different threshold values. The second set of hub proteins at second-level are calculated as [MATH: HUBp(L-2)=vi:viHp, Z(vi)λ :MATH] 10 where [MATH: Z(vi)=X4(vi)-μσ :MATH] 11 [MATH: X4 (vi)=w4ifC(vi)m100·χ(G)andINTERHIC(vi)ζw4 2ifC(vi)m100·χ(G)andINTERHIC(vi)<ζw4 4ifC(vi)>m100·χ(G)andINTERHIC(vi)ζ-w4ifC(vi)>m100·χ(G)andINTERHIC(vi)<ζ :MATH] 12 where m is the percentage of color classes to be considered, [MATH: X4(vi) :MATH] is the score of the node calculated based on the the color classes and number of inter-human interactor proteins, and [MATH: w4,λ,ζ :MATH] are different threshold values. Development of DEGCP algorithm DEGCP algorithm (Algorithm 1) was developed based on the existing Modified Discrete Differential Evolution (MDDE) algorithm [[80]38]. Two modifications were made in the existing MDDE algorithm. Firstly, we generated the initial populations with some guidance for producing promising vectors where the graph’s vertices were first sorted according to the degree in descending order. Then, the vertices of equal degree positions were interchanged randomly to generate different populations. Lastly, instead of a two-point crossover, the ordered crossover technique was incorporated in DEGCP algorithm. Algorithm 1. [81]Algorithm 1 [82]Open in a new tab DEGCP Algorithm Algorithm 2. [83]Algorithm 2 [84]Open in a new tab Sequential Coloring Algorithm First level human–human PPI network construction Thirty SARS-CoV-2 virus proteins (E, M, N, S, Nsp1-Nsp-16, ORF10, ORF14, ORF3A, ORF3B, ORF6, ORF7A, ORF7B, ORF8, ORF9B, ORF9C) including spike, envelop and accessory proteins and their direct human interactors were extracted from BioGRID repository [[85]39, [86]40]. The human interactors were included in our network contained all the experimentally identified physical interactors of virus proteins specified in the repository. Based on this criteria, 5225 unique human interactor proteins were selected for the first level network construction. The intra-interactions amongst these 5225 human proteins were extracted based on the same experimentally identified physical interactions criteria from BioGRID human–human protein interactome repository. Accordingly, 209,945 unique interactions were identified amongst 5225 human proteins which were further employed as the size and order respectively to create an undirected graph. This first-level of this PPI network was named PPIN1, and the undirected graph named L-1. Implementation of DEGCP algorithm and degree centrality criteria on L-1 graph DEGCP algorithm was used to color the L-1 (5225, 209,945) graph with the parameters [MATH: P=50 :MATH] , [MATH: ω=0.6 :MATH] , [MATH: Pc=0.75 :MATH] , [MATH: Max_Ite ration=< /mo>2000 :MATH] . Consequently, a total of 60 disjoint color classes were identified. Then, we executed Eqs. (4)–(9) with different threshold values to extract first level potential human hub proteins. After several trails, we choose [MATH: n=25 :MATH] , and the other threshold values as [MATH: ϕ=5,ψ=1 0,η=25,w1=5,w2< /mn>=1,w3=1 :MATH] , and [MATH: θ=1 :MATH] to obtain a legitimate number of potential hub human proteins in the first level. Second level human–human PPI network construction The second level of PPI network was constructed using the second level human protein interactors of 1082 L1 hub proteins. Accordingly, a total of 10,371 unique human interactors of L-1 hub proteins were extracted from BioGRID repository based on same experimentally identified physical interaction criteria. We then constructed a PPI network (PPIN2) and undirected graph L-2 based on 268,738 unique inter-interactions amongst 11,453 (10,371 [MATH: + :MATH] 1082) human proteins. Implementation of DEGCP algorithm and degree centrality criteria on L-2 graph DEGCP algorithm was used similarly to color the L-2 (11,453, 268,738) graph with the parameters [MATH: P=50 :MATH] , [MATH: ω=0.6 :MATH] , [MATH: Pc=0.75 :MATH] , [MATH: Max_Ite ration=< /mo>2000 :MATH] . Consequently, a total of 80 disjoint color classes were identified. After that, we executed Eqs. (10)–(12) with [MATH: m=25 :MATH] , and the different threshold values to extract second level potential human hub proteins. After several trails, we set the threshold values [MATH: ζ=100,w4=5 :MATH] , and [MATH: λ=1 :MATH] to obtain the potential hub human proteins in the second level. Gene ontology analysis of the identified human hub proteins Gene ontology (GO) analysis of the identified L-1 and L-2 hub proteins were performed using the DAVID Functional Annotation Tool [[87]41, [88]42]. The Entrez Gene ID of the hub proteins were uploaded and the functional annotation were performed using DAVID. Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) were extracted for the respective hub proteins. A modified Fisher Exact p-value, also known as the threshold of EASE Score, was used for gene enrichment analysis. Enrichment was performed by setting an EASE Score 0.01 and the minimum number of gene count as 5 for stringent gene enrichment. KEGG pathway analysis of the identified human hub proteins Involvement of hub proteins in biological pathways were identified through KEGG Pathway Analysis using DAVID Functional Annotation Tool [[89]41, [90]42]. The Entrez Gene ID of the hub proteins were uploaded and functional annotation were done by selecting the Pathways option. Only KEGG-PATHWAY was selected for specified pathway analysis. Like GO Analysis, the threshold value of the EASE Score for pathways analysis has also been set to 0.01, but unlike GO, the minimum number of gene count has been set to 15 to identify the most crucial pathways. Results Implementation of the DEGCP algorithm on undirected graph instances exhibits optimal/best-known results To assess the applicability of the DEGCP algorithm to biological protein–protein interaction networks, first, we evaluated its performance on multiple undirected graph instances. Accordingly, this algorithm was tested on conventional DIMACS benchmark instances widely employed and originally suggested for graph coloring problems. The proposed DEGCP algorithm was tested fifty times independently on some of the different small, medium, and large-size graphs of varying complexity. Our experimental outcomes for respective graph instances with corresponding edges and vertices (columns 1–3, Table [91]1) demonstrated that the results produced by the proposed algorithm (column 6, Table [92]1) were the same as the optimal (column 4, Table [93]1) or best-known results (column 5, Table [94]1). Furthermore, DEGCP algorithm-derived results achieved optimal or best-known results with a 100% success rate for 17 graph instances (first 17 instances of column 8, Table [95]1), and the success rates for the other instances were within an acceptable range (60–80% for 18–20th instances of column 8, Table [96]1). Notably, our proposed algorithm produced the optimal results within an acceptable time limit (column 7, Table [97]1), which further suggested its efficiency and applicability for graphs with larger order and size. Altogether, the results demonstrated the effectiveness of the proposed algorithm on undirected graphs. Subsequently, we applied the DEGCP algorithm on SARS-CoV-2-Human protein–protein interaction network to delineate a deeper insight into viral infection and its consequences on human pathobiology. Table 1. Results obtained by DEGCP Algorithm with time and success rate DIMACS graph instances Edges Vertex Known optimal [MATH: χ(G) :MATH] Best known result for [MATH: χ(G) :MATH] = ? Results obtained by DEGCP DEGCP time (S) DEGCP success rate homer 1629 561 13 13 13 0.12 50/50 fpsol2.i.1 11,654 496 65 65 65 0.59 50/50 inithx.i.1 18,707 864 54 54 54 1.64 50/50 zeroin.i.1 4100 211 49 49 49 0.11 50/50 le450_25a 8260 450 25 25 25 24.81 50/50 le450_25b 8263 450 25 25 25 8.28 50/50 myciel7 2360 191 8 8 8 0.02 50/50 1-Insertions_6 6337 607 ? 7 7 0.16 50/50 2-Insertions_5 3936 597 ? 6 6 0.12 50/50 3-Insertions_5 9695 1406 ? 6 6 0.66 50/50 4-Insertions_4 1795 475 ? 5 5 0.71 50/50 1-FullIns_5 3247 282 ? 6 6 0.03 50/50 2-FullIns_5 12201 852 ? 7 7 0.27 50/50 3-FullIns_5 33,751 2030 ? 8 8 1.20 50/50 4-FullIns_5 77,305 4146 ? 9 9 5.44 50/50 5-FullIns_4 11,395 1085 ? 9 9 0.35 50/50 wap05a 43,081 905 ? 50 50 118.31 50/50 school1 19,095 385 ? 14 14 3145.21 30/50 school1_nsh 14,612 352 ? 14 14 2459.35 35/50 will199GPIA 7065 701 ? 7 7 924.26 40/50 [98]Open in a new tab PPI network establishment using human interactors of SARS-CoV-2 and graph-coloring algorithm implementation predicts crucial first-level hub proteins To identify crucial human protein facilitators through which SARS-CoV-2 impacts the biological changes in infected individuals, we first constructed a protein–protein interaction network (PPIN). To create this PPIN, human interactors of thirty SARS-CoV-2 proteins were extracted from the BioGRID database (Fig. [99]2A). After that, the PPIN was constructed using 5225 human interactors of thirty SARS-CoV-2 proteins and their associated 2,09,945 interactions from the BioGRID human protein interactome database and visualized through a connected network (Fig. [100]2B). In this network, the human interactors of SARS-CoV-2 were denoted as different colored nodes based on the chromatic coloring approach of using minimum number (not necessarily optimum) of colors to designate adjacent connecting nodes with different colors. The edges represented direct interactions among these proteins, devoid of any indication of regulatory relationships, functional similarities, or directionality of regulation or signal or upstream-downstream demarcation. Thereafter, to find out the important hub proteins among those 5225 interactors, a graph coloring algorithm was implemented on PPIN and the first 25% of color classes were extracted. Additionally, weightage to the degree centrality of the nodes was added to extract potential hub proteins based on three criteria: interaction [MATH: :MATH] 5 with SARS-CoV-2 proteins, intra-interaction [MATH: :MATH] 10 within 5225 proteins, and inter-interaction [MATH: :MATH] 25 with next-level human interactors. The first criterion was used to add extra weightage based on the higher interaction probability with viral proteins, the second criterion was to add the additional impact of the viral interaction on the first-level interactors, and the third criterion was to add the higher likelihood of impacting on infected patients. It yielded 1082 potential hub proteins (Additional file [101]2: Table: Sheet S1) that were designated as L-1 hub proteinsand visualized through a network (Fig. [102]2C). Fig. 2. [103]Fig. 2 [104]Open in a new tab First-level protein–protein interaction network and graph coloring algorithm implemented filtration of HUB proteins. A Occurrence of human interactors of individual SARS-Cov-2 proteins. B PPI network of 5225 total interactors and visualization using chromatic graph coloring approach. C Highlighted PPI network of 1082 L-1 hub proteins and visualization using chromatic graph coloring approach. D Subcellular localization analysis of 1082 L-1 hub proteins. E Interaction frequency of 1082 L-1 hub proteins with 27 different groups of virus families. F Categorization of 27 interacting different groups of virus families based on genome content Evaluation of cellular localization of L-1 hub proteins showed 21% were plasma membrane, 21% were ER membrane, and 14% were golgi membrane-localized proteins whereas 35% were cytosolic proteins (Fig. [105]2D). It suggests that most of the essential direct interactors of SARS-CoV-2 were either membrane or cytosolic proteins. It also supports their higher possibility to interact with SARS-CoV-2 proteins which indirectly validates the potential of our graph coloring approach and weightage degree centrality-based hub protein identification method. The literature and HVIDB database [[106]43] mapping of L-1 hub protein revealed that 80.4% (870 out of 1082) of them were previously reported as possible interactors of other than SARS-CoV-2 viruses (Additional file [107]2: Sheet S3), which belong to 27 different viral families (Fig. [108]2E). Notably, the hub proteins interacting with viral proteins from the same family may or may not fall into the same category. This is because of our undirected protein–protein interaction networking model, where color codes are not assigned based on functional similarity. As SARS-CoV-2 is an RNA virus family member, categorization based on the genetic content of interacting virus families indicates 33.33% of them were DNA virus families, whereas 66.67% were RNA virus families (Fig. [109]2F). It is noteworthy that the majority of the identified hub proteins, associated with both RNA and DNA viruses, may contribute to the manifestations of viral diseases. However, one-third of the DNA-virus family interactors may have distinctive roles in the context of SARS-COVID-19 responses. The complexity and distinct manifestations of COVID-19 compared to other viral diseases prompt the consideration that these hub proteins might play unique roles in the development of the complex disease associated with SARS-CoV-2. Alternatively, these hub proteins may act as common mediators for both DNA and RNA viral infections. This further suggests our method identified essential hub proteins with a higher probability of interacting with RNA viruses. PPI network construction of the second-level interactors of the first-level hub proteins and application of graph coloring algorithm predicts the important second-level hub proteins To characterize the second level of important human proteins through which L-1 hub proteins influence the biological alterations in infected individuals, a second protein–protein interaction network (PPIN2) was constructed. To generate the PPIN2 network the second-level interactors of 1082 L1-hub proteins were extracted from the BioGRID database (Fig. [110]3A). Subsequently, PPIN2 was created using 1082 L1-hub proteins and their second-level 2,68,738 unique inter-interactions with the rest of the 10,371 human interactors from the BioGRID human protein interactome database and presented through a network highlighting the L-1 proteins (Fig. [111]3B). Similar to PPIN1 network, nodes were colored based on the chromatic coloring approach and the edges denoted as direct interactions among the nodes, devoid of any indication of regulatory relationships, functional similarities, or directionality of regulation or signal or upstream-downstream demarcation. The chromatic coloring of nodes were applied based on the similar criteria of PPIN and the first 25% of color classes were extracted as the significant nodes. Furthermore, weightage to the degree centrality of the nodes was added to extract most potential hub proteins based on the criteria: intra-interaction [MATH: :MATH] 100 within a total of 11,453 interactors. This criterion was added to filter out the second-level hub proteins with a higher possibility of impacting infected patients. It yielded 1922 potential hub proteins (Additional file [112]2: Sheet S4) that were designated as L-2 hub proteins and highlighted in the total network (Fig. [113]3C). Fig. 3. [114]Fig. 3 [115]Open in a new tab Second-level human–human protein–protein interaction network and graph coloring algorithm implemented filtration of L-2 hub proteins. A Frequency of human interactors of individual L-1 hub proteins. B PPI network of 1082 L-1 hub proteins and 10,371 total interactors highlighting L-1 hub proteins. C Highlighted PPI network of 1922 L-2 hub proteins and visualization. D Subcellular localization analysis of 1922 L-2 hub proteins. E Interaction frequency of 1922 L-2 hub proteins with 32 different groups of virus families. F Categorization of 32 interacting different groups of virus families based on DNA or RNA genome content Cellular localization analysis of L-2 hub proteins corresponds to 44% of nucleus-localized proteins and 50% of Cytosolic proteins and no or minimal plasma membrane, ER, or Golgi membrane proteins (Fig. [116]3D). It suggested the possibility that the majority of second-level interactors might be directly or indirectly involved in gene regulation which is also expected from the second-level interactors. Hence, it also validates the potential of our method to extract the important hub proteins. HVIDB database [[117]43] mapping for L-2 hub proteins for the interaction information with other viruses revealed that 59% (1134 out of 1982) of them were previously reported as possible interactors for non-SARS-CoV-2 viruses (Additional file [118]2: Sheet S6). Grouping of these viruses according to different families showed they belonged to 32 different viral families (Fig. [119]3E). Similar to L-1 hub proteins, the identified L-2 hub proteins interacting with viral proteins from the same family may or may not fall into the same category because of our proposed protein–protein interaction networking structure. Additionally, genetic content-based categorization of these 32 virus families appeared as 34.38% of them were DNA virus families whereas 65.62% were RNA virus families (Fig. [120]3F). As explained earlier, one-third of the DNA-virus family interactors may have distinctive roles in the context of SARS-COVID-19 responses and might play unique roles in the development of the complex disease associated with SARS-CoV-2. This further suggests our method was able to find out important hub proteins which have a higher probability to interact with RNA viruses like SARS-CoV-2. Collective PPI network-graph coloring model identifies important biological and functional consequences of SARS-CoV-2 infection To demonstrate the efficiency of our proposed model for identifying SARS-CoV-2 infection-linked important hub proteins and to establish their biological implications, L-1 and L-2 hub proteins associated biological and functional consequences and their potentiality to alter the human gene expression in favor of them were explored. Therefore, transcription factor database [[121]44] mapping and DAVID functional annotation tool-based analysis of L-1 and L-2 hub proteins were performed. Cellular localization analysis of L-1 and L-2 hub proteins (Fig. [122]2D, Additional file [123]2: Sheet S2, Fig. [124]3D, Additional file [125]2: Sheet S5) showed that the majority of L-1 hub proteins were cytosolic whereas most of the L-2 hub proteins were nuclear and cytosolic. In an extension of those findings, transcription factor (TF) database mapping on this combined PPI network of L-1 and L-2 proteins demonstrates that none of the L-1 hub proteins were TF whereas 206 out of 1,922 L-2 hub proteins were TF (Fig. [126]4A, Additional file [127]2: Sheet S9). Collectively this result substantiates the possibility that SARS-CoV-2 proteins interact with key first level interactor (L-1 hub proteins) which further interacts with the vital second-level proteins (L-2 hub proteins), which have the potential to enter inside the nucleus and alter gene expression to establish the disease manifestations. Furthermore, it signifies the potential of our proposed model to identify the crucial proteins related to SARS-CoV-2 infection to develop COVID-19 disease. Fig. 4. [128]Fig. 4 [129]Open in a new tab Biological and functional consequences of PPI network-Graph coloring model identified L-1 and L-2 hub proteins. A L-1 and L-2 hub proteins highlighted second level PPI network and identified transcription factors (TF) amongst those hub proteins after TF database mapping. B L-1 and L-2 proteins associated 50 biological processes and their gene count in each process. C KEGG pathway enrichment analysis of L-1 and L-2 hub proteins and 55 represented pathways and their gene count in each pathway To further investigate the all-inclusive involvement of L-1 and L-2 hub proteins in the disease manifestation of COVID-19, DAVID functional analysis for analyzing their involvement in biological process were performed (Fig. [130]4B). The result revealed that a good number of of L-1 and L-2 hub proteins identified by our proposed model were associated with important biological process like regulation of transcription positively and negatively through the alteration of RNA-Pol-II or DNA template or regulating the mRNA splicing and processing as well as protein translation, which might be the possible ways of alteration of host gene expression in favor of the SARS-CoV-2 infection and disease manifestation. Also, a good number of proteins were involved in regulation of apoptosis, cell division and cell cycle, cell proliferation and cell migration, chromatin organization process which might be the possible ways SARS-CoV-2 alter the host cell fates towards the death to induce the injury in different organs mainly in the first infected sight, lung. Some of those identified L-1 and L-2 proteins were engaged in protein phosphorylation and their transport in different organellar location and degradation through ubiquitin pathway that might be the possible ways through which SARS-CoV-2 alters the hosts’ cellular signaling to support their favorable conditions. Our result suggested that another group of proteins were involved in endocytosis and more importantly in accordance to response in hypoxic conditions which might be the important one to support their survival and activity in less oxygen environment which is the common manifestation of SARS-CoV-2 infected hosts (Fig. [131]4B, Additional file [132]2: Sheet S7). Furthermore, to delineate the L-1 and L-2 hub proteins’ involvement in biological pathways through which SARS-CoV-2 essentially alter the homeostasis of host cellular pathways, the KEGG pathways enrichment analysis was performed (Fig. [133]4C). The result showed that a large number of proteins were involved in important cellular signaling pathways like MAPK signaling, cAMP signaling, cGMP-PGK, FoxO and ErbB signaling pathways which might be responsible for altering the host cellular functions to support the infection and disease progression (Fig. [134]4C). A good number of hub proteins were involved in HIF-1 signaling pathways responsible to alter the pathways related to hypoxic condition, suggest that through this alteration SARS-CoV-2 might support their progression at less oxygen conditions. Likewise, proteins involved in p53 signaling pathway were the possible moderator of SARS-CoV-2 through which virus change the host cell death or cell proliferation of the infected organs. Similarly, large number of hub proteins were involved in different cancer related signaling pathways including lung cancer, suggested that those are responsible to change the large number of cellular pathways in favor of infection establishment and disease manifestations (Fig. [135]4C, Additional file [136]2: Sheet S8). This Proposed model underscores the connections between SARS-CoV-2 infection and its patho-physiology To investigate the biological connection between SARS-CoV-2 infection and the development of its pathophysiology, the association of the L-1 and L-2 hub proteins with NIH defined COVID-19 manifestations were evaluated. Therefore, well established COVID-19 symptoms or pathophysiology as per NIH report were listed from the published literature search [[137]45, [138]46]. The four major symptoms categories were identified as most established manifestations of COVID-19 as respiratory, cardiovascular, hematologic, and neuropsychiatric disorders. Based on the literature search (References