Abstract

   Developing a common medication strategy for disease control and
   management could be greatly beneficial. Investigating the differences
   between diseased and healthy states using differentially expressed
   genes aids in understanding disease pathophysiology and enables the
   exploration of protein-drug interactions. This study aimed to find the
   most common genes in diarrhea-causing bacteria such as Salmonella
   enterica serovar Typhimurium, Campylobacter jejuni, Escherichia coli,
   Shigella dysenteriae (CESS) to find new drugs. Thus, differential gene
   expression datasets of CESS were screened through computational
   algorithms and programming. Subsequently, hub and common genes were
   prioritized from the analysis of extensive protein–protein
   interactions. Binding predictions were performed to identify the common
   potential therapeutic targets of CESS. We identified a total of 827
   dysregulated genes that are highly linked to CESS. Notably, no common
   gene interaction was found among all CESS bacteria, but we identified 3
   common genes in both Salmonella-Escherichia and
   Escherichia-Campylobacter infections. Later, out of 73 protein
   complexes, molecular simulations confirmed 5 therapeutic candidates
   from the CESS. We have developed a new pipeline for identifying
   therapeutic targets for a common medication strategy against CESS.
   However, further wet-lab validation is needed to confirm their
   effectiveness.

   Subject terms: Computational biology and bioinformatics, Microbiology,
   Structural biology

Introduction

   Microarrays have revolutionized biotechnology, allowing researchers to
   track down the expression of tens of thousands of genes
   simultaneously^[48]1. In most cases, any microarray experiment results
   in a list of genes found to be differentially expressed. The analysis
   of these large-scale gene expressions has become a fundamental approach
   to the identification of clinical diagnostic factors as well as
   potential drug targets^[49]2. The common challenge here is translating
   such lists of gene expression data into a better understanding of the
   underlying disease phenomena. The first solution in this direction can
   be to translate the gene expression pattern into a functional profile,
   which will offer insight into the cellular mechanisms relevant to the
   given disease condition^[50]3. Over the last decade, high-throughput in
   silico genomics, transcriptomics, and proteomics technologies have
   allowed researchers to rapidly acquire and analyze several thousand
   gene expression profiles in any experiment^[51]4.

   Enteric bacterial pathogens and parasites are the leading cause of
   infectious diarrhea in developing countries^[52]5. Common bacteria that
   cause diarrhea include Salmonella, Escherichia, Shigella, and
   Campylobacter. The virulence of Salmonella enterica serovar Typhimurium
   (S. Typhimurium) greatly depends on two types III secretion systems
   (T3SSs) which are encoded in pathogenicity islands 1 (SPI1) and 2
   (SPI2), respectively^[53]6. These systems translocate proteins called
   effectors into a eukaryotic host cell, where they interfere with
   certain host signal transduction pathways to allow the internalization
   of pathogens and their survival and proliferation inside
   vacuoles^[54]6. Escherichia coli (E. coli) is the primary cause of
   watery diarrhea in infants, often accompanied along with causing
   low-grade fever and vomiting^[55]7. Compared with other pathogens such
   as Shigella and Salmonella, E. coli is typically considered
   non-invasive. However, it encodes a T3SS producing a characteristic
   attaching and effacing (A/E) lesion^[56]8. The effacement of microvilli
   on the epithelial surface induced by A/E lesions contributes to the
   diarrheal phenotype owing to the loss of overall absorptive
   surface^[57]8. Out of the four major Shigella species that cause
   diarrheal disease, Shigella sonnei (S. sonnei) and Shigella flexneri
   (S. flexneri) are the most common species in the U.S. and other
   developed countries^[58]9. However, the changing scenario was observed
   in a 2013 study that shows the sudden emergence of S. sonnei in
   Bangladesh^[59]10. The two other Shigella species, Shigella dysenteriae
   (S. dysenteriae), and Shigella boydii (S. boydii) have a generally low
   infection rate and are found very rarely in developed countries. S.
   dysenteriae produces Shiga toxin, making it the most life-threatening
   of all of these infections, which can also lead to hemolytic uremic
   syndrome (HUS). Releasing the exotoxin by S. dysenteriae compromises
   the central nervous system and the gut, while enterotoxin causes the
   diarrhea^[60]11. The primary cause of inflammation by Shigella involves
   various steps of the invasion process. An initial release of IL-1β by
   the destruction of macrophages after emergence from M-cells attracts
   polymorphonuclear leukocytes (PMNs) that release a precursor to the
   secretagogue adenosine, ultimately activating Cl − secretion. The
   presence of free bacteria on the basolateral side of cells aggravates
   this early step in inflammation^[61]12,[62]13. Campylobacter jejuni (C.
   jejuni) initiates infection by penetrating the gastrointestinal mucus
   using its high motility and spiral shape^[63]14. Then they adhere to
   the gut enterocytes and induce diarrhea by toxin release. C. jejuni
   releases several different enterotoxins and cytotoxins varying from
   strain to strain, and the severity of enteritis correlates with these
   toxins^[64]14. These four bacterial infections in the human body have
   the common disease-causing phenomena which is diarrhea. A significant
   number of host (human) genes get upregulated and downregulated
   simultaneously during these pathogenesis events by four
   diarrhea-causing bacteria. Thus, it is important to decipher the
   molecular mechanisms underlying these dysregulated gene networks in the
   pathogenesis of diarrhea.

   Previously, several investigations have been conducted individually on
   the four bacteria to deduce how their gene expression events contribute
   to various intestinal and systemic infections^[65]15–[66]18. However,
   to date, no work has been done to reveal the gene–gene associations and
   their dysregulation due to the pathogenesis of these four bacteria.
   Therefore, we have addressed all of these four bacteria in our study
   and disclosing the significant gene networks of dysregulated genes due
   to these bacterial infections would reveal plausible drug targets and
   shed light on the possibility of common therapy.

   All the molecular events in the cell are controlled primarily by
   changes in the expression of key genes. Gene transcription is pivotal
   in regular events such as cell division, proliferation,
   differentiation, and cell death. Much interest is therefore focused on
   depicting gene expression profiles to identify the key gene clusters
   whose expression is changed in disease states. Gene–gene interactions
   provide us the information to what extent and how the gene share the
   relation with the other genes. One gene can interact with other genes
   by several ways including domain, motif, co-localization, pathways etc.
   For this reason, one gene can have the great influence on another
   gene’s function and regulation. Therefore, the goal of this study was
   to reveal the commonly dysregulated genes and the significant gene
   networks associated with these four bacterial infections. We have
   analyzed the gene expression pattern, hub genes identification
   pathways, regulatory biomarkers and structural associations of
   interacted proteins involved in these disease progressions.
   Furthermore, we investigated the functional associations of final
   products of these dysregulated genes to scrutinize the major drug
   targets for common therapy.

Results

Analysis of microarray data

   Four gene expression profiles were used in this investigation, namely
   ([67]GSE51043, [68]GSE18810, [69]GSE19315, and [70]GSE36701). A total
   of 215 DEGs (Differentially Expressed Genes) were screened from
   [71]GSE51043 with 72 upregulated genes and 143 downregulated genes.
   When the [72]GSE18810 dataset was examined, it yielded 187 DEGs, 50 of
   which were upregulated and 137 of which were downregulated. In the
   [73]GSE19315 gene chip, 214 DEGs were discovered, with 109 upregulated
   genes and 105 downregulated genes. Finally, the [74]GSE36701 dataset
   yielded 213 DEGs, with 67 upregulated and 146 downregulated genes. All
   the upregulated and downregulated DEGs resulting from four bacterial
   pathogenesis are enlisted in Supplementary files 1–2.

Gene ontologies (GO) and signaling pathways

   In a complex disease state, a varying range of signaling pathways and
   GO terms are involved in the progression of diseases. In this process,
   we used all the DEGs (298 upregulated and 529 downregulated genes) to
   determine significant pathways and gene ontologies that may link
   diarrhea pathogenesis. GO terms and pathways were selected based on the
   number of genes involved and having a p-value less than or equal to
   0.05. Top GO terms were identified as regulation of cell population
   proliferation (27), RNA polymerase II cis-regulatory region
   sequence-specific DNA binding (25) in upregulated genes and nervous
   system development (20), cis-regulatory region sequence-specific DNA
   binding (39) in downregulated genes. In addition to GO terms,
   over-presented signaling pathways were predicted for DEGs. The top
   pathways are cancer pathways (12), cytokine-cytokine receptor
   interaction (10) in upregulated genes, Chemical carcinogenesis (12),
   and Rap1 signaling pathway (11) in downregulated genes. Supplementary
   File 3–5 contains the list of all GO terms and pathways of DEGs based
   p-value less than or equal to 0.05. Furthermore, we identified the
   commonly altered pathways (both due to upregulation and downregulation)
   from the top 30 pathways of each DEG set resulting from the
   pathogenesis of the four bacteria shown in Table [75]1.

Table 1.

   Common altered pathways, names of the DEGs and the responsible
   pathogens for upregulation or downregulation of these DEGs.
   Regulation mechanism Pathways Genes Responsible pathogen for altering
   pathways
   Upregulation Insulin signaling pathway RPS6, HK2, PPP1CA Campylobacter
   RPS6KB1, IRS1 Shigella
   Dopaminergic synapse CAMK2B, PPP1CA Campylobacter
   GNB5 E. coli
   Cytokine-cytokine receptor interaction CCL5, XCL2, XCL1 Campylobacter
   TNFRSF19, GDF3 E. coli
   Autophagy ERN1, PRAP1 Campylobacter
   RPS6KB1, IRS1 Shigella
   Necroptosis CAMK2B, FTL Campylobacter
   STAT5B, TNFAIP3 Salmonella
   ErbB signaling pathway JUN, RPS6KB1, HBEGF Shigella
   STAT5B, EREG Salmonella
   Downregulation Steroid biosynthesis FDFT1 Campylobacter
   EBP, DHCR24, DHCR7 Salmonella
   Circadian entrainment NOS1AP, ADCY6 Campylobacter
   GNAO1, ADCY2 Shigella
   Progesterone-mediated oocyte maturation RPS6KA3, ADCY6 Campylobacter
   MAPK10, ADCY2 Shigella
   PIK3R3, PIK3CD, PIK3R1, CPEB4, ADCY5 Salmonella
   Circadian rhythm NPAS2 Campylobacter
   RORB E. coli
   Retrograde endocannabinoid signaling NDUFB4, ADCY6 Campylobacter
   GNAO1, MAPK10, NAPEPLD, ADCY2 Shigella
   Oxytocin signaling pathway CACNB2, ADCY6 Campylobacter
   GNAO1, ADCY2, EGFR Shigella
   GnRH signaling pathway GNRH1, ADCY6 Campylobacter
   MAPK10, ADCY2, EGFR Shigella
   [76]Open in a new tab

Common genes identified within DEGs

   The Cytoscape software v3.8 and the InteractiVenn tool
   ([77]http://www.interactivenn.net/) were used to identify the common up
   and downregulated genes from the pathogenesis of four bacterial
   species. There was no single common DEG found in all four categories.
   The highest number (3) of commonly downregulated genes were found from
   S. Typhimurium-E. coli and E. coli-C. jejuni infections. The result is
   depicted in Figs. [78]1, [79]2, [80]3.

Figure 1.

   Figure 1
   [81]Open in a new tab

   Schematic workflow of identifying the major druggable targets of
   diarrheal pathogens.

Figure 2.

   [82]Figure 2
   [83]Open in a new tab

   The common upregulated DEGs (black colored diamond boxes) found in the
   pathogenesis of four bacterial species. DEGs from Salmonella, E. coli,
   Shigella, and Campylobacter infection are shown in red, green, yellow,
   and blue colors, respectively.

Figure 3.

   [84]Figure 3
   [85]Open in a new tab

   The common downregulated DEGs (black colored diamond boxes) found in
   the pathogenesis of four bacterial species. DEGs from Salmonella, E.
   coli, Shigella, and Campylobacter infection are shown in red, green,
   yellow, and blue colors, respectively.

Protein–protein (PPI) networks and identified hub proteins

   Two PPI networks have been built from all the up-and downregulated gene
   interactions, shown in Supplementary file 6 (supplementary Fig. 1–2).
   Genes that often interact with other genes are known as hub genes in
   the gene networks. Hub genes typically play an essential function in a
   biological system due to these interactions. The protein–protein
   interaction (PPI) network, which is composed of highly connected (hub)
   genes, has a biological role demonstrated by the centrality-lethality
   rule^[86]19. Further, we employed four methods to determine the hub
   proteins within each group of proteins that differentially regulate
   pathogenesis by four bacteria. Each method identified the top 10 hub
   nodes within the PPI network. Except for the upregulated proteins
   resulting from Campylobacter and Salmonella pathogenesis, more than one
   hub protein was found to be common in all four methods within each
   group. The list of these hub proteins is tabulated in Table [87]2.

Table 2.

   The name and selection methods of the potential hub-proteins.
   Causative pathogen Regulation mechanism Selection methods Hub proteins
   Name
   Salmonella Upregulation MCC, MNC, degree OLR1 Oxidized low-density
   lipoprotein receptor 1
   JAZF1 Juxtaposed with another zinc finger protein 1
   ADAMTS1 A disintegrin and metalloproteinase with thrombospondin motifs
   1
   Downregulation MCC, MNC, DMNC, degree PIK3R1 Phosphatidylinositol
   3-kinase regulatory subunit alpha
   TIAM1 Rho guanine nucleotide exchange factor TIAM1
   E. coli Upregulation MCC, MNC, DMNC, degree GCNT7
   Beta-1,3-galactosyl-O-glycosyl-glycoprotein
   beta-1,6-N-acetylglucosaminyltransferase 7
   GNB5 Guanine nucleotide-binding protein subunit beta-5
   Downregulation MCC, MNC, DMNC, degree CD44 CD44 antigen
   GLI1 Zinc finger protein GLI1
   TAS2R14 Taste receptor type 2-member 14
   Shigella Upregulation MCC, MNC, DMNC, degree HIST1H2BC Histone H2B type
   1-C/E/F/G/I
   Downregulation MCC, MNC, DMNC, degree ADCY2 Adenylate cyclase type 2
   AAK1 AP2-associated protein kinase 1
   Campylobacter Upregulation MCC, MNC, DMNC RPS6 40S ribosomal protein S6
   ADRA1D Alpha-1D adrenergic receptor
   Downregulation MCC, MNC, DMNC, degree RAB3IP Rab-3A-interacting protein
   SYNRG Synergin gamma
   SEC31B Protein transport protein Sec31B
   [88]Open in a new tab

Identified transcriptional and post-transcriptional biomarkers

   Using the common and hub DEGs, we found 276 TFs (Transcriptional
   Factors) and 959 miRNAs (micro RNAs) that might influence the
   expression pattern of those genes and lead to the progression of
   diseases, as depicted in Fig. [89]4 and Supplementary File 8–9. Out of
   276 TFs, the top TFs (i.e., ZNF354C, FOXC1, GATA2, FOXL1, YY1, MEF2A,
   NFIC, TFAP2A, SREBF1) were identified with betweenness centrality ≥ 45
   as shown in Fig. [90]4A. Among all the miRNAs, we identified seventeen
   miRNAs (i.e., hsa-mir-17-5p, hsa-mir-20a-5p, hsa-mir-92a-3p,
   hsa-mir-93-5p, hsa-mir-122-5p, hsa-mir-155-5p, hsa-mir-106b-5p,
   hsa-mir-373-3p, hsa-mir-20b-5p, hsa-mir-329-3p, hsa-mir-520a-3p,
   hsa-mir-520c-3p, hsa-mir-519d-3p, hsa-mir-603, hsa-mir-362-3p,
   hsa-mir-6778-3p, hsa-mir-8485) with betweenness centrality ≥ 100 (Fig.
   [91]4B).

Figure 4.

   [92]Figure 4
   [93]Open in a new tab

   Gene regulatory networks associated with the dysregulated common and
   hub genes. The figure showing (A) gene–TF interacting network and (B)
   gene–miRNA interacting network. The interacting network of miRNAs and
   TFs were filtered with betweenness centrality ≥ 100 and 45,
   respectively.

Domains within the common and hub proteins and 3D structural modeling

   NCBI’s (National Center for Biotechnology Information) conserved domain
   search tool revealed that several common and hub proteins were found to
   have multiple domains. In these cases, we selected one domain from each
   protein based on having a lower E-value, and the proteins that did not
   contain any functional domains predicted by this tool were excluded
   from this study. Out of the 26 domains, 3D structures of 8 domains were
   available in the RCSB PDB (Protein Data Bank) database, 12 of them were
   modeled through MODELLER 9.22, and 5 structures were modeled using the
   trROSETTA server due to having less than 40% query coverages. Table
   [94]3 contains the domain names, sequence length, e-values, and the
   method of 3D structure modeling of each domain. Later, all the modeled
   structures were refined using the GalaxyRefine server. The summary of
   quality assessment results (Ramachandran plot analysis, ERRAT server,
   and ProSA-Web analysis) of the refined structure are shown in
   Supplementary file 6 (Supplementary table 1), while Supplementary file
   7 contains the Ramachandran plots of all the modeled structures.

Table 3.

   Predicted protein domains, sequence length, e-values and details of the
   structural modeling tools used to model their 3D structures.
   Proteins Domains Sequence length E-value Structural modeling tools
   Upregulated common proteins
   KIF9 Kinesin motor domain 6–338 0e + 00 Crystal structure (PDB ID:
   3NWN)
   BRD2 First bromodomain in Brdt_like superfamily 74–180 7.39e−79
   MODELLER 9.22 (Template PDB ID: 6TQ1)
   Downregulated common proteins
   NFIB CTF/NFI DNA-binding domain 2–195 1.27e−19 (trROSETTA) De novo
   folding, guided by deep learning restraints
   RAPH1 Ras-associating (RA) & pleckstrin homology (PH) domain 269–355
   3.90e−48 Crystal structure (PDB ID: 4GN1)
   PIK3R3 N-terminal Src homology 2 (nSH2) domain of p85 163–330 7.92e−91
   MODELLER 9.22 (Template PDB ID: 6G6W_B)
   AP1G2 Adaptin N terminal domain of AP-1 24–574 1.93e−178 MODELLER 9.22
   (Template PDB ID: 6CRIG_G)
   RAB3IP Sec2p domain 191–261 4.62e−05 Crystal structure (PDB ID: 6F6P)
   MCF2L Dbs PH domain 816–947 1.10e−81 MODELLER 9.22 (Template PDB ID:
   1LB1_A)
   Upregulated hub proteins
   OLR1 C-type lectin-like domain 144–266 6.04e−41 MODELLER 9.22 (Template
   PDB ID: 6TLA_A)
   JAZF1 PTRRG2/MT domain 169–230 8.50e−16 trROSETTA (De novo folding,
   guided by deep learning restraints)
   ADAMTS1 Zinc-dependent metalloprotease 258–463 1.24e−112 MODELLER 9.22
   (Template PDB ID: 2JIH_A)
   GNB5 WD40 domain 56–99 4.20e−10 MODELLER 9.22 (Template PDB ID: 6N9G_C)
   GCNT7 Core-2/I-Branching enzyme 111–374 3.06e−46 trROSETTA (Template
   PDB IDs: 2GAM_C, 2GAK_B, 3OTK_A, 6EJ7_A, 6FOA_A)
   HIST1H2BC Histone H2B 28–124 3.56e−53 Crystal structure (PDB ID: 6M4D)
   RPS6 Ribosomal protein S6e 1–215 1.27e−102 MODELLER 9.22 (Template PDB
   ID: 3J7P_S)
   ADRA1D Alpha-1 adrenergic receptors subtype D domain 97–413 3.71e−102
   trROSETTA (Template PDB IDs: 3VW7_A, 5T04_A, 5TGZ_A, 5X33_A, 3ODU_A)
   Downregulated hub proteins
   PIK3R1 RhoGAP domain 114–302 2.26e−122 MODELLER 9.22 (Template PDB ID:
   1PBW_A)
   TIAM1 Pleckstrin Homology (PH) domain 1235–1406 6.08e−107 MODELLER 9.22
   (Template PDB ID: 1FOE_A)
   CD44 Hyaluronan (HA)-binding domain 26–170 3.88e−72 Crystal structure
   (PDB ID: 4PZ4)
   GLI1 FOXP coiled-coil domain 237–265 3.27e−06 Crystal structure (PDB
   ID: 2GLI)
   TAS2R14 TAS2R subtype 14 domain 8–295 5.74e−150 trROSETTA (Template PDB
   IDs: 3VW7_A, 5T04_A, 5X33_A, 5ZKP_A, 4GRV_A)
   ADCY2 Adenylate and Guanylate cyclase catalytic domain 878–1077
   3.09e−80 MODELLER 9.22 (Template PDB ID: 1AZS_B)
   AAK1 Catalytic domain of Numb-Associated kinase (NAK)-like
   Serine/Threonine kinases 42- 316 0e + 00 MODELLER 9.22 (Template PDB
   ID: 5L4Q_A)
   RAB3IP Rab11 binding domain 60–252 7.84e−150 MODELLER 9.22 (Template
   PDB ID: 4UJ3_E)
   SYNRG Eps15 homology domain 316–367 1.76e−13 Crystal structure (PDB ID:
   2MX7)
   SEC31B WD40 domain 13–332 2.21e−28 trROSETTA (Template PDB IDs: 4BZK_C,
   6BM0_B, 5NZV_C, 3JCT, 6T9K_D)
   [95]Open in a new tab

Binding interactions of dysregulated protein domains

   The domain-specific protein–protein docking was performed to anticipate
   their binding affinity and interactions. The docking protocol is shown
   in Fig. [96]5 as a schematic representation. In doing so, the ClusPro
   v2.0 server provided up to 30 docked complexes with different poses.
   The complex with the least energy score and binding pose with
   functional interactions were selected from each docking process. It was
   found that the alpha-1 adrenergic receptors subtype D domain-zinc
   dependent metalloprotease domain complex showed the highest docking
   energy score of –1300 kcal/mol. The highest number (29) of hydrogen
   bonds were present in FOXP coiled-coil domain-PH domain complex. In
   contrast, the highest number (9) of salt-bridges were present in the
   interacting plane of core-2/I-branching enzyme-ribosomal protein s6e
   domain complex. The docking energies, several formed hydrogen bonds,
   and salt bridges of the complexes are enlisted in Supplementary file 6
   (supplementary table 2).

Figure 5.

   [97]Figure 5
   [98]Open in a new tab

   This study's schematic representation of domain-specific molecular
   docking protocol. The figure shows docking between (A) common
   upregulated protein domains, (B) common downregulated protein domains,
   (C) domains of upregulated hub proteins, and (D) domains of
   downregulated hub proteins. The black line and the numbers represent
   molecular docking and the numbers of possible docking combinations
   among them, respectively. From this combination, a total number of 73
   molecular dockings were performed to elucidate their binding affinity.

Stability of the docked complexes and potential drug targets identified for a
single therapy

   We employed molecular dynamics simulation to verify the stability of
   the docked protein complexes and to identify the common drug targets
   having a high association with significant proteins dysregulated from
   the pathogenesis of all four bacteria. A 7.3 microseconds (µs)
   production run was performed to simulate the 73 complexes. Out of them,
   20 complexes remained stable while the others either got unstable
   several times during the simulation period or remained completely
   disassociated at the end of the simulation. From these analyses, we
   identified 5 proteins (Fig. [99]6) that might be targeted for single
   therapy as each has an association with significant proteins
   dysregulated from the pathogenesis of other bacteria. The first
   identified protein is RAPH1, a common protein that downregulates from
   S. Typhimurium and E. coli infections. RAPH1 showed the stable
   interactions with MCF2L and AP1G2 proteins, which was downregulated
   from S. dysenteriae-C. jejuni and S. Typhimurium-C. jejuni infections,
   respectively (Fig. [100]6A). The second group of drug targets
   identified was GCNT7 and ADAMTS1, upregulated from E. coli and S.
   Typhimurium infection, respectively. Both had stable interaction with
   HIST1H2BC and RPS6 proteins which were upregulated from S. dysenteriae
   and C. jejuni infection (Fig. [101]6B). Further stable association of
   GCNT7 was found with ADAMTS1 and JAZF1 also, which are the upregulated
   hub proteins resulting from S. Typhimurium infection. The following 2
   plausible drug targets found were GLI1 and SEC31B, downregulated in the
   human body from E. coli and C. jejuni infection, respectively (Fig.
   [102]6C). GLI1 had a stable association with ADCY2, RAB3IP, TIAM1, and
   SEC31B that downregulates from S. dysenteriae, S. Typhimurium and C.
   jejuni, respectively, while SEC31B had stability with PIK3R1, AAK1, and
   TAS2R14 which was downregulated from S. dysenteriae, S. Typhimurium and
   E. coli pathogenesis (Fig. [103]6C). Thus, targeting these 5 drug
   targets with a single therapy might be a remarkable solution to prevent
   diarrheal disease from any four bacteria species.

Figure 6.

   [104]Figure 6
   [105]Open in a new tab

   The 5 potential drug targets (shown in dotted circle) for single
   therapy. These 5 proteins have stable interaction with other proteins
   resulting from dysregulation by the pathogenesis of any of the four
   bacteria. The figure showing (A) the first drug target, RAPH1, which
   has stable interaction with MCF2L and AP1G2 proteins that downregulates
   from Shigella-Campylobacter and Salmonella-Campylobacter pathogenesis,
   respectively. (B) The second group of drug targets, GCNT7, and ADAMTS1,
   which upregulates from E. coli and Salmonella infections, had stable
   interaction with HIST1H2BC and RPS6 proteins that upregulates from
   Shigella and Campylobacter infection. GCNT7 was also found to be stable
   with ADAMTS1 and JAZF1, which are the upregulated hub proteins
   resulting from Salmonella pathogenesis. (C) The third group of drug
   targets, GLI1 and SEC31B, downregulates from E. coli and Campylobacter
   infection. GLI1 showed a stable association with ADCY2, RAB3IP, TIAM1,
   and SEC31B that downregulates from Shigella, Salmonella, and
   Campylobacter, respectively, while SEC31B had stability with PIK3R1,
   AAK1, and TAS2R14 downregulating from Salmonella, Shigella, and E. coli
   infection.

   The dynamic behavior of the protein complexes was analyzed by RMSD,
   atomic distances of the interacting planes, number of hydrogen bonds,
   Rg, and SASA analysis (Fig. [106]7).

Figure 7.

   [107]Figure 7
   [108]Open in a new tab

   MD simulation results of protein complexes. The figure shows RMSD
   analysis (A1, B1, C1), minimum distances in the interacting residues of
   the complexes (A2, B2, C2), number of hydrogen bonds (A3, B3, C3), Rg
   analysis (A4, B4, C4), and SASA analysis (A5, B5, C5).

   All protein complex configuration changes were analyzed in terms of
   RMSD during the simulation periods. Fig. [109]7(A1, B1, C1) shows that
   rather than some fluctuations in the AnTD-RA_PH, C2/Ibe-RPS6e, and
   FOXP-PH complexes, the RMSD values of the remaining complexes were
   quite stable. Although some fluctuations were observed in the C2/Ibe
   and FOXP bound complexes, they tend to stabilize after 75 ns
   (Figs. [110]7B1 and 7C1). We measured the changes in the minimum
   distances between the residues of interacting planes of the complexes
   during the simulation (Fig. [111]7A2, 7B2, 7C2). The high distance was
   observed in the RPS6e bound C2/IBe and ZDM complexes as well as in the
   C2/IBe-ZDM complex (Fig. [112]7B2). Rather than these, all the
   complexes showed a significantly less distance (< 0.3 nm) between the
   interacting residues throughout the simulation. We also calculated the
   number of hydrogen bonding interactions formed during the simulations
   between the domains of the protein complexes, as shown in Fig. [113]7
   (A3, B3, C3). Hydrogen bonding is one of the primary components
   responsible for molecular interactions in any biological system. In the
   FOXP-Rab11 complex, the highest number of conformations formed up to 30
   hydrogen bonds during the simulation (Fig. [114]7C3). Very few
   conformations formed less than five hydrogen bonds in the rest of the
   complexes. Further, we measured the radius of gyration (Rg) for the
   protein complexes contributing to their compactness shown in Fig.
   [115]7(A4, B4, C4). It can be inferred that all the complexes had
   approximately similar compactness as to their starting structure except
   the AnTD-RA_PH complex, where a higher Rg score was observed
   (Fig. [116]7A4). Finally, the solvent-accessible surface areas (SASAs)
   were analyzed to investigate the changes in the protein volumes upon
   association of complexes (Fig. [117]7A5, 7B5, 7C5). Interesting results
   were observed as almost all the complexes showed slightly decreased
   SASA values compared to the starting point of the simulation. These
   decreased SASA values in the protein complexes denote a relatively
   shrunken nature upon their association. Supplementary movies [118]1–15
   visualizes the 100 ns MD simulation of these 15 complexes related to 5
   identified drug targets (the color codes of proteins are the same as
   shown in Fig. [119]6). The simulation analysis of the remaining 5
   stable complexes and the RMSD analysis of the 53 unstable complexes are
   shown in supplementary file 6 (supplementary Fig. 4).

Binding energies among the protein complexes

   We calculated the binding energy of the last 20 ns of MD production run
   of the protein complexes (associated with 5 identified drug targets)
   with an interval of 50 ps from MD trajectories using the MM/PBSA
   method. Further, we utilized the MmPbSaStat.py script for calculating
   the average free binding energy and its standard deviation/error from
   the output files obtained from the g_mmpbsa package (Table [120]4). The
   interaction between the two proteins is shown in the form of binding
   energy, where the lesser the binding energy, the better the binding of
   the two proteins. The final binding energy is the result of the
   cumulative sum of van der Wall, electrostatic, polar solvation, and
   SASA energy. The majority of the complexes showed favorable binding
   energy between them, which further validates their stability, and the
   ZDM-RPS6E complex (ADAMTS1-RPS6) showed the least binding free energy
   (-3060.937 kJ/mol) among all the complexes.

Table 4.

   MM/PBSA calculations of binding energy for the 14 protein complexes
   associated with the 5 significant drug targets.
   Gene–gene interactions Corresponding protein domain-domain complexes
   Van der Waal energy (KJ mol^−1) Electrostatic energy (KJ mol^−1) Polar
   solvation energy (KJ mol^−1) SASA energy (KJ mol^−1) Binding energy (KJ
   mol^−1)
   Downregulated common proteins (domains)
   RAPH1-AP1G2 RA_PH-AnTD − 680.087 ± 37.247 − 439.522  ± 64.004
   1474.158  ± 194.497 − 86.160  ± 8.274 268.390 + / 165.384
   RAPH1-PIKR3 RA_PH-Dbs PH − 189.003 ± 65.551 − 452.402  ± 231.165
   839.545  ± 333.576 − 24.579  ± 9.925 173.560 + / 123.451
   Upregulated hub proteins (domains)
   GCNT7-RPS6 C2/IBe-RPS6e − 493.664  ± 37.128 2758.720  ± 108.689
   1263.838  ± 158.046 − 62.830  ± 6.490 466.065  ± 148.483
   GCNT7-HIST1H2BC C2/IBe-H2B − 227.021  ± 113.840 421.572  ± 51.336
   437.592  ± 205.328 − 27.198  ± 13.569 604.945  ± 113.255
   GCNT7-JAZF1 C2/IBe-PTRRG2/MT − 139.229  ± 103.401 247.097  ± 37.191
   334.411  ± 230.916 − 20.305  ± 16.120 421.974  ± 151.540
   GCNT7-ADAMTS1 C2/IBe-ZDM − 404.509  ± 408.908 − 1191.466  ± 422.802
   814.694  ± 742.721 − 36.994  ± 37.970 − 818.276  ± 160.821
   ADAMTS1-RPS6 ZDM-RPS6E − 658.815  ± 35.537 − 4263.268  ± 163.136
   1944.392  ± 159.166 − 83.247  ± 5.610 − 3060.937  ± 158.136
   ADAMTS1-HIST1H2BC ZDM-H2B − 547.580  ± 218.170 − 1480.476  ± 444.787
   1673.507  ± 673.833 − 76.334  ± 30.324 − 430.883  ± 94.673
   Downregulated hub proteins (domains)
   GLI1-ADCY2 FOXP-AGC − 342.016  ± 296.396 − 962.872  ± 245.660
   797.151  ± 488.452 − 45.236  ± 37.246 − 552.973  ± 164.306
   GLI1-TIAM1 FOXP-PH − 70.478  ± 175.026 − 156.009  ± 244.743
   277.115  ± 587.029 − 9.406  ± 24.011 41.222  ± 169.806
   GLI1-RAB3IP FOXP-Rab11 − 642.355  ± 198.352 − 921.736  ± 296.892
   1618.073  ± 539.597 − 82.793  ± 25.178 − 28.811  ± 154.599
   GLI1-SEC31B FOXP-WD40 − 638.625  ± 34.584 − 966.445  ± 73.608
   1578.224  ± 173.598 − 79.436  ± 6.530 − 106.282  ± 128.871
   SEC31B-TAS2R14 WD40-TAS2R subtype 14 − 283.437  ± 217.294
   − 614.752  ± 378.329 252.006  ± 241.116 − 29.140  ± 25.235
   − 675.323  ± 403.508
   SEC31B-PIK3R3 WD40-RhoGap − 46.216  ± 50.728 56.612  ± 22.919
   79.744  ± 87.659 − 6.394  ± 7.951 83.747  ± 39.309
   SEC31B-AAK1 WD40-Catalytic − 322.391  ± 33.217 − 272.767  ± 42.810
   855.395  ± 97.526 − 41.380  ± 6.579 218.856  ± 88.269
   [121]Open in a new tab

Discussion

   Examining the molecular mechanisms that drive illness onset and
   progression is generally focused on biomolecules such as genes and
   proteins, whose abnormal expression contributes to changes in cellular
   function and, eventually, disease. By concentrating on illness
   molecular pathways, researchers can uncover crucial events that can be
   addressed with novel therapy techniques. Identifying these
   disease-associated targets is thus an important first step in disease
   mechanism research.

   We focused on the causal agents of diarrheal disease in this study,
   specifically S. Typhimurium, C. jejuni, E. coli, S. dysenteriae (Fig.
   [122]1). After being infected by these bacteria, a large number of
   genes in the human body become dysregulated, either individually or as
   a group. From four datasets available through the Gene Expression
   Omnibus database, we found 298 upregulated and 529 downregulated genes
   (Supplementary files 1–2). The pathways that may play a critical role
   in illness development were chosen. Notably, the majority of these
   genes and pathways have been linked to cancer (Supplementary File 3–4).
   Cancer pathways, cytokine-cytokine receptor interactions, chemical
   carcinogenesis, Rap1 signaling pathway, and nuclear receptors
   meta-pathway are among the predicted pathways (Table [123]1 and
   supplementary file 5). Although diarrhea is a common and often
   dose-limiting complication associated with cancer chemotherapy
   treatment, it is underappreciated and poorly handled^[124]20. CS
   diarrhea affects 80 percent of Carcinoid syndrome (CS) patients, who
   experience diarrhea and flushing, necessitating considerable
   modifications in daily activities and lifestyle^[125]21.
   Proinflammatory cytokines are known to increase in diarrhea-predominant
   irritable bowel syndrome patients, which could explain why
   cytokine-cytokine receptor interactions are occurring in expected
   pathways^[126]22. Ras-associated protein 1 (Rap1) is triggered by
   various stimuli in the Rap1 signaling pathway. It then recruits several
   effectors, resulting in its involvement in essential physiological
   processes such as integrin signalling and ERK activation^[127]23. Epac,
   a family of intracellular cAMP sensors, activates Rap1 by accelerating
   the conversion of GDP-Rap1 to GTP-Rap1. In contrast, active GTP-Rap1
   may have a role in the pathophysiology of secretory diarrhea via the
   RhoA-Rho-associated kinase (ROCK) pathway^[128]24. Like pathways, GO
   terms pathways include the cell population proliferation, RNA
   polymerase II cis-regulatory region sequence-specific DNA binding,
   nervous system development, and cis-regulatory region sequence-specific
   DNA binding, which are relevant to diarrhea.

   Based on the comparative analysis, we identified the common genes among
   the upregulated and downregulated DEGs (Fig. [129]2, [130]3). It was
   found that the highest number of three genes were common among the
   downregulated DEGs resulting from S. Typhimurium-E. coli and E. coli-C.
   jejuni infections (Fig. [131]2). Two PPI networks were built using the
   DEGs to display their relationship and identify the key disease
   modulators in diarrhea (supplementary Fig. 1–2). The
   centrality-lethality rule states that deleting a protein node that is
   highly connected (a "hub") is more likely to be fatal to an organism
   than deleting a node that is weakly connected (a "non-hub"). The
   centrality-lethality rule is commonly regarded to reflect the role of
   network design in defining network function since hubs are more
   important than non-hubs in organizing the global network structure. Hub
   proteins have eight or more interactions, whereas non-hub proteins have
   four or fewer interactions^[132]25. Because they have many interacting
   partners within a network, hub proteins are considered functionally
   significant^[133]26. We identified eighteen hub proteins (OLR1, JAZF1,
   ADAMTS1, PIK3R1, TIAM1, GCNT7, GNB5, CD44, GLI1, TAS2R14, HIST1H2BC,
   ADCY2, AAK1, RPS6, ADRA1D, RAB3IP, SYNRG, SEC31B) implicated in
   diarrheal pathogenesis by each of the four bacteria using various
   approaches (Table [134]2). TFs also influence the rate of
   transcription^[135]27, and miRNA is involved in RNA silencing and gene
   expression regulation at the post-transcriptional level^[136]28. As a
   result, both are necessary to comprehend the progression of a certain
   disease. We identified several TFs, such as NFIC, FOXC1, FOXL1, and
   ZNF345, which are known to be involved in DNA-binding transcription
   factor activity^[137]29–[138]32 (Fig. [139]4 and supplementary file 8).
   The remaining transcription factors, such as YY2, TFAP2A, GATA2, MEF2A,
   and SREBF1, are implicated in positive and negative regulation of the
   transcription of several target genes, branchiooculofacial syndrome
   (BOFS), development and proliferation of hematopoietic and endocrine
   cell lineages, muscle development, neuronal differentiation, cell
   growth control, and apoptosis as well as sterol
   biosynthesis^[140]33–[141]36. Moreover, YY1, GATA2, MEF2A, FOXC1, and
   SREBF1 were found to involve in Irritable Bowel Syndrome
   (IBS)^[142]37–[143]41. Mir-92a-3p, mir-122-5p, mir-143-3p, mir-106b-5p,
   and mir-6826-3p were found to be linked to lupus erythematosus^[144]42,
   lipoprotein metabolism^[145]43, acute ischemic stroke^[146]44, chronic
   thromboembolic pulmonary hypertension^[147]45, and neuronal function
   loss^[148]46 among the 24 miRNAs studied (supplementary file 9).
   Gastric, cervical, pancreas, lung, colon, colorectal, thyroid, ovarian,
   prostate, hepatocellular carcinoma, osteosarcoma, and testicular germ
   cell malignancies were all involved in the remaining 19
   miRNAs^[149]47–[150]50. Also, previous research showed that mir-15,
   miR-16, miR-125b, and mir-106b were involved in Irritable Bowel
   Syndrome (IBS) with diarrhea^[151]51,[152]52 while miR-143, miR-145,
   miR-21, miR-155, miR-21, miR-92a, miR-122, miR-17, mir-106a and
   mir-362-3p were found to involved in ulcerative colitis and other
   gastrointestinal tract diseases^[153]53–[154]59.

   Identifying common proteins and hub proteins in each of the four
   protein groups dysregulated by four bacterial infections prompted us to
   investigate the possibility of a functional relationship between them
   to understand their co-regulation better. We identified the domains
   inside each protein and used multiple up-to-date modeling tools to
   model their 3D structures before subjecting them to molecular docking
   utilizing an integrated procedure (Figs. [155]5, [156]6, Table [157]3
   and supplementary file 7). Among the protein complexes, we found a wide
   range of binding energies, hydrogen bonds, and salt bridges
   (supplementary table 1). So, in order to confirm the docking complexes'
   stability and get a better understanding of their possible interaction
   and co-regulation, we performed molecular dynamics simulations on all
   73 protein complexes and observed that just 20 of them are stable in
   the solvated state. This explains that only 20 of these 73 protein
   complexes might co-regulate (either upregulation or downregulation)
   during diarrhea by four bacterial pathogenesis. Following these
   investigations, we identified five proteins (RAPH1, GCNT7, ADAMTS1,
   GLI1, and SEC31B) that have a stable, functional relationship with the
   other hub proteins resulting from the pathogenesis of four bacteria
   (Fig. [158]6).

   The RMSD analysis revealed that all of the complexes were stable and
   showed reduced changes during the simulation period, based on the
   interpretation of post-MD data (Fig. [159]7A1, 7B1, 7C1). Except for
   the C2/Ibe-RPS6e and C2/IBe-ZDM complexes, measurements of changes in
   the minimum distances between the residues of interaction planes of the
   complexes during simulation revealed that the remaining complexes had a
   distance of less than 0.3 nm between the interacting residues
   (Fig. [160]7A2, 7B2, 7C2). The stability of the protein complexes was
   further validated by a sufficient number of H-bond estimations during
   MD simulation (Fig. [161]7A3, 7B3, 7C3). The gyration radius revealed
   that the protein complexes maintained a consistent level of compactness
   across time (Fig. [162]7A4, 7B4, 7C4). The SASA study revealed that the
   complexes obtained less volume as a result of their interaction, which
   could be the cause of protein functional alterations (Fig. [163]7A5,
   7B5, 7C5). The MM/PBSA results revealed that the majority of the
   proteins bind efficiently among themselves, as evidenced by good
   binding free energies (Table [164]4). In particular, the ADAMTS1
   protein exhibited the most efficient binding energies with the GCNT7,
   RPS6, and HIST1H2BC proteins, which are upregulated hub proteins
   resulting from pathogenesis by E. coli, C. jejuni and S. dysenteriae
   respectively, while the ADAMTS1 protein itself is upregulated by S.
   Typhimurium (Table [165]4). These analyses make ADAMTS1 the most
   plausible therapeutic target of all the five identified proteins for
   the development of common drugs against diarrhea.

   The biological function and activity of a cell are driven by switching
   on and off gene expression. Conversely, gene transcription is a
   facilitator of the pathogenic events that drive the evolution and
   progression of the disease, as well as directing the response to
   therapy. By comparing gene expression profiles under different disease
   conditions, individual genes or their corresponding protein products
   can be identified as therapeutic targets. Our study reports five such
   proteins, RAPH1, GCNT7, ADAMTS1, GLI1, and SEC31B, which are strong
   binding partners of other significant proteins and could thus be
   targeted for the discovery of a common medication system against
   diarrheal disease.

Material and methods

   A step-wise protocol consisting of two approaches was followed to
   identify the major druggable targets against the four diarrheal
   pathogens. The workflow is depicted in Fig. [166]1.

Retrieval of microarray data

   Gene Expression Omnibus (GEO) is an internationally acclaimed online
   database^[167]60 ([168]https://www.ncbi.nlm.nih.gov/geo/) by National
   Center for Biotechnology Information (NCBI) for high-throughput
   sequencing data, microarray, and hybridization array data. We
   downloaded four datasets from this database with accession numbers
   [169]GSE51043 ([170]GSM1236481-[171]GSM1236489)^[172]6 for S.
   Typhimurium, [173]GSE18810 ([174]GSM466514-[175]GSM466519)^[176]61
   forE. coli, [177]GSE19315 ([178]GSM479983-[179]GSM479991)^[180]62 for
   S. dysenteriae, and [181]GSE36701
   ([182]GSM899034-[183]GSM899254)^[184]63 for C. jejuni for this study.
   Further, the limma^[185]64 and DESeq2^[186]65 package of R was used to
   analyze these datasets. We used the False Discovery Rate (FDR) to find
   the dysregulated genes from the RNA-seq data analysis. The amount of
   gene expression between the control and sample was determined using the
   statistical criterion of log2fold change (bacteria-infected patients).
   The log2fold change parameters (2 and − 2) were adjusted to reflect the
   higher significant upregulated and downregulated genes, which indicate
   that the genes are dysregulated (upregulated and downregulated).
   Following that, we also employed a statistically significant P
   (probability) value (P ≤ 0.05) to identify genes that were dysregulated
   because a P value larger than 0.05 indicates that no difference between
   the control and the sample was seen.

Gene ontology (GO) and pathway enrichment analysis of differentially
expressed genes (DEG)

   The signaling pathways and gene ontologies associated with the up-and
   downregulated genes were predicted using different databases via the
   Enrichr enrichment analysis online tool^[187]66
   ([188]https://maayanlab.cloud/Enrichr/). For pathways, we considered
   KEGG^[189]67,[190]68 (2021), while biological process (2021) and
   molecular function (2021) were evaluated for gene ontologies. The
   significant pathways were filtered using the p-value with a cutoff
   score set to 0.05.

Identification of common gene expression signature in DEGs

   We identified the common dysregulated (both upregulated and
   downregulated) genes from four bacteria using the Cytoscape software
   v3.8^[191]69 the InteractiVenn tool
   ([192]http://www.interactivenn.net/). The name of the bacterial species
   was set as a node, and the DEGs were set as target nodes to generate
   the network. Further, the resulting network showed the common genes
   among the four species.

Analysis of protein–protein interaction (PPI) and identification of hub
protein networks

   Protein–protein interaction (PPI) of the DEGs was analyzed using the
   STRING database^[193]70 with a confidence score of ≥ 0.4. The organism
   was specified as H. sapiens, and the generated PPI network was
   visualized using the Cytoscape software. We also generated a PPI
   network of all the up-and downregulated DEGs of four bacteria
   separately and determined potential hubs within these networks by
   applying different local-based methods using the cytoHubba^[194]71
   plugin in Cytoscape v3.8. Based on the relationship between the node
   and its direct neighbor, the local method ranked the hub proteins. In
   total, four local rank methods were considered, i.e., maximal clique
   centrality (MCC), maximum neighborhood component (MNC), the density of
   maximum neighborhood component (DMNC), and degree method.

Identification of regulatory biomolecules among the common and hub genes

   Transcription factors (TFs) and microRNAs (miRNAs) are regulatory
   molecules responsible for significant changes in transcription and
   expression results. Therefore, we deployed experimentally verified
   JASPAR^[195]72 and miRTarbase v6.0^[196]73 datasets to anticipate
   TF–gene and miRNA–gene interactions via the NetworkAnalyst v3.0^[197]74
   web tool. Both networks were visualized with Cytoscape v3.8.

Identification and structural modeling of domains within the common and hub
proteins

   The distinct functional and structural units in a protein are domains
   responsible for a particular function contributing to the overall role
   of a protein^[198]75. The domains of analyzed significant hub proteins
   were predicted using NCBI’s CD-Search tool
   ([199]https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi))^[200]76.
   Further, the domains’ three-dimensional (3D) were searched and
   retrieved from the PDB database ([201]https://www.rcsb.org/). The
   structures that were not present in the database were modeled using
   MODELLER 9.22^[202]77 and the trROSETTA^[203]78 sever based on query
   coverage and further refined through the GalaxyRefine^[204]79 server.
   The best structure from MODELLER was chosen using the DOPE and GA341
   objective functions, where a higher GA341 and/or lower DOPE score
   indicates a high quality of a generated model. Further, the modeled
   structures were assessed using PROCHECK^[205]80 and ERRAT^[206]80 tools
   from SAVES 6.0 server and ProSA-web^[207]81 analysis program.

Assessment of binding interactions between protein domains

   Molecular docking analysis was performed to study the stable
   domain-domain interactions among all the proteins and related
   sub-cellular functions. The ClusPro 2.0: protein–protein docking
   server^[208]82 was used for this purpose. In this server, the PIPER
   docking program^[209]83 is used by the rigid body docking phase, which
   relies on the Fast Fourier Transform (FFT)^[210]84 correlation
   approach. PIPER depicts the interaction energy between two protein
   molecules using an expression of form E;
   E = w[1]E[rep] + w[2]E[attr] + w[3]E[elec] + w[4]E[DARS,] where, E[rep]
   and E[attr] represent the attractive and repulsive contributions to the
   van der Waals interaction energy, E elec is electrostatic energy, while
   E[DARS]^[211]85 is a pairwise structure-based potential that primarily
   represents desolvation contributions. The ClusPro 2.0 server can
   differentiate thousands of conformations of the protein on the basis of
   different desolvation and electrostatic potentials. Following the
   docking process, each complex with the least binding energy was
   submitted to the PDBsum^[212]86 server
   ([213]http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html)
   to view the residues involved in the interacting planes.

Molecular dynamics simulation and binding energy calculation

   Molecular dynamics simulation of the docked protein complexes was
   performed using GROMACS 5.1.4^[214]87,[215]88 version on Linux 5.4
   package. The GROMOS96 54a7^[216]89 was the selected force field as this
   parameter set has the enhanced capacity of the backbone NH and CO
   groups to form hydrogen bonds with each other resulting in reproducing
   the folding equilibria slightly better and sampling more 314-helical or
   hairpin conformations than the previous 53A6 or 45A3 force
   fields^[217]90. The protein complexes were solvated using simple point
   charge (SPC) water molecules in a rectangular box where the required
   number of Na + and Cl − ions were added to electrically neutral the
   simulation system. Upon setting the salt concentrations to 0.15 mol/L,
   the solvated systems were subjected to energy minimization for 5000
   steps using the steepest descent method. Afterward, an NVT (constant
   number of particles, volume, and temperature) ensemble and an NPT
   (constant number of particles, pressure, and temperature) ensemble were
   conducted at 300 K temperature and 1 atm for a duration of 100
   picoseconds (ps) to equilibrate the systems. Throughout the simulation,
   V‐rescale and Parrinello‐Rahman were selected as the thermostat and
   barostat, respectively. Finally, the production runs of all the protein
   complexes were performed at 300 K for a duration of 100 ns
   (nanoseconds) in a GPU (Graphics processing unit) accelerated
   supercomputing system that was provided by the Bioinformatics Division
   of the National Institute of Biotechnology (NIB), Bangladesh.
   Thereafter, in order to evaluate the stability of the complexes, root
   mean square deviation (RMSD), root mean square fluctuation (RMSF), a
   number of hydrogen bonds, the radius of gyration (Rg), atomic
   distances, and solvent accessible surface area (SASA) were analyzed and
   represented in the form of plots using the Qtgrace program.

   Further, to calculate the binding energies through the MM/PBSA
   (Molecular Mechanics/Poisson Boltzmann Surface Area) method, the
   g_mmpbsa^[218]91 package of GROMACS was used, followed by the final MD
   run to get a more detailed overview of the biomolecular interactions
   between the two domains in every protein complexes. The total ΔG[bind]
   of each protein–protein complex was determined from the free solvation
   energy (polar and nonpolar solvation energies) and potential energy
   (electrostatic and Van der Waals interactions). The binding energies
   were calculated using the following equation in this method:
   [MATH: <mrow><mi
   mathvariant="normal">Δ</mi><msub><mtext>G</mtext><mtext>binding</mtext>
   </msub><mo>=</mo><msub><mtext>G</mtext><mtext>complex</mtext></msub><mo
   >-</mo><mrow><mo
   stretchy="false">(</mo><msub><mtext>G</mtext><mtext>protein1</mtext></m
   sub><mo>+</mo><msub><mtext>G</mtext><mtext>protein2</mtext></msub><mo
   stretchy="false">)</mo></mrow><mo>.</mo></mrow> :MATH]

   Here, the ΔG[binding] = the total binding energy of the protein–protein
   complex, G[protein1] = the binding energy of the first protein, and
   G[protein2] = the binding energy of the second protein.

Ethics approval and consent to participate

   Not applicable.

Consent for publication

   Not applicable.

Conclusion

   The identification of therapeutic targets is critical for the
   development of novel medications to treat pathogen-related disorders.
   According to our findings, five crucial genes from the CESS are likely
   candidates for common drug discovery against the CESS. The
   pharmaceutical and scientific communities may be interested in this
   innovative method through differential gene expression for identifying
   targets for future therapeutic development research.

Supplementary Information

   [219]Supplementary Information 1.^ (2.9MB, docx)
   [220]Supplementary Information 2.^ (11.9KB, docx)
   [221]Supplementary Information 3.^ (246.2MB, zip)
   [222]Supplementary Information 4.^ (15.1KB, xlsx)
   [223]Supplementary Information 5.^ (17.7KB, xlsx)
   [224]Supplementary Information 6.^ (279.1KB, xlsx)
   [225]Supplementary Information 7.^ (69.4KB, xlsx)
   [226]Supplementary Information 8.^ (54.2KB, xlsx)
   [227]Supplementary Information 9.^ (4.9MB, pdf)
   [228]Supplementary Information 10.^ (15.3KB, xlsx)
   [229]Supplementary Information 11.^ (25.6KB, xlsx)

Acknowledgements