Abstract Some recent studies showed that severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections and idiopathic pulmonary fibrosis (IPF) disease might stimulate each other through the shared genes. Therefore, in this study, an attempt was made to explore common genomic biomarkers for SARS-CoV-2 infections and IPF disease highlighting their functions, pathways, regulators and associated drug molecules. At first, we identified 32 statistically significant common differentially expressed genes (cDEGs) between disease (SARS-CoV-2 and IPF) and control samples of RNA-Seq profiles by using a statistical r-package (edgeR). Then we detected 10 cDEGs (CXCR4, TNFAIP3, VCAM1, NLRP3, TNFAIP6, SELE, MX2, IRF4, UBD and CH25H) out of 32 as the common hub genes (cHubGs) by the protein–protein interaction (PPI) network analysis. The cHubGs regulatory network analysis detected few key TFs-proteins and miRNAs as the transcriptional and post-transcriptional regulators of cHubGs. The cDEGs-set enrichment analysis identified some crucial SARS-CoV-2 and IPF causing common molecular mechanisms including biological processes, molecular functions, cellular components and signaling pathways. Then, we suggested the cHubGs-guided top-ranked 10 candidate drug molecules (Tegobuvir, Nilotinib, Digoxin, Proscillaridin, Simeprevir, Sorafenib, Torin 2, Rapamycin, Vancomycin and Hesperidin) for the treatment against SARS-CoV-2 infections with IFP diseases as comorbidity. Finally, we investigated the resistance performance of our proposed drug molecules compare to the already published molecules, against the state-of-the-art alternatives publicly available top-ranked independent receptors by molecular docking analysis. Molecular docking results suggested that our proposed drug molecules would be more effective compare to the already published drug molecules. Thus, the findings of this study might be played a vital role for diagnosis and therapies of SARS-CoV-2 infections with IPF disease as comorbidity risk. Subject terms: Computational biology and bioinformatics, Drug discovery Introduction The severe acute respiratory syndrome corona virus 2 (SARS-CoV-2) is the main cause of COVID-19 pandemic that brings a major threat for human life as well as economy around the world^[42]1,[43]2. It was first detected from Wuhan town in China at the end of 2019 and rapidly spread all over the world with several symptoms like as cough, fever, and pneumonia diarrhea, severe respiratory diseases and become a complex and deadly health concern^[44]3,[45]4. The World Health Organization (WHO) announced it as a momentous pandemic of twenty-first century on March 11, 2020. It is highly homologous to the SARS coronavirus (SARS-CoV) which was responsible for the respiratory pandemic during the 2002–2003 period^[46]5,[47]6. Coronaviruses are single-stranded RNA viruses of ~ 30 kb. Based on their genomic structures, they are generally classified into four genera known as α, β, γ, and δ. The life cycle of SARS-CoV-2 with the host consists of the five steps classified as attachment, penetration, biosynthesis, maturation and release. The SARS-CoV-2 enters in to the host cells through the membrane fusion or endocytosis (penetration) after binding to the host receptor proteins (attachment). Once viral proteins (biosynthesis) including the major protease (MPro/3ClPro), the papain-like protease (PLpro) and the RNA-dependent RNA polymerase (RdRp) are released inside the host cells, viral RNA enters to the nucleus for replication. Thus, create new viral particles (maturation) and released. Coronaviruses consist of four structural proteins; Spike (S), membrane (M), envelop (E) and nucleocapsid (N)^[48]7. The spike (S) protein of SARS-CoV-2 interacts with the host ACE2 (angiotensin-converting enzyme 2) receptor protein to stimulate the infection^[49]8–[50]10. ACE2 is highly expressed in lung, heart, kidney, ileum and bladder^[51]11,[52]34. In the case of lung, ACE2 is highly expressed in lung epithelial cells which leads the interstitial lung damage including epithelial and endothelial injury with excessive fibroproliferation^[53]12. Until September 2022, there have been around 614,825,354 confirmed cases of COVID-19, including 6,536,284 million deaths^[54]13. Though different vaccination programs reduced the severity of SARS-CoV-2 infections worldwide, however, it is yet one of the most severe risk factor for some chronic diseases including idiopathic pulmonary fibrosis (IPF) disease, since they stimulate each other very much^[55]12,[56]14. The IPF disease is a chronic, progressive lung syndrome which leads to the respiratory collapse and decline the lung function^[57]12. The primary symptoms of IPF are dry cough and breathing complexity^[58]15. The average survival time of a patients suffering from IPF is approximately 3 years after the first diagnosis and therapies. Initially, IPF has been considered as a chronic inflammatory process^[59]16, but recent studies showed that abnormally activated alveolar epithelial cells (AECs) are the main factor responsible for the fibrotic response because they release cytokines that stimulate the fibroblasts^[60]17. IPF is typified by the progressive and fatal accumulation of fibroblasts and extracellular matrix (ECM) in the lung, leading to distortion of the lung architecture and reduction in lung function. So, Individuals with inflammatory lung disease are at a higher risk of death from COVID-19^[61]18,[62]19. Therefore, identification of SARS-CoV-2 and IPF diseases causing shared genes is required for diagnosis and therapies of COVID-19 patients with IPF disease as comorbidity. Taz et al.^[63]14 tried to identify shared genomic biomarkers for diagnosis and therapies of COVID-19 patients with IPF disease as comorbidity. They analyzed bulk RNA-Seq profiles for SARS-CoV-2 infections and microarray gene-expression profiles for IPF disease and found only 11 common differentially expressed genes (cDEGs) to separate both COVID-19 and IPF patients from control samples. They identified top-ranked 5 genes as common hub-genes (cHubGs) by protein–protein interaction (PPI) analysis, where only 2 cHubGs were detected from the cDEGs and the rest 3 cHubGs did not belongs to their cDEGs-set. Also, they did not examine their common differential expression patterns by from any other databases, which indicates that their cHubGs-set were not so representative of their cDEGs-set. Another drawback in their study was that they used microarray data instead of RNA-Seq data for identification of differentially expressed genes (DEGs) between IPF disease and control samples, though RNA-Seq data perform better than microarray data in identifying DEGs^[64]20. Therefore, in this study, an attempt was made to explore comparatively more representative and effective cHubGs-set for SARS-CoV-2 and IPF diseases from RNA-Seq profiles for diagnosis and therapies of COVID-19 patients with IPF disease as comorbidity by using the integrated bioinformatics analyses. The pipeline of this study is given in Fig. [65]1. Figure 1. [66]Figure 1 [67]Open in a new tab The pipeline of this study. Materials and methods Data sources and descriptions We used both original data and meta-data to reach the goal of this study as described below. Collection of RNA-Seq profiles for SARS-CoV-2 infections, IPF disease and control samples We collected RNA-Seq profiles for SARS-CoV-2 infections, IPF disease and control samples from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database ([68]http://www.ncbi.nlm.nih.gov/geo/) to explore both diseases causing common genes. The SARS-CoV-2 infected patient’s RNA-Seq profiles were downloaded from [69]GPL18573 platform of Illumina NextGen 500 (Homo sapiens) with the GEO accession numbers [70]GSE147507 published by Blanco-Melozet al.^[71]21. We collected 18 case and 29 control samples of RNA-Seq profiles from 3 cell lines (1) normal human bronchial epithelium, alveolar cells, transformed lung-derived Calu-3 cells of lung tissues for SARS-CoV-2 infections. On the hand, the IPF patient’s RNA-Seq profiles were downloaded from [72]GPL11154 platform of Illumina HiSeq 2000 (Homo sapiens) with the GEO accession numbers [73]GSE52463 published by Nance et al.^[74]22. This dataset contained 16 samples including 8 IPF and 8 control samples collected from lung tissues. Collection of meta-drug agents for exploring candidate drugs Beck et.al.^[75]23 suggested SARS-CoV-2 main protease (3CLpro)-guided top-ranked 90 drug molecules out of 3410 FDA approved anti-viral drugs for the treatment against COVID-19 by molecular docking analysis. In our study, we collected their suggested top-ranked 90 drug molecules as the meta-drug agents [see Table [76]S1(I)]. We also collected host transcriptome-guided 95 meta-drug agents that are recommended for COVID-19 or IPF diseases [see Table [77]S1(II)] in different published articles. Thus, we considered total 90 + 95 = 185 drugs to explore most probable candidate drug agents by molecular docking with our proposed receptors. Collection of independent meta-receptors for in-silico validation of the proposed drugs We selected the top-ranked 11 hub-genes (independent meta-receptors) as the independent meta-receptors associated with COVID-19 and IPF disease by reviewing 23 published articles [see Table [78]S1(III)] for in-silico validation of the proposed drug molecules by molecular docking. Identification of DEGs from RNA-Seq profiles by using edgeR To explore DEGs between case and control samples from RNA-Seq profiles [79]GSE147507 and [80]GSE52463 of NCBI-GEO database, we considered a popular statistical approach known as edgeR. In order to introduce edgeR, let [MATH: Xgi :MATH] denote the total number of read counts for gth gene (g = 1, 2, …, G) in the ith sample (i = 1, 2, …, n), which is assumed to be followed negative binomial (NB) distribution in the edgeR setting^[81]24,[82]25. That is, [MATH: XNBμgi,δg, :MATH] where the parameters are described by the mean and variance as [MATH: μgi=EXgi=Miπgi :MATH] and [MATH: VXgi=μgi(1+μgiδg) :MATH] , respectively. Here, [MATH: Mi :MATH] is the total number of short reads of RNA-Seq profiles, [MATH: πgi :MATH] denote the fraction of all cDNA fragments for gth gene in the ith sample so that [MATH: g=1Gπgi=1, :MATH] [MATH: δg :MATH] is the squired coefficient of variation of [MATH: πgi :MATH] based on the replicates i. The NB distribution convert to Poisson distribution when [MATH: δg=0 :MATH] . According to the generalized linear model (GLM) approach, the mean response, [MATH: μgi :MATH] , is linked to a linear predictor as [MATH: Log(μgi)=ziTβg+Log(Mi), :MATH] where z[i] is the ith column of the design matrix (Z) including the covariates (e.g. batch effects, experimental conditions, etc.), [MATH: βg :MATH] is a q × 1 vector of regression parameters for differential expression patterns. The regression vector [MATH: βg :MATH] is estimated by maximum likelihood estimation (MLE) and calculated iterative way as [MATH: βgnew=β< /mi>gold+(ZTDgZ)-1< msup>ZTug :MATH] where [MATH: Dg :MATH] is the diagonal matrix of working weights, [MATH: ugi=(xgi-μgi)/(1+δgμgi) :MATH] . The dispersion parameter [MATH: δg :MATH] is calculated as [MATH: δ^g=argmaxAPLgδg+τ.APLgSδg, :MATH] where [MATH: APLgδg=Lδg;Xg,β^g-12Log|Ig| :MATH] is the adjusted profile likelihood (APL) in terms of [MATH: δg :MATH] , penalized for the estimation of the regression parameters [MATH: βg :MATH] , [MATH: Xg :MATH] is the vector of counts for gene g, [MATH: β^g :MATH] is the estimated coefficient vector, L(·) is the log-likelihood function, [MATH: Ig :MATH] is the Fisher information matrix and |·| is the determinant, [MATH: τ :MATH] is the prior degree of freedom afforded to the shared likelihood and [MATH: APLgSδg= 1|C|kCAPLk(δg). :MATH] Then the edgeR approach allows us for testing the significance of any parameter or any contrast or linear combination of the parameters in the linear model. Gene-wise hypothesis testing are conducted by computing likelihood ratio (LRT) statistic [MATH: L(θg,0< /mn>|X)/L(θg|X) :MATH] to compare the null hypothesis (H[0]) that [MATH: βg=0 :MATH] (insignificant gene) against the two-sided alternative (H[1]), [MATH: β0 :MATH] (indicating DEGs), where [MATH: θg={β,δ} :MATH] and [MATH: θg,0=δ :MATH] . The log-LRT follows asymptotically chi-square distribution under H[0]. Adjusted P.values are computed to control the FDR (false discovery rate) by the Benjamini and Hochberg approach^[83]26. We implemented this algorithm to identify DEGs from our downloaded RNA-Seq count datasets [84]GSE147507 and [85]GSE52463 for SARS-CoV-2 and IPF diseases, respectively, by using edgeR, an R-package in Bioconductor^[86]25. To separate up and down-regulated DEGs, we for the combined data as follows [MATH: DEGg=Upregulated,ifadj.Pg.value<0.05 andLog2(aFC)g>+ 1.0Downregulated,ifadj.Pg.value<0.05 andLog2(aFC)g<- 1.0 :MATH] where [MATH: adj.Pg.value :MATH] is the adjusted P.value and [MATH: (aFC)g=X< mo>¯gcase/X¯gcontrol :MATH] : fold change on average expressions, for gth gene. Identification of SARS-CoV-2 and IPF diseases causing common DEGs (cDEGs) Let [MATH: CUR :MATH] and [MATH: CDR :MATH] indicates the upregulated and downregulated DEGs-sets respectively for Covid-19 patients. Again let [MATH: IUR :MATH] and [MATH: IDR :MATH] be the upregulated and downregulated DEGs-sets respectively, for IPF patients. Then, we defined common upregulated gene-set as [MATH: GUR=(CURIUR) :MATH] and common downregulated as [MATH: GDR=(CDRIDR) :MATH] . Finally, we considered common DEGs (cDEGs) set as [MATH: cDEG=(GURGDR) :MATH] that can differentiate both group of patients from the control samples. Therefore, in this study, we considered this cDEGs-set for further investigation. Protein–protein interaction (PPI) network analysis of cDEGs In order to explore both SARS-CoV-2 and IPF diseases causing common hub-genes (cHubGs), we performed protein–protein interaction (PPI) network analysis of cDEGs by using the STRING databases^[87]27. To build the PPI network, the distance "D" between the proteins (u, v) is computed as, [MATH: Du,v=2|NuNv|Nu+|Nv| :MATH] where [MATH: Nu :MATH] and [MATH: Nv :MATH] are the neighbor sets of u and v, respectively. To improve the quality of PPI networks, we used the Cytoscape web-tool^[88]28. The Cytoscape plugin cytoHubba was used to select the Hub-Genes (HubGs) or Hub-Proteins (HubPs) from PPI networks^[89]28,[90]29. The PPI network provides a number of nodes and edges, which indicates proteins and their interactions, respectively. The HubGs were selected from the PPI network by using five topological measures including Degree (Deg)^[91]30, BottleNeck (BN)^[92]31, Betweenness (BC)^[93]32, Stress (Str)^[94]33, and Closeness (Clo)^[95]34. Regulatory network analysis of cHubGs To explore key transcription factors (TFs) and micro-RNAs (miRNAs) as the transcriptional and post-transcriptional regulators of cHubGs, we performed TFs-cHubGs, miRNAs-cHubGs, and TFs-miRNAs-cHubGs interaction networks based on the JASPAR^[96]35, TarBase^[97]36 and RegNetwork^[98]37 databases, respectively by using the NetworkAnalyst web-tool^[99]38. The Cytoscape software^[100]28 was used to improve the quality of networks. The key regulators were selected by using two topological measures (Degree (Deg)^[101]30 and Betweenness (BC)^[102]32) of networks. GO terms and KEGG pathway enrichment analysis of cDEGs highlighting cHubGs To explore the pathogenetic processes of cDEGs, we performed enrichment analysis of cHubGs with different gene ontology (GO) terms (BPs: biological processes, MFs: molecular functions and CCs: cellular components)^[103]39 and Kyoto encyclopedia of genes and genomes (KEGG) pathways^[104]40. Here, BPs are different types of molecular activities that are essential for the functioning of integrated living entities including cells, tissues, and organs. MFs are different types of fundamental molecular activity of a gene product, including catalysis. CCs are different types of components of a cell or the extracellular space. The KEGG pathway database consists of a set of pathway maps that represent the molecular interaction and relationship networks for genetic information processing, metabolism, human diseases, and drug development. Let Si denote the reference/annotated gene-set in the ith GO-term or KEGG-pathway, Mi denote the total number genes in S[i] (i = 1, 2,…, r); N denote the total number of reference/annotated genes that construct the whole-set [MATH: S=i= 1rSi< mo>=SiS ic :MATH] such that [MATH: Ni= 1rMi< mo>; :MATH] where [MATH: Sic :MATH] is the complement set of S[i]. Again, let n represent the total number of cDEGs and ki represent the number of cDEGs that are a part of the annotated gene-set S[i.] To examine the enrichment of ith GO-term or KEGG-pathway by the cDEGs-set, the following contingency table (Table [105]1) is constructed. Table 1. [MATH: 2×2 :MATH] Contingency table. Annotated gene-set (Reference) cDEGs-set Not cDEGs-set Marginal total S[i] k[i] M[i] − k[i] M[i] [MATH: Sic :MATH] n − k[i] N − M[i] − n + k[i] N − M[i] Marginal total n N − n N (Grand total) [106]Open in a new tab We considered three web tools (Enrichr^[107]41, DAVID^[108]42,[109]43, and GeneCodis^[110]44) to explore significantly enriched GO-terms and KEGG pathways by using the chi-square ( [MATH: χ2) :MATH] or Fisher’s exact testing procedures. The [MATH: χ2 :MATH] -testing procedure based on [MATH: 2×2 :MATH] contingency table, is used in GeneCodis to calculate the p-values, while the Fisher’s exact testing procedure is used in both Enrichr and DAVID web-tools. Fisher’s exact testing procedure is formulated based on hypergeometric distribution. This distribution is used to estimate the probability of overlapping exactly k[i] cDEGs with the reference genes in the ith GO-term or pathway (S[i]) out of n cDEGs. Then the p value for testing the significance of k[i] cDEGs in ith GO-term or pathway is calculated as, [MATH: pi=1-j=0< /mrow>kiMijN-Min-jNn,i=1,2< mo>,,r. :MATH] The k[i] cDEGs in S[i] is considered as a significantly enriched gene-set if its adjusted p value (p[i]) < 0.05 at 5% level of significance. We adjusted the p values for each of three procedures by using the Benjamini and Hochberg procedure^[111]26. Association of cHubGs with other disease risk We performed diseases-cHubGs association analysis by using the Enrichr web tool^[112]45 with DisGeNET database^[113]46,[114]47 to explore other diseases that can increase the severity of COVID-19 and IPF both diseases through cHubGs. The DisGeNET database is a comprehensive discovery platform to address association of a gene-set with different disease risk^[115]47. The necessary data for this database has been collected from different online sources including UniProt, Comparative Toxicogenomics Database (CTD), Mouse Genome Database (MGD), Rat Genome Database (RGD) and, peer-reviewed publications on Genome Wide Association Studies (GWAS), Genetics Association Database (GAD), literature of human gene-disease networks (LHGDN), and the BeFree datasets. To measure the gene-disease association (GDA), the DisGeNET web-tool compute a score (S) by using the following formula. [MATH: S=(SUNIPORT+SCTDhuman)+(SMouse+SRat)+(SGDA+SLHGDNA+SBeFree) :MATH] where [MATH: SUNIPORT=0.3, iftheassociationhasbeendeclearedinUnipropt< mtr>0, otherwise< /mfenced> :MATH] [MATH: SCTD\,human=0.3if< /mtext>theassociationhasbeendeclearedinCTDhuman0, otherwise< /mfenced> :MATH] [MATH: SRat=0.3, iftheassociationhasbeendeclearedinCTDorRGDratdataset0, otherwise< /mfenced> :MATH] [MATH: SMouse=0.3, iftheassociationhasbeendeclearedinCTDorRGDmousedataset0, otherwise< /mfenced> :MATH] [MATH: SL=0.1, ifngd×100 NLmax< mtd columnalign="left">ngd×100 NL, ifngd×100 NL<ma x :MATH] and L: LHGDN, GAD, or BeFree, [MATH: ngd :MATH] is the number of publications reported a GDA in the source and [MATH: NL :MATH] is the total number of publications in the source. [MATH: max=0.080ifL=GAD0.06 ifL=LHGDNorBeFree :MATH] Obviously, the DisGeNET score (S) lies between 0 to 1. Prognostic performance of cHubGs To investigate the prognostic performance of cHubGs with lung cancer, we performed multivariate survival analysis of lung cancer patients based on the expressions of cHubGs by using SurvExpress web-tool^[116]48. The TCGA database ([117]https://tcga-data.nci.nih.gov) is used in SurvExpress web-tool for survival analysis with a gene-set. In this analysis, patient samples were divided in to low-risk and high-risk groups by using the median of risk-scores (RSs) which is defined by the linear component of the Cox regression model. That is, [MATH: RS=β1X1+β2X2++βnXn :MATH] , where [MATH: Xi :MATH] is the expression of ith gene, [MATH: βi :MATH] is calculated by using the Cox regression approach^[118]49. Generally, a patient whose risk-score (RS) is greater than median of RSs was considered in the high-risk group, otherwise, the patient belongs to the low-risk group. Then the Kaplan–Meier survival plot for each risk group is constructed for each risk group. The significant difference between two risk groups was investigated by using the concordance indexes (CI), hazard-ratio (HR) and log-rank test^[119]50. The significance level was set to p value < 0.05. Drug repurposing by molecular docking To explore cHubGs-guided FDA approved repurposable drug molecules for the treatment against SARS-CoV-2 infections in presence of IPF risk, we employed molecular docking analysis of cHubGs and its TFs with 185 drug agents as introduced previously in the data sources and descriptions section [see Table [120]S1(I, II)]. The molecular docking analysis requires 3-Dimensional (3D) structures of both receptor proteins and drug agents/ligands. We downloaded 3D structure of all targeted proteins from Protein Data Bank (PDB)^[121]51, AlphaFold Protein Structure Database^[122]52 and SWISS-MODEL^[123]53. The 3D structures of all drug agents were downloaded from PubChem database^[124]54. The 3D structures of the target proteins were visualized by using the Discovery Studio Visualizer 2019^[125]55. Further, the receptor protein was prepared by removing ligand heteroatoms and water molecules and by addition of polar hydrogens on AutoDock tools 1.5.7^[126]56. The drug agents/ligands were prepared by setting the torsion tree and rotatable, nonrotatable bonds present in the ligand through AutoDock tools 1.5.7. Then, pairwise binding affinities between the target proteins and drug agents were calculated using the AutoDock Vina^[127]57. The exhaustiveness parameter was set to 10. Discovery Studio Visualizer 2019^[128]55 and PyMol^[129]58 were used to analyze the docked complexes for its surfaces, types and distances of non-covalent bonds. Let B[ij] is the binding affinity (BA) score corresponding to the ith receptor (i = 1, 2, … , p) and jth drug (j = 1, 2, … , q). The receptors and drug agents were ordered by the decreasing order of their average BA scores for selecting the top-ordered potential drug agents. We compared the binding performance of our suggested drugs with previously suggested candidate drugs by Taz et al.^[130]14 by molecular docking analysis against the Taz et al., suggested receptors as well as top-ranked independent receptors. Finally, we looked into the effectiveness of our suggested drugs with randomly selected independent receptor proteins that were not associated with SARS-CoV-2 infections in the presence of IPF risk. Results Identification of cDEGs The dataset [131]GSE147507 was analyzed by using the edgeR r-package to identify DEGs between SARS-CoV-2 infections and control samples. A total of 851 DEGs were identified by satisfying the cutoff criteria of adjusted p value < 0.05 and |log[2](aFC)|> 1, where 712 DEGs are upregulated and 139 DEGs are downregulated. The [132]GSE52463 dataset was analyzed to identify DEGs between IPF diseases and control samples. A total of 668 DEGs were found according to the same criteria, where 391 DEGs are upregulated and 277 are downregulated. Then we commonly found 27 upregulated cDEGs {TNFAIP3, SELE, MX2, PTX3, CH25H, UBASH3A, EREG, BIRC3, ZBP1, NLRP3, LIF, PRDM1, ADM, VCAM1, FOSL1, CXCR4, CCL22, IRF4, SLC5A5, SYTL3, ADAMTS4, UBD, CCL17, CPNE5, TNFAIP6, IKZF3, TNIP3} and 5 downregulated cDEGs {MVD, KIF12, RAAG1GAP, SLC27A3, TMEM160} that can separate both COVID-19 and IPF patients from the control samples (see Fig. [133]2). Figure 2. Figure 2 [134]Open in a new tab Venn diagrams for visualizing the upregulated and downregulated cDEGs that can separate both COVID-19 and IPF patients from the control samples. PPI network analysis of cDEGs for identification of cHubGs The PPI network of cDEGs was constructed using STRING database (Fig. [135]3) which contains 27 nodes and 115 edges. We selected top ranked 10 cHubGs {CXCR4, TNFAIP3, VCAM1, NLRP3, TNFAIP6, SELE, MX2, IRF4, UBD, CH25H} by applying five topological measures (Deg, BN, BC, Str and Clo) in the PPI network (see Table [136]S2). Figure 3. [137]Figure 3 [138]Open in a new tab Protein–protein interaction (PPI) network of cDEGs for COVID-19 and IPF disease to identify cHubGs. The red color nodes indicate the cHubGs. The regulatory network analysis of cHubGs The cHubGs-TFs interaction network detected top-ranked two sets of TFs proteins {JUN, SRF and NFKB1} and {JUN, SPI1 and NFKB1} based on JASPAR and RegNetwork databases respectively, as the key transcriptional regulatory factors for cHubGs, where two TFs proteins JUN and NFKB1were common in both-sets (see Fig. [139]4B,C). Based on two databases, we observed that JUN is a regulator of 7 cHubGs (TNFAIP3, TNFAIP6, IRF4, SELE, TNFAIP6, NLRP3, VAM1) and NFKB1 for 6 cHubGs (TNFAIP3, IRF4, CXCR4,UBD, SELE, VAM1). The cHubGs-miRNAs interaction network detected top-ranked two sets of miRNAs {hsa-mir-155-5p, hsa-mir-27a-5p, hsa-mir-107, hsa-mir-21-3p, hsa-mir-129-2-3p} and { hsa-mir-155-5p, hsa-mir-154, hsa-mir-613, hsa-mir-21-3p, hsa-mir-132} based on TarBase and RegNetwork databases respectively, as the key post-transcriptional regulatory factors for cHubGs, where two miRNAs hsa-mir-155-5p and hsa-mir-21-3p were common in both-sets (see Figs. [140]4A,C). Based on two databases, we observed that hsa-mir-155-5p regulates 6 cHubGs {IRF4, VAM1, TNFAIP3, TNFAIP6, SELE, CXCR4} and hsa-mir-21-3p regulates 5 cHubGs {VAM1, TNFAIP3, TNFAIP6, MX2, UBD}. Figure 4. [141]Figure 4 [142]Open in a new tab The regulatory network of cHubGs with TFs and miRNAs. The red, light blue and green color nodes represent the cHubGs, TFs and miRNAs. The names of cHubGs, key regulatory TFs and miRNAs were displayed only. (A) The cHubGs-TFs interaction network based on JASPAR database. (B) The miRNA-cHubGs interaction network based on TarBase database. (C) The cHubGs-TFs-miRNAs interaction network based on RegNetwork database. GO functions and KEGG pathway enrichment analysis of cDEGs highlighting cHubGs The Table [143]2 displayed the top five commonly enriched GO terms (BPs, MFs and CCs) and KEGG pathways by cDEGs with three web-tools and databases Enrichr^[144]41, DAVID^[145]42,[146]43 and GeneCodis^[147]44 (p value < 0.001). Among the top five GO terms of BPs, four BPs including inflammatory response, regulation of inflammatory response, response to tumor necrosis factor and response to cytokine were enriched by the cHubGs-sets with all of three databases. The other BP-term (response to virus) was enriched by the cHubGs-sets with each of two databases (DAVID^[148]42,[149]43 and GeneCodis^[150]44). Three MFs (sequence-specific DNA binding, CCR chemokine receptor binding and chemokine activity) out of four, were commonly enriched by the cHubGs-sets with all of three databases. The other MF-term (ubiquitin binding^[151]59) was enriched by the cHubGs-sets with each of two databases (Enrichr^[152]41 and GeneCodis^[153]44). Among the top five significantly enriched CCs terms, four terms including early endosome, extracellular space, cytosol and extracellular region were commonly enriched by the cHubGs-sets with two databases out of 3. Only one CC-term (tertiary granule lumen) was enriched by the cHubGs-sets with all of 3 databases. Interestingly, all of the top five KEGG pathways (TNF signaling pathway, IL-17 signaling pathway, NF-kappa B signaling pathway, NOD-like receptor signaling and Cytokine-cytokine receptor interaction) were commonly enriched by the cHubGs-sets with all of 3 databases. Table 2. The top five commonly enriched GO terms (BPs, MFs and CCs) and KEGG pathways by cDEGs with three web-tools and databases Enrichr^[154]41, DAVID^[155]42,[156]43 and GeneCodis^[157]44 (p value < 0.001). The cHubGs were highlighted by bold gene names. Some references were given