Abstract

   Some recent studies showed that severe acute respiratory syndrome
   coronavirus 2 (SARS-CoV-2) infections and idiopathic pulmonary fibrosis
   (IPF) disease might stimulate each other through the shared genes.
   Therefore, in this study, an attempt was made to explore common genomic
   biomarkers for SARS-CoV-2 infections and IPF disease highlighting their
   functions, pathways, regulators and associated drug molecules. At
   first, we identified 32 statistically significant common differentially
   expressed genes (cDEGs) between disease (SARS-CoV-2 and IPF) and
   control samples of RNA-Seq profiles by using a statistical r-package
   (edgeR). Then we detected 10 cDEGs (CXCR4, TNFAIP3, VCAM1, NLRP3,
   TNFAIP6, SELE, MX2, IRF4, UBD and CH25H) out of 32 as the common hub
   genes (cHubGs) by the protein–protein interaction (PPI) network
   analysis. The cHubGs regulatory network analysis detected few key
   TFs-proteins and miRNAs as the transcriptional and post-transcriptional
   regulators of cHubGs. The cDEGs-set enrichment analysis identified some
   crucial SARS-CoV-2 and IPF causing common molecular mechanisms
   including biological processes, molecular functions, cellular
   components and signaling pathways. Then, we suggested the cHubGs-guided
   top-ranked 10 candidate drug molecules (Tegobuvir, Nilotinib, Digoxin,
   Proscillaridin, Simeprevir, Sorafenib, Torin 2, Rapamycin, Vancomycin
   and Hesperidin) for the treatment against SARS-CoV-2 infections with
   IFP diseases as comorbidity. Finally, we investigated the resistance
   performance of our proposed drug molecules compare to the already
   published molecules, against the state-of-the-art alternatives publicly
   available top-ranked independent receptors by molecular docking
   analysis. Molecular docking results suggested that our proposed drug
   molecules would be more effective compare to the already published drug
   molecules. Thus, the findings of this study might be played a vital
   role for diagnosis and therapies of SARS-CoV-2 infections with IPF
   disease as comorbidity risk.

   Subject terms: Computational biology and bioinformatics, Drug discovery

Introduction

   The severe acute respiratory syndrome corona virus 2 (SARS-CoV-2) is
   the main cause of COVID-19 pandemic that brings a major threat for
   human life as well as economy around the world^[42]1,[43]2. It was
   first detected from Wuhan town in China at the end of 2019 and rapidly
   spread all over the world with several symptoms like as cough, fever,
   and pneumonia diarrhea, severe respiratory diseases and become a
   complex and deadly health concern^[44]3,[45]4. The World Health
   Organization (WHO) announced it as a momentous pandemic of twenty-first
   century on March 11, 2020. It is highly homologous to the SARS
   coronavirus (SARS-CoV) which was responsible for the respiratory
   pandemic during the 2002–2003 period^[46]5,[47]6. Coronaviruses are
   single-stranded RNA viruses of ~ 30 kb. Based on their genomic
   structures, they are generally classified into four genera known as α,
   β, γ, and δ. The life cycle of SARS-CoV-2 with the host consists of the
   five steps classified as attachment, penetration, biosynthesis,
   maturation and release. The SARS-CoV-2 enters in to the host cells
   through the membrane fusion or endocytosis (penetration) after binding
   to the host receptor proteins (attachment). Once viral proteins
   (biosynthesis) including the major protease (MPro/3ClPro), the
   papain-like protease (PLpro) and the RNA-dependent RNA polymerase
   (RdRp) are released inside the host cells, viral RNA enters to the
   nucleus for replication. Thus, create new viral particles (maturation)
   and released. Coronaviruses consist of four structural proteins; Spike
   (S), membrane (M), envelop (E) and nucleocapsid (N)^[48]7. The spike
   (S) protein of SARS-CoV-2 interacts with the host ACE2
   (angiotensin-converting enzyme 2) receptor protein to stimulate the
   infection^[49]8–[50]10. ACE2 is highly expressed in lung, heart,
   kidney, ileum and bladder^[51]11,[52]34. In the case of lung, ACE2 is
   highly expressed in lung epithelial cells which leads the interstitial
   lung damage including epithelial and endothelial injury with excessive
   fibroproliferation^[53]12. Until September 2022, there have been around
   614,825,354 confirmed cases of COVID-19, including 6,536,284
   million deaths^[54]13. Though different vaccination programs reduced
   the severity of SARS-CoV-2 infections worldwide, however, it is yet one
   of the most severe risk factor for some chronic diseases including
   idiopathic pulmonary fibrosis (IPF) disease, since they stimulate each
   other very much^[55]12,[56]14. The IPF disease is a chronic,
   progressive lung syndrome which leads to the respiratory collapse and
   decline the lung function^[57]12. The primary symptoms of IPF are dry
   cough and breathing complexity^[58]15. The average survival time of a
   patients suffering from IPF is approximately 3 years after the first
   diagnosis and therapies. Initially, IPF has been considered as a
   chronic inflammatory process^[59]16, but recent studies showed that
   abnormally activated alveolar epithelial cells (AECs) are the main
   factor responsible for the fibrotic response because they release
   cytokines that stimulate the fibroblasts^[60]17. IPF is typified by the
   progressive and fatal accumulation of fibroblasts and extracellular
   matrix (ECM) in the lung, leading to distortion of the lung
   architecture and reduction in lung function. So, Individuals with
   inflammatory lung disease are at a higher risk of death from
   COVID-19^[61]18,[62]19. Therefore, identification of SARS-CoV-2 and IPF
   diseases causing shared genes is required for diagnosis and therapies
   of COVID-19 patients with IPF disease as comorbidity.

   Taz et al.^[63]14 tried to identify shared genomic biomarkers for
   diagnosis and therapies of COVID-19 patients with IPF disease as
   comorbidity. They analyzed bulk RNA-Seq profiles for SARS-CoV-2
   infections and microarray gene-expression profiles for IPF disease and
   found only 11 common differentially expressed genes (cDEGs) to separate
   both COVID-19 and IPF patients from control samples. They identified
   top-ranked 5 genes as common hub-genes (cHubGs) by protein–protein
   interaction (PPI) analysis, where only 2 cHubGs were detected from the
   cDEGs and the rest 3 cHubGs did not belongs to their cDEGs-set. Also,
   they did not examine their common differential expression patterns by
   from any other databases, which indicates that their cHubGs-set were
   not so representative of their cDEGs-set. Another drawback in their
   study was that they used microarray data instead of RNA-Seq data for
   identification of differentially expressed genes (DEGs) between IPF
   disease and control samples, though RNA-Seq data perform better than
   microarray data in identifying DEGs^[64]20. Therefore, in this study,
   an attempt was made to explore comparatively more representative and
   effective cHubGs-set for SARS-CoV-2 and IPF diseases from RNA-Seq
   profiles for diagnosis and therapies of COVID-19 patients with IPF
   disease as comorbidity by using the integrated bioinformatics analyses.
   The pipeline of this study is given in Fig. [65]1.

Figure 1.

   [66]Figure 1
   [67]Open in a new tab

   The pipeline of this study.

Materials and methods

Data sources and descriptions

   We used both original data and meta-data to reach the goal of this
   study as described below.

Collection of RNA-Seq profiles for SARS-CoV-2 infections, IPF disease and
control samples

   We collected RNA-Seq profiles for SARS-CoV-2 infections, IPF disease
   and control samples from the National Center for Biotechnology
   Information (NCBI) Gene Expression Omnibus (GEO) database
   ([68]http://www.ncbi.nlm.nih.gov/geo/) to explore both diseases causing
   common genes. The SARS-CoV-2 infected patient’s RNA-Seq profiles were
   downloaded from [69]GPL18573 platform of Illumina NextGen 500 (Homo
   sapiens) with the GEO accession numbers [70]GSE147507 published by
   Blanco-Melozet al.^[71]21. We collected 18 case and 29 control samples
   of RNA-Seq profiles from 3 cell lines (1) normal human bronchial
   epithelium, alveolar cells, transformed lung-derived Calu-3 cells of
   lung tissues for SARS-CoV-2 infections. On the hand, the IPF patient’s
   RNA-Seq profiles were downloaded from [72]GPL11154 platform of Illumina
   HiSeq 2000 (Homo sapiens) with the GEO accession numbers [73]GSE52463
   published by Nance et al.^[74]22. This dataset contained 16 samples
   including 8 IPF and 8 control samples collected from lung tissues.

Collection of meta-drug agents for exploring candidate drugs

   Beck et.al.^[75]23 suggested SARS-CoV-2 main protease (3CLpro)-guided
   top-ranked 90 drug molecules out of 3410 FDA approved anti-viral drugs
   for the treatment against COVID-19 by molecular docking analysis. In
   our study, we collected their suggested top-ranked 90 drug molecules as
   the meta-drug agents [see Table [76]S1(I)]. We also collected host
   transcriptome-guided 95 meta-drug agents that are recommended for
   COVID-19 or IPF diseases [see Table [77]S1(II)] in different published
   articles. Thus, we considered total 90 + 95 = 185 drugs to explore most
   probable candidate drug agents by molecular docking with our proposed
   receptors.

Collection of independent meta-receptors for in-silico validation of the
proposed drugs

   We selected the top-ranked 11 hub-genes (independent meta-receptors) as
   the independent meta-receptors associated with COVID-19 and IPF disease
   by reviewing 23 published articles [see Table [78]S1(III)] for
   in-silico validation of the proposed drug molecules by molecular
   docking.

Identification of DEGs from RNA-Seq profiles by using edgeR

   To explore DEGs between case and control samples from RNA-Seq profiles
   [79]GSE147507 and [80]GSE52463 of NCBI-GEO database, we considered a
   popular statistical approach known as edgeR. In order to introduce
   edgeR, let
   [MATH: <msub><mi>X</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub> :MATH]
   denote the total number of read counts for gth gene (g = 1, 2, …, G) in
   the ith sample (i = 1, 2, …, n), which is assumed to be followed
   negative binomial (NB) distribution in the edgeR setting^[81]24,[82]25.
   That is,
   [MATH: <mrow><mi>X</mi><mo>∼</mo><mi>N</mi><mi>B</mi><mfenced close=")"
   open="("><msub><mi>μ</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub><mo>,</mo><msub><mi>δ</mi><mi
   >g</mi></msub></mfenced><mo>,</mo></mrow> :MATH]
   where the parameters are described by the mean and variance as
   [MATH: <mrow><msub><mi>μ</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub><mo>=</mo><mi>E</mi><mfenced
   close=")" open="("><msub><mi>X</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub></mfenced><mo>=</mo><msub><mi
   >M</mi><mi>i</mi></msub><msub><mi>π</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub></mrow> :MATH]
   and
   [MATH: <mrow><mi>V</mi><mfenced close=")"
   open="("><msub><mi>X</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub></mfenced><mo>=</mo><msub><mi
   >μ</mi><mrow><mi mathvariant="italic">gi</mi></mrow></msub><mrow><mo
   stretchy="false">(</mo><mn>1</mn><mo>+</mo><msub><mi>μ</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub><msub><mi>δ</mi><mi>g</mi></m
   sub><mo stretchy="false">)</mo></mrow></mrow> :MATH]
   , respectively. Here,
   [MATH: <msub><mi>M</mi><mi>i</mi></msub> :MATH]
   is the total number of short reads of RNA-Seq profiles,
   [MATH: <msub><mi>π</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub> :MATH]
   denote the fraction of all cDNA fragments for gth gene in the ith
   sample so that
   [MATH:
   <mrow><msubsup><mo>∑</mo><mrow><mi>g</mi><mo>=</mo><mn>1</mn></mrow><mi
   >G</mi></msubsup><msub><mi>π</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub><mo>=</mo><mn>1</mn><mo>,</mo
   ></mrow> :MATH]
   [MATH: <msub><mi>δ</mi><mi>g</mi></msub> :MATH]
   is the squired coefficient of variation of
   [MATH: <msub><mi>π</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub> :MATH]
   based on the replicates i. The NB distribution convert to Poisson
   distribution when
   [MATH:
   <mrow><msub><mi>δ</mi><mi>g</mi></msub><mo>=</mo><mn>0</mn></mrow>
   :MATH]
   . According to the generalized linear model (GLM) approach, the mean
   response,
   [MATH: <msub><mi>μ</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub> :MATH]
   , is linked to a linear predictor as
   [MATH: <mrow><msub><mrow><mi>L</mi><mi>o</mi><mi>g</mi><mo
   stretchy="false">(</mo><mi>μ</mi></mrow><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub><mrow><mo
   stretchy="false">)</mo><mo>=</mo><msub><mrow><msubsup><mrow><mi
   mathvariant="bold-italic">z</mi></mrow><mrow><mi>i</mi></mrow><mi>T</mi
   ></msubsup><mrow><mi
   mathvariant="bold-italic">β</mi></mrow></mrow><mi>g</mi></msub><msub><m
   row><mo>+</mo><mi>L</mi><mi>o</mi><mi>g</mi><mo
   stretchy="false">(</mo><mi>M</mi></mrow><mi>i</mi></msub><mo
   stretchy="false">)</mo></mrow><mo>,</mo></mrow> :MATH]
   where z[i] is the ith column of the design matrix (Z) including the
   covariates (e.g. batch effects, experimental conditions, etc.),
   [MATH: <msub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mi>g</mi></msub> :MATH]
   is a q × 1 vector of regression parameters for differential expression
   patterns. The regression vector
   [MATH: <msub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mi>g</mi></msub> :MATH]
   is estimated by maximum likelihood estimation (MLE) and calculated
   iterative way as
   [MATH: <mrow><msubsup><mi>β</mi><mrow><mi>g</mi></mrow><mrow><mi
   mathvariant="italic">new</mi></mrow></msubsup><mo>=</mo><msubsup><mi>β<
   /mi><mrow><mi>g</mi></mrow><mrow><mi
   mathvariant="italic">old</mi></mrow></msubsup><mo>+</mo><msup><mrow><mo
   stretchy="false">(</mo><msup><mrow><mi>Z</mi></mrow><mi>T</mi></msup><m
   sub><mi>D</mi><mi>g</mi></msub><mi>Z</mi><mo
   stretchy="false">)</mo></mrow><mrow><mo>-</mo><mn>1</mn></mrow></msup><
   msup><mrow><mi>Z</mi></mrow><mi>T</mi></msup><msub><mi>u</mi><mi>g</mi>
   </msub></mrow> :MATH]

   where
   [MATH: <msub><mi>D</mi><mi>g</mi></msub> :MATH]
   is the diagonal matrix of working weights,
   [MATH: <mrow><msub><mi>u</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub><mo>=</mo><mrow><mo
   stretchy="false">(</mo><msub><mi>x</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub><mo>-</mo><msub><mi>μ</mi><mr
   ow><mi mathvariant="italic">gi</mi></mrow></msub><mo
   stretchy="false">)</mo></mrow><mo stretchy="false">/</mo><mrow><mo
   stretchy="false">(</mo><mn>1</mn><mo>+</mo><msub><mi>δ</mi><mi>g</mi></
   msub><msub><mi>μ</mi><mrow><mi
   mathvariant="italic">gi</mi></mrow></msub><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   . The dispersion parameter
   [MATH: <msub><mi>δ</mi><mi>g</mi></msub> :MATH]
   is calculated as
   [MATH: <mrow><msub><mover accent="true"><mi>δ</mi><mo
   stretchy="true">^</mo></mover><mi>g</mi></msub><mo>=</mo><mi
   mathvariant="normal">argmax</mi><mfenced close="}" open="{"><msub><mi
   mathvariant="normal">APL</mi><mi>g</mi></msub><mfenced close=")"
   open="("><msub><mi
   mathvariant="normal">δ</mi><mi>g</mi></msub></mfenced><mo>+</mo><mi>τ</
   mi><mo>.</mo><msubsup><mi
   mathvariant="normal">APL</mi><mrow><mi>g</mi></mrow><mi>S</mi></msubsup
   ><mfenced close=")" open="("><msub><mi
   mathvariant="normal">δ</mi><mi>g</mi></msub></mfenced></mfenced><mo>,</
   mo></mrow> :MATH]

   where
   [MATH: <mrow><msub><mi
   mathvariant="normal">APL</mi><mi>g</mi></msub><mfenced close=")"
   open="("><msub><mi
   mathvariant="normal">δ</mi><mi>g</mi></msub></mfenced><mo>=</mo><mi>L</
   mi><mfenced close=")" open="("><msub><mi
   mathvariant="normal">δ</mi><mi>g</mi></msub><mo>;</mo><msub><mi
   mathvariant="normal">X</mi><mi>g</mi></msub><mo>,</mo><msub><mover
   accent="true"><mi>β</mi><mo
   stretchy="true">^</mo></mover><mi>g</mi></msub></mfenced><mo>-</mo><mfr
   ac><mn>1</mn><mn>2</mn></mfrac><mi>L</mi><mi>o</mi><mi>g</mi><mrow><mo
   stretchy="false">|</mo><msub><mi>I</mi><mi>g</mi></msub><mo
   stretchy="false">|</mo></mrow></mrow> :MATH]

   is the adjusted profile likelihood (APL) in terms of
   [MATH: <msub><mi>δ</mi><mi>g</mi></msub> :MATH]
   , penalized for the estimation of the regression parameters
   [MATH: <msub><mi>β</mi><mi>g</mi></msub> :MATH]
   ,
   [MATH: <msub><mi mathvariant="normal">X</mi><mi>g</mi></msub> :MATH]
   is the vector of counts for gene g,
   [MATH: <msub><mover accent="true"><mi>β</mi><mo
   stretchy="true">^</mo></mover><mi>g</mi></msub> :MATH]
   is the estimated coefficient vector, L(·) is the log-likelihood
   function,
   [MATH: <msub><mi>I</mi><mi>g</mi></msub> :MATH]
   is the Fisher information matrix and |·| is the determinant,
   [MATH: <mi>τ</mi> :MATH]
   is the prior degree of freedom afforded to the shared likelihood and
   [MATH: <mrow><msubsup><mi
   mathvariant="normal">APL</mi><mrow><mi>g</mi></mrow><mi>S</mi></msubsup
   ><mfenced close=")" open="("><msub><mi
   mathvariant="normal">δ</mi><mi>g</mi></msub></mfenced><mo>=</mo><mfrac>
   <mn>1</mn><mrow><mo stretchy="false">|</mo><mi>C</mi><mo
   stretchy="false">|</mo></mrow></mfrac><munder><mo>∑</mo><mrow><mi>k</mi
   ><mo>∈</mo><mi>C</mi></mrow></munder><msub><mi
   mathvariant="normal">APL</mi><mi>k</mi></msub><mrow><mo
   stretchy="false">(</mo><msub><mi
   mathvariant="normal">δ</mi><mi>g</mi></msub><mo
   stretchy="false">)</mo></mrow><mo>.</mo></mrow> :MATH]

   Then the edgeR approach allows us for testing the significance of any
   parameter or any contrast or linear combination of the parameters in
   the linear model. Gene-wise hypothesis testing are conducted by
   computing likelihood ratio (LRT) statistic
   [MATH: <mrow><mi>L</mi><mrow><mo
   stretchy="false">(</mo><msub><mi>θ</mi><mrow><mi>g</mi><mo>,</mo><mn>0<
   /mn></mrow></msub><mo stretchy="false">|</mo><mi>X</mi><mo
   stretchy="false">)</mo></mrow><mo
   stretchy="false">/</mo><mi>L</mi><mrow><mo
   stretchy="false">(</mo><msub><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mi>g</mi></msub><mo
   stretchy="false">|</mo><mi>X</mi><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   to compare the null hypothesis (H[0]) that
   [MATH: <mrow><msub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mo>=</mo><mn>0</mn></mro
   w> :MATH]
   (insignificant gene) against the two-sided alternative (H[1]),
   [MATH: <mrow><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mo>≠</mo><mn>0</mn></mrow>
   :MATH]
   (indicating DEGs), where
   [MATH: <mrow><msub><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mi>g</mi></msub><mo>=</mo><mrow
   ><mo stretchy="false">{</mo><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mo>,</mo><mrow></mrow><mi>δ</mi
   ><mo stretchy="false">}</mo></mrow></mrow> :MATH]
   and
   [MATH:
   <mrow><msub><mi>θ</mi><mrow><mi>g</mi><mo>,</mo><mn>0</mn></mrow></msub
   ><mo>=</mo><mi>δ</mi></mrow> :MATH]
   . The log-LRT follows asymptotically chi-square distribution under
   H[0]. Adjusted P.values are computed to control the FDR (false
   discovery rate) by the Benjamini and Hochberg approach^[83]26. We
   implemented this algorithm to identify DEGs from our downloaded RNA-Seq
   count datasets [84]GSE147507 and [85]GSE52463 for SARS-CoV-2 and IPF
   diseases, respectively, by using edgeR, an R-package in
   Bioconductor^[86]25. To separate up and down-regulated DEGs, we for the
   combined data as follows
   [MATH:
   <mrow><msub><mtext>DEG</mtext><mi>g</mi></msub><mo>=</mo><mfenced
   open="{"><mrow><mtable><mtr><mtd
   columnalign="left"><mrow><mtext>Upregulated</mtext><mo>,</mo></mrow></m
   td><mtd><mrow><mtext>if</mtext><mspace
   width="0.166667em"></mspace><mtext>adj</mtext><mo>.</mo><msub><mi>P</mi
   ><mi>g</mi></msub><mo>.</mo><mtext>value</mtext><mo><</mo><mn>0.05</mn>
   <mspace width="0.166667em"></mspace><mtext>and</mtext><mspace
   width="0.166667em"></mspace><msub><mtext>Log</mtext><mn>2</mn></msub><m
   sub><mrow><mo stretchy="false">(</mo><mtext>aFC</mtext><mo
   stretchy="false">)</mo></mrow><mi>g</mi></msub><mo>></mo><mo>+</mo><mn>
   1.0</mn></mrow></mtd></mtr><mtr><mtd
   columnalign="left"><mrow><mrow></mrow><mrow><mtext>Downregulated</mtext
   ><mo>,</mo></mrow></mrow></mtd><mtd><mrow><mtext>if</mtext><mspace
   width="0.277778em"></mspace><mtext>adj</mtext><mo>.</mo><msub><mi>P</mi
   ><mi>g</mi></msub><mo>.</mo><mtext>value</mtext><mo><</mo><mn>0.05</mn>
   <mspace width="0.166667em"></mspace><mtext>and</mtext><mspace
   width="0.166667em"></mspace><msub><mtext>Log</mtext><mn>2</mn></msub><m
   sub><mrow><mo stretchy="false">(</mo><mtext>aFC</mtext><mo
   stretchy="false">)</mo></mrow><mi>g</mi></msub><mo><</mo><mo>-</mo><mn>
   1.0</mn></mrow></mtd></mtr></mtable></mrow></mfenced><mspace
   width="0.333333em"></mspace></mrow> :MATH]

   where
   [MATH: <mrow><mi
   mathvariant="normal">adj</mi><mo>.</mo><msub><mi>P</mi><mi>g</mi></msub
   ><mo>.</mo><mi mathvariant="normal">value</mi></mrow> :MATH]
   is the adjusted P.value and
   [MATH: <mrow><mo stretchy="false">(</mo><msub><mrow><mi
   mathvariant="normal">aFC</mi><mo stretchy="false">)</mo></mrow><mi
   mathvariant="normal">g</mi></msub><mo>=</mo><msubsup><mover><mi>X</mi><
   mo>¯</mo></mover><mrow><mi>g</mi></mrow><mi
   mathvariant="normal">case</mi></msubsup><mo
   stretchy="false">/</mo><msubsup><mover><mi>X</mi><mo>¯</mo></mover><mro
   w><mi>g</mi></mrow><mi
   mathvariant="normal">control</mi></msubsup></mrow> :MATH]
   : fold change on average expressions, for gth gene.

Identification of SARS-CoV-2 and IPF diseases causing common DEGs (cDEGs)

   Let
   [MATH: <msub><mi mathvariant="normal">C</mi><mi
   mathvariant="normal">UR</mi></msub> :MATH]
   and
   [MATH: <msub><mi mathvariant="normal">C</mi><mi
   mathvariant="normal">DR</mi></msub> :MATH]
   indicates the upregulated and downregulated DEGs-sets respectively for
   Covid-19 patients. Again let
   [MATH: <msub><mi mathvariant="normal">I</mi><mi
   mathvariant="normal">UR</mi></msub> :MATH]
   and
   [MATH: <msub><mi mathvariant="normal">I</mi><mi
   mathvariant="normal">DR</mi></msub> :MATH]
   be the upregulated and downregulated DEGs-sets respectively, for IPF
   patients. Then, we defined common upregulated gene-set as
   [MATH: <mrow><msub><mi>G</mi><mrow><mi
   mathvariant="italic">UR</mi></mrow></msub><mo>=</mo><mrow><mo
   stretchy="false">(</mo><msub><mi>C</mi><mrow><mi
   mathvariant="italic">UR</mi></mrow></msub><mo>∩</mo><msub><mi>I</mi><mr
   ow><mi mathvariant="italic">UR</mi></mrow></msub><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   and common downregulated as
   [MATH: <mrow><msub><mi>G</mi><mrow><mi
   mathvariant="italic">DR</mi></mrow></msub><mo>=</mo><mrow><mo
   stretchy="false">(</mo><msub><mi>C</mi><mrow><mi
   mathvariant="italic">DR</mi></mrow></msub><mo>∩</mo><msub><mi>I</mi><mr
   ow><mi mathvariant="italic">DR</mi></mrow></msub><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   . Finally, we considered common DEGs (cDEGs) set as
   [MATH: <mrow><mi mathvariant="normal">cDEG</mi><mo>=</mo><mo
   stretchy="false">(</mo><msub><mi mathvariant="normal">G</mi><mi
   mathvariant="normal">UR</mi></msub><mo>∪</mo><msub><mi
   mathvariant="normal">G</mi><mi mathvariant="normal">DR</mi></msub><mo
   stretchy="false">)</mo></mrow> :MATH]
   that can differentiate both group of patients from the control samples.
   Therefore, in this study, we considered this cDEGs-set for further
   investigation.

Protein–protein interaction (PPI) network analysis of cDEGs

   In order to explore both SARS-CoV-2 and IPF diseases causing common
   hub-genes (cHubGs), we performed protein–protein interaction (PPI)
   network analysis of cDEGs by using the STRING databases^[87]27. To
   build the PPI network, the distance "D" between the proteins (u, v) is
   computed as,
   [MATH: <mrow><mi>D</mi><mfenced close=")"
   open="("><mi>u</mi><mo>,</mo><mi>v</mi></mfenced><mo>=</mo><mfrac><mrow
   ><mrow><mn>2</mn><mo
   stretchy="false">|</mo></mrow><msub><mi>N</mi><mi>u</mi></msub><mo>∩</m
   o><msub><mi>N</mi><mi>v</mi></msub><mrow><mo
   stretchy="false">|</mo></mrow></mrow><mrow><mfenced close="|"
   open="|"><msub><mi>N</mi><mi>u</mi></msub></mfenced><mo>+</mo><mrow><mo
   stretchy="false">|</mo><msub><mi>N</mi><mi>v</mi></msub><mo
   stretchy="false">|</mo></mrow></mrow></mfrac></mrow> :MATH]

   where
   [MATH: <msub><mi>N</mi><mi>u</mi></msub> :MATH]
   and
   [MATH: <msub><mi>N</mi><mi>v</mi></msub> :MATH]
   are the neighbor sets of u and v, respectively. To improve the quality
   of PPI networks, we used the Cytoscape web-tool^[88]28. The Cytoscape
   plugin cytoHubba was used to select the Hub-Genes (HubGs) or
   Hub-Proteins (HubPs) from PPI networks^[89]28,[90]29. The PPI network
   provides a number of nodes and edges, which indicates proteins and
   their interactions, respectively. The HubGs were selected from the PPI
   network by using five topological measures including Degree
   (Deg)^[91]30, BottleNeck (BN)^[92]31, Betweenness (BC)^[93]32, Stress
   (Str)^[94]33, and Closeness (Clo)^[95]34.

Regulatory network analysis of cHubGs

   To explore key transcription factors (TFs) and micro-RNAs (miRNAs) as
   the transcriptional and post-transcriptional regulators of cHubGs, we
   performed TFs-cHubGs, miRNAs-cHubGs, and TFs-miRNAs-cHubGs interaction
   networks based on the JASPAR^[96]35, TarBase^[97]36 and
   RegNetwork^[98]37 databases, respectively by using the NetworkAnalyst
   web-tool^[99]38. The Cytoscape software^[100]28 was used to improve the
   quality of networks. The key regulators were selected by using two
   topological measures (Degree (Deg)^[101]30 and Betweenness
   (BC)^[102]32) of networks.

GO terms and KEGG pathway enrichment analysis of cDEGs highlighting cHubGs

   To explore the pathogenetic processes of cDEGs, we performed enrichment
   analysis of cHubGs with different gene ontology (GO) terms (BPs:
   biological processes, MFs: molecular functions and CCs: cellular
   components)^[103]39 and Kyoto encyclopedia of genes and genomes (KEGG)
   pathways^[104]40. Here, BPs are different types of molecular activities
   that are essential for the functioning of integrated living entities
   including cells, tissues, and organs. MFs are different types of
   fundamental molecular activity of a gene product, including catalysis.
   CCs are different types of components of a cell or the extracellular
   space. The KEGG pathway database consists of a set of pathway maps that
   represent the molecular interaction and relationship networks for
   genetic information processing, metabolism, human diseases, and drug
   development. Let Si denote the reference/annotated gene-set in the ith
   GO-term or KEGG-pathway, Mi denote the total number genes in S[i]
   (i = 1, 2,…, r); N denote the total number of reference/annotated genes
   that construct the whole-set
   [MATH:
   <mrow><mi>S</mi><mo>=</mo><msubsup><mo>⋃</mo><mrow><mi>i</mi><mo>=</mo>
   <mn>1</mn></mrow><mi>r</mi></msubsup><msub><mi>S</mi><mi>i</mi></msub><
   mo>=</mo><msub><mi>S</mi><mi>i</mi></msub><mo>⋃</mo><msubsup><mi>S</mi>
   <mrow><mi>i</mi></mrow><mi>c</mi></msubsup></mrow> :MATH]
   such that
   [MATH:
   <mrow><mi>N</mi><mo>≤</mo><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo>
   <mn>1</mn></mrow><mi>r</mi></msubsup><msub><mi>M</mi><mi>i</mi></msub><
   mo>;</mo></mrow> :MATH]
   where
   [MATH: <msubsup><mi>S</mi><mrow><mi>i</mi></mrow><mi>c</mi></msubsup>
   :MATH]
   is the complement set of S[i]. Again, let n represent the total number
   of cDEGs and ki represent the number of cDEGs that are a part of the
   annotated gene-set S[i.] To examine the enrichment of ith GO-term or
   KEGG-pathway by the cDEGs-set, the following contingency table (Table
   [105]1) is constructed.

Table 1.

   [MATH: <mrow><mn>2</mn><mo>×</mo><mn>2</mn></mrow> :MATH]
   Contingency table.
   Annotated gene-set (Reference) cDEGs-set Not cDEGs-set Marginal total
   S[i] k[i] M[i] − k[i] M[i]
   [MATH: <msubsup><mi>S</mi><mrow><mrow><mi
   mathvariant="bold-italic">i</mi></mrow></mrow><mi>c</mi></msubsup>
   :MATH]
   n − k[i] N − M[i] − n + k[i] N − M[i]
   Marginal total n N − n N (Grand total)
   [106]Open in a new tab

   We considered three web tools (Enrichr^[107]41, DAVID^[108]42,[109]43,
   and GeneCodis^[110]44) to explore significantly enriched GO-terms and
   KEGG pathways by using the chi-square (
   [MATH: <mrow><msup><mrow><mi>χ</mi></mrow><mn>2</mn></msup><mrow><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   or Fisher’s exact testing procedures. The
   [MATH: <msup><mrow><mi>χ</mi></mrow><mn>2</mn></msup> :MATH]
   -testing procedure based on
   [MATH: <mrow><mn>2</mn><mo>×</mo><mn>2</mn></mrow> :MATH]
   contingency table, is used in GeneCodis to calculate the p-values,
   while the Fisher’s exact testing procedure is used in both Enrichr and
   DAVID web-tools. Fisher’s exact testing procedure is formulated based
   on hypergeometric distribution. This distribution is used to estimate
   the probability of overlapping exactly k[i] cDEGs with the reference
   genes in the ith GO-term or pathway (S[i]) out of n cDEGs. Then the p
   value for testing the significance of k[i] cDEGs in ith GO-term
   or pathway is calculated as,
   [MATH:
   <mrow><msub><mi>p</mi><mi>i</mi></msub><mo>=</mo><mn>1</mn><mo>-</mo><m
   underover><mo
   movablelimits="false">∑</mo><mrow><mrow><mi>j</mi><mo>=</mo><mn>0</mn><
   /mrow></mrow><msub><mi>k</mi><mi>i</mi></msub></munderover><mfrac><mrow
   ><mfenced close=")" open="("><mfrac linethickness="0.0pt"><mrow><mi
   mathvariant="italic">Mi</mi></mrow><mi>j</mi></mfrac></mfenced><mfenced
   close=")" open="("><mfrac
   linethickness="0.0pt"><mrow><mi>N</mi><mo>-</mo><mi>M</mi><mi>i</mi></m
   row><mrow><mi>n</mi><mo>-</mo><mi>j</mi></mrow></mfrac></mfenced></mrow
   ><mfenced close=")" open="("><mfrac
   linethickness="0.0pt"><mi>N</mi><mi>n</mi></mfrac></mfenced></mfrac><mo
   >,</mo><mi>i</mi><mo>=</mo><mrow><mn>1</mn><mo>,</mo><mn>2</mn></mrow><
   mo>,</mo><mo>⋯</mo><mo>,</mo><mi>r</mi><mo>.</mo></mrow> :MATH]

   The k[i] cDEGs in S[i] is considered as a significantly enriched
   gene-set if its adjusted p value (p[i]) < 0.05 at 5% level of
   significance. We adjusted the p values for each of three procedures by
   using the Benjamini and Hochberg procedure^[111]26.

Association of cHubGs with other disease risk

   We performed diseases-cHubGs association analysis by using the Enrichr
   web tool^[112]45 with DisGeNET database^[113]46,[114]47 to explore
   other diseases that can increase the severity of COVID-19 and IPF both
   diseases through cHubGs. The DisGeNET database is a comprehensive
   discovery platform to address association of a gene-set with different
   disease risk^[115]47. The necessary data for this database has been
   collected from different online sources including UniProt, Comparative
   Toxicogenomics Database (CTD), Mouse Genome Database (MGD), Rat Genome
   Database (RGD) and, peer-reviewed publications on Genome Wide
   Association Studies (GWAS), Genetics Association Database (GAD),
   literature of human gene-disease networks (LHGDN), and the BeFree
   datasets. To measure the gene-disease association (GDA), the DisGeNET
   web-tool compute a score (S) by using the following formula.
   [MATH: <mrow><mi>S</mi><mo>=</mo><msub><mrow><mo
   stretchy="false">(</mo><mi>S</mi></mrow><mi
   mathvariant="normal">UNIPORT</mi></msub><mo>+</mo><msub><mi>S</mi><mrow
   ><mi mathvariant="normal">CTD</mi><mspace
   width="0.166667em"></mspace><mi
   mathvariant="normal">human</mi></mrow></msub><mrow><mo
   stretchy="false">)</mo><mo>+</mo></mrow><msub><mrow><mo
   stretchy="false">(</mo><mi>S</mi></mrow><mi
   mathvariant="normal">Mouse</mi></msub><mo>+</mo><msub><mi>S</mi><mi
   mathvariant="normal">Rat</mi></msub><mrow><mo
   stretchy="false">)</mo><mo>+</mo><msub><mrow><mo
   stretchy="false">(</mo><mi>S</mi></mrow><mi
   mathvariant="normal">GDA</mi></msub><mo>+</mo><msub><mi
   mathvariant="normal">S</mi><mi
   mathvariant="normal">LHGDNA</mi></msub><mo>+</mo><msub><mi>S</mi><mi
   mathvariant="normal">BeFree</mi></msub><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]

   where
   [MATH:
   <mrow><msub><mi>S</mi><mtext>UNIPORT</mtext></msub><mo>=</mo><mfenced
   open="{"><mrow><mtable><mtr><mtd
   columnalign="left"><mrow><mn>0.3</mn><mo>,</mo></mrow></mtd><mtd><mrow>
   <mtext>if</mtext><mspace
   width="0.166667em"></mspace><mtext>the</mtext><mspace
   width="0.166667em"></mspace><mtext>association</mtext><mspace
   width="0.166667em"></mspace><mtext>has</mtext><mspace
   width="0.166667em"></mspace><mtext>been</mtext><mspace
   width="0.166667em"></mspace><mtext>decleared</mtext><mspace
   width="0.166667em"></mspace><mtext>in</mtext><mspace
   width="0.166667em"></mspace><mtext>Unipropt</mtext></mrow></mtd></mtr><
   mtr><mtd
   columnalign="left"><mrow><mrow></mrow><mrow><mn>0</mn><mo>,</mo></mrow>
   </mrow></mtd><mtd><mtext>otherwise</mtext></mtd></mtr></mtable></mrow><
   /mfenced></mrow> :MATH]
   [MATH:
   <mrow><msub><mi>S</mi><mtext>CTD\,human</mtext></msub><mo>=</mo><mfence
   d open="{"><mrow><mtable><mtr><mtd
   columnalign="left"><mrow><mn>0.3</mn></mrow></mtd><mtd><mrow><mtext>if<
   /mtext><mspace width="0.166667em"></mspace><mtext>the</mtext><mspace
   width="0.166667em"></mspace><mtext>association</mtext><mspace
   width="0.166667em"></mspace><mtext>has</mtext><mspace
   width="0.166667em"></mspace><mtext>been</mtext><mspace
   width="0.166667em"></mspace><mtext>decleared</mtext><mspace
   width="0.166667em"></mspace><mtext>in</mtext><mspace
   width="0.166667em"></mspace><mtext>CTD</mtext><mspace
   width="0.277778em"></mspace><mtext>human</mtext></mrow></mtd></mtr><mtr
   ><mtd
   columnalign="left"><mrow><mrow></mrow><mrow><mn>0</mn><mo>,</mo></mrow>
   </mrow></mtd><mtd><mtext>otherwise</mtext></mtd></mtr></mtable></mrow><
   /mfenced></mrow> :MATH]
   [MATH:
   <mrow><msub><mi>S</mi><mtext>Rat</mtext></msub><mo>=</mo><mfenced
   open="{"><mrow><mtable><mtr><mtd
   columnalign="left"><mrow><mn>0.3</mn><mo>,</mo></mrow></mtd><mtd><mrow>
   <mtext>if</mtext><mspace
   width="0.166667em"></mspace><mtext>the</mtext><mspace
   width="0.166667em"></mspace><mtext>association</mtext><mspace
   width="0.166667em"></mspace><mtext>has</mtext><mspace
   width="0.166667em"></mspace><mtext>been</mtext><mspace
   width="0.166667em"></mspace><mtext>decleared</mtext><mspace
   width="0.166667em"></mspace><mtext>in</mtext><mspace
   width="0.166667em"></mspace><mtext>CTD</mtext><mspace
   width="0.277778em"></mspace><mtext>or</mtext><mspace
   width="0.166667em"></mspace><mtext>RGD</mtext><mspace
   width="0.166667em"></mspace><mtext>rat</mtext><mspace
   width="0.277778em"></mspace><mtext>dataset</mtext></mrow></mtd></mtr><m
   tr><mtd
   columnalign="left"><mrow><mrow></mrow><mrow><mn>0</mn><mo>,</mo></mrow>
   </mrow></mtd><mtd><mtext>otherwise</mtext></mtd></mtr></mtable></mrow><
   /mfenced></mrow> :MATH]
   [MATH:
   <mrow><msub><mi>S</mi><mtext>Mouse</mtext></msub><mo>=</mo><mfenced
   open="{"><mrow><mtable><mtr><mtd
   columnalign="left"><mrow><mn>0.3</mn><mo>,</mo></mrow></mtd><mtd><mrow>
   <mtext>if</mtext><mspace
   width="0.166667em"></mspace><mtext>the</mtext><mspace
   width="0.166667em"></mspace><mtext>association</mtext><mspace
   width="0.166667em"></mspace><mtext>has</mtext><mspace
   width="0.166667em"></mspace><mtext>been</mtext><mspace
   width="0.166667em"></mspace><mtext>decleared</mtext><mspace
   width="0.166667em"></mspace><mtext>in</mtext><mspace
   width="0.166667em"></mspace><mtext>CTD</mtext><mspace
   width="0.166667em"></mspace><mtext>or</mtext><mspace
   width="0.166667em"></mspace><mtext>RGD</mtext><mspace
   width="0.166667em"></mspace><mtext>mouse</mtext><mspace
   width="0.166667em"></mspace><mtext>dataset</mtext></mrow></mtd></mtr><m
   tr><mtd
   columnalign="left"><mrow><mrow></mrow><mrow><mn>0</mn><mo>,</mo></mrow>
   </mrow></mtd><mtd><mtext>otherwise</mtext></mtd></mtr></mtable></mrow><
   /mfenced></mrow> :MATH]
   [MATH: <mrow><msub><mi>S</mi><mi>L</mi></msub><mo>=</mo><mfenced
   open="{"><mrow><mtable><mtr><mtd
   columnalign="left"><mrow><mn>0.1</mn><mo>,</mo></mrow></mtd><mtd><mrow>
   <mi>i</mi><mi>f</mi><mspace
   width="0.166667em"></mspace><mfrac><mrow><msub><mi>n</mi><mrow><mi
   mathvariant="italic">gd</mi></mrow></msub><mo>×</mo><mn>100</mn></mrow>
   <msub><mi>N</mi><mi>L</mi></msub></mfrac><mo>≥</mo><mspace
   width="0.166667em"></mspace><mtext>max</mtext></mrow></mtd></mtr><mtr><
   mtd
   columnalign="left"><mrow><mrow></mrow><mrow><mfrac><mrow><msub><mi>n</m
   i><mrow><mi
   mathvariant="italic">gd</mi></mrow></msub><mo>×</mo><mn>100</mn></mrow>
   <msub><mi>N</mi><mi>L</mi></msub></mfrac><mo>,</mo></mrow></mrow></mtd>
   <mtd><mrow><mi>i</mi><mi>f</mi><mspace
   width="0.166667em"></mspace><mfrac><mrow><msub><mi>n</mi><mrow><mi
   mathvariant="italic">gd</mi></mrow></msub><mo>×</mo><mn>100</mn></mrow>
   <msub><mi>N</mi><mi>L</mi></msub></mfrac><mo><</mo><mi>m</mi><mi>a</mi>
   <mi>x</mi></mrow></mtd></mtr></mtable></mrow></mfenced></mrow> :MATH]

   and L: LHGDN, GAD, or BeFree,
   [MATH: <msub><mi>n</mi><mrow><mi
   mathvariant="italic">gd</mi></mrow></msub> :MATH]
   is the number of publications reported a GDA in the source and
   [MATH: <msub><mi>N</mi><mi>L</mi></msub> :MATH]
   is the total number of publications in the source.
   [MATH: <mrow><mtext>max</mtext><mo>=</mo><mfenced
   open="{"><mrow><mtable><mtr><mtd
   columnalign="left"><mrow><mn>0.080</mn></mrow></mtd><mtd><mrow><mi>i</m
   i><mi>f</mi><mspace
   width="0.277778em"></mspace><mi>L</mi><mo>=</mo><mi>G</mi><mi>A</mi><mi
   >D</mi></mrow></mtd></mtr><mtr><mtd
   columnalign="left"><mrow><mrow></mrow><mrow><mn>0.06</mn></mrow></mrow>
   </mtd><mtd><mrow><mi>i</mi><mi>f</mi><mspace
   width="0.277778em"></mspace><mi>L</mi><mo>=</mo><mtext>LHGDN</mtext><ms
   pace width="0.277778em"></mspace><mtext>or</mtext><mspace
   width="0.277778em"></mspace><mtext>BeFree</mtext></mrow></mtd></mtr></m
   table></mrow></mfenced></mrow> :MATH]

   Obviously, the DisGeNET score (S) lies between 0 to 1.

Prognostic performance of cHubGs

   To investigate the prognostic performance of cHubGs with lung cancer,
   we performed multivariate survival analysis of lung cancer patients
   based on the expressions of cHubGs by using SurvExpress
   web-tool^[116]48. The TCGA database
   ([117]https://tcga-data.nci.nih.gov) is used in SurvExpress web-tool
   for survival analysis with a gene-set. In this analysis, patient
   samples were divided in to low-risk and high-risk groups by using the
   median of risk-scores (RSs) which is defined by the linear component of
   the Cox regression model. That is,
   [MATH: <mrow><msub><mrow><mi mathvariant="normal">RS</mi><mo>=</mo><mi
   mathvariant="normal">β</mi></mrow><mn>1</mn></msub><msub><mi
   mathvariant="normal">X</mi><mn>1</mn></msub><mo>+</mo><msub><mi
   mathvariant="normal">β</mi><mn>2</mn></msub><msub><mi
   mathvariant="normal">X</mi><mn>2</mn></msub><mo>+</mo><mo>⋯</mo><mo>+</
   mo><msub><mi mathvariant="normal">β</mi><mi
   mathvariant="normal">n</mi></msub><msub><mi
   mathvariant="normal">X</mi><mi
   mathvariant="normal">n</mi></msub></mrow> :MATH]
   , where
   [MATH: <msub><mi mathvariant="normal">X</mi><mi
   mathvariant="normal">i</mi></msub> :MATH]
   is the expression of ith gene,
   [MATH: <msub><mi mathvariant="normal">β</mi><mi
   mathvariant="normal">i</mi></msub> :MATH]
   is calculated by using the Cox regression approach^[118]49. Generally,
   a patient whose risk-score (RS) is greater than median of RSs was
   considered in the high-risk group, otherwise, the patient belongs to
   the low-risk group. Then the Kaplan–Meier survival plot for each risk
   group is constructed for each risk group. The significant difference
   between two risk groups was investigated by using the concordance
   indexes (CI), hazard-ratio (HR) and log-rank test^[119]50. The
   significance level was set to p value < 0.05.

Drug repurposing by molecular docking

   To explore cHubGs-guided FDA approved repurposable drug molecules for
   the treatment against SARS-CoV-2 infections in presence of IPF risk, we
   employed molecular docking analysis of cHubGs and its TFs with 185 drug
   agents as introduced previously in the data sources and descriptions
   section [see Table [120]S1(I, II)]. The molecular docking analysis
   requires 3-Dimensional (3D) structures of both receptor proteins and
   drug agents/ligands. We downloaded 3D structure of all targeted
   proteins from Protein Data Bank (PDB)^[121]51, AlphaFold Protein
   Structure Database^[122]52 and SWISS-MODEL^[123]53. The 3D structures
   of all drug agents were downloaded from PubChem database^[124]54. The
   3D structures of the target proteins were visualized by using the
   Discovery Studio Visualizer 2019^[125]55. Further, the receptor protein
   was prepared by removing ligand heteroatoms and water molecules and by
   addition of polar hydrogens on AutoDock tools 1.5.7^[126]56. The drug
   agents/ligands were prepared by setting the torsion tree and rotatable,
   nonrotatable bonds present in the ligand through AutoDock tools 1.5.7.
   Then, pairwise binding affinities between the target proteins and drug
   agents were calculated using the AutoDock Vina^[127]57. The
   exhaustiveness parameter was set to 10. Discovery Studio Visualizer
   2019^[128]55 and PyMol^[129]58 were used to analyze the docked
   complexes for its surfaces, types and distances of non-covalent bonds.
   Let B[ij] is the binding affinity (BA) score corresponding to the ith
   receptor (i = 1, 2, … , p) and jth drug (j = 1, 2, … , q). The
   receptors and drug agents were ordered by the decreasing order of their
   average BA scores for selecting the top-ordered potential drug agents.
   We compared the binding performance of our suggested drugs with
   previously suggested candidate drugs by Taz et al.^[130]14 by molecular
   docking analysis against the Taz et al., suggested receptors as well as
   top-ranked independent receptors. Finally, we looked into the
   effectiveness of our suggested drugs with randomly selected independent
   receptor proteins that were not associated with SARS-CoV-2 infections
   in the presence of IPF risk.

Results

Identification of cDEGs

   The dataset [131]GSE147507 was analyzed by using the edgeR r-package to
   identify DEGs between SARS-CoV-2 infections and control samples. A
   total of 851 DEGs were identified by satisfying the cutoff criteria of
   adjusted p value < 0.05 and |log[2](aFC)|> 1, where 712 DEGs are
   upregulated and 139 DEGs are downregulated. The [132]GSE52463 dataset
   was analyzed to identify DEGs between IPF diseases and control samples.
   A total of 668 DEGs were found according to the same criteria, where
   391 DEGs are upregulated and 277 are downregulated. Then we commonly
   found 27 upregulated cDEGs {TNFAIP3, SELE, MX2, PTX3, CH25H, UBASH3A,
   EREG, BIRC3, ZBP1, NLRP3, LIF, PRDM1, ADM, VCAM1, FOSL1, CXCR4, CCL22,
   IRF4, SLC5A5, SYTL3, ADAMTS4, UBD, CCL17, CPNE5, TNFAIP6, IKZF3, TNIP3}
   and 5 downregulated cDEGs {MVD, KIF12, RAAG1GAP, SLC27A3, TMEM160} that
   can separate both COVID-19 and IPF patients from the control samples
   (see Fig. [133]2).

Figure 2.

   Figure 2
   [134]Open in a new tab

   Venn diagrams for visualizing the upregulated and downregulated cDEGs
   that can separate both COVID-19 and IPF patients from the control
   samples.

PPI network analysis of cDEGs for identification of cHubGs

   The PPI network of cDEGs was constructed using STRING database
   (Fig. [135]3) which contains 27 nodes and 115 edges. We selected top
   ranked 10 cHubGs {CXCR4, TNFAIP3, VCAM1, NLRP3, TNFAIP6, SELE, MX2,
   IRF4, UBD, CH25H} by applying five topological measures (Deg, BN, BC,
   Str and Clo) in the PPI network (see Table [136]S2).

Figure 3.

   [137]Figure 3
   [138]Open in a new tab

   Protein–protein interaction (PPI) network of cDEGs for COVID-19 and IPF
   disease to identify cHubGs. The red color nodes indicate the cHubGs.

The regulatory network analysis of cHubGs

   The cHubGs-TFs interaction network detected top-ranked two sets of TFs
   proteins {JUN, SRF and NFKB1} and {JUN, SPI1 and NFKB1} based on JASPAR
   and RegNetwork databases respectively, as the key transcriptional
   regulatory factors for cHubGs, where two TFs proteins JUN and NFKB1were
   common in both-sets (see Fig. [139]4B,C). Based on two databases, we
   observed that JUN is a regulator of 7 cHubGs (TNFAIP3, TNFAIP6, IRF4,
   SELE, TNFAIP6, NLRP3, VAM1) and NFKB1 for 6 cHubGs (TNFAIP3, IRF4,
   CXCR4,UBD, SELE, VAM1). The cHubGs-miRNAs interaction network detected
   top-ranked two sets of miRNAs {hsa-mir-155-5p, hsa-mir-27a-5p,
   hsa-mir-107, hsa-mir-21-3p, hsa-mir-129-2-3p} and { hsa-mir-155-5p,
   hsa-mir-154, hsa-mir-613, hsa-mir-21-3p, hsa-mir-132} based on TarBase
   and RegNetwork databases respectively, as the key post-transcriptional
   regulatory factors for cHubGs, where two miRNAs hsa-mir-155-5p and
   hsa-mir-21-3p were common in both-sets (see Figs. [140]4A,C). Based on
   two databases, we observed that hsa-mir-155-5p regulates 6 cHubGs
   {IRF4, VAM1, TNFAIP3, TNFAIP6, SELE, CXCR4} and hsa-mir-21-3p regulates
   5 cHubGs {VAM1, TNFAIP3, TNFAIP6, MX2, UBD}.

Figure 4.

   [141]Figure 4
   [142]Open in a new tab

   The regulatory network of cHubGs with TFs and miRNAs. The red, light
   blue and green color nodes represent the cHubGs, TFs and miRNAs. The
   names of cHubGs, key regulatory TFs and miRNAs were displayed only. (A)
   The cHubGs-TFs interaction network based on JASPAR database. (B) The
   miRNA-cHubGs interaction network based on TarBase database. (C) The
   cHubGs-TFs-miRNAs interaction network based on RegNetwork database.

GO functions and KEGG pathway enrichment analysis of cDEGs highlighting
cHubGs

   The Table [143]2 displayed the top five commonly enriched GO terms
   (BPs, MFs and CCs) and KEGG pathways by cDEGs with three web-tools and
   databases Enrichr^[144]41, DAVID^[145]42,[146]43 and GeneCodis^[147]44
   (p value < 0.001). Among the top five GO terms of BPs, four BPs
   including inflammatory response, regulation of inflammatory response,
   response to tumor necrosis factor and response to cytokine were
   enriched by the cHubGs-sets with all of three databases. The other
   BP-term (response to virus) was enriched by the cHubGs-sets with each
   of two databases (DAVID^[148]42,[149]43 and GeneCodis^[150]44). Three
   MFs (sequence-specific DNA binding, CCR chemokine receptor binding and
   chemokine activity) out of four, were commonly enriched by the
   cHubGs-sets with all of three databases. The other MF-term (ubiquitin
   binding^[151]59) was enriched by the cHubGs-sets with each of two
   databases (Enrichr^[152]41 and GeneCodis^[153]44). Among the top five
   significantly enriched CCs terms, four terms including early endosome,
   extracellular space, cytosol and extracellular region were commonly
   enriched by the cHubGs-sets with two databases out of 3. Only one
   CC-term (tertiary granule lumen) was enriched by the cHubGs-sets with
   all of 3 databases. Interestingly, all of the top five KEGG pathways
   (TNF signaling pathway, IL-17 signaling pathway, NF-kappa B signaling
   pathway, NOD-like receptor signaling and Cytokine-cytokine receptor
   interaction) were commonly enriched by the cHubGs-sets with all of 3
   databases.

Table 2.

   The top five commonly enriched GO terms (BPs, MFs and CCs) and KEGG
   pathways by cDEGs with three web-tools and databases Enrichr^[154]41,
   DAVID^[155]42,[156]43 and GeneCodis^[157]44 (p value < 0.001). The
   cHubGs were highlighted by bold gene names. Some references were given