Abstract
Some recent studies showed that severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2) infections and idiopathic pulmonary fibrosis
(IPF) disease might stimulate each other through the shared genes.
Therefore, in this study, an attempt was made to explore common genomic
biomarkers for SARS-CoV-2 infections and IPF disease highlighting their
functions, pathways, regulators and associated drug molecules. At
first, we identified 32 statistically significant common differentially
expressed genes (cDEGs) between disease (SARS-CoV-2 and IPF) and
control samples of RNA-Seq profiles by using a statistical r-package
(edgeR). Then we detected 10 cDEGs (CXCR4, TNFAIP3, VCAM1, NLRP3,
TNFAIP6, SELE, MX2, IRF4, UBD and CH25H) out of 32 as the common hub
genes (cHubGs) by the protein–protein interaction (PPI) network
analysis. The cHubGs regulatory network analysis detected few key
TFs-proteins and miRNAs as the transcriptional and post-transcriptional
regulators of cHubGs. The cDEGs-set enrichment analysis identified some
crucial SARS-CoV-2 and IPF causing common molecular mechanisms
including biological processes, molecular functions, cellular
components and signaling pathways. Then, we suggested the cHubGs-guided
top-ranked 10 candidate drug molecules (Tegobuvir, Nilotinib, Digoxin,
Proscillaridin, Simeprevir, Sorafenib, Torin 2, Rapamycin, Vancomycin
and Hesperidin) for the treatment against SARS-CoV-2 infections with
IFP diseases as comorbidity. Finally, we investigated the resistance
performance of our proposed drug molecules compare to the already
published molecules, against the state-of-the-art alternatives publicly
available top-ranked independent receptors by molecular docking
analysis. Molecular docking results suggested that our proposed drug
molecules would be more effective compare to the already published drug
molecules. Thus, the findings of this study might be played a vital
role for diagnosis and therapies of SARS-CoV-2 infections with IPF
disease as comorbidity risk.
Subject terms: Computational biology and bioinformatics, Drug discovery
Introduction
The severe acute respiratory syndrome corona virus 2 (SARS-CoV-2) is
the main cause of COVID-19 pandemic that brings a major threat for
human life as well as economy around the world^[42]1,[43]2. It was
first detected from Wuhan town in China at the end of 2019 and rapidly
spread all over the world with several symptoms like as cough, fever,
and pneumonia diarrhea, severe respiratory diseases and become a
complex and deadly health concern^[44]3,[45]4. The World Health
Organization (WHO) announced it as a momentous pandemic of twenty-first
century on March 11, 2020. It is highly homologous to the SARS
coronavirus (SARS-CoV) which was responsible for the respiratory
pandemic during the 2002–2003 period^[46]5,[47]6. Coronaviruses are
single-stranded RNA viruses of ~ 30 kb. Based on their genomic
structures, they are generally classified into four genera known as α,
β, γ, and δ. The life cycle of SARS-CoV-2 with the host consists of the
five steps classified as attachment, penetration, biosynthesis,
maturation and release. The SARS-CoV-2 enters in to the host cells
through the membrane fusion or endocytosis (penetration) after binding
to the host receptor proteins (attachment). Once viral proteins
(biosynthesis) including the major protease (MPro/3ClPro), the
papain-like protease (PLpro) and the RNA-dependent RNA polymerase
(RdRp) are released inside the host cells, viral RNA enters to the
nucleus for replication. Thus, create new viral particles (maturation)
and released. Coronaviruses consist of four structural proteins; Spike
(S), membrane (M), envelop (E) and nucleocapsid (N)^[48]7. The spike
(S) protein of SARS-CoV-2 interacts with the host ACE2
(angiotensin-converting enzyme 2) receptor protein to stimulate the
infection^[49]8–[50]10. ACE2 is highly expressed in lung, heart,
kidney, ileum and bladder^[51]11,[52]34. In the case of lung, ACE2 is
highly expressed in lung epithelial cells which leads the interstitial
lung damage including epithelial and endothelial injury with excessive
fibroproliferation^[53]12. Until September 2022, there have been around
614,825,354 confirmed cases of COVID-19, including 6,536,284
million deaths^[54]13. Though different vaccination programs reduced
the severity of SARS-CoV-2 infections worldwide, however, it is yet one
of the most severe risk factor for some chronic diseases including
idiopathic pulmonary fibrosis (IPF) disease, since they stimulate each
other very much^[55]12,[56]14. The IPF disease is a chronic,
progressive lung syndrome which leads to the respiratory collapse and
decline the lung function^[57]12. The primary symptoms of IPF are dry
cough and breathing complexity^[58]15. The average survival time of a
patients suffering from IPF is approximately 3 years after the first
diagnosis and therapies. Initially, IPF has been considered as a
chronic inflammatory process^[59]16, but recent studies showed that
abnormally activated alveolar epithelial cells (AECs) are the main
factor responsible for the fibrotic response because they release
cytokines that stimulate the fibroblasts^[60]17. IPF is typified by the
progressive and fatal accumulation of fibroblasts and extracellular
matrix (ECM) in the lung, leading to distortion of the lung
architecture and reduction in lung function. So, Individuals with
inflammatory lung disease are at a higher risk of death from
COVID-19^[61]18,[62]19. Therefore, identification of SARS-CoV-2 and IPF
diseases causing shared genes is required for diagnosis and therapies
of COVID-19 patients with IPF disease as comorbidity.
Taz et al.^[63]14 tried to identify shared genomic biomarkers for
diagnosis and therapies of COVID-19 patients with IPF disease as
comorbidity. They analyzed bulk RNA-Seq profiles for SARS-CoV-2
infections and microarray gene-expression profiles for IPF disease and
found only 11 common differentially expressed genes (cDEGs) to separate
both COVID-19 and IPF patients from control samples. They identified
top-ranked 5 genes as common hub-genes (cHubGs) by protein–protein
interaction (PPI) analysis, where only 2 cHubGs were detected from the
cDEGs and the rest 3 cHubGs did not belongs to their cDEGs-set. Also,
they did not examine their common differential expression patterns by
from any other databases, which indicates that their cHubGs-set were
not so representative of their cDEGs-set. Another drawback in their
study was that they used microarray data instead of RNA-Seq data for
identification of differentially expressed genes (DEGs) between IPF
disease and control samples, though RNA-Seq data perform better than
microarray data in identifying DEGs^[64]20. Therefore, in this study,
an attempt was made to explore comparatively more representative and
effective cHubGs-set for SARS-CoV-2 and IPF diseases from RNA-Seq
profiles for diagnosis and therapies of COVID-19 patients with IPF
disease as comorbidity by using the integrated bioinformatics analyses.
The pipeline of this study is given in Fig. [65]1.
Figure 1.
[66]Figure 1
[67]Open in a new tab
The pipeline of this study.
Materials and methods
Data sources and descriptions
We used both original data and meta-data to reach the goal of this
study as described below.
Collection of RNA-Seq profiles for SARS-CoV-2 infections, IPF disease and
control samples
We collected RNA-Seq profiles for SARS-CoV-2 infections, IPF disease
and control samples from the National Center for Biotechnology
Information (NCBI) Gene Expression Omnibus (GEO) database
([68]http://www.ncbi.nlm.nih.gov/geo/) to explore both diseases causing
common genes. The SARS-CoV-2 infected patient’s RNA-Seq profiles were
downloaded from [69]GPL18573 platform of Illumina NextGen 500 (Homo
sapiens) with the GEO accession numbers [70]GSE147507 published by
Blanco-Melozet al.^[71]21. We collected 18 case and 29 control samples
of RNA-Seq profiles from 3 cell lines (1) normal human bronchial
epithelium, alveolar cells, transformed lung-derived Calu-3 cells of
lung tissues for SARS-CoV-2 infections. On the hand, the IPF patient’s
RNA-Seq profiles were downloaded from [72]GPL11154 platform of Illumina
HiSeq 2000 (Homo sapiens) with the GEO accession numbers [73]GSE52463
published by Nance et al.^[74]22. This dataset contained 16 samples
including 8 IPF and 8 control samples collected from lung tissues.
Collection of meta-drug agents for exploring candidate drugs
Beck et.al.^[75]23 suggested SARS-CoV-2 main protease (3CLpro)-guided
top-ranked 90 drug molecules out of 3410 FDA approved anti-viral drugs
for the treatment against COVID-19 by molecular docking analysis. In
our study, we collected their suggested top-ranked 90 drug molecules as
the meta-drug agents [see Table [76]S1(I)]. We also collected host
transcriptome-guided 95 meta-drug agents that are recommended for
COVID-19 or IPF diseases [see Table [77]S1(II)] in different published
articles. Thus, we considered total 90 + 95 = 185 drugs to explore most
probable candidate drug agents by molecular docking with our proposed
receptors.
Collection of independent meta-receptors for in-silico validation of the
proposed drugs
We selected the top-ranked 11 hub-genes (independent meta-receptors) as
the independent meta-receptors associated with COVID-19 and IPF disease
by reviewing 23 published articles [see Table [78]S1(III)] for
in-silico validation of the proposed drug molecules by molecular
docking.
Identification of DEGs from RNA-Seq profiles by using edgeR
To explore DEGs between case and control samples from RNA-Seq profiles
[79]GSE147507 and [80]GSE52463 of NCBI-GEO database, we considered a
popular statistical approach known as edgeR. In order to introduce
edgeR, let
[MATH: Xgi :MATH]
denote the total number of read counts for gth gene (g = 1, 2, …, G) in
the ith sample (i = 1, 2, …, n), which is assumed to be followed
negative binomial (NB) distribution in the edgeR setting^[81]24,[82]25.
That is,
[MATH: X∼NBμgi,δg, :MATH]
where the parameters are described by the mean and variance as
[MATH: μgi=EXgi=Miπgi :MATH]
and
[MATH: VXgi=μgi(1+μgiδg) :MATH]
, respectively. Here,
[MATH: Mi :MATH]
is the total number of short reads of RNA-Seq profiles,
[MATH: πgi :MATH]
denote the fraction of all cDNA fragments for gth gene in the ith
sample so that
[MATH:
∑g=1Gπgi=1, :MATH]
[MATH: δg :MATH]
is the squired coefficient of variation of
[MATH: πgi :MATH]
based on the replicates i. The NB distribution convert to Poisson
distribution when
[MATH:
δg=0
:MATH]
. According to the generalized linear model (GLM) approach, the mean
response,
[MATH: μgi :MATH]
, is linked to a linear predictor as
[MATH: Log(μgi)=ziTβg+Log(Mi), :MATH]
where z[i] is the ith column of the design matrix (Z) including the
covariates (e.g. batch effects, experimental conditions, etc.),
[MATH: βg :MATH]
is a q × 1 vector of regression parameters for differential expression
patterns. The regression vector
[MATH: βg :MATH]
is estimated by maximum likelihood estimation (MLE) and calculated
iterative way as
[MATH: βgnew=β<
/mi>gold+(ZTDgZ)-1<
msup>ZTug
:MATH]
where
[MATH: Dg :MATH]
is the diagonal matrix of working weights,
[MATH: ugi=(xgi-μgi)/(1+δg
msub>μgi) :MATH]
. The dispersion parameter
[MATH: δg :MATH]
is calculated as
[MATH: δ^g=argmaxAPLgδg+τ
mi>.APLgSδg,
mo> :MATH]
where
[MATH: APLgδg=L
mi>δg;Xg,β^g-12Log|Ig| :MATH]
is the adjusted profile likelihood (APL) in terms of
[MATH: δg :MATH]
, penalized for the estimation of the regression parameters
[MATH: βg :MATH]
,
[MATH: Xg :MATH]
is the vector of counts for gene g,
[MATH: β^g :MATH]
is the estimated coefficient vector, L(·) is the log-likelihood
function,
[MATH: Ig :MATH]
is the Fisher information matrix and |·| is the determinant,
[MATH: τ :MATH]
is the prior degree of freedom afforded to the shared likelihood and
[MATH: APLgSδg=
1|C|∑k∈CAPLk(δg). :MATH]
Then the edgeR approach allows us for testing the significance of any
parameter or any contrast or linear combination of the parameters in
the linear model. Gene-wise hypothesis testing are conducted by
computing likelihood ratio (LRT) statistic
[MATH: L(θg,0<
/mn>|X)/L(θg|X) :MATH]
to compare the null hypothesis (H[0]) that
[MATH: βg=0 :MATH]
(insignificant gene) against the two-sided alternative (H[1]),
[MATH: β≠0
:MATH]
(indicating DEGs), where
[MATH: θg={β,δ} :MATH]
and
[MATH:
θg,0=δ :MATH]
. The log-LRT follows asymptotically chi-square distribution under
H[0]. Adjusted P.values are computed to control the FDR (false
discovery rate) by the Benjamini and Hochberg approach^[83]26. We
implemented this algorithm to identify DEGs from our downloaded RNA-Seq
count datasets [84]GSE147507 and [85]GSE52463 for SARS-CoV-2 and IPF
diseases, respectively, by using edgeR, an R-package in
Bioconductor^[86]25. To separate up and down-regulated DEGs, we for the
combined data as follows
[MATH:
DEGg=Upregulated,ifadj.Pg.value<0.05
andLog2(aFC)g>+
1.0Downregulated,ifadj.Pg.value<0.05
andLog2(aFC)g<-
1.0 :MATH]
where
[MATH: adj.Pg.value :MATH]
is the adjusted P.value and
[MATH: (aFC)g=X<
mo>¯gcase/X¯gcontrol :MATH]
: fold change on average expressions, for gth gene.
Identification of SARS-CoV-2 and IPF diseases causing common DEGs (cDEGs)
Let
[MATH: CUR :MATH]
and
[MATH: CDR :MATH]
indicates the upregulated and downregulated DEGs-sets respectively for
Covid-19 patients. Again let
[MATH: IUR :MATH]
and
[MATH: IDR :MATH]
be the upregulated and downregulated DEGs-sets respectively, for IPF
patients. Then, we defined common upregulated gene-set as
[MATH: GUR=(CUR∩IUR) :MATH]
and common downregulated as
[MATH: GDR=(CDR∩IDR) :MATH]
. Finally, we considered common DEGs (cDEGs) set as
[MATH: cDEG=(GUR∪GDR) :MATH]
that can differentiate both group of patients from the control samples.
Therefore, in this study, we considered this cDEGs-set for further
investigation.
Protein–protein interaction (PPI) network analysis of cDEGs
In order to explore both SARS-CoV-2 and IPF diseases causing common
hub-genes (cHubGs), we performed protein–protein interaction (PPI)
network analysis of cDEGs by using the STRING databases^[87]27. To
build the PPI network, the distance "D" between the proteins (u, v) is
computed as,
[MATH: Du,v=2|Nu∩Nv|Nu+|Nv| :MATH]
where
[MATH: Nu :MATH]
and
[MATH: Nv :MATH]
are the neighbor sets of u and v, respectively. To improve the quality
of PPI networks, we used the Cytoscape web-tool^[88]28. The Cytoscape
plugin cytoHubba was used to select the Hub-Genes (HubGs) or
Hub-Proteins (HubPs) from PPI networks^[89]28,[90]29. The PPI network
provides a number of nodes and edges, which indicates proteins and
their interactions, respectively. The HubGs were selected from the PPI
network by using five topological measures including Degree
(Deg)^[91]30, BottleNeck (BN)^[92]31, Betweenness (BC)^[93]32, Stress
(Str)^[94]33, and Closeness (Clo)^[95]34.
Regulatory network analysis of cHubGs
To explore key transcription factors (TFs) and micro-RNAs (miRNAs) as
the transcriptional and post-transcriptional regulators of cHubGs, we
performed TFs-cHubGs, miRNAs-cHubGs, and TFs-miRNAs-cHubGs interaction
networks based on the JASPAR^[96]35, TarBase^[97]36 and
RegNetwork^[98]37 databases, respectively by using the NetworkAnalyst
web-tool^[99]38. The Cytoscape software^[100]28 was used to improve the
quality of networks. The key regulators were selected by using two
topological measures (Degree (Deg)^[101]30 and Betweenness
(BC)^[102]32) of networks.
GO terms and KEGG pathway enrichment analysis of cDEGs highlighting cHubGs
To explore the pathogenetic processes of cDEGs, we performed enrichment
analysis of cHubGs with different gene ontology (GO) terms (BPs:
biological processes, MFs: molecular functions and CCs: cellular
components)^[103]39 and Kyoto encyclopedia of genes and genomes (KEGG)
pathways^[104]40. Here, BPs are different types of molecular activities
that are essential for the functioning of integrated living entities
including cells, tissues, and organs. MFs are different types of
fundamental molecular activity of a gene product, including catalysis.
CCs are different types of components of a cell or the extracellular
space. The KEGG pathway database consists of a set of pathway maps that
represent the molecular interaction and relationship networks for
genetic information processing, metabolism, human diseases, and drug
development. Let Si denote the reference/annotated gene-set in the ith
GO-term or KEGG-pathway, Mi denote the total number genes in S[i]
(i = 1, 2,…, r); N denote the total number of reference/annotated genes
that construct the whole-set
[MATH:
S=⋃i=
1rSi<
mo>=Si⋃S
ic :MATH]
such that
[MATH:
N≤∑i=
1rMi<
mo>; :MATH]
where
[MATH: Sic
:MATH]
is the complement set of S[i]. Again, let n represent the total number
of cDEGs and ki represent the number of cDEGs that are a part of the
annotated gene-set S[i.] To examine the enrichment of ith GO-term or
KEGG-pathway by the cDEGs-set, the following contingency table (Table
[105]1) is constructed.
Table 1.
[MATH: 2×2 :MATH]
Contingency table.
Annotated gene-set (Reference) cDEGs-set Not cDEGs-set Marginal total
S[i] k[i] M[i] − k[i] M[i]
[MATH: Sic
:MATH]
n − k[i] N − M[i] − n + k[i] N − M[i]
Marginal total n N − n N (Grand total)
[106]Open in a new tab
We considered three web tools (Enrichr^[107]41, DAVID^[108]42,[109]43,
and GeneCodis^[110]44) to explore significantly enriched GO-terms and
KEGG pathways by using the chi-square (
[MATH: χ2) :MATH]
or Fisher’s exact testing procedures. The
[MATH: χ2 :MATH]
-testing procedure based on
[MATH: 2×2 :MATH]
contingency table, is used in GeneCodis to calculate the p-values,
while the Fisher’s exact testing procedure is used in both Enrichr and
DAVID web-tools. Fisher’s exact testing procedure is formulated based
on hypergeometric distribution. This distribution is used to estimate
the probability of overlapping exactly k[i] cDEGs with the reference
genes in the ith GO-term or pathway (S[i]) out of n cDEGs. Then the p
value for testing the significance of k[i] cDEGs in ith GO-term
or pathway is calculated as,
[MATH:
pi=1-∑j=0<
/mrow>kiMijN-Min-jNn,i=1,2<
mo>,⋯,r. :MATH]
The k[i] cDEGs in S[i] is considered as a significantly enriched
gene-set if its adjusted p value (p[i]) < 0.05 at 5% level of
significance. We adjusted the p values for each of three procedures by
using the Benjamini and Hochberg procedure^[111]26.
Association of cHubGs with other disease risk
We performed diseases-cHubGs association analysis by using the Enrichr
web tool^[112]45 with DisGeNET database^[113]46,[114]47 to explore
other diseases that can increase the severity of COVID-19 and IPF both
diseases through cHubGs. The DisGeNET database is a comprehensive
discovery platform to address association of a gene-set with different
disease risk^[115]47. The necessary data for this database has been
collected from different online sources including UniProt, Comparative
Toxicogenomics Database (CTD), Mouse Genome Database (MGD), Rat Genome
Database (RGD) and, peer-reviewed publications on Genome Wide
Association Studies (GWAS), Genetics Association Database (GAD),
literature of human gene-disease networks (LHGDN), and the BeFree
datasets. To measure the gene-disease association (GDA), the DisGeNET
web-tool compute a score (S) by using the following formula.
[MATH: S=(SUNIPORT+SCTDhuman)+(SMouse+SRat)+(SGDA+SLHGDNA+SBeFree) :MATH]
where
[MATH:
SUNIPORT=0.3,
iftheassociationhasbeendeclearedinUnipropt<
mtr>0,
otherwise<
/mfenced> :MATH]
[MATH:
SCTD\,human=0.3if<
/mtext>theassociationhasbeendeclearedinCTDhuman0,
otherwise<
/mfenced> :MATH]
[MATH:
SRat=0.3,
iftheassociationhasbeendeclearedinCTDorRGDratdataset0,
otherwise<
/mfenced> :MATH]
[MATH:
SMouse=0.3,
iftheassociationhasbeendeclearedinCTDorRGDmousedataset0,
otherwise<
/mfenced> :MATH]
[MATH: SL=0.1,
ifngd×100
NL≥max<
mtd
columnalign="left">ngd×100
NL,
ifngd×100
NL<ma
x :MATH]
and L: LHGDN, GAD, or BeFree,
[MATH: ngd :MATH]
is the number of publications reported a GDA in the source and
[MATH: NL :MATH]
is the total number of publications in the source.
[MATH: max=0.080ifL=GAD0.06
ifL=LHGDNorBeFree :MATH]
Obviously, the DisGeNET score (S) lies between 0 to 1.
Prognostic performance of cHubGs
To investigate the prognostic performance of cHubGs with lung cancer,
we performed multivariate survival analysis of lung cancer patients
based on the expressions of cHubGs by using SurvExpress
web-tool^[116]48. The TCGA database
([117]https://tcga-data.nci.nih.gov) is used in SurvExpress web-tool
for survival analysis with a gene-set. In this analysis, patient
samples were divided in to low-risk and high-risk groups by using the
median of risk-scores (RSs) which is defined by the linear component of
the Cox regression model. That is,
[MATH: RS=β1X1+β2X2+⋯+
mo>βnXn :MATH]
, where
[MATH: Xi :MATH]
is the expression of ith gene,
[MATH: βi :MATH]
is calculated by using the Cox regression approach^[118]49. Generally,
a patient whose risk-score (RS) is greater than median of RSs was
considered in the high-risk group, otherwise, the patient belongs to
the low-risk group. Then the Kaplan–Meier survival plot for each risk
group is constructed for each risk group. The significant difference
between two risk groups was investigated by using the concordance
indexes (CI), hazard-ratio (HR) and log-rank test^[119]50. The
significance level was set to p value < 0.05.
Drug repurposing by molecular docking
To explore cHubGs-guided FDA approved repurposable drug molecules for
the treatment against SARS-CoV-2 infections in presence of IPF risk, we
employed molecular docking analysis of cHubGs and its TFs with 185 drug
agents as introduced previously in the data sources and descriptions
section [see Table [120]S1(I, II)]. The molecular docking analysis
requires 3-Dimensional (3D) structures of both receptor proteins and
drug agents/ligands. We downloaded 3D structure of all targeted
proteins from Protein Data Bank (PDB)^[121]51, AlphaFold Protein
Structure Database^[122]52 and SWISS-MODEL^[123]53. The 3D structures
of all drug agents were downloaded from PubChem database^[124]54. The
3D structures of the target proteins were visualized by using the
Discovery Studio Visualizer 2019^[125]55. Further, the receptor protein
was prepared by removing ligand heteroatoms and water molecules and by
addition of polar hydrogens on AutoDock tools 1.5.7^[126]56. The drug
agents/ligands were prepared by setting the torsion tree and rotatable,
nonrotatable bonds present in the ligand through AutoDock tools 1.5.7.
Then, pairwise binding affinities between the target proteins and drug
agents were calculated using the AutoDock Vina^[127]57. The
exhaustiveness parameter was set to 10. Discovery Studio Visualizer
2019^[128]55 and PyMol^[129]58 were used to analyze the docked
complexes for its surfaces, types and distances of non-covalent bonds.
Let B[ij] is the binding affinity (BA) score corresponding to the ith
receptor (i = 1, 2, … , p) and jth drug (j = 1, 2, … , q). The
receptors and drug agents were ordered by the decreasing order of their
average BA scores for selecting the top-ordered potential drug agents.
We compared the binding performance of our suggested drugs with
previously suggested candidate drugs by Taz et al.^[130]14 by molecular
docking analysis against the Taz et al., suggested receptors as well as
top-ranked independent receptors. Finally, we looked into the
effectiveness of our suggested drugs with randomly selected independent
receptor proteins that were not associated with SARS-CoV-2 infections
in the presence of IPF risk.
Results
Identification of cDEGs
The dataset [131]GSE147507 was analyzed by using the edgeR r-package to
identify DEGs between SARS-CoV-2 infections and control samples. A
total of 851 DEGs were identified by satisfying the cutoff criteria of
adjusted p value < 0.05 and |log[2](aFC)|> 1, where 712 DEGs are
upregulated and 139 DEGs are downregulated. The [132]GSE52463 dataset
was analyzed to identify DEGs between IPF diseases and control samples.
A total of 668 DEGs were found according to the same criteria, where
391 DEGs are upregulated and 277 are downregulated. Then we commonly
found 27 upregulated cDEGs {TNFAIP3, SELE, MX2, PTX3, CH25H, UBASH3A,
EREG, BIRC3, ZBP1, NLRP3, LIF, PRDM1, ADM, VCAM1, FOSL1, CXCR4, CCL22,
IRF4, SLC5A5, SYTL3, ADAMTS4, UBD, CCL17, CPNE5, TNFAIP6, IKZF3, TNIP3}
and 5 downregulated cDEGs {MVD, KIF12, RAAG1GAP, SLC27A3, TMEM160} that
can separate both COVID-19 and IPF patients from the control samples
(see Fig. [133]2).
Figure 2.
Figure 2
[134]Open in a new tab
Venn diagrams for visualizing the upregulated and downregulated cDEGs
that can separate both COVID-19 and IPF patients from the control
samples.
PPI network analysis of cDEGs for identification of cHubGs
The PPI network of cDEGs was constructed using STRING database
(Fig. [135]3) which contains 27 nodes and 115 edges. We selected top
ranked 10 cHubGs {CXCR4, TNFAIP3, VCAM1, NLRP3, TNFAIP6, SELE, MX2,
IRF4, UBD, CH25H} by applying five topological measures (Deg, BN, BC,
Str and Clo) in the PPI network (see Table [136]S2).
Figure 3.
[137]Figure 3
[138]Open in a new tab
Protein–protein interaction (PPI) network of cDEGs for COVID-19 and IPF
disease to identify cHubGs. The red color nodes indicate the cHubGs.
The regulatory network analysis of cHubGs
The cHubGs-TFs interaction network detected top-ranked two sets of TFs
proteins {JUN, SRF and NFKB1} and {JUN, SPI1 and NFKB1} based on JASPAR
and RegNetwork databases respectively, as the key transcriptional
regulatory factors for cHubGs, where two TFs proteins JUN and NFKB1were
common in both-sets (see Fig. [139]4B,C). Based on two databases, we
observed that JUN is a regulator of 7 cHubGs (TNFAIP3, TNFAIP6, IRF4,
SELE, TNFAIP6, NLRP3, VAM1) and NFKB1 for 6 cHubGs (TNFAIP3, IRF4,
CXCR4,UBD, SELE, VAM1). The cHubGs-miRNAs interaction network detected
top-ranked two sets of miRNAs {hsa-mir-155-5p, hsa-mir-27a-5p,
hsa-mir-107, hsa-mir-21-3p, hsa-mir-129-2-3p} and { hsa-mir-155-5p,
hsa-mir-154, hsa-mir-613, hsa-mir-21-3p, hsa-mir-132} based on TarBase
and RegNetwork databases respectively, as the key post-transcriptional
regulatory factors for cHubGs, where two miRNAs hsa-mir-155-5p and
hsa-mir-21-3p were common in both-sets (see Figs. [140]4A,C). Based on
two databases, we observed that hsa-mir-155-5p regulates 6 cHubGs
{IRF4, VAM1, TNFAIP3, TNFAIP6, SELE, CXCR4} and hsa-mir-21-3p regulates
5 cHubGs {VAM1, TNFAIP3, TNFAIP6, MX2, UBD}.
Figure 4.
[141]Figure 4
[142]Open in a new tab
The regulatory network of cHubGs with TFs and miRNAs. The red, light
blue and green color nodes represent the cHubGs, TFs and miRNAs. The
names of cHubGs, key regulatory TFs and miRNAs were displayed only. (A)
The cHubGs-TFs interaction network based on JASPAR database. (B) The
miRNA-cHubGs interaction network based on TarBase database. (C) The
cHubGs-TFs-miRNAs interaction network based on RegNetwork database.
GO functions and KEGG pathway enrichment analysis of cDEGs highlighting
cHubGs
The Table [143]2 displayed the top five commonly enriched GO terms
(BPs, MFs and CCs) and KEGG pathways by cDEGs with three web-tools and
databases Enrichr^[144]41, DAVID^[145]42,[146]43 and GeneCodis^[147]44
(p value < 0.001). Among the top five GO terms of BPs, four BPs
including inflammatory response, regulation of inflammatory response,
response to tumor necrosis factor and response to cytokine were
enriched by the cHubGs-sets with all of three databases. The other
BP-term (response to virus) was enriched by the cHubGs-sets with each
of two databases (DAVID^[148]42,[149]43 and GeneCodis^[150]44). Three
MFs (sequence-specific DNA binding, CCR chemokine receptor binding and
chemokine activity) out of four, were commonly enriched by the
cHubGs-sets with all of three databases. The other MF-term (ubiquitin
binding^[151]59) was enriched by the cHubGs-sets with each of two
databases (Enrichr^[152]41 and GeneCodis^[153]44). Among the top five
significantly enriched CCs terms, four terms including early endosome,
extracellular space, cytosol and extracellular region were commonly
enriched by the cHubGs-sets with two databases out of 3. Only one
CC-term (tertiary granule lumen) was enriched by the cHubGs-sets with
all of 3 databases. Interestingly, all of the top five KEGG pathways
(TNF signaling pathway, IL-17 signaling pathway, NF-kappa B signaling
pathway, NOD-like receptor signaling and Cytokine-cytokine receptor
interaction) were commonly enriched by the cHubGs-sets with all of 3
databases.
Table 2.
The top five commonly enriched GO terms (BPs, MFs and CCs) and KEGG
pathways by cDEGs with three web-tools and databases Enrichr^[154]41,
DAVID^[155]42,[156]43 and GeneCodis^[157]44 (p value < 0.001). The
cHubGs were highlighted by bold gene names. Some references were given