Abstract
Background
   Higher mortality of COVID-19 patients with lung disease is a formidable
   challenge for the health care system. Genetic association between
   COVID-19 and various lung disorders must be understood to comprehend
   the molecular basis of comorbidity and accelerate drug development.
Methods
   Lungs tissue-specific neighborhood network of human targets of
   SARS-CoV-2 was constructed. This network was integrated with lung
   diseases to build a disease–gene and disease-disease association
   network. Network-based toolset was used to identify the overlapping
   disease modules and drug targets. The functional protein modules were
   identified using community detection algorithms and biological
   processes, and pathway enrichment analysis.
Results
   In total, 141 lung diseases were linked to a neighborhood network of
   SARS-CoV-2 targets, and 59 lung diseases were found to be topologically
   overlapped with the COVID-19 module. Topological overlap with various
   lung disorders allows repurposing of drugs used for these disorders to
   hit the closely associated COVID-19 module. Further analysis showed
   that functional protein–protein interaction modules in the lungs,
   substantially hijacked by SARS-CoV-2, are connected to several lung
   disorders. FDA-approved targets in the hijacked protein modules were
   identified and that can be hit by exiting drugs to rescue these modules
   from virus possession.
Conclusion
   Lung diseases are clustered with COVID-19 in the same network vicinity,
   indicating the potential threat for patients with respiratory diseases
   after SARS-CoV-2 infection. Pathobiological similarities between lung
   diseases and COVID-19 and clinical evidence suggest that shared
   molecular features are the probable reason for comorbidity.
   Network-based drug repurposing approaches can be applied to improve the
   clinical conditions of COVID-19 patients.
Supplementary Information
   The online version contains supplementary material available at
   10.1186/s12920-021-01079-7.
   Keywords: COVID-19, SARS-CoV-2, Lung disease, Comorbidity, Disease
   network
Background
   The novel coronavirus disease 2019 (COVID-19) cases, caused by
   SARS-CoV-2, crossed 189,000,000 globally as of July 16, 2021. Data show
   that the most affected groups had two or more pre-existing medical
   conditions such as hypertension, diabetes, and metabolic,
   cardiovascular, and digestive disorders [[25]1–[26]3]. Moreover,
   comorbidity (or existence of multiple disorders) in COVID-19 patients
   is associated with a higher risk of severe illness, poor prognosis, and
   high mortality [[27]4]. During viral infection, a virus hijacks the
   host cell machinery for its replication. Virus–host interactions
   perturb highly organized host cellular networks and reconstruct
   different networks favouring virus replication. The topology of
   molecular interactions is altered in a disease. Hence, the interaction
   of SARS-CoV-2 with healthy human cells is different from that with
   disease cells, which thus leads to various impacts on humans after
   SARS-CoV-2infection. Human diseases are connected via defects in common
   genes [[28]5, [29]6]. Moreover, the similarity in disease phenotypes
   often indicates underlying genetic connections. Therefore, pre-existing
   medical conditions can facilitate the development of another disease if
   they share the same or functionally related genes [[30]7, [31]8].
   SARS-CoV-2 has been associated with respiratory tract infections, and
   in some cases, it severely damages lungs in adult patients. Here, we
   investigated the underlying molecular link between COVID-19 and lung
   diseases to understand the basis of comorbidity. In the present study,
   we have considered a disease in the lung or symptoms in the lung or
   diseases in other tissues or organs affecting the lungs as a "lung
   disease." Gordon et al. [[32]9] recently identified 26 of the 29
   SARS-CoV-2 proteins that bind to 332 human proteins and hijack the host
   translational machinery. Here, we constructed a tissue (lungs)-specific
   neighborhood network of the 332 human targets of SARS-CoV-2. This
   network was integrated with lung diseases to build a disease–gene
   network of the lung. Subsequently, we constructed a lung disease
   network, which also includes COVID-19. In total, 141lung diseases were
   found to be associated with COVID-19. Among them, 49 were directly
   linked to COVID-19, apparently justifying the characteristics of a
   complex disorder. Further, we observed that 59 lung diseases
   topologically overlapped with COVID-19, indicating a higher risk of
   comorbidity. This observation also presents the opportunity to
   repurpose drugs used to treat lung diseases because these drugs can
   simultaneously hit a lung disease and closely associated COVID-19
   module. Moreover, we observed that genes in overlapping lung diseases
   and COVID-19 are coexpressed and involved in a similar molecular
   function and biological processes, representing pathobiological
   similarities between various lung disorders and COVID-19.
   Next, we identified functional protein modules that are maximally
   perturbed by SARS-CoV-2 and involved in RNA processing, export, and
   protein synthesis machinery of the cell. Moreover, these modules are
   associated with various lung disorders, indicating the hotspots for
   comorbidity. Hence, we employed a network-based proximity approach
   [[33]10] and explored the DrugBank database [[34]11] to identified
   approved targets in these protein modules that can be hit by existing
   drugs and rescued from virus possession. Studies have reported that a
   network-based toolset can be effectively used to identify drugs for
   COVID-19 treatment [[35]12, [36]13].We identified 56 druggable human
   proteins in proximity to the COVID-19 disease module and found that
   these proteins can be targeted by FDA-approved or investigational
   drugs. SARS-CoV-2 has a very high mutation rate, which allows it to
   develop drug resistance [[37]14]. Therefore, identifying and targeting
   host factors, rather than targeting viral proteins, will be an enduring
   approach. In summary, this work presents the risk of different lung
   disorders at COVID-19 onset and drug repurposing opportunities to treat
   patients with lung disorders.
Materials and methods
Construction of a lung-specific PPI network of SARS-CoV-2 targets
   Human lung tissue-specific interactome data were retrieved from the
   TissueNet v.2 database. TissueNet v.2 synergizes between large-scale
   data of human PPIs and tissue-specific expression profiles to generate
   tissue-specific PPIs. This database also consolidates PPI data from
   four major databases, BioGrid, IntAct, MINT, and DIP, and integrates
   resulting PPIs with RNA-sequencing profiles of the Genotype-Tissue
   Expression consortium (GTEx). We downloaded 168,296 lung-specific
   interactions from TissueNet v.2 to construct a SARS-CoV-2 target
   interactome. Next, we obtained a list of 332 human proteins targeted by
   SARS-CoV-2, which were identified through affinity-purification mass
   spectrometry [[38]9]. Using these 332 proteins, we built a subnetwork
   from 168,296 lung-specific interactions, SARS-CoV-2 target network
   (STN). Nine SARS-CoV-2 targets (AATF, CEP43, CISD3, MTARC1, NUP62,
   SRP19, THTPA, TIMM10B, and TRIM59) showed no interaction in the lung.
Construction of a lung-specific disease–gene and disease–disease network
   The disease-gene association data in the lungs were retrieved from the
   Gene ORGANizer [[39]15], which is a phenotype-based curated database
   that links human genes to the body parts they affect. Phenotypes
   classified by Human Phenotype Ontology (HPO) were considered with
   certain modifications. After disease-gene association data were
   pre-processed, disease–gene pairs that were not included but matched
   with the HPO phenotype were manually added. Aspirin-induced asthma and
   asthma were both considered as asthma. Pulmonary emphysema,
   sarcoidosis, and silicosis and their associated genes were also added
   to the list. Finally, 6040 disease–gene pairs and 184 various lung
   diseases were listed. If a gene is associated with a known lung
   disorder, then the gene and lung disorder were connected via links.
   Subsequently, nodes in the STN were linked to the lung disorder to
   construct the disease–gene association map of the network. Of the 5050
   nodes of the STN, 618 were linked to 145 lung diseases. Of the 618
   genes, 36 were the direct targets of SARS-CoV-2 and were connected to
   COVID-19 as a new disease–gene pair. Finally, a lung disease–gene
   network (LDGN) consisting of 1815 disease–gene pairs, including that
   for COVID-19, was constructed. The disease–disease association network
   (DDAN) was derived from the lung disease–gene association network; two
   diseases were connected if they shared one common gene. The disgenet2r
   package [[40]16] was used to study the association between disease
   classes and functional protein modules.
Network-based separation measure between diseases
   To identify the overlapping disease modules, a "separation" measure,
   S[ab] was calculated between COVID-19 (a) and lung disease (b) using
   the following formula:
   [MATH: Sab=
   mrow><dab>
   -<daa>
   -<dbb>
   2 :MATH]
   [MATH: Sab :MATH]
   compares the shortest distances between proteins connected to each
   disease,
   [MATH: <daa>
   :MATH]
   and
   [MATH: <dbb>
   :MATH]
   , to those
   [MATH: <dab>
   :MATH]
   between a–b protein pairs. Positive
   [MATH: Sab :MATH]
   shows that the two disease modules are separated on the lung
   interactome, whereas a negative value indicates overlapping modules.
   The statistical significance of module overlap between COVID-19 and the
   lung disease was evaluated using the full randomization model. The same
   number of proteins associated with two diseases was randomly sampled
   1000 times, and the corresponding
   [MATH: Sabran :MATH]
   between the two gene sets was calculated. Next, z-score was calculated
   as follows:
   [MATH: z-score=S<
   mrow>ab-mσ :MATH]
   where m and σ indicates the mean value and standard deviation of 1000
   [MATH: Sabran :MATH]
   . Here, z-score < 0 indicates that the two diseases are closely
   overlapped than expected by chance [[41]17].
Community detection
   We applied fast-greedy, walktrap, louvain, leading eigenvector, and
   spinglass on the STN as an undirected, unweighted network. These
   community detection algorithms segregate the nodes into higher-density
   modules and optimize an objective function, that is, modularity.
   Communities separated by spinglass were selected for subsequent
   analysis based on the modularity score and community size. Spinglass
   uses a random number generator to find the communities. Therefore, we
   ran Spinglass 10 times with different seed values. We compared the rand
   statistics between each run, and results showed that the structures of
   these communities are highly similar (> 0.7) [[42]18, [43]19].
Network-based proximity measure
   Network proximity between drug targets (A) and SARS-CoV-2 targets in
   the host (B) was measured using the closest method (d[c]).
   [MATH:
   dc=1‖A‖+‖B‖∑a∈A<
   /munder>minb∈Bda,b+
   ∑b∈B<
   mi
   mathvariant="italic">minb∈Ad(a,b) :MATH]
   where d(a, b) represents the shortest distance between genes a and b in
   the lung interactome. The statistical significance of proximity was
   evaluated using z-score (z[c]). z[c] was calculated by comparing the
   observed distance to a reference distance distribution. To compute
   reference distance distribution, the sets of proteins of size and
   degree similar to those of the drug targets and disease proteins were
   randomly selected for 1000 times from the lung interactome. The mean
   and standard deviation of distance distribution was calculated to
   compute z[c] [[44]10, [45]20].
Process and pathway enrichment analysis and gene ontology semantic similarity
   Pathway and process enrichment analysis was performed using Metascape
   [[46]21]. Gene ontology biological processes, KEGG Pathway, and
   Reactome were used as ontology sources. GO semantic similarity between
   genes was measured using Wang et al. [[47]22] method with the GOSemSim
   package in R. Considering that two genes G1 and G2 are annotated by the
   GO term sets GO1 = [go11, go12, …, go1m] and GO2 = [go21, go22,
   …, go2n], respectively, their semantic similarity score, which is
   determined using Wang's method, is defined as follows:
   [MATH: SimG1,G2=<
   msub>∑1≤i≤mSimgo1i,
   GO2+∑1≤j≤nSimgo2j,
   GO1m+n.
   :MATH]
Correlation analysis
   GTEx gene expression datasets of healthy human lung tissues were
   downloaded from the UCSC Xena project [[48]23]. log[2](RSEM + 1) (RSEM:
   RNA-Seq by Expectation Maximization) transformed gene expression data
   (n = 288) were retrieved, and the Pearson correlation coefficient was
   computed to measure coexpression levels using the Hmisc package in R.
Computation of topological parameters
   The largest connected component (LCC), dyadicity, and Jaccard
   similarity coefficient were measured using the igraph package in R.
   Dyadicity (D) is the number of same label edges divided by the expected
   number of same label edges, and D > 1 indicates higher connectedness
   between the nodes with the same label. The Jaccard similarity
   coefficient of two nodes was calculated as the number of common
   neighbors divided by the number of nodes that are neighbors of at least
   one of the two nodes.
Tools for data analysis, plotting, and statistical analysis
   R packages tidyverse and stringr were used for data analysis, and
   graphs were plotted using ggplot2. Networks were visualized using
   Gephi. Statistical significance between the groups was analyzed using
   the non-parametric Mann–Whitney test in R, and that of the overlap
   between gene lists was analyzed using Fisher's exact test.
Results
Construction of SARS-CoV-2–host interactome in the lung
   To depict the SARS-CoV-2–host interaction network, the protein–protein
   interaction (PPI) network of the lungs (lung interactome) was obtained
   from the TissueNet v.2 database [[49]24]. We referred to Gordon et al.
   [[50]9] for the list of 332 human targets of SARS-CoV-2 and constructed
   a subnetwork of these proteins from the PPI network. Of the 332 viral
   targets, 323 proteins were present in the subnetwork. The resulting
   subnetwork, named as the SARS-CoV-2 target network (STN), has 5050
   nodes and 11,256 pairwise interactions (Fig. [51]1a, Additional file
   [52]2: Table S1). Next, 181 of the 323 viral targets form the LCC
   within the lung interactome. To determine the statistical significance
   of the LCC, we randomly selected proteins with a matching degree and
   calculated the size of the LCC. We repeated the random selection 1000
   times and found that the size of the random LCC was 136.28 ± 16.05
   (Fig. [53]1b), and z-score = 2.78 (p-value = 5.36 × 10^−3), indicating
   that the SARS-CoV-2–host interaction network did not appear by chance
   and the target proteins were located in the same network vicinity
   [[54]10, [55]13]. To confirm this result, we computed dyadicity (D) (a
   measure of the connectedness of the nodes with the same label, see
   Materials and Methods) among the SARS-CoV-2 targets in the STN to
   determine if they share more or fewer edges than expected in a random
   configuration of the network. We found D = 7.664, indicating high
   connectedness among SARS-CoV-2 targets. D > 1 signifies that the
   SARS-CoV-2 targets form a community-like structure to hijack the host
   cellular machinery. If implicated in diseases, proteins in a community
   confer a higher chance of comorbidity than those not in the community
   because proteins in a community frequently interact, coexpress, and are
   functionally interconnected [[56]25]. Therefore, to understand the link
   between COVID-19 and other lung diseases, we constructed and analyzed
   the disease–gene and disease–disease association map linked to the STN.
Fig. 1.
   [57]Fig. 1
   [58]Open in a new tab
   a Neighbourhood interaction network of SARS-CoV-2 targets (STN) in the
   lung. The size of the node is proportional to its degree. b SARS-CoV-2
   targets form a LCC of size 181 in the lung interactome. The size of the
   LCC is significantly larger than the random expectation
Disease–gene and disease–disease association map of COVID-19 in lungs
   To construct a disease association map of the STN, we obtained the
   disease-gene association data from the ORGANizer database [[59]15]. In
   total, 184 lung diseases, 1957 genes, and 6040 disease–gene pairs were
   considered for further analysis (see Materials and Methods) (Additional
   file [60]3: Table S2). However, 1442 of the 1957 genes are present in
   the lung interactome. To create the disease–gene association map, we
   screened the diseases associated with proteins (nodes) in the STN. A
   disease and gene are then connected if the gene is associated with the
   lung disorder. We observed that 618 proteins consisting of 36
   SARS-CoV-2 targets were linked to 146 disorders, which includes
   COVID-19 (Additional file [61]4: Table S3). The overlap between
   SARS-CoV-2 targets and 1442 lung disease-associated genes was not
   statistically significant (Fisher's exact test, value = 0.454). Gysi et
   al. [[62]13] reported a similar observation with a group of genes
   involved in various disease classes. However, the overlap between 5050
   nodes in the STN and lung disease-associated genes was statistically
   significant (Fisher's exact test, p-value = 2.93 × 10^−5).
   Figure [63]2a shows the resulting disease–gene association map of the
   STN, named as the lung disease–gene network (LDGN), consisting of1814
   disease–gene pairs.
Fig. 2.
   [64]Fig. 2
   [65]Open in a new tab
   Disease-gene association network. a Lung disease-gene network (LDGN),
   including COVID19 (yellow node). The network shows the SARS-CoV-2
   targets (red) and neighborhood genes (green). b, c Dot plot shows the
   highly connected diseases (k > 20) and genes in LDGN, respectively
   The LCC within the LDGN consists of 141 lung diseases and 610 genes,
   indicating that many of the disorders share a common genotype. For
   example, the SARS-CoV-2 targets, FBN1 (degree, k = 15), FBLN5 (k = 11),
   and COMT (k = 9), and neighborhood nodes, OFD1 (k = 19), DNAAF2
   (k = 16), and DNAAF5 (k = 16), are linked to multiple disorders
   (Fig. [66]2b). Similarly, a disorder in the LDGN is also connected with
   multiple genes [e.g., ventricular septal defect (k = 142), respiratory
   insufficiency (k = 133), congestive heart failure (k = 95), apnea
   (k = 63), and hypothyroidism (k = 60) (Fig. [67]2c, Additional file
   [68]1: Fig. S1 and Fig. S2)].
   The disease-gene association pattern in the LDGN indicates the presence
   of a molecular connection between COVID-19 and a wide range of lung
   disorders. To comprehend this connection, a disease-disease association
   network, DDAN was constructed, where two diseases were linked if they
   share one associated gene (Fig. [69]3a). DDAN consists of 141 diseases
   (nodes) and 1326 links, indicating a higher clustering between
   diseases. Further, the degree distribution of the DDAN did not follow
   the scale-free property (Fig. [70]3b). To determine the exact
   topological nature, we measured network transitivity (
   [MATH: TDDAN :MATH]
   =0.4264) and average path length (
   [MATH: LDDAN :MATH]
   =2.0585) of the DDAN and compared them with the equivalent 1000
   Erdős − Rényi random graphs. The results showed that the average path
   length is significantly lower (p-value < 0.0001), whereas transitivity
   is significantly higher (p-value < 0.0001) than random graphs (
   [MATH: Lrandom=2.44680 :MATH]
   and
   [MATH: Trandom=0 :MATH]
   .0668) (Fig. [71]3c, d). Further, we calculated the small-worldness
   scalar (S) for the DDAN as follows:
   [MATH: γ=TDDANTrandom=6.3
   83 :MATH]
   [MATH: λ=LDDANLrandom=0.8
   41 :MATH]
   [MATH:
   S=γλ=
   7.589 :MATH]
Fig. 3.
   [72]Fig. 3
   [73]Open in a new tab
   Disease-disease association network (DDAN). a DDAN, including COVID19,
   red nodes represent the diseases that are directly direct linked to
   COVID19. b Scatter plot shows the degree distribution of DDAN, which
   does not follow the scale-free property. c The average path length
   between the diseases in DDNA and distribution of average path length of
   1000 random networks (green). d Transitivity of DDNA and distribution
   of transitivity of 1000 random networks (pink)
   A network is considered a small-world network if S > 1 [[74]26]. Hence,
   the topology of the DDAN represents a small-world property, indicating
   that any two diseases in this network have a high tendency to be
   interconnected and may cause the overlapping disease pathogenesis.
   Forty-nine diseases in the DDAN were directly connected to COVID-19.
   Using the number of common genes, the Jaccard similarity coefficient
   was computed to identify the extent of molecular overlap between the 49
   lung diseases and COVID-19 (Additional file [75]1: Fig. S3). Several
   diseases, such as respiratory insufficiency, congestive heart failure,
   respiratory failure, ventricular septal defect, mitral regurgitation,
   and hyperthyroidism, are closely associated with COVID-19. Of note,
   although molecular connections exist between COVID-19 and various lung
   diseases, these overlaps are not statistically significant (considering
   only SARS-CoV-2 targets). Nevertheless, these molecular connections are
   crucial for analyzing the effect of SARS-CoV-2 infection on lung
   patients; however, opportunities to comprehend disease comorbidity are
   limited with these molecular connections.
Topological overlap between disease modules, pathobiological similarities,
and opportunities for drug repurposing
   For a greater understanding of comorbidity, we measured the
   network-based separation between two disease modules to comprehend
   their degree of overlap. The network-based separation measure is
   primarily advantageous because it can predict disease–disease
   association, even if two diseases share no genes. If two disease
   modules overlap, then perturbations to one disease can cause
   disturbance to another, indicating that they have similar clinical
   characteristics. The magnitude of the overlap indicates the biological
   and pathobiological similarities between the two disease modules
   [[76]17]. Network-based separation (
   [MATH: Sab :MATH]
   ) (see Materials and Methods) between COVID-19 and all lung diseases
   was measured. Of the184 lung diseases, 59 demonstrated overlapping
   modules (
   [MATH: Sab<0
   :MATH]
   ) with COVID-19 (Additional file [77]5: Table S4). The statistical
   significance of
   [MATH: Sab :MATH]
   for each disease pair, that is, COVID-19 and each lung disease, was
   evaluated using a full randomization model. We observed, for all 59
   diseases, the z-score was < 0, indicating that these diseases are
   closely overlapped with COVID-19 than expected by chance.
   Figure [78]4a–j shows the top 10 closely overlapping lung disease
   modules with COVID-19 (e.g., hemolytic-uremic syndrome (
   [MATH: Sab=-0.214
   2 :MATH]
   ), abnormal respiratory motile cilium morphology (ARMCM;
   [MATH: Sab=-0.211
   38 :MATH]
   ), obstructive lung disease (
   [MATH: Sab=-0.210
   22 :MATH]
   ), pleural effusion (
   [MATH: Sab=-0.182
   16 :MATH]
   ), patent foramen ovale (PFO,
   [MATH: Sab=-0.161
   9 :MATH]
   ), and pulmonary insufficiency (
   [MATH: Sab=-0.156
   94 :MATH]
   ). Thus, patients with these disorders are probably more vulnerable to
   COVID-19 symptoms or vice versa because of overlapping disease modules.
   The same set of genes induce ARMCM, absent respiratory ciliary axoneme
   radial spokes, and respiratory insufficiency, which are caused because
   of defective ciliary clearance; therefore, we considered only ARMCM in
   the top 10 list. According to the network-based separation measure,
   almost 32% of lung diseases have overlapping modules with COVID-19 and
   the remaining 68% are topologically separated. To understand the
   biological relationship and pathobiological similarities, the
   expression correlation and semantic similarity (molecular functions and
   biological processes) of genes involved in COVID-19 and overlapping
   lung diseases (
   [MATH: Sab<0
   :MATH]
   ).) were measured. Gene coexpression and semantic similarity were
   significantly (p-value < 0.0001) higher compared to those in the random
   control (Fig. [79]4k–m), indicating the biological and pathobiological
   similarities between COVID-19 and overlapping lung diseases. To further
   investigate the similarities in clinical features, results from recent
   publications were explored. Reports have raised concerns about lung
   injuries linked to COVID-19 [[80]27, [81]28]. A higher percentage of
   COVID-19 patients in severe conditions are more likely to develop
   chronic obstructive pulmonary disease (COPD) and impairment of
   diffusion capacity [[82]4, [83]29]. Many lung diseases (
   [MATH: Sab<0) :MATH]
   (Additional file [84]5: Table S4) with a overlapping module with
   COVID-19 are linked to these aforementioned phenotypes. A disease
   closely associated with COVID-19, haemolytic-uremic syndrome
   [MATH: (Sab<0) :MATH]
   (Fig. [85]4a), causes pulmonary hemorrhage, which is linked to kidney
   failure [[86]30], and studies have recently reported that chronic
   kidney diseases and chronic pulmonary disease cause adverse outcomes in
   COVID-19 patients [[87]4, [88]31]. Abnormal respiratory motile cilium
   (Fig. [89]4b) or ciliary dyskinesia (
   [MATH: Sab=-0.211
   38 :MATH]
   ) causes chronic respiratory tract infections because the improper
   movement of mucus restricts the complete elimination of fluid,
   bacteria, and particles from the lungs, leading to bronchitis (chronic
   bronchitis,
   [MATH: Sab=-0.146) :MATH]
   ([90]www.ghr.nlm.nih.gov). Lack of respiratory clearance in a patient
   with ciliary dyskinesia could confer a higher risk of health hazard
   after SARS-CoV-2 infection. Another study showed that patients with
   obstructive lung disease (
   [MATH: Sab=-0.21<
   /mn> :MATH]
   ) (Fig. [91]4c) and pulmonary emphysema (
   [MATH: Sab=-0.09) :MATH]
   are at a higher risk of pneumothorax after SARS-CoV-2 infection
   [[92]32]. Rajendram et al. [[93]33] predicted that PFO (Fig. [94]4e)
   may be common in COVID-19 patients because PFO causes pulmonary
   embolism [[95]34]. Even the disease module of pulmonary embolism
   overlapped (
   [MATH: Sab=-0.008) :MATH]
   with COVID-19. A clinical study in Wuhan, China [[96]35] reported that
   almost 5% COVID-19 patients had pleural effusion(
   [MATH: Sab=-0.18) :MATH]
   (Fig. [97]4d), which is often caused by congestive heart failure and
   blood clots in lung arteries. Importantly, pleural effusion is commonly
   associated with age-related respiratory problems and cancer [[98]36].
   On the other hand, congestive heart failure, which causes many
   lung-related diseases [[99]37], also overlapped with the COVID-19
   disease module (Additional file [100]5: Table S4). A meta-analysis by
   Alqahtani et al. [[101]38] demonstrated that the risk of more severe
   COVID-19 was higher in patients with COPD (risk of severity = 63%) than
   in those without (33.4%). Although these results suggest the clinical
   similarities between COVID-19 and overlapping lung disorders, they are
   limited and cannot be extrapolated for all overlapping lung disorders
   without clinical evidence. Furthermore, a genome-wide association study
   has presented the genetic susceptibility locus in the chromosome of
   patients with COVID-19 and respiratory failure [[102]39], and genes
   present in this locus (SLC6A20, LZTFL1, and CCR9) were also associated
   with different lung disorders (
   [MATH: Sab<0) :MATH]
   such as pulmonary fibrosis, respiratory distress, asthma, and nephrotic
   syndrome.
Fig. 4.
   [103]Fig. 4
   [104]Open in a new tab
   Network-based separation (
   [MATH: Sab :MATH]
   ) and pathobiological similarities. a–j shows observed
   [MATH: Sab :MATH]
   , z-score (red arrow) and distribution of
   [MATH: Sabran :MATH]
   of top 10 overlapping lung diseases with COVID-19 (here, ARMCM
   indicates abnormal respiratory motile cilium morphology). k Box plot
   represents the pairwise correlation between genes is significantly
   (p-value < 0.0001) higher than the random gene sets. l, m Box plots
   show the distribution of functional similarities (MF) and GO processes
   (BP) between the genes involved in lung disease and COVID-19. The GO
   processes and functional similarity between the genes are significantly
   high (p < 0.0001) compared to the random gene sets (note, in figures k,
   l, and m 1–10 indicates disease in a similar sequence as it is
   mentioned in figures a to j). n The strategy of drug repurposing to
   target the COVID19 module
   The availability of efficient drugs for the treatment of clinically
   characterized lung diseases having overlapping neighborhoods with
   COVID-19 has shown the scope for repurposing these drugs for
   COVID-19treatment. When two diseases are localized in the same network
   vicinity and overlap with each other, then targeting one disease can
   affect another disease module (Fig. [105]4n), leading to efficient
   clinical outcomes for both as they have common network neighborhoods
   [[106]20]. Clinical data from the Clinicaltrials.gov database show that
   some drugs used for lung diseases, such as methylprednisolone for
   tracheal stenosis (
   [MATH: SAB=-0.149
   15 :MATH]
   ) and ketamine and budesonide for COPD are in clinical trials for
   COVID-19 treatment [[107]40]. Therefore, existing drugs that are
   presently used for treating lung disorders on COVID-19 patients must be
   tested for better clinical outcomes. Treating a comorbid patient is
   challenging, but an accurate clinical picture of patients, molecular
   signature of diseases, and drug target information can improve the
   present crisis.
Functional protein modules preferentially hijacked by SARS-CoV-2 are linked
to a broad range of lung disorders
   Modularity in the network refers to the pattern of connectedness in
   which nodes are grouped into highly connected subsets [[108]41]. A key
   feature in the PPI network is that tightly connected proteins within a
   community are mostly involved in similar biological functions
   [[109]42]. Similarly, genes involved in related diseases are highly
   connected; moreover, diseases linked to common genes result in the
   formation of disease modules and comorbidity [[110]43]. We compared
   various community detection algorithms, that is, fast-greedy, walktrap,
   louvain, leading eigenvector, and spinglass, to identify protein
   modules in the STN [[111]19, [112]44]. Spinglass showed good
   partitioning with a higher modularity score compared with other
   algorithms (see “Materials and Methods” section and Additional file
   [113]6: Table S5). Our findings are in agreement with those of the
   study by Rahiminejad et al. [[114]18], where good partitioning of the
   functional protein module was observed using spinglass in eukaryotes.
   Of the 21 modules, the top 4 protein modules were selected according to
   the presence of a large number of SARS-CoV-2 targets (> 20) and a gene
   ontology semantic similarity score (> 0.2) of biological processes
   (Additional file [115]6: Table S6). Numerous viral targets were
   considered because these modules are largely hijacked and strongly
   perturbed after infection compared with other functional modules in the
   network. The modules were named as modules 1, 2, 3, and 4, and each
   module contains 63, 50, 28, and 23 SARS-CoV-2 target proteins,
   respectively (Fig. [116]5). The biological process and pathway
   enrichment analysis showed that module1, the largest module, is mainly
   enriched with RNA metabolism, including transcription, mRNA processing,
   transport, mRNA de-adenylation, and surveillance. Presumably,
   biological processes linked to module1 are hijacked by SARS-CoV-2 in
   the early stage of infection for viral RNA production. Notably, the
   components of module1 are linked to 64 disorders, among which the
   highly connected are respiratory insufficiency, ventricular septal
   defect, respiratory distress, pneumonia, and lung neoplasm
   (Fig. [117]5a, 3rd column, Additional file [118]7: Table S7). Most
   module 1-associated diseases are directly connected to COVID-19
   (Fig. [119]3b, Additional file [120]1: Fig. S2). On the other hand,
   hijacking module2 can predominantly affect protein degradation (ERAD
   pathway, HRD1 complex, and regulation of the protein catabolic
   process), transport, folding, and stability (retrograde protein
   transport, regulation of protein stability,
   VCP-VIMP-DERL1-DERL2-HRD1-SEL1L complex, regulation of intracellular
   transport, regulation of vesicle-mediated transport, and protein
   folding in the endoplasmic reticulum). Module3 and module4 involve
   several processes, primarily cellular transport, localization,
   organization, and cell cycle. Modules 2, 3, and 4 were linked to 79,
   60, and 32 different disorders, respectively (Additional file [121]7:
   Table S7). The disease association of all four protein modules was
   significantly higher (p-value < 0.0001) than 1000 random gene sets.
   Moreover, a broad spectrum of disorders of various classes, such as
   neoplasms, neurological, and digestive systems, was associated with
   these modules (Additional file [122]1: Fig. S4). Gysi et al. [[123]13]
   predicted that the manifestation of SARS-CoV-2 in different human
   tissues could cause various disorders. Therefore, not only lung-related
   disorders, but diseases in other organs can also be a potential threat
   for COVID-19 patients. To confirm this observation, gene coexpression
   pattern in functional modules was analyzed. Genes in the same
   functional module often show a high coexpression profile, which
   indicates their involvement in similar biological processes. Therefore,
   we calculated Pearson correlation coefficients of gene pairs using gene
   expression data of healthy lung tissues from GTEx. The median value of
   the positive correlation between the genes in all modules was
   significantly higher (p-value < 0.0001) than that for the random gene
   set (Fig. [124]5, [125]4th column). Therefore, these modules can be
   identified as coexpression modules that share core transcriptional
   programs in the lung, which indicates that their perturbation can lead
   to a similar disease phenotype. Next, we used drug repurposing to find
   the targets to hit functional modules.
Fig. 5.
   [126]Fig. 5
   [127]Open in a new tab
   Community detection in STN and functional protein module. a–d show the
   modules 1 to 4, pathway and process enrichment analysis of each module,
   their disease associations, and positive correlation between genes in
   each module in healthy lung tissue. The pairwise correlation between
   genes in each module is significantly (p-value < 2.2 × e^−16) higher
   than the random gene sets
Drug repurposing to target functional modules
   We propose targeting functional protein modules hijacked by SARS-CoV-2,
   by drug repositioning. There are two main reasons to target these
   modules. First, the binding of a drug to its target in a module will
   prevent virus replication. Second, as a module is linked to several
   lung diseases, targeting a module can reduce severity in patients. From
   Drug Bank, we identified 56 approved targets (red color nodes in
   Fig. [128]6) that can be hit by 144 approved or investigational drugs
   in the clinical trial (Additional file [129]8: Table S8) [[130]11]. The
   list contains10 approved drugs at different stages of clinical trials
   for COVID-19 treatment, including chloroquine targeting Glutathione
   S-transferase Mu 1 in module3 (Additional file [131]1: Fig. S5).
   However, the efficacy of chloroquine on COVID-19 patients is arguable.
   Coagulation factor X (F10) was observed in module2, which has recently
   been implicated as a target due to the potential role of coagulopathy
   in COVID-19 [[132]45]. To determine the effectiveness of targets, we
   applied network-based proximity measures to calculate the proximity
   between the COVID-19 disease module (network among SARS-CoV-2 targets)
   and FDA-approved targets in the functional module. We used the
   "closest" (d[c]) measure, representing the closest path length between
   a target and the nearest SARS-CoV-2 target protein. Then, we calculated
   z-score (z[c]) to validate the proximity by comparing the observed
   target–disease protein distance to the random expectation [[133]10].
   All 56 targets were proximal (z[c] <  − 2) to the COVID-19 disease
   module (Additional file [134]9: Table S9) compared to the random
   expectation. Next, approved drugs in clinical trials for COVID-19
   treatment (Additional file [135]1: Fig. S5) were also significantly
   closer to the COVID-19 disease module (z[c] < − 2) (Additional file
   [136]10: Table S10). The highly connected (hub) targets in the
   functional module, such asNTRK1 (k = 43) and IMPDH2 (k = 37) in
   module1, as well as PLAT (k = 17) and COMT (k = 10) in module2, can be
   considered as potential targets for COVID-19 treatment (Additional file
   [137]1: Fig. S5). Considering the complexity of COVID-19, different
   locations in the STN must be targeted as this may help efficiently
   rewire the cellular network [[138]46] and rescue these functional
   modules from the virus, thereby reducing virus growth [[139]20]. Many
   of the target proteins suggested do not directly interact with
   SARS-CoV-2; instead, they are neighborhood nodes. The binding of drugs
   to these targets present in the same network vicinity may efficiently
   perturb the network modules, including viral growth [[140]47]. Notably,
   this article presents the computation analysis; therefore all
   drug-target combinations should be tested on SARS-CoV-2-infected cell
   lines and validated through clinical trials.
Fig. 6.
   [141]Fig. 6
   [142]Open in a new tab
   Targetable protein in functional modules: The red nodes in each module
   indicate the FDA-approved targets
Discussion
   Currently, a speedy drug discovery is urgently required to stop the
   infection and rapid transmission of SARS-CoV-2. Aged COVID-19 patients
   with comorbidity are at severe health risks worldwide. The present
   study evidenced the risk of COVID-19 at the onset of various
   lung-related disorders and the molecular basis of comorbidity by
   applying the principle of network biology. COVID-19 can be considered
   as complex disease because of wide-ranging SARS-CoV-2 targets in the
   host cell, which thus establishes the molecular connection with various
   lung-related disorders. The disease-gene, disease–disease association
   map, and network separation analysis have revealed molecular links and
   clustering of diseases in the same network vicinity, indicating a
   pathobiological similarity between COVID-19 and various lung disorders.
   Some diseases closely associated with COVID-19 are haemolytic-uremic
   syndrome, obstructive lung disease, pleural effusion, and chronic
   bronchitis. Because of the close association, the pre-existence of
   these diseases can lead to higher mortality of COVID-19 patients.
   Severity of one of the common respiratory problems, asthma, which has
   an overlapping disease module and is directly connected to COVID-19,
   becomes moderate to high with SARS-CoV-2 infection ([143]www.cdc.gov).
   These observations provide a detailed understanding of the molecular
   basis of severe illness in COVID-19 patients with specific lung
   disorders and help us decipher the patient-specific etiology of
   COVID-19. Because of multiple molecular connections and overlapping
   disease modules of COVID-19 with various lung disorders, finding
   specific targets and potential drugs for COVID-19 patients with
   pre-existing medical conditions is challenging. The present crisis
   cannot wait until new drugs arrive; therefore, we proposed two
   approaches for drug repositioning. The first approach is testing drugs
   approved for lung diseases having modules overlapping with COVID-19.
   These drugs can simultaneously affect two disease modules, leading to
   much-improved treatment outcomes. The second approach is targeting host
   functional protein modules that are linked with many lung disorders and
   are primarily hijacked by SARS-CoV-2. Perturbing these modules by
   repurposing FDA-approved (or investigational) drugs may rescue the host
   cellular machinery utilized for virus replication. Considering the
   complexity of SARS-CoV-2 infection, we suggest hitting multiple targets
   in different functional modules to improve clinical outcomes. However,
   systematic studies through clinical trials for identifying drug
   combinations and their targets are highly recommended to increase
   clinical efficacy and lower toxicity [[144]20]. Moreover,
   patient-specific high-throughput transcriptomics data or construction
   of a weighted gene expression networks from SARS-CoV-2-infected lung
   tissues can further the possibility of target identification, like in
   other human diseases [[145]48, [146]49]. In addition, Mendelian
   randomization study can be performed to understand the causal
   relationships between lung diseases and susceptibility and severity of
   COVID-19 [[147]50, [148]51]. Lastly, the experimental validation of our
   observation and in vitro or in vivo assays of drug combination and
   study of pharmacokinetics are warranted to establish a proper treatment
   strategy.
Conclusion
   In summary, this study used the network biology framework to elucidate
   the molecular link between lung disorders and COVID-19. The
   network-based separation measure identified 59 lung diseases
   topologically overlapped with the COVID-19 module. In addition, the
   Disease-disease association network showed forty-nine diseases were
   directly connected to COVID-19. This revealed the cause of severe
   illness of patients with respiratory problems after SARS-CoV-2
   infection. Genes in functional protein modules, hijacked by SARS-CoV-2,
   are coexpressed and connected to several lung diseases. The
   perturbation of these modules may block the virus growth in the host
   cells. Therefore, existing FDA-approved drugs can target the hijacked
   protein modules to avoid the life-threatening situation of COVID-19
   patients with lung disorders.
Supplementary Information
   [149]12920_2021_1079_MOESM1_ESM.docx^ (3.6MB, docx)
   Additional file 1. Fig.S1. Dot plot shows the number of genes
   associated with a lung disorder in LDGN. Fig. S2. Dot plot shows the
   number of shared genes between COVID-19 other lung disorders. Fig.S3.
   The network view of the Jaccard similarity coefficient between lung
   diseases and COVID19. Fig.S4. Heat map shows functional protein modules
   are associated with different disease classes. Fig.S5. Drug repurposing
   to target functional protein modules.
   [150]12920_2021_1079_MOESM2_ESM.pdf^ (703.2KB, pdf)
   Additional file 2. Table S1. Edge list of SARS-CoV-2 target network
   (STN) from TissueNet v.2 database.
   [151]12920_2021_1079_MOESM3_ESM.pdf^ (570.4KB, pdf)
   Additional file 3. Table S2. Disease-gene association data of lungs
   from the ORGANizer database.
   [152]12920_2021_1079_MOESM4_ESM.pdf^ (298.5KB, pdf)
   Additional file 4. Table S3. Disease-gene association data of STN.
   [153]12920_2021_1079_MOESM5_ESM.pdf^ (419.6KB, pdf)
   Additional file 5. Table S4. Network-based separation measure between
   disease modules and statistical significance.
   [154]12920_2021_1079_MOESM6_ESM.pdf^ (408.7KB, pdf)
   Additional file 6. Table S5 and S6. Comparison of different community
   detection algorithms applied to SARS-CoV2 target network (STN) and
   protein modules generated using Spinglass algorithem.
   [155]12920_2021_1079_MOESM7_ESM.pdf^ (429.1KB, pdf)
   Additional file 7. Table S7. Disease association with functional
   protein modules.
   [156]12920_2021_1079_MOESM8_ESM.pdf^ (448.7KB, pdf)
   Additional file 8. Table S8. Targets in functional protein modules and
   drugs from DrugBank database.
   [157]12920_2021_1079_MOESM9_ESM.pdf^ (403.1KB, pdf)
   Additional file 9. Table S9. Proximity between targets and COVID-19
   disease module.
   [158]12920_2021_1079_MOESM10_ESM.pdf^ (397.1KB, pdf)
   Additional file 10. Table S10. Proximity between drugs in clinical
   trial and COVID-19 disease module.
Acknowledgements