Graphical abstract

   graphic file with name ga1.jpg
   [33]Open in a new tab

Abstract

   The novel coronavirus SARS-CoV-2 is damaging the world’s social and
   economic fabrics seriously. Effective drugs are urgently needed to
   decrease the high mortality rate of COVID-19 patients. Unfortunately,
   effective antiviral drugs or vaccines are currently unavailable.
   Herein, we systematically evaluated the effect of SARS-CoV-2 on gene
   expression of both lung tissue and blood from COVID-19 patients using
   transcriptome profiling. Differential gene expression analysis revealed
   potential core mechanism of COVID-19-induced pneumonia in which IFN-α,
   IFN-β, IFN-γ, TNF and IL6 triggered cytokine storm mediated by
   neutrophil, macrophage, B and DC cells. Weighted gene correlation
   network analysis identified two gene modules that are highly correlated
   with clinical traits of COVID-19 patients, and confirmed that
   over-activation of immune system-mediated cytokine release syndrome is
   the underlying pathogenic mechanism for acute phase of COVID-19
   infection. It suggested that anti-inflammatory therapies may be
   promising regimens for COVID-19 patients. Furthermore, drug repurposing
   analysis of thousands of drugs revealed that TNFα inhibitor etanercept
   and γ-aminobutyric acid-B receptor (GABABR) agonist baclofen showed
   most significant reversal power to COVID-19 gene signature, so we are
   highly optimistic about their clinical use for COVID-19 treatment. In
   addition, our results suggested that adalimumab, tocilizumab, rituximab
   and glucocorticoids may also have beneficial effects in restoring
   normal transcriptome, but not chloroquine, hydroxychloroquine or
   interferons. Controlled clinical trials of these candidate drugs are
   needed in search of effective COVID-19 treatment in current crisis.

1. Introduction

   Based on the situation report of World Health Organization (WHO), as of
   December 8, 2020, the outbreak of coronavirus disease 2019 (COVID-19),
   caused by the severe acute respiratory syndrome coronavirus 2
   (SARS-CoV-2) has spread worldwide with a total of 68,266,812 infections
   and 1,557,674 deaths [34][1]. The rapid propagation of COVID-19 makes
   it difficult to tackle. No effective drugs or vaccines also contributes
   to the uncontrollable situation. Researchers and scientists all over
   the world are racing against time to characterize the nature of this
   virus and find novel therapeutics for this virus [35][2], [36][3]. The
   most striking drug in the current COVID-19 crisis is remdesivir, a
   broad-spectrum antiviral medication with antiviral activity against
   several RNA viruses including SARS coronavirus and middle eastern
   respiratory syndrome virus (MERS) [37][4]. Significant outcome
   improvements were achieved for severe COVID-19 patients after treatment
   with compassionate-use of remdesivir [38][4]. However, another study
   from China with a randomized, double-blind, placebo-controlled and
   multicenter trial showed that remdesivir was not associated with
   statistically significant clinical benefits for severe COVID-19
   patients [39][5]. Moreover, large scale trials of vaccines are still
   ongoing though the vaccine must be the line of defense to eventually
   eliminate this coronavirus. So how to reduce the mortality of infected
   patients before vaccines are approved, is now the most urgent task of
   researchers and physicians.

   Fortunately, one of the most efficient ways to identify effective
   therapy is through repurposing existing clinic therapeutic drugs using
   public data since bioinformatic prediction [40][6] and machine learning
   [41][7] could provide novel biological insigths [42][8]. These drugs
   can be divided into two broad categories, those that can directly
   target the virus replication cycle, and those that can prevent the
   cytokine release syndrome (CRS) and severe inflammatory responses. In a
   recent study, researchers identified 11 possible covalent drugs
   targeting the main protease (3CLpro) of SARS-CoV-2 by a computer-aided
   drug discovery protocol [43][9]. In another study, researchers
   identified 69 compounds targeting 66 druggable human proteins or host
   factors from 332 high-confidence SARS-CoV-2-human protein–protein
   interactions (PPIs) assayed by affinity-purification mass spectrometry.
   According to the guidelines published by National Institutes of Health
   (NIH), remdesivir is the only drug recommended for the treatment of
   COVID-19 in hospitalized patients with severe disease [44][1]. Apart
   from the antiviral treatment strategy, targeting the inflammatory
   cascade should be considered, particularly in the most severe cases
   with acute respiratory distress syndrome (ARDS) and secondary
   hemophagocytic lymphohistiocytosis (sHLH) caused by CRS. CRS-induced
   ARDS and sHLH are commonly observed and associated with high mortality
   in patients with SARS-CoV-2 and SARS-CoV as well as in patients with
   MERS [45][10]. Several trials on evaluating the safety and
   effectiveness of immunosuppressants commonly used in rheumatic diseases
   are ongoing in patients with COVID-19 and CRS, some of which are
   achieving promising results [46][11]. For example, interleukin-6
   inhibitors (e.g., sarilumab and tocilizumab) are potential
   immunosuppressive drugs for treatments of COVID-19 patients although
   there are insufficient data from clinical trials [47][1]. However, a
   systematic approach for drug repurposing based on big data analysis are
   lacking although the world’s largest clinical trial has launched to
   evaluate whether existing drugs work to treat people hospitalized with
   COVID-19 [48][12].

   In the current study, we systematically evaluated the effect of
   SARS-CoV-2 on the transcriptome profiles of both blood and lung tissue
   from COVID-19 patients. Differential gene analysis revealed key
   pathways associated with SARS-CoV-2 infection. Moreover, we applied
   weighted correlation network analysis (WGCNA) to identify gene modules
   that are highly correlated with clinical traits of COVID-19 patients.
   Finally, we employed three state-of-the-art methods to predict the
   potential therapeutic drugs which possess rescue effect on COVID-19
   gene signatures. Our results highlight the potential of TNFα inhibitor
   etanercept and GABABR agonist baclofen for COVID-19 treatment. We also
   suggest that adalimumab, tocilizumab, rituximab and glucocorticoids may
   have beneficial effects for COVID-19 treatment.

2. Methods

2.1. Data sources

   RNA-seq or microarray raw data of [49]GSE147507, [50]GSE36177,
   [51]GSE128101, [52]GSE145918, [53]GSE119939, [54]GSE48027,
   [55]GSE30351, [56]GSE74235, [57]GSE76492, [58]GSE112101, [59]GSE40165,
   [60]GSE18948, [61]GSE67368 and [62]GSE54629 were downloaded from the
   GEO database ([63]https://www.ncbi.nlm.nih.gov/geo/). RNA-seq data of
   CRA002390 were downloaded from the National Genomics Data Center
   database ([64]https://bigd.big.ac.cn/). NanoString nCounter
   transcriptomic profiling from E-MTAB-8871, and microarray data of
   E-TABM-116, E-TABM-642, E-GEOD-20272, and RNA-seq data of E-MTAB-5921
   were downloaded from Array Express
   ([65]https://www.ebi.ac.uk/arrayexpress/). Patients’ information was
   shown in [66]Supplementary Table 1.

2.2. RNA-seq data analysis

   According to our previous epigenetic study [67][13], the cutadapt
   ([68]https://cutadapt.readthedocs.org/en/stable/) and FastQC
   ([69]http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) tools
   wrapped in Trim Galore
   ([70]http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)
   were used to trim low-quality bases (Q < 20) and adapter for raw
   sequences. Reads with length < 20 bp were removed after trimming.
   Cleaned RNA-seq reads were mapped to the hg19 genome for human and the
   mm10 genome for mouse using HISAT2 (v2.1.0) [71][14]. Duplicated reads
   for pair-end data were removed by SAMtools (v1.5) [72][15].

2.3. Identification of differential expressed genes (DEGs)

   RNA-Seq data ([73]GSE147507) for two uninfected human lung biopsies and
   two lung samples from COVID-19 deceased patient, and RNA-Seq data
   (CRA002390) for three PBMC samples from healthy donors and COVID-19
   patients were analyzed by DESeq2 [74][16] to identify differentially
   expressed genes, respectively. Similar to our previous study [75][17],
   false discovery rate (FDR q-value) was calculated by adjusting P-values
   with the Benjamini-Hochberg method. Genes with FDR q-value < 0.05 and
   |Log2 (Fold change) | > 1 were considered as DEGs.
   The volcano plot was drawn using the ggplot2 package by R language.

2.4. Gene ontology (GO) pathway enrichment analysis of DEGs

   Pathway enrichment analysis was performed by Metascape
   ([76]https://metascape.org) [77][18]. Upregulated and downregulated
   DEGs were analyzed in this study, respectively. Pathway with
   P-value < 0.05 was considered significantly enriched pathway.
   Pathwaybubble plot was created by R package ggplot2.

2.5. Protein-protein interaction (PPI) network construction

   STRING database version 11.0 ([78]https://string-db.org/) was used to
   construct the PPI network with 0.9 highest confidence interaction
   score. Cytoscape (3.7.1) was used for visualizing the PPI network. The
   Molecular Complex Detection (MCODE) was applied to screen core modules
   in the PPI network with parameters as following: node score
   cutoff = 0.2, k-core = 2, and max. depth = 100, degree cutoff = 2.

2.6. Gene set enrichment analysis (GSEA)

   To further identify the gene sets that were affected by COVID-19
   infection, the whole gene expression profile of [79]GSE147507,
   CRA002390 and E-MTAB-8871 were analyzed by GSEA with the curated gene
   sets (c2.all.v7.1.symbols.gmt) from Molecular Signatures Database
   (MSigDB, [80]https://www.gsea-msigdb.org/) and cell markers of
   different cell types from CellMarker database
   ([81]http://biocc.hrbmu.edu.cn/CellMarker/) [82][19]. For enrichment of
   the reversal effects of potential drugs, the gene signatures of
   [83]GSE147507, CRA002390 and E-MTAB-8871 were used as the reference
   gene sets respectively. Gene sets with |NES| > 1 and Nom P-value < 0.05
   were considered statistically significant. To directly evaluate the
   reversal effect of potential drugs, we developed a scoring system as
   showed below:
   [MATH: <mrow><mi mathvariant="italic">Reversal</mi><mspace
   width="0.35em"></mspace><mi
   mathvariant="italic">score</mi><mo>=</mo><msup><mrow><mi
   mathvariant="italic">NES</mi></mrow><mrow><mi
   mathvariant="italic">dn</mi></mrow></msup><mo>×</mo><mrow><mfenced><mro
   w><mo>-</mo><mi>L</mi><mi>o</mi><mi>g</mi><mn>10</mn><mrow><mfenced><ms
   up><mrow><mi>P</mi></mrow><mrow><mi
   mathvariant="italic">dn</mi></mrow></msup></mfenced></mrow></mrow></mfe
   nced></mrow><mo>-</mo><msup><mrow><mi
   mathvariant="italic">NES</mi></mrow><mrow><mi
   mathvariant="italic">up</mi></mrow></msup><mo>×</mo><mrow><mfenced><mro
   w><mo>-</mo><mi>L</mi><mi>o</mi><mi>g</mi><mn>10</mn><mrow><mfenced><ms
   up><mrow><mi>P</mi></mrow><mrow><mi
   mathvariant="italic">up</mi></mrow></msup></mfenced></mrow></mrow></mfe
   nced></mrow></mrow> :MATH]

   where
   [MATH: <mrow><msup><mrow><mi
   mathvariant="italic">NES</mi></mrow><mrow><mi
   mathvariant="italic">dn</mi></mrow></msup></mrow> :MATH]
   and
   [MATH: <mrow><msup><mrow><mi>P</mi></mrow><mrow><mi
   mathvariant="italic">dn</mi></mrow></msup></mrow> :MATH]
   represent NES and P-value of downregulated genes from COVID-19 gene
   signature while
   [MATH: <mrow><msup><mrow><mi
   mathvariant="italic">NES</mi></mrow><mrow><mi
   mathvariant="italic">up</mi></mrow></msup></mrow> :MATH]
   and
   [MATH: <mrow><msup><mrow><mi>P</mi></mrow><mrow><mi
   mathvariant="italic">up</mi></mrow></msup></mrow> :MATH]
   represent NES and P-value of upregulated genes from COVID-19 gene
   signature. Higher reversal score indicates a higher reversal capacity
   of potential drugs.

2.7. WGCNA analysis

   Similar to our previous study [84][20], the weighted correlation
   network analysis (WGCNA) was used to construct the gene co-expression
   network. In this study, whole blood RNA NanoString nCounter
   transcriptomic profiling (E-MTAB-8871) from 10 healthy donors and a
   total of 9 days from illness onset to remission of a COVID-19 patient,
   with their clinic traits, were analyzed. We selected the power = 8 as
   the soft threshold to ensure a scale-free network. The topological
   overlap was calculated to measure network interconnectedness, and then
   average linkage hierarchical clustering was used to identify gene
   modules whose gene expression was highly correlated with the clinic
   traits from the gene co-expression network. The minimum number of genes
   in a module was set to 25. Throughout the analysis, we finally
   identified 6 modules. The expression of top 30 hub genes from brown and
   turquoise modules across healthy donors (shown as an average) and each
   day of illness progression of patients were shown in heatmap created by
   the R package pheatmap, and their PPI network was drawn by Cytoscape
   (v3.7.1).

2.8. Verify the utility of gene signature

   The DEGs associated with the enriched pathway of [85]GSE147507 and
   CRA002390, and genes from brown and turquoise modules of E-MTAB-8871
   were used as processed gene signature for further analysis. Similar to
   our previous study [86][21], an over-representation method in
   WebGestalt webserver ([87]http://www.webgestalt.org/) was used to
   enrich the gene signatures-related diseases.

2.9. Drug predicted by CREEDS and WebGestalt webserver

   Three COVID-19 gene signatures in this study were put into WebGestalt
   webserver ([88]http://www.webgestalt.org/), respectively [89][22]. A
   GSEA method in WebGestalt webserver with the minimum number of genes
   for a category = 4 was used to enrich potential COVD-19 therapeutic
   drugs from Drugbank ([90]https://www.drugbank.ca/), and higher
   enrichment score indicates a higher similarity of the gene signature
   affected by the drug with our input COVID-19 gene signature. CREEDS
   webserver ([91]http://amp.pharm.mssm.edu/creeds/) was further used for
   predicting the potential COVD-19 therapeutic drugs, whose affected gene
   signatures were opposite to our input COVID-19 gene signatures. Lower
   signed jaccard index indicates a better reversal effect [92][23].

3. Results

3.1. Gene signature of COVID-19 lung samples

   In order to get the gene signature of COVID-19, we first screened the
   differentially expressed genes (DEGs) of the RNA-seq transcriptome
   profiling from [93]GSE147507 [94][24]. Two uninfected lung biopsies
   from two healthy male were used as the control group, and two lung
   biopsies derived from two male COVID19 deceased patient were used as
   the COVID-19 group. A total of 1052 genes were reserved with false
   discovery rate (FDR) q-value < 0.05 and |Log2 (Fold change) | > 1,
   including 537 upregulated DEGs and 515 downregulated DEGs ([95]Fig. 1A
   and [96]Supplementary Table 2). Pathway enrichment analyses showed that
   upregulated genes upon COVID-19 infection were mainly enriched in
   immune and inflammation-related pathways, such as cytokine-mediated
   signaling pathway, regulation of cytokine production, neutrophil
   activation, granulocyte activation, interferon signaling, and
   regulation of inflammatory response ([97]Fig. 1B). This result was
   consistent with other reports that severe COVID-19 patients strongly
   associated with cytokine release syndrome (CRS), leading to acute
   respiratory distress syndrome (ARDS) and secondary haemophagocytic
   lymphohistiocytosis (sHLH), which are two of main causes of mortality
   [98][25], [99][26]. In contrast, downregulated genes were mainly
   associated with blood vessel development and nervous system
   development, suggesting that COVID-19 infection can cause damage to
   blood vessels and nervous systems in lung tissue ([100]Supplementary
   Fig. 1A). Our gene set enrichment analysis (GSEA) further confirmed
   that COVID-19 infection was significantly enriched in cytokine-related
   gene sets, especially interferon-induced antiviral, including IFN-α,
   IFN-β and IFN-γ. Notably, COVID-19 infection was significantly
   positively enriched in neutrophil, eosinophil, macrophage and CDC1^+ B
   dendritic cell, however, COVID-19 infection was negatively associated
   with multipotent progenitor cell, naïve CD8 + T cell, ionocyte cell,
   endothelial cell and leydig cell ([101]Fig. 1C).

Fig. 1.

   [102]Fig. 1
   [103]Open in a new tab

   Transcriptome profiling of lung samples from uninfected donor and
   COVID-19 patients. RNA-Seq data ([104]GSE147507) including two
   uninfected human lung biopsies and two lung samples from COVID-19
   deceased patients were analyzed by DESeq2 differential expression
   analysis. A. Volcano plot of total 1052 differential expression genes
   with FDR q-value < 0.05. Red color represented upregulated genes in
   COVID-19 samples with Log2(fold change) > 1.0 and green represented
   downregulated genes in COVID-19 samples with Log2(fold change) < 1.0.
   B. Metascape pathway enrichment of 537 upregulated genes. C. Enrichment
   plots of GSEA correlation analyses for the
   whole gene expression profile of COVID-19 and donor lung samples by
   using curated gene sets (c2.all.v7.1.symbols.gmt) and cell marker gene
   set (CellMarker). (For interpretation of the references to color in