Graphical abstract
graphic file with name ga1.jpg
[33]Open in a new tab
Abstract
The novel coronavirus SARS-CoV-2 is damaging the world’s social and
economic fabrics seriously. Effective drugs are urgently needed to
decrease the high mortality rate of COVID-19 patients. Unfortunately,
effective antiviral drugs or vaccines are currently unavailable.
Herein, we systematically evaluated the effect of SARS-CoV-2 on gene
expression of both lung tissue and blood from COVID-19 patients using
transcriptome profiling. Differential gene expression analysis revealed
potential core mechanism of COVID-19-induced pneumonia in which IFN-α,
IFN-β, IFN-γ, TNF and IL6 triggered cytokine storm mediated by
neutrophil, macrophage, B and DC cells. Weighted gene correlation
network analysis identified two gene modules that are highly correlated
with clinical traits of COVID-19 patients, and confirmed that
over-activation of immune system-mediated cytokine release syndrome is
the underlying pathogenic mechanism for acute phase of COVID-19
infection. It suggested that anti-inflammatory therapies may be
promising regimens for COVID-19 patients. Furthermore, drug repurposing
analysis of thousands of drugs revealed that TNFα inhibitor etanercept
and γ-aminobutyric acid-B receptor (GABABR) agonist baclofen showed
most significant reversal power to COVID-19 gene signature, so we are
highly optimistic about their clinical use for COVID-19 treatment. In
addition, our results suggested that adalimumab, tocilizumab, rituximab
and glucocorticoids may also have beneficial effects in restoring
normal transcriptome, but not chloroquine, hydroxychloroquine or
interferons. Controlled clinical trials of these candidate drugs are
needed in search of effective COVID-19 treatment in current crisis.
1. Introduction
Based on the situation report of World Health Organization (WHO), as of
December 8, 2020, the outbreak of coronavirus disease 2019 (COVID-19),
caused by the severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2) has spread worldwide with a total of 68,266,812 infections
and 1,557,674 deaths [34][1]. The rapid propagation of COVID-19 makes
it difficult to tackle. No effective drugs or vaccines also contributes
to the uncontrollable situation. Researchers and scientists all over
the world are racing against time to characterize the nature of this
virus and find novel therapeutics for this virus [35][2], [36][3]. The
most striking drug in the current COVID-19 crisis is remdesivir, a
broad-spectrum antiviral medication with antiviral activity against
several RNA viruses including SARS coronavirus and middle eastern
respiratory syndrome virus (MERS) [37][4]. Significant outcome
improvements were achieved for severe COVID-19 patients after treatment
with compassionate-use of remdesivir [38][4]. However, another study
from China with a randomized, double-blind, placebo-controlled and
multicenter trial showed that remdesivir was not associated with
statistically significant clinical benefits for severe COVID-19
patients [39][5]. Moreover, large scale trials of vaccines are still
ongoing though the vaccine must be the line of defense to eventually
eliminate this coronavirus. So how to reduce the mortality of infected
patients before vaccines are approved, is now the most urgent task of
researchers and physicians.
Fortunately, one of the most efficient ways to identify effective
therapy is through repurposing existing clinic therapeutic drugs using
public data since bioinformatic prediction [40][6] and machine learning
[41][7] could provide novel biological insigths [42][8]. These drugs
can be divided into two broad categories, those that can directly
target the virus replication cycle, and those that can prevent the
cytokine release syndrome (CRS) and severe inflammatory responses. In a
recent study, researchers identified 11 possible covalent drugs
targeting the main protease (3CLpro) of SARS-CoV-2 by a computer-aided
drug discovery protocol [43][9]. In another study, researchers
identified 69 compounds targeting 66 druggable human proteins or host
factors from 332 high-confidence SARS-CoV-2-human protein–protein
interactions (PPIs) assayed by affinity-purification mass spectrometry.
According to the guidelines published by National Institutes of Health
(NIH), remdesivir is the only drug recommended for the treatment of
COVID-19 in hospitalized patients with severe disease [44][1]. Apart
from the antiviral treatment strategy, targeting the inflammatory
cascade should be considered, particularly in the most severe cases
with acute respiratory distress syndrome (ARDS) and secondary
hemophagocytic lymphohistiocytosis (sHLH) caused by CRS. CRS-induced
ARDS and sHLH are commonly observed and associated with high mortality
in patients with SARS-CoV-2 and SARS-CoV as well as in patients with
MERS [45][10]. Several trials on evaluating the safety and
effectiveness of immunosuppressants commonly used in rheumatic diseases
are ongoing in patients with COVID-19 and CRS, some of which are
achieving promising results [46][11]. For example, interleukin-6
inhibitors (e.g., sarilumab and tocilizumab) are potential
immunosuppressive drugs for treatments of COVID-19 patients although
there are insufficient data from clinical trials [47][1]. However, a
systematic approach for drug repurposing based on big data analysis are
lacking although the world’s largest clinical trial has launched to
evaluate whether existing drugs work to treat people hospitalized with
COVID-19 [48][12].
In the current study, we systematically evaluated the effect of
SARS-CoV-2 on the transcriptome profiles of both blood and lung tissue
from COVID-19 patients. Differential gene analysis revealed key
pathways associated with SARS-CoV-2 infection. Moreover, we applied
weighted correlation network analysis (WGCNA) to identify gene modules
that are highly correlated with clinical traits of COVID-19 patients.
Finally, we employed three state-of-the-art methods to predict the
potential therapeutic drugs which possess rescue effect on COVID-19
gene signatures. Our results highlight the potential of TNFα inhibitor
etanercept and GABABR agonist baclofen for COVID-19 treatment. We also
suggest that adalimumab, tocilizumab, rituximab and glucocorticoids may
have beneficial effects for COVID-19 treatment.
2. Methods
2.1. Data sources
RNA-seq or microarray raw data of [49]GSE147507, [50]GSE36177,
[51]GSE128101, [52]GSE145918, [53]GSE119939, [54]GSE48027,
[55]GSE30351, [56]GSE74235, [57]GSE76492, [58]GSE112101, [59]GSE40165,
[60]GSE18948, [61]GSE67368 and [62]GSE54629 were downloaded from the
GEO database ([63]https://www.ncbi.nlm.nih.gov/geo/). RNA-seq data of
CRA002390 were downloaded from the National Genomics Data Center
database ([64]https://bigd.big.ac.cn/). NanoString nCounter
transcriptomic profiling from E-MTAB-8871, and microarray data of
E-TABM-116, E-TABM-642, E-GEOD-20272, and RNA-seq data of E-MTAB-5921
were downloaded from Array Express
([65]https://www.ebi.ac.uk/arrayexpress/). Patients’ information was
shown in [66]Supplementary Table 1.
2.2. RNA-seq data analysis
According to our previous epigenetic study [67][13], the cutadapt
([68]https://cutadapt.readthedocs.org/en/stable/) and FastQC
([69]http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) tools
wrapped in Trim Galore
([70]http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)
were used to trim low-quality bases (Q < 20) and adapter for raw
sequences. Reads with length < 20 bp were removed after trimming.
Cleaned RNA-seq reads were mapped to the hg19 genome for human and the
mm10 genome for mouse using HISAT2 (v2.1.0) [71][14]. Duplicated reads
for pair-end data were removed by SAMtools (v1.5) [72][15].
2.3. Identification of differential expressed genes (DEGs)
RNA-Seq data ([73]GSE147507) for two uninfected human lung biopsies and
two lung samples from COVID-19 deceased patient, and RNA-Seq data
(CRA002390) for three PBMC samples from healthy donors and COVID-19
patients were analyzed by DESeq2 [74][16] to identify differentially
expressed genes, respectively. Similar to our previous study [75][17],
false discovery rate (FDR q-value) was calculated by adjusting P-values
with the Benjamini-Hochberg method. Genes with FDR q-value < 0.05 and
|Log2 (Fold change) | > 1 were considered as DEGs.
The volcano plot was drawn using the ggplot2 package by R language.
2.4. Gene ontology (GO) pathway enrichment analysis of DEGs
Pathway enrichment analysis was performed by Metascape
([76]https://metascape.org) [77][18]. Upregulated and downregulated
DEGs were analyzed in this study, respectively. Pathway with
P-value < 0.05 was considered significantly enriched pathway.
Pathwaybubble plot was created by R package ggplot2.
2.5. Protein-protein interaction (PPI) network construction
STRING database version 11.0 ([78]https://string-db.org/) was used to
construct the PPI network with 0.9 highest confidence interaction
score. Cytoscape (3.7.1) was used for visualizing the PPI network. The
Molecular Complex Detection (MCODE) was applied to screen core modules
in the PPI network with parameters as following: node score
cutoff = 0.2, k-core = 2, and max. depth = 100, degree cutoff = 2.
2.6. Gene set enrichment analysis (GSEA)
To further identify the gene sets that were affected by COVID-19
infection, the whole gene expression profile of [79]GSE147507,
CRA002390 and E-MTAB-8871 were analyzed by GSEA with the curated gene
sets (c2.all.v7.1.symbols.gmt) from Molecular Signatures Database
(MSigDB, [80]https://www.gsea-msigdb.org/) and cell markers of
different cell types from CellMarker database
([81]http://biocc.hrbmu.edu.cn/CellMarker/) [82][19]. For enrichment of
the reversal effects of potential drugs, the gene signatures of
[83]GSE147507, CRA002390 and E-MTAB-8871 were used as the reference
gene sets respectively. Gene sets with |NES| > 1 and Nom P-value < 0.05
were considered statistically significant. To directly evaluate the
reversal effect of potential drugs, we developed a scoring system as
showed below:
[MATH: Reversalscore=NESdn×-Log10Pdn-NESup×-Log10Pup :MATH]
where
[MATH: NESdn :MATH]
and
[MATH: Pdn :MATH]
represent NES and P-value of downregulated genes from COVID-19 gene
signature while
[MATH: NESup :MATH]
and
[MATH: Pup :MATH]
represent NES and P-value of upregulated genes from COVID-19 gene
signature. Higher reversal score indicates a higher reversal capacity
of potential drugs.
2.7. WGCNA analysis
Similar to our previous study [84][20], the weighted correlation
network analysis (WGCNA) was used to construct the gene co-expression
network. In this study, whole blood RNA NanoString nCounter
transcriptomic profiling (E-MTAB-8871) from 10 healthy donors and a
total of 9 days from illness onset to remission of a COVID-19 patient,
with their clinic traits, were analyzed. We selected the power = 8 as
the soft threshold to ensure a scale-free network. The topological
overlap was calculated to measure network interconnectedness, and then
average linkage hierarchical clustering was used to identify gene
modules whose gene expression was highly correlated with the clinic
traits from the gene co-expression network. The minimum number of genes
in a module was set to 25. Throughout the analysis, we finally
identified 6 modules. The expression of top 30 hub genes from brown and
turquoise modules across healthy donors (shown as an average) and each
day of illness progression of patients were shown in heatmap created by
the R package pheatmap, and their PPI network was drawn by Cytoscape
(v3.7.1).
2.8. Verify the utility of gene signature
The DEGs associated with the enriched pathway of [85]GSE147507 and
CRA002390, and genes from brown and turquoise modules of E-MTAB-8871
were used as processed gene signature for further analysis. Similar to
our previous study [86][21], an over-representation method in
WebGestalt webserver ([87]http://www.webgestalt.org/) was used to
enrich the gene signatures-related diseases.
2.9. Drug predicted by CREEDS and WebGestalt webserver
Three COVID-19 gene signatures in this study were put into WebGestalt
webserver ([88]http://www.webgestalt.org/), respectively [89][22]. A
GSEA method in WebGestalt webserver with the minimum number of genes
for a category = 4 was used to enrich potential COVD-19 therapeutic
drugs from Drugbank ([90]https://www.drugbank.ca/), and higher
enrichment score indicates a higher similarity of the gene signature
affected by the drug with our input COVID-19 gene signature. CREEDS
webserver ([91]http://amp.pharm.mssm.edu/creeds/) was further used for
predicting the potential COVD-19 therapeutic drugs, whose affected gene
signatures were opposite to our input COVID-19 gene signatures. Lower
signed jaccard index indicates a better reversal effect [92][23].
3. Results
3.1. Gene signature of COVID-19 lung samples
In order to get the gene signature of COVID-19, we first screened the
differentially expressed genes (DEGs) of the RNA-seq transcriptome
profiling from [93]GSE147507 [94][24]. Two uninfected lung biopsies
from two healthy male were used as the control group, and two lung
biopsies derived from two male COVID19 deceased patient were used as
the COVID-19 group. A total of 1052 genes were reserved with false
discovery rate (FDR) q-value < 0.05 and |Log2 (Fold change) | > 1,
including 537 upregulated DEGs and 515 downregulated DEGs ([95]Fig. 1A
and [96]Supplementary Table 2). Pathway enrichment analyses showed that
upregulated genes upon COVID-19 infection were mainly enriched in
immune and inflammation-related pathways, such as cytokine-mediated
signaling pathway, regulation of cytokine production, neutrophil
activation, granulocyte activation, interferon signaling, and
regulation of inflammatory response ([97]Fig. 1B). This result was
consistent with other reports that severe COVID-19 patients strongly
associated with cytokine release syndrome (CRS), leading to acute
respiratory distress syndrome (ARDS) and secondary haemophagocytic
lymphohistiocytosis (sHLH), which are two of main causes of mortality
[98][25], [99][26]. In contrast, downregulated genes were mainly
associated with blood vessel development and nervous system
development, suggesting that COVID-19 infection can cause damage to
blood vessels and nervous systems in lung tissue ([100]Supplementary
Fig. 1A). Our gene set enrichment analysis (GSEA) further confirmed
that COVID-19 infection was significantly enriched in cytokine-related
gene sets, especially interferon-induced antiviral, including IFN-α,
IFN-β and IFN-γ. Notably, COVID-19 infection was significantly
positively enriched in neutrophil, eosinophil, macrophage and CDC1^+ B
dendritic cell, however, COVID-19 infection was negatively associated
with multipotent progenitor cell, naïve CD8 + T cell, ionocyte cell,
endothelial cell and leydig cell ([101]Fig. 1C).
Fig. 1.
[102]Fig. 1
[103]Open in a new tab
Transcriptome profiling of lung samples from uninfected donor and
COVID-19 patients. RNA-Seq data ([104]GSE147507) including two
uninfected human lung biopsies and two lung samples from COVID-19
deceased patients were analyzed by DESeq2 differential expression
analysis. A. Volcano plot of total 1052 differential expression genes
with FDR q-value < 0.05. Red color represented upregulated genes in
COVID-19 samples with Log2(fold change) > 1.0 and green represented
downregulated genes in COVID-19 samples with Log2(fold change) < 1.0.
B. Metascape pathway enrichment of 537 upregulated genes. C. Enrichment
plots of GSEA correlation analyses for the
whole gene expression profile of COVID-19 and donor lung samples by
using curated gene sets (c2.all.v7.1.symbols.gmt) and cell marker gene
set (CellMarker). (For interpretation of the references to color in