Graphical abstract graphic file with name ga1.jpg [33]Open in a new tab Abstract The novel coronavirus SARS-CoV-2 is damaging the world’s social and economic fabrics seriously. Effective drugs are urgently needed to decrease the high mortality rate of COVID-19 patients. Unfortunately, effective antiviral drugs or vaccines are currently unavailable. Herein, we systematically evaluated the effect of SARS-CoV-2 on gene expression of both lung tissue and blood from COVID-19 patients using transcriptome profiling. Differential gene expression analysis revealed potential core mechanism of COVID-19-induced pneumonia in which IFN-α, IFN-β, IFN-γ, TNF and IL6 triggered cytokine storm mediated by neutrophil, macrophage, B and DC cells. Weighted gene correlation network analysis identified two gene modules that are highly correlated with clinical traits of COVID-19 patients, and confirmed that over-activation of immune system-mediated cytokine release syndrome is the underlying pathogenic mechanism for acute phase of COVID-19 infection. It suggested that anti-inflammatory therapies may be promising regimens for COVID-19 patients. Furthermore, drug repurposing analysis of thousands of drugs revealed that TNFα inhibitor etanercept and γ-aminobutyric acid-B receptor (GABABR) agonist baclofen showed most significant reversal power to COVID-19 gene signature, so we are highly optimistic about their clinical use for COVID-19 treatment. In addition, our results suggested that adalimumab, tocilizumab, rituximab and glucocorticoids may also have beneficial effects in restoring normal transcriptome, but not chloroquine, hydroxychloroquine or interferons. Controlled clinical trials of these candidate drugs are needed in search of effective COVID-19 treatment in current crisis. 1. Introduction Based on the situation report of World Health Organization (WHO), as of December 8, 2020, the outbreak of coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread worldwide with a total of 68,266,812 infections and 1,557,674 deaths [34][1]. The rapid propagation of COVID-19 makes it difficult to tackle. No effective drugs or vaccines also contributes to the uncontrollable situation. Researchers and scientists all over the world are racing against time to characterize the nature of this virus and find novel therapeutics for this virus [35][2], [36][3]. The most striking drug in the current COVID-19 crisis is remdesivir, a broad-spectrum antiviral medication with antiviral activity against several RNA viruses including SARS coronavirus and middle eastern respiratory syndrome virus (MERS) [37][4]. Significant outcome improvements were achieved for severe COVID-19 patients after treatment with compassionate-use of remdesivir [38][4]. However, another study from China with a randomized, double-blind, placebo-controlled and multicenter trial showed that remdesivir was not associated with statistically significant clinical benefits for severe COVID-19 patients [39][5]. Moreover, large scale trials of vaccines are still ongoing though the vaccine must be the line of defense to eventually eliminate this coronavirus. So how to reduce the mortality of infected patients before vaccines are approved, is now the most urgent task of researchers and physicians. Fortunately, one of the most efficient ways to identify effective therapy is through repurposing existing clinic therapeutic drugs using public data since bioinformatic prediction [40][6] and machine learning [41][7] could provide novel biological insigths [42][8]. These drugs can be divided into two broad categories, those that can directly target the virus replication cycle, and those that can prevent the cytokine release syndrome (CRS) and severe inflammatory responses. In a recent study, researchers identified 11 possible covalent drugs targeting the main protease (3CLpro) of SARS-CoV-2 by a computer-aided drug discovery protocol [43][9]. In another study, researchers identified 69 compounds targeting 66 druggable human proteins or host factors from 332 high-confidence SARS-CoV-2-human protein–protein interactions (PPIs) assayed by affinity-purification mass spectrometry. According to the guidelines published by National Institutes of Health (NIH), remdesivir is the only drug recommended for the treatment of COVID-19 in hospitalized patients with severe disease [44][1]. Apart from the antiviral treatment strategy, targeting the inflammatory cascade should be considered, particularly in the most severe cases with acute respiratory distress syndrome (ARDS) and secondary hemophagocytic lymphohistiocytosis (sHLH) caused by CRS. CRS-induced ARDS and sHLH are commonly observed and associated with high mortality in patients with SARS-CoV-2 and SARS-CoV as well as in patients with MERS [45][10]. Several trials on evaluating the safety and effectiveness of immunosuppressants commonly used in rheumatic diseases are ongoing in patients with COVID-19 and CRS, some of which are achieving promising results [46][11]. For example, interleukin-6 inhibitors (e.g., sarilumab and tocilizumab) are potential immunosuppressive drugs for treatments of COVID-19 patients although there are insufficient data from clinical trials [47][1]. However, a systematic approach for drug repurposing based on big data analysis are lacking although the world’s largest clinical trial has launched to evaluate whether existing drugs work to treat people hospitalized with COVID-19 [48][12]. In the current study, we systematically evaluated the effect of SARS-CoV-2 on the transcriptome profiles of both blood and lung tissue from COVID-19 patients. Differential gene analysis revealed key pathways associated with SARS-CoV-2 infection. Moreover, we applied weighted correlation network analysis (WGCNA) to identify gene modules that are highly correlated with clinical traits of COVID-19 patients. Finally, we employed three state-of-the-art methods to predict the potential therapeutic drugs which possess rescue effect on COVID-19 gene signatures. Our results highlight the potential of TNFα inhibitor etanercept and GABABR agonist baclofen for COVID-19 treatment. We also suggest that adalimumab, tocilizumab, rituximab and glucocorticoids may have beneficial effects for COVID-19 treatment. 2. Methods 2.1. Data sources RNA-seq or microarray raw data of [49]GSE147507, [50]GSE36177, [51]GSE128101, [52]GSE145918, [53]GSE119939, [54]GSE48027, [55]GSE30351, [56]GSE74235, [57]GSE76492, [58]GSE112101, [59]GSE40165, [60]GSE18948, [61]GSE67368 and [62]GSE54629 were downloaded from the GEO database ([63]https://www.ncbi.nlm.nih.gov/geo/). RNA-seq data of CRA002390 were downloaded from the National Genomics Data Center database ([64]https://bigd.big.ac.cn/). NanoString nCounter transcriptomic profiling from E-MTAB-8871, and microarray data of E-TABM-116, E-TABM-642, E-GEOD-20272, and RNA-seq data of E-MTAB-5921 were downloaded from Array Express ([65]https://www.ebi.ac.uk/arrayexpress/). Patients’ information was shown in [66]Supplementary Table 1. 2.2. RNA-seq data analysis According to our previous epigenetic study [67][13], the cutadapt ([68]https://cutadapt.readthedocs.org/en/stable/) and FastQC ([69]http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) tools wrapped in Trim Galore ([70]http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) were used to trim low-quality bases (Q < 20) and adapter for raw sequences. Reads with length < 20 bp were removed after trimming. Cleaned RNA-seq reads were mapped to the hg19 genome for human and the mm10 genome for mouse using HISAT2 (v2.1.0) [71][14]. Duplicated reads for pair-end data were removed by SAMtools (v1.5) [72][15]. 2.3. Identification of differential expressed genes (DEGs) RNA-Seq data ([73]GSE147507) for two uninfected human lung biopsies and two lung samples from COVID-19 deceased patient, and RNA-Seq data (CRA002390) for three PBMC samples from healthy donors and COVID-19 patients were analyzed by DESeq2 [74][16] to identify differentially expressed genes, respectively. Similar to our previous study [75][17], false discovery rate (FDR q-value) was calculated by adjusting P-values with the Benjamini-Hochberg method. Genes with FDR q-value < 0.05 and |Log2 (Fold change) | > 1 were considered as DEGs. The volcano plot was drawn using the ggplot2 package by R language. 2.4. Gene ontology (GO) pathway enrichment analysis of DEGs Pathway enrichment analysis was performed by Metascape ([76]https://metascape.org) [77][18]. Upregulated and downregulated DEGs were analyzed in this study, respectively. Pathway with P-value < 0.05 was considered significantly enriched pathway. Pathwaybubble plot was created by R package ggplot2. 2.5. Protein-protein interaction (PPI) network construction STRING database version 11.0 ([78]https://string-db.org/) was used to construct the PPI network with 0.9 highest confidence interaction score. Cytoscape (3.7.1) was used for visualizing the PPI network. The Molecular Complex Detection (MCODE) was applied to screen core modules in the PPI network with parameters as following: node score cutoff = 0.2, k-core = 2, and max. depth = 100, degree cutoff = 2. 2.6. Gene set enrichment analysis (GSEA) To further identify the gene sets that were affected by COVID-19 infection, the whole gene expression profile of [79]GSE147507, CRA002390 and E-MTAB-8871 were analyzed by GSEA with the curated gene sets (c2.all.v7.1.symbols.gmt) from Molecular Signatures Database (MSigDB, [80]https://www.gsea-msigdb.org/) and cell markers of different cell types from CellMarker database ([81]http://biocc.hrbmu.edu.cn/CellMarker/) [82][19]. For enrichment of the reversal effects of potential drugs, the gene signatures of [83]GSE147507, CRA002390 and E-MTAB-8871 were used as the reference gene sets respectively. Gene sets with |NES| > 1 and Nom P-value < 0.05 were considered statistically significant. To directly evaluate the reversal effect of potential drugs, we developed a scoring system as showed below: [MATH: Reversalscore=NESdn×-Log10Pdn-NESup×-Log10Pup :MATH] where [MATH: NESdn :MATH] and [MATH: Pdn :MATH] represent NES and P-value of downregulated genes from COVID-19 gene signature while [MATH: NESup :MATH] and [MATH: Pup :MATH] represent NES and P-value of upregulated genes from COVID-19 gene signature. Higher reversal score indicates a higher reversal capacity of potential drugs. 2.7. WGCNA analysis Similar to our previous study [84][20], the weighted correlation network analysis (WGCNA) was used to construct the gene co-expression network. In this study, whole blood RNA NanoString nCounter transcriptomic profiling (E-MTAB-8871) from 10 healthy donors and a total of 9 days from illness onset to remission of a COVID-19 patient, with their clinic traits, were analyzed. We selected the power = 8 as the soft threshold to ensure a scale-free network. The topological overlap was calculated to measure network interconnectedness, and then average linkage hierarchical clustering was used to identify gene modules whose gene expression was highly correlated with the clinic traits from the gene co-expression network. The minimum number of genes in a module was set to 25. Throughout the analysis, we finally identified 6 modules. The expression of top 30 hub genes from brown and turquoise modules across healthy donors (shown as an average) and each day of illness progression of patients were shown in heatmap created by the R package pheatmap, and their PPI network was drawn by Cytoscape (v3.7.1). 2.8. Verify the utility of gene signature The DEGs associated with the enriched pathway of [85]GSE147507 and CRA002390, and genes from brown and turquoise modules of E-MTAB-8871 were used as processed gene signature for further analysis. Similar to our previous study [86][21], an over-representation method in WebGestalt webserver ([87]http://www.webgestalt.org/) was used to enrich the gene signatures-related diseases. 2.9. Drug predicted by CREEDS and WebGestalt webserver Three COVID-19 gene signatures in this study were put into WebGestalt webserver ([88]http://www.webgestalt.org/), respectively [89][22]. A GSEA method in WebGestalt webserver with the minimum number of genes for a category = 4 was used to enrich potential COVD-19 therapeutic drugs from Drugbank ([90]https://www.drugbank.ca/), and higher enrichment score indicates a higher similarity of the gene signature affected by the drug with our input COVID-19 gene signature. CREEDS webserver ([91]http://amp.pharm.mssm.edu/creeds/) was further used for predicting the potential COVD-19 therapeutic drugs, whose affected gene signatures were opposite to our input COVID-19 gene signatures. Lower signed jaccard index indicates a better reversal effect [92][23]. 3. Results 3.1. Gene signature of COVID-19 lung samples In order to get the gene signature of COVID-19, we first screened the differentially expressed genes (DEGs) of the RNA-seq transcriptome profiling from [93]GSE147507 [94][24]. Two uninfected lung biopsies from two healthy male were used as the control group, and two lung biopsies derived from two male COVID19 deceased patient were used as the COVID-19 group. A total of 1052 genes were reserved with false discovery rate (FDR) q-value < 0.05 and |Log2 (Fold change) | > 1, including 537 upregulated DEGs and 515 downregulated DEGs ([95]Fig. 1A and [96]Supplementary Table 2). Pathway enrichment analyses showed that upregulated genes upon COVID-19 infection were mainly enriched in immune and inflammation-related pathways, such as cytokine-mediated signaling pathway, regulation of cytokine production, neutrophil activation, granulocyte activation, interferon signaling, and regulation of inflammatory response ([97]Fig. 1B). This result was consistent with other reports that severe COVID-19 patients strongly associated with cytokine release syndrome (CRS), leading to acute respiratory distress syndrome (ARDS) and secondary haemophagocytic lymphohistiocytosis (sHLH), which are two of main causes of mortality [98][25], [99][26]. In contrast, downregulated genes were mainly associated with blood vessel development and nervous system development, suggesting that COVID-19 infection can cause damage to blood vessels and nervous systems in lung tissue ([100]Supplementary Fig. 1A). Our gene set enrichment analysis (GSEA) further confirmed that COVID-19 infection was significantly enriched in cytokine-related gene sets, especially interferon-induced antiviral, including IFN-α, IFN-β and IFN-γ. Notably, COVID-19 infection was significantly positively enriched in neutrophil, eosinophil, macrophage and CDC1^+ B dendritic cell, however, COVID-19 infection was negatively associated with multipotent progenitor cell, naïve CD8 + T cell, ionocyte cell, endothelial cell and leydig cell ([101]Fig. 1C). Fig. 1. [102]Fig. 1 [103]Open in a new tab Transcriptome profiling of lung samples from uninfected donor and COVID-19 patients. RNA-Seq data ([104]GSE147507) including two uninfected human lung biopsies and two lung samples from COVID-19 deceased patients were analyzed by DESeq2 differential expression analysis. A. Volcano plot of total 1052 differential expression genes with FDR q-value < 0.05. Red color represented upregulated genes in COVID-19 samples with Log2(fold change) > 1.0 and green represented downregulated genes in COVID-19 samples with Log2(fold change) < 1.0. B. Metascape pathway enrichment of 537 upregulated genes. C. Enrichment plots of GSEA correlation analyses for the whole gene expression profile of COVID-19 and donor lung samples by using curated gene sets (c2.all.v7.1.symbols.gmt) and cell marker gene set (CellMarker). (For interpretation of the references to color in