Abstract Given the considerable cost of drug discovery, drug repurposing is becoming attractive as it can effectively shorten the development timeline and reduce the development cost. However, most existing drug-repurposing methods omitted the heterogeneous health conditions of different COVID-19 patients. In this study, we evaluated the adverse effect (AE) profiles of 106 COVID-19 drugs. We extracted four AE signatures to characterize the AE distribution of 106 COVID-19 drugs by non-negative matrix factorization (NMF). By integrating the information from four distinct databases (AE, bioassay, chemical structure, and gene expression information), we predicted the AE profiles of 91 drugs with inadequate AE feedback. For each of the drug clusters, discriminant genes accounting for mechanisms of different AE signatures were identified by sparse linear discriminant analysis. Our findings can be divided into three parts. First, drugs abundant with AE-signature 1 (for example, remdesivir) should be taken with caution for patients with poor liver, renal, or cardiac functions, where the functional genes accumulate in the RHO GTPases Activate NADPH Oxidases pathway. Second, drugs featuring AE-signature 2 (for example, hydroxychloroquine) are unsuitable for patients with vascular disorders, with relevant genes enriched in signal transduction pathways. Third, drugs characterized by AE signatures 3 and 4 have relatively mild AEs. Our study showed that NMF and network-based frameworks contribute to more precise drug recommendations. Keywords: COVID-19 drugs, adverse effect signature, non-negative matrix factorization, network integration, precise drug recommendation Introduction The coronavirus disease 2019 (COVID-19) has swept the world for over 2 years. Although COVID-19 vaccines have made an indelible contribution to triumphing over the epidemic, it is not the silver bullet to end the pandemic. An increasing number of infected cases were reported even though they were fully vaccinated (i.e., the COVID-19 vaccine breakthrough infections) ([34]CDC Covid-19 Vaccine Breakthrough Case Investigations Team, 2021). There is an urgent need to develop therapeutic drugs to fight against this pandemic. Because traditional drug discovery methods are time consuming and expensive, the drug repurposing method is becoming an attractive option in the current urgent circumstance. Recently, many computational approaches have been developed to narrow down the search space of drugs to accelerate the process of drug repurposing against SARS-COV-2 ([35]Gordon et al., 2020; [36]Jang et al., 2021; [37]Adhami et al., 2021; [38]Guo et al., 2021; [39]Gysi et al., 2021). Among them, the network-based approach accounts for a large proportion. [40]Rintala et al. (2022) reviewed some representative drug repurposing results for COVID-19, which applied network proximity algorithms. Most of the existing computational methods have two main shortcomings. First, only one particular type of data was used in the drug repurposing procedure, which may limit the power to identify therapeutic candidates. Integrating multiple heterogeneous data sources would be more powerful. The other shortcoming is the so-called winner-takes-all pattern. Specifically, the selected repurposed drug may be recommended to all patients without taking the heterogeneous health conditions into account. In the era of precision medicine, it is normal to recommend specific drugs for patients with distinct underlying conditions, even though they are diagnosed with the same disease. To address the limitations of the winner-takes-all pattern in the existing drug repurposing scheme, predicting the drug’s heterogeneous safety levels with respect to the different health conditions of patients is essential. From the perspective of drug safety, drugs repurposed for treating COVID-19 can be classified into two categories. The first category contains drugs that are still in or about to be in the stage of clinical trials. Because these drugs have not been used by COVID-19 patients on a large scale, their adverse effect (AE) feedback may be inadequate and the evaluation of drug safety remains unclear. In this case, it is more desirable to conduct computational methods to predict side effects for drugs in the clinical trial stage to save research time and cost. The prediction of drug side effects is usually based on drug–drug similarity through integrating multi-source data, including chemical structures, protein targets, and therapeutic indication ([41]Wang et al., 2014; [42]Zhang et al., 2016; [43]Timilsina et al., 2019). The second category involves drugs that have been approved by the World Health Organization (WHO) or drugs with relatively abundant post-market surveillance-reported AEs offered by COVID-19 patients. [44]Jing et al. (2021) mined the AE information of several COVID-19 drugs using the FDA adverse events reporting system (FAERS) database, presenting the landscape of overreported AEs in each organ/system for each drug. [45]Wu et al. (2020) explored the associations between COVID-19 drugs and 30 human tissues based on network proximity. Although it is more systematic to classify AEs into tissue level or system level, the biological process causing AEs might be cross organs or cross systems. Thus, it is more powerful to integrate different AEs and different drugs together to dig out a higher-level landscape of AEs, which can give us a comprehensive picture of drug safety and offer benefits for drug recommendation in precision medicine. In this drug safety study, we conducted the AE profile for two different kinds of drugs: drugs with abundant AE information and drugs without abundant AE information. First, based on the relatively abundant AE information provided by FAERS, we proposed a non-negative matrix factorization (NMF) ([46]Lee and Seung, 2000) framework for drugs that have already been used to treat COVID-19. Apart from extracting four AE signatures to depict each drug’s AE distribution, NMF also provided abundance fractions on the four AE signatures for each drug, which we refer to a characteristic combination of AEs as an AE-signature. The higher-level landscape of AEs represented by AE signatures can help us partition the whole COVID-19 patient cohort into four subpopulations, with each subpopulation more vulnerable to its corresponding AE signature. Therefore, we can conduct the precise drug recommendation by incorporating the AE signature’s abundance fraction for each drug. Second, because COVID-19 drugs in the clinical trial stage lack AE feedback, we predicted their AE profiles to achieve more accurate personalized drug recommendations. Specifically, we * 1) Constructed drug–drug similarity networks using data coming from multiple sources * 2) Built a network imputation framework to tackle information imbalance among networks * 3) Combined the integrated similarity network with drugs possessing a relatively affluent drug safety profile We presented literature support and molecular-level explanations to illustrate the rationality of the AE prediction results. Methods Workflow The workflow of our study is illustrated in [47]Figure 1. FIGURE 1. [48]FIGURE 1 [49]Open in a new tab Workflow. (A) Four types of input data, including bioassay, chemical structure, LINCS, and FAERS data. (B) Construction of four first-stage networks based on four types of drug data. (C) Construction of four second-stage networks based on drug intersection and MDS-based network imputation. (D) Construction of last-stage network from similarity network fusion (SNF). (E) Non-negative matrix factorization (NMF) of count matrix from the CEUAFD; [MATH: X :MATH] , drug-AE count matrix; [MATH: W :MATH] , AE-signature matrix; and [MATH: H :MATH] , drug abundance fraction matrix. (F) Clustering of all COVID-19 drugs by combining results from (D) and (E), with each cluster featuring one AE signature. (G) Gene expression information extracted from LINCS data with the A549 cell line for four clusters of drugs. (H) Discriminant gene sets obtained by sparse linear discriminant analysis (SLDA) with respect to each pair of drug clusters. First of all, we collected drugs that are in the current COVID-19 clinical trial stage ([50]Pundi et al., 2020; [51]Bugin and Woodcock, 2021) together with drugs from the COVID-19 Emergency Use Authorization FAERS database (CEUAFD), mined their information from four different datasets (encompassing AE, bioassay, chemical structure, and gene expression information), and constructed four first-stage drug–drug similarity networks accordingly (see [52]Supplementary Material). For simplicity, we refer to COVID-19 drugs that emerged in the CEUAFD as CEUAFD drugs hereafter. Second, for drugs in the clinical trial stage, we only reserved drugs whose information was available among all four databases. For some CEUAFD drugs abundant in certain AE signatures, their information across all four databases may be incomplete. To leverage their informative AE feedback and tackle the information imbalance problem, we built an imputation method based on multi-dimensional scaling (MDS) ([53]Cox and Cox, 2008) to patch up the information imbalance among data sources, forming four-dimension-consistent imputed second-stage networks. Third, we applied the similarity network fusion (SNF) method ([54]Wang et al., 2014) to the four second-stage networks to obtain a final-stage integrated drug–drug similarity network (see [55]Supplementary Material). At the same time, we conducted NMF to the CEUAFD drugs to extract four higher-level AE signatures and drugs’ abundance fractions on the deciphered AE signatures. Meanwhile, the dimension reduction of AEs and clustering of CEUAFD drugs can be achieved. Finally, based on the integrated similarity network and CEUAFD drugs’ clustering outcome from the NMF procedure, we obtained four drug clusters for all analyzed COVID-19 drugs, with each cluster featuring one specific AE signature. Precise recommendations for drugs in different clinical stages can be implemented subsequently (see [56]Supplementary Material). To demonstrate the reliability of our predicted results, we performed extensive literature explorations and measured drug–tissue distances by taking a network proximity approach on the Genotype-Tissue Expression (GTEx) database ([57]Lonsdale et al., 2013). For every pair of drug clusters among the four clusters, we conducted sparse linear discriminant analysis (SLDA) ([58]Mai et al., 2012) to pick up genes with discriminant expression patterns. If there is a correspondence between discriminant genes and AE signatures for each pair of drug clusters, the rationality of our drug clustering results can be proved. Pathway enrichment analysis (see [59]Supplementary Material) and gene function analysis revealed that the discriminant genes do have the ability to account for different AE signatures in each pair of drug clusters. Network imputation We found that drugs abundant with AE feedback may have insufficient information from other data sources (for example, multi-compound drugs casirivimab and imdevimab). To make full use of the AE information reported by COVID-19 patients and construct a more informative integrated drug–drug similarity network, we imputed similarity scores for the other three databases based on the FAERS data. We took the bioassay database as an example to demonstrate the imputation process. Suppose the first-stage drug–drug similarity matrices constructed by FAERS and bioassay database are denoted by [MATH: SfRnf×nf :MATH] and [MATH: SaRna×na :MATH] (see [60]Supplementary Material). To make drug information reflected by different databases transmissible, we adopted the MDS ([61]Cox and Cox, 2008) method to translate “drug–drug similarity” into “lower-dimensional coordinate” for all drugs. Therefore, a predictive model linking the “lower-dimensional coordinate” of FAERS and the bioassay database can be built using dataset-shared drugs. For drugs that lack information in bioassay form, the information provided by the FAERS database can be transmitted by this predictive model to fulfill the imputation. To let the “lower-dimensional coordinate” of drugs preserve the similarity given by [MATH: Sf :MATH] or [MATH: Sa :MATH] to the largest extent, we applied an eigenvalue decomposition (EVD) method to the double-centering matrix [MATH: Cf=12HfSfHf :MATH] and [MATH: Ca=12HaSaHa :MATH] , where [MATH: Hf :MATH] and [MATH: Ha :MATH] are the centering matrix, i.e., [MATH: Hf=InfJnf/nf :MATH] , and [MATH: Ha=InaJna/na :MATH] with [MATH: InfRnf×nf :MATH] and [MATH: IaRna×na :MATH] denoting the identity matrix and [MATH: JnfRnf×nf :MATH] and [MATH: JaRna×na :MATH] denoting the matrix of all ones. Concretely, we suppose [MATH: Cf=PΛfPTPPT, :MATH] (1) [MATH: Ca=QΛaQTQQT, :MATH] (2) where [MATH: P :MATH] and [MATH: Q :MATH] are orthogonal matrices with [MATH: P=(< mi mathvariant="bold-italic">ξ1,...,ξnf) :MATH] and [MATH: Q=(< mi mathvariant="bold-italic">η1,...,ηna). :MATH] Here, we denote [MATH: ξk= (ξk,1,..., ξk,nf)T :MATH] for [MATH: k=1,... ,nf :MATH] ; [MATH: ηl= (ηl,1,..., ηl,na)T :MATH] for [MATH: l=1,... ,na :MATH] . [MATH: Λf :MATH] and [MATH: Λa :MATH] are diagonal matrices with [MATH: Λf=diag(λf< /mi>,1,...,λf,n< /mi>f) :MATH] , [MATH: Λa=diag(λa< /mi>,1,...,λa,n< /mi>a). :MATH] [MATH: P=PΛf12=( λf,1ξ1,...,λf ,nfξnf) :MATH] and [MATH: Q=QΛa12=( λa,1η1,...,λa ,naηna) :MATH] . For convenience, we suppose [MATH: λf,1...λf,nf0 :MATH] , [MATH: λa,1...λa,na0 :MATH] . If the [MATH: nf :MATH] and [MATH: na :MATH] drugs in FAERS and bioassay databases are embedded in the [MATH: kf :MATH] - and [MATH: ka :MATH] -dimensional Euclidean space separately, the coordinates of drugs in the two datasets can be represented by [MATH: Pkf=(λf,1ξ1,...,λf ,kfξkf) :MATH] and [MATH: Qka=(λa,1η1,...,λa ,kaηka) :MATH] , with each row denoting one drug’s coordinate. More specifically, the coordinate of the [MATH: i :MATH] -th drug in the FAERS database is [MATH: (λf, 1ξ1,i,... ,λf,k fξ kf,i) :MATH] , and the coordinate of the [MATH: j :MATH] -th drug in the bioassay database is [MATH: (λa, 1η1,j,... ,λa,k aη ka,j) :MATH] . We introduce set [MATH: U :MATH] to contain drugs shared between FAERS and bioassay datasets. We denote the coordinates in these two datasets as [MATH: PUR|U|× kf :MATH] and [MATH: QUR|U|× ka :MATH] separately and assume a linear relationship between [MATH: PU :MATH] and [MATH: QU :MATH] , i.e., [MATH: QU=PUB+E, :MATH] (3) where [MATH: B :MATH] is the coefficient matrix and [MATH: ER|U|× kf :MATH] is the noise matrix with every element [MATH: ei,jN(0,σ2) :MATH] . If we denote the least square estimator of [MATH: B :MATH] as [MATH: B^ :MATH] , for a particular drug only equipped with FAERS coordinates [MATH: ζ :MATH] while lacking bioassay coordinates, we can predict its bioassay coordinate by [MATH: ζTB^ :MATH] . Once the bioassay coordinates are predicted, the similarity score in the bioassay database can be computed by the Euclidean distance between the bioassay coordinates of drugs. Non-negative matrix factorization of COVID-19 emergency use authorization FAERS database drugs Suppose there are [MATH: n :MATH] unique drugs and [MATH: p :MATH] unique AEs reported in the CEUAFD after the pre-processing process (see [62]Supplementary Material), and [MATH: XRp×n< /mrow> :MATH] is a matrix with element [MATH: Xij :MATH] representing the total number of patients reporting the [MATH: i :MATH] -th AE after taking the [MATH: j :MATH] -th drug ( [MATH: i=1 :MATH] ,..., [MATH: p :MATH] and [MATH: j=1 :MATH] ,..., [MATH: n :MATH] ). We applied NMF to [MATH: X :MATH] to decipher the underlying AE signatures among CEUAFD drugs. Meanwhile, for each drug, we assigned an abundance score to each AE signature. Concretely, [MATH: X :MATH] is approximated by the product of two low-rank matrices: [MATH: XWH, :MATH] (4) where [MATH: WRp×k< /mrow> :MATH] is the AE-signature matrix with each column corresponding to a specific AE signature, [MATH: HRk×n< /mrow> :MATH] is an abundance fraction matrix with the [MATH: j :MATH] -th column representing the relative abundance on each AE signature for the [MATH: j :MATH] -th drug ( [MATH: j=1 :MATH] ,..., [MATH: n :MATH] ), and [MATH: k :MATH] is the number of AE signatures. After data pre-processing, we have [MATH: p :MATH] = 134, [MATH: n :MATH] = 15. We adopted the optimization algorithm given by [63]Lee and Seung (2000) to get the factorized matrix [MATH: W :MATH] and [MATH: H :MATH] . In practice, we proceeded NMF with 50 different initializations of [MATH: W :MATH] and [MATH: H :MATH] to avoid sticking in the local stationary point. Clustering of CEUAFD drugs Similar to the work of [64]Brunet et al. (2004), [MATH: H :MATH] returned by the NMF procedure can be used for drug clustering. If [MATH: v*=argmax< /mtext>v=1,...,kh< mi>vj :MATH] , we can classify drug [MATH: j :MATH] into cluster [MATH: v* :MATH] . Tissue-specific genes in the GTEx database For a specific tissue [MATH: t :MATH] , we denote the mean expression level for gene [MATH: g :MATH] as [MATH: E¯tg :MATH] , and the mean expression level for gene [MATH: g :MATH] across all tissues in the GTEx database as [MATH: E¯g :MATH] . We further assume that among all tissues, the standard deviation of the expression level for gene [MATH: g :MATH] is [MATH: Sg :MATH] , and denote the tissue-specific score for gene [MATH: g :MATH] in tissue [MATH: t :MATH] as [MATH: Ztg=E¯tgE¯gSg. :MATH] (5) For each tissue [MATH: t :MATH] , we pick up 200 tissue-specific genes whose corresponding [MATH: Ztg :MATH] values are the 200 largest values among all genes. Evaluation of the association between the drug and tissue We evaluated the association between each drug and tissue using network proximity. Suppose for a specific drug [MATH: j :MATH] , the module [MATH: Aj :MATH] consists of its target proteins, which can be obtained from the DrugBank database ([65]Wishart et al., 2018), while the module [MATH: Bt :MATH] for a specific tissue [MATH: t :MATH] contains the tissue-specific proteins. We define the association between drug [MATH: j :MATH] and tissue [MATH: t :MATH] as the distance between module [MATH: Aj :MATH] and [MATH: Bt :MATH] : [MATH: djt=aAjminbBtd(a,b) +bBtminaAjd(a,b)|Aj |+|Bt< /mi>|, :MATH] (6) where [MATH: d(a,b) :MATH] is the shortest distance between gene [MATH: a :MATH] and gene [MATH: b :MATH] in the gene–gene regulatory network. The significance of the association was calculated by the permutation test. Specifically, for each drug [MATH: j :MATH] and tissue [MATH: t :MATH] pair, we randomly picked up two sets of genes, where the first set contains the same number of genes as the drug–target gene set and the second set contains the same number of genes as the tissue-specific gene set, and then calculated the network distance by [66]Eq. 6. This procedure was conducted 100 times to construct the null distribution of network distance. Suppose the 100 network distances obtained from the permutation procedure are denoted as [MATH: djt,1 :MATH] ,..., [MATH: djt,100< /mrow> :MATH] , and we introduced set [MATH: Djt :MATH] to contain these 100 distance values. The significance level of the drug–tissue association is denoted as the proportion of distance value in [MATH: Djt :MATH] , which is less than [MATH: djt :MATH] . One-sided Wilcoxon rank test We grouped AEs using their System Organ Class (SOC) categories in Medical Dictionary for Regulatory Activities (MedDRA). For a specific SOC category, suppose it contained [MATH: m :MATH] AEs in the CEUAFD. We further assumed that for the [MATH: v :MATH] -th AE signature, the probabilities for these [MATH: m :MATH] AEs were [MATH: W1v :MATH] ,…, [MATH: Wmv :MATH] . To test whether there is a significant association between this SOC category and the [MATH: v :MATH] -th AE signature, we reformulated it to test whether the distribution formed by [MATH: {Wiv}i=1m :MATH] is significantly larger than the distribution formed by pooling [MATH: {Wiv}i=1m :MATH] for all [MATH: vv :MATH] . This test was performed by the one-sided Wilcoxon rank test. Effect size of the association between the SOC category and AE signature Apart from the one-sided Wilcoxon rank test, we also evaluated the association between SOC categories and AE signatures by calculating effect size (ES). Specifically, for a SOC category containing [MATH: m :MATH] AEs, we use [MATH: ωv= :MATH] (W[1v]  , [MATH: :MATH] ,W[mv] ) to denote the probabilities for the [MATH: m :MATH] AEs in the [MATH: v :MATH] -th AE signature, and [MATH: ωvc=(W1v,,Wmv)vv :MATH] is a vector containing probabilities for the [MATH: m :MATH] AEs in the other three AE signatures. The ES is defined as [MATH: ES=mean (ωv)mean(ωvc)sd(ωvc). :MATH] (7) Sparse linear discriminant analysis We conducted SLDA ([67]Mai et al., 2012) for each pair of drug clusters to pick up discriminant genes. Specifically, for a pair of drug clusters ( [MATH: v1,v2< /mn> :MATH] ), suppose there are [MATH: n1 :MATH] and [MATH: n2 :MATH] drugs in these two clusters with expression information accessible in the Library of Integrated Network-Based Cellular Signatures (LINCS) A549 database ([68]Subramanian et al., 2017). We assumed that the cluster labels are coded as [MATH: yj=nn1 :MATH] if drug [MATH: j :MATH] belongs to cluster [MATH: v1 :MATH] and [MATH: yj=nn2 :MATH] if drug [MATH: j :MATH] belongs to cluster [MATH: v2 :MATH] , where [MATH: n1 :MATH] and [MATH: n2 :MATH] are the number of drugs in cluster [MATH: v1 :MATH] and [MATH: v2 :MATH] separately, with [MATH: n=n1 :MATH] + [MATH: n2 :MATH] . The gene expression profile for drug [MATH: j :MATH] was denoted as [MATH: xj :MATH] with length [MATH: p :MATH] . The discriminant genes were selected by solving the following l1 penalized least squares problem: [MATH: (β^,β^0)=argminβ,β0{ n1j=1< /mn>n(yjβ0xjTβ)2+ λk=1< /mn>p|βk|}. :MATH] (8) In practice, we used the R package TULIP to solve the abovementioned optimization problem. Because of the limited sample size, cross validation is not reliable in selecting a suitable [MATH: λ :MATH] , and we set [MATH: λ :MATH] to be the smallest one among the sequence of [MATH: λ :MATH] values provided by function dsda. Those genes whose [MATH: β^ :MATH] are non-zero were considered as discriminant genes. Results AE-signature analysis for approved CEUAFD drugs The CEUAFD collected from the beginning of the COVID-19 pandemic until September 2021 was downloaded for COVID-19 drug-related AE-signature analysis. We obtained 9,754 reports, corresponding with 15 drugs and 134 unique AE reports after the data-preprocessing procedures (see [69]Supplementary Material). We then applied NMF to the drug-AE count matrix [MATH: XR134×15 :MATH] to extract the collection of AE signatures, with each element of [MATH: X :MATH] representing the number of patients reporting a specific AE for a particular drug. After a stability-driven model selection procedure ([70]Brunet et al., 2004), we obtained four AE signatures deciphered by NMF, which constituted a sufficient and non-redundant base in depicting the drug-specific AE distributions (see [71]Supplementary Material). We also conducted the clustering of CEUAFD drugs, with drugs in each cluster possessing high abundance fractions on the same AE signature. Some basic information for nine widely used CEUAFD drugs (more than 20 items of AE feedback in the CEUAFD by September 2021) is shown in [72]Table 1. Five out of the nine drugs fall into the second cluster. The two combinational drugs, casirivimab and imdevimab and bamlanivimab and etesevimab, are both enriched with AE signature 3. In contrast, the other two widely used drugs, remdesivir and bamlanivimab, are representative drugs of clusters 1 and 4 separately. Overall, drugs with the most frequent AEs are also representative AEs for each corresponding AE signature within each drug cluster. TABLE 1. Summary of nine CEUAFD drugs. Information on nine widely used CEUAFD drugs (more than 20 items of AE feedback in the CEUAFD by September 2021) is shown, including the number of patients taking one specific drug, the two most frequent AEs, and the cluster to which each drug was assigned. CEUAFD drugs #patients Two most frequent AEs Clusters Hydroxychloroquine 71 Electrocardiogram Qt prolonged; hypoglycaemia 2 Remdesivir 4047 Alanine aminotransferase increased; aspartate aminotransferase increased 1 Tocilizumab 71 Alanine aminotransferase increased; aspartate aminotransferase increased 2 Casirivimab and imdevimab 722 Dyspnoea; chills 3 Bamlanivimab 4375 Dyspnoea; pyrexia 4 Baricitinib 123 Lymphocyte count decreased; acute kidney injury 2 Covid-19 convalescent plasma 82 Dyspnoea; chills 2 Bamlanivimab and etesevimab 171 Dyspnoea; pyrexia 3 Vancomycin hydrochloride 23 Acute kidney injury; respiratory failure 2 [73]Open in a new tab To give a more general picture of each AE signature, we used the top 20 representative AEs to characterize the rough distributions of four AE signatures (see [74]Supplementary Material) and grouped AEs by their SOC categories in MedDRA. In addition, we performed a one-sided Wilcoxon rank test for every SOC category in each AE signature to determine whether AEs belonging to one SOC category have a high-probability accumulation in some AE-signature distributions. For the first AE signature, AEs with top probabilities include some liver-related symptoms (alanine aminotransferase increased, aspartate aminotransferase increased, and blood creatinine increased), kidney-related symptoms (acute kidney injury and glomerular filtration rate decreased), and cardiovascular-related symptoms (bradycardia, cardiac arrest, and hypotension) ([75]Supplementary Figure S1). Most of them were classified into SOC categories “investigations” and “renal and urinary disorders.” These two categories correspond to a higher level of AE severity ([76]Figure 2), where the “investigations” category includes some laboratory test indexes, radiologic test indexes, or physical examination indexes. Compared with other drugs, AEs of remdesivir are more abundant in the first AE signature, with more than 90% of AEs attributed to AE signature 1 ([77]Supplementary Figure S2). This finding can be supported by several existing studies which reported the damage to the liver and renal functions of remdesivir users among COVID-19 patients ([78]Perveen et al., 2020; [79]Zampino et al., 2020; [80]Singh and Kamath, 2021). Thus, COVID-19 patients with poor liver, renal, or probably cardiac functions should take remdesivir with caution. FIGURE 2. [81]FIGURE 2 [82]Open in a new tab Summary of statistically significant SOC categories returned by the one-sided Wilcoxon rank test. All 134 AEs extracted from the CEUAFD are classified into different SOC categories in MedDRA. By the one-sided Wilcoxon rank test, only significant associations between the SOC category and AE signature are shown. The size of the circles corresponds to different levels of p values. The colors represent different effect sizes (ESs) of the association between the SOC category and AE signature. We detected significant aggregation in some kidney-related AEs (acute kidney injury), cardiovascular-related AEs (electrocardiogram QT prolonged, hypotension, and hypoglycemia), and AEs related to the immune system (lymphocyte count decreased, lymphopenia, and leukocytosis) for AE signature 2. This leads to a significant high-probability accumulation of SOC categories “vascular disorder” (p = 0.0095) and “blood and lymphatic system disorders” (p = 0.03) in AE signature 2 ([83]Figure 2 and [84]Supplementary Figure S3). Hydroxychloroquine, tocilizumab, COVID-19 convalescent plasma, vancomycin hydrochloride, and baricitinib are more enriched with AE signature 2 compared with other drugs ([85]Supplementary Figure S2). Interestingly, tocilizumab and baricitinib are both categorized as “immune modulators” by the FDA when treating COVID-19. Recent research also highlighted that hydroxychloroquine treatment may impair host immunity in response to SARS-CoV-2 ([86]Devarajan and Vaseghi, 2021; [87]Rother et al., 2020), while baricitinib could not be initiated in patients with small lymphocyte count ([88]Praveen et al., 2020). Synthesizing all the abovementioned evidence, we recommend that COVID-19 patients with weak immune systems or vascular disorders should not be prescribed hydroxychloroquine, tocilizumab, or baricitinib. AE signature 3 and AE signature 4 have many representative AEs in common, such as dyspnoea, pyrexia, hypoxia, oxygen saturation decreased, nausea, vomiting, dizziness, and cough ([89]Supplementary Figures S4, S5). Most of these common representative AEs were classified into the “General disorders and administration site conditions” SOC category ([90]Figure 2). Standing on the SOC category perspective, we can distill more general differences between these two AE signatures—part of the representative AEs in AE signature 3 can be attributed to the “Skin and subcutaneous tissue disorders” SOC category, while AEs belonging to the SOC category “Nervous system disorders” are more abundant in AE signature 4 ([91]Figure 2). Casirivimab and imdevimab and bamlanivimab are the representative drugs of AE signature 3 and 4 separately. Another combinational drug bamlanivimab and etesevimab is also enriched with AE signature 3 ([92]Supplementary Figure S2). Different from the first two AE signatures, the symptoms of the last two AE signatures are relatively mild. This finding is consistent with the FDA’s recommendation that mild-to-moderate COVID-19 in adults and children not admitted to hospital may be treated by bamlanivimab ([93]Mahase, 2020). We also found that reduced oxygen saturation is a common AE brought by both casirivimab and imdevimab and bamlanivimab, which was supported by existing research ([94]Kano et al., 2021). Therefore, casirivimab and imdevimab should be taken carefully by COVID-19 patients with low oxygen saturation. AE-signature analysis for drugs in the clinical trial stage An AE-signature analysis for drugs that are still in the clinical trial stage to fight against COVID-19 was performed. Under the rationale assumption that similar drugs may induce similar side effects ([95]Campillos et al., 2008; [96]Zhang et al., 2017), we predicted their AEs by combining the last-stage similarity network and CEUAFD drugs’ clustering output. Part of the predicted last-stage drug–drug similarity network is presented in [97]Figure 3A, including the five widely used CEUAFD drugs and their five nearest neighbor drugs which were still in the clinical investigation stage. We found that baricitinib and hydroxychloroquine are each other’s five nearest neighbor drugs, which is consistent with the NMF result of the CEUAFD that these two drugs are both representative drugs of AE signature 2. On the other hand, casirivimab and imdevimab and bamlanivimab (both have relatively mild AE feedback from COVID-19 patients) are also among each other’s top five nearest neighbor drugs. Finally, chloroquine, a widely recognized drug with functions similar to hydroxychloroquine, is also among the top five nearest neighbors of hydroxychloroquine. We define drugs from the CEUAFD, i.e., remdesivir, baricitinib, casirivimab and imdevimab, and bamlanivimab, as the representative drugs for the four clusters. With respect to the four drugs, the top 15 nearest neighbor drugs with each are shown in [98]Table 2. FIGURE 3. [99]FIGURE 3 [100]Open in a new tab (A) Drug–drug similarity network between five widely used CEUAFD drugs and their five nearest neighbor drugs. The arrow pointing from drug A to drug B means drug B is among the five nearest neighbor drugs of drug A. (B) Heatmap of the significance of the association between tissues and drugs in four drug clusters. The association for each drug–tissue pair was calculated by network proximity, while the significance of the association was performed using a permutation test. Colors indicate the −log10 (p value) of each drug–tissue pair. TABLE 2. Four drug clusters based on the integrated drug–drug similarity network and NMF outcomes of the CEUAFD. Drugs in bold are from the CEUAFD, and each drug cluster features one specific AE signature. Cluster 1 Cluster 2 Cluster 3 Cluster 4 Remdesivir Baricitinib Casirivimab and imdevimab Bamlanivimab Fostamatinib Ruxolitinib Argatroban Ritonavir Iloprost Tofacitinib Fluvoxamine Lopinavir Thalidomide Nintedanib Quetiapine Nitazoxanide Methotrexate Hydroxychloroquine Ivermectin Ibrutinib Pitavastatin Azithromycin Amiodarone Clofazimine Naltrexone Chloroquine Amlodipine Sirolimus Chlorpromazine Nicotinamide Resveratrol Decitabine Artesunate Telmisartan Cyproheptadine Amoxicillin Toremifene Ibudilast Simvastatin Etoposide Masitinib Imatinib Artemisinin Rivaroxaban Enzalutamide Clarithromycin Ambrisentan Atorvastatin Doxycycline Dexamethasone Melphalan Maraviroc Candesartan Methylprednisolone Disulfiram Leflunomide Pirfenidone Prednisone Dipyridamole Prazosin [101]Open in a new tab To give a further explanation of the predicted drug clustering result, we applied the network proximity approach to evaluate the association between each drug–tissue pair using the GTEx database. The significance level for each drug–tissue pair is shown in [102]Figure 3B. Compared with the other three drug clusters, more drugs from cluster 1 possess a higher association level in tissues such as the liver, lung, prostate, and blood. There is a strong association between brain tissue and drugs from cluster 4, which gives an indirect reflection that drugs in cluster 4 may induce nervous system disorders. Furthermore, we listed several representative drugs with literature-reported AEs observed in pharmacological and genomic spaces ([103]Table 3). The widely reported AEs showed high-level consistency with the typical AEs in each drug cluster. TABLE 3. Summary of 10 representative COVID-19 drugs in the clinical trial stage. Information on 10representative COVID-19 drugs which are in the clinical trial stage is shown, including DrugBank ID, typical AEs with literature support, and the drug cluster to which each drug belongs. DrugBank ID Drug name AEs reported in the literature Drug cluster DB12010 Fostamatinib Hepatic function impairment, [104]Newland and McDonald (2020) 1 DB01041 Thalidomide Bradycardia, [105]Thangaraju et al. (2019) 1 DB01088 Iloprost Acute kidney injury, [106]Uyar et al. (2016); increased GGT, [107]Hsu and Rubin (2005) 1 DB08860 Pitavastatin Change of glomerular filtration rate, [108]Baik et al. (2019) 1 DB01611 Hydroxychloroquine Tachycardia and hypotension, [109]Marquardt and Albertson (2001) 2 DB08895 Tofacitinib Immune system injury, [110]Chen et al. (2017); deep vein thrombosis, [111]Cohen et al. (2020) 2 DB08877 Ruxolitinib Increased risk of infections and thromboembolic events, [112]Shalabi et al. (2022) 2 DB00176 Fluvoxamine Dyspnoea, nausea, and headache, [113]Lenze et al. (2020) 3 DB01601 Lopinavir Acute kidney injury, [114]Binois et al. (2020) 4 DB00507 Nitazoxanide Gastrointestinal complaints and headache, [115]Walsh et al. (2020) 4 [116]Open in a new tab Molecular mechanisms accounting for different drug clusters We employed LINCS data with the A549 cell line ([117]Subramanian et al., 2017) and applied SLDA ([118]Mai et al., 2012) to further explore the molecular mechanisms for the four deciphered AE signatures. Our target is to find therapeutic biomarkers presenting distinguishing expression patterns for each pair of drug clusters. Part of the distinguished molecular mechanisms of four AE signatures was reflected by discriminative biomarkers extracted from SLDA. For every pair of drug clusters, we obtained a discriminative gene set by conducting SLDA. Log fold change of each drug’s expression levels on the selected 67 SLDA genes is shown in [119]Figure 4A. Because representative AEs of drug clusters 3 and 4 are relatively mild and the AEs in clusters 1 and 2 are more severe and characteristic, we merged drug clusters 3 and 4 as the control cluster. SLDA genes showing differential expression patterns in the drug cluster 1–control pair and 2–control pair were collected for further validation. A biological function enrichment analysis was implemented to uncover the possible relationships between enriched pathways and representative differential AEs in the pair of drug clusters (see [120]Supplementary Material). FIGURE 4. [121]FIGURE 4 [122]Open in a new tab (A) Heatmap of the log fold change of gene expression levels for drugs in four clusters. Each row is a gene, and each column is a drug. Drugs are ordered according to their cluster index number. (B) Enriched pathways in two discriminant gene sets corresponding to two pairs of drug clusters—drug cluster 1–control pair and 2–control pair. Enrichment analysis was performed in two discriminant gene sets separately, where the two gene sets contain genes showing distinguishing expression patterns between drug cluster 1 and the control group and drug cluster 2 and the control group separately, and the control group was defined as the combination of drug cluster 3 and 4. Ten pathways with the most significant enrichment p values are shown, and pathways enriched in different gene sets are filled with different colors (left). Discriminant genes that emerged in the enriched pathways are shown (right). We found that gene FAS presents highly different expression levels between drug clusters 1 and 2 ([123]Figure 4A). Fas and Fas-ligands are classical members of the TNF receptor and TNF ligand families. The interaction of the receptor with its ligand allows the formation of a death-inducing signaling complex that includes the Fas-associated death domain protein (FADD), caspase 8, and caspase 10. With this mechanism, they play an important role in the regulation of apoptotic processes, including activation-induced cell death, T-cell-induced cytotoxicity, and multiple organ function impairment ([124]Wajant et al., 2003). By conducting enrichment analysis, the differential genes between drug cluster 1–control pair are enriched in the RHO GTPases Activate NADPH Oxidases pathway, with PIN1 and RAC2 being the pathway input genes ([125]Figure 4B). To our knowledge, RAC2 is an important component of NADPH oxidase ([126]Kim and Dinauer, 2001). Excessive NADPH oxidase-derived ROS production can induce multiple tissue injuries and prolonged inflammatory responses, leading to inflammatory diseases ([127]Belambri et al., 2018). Different from that mentioned above, the KEGG Reactome analysis showed that a large number of differential genes in the drug cluster 2–control pair flowed to signal transduction pathway ([128]Figure 4B), which is related to many diseases such as cancer, atherosclerosis, and inflammatory diseases. Notably, the SMAD3 gene is selected in this pathway. SMAD3 is a major transcription factor in transforming growth factor- [MATH: β :MATH] (TGF- [MATH: β :MATH] ) signaling. The TGF- [MATH: β :MATH] /Smad-dependent signaling pathway has been shown to be activated in models of myocardial infarction, as well as in multiple pathological processes ([129]Bujak et al., 2007; [130]Peng et al., 2021). Discussion With respect to COVID-19 drug safety, we developed a framework to predict the AE profiles for those drugs which are not in large-scale use. This framework has some advantages from methodological, clinical, and molecular viewpoints. Some innovative points in our method are worth to be mentioned. The advantages of NMF are two-fold. First, NMF possessed the ability of dimension reduction and drug clustering. Second, by further SOC category enrichment analysis, NMF can capture the subpopulation fragile to each AE signature with a higher precision level. On the other hand, most network integration methods assumed node consistency among different networks. With the rapid development of new drugs, it is hard to fulfill that each drug’s information is available in all data sources. We developed an MDS-based network imputation method, which was achieved by node embedding and building a predictive model employing network-shared drugs. This method can effectively use incomplete information. The imputation is particularly useful when drugs’ information is unable to be presented by low-dimensional vectors (for example, chemical structure or high-dimensional bioassay data), as our imputation model was built in the low-dimensional embedded space. Lastly, we utilized SLDA to seek discriminative genes which may provide molecular-level explanation for different AE signatures among drug clusters. If there is a correspondence between the mechanism of SLDA genes and different AE-signature patterns for each pair of drug clusters, we validate the rationality of our drug clustering results and AE prediction results in some sense. On the other hand, combining symptomatic features with molecular-level features for COVID-19 patients may contribute to more precise therapeutic prescriptions. Clinically, we presented some important discoveries. Our analysis of the COVID-19 EUA FAERS database showed that remdesivir may bring damage to liver, renal, or probably cardiac functions, which is widely acknowledged with the evolvement of in-depth research ([131]Li et al., 2021). Tocilizumab and baricitinib, which are categorized as “immune modulators” by the FDA, may cause vascular, blood, and lymphatic system disorders in COVID-19 patients. Another important message brought by the CEUAFD is that AE feedbacks from combinational drugs are relatively mild. In addition, by clustering all COVID-19 drugs, we found that among the top five nearest neighbors of Baricitinib, three of them belong to the “tinib” family. Our prediction can be verified by [132]Halimi et al. (2008), who reported that prolongation of the QT interval and heart failure are frequent AEs in patients using tinibs. From the viewpoint of molecular level, drugs from cluster 1 and control cluster showed differential expression patterns in the RHO GTPases Activate NADPH Oxidases pathway. NADPH oxidase (NOX) is a multimeric transmembrane enzyme complex that uses NADPH as an electron donor to generate superoxide (O2-) and hydrogen peroxide (H2O2) from molecular oxygen. It participates in various biological processes including innate immunity, and biosynthetic processes ([133]Bedard and Krause, 2007). In the pathophysiological process of the liver, NADPH oxidase is expressed functionally in phagocytic and non-phagocytic forms. NOX-derived ROS contributes to various liver diseases caused by alcohol, hepatitis C virus, and toxic bile acids ([134]Paik et al., 2014). In addition to causing liver injury, the impairment of this pathway is also related to renal insufficiency. Abnormally activated NOX in renal microvessels can lead to superoxide production. Oxidative stress in the kidney contributes to renal vascular remodeling and increases preglomerular resistance. Some reports showed that they are key factors in acute and chronic kidney injury ([135]Xu et al., 2020), while for the drug cluster 2–control pair, the differential genes are more relevant to the signal transduction pathway. The relationship between the blockage of the signal transduction pathway and cardiovascular events has been widely reported ([136]Wheeler-Jones, 2005). Previous studies have demonstrated that several kinds of TKIs could lead to grade III or higher QT prolongation, and animal models suggested that this might be caused by the inhibition of PI3K signaling ([137]Schiefer et al., 2018). In addition, the impairment of other signaling pathways is also intimately related to cardiovascular disorders. Mechanistic studies suggested that disturbed TGF- [MATH: β :MATH] signaling may also contribute to non-genetic cardiovascular disorders such as atherosclerosis and cardiac fibrosis ([138]Goumans and Ten Dijke, 2018). Reactivation of the WNT signaling pathway has also been observed in many pathologies of cardiac and vascular vessels ([139]Foulquier et al., 2018). There are also some limitations in our study. Although the CEUAFD provided rich AE feedback for some approved COVID-19 drugs, only 9 drugs were reported by more than 20 patients. This insufficient AE feedback phenomenon may induce inaccuracy in extracting AE signatures. Therefore, the four factorized AE signatures cannot depict AE information for all COVID-19 drugs. Due to the limitations of data accessibility, we only used LINCS data processed on lung tissues to explore molecular-level differences among drug clusters. With the propelling of COVID-19 research, more and more molecular-level data will be released to promote our understanding of the mechanism of AEs returned by different COVID-19 drugs. Conclusion Many existing COVID-19 drugs are still in the clinical trial stage, thus lacking abundant AE feedback. This drug safety analysis tried to solve this important problem by borrowing information from the AE profile of other “old” drugs. Our analysis took the heterogeneous health conditions of COVID-19 patients into consideration and proposed a computational framework to predict AEs for potential COVID-19 drugs. Therefore, the proposed framework jumps out of the current winner-takes-all drug repurposing framework and can provide better precise drug recommendations. We believe that our framework can be generalized to other diseases for precision drug recommendation and drug clustering with reduced time and cost. Data availability statement The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/[140]Supplementary Material. Author contributions YZ conceived and supervised the study. HW and XW collected the data. HW conducted the analysis. HW wrote the first draft of the manuscript with input from XW, TL, and YZ. HW, DL, and YZ revised the manuscript. All authors reviewed and approved the final version of the manuscript. Conflict of interest The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Publisher’s note All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher. Supplementary material The Supplementary Material for this article can be found online at: [141]https://www.frontiersin.org/articles/10.3389/fgene.2022.1019940/fu ll#supplementary-material [142]Click here for additional data file.^ (4.1MB, PDF) References