Abstract

   Given the considerable cost of drug discovery, drug repurposing is
   becoming attractive as it can effectively shorten the development
   timeline and reduce the development cost. However, most existing
   drug-repurposing methods omitted the heterogeneous health conditions of
   different COVID-19 patients. In this study, we evaluated the adverse
   effect (AE) profiles of 106 COVID-19 drugs. We extracted four AE
   signatures to characterize the AE distribution of 106 COVID-19 drugs by
   non-negative matrix factorization (NMF). By integrating the information
   from four distinct databases (AE, bioassay, chemical structure, and
   gene expression information), we predicted the AE profiles of 91 drugs
   with inadequate AE feedback. For each of the drug clusters,
   discriminant genes accounting for mechanisms of different AE signatures
   were identified by sparse linear discriminant analysis. Our findings
   can be divided into three parts. First, drugs abundant with
   AE-signature 1 (for example, remdesivir) should be taken with caution
   for patients with poor liver, renal, or cardiac functions, where the
   functional genes accumulate in the RHO GTPases Activate NADPH Oxidases
   pathway. Second, drugs featuring AE-signature 2 (for example,
   hydroxychloroquine) are unsuitable for patients with vascular
   disorders, with relevant genes enriched in signal transduction
   pathways. Third, drugs characterized by AE signatures 3 and 4 have
   relatively mild AEs. Our study showed that NMF and network-based
   frameworks contribute to more precise drug recommendations.

   Keywords: COVID-19 drugs, adverse effect signature, non-negative matrix
   factorization, network integration, precise drug recommendation

Introduction

   The coronavirus disease 2019 (COVID-19) has swept the world for over
   2 years. Although COVID-19 vaccines have made an indelible contribution
   to triumphing over the epidemic, it is not the silver bullet to end the
   pandemic. An increasing number of infected cases were reported even
   though they were fully vaccinated (i.e., the COVID-19 vaccine
   breakthrough infections) ([34]CDC Covid-19 Vaccine Breakthrough Case
   Investigations Team, 2021). There is an urgent need to develop
   therapeutic drugs to fight against this pandemic.

   Because traditional drug discovery methods are time consuming and
   expensive, the drug repurposing method is becoming an attractive option
   in the current urgent circumstance. Recently, many computational
   approaches have been developed to narrow down the search space of drugs
   to accelerate the process of drug repurposing against SARS-COV-2
   ([35]Gordon et al., 2020; [36]Jang et al., 2021; [37]Adhami et al.,
   2021; [38]Guo et al., 2021; [39]Gysi et al., 2021). Among them, the
   network-based approach accounts for a large proportion. [40]Rintala et
   al. (2022) reviewed some representative drug repurposing results for
   COVID-19, which applied network proximity algorithms. Most of the
   existing computational methods have two main shortcomings. First, only
   one particular type of data was used in the drug repurposing procedure,
   which may limit the power to identify therapeutic candidates.
   Integrating multiple heterogeneous data sources would be more powerful.
   The other shortcoming is the so-called winner-takes-all pattern.
   Specifically, the selected repurposed drug may be recommended to all
   patients without taking the heterogeneous health conditions into
   account. In the era of precision medicine, it is normal to recommend
   specific drugs for patients with distinct underlying conditions, even
   though they are diagnosed with the same disease. To address the
   limitations of the winner-takes-all pattern in the existing drug
   repurposing scheme, predicting the drug’s heterogeneous safety levels
   with respect to the different health conditions of patients is
   essential.

   From the perspective of drug safety, drugs repurposed for treating
   COVID-19 can be classified into two categories. The first category
   contains drugs that are still in or about to be in the stage of
   clinical trials. Because these drugs have not been used by COVID-19
   patients on a large scale, their adverse effect (AE) feedback may be
   inadequate and the evaluation of drug safety remains unclear. In this
   case, it is more desirable to conduct computational methods to predict
   side effects for drugs in the clinical trial stage to save research
   time and cost. The prediction of drug side effects is usually based on
   drug–drug similarity through integrating multi-source data, including
   chemical structures, protein targets, and therapeutic indication
   ([41]Wang et al., 2014; [42]Zhang et al., 2016; [43]Timilsina et al.,
   2019).

   The second category involves drugs that have been approved by the World
   Health Organization (WHO) or drugs with relatively abundant post-market
   surveillance-reported AEs offered by COVID-19 patients. [44]Jing et al.
   (2021) mined the AE information of several COVID-19 drugs using the FDA
   adverse events reporting system (FAERS) database, presenting the
   landscape of overreported AEs in each organ/system for each drug.
   [45]Wu et al. (2020) explored the associations between COVID-19 drugs
   and 30 human tissues based on network proximity. Although it is more
   systematic to classify AEs into tissue level or system level, the
   biological process causing AEs might be cross organs or cross systems.
   Thus, it is more powerful to integrate different AEs and different
   drugs together to dig out a higher-level landscape of AEs, which can
   give us a comprehensive picture of drug safety and offer benefits for
   drug recommendation in precision medicine.

   In this drug safety study, we conducted the AE profile for two
   different kinds of drugs: drugs with abundant AE information and drugs
   without abundant AE information. First, based on the relatively
   abundant AE information provided by FAERS, we proposed a non-negative
   matrix factorization (NMF) ([46]Lee and Seung, 2000) framework for
   drugs that have already been used to treat COVID-19. Apart from
   extracting four AE signatures to depict each drug’s AE distribution,
   NMF also provided abundance fractions on the four AE signatures for
   each drug, which we refer to a characteristic combination of AEs as an
   AE-signature. The higher-level landscape of AEs represented by AE
   signatures can help us partition the whole COVID-19 patient cohort into
   four subpopulations, with each subpopulation more vulnerable to its
   corresponding AE signature. Therefore, we can conduct the precise drug
   recommendation by incorporating the AE signature’s abundance fraction
   for each drug.

   Second, because COVID-19 drugs in the clinical trial stage lack AE
   feedback, we predicted their AE profiles to achieve more accurate
   personalized drug recommendations. Specifically, we
     * 1) Constructed drug–drug similarity networks using data coming from
       multiple sources
     * 2) Built a network imputation framework to tackle information
       imbalance among networks
     * 3) Combined the integrated similarity network with drugs possessing
       a relatively affluent drug safety profile

   We presented literature support and molecular-level explanations to
   illustrate the rationality of the AE prediction results.

Methods

Workflow

   The workflow of our study is illustrated in [47]Figure 1.

FIGURE 1.

   [48]FIGURE 1
   [49]Open in a new tab

   Workflow. (A) Four types of input data, including bioassay, chemical
   structure, LINCS, and FAERS data. (B) Construction of four first-stage
   networks based on four types of drug data. (C) Construction of four
   second-stage networks based on drug intersection and MDS-based network
   imputation. (D) Construction of last-stage network from similarity
   network fusion (SNF). (E) Non-negative matrix factorization (NMF) of
   count matrix from the CEUAFD;
   [MATH: <mrow><mi mathvariant="bold-italic">X</mi></mrow> :MATH]
   , drug-AE count matrix;
   [MATH: <mrow><mi mathvariant="bold-italic">W</mi></mrow> :MATH]
   , AE-signature matrix; and
   [MATH: <mrow><mi mathvariant="bold-italic">H</mi></mrow> :MATH]
   , drug abundance fraction matrix. (F) Clustering of all COVID-19 drugs
   by combining results from (D) and (E), with each cluster featuring one
   AE signature. (G) Gene expression information extracted from LINCS data
   with the A549 cell line for four clusters of drugs. (H) Discriminant
   gene sets obtained by sparse linear discriminant analysis (SLDA) with
   respect to each pair of drug clusters.

   First of all, we collected drugs that are in the current COVID-19
   clinical trial stage ([50]Pundi et al., 2020; [51]Bugin and Woodcock,
   2021) together with drugs from the COVID-19 Emergency Use Authorization
   FAERS database (CEUAFD), mined their information from four different
   datasets (encompassing AE, bioassay, chemical structure, and gene
   expression information), and constructed four first-stage drug–drug
   similarity networks accordingly (see [52]Supplementary Material). For
   simplicity, we refer to COVID-19 drugs that emerged in the CEUAFD as
   CEUAFD drugs hereafter. Second, for drugs in the clinical trial stage,
   we only reserved drugs whose information was available among all four
   databases. For some CEUAFD drugs abundant in certain AE signatures,
   their information across all four databases may be incomplete. To
   leverage their informative AE feedback and tackle the information
   imbalance problem, we built an imputation method based on
   multi-dimensional scaling (MDS) ([53]Cox and Cox, 2008) to patch up the
   information imbalance among data sources, forming
   four-dimension-consistent imputed second-stage networks. Third, we
   applied the similarity network fusion (SNF) method ([54]Wang et al.,
   2014) to the four second-stage networks to obtain a final-stage
   integrated drug–drug similarity network (see [55]Supplementary
   Material). At the same time, we conducted NMF to the CEUAFD drugs to
   extract four higher-level AE signatures and drugs’ abundance fractions
   on the deciphered AE signatures. Meanwhile, the dimension reduction of
   AEs and clustering of CEUAFD drugs can be achieved. Finally, based on
   the integrated similarity network and CEUAFD drugs’ clustering outcome
   from the NMF procedure, we obtained four drug clusters for all analyzed
   COVID-19 drugs, with each cluster featuring one specific AE signature.
   Precise recommendations for drugs in different clinical stages can be
   implemented subsequently (see [56]Supplementary Material). To
   demonstrate the reliability of our predicted results, we performed
   extensive literature explorations and measured drug–tissue distances by
   taking a network proximity approach on the Genotype-Tissue Expression
   (GTEx) database ([57]Lonsdale et al., 2013). For every pair of drug
   clusters among the four clusters, we conducted sparse linear
   discriminant analysis (SLDA) ([58]Mai et al., 2012) to pick up genes
   with discriminant expression patterns. If there is a correspondence
   between discriminant genes and AE signatures for each pair of drug
   clusters, the rationality of our drug clustering results can be proved.
   Pathway enrichment analysis (see [59]Supplementary Material) and gene
   function analysis revealed that the discriminant genes do have the
   ability to account for different AE signatures in each pair of drug
   clusters.

Network imputation

   We found that drugs abundant with AE feedback may have insufficient
   information from other data sources (for example, multi-compound drugs
   casirivimab and imdevimab). To make full use of the AE information
   reported by COVID-19 patients and construct a more informative
   integrated drug–drug similarity network, we imputed similarity scores
   for the other three databases based on the FAERS data.

   We took the bioassay database as an example to demonstrate the
   imputation process. Suppose the first-stage drug–drug similarity
   matrices constructed by FAERS and bioassay database are denoted by
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">S</mi><mi>f</mi></msub><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><msub><mi>n</mi><mi>f</mi></msu
   b><mo>×</mo><msub><mi>n</mi><mi>f</mi></msub></mrow></msup></mrow>
   :MATH]
   and
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">S</mi><mi>a</mi></msub><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><msub><mi>n</mi><mi>a</mi></msu
   b><mo>×</mo><msub><mi>n</mi><mi>a</mi></msub></mrow></msup></mrow>
   :MATH]
   (see [60]Supplementary Material). To make drug information reflected by
   different databases transmissible, we adopted the MDS ([61]Cox and Cox,
   2008) method to translate “drug–drug similarity” into
   “lower-dimensional coordinate” for all drugs. Therefore, a predictive
   model linking the “lower-dimensional coordinate” of FAERS and the
   bioassay database can be built using dataset-shared drugs. For drugs
   that lack information in bioassay form, the information provided by the
   FAERS database can be transmitted by this predictive model to fulfill
   the imputation.

   To let the “lower-dimensional coordinate” of drugs preserve the
   similarity given by
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">S</mi><mi>f</mi></msub></mrow> :MATH]
   or
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">S</mi><mi>a</mi></msub></mrow> :MATH]
   to the largest extent, we applied an eigenvalue decomposition (EVD)
   method to the double-centering matrix
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">C</mi><mi>f</mi></msub><mo>=</mo><mo>−</mo><m
   frac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><msub><mi
   mathvariant="bold-italic">H</mi><mi>f</mi></msub><msub><mi
   mathvariant="bold-italic">S</mi><mi>f</mi></msub><msub><mi
   mathvariant="bold-italic">H</mi><mi>f</mi></msub></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">C</mi><mi>a</mi></msub><mo>=</mo><mo>−</mo><m
   frac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><msub><mi
   mathvariant="bold-italic">H</mi><mi>a</mi></msub><msub><mi
   mathvariant="bold-italic">S</mi><mi>a</mi></msub><msub><mi
   mathvariant="bold-italic">H</mi><mi>a</mi></msub></mrow> :MATH]
   , where
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">H</mi><mi>f</mi></msub></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">H</mi><mi>a</mi></msub></mrow> :MATH]
   are the centering matrix, i.e.,
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">H</mi><mi>f</mi></msub><mo>=</mo><msub><mi
   mathvariant="bold-italic">I</mi><msub><mi>n</mi><mi>f</mi></msub></msub
   ><mo>−</mo><msub><mi
   mathvariant="bold-italic">J</mi><msub><mi>n</mi><mi>f</mi></msub></msub
   ><mo>/</mo><msub><mi>n</mi><mi>f</mi></msub></mrow> :MATH]
   , and
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">H</mi><mi>a</mi></msub><mo>=</mo><msub><mi
   mathvariant="bold-italic">I</mi><msub><mi>n</mi><mi>a</mi></msub></msub
   ><mo>−</mo><msub><mi
   mathvariant="bold-italic">J</mi><msub><mi>n</mi><mi>a</mi></msub></msub
   ><mo>/</mo><msub><mi>n</mi><mi>a</mi></msub></mrow> :MATH]
   with
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">I</mi><msub><mi>n</mi><mi>f</mi></msub></msub
   ><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><msub><mi>n</mi><mi>f</mi></msu
   b><mo>×</mo><msub><mi>n</mi><mi>f</mi></msub></mrow></msup></mrow>
   :MATH]
   and
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">I</mi><mi>a</mi></msub><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><msub><mi>n</mi><mi>a</mi></msu
   b><mo>×</mo><msub><mi>n</mi><mi>a</mi></msub></mrow></msup></mrow>
   :MATH]
   denoting the identity matrix and
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">J</mi><msub><mi>n</mi><mi>f</mi></msub></msub
   ><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><msub><mi>n</mi><mi>f</mi></msu
   b><mo>×</mo><msub><mi>n</mi><mi>f</mi></msub></mrow></msup></mrow>
   :MATH]
   and
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">J</mi><mi>a</mi></msub><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><msub><mi>n</mi><mi>a</mi></msu
   b><mo>×</mo><msub><mi>n</mi><mi>a</mi></msub></mrow></msup></mrow>
   :MATH]
   denoting the matrix of all ones. Concretely, we suppose
   [MATH: <mrow><msub><mi mathvariant="bold-italic">C</mi><mi
   mathvariant="bold-italic">f</mi></msub><mo>=</mo><mi
   mathvariant="bold-italic">P</mi><msub><mi mathvariant="bold">Λ</mi><mi
   mathvariant="bold-italic">f</mi></msub><msup><mi
   mathvariant="bold-italic">P</mi><mi
   mathvariant="bold-italic">T</mi></msup><mo>≜</mo><mover
   accent="true"><mi
   mathvariant="bold-italic">P</mi><mo>∼</mo></mover><msup><mover
   accent="true"><mi mathvariant="bold-italic">P</mi><mo>∼</mo></mover><mi
   mathvariant="bold-italic">T</mi></msup><mo>,</mo></mrow> :MATH]
   (1)
   [MATH: <mrow><msub><mi mathvariant="bold-italic">C</mi><mi
   mathvariant="bold-italic">a</mi></msub><mo>=</mo><mi
   mathvariant="bold-italic">Q</mi><msub><mi mathvariant="bold">Λ</mi><mi
   mathvariant="bold-italic">a</mi></msub><msup><mi
   mathvariant="bold-italic">Q</mi><mi
   mathvariant="bold-italic">T</mi></msup><mo>≜</mo><mover
   accent="true"><mi
   mathvariant="bold-italic">Q</mi><mo>∼</mo></mover><msup><mover
   accent="true"><mi mathvariant="bold-italic">Q</mi><mo>∼</mo></mover><mi
   mathvariant="bold-italic">T</mi></msup><mo>,</mo></mrow> :MATH]
   (2)

   where
   [MATH: <mrow><mi mathvariant="bold-italic">P</mi></mrow> :MATH]
   and
   [MATH: <mrow><mi mathvariant="bold-italic">Q</mi></mrow> :MATH]
   are orthogonal matrices with
   [MATH: <mrow><mi
   mathvariant="bold-italic">P</mi><mo>=</mo><mrow><mo>(</mo><mrow><msub><
   mi
   mathvariant="bold-italic">ξ</mi><mn>1</mn></msub><mo>,</mo><mo>.</mo><m
   o>.</mo><mo>.</mo><mo>,</mo><msub><mi
   mathvariant="bold-italic">ξ</mi><msub><mi>n</mi><mi>f</mi></msub></msub
   ></mrow><mo>)</mo></mrow></mrow> :MATH]
   and
   [MATH: <mrow><mi
   mathvariant="bold-italic">Q</mi><mo>=</mo><mrow><mo>(</mo><mrow><msub><
   mi
   mathvariant="bold-italic">η</mi><mn>1</mn></msub><mo>,</mo><mo>.</mo><m
   o>.</mo><mo>.</mo><mo>,</mo><msub><mi
   mathvariant="bold-italic">η</mi><msub><mi>n</mi><mi>a</mi></msub></msub
   ></mrow><mo>)</mo></mrow><mo>.</mo></mrow> :MATH]
   Here, we denote
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">ξ</mi><mi>k</mi></msub><mo>=</mo><msup><mrow>
   <mo>(</mo><mrow><msub><mi>ξ</mi><mrow><mi>k</mi><mo>,</mo><mn>1</mn></m
   row></msub><mo>,</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>,</mo><msub><mi>
   ξ</mi><mrow><mi>k</mi><mo>,</mo><msub><mi>n</mi><mi>f</mi></msub></mrow
   ></msub></mrow><mo>)</mo></mrow><mi>T</mi></msup></mrow> :MATH]
   for
   [MATH:
   <mrow><mi>k</mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>.</mo><mo>.</mo><mo>.
   </mo><mo>,</mo><msub><mi>n</mi><mi>f</mi></msub></mrow> :MATH]
   ;
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">η</mi><mi>l</mi></msub><mo>=</mo><msup><mrow>
   <mo>(</mo><mrow><msub><mi>η</mi><mrow><mi>l</mi><mo>,</mo><mn>1</mn></m
   row></msub><mo>,</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>,</mo><msub><mi>
   η</mi><mrow><mi>l</mi><mo>,</mo><msub><mi>n</mi><mi>a</mi></msub></mrow
   ></msub></mrow><mo>)</mo></mrow><mi>T</mi></msup></mrow> :MATH]
   for
   [MATH:
   <mrow><mi>l</mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>.</mo><mo>.</mo><mo>.
   </mo><mo>,</mo><msub><mi>n</mi><mi>a</mi></msub></mrow> :MATH]
   .
   [MATH: <mrow><msub><mi
   mathvariant="bold">Λ</mi><mi>f</mi></msub></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi
   mathvariant="bold">Λ</mi><mi>a</mi></msub></mrow> :MATH]
   are diagonal matrices with
   [MATH: <mrow><msub><mi
   mathvariant="bold">Λ</mi><mi>f</mi></msub><mo>=</mo><mi>d</mi><mi>i</mi
   ><mi>a</mi><mi>g</mi><mrow><mo>(</mo><mrow><msub><mi>λ</mi><mrow><mi>f<
   /mi><mo>,</mo><mn>1</mn></mrow></msub><mo>,</mo><mo>.</mo><mo>.</mo><mo
   >.</mo><mo>,</mo><msub><mi>λ</mi><mrow><mi>f</mi><mo>,</mo><msub><mi>n<
   /mi><mi>f</mi></msub></mrow></msub></mrow><mo>)</mo></mrow></mrow>
   :MATH]
   ,
   [MATH: <mrow><msub><mi
   mathvariant="bold">Λ</mi><mi>a</mi></msub><mo>=</mo><mi>d</mi><mi>i</mi
   ><mi>a</mi><mi>g</mi><mrow><mo>(</mo><mrow><msub><mi>λ</mi><mrow><mi>a<
   /mi><mo>,</mo><mn>1</mn></mrow></msub><mo>,</mo><mo>.</mo><mo>.</mo><mo
   >.</mo><mo>,</mo><msub><mi>λ</mi><mrow><mi>a</mi><mo>,</mo><msub><mi>n<
   /mi><mi>a</mi></msub></mrow></msub></mrow><mo>)</mo></mrow><mo>.</mo></
   mrow> :MATH]
   [MATH: <mrow><mover accent="true"><mi
   mathvariant="bold-italic">P</mi><mo>∼</mo></mover><mo>=</mo><mi
   mathvariant="bold-italic">P</mi><msubsup><mi
   mathvariant="bold">Λ</mi><mi
   mathvariant="bold-italic">f</mi><mfrac><mrow><mn>1</mn></mrow><mrow><mn
   >2</mn></mrow></mfrac></msubsup><mo>=</mo><mrow><mo>(</mo><mrow><msqrt>
   <msub><mi>λ</mi><mrow><mi>f</mi><mo>,</mo><mn>1</mn></mrow></msub></msq
   rt><msub><mi
   mathvariant="bold-italic">ξ</mi><mn>1</mn></msub><mo>,</mo><mo>.</mo><m
   o>.</mo><mo>.</mo><mo>,</mo><msqrt><msub><mi>λ</mi><mrow><mi>f</mi><mo>
   ,</mo><msub><mi>n</mi><mi>f</mi></msub></mrow></msub></msqrt><msub><mi
   mathvariant="bold-italic">ξ</mi><msub><mi>n</mi><mi>f</mi></msub></msub
   ></mrow><mo>)</mo></mrow></mrow> :MATH]
   and
   [MATH: <mrow><mover accent="true"><mi
   mathvariant="bold-italic">Q</mi><mo>∼</mo></mover><mo>=</mo><mi
   mathvariant="bold-italic">Q</mi><msubsup><mi
   mathvariant="bold">Λ</mi><mi
   mathvariant="bold-italic">a</mi><mfrac><mrow><mn>1</mn></mrow><mrow><mn
   >2</mn></mrow></mfrac></msubsup><mo>=</mo><mrow><mo>(</mo><mrow><msqrt>
   <msub><mi>λ</mi><mrow><mi>a</mi><mo>,</mo><mn>1</mn></mrow></msub></msq
   rt><msub><mi
   mathvariant="bold-italic">η</mi><mn>1</mn></msub><mo>,</mo><mo>.</mo><m
   o>.</mo><mo>.</mo><mo>,</mo><msqrt><msub><mi>λ</mi><mrow><mi>a</mi><mo>
   ,</mo><msub><mi>n</mi><mi>a</mi></msub></mrow></msub></msqrt><msub><mi
   mathvariant="bold-italic">η</mi><msub><mi>n</mi><mi>a</mi></msub></msub
   ></mrow><mo>)</mo></mrow></mrow> :MATH]
   . For convenience, we suppose
   [MATH:
   <mrow><msub><mi>λ</mi><mrow><mi>f</mi><mo>,</mo><mn>1</mn></mrow></msub
   ><mo>≥</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>≥</mo><msub><mi>λ</mi><mro
   w><mi>f</mi><mo>,</mo><msub><mi>n</mi><mi>f</mi></msub></mrow></msub><m
   o>≥</mo><mn>0</mn></mrow> :MATH]
   ,
   [MATH:
   <mrow><msub><mi>λ</mi><mrow><mi>a</mi><mo>,</mo><mn>1</mn></mrow></msub
   ><mo>≥</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>≥</mo><msub><mi>λ</mi><mro
   w><mi>a</mi><mo>,</mo><msub><mi>n</mi><mi>a</mi></msub></mrow></msub><m
   o>≥</mo><mn>0</mn></mrow> :MATH]
   .

   If the
   [MATH: <mrow><msub><mi>n</mi><mi>f</mi></msub></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi>n</mi><mi>a</mi></msub></mrow> :MATH]
   drugs in FAERS and bioassay databases are embedded in the
   [MATH: <mrow><msub><mi>k</mi><mi>f</mi></msub></mrow> :MATH]
   - and
   [MATH: <mrow><msub><mi>k</mi><mi>a</mi></msub></mrow> :MATH]
   -dimensional Euclidean space separately, the coordinates of drugs in
   the two datasets can be represented by
   [MATH: <mrow><msub><mover accent="true"><mi
   mathvariant="bold-italic">P</mi><mo>∼</mo></mover><msub><mi
   mathvariant="normal">k</mi><mi
   mathvariant="normal">f</mi></msub></msub><mo>=</mo><mrow><mo>(</mo><mro
   w><msqrt><msub><mi>λ</mi><mrow><mi>f</mi><mo>,</mo><mn>1</mn></mrow></m
   sub></msqrt><msub><mi
   mathvariant="bold-italic">ξ</mi><mn>1</mn></msub><mo>,</mo><mo>.</mo><m
   o>.</mo><mo>.</mo><mo>,</mo><msqrt><msub><mi>λ</mi><mrow><mi>f</mi><mo>
   ,</mo><msub><mi>k</mi><mi>f</mi></msub></mrow></msub></msqrt><msub><mi
   mathvariant="bold-italic">ξ</mi><msub><mi>k</mi><mi>f</mi></msub></msub
   ></mrow><mo>)</mo></mrow></mrow> :MATH]
   and
   [MATH: <mrow><msub><mover accent="true"><mi
   mathvariant="bold-italic">Q</mi><mo>∼</mo></mover><msub><mi
   mathvariant="normal">k</mi><mi
   mathvariant="normal">a</mi></msub></msub><mo>=</mo><mrow><mo>(</mo><mro
   w><msqrt><msub><mi>λ</mi><mrow><mi>a</mi><mo>,</mo><mn>1</mn></mrow></m
   sub></msqrt><msub><mi
   mathvariant="bold-italic">η</mi><mn>1</mn></msub><mo>,</mo><mo>.</mo><m
   o>.</mo><mo>.</mo><mo>,</mo><msqrt><msub><mi>λ</mi><mrow><mi>a</mi><mo>
   ,</mo><msub><mi>k</mi><mi>a</mi></msub></mrow></msub></msqrt><msub><mi
   mathvariant="bold-italic">η</mi><msub><mi>k</mi><mi>a</mi></msub></msub
   ></mrow><mo>)</mo></mrow></mrow> :MATH]
   , with each row denoting one drug’s coordinate. More specifically, the
   coordinate of the
   [MATH: <mrow><mi>i</mi></mrow> :MATH]
   -th drug in the FAERS database is
   [MATH:
   <mrow><mo>(</mo><mrow><msqrt><msub><mi>λ</mi><mrow><mi>f</mi><mo>,</mo>
   <mn>1</mn></mrow></msub></msqrt><msub><mi>ξ</mi><mrow><mn>1</mn><mo>,</
   mo><mi>i</mi></mrow></msub><mo>,</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>
   ,</mo><msqrt><msub><mi>λ</mi><mrow><mi>f</mi><mo>,</mo><msub><mi>k</mi>
   <mi>f</mi></msub></mrow></msub></msqrt><msub><mi>ξ</mi><mrow><msub><mi>
   k</mi><mi>f</mi></msub><mo>,</mo><mi>i</mi></mrow></msub></mrow><mo>)</
   mo></mrow> :MATH]
   , and the coordinate of the
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   -th drug in the bioassay database is
   [MATH:
   <mrow><mo>(</mo><mrow><msqrt><msub><mi>λ</mi><mrow><mi>a</mi><mo>,</mo>
   <mn>1</mn></mrow></msub></msqrt><msub><mi>η</mi><mrow><mn>1</mn><mo>,</
   mo><mi>j</mi></mrow></msub><mo>,</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>
   ,</mo><msqrt><msub><mi>λ</mi><mrow><mi>a</mi><mo>,</mo><msub><mi>k</mi>
   <mi>a</mi></msub></mrow></msub></msqrt><msub><mi>η</mi><mrow><msub><mi>
   k</mi><mi>a</mi></msub><mo>,</mo><mi>j</mi></mrow></msub></mrow><mo>)</
   mo></mrow> :MATH]
   .

   We introduce set
   [MATH: <mrow><mi mathvariant="script">U</mi></mrow> :MATH]
   to contain drugs shared between FAERS and bioassay datasets. We denote
   the coordinates in these two datasets as
   [MATH: <mrow><msub><mi mathvariant="bold-italic">P</mi><mi
   mathvariant="script">U</mi></msub><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><mrow><mo>|</mo><mrow><mi
   mathvariant="script">U</mi></mrow><mo>|</mo></mrow><mo>×</mo><msub><mi>
   k</mi><mi>f</mi></msub></mrow></msup></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi mathvariant="bold-italic">Q</mi><mi
   mathvariant="script">U</mi></msub><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><mrow><mo>|</mo><mrow><mi
   mathvariant="script">U</mi></mrow><mo>|</mo></mrow><mo>×</mo><msub><mi>
   k</mi><mi>a</mi></msub></mrow></msup></mrow> :MATH]
   separately and assume a linear relationship between
   [MATH: <mrow><msub><mi mathvariant="bold-italic">P</mi><mi
   mathvariant="script">U</mi></msub></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi mathvariant="bold-italic">Q</mi><mi
   mathvariant="script">U</mi></msub></mrow> :MATH]
   , i.e.,
   [MATH: <mrow><msub><mi mathvariant="bold-italic">Q</mi><mi
   mathvariant="script">U</mi></msub><mo>=</mo><msub><mi
   mathvariant="bold-italic">P</mi><mi
   mathvariant="script">U</mi></msub><mi
   mathvariant="bold-italic">B</mi><mo>+</mo><mi
   mathvariant="bold-italic">E</mi><mo>,</mo></mrow> :MATH]
   (3)

   where
   [MATH: <mrow><mi mathvariant="bold-italic">B</mi></mrow> :MATH]
   is the coefficient matrix and
   [MATH: <mrow><mi mathvariant="bold-italic">E</mi><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><mrow><mo>|</mo><mrow><mi
   mathvariant="script">U</mi></mrow><mo>|</mo></mrow><mo>×</mo><msub><mi>
   k</mi><mi>f</mi></msub></mrow></msup></mrow> :MATH]
   is the noise matrix with every element
   [MATH:
   <mrow><msub><mi>e</mi><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow></msub
   ><mo>∼</mo><mi>N</mi><mrow><mo>(</mo><mrow><mn>0</mn><mo>,</mo><msup><m
   i>σ</mi><mn>2</mn></msup></mrow><mo>)</mo></mrow></mrow> :MATH]
   . If we denote the least square estimator of
   [MATH: <mrow><mi mathvariant="bold-italic">B</mi></mrow> :MATH]
   as
   [MATH: <mrow><mover accent="true"><mi
   mathvariant="bold-italic">B</mi><mo>^</mo></mover></mrow> :MATH]
   , for a particular drug only equipped with FAERS coordinates
   [MATH: <mrow><mi mathvariant="bold-italic">ζ</mi></mrow> :MATH]
   while lacking bioassay coordinates, we can predict its bioassay
   coordinate by
   [MATH: <mrow><msup><mi
   mathvariant="bold-italic">ζ</mi><mi>T</mi></msup><mover
   accent="true"><mi
   mathvariant="bold-italic">B</mi><mo>^</mo></mover></mrow> :MATH]
   . Once the bioassay coordinates are predicted, the similarity score in
   the bioassay database can be computed by the Euclidean distance between
   the bioassay coordinates of drugs.

Non-negative matrix factorization of COVID-19 emergency use authorization
FAERS database drugs

   Suppose there are
   [MATH: <mrow><mi>n</mi></mrow> :MATH]
   unique drugs and
   [MATH: <mrow><mi>p</mi></mrow> :MATH]
   unique AEs reported in the CEUAFD after the pre-processing process (see
   [62]Supplementary Material), and
   [MATH: <mrow><mi mathvariant="bold-italic">X</mi><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><mi>p</mi><mo>×</mo><mi>n</mi><
   /mrow></msup></mrow> :MATH]
   is a matrix with element
   [MATH:
   <mrow><msub><mi>X</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub></mrow>
   :MATH]
   representing the total number of patients reporting the
   [MATH: <mrow><mi>i</mi></mrow> :MATH]
   -th AE after taking the
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   -th drug (
   [MATH: <mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow> :MATH]
   ,...,
   [MATH: <mrow><mi>p</mi></mrow> :MATH]
   and
   [MATH: <mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow> :MATH]
   ,...,
   [MATH: <mrow><mi>n</mi></mrow> :MATH]
   ). We applied NMF to
   [MATH: <mrow><mi mathvariant="bold-italic">X</mi></mrow> :MATH]
   to decipher the underlying AE signatures among CEUAFD drugs. Meanwhile,
   for each drug, we assigned an abundance score to each AE signature.
   Concretely,
   [MATH: <mrow><mi mathvariant="bold-italic">X</mi></mrow> :MATH]
   is approximated by the product of two low-rank matrices:
   [MATH: <mrow><mi mathvariant="bold-italic">X</mi><mo>≈</mo><mi
   mathvariant="bold-italic">W</mi><mi
   mathvariant="bold-italic">H</mi><mo>,</mo></mrow> :MATH]
   (4)

   where
   [MATH: <mrow><mi mathvariant="bold-italic">W</mi><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><mi>p</mi><mo>×</mo><mi>k</mi><
   /mrow></msup></mrow> :MATH]
   is the AE-signature matrix with each column corresponding to a specific
   AE signature,
   [MATH: <mrow><mi mathvariant="bold-italic">H</mi><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><mi>k</mi><mo>×</mo><mi>n</mi><
   /mrow></msup></mrow> :MATH]
   is an abundance fraction matrix with the
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   -th column representing the relative abundance on each AE signature for
   the
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   -th drug (
   [MATH: <mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow> :MATH]
   ,...,
   [MATH: <mrow><mi>n</mi></mrow> :MATH]
   ), and
   [MATH: <mrow><mi>k</mi></mrow> :MATH]
   is the number of AE signatures. After data pre-processing, we have
   [MATH: <mrow><mi>p</mi></mrow> :MATH]
   = 134,
   [MATH: <mrow><mi>n</mi></mrow> :MATH]
   = 15.

   We adopted the optimization algorithm given by [63]Lee and Seung (2000)
   to get the factorized matrix
   [MATH: <mrow><mi mathvariant="bold-italic">W</mi></mrow> :MATH]
   and
   [MATH: <mrow><mi mathvariant="bold-italic">H</mi></mrow> :MATH]
   . In practice, we proceeded NMF with 50 different initializations of
   [MATH: <mrow><mi mathvariant="bold-italic">W</mi></mrow> :MATH]
   and
   [MATH: <mrow><mi mathvariant="bold-italic">H</mi></mrow> :MATH]
   to avoid sticking in the local stationary point.

Clustering of CEUAFD drugs

   Similar to the work of [64]Brunet et al. (2004),
   [MATH: <mrow><mi mathvariant="bold-italic">H</mi></mrow> :MATH]
   returned by the NMF procedure can be used for drug clustering. If
   [MATH:
   <mrow><msup><mi>v</mi><mo>*</mo></msup><mo>=</mo><munder><mtext>argmax<
   /mtext><mrow><mi>v</mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>.</mo><mo>.</m
   o><mo>.</mo><mo>,</mo><mi>k</mi></mrow></munder><msub><mi>h</mi><mrow><
   mi>v</mi><mi>j</mi></mrow></msub></mrow> :MATH]
   , we can classify drug
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   into cluster
   [MATH: <mrow><msup><mi>v</mi><mo>*</mo></msup></mrow> :MATH]
   .

Tissue-specific genes in the GTEx database

   For a specific tissue
   [MATH: <mrow><mi>t</mi></mrow> :MATH]
   , we denote the mean expression level for gene
   [MATH: <mrow><mi>g</mi></mrow> :MATH]
   as
   [MATH: <mrow><msub><mover
   accent="true"><mi>E</mi><mo>¯</mo></mover><mrow><mi>t</mi><mi>g</mi></m
   row></msub></mrow> :MATH]
   , and the mean expression level for gene
   [MATH: <mrow><mi>g</mi></mrow> :MATH]
   across all tissues in the GTEx database as
   [MATH: <mrow><msub><mover
   accent="true"><mi>E</mi><mo>¯</mo></mover><mi>g</mi></msub></mrow>
   :MATH]
   . We further assume that among all tissues, the standard deviation of
   the expression level for gene
   [MATH: <mrow><mi>g</mi></mrow> :MATH]
   is
   [MATH: <mrow><msub><mi>S</mi><mi>g</mi></msub></mrow> :MATH]
   , and denote the tissue-specific score for gene
   [MATH: <mrow><mi>g</mi></mrow> :MATH]
   in tissue
   [MATH: <mrow><mi>t</mi></mrow> :MATH]
   as
   [MATH:
   <mrow><msub><mi>Z</mi><mrow><mi>t</mi><mi>g</mi></mrow></msub><mo>=</mo
   ><mfrac><mrow><msub><mover
   accent="true"><mi>E</mi><mo>¯</mo></mover><mrow><mi>t</mi><mi>g</mi></m
   row></msub><mo>−</mo><msub><mover
   accent="true"><mi>E</mi><mo>¯</mo></mover><mi>g</mi></msub></mrow><msub
   ><mi>S</mi><mi>g</mi></msub></mfrac></mrow><mo>.</mo> :MATH]
   (5)

   For each tissue
   [MATH: <mrow><mi>t</mi></mrow> :MATH]
   , we pick up 200 tissue-specific genes whose corresponding
   [MATH:
   <mrow><msub><mi>Z</mi><mrow><mi>t</mi><mi>g</mi></mrow></msub></mrow>
   :MATH]
   values are the 200 largest values among all genes.

Evaluation of the association between the drug and tissue

   We evaluated the association between each drug and tissue using network
   proximity. Suppose for a specific drug
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   , the module
   [MATH: <mrow><msub><mi>A</mi><mi>j</mi></msub></mrow> :MATH]
   consists of its target proteins, which can be obtained from the
   DrugBank database ([65]Wishart et al., 2018), while the module
   [MATH: <mrow><msub><mi>B</mi><mi>t</mi></msub></mrow> :MATH]
   for a specific tissue
   [MATH: <mrow><mi>t</mi></mrow> :MATH]
   contains the tissue-specific proteins. We define the association
   between drug
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   and tissue
   [MATH: <mrow><mi>t</mi></mrow> :MATH]
   as the distance between module
   [MATH: <mrow><msub><mi>A</mi><mi>j</mi></msub></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi>B</mi><mi>t</mi></msub></mrow> :MATH]
   :
   [MATH:
   <mrow><msub><mi>d</mi><mrow><mi>j</mi><mi>t</mi></mrow></msub><mo>=</mo
   ><mfrac><mrow><msub><mo>∑</mo><mrow><mi>a</mi><mo>∈</mo><msub><mi>A</mi
   ><mi>j</mi></msub></mrow></msub><munder><mi>min</mi><mrow><mi>b</mi><mo
   >∈</mo><msub><mi>B</mi><mi>t</mi></msub></mrow></munder><mi>d</mi><mrow
   ><mo>(</mo><mrow><mi>a</mi><mo>,</mo><mi>b</mi></mrow><mo>)</mo></mrow>
   <mo>+</mo><msub><mo>∑</mo><mrow><mi>b</mi><mo>∈</mo><msub><mi>B</mi><mi
   >t</mi></msub></mrow></msub><munder><mi>min</mi><mrow><mi>a</mi><mo>∈</
   mo><msub><mi>A</mi><mi>j</mi></msub></mrow></munder><mi>d</mi><mrow><mo
   >(</mo><mrow><mi>a</mi><mo>,</mo><mi>b</mi></mrow><mo>)</mo></mrow></mr
   ow><mrow><mrow><mo>|</mo><mrow><msub><mi>A</mi><mi>j</mi></msub></mrow>
   <mo>|</mo></mrow><mo>+</mo><mrow><mo>|</mo><mrow><msub><mi>B</mi><mi>t<
   /mi></msub></mrow><mo>|</mo></mrow></mrow></mfrac></mrow><mo>,</mo>
   :MATH]
   (6)

   where
   [MATH:
   <mrow><mi>d</mi><mrow><mo>(</mo><mrow><mi>a</mi><mo>,</mo><mi>b</mi></m
   row><mo>)</mo></mrow></mrow> :MATH]
   is the shortest distance between gene
   [MATH: <mrow><mi>a</mi></mrow> :MATH]
   and gene
   [MATH: <mrow><mi>b</mi></mrow> :MATH]
   in the gene–gene regulatory network.

   The significance of the association was calculated by the permutation
   test. Specifically, for each drug
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   and tissue
   [MATH: <mrow><mi>t</mi></mrow> :MATH]
   pair, we randomly picked up two sets of genes, where the first set
   contains the same number of genes as the drug–target gene set and the
   second set contains the same number of genes as the tissue-specific
   gene set, and then calculated the network distance by [66]Eq. 6. This
   procedure was conducted 100 times to construct the null distribution of
   network distance. Suppose the 100 network distances obtained from the
   permutation procedure are denoted as
   [MATH:
   <mrow><msub><mi>d</mi><mrow><mi>j</mi><mi>t</mi><mo>,</mo><mn>1</mn></m
   row></msub></mrow> :MATH]
   ,...,
   [MATH:
   <mrow><msub><mi>d</mi><mrow><mi>j</mi><mi>t</mi><mo>,</mo><mn>100</mn><
   /mrow></msub></mrow> :MATH]
   , and we introduced set
   [MATH: <mrow><msub><mi
   mathvariant="script">D</mi><mrow><mi>j</mi><mi>t</mi></mrow></msub></mr
   ow> :MATH]
   to contain these 100 distance values. The significance level of the
   drug–tissue association is denoted as the proportion of distance value
   in
   [MATH: <mrow><msub><mi
   mathvariant="script">D</mi><mrow><mi>j</mi><mi>t</mi></mrow></msub></mr
   ow> :MATH]
   , which is less than
   [MATH:
   <mrow><msub><mi>d</mi><mrow><mi>j</mi><mi>t</mi></mrow></msub></mrow>
   :MATH]
   .

One-sided Wilcoxon rank test

   We grouped AEs using their System Organ Class (SOC) categories in
   Medical Dictionary for Regulatory Activities (MedDRA). For a specific
   SOC category, suppose it contained
   [MATH: <mrow><mi>m</mi></mrow> :MATH]
   AEs in the CEUAFD. We further assumed that for the
   [MATH: <mrow><mi>v</mi></mrow> :MATH]
   -th AE signature, the probabilities for these
   [MATH: <mrow><mi>m</mi></mrow> :MATH]
   AEs were
   [MATH:
   <mrow><msub><mi>W</mi><mrow><mn>1</mn><mi>v</mi></mrow></msub></mrow>
   :MATH]
   ,…,
   [MATH:
   <mrow><msub><mi>W</mi><mrow><mi>m</mi><mi>v</mi></mrow></msub></mrow>
   :MATH]
   . To test whether there is a significant association between this SOC
   category and the
   [MATH: <mrow><mi>v</mi></mrow> :MATH]
   -th AE signature, we reformulated it to test whether the distribution
   formed by
   [MATH:
   <mrow><msubsup><mrow><mo>{</mo><mrow><msub><mi>W</mi><mrow><mi>i</mi><m
   i>v</mi></mrow></msub></mrow><mo>}</mo></mrow><mrow><mi>i</mi><mo>=</mo
   ><mn>1</mn></mrow><mi>m</mi></msubsup></mrow> :MATH]
   is significantly larger than the distribution formed by pooling
   [MATH:
   <mrow><msubsup><mrow><mo>{</mo><mrow><msub><mi>W</mi><mrow><mi>i</mi><m
   sup><mi>v</mi><mo>′</mo></msup></mrow></msub></mrow><mo>}</mo></mrow><m
   row><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>m</mi></msubsup></mrow>
   :MATH]
   for all
   [MATH:
   <mrow><msup><mi>v</mi><mo>′</mo></msup><mo>≠</mo><mi>v</mi></mrow>
   :MATH]
   . This test was performed by the one-sided Wilcoxon rank test.

Effect size of the association between the SOC category and AE signature

   Apart from the one-sided Wilcoxon rank test, we also evaluated the
   association between SOC categories and AE signatures by calculating
   effect size (ES). Specifically, for a SOC category containing
   [MATH: <mrow><mi>m</mi></mrow> :MATH]
   AEs, we use
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">ω</mi><mi>v</mi></msub><mo>=</mo></mrow>
   :MATH]
   (W[1v]  ,
   [MATH: <mrow><mo>…</mo></mrow> :MATH]
   ,W[mv] ) to denote the probabilities for the
   [MATH: <mrow><mi>m</mi></mrow> :MATH]
   AEs in the
   [MATH: <mrow><mi>v</mi></mrow> :MATH]
   -th AE signature, and
   [MATH: <mrow><msubsup><mi mathvariant="bold-italic">ω</mi><mi>v</mi><mi
   mathvariant="normal">c</mi></msubsup><mo>=</mo><msub><mrow><mo>(</mo><m
   row><msub><mi>W</mi><mrow><mn>1</mn><msup><mi>v</mi><mo>′</mo></msup></
   mrow></msub><mo>,</mo><mo>…</mo><mo>,</mo><msub><mi>W</mi><mrow><mi>m</
   mi><msup><mi>v</mi><mo>′</mo></msup></mrow></msub></mrow><mo>)</mo></mr
   ow><mrow><msup><mi>v</mi><mo>′</mo></msup><mo>≠</mo><mi>v</mi></mrow></
   msub></mrow> :MATH]
   is a vector containing probabilities for the
   [MATH: <mrow><mi>m</mi></mrow> :MATH]
   AEs in the other three AE signatures. The ES is defined as
   [MATH:
   <mrow><mtext>ES</mtext><mo>=</mo><mfrac><mrow><mtext>mean</mtext><mrow>
   <mo>(</mo><mrow><msub><mi
   mathvariant="bold-italic">ω</mi><mi>v</mi></msub></mrow><mo>)</mo></mro
   w><mo>−</mo><mtext>mean</mtext><mrow><mo>(</mo><mrow><msubsup><mi
   mathvariant="bold-italic">ω</mi><mi>v</mi><mi
   mathvariant="normal">c</mi></msubsup></mrow><mo>)</mo></mrow></mrow><mr
   ow><mtext>sd</mtext><mrow><mo>(</mo><mrow><msubsup><mi
   mathvariant="bold-italic">ω</mi><mi>v</mi><mi
   mathvariant="normal">c</mi></msubsup></mrow><mo>)</mo></mrow></mrow></m
   frac></mrow><mo>.</mo> :MATH]
   (7)

Sparse linear discriminant analysis

   We conducted SLDA ([67]Mai et al., 2012) for each pair of drug clusters
   to pick up discriminant genes. Specifically, for a pair of drug
   clusters (
   [MATH:
   <mrow><msub><mi>v</mi><mn>1</mn></msub><mo>,</mo><msub><mi>v</mi><mn>2<
   /mn></msub></mrow> :MATH]
   ), suppose there are
   [MATH: <mrow><msub><mi>n</mi><mn>1</mn></msub></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi>n</mi><mn>2</mn></msub></mrow> :MATH]
   drugs in these two clusters with expression information accessible in
   the Library of Integrated Network-Based Cellular Signatures (LINCS)
   A549 database ([68]Subramanian et al., 2017). We assumed that the
   cluster labels are coded as
   [MATH:
   <mrow><msub><mi>y</mi><mi>j</mi></msub><mo>=</mo><mo>−</mo><mfrac><mrow
   ><mi>n</mi></mrow><mrow><msub><mi>n</mi><mn>1</mn></msub></mrow></mfrac
   ></mrow> :MATH]
   if drug
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   belongs to cluster
   [MATH: <mrow><msub><mi>v</mi><mn>1</mn></msub></mrow> :MATH]
   and
   [MATH:
   <mrow><msub><mi>y</mi><mi>j</mi></msub><mo>=</mo><mfrac><mrow><mi>n</mi
   ></mrow><mrow><msub><mi>n</mi><mn>2</mn></msub></mrow></mfrac></mrow>
   :MATH]
   if drug
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   belongs to cluster
   [MATH: <mrow><msub><mi>v</mi><mn>2</mn></msub></mrow> :MATH]
   , where
   [MATH: <mrow><msub><mi>n</mi><mn>1</mn></msub></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi>n</mi><mn>2</mn></msub></mrow> :MATH]
   are the number of drugs in cluster
   [MATH: <mrow><msub><mi>v</mi><mn>1</mn></msub></mrow> :MATH]
   and
   [MATH: <mrow><msub><mi>v</mi><mn>2</mn></msub></mrow> :MATH]
   separately, with
   [MATH:
   <mrow><mi>n</mi><mo>=</mo><msub><mi>n</mi><mn>1</mn></msub></mrow>
   :MATH]
   +
   [MATH: <mrow><msub><mi>n</mi><mn>2</mn></msub></mrow> :MATH]
   . The gene expression profile for drug
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   was denoted as
   [MATH: <mrow><msub><mi
   mathvariant="bold-italic">x</mi><mi>j</mi></msub></mrow> :MATH]
   with length
   [MATH: <mrow><mi>p</mi></mrow> :MATH]
   .

   The discriminant genes were selected by solving the following l1
   penalized least squares problem:
   [MATH: <mrow><mrow><mo>(</mo><mrow><mover accent="true"><mi
   mathvariant="bold">β</mi><mo>^</mo></mover><mo>,</mo><msub><mover
   accent="true"><mi
   mathvariant="normal">β</mi><mo>^</mo></mover><mn>0</mn></msub></mrow><m
   o>)</mo></mrow><mo>=</mo><munder><mtext>argmin</mtext><mrow><mi
   mathvariant="bold">β</mi><mo>,</mo><msub><mi
   mathvariant="normal">β</mi><mn>0</mn></msub></mrow></munder><mrow><mo>{
   </mo><mrow><msup><mi>n</mi><mrow><mo>−</mo><mn>1</mn></mrow></msup><mro
   w><munderover><mstyle
   displaystyle="true"><mo>∑</mo></mstyle><mrow><mi>j</mi><mo>=</mo><mn>1<
   /mn></mrow><mi>n</mi></munderover><mrow><msup><mrow><mo>(</mo><mrow><ms
   ub><mi>y</mi><mi>j</mi></msub><mo>−</mo><msub><mi
   mathvariant="normal">β</mi><mn>0</mn></msub><mo>−</mo><msubsup><mi
   mathvariant="bold-italic">x</mi><mi>j</mi><mi
   mathvariant="normal">T</mi></msubsup><mi
   mathvariant="bold">β</mi></mrow><mo>)</mo></mrow><mn>2</mn></msup><mo>+
   </mo><mi>λ</mi></mrow></mrow><mrow><munderover><mstyle
   displaystyle="true"><mo>∑</mo></mstyle><mrow><mi>k</mi><mo>=</mo><mn>1<
   /mn></mrow><mi>p</mi></munderover><mrow><mo>|</mo><mrow><msub><mi
   mathvariant="normal">β</mi><mi>k</mi></msub></mrow><mo>|</mo></mrow></m
   row></mrow><mo>}</mo></mrow></mrow><mo>.</mo> :MATH]
   (8)

   In practice, we used the R package TULIP to solve the abovementioned
   optimization problem. Because of the limited sample size, cross
   validation is not reliable in selecting a suitable
   [MATH: <mrow><mi>λ</mi></mrow> :MATH]
   , and we set
   [MATH: <mrow><mi>λ</mi></mrow> :MATH]
   to be the smallest one among the sequence of
   [MATH: <mrow><mi>λ</mi></mrow> :MATH]
   values provided by function dsda. Those genes whose
   [MATH: <mrow><mover accent="true"><mi
   mathvariant="bold">β</mi><mo>^</mo></mover></mrow> :MATH]
   are non-zero were considered as discriminant genes.

Results

AE-signature analysis for approved CEUAFD drugs

   The CEUAFD collected from the beginning of the COVID-19 pandemic until
   September 2021 was downloaded for COVID-19 drug-related AE-signature
   analysis. We obtained 9,754 reports, corresponding with 15 drugs and
   134 unique AE reports after the data-preprocessing procedures (see
   [69]Supplementary Material). We then applied NMF to the drug-AE count
   matrix
   [MATH: <mrow><mi mathvariant="bold-italic">X</mi><mo>∈</mo><msup><mi
   mathvariant="double-struck">R</mi><mrow><mn>134</mn><mo>×</mo><mn>15</m
   n></mrow></msup></mrow> :MATH]
   to extract the collection of AE signatures, with each element of
   [MATH: <mrow><mi mathvariant="bold-italic">X</mi></mrow> :MATH]
   representing the number of patients reporting a specific AE for a
   particular drug. After a stability-driven model selection procedure
   ([70]Brunet et al., 2004), we obtained four AE signatures deciphered by
   NMF, which constituted a sufficient and non-redundant base in depicting
   the drug-specific AE distributions (see [71]Supplementary Material). We
   also conducted the clustering of CEUAFD drugs, with drugs in each
   cluster possessing high abundance fractions on the same AE signature.
   Some basic information for nine widely used CEUAFD drugs (more than 20
   items of AE feedback in the CEUAFD by September 2021) is shown in
   [72]Table 1. Five out of the nine drugs fall into the second cluster.
   The two combinational drugs, casirivimab and imdevimab and bamlanivimab
   and etesevimab, are both enriched with AE signature 3. In contrast, the
   other two widely used drugs, remdesivir and bamlanivimab, are
   representative drugs of clusters 1 and 4 separately. Overall, drugs
   with the most frequent AEs are also representative AEs for each
   corresponding AE signature within each drug cluster.

TABLE 1.

   Summary of nine CEUAFD drugs. Information on nine widely used CEUAFD
   drugs (more than 20 items of AE feedback in the CEUAFD by September
   2021) is shown, including the number of patients taking one specific
   drug, the two most frequent AEs, and the cluster to which each drug was
   assigned.
   CEUAFD drugs #patients Two most frequent AEs Clusters
   Hydroxychloroquine 71 Electrocardiogram Qt prolonged; hypoglycaemia 2
   Remdesivir 4047 Alanine aminotransferase increased; aspartate
   aminotransferase increased 1
   Tocilizumab 71 Alanine aminotransferase increased; aspartate
   aminotransferase increased 2
   Casirivimab and imdevimab 722 Dyspnoea; chills 3
   Bamlanivimab 4375 Dyspnoea; pyrexia 4
   Baricitinib 123 Lymphocyte count decreased; acute kidney injury 2
   Covid-19 convalescent plasma 82 Dyspnoea; chills 2
   Bamlanivimab and etesevimab 171 Dyspnoea; pyrexia 3
   Vancomycin hydrochloride 23 Acute kidney injury; respiratory failure 2
   [73]Open in a new tab

   To give a more general picture of each AE signature, we used the top 20
   representative AEs to characterize the rough distributions of four AE
   signatures (see [74]Supplementary Material) and grouped AEs by their
   SOC categories in MedDRA. In addition, we performed a one-sided
   Wilcoxon rank test for every SOC category in each AE signature to
   determine whether AEs belonging to one SOC category have a
   high-probability accumulation in some AE-signature distributions.

   For the first AE signature, AEs with top probabilities include some
   liver-related symptoms (alanine aminotransferase increased, aspartate
   aminotransferase increased, and blood creatinine increased),
   kidney-related symptoms (acute kidney injury and glomerular filtration
   rate decreased), and cardiovascular-related symptoms (bradycardia,
   cardiac arrest, and hypotension) ([75]Supplementary Figure S1). Most of
   them were classified into SOC categories “investigations” and “renal
   and urinary disorders.” These two categories correspond to a higher
   level of AE severity ([76]Figure 2), where the “investigations”
   category includes some laboratory test indexes, radiologic test
   indexes, or physical examination indexes. Compared with other drugs,
   AEs of remdesivir are more abundant in the first AE signature, with
   more than 90% of AEs attributed to AE signature 1 ([77]Supplementary
   Figure S2). This finding can be supported by several existing studies
   which reported the damage to the liver and renal functions of
   remdesivir users among COVID-19 patients ([78]Perveen et al., 2020;
   [79]Zampino et al., 2020; [80]Singh and Kamath, 2021). Thus, COVID-19
   patients with poor liver, renal, or probably cardiac functions should
   take remdesivir with caution.

FIGURE 2.

   [81]FIGURE 2
   [82]Open in a new tab

   Summary of statistically significant SOC categories returned by the
   one-sided Wilcoxon rank test. All 134 AEs extracted from the CEUAFD are
   classified into different SOC categories in MedDRA. By the one-sided
   Wilcoxon rank test, only significant associations between the SOC
   category and AE signature are shown. The size of the circles
   corresponds to different levels of p values. The colors represent
   different effect sizes (ESs) of the association between the SOC
   category and AE signature.

   We detected significant aggregation in some kidney-related AEs (acute
   kidney injury), cardiovascular-related AEs (electrocardiogram QT
   prolonged, hypotension, and hypoglycemia), and AEs related to the
   immune system (lymphocyte count decreased, lymphopenia, and
   leukocytosis) for AE signature 2. This leads to a significant
   high-probability accumulation of SOC categories “vascular disorder” (p
   = 0.0095) and “blood and lymphatic system disorders” (p = 0.03) in AE
   signature 2 ([83]Figure 2 and [84]Supplementary Figure S3).
   Hydroxychloroquine, tocilizumab, COVID-19 convalescent plasma,
   vancomycin hydrochloride, and baricitinib are more enriched with AE
   signature 2 compared with other drugs ([85]Supplementary Figure S2).
   Interestingly, tocilizumab and baricitinib are both categorized as
   “immune modulators” by the FDA when treating COVID-19. Recent research
   also highlighted that hydroxychloroquine treatment may impair host
   immunity in response to SARS-CoV-2 ([86]Devarajan and Vaseghi, 2021;
   [87]Rother et al., 2020), while baricitinib could not be initiated in
   patients with small lymphocyte count ([88]Praveen et al., 2020).
   Synthesizing all the abovementioned evidence, we recommend that
   COVID-19 patients with weak immune systems or vascular disorders should
   not be prescribed hydroxychloroquine, tocilizumab, or baricitinib.

   AE signature 3 and AE signature 4 have many representative AEs in
   common, such as dyspnoea, pyrexia, hypoxia, oxygen saturation
   decreased, nausea, vomiting, dizziness, and cough ([89]Supplementary
   Figures S4, S5). Most of these common representative AEs were
   classified into the “General disorders and administration site
   conditions” SOC category ([90]Figure 2). Standing on the SOC category
   perspective, we can distill more general differences between these two
   AE signatures—part of the representative AEs in AE signature 3 can be
   attributed to the “Skin and subcutaneous tissue disorders” SOC
   category, while AEs belonging to the SOC category “Nervous system
   disorders” are more abundant in AE signature 4 ([91]Figure 2).
   Casirivimab and imdevimab and bamlanivimab are the representative drugs
   of AE signature 3 and 4 separately. Another combinational drug
   bamlanivimab and etesevimab is also enriched with AE signature 3
   ([92]Supplementary Figure S2). Different from the first two AE
   signatures, the symptoms of the last two AE signatures are relatively
   mild. This finding is consistent with the FDA’s recommendation that
   mild-to-moderate COVID-19 in adults and children not admitted to
   hospital may be treated by bamlanivimab ([93]Mahase, 2020). We also
   found that reduced oxygen saturation is a common AE brought by both
   casirivimab and imdevimab and bamlanivimab, which was supported by
   existing research ([94]Kano et al., 2021). Therefore, casirivimab and
   imdevimab should be taken carefully by COVID-19 patients with low
   oxygen saturation.

AE-signature analysis for drugs in the clinical trial stage

   An AE-signature analysis for drugs that are still in the clinical trial
   stage to fight against COVID-19 was performed. Under the rationale
   assumption that similar drugs may induce similar side effects
   ([95]Campillos et al., 2008; [96]Zhang et al., 2017), we predicted
   their AEs by combining the last-stage similarity network and CEUAFD
   drugs’ clustering output.

   Part of the predicted last-stage drug–drug similarity network is
   presented in [97]Figure 3A, including the five widely used CEUAFD drugs
   and their five nearest neighbor drugs which were still in the clinical
   investigation stage. We found that baricitinib and hydroxychloroquine
   are each other’s five nearest neighbor drugs, which is consistent with
   the NMF result of the CEUAFD that these two drugs are both
   representative drugs of AE signature 2. On the other hand, casirivimab
   and imdevimab and bamlanivimab (both have relatively mild AE feedback
   from COVID-19 patients) are also among each other’s top five nearest
   neighbor drugs. Finally, chloroquine, a widely recognized drug with
   functions similar to hydroxychloroquine, is also among the top five
   nearest neighbors of hydroxychloroquine. We define drugs from the
   CEUAFD, i.e., remdesivir, baricitinib, casirivimab and imdevimab, and
   bamlanivimab, as the representative drugs for the four clusters. With
   respect to the four drugs, the top 15 nearest neighbor drugs with each
   are shown in [98]Table 2.

FIGURE 3.

   [99]FIGURE 3
   [100]Open in a new tab

   (A) Drug–drug similarity network between five widely used CEUAFD drugs
   and their five nearest neighbor drugs. The arrow pointing from drug A
   to drug B means drug B is among the five nearest neighbor drugs of drug
   A. (B) Heatmap of the significance of the association between tissues
   and drugs in four drug clusters. The association for each drug–tissue
   pair was calculated by network proximity, while the significance of the
   association was performed using a permutation test. Colors indicate the
   −log10 (p value) of each drug–tissue pair.

TABLE 2.

   Four drug clusters based on the integrated drug–drug similarity network
   and NMF outcomes of the CEUAFD. Drugs in bold are from the CEUAFD, and
   each drug cluster features one specific AE signature.
   Cluster 1      Cluster 2          Cluster 3                 Cluster 4
   Remdesivir     Baricitinib        Casirivimab and imdevimab Bamlanivimab
   Fostamatinib   Ruxolitinib        Argatroban                Ritonavir
   Iloprost       Tofacitinib        Fluvoxamine               Lopinavir
   Thalidomide    Nintedanib         Quetiapine                Nitazoxanide
   Methotrexate   Hydroxychloroquine Ivermectin                Ibrutinib
   Pitavastatin   Azithromycin       Amiodarone                Clofazimine
   Naltrexone     Chloroquine        Amlodipine                Sirolimus
   Chlorpromazine Nicotinamide       Resveratrol               Decitabine
   Artesunate     Telmisartan        Cyproheptadine            Amoxicillin
   Toremifene     Ibudilast          Simvastatin               Etoposide
   Masitinib      Imatinib           Artemisinin               Rivaroxaban
   Enzalutamide   Clarithromycin     Ambrisentan               Atorvastatin
   Doxycycline    Dexamethasone      Melphalan                 Maraviroc
   Candesartan    Methylprednisolone Disulfiram                Leflunomide
   Pirfenidone    Prednisone         Dipyridamole              Prazosin
   [101]Open in a new tab

   To give a further explanation of the predicted drug clustering result,
   we applied the network proximity approach to evaluate the association
   between each drug–tissue pair using the GTEx database. The significance
   level for each drug–tissue pair is shown in [102]Figure 3B. Compared
   with the other three drug clusters, more drugs from cluster 1 possess a
   higher association level in tissues such as the liver, lung, prostate,
   and blood. There is a strong association between brain tissue and drugs
   from cluster 4, which gives an indirect reflection that drugs in
   cluster 4 may induce nervous system disorders. Furthermore, we listed
   several representative drugs with literature-reported AEs observed in
   pharmacological and genomic spaces ([103]Table 3). The widely reported
   AEs showed high-level consistency with the typical AEs in each drug
   cluster.

TABLE 3.

   Summary of 10 representative COVID-19 drugs in the clinical trial
   stage. Information on 10representative COVID-19 drugs which are in the
   clinical trial stage is shown, including DrugBank ID, typical AEs with
   literature support, and the drug cluster to which each drug belongs.
   DrugBank ID Drug name AEs reported in the literature Drug cluster
   DB12010 Fostamatinib Hepatic function impairment, [104]Newland and
   McDonald (2020) 1
   DB01041 Thalidomide Bradycardia, [105]Thangaraju et al. (2019) 1
   DB01088 Iloprost Acute kidney injury, [106]Uyar et al. (2016);
   increased GGT, [107]Hsu and Rubin (2005) 1
   DB08860 Pitavastatin Change of glomerular filtration rate, [108]Baik et
   al. (2019) 1
   DB01611 Hydroxychloroquine Tachycardia and hypotension, [109]Marquardt
   and Albertson (2001) 2
   DB08895 Tofacitinib Immune system injury, [110]Chen et al. (2017); deep
   vein thrombosis, [111]Cohen et al. (2020) 2
   DB08877 Ruxolitinib Increased risk of infections and thromboembolic
   events, [112]Shalabi et al. (2022) 2
   DB00176 Fluvoxamine Dyspnoea, nausea, and headache, [113]Lenze et al.
   (2020) 3
   DB01601 Lopinavir Acute kidney injury, [114]Binois et al. (2020) 4
   DB00507 Nitazoxanide Gastrointestinal complaints and headache,
   [115]Walsh et al. (2020) 4
   [116]Open in a new tab

Molecular mechanisms accounting for different drug clusters

   We employed LINCS data with the A549 cell line ([117]Subramanian et
   al., 2017) and applied SLDA ([118]Mai et al., 2012) to further explore
   the molecular mechanisms for the four deciphered AE signatures. Our
   target is to find therapeutic biomarkers presenting distinguishing
   expression patterns for each pair of drug clusters. Part of the
   distinguished molecular mechanisms of four AE signatures was reflected
   by discriminative biomarkers extracted from SLDA.

   For every pair of drug clusters, we obtained a discriminative gene set
   by conducting SLDA. Log fold change of each drug’s expression levels on
   the selected 67 SLDA genes is shown in [119]Figure 4A. Because
   representative AEs of drug clusters 3 and 4 are relatively mild and the
   AEs in clusters 1 and 2 are more severe and characteristic, we merged
   drug clusters 3 and 4 as the control cluster. SLDA genes showing
   differential expression patterns in the drug cluster 1–control pair and
   2–control pair were collected for further validation. A biological
   function enrichment analysis was implemented to uncover the possible
   relationships between enriched pathways and representative differential
   AEs in the pair of drug clusters (see [120]Supplementary Material).

FIGURE 4.

   [121]FIGURE 4
   [122]Open in a new tab

   (A) Heatmap of the log fold change of gene expression levels for drugs
   in four clusters. Each row is a gene, and each column is a drug. Drugs
   are ordered according to their cluster index number. (B) Enriched
   pathways in two discriminant gene sets corresponding to two pairs of
   drug clusters—drug cluster 1–control pair and 2–control pair.
   Enrichment analysis was performed in two discriminant gene sets
   separately, where the two gene sets contain genes showing
   distinguishing expression patterns between drug cluster 1 and the
   control group and drug cluster 2 and the control group separately, and
   the control group was defined as the combination of drug cluster 3 and
   4. Ten pathways with the most significant enrichment p values are
   shown, and pathways enriched in different gene sets are filled with
   different colors (left). Discriminant genes that emerged in the
   enriched pathways are shown (right).

   We found that gene FAS presents highly different expression levels
   between drug clusters 1 and 2 ([123]Figure 4A). Fas and Fas-ligands are
   classical members of the TNF receptor and TNF ligand families. The
   interaction of the receptor with its ligand allows the formation of a
   death-inducing signaling complex that includes the Fas-associated death
   domain protein (FADD), caspase 8, and caspase 10. With this mechanism,
   they play an important role in the regulation of apoptotic processes,
   including activation-induced cell death, T-cell-induced cytotoxicity,
   and multiple organ function impairment ([124]Wajant et al., 2003).

   By conducting enrichment analysis, the differential genes between drug
   cluster 1–control pair are enriched in the RHO GTPases Activate NADPH
   Oxidases pathway, with PIN1 and RAC2 being the pathway input genes
   ([125]Figure 4B). To our knowledge, RAC2 is an important component of
   NADPH oxidase ([126]Kim and Dinauer, 2001). Excessive NADPH
   oxidase-derived ROS production can induce multiple tissue injuries and
   prolonged inflammatory responses, leading to inflammatory diseases
   ([127]Belambri et al., 2018).

   Different from that mentioned above, the KEGG Reactome analysis showed
   that a large number of differential genes in the drug cluster 2–control
   pair flowed to signal transduction pathway ([128]Figure 4B), which is
   related to many diseases such as cancer, atherosclerosis, and
   inflammatory diseases. Notably, the SMAD3 gene is selected in this
   pathway. SMAD3 is a major transcription factor in transforming growth
   factor-
   [MATH: <mrow><mi>β</mi></mrow> :MATH]
   (TGF-
   [MATH: <mrow><mi>β</mi></mrow> :MATH]
   ) signaling. The TGF-
   [MATH: <mrow><mi>β</mi></mrow> :MATH]
   /Smad-dependent signaling pathway has been shown to be activated in
   models of myocardial infarction, as well as in multiple pathological
   processes ([129]Bujak et al., 2007; [130]Peng et al., 2021).

Discussion

   With respect to COVID-19 drug safety, we developed a framework to
   predict the AE profiles for those drugs which are not in large-scale
   use. This framework has some advantages from methodological, clinical,
   and molecular viewpoints.

   Some innovative points in our method are worth to be mentioned. The
   advantages of NMF are two-fold. First, NMF possessed the ability of
   dimension reduction and drug clustering. Second, by further SOC
   category enrichment analysis, NMF can capture the subpopulation fragile
   to each AE signature with a higher precision level. On the other hand,
   most network integration methods assumed node consistency among
   different networks. With the rapid development of new drugs, it is hard
   to fulfill that each drug’s information is available in all data
   sources. We developed an MDS-based network imputation method, which was
   achieved by node embedding and building a predictive model employing
   network-shared drugs. This method can effectively use incomplete
   information. The imputation is particularly useful when drugs’
   information is unable to be presented by low-dimensional vectors (for
   example, chemical structure or high-dimensional bioassay data), as our
   imputation model was built in the low-dimensional embedded space.
   Lastly, we utilized SLDA to seek discriminative genes which may provide
   molecular-level explanation for different AE signatures among drug
   clusters. If there is a correspondence between the mechanism of SLDA
   genes and different AE-signature patterns for each pair of drug
   clusters, we validate the rationality of our drug clustering results
   and AE prediction results in some sense. On the other hand, combining
   symptomatic features with molecular-level features for COVID-19
   patients may contribute to more precise therapeutic prescriptions.

   Clinically, we presented some important discoveries. Our analysis of
   the COVID-19 EUA FAERS database showed that remdesivir may bring damage
   to liver, renal, or probably cardiac functions, which is widely
   acknowledged with the evolvement of in-depth research ([131]Li et al.,
   2021). Tocilizumab and baricitinib, which are categorized as “immune
   modulators” by the FDA, may cause vascular, blood, and lymphatic system
   disorders in COVID-19 patients. Another important message brought by
   the CEUAFD is that AE feedbacks from combinational drugs are relatively
   mild. In addition, by clustering all COVID-19 drugs, we found that
   among the top five nearest neighbors of Baricitinib, three of them
   belong to the “tinib” family. Our prediction can be verified by
   [132]Halimi et al. (2008), who reported that prolongation of the QT
   interval and heart failure are frequent AEs in patients using tinibs.

   From the viewpoint of molecular level, drugs from cluster 1 and control
   cluster showed differential expression patterns in the RHO GTPases
   Activate NADPH Oxidases pathway. NADPH oxidase (NOX) is a multimeric
   transmembrane enzyme complex that uses NADPH as an electron donor to
   generate superoxide (O2-) and hydrogen peroxide (H2O2) from molecular
   oxygen. It participates in various biological processes including
   innate immunity, and biosynthetic processes ([133]Bedard and Krause,
   2007). In the pathophysiological process of the liver, NADPH oxidase is
   expressed functionally in phagocytic and non-phagocytic forms.
   NOX-derived ROS contributes to various liver diseases caused by
   alcohol, hepatitis C virus, and toxic bile acids ([134]Paik et al.,
   2014). In addition to causing liver injury, the impairment of this
   pathway is also related to renal insufficiency. Abnormally activated
   NOX in renal microvessels can lead to superoxide production. Oxidative
   stress in the kidney contributes to renal vascular remodeling and
   increases preglomerular resistance. Some reports showed that they are
   key factors in acute and chronic kidney injury ([135]Xu et al., 2020),
   while for the drug cluster 2–control pair, the differential genes are
   more relevant to the signal transduction pathway. The relationship
   between the blockage of the signal transduction pathway and
   cardiovascular events has been widely reported ([136]Wheeler-Jones,
   2005). Previous studies have demonstrated that several kinds of TKIs
   could lead to grade III or higher QT prolongation, and animal models
   suggested that this might be caused by the inhibition of PI3K signaling
   ([137]Schiefer et al., 2018). In addition, the impairment of other
   signaling pathways is also intimately related to cardiovascular
   disorders. Mechanistic studies suggested that disturbed TGF-
   [MATH: <mrow><mi>β</mi></mrow> :MATH]
   signaling may also contribute to non-genetic cardiovascular disorders
   such as atherosclerosis and cardiac fibrosis ([138]Goumans and Ten
   Dijke, 2018). Reactivation of the WNT signaling pathway has also been
   observed in many pathologies of cardiac and vascular vessels
   ([139]Foulquier et al., 2018).

   There are also some limitations in our study. Although the CEUAFD
   provided rich AE feedback for some approved COVID-19 drugs, only 9
   drugs were reported by more than 20 patients. This insufficient AE
   feedback phenomenon may induce inaccuracy in extracting AE signatures.
   Therefore, the four factorized AE signatures cannot depict AE
   information for all COVID-19 drugs. Due to the limitations of data
   accessibility, we only used LINCS data processed on lung tissues to
   explore molecular-level differences among drug clusters. With the
   propelling of COVID-19 research, more and more molecular-level data
   will be released to promote our understanding of the mechanism of AEs
   returned by different COVID-19 drugs.

Conclusion

   Many existing COVID-19 drugs are still in the clinical trial stage,
   thus lacking abundant AE feedback. This drug safety analysis tried to
   solve this important problem by borrowing information from the AE
   profile of other “old” drugs. Our analysis took the heterogeneous
   health conditions of COVID-19 patients into consideration and proposed
   a computational framework to predict AEs for potential COVID-19 drugs.
   Therefore, the proposed framework jumps out of the current
   winner-takes-all drug repurposing framework and can provide better
   precise drug recommendations. We believe that our framework can be
   generalized to other diseases for precision drug recommendation and
   drug clustering with reduced time and cost.

Data availability statement

   The datasets presented in this study can be found in online
   repositories. The names of the repository/repositories and accession
   number(s) can be found in the article/[140]Supplementary Material.

Author contributions

   YZ conceived and supervised the study. HW and XW collected the data. HW
   conducted the analysis. HW wrote the first draft of the manuscript with
   input from XW, TL, and YZ. HW, DL, and YZ revised the manuscript. All
   authors reviewed and approved the final version of the manuscript.

Conflict of interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Publisher’s note

   All claims expressed in this article are solely those of the authors
   and do not necessarily represent those of their affiliated
   organizations, or those of the publisher, the editors, and the
   reviewers. Any product that may be evaluated in this article, or claim
   that may be made by its manufacturer, is not guaranteed or endorsed by
   the publisher.

Supplementary material

   The Supplementary Material for this article can be found online at:
   [141]https://www.frontiersin.org/articles/10.3389/fgene.2022.1019940/fu
   ll#supplementary-material
   [142]Click here for additional data file.^ (4.1MB, PDF)

References