Abstract

   Computational drug repositioning aims at ranking and selecting existing
   drugs for novel diseases or novel use in old diseases. In silico drug
   screening has the potential for speeding up considerably the
   shortlisting of promising candidates in response to outbreaks of
   diseases such as COVID-19 for which no satisfactory cure has yet been
   found. We describe DrugMerge as a methodology for preclinical
   computational drug repositioning based on merging multiple drug
   rankings obtained with an ensemble of disease active subnetworks.
   DrugMerge uses differential transcriptomic data on drugs and diseases
   in the context of a large gene co-expression network. Experiments with
   four benchmark diseases demonstrate that our method detects in first
   position drugs in clinical use for the specified disease, in all four
   cases. Application of DrugMerge to COVID-19 found rankings with many
   drugs currently in clinical trials for COVID-19 in top positions, thus
   showing that DrugMerge can mimic human expert judgment.

   Subject terms: Predictive medicine, Virtual drug screening

Introduction

   Among the strategies pharmacological science adopts for tackling
   diseases (either new or old), there are: the development of new drugs,
   the development of vaccines, and the repurposing of existing drugs.
   While the development of vaccines and novel drugs are often effective,
   they are also expensive and time-consuming. Thus repurposing of
   existing drugs is a cheaper and faster route to explore when one wishes
   to hedge one’s bets^[26]1.

   As data about drugs and diseases accumulate in databases comprising
   many aspects: genomic, transcriptomic, phenotypic, clinical, and
   epidemiological, it becomes feasible and desirable to employ this body
   of accumulated knowledge within a computerized system so as to have a
   pre-screen in silico of the candidate drugs to be repurposed for a
   specific disease^[27]2. These shortlisted drugs are in reduced number
   with respect to the initial pool of drugs and thus are more amenable to
   the subsequent steps of the full pipeline (including in vitro ed in
   vivo experiments, clinical trials, and eventual deployment in clinical
   practice). For existing drugs often their safety profile for humans is
   already known, therefore once the initial in vitro and in vivo assays
   are positive, they can be moved directly to clinical trials of phase II
   or phase III (thus skipping phases 0 and I), and accelerating the drug
   approval lifecycle.

   The recent pandemic sparked by the SARS-CoV-2 virus has placed drug
   repurposing in the spotlight because of the pressure on the public
   health systems to find cures for acute cases of patients infected by
   SARS-CoV-2. At the moment of writing, remdesivir has been approved by
   regulatory bodies to treat acute COVID-19 cases needing
   hospitalization. No other drug or treatment has passed all phases of
   clinical trials leading to full approval by the Federal Drug
   Administration (FDA) or the European Medicines Agency (EMA). Initial
   drug investigations on COVID-19 were based on the accumulated
   experience of the efficacy of drugs in vitro and in vivo experiments
   for SARS-CoV, MERS-CoV, and other coronaviruses. Computerized drug
   repurposing however aims at a more systematic approach to candidate
   drug ranking and selection. Moreover, while having accrued data from
   other viruses belonging to the same family is a bonus, we should not
   always rely on having such data handy. In a longer perspective, both
   the possible variations in the behavior of SARS-CoV-2, due to genetic
   mutations, or the emergence of new human viral diseases due to
   cross-species transmission, imply that an easy to use, accurate, and
   fast pre-screening and ranking of repurposable drugs aimed at
   countering any new viral threats is needed. Our study is a contribution
   in this direction. Network-based drug repurposing is a recent new
   approach to drug repurposing that may increase the effectiveness of
   drug repurposing pipelines. In particular network-based drug
   repurposing can merge many ‘omics’ data sets in a unified framework,
   thus such approach may be more robust and capable of including systemic
   biological effects^[28]3.

   In Lucchetta et al.^[29]4 we established Core&Peel as a competitive
   method for building Disease Active subnetworks (DAS) compared to
   several state-of-the-art algorithms in several known benchmark
   diseases. Moreover, we applied Core&Peel and the other algorithms to
   build a collection of active subnetworks from gene expression data of
   human tissues infected by COVID-19. Subsequent pathway enrichment
   analysis revealed that in a COVID-19 infection are activated pathways
   similar to those activated in other diseases (mostly viral infections).
   These observations had implications for the choice of existing drugs
   likely to have an impact on COVID-19 to be short-listed for in vitro
   experiments and eventually go on to clinical trials. Although often the
   connection between a disease-related pathway and a repurposable drug
   was obvious in the specific context, sometimes the connection was hard
   to establish since, for example, for a relevant disease-related pathway
   too many drugs are known to be effective. Moreover, there was no
   quantitative way to prioritize the potentially repurposable drugs
   obtained from this type of qualitative analysis.

   In this paper, building upon Lucchetta et al.^[30]4, we describe a
   systematic way to rank repurposable drugs for a specific disease,
   taking advantage of multiple active subnetwork algorithms and multiple
   gene expression data sets. We exploit several drug databases to
   identify drugs that can affect the expression of the genes included in
   each disease active subnetwork and thus can contrast the disease.
   Firstly, the drug ranking is based on drug significance (i.e. p-value).
   After that, we explore the ranking based on ’positivity score’, which
   measures the antagonist activity of the drug onto the disease. We also
   measure the improvements attainable by merging more drug lists that
   come from different DAS algorithms.

   In our strategy we espouse the point of view that it is convenient to
   use generalist, off-the-shelf, Disease Active subnetworks (DAS)
   algorithms rather than developing ad-hoc algorithms for specific
   networks or specific data. Generalist methods have the advantage of
   extensive prior use and validation in a variety of tasks, not
   necessarily related to drug discovery. We do not believe that a single
   approach or algorithm to building DAS can be a ‘silver bullet’ for all
   possible drug repositioning tasks, thus our aim is not that of
   indicating a priori a winner among a pool of competing algorithms and
   their combinations. Our methodology is based on using golden standard
   data (e.g. drugs already validated in clinical use) as a benchmark to
   determine the best performing algorithm or combination of algorithms to
   produce a list with validated drugs in high ranking position. The hunch
   is that the other not-yet-validated drugs high in such a ranking are
   good drug candidates to shortlist for further downstream analysis.
   Among the chosen methods, Core&Peel and ModuleDiscoverer are special
   since they produce a covering of the Disease Active subnetworks by a
   collection of dense ego-networks which are, in turns, used to determine
   the positivity score. The remaining three methods do not have this
   topological property. We have chosen a pool of well known and usable
   algorithms as a proof of concept. It is likely that DrugMerge will be
   improved in future releases by adding carefully chosen additional
   methods for DAS, preferably based on novel principles and insights.

   We use known clinically approved drugs for benchmark diseases, and
   trial drugs for COVID-19 as golden standard. We apply two measures
   (called precision@20 and reciprocal hit-rank) to assess how well
   DrugMerge performs with respect to the golden standard. We show that
   our method has better measurable performance than C-map^[31]5, a widely
   used drug ranking portal, as well as COVID-19 specific drug rankings in
   Taguchi et al.^[32]6 and Mousavi et al.^[33]7. The quality of our
   ranking results on COVID-19 is arguably comparable to those in Gysi et
   al.^[34]8 and Zhou et al.^[35]9, which use however quite different
   strategies. The whole workflow of our method is shown in Fig. [36]1.

Figure 1.

   [37]Figure 1
   [38]Open in a new tab

   Overview of the workflow. The first part of the figure shows how we
   generated the data inputs of DrugMerge. These data have been created in
   Lucchetta et al.^[39]4, and we reported them here for sake of
   completeness. We used several transcriptomic data to identify
   differentially expressed genes and the gene co-expression network to
   find one active subnetwork for each method used (Core&Peel,
   ModuleDiscoverer, Degas, ClustEx, and KeyPathwayMiner). For each of
   them, we performed the drug enrichment analysis on four drug
   perturbation databases: DrugMatrix, GEO, L1000, and CMAP. After that,
   we ranked the drugs according to the p-value or the positivity score.
   We also merged more drug lists, which come from different algorithms
   for the same disease and we applied the same ranking strategy. Finally,
   we evaluated the DrugMerge performance through the Precison@20 and the
   Reciprocal Hit-Rank using the golden-standard drugs, collected in the
   Therapeutic Target Database, and ClinicalTrials for COVID-19. The image
   has been created using Adobe Illustrator.

   Ruiz et al.^[40]10 observe that “...a drug’s effectiveness can often be
   attributed to targeting genes that are distinct from disease-associated
   genes but that affect the same functional pathways”, in contrast to
   “... existing approaches assume that, for a drug to treat a disease,
   the proteins targeted by the drug need to be close to or even need to
   coincide with the disease-perturbed proteins”. Ruiz et al. thus build a
   multi-layered interaction graph comprising proteins, drugs, diseases,
   and functional pathways (extracted from the GO repository), and devise
   a random-walk diffusion algorithm to connect the drug nodes and the
   disease nodes of the model through paths that include the functional
   nodes of the graph.

   Our approach is intermediate between the functional-based method
   advocated by Ruiz et al.^[41]10 and protein proximity based
   methods^[42]8. We do not represent GO pathways as functional nodes in
   the model, instead, we rely on a large and comprehensive gene
   co-expression network to provide the functional association in an
   implicit way. Dense subgraphs of the co-expression network will be
   formed by genes that have a coherent pattern of being up/down
   expressed, and thus are likely to be involved in specific cellular
   processes, either directly, by expressing binding proteins, or
   indirectly through cascading regulatory action chains, possibly
   involving also ncRNA and other regulatory elements not explicitly
   modelled. All algorithms used in this paper to build Active Network
   will take advantage of the dense neighborhood of the co-expression
   network, either directly (e.g. Core&Peel, ModuleDiscoverer) or
   indirectly via the augmented local graph connectivity (KeyPathwayMiner,
   Degas, ClustEx). In contrast with the proximity-based methods, we do
   not define any ad hoc explicit distance function over a network, since
   the Active Network algorithms already provide a clear-cut demarcation
   for the part of the global gene network onto which a perturbation
   should act, thus we can use directly indices for measuring module
   enrichment or incidence of the drug’s differentially expressed genes
   (DEG) onto the disease’s Active Network. We follow a network-based
   approach to drug repositioning and we focus mainly on the human
   transcriptional response to the disease, rather than the interaction
   between the disease agent (e.g. a virus) and the transcriptional
   response to it. In the case of COVID-19, also it is known that an
   over-response by the immune system is often the main cause of death in
   acute cases^[43]11, [44]12. Our strategy for COVID-19 is thus to
   prioritize drugs that act on a disease active subnetwork embedded into
   a global human gene co-expression network, to counter the virus’ global
   effects. This approach is complementary to a different one that aims at
   drugs interfering with the interactions of the host-proteins with the
   viral-protein. While the former approach aims at controlling the global
   transcriptional response of human tissues to the virus, the latter
   favors drugs blocking key aspects of the virus actions in entering the
   host cells, and activating the cells’ biochemical machinery for
   reproduction.

Results

   We compute as main quality measures the reciprocal hit ranking (RHR)
   and the precision at 20 (precision@20) of a ranking with respect to a
   golden standard : the Therapeutic Target Database (TTD)^[45]13 records
   for four benchmark diseases (asthma, rheumatoid arthritis, colorectal
   cancer, and prostate cancer), and the trials.gov records for COVID-19.
   Each measure is then normalized via a z-score, and corresponding
   p-value, referred to the distribution of quality measures given by
   taking repeatedly random permutations of the input drug data set. The
   results are shown in Figs. [46]2, [47]3, [48]4, [49]5, [50]6 and [51]7,
   Supplementary Figs. [52]S1 and [53]S2 (in Sects. [54]4 and [55]5 of
   Supplementary Material), and Supplementary Tables  [56]S1–[57]S8. For
   DrugMerge we report only significant results at p-value
   [MATH: <mo>≤</mo> :MATH]
   0.05 for at least one of the two quality measures. The DrugMerge method
   could attain significant results for all four benchmark diseases on at
   least one Drug Database (GEO, DrugMatrix, CMAP, L1000), and often on
   more than one. Moreover, as reported in the actual rankings in
   Supplementary Table [58]S7, for the four benchmark data sets the
   reported drug ranking always has a drug in clinical use at the top of
   the list for the chosen Drug data set (attaining the best p-value). In
   Fig. [59]4 we compare, using the p-value, the performance of DrugMerge
   versus the CMAP algorithm^[60]5 on the four benchmark diseases (see
   “[61]Description of CMAP algorithm” section in [62]Supplementary
   Material). DrugMerge has always a better performance in terms of
   p-value, even when the same CMAP drug database is used also by
   DrugMerge. Next, we discuss in detail the results for colorectal
   cancer, prostate cancer, and COVID-19. The results of asthma and
   rheumatoid arthritis are reported in Supplementary Material (Sects.
   [63]4, [64]5).

Figure 2.

   [65]Figure 2
   [66]Open in a new tab

   DrugMerge performance on colorectal cancer data. The bars represent the
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mi>p</mi><mspace
   width="0.166667em"></mspace><mi mathvariant="normal">value</mi><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   with respect to the precision@20 (red), and the RHR (light blu). On the
   x-axis, different algorithms or combinations of them are shown. The
   numbers on the top of the bars show the absolute values of precision@20
   or RHR. The dotted red line represents the limit of significance (
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mn>0.05</mn><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   ). All the bars above the line show a significant p-value. The (a)
   panel represents the DrugMerge performance when only FDA-approved drugs
   are considered; (b) when all drugs (without any FDA filtering) are
   considered. The plot has been generated by the ggplot2^[67]81 R
   package.

Figure 3.

   [68]Figure 3
   [69]Open in a new tab

   DrugMerge performance on Prostate Cancer data. The bars represent the
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mi>p</mi><mspace
   width="0.166667em"></mspace><mi mathvariant="normal">value</mi><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   with respect to the precision@20 (red), and the RHR (light blu). On the
   x-axis, different algorithms or combinations of them are shown. The
   numbers on the top of the bars show the absolute values of precision@20
   or RHR. The dotted red line represents the limit of significance (
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mn>0.05</mn><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   ). All the bars above the dotted line show a significant p-value. The
   (a) panel represents the DrugMerge performance when only FDA-approved
   drugs are considered; (b) when all drugs (without any FDA filtering)
   are considered. The plot has been generated by the ggplot2^[70]81 R
   package.

Figure 4.

   [71]Figure 4
   [72]Open in a new tab

   Comparison of performance of DrugMerge and the CMAP algorithm across
   the four benchmark diseases. For Colorectal Cancer and Prostate Cancer
   both methods use CMAP data. For Asthma and Rheumatoid Artrithis
   DrugMerge uses GEO data. The bars represent the
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mi>p</mi><mspace
   width="0.166667em"></mspace><mi mathvariant="normal">value</mi><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   with respect to the precision@20 (red), and the RHR (light blu). The
   numbers on the top of the bars show the absolute values of precision@20
   or RHR. The dotted red line represents the limit of significance (
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mn>0.05</mn><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   ). All the bars above the dotted line show a significant p-value. The
   plot has been generated by the ggplot2^[73]81 R package.

Figure 5.

   [74]Figure 5
   [75]Open in a new tab

   DrugMerge performance on COVID-19 data restricted to drugs with FDA
   approval. The bars represent the
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mi>p</mi><mspace
   width="0.166667em"></mspace><mi mathvariant="normal">value</mi><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   with respect to the precision 20 (red), and the RHR (light blu). On the
   x-axis, different algorithms or combinations of them are indexed with
   aliases, since the complete algorithm names are too long. The mapping
   between them is on Table S9. The numbers on the top of the bars show
   the absolute values of precision@20 or RHR. The dotted red line
   represents the limit of significance (
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mn>0.05</mn><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   ). All the bars above the dotted line show a significant p-value. The
   plot has been generated by the ggplot2^[76]81 R package.

Figure 6.

   [77]Figure 6
   [78]Open in a new tab

   DrugMerge performance on COVID-19 data without FDA approval
   restriction. The bars represent the
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mi>p</mi><mspace
   width="0.166667em"></mspace><mi mathvariant="normal">value</mi><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   with respect to the precision@20 (red), and the RHR (light blu). On the
   x-axis, different algorithms or combinations of them are indexed with
   aliases, since the complete algorithm names are too long. The mapping
   between them is on Table [79]S9. The numbers on the top of the bars
   show the absolute values of precision@20 or RHR. The dotted red line
   represents the limit of significance (
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mn>0.05</mn><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   ). All the bars above the dotted line show a significant p-value. The
   plot has been generated by the ggplot2^[80]81 R package.

Figure 7.

   [81]Figure 7
   [82]Open in a new tab

   Performance of drug rankings reported in literature for COVID-19 drug
   repurposing. The (a,b) panels show the absolute values of RHR and
   precision 20, respectively. The (c,d) show the
   [MATH:
   <mrow><mo>-</mo><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub><
   mrow><mo stretchy="false">(</mo><mi>p</mi><mspace
   width="0.166667em"></mspace><mi mathvariant="normal">value</mi><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   of the RHR and precision 20, respectively. The dotted red line
   indicates the 0.05 significance threshold. The analyses have been
   performed using only FDA-approved drugs (green bars) and without any
   FDA filtering (red bars). The plot has been generated by the
   ggplot2^[83]81 R package.

DrugMerge results on colorectal cancer

   Figure [84]2 and Supplementary Table [85]S3 show that DrugMerge finds
   significant rankings in several drug databases (L1000, CMAP, GEO),
   using several algorithms, both when all drugs are considered or when
   only FDA-approved drugs are considered. For the L1000 dataset
   Supplementary Table [86]S7, we find in the first position the
   clinically relevant drug erlotinib. Other drugs in clinical use
   according to the TTD records that appear in the top 20 positions are
   lapatinib and sunitinib. Among the top positions in the ranking, we
   find drugs with known effects in colorectal cancer patients, related
   animal models, or cell lines, and one of these went into clinical trial
   stage: mitoxantrone^[87]14, sorafenib^[88]15, sutent/sunitinib (in
   clinical trial [89]https://clinicaltrials.gov/ct2/show/NCT00961571),
   dasatinib^[90]16, sirolimus (rapamycin)^[91]17, azacitidine^[92]18,
   imatinib^[93]19, tamoxifen^[94]20, doxorubicin^[95]21,
   naltrexone^[96]22, and digoxin^[97]23 . Mitoxantrone has been tested in
   13 patients with advanced or metastatic colon carcinoma and the authors
   observed that mitoxantrone was moderately effective^[98]14. Sorafenib
   is a kinase inhibitor drug approved for the treatment of some cancers.
   It is known that sorafenib has a strong antitumor and antiangiogenic
   effect on the colorectal cancer cell line^[99]15. Scott et al.^[100]16
   demonstrated that dasatinib had significant anti-proliferative activity
   in a subset of a colorectal cancer cell line, even if dasatinib is
   currently being studied in combination with chemotherapy, as its use as
   a single agent appears limited. Wagner et al.^[101]17 studied the
   effects of rapamycin in mice and showed that it was able to suppress
   advanced stage colorectal cancer. Azacitidine has been tested in
   combination with entinostat in metastatic colorectal cancer patients,
   resulting in a tolerable therapy^[102]18. Imatinib is an oral
   chemotherapy medication used to treat cancer and it has been proved
   that it inhibits proliferation of colon cancer cells^[103]19. Kuruppo
   et al.^[104]20 demonstrated that tamoxifen has a potent inhibitory
   action on colorectal cancer metastases in a murine model. Doxorubicin
   is a chemotherapy medication used to treat cancer. In Lee et
   al.^[105]21 the authors propose doxorubicin-loaded oligonucleotides
   attached to gold nanoparticles as a drug delivery system for cancer
   chemotherapy. They tested this approach in colon cancer cell lines,
   demonstrating the drug’s efficacies such as in vitro cytotoxicity.
   Naltrexone is primarily used to manage alcohol or opioid dependence. Ma
   et al.^[106]22 explored the inhibitory effect of low-dose naltrexone on
   colorectal cancer progression in vivo and in vitro. Digoxin is used to
   treat various heart conditions such as atrial fibrillation, and heart
   failure. A recent study has examined the effects of digoxin on two
   kinds of colorectal cancer cells. This study suggests that digoxin has
   the potential to be applied as an antitumor drug via inhibiting
   proliferation and metastasis^[107]23.

   For the remaining drugs in the top twenty positions, we focus on their
   action mechanism (i.e. we determine the main pathways affected by the
   drug having the desired pharmacological effect), and we find some links
   with colorectal cancer: epirubicin^[108]24, everolimus^[109]25,
   itavastatin^[110]26, and dactinomycin^[111]27. Epirubicin is a
   chemotherapy agent, intercalating into DNA and inhibiting topoisomerase
   II, thereby inhibiting DNA replication. In Ref.^[112]24 the authors
   demonstrate topoisomerase IIa gene and protein alterations in
   colorectal cancer and show that increased expression is associated with
   pathologically advanced disease. Everolimus is a medication used as an
   immunosuppressant and in the treatment of several tumors. Its main
   mechanism of action is mTOR inhibition. Thiem et al.^[113]25
   demonstrated that the mTORC1 inhibition reduces the inflammation
   associated with the gastrointestinal tumor. Itavastatin (or
   pitavastatin) belongs to the class of statins, which are inhibitors of
   HMG-CoA reductase, the enzyme that catalyzes the first step of
   cholesterol synthesis. A meta-analysis work^[114]26 showed that statin
   use is a protective factor for colorectal cancer prognosis, and it
   might be significantly associated with lower overall mortality.
   Finally, dactinomycin is a chemotherapy medication and can inhibit
   transcription. Recent studies have revealed crucial roles of
   transcription regulation in colorectal cancer development. In
   particular, deregulation of transcription factors is pretty frequent,
   and drastic changes in gene expression profiles play fundamental roles
   in the multistep process of tumorigenesis^[115]27.

DrugMerge results on prostate cancer

   Figure [116]3 and Supplementary Table [117]S4 show that DrugMerge finds
   significant rankings in several drug databases (L1000, GEO) for
   prostate cancer, using several algorithms, both when all drugs are
   considered or when only FDA approved drugs are considered. For the
   L1000 data set Supplementary Table [118]S7, we find in the first
   position the clinically relevant drug mitoxantrone. Other drugs in
   clinical use according to the TTD records that appear in the top twenty
   positions are dasatinib, decitabine, menadione, and doxorubicin
   hydrochloride. Among the top twenty positions in the ranking we find
   drugs with known effects in prostate cancer patients, related animal
   models, or cell lines, and one of these went into clinical trial stage:
   erlotinib^[119]28, sorafenib^[120]29, lapatinib^[121]30,
   sirolimus^[122]31, sutent (sunitinib)^[123]32, azacitidine^[124]33,
   everolimus^[125]34, gemcitabine (in clinical trial
   [126]https://clinicaltrials.gov/ct2/show/NCT00276549),
   ciclosporine^[127]35, and auranofin^[128]36.

   A study of 30 patients with advanced or metastatic prostate cancer has
   been conducted to test the effect of erlotinib^[129]28, and it
   demonstrated an improvement in clinical benefit. Sorafenib has been
   tested in patients with hormone-refractory prostate cancer^[130]29,
   indicating that the treatment may be associated with good outcomes in
   terms of overall survival. Lapatinib is a dual tyrosine kinase
   inhibitor of the epidermal growth factor 1 (EGFR) and 2 (HER2). In a
   study of twenty-nine castration-resistant prostate cancer patients,
   lapatinib showed single-agent activity in a small subset of unselected
   patients^[131]30. Nandi et al.^[132]31 investigated the effect of
   sirolimus encapsulated liposomes in two different prostate cancer cell
   lines, and noticed an antiproliferative effect. Sunitinib, an inhibitor
   of tyrosine kinase receptors, has been studied in men with
   castration-resistant prostate cancer^[133]32, indicating that it is
   well tolerated and may have modest benefit in this patient population.
   Azacitidine, a demethylating agent, has been tested in combination with
   docetaxel, and prednisone in patients with metastatic
   castration-resistant prostate cancer and showed to be active in this
   subset of population^[134]33. Everolimus, an mTOR inhibitor, has been
   assessed in mice, resulting in an inhibition of prostate cancer
   growth^[135]34.

   For the remaining drugs, we focus on their action mechanism, and we
   find some links with prostate cancer for imatinib^[136]37,
   epirubicin^[137]38, and mitomycin C^[138]39. Imatinib is an oral
   chemotherapy medication and functions as a tyrosine kinase inhibitor.
   Vicentini et al.^[139]37 investigated the effects of the epidermal
   growth factor receptor tyrosine kinase inhibitor on human prostatic
   cancer cell lines, suggesting that it may have the potential in
   blocking tumor growth and progression. Epirubicin has been already
   found in colorectal cancer and as mentioned before, it is a
   topoisomerase II inhibitor. An in vitro cell-based assay^[140]38
   demonstrated that MHY336, a topoisomerase II inhibitor, significantly
   inhibited the proliferation of three prostate cancer cell lines.
   Finally, mitomycin C is used as a chemotherapeutic agent and it is a
   potent DNA crosslinker. DNA crosslinking has useful merit in
   chemotherapy and targeting cancerous cells for apoptosis^[141]40. It is
   known that evading cell death is one hallmark of cancer, in particular,
   prostate cancer cells develop multiple apoptosis blocking strategies
   during the various stages of tumor progression^[142]39.

DrugMerge results on COVID-19

   Figures [143]5 and [144]6 and Supplementary Table [145]S5 show that
   DrugMerge finds significant rankings in all drug databases (L1000,
   CMAP, GEO, DrugMatrix), using a variety of algorithms and their
   combinations, both when all drugs are considered and when only FDA
   approved drugs are considered. For COVID-19 we have a choice of three
   disease data sets : (i) human adenocarcinomic alveolar basal epithelial
   A549 cells (cells); (ii) bronchoalveolar lavage fluid RNA-Seq (BALF);
   (iii) infected patient peripheral blood mononuclear cells (PBMC), four
   Drug datasets (L1000, CMAP, GEO, and DrugMatrix), and several
   algorithms and weighting functions. We proceed by first running a
   single algorithm on a single drug data set, collecting the
   configurations that give the best significant performance measures.
   Next, we seek improvements by merging the rankings for several
   algorithms with significant performance on a single disease data set.
   And finally, we combine the improved significant results by merging the
   rankings obtained by several significant algorithms on several COVID-19
   disease data sets. Figures [146]5 and [147]6 and Supplementary
   Table [148]S5 report the best results in each phase, while Table S9
   report the full algorithmic settings.

   We focus now on two such rankings in Supplementary File [149]S7
   attaining the best p-values on RHR for all drugs or FDA-approved drugs
   only. The highest p-value on RHR when only FDA-approved drugs are
   considered is attained on the CMAP Drug dataset using all three disease
   data sets for COVID-19 and three algorithms (ClustEx, Core&Peel,
   Degas), (Alg 01 in Fig. [150]5, see Supplementary Table [151]S9). The
   top two positions of the ranking report two drugs listed in clinical
   trial on [152]www.trials.gov for COVID-19: etoposide and mefloquine.
   Within the first 20 positions, further 5 drugs are in clinical trials
   for COVID-19: colchicine, chlorpromazine, propofol, tretinoin, and
   ivermectin. Among the top ten positions in the ranking we find drugs
   with known effects animal models or cell lines or proposed for further
   scrutiny in silico screenings: thioridazine^[153]41,
   prochlorperazine^[154]42, primaquine^[155]43, fluphenazine^[156]44. The
   highest p-value on RHR when all drugs are considered is attained by
   KeyPathwayMiner (KPM) on the L1000 drug data set with the BALF COVID-19
   disease data set (Alg 17 in Fig. [157]6, see Supplementary
   Table [158]S9). The top position of the ranking reports a drug listed
   in clinical trial on [159]www.trials.gov for COVID-19: sirolimus.
   Within the first twenty positions further 4 drugs are in clinical
   trials for COVID-19: decitabine, tamoxifen, simvastatin, and imatinib.
   Among the top ten positions in the ranking we find drugs with known
   effects in animal models or cell lines or proposed for further scrutiny
   in in silico screenings: lapatinib^[160]45, dasatinib^[161]46,
   mitoxantrone^[162]47, everolimus^[163]48, sunitinib^[164]49.

   For the remaining drugs in the top 10 positions in both lists, we
   determine the main mechanism of action (i.e. we determine the main
   pathways affected by the drug having the desired pharmacological
   effect) and we then look for hints in the literature of relevance of
   these pathways for COVID-19. In the first list, we find
   methylergometrine, which is a partial agonist and antagonist of
   serotonin, dopamine, and
   [MATH: <mi>α</mi> :MATH]
   -adrenergic receptors. Costa et al.^[165]50 conjectured that serotonin
   inhibitors could have a neuroprotective effect in patients affected by
   COVID-19. In the second list, we find erlotinib, stuent/sunitinib,
   azacitidine, and sorafenib. Erlotinib is an epidermal growth factor
   receptor (EGFR) inhibitor in use for the treatment of non-small cell
   lung cancer (NSCLC) and pancreatic cancer. Epidermal growth factor
   receptor (EGFR) signaling is known to be involved in the progress of
   viral infections^[166]51, including those caused by SARS-CoV^[167]52
   and SARS-CoV-2^[168]53, thus it is a plausible target pathway to
   consider. Sunitinib (Sutent) multi-targeted receptor tyrosine kinase
   (RTK) inhibitor used in the treatment of renal cell carcinoma (RCC) and
   imatinib-resistant gastrointestinal stromal tumor (GIST). Weidberg et
   al.^[169]46 recently advocated the repurposing of kinase inhibitors for
   the treatment of COVID-19. Azacitidine has been shown effective against
   human immunodeficiency virus (HIV) in vitro^[170]54 and human
   T-lymphotropic virus (HTLV)^[171]55. Its action is mainly at an
   epigenetic level, inhibiting DNA methyltransferase, causing
   hypomethylation of DNA. The role of epigenetics in COVID-19 is a
   growing area of research with large potential for advances in drug
   target identification^[172]56. Sorafenib is a kinase inhibitor drug
   approved for the treatment of advanced primary renal cell carcinoma,
   advanced primary liver cancer, and forms of resistant advanced thyroid
   carcinoma. Sorafenib is a protein kinase inhibitor with activity
   against the Raf/Mek/Erk pathway^[173]57. Several authors have explored
   the role of MAP Kinase signaling pathway in the coronavirus family at
   large and for SARS-CoV-2 in particular^[174]58. In Supplementary
   File [175]S8 and Fig. [176]7 we report performance measures for the
   proposed COVID-19 drug repurposing rankings from Taguchi et al.^[177]6,
   Mousavi et al.^[178]7, Gysi et al.^[179]8, and Zhou et al.^[180]9.

   For Taguchi et al. and Mousavi et al. the drug data sets used are
   overlapping with ours thus it is possible to compute the significance
   of the performance measures on a common basis. Interestingly as shown
   in Supplementary File [181]S8 no such rankings attain the significance
   threshold p-value
   [MATH: <mo>≤</mo> :MATH]
   0.05 in any configuration. For Gysi et al. and Zhou et al., the drug
   data sets used are different from the ones we use, and the two methods
   are not based on gene expression data, and it is not possible to
   determine with confidence the initial pool of drug candidates. For this
   reason, we do not compute the normalized significance, and we report
   directly the absolute performance measures. The results of Gysi et al.
   are in absolute value superior to those of Zhou et al. and
   qualitatively comparable to those obtained by DrugMerge on GEO data for
   precision@20, and on all four Drug datasets for RHR. In particular,
   Gysi et al.^[182]8 use data on interactions between viral and human
   proteins in Gordon et al.^[183]59 to seed their active subnetwork and
   finish their listing with the help of expert advice, which we do not
   use. Gysi et al. also have a notion of ‘positive’ drug action akin to
   the one used in our paper.

Discussion

   As drug repositioning is a vast subject, with implications from several
   areas of biology and medicine, we comment on the relationship of
   DrugMerge with several issues arising in the Drug Repositioning
   literature. Each issue is introduced by a heading.

Three strategies

   We can identify three main strategies in drug repositioning (DR): the
   first is disease-oriented DR (i.e. find repurposable drugs for a
   specified disease), the second is drug-oriented DR (finding a disease
   for a specified drug), the third is generic DR (find any novel
   drug-disease pair). The approaches may vary significantly across the
   three types, and also the validation methodology is strongly influenced
   by the objective of the strategy. In this study we tackle
   disease-oriented DR, however, we should point out that the base
   methodology can in principle be adapted easily to the other two
   strategies, provided sufficient high-quality data is available.

DrugMerge in a nutshell

   DrugMerge is a proof-of-concept exploring a novel use of disease active
   subnetworks in the context of drug repositioning aimed at a specific
   disease. In particular, DrugMerge uses multiple Disease Active
   subnetworks (DAS) algorithms and a state-of-the-art rank merging
   procedure in conjunction with two scoring functions. The first scoring
   function is directly the p-value of the enrichment of a drug DE genes
   within the DAS. The second scoring function, called ‘positivity score’,
   measures the antagonistic activity of the drug onto the disease, as
   mediated by a covering of the DAS as a collection of dense ego-networks
   computed by two methods: Core&Peel and ModuleDiscoverer. Matching the
   resulting rankings with the golden standard of a disease indicates the
   configuration attaining the best results, and thus those likely to
   include good candidates for drug repositioning in top positions (see
   “[184]Methods” section for more details).

Performance on real data

   The tests involving four benchmark diseases with drugs in clinical use
   did confirm the superior performance of the proposed method over the
   current state of the art (e.g. CMAP) in placing such clinically
   approved drugs in top-ranking positions. In this case, the evaluation
   criterion is very strict, however we were able to guess correctly the
   efficacy and safety of drugs administered to patients basing our
   analysis only on transcriptomic data from in vitro cell lines and
   tissues. Interestingly, while efficacy is in principle detectable by
   examining specific types of cells or tissues primarily affected by a
   disease, safety is a much broader concept involving all human organs
   and tissues, thus harder to pinpoint with localized experiments either
   in vitro or in silico.

   Tests on the data from the COVID-19 disease also show, according to our
   validation criterion, a performance superior to some of the approaches
   proposed in the literature, while for some others we notice a broadly
   equivalent outcome, even when using different approaches and different
   input data. Results on COVID-19 should be taken with caution, however,
   as our validation methodology implies that the proposed ranking is able
   to mimic the best judgment of human experts, which may be based on
   their prior knowledge and expertise, data available to them, and their
   assessment of the odds, leading to a shortlisting of such drugs in
   clinical trials. Thus our validation methodology might indeed be
   failing us in cases where the disease to be tackled has key features
   that escape our present general understanding. What can be gained in
   our exercise is a way to make drug shortlisting for in vitro further
   assessment (and eventual accession to clinical trial) more systematic,
   while also using relatively cost effective data sources.

Use of trascriptomic data

   Besides the global gene co-expression network, which we can assume as
   background knowledge, our method relies on rather standard and
   relatively inexpensive RNA-Seq and microarray transcriptomic
   measurements of differentially expressed genes in infected
   cells/tissues. Systematization and finding alternatives to pure
   serendipity (discovery by chance) are key aspects of all in silico
   based drug repositioning efforts. In the “[185]Related work” section on
   Supplementary Material, we discuss other related works on drug
   repositioning. One bonus of this parsimonious data requirement is that
   it becomes easier to track the evolution of viral Mechanism of Action
   and disease etiology as the viral agent mutates, thus in principle the
   drug ranking, and thus follow up decisions, may change rapidly when
   transcriptomic data on new viral strains of COVID-19 become available.

Other coronaviruses

   SARS-CoV-2 is a member of the coronavirus family, responsible also for
   the SARS and MERS epidemics. Many studies, in particular at the end of
   2019 and beginning of 2020, make up for the paucity of data on
   SARS-CoV-2 by making use of genomic, transcriptomic, proteomic, and
   pharmacological data relative to SARS-CoV and MERS-CoV and other
   members of the coronavirus family, including those known to be hosted
   by other animal species.

   In our study, we chose not to use data from viruses different from
   SARS-CoV-2 as primary data. This choice comes from two orders of
   considerations. The first is that data from close relatives of the
   pathogen may not always be available, and there is thus a general long
   term advantage attained by methods that rely just on relatively small
   and focused data. The second issue is that we focus not on the pathogen
   in itself but on the human transcriptional response to it, which is
   known to be rather different across SARS-CoV-2, SARS-CoV, and MERS
   epidemics. In this line of reasoning, it makes more sense to highlight
   what makes human transcriptional response to SARS-CoV-2 different from
   that induced by other coronaviruses, even when the causing pathogens
   are phylogenetically very close.

Current limitations of DrugMerge

   We use the list merging procedure to merge lists obtained by several
   algorithms and several DE gene data sets from different cell
   lines/tissues but always referred to a single drug data sets, and a
   single disease. Different drug data sets are often obtained using
   different initial candidate drugs and use different technologies.
   Moreover sometimes, due to the different technologies and other
   experimental settings, even when we restrict the analysis to common
   drugs, replicability across drug data sets may be low (see
   e.g.^[186]60, [187]61 reporting low replicability across CMAP and L1000
   data using the CMAP algorithm). Moreover the performance measures we
   use measure the quality of the whole ranking, and the ranking itself
   gives a way to compare drugs listed by position. However, based on
   these measures we refrain from comparing (and thus give a relative
   order) to two drugs in different rankings obtained with different
   perturbation drug data sets. Such a global ranking would need further
   insight and technological bias correction, which we leave for future
   research.

Dose dependency

   Drug datasets report often measurements for a range of different
   dosages taken at a range of different time points. For the p-valued
   ranking, we use the dosage/time point giving the lowest p-value. For
   rankings based on positivity score, we take the average positivity
   score. These choices are justified for an initial assessment of drug
   potential to be effective against a disease. Dosage and time points
   could be taken into account together with available dose-dependent
   toxicity data in a second level of analysis when additional data is
   collected in in vitro experiments.

Tissue specificity

   For this study, we use all available transcriptomic drug and disease
   data regardless of the cell lines/tissues/species for which the
   transcriptomic measurements are taken. This choice increases the
   robustness of the results in general, however, it might miss some more
   subtle tissue-specific phenomena.

   On the one hand, it is known that the SARS-CoV-2 virus infects several
   human organs including lungs, liver, kidneys, and gut^[188]62, and the
   cells transcriptional response may in principle be diverse in each
   tissue. However, Dudley et al.^[189]63 analyzing a large repository of
   disease-related gene differential expression data conclude that
   “molecular signature of disease across tissues is overall more
   prominent than the signature of tissue expression across diseases”. Our
   analysis of active subnetworks extracted from three COVID-19 data sets
   (BALF, PBMC, cells) also supports for COVID-19 a general high
   similarity of gene expression across the sample types. Differences in
   the extracted active networks are mainly due to the algorithms used.

   Also, drugs may act differently in different cell types. For example,
   chloroquine has been found to be ineffective in infected human lung
   cells^[190]64, while it was found effective in infected Vero cells (a
   cell line isolated from kidney epithelial cells extracted from an
   African green monkey)^[191]65. Similar tissue-specific differences in
   drug sensitivity are noticed by Dittmar et al.^[192]66. In principle,
   if a variety of options are available, one may opt for including only
   the data drawn from biological samples more similar to the actual
   tissues affected by the disease in patients.

   In our study, we use a large variety of cellular types for drug
   perturbation analysis. Drug matrix has drug perturbation signatures
   available on liver, heart, kidney, thigh muscle, and primary
   hepatocytes (from Rattus norvegicus). GEO has drug perturbation
   signatures for several tissues and species including homo sapiens, mus
   musculus and Rattus norvegicus. CMAP measures drug perturbations in
   breast cancer epithelial cell line MCF7 (MCF7/ssMCF7), in prostate
   cancer epithelial cell line PC3, and the nonepithelial lines HL60
   (leukemia) and SKMEL5 (melanoma). L1000 collects drug perturbation data
   from 98 different cellular contexts including human primary cell lines
   and human cancer cell lines.

   For COVID-19 we report two drug rankings attaining top performance, one
   is obtained by merging data from all three tissues used to measure DEG,
   therefore it indicates evidence for drugs that may act on a general
   systemic level across different human organs. This is complemented by
   the second ranking which is attained specifically for BALF measurements
   (bronchoalveolar lavage fluid) and thus it may indicate drugs with a
   specific action in pulmonary tissues.

Drug mechanisms of action

   One advantage of our approach, in contrast with many others, is that we
   do not have an initial explicit guess for a putative target protein for
   the drug or a putative target pathway (or mechanism of action).
   However, once relevant drugs are shortlisted we can analyze their
   likely mechanism of action, either through the literature or through
   the use of specialized databases.

On specific drugs in clinical trials

   In this study, we have used four public Drug-related data sets (GEO,
   DrugMatrix, CMAP, and L1000) representing a sufficient drug basis for
   our investigation and the validation of the proposed methodology.
   However, a much larger amount of useful data on drugs is proprietary
   therefore it is important to devise incentives for sharing these
   repositories of useful data in a global drug repositioning strategy. As
   a consequence of the non-complete drug coverage of our data sets we
   could not assess some drugs whose use in COVID-19 cases has been
   advocated, such as remdesivir (approved by FDA in October 2020^[193]67)
   or lopinavir (not approved by FDA). Cao et al.^[194]68 recently
   reported the results of a randomized clinical trial involving 199
   hospitalized adult patients with severe COVID-19: no benefit was
   observed with lopinavir-ritonavir treatment over the standard care
   cohort. Interestingly, we have data from DrugMatrix on ritonavir (not
   approved by FDA), which fails the positivity score filter, thus our
   results on ritonavir are consistent with the findings in Cao et al.
   which are at a clinical level.

Role of comorbidities

   Comorbidities in patients affected by COVID-19 are important clinical
   aspects, as it is known that comorbidities are one of the strongest
   risk factors for such patients, strongly affecting mortality
   rates^[195]69. Comorbidities may alter drastically the effect of drugs.
   The disease gene expression data (for all five diseases) we use in this
   study do not come from patients with known comorbidities, therefore the
   results of our study should be interpreted in such a setting. Fiscon et
   al.^[196]70 try to exploit known comorbidities as a positive tool in
   modeling the relevant features of COVID-19, while other authors, based
   on protein target analysis, have a more skeptical view^[197]8: “In
   summary, we find that the SARS-CoV-2 targets do not overlap with
   disease genes associated with any major diseases, indicating that a
   potential COVID-19 treatment can not be derived from the arsenal of
   therapies approved for specific diseases”. The effect of concomitant
   comorbidities on repurposed drugs might be better approached when
   transcriptomic data from these sub-populations of COVID-19 patients
   become available.

Drug combinations

   In this study, we do not treat the issue of combinations of drugs. This
   is a novel area of research in which network-based approaches have been
   shown to give novel insight and may help in coping with the
   combinatorial explosion of drug combinations. However, validated data
   on drug combinations is limited thus making the task of performance
   evaluation and method validation more complex w.r.t the analysis for
   single drugs. We plan to incorporate models of multi-drug/disease
   interactions in our setting as future research^[198]71.

Toxicity and adverse events

   Drugs can be toxic or cause adverse events. Toxicity can be detected
   via ad-hoc in vitro ed in vivo experiments or it can be predicted via
   in silico models^[199]72. Any of these techniques should be applied
   downstream to the use of DrugMerge for extra safety. DrugMerge uses two
   forms of filtering to handle adverse effects and toxicity. The first
   mechanism is the inclusion of drugs having a contrary effect to the
   disease as measured by the positivity score (which in turn is similar
   to a mechanism also used by Gysi et al.). The second filtering is via
   the possibility to limit the analysis to FDA-approved drugs, for which
   toxicity and adverse effects are known and have been considered to be
   outweighed by the benefits in certain conditions. This mild form of
   filtering is sufficient to increase the performance in our tests and
   thus mimic human expert judgment, which includes consideration of
   toxicity and adverse effects. However, toxicity should be always be
   reconsidered ex-post to finalize any shortlisting decision.

Role of the disease causes

   Since we base our approach on the transcriptomic responses of human
   cells/tissues to drugs and diseases, the cause of the disease (viral,
   bacterial, genetic, metabolic, etc.) is not critical for the
   application of the methodology. Indeed in our study, we applied it to a
   variety of cases. Asthma is an inflammatory disease of the airways of
   the lungs having several triggering causes, including exposure to air
   pollution and allergens. Rheumatoid arthritis (RA) is a long-term
   autoimmune disorder that primarily affects joints. Cancer is generally
   associated with alterations of the genetic code in somatic human cells.
   Finally, COVID-19 is of viral origin. The advantage of generality is
   balanced by the possibility that a high-ranking drug is effective
   against symptoms or side-effects of the diseases, rather than acting on
   its main mechanism of action. In one of the drug rankings for COVID-19,
   we find within the top 20 positions prochlorperazine which has both
   antiviral activity^[200]41 and is also a broad-spectrum antiemetic
   contrasting COVID-19 symptoms like nausea and vomiting^[201]42. At this
   level of analysis, DrugMerge does not yet distinguish the different
   actions a drug might have thus further investigations are needed to
   untangle complex drug actions.

Conclusions

   In this conclusive section we summarize the features of DrugMerge and
   we comment on possible extensions and applications, as well as
   additional benefits of drug prioritization in general.

   In this study, we describe DrugMerge, a methodology for ranking
   repurposable drugs where we rank drugs by their ability to affect
   collections of disease active subnetworks (computed by an ensemble of
   state-of-the-art algorithms) and contrast the perturbation induced by
   the disease in such sub-networks. DrugMerge should be considered as a
   proof-of-concept and its predictions should be considered as
   indications for shortlisting drugs for further in vitro analysis rather
   than absolute predictions. In particular, aspects like dosage, tissue
   specificity, and a more refined analysis of possible adverse side
   effects will need to be tackled further in developing DrugMerge towards
   a full-fledged product.

Network drug-disease model

   The rationale behind our approach is a model in which drugs and
   diseases interact indirectly by their mutual action on the sub-networks
   rather than directly by affecting the same target proteins. Moreover,
   we are able to leverage an ensemble of different algorithms for
   computing active subnetworks, and then merge the resulting rankings,
   thus increasing the robustness of the methodology. Results of four
   benchmark diseases are encouraging since we could find in all four
   cases drugs in clinical use (thus both safe and effective) in first
   ranking positions, and several other drugs in clinical use in high
   ranking positions.

Benchmarking with prior knowledge

   These results imply that DrugMerge when provided with input data from
   transcriptional assays, has the potential for coming close to expert
   human judgment, and guess drugs in actual clinical usage. For COVID-19
   we measure how close we come to guessing drugs that have been
   shortlisted for clinical trials according to the FDA records. For most
   drugs in high ranking positions but not on clinical trials we have
   found several supporting evidence in the literature or by direct
   assessment of their likely Mechanism of Action, that makes their
   indication by DrugMerge biologically plausible.

Other benefits of drug prioritization

   Prioritizing drugs for in vitro testing and eventual accession to
   clinical trials is valuable for decision makers in a pandemic
   situation. Lack of prioritization in drug repurposing efforts has led
   to some drugs being over-researched, early in the pandemic, even based
   on dubious evidence (e.g. hydroxycholoroquine in COVID-19). Moreover
   having many small clinical trials has the effect of reducing the pool
   of eligible patients available for larger and more promising trials (
   Generic Drug Repurposing for COVID-19 and Beyond. Susan Athey, Rena
   Conti, Richard Frank, and Jonathan Gruber Institute for Health System
   Innovation & Policy. July2020.
   [202]https://www.gsb.stanford.edu/faculty-research/publications/generic
   -drug-repurposing-covid-19-beyond).

Sub-populations of patients

   A tool such as DrugMerge aiming at drug ranking might be used in
   conjunction with an activity of data collection that targets specific
   sub-populations of patients. For example, most trials to date for
   repurposed drugs in COVID-19 have focused on hospitalized patients,
   even though prevention might be more valuable and effective if
   initiated earlier in disease progression. Also transcriptomic data from
   patients affected by frequent comorbidities might help towards the
   study of drugs specifically selected for sub-populations at higher risk
   of mortality.

Preclinical nature of this research

   This paper focuses on the preclinical considerations that may indicate
   a drug to be shortlisted for further preclinical in vitro and in vivo
   testing, leading to eventual clinical trial. Naturally, all evidence,
   which may conceivably be noisy and often contradictory, should be
   gathered and considered before pushing a drug further down the
   pipeline.

Methods

Drug ranking based on p-value

   We use the Core&Peel technique described in Lucchetta et al.^[203]4 and
   4 other methods (ClustEx^[204]73, ModuleDiscoverer^[205]74,
   Degas^[206]75, and KeyPathwayMiner^[207]76) to produce active
   subnetworks for a given disease D, starting from a list of
   differentially expressed genes in samples affected by the disease
   w.r.t. healthy controls DEG(D) and a gene co-expression network. In
   Lucchetta et al., we employed the gene co-expression network made
   available by the DREAM challenge^[208]77, and DEG for seven different
   datasets: two inflammatory disease microarray experiments (asthma and
   rheumatoid arthritis), two cancer RNA-Seq studies (prostate and
   colorectal cancer), and three COVID-19 datasets called BALF
   (bronchoalveolar lavage fluid RNA-Seq), PBMC (infected patient
   peripheral blood mononuclear cells) and COVID19 cells (human
   adenocarcinomic alveolar basal epithelial A549 cells). In Lucchetta et
   al., we compared the performance of Core&Peel and the other four
   algorithms in detecting active subnetworks, resulting in that Core&Peel
   is a competitive method in that purpose. In this work, we use the five
   active subnetworks detected by all five methods as inputs of DrugMerge.
   Let AN be any such active network. For a given Drug perturbation
   datasets (for us: GEO, DrugMatrix, CMAP, and L1000; see description in
   “[209]Drug perturbation databases” section of Supplementary Material)
   using the enrichR R/Bioconductor package, we obtain a listing of drugs
   enriched in AN ranked by their adjusted p-value in reverse order (from
   the smallest to the largest). We cut this list at the adjusted p-value
   of 0.05, retaining only the significantly enriched drugs.

Filtering of drug ranking

   We have noticed that a straightforward application of the p-value based
   drug ranking includes in top ranking positions some molecules and drugs
   that are known to be either generally toxic or to induce diseases in
   animal models. For this reason, we introduce the option of applying two
   filters to a ranked list of drugs. The first filter is used to retain
   only drugs that are approved by the FDA (we use a list of drugs
   downloaded from [210]https://www.accessdata.fda.gov on September 9th
   2020, with 6315 entries). For FDA-approved drugs usually adverse
   effects and toxicity levels are known and probably acceptable in most
   cases. The second filter retains drugs that by the directionality
   analysis appear to have a ’positive’ action on the disease, i.e. genes
   are up/down regulated antagonistically to the disease effect. This list
   of positive drugs can be obtained with the method described in the next
   section.

Drug ranking based on drug positivity scores

   For this analysis, we exploit the fact that two of the methods
   (Core&Peel and ModuleDiscoverer) we employ to build active subnetworks
   produce the active sub-network as a union of dense sub-graphs
   (ego-networks) of a large co-expression network (in detail this
   co-expression network has been produced for the DREAM challenge on
   disease module detection^[211]77). Since such network is built on the
   correlation of the expression-vectors over thousands of conditions, a
   high correlation vector implies that a pair of genes are generally both
   up or both down regulated at any given time. In a dense ego-network,
   such tendency is coherently common to all genes in the ego-network. We
   exploit this phenomenon as follows. For an ego-network that is
   significantly enriched in genes down-regulated by the disease, we
   subtract the number of up-regulated genes by the drug from the number
   of down-regulated genes by the drug. If this number is positive we say
   that the drug has a positive effect of the disease (for this
   ego-network). We do a symmetric calculation of the case of ego-networks
   that are significantly enriched in genes up-regulated by the disease.
   Summing these effects (with their sign) over all the ego-networks
   comprising the active subnetwork, we can compute the directionality of
   action of the drug and a magnitude (directionality score). We retain
   the drugs that have a positive directionality score and we rank them by
   the magnitude of the score. Note that this scheme relies just on
   counting the number of up/down regulated genes, not on the magnitude of
   the expression, as long as it is significant. In a variant of this
   scheme, we can use the size of the ego-networks as a weighting factor.
   This method is justified by an extension to the ego-networks of the
   phenomenon Chen et al.^[212]78 have noticed that in preclinical models
   of breast, liver, and colon cancers, reversal of gene expression
   between drug and disease correlates well with drug efficacy.
   Furthermore, Chen et al. validated the efficacy of drugs in xenograft
   models of the diseases.

Merging of drug lists

   Given two or more ranked lists of drugs, we wish to aggregate them into
   a single list that takes into account the ranking of any drug in each
   of the lists where it occurs. Following Gysi et al.^[213]8 we use the
   C-rank routine in Zitnik et al.^[214]79 to perform rank aggregation.
   C-rank is defined in Zitnik et al. to work with any custom-defined
   ranking of items (thus we do not make use of the graph-based community
   ranking scores for which c-rank was originally designed, instead we use
   it here as a generic rank aggregation method). Interestingly this rank
   aggregation method is oblivious of the origin of the ranking and we can
   thus easily aggregate lists obtained with different active subnetwork
   algorithms on different disease-DEG sets. The only caveat is that the
   lists supplied to c-rank must be of equal length, thus the case of
   missing drugs must be handled, by fitting the drugs missing from a list
   at its bottom with a scaled mean score value obtained from the other
   lists.

Performance evaluation

   The performance evaluation aims at assessing how well a predicted
   ranked list gives priorities in accordance to human expert judgment as
   codified by disease-treatment pairs extracted from the Therapeutic
   Target Database^[215]13 for the four benchmark diseases. Since for
   COVID-19 currently only one drug has gained acceptance as an approved
   clinical treatment, we resort to the list of drugs for the treatment of
   COVID-19 undergoing clinical trial as listed in the portal
   [216]https://www.clinicaltrials.gov/ (at Sept 5th, 2020). We use two
   main measures. The first is precision at 20 (precision@20), that is the
   number of drugs in the first 20 positions of the proposed list of
   repurposable drugs that are hits in the TTD-derived listing, or, for
   COVID-19, in the clinicaltrials.gov listing. The second is the
   Reciprocal Hit-Rank (RHR)^[217]80. For a list L of predictions length
   N, let the hits be in positions
   [MATH: <msub><mi>p</mi><mn>1</mn></msub> :MATH]
   ,
   [MATH: <msub><mi>p</mi><mn>2</mn></msub> :MATH]
   , ... ,
   [MATH: <msub><mi>p</mi><mi>h</mi></msub> :MATH]
   . We define :
   [MATH: <mrow><mtable><mtr><mtd
   columnalign="right"><mrow><mi>R</mi><mi>H</mi><mi>R</mi><mrow><mo
   stretchy="false">(</mo><mi>L</mi><mo
   stretchy="false">)</mo></mrow><mo>=</mo><munderover><mo>∑</mo><mrow><mi
   >i</mi><mo>=</mo><mn>1</mn></mrow><mi>h</mi></munderover><mfrac><mn>1</
   mn><msub><mi>p</mi><mi>i</mi></msub></mfrac><mo>.</mo></mrow></mtd></mt
   r></mtable></mrow> :MATH]

   Note that this formula does not imply a fixed length for the lists
   being compared, and it is fair in comparing both long and short lists.
   A longer list may have more hits by chance, however it will have
   positive but diminishing returns for hits in its tail.

Statistical significance

   Let D be a set of drugs and
   [MATH: <mrow><mi>H</mi><mo>⊂</mo><mi>D</mi></mrow> :MATH]
   a golden standard for a disease (e.g. the drugs in D that are in
   clinical use for the disease). We generate n random permutations
   uniformly at random
   [MATH: <mrow><msub><mi>r</mi><mi>i</mi></msub><mrow><mo
   stretchy="false">(</mo><mi>D</mi><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   , for
   [MATH:
   <mrow><mi>i</mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>…</mo><mo>,</mo><mi>n
   </mi></mrow> :MATH]
   and we take, for a performance measure
   [MATH: <mrow><msub><mi>μ</mi><mi>H</mi></msub><mrow><mo
   stretchy="false">(</mo><mo stretchy="false">)</mo></mrow></mrow> :MATH]
   depending on H, the mean m and standard deviation sd of the sequence
   [MATH: <mrow><mi>M</mi><mo>=</mo><mo
   stretchy="false">[</mo><msub><mi>μ</mi><mi>H</mi></msub><mrow><mo
   stretchy="false">(</mo><msub><mi>r</mi><mi>i</mi></msub><mrow><mo
   stretchy="false">(</mo><mi>D</mi><mo stretchy="false">)</mo></mrow><mo
   stretchy="false">)</mo></mrow><mspace width="4pt"></mspace><mspace
   width="0.333333em"></mspace><mtext>for</mtext><mspace
   width="0.333333em"></mspace><mspace
   width="4pt"></mspace><mi>i</mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>…</mo>
   <mo>,</mo><mi>n</mi><mo stretchy="false">]</mo></mrow> :MATH]
   . For a given drug ranking R(D), its zscore for
   [MATH: <mrow><msub><mi>μ</mi><mi>H</mi></msub><mrow><mo
   stretchy="false">(</mo><mo stretchy="false">)</mo></mrow></mrow> :MATH]
   is its deviation from the mean of the above described random
   distribution normalized by the standard deviation of the random
   distribution:
   [MATH: <mrow><mtable><mtr><mtd
   columnalign="right"><mrow><mi>z</mi><mi>s</mi><mi>c</mi><mi>o</mi><mi>r
   </mi><mi>e</mi><mrow><mo stretchy="false">(</mo><mi>R</mi><mrow><mo
   stretchy="false">(</mo><mi>D</mi><mo stretchy="false">)</mo></mrow><mo
   stretchy="false">)</mo></mrow><mo>=</mo><mfrac><mrow><msub><mi>μ</mi><m
   i>H</mi></msub><mrow><mo stretchy="false">(</mo><mi>R</mi><mrow><mo
   stretchy="false">(</mo><mi>D</mi><mo stretchy="false">)</mo></mrow><mo
   stretchy="false">)</mo></mrow><mo>-</mo><mi>m</mi><mrow><mo
   stretchy="false">(</mo><mi>M</mi><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mi>s</mi><mi>d</mi><mo
   stretchy="false">(</mo><mi>M</mi><mo
   stretchy="false">)</mo></mrow></mfrac><mo>.</mo></mrow></mtd></mtr></mt
   able></mrow> :MATH]

   Assuming that the random distribution of the values of
   [MATH: <mrow><msub><mi>μ</mi><mi>H</mi></msub><mrow><mo
   stretchy="false">(</mo><msub><mi>r</mi><mi>i</mi></msub><mrow><mo
   stretchy="false">(</mo><mi>D</mi><mo stretchy="false">)</mo></mrow><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   is normal, we obtain the (two sided) p-value by plugging the z-score
   value into the complementary cumulative distribution function (ccdf) of
   the normal distribution from the package scipy.stats of SciPy.org.

   The number n of random permutations is determined as follows. Starting
   with
   [MATH:
   <mrow><mi>n</mi><mo>=</mo><mn>100</mn><mo>,</mo><mn>000</mn></mrow>
   :MATH]
   , we increment n in steps of 10, 000, stopping when the value of both m
   and sd are stable, with a displacement from the previous iteration less
   than 0.01.

Supplementary information

   [218]Supplementary Information.^ (1.7MB, pdf)
   [219]Supplementary Table S1.^ (13.3KB, xlsx)
   [220]Supplementary Table S2.^ (12.7KB, xlsx)
   [221]Supplementary Table S3.^ (12.3KB, xlsx)
   [222]Supplementary Table S4.^ (12.4KB, xlsx)
   [223]Supplementary Table S5.^ (16.3KB, xlsx)
   [224]Supplementary Table S6.^ (13.2KB, xlsx)
   [225]Supplementary Table S7.^ (38.6KB, xlsx)
   [226]Supplementary Table S8.^ (13.1KB, xlsx)
   [227]Supplementary Table S9.^ (10.9KB, xlsx)

Acknowledgements