Abstract
Computational drug repositioning aims at ranking and selecting existing
drugs for novel diseases or novel use in old diseases. In silico drug
screening has the potential for speeding up considerably the
shortlisting of promising candidates in response to outbreaks of
diseases such as COVID-19 for which no satisfactory cure has yet been
found. We describe DrugMerge as a methodology for preclinical
computational drug repositioning based on merging multiple drug
rankings obtained with an ensemble of disease active subnetworks.
DrugMerge uses differential transcriptomic data on drugs and diseases
in the context of a large gene co-expression network. Experiments with
four benchmark diseases demonstrate that our method detects in first
position drugs in clinical use for the specified disease, in all four
cases. Application of DrugMerge to COVID-19 found rankings with many
drugs currently in clinical trials for COVID-19 in top positions, thus
showing that DrugMerge can mimic human expert judgment.
Subject terms: Predictive medicine, Virtual drug screening
Introduction
Among the strategies pharmacological science adopts for tackling
diseases (either new or old), there are: the development of new drugs,
the development of vaccines, and the repurposing of existing drugs.
While the development of vaccines and novel drugs are often effective,
they are also expensive and time-consuming. Thus repurposing of
existing drugs is a cheaper and faster route to explore when one wishes
to hedge one’s bets^[26]1.
As data about drugs and diseases accumulate in databases comprising
many aspects: genomic, transcriptomic, phenotypic, clinical, and
epidemiological, it becomes feasible and desirable to employ this body
of accumulated knowledge within a computerized system so as to have a
pre-screen in silico of the candidate drugs to be repurposed for a
specific disease^[27]2. These shortlisted drugs are in reduced number
with respect to the initial pool of drugs and thus are more amenable to
the subsequent steps of the full pipeline (including in vitro ed in
vivo experiments, clinical trials, and eventual deployment in clinical
practice). For existing drugs often their safety profile for humans is
already known, therefore once the initial in vitro and in vivo assays
are positive, they can be moved directly to clinical trials of phase II
or phase III (thus skipping phases 0 and I), and accelerating the drug
approval lifecycle.
The recent pandemic sparked by the SARS-CoV-2 virus has placed drug
repurposing in the spotlight because of the pressure on the public
health systems to find cures for acute cases of patients infected by
SARS-CoV-2. At the moment of writing, remdesivir has been approved by
regulatory bodies to treat acute COVID-19 cases needing
hospitalization. No other drug or treatment has passed all phases of
clinical trials leading to full approval by the Federal Drug
Administration (FDA) or the European Medicines Agency (EMA). Initial
drug investigations on COVID-19 were based on the accumulated
experience of the efficacy of drugs in vitro and in vivo experiments
for SARS-CoV, MERS-CoV, and other coronaviruses. Computerized drug
repurposing however aims at a more systematic approach to candidate
drug ranking and selection. Moreover, while having accrued data from
other viruses belonging to the same family is a bonus, we should not
always rely on having such data handy. In a longer perspective, both
the possible variations in the behavior of SARS-CoV-2, due to genetic
mutations, or the emergence of new human viral diseases due to
cross-species transmission, imply that an easy to use, accurate, and
fast pre-screening and ranking of repurposable drugs aimed at
countering any new viral threats is needed. Our study is a contribution
in this direction. Network-based drug repurposing is a recent new
approach to drug repurposing that may increase the effectiveness of
drug repurposing pipelines. In particular network-based drug
repurposing can merge many ‘omics’ data sets in a unified framework,
thus such approach may be more robust and capable of including systemic
biological effects^[28]3.
In Lucchetta et al.^[29]4 we established Core&Peel as a competitive
method for building Disease Active subnetworks (DAS) compared to
several state-of-the-art algorithms in several known benchmark
diseases. Moreover, we applied Core&Peel and the other algorithms to
build a collection of active subnetworks from gene expression data of
human tissues infected by COVID-19. Subsequent pathway enrichment
analysis revealed that in a COVID-19 infection are activated pathways
similar to those activated in other diseases (mostly viral infections).
These observations had implications for the choice of existing drugs
likely to have an impact on COVID-19 to be short-listed for in vitro
experiments and eventually go on to clinical trials. Although often the
connection between a disease-related pathway and a repurposable drug
was obvious in the specific context, sometimes the connection was hard
to establish since, for example, for a relevant disease-related pathway
too many drugs are known to be effective. Moreover, there was no
quantitative way to prioritize the potentially repurposable drugs
obtained from this type of qualitative analysis.
In this paper, building upon Lucchetta et al.^[30]4, we describe a
systematic way to rank repurposable drugs for a specific disease,
taking advantage of multiple active subnetwork algorithms and multiple
gene expression data sets. We exploit several drug databases to
identify drugs that can affect the expression of the genes included in
each disease active subnetwork and thus can contrast the disease.
Firstly, the drug ranking is based on drug significance (i.e. p-value).
After that, we explore the ranking based on ’positivity score’, which
measures the antagonist activity of the drug onto the disease. We also
measure the improvements attainable by merging more drug lists that
come from different DAS algorithms.
In our strategy we espouse the point of view that it is convenient to
use generalist, off-the-shelf, Disease Active subnetworks (DAS)
algorithms rather than developing ad-hoc algorithms for specific
networks or specific data. Generalist methods have the advantage of
extensive prior use and validation in a variety of tasks, not
necessarily related to drug discovery. We do not believe that a single
approach or algorithm to building DAS can be a ‘silver bullet’ for all
possible drug repositioning tasks, thus our aim is not that of
indicating a priori a winner among a pool of competing algorithms and
their combinations. Our methodology is based on using golden standard
data (e.g. drugs already validated in clinical use) as a benchmark to
determine the best performing algorithm or combination of algorithms to
produce a list with validated drugs in high ranking position. The hunch
is that the other not-yet-validated drugs high in such a ranking are
good drug candidates to shortlist for further downstream analysis.
Among the chosen methods, Core&Peel and ModuleDiscoverer are special
since they produce a covering of the Disease Active subnetworks by a
collection of dense ego-networks which are, in turns, used to determine
the positivity score. The remaining three methods do not have this
topological property. We have chosen a pool of well known and usable
algorithms as a proof of concept. It is likely that DrugMerge will be
improved in future releases by adding carefully chosen additional
methods for DAS, preferably based on novel principles and insights.
We use known clinically approved drugs for benchmark diseases, and
trial drugs for COVID-19 as golden standard. We apply two measures
(called precision@20 and reciprocal hit-rank) to assess how well
DrugMerge performs with respect to the golden standard. We show that
our method has better measurable performance than C-map^[31]5, a widely
used drug ranking portal, as well as COVID-19 specific drug rankings in
Taguchi et al.^[32]6 and Mousavi et al.^[33]7. The quality of our
ranking results on COVID-19 is arguably comparable to those in Gysi et
al.^[34]8 and Zhou et al.^[35]9, which use however quite different
strategies. The whole workflow of our method is shown in Fig. [36]1.
Figure 1.
[37]Figure 1
[38]Open in a new tab
Overview of the workflow. The first part of the figure shows how we
generated the data inputs of DrugMerge. These data have been created in
Lucchetta et al.^[39]4, and we reported them here for sake of
completeness. We used several transcriptomic data to identify
differentially expressed genes and the gene co-expression network to
find one active subnetwork for each method used (Core&Peel,
ModuleDiscoverer, Degas, ClustEx, and KeyPathwayMiner). For each of
them, we performed the drug enrichment analysis on four drug
perturbation databases: DrugMatrix, GEO, L1000, and CMAP. After that,
we ranked the drugs according to the p-value or the positivity score.
We also merged more drug lists, which come from different algorithms
for the same disease and we applied the same ranking strategy. Finally,
we evaluated the DrugMerge performance through the Precison@20 and the
Reciprocal Hit-Rank using the golden-standard drugs, collected in the
Therapeutic Target Database, and ClinicalTrials for COVID-19. The image
has been created using Adobe Illustrator.
Ruiz et al.^[40]10 observe that “...a drug’s effectiveness can often be
attributed to targeting genes that are distinct from disease-associated
genes but that affect the same functional pathways”, in contrast to
“... existing approaches assume that, for a drug to treat a disease,
the proteins targeted by the drug need to be close to or even need to
coincide with the disease-perturbed proteins”. Ruiz et al. thus build a
multi-layered interaction graph comprising proteins, drugs, diseases,
and functional pathways (extracted from the GO repository), and devise
a random-walk diffusion algorithm to connect the drug nodes and the
disease nodes of the model through paths that include the functional
nodes of the graph.
Our approach is intermediate between the functional-based method
advocated by Ruiz et al.^[41]10 and protein proximity based
methods^[42]8. We do not represent GO pathways as functional nodes in
the model, instead, we rely on a large and comprehensive gene
co-expression network to provide the functional association in an
implicit way. Dense subgraphs of the co-expression network will be
formed by genes that have a coherent pattern of being up/down
expressed, and thus are likely to be involved in specific cellular
processes, either directly, by expressing binding proteins, or
indirectly through cascading regulatory action chains, possibly
involving also ncRNA and other regulatory elements not explicitly
modelled. All algorithms used in this paper to build Active Network
will take advantage of the dense neighborhood of the co-expression
network, either directly (e.g. Core&Peel, ModuleDiscoverer) or
indirectly via the augmented local graph connectivity (KeyPathwayMiner,
Degas, ClustEx). In contrast with the proximity-based methods, we do
not define any ad hoc explicit distance function over a network, since
the Active Network algorithms already provide a clear-cut demarcation
for the part of the global gene network onto which a perturbation
should act, thus we can use directly indices for measuring module
enrichment or incidence of the drug’s differentially expressed genes
(DEG) onto the disease’s Active Network. We follow a network-based
approach to drug repositioning and we focus mainly on the human
transcriptional response to the disease, rather than the interaction
between the disease agent (e.g. a virus) and the transcriptional
response to it. In the case of COVID-19, also it is known that an
over-response by the immune system is often the main cause of death in
acute cases^[43]11, [44]12. Our strategy for COVID-19 is thus to
prioritize drugs that act on a disease active subnetwork embedded into
a global human gene co-expression network, to counter the virus’ global
effects. This approach is complementary to a different one that aims at
drugs interfering with the interactions of the host-proteins with the
viral-protein. While the former approach aims at controlling the global
transcriptional response of human tissues to the virus, the latter
favors drugs blocking key aspects of the virus actions in entering the
host cells, and activating the cells’ biochemical machinery for
reproduction.
Results
We compute as main quality measures the reciprocal hit ranking (RHR)
and the precision at 20 (precision@20) of a ranking with respect to a
golden standard : the Therapeutic Target Database (TTD)^[45]13 records
for four benchmark diseases (asthma, rheumatoid arthritis, colorectal
cancer, and prostate cancer), and the trials.gov records for COVID-19.
Each measure is then normalized via a z-score, and corresponding
p-value, referred to the distribution of quality measures given by
taking repeatedly random permutations of the input drug data set. The
results are shown in Figs. [46]2, [47]3, [48]4, [49]5, [50]6 and [51]7,
Supplementary Figs. [52]S1 and [53]S2 (in Sects. [54]4 and [55]5 of
Supplementary Material), and Supplementary Tables [56]S1–[57]S8. For
DrugMerge we report only significant results at p-value
[MATH: ≤ :MATH]
0.05 for at least one of the two quality measures. The DrugMerge method
could attain significant results for all four benchmark diseases on at
least one Drug Database (GEO, DrugMatrix, CMAP, L1000), and often on
more than one. Moreover, as reported in the actual rankings in
Supplementary Table [58]S7, for the four benchmark data sets the
reported drug ranking always has a drug in clinical use at the top of
the list for the chosen Drug data set (attaining the best p-value). In
Fig. [59]4 we compare, using the p-value, the performance of DrugMerge
versus the CMAP algorithm^[60]5 on the four benchmark diseases (see
“[61]Description of CMAP algorithm” section in [62]Supplementary
Material). DrugMerge has always a better performance in terms of
p-value, even when the same CMAP drug database is used also by
DrugMerge. Next, we discuss in detail the results for colorectal
cancer, prostate cancer, and COVID-19. The results of asthma and
rheumatoid arthritis are reported in Supplementary Material (Sects.
[63]4, [64]5).
Figure 2.
[65]Figure 2
[66]Open in a new tab
DrugMerge performance on colorectal cancer data. The bars represent the
[MATH:
-log10<
mrow>(pvalue) :MATH]
with respect to the precision@20 (red), and the RHR (light blu). On the
x-axis, different algorithms or combinations of them are shown. The
numbers on the top of the bars show the absolute values of precision@20
or RHR. The dotted red line represents the limit of significance (
[MATH:
-log10<
mrow>(0.05) :MATH]
). All the bars above the line show a significant p-value. The (a)
panel represents the DrugMerge performance when only FDA-approved drugs
are considered; (b) when all drugs (without any FDA filtering) are
considered. The plot has been generated by the ggplot2^[67]81 R
package.
Figure 3.
[68]Figure 3
[69]Open in a new tab
DrugMerge performance on Prostate Cancer data. The bars represent the
[MATH:
-log10<
mrow>(pvalue) :MATH]
with respect to the precision@20 (red), and the RHR (light blu). On the
x-axis, different algorithms or combinations of them are shown. The
numbers on the top of the bars show the absolute values of precision@20
or RHR. The dotted red line represents the limit of significance (
[MATH:
-log10<
mrow>(0.05) :MATH]
). All the bars above the dotted line show a significant p-value. The
(a) panel represents the DrugMerge performance when only FDA-approved
drugs are considered; (b) when all drugs (without any FDA filtering)
are considered. The plot has been generated by the ggplot2^[70]81 R
package.
Figure 4.
[71]Figure 4
[72]Open in a new tab
Comparison of performance of DrugMerge and the CMAP algorithm across
the four benchmark diseases. For Colorectal Cancer and Prostate Cancer
both methods use CMAP data. For Asthma and Rheumatoid Artrithis
DrugMerge uses GEO data. The bars represent the
[MATH:
-log10<
mrow>(pvalue) :MATH]
with respect to the precision@20 (red), and the RHR (light blu). The
numbers on the top of the bars show the absolute values of precision@20
or RHR. The dotted red line represents the limit of significance (
[MATH:
-log10<
mrow>(0.05) :MATH]
). All the bars above the dotted line show a significant p-value. The
plot has been generated by the ggplot2^[73]81 R package.
Figure 5.
[74]Figure 5
[75]Open in a new tab
DrugMerge performance on COVID-19 data restricted to drugs with FDA
approval. The bars represent the
[MATH:
-log10<
mrow>(pvalue) :MATH]
with respect to the precision 20 (red), and the RHR (light blu). On the
x-axis, different algorithms or combinations of them are indexed with
aliases, since the complete algorithm names are too long. The mapping
between them is on Table S9. The numbers on the top of the bars show
the absolute values of precision@20 or RHR. The dotted red line
represents the limit of significance (
[MATH:
-log10<
mrow>(0.05) :MATH]
). All the bars above the dotted line show a significant p-value. The
plot has been generated by the ggplot2^[76]81 R package.
Figure 6.
[77]Figure 6
[78]Open in a new tab
DrugMerge performance on COVID-19 data without FDA approval
restriction. The bars represent the
[MATH:
-log10<
mrow>(pvalue) :MATH]
with respect to the precision@20 (red), and the RHR (light blu). On the
x-axis, different algorithms or combinations of them are indexed with
aliases, since the complete algorithm names are too long. The mapping
between them is on Table [79]S9. The numbers on the top of the bars
show the absolute values of precision@20 or RHR. The dotted red line
represents the limit of significance (
[MATH:
-log10<
mrow>(0.05) :MATH]
). All the bars above the dotted line show a significant p-value. The
plot has been generated by the ggplot2^[80]81 R package.
Figure 7.
[81]Figure 7
[82]Open in a new tab
Performance of drug rankings reported in literature for COVID-19 drug
repurposing. The (a,b) panels show the absolute values of RHR and
precision 20, respectively. The (c,d) show the
[MATH:
-log10<
mrow>(pvalue) :MATH]
of the RHR and precision 20, respectively. The dotted red line
indicates the 0.05 significance threshold. The analyses have been
performed using only FDA-approved drugs (green bars) and without any
FDA filtering (red bars). The plot has been generated by the
ggplot2^[83]81 R package.
DrugMerge results on colorectal cancer
Figure [84]2 and Supplementary Table [85]S3 show that DrugMerge finds
significant rankings in several drug databases (L1000, CMAP, GEO),
using several algorithms, both when all drugs are considered or when
only FDA-approved drugs are considered. For the L1000 dataset
Supplementary Table [86]S7, we find in the first position the
clinically relevant drug erlotinib. Other drugs in clinical use
according to the TTD records that appear in the top 20 positions are
lapatinib and sunitinib. Among the top positions in the ranking, we
find drugs with known effects in colorectal cancer patients, related
animal models, or cell lines, and one of these went into clinical trial
stage: mitoxantrone^[87]14, sorafenib^[88]15, sutent/sunitinib (in
clinical trial [89]https://clinicaltrials.gov/ct2/show/NCT00961571),
dasatinib^[90]16, sirolimus (rapamycin)^[91]17, azacitidine^[92]18,
imatinib^[93]19, tamoxifen^[94]20, doxorubicin^[95]21,
naltrexone^[96]22, and digoxin^[97]23 . Mitoxantrone has been tested in
13 patients with advanced or metastatic colon carcinoma and the authors
observed that mitoxantrone was moderately effective^[98]14. Sorafenib
is a kinase inhibitor drug approved for the treatment of some cancers.
It is known that sorafenib has a strong antitumor and antiangiogenic
effect on the colorectal cancer cell line^[99]15. Scott et al.^[100]16
demonstrated that dasatinib had significant anti-proliferative activity
in a subset of a colorectal cancer cell line, even if dasatinib is
currently being studied in combination with chemotherapy, as its use as
a single agent appears limited. Wagner et al.^[101]17 studied the
effects of rapamycin in mice and showed that it was able to suppress
advanced stage colorectal cancer. Azacitidine has been tested in
combination with entinostat in metastatic colorectal cancer patients,
resulting in a tolerable therapy^[102]18. Imatinib is an oral
chemotherapy medication used to treat cancer and it has been proved
that it inhibits proliferation of colon cancer cells^[103]19. Kuruppo
et al.^[104]20 demonstrated that tamoxifen has a potent inhibitory
action on colorectal cancer metastases in a murine model. Doxorubicin
is a chemotherapy medication used to treat cancer. In Lee et
al.^[105]21 the authors propose doxorubicin-loaded oligonucleotides
attached to gold nanoparticles as a drug delivery system for cancer
chemotherapy. They tested this approach in colon cancer cell lines,
demonstrating the drug’s efficacies such as in vitro cytotoxicity.
Naltrexone is primarily used to manage alcohol or opioid dependence. Ma
et al.^[106]22 explored the inhibitory effect of low-dose naltrexone on
colorectal cancer progression in vivo and in vitro. Digoxin is used to
treat various heart conditions such as atrial fibrillation, and heart
failure. A recent study has examined the effects of digoxin on two
kinds of colorectal cancer cells. This study suggests that digoxin has
the potential to be applied as an antitumor drug via inhibiting
proliferation and metastasis^[107]23.
For the remaining drugs in the top twenty positions, we focus on their
action mechanism (i.e. we determine the main pathways affected by the
drug having the desired pharmacological effect), and we find some links
with colorectal cancer: epirubicin^[108]24, everolimus^[109]25,
itavastatin^[110]26, and dactinomycin^[111]27. Epirubicin is a
chemotherapy agent, intercalating into DNA and inhibiting topoisomerase
II, thereby inhibiting DNA replication. In Ref.^[112]24 the authors
demonstrate topoisomerase IIa gene and protein alterations in
colorectal cancer and show that increased expression is associated with
pathologically advanced disease. Everolimus is a medication used as an
immunosuppressant and in the treatment of several tumors. Its main
mechanism of action is mTOR inhibition. Thiem et al.^[113]25
demonstrated that the mTORC1 inhibition reduces the inflammation
associated with the gastrointestinal tumor. Itavastatin (or
pitavastatin) belongs to the class of statins, which are inhibitors of
HMG-CoA reductase, the enzyme that catalyzes the first step of
cholesterol synthesis. A meta-analysis work^[114]26 showed that statin
use is a protective factor for colorectal cancer prognosis, and it
might be significantly associated with lower overall mortality.
Finally, dactinomycin is a chemotherapy medication and can inhibit
transcription. Recent studies have revealed crucial roles of
transcription regulation in colorectal cancer development. In
particular, deregulation of transcription factors is pretty frequent,
and drastic changes in gene expression profiles play fundamental roles
in the multistep process of tumorigenesis^[115]27.
DrugMerge results on prostate cancer
Figure [116]3 and Supplementary Table [117]S4 show that DrugMerge finds
significant rankings in several drug databases (L1000, GEO) for
prostate cancer, using several algorithms, both when all drugs are
considered or when only FDA approved drugs are considered. For the
L1000 data set Supplementary Table [118]S7, we find in the first
position the clinically relevant drug mitoxantrone. Other drugs in
clinical use according to the TTD records that appear in the top twenty
positions are dasatinib, decitabine, menadione, and doxorubicin
hydrochloride. Among the top twenty positions in the ranking we find
drugs with known effects in prostate cancer patients, related animal
models, or cell lines, and one of these went into clinical trial stage:
erlotinib^[119]28, sorafenib^[120]29, lapatinib^[121]30,
sirolimus^[122]31, sutent (sunitinib)^[123]32, azacitidine^[124]33,
everolimus^[125]34, gemcitabine (in clinical trial
[126]https://clinicaltrials.gov/ct2/show/NCT00276549),
ciclosporine^[127]35, and auranofin^[128]36.
A study of 30 patients with advanced or metastatic prostate cancer has
been conducted to test the effect of erlotinib^[129]28, and it
demonstrated an improvement in clinical benefit. Sorafenib has been
tested in patients with hormone-refractory prostate cancer^[130]29,
indicating that the treatment may be associated with good outcomes in
terms of overall survival. Lapatinib is a dual tyrosine kinase
inhibitor of the epidermal growth factor 1 (EGFR) and 2 (HER2). In a
study of twenty-nine castration-resistant prostate cancer patients,
lapatinib showed single-agent activity in a small subset of unselected
patients^[131]30. Nandi et al.^[132]31 investigated the effect of
sirolimus encapsulated liposomes in two different prostate cancer cell
lines, and noticed an antiproliferative effect. Sunitinib, an inhibitor
of tyrosine kinase receptors, has been studied in men with
castration-resistant prostate cancer^[133]32, indicating that it is
well tolerated and may have modest benefit in this patient population.
Azacitidine, a demethylating agent, has been tested in combination with
docetaxel, and prednisone in patients with metastatic
castration-resistant prostate cancer and showed to be active in this
subset of population^[134]33. Everolimus, an mTOR inhibitor, has been
assessed in mice, resulting in an inhibition of prostate cancer
growth^[135]34.
For the remaining drugs, we focus on their action mechanism, and we
find some links with prostate cancer for imatinib^[136]37,
epirubicin^[137]38, and mitomycin C^[138]39. Imatinib is an oral
chemotherapy medication and functions as a tyrosine kinase inhibitor.
Vicentini et al.^[139]37 investigated the effects of the epidermal
growth factor receptor tyrosine kinase inhibitor on human prostatic
cancer cell lines, suggesting that it may have the potential in
blocking tumor growth and progression. Epirubicin has been already
found in colorectal cancer and as mentioned before, it is a
topoisomerase II inhibitor. An in vitro cell-based assay^[140]38
demonstrated that MHY336, a topoisomerase II inhibitor, significantly
inhibited the proliferation of three prostate cancer cell lines.
Finally, mitomycin C is used as a chemotherapeutic agent and it is a
potent DNA crosslinker. DNA crosslinking has useful merit in
chemotherapy and targeting cancerous cells for apoptosis^[141]40. It is
known that evading cell death is one hallmark of cancer, in particular,
prostate cancer cells develop multiple apoptosis blocking strategies
during the various stages of tumor progression^[142]39.
DrugMerge results on COVID-19
Figures [143]5 and [144]6 and Supplementary Table [145]S5 show that
DrugMerge finds significant rankings in all drug databases (L1000,
CMAP, GEO, DrugMatrix), using a variety of algorithms and their
combinations, both when all drugs are considered and when only FDA
approved drugs are considered. For COVID-19 we have a choice of three
disease data sets : (i) human adenocarcinomic alveolar basal epithelial
A549 cells (cells); (ii) bronchoalveolar lavage fluid RNA-Seq (BALF);
(iii) infected patient peripheral blood mononuclear cells (PBMC), four
Drug datasets (L1000, CMAP, GEO, and DrugMatrix), and several
algorithms and weighting functions. We proceed by first running a
single algorithm on a single drug data set, collecting the
configurations that give the best significant performance measures.
Next, we seek improvements by merging the rankings for several
algorithms with significant performance on a single disease data set.
And finally, we combine the improved significant results by merging the
rankings obtained by several significant algorithms on several COVID-19
disease data sets. Figures [146]5 and [147]6 and Supplementary
Table [148]S5 report the best results in each phase, while Table S9
report the full algorithmic settings.
We focus now on two such rankings in Supplementary File [149]S7
attaining the best p-values on RHR for all drugs or FDA-approved drugs
only. The highest p-value on RHR when only FDA-approved drugs are
considered is attained on the CMAP Drug dataset using all three disease
data sets for COVID-19 and three algorithms (ClustEx, Core&Peel,
Degas), (Alg 01 in Fig. [150]5, see Supplementary Table [151]S9). The
top two positions of the ranking report two drugs listed in clinical
trial on [152]www.trials.gov for COVID-19: etoposide and mefloquine.
Within the first 20 positions, further 5 drugs are in clinical trials
for COVID-19: colchicine, chlorpromazine, propofol, tretinoin, and
ivermectin. Among the top ten positions in the ranking we find drugs
with known effects animal models or cell lines or proposed for further
scrutiny in silico screenings: thioridazine^[153]41,
prochlorperazine^[154]42, primaquine^[155]43, fluphenazine^[156]44. The
highest p-value on RHR when all drugs are considered is attained by
KeyPathwayMiner (KPM) on the L1000 drug data set with the BALF COVID-19
disease data set (Alg 17 in Fig. [157]6, see Supplementary
Table [158]S9). The top position of the ranking reports a drug listed
in clinical trial on [159]www.trials.gov for COVID-19: sirolimus.
Within the first twenty positions further 4 drugs are in clinical
trials for COVID-19: decitabine, tamoxifen, simvastatin, and imatinib.
Among the top ten positions in the ranking we find drugs with known
effects in animal models or cell lines or proposed for further scrutiny
in in silico screenings: lapatinib^[160]45, dasatinib^[161]46,
mitoxantrone^[162]47, everolimus^[163]48, sunitinib^[164]49.
For the remaining drugs in the top 10 positions in both lists, we
determine the main mechanism of action (i.e. we determine the main
pathways affected by the drug having the desired pharmacological
effect) and we then look for hints in the literature of relevance of
these pathways for COVID-19. In the first list, we find
methylergometrine, which is a partial agonist and antagonist of
serotonin, dopamine, and
[MATH: α :MATH]
-adrenergic receptors. Costa et al.^[165]50 conjectured that serotonin
inhibitors could have a neuroprotective effect in patients affected by
COVID-19. In the second list, we find erlotinib, stuent/sunitinib,
azacitidine, and sorafenib. Erlotinib is an epidermal growth factor
receptor (EGFR) inhibitor in use for the treatment of non-small cell
lung cancer (NSCLC) and pancreatic cancer. Epidermal growth factor
receptor (EGFR) signaling is known to be involved in the progress of
viral infections^[166]51, including those caused by SARS-CoV^[167]52
and SARS-CoV-2^[168]53, thus it is a plausible target pathway to
consider. Sunitinib (Sutent) multi-targeted receptor tyrosine kinase
(RTK) inhibitor used in the treatment of renal cell carcinoma (RCC) and
imatinib-resistant gastrointestinal stromal tumor (GIST). Weidberg et
al.^[169]46 recently advocated the repurposing of kinase inhibitors for
the treatment of COVID-19. Azacitidine has been shown effective against
human immunodeficiency virus (HIV) in vitro^[170]54 and human
T-lymphotropic virus (HTLV)^[171]55. Its action is mainly at an
epigenetic level, inhibiting DNA methyltransferase, causing
hypomethylation of DNA. The role of epigenetics in COVID-19 is a
growing area of research with large potential for advances in drug
target identification^[172]56. Sorafenib is a kinase inhibitor drug
approved for the treatment of advanced primary renal cell carcinoma,
advanced primary liver cancer, and forms of resistant advanced thyroid
carcinoma. Sorafenib is a protein kinase inhibitor with activity
against the Raf/Mek/Erk pathway^[173]57. Several authors have explored
the role of MAP Kinase signaling pathway in the coronavirus family at
large and for SARS-CoV-2 in particular^[174]58. In Supplementary
File [175]S8 and Fig. [176]7 we report performance measures for the
proposed COVID-19 drug repurposing rankings from Taguchi et al.^[177]6,
Mousavi et al.^[178]7, Gysi et al.^[179]8, and Zhou et al.^[180]9.
For Taguchi et al. and Mousavi et al. the drug data sets used are
overlapping with ours thus it is possible to compute the significance
of the performance measures on a common basis. Interestingly as shown
in Supplementary File [181]S8 no such rankings attain the significance
threshold p-value
[MATH: ≤ :MATH]
0.05 in any configuration. For Gysi et al. and Zhou et al., the drug
data sets used are different from the ones we use, and the two methods
are not based on gene expression data, and it is not possible to
determine with confidence the initial pool of drug candidates. For this
reason, we do not compute the normalized significance, and we report
directly the absolute performance measures. The results of Gysi et al.
are in absolute value superior to those of Zhou et al. and
qualitatively comparable to those obtained by DrugMerge on GEO data for
precision@20, and on all four Drug datasets for RHR. In particular,
Gysi et al.^[182]8 use data on interactions between viral and human
proteins in Gordon et al.^[183]59 to seed their active subnetwork and
finish their listing with the help of expert advice, which we do not
use. Gysi et al. also have a notion of ‘positive’ drug action akin to
the one used in our paper.
Discussion
As drug repositioning is a vast subject, with implications from several
areas of biology and medicine, we comment on the relationship of
DrugMerge with several issues arising in the Drug Repositioning
literature. Each issue is introduced by a heading.
Three strategies
We can identify three main strategies in drug repositioning (DR): the
first is disease-oriented DR (i.e. find repurposable drugs for a
specified disease), the second is drug-oriented DR (finding a disease
for a specified drug), the third is generic DR (find any novel
drug-disease pair). The approaches may vary significantly across the
three types, and also the validation methodology is strongly influenced
by the objective of the strategy. In this study we tackle
disease-oriented DR, however, we should point out that the base
methodology can in principle be adapted easily to the other two
strategies, provided sufficient high-quality data is available.
DrugMerge in a nutshell
DrugMerge is a proof-of-concept exploring a novel use of disease active
subnetworks in the context of drug repositioning aimed at a specific
disease. In particular, DrugMerge uses multiple Disease Active
subnetworks (DAS) algorithms and a state-of-the-art rank merging
procedure in conjunction with two scoring functions. The first scoring
function is directly the p-value of the enrichment of a drug DE genes
within the DAS. The second scoring function, called ‘positivity score’,
measures the antagonistic activity of the drug onto the disease, as
mediated by a covering of the DAS as a collection of dense ego-networks
computed by two methods: Core&Peel and ModuleDiscoverer. Matching the
resulting rankings with the golden standard of a disease indicates the
configuration attaining the best results, and thus those likely to
include good candidates for drug repositioning in top positions (see
“[184]Methods” section for more details).
Performance on real data
The tests involving four benchmark diseases with drugs in clinical use
did confirm the superior performance of the proposed method over the
current state of the art (e.g. CMAP) in placing such clinically
approved drugs in top-ranking positions. In this case, the evaluation
criterion is very strict, however we were able to guess correctly the
efficacy and safety of drugs administered to patients basing our
analysis only on transcriptomic data from in vitro cell lines and
tissues. Interestingly, while efficacy is in principle detectable by
examining specific types of cells or tissues primarily affected by a
disease, safety is a much broader concept involving all human organs
and tissues, thus harder to pinpoint with localized experiments either
in vitro or in silico.
Tests on the data from the COVID-19 disease also show, according to our
validation criterion, a performance superior to some of the approaches
proposed in the literature, while for some others we notice a broadly
equivalent outcome, even when using different approaches and different
input data. Results on COVID-19 should be taken with caution, however,
as our validation methodology implies that the proposed ranking is able
to mimic the best judgment of human experts, which may be based on
their prior knowledge and expertise, data available to them, and their
assessment of the odds, leading to a shortlisting of such drugs in
clinical trials. Thus our validation methodology might indeed be
failing us in cases where the disease to be tackled has key features
that escape our present general understanding. What can be gained in
our exercise is a way to make drug shortlisting for in vitro further
assessment (and eventual accession to clinical trial) more systematic,
while also using relatively cost effective data sources.
Use of trascriptomic data
Besides the global gene co-expression network, which we can assume as
background knowledge, our method relies on rather standard and
relatively inexpensive RNA-Seq and microarray transcriptomic
measurements of differentially expressed genes in infected
cells/tissues. Systematization and finding alternatives to pure
serendipity (discovery by chance) are key aspects of all in silico
based drug repositioning efforts. In the “[185]Related work” section on
Supplementary Material, we discuss other related works on drug
repositioning. One bonus of this parsimonious data requirement is that
it becomes easier to track the evolution of viral Mechanism of Action
and disease etiology as the viral agent mutates, thus in principle the
drug ranking, and thus follow up decisions, may change rapidly when
transcriptomic data on new viral strains of COVID-19 become available.
Other coronaviruses
SARS-CoV-2 is a member of the coronavirus family, responsible also for
the SARS and MERS epidemics. Many studies, in particular at the end of
2019 and beginning of 2020, make up for the paucity of data on
SARS-CoV-2 by making use of genomic, transcriptomic, proteomic, and
pharmacological data relative to SARS-CoV and MERS-CoV and other
members of the coronavirus family, including those known to be hosted
by other animal species.
In our study, we chose not to use data from viruses different from
SARS-CoV-2 as primary data. This choice comes from two orders of
considerations. The first is that data from close relatives of the
pathogen may not always be available, and there is thus a general long
term advantage attained by methods that rely just on relatively small
and focused data. The second issue is that we focus not on the pathogen
in itself but on the human transcriptional response to it, which is
known to be rather different across SARS-CoV-2, SARS-CoV, and MERS
epidemics. In this line of reasoning, it makes more sense to highlight
what makes human transcriptional response to SARS-CoV-2 different from
that induced by other coronaviruses, even when the causing pathogens
are phylogenetically very close.
Current limitations of DrugMerge
We use the list merging procedure to merge lists obtained by several
algorithms and several DE gene data sets from different cell
lines/tissues but always referred to a single drug data sets, and a
single disease. Different drug data sets are often obtained using
different initial candidate drugs and use different technologies.
Moreover sometimes, due to the different technologies and other
experimental settings, even when we restrict the analysis to common
drugs, replicability across drug data sets may be low (see
e.g.^[186]60, [187]61 reporting low replicability across CMAP and L1000
data using the CMAP algorithm). Moreover the performance measures we
use measure the quality of the whole ranking, and the ranking itself
gives a way to compare drugs listed by position. However, based on
these measures we refrain from comparing (and thus give a relative
order) to two drugs in different rankings obtained with different
perturbation drug data sets. Such a global ranking would need further
insight and technological bias correction, which we leave for future
research.
Dose dependency
Drug datasets report often measurements for a range of different
dosages taken at a range of different time points. For the p-valued
ranking, we use the dosage/time point giving the lowest p-value. For
rankings based on positivity score, we take the average positivity
score. These choices are justified for an initial assessment of drug
potential to be effective against a disease. Dosage and time points
could be taken into account together with available dose-dependent
toxicity data in a second level of analysis when additional data is
collected in in vitro experiments.
Tissue specificity
For this study, we use all available transcriptomic drug and disease
data regardless of the cell lines/tissues/species for which the
transcriptomic measurements are taken. This choice increases the
robustness of the results in general, however, it might miss some more
subtle tissue-specific phenomena.
On the one hand, it is known that the SARS-CoV-2 virus infects several
human organs including lungs, liver, kidneys, and gut^[188]62, and the
cells transcriptional response may in principle be diverse in each
tissue. However, Dudley et al.^[189]63 analyzing a large repository of
disease-related gene differential expression data conclude that
“molecular signature of disease across tissues is overall more
prominent than the signature of tissue expression across diseases”. Our
analysis of active subnetworks extracted from three COVID-19 data sets
(BALF, PBMC, cells) also supports for COVID-19 a general high
similarity of gene expression across the sample types. Differences in
the extracted active networks are mainly due to the algorithms used.
Also, drugs may act differently in different cell types. For example,
chloroquine has been found to be ineffective in infected human lung
cells^[190]64, while it was found effective in infected Vero cells (a
cell line isolated from kidney epithelial cells extracted from an
African green monkey)^[191]65. Similar tissue-specific differences in
drug sensitivity are noticed by Dittmar et al.^[192]66. In principle,
if a variety of options are available, one may opt for including only
the data drawn from biological samples more similar to the actual
tissues affected by the disease in patients.
In our study, we use a large variety of cellular types for drug
perturbation analysis. Drug matrix has drug perturbation signatures
available on liver, heart, kidney, thigh muscle, and primary
hepatocytes (from Rattus norvegicus). GEO has drug perturbation
signatures for several tissues and species including homo sapiens, mus
musculus and Rattus norvegicus. CMAP measures drug perturbations in
breast cancer epithelial cell line MCF7 (MCF7/ssMCF7), in prostate
cancer epithelial cell line PC3, and the nonepithelial lines HL60
(leukemia) and SKMEL5 (melanoma). L1000 collects drug perturbation data
from 98 different cellular contexts including human primary cell lines
and human cancer cell lines.
For COVID-19 we report two drug rankings attaining top performance, one
is obtained by merging data from all three tissues used to measure DEG,
therefore it indicates evidence for drugs that may act on a general
systemic level across different human organs. This is complemented by
the second ranking which is attained specifically for BALF measurements
(bronchoalveolar lavage fluid) and thus it may indicate drugs with a
specific action in pulmonary tissues.
Drug mechanisms of action
One advantage of our approach, in contrast with many others, is that we
do not have an initial explicit guess for a putative target protein for
the drug or a putative target pathway (or mechanism of action).
However, once relevant drugs are shortlisted we can analyze their
likely mechanism of action, either through the literature or through
the use of specialized databases.
On specific drugs in clinical trials
In this study, we have used four public Drug-related data sets (GEO,
DrugMatrix, CMAP, and L1000) representing a sufficient drug basis for
our investigation and the validation of the proposed methodology.
However, a much larger amount of useful data on drugs is proprietary
therefore it is important to devise incentives for sharing these
repositories of useful data in a global drug repositioning strategy. As
a consequence of the non-complete drug coverage of our data sets we
could not assess some drugs whose use in COVID-19 cases has been
advocated, such as remdesivir (approved by FDA in October 2020^[193]67)
or lopinavir (not approved by FDA). Cao et al.^[194]68 recently
reported the results of a randomized clinical trial involving 199
hospitalized adult patients with severe COVID-19: no benefit was
observed with lopinavir-ritonavir treatment over the standard care
cohort. Interestingly, we have data from DrugMatrix on ritonavir (not
approved by FDA), which fails the positivity score filter, thus our
results on ritonavir are consistent with the findings in Cao et al.
which are at a clinical level.
Role of comorbidities
Comorbidities in patients affected by COVID-19 are important clinical
aspects, as it is known that comorbidities are one of the strongest
risk factors for such patients, strongly affecting mortality
rates^[195]69. Comorbidities may alter drastically the effect of drugs.
The disease gene expression data (for all five diseases) we use in this
study do not come from patients with known comorbidities, therefore the
results of our study should be interpreted in such a setting. Fiscon et
al.^[196]70 try to exploit known comorbidities as a positive tool in
modeling the relevant features of COVID-19, while other authors, based
on protein target analysis, have a more skeptical view^[197]8: “In
summary, we find that the SARS-CoV-2 targets do not overlap with
disease genes associated with any major diseases, indicating that a
potential COVID-19 treatment can not be derived from the arsenal of
therapies approved for specific diseases”. The effect of concomitant
comorbidities on repurposed drugs might be better approached when
transcriptomic data from these sub-populations of COVID-19 patients
become available.
Drug combinations
In this study, we do not treat the issue of combinations of drugs. This
is a novel area of research in which network-based approaches have been
shown to give novel insight and may help in coping with the
combinatorial explosion of drug combinations. However, validated data
on drug combinations is limited thus making the task of performance
evaluation and method validation more complex w.r.t the analysis for
single drugs. We plan to incorporate models of multi-drug/disease
interactions in our setting as future research^[198]71.
Toxicity and adverse events
Drugs can be toxic or cause adverse events. Toxicity can be detected
via ad-hoc in vitro ed in vivo experiments or it can be predicted via
in silico models^[199]72. Any of these techniques should be applied
downstream to the use of DrugMerge for extra safety. DrugMerge uses two
forms of filtering to handle adverse effects and toxicity. The first
mechanism is the inclusion of drugs having a contrary effect to the
disease as measured by the positivity score (which in turn is similar
to a mechanism also used by Gysi et al.). The second filtering is via
the possibility to limit the analysis to FDA-approved drugs, for which
toxicity and adverse effects are known and have been considered to be
outweighed by the benefits in certain conditions. This mild form of
filtering is sufficient to increase the performance in our tests and
thus mimic human expert judgment, which includes consideration of
toxicity and adverse effects. However, toxicity should be always be
reconsidered ex-post to finalize any shortlisting decision.
Role of the disease causes
Since we base our approach on the transcriptomic responses of human
cells/tissues to drugs and diseases, the cause of the disease (viral,
bacterial, genetic, metabolic, etc.) is not critical for the
application of the methodology. Indeed in our study, we applied it to a
variety of cases. Asthma is an inflammatory disease of the airways of
the lungs having several triggering causes, including exposure to air
pollution and allergens. Rheumatoid arthritis (RA) is a long-term
autoimmune disorder that primarily affects joints. Cancer is generally
associated with alterations of the genetic code in somatic human cells.
Finally, COVID-19 is of viral origin. The advantage of generality is
balanced by the possibility that a high-ranking drug is effective
against symptoms or side-effects of the diseases, rather than acting on
its main mechanism of action. In one of the drug rankings for COVID-19,
we find within the top 20 positions prochlorperazine which has both
antiviral activity^[200]41 and is also a broad-spectrum antiemetic
contrasting COVID-19 symptoms like nausea and vomiting^[201]42. At this
level of analysis, DrugMerge does not yet distinguish the different
actions a drug might have thus further investigations are needed to
untangle complex drug actions.
Conclusions
In this conclusive section we summarize the features of DrugMerge and
we comment on possible extensions and applications, as well as
additional benefits of drug prioritization in general.
In this study, we describe DrugMerge, a methodology for ranking
repurposable drugs where we rank drugs by their ability to affect
collections of disease active subnetworks (computed by an ensemble of
state-of-the-art algorithms) and contrast the perturbation induced by
the disease in such sub-networks. DrugMerge should be considered as a
proof-of-concept and its predictions should be considered as
indications for shortlisting drugs for further in vitro analysis rather
than absolute predictions. In particular, aspects like dosage, tissue
specificity, and a more refined analysis of possible adverse side
effects will need to be tackled further in developing DrugMerge towards
a full-fledged product.
Network drug-disease model
The rationale behind our approach is a model in which drugs and
diseases interact indirectly by their mutual action on the sub-networks
rather than directly by affecting the same target proteins. Moreover,
we are able to leverage an ensemble of different algorithms for
computing active subnetworks, and then merge the resulting rankings,
thus increasing the robustness of the methodology. Results of four
benchmark diseases are encouraging since we could find in all four
cases drugs in clinical use (thus both safe and effective) in first
ranking positions, and several other drugs in clinical use in high
ranking positions.
Benchmarking with prior knowledge
These results imply that DrugMerge when provided with input data from
transcriptional assays, has the potential for coming close to expert
human judgment, and guess drugs in actual clinical usage. For COVID-19
we measure how close we come to guessing drugs that have been
shortlisted for clinical trials according to the FDA records. For most
drugs in high ranking positions but not on clinical trials we have
found several supporting evidence in the literature or by direct
assessment of their likely Mechanism of Action, that makes their
indication by DrugMerge biologically plausible.
Other benefits of drug prioritization
Prioritizing drugs for in vitro testing and eventual accession to
clinical trials is valuable for decision makers in a pandemic
situation. Lack of prioritization in drug repurposing efforts has led
to some drugs being over-researched, early in the pandemic, even based
on dubious evidence (e.g. hydroxycholoroquine in COVID-19). Moreover
having many small clinical trials has the effect of reducing the pool
of eligible patients available for larger and more promising trials (
Generic Drug Repurposing for COVID-19 and Beyond. Susan Athey, Rena
Conti, Richard Frank, and Jonathan Gruber Institute for Health System
Innovation & Policy. July2020.
[202]https://www.gsb.stanford.edu/faculty-research/publications/generic
-drug-repurposing-covid-19-beyond).
Sub-populations of patients
A tool such as DrugMerge aiming at drug ranking might be used in
conjunction with an activity of data collection that targets specific
sub-populations of patients. For example, most trials to date for
repurposed drugs in COVID-19 have focused on hospitalized patients,
even though prevention might be more valuable and effective if
initiated earlier in disease progression. Also transcriptomic data from
patients affected by frequent comorbidities might help towards the
study of drugs specifically selected for sub-populations at higher risk
of mortality.
Preclinical nature of this research
This paper focuses on the preclinical considerations that may indicate
a drug to be shortlisted for further preclinical in vitro and in vivo
testing, leading to eventual clinical trial. Naturally, all evidence,
which may conceivably be noisy and often contradictory, should be
gathered and considered before pushing a drug further down the
pipeline.
Methods
Drug ranking based on p-value
We use the Core&Peel technique described in Lucchetta et al.^[203]4 and
4 other methods (ClustEx^[204]73, ModuleDiscoverer^[205]74,
Degas^[206]75, and KeyPathwayMiner^[207]76) to produce active
subnetworks for a given disease D, starting from a list of
differentially expressed genes in samples affected by the disease
w.r.t. healthy controls DEG(D) and a gene co-expression network. In
Lucchetta et al., we employed the gene co-expression network made
available by the DREAM challenge^[208]77, and DEG for seven different
datasets: two inflammatory disease microarray experiments (asthma and
rheumatoid arthritis), two cancer RNA-Seq studies (prostate and
colorectal cancer), and three COVID-19 datasets called BALF
(bronchoalveolar lavage fluid RNA-Seq), PBMC (infected patient
peripheral blood mononuclear cells) and COVID19 cells (human
adenocarcinomic alveolar basal epithelial A549 cells). In Lucchetta et
al., we compared the performance of Core&Peel and the other four
algorithms in detecting active subnetworks, resulting in that Core&Peel
is a competitive method in that purpose. In this work, we use the five
active subnetworks detected by all five methods as inputs of DrugMerge.
Let AN be any such active network. For a given Drug perturbation
datasets (for us: GEO, DrugMatrix, CMAP, and L1000; see description in
“[209]Drug perturbation databases” section of Supplementary Material)
using the enrichR R/Bioconductor package, we obtain a listing of drugs
enriched in AN ranked by their adjusted p-value in reverse order (from
the smallest to the largest). We cut this list at the adjusted p-value
of 0.05, retaining only the significantly enriched drugs.
Filtering of drug ranking
We have noticed that a straightforward application of the p-value based
drug ranking includes in top ranking positions some molecules and drugs
that are known to be either generally toxic or to induce diseases in
animal models. For this reason, we introduce the option of applying two
filters to a ranked list of drugs. The first filter is used to retain
only drugs that are approved by the FDA (we use a list of drugs
downloaded from [210]https://www.accessdata.fda.gov on September 9th
2020, with 6315 entries). For FDA-approved drugs usually adverse
effects and toxicity levels are known and probably acceptable in most
cases. The second filter retains drugs that by the directionality
analysis appear to have a ’positive’ action on the disease, i.e. genes
are up/down regulated antagonistically to the disease effect. This list
of positive drugs can be obtained with the method described in the next
section.
Drug ranking based on drug positivity scores
For this analysis, we exploit the fact that two of the methods
(Core&Peel and ModuleDiscoverer) we employ to build active subnetworks
produce the active sub-network as a union of dense sub-graphs
(ego-networks) of a large co-expression network (in detail this
co-expression network has been produced for the DREAM challenge on
disease module detection^[211]77). Since such network is built on the
correlation of the expression-vectors over thousands of conditions, a
high correlation vector implies that a pair of genes are generally both
up or both down regulated at any given time. In a dense ego-network,
such tendency is coherently common to all genes in the ego-network. We
exploit this phenomenon as follows. For an ego-network that is
significantly enriched in genes down-regulated by the disease, we
subtract the number of up-regulated genes by the drug from the number
of down-regulated genes by the drug. If this number is positive we say
that the drug has a positive effect of the disease (for this
ego-network). We do a symmetric calculation of the case of ego-networks
that are significantly enriched in genes up-regulated by the disease.
Summing these effects (with their sign) over all the ego-networks
comprising the active subnetwork, we can compute the directionality of
action of the drug and a magnitude (directionality score). We retain
the drugs that have a positive directionality score and we rank them by
the magnitude of the score. Note that this scheme relies just on
counting the number of up/down regulated genes, not on the magnitude of
the expression, as long as it is significant. In a variant of this
scheme, we can use the size of the ego-networks as a weighting factor.
This method is justified by an extension to the ego-networks of the
phenomenon Chen et al.^[212]78 have noticed that in preclinical models
of breast, liver, and colon cancers, reversal of gene expression
between drug and disease correlates well with drug efficacy.
Furthermore, Chen et al. validated the efficacy of drugs in xenograft
models of the diseases.
Merging of drug lists
Given two or more ranked lists of drugs, we wish to aggregate them into
a single list that takes into account the ranking of any drug in each
of the lists where it occurs. Following Gysi et al.^[213]8 we use the
C-rank routine in Zitnik et al.^[214]79 to perform rank aggregation.
C-rank is defined in Zitnik et al. to work with any custom-defined
ranking of items (thus we do not make use of the graph-based community
ranking scores for which c-rank was originally designed, instead we use
it here as a generic rank aggregation method). Interestingly this rank
aggregation method is oblivious of the origin of the ranking and we can
thus easily aggregate lists obtained with different active subnetwork
algorithms on different disease-DEG sets. The only caveat is that the
lists supplied to c-rank must be of equal length, thus the case of
missing drugs must be handled, by fitting the drugs missing from a list
at its bottom with a scaled mean score value obtained from the other
lists.
Performance evaluation
The performance evaluation aims at assessing how well a predicted
ranked list gives priorities in accordance to human expert judgment as
codified by disease-treatment pairs extracted from the Therapeutic
Target Database^[215]13 for the four benchmark diseases. Since for
COVID-19 currently only one drug has gained acceptance as an approved
clinical treatment, we resort to the list of drugs for the treatment of
COVID-19 undergoing clinical trial as listed in the portal
[216]https://www.clinicaltrials.gov/ (at Sept 5th, 2020). We use two
main measures. The first is precision at 20 (precision@20), that is the
number of drugs in the first 20 positions of the proposed list of
repurposable drugs that are hits in the TTD-derived listing, or, for
COVID-19, in the clinicaltrials.gov listing. The second is the
Reciprocal Hit-Rank (RHR)^[217]80. For a list L of predictions length
N, let the hits be in positions
[MATH: p1 :MATH]
,
[MATH: p2 :MATH]
, ... ,
[MATH: ph :MATH]
. We define :
[MATH: RHR(L)=∑i=1h1
mn>pi. :MATH]
Note that this formula does not imply a fixed length for the lists
being compared, and it is fair in comparing both long and short lists.
A longer list may have more hits by chance, however it will have
positive but diminishing returns for hits in its tail.
Statistical significance
Let D be a set of drugs and
[MATH: H⊂D :MATH]
a golden standard for a disease (e.g. the drugs in D that are in
clinical use for the disease). We generate n random permutations
uniformly at random
[MATH: ri(D) :MATH]
, for
[MATH:
i=1,…,n
:MATH]
and we take, for a performance measure
[MATH: μH() :MATH]
depending on H, the mean m and standard deviation sd of the sequence
[MATH: M=[μH(ri(D))fori=1,…
,n] :MATH]
. For a given drug ranking R(D), its zscore for
[MATH: μH() :MATH]
is its deviation from the mean of the above described random
distribution normalized by the standard deviation of the random
distribution:
[MATH: zscor
e(R(D))=μH(R(D))-m(M)sd(M). :MATH]
Assuming that the random distribution of the values of
[MATH: μH(ri(D)) :MATH]
is normal, we obtain the (two sided) p-value by plugging the z-score
value into the complementary cumulative distribution function (ccdf) of
the normal distribution from the package scipy.stats of SciPy.org.
The number n of random permutations is determined as follows. Starting
with
[MATH:
n=100,000
:MATH]
, we increment n in steps of 10, 000, stopping when the value of both m
and sd are stable, with a displacement from the previous iteration less
than 0.01.
Supplementary information
[218]Supplementary Information.^ (1.7MB, pdf)
[219]Supplementary Table S1.^ (13.3KB, xlsx)
[220]Supplementary Table S2.^ (12.7KB, xlsx)
[221]Supplementary Table S3.^ (12.3KB, xlsx)
[222]Supplementary Table S4.^ (12.4KB, xlsx)
[223]Supplementary Table S5.^ (16.3KB, xlsx)
[224]Supplementary Table S6.^ (13.2KB, xlsx)
[225]Supplementary Table S7.^ (38.6KB, xlsx)
[226]Supplementary Table S8.^ (13.1KB, xlsx)
[227]Supplementary Table S9.^ (10.9KB, xlsx)
Acknowledgements