Abstract

Background

   The high cost and the long time required to bring drugs into commerce
   is driving efforts to repurpose FDA approved drugs—to find new uses for
   which they weren’t intended, and to thereby reduce the overall cost of
   commercialization, and shorten the lag between drug discovery and
   availability. We report on the development, testing and application of
   a promising new approach to repositioning.

Methods

   Our approach is based on mining a human functional linkage network for
   inversely correlated modules of drug and disease gene targets. The
   method takes account of multiple information sources, including gene
   mutation, gene expression, and functional connectivity and proximity of
   within module genes.

Results

   The method was used to identify candidates for treating breast and
   prostate cancer. We found that (i) the recall rate for FDA approved
   drugs for breast (prostate) cancer is 20/20 (10/11), while the rates
   for drugs in clinical trials were 131/154 and 82/106; (ii) the ROC/AUC
   performance substantially exceeds that of comparable methods; (iii)
   preliminary in vitro studies indicate that 5/5 candidates have
   therapeutic indices superior to that of Doxorubicin in MCF7 and SUM149
   cancer cell lines. We briefly discuss the biological plausibility of
   the candidates at a molecular level in the context of the biological
   processes that they mediate.

Conclusions

   Our method appears to offer promise for the identification of
   multi-targeted drug candidates that can correct aberrant cellular
   functions. In particular the computational performance exceeded that of
   other CMap-based methods, and in vitro experiments indicate that 5/5
   candidates have therapeutic indices superior to that of Doxorubicin in
   MCF7 and SUM149 cancer cell lines. The approach has the potential to
   provide a more efficient drug discovery pipeline.

Electronic supplementary material

   The online version of this article (doi:10.1186/s12920-016-0212-7)
   contains supplementary material, which is available to authorized
   users.

   Keywords: Computational drug repositioning, Drug screening, Cancer
   treatment

Background

   The high cost and the long time required to bring drugs into commerce
   [[31]1–[32]3] is driving efforts to repurpose FDA approved drugs—to
   find new uses for which they weren’t intended, and to thereby reduce
   the overall cost of commercialization, and shorten the lag between drug
   discovery and availability [[33]4]. Among the successes of this
   approach are sildenafil, originally developed as a cardiovascular drug
   [[34]5] and repositioned to treat erectile dysfunction; and zidovudine
   (AZT), originally developed as an anticancer drug [[35]6], and
   repositioned for the treatment of HIV. These discoveries, though
   serendipitous, motivated more systematic approaches which might amplify
   the number of discoveries many-fold.

   Systematic approaches generally begin with some form of computer based
   screening to generate large numbers of plausible candidates
   [[36]7–[37]11]. Many current computational strategies exploit shared
   similarities among drugs or diseases and infer similar therapeutic
   applications or drug selections. Drug similarities include chemical
   structures [[38]12–[39]14], drug-induced phenotypic side effects
   [[40]12, [41]15], molecular activities [[42]16]. Disease similarities
   include phenotypic similarity constructed by identifying similarity
   between MeSH terms [[43]17] from OMIM database [[44]18]; semantic
   phenotypic similarity [[45]12]. The efficacy of the candidates
   generated by such approaches would not exceed that of existing drugs
   since the disease biomarkers remain the same.

   A more general approach searches for disease (Gene Expression Omnibus,
   GEO) and drug (CMap) induced transcriptional profiles that are
   inversely correlated [[46]19–[47]23]. Strong anti correlation between
   the gene expression profiles of an FDA approved drug and those of a
   disease for which it was not intended identifies the drug as a
   candidate for repositioning. This procedure, though useful, is
   relatively agnostic with respect to the functional relations between
   profiles (the ordered lists of perturbed genes). A drug identified this
   way is limited in that it is not informed by cellular function, but
   simply targets a group of generally non-interacting differentially
   expressed genes.

   The idea underlying our method, which we refer to as the method of
   functional modules (MFM), is to impose the condition that candidates
   must affect the same cellular functions in opposite ways, and to use
   information about DNA as well as RNA. In particular we search for drugs
   that strongly perturb sets of genes having the following properties:
   (i) they share a strong functional relationship (ii) they are mutated
   in the disease state (iii) their expression is highly perturbed by the
   disease (iv) they are within significantly perturbed pathways of
   diseases. Functional association is based on position in a human
   functional linkage network (FLN) [[48]24]—an evidence weighted network
   that provides a quantitative measure of the degree of functional
   association among any set of human genes. This means the method
   integrates multiple sources of evidence such as protein-protein
   interactions and is not limited to catalogued functional associations,
   e.g. KEGG, but uses a general approach to find functional modules.

   We used genome-wide transcriptional data for more than 3500 compounds
   provided by LINCS [[49]25] and identified 519 (410) repositioned drug
   candidates for breast (prostate) cancer. We also compared the accuracy
   of our method with that of comparable approaches [[50]20, [51]22] (see
   [52]Results). We applied CMap datasets and ranked bioactive compounds
   using different methods, then compared the predictability of the ranked
   lists of compounds (see [53]Statistical validation). We then presented
   evidence that a set of disease mutated genes and their nearest FLN
   neighbors (mutation associated genes (MAGs), see [54]Methods) provided
   more functional insight than a set of differentially expressed genes in
   the disease.

   In addition to these computational assessments, in vitro viability
   tests confirmed that 4 our predicted drug candidates were more
   efficacious than Doxorubicin--an FDA-approved drug for breast
   cancer--against MCF7 and SUM149 cell lines.

Methods

   The method built non-incrementally on the work of Shigemizu et al.
   [[55]22]. In particular: (i) we took account of information on
   mutations (DNA) as opposed to just expression (RNA); and (ii) we took
   account of functional information by using a so-called FLN [[56]24], as
   explained below. Specifically, we annotated mutated genes on the FLN
   [[57]24], and identified and eliminated all genes that 1) are not
   within a specified distance of a mutated gene (the functional module
   constraint); 2) have a differential expression below some threshold
   (the disease condition constraint); 3) are not in pathways that
   distinguish the cancer/normal phenotype.

   An FLN [[58]24] is represented as a network of nodes (genes/proteins)
   connected by links whose weights are proportional to the likelihood
   that the connected nodes share common biological functions. We set a
   threshold on linkage weight so as to exclude approximately 95 % of the
   neighbors of any given node, leaving clusters of functionally related
   aberrant genes. We carried out the procedure twice, once starting with
   mutated genes and their first nearest neighbors, and then with mutated
   genes and their first and second nearest neighbors.

   We considered each drug in turn and identified two FLN landscapes: one
   defined by genes that are up-regulated by the disease and down
   regulated by the drugs (Up regulated Cancer gene, Down regulated
   Bioactive target gene--UCDB) and, the other defined by genes that are
   down regulated by disease and up regulated by the drug (DCUB). Each
   landscape was thus an interconnected set of drug and disease perturbed
   genes. Finally we assigned a score, mutual predictability (discussed
   below), which measured the connectivity within each landscape, which is
   roughly speaking the extent to which the drug and disease genes sets
   are correlated. The greater the relationship, the higher the likelihood
   that the drug is a viable candidate for repositioning. The methodology
   is summarized in Fig. [59]1. The specifics follow.

Fig. 1.

   Fig. 1
   [60]Open in a new tab

   Analytic workflow. (1) After mapping mutated genes to the FLN, identify
   the functional neighbors that are up or down regulated (DEG:
   differentially expressed genes) and within significantly enriched
   disease pathways (FDR < 0.05). (2) Map the genes that are down or up
   regulated by drug candidates to the FLN (3) Compute the MP score; i.e.
   the significance of the functional overlap between the drug and disease
   perturbed genes (see text). (4) Rank the compounds according to the MP
   score. (5) Compute the sensitivity and specificity of the ranked list
   of compounds. (6) Repeat the process with different groups of MAG and
   DRG (Drug Response Gene) generated by looping over the parameters (m &
   k). (7) Choose the parameter set that has highest sensitivity and
   specificity. (8) The drug candidates are chosen form the ranked list
   generated by the best parameter set. (9) The top ranked drug candidates
   are chosen for in vitro experimental validation

Data sources

   Well-documented mutated genes were downloaded from the Online Mendelian
   Inheritance in Man (OMIM) ([61]http://www.ncbi.nlm.nih.gov/omim)
   [[62]18]. 40 breast cancer and prostate cancer and 69 leukemia
   well-documented genes were obtained from OMIM (see Additional file
   [63]1). FLN was downloaded from [64]http://visant.bu.edu/misi/fln/.

Transcript levels

   The differentially expressed genes were obtained from the Illumina
   HiSeq 2000 RNA Sequencing platform for 108 breast and 51 prostate
   paired tumor and normal samples, downloaded from the TCGA portal
   ([65]http://cancergenome.nih.gov/). Differential expression data in
   response to leukemia ([66]GSE1159, [67]GSE9476) were obtained from the
   National Center for Biotechnology Information (NCBI) Gene Expression
   Omnibus (GEO) ([68]http://www.ncbi.nlm.nih.gov/geo/). The ranked list
   of differentially expressed genes was generated using edgeR [[69]26]
   and a t-statistic.

   Ranked list of differentially expressed genes in response to compounds
   treated in breast cancer (MCF7 cell line), myelogenous leukemia (HL60
   cell line), and prostate cancer (PC3 cell line) were obtained from
   connectivity map (CMap) build 02 [[70]20],
   [71]https://www.broadinstitute.org/cmap) and LINCS (level 4)
   ([72]http://www.lincscloud.org/) [[73]20].

Mutation-associated genes (MAG)

   The procedure maps to the FLN, known mutated drivers for the disease of
   interest, and their first nearest neighbors. It then sets the linkage
   threshold to 0.2, eliminating 95 % of the links and leaving gene
   clusters each of which is relatively homogeneous functionally. The
   remaining genes are further selected by 1) setting a threshold on
   transcription level; 2) filtering out the genes that are not in
   pathways that distinguish phenotype (i.e. cancer from normal--see
   [74]Pathway enrichment analysis). As indicated below we were left with
   relatively small gene sets at the end of the process. In order to
   identify well-correlated drug-disease gene sets, the definitions of up-
   and down-regulated genes were not tightly constrained. In particular,
   we looped through m sets of various sizes, ranging from the 1000 most
   up-regulated genes, to the top half of the total number of genes in our
   universe--which depends on the number of probes on the chip--in
   increments of 2,000. A similar procedure was followed to obtain
   networks of the most down-regulated genes.

   Networks were obtained for each member of our universe of bioactive
   compounds. A drug was ranked in accord with the intersection between
   its functional network and the disease functional network, as described
   below. The procedure was then repeated, by starting with first and
   second nearest neighbors. The final number of MAG ranged from 75 to
   1074 for breast cancer; 15 to 460 for prostate cancer; and 46 to 772
   for leukemia.

Pathway enrichment analysis

   We focused on the enrichment of pathways abnormally perturbed in the
   disease state compared to the normal state. PWEA [[75]27]
   ([76]http://zlab.bu.edu/PWEA/download.php) was used to identify
   significantly perturbed pathways in the gene expression profiles of
   breast cancer, leukemia and prostate cancer described above.

Drug response genes (DRG)

   The top (up-regulated) and bottom (down-regulated) k most
   differentially expressed genes in response to bioactive compounds in
   disease cell lines were selected as DRG. We restricted the number of up
   (down)-regulated DRG to be within +/− 500 genes of the matched down
   (up)-regulated MAG. For example, if 500 up-regulated MAG are in an FLN
   cluster, k would from a low of 100 to a high of 1000 in increments of
   100.

Library of Integrated Cellular Signatures (LINCS)

   LINCS profiles are generated using 3,678 and 4,228 bioactive compounds
   for breast cancer and prostate cancer, respectively, each compound
   typically applied at 6 different concentrations (0.0003-177 μM) and 2
   time points (6 and 24 h). We retained the expression profile of a
   compound that produced maximal mutual predictability score before
   ranking the compounds. Twenty of the 3678 (11 of 4228) were FDA
   approved drugs for breast (prostate) cancer.

Connectivity map

   We used CMap datasets for comparing the performance between our method
   with others. CMap profiles are generated using 1251, 1079 and 1182
   bioactive compounds for breast cancer, leukemia and prostate cancer,
   respectively. Eight of the 1251, 6 of 1079, and 7 of 1182 were FDA
   approved drugs for breast cancer, leukemia and prostate cancer
   respectively.

Drug and clinical trial information retrieval

   We collected data from DrugBank ([77]http://www.drugbank.ca/). FDA
   approved drugs from FDA service: Drugs@FDA. Clinical trial data were
   downloaded from [78]https://clinicaltrials.gov.

Mutual predictability (MP)

   We used mutual predictability [[79]4] to score the correlation between
   mutation associated genes (MAG) and drug response genes (DRG). In
   essence, mutual predictability is a measure of the degree to which MAG
   can be used as seed genes to predict DRG (predictability M-D), and vice
   versa (predictability D-M). The mutual predictability of the two sets
   measures the extent to which genes in one set can be used to identify
   (predict) genes in the other [[80]24]. A disease drug pair with high
   mutual predictability has a strong functional relation; the higher the
   score, the stronger the relation.

   To quantify the predictability M-D, we use MAG as seeds, and score and
   rank each gene connected to a seed using the disease mutual
   predictability score S[i]:
   [MATH: <msub><mi>S</mi><mi>i</mi></msub><mo>=</mo><mstyle
   displaystyle="true"><munder><mo
   stretchy="true">∑</mo><mrow><mi>j</mi><mo>∈</mo><mi
   mathvariant="italic">seeds</mi></mrow></munder></mstyle><msub><mi>w</mi
   ><mrow><mi>i</mi><mi>j</mi></mrow></msub> :MATH]

   where w[ij] weights the link between gene i and seed j, and the score
   is 0 if there is no seed connection.

   We obtained the sensitivity and specify variation by using a series of
   cutoffs on the ranked list. The number of true positives is taken to be
   the number of DRG above a particular cutoff; the number of true
   negatives is the number of non-DRG below the cutoff; the number of
   false positives is the number of non-DRG above the cutoff, and the
   false negatives are the number of DRG below the cutoff. AUC scores
   range from 0 and 1, with 0.5 and 1.0 indicating random and perfect
   predictive performance, respectively.

   AUC[D-M] as a measure of predictability D-M is similarly calculated.
   The mutual predictability between MAG and DRG is then defined as the
   geometric mean of AUC[D-M] and AUC[M-D]:
   [MATH: <mi mathvariant="normal">Mutual</mi><mspace
   width="0.25em"></mspace><mi
   mathvariant="normal">Predictability</mi><mspace
   width="0.25em"></mspace><mfenced close=")" open="("><mrow><mi
   mathvariant="normal">M</mi><mi mathvariant="normal">A</mi><mi
   mathvariant="normal">G</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">and</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">D</mi><mi mathvariant="normal">R</mi><mi
   mathvariant="normal">G</mi></mrow></mfenced><mspace
   width="0.25em"></mspace><mo>=</mo><msqrt><mrow><msub><mrow><mi
   mathvariant="normal">A</mi><mi mathvariant="normal">U</mi><mi
   mathvariant="normal">C</mi></mrow><mrow><mi
   mathvariant="normal">D</mi><mo>−</mo><mi
   mathvariant="normal">M</mi></mrow></msub><mspace
   width="0.25em"></mspace><mo>×</mo><mspace
   width="0.25em"></mspace><msub><mrow><mi mathvariant="normal">A</mi><mi
   mathvariant="normal">U</mi><mi
   mathvariant="normal">C</mi></mrow><mrow><mi
   mathvariant="normal">M</mi><mo>−</mo><mi
   mathvariant="normal">D</mi></mrow></msub></mrow></msqrt> :MATH]

   Each bioactive compound is thereby ranked by its mutual predictability
   score.

   A detailed example of MP score computation is shown in Additional file
   [81]2, 2-1 and Additional file [82]3 Figure S1.

Evaluation of predictability

Statistical validation

   We determined the extent to which FDA approved cancer drugs were
   enriched in our ranked list by again calculating an AUC as indicated
   above. Briefly, focus on a position t from the top. The ratio of FDA
   approved drugs for target disease at or above position t, to total
   drugs at or above t is counted as TP; the ratio of non-FDA approved
   drugs below t to total drugs below t is TN. The running index t is
   varied to produce a ROC, and the area under the curve (AUC) is used as
   a measure of predictability. This is of course a non-normalized result,
   but as we now indicate it is used only in a relative way, to compare
   different parameter sets.

Parameter optimization

   Each set of parameters (rank cutoffs m & k for filtering MAG and
   selecting DRG) generated different ranked lists of bioactive compounds.
   We computed the AUC score using the ranked list, and chose the best set
   of parameters based on the maximum AUC score. Repositioned drug
   candidates were selected from the ranked list generated by the best
   parameter set. After optimization, the best parameters (number of MAG
   and DRG (MAG/DRG)) are 237/700 (UCDB) and 75/100 (DCUB) for breast
   cancer; and 333/100 (UCDB) and 46/100 (DCUB) for prostate cancer.

   For the ranked list, the significance of the mutual predictability
   scores for each compound was estimated by randomly selecting a set of n
   DRG, computing the mutual predictability score given the MAG, repeating
   the process 100,000 times to generate a null distribution, and then
   estimating the probability that our observation was obtained by chance.
   We computed the false discovery rate (FDR) for individual compounds by
   calculating the expected number of false positives, given the actual
   distribution of mutual predictability scores and the null distribution.

   We assessed the significance of the best AUC score by randomly
   selecting from LINCS, 20 out of 3678 drugs for breast cancer and 11 out
   of 4228 for prostate cancer as true positives. For CMap, we randomly
   selected 8 out of 1251 drugs for breast cancer; 6 out of 1079 for
   leukemia; and 7 out of 1182 for prostate cancer. We then computed the
   AUC for each parameter set, repeated the process 100,000 times and
   generated a null distribution. The p-value was used to estimate FDR for
   multiple tests.

Comparison with other methods

   We applied the methods (Lamb et al. and Shegemizu et al.) that used
   CMap data to breast cancer, leukemia and prostate cancer and compared
   them with MFM.

Lamb et al. [[83]20]

   We queried the 50 to 500 (in increments of 50) up- and down-regulated
   signature genes of breast cancer (MCF7), leukemia (HL60) and prostate
   cancer (PC3) on
   ([84]https://www.broadinstitute.org/cmap/newQuery?servletAction=querySe
   tup&queryType=quick), and obtained ranked lists of bioactive compounds.
   The disease signature genes (FDR < 0.05) were generated from the same
   expression data used for MFM, as described in [85]Transcript levels.
   The total number of compounds and the corresponding cell lines were the
   same as those were used for MFM. Then we followed the same procedure as
   that was used for MFM to assess the performance. The highest AUC score
   was selected for comparison.

Shegemizu et al. [[86]22]

   We used the same expression profiles (GDS2617, GDS2908 and GDS1439) and
   parameters (1200 and 1400 for UCDB and DCUB for breast cancer; 700 and
   800 for UCDB and DCUB for leukemia; 5200 and 4200 for UCDB and DCUB for
   prostate cancer) reported in the [[87]22] to generate ranked lists of
   compounds. Performance was assessed with the same procedure used for
   MFM.

Experimental validation

Cell cultures and reagents

   Cell lines MCF7, SUM149 and MCF10A were obtained from ATCC (American
   Type Culture Collection, Manassas, VA) and maintained as recommended.
   The growth medium was supplemented with 10 % fetal bovine serum (FBS),
   50 units/ml of penicillin and streptomycin, and incubated at 37 °C with
   5 % carbon dioxide. Dimethyl sulfoxide (DMSO), at 0.2 %, was used as
   the vehicle control.

MTT assay

   Metabolic activity of MCF7, MCF10A and SUM149 cells treated with
   vehicle (0.1 % DMSO) or repositioned drug candidates was assessed with
   the MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide)
   assay. Cells were placed in 96-well plates and treated for 24 h with
   drugs with concentrations ranging from 0–1000 μM, then assayed for
   metabolic activity. 10 μl of MTT solution (10 mg/ml in PBS) was added
   to each well and incubated for an additional 3 h. The medium was then
   replaced with 200 μl of DMSO. Absorbance was determined at 570 nm
   (experimental absorbance and 690 nm (background absorbance) by an ELISA
   plate reader. The inhibitory effect of drug candidates was expressed as
   the relative metabolic activity (% control) and calculated as shown
   below. The relative viability was calculated as relative
   viability = (experimental absorbance - background absorbance)/
   (absorbance of vehicle controls - background absorbance of vehicle
   controls) × 100 %.

Results

   We screened repositioned drug candidates by using mutual predictability
   [[88]24] to score correlation between mutation-associated genes
   up-regulated in disease samples and genes down-regulated by bioactive
   compounds (DCUB), and vice versa (UCDB). Since a high mutual
   predictability score indicates strong functional linkage between sets
   of disease and drug related genes, our hypothesis is that candidate
   drugs so identified have potential to correct the sets of disease genes
   and have therapeutic effect on the disease.

Identification of repositioned drug candidates for breast cancer and prostate
cancer using LINCS

   We performed analysis on the most updated data of gene expression
   signatures of bioactive compounds from LINCS [[89]25]. We evaluated the
   significance of mutual predictability score of each compound, and FDRs
   as explained under [90]Methods.

Statistics of significant bioactive compounds

Breast cancer

   LINCS includes breast cancer cell line expression in response to 3678
   compounds. We calculated the mutual predictability score for each of
   these, as described in Method – [91]Mutual Predictability Score. The
   gene sets associated with each cancer/compound were assigned p-values
   as described in Method – [92]Parameter optimization, to obtain ranked
   lists of 2435 DCUB compounds and 1875 UCDB compounds with FDR < 0.05
   (Table [93]1). Of these 510 were FDA approved drug candidates for
   repositioning to breast cancer. The detailed description of candidates
   is in Additional file [94]4.

Table 1.

   Breast cancer and prostate cancer repositioned drug candidates
   identified from analysis of LINCS. Complete lists of repositioned drug
   candidates for breast cancer and prostate cancer are shown in
   Additional file [95]13
   Breast Cancer Prostate Cancer
   Total compounds 3678 4228
   Compounds that are FDA drugs 632 676
   Compounds that are FDA drugs for target disease 20 11
   Compounds that are in clinical trial for target disease 154 106
   UCDB DCUB UCDB DCUB
   Compounds with FDR < 0.05 2435 1875 2500 1668
   Compounds that are clinical drugs with FDR < 0.05 (p-value) 131
   (6.2E-8) 109 (2.7E-7) 82 (4.9E-5) 67 (4.8E-7)
   FDA drugs with FDR < 0.05 427 325 456 317
   FDA drugs with FDR < 0.05 in both UCDB and DCUB 244 291
   FDA drugs for target disease with FDR < 0.05 (p-value) 20 (2.5E-4) 19
   (2.7E-5) 10 (2.6E-2) 9 (5.3E-3)
   AUC (p-value) 0.86 (<1.0E-6) 0.81 (<1.0E-6) 0.77 (9E-3) 0.83 (4.7E-5)
   Number of MAG/DRG 237/700 75/100 333/100 46/100
   [96]Open in a new tab

Prostate cancer

   LINCS includes prostate cancer cell line expression in response to 4228
   compounds. The gene sets associated with each cancer/compound were
   assigned p-values to obtain ranked lists of 2500 DCUB compounds and
   1668 UCDB compounds with FDR < 0.05 (Table [97]1). Of these 291 were
   FDA approved drug candidates for repositioning to prostate cancer
   (Additional file [98]4).

Supporting evidence

Sensitivity and specificity

   To evaluate the predictability of the ranked drug candidates, ROC
   curves were generated using 20 FDA breast cancer drugs and 11 FDA
   prostate cancer drugs as true positive. The highest AUC scores were
   0.86 (p = 1.0E-6) and 0.83 (p = 4.5E-5) for breast cancer and prostate
   cancer, respectively. We estimated the significance of the AUC scores
   as described in [99]Parameter optimization session.

Comparisons with computational drug repositioning methods

   We compared the predictability of our method with that of the
   computational drug repositioning methods, which screen drugs based on
   the anti-correlation between similar gene and disease signatures,
   omitting the functional correlation between genes. In order to compare
   the performance with Shegimizu et al. [[100]22], and CMap [[101]20], we
   obtained the expression data of 1251, 1079 and 1182 compounds treated
   in MCF7, HL60 and PC3 from CMap data sets. We used methods to generate
   ranked drug lists and compared the highest AUC scores. As shown in
   Fig. [102]2 MFM consistently outperforms the 2 pervious methods,
   sometimes by wide margins.

Fig. 2.

   Fig. 2
   [103]Open in a new tab

   Comparison of performance for the MFM with other methods. We applied
   CMap datasets to compare performance of MFM with Shegemizu et al. and
   Lamb et al. The sensitivity and specificity were calculated as
   explained in the Methods section, and the area under the ROC curve was
   used as a measure of performance. UCDB: prediction of drug candidates
   that can down-regulate genes up-regulated in cancer. DCUB: prediction
   of drug candidates that can up-regulate genes down-regulated in cancer.
   It shows that MFM consistently outperforms the two methods in different
   datasets and diseases

Recall rate

   Among 2587 bioactive compounds with FDR less than 0.05, 20/20
   (p = 2.5E-4) FDA breast cancer drugs and 150/173 (p = 3.1E-10) clinical
   drugs (compounds that have been in clinical trials for breast cancer,
   Additional file [104]5) were recalled. For prostate cancer, among 1668
   bioactive compounds with FDR less than 0.05, 10/11 (p = 2.6E-2) FDA
   prostate cancer drugs and 89/113 (p = 6.3E-6) clinical drugs were
   recalled. Significance was calculated using the Fisher exact test.

Functional plausibility

Breast cancer

   One way to characterize the functional implications of breast cancer
   MAGs is by estimating the chance probability of their observed
   distribution over KEGG pathways. We took the MAGs (MAG-UP, see,
   Additional file [105]6) that produced the drug ranked lists with the
   highest AUC scores after optimization. The MAGs contain 40 breast
   cancer mutations and their 237 filtered first nearest neighbors on the
   FLN, which are up regulated in breast cancer (see Additional file
   [106]6).

   As shown in Additional file [107]6, we found 95 pathways
   over-represented in breast cancer (FDR < 0.05), 18 of which are
   classified in KEGG as cancer pathways (22 of the 287 KEGG pathways, are
   labeled cancer-related). For example, [[108]28] found that the
   spliceosome assembly pathway is enriched in genes that are
   overexpressed in breast cancer samples, compared to benign lesions.
   They have shown that siRNA-mediated depletion of SmE (SNRPE) or SmD1
   (SNRPD1) led to a marked reduction of cell viability in breast cancer
   cell lines, whereas it had little effect on the survival of the
   nonmalignant MCF10A breast epithelial cells [[109]29].

   In addition, signaling pathways that regulate pluripotent stems cells
   are enriched in overexpressed genes that are in the functional
   neighborhood of genes mutated in breast cancer tissue (MAGs,
   p = 4E-09). The deregulation of these pathways many play a role in the
   development of chemoresistance of cancer stem cells, including breast
   cancer [[110]30]. Other published breast cancer causal pathways such as
   Estrogen signaling [[111]31], ErbB [[112]32], neurotrophin [[113]33],
   MAPK [[114]34] and PI3K/AKT [[115]35] were significantly enriched in
   mutation associated genes (MAGs).

Prostate cancer

   A similar approach was followed for prostate cancer. As summarized in
   Additional file [116]6, we found 117 enriched pathways (FDR <0.05), 18
   of which are KEGG cancer pathways, including the prostate cancer
   pathway (p = 6.9E-10). There was also supporting evidence that showed
   deregulation of the enriched pathways in prostate cancer. For example,
   T cell infiltration of the prostate induced by androgen withdrawal has
   been found in patients with prostate cancer [[117]36]; the
   androgen-androgen receptor (AR) system plays vital roles in prostate
   cancer development and progression [[118]37]. Insulin-like growth
   factor 1 or insulin signaling has been found to activate androgen
   signaling through direct interactions of Foxo1 with androgen receptors.
   Intervention of IGF1/insulin-phosphatidylinositol 3-kinase-Akt
   signaling was reported to be of clinical value for prostate cancer. T
   cell receptor, PI3K-Akt, FoxO, and insulin signaling pathways were
   highly ranked candidates with p < E-05.

   A number of studies have shown that breast and prostate cancer are
   genetically related [[119]38, [120]39], as are almost all cancers to
   various degrees. Our finding that breast and prostate cancer share 80
   pathways is a striking illustration of this connection (see Additional
   file [121]6). We expect that the selected drug candidates having a
   strong functional relation (mutual predictability score) with this set
   of genes could potentially correct these aberrant functions.

MFM provides functional insight

   We compared the functional information gained from MAGs with
   information obtained using disease differentially expressed genes
   (DEGs) (often referred to as disease signature genes) exclusively
   [[122]19, [123]20]. As shown in Additional file [124]6, we found that
   our current method identifies more significantly enriched pathways and
   well-documented breast cancer and prostate cancer pathways than does
   the use of differential expression alone. To make a comparison, we
   mapped DEGs onto KEGG pathways. For breast cancer, one set contains the
   most up-regulated 247 DEGs; for prostate cancer, there were 333
   up-regulated DEGs. The disease DEGs were generated from the expression
   data as explained in [125]Transcript Level. These results taken
   collectively suggest that the inclusion of mutational and functional
   information into disease gene signatures, substantially improves
   prediction of disease mechanism and adds specificity and accuracy to
   the identification of repositioned candidates.

Experimental validation

Repositioned drug candidates inhibit metabolism of breast cancer cells

   We employed an MTT assay to assess cancer cell viability after
   treatments of 5 repositioned drug candidates (Table [126]2) [[127]40].
   In particular, we tested the viability of 2 breast cancer cell lines:
   MCF7 (Luminal A subtype), and SUM 149 (Triple negative, inflammatory
   breast cancer subtype). We assessed non specific drug toxicity by
   comparing the inhibition with that obtained against the immortalized
   but non-malignant MCF10A cell line.

Table 2.

   ^aMutual predicatbility score of breast cancer drug candiates predicted
   by MFM
     FDA Drug   ^aMP score P-value    FDR
   Clotrimazole    0.7     5.00E-06 4.88E-05
   Triprolidine    0.69    2.00E-05 1.64E-04
   Thioridazine    0.69    2.00E-05 1.64E-04
   Mefloquine      0.69    3.00E-05 2.28E-04
   Fluphenazine    0.66    1.11E-02 2.13E-02
   [128]Open in a new tab

   As shown in Additional file [129]7: Figure S2, Additional file [130]8:
   Figure S3, Additional file [131]9: Figure S4, Additional file [132]10:
   Figure S5, Additional file [133]11: Figure S6 and Additional file
   [134]12: Figure-S7, MCF7, SUM149 and MCF10A cells exposed to increasing
   concentrations of drugs for 24 h exhibited a dose dependent reduction
   in viability. The important measure of efficacy is therapeutic index
   (TI), the IC50 of a drug when it targets a non-tumor cell line,
   relative to its IC50 when it targets a tumor cell line. As shown in
   Fig. [135]3, the TIs of candidates tested against MCF7 and SUM149 are
   all substantially higher than that of Doxorubicin. In addition, all
   drug candidates except for Triprolidine achieved maximum efficacy
   (E[max]) at lower concentrations than did Doxorubicin.

Fig. 3.

   Fig. 3
   [136]Open in a new tab

   a FDA approved indications of predicted drug candidates; b Half maximal
   inhibitory concentration (IC50) (μM) of predicted drug candidates and
   Doxorubicin against MCF7, SUM149 and MCF10A; c and d Therapeutic index
   (TI) and maximal inhibitory concentrations (E[max]) of predicted
   repositioned drug candidates on MCF7, SUM149 and MCF10A. (*Currently
   used FDA drug for breast cancer; Therapeutic index (TI) was calculated
   as a ratio of the IC50 of MCF10A, to the IC50 of MCF7 and SUM149)

Discussion

   We developed a computational drug screening method -- based on the
   correlation between functional modules of genes perturbed by diseases
   and drugs -- that could potentially accelerate the introduction of new
   therapeutics for serious diseases and conditions. Our approach
   performed substantially better than previous methods by computational
   measures, and successfully predicted novel drugs that had higher
   inhibitory effect against breast cancer in vitro than Doxorubicin. The
   study benefited substantially from LINCS, the most up to date drug
   response expression data sets currently available.

   A number of computational drug-repositioning methods that utilized CMap
   have been devised and the efficacy of identified drugs have been
   supported by in vivo [[137]16, [138]19] experiments. However, the
   methodologies are exclusively based on gene expression, without taking
   disease driver/mutated genes or functional information between genes
   into account. Sirota, M., et al. [[139]15] searched for drug candidates
   based on similarities between drug response gene signatures (DEG) and
   [[140]12] predicted drug molecular functions based on drug response
   gene signatures.

   Here we indicate a method that has taken this into account and shows
   better performance than previous methods that utilized solely DEGs. We
   also showed that there was more functional information gained from MAGs
   than significantly differentially expressed genes (DEGs). Therefore, we
   believe that the method could screen more effective therapeutics than
   previous methods.

   Of the five drugs for which we did preliminary in vitro tests, they all
   have higher TI in both cell types than does Doxorubicin. Mefloquine is
   a lipophilic molecule that is an FDA-approved anti-malaria agent. It
   has 3 known protein targets: Fe(II)-protoporphyrin IX, hemoglobin
   subunit alpha, and A2A adenosine receptor (A2AR). Its antimalarial
   action is believed to result from inhibition of heme polymerization
   within the food vacuole in the blood stages of the malaria life cycle
   [[141]41]. Its potential role as a cancer therapeutic; however, stems
   from its antagonistic action on A2AR [[142]42].

   A study has shown that antagonizing A2AR could provide a basis for
   cancer immunotherapy [[143]43]. Preclinical studies have confirmed that
   blockade of A2a receptor activation has the ability to markedly enhance
   anti-tumor immunity and be effective against melanoma and lymphoma
   [[144]44–[145]46].

   Tumors may evade immune repose by usurping pathways; such as
   adenosinergic signaling pathway, that negatively regulates immune
   response. Tumors and its microenvironment have been found to have high
   levels of adenosine and ATP, which is triggered by increased cellular
   turnover and hypoxia [[146]43]. The extracellular adenosine then
   activates specific purinergic receptors such as A2AR. The activation of
   A2AR in cancer results in inhibition of the immune response to tumors
   via suppression of T regulatory cell function and inhibition of natural
   killer cell cytotoxicity and tumor-specific CD4^+ and CD8^+ T cell
   activity, therefore, inhibition of A2AR by specific antagonists may
   enhance anti-tumor immunity.

   Immunosuppression is associated with hypoxia and accelerated cell turn
   over. In accordance with the findings, in our analysis of pathway
   enrichment of MAGs for breast cancer, cell cycle, HIF1 and T cell
   signaling pathways were significantly dysregulated in breast cancer.
   Therefore, Mefloquine, the A2aR antagonist could be applied as an
   effective immunotherapeutic strategy.

   Fluphenazine and Thioridazine are both antipsychotics. The mechanism of
   action of fluphenazine is not well established, but it is known to
   antagonize dopamine by binding to the D2 receptor. Thioridazine binds a
   range of receptor types including dopamine and various serotonin
   receptor subtypes. The relationship to inhibition of transformed (MCF7
   and SUM149) cells is not entirely obvious.

   In our in vitro study, breast cancer cells (MCF7, SUM149 and MCF10A)
   had shown resistance against Doxorubicin. The Emax of Doxorubcin was
   higher than 4 out of 5 of our candidate drugs, which corresponds with
   the reported fact that breast cancer patients show drug resistance
   against Doxorubicin. It also suggests the ability of our drug candidate
   to overcome the drug resistance. The study [[147]47] has found that
   Thioridazine antagonized dopamine receptors, which are expressed on
   cancer stem cells (CSC) and breast cancer cells, and could induce death
   of leukemia cancer stem cells preferentially without harming normal
   blood stem cells. The dopamine receptor pathway is known to regulate
   the growth of CSCs [[148]48]. Therefore, Fluphnazine and Thioridazine
   could inhibit drug resistance of breast cancers by modulating CSC
   through dopamine receptor signaling pathway.

Conclusion

   MFM, which utilizes a functional-linkage network, known mutations, and
   altered RNA levels, appears to be a promising method for identifying
   multi-targeted drug candidates that can correct aberrant cellular
   functions. In particular the computational performance exceeded that of
   other CMap-based methods, and in vitro experiments indicate that 5/5
   candidates have therapeutic indices superior to that of Doxorubicin in
   MCF7 and SUM149 cancer cell lines. This new approach has the potential
   to provide a more efficient drug discovery pipeline.

Abbreviations

   A2AR, adenosine A2a receptor; AUC, area under the curve; CMap,
   connectivity map; CSC, cancer stem cells; DCUB, down regulated cancer
   genes up regulated bioactive compounds; DEG, differentially expressed
   genes; DMSO, Dimethylsulfoxide; DNA, deoxyribonucleic acid; DRG, drug
   response gene; EMax, maximal inhibitory concentration; FDA, Food and
   drug administration; FDR, false discovery rate; FLN, functional linkage
   network; GEO, gene expression omnibus; IC50, half maximal inhibitory
   concentration; KEGG, Kyoto encyclopedia of genes and genomes; LINCS,
   library of integrated network based cellular signatures; MAG, mutation
   associated gene; MFM, method of functional modules; MP, mutual
   predictability; MTT,
   3-(4,5-Dimethylthiazol-2-Yl)-2,5-Diphenyltetrazolium Bromide; OMIM,
   online mendelian inheritance in man; RNA, ribonucleic acid; ROC,
   receiver operating characteristic; TCGA, the cancer genome atlas; TI,
   therapeutic index; UCDB, up regulated cancer genes down regulated
   bioactive compounds

Acknowledgement