Abstract

Background

   Recent efforts to repurpose existing drugs to different indications
   have been accompanied by a number of computational methods, which
   incorporate protein-protein interaction networks and signaling
   pathways, to aid with prioritizing existing targets and/or drugs.
   However, many of these existing methods are focused on integrating
   additional data that are only available for a small subset of diseases
   or conditions.

Methods

   We have designed and implemented a new R-based open-source target
   prioritization and repurposing method that integrates both canonical
   intracellular signaling information from five public pathway databases
   and target information from public sources including
   [34]OpenTargets.org. The Pathway2Targets algorithm takes a list of
   significant pathways as input, then retrieves and integrates public
   data for all targets within those pathways for a given condition. It
   also incorporates a weighting scheme that is customizable by the user
   to support a variety of use cases including target prioritization, drug
   repurposing, and identifying novel targets that are biologically
   relevant for a different indication.

Results

   As a proof of concept, we applied this algorithm to a public colorectal
   cancer RNA-sequencing dataset with 144 case and control samples. Our
   analysis identified 430 targets and ~700 unique drugs based on
   differential gene expression and signaling pathway enrichment. We found
   that our highest-ranked predicted targets were significantly enriched
   in targets with FDA-approved therapeutics for colorectal cancer
   (p-value < 0.025), including EGFR, VEGFA, and PTGS2. Interestingly,
   there was no statistically significant enrichment of targets for other
   cancers in this same list, suggesting high specificity of the results.
   We also adjusted the weighting scheme to prioritize more novel targets
   for colorectal cancer (CRC). This second analysis revealed epidermal
   growth factor receptor (EGFR), phosphoinositide-3-kinase (PI3K), and
   two mitogen-activated protein kinases (MAPK14 and MAPK3). These
   observations suggest that our
   open-source method with a customizable weighting scheme can accurately
   prioritize targets that are specific and relevant to the disease or
   condition of interest, as well as targets that are at earlier stages of
   development. We anticipate that this method will complement other
   approaches to repurpose drugs for a variety of indications, which can
   help improve the quality of life and overall health of affected
   patients.

   Keywords: Drug repurposing, Drug targets, Target prioritization,
   Bioinformatics, Colorectal cancer, Target, Pathways, Prediction

Introduction

   Substantial effort and resources have been devoted to identifying
   therapeutic treatments for many human diseases and conditions. These
   maladies can be caused by autoimmunity, uncontrolled cell growth,
   genetics, infection, and other acute or chronic ailments. Because
   moving a candidate treatment through the process of approval by the US
   Food and Drug Administration (FDA) is risky ([35]Zhong et al., 2018),
   often takes many years, and requires a substantial financial
   investment, researchers have expanded their development efforts to drug
   repurposing ([36]Hernandez et al., 2017; [37]Parvathaneni et al., 2019).
   Traditional methods of drug discovery have involved using low- or
   high-throughput screens to identify inhibitors or activators of a given
   target ([38]Thakur et al., 2021; [39]Olgen, 2019; [40]Glanz et al.,
   2020). Hits that are identified in these screens are generally
   optimized prior to subsequent testing in cell culture, animal models,
   and clinical trials ([41]Thakur et al., 2021). Alternatively, using a
   pathway-based approach to drug discovery involves performing
   experiments to better understand the underlying mechanism(s) of a given
   condition, and to identify relevant targets ([42]Deng et al., 2020;
   [43]Wang et al., 2020b; [44]Damale et al., 2020; [45]Chatterjee et al.,
   2022; [46]Liu et al., 2021; [47]Ren et al., 2010). Past studies have
   shown that incorporating a signaling pathway approach can successfully
   identify proteins that can be targeted with therapeutics having
   sufficient efficacy and safety to warrant approval by regulators
   ([48]Ding et al., 2020; [49]Khojasteh Poor et al., 2021; [50]Choi et
   al., 2020; [51]Proctor, Thompson & O’Bryant, 2014).

   Drug repurposing is the process of getting regulatory approval for
   applying an existing therapeutic to a separate disease or condition
   (i.e., indication) ([52]Ding et al., 2020; [53]Zali et al., 2019;
   [54]Harb, Lin & Hao, 2019). The benefits of this approach include a
   potentially shorter time to approval since the therapeutic has already
   been deemed “safe” by government regulatory agencies. Early
   repurposing efforts were focused on identifying symptom similarities or
   using known side-effects from patients with other conditions to treat a
   separate condition ([55]Kingsmore, Grammer & Lipsky, 2020; [56]Ballard
   et al., 2020). Subsequent advances in understanding intracellular
   signaling mechanisms enabled a transition to more complex analyses that
   identify candidate therapeutics for repurposing and support developing
   novel therapeutics towards known targets ([57]Schein, 2020). This is
   evidenced by the wide variety of drug and target discovery tools that
   have already been reported ([58]Paananen & Fortino, 2020; [59]Sleno &
   Emili, 2008; [60]Huang et al., 2020). The majority of these modern
   methods take advantage of protein-protein interaction networks ([61]Ma
   et al., 2019; [62]Ozdemir et al., 2019; [63]Cheng et al., 2019), gene
   sets ([64]Masoudi-Sobhanzadeh et al., 2020; [65]Tanoli, Vähä-Koskela &
   Aittokallio, 2021), and/or signaling pathways ([66]Jain et al., 2021)
   for repurposing-based drug and target prioritization efforts. Some
   methods combine one or more of these approaches with artificial
   intelligence to further improve the pace of drug discovery ([67]Tanoli,
   Vähä-Koskela & Aittokallio, 2021; [68]Anderson et al., 2020; [69]Paul
   et al., 2021; [70]Gupta et al., 2021).

   Even with such recent advances, many prioritization algorithms rely on
   public or proprietary protein network data ([71]Emig et al., 2013;
   [72]Huang et al., 2014; [73]Louhimo et al., 2016; [74]Li & Lu, 2013;
   [75]Isik et al., 2015; [76]Carrella et al., 2014; [77]Setoain et al.,
   2015; [78]Duan et al., 2016; [79]Barrio-Hernandez et al., 2023;
   [80]Fang et al., 2019; [81]Lee et al., 2011; [82]Greene et al., 2015;
   [83]Huang et al., 2018), with some algorithms focusing on a particular
   set of diseases or conditions ([84]Crowther et al., 2010; [85]Xu, Kong
   & Hu, 2021; [86]Dezső & Ceccarelli, 2020; [87]Fiscon et al., 2021;
   [88]Chen & Xu, 2016; [89]Regan-Fendt et al., 2020). Drug repurposing
   and target prioritization algorithms generally apply a consistent set
   of parameters, which are often specific to a given indication. Such
   specialization makes it difficult to effectively and adequately support
   the efforts of researchers working in other disease areas ([90]Sharma
   et al., 2021; [91]Begley et al., 2021).

   Given the specialization that is prevalent among many repurposing
   tools, the aim of the current study was to incorporate a novel,
   flexible, and customizable open-source target prioritization method
   into the Pathway2Targets algorithm, which would increase the number of
   supported use cases. This updated algorithm retrieves additional target
   information and clinical trial data, automatically fetches the Reactome
   pathway diagrams for the signaling pathways with the highest number of
   targets, and accepts Reactome pathway enrichments generated by the
   Enrichr algorithm ([92]Xie et al., 2021a). These additional data and
   the prioritization method are used by the updated algorithm to generate
   ranked lists of targets and therapeutics that can be applicable to
   multiple use cases ([93]Scott, Jensen & Pickett, 2021; [94]Gray et al.,
   2022; [95]Moreno et al., 2022; [96]Rapier-Sharman, Clancy & Pickett,
   2022). The entities in these lists can then be evaluated as candidates
   for condition-specific repurposing efforts based solely on the unique
   signaling pathway “profile” for the disease/condition of interest.

Method

GEO query for transcriptomics data

   The Gene Expression Omnibus database was queried for a well-controlled
   bulk transcriptomics human colorectal cancer dataset with a
   sufficiently high number of samples ([97]GSE156451) to enable confident
   downstream repurposing analysis ([98]Sayers et al., 2021). The
   paired-end fastq files for this publicly available study were then
   downloaded from the Sequence Read Archive (SRA), a database within the
   National Center for Biotechnology Information ([99]Sayers et al.,
   2021). This study consisted of 144 samples, with 72 from tumors in
   patients with colorectal cancer (CRC) and the other 72 from native
   human tissue ([100]Li et al., 2021).

Transcriptomic preprocessing and analysis

   The 144 public colorectal cancer transcriptomics samples were
   preprocessed using the Automated Reproducible MOdular Workflow for
   Preprocessing and Differential Analysis of RNA-seq Data (ARMOR)
   software ([101]Orjuela et al., 2019). Briefly, this open-source
   Snakemake-based workflow was used to perform read trimming on the
   RNA-sequencing fastq files with TrimGalore ([102]Köster & Rahmann,
   2018;
   [103]https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/),
   determine quality control metrics with FastQC
   ([104]www.bioinformatics.babraham.ac.uk/projects/fastqc/), map and
   quantify reads to the human GRCh38 transcriptome with Salmon
   ([105]Patro et al., 2017), and calculate differential gene expression
   with edgeR ([106]Robinson, McCarthy & Smyth, 2010) by comparing the CRC
   samples (case) to the native tissue samples (control). The Ensembl Gene
   IDs that were generated by the edgeR algorithm were converted to Entrez
   Gene IDs using an R-based application programming interface (API) to
   the BiomaRt database prior to pathway analysis ([107]Kasprzyk, 2011).
   Similarly, the Enrichr pathway enrichment software only required gene
   symbols, log2 fold-change values, and FDR-adjusted p-values as input
   from the list of differentially expressed genes (DEGs). The
   statistically significant DEGs (FDR-corrected p-value < 0.05) were then
   subjected to signaling pathway analysis using the Signaling Pathway
   Impact Analysis (SPIA) algorithm with 3,000 bootstrap replicates to
   generate a
   null distribution for each of over 2,000 public signaling pathways
   ([108]Tarca et al., 2009), as reported previously ([109]Scott, Jensen &
   Pickett, 2021; [110]Gray et al., 2022; [111]Moreno et al., 2022;
   [112]Rapier-Sharman, Clancy & Pickett, 2022; [113]Ferrarini et al.,
   2021; [114]Scott et al., 2022; [115]Gifford & Pickett, 2022). The lists
   of pathways were derived from publicly available versions of KEGG
   ([116]Aoki-Kinoshita & Kanehisa, 2007), Reactome ([117]Jassal et al.,
   2020), Pathway Interaction Database ([118]Schaefer et al., 2009),
   BioCarta, and Panther ([119]Mi et al., 2017).
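
   The DEG filtering and identifier conversion described above can be
   illustrated with a minimal sketch. It is written in Python rather than
   R purely for illustration and is not the authors' code. The DegRow
   type, the select_significant_degs function, the toy identifiers, and
   the toy fold-change values are all hypothetical; in practice the
   Ensembl-to-Entrez mapping would come from BioMart rather than a
   hand-written dictionary.

```python
from typing import Dict, List, NamedTuple


class DegRow(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    ensembl_id: str
    log2_fc: float
    fdr: float


def select_significant_degs(
    rows: List[DegRow],
    entrez_by_ensembl: Dict[str, str],
    fdr_cutoff: float = 0.05,
) -> Dict[str, float]:
    """Return {entrez_id: log2_fc} for rows with FDR below the cutoff.

    Rows without an Entrez mapping are skipped. If two Ensembl IDs map to
    the same Entrez ID, the larger absolute fold change is kept (an
    assumption made for this sketch, not a rule stated in the text).
    """
    selected: Dict[str, float] = {}
    for row in rows:
        if row.fdr >= fdr_cutoff:
            continue
        entrez_id = entrez_by_ensembl.get(row.ensembl_id)
        if entrez_id is None:
            continue
        previous = selected.get(entrez_id)
        if previous is None or abs(row.log2_fc) > abs(previous):
            selected[entrez_id] = row.log2_fc
    return selected


# Toy input: identifiers and values are invented for illustration only.
example_rows = [
    DegRow("ENSG_TOY_1", 1.8, 0.001),
    DegRow("ENSG_TOY_2", -2.3, 0.0004),
    DegRow("ENSG_TOY_3", 0.2, 0.40),   # fails the FDR cutoff
    DegRow("ENSG_TOY_4", 3.0, 0.01),   # has no Entrez mapping
]
example_mapping = {
    "ENSG_TOY_1": "101",
    "ENSG_TOY_2": "102",
    "ENSG_TOY_3": "103",
}
example_degs = select_significant_degs(example_rows, example_mapping)
```

   The resulting dictionary of fold changes keyed by Entrez Gene ID has
   the general shape of input that a pathway impact analysis such as SPIA
   expects, alongside the full list of measured genes as background.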

Target data acquisition and integration

   The only input file required for the Pathway2Targets software was the
   tabular output file containing the significant signaling pathways
   (Bonferroni-corrected p-value < 0.05) generated by SPIA, although an
   output file from the Enrichr algorithm would also have been compatible
   ([120]Xie et al., 2021a). The Pathway2Targets software then
   programmatically retrieved the gene products that were members of each
   significant pathway from the five pathway databases mentioned
   previously, and then obtained the UniProt protein identifiers for each
   Ensembl ID using the BiomaRt API ([121]Scott, Jensen & Pickett, 2021;
   [122]UniProt Consortium, 2019). A GraphQL query was automatically
   generated and submitted through the Open Targets Platform API to access
   the relevant drug and target information for each of the UniProt
   protein identifiers in each pathway ([123]Ochoa et al., 2021). These
   additional data for each target included the number of associated
   diseases, tractability, subcellular location, safety, number of unique
   drugs, number of signaling pathways, number of FDA-approved
   therapeutics, and the numbers of therapeutics in phase-one, phase-two,
   phase-three, and phase-four clinical trials. This information was then
   automatically
   integrated with the significant results from the signaling pathway
   enrichment analysis described above in a single table to facilitate
   downstream target scoring and prioritization.
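
   The integration step described above can be sketched as follows. This
   is a simplified illustration in Python, not the Pathway2Targets
   implementation: the function names, the column names, and the input
   layout (pathway membership lists and target/drug/phase records) are
   assumptions, and no call to the Open Targets Platform API is made.
   Only the pathway count, unique drug count, and per-phase drug counts
   are tabulated; the other attributes listed in the text are omitted.

```python
from typing import Dict, Iterable, List, Tuple


def bonferroni_significant(
    pathway_pvalues: Dict[str, float], alpha: float = 0.05
) -> List[str]:
    """Return pathways whose Bonferroni-corrected p-value is below alpha."""
    n_tests = len(pathway_pvalues)
    return [
        name
        for name, p_value in pathway_pvalues.items()
        if min(p_value * n_tests, 1.0) < alpha
    ]


def build_target_table(
    pathway_members: Dict[str, List[str]],
    drug_records: Iterable[Tuple[str, str, int]],
    significant_pathways: Iterable[str],
) -> Dict[str, Dict[str, int]]:
    """Combine pathway membership and drug records into one row per target.

    drug_records holds (target, drug, clinical_phase) tuples; repeated
    (target, drug) pairs are counted once.
    """
    table: Dict[str, Dict[str, int]] = {}
    for pathway in significant_pathways:
        # set() guards against a target listed twice in the same pathway
        for target in set(pathway_members.get(pathway, [])):
            row = table.setdefault(
                target,
                {"n_pathways": 0, "n_unique_drugs": 0,
                 "phase_1": 0, "phase_2": 0, "phase_3": 0, "phase_4": 0},
            )
            row["n_pathways"] += 1

    seen_pairs = set()
    for target, drug, phase in drug_records:
        if target not in table or (target, drug) in seen_pairs:
            continue
        seen_pairs.add((target, drug))
        table[target]["n_unique_drugs"] += 1
        if phase in (1, 2, 3, 4):
            table[target][f"phase_{phase}"] += 1
    return table


# Toy input: all names and values are invented for illustration only.
toy_pvalues = {"P1": 0.001, "P2": 0.01, "P3": 0.5}
toy_members = {"P1": ["T1", "T2"], "P2": ["T1"], "P3": ["T3"]}
toy_drugs = [("T1", "drugA", 4), ("T1", "drugB", 2),
             ("T1", "drugA", 4), ("T3", "drugC", 3)]

toy_significant = bonferroni_significant(toy_pvalues)
toy_table = build_target_table(toy_members, toy_drugs, toy_significant)
```

   Keeping one row per target, with each attribute in its own column,
   means that a weighting scheme can later be applied as a simple
   weighted sum over the columns.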

Target weighting factors

   A logical and customizable weighting scheme was constructed in the
   algorithm that would compile and analyze the data for all existing
   therapeutics for each pathway member to facilitate target
   prioritization ([124]Table 1). The default weights for each target
   attribute were specifically chosen to prioritize targets that are
   present in multiple pathways, have a high number of associated
   diseases, and have a higher number of therapeutics further along in
   clinical trials. However, these default weighted values could also be
   easily
   adjusted to customize the output based on individual prioritization
   preferences or desired outcome. As a proof of concept for adjusting the