Abstract

Background

Recent efforts to repurpose existing drugs for new indications have been accompanied by a number of computational methods that incorporate protein-protein interaction networks and signaling pathways to help prioritize existing targets and/or drugs. However, many of these methods focus on integrating additional data that are available for only a small subset of diseases or conditions.

Methods

We designed and implemented a new R-based, open-source target prioritization and repurposing method that integrates canonical intracellular signaling information from five public pathway databases with target information from public sources, including OpenTargets.org. The Pathway2Targets algorithm takes a list of significant pathways as input, then retrieves and integrates public data for all targets within those pathways for a given condition. It also incorporates a user-customizable weighting scheme that supports a variety of use cases, including target prioritization, drug repurposing, and the identification of novel targets that are biologically relevant to a different indication.

Results

As a proof of concept, we applied this algorithm to a public colorectal cancer (CRC) RNA-sequencing dataset with 144 case and control samples. Our analysis identified 430 targets and ~700 unique drugs based on differential gene expression and signaling pathway enrichment. Our highest-ranked predicted targets were significantly enriched in targets of FDA-approved therapeutics for CRC (p-value < 0.025), including EGFR, VEGFA, and PTGS2. Interestingly, this same list showed no statistically significant enrichment of targets for other cancers, suggesting that the results are highly specific. We also adjusted the weighting scheme to prioritize more novel targets for CRC.
This second analysis revealed epidermal growth factor receptor (EGFR), phosphoinositide-3-kinase (PI3K), and two mitogen-activated protein kinases (MAPK14 and MAPK3). These observations suggest that our open-source method, with its customizable weighting scheme, can accurately prioritize targets that are specific and relevant to the disease or condition of interest, as well as targets at earlier stages of development. We anticipate that this method will complement other approaches to repurposing drugs for a variety of indications, which may help improve the quality of life and overall health of patients with those conditions.

Keywords: Drug repurposing, Drug targets, Target prioritization, Bioinformatics, Colorectal cancer, Target, Pathways, Prediction

Introduction

Substantial effort and resources have been devoted to identifying therapeutic treatments for many human diseases and conditions, which can be caused by autoimmunity, uncontrolled cell growth, genetics, infection, and other acute or chronic ailments. Because moving a candidate treatment through the US Food and Drug Administration (FDA) approval process is risky (Zhong et al., 2018), often takes many years, and requires a substantial financial investment, researchers have expanded their development efforts to include drug repurposing (Hernandez et al., 2017; Parvathaneni et al., 2019). Traditional drug discovery has used low- or high-throughput screens to identify inhibitors or activators of a given target (Thakur et al., 2021; Olgen, 2019; Glanz et al., 2020). Hits identified in these screens are generally optimized before subsequent testing in cell culture, animal models, and clinical trials (Thakur et al., 2021).
Alternatively, a pathway-based approach to drug discovery involves performing experiments to better understand the underlying mechanism(s) of a given condition and to identify relevant targets (Deng et al., 2020; Wang et al., 2020b; Damale et al., 2020; Chatterjee et al., 2022; Liu et al., 2021; Ren et al., 2010). Past studies have shown that a signaling pathway approach can successfully identify proteins that can be targeted by therapeutics with sufficient efficacy and safety to warrant regulatory approval (Ding et al., 2020; Khojasteh Poor et al., 2021; Choi et al., 2020; Proctor, Thompson & O’Bryant, 2014). Drug repurposing is the process of obtaining regulatory approval to apply an existing therapeutic to a different disease or condition (i.e., indication) (Ding et al., 2020; Zali et al., 2019; Harb, Lin & Hao, 2019). The benefits of this approach include a potentially shorter time to approval, because the therapeutic has already been deemed safe by government regulatory agencies. Early repurposing efforts focused on symptom similarities, or on known side effects observed in patients with other conditions, to treat a separate condition (Kingsmore, Grammer & Lipsky, 2020; Ballard et al., 2020). Subsequent advances in understanding intracellular signaling mechanisms enabled more complex analyses that identify candidate therapeutics for repurposing and support the development of novel therapeutics against known targets (Schein, 2020). This is evidenced by the wide variety of drug and target discovery tools that have already been reported (Paananen & Fortino, 2020; Sleno & Emili, 2008; Huang et al., 2020).
The majority of these modern methods take advantage of protein-protein interaction networks (Ma et al., 2019; Ozdemir et al., 2019; Cheng et al., 2019), gene sets (Masoudi-Sobhanzadeh et al., 2020; Tanoli, Vähä-Koskela & Aittokallio, 2021), and/or signaling pathways (Jain et al., 2021) to prioritize drugs and targets for repurposing. Some methods combine one or more of these data types with artificial intelligence to further accelerate drug discovery (Tanoli, Vähä-Koskela & Aittokallio, 2021; Anderson et al., 2020; Paul et al., 2021; Gupta et al., 2021). Even with such recent advances, many prioritization algorithms rely on public or proprietary protein network data (Emig et al., 2013; Huang et al., 2014; Louhimo et al., 2016; Li & Lu, 2013; Isik et al., 2015; Carrella et al., 2014; Setoain et al., 2015; Duan et al., 2016; Barrio-Hernandez et al., 2023; Fang et al., 2019; Lee et al., 2011; Greene et al., 2015; Huang et al., 2018), and some focus on a particular set of diseases or conditions (Crowther et al., 2010; Xu, Kong & Hu, 2021; Dezső & Ceccarelli, 2020; Fiscon et al., 2021; Chen & Xu, 2016; Regan-Fendt et al., 2020). Drug repurposing and target prioritization algorithms generally apply a fixed set of parameters, which are often specific to a given indication. Such specialization makes it difficult to support researchers working in other disease areas (Sharma et al., 2021; Begley et al., 2021). Given how prevalent this specialization is among repurposing tools, the aim of the current study was to incorporate a novel, flexible, and customizable open-source target prioritization method into the Pathway2Targets algorithm, thereby increasing the number of supported use cases.
This updated algorithm retrieves additional target information and clinical trial data, automatically fetches the Reactome pathway diagrams for the signaling pathways with the highest number of targets, and accepts Reactome pathway enrichments generated by the Enrichr algorithm (Xie et al., 2021a). The updated algorithm uses these additional data and the prioritization method to generate ranked lists of targets and therapeutics that are applicable to multiple use cases (Scott, Jensen & Pickett, 2021; Gray et al., 2022; Moreno et al., 2022; Rapier-Sharman, Clancy & Pickett, 2022). The entities in these lists can then be evaluated as candidates for condition-specific repurposing efforts based solely on the unique signaling pathway “profile” of the disease or condition of interest.

Methods

GEO query for transcriptomics data

The Gene Expression Omnibus (GEO) database (Sayers et al., 2021) was queried for a well-controlled, bulk transcriptomics dataset from human colorectal cancer (GSE156451) with enough samples to support confident downstream repurposing analysis. The paired-end fastq files for this publicly available study were then downloaded from the Sequence Read Archive (SRA), a database within the National Center for Biotechnology Information (Sayers et al., 2021). This study consisted of 144 samples: 72 from tumors of patients with colorectal cancer (CRC) and 72 from native human tissue (Li et al., 2021).

Transcriptomic preprocessing and analysis

The 144 public colorectal cancer transcriptomics samples were preprocessed using the Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data (ARMOR) software (Orjuela et al., 2019).
Briefly, this open-source, Snakemake-based workflow was used to trim reads in the RNA-sequencing fastq files with TrimGalore (Köster & Rahmann, 2018; https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), compute quality control metrics with FastQC (www.bioinformatics.babraham.ac.uk/projects/fastqc/), map and quantify reads against the human GRCh38 transcriptome with Salmon (Patro et al., 2017), and calculate differential gene expression with edgeR (Robinson, McCarthy & Smyth, 2010) by comparing the CRC samples (case) with the native tissue samples (control). The Ensembl Gene IDs generated by edgeR were converted to Entrez Gene IDs using an R-based application programming interface (API) to the BiomaRt database prior to pathway analysis (Kasprzyk, 2011). The statistically significant differentially expressed genes (DEGs; FDR-corrected p-value < 0.05) were then subjected to signaling pathway analysis using the Signaling Pathway Impact Analysis (SPIA) algorithm, with 3,000 bootstrap replicates used to generate a null distribution for each of over 2,000 public signaling pathways (Tarca et al., 2009), as reported previously (Scott, Jensen & Pickett, 2021; Gray et al., 2022; Moreno et al., 2022; Rapier-Sharman, Clancy & Pickett, 2022; Ferrarini et al., 2021; Scott et al., 2022; Gifford & Pickett, 2022). The lists of pathways were derived from publicly available versions of KEGG (Aoki-Kinoshita & Kanehisa, 2007), Reactome (Jassal et al., 2020), Pathway Interaction Database (Schaefer et al., 2009), BioCarta, and Panther (Mi et al., 2017). By comparison, the Enrichr pathway enrichment software requires only gene symbols, log2 fold-change values, and FDR-adjusted p-values from the DEG list as input.
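The pathway-level statistics behind this step can be illustrated with a short Python sketch. It is not the SPIA or Pathway2Targets code. It assumes SPIA's default Fisher-style combination of each pathway's over-representation p-value (P_NDE) and perturbation p-value (P_PERT), uses a simplified two-sided empirical p-value against the bootstrap null in place of SPIA's exact R implementation, and applies a Bonferroni cutoff of 0.05 to select significant pathways. All function names and the example pathway values are hypothetical.

```python
import math
from statistics import median
from typing import Dict, Sequence


def empirical_two_sided_p(observed: float, null: Sequence[float]) -> float:
    """Two-sided empirical p-value of an observed statistic vs. a bootstrap null.

    Simplified illustration: count bootstrap values at least as extreme as the
    observed value on its side of the null median, then double that tail.
    """
    center = median(null)
    if observed >= center:
        tail = sum(1 for x in null if x >= observed)
    else:
        tail = sum(1 for x in null if x <= observed)
    # Floor the count at 1 so the p-value is never reported as exactly 0
    # (an assumption of this sketch, not SPIA's exact rule).
    return min(1.0, max(tail, 1) / len(null) * 2)


def combine_fisher(p_nde: float, p_pert: float) -> float:
    """Combine two independent p-values with Fisher's product method.

    For two p-values the closed form is c - c*ln(c), with c = p_nde * p_pert.
    """
    c = p_nde * p_pert
    if c <= 0.0:
        return 0.0
    return c - c * math.log(c)


def bonferroni_significant(
    p_global: Dict[str, float], alpha: float = 0.05
) -> Dict[str, float]:
    """Bonferroni-adjust pathway p-values and keep those below alpha."""
    m = len(p_global)
    adjusted = {name: min(1.0, p * m) for name, p in p_global.items()}
    return {name: p for name, p in adjusted.items() if p < alpha}


if __name__ == "__main__":
    # Hypothetical per-pathway (P_NDE, P_PERT) pairs, for illustration only.
    raw = {"pathway_A": (0.001, 0.002), "pathway_B": (0.04, 0.3), "pathway_C": (0.5, 0.6)}
    p_global = {name: combine_fisher(*pair) for name, pair in raw.items()}
    print(bonferroni_significant(p_global))
```

The closed form c - c*ln(c) is the probability that the product of two independent uniform p-values is at most c. This is why it never falls below the raw product and equals 1 when both inputs are 1.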
Target data acquisition and integration

The only input file required by the Pathway2Targets software was the tabular SPIA output containing the significant signaling pathways (Bonferroni-corrected p-value < 0.05), although an output file from the Enrichr algorithm would also have been compatible (Xie et al., 2021a). The Pathway2Targets software then programmatically retrieved the gene products belonging to each significant pathway from the five pathway databases mentioned above and obtained the UniProt protein identifier for each Ensembl ID using the BiomaRt API (Scott, Jensen & Pickett, 2021; UniProt Consortium, 2019). A GraphQL query was automatically generated and submitted through the Open Targets Platform API to access the relevant drug and target information for each UniProt protein identifier in each pathway (Ochoa et al., 2021). These additional data for each target included the number of associated diseases, tractability, subcellular location, safety, the number of unique drugs, the number of signaling pathways, the number of FDA-approved therapeutics, and the number of therapeutics in phase-four, phase-three, phase-two, and phase-one clinical trials. This information was then automatically integrated with the significant results from the signaling pathway enrichment analysis described above into a single table to facilitate downstream target scoring and prioritization.

Target weighting factors

A customizable weighting scheme was built into the algorithm to compile and analyze the data for all existing therapeutics for each pathway member, thereby facilitating target prioritization (Table 1).
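The integration and weighting steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Pathway2Targets implementation. The attribute names and default weights are hypothetical placeholders, since the values in Table 1 are not reproduced here. Each attribute is min-max normalized before weighting, which is also an assumption. The per-target attribute dictionaries stand in for the data that the software retrieves from Open Targets.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple

# Hypothetical default weights (placeholders, not the values in Table 1).
# Larger weights favor targets in many pathways, with many associated
# diseases, and with therapeutics at later clinical stages.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "n_pathways": 2.0,
    "n_diseases": 1.0,
    "n_approved": 4.0,
    "n_phase4": 4.0,
    "n_phase3": 3.0,
    "n_phase2": 2.0,
    "n_phase1": 1.0,
}


def count_pathways(memberships: Mapping[str, Iterable[str]]) -> Counter:
    """Count how many significant pathways contain each target."""
    counts: Counter = Counter()
    for members in memberships.values():
        counts.update(set(members))  # a target counts once per pathway
    return counts


def integrate(
    memberships: Mapping[str, Iterable[str]],
    attributes: Mapping[str, Mapping[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Merge pathway counts with per-target attributes into one table."""
    table: Dict[str, Dict[str, float]] = {}
    for target, n in count_pathways(memberships).items():
        row = dict(attributes.get(target, {}))  # missing attributes treated as 0 when scoring
        row["n_pathways"] = n
        table[target] = row
    return table


def score_targets(
    table: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> List[Tuple[str, float]]:
    """Rank targets by a weighted sum of min-max normalized attributes."""
    if not table:
        return []
    # Per-attribute range across all targets, used for min-max normalization.
    ranges = {}
    for attr in weights:
        values = [row.get(attr, 0.0) for row in table.values()]
        ranges[attr] = (min(values), max(values))
    scores: Dict[str, float] = {}
    for target, row in table.items():
        total = 0.0
        for attr, weight in weights.items():
            low, high = ranges[attr]
            value = row.get(attr, 0.0)
            # Attributes that are constant across targets contribute nothing.
            norm = (value - low) / (high - low) if high > low else 0.0
            total += weight * norm
        scores[target] = total
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy, hypothetical inputs: pathway -> member targets, target -> attributes.
    memberships = {"P1": ["T1", "T2"], "P2": ["T1", "T3"], "P3": ["T1"]}
    attributes = {
        "T1": {"n_diseases": 10, "n_approved": 2},
        "T2": {"n_diseases": 5, "n_phase2": 1},
        "T3": {"n_diseases": 1},
    }
    table = integrate(memberships, attributes)
    print(score_targets(table))
    # A hypothetical "novelty" scheme that penalizes approved therapeutics.
    novel = {"n_pathways": 2.0, "n_diseases": 1.0, "n_approved": -4.0, "n_phase2": 2.0}
    print(score_targets(table, novel))
```

Passing a different weight dictionary is all it takes to change the ranking. For example, giving FDA-approved therapeutics a negative weight, as in the hypothetical `novel` scheme above, is one way to move well-established targets down the list when the goal is to surface less-developed ones.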
The default weight for each target attribute was chosen to prioritize targets that are present in multiple pathways, are associated with a high number of diseases, and have more therapeutics at later stages of clinical trials. However, these default weights can easily be adjusted to customize the output to individual prioritization preferences or desired outcomes. As a proof of concept for adjusting the