Abstract Correctly identifying perturbed biological pathways is a critical step in uncovering basic disease mechanisms and developing much-needed therapeutic strategies. However, whether current tools are optimal for unbiased discovery of relevant pathways remains unclear. Here, we create “Benchmark” to critically evaluate existing tools and find that most function sub-optimally. We thus develop the “Pathway Ensemble Tool” (PET), which outperforms existing methods. Deploying PET, we identify prognostic pathways across 12 cancer types. PET-identified prognostic pathways offer additional insights, with genes within these pathways serving as reliable biomarkers for clinical outcomes. Additionally, normalizing these pathways using drug repurposing strategies represents therapeutic opportunities. For example, the top predicted repurposed drug for bladder cancer, a CDK2/9 inhibitor, represses cell growth in vitro and in vivo. We anticipate that using Benchmark and PET for unbiased pathway discovery will offer additional insights into disease mechanisms across a spectrum of diseases, enabling biomarker discovery and therapeutic strategies. Subject terms: Cancer genomics, Tumour biomarkers, Computational biology and bioinformatics, Prognostic markers, Drug discovery __________________________________________________________________ Multiple cellular pathways are altered in cancer and identifying them is relevant for prognosis and therapy. Here, the authors develop Benchmark and Pathway Ensemble Tool (PET), two computational approaches to optimise pathway discovery in cancer and predict related biomarkers and therapeutic avenues. Introduction The pathogenesis of complex diseases is often underpinned by dysregulation of several cellular pathways. This is well exemplified by neoplastic diseases, where abnormalities in multiple pathways contribute to cancerous transformation or in which cancer-disposing events perturb multiple cellular pathways. Despite notable successes in some cancers, limited therapeutic gains have been made by identification and targeting of single contributory genes. Precisely pinpointing key dysregulated celluar pathways in cancer and ranking them by order of importance is, therefore, of major biological and clinical significance. Doing so would enable molecular subtyping, determination of diagnostic/prognostic biomarkers and inform choice of effective and cancer-specific drug regimens. This is especially important at the present time because the incidence of cancer is increasing^[78]1. The contemporary approach to elucidate aberrations in biological pathways is dominated by omics methods, such as RNA-sequencing (RNA-seq). As expected, vast amounts of omics data have been generated for most cancers. However, unlocking their full potential has remained challenging. Biological pathway analysis, a computational search to identify and rank by importance the key relevant and/or dysregulated pathways among the thousands of known potential biological pathways, is a critical means for deciphering these data. Numerous in silico methods (reviewed here^[79]2,[80]3), for example GSEA^[81]4 and Enrichr^[82]5, have been developed and broadly applied to conduct pathway analyses. One common application of these methods has been to test whether specific pathways hypothesized by researchers are, in fact, enriched in a dataset. In this approach, pathway enrichment analyses have generally proven successful, but pre-suppose a priori knowledge or hypothesis. An alternative, highly desired application, especially in the context of exponentially growing omics datasets, is the adoption of unbiased discovery-based approaches. Here, the objective is to rank the “correct pathways” above all other pathways in the reported results without imparting bias from a priori knowledge. However, the performance and suitability of existing pathway analysis methods for this form of blinded or unbiased discovery remains largely unknown. In other words, it is not known how often the top pathway(s) returned by an analysis method represent(s) the most important perturbation. This is likely due to the fact that most methods were developed and evaluated (“benchmarked”) using computer simulated pathways^[83]6 or verified only in very small scale biological datasets from historical, low-resolution microarray technology^[84]2,[85]7–[86]9. Therefore, a key limitation is lack of proper benchmarking and assessment of the capacity of these tools for pathway discovery under experimental settings, which often contain substantial “noise” due to stochastic events and biological variations. Such lack of a suitable evaluation platform predicated on large-scale experimental data significantly impedes informed deployment of pathway analysis tools for blinded discovery of perturbations in cancer. Here, we designed an extensive “Benchmark” to explicitly measure the performance of many commonly used pathway analysis tools for unbiased discovery under experimental settings. Benchmark was constructed from large and diverse high throughput experimental data generated by ENCODE^[87]10, such that expected pathways are well-defined. Using Benchmark, we found that commonly used pathway analysis tools consistently underperformed in unbiased pathway discovery, even after extensive optimization. Given these findings, we next developed another method for biological pathway discovery, termed Pathway Ensemble Tool (PET), which statistically combines the rank metrics from multiple input methods. PET significantly outperformed all existing tools for unbiased identification of dysregulated pathways with high accuracy and was resistant to biological noise. Importantly, we applied PET to systematically determine biological pathways associated with favorable or unfavorable prognosis in 12 distinct cancer types. Combinations of leading genes from PET-identified prognostic pathways offered superior prognostication power compared to existing biomarkers, dividing patients into highly resilient and highly vulnerable categories. As a way of validation, these findings were confirmed in independent cancer datasets and found to be consistent with established aggressive molecular subtypes. We next carried out computational screening for repurposing existing drugs to normalize genes from PET-derived prognostic pathways for each cancer type. This was done by specifically looking for drugs that can down-regulate the leading genes of pathways associated with unfavorable prognosis or up-regulate the leading genes of pathways associated with favorable prognosis. This resulted in the identification of both established effective drugs and additional drug candidates. To explore these, we validated the top predicted drug, CCT068127, a CDK2/9 inhibitor, for bladder and cervical cancer in vitro and in vivo settings. We confirmed significant repression of cancer growth, which was superior to that seen with known clinically efficacious drugs in this category. We demonstrated that the predicted drug indeed restricts cancer cells by normalizing the expression of genes belonging to predicted prognostic pathways. We anticipate that Benchmark will enable development of highly accurate, unbiased pathway discovery methods such as PET, which can lead to tangible insights into molecular mechanisms and identify additional diagnostic/prognostic biomarkers and therapeutics. Results Benchmark enables identification of optimal settings and best practices for systematic pathway discovery in experimental datasets We developed Benchmark as an evaluation platform to explicitly evaluate performance of pathway/geneset analysis tools (Fig. [88]1a, b; please note that the terms ‘pathway’ and ‘geneset’ are used interchangeably throughout the manuscript). Briefly, Benchmark comprises three components: 1) “input genesets” (IGS), representing genesets to be investigated – these are most commonly user lists of genes derived from high throughput assays, such as RNA-seq or Chromatin Immunoprecipitation sequencing (ChIP-seq) performed during projects of interest; 2) “target genesets” (TGS), representing curated biological pathways (e.g. in the KEGG and Gene Ontology databases) to be examined for their relationship to the IGS; 3) pathway enrichment analysis tools which test statistical relationships between IGS and TGS (Fig. [89]1a). The genesets used in Benchmark were extracted from large numbers (~1000) of high-throughput sequencing experimental assays from ENCODE^[90]10 (Supplementary Data [91]1a). Every geneset in Benchmark was composed of genes bound in ChIP-seq by a defined transcription factor (TF), in eCLIP-seq by a defined RNA binding protein (RBP), or in RNA-seq as genes differentially expressed after small interfering or hairpin RNA-mediated knockdown of a specified target (gKD). For each TF, RBP or gKD, at least two genesets from distinct cell lines (e.g. K562 and HepG2) or species (e.g. human and mouse) were represented. Importantly, Benchmark was designed to simulate a common scenario where the investigated genesets (i.e. IGSs) are derived from different sources (e.g. cell lines, species, conditions) than established biological pathways (i.e. TGSs). To achieve this, Benchmark was created such that genesets from the same TF, RBP or gKD but from distinct cell lines are better matched to each other than any other genesets. As an example, the IGS that contains STAT1-bound genes in K562 cells is more closely related to the TGS that contains STAT1-bound genes in HepG2 cells than any other TGS. Fig. 1. Benchmark for evaluating and optimizing existing pathway analysis methods. [92]Fig. 1 [93]Open in a new tab a Schematic representation of Benchmark for each pathway analysis method. For every input geneset (IGSi), the outputs, of pathway enrichment analysis against all target genesets (TGSi) are recorded in the score matrix M. Benchmark is created such that TGSi is the most related geneset to IGSi compared to all other TGSs. Thus, the rank of diagonal entries in M represent how well the correct genesets are scored for each IGS. The best score is 1. b, c Cumulative distribution function (CDF) plots of correct pathway ranks comparing indicated pathway analysis methods on shRNA-seq genesets. Median ranks for correct pathways, the fraction of correct pathways among top 10 reported pathways (precision@10/P@10, and average precision@10/AP@10) are shown. d Plots showing the correct pathway ranks of the indicated methods in shRNA-seq genesets. Boxes span from the first to the third quartile, with the median highlighted, and whiskers covering the minimum to maximum values. The precision statistics of each method at finding the correct pathway amongst reported pathways are reported below each bar. Target genesets in b–d were defined as the top 200 upregulated genes by DESeq2 in K562 cells; input genesets in b–d for default and optimized settings were extracted using signal2noise (default setting of GSEA) and DESeq2 in HepG2 cells, respectively. e Plots showing the performance of multiple methods as increasing percentage of unrelated random genes (“noise”) are introduced to target genesets. This evaluates the robustness of methods to inaccuracies (or noise) in target genesets. Lines and error bars show median +95% confidence interval at each noise level. *p < 0.05; **p < 0.01; ****p < 0.0001 by two-sided Wilcoxon rank sum test corrected for multiple hypothesis. ora! and ora are essentially identical, with the only difference being that they are implemented using different programming languages. Specifically, ora! is implemented in R by egsea, while ora is implemented in Python by PET. Source data are provided as a Source Data file. Consequently, for each IGS, an ideal pathway analysis method should identify the most closely matched (i.e. “correct”) TGS and to rank it as number one compared to all other TGSs. Thus, in Benchmark, the enrichment of TGS[1 to n] is calculated for each IGS using the user’s pathway analysis method of choice to generate a score matrix, from which the rank of correct pathways can be determined for that method (Fig. [94]1a). Thus, by evaluating how accurately pathway analysis methods rank matched IGSs and TGSs compared to all other TGSs, Benchmark can assess the performance of different methods in identifying and ranking correct biological pathways in experimental settings. Using Benchmark, we first evaluated the performance of different pathway analysis tools, namely, decoupler^[95]11, piano^[96]12, egsea^[97]6, ssgsea^[98]4, camera^[99]13, ora^[100]6, enrichment browser^[101]14 (e.browser), zscore^[102]15, gsva^[103]16, padog^[104]17, roast^[105]18, plage^[106]19, gage^[107]20 and safe^[108]21. We compared three statistics for all pathway analysis methods. First, we determined the median rank of the correct pathway returned by each method. As experimental biologists typically focus on the top 10 reported pathways, we also determined the frequency by which the correct pathway is among the top 10 reported pathways, termed “Precision@10” (P@10). We also adopted another reliable performance metric commonly used in information retrieval, known as Average Precision at 10 (AP@10). AP@10 is the average (mean) of the precision scores at each of the first 10 positions, i.e. P@1, P@2,…, P@10. We tested four different settings for how the genes can be ranked and genesets can be generated in these methods (i.e. ratio of classes or signal2noise or edgeR- or DESeq2-based ranking (Fig. [109]1b, Supplementary Fig. [110]S1a). The genes generated from matched IGSs and TGSs using these settings exhibited a significantly higher degree of overlap compared to randomly generated control IGSs and TGSs (Supplementary Fig. [111]S1b). The top three methods, decoupler^[112]11, piano^[113]12, egsea^[114]6, notably all ensemble approaches, achieved a median correct pathway rank of 1–8, P@10 of 52–76%, and AP@10 of 44–69%, under the best setting (Fig. [115]1b). This creates a scenario where pathways with significant biological importance are regularly not among the top 10 reported pathways, hindering the utility of these methods for unbiased discovery. We next focused on evaluating the performance of the three most used individual pathway analysis methods ora^[116]6, GSEA^[117]4 and Enrichr^[118]5 in more detail. We tested the four different settings described above (Supplementary Fig. [119]S1c) as well as 3 different options for sizes of genesets (i.e. 200, 500 and 1000 genes) (Supplementary Fig. [120]S1d). Specifically, under the default settings (i.e. signal2noise ranking), the median ranks of the correct pathway (7–14) and P@10 (45–54%) values were similar to the top three performing methods above (Fig. [121]1c, Supplementary Fig. [122]S1c), indicating that ora, GSEA and Enrichr are not ideal for unbiased pathway discovery under default settings. We found that optimization of input parameters did significantly improve pathway discovery - a geneset size of 200 genes was near optimal for both GSEA and Enrichr (Supplementary Fig. [123]S1d) and using DESeq2-ranked lists of differentially expressed genes returned the correct pathway at a median rank of 3 and 63–68% of the time among the top 10 reported pathways (Fig. [124]1c, Supplementary Fig. [125]S1c). Similar improvement could be achieved when selecting the optimal binding intensity statistics in ChIP-seq or eCLIP-seq. For example, using the signal value of binding in ChIP-seq data to rank genes and create genesets usually leads to an improved pathway discovery performance compared to utilizing adjusted p-values (or q-values) of binding signals (Supplementary Fig. [126]S1e). It is worth noting that the overlap size of matched IGSs and TGSs generated from ChIP-seq and eCLIP-seq data was significantly larger compared to that of shRNA-seq data (Supplementary Fig. [127]S1f), which could explain the overall trend of better performance observed in pathway analysis methods on these genesets. Collectively, these data indicate that Benchmark can be used for evaluating existing and future pathway analysis methods for discovery. Notably, although the most widely used individual methods can be optimized, they remain sub-optimal for unbiased pathway discovery, and ensemble approaches typically outperform these individual methods. Pathway Ensemble Tool (PET) enables unbiased pathway discovery We next sought to develop a high precision method to enable unbiased systematic pathway discovery. Leveraging the accuracy and robustness of ensemble approaches, we created Pathway Ensemble Tool (PET). PET works by calculating the unweighted mean of the rank statistics from GSEA, ora, and Enrichr, all operating under the optimal settings determined above (see Methods). We tested the accuracy of PET using Benchmark. PET significantly improved the median rank of the correct genesets to 1, indicating that PET tends to rank the correct pathway as number 1 (Fig. [128]1d). PET also significantly enhanced P@10 and AP@10 statistics, with 73–82% of the time the correct pathways being among the top 10 pathways reported by PET (Fig. [129]1d). Pathway analysis tools are expected to conform to the false discovery rate (FDR) estimates, meaning that at 5% FDR levels, 5% of reported pathways are expected to be false. To determine the actual false discovery rates of pathway analysis methods, we randomly sampled and divided replicated RNA-seq datasets into two groups and ran pathway analysis methods comparing the two groups for enrichment of biological pathways (see Methods). Since the samples were randomly selected biological replicates, any reported pathways significantly different between the groups represent false discoveries. Surprisingly, we found that some of the pathway analysis methods had much higher than the expected false discovery rate, in some cases as high as 25% (Supplementary Fig. [130]S1g, h). In fact, specific pathways, such as the KEGG Ribosome pathway, were common false discoveries (Supplementary Fig. [131]S1i and Supplementary Data [132]1b), suggesting that caution should be exercised when these pathways are identified as significant during unbiased discovery approaches. However, we found that PET operated at the expected 5% false discovery rate (Supplementary Fig. [133]S1g, h). To further validate PET reported results and ensure its ability to discern pathways associated with alterations that drive differences in transcriptional programs within groups, we applied PET to compare fibroblast growth factor receptor 3 (FGFR3) mutant vs. wild-type (WT) bladder cancer cell lines, estrogen receptor (ER)^+ vs. ER^– breast cancer cell lines, and epidermal growth factor receptor (EGFR) mutant vs. WT lung cancer cell lines, available in CCLE^[134]22 (Supplementary Data [135]1c), assessing all canonical pathways. In all three comparisons, the top reported pathways were supported by the literature (Supplementary Fig. [136]S2). For instance, the top two enriched pathways in ER^+ breast cancer cell lines were “Estrogen dependent gene expression” and “Estrogen receptors (ESR)-mediated signaling”, which corresponds with the established understanding of estrogen receptor signaling in ER^+ breast cancer^[137]23,[138]24. In EGFR mutant lung cancer cell lines, the top two enriched pathways were “Surfactant metabolism” and “Integrin5 pathway”, which are both recognized for their cross-talk with EGF signaling in lung cancer^[139]25–[140]29. Moreover, the “MYC activity pathway” and its regulated “Ribosomal RNA processing” pathway were among the top enriched pathways in FGFR3 mutant bladder cancer cell lines, aligning with known interactions between FGFR3 and MYC in bladder cancer, as supported by existing literature^[141]30–[142]34 (Supplementary Fig. [143]S2). These data demonstrate that PET is able to discern subtle yet significant variations in pathway activities across different genetic backgrounds. Because most pathways are defined under real experimental conditions, there are multiple variables, such as cell type, species and stimulation, that contribute a degree of differences or inaccuracies in genesets. This “noise” manifests as missing or unrelated genes among matched IGSs and TGSs. To test the robustness of common pathway analysis tools, we evaluated their performance under conditions of simulated noise by replacing genes from the target genesets with increasing proportions of unrelated genes selected randomly from the genome. We found most available pathway analysis methods to be highly sensitive to such differences as their performance significantly dropped after >30% of unrelated random genes were introduced into existing genesets (Fig. [144]1e). Importantly, however, PET was highly robust to even as high as 60% added noise (Fig. [145]1e). Collectively, these results indicate that PET is robust, has controlled false discovery rate and high precision, thus can enable reliable and systematic identification of key dysregulated pathways in omics datasets. PET identifies prognostic pathways in 12 distinct cancer types We next sought to deploy PET for unbiased pathway discovery in cancer. We selected 12 distinct cancer types from the cancer genome atlas (TCGA) that had cancer tissue RNA-seq data from the earliest stages of disease (based on the pathological stage or primary tumor size/extent) in at least 20 subjects that had unfavorable prognosis (i.e. succumbed to death) during the follow-up period (Fig. [146]2a; Supplementary Data [147]2a, b). We used PET to identify the pathways enriched in samples from patients with favorable (survivors) or unfavorable (deceased) prognosis for each cancer type, since these pathways could act as potential biomarkers of outcomes. To ensure that age, sex and cancer stage were not confounders, patients with favorable prognosis were randomly matched with those with unfavorable prognosis (see Methods). We confirmed that the groups were matched in age and sex and that age and sex were not determinants of outcome (Supplementary Fig. [148]S3). Since tumor infiltrating lymphocytes, classically depicted by CD8-expression, are known determinants of outcome, we also noted that CD8 expression did not differ between subjects with favorable and unfavorable outcomes and that CD8 expression could not distinguish survivors from deceased subjects (Supplementary Fig. [149]S4a). We next examined >1000 canonical biological pathways curated in MSigDB^[150]4 that could potentially differentiate cancer patients that survived from those that did not using PET. Out of these pathways, PET identified tens of biological pathways significantly associated with favorable or unfavorable prognosis for each cancer type (Fig. [151]2b, Supplementary Data [152]2c). Examples of the top favorable and unfavorable prognostic pathways in each cancer types are shown in Supplementary Fig. [153]S4b and Fig. [154]2c, respectively. As proof-of-principle, we validated the reported pathways associated with unfavorable outcome in bladder cancer in three independent cohorts^[155]35–[156]37 (Supplementary Fig. [157]S4c). Some pathways were significantly dysregulated in multiple different cancers (e.g. cell cycle in bladder carcinoma (BLCA), cervical squamous cell carcinoma (CESC), kidney renal papillary cell carcinoma (KIRP) and pancreatic adenocarcinoma (PAAD)) whereas others were cancer-specific (e.g. glycerolipid metabolism in breast invasive carcinoma (BRCA) and arginine and proline metabolism in KIRP. Some of these pathways have been shown to have biological significance in previous studies but have not yet been formally recognized as prognostic markers. For example, we identified triacylglyceride synthesis as a significant poor prognostic pathway in breast carcinoma (Fig. [158]2c). This is consistent with fatty acid synthesis as a requisite for breast cancer metastasis to brain^[159]38 and association of upregulated cellular triacylglycerol with enhanced survival of human breast cancer cells^[160]39. Another identified poor prognostic pathway that was shared between multiple cancer types including bladder and pancreatic cancers was extracellular matrix (ECM) organization (Fig. [161]2c). This is in line with prior knowledge about the role of ECM components, such as laminin, collagen and fibronectins in invasion, progression, and metastasis of bladder and pancreatic cancers and restriction of antitumor drug delivery^[162]40,[163]41. Similarly, we found that Vitamin D (VitD) receptor pathway to be a strong favorable prognostic pathway in stomach cancer (Supplementary Fig. [164]S4b). This is congruent with the fact that in gastric cancer patients, VitD levels serve as an independent prognostic factor and VitD supplementation can reduce the risk of developing gastric cancer^[165]42,[166]43. Fig. 2. PET identifies pathways and gene combinations associated with unfavorable prognosis. [167]Fig. 2 [168]Open in a new tab a Cancers used in this study, along with tissue type, pathological stage (roman numerals) or pathological primary tumor TNM T score (e.g. T1, T2, T3) and number of samples. BLCA bladder carcinoma, BRCA Breast invasive carcinoma, CESC Cervical squamous cell carcinoma, COAD Colon adenocarcinoma, HNSC Head and Neck squamous cell carcinoma, KIRC Kidney renal clear-cell carcinoma, KIRP Kidney renal papillary cell carcinoma, LIHC Liver hepatocellular carcinoma, LAUD Lung adenocarcinoma, LUSC Lung squamous cell carcinoma, PAAD Pancreatic adenocarcinoma, STAD Stomach adenocarcinoma. To adjust for confounders, samples from later stages of diseases have also been included in cancers denoted by #. b Plot showing the number of significantly enriched pathways identified using PET among >1000 MSigDB canonical pathways in patients with favorable (blue) or unfavorable (red) prognosis in the indicated cancers. c Top pathways from each cancer associated with unfavorable survival. FDR values are from PET. d Area under the curve (AUC) distribution of indicated sets used to predict deceased status in each cancer type. Leading genes were selected from top 20 significant pathways associated with unfavorable prognosis. DEGs, differentially expressed genes; error bars show mean ± s.d. AUCs of leading gene combinations are higher than all genes and the AUCs of 5-gene combinations are higher than individual leading genes (Wilcoxon rank sum test p-value < 1e−9) (Supplementary Data [169]2d). e Elbow plots showing significant AUCs (low to high) from all 1–5 leading gene combinations. Statistics were based on FDR-adjusted logrank test of Kaplan–Meier (KM) survival curves. BRCA and LIHC did not yield any significant unfavorable predictors. KM plots showing of the top combination biomarker (darker lines in f), the top DEG (lighter lines in f) of unfavorable survival, PAI-1 expression (orange lines in g) and uPA expression (blue lines in g) to stratify overall survival. Samples in f and g were evenly split into two groups according to z-score/expression values of the indicated biomarker. Hazard ratio (HR) and logrank test p-values are reported. *p < 0.01; **p < 0.001; ***p < 0.0001. See Supplementary Fig. [170]S4 for information on favorable biomarkers. Source data are provided as a Source Data file. PET-identified biological pathways guide combination biomarker discovery We next explored the genes in significant prognostic pathways to identify a subset of genes with high prognostic power that might have clinical utility. Single gene expression values have provided some level of prognostic power for different cancers^[171]44,[172]45. However, expression of multiple biomarkers are consistently more accurate diagnostic/prognostic tools to stratify patients and improve cancer management^[173]46,[174]47. This has fueled a growth in studies identifying gene signatures associated with response to therapy and/or survival. However, these signatures are typically derived from lists of differentially expressed genes and characteristically consist of hundreds of genes, making clinical application of these large signatures challenging^[175]48. The alternative brute force strategy, which involves iteratively uncovering combinations of genes with prognostic power among ~20,000 protein-coding genes in the genome, is computationally challenging (or impossible) and statistically unfeasible. For example, an exhaustive search space for a 3-gene combination has 8 trillion permutations, which is statistically unfeasible given the number of cancer samples available for study. We postulated that limiting the search space of combinatorial biomarker discovery to leading genes from prognostic pathways might alleviate this problem. We therefore selected up to 20 of the most significant prognostic pathways reported by PET and extracted their leading genes from the corresponding GSEA results. We then generated all possible 1–5 gene combinations from the leading genes and quantified prognostic power by calculating the area under the curve (AUC) statistic of the receiver operating characteristic (ROC) curve for the ability of a gene or gene combinations to distinguish survivors from deceased subjects (see Methods). Combinations of 3–5 leading genes from prognostic pathways were found to be the best predictor of prognosis. Notably, none of the individual gene biomarkers displayed a higher AUC than the combinations identified. These are shown in Fig. [176]2d, e, Supplementary Fig. [177]S4d, e and Supplementary Data [178]2d, e, together with their corresponding overall survival plots (Fig. [179]2f, Supplementary Fig. [180]S4f). As an example, high expression of two collagen genes (COL5A1, COL3A1) and vitronectin (VTN) provides a significantly unfavorable overall survival outcome (<20% 10-year survival; HR 9.5; p < 0.0001) in renal papillary cancers (KIRP; Fig. [181]2e, f). The ability of this 3-gene combination to distinguish survivors from non-survivors over a prolonged period of follow-up is superior to any other biomarkers of renal cancer in current use^[182]49. Previous studies have linked the expression of individual collagen genes such as COL5A1 and COL23A1 to renal cancer^[183]50,[184]51. Collagen expression has also been associated with tumor grade^[185]52 and plays a key role in promoting tumor cell invasion^[186]53 in renal cancers. Furthermore, vitronectin is linked to inducing the differentiation of cancer stem cells and the formation of tumors^[187]54. Thus, it is biologically conceivable that high expression of this 3-gene combination biomarker provides the observed dramatically unfavorable overall survival outcome, surpassing the prognostic power of individual collagen genes alone. Moreover, the combination biomarkers outperformed any individual gene including differentially expressed genes (DEGs; Supplementary Data [188]2f) in all cancers, and no individual gene was found to be the most effective on its own (Fig. [189]2d, e, Supplementary Fig. [190]S4d, e). Illustrative examples are indicated in Fig. [191]2f and Supplementary Fig. [192]S4f, showing Kaplan–Meier overall survival curves of the top combination biomarker compared to the top DEG in each cancer type. As further illustration, we compared the performance of the PET-derived combination biomarkers (Fig. [193]2f) to that of PAI-1 (Plasminogen activator inhibitor 1) and uPA (Urokinase plasminogen activator) (Fig. [194]2g), because these two genes are frequently investigated as potential cancer prognostic markers^[195]55. PAI-1 is associated with poor prognosis in some cancer types, such as breast, colon, and lung cancer. uPA is a protease involved in the degradation of the extracellular matrix, and linked to tumor invasiveness, angiogenesis, and poor prognosis in some cancers, including breast, prostate, and ovarian cancers. We found that although elevated expression of PAI-1 in CESC, LUSC, PAAD, and STAD, and uPA in PAAD may have prognostic value (Fig. [196]2g), they are both inferior to the identified top combination biomarkers (Fig. [197]2f). Moreover, the identified combination biomarkers accurately predicted outcomes across independent cohorts of patients. For example, >70% of biomarkers in TCGA cohorts exhibited consistent favorable or unfavorable overall survival outcomes in several independent cohorts of bladder^[198]35–[199]37, breast^[200]56, pancreatic^[201]57, and stomach^[202]58,[203]59 cancers (Supplementary Fig. [204]S4g; Supplementary Data [205]2g). We also investigated the percentage of combination biomarkers that consistently exhibited predictive power in later-stage samples. We found that in certain cancer types (e.g. HNSC, PAAD), the early-stage biomarkers consistently predict outcomes even in later stages of cancer. However, in other cancer types (e.g. BLCA, KIRP), only a small fraction of the early-stage biomarkers demonstrate consistency in their predictive ability (Supplementary Data [206]2h, i). Mutations play an important role as biomarkers in clinical settings, therefore we sought to compare the common cancer mutations versus gene expression-based biomarkers for their power in determining overall survival outcome. To this end, we focused on the three most frequently mutated genes in cancer samples, BRAF, TP53 and KRAS, that often confer poor prognosis^[207]60–[208]62. For each TCGA cancer type with sufficient number of samples exhibiting mutations (n > 15), we split the samples into two groups based on their mutational status: wild-type samples without the mutation and samples containing mutant oncogenes. To assess whether these mutations could serve as indicators of survival, we conducted log-rank tests between the wild-type (WT) samples and mutated (mut) samples for each cancer type. Among all the cancer types examined, pancreatic cancer (PAAD) was the only one in which the presence of KRAS mutation served as a significant indicator of poorer overall survival. (Hazard ratio = 1.8; logrank p-value = 0.02) (Supplementary Fig. [209]S4h). Nevertheless, the KRAS mutation exhibited less discriminatory power compared to gene expression-based biomarkers in distinguishing between favorable (Supplementary Fig. [210]S4f) and unfavorable (Fig. [211]2f) survival outcomes. We also assessed the susceptibility of the identified biomarkers or conventional DEGs and their combinations to overfitting, a commonly encountered pitfall in the search for trustworthy biomarkers that leads to poor performance when tested on different samples. We found that combination biomarkers were significantly less prone to overfitting compared to those derived from conventional DEGs (Supplementary Fig. [212]S4g). This indicates that combination biomarkers have a better balance between complexity and robustness, which makes them more suitable for real-world applications. Collectively these data indicate that combination biomarkers identified using PET have great potential to stratify patient outcomes and perform better than existing known biomarkers. Identifying resilient and vulnerable cancer patients is crucial as it allows healthcare providers to personalize treatment plans, improve prognosis predictions, prioritize resources, and match patients with appropriate clinical trials. This leads to more effective and efficient care, better outcomes, and improved quality of life for patients. We thus considered whether combining the top biomarkers of favorable with the top biomarkers of unfavorable outcome would provide added prognostication power. To do this, we divided the samples into two groups based on expression of the top favorable combination biomarker and two groups based on expression of the top unfavorable combination biomarker. By overlapping these groups, patients can be further divided into subgroups according to simultaneous low and high expressions of the top biomarkers of favorable and unfavorable survival. The resilient category was defined as having high levels of the top favorable combination biomarker and low levels of the top unfavorable combination biomarker, while the vulnerable category was defined as having low levels of the top favorable combination biomarker and high levels of the top unfavorable combination biomarker. Indeed, this approach identified highly resilient and highly vulnerable cancer patients (Fig. [213]3a, Supplementary Fig. [214]S5a, Supplementary Data [215]3a). For example, resilient patients with papillary or clear cell kidney cancer (KIRP and KIRC) had nearly 90–100% survival, while vulnerable patients had only 10–20% overall survival over a prolonged period of follow-up (~4000 days) (Fig. [216]3b). This prognostication strategy also corroborated with established independent prognostic biomarkers in these patients^[217]63. Specifically, vulnerable patients have more than two-fold higher frequency of repressed CDKN2A expression via copy number loss or DNA methylation when compared to resilient patients (Fig. [218]3c, Supplementary Data [219]3b). The same approach in other cancers also yielded added benefit, because in almost all cancers combining the top biomarker of favorable and the top biomarker of unfavorable outcome provided highly accurate prognostication (Supplementary Fig. [220]S5a, Supplementary Data [221]3a). For instance, the 10-year overall survival rate for resilient patients with bladder cancer is nearly 9 times higher than that of vulnerable patients (HR: 4.6; p < 0.0001; Supplementary Fig. [222]S5a). Such accurate prognostication can have a major influence on clinical decision making and treatment course and can potentially help rationalize less or more aggressive individualized treatment for patients within different vulnerability. Fig. 3. Juxtaposition of favorable and unfavorable prognostic biomarkers offers precise overall survival stratification. Fig. 3 [223]Open in a new tab a Venn diagrams showing the distribution of patients with kidney renal papillary cell carcinoma (KIRP) (left) and kidney renal clear cell carcinoma KIRC (right) based on expression of the top favorable (labeled in Supplementary Fig. [224]S4e) and the top unfavorable (labeled in Fig. [225]2e) combination prognostic markers on RNA-seq from tumor samples. b Kaplan–Meier (KM) overall survival curves for 4 subgroups of KIRP or KIRC patients whose tumors express indicated combination of prognostic biomarkers. The four subgroups are highlighted with a bold font in a. Hazard ratio (HR) and logrank test p-values denote comparisons with the lines in red. ns not significant; *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001. c Plots showing the fraction of KIRP (left) and KIRC (right) tumors which have alterations (methylation or copy number loss) in CDKN2A^[226]63 in each group. Source data are provided as a Source Data file. The molecular subtype of tumors, particularly basal and luminal subtypes, is one of the factors implicated in predicting the outcome of bladder cancer. The basal subtype is associated with muscle invasiveness and worse overall prognosis but is more responsive to platinum-based chemotherapy^[227]64. To verify if identified prognostic markers corroborate with this known prognostic factor, we sourced three independent cohorts of bladder cancer samples^[228]35,[229]64,[230]65 that was annotated with the two major molecular subtypes. Indeed, we found that combination biomarkers of unfavorable prognosis were expressed at high levels in the basal molecular subtype across all three cohorts (Supplementary Fig. [231]S6a, Supplementary Data [232]3c). We had previously demonstrated using Benchmark that PET is highly robust to noise (Fig. [233]1e). To formally test this under real-world settings, we obtained RNA-seq datasets from a different species of mammal. We selected canine bladder cancers that included annotations for luminal and basal subtypes^[234]66 as these sub-types behave similarly to humans and are considered as having good and poor prognosis, respectively. The genes associated with poor prognosis in human bladder cancer effectively differentiated luminal and basal subtypes of canine bladder cancer as well (Supplementary Fig. [235]S6b), demonstrating the robustness of these biomarkers even across different species. Collectively, these data indicate that our biomarker discovery approach can identify combination prognostic biomarkers with significant clinical value and to stratify molecular subtypes of cancer with differential prognosis. PET-identified biological pathways guide drug repurposing Since prognostic markers are derived from dysregulated pathways, we postulated that manipulating gene expression to promote the biomarkers of favorable prognosis and demote the biomarkers of poor prognosis by re-purposing known drugs might provide clinical benefit. Major consortium efforts have created a valuable library of integrated cellular signatures (LINC) that includes cellular transcriptional responses to large numbers of pharmacologic perturbagens^[236]67. Similar signatures have also been derived from mining large numbers of publicly available sequencing read archive (SRA) data^[237]68. Recently we showed that computational screening for drugs that normalize expression of genes in highly prognostic and pathogenic pathways using cellular signatures of those drugs provides significant therapeutic potential^[238]69,[239]70. Thus, we computationally screened >1600 drugs (mostly FDA-approved) for which the up- or down-regulated drug targets are known^[240]68 (Supplementary Data [241]4a). Drugs were ranked by their ability to suppress unfavorable and to promote favorable prognostic biomarkers. The rank was calculated based on the significance of overlap between genes up/down regulated by a drug and genes from favorable/unfavorable prognostic pathways in cancer (Fig. [242]4a; see Methods). This in silico screening predicted several drugs already in clinical use for each cancer type, as well as many drugs that have not yet been tested (Fig. [243]4b and Supplementary Data [244]4b). Key interactions between highly significant unfavorable prognostic pathways and drugs that can normalize these are summarized for each cancer type in Fig. [245]4c. By way of validation, the top predictions included drugs currently in clinical practice or under trial for some of these cancers. For example, doxorubicin, a commonly used chemotherapy agent for the management of metastatic breast cancer^[246]71,[247]72, was the second top prediction for breast cancer. Celastrol, the top prediction for colon cancer, was recently shown to inhibit proliferation of colorectal cancer cells and migration via the PI3K/AKT pathway^[248]73. Ruxolitinib, the top prediction for liver cancer, is known to have marked tumoricidal effects on hepatocellular carcinoma by inhibiting JAK/STAT signaling^[249]74. Interestingly, irinotecan, a topoisomerase inhibitor, which regulates the cell cycle, was the top prediction for pancreatic cancer (PAAD), a tumor type in which “cell cycle” is among the top pathways associated with unfavorable prognosis (Fig. [250]2c). Last year, the FDA approved investigational use of liposomal irinotecan for first line treatment of pancreatic cancer patients^[251]75–[252]77. Two other pathways strongly associated with unfavorable outcome from pancreatic adenocarcinoma are the Erbb1 (also known as EGFR) and MYC pathways; both of these have previously been associated with worse outcomes from this cancer type^[253]78,[254]79 and are also inhibited by irinotecan^[255]80. These data indicate that matching drug signatures and prognostic markers can be used for repurposing existing therapeutics. Fig. 4. PET-derived prognostic pathways guide drug repurposing. [256]Fig. 4 [257]Open in a new tab a Schematic of drug screening aims; computational screen searches for drugs that upregulate expressions of genes from pathways with favorable prognosis and downregulate expressions of genes from pathways with unfavorable prognosis. b Elbow plots showing the significance of overlap between drug up (or down) regulated genes with leading genes from pathways with favorable (or unfavorable) prognostic potential. Drugs are sorted based on the significance (see Methods). Top two drugs for each cancer type are highlighted. c Sankey plot showing top pathways associated with unfavorable prognosis for each cancer type on the left and top two predicted drugs that demote expression of genes in indicated pathway on the right. The thicknesses of links are proportional to the significance of the pathway (E-value > 5; left; see Methods) or the number of genes affected by drug (right). Only drugs that have a significant impact on the top prognosis pathways for each cancer are displayed. As a result, cancers without drugs affecting their top predicted prognosis pathways are not displayed. Source data are provided as a Source Data file. The CDK2/9 inhibition restricts cervical and bladder cells in vitro and/or in vivo We next sought to test the efficacy of one of our predicted drugs. The top predicted drug for bladder and cervical cancer was a cyclin-dependent kinase (CDK)2/9 inhibitor named 0175029 (Fig. [258]4b). This was somewhat surprising because inhibitors of other CDKs, specifically CKD4/6 inhibitors, have previously been proposed as monotherapy for bladder cancer. The use of CDK4/6 inhibitors have been justified because elevated expressions of either CDK4 or CDK6 are associated with poor survival in bladder cancer, whereas high expression of CDK2/9 have been associated with better survival (Supplementary Fig. [259]S7a). However, CDK4/6 inhibitors including Palbociclib did not meet the primary endpoint during phase II clinical trials for bladder cancer^[260]81–[261]84. Moreover, taking the conventional DEG-based approach, where one normalizes the most differentially expressed genes between cancer survivors and non-survivors, does not identity CDK2/9 inhibitor as a significant lead in either bladder or cervical cancers (Supplementary Fig. [262]S7b, Supplementary Data [263]4c). However, it is worth noting that although (CDK)2/9 inhibitor emerges as the top prediction for bladder cancer using DEG-based approach, it does not reach the statistical significance threshold (Supplementary Fig. [264]S7b). We therefore scrutinized whether PET-based drug prediction specifically advocates CDK2/9 or CDK4/6 inhibition in bladder cancer. Since, the drug database used for screening above^[265]68 did not include any CDK4/6 inhibitors, we computationally screened an additional 5425 compounds from the LINC1000 database^[266]85 that included the CDK4/6 inhibitor Palbociclib. PET-based drug predictions did not advocate Palbociclib (p-value = 1; rank 1601) but returned 0175029, a CDK2/9 inhibitor as the number 1 scoring drug that can significantly normalize genes of prognostic pathways (Supplementary Fig. [267]S7c, Supplementary Data [268]4d). For experimental validation, we then tested the ability of a panel of CDK inhibitors to restrict the growth of bladder and cervical cancers in vitro, including Palbociclib (a CDK4/6 inhibitor), GW8510 (a CDK2 inhibitor), and Seliciclib (a CDK2/7/9 inhibitor) (Supplementary Fig. [269]S7d). Because 0175029 is not commercially available we used its recently identified analog CCT068127 which is also a potent CDK2/9 inhibitor^[270]86. Consistent with PET-based drug prediction, CCT068127, but not Palbociclib, significantly inhibited the growth of both cervical and bladder cancer cell lines, whereas the others either did not, or had minimal effect (Fig. [271]5a). We also found that in two out of three independent cohorts of bladder cancer samples, the expression of CDK9 (but not CDK2) was higher in the basal subtypes of bladder cancers compared to luminal subtypes (Supplementary Fig. [272]S7e), suggesting that CDK9 inhibition might be more beneficial for the basal subtype of bladder cancer. CDK9 expression also showed a significant correlation (Pearson r = 0.71; p-value = 0.006) with unfavorability scores across bladder cancer cell lines available in the CCLE database^[273]22 (see Fig. [274]5b and Supplementary Data [275]5a). The unfavorability score for each sample was calculated as the mean (z-score) expression of genes associated with an unfavorable outcome (Supplementary Data [276]2e). To expand on this, we referenced publicly available data from the GDSC database^[277]87 where a range of bladder cancer cell lines were treated with a different CDK9 specific inhibitor (i.e. CDK9_5038) and IC50 values were calculated. Despite the difference in inhibitors, we observed a trend where cell lines with higher unfavorability scores required higher concentrations of the inhibitor to achieve similar inhibition of proliferation (Pearson r = 0.54, p < 0.06) (Fig. [278]5b). These associations suggest a broader efficacy of CDK9 inhibitors across bladder cancer cell lines, with efficacy correlating inversely with unfavorability scores. Fig. 5. Predicted CDK2/9 inhibition restricts the growth of cervical and bladder cancers in vitro and in vivo. [279]Fig. 5 [280]Open in a new tab a Plots showing the survival rate, measured by CCK-8 assay, of cervical (Hela) and bladder (T24 and UMUC3) cancer cell lines treated with different concentration of vehicle or indicated drugs for 72 h. The selective targets of CDK inhibitors are shown in parenthesis. Mean +/− sem from n > 3 biological replicates. b Plot showing expression of the CDK9 gene versus the unfavorability score for each bladder cancer cell line in the CCLE database^[281]22. The color scale represents the IC50 values reported by the GDSC database^[282]87 for the CDK9-specific inhibitor CDK9_5038. Cell lines for which IC50 values are not available are highlighted in gray. The Spearman correlation value and corresponding p-value are shown in red. The center line and error bars indicate the trend line and 95% confidence interval. Selected cell lines are labeled. c Heatmap showing differentially expressed genes (DEGs; 3-fold at adjusted p-value < 0.05) between DMSO or CCT068127 (1uM) treated T24 cells for 48 h in n = 3 independent biological replicates. GSEA plots showing enrichment of genes repressed by the CDK4/6 inhibitor Palbociclib (top) or genes repressed by the CDK9 inhibitor VIP152 (bottom) in T24 cells treated 48 h with the CDK2/9 inhibitor CCT068127 (1 uM) or DMSO control. Palbociclib (SRP404373) or VIP152^[283]101 inhibited genes are publicly sourced. d GSEA plot showing enrichment of transcriptomes of CCT068127 (1 uM) or DMSO treated T24 cells in favorable (top) or unfavorable (bottom) PET identified prognosis genes in bladder cancer. Shown are the FDR value and normalized enrichment score (NES). e Schematic of xenograft experiment and treatment plan with 30 mg/kg daily i.p. injections of CCT068127 or DMSO carrier. p-values compare DMSO-treated to CCT068127-treated groups at each time-point. f Plot showing daily tumor volumes during course of experiment. g Bar plots showing the mass of tumors at explant on day 23. Error bars show mean ± s.d. p-values in a and f are by two-way ANOVA with multiple comparisons adjustment and in g by two-sided Wilcoxon rank sum test. ns non-significant; *p < 0.01; **p < 0.001; ***p < 0.0001. Panel e Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license ([284]https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en). Source data are provided as a Source Data file. At present, the first line of treatment for bladder cancer consists of chemotherapy regimens that feature cisplatin. As expected, cisplatin also significantly inhibited the growth of bladder cancer cell lines (Fig. [285]5a). It is worth noting that our approach also predicted that cisplatin desirably normalizes prognostic genes (Supplementary Fig. [286]S7c). To further investigate the impact CCT068127 on bladder cancer cells, we performed RNA-sequencing on T24 bladder cancer cells treated with carrier or CCT068127 (see Methods). 822 genes were found to be up-regulated and 490 genes down-regulated (RPKM > 2, Fold change ≥ ±3 at adjusted p-value ≤ 0.05; Fig. [287]5c; Supplementary Data [288]5b). CCT068127 significantly reduced the expression of genes associated with CDK9 inhibition, while showing no significant effect on genes associated with CDK4/6 inhibition (Fig. [289]5c). To test whether CCT068127 can normalize the predicted prognostic signature, we gathered all the genes linked to favorable or unfavorable prognosis from the prognostic pathways identified by PET and conducted a gene set enrichment analysis. As anticipated from the drug predictions, genes that were upregulated by CCT068127 were significantly enriched in those with PET-predicted favorable prognosis. Conversely, genes that were downregulated by CCT068127 were significantly enriched in in those with PET-predicted unfavorable prognosis (Fig. [290]5c, d). The up-regulated genes included PRDX4 and DIO3, which were determined by PET to be key markers of a favorable prognosis (Supplementary Fig. [291]S4e). Conversely, the down-regulated genes including XAF1, KPNB1 and PAS1 which were found to be significant markers of an unfavorable prognosis (Fig. [292]2e). These data corroborated the expectation that the top predicted drug normalized identified prognostic genes. We next tested the in vivo efficacy of CCT068127 in mice. Pharmacokinetic studies identified daily administration of up to 30 mg/kg of CCT068127 for two weeks to be well-tolerated by mice, as evidenced by stable weight during that time (Supplementary Fig. [293]S7e) and lack of gross pathology on necropsy. We then implanted bladder cancer cells subcutaneously in immunodeficient mice and after two weeks, when tumors were approximately 100 mm^3 in size, randomly assigned mice to control or treatment groups, injecting them with 30 mg/kg/day of DMSO or CCT068127 i.p. for 10 days, respectively (Fig. [294]5e). CCT068127-treated mice demonstrated a significant reduction in both tumor volumes and weights (Fig. [295]5f, g), indicating that CCT068127 was effective at restricting tumor growth in vivo. Taken together, these data indicate that, despite being unexpected from traditional methods, CDK9 inhibition predicated on favorable and unfavorable prognostic biomarkers suppress tumor cell growth both in vitro and in vivo. Furthermore, they corroborate the notion that a PET-based approach can guide repurposing drugs that prove effective in the treatment of cancer. Discussion The contemporary investigation of the underlying mechanisms in cancer relies disproportionately on obtaining genome-wide omics data. Biological pathway analysis is a crucial tool in interpreting these data. As such, several pathway analysis methods have been developed, with the implicit assumption that they identify and prioritize the most important pathways among their top-ranked outputs (reviewed here^[296]2,[297]3). Indeed, these methods have been successful in confirming some of the speculated and/or dysregulated pathways. Many of these methods have also been tested on simulated data^[298]6 or anecdotally verified in small scale biological datasets^[299]2,[300]8,[301]9. However, the performance and limitations of these methods for an unbiased discovery of related biological pathways has been largely unknown. Here, we developed a platform named Benchmark to explicitly measure performance of pathway analysis tools for discovery under experimental settings. It is worth noting that Benchmark has several key advantageous features, making it superior to existing gold-standard evaluation platforms^[302]7. First, IGSs and TGSs that are used for evaluation in Benchmark are derived from large numbers (~1000) of experimental datasets, where expected pathways are well-defined. This is in clear contrast to the few existing evaluation platforms that utilize small experimental datasets^[303]2,[304]7, computer-simulated pathways^[305]88, and/or subjective measures for defining pathways^[306]2. Second, since IGSs and TGSs in Benchmark are derived from experimental data, they eliminate systematic biases introduced by computer simulations. For example, IGSs and TGSs in Benchmark are free from biases introduced due to the assumption by most simulations that genes within each pathway are independent of each other. Third, IGSs and TGSs recapitulate a realistic setting where genesets under investigation (i.e., IGSs) are often derived under conditions (e.g., cell lines, species, conditions) that are different from those of known biological pathways (TGSs). Fourth, because Benchmark uses experimental data, it contains noise inherent to biological data, therefore provides a platform for a faithful readout of the performances of pathway analysis tools under real-life settings. Lastly, we present two evaluation measures, namely P@10 and AP@10, which accurately reflect the practical application of pathway analysis methods for biologists. Our findings from Benchmark show that while common pathway analysis methods are effective for confirming speculated pathways, they are not optimal for unbiased and systematic discovery of the most disrupted pathways. Our study found that PET, which integrates the ranking statistics of several pathway analysis tools, is significantly more effective for discovery and has high accuracy and noise resistance. It is worth noting that ensemble approaches, such as decoupler^[307]11, piano^[308]12, and egsea^[309]89, have been previously developed for pathway analysis and have generally resulted in improved outcomes compared to single method approaches. In our evaluation using Benchmark, we found that these ensemble methods, when employed with default settings and when the underlying methods and input parameters were chosen ‘blindly’ without careful consideration, either performed marginally better or were comparable to their underlying methods. Our results demonstrated that optimizing the input parameters of individual pathway analysis methods not only significantly improves their performance but also enhances the results of the ensemble approach overall. Here we have optimized and combined results from the three most common pathway analysis methods. However, additional research is required to assess how to effectively integrate statistics from other pathway analysis methods to improve pathway discovery. The number of replicates in each group is an important consideration in controlling for false discovery rates^[310]90. Our Benchmark is generally constructed from 2 replicates per group available in ENCODE^[311]10. However, the average number of overlapping genes among related genesets (n = 16) compared to unrelated genesets (n = 1.3) across cell lines, obtained from 2 replicates (Supplementary Fig. [312]S1b, f), suggests that this sample size is sufficient for accurately distinguishing related from unrelated genesets. Future studies will extend the Benchmark to include additional genesets and replicates, and to evaluate the suitability of pathway analysis methods for increasingly available single cell data. We next applied PET to systematically determine pathways associated with prognosis in 12 different well-characterized cancer types. The predicted pathways from PET were correlated with prognostic features hypothesized from biological experiments and were confirmed in independent datasets. We further leveraged genes from the prognostic pathways to identify single gene and combination prognostic biomarkers. It is important to note that exhaustive genome-wide searches to identify combinations of a few genes as biomarkers (even as few as 3) is computationally and statistically unfeasible. Thus, many studies extract large gene signatures (>50 genes) associated with favorable or unfavorable prognosis by identifying differentially expressed genes (DEGs). Although useful, these signature sets are challenging to apply in a clinical setting^[313]48. Our data indicate that a search space of only 3–5 leading edge genes from prognostic pathways yeilds gene combinations with superior prognostic power when compared to single genes or DEGs. Importanly, combining leading biomarkers of favorable and unfavorablle prognoses resulted in the categorization of patients into highly resilient and highly vulnerable groups, providing a guidance for determining the level of treatment intensity. Moreover, our data show that genes from prognostic pathways distingush known molecular subtyes of cancer with high precision. Thus it is likely that these genes will also identify previously unknown molecular subtypes, but further investigations are needed to provide proof. It is important to recognize that the selection of samples can influence the identification of prognostic pathways and biomarkers. In our analyses, we classified patients with unfavorable outcomes as those in early cancer stages who passed away, and age and sex matched patients with favorable outcomes as those in later stages who were marked as alive in the TCGA cohort. A more precise alternative selection criterion would have been to classify deceased patients with survival times less than the median, and age and sex matched alive individuals with days to last follow-up more than the median of all samples as unfavorable and favorable selections, respectively. Pathways with prognostic potential could provide a means for identification of effective therapeutics. Indeed, our data indicate that a computational search for therapeutics that can normalize the expression of genes from prognostic pathways can be used for drug repurposing. Proof-of-concept in vitro and in vivo validations of the top predicted repurposed drug in bladder and cervical cancer cells indicate the potential of our approach. The prediction of CDK9 but not CDK4/6 by our approach was somewhat unexpected but aligned with the observation that Palbociclib (a CDK4/6 inhibitor) did not meet the primary endpoint during phase II clinical trials in bladder cancer^[314]81–[315]84. Previous studies have indicated that alterations in the CDK4/6 pathway could potentially drive bladder cancer^[316]91–[317]93 but the cell lines and assays used in these studies are slightly different than ours. Consistently, our cell viability measurements, conducted using the CCK-8 assay, also demonstrated a modest yet significant effect of Palbociclib on T24 cells. Further in vivo investigation using patient-derived bladder cancer xenografts (PDX) models are warranted^[318]94, to compare the effects of CDK9 inhibitors and CDK4/6 inhibitors on primary bladder tumors in future studies. Of note, our data driven approach will also provide the mechanistic rationale (i.e., the involved pathways) for identified therapeutics and offer insights into basic mechanisms that can be exploited for drug development. However, whether predicted prognostic pathways contribute directly to the pathophysiology (as opposed to merely representing biomarkers) can only be determined via experimentation. It is also important to note that prognostic biomarkers, which predict patient outcomes, and predictive biomarkers, which determine the efficacy of individual drugs, are often distinct from each other. With increasing omics data from responders and non-responders, future research will be needed to apply pathway enrichment analysis to identify not only prognostic pathways but also predictive pathways. Methods Ethics statement All procedures involving mice were conducted in accordance with the guidelines established in the Guide and Use of Laboratory Animals of the National Institutes of Health, USA. The protocol was approved by the Institutional Animal Care and Use Committee at the Purdue University (PACUC Protocol #1908001941). Animals were housed in Hansen mouse facility at Purdue University. All mice are fed and watered ad libitum, with consistent access to food and water, and housed in a facility maintained at ambient temperature and humidity with 12-h light/12-h dark cycles. All mice were sacrificed prior to the tumors reaching a mean diameter of 2.0 cm or in the event of ulceration with necrosis/infection as outlined in the PACUC policy on “Human endpoint criteria for rodent tumor studies”. Cell culture T-24 and UMUC3, human urinary bladder cancer cells were acquired from ATCC (HTB-4^TM and CRL-1749^TM, respectively). Both the cells were maintained in Roswell Park Memorial Institute medium (RPMI 1640; Gibco) supplemented with 10% fetal bovine serum (FBS; Hyclone Lab, Logan, Utah), 1% penicillin/streptomycin (PS) (Invitrogen, Carlsbad, CA, USA) in humidified chamber with 5% CO[2] at 37 °C. HeLa, human cervical carcinoma cells were acquired from ATCC (CCL-2™) and maintained in complete Eagle’s Minimum Essential Medium (MEM, Gibco) at similar conditions. The cell lines used here were not independently authenticated but were regularly tested for mycoplasma contamination using the Mycoplasma Detection Kit (abm, cat#G238). Cell viability assay CDK inhibitors GW8510 (CDK2), Palbociclib (CDK4/6), Seliciclib (CDK2/7/9), CCT068127 (CDK2/9) were purchased from AOBIOUS, INC (Gloucester, MA, USA) and were dissolved in dimethyl sulfoxide (DMSO) to make stock solutions of 10 mM and stored at −20 °C. Working concentrations, as indicated, were further diluted with either cell culture media or saline. Hela, UMUC3 and T-24 cells (approximately 20,000 per well) were seeded in a 96-well plate 24 h before prior to treatment with various concentrations of the drugs for 72 h at 37 °C. Cell viability was measured using Cell Counting Kit-8 (APExBIO; Houston, USA) reagent following manufacturer’s recommendations. Briefly, 10 ul of CCK-8 reagent per 100 ul media was added to each well and kept at 37 °C for 2 h. The absorbance of WST-8 formazan was recorded using a Veritas Microplate Luminometer at 450 nm (BioTec; Epoch2). The viability (%) was calculated by using formula: Cell viability (also called survival rate) (%) = [(Absorbance of tested compound − Absorbance of blank)/(Absorbance of control − Absorbance of blank)] × 100. RNA-sequencing and data analysis T-24 cells (approximately 1 × 10^6 per well) were seeded in a 6 well plate. Next day, cells were treated with either vehicle (DMSO) control or CCT068127 at 1uM concentration for 48 h. Cells were stored with TRI Reagent (ZYMO Research International, Tustin, CA) at −80 °C. Genomic RNA was extracted from cells using a Direct-zol RNA MiniPrep Plus (ZYMO Research International; Tustin, CA). An Agilent 4200 TapeStation system was used to determine quality and quantity of RNA. Samples having RNA integrity number (RIN) value above 8.5 were used for library preparation. 500 ng total RNA for each sample was subjected to NEBNext Poly(A) mRNA Magnetic Isolation (E7490) and resulting mRNA libraries were prepared for RNA-seq by NEB Ultra II kit (New England Biolabs) following manufacturer’s protocols and sequenced by NovaSeq (Illumina). All sequenced reads were aligned to hg38 genome using RSEM^[319]95 with parameters “rsem-calculate-expression --bowtie-n 1 --bowtie-m 100 --seed-length 28 --bowtie-chunkmbs 1000”. Differential expression analysis was performed using edgeR. Average RPKM ≥2 in at least one condition, Fold change ≥3 and adjusted p-value ≤0.05 was used to determine DEGs. The TCGA data analysis used read count, normalized expression profiles, FPKM, and clinical information from samples of 12 cancer types (as shown in Fig. [320]2a). Stage selection was based on ajcc_pathologic_stage (NCI Thesaurus Code: [321]C28257) and ajcc_pathologic_t (NCI Thesaurus Code: [322]C48881 and [323]C48739) from clinical data. In cases where both indicators were present but contradictory, the indicator that corresponds to an earlier stage of tumor staging is prioritized. To account for age as a potential confounder, all deceased samples and those annotated as alive with an age higher than the first quantile of all deceased patients in the dataset were included. The selected samples were tested to confirm that age and sex is not a significant confounder for all cancers. For canine data analysis, we employed PET to discover prognostic pathways in later stages of human Bladder Urothelial Carcinoma (BLCA) patients (annotated as T3 and T4). The ability to predict outcomes for single genes and combinations of genes from the identified pathways was determined (see below). Finally, the genes with the strongest prognostic abilities were used to distinguish molecular subtypes of canine tumors. Xenograft mouse model The original strain of all mice used in this study were purchased from Jackson Laboratory, Strain# 025216, NOD.Cg-Prkdc^scid H2-K1^b-tm1Bpe H2-Ab1^g7-em1Mvw H2-D1^b-tm1Bpe Il2rg^tm1Wjl/SzJ and were acclimatized and maintained in Hansen mice facility at Purdue University. Due to availability, 10–11 weeks-old female mice were used in all experiments. Tumors were implanted into the flanks of these mice. Briefly, cultured UMUC3 cells were resuspended at 1 × 10^6 cells/100 ul in PBS. Mice were anesthetized with isoflurane. Cells were injected subcutaneously into left and right flank of the mouse using 29G1/2 needle. Animals were monitored and weighed every day. Tumor volumes were measured by using digital caliper and calculated by the formula: Tumor volume = (length × width × height)0.52. On 14^th day post inoculation, mice developed about 100 mm^3 size tumors. Mice were randomized into two groups and treated for the next 10 days with vehicle (DMSO in 100 ul saline) and CCT068127 in saline (30 mg/kg). All treatments were administered by intra-peritoneal injections. Pharmacokinetic study Groups of NSG female mice (n = 4) were injected intraperitoneally every day for two weeks with doses of 10 mg/kg or 20 mg/kg or 30 mg/kg of body weight of CCT068127 in saline. Mice were monitored daily for weight changes and clinical signs of toxicity. After completion of the experiment, mice were euthanized by CO[2] inhalation. A necropsy study was done in a systemic manner to identify any changes in anatomical features due to drug toxicity. We selected a concentration of 30 mg/kg of CCT068127 for further experiment. Benchmark data We curated raw and processed data from ENCODE project^[324]10 listed in Supplementary Data [325]1a. The expected counts and normalized gene expression values, FPKM, and statistics from differential expression analysis were extracted from HepG2 and K562 shRNA-seq datasets. For ChIP-seq, we annotated the processed peak to the nearest gene using HOMER annotatePeaks.pl^[326]96 from A549, CH12.LX, GM12878, HEK293, HepG2, K562 and MEL cell lines. RNA protein binding profiles (i.e. eCLIP-seq) were processed similarly to ChIP-seq for HepG2 and K562. Pathway ensemble tool PET combines the outputs from three underlying pathway analysis methods: GSEA, ora and Enrichr. Reported pathways were ordered by normalized enrichment score (NES), ora p-values by Fisher exact test, Enrichr “combined scores”, respectively, and the ranks were determined. For each pathway, an average rank between the three methods was calculated and a combined p-value was estimated by Stouffer’s Method using scipy.stats.combine_pvalues with method = “stouffer”. Finally, all pathways were ordered based on the average rank. Ora was re-implemented using scipy.stats.fisher_exact function. Enrichr was re-implemented in Python as described in Computing enrichment section of the original manuscript^[327]5 using N = 1000 permutations. GSEA was invoked using different indicated ranking metrics in the manuscript. For “signal2noise” and “ratio of classes”, the normalized expression values were used with GSEA parameters set as -collapse No_Collapse -set_max 500 -set_min 20”. For other modes (e.g. DESeq2, edgeR), a pre-ranked gene list was generated from statistics of differential expression analysis or peak calling results, where -log10(p-value)*sign(log(fold change)) was used as the gene weight. The “prerank” mode parameters were set as -collapse No_Collapse -set_max 500 -set_min 20”. All the other parameters were kept as default. PET implementation in python incorporates the option to use DESeq2^[328]97 or PyDESeq2^[329]98, as well as GSEA^[330]4 or GSEApy^[331]99, providing flexibility in selecting the platform for their analysis. Running ensemble methods To conduct pathway analyses using e.browser^[332]14, we utilized the built-in function deAna (expr_mat, de.method = “DESeq2”) to make the differential gene expression analyses consistent with PET. Every underlying pathway analysis method supported by e.browser was executed (except padog which had technical errors), and their results were combined using “combResults” and then ranked by “gsRanking” function. To conduct pathway analyses using piano^[333]12, the geneLevelStats were provided as the p-values from DESeq2 analyses for all the methods allowed or log2 fold change as required by the remaining methods. Every underlying pathway analysis method supported by piano was executed, and their results were combined using consensusScores (res_list, class = ‘distinct’, direction = ‘up’). To conduct pathway analyses using decoupler^[334]11, we utilized the function run_consensus. The DESeq2 statistics, defined as -log10(p-value) for each gene were provided as the required ‘mat’ argument. All Benchmark genesets were provided as the required ‘net’ argument, with each geneset ‘connected’ to its genes. Then, all genesets were ranked based on ‘consensus_estimate’ score returned by run_consensus function. All other indicated pathway analysis methods were executed using egsea Bioconductor toolkit^[335]89 provided with read count matrix. The “average rank” setting for egsea resulted in the best overall median rank, precision and average precision scores among all its other ranking metrics and thus was chosen to be reported here. Simulated noise and false-discovery rate calculation To simulate inaccuracies and noise in biological pathways, a portion of genes in each pathway were substituted with an equal number of randomly chosen genes from the entire human genome. To calculate false discovery rates, we first collected raw read counts and normalized gene expression data from TCGA breast invasive carcinoma (BRCA), TCGA lung adenocarcinoma (LUAD), TCGA prostate adenocarcinoma (PRAD), and GTEx breast, lung and prostate tissues. We randomly selected 2 N samples (i.e. N = 2–10, 15, 20, 25, 30) from each tissue/cancer and split them into two groups, and ran pathway analysis methods on the expression matrix of the two groups against all canonical pathways (c2.cp.v7.1.symbols.gmt), where the permutations are repeated 100 times for each N. For methods requiring a pre-ranked gene list or set of genes, we utilized DESeq2 to perform the differential expression analysis and used –log10(p-value) or top 200 DEGs as input for those methods. Since the samples were randomly selected biological replicates, the expectation is that any reported pathway significantly different between the groups represents false discovery, where significance is defined as the adjusted p-value ≤ 0.05. Thus, the proportion of significant pathways among all the pathways is defined as the false-discovery rate. Prognostic power of individual genes, gene combinations and independent validation The prognostic power of each biomarker was quantified using the area under the curve (AUC) based on the ability of a gene or a gene combination biomarker to distinguish different sample groups, e.g., alive vs. deceased or progression vs. no-progression to the next tumor stage. The gene expression values were first transformed to z-scores (across samples). For gene combination biomarkers, the average z-score of the participating genes were calculated. The z-scores were then used to calculated AUC values by sklearn.metrics.roc_auc_score. To calculate the significance of individua genes or combination biomarkers, the p-value for logrank test of survival were generated by LIFELINES [10.21105/joss.01317] and were FDR-adjusted for multiple testing by statsmodels.stats.multitest.fdrcorrection. Adjusted FDR value less than 0.05 were considered as significant. To identify gene combination biomarkers from prognostic pathways identified by PET or differentially expressed genes (DEGs) in each cancer type in TCGA dataset, the following steps were taken: For PET, we identified the top 20 pathways associated with unfavorable prognoses and the top 20 pathways associated with favorable prognoses according to PET p-values. From each pathway, we extracted up to 10 leading genes based on optimized GSEA results and generated all possible combinations of 1–5 leading genes (n = 637). Finally, we calculated the AUC values for all the combinations as mentioned above. For individual DEGs, we obtained the top 100 up and top 100 down-regulated DEGs (sorted by DESeq2 adjusted p-value) between alive vs. deceased groups and calculated the AUC values for these DEGs as mentioned above (Fig. [336]2d, Supplementary Fig. [337]S4d). Since generating combinations of 1–5 genes from 100 DEGs would be impractical, for combination biomarkers derived from DEGs, we focused on the top 10 most upregulated and top 10 most downregulated genes separately. We enumerated all possible combinations of 1–5 genes from the upregulated genes (n = 637) and another set of combinations from the downregulated genes (n = 637). Finally, we calculated the AUC values for these combination biomarkers, following the same method mentioned above. Only significant combination biomarkers based on FDR-adjusted logrank of survival (see above) were kept for visualization (e.g. Fig. [338]2e, Supplementary Fig. [339]S4e) and further analysis. To independently validate the combination biomarkers derived from either DEGs or PET analysis, we extracted the top 200 significant combination biomarkers based on their AUC values. We then tested the performances of these biomarkers on several independent datasets (Supplementary Fig. [340]S4g and Supplementary Data [341]2g). Specifically, for each cancer type, we calculated the proportion of identified favorable or unfavorable markers identified from corresponding TCGA cancer dataset that remained as favorable or unfavorable markers in the indicated independent dataset. Computational drug prediction The drug-target dataset was downloaded from DSigDB V1^[342]68. For each drug, the down-regulated and up-regulated target genesets were designated as drug_repressed_geneset and drug_induced_geneset. The gene biomarkers that are intended to be upregulated or downregulated were designated as favorable_geneset and unfavorable_geneset respectively. To identify the “favorable” or “unfavorable” geneset in each type of cancer from PET analysis, we extracted the top 20 pathways associated with either favorable or unfavorable prognoses. From these pathways, we selected up to 10 leading genes based on optimized GSEA results and compiled them as the respective “favorable” or “unfavorable” geneset. For DEG-based drug prediction, we extracted the top 200 upregulated and the top 200 downregulated DEGs according to DESeq2 adjusted p-value between the alive vs. deceased patient groups as the corresponding “favorable” or “unfavorable” geneset. For each pair of drug and cancer, we then performed the following four Fisher exact tests with respect to the gene universe and calculated the corresponding p-values using scipy.stats.fisher_exact: a) drug_induced_geneset and unfavorable_geneset, b) drug_repressed_geneset and favorable_geneset, c) drug_induced_geneset and favorable_geneset, d) drug_repressed_geneset and unfavorable_geneset. The drug effect on a prognostic marker is then calculated as E = −log10(c) − log10(d) + log10(a) + log10(b). Finally, all the drugs were ranked by the predicted effect E. Reporting summary Further information on research design is available in the [343]Nature Portfolio Reporting Summary linked to this article. Supplementary information [344]Supplementary Information^ (1.5MB, pdf) [345]Peer Review File^ (6.5MB, pdf) [346]41467_2024_51859_MOESM3_ESM.pdf^ (113.8KB, pdf) Description of Additional Supplementary Files [347]Supplementary Data 1^ (233KB, xlsx) [348]Supplementary Data 2^ (2.5MB, xlsx) [349]Supplementary Data 3^ (84.7KB, xlsx) [350]Supplementary Data 4^ (1.8MB, xlsx) [351]Supplementary Data 5^ (1.2MB, xlsx) [352]Reporting summary^ (2.6MB, pdf) Source data [353]Source Data^ (341.4KB, xlsx) Acknowledgements