Abstract
Pathway enrichment analysis is a ubiquitous computational biology
method to interpret a list of genes (typically derived from the
association of large-scale omics data with phenotypes of interest) in
terms of higher-level, predefined gene sets that share biological
function, chromosomal location, or other common features. Among many
tools developed so far, Gene Set Enrichment Analysis (GSEA) stands out
as one of the pioneering and most widely used methods. Although
originally developed for microarray data, GSEA is nowadays extensively
utilized for RNA-seq data analysis. Here, we quantitatively assessed
the performance of a variety of GSEA modalities and provide guidance in
the practical use of GSEA in RNA-seq experiments. We leveraged
harmonized RNA-seq datasets available from The Cancer Genome Atlas
(TCGA) in combination with large, curated pathway collections from the
Molecular Signatures Database to obtain cancer-type-specific target
pathway lists across multiple cancer types. We carried out a detailed
analysis of GSEA performance using both gene-set and phenotype
permutations combined with four different choices for the
Kolmogorov-Smirnov enrichment statistic. Based on our benchmarks, we
conclude that the classic/unweighted gene-set permutation approach
offered comparable or better sensitivity-vs-specificity tradeoffs
across cancer types compared with other, more complex and
computationally intensive permutation methods. Finally, we analyzed
other large cohorts for thyroid cancer and hepatocellular carcinoma. We
utilized a new consensus metric, the Enrichment Evidence Score (EES),
which showed a remarkable agreement between pathways identified in TCGA
and those from other sources, despite differences in cancer etiology.
This finding suggests an EES-based strategy to identify a core set of
pathways that may be complemented by an expanded set of pathways for
downstream exploratory analysis. This work fills the existing gap in
current guidelines and benchmarks for the use of GSEA with RNA-seq data
and provides a framework to enable detailed benchmarking of other
RNA-seq-based pathway analysis tools.
1 Introduction
Since its inception two decades ago, pathway enrichment analysis has
grown in popularity to become an essential data analysis tool in
omics-based studies. By casting gene-level measurements in a broader
biological context, this approach allows researchers to interpret their
data in terms of a great variety of gene sets that may represent
different functions, processes, components, or associations with
disease. The success of pathway enrichment analysis, however, led to a
very complex state of affairs with more than 70 different methods and
hundreds of variants published to date [[26]1–[27]4] compounded by more
than 33,000 human gene sets available from the Molecular Signatures
Database (MSigDB) [[28]5] alone, a scenario that has become extremely
difficult to navigate.
Although pathway enrichment analysis methods share a common goal, their
approach and underlying statistical models vary substantially. Existing
methods have been categorized in various ways; for the purpose of our
work, we are primarily concerned with over-representation analysis
(ORA), whose input is an unranked list of genes (selected from the full
list of measured genes by imposing a threshold criterion), and
functional class scoring (FCS), whose input is the full list of
measured genes (ranked based on quantitative scores) [[29]6]. In recent
years, several groups have discussed and implemented benchmarks to
compare findings from different tools as well as to evaluate multiple
factors that affect the results of enrichment analysis [[30]1,
[31]7–[32]15]. Typically, these benchmarking efforts were aimed at
comparing many different tools belonging to one or more broad
categories, such as ORA and FCS. In contrast, our aim is to focus on
the pioneering FCS-type method known as Gene Set Enrichment Analysis
(GSEA) [[33]16, [34]17], which arguably remains the most highly cited
and widely used pathway analysis method, thereby comprehensively and
thoroughly exploring its performance under different modalities by
means of high-quality, curated RNA-seq datasets and pathway
collections. Similarly to other early-generation pathway analysis
methods, GSEA was originally developed for use with microarray data,
but nowadays it is extensively used for analysis of RNA-seq data.
Different omics technologies, however, are known to introduce a variety
of potential sources of bias that may confound functional enrichment
analysis [[35]18, [36]19]. Thus, here we aim to assess GSEA’s
performance specifically using the type of data on which GSEA is most
commonly being currently utilized.
To that end, we leveraged an approach first proposed by Tarca et al.
[[37]7] and followed by several studies [[38]1, [39]8, [40]12, [41]14,
[42]15], which built a set of positive-control pathways using
annotations that matched the conditions under study. In our approach,
we scanned 11,274 harmonized RNA-seq samples across 33 cancer types
from The Cancer Genome Atlas (TCGA) [[43]20]. We applied filters to
remove technical sources of variability and, in order to control for
intertumor transcriptional heterogeneity, we focused on TCGA projects
with available paired primary-tumor / non-tumor tissue samples. The
final analysis involved 1,219 paired samples from 604 subjects across
12 cancer types. To select positive-control pathways, we scanned 33,591
gene sets from MSigDB (which includes KEGG [[44]21], Reactome [[45]22],
Gene Ontology [[46]23], and many other public resources) and
implemented a sequence of filters leading to a total of 253 pathways
associated to those 12 cancer types. GSEA was run using the latest
available version 4.3.2 (build 13, October 2022) [[47]24]. We generated
random gene set ensembles to quantitatively assess the sensitivity and
specificity of different GSEA modalities. Finally, we analyzed other
large cohorts (a thyroid cancer study consisting of 780 paired
tumor/nontumor RNA-seq tissue samples from 390 subjects [[48]25] and a
hepatocellular carcinoma study consisting of 140 paired tumor/nontumor
RNA-seq tissue samples from 70 subjects [[49]26]). By comparing results
from these cohorts and their TCGA counterparts, we propose a new
consensus metric termed Enrichment Evidence Score (EES) that allows the
identification of a core set of pathways (those showing a high degree
of consensus across enrichment metrics) complemented by an expanded set
of pathways for exploratory analysis. Moreover, we show how the EES
metric may be utilized to identify high-consensus leading-edge genes
and integrate pathway- and gene-level information.
In summary, our primary goal is to leverage large, harmonized RNA-seq
data repositories and large, curated pathway repositories in order to
quantitatively assess the performance of a variety of GSEA modalities
and provide guidance in the practical use of GSEA for the analysis of
RNA-seq data. Moreover, a secondary goal of this study is to set up a
framework capable of benchmarking other RNA-seq-based pathway analysis
tools. To enable such secondary aim, we made all source code openly and
publicly available along with detailed step-by-step documentation
[[50]27].
2 Methods
2.1 RNA-seq datasets
2.1.1 The Cancer Genome Atlas (TCGA)
From NCI’s Genomic Data Commons portal [[51]28] we downloaded all of
TCGA’s gene expression quantification files and associated metadata. In
total, these are 11,274 harmonized RNA-seq expression data files across
33 cancer types. In order to control for intertumor transcriptional
heterogeneity, we only considered individuals with paired
primary-tumor/non-tumor solid tissue samples available. Moreover, to
remove technical sources of variability [[52]29, [53]30], we excluded
Formalin-Fixed Paraffin-Embedded (FFPE) tissue samples and considered
only TCGA studies with at least 10 subjects available after these
exclusions. This process left us with 15 TCGA projects. However, 3 of
them (KICH, KIRP and UCEC) were later excluded because we could not
identify enough positive-control pathways based on the procedure
explained below (see Sect 2.3.1). The final analysis involved 1,219
paired primary-tumor/non-tumor solid tissue samples from 604 subjects
across 12 cancer types ([54]Table 1).
Table 1. Summary of TCGA projects and associated cancer-type-specific
pathways from MSigDB.
TCGA Project RNA-seq Data MSigDB Pathways
ID Description Samples Subjects Preselected Target Positive-control
BLCA Bladder urothelial carcinoma 39 19 16 15 12
BRCA Breast invasive carcinoma 228 113 137 109 72
COAD Colon adenocarcinoma 84 41 30 25 15
ESCA Esophageal carcinoma 26 13 10 7 5
HNSC Head and neck squamous cell carcinoma 86 43 9 9 8
KIRC Kidney renal clear cell carcinoma 144 72 8 6 3
LIHC Liver hepatocellular carcinoma 100 50 114 92 70
LUAD Lung adenocarcinoma 122 58 39 29 16
LUSC Lung squamous cell carcinoma 102 51 39 29 16
PRAD Prostate adenocarcinoma 104 52 27 24 16
STAD Stomach adenocarcinoma 66 33 12 11 8
THCA Thyroid carcinoma 118 59 30 21 12
[55]Open in a new tab
2.1.2 Thyroid cancer from the Chernobyl Tissue Bank (REBC-THYR)
Following a similar procedure as described above, we used NCI’s Genomic
Data Commons portal [[56]28] to download gene expression quantification
files and associated metadata for REBC-THYR, a radiation-induced
thyroid cancer study from the Chernobyl Tissue Bank [[57]25]. Based on
the same exclusion criteria followed for TCGA data, we obtained a total
of 780 paired primary-tumor/non-tumor solid tissue samples from 390
subjects.
2.1.3 Hepatocellular carcinoma from Mongolia (MO-HCC)
From NCBI’s Gene Expression Omnibus repository (accession
[58]GSE144269) [[59]31], we downloaded gene expression quantification
files and associated metadata from a study of hepatocellular carcinoma
in Mongolia, which comprises a total of 140 paired
primary-tumor/non-tumor solid tissue samples obtained from 70 subjects
[[60]26].
2.2 Differential Gene Expression (DGE) analysis
All of TCGA’s gene expression quantification files are openly available
to download from NCI’s Genomic Data Commons (GDC) portal [[61]28]; we
provide step-by-step details in our code and documentation repository
[[62]27]. The GDC mRNA quantification analysis pipeline measures gene
level expression with STAR as raw read counts. Subsequently, the counts
are augmented with several transformations including Fragments per
Kilobase of transcript per Million mapped reads (FPKM), upper quartile
normalized FPKM (FPKM-UQ), and Transcripts per Million (TPM). These
values are additionally annotated with the gene symbol and gene
bio-type. These data are generated through this pipeline by first
aligning reads to the GRCh38 reference genome and then by quantifying
the mapped reads. To facilitate harmonization across samples, all
RNA-Seq reads are treated as unstranded during analyses. Further
details are provided in the GDC documentation [[63]32].
Gene expression count matrices consist of 60,660 quantified genes, out
of which we kept 19,962 protein-coding genes. After correcting for gene
name aliases, we found that 19,321 of these genes were included in
pathways from MSigDB. We then performed further filtering and DGE
analysis separately for each TCGA project. Using TPM quantification, we
removed low-abundance genes with TPM > 1 in less than 40% of the
samples, transformed the count matrix using log[2](TPM + 1) and fitted
the expression values against tissue type (primary tumor or non-tumor
tissue) and subject (to take into account the paired sample design)
using functions lmFit, eBayes and topTable from the limma [[64]33] R
package v.3.56.1. [65]S1 Table shows the number of measured genes in
each TCGA project after all filters were applied and how many of them
were found differentially expressed in either direction, i.e.
over-expressed in primary-tumor (fold change FC > 1) or non-tumor (FC <
1) tissue, using Bonferroni or Benjamini-Hochberg (B-H) q—value < 0.05
significance thresholds adjusted for multiple testing. An illustration
of these results is shown in [66]S1 Fig with a Volcano plot of
significant genes based on Bonferroni-adjusted q—value < 0.05 for
TCGA-BRCA (breast cancer).
In order to explore an alternative procedure for count normalization
and gene filtering feeding into the DGE analysis, we re-analyzed all
paired primary-tumor vs non-tumor cancer samples from TCGA using raw
counts. We implemented a standard analysis pipeline using the edgeR
package v.3.42.2 by applying filterByExpr to filter genes, followed by
voom quantile normalization and limma for assessing differentially
expressed genes with variance stabilization. Using standard gene
ranking scores, defined by [67]Eq (3) below, [68]S2 Fig illustrates the
excellent agreement between the two procedures for TCGA-BRCA samples.
Furthermore, in order to extend the comparison to the pathway level, we
re-ran GSEA with gene set permutations (described below, see Sect. 2.4
for details) using the voom-normalized data and determined
pathway-level ranking scores based on GSEA p-values and enrichment
directions. [69]S3 Fig shows comparisons of these pathway-level ranking
scores derived from the two alternative count normalization procedures
for unweighted (classic) and weighted (p = 1, 1.5 and 2) enrichment
statistics across all TCGA-BRCA target pathways (see below for further
details). Taken together, [70]S2 and [71]S3 Figs show that differential
expression/enrichment analyses derived from these different count
normalization and filtering procedures lead to highly concordant
results at both gene and pathway levels.
2.3 Gene sets
2.3.1 Cancer-type-specific pathways
[72]Fig 1 illustrates the process to obtain cancer-type-specific
pathways as positive controls. We downloaded 33,591 human gene sets
from MSigDB v2023.1.Hs (released on March 2023) [[73]34]. Using
functionality provided by the R package fgsea v.1.26.20, we ran a
custom script to identify, for each cancer type, gene sets whose names
matched a curated list of search terms. The preselected pathways were
then ran through a size filter procedure (following GSEA
recommendations, we excluded gene sets outside the 15–500 size range,
considering only genes that passed all filters in the quantification
step). After running GSEA in multiple modalities, we applied a
subsequent filter to keep target pathways that were found significant
(unadjusted p—value < 0.05) at least once after running GSEA with
gene-set and phenotype permutations using all available enrichment
statistic settings. [74]Table 1 shows the number of pathways selected
at each stage. Full details are provided in [75]S2 Table.
Fig 1. Cancer-type-specific pathways.
[76]Fig 1
[77]Open in a new tab
Procedure to identify lists of preselected, target, and
positive-control pathways using TCGA-BRCA (breast cancer) as example.
See [78]S2 Table for full details.
2.3.2 Gene set overlap
An important challenge of pathway enrichment analysis is that of gene
set overlap, where some genes participate in multiple gene sets
[[79]35, [80]36]. This phenomenon occurs in the presence of
multifunctional genes (i.e., genes that play a role in several
biological functions or molecular processes) but is also prevalent in
some gene set collections with redundancies or a strong hierarchical
structure. The latter case is most notably present, by design, in the
three lineages of Gene Ontology (GO) [[81]23]. A useful metric used to
quantify similarities between gene sets
[MATH: Gi :MATH]
and
[MATH: Gj :MATH]
is the Jaccard index, given by
[MATH: Jij
=∥Gi∩Gj∥∥Gi∪Gj∥, :MATH]
(1)
which defines a symmetric matrix normalized to the 0 − 1 range.
Alternatively, the fractional pairwise overlap matrix, given by
[MATH: Oij
=∥Gi∩Gj∥∥Gi∥, :MATH]
(2)
defines a non-symmetric matrix normalized to the 0 − 1 range that
better captures hierarchical relations such as those present in GO.
2.4 Gene Set Enrichment Analysis (GSEA)
Gene Set Enrichment Analysis is a well-established tool based on a
weighted, normalized Kolmogorov-Smirnov statistic to assess pathway
enrichment of ranked gene lists [[82]16, [83]17]. Originally developed
for microarray data, GSEA is a threshold-free method that analyzes all
genes on the basis of their ranking score, without prior gene
filtering, and is thus recommended when ranks are available for all or
most of the genes in the genome (e.g., transcriptomics) [[84]37].
Let r[k] be the ranking score associated with the k − th gene, g[k],
with k = 1, …, n an index running through all measured genes. Without
loss of generality, we assume genes to be indexed in decreasing order
of their ranking scores. Typically, it is assumed that ranking scores
span positive and negative values; the absolute value of a ranking
score is a measure of effect size and/or statistical significance,
whereas the sign indicates the direction of change. For a case-control
differential gene expression design with fold change FC, one popular
choice is a ranking score of the form
[MATH: rk=sign(log(FCk))×(-log10(p-valuek)), :MATH]
(3)
where p–value[k] is an estimate of the statistical significance of the
observed FC[k] for the k − th gene. Based on the DGE analysis outlined
above (Sect. 2.2), we obtained standard ranking scores, defined by
[85]Eq (3), which are provided in [86]S3 Table for all TCGA projects
analyzed in our study.
Let
[MATH: Gi :MATH]
be a gene set (after removing any genes not included in the universe of
measured genes). The step variable x[k] is defined as
[MATH: xk={|r
k|p<
mrow>∑k∈Gi<
mo>|rk|p
msup>ifgk∈<
mi
mathvariant="script">Gi-1
n-∥Gi∥ifgk∉<
mi
mathvariant="script">Gi,
:MATH]
(4)
where p > 0 is a weight parameter introduced in GSEA’s follow-up paper
[[87]17] to emphasize the role of genes at the top and bottom of the
gene list (i.e. those with larger absolute values of the ranking score
in either direction) in detriment of genes in the center of the list
(i.e. those with lower absolute values of the ranking score in either
direction). The denominators are chosen for normalization purposes. The
p = 0 case is akin to the standard, unweighted formulation from GSEA’s
original paper [[88]16], although with a different normalization.
The cumulative enrichment of gene set
[MATH: Gi :MATH]
is defined as the running sum of the step variable x[k] from top to
bottom, i.e.
[MATH: Em=∑k=1mxk,form=1,.
..,n. :MATH]
(5)
In this way, starting at E[0] ≡ 0, GSEA progressively examines genes
from the top to the bottom of the ranked list, increasing the
cumulative enrichment if the gene at that position is part of the
pathway and decreasing the running sum otherwise. The step variable is
normalized to ensure that E[n] = 0 at the bottom of the list. Then, the
target gene set’s enrichment score (ES) is defined as the maximum
departure from zero (in either direction, i.e. positive or negative)
along the cumulative enrichment profile; its sign indicates whether the
observed enrichment is associated with the top (positive ES) or bottom
(negative ES) of the ranked gene list. Genes in the target gene set
that contribute to the maximum departure from zero in the cumulative
enrichment running sum are termed leading-edge or core-enrichment
genes. By definition of the step variable normalization, ES is
restricted to the [−1, 1] range; the extreme ES = ±1 values may only
occur when all genes in a gene set appear strictly at the top (ES = 1)
or bottom (ES = −1) of the ranked gene list. Because, by virtue of Eqs
([89]4) and ([90]5), GSEA’s weighted Kolmogorov-Smirnov (global)
statistic cannot be expressed in terms of a simple function of the
ranking score (local statistic), formal analytical solutions to express
the significance of a pathway’s enrichment score based on parametric
modeling are not known. Instead, non-parametric methods based on
empirical distributions are computationally generated via random
permutations. GSEA offers two options: (i) gene-set permutations, in
which a target gene set’s enrichment score is compared against a
null-model distribution of enrichment scores from an ensemble of
randomly generated gene sets of equal size, and (ii) phenotype
permutations, in which the null-model distribution is generated by
randomly shuffling phenotype labels among the samples while keeping the
original target gene set.
GSEA’s GUI applications offer a limited choice of ranking statistics
and design matrices [[91]37], which are not appropriate in the context
of our work (see Sect. 4 below for a detailed discussion). For gene-set
permutations, it is possible to upload the ranked list of genes into
GSEA’s GUI in Preranked mode; however, GSEA does not offer suitable
functionality to perform phenotype permutations on RNA-seq data
[[92]37]. Because our study involved both gene-set and phenotype
permutations, we implemented GSEA in the command line via custom
scripts. In this work, GSEA was run using gsea-cli.sh in a macOS
environment (Monterey v.12.6.9) from the latest available GSEA version
4.3.2 (build 13, October 2022) [[93]24]. Importantly, it must be
pointed out that gsea-3.0.jar, utilized in protocols published by
Reimand et al [[94]37], is affected by serious security vulnerabilities
due to the use of the Java-based logging utility Apache Log4j in GSEA
versions earlier than 4.2.3. Moreover, as reported by the GSEA Team,
version 3.0 contained microarray-specific code (mostly related to
Affymetrix) that may cause issues with RNA-seq data analysis, which was
removed in later GSEA updates.
Our analysis of phenotype permutations followed an approach similar to
that outlined by Reimand et al. [[95]37]. For each TCGA project under
study, we generated n[rdm] = 1000 phenotype randomizations and obtained
n[rdm] randomized list of ranks, which were distributed among parallel
command-line GSEA executions. The statistical significance for each
pathway was then empirically estimated as
[MATH: p-value≃
mo>11+nrdm
(1+∑i=1nrdmΘ<
/mo>(|ESrdmi|-|ES|)), :MATH]
(6)
where Θ is the Heaviside step function. This expression computes the
fraction of randomized ranked gene lists that yield enrichment scores
[MATH: ESrdmi :MATH]
larger (in absolute value) than the enrichment score ES obtained in the
original, non-randomized ranked gene list, correcting for random
sampling [[96]38]. It should be noticed that, because our matrix design
was paired, the phenotype permutation procedure randomized tumor vs
non-tumor labels while preserving subject identifiers [[97]39].
In this work, we introduce a new consensus metric termed Enrichment
Evidence Score (EES). At the pathway level, EES is defined as the
signed sum of GSEA runs in which the pathway was significantly
enriched, summed over the four enrichment statistics (classic and
weighted with p = 1, 1.5, and 2). For instance, EES = 3 would
correspond to a pathway found significantly enriched in tumor samples
(positive sign) in three out of the four GSEA runs, whereas EES = −2
would correspond to a pathway found significantly enriched in non-tumor
samples (negative sign) in two out of the four GSEA runs. Similarly, we
define a gene-level EES by summing over GSEA runs in which a given gene
was identified as leading-edge.
2.5 Over-representation analysis (ORA)
ORA is the pioneer approach for pathway enrichment analysis, originally
designed for microarray data [[98]40]. Despite its simplicity, it is a
robust and well-established method that has been implemented in more
than 40 tools [[99]41], which differ in their various components such
as gene set database, data visualization, and user interface. Unlike
GSEA, ORA requires a selection criterion to define a list of top
ranking genes,
[MATH: L :MATH]
, and treats all selected genes equally, ignoring measurable
quantitative differences (e.g. the magnitude of gene expression
differences). ORA tests the null hypothesis of no association between
genes in
[MATH: L :MATH]
and those contained in a target gene set
[MATH: Gi :MATH]
. The probability for the observed intersection between these sets,
[MATH:
ni≡∥L∩Gi∥
:MATH]
, is given by the hypergeometric distribution, i.e.
[MATH: Pni=∥
Gi∥ni×n−∥Gi∥
∥L∥−ni<
munder>n∥L∥<
/mrow>, :MATH]
(7)
and the cumulative probability of an intersection equal or larger than
that observed is given by
[MATH: P(≥ni)=1-∑
m=0ni<
/mi>-1Pm, :MATH]
(8)
which matches the one-sided Fisher’s exact test of the association
between genes in
[MATH: L :MATH]
and
[MATH: Gi :MATH]
. This approach lies at the core of many popular tools for pathway
analysis, both public and proprietary, such as PathwayStudio [[100]42],
DAVID [[101]43], g:Profiler [[102]44] and Ingenuity Pathway Analysis
[[103]45], among many others [[104]41].
We generated custom implementations of ORA using multiple modalities.
After generating DGE tables with tumor/non-tumor FC and p—value
estimates (see Sect. 2.2), we proceeded according to two different
approaches, namely (i) signed ORA and (ii) unsigned ORA. In the signed
ORA approach, we performed two analyses, one using significant genes
with FC > 1 and another one using significant genes with FC < 1; a
target pathway was thus assigned two enrichment p—values, which were
merged by choosing the most significant between the two. In the
unsigned ORA approach, in contrast, FC was ignored and genes were
selected solely based on their p—values. Pan et al [[105]46] pointed
out one shortcoming of ORA, namely that pathways identified as
statistically significant may strongly depend on the cutoffs used to
select the input lists of differentially expressed genes. To take this
effect into consideration, we implemented the q—value < 0.05 selection
criterion applied to two multiple-testing correction scenarios, namely:
(i) Bonferroni-adjusted p-values (stricter and therefore leading to
shorter input gene lists), and (ii) Benjamini-Hochberg (also known as
false discovery rate, FDR), as an alternative, more lenient approach
used to select lists for ORA. In combination with the signed/unsigned
ORA approaches mentioned above, these gene selection criteria yield a
total of four ORA-based scenarios.
2.6 Analysis of sensitivity and specificity
To perform an analysis of sensitivity and specificity for each TCGA
project in this study, we generated an ensemble of 1000
negative-control random pathways. Pathway sizes in the 15 − 500 range
were sampled from a uniform distribution; genes were randomly chosen
from the set of measured genes that passed the filters (see Sect. 2.2
for data processing details). After ordering positive- and
negative-control pathways by enrichment p—value, we used the roc
function from R package pROC [[106]47] v.1.18.4 to generate
one-directional receiver operating characteristic (ROC) curves to test
the assumption that positive-control pathways are more significantly
enriched than negative-control ones. The Area Under the Curve (AUC) was
calculated with function auc and DeLong’s 95% confidence interval with
function ci.auc from the same package. For visualizations, we used R
package VennDiagram v.1.7.3, R package gplots v.3.1.3, and Cytoscape
v.3.9.1.
3 Results
We begin to explore our ideas using TCGA-BRCA (breast cancer), the most
common cancer type in the U.S. and worldwide, for which we have 228
paired primary-tumor/non-tumor RNA-seq samples available from 113
subjects. Following the procedure outlined in Sect. 2.3.1, a sequential
series of filters was applied on MSigDB gene sets to identify 72
positive-control pathways ([107]Fig 1). The analysis of gene set
overlap metrics, namely the Jaccard index ([108]Eq (1), see [109]S4
Fig) and the fractional pairwise overlap matrix ([110]Eq (2), see
[111]S5 Fig), revealed that positive-control pathways are mostly
non-overlapping. We then ran GSEA under eight different modalities,
choosing gene-set or phenotype permutations combined with four
different choices for the enrichment statistic, either classic
(unweighted) or weighted (p = 1, 1.5, 2), which comprise all the
options available in the current GSEA release (see Sect. 2.4 for
further details).
[112]Fig 2(a) shows the overlap of positive-control pathways found
significant among different weight parameter choices using gene-set
permutations and p—value < 0.05 as significance criterion. We observe
that, while half (36) of the pathways were found significant by all
four runs and an additional set of 8 were found significant by three
runs, 9 pathways were identified only by two methods and 19 of them
were found by only one method. Among the latter, most of them (17) were
identified by the classic, unweighted method. By tightening the
significance criterion to p—value < 0.01, shown in [113]Fig 2(b), the
percentage of pathways found significant by all methods increases only
slightly to 56% (32 out of 57 pathways). We find that stronger
consensus cannot be enforced by tightening the significance criterion
even further; using p—value < 0.001, still only 56% (26 out of 46)
pathways agree among all methods (see [114]S6 Fig). Similarly, [115]Fig
2(c) and 2(d) show results for phenotype permutations using different
p-value cutoffs, as indicated. It is clearly apparent that the
phenotype permutation approach identifies fewer pathways and shows
poorer agreement among different weight parameter choices. For stricter
selection criteria (p—value < 0.001), none of the 72 positive control
pathways are identified.
Fig 2. Significant TCGA-BRCA positive control pathways across different
weight parameter choices.
[116]Fig 2
[117]Open in a new tab
(a) Gene-set permutation with p—value < 0.05. (b) Gene-set permutation
with p—value < 0.01. (c) Phenotype permutation with p—value < 0.05. (d)
Phenotype permutation with p—value < 0.01. GSEA enrichment statistics:
classic (“cl”), weight parameter p = 1 (“p1”), weight parameter p = 1.5
(“p1.5”), and weight parameter p = 2 (“p2”).
To complement these findings, [118]Fig 3 shows the overlap between
gene-set and phenotype permutation methods at significance p—value <
0.05 for each enrichment statistic. As shown in this figure, phenotype
permutation is significantly less sensitive than gene-set permutation,
since the former approach identifies subsets of less than half the
number of positive control pathways obtained by the latter.
Furthermore, to explore the impact of tumor/non-tumor sample
pairedness, which affects the magnitude and statistical significance of
differential gene expression estimates (and, therefore, gene ranking
metrics) as well as phenotype randomization procedures (Sect. 2.4), we
re-ran our pipeline as unpaired data. As shown by [119]S7 Fig (for
gene-set permutations) and [120]S8 Fig (for phenotype permutations),
differences between paired and unpaired analyses appear to be minor.
Fig 3. Significant TCGA-BRCA positive control pathways using gene-set (“gs”)
and phenotype (“ph”) permutation approaches for different enrichment
statistics.
[121]Fig 3
[122]Open in a new tab
The significance criterion was p—value < 0.05. (a) Classic
(unweighted). (b) Weight parameter p = 1. (c) Weight parameter p = 1.5.
(d) Weight parameter p = 2.
In order to compare the sensitivity and specificity of GSEA under
different modalities, we generated 1000 negative-control pathways by
random selection of genes (see Sect. 2.6 for details). Ordering
positive and negative controls by significance, [123]Fig 4 shows
receiver operating characteristic (ROC) curves for (a) gene-set and (b)
phenotype permutation approaches, using different enrichment
statistics. For comparison, we also show results from (c) signed and
(d) unsigned implementations of over-representation analysis (ORA)
using gene lists selected by q—value < 0.05 adjusted by two different
methods (see Sect. 2.5 for details). This analysis shows that the
classic gene-set permutation approach of GSEA yields the best
sensitivity vs specificity tradeoff with Area Under the Curve (AUC)
estimated as 0.99 and DeLong’s 95% confidence interval (CI) in the 0.98
− 1 range. The introduction of weighted enrichment statistics appears
to be deleterious, negatively impacting the performance of gene-set
approaches ([124]Fig 4(a)). Moreover, in agreement with previous
results, phenotype permutation models display poorer sensitivity
([125]Fig 4(b)). The straightforward signed ORA ([126]Fig 4(c))
achieves a performance similar to that of gene-set GSEA with p = 1 and
better than that of other weighted approaches, but below gene-set
classic GSEA. It should be noticed that our definition of
positive-control pathways relies on a pathway significance filter,
which removes target pathways that appeared not significant in any of
the GSEA runs (Sect. 2.3.1 and [127]Fig 1). By removing this filtering
step and, thereby, expanding the target list from 72 to 109 pathways,
our main conclusions remain the same: the classic gene-set permutation
approach of GSEA is the best-performing model with AUC = 0.87 (95% CI =
0.83 − 0.91), as shown in [128]S9 Fig.
Fig 4. ROC curves for different GSEA and ORA approaches using 72 TCGA-BRCA
positive control pathways and 1000 randomized negative controls.
[129]Fig 4
[130]Open in a new tab
(a) Gene-set permutation GSEA. (b) Phenotype permutation GSEA. (c)
Signed ORA. (d) Unsigned ORA. GSEA approaches used different enrichment
statistics, as indicated. ORA approaches used Bonferroni and
Benjamini-Hochberg (B-H) adjusted q-values as different inclusion
criteria to select differentially expressed genes, as indicated.
[131]Fig 5 expands the analysis by showing the Area Under the Curve
across all cancer types included in our study, where error bars
indicate DeLong’s 95% confidence intervals, for different GSEA and ORA
approaches using positive control pathways and randomized negative
controls as previously described. Despite the fact that the number of
available samples, subjects, and pathways identified as positive
controls varies significantly across projects (recall [132]Table 1),
the classic gene-set permutation approach of GSEA remains, quite
remarkably, the top performing method. As shown in [133]S10 Fig, this
finding is confirmed by expanding the analysis to include all target
pathways (i.e. by removing the last pathway filtering step in the
procedure described in Sect. 2.3.1 and [134]Fig 1).
Fig 5. AUC across TCGA projects for different GSEA and ORA approaches using
cancer-type-specific positive control pathways and 1000 randomized negative
controls.
[135]Fig 5
[136]Open in a new tab
(a) Gene-set permutation GSEA. (b) Phenotype permutation GSEA. (c)
Signed ORA. (d) Unsigned ORA. GSEA approaches used different enrichment
statistics, as indicated. ORA approaches used Bonferroni and
Benjamini-Hochberg (B-H) adjusted q-values as different inclusion
criteria to select differentially expressed genes, as indicated.
As discussed earlier (see Sect. 2.2), alternative methods for
differential gene expression analysis exist, which rely on a variety of
approaches for count normalization and statistical models for
differential expression quantification. For the sake of comparison,
[137]S11 Fig shows ROC curves for GSEA gene-set permutations under
different enrichment statistics obtained from TCGA-BRCA’s raw counts
analyzed using the edgeR-voom-limma pipeline previously described.
Although the differences among different enrichment statistics appear
to be less pronounced compared with [138]Fig 4(a), the unweighted
Kolmogorov statistic approach still yields the largest AUC value.
Similarly, [139]S12 Fig extends the comparison to all TCGA projects
analyzed in this study.
Finally, we explored similarities and differences between target
pathways identified in TCGA versus other cancer-type-specific cohorts.
Firstly, we analyzed REBC-THYR, a thyroid cancer study consisting of
780 paired tumor/nontumor RNA-seq tissue samples from 390 subjects
(Sect. 2.1.2). Using gene-set permutations in GSEA with threshold
p—value < 0.05, we calculated the Enrichment Evidence Score (EES) for
each target pathway (see Sect. 2.4 for details). [140]Fig 6(a) shows
the contingency table of EES across target pathways for REBC-THYR vs
TCGA-THCA, which displays a remarkable agreement between pathways
detected in both thyroid cancer studies (see [141]S4 Table for
details). By grouping EES intervals as: (i) [−4, −3]
(non-tumor-enriched), (ii) [−2, 2] (undetermined), and (iii) [[142]3,
[143]4] (tumor-enriched), the association between enriched pathways is
further emphasized, as shown in [144]Fig 6(b), leading to Fisher’s
exact test p—value = 2.5 × 10^−6. Secondly, we analyzed MO-HCC, a study
on hepatocellular carcinoma (the most common form of liver cancer)
consisting of 140 paired tumor/nontumor RNA-seq tissue samples from 70
subjects (Sect. 2.3.1). Following a similar procedure, we obtained the
contingency table of EES across target pathways for MO-HCC vs
TCGA-LIHC, shown in [145]Fig 6(c) and [146]S5 Table, as well as the
contingency table by grouped intervals, displayed in [147]Fig 6(d).
These results appear to capture the larger heterogeneity of TCGA-LIHC
compared with MO-HCC, which is to be expected since the Mongolia study
is focused on a cohort of rather uniform genetic background from a
relatively isolated region exposed to weaker external influences
[[148]26]. Despite the spread attributable to differences in etiology,
the association between EES groups appears highly significant, with
Fisher’s exact test p—value = 4.4 × 10^−15.
Fig 6. Pathway-level enrichment evidence scores for thyroid cancer and
hepatocellular carcinoma cohorts.
[149]Fig 6
[150]Open in a new tab
(a) Comparison between significant target pathways in REBC-THYR vs
TCGA-THCA. (b) Contingency table of grouped EES intervals in REBC-THYR
vs TCGA-THCA (Fisher’s exact test p—value = 2.5 × 10^−6). (c)
Comparison between significant target pathways in MO-HCC vs TCGA-LIHC.
(d) Contingency table of grouped EES intervals in MO-HCC vs TCGA-LIHC
(Fisher’s exact test p—value = 4.4 × 10^−15). Distance to the diagonal
is represented with increasingly darker shades of blue.
A similar approach can also be implemented to investigate the role of
leading-edge genes within the previously identified core pathways.
Leading-edge genes are those identified by the GSEA algorithm to make
the strongest contribution to the enrichment signal of a pathway (see
Sect. 2.4 for details). For the sake of illustration, [151]Fig 7(a)
shows gene-level EES scores for LUI_THYROID_CANCER_CLUSTER_1, a
previously identified high-consensus tumor-enriched pathway in thyroid
cancer (recall [152]S4 Table). The association between EES groups,
shown in [153]Fig 7(b), is highly significant, with Fisher’s exact test
p—value = 4.6 × 10^−8. Out of 52 constituent genes, we observe that 17
of them are robustly identified as leading-edge genes in this thyroid
cancer pathway in both cohorts (see [154]S6 Table for details).
Interestingly, three genes (EPOR, GAS6, CDC25B) are found to be
leading-edge only in TCGA-THCA, while other three genes (SPRY1,
ARHGAP1, RUBCN) are identified as leading-edge only in REBC-THYR, which
may point to mechanistic differences across etiologies. [155]Fig 7(c)
illustrates how we can integrate high-consensus leading-edge genes
(defined by requiring |EES| ≥ 3 in at least one of the cohorts) and
core pathway information. The seven core thyroid cancer pathways
reported earlier here appear depicted in connection with high-consensus
leading-edge genes; for simplicity, only genes connected to two or more
pathways are shown.
Fig 7. Gene-level enrichment evidence scores for thyroid cancer cohorts.
[156]Fig 7
[157]Open in a new tab
(a) Comparison between EES leading-edge genes in REBC-THYR vs TCGA-THCA
for genes in LUI_THYROID_CANCER_CLUSTER_1, previously identified as a
high-consensus tumor-enriched pathway in thyroid cancer. (b)
Contingency table of grouped EES intervals for the same case as in
panel (a) (Fisher’s exact test p—value = 4.6 × 10^−8). (c) Network
representation showing seven core thyroid cancer pathways and
high-consensus leading-edge genes (|EES| ≥ 3 in at least one of the
cohorts). Only genes connected to two or more pathways are shown.
4 Discussion
Gene Set Enrichment Analysis (GSEA) pioneered the broad class of
functional class scoring approaches to pathway analysis and remains, to
this day, one of the leading methods of choice. For this reason, it is
of paramount importance for users to be aware of its caveats and
limitations. In this Section, we discuss some important aspects of the
current implementation of GSEA.
The GSEA algorithm provides a number of alternative statistics that can
be used for feature ranking. For categorical phenotypes, options are
signal-to-noise (default), t-test, ratio of classes, difference of
classes, and log[2] of ratio of classes. For numerical phenotypes,
options are cosine, Euclidean, Manhattan, Pearson and Spearman
correlation ranking metrics. These ranking statistics aim to capture
some measure of differential expression between categorical phenotypes,
or some expression trend associated with quantitative phenotypes. Zyla
et al [[158]14] have analyzed these and other ranking metrics, showing
that they have a critical impact on GSEA results based on a large
number of microarray benchmark datasets. While these metrics are also
widely used for RNA-seq datasets, it is important to be aware that
these ranking statistics, originally selected for their effectiveness
when used with microarray-based expression data, have not been
evaluated for their use with data derived from RNA-seq experiments. As
noted by Reimand et al [[159]37], rather than using these built-in
options, RNA-seq-derived differential gene expression values should be
computed outside of GSEA, using methods that include variance
stabilization (such as edgeR [[160]48], DESeq [[161]49], limma
[[162]33], and voom [[163]50]) and then imported into GSEA software for
the actual pathway analysis computation. The current implementation of
GSEA, moreover, is not able to handle slightly more complex design
matrices, even common ones such as paired data analysis. In this work,
we illustrate an approach using paired tumor/non-tumor samples to
control for inter-subject heterogeneity, which also makes use of
limma’s linear modeling and empirical Bayes moderation to assess
differential expression (see Sect. 2.2).
GSEA has two methods for determining the statistical significance of
pathway enrichment: gene-set permutation and phenotype permutation (see
Sect. 2.4 for details). Gene-set permutation is recommended for studies
with limited variability and biological replicates, while phenotype
permutation is recommended when a larger number of replicates is
available (as a rule of thumb, at least ten per condition) [[164]37].
The main advantage of the latter is that it preserves the structure of
gene sets with biologically important gene-gene correlations, in
contrast to the gene-set permutation approach [[165]51]. Despite these
recommendations, performing phenotype permutation with the appropriate
methods that RNA-seq data require for differential expression modeling
and ranking is a daunting task for the typical GSEA user, since it
demands custom programming to compute pathway enrichment scores and
differential expression statistics separately for thousands of
phenotype randomizations [[166]37]. Besides the issues of custom
programming and computational complexity, phenotype permutation jobs
typically take several hours to complete, even in a parallelized
multi-CPU environment. By implementing and executing the phenotype
permutation approach, however, our work provided evidence that the
gene-set permutation method showed superior performance to identify
RNA-seq-based, cancer-type-specific pathways.
GSEA offers the option of a weighted Kolmogorov-Smirnov global
statistic to assess gene set enrichment; this global statistic is
weighed using a power of the local statistic, p > 0 (see Sect. 2.4).
Besides the so-called classic, or unweighted, original version of GSEA
(akin to setting p = 0), built-in options for this weight parameter are
p = 1 (default), p = 1.5, and p = 2. By increasing p, the suppression
effects of genes with smaller absolute values of the local statistic
become more pronounced, thereby emphasizing the impact of the genes at
the top and bottom of the ranked gene list. No clear guidance is
offered to users as to how to choose this so-called enrichment
statistic with RNA-seq data, since the main evidence in favor of
weighted statistics was produced when the method was first introduced
for the analysis of microarray datasets [[167]17]. In this work, we
explored all available options for both permutation types and concluded
that the classic, unweighted gene-set permutation procedure offers
comparable or better sensitivity-specificity tradeoffs across cancer
types compared with other, more complex and computationally intensive
permutation methods. It should be noticed that only performing
unweighted GSEA and permuting the genes, analytical solutions for the
expected average enrichment score are available [[168]51]; all other
GSEA modalities require the calculation of non-parametric,
computationally intensive enrichment score distributions.
In fact, by running all GSEA enrichment statistics under gene-set
permutation, which can be easily accomplished by uploading a ranked
list of genes to GSEA’s Preranked mode, no custom programming is
required (except for RNA-seq data processing, QC, normalization,
filtering, and differential gene expression analysis, which may be
accomplished with standard software outside of GSEA). As demonstrated
by our analysis of additional cohorts from non-TCGA sources, we see
value in a strategy that uses the proposed Enrichment Evidence Score
(EES) as a consensus metric to identify a core set of pathways,
complemented by an expanded set of pathways for exploratory analysis,
which may be more specifically tailored to the study of interest.
Similarly, EES can be used to identify high-consensus leading-edge
genes, which allows to integrate pathway- and gene-level information.
Hence, the standard, unidirectional sense of information flow through
pathway analysis (from genes to pathways) acquires a feedback loop to
enrich the input, gene-phenotype associations (e.g. differential gene
expression) with functional, gene-pathway associations. The information
thus flows from genes to pathways and back.
5 Conclusion
The main goal of this work was to quantitatively assess the performance
of a variety of GSEA modalities and provide guidance in the practical
use of GSEA for the analysis of RNA-seq data. We leveraged harmonized
RNA-seq datasets available from TCGA in combination with large, curated
pathway collections from MSigDB to obtain cancer-type-specific target
pathway lists across multiple cancer types. We performed a detailed
analysis of GSEA performance using both gene-set and phenotype
permutations combined with four different choices for the
Kolmogorov-Smirnov enrichment statistic, either unweighted (also known
as classic) or weighted (with three different options for the weight
parameter). Based on our benchmarks, we concluded that the
classic/unweighted gene-set permutation approach offered comparable or
better sensitivity-specificity tradeoffs across cancer types compared
with other, more complex and computationally intensive permutation
methods, and also appeared superior to the performance of standard
over-representation analysis procedures. Finally, we analyzed other
large cohorts for thyroid cancer and hepatocellular carcinoma. We
utilized a new consensus metric, the Enrichment Evidence Score (EES),
which showed a remarkable agreement between pathways identified in TCGA
and those from other sources, despite differences in cancer etiology.
This procedure suggests an analysis strategy in which, by running
multiple GSEA modalities in palallel, one may identify a core set of
pathways (namely, those showing a high degree of consensus across
enrichment metrics), which may be complemented by an expanded set of
pathways for exploratory analysis. The EES metric can also be utilized
to identify high-consensus leading-edge genes and integrate pathway-
and gene-level information. In accordance to FAIR Principles for
Research Software (FAIR4RS) [[169]52], we made all source code openly
and publicly available along with detailed step-by-step documentation
[[170]27], with the two-pronged aims (i) to ensure the transparency and
reproducibility of our results, as well as (ii) to enable further
RNA-seq-based benchmarking work by setting up a suitable and flexible
framework. We believe that our work fills the existing gap in current
guidelines and benchmarks for the use of GSEA with RNA-seq data and
that, hopefully, it will stimulate further efforts to advance pathway
analysis methods.
Supporting information
S1 Table. Genes measured and differentially expressed in each TCGA
project.
Number of measured protein-coding genes after all filters were applied
and how many of them were found differentially expressed in either
direction, i.e. over-expressed in primary-tumor (FC > 1) or non-tumor
(FC < 1) tissue, using Bonferroni or Benjamini-Hochberg (B-H) q—value <
0.05 significance thresholds adjusted for multiple testing.
(XLSX)
[171]pone.0302696.s001.xlsx^ (11.3KB, xlsx)
S2 Table. Cancer-type specific pathways.
For each TCGA study, we provide name, size, collection, and GMT file
name for all cancer-type specific preselected pathways. Additional
columns indicate whether preselected pathways were included in the
target and positive-control pathway lists.
(XLSX)
[172]pone.0302696.s002.xlsx^ (34.7KB, xlsx)
S3 Table. Gene ranking scores for all TCGA projects analyzed in our
study.
(XLSX)
[173]pone.0302696.s003.xlsx^ (3.9MB, xlsx)
S4 Table. Enrichment evidence scores of thyroid cancer pathways in
TCGA-THCA and REBC-THYR.
(XLSX)
[174]pone.0302696.s004.xlsx^ (11KB, xlsx)
S5 Table. Enrichment evidence scores of liver cancer pathways in
TCGA-LIHC and MO-HCC.
(XLSX)
[175]pone.0302696.s005.xlsx^ (12.8KB, xlsx)
S6 Table. Enrichment evidence scores for genes in the
LUI_THYROID_CANCER_CLUSTER_1 pathway in TCGA-THCA and REBC-THYR.
(XLSX)
[176]pone.0302696.s006.xlsx^ (11.5KB, xlsx)
S1 Fig. Differentially-expressed genes in TCGA-BRCA.
Volcano plot showing significant genes over-expressed in primary-tumor
(red) or non-tumor (blue) tissue based on Bonferroni-adjusted q—value <
0.05. Labels for the top ten genes on either side are also shown.
(TIF)
[177]pone.0302696.s007.tif^ (2.8MB, tif)
S2 Fig. Comparison of gene ranking scores obtained from differential
gene expression (DGE) assessed via voom quantile normalization vs
log-transformed transcript per million (TPM) normalized counts.
DGE was assessed from 228 paired primary-tumor vs non-tumor breast
cancer samples from TCGA-BRCA. The linear fit is shown by a dashed red
line. Spearman’s correlation estimate and p-value are shown in the
legend.
(TIF)
[178]pone.0302696.s008.tif^ (6.9MB, tif)
S3 Fig. Comparison of pathway ranking scores obtained from differential
gene expression (DGE) assessed via voom quantile normalization vs
log-transformed transcript per million (TPM) normalized counts.
DGE was assessed from 228 paired primary-tumor vs non-tumor breast
cancer samples from TCGA-BRCA. Target pathways were analyzed using GSEA
with different enrichment statistics: (a) classic (unweighted); (b)
weight parameter p = 1; (c) weight parameter p = 1.5; (d) weight
parameter p = 2. Pathway ranking scores were defined as
−log_10(p—value)*sign(ES), where p-values were empirically determined
via gene-set permutations and ES represented gene set enrichment
scores. Because we used 1000 permutations for GSEA’s null model, we
adopted p—value = 0.001 as lower threshold, which implies that pathway
ranking scores are constrained to the [−3, 3] range. Linear fits are
shown by dashed red lines. Spearman’s correlation estimates and
p-values are shown in the legends.
(TIF)
[179]pone.0302696.s009.tif^ (919KB, tif)
S4 Fig. Jaccard index matrix to assess overlaps among 72 TCGA-BRCA
positive control pathways.
(TIF)
[180]pone.0302696.s010.tif^ (19MB, tif)
S5 Fig. Fractional pairwise overlap matrix to assess redundancies among
72 TCGA-BRCA positive control pathways.
(TIF)
[181]pone.0302696.s011.tif^ (18.7MB, tif)
S6 Fig. Significant TCGA-BRCA positive control pathways across
different weight parameter choices.
Gene-set permutation with p—value < 0.001.
(TIF)
[182]pone.0302696.s012.tif^ (1.3MB, tif)
S7 Fig. Comparison of significant TCGA-BRCA target pathways from paired
vs unpaired analyses using gene-set permutation approaches for
different enrichment statistics.
The significance criterion was p—value < 0.05. (a) Classic
(unweighted). (b) Weight parameter p = 1. (c) Weight parameter p = 1.5.
(d) Weight parameter p = 2.
(TIF)
[183]pone.0302696.s013.tif^ (1.5MB, tif)
S8 Fig. Comparison of significant TCGA-BRCA target pathways from paired
vs unpaired analyses using phenotype permutation approaches for
different enrichment statistics.
The significance criterion was p—value < 0.05. (a) Classic
(unweighted). (b) Weight parameter p = 1. (c) Weight parameter p = 1.5.
(d) Weight parameter p = 2.
(TIF)
[184]pone.0302696.s014.tif^ (1.2MB, tif)
S9 Fig. ROC curves for different GSEA and ORA approaches using 109
TCGA-BRCA target pathways and 1000 randomized negative controls.
(a) Gene-set permutation GSEA. (b) Phenotype permutation GSEA. (c)
Signed ORA. (d) Unsigned ORA. GSEA approaches used different enrichment
statistics, as indicated. ORA approaches used Bonferroni and
Benjamini-Hochberg (B-H) adjusted p-values as different inclusion
criteria to select differentially expressed genes, as indicated.
(TIF)
[185]pone.0302696.s015.tif^ (2.3MB, tif)
S10 Fig. AUC across TCGA projects for different GSEA and ORA approaches
using cancer-type-specific target pathways and 1000 randomized negative
controls.
(a) Gene-set permutation GSEA. (b) Phenotype permutation GSEA. (c)
Signed ORA. (d) Unsigned ORA. GSEA approaches used different enrichment
statistics, as indicated. ORA approaches used Bonferroni and
Benjamini-Hochberg (B-H) adjusted p-values as different inclusion
criteria to select differentially expressed genes, as indicated.
(TIF)
[186]pone.0302696.s016.tif^ (3.4MB, tif)
S11 Fig. ROC curves for gene-set permutation GSEA from TCGA-BRCA
samples.
Results obtained by an alternative gene expression analysis derived via
the edgeR-voom-limma pipeline described in Sect. 2.2.
(TIF)
[187]pone.0302696.s017.tif^ (7.7MB, tif)
S12 Fig. AUC for gene-set permutation GSEA across TCGA projects.
Results obtained by an alternative gene expression analysis derived via
the edgeR-voom-limma pipeline described in Sect. 2.2.
(TIF)
[188]pone.0302696.s018.tif^ (7.6MB, tif)
Acknowledgments