Abstract

   Introduction: Gene set enrichment analysis (GSEA) subsequent to
   differential expression analysis is a standard step in transcriptomics
   and proteomics data analysis. Although many tools for this step are
   available, the results are often difficult to reproduce because set
   annotations can change in the databases, that is, new features can be
   added or existing features can be removed. Finally, such changes in set
   compositions can have an impact on biological interpretation.

   Methods: We present bootGSEA, a novel computational pipeline, to study
   the robustness of GSEA. By repeating GSEA based on bootstrap samples,
   the variability and robustness of results can be studied. In our
   pipeline, not all genes or proteins are involved in the different
   bootstrap replicates of the analyses. Finally, we aggregate the ranks
   from the bootstrap replicates to obtain a score per gene set that shows
   whether it gains or loses evidence compared to the ranking of the
   standard GSEA. Rank aggregation is also used to combine GSEA results
   from different omics levels or from multiple independent studies at the
   same omics level.

   Results: By applying our approach to six independent cancer
   transcriptomics datasets, we showed that bootstrap GSEA can aid in the
   selection of more robust enriched gene sets. Additionally, we applied
   our approach to paired transcriptomics and proteomics data obtained
   from a mouse model of spinal muscular atrophy (SMA), a
   neurodegenerative and neurodevelopmental disease associated with
   multi-system involvement. After obtaining a robust ranking at both
   omics levels, both ranking lists were combined to aggregate the
   findings from the transcriptomics and proteomics results. Furthermore,
   we constructed the new R-package “bootGSEA,” which implements the
   proposed methods and provides graphical views of the findings.
   Bootstrap-based GSEA was able in the example datasets to identify gene
   or protein sets that were less robust when the set composition changed
   during bootstrap analysis.

   Discussion: The rank aggregation step was useful for combining
   bootstrap results and making them comparable to the original findings
   on the single-omics level or for combining findings from multiple
   different omics levels.

   Keywords: bootstrap analysis, gene set enrichment analysis, multi-omics
   analysis, proteomics, rank aggregation, transcriptomics

1 Introduction

   Set-based enrichment methods are an integral part of the analysis of
   high-throughput expression data, such as those originating from
   transcriptomics or proteomics experiments. Enrichment methods allow the
   identification of molecular pathways, Gene Ontology (GO) terms, and
   other gene sets that might play a role in the disease of interest. Most
   enrichment methods are subsequently conducted for differential
   expression analysis; that is, they rely on the ranking of genes after
   comparing two groups of samples. Statistical tests are used to
   determine whether the genes of a particular set are disproportionately
   highly enriched among the differentially expressed genes (DEGs)
   ([34]Beissbarth and Speed, 2004; [35]Subramanian et al., 2005;
   [36]Ackermann and Strimmer, 2009). This contrasts with self-contained
   gene set tests, which are based on subsets of expression data related
   to a particular gene set ([37]Goeman et al., 2004; [38]Hummel et al.,
   2008; [39]Jung et al., 2011; [40]Bayerlová et al., 2015).

   Set information for enrichment analyses is usually obtained from public
   databases, for example, on molecular pathways or GO terms. The most
   commonly used databases are the “Reactome pathway knowledgebase”
   ([41]Fabregat et al., 2016) ([42]https://reactome.org/), “Kyoto
   Encyclopedia of Genes and Genomes” ([43]Kanehisa and Goto, 2000) (KEGG;
   [44]https://www.genome.jp/kegg/), “WikiPathways” database ([45]Kelder
   et al., 2012) ([46]https://www.wikipathways.org), “GO” knowledgebase
   ([47]Consortium, 2004) ([48]http://geneontology.org/), and the
   “Molecular Signatures Database” ([49]Liberzon et al., 2011) (MSigDB;
   [50]https://www.gsea-msigdb.org/gsea/msigdb). In the GO database, a
   particular GO term comprises a set of genes that can be assigned to a
   biological process (BP), molecular function (MF), or cellular component
   (CC).

   The contents of the databases are curated either automatically by
   computer algorithms or manually by experts ([51]Consortium, 2004).
   Specifically, WikiPathways provides community-based curation by
   registered contributors ([52]Kelder et al., 2012). An example of a
   database where curation is done both ways, manually and
   computationally, is the MSigDB. Furthermore, pathway membership can be
   experimentally validated or predicted computationally. However, none of
   the modes of curation can prevent the uncertainties remaining regarding
   the membership of individual genes to a particular pathway ([53]Gillis
   and Pavlidis, 2013). This is important because all enrichment analyses
   rely on the correctness of the database information, and the results of
   enrichment analyses would change if features of a set are removed or
   added. This can especially happen when the database information is
   retrieved at different times. For example, the GO database contained
   42,442 terms classified as valid and 5,287 classified as obsolete in
   January 2024. Two months before, only 4,889 terms were classified as
   obsolete, meaning that nearly 400 terms would have to be reconsidered
   when a Gene-set enrichment analysis (GSEA) is performed after January
   2024. In addition, the WikiPathways database reports roughly between
   100 and 700 edits per month. Furthermore, in the KEGG database,
   complete pathways can be merged, leading to a large number of changes.
   For example, the KEGG pathway map00471 has been deleted and then added
   to the KEGG pathway map00470 (“D-amino acid metabolism”).

   In this work, we present bootGSEA as a novel bootstrap approach to
   repeatedly sample subsets of pathways or other gene sets to study
   whether a result remains significant when the set composition is
   changed. The ranking lists of the gene sets obtained from each
   bootstrap replicate were aggregated using a score that can be used for
   a new ranking list. The analyst can then compare the original ranking
   with the bootstrap-based ranking list to study whether the association
   of a pathway or GO term with the disease gains or loses evidence. A
   similar approach was proposed by [54]Schmid et al. (2016), who
   generated a robustness score for each gene set using random subsets of
   gene sets. In contrast to their approach, our method results in a new
   ranking of gene sets that can be helpful in aggregating findings from
   different independent studies or different omics levels. Thus, our
   approach for multi-omics follows the idea of aggregating the different
   omics levels after performing primary analysis on the individual levels
   first. This way of multi-omics analysis has also been implemented in
   other studies. For example, [55]Wang et al. (2014) fused networks that
   were first derived on individual omics levels, [56]Xiong et al. (2012)
   integrated genetic and transcriptomics results in a joint score, and
   [57]Kang et al. (2022) fitted neural networks from individual omics
   levels and merged them into a joint model.

   We also demonstrated that this method is useful for aggregating the
   results of enrichment analysis from different omics domains in the same
   experiment. We applied our new bootstrap pipeline to a single-omics
   scenario (transcriptomics only) comprising six independent renal cancer
   datasets. This example was used to show how our bootstrap pipeline can
   help study the robustness of GSEA when comparing results from multiple
   independent datasets from studies on a similar research question. In
   addition, we analyzed our multi-omics kidney data (transcriptomics and
   proteomics) from our research consortium on spinal muscular atrophy
   (SMA). The data were obtained from a SMA mouse model to demonstrate the
   usefulness of our approach when comparing GSEA results between
   different omics levels. SMA is a monogenic disease caused by the
   mutation or deletion of the survival motor neuron 1 (SMN1) gene
   ([58]Lefebvre et al., 1995). The disease is characterized by the
   degeneration of motoneurons, with the subsequent atrophy of skeletal
   muscles to muscular atrophy since the SMN affects all tissues, which
   also include non-skeletal muscles. Moreover, SMA is a multi-system
   disorder that also affects peripheral organs, such as the kidney
   ([59]Allardyce et al., 2020). Three treatment methods are available,
   all increasing SMN expression. The SMN is expressed ubiquitously and
   has several important cellular functions, including snRNP assembly,
   R-loop resolution, and regulation of the actin cytoskeleton and
   translation ([60]Hensel et al., 2020). Therefore, SMA is a highly
   complex disease with expected dysregulations in pathways in several
   cell types and on several molecular levels.

2 Methods

   In this section, we describe the analysis pipeline, including the
   approach for the bootstrap step used to repeatedly analyze different
   random subsets of the data. Furthermore, the rank aggregation step and
   examples of transcriptomics and proteomics data are presented. The
   complete workflow is shown in [61]Figure 1. All the analyses were
   implemented in the R programming environment [[62]www.r-project.org,
   version 4.2 ([63]R Core Team, 2022)].

FIGURE 1.

   [64]FIGURE 1
   [65]Open in a new tab

   Workflow of the analysis pipeline for bootstrap enrichment analysis at
   the single-omics level and the steps for single- and multi-omics rank
   aggregation. For the data obtained from each omics domain
   (transcriptomics and proteomics data), Gene-set enrichment analysis
   (GSEA) is performed after differential expression analysis using all
   data (original standard analysis). Next, gene set enrichment analysis
   is performed on subsets of data based on bootstrap samples. Finally,
   the different rankings of the GO terms or pathways can be integrated
   not only within each omics level but also across all omics levels. The
   robustness of gene set enrichment analysis can be studied by comparing
   the original results with either single- or multi-omics results.

2.1 Differential expression analysis and bootstrap method for Gene-set
enrichment analysis

   Prior to enrichment analysis, differential expression analysis was
   performed on normalized expression data from different groups of
   interest (for example, disease vs. control) to obtain the ranks of
   genes and proteins. For the microarray and proteomics data,
   normalization was performed using the quantile method ([66]Bolstad et
   al., 2003; [67]Välikangas et al., 2018; [68]Zhao et al., 2020), and for
   the RNA-seq data, the internal normalization of the DESeq method of the
   R package “DESeq2” was used ([69]Love et al., 2014). Differential
   analysis of the proteomics data was performed using the functionality
   of the R package “limma” ([70]Ritchie et al., 2015), and differential
   analysis of the RNA-seq data was done using the R package “DESeq2”
   (spinal muscular atrophy data) and microarray data using “limma”
   ([71]Ritchie et al., 2015) (renal cell carcinoma datasets).Next,
   enrichment analysis based on the results of the differential expression
   analysis was initially performed using the complete dataset, that is,
   using all genes or proteins that were assigned to a particular gene set
   according to the database information. We denote this as the original
   analysis. Gene sets were defined based on pathway data from the KEGG,
   Reactome, and WikiPathways databases, as well as GO terms. KEGG,
   Reactome, and WikiPathways enrichment analyses were performed using the
   R package “clusterProfiler” ([72]Wu et al., 2021), and GO term
   enrichment analyses were performed using the R package “topGO”
   ([73]Alexa and Rahnenführer, 2009). The enrichment analyses in
   “clusterProfiler” implement the methods described by [74]Subramanian et
   al. (2005), which are independent of thresholds for differentially
   expressed features. In contrast, the “topGO” method uses a threshold
   but has the advantage of incorporating information about the hierarchy
   of GO terms.

   To study the variability and robustness of the outcome of the
   enrichment analysis, B bootstrap samples were drawn using only 95% of
   all the genes in each run. The genes were randomly drawn without
   replacement. GSEA was repeated for these randomly selected subsets of
   genes B times, where B is the number of times the whole set was
   resampled. The composition of the defined gene sets changed when
   bootstrapping from the gene sets was performed. Consequently, the
   composition of the gene sets was different for each bootstrap run.
   Thus, the effect of individual genes was also reflected in this
   approach. From this bootstrap procedure, B ranking lists of the gene
   sets were obtained.

2.2 Rank aggregation for single- and multi-omics analyses

   The resulting enrichment analysis from the B bootstrap runs with B
   lists of GO terms was aggregated using the R package “RobustRankAggreg”
   ([75]Kolde et al., 2012). The aggregation score for each pathway was
   obtained based on the number of occurrences and the ranks from each
   bootstrap run. The aggregation score was further transformed into a
   rank for each pathway. To study the robustness of the original
   findings, the rank obtained from the aggregated score can be compared
   with the actual analysis, that is, the analysis without a bootstrap
   step.

   For multi-omics data, original and bootstrap enrichment analyses were
   first performed for each omics domain, resulting in one list of
   aggregated ranks per domain. The aggregated scores from each omics
   domain were further aggregated. In one of our data examples, enrichment
   analysis was performed separately for the transcriptomics and
   proteomics data, and both ranking lists were aggregated into one final
   ranking list. Thus, the final multi-omics score for each pathway or GO
   term was obtained.

2.3 R package: bootGSEA

   The workflow shown in [76]Figure 1 has been compiled and implemented in
   the new R package “bootGSEA” available at the GitHub repository
   ([77]https://github.com/klausjung-hannover/bootGSEA). The input
   requires the results of differential expression analysis. The package
   currently has eight functions. The functions boot.GO and boot.pathway
   are used for GO and pathway enrichment analyses, respectively, of the
   complete data (original analysis) and of bootstrapped data samples, and
   aggr.boot.GO and aggr.boot.pathway are used for the rank aggregation of
   pathways obtained from the former functions. In the functions boot.GO
   and boot.pathway, the user can specify which percentage of features
   should be drawn during the bootstrap runs. Furthermore, to understand
   the robustness of pathways at a broader level, we used a multi-omics
   approach by aggregating ranks from individual omics levels using the
   aggr.multiomics function. In addition, three functions are provided to
   visualize these results and study the robustness of the findings.
   Examples of these visualizations are presented in [78]Section 3. The
   function compareRank was implemented to compare the original and
   bootstrapped results at a single-omics level, the function plotRank,
   for both single- and multi-omics levels, and the function histDiff to
   understand the rank difference between original and bootstrap analyses.

2.4 Example data 1: transcriptomics data from a renal cancer study

   The gene expression profiles of renal cell carcinoma (RCC) datasets
   ([79]GSE6344 ([80]Copland, 2008), [81]GSE14762 ([82]Dykema and Furge,
   2009), [83]GSE11024 ([84]Kort, 2008), [85]GSE14994 ([86]Signoretti and
   Beroukhim, 2010), [87]GSE53757 ([88]John Copland et al., 2014), and
   [89]GSE15641 ([90]Jones et al., 2005)) were downloaded from the Gene
   Expression Omnibus (GEO) database using the GEOquery ([91]Davis and
   Meltzer, 2007) package in the R platform. Detailed information about
   the datasets, including platform and sample size, is given in [92]Table
   1. Differential expression analysis was performed using the limma
   ([93]Ritchie et al., 2015) package in R. DEGs were screened based on
   FDR-adjusted p-values
   [MATH: <mo><</mo> :MATH]
   0.05 as the cut-off value for all datasets. The results from the
   differential analysis were further analyzed following the pipeline
   shown in [94]Figure 1 at the single-omics level.

TABLE 1.

   Renal cell carcinoma (RCC) datasets with accession numbers from the GEO
   database, sample sizes in the normal and tumor groups, and references