10 common mistakes that could ruin your enrichment analysis

Anusuiya Bora1,2 & Mark Ziemann1,2

  1. Deakin University, School of Life and Environmental Sciences, Geelong, Australia.

  2. Burnet Institute, Melbourne, Australia

(*) Correspondence:

Author ORCID
Mark Ziemann 0000-0002-7688-6974
Anusuiya Bora 0009-0006-2908-1352

Target journal: F1000 Research

Abstract

Functional enrichment analysis (FEA) is an incredibly powerful way to summarise complex genomics data into information about the regulation of biological pathways including cellular metabolism, signalling and immune responses. About 10,000 scientific articles describe using FEA each year, making it among the most used techniques in bioinformatics. While FEA has become a routine part of workflows via myriad software packages and easy-to-use websites, mistakes can easily creep in due to poor tool design and unawareness among users of pitfalls. Here we outline 10 mistakes that undermine the effectiveness of FEA which we commonly see in research articles. We provide practical advice on their mitigation.

Background

PubMed searches indicate keywords like “Pathway analysis” and “Enrichment analysis” appear in title or abstracts of approximately 10,000 articles per year, and that the number of articles matching these keywords has increased by a factor of 5.4 between 2014 and 2024. The purpose of FEA is to understand whether gene categories are collectively differentially represented in the molecular profile at hand, and involves querying hundreds or thousands of functional categories which commonly represent gene pathways or ontologies. There are a variety of methods for FEA, but the main two are over-representation analysis (ORA) and functional class scoring (FCS) (1). ORA involves selecting genes based on a hard cut-off followed by a test of enrichment (e.g.: Fisher’s exact test) as compared to a background list (2). Popular ORA tools include websites like DAVID (3) and software packages like ClusterProfiler (4). FCS involves ranking all detected genes followed by a test to asses whether the distribution of scores deviates towards the up- or down-regulated directions. GSEA (5) is a stand-alone FCS software with a graphical user interface, and there are several command-line implementations such as fgsea (6).

Recommendations on correct application of pathway enrichment have been previously published (7–10), yet we and others continue to observe blatant mistakes and methodological deficiencies appearing in peer-reviewed publications at an alarming rate (11,12). The purpose of this opinion article is to share what our group has learned about successful FEA over the past 15 years having authored dozens of articles using it and critically examining hundreds of published articles using the method.

1. Using uncorrected p-values for statistical significance

AB

Figure 1A. Impact of FDR correction of p-values on the number of 'significant' gene sets.

Figure 1A. Impact of FDR correction of p-values on the number of ‘significant’ gene sets.

Figure 1B. FDR control reduces false positives.

Figure 1B. FDR control reduces false positives.

2. Not using a background gene list

AB

Figure 2A. Impact of background list on the number of significant gene sets. Example dataset.

Figure 2A. Impact of background list on the number of significant gene sets. Example dataset.

Figure 2B. Impact of background list on the number of significant gene sets. 100 simulations.

Figure 2B. Impact of background list on the number of significant gene sets. 100 simulations.

Figure 2C. Gene sets consistently appearing as false positives include those related to cancer.

Figure 2C. Gene sets consistently appearing as false positives include those related to cancer.

3. Using a tool that doesn’t report enrichment scores

FDR values can tell us whether something is statically significant, but it doesn’t directly indicate whether there will be any biological impact. For that, we need some measure of effect size. In enrichment analysis, we can use an enrichment score as a proxy measure of effect size. For rank-based tools like GSEA, the enrichment score varies from -1 to +1 denoting the distribution of genes in a gene set relative to all other genes. A score of 1.0 would mean that X genes in the set are the X most upregulated, while if a value is close to 0, it means the distribution of genes is close to what you might get by random chance. For over-representation methods like DAVID, the fold enrichment score is often quoted, which is the odds ratio of genes of a gene set in the list as compared to the background. Unfortunately, DAVID doesn’t provide the fold enrichment scores in the main results page, they are only available in the table for download. Many other common tools don’t even calculate the enrichment scores (looking at you clusterProfiler), which leaves researchers in the dark about their effect sizes. Tools that do provide enrichment scores include ShinyGO (web) (13), GSEA (5) and fgsea (fora) (6).

4. Prioritising results solely by p-value

Pathway enrichment analysis can return hundreds of significant results, which can be confusing to interpret. Many tools by default will sort the results by significance, but this can lead you to prioritise pathways that are very large but where each gene is only slightly dysregulated. To demonstrate this, see the results from a typical pathway enrichment analysis result with p-value prioritisation and with enrichment score prioritisation after removal of non-significant pathways (Table 1 and Table 2). P-value prioritisation emphasises generic functions with really large gene sets, while enrichment score prioritisation highlights much smaller gene sets with highly specific functions, where each member gene is showing a relatively bigger change in expression. These more specific gene sets are in general better candidates for downstream validation due to their explanatory power.

Table 1. Top upregulated pathways when prioritised by FDR.
set setSize pANOVA s.dist p.adjustANOVA
Cell Cycle 589 1.4e-47 0.35 2.6e-44
Metabolism of RNA 725 4.4e-46 0.31 4.0e-43
Cell Cycle, Mitotic 479 1.5e-39 0.35 9.1e-37
Cell Cycle Checkpoints 246 2.1e-27 0.40 9.6e-25
M Phase 336 9.5e-27 0.34 2.9e-24
Mitotic Prometaphase 189 4.6e-25 0.44 1.1e-22
Mitotic Metaphase and Anaphase 208 8.9e-24 0.40 1.8e-21
Mitotic Anaphase 207 2.3e-23 0.40 4.3e-21
Processing of Capped Intron-Containing Pre-mRNA 273 6.6e-23 0.35 1.1e-20
Resolution of Sister Chromatid Cohesion 114 1.0e-20 0.51 1.6e-18
Table 2. Top upregulated pathways when prioritised by ES. Pathways with FDR>0.05 were excluded.
set setSize pANOVA s.dist p.adjustANOVA
Activation of NOXA and translocation to mitochondria 5 1.1e-03 0.84 6.9e-03
Condensation of Prometaphase Chromosomes 11 2.5e-06 0.82 3.5e-05
Postmitotic nuclear pore complex (NPC) reformation 27 1.8e-11 0.75 6.4e-10
Phosphorylation of Emi1 6 1.6e-03 0.75 8.8e-03
Interactions of Rev with host cellular proteins 37 6.8e-15 0.74 5.2e-13
Nuclear import of Rev protein 34 1.7e-13 0.73 9.9e-12
Rev-mediated nuclear export of HIV RNA 35 1.0e-13 0.73 6.3e-12
Transport of Ribonucleoproteins into the Host Nucleus 32 2.1e-12 0.72 1.0e-10
Export of Viral Ribonucleoproteins from Nucleus 32 2.9e-12 0.71 1.3e-10
NEP/NS2 Interacts with the Cellular Export Machinery 32 2.9e-12 0.71 1.3e-10

5. Using gene lists that are too large or too small for ORA

It’s a common misconception that only differentially expressed genes that meet the FDR threshold should be submitted to an enrichment test, but this simply isn’t true. So long as you’re using proper FDR control of your pathways (See #1 above), you can select genes in any arbitrary way your like. The caveat here is that enrichment tests (like hypergeometric method) have size ranges of input gene lists that work best. We tested different input gene list sizes in ORA and found that 2500 yielded the most (456 with FDR<0.0), while sizes 200 and less yielded very few significant pathways (Figure 3). In the range of 300-1000, there’s a steep increase in the number of significant pathways, and after that the gradient reduces. This suggests a sweet spot around 1000, which in our example is 7.6% of the 13,168 genes detected. If you want to avoid making arbitrary thresholds (which seem to annoy reviewers) then we’d suggest using a method like GSEA instead that calculates enrichment from all detected genes.

Figure 3. Effect of gene list size on number of significant pathways. Up-regulated in red, down-regulated in blue.

Figure 3. Effect of gene list size on number of significant pathways. Up-regulated in red, down-regulated in blue.

6. Combining up and down-regulated genes in the same ORA test

In some articles we’ve read, we noticed that authors didn’t conduct separate ORA tests for up- and down-regulated gene lists, instead opting to submit the combined list for ORA. This isn’t necessarily an error, as it tests the hypothesis that some pathways are “dysregulated,” a mix of up- and down-regulated genes, that appear at an elevated rate. The combined approach can miss a lot of results as compared to the separate approach. According to the results of our example analysis the separate approach identified 355 pathways and the combined approach found only 149, that’s 58% less (see Fig XX). The combined approach could uniquely identify some pathways, but this is relatively few. In the example dataset, only 2.2% of results were identified exclusively with the combined test.

The reason behind this is two-fold. Firstly, we know that genes in the same pathway are typically correlated with each other (14). Consider cell cycle genes, or genes responding to pathogens, which are activated in unison to coordinate a complex biological process. In a typical differential expression experiment after a stimulus, this results in pathways that are predominantly up or down regulated, but rarely a mix of up and down. Due to this phenomenon, the up and down lists each have relatively strong enrichments, but they are diluted when combined (15). Based on this, failing to report data of the separate approach could leave you 58% fewer results, and an incomplete picture of what’s happening at the molecular level.

Figure 6. Combining up and downregulated genes into one ORA test yields far fewer results.

Figure 6. Combining up and downregulated genes into one ORA test yields far fewer results.

7. Bad visualisation

AB

8. Using shallow gene annotations

One of the most important decisions you’ll make when doing a pathway enrichment analysis is selecting the database to query. There are many options to consider, both proprietary and open source. When choosing, consider whether the database contains the gene sets that you a priori suspect will be altered in the profile you’re looking at. Secondly consider the breadth and depth of the pathway library, this will be where the unexpected discoveries may occur, and it pays to use a comprehensive one to capture as many aspects of your data as possible. To demonstrate this, see how KEGG legacy and KEGG Medicus seem tiny when compared to Reactome, which is itself dwarfed by Gene Ontology’s Biological Process (GOBP; Table 3). Consequently, the results obtained are substantially richer for Reactome and GOBP as compared to KEGG libraries.

Table 3. Size metrics of selected gene set libraries, and the number of differentially regulated pathways (FDR<0.05).
No. gene sets Total no. annotations Median gene set size No. genes with ≥1 annotation Up-regulated Down-regulated
KEGG 186 12800 52.5 5245 11 51
KEGGM 658 9662 11.5 2788 20 18
Reactome 1787 97544 23.0 11369 165 117
GOBP 7583 616242 20.0 18000 340 1416
MSigDB 35134 4089406 47.0 43351 3214 7217

9. Using outdated gene identifiers and gene sets

One great thing about working with omics data is that databases like GEO are loaded with old data sets that we can reanalyse with new pathway databases and software tools to eke out further insights. When the data is several years old, we should use the processed data with caution, as many gene names may have since changes. For example, Illumina’s EPIC DNA Methylation microarray was released in 2016, and in the following eight years, 3,253 of 22,588 gene names have changed (14.4%) (16), meaning that those genes wouldn’t be recognised by the pathway enrichment software. To update defunct gene symbols, the HGNChelper R package can help (17), and it has the benefit of fixing gene symbols ruined by Excel autocorrect, which are unfortunately common in GEO (18). Persistent gene identifiers like Ensembl (eg: ENSG00000180096) and HGNC (eg: HGNC:2879) are less likely to change over time and would therefore be preferable over gene symbols (eg: SEPTIN1) for FEA.

The depth of pathway databases increases every year as annotation consortia continue assimilating functional information from the literature. Sometimes these increases are fairly large, as shown by the chart of Reactome growth below (Figure XX). To be certain you have the best possible gene annotation for your analysis, it’s always best to download the newest version.

Figure XX. Reactome gene set growth. Gene sets were downloaded from the MSigDB website, except The last bar which represents the latest gene sets downloaded directly from Reactome, not yet incorporated into MSigDB.

Figure XX. Reactome gene set growth over time. Gene sets were downloaded from the MSigDB website, except The last bar which represents the latest gene sets downloaded directly from Reactome, not yet incorporated into MSigDB.

Figure XX. Reactome gene set growth over time. Gene sets were downloaded from the MSigDB website, except The last bar which represents the latest gene sets downloaded directly from Reactome, not yet incorporated into MSigDB.

10. Not providing detailed methodological info

AB

Conclusion

More subtle issues:

  • Karp, P.D., Midford, P.E., Caspi, R. et al. Pathway size matters: the influence of pathway granularity on over-representation (enrichment analysis) statistics. BMC Genomics 22, 191 (2021). https://doi.org/10.1186/s12864-021-07502-8

  • Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression Gu Mi ,Yanming Di,Sarah Emerson,Jason S. Cumbie,Jeff H. Chang Published: October 2, 2012 https://doi.org/10.1371/journal.pone.0046128

  • Analyzing gene expression data in terms of gene sets: methodological issues Jelle J Goeman 1, Peter Bühlmann Affiliations Expand PMID: 17303618 DOI: 10.1093/bioinformatics/btm051

  • PLoS Biol. 2019 Nov 12;17(11):e3000481. doi: 10.1371/journal.pbio.3000481 Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias Shir Mandelboum 1,2, Zohar Manber 3, Orna Elroy-Stein 1,2,*, Ran Elkon 2,3,

  • On the influence of several factors on pathway enrichment analysis Briefings in Bioinformatics, Volume 23, Issue 3, May 2022, bbac143, Sarah Mubeen, Alpha Tom Kodamullil, Martin Hofmann-Apitius, Daniel Domingo-Fernández

The natural selection of bad science Paul E. Smaldino and Richard McElreath Published:01 September 2016 https://doi.org/10.1098/rsos.160384

Other notes

  • It’s official. ORA is not as sensitive as FCS

  • Showing too many results

  • Not distinguishing results from multiple tools

Bibliography

1.
Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Comput Biol. 2012;8(2):e1002375.
2.
Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet. 1999;22(3):281–5.
3.
Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, et al. DAVID: A web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50(W1):W216–21.
4.
Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb). 2021;2(3):100141.
5.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
6.
Korotkevich G, Sukhov V, Budin N, Shpak B, Artyomov NM, Sergushichev A. Fast gene set enrichment analysis. bioRxiv. 2021.
7.
Tilford CA, Siemers NO. Gene set enrichment analysis. Methods Mol Biol. 2009;563:99–121.
8.
Tipney H, Hunter L. An introduction to effective use of enrichment analysis software. Hum Genomics. 2010;4(3):202–6.
9.
Chicco D, Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput Biol. 2022;18(8):e1010348.
10.
Zhao K, Rhee SY. Interpreting omics data with pathway enrichment analysis. Trends Genet. 2023;39(4):308–19.
11.
Timmons JA, Szkop KJ, Gallagher IJ. Multiple sources of bias confound functional enrichment analysis of global -omics data. Genome Biol. 2015;16(1):186.
12.
Wijesooriya K, Jadaan SA, Perera KL, Kaur T, Ziemann M. Urgent need for consistent standards in functional enrichment analysis. PLoS Comput Biol. 2022;18(3):e1009935.
13.
Ge SX, Jung D, Yao R. ShinyGO: A graphical gene-set enrichment tool for animals and plants. Bioinformatics. 2020;36(8):2628–9.
14.
Gatti DM, Barry WT, Nobel AB, Rusyn I, Wright FA. Heading down the wrong pathway: On the influence of correlation within gene sets. BMC Genomics. 2010;11(1):574.
15.
Hong G, Zhang W, Li H, Shen X, Guo Z. Separate enrichment analysis of pathways for up- and downregulated genes. J R Soc Interface. 2014;11(92):20130950.
16.
Ziemann M, Abeysooriya M, Bora A, Lamon S, Kasu MS, Norris MW, et al. Direction-aware functional class scoring enrichment analysis of infinium DNA methylation data. Epigenetics. 2024;19(1):2375022.
17.
Oh S, Abdelnabi J, Al-Dulaimi R, Aggarwal A, Ramos M, Davis S, et al. HGNChelper: Identification and correction of invalid gene symbols for human and mouse. F1000Res. 2020;9:1493.
18.
Ziemann M, Eren Y, El-Osta A. Gene name errors are widespread in the scientific literature. Genome Biol. 2016;17(1).