A recipe for extremely reproducible enrichment analysis (and why we need it)

Mark Ziemann1*, Anusuiya Bora2

Affiliations

  1. Deakin University, Geelong, Australia, School of Life and Environmental Sciences.

  2. Vellore Institute of Technology, Vellore, India.

(*) Corresponding author:

Abstract

Unreliable and irreproducible research is a significant problem that wastes resources and risks undermining the public perception of science. Previous work has highlighted that published enrichment analyses frequently suffer from statistical and reporting flaws. We sought to determine whether this translates into irreproducibility by examining whether the findings of 20 open-access articles describing enrichment analysis were reproducible. We find that only three articles exhibited a high degree of concordance, while seven exhibited poor reproducibility. The findings indicate that current enrichment analysis practices and reporting are inadequate. To address this, we provide a set of template scripts and a step-by-step protocol to adapt them to a new project’s enrichment analysis needs. This recipe is built according to the “five pillars” framework, which facilitates extreme reproducibility.

Introduction

Reproducibility is essential to the scientific enterprise by confirming the validity of new discoveries [1]. According to a 2016 survey of Nature readers, 52% of respondents agreed that there was a “significant crisis” of reproducibility [2]. While bioinformatics and other branches of computational research are theoretically amenable to complete reproducibility, in reality this is rarely achieved due to lack of detail provided in methods sections, lack of shared data and lack of code [3]. Indeed, even if data and code are shared, reproducibility still cannot be guaranteed due to the differences in computing environments over time, and the availability of dependencies and web-tools [4].

To address this, many articles have been written to endorse the use of “virtual machines” or “containerisation” to control the computing environment including the operating system and dependencies along with the research code to facilitate a high degree of reproducibility [4–17]. Containers also allow for an easy installation process, as all working dependencies are shipped together and can be installed with a single command.

In addition to shared code, data and containerisation, there are a few other ways researchers can make their work more reproducible. Firstly, thorough documentation makes the reproducibility process a great deal easier and more accessible to novice bioinformaticians. Version control keeps track of code and documentation changes over time and manages the edits made by multiple contributors. Literate programming using R Markdown or Jupyter notebooks provides optimal documentation by embedding free text explanations together with the code and outputs such as charts and tables. Secondly, code and data can be linked. This means that the code “knows” the location of the data, avoiding any “file not found” errors [3]. Lastly, reanalysis can be made as simple as possible by creating a master script that executes analysis for each part sequentially, making the entire manuscript reproducible by executing just one or a few commands, and without any manual steps [18]. Taken together, computational reproducibility is a spectrum [19], and the five pillars framework has been described as a way of achieving extreme reproducibility [20]. Currently, bioinformatics work at this gold standard is rare, despite the availability of such tools for over a decade.

In this article, we focus on enrichment analysis, which is one of the most used methods in computational biology, and involves the summarisation of omics data to reflect biological changes, such as the detection of pathway activation in development or disease [21,22]. As of August 2020, this type of analysis was cited 131,332 times [23]. The most popular tool for enrichment analysis appears to be DAVID [24]. Its 11 related journal articles have received a total of 37,035 citations from 27,129 separate PubMed articles (as at 29th June 2023). Examination of related keywords in PubMed suggests the number of articles describing “enrichment analysis” is doubling every two years, as omics analysis becomes accessible to more researchers around the world.

These observations highlight that these tools are essential in the interpretation of omics data. However, there are concerns that inadvertent misuse of these methods leads to unreliable results. These potential problems include (i) the lack of a correct background for over-representation tests, (ii) lack of p-value correction for multiple testing and (iii) lack of reporting detail [25–28]. Without comprehensive methodological documentation and reproducibility, it isn’t possible to verify the validity of the methods and results. We are concerned about the reliability of published enrichment analyses, as web-based and point-and-click tools appear to be far more popular than computer script-based methods for this type of work [27]. Although web-based tools are convenient, they are a potential reproducibility concern. Firstly, incomplete reporting of methods, options and parameters is common. Secondly, algorithms and functional annotation sets undergo regular updates, and previous versions are typically not made available, hampering later reproducibility. In some cases, the entire web resource can become unavailable, and hence irreproducible due to a phenomenon known as link decay [29].

A small number of enrichment analysis applications have been developed to have beneficial properties of being “containerised” and with archived historical versions [30,31]. This helps reproducibility greatly but is not a complete solution as it lacks information about how the input data was used and a record of any parameter selections.

A solution to this is to use containerisation to provide a complete scripted workflow with prescribed links to research data, external databases and complete record of parameter selection. When designed correctly, these containers encapsulate bioinformatics workflows and could ensure their computational reproducibility for decades to come with a few simple commands. There are multiple barriers to entry for such workflows including (i) the added difficulty compared to existing point-and-click solutions; (ii) time poorness of researchers to learn new skills; and (iii) a lack of step-by-step guides and templates aimed at non-expert bioinformaticians.

We address this firstly by providing working templates for Docker image and R Markdown scripts for common transcriptome and enrichment analysis routines. Secondly, we provide step-by-step written and video guides to help users customise these templates for their own work, deploy the analysis, verify the results, and share the materials.

However, before recommending or prescribing such workflows, it is necessary to assess directly whether current practices are adequate, by trying to reproduce the enrichment analyses of previously published studies, and examining whether the conclusions of those articles are supported or not. In the following section we investigate the computational reproducibility of enrichment analyses shown in a sample of 20 open-access journal articles published in 2019. This should provide a sound basis for recommendations.

A systematic assessment of reproducibility in enrichment analysis

Methods

From a list of 186 PubMed central articles using enrichment analysis that we examined previously [27], we selected 20 articles that fulfilled the following criteria; (1) the organism under study is human; (2) the tool used was DAVID Version 6.8 (Database for Annotation, Visualization, and Integrated Discovery) [32] ; (3) the omics type was gene expression (array or RNA-seq); and (4) that the gene list was provided in the supplementary data. The basis for these selection criteria is that this combination represents the most common application of enrichment analysis in our previous study, which could be reproduced without the need to reanalyse the raw dataset from scratch. These articles are listed in the bibliography [33–52].

The lists of genes were collected from the supplementary files and stored in a spreadsheet, paying special attention to preventing gene name errors from occurring [53]. The reproducibility analysis was performed using the respective methodologies stated in the methods sections of the 20 articles using DAVID version 6.8 (March-May 2022) [24].

Briefly, we used the DAVID “Analysis Wizard” and pasted in the provided gene list into the text box, selecting this as “gene list,” together with the appropriate gene identifier type. Typically this was the official gene symbol. The species was specified as Homo sapiens. Next we clicked on the “Functional Annotation Tool” link, bringing up the “Annotation Summary Results” page. We followed the steps indicated in the respective papers to select appropriate gene sets investigated in the articles such as gene ontology (biological process, molecular function), KEGG pathways, etc. Next, the “Functional Annotation Chart” was clicked on, opening a new window with a table of enrichment results, which were saved for future reference.

The data presented in the paper was compared with the results we obtained, recording any discrepancies. Additionally, the statements made in the abstract, results, and discussion sections were carefully examined and cross-checked with our results to understand whether those assertions survive reproduction or not.

Each article was critically evaluated and given a score of one (1) to three (3). Articles rated 1 showed overall poor reproducibility. For example most GO and/or KEGG terms mentioned in the results and discussion were not confirmed upon reproduction, and the conclusions of the study were overall not supported. Articles were rated 2 when at least 50% of the terms are observed as significant after replication according to the original study’s significance threshold, and most conclusions were supported. Articles were rated 3 when they showed a high level of reproducibility, with 80% or more of the terms shown in the article observed again in reproduction, and all conclusions were supported in reproduction.

Results

After conducting this reproduction study according to the methods described in the respective articles, the 20 articles were classified based on the degree of reproducibility (Figure 1). Six studies were classified as low reproducibility, while 10 were classified as medium and just three were classified as high. Justifications for these classifications are provided in Table 1.

Figure 1. Examining reproducibility of enrichment analysis in a sample of 20 journal articles.

Figure 1. Examining reproducibility of enrichment analysis in a sample of 20 journal articles.

Table 1. Classification and justification of reproducibility scores for each article.
S.No. Pubmed ID: Replication of (Table/Figure) A. Was Replication possible? B. Were there other significant terms found which were not mentioned in the article? Replication score and justification
1.PMC6349697: Figure 2C and 2D A. Yes, however, corrected p values are different after replication. B. Yes, only BP terms were taken into account so several CC and MF significant terms were not mentioned in the paper. 3 (high). Considering top 6 BP Terms and KEGG pathways. The basis of the selection of just 6 gene ontologies and 4 KEGG pathways shown in the paper needs to be addressed, when there is a higher amount of significant GOs and pathways that are observed after replication. It was impossible to determine if the article utilised a background gene list or not. The steps to perform DAVID analysis could have been elaborated in the paper. The results shown in the paper are observed after replication but only 6 BP GO significant terms are shown in the paper.
2.PMC6368841: Figure 3 A - D A. Only GO Terms were the same as the paper. KEGG terms were highly inconsistent. B. When replicated, KEGG Terms (using DAVID) were not mentioned in the results of the paper. Shows inconsistency due to the usage of the server. Some terms using clusterprofiler in R were mentioned. 2 (medium). GO results from the original manuscript are somewhat similar to reproduced analysis. The FDR values however are inconsistent. KEGG results were highly different. The list of genes as input into DAVID analysis is unclear. They should have mentioned that both up and down-regulated proteins were considered.
3.PMC6381667: Figure 1 and Figure 2 A. Results are not reproducible and are highly inconsistent B. N.A. 2 (medium). GO terms obtained after replication showed inconsistent results compared to the results in the original article. KEGG results were similar. Better naming convention for supplementary file for 117 genes for DAVID input is necessary. GO Results are not reproducible and are highly inconsistent.
4.PMC6405693: Table 1 and Table 2; Texts under “Protein-Protein Interaction Network Building and Interrelation Analysis Between Pathways” A. Results do not match. B. N.A. 1 (low). GO analysis from up-regulated and down-regulated proteins are less similar to the results given in the paper. The FDR values for some pathways are highly inconsistent. KEGG Analysis of all DEGs do not match at all.
5.PMC6425008: Most of the tables and figures of the paper (Figure 5, 6, 7), Table 2 A. No. B. N.A. 1 (low). This paper could have done proper labelling of supplementary files as it was difficult to find the gene list for DAVID analysis. Lacks information on when control genes are used and when citral-affected gene lists are used. Replication of GO and KEGG analysis is not consistent with results from the paper. How were Figures 5-7 derived? The overall protocol is not so coherent.
6.PMC6444048: The text under “Pathway analysis of SUVmax related genes” And Table 2 A. Terms are the same but gene counts have been different. The DAVID results are somewhat reproducible. B. No 3 (high). Gene counts in the terms significantly enriched in the “process of cell division” and “nucleoplasm” have changed. Only cell cycle is highly enriched in the replicated results, the claims of metabolic pathways and hypoxia signalling (through GSEA analysis) to be enriched are not accurate. The results claimed are somewhat similar. All the processes have different FDR values.
7.PMC6463127: Table 4, Table 5, Table 6 A. Yes but the significance differed. B. Yes, several terms from replication were not mentioned in the paper for GO Analysis. 2 (medium). Does not consider corrected p values. Overall, significant terms from GO analysis of up and down-regulated DEGs are missing upon replication.
8.PMC6535219: Figure 2 A. Somewhat similar. B. There were many top significant terms that were not mentioned. 1 (low). Narrowing/Filtering the given gene list is a required step that could have been mentioned. Many terms were not so significant anymore. There were new terms that are of top positions after replication.
9.PMC6539328: Figure 1A A. Mostly not. B. Yes, many 1 (low). The significant pathways stated in the paper are not showing up once replicated. There is no clarity of the up-regulated genes used for GO and KEGG analysis. Only 21 up-regulated genes are listed whereas the paper states 4831. Also, why are only up-regulated genes used?
10.PMC6542760: Figure 4 A. No. B. Yes, many significant terms were observed. 1 (low). Replication results of GO terms are not at all compliant with the results shown in the paper.
11.PMC6557785: Figure 5 A-B A. Yes, but results in paper are not significant upon replication. B. Yes 2 (medium). For GO terms, there are other terms that have higher numbers of genes, making them more enriched than immune-related pathways as mentioned in the paper. For KEGG Pathways, there are other pathways that are more significant and contain a higher number of genes enriched for example “Salmonella infection” that has a higher count of genes but is not mentioned in the paper.
12.PMC6561911: Figure 5 a and b A. No. B. N.A. 1 (low). The enrichment analysis results are inconsistent with the results displayed in the paper. There was no KEGG analysis done
13.PMC6580941: Table 1 and Table 2 A. Yes. B. Yes! Significant terms like “nucleoplasm,” “RNA Binding,” “protein binding” are missing from the paper. 2 (medium). For GO analysis, the replicated results are not as significant as the GO terms shown in Tables 1 and 2 of the paper. The Gene list for input is not so clearly mentioned, it should have been labelled properly in the name of the supplemental file. P-values are completely different.
14.PMC6587650: Figure 2 A. Yes. B. Yes, some. 3 (high). (Best) The GO terms results of BP, MF, CC and KEGG pathway results from the paper are compliant with replicated results. They could have mentioned why they chose the type of sorting shown in the paper to display their results. Overall, all results have been replicated.
15.PMC6591946: Figure 5 c and Figure 6 a A. Yes for GO analysis only. B. Yes for KEGG analysis 2 (medium). The GO term results of replication are compliant with Figure 5 c of the paper. KEGG pathway results are semi-consistent.
16.PMC6663624: Table 3 and Table 4 A. Yes. B. Yes 1 (low). The top 2 significant GO terms in the paper are not significant after replication analysis is done. Other GO terms are the same as replicated results. For KEGG Analysis, some terms are more significant after replication. Eg: PD-L1 expression and PD-1 checkpoint pathway in cancer, Th1 and Th2 cell differentiation, Th17 cell differentiation were more significant after “natural killer (NK) cell-mediated cytotoxicity” KEGG Pathway. The paper is somewhat replicable.
17.PMC6582306: Figure 5 A. Partly. B. Yes, many. 2 (medium). The paper uses 44 sets of DEGs grouped in 4 major types. For replication, OFC Set 9 was used, DAVID GO analysis was done twice where a discrepancy of results was observed. Compared to the paper, the most significant term was missing after replication analysis was conducted. For OFC set 9, several pathways are seen after replication but only one pathway is shown in paper. This should be addressed.
18.PMC6333352: Figure 7 A. Yes. B. Yes. There was a difference in gene counts as well. 2 (medium). DAVID was used for KEGG Analysis: From WGCNA - Bahn - KEGG Analysis: Epstein-Barr virus infection, the pathway with the highest gene count is missing in the results shown in the paper. From NERI - BAHN - KEGG Analysis: “Pathways of neurodegeneration - multiple diseases,” “Pathways in cancer” - pathways with the highest gene count (30 and 28 respectively) are not mentioned in the paper. Other pathways are present and the paper is somewhat reproducible.
19.PMC6526186: Table 1 A. Yes. B. Yes. 2 (medium). The top BP GO terms from the paper are no longer significantly higher in the replicated GO results. The top 2 KEGG Pathways after replication are “Human cytomegalovirus infection” followed by “Pathways in cancer.” The p-values have changed after the results were replicated, compared to the values given in the paper. The rest of the pathways are similar to the results given in the paper.
20.PMC6607402: Figure 1 A-D A. Yes. B. Yes, many! 2 (medium). For GO analysis and their replication, top enriched BP, CC and MF GO terms are not mentioned. Most of the terms are mentioned in the replicated results, but different levels of enrichment.

Taken together, the ability to reproduce DAVID enrichment analysis using the supplementary gene lists was highly variable. The reliability of enrichment analysis results would benefit by adopting best practices in computational research.

A container-based approach to addressing reproducibility of enrichment analysis

Method overview

We have developed a step-by-step protocol for how to conduct an extremely reproducible enrichment analysis, which is attached as a supplementary file (Suppl Info 1) and published to protocols.io with a permissive license [54]. This protocol is in accordance with the five pillars, and includes version control with git and GitHub, compute environment control using Docker, literate programming with R Markdown, persistent code and data sharing with Zenodo and documentation of all of these to empower others to reproduce the workflow. Central to this protocol is the provision of template Dockerfile and an R Markdown script, which are intended to be remixed and altered by end users to suit their project’s needs. The general steps involved are shown in Figure 2.

Figure 3. Flow chart of example workflow.

Figure 3. Flow chart of example workflow.

The Dockerfile uses a base image from Bioconductor which should correspond to the latest available stable release version, on the Ubuntu operating system. The image contains instructions for installing a few useful utilities including nano and git for modifying scripts, and magic-wormhole for transferring data between computers. It installs R packages from CRAN and Bioconductor repositories to fulfill common tasks in transcriptome analysis such as fetching RNA-seq counts from DEE2 with getDEE2 [55], differential expression with DESeq2 [56], enrichment analysis with clusterprofiler, fgsea and mitch [57–59]. It also contains instructions to clone a github repository, which contains the scripts that run the analytical workflow. Lastly, it copies a gene set library from Reactome [60] with a clear version history to guarantee provenance of the annotation data.

The R Markdown script called example.Rmd undertakes the transcriptome and enrichment analyses. R Markdown allows the unification of extended descriptions, code and results into a single document, which is becoming ubiquitous in bioinformatics and data science more generally [61]. The ability to provide extended descriptions is helpful for providing contextualising information about the purpose of the work, detailed methods and interpretations derived from the analysis. The R Markdown script contains code chunks that perform specific tasks including fetching an example data set from DEE2, inspecting data quality, differential expression and two types of enrichment analysis: over-representation analysis (ORA) and functional class scoring (FCS) (Figure 3).

Figure 3. Flow chart of example workflow.

For each enrichment analysis conducted, a checklist is provided to comply with reporting guidelines [28,62]. An example checklist is shown in Table 2.

Table 2. An example reporting checklist for enrichment analysis.
Reporting criteria Method or resource used
Origin of gene sets Reactome (2023-03-06)
Tool used ClusterProfiler v4.4.4
Statistical test used hypergeometric test
P-value correction for multiple comparisons FDR method
Background list Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria DESeq2 FDR<0.05
ORA directionality Separate tests for up- and down-regulation
Data availability via dee2.io at accession SRP038101
Other parameters Min gene set size of 10

The software versions used in this example analysis are provided in Table 3.

Table 3. The software bundled in the example Docker image.
Software Version Purpose
Ubuntu 22.04 Operating System
R 4.2.2 Statistical computing language
nano 6.2 text editing
git 2.34.1 source control
R/kableExtra 1.3.4 render formatted tables
R/vioplot 0.4.0 violin charts
R/gplots 3.1.3 heatmaps
R/eulerr 7.0.0 Venn diagrams
BioC/getDEE2 1.8.0 fetch example RNA-seq data
BioC/DESeq2 1.38.3 differential expression
BioC/fgsea 1.24.0 functional class scoring
BioC/clusterProfiler 4.6.1 over-representation analysis
Bioc/mitch 1.10.0 multi-contrast enrichment

Reproducing this workflow is relatively quick and simple, requiring six commands and approximately four minutes starting from a base Linux install. The steps include installing Docker, pulling and running the container, running the R Markdown script, exiting the container and copy the newly created report from the container to the current working directory (Box 1).

sudo apt update && sudo apt install docker.io -y  # 30 sec
sudo docker run -it --entrypoint /bin/bash mziemann/enrichment_recipe #60 sec
Rscript -e 'rmarkdown::render("example.Rmd")' # 90 sec
exit
docker cp $(docker ps -aql):/enrichment_recipe/example.html . # instant
firefox example.html

Box 1. Linux shell commands to reproduce a transcriptome and enrichment analysis using a Docker image.

Demonstration

To demonstrate the utility of this workflow, we present the results of this example data analysis; it is a comparison of control and 5-azacitidine-treated AML3 cell transcriptomes with RNA-seq in triplicate [63]. We use this analysis to better understand the effect of DNA methyltransferase inhibitors on gene expression and elaborate upon previous findings [63], which mainly focused on the relationship between RNA expression and DNA methylation, but did not show results of a transcriptome-only enrichment analysis. Furthermore, we provide a comparison of the two most common approaches for enrichment analysis; ORA and FCS, and give guidance for interpreting results of such tests.

For quality control the number of reads per sample was calculated, and found to be in the range of 11-16M. Genes with fewer than 10 reads per sample on average were removed from the analysis, leaving 13,168 genes. A multi-dimensional scaling plot showed clustering of samples into their treatment groups, suggesting a robust effect of the chemical treatment (Figure 4A). Differential expression analysis revealed 3,598 differentially expressed genes with FDR<0.05, with 1,672 exhibiting higher expression and 1,926 exhibiting lower expression in 5-azacytidine treated cells (Figure 4B). Up and down-regulated gene sets were subjected to ORA separately, using the list of 13,168 genes as the background, and gene sets from REACTOME.

Figure 4. Differential expression and enrichment analysis with the example workflow. (A) An MDS plot of gene expression data for the largest two principal components. Blue and pink points represent control and azacitidine-treated samples respectively. (B) An MA plot. The x-axis represents the log10 base mean expression, and the y-axis represents the log2 fold change. Points in red indicate genes with statistically significant differential expression (FDR<0.05), while those in grey indicate did not meet this threshold. (C) Bar plot of ORA-based enrichment test results. (D) Bar plot of FCS-based enrichment test results. (C,D) REACTOME gene sets with FDR>0.05 were discarded and the remaining gene sets were ranked by enrichment score (ES) The most extreme up- and down-regulated genes sets are shown (based on ES). Red bars indicate up-regulation and blue bars indicate down-regulation. (E) Euler diagrams of differentially regulated gene sets identified with ORA and FCS approaches.

With ORA, we observed 201 up-regulated and 99 down-regulated gene sets (FDR<0.05). We prioritised the enrichment analysis by enrichment score, a pseudo measure for effect size, which tends to highlight smaller and more specific gene sets undergoing more drastic differential expression, rather than large gene sets that undergo small expression changes. This led to the identification of up-regulated pathways related to nuclear RNA import and glucokinase regulation; while down-regulated pathways were associated with interferon signaling, interleukin signaling and NADPH oxidase (host defence) (Figure 4C).

After FCS, we identified 239 up-regulated gene sets and 158 down-regulated gene sets (FDR<0.05). FCS up-regulated pathways were associated with prometaphase chromosome condensation, nuclear RNA transport and defective homologous recombination repair (Figure 4D). This finding is consistent with changes to the cell cycle observed in 5-azacytidine treated cells, where the proportion of cells in G2/M phase is reduced, and proxy measures of apoptosis appear slightly higher [63].

Down-regulated pathways related to endosomal/vacuolar (antigen presentation) pathway, interferon signaling and TLR/TNF signaling, which was not described in the original publication [63]. Interestingly, these pathways are reported to be upregulated in other cancer cell studies of DNA hypomethylating agents for similar durations [64–66].

Regarding up-regulated pathways, the similarity between ORA and FCS was relatively high, exhibiting a Jaccard index of 0.58, while this agreement was lower for down-regulated pathways, with a Jaccard index of 0.46 (Figure 4E).

Concluding remarks

Although this reproducibility study is relatively small, there are a few key observations. We observe that only 3/20(15%) of studies achieved a high degree of reproducibility. In a way, this is unexpected, given the simplicity of using DAVID. Users provide a list of genes in a text box and select some options. There are several posible explanations for this inconsistency:

  • The methods stated in the respective articles are inconsistent with what was actually done.

  • The list of genes in the supplement is not the same as what was used for enrichment analysis.

  • Ambiguity around whether up- and down-regulated genes were considered separately or together.

  • Ambiguity around whether all GO classes were considered or just one, such as “biological process.”

  • Ambiguity around whether a custom background gene set was used.

This irreproducibility might be expected given that substantial methodological flaws and reporting deficiencies were present in 85% of enrichment analyses from which this set of 20 studies were drawn [27].

While researching the literature on computational reproducibility in bioinformatics, we identified many case studies, but only two systematic examinations of multiple journal articles. Ionnidis et al [67] examined a set of 10 gene expression studies with available data published in 2005-2006, and found only two could be reproduced to a satisfactory level. A recent preprint reports that only 245 out of 9,625 Jupyter notebooks (2.5%) in biomedical journals successfully reproduced results identical to the respective journal article [68]. More work in this direction is urgently needed to understand the causes of irreproducible bioinformatics research so it can be remedied, as reproducibility is considered only the first step towards reliable research [69].

This article is an attempt to foster increased reproducibility in genomics by demonstrating the implementation of the five pillars principles into a commonly used routine, namely differential expression and pathway enrichment analysis. However, these concepts may be difficult for novice analysts, and so we provide a step-by-step protocol to guide them on how to customise template Dockerfile and R Markdown scripts to suit their project’s needs [54]. Although we aimed to make this process as easy as possible, it requires a substantial time outlay, which we estimate at 15-20 hours. This is significantly longer than using a web-based tool that provides results in just a few minutes, but we argue the time investment is well spent. It ensures analytical reproducibility, assists with auditing and makes the process of reproduction simple and fast (Box 1), which should aid the early identification of errors.

We also want to emphasise that there is still a definite need for graphical and web-based tools. These are good for data exploration, to quickly identify trends that can be followed up with more rigorous and reproducible methods, such as this protocol.

Significant progress is being made to make such point-and-click tools more reproducible. The Galaxy, GenePattern and RNAlysis projects offer graphical tools to create, edit and import workflows that enable reproducibility [70–72]. The Degust differential expression suite is a graphical interface to popular expression tools like edgeR and limma, and allows users to download the computer script to facilitate future non-interactive reproducibility [73]. Such functionality could be written into existing enrichment analysis tools like DAVID, and could provide a “good enough” solution to reproducibility.

Although we hope that extremely reproducible enrichment analysis will become more widespread, we do not anticipate any dramatic changes in researcher behaviour unless the motivating factors are addressed. Currently, reward structures in academia favour career progression for individuals who publish many articles as opposed to individuals who publish fewer but more rigorously conducted works. Solutions to this could include mandates by funding agencies or institutions [74], badging of articles that comply with best practices [75], and recognition of research rigour in track record evaluations.

Acknowledgements

This research was supported by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the NCRIS-funded Australian Research Data Commons (ARDC).

Availability of materials

Bibliography

1.
Committee on Reproducibility and Replicability in Science, Board on Behavioral, Cognitive, and Sensory Sciences, Committee on National Statistics, Division of Behavioral and Social Sciences and Education, Nuclear and Radiation Studies Board, Division on Earth and Life Studies, et al. Reproducibility and replicability in science. Washington, D.C.: National Academies Press; 2019. Available: https://nap.nationalacademies.org/read/25303/chapter/1
2.
Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016;533: 452–454.
3.
Peng RD. Reproducible research in computational science. Science. 2011;334: 1226–1227.
4.
Perkel JM. Challenge to scientists: Does your ten-year-old code still run? Nature. 2020;584: 656–658.
5.
Stein LD. The case for cloud computing in genome informatics. Genome Biol. 2010;11: 207.
6.
Dudley JT, Butte AJ. In silico research in the era of cloud computing. Nat Biotechnol. 2010;28: 1181–1185.
7.
Tan TW, Tong JC, Khan AM, Silva M de, Lim KS, Ranganathan S. Advancing standards for bioinformatics activities: Persistence, reproducibility, disambiguation and minimum information about a bioinformatics investigation (MIABi). BMC Genomics. 2010;11 Suppl 4: S27.
8.
Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten simple rules for reproducible computational research. PLoS Comput Biol. 2013;9: e1003285.
9.
Piccolo SR, Frampton MB. Tools and techniques for computational reproducibility. Gigascience. 2016;5.
10.
Grüning B, Chilton J, Köster J, Dale R, Soranzo N, Beek M van den, et al. Practical computational reproducibility in the life sciences. Cell Syst. 2018;6: 631–635.
11.
Lewis J, Breeze CE, Charlesworth J, Maclaren OJ, Cooper J. Where next for the reproducibility agenda in computational biology? BMC Syst Biol. 2016;10.
12.
Veiga Leprevost F da, Grüning BA, Alves Aflitos S, Röst HL, Uszkoreit J, Barsnes H, et al. BioContainers: An open-source and community-driven framework for software standardization. Bioinformatics. 2017;33: 2580–2582.
13.
Pasquier T, Lau MK, Han X, Fong E, Lerner BS, Boose ER, et al. Sharing and preserving computational analyses for posterity with encapsulator. Comput Sci Eng. 2018;20: 111–124.
14.
Kulkarni N, Alessandrı̀ L, Panero R, Arigoni M, Olivero M, Ferrero G, et al. Reproducible bioinformatics project: A community for reproducible bioinformatics analysis pipelines. BMC Bioinformatics. 2018;19: 349.
15.
Hung L-H, Hu J, Meiss T, Ingersoll A, Lloyd W, Kristiyanto D, et al. Building containerized workflows using the BioDepot-workflow-builder. Cell Syst. 2019;9: 508–514.e3.
16.
Brito JJ, Li J, Moore JH, Greene CS, Nogoy NA, Garmire LX, et al. Recommendations to enhance rigor and reproducibility in biomedical research. Gigascience. 2020;9.
17.
Piccolo SR, Ence ZE, Anderson EC, Chang JT, Bild AH. Simplifying the development of portable, scalable, and reproducible workflows. Elife. 2021;10.
18.
Lasser J. Creating an executable paper is a journey through open science. Commun Phys. 2020;3.
19.
Akalin A. Scientific data analysis pipelines and reproducibility. https://towardsdatascience.com/scientific-data-analysis-pipelines-and-reproducibility-75ff9df5b4c5; 2018.
20.
Ziemann M, Poulain P, Bora A. The five pillars of computational reproducibility: Bioinformatics and beyond. OSF Preprints. 2023. doi:10.31219/osf.io/4pd9n
21.
Slonim DK. From patterns to pathways: Gene expression data analysis comes of age. Nat Genet. 2002;32 Suppl: 502–508.
22.
Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Comput Biol. 2012;8: e1002375.
23.
Xie C, Jauhari S, Mora A. Popularity and performance of bioinformatics software: The case of gene set analysis. BMC Bioinformatics. 2021;22: 191.
24.
Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, et al. DAVID: A web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50: W216–W221.
25.
Timmons JA, Szkop KJ, Gallagher IJ. Multiple sources of bias confound functional enrichment analysis of global -omics data. Genome Biol. 2015;16: 186.
26.
Reimand J, Isserlin R, Voisin V, Kucera M, Tannus-Lopes C, Rostamianfar A, et al. Pathway enrichment analysis and visualization of omics data using g:profiler, GSEA, cytoscape and EnrichmentMap. Nat Protoc. 2019;14: 482–517.
27.
Wijesooriya K, Jadaan SA, Perera KL, Kaur T, Ziemann M. Urgent need for consistent standards in functional enrichment analysis. PLoS Comput Biol. 2022;18: e1009935.
28.
Zhao K, Rhee SY. Interpreting omics data with pathway enrichment analysis. Trends Genet. 2023;0.
29.
Hennessey J, Ge S. A cross disciplinary study of link decay and the effectiveness of mitigation techniques. BMC Bioinformatics. 2013;14 Suppl 14: S5.
30.
Ge SX, Jung D, Yao R. ShinyGO: A graphical gene-set enrichment tool for animals and plants. Bioinformatics. 2020;36: 2628–2629.
31.
Perampalam P, Dick FA. BEAVR: A browser-based tool for the exploration and visualization of RNA-seq data. BMC Bioinformatics. 2020;21: 221.
32.
Huang DW, Sherman BT, Tan Q, Collins JR, Alvord WG, Roayaei J, et al. The DAVID gene functional classification tool: A novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007;8: R183.
33.
Liu H, Wu N, Zhang Z, Zhong X, Zhang H, Guo H, et al. Long non-coding RNA LINC00941 as a potential biomarker promotes the proliferation and metastasis of gastric cancer. Front Genet. 2019;10: 5.
34.
Zhang S, Cao R, Li Q, Yao M, Chen Y, Zhou H. Comprehensive analysis of lncRNA-associated competing endogenous RNA network in tongue squamous cell carcinoma. PeerJ. 2019;7: e6397.
35.
Zhang L, Luo M, Yang H, Zhu S, Cheng X, Qing C. Next-generation sequencing-based genomic profiling analysis reveals novel mutations for clinical diagnosis in chinese primary epithelial ovarian cancer patients. J Ovarian Res. 2019;12: 19.
36.
Wang K, Li L, Fu L, Yuan Y, Dai H, Zhu T, et al. Integrated bioinformatics analysis the function of RNA binding proteins (RBPs) and their prognostic value in breast cancer. Front Pharmacol. 2019;10: 140.
37.
Balusamy SR, Ramani S, Natarajan S, Kim YJ, Perumalsamy H. Integrated transcriptome and in vitro analysis revealed anti-proliferative effect of citral in human stomach cancer through apoptosis. Sci Rep. 2019;9: 4883.
38.
Ahn KS, Kang KJ, Kim YH, Kim T-S, Song B-I, Kim HW, et al. Genetic features associated with 18F-FDG uptake in intrahepatic cholangiocarcinoma. Ann Surg Treat Res. 2019;96: 153–161.
39.
Chai YJ, Chae H, Kim K, Lee H, Choi S, Lee KE, et al. Comparative gene expression profiles in parathyroid adenoma and normal parathyroid tissue. J Clin Med. 2019;8: 297.
40.
Guo C, Xu L-F, Li H-M, Wang W, Guo J-H, Jia M-Q, et al. Transcriptomic study of the mechanism of anoikis resistance in head and neck squamous carcinoma. PeerJ. 2019;7: e6978.
41.
Alebrahim D, Nayak M, Ward A, Ursomanno P, Shams R, Corsica A, et al. Mapping semaphorins and netrins in the pathogenesis of human thoracic aortic aneurysms. Int J Mol Sci. 2019;20: 2100.
42.
Hong H, Liu T, Wu H, Zhang J, Shi X, Le X, et al. MicroRNA-550a is associated with muscle system conferring poorer survival for esophageal cancer. Biosci Rep. 2019;39: BSR20181173.
43.
Han M-Z, Wang S, Zhao W-B, Ni S-L, Yang N, Kong Y, et al. Immune checkpoint molecule herpes virus entry mediator is overexpressed and associated with poor prognosis in human glioblastoma. EBioMedicine. 2019;43: 159–170.
44.
McSweeney KM, Bozza WP, Alterovitz W-L, Zhang B. Transcriptomic profiling reveals p53 as a key regulator of doxorubicin-induced cardiotoxicity. Cell Death Discov. 2019;5: 102.
45.
Wang C, Li D, Zhang L, Jiang S, Liang J, Narita Y, et al. RNA sequencing analyses of gene expression during Epstein-Barr virus infection of primary B lymphocytes. J Virol. 2019;93.
46.
Pan Q, Wang L, Zhang H, Liang C, Li B. Identification of a 5-gene signature predicting progression and prognosis of clear cell renal cell carcinoma. Med Sci Monit. 2019;25: 4401–4413.
47.
Wang Y, Zhao W, Liu X, Guan G, Zhuang M. ARL3 is downregulated and acts as a prognostic biomarker in glioma. J Transl Med. 2019;17: 210.
48.
Weigt SS, Wang X, Palchevskiy V, Li X, Patel N, Ross DJ, et al. Usefulness of gene expression profiling of bronchoalveolar lavage cells in acute lung allograft rejection. J Heart Lung Transplant. 2019;38: 845–855.
49.
Najafi H, Naseri M, Zahiri J, Totonchi M, Sadeghizadeh M. Identification of the molecular events involved in the development of prefrontal cortex through the analysis of RNA-seq data from BrainSpan. ASN Neuro. 2019;11: 1759091419854627.
50.
Feltrin AS, Tahira AC, Simões SN, Brentani H, Martins DC Jr. Assessment of complementarity of WGCNA and NERI results for identification of modules associated to schizophrenia spectrum disorders. PLoS One. 2019;14: e0210431.
51.
Wang S-S, Chen G, Li S-H, Pang J-S, Cai K-T, Yan H-B, et al. Identification and validation of an individualized autophagy-clinical prognostic index in bladder cancer patients. Onco Targets Ther. 2019;12: 3695–3712.
52.
Sun R, Meng X, Wang W, Liu B, Lv X, Yuan J, et al. Five genes may predict metastasis in non-small cell lung cancer using bioinformatics analysis. Oncol Lett. 2019;18: 1723–1732.
53.
Zeeberg BR, Riss J, Kane DW, Bussey KJ, Uchio E, Linehan WM, et al. Mistaken identifiers: Gene name errors can be introduced inadvertently when using excel in bioinformatics. BMC Bioinformatics. 2004;5: 80.
54.
Ziemann M, Bora A. A recipe for extremely reproducible enrichment analysis v1. protocols.io. 2023. doi:10.17504/protocols.io.j8nlkwpdxl5r/v1
55.
Ziemann M, Kaspi A, El-Osta A. Digital expression explorer 2: A repository of uniformly processed RNA sequencing data. Gigascience. 2019;8.
56.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15: 550.
57.
Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: An R package for comparing biological themes among gene clusters. OMICS. 2012;16: 284–287.
58.
Korotkevich G, Sukhov V, Budin N, Shpak B, Artyomov MN, Sergushichev A. Fast gene set enrichment analysis. bioRxiv. 2016. p. 060012. doi:10.1101/060012
59.
Kaspi A, Ziemann M. Mitch: Multi-contrast pathway enrichment for multi-omics and single-cell profiling data. BMC Genomics. 2020;21: 447.
60.
Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff-Ribeiro A, et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 2022;50: D687–D692.
61.
Grolemund G, Wickham H. R for data science. Sebastopol, CA: O’Reilly Media; 2017.
62.
Wijesooriya K, Jadaan SA, Perera KL, Kaur T, Ziemann M. Guidelines for reliable and reproducible functional enrichment analysis. bioRxiv. 2021. p. 2021.09.06.459114.
63.
Lund K, Cole JJ, VanderKraats ND, McBryan T, Pchelintsev NA, Clark W, et al. DNMT inhibitors reverse a specific signature of aberrant promoter DNA methylation and associated gene silencing in AML. Genome Biol. 2014;15: 406.
64.
Li H, Chiappinelli KB, Guzzetta AA, Easwaran H, Yen R-WC, Vatapalli R, et al. Immune regulation by low doses of the DNA methyltransferase inhibitor 5-azacitidine in common human epithelial cancers. Oncotarget. 2014;5: 587–598.
65.
Chiappinelli KB, Strissel PL, Desrichard A, Li H, Henke C, Akman B, et al. Inhibiting DNA methylation causes an interferon response in cancer via dsRNA including endogenous retroviruses. Cell. 2015;162: 974–986.
66.
Okoye-Okafor UC, Javarappa KK, Tsallos D, Saad JJ, Benard L, Thiruthuvanathan V, et al. Azacytidine inhibits megakaryopoiesis via the induction of immunogenic RNA species and activation of type-i interferon signaling. Blood. 2019;134: 1280–1280.
67.
Ioannidis JPA, Allison DB, Ball CA, Coulibaly I, Cui X, Culhane AC, et al. Repeatability of published microarray gene expression analyses. Nat Genet. 2009;41: 149–155.
68.
Samuel S, Mietchen D. Computational reproducibility of jupyter notebooks from biomedical publications. arXiv. 2022. doi:10.48550/arXiv.2209.04308
69.
Nosek BA, Hardwicke TE, Moshontz H, Allard A, Corker KS, Dreber A, et al. Replicability, robustness, and reproducibility in psychological science. Annu Rev Psychol. 2022;73: 719–748.
70.
Kuehn H, Liberzon A, Reich M, Mesirov JP. Using GenePattern for gene expression analysis. Curr Protoc Bioinformatics. 2008;Chapter 7: 7.12.1–7.12.39.
71.
Goecks J, Nekrutenko A, Taylor J, Galaxy Team. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11: R86.
72.
Teichman G, Cohen D, Ganon O, Dunsky N, Shani S, Gingold H, et al. RNAlysis: Analyze your RNA sequencing data without writing a single line of code. BMC Biol. 2023;21: 74.
73.
Powell D. Drpowell/degust 4.1.1. Zenodo; 2019. doi:10.5281/zenodo.3501067
74.
Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. Quantifying reproducibility in computational biology: The case of the tuberculosis drugome. PLoS One. 2013;8: e80278.
75.
Kidwell MC, Lazarević LB, Baranski E, Hardwicke TE, Piechowski S, Falkenberg L-S, et al. Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biol. 2016;14: e1002456.