Practical guides for implementing ‘Five Pillar’ principles

Source: https://github.com/markziemann/5pillars/blob/main/guides/practical_guides.Rmd

The Five Pillar approach is a synthesis of over a decade of lessons in computational reproducibility, and contains seven key recommendations and many best practices. These may be overwhelming for any data scientist, especially beginners, so here we have collected recommended resources to help practitioners put these principles into practice. Two questions practitioners may have are “Where do I start?” and “In what order should I learn/master these principles?”. The following sections are arranged in that order, and have been submitted to the Internet Archive for posterity.


Getting started with data analysis in R and Python

Learn the basics of your scripting language (R/Python/shell) so that you can do useful analyses with it. Learn by solving problems.


Introduction to Galaxy: bioinformatics in the browser

Literate programming with R Markdown, Jupyter and Quarto

Convert those scripts into literate scripts and document them with an introduction, methods, code comments, results, observations and interpretations. Use the JupyterLab (Jupyter notebook) or RStudio (R Markdown) development environments, which provide graphical interfaces to the programming languages.


Practical guides for Conda, Guix and Docker

Control the environment. Record the versions of the programming languages and packages you use. Become familiar and experiment with Conda, Guix and Docker. Select the option that works best for your project.
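
As a first step before adopting Conda, Guix or Docker, version recording can be done from inside the analysis script itself. The following is a minimal Python sketch using only the standard library. The function name `record_environment` and the package list are illustrative assumptions, not part of the Five Pillar guides.

```python
import json
import platform
from importlib import metadata


def record_environment(packages):
    """Return the Python version, platform and versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" is only an illustrative name; list the packages your analysis imports.
    print(json.dumps(record_environment(["pip"]), indent=2))
```

Saving this output alongside the results gives a human-readable record of the environment. It does not by itself recreate the environment, which is what Conda, Guix and Docker are for.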


Continuous analysis/continuous validation

Continuous analysis involves automatic execution and testing of code, which is important whenever changes are made to code or data. Results generated in HTML are automatically updated. Badges on the repository indicate whether the codebase runs to completion with or without errors. Transparency and reproducibility aid validation, which can be continuously maintained.
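
The kind of check that a continuous analysis service runs automatically can be sketched as a small test with a hand-verified answer. This is a toy Python example under stated assumptions: `summarise` stands in for a real analysis step and is hypothetical, as is the test name. A test runner such as pytest would discover and run any function whose name begins with `test_`.

```python
import statistics


def summarise(values):
    """Toy analysis step: return the mean and median of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def test_summarise_known_input():
    # A small input whose answer was checked by hand; the test fails
    # if a later change to the code alters the result.
    result = summarise([1, 2, 3, 10])
    assert result["mean"] == 4
    assert result["median"] == 2.5
```

When the test fails after a change to code or data, the automated run reports an error and the repository badge reflects that status.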