Practical guides for implementing ‘Five Pillar’ principles

Source: https://github.com/markziemann/5pillars/blob/main/guides/practical_guides.Rmd

The five pillar approach is a synthesis of over a decade of learnings around computational reproducibility, and contains seven key recommendations and many best practices. These may be overwhelming for data scientists, especially beginners, so here we have put together recommended resources to help practitioners put these principles into practice. Two questions practitioners may have are “Where do I start?” and “In what order should I learn/master these principles?”. The following sections have been arranged in a suggested learning order, and have been submitted to the Internet Archive for posterity.

Getting started with data analysis in R and Python

Learn the basics of your scripting language (R/Python/shell) so that you can do useful analyses with it. Learn by solving problems.

Introduction to R and RStudio

Introduction to Unix and shell programming

Introduction to Python

Introduction to VS Code

Tutorial and video: Getting Started with Visual Studio Code

Introduction to Galaxy: bioinformatics in the browser

Galaxy: A very short introduction

Literate programming with R Markdown, Jupyter and Quarto

Convert those scripts into literate scripts and document them with an introduction, methods, code comments, results, observations and interpretations. Use the JupyterLab (Jupyter notebook) or RStudio (R Markdown) development environments, which provide graphical interfaces to the programming languages.

Practical guides for `git` and GitHub

Learn how to integrate version control into your development environment.

Practical guides for Conda, Guix and Docker

Control the environment. Record the version of programming languages and packages. Become familiar and experiment with Conda, Guix and Docker. Select the option that works best for your project.
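As a lightweight complement to Conda, Guix or Docker, the language and package versions can also be recorded from inside the analysis itself. The following is a minimal sketch using only the Python standard library; the function name `record_environment` and the package names passed to it are illustrative assumptions, not part of the guide.

```python
import json
import platform
import sys
from importlib import metadata


def record_environment(packages):
    """Return a dict describing the interpreter and the requested package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "numpy" and "pandas" are placeholder package names for illustration only.
    info = record_environment(["numpy", "pandas"])
    json.dump(info, sys.stdout, indent=2, sort_keys=True)
    print()
```

Saving this output alongside the results gives a record of what was actually used in a run, which can be compared against the environment definition kept under version control.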

Practical guides for documenting computational research

Document the process of reproduction as if you were an external researcher. Test the instructions.

Extend the scripts to make them end-to-end processes

Try to extend the scripts to make them end-to-end processes, by linking to raw data and outputting whole figures or research articles.
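One way to picture an end-to-end process is a single entry point that starts from the raw data file and finishes with a written report, with no manual steps in between. The sketch below assumes a hypothetical CSV file with a numeric column called `value` and writes a small Markdown report; the file names, column name and function names are assumptions made for illustration.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def load_values(raw_path, column):
    """Read one numeric column from the raw CSV file."""
    with open(raw_path, newline="") as handle:
        return [float(row[column]) for row in csv.DictReader(handle)]


def summarise(values):
    """Compute the summary statistics (stdev needs at least two values)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def write_report(summary, report_path):
    """Write the summary as a short Markdown report."""
    lines = [
        "# Results",
        "",
        f"Number of observations: {summary['n']}",
        f"Mean: {summary['mean']:.3f}",
        f"Standard deviation: {summary['sd']:.3f}",
    ]
    Path(report_path).write_text("\n".join(lines) + "\n")


def run_pipeline(raw_path, report_path, column="value"):
    """Run every step from raw data to finished report."""
    values = load_values(raw_path, column)
    summary = summarise(values)
    write_report(summary, report_path)
    return summary


if __name__ == "__main__":
    # Demonstration with a tiny made-up dataset in a temporary directory.
    with tempfile.TemporaryDirectory() as workdir:
        raw = Path(workdir) / "raw.csv"
        raw.write_text("value\n1\n2\n3\n")
        report = Path(workdir) / "report.md"
        run_pipeline(raw, report)
        print(report.read_text())
```

In a real project the load step would fetch or read the archived raw data, and the report step would typically be a rendered notebook or R Markdown document rather than a hand-built text file.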

Continuous analysis/continuous validation

Continuous analysis involves automatic execution and testing of code, which is important whenever changes are made to code or data. Results generated in HTML are automatically updated. Badges on the repository indicate whether the codebase currently runs to completion with or without errors. Transparency and reproducibility aid validation, which can then be continuously maintained.
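One possible validation step for an automated run is to compare checksums of the regenerated outputs against previously recorded values, so that an unexpected change in the results causes the run to fail. The sketch below is one such check under stated assumptions: the function names and the idea of a path-to-checksum mapping are illustrative, and byte-identical outputs are only a sensible target when the analysis is fully deterministic.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_outputs(expected):
    """Return the paths that are missing or whose checksum differs.

    `expected` maps output file paths to their recorded SHA-256 digests.
    """
    failures = []
    for path, checksum in expected.items():
        if not Path(path).is_file() or sha256_of(path) != checksum:
            failures.append(str(path))
    return failures
```

An automated job could call `validate_outputs` after re-running the analysis and exit with a non-zero status when the returned list is not empty, which is what a status badge would then reflect.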