A Paradigm for Managing Computational Reproducibility in a Changing Software Package Landscape

Presentation at SDSS 2020
Published

2020

Who was involved

This work was jointly authored by Dr. Heike Hofmann and me.

Abstract

Achieving computational reproducibility within data science pipelines is a dynamic, shifting task. Package development for data science is moving very rapidly in both R and Python, the two main scripting languages for data science. This means that an implemented data pipeline might produce different results because of a change in its underlying dependencies. Focusing on R, we propose a paradigm for managing computational reproducibility that helps users not only identify when a package’s functionality has changed, but also determine whether that change will impact the results of their project code.
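To make the two questions in the abstract concrete (has a package's functionality changed, and does that change touch my project code?), here is a minimal sketch. It is an illustration only and not the implementation from the paper. It is written in Python for brevity, although the paradigm itself targets R. The package snapshots, the function names (`drop_missing`, `fit_model`, `plot_fit`), and the helper names are all hypothetical, and the regular expression is a deliberately crude stand-in for real parsing of R code.

```python
import re

def changed_functions(old_api, new_api):
    """Names that were added, removed, or whose recorded definition differs."""
    names = set(old_api) | set(new_api)
    return {name for name in names if old_api.get(name) != new_api.get(name)}

def functions_called(script_text):
    """Crude approximation: every identifier directly followed by '('."""
    return set(re.findall(r"([A-Za-z_.][A-Za-z0-9_.]*)\s*\(", script_text))

def impacted_calls(old_api, new_api, script_text):
    """Changed package functions that the project script actually calls."""
    return sorted(changed_functions(old_api, new_api) & functions_called(script_text))

# Hypothetical snapshots of one package at two versions (name -> definition).
old_api = {
    "drop_missing": "function(df)",
    "fit_model": "function(df, formula)",
    "plot_fit": "function(fit)",
}
new_api = {
    "drop_missing": "function(df, cols = NULL)",  # changed
    "fit_model": "function(df, formula)",         # unchanged
    "plot_fit": "function(fit, theme = 'dark')",  # changed, but unused below
}

# Hypothetical R project script, held as plain text.
project_script = """
clean <- drop_missing(raw)
fit <- fit_model(clean, y ~ x)
"""

print(impacted_calls(old_api, new_api, project_script))  # ['drop_missing']
```

Both `drop_missing` and `plot_fit` changed between versions, but only `drop_missing` is reported because the project script never calls `plot_fit`. That distinction between "the package changed" and "the change reaches my code" is the point of the sketch.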

About this product

This is an abbreviated version of one of my dissertation chapters on computational reproducibility. This work was presented at the Symposium on Data Science and Statistics (SDSS) in 2020.

See the slides from my presentation