Who was involved
This work was co-authored with Dr. Heike Hofmann.
Abstract
Achieving computational reproducibility within data science pipelines is a moving target. Package development for data science is proceeding rapidly in both R and Python, the two main scripting languages for data science. This means that an implemented data pipeline might produce different results because of a change in its underlying dependencies. Focusing on R, we propose a paradigm for managing computational reproducibility that helps users identify not only when a package’s functionality has changed, but also whether that change will affect the results of their project code.
About this product
This is an abbreviated version of one of my dissertation chapters on computational reproducibility. This work was presented at the Symposium on Data Science and Statistics (SDSS) in 2020.