Reproducible Software Pipelines
We consider a software pipeline that processes data to produce results (e.g. reports, plots, presentations).
The Association for Computing Machinery (ACM) defines three different flavors of reproducibility:
The measurement can be obtained with stated precision by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location on multiple trials. For computational experiments, this means that a researcher can reliably repeat her own computation.
— ACM
The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using the author’s own artifacts.
— ACM
The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently.
— ACM
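We want our software pipelines to be at least repeatable, i.e. to satisfy the first definition above: the same team, rerunning the same pipeline under the same conditions, obtains the same results on every trial.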
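One way to make the repeatability requirement concrete is to rerun a pipeline several times and compare fingerprints of its output. The sketch below is only an illustration under stated assumptions: `run_pipeline`, `fingerprint`, and `is_repeatable` are hypothetical names, the "pipeline" is a toy computation, and a byte-identical hash is a stricter criterion than the "stated precision" in the ACM wording.

```python
import hashlib
import json
import random


def run_pipeline(seed: int) -> dict:
    """Hypothetical toy pipeline: draw samples, then summarize them."""
    rng = random.Random(seed)  # local, explicitly seeded generator
    samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return {"n": len(samples), "mean": sum(samples) / len(samples)}


def fingerprint(result: dict) -> str:
    """Hash a canonical JSON serialization of the result."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_repeatable(seed: int, trials: int = 3) -> bool:
    """Rerun the pipeline and check that all trials give one fingerprint."""
    digests = {fingerprint(run_pipeline(seed)) for _ in range(trials)}
    return len(digests) == 1


if __name__ == "__main__":
    print("repeatable:", is_repeatable(seed=42))
```

A check like this only covers reruns on the same machine and in the same environment. It says nothing about whether a different team, or a different setup, would obtain the same result.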