Reproducible Software Pipelines
Continuous integration (CI) is the practice of integrating source code changes frequently and ensuring that the integrated codebase is in a workable state.
A central code repository
Automated unit and integration tests for the code
All the tests are executed on each commit, on a dedicated machine, in a fresh environment
In software engineering, CI is used to check that the codebase still works after each change
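As an illustration, this is the kind of automated unit test that CI executes on every commit. It is a minimal sketch: `normalize` is a hypothetical pipeline step, not code from the course repository.

```python
import unittest


def normalize(values):
    """Hypothetical pipeline step: rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values equal: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class TestNormalize(unittest.TestCase):
    def test_range(self):
        self.assertEqual(normalize([3, 5, 7]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalize([2, 2]), [0.0, 0.0])
```

Saved as, say, `test_pipeline.py` (a hypothetical name), it runs with `python -m unittest test_pipeline`; CI runs the same command, but on a dedicated machine in a fresh environment.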
What can CI buy us in terms of reproducibility?
Run the entire pipeline on every commit, inside a Docker container
Use an isolated environment that is not your own machine
Possibly use a small subset of the data
Setting this up forces you to make the code reproducible
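The Dockerfile in this section ends with `python3 pipeline.py check`. One way such a script could offer a cheap check mode is sketched here, assuming the real pipeline can be run on a prefix of its input; the data loader, the analysis step and the `full` mode are hypothetical stand-ins.

```python
import argparse
import random
import statistics


def load_data(n_rows):
    """Hypothetical loader: a seeded random series stands in for the real data."""
    rng = random.Random(42)  # fixed seed: every run sees exactly the same data
    return [rng.gauss(0.0, 1.0) for _ in range(n_rows)]


def run_pipeline(data):
    """Hypothetical analysis step: a couple of summary statistics."""
    return {"n": len(data), "mean": statistics.fmean(data)}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run the analysis pipeline.")
    parser.add_argument(
        "mode",
        nargs="?",
        default="full",
        choices=["check", "full"],
        help="'check' runs every step end to end on a small subset of the data",
    )
    args = parser.parse_args(argv)
    # Small subset in check mode: fast enough to run on every commit.
    n_rows = 1_000 if args.mode == "check" else 100_000
    result = run_pipeline(load_data(n_rows))
    print(result)
    return result


if __name__ == "__main__":
    main()
```

`python3 pipeline.py check` exercises every step on 1,000 rows, while `python3 pipeline.py full` processes everything. The fixed seed keeps the check deterministic, so a CI failure points at a code change rather than at random data.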
GitHub Actions workflows are executed in response to events in a repository (for example, a push or a pull request)
They are described as YAML files in the .github/workflows directory
They work very well when paired with Docker
https://github.com/cecca/repro-course-ci
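The `on:` section of a workflow decides which events start it. This simplified Python model of the trigger used in the workflow of this section (push or pull request on `main`) is only illustrative: the dictionary mirrors the YAML, and real GitHub filters also support glob patterns, tag filters and more, which the sketch ignores.

```python
# Simplified model of the workflow's `on:` block (exact branch names only).
TRIGGERS = {
    "push": {"branches": ["main"]},
    "pull_request": {"branches": ["main"]},
}


def should_run(event, branch, triggers=TRIGGERS):
    """Return True if a workflow with these triggers would start.

    For `push`, `branch` is the branch being pushed to; for `pull_request`,
    it is the base branch that the pull request targets.
    """
    config = triggers.get(event)
    if config is None:
        return False  # event not listed under `on:`
    branches = config.get("branches")
    # No branch filter means the event triggers the workflow on any branch.
    return branches is None or branch in branches
```

For example, `should_run("push", "feature")` is `False`: pushing to a feature branch does not start this workflow, but opening a pull request against `main` does.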
Create the conda-lock file with
conda-lock -f env.yaml --platform linux-64 --kind explicit
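An explicit lock file is plain text: after an `@EXPLICIT` marker it lists the exact URL of every package, optionally followed by `#` and a checksum, so conda installs it without running the solver. To see what is actually pinned, a minimal sketch of a parser (hypothetical, not part of conda-lock) could be:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_explicit_lock(text):
    """Parse an explicit conda lock file into (name, version, build, checksum) tuples."""
    packages = []
    in_body = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment/metadata header
        if line == "@EXPLICIT":
            in_body = True
            continue
        if not in_body:
            continue
        url, _, checksum = line.partition("#")
        filename = PurePosixPath(urlparse(url).path).name
        for ext in (".tar.bz2", ".conda"):
            if filename.endswith(ext):
                filename = filename[: -len(ext)]
                break
        # Conda file names are <name>-<version>-<build>; only <name> may contain '-'.
        name, version, build = filename.rsplit("-", 2)
        packages.append((name, version, build, checksum or None))
    return packages
```

For example, `parse_explicit_lock(open("conda-linux-64.lock").read())` lists the pinned name, version and build of every package in the environment.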
And then use the lock file in the Dockerfile:
# ------------------------------------------------------------------------
# Build image
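FROM continuumio/miniconda3:latest AS builder
COPY conda-linux-64.lock /var/locks/conda-linux-64.lock
# Install exactly the packages listed in the explicit lock file (no solver),
# then the pip dependency, pinned to an exact version
RUN conda create --yes -p /opt/env --file /var/locks/conda-linux-64.lock && \
    /opt/env/bin/pip install pyattimo==0.6.1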
# ------------------------------------------------------------------------
# Runtime image
FROM ubuntu:noble
COPY --from=builder /opt/env /opt/env
WORKDIR /app
COPY . .
ENV PATH="/opt/env/bin:${PATH}"
CMD ["python3", "pipeline.py", "check"]
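The runtime stage only contains what is copied from the builder, so a cheap sanity check at the start of `pipeline.py check` could confirm that the interpreter comes from `/opt/env` and that the pinned `pyattimo==0.6.1` is the version actually installed. A minimal sketch, assuming such a check is wanted; `check_environment` and its defaults are hypothetical:

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Hypothetical expectations mirroring the Dockerfile: the pip pin and the env prefix.
EXPECTED_PINS = {"pyattimo": "0.6.1"}
EXPECTED_PREFIX = "/opt/env"


def check_environment(pins=EXPECTED_PINS, prefix=EXPECTED_PREFIX):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    if sys.prefix != prefix:
        problems.append(f"interpreter prefix is {sys.prefix!r}, expected {prefix!r}")
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name} {found} installed, expected {wanted}")
    return problems
```

Failing loudly when the list is non-empty makes a broken image show up as a red CI run instead of as subtly different results.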
Workflow file (in .github/workflows):
name: CI
# Controls when the action will run. Triggers the workflow on push or pull request
# events, but only for the main branch
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v4
      # Build the image from the Dockerfile, then run the pipeline check inside it
      - name: Test run
        run: |
          docker build -t ci .
          docker run ci
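Since the workflow's only real step is two Docker commands, the same check can be reproduced locally before pushing. A minimal Python sketch, assuming Docker is installed and the script is run from the repository root; the helper names are hypothetical:

```python
import shlex
import subprocess


def ci_commands(tag="ci", context="."):
    """The two commands of the workflow's "Test run" step, as argument lists."""
    return [
        ["docker", "build", "-t", tag, context],
        ["docker", "run", tag],
    ]


def run_ci_locally(tag="ci", dry_run=False):
    """Run the CI step on this machine, stopping at the first failing command."""
    for cmd in ci_commands(tag):
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # much like a failing command fails the GitHub Actions step.
            subprocess.run(cmd, check=True)
```

`run_ci_locally(dry_run=True)` only prints the commands; without `dry_run` it builds the image and runs `pipeline.py check` inside it, exactly as the CI job does.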