Continuous integration

Reproducible Software Pipelines

Continuous integration (CI) is the practice of integrating source code changes frequently and ensuring that the integrated codebase is in a workable state.

Continuous integration at a glance

  • Central code repository

  • The code has automated unit and integration tests

  • All the tests are executed on each commit on a dedicated machine in a fresh environment

  • Addresses the “works on my machine” problem

Continuous integration

  • In software engineering, CI is used to check that the codebase still works after each change

  • What can it buy us in reproducibility?

Continuous integration for reproducibility

  • Run the entire pipeline on every commit in a Docker image

  • Use an isolated environment that is not your machine

  • Possibly use a small subset of the data

  • Setting this up forces you to make the code reproducible
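
One way to keep the CI run short is to draw the subset with a fixed seed, so that every run sees the same records. The following is only a sketch: the function name `ci_subset`, the 10% fraction and the seed value are assumptions made for illustration, not part of the course material.

```python
import random


def ci_subset(records, fraction=0.1, seed=1234):
    """Return a reproducible random subset of `records` (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    # A private generator: the same seed always selects the same records,
    # independently of any other use of the `random` module.
    rng = random.Random(seed)
    k = max(1, int(len(records) * fraction))
    # Sort the sampled indices so the subset keeps the original record order.
    indices = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in indices]


if __name__ == "__main__":
    data = list(range(1000))
    subset = ci_subset(data)
    print(len(subset))  # 100
```

Because the seed is fixed, a failure seen in CI can be replayed locally on exactly the same records.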

GitHub Actions

  • GitHub Actions workflows are triggered by events in a repository, such as

    • push
    • merge a pull request
    • tag
  • They are described as YAML files in the .github/workflows directory

  • They work very well when paired with Docker

GitHub Actions, an example

https://github.com/cecca/repro-course-ci

GitHub Actions, an example

Create the conda-lock file with

conda-lock -f env.yaml --platform linux-64 --kind explicit

And then

Dockerfile
# ------------------------------------------------------------------------
# Build image
FROM continuumio/miniconda:latest AS builder

COPY conda-linux-64.lock /var/locks/conda-linux-64.lock
RUN conda create -p /opt/env --file /var/locks/conda-linux-64.lock &&\
    /opt/env/bin/pip install pyattimo==0.6.1

# ------------------------------------------------------------------------
# Runtime image
FROM ubuntu:noble
COPY --from=builder /opt/env /opt/env
COPY . .
ENV PATH="/opt/env/bin:${PATH}"

# Exec form: the command runs directly, without a wrapping shell
CMD ["python3", "pipeline.py", "check"]
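
The image's default command runs `pipeline.py` with the argument `check`, but the slide does not show that script. The sketch below is one possible shape for it, under loud assumptions: `run_pipeline`, the tiny input and the expected value are all hypothetical stand-ins for the real analysis. What matters is the exit status: `docker run` returns the exit status of the command it runs, so a non-zero status makes the CI step fail.

```python
import argparse
import sys


def run_pipeline(values):
    """Hypothetical stand-in for the real analysis: the mean of the inputs."""
    return sum(values) / len(values)


def check():
    """Run the pipeline on a tiny fixed input and compare with a known result."""
    result = run_pipeline([1.0, 2.0, 3.0, 4.0])
    expected = 2.5
    ok = abs(result - expected) < 1e-9
    if ok:
        print("check passed")
    else:
        print(f"check FAILED: got {result}, expected {expected}")
    # 0 signals success to the caller, any other value signals failure.
    return 0 if ok else 1


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical pipeline entry point")
    parser.add_argument("command", choices=["check"])
    args = parser.parse_args(argv)
    if args.command == "check":
        return check()
    return 2  # unreachable while "check" is the only choice


if __name__ == "__main__":
    sys.exit(main())
```

Running `python3 pipeline.py check` locally and then inspecting the shell's exit status gives the same pass/fail signal that the CI job sees.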

GitHub Actions, an example

name: CI

# Controls when the workflow will run. Triggers the workflow on push or pull request
# events, but only for the main branch
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
    # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
    - uses: actions/checkout@v4
    - name: Test run
      run: |
        docker build -t ci .
        docker run ci