Containers

Reproducible Software Pipelines

Containers

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.

– Docker docs

So what’s the difference from environment.yml and conda-lock.yml?

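conda-lock.yml

Pins the exact version of every conda package, but the environment is still installed on top of the host operating system, which can differ from machine to machine.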

Container

A container packs your code and all its dependencies, down to the operating-system level, in a package that can be executed on another machine.

The receiving machine needs software that is able to run the container: a container runtime.

Different types of container runtimes

Several tools can run containers, e.g. Docker, Podman, and Apptainer (formerly Singularity).

We will focus on Docker, as it is the most widely used solution.

Containers and virtual machines

A VM emulates the hardware and runs a whole guest operating system.

A container shares the OS kernel with the host.

Containers are therefore much more lightweight: they start faster and use less memory and disk space.

Docker concepts: images

  • An image is a read-only template with instructions for creating a Docker container.
  • It is an inert pile of bytes: nothing runs until you start a container from it.
  • You can think of it as the executable file of a program.

Docker concepts: container

  • The running instance of an image

  • Roughly equivalent to the process executing a program

  • Isolated from other containers and from the host OS

  • You can set limits on the resources available to the container:

    • memory
    • CPUs
    • network
    • storage

Docker architecture

  • Docker daemon: listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes.
  • Docker client: application that issues commands (build, start, stop) to the Docker daemon
  • Docker registries: store Docker images. Docker Hub is a public registry that anyone can use.


The docker registry

  • https://hub.docker.com/
  • has a huge collection of ready-made images

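Pulling an image from Docker Hub

docker run hello-world

If the image is not available locally, docker run first downloads (pulls) it from Docker Hub and then runs it. docker pull hello-world only downloads it.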

Listing images

Now try running

docker images

It will print something like

REPOSITORY               TAG       IMAGE ID       CREATED         SIZE
hello-world              latest    d2c94e258dcb   16 months ago   13.3kB
docker/getting-started   latest    3e4394f6b72f   20 months ago   46.9MB

Customizing images

  • By writing a Dockerfile one can create custom images
  • Custom images are based on the ones available in the public registry
  • A Dockerfile is basically a recipe for building a container image

Customizing images

Dockerfile
FROM ubuntu

This will create an image identical to the base ubuntu image.

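To turn this specification into an image, run (from the directory containing the Dockerfile)

docker build .

where . is the build context: the directory whose files are made available to the build.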

Now docker images will list

REPOSITORY               TAG       IMAGE ID       CREATED         SIZE
<none>                   <none>    0db8d9ac190e   3 weeks ago     78.1MB
hello-world              latest    d2c94e258dcb   16 months ago   13.3kB
docker/getting-started   latest    3e4394f6b72f   20 months ago   46.9MB

Customizing images

To give the image a name (a tag), we use

docker build -t my-image .

and now docker images lists

REPOSITORY               TAG       IMAGE ID       CREATED         SIZE
my-image                 latest    0db8d9ac190e   3 weeks ago     78.1MB
hello-world              latest    d2c94e258dcb   16 months ago   13.3kB
docker/getting-started   latest    3e4394f6b72f   20 months ago   46.9MB

Customizing images: adding software

Dockerfile
FROM ubuntu

RUN apt-get update -y && apt-get install -y python3

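Which version of Python do you get?

It depends on two things:

  • The version of Ubuntu your image is based on (plain ubuntu means ubuntu:latest, which moves to newer releases over time)
  • When you build the image! (apt-get installs whatever the repositories offer at build time)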

Customizing images: pinning the base image version

  • Images in the registry are tagged with version strings:

(Figure: tags on the ubuntu image page on Docker Hub)

Customizing images: pinning the base image version

Dockerfile
FROM ubuntu:noble-20240801

RUN apt-get update -y && apt-get install -y python3

Now the base image is fixed, and apt-get will always use the Ubuntu noble (24.04) repositories, even in the future.

Customizing images: Caveats

  • Even if you specify a precise tag for the image, apt-get update && apt-get install ... may give you different package versions at each build, because the Ubuntu repositories keep receiving updates (e.g. security fixes)

Customizing images: bringing files into the image

Dockerfile
FROM ubuntu:noble-20240801

RUN apt-get update -y && apt-get install -y python3

# The COPY command copies files from the build context
# (the directory passed to docker build) into the image
COPY app.py /app/app.py

# The following command copies the entire build context
# into the /app/ directory of the image
COPY . /app/
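A related instruction, not used in the examples above, is WORKDIR. The sketch below (assuming the same app.py) sets /app as the working directory, so that COPY can use a relative destination and the running container starts in /app.

```dockerfile
FROM ubuntu:noble-20240801
RUN apt-get update -y && apt-get install -y python3

# WORKDIR creates /app if it does not exist and makes it the current
# directory for the instructions below and for the running container
WORKDIR /app

# Relative destinations are resolved against WORKDIR:
# this copies app.py to /app/app.py
COPY app.py .

# The relative path works because the container starts in /app
CMD ["python3", "app.py"]
```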

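Layers

Each instruction that changes the filesystem (RUN, COPY, ADD) adds a read-only layer on top of the previous ones: an image is this stack of layers.

Layers: caching

When rebuilding, Docker reuses a cached layer as long as its instruction (and, for COPY and ADD, the copied files) is unchanged and all the layers before it were reused too.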
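To make the layer structure concrete, here is the running example annotated with which instructions add a filesystem layer (a sketch; the base image contributes its own layers). After building it, docker history my-image lists the layers of the image together with their sizes.

```dockerfile
# Base layers: inherited from the ubuntu image
FROM ubuntu:noble-20240801

# New layer: the files written by apt-get
# (package lists, Python and its dependencies)
RUN apt-get update -y && \
    apt-get install -y python3

# New layer: only app.py
COPY app.py /app/app.py

# Metadata only: sets the default command, adds no files
CMD ["python3", "/app/app.py"]
```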

Layers caching: the order is important!

If a layer changes, all the layers after it are rebuilt as well.

Dockerfile
FROM ubuntu:noble-20240801
COPY app.py /app/app.py
RUN apt-get update -y && \
    apt-get install -y python3

This reinstalls Python every time you modify app.py, because the COPY layer comes before the installation layer.

Dockerfile
FROM ubuntu:noble-20240801
RUN apt-get update -y && \
    apt-get install -y python3
COPY app.py /app/app.py

Here the installation layer stays cached: modifying app.py only rebuilds the final COPY layer.
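The same caching logic explains why apt-get update and apt-get install are combined in a single RUN in these examples. A sketch (curl is only an illustrative second package):

```dockerfile
FROM ubuntu:noble-20240801

# Anti-pattern, kept as a comment: with two separate RUN instructions,
# the "apt-get update" layer stays cached, so when you later add a
# package to the install line, apt-get works from a stale package
# index and may fail or install outdated versions.
#   RUN apt-get update -y
#   RUN apt-get install -y python3 curl

# Preferred: update and install in the same RUN, so any change to the
# package list also refreshes the package index
RUN apt-get update -y && \
    apt-get install -y python3 curl

# Frequently modified files go last
COPY app.py /app/app.py
```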

Layers: housekeeping

Run the following command every once in a while

docker system prune

It removes stopped containers, unused networks, dangling images and unused build cache, freeing disk space.

The command

docker rmi <image-name>

removes an entire image

Running a container

docker run my-image

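Nothing visible happens: we did not specify what should run, so the container runs the default command inherited from the ubuntu base image (bash), which exits immediately because no terminal is attached.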

Running a container

Dockerfile
FROM ubuntu:noble-20240801
RUN apt-get update -y && apt-get install -y python3
COPY app.py /app/app.py

# This instruction specifies that we want to run
# the script we copied into the image (exec form:
# a JSON array, run directly without a shell)
CMD ["python3", "/app/app.py"]

Running a container

Dockerfile
FROM ubuntu:noble-20240801
RUN apt-get update -y && apt-get install -y python3
COPY app.py /app/app.py

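# This instruction specifies that we want to run
# the container as an executable. The exec form (JSON array)
# is needed for that: arguments given to docker run after the
# image name are appended to the command, whereas the shell
# form ignores them
ENTRYPOINT ["python3", "/app/app.py"]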

Differences between CMD and ENTRYPOINT

Both CMD and ENTRYPOINT instructions define what command gets executed when running a container. A few rules describe how they interact:

  • A Dockerfile should specify at least one of `CMD` or `ENTRYPOINT`.
  • `ENTRYPOINT` should be defined when using the container as an executable.
  • `CMD` should be used as a way of defining default arguments for an `ENTRYPOINT` command or for executing an ad-hoc command in a container.
  • `CMD` will be overridden when running the container with alternative arguments.

Differences between CMD and ENTRYPOINT

  • Use ENTRYPOINT if you want the container to act as a wrapper around an executable
  • Use CMD if you want to provide a default command that the user can easily override
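The two instructions are often combined: ENTRYPOINT fixes the executable and CMD supplies default arguments (both in exec form). A minimal sketch, assuming app.py accepts a hypothetical --help option:

```dockerfile
FROM ubuntu:noble-20240801
RUN apt-get update -y && apt-get install -y python3
COPY app.py /app/app.py

# Fixed part of the command: always run the script
ENTRYPOINT ["python3", "/app/app.py"]

# Default arguments appended to ENTRYPOINT (--help is a hypothetical
# option of app.py); they are replaced by any arguments given after
# the image name in docker run
CMD ["--help"]
```

With this image, docker run my-image runs python3 /app/app.py --help, while docker run my-image data.csv runs python3 /app/app.py data.csv (data.csv is just an illustrative argument). The ENTRYPOINT itself can still be replaced with docker run --entrypoint.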

Building a conda-based container

env.yaml
name: base
channels:
  - conda-forge
dependencies:
  - streamlit=1.38

Generate an explicit lock file for Linux (written to conda-linux-64.lock):

conda-lock -f env.yaml --platform linux-64 --kind explicit

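Dockerfile
# continuumio/miniconda3 is the maintained Miniconda image
# (continuumio/miniconda is an old image that is no longer updated)
FROM continuumio/miniconda3:latest

COPY conda-linux-64.lock /locks/conda-linux-64.lock
# --yes: no interactive confirmation is possible during docker build
RUN conda create --yes -p /opt/env --file /locks/conda-linux-64.lock

COPY app.py /app/app.py
ENV PATH=/opt/env/bin:"${PATH}"
CMD ["streamlit", "run", "/app/app.py"]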

Then build it as usual:

docker build -t streamlit-container .

Running containers interactively

docker run -i -t --rm streamlit-container
  • -i keeps standard input open, so you can interact with the container
  • -t allocates a pseudo-TTY (a terminal)
  • --rm removes the container once it exits

Running containers: network

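  • By default, for security, no port of a container is published on the host's network
  • You can explicitly forward (publish) a port of the host to a port of the container with the -p option:
docker run -p host-port:container-port streamlit-container

For example, Streamlit listens on port 8501 by default, so after

docker run -p 8501:8501 streamlit-container

the app is available at http://localhost:8501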
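A Dockerfile can also document which port the application uses, with the EXPOSE instruction. A sketch of the last lines of the conda-based Dockerfile with EXPOSE added, assuming the app keeps Streamlit's default port 8501:

```dockerfile
COPY app.py /app/app.py
ENV PATH=/opt/env/bin:"${PATH}"

# Documents that the app listens on port 8501 (Streamlit's default).
# This does NOT publish the port: -p is still needed in docker run
EXPOSE 8501

CMD ["streamlit", "run", "/app/app.py"]
```

EXPOSE alone does not make the port reachable. Alternatively, docker run -P streamlit-container publishes every exposed port on a random host port, and docker port <container> shows which one was chosen.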

Running containers: limiting memory

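docker run --memory=1G streamlit-container

The container gets at most 1 GB of RAM; if it exceeds the limit, the kernel may kill its processes (out of memory).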

Running containers: limiting CPUs

docker run --cpus=1 streamlit-container

--cpus also accepts fractional numbers: --cpus=0.5 ensures that the container uses at most 50% of a single CPU.

Running containers: accessing GPUs

docker run --gpus=all streamlit-container

This requires an NVIDIA GPU and the NVIDIA Container Toolkit installed on the host.

Running containers: persisting changes

  • Anything you write in a container is transient: it disappears when the container is removed (with --rm, as soon as it stops)
  • To persist data, we can mount a directory of the host in the container:
docker run -i -t --rm --volume "$(pwd)":/data streamlit-container

where $(pwd) (the current directory) can be replaced by any source directory on the host, and /data by any destination directory in the container

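Putting containers on a diet

This is a multi-stage build: the environment is created in a builder stage, and only /opt/env is copied into the final image, leaving conda itself and its package cache behind.

Dockerfile
# Builder image
FROM continuumio/miniconda3:latest AS builder
COPY conda-linux-64.lock /locks/conda-linux-64.lock
RUN conda create --yes -p /opt/env --file /locks/conda-linux-64.lock

# Primary image
FROM ubuntu:noble
# Keep the same path as in the builder: files in a conda
# environment can contain hard-coded references to its prefix
COPY --from=builder /opt/env /opt/env
COPY app.py /app/app.py
ENV PATH=/opt/env/bin:"${PATH}"
CMD ["streamlit", "run", "/app/app.py"]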