Reproducible Software Pipelines
environment.yml and conda-lock.yml files capture the dependencies you explicitly specify.
"A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another."
– Docker docs
So what’s the difference with environment.yml and conda-lock.yml?
A container packs your code and all its dependencies, down to the operating-system level, in a package that can be executed on another machine.
The receiving machine needs something that is able to run the container.
We will focus on Docker, as it is the most common solution.
Containers are much more lightweight than virtual machines.
A container is the running instance of an image, roughly equivalent to the process executing a program.
It is isolated from other containers and from the OS.
You can set limits on the resources available to the container, such as memory and CPUs.
The docker client sends commands (build, start, stop) to the Docker daemon.
hello-world image
Try now to run
docker images
it will list something like
REPOSITORY               TAG       IMAGE ID       CREATED         SIZE
hello-world              latest    d2c94e258dcb   16 months ago   13.3kB
docker/getting-started   latest    3e4394f6b72f   20 months ago   46.9MB
With a Dockerfile one can create custom images.
A Dockerfile is basically a recipe for building a container image.
A Dockerfile that only names ubuntu as its base will create an image that is a clone of the base ubuntu image.
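A Dockerfile that clones the base image, as described above, might be sketched as follows. The directory name clone-demo is hypothetical and only serves as a scratch build context.

```shell
# Hypothetical scratch directory used as the build context
mkdir -p clone-demo

# A Dockerfile whose only instruction names the base image
cat > clone-demo/Dockerfile <<'EOF'
FROM ubuntu
EOF
```

Pointing docker build at clone-demo should then produce an image with the same contents as the base ubuntu image.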
To turn this specification into an image, run
docker build .
Now docker images will list
REPOSITORY               TAG       IMAGE ID       CREATED         SIZE
<none>                   <none>    0db8d9ac190e   3 weeks ago     78.1MB
hello-world              latest    d2c94e258dcb   16 months ago   13.3kB
docker/getting-started   latest    3e4394f6b72f   20 months ago   46.9MB
To give a name to the image, we use
docker build -t my-image .
and now docker images lists
REPOSITORY               TAG       IMAGE ID       CREATED         SIZE
my-image                 latest    0db8d9ac190e   3 weeks ago     78.1MB
hello-world              latest    d2c94e258dcb   16 months ago   13.3kB
docker/getting-started   latest    3e4394f6b72f   20 months ago   46.9MB
It depends on two things:
[Figure: Tags on the ubuntu image page]
With the base image pinned to a noble tag, the apt-get calls will refer to the ubuntu-noble repositories even in the future.
With apt-get update && apt-get install ... you might still get different versions each time.
Dockerfile
FROM ubuntu:noble-20240801
RUN apt-get update -y && apt-get install -y python3
# The COPY command copies files from your machine (current directory)
# into the container image
COPY app.py /app/app.py
# The following command copies the entire build directory in
# the /app/ dir within the image
COPY . /app/
If a layer changes, all the layers after it will be rebuilt.
A Dockerfile that copies app.py before installing python will therefore reinstall python every time you modify app.py.
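One way to avoid the reinstall is to put the slow, rarely changing steps first and the frequently edited files last. The sketch below assumes a hypothetical build directory cache-demo with a placeholder app.py. The base image tag and the apt-get line are taken from the Dockerfile shown earlier.

```shell
# Hypothetical build context with a placeholder application file
mkdir -p cache-demo
printf 'print("hello from app.py")\n' > cache-demo/app.py

cat > cache-demo/Dockerfile <<'EOF'
FROM ubuntu:noble-20240801
# Slow step that rarely changes: keep it early so its layer stays cached
RUN apt-get update -y && apt-get install -y python3
# File that changes often: copy it last so only this layer is rebuilt
COPY app.py /app/app.py
EOF
```

With this ordering, editing app.py should only invalidate the final COPY layer, and the python3 installation is reused from the cache.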
Run the following command every once in a while
docker system prune
it will remove stopped containers, unused networks, dangling images and unused build cache, and free some disk space
The command
docker rmi <image-name>
removes an entire image
docker run my-image
Nothing happens, because we did not specify what should happen upon running
CMD and ENTRYPOINT
Both CMD and ENTRYPOINT instructions define what command gets executed when running a container. There are few rules that describe their co-operation.
Dockerfile should specify at least one of `CMD` or `ENTRYPOINT` commands.
`ENTRYPOINT` should be defined when using the container as an executable.
`CMD` should be used as a way of defining default arguments for an `ENTRYPOINT` command or for executing an ad-hoc command in a container.
`CMD` will be overridden when running the container with alternative arguments.
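The rules above can be illustrated with a Dockerfile that combines both instructions. This is a minimal sketch: the directory entrypoint-demo and the placeholder app.py are hypothetical, and python3 is just an example of a wrapped executable.

```shell
# Hypothetical build context with a placeholder application file
mkdir -p entrypoint-demo
printf 'print("hello from app.py")\n' > entrypoint-demo/app.py

cat > entrypoint-demo/Dockerfile <<'EOF'
FROM ubuntu:noble-20240801
RUN apt-get update -y && apt-get install -y python3
COPY app.py /app/app.py
# The container always runs python3 ...
ENTRYPOINT ["python3"]
# ... and by default passes it this argument
CMD ["/app/app.py"]
EOF
```

Running the resulting image without arguments would execute python3 /app/app.py. Any arguments given after the image name in docker run replace the CMD part, so passing --version would execute python3 --version instead.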
Use ENTRYPOINT if you want the container to act as the wrapper to an executable.
Use CMD if you want to provide a default command, giving the user the chance to override it.
conda-based container
First create an explicit lock file for the linux-64 platform:
conda-lock -f env.yaml --platform linux-64 --kind explicit
Dockerfile
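A single-stage Dockerfile for this could look roughly like the sketch below. It is an assumption modelled on the builder stage of the multi-stage Dockerfile later in this document: the base image, the /opt/env prefix and the file names conda-linux-64.lock and app.py are taken from there, while the directory conda-demo is hypothetical. The lock file and app.py are expected to sit next to the Dockerfile.

```shell
# Hypothetical build context; conda-linux-64.lock and app.py go here too
mkdir -p conda-demo

cat > conda-demo/Dockerfile <<'EOF'
FROM continuumio/miniconda:latest
# Recreate the locked environment inside the image
COPY conda-linux-64.lock /locks/conda-linux-64.lock
RUN conda create -p /opt/env --file /locks/conda-linux-64.lock
COPY app.py /app/app.py
# Put the environment's executables (including streamlit) on the PATH
ENV PATH=/opt/env/bin:"${PATH}"
CMD ["streamlit", "run", "/app/app.py"]
EOF
```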
Then build it as usual:
docker build -t streamlit-container .
docker run -i -t --rm streamlit-container
-i runs the container interactively
-t allocates a pseudo tty
--rm removes the container once you stop it
To map a port of the container to a port of the host, use the -p option:
docker run -p host-port:container-port streamlit-container
docker run --memory=1G streamlit-container
docker run --cpus=1 streamlit-container
--cpus also accepts fractional numbers: --cpus="0.5" ensures that the container uses at most 50% of a single CPU.
docker run --gpus=all streamlit-container
docker run -i -t --rm --volume "$(pwd)":/data streamlit-container
where $(pwd) can be replaced by any source directory on the host, and /data can be replaced by any destination directory in the container
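The options above can be combined in a single invocation. As a sketch, they could be stored in a small wrapper script so the same settings are used on every run. The script name run-streamlit.sh is hypothetical, and the port 8501 is an assumption based on Streamlit's default port.

```shell
# Write a hypothetical wrapper script that bundles the docker run options
cat > run-streamlit.sh <<'EOF'
#!/bin/sh
# 8501 is assumed to be the port the Streamlit app listens on
docker run -i -t --rm \
  -p 8501:8501 \
  --memory=1G \
  --cpus=0.5 \
  --volume "$(pwd)":/data \
  streamlit-container
EOF
chmod +x run-streamlit.sh
```

Calling ./run-streamlit.sh from a directory would then mount that directory at /data inside the container, with the memory and CPU limits applied.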
Dockerfile
# Builder image
FROM continuumio/miniconda:latest AS builder
COPY conda-linux-64.lock /locks/conda-linux-64.lock
RUN conda create -p /opt/env --file /locks/conda-linux-64.lock
# Primary image
FROM ubuntu:noble
COPY --from=builder /opt/env /opt/env
COPY app.py /app/app.py
ENV PATH=/opt/env/bin:"${PATH}"
# Exec form: streamlit runs directly instead of under a shell
CMD ["streamlit", "run", "/app/app.py"]