Our knowledge of the universe and (arguably) our modern society are built upon computer code. One would hope that said code is tested rigorously, produces consistent results, and is robust to faults. As part of this wider conversation, I would argue that reproducibility - especially across different machines and computing environments - is critical. Ignoring reproducibility is like buying an iPhone that works in the Apple Store but breaks as soon as you bring it home!

Docker was built for ensuring reproducibility. Long story short, what’s neat about Docker is that you just have to ship your Dockerfile and a few other files (e.g. requirements.txt to set up your coding environment). In theory, a script should then run identically on your machine and on the machine of anyone else who receives those files!

This post is a cheat sheet to get you up and running. Plus, the logo is an adorable whale - what’s not to like about Docker?

Examples that I show here are in this repo.

Docker Jargon

| Term | What is it | Real-life Equivalent |
| --- | --- | --- |
| Dockerfile | A file that specifies how to build a Docker image | Instructions on how to write a bread recipe |
| Docker image | A file that acts as a template for Docker containers | The bread recipe |
| Docker container | Instances of the Docker image | The bread |

Making Dockerfiles

# Define the base image to build on
FROM IMAGE:tag

# Copy files over
COPY FILENAME /PATH/TO/DEST/IN/CONTAINER/

# Run commands while building the image
RUN COMMAND1 && COMMAND2

# This is the command to run when running the container.
# This is a strict executable that always runs unless --entrypoint is passed to docker run
ENTRYPOINT ["executable", "param1", "param2", ... ]

# This is the command to run but parameters are overwritable
CMD ["executable", "param1", "param2", ...] 
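To make the difference between ENTRYPOINT and CMD concrete, here is a small Python sketch that models how the two (in exec form) combine with any arguments passed after the image name in docker run. The function name resolve_command is made up for illustration; this is a toy model of the behaviour described above, not Docker's actual code, and it deliberately ignores shell form and the --entrypoint flag.

```python
from typing import List, Optional


def resolve_command(entrypoint: Optional[List[str]],
                    cmd: Optional[List[str]],
                    run_args: Optional[List[str]] = None) -> List[str]:
    """Model the argv a container starts with (exec form only).

    Arguments given after the image name in `docker run` replace CMD;
    ENTRYPOINT, when set, always stays in front.
    """
    tail = run_args if run_args else (cmd or [])
    return list(entrypoint or []) + list(tail)


# Image with only CMD: run-time arguments replace the whole command.
print(resolve_command(None, ["python", "iris_histogram.py"]))
print(resolve_command(None, ["python", "iris_histogram.py"], ["bash"]))

# Image with ENTRYPOINT and CMD: run-time arguments replace only the CMD part.
print(resolve_command(["python"], ["iris_histogram.py"], ["example.py"]))
```

In practice this is why CMD suits a default that people may want to swap out (e.g. for bash), while ENTRYPOINT suits an image that should always behave like one specific executable.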

Example Dockerfiles

Toy example

Make sure to have a requirements.txt file.

scikit-learn==0.23.1

And let’s have a simple Python script that prints a text histogram of the number of samples per class in the iris dataset:

# iris_histogram.py
from sklearn.datasets import load_iris
iris = load_iris()

iris_dict = dict([(i,v) for i,v in enumerate(iris['target_names']) ])
iris_names = [iris_dict[_] for _ in iris['target']]

iris_counts = dict([ (t, iris_names.count(t)) for t in iris['target_names'] ])

for k,v in iris_counts.items():
    print(f"{k.ljust(10)}: {'*'*v}")
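The counting step in that script can also be written with collections.Counter from the standard library. The sketch below is only an illustration of the same idea: the labels list is hypothetical stand-in data (not the real iris targets) and text_histogram is a made-up helper name, so it runs without scikit-learn installed.

```python
from collections import Counter

# Hypothetical class labels standing in for the per-sample iris names
labels = ["setosa"] * 3 + ["versicolor"] * 2 + ["virginica"] * 4


def text_histogram(labels, width=10):
    """Return one 'name: ***' line per distinct label, in first-seen order."""
    counts = Counter(labels)
    return [f"{name.ljust(width)}: {'*' * n}" for name, n in counts.items()]


for line in text_histogram(labels):
    print(line)
```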

This is all we need for our Dockerfile:

FROM python:3.7-slim-stretch

# Copy both the requirements and the script that CMD runs
COPY requirements.txt iris_histogram.py /

RUN pip install -r requirements.txt

CMD ["python", "iris_histogram.py"]

To build the image and run it, we simply need to do:

$ docker build -t iris_example -f Dockerfile .
...

$ docker run iris_example
setosa    : **************************************************
versicolor: **************************************************
virginica : **************************************************

Bioinformatics example

I’ll process a PDB file of my favourite antibody ever: an antibody that binds Sonic hedgehog. The script expects the file to be saved as 3mxw.pdb in the same directory as the Dockerfile.

# requirements.txt
biopython==1.76

This is our basic script:

# example.py
from Bio.PDB import PDBParser

p = PDBParser(QUIET=True)
s = p.get_structure('3mxw.pdb', '3mxw.pdb')

n_c = len(list(s.get_chains()))

print(f"This protein has {n_c} chains")
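For intuition about what is being counted, here is a rough standard-library sketch that collects distinct chain identifiers from PDB-format lines, relying on the fixed-width layout where the chain ID sits in column 22 of ATOM and HETATM records. The atom_line and chain_ids helpers and the four records are hypothetical, made up for illustration; this is not how Biopython implements get_chains, and a real parser handles many more cases (multiple models, alternate locations and so on).

```python
def atom_line(serial, atom, resname, chain, resseq):
    """Build the first 26 columns of a PDB ATOM record (fixed-width layout)."""
    return f"ATOM  {serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


def chain_ids(pdb_lines):
    """Return the distinct chain IDs, in order of first appearance."""
    seen = []
    for line in pdb_lines:
        # Chain ID is column 22, i.e. index 21 when counting from zero
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen


# Hypothetical records, not taken from 3mxw.pdb
records = [
    atom_line(1, " N", "GLU", "H", 1),
    atom_line(2, " CA", "GLU", "H", 1),
    atom_line(3, " N", "ASP", "L", 1),
    atom_line(4, " N", "CYS", "A", 1),
]

print(f"This toy structure has {len(chain_ids(records))} chains")
```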

Our Dockerfile:

FROM python:3.7-slim-stretch

COPY . /

RUN pip install -r requirements.txt

CMD ["python", "example.py"]

And running it is as simple as:

$ docker build -t pdb_example -f Dockerfile .
...

$ docker run pdb_example
This protein has 3 chains

Docker commands

Build image

# docker build -t [IMAGE_TAG] -f [LOCATION_OF_DOCKERFILE] [CONTEXT_DIRECTORY]

docker build -t my_image -f Dockerfile .

Run image (create a container)

Default run

# docker run [IMAGE_TAG]

docker run my_image

Run image interactively (e.g. a bash shell)

# docker run -it [IMAGE_TAG] [COMMAND]

# Spin up a new Docker container and run a bash shell
docker run -it my_image bash 

Run a command on a running container

# docker exec -it [CONTAINER_NAME_OR_ID] [COMMAND]

# Open bash on an existing, running container
docker exec -it my_container bash

List things

# List docker images
docker image ls 

# List containers
docker ps -a

Delete containers and images

# Remove docker container
docker rm [CONTAINER_NAME_OR_ID]

# Remove all docker containers (running ones must be stopped first, or add -f)
docker rm $(docker ps -a -q)

# Remove docker image
docker rmi [IMAGE_TAG]

# Remove all docker images
docker rmi $(docker image ls -a -q)
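If you find yourself typing the commands in this section over and over, they can be scripted from Python. Below is a minimal sketch, assuming Python 3.8+ (for shlex.join) and the docker CLI on your PATH; the docker helper and its dry_run parameter are hypothetical names invented here. With dry_run=True it only returns the command string, so it is safe to experiment with.

```python
import shlex
import subprocess


def docker(*args, dry_run=True):
    """Build a docker command; run it only when dry_run is False."""
    cmd = ["docker", *args]
    if dry_run:
        # Return the shell-quoted command instead of executing it
        return shlex.join(cmd)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# Same build and run steps as in the cheat sheet, shown as a dry run
print(docker("build", "-t", "my_image", "-f", "Dockerfile", "."))
print(docker("run", "my_image"))
print(docker("rmi", "my_image"))
```

Passing the arguments as a list rather than one big string avoids shell quoting problems when a tag or path contains spaces.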