Our knowledge of the universe and (arguably) our modern society are built upon computer code. One would hope that said code is tested rigorously, produces consistent results, and robust to fault. As part of this wider conversation, I would argue that reproducibility - especially on different machines and computing environments - is critical. Ignoring reproducibility is the equivalent of buying an iPhone that only works in the Apple Store but as soon as it comes to your house, it breaks!
Docker was built for ensuring reproducibility. Long story short, what’s neat about Docker is that you just have to
ship your Dockerfile
and some other files (e.g. requirements.txt
to set up your coding environment). In theory,
a script should run identically between your machine and anyone else who receives those files!
This post is a cheat sheet to get you up and running. Plus, the logo is an adorable whale - what’s not to like about Docker?
Table of contents
Examples that I show here are in this repo.
Docker Jargon
Term | What is it | Real-life Equivalent |
---|---|---|
Dockerfile |
A file that specifies how to build a Docker image | Instructions on how to write a bread recipe |
Docker image | A file that acts as a template for Docker containers | The bread recipe |
Docker container | Instances of the Docker image | The bread |
Making Dockerfiles
# Define a base for import
FROM IMAGE:tag
# Copy files over
COPY FILENAME /PATH/TO/DEST/IN/CONTAINER/
# Run commands while building the image
RUN COMMAND1 && COMMAND2
# This is the command to run when running the container.
# This is a strict executable that runs unless --entrypoint is
ENTRYPOINT ["executable", "param1", "param2", ... ]
# This is the command to run but parameters are overwritable
CMD ["executable", "param1", "param2", ...]
Example Dockerfiles
Toy example
Make sure to have a requirements.txt
file.
scikit-learn==0.23.1
And let’s have a simple Python script that just plots the number of objects in the iris dataset
# iris_histogram.py
from sklearn.datasets import load_iris
iris = load_iris()
iris_dict = dict([(i,v) for i,v in enumerate(iris['target_names']) ])
iris_names = [iris_dict[_] for _ in iris['target']]
iris_counts = dict([ (t, iris_names.count(t)) for t in iris['target_names'] ])
for k,v in iris_counts.items():
print(f"{k.ljust(10)}: {'*'*v}")
This is all we need for our Dockerfile
:
FROM python:3.7-slim-stretch
COPY requirements.txt /
RUN pip install -r requirements.txt
CMD ["python", "iris_histogram.py"]
To run it, we simply need to do
$ docker build -t iris_example -f Dockerfile .
...
$ docker run iris_example
setosa : **************************************************
versicolor: **************************************************
virginica : **************************************************
Bioinformatics example
I’ll process a PDB file of my favourite antibody ever: an antibody that binds Sonic hedgehog
# requirements.txt
biopython==1.76
This is our basic script:
# example.py
from Bio.PDB import PDBParser
p = PDBParser(QUIET=True)
s = p.get_structure('3mxw.pdb', '3mxw.pdb')
n_c = len(list(s.get_chains()))
print(f"This protein has {n_c} chains")
Our Dockerfile:
FROM python:3.7-slim-stretch
COPY . /
RUN pip install -r requirements.txt
CMD ["python", "example.py"]
And running it is as simple as:
$ docker build -t pdb_example -f Dockerfile .
...
$ docker run pdb_example
This protein has 3 chains and 220 residues in chain H
Docker commands
Build image
# docker build -t [IMAGE_TAG] -f [LOCATION_OF_DOCKERFILE] [CONTEXT_DIRECTORY]
docker build -t my_image -f Dockerfile .
Run image (Build a container)
Default run
# docker run [IMAGE TAG]
docker run my_image
Run image interactively (e.g. a bash shell)
# docker run -it [IMAGE TAG] [COMMAND]
# Spin up a new Docker container and run a bash shell
docker run -it my_image bash
Run a command on a running container
# docker exec -it [CONTAINER TAG] [COMMAND]
# Open bash on an existing, running container
docker exec -it my_container bash
List things
# List docker images
docker image ls
# List containers
docker ps -a
Delete containers and images
# Remove docker container
docker rm [CONTAINER_TAG]
# Remove all docker containers
docker rm $(docker ps -a -q)
# Remove docker image
docker rmi [IMAGE_TAG]
# Remove all docker images
docker rmi $(docker image ls -a -q)