Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Joshua Cook, Docker for Data Science, https://doi.org/10.1007/978-1-4842-3012-1_5

5. The Dockerfile

Joshua Cook¹

(1)Santa Monica, California, USA

Every Docker image is defined as a stack of layers, each defining fundamental, stateless changes to the image. The first layer might be the virtual machine’s operating system (a Debian or Ubuntu Docker image), the next the installation of dependencies necessary for your application to run, and all the way up to the source code of your application. The best way to leverage this system is via a Dockerfile.

Best Practices

A Dockerfile is a way to script the building of an image. This script uses a domain-specific language to tell the Docker daemon how to sequentially build a Docker image. When the daemon is instructed to build an image, it does so by reading the necessary instructions from a Dockerfile.

Docker holds the following as best practices when creating Dockerfiles:

Containers should be ephemeral or stateless, in that they can be reinstantiated with a minimum of set up and configuration.
Use a .dockerignore file, similar to a .gitignore.
Avoid installing unnecessary packages.
Each container should have only one concern.
Minimize the number of layers.
Sort multiline arguments.

Stateless Containers

A best practice in modern software development is software that runs as isolated processes sharing nothing between them, often referred to as microservices .¹ While an application may run more than one process at a time, these processes should be stateless. If any information is to be persisted beyond the termination of the process, this information should be written to a stateful backing service such as a database (like MongoDB or Postgres) or a key-value store (like Redis).

We look at Docker containers as abstractions of system processes, in fact, thinking of Docker containers as processes being managed by the Docker daemon. As such, it is also best practice to define our Docker images, and thus the running containers which they define, as completely stateless. We should be able to shut down a container and remove it from our system, then start an identical container using the same image, and run command with no effect to our work.

Single-Concern Containers

Early iterations of Dockerfile best practices held to the mantra “one process per container.” As Docker has matured, sticking to one process for each container has proven untenable. Jupyter is actually a prime example of a container that can’t be defined to run via a single process.

That said, best practice holds that each container should be defined to have a single concern. Jupyter may require multiple processes to function properly but we will dedicate a single container to running Jupyter. Should we wish to interface with a Postgres database, we would use a second container concerned with this database. The end goal is to keep application concerns separate and modular.

Project: A Repo of Docker Images

In this chapter, you will be developing a repository of Dockerfiles, each defining a separate image you might use in the course of your work. In the process of developing and maintaining these images defined via a Dockerfile, you will explore the syntax of defining an image and best practices in building and maintaining images. Your ultimate goal is to create a suite of images that will become your primary building blocks in developing modular systems for performing the work of the data scientist.

Prepare for Local Development

First, you will prepare your local machine for the development work you will be doing. In Listing 5-1, you do this by creating a directory for your project and initializing it as a git repository.

Listing 5-1. Prepare for Local Development

$ mkdir ch_5_dockerfiles && cd ch_5_dockerfiles
$ git init
$ touch README.md
$ git add README.md
$ git commit -m 'init'

Configure GitHub

On GitHub, create a repo called Dockerfiles. Once you have created the repo, connect your local directory to the remote GitHub repo (Listing 5-2).

Note

You will need to configure your GitHub account for connection via SSH. Documentation for this is available at https://help.github.com/articles/connecting-to-github-with-ssh/ .

Listing 5-2. Connect the Local Repo to GitHub

$ git remote add origin [email protected]:<username>/dockerfiles.git
$ git push -u origin master

Building Images Using Dockerfiles

The connection between the Dockerfile and a built (compiled) Docker image is the docker build command. The build command tells the Docker daemon to construct an image using the specified context and a Dockerfile. Context refers to the collection of files that will be used to build the specific image. The context will be specified by an included PATH. You can take a look at the requirements of docker build to see what this might look like (Listing 5-3).

Listing 5-3. Display docker build help

$ docker build
"docker build" requires exactly 1 argument(s).
See 'docker build --help'.

Usage:  docker build [OPTIONS] PATH | URL | -

As you can see, docker build requires a PATH or a URL as the final argument. This is the context.

Dockerfile Syntax

Dockerfiles are built using a simple domain-specific language (see Listing 5-4). Instructions are case-insensitive but by convention are written in all caps. Instructions are passed sequentially and Dockerfiles should be thought of as scripts passed to the Docker daemon.

Listing 5-4. Dockerfile syntax

# Comment
INSTRUCTION arguments

Designing the gsl Image

The first image you will construct is the same image you used in Chapter 3, the GSL image. This image is used for compiling code using the GNU Scientific Library (GSL) , a suite of C tools used in computational mathematics, especially using the BLAS ecosystem. You build this image using the gcc² image as a base image, and in doing so ensure that you have all of the tools necessary to compile your C code.

Create the gsl Source Directory

First, you make a directory for this specific image and instantiate your Dockerfile (Listing 5-5).

Listing 5-5. Create a Directory for the gsl Image Containing an Empty Dockerfile

$ mkdir gsl
$ touch gsl/Dockerfile

In Listing 5-6, to build the gsl image, you use the docker build command, using the relatively-referenced folder gsl as context. The -t flag tells the daemon to name that image joshuacook/gsl. Your initial attempt at a build fails because you have not added any commands to the Dockerfile.

Listing 5-6. Run a Docker Build

$ docker build -t joshuacook/gsl gsl
sending build context to Docker daemon 53.25 kB
Error response from daemon: The Dockerfile (Dockerfile) cannot be empty

Define the gsl Image

Let’s define the gsl image as you did in Chapter 3, using three layers, defined by three commands, as seen in Listing 5-7.

Listing 5-7. gsl/Dockerfile

FROM gcc

LABEL maintainer=@joshuacook

RUN apt-get update && 
      apt-get install -y 
      gsl-bin 
      libgsl0-dbg 
      libgsl0-dev 
      libgsl0ldbl

Build the gsl Image

In Listing 5-8, having defined the image using a Dockerfile, you build the image using the docker build command, again naming the image and providing the gsl directory as context.

Listing 5-8. Run a Docker Build

$ docker build -t joshuacook/gsl gsl
sending build context to Docker daemon 14.85 kB
Step 1 : FROM gcc
latest: Pulling from library/gcc

693502eb7dfb: Pull complete
...
e6a66f7b6a7a: Pull complete
Digest: sha256:c1fa0b3eeba33a7b9da5ab7de7fa2c520760f778b5e5d1db38791d0da7b841b9
Status: Downloaded newer image for gcc:latest
 ---> 408d218617ca
Step 2 : LABEL maintainer @joshuacook
 ---> Running in 723583dd01d6
 ---> 20bbe26b8b8a
Removing intermediate container 723583dd01d6
Step 3 : RUN apt-get update &&     apt-get install -y     gsl-bin     libgsl0-dbg     libgsl0-dev     libgsl0ldbl
 ---> Running in ed02ca384066
Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
...

FROM gcc

The first layer uses the FROM instruction to define the base image from which you will build your image. A valid Dockerfile must always begin with a FROM instruction. Best practice recommends that all but advanced use cases should begin their images by pulling FROM the Docker Public Repositories. To reiterate, FROM must be the first non-comment instruction in the Dockerfile. Here, you are pulling the from the gcc image.

FROM can be used with the following three syntaxes:

FROM <image>
FROM <image>:<tag>
FROM <image>@<digest>

Note

The use of tag or digest are optional. If neither is provided, the tag is assumed to be latest, corresponding to the latest available build at Docker Hub.

LABEL maintainer=@joshuacook

The second layer uses the LABEL instruction to define metadata associated with your image. Each LABEL is a key-value pair. In this case, you associate the key maintainer (of the image) to the value (the Docker Hub user), joshuacook. Images can have multiple LABELs.

To view an image’s LABELs, you use the docker inspect command. In Listing 5-9, you do this for the joshuacook/gsl image that you just built.

Listing 5-9. Inspect the joshuacook/gsl Image

$ docker inspect joshuacook/gsl
...
   "Labels": {
       "maintainer": "@joshuacook"
   }
...

RUN apt-get update && apt-get install

The final layer of the image uses the RUN instruction, in this case to install the libraries necessary for using the GNU Scientific Library. The RUN instruction uses /bin/sh to execute any provided commands in a new layer on top of the current image. The results of the execution become the new image.

Note

We use a (backslash) to continue a single instruction across multiple lines. Thus, the instructions in Listing 5-10 and the Listing 5-11 are functionally equivalent.

Listing 5-10. apt Install Over Multiple Lines

RUN apt-get update && 
      apt-get install -y 
      gsl-bin 
      libgsl0-dbg 
      libgsl0-dev 
      libgsl0ldbl

Listing 5-11. apt Install in a Single Line

RUN apt-get update && apt-get install -y gsl-bin libgsl0-dbg libgsl0-dev libgsl0ldbl

Commit Changes to GitHub

Before you move on, you should git commit the changes that you have made to your Dockerfile and push the changes to GitHub (see Listing 5-12).

Listing 5-12. Commit Changes and Push to GitHub

$ git add gsl/Dockerfile
$ git commit -m 'GSL IMAGE - initial build'
$ git push

The Docker Build Cache

Recall that the structure of a Docker image is defined as a stack of images, each defining a stateless change to the image. Because each of them is built as a separate abstraction layer, changes to the Docker image are made from the top down when rebuilding the image. If you have only made changes to the code sitting on the top-most layer, this is the only layer of the image that will be rebuilt.

You can examine the build cache by adding a new layer to your GSL image. Let’s add a new layer to your image containing another library you might be interested in for performing data science tasks in C. Add the line in Listing 5-13 to your Dockerfile.

Listing 5-13. Add an Additional Dependency Layer to joshuacook/gsl

RUN apt-get update && 
    apt-get install -y gdb

In Listing 5-14, you rebuild the image.

Listing 5-14. Rerun the joshuacook/gsl Image Build

$ docker build -t joshuacook/gsl gsl
Sending build context to Docker daemon 2.048 kB
Step 1/4 : FROM gcc
 ---> 408d218617ca
Step 2/4 : LABEL maintainer @joshuacook
 ---> Using cache
 ---> ba0d6482c28a
Step 3/4 : RUN apt-get update &&       apt-get install -y       gsl-bin       libgsl0-dbg       libgsl0-dev       libgsl0ldbl
 ---> Using cache
 ---> 8706282586fc
Step 4/4 : RUN apt-get install -y gdb
 ---> Running in c491003e5a95

...

 ---> 53111721b96a
Removing intermediate container c491003e5a95
Successfully built 53111721b96a
...

Note

During this build process, steps 1-3 were not executed as before. Prior to executing each step, the Docker daemon compares the step to the individual layers of an existing instance of the same image in the local cache. If they match, the build uses the existing layers (i.e. the Docker Build Cache) to build the image. As a result, a build with no changes is idempotent, meaning that the initial command builds the image, but subsequent builds have no additional effect. Listing 5-15 demonstrates the idempotency of the docker build process.

Listing 5-15. Rerun the Identical Build Once More

$ docker build -t joshuacook/gsl gsl
Sending build context to Docker daemon 2.048 kB
Step 1/4 : FROM gcc
 ---> 408d218617ca
Step 2/4 : LABEL maintainer @joshuacook
 ---> Using cache
 ---> ba0d6482c28a
Step 3/4 : RUN apt-get update &&       apt-get install -y       gsl-bin       libgsl0-dbg       libgsl0-dev       libgsl0ldbl
 ---> Using cache
 ---> 8706282586fc
Step 4/4 : RUN apt-get install -y gdb
 ---> Using cache
 ---> 53111721b96a
Successfully built 53111721b96a

A more interesting and useful result is that a build will only run steps following the first step upon which you have made a change, as demonstrated in Listing 5-15.

Anaconda

Anaconda is a freemium distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing. You will use Anaconda to drive your Jupyter platform. The conda command is the primary interface for managing Anaconda installations. Miniconda is a small “bootstrap” version that includes only conda and conda-build, and installs Python. As an exercise, however, you will design your own miniconda image modeled after the miniconda image distributed by Continuum Analytics that can be used to minimally run IPython.

Design the miniconda3 Image

In Chapter 3, you used the jupyter/scipy-notebook image to run IPython using the scipy library. This is a perfectly reasonable usage of that image. It contains all of the libraries that you need. Furthermore, storing two independently defined images (one for the purpose of running Jupyter with scipy and the other for the purpose of running IPython with scipy) would consume unnecessary disk space in the interest of having single-usage images. Finally, running IPython from the jupyter image is a recommended best practice from the IPython team.³ Ultimately, you will do the same.

Create the miniconda3 Source Directory

Again, in Listing 5-16, you begin by creating a directory to hold the source code associated with this image.

Listing 5-16. Create a Directory for the miniconda3 Image Containing an Empty Dockerfile

$ mkdir miniconda3
$ touch miniconda3/Dockerfile

In Listing 5-17, you use the tree tool to display the overall structure of your docker images project to this point.

Listing 5-17. Use tree to Show Project Progress

$ tree
.
├── README.md
├── gsl
 │   └── Dockerfile
└── miniconda3
    └── Dockerfile

Begin the Image with FROM, ARG, and MAINTAINER

You begin your image with the FROM instruction, which sets the base image upon which you will build your image. A valid Dockerfile requires a FROM instruction as its first instruction. In Listing 5-18, you take your cues from the Jupyter project and build using the debian image.

Listing 5-18. miniconda3/Dockerfile

FROM debian

In Listing 5-19, you build the image.

Listing 5-19. Build the miniconda3 Image

$ docker build -t miniconda3 miniconda3
Sending build context to Docker daemon 2.048 kB
Step 1/1 : FROM debian
latest: Pulling from library/debian
6d827a3ef358: Pull complete
Digest: sha256:72f784399fd2719b4cb4e16ef8e369a39dc67f53d978cd3e2e7bf4e502c7b793
Status: Downloaded newer image for debian:latest
 ---> 8cedef9d7368
Successfully built 8cedef9d7368

Here, you have built an image with a single layer: your underlying operating system. In Listing 5-20, you list images in your image cache .

Listing 5-20. Show Cached Images

$ docker images
REPOSITORY        TAG       IMAGE ID        CREATED          SIZE
debian            latest    47af6ca8a14a    8 days ago       125.1 MB
miniconda3        latest    47af6ca8a14a    5 minutes ago    125.1 MB
...

Note

The debian image and the miniconda3 image have the exact same IMAGE ID. This is because they are the same image.

Commit Changes to the Local Repository

You will be building a fairly complicated image here and as such it is best practice to commit to the local git repository frequently. In Listing 5-21, you make your first commit for this image.

Listing 5-21. Commit Changes

$ git add Dockerfile
$ git commit -m 'MINICONDA3 IMAGE. Added FROM instruction.'

In Listing 5-22, you add a label for the maintainer of the image, as you did previously for the gsl image. Again, this is not required, but it is conventional.

Listing 5-22. miniconda3/Dockerfile

FROM debian
LABEL maintainer=@joshuacook

In Listing 5-23, you add an ARG instruction. The ARG instruction is used to define an environment variable that is available at build or runtime.

Listing 5-23. miniconda3/Dockerfile

FROM debian
LABEL maintainer=@joshuacook
ARG DEBIAN_FRONTEND=noninteractive

Here you use the ARG instruction to define an environment variable that will describe the behavior of your interaction with your Debian container. You could also have defined an ARG with no value and then passed the value to the build as an argument (see Listing 5-24).

Listing 5-24. An ARG Passed at Build Time

$ DEBIAN_FRONTEND=noninteractive docker build -t some_image .

Note

You declare the value of an argument inline with the docker build command.

There is yet a third way to declare environment variables. The Jupyter image defined by the Jupyter development team uses the ENV instruction to achieve the same purpose. After reading this thread⁴ on the Docker team’s GitHub page, I chose to use the ARG instruction. Any of them will work but it is important to only use one.

Idempotently Run the Build

In Listing 5-25, you run the build. As you have previously run the build for your first layer, this will not need to be executed, and the build will pick up only the most recent additions.

Listing 5-25. Run the Build

$ docker build -t miniconda3 miniconda3
Sending build context to Docker daemon 2.048 kB
Step 1/3 : FROM debian
 ---> a2ff708b7413
Step 2/3 : LABEL maintainer @joshuacook
 ---> Using cache
 ---> c140313c988e
Step 3/3 : ARG DEBIAN_FRONTEND=noninteractive
 ---> Running in b927c25267f0
 ---> da987fd59d24
Removing intermediate container b927c25267f0
Successfully built da987fd59d24

Now you’re getting somewhere. You have added three lines to your Dockerfile. When you ran the build, it either 1) used a pre-existing layer as in the case at Step 1 or 2) created a new layer containing the results of the step.

Note

The debian image was not pulled this time. There was nothing new to do here!

Commit Changes to the Local Repository

In Listing 5-26, you again commit your changes to the local repository.

Listing 5-26. Commit Changes

$ git add Dockerfile
$ git commit -m 'MINICONDA3 IMAGE. Added maintainer LABEL and ARG instruction'

Provision the miniconda3 Image

Ultimately, your Jupyter application will run on a containerized Debian machine. As with any application, it will have many dependencies, or system-level applications, that the Debian system must be able to execute in order to function properly. Were you building this system manually, that is without Docker, you would use the command line package manager appropriate to your operating system (apt for Ubuntu or Debian, yum for CentOS, or brew for Mac OS X) in a process known as provisioning.

The Docker daemon has no native way of provisioning, but rather has a mechanism for leveraging the base system’s package manager via the RUN instruction (Listing 5-27).

Listing 5-27. The RUN Instruction

RUN <command>

This has the effect of running <command> in a shell to the image (i.e. /bin/sh -c <command>). Similar to scripting in a shell language, you can add a backslash () to the end of a line to continue that command on the next line. In other words, the statements in Listing 5-28 and Listing 5-29 are equivalent.

Listing 5-28. A RUN Instruction

RUN /bin/bash -c 'source $HOME/.bashrc ; echo $HOME'

Listing 5-29. Another RUN Instruction

RUN /bin/bash -c 'source $HOME/.bashrc ;
echo $HOME'

In Listing 5-30, you provision the thin operating system for the miniconda3 image.

Listing 5-30. miniconda3/Dockerfile

RUN apt-get update --fix-missing && 
    apt-get install -y 
    wget bzip2 ca-certificates 
    libglib2.0-0 libxext6 libsm6 libxrender1

Run the Build

At this point, you actually have some meat to your image. Let’s go ahead and build it (Listing 5-31).

Note

In Step 4/4, the RUN instruction is blissfully unaware of those line breaks. They are simply there for us in order to make the Dockerfile more readable.

Listing 5-31. Run the Build

$ docker build -t miniconda3 miniconda3
Sending build context to Docker daemon  2.56 kB
Step 1/4 : FROM debian
 ---> a2ff708b7413
Step 2/4 : LABEL maintainer @joshuacook
 ---> Using cache
 ---> c140313c988e
Step 3/4 : ARG DEBIAN_FRONTEND=noninteractive
 ---> Using cache
 ---> 6be1e058f2de
Step 4/4 : RUN apt-get update &&     apt-get install -yq --no-install-recommends     build-essential     bzip2     ca-certificates     git     libglib2.0-0     libsm6     libxrender1     wget&& apt-get clean &&     rm -rf /var/lib/apt/lists/*
 ---> Running in 642c7d024e55
Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
Ign http://deb.debian.org jessie InRelease
Get:2 http://deb.debian.org jessie-updates InRelease [145 kB]
...

This phase of the build will take some time. For now, let’s hold off on examining the output. Again, running a Docker build is idempotent, meaning that no matter how many times you run it, you receive the same output. You will wait for this run to complete and then run the build once more to receive a compact output (Listing 5-32).

Listing 5-32. Idempotently Run the Build

$ docker build -t miniconda3 miniconda3
Sending build context to Docker daemon  2.56 kB
Step 1/4 : FROM debian
 ---> a2ff708b7413
Step 2/4 : LABEL maintainer @joshuacook
 ---> Using cache
 ---> c140313c988e
Step 3/4 : ARG DEBIAN_FRONTEND=noninteractive
 ---> Using cache
 ---> c140313c988e
Step 4/4 : RUN apt-get update &&     apt-get install -yq --no-install-recommends     build-essential     bzip2     ca-certificates     git     libglib2.0-0     libsm6     libxrender1     wget&& apt-get clean &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> bb1ae69e4c8a
Successfully built bb1ae69e4c8a

In Listing 5-32, you can see that the build output contains four steps representing the four instruction in your Dockerfile. The way that Docker works, it creates or uses a cached image for each step. At Step 1, you tell the Docker daemon which image you will be using as your base image FROM debian. After each step, you have a new layer and thus a new image and IMAGE ID. Let’s have a look at the images in your cache once more (Listing 5-33).

Listing 5-33. Display Images in the Local Image Cache

$ docker images
REPOSITORY       TAG       IMAGE ID        CREATED             SIZE
debian           latest    47af6ca8a14a    8 days ago          125.1 MB
miniconda3       latest    bb1ae69e4c8a    About a minute ago  1.636 GB
...

Note

The final layer’s ID, bb1ae69e4c8a, matches the IMAGE ID of the miniconda3 image.

Commit Changes to the Local Repository

In Listing 5-34, you again commit your changes to the local repository.

Listing 5-34. Commit Changes

$ git add Dockerfile
$ git commit -m 'MINICONDA3 IMAGE. OS provision statement.'

Install Miniconda

In Listing 5-35, you install Miniconda via a RUN instruction. You have taken this command directly from the Continuum Analytics Docker image for Miniconda.⁵ It varies slightly from the installation of Miniconda for the jupyter/base-notebook image .

Listing 5-35. miniconda3/Dockerfile

RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && 
wget --quiet 
    https://repo.continuum.io/miniconda/Miniconda3-4.3.11-Linux-x86_64.sh -O ∼/miniconda.sh && 
    /bin/bash ∼/miniconda.sh -b -p /opt/conda && 
    rm ∼/miniconda.sh

Run the Build

In Listing 5-36, you run the build to install Miniconda3.

Listing 5-36. Run the Build

$ docker build -t miniconda3 miniconda3
Sending build context to Docker daemon  2.56 kB
...
Step 5/5 : RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh &&     wget --quiet https://repo.continuum.io/miniconda/Miniconda3-4.3.11-Linux-x86_64.sh -O ∼/miniconda.sh &&     /bin/bash ∼/miniconda.sh -b -p /opt/conda &&     rm ∼/miniconda.sh
 ---> Running in 6e3605df5b68
...
Python 3.6.0 :: Continuum Analytics, Inc.
creating default environment...
installation finished.
 ---> 02b8f5d04aeb
Removing intermediate container 6e3605df5b68
Successfully built 02b8f5d04aeb

Commit the Changes to the Local Repository

In Listing 5-37, you again commit your changes to the local repository.

Listing 5-37. Commit Changes

$ git add Dockerfile
$ git commit -m 'MINICONDA3 IMAGE. Install miniconda3.'

tini

I think it a bit beyond of the progress you have made to this point to begin a discussion of the PID 1 “Zombie” process problem at this point. The short of it is this: Docker best practices are that we not run unnecessary processes when running our containers. This includes the typical systemd or sysvinit that would be run by your operating system to handle processes and signals. This can lead to containers that can’t be gracefully stopped or zombie containers that persist when they should have died.

For now, let’s simply handle this situation the way that the Continuum development team does and use tini. tini is a lightweight solution to this problem with no additional dependencies. It reaps zombies, spawns a single process (which will run as PID 1), and when the tini’s first child process has exited, tini exits as well.

In Listing 5-38, you include tini in your container.

Listing 5-38. miniconda3/Dockerfile

RUN apt-get install -y curl grep sed dpkg && 
    TINI_VERSION=`curl https://github.com/krallin/tini/releases/latest | grep -o "/v.*"" | sed 's:^..(.*).$:1:'` && 
    curl -L "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini_${TINI_VERSION}.deb" > tini.deb && 
    dpkg -i tini.deb && 
    rm tini.deb && 
    apt-get clean

Run the Build

In Listing 5-39, you run the build to install tini.

Listing 5-39. Run the Build

$ docker build -t miniconda3 miniconda3
Sending build context to Docker daemon  2.56 kB
...
Step 6/6 : RUN apt-get install -y curl grep sed dpkg &&     TINI_VERSION=`curl https://github.com/krallin/tini/releases/latest | grep -o "/v.*"" | sed 's:^..(.*).$:1:'` &&     curl -L "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini_${TINI_VERSION}.deb" > tini.deb &&     dpkg -i tini.deb &&     rm tini.deb &&     apt-get clean
 ---> Running in b6fa6f29121c
...
Successfully built ce5eb4345473

Commit the Changes to the Local Repository

In Listing 5-40, you again commit your changes to the local repository.

Listing 5-40. Commit Changes

$ git add Dockerfile
$ git commit -m 'MINICONDA3 IMAGE. Install tini.'

Configure the Environment Variable with ENV

In order to make sure that your container runs properly, you need to configure a few environment variables. In Listing 5-41, you do this with the ENV instruction. You first specify some language specific variables,⁶ and then you make sure that the conda binary is in the PATH.

Listing 5-41. miniconda3/Dockerfile

ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH

ENTRYPOINT

Ultimately, you provide an ENTRYPOINT for your container. The ENTRYPOINT instruction specifies the process(es) to be launched when the image is instantiated. In Listing 5-42, you tell Docker to run tini as PID 1.

Listing 5-42. miniconda3/Dockerfile

ENTRYPOINT [ "/usr/bin/tini", "--" ]

Run the Build

In Listing 5-43, you run the build to set your environment variables and the ENTRYPOINT.

Listing 5-43. Run the Build

$ docker build -t miniconda3 miniconda3
Sending build context to Docker daemon 3.072 kB
...
Step 7/9 : ENV LANG C.UTF-8 LC_ALL C.UTF-8
...
Step 8/9 : ENV PATH /opt/conda/bin:$PATH
...
Step 9/9 : ENTRYPOINT /usr/bin/tini --
 ---> Running in 3c3776056958
 ---> 014cc8d97486
Removing intermediate container 3c3776056958
Successfully built 014cc8d97486
Successfully tagged miniconda3:latest

Commit the Changes to the Local Repository

In Listing 5-44, you again commit your changes to the local repository. As this marks the completion of your miniconda3 image, you also push your changes to GitHub.

Listing 5-44. Commit Changes

$ git add Dockerfile
$ git commit -m 'MINICONDA3 IMAGE. Set env and entrypoint.'
$ git push

Design the ipython Image

Your miniconda3 image is significantly smaller than most of the other images in the Jupyter stack precisely because it comes with next to nothing preinstalled. If you wish to run ipython, you are going to need to install this library. Let’s use the FROM instruction once more to build an ipython image using the miniconda3 image that you just designed.

Create the ipython Source Directory

First, you make a directory for this specific image and instantiate your Dockerfile (Listing 5-45).

Listing 5-45. Create a Directory for the ipython Image Containing an Empty Dockerfile

$ mkdir ipython
$ touch ipython/Dockerfile

Define the ipython Image

Let’s define the ipython image, using three layers, defined by three commands, as seen in Listing 5-46.

Listing 5-46. ipython/Dockerfile

FROM miniconda3
LABEL maintainer=@joshuacook
RUN conda update conda && 
    conda install --quiet --yes ipython && 
    conda clean -tipsy

Install ipython with conda

The only novel command here is using conda to install ipython. conda is a package manager much like apt, but it is designed for managing Python packages. Here, you use conda to install ipython. Astute Pythonistas will notice that we have eschewed configuring any sort of virtual environment.

Define the Default Runtime Command

Your final instruction in the ipython Dockerfile is to give the image a default runtime command via the CMD ⁷ instruction. Each Dockerfile can contain a single CMD instruction, although this instruction is not required. Per the Dockerfile best practices,⁸ the CMD instruction should be used to run the software contained by an image and should always be used in the form CMD ["executable", "param1", "param2"...]. In Listing 5-47, you add an instruction to your Dockerfile to execute ipython at runtime.

Listing 5-47. ipython/Dockerfile

FROM miniconda3
LABEL maintainer=@joshuacook
RUN conda update conda && 
    conda install --quiet --yes ipython && 
    conda clean -tipsy
CMD ["ipython"]

Build the ipython Image

In Listing 5-48, having defined the image using a Dockerfile, you build the image using the docker build command, again naming the image and providing the ipython directory as context.

Listing 5-48. Run a Docker Build

$ docker build -t ipython ipython
Sending build context to Docker daemon 2.048 kB
Step 1/4 : FROM jupyter-base
 ---> 014cc8d97486
Step 2/4 : LABEL maintainer @joshuacook
 ---> Using cache
 ---> 938a406795a2
Step 3/4 : RUN conda update conda &&     conda install --quiet --yes ipython &&     conda clean -tipsy
 ---> Running in 64910e75091f
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /opt/conda:

The following packages will be UPDATED:

    conda: 4.3.11-py36_0 --> 4.3.14-py36_0

...
 ---> f9a032f0a9a5.
Removing intermediate container 64910e75091f
Step 4/4 : CMD ipython
 ---> Running in 5839183d3d00
 ---> 98f0b133173e
Removing intermediate container 5839183d3d00
Successfully built 98f0b133173e
Successfully tagged ipython:latest

Commit the Changes to GitHub

In Listing 5-49, you commit this new image and push the changes to GitHub.

Listing 5-49. Commit the Changes and Push to GitHub

$ git add ipython/Dockerfile
$ git commit -m 'IPYTHON. Initial build'
$ git push

Run the ipython Image as a New Container

Finally, having developed your design and built your ipython image, let’s run the image as a new container. In Listing 5-50, you run ipython as an interactive terminal containerized process. You terminate the process via Ctrl+D.

Listing 5-50. Run the ipython Image as a New Interactive Terminal Containerized Process

$ docker run -it ipython
Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00)
Type "copyright", "credits" or "license" for more information.

IPython 5.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
                                                                                                        
In [1]: import requests

In [2]: resp = requests.get('http://google.com')

In [3]: resp.status_code
Out[3]: 200

In [4]:
Do you really want to exit ([y]/n)?

Summary

In this chapter, I introduced the Dockerfile, the file type used to define an image in the Docker ecosystem. You explored several common instructions used in the definition of Dockerfiles. You also defined three images, the third built using the second as its base image. After reading this chapter, you should be familiar with the definition of new images using Docker best practices.

Footnotes

1 https://en.wikipedia.org/wiki/Microservices

2 https://en.wikipedia.org/wiki/GNU_Compiler_Collection

3 https://hub.docker.com/r/ipython/ipython/

4 https://github.com/docker/docker/issues/4032

5 https://github.com/ContinuumIO/docker-images/blob/master/miniconda3/Dockerfile

6 http://unix.stackexchange.com/questions/87745/what-does-lc-all-c-do

7 https://docs.docker.com/engine/reference/builder/#cmd

8 https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#cmd

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 5. The Dockerfile

Create new playlist

Sign In

Sign Up

5. The Dockerfile

Best Practices

Stateless Containers

Single-Concern Containers

Project: A Repo of Docker Images

Prepare for Local Development

Listing 5-1. Prepare for Local Development

Configure GitHub

Note

Listing 5-2. Connect the Local Repo to GitHub

Building Images Using Dockerfiles

Listing 5-3. Display docker build help

Dockerfile Syntax

Listing 5-4. Dockerfile syntax

Designing the gsl Image

Create the gsl Source Directory

Listing 5-5. Create a Directory for the gsl Image Containing an Empty Dockerfile

Listing 5-6. Run a Docker Build

Define the gsl Image

Listing 5-7. gsl/Dockerfile

Build the gsl Image

Listing 5-8. Run a Docker Build

FROM gcc

Note

LABEL maintainer=@joshuacook

Listing 5-9. Inspect the joshuacook/gsl Image

RUN apt-get update && apt-get install

Note

Listing 5-10. apt Install Over Multiple Lines

Listing 5-11. apt Install in a Single Line

Commit Changes to GitHub

Listing 5-12. Commit Changes and Push to GitHub

The Docker Build Cache

Listing 5-13. Add an Additional Dependency Layer to joshuacook/gsl

Listing 5-14. Rerun the joshuacook/gsl Image Build

Note

Listing 5-15. Rerun the Identical Build Once More

Anaconda

Design the miniconda3 Image

Create the miniconda3 Source Directory

Listing 5-16. Create a Directory for the miniconda3 Image Containing an Empty Dockerfile

Listing 5-17. Use tree to Show Project Progress

Begin the Image with FROM, ARG, and MAINTAINER

Listing 5-18. miniconda3/Dockerfile

Listing 5-19. Build the miniconda3 Image

Listing 5-20. Show Cached Images

Note

Commit Changes to the Local Repository

Listing 5-21. Commit Changes

Listing 5-22. miniconda3/Dockerfile

Listing 5-23. miniconda3/Dockerfile

Listing 5-24. An ARG Passed at Build Time

Note

Idempotently Run the Build

Listing 5-25. Run the Build

Note

Commit Changes to the Local Repository

Listing 5-26. Commit Changes

Provision the miniconda3 Image

Listing 5-27. The RUN Instruction

Listing 5-28. A RUN Instruction

Listing 5-29. Another RUN Instruction

Listing 5-30. miniconda3/Dockerfile

Run the Build

Note

Listing 5-31. Run the Build

Listing 5-32. Idempotently Run the Build

Listing 5-33. Display Images in the Local Image Cache

Note

Commit Changes to the Local Repository

Listing 5-34. Commit Changes

Install Miniconda

Listing 5-35. miniconda3/Dockerfile

Run the Build

Listing 5-36. Run the Build

Commit the Changes to the Local Repository

Table of Contents for
5. The Dockerfile