
4. Understanding the Dockerfile

Sathyajith Bhat
Bangalore, Karnataka, India

Now that you have a better understanding of Docker and its associated terminology, this chapter shows you how to convert your project into a containerized application using Docker. In this chapter, you learn what a Dockerfile is, its syntax, and how to write one. With that understanding, you can work toward the first step in writing a Dockerfile for the Newsbot app.

Dockerfile Primer

For a traditionally deployed application, building and packaging was often quite tedious. Aiming to automate the build and packaging of applications, people turned to utilities such as GNU Make, Maven, and Gradle to build the application package. Similarly, in the Docker world, a Dockerfile is an automated way to build your Docker images.

The Dockerfile contains special instructions that tell the Docker Engine about the steps required to build an image. To invoke a build, you issue the docker build command. Listing 4-1 shows a typical Dockerfile.
FROM ubuntu:latest
LABEL author="sathyabhat"
LABEL description="An example Dockerfile"
RUN apt-get install python
COPY hello-world.py
CMD python hello-world.py
Listing 4-1

A Typical Dockerfile

Looking at this Dockerfile, it’s easy to see what we’re telling the Docker Engine to build. However, don’t let the simplicity fool you—a Dockerfile lets you express complex build steps when generating your Docker images. When a docker build command is issued, Docker builds the image from the Dockerfile and a build context.

Build Context

A build context is a file or set of files available at a specific path or URL. To understand this better, say you have some supporting files that you need during a Docker image build—for instance, an application-specific config file that was generated earlier and needs to be part of the container.

The build context can be local or remote—you can even set the build context to the URL of a Git repository, which can come in handy if the source files are not located in the same host as the Docker daemon or if you want to test feature branches. You simply set the context to the branch. The build command would look like this:
docker build https://github.com/sathyabhat/sample-repo.git#mybranch
Similarly, to build images based on your Git tags, the build command would look like this:
docker build https://github.com/sathyabhat/sample-repo.git#mytag
Working on a feature via a pull request? Want to try that pull request? Not a problem, you can even set the context to a pull request:
docker build https://github.com/sathyabhat/sample-repo.git#pull/1337/head

The build command sets the context to the path or URL provided, uploading the available files to the Docker daemon so it can build the image. You are not limited to a URL or path as the build context. If you pass a URL to a remote tarball (i.e., a .tar file), the tarball is downloaded onto the Docker daemon and used as the build context.

Caution

If you set your root (/) directory as the build context, your entire hard disk contents will be transferred to the Docker daemon.

Dockerignore

You should now understand that the build context transfers the contents of the current directory to the Docker daemon during the build. Consider the case where the context directory contains many files/directories that are irrelevant to the build process. Uploading them can cause a significant increase in network traffic. A .dockerignore file, much like .gitignore, allows you to define files that are exempt from being transferred during the build process.

The ignore list is provided by a file named .dockerignore; when the Docker CLI finds this file, it modifies the context to exclude the files/patterns specified in the file. Anything starting with a hash (#) is considered a comment and ignored. The following snippet shows a sample .dockerignore file that excludes temp files and directories, .DS_Store files, and the .git directory:
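# exclude temp files, macOS Finder metadata, and git data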
*/temp*
.DS_Store
.git

BuildKit

With the 18.09 release of the Docker Engine, Docker overhauled its container build system with BuildKit, which is now the default build system. For most users, BuildKit works exactly like the legacy build system. BuildKit has a new output format for image builds and, as a result, provides more detailed feedback about the build process.

If you see output that’s different from other learning resources, those resources may not have been updated with BuildKit’s output. BuildKit also tries to parallelize build steps as much as possible, so you can expect faster builds, especially for images with many Dockerfile instructions. For advanced users, BuildKit also introduces the ability to pass secrets into a build stage without the secret being present in the final image. The build output when using BuildKit is shown in Listing 4-2. (Note that the sha output has been truncated due to space constraints.)
docker build .
[+] Building 11.6s (6/6) FINISHED
 => [internal] load build definition from Dockerfile 0.1s
 => => transferring dockerfile: 84B   0.0s
 => [internal] load .dockerignore  0.1s
 => => transferring context: 2B 0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest 8.7s
 => [auth] library/ubuntu:pull token for registry-1.docker.io 0.0s
 => [1/1] FROM docker.io/library/ubuntu:latest@sha256:aba80b7 2.7s
 => => resolve docker.io/library/ubuntu:latest@sha256:aba80b7 0.0s
 => => sha256:aba80b7 1.20kB / 1.20kB 0.0s
 => => sha256:376209 529B / 529B  0.0s
 => => sha256:987317 1.46kB / 1.46kB 0.0s
 => => sha256:c549ccf8 28.55MB / 28.55MB  1.1s
 => => extracting sha256:c549ccf   1.2s
 => exporting to image 0.0s
 => => exporting layers   0.0s
 => => writing image sha256:f2afdc
Listing 4-2

Build Output When BuildKit Is Enabled

As of this writing, it is still possible to switch back to the legacy build process by setting the DOCKER_BUILDKIT flag, as shown in Listing 4-3.
DOCKER_BUILDKIT=0 docker build .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM ubuntu:latest
latest: Pulling from library/ubuntu
c549ccf8d472: Already exists
Digest: sha256:aba80b77e27148d99c034a987e7da3a287ed455390352663418c0f2ed40417fe
Status: Downloaded newer image for ubuntu:latest
 ---> 9873176a8ff5
Step 2/2 : CMD echo Hello World!
 ---> Running in d5ca2635eecd
Removing intermediate container d5ca2635eecd
 ---> 77711564634f
Successfully built 77711564634f
Listing 4-3

Switching Back to the Legacy Build Process

Unless you encounter any problems, I do not recommend switching back to the legacy build process. Stick to using Docker BuildKit. If you’re not seeing the new build output, ensure that you have updated to the latest version of Docker.

Building Using Docker Build

You’ll return to the sample Dockerfile a bit later. Let’s start with a simple Dockerfile first. Copy the following snippet to a file and save it as Dockerfile:
FROM ubuntu:latest
CMD echo Hello World!
Now build this image using the docker build command. You’ll see the response as shown in Listing 4-4. (Note that the sha output has been truncated.)
docker build .
[+] Building 11.6s (6/6) FINISHED
 => [internal] load build definition from Dockerfile 0.1s
 => => transferring dockerfile: 84B 0.0s
 => [internal] load .dockerignore 0.1s
 => => transferring context: 2B 0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest 8.7s
 => [auth] library/ubuntu:pull token for registry-1.docker.io 0.0s
 => [1/1] FROM docker.io/library/ubuntu:latest@sha256:aba80b7 2.7s
 => => resolve docker.io/library/ubuntu:latest@sha256:aba80b7 0.0s
 => => sha256:aba80b7 1.20kB / 1.20kB 0.0s
 => => sha256:376209 529B / 529B 0.0s
 => => sha256:987317 1.46kB / 1.46kB 0.0s
 => => sha256:c549ccf8 28.55MB / 28.55MB 1.1s
 => => extracting sha256:c549ccf 1.2s
 => exporting to image 0.0s
 => => exporting layers 0.0s
 => => writing image sha256:f2afdc
Listing 4-4

Response from Docker Engine as it Builds the Dockerfile

You can see that the Docker build works in steps, each step corresponding to one instruction of the Dockerfile. Now try the build process again.
docker build .
[+] Building 0.1s (5/5) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 37B   0.0s
=> [internal] load .dockerignore  0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:latest 0.0s
=> CACHED [1/1] FROM docker.io/library/ubuntu:latest   0.0s
=> exporting to image 0.0s
=> => exporting layers   0.0s
=> => writing image sha256:f2afdcc   0.0s
Note how much faster the build process is the second time around. Docker has already cached the layers and doesn’t have to pull them again. To run this image, use the docker run command followed by the image ID f2afdcc:
docker run f2afdcc
Hello World!
So, the Docker runtime was able to start a container and run the command defined by the CMD instruction; hence, you get the output. Now, starting a container from an image by typing the image ID gets tedious fast. You can make this easier by tagging the image with an easy-to-remember name. You can do this by using the docker tag command, as shown here:
docker tag <image id> <tag name>
docker tag f2afdcc sathyabhat/hello-world
You’ll take a deeper look at tags in the next section. Docker also validates that the Dockerfile contains valid instructions in the proper syntax. Consider the earlier Dockerfile, shown in Listing 4-5.
FROM ubuntu:latest
LABEL author="sathyabhat"
LABEL description="An example Dockerfile"
RUN apt-get install python
COPY hello-world.py
CMD python hello-world.py
Listing 4-5

Dockerfile for Python with an Invalid Instruction

If you try to build this Dockerfile, Docker will complain about an error, as shown here:
docker build -f Dockerfile.invalid .
[+] Building 0.1s (2/2) FINISHED
=> [internal] load build definition from Dockerfile.invalid  0.0s
=> => transferring dockerfile: 336B  0.0s
=> [internal] load .dockerignore  0.0s
=> => transferring context: 2B 0.0s
failed to solve with frontend dockerfile.v0: failed to create LLB definition: dockerfile parse error line 6:
COPY requires at least two arguments, but only one was provided. Destination could not be determined.

You’ll get back to fixing this problem a little later in the chapter. For now, it’s time to look at some of the commonly used Dockerfile instructions and at tagging images.

Tags

A tag is a name that uniquely identifies a specific version of a Docker image. Tags are plain-text labels often used to identify specific details, such as the version, the base OS of the image, or the architecture of the Docker image. Tagging a Docker image gives you the flexibility to refer uniquely to a specific version, which makes it easier to roll back to previous versions of a Docker image if the current image is not working as expected.

If a tag is not specified, Docker applies a string called "latest" as the default tag. The "latest" tag is often the source of many problems, especially for new Docker users. Many believe that having "latest" as the tag means the Docker image is the latest version and will always be updated to the latest version. This is not true—latest was chosen as a convention but carries no special meaning.

I do not recommend using latest as a tag, especially for production workloads. During development, omitting the tag results in the "latest" tag being applied to every build; since the tag is shared, a build with a breaking change overwrites the previous image, making a rollback difficult unless you noted the image's SHA hash. Using specific tags makes it easier to determine, at a glance, which version of a Docker image is running in a container. Specific tags also reduce the chance of a breaking change propagating: if you tag an image as latest and it contains a breaking change or a bug, the next time your container crashes or restarts, it might pull the broken image.

Docker images can be tagged and retagged using the docker tag command:
docker tag <image id> <tag name>
docker tag f2afdcc sathyabhat/hello-world
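You can also tag with a specific version; for example, a sketch using a hypothetical 1.0 version for the image built earlier:
docker tag f2afdcc sathyabhat/hello-world:1.0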
The tag name will typically have the Docker registry prefixed to it. If a registry name is not specified, Docker assumes the image is part of Docker Hub and tries to pull it from there. Tags can be assigned as part of the build process by passing the -t flag, as shown in Listing 4-6.
docker build -t sathyabhat/helloworld .
[+] Building 0.2s (5/5) FINISHED
=> [internal] load build definition from Dockerfile0.0s
=> => transferring dockerfile: 37B  0.0s
=> [internal] load .dockerignore 0.1s
=> => transferring context: 2B0.0s
=> [internal] load metadata for docker.io/library/ubuntu:latest0.0s
=> CACHED [1/1] FROM docker.io/library/ubuntu:latest  0.0s
=> exporting to image 0.0s
=> => exporting layers  0.0s
=> => writing image sha256:f2afdcc 0.0s
=> => naming to docker.io/sathyabhat/helloworld
Listing 4-6

Adding a Tag When Building the Image

Note that even though you did not mention docker.io as part of the tag, it was prefixed to the tag name, as mentioned. The last line tells you that the image was tagged successfully. You can verify this by listing the image:
docker images sathyabhat/helloworld
REPOSITORY              TAG      IMAGE ID        CREATED      SIZE
sathyabhat/helloworld   latest   f2afdccf8eeb   3 weeks ago   72.7MB

Dockerfile Instructions

When looking at a Dockerfile, you’re most likely to run into the following instructions.
  • FROM

  • WORKDIR

  • ADD

  • COPY

  • RUN

  • CMD

  • ENTRYPOINT

  • ENV

  • VOLUME

  • LABEL

  • EXPOSE

Let’s see what they do.

FROM

As you learned earlier, every image needs to start from a base image. The FROM instruction tells the Docker Engine the base image to be used for subsequent instructions. Every valid Dockerfile must start with a FROM instruction. The syntax is as follows:
FROM <image> [AS <name>]
OR
FROM <image>[:<tag>] [AS <name>]
OR
FROM <image>[@<digest>] [AS <name>]

Where <image> is the name of a valid Docker image from any public/private repository. As mentioned, if the tag is skipped, Docker will fetch the image tagged as latest.
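As a sketch, here is one line per form (the ubuntu:18.04 tag is just illustrative; the digest is the one shown in Listing 4-3):
FROM ubuntu AS base
FROM ubuntu:18.04
FROM ubuntu@sha256:aba80b77e27148d99c034a987e7da3a287ed455390352663418c0f2ed40417fe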

WORKDIR

The WORKDIR instruction sets the current working directory for the RUN, CMD, ENTRYPOINT, COPY, and ADD instructions. WORKDIR is useful when you have multiple directories in the source code and you want some specific actions to be done within these specific directories. WORKDIR is also frequently used to set a separate location for the application to run in the container. The syntax is as follows:
WORKDIR /path/to/directory

WORKDIR can be set multiple times in a Dockerfile and, if a relative directory succeeds a previous WORKDIR instruction, it will be relative to the previously set working directory. Let’s look at an example demonstrating this.

Consider this Dockerfile:
FROM ubuntu:latest
WORKDIR /app
CMD pwd

The Dockerfile fetches the latest tagged image from Ubuntu as the base image, sets the current working directory to /app, and runs the pwd command when the image is run. The pwd command prints the current working directory.

Let’s try to build and run this and examine the output:
docker build -t sathyabhat/workdir .
[+] Building 0.7s (6/6) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 36B 0.0s
 => [internal] load .dockerignore0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest  0.6s
 => [1/2] FROM docker.io/library/ubuntu:latest@sha256:b3e2e4  0.0s
 => CACHED [2/2] WORKDIR /app 0.0s
 => exporting to image  0.0s
 => => exporting layers 0.0s
 => => writing image sha256:f8853df 0.0s
 => => naming to docker.io/sathyabhat/workdir
Now run the newly built image:
docker run sathyabhat/workdir
/app
The result of pwd makes it clear that the current working directory is set as /app by way of the WORKDIR instruction. Modify the Dockerfile to add a couple of WORKDIR instructions, as shown here:
FROM ubuntu:latest
WORKDIR /usr
WORKDIR src
WORKDIR app
CMD pwd
Let’s build and run the new image:
docker build -t sathyabhat/workdir .
[+] Building 0.7s (8/8) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 121B  0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest  0.6s
 => [1/4] FROM docker.io/library/ubuntu:latest@sha256:b3e2e47  0.0s
 => CACHED [2/4] WORKDIR /usr 0.0s
 => CACHED [3/4] WORKDIR src  0.0s
 => CACHED [4/4] WORKDIR app  0.0s
 => exporting to image  0.0s
 => => exporting layers 0.0s
 => => writing image sha256:207b405  0.0s
 => => naming to docker.io/sathyabhat/workdir
Note that the image ID has changed, so that’s a new image being built with the same tag:
docker run sathyabhat/workdir
/usr/src/app
As expected, the relative directories in the successive WORKDIR instructions were appended to the initial absolute path. By default, WORKDIR is set to /, so any WORKDIR instruction featuring a relative directory is appended to /. Here’s an example demonstrating this. Modify the Dockerfile as follows:
FROM ubuntu:latest
WORKDIR var
WORKDIR log/nginx
CMD pwd
Build the image:
docker build -t sathyabhat/workdir .
[+] Building 1.8s (8/8) FINISHED
 => [internal] load build definition from Dockerfile   0.0s
 => => transferring dockerfile: 115B   0.0s
 => [internal] load .dockerignore   0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest  1.6s
 => [auth] library/ubuntu:pull token for registry-1.docker.io  0.0s
 => CACHED [1/3] FROM docker.io/library/ubuntu:latest@sha256:b3e2e47 0.0s
 => [2/3] WORKDIR var   0.0s
 => [3/3] WORKDIR log/nginx   0.0s
 => exporting to image  0.0s
 => => exporting layers 0.0s
 => => writing image sha256:e7ded5d 0.0s
 => => naming to docker.io/sathyabhat/workdir
Now run it:
docker run sathyabhat/workdir
/var/log/nginx

Notice that you did not set any absolute working directory in the Dockerfile—the relative directories were appended to the default.

ADD and COPY

At first glance, the ADD and COPY instructions seem to be the same—they allow you to transfer files from the host to the container’s filesystem. COPY supports basic copying of files to the container, whereas ADD has support for features like tarball auto extraction (i.e., Docker will automatically extract compressed files added from local directory) and remote URL support (i.e., Docker will download the resources from a remote URL).

The syntax for both is quite similar:
ADD <source> <destination>
COPY <source> <destination>

The ADD instruction is useful when you’re adding files from remote URLs or you have compressed files from the local filesystem that need to be automatically extracted into the container filesystem.

As an example, the following COPY instruction copies a single file called hugo to the /app directory in the container:
COPY hugo /app/
The following ADD instruction fetches a compressed file called hugo_0.88.0_Linux-64bit.tar.gz from the URL but doesn’t automatically decompress the file:
ADD https://github.com/gohugoio/hugo/releases/download/v0.88.0/hugo_0.88.0_Linux-64bit.tar.gz /app/
The following ADD instruction copies and automatically extracts the contents of the compressed file to the /app directory in the container:
ADD hugo_0.88.0_Linux-64bit.tar.gz /app/
For Dockerfiles used to build Linux containers, both instructions let you change the owner/group of the files being added to the container. This is done using the --chown flag, as follows:
ADD --chown=<user>:<group> <source> <destination>
COPY --chown=<user>:<group> <source> <destination>
For example, if you want to add requirements.txt from the current working directory to the /usr/share/app directory, the instruction would be as follows:
ADD requirements.txt /usr/share/app
COPY  requirements.txt /usr/share/app
Both ADD and COPY support wildcards while specifying patterns. For example, having the following instructions in your Dockerfile will copy all the files with the .py extension to the /apps/ directory of the image.
ADD *.py /apps/
COPY *.py /apps/
Note

Docker recommends using COPY over ADD, especially when it’s a local file that’s being copied.

There are some points to consider when choosing COPY versus ADD. In the case of the COPY instruction:
  • If the <destination> does not exist in the image, it will be created.

  • All new files/directories are created with UID and GID as 0—that is, as the root user. To change this, you can use the --chown flag.

  • If the files/directories contain special characters, they need to be escaped.

  • The <destination> can be an absolute or a relative path. In the case of a relative path, the path is interpreted as relative to the working directory set by the WORKDIR instruction (see the sketch after this list).

  • If the <destination> doesn’t end with a trailing slash, it will be considered a file and the contents of the <source> will be written into <destination>.

  • If the <source> is specified as a wildcard pattern, the <destination> must be a directory and must end with a trailing slash; otherwise, the build process will fail.

  • The <source> must be within the build context. It cannot be a file/directory outside of the build context because the first step of a Docker build process involves sending the context directory to the Docker daemon.
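Here’s a minimal sketch of the relative-destination behavior (config.ini is a hypothetical file in the build context):
FROM ubuntu:latest
WORKDIR /usr/share/app
# The relative destination resolves against WORKDIR, so config.ini
# ends up at /usr/share/app/config/config.ini
COPY config.ini config/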

In the case of the ADD instruction:

  • If the <source> is a URL and the <destination> is not a directory and doesn’t end with a trailing slash, the file is downloaded from the URL and copied into <destination>.

  • If the <source> is a URL and the <destination> is a directory and ends with a trailing slash, the filename is inferred from the URL and the file is downloaded and copied to <destination>/<filename>.

  • If the <source> is a local tarball of a known compression format, the tarball is unpacked as a directory. Remote tarballs, however, are not uncompressed.

RUN

The RUN instruction will execute any command during the build step of the container. This creates a new layer that is available for the next steps in the Dockerfile. It is important to note that the command following the RUN instruction runs only when the image is being built. The RUN instruction has no relevance when a container has started and is running.

RUN has two forms, the shell form and the exec form. In the shell form, the command is written space-delimited, as shown here:
RUN <command>

This form makes it possible to use shell variables, subcommands, command pipes, and command chains in the RUN instruction itself.

Consider a scenario where you want to embed the kernel release version into the home directory of the Docker image. You can get the kernel release and version using the uname -rv command. This output can then be printed using echo and redirected to a file called kernel-info in the home directory of the image. You can do this with the RUN instruction in shell form, as shown here:
RUN echo `uname -rv` > $HOME/kernel-info
In exec form, the command is written comma-delimited and surrounded by quotes, as shown here:
RUN ["executible", "parameter 1", " parameter 2"] (the exec form)

Unless you need to use shell features like chaining and redirection, it is recommended to use the exec form for the RUN instruction.
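For instance, the kernel-info example would have to invoke the shell explicitly if rewritten in exec form—a sketch:
# Exec form provides no shell, so /bin/sh is called explicitly to get
# the $HOME expansion and the output redirection
RUN ["/bin/sh", "-c", "uname -rv > $HOME/kernel-info"]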

Layer Caching

When the image is built, Docker will cache the layers that it has pulled. This is evident from the build logs. Consider the following Dockerfile:
FROM ubuntu:latest
RUN apt-get update
The build log when you run docker build is shown here:
docker build -f Dockerfile .
[+] Building 8.1s (7/7) FINISHED
 => [internal] load build definition from Dockerfile  0.1s
 => => transferring dockerfile: 96B 0.0s
 => [internal] load .dockerignore   0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest  1.8s
 => [auth] library/ubuntu:pull token for registry-1.docker.io  0.0s
 => CACHED [1/2] FROM docker.io/library/ubuntu:latest@sha256:b3e2e47 0.0s
 => [2/2] RUN apt-get update  6.0s
 => exporting to image  0.2s
 => => exporting layers 0.1s
 => => writing image sha256:a9824f6

The logs indicate that, instead of redownloading the layer for the base Ubuntu image, Docker uses the cached layer saved to disk. This applies to all the layers that are created—and Docker creates a new layer whenever it encounters a RUN, COPY, or ADD instruction. Getting the order of instructions right greatly improves the chances that Docker can reuse layers, which not only speeds up image builds but also reduces container start times by virtue of there being fewer layers to download.

Due to the way layer caching works, it is always best to chain the package update and package install in a single RUN instruction. Consider a Dockerfile where the RUN instructions are as shown here:
RUN apt-get update
RUN apt-get install pkg1
RUN apt-get install pkg2
RUN apt-get install pkg3
When Docker builds this image, it caches the four layers created by the four RUN instructions. To reduce the number of layers, and to prevent installs from failing because of a stale package cache, it is best to chain the update and the installs, as shown here:
RUN apt-get update && apt-get install -y \
    pkg1 \
    pkg2 \
    pkg3 \
    pkg4

This creates a single layer with the packages to be installed, and a change to any of the packages invalidates the cache and causes a new layer to be created with the updated packages. If you want to explicitly instruct Docker to avoid using the cache, pass the --no-cache flag to the docker build command.
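The same caching principle applies to instruction ordering in general. As an illustrative sketch (the app.py file and the final image layout are hypothetical), copying the dependency manifest and installing dependencies before copying the application source lets Docker reuse the expensive pip install layer when only the source code changes:
FROM python:3-alpine
WORKDIR /app
# This layer (and the pip install below) is reused from cache as long
# as requirements.txt is unchanged
COPY requirements.txt .
RUN pip install -r requirements.txt
# Source-code changes invalidate only this layer onward
COPY . .
CMD ["python", "app.py"]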

CMD and ENTRYPOINT

The CMD and ENTRYPOINT instructions define which command is executed when running a container. The syntax for both is shown here:
CMD ["executable","param1","param2"] (exec form)
CMD ["param1","param2"] (as default parameters to ENTRYPOINT)
CMD command param1 param2 (shell form)
ENTRYPOINT ["executable", "param1", "param2"] (exec form)
ENTRYPOINT command param1 param2 (shell form)
The ENTRYPOINT instruction is best when you want your container to function like an executable, and the CMD instruction provides the defaults for an executing container. Consider the Dockerfile shown here:
FROM ubuntu:latest
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*
CMD ["curl"]
In this Docker image, Ubuntu is the base image, curl is installed on it, and curl is the parameter for the CMD instruction. This means that when the container is created and run, it will run curl without any parameters. Let’s build the image:
docker build -t sathyabhat/curl .
[+] Building 11.8s (6/6) FINISHED
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 50B  0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B   0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest   0.7s
 => CACHED [1/2] FROM docker.io/library/ubuntu:latest@sha256:b3e2e47  0.0s
 => [2/2] RUN apt-get update &&   apt-get install -y curl 10.7s
 => exporting to image   0.3s
 => => exporting layers  0.3s
 => => writing image sha256:8a9fc4b  0.0s
 => => naming to docker.io/sathyabhat/curl
You can see the result when you run the container:
docker run sathyabhat/curl
curl: try 'curl --help' or 'curl --manual' for more information
This is because curl expects a parameter to be passed. You can override the CMD instruction by passing arguments to the docker run command. As an example, try to curl wttr.in, which fetches the current weather.
docker run sathyabhat/curl wttr.in
docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "exec: "wttr.in": executable file not found in $PATH": unknown.
Uh oh, an error. As mentioned, the arguments after the image name in docker run override the CMD instruction. However, you passed only wttr.in as the argument, not the executable itself. For the override to work properly, you need to pass in the executable, which is curl, as well:
docker run sathyabhat/curl curl -s wttr.in
Weather report: Gurgaon, India
               Haze
  _ - _ - _ -  24-25 °C
   _ - _ - _   ↖ 13 km/h
  _ - _ - _ -  3 km
               0.0 mm

Passing an executable every time to override a parameter can be quite tedious. This is where the combination of ENTRYPOINT and CMD shines. You can set ENTRYPOINT to the executable while the parameter can be passed from the command line and will be overridden.

Modify the Dockerfile as follows:
FROM ubuntu:latest
RUN apt-get update && \
    apt-get install -y curl
ENTRYPOINT ["curl", "-s"]
Build the image again:
docker build -t sathyabhat/curl .
[+] Building 0.7s (6/6) FINISHED
 => [internal] load build definition from Dockerfile.listing-4-x-5 0.0s
 => => transferring dockerfile: 157B 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B   0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest   0.6s
 => [1/2] FROM docker.io/library/ubuntu:latest@sha256:b3e2e47   0.0s
 => CACHED [2/2] RUN apt-get update &&   apt-get install -y curl 0.0s
 => exporting to image   0.0s
 => => exporting layers  0.0s
 => => writing image sha256:7e31728  0.0s
 => => naming to docker.io/sathyabhat/curl
Now you can curl any URL by just passing the URL as a parameter, instead of having to add the executable as well.
docker run sathyabhat/curl wttr.in
Weather report: Gurgaon, India
               Haze
  _ - _ - _ -  24-25 °C
   _ - _ - _   ↖ 13 km/h
  _ - _ - _ -  3 km
               0.0 mm

Of course, curl is just an example here. You can replace curl with any other program that accepts parameters (such as load-testing utilities, benchmarking utilities, etc.) and the combination of CMD and ENTRYPOINT makes it easy to distribute the image.

Note that ENTRYPOINT must be provided in exec form—writing it in shell form means the parameters are not passed properly and will not work as expected. Table 4-1, from Docker’s reference guide, explains the matrix of allowed ENTRYPOINT/CMD combinations, assuming p1_cmd, p1_entry and p2_cmd, p2_entry are the CMD and ENTRYPOINT variations of commands p1 and p2 that you want to run in the container.
Table 4-1

Commands for ENTRYPOINT/CMD Combinations

                               No ENTRYPOINT                ENTRYPOINT exec_entry p1_entry    ENTRYPOINT ["exec_entry", "p1_entry"]
No CMD                         Error, not allowed           /bin/sh -c exec_entry p1_entry    exec_entry p1_entry
CMD ["exec_cmd", "p1_cmd"]     exec_cmd p1_cmd              /bin/sh -c exec_entry p1_entry    exec_entry p1_entry exec_cmd p1_cmd
CMD ["p1_cmd", "p2_cmd"]       p1_cmd p2_cmd                /bin/sh -c exec_entry p1_entry    exec_entry p1_entry p1_cmd p2_cmd
CMD exec_cmd p1_cmd            /bin/sh -c exec_cmd p1_cmd   /bin/sh -c exec_entry p1_entry    exec_entry p1_entry /bin/sh -c exec_cmd p1_cmd

The following points are important to remember about the shell and exec forms:
  • As mentioned earlier, you can specify RUN, CMD, and ENTRYPOINT in shell form and exec form. Which to use depends entirely on the requirements, but as a general guide:
    • In shell form, the command is run in a shell with the command as a parameter. This form provides a shell where shell variables, subcommands, command piping, and chaining are possible.

    • In exec form, the command does not invoke a command shell. This means that normal shell processing (such as $VARIABLE substitution, piping, etc.) will not work.

  • A program started in shell form will run as a subcommand of /bin/sh -c. This means the executable will not run as PID 1 and will not receive UNIX signals, so a SIGTERM from docker stop (or a SIGINT from Ctrl+C) is not forwarded to it, and the application might not exit correctly (see the sketch below).
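As a quick sketch of the difference, the two lines below show the alternatives (newsbot.py stands in for any long-running program):
# Shell form: python runs as a child of /bin/sh -c, is not PID 1,
# and won't receive SIGTERM when the container is stopped
CMD python3 newsbot.py
# Exec form: python runs as PID 1 and receives signals directly
CMD ["python3", "newsbot.py"]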

ENV

The ENV instruction sets the environment variables to the image. The ENV instruction has two forms:
ENV <key> <value>
ENV <key>=<value> ...

In the first form, the entire string after the <key> is considered the value, including whitespace characters. Only one variable can be set per line in this form.

In the second form, multiple variables can be set at one time, with the equals (=) character assigning value to the key.

The environment variables set in the image persist into containers started from it. They can be viewed using docker inspect.

Consider this Dockerfile:
FROM ubuntu:latest
ENV LOGS_DIR="/var/log"
ENV APPS_DIR /apps/
Build the Docker image:
docker build -t sathyabhat/env .
[+] Building 1.7s (6/6) FINISHED
 => [internal] load build definition from Dockerfile.listing-4-x-6   0.0s
 => => transferring dockerfile: 50B 0.0s
 => [internal] load .dockerignore   0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest  1.6s
 => [auth] library/ubuntu:pull token for registry-1.docker.io  0.0s
 => CACHED [1/1] FROM docker.io/library/ubuntu:latest@sha256:b3e2e47 0.0s
 => exporting to image  0.0s
 => => exporting layers 0.0s
 => => writing image sha256:23eb815 0.0s
 => => naming to docker.io/sathyabhat/env
You can inspect the environment variables by using the following command:
docker inspect sathyabhat/env | jq ".[0].Config.Env"
The output will be as follows:
[
 "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "LOGS_DIR=/var/log",
  "APPS_DIR=/apps/"
]
The environment variables defined for a container can be changed when running a container with the -e flag. In the previous example, change the LOGS_DIR value to /logs for a container. This is achieved by typing the following command:
docker run -it -e LOGS_DIR="/logs" sathyabhat/env
You can confirm the changed value by typing the following command at the terminal:
printenv | grep LOGS
LOGS_DIR=/logs
Type exit to close the interactive terminal of the container. To assign multiple environment variables, pass each one with its own -e flag. In the previous example, to override both LOGS_DIR and APPS_DIR, use the following command:
docker run -it -e LOGS_DIR="/logs" -e APPS_DIR="/opt" sathyabhat/env
printenv | grep DIR
LOGS_DIR=/logs
APPS_DIR=/opt

Type exit to close the interactive terminal of the container.

VOLUME

The VOLUME instruction tells Docker to create a mount point on the container and mount it externally from the host. For instance, an instruction like this:
VOLUME /var/logs/nginx

tells Docker to mark the /var/logs/nginx directory as a mount point, with the data being mounted from the Docker host. This, when combined with the volume (-v) flag of the docker run command, results in data being persisted on the Docker host as a volume. This volume can then be backed up, moved, or transferred using Docker CLI commands. You will learn more about volumes in a later chapter of this book.
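A minimal sketch of pairing the instruction with the flag (nginx-logs is a hypothetical named volume):
docker run -d -v nginx-logs:/var/logs/nginx <image>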

EXPOSE

The EXPOSE instruction tells Docker that the container listens for the specified network ports at runtime. The syntax is as follows:
EXPOSE <port> [<port>/<protocol>...]
For example, if you want to expose port 80, the EXPOSE instruction is as follows:
EXPOSE 80
If you want to expose port 53 on TCP and UDP, the Dockerfile instruction is the following:
EXPOSE 53/tcp
EXPOSE 53/udp

You can include the protocol along with the port number to specify whether the port listens on TCP, UDP, or both. If the protocol is not specified, Docker assumes TCP.

Note

An EXPOSE instruction doesn’t publish the port. For the port to be published to the host, you need to use the -p flag with docker run to publish and map the ports.

Here’s a sample Dockerfile that uses the nginx image with port 80 exposed in the container:
FROM nginx:alpine
EXPOSE 80
Build the image:
docker build -t sathyabhat/web .
[+] Building 0.4s (5/5) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 50B 0.0s
 => [internal] load .dockerignore   0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/nginx:alpine   0.2s
 => CACHED [1/1] FROM docker.io/library/nginx:alpine@sha256:9152859  0.0s
 => exporting to image  0.0s
 => => exporting layers 0.0s
 => => writing image sha256:33fcd52 0.0s
 => => naming to docker.io/sathyabhat/web
To run this container, you have to provide the host port to which it should be mapped. Map port 8080 on the host to port 80 of the container by typing the following command:
docker run -d -p 8080:80 sathyabhat/web
The -d flag makes the nginx container run in the background and the -p flag does the port mapping. Confirm that the container is running:
curl http://localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>

LABEL

The LABEL instruction adds metadata to an image as a key-value pair.
LABEL <key>=<value> <key>=<value> <key>=<value> …
An image can have multiple labels; they are typically used to add metadata that assists in searching and organizing images and other Docker objects. Docker recommends the following guidelines (a sketch of a conforming LABEL instruction follows the list).
  • For keys:
    • Authors of third-party tools should prefix each key with reverse DNS notation of a domain owned by them: for example, com.sathyasays.my-image.

    • com.docker.*, io.docker.*, and org.dockerproject.* are reserved by Docker for internal use.

    • Label keys should begin and end with lowercase letters and should contain only lowercase alphanumeric characters and the period (.) and hyphen (-) characters. Consecutive hyphens and periods are not allowed.

    • The period (.) separates the namespace fields.

  • For values:
    • Label values can contain any data type that can be represented as a string, including JSON, XML, YAML, and CSV.
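Putting the guidelines together, a sketch of a multi-label instruction (the keys and values are illustrative):
LABEL com.sathyasays.my-image.version="0.1" \
      com.sathyasays.my-image.release-date="2021-09-01"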

Guidelines and Recommendations for Writing Dockerfiles

The following are some guidelines and best practices for writing Dockerfiles as recommended by Docker.
  • Containers should be ephemeral. Docker recommends that images generated by Dockerfiles be as ephemeral as possible. You should be able to stop, destroy, and restart the container at any point with minimal setup and configuration. The container should ideally not write data to its filesystem; any persistent data should be written to Docker volumes or to data storage managed outside the container (for example, an object store such as Amazon S3).

  • Keep the build context minimal. You read about build context earlier in this chapter. It’s important to keep the build context as minimal as possible to reduce the build times and the image size. This can be done by making effective use of the .dockerignore file.

  • Use multi-stage builds. Multi-stage builds help in drastically reducing the size of the image without having to write complicated scripts to transfer/keep the required artifacts. Multi-stage builds are described in the next section.

  • Skip unwanted packages. Having unwanted or nice-to-have packages increases the size of the image, introduces unwanted dependent packages, and increases the surface area for attacks.

  • Minimize the number of layers. While not as big a concern as it used to be, it’s still important to reduce the number of layers in the image. As of Docker 1.10, only the RUN, COPY, and ADD instructions create layers. Keeping these instructions to a minimum, or combining multiple lines into a single instruction, reduces the number of layers and ultimately the size of the image.

Using Multi-Stage Builds

As of version 17.05, Docker supports multi-stage builds, allowing complex image builds to be performed without unnecessarily bloating the Docker image. Multi-stage builds are especially useful when building images of applications that need additional build-time dependencies that are not required at runtime. The most common examples are applications written in languages such as Go or Java, where, prior to multi-stage builds, it was common to maintain two different Dockerfiles—one for the build and one for the release—plus scripts to orchestrate copying the artifacts from the build-time image to the runtime image.

With multi-stage builds, a single Dockerfile can be leveraged for both the build and deploy images—the build stages can contain the build tools required for generating the binary or artifact. In a typical multi-stage build, the build stage has several layers—each for installing the tools required to build the application, generating the dependencies, and generating the application. In the final stage, the artifact built in the earlier stages is copied over, and only that stage is considered when building the image. The build stages are discarded, drastically reducing the size of the final image.

Although this book doesn’t focus on multi-stage builds in detail, you will try an exercise on how to create a multi-stage build and see how much smaller using a slim image with multi-stage build makes the final image. More details about multi-stage builds are available on Docker’s website at https://docs.docker.com/develop/develop-images/multistage-build/.

Exercises

Building a Simple Hello World Docker Image

The start of the chapter introduced a simple Dockerfile that did not build due to syntax errors. In this exercise, you see how to fix that Dockerfile and add some of the instructions that you learned in this chapter.

Tip The source code and associated Dockerfile are available on the GitHub repo of the book, at https://github.com/Apress/practical-docker-with-python, in the source-code/chapter-4/exercise-1 directory.

The original Dockerfile is as follows:
FROM ubuntu:latest
LABEL author="sathyabhat"
LABEL description="An example Dockerfile"
RUN apt-get install python
COPY hello-world.py
CMD python hello-world.py

Trying to build this Dockerfile results in an error, both because the COPY instruction is missing its destination and because hello-world.py doesn’t exist. Let’s fix the build error. To do this, you need to add a hello-world.py that reads an environment variable, NAME, and prints Hello, $NAME!. If the environment variable is not defined, it prints "Hello, World!".

The contents of hello-world.py are as follows:
#!/usr/bin/env python3
from os import getenv

if getenv('NAME') is None:
    name = 'World'
else:
    name = getenv('NAME')
print(f"Hello, {name}!")
The corrected Dockerfile is as follows:
FROM python:3-alpine
LABEL description="Dockerfile for Python script which prints Hello, Name"
COPY hello-world.py /app/
ENV NAME=Readers
CMD python3 /app/hello-world.py
Build the Dockerfile:
docker build -t sathyabhat/chap04-ex1 .
[+] Building 1.9s (8/8) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 37B 0.0s
 => [internal] load .dockerignore   0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/python:3-alpine   1.7s
 => [auth] library/python:pull token for registry-1.docker.io  0.0s
 => [internal] load build context   0.0s
 => => transferring context: 36B 0.0s
 => [1/2] FROM docker.io/library/python:3-alpine@sha256:3998e97  0.0s
 => CACHED [2/2] COPY hello-world.py /app/   0.0s
 => exporting to image  0.0s
 => => exporting layers 0.0s
 => => writing image sha256:538be87 0.0s
 => => naming to docker.io/sathyabhat/chap04-ex1
Confirm the image name and size:
docker images sathyabhat/chap04-ex1
REPOSITORY             TAG     IMAGE ID      CREATED      SIZE
sathyabhat/chap04-ex1  latest  538be873d192  3 hours ago  45.1MB
Run the Docker image:
docker run sathyabhat/chap04-ex1
Hello, Readers!
Try overriding the environment variable at runtime. You can do this by providing the -e parameter with docker run:
docker run -e NAME=all sathyabhat/chap04-ex1
Hello, all!

Congrats! You’ve successfully written your first Dockerfile and built your first Docker image.

A Look at Slim Docker Release Image (Using Multi-Stage Builds)

In this exercise, you will build two Docker images. The first image uses a standard build with python:3 as the base image, whereas the second image gives an overview of how multi-stage builds can be utilized.

Tip The source code and associated Dockerfile are available on the GitHub repo of the book at https://github.com/Apress/practical-docker-with-python, in the source-code/chapter-4/exercise-2/ directory.

Building the Docker Image Using a Standard Build

Create a requirements.txt file with the following content:
praw==3.6.0
Create a Dockerfile with the following content:
FROM python:3
COPY requirements.txt .
RUN pip install -r requirements.txt
Now build the Docker image:
docker build -t sathyabhat/base-build .
[+] Building 7.2s (8/8) FINISHED
 => [internal] load build definition from Dockerfile   0.3s
 => => transferring dockerfile: 114B 0.0s
 => [internal] load .dockerignore 0.3s
 => => transferring context: 2B   0.0s
 => [internal] load metadata for docker.io/library/python:3  0.0s
 => [internal] load build context 0.6s
 => => transferring context: 54B  0.0s
 => [1/3] FROM docker.io/library/python:3  1.6s
 => [2/3] COPY requirements.txt . 0.2s
 => [3/3] RUN pip install -r requirements.txt 3.3s
 => exporting to image   1.6s
 => => exporting layers  1.5s
 => => writing image sha256:03191af  0.0s
 => => naming to docker.io/sathyabhat/base-build
The image was built successfully! Let’s determine the size of the image:
docker images sathyabhat/base-build
REPOSITORY              TAG      IMAGE ID   CREATED              SIZE
sathyabhat/base-build   latest   03191af    About a minute ago   895MB

The Docker image sits at a fairly hefty 895MB, even though you did not add any of your application code, just a dependency. Let’s rewrite it as a multi-stage build.

Building the Docker Image Using a Multi-Stage Build
FROM python:3 as python-base
COPY requirements.txt .
RUN pip install -r requirements.txt
FROM python:3-alpine
COPY --from=python-base /root/.cache /root/.cache
COPY --from=python-base requirements.txt .
RUN pip install -r requirements.txt && rm -rf /root/.cache

The Dockerfile is different in that there are multiple FROM statements, signifying the different stages. In the first stage, you build the required packages using the python:3 image, which has the necessary build tools.

In the second stage, you copy the files installed in the first stage, reinstall them (notice this time that pip fetches the cached files and doesn’t build them again), and then delete the cached install files. The build logs are shown here:
[+] Building 0.6s (13/13) FINISHED
 => [internal] load build definition from Dockerfile 0.2s
 => => transferring dockerfile: 35B 0.0s
 => [internal] load .dockerignore 0.1s
 => => transferring context: 2B 0.0s
 => [internal] load metadata for docker.io/library/python:3-alpine 0.2s
 => [internal] load metadata for docker.io/library/python:3 0.0s
 => [internal] load build context 0.1s
 => => transferring context: 37B 0.0s
 => [stage-1 1/4] FROM docker.io/library/python:3-alpine@sha256:3998e97 0.0s
 => [python-base 1/3] FROM docker.io/library/python:3 0.0s
 => CACHED [python-base 2/3] COPY requirements.txt . 0.0s
 => CACHED [python-base 3/3] RUN pip install -r requirements.txt 0.0s
 => CACHED [stage-1 2/4] COPY --from=python-base /root/.cache /root/.cache 0.0s
 => CACHED [stage-1 3/4] COPY --from=python-base requirements.txt . 0.0s
 => CACHED [stage-1 4/4] RUN pip install -r requirements.txt && rm -rf /root/.cache 0.0s
 => exporting to image 0.1s
 => => exporting layers 0.0s
 => => writing image sha256:35c85a8 0.0s
 => => naming to docker.io/sathyabhat/multistage-build
Examining the size of the image using docker images shows that the multi-stage build reduced the image size by quite a lot. Smaller images mean faster application starts and even reduced costs, because you save on the bandwidth required to pull the container image.
docker images sathyabhat/multistage-build
REPOSITORY                    TAG      IMAGE ID       CREATED              SIZE
sathyabhat/multistage-build   latest   35c85a8497b5   About a minute ago   54.2MB

Writing a Dockerfile for Newsbot

In this exercise, you will write the Dockerfile for Newsbot, the Telegram chatbot project.

Tip The source code and associated Dockerfile are available on the GitHub repo of the book at https://github.com/Apress/practical-docker-with-python, in the source-code/chapter-4/exercise-3/ directory.

Let’s review what you need for this project:
  • A Docker image based on Python 3

  • The project dependencies listed in requirements.txt

  • An environment variable named NBT_ACCESS_TOKEN

Now that you have what you need, you can compose the Dockerfile. The general steps for composing a Dockerfile are as follows:
  1. Start with a proper base image.

  2. Make a list of files required for the application.

  3. Make a list of environment variables required for the application.

  4. Copy the application files to the image using the COPY instruction.

  5. Specify the environment variable with the ENV instruction.

Combining these steps, you arrive at this Dockerfile:
FROM python:3-alpine
WORKDIR /apps/subredditfetcher/
COPY . .
RUN ["pip", "install", "-r", "requirements.txt"]
CMD ["python", "newsbot.py"]
Now build the image:
docker build -t sathyabhat/newsbot .
[+] Building 0.9s (9/9) FINISHED
 => [internal] load build definition from Dockerfile   0.1s
 => => transferring dockerfile: 182B 0.0s
 => [internal] load .dockerignore 0.2s
 => => transferring context: 2B   0.0s
 => [internal] load metadata for docker.io/library/python:3-alpine 0.4s
 => [1/4] FROM docker.io/library/python:3-alpine@sha256:3998e97 0.0s
 => [internal] load build context 0.1s
 => => transferring context: 392B 0.0s
 => CACHED [2/4] WORKDIR /apps/subredditfetcher/ 0.0s
 => CACHED [3/4] COPY . .   0.0s
 => CACHED [4/4] RUN ["pip", "install", "-r", "requirements.txt"]  0.0s
 => exporting to image   0.1s
 => => exporting layers  0.0s
 => => writing image sha256:783b4c0  0.0s
 => => naming to docker.io/sathyabhat/newsbot
Now run the container. Take care to replace <token> with the Telegram Bot API key that you created in Chapter 3.
docker run -e NBT_ACCESS_TOKEN=<token> sathyabhat/newsbot
You should see logs from the bot, confirming that it’s running:
INFO: <module> - Starting up
INFO: get_updates - received response: {'ok': True, 'result': []}
INFO: get_updates - received response: {'ok': True, 'result': []}
INFO: get_updates - received response: {'ok': True, 'result': []}

If you see these logs, congratulations! Not only did you write the Dockerfile for Newsbot, but you also built it and ran it successfully.

Summary

In this chapter, you gained a better understanding of what a Dockerfile is by reviewing its syntax, and you put that knowledge to work by writing, building, and running Dockerfiles—including one for Newsbot.
