Now that you have a better understanding of Docker and its associated terminology, this chapter shows you how to convert your project into a containerized application using Docker. In this chapter, you learn what a Dockerfile is, including its syntax, and learn how to write a Dockerfile. With a better understanding of Dockerfiles, you can work toward the first step in writing a Dockerfile for the Newsbot app.
Dockerfile Primer
For traditionally deployed applications, building and packaging were often quite tedious. To automate the build and packaging of the application, people turned to different utilities, such as GNU Make, Maven, Gradle, and so on. Similarly, in the Docker world, a Dockerfile is an automated way to build your Docker images.
A Typical Dockerfile
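The book's original listing is not reproduced here, but a representative Dockerfile of the kind being discussed might look like the following sketch (the base image, file names, and values are illustrative):

```dockerfile
# Start from an official base image
FROM ubuntu:latest

# Record image metadata
LABEL maintainer="example@example.com"

# Install the runtime the application needs
RUN apt-get update && apt-get install -y python3

# Copy the application file from the build context into the image
COPY hello-world.py .

# Set a default environment variable
ENV NAME=Readers

# Command to run when a container starts from this image
CMD ["python3", "hello-world.py"]
```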
Looking at this Dockerfile, it's easy to see what we're telling the Docker Engine to build. However, don't let the simplicity fool you: Dockerfiles let you express complex build steps when generating your Docker images. When the docker build command is issued, Docker builds the image from the Dockerfile and a build context.
Build Context
A build context is a file or set of files available at a specific path or URL. To understand this better, say you have some supporting files that you need during a Docker image build—for instance, an application-specific config file that was generated earlier and needs to be part of the container.
The build command sets the context to the path or URL provided, uploading the available files to the Docker daemon and allowing it to build the image. The build context is not limited to a local path or repository URL: if you pass the URL of a remote tarball (i.e., a .tar file), the tarball is downloaded onto the Docker daemon and its contents are used as the build context.
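For illustration, the context can be the current directory, another path, or a remote location (the repository URL below is a placeholder):

```shell
# Use the current directory as the build context
docker build .

# Use a specific directory as the build context
docker build /path/to/app

# Use a remote Git repository as the build context
docker build https://github.com/example/example-repo.git
```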
If you place the Dockerfile in the root (/) directory and set that as the context, the entire contents of your hard disk will be transferred to the Docker daemon, so be careful about what you choose as the build context.
Dockerignore
You should now understand that the build context transfers the contents of the context directory to the Docker daemon during the build. Consider the case where the context directory contains many files and directories that are not relevant to the build process. Uploading them can cause a significant increase in network traffic. A .dockerignore file, much like .gitignore, allows you to define files that are exempt from being transferred during the build process.
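A .dockerignore file lists one pattern per line; a small sketch of what it might contain for a Python project:

```
# .dockerignore (illustrative patterns)
.git
__pycache__
*.pyc
venv/
```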
BuildKit
With the 18.09 release of the Docker Engine, Docker overhauled its container build system with BuildKit, which is now the default build system for Docker. For most users, BuildKit works exactly like the legacy build system. BuildKit produces new output for Docker image builds and, as a result, provides more detailed feedback about the build process.
Build Output When BuildKit Is Enabled
Switching Back to the Legacy Build Process
Unless you encounter any problems, I do not recommend switching back to the legacy build process. Stick to using Docker BuildKit. If you’re not seeing the new build output, ensure that you have updated to the latest version of Docker.
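If you do need the legacy builder temporarily, you can disable BuildKit for a single build by setting the DOCKER_BUILDKIT environment variable to 0:

```shell
# Disable BuildKit for this one build (legacy builder)
DOCKER_BUILDKIT=0 docker build .

# Explicitly enable BuildKit
DOCKER_BUILDKIT=1 docker build .
```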
Building Using Docker Build
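By default, docker build looks for a file named Dockerfile at the root of the build context; the -f flag lets you point at a different file (Dockerfile.debug below is just an illustrative name):

```shell
# Build using the Dockerfile in the current directory
docker build .

# Build using a specific Dockerfile
docker build -f Dockerfile.debug .
```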
Response from Docker Engine as it Builds the Dockerfile
Dockerfile for Python with an Invalid Instruction
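The original listing is not reproduced here; purely as a hypothetical illustration, a Dockerfile with a misspelled instruction, such as the one below, fails to build because Docker does not recognize the instruction:

```dockerfile
FROM python:3
COPY hello-world.py .
# "CMDD" is not a valid Dockerfile instruction, so the build fails
CMDD ["python", "hello-world.py"]
```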
You’ll get back to fixing this problem a little later in the chapter. For now, it’s time to look at some of the commonly used Dockerfile instructions and at tagging images.
Tags
A tag is a name that uniquely identifies a specific version of a Docker image. Tags are plain-text labels often used to identify specific details, such as the version, the base OS of the image, or the architecture of the Docker image. Tagging a Docker image gives you the flexibility to refer uniquely to a specific version, which makes it easier to roll back to previous versions of a Docker image if the current image is not working as expected.
If a tag is not specified, Docker applies a string called "latest" as the default tag. The "latest" tag is often the source of many problems, especially for new Docker users. Many believe that tagging an image "latest" means it is the latest version of the image and will always be updated to the newest version. This is not true: latest is simply a naming convention and has no special meaning.
I do not recommend using latest as a tag, especially for production workloads. During development, omitting the tag applies the "latest" tag to every build, so a build containing a breaking change silently overwrites the previous image. This makes rolling back to the previous version difficult unless you noted the SHA hash of the image. Using specific tags makes it easy to tell, at a glance, which version of the image a container is running. Specific tags also reduce the chance of a breaking change being propagated: if you tag an image as latest and it contains a breaking change or a bug, the next time a container crashes or restarts, it might pull the image with that breaking change or bug.
Adding a Tag When Building the Image
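Tags are applied at build time with the -t flag; for example (the image name is illustrative):

```shell
# Build the image and tag it as version 1.0
docker build -t sathyabhat/hello-world:1.0 .
```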
Dockerfile Instructions
FROM
WORKDIR
ADD
COPY
RUN
CMD
ENTRYPOINT
ENV
VOLUME
LABEL
EXPOSE
Let’s see what they do.
FROM
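The general syntax of the FROM instruction is:

```dockerfile
FROM <image> [AS <name>]
FROM <image>[:<tag>] [AS <name>]
FROM <image>[@<digest>] [AS <name>]
```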
Where <image> is the name of a valid Docker image from any public/private repository. As mentioned, if the tag is skipped, Docker will fetch the image tagged as latest.
WORKDIR
WORKDIR can be set multiple times in a Dockerfile and, if a relative directory succeeds a previous WORKDIR instruction, it will be relative to the previously set working directory. Let’s look at an example demonstrating this.
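First, a minimal sketch with a single, absolute WORKDIR (base image and directory chosen for illustration):

```dockerfile
FROM ubuntu:latest
# Set the working directory to /app
WORKDIR /app
# Print the working directory when a container starts
CMD pwd
```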
The Dockerfile fetches the latest tagged image from Ubuntu as the base image, sets the current working directory to /app, and runs the pwd command when the image is run. The pwd command prints the current working directory.
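Building and running that image prints /app. Now consider a variant in which every WORKDIR is relative (again, an illustrative sketch):

```dockerfile
FROM ubuntu:latest
# Each relative WORKDIR is resolved against the previous one
WORKDIR usr
WORKDIR src
WORKDIR app
CMD pwd
```

Running a container from this image prints /usr/src/app.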
Notice that you did not set any absolute working directory in the Dockerfile; the relative directories were appended to the default working directory (/).
ADD and COPY
At first glance, the ADD and COPY instructions seem to be the same—they allow you to transfer files from the host to the container's filesystem. COPY supports basic copying of files to the container, whereas ADD has support for features like tarball auto-extraction (i.e., Docker will automatically extract compressed files added from a local directory) and remote URL support (i.e., Docker will download the resources from a remote URL).
The ADD instruction is useful when you’re adding files from remote URLs or you have compressed files from the local filesystem that need to be automatically extracted into the container filesystem.
Docker recommends using COPY over ADD, especially when it’s a local file that’s being copied.
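Both instructions share the same basic form, a source from the build context (or a URL, for ADD) followed by a destination inside the image; a quick sketch (the file, user, and group names are illustrative):

```dockerfile
# Copy a file from the build context into the image
COPY requirements.txt /app/requirements.txt

# Copy with a specific owner instead of the default root user
COPY --chown=appuser:appgroup requirements.txt /app/requirements.txt
```

Some points to remember about both ADD and COPY: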
If the <destination> does not exist in the image, it will be created.
All new files/directories are created with UID and GID as 0—that is, as the root user. To change this, you can use the --chown flag.
If the files/directories contain special characters, they need to be escaped.
The <destination> can be an absolute or a relative path. In the case of relative paths, the path is interpreted relative to the working directory set by the WORKDIR instruction.
If the <destination> doesn’t end with a trailing slash, it will be considered a file and the contents of the <source> will be written into <destination>.
If the <source> is specified as a wildcard pattern, the <destination> must be a directory and must end with a trailing slash; otherwise, the build process will fail.
The <source> must be within the build context. It cannot be a file/directory outside of the build context because the first step of a Docker build process involves sending the context directory to the Docker daemon.
In the case of the ADD instruction:
If the <source> is a URL and the <destination> is not a directory and doesn’t end with a trailing slash, the file is downloaded from the URL and copied into <destination>.
If the <source> is a URL and the <destination> is a directory and ends with a trailing slash, the filename is inferred from the URL and the file is downloaded and copied to <destination>/<filename>.
If the <source> is a local tarball of a known compression format, the tarball is unpacked as a directory. Remote tarballs, however, are not uncompressed.
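For illustration, here is how ADD might be used with a remote URL and with a local tarball (the URL and file names are placeholders):

```dockerfile
# The file is downloaded from the URL into /app/ (remote files are not extracted)
ADD https://example.com/config/app-config.json /app/

# A local tarball from the build context is auto-extracted into /app/vendor/
ADD vendor-libs.tar.gz /app/vendor/
```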
RUN
The RUN instruction will execute any command during the build step of the container. This creates a new layer that is available for the next steps in the Dockerfile. It is important to note that the command following the RUN instruction runs only when the image is being built. The RUN instruction has no relevance when a container has started and is running.
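RUN comes in two forms, shell form and exec form; a quick sketch of each (package names are illustrative):

```dockerfile
# Shell form: the command is run via /bin/sh -c, so shell features work
RUN apt-get update && apt-get install -y curl

# Exec form: the command runs directly, without a shell
RUN ["apt-get", "install", "-y", "curl"]
```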
The shell form makes it possible to use shell variables, subcommands, command pipes, and command chains in the RUN instruction itself.
Unless you need to use shell features like chaining and redirection, it is recommended to use the exec form for the RUN instruction.
Layer Caching
The logs indicate that, instead of redownloading the layer for the base Ubuntu image, Docker uses the cached layer saved to disk. This applies to all the layers that are created, and Docker creates a new layer whenever it encounters a RUN, COPY, or ADD instruction. Getting the order of instructions right can greatly affect whether Docker can reuse layers. This not only improves image build speed, but also reduces container start times, since there are fewer layers to download.
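For example, installing all the required packages in a single RUN instruction (the package names here are illustrative) keeps them in one layer:

```dockerfile
FROM ubuntu:latest
# One RUN instruction, one layer: update the package index and install
# everything the application needs
RUN apt-get update && apt-get install -y \
    curl \
    nginx \
    vim
```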
This creates a single layer with the packages to be installed, and any change to any of the packages invalidates the cache and causes a new layer to be created with the updated packages. If you want to explicitly instruct Docker to avoid using the cache, pass the --no-cache flag to the docker build command.
CMD and ENTRYPOINT
Passing the executable every time just to override its parameters can be quite tedious. This is where the combination of ENTRYPOINT and CMD shines: you set ENTRYPOINT to the executable, while the parameters supplied by CMD can be overridden from the command line.
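A sketch of this pattern, using curl as the entrypoint (the default URL is just an example):

```dockerfile
FROM ubuntu:latest
RUN apt-get update && apt-get install -y curl
# The executable is fixed by ENTRYPOINT...
ENTRYPOINT ["curl", "-s"]
# ...while CMD supplies a default parameter that can be overridden at run time
CMD ["https://example.com"]
```

Running the image with no arguments fetches the default URL, while docker run <image> https://docs.docker.com overrides only the CMD parameter; the curl entrypoint stays in place.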
Of course, curl is just an example here. You can replace curl with any other program that accepts parameters (such as load-testing utilities, benchmarking utilities, etc.) and the combination of CMD and ENTRYPOINT makes it easy to distribute the image.
Commands for ENTRYPOINT/CMD Combinations
| | No ENTRYPOINT | ENTRYPOINT exec_entry p1_entry | ENTRYPOINT ["exec_entry", "p1_entry"] |
|---|---|---|---|
| No CMD | Error, not allowed | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry |
| CMD ["exec_cmd", "p1_cmd"] | exec_cmd p1_cmd | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry exec_cmd p1_cmd |
| CMD ["p1_cmd", "p2_cmd"] | p1_cmd p2_cmd | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry p1_cmd p2_cmd |
| CMD exec_cmd p1_cmd | /bin/sh -c exec_cmd p1_cmd | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry /bin/sh -c exec_cmd p1_cmd |
As mentioned earlier, you can specify RUN, CMD, and ENTRYPOINT in shell form and exec form. Which to use depends entirely on your requirements, but as a general guide:
In shell form, the command runs in a shell with the command as a parameter. This form provides a shell where shell variables, subcommands, command piping, and command chaining are possible.
In exec form, the command does not invoke a command shell. This means that normal shell processing (such as $VARIABLE substitution, piping, etc.) will not work.
A program started in shell form runs as a subcommand of /bin/sh -c. This means the executable does not run as PID 1 and does not receive UNIX signals. As a consequence, pressing Ctrl+C (or stopping the container, which sends a SIGTERM) does not forward the signal to the application, and it might not exit correctly.
ENV
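ENV accepts two forms; a quick sketch of each (the variable names are examples):

```dockerfile
# First form: everything after the key is treated as the value
ENV LOG_LEVEL debug

# Second form: key=value pairs; several variables can be set in one instruction
ENV LOG_LEVEL=debug LOG_LOCATION=/var/log/app
```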
In the first form, the entire string after the <key> is considered the value, including whitespace characters. Only one variable can be set per line in this form.
In the second form, multiple variables can be set at one time, with the equals (=) character assigning value to the key.
The environment variables set are persisted through the container runtime. They can be viewed using docker inspect.
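For example (the image name is a placeholder):

```shell
# View the environment variables baked into an image
docker inspect sathyabhat/env-example

# Or start an interactive shell in a container and check them directly
docker run -it sathyabhat/env-example sh
```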
Type exit to close the interactive terminal of the container.
VOLUME
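The instruction itself is a single line naming the mount point; for example:

```dockerfile
VOLUME /var/logs/nginx
```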
This tells Docker to mark the /var/logs/nginx directory as a mount point, with the data being mounted from the Docker host. When combined with the -v (volume) flag of the docker run command, this results in the data being persisted on the Docker host as a volume. The volume can then be backed up, moved, or transferred using Docker CLI commands. You will learn more about volumes in a later chapter of this book.
EXPOSE
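EXPOSE tells Docker which ports the container listens on at runtime; the syntax, with a couple of illustrative examples:

```dockerfile
EXPOSE <port> [<port>/<protocol>...]

# Examples
EXPOSE 80
EXPOSE 53/udp
```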
You can also specify whether the port listens on TCP or UDP, or declare it for both. If the protocol is not specified, Docker assumes TCP.
An EXPOSE instruction doesn’t publish the port. For the port to be published to the host, you need to use the -p flag with docker run to publish and map the ports.
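For instance (the image name is a placeholder):

```shell
# Publish container port 80 on host port 8080
docker run -p 8080:80 sathyabhat/nginx-example
```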
LABEL
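LABEL adds metadata to an image as key-value pairs; an illustrative sketch (the first key follows the reverse-DNS convention described below, and the values are placeholders):

```dockerfile
LABEL com.sathyasays.my-image.version="1.0"
LABEL description="An illustrative description of the image" \
      maintainer="example@example.com"
```

Docker recommends the following guidelines for label keys and values.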
- For keys:
Authors of third-party tools should prefix each key with reverse DNS notation of a domain owned by them: for example, com.sathyasays.my-image.
com.docker.*, io.docker.*, and org.dockerproject.* are reserved by Docker for internal use.
Label keys should begin and end with lowercase letters and should contain only lowercase alphanumeric characters and the period (.) and hyphen (-) characters. Consecutive hyphens and periods are not allowed.
The period (.) separates the namespace fields.
- For values:
Label values can contain any data type that can be represented as a string, including JSON, XML, YAML, and CSV.
Guidelines and Recommendations for Writing Dockerfiles
Containers should be ephemeral. Docker recommends that images generated by Dockerfiles be as ephemeral as possible. You should be able to stop, destroy, and restart the container at any point with minimal setup and configuration. The container should ideally not write data to its filesystem; any persistent data should be written to Docker volumes or to data storage managed outside the container (for example, an object store such as Amazon S3).
Keep the build context minimal. You read about build context earlier in this chapter. It’s important to keep the build context as minimal as possible to reduce the build times and the image size. This can be done by making effective use of the .dockerignore file.
Use multi-stage builds. Multi-stage builds help in drastically reducing the size of the image without having to write complicated scripts to transfer/keep the required artifacts. Multi-stage builds are described in the next section.
Skip unwanted packages. Having unwanted or nice-to-have packages increases the size of the image, introduces unwanted dependent packages, and increases the surface area for attacks.
Minimize the number of layers. While not as big a concern as it used to be, it's still important to reduce the number of layers in the image. As of Docker 1.10, only the RUN, COPY, and ADD instructions create layers. Keeping these instructions to a minimum, or combining multiple lines of the same instruction into one, reduces the number of layers and, ultimately, the size of the image.
Using Multi-Stage Builds
As of version 17.05, Docker supports multi-stage builds, which allow complex image builds to be performed without unnecessarily bloating the Docker image. Multi-stage builds are especially useful when you're building images of applications that require additional build-time dependencies that are not needed at runtime. The most common examples are applications written in languages such as Go or Java, where, prior to multi-stage builds, it was common to maintain two different Dockerfiles: one for the build and another for the release, along with scripts to move the artifacts from the build-time image to the runtime image.
With multi-stage builds, a single Dockerfile can be leveraged for both the build and the deploy images: the build stages can contain the build tools required for generating the binary or artifact, and in a later stage the artifact can be copied into the runtime image, considerably reducing its size. For a typical multi-stage build, a build stage has several layers: layers for installing the tools required to build the application, for generating the dependencies, and for building the application itself. In the final stage, the application built in the earlier stages is copied over, and only the layers of this final stage count toward the resulting image. The build-stage layers are discarded, drastically reducing the size of the final image.
Although this book doesn’t focus on multi-stage builds in detail, you will try an exercise on how to create a multi-stage build and see how much smaller using a slim image with multi-stage build makes the final image. More details about multi-stage builds are available on Docker’s website at https://docs.docker.com/develop/develop-images/multistage-build/.
Exercises
The start of the chapter introduced a simple Dockerfile that did not build due to syntax errors. In this exercise, you see how to fix that Dockerfile and add some of the instructions that you learned in this chapter.
Tip The source code and associated Dockerfile are available on the GitHub repo of the book, at https://github.com/Apress/practical-docker-with-python, in the source-code/chapter-4/exercise-1 directory.
Trying to build this will result in an error since hello-world.py is missing. Let’s fix the build error. To do this, you need to add a hello-world.py that reads an environment variable, NAME, and prints Hello, $NAME!. If the environment variable is not defined, it will print "Hello, World!".
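A minimal sketch of such a script (one of several ways to write it):

```python
# hello-world.py
import os

# Read the NAME environment variable, defaulting to "World" if it is not set
name = os.getenv("NAME", "World")
print(f"Hello, {name}!")
```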
Congrats! You’ve successfully written your first Dockerfile and built your first Docker image.
In this exercise, you will build two Docker images. The first image uses a standard build with python:3 as the base image, whereas the second image gives an overview of how multi-stage builds can be utilized.
Tip The source code and associated Dockerfile are available on the GitHub repo of the book at https://github.com/Apress/practical-docker-with-python, in the source-code/chapter-4/exercise-2/ directory.
Building the Docker Image Using a Standard Build
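The standard-build Dockerfile is along these lines (a sketch; the actual dependency comes from the requirements.txt in the exercise directory):

```dockerfile
FROM python:3
COPY requirements.txt .
RUN pip install -r requirements.txt
```

After building and tagging the image (here as sathyabhat/base-build), docker images reports the size shown in the following table.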
Repository | Tag | Image ID | Created | Size |
---|---|---|---|---|
sathyabhat/base-build | latest | 03191af | About a minute ago | 895MB |
The Docker image sits at a fairly hefty 895MB, even though you did not add any of your application code, just a dependency. Let’s rewrite it to a multi-stage build.
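A sketch of what the multi-stage Dockerfile might look like (the exact stage names and paths are assumptions; the shape of the build is what matters):

```dockerfile
# Stage 1: install the dependencies using the full python:3 image
FROM python:3 AS python-base
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# Stage 2: copy only the installed packages into a slim runtime image
FROM python:3-slim
COPY --from=python-base /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
```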
The Dockerfile is different in that there are multiple FROM statements, signifying the different stages. In the first stage, you build the required packages using the python:3 image, which has the necessary build tools.
Repository | Tag | Image ID | Created | Size |
---|---|---|---|---|
sathyabhat/multistage-build | latest | 35c85a8497b5 | About a minute ago | 54.2MB |
In this exercise, you will write the Dockerfile for Newsbot, the Telegram chatbot project.
Tip The source code and associated Dockerfile are available on the GitHub repo of the book at https://github.com/Apress/practical-docker-with-python, in the source-code/chapter-4/exercise-3/ directory.
To containerize Newsbot, you need the following:
- A Docker image based on Python 3
- The project dependencies listed in requirements.txt
- An environment variable named NBT_ACCESS_TOKEN
The general approach to writing the Dockerfile is as follows:
1. Start with a proper base image.
2. Make a list of files required for the application.
3. Make a list of environment variables required for the application.
4. Copy the application files to the image using the COPY instruction.
5. Specify the environment variable with the ENV instruction (see the sketch after this list).
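Putting these steps together, a minimal sketch of Newsbot's Dockerfile might look like this (the working directory, the entrypoint script name, and the empty token value are assumptions):

```dockerfile
FROM python:3-alpine
WORKDIR /app
# Install the dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy the rest of the application code
COPY . .
# The Telegram bot token; supply the real value at build or run time
ENV NBT_ACCESS_TOKEN=""
CMD ["python", "newsbot.py"]
```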
Build and run the image; if you see Newsbot's startup logs, congratulations! Not only did you write the Dockerfile for Newsbot, but you also built it and ran it successfully.
Summary
In this chapter, you gained a better understanding of what a Dockerfile is by reviewing its syntax. You are now one step closer to mastering writing a Dockerfile for Newsbot.