Chapter 5. Working with Docker Containers

In the previous chapter, we learned how to build a Docker image and the very basic steps required for running the resulting image within a container. In this chapter, we’ll first take a look at where containers came from and then dive deeper into containers and the Docker commands that control the overall configuration, resources, and privileges that your container receives.

What Are Containers?

You might be familiar with virtualization systems like VMware or Xen that allow you to run a complete Linux kernel and operating system on top of a virtualized layer, commonly called a hypervisor. This approach provides very strong isolation between virtual machines because each hosted kernel sits in separate memory space and has defined entry points into the actual hardware, either through another kernel or something that looks like hardware.

Containers are a fundamentally different approach, where all containers share a single kernel and isolation is implemented entirely within that one kernel. This is called operating system virtualization. The libcontainer project gives a good, short definition of a container: “A container is a self-contained execution environment that shares the kernel of the host system and which is (optionally) isolated from other containers in the system.” The major advantages center on resource efficiency, because you don’t need a whole operating system for each isolated function. Since you are sharing a kernel, there is one less layer of indirection between the isolated task and the real hardware underneath. When a process runs inside a container, there is only a thin shim inside the kernel, rather than a potential call up into a whole second kernel while bouncing in and out of privileged mode on the processor.

But the container approach means that you can only run processes that are compatible with the underlying kernel. Unlike with hardware virtualization, such as that provided by VMware, Windows applications cannot run inside a Linux container, for example. So containers are best thought of as a Linux technology where, at least for now, you can run any of your favorite Linux applications or servers. When thinking of containers, you should try very hard to throw out what you might already know about virtual machines and instead conceptualize a container as a wrapper around a process that actually runs on the server.

History of Containers

It is often the case that a revolutionary technology is an older technology that has finally arrived in the spotlight. Technology goes in waves, and some of the ideas from the 1960s are back in vogue. Similarly, Docker is a new technology and it has an ease of use that has made it an instant hit, but it doesn’t exist in a vacuum. Much of what underpins Docker comes from work done over the last 30 years in a few different arenas: from a system call added to the Unix kernel in the late 1970s, to tooling built on modern Linux. It’s worth a quick tour through how we got to Docker because understanding that helps you place it within the context of other things you might be familiar with.

Containers are not a new idea. They are a way to isolate and encapsulate a part of the running system. The oldest technologies in that area were the early batch processing systems: you’d run a program for a while, then switch to running another program. There was isolation: you could make sure your program didn’t step on anyone else’s program. That’s all pretty crude now, but it’s the very first step on the road to Linux containers and Docker.

Most people would argue that the seeds for today’s containers were planted in 1979 with the addition of the chroot system call to Version 7 Unix. chroot restricts a process’s view of the underlying filesystem. The chroot system call is commonly used to protect the operating system from untrusted server processes like FTP, BIND, and Sendmail, which are publicly exposed and susceptible to compromise.

In the 1980s and 1990s, various Unix variants were created with mandatory access controls for security reasons.1 This meant you had tightly controlled domains running on the same Unix kernel. Processes in each domain had an extremely limited view of the system that precluded them from interacting across domains. A popular commercial version of Unix that implemented this idea was the Sidewinder firewall built on top of BSDI Unix. But this was not possible in most mainstream Unix implementations.

That changed in 2000 when FreeBSD 4.0 was released with a new command, called jail, which was designed to allow shared-environment hosting providers to easily and securely create a separation between their own processes and those of their individual customers. FreeBSD jail expanded chroot’s capabilities but also restricted everything a process could do with the underlying system and with processes in other jails.

In 2004, Sun released an early build of Solaris 10, which included Solaris Containers, a technology that later evolved into Solaris Zones. This was the first major commercial implementation of container technology and is still used today to support many commercial container implementations. In 2007, HP released Secure Resource Partitions for HP-UX, later renamed HP-UX Containers. Finally, in 2008, Linux Containers (LXC) arrived with the release of version 2.6.24 of the Linux kernel. The phenomenal growth of Linux Containers across the community did not really begin until 2013, with the inclusion of user namespaces in version 3.8 of the Linux kernel and the release of Docker one month later.

Companies that had to deal with scaling applications to the size of the Internet, with Google being a very early example, started pushing container technology in the early 2000s in order to facilitate distributing their applications across data centers full of computers. A few companies maintained their own patched kernels with container support for internal use. Google contributed some of its work to support containers into the mainline Linux kernel, as understanding about the broader need for these features began to increase in the Linux community.

In late 2013, months after the Docker announcement, Google released lmctfy, the open source version of the internal container engine it had been running for some years. By this time, Docker was already widely discussed in the press. It was the right combination of ease of use and enabling technology just at the right time. Other promising container engines, like CoreOS Rocket, have been released since, but Docker seems to have built up a head of steam that is currently powering it to the forefront.

Note

If you haven’t heard about CoreOS Rocket, you might be wondering what it is. Rocket is an open source container runtime that CoreOS is designing to address what they see as serious deficiencies with the Docker approach to containerization and the supporting tool set. It is left as an exercise for the reader to determine whether the CoreOS approach and solution fits your needs.

Now let’s turn back to Docker and take a closer look at modern containers.

Creating a Container

So far we’ve started containers using the handy docker run command. But docker run is really a convenience command that wraps two separate steps into one. The first thing it does is create a container from the underlying image. This is accomplished separately using the docker create command. The second thing docker run does is execute the container, which we can also do separately with the docker start command.
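For example, these two commands (the container name is just for illustration) are roughly equivalent to a single docker run:

$ docker create --name my-sleeper ubuntu:latest sleep 120
$ docker start my-sleeper

This is much the same as running docker run -d --name my-sleeper ubuntu:latest sleep 120 in one step.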

The docker create and docker run commands both contain all the options that pertain to how a container is initially set up. In Chapter 4, we demonstrated that with the docker run command you could map network ports in the underlying container to the host using the -p argument, and that -e could be used to pass environment variables into the container.

This only just begins to touch on the array of things that you can configure when you first create a container. So let’s take a pass over some of the options that docker supports.

Basic Configuration

Now let’s take a look at some of the ways we can tell Docker to configure our container when we create it.

Container name

When you create a container, it is built from the underlying image, but various command-line arguments can affect the final settings. Settings specified in the Dockerfile are always used as defaults, but you can override many of them at creation time.

By default, Docker randomly names your container by combining an adjective with the name of a famous person. This results in names like ecstatic-babbage and serene-albattani. If you want to give your container a specific name, you can do so using the --name argument.

$ docker create --name="awesome-service" ubuntu:latest
Warning

You can only have one container with any given name on a Docker host. If you run the above command twice in a row, you will get an error. You must either delete the previous container using docker rm or change the name of the new container.

Labels

As mentioned in Chapter 4, labels are key-value pairs that can be applied to Docker images and containers as metadata. When new Docker containers are created, they automatically inherit all the labels from their parent image.

It is also possible to add new labels to the containers so that you can apply metadata that might be specific to that single container.

$ docker run -d --name labels -l deployer=Ahmed -l tester=Asako \
  ubuntu:latest sleep 1000

You can then search for and filter containers based on this metadata, using commands like docker ps.

$ docker ps -a -f label=deployer=Ahmed
CONTAINER ID  IMAGE         COMMAND       ... NAMES
845731631ba4  ubuntu:latest "sleep 1000"  ... labels

You can use the docker inspect command on the container to see all the labels that a container has.

$ docker inspect 845731631ba4
...
        "Labels": {
            "deployer": "Ahmed",
            "tester": "Asako"
        },
...

Hostname

By default, when you start a container, Docker will copy certain system files on the host, including /etc/hostname, into the container’s configuration directory on the host,2 and then use a bind mount to link that copy of the file into the container. We can launch a default container with no special configuration like this:

$ docker run --rm -ti ubuntu:latest /bin/bash

This command uses the docker run command, which runs docker create and docker start in the background. Since we want to be able to interact with the container that we are going to create for demonstration purposes, we pass in a few useful arguments. The --rm argument tells Docker to delete the container when it exits, the -t argument tells Docker to allocate a pseudo-TTY, and the -i argument tells Docker that this is going to be an interactive session and that we want to keep STDIN open. The final argument in the command is the executable that we want to run within the container, which in this case is the ever-useful /bin/bash.

If we now run the mount command from within the resulting container, we will see something similar to this:

root@ebc8cf2d8523:/# mount
overlay on / type overlay (rw,relatime,lowerdir=...,upperdir=...,workdir...)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,mode=755)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,...,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
/dev/sda9 on /etc/resolv.conf type ext4 (rw,relatime,data=ordered)
/dev/sda9 on /etc/hostname type ext4 (rw,relatime,data=ordered)
/dev/sda9 on /etc/hosts type ext4 (rw,relatime,data=ordered)
devpts on /dev/console type devpts (rw,nosuid,noexec,relatime,...,ptmxmode=000)
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
tmpfs on /proc/kcore type tmpfs (rw,nosuid,mode=755)
root@ebc8cf2d8523:/#
Note

When you see any examples with a prompt that looks something like root@hostname, it means that you are running a command within the container instead of on the Docker host.

There are quite a few bind mounts in a container, but in this case we are interested in this one:

/dev/sda9 on /etc/hostname type ext4 (rw,relatime,data=ordered)

While the device number will be different for each container, the part we care about is that the mount point is /etc/hostname. This links the container’s /etc/hostname to the hostname file that Docker has prepared for the container, which by default contains the container’s ID and is not fully qualified with a domain name.

We can check this in the container by running the following:

root@ebc8cf2d8523:/# hostname -f
ebc8cf2d8523
root@ebc8cf2d8523:/# exit
Note

Don’t forget to exit the container shell so that we return to the Docker host when finished.

To set the hostname specifically, we can use the --hostname argument to pass in a more specific value.

$ docker run --rm -ti --hostname="mycontainer.example.com" ubuntu:latest /bin/bash

Then, from within the container, we will see that the fully qualified hostname is defined as requested.

root@mycontainer:/# hostname -f
mycontainer.example.com
root@mycontainer:/# exit

Domain Name Service (DNS)

Just like /etc/hostname, the resolv.conf file is managed via a bind mount between the host and container.

/dev/sda9 on /etc/resolv.conf type ext4 (rw,relatime,data=ordered)

By default, this is an exact copy of the Docker host’s resolv.conf file. If we didn’t want this, we could use a combination of the --dns and --dns-search arguments to override this behavior in the container:

$ docker run --rm -ti --dns=8.8.8.8 --dns=8.8.4.4 --dns-search=example1.com \
  --dns-search=example2.com ubuntu:latest /bin/bash
Note

If you want to leave the search domain completely unset, then use --dns-search=.

Within the container, we would still see a bind mount, but the file contents would no longer reflect the host’s resolv.conf; instead, it now looks like this:

root@0f887071000a:/# more /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
search example1.com example2.com
root@0f887071000a:/# exit

Media Access Control (MAC) address

Another important piece of information that you can configure is the MAC address for the container.

Without any configuration, a container will receive a calculated MAC address that starts with the 02:42:ac:11 prefix.

If you need to specifically set this to a value, you can do this by running something similar to this:

$ docker run --rm -ti --mac-address="a2:11:aa:22:bb:33" ubuntu:latest /bin/bash

Normally you will not need to do that. But sometimes you want to reserve a particular set of MAC addresses for your containers in order to avoid other virtualization layers that use the same private block as Docker.

Warning

Be very careful when customizing the MAC address settings. It is possible to cause ARP contention on your network if two systems advertise the same MAC address. If you have a strong need to do this, try to keep your locally administered address ranges within some of the official ranges, like x2-xx-xx-xx-xx-xx, x6-xx-xx-xx-xx-xx, xA-xx-xx-xx-xx-xx, and xE-xx-xx-xx-xx-xx (with x being any valid hexadecimal character).

Storage Volumes

There are times when the default disk space allocated to a container, or the container’s ephemeral nature, is not appropriate for the job at hand, and it is necessary to have storage that can persist between container deployments.

Warning

Mounting storage from the Docker host is not a generally advisable pattern because it ties your container to a particular Docker host for its persistent state. But for cases like temporary cache files or other semi-ephemeral states, it can make sense.

For the times when we need to do this, we can leverage the -v argument to mount filesystems from the host server into the container. In the following example, we are mounting /mnt/session_data to /data within the container:

$ docker run --rm -ti -v /mnt/session_data:/data ubuntu:latest /bin/bash
root@0f887071000a:/# mount | grep data
/dev/sda9 on /data type ext4 (rw,relatime,data=ordered)
root@0f887071000a:/# exit
Tip

If you have SELinux enabled on your Docker host, you may get a “Permission Denied” error when trying to mount a volume into your container. There are a few ways to handle this. The most direct method is to simply set the right context on the directory that you are trying to mount:

$ sudo chcon -Rt svirt_sandbox_file_t /var/lib/dhcpd

As of Docker 1.7, it is also possible to handle this directly from the Docker command line. If you are going to share a volume between containers, you can use the z option to the volume mount. This is identical to the above chcon command:

$ docker run -v /etc/dhcpd:/etc/dhcpd:z dhcpd

However, the best option is to actually utilize the Z option to the volume mount command, which will set the directory with the exact MCS label (e.g., chcon … -l s0:c1,c2) that the container will be using. This provides for the best security and will only allow a single container to mount the volume:

$ docker run -v /etc/dhcpd:/etc/dhcpd:Z dhcpd

In the mount options, we can see that the filesystem was mounted read-write on /data as we expected.

Note

The mount point in the container does not need to preexist for this command to work properly. However, the host mount point must exist: automatic creation of the host directory was deprecated in Docker version 1.9.

If the container application is designed to write into /data, then this data will be visible on the host filesystem in /mnt/session_data and would remain available when this container was stopped and a new container started with the same volume mounted.
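We can verify this quickly (the filename here is just for illustration):

root@0f887071000a:/# echo "session cache" > /data/cache.txt
root@0f887071000a:/# exit
$ cat /mnt/session_data/cache.txt
session cache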

In Docker 1.5, a new option was added that allows the root volume of your container to be mounted read-only, so that processes within the container cannot write anything to the root filesystem. This prevents things like logfiles, which a developer was unaware of, from filling up the container’s allocated disk in production. When used in conjunction with a mounted volume, you can ensure that data is written only into expected locations.

In our previous example, we could accomplish this by simply adding --read-only=true to the command.

$ docker run --rm -ti --read-only=true -v /mnt/session_data:/data \
  ubuntu:latest /bin/bash
root@df542767bc17:/# mount | grep " / "
overlay on / type overlay (ro,relatime,lowerdir=...,upperdir=...,workdir=...)
root@df542767bc17:/# mount | grep data
/dev/sda9 on /data type ext4 (rw,relatime,data=ordered)
root@df542767bc17:/# exit

If we look closely at the mount options for the root directory, we will notice that it is mounted with the ro option, which makes it read-only. However, the /data mount is still mounted with the rw option, so that our application can successfully write to the one volume to which we have designed it to write.

Sometimes it is necessary to make a directory like /tmp writeable, even when the rest of the container is read-only. In Docker 1.10, the --tmpfs attribute was added to docker run, so that you can mount a tmpfs filesystem into the container. Any data in these tmpfs directories will be lost when the container is stopped. The following command example shows a container being launched with a tmpfs filesystem mounted at /tmp with the rw, noexec, nodev, nosuid, and size=256M mount options set:

$ docker run --rm -ti --read-only=true --tmpfs \
  /tmp:rw,noexec,nodev,nosuid,size=256M ubuntu:latest /bin/bash
root@25b4f3632bbc:/# df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           256M     0  256M   0% /tmp
root@25b4f3632bbc:/# grep /tmp /etc/mtab
tmpfs /tmp tmpfs rw,seclabel,nosuid,nodev,noexec,relatime,size=262144k 0 0
root@25b4f3632bbc:/# exit
Warning

Containers should be designed to be stateless whenever possible. Managing storage creates undesirable dependencies and can easily make deployment scenarios much more complicated.

Resource Quotas

When people discuss the types of problems that you must often cope with when working in the cloud, the concept of the “noisy neighbor” is often near the top of the list. The basic problem this term refers to is that other applications, running on the same physical system as yours, can have a noticeable impact on your performance and resource availability.

Traditional virtual machines have the advantage that you can easily and very tightly control how much memory and CPU, among other resources, are allocated to the virtual machine. When using Docker, you must instead leverage the cgroup functionality in the Linux kernel to control the resources that are available to a Docker container. The docker create command directly supports configuring CPU and memory restrictions when you create a container.

Note

Constraints are applied at the time of container creation. Constraints that you apply at creation time will exist for the life of the container. In most cases, if you need to change them, then you need to create a new container from the same image and change the constraints, unless you manipulate the kernel cgroups directly under the /sys filesystem.
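For example, on many Linux distributions you can inspect the live value for a running container directly through the cgroup filesystem. The exact path varies by distribution and Docker version, so the one shown here is only an assumption:

$ cat /sys/fs/cgroup/cpu/docker/<full_container_id>/cpu.shares
1024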

There is an important caveat here. While Docker supports CPU and memory limits, as well as swap limits, you must have these capabilities enabled in your kernel in order for Docker to take advantage of them. You might need to add these as command-line parameters to your kernel on startup. To figure out if your kernel supports these limits, run docker info. If you are missing any support, you will get warning messages at the bottom, like:

WARNING: No swap limit support
Note

The details regarding getting cgroup support configured for your kernel are distribution-specific, so you should consult the Docker documentation if you need help configuring things.

CPU shares

Docker thinks of CPU in terms of “cpu shares.” The computing power of all the CPU cores in a system is considered to be the full pool of shares, and 1024 is the number that Docker assigns to represent the full pool. By configuring a container’s CPU shares, you can dictate how much CPU time the container gets relative to everything else on the system. If you want the container to be able to use at most half of the computing power of the system, then you would allocate it 512 shares. Note that these are not exclusive shares, meaning that assigning all 1024 shares to a container does not prevent all other containers from running. Rather, it’s a hint to the scheduler about how long each container should be able to run each time it’s scheduled. If we have one container that is allocated 1024 shares (the default) and two that are allocated 512, they will all get scheduled the same number of times. But if the normal amount of CPU time for each process is 100 microseconds, the containers with 512 shares will run for 50 microseconds each time, whereas the container with 1024 shares will run for 100 microseconds.

Let’s explore a little bit how this works in practice. For the following examples, we are going to use a new Docker image that contains the stress command for pushing a system to its limits.

When we run stress without any cgroup constraints, it will use as many resources as we tell it to. The following command creates a load average of around 5 by creating two CPU-bound processes, one I/O-bound process, and two memory allocation processes:

$ docker run --rm -ti progrium/stress \
  --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s
Warning

This should be a reasonable command to run on any modern computer system, but be aware that it is going to stress the host system, so don’t do this in a location that can’t take the additional load, or even a possible failure, due to resource starvation.

If you run the top command on the Docker host, near the end of the two-minute run, you can see how the system is affected by the load created by the stress program.

Note

In the following code, we are running on a system with two CPUs.

$ top -bn1 | head -n 15
top - 20:56:36 up 3 min,  2 users,  load average: 5.03, 2.02, 0.75
Tasks:  88 total,   5 running,  83 sleeping,   0 stopped,   0 zombie
%Cpu(s): 29.8 us, 35.2 sy,  0.0 ni, 32.0 id,  0.8 wa,  1.6 hi,  0.6 si,  0.0 st
KiB Mem:   1021856 total,   270148 used,   751708 free,    42716 buffers
KiB Swap:        0 total,        0 used,        0 free.    83764 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  810 root      20   0    7316     96      0 R  44.3  0.0   0:49.63 stress
  813 root      20   0    7316     96      0 R  44.3  0.0   0:49.18 stress
  812 root      20   0  138392  46936    996 R  31.7  4.6   0:46.42 stress
  814 root      20   0  138392  22360    996 R  31.7  2.2   0:46.89 stress
  811 root      20   0    7316     96      0 D  25.3  0.0   0:21.34 stress
    1 root      20   0  110024   4916   3632 S   0.0  0.5   0:07.32 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.04 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.11 ksoftirqd/0

If you want to run the exact same stress command again, with only half the amount of available CPU time, you can run it like this:

$ docker run --rm -ti --cpu-shares 512 progrium/stress \
  --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s

The --cpu-shares 512 is the flag that does the magic, allocating 512 CPU shares to this container. Note that the effect might not be noticeable on a system that is not very busy. That’s because the container will continue to be scheduled for the same time-slice length whenever it has work to do, unless the system is constrained for resources. So in our case, the results of a top command on the host system will likely look exactly the same, unless you run a few more containers to give the CPU something else to do.

Warning

Unlike virtual machines, Docker’s cgroup-based constraints on CPU shares can have unexpected consequences. They are not hard limits; they are a relative limit, similar to the nice command. An example is a container that is constrained to half the CPU shares, but is on a system that is not very busy. Because the CPU is not busy, the limit on the CPU shares would have only a limited effect because there is no competition in the scheduler pool. When a second container that uses a lot of CPU is deployed to the same system, suddenly the effect of the constraint on the first container will be noticeable. Consider this carefully when constraining containers and allocating resources.

CPU pinning

It is also possible to pin a container to one or more CPU cores. This means that work for this container will only be scheduled on the cores that have been assigned to this container.

In the following example, we are running our stress container pinned to the first of two CPUs, with 512 CPU shares. Note that everything following the container image name here consists of parameters to the stress command, not to docker.

$ docker run --rm -ti --cpu-shares 512 --cpuset=0 progrium/stress \
  --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s
Warning

The --cpuset argument is zero-indexed, so your first CPU core is 0. If you tell Docker to use a CPU core that does not exist on the host system, you will get a Cannot start container error. On our two-CPU example host, you could test this by using --cpuset=0,1,2.

If we run top again, we should notice that the percentage of CPU time spent in user space (us) is lower than it previously was, since we have restricted two CPU-bound processes to a single CPU.

%Cpu(s): 18.5 us, 22.0 sy,  0.0 ni, 57.6 id,  0.5 wa,  1.0 hi,  0.3 si,  0.0 st
Note

When you use CPU pinning, additional CPU sharing restrictions on the container only take into account other containers running on the same set of cores.

In Docker Engine 1.7, support was added for the CPU quota controls of the Linux kernel’s CFS (Completely Fair Scheduler). You can alter the CPU quota a given container has by setting the --cpu-quota flag to a valid value when launching the container with docker run.
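As a minimal sketch, and assuming the kernel’s default scheduling period of 100,000 microseconds, the following would cap the container at roughly half of a single CPU, even on an otherwise idle system:

$ docker run --rm -ti --cpu-quota=50000 progrium/stress \
  --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s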

Memory

We can control how much memory a container can access in a manner similar to constraining the CPU. There is, however, one fundamental difference: while constraining the CPU only impacts the application’s priority for CPU time, the memory limit is a hard limit. Even on an unconstrained system with 96 GB of free memory, if we tell a container that it may only have access to 24 GB, then it will only ever get to use 24 GB regardless of the free memory on the system. Because of the way the virtual memory system works on Linux, it’s possible to allocate more memory to a container than the system has actual RAM. In this case, the container will resort to using swap in the event that actual memory is not available, just like a normal Linux process.

Let’s start a container with a memory constraint by passing the -m option to the docker run command:

$ docker run --rm -ti -m 512m progrium/stress \
  --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s

When you use the -m option alone, you are setting both the amount of RAM and the amount of swap that the container will have access to. So here we’ve constrained the container to 512 MB of RAM and 512 MB of additional swap space. Docker supports b, k, m, or g, representing bytes, kilobytes, megabytes, or gigabytes, respectively. If your system somehow runs Linux and Docker and has multiple terabytes of memory, then unfortunately you’re going to have to specify it in gigabytes.

If you would like to set the swap separately or disable it altogether, then you need to also use the --memory-swap option. The --memory-swap option defines the total amount of memory and swap available to the container. If we rerun our previous command, like so:

$ docker run --rm -ti -m 512m --memory-swap=768m progrium/stress \
  --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s

Then we are telling the kernel that this container can have access to 512 MB of memory and 256 MB of additional swap space. Setting the --memory-swap option to -1 will disable the swap completely within the container.
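For example, this minimal variation would give the container 512 MB of RAM and no swap at all:

$ docker run --rm -ti -m 512m --memory-swap=-1 progrium/stress \
  --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s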

Warning

Unlike CPU shares, memory is a hard limit! This is good because the constraint doesn’t suddenly make a noticeable effect on the container when another container is deployed to the system. But it does mean that you need to be careful that the limit closely matches your container’s needs because there is no wiggle room.

So, what happens if a container reaches its memory limit? Well, let’s give it a try by modifying one of our previous commands and lowering the memory significantly:

$ docker run --rm -ti -m 100m progrium/stress --cpu 2 --io 1 \
  --vm 2 --vm-bytes 128M --timeout 120s

Where all our other runs of the stress container ended with the line:

stress: info: [1] successful run completed in 120s

We see that this run quickly fails with the line:

stress: FAIL: [1] (452) failed run completed in 1s

This is because the container tries to allocate more memory than it is allowed, and the Linux Out of Memory (OOM) killer is invoked and starts killing processes within the cgroup to reclaim memory. Since our container has only one running process, this kills the container.

Docker 1.10 added features that allow you to tune and disable the Linux Out of Memory killer by using the --oom-kill-disable and --oom-score-adj arguments to docker run.
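For example, a higher --oom-score-adj value makes the container’s processes more likely to be selected by the OOM killer when memory runs short; the value here is arbitrary:

$ docker run --rm -ti --oom-score-adj=500 -m 100m progrium/stress \
  --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s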

Note

As of Docker 1.9, it is also possible to specifically limit the amount of kernel memory available to a container by using the --kernel-memory argument to docker run or docker create.
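The --kernel-memory argument accepts the same unit suffixes as -m; for example, this minimal sketch assumes a 50 MB kernel-memory cap:

$ docker run --rm -ti --kernel-memory 50m ubuntu:latest /bin/bash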

Block I/O

In Docker 1.7, support was added to apply some prioritization to a container’s use of block device I/O. This is managed by manipulating the default setting of the blkio.weight cgroup attribute, which can have a value of 10 to 1000, and defaults to 500. The system will divide all of the available I/O between every process within a cgroup slice, with the assigned weights impacting how much I/O each individual process receives.

To set this weight on a container, you need to pass the --blkio-weight argument to your docker run command with a valid value.
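For example, the following (the weight here is arbitrary) would give a container a higher-than-default share of the available block I/O:

$ docker run --rm -ti --blkio-weight=600 ubuntu:latest /bin/bash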

To read more technical details about this kernel feature, take a look at the blkio-controller kernel documentation.

Note

The release of Docker 1.10 introduced even more block I/O tuning features and added the docker update command, which can be used to dynamically adjust the resource limits of one or more containers. The following example shows how you could adjust the memory limit on two containers simultaneously: docker update --memory="1024M" 6b785f78b75e 92b797f12af1

ulimits

Another common way to limit the resources available to a process in Unix is through the application of user limits. The following output lists the types of things that can usually be configured by setting soft and hard limits via the ulimit command:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 5835
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Before the release of Docker 1.6, all containers inherited the ulimits of the Docker daemon. This is usually not appropriate because the Docker server requires more resources to perform its job than any individual container.

It is now possible to configure the Docker daemon with the default user limits that you want to apply to every container. The following command would tell the Docker daemon to start all containers with a soft limit of 50 open files and a hard limit of 150 open files:

$ sudo docker daemon --default-ulimit nofile=50:150

You can then override these ulimits on a specific container by passing in values using the --ulimit argument.

$ docker run -d --ulimit nofile=150:300 nginx
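We can verify the soft limit from inside a container with a quick one-liner (the values are just for illustration):

$ docker run --rm --ulimit nofile=150:300 ubuntu:latest /bin/bash -c "ulimit -n"
150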

There are some additional advanced options that can be used when creating containers, but this covers many of the more common use cases. The Docker client documentation lists all the available options and is kept current with each Docker release.

Starting a Container

Earlier in the chapter we used the docker create command to create our container. When we are ready to start the container, we can use the docker start command.

Let’s say that we needed to run a copy of Redis, a common key-value store. We won’t really do anything with this Redis container, but it’s a long-lived process and serves as an example of something we might do in a real environment. We could first create the container using a command like the one shown here:

$ docker create -p 6379:6379 redis:2.8
Unable to find image 'redis:2.8' locally
30d39e59ffe2: Pull complete
...
868be653dea3: Pull complete
511136ea3c5a: Already exists
redis:2.8: The image you are pulling has been verified. Important: ...
Status: Downloaded newer image for redis:2.8
6b785f78b75ec2652f81d92721c416ae854bae085eba378e46e8ab54d7ff81d1

The command ends with the full hash that was generated for the container. However, if we didn’t know the full or short hash for the container, we could list all the containers on the system, whether they are running or not, using:

$ docker ps -a
CONTAINER ID  IMAGE                   COMMAND               ...
6b785f78b75e  redis:2.8               "/entrypoint.sh redi  ...
92b797f12af1  progrium/stress:latest  "/usr/bin/stress --v  ...

We can then start the container with the following command:

$ docker start 6b785f78b75e
Note

Most Docker commands will work with the full hash or a short hash. In the previous example, the full hash for the container is 6b785f78b75ec2652f81d92…bae085eba378e46e8ab54d7ff81d1, but the short hash that is shown in most command output is 6b785f78b75e. This short hash consists of the first 12 characters of the full hash.

To verify that it’s running, we can run:

$ docker ps
CONTAINER ID  IMAGE      COMMAND               ...  STATUS       ...
6b785f78b75e  redis:2.8  "/entrypoint.sh redi  ...  Up 2 minutes ...

Auto-Restarting a Container

In many cases, we want our containers to restart if they exit. Some containers are just very short-lived and come and go quickly. But for production applications, for instance, you expect them to be up after you’ve told them to run. We can tell Docker to do that on our behalf.

The way we tell Docker to do that is by passing the --restart argument to the docker run command. It takes three values: no, always, or on-failure:#. If restart is set to no, the container will never restart if it exits. If it is set to always, then the container will restart whenever the container exits with no regard to the exit code. If restart is set to on-failure:3, then whenever the container exits with a nonzero exit code, Docker will try to restart the container three times before giving up.

We can see this in action by rerunning our last memory-constrained stress container without the --rm argument, but with the --restart argument.

$ docker run -ti --restart=on-failure:3 -m 100m progrium/stress \
  --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s

In this example, we will see the output from the first run appear on the console before it dies. If we run a docker ps immediately after the container dies, we will see that Docker is attempting to restart the container.

$ docker ps
...  IMAGE                   ...  STATUS                                ...
...  progrium/stress:latest  ...  Restarting (1) Less than a second ago ...

It will continue to fail because we have not given it enough memory to function properly. After three attempts, Docker will give up and we will see the container disappear from the output of docker ps.

Stopping a Container

Containers can be stopped and started at will. You might think that starting and stopping a container is analogous to pausing and resuming a normal process. It’s not quite the same, though. When stopped, the process is not paused; it actually exits. And when a container is stopped, it no longer shows up in the normal docker ps output. On reboot, Docker will attempt to start all of the containers that were running at shutdown. It uses this same stop/start mechanism, which is also useful when testing or when restarting a failed container. If you really do just want to pause a container, you can use docker pause and docker unpause, discussed later in this chapter. But let’s stop our container now:

$ docker stop 6b785f78b75e
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

Now that we have stopped the container, nothing is in the ps list! We can start it back up with the container ID, but it would be really inconvenient to have to remember that. So docker ps has an additional option (-a) to show all containers, not just the running ones.

$ docker ps -a
CONTAINER ID  IMAGE      STATUS                    ...
6b785f78b75e  redis:2.8  Exited (0) 2 minutes ago  ...

That STATUS field now shows that our container exited with a status code of 0 (no errors). We can start it back up with all of the same configuration it had before:

$ docker start 6b785f78b75e
6b785f78b75e
$ docker ps -a
CONTAINER ID  IMAGE      ...  STATUS         ...
6b785f78b75e  redis:2.8  ...  Up 15 seconds  ...

Voila, our container is back up and running.

Note

Remember that containers exist even when they are not started, which means that you can always restart a container without needing to recreate it. Although memory contents will have been lost, all of the container’s filesystem contents and metadata, including environment variables and port bindings, are saved and will still be in place when you restart the container.

We keep talking about the idea that containers are just a tree of processes that interact with the system in essentially the same way as any other process on the server. That means that we can send them Unix signals, which they can respond to. In the previous docker stop example, we’re sending the container a SIGTERM signal and waiting for the container to exit gracefully. Containers follow the same process group signal propagation that any other process group would receive on Linux.

A normal docker stop sends a normal SIGTERM signal to the process. If you want to force a container to be killed if it hasn’t stopped after a certain amount of time, you can use the -t argument, like this:

$ docker stop -t 25 6b785f78b75e

This tells Docker to initially send a SIGTERM signal as before, but then if the container has not stopped within 25 seconds, to send a SIGKILL signal to forcefully kill it.

Although stop is the best way to shut down your containers, there are times when it doesn’t work and we need to forcefully kill a container.

Killing a Container

We saw what it looks like to use docker stop to stop a container, but often if a process is misbehaving, you just want it to exit immediately.

We have docker kill for that. It looks pretty much like docker stop:

$ docker kill 6b785f78b75e
6b785f78b75e

A docker ps now shows that the container is no longer running, as expected:

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

Just because it was killed rather than stopped does not mean you can’t start it again, though. You can just issue a docker start like you would for a nicely stopped container. Sometimes you might want to send another signal to a container, one that is not stop or kill. Like the Linux kill command, docker kill supports sending any Unix signal. Let’s say we wanted to send a USR1 signal to our container to tell it to do something like reconnect a remote logging session. We could do the following:

$ docker kill --signal=USR1 6b785f78b75e
6b785f78b75e

If our container actually did something with the USR1 signal, it would now do it. Since we’re just running a bash shell, though, it just continues on as if nothing happened. Try sending a HUP signal, though, and see what happens. Remember that a HUP is the signal that is sent when the terminal closes on a foreground process.

Pausing and Unpausing a Container

Sometimes we really just want to stop our container as described above. But there are a number of times when we just don’t want our container to do anything for a while. That could be because we’re taking a snapshot of its filesystem to create a new image, or just because we need some CPU on the host for a while. If you’re used to normal Unix process handling, you might wonder how this actually works since containerized processes are just processes.

Pausing leverages the cgroups freezer, which essentially just prevents your process from being scheduled until you unfreeze it. This will prevent the container from doing anything while maintaining its overall state, including memory contents. Unlike stopping a container, where the processes are made aware that they are stopping via the SIGTERM signal, pausing a container doesn’t send any information to the container about its state change. That’s an important distinction. Several Docker commands use pausing and unpausing internally as well. Here’s how we pause a container:

$ docker pause 6b785f78b75e

If we look at the list of running containers, we will now see that the Redis container status is listed as (Paused).

$ docker ps
CONTAINER ID  IMAGE      ...  STATUS                  ...
6b785f78b75e  redis:2.8  ...  Up 36 minutes (Paused)  ...

Attempting to use the container in this paused state would fail. It’s present, but nothing is running. We can now resume the container using the docker unpause command.

$ docker unpause 6b785f78b75e
6b785f78b75e
$ docker ps
CONTAINER ID  IMAGE      ...  STATUS       ...
6b785f78b75e  redis:2.8  ...  Up 1 second  ...

It’s back to running, and docker ps correctly reflects the new state. Note that it shows “Up 1 second” now, which is when we unpaused it, not when it was last run.

Cleaning Up Containers and Images

After running all these commands to build images, create containers, and run them, we have accumulated a lot of image layers and container folders on our system.

We can list all the containers on our system using the docker ps -a command and then delete any of the containers in the list, as follows:

$ docker ps -a
CONTAINER ID  IMAGE                   ...
92b797f12af1  progrium/stress:latest  ...
...
$ docker rm 92b797f12af1

We can then list all the images on our system using:

$ docker images
REPOSITORY       TAG     IMAGE ID      CREATED       VIRTUAL SIZE
ubuntu           latest  5ba9dab47459  3 weeks ago   188.3 MB
redis            2.8     868be653dea3  3 weeks ago   110.7 MB
progrium/stress  latest  873c28292d23  7 months ago  281.8 MB

We can then delete an image and all associated filesystem layers by running:

$ docker rmi 873c28292d23
Warning

If you try to delete an image that is in use by a container, you will get a Conflict, cannot delete error. You should stop and delete the container(s) first.

There are times, especially during development cycles, when it makes sense to completely clean off all the images or containers from your system. There is no built-in command for doing this, but with a little creativity it can be accomplished reasonably easily.

To delete all of the containers on your Docker hosts, you can use the following command:

$ docker rm $(docker ps -a -q)

And to delete all the images on your Docker host, this command will get the job done:

$ docker rmi $(docker images -q)

Newer versions of the docker ps and docker images commands both support a filter argument that can make it easy to fine-tune your delete commands for certain circumstances.

To remove all containers that exited with a given nonzero code, you can filter on that exit code (the filter matches exact values, so you target one exit code at a time; here, 1):

$ docker rm $(docker ps -a -q --filter 'exited=1')

And to remove all untagged images, you can type:

$ docker rmi $(docker images -q -f "dangling=true")
Note

You can read the official Docker documentation to explore the filtering options. At the moment there are very few filters to choose from, but more will likely be added over time. And if you are really interested, Docker is an open source project, so they are always open to public code contributions.

It is also possible to make your own very creative filters by stringing together commands using pipes (|) and other similar techniques.
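As one hypothetical sketch, the following pipeline would remove every image whose repository name contains the string progrium (the pattern is purely illustrative):

$ docker rmi $(docker images | grep progrium | awk '{print $3}')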

Next Steps

In the next chapter, we’ll do more exploration of what Docker brings to the table. For now it’s probably worth doing a little experimentation on your own. We suggest exercising some of the container control commands we covered here so that you’re familiar with the command-line options and the overall syntax. Try interacting with stopped or paused containers to see what you can see. Then, when you’re feeling confident, head on into Chapter 6!

1 SELinux is one current implementation.

2 Typically under /var/lib/docker/containers.
