An introduction to Docker

Docker is an application execution environment that simplifies the packaging and deployment of applications. It started as a simpler way to use Linux containers. Today it is a huge ecosystem with many tools and a repository of publicly available application images.

Before looking at a quick overview of how to use it, it is important to understand what Docker is, and how it compares to other, more or less similar, solutions. First of all, why is something like Docker really useful? There are many solutions for packaging applications, and every operating system provides its own packaging system (sometimes even several). Docker does not aim to be a replacement for the Windows application installer, macOS packages, or Debian packages. Docker aims to simplify the setup and isolation of an application (or, most of the time, a service). It is first and foremost an isolation tool for Linux systems.

Historically, there have been three main categories of solutions for isolating the execution of software:

  • Emulators
  • Virtual machines
  • Containers

Emulators are the oldest isolation solutions available. They are also probably the solution that offers the strongest guarantee of isolation from the host system. An emulator is software that implements the instruction set of a CPU, as well as the interfaces of some other hardware devices (such as serial ports and disk interfaces). The instruction set that is implemented is not necessarily that of the host system, which allows us to execute ARM code on an x86 system, for example. QEMU is a famous open source emulator that supports many hardware architectures, such as ARM and MIPS. Nowadays, emulators are mainly used in two cases: testing software for specialized hardware more easily, and retrogaming. Many people use an emulator without knowing it when they play old Atari or Amiga games on a PC or a Raspberry Pi! So, emulators are a great solution for running software built for a hardware architecture other than the host's. However, they have a big drawback: they are very slow. Since each instruction of the target architecture has to be executed in software, the performance hit is huge.
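To make that last point concrete, here is a toy fetch-decode-execute loop in Python. It is nothing like a real emulator such as QEMU; the three-instruction "ISA" is invented purely for this sketch, and it only illustrates that every guest instruction costs a dispatch plus an interpretation step on the host.

# A toy illustration of why emulation is slow: every guest instruction
# becomes a dispatch plus host-level work. The "ISA" here is invented.
def run(program, regs):
    pc = 0
    while pc < len(program):
        op, a, b = program[pc]
        if op == "LOAD":            # regs[a] = immediate b
            regs[a] = b
        elif op == "ADD":           # regs[a] += regs[b]
            regs[a] += regs[b]
        elif op == "JNZ":           # jump to index b if regs[a] != 0
            if regs[a] != 0:
                pc = b
                continue
        pc += 1
    return regs

# Count down from 3: one host interpretation step per guest instruction.
print(run([("LOAD", 0, 3), ("LOAD", 1, -1),
           ("ADD", 0, 1), ("JNZ", 0, 2)], [0, 0]))

Real emulators reduce this cost with techniques such as dynamic binary translation (as QEMU does), but the overhead of executing foreign instructions in software never disappears entirely.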

Emulators are great for testing software for other hardware architectures. However, in many cases the need is simply to execute software for the same hardware, but on another operating system. This is what virtual machines allow us to do. A typical example is somebody who uses Windows on a PC, but also wants to execute some Linux applications. By executing a Linux distribution inside a virtual machine, it is possible to have Linux and Windows running at the same time. Virtual machines are much more efficient than emulators because they can use the CPU directly and execute instructions natively. However, this comes at the cost of being less isolated: it is easier for software running inside a virtual machine to escape from it and execute code on the host operating system. At some point, CPUs gained dedicated instructions (hardware virtualization extensions such as Intel VT-x and AMD-V) to improve both the efficiency and security of virtual machines. Virtual machines are now heavily used in cloud environments.

Still, virtual machines have a significant overhead: each virtual machine runs a full operating system, including the kernel. Sometimes the requirement is even simpler: only some applications need to be isolated from the host, while using the same operating system on the same hardware architecture. This is where containers come into play. A container is not a feature in itself, but a combination of many different isolation mechanisms provided by an operating system. All recent operating systems provide some form of isolation, but it so happens that one of them is heavily used in cloud environments, running on one CPU architecture: Linux on an x86-64 CPU. As cloud technologies grew, the need to isolate services on big servers, while having a minimal impact on performance, became more and more important. In the early days of cloud computing, virtual machines were used to isolate services. But it so happens that very good isolation can be achieved on Linux with far less overhead. When features such as capabilities, cgroups, namespaces, and iptables are combined, it is possible to execute other Linux distributions that share the same kernel as the host. This is the basis of a Linux container.
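As a minimal illustration of one of these building blocks, the following Python sketch creates a new UTS namespace and changes the hostname inside it, without affecting the rest of the system. This is not how Docker is implemented; it is only a demonstration of the namespace idea, and it assumes Linux, glibc, and root privileges (or the CAP_SYS_ADMIN capability).

# Sketch: isolate the hostname with a UTS namespace, one container building block.
import ctypes
import socket

CLONE_NEWUTS = 0x04000000  # flag value from <linux/sched.h>

libc = ctypes.CDLL("libc.so.6", use_errno=True)
if libc.unshare(CLONE_NEWUTS) != 0:
    raise OSError(ctypes.get_errno(), "unshare failed (are you root?)")

# From now on, this process has its own hostname, invisible to the host.
socket.sethostname("inside-container")
print("hostname in the new namespace:", socket.gethostname())

A real container runtime combines this with PID, mount, network, and user namespaces, plus cgroups to limit resources and capabilities to restrict privileges.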

The following figure shows the principles used in each of these isolation technologies:

Figure 7.2: Emulator, virtual machine, and container isolation architectures

Emulators provide the best isolation, but at the highest performance cost, because each guest instruction has to be executed in software as one or more instructions of another instruction set. Virtual machines are much more efficient, at the cost of weaker isolation than an emulator. Finally, containers provide even weaker isolation guarantees, but are the most efficient solution. The fact that containers are less isolated by design does not mean that they are not secure: they are deployed in many services where isolation between applications is required.

There are many container management solutions for Linux, and Docker is one of them. One promise of Docker is that you package your application once and then execute it anywhere. But, as usual with such promises, there is a catch: a Docker image is universal only if you consider that running Linux on an x86-64 processor is universal! The fact is that running Docker on another operating system (namely Windows or macOS) relies on a virtual machine running a Linux distribution that hosts the Docker containers. So, in the end, Docker is just a way to execute an isolated Linux system hosted by another Linux system on an x86-64 processor. This solution is very efficient because many resources are shared with the Linux host system, and the container is usually a very specialized system that contains only what is needed to execute the service.

The original Docker solution quickly evolved to offer many features on top of container management:

  • Docker itself allows us to create and manage containers (a short example follows this list).
  • Docker Compose allows us to create systems that are based on several containers. It also simplifies the definition of the resources accessible to each container.
  • Swarm allows us to create clusters of physical machines so that deploying many container instances is easier.
  • Docker Stack leverages Swarm to scale Docker instances easily.
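As a small taste of the first point, here is a hedged sketch that creates and runs a container using the Docker SDK for Python (the third-party docker package, installed with pip install docker); it assumes a local Docker daemon is running and that the alpine image can be pulled.

# Run a throwaway container and capture its output via the Docker daemon.
import docker

client = docker.from_env()  # connect to the local Docker daemon
output = client.containers.run("alpine", "echo hello from a container",
                                remove=True)  # auto-remove the container afterwards
print(output.decode())

The command-line equivalent is docker run --rm alpine echo "hello from a container"; Docker Compose, Swarm, and Stack all build on these same container primitives.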

For several of these features, other companies propose alternative solutions, such as Kubernetes, originally from Google, which can be used for orchestration instead of Swarm.
