Containers

Transitioning from development to production has always been a painful process. It involves a lot of documentation, hand-offs, installation, and configuration. Since every programming language produces software that behaves slightly differently, deploying heterogeneous applications is always difficult.

Some of these problems have been mitigated by containers. With containers, installation and configuration are mostly standardized. There are several ways to handle distribution, but standards exist for that area as well. This makes containers a great choice for organizations that want to increase cooperation between development and operations.

The following topics will be covered in this chapter:

  • Building containers
  • Testing and integrating containers
  • Understanding container orchestration

Technical requirements

The examples listed in this chapter require the following:

The code present in the chapter has been placed on GitHub at https://github.com/PacktPublishing/Software-Architecture-with-Cpp/tree/master/Chapter14.

Reintroducing containers

Containers have been generating a lot of buzz recently. One might think they are a brand-new technology that was not available before. However, that is not the case. Before the rise of Docker and Kubernetes, currently the dominant players in the industry, there were already solutions such as LXC that offered many similar features.

We can trace the origins of separating one execution environment from another to the chroot mechanism, available in UNIX systems since 1979. Similar concepts were also used in FreeBSD jails and Solaris Zones.

The main task of the container is to isolate one execution environment from another. This isolated environment can have its own configuration, different applications, and even different user accounts than the host environment.

Even though the containers are isolated from the host, they usually share the same operating system kernel. This is the main differentiator from virtualized environments. Virtual machines have dedicated virtual resources, which means they are separated at the hardware level. Containers are separated at the process level, which means there is less overhead to run them.

The ability to package and run another operating system that is already optimized and configured for running your application is a strong advantage of containers. Without containers, the build and deploy process usually consists of several steps:

  1. The application is built.
  2. The example configuration files are provided.
  3. Installation scripts and associated documentation are prepared.
  4. The application is packaged for a target operating system (such as Debian or Red Hat).
  5. The packages are deployed to the target platform.
  6. Installation scripts prepare the basis for the application to run.
  7. The configuration has to be tweaked to fit the existing system.

When you switch to containers, there is less of a need for a robust installation script. The application will only target a single well-known operating system – the one present in the container. The same goes for configuration: instead of preparing many configurable options, the application is pre-configured for the target operating system and distributed alongside it. The deployment process consists only of unpacking the container image and running the application process inside it.
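For instance, with Docker, the whole deployment can boil down to something as simple as the following sketch, where the image name is just an illustration:

docker pull dominicanfair/merchant:v2.0.3
docker run -d dominicanfair/merchant:v2.0.3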

While containers and microservices are often thought to be the same thing, they are not. Moreover, containers may mean application containers or operating system containers, and only application containers fit well with microservices. The following sections will tell you why. We'll describe the different container types that you can encounter, show you how they relate to microservices, and explain when it's best to use them (and when to avoid them).

Exploring the container types

The containers described so far are mostly operating system containers, which are fundamentally different from those driving the current container trend led by Docker, Kubernetes, and LXD. Instead of recreating an entire operating system with services such as syslog and cron, application containers focus on running a single process within a container – just the application.

These OS-level services are replaced by the container runtime, which provides a unified way to manage the applications within a container. For example, instead of using syslog to handle logs, the standard output of the process with PID 1 is treated as the application log. Instead of a mechanism such as init.d or systemd, the application container's lifecycle is handled by the runtime itself.
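To give a rough idea, this is how such interactions might look with Docker; the container name my-merchant is a hypothetical example:

docker logs my-merchant     # whatever PID 1 writes to stdout/stderr
docker stop my-merchant     # lifecycle managed by the runtime, not by init.d or systemd
docker start my-merchant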

Since Docker is at the moment the dominant solution for application containers, we will mostly use it as an example throughout this book. To make the picture complete, we will present viable alternatives, as they may be better suited to your needs. Since the project and specification are open source, these alternatives are compatible with Docker and can be used as replacements.

Later in this chapter, we will explain how to use Docker to build, deploy, run, and manage application containers.

The rise of microservices

The success of Docker coincided with the rise of the adoption of microservices. It is no surprise since microservices and application containers fit together naturally.

Without application containers, there was no easy and unified way to package, deploy, and maintain microservices. Even though individual companies developed some solutions to fix these problems, none was popular enough to approach being an industry standard.

Without microservices, the application containers were pretty limited. The software architecture focused on building entire systems explicitly configured for the given set of services running there. Replacing one service with another required a change of the architecture.

When brought together, application containers provide a standard way to distribute microservices. Each microservice comes with its own configuration embedded, so operations such as autoscaling or self-healing no longer require knowledge of the underlying application.

You can still use microservices without application containers, and you can use application containers without hosting microservices in them. For instance, even though neither PostgreSQL databases nor Nginx web servers were designed as microservices, they are commonly run in application containers.

Choosing when to use containers

There are several benefits to the container approach, and OS containers and application containers each have different use cases in which their strengths lie.

The benefits of containers

When compared to virtual machines, the other popular way of isolating environments, containers require less runtime overhead. Unlike virtual machines, there is no need to run a separate copy of the operating system kernel or to use hardware or software virtualization techniques. Application containers also do not run the other operating system services typically found in virtual machines, such as syslog, cron, or init. Additionally, application containers offer smaller images, as they do not usually have to carry an entire copy of the operating system. In extreme cases, an application container can consist of a single statically linked binary.

At this point, you may wonder why to bother with containers at all if all they hold is a single binary. The answer is the benefit of having a unified and standardized way to build and run them. As containers have to follow specific conventions, it is easier to orchestrate them than regular binaries, which can have different expectations regarding logging, configuration, opening ports, and so on.

Another thing is that containers provide a built-in means of isolation. Each container has its own namespace for processes and a namespace for user accounts, among others. This means that the process (or processes) from one container has no notion of the processes on the host or in other containers. The sandboxing can go even further: you can assign memory and CPU quotas to your containers with the same standard user interface (whether it is Docker, Kubernetes, or something else).
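As a quick sketch, this is how such quotas can be requested from the Docker CLI; the limits chosen here are arbitrary:

# run an Nginx container limited to half a CPU core and 256 MB of RAM
docker run -d --cpus 0.5 --memory 256m nginx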

The standardized runtime also means higher portability. Once a container is built, you can typically run it on different operating systems without modifications. This also means that what runs in operations is very close or identical to what runs in development. Reproducing issues is easier, and so is debugging.

The disadvantages of containers

Since there is a lot of pressure nowadays to move workloads to containers, as an architect you want to understand all the risks associated with such a migration. The benefits are touted everywhere and you probably already understand them.

The main obstacle to container adoption is that not all applications can be easily migrated to containers. This is especially true of application containers that are designed with microservices in mind. If your application is not based on microservices architecture, putting it into containers may introduce more problems than it will solve.

If your application already scales well, uses TCP/IP-based IPC, and is mostly stateless, the move to containers should not be challenging. Otherwise, each of these aspects would pose a challenge and prompt a rethink of the existing design.

Another problem associated with containers is persistent storage. Ideally, containers should have no persistent storage of their own. This makes it possible to take advantage of fast startups, easy scaling, and flexible scheduling. The problem is that applications providing business value cannot exist without persistent storage.

This drawback is usually mitigated by making most containers stateless and relying on an external non-containerized component to store the data and the state. Such an external component can be either a traditional self-hosted database or a managed database from a cloud provider. Going in either direction requires you to reconsider the architecture and modify it accordingly.

Since application containers follow specific conventions, the application has to be modified to follow these conventions. For some applications, it will be a low-effort task. For others, such as multiprocess components using in-memory Inter-Process Communication (IPC), it will be complicated.

One point often omitted is that application containers work great as long as the applications inside them are native Linux applications. While Windows containers are supported, they are neither as convenient nor as well supported as their Linux counterparts. They also require licensed Windows machines running as hosts.

It is easier to enjoy the application containers' benefits if you are building a new application from scratch and can base your design on this technology. Moving an existing application to application containers, especially if it is complicated, will require a lot more work and possibly also a revamp of the entire architecture. In such a case, we advise you to consider all the benefits and disadvantages extra carefully. Making a wrong decision may harm your product's lead time, availability, and budget.

Building containers

Application containers are the focus of this section. While OS containers mostly follow system programming principles, application containers bring new challenges and patterns. Also, they provide specialized build tools to deal with those challenges. The primary tool we will consider is Docker, as it's the current de facto standard for building and running application containers. We will also present some alternative approaches to building application containers.

Unless otherwise noted, whenever we use the word "containers" from now on, it relates to "application containers."

In this section, we will focus on different approaches to using Docker for building and deploying containers.

Container images explained

Before we describe container images and how to build them, it is vital to understand the distinction between containers and container images. There is often confusion between the terms, especially during informal conversations.

The difference between a container and a container image is the same as between a running process and an executable file.

Container images are static: They're snapshots of a particular filesystem and associated metadata. The metadata describes, among other things, what environment variables are set at runtime or which program to run when a container is created from the image.

Containers are dynamic: They run a process contained within the container image. We can create containers from container images, and we can also create container images by snapshotting a running container. The container image build process consists, in fact, of creating several containers, executing commands inside them, and snapshotting them after each command finishes.

To distinguish between the data introduced by the container image and the data generated during runtime, Docker uses union mount filesystems to create different filesystem layers. These layers are also present in the container images. Typically, each build step of the container image corresponds to a new layer in the resulting container image.
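If you want to inspect those layers yourself, Docker can list them for any image, together with the commands that created them:

docker image history ubuntu:bionic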

Using Dockerfiles to build an application

The most common way to build an application container image with Docker is to use a Dockerfile. A Dockerfile is an imperative description of the operations required to produce the resulting image. Some of the operations create new filesystem layers; others operate on metadata.

We will not go into details and specifics related to Dockerfiles. Instead, we will show different approaches to containerizing a C++ application. For this, we need to introduce some syntax and concepts related to Dockerfiles.

Here is an example of a very simple Dockerfile:

FROM ubuntu:bionic

RUN apt-get update && apt-get -y install build-essential gcc

CMD /usr/bin/gcc

Typically, we can divide a Dockerfile into three parts:

  • Importing the base image (the FROM instruction)
  • Performing operations within the container that will result in a container image (the RUN instruction)
  • Metadata used during runtime (the CMD instruction)

The latter two parts may well be interleaved, and each of them may comprise one or more instructions. It is also possible to omit either of the latter parts, as only the base image is mandatory. This does not mean you cannot start with an empty filesystem: there is a special base image named scratch exactly for that purpose. Adding a single statically linked binary to an otherwise empty filesystem could look like the following:

FROM scratch

COPY customer /bin/customer

CMD /bin/customer

In the first Dockerfile, the steps we take are the following:

  1. Import the base Ubuntu Bionic image.
  2. Run a command inside the container. The results of the command will create a new filesystem layer inside the target image. This means the packages installed with apt-get will be available in all the containers based on this image.
  3. Set the runtime metadata. When creating a container based on this image, we want to run GCC as the default process.

To build an image from a Dockerfile, you use the docker build command. It takes one required argument: the directory containing the build context, which means the Dockerfile itself and the other files you want to copy inside the container. To build from the current directory, use docker build . (note the trailing dot, which denotes the build context).

This will build an anonymous image, which is not very useful. Most of the time, you want to use named images. There is a convention to follow when naming container images and that's what we'll cover in the next section.

Naming and distributing images

Each container image in Docker has a distinctive name consisting of three elements: the name of the registry, the name of the image, and a tag. Container registries are object repositories holding container images. The default container registry for Docker is docker.io. When pulling an image from this registry, we may omit the registry name.

Our previous example with ubuntu:bionic has the full name of docker.io/ubuntu:bionic. In this example, ubuntu is the name of the image, while bionic is a tag that represents a particular version of an image.

When building an application based on containers, you will need a place to store all your images. You can host your own private registry and keep your images there, or use a managed solution. Popular managed solutions include the following:

  • Docker Hub
  • quay.io
  • GitHub
  • Cloud providers (such as AWS, GCP, or Azure)

Docker Hub is still the most popular one, though some public images are migrating to quay.io. Both are general-purpose and allow the storage of public and private images. GitHub or cloud providers will be mainly attractive to you if you are already using a particular platform and want to keep your images close to the CI pipeline or the deployment targets. It is also helpful if you want to reduce the number of individual services you use.

If none of the solutions appeal to you, hosting your own local registry is also very easy and requires you to run a single container.

To build a named image, you pass the -t argument to the docker build command. For example, to build an image named dominicanfair/merchant:v2.0.3, you would use docker build -t dominicanfair/merchant:v2.0.3 . (again with the trailing dot pointing at the build context).
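Pushing the image afterward takes just a few more commands. The following is only a sketch; the local registry container and the tags are examples:

# push to the registry implied by the image name (docker.io in this case)
docker push dominicanfair/merchant:v2.0.3

# or run a throwaway local registry and push there instead
docker run -d -p 5000:5000 --name registry registry:2
docker tag dominicanfair/merchant:v2.0.3 localhost:5000/dominicanfair/merchant:v2.0.3
docker push localhost:5000/dominicanfair/merchant:v2.0.3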

Compiled applications and containers

When building container images for applications in interpreted languages (such as Python or JavaScript), the approach is mostly the same:

  1. Install dependencies.
  2. Copy source files to the container image.
  3. Copy the necessary configuration.
  4. Set the runtime command.

For compiled applications, however, there's an additional step of compiling the application first. There are several possible ways to implement this step, each of them with their pros and cons.

The most obvious approach is to install all the dependencies first, copy the source files, and then compile the application as one of the container build steps. The major benefit is that we can accurately control the toolchain's contents and configuration and therefore have a portable way to build the application. However, the downside is too big to ignore: the resulting container image contains a lot of unnecessary files. After all, we will need neither the source code nor the toolchain at runtime. Due to the way overlay filesystems work, it is impossible to remove files after they have been introduced in a previous layer. What is more, the source code in the container may prove to be a security risk if an attacker manages to break into the container.

Here's how it can look:

FROM ubuntu:bionic

RUN apt-get update && apt-get -y install build-essential gcc cmake

ADD . /usr/src

WORKDIR /usr/src

RUN mkdir build && \
    cd build && \
    cmake .. -DCMAKE_BUILD_TYPE=Release && \
    cmake --build . && \
    cmake --install .

CMD /usr/local/bin/customer

Another obvious approach, and the one we discussed earlier, is building the application on the host machine and only copying the resulting binaries inside the container image. This requires fewer changes to the current build process when one is already established. The main drawback is that you have to match the same set of libraries on your build machines as you do in your containers. If you're running, for example, Ubuntu 20.04 as your host operating system, your containers will have to be based on Ubuntu 20.04 as well. Otherwise, you risk incompatibilities. With this approach, it is also necessary to configure the toolchain independently of the container.

Just like this:

FROM scratch

COPY customer /bin/customer

CMD /bin/customer
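The host-side part of this workflow could look roughly like the following sketch; the target name, the static-linking flag, and the binary path are assumptions made for illustration:

# build a statically linked binary on the host...
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS="-static"
cmake --build build --target customer
# ...then copy it next to the Dockerfile and build the image
cp build/bin/customer .
docker build -t dominicanfair/customer:latest .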

A slightly more complicated approach is to have a multi-stage build. With multi-stage builds, one stage may be dedicated to setting up the toolchain and compiling the project, while another stage copies the resulting binaries to their target container image. This has several benefits over the previous solutions. First of all, the Dockerfiles now control both the toolchain and the runtime environment, so every step of the build is thoroughly documented. Second of all, it is possible to use the image with the toolchain to ensure compatibility between development and the Continuous Integration/Continuous Deployment (CI/CD) pipeline. This way also makes it easier to distribute upgrades and fixes to the toolchain itself. The major downside is that the containerized toolchain may not be as comfortable to use as a native one. Also, build tools are not particularly well-suited to application containers, which require that there's one process running per container. This may lead to unexpected behavior whenever some of the processes crash or are forcefully stopped.

A multi-stage version of the preceding example would look like this:

FROM ubuntu:bionic AS builder

RUN apt-get update && apt-get -y install build-essential gcc cmake

ADD . /usr/src

WORKDIR /usr/src

RUN mkdir build && \
    cd build && \
    cmake .. -DCMAKE_BUILD_TYPE=Release && \
    cmake --build .

FROM ubuntu:bionic

COPY --from=builder /usr/src/build/bin/customer /bin/customer

CMD /bin/customer

The first stage, starting at the first FROM instruction, sets up the builder, adds the sources, and builds the binaries. Then, the second stage, starting at the second FROM instruction, copies the resulting binary from the previous stage without copying the toolchain or the sources.
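A useful side effect is that the builder stage can also be built and tagged on its own using the --target flag, for example to cache or reuse the toolchain image in CI; the tags below are just examples:

# build and tag only the first (builder) stage
docker build --target builder -t hosacpp/customer-builder:latest .
# build the final, slim image
docker build -t hosacpp/customer:latest .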

Targeting multiple architectures with manifests

Application containers with Docker are typically used on x86_64 (also known as AMD64) machines. If you are only targeting this platform, you have nothing to worry about. However, if you are developing IoT, embedded, or edge applications, you may be interested in multi-architecture images.

Since Docker is available on many different CPU architectures, there are several ways to approach image management on multiple platforms.

One way to handle images built for different targets is by using the image tags to describe a particular platform. Instead of merchant:v2.0.3, we could have merchant:v2.0.3-aarch64. Although this approach may seem to be the easiest to implement, it is, in fact, a bit problematic.

Not only do you have to change the build process to include the architecture in the tag, but when pulling the images to run them, you also have to take care to manually append the expected suffix everywhere. If you are using an orchestrator, you won't be able to share the manifests between the different platforms in a straightforward way, as the tags will be platform-specific.

A better way that doesn't require modifying the deployment step is to use manifest-tool (https://github.com/estesp/manifest-tool). The build process at first looks similar to the one suggested previously. Images are built separately on all the supported architectures and pushed to the registry with a platform suffix in their tags. After all the images are pushed, manifest-tool merges the images to provide a single multi-architecture one. This way, each supported platform is able to use the exact same tag.

An example configuration for manifest-tool is provided here:

image: hosacpp/merchant:v2.0.3
manifests:
  - image: hosacpp/merchant:v2.0.3-amd64
    platform:
      architecture: amd64
      os: linux
  - image: hosacpp/merchant:v2.0.3-arm32
    platform:
      architecture: arm
      os: linux
  - image: hosacpp/merchant:v2.0.3-arm64
    platform:
      architecture: arm64
      os: linux

Here, we have three supported platforms, each with their respective suffix (hosacpp/merchant:v2.0.3-amd64, hosacpp/merchant:v2.0.3-arm32, and hosacpp/merchant:v2.0.3-arm64). Manifest-tool combines the images built for each platform and produces a hosacpp/merchant:v2.0.3 image that we can use everywhere.
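Assuming the preceding specification is saved as merchant-manifest.yaml (the filename is arbitrary), pushing the combined manifest is a single command:

manifest-tool push from-spec merchant-manifest.yaml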

Another possibility is to use Docker's built-in feature called Buildx. With Buildx, you can attach several builder instances, each of which targets a required architecture. What's interesting is that you don't need to have native machines to run the builds; you can also use the QEMU emulation or cross-compilation in a multi-stage build. Although it is much more powerful than the previous approach, Buildx is also quite complicated. At the time of writing, it requires Docker experimental mode and Linux kernel 4.8 or later. It requires you to set up and manage builders and not everything behaves in an intuitive way. It's possible it will improve and become more stable in the near future.

An example code to prepare the build environment and build a multi-platform image may look like the following:

# create two build contexts running on different machines
docker context create \
  --docker host=ssh://[email protected] \
  --description="Remote engine amd64" \
  node-amd64
docker context create \
  --docker host=ssh://[email protected] \
  --description="Remote engine arm64" \
  node-arm64

# use the contexts
docker buildx create --use --name mybuild node-amd64
docker buildx create --append --name mybuild node-arm64

# build an image
docker buildx build --platform linux/amd64,linux/arm64 .

As you can see, this may be a little confusing if you're used to the regular docker build command.

Alternative ways to build application containers

Building container images with Docker requires the Docker daemon to be running. The Docker daemon requires root privileges, which may pose security problems in some setups. Even though the Docker client that does the building may be run by an unprivileged user, it is not always feasible to install the Docker daemon in the build environment.

Buildah

Buildah is an alternative tool to build container images that can be configured to run without root access. Buildah can work with regular Dockerfiles, which we discussed earlier. It also presents its own command-line interface that you can use in shell scripts or other automation you find more intuitive. One of the previous Dockerfiles rewritten as a shell script using the buildah interface will look like this:

#!/bin/sh

ctr=$(buildah from ubuntu:bionic)

buildah run $ctr -- /bin/sh -c 'apt-get update && apt-get install -y build-essential gcc'

buildah config --cmd '/usr/bin/gcc' "$ctr"

buildah commit "$ctr" hosacpp-gcc

buildah rm "$ctr"

One interesting feature of Buildah is that it allows you to mount the container image filesystem into your host filesystem. This way, you can use your host's commands to interact with the contents of the image. If you have software you don't want (or can't due to licensing restrictions) put within the container, it's still possible to invoke it outside of the container when using Buildah.
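A minimal sketch of this feature is shown below; under rootless operation you may need to wrap the commands in buildah unshare, and the copied file is purely hypothetical:

ctr=$(buildah from ubuntu:bionic)
mnt=$(buildah mount "$ctr")
# the image's filesystem is now visible to ordinary host tools
ls "$mnt/etc"
cp ./generated-artifact.bin "$mnt/opt/"
buildah umount "$ctr"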

Ansible-bender

Ansible-bender uses Ansible playbooks and Buildah to build container images. All of the configuration, including base images and metadata, is passed as a variable within the playbook. Here is our previous example converted to Ansible syntax:

---
- name: Container image with ansible-bender
  hosts: all
  vars:
    ansible_bender:
      base_image: python:3-buster

      target_image:
        name: hosacpp-gcc
        cmd: /usr/bin/gcc
  tasks:
    - name: Install Apt packages
      apt:
        pkg:
          - build-essential
          - gcc

As you can see, the ansible_bender variable is responsible for all the configuration specific to containers. The tasks listed in the playbook are executed inside a container based on base_image.

One thing to note is that Ansible requires a Python interpreter present in the base image. This is why we had to change ubuntu:bionic used in previous examples to python:3-buster. ubuntu:bionic is an Ubuntu image without a Python interpreter preinstalled.
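Assuming the playbook above is saved as container.yaml (any filename works), building the image comes down to one command:

ansible-bender build ./container.yaml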

Others

There are also other ways to build container images. You can use Nix to create a filesystem image and then put it inside the image using Dockerfile's COPY instruction, for example. Going further, you can prepare a filesystem image by any other means and then import it as a base container image using docker import.
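For instance, a root filesystem prepared in a local rootfs directory (however you created it) can be turned into a base image like this; the directory and image names are examples:

tar -C rootfs -c . | docker import - hosacpp/base:latest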

Choose whichever solution fits your particular needs. Keep in mind that building with a Dockerfile using docker build is the most popular approach and hence it is the best-documented one and the best supported. Going with Buildah is more flexible and allows you to better fit creating container images into your build process. Finally, ansible-bender may be a good solution if you're already heavily invested in Ansible and you want to reuse already available modules.

Integrating containers with CMake

In this section, we'll demonstrate how to create a Docker image by working with CMake.

Configuring the Dockerfile with CMake

First and foremost, we'll need a Dockerfile. Let's use yet another CMake input file for this:

configure_file(${CMAKE_CURRENT_SOURCE_DIR}/Dockerfile.in
               ${PROJECT_BINARY_DIR}/Dockerfile @ONLY)

Note that we're using PROJECT_BINARY_DIR to not overwrite any Dockerfiles created by other projects in the source tree if our project is part of a bigger one.

Our Dockerfile.in file will look as follows:

FROM ubuntu:latest
ADD Customer-@PROJECT_VERSION@-Linux.deb .
RUN apt-get update && \
    apt-get -y --no-install-recommends install ./Customer-@PROJECT_VERSION@-Linux.deb && \
    apt-get autoremove -y && \
    apt-get clean && \
    rm -r /var/lib/apt/lists/* Customer-@PROJECT_VERSION@-Linux.deb
ENTRYPOINT ["/usr/bin/customer"]
EXPOSE 8080

First, we specify that we'll take the latest Ubuntu image, install our DEB package on it along with its dependencies, and then tidy up. It's important to update the package manager cache in the same step as installing the package to avoid issues with stale caches due to how layers in Docker work. Cleanup is also performed as part of the same RUN command (in the same layer) so that the layer size is smaller. After installing the package, we make our image run the customer microservice when it is started. Finally, we tell Docker to expose the port that it will be listening on.

Now, back to our CMakeLists.txt file.

Integrating containers with CMake

For CMake-based projects, it is possible to include a build step responsible for building the containers. For that, we need to tell CMake to find the Docker executable and to bail out if it cannot be found. We can do this using the following:

find_program(Docker_EXECUTABLE docker)
if(NOT Docker_EXECUTABLE)
  message(FATAL_ERROR "Docker not found")
endif()

Let's revisit the example from Chapter 7, Building and Packaging. There, we built a binary and a Conan package for the customer application. Now, we want to package this application as a Debian archive and build a Debian-based container image with the customer application pre-installed.

To create our DEB package, we need a helper target. Let's use CMake's add_custom_target functionality for this:

add_custom_target(
  customer-deb
  COMMENT "Creating Customer DEB package"
  COMMAND ${CMAKE_CPACK_COMMAND} -G DEB
  WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
  VERBATIM)
add_dependencies(customer-deb libcustomer)

Our target invokes CPack to create just the one package we are interested in, omitting the rest. We want the package to be created in the same directory as the Dockerfile for convenience. The VERBATIM keyword is recommended because, with it, CMake escapes problematic characters. If it is not specified, the behavior of your scripts may vary across different platforms.

The add_dependencies call will make sure that before CMake builds the customer-deb target, libcustomer is already built. As we now have our helper target, let's use it when creating the container image:

add_custom_target(
  docker
  COMMENT "Preparing Docker image"
  COMMAND ${Docker_EXECUTABLE} build ${PROJECT_BINARY_DIR}
          -t dominicanfair/customer:${PROJECT_VERSION}
          -t dominicanfair/customer:latest
  VERBATIM)
add_dependencies(docker customer-deb)

As you can see, we invoke the Docker executable that we found earlier, in the directory containing our Dockerfile and DEB package, to create an image. We also tell Docker to tag the image both as latest and with our project's version. Finally, we ensure that the DEB package is built before we invoke our Docker target.

Building the image is as simple as make docker if make is the generator you chose. If you prefer the full CMake command (for example, to create generator-agnostic scripts), the invocation is cmake --build . --target docker.

Testing and integrating containers

Containers fit very well with CI/CD pipelines. Since they mostly require no further dependencies other than the container runtime itself, they can be easily tested. Worker machines don't have to be specially provisioned to fulfill the testing needs, so adding more nodes is much easier. What is more, all of the nodes are general-purpose, so they may act as builders, test runners, and even deployment executors without any prior configuration.

Another great benefit of using containers in CI/CD is the fact that they are isolated from one another. This means multiple copies running on the same machine should not interfere. That is true unless the tests require some resources from the host operating system, such as port forwarding or volume mounting. Therefore it's best to design tests so that such resources are not necessary (or at least they don't clash). Port randomization is a helpful technique to avoid clashes, for example.
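For example, instead of hard-coding host ports in the test setup, you can let Docker pick free ones; the container name used here is arbitrary:

# publish all exposed container ports on random free host ports
docker run -d -P --name merchant-test hosacpp/merchant:v2.0.3
# check which host port was assigned to the container's port 8000
docker port merchant-test 8000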

Runtime libraries inside containers

The choice of containers may influence the choice of a toolchain and, therefore, C++ language features available to the application. Since containers are typically Linux-based, the system compiler available is usually GNU GCC with glibc as a standard library. However, some Linux distributions popular with containers, such as Alpine Linux, are based on a different standard library, musl.

If you are targeting such a distribution, make sure the code you'll be using, whether developed in-house or coming from third-party providers, is compatible with musl. The main advantage of musl and Alpine Linux is that they result in much smaller container images. For example, a Python image built for Debian Buster is around 330 MB, the slimmed-down Debian version is around 40 MB, while the Alpine version is only around 16 MB. Smaller images mean less wasted bandwidth (for uploads and downloads) and quicker updates.

Alpine may also introduce some unwanted traits, such as longer build times, obscure bugs, or reduced performance. If you want to use it to reduce the size, run proper tests to make sure the application behaves without problems.

To reduce your images' size even more, you may consider ditching the underlying operating system altogether. What we mean by operating system here is all the userland tools ordinarily present in a container, such as a shell, package manager, and shared libraries. After all, if your application is the only thing that's going to be running, everything else is unnecessary.

It is typical for Go or Rust applications to provide a static build that is self-sufficient and can form a container image. While this might not be as straightforward in C++, it is worth considering.

There are a few drawbacks related to decreasing the image size as well. First of all, if you decide to go with Alpine Linux, keep in mind it is not as popular as, say, Ubuntu, Debian, or CentOS. Although it is often a platform of choice for container developers, it's very unusual for any other purpose.

This means that there might be new compatibility problems, mostly stemming from the fact it's not based on the de facto standard glibc implementation. If you rely on third-party components, the provider may not offer support for this platform.

If you decide to go down the single statically linked binary inside the container image route, there are also some challenges to consider. First of all, you are discouraged from statically linking glibc as it makes internal use of dlopen to handle Name Service Switch (NSS) and iconv. If your software relies on DNS resolving or character set conversion, you'll have to provide a copy of glibc and the relevant libraries anyway.

Another point to consider is that shell and package managers are often used for debugging containers that misbehave. When one of your containers is acting strangely, you may start another process inside the container and figure out what is happening inside by using standard UNIX tools such as ps, ls, or cat. To run such an application inside the container, it has to be present in the container image first. Some workarounds allow the operator to inject debugging binaries inside the running container, but none of them are well-supported at the moment.
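The standard way of doing this with Docker is docker exec, which only works if a shell is actually present in the image; the container name is an example:

docker exec -it my-merchant /bin/sh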

Alternative container runtimes

Docker is the most popular way to build and run containers, but since the container standard is open, there are alternative runtimes that you may use as well. The main replacement for Docker that offers a similar user experience is Podman. Together with Buildah, described in the previous section, they are tools aimed at replacing Docker altogether.

The added benefit is that they don't require an additional daemon running on a host machine, as Docker does. Both also have support (although it is not yet mature) for rootless operations, which makes them a better fit for security-critical operations. Podman accepts all the commands you would expect the Docker CLI to take, so you can simply use it as an alias this way.
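In the simplest case, the switch can look like this:

alias docker=podman
docker ps             # actually runs "podman ps"
docker run -d nginx   # actually runs "podman run -d nginx"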

Another approach to containers that aims to provide better security is the Kata Containers initiative. Kata Containers uses lightweight virtual machines to leverage the hardware virtualization required for an additional level of isolation between the containers and the host operating system.

CRI-O and containerd are also popular runtimes used by Kubernetes.

Understanding container orchestration

Some of the containers' benefits only become apparent when you are using a container orchestrator to manage them. An orchestrator keeps track of all the nodes that will be running your workload, and it also monitors the health and status of the containers spread across these nodes.

More advanced features, for example, high availability, require the proper setup of the orchestrator, which typically means dedicating at least three machines for the control plane and another three machines for worker nodes. The autoscaling of nodes, in addition to the autoscaling of containers, also requires the orchestrator to have a driver able to control the underlying infrastructure (for example, by using the cloud provider's API).

Here, we will cover some of the most popular orchestrators that you can choose from to base your system on. You will find more practical information on Kubernetes in the next chapter, Chapter 15, Cloud-Native Design. Here, we give you an overview of the possible choices.

The presented orchestrators operate on similar objects (services, containers, batch jobs) although each may behave differently. The available features and operating principles vary between them. What they have in common is that you typically write a configuration file that declaratively describes the required resources and then you apply this configuration using a dedicated CLI tool. To illustrate the differences between the tools, we provide an example configuration specifying a web application introduced before (the merchant service) and a popular web server, Nginx, to act as a proxy.

Self-hosted solutions

Whether you are running your application on-premises, in a private cloud, or in a public cloud, you may want to have tight control over the orchestrator of your choice. The following is a collection of self-hosted solutions in this space. Keep in mind that most of them are also available as managed services. However, going with self-hosted helps you prevent vendor lock-in, which may be desirable for your organization.

Kubernetes

Kubernetes is probably the best-known orchestrator of all the ones that we mention here. It is prevalent, which means there is a lot of documentation and community support if you decide to implement it.

Even though Kubernetes uses the same application container format as Docker, this is basically where all the similarities end. It is impossible to use standard Docker tools to interact with Kubernetes clusters and resources directly. There is a new set of tools and concepts to learn when using Kubernetes.

Whereas with Docker, the container is the main object you will operate on, with Kubernetes, the smallest piece of the runtime is called a Pod. A Pod may consist of one or more containers that share mount points and networking resources. Pods in themselves are rarely of interest as Kubernetes also has higher-order concepts such as Replication Controllers, Deployment Controllers, or DaemonSets. Their role is to keep track of the pods and ensure the desired number of replicas is running on the nodes.

The networking model in Kubernetes is also very different from Docker. With Docker, you can forward ports from a container to make it accessible from different machines. With Kubernetes, if you want to access a pod, you typically create a Service resource, which may act as a load balancer to handle the traffic to the pods that form the service's backend. Services may be used for pod-to-pod communication, but they may also be exposed to the internet. Internally, Kubernetes resources perform service discovery using DNS names.

Kubernetes is declarative and eventually consistent. This means that instead of directly creating and allocating resources, you only have to provide the description of the desired end state and Kubernetes will do the work required to bring the cluster to the desired state. Resources are often described using YAML.

Since Kubernetes is highly extensible, there are a lot of associated projects developed under the Cloud Native Computing Foundation (CNCF), which turn Kubernetes into a provider-agnostic cloud development platform. We will present Kubernetes in more detail in the next chapter, Chapter 15, Cloud-Native Design.

Here's how the resource definition looks for Kubernetes using YAML (merchant.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dominican-front
  name: dominican-front
spec:
  selector:
    matchLabels:
      app: dominican-front
  template:
    metadata:
      labels:
        app: dominican-front
    spec:
      containers:
        - name: webserver
          imagePullPolicy: Always
          image: nginx
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: dominican-front
  name: dominican-front
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: dominican-front
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dominican-merchant
  name: merchant
spec:
  selector:
    matchLabels:
      app: dominican-merchant
  replicas: 3
  template:
    metadata:
      labels:
        app: dominican-merchant
    spec:
      containers:
        - name: merchant
          imagePullPolicy: Always
          image: hosacpp/merchant:v2.0.3
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: dominican-merchant
  name: merchant
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 8000
  selector:
    app: dominican-merchant
  type: ClusterIP

To apply this configuration and orchestrate the containers, use kubectl apply -f merchant.yaml.
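Afterward, you can check whether the cluster has reached the desired state, for instance with the following:

kubectl get deployments,services
kubectl get pods -l app=dominican-merchant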

Docker Swarm

Docker Engine, which you need anyway to build and run Docker containers, comes with its own orchestrator. This orchestrator is Docker Swarm, and its main feature is high compatibility with existing Docker tools through the Docker API.

Docker Swarm uses the concept of Services to manage health checks and autoscaling. It supports rolling upgrades of the services natively. Services are able to publish their ports, which will then be served by Swarm's load balancer. It supports storing configs as objects for runtime customization and has basic secret management built in.

Docker Swarm is much simpler and less extensible than Kubernetes. This could be an advantage if you do not want to learn about all the details of Kubernetes. However, the main disadvantage is a lack of popularity, which means it is harder to find relevant material about Docker Swarm.

One of the benefits of using Docker Swarm is that you don't have to learn new commands. If you're already used to Docker and Docker Compose, Swarm works with the same resources; the Compose file format simply gains additional options dedicated to handling deployments.

Two services orchestrated with Swarm would look like this (docker-compose.yml):

version: "3.8"
services:
web:
image: nginx
ports:
- "80:80"
depends_on:
- merchant
merchant:
image: hosacpp/merchant:v2.0.3
deploy:
replicas: 3
ports:
- "8000"

To apply the configuration, you run docker stack deploy --compose-file docker-compose.yml dominican.
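You can then inspect the running stack; note that the service name below is derived from the stack and service names, so treat it as an example:

docker stack services dominican
docker service logs dominican_merchant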

Nomad

Nomad is different from the previous two solutions, as it is not focused solely on containers. It is a general-purpose orchestrator with support for Docker, Podman, QEMU virtual machines, isolated fork/exec, and several other task drivers. Nomad is a solution worth learning about if you want to gain some of the advantages of container orchestration without migrating your application to containers.

It is relatively easy to set up and integrates well with other HashiCorp products such as Consul for service discovery and Vault for secret management. Like Docker or Kubernetes, Nomad clients can run locally and connect to the server responsible for managing your cluster.

There are three job types available in Nomad:

  • Service: A long-lived task that should not exit without manual intervention (for example, a web server or a database).
  • Batch: A shorter-lived task that can complete within as little as a few minutes. If the batch job returns an exit code indicating an error, it is either restarted or rescheduled according to configuration.
  • System: A task that needs to run on every node in the cluster (for example, a logging agent).

Compared to other orchestrators, Nomad is relatively easy to install and maintain. It is also extensible when it comes to task drivers or device plugins (used to access dedicated hardware such as GPUs or FPGAs). Compared to Kubernetes, it lacks community support and third-party integrations. On the other hand, Nomad does not require you to redesign your application's architecture to gain the benefits it provides, which is often the case with Kubernetes.

To configure the two services with Nomad, we need two configuration files. The first one is nginx.nomad:

job "web" {
datacenters = ["dc1"]
type = "service"
group "nginx" {
task "nginx" {
driver = "docker"
config {
image = "nginx"
port_map {
http = 80
}
}
resources {
network {
port "http" {
static = 80
}
}
}
service {
name = "nginx"
tags = [ "dominican-front", "web", "nginx" ]
port = "http"
check {
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
}
}
}

The second describes the merchant application, so it's called merchant.nomad:

job "merchant" {
datacenters = ["dc1"]
type = "service"
group "merchant" {
count = 3
task "merchant" {
driver = "docker"
config {
image = "hosacpp/merchant:v2.0.3"
port_map {
http = 8000
}
}
resources {
network {
port "http" {
static = 8000
}
}
}
service {
name = "merchant"
tags = [ "dominican-front", "merchant" ]
port = "http"
check {
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
}
}
}

To apply the configuration, you run nomad job run merchant.nomad && nomad job run nginx.nomad.
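The jobs can then be inspected as follows; the allocation ID placeholder comes from the output of the status command:

nomad job status merchant
nomad alloc logs <allocation-id>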

OpenShift

OpenShift is Red Hat's commercial container platform built on Kubernetes. It includes a lot of additional components that are useful in the everyday operations of Kubernetes clusters. You get a container registry, a build tool similar to Jenkins, Prometheus for monitoring, Istio for service mesh, and Jaeger for tracing. It is not fully compatible with Kubernetes so it shouldn't be thought of as a drop-in replacement.

It is built on top of existing Red Hat technology such as CoreOS and Red Hat Enterprise Linux. You can use it on-premises, within Red Hat Cloud, on one of the supported public cloud providers (including AWS, GCP, IBM, and Microsoft Azure), or as a hybrid cloud.

There is also an open source community-supported project called OKD, which forms the basis of Red Hat's OpenShift. If you do not require commercial support and other benefits of OpenShift, you may still use OKD for your Kubernetes workflow.

Managed services

As previously mentioned, some of the aforementioned orchestrators are also available as managed services. Kubernetes, for instance, is available as a managed solution in multiple public cloud providers. This section will show you some of the different approaches to container orchestration, which are not based on any of the solutions mentioned above.

AWS ECS

Before Kubernetes released its 1.0 version, Amazon Web Services introduced its own container orchestration technology called Elastic Container Service (ECS). ECS provides an orchestrator that monitors, scales, and restarts your services when needed.

To run containers in ECS, you need to provide the EC2 instances on which the workload will run. You are not billed for the orchestrator's use, but you are billed for all the AWS services that you typically use (the underlying EC2 instances, for example, or an RDS database).

One of the significant benefits of ECS is its excellent integration with the rest of the AWS ecosystem. If you are already familiar with AWS services and invested in the platform, you will have less trouble understanding and managing ECS.

If you do not require many of the Kubernetes advanced features and its extensions, ECS may be a better choice as it's more straightforward and more comfortable to learn.

AWS Fargate

Another managed orchestrator offered by AWS is Fargate. Unlike ECS, it does not require you to provision and pay for the underlying EC2 instances. The only components you are focused on are the containers, the network interfaces attached to them, and IAM permissions.

Fargate requires the least amount of maintenance compared to other solutions and is the easiest to learn. Autoscaling and load-balancing are available out of the box thanks to the existing AWS products in this space.

The main downside here is the premium that you pay for hosting your services when compared to ECS. A straight comparison is not possible as ECS requires paying for the EC2 instances, while Fargate requires paying for the memory and CPU usage independently. This lack of direct control over your cluster may easily lead to high costs once your services start to autoscale.

Azure Service Fabric

The problem with all of the preceding solutions is that they mostly target Docker containers, which are first and foremost Linux-centric. Azure Service Fabric, on the other hand, is a Windows-first product backed by Microsoft. It enables running legacy Windows apps without modifications, which may help you migrate your application if it relies on such services.

As with Kubernetes, Azure Service Fabric is not so much a container orchestrator in itself, but rather a platform on top of which you can build your applications. One of the building blocks happens to be containers, so it works fine as an orchestrator.

With the recent introduction of Azure Kubernetes Service, the managed Kubernetes platform in the Azure cloud, there is less need for using Service Fabric.

Summary

When you are an architect of modern software, you have to take into account modern technologies. Taking them into account doesn't mean following the trends blindly; it means being able to objectively assess whether a particular proposition makes sense in your case or not.

Both microservices, presented in the previous chapters, and containers, presented in this chapter, are worth considering and understanding. Are they worth implementing as well? It depends heavily on what type of product you are designing. If you've read this far, you are ready to make the decision for yourself.

The next chapter is dedicated to cloud-native design. It is a very interesting but also complex topic that ties together service-oriented architecture, CI/CD, microservices, containers, and cloud services. As it turns out, the great performance of C++ is a welcome feature for some of the cloud-native building blocks.

Questions

  1. How do application containers differ from operating system containers?
  2. What are some early examples of sandboxing environments in UNIX systems?
  3. Why are containers a good fit for microservices?
  4. What are the main differences between containers and virtual machines?
  5. When are application containers a bad choice?
  6. What are some tools to build multi-platform container images?
  7. Besides Docker, what are some other container runtimes?
  8. What are some popular orchestrators?
