Monitoring Containers

In a typical containerized environment, one host machine can run many containers. From a macro monitoring perspective, we first want an overview of the container status: for example, how many healthy versus broken containers are on a host VM, or how many web containers versus database containers are running on it. This data is particularly important if we have more than one VM in a cluster, which is the case in almost any real-world scenario. It gives us insight into how and what types of containers and services are distributed across the cluster, so that we can correct unwanted behaviors. For example, we might learn over time that our containers hosting databases use far more resources than the ones that only serve as gateways. We could then use this data to tell our scheduler (see more on orchestration and scheduling in Chapter 5) to place the database containers only on bigger VMs that offer more RAM and CPU, and the gateway containers on smaller VMs.

When it comes down to what we should monitor at the individual container level, the classical runtime metrics of CPU, Memory, Network, and Disk are still the important indicators. For example, we can monitor the memory usage trend to detect a potential memory leak caused by a service in the container. Monitoring these metrics per container is very important because we can combine them with other metrics, such as the ones coming from the host VM. There are situations in which only the combined real-time runtime metrics enable us to make the right decisions.

Consider the following scenario. We detect that one of our containers has very high CPU usage. As we have learned, we can spin up a container quickly, so we might be tempted to simply spin up another one to add more CPU capacity. However, if the current host environment is running low on CPU, we cannot place a new container on it. In this case, we would need to add a new host VM first and then place the container on it.

So how and where does Docker emit the data needed for monitoring? The answer is that Docker relies on two Linux kernel mechanisms, control groups and namespaces (discussed in Chapter 2), to create the isolated container environment.

Those two features also provide the basic container runtime metrics.

Control groups expose metrics about CPU, Memory, and Disk usage through a pseudo-filesystem. In most recent Linux distributions using Linux kernel 3.x or later, such as Ubuntu 14.04, CentOS 7, Red Hat Enterprise Linux 7, and so on, control groups are mounted under “/sys/fs/cgroup/”, with each control group having its own sub-directory. For example, memory metrics can be found in the “memory” control group sub-directory. On some older systems, control groups might be mounted under /cgroup, and the file hierarchy differs as well.
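As a minimal sketch, assuming a cgroup v1 layout in which Docker places each container in a docker/<container_id> sub-directory (the exact mount point and hierarchy vary by distribution and Docker version), the memory metrics of a single container can be read with a few lines of Python:

import os

# Assumed mount point; adjust it to match your host's cgroup layout.
CGROUP_MEMORY_ROOT = "/sys/fs/cgroup/memory/docker"

def read_memory_metrics(container_id):
    base = os.path.join(CGROUP_MEMORY_ROOT, container_id)
    metrics = {}
    # Current usage and the configured limit are single-value files.
    with open(os.path.join(base, "memory.usage_in_bytes")) as f:
        metrics["usage_in_bytes"] = int(f.read())
    with open(os.path.join(base, "memory.limit_in_bytes")) as f:
        metrics["limit_in_bytes"] = int(f.read())
    # memory.stat holds detailed counters, one "name value" pair per line.
    with open(os.path.join(base, "memory.stat")) as f:
        for line in f:
            name, value = line.split()
            metrics[name] = int(value)
    return metrics

The CPU and Disk (“blkio”) control groups can be read in the same way from their respective sub-directories.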

Namespaces expose network metrics. We can use the setns system call to switch the current monitoring agent process into the same network namespace as the container and then read the metrics data from “/proc/net/dev”; a minimal sketch follows.
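The following sketch, assuming we already know the PID of the container’s main process (it can be obtained with docker inspect), joins the container’s network namespace via setns and reads its interface counters. The helper function is ours, not part of any library, and root privileges (CAP_SYS_ADMIN) are required.

import ctypes
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)
CLONE_NEWNET = 0x40000000  # "network namespace" flag for setns(2)

def read_container_net_dev(container_pid):
    # Each process exposes its namespaces under /proc/<pid>/ns/.
    ns_fd = os.open("/proc/%d/ns/net" % container_pid, os.O_RDONLY)
    try:
        if libc.setns(ns_fd, CLONE_NEWNET) != 0:
            raise OSError(ctypes.get_errno(), "setns failed")
        # /proc/net/dev now reflects the container's network namespace.
        with open("/proc/net/dev") as f:
            return f.read()
    finally:
        os.close(ns_fd)

Note that after the call the agent process stays in the container’s namespace; a production agent would first keep a file descriptor to its own /proc/self/ns/net open so that it can switch back with another setns call.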


Image Docker Runmetrics

The Docker web site http://docs.docker.com/articles/runmetrics/ provides more information on Docker runtime metrics.


Now that we know where to find the data, we need to have an easy way to read that information. There are actually two choices.

Read the data directly from the control groups and namespaces

Use the Docker Remote API

The Docker Remote API is a set of RESTful APIs using JSON and GET/POST methods. Since Docker Remote API v1.17, it can be used to collect key performance metrics from containers running on the host. It provides a programmable way for external monitoring agents to query Docker information such as container metadata and lifecycle events. The information returned by the APIs provides a comprehensive overview of the containers and their host VM. Below is a list of calls relevant to monitoring; a short sketch of calling these endpoints follows Table 7.1.

GET /info: provides system-wide information; for example, total memory of the host, total number of containers on the host, total number of images, and so on.

GET /version: provides the Docker version.

GET /events: provides the container lifecycle and runtime events with timestamps. Table 7.1 provides an overview of all events and their respective Docker commands. Monitoring container events is crucial for the overall monitoring strategy in automated environments, as it provides insights into the lifecycle of the containers.

Image

TABLE 7.1: Overview of container events
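To illustrate, here is a minimal sketch of calling the Remote API over its default UNIX socket using only the Python standard library. The UnixHTTPConnection helper is our own and not part of Docker or of Python’s http.client; it simply redirects an HTTP connection to the local socket.

import http.client
import json
import socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection variant that connects to a UNIX socket instead of TCP."""
    def __init__(self, socket_path="/var/run/docker.sock"):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

def docker_get(path):
    # Issue a GET request against the Remote API and decode the JSON body.
    conn = UnixHTTPConnection()
    conn.request("GET", path)
    body = conn.getresponse().read()
    conn.close()
    return json.loads(body)

# Example calls against the non-streaming endpoints:
# print(docker_get("/info")["Containers"])
# print(docker_get("/version"))

Note that GET /events (and GET /containers/(container_id)/stats, shown next) stream data back continuously, so their responses have to be read line by line instead of in one piece.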

As an example, the single API call GET /containers/(container_id)/stats returns CPU, Memory, Disk, and Network usage in a unified JSON format.
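Reusing the UnixHTTPConnection helper and the json import from the previous sketch, one sample of that stats stream can be read as follows; the stream is assumed to deliver one JSON document per line, and the container id below is a placeholder.

def get_one_stats_sample(container_id):
    conn = UnixHTTPConnection()
    conn.request("GET", "/containers/%s/stats" % container_id)
    response = conn.getresponse()
    # The endpoint streams one JSON document per second; take the first one.
    sample = json.loads(response.readline())
    conn.close()
    return sample

# stats = get_one_stats_sample("my_container_id")
# print(stats["memory_stats"]["usage"])
# print(stats["cpu_stats"]["cpu_usage"]["total_usage"])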


Image Docker API Security

The Docker Remote API uses UNIX sockets, which enables traditional UNIX permission checks to limit access. In order to secure the communication between the Docker daemon and the Docker client, or any other HTTP client, one should enable TLS authentication. The Docker web site offers more information on how to enable this at https://docs.docker.com/engine/articles/security/.


By default, the Docker Remote API is bound to a local UNIX socket (unix:///var/run/docker.sock) on the host it is running on. However, it can be bound to a network port on that host so that monitoring agents (and other software that talks to the Docker Remote API) can communicate with it over the network. So potentially, we can configure one monitoring agent to collect container metrics from multiple hosts.
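As a hedged sketch, assuming each daemon has been started with an additional TCP binding such as -H tcp://0.0.0.0:2375 (the host names and port below are placeholders, and an unencrypted binding should only be used on a trusted network; see the security note above), a single agent could poll several hosts like this:

import http.client
import json

HOSTS = ["vm-web-01", "vm-db-01"]   # hypothetical host names
DOCKER_API_PORT = 2375              # conventional unencrypted API port

def list_containers(host):
    # List the containers running on one host via its Remote API.
    conn = http.client.HTTPConnection(host, DOCKER_API_PORT)
    conn.request("GET", "/containers/json")
    containers = json.loads(conn.getresponse().read())
    conn.close()
    return containers

# for host in HOSTS:
#     print(host, [c["Names"] for c in list_containers(host)])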

As of v1.20, the Docker Remote API only returns all of the possible metrics for a given container at a one-second interval; there is no way to specify the metric types or the interval. This can generate significant overhead if we want to monitor hundreds of containers with the Remote API. Thus, if the target environment is resource-restricted, reading the data from control groups and namespaces can be a better choice. On the other hand, Docker is a quickly evolving platform, and we expect the Remote API to provide more customization options in future releases to improve resource utilization.


Image Note

The Docker Remote API website https://docs.docker.com/reference/api/docker_remote_api/ offers a complete list of all the different Remote API versions and methods.


Now that we know what we need to monitor from a container perspective, where to find the data, and how to access the data, we need to find a way to collect it. The good news is that, although we could build our own agent to collect the data, we do not really need to: there are already many monitoring solutions available.

An important question, however, is where to run the monitoring agent. Most monitoring solutions offer either a monitoring agent that runs on the host VM, or a container that contains the monitoring agent. While the preferred way is to containerize the agent as well, the answer is really that it depends on the scenario and host VM.

If the VM already hosts other applications, we can extend the monitoring agent running on each host to support Docker as well. This is very doable, irrespective of whether you choose the native Linux solution (control groups and namespaces) or the Docker Remote API. In fact, many existing server monitoring solutions have enabled Docker monitoring in their host-based agents. The Azure Diagnostics agent, for example, runs on the host VM, collects all the data from the directories on the host VM, and transfers it to a different location, such as Azure storage.

If our host VMs only host containerized applications, we need a consolidated solution to deploy and manage all the applications, including the monitoring agent. The preferred way is to containerize the agent as well. With the agent and all its dependencies packaged into a single image, we can deploy and run the monitoring agent on any host/OS and integrate it with container orchestration tools.


Image Monitoring Agents

Some special Docker environments, such as CoreOS, do not even permit third-party packages to be installed on the host VM, so using a “monitoring” container is the only option.

