Comparing actual resource usage with defined limits

Knowing when a container uses far more or far less than its requested resources helps us be more precise with resource definitions and, ultimately, helps Kubernetes make better decisions about where to schedule our Pods. In most cases, a big discrepancy between requested and actual resource usage will not result in malfunctioning. Instead, it is more likely to result in an unbalanced distribution of Pods, or in having more nodes than we need. Limits, on the other hand, are a different story.

If the resource usage of the containers wrapped in our Pods reaches the specified limits, Kubernetes might kill those containers when there is not enough memory for all of them. It does that to protect the integrity of the rest of the system. Killed Pods are not a permanent problem since Kubernetes will almost immediately reschedule them if there is enough capacity.
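As a reminder, limits (just like requests) are defined per container in the Pod spec. The snippet that follows is only an illustration of where they live; the name, the image, and the values are made up and are not part of the definitions we use in this chapter.

 1  # Illustration only: name, image, and values are invented
 2  apiVersion: v1
 3  kind: Pod
 4  metadata:
 5    name: example
 6  spec:
 7    containers:
 8    - name: app
 9      image: example/app:1.0
10      resources:
11        requests:
12          memory: 128Mi
13        limits:
14          memory: 256Mi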

If we do use the Cluster Autoscaler, new nodes will be added as soon as it detects Pods in the pending (unschedulable) state, even if there isn't enough capacity at the moment. So, the world is not likely to end if resource usage goes over the limits.
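If you'd like to check whether any Pods are currently stuck in that state, a field selector on the Pod phase is enough. The command that follows is just a quick sanity check, not part of the flow we're following.

 1  kubectl get pods \
 2      --all-namespaces \
 3      --field-selector status.phase=Pending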

Nevertheless, killing and rescheduling Pods can result in downtime. There are worse scenarios that might happen, but we won't go into them. Instead, we'll assume that we want to know when a Pod is about to reach its limits, so that we can investigate what's going on and, if needed, take corrective measures. Maybe the latest release introduced a memory leak? Or perhaps the load increased beyond what we expected and tested, and that results in increased memory usage? The cause of memory usage that is close to the limit is not the focus right now. Detecting that we are approaching the limit is.

First, we'll go back to Prometheus' graph screen.

 1  open "http://$PROM_ADDR/graph"

We already know that we can get actual memory usage through the container_memory_usage_bytes metric. Since we already explored how to get requested memory, we can guess that limits follow a similar pattern. They do, and they can be retrieved through the kube_pod_container_resource_limits_memory_bytes metric. Since one of the metrics is the same as before, and the other is very similar, we'll jump straight into executing the full query.
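If you'd like to see what that new metric looks like on its own before combining it with anything else, you can execute it in isolation first. This is an optional detour; the full expression follows right after.

 1  # Optional: inspect the limits metric on its own
 2  kube_pod_container_resource_limits_memory_bytes{
 3    namespace!="kube-system"
 4  }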

Please type the expression that follows, press the Execute button, and switch to the Graph tab.

 1  sum(label_join(
 2    container_memory_usage_bytes{
 3      namespace!="kube-system"
 4    }, 
 5    "pod", 
 6    ",", 
 7    "pod_name"
 8  ))
 9  by (pod) /
10  sum(
11    kube_pod_container_resource_limits_memory_bytes{
12      namespace!="kube-system"
13    }
14  )
15  by (pod)

In my case (screenshot following), we can see that quite a few Pods use more memory than what is defined as their limits.

Fortunately, I do have spare capacity in my cluster, and there is no imminent need for Kubernetes to kill any of the Pods. Moreover, the issue might not be that Pods use more than their limits, but that not all containers in those Pods have limits set. In either case, I should probably update the definitions of those Pods/containers and make sure that their limits are above their average usage over a few days or even weeks.

Figure 3-46: Prometheus' graph screen with the percentage of container memory usage based on memory limits and with those from the kube-system Namespace excluded
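If you suspect that missing limits, rather than excessive usage, are behind such results, a query along the following lines can help narrow it down. Treat it as a rough sketch built from the same two metrics; it lists Pods (outside kube-system) for which we have memory usage but no recorded memory limits, and it might include a few node-level series with an empty pod label.

 1  # Sketch: Pods with memory usage but no memory limits
 2  sum(label_join(
 3    container_memory_usage_bytes{
 4      namespace!="kube-system"
 5    },
 6    "pod",
 7    ",",
 8    "pod_name"
 9  ))
10  by (pod)
11  unless
12  sum(
13    kube_pod_container_resource_limits_memory_bytes{
14      namespace!="kube-system"
15    }
16  )
17  by (pod)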

Next, we'll go through the drill of exploring the difference between the old and the new version of the values.

 1  diff mon/prom-values-req-cpu.yml \
 2      mon/prom-values-limit-mem.yml

The output is as follows.

175c175
<   for: 1m
---
>   for: 1h
184c184
<   for: 6m
---
>   for: 6h
190a191,199
> - alert: MemoryAtTheLimit
>   expr: sum(label_join(container_memory_usage_bytes{namespace!="kube-system"}, "pod", ",", "pod_name")) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{namespace!="kube-system"}) by (pod) > 0.8
>   for: 1h
>   labels:
>     severity: notify
>     frequency: low
>   annotations:
>     summary: Memory usage is almost at the limit
>     description: At least one Pod uses memory that is close to its limit

Apart from restoring sensible thresholds for the alerts we used before, we defined a new alert called MemoryAtTheLimit. It will fire if the actual usage is over eighty percent (0.8) of the limit for more than one hour (1h).

Next is the upgrade of our Prometheus Chart.

 1  helm upgrade -i prometheus \
 2    stable/prometheus \
 3    --namespace metrics \
 4    --version 7.1.3 \
 5    --set server.ingress.hosts={$PROM_ADDR} \
 6    --set alertmanager.ingress.hosts={$AM_ADDR} \
 7    -f mon/prom-values-limit-mem.yml

Finally, we can open Prometheus' alerts screen and confirm that the new alert was indeed added to the mix.

 1  open "http://$PROM_ADDR/alerts"
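If the new alert does not show up right away, the server Pod might still be rolling out or reloading its configuration. Assuming the default naming of the stable/prometheus Chart (the Deployment is called prometheus-server when the release is named prometheus), the command that follows waits until the rollout finishes.

 1  # Deployment name assumes the Chart's default naming
 2  kubectl --namespace metrics \
 3      rollout status \
 4      deployment prometheus-server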

We won't go through the drill of creating a similar alert for CPU. You should know how to do that yourself.
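That said, if you want to compare notes afterward, one possible shape of the CPU expression is below. It is only a sketch: it divides the rate of container_cpu_usage_seconds_total over five minutes by kube_pod_container_resource_limits_cpu_cores, and the exact metric names depend on the cAdvisor and kube-state-metrics versions bundled with the Chart.

 1  # Sketch of a CPU equivalent; metric names are version-dependent
 2  sum(label_join(
 3    rate(
 4      container_cpu_usage_seconds_total{
 5        namespace!="kube-system"
 6      }[5m]
 7    ),
 8    "pod",
 9    ",",
10    "pod_name"
11  ))
12  by (pod) /
13  sum(
14    kube_pod_container_resource_limits_cpu_cores{
15      namespace!="kube-system"
16    }
17  )
18  by (pod)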
