A quick introduction to Prometheus and Alertmanager

We'll continue the trend of using Helm as the installation mechanism. Prometheus' Helm Chart is maintained as one of the official Charts. You can find more info in the project's README (https://github.com/helm/charts/tree/master/stable/prometheus). If you focus on the variables in the Configuration section (https://github.com/helm/charts/tree/master/stable/prometheus#configuration), you'll notice that there are quite a few things we can tweak. We won't go through all the variables. You can check the official documentation for that. Instead, we'll start with a basic setup, and extend it as our needs increase.

Let's take a look at the variables we'll use as a start.

 1  cat mon/prom-values-bare.yml

The output is as follows.

server:
  ingress:
    enabled: true
    annotations:
      ingress.kubernetes.io/ssl-redirect: "false"
      nginx.ingress.kubernetes.io/ssl-redirect: "false"
  resources:
    limits:
      cpu: 100m
      memory: 1000Mi
    requests:
      cpu: 10m
      memory: 500Mi
alertmanager:
  ingress:
    enabled: true
    annotations:
      ingress.kubernetes.io/ssl-redirect: "false"
      nginx.ingress.kubernetes.io/ssl-redirect: "false"
  resources:
    limits:
      cpu: 10m
      memory: 20Mi
    requests:
      cpu: 5m
      memory: 10Mi
kubeStateMetrics:
  resources:
    limits:
      cpu: 10m
      memory: 50Mi
    requests:
      cpu: 5m
      memory: 25Mi
nodeExporter:
  resources:
    limits:
      cpu: 10m
      memory: 20Mi
    requests:
      cpu: 5m
      memory: 10Mi
pushgateway:
  resources:
    limits:
      cpu: 10m
      memory: 20Mi
    requests:
      cpu: 5m
      memory: 10Mi

All we're doing for now is defining resources for all five applications we'll install, as well as enabling Ingress with a few annotations that will make sure that we are not redirected to the HTTPS version, since we do not have certificates for our ad-hoc domains. We'll dive into the applications that'll be installed later. For now, we'll define the addresses for the Prometheus and Alertmanager UIs.

 1  PROM_ADDR=mon.$LB_IP.nip.io
 2
 3  AM_ADDR=alertmanager.$LB_IP.nip.io

Let's install the Chart.

 1  helm install stable/prometheus \
 2      --name prometheus \
 3      --namespace metrics \
 4      --version 7.1.3 \
 5      --set server.ingress.hosts={$PROM_ADDR} \
 6      --set alertmanager.ingress.hosts={$AM_ADDR} \
 7      -f mon/prom-values-bare.yml

The command we just executed should be self-explanatory, so we'll jump into the relevant parts of the output.

...
RESOURCES:
==> v1beta1/DaemonSet
NAME                     DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
prometheus-node-exporter 3       3       0     3          0         <none>        3s

==> v1beta1/Deployment
NAME                          DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
prometheus-alertmanager       1       1       1          0         3s
prometheus-kube-state-metrics 1       1       1          0         3s
prometheus-pushgateway        1       1       1          0         3s
prometheus-server             1       1       1          0         3s
...

We can see that the Chart installed one DaemonSet and four Deployments.

The DaemonSet is Node Exporter, and it'll run a Pod on every node of the cluster. It provides node-specific metrics that will be pulled by Prometheus. The second exporter (Kube State Metrics) runs as a single-replica Deployment. It fetches data from the Kube API and transforms it into a Prometheus-friendly format. The two will provide most of the metrics we'll need. Later on, we might choose to expand them with additional exporters. For now, those two, together with metrics fetched directly from the Kube API, should provide more metrics than we can absorb in a single chapter.
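If you'd like to see those workloads with your own eyes, listing the Pods in the metrics Namespace should show one prometheus-node-exporter Pod per node, plus one Pod for each of the four Deployments. This is an optional check, not part of the chapter's Gist.

 1  kubectl -n metrics get pods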

Further on, we got the Server, which is Prometheus itself, and Alertmanager, which will forward alerts to their destinations. Finally, there is Pushgateway, which we might explore in one of the following chapters.

While waiting for all those apps to become operational, we might explore the flow between them.

Prometheus Server pulls data from exporters. In our case, those are Node Exporter and Kube State Metrics. The job of those exporters is to fetch data from the source and transform it into a Prometheus-friendly format. Node Exporter gets the data from the /proc and /sys volumes mounted on the nodes, while Kube State Metrics gets it from the Kube API. Metrics are stored internally in Prometheus.

Apart from being able to query that data, we can define alerts. When an alert reaches its threshold, it is forwarded to Alertmanager, which acts as a crossroads.

Depending on its internal rules, it can forward those alerts further to various destinations like Slack, email, and HipChat (only to name a few).

Figure 3-1: The flow of data to and from Prometheus (arrows indicate the direction)

By now, Prometheus Server probably rolled out. We'll confirm that just in case.

 1  kubectl -n metrics \
 2      rollout status \
 3      deploy prometheus-server

Let's take a look at what is inside the Pod created through the prometheus-server Deployment.

 1  kubectl -n metrics \
 2      describe deployment \
 3      prometheus-server

The output, limited to the relevant parts, is as follows.

  Containers:
   prometheus-server-configmap-reload:
    Image: jimmidyson/configmap-reload:v0.2.2
    ...
   prometheus-server:
    Image: prom/prometheus:v2.4.2
    ...

Besides the container based on the prom/prometheus image, we got another one created from jimmidyson/configmap-reload. The job of the latter is to reload Prometheus whenever we change the configuration stored in a ConfigMap.
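If you're curious how that reload happens, the sidecar watches the mounted ConfigMap and asks Prometheus to reload itself whenever the content changes. As an optional check (and assuming the container name from the output above), we could peek at its logs.

 1  kubectl -n metrics logs \
 2      deploy/prometheus-server \
 3      -c prometheus-server-configmap-reload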

Next, we might want to take a look at the prometheus-server ConfigMap, since it stores all the configuration Prometheus needs.

 1  kubectl -n metrics \
 2      describe cm prometheus-server

The output, limited to the relevant parts, is as follows.

...
Data
====
alerts:
----
{}
prometheus.yml:
----
global:
  evaluation_interval: 1m
  scrape_interval: 1m
  scrape_timeout: 10s
rule_files:
- /etc/config/rules
- /etc/config/alerts
scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-apiservers
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: default;kubernetes;https
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_service_name
    - __meta_kubernetes_endpoint_port_name
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
...

We can see that the alerts are still empty. We'll change that soon.

Further down is the prometheus.yml config, with scrape_configs taking most of the space. We could spend a whole chapter explaining the current config and the ways we could modify it. We will not do that because the config in front of you is bordering on insanity. It's a prime example of how something can be made more complicated than it needs to be. In most cases, you should keep it as-is. If you do want to fiddle with it, please consult the official documentation.
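That being said, it might help to see what a single scrape job looks like once it's stripped of all the Kubernetes service-discovery machinery. The snippet that follows is only an illustration with a made-up job name and target, not something we'll add to our setup.

scrape_configs:
- job_name: my-app            # arbitrary job name (hypothetical)
  scrape_interval: 30s        # how often Prometheus pulls this target
  static_configs:
  - targets:                  # fixed list of host:port endpoints
    - my-app.my-namespace:8080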

Next, we'll take a quick look at Prometheus' screens.

A note to Windows users
Git Bash might not be able to use the open command. If that's the case, replace open with echo. As a result, you'll get the full address that should be opened directly in your browser of choice.
 1  open "http://$PROM_ADDR/config"

The config screen reflects the same information we already saw in the prometheus-server ConfigMap, so we'll move on.

Next, let's take a look at the targets.

 1  open "http://$PROM_ADDR/targets"

That screen contains seven targets, each providing different metrics. Prometheus is periodically pulling data from those targets.

All the outputs and screenshots in this chapter are taken from AKS. You might see some differences depending on your Kubernetes flavor.
You might notice that this chapter contains many more screenshots than any other. Even though it might look like there are too many, I wanted to make sure that you can compare your results with mine, since there will be inevitable differences that might sometimes look confusing if you do not have a reference (my screenshots).
Figure 3-2: Prometheus' targets screen
A note to AKS users
The kubernetes-apiservers target might be red indicating that Prometheus cannot connect to it. That's OK since we won't use its metrics.
A note to minikube users
The kubernetes-service-endpoints target might have a few sources in red. There's no reason for alarm. Those are not reachable, but that won't affect our exercises.

We cannot find out what each of those targets provides from that screen. We'll try to query the exporters in the same way as Prometheus pulls them.

To do that, we'll need to find out the Services through which we can access the exporters.

 1  kubectl -n metrics get svc

The output, from AKS, is as follows.

NAME                          TYPE      CLUSTER-IP    EXTERNAL-IP PORT(S)  AGE
prometheus-alertmanager       ClusterIP 10.23.245.165 <none>      80/TCP   41d
prometheus-kube-state-metrics ClusterIP None          <none>      80/TCP   41d
prometheus-node-exporter      ClusterIP None          <none>      9100/TCP 41d
prometheus-pushgateway        ClusterIP 10.23.244.47  <none>      9091/TCP 41d
prometheus-server             ClusterIP 10.23.241.182 <none>      80/TCP   41d

We are interested in prometheus-kube-state-metrics and prometheus-node-exporter since they provide access to data from the exporters we'll use in this chapter.

Next, we'll create a temporary Pod through which we'll access the data available through the exporters behind those Services.

 1  kubectl -n metrics run -it test \
 2      --image=appropriate/curl \
 3      --restart=Never \
 4      --rm \
 5      -- prometheus-node-exporter:9100/metrics

We created a new Pod based on appropriate/curl. That image serves a single purpose: providing curl. We specified prometheus-node-exporter:9100/metrics as the command, which is equivalent to running curl with that address. As a result, a lot of metrics were output. They are all in the same key/value format, with optional labels surrounded by curly braces ({ and }). Above each metric, there is a HELP entry that explains its function, as well as a TYPE (for example, gauge). One of the metrics is as follows.

 1  # HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
 2  # TYPE node_memory_MemTotal_bytes gauge
 3  node_memory_MemTotal_bytes 3.878477824e+09

We can see that it provides Memory information field MemTotal_bytes and that the type is gauge. Below the TYPE is the actual metric with the key (node_memory_MemTotal_bytes) and value 3.878477824e+09.

Most of the Node Exporter metrics are without labels. So, we'll have to look for an example in the prometheus-kube-state-metrics exporter.

 1  kubectl -n metrics run -it test \
 2      --image=appropriate/curl \
 3      --restart=Never \
 4      --rm \
 5      -- prometheus-kube-state-metrics:8080/metrics

As you can see, the Kube State metrics follow the same pattern as those from the Node Exporter. The major difference is that most of them do have labels. An example is as follows.

 1  kube_deployment_created{deployment="prometheus-server",namespace="metrics"} 1.535566512e+09

That metric represents the time the Deployment prometheus-server was created inside the metrics Namespace.
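The value itself is a Unix timestamp (seconds since January 1, 1970). If you'd like to see it as a human-readable date, a shell one-liner will do. The command below assumes GNU date (macOS users would use date -r instead); in my case, it resolves to a day in late August 2018.

 1  date -d @1535566512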

I'll leave it to you to explore those metrics in more detail. We'll use quite a few of them soon.

For now, just remember that with the combination of the metrics coming from the Node Exporter, Kube State Metrics, and those coming from Kubernetes itself, we can cover most of our needs. Or, to be more precise, those provide data required for most of the basic and common use cases.

Next, we'll take a look at the alerts screen.

 1  open "http://$PROM_ADDR/alerts"

The screen is empty. Do not despair. We'll get back to that screen quite a few times. The number of alerts will increase as we progress. For now, just remember that this is where you can find your alerts.

Finally, we'll open the graph screen.

 1  open "http://$PROM_ADDR/graph"

That is where you'll spend your time debugging issues you'll discover through alerts.

As our first task, we'll try to retrieve information about our nodes. We'll use kube_node_info so let's take a look at its description (help) and its type.

 1  kubectl -n metrics run -it test \
 2      --image=appropriate/curl \
 3      --restart=Never \
 4      --rm \
 5      -- prometheus-kube-state-metrics:8080/metrics \
 6      | grep "kube_node_info"

The output, limited to the HELP and TYPE entries, is as follows.

 1  # HELP kube_node_info Information about a cluster node.
 2  # TYPE kube_node_info gauge
 3  ...
You are likely to see variations between your results and mine. That's normal since our clusters probably have different amounts of resources, my bandwidth might be different, and so on. In some cases, my alerts will fire, and yours won't, or the other way around. I'll do my best to explain my experience and provide screenshots that accompany them. You'll have to compare that with what you see on your screen.

Now, let's try using that metric in Prometheus.

Please type the following query in the expression field.

 1  kube_node_info

Click the Execute button to retrieve the values of the kube_node_info metric.

Unlike previous chapters, the Gist from this one (03-monitor.sh (https://gist.github.com/vfarcic/718886797a247f2f9ad4002f17e9ebd9)) contains not only the commands but also Prometheus expressions. They are all commented (with #). If you're planning to copy and paste the expressions from the Gist, please exclude the comments. Each expression has a # Prometheus expression comment on top to help you identify it. As an example, the one you just executed is written in the Gist as follows.

# Prometheus expression
# kube_node_info

If you check the HELP entry of the kube_node_info, you'll see that it provides information about a cluster node and that it is a gauge. A gauge (https://prometheus.io/docs/concepts/metric_types/#gauge) is a metric that represents a single numerical value that can arbitrarily go up and down.

That makes sense for information about nodes since their number can increase or decrease over time.

A Prometheus gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

If you focus on the output, you'll notice that there are as many entries as there are worker nodes in the cluster. The value (1) is useless in this context. Labels, on the other hand, can provide some useful information. For example, in my case, the operating system (os_image) is Ubuntu 16.04.5 LTS. Through that example, we can see that we can use the metrics not only to calculate values (for example, available memory) but also to get a glimpse into the specifics of our system.
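Since the labels are there, we can also filter by them. If, for example, we'd like to list only the nodes running a particular operating system, a label matcher with a regular expression does the job. The expression below is only an illustration; adjust the pattern to whatever os_image your nodes report.

 1  kube_node_info{os_image=~"Ubuntu.*"}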

Figure 3-3: Prometheus' console output of the kube_node_info metric

Let's see if we can get a more meaningful query by combining that metric with one of Prometheus' functions. We'll count the number of worker nodes in our cluster. count is one of Prometheus' aggregation operators (https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators).

Please execute the expression that follows.

 1  count(kube_node_info)

The output should show the total number of worker nodes in your cluster. In my case (AKS), there are three. At first glance, that might not seem very helpful. You might think that you should know how many nodes you have in your cluster without Prometheus. But that might not be true. One of the nodes might have failed, and it did not recuperate. That is especially true if you're running your cluster on-prem without scaling groups. Or maybe Cluster Autoscaler increased or decreased the number of nodes. Everything changes over time, either due to failures, through human actions, or through a system that adapts itself. No matter the reasons for volatility, we might want to be notified when something reaches a threshold. We'll use nodes as the first example.
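count is not limited to producing a single number. If, for example, we'd like to know how many nodes run each kubelet version (handy during rolling upgrades of a cluster), we can group the same metric by one of its labels. Treat the expression that follows as an optional experiment; we won't use it later.

 1  count(kube_node_info) by (kubelet_version)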

Our mission is to define an alert that will notify us if there are more than three or fewer than one nodes in the cluster. We'll imagine that those are our limits and that we want to know whether the lower or the upper threshold is reached due to failures or Cluster Autoscaling.

We'll take a look at a new definition of the Prometheus Chart's values. Since the definition is big and it will grow with time, from now on, we'll only look at the differences.

 1  diff mon/prom-values-bare.yml \
 2      mon/prom-values-nodes.yml

The output is as follows.

> serverFiles:
>   alerts:
>     groups:
>     - name: nodes
>       rules:
>       - alert: TooManyNodes
>         expr: count(kube_node_info) > 3
>         for: 15m
>         labels:
>           severity: notify
>         annotations:
>           summary: Cluster increased
>           description: The number of the nodes in the cluster increased
>       - alert: TooFewNodes
>         expr: count(kube_node_info) < 1
>         for: 15m
>         labels:
>           severity: notify
>         annotations:
>           summary: Cluster decreased
>           description: The number of the nodes in the cluster decreased

We added a new entry serverFiles.alerts. If you check Prometheus' Helm documentation, you'll see that it allows us to define alerts (hence the name). Inside it, we're using the "standard" Prometheus syntax for defining alerts.

Please consult Alerting Rules documentation (https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) for more info about the syntax.

We defined only one group of rules called nodes. Inside it are two rules. The first one (TooManyNodes) will notify us if there are more than 3 nodes for more than 15 minutes. The other (TooFewNodes) will do the opposite. It'll tell us if there are no nodes (<1) for 15 minutes. Both rules have labels and annotations that, for now, serve only informational purposes. Later on, we'll see their real usage.

Let's upgrade our Prometheus' Chart and see the effect of the new alerts.

 1  helm upgrade -i prometheus \
 2    stable/prometheus \
 3    --namespace metrics \
 4    --version 7.1.3 \
 5    --set server.ingress.hosts={$PROM_ADDR} \
 6    --set alertmanager.ingress.hosts={$AM_ADDR} \
 7    -f mon/prom-values-nodes.yml

It'll take a few moments until the new configuration is "discovered" and Prometheus is reloaded. After a while, we can open the Prometheus alerts screen and check whether we got our first entries.
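If you'd like to confirm that the new group was picked up before checking the alerts, Prometheus' rules screen lists all the rules it evaluates, together with their current state. This is an optional detour.

 1  open "http://$PROM_ADDR/rules"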

From now on, I won't comment (much) on the need to wait for a while until next config is propagated. If what you see on the screen does not coincide with what you're expecting, please wait for a while and refresh it.
 1  open "http://$PROM_ADDR/alerts"

You should see two alerts.

Both alerts are green since neither evaluates to true. Depending on the Kubernetes flavor you chose, you either have only one node (for example, Docker for Desktop and minikube) or you have three nodes (for example, GKE, EKS, AKS). Since our alerts check whether we have fewer than one or more than three nodes, neither of the conditions is met, no matter which Kubernetes flavor you're using.

If your cluster was not created through one of the Gists provided at the beginning of this chapter, then you might have more than three nodes in your cluster, and the alert will fire. If that's the case, I suggest you modify the mon/prom-values-nodes.yml file to adjust the threshold of the alert.
Figure 3-4: Prometheus' alerts screen

Seeing inactive alerts is boring, so I want to show you one that fires (becomes red). To do that, we can add more nodes to the cluster (unless you're using a single node cluster like Docker for Desktop and minikube). However, it would be easier to modify the expression of one of the alerts, so that's what we'll do next.

 1  diff mon/prom-values-nodes.yml \
 2      mon/prom-values-nodes-0.yml

The output is as follows.

57,58c57,58
< expr: count(kube_node_info) > 3
< for: 15m
---
> expr: count(kube_node_info) > 0
> for: 1m
66c66
< for: 15m
---
> for: 1m

The new definition changed the condition of the TooManyNodes alert to fire if there are more than zero nodes. We also changed the for statement so that we do not need to wait for 15 minutes before the alert fires.

Let's upgrade the Chart one more time.

 1  helm upgrade -i prometheus \
 2    stable/prometheus \
 3    --namespace metrics \
 4    --version 7.1.3 \
 5    --set server.ingress.hosts={$PROM_ADDR} \
 6    --set alertmanager.ingress.hosts={$AM_ADDR} \
 7    -f mon/prom-values-nodes-0.yml

... and we'll go back to the alerts screen.

 1  open "http://$PROM_ADDR/alerts"

A few moments later (don't forget to refresh the screen), the alert will switch to the pending state, and the color will change to yellow. That means that the conditions for the alert are met (we do have more than zero nodes) but the for period did not yet expire.

Wait for a minute (duration of the for period) and refresh the screen. The alert's state switched to firing and the color changed to red. Prometheus sent our first alert.

Figure 3-5: Prometheus' alerts screen with one of the alerts firing

Where was the alert sent? The Prometheus Helm Chart deployed Alertmanager and pre-configured Prometheus to send its alerts there. Let's take a look at its UI.

 1  open "http://$AM_ADDR"

We can see that one alert reached Alertmanager. If we click the + info button next to the TooManyNodes alert, we'll see the annotations (summary and description) as well as the labels (severity).

Figure 3-6: Alertmanager UI with one of the alerts expanded

We are likely not going to sit in front of Alertmanager waiting for issues to appear. If that were our goal, we could just as well watch the alerts in Prometheus.

Displaying alerts is indeed not the reason why we have Alertmanager. It is supposed to receive alerts and dispatch them further. It is not doing anything of that sort simply because we did not yet define the rules it should use to forward alerts. That's our next task.

We'll take a look at yet another update of the Prometheus Chart values.

 1  diff mon/prom-values-nodes-0.yml \
 2      mon/prom-values-nodes-am.yml

The output is as follows.

71a72,93
> alertmanagerFiles:
>   alertmanager.yml:
>     global: {}
>     route:
>       group_wait: 10s
>       group_interval: 5m
>       receiver: slack
>       repeat_interval: 3h
>       routes:
>       - receiver: slack
>         repeat_interval: 5d
>         match:
>           severity: notify
>           frequency: low
>     receivers:
>     - name: slack
>       slack_configs:
>       - api_url: "https://hooks.slack.com/services/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"
>         send_resolved: true
>         title: "{{ .CommonAnnotations.summary }}"
>         text: "{{ .CommonAnnotations.description }}"
>         title_link: http://my-prometheus.com/alerts

When we apply that definition, we'll add the alertmanager.yml file to Alertmanager. It contains the rules it should use to dispatch alerts. The route section contains general rules that will be applied to all alerts that do not match one of the routes. The group_wait value makes Alertmanager wait for 10 seconds in case additional alerts from the same group arrive. That way, we'll avoid receiving multiple notifications of the same type.

When the first alert of a group is dispatched, it'll use the value of the group_interval field (5m) before sending the next batch of the new alerts from the same group.

The receiver field in the route section defines the default destination of the alerts. Those destinations are defined in the receivers section below. In our case, we're sending the alerts to the slack receiver by default.

The repeat_interval (set to 3h) defines the period after which alerts will be resent if Alertmanager continues receiving them.

The routes section defines specific rules. Only if none of them match will those in the route section above be used. The routes section inherits properties from above, so only those that we define in this section will change. We'll keep sending matching alerts to slack, and the only change is the increase of the repeat_interval from 3h to 5d.

The critical part of the routes is the match section. It defines filters that are used to decide whether an alert is a match or not. In our case, only those with the labels severity: notify and frequency: low will be considered a match.

All in all, the alerts with the severity label set to notify and frequency set to low will be resent every five days. All the other alerts will be repeated every three hours.

The last section of our Alertmanager config is receivers. We have only one receiver named slack. Below the name is slack_config. It contains Slack-specific configuration. We could have used hipchat_config, pagerduty_config, or any other of the supported ones. Even if our destination is not one of those, we could always fall back to webhook_config and send a custom request to the API of our tool of choice.
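To illustrate that fallback, a webhook receiver needs little more than a URL to which Alertmanager will POST the alerts. The snippet that follows is only a sketch with a made-up name and address; we won't add it to our setup.

receivers:
- name: my-custom-tool
  webhook_configs:
  - url: http://my-tool.example.com/alerts # hypothetical endpoint
    send_resolved: true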

For the list of all the supported receivers, please consult Alertmanager Configuration page (https://prometheus.io/docs/alerting/configuration/).

Inside the slack_configs section, we have the api_url that contains the Slack address with the token from one of the channels in the devops20 workspace.

For information on how to generate an incoming webhook address for your Slack channel, please visit the Incoming Webhooks page (https://api.slack.com/incoming-webhooks).

Next is the send_resolved flag. When set to true, Alertmanager will send notifications not only when an alert is fired, but also when the issue that caused it is resolved.

We're using the summary annotation as the title of the message, and the description annotation as the text. Both are using Go Templates (https://golang.org/pkg/text/template/). Those are the same annotations we defined in Prometheus' alerts.
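Since the whole notification payload is available to those templates, we are not limited to the common annotations. If, for example, we'd like the Slack message to list every alert in the group, a template like the one below would iterate over them. Consider it an optional variation rather than something we'll apply.

title: "{{ .CommonAnnotations.summary }}"
text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"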

Finally, the title_link is set to http://my-prometheus.com/alerts. That is indeed not the address of your Prometheus UI but, since I could not know in advance what your domain would be, I put a non-existent one. Feel free to change my-prometheus.com to the value of the environment variable $PROM_ADDR. Or just leave it as-is, knowing that if you click the link, it will not take you to your Prometheus UI.

Now that we explored Alertmanager configuration, we can proceed and upgrade the Chart.

 1  helm upgrade -i prometheus \
 2    stable/prometheus \
 3    --namespace metrics \
 4    --version 7.1.3 \
 5    --set server.ingress.hosts={$PROM_ADDR} \
 6    --set alertmanager.ingress.hosts={$AM_ADDR} \
 7    -f mon/prom-values-nodes-am.yml

A few moments later, Alertmanager will be reconfigured, and the next time it receives an alert from Prometheus, it'll dispatch it to Slack. We can confirm that by visiting the devops20.slack.com workspace. If you haven't registered already, please go to slack.devops20toolkit.com. Once you are a member, we can visit the devops25-tests channel.

 1  open "https://devops20.slack.com/messages/CD8QJA8DS/"

You should see the Cluster increased notification. Don't get confused if you see other messages. You are likely not the only one running the exercises from this book.

Figure 3-7: Slack with an alert message received from Alertmanager
Sometimes, for reasons I could not figure out, Slack receives empty notifications from Alertmanager. For now, I'm ignoring the issue out of laziness.

Now that we went through the basic usage of Prometheus and Alertmanager, we'll take a break from hands-on exercises and discuss the types of metrics we might want to use.
