Previously, we took a look at the Prometheus configuration file; we'll now move on to the provided alerting rules example, which we can see in the following snippet:
vagrant@prometheus:~$ cat /etc/prometheus/alerting_rules.yml
groups:
- name: alerting_rules
  rules:
  - alert: NodeExporterDown
    expr: up{job="node"} != 1
    for: 1m
    labels:
      severity: "critical"
    annotations:
      description: "Node exporter {{ $labels.instance }} is down."
      link: "https://example.com"
Let's look at the NodeExporterDown alert definition more closely. We can split the configuration into five distinct sections: alert, expr, for, labels, and annotations. We'll now go over each one of these in the next table:
| Section | Description | Mandatory |
| --- | --- | --- |
| alert | The alert name to use | Yes |
| expr | The PromQL expression to evaluate | Yes |
| for | How long the condition must hold before the alert fires; defaults to 0 | No |
| labels | User-defined key/value pairs attached to the alert | No |
| annotations | User-defined key/value pairs carrying contextual information | No |
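Annotations such as description can reference alert labels through templates like {{ $labels.instance }}. Prometheus expands these with Go templating; the following is just an illustrative Python sketch that mimics the simple label-lookup case, so the function name and regex are our own, not part of Prometheus:

```python
import re

def expand_annotation(template, labels):
    """Toy expansion of Prometheus-style {{ $labels.<name> }} references.

    Prometheus itself uses Go templating; this mimics only the plain
    label-lookup case for illustration.
    """
    return re.sub(
        r"\{\{\s*\$labels\.(\w+)\s*\}\}",
        lambda m: labels.get(m.group(1), ""),
        template,
    )

# Labels as they would appear on a firing NodeExporterDown alert
labels = {"instance": "prometheus:9100", "job": "node"}
print(expand_annotation("Node exporter {{ $labels.instance }} is down.", labels))
# Node exporter prometheus:9100 is down.
```

This is why the description annotation in the rule file becomes a concrete message once the alert fires.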
The NodeExporterDown rule will only trigger when the up metric with the job="node" selector is not 1 for more than one minute, which we'll now test by stopping the Node Exporter service:
vagrant@prometheus:~$ sudo systemctl stop node-exporter
vagrant@prometheus:~$ sudo systemctl status node-exporter
...
Mar 05 20:49:40 prometheus systemd[1]: Stopping Node Exporter...
Mar 05 20:49:40 prometheus systemd[1]: Stopped Node Exporter.
By stopping the service, we're forcing an alert to become active, which takes it through three different states:
| Order | State | Description |
| --- | --- | --- |
| 1 | Inactive | Not yet pending or firing |
| 2 | Pending | Not yet active long enough to become firing |
| 3 | Firing | Active for more than the defined for clause threshold |
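The state transitions above follow directly from the rule's condition and its for duration. As a rough sketch (the function and variable names here are ours, not Prometheus internals), the logic can be modeled like this:

```python
from datetime import datetime, timedelta

FOR_DURATION = timedelta(minutes=1)  # the rule's `for: 1m`

def alert_state(condition_true, pending_since, now):
    """Derive the alert state from the rule condition and how long it has held.

    condition_true: whether `up{job="node"} != 1` currently evaluates to true
    pending_since:  when the condition first became true (None if it hasn't)
    """
    if not condition_true or pending_since is None:
        return "inactive"
    if now - pending_since >= FOR_DURATION:
        return "firing"
    return "pending"

start = datetime(2019, 3, 5, 20, 49, 40)
print(alert_state(False, None, start))                          # inactive
print(alert_state(True, start, start + timedelta(seconds=30)))  # pending
print(alert_state(True, start, start + timedelta(minutes=2)))   # firing
```

Note that real rule evaluation happens at the configured evaluation interval, so the transition to firing occurs on the first evaluation after the for duration has elapsed, not at the exact instant.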
Going to the /alerts endpoint on the Prometheus server web interface, we can visualize the three different states for the NodeExporterDown alert. First, the alert is inactive, as we can see in the following figure:
Then, we can see the alert in a pending state. This means that the alert condition has been triggered, but Prometheus will keep checking on each evaluation cycle whether it still holds, until the for duration has passed. The next figure illustrates the pending state; notice that the Show annotations tick box is selected, which expands the alert annotations:
Finally, we can see the alert transition to firing. This means that the alert has been active for longer than the duration defined by the for clause, in this case, 1 minute:
When an alert starts firing, Prometheus sends a JSON payload to the configured alerting service endpoint, in our case the alertdump service, which is configured to log to the /vagrant/cache/alerting.log file. This makes it very easy to understand what kind of information is being sent, which we can validate as follows:
vagrant@prometheus:~$ cat /vagrant/cache/alerting.log
[
{
"labels": {
"alertname": "NodeExporterDown",
"dc": "dc1",
"instance": "prometheus:9100",
"job": "node",
"prom": "prom1",
"severity": "critical"
},
"annotations": {
"description": "Node exporter prometheus:9100 is down.",
"link": "https://example.com"
},
"startsAt": "2019-03-04T21:51:15.04754979Z",
"endsAt": "2019-03-04T21:58:15.04754979Z",
"generatorURL": "http://prometheus:9090/graph?g0.expr=up%7Bjob%3D%22node%22%7D+%21%3D+1&g0.tab=1"
}
]
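Since the payload is plain JSON, it is straightforward to consume programmatically. The following is a small hedged sketch, parsing a trimmed-down copy of the payload above with Python's standard json module to produce a one-line summary per alert (the summary format is our own choice):

```python
import json

# A trimmed-down copy of the payload Prometheus sent to the alerting endpoint
payload = """
[
  {
    "labels": {
      "alertname": "NodeExporterDown",
      "instance": "prometheus:9100",
      "job": "node",
      "severity": "critical"
    },
    "annotations": {
      "description": "Node exporter prometheus:9100 is down.",
      "link": "https://example.com"
    },
    "startsAt": "2019-03-04T21:51:15.04754979Z"
  }
]
"""

# Print one summary line per alert in the batch
for alert in json.loads(payload):
    print(f'{alert["labels"]["alertname"]} ({alert["labels"]["severity"]}): '
          f'{alert["annotations"]["description"]}')
# NodeExporterDown (critical): Node exporter prometheus:9100 is down.
```

Note that the payload is an array: Prometheus batches alerts, so a single POST to the alerting endpoint may carry several alerts at once.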
Now that we've seen how to configure some alerting rules and validated what Prometheus is sending to the configured alerting system, let's explore how to enrich those alerts with contextual information by using labels and annotations.