Rule file configuration

Previously, we took a look at the Prometheus configuration file; we'll now move on to the provided alerting rules example, which we can see in the following snippet:

vagrant@prometheus:~$ cat /etc/prometheus/alerting_rules.yml
groups:
- name: alerting_rules
  rules:
  - alert: NodeExporterDown
    expr: up{job="node"} != 1
    for: 1m
    labels:
      severity: "critical"
    annotations:
      description: "Node exporter {{ $labels.instance }} is down."
      link: "https://example.com"

Let's look at the NodeExporterDown alert definition more closely. We can split the configuration into five distinct sections: alert, expr, for, labels, and annotations. The following table describes each one:

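| Section     | Description                                                                                | Mandatory |
|-------------|--------------------------------------------------------------------------------------------|-----------|
| alert       | The alert name to use                                                                      | Yes       |
| expr        | The PromQL expression to evaluate                                                          | Yes       |
| for         | How long the expression must keep matching before the alert fires; defaults to 0           | No        |
| labels      | User-defined key-value pairs added to the alert's labels (for example, severity)           | No        |
| annotations | User-defined key-value pairs carrying descriptive text (for example, description and link) | No        |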

The Prometheus community typically uses CamelCase for alert naming.
Prometheus does not carry out validation to check whether an alert name is already in use, so it is possible for two or more alerts to share the same name but evaluate different expressions. This might cause issues, such as tracking which specific alert is triggering, or writing tests for alerts.
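
Since Prometheus will not flag reused alert names, a small check of our own can catch them before a rule file is deployed. The following is only a minimal sketch: it assumes the rule file has already been parsed into Python dictionaries (the rule_file content, the second NodeExporterDown expression, and the HighLoad alert are hypothetical examples), and find_duplicate_alerts is a helper name introduced here for illustration:

```python
from collections import defaultdict

# Hypothetical rule file content, already parsed into Python structures.
# The duplicated NodeExporterDown entry and the HighLoad alert are made up
# for this example and are not part of the provided alerting_rules.yml.
rule_file = {
    "groups": [
        {
            "name": "alerting_rules",
            "rules": [
                {"alert": "NodeExporterDown", "expr": 'up{job="node"} != 1'},
                {"alert": "NodeExporterDown", "expr": 'up{job="node"} == 0'},
                {"alert": "HighLoad", "expr": "node_load1 > 4"},
            ],
        }
    ]
}


def find_duplicate_alerts(parsed):
    """Map each alert name that is used more than once to its expressions."""
    expressions = defaultdict(list)
    for group in parsed.get("groups", []):
        for rule in group.get("rules", []):
            if "alert" in rule:  # entries without an alert key are skipped
                expressions[rule["alert"]].append(rule["expr"])
    return {name: exprs for name, exprs in expressions.items() if len(exprs) > 1}


duplicates = find_duplicate_alerts(rule_file)
for name, exprs in duplicates.items():
    print(f"{name} is defined {len(exprs)} times: {exprs}")
```

A check like this could run in a pipeline alongside any other rule file validation, failing the build whenever the returned dictionary is not empty.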

The NodeExporterDown rule will only trigger when the up metric with the job="node" selector is not 1 for more than one minute, which we'll now test by stopping the Node Exporter service:

vagrant@prometheus:~$ sudo systemctl stop node-exporter
vagrant@prometheus:~$ sudo systemctl status node-exporter

...
Mar 05 20:49:40 prometheus systemd[1]: Stopping Node Exporter...
Mar 05 20:49:40 prometheus systemd[1]: Stopped Node Exporter.

Stopping the service forces the alert to become active, which takes it through three different states:

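| Order | State    | Description                                           |
|-------|----------|-------------------------------------------------------|
| 1     | Inactive | Not yet pending or firing                             |
| 2     | Pending  | Not yet active long enough to become firing           |
| 3     | Firing   | Active for more than the defined for clause threshold |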

Going to the /alerts endpoint on the Prometheus server web interface, we can visualize the three different states for the NodeExporterDown alert. First, the alert is inactive, as we can see in the following figure:

Figure 9.3: The NodeExporterDown alert is inactive

Then, we can see the alert in a pending state. This means that, while the alert condition has been triggered, Prometheus will continue to check whether that condition keeps being triggered for each evaluation cycle until the for duration has passed. The next figure illustrates the pending state; notice that the Show annotations tick box is selected, which expands the alert annotations:

Figure 9.4: The NodeExporterDown alert is pending

Finally, we can see the alert turn to firing. This means that the alert has been active for longer than the duration defined by the for clause, which in this case is 1 minute:

Figure 9.5: The NodeExporterDown alert is firing
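
To make the progression through the three states more concrete, here is a simplified model of how the for clause gates the transition from pending to firing. It is a sketch for illustration rather than Prometheus's actual implementation: the alert_states function is a hypothetical helper, the 30-second evaluation interval is an assumption (the interval used by the test environment isn't shown here), and the alert is treated as firing once the condition has held for at least the for duration:

```python
from datetime import timedelta


def alert_states(samples, interval, hold):
    """Return the alert state after each rule evaluation.

    samples  -- booleans, True when the alert expression matched
    interval -- time between rule evaluations (assumed value)
    hold     -- the rule's `for` duration
    """
    states = []
    active_since = None  # time at which the condition first matched
    now = timedelta(0)
    for condition_holds in samples:
        if not condition_holds:
            # The condition cleared, so any pending or firing alert resets
            active_since = None
            states.append("inactive")
        else:
            if active_since is None:
                active_since = now
            if now - active_since >= hold:
                states.append("firing")
            else:
                states.append("pending")
        now += interval
    return states


# Node Exporter is up, then down for three evaluations, then back up
states = alert_states(
    [False, True, True, True, False],
    interval=timedelta(seconds=30),
    hold=timedelta(minutes=1),
)
print(states)
```

With these inputs, the alert stays pending for the first two failing evaluations, fires on the third, and returns to inactive as soon as the condition clears.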

When an alert starts firing, Prometheus sends a JSON payload to the configured alerting service endpoint. In our case, this is the alertdump service, which is configured to log to the /vagrant/cache/alerting.log file. This makes it easy to see exactly what information is being sent, which we can validate as follows:

vagrant@prometheus:~$ cat /vagrant/cache/alerting.log
[
  {
    "labels": {
      "alertname": "NodeExporterDown",
      "dc": "dc1",
      "instance": "prometheus:9100",
      "job": "node",
      "prom": "prom1",
      "severity": "critical"
    },
    "annotations": {
      "description": "Node exporter prometheus:9100 is down.",
      "link": "https://example.com"
    },
    "startsAt": "2019-03-04T21:51:15.04754979Z",
    "endsAt": "2019-03-04T21:58:15.04754979Z",
    "generatorURL": "http://prometheus:9090/graph?g0.expr=up%7Bjob%3D%22node%22%7D+%21%3D+1&g0.tab=1"
  }
]
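
As a quick way of inspecting this payload, the sketch below parses the logged JSON and decodes the generatorURL back into the expression that produced the alert. It uses only the Python standard library; the payload is copied from the log output above, and summarize is a hypothetical helper name introduced for this example:

```python
import json
from urllib.parse import parse_qs, urlsplit

# Payload copied from the alerting.log output shown above
payload = """
[
  {
    "labels": {
      "alertname": "NodeExporterDown",
      "dc": "dc1",
      "instance": "prometheus:9100",
      "job": "node",
      "prom": "prom1",
      "severity": "critical"
    },
    "annotations": {
      "description": "Node exporter prometheus:9100 is down.",
      "link": "https://example.com"
    },
    "startsAt": "2019-03-04T21:51:15.04754979Z",
    "endsAt": "2019-03-04T21:58:15.04754979Z",
    "generatorURL": "http://prometheus:9090/graph?g0.expr=up%7Bjob%3D%22node%22%7D+%21%3D+1&g0.tab=1"
  }
]
"""


def summarize(alert):
    """Pick out the fields that identify an alert and explain its origin."""
    # The g0.expr query parameter holds the URL-encoded alert expression
    query = parse_qs(urlsplit(alert["generatorURL"]).query)
    return {
        "name": alert["labels"]["alertname"],
        "severity": alert["labels"]["severity"],
        "instance": alert["labels"]["instance"],
        "description": alert["annotations"]["description"],
        "expression": query["g0.expr"][0],
    }


summaries = [summarize(alert) for alert in json.loads(payload)]
for summary in summaries:
    print(summary)
```

Note how the severity label and both annotations from the rule definition arrive in the payload alongside the labels of the originating time series, and how the decoded expression matches the expr field of the rule.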

Now that we've seen how to configure some alerting rules and validated what Prometheus is sending to the configured alerting system, let's explore how to enrich those alerts with contextual information by using labels and annotations.
