route

This is arguably the most important configuration section of Alertmanager. In this section, we define how to group alerts based on their labels (group_by), how long to wait before sending a notification about new alerts added to an existing group (group_interval), and how long to wait before resending a notification that has already been sent (repeat_interval), but most importantly, which receiver should be triggered for each alert batch (receiver).

Since each route can have its own child routes, this forms a routing tree. The top-level route can't have any matching rules, as it works like a catch-all for any alert that doesn't match any of its sub-routes. Each setting made on a route, except continue, is carried over to its child routes in a cascading fashion. By default, the search for a receiver stops as soon as an alert matches a route, descending to the most specific matching child. Setting continue to true on a route makes the matching process keep going through the following sibling routes, thereby allowing you to trigger multiple receivers.
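As a minimal sketch of how continue changes the matching process, consider the following fragment. The receiver names (default-receiver, audit-log, and team-email) and the label values are hypothetical and only serve as illustration:

```yaml
route:
  receiver: default-receiver      # hypothetical catch-all receiver
  routes:
  - match:
      severity: critical
    receiver: audit-log           # hypothetical receiver
    continue: true                # keep evaluating the sibling routes below
  - match:
      team: backend               # hypothetical label and value
    receiver: team-email          # hypothetical receiver
```

Under these assumptions, an alert carrying both severity=critical and team=backend would notify audit-log and then team-email. Without continue: true on the first sub-route, matching would stop there and only audit-log would be notified.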

You can find the following example route configuration in our test environment:

route:
  receiver: operations
  group_by: ['alertname', 'job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

  routes:
  - match_re:
      job: (checkoutService|paymentService)
    receiver: yellow-squad-email
    routes:
    - match:
        severity: pager
      receiver: yellow-squad-pager
...

The main route in the preceding example does the following:

  • Defines operations as the default receiver, used when no sub-route matches
  • Groups incoming alerts by alertname and job
  • Waits 30 seconds for more alerts to arrive before sending the first notification to reduce the number of notifications for the same problem
  • Waits five minutes before sending additional notifications when new alerts are added to a batch
  • Resends a notification every four hours for each alert batch with the currently firing alerts

Additionally, it sets a sub-route, with its own receiver (yellow-squad-email), for alerts whose job label matches either checkoutService or paymentService. That sub-route, in turn, defines its own child route that uses the yellow-squad-pager receiver instead when the severity label matches pager.
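Newer Alertmanager releases (0.22 and later, as far as we are aware) deprecate match and match_re in favor of a single matchers list. Assuming such a release, the same sub-routes could be sketched as follows. Treat this as an illustrative equivalent rather than the configuration used in the test environment:

```yaml
route:
  receiver: operations
  routes:
  - matchers:
    # =~ is a regular expression match, like match_re
    - job=~"checkoutService|paymentService"
    receiver: yellow-squad-email
    routes:
    - matchers:
      # = is an exact match, like match
      - severity="pager"
      receiver: yellow-squad-pager
```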

The official Prometheus website offers a routing tree editor and visualizer at https://prometheus.io/webtools/alerting/routing-tree-editor/.

The group_by clause can also take the sole value of ..., which signals Alertmanager not to group incoming alerts at all. This is very rarely used, as the purpose of grouping is to reduce the number of notifications and keep the signal-to-noise ratio high. One possible use of this feature is to send every alert as-is to another system where alerts get processed.
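A minimal sketch of such a configuration follows. The receiver name alert-forwarder is hypothetical and stands in for whatever integration forwards alerts to the downstream system:

```yaml
route:
  receiver: alert-forwarder   # hypothetical receiver name
  group_by: ['...']           # special value: do not group alerts
```

Because the three dots have to be the only entry in the list, they cannot be combined with regular label names.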
