Configuring alerts in Prometheus

In this part, we will look at how to design alerts in Prometheus. First, however, we need to understand two core Prometheus concepts:

  • Metrics: Metrics are a core concept of Prometheus. We can expose them from our code, and Prometheus stores them as time series. We can then query them with PromQL, its flexible query language.
  • Labels: Prometheus uses labels to indicate, for example, the service that a particular metric applies to. Labels are arbitrary key-value pairs, so they can express much more than which service or instance exposed a metric.

In the following example, http_failure_request is the metric: it covers all the data points that Prometheus has collected for the failed HTTP requests exposed by the product page service. The label service="productpage" indicates that this particular http_failure_request series belongs to the productpage service:

# Request counter for the Product Page service (an application created in Istio)
http_failure_request{service="productpage"}
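
Because labels are arbitrary, the same metric can be filtered or aggregated by any of them. As a minimal sketch, the rules-file fragment below records a per-service failure rate. The group name, the recorded series name, and the 5-minute window are illustrative assumptions, and the expression assumes that http_failure_request is a counter, as the comment above suggests. The expr line is the part that matters here; the surrounding rule syntax is introduced later in this section.

```yaml
groups:
  - name: productpage-examples          # hypothetical group name
    rules:
      # Per-second rate of failed requests over the last 5 minutes,
      # summed across all series and grouped by the "service" label.
      - record: service:http_failure_request:rate5m   # hypothetical series name
        expr: sum by (service) (rate(http_failure_request[5m]))
```

Grouping by a different label in the by (...) clause would slice the same data another way without changing how the metric is exposed.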

Prometheus can gather metrics from services, VMs, infrastructure, or third-party applications. Each target exposes its metrics at a /metrics URL, which Prometheus scrapes. The endpoint returns the full list of metrics with their label sets and current values, without any calculation applied.
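
As a minimal sketch of how such an endpoint is scraped, the prometheus.yml fragment below points Prometheus at a single target. The job name, the target address productpage:9080, and the 15-second interval are assumptions for illustration. The metrics_path field is set only to make the path visible, since /metrics is already the default. The trailing comment shows the kind of line the endpoint might return, with a made-up value.

```yaml
scrape_configs:
  - job_name: productpage               # hypothetical job name
    metrics_path: /metrics              # the default path; shown for clarity
    scrape_interval: 15s                # assumed interval
    static_configs:
      - targets: ["productpage:9080"]   # hypothetical host:port

# A scrape of this target might return lines such as (value is made up):
#   http_failure_request{service="productpage"} 27
```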

The syntax for creating a Prometheus alert rule with annotations is as follows:

alert: Lots_Of_product_page_Jobs_In_Queue
expr: sum(jobs_in_queue{service="productpage"}) > 100
for: 15m
labels:
  severity: minor
annotations:
  summary: Product page queue appears to be building up (consistently more than 100 jobs waiting)
  dashboard: https://grafana.monitoring.intra/dashboard/db/productpage-overview
  impact: Product page is experiencing delays, causing orders to be marked as pending
  runbook: https://wiki-internal/runbooks/productpage-queues.html
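
In a rules file, an alert like this does not stand alone: it sits inside a named group under a top-level groups key. The sketch below wraps a trimmed version of the rule above to show that structure. The file name productpage-alerts.yml and the group name are assumptions, not names from the original setup.

```yaml
# productpage-alerts.yml (hypothetical file name)
groups:
  - name: productpage-queues            # hypothetical group name
    rules:
      - alert: Lots_Of_product_page_Jobs_In_Queue
        # Condition: the total number of queued jobs for the service exceeds 100 ...
        expr: sum(jobs_in_queue{service="productpage"}) > 100
        # ... and has stayed true for 15 minutes before the alert fires.
        for: 15m
        labels:
          severity: minor
        annotations:
          summary: Product page queue appears to be building up (consistently more than 100 jobs waiting)
          runbook: https://wiki-internal/runbooks/productpage-queues.html
```

Such a file is listed under rule_files in prometheus.yml, and promtool check rules productpage-alerts.yml can validate it before Prometheus loads it. While the expression is true but the 15 minutes have not yet elapsed, the alert is in the pending state; it moves to firing once the for duration has been met.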