Optimizing alerts

After creating a set of alerts, it is important to re-evaluate them on a regular basis. Such an evaluation typically leads to two kinds of outcome:

  • Changes in the alerting threshold: Evaluating alerts regularly involves looking at how the metric has behaved over time and where the alerting threshold currently sits relative to it. This might lead to the conclusion that the threshold is too low or too high.
  • Removing duplicates: When you look at the alerts raised over the past month or months, you are very likely to identify one or more groups of alerts that are always raised at the same time. For example, the alerts set on a specific web server can be so closely related that they always fire together. A common example is CPU usage and the average response time for an HTTP request; these two often rise at the same time. If this is the case, it is worth considering either removing one of them or downgrading it to a warning only. Duplicate alerts increase the number of items that need an immediate reaction, which puts more pressure on the team without a clear benefit.
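The two checks described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a feature of any particular monitoring product: it assumes the alert history has been exported as (alert name, Unix timestamp) pairs and the metric as a plain list of samples. The function names, the five-minute window, and the 0.9 overlap cut-off are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations


def co_firing_pairs(events, window_seconds=300, min_overlap=0.9):
    """Return alert pairs that almost always fire in the same time window.

    events: iterable of (alert_name, unix_timestamp) tuples (assumed format).
    Overlap is the Jaccard index of the sets of time windows in which
    each alert fired: shared windows divided by all windows of either alert.
    """
    windows = defaultdict(set)
    for name, timestamp in events:
        windows[name].add(int(timestamp // window_seconds))

    pairs = []
    for a, b in combinations(sorted(windows), 2):
        shared = len(windows[a] & windows[b])
        total = len(windows[a] | windows[b])
        overlap = shared / total
        if overlap >= min_overlap:
            pairs.append((a, b, overlap))
    return pairs


def breach_ratio(samples, threshold):
    """Return the fraction of metric samples that exceed the threshold."""
    if not samples:
        return 0.0
    return sum(1 for value in samples if value > threshold) / len(samples)


# Hypothetical alert history: two alerts that fire together, one that does not.
history = [
    ("cpu_high", 0), ("latency_high", 10),
    ("cpu_high", 600), ("latency_high", 650),
    ("disk_full", 2000),
]
duplicates = co_firing_pairs(history)
ratio = breach_ratio([50, 60, 95, 70], threshold=90)
```

A pair reported by co_firing_pairs is a candidate for removal or for downgrading to a warning, not a verdict; the final decision still needs human judgment. A breach ratio close to 0 or close to 1 over a long period suggests that the threshold may be too high or too low. Note that fixed windows can split two alerts that fire a few seconds apart across a window boundary, so the overlap is an approximation.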

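Continuously optimizing the set of alerts not only helps to reduce waste, but also helps to prevent so-called alert fatigue, where a team that receives too many alerts starts to react to them more slowly or to ignore them.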
