Event generation and recovery expression

Trigger events are generated whenever a trigger changes state. A trigger can be in one of the following states:

  • OK: The normal state, when the trigger expression evaluates to false
  • PROBLEM: A problem state, when the trigger expression evaluates to true
  • UNKNOWN: A state when Zabbix cannot evaluate the trigger expression, usually when there is missing data
Refer to Chapter 20, Zabbix Maintenance, for information on how to get notifications about triggers becoming UNKNOWN.

No matter whether a trigger goes from OK to PROBLEM, UNKNOWN, or any other state, an event is generated.
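The rule above can be modeled in a few lines. The following Python sketch is a simplified model that follows the description above; it is not Zabbix code. It treats the trigger as the expression `value > threshold` and uses `None` as a hypothetical stand-in for missing data, which makes the state UNKNOWN. It emits one event per state change, which corresponds to the default (single) PROBLEM event generation mode. The names `evaluate` and `generate_events` are illustrative only.

```python
from enum import Enum
from typing import List, Optional, Tuple


class TriggerState(Enum):
    OK = "OK"
    PROBLEM = "PROBLEM"
    UNKNOWN = "UNKNOWN"


def evaluate(value: Optional[float], threshold: float) -> TriggerState:
    """Evaluate a simple 'value > threshold' expression.

    None is a hypothetical stand-in for missing data, which makes the
    expression impossible to evaluate (UNKNOWN).
    """
    if value is None:
        return TriggerState.UNKNOWN
    return TriggerState.PROBLEM if value > threshold else TriggerState.OK


def generate_events(
    values: List[Optional[float]],
    threshold: float,
    initial: TriggerState = TriggerState.OK,
) -> List[Tuple[int, TriggerState, TriggerState]]:
    """Return (index, old_state, new_state) for every state change."""
    events = []
    state = initial
    for i, value in enumerate(values):
        new_state = evaluate(value, threshold)
        if new_state != state:
            # Any transition, in any direction, produces an event.
            events.append((i, state, new_state))
            state = new_state
    return events


events = generate_events([18, 21, None, 19], threshold=20)
for index, old, new in events:
    print(index, old.value, "->", new.value)
# 1 OK -> PROBLEM
# 2 PROBLEM -> UNKNOWN
# 3 UNKNOWN -> OK
```

The first reading (18) produces no event because the trigger is already in the OK state; only changes are reported.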

This behavior can be customized by setting the PROBLEM event generation mode option in the trigger properties to Multiple. We will discuss this option in Chapter 10, Advanced Item Monitoring.

We saw earlier that certain trigger functions can keep the trigger state from changing after every change in data. Because they accept a time period as a parameter, these functions let us react only if a problem has been going on for a while. But what if we would like to be notified as soon as possible, while still avoiding trigger flapping when values fluctuate near our threshold? Here, a specific Zabbix macro (or variable) helps: it allows us to construct trigger expressions with a form of hysteresis, in which the trigger remembers its previous state.

A common case is measuring temperatures. For example, a very simple trigger expression would read like this:

{server:temp.last()}>20

It would fire when the temperature reached 21 and return to the OK state when it dropped to 20. Temperatures often fluctuate around the threshold, so the trigger keeps switching on and off. This is undesirable, so in versions before 3.2 an improved expression would look like this:
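To make the flapping concrete, here is a small Python sketch that applies the simple expression to a series of readings. The readings are hypothetical, and the sketch assumes the trigger is evaluated once for each new value; `simple_trigger` is an illustrative name, not part of Zabbix.

```python
from typing import List


def simple_trigger(values: List[float], threshold: float = 20) -> List[int]:
    """Evaluate {server:temp.last()}>20 after each new value.

    Returns the trigger value after every check: 1 = PROBLEM, 0 = OK.
    """
    return [1 if v > threshold else 0 for v in values]


# Hypothetical readings hovering around the 20-degree threshold.
temps = [19, 21, 20, 21, 20, 21, 18]
print(simple_trigger(temps))  # [0, 1, 0, 1, 0, 1, 0]
```

Every one-degree wobble across the threshold flips the trigger, and each switch to PROBLEM is a potential notification.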

({TRIGGER.VALUE}=0 and {server:temp.last()}>20) or 
({TRIGGER.VALUE}=1 and {server:temp.last()}>15) 
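To see how `{TRIGGER.VALUE}` provides hysteresis, the following Python sketch emulates the expression above against a series of readings. It rests on simplifying assumptions: one evaluation per new value, a trigger that starts in the OK state, and no UNKNOWN handling. The variable `trigger_value` plays the role of the macro, holding the state from before the current evaluation; the function name is illustrative.

```python
from typing import List


def hysteresis_trigger(values: List[float], high: float = 20, low: float = 15) -> List[int]:
    """Emulate the pre-3.2 expression:

        ({TRIGGER.VALUE}=0 and last()>high) or ({TRIGGER.VALUE}=1 and last()>low)

    Returns the trigger value after every check: 1 = PROBLEM, 0 = OK.
    """
    trigger_value = 0  # stands in for {TRIGGER.VALUE}; the trigger starts in OK
    states = []
    for v in values:
        # In OK, only the upper threshold matters; in PROBLEM, only the lower one.
        fired = (trigger_value == 0 and v > high) or (trigger_value == 1 and v > low)
        trigger_value = 1 if fired else 0
        states.append(trigger_value)
    return states


temps = [19, 21, 20, 21, 17, 16, 15, 14]  # hypothetical readings
print(hysteresis_trigger(temps))  # [0, 1, 1, 1, 1, 1, 0, 0]
```

Once the trigger fires at 21, the dip to 20 no longer resolves it; it returns to OK only when the value is no longer above 15.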

As this was rather complex, Zabbix 3.2 introduced a new, more user-friendly alternative: the Recovery expression. The only thing we need to do now in our trigger is select Recovery expression in the OK event generation option and enter the condition under which the problem should be resolved.

You can also think of this as the trigger having two thresholds: one for the PROBLEM state and one for the OK state. We expect it to switch to the PROBLEM state when the values pass the upper threshold of 20 degrees, but to resolve only when they fall below the lower threshold of 15 degrees.
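The two-threshold behavior can be sketched in Python. The trigger configuration itself is not reproduced in this text, so the expressions below are assumptions consistent with the description: a problem expression of `{server:temp.last()}>20` and a recovery expression of `{server:temp.last()}<15`. The sketch follows the Zabbix rule that a problem is resolved only when the problem expression is false and the recovery expression is true. The function name is illustrative.

```python
from typing import List


def recovery_expression_trigger(
    values: List[float], problem_above: float = 20, recover_below: float = 15
) -> List[int]:
    """Emulate a trigger with OK event generation set to "Recovery expression".

    Assumed expressions:
        problem expression:  {server:temp.last()}>20
        recovery expression: {server:temp.last()}<15

    Returns the trigger value after every check: 1 = PROBLEM, 0 = OK.
    """
    in_problem = False
    states = []
    for v in values:
        problem = v > problem_above
        recovery = v < recover_below
        if problem:
            in_problem = True
        elif in_problem and recovery:
            # Resolved only when the problem expression is false
            # AND the recovery expression is true.
            in_problem = False
        states.append(1 if in_problem else 0)
    return states


temps = [19, 21, 18, 16, 15, 14, 19]  # hypothetical readings
print(recovery_expression_trigger(temps))  # [0, 1, 1, 1, 1, 0, 0]
```

Note one small difference from the older `{TRIGGER.VALUE}` expression: there, the trigger resolves as soon as `last()>15` becomes false, that is, at exactly 15 degrees, whereas `<15` waits for a value below 15. A recovery expression of `<=15` would reproduce the old behavior exactly.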

How does that change the situation when compared to the simple expression that only checked for temperatures over 20 degrees? Let's have a look:

In this example, we have avoided two unnecessary problem states, and that usually means at least two fewer notifications as well. This is another way of preventing trigger flapping.
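The same kind of comparison can be reproduced with a hypothetical series of readings, invented for illustration rather than taken from the original graph. The sketch counts OK-to-PROBLEM transitions for the simple expression and for the two-threshold trigger, assuming the 20 and 15 degree thresholds used above and a trigger that starts in the OK state. All function names are illustrative.

```python
from typing import List


def simple_states(values: List[float], high: float = 20) -> List[bool]:
    # {server:temp.last()}>20 on its own: back to OK as soon as the value is <= 20.
    return [v > high for v in values]


def hysteresis_states(values: List[float], high: float = 20, low: float = 15) -> List[bool]:
    # Enter PROBLEM above `high`; leave it only once the value drops below `low`.
    in_problem, states = False, []
    for v in values:
        if v > high:
            in_problem = True
        elif in_problem and v < low:
            in_problem = False
        states.append(in_problem)
    return states


def problem_events(states: List[bool]) -> int:
    # Count OK -> PROBLEM transitions, assuming the trigger starts in OK.
    previous = [False] + states[:-1]
    return sum(1 for prev, cur in zip(previous, states) if cur and not prev)


temps = [19, 21, 20, 21, 19, 21, 16, 14]  # hypothetical readings
simple = problem_events(simple_states(temps))
smoothed = problem_events(hysteresis_states(temps))
print(simple, smoothed, simple - smoothed)  # 3 1 2
```

With these made-up readings, the simple trigger enters PROBLEM three times and the two-threshold trigger only once, so two problem states (and their notifications) are avoided.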
