Preventing trigger flapping

With the service items and triggers we wrote, the triggers fire right away, as soon as the service is detected as being down. This can be undesirable if we know that a service will be down for a moment during an upgrade, or because of log rotation or backup requirements. We can use a different function to achieve a delayed reaction in such cases. Replacing the last() function with max() allows us to specify a time period as a parameter, and thus react only when the item values have indicated a problem for some time. For the trigger to fire only when a service has not responded for 5 minutes, we could use an expression such as this:

{A test host:net.tcp.service[ssh].max(300)}=0

For this example to work properly, the item interval must not exceed 5 minutes. If the item interval exceeds the trigger function's checking time, only a single value will be checked, making the use of a trigger function such as max() useless.
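To see why the item interval matters, we can imitate the time-based check outside of Zabbix. The following is a minimal Python sketch, not how Zabbix itself evaluates triggers: the function names, the sample data, and the choice to report no problem when the window holds no values are all assumptions made for illustration.

```python
from typing import List, Tuple

# (timestamp in seconds, value) where 1 = service up, 0 = service down
Sample = Tuple[int, int]


def values_in_window(samples: List[Sample], now: int, window: int) -> List[int]:
    """Return the values collected during the last `window` seconds."""
    return [value for ts, value in samples if now - window < ts <= now]


def service_down(samples: List[Sample], now: int, window: int = 300) -> bool:
    """Imitate max(window)=0: a problem only if no value in the window is 1."""
    recent = values_in_window(samples, now, window)
    if not recent:
        # Assumption of this sketch: without any data we report no problem.
        return False
    return max(recent) == 0


# Item interval of 60 seconds: five checks fall into the 5-minute window.
all_failed = [(t, 0) for t in range(60, 301, 60)]
one_success = [(60, 0), (120, 0), (180, 1), (240, 0), (300, 0)]

# Item interval of 600 seconds: only the newest check falls into the window.
slow_item = [(0, 1), (600, 0)]

print(service_down(all_failed, now=300))   # every check in the window failed
print(service_down(one_success, now=300))  # one successful check in the window
print(service_down(slow_item, now=600))    # decided by a single value
```

With the 10-minute interval, the window contains one value only, so the sketch reacts to a single failed check, which is exactly the situation the delayed reaction was meant to avoid.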

Remember that, for functions that accept seconds as a parameter, we can also use the count of returned values by prefixing the number with #, like this:

{A test host:net.tcp.service[ssh].max(#5)}=0 

In this case, the trigger always checks the last five returned values. Such an approach lets the trigger period scale along with the item interval if that interval is changed, but it should not be used for items that can stop sending in data, as the values being checked could then be arbitrarily old.
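The scaling effect of a count-based parameter can be illustrated with another small Python sketch. As before, this only imitates the idea and is not Zabbix code: the helper names and sample data are hypothetical, and waiting until five values exist before reporting a problem is an assumption of the sketch.

```python
from typing import List, Tuple

# (timestamp in seconds, value) where 1 = service up, 0 = service down
Sample = Tuple[int, int]


def last_values(samples: List[Sample], count: int) -> List[Sample]:
    """Return the newest `count` samples (fewer if not enough exist)."""
    return samples[-count:]


def service_down_by_count(samples: List[Sample], count: int = 5) -> bool:
    """Imitate max(#count)=0: a problem only if the last `count` values are all 0."""
    tail = last_values(samples, count)
    if len(tail) < count:
        # Assumption of this sketch: wait until `count` values have been collected.
        return False
    return max(value for _, value in tail) == 0


def covered_seconds(samples: List[Sample], count: int = 5) -> int:
    """Time between the oldest and the newest of the last `count` samples."""
    tail = last_values(samples, count)
    return tail[-1][0] - tail[0][0]


# The same failing service, polled every 60 and every 120 seconds.
every_minute = [(t, 0) for t in range(0, 600, 60)]
every_two_minutes = [(t, 0) for t in range(0, 1200, 120)]

print(covered_seconds(every_minute))       # period spanned with a 60-second interval
print(covered_seconds(every_two_minutes))  # period spanned with a 120-second interval
```

Doubling the item interval doubles the period spanned by the five values, without any change to the expression itself.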

Using trigger functions is the easiest and most widely applied solution to potential trigger flapping. The previous service example checked that the maximum value over the last 5 minutes was 0; thus, we could be sure that there were no values of 1, which would mean that the service was up.

For our CPU load trigger, we used the avg(180) function, checking the average value for the last 3 minutes. We could also have used min(180). In this case, a single drop below the threshold would reset the 3-minute timer, even if the overall average was above the threshold. Which one should you use? That is entirely up to you and depends on the functional requirements. One way is not always better than the others.
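The difference between the two functions is easiest to see with numbers. The following Python sketch compares both approaches on two hypothetical sets of CPU load values collected over 3 minutes; the threshold of 1.0 and the values themselves are made up for illustration and do not come from our actual trigger.

```python
from statistics import mean
from typing import List

THRESHOLD = 1.0  # hypothetical CPU load threshold, used only in this sketch


def avg_trigger(values: List[float], threshold: float = THRESHOLD) -> bool:
    """Imitate avg(180)>threshold: the average of the window must be too high."""
    return mean(values) > threshold


def min_trigger(values: List[float], threshold: float = THRESHOLD) -> bool:
    """Imitate min(180)>threshold: every value in the window must be too high."""
    return min(values) > threshold


# Six values collected over 3 minutes (one every 30 seconds).
brief_dip = [1.5, 1.4, 0.8, 1.6, 1.5, 1.7]  # one value drops below the threshold
sustained = [1.5, 1.4, 1.2, 1.6, 1.5, 1.7]  # all values stay above the threshold

print(avg_trigger(brief_dip), min_trigger(brief_dip))
print(avg_trigger(sustained), min_trigger(sustained))
```

For the first set, the average stays above the threshold, so the avg-based check reports a problem, while the single low value keeps the min-based check quiet. For the second set, both checks report a problem.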
