Summary

We started this chapter by discussing actions. Actions control what is performed when a trigger fires, and they offer a wide range of configuration at various levels: conditions of varying precision, message contents, and the actual operations performed, ranging from simple email sending, through custom scripts, to powerful remote command execution. We also learned about other things affecting actions, such as user media configuration and user permissions.

Let's refresh our memory on what alerting-related concepts are available:

  • Trigger is a problem definition including a severity level, with the trigger expression containing information on calculations and thresholds
  • Event is something happening, such as a trigger changing state from PROBLEM to OK, and so on
  • Action is a configuration entity, with specific sets of conditions that determine when it is invoked and the operations to be performed
  • Operation is an action property that defines what to do if this action is invoked; escalations are configured with the help of operations
  • Alert or notification is the actual thing sent out: an email, SMS, or any other message

In addition to simple one-time messages, we also figured out how the built-in escalations work in Zabbix, and escalated a few problems. While escalations allow us to produce fairly complex response scenarios, it is important to pay attention when configuring them. Once enabled, they allow us to perform different operations, based on how much time has passed since the problem occurred, and other factors. We discussed common issues with notifications, including the fact that users must have permission to view a host to receive notifications about it, and recovery messages only being sent to the users who received the original problem message.
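The way escalation steps map elapsed time to operations can be sketched in a few lines of Python. This is a simplified model, not Zabbix's own implementation: the `Operation` class, the `current_step` and `operations_due` helpers, and the sample operations are all hypothetical, and the sketch assumes a single step duration for the whole action with step 1 starting the moment the problem occurs.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    """Hypothetical stand-in for an action operation with a step range."""
    name: str
    step_from: int
    step_to: int  # in this sketch, 0 means "no upper bound"


def current_step(elapsed_seconds: int, step_duration: int = 3600) -> int:
    """Return the 1-based escalation step for the time elapsed since the problem.

    Assumes every step lasts step_duration seconds.
    """
    return elapsed_seconds // step_duration + 1


def operations_due(operations, step):
    """Return the names of the operations whose step range covers the given step."""
    return [
        op.name
        for op in operations
        if op.step_from <= step and (op.step_to == 0 or step <= op.step_to)
    ]


# Illustrative configuration: admins are notified during the first two steps,
# and the manager only from the third step onward.
ops = [
    Operation("email admins", step_from=1, step_to=2),
    Operation("SMS manager", step_from=3, step_to=0),
]

if __name__ == "__main__":
    for elapsed in (0, 3600, 7200, 36000):
        step = current_step(elapsed)
        print(elapsed, step, operations_due(ops, step))
```

An operation that starts at a later step, like the second one here, is never run if the problem is resolved before that step is reached.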

By now, we have learned of three ways to avoid trigger flapping and the excessive notifications it causes:

  • By using trigger expression functions such as min(), max(), and avg() to fire a trigger only if the values have been within a specific range for a defined period of time
  • By using hysteresis and only returning to the OK state if the current value is some comfort distance below (or above) the threshold
  • By creating escalations that skip the first few steps, thus only sending out messages if a problem has not been resolved for some time

The first two methods are different from the last one. Using different trigger functions and hysteresis changes the way the trigger works, impacting how soon it fires and how soon it turns off again. With escalations, we do not affect the trigger's behavior (so flapping triggers will still show up in Monitoring | Triggers and other locations), but we delay the notification whenever a trigger fires.
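To illustrate how hysteresis changes trigger behavior, the following Python sketch compares a plain threshold with a threshold that has a separate, lower recovery level. It is a simulation only and does not use Zabbix trigger expression syntax; the function names, the threshold of 5, the recovery level of 4, and the sample load values are all assumptions made for the example.

```python
def naive_states(values, threshold):
    """State after each value when the same threshold is used in both directions."""
    return ["PROBLEM" if v > threshold else "OK" for v in values]


def hysteresis_states(values, threshold, recovery):
    """State after each value when recovery requires dropping below a lower level.

    The state goes to PROBLEM when a value exceeds threshold, and returns
    to OK only when a value falls below recovery.
    """
    states = []
    state = "OK"
    for v in values:
        if state == "OK" and v > threshold:
            state = "PROBLEM"
        elif state == "PROBLEM" and v < recovery:
            state = "OK"
        states.append(state)
    return states


def count_changes(states, initial="OK"):
    """Count state changes, each of which would generate an event."""
    changes = 0
    previous = initial
    for s in states:
        if s != previous:
            changes += 1
        previous = s
    return changes


# Made-up CPU load values hovering around the threshold.
loads = [4.8, 5.2, 4.9, 5.1, 4.7, 5.3, 3.9, 3.5]

if __name__ == "__main__":
    print(count_changes(naive_states(loads, 5)))          # 6
    print(count_changes(hysteresis_states(loads, 5, 4)))  # 2
```

With these sample values, the plain threshold changes state six times, while the hysteresis variant changes state twice: once when the load first exceeds 5, and once when it finally drops below 4.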

Finally, we figured out what global scripts are and tried manually pinging a host and obtaining a list of the top CPU-using processes on it. As for action operations, we discussed several ways to react to a problem:

  • Sending an email
  • Running a command (executed either on the Zabbix agent or server)
  • Running an IPMI command
  • Running a command over SSH or Telnet
  • Reusing a global script

The last one allowed us to configure a script once and potentially reconfigure it for all systems in a single location.

When configuring triggers and actions, there are several little things that can both make life easier and introduce hard-to-spot problems. Hopefully, the coverage of the basics here will help you to leverage the former and avoid the latter.

In the next chapter, we will see how we can avoid configuring some of the things we already know, including items and triggers, on each host individually. We will use templates to manage such configurations on multiple hosts easily.
