Taking an action

Items only provide raw data, and triggers are independent of them because they can access virtually any item's historical data. In the same way, triggers only provide a status change, which is recorded as an event just as measurements are recorded as item data. Triggers therefore don't provide any reporting functionality; they just check their conditions and change their status accordingly. Once again, what may look like a limitation turns out to be the exact opposite: the Zabbix component in charge of sending out alerts or trying to resolve problems automatically is completely independent of triggers. Just as triggers can access any item's data, actions can access any trigger's name, severity, or status, so you can create the perfect mix of very general and very specific actions without being stuck in a one-action-per-trigger scheme.

Unlike triggers, actions are also completely independent from hosts and templates. Every action is always globally defined and its conditions checked against every single Zabbix event. As you'll see in the following paragraphs, this may force you to create certain explicit conditions instead of implicit conditions, but that's balanced out by the fact that you won't have to create similar but different actions for similar events just because they are related to different hosts.

An action is composed of the following three different parts that work together to provide all the functionality needed:

  • Action definition
  • Action conditions
  • Action operations

The fact that every action has a global scope is reflected in every one of its components, but it assumes critical importance when it comes to action conditions as it's the place where you decide which action should be executed based on which events. But let's not get ahead of ourselves, and let's see a couple of interesting things about each component.

Defining an action

This is where you decide a name for the action and can define a default message that can be sent as a part of the action itself. In the message, you can reference specific data about the event, such as the host, item, and trigger names, item and trigger values, and URLs. Here, you can leverage the fact that actions are global by using macros so that a single action definition could be used for every single event in Zabbix and yet provide useful information in its message.

You can see a few interesting macros already present in the default message when you create a new action, as shown in the following screenshot:

[Screenshot: Defining an action]

Most of them are pretty self-explanatory, but it's interesting to see how you can, of course, reference a single trigger—the one that generated the event. On the other hand, as a trigger can check multiple items from multiple hosts, you can reference all the hosts and items involved (up to nine different hosts and/or items) so that you can get a picture of what's happening by just reading the message.

Other interesting macros can make the message even more useful and expressive. Just remember that the default message can be sent not only via e-mail, but also via chat or SMS; you'll probably want to create different default actions with different messages for different media types so that you can calibrate the amount of information provided based on the media available.

You can see the complete list of supported macros in the official documentation wiki at https://www.zabbix.com/documentation/2.4/manual/appendix/macros/supported_by_location, so we'll look at just a couple of the most interesting ones.

The {EVENT.DATE} and {EVENT.TIME} macros

These two macros can help you to differentiate between the time a message is sent and the time of the event itself. It's particularly useful not only for repeated or escalated actions, but also for all media where a timestamp is not immediately apparent.

The {INVENTORY.SERIALNO.A} and friends macros

When it comes to hardware failure, information about a machine's location, admin contact, serial number, and so on, can prove quite useful to track it down quickly or to pass it on to external support groups.
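To make the macro mechanism more concrete, here is a minimal Python sketch of how one message template can produce event-specific text. It is not how Zabbix expands macros internally: the `expand` helper and the sample event values are hypothetical and only mimic the visible effect. The macro names themselves are the ones discussed in this section.

```python
import re

# Hypothetical helper: replaces {MACRO} placeholders with values from a dict.
# Zabbix performs this expansion itself; this only imitates the result.
MACRO_PATTERN = re.compile(r"\{([A-Z0-9_.]+)\}")


def expand(template, values):
    """Return template with every known {MACRO} replaced; unknown ones stay as-is."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return MACRO_PATTERN.sub(substitute, template)


# One template shared by every event handled by the action.
TEMPLATE = (
    "{TRIGGER.NAME} on {HOST.NAME1}\n"
    "Event time: {EVENT.DATE} {EVENT.TIME}\n"
    "Serial number: {INVENTORY.SERIALNO.A}"
)

# Made-up sample values for a single event.
event_values = {
    "TRIGGER.NAME": "Disk failure",
    "HOST.NAME1": "db01",
    "EVENT.DATE": "2015.03.01",
    "EVENT.TIME": "04:12:55",
    "INVENTORY.SERIALNO.A": "SN-0001",
}

message = expand(TEMPLATE, event_values)
print(message)
```

The same template would produce a different message for every event, which is what lets a single global action stay informative.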

Defining the action conditions

This part lets you define conditions based on the event's hosts, trigger, and trigger values. Just as with trigger expressions, you can combine different simple conditions with a series of AND/OR logical operators, as shown in the next screenshot. You can either have all AND, all OR, or a combination of the two, where conditions of different types are combined with AND, while conditions of the same type are combined with OR:

[Screenshot: Defining the action conditions]

Observe how one of the conditions is Trigger value = PROBLEM. Since actions are evaluated for every event and since a trigger switching from PROBLEM to OK is an event in itself, if you don't specify this condition the action will be executed both when the trigger switches to PROBLEM and when the trigger switches back to OK. Depending on how you have constructed your default message and what operations you intend to do with your actions, this may very well be what you intended, and Zabbix will behave exactly as expected.

However, if you create a different recovery message in the Action definition form and forget the condition, you'll get two messages when a trigger switches back to OK: one will be the standard message, and one will be the recovery message. This can certainly be a nuisance, as any recovery message would effectively be duplicated, but things can get ugly if you rely on external commands as part of the action's operations. If you forget to specify the condition Trigger value = PROBLEM, the external, remote command would also be executed twice: once when the trigger switches to PROBLEM (this is what you intended) and once when it switches back to OK (this is quite probably not what you intended). Just to be on the safe side, unless you have very specific needs for the action you are configuring, it's probably better to get into the habit of adding Trigger value = PROBLEM to every new action you create, or at least checking whether it's present in the actions you modify.
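The And/Or rule and the effect of the Trigger value condition can be illustrated with a short Python sketch. This is a simplified model, not Zabbix code: the `and_or_matches` function, the condition type names, and the sample events are all hypothetical, and only equality conditions are modeled.

```python
from collections import defaultdict


def and_or_matches(conditions, event):
    """Evaluate conditions with the And/Or rule.

    conditions: list of (condition type, expected value) pairs.
    event: dict mapping condition type to the event's actual value.
    Conditions of the same type are OR-ed; the per-type results are AND-ed.
    """
    by_type = defaultdict(list)
    for cond_type, expected in conditions:
        by_type[cond_type].append(event.get(cond_type) == expected)
    # With no conditions at all, this is True: the action fires for every event.
    return all(any(results) for results in by_type.values())


with_value_check = [
    ("host_group", "DB Instances"),
    ("host_group", "Web Servers"),
    ("trigger_value", "PROBLEM"),
]
without_value_check = with_value_check[:2]

problem_event = {"host_group": "DB Instances", "trigger_value": "PROBLEM"}
recovery_event = {"host_group": "DB Instances", "trigger_value": "OK"}

for name, conds in [("with", with_value_check), ("without", without_value_check)]:
    print(name, "trigger value condition:",
          "PROBLEM ->", and_or_matches(conds, problem_event),
          "| OK ->", and_or_matches(conds, recovery_event))
```

In this model, the action without the trigger value condition matches both the PROBLEM event and the recovery event, which is exactly the situation in which operations end up being executed twice.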

The most typical reason to create different actions with different conditions is to send alert and recovery messages to different recipients. This is the part where you should remember that actions are global.

Let's say that you want all the database problems sent over to the database administrators group and not the default Zabbix administrators group. If you just create a new action with the condition that the host group must be DB Instances and, as message recipients, choose your DB admins, they will certainly receive a message for any DB-related event, but so will your Zabbix admins if the default action has no conditions configured. The reason is that since actions are global, they are always executed whenever their conditions evaluate to True. In this case, both the specific action and the default one would evaluate to True, so both groups would receive a message. What you could do is add an opposite condition to the default action so that it would be valid for every event except those related to the DB Instances host group. The problem is that this approach can quickly get out of control, and you may find yourself with a default action full of "not in group" conditions. The truth is, once you start creating actions specific to message recipients, you either disable the default action or take advantage of it to populate a message archive for administration and reporting purposes.
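The following Python sketch models the situation just described. It is only an illustration under simplifying assumptions: actions are represented as plain predicates on an event, and the action names, the `fired_actions` helper, and the sample event are hypothetical.

```python
def is_db_event(event):
    """Condition of the specific action: host group = DB Instances."""
    return event["host_group"] == "DB Instances"


def always(event):
    """Condition of a default action with no conditions configured."""
    return True


# Every action is global: each one is checked against every event.
actions = {
    "Notify DB admins": is_db_event,
    "Default: notify Zabbix admins": always,
}


def fired_actions(event, actions):
    """Return the names of all actions whose condition is True for the event."""
    return [name for name, condition in actions.items() if condition(event)]


db_event = {"host_group": "DB Instances", "trigger": "Tablespace almost full"}
print(fired_actions(db_event, actions))

# Adding the opposite condition to the default action removes the duplicate,
# at the price of one more exclusion for every specific action you create.
actions_with_exclusion = dict(actions)
actions_with_exclusion["Default: notify Zabbix admins"] = (
    lambda event: not is_db_event(event)
)
print(fired_actions(db_event, actions_with_exclusion))
```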

Starting with Zabbix 2.4, there is another supported way of calculating action conditions. As you can easily imagine, the And/Or type of calculation suffers from some limitations. For example, with two groups of conditions of the same type, you can't use AND within one group and OR within the other. If you take a look at the available options for calculating the action conditions, you can see that you can now also choose the Custom expression option, as shown in the following screenshot:

[Screenshot: Defining the action conditions, Custom expression option]

This option lets us write custom formulas in which each letter stands for one condition, such as:

  • (A and B) and (C or D)
  • (A and B) or (C and D)

You can also nest the logical operators, as in this example:

  • ((A or B) and C) or D

This opens quite a few interesting scenarios of usage, bypassing the previous limitations.
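As a rough illustration of how such a formula combines individual condition results, here is a small Python sketch. It is not the Zabbix evaluator: the `evaluate` function is hypothetical, it accepts only single uppercase labels, `and`, `or`, and brackets, and it relies on Python's own `and`/`or` semantics after checking the input against a whitelist.

```python
import re

# Only whitespace, brackets, "and", "or" and single uppercase labels are allowed.
ALLOWED = re.compile(r"^(?:\s|\(|\)|\band\b|\bor\b|\b[A-Z]\b)+$")


def evaluate(expression, results):
    """Evaluate a custom expression given each condition's result.

    expression: formula such as "((A or B) and C) or D".
    results: dict mapping each label to True or False.
    """
    if not ALLOWED.match(expression):
        raise ValueError("unsupported token in expression: %r" % expression)
    # The whitelist above guarantees that only labels, and/or and brackets
    # reach eval; builtins are disabled as an extra precaution.
    return bool(eval(expression, {"__builtins__": {}}, dict(results)))


results = {"A": False, "B": True, "C": True, "D": False}
for formula in ("(A and B) and (C or D)",
                "(A and B) or (C and D)",
                "((A or B) and C) or D"):
    print(formula, "->", evaluate(formula, results))
```

With the sample results above, only the last formula evaluates to True, because B and C are both satisfied while A and D are not.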

Choosing the action operations

If the first two parts were just preparation, this is where you tell the action what it should actually do. The following are the two main aspects to this:

  • Operation steps
  • The actual operations available for each step

As with almost everything in Zabbix, the simplest cases are straightforward and mostly self-explanatory: you have a single step that sends the default message to a group of defined recipients. This simple scenario can also grow increasingly complex and sophisticated, while remaining manageable, depending on your specific needs. Let's see a few interesting details about each part.

Steps and escalations

Even if an action is tied to a single event, it does not mean that it can perform only a single operation. In fact, it can perform an arbitrary number of operations called steps, which can even go on for an indefinite amount of time or until the conditions for performing the action are not valid anymore.

You can use multiple steps both to send messages and to perform automated operations. Alternatively, you can use the steps to send alert messages to different groups, or even multiple times to the same group, at the time intervals that you want, for as long as the event is unacknowledged or not yet resolved. The following screenshot shows a combination of different steps:

[Screenshot: Steps and escalations]

As you can see, step 1 starts immediately, is set to send a message to a user group, and then delays the subsequent step by just 1 minute. After 1 minute, step 2 starts and is configured to perform a remote command on the host. As step 2 has the default duration (which is defined in the main Action definition tab), step 3 will start after about an hour. Steps 3, 4, and 5 are all identical and have been configured together: they will send a message to a different user group every 10 minutes. You can't see it in the preceding screenshot, but step 6 will only be executed if the event is not yet acknowledged, and so will step 7, which is still being configured.

The other interesting bit of step 7 is that it's actually set to cover steps 7 to 0. It may seem counterintuitive, but in this case, step 0 simply means forever. You can't really have further steps after a step N to 0, because it will keep repeating itself at the interval set in the step's Duration(sec) field. Be very careful in using step 0 because it will really go on until the trigger's status changes. Even then, if you didn't add a Trigger value = PROBLEM condition to your action, step 0 can be executed even if the trigger has switched back to OK. In fact, it's probably best never to use step 0 at all unless you really know what you are doing.
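The timing of the first five steps can be worked out with a few lines of Python. This is a sketch under stated assumptions: the default step duration is taken to be 3600 seconds (the "about an hour" mentioned above), a duration of 0 is treated as "use the action default", and the `step_start_times` helper and the tuple layout are hypothetical, not part of Zabbix.

```python
DEFAULT_STEP_DURATION = 3600  # seconds; assumed action-level default

# (first step, last step, duration in seconds; 0 means "use the action default")
operations = [
    (1, 1, 60),   # message to a user group, next step after 1 minute
    (2, 2, 0),    # remote command on the host, default duration
    (3, 5, 600),  # message to a different user group every 10 minutes
]


def step_start_times(operations, default_duration):
    """Return {step number: seconds after the event at which the step starts}."""
    durations = {}
    for first, last, duration in operations:
        for step in range(first, last + 1):
            durations[step] = duration or default_duration
    starts = {}
    elapsed = 0
    for step in sorted(durations):
        starts[step] = elapsed
        elapsed += durations[step]
    return starts


for step, start in step_start_times(operations, DEFAULT_STEP_DURATION).items():
    print("step %d starts %d s (%.0f min) after the event" % (step, start, start / 60))
```

Under these assumptions, step 2 starts after 60 seconds, step 3 after 3,660 seconds (61 minutes), and steps 4 and 5 follow at 10-minute intervals.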

Messages and media

For every message step, you can choose to send the default message that you configured in the first tab of the Action creation form or send a custom message that you can craft in exactly the same way as the default one. You might want to add more details about the event if you are sending the message via e-mail to a technical group. On the other hand, you might want to reduce the amount of details or the words in the message if you are sending it to a manager or supervisor or if you are limiting the message to an SMS.

Remember that in the Action operation form, you can only choose Zabbix users and groups as recipients; you still have to specify a media address for every medium through which each user is reachable. This is done in the Administration tab of the Zabbix frontend by adding media instances for every single user. Also keep in mind that every media channel can be enabled or disabled for a user, and that it may be active only during certain hours of the day or only for one or more specific trigger severities, as shown in the following screenshot:

[Screenshot: Messages and media]

This means that even if you configure an action to send a message, some recipients may still not receive it based on their own media configuration.
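The filtering applied by a user's media configuration can be sketched in Python as follows. This is a simplified model rather than Zabbix's own logic: the `UserMedia` class, its field names, and the `will_receive` function are hypothetical, and the active period is reduced to a single weekday range plus a single time range.

```python
from dataclasses import dataclass
from datetime import datetime, time

ALL_SEVERITIES = frozenset(
    {"Not classified", "Information", "Warning", "Average", "High", "Disaster"}
)


@dataclass
class UserMedia:
    """One media entry of a user (hypothetical, simplified model)."""
    address: str
    enabled: bool = True
    first_day: int = 1                      # ISO weekday, 1 = Monday
    last_day: int = 7                       # ISO weekday, 7 = Sunday
    active_from: time = time(0, 0)
    active_until: time = time(23, 59, 59)
    severities: frozenset = ALL_SEVERITIES


def will_receive(media, severity, when):
    """Return True if a message of this severity sent at `when` gets through."""
    if not media.enabled:
        return False
    if severity not in media.severities:
        return False
    if not media.first_day <= when.isoweekday() <= media.last_day:
        return False
    return media.active_from <= when.time() <= media.active_until


# Reachable on working days, 09:00-18:00, for High and Disaster only.
oncall = UserMedia(
    address="oncall@example.com",
    last_day=5,
    active_from=time(9, 0),
    active_until=time(18, 0),
    severities=frozenset({"High", "Disaster"}),
)

monday_morning = datetime(2015, 3, 2, 10, 0)
print(will_receive(oncall, "High", monday_morning))
print(will_receive(oncall, "Warning", monday_morning))
```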

While Email, Jabber, and SMS are the default options to send messages, you still need to specify how Zabbix is supposed to send them. Again, this is done in the Media types section of the Administration tab of the frontend. You can also create new media types there that will be made available both in the media section of user configuration and as targets to send messages to in the Action operations form.

A new media type can be a different e-mail, Jabber, or SMS server, which is useful if you have more than one server and need to use them for different purposes or with different sender identifications. It can also be a script, and this is where things can become interesting, if not potentially misleading.

A custom media script has to reside on the Zabbix server in the directory indicated by the AlertScriptsPath parameter of zabbix_server.conf. When called upon, it is executed with the following three parameters passed by the server:

  • $1: The recipient of the message
  • $2: The subject of the message
  • $3: The body of the main message

The recipient will be taken from the appropriate user-media property that you defined for your users while creating the new media type. The subject and the message body will be the default ones configured for the action or some step-specific ones, as explained before. From Zabbix's point of view, the script should then send the message to the recipient by whatever custom method you intend to use, whether it's an old UUCP link, a modern mail server that requires strong authentication, or a post to an internal microblogging server. The fact is that you can do whatever you want with the message: you can simply log it to a directory, send it to a remote file server, morph it into a syslog entry and send it over to a log server, run a speech synthesis program on it and read it aloud on some speakers, or record a message on an answering machine. As with every custom solution, the sky's the limit with custom media types. This is why you should not confuse custom media with the execution of a remote command: while you could potentially obtain roughly the same results with one or the other, custom media scripts and remote commands are really two different things.
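As an illustration of the simplest case mentioned above, logging the message, here is a minimal sketch of a custom media script written in Python. It assumes only what the text states: Zabbix passes the recipient, the subject, and the body as three positional parameters. The log path and the function names are hypothetical, and a real script would need error handling suited to your environment.

```python
#!/usr/bin/env python3
import sys
from datetime import datetime

LOG_PATH = "/tmp/zabbix_alerts.log"  # hypothetical destination; adjust as needed


def format_entry(recipient, subject, body, timestamp):
    """Build a single-line log record from the three parameters Zabbix passes."""
    one_line_body = " | ".join(
        line.strip() for line in body.splitlines() if line.strip()
    )
    return "{} to={} subject={!r} body={!r}".format(
        timestamp.isoformat(), recipient, subject, one_line_body
    )


def deliver(recipient, subject, body, log_path=LOG_PATH):
    """'Send' the message by appending it to a log file."""
    entry = format_entry(recipient, subject, body, datetime.now())
    with open(log_path, "a") as log_file:
        log_file.write(entry + "\n")


if __name__ == "__main__":
    # sys.argv[1], [2], [3] correspond to $1, $2, $3 in the list above.
    if len(sys.argv) == 4:
        deliver(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        sys.stderr.write("usage: script <recipient> <subject> <body>\n")
```

Replacing the body of `deliver` is all it takes to turn the same script into a syslog forwarder or a post to an internal service, which is why a custom media type is best thought of as a delivery channel rather than a corrective command.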

Remote commands

These are normally used to try to perform corrective actions in order to resolve a problem without human intervention. After you've chosen the target host that should execute the command, the Zabbix server will connect to it and ask it to run the command. If you are using the Zabbix agent as a communication channel, you'll need to set EnableRemoteCommands to 1 in the agent's configuration file, or the agent will refuse to execute any command. Other possibilities include SSH, Telnet, and IPMI (if you compiled in the relevant options during server installation).

Remote commands can be used to do almost anything: kill or restart a process, make space on a filesystem by zipping or deleting old files, reboot a machine, and so on. They seem powerful and exciting to new implementers, but in the authors' experience, they are fragile solutions that break things almost as often as they fix them. It's harder than it looks to make them run safely without accidentally deleting files or rebooting servers when there's no need to. The real problem with remote commands is that they tend to hide problems instead of revealing them, and revealing problems is the real job of a monitoring system. Yes, they can prove useful as a quick patch to ensure the smooth operation of your services, but use them too liberally and you'll quickly forget that there are recurring problems that need to be addressed, because some fragile command somewhere is trying to fix things in the background for you. It's usually better to actually solve a problem than to hide it behind an automated temporary fix. This is not just a philosophical point: when these patches fail, they tend to fail spectacularly and with disastrous consequences.

So, our advice is that you use remote commands very sparingly and only if you know what you are doing.
