Defining an escalation for repeated notifications

In this recipe, we'll learn how to arrange a Nagios Core configuration such that after a certain number of repetitions, notifications for problems on hosts or services are escalated to another contact, instead of (or in addition to) the normally defined contact. This is done by defining a separate object type called a host or service escalation.

This kind of setup is useful for alerting more senior networking staff to an unresolved problem that a less experienced person is struggling to fix. It can also act as a "safety valve": if a host problem remains unfixed, its notifications eventually reach someone else.

Getting ready

You should have a Nagios Core 4.0 or newer server with at least one host or service already configured. You also need at least two contact groups: one for the first few notifications, and one for the escalations. You should understand how notifications are generated and sent to the contacts and contact_groups defined for hosts or services.

We'll use the example of a host called sparta.example.net, which normally sends notifications to a group called ops. We'll arrange for all the notifications after the fourth one to also be sent to a contact group called emergency.

How to do it...

We can configure an escalation for our host or service as follows:

  1. Change to the objects configuration directory for Nagios Core. The default is /usr/local/nagios/etc/objects. If your host is defined in a file somewhere else, change to that file's directory instead.
    # cd /usr/local/nagios/etc/objects
    
  2. Edit the file containing the definition for the host. The definition might look like this:
    define host {
        use                    linux-server
        host_name              sparta.example.net
        alias                  sparta
        address                192.0.2.21
        contact_groups         ops
        notification_period    24x7
        notification_interval  10
    }
  3. Underneath the host definition, add the configuration for a new hostescalation object:
    define hostescalation {
        host_name              sparta.example.net
        contact_groups         ops,emergency
        first_notification     5
        last_notification      0
        notification_interval  10
    }
  4. Validate the configuration and reload the Nagios Core server:
    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    # /etc/init.d/nagios reload
    

With this done, any problem on the host that generates notifications will have every notification after the fourth sent to both the ops contact group and the emergency contact group. This widens the circle of people or systems contacted and makes it more likely that someone actually deals with the problem. Perhaps someone on the ops team has misplaced their pager.

How it works...

The configuration added in the preceding section is best thought of as a special case or override for a particular host, in that it specifies a range of notifications that should be sent to a different set of contact groups. It can be broken down as follows:

  • host_name: The same value of host_name given for the host in its definition. We specified sparta.example.net.
  • contact_groups: A comma-separated list of the contact groups that should receive notifications matching this escalation. We list both the ops group and the emergency group so that matching notifications go to both.
  • first_notification: The number of the first notification this escalation applies to. We chose the fifth notification.
  • last_notification: The number of the last notification this escalation applies to. We set this to zero, which means that every notification from first_notification onwards is sent to the nominated contacts or contact groups. The notifications will not stop until the problem is acknowledged, notifications are manually turned off, or the problem is fixed.
  • notification_interval: Like the host and service directives of the same name, this sets how long Nagios Core waits before re-sending a notification while the host remains in a problem state. We chose ten minutes. The value is counted in units of interval_length, which defaults to 60 seconds.

Individual contacts can also (or instead) be specified with the contacts directive, rather than contact_groups.
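The following minimal Python sketch illustrates how first_notification and last_notification select notifications. It is a simplified model written for this recipe, not Nagios Core's own code. The function name escalation_matches is hypothetical, and the 5-to-7 range is an invented example.

```python
def escalation_matches(notification_number: int,
                       first_notification: int,
                       last_notification: int) -> bool:
    """Return True if a notification number falls inside an escalation's range.

    Simplified model of the first_notification/last_notification directives;
    a last_notification of 0 is treated as "no upper bound".
    """
    if notification_number < first_notification:
        return False
    return last_notification == 0 or notification_number <= last_notification


# The recipe's escalation: first_notification 5, last_notification 0.
print([n for n in range(1, 9) if escalation_matches(n, 5, 0)])
# A hypothetical bounded range covering notifications 5 to 7 only.
print([n for n in range(1, 9) if escalation_matches(n, 5, 7)])
```

The first call prints [5, 6, 7, 8] and the second prints [5, 6, 7]. The open-ended range keeps matching no matter how high the notification count climbs.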

There's more...

The preceding escalation keeps sending notifications to the original ops group as well as to the members of the emergency group. This matters because Nagios Core notifies only the contacts named in the escalation when an escalation applies, not the host's own contact_groups. If we had listed only emergency, the ops team would stop hearing about the problem. Escalating this way is generally a good idea, because the point of an escalation is to widen the reach of notifications when a problem is not being dealt with, not merely to hand the problem to a different group of people.
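The reason for repeating ops in the escalation can be modelled in a few lines of Python. This sketch assumes the behaviour described in the Nagios Core documentation: when at least one escalation matches a notification, only the escalation's contact groups are notified; otherwise the host's own contact_groups are used. The Escalation class, the recipients function, and the emergency_only variant are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    contact_groups: tuple[str, ...]
    first_notification: int
    last_notification: int  # 0 means no upper bound


def recipients(n: int, host_contact_groups: list[str],
               escalations: list[Escalation]) -> set[str]:
    """Contact groups that receive notification number n (simplified model)."""
    matching = [
        e for e in escalations
        if n >= e.first_notification
        and (e.last_notification == 0 or n <= e.last_notification)
    ]
    if not matching:
        # No escalation applies, so the host's own contact_groups are used.
        return set(host_contact_groups)
    # Escalated: only the groups named in the matching escalations are used.
    groups: set[str] = set()
    for e in matching:
        groups.update(e.contact_groups)
    return groups


host_groups = ["ops"]
with_ops = [Escalation(("ops", "emergency"), 5, 0)]
emergency_only = [Escalation(("emergency",), 5, 0)]  # hypothetical alternative

for n in (4, 5):
    print(n, sorted(recipients(n, host_groups, with_ops)),
          sorted(recipients(n, host_groups, emergency_only)))
```

For notification 4, both variants reach only ops. From notification 5 onwards, the recipe's escalation reaches emergency and ops, but the emergency-only variant silently drops ops.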

This principle applies to stacking escalations as well. If we had a contact group containing all of our contacts, perhaps named everyone, we could define a second escalation that sends every notification from the tenth onwards to every single contact:

define hostescalation {
    host_name              sparta.example.net
    contact_groups         everyone
    first_notification     10
    last_notification      0
    notification_interval  10
}

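We can define several escalations for the same host, and their notification ranges may overlap, so that more than one escalation applies to the same notification. In that case, Nagios Core notifies the contact groups from all matching escalations and uses the smallest notification_interval among them.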
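The following Python sketch models how overlapping escalations could be combined. It assumes the rules the Nagios Core documentation gives for overlapping ranges: the contact groups of every matching escalation are notified, and the smallest notification_interval among them is used. The names Escalation and combine are hypothetical. The second escalation's 5-minute interval is an invented value; the example above uses 10.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Escalation:
    contact_groups: tuple[str, ...]
    first_notification: int
    last_notification: int      # 0 means no upper bound
    notification_interval: int  # in minutes, assuming the default interval_length

    def matches(self, n: int) -> bool:
        return n >= self.first_notification and (
            self.last_notification == 0 or n <= self.last_notification)


def combine(n: int, escalations: list[Escalation]) -> Optional[tuple[list[str], int]]:
    """Merge all escalations matching notification n: union of groups, smallest interval."""
    matching = [e for e in escalations if e.matches(n)]
    if not matching:
        return None  # no escalation applies; the host's own settings are used
    groups = set().union(*(e.contact_groups for e in matching))
    interval = min(e.notification_interval for e in matching)
    return sorted(groups), interval


escalations = [
    Escalation(("ops", "emergency"), 5, 0, 10),
    # The everyone escalation, with a hypothetical interval of 5 minutes
    # so that the smallest-interval rule is visible.
    Escalation(("everyone",), 10, 0, 5),
]

for n in (4, 5, 10):
    print(n, combine(n, escalations))
```

Notification 4 matches nothing. Notification 5 goes to emergency and ops every 10 minutes. From notification 10, all three groups are notified, and the shorter 5-minute interval takes over.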

With a little arithmetic, you can arrange for an escalation to take effect once a host or service has been in a problem state for a certain length of time. For example, the escalation in this recipe applies about 40 minutes after the first notification. With a notification_interval of 10 minutes, notifications one through five go out at roughly 0, 10, 20, 30, and 40 minutes.
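Two small Python helpers can capture this arithmetic. This is a sketch under simplifying assumptions: the first notification is sent at t=0 (no first notification delay); the interval stays constant across escalations, which holds here because the host and its escalations all use 10 minutes; interval_length is the default 60 seconds; and check latency is ignored. Both function names are hypothetical.

```python
def minutes_until_notification(n: int, interval_minutes: int) -> int:
    """Approximate minutes between the first notification and notification n."""
    # Notification 1 goes out at t=0; each later one follows one interval after the last.
    return (n - 1) * interval_minutes


def notification_number_for(minutes: int, interval_minutes: int) -> int:
    """Smallest notification number sent at least `minutes` after the first one."""
    # Integer ceiling division, then +1 because notification 1 is at t=0.
    return (minutes + interval_minutes - 1) // interval_minutes + 1


print(minutes_until_notification(5, 10))   # the ops,emergency escalation
print(minutes_until_notification(10, 10))  # the everyone escalation
print(notification_number_for(60, 10))     # first_notification for "after an hour"
```

Under these assumptions, the three calls print 40, 90, and 7. At this interval, an escalation meant to start after about an hour would therefore use a first_notification of 7.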

Service escalations work much the same way as host escalations. The only difference is that you must identify the service by its service_description as well as its host name. An escalation for a service check called HTTP on sparta.example.net that does the same thing as the escalation in this recipe would look like this:

define serviceescalation {
    host_name              sparta.example.net
    service_description    HTTP
    contact_groups         ops,emergency
    first_notification     5
    last_notification      0
    notification_interval  10
}

See also

  • The Creating a new contact group section in Chapter 1, Understanding Hosts, Services, and Contacts
  • The Configuring notifications for groups section in this chapter
  • The Defining a custom notification method section in this chapter