Customizing notification behavior

We looked at the basic and default notification configuration provided by Icinga. There are a number of customizations possible on top of it, some of which are covered in this section.

Service definitions

Service objects offer some directives to control or specify whether, which, and how the notifications should be sent out.

  • The notifications_enabled directive enables (1) or disables (0) notifications for a service check. Disabling is useful for extremely noncritical checks, for which we don't need notifications. The default value is 1, so the directive only needs to be set (to 0) when notifications should be disabled.
  • The notification_options directive determines what kind of notifications should be sent for the service check. The notification options that can be specified as the value are as follows:
    • w = WARNING
    • u = UNKNOWN
    • c = CRITICAL
    • r = RECOVERY

    So, if we don't want to receive the WARNING and UNKNOWN notifications, we can set this directive to c,r in the service definition, and only the CRITICAL and RECOVERY notifications will be sent.

  • The first_notification_delay directive determines the time to wait before sending out the first notification after a service check enters a non-OK state. The value is a number of time units, and the length of a time unit is defined by the interval_length directive in icinga.cfg, which defaults to 60 seconds. This is useful if we expect intermittent problems that recover on their own within a known time; no notification is needed if the service actually recovers within that time. The default value is 0, which means Icinga should send out a notification immediately.
  • The notification_interval directive determines the time after which a contact should be notified again that the service is still in the non-OK state. The value is again the number of time units. The service template has this set to 60; it means that the contacts will be renotified about the problem if a service check stays in the non-OK state for more than 60 minutes (or 1 hour). To disable such reminder notifications, set this value to 0.
  • The notification_period directive restricts notifications to particular times of the day, week, month, or year. Its value is the name of an Icinga timeperiod object, which defines periods of time such as selected days of the week or selected hours of the day. The service template sets this directive to the 24x7 timeperiod object. Let's look at its definition (timeperiods.cfg):
    define timeperiod {
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
    }

    The preceding timeperiod object defines an all-time time period for all days in a week and all hours in a day. This means the notifications can be sent out for the said service at all times.

    Similarly, we can restrict the times at which alerts for the service are sent by defining a more restrictive timeperiod object and naming it in the notification_period directive of the service definition. A timeperiod object named workhours is already defined in the default configuration; let's have a quick look at it for clarity:

    define timeperiod {
        timeperiod_name workhours
        alias           Normal Work Hours
        monday          09:00-17:00
        tuesday         09:00-17:00
        wednesday       09:00-17:00
        thursday        09:00-17:00
        friday          09:00-17:00
    }

    The preceding timeperiod object defines normal work hours, that is, only 9 a.m. to 5 p.m. on weekdays. If this time period is used in any of the service checks, no notifications will be sent for the service checks outside of these work hours.
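
As a minimal sketch, the service directives above could be combined in a single service definition like the following. The generic-service template name and the check_http command are assumptions based on the sample configuration; substitute whatever your setup defines. The values are illustrative only.

```
define service {
    use                       generic-service   ; assumed name of the service template
    host_name                 localhost
    service_description       HTTP
    check_command             check_http        ; assumed check command
    notifications_enabled     1                 ; 1 is the default; 0 disables notifications
    notification_options      c,r               ; only CRITICAL and RECOVERY
    first_notification_delay  5                 ; wait 5 time units before the first notification
    notification_interval     60                ; renotify every 60 time units; 0 disables reminders
    notification_period       workhours         ; restrict notifications to work hours
}
```

With interval_length left at its default of 60 seconds, this service would send its first notification 5 minutes after entering the CRITICAL state, and only on weekdays between 9 a.m. and 5 p.m.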

Contact definitions

Notification options can also be customized inside contact definitions. Let's look at various available directives:

  • The host_notifications_enabled and service_notifications_enabled directives can be used to enable (1) or disable (0) host and service notifications for individual contacts. The default value is 1. The values should be set to 0 to explicitly disable notifications for the contact.
  • The host_notification_period and service_notification_period directives, again, name the Icinga timeperiod objects within which notifications should be sent to this particular contact. This is useful if, for example, the on-duty person should receive notifications 24x7 while the boss should receive them only during work hours.
  • We have already looked at host_notification_commands and service_notification_commands. Note that the value of these directives can be a comma-separated list of command names if we want multiple commands to be executed for notifications.
  • The email directive, as seen in the earlier sections, is used to specify the e-mail address of the contact. This is accessible using the $CONTACTEMAIL$ Icinga macro in command objects.
  • The pager directive is used to specify the mobile number of the contact, which is available via the $CONTACTPAGER$ Icinga macro in command objects.
  • The addressx directives are used to specify the other miscellaneous addresses for the contact (Jabber ID and so on). The value of x ranges from 1 through 6, each of which is available via the $CONTACTADDRESSn$ macros, where again n ranges from 1 through 6.
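
The contact directives above might be put together as in the following sketch. The contact name, addresses, and the notify-service-by-sms command (including the /usr/local/bin/send_sms script it calls) are hypothetical; notify-host-by-email and notify-service-by-email are assumed to exist as in the sample configuration.

```
define contact {
    contact_name                    boss                      ; hypothetical contact
    alias                           Team Manager
    host_notifications_enabled      1
    service_notifications_enabled   1
    host_notification_period        workhours                 ; the boss is notified only during work hours
    service_notification_period     workhours
    host_notification_options       d,r
    service_notification_options    c,r
    host_notification_commands      notify-host-by-email
    service_notification_commands   notify-service-by-email,notify-service-by-sms
    email                           boss@example.com          ; available as $CONTACTEMAIL$
    pager                           +15550100                 ; available as $CONTACTPAGER$
    address1                        boss@jabber.example.com   ; available as $CONTACTADDRESS1$
}

# Hypothetical SMS command; the send_sms script is not part of Icinga.
define command {
    command_name    notify-service-by-sms
    command_line    /usr/local/bin/send_sms "$CONTACTPAGER$" "$NOTIFICATIONTYPE$: $SERVICEDESC$ on $HOSTNAME$ is $SERVICESTATE$"
}
```

Because service_notification_commands lists two commands, both the e-mail and the SMS command would be run for each service notification sent to this contact.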

The host/service escalation

Escalation basically means that if person1 receives an alert and the problem persists for x minutes, a notification should be sent (escalated) to person2.

Escalation is another aspect of notification configuration and can be done using the hostescalation and serviceescalation Icinga objects. These objects are used to define escalation paths to various people. This path can go on as long as we have configured it.

Escalations rely on a notification_interval being defined in the service definition, so that contacts are renotified after the defined interval. The escalation logic kicks in when a serviceescalation object is defined; in that object, we define the contacts that should be notified at the nth notification.

For example, we have three contacts defined: onduty, techlead, and manager. We want the onduty contact to be notified immediately, the techlead contact to be notified if the check does not recover within 15 minutes and again if it does not recover within the next 15 minutes (30 minutes from the start), and finally the manager contact to be notified if it does not recover within the next 15 minutes (45 minutes from the start). For such a scenario, we need to set the notification_interval directive to 15 (assuming the length of a time unit is one minute in the main configuration) in the service object definition, so that Icinga retriggers the notification every 15 minutes until the check recovers. We also need to define two serviceescalation objects as follows (we will use the localhost host and the HTTP service as examples; this can be used in the same way for remote hosts too):

define serviceescalation {
    host_name               localhost
    service_description     HTTP
    first_notification      2
    last_notification       3
    contacts                techlead
}

define serviceescalation {
    host_name               localhost
    service_description     HTTP
    first_notification      4
    last_notification       0
    contacts                manager
}

The preceding escalation definitions have the first_notification and last_notification directives, which determine the number of notifications for which the given escalation is valid. The first_notification directive specifies n for nth notification, with which this escalation becomes valid. The last_notification directive specifies n for nth notification, with which the escalation becomes ineffective. Escalation being valid implies that the contacts specified in the escalation will be notified. When the service goes CRITICAL, the onduty contact is immediately notified as the first notification.

For the first escalation definition, the escalation becomes valid with the second notification (first_notification is 2) for the service and becomes ineffective after the third notification (last_notification is 3) has been sent out. So, the second and the third notifications will be sent to the techlead contact after 15 and 30 minutes, respectively.

For the second escalation definition, the escalation becomes valid with the fourth notification (first_notification is 4) being sent out for the service and never becomes ineffective (last_notification is 0). So the fourth notification (after 45 minutes) will go to the manager contact, who will then be notified every 15 minutes until the service check recovers.
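
For these escalations to follow the schedule described, the service itself needs the onduty contact and a 15-unit notification interval. A minimal sketch of such a service definition follows; the generic-service template and the check_http command are assumptions, and interval_length is assumed to be 60 seconds.

```
define service {
    use                     generic-service   ; assumed name of the service template
    host_name               localhost
    service_description     HTTP
    check_command           check_http        ; assumed check command
    contacts                onduty            ; receives the first notification
    notification_interval   15                ; renotify every 15 time units (minutes here)
}
```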

The following table puts together a timeline for this example:

Timestamp        | Service State | n (nth notification) | Explanation
-----------------|---------------|----------------------|------------
05:00            | CRITICAL      | 1                    | The onduty contact is immediately notified.
05:15            | CRITICAL      | 2                    | First serviceescalation kicks in (first_notification is 2) and the techlead contact is notified.
05:30            | CRITICAL      | 3                    | The techlead contact is notified again and first serviceescalation ends (last_notification is 3).
05:45 and onward | CRITICAL      | 4                    | Second serviceescalation kicks in (first_notification is 4), the manager contact is notified and will be notified every 15 minutes until the service recovers, as this escalation never ends (last_notification is 0).

A similar configuration can be made for host escalations too:

define hostescalation {
    host_name               localhost
    first_notification      2
    last_notification       3
    contacts                techlead
}

define hostescalation {
    host_name               localhost
    first_notification      4
    last_notification       0
    contacts                manager
}

Let's take a quick look at various other directives offered by escalation objects to control notifications:

  • The first_notification and last_notification directives, as we saw earlier, define the number of notifications for which the escalation stays valid.
  • The notification_interval directive specifies the time to wait between notifications sent to the contacts of the escalation while the escalation is valid. It works like the directive of the same name in the service definition.
  • The escalation_period directive is optional and names the timeperiod object during which the escalation is valid (in addition to the first and last notification numbers). The same timeperiod objects that are used elsewhere, such as workhours, can be used here.
  • The escalation_options directive is used to specify which notifications to send to the contacts of this escalation. Options are as follows:
    • w = WARNING
    • u = UNKNOWN
    • c = CRITICAL
    • r = RECOVERY

    This is similar to the corresponding directive in the service definition.

  • The first_warning_notification, last_warning_notification, first_critical_notification, last_critical_notification, first_unknown_notification, and last_unknown_notification directives are used to specify different notification numbers for the WARNING, CRITICAL, and UNKNOWN states of the service check. They work in a similar way to the first_notification and last_notification directives, except that the pair that applies to the current service state is used.
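
As a rough sketch, the manager escalation from the earlier example could be extended with these directives as shown below. The values are illustrative, and how the state-specific numbers interact with first_notification and last_notification should be confirmed against the Icinga documentation for your version.

```
define serviceescalation {
    host_name                     localhost
    service_description           HTTP
    first_notification            4
    last_notification             0
    first_critical_notification   2           ; intended to escalate CRITICAL problems earlier
    last_critical_notification    0
    contacts                      manager
    notification_interval         30          ; renotify every 30 time units while this escalation is valid
    escalation_period             workhours   ; escalate only during work hours
    escalation_options            c,r         ; escalate only CRITICAL and RECOVERY notifications
}
```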

This covers most of the configuration options that Icinga provides to customize the behavior of notifications.
