We looked at the basic and default notification configuration provided by Icinga. There are a number of customizations possible on top of it, some of which are covered in this section.
Service objects offer some directives to control or specify whether, which, and how the notifications should be sent out.
The notifications_enabled directive enables (1) or disables (0) notifications for a service check. Disabling them is useful for extremely noncritical checks for which we don't need notifications. The default value is 1; we set it to 0 only when we want to disable notifications for the service.

The notification_options directive determines which kinds of notifications are sent for the service check. Its value is a comma-separated list of the following options: w (WARNING), u (UNKNOWN), c (CRITICAL), and r (RECOVERY).
So, if we don't want to receive WARNING and UNKNOWN notifications, we can simply set this directive to c,r in the service definition, and only CRITICAL and RECOVERY notifications will be sent.
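To make the filtering concrete, here is a minimal Python sketch of how a notification_options value such as c,r could be evaluated against the four notification types listed above. It is a simplified model, not Icinga's implementation, and the names OPTION_LETTERS, parse_options, and should_notify are hypothetical.

```python
# Simplified, hypothetical model of a service's notification_options filter.
# It covers only the four options described in the text (w, u, c, r).

# Map each notification type to the option letter that enables it.
OPTION_LETTERS = {
    "WARNING": "w",
    "UNKNOWN": "u",
    "CRITICAL": "c",
    "RECOVERY": "r",
}


def parse_options(value: str) -> set[str]:
    """Split a comma-separated value such as 'c,r' into a set of letters."""
    return {part.strip() for part in value.split(",") if part.strip()}


def should_notify(notification_options: str, notification_type: str) -> bool:
    """Return True if this notification type is enabled by the options."""
    letter = OPTION_LETTERS.get(notification_type)
    return letter is not None and letter in parse_options(notification_options)


if __name__ == "__main__":
    options = "c,r"  # only CRITICAL and RECOVERY, as in the example above
    for kind in ("WARNING", "UNKNOWN", "CRITICAL", "RECOVERY"):
        print(kind, should_notify(options, kind))
```

With c,r, only the CRITICAL and RECOVERY lines print True.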
The first_notification_delay directive determines how long to wait before sending out the first notification after a service check enters a non-OK state. The value is a number of time units; the length of a time unit is defined by the interval_length directive in icinga.cfg, which defaults to 60 seconds. This is useful if we expect intermittent problems that recover automatically within a certain time: we don't need a notification if the service actually recovers within that time. The default value is 0, which means Icinga sends out a notification immediately.

The notification_interval directive determines the time after which a contact is notified again that the service is still in a non-OK state. The value is again a number of time units. The service template sets it to 60, which means that the contacts are renotified about the problem every 60 minutes (1 hour, with the default interval_length) for as long as the service check stays in a non-OK state. To disable such reminder notifications, set this value to 0.

The notification_period directive restricts notifications to certain times of the day, week, month, or year. Its value is the name of an Icinga timeperiod object, which defines periods of time such as selected days of the week or selected hours of the day. The service template uses the 24x7 timeperiod object as the value of this directive. Let's look at its definition (in timeperiods.cfg):

    define timeperiod {
        timeperiod_name  24x7
        alias            24 Hours A Day, 7 Days A Week
        sunday           00:00-24:00
        monday           00:00-24:00
        tuesday          00:00-24:00
        wednesday        00:00-24:00
        thursday         00:00-24:00
        friday           00:00-24:00
        saturday         00:00-24:00
    }
The preceding timeperiod object covers all hours of every day of the week, which means that notifications for the service can be sent out at any time.
Similarly, we can restrict the times at which alerts for the service are sent by defining a more restrictive timeperiod object and referencing it in the notification_period directive of the service definition. A timeperiod object named workhours is already defined in the default configuration; let's have a quick look at it:

    define timeperiod {
        timeperiod_name  workhours
        alias            Normal Work Hours
        monday           09:00-17:00
        tuesday          09:00-17:00
        wednesday        09:00-17:00
        thursday         09:00-17:00
        friday           09:00-17:00
    }

The preceding timeperiod object defines normal work hours, that is, 9 a.m. to 5 p.m. on weekdays only. If a service uses this time period as its notification_period, no notifications are sent for it outside of these hours.
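The following Python sketch models how a notification_period check could decide whether a given moment falls inside the 24x7 or workhours timeperiod. It only handles the weekday-based ranges shown above, treats the end of each range as exclusive, and uses hypothetical names (TIMEPERIODS, in_timeperiod); it is an illustration, not Icinga's actual timeperiod logic.

```python
from datetime import datetime

# Index matches datetime.weekday(): Monday is 0, Sunday is 6.
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")

# Hypothetical in-memory copies of the two timeperiod objects shown above:
# weekday name -> list of "HH:MM-HH:MM" ranges. Unlisted days never match.
TIMEPERIODS = {
    "24x7": {day: ["00:00-24:00"] for day in WEEKDAYS},
    "workhours": {day: ["09:00-17:00"] for day in WEEKDAYS[:5]},
}


def to_minutes(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes since midnight ('24:00' becomes 1440)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def in_timeperiod(name: str, when: datetime) -> bool:
    """Return True if 'when' falls inside one of the period's ranges.

    In this sketch, the start of a range is inclusive and the end exclusive.
    """
    ranges = TIMEPERIODS[name].get(WEEKDAYS[when.weekday()], [])
    now = when.hour * 60 + when.minute
    for time_range in ranges:
        start, end = (to_minutes(part) for part in time_range.split("-"))
        if start <= now < end:
            return True
    return False


if __name__ == "__main__":
    monday_morning = datetime(2024, 1, 1, 10, 0)    # 1 January 2024 was a Monday
    saturday_morning = datetime(2024, 1, 6, 10, 0)
    for period in ("24x7", "workhours"):
        print(period,
              in_timeperiod(period, monday_morning),
              in_timeperiod(period, saturday_morning))
```

A Saturday morning matches 24x7 but not workhours, so a service using workhours as its notification_period would stay silent at that time.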
Notification options can also be customized inside contact definitions. Let's look at various available directives:
The host_notifications_enabled and service_notifications_enabled directives enable (1) or disable (0) host and service notifications on a per-contact basis. The default value is 1; set them to 0 to explicitly disable the corresponding notifications for the contact.

The host_notification_period and service_notification_period directives specify Icinga timeperiod objects; notifications are sent to this particular contact only within those periods. This is useful if, for example, the on-duty person should receive notifications 24x7 while the boss should receive them only during work hours.

The host_notification_commands and service_notification_commands directives specify the commands that Icinga runs to notify the contact. Their values can be comma-separated lists of command names if we want multiple commands to be executed for each notification.

The email directive, as seen in the earlier sections, specifies the e-mail address of the contact. It is accessible through the $CONTACTEMAIL$ Icinga macro in command objects.

The pager directive specifies the mobile number of the contact, which is available through the $CONTACTPAGER$ Icinga macro in command objects.

The address1 through address6 directives specify other miscellaneous addresses for the contact (a Jabber ID and so on). Each one is available through the corresponding $CONTACTADDRESSn$ macro, where n ranges from 1 through 6.

Escalation basically means that if person1 receives an alert and the problem still persists after x minutes, a notification should be sent (escalated) to person2.
Escalation is another aspect of notification configuration and is set up using the hostescalation and serviceescalation Icinga objects. These objects define escalation paths to various people, and a path can be as long as we configure it to be.
Escalations work together with the notification_interval directive of the service definition, which makes Icinga renotify contacts after the defined interval. Escalation logic kicks in when a serviceescalation object is defined for the service: in the object definition, we specify the contacts that should be notified at the nth notification.
For example, suppose we have three contacts defined: onduty, techlead, and manager, with onduty being the contact specified in the service definition. We want the onduty contact to be notified immediately; the techlead contact to be notified if the check does not recover within 15 minutes, and again if it does not recover within the next 15 minutes (30 minutes from the start); and finally the manager contact to be notified if it does not recover within the next 15 minutes (45 minutes from the start). For such a scenario, we need to set the notification_interval directive to 15 in the service object definition (assuming that a time unit is one minute, as set by interval_length in the main configuration), so that Icinga retriggers the notification every 15 minutes until the check recovers. We also need to define two serviceescalation objects, as follows (we use the localhost host and the HTTP service as examples; the same approach works for remote hosts too):

    define serviceescalation {
        host_name            localhost
        service_description  HTTP
        first_notification   2
        last_notification    3
        contacts             techlead
    }

    define serviceescalation {
        host_name            localhost
        service_description  HTTP
        first_notification   4
        last_notification    0
        contacts             manager
    }

The preceding escalation definitions use the first_notification and last_notification directives, which determine the range of notification numbers for which a given escalation is valid. The first_notification directive specifies n for the nth notification, starting from which the escalation becomes valid. The last_notification directive specifies n for the nth notification, which is the last one for which the escalation is valid; after it has been sent, the escalation becomes ineffective. An escalation being valid implies that the contacts specified in it are notified. When the service goes CRITICAL, the onduty contact is immediately notified; this is the first notification.
For the first escalation definition, the escalation becomes valid with the second notification for the service (first_notification is 2) and becomes ineffective after the third notification (last_notification is 3) has been sent out. So the second and third notifications, sent after 15 and 30 minutes respectively, go to the techlead contact.
For the second escalation definition, the escalation becomes valid with the fourth notification for the service (first_notification is 4) and never becomes ineffective (last_notification is 0). So the fourth notification (after 45 minutes) goes to the manager contact, who is then notified again every 15 minutes until the service check recovers.
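The escalation rules above can be summarized as a small Python model that answers "who receives the nth notification?". It assumes, as in this example, that onduty is the only contact in the service definition and that, whenever at least one escalation is valid, the escalation contacts are notified instead of the service's own contacts (which matches the second and third notifications going to techlead). The Escalation, is_valid, and recipients names are hypothetical.

```python
from typing import NamedTuple


class Escalation(NamedTuple):
    """Hypothetical stand-in for a serviceescalation object."""
    first_notification: int
    last_notification: int          # 0 means the escalation never ends
    contacts: tuple[str, ...]


def is_valid(esc: Escalation, n: int) -> bool:
    """An escalation covers the nth notification if first <= n <= last."""
    if n < esc.first_notification:
        return False
    return esc.last_notification == 0 or n <= esc.last_notification


def recipients(n: int, service_contacts: tuple[str, ...],
               escalations: list[Escalation]) -> set[str]:
    """Return the contacts that receive the nth notification.

    Assumption of this sketch: if any escalation is valid, the union of the
    valid escalations' contacts is notified instead of the service contacts.
    """
    valid = [esc for esc in escalations if is_valid(esc, n)]
    if not valid:
        return set(service_contacts)
    return {contact for esc in valid for contact in esc.contacts}


# The two serviceescalation objects from the example.
ESCALATIONS = [
    Escalation(first_notification=2, last_notification=3, contacts=("techlead",)),
    Escalation(first_notification=4, last_notification=0, contacts=("manager",)),
]

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, sorted(recipients(n, ("onduty",), ESCALATIONS)))
```

Running it prints onduty for the first notification, techlead for the second and third, and manager from the fourth onward.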
The following table puts together a timeline for this example:
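| Timestamp | Service State | n (nth notification) | Explanation |
|---|---|---|---|
| 05:00 | CRITICAL | 1 | The onduty contact is notified immediately. |
| 05:15 | CRITICAL | 2 | First escalation becomes valid; the techlead contact is notified. |
| 05:30 | CRITICAL | 3 | The techlead contact is notified again; the first escalation becomes ineffective after this notification. |
| 05:45 and onward | CRITICAL | 4 and onward | Second escalation becomes valid; the manager contact is notified every 15 minutes until the service recovers. |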
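The Timestamp column can be derived from the service directives described earlier: the first notification goes out first_notification_delay time units after the problem starts, and each further one follows notification_interval units later. The following Python sketch computes these idealized send times; it ignores check scheduling, soft states, and retries, and the function name notification_times is hypothetical. With a delay of 0, an interval of 15, and an interval_length of 60 seconds, it reproduces 05:00, 05:15, 05:30, and 05:45.

```python
from datetime import datetime, timedelta


def notification_times(problem_start: datetime, count: int,
                       first_notification_delay: int = 0,
                       notification_interval: int = 15,
                       interval_length: int = 60) -> list[datetime]:
    """Return idealized send times of the first `count` notifications.

    Delays and intervals are given in time units of `interval_length`
    seconds, like the directives of the same names; the defaults match
    the escalation example. A notification_interval of 0 disables
    re-notification, so at most one time is returned.
    """
    if count <= 0:
        return []
    unit = timedelta(seconds=interval_length)
    first = problem_start + first_notification_delay * unit
    if notification_interval == 0:
        return [first]
    step = notification_interval * unit
    return [first + i * step for i in range(count)]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 5, 0)  # hypothetical date; only the time matters
    for n, when in enumerate(notification_times(start, 4), start=1):
        print(n, when.strftime("%H:%M"))
```

Setting first_notification_delay to 5, for instance, would shift every row of the timeline by five minutes.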
A similar configuration can be made for host escalations too:

    define hostescalation {
        host_name           localhost
        first_notification  2
        last_notification   3
        contacts            techlead
    }

    define hostescalation {
        host_name           localhost
        first_notification  4
        last_notification   0
        contacts            manager
    }

Let's take a quick look at various other directives offered by escalation objects to control notifications:
The first_notification and last_notification directives, as we saw earlier, define the range of notification numbers for which the escalation stays valid.

The notification_interval directive specifies how long to wait before sending the next notification to the contacts of the escalation, similar to the directive of the same name in the service definition.

The escalation_period directive is optional and specifies the timeperiod object during which the escalation stays valid (in addition to the first and last notification numbers). Any timeperiod object, such as the ones shown earlier, can be used.

The escalation_options directive specifies which notifications are sent to the contacts of this escalation. The options are w (WARNING), u (UNKNOWN), c (CRITICAL), and r (RECOVERY). This is similar to the corresponding directive in the service definition.
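Putting these escalation directives together, the following Python sketch checks whether an escalation applies to a given notification: the notification number must lie within first_notification and last_notification, the service state must be listed in escalation_options, and, if an escalation_period is set, the current time must fall inside it. It is a simplified model rather than Icinga's implementation: the escalation period is represented by a plain Python function instead of a timeperiod object, treating an unset escalation_options value as "all states" is an assumption of this sketch, and the Escalation class and weekdays helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

# Map service states to escalation_options letters.
STATE_LETTERS = {"WARNING": "w", "UNKNOWN": "u", "CRITICAL": "c", "RECOVERY": "r"}


@dataclass
class Escalation:
    """Hypothetical, simplified model of a serviceescalation object."""
    first_notification: int
    last_notification: int                 # 0 means no upper bound
    contacts: tuple[str, ...]
    # Assumption of this sketch: an unset escalation_options covers all states.
    escalation_options: str = "w,u,c,r"
    # A plain function stands in for a timeperiod object; None means "always".
    escalation_period: Optional[Callable[[datetime], bool]] = None

    def applies(self, n: int, state: str, when: datetime) -> bool:
        """Check the number range, the state filter, and the escalation period."""
        in_range = n >= self.first_notification and (
            self.last_notification == 0 or n <= self.last_notification)
        options = {opt.strip() for opt in self.escalation_options.split(",")}
        state_ok = STATE_LETTERS.get(state) in options
        period_ok = self.escalation_period is None or self.escalation_period(when)
        return in_range and state_ok and period_ok


def weekdays(when: datetime) -> bool:
    """Hypothetical escalation period: Monday to Friday, any hour."""
    return when.weekday() < 5


if __name__ == "__main__":
    techlead = Escalation(2, 3, ("techlead",), "c,r", weekdays)
    monday = datetime(2024, 1, 1, 10, 0)    # 1 January 2024 was a Monday
    saturday = datetime(2024, 1, 6, 10, 0)
    print(techlead.applies(2, "CRITICAL", monday))
    print(techlead.applies(2, "WARNING", monday))
    print(techlead.applies(2, "CRITICAL", saturday))
```

For this sample techlead escalation, the three calls print True, False, and False: WARNING is not listed in c,r, and Saturday lies outside the hypothetical weekday period.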
The first_warning_notification, last_warning_notification, first_critical_notification, last_critical_notification, first_unknown_notification, and last_unknown_notification directives specify separate notification numbers for the WARNING, CRITICAL, and UNKNOWN states of the service check. They work in a similar way to the first_notification and last_notification directives, except that the pair matching the current service state is used.

This covers most of the configuration options that Icinga provides to customize the behavior of notifications.