Parent-child relationships and service dependencies

Icinga lets us define parent-child relationships between hosts and dependencies between service checks. This is important because:

  • If a switch or router fails, all servers behind it also become unreachable, and Icinga would generate host alerts for all of those servers and for the service checks defined on them. If we define the network device as the parent of the servers behind it, Icinga can recognize that those servers are merely unreachable and suppress notifications for them and for the service checks on them.
  • If the SSH service on a Linux server or the NRPE service on a Windows server fails, all service checks relying on that service also fail, and we get many alerts. So we define service dependencies that make the other services dependent on SSH and NRPE. This tells Icinga to suppress alerts for the dependent service checks when the service they depend on fails.

Relationships between the hosts

Declaring relationships between hosts is simple. The host object type provides the parents directive for this purpose. Suppose we have server1.example.org and server2.example.org behind the sw1.example.org switch. When the switch goes down, Icinga will generate DOWN alerts for all servers behind it. In most cases, this is not desirable: once we have been alerted that the switch is down, we already expect the hosts behind it to be unreachable. To suppress these subsequent alerts, we need to define relationships between the hosts in the host definitions as follows:

define host {
    use             generic-switch
    host_name       sw1.example.org
    address         192.168.32.1
    hostgroups      all,switches
}

define host {
    use             linux
    host_name       server1.example.org
    address         192.168.32.56
    hostgroups      all,linux
    parents         sw1.example.org
}

define host {
    use             windows
    host_name       server2.example.org
    address         192.168.32.57
    hostgroups      all,windows
    parents         sw1.example.org
}

The preceding definitions make sw1.example.org the parent of server1.example.org and server2.example.org. When Icinga detects that it cannot reach either of the servers, its reachability logic kicks in: it checks the reachability of their parent and child hosts and works out the point of failure in the network map. It then marks the hosts behind the point of failure as UNREACHABLE.
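
The same directive extends to deeper topologies. As a minimal sketch, assume a hypothetical router, router1.example.org, sitting between the monitoring server and the switch. The router, its name, and its address are assumptions for illustration and are not part of the setup above:

```
# Hypothetical upstream router; name and address are made up for this sketch.
define host {
    use             generic-switch
    host_name       router1.example.org
    address         192.168.32.254
    hostgroups      all,switches
}

# The switch from the earlier definition, now declared as a child of the router.
define host {
    use             generic-switch
    host_name       sw1.example.org
    address         192.168.32.1
    hostgroups      all,switches
    parents         router1.example.org
}
```

With such a chain, a router failure should leave the router DOWN and both the switch and the servers behind it UNREACHABLE, because the point of failure lies above all of them.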

By default, Icinga sends notifications for the UNREACHABLE state too. To suppress notifications in such cases, we need to remove the unreachable (u) option from the value of the notification_options directive in the host definitions:

define host {
    use                     generic-host
    host_name               server1.example.org
    notification_options    d,r
}

This can optionally be put in the host templates to affect all the hosts. With these two things configured (the host relationships and the removal of the u option from the host definitions), we will get only one notification when the switch is down.
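
As a rough sketch of the template variant, the directive could live in the generic-host template that the earlier definitions inherit from. Only the relevant directives are shown; a real template would also carry its usual check and contact settings, which are omitted here rather than guessed:

```
# Template only: register 0 keeps Icinga from treating this as a real host.
define host {
    name                    generic-host
    notification_options    d,r     ; notify on DOWN and RECOVERY, not on UNREACHABLE
    register                0
}
```

Every host that declares use generic-host then inherits these notification options unless it overrides them in its own definition.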

Note that the web interface will still show all these servers and the service checks on them in a failed state (for example, UNREACHABLE hosts and CRITICAL services); only their notifications (e-mail and so on) will be suppressed.

Using these host relationships, Icinga builds a network map: a graph whose nodes are the network devices and hosts, with the monitoring server connected to the root-level hosts of the tree and each of those hosts connected in turn to its children. Following is a screenshot of an example network map:

[Figure: Network map]

Service relationships

Declaring service relationships is a little more complicated compared to host relationships. Here, we need to define Icinga objects of the type servicedependency to declare such dependency relationships.

Let's say we have a server, server.example.org, with the following host definition and Ping, HTTP, and SMTP service checks:

define host {
    use                     generic-host
    host_name               server.example.org
}

define command {
    command_name            check_ping
    command_line            $USER1$/check_ping -H $HOSTADDRESS$
}
define service {
    use                     generic-service
    host_name               server.example.org
    service_description     Ping
    check_command           check_ping
}

define command {
    command_name            check_http
    command_line            $USER1$/check_http -H $HOSTADDRESS$
}
define service {
    use                     generic-service
    host_name               server.example.org
    service_description     HTTP
    check_command           check_http
}

define command {
    command_name            check_smtp
    command_line            $USER1$/check_smtp -H $HOSTADDRESS$
}
define service {
    use                     generic-service
    host_name               server.example.org
    service_description     SMTP
    check_command           check_smtp
}

With this configuration, when the ping check fails we already know that the other checks will fail too, so we want to avoid an unnecessary flood of alerts about them. To do that, we make the HTTP and SMTP checks dependent on the Ping check. Here's the definition:

define servicedependency {
    host_name                           server.example.org
    service_description                 Ping
    dependent_service_description       HTTP,SMTP
}

This object definition makes the checks in the dependent_service_description directive depend on the checks in the service_description directive. Note that the values of these directives must match those specified in the service_description directive of the service object definitions. Both directives can take a comma-separated list of service descriptions, if needed. The following figure depicts the dependencies:

[Figure: Dependency graph of Ping, HTTP, and SMTP service checks]
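
The Ping dependency on HTTP and SMTP leaves its failure criteria implicit. As a hedged sketch, the same object can spell out when the dependency takes effect through the notification_failure_criteria and execution_failure_criteria directives of the servicedependency object. The option values chosen here are an assumption for illustration, not something the setup above prescribes:

```
# Assumed criteria for illustration:
#   notification_failure_criteria w,u,c  suppress HTTP and SMTP notifications
#                                        while Ping is WARNING, UNKNOWN, or CRITICAL
#   execution_failure_criteria n         keep executing the dependent checks regardless
define servicedependency {
    host_name                           server.example.org
    service_description                 Ping
    dependent_service_description       HTTP,SMTP
    notification_failure_criteria       w,u,c
    execution_failure_criteria          n
}
```

Keeping execution enabled means the web interface continues to reflect the real state of HTTP and SMTP while only their notifications are held back.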

Checks that execute over SSH get similar dependency objects. Let's look at an example that makes the load and disk checks dependent on the SSH check:

define servicedependency {
    host_name                           server.example.org
    service_description                 SSH
    dependent_service_description       Load,Disk
}

The SSH check, of course, should in turn be dependent upon the ping check:

define servicedependency {
    host_name                           server.example.org
    service_description                 Ping
    dependent_service_description       HTTP,SMTP,SSH
}

The following figure depicts this:

[Figure: Dependency graph for Ping, HTTP, SMTP, SSH, load, and disk services]

A similar dependency can be defined for NRPE checks on the Windows servers.

With these dependency relationships in place, Icinga has a dependency tree of service checks that it uses to figure out which notifications to suppress. If the ping check fails, Icinga won't alert about the HTTP, SMTP, and SSH checks failing. If the SSH check fails, it won't alert about the load and disk checks failing.
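
The chain from Ping through SSH to the load and disk checks can also be stated explicitly. A minimal sketch, assuming the inherits_parent directive of the servicedependency object behaves as documented for Icinga 1.x (it is not used anywhere in the definitions above):

```
# Load and Disk depend on SSH and, through inherits_parent, also on
# whatever SSH itself depends on (here, the Ping check).
define servicedependency {
    host_name                           server.example.org
    service_description                 SSH
    dependent_service_description       Load,Disk
    inherits_parent                     1
}
```

With this setting, a failing Ping check should be enough for this dependency to count as failed as well, without listing Load and Disk in the Ping dependency.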

However, this approach can be cumbersome, since we have to repeat it separately for each host and each dependent service check, and the use case is common across most servers. In this case, we can pass a hostgroup to the service dependency:

define servicedependency {
    hostgroup_name                      linux
    service_description                 SSH
    dependent_service_description       Load,Disk
}

define servicedependency {
    hostgroup_name                      windows
    service_description                 NRPE
    dependent_service_description       Memory,Windows Folder Size
}

This applies the dependency relationship to all hosts of the specified hostgroups. But it is still a little cumbersome, since we have to keep adding service checks to the list in the dependent_service_description directive as they come up, which is easy to overlook. Instead, we can define service groups (the same idea as hostgroups, that is, groups of services), put all the dependent service checks in them, and then simply specify the service group in the dependency object.

One could argue that this amounts to the same thing, because we ultimately need to put a list of service checks in the service group definitions. But that's not the only option; there's a better way to do it.
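
For contrast, the list-in-the-group approach that this argument refers to might look like the following sketch. The members directive of a servicegroup takes host and service pairs; server.example.org is reused from the earlier examples purely for illustration:

```
# Membership maintained centrally as host,service pairs.
# Every new dependent check has to be appended to this list by hand.
define servicegroup {
    servicegroup_name           ssh_dept
    members                     server.example.org,Load,server.example.org,Disk
}
```

This works, but it keeps the maintenance burden in a single list that grows with every host and check.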

Let's define the ssh_dept service group, which will have the relevant service checks as its members:

define servicegroup {
    servicegroup_name           ssh_dept
}

define service {
    ...
    service_description         Load
    ...
    servicegroups               ssh_dept
}

define service {
    ...
    service_description         Disk
    ...
    servicegroups               ssh_dept
}

Using the servicegroups directive in service definition, we can easily assign member service checks to desired service groups. Then, we only need to use this service group in the dependency definition:

define servicedependency {
    service_description             SSH
    dependent_servicegroup_name     ssh_dept
}

Note that the preceding definition assumes there is an SSH service check defined for the servers that have the load and disk checks.

The preceding dependency definition makes all the service checks that are members of the ssh_dept service group dependent on the SSH service check of their respective hosts. Note that, in this method, we don't need to specify a hostgroup in the dependency object definition. The SSH check and the other service checks involved in the dependency (disk, load, and so on) are already defined to apply to the linux hostgroup, so Icinga will apply this dependency on all hosts that have these service checks. A similar configuration can be applied to Windows servers:

define servicegroup {
    servicegroup_name           nrpe_dept
}
define service {
    ...
    service_description         Memory
    ...
    servicegroups               nrpe_dept
}
define service {
    ...
    service_description         Windows Folder Size
    ...
    servicegroups               nrpe_dept
}
define servicedependency {
    service_description             NRPE
    dependent_servicegroup_name     nrpe_dept
}

For common checks for public services, the configuration is as follows:

define servicegroup {
    servicegroup_name           ping_dept
}
define service {
    ...
    service_description         HTTP
    ...
    servicegroups               ping_dept
}
define service {
    ...
    service_description         SMTP
    ...
    servicegroups               ping_dept
}
define servicedependency {
    service_description             Ping
    dependent_servicegroup_name     ping_dept
}

The following figure shows the service group dependencies:

[Figure: Service group dependency graph]
