In this recipe, you'll learn how to establish a host dependency between two hosts. This feature controls how Nagios Core checks hosts and sends notifications about their problems in situations where one host being DOWN implies that at least one other host is necessarily DOWN as well.
First of all, it's very important to note that this is not quite the same thing as a host being UNREACHABLE, which is what the parents directive is for, as discussed in the Creating a network host hierarchy recipe in this chapter. Most of the time, one host being DOWN does not mean that other hosts are DOWN by definition. It's more typical for a child host to simply be UNREACHABLE; it might work fine, but Nagios Core can't verify this because of the DOWN host in its path.
However, there's one particularly broad category where host dependencies are definitely useful: the host/guest relationship of virtual machines. If you monitor both a physical host machine and one or more guest virtual machines, then the virtual machines are dependent on the host; if the host machine is actually in the DOWN state and has no redundant failover, this implies that the guests are DOWN as well and not simply UNREACHABLE.
We'll use virtualization as an example with two virtual machines, zeus.example.net and athena.example.net, running on a host machine, ephesus.example.net. All three are already monitored, but we'll establish a host dependency so that Nagios Core doesn't notify anyone about the guests' state if it determines that the host is DOWN.
You will need a server running Nagios Core 4.0 or newer, and shell access to change its backend configuration.
We can establish our host dependencies like so:
1. Change to the objects configuration directory for Nagios Core. The default is /usr/local/nagios/etc/objects. If you've put the definition for your host in a different file, move to its directory instead and run the following line of code:
    # cd /usr/local/nagios/etc/objects
2. Create or edit a file that is included by the configuration in /usr/local/nagios/etc/nagios.cfg. A sensible choice could be /usr/local/nagios/etc/objects/dependencies.cfg, as follows:
    # vi dependencies.cfg
3. Add a hostdependency definition. In our case, the definition looks similar to the following. Note that you can include multiple dependent hosts by separating their names with commas, as follows:
    define hostdependency {
        host_name                      ephesus.example.net
        dependent_host_name            zeus.example.net,athena.example.net
        execution_failure_criteria     n
        notification_failure_criteria  d,u
    }
4. Edit nagios.cfg to include a reference to this new file so that it gets included in the configuration via the following code:
    cfg_file=/usr/local/nagios/etc/objects/dependencies.cfg
5. Validate the configuration and restart the Nagios Core server:
    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    # /etc/init.d/nagios restart
With this done, if the ephesus.example.net host goes down and takes both the zeus.example.net and athena.example.net hosts down with it, checks of all three hosts will continue, but notifications will be suppressed for the two guest hosts.
The host dependency object's four directives are as follows:
host_name: This is the name of the host on which at least one other host is dependent. This can also be a comma-separated list of host names.

dependent_host_name: This is the name of the dependent host. Again, this can be a comma-separated list.

execution_failure_criteria: This defines a list of states for the host being depended upon. If this host is in any of these states, then Nagios Core will skip the checks for the dependent hosts. This can be a comma-separated list of any of the following flags:
    o: The depended-upon host is UP
    d: The depended-upon host is DOWN
    u: The depended-upon host is UNREACHABLE
    p: The depended-upon host is PENDING (that is, not checked yet)
Alternatively, the single n flag can be used (as it is in this example) to specify that the checks should take place regardless of the depended-upon host's state.

notification_failure_criteria: This defines a list of states for the host being depended upon. If this host is in any of these states, then notifications for the dependent host will not be sent. The flags are the same as for execution_failure_criteria; in this example, we chose to suppress the notifications if the host being depended upon is DOWN or UNREACHABLE.

When Nagios Core notices that the zeus.example.net or athena.example.net hosts have apparently gone to the DOWN state as a result of a failed host check, it refers to its configuration to check whether there are any dependencies for the host and finds that they depend on ephesus.example.net.
It then checks the status of ephesus.example.net and finds it to be DOWN. Referring to execution_failure_criteria and finding n, it continues to run checks for both of the dependent hosts as normal. However, referring to notification_failure_criteria and finding d,u, it determines that notifications should be suppressed until the host returns to an UP state.
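The decision described above can be sketched in a few lines of Python. This is only an illustrative model of the documented behavior, not Nagios Core's actual implementation; the function names parse_criteria and evaluate_dependency are hypothetical and exist only for this example.

```python
# Illustrative model only; this is not Nagios Core source code.
# Maps each failure-criteria flag to the state name it stands for.
VALID_FLAGS = {"o": "UP", "d": "DOWN", "u": "UNREACHABLE", "p": "PENDING"}


def parse_criteria(spec):
    """Turn a criteria string such as "d,u" into a set of state names.

    The single flag "n" means the dependency never fails, so it yields
    an empty set of failure states.
    """
    flags = [flag.strip() for flag in spec.split(",") if flag.strip()]
    if flags == ["n"]:
        return set()
    unknown = [flag for flag in flags if flag not in VALID_FLAGS]
    if unknown:
        raise ValueError(f"unknown failure criteria flags: {unknown}")
    return {VALID_FLAGS[flag] for flag in flags}


def evaluate_dependency(master_state, execution_criteria, notification_criteria):
    """Return (run_checks, send_notifications) for the dependent hosts."""
    run_checks = master_state not in parse_criteria(execution_criteria)
    send_notifications = master_state not in parse_criteria(notification_criteria)
    return run_checks, send_notifications


if __name__ == "__main__":
    # Criteria taken from the recipe: execution "n", notification "d,u".
    for state in ("UP", "DOWN", "UNREACHABLE"):
        print(state, evaluate_dependency(state, "n", "d,u"))
```

With the criteria used in this recipe, the model keeps checks running in every state of ephesus.example.net and only withholds notifications while that host is DOWN or UNREACHABLE.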
We can specify groups rather than host names using the hostgroup_name and dependent_hostgroup_name directives, as follows:
define hostdependency {
    hostgroup_name                 vm-hosts
    dependent_hostgroup_name       vm-guests
    execution_failure_criteria     n
    notification_failure_criteria  d,u
}
We can also provide comma-separated lists of hosts on which to depend, as shown in the following code:
define hostdependency {
    host_name                      ephesus.example.net,alexandria.example.net
    dependent_host_name            zeus.example.net,athena.example.net
    execution_failure_criteria     n
    notification_failure_criteria  d,u
}
If a host depends on more than one host, the check or notification rules apply if any of its dependencies are not met rather than all of them. For the preceding example, this means that if ephesus.example.net was in the DOWN state but alexandria.example.net was UP, the dependency would still suppress checks or notifications for all the dependent hosts, according to its failure criteria.
This means that host dependencies are not really suitable in redundant scenarios where the loss of one of the depended-upon hosts does not necessarily imply the loss of all its dependent hosts. You are likely to find that monitoring nodes as a cluster is a better fit for this situation; this is discussed in the Monitoring individual nodes as a cluster recipe, which is also in this chapter.
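The "any rather than all" rule for multiple depended-upon hosts can be illustrated with a short Python sketch. It is a simplified model under the assumption that each depended-upon host has a single known state; the function name notifications_suppressed is hypothetical and is not part of Nagios Core.

```python
# Illustrative model only; this is not Nagios Core source code.
def notifications_suppressed(master_states, failure_states):
    """Return True if at least one depended-upon host is in a failure state.

    master_states maps each depended-upon host name to its current state;
    failure_states is the set of states named by notification_failure_criteria.
    """
    return any(state in failure_states for state in master_states.values())


# The scenario from the text: one depended-upon host DOWN, the other UP.
states = {
    "ephesus.example.net": "DOWN",
    "alexandria.example.net": "UP",
}
print(notifications_suppressed(states, {"DOWN", "UNREACHABLE"}))
```

Because a single failing host is enough to trigger suppression, the surviving alexandria.example.net does not keep notifications flowing for the guests in this model, which is why the approach fits poorly with redundant setups.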