In this recipe, we'll adjust the definition of a very important host so that Nagios Core checks whether it is up every three minutes and, if a check fails and the host appears to be down, checks again after one minute before sending a notification about the state to the host's defined contact. We'll do this by customizing the definition of an existing host.
You should have a Nagios Core 4.0 or newer server with at least one host configured already. We'll use the example of sparta.example.net, a host defined in its own file.
You should also understand the basics of commands and plugins, in particular the meaning of the check_command directive. These are covered in the recipes in Chapter 2, Working with Commands and Plugins.
We can customize the check frequency for a host as follows:
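Change to the directory containing the host's definition. By default, this is /usr/local/nagios/etc/objects. If you've put the definition of your host in a different file, move to its directory instead:
# cd /usr/local/nagios/etc/objects
Open the file containing the host definition in an editor:
# vi sparta.example.net.cfg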
The host definition may look something like this:
define host {
    use linux-server
    host_name sparta.example.net
    alias sparta
    address 192.0.2.21
}
Add or set the check_interval directive to 3:
define host {
    use linux-server
    host_name sparta.example.net
    alias sparta
    address 192.0.2.21
    check_interval 3
}
Add or set the retry_interval directive to 1:
define host {
    use linux-server
    host_name sparta.example.net
    alias sparta
    address 192.0.2.21
    check_interval 3
    retry_interval 1
}
Add or set the max_check_attempts directive to 2:
define host {
    use linux-server
    host_name sparta.example.net
    alias sparta
    address 192.0.2.21
    check_interval 3
    retry_interval 1
    max_check_attempts 2
}
Validate the configuration and reload the Nagios Core server:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
# /etc/init.d/nagios reload
With this done, Nagios Core will run the relevant check_command (probably something like check-host-alive) against this host every three minutes. If a check fails, it will flag the host as down, run the check again after one minute, and send a notification to the host's defined contact only if the second check fails as well.
The preceding configuration changed three properties of the host object type to effect the changes we needed:
check_interval: This defines how long to wait between successive checks of the host under normal conditions. We set this to 3, or three minutes.
retry_interval: This defines how long to wait between follow-up checks of the host after first finding problems with it. We set this to 1, or one minute.
max_check_attempts: This defines how many checks in total should be run before a notification is sent. We set this to 2, for two checks. This means that after the first failed check, Nagios Core will run another check a minute later and will only send a notification if that check fails as well. Once two checks have been run and the host is still in a problem state, it goes from a SOFT state to a HARD state.
Note that setting these directives in a host that derives from a template, as is the case with our example, will override any of the same directives in the template.
It's important to note that we can also define the units used by the check_interval and retry_interval directives. They are measured in minutes only by default; the actual unit comes from the interval_length setting, which gives the number of seconds per unit and is normally defined in the root configuration file for Nagios Core, by default /usr/local/nagios/etc/nagios.cfg:
interval_length=60
If we wanted to specify these periods in seconds instead, we could set this value to 1 instead of 60:
interval_length=1
This would allow us, for example, to set check_interval to 15 to check a host every 15 seconds. Note that if we have a lot of hosts with such a tight checking schedule, it might overburden the Nagios Core process, particularly if the checks take a long time to complete.
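One consequence worth sketching: interval_length is a global setting, so once it changes, every interval-based directive is read in the new unit, including any inherited from templates and others such as notification_interval. Assuming we wanted to keep the three-minute and one-minute schedule from this recipe after switching to seconds, the values in the host definition would need converting, roughly like this (the comments mark which file each fragment is assumed to live in):

```cfg
# In nagios.cfg: one time unit is now one second (changed from the default of 60)
interval_length=1

# In sparta.example.net.cfg: the same schedule as before, expressed in seconds
define host {
    use                 linux-server
    host_name           sparta.example.net
    alias               sparta
    address             192.0.2.21
    check_interval      180    ; 3 minutes x 60 seconds
    retry_interval      60     ; 1 minute x 60 seconds
    max_check_attempts  2      ; a count of checks, not a time, so unchanged
}
```

Leaving the original values of 3 and 1 in place after such a change would mean checking the host every three seconds, which is probably not what we want.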
Don't forget that changing these properties for a large number of hosts can be tedious, so if it's necessary to set these directives to some common value for more than a few hosts, it may be appropriate to set the values in a host template and then have these hosts inherit from it. Refer to the Using inheritance to simplify configuration recipe in Chapter 9, Managing Configuration, for more details.
Note that the same three directives also work for service declarations and have the same meaning. We could define the same notification behavior for a service on sparta.example.net with a declaration like this:
define service {
    use generic-service
    host_name sparta.example.net
    service_description HTTP
    check_command check_http
    check_interval 3
    retry_interval 1
    max_check_attempts 2
}
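Returning to the suggestion of using a host template for these directives, a minimal sketch might look like the following. The template name critical-host and the second host, athens.example.net with its address, are hypothetical and used only for illustration; the template is assumed to build on the same linux-server template as our example host.

```cfg
# Hypothetical template carrying the check schedule from this recipe
define host {
    name                critical-host   ; illustrative template name
    use                 linux-server    ; inherit everything else as before
    check_interval      3
    retry_interval      1
    max_check_attempts  2
    register            0               ; template only, not a real host
}

# Hosts then inherit the schedule instead of repeating the directives
define host {
    use        critical-host
    host_name  sparta.example.net
    alias      sparta
    address    192.0.2.21
}

define host {
    use        critical-host
    host_name  athens.example.net       ; hypothetical second host
    alias      athens
    address    192.0.2.22               ; hypothetical address
}
```

With this layout, changing the schedule for all such hosts means editing the three directives in one place, and an individual host can still override any of them in its own definition.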