Using an alternative check command for hosts

In this recipe, we'll learn how to deal with a slightly tricky case in network monitoring: that of monitoring a server that doesn't respond to PING but still provides some network service that requires checking.

It's good practice to allow PING where you can. Answering ICMP echo requests is one of the stipulations in RFC 1122, and PING is a very useful diagnostic tool, not only for monitoring but also for troubleshooting. However, servers that are accessed only by a select few people are sometimes configured not to respond to these messages, perhaps for reasons of secrecy. Domestic routers are quite commonly configured this way.

Another very common reason for this problem, and the example we'll address here, is a server behind an IPv4 NAT firewall. Its RFC 1918 address, such as 192.168.1.20, cannot be reached directly from the public Internet, and pinging the router's public interface only tells us that the router is up, not whether the host behind it, for which the router translates addresses, is actually working.

However, port 22 for SSH is forwarded from the outside to this server, and it's this service that we need to check for availability. The structure of this part of the network, including the addressing problem, is shown in the following diagram:


We'll do this by checking whether the host is up with an SSH check, since we can't PING it from the outside as we normally would.

Getting ready

You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already. You should also be familiar with the relationship between services, commands, and plugins.

How to do it...

We can specify an alternative check method for a host as follows:

  1. Change to the directory containing the configuration file for the relevant host in Nagios Core. The default location for these files is /usr/local/nagios/etc/objects:
    # cd /usr/local/nagios/etc/objects
    
  2. Find the file that contains the host definition for the host that won't respond to PING and edit it. In this example, our crete.example.net host is the one we want to edit:
    # vi crete.example.net.cfg
    
  3. Add or change the host's check_command directive so that it names the command we want to use instead of the usual check-host-alive or check_ping command. In this case, we want to use check_ssh. The resulting host definition might look something like this:
    define host {
        use            linux-server
        host_name      crete.example.net
        alias          crete
        address        192.0.2.23
        check_command  check_ssh
    }

    Note that a check_command set directly in the host definition takes precedence over one inherited from a host template such as generic-host or linux-server. It's also good practice to confirm that the host actually responds to our check as we expect, by running the plugin manually as the nagios user:

    # sudo -s -u nagios
    $ /usr/local/nagios/libexec/check_ssh -H 192.0.2.23
    SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0) | time=0.012820s;;;0.000000;10.000000
    
  4. Validate the configuration and, if no errors are reported, reload the Nagios Core server:
    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    # /etc/init.d/nagios reload
    

With this done, the next scheduled host check for the crete.example.net server should show the host as UP because it was checked with the check_ssh command and not the usual check-host-alive command.
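The switch from one check to the other comes down to which command definition each name resolves to. The following is a sketch based on the sample templates.cfg and commands.cfg files that ship with Nagios Core; it is abridged, and your copies may differ. The linux-server template supplies check_command check-host-alive, which runs check_ping, while the check_command set directly on crete.example.net overrides that inherited value. In both cases $HOSTADDRESS$ expands to the host's address directive (192.0.2.23 here), and $USER1$ is normally set in resource.cfg to the plugin directory, /usr/local/nagios/libexec in this installation.

```
# Abridged from the sample templates.cfg; other directives omitted
define host {
    name            linux-server
    use             generic-host
    check_command   check-host-alive    ; inherited unless the host sets its own
    register        0                   ; a template, not a real host
}

# From the sample commands.cfg: the usual PING-based host check
define command {
    command_name    check-host-alive
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}

# From the sample commands.cfg: the command crete.example.net now uses
define command {
    command_name    check_ssh
    command_line    $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
```

Because the check_ssh command only refers to $HOSTADDRESS$ and an optional $ARG1$, the same definition works whether it is referenced from a service or from a host.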

How it works...

The configuration we added for the preceding crete.example.net host uses check_ssh to check whether the host is up, rather than using a check that uses PING. This is appropriate because the only public service accessible from crete.example.net is its SSH service. The following diagram demonstrates how the host is now being checked via SSH, despite not being accessible by PING:


The check_ssh command is normally used to check whether a service is available, rather than a host. However, Nagios Core allows us to use it as a host check command as well. Most service check commands can be used this way; for example, you could check a web server's availability in the same way with check_http.
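As a sketch of the check_http case, a web server whose only publicly reachable service is HTTP might be defined as follows. The host name and address are hypothetical, and check_http refers to the command of that name in the sample commands.cfg, which passes $HOSTADDRESS$ to the plugin's -I option.

```
define host {
    use            linux-server
    host_name      www.example.net      ; hypothetical web server
    alias          www
    address        192.0.2.80           ; hypothetical public address
    check_command  check_http           ; host is UP if the web server answers
}
```

As with check_ssh in step 3, it is worth running the plugin by hand against the address first to confirm it returns OK.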

There's more...

For completeness, it would also be appropriate to monitor the NAT router itself, either via PING or with another suitable check on its public address. That way, if the host check for the SSH server fails, we can see whether the NAT router in front of it is still available, which helps to establish whether the problem lies with the server or with the router. You can make this setup even more useful by making the NAT router a parent host of the SSH server behind it, as explained in the Creating a network host hierarchy recipe in Chapter 8, Managing Network Layout.
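A minimal sketch of that arrangement follows. It assumes, as the port forwarding implies, that 192.0.2.23 is the router's public address; router.example.net is a hypothetical name, and linux-server is reused only because it already supplies the timing and notification directives a host needs. The router is checked with PING, while crete.example.net keeps its SSH check and gains a parents directive.

```
; Hypothetical NAT router, checked via PING on its public interface
define host {
    use            linux-server
    host_name      router.example.net   ; hypothetical name for the NAT router
    alias          NAT router
    address        192.0.2.23           ; assumed: the router's public address
    check_command  check-host-alive
}

; The SSH server from this recipe, now declared as sitting behind the router
define host {
    use            linux-server
    host_name      crete.example.net
    alias          crete
    address        192.0.2.23           ; same address; port 22 is forwarded to crete
    check_command  check_ssh
    parents        router.example.net   ; crete is only reachable through the router
}
```

With this parent relationship in place, if both checks fail, Nagios Core can treat crete.example.net as UNREACHABLE rather than DOWN, pointing you at the router first.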

See also

  • Monitoring SSH for any host, Chapter 5, Monitoring Methods
  • Checking an alternative SSH port, Chapter 5, Monitoring Methods
  • Monitoring local services on a remote machine with NRPE, Chapter 6, Enabling Remote Execution
  • Creating a network host hierarchy, Chapter 8, Managing Network Layout
  • Establishing a host dependency, Chapter 8, Managing Network Layout