Monitoring the output of an SNMP query

In this recipe, we'll learn how to use the check_snmp plugin to monitor the values returned by SNMP (Simple Network Management Protocol) requests.

Despite its name, SNMP is not a particularly simple protocol. However, it's a very common method for accessing information on many kinds of networked devices, including monitoring boards, usage meters, and storage appliances as well as workstations, servers, and routing equipment.

Because SNMP is so widely supported and can typically supply a large volume of information to trusted hosts, it's an excellent way of gathering information that is not otherwise retrievable from a host's network services. For example, while checking for a PING response from a large router is simple enough, there may not be an easy way to check properties like the state of each of its interfaces, or the presence of a certain route in its routing tables.

Using check_snmp in Nagios Core allows this information to be retrieved from devices automatically and alerts to be generated as appropriate. Its setup is somewhat complex, but it is worth learning, as it is among the most powerful plugins in Nagios Core for network administrators. It is common to see dozens of commands defined for it in the configuration for a large network. It can often be used to complement or even replace remote plugin execution daemons like NRPE or NSClient++.

Getting ready

You should have a Nagios Core 4.0 or newer server with at least one host configured already. You should also understand the basics of how hosts and services relate, which is covered in the recipes in Chapter 1, Understanding Hosts, Services, and Contacts.

This recipe assumes a basic knowledge of SNMP, including its general intended purpose, the concept of an SNMP community, and what SNMP MIBs and OIDs are. In particular, if you're looking to monitor some property of a networked device that's available to you via SNMP, you should know what the OID for that data is. This information is often available in the documentation for network devices, or can be deduced by running an appropriate snmpwalk command against the host to view the output for all its OIDs.

You should check that an SNMP daemon is running on the target host and also that the check_snmp plugin is available on the monitoring host. It is included as part of the standard Nagios Plugins so, provided the Net-SNMP libraries were available on the system when these were compiled, it should be available. If it is not, you may need to install the Net-SNMP libraries on your monitoring system and recompile the plugins.

We'll use the example of retrieving the total process count from a Linux server with hostname ithaca.example.net and flagging WARNING and CRITICAL states at appropriate high ranges. We'll also discuss how to test for the presence or absence of strings rather than numeric thresholds.

It's a good idea to test that the host will respond to SNMP queries in the expected form. We can test this with snmpget. Assuming a community name of public, we could write:

$ snmpget -v1 -c public ithaca.example.net .1.3.6.1.2.1.25.1.6.0 
iso.3.6.1.2.1.25.1.6.0 = Gauge32: 81
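
When scripting around snmpget, it can help to split a reply like the one above into its OID, type, and value. The following is a minimal Python sketch of that, assuming the single-line output format shown here. The function name parse_snmpget_line is hypothetical, and the pattern does not try to handle multi-line or untyped values.

```python
import re

# Matches lines such as "iso.3.6.1.2.1.25.1.6.0 = Gauge32: 81".
SNMPGET_LINE = re.compile(r"^(?P<oid>\S+) = (?P<type>[\w-]+): (?P<value>.*)$")


def parse_snmpget_line(line):
    """Split one line of snmpget output into (oid, type, value) strings."""
    match = SNMPGET_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised snmpget output: %r" % line)
    oid = match.group("oid")
    # The reply above prints the leading ".1" as "iso"; rewrite it so the
    # OID can be compared with the numeric form used in the query.
    if oid.startswith("iso."):
        oid = ".1." + oid[len("iso."):]
    return oid, match.group("type"), match.group("value")


oid, value_type, value = parse_snmpget_line("iso.3.6.1.2.1.25.1.6.0 = Gauge32: 81")
print(oid, value_type, int(value))  # .1.3.6.1.2.1.25.1.6.0 Gauge32 81
```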

We can also test the plugin by running it directly as the nagios user:

# sudo -s -u nagios
$ /usr/local/nagios/libexec/check_snmp -H ithaca.example.net -C public -o .1.3.6.1.2.1.25.1.6.0
SNMP OK - 81 | iso.3.6.1.2.1.25.1.6.0=81
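
The plugin's output has two parts separated by a pipe character: the human-readable status text, and performance data in label=value form. As a rough illustration, the Python sketch below splits the line shown above into those parts. The function name parse_plugin_output is hypothetical, and the sketch assumes unquoted labels without spaces, which is a simplification of the full performance data format.

```python
def parse_plugin_output(line):
    """Split one line of plugin output into (status text, perfdata dict)."""
    text, _, perf = line.partition("|")
    perfdata = {}
    for item in perf.split():
        label, _, value = item.partition("=")
        # Keep only the value itself; drop any ";warn;crit;min;max" suffix.
        perfdata[label] = value.split(";")[0]
    return text.strip(), perfdata


text, perfdata = parse_plugin_output("SNMP OK - 81 | iso.3.6.1.2.1.25.1.6.0=81")
print(text)      # SNMP OK - 81
print(perfdata)  # {'iso.3.6.1.2.1.25.1.6.0': '81'}
```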

How to do it...

We can define a command and service check for the Linux process count OID as follows:

  1. Change to the objects configuration directory for Nagios Core. The default is /usr/local/nagios/etc/objects. If you've put the definition for your host in a different file, move to that directory instead.
    # cd /usr/local/nagios/etc/objects
    
  2. Edit a suitable file containing command definitions, perhaps commands.cfg, and add the following definition to the end of the file:
    define command {
        command_name  check_snmp_linux_procs
        command_line  $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o .1.3.6.1.2.1.25.1.6.0 -w 100 -c 200
    }
    
  3. Edit the file containing the definition for the host. The host definition might look something like this:
    define host {
        use        linux-server
        host_name  ithaca.example.net
        alias      ithaca
        address    192.0.2.61
    }
    
  4. Beneath the definition for the host, place a new service definition using our new command. Replace public with the name of your SNMP community if it differs:
    define service {
        use                  generic-service
        host_name            ithaca.example.net
        service_description  SNMP_PROCS
        check_command        check_snmp_linux_procs!public
    }
    
  5. Validate the configuration and, if it reports no errors, reload the Nagios Core server:
    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    # /etc/init.d/nagios reload
    

With this done, a new service check with the description SNMP_PROCS is added to the ithaca.example.net host, and on each regular check the check_snmp plugin requests the value of the specified OID. The service enters a WARNING state if the count is greater than 100 and a CRITICAL state if it is greater than 200, sending notifications accordingly. The service appears in the web interface in the same way as any other, under the Services menu item.

How it works...

The preceding configuration defines both a new command based around the check_snmp plugin and, in turn, a new service check using that command for the ithaca.example.net server. The community name for the SNMP request, public, is passed into the command as an argument; everything else, including the OID to be requested, is fixed into the check_snmp_linux_procs command definition.
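
To make the argument passing concrete, the following Python sketch models how a check_command value such as check_snmp_linux_procs!public might be expanded into a full command line. It is a simplified model and not Nagios Core's actual implementation. The function name expand_command is hypothetical, and the value used for $USER1$ is an assumption based on the plugin path used earlier in this recipe.

```python
def expand_command(check_command, command_lines, macros):
    """Expand a check_command such as 'name!arg1!arg2' into a command line."""
    name, *args = check_command.split("!")
    line = command_lines[name]
    values = dict(macros)
    # The arguments after each "!" become $ARG1$, $ARG2$, and so on.
    for index, arg in enumerate(args, start=1):
        values["ARG%d" % index] = arg
    for key, value in values.items():
        line = line.replace("$%s$" % key, value)
    return line


commands = {
    "check_snmp_linux_procs": "$USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ "
                              "-o .1.3.6.1.2.1.25.1.6.0 -w 100 -c 200",
}
# Assumed macro values: the plugin directory and the host's address.
macros = {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.0.2.61"}

print(expand_command("check_snmp_linux_procs!public", commands, macros))
# /usr/local/nagios/libexec/check_snmp -H 192.0.2.61 -C public -o .1.3.6.1.2.1.25.1.6.0 -w 100 -c 200
```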

The command line includes the -w and -c options. For numeric outputs like ours, these define the limits beyond which a WARNING or CRITICAL state is raised, respectively. In this case, we define a WARNING threshold of 100 processes and a CRITICAL threshold of 200 processes, so the state changes only when the count is greater than the relevant limit.
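
The threshold logic can be sketched in a few lines of Python. This is an illustration only, assuming the Nagios plugin convention that a bare number N means an acceptable range of 0 to N inclusive. The function name state_for is hypothetical, and the fuller range syntax that plugins accept is not modelled.

```python
def state_for(value, warning, critical):
    """Return the state name for a numeric value and two upper limits."""
    def outside(limit):
        # A bare number N is read as the acceptable range 0..N inclusive.
        return value < 0 or value > limit

    # CRITICAL is tested first so that it takes precedence over WARNING.
    if outside(critical):
        return "CRITICAL"
    if outside(warning):
        return "WARNING"
    return "OK"


for count in (81, 100, 101, 201):
    print(count, state_for(count, warning=100, critical=200))
```

With the thresholds from this recipe, 81 and 100 processes give OK, 101 gives WARNING, and 201 gives CRITICAL.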

In addition, if the SNMP check fails completely due to connectivity problems or syntax errors, an UNKNOWN state will be reported.
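
Plugins report their state through their exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The Python sketch below shows one way a wrapper script might run a plugin and translate that status. The function name run_check is hypothetical, and treating a command that cannot be started as UNKNOWN is this sketch's own choice. A small Python one-liner stands in for the real plugin so that the example runs anywhere.

```python
import subprocess
import sys

# Standard plugin exit statuses and the states they represent.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv):
    """Run a plugin command and return (state, first line of its output)."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True)
    except OSError as error:
        # The command could not be started at all, e.g. a wrong path.
        return "UNKNOWN", str(error)
    # Any exit status outside 0..3 is also treated as UNKNOWN here.
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.splitlines()
    return state, lines[0] if lines else ""


# Stand-in for check_snmp: prints a status line and exits with status 0.
fake_plugin = [sys.executable, "-c", "import sys; print('SNMP OK - 81'); sys.exit(0)"]
print(run_check(fake_plugin))  # ('OK', 'SNMP OK - 81')
```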

There's more...

It's also possible to decide whether a check succeeded by testing whether the value returned by the SNMP query matches a particular string or pattern. If we needed to check that the system's short hostname was ithaca, for example (perhaps as a simple test SNMP query that should always succeed), we might set up a command definition as follows:

define command {
    command_name  check_snmp_hostname
    command_line  $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o .1.3.6.1.2.1.1.5.0 -r $ARG2$
}

With a corresponding service check like this:

define service {
    use                  generic-service
    host_name            ithaca.example.net
    service_description  SNMP_HOSTNAME
    check_command        check_snmp_hostname!public!ithaca
}

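This particular check would only succeed if the SNMP query succeeds and returns a string matching the pattern ithaca, as specified in the second argument. Because the -r option treats its argument as a regular expression, a longer value that merely contains ithaca would also match; the plugin's -s option can be used instead to require an exact string match.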
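
The effect of pattern matching can be illustrated with a short Python sketch. It assumes that Python's re module behaves closely enough to the plugin's extended regular expressions for simple patterns like these. The function name matches_expected is hypothetical.

```python
import re


def matches_expected(returned, pattern):
    """Return True if the pattern is found anywhere in the returned value."""
    return re.search(pattern, returned) is not None


print(matches_expected("ithaca", "ithaca"))              # True
print(matches_expected("ithaca.example.net", "ithaca"))  # True
print(matches_expected("sparta", "ithaca"))              # False
# Anchoring the pattern rejects longer values that only contain the name.
print(matches_expected("ithaca.example.net", "^ithaca$"))  # False
```

An unanchored pattern matches any value that contains it, so anchoring with ^ and $ is one way to make the test stricter.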

See also

  • Creating a new host, Chapter 1, Understanding Hosts, Services, and Contacts
  • Creating a new service, Chapter 1, Understanding Hosts, Services, and Contacts
  • Creating a new command, Chapter 2, Working with Commands and Plugins
  • The Creating an SNMP OID to monitor recipe in this chapter