Network devices

Let's first define a host for a switch; the example below uses an HP 12504 switch registered as hp-ac12504. Icinga already provides a host template for switches.

define host {
    use             generic-switch
    host_name       hp-ac12504
    alias           HP 12504 AC switch
    address         192.168.32.58
    hostgroups      all,switches
}

Icinga ships a generic-switch host template, defined in templates.cfg, with switch-specific settings. We add every switch to the switches hostgroup through the hostgroups directive in its host definition.
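The hostgroups directive only works if the named hostgroups exist. As a minimal sketch, the switches hostgroup could be declared as follows; the alias text is an assumption chosen for illustration, and the all hostgroup would need a similar definition.

```
# Hypothetical hostgroup definition; only hostgroup_name must match
# the name used in the hostgroups directive of the host definition.
define hostgroup {
    hostgroup_name  switches
    alias           Network switches    ; assumed display name
}
```

No members directive is needed here, because each host joins the group through its own hostgroups directive.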

Most routers and switches support Simple Network Management Protocol (SNMP) for monitoring. SNMP exposes device data as variables that can be queried remotely.

We will use the check_snmp plugin, which is part of the default Nagios Plugins installation, to run the checks that monitor the network devices. The common command definition is as follows:

define command {
    command_name        check_snmp
    command_line        $USER1$/check_snmp -H $HOSTADDRESS$ -o $ARG1$ $ARG2$
}

In SNMP, each variable that exposes data is identified by an object identifier (OID). The OID we want to query is passed to check_snmp with the -o command-line option, so we have made it the first argument of the check_snmp Icinga command definition. Any other command-line options can be passed through the second argument.
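To illustrate how the two arguments map onto the command line, here is a hedged sketch of a service that queries the standard sysDescr.0 OID and passes an SNMP community string through the second argument. The service description and the community name monitoring-ro are assumptions made for this example; -C is the check_snmp option for the community string, which defaults to public when omitted.

```
# Hypothetical service: $ARG1$ is the OID, $ARG2$ carries extra options.
define service {
    use                     generic-service
    host_name               hp-ac12504
    service_description     System description      ; assumed name
    check_command           check_snmp!sysDescr.0!-C monitoring-ro
}

# For this host, Icinga would expand the command line to:
#   $USER1$/check_snmp -H 192.168.32.58 -o sysDescr.0 -C monitoring-ro
```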

The packet loss and RTA check

The packet loss and RTA check measures packet loss (using ping) and the round trip average (RTA), that is, the average time a packet takes to travel from the Icinga server to the network device and back. The check_ping command already provides this functionality, so we use it instead of check_snmp for this particular check.

define command {
    command_name        check_ping
    command_line        $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ $ARG3$
}

define service {
    use                     generic-service
    hostgroup_name          switches
    service_description     Ping
    check_command           check_ping!300,5%!700,20%
}

The service check will generate:

  • CRITICAL for more than 700 ms RTA or more than 20 percent packet loss
  • WARNING for more than 300 ms RTA or more than 5 percent packet loss
  • OK for less than 300 ms RTA and less than 5 percent packet loss

Note that we can pass more command-line options to check_ping through a third argument in the check_command directive of the service definition, for example, to specify the timeout for the check plugin.
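As a sketch of that third argument, the following variant of the Ping service raises the plugin timeout with the -t option of check_ping. The value of 20 seconds is an assumption for illustration, not a recommendation from the text.

```
# Variant of the Ping service: $ARG3$ carries additional check_ping options.
define service {
    use                     generic-service
    hostgroup_name          switches
    service_description     Ping
    check_command           check_ping!300,5%!700,20%!-t 20    ; assumed 20 s timeout
}
```

This definition is meant to replace the earlier Ping service rather than sit alongside it, since both use the same service description for the same hostgroup.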

The SNMP status

If we are to monitor switch devices using SNMP, it is important to add a check for SNMP itself. We will query the device's uptime via SNMP to check that SNMP is working. If SNMP fails, the uptime query also fails, resulting in an alert.

define service {
    use                     generic-service
    hostgroup_name          switches
    service_description     SNMP
    check_command           check_snmp!sysUpTime.0
}

In the preceding code, sysUpTime.0 is an SNMP OID for getting the uptime value. If this service check fails, all other service checks relying on SNMP will also start failing.

The network port check

The network port check monitors a port on the switch and reports whether it is up.

define service {
    use                     generic-service
    hostgroup_name          switches
    service_description     Port 443 status
    check_command           check_snmp!ifOperStatus.443!-r 1
}

This service check queries the OID ifOperStatus.443. The .443 suffix selects port 443: it is the interface index that the switch assigns to that port in its interface table, not a TCP port number. The second argument, -r 1, indicates that we expect the value 1 to be returned (1 means the port is in the UP state). So it will give:

  • CRITICAL if SNMP returns a value other than 1
  • OK if SNMP returns the value 1
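Because the port check relies on SNMP, it will fail whenever the SNMP service check fails. One possible way to avoid duplicate alerts in that situation is a service dependency. The following is only a sketch under the assumption that both services exist on hp-ac12504 as defined above; the failure criteria shown are an illustrative choice.

```
# Sketch: skip the port check and its notifications while the SNMP
# service on the same switch is in a non-OK state.
define servicedependency {
    host_name                       hp-ac12504
    service_description             SNMP
    dependent_host_name             hp-ac12504
    dependent_service_description   Port 443 status
    execution_failure_criteria      w,u,c    ; do not run the port check
    notification_failure_criteria   w,u,c    ; do not notify for the port check
}
```

With this in place, an SNMP outage would be reported once by the SNMP service rather than by every SNMP-based check on the switch.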