Let's first define a host for a switch such as sw-1.example.org; the example definition that follows uses the host name hp-ac12504. Icinga already provides a host template for switches.
define host {
    use         generic-switch
    host_name   hp-ac12504
    alias       HP 12504 AC switch
    address     192.168.32.58
    hostgroups  all,switches
}
Icinga ships the generic-switch host template in templates.cfg, with switch-specific tweaks. We add every switch to the switches hostgroup through the hostgroups directive in its host definition.
Most routers and switches support Simple Network Management Protocol (SNMP) for monitoring. SNMP exposes data of the device in the form of variables, which can be queried remotely.
We will use the check_snmp plugin, provided by the default Nagios plugins installation, to run check commands against the network devices. The common command definition is as follows:
define command {
    command_name  check_snmp
    command_line  $USER1$/check_snmp -H $HOSTADDRESS$ -o $ARG1$ $ARG2$
}
SNMP identifies each variable that exposes data by an object identifier (OID). The OID we want to query is passed to check_snmp with the -o command-line option, so we have made it the first argument of the check_snmp Icinga command definition; the rest of the command-line options can be passed separately in the second argument.
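To make the argument passing concrete, here is a minimal Python sketch of how a check_command value such as check_snmp!sysUpTime.0 could map onto the command line above. It is a simplified model, not Icinga's actual macro engine: the function name expand_check_command is hypothetical, and the $USER1$ value /usr/lib/nagios/plugins is an assumed plugin directory that varies between installations.

```python
import re

# Command line copied from the check_snmp command definition.
COMMAND_LINE = "$USER1$/check_snmp -H $HOSTADDRESS$ -o $ARG1$ $ARG2$"

# Assumed macro values: the plugin directory differs per installation.
MACROS = {
    "$USER1$": "/usr/lib/nagios/plugins",
    "$HOSTADDRESS$": "192.168.32.58",
}


def expand_check_command(check_command, command_line, macros):
    """Split 'name!arg1!arg2' on '!' and substitute macros (simplified model)."""
    name, *args = check_command.split("!")
    values = dict(macros)
    for position, arg in enumerate(args, start=1):
        values[f"$ARG{position}$"] = arg
    # Macros without a value (for example an unused $ARG2$) expand to nothing.
    expanded = re.sub(
        r"\$[A-Z0-9]+\$",
        lambda match: values.get(match.group(0), ""),
        command_line,
    )
    # Collapse the whitespace left behind by empty macros.
    return name, " ".join(expanded.split())


if __name__ == "__main__":
    print(expand_check_command("check_snmp!sysUpTime.0", COMMAND_LINE, MACROS)[1])
    print(expand_check_command("check_snmp!ifOperStatus.443!-r 1", COMMAND_LINE, MACROS)[1])
```

The first argument always lands in the -o slot, while the second is appended verbatim, which is why extra plugin options can ride along in it.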
The packet loss and RTA check measures packet loss (ping) and round trip average (RTA) time, that is, the average time a packet takes to make a round trip from the Icinga monitoring server to the switch and back. The check_ping command already provides this functionality, so we use check_ping instead of check_snmp for this particular check.
define command {
    command_name  check_ping
    command_line  $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ $ARG3$
}

define service {
    use                  generic-service
    hostgroup_name       switches
    service_description  Ping
    check_command        check_ping!300,5%!700,20%
}
The service check will generate:
Note that we can pass more command-line options to check_ping through a third argument in the check_command directive of the service definition, for example, to specify the timeout for the check plugin.
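The two thresholds in the Ping service, 300,5% and 700,20%, each pair a round trip average in milliseconds with a packet loss percentage. The following Python sketch shows one way to read them. It is an illustration only: the function names are hypothetical, and it assumes a threshold is breached when either value reaches or exceeds its limit.

```python
def parse_threshold(text):
    """Turn '300,5%' into (300.0, 5.0): RTA in ms and packet loss in percent."""
    rta, loss = text.split(",")
    return float(rta), float(loss.rstrip("%"))


def ping_state(rta_ms, loss_pct, warning="300,5%", critical="700,20%"):
    """Classify a ping result against the warning and critical thresholds.

    Assumption: comparisons are inclusive, and either value alone is
    enough to raise the state.
    """
    warn_rta, warn_loss = parse_threshold(warning)
    crit_rta, crit_loss = parse_threshold(critical)
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for rta, loss in [(12.5, 0), (350, 0), (20, 25)]:
        print(rta, loss, ping_state(rta, loss))
```

Under these assumptions, a fast link that drops a quarter of its packets is still critical, because packet loss and RTA are evaluated independently.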
If we are to monitor switch devices using SNMP, it is important to add a check for SNMP itself. We will query the device uptime via SNMP to check its status. If SNMP fails, the uptime query will also fail, resulting in an alert.
define service {
    use                  generic-service
    hostgroup_name       switches
    service_description  SNMP
    check_command        check_snmp!sysUpTime.0
}
In the preceding code, sysUpTime.0 is an SNMP OID for getting the uptime value. If this service check fails, all other service checks relying on SNMP will also start failing.
The network port check would monitor a network port and report if it is responding.
define service {
    use                  generic-service
    hostgroup_name       switches
    service_description  Port 443 status
    check_command        check_snmp!ifOperStatus.443!-r 1
}
This service check queries the OID ifOperStatus.443, in which the .443 part indicates that we want to check port 443 (the suffix is the interface's index in the SNMP interface table). The second argument, -r 1, indicates that we expect the value 1 to be returned (1 means the port is in the UP state). The check is OK when the returned value matches and CRITICAL otherwise.
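The effect of -r 1 can be modeled in a few lines of Python. This is a hedged sketch rather than the plugin's own code: snmp_regex_state is a hypothetical helper, and it assumes that a regular expression match yields OK and a non-match yields CRITICAL. The status table lists the ifOperStatus values defined by the standard interfaces MIB.

```python
import re

# ifOperStatus values as defined in the standard interfaces MIB (IF-MIB).
IF_OPER_STATUS = {
    1: "up",
    2: "down",
    3: "testing",
    4: "unknown",
    5: "dormant",
    6: "notPresent",
    7: "lowerLayerDown",
}


def snmp_regex_state(value, pattern="1"):
    """Return 'OK' if the regex matches the returned value, else 'CRITICAL'.

    Assumption: this mirrors how a regex option such as -r is applied,
    i.e. a search anywhere in the returned text.
    """
    return "OK" if re.search(pattern, str(value)) else "CRITICAL"


if __name__ == "__main__":
    for code, label in IF_OPER_STATUS.items():
        print(code, label, snmp_regex_state(code))
```

Because the match is a search rather than an exact comparison, the pattern 1 is only safe here because no other ifOperStatus value contains the digit 1.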