Chapter 3. Running Remote Checks on Systems

So far, we have seen how to define service checks for localhost. But the real use case of a monitoring server such as Icinga is to monitor an entire infrastructure from one place, rather than deploying Icinga on every host we want to monitor. This chapter covers ways to monitor remote servers from an Icinga instance. The configuration is similar to the one we used for localhost monitoring, with slight modifications.

There are several ways of monitoring the servers in our infrastructure, depending on our needs and on the services we want to monitor. Broadly, checks fall into two types:

  • Active checks: The monitoring server polls the remote server at fixed intervals to check the status of the service. For example, the Icinga server would periodically run a command that makes a test HTTP connection to the remote host to determine the status of its HTTP service.
  • Passive checks: The remote hosts check the status of the service themselves and submit it to the monitoring server. The hosts would have to report both critical and recovery events to the monitoring server.

The type of check can be configured per service, using the active_checks_enabled and passive_checks_enabled directives in the service object definition.

Both active and passive checks have their appropriate use cases, so it is important to choose the right type for each check. Active checking is generally recommended for network services such as HTTP and IMAP, while passive checking is recommended for long-running tasks and for events generated internally by the host, such as errors appearing in a logfile; when such an error occurs, the host would submit a CRITICAL result to Icinga.
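To illustrate the passive model, here is a minimal Python sketch of the logfile example: it scans log lines for a pattern and formats the result as an Icinga 1.x PROCESS_SERVICE_CHECK_RESULT external command. The service name Logfile, the ERROR pattern, and the command file path are assumptions for illustration; the real path comes from the command_file directive in icinga.cfg, and a remote host would normally relay such results through an add-on such as NSCA rather than write to the file directly.

```python
import time

# Service states as understood by Icinga.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: the actual path is set by command_file in icinga.cfg.
COMMAND_FILE = "/var/lib/icinga/rw/icinga.cmd"


def scan_log(lines, pattern="ERROR"):
    """Return (state, output): CRITICAL if any line contains the pattern."""
    hits = [line.rstrip("\n") for line in lines if pattern in line]
    if hits:
        return CRITICAL, f"{len(hits)} matching line(s), last: {hits[-1]}"
    return OK, "no errors found"


def format_passive_result(host, service, state, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{output}\n"


def submit(command_line, command_file=COMMAND_FILE):
    """Write the command to Icinga's external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(command_line)
```

For example, scan_log(["ERROR disk full\n"]) returns state 2 (CRITICAL), which format_passive_result turns into a command line ending in ;2;1 matching line(s), last: ERROR disk full.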

Next, we will look at the tools required for each type of check.

Active checks

The monitoring server initiates the checks at specific intervals, and the status of each service is set according to the exit code of the check plugin. There are several ways to retrieve the status of a service, depending on what kind of service it is.
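The following Python sketch shows how a plugin's exit code maps to a service state under the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). run_check is a hypothetical helper, not part of Icinga; treating every other exit code, and any failure to launch the plugin, as UNKNOWN is a simplification made by this sketch.

```python
import subprocess
import sys

# Standard plugin exit codes and the service states they represent.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=10):
    """Run a check plugin; return (state name, first line of its output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The plugin could not be started or did not finish in time.
        return "UNKNOWN", f"could not run check: {exc}"
    lines = proc.stdout.splitlines()
    output = lines[0] if lines else ""
    # This sketch treats any exit code outside 0-3 as UNKNOWN.
    return STATES.get(proc.returncode, "UNKNOWN"), output


if __name__ == "__main__":
    # Simulate a plugin that prints a status line and exits with 2 (CRITICAL).
    fake_plugin = [sys.executable, "-c", "print('LOAD CRITICAL'); raise SystemExit(2)"]
    print(run_check(fake_plugin))  # ('CRITICAL', 'LOAD CRITICAL')
```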

There are two main types of services: public services and private services. We will look at both in this section.

Public services

Publicly available services are those accessible over the network, either the internal network or the Internet; basically, ones that can be checked by establishing a network connection and optionally making a sample request. Examples include HTTP, FTP, SSH, IMAP, SMTP, and MySQL Server.
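For a public service, the most basic check is simply whether a TCP connection can be established. Here is a minimal Python sketch of such a check, similar in spirit to a TCP port check plugin but not any plugin's actual implementation; the function name tcp_connect_check and the message format are hypothetical.

```python
import socket
import time

OK, CRITICAL = 0, 2


def tcp_connect_check(host, port, timeout=5.0):
    """Return (state, message) depending on whether a TCP connection succeeds."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and connects, honoring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:  # refused, unreachable, timed out, name not resolved, ...
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"
    return OK, f"TCP OK - {elapsed:.3f} s to connect to {host}:{port}"
```

Calling tcp_connect_check("172.16.143.22", 22) would, for instance, only tell us that something accepts connections on the SSH port; protocol-aware plugins such as check_ssh go further and inspect the service's response.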

If, for example, we want to monitor HTTP, SSH, and IMAP services on server1.example.org, which is some remote host other than the monitoring server itself, the host and services configuration would look like the following:

  • Host definition:
    define host {
      use         linux-server
      host_name   server1.example.org
      alias       Example server 1
      address     172.16.143.22
      hostgroups  linux               ; just an example
    }

Icinga's default configuration comes with a linux-server host template, which is defined in templates.cfg. The following is the configuration for a few service checks:

  • HTTP, which makes a GET / request on port 80:
    define command {
      command_name    check_http
      command_line    $USER1$/check_http -I $HOSTADDRESS$
    }
    
    define service {
      use                     generic-service
      host_name               server1.example.org
      service_description     HTTP
      check_command           check_http
    }
  • SSH:
    define command {
      command_name    check_ssh
      command_line    $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
    }
    
    define service {
      use                     generic-service
      host_name               server1.example.org
      service_description     SSH
      check_command           check_ssh
    }
  • IMAP:
    define command {
      command_name    check_imap
      command_line    $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
    }
    
    define service {
      use                     generic-service
      host_name               server1.example.org
      service_description     IMAP
      check_command           check_imap
    }

We can tune these checks through the command-line arguments that check plugins such as check_http provide; for example, warning and critical thresholds for response time. We will cover the check plugins in detail in the next chapter.
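As a rough illustration of response-time thresholds, the following Python sketch makes a GET / request, like the check_http command above, and classifies the result. The threshold defaults and the status-code policy (5xx CRITICAL, 4xx WARNING) are assumptions of this sketch rather than a description of check_http's exact behavior.

```python
import http.client
import time

OK, WARNING, CRITICAL = 0, 1, 2


def classify(status, elapsed, warn, crit):
    """Combine an HTTP status code and a response time (seconds) into a state."""
    if status >= 500 or elapsed > crit:
        return CRITICAL
    if status >= 400 or elapsed > warn:
        return WARNING
    return OK


def http_check(host, port=80, warn=1.0, crit=5.0, timeout=10.0):
    """Send GET / and return (state, message); thresholds are in seconds."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    start = time.monotonic()
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()  # include the body transfer in the measured time
        status = response.status
    except (OSError, http.client.HTTPException) as exc:
        return CRITICAL, f"HTTP CRITICAL - {exc}"
    finally:
        conn.close()
    elapsed = time.monotonic() - start
    return classify(status, elapsed, warn, crit), f"HTTP {status} in {elapsed:.3f} s"
```

For example, classify(200, 2.0, warn=1.0, crit=5.0) returns 1 (WARNING), since a two-second response exceeds the warning threshold but not the critical one.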

Note

The Icinga service must be reloaded whenever we update any of the configuration files, for the changes to take effect:

$ sudo service icinga reload

The reload/restart command verifies the entire configuration for syntax and semantic errors and reports any errors found. As a general practice, it is recommended to always check the configuration before reloading or restarting Icinga:

$ sudo service icinga show-errors

The preceding command verifies the Icinga configuration and shows the errors, if any.
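The check-before-reload habit can also be scripted. This Python sketch runs a configuration check and reloads Icinga only if the check passes. The verification command icinga -v /etc/icinga/icinga.cfg and its path are assumptions that vary by distribution; substitute whatever verification command your installation provides.

```python
import subprocess
import sys

# Assumptions: binary name and config path differ between packages/distributions.
VERIFY_CMD = ["icinga", "-v", "/etc/icinga/icinga.cfg"]
RELOAD_CMD = ["service", "icinga", "reload"]


def verify_then_reload(verify_cmd=VERIFY_CMD, reload_cmd=RELOAD_CMD):
    """Reload Icinga only if the configuration check succeeds.

    Returns a (verified, reloaded) pair of booleans.
    """
    check = subprocess.run(verify_cmd, capture_output=True, text=True)
    if check.returncode != 0:
        # Show the verifier's report so the errors can be fixed before reloading.
        sys.stderr.write(check.stdout + check.stderr)
        return False, False
    result = subprocess.run(reload_cmd)
    return True, result.returncode == 0
```

Keeping the two commands as parameters makes it easy to point the helper at a different config path, or at the show-errors variant, without changing its logic.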

Private services

Private services include various system resource and performance checks, such as checks for free disk space, CPU load, memory usage, number of processes, and so on. Such information is not available over the network and has to be obtained through an intermediate agent that can report it on request. Some common agents are as follows:

  • SSH (Linux servers)
  • NRPE (Linux servers)
  • NSClient++ (Windows servers)
  • SNMP (routers, switches, and so on)

These agents can also be used to check public services when those services are not accessible from the Icinga server, or when the check has a different purpose. For example, suppose we want to test whether the web server running on server1 is reachable from server2 (both being different from the Icinga server). Running the HTTP check for server1 from the Icinga server would not serve this purpose, because it only tests reachability from the Icinga server itself. Instead, one or more of these agents running on server2 can report to Icinga whether server1 is reachable from there.

Secure Shell (SSH)

The simplest way to get information from a Linux server is to connect to it over SSH and run a command or script there. The nagios-plugins package provides ready-to-use plugins for such purposes (check_load, check_disk, and so on), which we have already used in our localhost monitoring setup. So, we can define a command that connects to the server over SSH, runs one of these plugins, and returns its output and exit code; the exit code determines the status of the service check on the monitoring server. We also need to ensure that the nagios-plugins package is installed on the remote server, so that all the check plugins are available to execute over SSH. Let's look at a configuration for a disk space check on server1.example.org:

define command {
  command_name    check_by_ssh
  ; $$ passes a literal $ to the remote shell (a single $ starts an Icinga macro)
  command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C 'PATH=$$PATH:/usr/lib64/nagios/plugins $ARG1$'
}

define service {
  use                     generic-service
  host_name               server1.example.org
  service_description     Disk
  check_command           check_by_ssh!check_disk -w 20% -c 10%
}
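To see how the arguments after each ! reach the command line, here is a simplified Python sketch of Icinga 1.x-style macro substitution. It splits the command line on $ and treats every second piece as a macro name, which is why a literal dollar sign intended for the shell is written as $$ in a command definition. The USER1 value, the treatment of unknown macros as empty strings, and the helper name expand are assumptions of this sketch; Icinga's real macro processing handles many more macros and cases.

```python
def expand(command_line, check_command, macros):
    """Return (command name, expanded command line) for a service's check_command.

    Simplified model: the command line is split on '$', and every second piece is
    a macro name. An empty name (from '$$') yields a literal '$'.
    """
    name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg  # text after the i-th '!' becomes $ARGi$

    pieces = []
    for index, piece in enumerate(command_line.split("$")):
        if index % 2 == 0:
            pieces.append(piece)  # plain text between macros
        elif piece == "":
            pieces.append("$")  # '$$' escapes a literal dollar sign
        else:
            pieces.append(values.get(piece, ""))  # unknown macros -> "" in this sketch
    return name, "".join(pieces)
```

With USER1 set to /usr/lib64/nagios/plugins and HOSTADDRESS to 172.16.143.22, expanding check_by_ssh!check_disk -w 20% -c 10% against a command line containing 'PATH=$$PATH:/usr/lib64/nagios/plugins $ARG1$' hands the remote shell 'PATH=$PATH:/usr/lib64/nagios/plugins check_disk -w 20% -c 10%'. With a single $ in front of PATH, the pairing shifts and $ARG1$ is no longer expanded.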

Note

Icinga executes checks as the user configured by the icinga_user directive in icinga.cfg. So, we need to make sure that SSH keys are generated for this user on the monitoring server and added to authorized_keys on the remote server(s), so that check_by_ssh can log in without being prompted for a password.

To generate SSH keys for the icinga user, use the following command:

$ su icinga -c 'ssh-keygen -t rsa'

Press Enter at each prompt to accept the default values, including an empty passphrase, since Icinga has to use the key non-interactively. This generates the SSH public key (~/.ssh/id_rsa.pub) and the SSH private key (~/.ssh/id_rsa) in the .ssh directory inside the icinga user's home folder.

Next, append the public key to ~/.ssh/authorized_keys in the home folder of the icinga user on the remote host; you will have to make sure the icinga user exists there. This allows the icinga user on the Icinga server to log in to the remote host as the icinga user.

We appended /usr/lib64/nagios/plugins to PATH so that the check_by_ssh command object can be re-used to run other plugins over SSH, without having to give the full path in the command every time.
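The following Python sketch approximates what the check_by_ssh command above does: it runs the plugin on the remote host over SSH with the plugin directory appended to PATH, and passes the plugin's exit code through as the service state. It is an illustration, not check_by_ssh itself; the helper names, the icinga login user, and the plugin directory default are assumptions. BatchMode=yes makes ssh fail instead of prompting for a password, which matters because Icinga runs checks non-interactively.

```python
import shlex
import subprocess

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def ssh_argv(host, plugin_command, plugin_dir="/usr/lib64/nagios/plugins", user="icinga"):
    """Build an ssh invocation that runs plugin_command with plugin_dir on PATH."""
    # $PATH is expanded by the remote shell, not locally.
    remote = f"PATH=$PATH:{shlex.quote(plugin_dir)} {plugin_command}"
    # BatchMode=yes: fail instead of prompting for a password or passphrase.
    return ["ssh", "-o", "BatchMode=yes", "-l", user, host, remote]


def ssh_check(host, plugin_command, timeout=30):
    """Run a plugin remotely and return (state name, plugin output)."""
    try:
        proc = subprocess.run(ssh_argv(host, plugin_command),
                              capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "UNKNOWN", str(exc)
    # ssh exits with 255 on connection/authentication errors; treated as UNKNOWN here.
    return STATES.get(proc.returncode, "UNKNOWN"), proc.stdout.strip()
```

Each call to ssh_check opens and closes a complete SSH session, which is exactly the per-check cost that makes this approach heavier than a persistent agent when there are many checks.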

Nagios Remote Plugin Executor (NRPE)

NRPE is an add-on that is deployed on the remote hosts to execute check plugins on them. It works similarly to the SSH method: the NRPE daemon has to be running on the remote server, and its configuration must map each command name to a command line (the executable with its arguments). When Icinga executes the check_nrpe check, check_nrpe sends the NRPE command name specified in the service definition to the NRPE daemon on the remote server. The daemon executes the corresponding command line and returns its exit code and output.

There are pros and cons to using this method instead of SSH. SSH gives us more flexibility, since any desired command or script can be run over it, whereas NRPE has the overhead of defining the command-name to command-line mapping and other required configuration. On the other hand, SSH increases the load on the monitoring server when there is a large number of checks: each check requires opening an SSH connection, executing the command, and closing the connection, which is a considerable overhead.

The following is an example of the NRPE daemon configuration (usually /etc/nrpe.cfg); command and service object definitions similar to those used for SSH can then be used to execute checks over NRPE:

# command[<command_name>]=<command_line>
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

More information on installation and configuration of NRPE can be found at http://docs.icinga.org/latest/en/nrpe.html.
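To make the command-name mapping concrete, here is a Python sketch of the lookup the NRPE daemon performs: parse command[<name>]=<command_line> entries, then run the command line for a requested name and return its exit code and output. This is a simplified model for illustration, not NRPE's implementation; it ignores the network protocol, argument passing, and access control, and it splits the command line with shlex instead of handing it to a shell.

```python
import shlex
import subprocess


def parse_nrpe_commands(text):
    """Parse command[<name>]=<command_line> entries from nrpe.cfg-style text."""
    commands = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("command[") and "]=" in line:
            name, command_line = line[len("command["):].split("]=", 1)
            commands[name] = command_line
    return commands


def handle_request(commands, name, timeout=10):
    """Run the command mapped to `name`; return (exit code, output)."""
    if name not in commands:
        return 3, f"UNKNOWN: command '{name}' is not defined"
    try:
        proc = subprocess.run(shlex.split(commands[name]),
                              capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 3, f"UNKNOWN: {exc}"
    return proc.returncode, proc.stdout.strip()
```

Only names present in the configuration can be executed, which is the trade-off described above: less flexible than arbitrary commands over SSH, but the remote host decides exactly what may run.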

NSClient++

The preceding methods are best suited to Linux servers and are generally not an option for Windows servers. For Windows, there is an agent called NSClient++. It is the Windows counterpart of Linux's NRPE daemon, although it is cross-platform and available for Linux too. The same check_nrpe plugin can be used to run commands on remote Windows servers: the plugin contacts the NSClient++ agent and asks for the status of one of the commands the agent makes available. The list of available commands and their usage can be found in the agent's documentation.

For example, if we have a Windows server with the hostname server2.example.org, the host definition can be as follows:

define host {
  use             windows-server
  host_name       server2.example.org
  alias           Example server 2
  address         172.16.143.23
  hostgroups      windows               ; just an example
}

Icinga already provides a windows-server host template, found in templates.cfg. The following is what the Icinga configuration for checking CPU load would look like (NSClient++ supports a command called CheckCPU):

define command {
  command_name    check_nrpe
  command_line    $USER1$/check_nrpe -u -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
}

define service {
  use                     generic-service
  host_name               server2.example.org
  service_description     CPU
  check_command           check_nrpe!CheckCPU!warn=80 crit=90
}

A working NSClient++ agent must be deployed on the remote Windows server(s) for the preceding configuration to work; see http://docs.icinga.org/latest/en/monitoring-windows.html#installwindowsagent for installation instructions. Also make sure the Icinga server is whitelisted on the Windows server side, so that check_nrpe is allowed to talk to the agent. The list of commands supported by NSClient++ is available at http://www.nsclient.org/nscp/wiki/CheckCommands.

Simple Network Management Protocol (SNMP)

SNMP agents on routers and switches can be used to monitor various metrics and services on these devices. Monitoring network devices mostly involves checking network traffic and port status. A wide range of such values is available via SNMP, each identified by an Object Identifier (OID) that has a value associated with it. For example, the OID sysUpTime.0 gives the uptime of the device. An example of a host definition for a switch is as follows:

define host {
  use             generic-switch
  host_name       switch1.example.org
  alias           HP 12504 AC switch
  address         192.168.32.58
  hostgroups      switches              ; just an example
}

An example of a service check for getting the uptime is as follows:

define command {
  command_name     check_snmp
  command_line     $USER1$/check_snmp -H $HOSTADDRESS$ -o $ARG1$ $ARG2$
}

define service {
  use                     generic-service
  host_name               switch1.example.org
  service_description     SNMP
  check_command           check_snmp!sysUpTime.0
}
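The value returned for sysUpTime.0 is a TimeTicks count in hundredths of a second. The following Python sketch converts it into days and hh:mm:ss and applies a hypothetical policy that flags a recent restart; the one-hour threshold and helper names are assumptions. Strictly, sysUpTime measures the time since the network management portion of the device was last (re)initialized, which usually, but not always, coincides with a reboot.

```python
OK, WARNING = 0, 1


def format_timeticks(ticks):
    """sysUpTime is in hundredths of a second; render it as 'Nd HH:MM:SS'."""
    seconds = ticks // 100
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


def check_uptime(ticks, min_seconds=3600):
    """Hypothetical policy: WARNING if the agent restarted within min_seconds."""
    text = format_timeticks(ticks)
    if ticks // 100 < min_seconds:
        return WARNING, f"recently restarted, uptime {text}"
    return OK, f"uptime {text}"
```

For example, 8640000 ticks is 86400 seconds, so format_timeticks(8640000) gives "1d 00:00:00". The check_snmp plugin can also apply its own warning and critical thresholds to the value it retrieves.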

Readers are advised to read up on SNMP to learn how to retrieve various kinds of values, which can then be used for monitoring. The check_snmp plugin contacts the device and queries the values from its SNMP agent, using the relevant credentials (for SNMP v1/v2c, the community string). The snmpwalk command can be used to list the OIDs available on a particular device:

$ snmpwalk -m ALL -v1 -cpublic switch1.example.org system

The preceding command lists the OIDs in the system subtree, along with their values, as reported by the switch.
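To pick OIDs out of snmpwalk's output programmatically, a small parser helps. This Python sketch assumes net-snmp's usual one-line format, <OID> = <TYPE>: <value>, and skips any line that does not match (continuation lines of multi-line string values, for example); the helper names are hypothetical.

```python
import re

# Assumed line shape: "<OID> = <TYPE>: <value>" (the type prefix may be absent).
LINE = re.compile(r"^(?P<oid>\S+) = (?:(?P<type>[\w-]+): )?(?P<value>.*)$")


def parse_snmpwalk(output):
    """Return {oid: (type, value)} from snmpwalk's text output."""
    result = {}
    for line in output.splitlines():
        match = LINE.match(line)
        if match:  # lines that do not look like "OID = value" are skipped
            result[match["oid"]] = (match["type"], match["value"])
    return result


def timeticks(value):
    """Extract the raw tick count from a value such as '(123456) 0:20:34.56'."""
    match = re.match(r"\((\d+)\)", value)
    return int(match.group(1)) if match else None
```

The resulting dictionary makes it easy to see which OIDs a device exposes before deciding which ones to pass to check_snmp.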
