Installing Nagios

Nagios is a well-known open source monitoring tool that is widely accepted by system administrators. Other open source options, such as Zabbix and Sensu, are also available. We won't be able to get into those here; just know that they exist and can help with your monitoring needs.

There are plans to add monitoring installation to a TripleO deployment. Keep watch on the community for progress on this. For now, we will install Nagios and look at what configurations can be dropped in to monitor an OpenStack installation. Start by installing Nagios, setting it to start on boot, and starting the service:

undercloud# sudo yum install nagios nagios-plugins-all nagios-plugins-nrpe -y
undercloud# sudo chkconfig nagios on
undercloud# sudo systemctl start nagios

When Nagios is installed, it adds configuration to Apache to serve a web page that shows the monitoring status. Open http://192.0.2.1/nagios/ in a web browser. The default username and password for the web page are nagiosadmin/nagiosadmin. On the left-hand side of the page, there are two links of interest: hosts and services. You will see that a default localhost configuration is installed for you. The configuration for localhost is defined in /etc/nagios/objects/localhost.cfg. The nagios-plugins-all package that was installed with Nagios provides the plugins that the localhost service checks need in order to pass.

If you want a base set of checks for a host, the localhost file could be copied and updated appropriately for each host that you would like to monitor. Servers that are part of an OpenStack cluster need more specific monitoring, so let's add configurations to the Nagios configuration files to monitor the OpenStack hosts and services. To apply configuration changes, the Nagios service must be restarted so that it reads the updates and starts checks based on the new configuration. In this chapter, we will use a set of monolithic configuration files as the example. Each file referred to in this chapter will be placed in the drop directory /etc/nagios/conf.d. The top-level Nagios configuration file, /etc/nagios/nagios.cfg, includes files placed in this directory. The files referenced in this chapter could be broken up further; the way that the localhost file is structured is one example. Splitting the configuration files is beyond the scope of this chapter, so please consult the Nagios documentation if you choose to do so.
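Before dropping files into /etc/nagios/conf.d, it can be worth confirming that nagios.cfg really includes that directory. The following is a minimal sketch that assumes the include is expressed with the standard cfg_dir directive; the function name has_cfg_dir is made up for this illustration and is not part of Nagios.

```shell
#!/bin/bash
# has_cfg_dir is a hypothetical helper, not part of Nagios.
# It succeeds when the given main configuration file contains an
# uncommented cfg_dir= line that points at the given directory.
has_cfg_dir() {
    local main_cfg="$1" dir="${2%/}"   # drop a trailing slash from the directory
    grep -E '^[[:space:]]*cfg_dir[[:space:]]*=' "$main_cfg" |
        # Keep only the value, trimmed of whitespace and any trailing slash.
        sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//; s:/$::' |
        # Exact, literal comparison against the directory we expect.
        grep -Fxq "$dir"
}

# Example, run on the Nagios host:
#   has_cfg_dir /etc/nagios/nagios.cfg /etc/nagios/conf.d && echo "conf.d is included"
```

If the function fails, files placed in the drop directory would be ignored until a matching cfg_dir line is added to nagios.cfg.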

Adding Nagios host checks

Start by adding host checks. The first configuration file will hold the configuration stanzas for all the hosts in the cluster that we are going to monitor. Let's use the file /etc/nagios/conf.d/nagios_host.cfg. Each stanza establishes a check to ensure that the host is up and responding to network communication. Here's the configuration that covers the controller, two compute nodes, and the Ceph node. If you have additional compute nodes, make sure to add them as well. Be sure to use openstack server list to get the correct IP addresses and hostnames:

define host {
    address 192.0.2.9
    host_name overcloud-controller-0
    use linux-server
}
define host {
    address 192.0.2.10
    host_name overcloud-novacompute-0
    use linux-server
}
define host {
    address 192.0.2.11
    host_name overcloud-novacompute-1
    use linux-server
}
define host {
    address 192.0.2.8
    host_name overcloud-cephstorage-0
    use linux-server
}
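With more than a handful of nodes, writing these stanzas by hand gets tedious. As a rough sketch, they could be generated from a list of hostname and address pairs. The function name make_host_stanzas and the one-pair-per-line input format are assumptions made for this illustration; the pairs themselves would still come from the server list.

```shell
#!/bin/bash
# make_host_stanzas is a hypothetical helper for this illustration.
# It reads "hostname address" pairs from standard input, one per line,
# and prints a Nagios host definition for each pair.
make_host_stanzas() {
    local name address
    while read -r name address; do
        # Skip blank lines and lines that lack an address.
        [ -n "$name" ] && [ -n "$address" ] || continue
        printf 'define host {\n'
        printf '    address %s\n' "$address"
        printf '    host_name %s\n' "$name"
        printf '    use linux-server\n'
        printf '}\n'
    done
}

# Example:
#   printf '%s\n' 'overcloud-controller-0 192.0.2.9' \
#                 'overcloud-novacompute-0 192.0.2.10' | make_host_stanzas
```

The output could be redirected into /etc/nagios/conf.d/nagios_host.cfg and then reviewed before Nagios is restarted.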

After adding these configurations, validate the Nagios configuration and restart the Nagios service:

$ service nagios configcheck
Running configuration check... OK.
$ service nagios restart

A typo in a configuration file will often cause the configuration validation to fail. When that happens, Nagios will fail to start. To find out where the syntax error is, run Nagios by hand with the -v option, referencing the top-level configuration file:

$ nagios -v /etc/nagios/nagios.cfg

This will give you the line that the syntax error is on. If Nagios restarts successfully, you should be able to connect to Nagios on port 80 and select the host list. After some time passes and the checks fire, the health check should succeed for each host that was added to the hosts configuration file. Now that Nagios is aware of the hosts that we will be monitoring, let's define an example command that could be used to monitor one of the services on the hosts.
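The validate-then-restart sequence described in this section can be wrapped so that a broken configuration never reaches a restart. This is only a sketch: the function name checked_restart and the NAGIOS_BIN and RESTART_CMD variables are hypothetical, and the restart command would normally need root privileges.

```shell
#!/bin/bash
# checked_restart is a hypothetical wrapper for this illustration.
# NAGIOS_BIN and RESTART_CMD are assumed, overridable defaults, which also
# makes the function easy to try without touching a live service.
NAGIOS_BIN="${NAGIOS_BIN:-nagios}"
RESTART_CMD="${RESTART_CMD:-systemctl restart nagios}"

checked_restart() {
    local main_cfg="${1:-/etc/nagios/nagios.cfg}"
    # nagios -v exits non-zero when the configuration does not validate.
    if "$NAGIOS_BIN" -v "$main_cfg" >/dev/null; then
        # Left unquoted on purpose so the command and its arguments split.
        $RESTART_CMD
    else
        echo "Configuration check failed; run: $NAGIOS_BIN -v $main_cfg" >&2
        return 1
    fi
}

# Example, run as root on the Nagios host:
#   checked_restart /etc/nagios/nagios.cfg
```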

Nagios commands

Before a service check can run against a host, a command must be defined for the service check to reference. Let's put these commands in the /etc/nagios/conf.d/nagios_command.cfg file. We are not going to cover all the commands needed to monitor your OpenStack cloud here. Instead, we will cover the concept of a defined command. Each command has a name that will be referenced later and a path to an executable. The executable runs and exits with a return code from zero through three. Zero means the check succeeded, one means the check is in a warning state, two means the check failed, and three, or any other return code, means the status is unknown. An example of command definitions in the /etc/nagios/conf.d/nagios_command.cfg file looks like this:

define command {
    command_line /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
    command_name check_nrpe
}
define command {
    command_line /usr/lib64/nagios/plugins/example_command
    command_name example_command
}

Note that the executables referenced by these commands live in /usr/lib64/nagios/plugins/. If you add executable scripts that Nagios will use to check services, it is good practice to add them to this directory. If the intent of example_command was to verify that the host's hostname was set properly, its content may look like this:

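#!/bin/bash
# Compare this machine's hostname with the expected name passed as $1.
HOSTNAME=$(hostname)
# Unknown: there is nothing to compare.
if [ -z "$HOSTNAME" ] || [ -z "$1" ]; then
    echo "Host name or argument was blank"
    exit 3
fi
# OK: the hostname matches exactly.
if [ "$HOSTNAME" = "$1" ]; then
    echo "Hostname is $HOSTNAME"
    exit 0
fi
# Warning: the hostname only contains the expected name.
if [[ "$HOSTNAME" == *"$1"* ]]; then
    echo "Hostname is $HOSTNAME and contains $1"
    exit 1
else
    # Critical: no match at all.
    echo "Hostname is not $1"
    exit 2
fi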

Note that there is a case for each of the four possible return values. A check is not required to use return codes one and three. Unfortunately, this command is not very useful as it stands. If it were associated with a host, it would never be accurate, because it would always execute on the host that Nagios is running on and would never return success for any other host. This creates the need for a command to be executed remotely on the host that is being monitored.
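Before moving on to remote execution, it can help to exercise a plugin by hand and see which state its return code maps to. The sketch below follows the return code convention described earlier; the function names status_name and run_check are made up for this illustration.

```shell
#!/bin/bash
# status_name is a hypothetical helper: it maps a plugin return code to
# the state it represents under the convention described in this chapter.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # three, or any other return code
    esac
}

# run_check runs any plugin command line, prints the state followed by the
# plugin's own output, and passes the plugin's return code through.
run_check() {
    local output code
    output=$("$@")
    code=$?
    echo "$(status_name "$code"): $output"
    return "$code"
}

# Example:
#   run_check /usr/lib64/nagios/plugins/example_command "$(hostname)"
```

Running the example on the machine where the script is installed should report the OK state, since the expected name is taken from the same hostname command the script uses.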

The check_nrpe command shown in the nagios_command.cfg file is important because it allows exactly that: remote execution of commands on the hosts being monitored. Nagios Remote Plugin Executor (NRPE) checks are issued to the hosts via this command definition. Make sure that the NRPE command definition is in the nagios_command.cfg file. On each of the hosts that will have NRPE checks run on them, the NRPE service must be running and TCP port 5666 must be open for the Nagios host to connect to. Make sure this is a private connection; if this traffic is unsecured, it creates a security risk. To set up the NRPE service on each of your overcloud nodes, connect to each one, install NRPE, enable the service, add the Nagios server (192.0.2.1 in our example cloud) to the allowed_hosts parameter in the configuration file /etc/nagios/nrpe.cfg, and start the service:

overcloud-node# sudo yum install nrpe -y
overcloud-node# sudo chkconfig nrpe on
overcloud-node# sudo vim /etc/nagios/nrpe.cfg
overcloud-node# sudo systemctl start nrpe
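Editing nrpe.cfg by hand on every node does not scale well, so the allowed_hosts change could also be scripted. The sketch below assumes that the file already contains an allowed_hosts= line with at least one entry and that GNU sed is available; the function name add_allowed_host is hypothetical.

```shell
#!/bin/bash
# add_allowed_host is a hypothetical helper for this illustration.
# It appends an address to the allowed_hosts line of an NRPE configuration
# file unless that address is already listed.
add_allowed_host() {
    local cfg="$1" ip="$2" current
    current=$(sed -n 's/^allowed_hosts=//p' "$cfg" | head -n 1)
    # Already present as one of the comma-separated entries: nothing to do.
    case ",${current}," in
        *",${ip},"*) return 0 ;;
    esac
    # "&" re-inserts the matched line, so the new address is appended to it.
    sed -i "s/^allowed_hosts=.*/&,${ip}/" "$cfg"
}

# Example, run as root on an overcloud node before starting nrpe:
#   add_allowed_host /etc/nagios/nrpe.cfg 192.0.2.1
```

Because the function checks for an existing entry first, running it more than once leaves the file unchanged.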

A check issued through check_nrpe passes only a host and a command name. Everything else about the command that runs is defined on the remote host that the command is executed against. These details live in /etc/nagios/nrpe.cfg on each host. At the very bottom of this file, there is an include_dir directive:

include_dir=/etc/nrpe.d/

However, for this example, we will put the commands right in the nrpe.cfg file underneath the include_dir directive. By configuring the NRPE commands, Nagios is able to connect to the nodes and execute the commands to carry out the monitoring. Let's use the example_command script as an example NRPE command and make it a useful definition. On the control node, put this line in the nrpe.cfg file:

command[check_hostname]=/usr/lib64/nagios/plugins/example_command control

If this configuration is added to each of the overcloud nodes with the respective hostnames and the example_command script is installed at the referenced location, then it could be used to verify that a hostname was properly set on each of the OpenStack nodes.
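Since each node needs the same line with its own hostname, the line could be generated rather than typed. This is a sketch only; the function name nrpe_hostname_check is hypothetical, and defaulting to the output of hostname -s is an assumption about how the nodes are named.

```shell
#!/bin/bash
# nrpe_hostname_check is a hypothetical helper for this illustration.
# It prints the NRPE command definition that checks for the given hostname,
# defaulting to the short hostname of the node it runs on.
nrpe_hostname_check() {
    local expected="${1:-$(hostname -s)}"
    printf 'command[check_hostname]=/usr/lib64/nagios/plugins/example_command %s\n' "$expected"
}

# Example, run as root on an overcloud node:
#   nrpe_hostname_check >> /etc/nagios/nrpe.cfg
```

After the NRPE service is restarted on the node, the definition can be tried from the Nagios host with the same executable that the check_nrpe command definition uses, for example /usr/lib64/nagios/plugins/check_nrpe -H 192.0.2.9 -c check_hostname.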

There is a large collection of commands and NRPE commands that need to be defined on the Nagios host and the hosts that Nagios is monitoring. Look at the example code included with this book for the executable scripts, commands, and NRPE definitions needed to execute the service checks that will be referenced in the rest of this chapter.

Now that a basic overview of adding hosts, commands, and service definitions of Nagios has been covered, let's take a look at the kinds of checks that are useful to monitor the health of an OpenStack cluster.
