Nagios (http://www.nagios.org) is an open source monitoring and notification utility. It enables users to monitor various resources, such as CPU, memory, disk usage, network status, reachability, HTTP status, and web page rendering, and to run various checks using Nagios-compatible sensors. There is a giant list of Nagios plugins that covers the monitoring of almost all popular services and software. The best thing about Nagios is its plugin architecture: you can write a simple plugin to monitor a custom resource. Effectively, if you can measure a state, you can monitor its source in Nagios. This section briefly discusses Nagios setup and how to enable it to monitor system resources and Cassandra.
Nagios ships in different packages, such as DIY, student, professional, and business, which differ in features and support. You may visit the Nagios website and choose one based on your needs. Given the number of free plugins, the free version of Nagios is generally a good option. In this section, we will see how to install and configure the free version of Nagios (from source) on a CentOS machine. These instructions should work on any RHEL variant. For Ubuntu- or Debian-like environments, you may need to find the apt-get equivalents of the yum commands in the script. Depending on your Linux distribution, Nagios may also be installable from additional repositories. That package may or may not be the latest and greatest version of Nagios, but it eases a lot of installation hassles. We use a tarball installation in this book to keep things generic.
The Nagios server (whose web interface is PHP-based) has some dependencies that must be fulfilled before you can start installing it:
Check whether PHP is installed:
$ php -v
PHP 5.3.26 (cli) (built: Jun 24 2013 18:08:10)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2013 Zend Technologies
If PHP does not exist, install it as follows:
$ sudo yum install php
Check whether Apache httpd is installed:
$ httpd -v
Server version: Apache/2.2.24 (Unix)
Server built: May 20 2013 21:12:45
If httpd does not exist, install it as follows:
$ sudo yum install httpd
Check whether gcc is installed:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-amazon-linux/4.6.3/lto-wrapper
Target: x86_64-amazon-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,,fortran,ada,go,lto --enable-plugin --disable-libgcj --with-tune=generic --with-arch_32=i686 --build=x86_64-amazon-linux
Thread model: posix
gcc version 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC)
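If gcc does not exist, install it along with the C library packages as follows:
$ sudo yum install gcc glibc glibc-common
Finally, install the GD graphics library and its development files:
$ sudo yum install gd gd-devel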
Before we jump into installing Nagios, we need to set up a user account and a group for Nagios as follows:
$ sudo -i
$ useradd -m nagios
$ passwd nagios
$ groupadd nagcmd
$ usermod -a -G nagcmd nagios
$ usermod -a -G nagcmd apache
Nagios installation can be divided into four parts: installing Nagios, configuring Apache httpd, installing plugins, and setting up Nagios as a service.
The following are the steps to install Nagios from the tarball:
Download and extract the Nagios source tarball:
$ wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
$ tar xzf nagios-3.5.0.tar.gz
Configure, compile, and install Nagios:
$ cd nagios
$ ./configure --with-command-group=nagcmd
$ make all
$ sudo make install install-base install-cgis install-html install-exfoliation install-config install-init install-commandmode fullinstall
Set the e-mail address of the nagiosadmin contact, which receives the alerts:
$ sudo vi /usr/local/nagios/etc/objects/contacts.cfg
define contact{
    contact_name  nagiosadmin      ; Short name of user
    use           generic-contact  ; Inherit default values
    alias         Nagios Admin     ; Full name of user
    email         YOUR_EMAIL_ID    ; *SET EMAIL ADDRESS*
}
Perform the following steps to configure Apache httpd:
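From the Nagios source directory, install the Nagios configuration file for Apache:
$ sudo make install-webconf
/usr/bin/install -c -m 644 sample-config/httpd.conf /etc/httpd/conf.d/nagios.conf
*** Nagios/Apache conf file installed ***
Create a password for the nagiosadmin user:
$ sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Restart Apache so that the changes take effect:
$ sudo service httpd restart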
Perform the following steps to install Nagios plugins:
Download and extract the plugins tarball:
$ wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
$ tar xzf nagios-plugins-1.4.16.tar.gz
Compile and install the plugins:
$ cd nagios-plugins-1.4.16
$ ./configure --with-nagios-user=nagios --with-nagios-group=nagios
$ make
$ sudo make install
Warning
If you get an error such as check_http.c:312:9: error: 'ssl_version' undeclared (first use in this function) while trying to execute ./configure or make, your system probably lacks the OpenSSL development library (libssl). To resolve this issue, install it:
On RHEL- or CentOS-like systems, run the following command:
$ sudo yum install openssl-devel -y
On Debian- or Ubuntu-like systems, run the following command:
$ sudo apt-get install libssl-dev
Then rerun ./configure, then make clean, and finally make.
Everything is set. Now, let's set Nagios up as a service, as follows:
$ sudo chkconfig --add nagios
$ sudo chkconfig nagios on
Check whether the default configuration is good to go and start the Nagios service:
# Check configuration file
$ sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[-- snip --]
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
[-- snip --]
Processing object config file '/usr/local/nagios/etc/objects/localhost.cfg'...
Read object config files okay...
Running pre-flight check on configuration data...
[-- snip --]
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
# Start Nagios as a service
$ sudo service nagios start
Now you are ready to see the Nagios web console. Open the http://NAGIOS_HOST_ADDRESS/nagios URL in your browser and log in as nagiosadmin with the password you set earlier. You should see the Nagios home page with a couple of default checks on the Nagios host.
Nagios's power comes from the plethora of plugin libraries available for it. The base package provides enough default plugins to perform decent resource monitoring. For advanced or non-standard monitoring, you will have to either download a plugin from somewhere, such as the Nagios plugin directory or GitHub, or write a plugin of your own. Writing a custom plugin is very simple. There are only two requirements: the plugin should be executable from the command line, and it should exit with one of the following values:
0: This implies the OK state
1: This implies the WARNING state
2: This implies the CRITICAL state
3: This implies the UNKNOWN state
This means you are free to choose your programming language and tooling. As long as you follow these two specifications, your plugin can be used in Nagios.
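To make the exit-code contract concrete, here is a minimal sketch of the decision logic of a custom plugin, written as POSIX shell functions. The names check_value and is_uint, and the VALUE WARN CRIT argument order, are hypothetical and chosen only for illustration; a real plugin would measure the value itself rather than take it as an argument.

```shell
# check_value VALUE WARN CRIT  (hypothetical example plugin logic)
# Prints a one-line status and returns a Nagios plugin exit code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

# is_uint ARG: succeeds only if ARG is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

check_value() {
    # Missing or non-numeric input means the state cannot be judged.
    if ! is_uint "$1" || ! is_uint "$2" || ! is_uint "$3"; then
        echo "UNKNOWN - usage: check_value VALUE WARN CRIT"
        return 3
    fi
    # Compare against the critical threshold first, then the warning one.
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - value=$1 (critical at $3)"
        return 2
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - value=$1 (warning at $2)"
        return 1
    fi
    echo "OK - value=$1"
    return 0
}
```

For example, check_value 15 10 20 prints "WARNING - value=15 (warning at 10)" and returns 1. Saved in a script whose last line is check_value "$@"; exit $? it becomes a standalone plugin that Nagios can call.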
For the Nagios plugin directory, visit http://exchange.nagios.org/directory/Plugins.
For Nagios plugin projects on GitHub, visit https://github.com/search?q=nagios+plugin&type=Repositories&ref=searchresults.
There are a few Cassandra-specific plugins in the Nagios plugins directory. There is a promising project on GitHub, namely Nagios Cassandra Monitor (https://github.com/dmcnelis/NagiosCassandraMonitor). It seems a little immature but is worth evaluating. In this section, we will use a JMX-based plugin that is not Cassandra-specific. We will use this plugin to connect to Cassandra nodes and query heap usage. This tells us two things: whether or not Nagios can connect to Cassandra (which can be treated as an indication of whether or not the Cassandra process is up) and what the heap usage is.
The following are the steps to get the JMX plugin installed (all these operations take place on the Nagios machine and not on Cassandra nodes):
Extract the downloaded check_jmx.tgz archive and copy the plugin script and its JAR file to the Nagios libexec directory:
$ tar xvzf check_jmx.tgz
$ cd check_jmx/nagios/plugin/
$ sudo cp check_jmx jmxquery.jar /usr/local/nagios/libexec/
Change the ownership of the plugin files to the nagios user:
$ cd /usr/local/nagios/libexec/
$ sudo chown nagios:nagios check_jmx jmxquery.jar
Test the plugin, replacing 10.99.9.67 with the address of your Cassandra node:
$ ./check_jmx -U service:jmx:rmi:///jndi/rmi://10.99.9.67:7199/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 4248302272 -c 5498760192
JMX OK HeapMemoryUsage.used=1217368912{committed=1932525568;init=1953497088;max=1933574144;used=1217368912}
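With -K used, check_jmx evaluates the -w and -c thresholds against HeapMemoryUsage.used, which cannot exceed the reported max. In the sample output, max is 1933574144 bytes, so thresholds of 4248302272 and 5498760192 would never fire on that node. A minimal sketch for deriving thresholds as a percentage of max follows; the heap_thresholds helper and the 85 and 95 percent levels are assumptions for illustration, not part of check_jmx.

```shell
# heap_thresholds MAX_BYTES WARN_PCT CRIT_PCT  (hypothetical helper)
# Prints "-w N -c M" where N and M are integer percentages of MAX_BYTES,
# ready to be appended to a check_jmx command line.
heap_thresholds() {
    max=$1 warn_pct=$2 crit_pct=$3
    # Shell arithmetic is integer-only, so the results are rounded down.
    printf '%s\n' "-w $(( max * warn_pct / 100 )) -c $(( max * crit_pct / 100 ))"
}
```

For the node above, heap_thresholds 1933574144 85 95 prints -w 1643538022 -c 1836895436, which can replace the fixed numbers at the end of the check_jmx call.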
NRPE (Nagios Remote Plugin Executor) is an add-on that lets Nagios execute plugins on remote hosts. You may think of it as analogous to OpsCenter and its agents. With NRPE, Nagios can monitor remote host resources (such as memory, CPU, disk, and network) and can execute any plugin on a remote machine. The following figure shows Nagios with the NRPE plugin in action:
NRPE installation has to be done on the Nagios machine as well as on all the other machines where we want to execute a Nagios plugin locally (for example, to monitor CPU usage).
First, you need to create a nagios user and a nagios group and set a password for the user, as discussed in the Preparation section. After that, you need to install the Nagios plugins as described in the Installing Nagios plugins section. Now, you can install NRPE. Perform the following steps:
Install xinetd if it does not already exist:
$ sudo yum install xinetd
Download, build, and install the NRPE daemon and plugin, and configure xinetd:
# Download and untar NRPE
$ wget http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
$ tar xvzf nrpe-2.14.tar.gz
# make and install daemon and plugin, configure xinetd
$ cd nrpe-2.14
$ ./configure
$ make all
$ sudo make install-plugin
$ sudo make install-daemon
$ sudo make install-daemon-config
$ sudo make install-xinetd
Edit /etc/xinetd.d/nrpe to add the Nagios host address to it, and register the NRPE port in /etc/services. In the following snippet, replace NAGIOS_HOST_ADDRESS with the actual Nagios host address:
# edit /etc/xinetd.d/nrpe
only_from = 127.0.0.1 NAGIOS_HOST_ADDRESS
# edit /etc/services and append this
nrpe 5666/tcp # NRPE
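Appending to /etc/services by hand is easy to repeat by accident when you set up many nodes. The following sketch makes that append idempotent. The add_nrpe_service function name is hypothetical; its file argument defaults to /etc/services, which requires root to modify.

```shell
# add_nrpe_service [FILE]  (hypothetical helper)
# Appends the NRPE port entry to FILE (default /etc/services) only if an
# "nrpe 5666/tcp" entry is not already present, so reruns are harmless.
add_nrpe_service() {
    file=${1:-/etc/services}
    if grep -Eq '^nrpe[[:space:]]+5666/tcp' "$file"; then
        return 0
    fi
    printf 'nrpe            5666/tcp        # NRPE\n' >> "$file"
}
```

Source it in a root shell on each NRPE host and call add_nrpe_service without arguments; running it a second time leaves the file unchanged.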
Restart xinetd and check that NRPE is functional:
# Restart xinetd
$ sudo service xinetd restart
Stopping xinetd: [FAILED]
Starting xinetd: [ OK ]
# Check if it's listening
$ netstat -at | grep nrpe
tcp 0 0 *:nrpe *:* LISTEN
# Check NRPE plugin
$ /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.14
# Try to invoke a plugin via NRPE
$ /usr/local/nagios/libexec/check_nrpe -H localhost -c check_load
OK - load average: 0.01, 0.04, 0.06|load1=0.010;15.000;30.000;0; load5=0.040;10.000;25.000;0; load15=0.060;5.000;20.000;0;
Now, we have the machine ready to be monitored via NRPE.
Installing the NRPE plugin on the Nagios machine is a subset of the task that we did on the remote host machine. All you need to do is install the NRPE plugin and nothing else. Perform the following steps to install it:
$ wget http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
$ tar xvzf nrpe-2.14.tar.gz
$ cd nrpe-2.14
$ ./configure
$ make all
$ sudo make install-plugin
# Test if the plugin is working; replace 10.99.9.67
# with the address of a machine running NRPE and xinetd
$ /usr/local/nagios/libexec/check_nrpe -H 10.99.9.67
NRPE v2.14
In this section, we will talk about how to set up CPU, disk, and Cassandra monitoring. The details, however, are enough to enable you to set up any Nagios plugin and configure monitoring.
On the remote host, the commands that NRPE is allowed to execute are defined in /usr/local/nagios/etc/nrpe.cfg. If you do not find the plugin that you want to execute, or you want to change the parameters passed to the plugin, this is the place to do it. Edit the file as follows:
# edit /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
[-- snip --]
#custom commands *add your commands here*
# EC2 ephemeral storage root disk
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
The following screenshot shows the Nagios interface monitoring local and remote resources:
As you can see, we have a CPU check (check_load) and a disk check already provided by the default configuration. However, since I wanted to monitor the /dev/sda1 device for space availability, I added a new check, check_sda1, for this.
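Because the Nagios server refers to these commands only by name (through check_nrpe -c), a typo in nrpe.cfg shows up only as a failing check. A minimal sketch for listing the names a host's nrpe.cfg actually defines, assuming the command[name]=... syntax shown above; list_nrpe_commands is a hypothetical helper name.

```shell
# list_nrpe_commands FILE  (hypothetical helper)
# Prints, one per line, the names defined as command[NAME]=... in an
# nrpe.cfg file. Commented-out lines (starting with #) are skipped
# because the pattern requires "command[" at the start of the line.
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

For example, list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg should include check_sda1 once the custom line above is added.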
On the Nagios machine, a new /usr/local/nagios/etc/objects/cassandrahosts.cfg file is created. This file houses all the information related to monitoring. The following code snippet shows what it looks like (note the comments):
# A machine to be monitored
# DEFINE ALL CASSANDRA HOSTS HERE
define host{
    use        linux-server
    host_name  cassandra1
    alias      Cassandra Machine
    address    10.99.9.67
}
# create logical groupings, manageable, saves typing
# HOST GROUP TO COLLECTIVELY CALL ALL CASSANDRA HOSTS
define hostgroup{
    hostgroup_name  cassandra_grp
    alias           Cassandra Group
    members         cassandra1   ;this is CSV of hosts defined above
}
# A service defines what command to execute on what hosts
# MONITORING SERVICES
# A service that executes locally
#Check Cassandra on remote machines
define service{
    use                  generic-service
    hostgroup_name       cassandra_grp
    service_description  Cassandra
    check_command        check_cas   ;defined below
}
# A service that gets executed remotely via NRPE
# check disk space status
define service{
    use                  generic-service
    hostgroup_name       cassandra_grp
    service_description  check disk
    check_command        check_nrpe!check_sda1
}
# check CPU status
define service{
    use                  generic-service
    hostgroup_name       cassandra_grp
    service_description  check CPU
    check_command        check_nrpe!check_load
}
# A command is a template of a command line call, here:
# $USER1$ is plugin directory, nagios/libexec
# $HOSTADDRESS$ resolves to the address defined in
# the preceding host block; hosts are chosen from the service that calls this command
# define custom commands
# check JVM heap usage using JMX,
# warn if > 3.7G, mark critical if > 3.85G
define command {
    command_name  check_cas
    command_line  $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:7199/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 3700000000 -c 3850000000
}
The check_nrpe!check_sda1 and check_nrpe!check_load entries assume that a check_nrpe command is defined on the Nagios machine, typically in commands.cfg, with a command line such as $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$; the $ARG1$ macro receives the NRPE command name that follows the exclamation mark.
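With more than one Cassandra node, writing a define host block per node and keeping the members list in sync gets tedious. The sketch below prints host and hostgroup blocks for a list of addresses, following the layout of the file above. The gen_cassandra_hosts name and the cassandra1, cassandra2, ... naming scheme are assumptions, and the second address in the usage example is hypothetical.

```shell
# gen_cassandra_hosts ADDR...  (hypothetical helper)
# Prints one "define host" block per address, named cassandra1,
# cassandra2, ... in argument order, followed by a cassandra_grp
# hostgroup whose members list contains every generated host name.
gen_cassandra_hosts() {
    i=0
    members=""
    for addr in "$@"; do
        i=$((i + 1))
        printf 'define host{\n'
        printf '    use        linux-server\n'
        printf '    host_name  cassandra%d\n' "$i"
        printf '    alias      Cassandra Machine %d\n' "$i"
        printf '    address    %s\n' "$addr"
        printf '}\n'
        # Build the comma-separated members list.
        members="${members:+$members,}cassandra$i"
    done
    printf 'define hostgroup{\n'
    printf '    hostgroup_name  cassandra_grp\n'
    printf '    alias           Cassandra Group\n'
    printf '    members         %s\n' "$members"
    printf '}\n'
}
```

For example, gen_cassandra_hosts 10.99.9.67 10.99.9.68 prints two host blocks and a hostgroup with members cassandra1,cassandra2, which can replace the hand-written host and hostgroup blocks; the service and command blocks stay as they are.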
Next, register the new file with Nagios by editing /usr/local/nagios/etc/nagios.cfg. Append the following lines to the file:
#custom file *ADD YOUR FILES HERE*
cfg_file=/usr/local/nagios/etc/objects/cassandrahosts.cfg
Test the configuration and you are done:
$ sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 3.5.0
[-- snip --]
Reading configuration data...
Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/contacts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/templates.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/cassandrahosts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/localhost.cfg'...
Read object config files okay...
Running pre-flight check on configuration data...
[-- snip --]
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Restart Nagios by executing sudo service nagios restart.
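Because restarting with a broken configuration can take Nagios down, it is worth running the pre-flight check and the restart as one step. The sketch below restarts only when the check reports zero errors. The function names and the NAGIOS_BIN and NAGIOS_CFG variables are hypothetical, and matching on the Total Errors line assumes the output format shown above.

```shell
# Paths follow this section's /usr/local/nagios layout; override if needed.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# preflight_ok TEXT: succeeds if the "nagios -v" output in TEXT has a
# "Total Errors:" line whose count is exactly 0 (hypothetical helper).
preflight_ok() {
    printf '%s\n' "$1" | grep -Eq '^[[:space:]]*Total Errors:[[:space:]]*0[[:space:]]*$'
}

# safe_nagios_restart: run the pre-flight check and restart only if it is
# clean; otherwise print the error and warning lines and return 1.
safe_nagios_restart() {
    out=$(sudo "$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1)
    if preflight_ok "$out"; then
        sudo service nagios restart
    else
        printf '%s\n' "$out" | grep -E 'Error|Warning' >&2
        return 1
    fi
}
```

Calling safe_nagios_restart after each configuration change replaces the separate check and restart commands.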
Nagios has built-in support for sending mail whenever an interesting event (such as a warning, an error, or a service coming back to the OK state) occurs. By default, it uses the mail command, so if your mail is configured correctly, you should receive a mail when you execute the following command:
# substitute YOUR_EMAIL_ADDRESS with your e-mail address
$ /usr/bin/printf "%b" "Hi Nishant, this is Nagios." | /bin/mail -s "Nagios test mail" YOUR_EMAIL_ADDRESS
If this does not reach your mailbox or spam folder, you should check your mail configuration. If you do not have the mail utility installed already, execute the following command:
# mail utility on RHEL-like OS
$ sudo yum install mailx
# On Ubuntu or Debian derivatives
$ sudo apt-get install mailutils
If you are not happy with the mailing option, or want to send mail via a specific mail provider such as Gmail, you should look in the Nagios plugin directory or on GitHub for appropriate alternatives.
Nagios provides a pretty intuitive GUI: a web-based console that immediately highlights anything that is wrong with any service or host. Apart from displaying the immediate state, Nagios also stores the history of monitored events. Many reporting capabilities provide a complete overview of the infrastructure status. One can easily generate a histogram that shows the performance of a service, as shown in the following screenshot:
There are many reporting options, including options to disable alerts during scheduled infrastructure downtime. It may be worth playing around with the Nagios GUI to learn about the various options.