Nagios (http://www.nagios.org) is an open source monitoring and notification utility. It can monitor resources such as CPU, memory, disk usage, network status, host reachability, HTTP status, and web page rendering, and it can run many other checks through Nagios-compatible sensors. A large catalog of Nagios plugins covers almost all popular services and software. The best thing about Nagios is its plugin architecture: you can write a simple plugin for custom resource monitoring, so anything whose state can be measured can be monitored via Nagios. This section briefly discusses Nagios setup and how to use it to monitor system resources and Cassandra.
Nagios ships in different editions, such as DIY, student, professional, and business, which differ in features and support; you may visit the Nagios website and choose one based on your needs. Given the number of free plugins, the free version of Nagios is generally a good option. In this section, we will see how to install and configure the free version of Nagios (from source) on a CentOS machine. These instructions should work on any RHEL variant. For Ubuntu- or Debian-like environments, you may need to look for an apt-get equivalent of the yum commands in the script. Depending on your Linux distribution, Nagios may also be installable from additional package repositories. The packaged version may not be the latest release, but it avoids a lot of installation hassle. We use the tarball installation in this book to keep things generic.
The Nagios server (PHP-based) has some dependencies to be fulfilled before you can start installing it.
$ php -v
PHP 5.3.26 (cli) (built: Jun 24 2013 18:08:10)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2013 Zend Technologies
If PHP does not exist, install it.
$ sudo yum install php
$ httpd -v
Server version: Apache/2.2.24 (Unix)
Server built: May 20 2013 21:12:45
If httpd does not exist, install it.
$ sudo yum install httpd
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-amazon-linux/4.6.3/lto-wrapper
Target: x86_64-amazon-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,,fortran,ada,go,lto --enable-plugin --disable-libgcj --with-tune=generic --with-arch_32=i686 --build=x86_64-amazon-linux
Thread model: posix
gcc version 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC)
Install it, if it does not exist:
$ sudo yum install gcc glibc glibc-common
Also install the GD graphics library and its development headers:
$ sudo yum install gd gd-devel
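The individual version checks above can be folded into one pass. The following is a minimal sketch, not part of the original procedure: the helper name missing_tools is hypothetical, and it only tests whether each command is on the PATH of the current user (httpd often lives in /usr/sbin, which may not be on a regular user's PATH).

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command in the argument
# list that cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Prerequisites named in this section: PHP, Apache httpd, and GCC.
missing=$(missing_tools php httpd gcc)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found"
fi
```

Anything reported as missing can then be installed with the yum commands shown above.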
Before we jump into installing Nagios, we need to set up a user account and a group for Nagios.
$ sudo -i
$ useradd -m nagios
$ passwd nagios
$ groupadd nagcmd
$ usermod -a -G nagcmd nagios
$ usermod -a -G nagcmd apache
Nagios installation can be divided into four parts: installing Nagios, configuring Apache httpd, installing plugins, and setting up Nagios as a service.
The following are the steps to install Nagios from tarball:
$ wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
$ tar xzf nagios-3.5.0.tar.gz
$ cd nagios
$ ./configure --with-command-group=nagcmd
$ make all
$ sudo make install install-base install-cgis install-html install-exfoliation install-config install-init install-commandmode fullinstall
$ sudo vi /usr/local/nagios/etc/objects/contacts.cfg
define contact{
    contact_name    nagiosadmin       ; Short name of user
    use             generic-contact   ; Inherit default values
    alias           Nagios Admin      ; Full name of user
    email           YOUR_EMAIL_ID     ; *SET EMAIL ADDRESS*
}
$ sudo make install-webconf
/usr/bin/install -c -m 644 sample-config/httpd.conf /etc/httpd/conf.d/nagios.conf
*** Nagios/Apache conf file installed ***
Set a password for the nagiosadmin web console user:
$ sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
$ sudo service httpd restart
$ wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
$ tar xzf nagios-plugins-1.4.16.tar.gz
$ cd nagios-plugins-1.4.16
$ ./configure --with-nagios-user=nagios --with-nagios-group=nagios
$ make
$ sudo make install
Warning! If you get an error such as check_http.c:312:9: error: 'ssl_version' undeclared (first use in this function) while trying to execute ./configure or make, your system probably lacks the libssl development library. To resolve this issue, execute the following commands:
On RHEL- or CentOS-like systems:
sudo yum install openssl-devel -y
On Debian- or Ubuntu-like systems:
sudo apt-get install libssl-dev
Then rerun ./configure, then make clean, then make.
Everything is set; let's set Nagios up as a service:
$ sudo chkconfig --add nagios
$ sudo chkconfig nagios on
Check if the default configuration is good to go and start the Nagios service:
# Check configuration file
$ sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[-- snip --]
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
[-- snip --]
Processing object config file '/usr/local/nagios/etc/objects/localhost.cfg'...
Read object config files okay...
Running pre-flight check on configuration data...
[-- snip --]
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
# Start Nagios as a service
$ sudo service nagios start
Now you are ready to see the Nagios web console. Open the http://NAGIOS_HOST_ADDRESS/nagios URL in your browser. You should be able to see the Nagios home page with a couple of default checks on the Nagios host.
Nagios' power comes from the many plugin libraries available for it. The base package provides enough default plugins to perform decent resource monitoring. For advanced or non-standard monitoring, you will have to either download a plugin from somewhere, such as the Nagios plugins directory or GitHub, or write a plugin of your own. Writing a custom plugin is very simple. There are only two requirements: the plugin should be executable from the command line, and the plugin should exit with one of the following values:
0 implying OK state
1 implying warning state
2 implying critical state
3 implying unknown state
This means you are free to choose your programming language and tooling. As long as you follow these two specifications, your plugin can be used in Nagios.
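To make the two requirements concrete, here is a minimal sketch of a plugin written as a shell function. The name check_value, its three arguments, and the wording of the status lines are assumptions made for illustration; only the exit values 0 to 3 come from the convention described above.

```shell
#!/bin/sh
# Hypothetical example plugin: compare a non-negative integer against
# warning and critical thresholds and report the result using the
# Nagios exit-value convention (0 OK, 1 warning, 2 critical, 3 unknown).
# Usage: check_value VALUE WARN CRIT
check_value() {
    if [ "$#" -ne 3 ]; then
        echo "UNKNOWN - usage: check_value VALUE WARN CRIT"
        return 3
    fi
    # Reject empty or non-numeric arguments as UNKNOWN.
    for arg in "$@"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "UNKNOWN - '$arg' is not a non-negative integer"
                return 3
                ;;
        esac
    done
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - value is $1"
        return 2
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - value is $1"
        return 1
    fi
    echo "OK - value is $1"
    return 0
}

# Example run: 42 is below both thresholds.
check_value 42 50 80   # prints "OK - value is 42"
```

To use something like this as a real plugin, the function would be saved in an executable script that ends by calling check_value "$@" and exiting with its return status, so that Nagios sees the exit value.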
Nagios plugins directory:
http://exchange.nagios.org/directory/Plugins
Nagios plugins projects on GitHub:
https://github.com/search?q=nagios+plugin&type=Repositories&ref=searchresults
There are a few Cassandra-specific plugins in the Nagios plugins directory. There is a promising project on GitHub, namely, Nagios Cassandra Monitor (https://github.com/dmcnelis/NagiosCassandraMonitor); it seems a little immature, but worth evaluating. In this subsection, we will use a JMX-based plugin that is not Cassandra-specific. We will use this plugin to connect to Cassandra nodes and query heap usage. This will tell us about two things: whether or not it can connect to Cassandra (which can be treated as an indication of whether or not the Cassandra process is up) and what the heap usage is.
The following are the steps to get the JMX plugin installed. All these operations take place on the Nagios machine and not on Cassandra nodes.
Extract the check_jmx archive and copy the plugin files to the Nagios libexec directory:
$ tar xvzf check_jmx.tgz
$ cd check_jmx/nagios/plugin/
$ sudo cp check_jmx jmxquery.jar /usr/local/nagios/libexec/
$ cd /usr/local/nagios/libexec/
$ sudo chown nagios:nagios check_jmx jmxquery.jar
Test the plugin. Replace 10.99.9.67 with the address of your Cassandra node:
$ ./check_jmx -U service:jmx:rmi:///jndi/rmi://10.99.9.67:7199/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 4248302272 -c 5498760192
JMX OK HeapMemoryUsage.used=1217368912{committed=1932525568;init=1953497088;max=1933574144;used=1217368912}
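In the sample run above, the -w and -c values appear to be larger than the max heap that the plugin reports (1933574144 bytes), so a check with those numbers would be unlikely ever to leave the OK state. One way to pick thresholds is to derive them from the reported maximum. The sketch below assumes a hypothetical helper named heap_thresholds and example percentages of 80 and 90; it relies on the shell supporting 64-bit integer arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: print check_jmx threshold flags computed as
# percentages of the maximum heap size (all values in bytes).
# Usage: heap_thresholds MAX_BYTES WARN_PERCENT CRIT_PERCENT
heap_thresholds() {
    max=$1
    warn_pct=$2
    crit_pct=$3
    # Integer arithmetic; fractions of a byte are truncated.
    echo "-w $((max * warn_pct / 100)) -c $((max * crit_pct / 100))"
}

# Example: 80% and 90% of the max heap reported in the sample output.
heap_thresholds 1933574144 80 90   # prints "-w 1546859315 -c 1740216729"
```

The printed flags can be substituted for the -w and -c arguments of the check_jmx command line.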
NRPE (Nagios Remote Plugin Executor) is an add-on that executes Nagios plugins on remote hosts. One may think of it as analogous to OpsCenter and its agents (see the following figure). With NRPE, Nagios can monitor remote host resources, such as memory, CPU, disk, and network, and can execute any plugin on a remote machine.
NRPE installation has to be done on the Nagios machine as well as all the other machines where we want to execute a Nagios plugin locally, for example, to monitor the CPU usage.
First, you need to create a nagios user and a nagios group and set a password for the user, as discussed in the Preparation subsection in this chapter. After that, install the Nagios plugins as mentioned in the Installing Nagios plugins section in this chapter. Now you may proceed to the NRPE installation.
Install xinetd if it does not already exist:
$ sudo yum install xinetd
# Download and untar NRPE
$ wget http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
$ tar xvzf nrpe-2.14.tar.gz
# make and install daemon and plugin, configure xinetd
$ cd nrpe-2.14
$ ./configure
$ make all
$ sudo make install-plugin
$ sudo make install-daemon
$ sudo make install-daemon-config
$ sudo make install-xinetd
Edit /etc/xinetd.d/nrpe to add the Nagios host address to it. In the following code snippet, you need to replace NAGIOS_HOST_ADDRESS with the actual Nagios host address:
# edit /etc/xinetd.d/nrpe
only_from = 127.0.0.1 NAGIOS_HOST_ADDRESS
# edit /etc/services append this
nrpe 5666/tcp # NRPE
# Restart xinetd
$ sudo service xinetd restart
Stopping xinetd: [FAILED]
Starting xinetd: [ OK ]
# Check if it's listening
$ netstat -at | grep nrpe
tcp 0 0 *:nrpe *:* LISTEN
# Check NRPE plugin
$ /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.14
# Try to invoke a plugin via NRPE
$ /usr/local/nagios/libexec/check_nrpe -H localhost -c check_load
OK - load average: 0.01, 0.04, 0.06|load1=0.010;15.000;30.000;0; load5=0.040;10.000;25.000;0; load15=0.060;5.000;20.000;0;
Installing the NRPE plugin on the Nagios machine is a subset of the task that we did for the remote host machine. All you need to do is install the NRPE plugin and nothing else. The following are the steps:
$ wget http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
$ tar xvzf nrpe-2.14.tar.gz
$ cd nrpe-2.14
$ ./configure
$ make all
$ sudo make install-plugin
# Test if plugin is working, you should replace 10.99.9.67
# with one of the machine's address with NRPE + xinetd
$ /usr/local/nagios/libexec/check_nrpe -H 10.99.9.67
NRPE v2.14
In this section, we will talk about how to set up CPU, disk, and Cassandra monitoring. However, the detail is enough to enable you to set up any Nagios plugin and configure monitoring.
Monitoring CPU and disk space: These are the tests that need to be executed on remote machines. So, we may need to adjust the NRPE configuration to allow those plugins to be executed remotely. This configuration is stored in /usr/local/nagios/etc/nrpe.cfg. If you do not find the plugin that you want to execute, or you want to change the parameters passed to a plugin, this is the place to do it, as follows:
# edit /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
[-- snip --]
#custom commands *add your commands here*
# EC2 ephemeral storage root disk
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
Have a look at the following screenshot:
As you can see, we have a CPU check (check_load) and a disk check already provided by the default configuration. However, if I wanted to monitor the /dev/sda1 device for space availability, I would add a new check, check_sda1, for this.
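If several devices need the same kind of disk check, the nrpe.cfg entries can be generated rather than typed by hand. The following is a minimal sketch; the helper name nrpe_disk_command is hypothetical, and it simply reproduces the check_disk line format used in the configuration shown earlier in this section.

```shell
#!/bin/sh
# Hypothetical helper: print an nrpe.cfg command definition that checks
# free space on one device with the given warning/critical percentages.
# Usage: nrpe_disk_command DEVICE WARN_PERCENT CRIT_PERCENT
nrpe_disk_command() {
    printf 'command[check_%s]=/usr/local/nagios/libexec/check_disk -w %s%% -c %s%% -p /dev/%s\n' \
        "$1" "$2" "$3" "$1"
}

# Reproduces the check_sda1 entry discussed in this section.
nrpe_disk_command sda1 20 10
```

The output can be appended to /usr/local/nagios/etc/nrpe.cfg on each remote machine.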
Setting up a JMX monitor: For Cassandra, we want to check the JVM heap usage via JMX. Since this executes on the local machine (Nagios) to connect to the JMX service on the remote machine, we do not need to use NRPE for this. So, we have nothing to do here.
Updating configuration: The best part of Nagios is its configuration. With a little grouping, you can write a configuration that scales to hundreds of machines. All configuration in Nagios is text-based, with objects declared in brace-delimited define blocks. You can organize the files in whichever way you want and let Nagios know where the files are. For this particular case, the /usr/local/nagios/etc/objects/cassandrahosts.cfg file is created. This file houses all the information related to monitoring. The following code is what it looks like (see the comments):
# A machine to be monitored
# DEFINE ALL CASSANDRA HOSTS HERE
define host{
    use         linux-server
    host_name   cassandra1
    alias       Cassandra Machine
    address     10.99.9.67
}

# create logical groupings, manageable, saves typing
# HOST GROUP TO COLLECTIVELY CALL ALL CASSANDRA HOSTS
define hostgroup{
    hostgroup_name  cassandra_grp
    alias           Cassandra Group
    members         cassandra1      ;this is CSV of
                                    ;hosts defined above
}

# A service defines what command to execute on what hosts
# MONITORING SERVICES
# A service that executes locally
# Check Cassandra on remote machines
define service{
    use                 generic-service
    hostgroup_name      cassandra_grp
    service_description Cassandra
    check_command       check_cas   ;defined below
}

# A service that gets executed remotely via NRPE
# check disk space status
define service{
    use                 generic-service
    hostgroup_name      cassandra_grp
    service_description check disk
    check_command       check_nrpe!check_sda1
}

# check CPU status
define service{
    use                 generic-service
    hostgroup_name      cassandra_grp
    service_description check CPU
    check_command       check_nrpe!check_load
}

# A command is a template of a command line call, here:
# $USER1$ is plugin directory, nagios/libexec
# $HOSTADDRESS$ resolves to the address defined in
# host block above, hosts are chosen from the service that
# calls this command
# define custom commands
# check JVM heap usage using JMX,
# warn if > 3.7G, mark critical if > 3.85G
define command {
    command_name check_cas
    command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:7199/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 3700000000 -c 3850000000
}
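Since the host and hostgroup blocks follow a fixed pattern, a cluster with many nodes can have them generated from a list. The sketch below is one possible approach, not part of the original setup: the function name gen_cassandra_hosts is hypothetical, and the second node (cassandra2 at 10.99.9.68) is an invented example address.

```shell
#!/bin/sh
# Hypothetical generator: read "NAME ADDRESS" pairs from standard input,
# print one host block per pair, then a hostgroup block whose members
# directive is the comma-separated list of all names read.
gen_cassandra_hosts() {
    members=""
    while read -r name address; do
        # Skip blank lines.
        [ -n "$name" ] || continue
        printf 'define host{\n'
        printf '    use         linux-server\n'
        printf '    host_name   %s\n' "$name"
        printf '    alias       Cassandra Machine\n'
        printf '    address     %s\n' "$address"
        printf '}\n'
        # Append the name, inserting a comma only after the first entry.
        members="${members:+$members,}$name"
    done
    printf 'define hostgroup{\n'
    printf '    hostgroup_name  cassandra_grp\n'
    printf '    alias           Cassandra Group\n'
    printf '    members         %s\n' "$members"
    printf '}\n'
}

# Example with one real and one invented node address.
printf 'cassandra1 10.99.9.67\ncassandra2 10.99.9.68\n' | gen_cassandra_hosts
```

The service and command definitions do not need to change when nodes are added, because they refer to the host group rather than to individual hosts.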
Letting Nagios know about the new configuration: We have created a new configuration file that Nagios does not know about. We need to register it in /usr/local/nagios/etc/nagios.cfg; append the following line to this file:
#custom file *ADD YOUR FILES HERE*
cfg_file=/usr/local/nagios/etc/objects/cassandrahosts.cfg
Test the configuration and you are done.
$ sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 3.5.0
[-- snip --]
Reading configuration data...
Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/contacts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/templates.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/cassandrahosts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/localhost.cfg'...
Read object config files okay...
Running pre-flight check on configuration data...
[-- snip --]
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
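A running Nagios instance reads its configuration when it starts, so the new file takes effect after the service is restarted. It can be convenient to restart only when the pre-flight check reports no errors. The sketch below assumes a hypothetical helper, config_errors, that pulls the number out of the "Total Errors" line in output like the listing above; the restart command is shown only as a comment because it needs a live installation.

```shell
#!/bin/sh
# Hypothetical helper: read "nagios -v" output on standard input and
# print the number found on the "Total Errors:" line.
config_errors() {
    sed -n 's/^Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Demonstration on a fragment of the pre-flight output shown above.
printf 'Total Warnings: 0\nTotal Errors: 0\n' | config_errors   # prints 0

# Possible use on a live system (assumes the paths used in this section):
#   errors=$(sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | config_errors)
#   [ "$errors" = "0" ] && sudo service nagios restart
```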
Nagios has built-in support to send mails whenever an interesting event, such as a warning, an error, or a service coming back to the OK state, occurs. By default, it uses the mail command, so if your mail is configured correctly, you should see mails when you execute the following command:
# substitute YOUR_EMAIL_ADDRESS with your email id.
/usr/bin/printf "%b" "Hi Nishant,\nthis is Nagios." | /bin/mail -s "Nagios test mail" YOUR_EMAIL_ADDRESS
If this does not reach your mail box or the spam folder, you should check your configuration. If you do not have the mail utility installed already, execute the following command:
# mail utility on RHEL like OS
$ sudo yum install mailx
# On Ubuntu or Debian derivatives
$ sudo apt-get install mailutils
If you are not happy with the mailing option or want to change the mailer to send mail via a specific mail provider like Gmail, you should dig into the plugins directory or GitHub to find appropriate alternatives.
Nagios provides a pretty intuitive GUI: a web-based console that immediately highlights anything that is wrong with any service or host. Apart from displaying the current state, Nagios also stores the history of monitored events. There are many reporting capabilities that provide a complete overview of infrastructure status. One can easily generate a histogram that shows the performance of a service. Refer to the following diagram:
There are many reporting options, as well as options to disable alerts during scheduled infrastructure downtime. It may be worth playing around with the Nagios GUI to learn about the various options.