Nagios – monitoring and notification

Nagios (http://www.nagios.org) is an open source monitoring and notification utility. It lets you monitor a wide range of resources, such as CPU, memory, disk usage, network status, host reachability, HTTP status, and web page rendering, as well as anything else for which a Nagios-compatible sensor exists. A large catalog of Nagios plugins covers almost all popular services and software. The best thing about Nagios is its plugin architecture: you can write a simple plugin for custom resource monitoring. Effectively, if you can measure a state, you can monitor its source in Nagios. This section briefly discusses Nagios setup and how to configure it to monitor system resources and Cassandra.

Installing Nagios

Nagios ships in different packages, such as DIY, student, professional, and business, which differ in features and level of support. Visit the Nagios website and choose one based on your needs. Given the large number of free plugins, the free version of Nagios is generally a good option. In this section, we will see how to install and configure the free version of Nagios (from source) on a CentOS machine. These instructions should work on any RHEL variant. For Ubuntu- or Debian-like environments, you will need to substitute the apt-get equivalents of the yum commands used here. Depending on your Linux distribution, Nagios may also be installable from the distribution's additional package repositories. A packaged version may not be the latest Nagios release, but it removes much of the installation hassle. We use the tarball installation in this book to keep things generic.

Prerequisites

Nagios has some dependencies that must be fulfilled before you can start installing it (its web interface is served by Apache httpd and uses PHP):

  • PHP: You will need to have a PHP processor to run Nagios. Check its availability using the following command:
    $ php -v
    PHP 5.3.26 (cli) (built: Jun 24 2013 18:08:10)
    Copyright (c) 1997-2013 The PHP Group
    Zend Engine v2.3.0, Copyright (c) 1998-2013 Zend Technologies
    

    If PHP does not exist, install it as follows:

    $ sudo yum install php
    
  • httpd: The Apache httpd web server serves the Nagios web interface. To check whether you have httpd or not, execute the following command:
    $ httpd -v
    Server version: Apache/2.2.24 (Unix)
    Server built:   May 20 2013 21:12:45
    

    If httpd does not exist, install it as follows:

    $ sudo yum install httpd
    
  • GCC compiler: Check for the installed version of GCC compiler using the following command:
    $ gcc -v
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-amazon-linux/4.6.3/lto-wrapper
    Target: x86_64-amazon-linux
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,,fortran,ada,go,lto --enable-plugin --disable-libgcj --with-tune=generic --with-arch_32=i686 --build=x86_64-amazon-linux
    Thread model: posix
    gcc version 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC)
    

    If it does not exist, install it as follows:

    $ sudo yum install gcc glibc glibc-common
    
  • GD graphics library: GD is a library for generating images dynamically in various formats. GD does not provide a version command of its own, but on RPM-based systems you can check for the packages with rpm -q gd gd-devel. To install the GD library, execute the following command:
    $ sudo yum install gd gd-devel
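
As a quick sanity check before moving on, the availability checks above can be rolled into one small script. This is only a sketch: missing_commands is a hypothetical helper name, and it only tests whether a binary is on the current user's PATH (on some systems httpd lives in /usr/sbin, which may not be on an unprivileged user's PATH). GD has no binary to look for, so it is not covered here.

```shell
# Print each command from the argument list that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# The three prerequisites above that ship a command-line binary.
missing=$(missing_commands php httpd gcc)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are printed on one line.
    echo "Missing prerequisites:" $missing
else
    echo "php, httpd, and gcc are all on PATH"
fi
```

Anything reported as missing can then be installed with the yum commands listed above.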
    

Preparation

Before we jump into installing Nagios, we need to set up a user account and a group for Nagios as follows:

$ sudo -i
$ useradd -m nagios
$ passwd nagios
$ groupadd nagcmd
$ usermod -a -G nagcmd nagios
$ usermod -a -G nagcmd apache

Installation

Nagios installation can be divided into four parts: installing Nagios, configuring Apache httpd, installing plugins, and setting up Nagios as a service.

Installing Nagios

The following are the steps to install Nagios from the tarball:

  1. Download the tarball from the Nagios download page and untar it:
    $ wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
    $ tar xzf nagios-3.5.0.tar.gz
    
  2. Install Nagios from the source:
    $ cd nagios
    $ ./configure --with-command-group=nagcmd
    $ make all
    $ sudo make install \
       install-base \
       install-cgis \
       install-html \
       install-exfoliation \
       install-config \
       install-init \
       install-commandmode \
       fullinstall
    
  3. Nagios is installed now. Update the contact details before you move to the next step:
    $ sudo vi /usr/local/nagios/etc/objects/contacts.cfg
    define contact{
     contact_name nagiosadmin     ; Short name of user
     use          generic-contact ; Inherit default values
     alias        Nagios Admin    ; Full name of user
     email        YOUR_EMAIL_ID   ; *SET EMAIL ADDRESS*
    }
    

Configuring Apache httpd

Perform the following steps to configure Apache httpd:

  1. Set Apache httpd with the appropriate Nagios configuration:
    $ sudo make install-webconf
    /usr/bin/install -c -m 644 sample-config/httpd.conf /etc/httpd/conf.d/nagios.conf
    *** Nagios/Apache conf file installed ***
    
  2. Set the password for the Nagios web console for the nagiosadmin user:
    $ sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
    
  3. Restart Apache httpd:
    $ sudo service httpd restart
    

Installing Nagios plugins

Perform the following steps to install Nagios plugins:

  1. Download and untar Nagios plugins from the Nagios website's plugins page (http://www.nagios.org/download/plugins/) using the following commands:
    $ wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
    $ tar xzf nagios-plugins-1.4.16.tar.gz
    
  2. Install the plugins:
    $ cd nagios-plugins-1.4.16
    $ ./configure --with-nagios-user=nagios --with-nagios-group=nagios
    $ make
    $ sudo make install
    

    Warning

    If you get an error such as check_http.c:312:9: error: 'ssl_version' undeclared (first use in this function) while trying to execute ./configure or make, your system probably lacks the libssl library. To resolve this issue, execute the following commands:

    On RHEL- or CentOS-like systems, run the following command:

    sudo yum install openssl-devel -y
    

    On Debian- or Ubuntu-like systems, run the following command:

    sudo apt-get install libssl-dev
    
  3. Rerun ./configure, then make clean, and finally make.

Setting up Nagios as a service

Everything is set. Now, let's set Nagios as a service, as follows:

$ sudo chkconfig --add nagios
$ sudo chkconfig nagios on

Check whether the default configuration is good to go and start the Nagios service:

# Check configuration file
$ sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[-- snip --]
Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
[-- snip --]
Processing object config file '/usr/local/nagios/etc/objects/localhost.cfg'...
   Read object config files okay...
Running pre-flight check on configuration data...
[-- snip --]
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check

# Start Nagios as a service
$ sudo service nagios start

Now you are ready to see the Nagios web console. Open the URL http://NAGIOS_HOST_ADDRESS/nagios in your browser and log in as nagiosadmin with the password you set using htpasswd. You should see the Nagios home page with a couple of default checks on the Nagios host.

Nagios plugins

Nagios's power comes from the plethora of plugin libraries available for it. There are sufficient default plugins provided as a part of the base package to perform decent resource monitoring. For advanced or non-standard monitoring, you will have to either download the plugin from somewhere, such as the Nagios plugin directory or GitHub, or you will have to write a plugin of your own. Writing a custom plugin is very simple. There are only two requirements: the plugin should be executable via the command prompt, and the plugin should return with the following exit values:

  • 0: This implies the ok state
  • 1: This implies the warning state
  • 2: This implies the critical state
  • 3: This implies the unknown state

This means you are free to choose your programming language and tooling. As long as you follow these two specifications, your plugin can be used in Nagios.
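
To make the two requirements concrete, here is a minimal sketch of a custom plugin written as a POSIX shell script. The plugin name (check_entries), what it measures (the number of entries in a directory), and the helper nagios_state are all made up for illustration; the only parts taken from the text above are the exit codes. It prints one status line and exits with the matching code.

```shell
#!/bin/sh
# check_entries (hypothetical plugin): alert on the number of entries in a directory.
# Usage: check_entries DIR WARN CRIT

# Print a one-line status for VALUE against WARN and CRIT thresholds and
# return the Nagios exit code: 0 ok, 1 warning, 2 critical, 3 unknown.
nagios_state() {
    value=$1 warn=$2 crit=$3
    for n in "$value" "$warn" "$crit"; do
        case "$n" in
            ''|*[!0-9]*) echo "UNKNOWN - expected non-negative integers"; return 3 ;;
        esac
    done
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - entries=$value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - entries=$value"; return 1
    fi
    echo "OK - entries=$value"; return 0
}

# Act as a plugin only when called with DIR WARN CRIT.
if [ "$#" -eq 3 ]; then
    [ -d "$1" ] || { echo "UNKNOWN - $1 is not a directory"; exit 3; }
    # The arithmetic expansion strips any padding that wc adds.
    count=$(( $(ls -A "$1" | wc -l) ))
    nagios_state "$count" "$2" "$3"
    exit $?
fi
```

Saved as an executable file in /usr/local/nagios/libexec, it could be tried by hand with ./check_entries /tmp 100 500 followed by echo $? to inspect the exit code that Nagios would see.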

Note

For the Nagios plugin directory, visit http://exchange.nagios.org/directory/Plugins.

For Nagios plugin projects on GitHub, visit https://github.com/search?q=nagios+plugin&type=Repositories&ref=searchresults.

Nagios plugins for Cassandra

There are a few Cassandra-specific plugins in the Nagios plugins directory. There is also a promising project on GitHub, namely Nagios Cassandra Monitor (https://github.com/dmcnelis/NagiosCassandraMonitor). It seems a little immature but is worth evaluating. In this section, we will use a JMX-based plugin that is not Cassandra-specific. We will use this plugin to connect to Cassandra nodes and query heap usage. This tells us two things: whether Nagios can connect to Cassandra (which can be treated as an indication of whether the Cassandra process is up) and what the heap usage is.

The following are the steps to get the JMX plugin installed (all these operations take place on the Nagios machine and not on Cassandra nodes):

  1. Download the plugin from http://exchange.nagios.org/directory/Plugins/Java-Applications-and-Servers/check_jmx/details.
  2. Untar the downloaded plugin and copy it to the Nagios libexec directory:
    $ tar xvzf check_jmx.tgz
    $ cd check_jmx/nagios/plugin/
    $ sudo cp check_jmx jmxquery.jar /usr/local/nagios/libexec/
    
  3. Assign proper ownership:
    $ cd /usr/local/nagios/libexec/
    $ sudo chown nagios:nagios check_jmx jmxquery.jar
    
  4. Run a test, replacing 10.99.9.67 with the address of your Cassandra node:
    $ ./check_jmx -U service:jmx:rmi:///jndi/rmi://10.99.9.67:7199/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 4248302272 -c 5498760192
    
    JMX OK HeapMemoryUsage.used=1217368912{committed=1932525568;init=1953497088;max=1933574144;used=1217368912}
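
The raw byte counts are easier to judge as a share of the maximum heap. The following sketch parses the output line shown in step 4 and computes the used-to-max percentage. The helper names jmx_field and heap_percent are hypothetical, and the parsing assumes the {key=value;...} layout seen in that sample output; other check_jmx versions may format it differently. Note that in the sample, the -w and -c thresholds (4248302272 and 5498760192 bytes) are both above the reported max of 1933574144 bytes, so on that particular node they would presumably never trigger; thresholds should be chosen below the node's maximum heap.

```shell
# Extract one numeric field (for example used or max) from the {...} part
# of a check_jmx output line. $1 = field name, $2 = output line.
jmx_field() {
    printf '%s\n' "$2" | sed -n "s/.*[{;]$1=\([0-9][0-9]*\)[;}].*/\1/p"
}

# Print heap usage as an integer percentage of max; fail if parsing fails.
heap_percent() {
    used=$(jmx_field used "$1")
    max=$(jmx_field max "$1")
    [ -n "$used" ] && [ -n "$max" ] && [ "$max" -gt 0 ] || return 1
    echo $(( used * 100 / max ))
}

# Sample line copied from the test run above.
sample='JMX OK HeapMemoryUsage.used=1217368912{committed=1932525568;init=1953497088;max=1933574144;used=1217368912}'
echo "heap used: $(heap_percent "$sample")%"   # prints: heap used: 62%
```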
    
Executing remote plugins via the NRPE plugin

NRPE (Nagios Remote Plugin Executor) is an add-on that executes Nagios plugins on remote hosts. Its relationship to Nagios is comparable to that of the agents to OpsCenter. With NRPE, Nagios can monitor remote host resources (such as memory, CPU, disk, and network) and can execute any plugin on a remote machine. The following figure shows Nagios with the NRPE plugin in action:

[Figure: Nagios with the NRPE plugin in action]

NRPE installation has to be done on the Nagios machine as well as on all the other machines where we want to execute a Nagios plugin locally (for example, to monitor CPU usage).

Installing NRPE on host machines

First, you need to create a nagios user and a nagios group and set a password for the user, as discussed in the Preparation section. After that, you need to install the Nagios plugins as mentioned in the Installing Nagios plugins section. Now, you can install NRPE. Perform the following steps:

  1. Install xinetd if it does not already exist:
    $ sudo yum install xinetd
    
  2. Download the NRPE daemon and plugin from the NRPE Nagios page at http://exchange.nagios.org/directory/Addons/Monitoring-Agents/NRPE--2D-Nagios-Remote-Plugin-Executor/details and install them:
    # Download and untar NRPE
    $ wget http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
    $ tar xvzf nrpe-2.14.tar.gz
    # make and install daemon and plugin, configure xinetd
    $ cd nrpe-2.14
    $ ./configure
    $ make all
    $ sudo make install-plugin
    $ sudo make install-daemon
    $ sudo make install-daemon-config
    $ sudo make install-xinetd
    
  3. After this, you need to make sure each host machine accepts requests coming from Nagios. For this, you need to edit /etc/xinetd.d/nrpe to add the Nagios host address to it. In the following code snippet, you need to replace NAGIOS_HOST_ADDRESS with the actual Nagios host address:
    # edit /etc/xinetd.d/nrpe
    only_from = 127.0.0.1 NAGIOS_HOST_ADDRESS
    
    # edit /etc/services and append this
    nrpe    5666/tcp             # NRPE
  4. Restart xinetd and test whether it is functional:
    # Restart xinetd
    $ sudo service xinetd restart
    Stopping xinetd:    [FAILED]
    Starting xinetd:    [  OK  ]
    
    # Check if it's listening
    $ netstat -at | grep nrpe
    tcp    0    0 *:nrpe    *:*    LISTEN
    # Check NRPE plugin
    $ /usr/local/nagios/libexec/check_nrpe -H localhost
    NRPE v2.14
    
    # Try to invoke a plugin via NRPE
    $ /usr/local/nagios/libexec/check_nrpe -H localhost -c check_load
    
    OK - load average: 0.01, 0.04, 0.06|load1=0.010;15.000;30.000;0; load5=0.040;10.000;25.000;0; load15=0.060;5.000;20.000;0;
    

Now, we have the machine ready to be monitored via NRPE.
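
Step 3 edits /etc/services by hand. When the same setup is repeated on many hosts, an idempotent append avoids duplicate entries if the step is run twice. The following is a sketch under stated assumptions: ensure_nrpe_service is a hypothetical helper, it treats any line starting with nrpe followed by 5666/tcp as an existing entry, and the demonstration runs against a temporary file rather than the real /etc/services (which needs root to modify).

```shell
# Append the NRPE entry to a services file unless one is already there.
# $1 = path to the services file (defaults to /etc/services).
ensure_nrpe_service() {
    services_file=${1:-/etc/services}
    if grep -Eq '^nrpe[[:space:]]+5666/tcp' "$services_file" 2>/dev/null; then
        echo "nrpe entry already present in $services_file"
    else
        printf 'nrpe\t\t5666/tcp\t\t\t# NRPE\n' >> "$services_file"
        echo "nrpe entry appended to $services_file"
    fi
}

# Demonstration on a scratch file: the second call changes nothing.
demo_file=$(mktemp)
ensure_nrpe_service "$demo_file"
ensure_nrpe_service "$demo_file"
rm -f "$demo_file"
```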

Installing the NRPE plugin on a Nagios machine

Installing the NRPE plugin on the Nagios machine is a subset of the task that we did for the remote host machines. All you need to do is install the NRPE plugin and nothing else. The following are the steps to install the NRPE plugin:

$ wget http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
$ tar xvzf nrpe-2.14.tar.gz
$ cd nrpe-2.14
$ ./configure
$ make all
$ sudo make install-plugin

# Test if the plugin is working; replace 10.99.9.67 with the
# address of one of the machines running NRPE + xinetd
$ /usr/local/nagios/libexec/check_nrpe -H 10.99.9.67
NRPE v2.14

Setting up things to monitor

In this section, we will talk about how to set up CPU, disk, and Cassandra monitoring. The details should be enough to enable you to set up any Nagios plugin and configure monitoring.

  • Monitoring CPU and disk space: These checks need to be executed on the remote machines. Thus, we may need to edit the NRPE configuration to allow those plugins to be executed remotely. This configuration is stored in /usr/local/nagios/etc/nrpe.cfg on each remote machine. If you do not find the plugin that you want to execute, or you want to change the parameters passed to a plugin, this is the place to do it. The relevant entries look like this:
    # edit /usr/local/nagios/etc/nrpe.cfg
    command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
    
    command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
    
    command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
    [-- snip --]
    #custom commands *add your commands here*
    
    # EC2 ephemeral storage root disk
    command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
    

    The following screenshot shows the Nagios interface monitoring local and remote resources:

    [Screenshot: Nagios interface monitoring local and remote resources]

    As you can see, we have a CPU check (check_load) and a disk check already provided by the default configuration. However, if I wanted to monitor the /dev/sda1 device for space availability, I would add a new check check_sda1 for this.

  • Setting up a JMX monitor: For Cassandra, we want to check the JVM heap usage via JMX. Since this executes on the local machine (Nagios) to connect to the JMX service on the remote machine, we do not need to use NRPE for this. Thus, we have nothing to do here.
  • Updating configuration: The best part of Nagios is its configuration. With a little ingenuity and grouping, you can build a configuration that scales to hundreds of machines. All configuration in Nagios is text-based and is written as define blocks of directive-value pairs. You can organize the files in whichever way you want, as long as you let Nagios know where they are. For this particular case, we create the /usr/local/nagios/etc/objects/cassandrahosts.cfg file, which houses all the information related to monitoring Cassandra. The following snippet shows what it looks like (see the comments):
    # A machine to be monitored
    # DEFINE ALL CASSANDRA HOSTS HERE
    
    define host{
            use                     linux-server
            host_name               cassandra1
            alias                   Cassandra Machine
            address                 10.99.9.67
            }
    
    # create logical groupings, manageable, saves typing
    # HOST GROUP TO COLLECTIVELY CALL ALL CASSANDRA HOSTS
    
    define hostgroup{
            hostgroup_name  cassandra_grp
            alias           Cassandra Group
            members         cassandra1  ;this is CSV of
                                        ;hosts defined above
            }
    
    # A service defines what command to execute on what hosts
    # MONITORING SERVICES
    
    # A service that executes locally
    #Check Cassandra on remote machines
    
    define service{
            use                     generic-service
            hostgroup_name          cassandra_grp
            service_description     Cassandra
            check_command           check_cas ;defined below
            }
    
    # A service that gets executed remotely via NRPE
    # check disk space status
    define service{
      use                 generic-service
      hostgroup_name      cassandra_grp
      service_description check disk
      check_command       check_nrpe!check_sda1
      }
    
    # check CPU status
    define service{
      use                 generic-service
      hostgroup_name      cassandra_grp
      service_description check CPU
      check_command       check_nrpe!check_load
      }
    
    # A command is a template of a command line call, here:
    #   $USER1$ is plugin directory, nagios/libexec
    #   $HOSTADDRESS$ resolves to the address defined in
    #   the preceding host block; hosts are chosen from the service that calls this command
    
    # define custom commands
    # check JVM heap usage using JMX,
    # warn if > 3.7G, mark critical if > 3.85G
    
    define command {
            command_name check_cas
            command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:7199/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 3700000000 -c 3850000000
            }
  • Letting Nagios know about the new configuration: We have created a new configuration file that Nagios does not know about. We need to register it in /usr/local/nagios/etc/nagios.cfg. Now, append the following lines to the file:
    #custom file *ADD YOUR FILES HERE*
    cfg_file=/usr/local/nagios/etc/objects/cassandrahosts.cfg

    Test the configuration and you are done.

    $ sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    Nagios Core 3.5.0
    [-- snip --]
    Reading configuration data...
       Read main config file okay...
    Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
    Processing object config file '/usr/local/nagios/etc/objects/contacts.cfg'...
    Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
    Processing object config file '/usr/local/nagios/etc/objects/templates.cfg'...
    Processing object config file '/usr/local/nagios/etc/objects/cassandrahosts.cfg'...
    Processing object config file '/usr/local/nagios/etc/objects/localhost.cfg'...
       Read object config files okay...
    Running pre-flight check on configuration data...
    [-- snip --]
     Total Warnings: 0
    Total Errors:   0
    Things look okay - No serious problems were detected during the pre-flight check
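
One thing worth verifying if the pre-flight check reports an error instead: the two NRPE-based services use check_nrpe!check_sda1 and check_nrpe!check_load, which require a command named check_nrpe to be defined somewhere in the Nagios object configuration. The cassandrahosts.cfg file as shown defines only check_cas, and as far as we know the sample commands.cfg shipped with Nagios Core does not define check_nrpe, so you may need to add it yourself. The sketch below appends the definition commonly used with NRPE, in which the text after the ! in check_command arrives as $ARG1$. The helper name ensure_check_nrpe is hypothetical, and the demonstration writes to a temporary file rather than to your real configuration.

```shell
# Append a check_nrpe command definition to a Nagios object file
# unless that file already defines one. $1 = path to the object file.
ensure_check_nrpe() {
    cfg=$1
    if grep -Eq '^[[:space:]]*command_name[[:space:]]+check_nrpe([[:space:];]|$)' "$cfg" 2>/dev/null; then
        return 0
    fi
    # The quoted 'EOF' keeps the shell from expanding the $...$ macros.
    cat >> "$cfg" <<'EOF'

define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
EOF
}

# Demonstration on a scratch file: the second call is a no-op.
demo_cfg=$(mktemp)
ensure_check_nrpe "$demo_cfg"
ensure_check_nrpe "$demo_cfg"
grep -c 'command_name check_nrpe' "$demo_cfg"   # prints: 1
rm -f "$demo_cfg"
```

Against a real installation, the target would be a file that nagios.cfg already loads, such as cassandrahosts.cfg, and the pre-flight check should be rerun afterwards.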

Restart Nagios by executing sudo service nagios restart.

Monitoring and notification using Nagios

Nagios has built-in support to send mail whenever an interesting event (such as a warning, an error, or a service coming back to the ok state) occurs. By default, it uses the mail command, so if your mail is configured correctly, you should see mails when you execute the following command:

# substitute YOUR_EMAIL_ADDRESS with your email id.
/usr/bin/printf "%b" "Hi Nishant,\nthis is Nagios." | /bin/mail -s "Nagios test mail" YOUR_EMAIL_ADDRESS

If the mail reaches neither your inbox nor your spam folder, you should check your mail configuration. If you do not have the mail utility installed already, execute the following command:

# mail utility on RHEL like OS
$ sudo yum install mailx

# On Ubuntu or Debian derivatives
$ sudo apt-get install mailutils

If you are not happy with the mailing option or want to change the mailer to send mail via a specific mail provider such as Gmail, you should dig into the plugins directory or GitHub to find appropriate alternatives.

Nagios provides a pretty intuitive GUI—a web-based console that immediately highlights anything that is wrong with any service or host. Apart from displaying the immediate state, Nagios also stores the history of monitored events. There are many reporting capabilities that provide a complete infrastructure status overview. One can easily generate a histogram that states the performance of a service, as shown in the following screenshot:


An auto-generated histogram report from Nagios

There are many reporting options, as well as options to disable alerts during scheduled infrastructure downtime. It is worth exploring the Nagios GUI to learn about the various options.
