Writing custom Nagios Core management scripts

As established in Chapter 7, Using the Web Interface, some control over the Nagios Core process is available from different areas of the web interface: enabling or disabling notifications or checks, scheduling downtime for hosts or services, and acknowledging existing problems. However, for large networks in particular, we often need a more flexible approach, such as running operations on a large number of hosts or services in batches, or disabling or enabling features from another script. Both are cumbersome to manage via the CGIs.

For example, if we had a list of more than a hundred hosts in a segment of our network that were going to become inaccessible due to scheduled maintenance at a known time, it would take a very long time to set up all the scheduled downtime in the web interface to avoid sending out notifications for those hosts. It would be better to do this from the command line.

In this recipe, we'll install a custom script called ncdt, short for Nagios Core Downtime, in /usr/local/bin on our monitoring server. It can be used to create downtime quickly for a named host over a given time period. The downtime is created with the SCHEDULE_HOST_DOWNTIME external command, which is written to the external commands file located at /usr/local/nagios/var/rw/nagios.cmd. We'll then use this script to loop over a list of hostnames to quickly set up an appropriate fixed scheduled downtime for all of them.

Getting ready

You'll need to have a server running Nagios Core 4.0 or newer. In this example, we'll provide the complete ncdt shell script as part of the instructions, but to understand the script and the loop over the list of hosts, some experience with shell scripting would be helpful. On your monitoring server, you will also need GNU Bash 2.05a or newer, date(1) from GNU sh-utils 2.0 or newer, and the administrative rights to create new scripts in /usr/local/bin.

How to do it...

We can create, test, and apply the ncdt shell script as follows:

  1. Change the directory to /usr/local/bin on the monitoring server or some other directory in your path suitable for custom scripts like this one:
    # cd /usr/local/bin
    
  2. Edit a new script ncdt:
    # vi ncdt
    
  3. Paste the following shell script into the file:
    #!/usr/bin/env bash
    
    # Define some fields for the command
    now=$(date +%s)
    author=${SUDO_USER:-$USER}
    fixed=1
    trigger=0
    duration=0
    comment='Batch downtime set by ncdt'
    
    # Read the hostname from the first argument
    hostname=${1:?}
    
    # Attempt to parse start and end dates; exit if the call doesn't work
    dtstart=$(date +%s --date "${2:?}") || exit
    dtend=$(date +%s --date "${3:?}") || exit
    
    # Write command and print message if it fails; succeed silently
    printf '[%lu] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;%u;%u;%u;%s;%s\n' \
        "$now" "$hostname" "$dtstart" "$dtend" \
        "$fixed" "$trigger" "$duration" "$author" "$comment" \
        > /usr/local/nagios/var/rw/nagios.cmd
    
  4. Make the script executable with chmod(1):
    # chmod +x ncdt
    
  5. We can test the script by giving it three parameters in this order:
    • A hostname for an existing host configured in Nagios Core
    • A start time for the scheduled downtime
    • An end time for the scheduled downtime

    For this example, we'll set a downtime for the sparta.example.net host from "now" (the current time) until 5:00am tomorrow:

    # ncdt sparta.example.net now '5:00am tomorrow'
    
  6. We need to check whether the downtime was actually scheduled, which we can do by looking at the contents of the log file; in this case, /usr/local/nagios/var/nagios.log:
    [1448444263] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;sparta.example.net;1448444263;1448467200;1;0;0;tom;Batch downtime set by ncdt
    [1448444263] HOST DOWNTIME ALERT: sparta.example.net;STARTED; Host has entered a period of scheduled downtime
    

    In this example, the output shows that the external command went through correctly and that the host immediately entered downtime, as expected.

  7. We can now loop over a list of hostnames in the downtime-list file to quickly set the same period of downtime for a list of any number of hosts, using a while-read loop:
    # while read -r hostname ; do
    > ncdt "$hostname" now '5:00am tomorrow'
    > done < downtime-list
    

How it works...

The Bash script accepts three arguments: the hostname of the host that should enter downtime, the date and time the downtime should start, and the date and time it should end. The dates can be provided in any of the formats and keywords understood by GNU date(1); they are translated to Unix timestamps.
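
To illustrate the date translation on its own, here is a minimal sketch that assumes GNU date(1) is installed. The helper name to_epoch is hypothetical and is not part of ncdt:

```shell
# Convert a human-readable date string to a Unix timestamp with GNU date(1).
# to_epoch is a hypothetical helper name used only for this illustration.
to_epoch() {
    date +%s --date "${1:?date string required}"
}

# The same two date strings used in the recipe
start=$(to_epoch 'now') || exit
end=$(to_epoch '5:00am tomorrow') || exit
printf 'start=%s end=%s\n' "$start" "$end"
```

If date(1) cannot parse the string, it prints an error and returns a non-zero status, which is why ncdt follows each call with || exit.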

The script fails if any of these are not set. It records the current timestamp as the command time, takes the name of the user running the script as the author, and uses fixed values for the remaining fields of the SCHEDULE_HOST_DOWNTIME command. It formats the complete command and all its arguments with printf, separating them with semicolons, and writes the completed command to the external commands file to be processed by Nagios Core.
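
One way to see the formatting step without touching a live Nagios Core instance is to print the command line to standard output instead of the external commands file. The following is only a sketch: format_host_downtime is a hypothetical helper, it hard-codes the fixed, trigger, and duration fields to 1, 0, and 0 as ncdt does, and it uses %s for every field for simplicity:

```shell
# Print a SCHEDULE_HOST_DOWNTIME line to stdout rather than the command file.
# format_host_downtime is a hypothetical helper, not part of ncdt.
# Arguments: command time, hostname, start, end, author, comment
format_host_downtime() {
    # fixed=1, trigger=0, duration=0 are written literally in the format
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;0;%s;%s\n' \
        "${1:?}" "${2:?}" "${3:?}" "${4:?}" "${5:?}" "${6:?}"
}

# Reproduce the command seen in the log excerpt from the recipe
format_host_downtime 1448444263 sparta.example.net \
    1448444263 1448467200 tom 'Batch downtime set by ncdt'
```

The printed line has the same fields as the EXTERNAL COMMAND entry in nagios.log, which makes it easy to compare what a script intends to send with what Nagios Core actually received.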

Once we know that the script is working, we can call it from for or while loops to quickly run the same script over a large number of hosts, perhaps read from a file or from a list pasted into the terminal window.
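
As a rough sketch of the for-loop form, the wrapper below takes the time period first and then any number of hostnames. Both the function name downtime_hosts and the NCDT variable are assumptions made for this example; NCDT only exists so that the command can be swapped out for a dry run:

```shell
# Apply the same downtime period to every hostname given as an argument.
# downtime_hosts and the NCDT override are hypothetical, for illustration.
# Arguments: start, end, then one or more hostnames
downtime_hosts() {
    dt_start=${1:?} dt_end=${2:?}
    shift 2
    for hostname in "$@" ; do
        # Report a failure on stderr and carry on with the remaining hosts
        "${NCDT:-ncdt}" "$hostname" "$dt_start" "$dt_end" ||
            printf 'ncdt failed for %s\n' "$hostname" >&2
    done
}
```

Running NCDT=echo downtime_hosts now '5:00am tomorrow' sparta.example.net athens.example.net prints the arguments each call would receive without scheduling anything.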

There's more...

Custom scripts don't need to be written as shell scripts. They can be written in any programming language that suits, as the only filesystem interaction needed is to write a line to the external commands file. We could, for example, write scripts similar to ncdt in Perl, Python, or Ruby.

Your scripts could have more features than this or be more user friendly; some ideas for expanding on this simple script might include the following:

  • Manual pages for man(1) or help output when run with --help
  • Working on services instead of hosts
  • Checking whether the host exists with MK Livestatus before running the command
  • Verifying that the command worked, for example, the downtime or acknowledgement was established correctly in Nagios Core
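
As an example of the second idea, a service variant could use the SCHEDULE_SVC_DOWNTIME external command, which takes the same fields as the host command plus a service description after the hostname. This is a minimal sketch under that assumption; format_svc_downtime is a hypothetical helper that prints the command rather than writing it, and it assumes GNU date(1):

```shell
# Print a SCHEDULE_SVC_DOWNTIME line for one service on one host.
# format_svc_downtime is a hypothetical helper, not part of ncdt.
# Arguments: hostname, service description, start date, end date
format_svc_downtime() {
    svc_host=${1:?} svc_desc=${2:?}
    # Parse the dates; give up if GNU date(1) cannot understand them
    svc_start=$(date +%s --date "${3:?}") || return
    svc_end=$(date +%s --date "${4:?}") || return
    # fixed=1, trigger=0, duration=0, as in the host version
    printf '[%s] SCHEDULE_SVC_DOWNTIME;%s;%s;%s;%s;1;0;0;%s;%s\n' \
        "$(date +%s)" "$svc_host" "$svc_desc" "$svc_start" "$svc_end" \
        "${SUDO_USER:-${USER:-unknown}}" 'Batch downtime set by ncdt'
}
```

To apply it, the output would be redirected to the external commands file, for example format_svc_downtime sparta.example.net HTTP now '5:00am tomorrow' > /usr/local/nagios/var/rw/nagios.cmd.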

Here are some other examples of tasks that might benefit from script shortcuts. Note that some of these would require NDOUtils or MK Livestatus as a backend to retrieve data from Nagios Core.

  • Acknowledging problems with hosts or services
  • Searching for Nagios hostnames matching a pattern
  • Viewing the most recent notifications for a given host
  • Sending custom notifications to all the contacts for a given host
  • Enabling or disabling notifications
  • Enabling or disabling active checks
  • Enabling or disabling event handlers
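
Two of these tasks need only the external commands file. The sketch below uses the ACKNOWLEDGE_HOST_PROBLEM, ENABLE_HOST_NOTIFICATIONS, and DISABLE_HOST_NOTIFICATIONS external commands; the function names and the choice of sticky, notify, and persistent values are assumptions for illustration, and both helpers print the command instead of writing it:

```shell
# Hypothetical helpers that print external command lines to stdout.

# Acknowledge a host problem. Arguments: hostname, comment
ack_host_problem() {
    # Fields after the hostname: sticky=2, notify=1, persistent=1
    printf '[%s] ACKNOWLEDGE_HOST_PROBLEM;%s;2;1;1;%s;%s\n' \
        "$(date +%s)" "${1:?}" "${SUDO_USER:-${USER:-unknown}}" "${2:?}"
}

# Enable or disable notifications for a host. Arguments: hostname, on|off
set_host_notifications() {
    case ${2:?} in
        on)  notif_cmd=ENABLE_HOST_NOTIFICATIONS ;;
        off) notif_cmd=DISABLE_HOST_NOTIFICATIONS ;;
        *)   printf 'Second argument must be on or off\n' >&2
             return 2 ;;
    esac
    printf '[%s] %s;%s\n' "$(date +%s)" "$notif_cmd" "${1:?}"
}
```

As with ncdt, redirecting the output of either helper to /usr/local/nagios/var/rw/nagios.cmd would submit the command, and the result can be checked in nagios.log.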

Tip

A complete list of the commands accepted by the external commands file is available at https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/extcommands.html.

See also

  • The Reading a status from a Unix socket with MK Livestatus recipe in this chapter
  • The Reading a status into a database with NDOUtils recipe in this chapter
  • The Writing customized Nagios Core reports recipe in this chapter