Return codes and thresholds

Before coding a plugin, we must face some concepts that will be the stepping stones of our Nagios code base, one of these being the return codes of the plugin itself. As we already discussed, once the plugin has collected data about how the service is doing, it evaluates the data and determines which of the following statuses the situation falls under:

| Return code | Status | Description |
|---|---|---|
| 0 | OK | The plugin checked the service and the results are inside the acceptable range. |
| 1 | WARNING | The plugin checked the service and the results are above a WARNING threshold. We must keep an eye on the service. |
| 2 | CRITICAL | The plugin checked the service and the results are above a CRITICAL threshold, or the service is not responding. We must react now. |
| 3 | UNKNOWN | Either we passed the wrong arguments to the plugin or there is some internal error in it. |

So, our plugin will check a service, evaluate the results, and, based on a threshold, return to Nagios one of the values listed in the table together with a meaningful message, like the ones we can see in the description column in the following screenshot:

Notice the service check in red and the message in the preceding screenshot.

In the screenshot, we can see that some checks are green, meaning OK, and that each has an explanatory message in the description section. What we see in this section is the output that the plugin writes to stdout, and it is what we will craft as a response to Nagios.
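To make the pairing of return code and message concrete, here is a minimal sketch of how a plugin could end its run. The variable names, the `report` function, the "SERVICE STATUS - message" layout, and the sample message are our own assumptions for illustration; they are not something Nagios mandates.

```shell
#!/bin/sh
# Return codes from the table; the variable names are our own choice.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# report CODE SERVICE MESSAGE
# Writes one line to stdout for Nagios and returns CODE.
# Any code outside 0-2 is reported as UNKNOWN (3).
report() {
    code=$1
    case "$code" in
        "$STATE_OK")       label="OK" ;;
        "$STATE_WARNING")  label="WARNING" ;;
        "$STATE_CRITICAL") label="CRITICAL" ;;
        *)                 label="UNKNOWN"; code=$STATE_UNKNOWN ;;
    esac
    printf '%s %s - %s\n' "$2" "$label" "$3"
    return "$code"
}

# Hypothetical result of a failed SSH check.
# Prints "SSH CRITICAL - connection refused on port 22",
# then "return code: 2".
report "$STATE_CRITICAL" "SSH" "connection refused on port 22" || echo "return code: $?"
```

In a real plugin, the last line would typically be a call to `report` followed by `exit $?`, so that the return code reaches Nagios as the exit status of the script.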

Pay attention to the SSH check: it is red and failing because it checks the service on the default port, 22, but on this server the SSH daemon is listening on a different port. This leads us to a consideration: our plugin will need a command-line parser able to receive some configuration options and some threshold limits as well, because we need to know what to check, where to check, and what the acceptable working limits for a service are:

  • Where: In Nagios, there can be a host without service checks (except for the implicit host-alive check carried out by a ping), but there cannot be a service without a host to run it against. So, any plugin must receive on the command line the host to run against; it can be a dummy host, but there must be one.
  • How: This is where our coding comes in; we will have to write the lines of code that instruct the plugin how to connect to the server, query, collect, and parse the answer.
  • What: We must instruct the plugin, usually with some meaningful options on the command line, about the acceptable working limits so that it can evaluate them and decide whether to notify us with an OK, WARNING, or CRITICAL message.

That is all our script has to deal with. Who to notify, when, how, how many times, and so forth are tasks carried out by the core; a Nagios plugin is unaware of all of this. What it really must know for effective monitoring is which values identify a working service. We can pass two different kinds of value to our script:

  • Range: This is a series of numeric values with a starting and an ending point, such as from 3 to 7 or from a given number to infinity
  • Threshold: This is a range with an associated alert level

So, when our plugin performs a check, it collects a numeric value that falls within or outside a range, based on the threshold we impose; then, based on the evaluation, it replies to Nagios with a return code and a message. How do we specify ranges on the command line? Essentially in the following way:

[@]start_value:end_value

If the range starts from 0, the part from : to the left can be omitted. The start_value must always be a lower number than end_value.

If end_value is omitted, as in start_value:, the range goes from that number to infinity. Negative infinity can be specified by using ~ as start_value.

An alert is generated when the collected value falls outside the specified range; the endpoints are part of the range.

If @ is specified, the logic is inverted: the alert is generated if the value falls inside the range, endpoints included.
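The range rules can be sketched as a small shell function. This is only an illustration under a few assumptions: the function names `range_alerts` and `evaluate` are ours, values are integers, and the input is not validated. Since a threshold is a range with an alert level, `evaluate` tests the CRITICAL range before the WARNING one, so the more severe state wins.

```shell
#!/bin/sh
# range_alerts VALUE RANGE
# Returns 0 (true) when VALUE must raise an alert for RANGE, 1 otherwise.
range_alerts() {
    value=$1
    range=$2
    inside_alerts=no

    # A leading @ inverts the logic: alert when the value is inside.
    case "$range" in
        @*) inside_alerts=yes; range=${range#@} ;;
    esac

    # Split on the colon; a bare number N means 0:N.
    case "$range" in
        *:*) start=${range%%:*}; end=${range#*:} ;;
        *)   start=0; end=$range ;;
    esac
    if [ -z "$start" ]; then start=0; fi    # ":N" also means 0:N

    # Is the value inside the range, endpoints included?
    inside=yes
    # "~" as start means negative infinity: no lower bound to test.
    if [ "$start" != "~" ] && [ "$value" -lt "$start" ]; then inside=no; fi
    # An empty end means infinity: no upper bound to test.
    if [ -n "$end" ] && [ "$value" -gt "$end" ]; then inside=no; fi

    if [ "$inside_alerts" = yes ]; then
        [ "$inside" = yes ]
    else
        [ "$inside" = no ]
    fi
}

# evaluate VALUE WARNING_RANGE CRITICAL_RANGE
# Prints the return code: 2 (CRITICAL), 1 (WARNING), or 0 (OK).
evaluate() {
    if range_alerts "$1" "$3"; then
        echo 2
    elif range_alerts "$1" "$2"; then
        echo 1
    else
        echo 0
    fi
}

evaluate 16 "~:15" 16    # prints 1: above 15, but still inside 0:16
```

Quoting a range such as "~:15" keeps the shell from ever treating the tilde specially before our function sees it.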

Let's see some practical examples of how we would call our script, imposing some thresholds:

| Plugin call | Meaning |
|---|---|
| ./my_plugin -c 10 | CRITICAL if less than 0 or higher than 10 |
| ./my_plugin -w 10:20 | WARNING if less than 10 or higher than 20 |
| ./my_plugin -w ~:15 -c 16 | WARNING if higher than 15, CRITICAL if less than 0 or higher than 16 |
| ./my_plugin -c 35: | CRITICAL if the value collected is below 35 |
| ./my_plugin -w @100:200 | WARNING if the value is from 100 to 200, OK otherwise |

We covered the basic requirements for our plugin, which in its simplest form should be called with the following syntax:

./my_plugin -h hostaddress|hostname -w value -c value
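As a rough sketch of the command-line parsing this syntax implies, the following uses the POSIX `getopts` builtin. The function name `parse_args`, the variable names, and the usage message are assumptions made for the example. We follow the `-h` option shown above for the host; note that many standard plugins use `-H` for the host and keep `-h` for help, so this choice is worth revisiting in a real plugin.

```shell
#!/bin/sh
# parse_args "$@"
# Fills host, warning, and critical from -h, -w, and -c.
# Returns 3 (UNKNOWN) when an option is invalid or missing.
parse_args() {
    host=""
    warning=""
    critical=""
    OPTIND=1    # restart option parsing on every call
    # The leading colon makes getopts silent, so we print our own errors.
    while getopts ":h:w:c:" opt; do
        case "$opt" in
            h) host=$OPTARG ;;
            w) warning=$OPTARG ;;
            c) critical=$OPTARG ;;
            *) echo "UNKNOWN - usage: my_plugin -h host -w range -c range"
               return 3 ;;
        esac
    done
    if [ -z "$host" ] || [ -z "$warning" ] || [ -z "$critical" ]; then
        echo "UNKNOWN - usage: my_plugin -h host -w range -c range"
        return 3
    fi
    return 0
}

# 192.0.2.10 is a documentation address used here as a placeholder host.
parse_args -h 192.0.2.10 -w 10:20 -c 30 &&
    echo "host=$host warning=$warning critical=$critical"
```

Returning 3 on a bad command line matches the UNKNOWN status, which covers the case where we passed the wrong arguments to the plugin.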

We already talked about the need to relate a check to a host; we can do this by using either a hostname or a host address. It is up to us which one to use, but we will not fill in this piece of information ourselves, because it will be drawn from the service configuration as a standard macro. We just introduced a new concept, service configuration, which is essential in making our script work in Nagios, so let's briefly see what we are talking about.

A caveat before starting our journey into Nagios configuration: this is not a book on Nagios, so we will not cover all the complex bits and parts. We will touch on the topics needed to make our script do its job, so that, with a working Nagios installation, we will be able to activate our new plugin quickly. Let's now see how to configure a plugin to make it work under Nagios, so that we can then focus on our script without any distractions.
