Level of the details monitored

There is an almost unlimited number of combinations of details that can be monitored on a target. It's possible to monitor every single parameter of a process, such as detailed memory usage, the existence of PID files, and many more things, and it's also possible to simply check whether a process is running.

Sometimes, a single service can require multiple processes to be running, and it might be enough to monitor whether a certain category of processes is running as expected, trusting some other component to figure that out. One example could be Postfix, the email server. Postfix runs several different processes, including master, pickup, anvil, and smtpd. While checks could be created against every individual process, often it would be enough to check whether the init script thinks that everything is fine.

We would need an init script that supports the status command. As init scripts usually output a textual string, such as "Checking for service Postfix: running", it would be better to return only a numeric value to Zabbix that would indicate the service state. Common exit codes are 0 for success and nonzero if there's a problem. That means we could do something like the following:

/etc/init.d/postfix status > /dev/null 2>&1 || echo 1 

That would call the init script, discard all stdout and stderr output (because we only want to return a single number to Zabbix), and return 1 upon a non-successful exit code. That should work, right? There's only one huge problem: parameters should never return an empty string, which is what would happen with such a check if Postfix was running. If the Zabbix server were to check such an item, it would assume the parameter is unsupported and deactivate it as a consequence. We could modify this string so that it becomes the following:

/etc/init.d/postfix status > /dev/null 2>&1 && echo 0 || echo 1 
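The difference between the two variants can be tried out without a real service. The following is a minimal sketch, assuming that true and false stand in for an init script that succeeds or fails; check_first and check_second are hypothetical names used only for this illustration:

```shell
#!/bin/sh
# Hypothetical helpers: "$@" stands in for '/etc/init.d/postfix status'.

# First variant: prints 1 on failure, but nothing at all on success.
check_first() {
    "$@" > /dev/null 2>&1 || echo 1
}

# Second variant: always prints exactly one value, 0 or 1.
check_second() {
    "$@" > /dev/null 2>&1 && echo 0 || echo 1
}

echo "first, service up:    '$(check_first true)'"
echo "first, service down:  '$(check_first false)'"
echo "second, service up:   '$(check_second true)'"
echo "second, service down: '$(check_second false)'"
```

The first line of output shows an empty pair of quotes, which is the empty string that Zabbix would treat as an unsupported parameter.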

This would work very nicely, as now a Boolean is returned and Zabbix always gets valid data. But there's a possibly better way. As the exit code is 0 for success and nonzero for problems, we could simply return that. While this would mean that we won't get nice Boolean values only, we could still check for nonzero values in a trigger expression like this:

{hostname:item.last()}>0 
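The same idea can be sketched in the shell: return the raw exit code and treat any value above zero as a problem, just as the trigger does. This is only an illustration, assuming that true and sh -c 'exit 3' stand in for an init script; status_code and evaluate are hypothetical helper names, not part of Zabbix:

```shell
#!/bin/sh
# Hypothetical helper: run a command silently and print its exit code.
# "$@" stands in for '/etc/init.d/postfix status'.
status_code() {
    rc=0
    "$@" > /dev/null 2>&1 || rc=$?
    echo "$rc"
}

# Hypothetical helper: same comparison as the trigger, value > 0 is a problem.
evaluate() {
    if [ "$1" -gt 0 ]; then
        echo "PROBLEM (code $1)"
    else
        echo "OK"
    fi
}

evaluate "$(status_code true)"            # stand-in for a healthy service
evaluate "$(status_code sh -c 'exit 3')"  # stand-in for a stopped service
```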

As an added benefit, we might get a more detailed return message if the init script returns a more detailed status with nonzero exit codes. As defined by the Linux Standard Base, the exit codes for the status commands are the following:

Code  Meaning
0     Program is running or service is OK
1     Program is dead and /var/run pid file exists
2     Program is dead and /var/lock lock file exists
3     Program isn't running
4     Program or service status is unknown

There are several reserved ranges that might contain other codes used by a specific application or distribution; those should be looked up in the corresponding documentation.

For such a case, our user parameter command becomes even simpler, with the full string being as follows:

UserParameter=service.status[*],/etc/init.d/"$1" status > /dev/null 2>&1; echo $? 

We're simply returning the exit code to Zabbix. To make the output more user friendly, we'd definitely want to use value mapping. That way, each return code would be accompanied on the frontend by an explanatory message like those in the preceding table. Notice the use of $1. This way, we can create a single user parameter and use it for any service we desire. For an item like that, the appropriate key would be service.status[postfix] or service.status[nfs]. If such a check doesn't work for the non-root user, sudo would have to be used.
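Value mapping itself is configured in the Zabbix frontend, but its effect can be illustrated with a small shell function. This is a sketch only: lsb_meaning is a hypothetical name, the messages are the ones from the status codes listed earlier, and the fallback text for other codes is an assumption made for this example:

```shell
#!/bin/sh
# Hypothetical helper mirroring a Zabbix value map for the LSB status codes.
lsb_meaning() {
    case "$1" in
        0) echo "Program is running or service is OK" ;;
        1) echo "Program is dead and /var/run pid file exists" ;;
        2) echo "Program is dead and /var/lock lock file exists" ;;
        3) echo "Program isn't running" ;;
        4) echo "Program or service status is unknown" ;;
        # Assumed fallback wording for reserved or application-specific codes.
        *) echo "Reserved or application-specific code" ;;
    esac
}

for code in 0 3 7; do
    echo "$code => $(lsb_meaning "$code")"
done
```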

In open source land, multiple processes per single service are less common, but they're quite popular in proprietary software, in which case a trick like this greatly simplifies monitoring such services.

Most distributions have moved to systemd. In that case, the user parameter line would be UserParameter=service.status[*],systemctl status "$1" > /dev/null 2>&1; echo $?.