In this recipe, you'll learn how to use the Host and Service State Trends reporting tool on a host or service to show a graph of states over some fixed period of time. This is useful not only for determining overall availability, perhaps to meet the terms of a service-level agreement, but also for checking whether there are certain intervals or consistent times at which the host enters a state that is not OK. It's a good way to look for patterns in the downtime of your hosts.
You will need access to the Nagios Core web interface and permission to run commands from the CGIs. The sample configuration installed by following the Quick Start Guide gives the nagiosadmin user all the necessary privileges when authenticated via HTTP.
If you find that you don't have this privilege, check the authorized_for_all_services and authorized_for_all_hosts directives in /usr/local/nagios/etc/cgi.cfg and include your username in both, for example, tom:
authorized_for_all_services=nagiosadmin,tom
authorized_for_all_hosts=nagiosadmin,tom
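If you would rather check the file than read it by eye, the following is a minimal Python sketch that reports which of the two directives does not yet list a given username. The function names authorized_users and missing_directives are hypothetical helpers written for this illustration and are not part of Nagios Core. The sketch assumes simple key=value lines with comma-separated usernames, as in the example above.

```python
def authorized_users(cgi_cfg_text, directive):
    """Return the usernames listed for a directive in cgi.cfg-style text."""
    users = []
    for line in cgi_cfg_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == directive:
            users.extend(u.strip() for u in value.split(",") if u.strip())
    return users


def missing_directives(cgi_cfg_text, username):
    """Return the directives (of the two discussed here) that lack username."""
    directives = ("authorized_for_all_services", "authorized_for_all_hosts")
    return [d for d in directives
            if username not in authorized_users(cgi_cfg_text, d)]


# Illustrative content only; in practice you would read the text from
# /usr/local/nagios/etc/cgi.cfg instead.
sample = """
# example fragment
authorized_for_all_services=nagiosadmin,tom
authorized_for_all_hosts=nagiosadmin
"""
print(missing_directives(sample, "tom"))
```

An empty list means the username appears in both directives. Changes to cgi.cfg are read by the CGIs themselves, so the result should match what the web interface allows the next time you load a page.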
Alternatively, you should also be able to see a host or service's information if you are authenticating with the same username as the nominated contact for the host or service you want to check.
In this example, we'll view a month's history for the CPU load service on roma.example.net, a web server for which we've been running checks.
In Nagios 4.1.0, a new version of the Trends report was introduced using the new JSON data sources available to the CGIs. We'll demonstrate this one. The older version of the Trends report is still available. Just click on the (Legacy) link next to Trends.
We can arrange a Service State Trends report for the last month for our roma.example.net server by following these steps:
You should be presented with a graph showing the state of the host or service over time, along with the markings of time for when the state changed, and a percentage breakdown of the relative states to the right.
A healthy service might look like this, with a few blips or none at all:
A more problematic service might have long periods of time in the WARNING or CRITICAL state:
You can double-click on sections of the graph to zoom in on them, provided that you did not select the Suppress image map option on the Advanced page of the report dialog.
Nagios Core assembles state changes from its log files for the specified time period and draws them as a color-coded graph, marking dates at regular intervals along the horizontal axis. The trends graph, therefore, only works for times covered by your archived log files. The third step of building the report offers many more options if you switch to the Advanced tab, which are as follows:
[...] retain_state_information directive in nagios.cfg.
[...] SOFT states, meaning that it will include state changes that do occur but return to their previous state before max_check_attempts is exhausted. Otherwise, it will only graph states that have endured right through the retry checks, or HARD states.
Note that it's important to ensure that checks have actually been running for the entire time for which you're running the report, as otherwise the State Breakdowns section will have distorted statistics. There is not much point running a yearly report for a host that has only existed for 6 months! In general, the more frequent and consistent your checks, the more accurate the trends graph will be.
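To make the idea of a state breakdown more concrete, here is a minimal Python sketch that computes percentages of time per state from SERVICE ALERT lines in a Nagios log. It is an illustration of the principle, not the code the CGIs use. It assumes the log lines are in chronological order, that the service was in initial_state at the start of the period, and that alerts outside the period can be ignored. The function name state_breakdown, the service description CPU Load, and the sample log lines are all made up for the example.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [1100] SERVICE ALERT: host;service;STATE;HARD;3;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);(SOFT|HARD);(\d+);(.*)$"
)


def state_breakdown(log_lines, host, service, start, end,
                    initial_state="OK", include_soft=False):
    """Return {state: percent of [start, end] spent in that state}."""
    durations = defaultdict(int)
    current_state = initial_state
    current_time = start
    for line in log_lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue
        timestamp = int(match.group(1))
        if match.group(2) != host or match.group(3) != service:
            continue
        # By default only HARD state changes count, as in the report.
        if match.group(5) == "SOFT" and not include_soft:
            continue
        if timestamp < start or timestamp > end:
            continue
        # Credit the elapsed time to the state we are leaving.
        durations[current_state] += timestamp - current_time
        current_state = match.group(4)
        current_time = timestamp
    # Credit the remainder of the period to the final state.
    durations[current_state] += end - current_time
    total = end - start
    return {state: 100.0 * secs / total for state, secs in durations.items()}


# Hypothetical log excerpt with small timestamps for readability.
sample_log = [
    "[1000] SERVICE ALERT: roma.example.net;CPU Load;WARNING;SOFT;1;load high",
    "[1100] SERVICE ALERT: roma.example.net;CPU Load;WARNING;HARD;3;load high",
    "[1300] SERVICE ALERT: roma.example.net;CPU Load;OK;HARD;1;load normal",
]
print(state_breakdown(sample_log, "roma.example.net", "CPU Load", 0, 2000))
```

Calling the function again with include_soft=True shifts some time from OK to WARNING, because the SOFT alert starts the WARNING period earlier. The sketch also shows why gaps in checking distort the statistics: any time with no logged state change is simply credited to whatever state came before it.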