Let's create some load

Right, so we configured sending email. But it's not very interesting until we actually receive some notifications. Let's increase the load on our test system. In the console, launch the following:

$ cat /dev/urandom | md5sum

This reads a never-ending pseudorandom byte stream and calculates its MD5 checksum. Because the stream never ends, the process keeps running until we stop it, so system load should increase as a result. You can observe the outcome as a graph: navigate to Monitoring | Latest data and click on Graph for our single item again.

Notice how the system load has climbed. If your test system can cope with such a process really well, it might not be enough—in such a case, you can try running multiple such MD5 checksum calculation processes simultaneously.
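If one process is not enough, several can be started in the background at once. The following is a minimal sketch, assuming a POSIX-style shell and the coreutils nproc and md5sum commands; the function names start_load and stop_load and the variable LOAD_PIDS are made up for this illustration. Passing /dev/urandom to md5sum directly has the same effect as piping it through cat.

```shell
# Hypothetical helpers for this sketch; they are not part of Zabbix.
LOAD_PIDS=""

# start_load [N]: start N background md5sum workers.
# N defaults to the number of CPU cores reported by nproc.
start_load() {
    n="${1:-$(nproc)}"
    i=0
    while [ "$i" -lt "$n" ]; do
        md5sum /dev/urandom > /dev/null 2>&1 &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
}

# stop_load: terminate every worker started by start_load.
stop_load() {
    if [ -n "$LOAD_PIDS" ]; then
        # Word splitting is intentional here: one argument per PID.
        kill $LOAD_PIDS 2>/dev/null || true
        wait $LOAD_PIDS 2>/dev/null || true
    fi
    LOAD_PIDS=""
}
```

For example, start_load 4 starts four workers, and stop_load ends them again once the notifications have arrived.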

Allow 3 minutes to pass and there should be a popup in the upper-right corner, accompanied by a sound alert:

This is one of the frontend messages we enabled earlier in our user profile. Let's look at what's shown in the message window:

  • The small grey rectangle represents trigger severity. For recovery messages, it is green. We will discuss triggers in Chapter 6, Detecting Problems with Triggers.
  • The first link leads to the Monitoring | Problems page, displaying the current problems for the host that are causing the message.
  • The second link leads to the Monitoring | Problems page, displaying the problem history for the trigger in question.
  • The third link leads to the event details, displaying more information about this particular occurrence.

The window itself can be repositioned vertically, but not horizontally—just drag it by the title bar. At the top of the window, there are three buttons.

These buttons also have tooltips to remind us what they do, as follows:

  • The snooze button silences the alarm sound that is currently being played.
  • The mute/unmute button allows you to disable/enable all sounds.
  • The clear button clears the currently visible messages. A problem that is cleared this way will not show up later unless it is resolved and then happens again.

Frontend messaging is useful as it provides the following:

  • Notifications of new and resolved problems when you aren't explicitly looking at a list of current issues
  • Sound alarms
  • Quick access to problem details

Now is a good time to revisit the configuration options of these frontend messages. Open the profile again by clicking on the link in the upper-right corner, and switch to the Messaging tab:

Here is what these parameters mean:

  • Frontend messaging: This enables/disables messaging for the current user.
  • Message timeout: This is used to specify how long a message should be shown. It affects the message itself, although it may affect the sound alarm as well.
  • Play sound: This drop-down has the options Once, 10 seconds, and Message timeout. The first one will play the whole sound once. The second one will play the sound for 10 seconds, looping if necessary. The third will loop the sound for as long as the message is shown.
  • Trigger severity: This lets you limit messages based on trigger severity (see Chapter 6, Detecting Problems with Triggers, for more information on triggers). If you unmark a checkbox, you will not be notified about that severity at all. If you want to get a message but not a sound alert, choose no_sound from the drop-down.

Adding new sounds is possible by copying .wav files to the audio subdirectory in the frontend directory.
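
Copying a sound file could look like the following sketch. The function name install_sound is hypothetical, and the location of the frontend directory depends on how the frontend was installed, so it is passed in as an argument rather than assumed.

```shell
# install_sound SOURCE_WAV FRONTEND_DIR
# Copies a .wav file into the "audio" subdirectory of the frontend directory.
# FRONTEND_DIR is installation-specific and must be supplied by the caller.
install_sound() {
    src="$1"
    frontend_dir="$2"
    case "$src" in
        *.wav) ;;
        *) echo "expected a .wav file: $src" >&2; return 1 ;;
    esac
    cp -- "$src" "$frontend_dir/audio/"
}
```

A call would then have the form install_sound my_alarm.wav followed by the path to your frontend directory; my_alarm.wav is a placeholder filename.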

Previously, when configuring frontend messaging, we set the message timeout to 180 seconds. The only reason was to give us enough time to explore the popup when it first appeared; it is not a requirement for using this feature.

Now, let's open Monitoring | Problems and select Recent Problems for Show, and A test host for Hosts. We should see the CPU load too high on A test host for last 3 minutes trigger visible with red, flashing PROBLEM text in the Status column.

The flashing indicates that a trigger has recently changed state, which we just made it do with that increased system load.

However, if you have a new email notification, you should already be aware of this state change before opening Monitoring | Problems. If all went as expected, you should have received an email informing you about the problem, so check your email client if you haven't yet. There should be a message with the subject PROBLEM: CPU load too high on A test host for last 3 minutes.

Did the email fail to arrive? This is most often caused by some misconfiguration in the mail delivery chain preventing the message from passing. If possible, check your email server's log files as well as network connectivity and spam filters. Going to Reports | Action log might reveal a helpful error message.

You can stop all MD5 checksum calculation processes now with a simple Ctrl + C. The trigger should then change status to OK, though you should allow at least the configured item interval of 30 seconds to pass.

Again, check your email: there should be another message, this time informing you that it's alright now, having the subject OK: CPU load too high on A test host for last 3 minutes.

Another place where we can see our trigger is the Problems widget on the dashboard. There, the status is RESOLVED, and under Actions we see an arrow, which is red if something has failed. When we move the mouse over the arrow, we see three entries: the first one is the email that was sent; the second, the calendar with the V in it, shows when the problem was resolved; and the third, the calendar with the exclamation mark inside, shows when the problem happened. If the email was not sent, the reason is shown after the envelope:

If all went fine, then congratulations! You have set up all the required configuration to receive alerts whenever something goes wrong, as well as when things go back to normal. Let's recall what we did and learned:

  • We created a host. Hosts are monitored device representations in Zabbix that can have items attached to them.
  • We also created an item, which is the basic way of obtaining information about a monitored system in Zabbix. Remember: the unique item identifier is the key, which is also the string specifying what data will actually be gathered. The item had to be attached to a host.
  • We explored a simple graph for the item that was immediately available without any configuration. The easy-to-use time-period selection controls allowed us to view any period and quickly zoom in for drill-down analysis.
  • Having data already is an achievement in itself, but defining what a problem is frees us from manually trying to understand a huge number of values. That's where triggers come in. They contain expressions that define thresholds.
  • Having a list of problems instead of raw data is a step forward, but it would still require someone looking at the list. We'd prefer being notified instead—that's what actions are for. We were able to specify who should be notified and when.
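
The idea of a trigger as a threshold on collected values can be illustrated outside Zabbix with a short sketch. It assumes that the check compares the average of the recent values with a threshold; both the averaging and the threshold of 1 are assumptions made for illustration, and the function name avg_exceeds is hypothetical. It is not Zabbix's trigger syntax or implementation.

```shell
# avg_exceeds THRESHOLD VALUE...
# Succeeds (exit status 0) when the average of the given values is above
# THRESHOLD. This imitates the idea of a threshold expression only.
avg_exceeds() {
    threshold="$1"
    shift
    printf '%s\n' "$@" |
        awk -v t="$threshold" '{ sum += $1; n++ } END { exit !(n > 0 && sum / n > t) }'
}

# Six samples stand in for 3 minutes of data at a 30-second interval.
# The sample values and the threshold of 1 are made up for this example.
if avg_exceeds 1 2.0 2.5 1.8 2.2 2.1 1.9; then
    echo "PROBLEM"
else
    echo "OK"
fi
# prints PROBLEM
```

Averaging over a window means that a single short spike, such as one value of 3.0 among five values near 0.1, does not cross the threshold on its own.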
