Performance considerations

Zabbix tends to perform nicely for small installations, but as the monitored environment grows, we might run into performance problems. A full Zabbix performance discussion is out of scope here, but let's discuss the starting points for a healthy configuration and the directions for further research:

  • Monitor only what you really need, as rarely as possible, and keep the data only for as long as really needed. It's common for new users of Zabbix to use the default templates as is, add a lot of new items with short intervals, and never look at the data. Instead, clone the default templates, remove every item that isn't needed, increase the intervals as much as possible, and reduce the history and trend-retention periods. There are also events, alerts, and other data; we'll discuss their storage settings a bit later.
  • When using Zabbix agents, use active items. Active items will result in a smaller number of network connections and reduce the load on the Zabbix server. Some features aren't supported with active items, so sometimes you'll have to use passive items. We discussed what can and cannot be done with active items in Chapter 3, Monitoring with Zabbix Agents and Basic Protocols.
  • Use Zabbix proxies. They'll provide bulk data to Zabbix server, reducing the work the server has to do even further. We discussed proxies in Chapter 17, Using Proxies to Monitor Remote Locations.
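
To see why trimming item lists and increasing intervals matters, it can help to estimate how many new values per second the server has to process. The following is a minimal sketch, not a Zabbix tool: the ItemGroup class, the item counts, and the intervals are all made-up numbers for illustration.

```python
from dataclasses import dataclass


@dataclass
class ItemGroup:
    """A hypothetical group of items that share one update interval."""
    name: str
    item_count: int
    interval_seconds: int


def values_per_second(groups):
    """Each item yields one value per interval, so the rate is count / interval."""
    return sum(g.item_count / g.interval_seconds for g in groups)


# Illustrative numbers only: default templates used as is ...
default_setup = [
    ItemGroup("cpu and memory", 2000, 30),
    ItemGroup("filesystem", 1000, 60),
]
# ... versus cloned templates with fewer items and longer intervals.
trimmed_setup = [
    ItemGroup("cpu and memory", 1000, 60),
    ItemGroup("filesystem", 500, 300),
]

print(f"default: {values_per_second(default_setup):.1f} values per second")
print(f"trimmed: {values_per_second(trimmed_setup):.1f} values per second")
```

With these assumed numbers, the trimmed setup produces less than a quarter of the incoming values, and every value that is never collected also never has to be stored or cleaned up later.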

We already know about the history and trend-retention periods for items, but for how long does Zabbix store events, alerts, acknowledgment messages, and other data? This is configurable by going to Administration | General and choosing Housekeeping in the drop-down menu in the upper-right corner:

This page is excessively long, so the preceding screenshot only shows a small section from the top.

Here, we may configure for how long to keep the following data:

  • Events and alerts: We may choose separate storage periods for trigger, internal, network discovery, and active agent auto-registration events. Note that removing an event will also remove all associated alerts and acknowledgment messages.
  • Services: The IT service up and down state is recorded separately from trigger events, and its retention period can be configured separately as well.
  • Audit data: This specifies how long to store the audit data for. We'll discuss what that actually is in a moment.
  • User sessions: User sessions that have been closed will be removed more frequently, but active user sessions will be removed after one year by default. This means that we can't be logged in longer than a year.

These values should be kept reasonably low. Keeping data for a long period of time will increase the database size, and that can impact the performance a lot.
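
A rough back-of-the-envelope calculation shows how retention periods translate into database size. This is only a sketch: the record rates and the bytes-per-record figures below are assumptions chosen for illustration, and the real row sizes depend on the database engine and the data type.

```python
def estimated_bytes(records_per_day, retention_days, bytes_per_record):
    """Stored size grows linearly with both the record rate and the retention period."""
    return records_per_day * retention_days * bytes_per_record


SECONDS_PER_DAY = 24 * 3600

# Assumption: 3000 items polled every 60 seconds, roughly 90 bytes per history row.
history_rows_per_day = 3000 * SECONDS_PER_DAY // 60

history_90_days = estimated_bytes(history_rows_per_day, 90, 90)
history_30_days = estimated_bytes(history_rows_per_day, 30, 90)

# Assumption: 5000 events per day, roughly 170 bytes per event row.
events_365_days = estimated_bytes(5000, 365, 170)

for label, size in [
    ("history, 90 days", history_90_days),
    ("history, 30 days", history_30_days),
    ("events, 365 days", events_365_days),
]:
    print(f"{label}: {size / 1e9:.2f} GB")
```

Under these assumptions, cutting history retention from 90 to 30 days shrinks the estimate to a third, which is why short retention periods are worth the effort.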

What about the history and trend settings here? Normally they're configured per item, but we may override them globally on this page. Also, internal housekeeping may be disabled for each of the entries. These options are aimed at users who manage large Zabbix installations. When the database grows really large, its performance degrades significantly; it can be improved by partitioning the biggest tables, that is, splitting them up by some criteria. With Zabbix, it's common to partition the history and trends tables, sometimes adding the events and alerts tables. If partitioning is used, old data is removed by dropping whole partitions, so the internal housekeeping for those tables should be disabled.

A lot of people in the Zabbix community will eagerly suggest partitioning at the first opportunity. Unless you plan to have a really large installation or know database partitioning really well, it might be better to hold off. There's no officially supported or built-in partitioning scheme yet, although one might appear in the future. If it does and your partitioning scheme is different, it'll be up to you to synchronize it with the official one.
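
To illustrate how partition removal replaces row-by-row housekeeping, here is a minimal sketch of choosing which partitions have expired. Everything in it is an assumption made for the example: the monthly pYYYY_MM naming, the two-month retention, and the MySQL-style ALTER TABLE ... DROP PARTITION statement that is only printed, never executed. It isn't an official or recommended Zabbix scheme.

```python
from datetime import date


def month_index(year, month):
    """Map a calendar month to a single increasing integer."""
    return year * 12 + (month - 1)


def partitions_to_drop(partition_names, today, keep_months):
    """Return monthly partitions older than the current month minus keep_months.

    Names are assumed to follow the hypothetical pattern pYYYY_MM.
    """
    cutoff = month_index(today.year, today.month) - keep_months
    expired = []
    for name in partition_names:
        year, month = int(name[1:5]), int(name[6:8])
        if month_index(year, month) < cutoff:
            expired.append(name)
    return expired


existing = ["p2023_11", "p2023_12", "p2024_01", "p2024_02", "p2024_03"]
expired = partitions_to_drop(existing, date(2024, 3, 15), keep_months=2)

# Dropping a partition removes all of its rows at once; MySQL-style syntax assumed.
statements = [f"ALTER TABLE history DROP PARTITION {name};" for name in expired]
for statement in statements:
    print(statement)
```

Because a whole partition goes away in one operation, there's nothing left for the internal housekeeper to delete in that table, which is why it should be disabled there.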
