Exploring configuration file parameters

Let's conclude this chapter by digging into the configuration files of the Zabbix agent and server and examining each parameter in them. We'll start with the agent configuration file and discuss the ways in which common parameters apply to other daemons. We will skip the proxy configuration file, as the common parameters will be discussed by then, and the proxy-specific parameters were discussed in Chapter 19, Using Proxies to Monitor Remote Locations. We will also skip all the parameters that start with TLS, as those are related to Zabbix daemon traffic encryption, and we discussed that in Chapter 20, Encrypting Daemon Traffic.

We will look at the parameters in the order they appear in the default example configuration files; no other meaning should be derived from the ordering here.

While reading the following descriptions, it is suggested to have the corresponding configuration file open. It will allow you to verify that the parameters are the same in your version of Zabbix. Make sure to read the comments next to each parameter—they might show that some parameters have changed since the time of writing this. In general, when in doubt, read the comments in the configuration files. The Zabbix team tries really hard to make them both short and maximally relevant and helpful.

Zabbix agent daemon and common parameters

Let's start with the agent daemon parameters. For the parameters that are also available for other daemons, we'll discuss their relevance to all the daemons here:

  • PidFile: This is common to all daemons. They write the PID of the main process in this file. The default configuration files use /tmp for simplicity's sake. In production systems, this should be set to the distribution-recommended location.
  • LogType: This is common to all daemons and can be one of file, syslog, or console. The default is file, and in that case, the LogFile parameter determines where the logs are written. The syslog value directs the daemon to log to syslog, and the console value tells it to log the messages to stdout.
  • LogFile: This is common to all daemons. Log data is written to this file when LogType is set to file. The default configuration files use /tmp for simplicity's sake. In production systems, this should be set to the distribution-recommended location.
  • LogFileSize: This is common to all daemons. When logging to a file, if the file size exceeds this number of megabytes, the file is moved to file.0 (for example, zabbix_agentd.log.0) and logging continues in a new file. Only one such rotated copy is kept (that is, there is never a zabbix_agentd.log.1).
  • DebugLevel: This is common to all daemons and specifies how much logging information to provide, starting with 0 (nearly nothing) and ending with 5 (a lot). It is probably best to run at DebugLevel 3 normally, and use something higher for debugging. For example, starting with DebugLevel 4, all server and proxy database queries are logged. At DebugLevel 5, two extra things are currently logged:
    • Received pages for web monitoring
    • Received raw data for VMware monitoring

      Tip

      We will look at changing the log level for a running daemon in Appendix A, Troubleshooting.

  • SourceIP: This is common to all daemons. If the system has multiple interfaces, outgoing connections will use the specified address. Note that not all connections will obey this parameter—for example, the backend database connections on the server or proxy won't.
  • EnableRemoteCommands: This determines whether the system.run item should allow running commands. It is disabled by default.
  • LogRemoteCommands: If EnableRemoteCommands is enabled, this parameter allows us to log all the received commands. Unless system.run is used to retrieve data, it's probably a good idea to enable logging of the remote commands.
  • Server: This is also available for the Zabbix proxy, but not for the Zabbix server. It's a comma-delimited list of IP addresses or host names the agent should accept connections from. It's only relevant for passive items, zabbix_get, and other incoming connections.
  • ListenPort: This is common to all daemons and specifies the port to listen on.
  • ListenIP: This is common to all daemons and specifies the IP address to listen on—could also be a comma-delimited list of addresses.
  • StartAgents: This is the number of processes to start that are responsible for incoming connection handling. If it's a very resource-starved system, it might be a good idea to reduce this. If this agent is expected to get lots of queries for passive items, increase this number. Note that it has nothing to do with the collector or active check processes; their numbers cannot be directly changed. If set to 0, the agent will stop listening to incoming connections. This could be better security-wise, but could also make debugging much harder.
  • ServerActive: This is the list of servers and ports to connect to for active checks. It follows the syntax of server:port, with multiple entries delimited by commas. If not set, no active checks are processed. We discussed this functionality in Chapter 3, Monitoring with Zabbix Agents and Basic Protocols.
  • Hostname: This is also available for the Zabbix proxy, but not for the Zabbix server. If specified, the exact string will be sent to the Zabbix server as the host name for this system.
  • HostnameItem: If Hostname is not specified but HostnameItem is, the value in this parameter will be interpreted as an item key and the result of the evaluation will be sent to the server as the host name for this system.
  • HostMetadata: This is an exact string to be sent to the server—used in active agent autoregistration.
  • HostMetadataItem: If HostMetadata is not specified but HostMetadataItem is, the value in this parameter will be interpreted as an item key and the result of the evaluation will be sent to the server as the host metadata to be used in active agent autoregistration.
  • RefreshActiveChecks: This specifies how often the agent should connect to the server and ask for active items. It's set to 2 minutes by default. If active checks are not used at all, it means a useless connection every 2 minutes from each agent—it's best not to set ServerActive at all in such a case.
  • BufferSend: Active agents will send values every BufferSend seconds—by default, every 5 seconds. This allows us to reduce the number of network connections if multiple values are collected within a 5-second window.
  • BufferSize: This is a buffer to hold the values for active items. By default, it's set to 100 values. This is an in-memory buffer—do not set it too large if memory usage is a concern. The buffer is actually split in half if there is at least one log-monitoring item—one half is used for "normal" values, the other for log entries. If the buffer is full, new "normal" values will result in the dropping of older "normal" values, but it won't affect log entries. If the log entry half of the buffer is full, log file processing stops, but no entries are dropped there. If there are log items only and no "normal" items, half of the buffer is still reserved for "normal" entries. If there are only "normal" items, the whole buffer is used for them until at least one log item is added.
  • MaxLinesPerSecond: This is the default maximum number of lines of log items that should be sent to the server. We discussed this in Chapter 11, Advanced Item Monitoring.
  • Alias: This is a way to set an alias for an item key. While usable on all platforms, we discussed it in Chapter 14, Monitoring Windows. This parameter can also be used to create two LLD rules with the same key, even if the key itself does not accept parameters. One rule could use the original key, another the key that is aliased.
  • Timeout: This is common to all daemons. It specifies the timeout for running commands, making connections, and so on. Since Zabbix 3.0, it has a default of 3 on agents and 4 on the server and proxy. This could affect userparameters, for example—a script that takes more than a few seconds would time out. It is highly suggested not to increase the timeout on the server side—if we have to handle many values every second, it's not good to have a server process wait on a single script that long. If you have such a script that takes a long time to return the value, consider using zabbix_sender instead, as discussed in Chapter 11, Advanced Item Monitoring.
  • AllowRoot: By default, Zabbix daemons, if started as root, try to drop the privileges to a user specified in the User parameter (refer to the next point). If the User parameter is not specified, the outcome depends on this parameter. If it's set to 0, startup fails. If it's set to 1, the daemon starts as the root user.
  • User: This is common to all daemons. If daemons are started as the root user and AllowRoot is set to 0, try to change to the user specified in this parameter. This is set to zabbix by default.
  • Include: This is common to all daemons. It allows you to include individual or multiple configuration files. We discussed this feature in Chapter 11, Advanced Item Monitoring. Note that files are included sequentially as if literally "included" in the location where the Include directive appeared. Also keep in mind that if specified more than once, most parameters will override all previous occurrences—that is, the last option with the same name wins.
  • UnsafeUserParameters: By default, a subset of characters is disallowed to be passed as parameters to userparameter keys. If enabled, this option will allow anything to be passed and is essentially equivalent to EnableRemoteCommands—the originally prohibited symbols make it simple to gain shell access. See the default configuration file for a full list of symbols this parameter would allow.
  • UserParameter: This allows us to extend agents by adding custom item keys to them. We discussed this in quite a lot of detail and configured some userparameters in Chapter 11, Advanced Item Monitoring. This parameter may be specified multiple times as long as the item key is unique; that is a way to add multiple userparameters.
  • LoadModulePath: This is common to all daemons. It specifies a path to load modules, written in the C language. This is an advanced way to extend Zabbix daemons that's a bit out of scope for this book. Refer to the Zabbix manual for more details.
  • LoadModule: This is common to all daemons. Multiple entries of this parameter may be specified for individual .so files to load inside the LoadModulePath directory.
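
The BufferSize behavior described in the list above can be modeled with a short Python sketch. This is only a toy model built from that description, not agent code: the ActiveBufferModel class and its method names are hypothetical, and it assumes the split is exactly half of BufferSize.

```python
from collections import deque


class ActiveBufferModel:
    """Toy model of the BufferSize behavior described above (not Zabbix code)."""

    def __init__(self, buffer_size=100, has_log_items=False):
        # Assumption: with at least one log item, the buffer is split in half.
        half = buffer_size // 2
        self.normal_capacity = half if has_log_items else buffer_size
        self.log_capacity = half if has_log_items else 0
        self.normal = deque()
        self.log = deque()
        self.dropped = 0

    def add_normal(self, value):
        # A full "normal" part drops its oldest value to make room.
        if len(self.normal) >= self.normal_capacity:
            self.normal.popleft()
            self.dropped += 1
        self.normal.append(value)

    def add_log(self, entry):
        # A full log part never drops entries; log processing pauses instead.
        if len(self.log) >= self.log_capacity:
            return False
        self.log.append(entry)
        return True


model = ActiveBufferModel(buffer_size=100, has_log_items=True)
for i in range(60):
    model.add_normal(i)
accepted = [model.add_log(f"line {i}") for i in range(60)]

print(len(model.normal), model.dropped)  # values kept, oldest values dropped
print(sum(accepted))                     # log entries accepted before pausing
```

With one log item present and the default size of 100, the model keeps 50 "normal" values and 50 log entries; the extra "normal" values push out the oldest ones, while the extra log entries are simply not accepted yet.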

Zabbix server daemon parameters

We will now skip the common parameters we already discussed when looking at the agent daemon configuration file. The remaining ones are as follows:

  • DBHost: This is useful if the backend database is on a different system. Using an IP address is highly recommended here.
  • DBName: This is the database name; we set it in Chapter 1, Getting Started with Zabbix. As the comment explains, it should be set to the database file path when the SQLite backend is used for a proxy.
  • DBSchema: This is the database schema, only useful with PostgreSQL and IBM DB2.
  • DBUser and DBPassword: These are database access credentials. As the comment explains, they're ignored when the SQLite backend is used for a proxy.
  • DBSocket: This is the path to the database socket, if needed. Unless the Zabbix server or proxy is compiled against a different database library than the one available at runtime, you'll likely never need this parameter.
  • DBPort: If connecting to a local or remote database on a nonstandard port, specify it here.
  • StartPollers: Pollers are internal processes that collect data in various ways. By default, five pollers are started, and this is plenty for tiny installations such as our test setup. In larger installations, it is common to have hundreds of pollers. Notice that there are no separate SNMP pollers—the same processes are responsible for passive agent and SNMP device polling. How to know whether you have enough? Using the internal monitoring, find out the average busy rate. If it's above some 70%, just add more pollers. Pollers are responsible for:
    • Connecting to passive agents
    • Connecting to SNMP devices
    • Performing simple checks, such as service/port checks
    • Retrieving internal monitoring data
    • Retrieving VMware data from the VMware cache
    • Running external check scripts
  • StartIPMIPollers: This specifies how many processes should be started that poll IPMI devices. We configured this parameter in Chapter 16, Monitoring IPMI devices.
  • StartPollersUnreachable: If a host is not reachable, it is not polled by normal pollers anymore—special types called unreachable pollers now deal with that host, including IPMI items. This is done to avoid a situation where a few hosts that time out take up most of the poller time. If there aren't enough unreachable pollers, the worst thing that happens is that hosts, declared unreachable, are not noticed as being back up as quickly. By default, only one unreachable poller is started. To know whether that is enough, observe their busy rate, especially when there are systems down in the monitored environment.
  • StartTrappers: By default, there are five trappers. As with pollers, monitor their busy rate and add more as needed. Trappers are responsible for receiving incoming connections from:
    • Active agents
    • Active proxies
    • zabbix_sender
    • The Zabbix frontend, including server availability check, global scripts, and queue data
  • StartPingers: These processes create temporary files and then call fping against those files to perform ICMP ping checks. If there are lots of ICMP ping items, make sure to check the busy rate of these processes and add more as needed.
  • StartDiscoverers: Discoverers perform network discovery. Discovery happens sequentially for each rule. Even if there are lots of available discoverers, only one at a time works on a single discovery rule. Note that discoverers split up the rules they will serve—for example, if there are two discovery rules and two discoverers, one discoverer will always work with a particular rule. We discussed network discovery in Chapter 12, Automating Configuration.
  • StartHTTPPollers: These processes are responsible for processing web scenarios. Like discoverers, HTTP pollers split up the web scenarios they will serve. We discussed web monitoring in Chapter 13, Monitoring Web Pages.
  • StartTimers: Timer processes can be quite resource intensive, especially if lots of triggers use time-based functions such as now(). We discussed time-based trigger functions in Chapter 6, Detecting Problems with Triggers. These processes are responsible for:
    • Placing hosts in and out of maintenance at second 0 of every minute—this is only done by the first timer process if more than one is started
    • Evaluating all triggers that include at least one time-based trigger function at second 0 and second 30 every minute
  • StartEscalators: These processes move escalations forward in steps, as discussed in Chapter 7, Acting upon Monitored Conditions. They also run remote commands, if instructed so by action operations.
  • JavaGateway, JavaGatewayPort, and StartJavaPollers: These parameters point at the Java gateway and its port and tell the server or proxy how many processes should connect to that gateway. Note that they all connect to the same gateway, so the gateway should be able to handle the load if the number of Java pollers is increased. We discussed Java monitoring in Chapter 17, Monitoring Java Applications.
  • StartVMwareCollectors, VMwareFrequency, VMwarePerfFrequency, VMwareCacheSize, and VMwareTimeout: These control the way VMware monitoring works. We discussed these parameters in detail in Chapter 18, Monitoring VMware.
  • SNMPTrapperFile and StartSNMPTrapper: When receiving SNMP traps, we must specify the temporary trap file and whether the SNMP trapper should be started. Note that only one SNMP trapper process may be started. We configured these parameters in Chapter 4, Monitoring SNMP Devices.
  • HousekeepingFrequency: This specifies how often the internal housekeeper process runs—or, to be more specific, how long after the previous run finished the next run should start. It is not suggested to change the default interval of one hour—the housekeeper may be disabled as needed for specific data in Administration | General, as discussed earlier in this chapter. The first run of the housekeeper happens 30 minutes after the server or proxy starts. The housekeeper may be manually invoked using the runtime control option.
  • MaxHousekeeperDelete: For deleted items, this specifies how many values per item should be deleted in a single run, with the default being 5,000. For example, if we had deleted 10 items with 10,000 values each, it would take two housekeeper runs to get rid of all of the values for all items. If an item had a huge number of values, deleting them all in one go could cause database performance issues. Note that this parameter does not affect value cleanup for existing items.
  • SenderFrequency: This specifies how often unsent alerts are sent out. Note that changing this value will affect both the time from the trigger to the first message and retries. With the default of 30 seconds, it may take up to 30 seconds to send out a message after a trigger fires. It also means that there will be 30 seconds between attempts, as Zabbix tries to send a message 3 times before declaring it as failed. If this parameter is reduced to result in the faster sending of the first message, it will also decrease the time between repeated attempts. With the default value of 30 seconds, an email server being down for a bit more than one minute could still result in the message being sent on the third attempt. If this parameter is reduced to 10 seconds, a 30-second email server downtime would be enough to potentially miss a message.
  • CacheSize: This is the size of the main configuration cache that holds hosts, items, triggers, and lots of other information. Usage of this cache depends on the size of the configuration data—which is influenced by the number of hosts, items, and other entities. Be very proactive with this parameter—if cache usage significantly increases or you plan to add monitoring for lots of new hosts, increase the configuration cache. If the configuration cache is full, the Zabbix server stops.
  • CacheUpdateFrequency: This specifies how often the configuration cache is updated. The default of 1 minute is quite fine for most installations, although in large environments, it might be a good idea to increase this parameter, as a configuration cache update itself can increase database load.
  • StartDBSyncers: This specifies how many database or history syncers should be started (both names are used interchangeably in various places in Zabbix). These processes are responsible for calculating triggers that reference items receiving new values, and for storing the resulting events and those history values in the database. They are probably the most database-taxing processes in Zabbix. The default of four database or history syncers should be enough for most environments, although it could be useful to increase it for big installations. Be careful with increasing this number: having too many of these can have a negative effect on performance. You might see their average busy rate decrease while the number of values processed decreases as well.
  • HistoryCacheSize: When values are collected, they are first stored in a history cache. History or database syncers take values from this cache, process triggers, and store the values in the database. The history cache getting full usually indicates performance issues—increasing the cache size is unlikely to help. If this cache is full, no new values are inserted in it, but the Zabbix server keeps running.
  • HistoryIndexSize: This cache holds information about the most recent and oldest value for all items in the history cache. It is used to avoid scanning the history cache, which could get rather large. Usage of this cache depends on the number of items that collect data. As with the main configuration and trend cache, make sure to have enough room in this cache—if it's full, the Zabbix server will shut down.
  • TrendCacheSize: This cache holds trend information for the current hour for each item. This is not the current hour per the clock, but the current hour based on the incoming values. That is, the last value that came in for an item determines the current hour value. For example, if values are sent in using zabbix_sender for the hour 09:00–10:00 yesterday, that is the current hour, and its trend data is in the trend cache. As soon as the first value for hour 10:00–11:00 arrives, the trend cache information for that item is written to the database and 10:00–11:00 becomes the new current hour. Usage of this cache depends on the number of items that collect data. As with the main configuration cache, make sure to have enough room in this cache. If it's full, the Zabbix server will shut down.
  • ValueCacheSize: This parameter controls the size of the cache that holds historical values—but as opposed to the history cache, it holds values that are expected to be useful in the future. The values in here are not meant to be written out to the database, but quite the opposite—values are often read into this cache from the database. The value cache is used when item values are needed for trigger calculation (for example, computing the average value for last 10 minutes), for calculated or aggregate items, for including in notifications, and other purposes. Value cache population can take a while when the server first starts up. If the value cache is full, the Zabbix server will keep running, but its performance will likely degrade. Monitor this cache and increase the size as needed.
  • TrapperTimeout: This parameter controls how long trappers spend on communicating with active agents and proxies as well as zabbix_sender. Being set to the maximum value of 5 minutes by default, this timeout is highly unlikely to be reached.
  • UnreachablePeriod, UnavailableDelay, and UnreachableDelay: These parameters work together to determine how value retrieval failures should be handled. If value retrieval fails with a network error, the host is considered to be unreachable and is checked every UnreachableDelay seconds (by default, 15). This goes on for UnreachablePeriod seconds (45 by default), and if all checks fail (with the default settings we end up with 4 checks), the host is marked unavailable and is checked every UnavailableDelay seconds. Note that since Zabbix 3.0, if an item fails twice in a row but another item of the same type on the same host succeeds, the failing item is marked unsupported instead. It is probably best to leave these values at the defaults, as changing them could lead to fairly confusing results.
  • AlertScriptsPath: Custom scripts to be called from actions must be placed in the directory specified by this parameter. We configured such a script in Chapter 7, Acting upon Monitored Conditions.
  • ExternalScripts: Scripts that are to be used in external check items must be placed in the directory specified by this parameter. We configured such an item in Chapter 11, Advanced Item Monitoring.
  • FpingLocation and Fping6Location: These parameters should point at the fping binaries for IPv4 and IPv6, if different. The fping utility is required for ICMP checks, which we configured in Chapter 3, Monitoring with Zabbix Agents and Basic Protocols.
  • SSHKeyLocation: If using SSH items with keys, the keys must be placed in the directory specified by this parameter. We configured SSH items in Chapter 11, Advanced Item Monitoring.
  • LogSlowQueries: Normally, SQL queries are not logged below DebugLevel 4. This parameter allows us to log, at DebugLevel 3, all queries that take longer than the number of milliseconds specified here. By default, since Zabbix 3.0, any query that takes longer than 3 seconds is logged. They appear in the log file like this:
    13890:20151223:152504.421 slow query: 3.005859 sec, "commit;"
  • TmpDir: This is a temporary directory for any files the Zabbix server or proxy needs to store. Currently, it is only used for files that are passed to fping.
  • SSLCertLocation, SSLKeyLocation, and SSLCALocation: These parameters specify where certificates, keys, and certificate authority files will be looked up when the SSL functionality is used with web monitoring.
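
To make the SenderFrequency trade-off from the list above more concrete, here is a small Python sketch of the retry timing. It is an illustration only: the function names are hypothetical, and it assumes three attempts spaced exactly SenderFrequency seconds apart, with the outage starting just before the first attempt.

```python
def alert_attempt_offsets(sender_frequency=30, attempts=3):
    """Seconds after the first attempt at which each send attempt happens."""
    return [n * sender_frequency for n in range(attempts)]


def outage_covers_all_attempts(outage_seconds, sender_frequency=30, attempts=3):
    """True if an outage starting just before the first attempt outlasts the last one."""
    last_attempt = alert_attempt_offsets(sender_frequency, attempts)[-1]
    return outage_seconds > last_attempt


print(alert_attempt_offsets(30))             # attempt offsets with the default
print(alert_attempt_offsets(10))             # attempt offsets with a 10-second value
print(outage_covers_all_attempts(30, 30))    # 30-second outage, default frequency
print(outage_covers_all_attempts(30, 10))    # 30-second outage, 10-second frequency
```

Under these assumptions, all three attempts fit within 20 seconds when the parameter is set to 10, which is why a 30-second mail server outage can cover every attempt.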

Again, all the parameters starting with TLS are relevant for daemon traffic encryption and won't be discussed here.
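
Going back to UnreachablePeriod and UnreachableDelay, the count of four checks with the default settings can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the function name is hypothetical, and it assumes that the first failed check counts at offset 0 and that a check landing exactly at the end of the period is still included.

```python
def unreachable_check_offsets(unreachable_period=45, unreachable_delay=15):
    """Seconds after the first failure at which the host is checked again."""
    # Assumption: offset 0 is the first failed check, and the final check
    # at exactly unreachable_period seconds is still part of the sequence.
    return list(range(0, unreachable_period + 1, unreachable_delay))


offsets = unreachable_check_offsets()
print(offsets, len(offsets))
```

Changing either value changes the number of checks made before the host is marked unavailable, which is one reason the defaults are usually best left alone.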

The available parameters might be slightly different if you have a more recent version of Zabbix. To list the supported parameters in the configuration file you have, the following command could help:

$ grep "### Option" zabbix_agentd.conf
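
If you would rather get just the parameter names, the same idea can be sketched in Python. The list_options function below is a hypothetical helper, and the sample text is an abbreviated, made-up excerpt in the style of the default configuration file comments; in practice you would read the contents of your own configuration file instead.

```python
import re


def list_options(config_text):
    """Return parameter names found in '### Option' comment lines."""
    pattern = re.compile(r"^### Option:?\s*(\S+)", re.MULTILINE)
    return pattern.findall(config_text)


# Illustrative excerpt only, not a copy of a real configuration file.
sample = """\
### Option: PidFile
#   Name of PID file.
#
# PidFile=/tmp/zabbix_agentd.pid

### Option: LogFile
LogFile=/tmp/zabbix_agentd.log
"""

print(list_options(sample))
```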

Now, if you get confused about some parameter, what's the first place you should check? If you said or thought "comments in the configuration files themselves, of course," great. If not, go take a look at those comments and remember that the Zabbix team really, really tries hard to make those comments useful and wants you to read them. You will save your own time that way.
