Estimating SNMP Data Sample Rates

When you ask five network managers “what is a sensible sampling rate for SNMP data?” you will get six different answers. Several conflicting concerns account for the disagreement. Let’s review some of them.

NNM itself allows SNMP sampling intervals as small as one second. SNMP agents running as low-priority processes on 8-bit hardware may be unable to respond to an SNMP request for multiple objects in such a short interval; pressed too hard, they often time out and become unresponsive. Recall that NNM is configured by default to retry an SNMP request three extra times with a 0.8-second exponential time-out (0.8, 1.6, 3.2, and 6.4 seconds for four time-outs, totalling 12 seconds). The retries only serve to overload a slow SNMP agent further. One-second polling intervals are therefore generally avoided because they are shorter than the total time-out interval, and NNM administrators don’t want to spend the extra time configuring device-specific SNMP timing parameters.
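
To see what a one-second interval is up against, the short sketch below works out the worst-case wait for a single unanswered request. The 0.8-second initial time-out and three retries are the NNM defaults quoted above; the doubling schedule is the exponential back-off listed in the parentheses.

    # Worst-case wait for one unanswered SNMP request under NNM's default
    # timing: a 0.8-second initial time-out plus three retries, each retry
    # doubling the previous time-out.
    def worst_case_wait(initial=0.8, retries=3):
        timeouts = [initial * 2 ** attempt for attempt in range(retries + 1)]
        return timeouts, sum(timeouts)

    timeouts, total = worst_case_wait()
    print(timeouts)          # [0.8, 1.6, 3.2, 6.4]
    print(round(total, 1))   # 12.0 seconds -- far longer than a one-second poll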

Polling a device on a remote network may incur a one-second round-trip latency, especially if there are congested serial links to traverse. The default short SNMP time-outs just add to the network traffic. For an entire subnet it is practical to define one set of SNMP time-outs, because that takes just one wildcard setting in the configuration GUI. Even so, one-second polling is too short for this case, because the latency disturbs the sampling times by up to several seconds.
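
The disturbance is easy to see in a toy simulation. The one-second nominal interval comes from the discussion above; the zero-to-three-second latency range is purely an illustrative stand-in for a congested serial path.

    import random

    # Toy simulation: one-second nominal polls, each response delayed by a
    # random round-trip latency of 0-3 seconds (illustrative values for a
    # congested serial path).
    random.seed(1)
    nominal = [float(t) for t in range(10)]                   # intended sample times
    actual = [t + random.uniform(0.0, 3.0) for t in nominal]  # when data really arrives
    gaps = [round(b - a, 2) for a, b in zip(actual, actual[1:])]
    print(gaps)  # gaps between consecutive samples are nowhere near one second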

High-speed, high-volume SNMP polling can generate considerable network traffic. Given that many NNM systems use a fast Ethernet adapter, it is conceivable that a serial line could be swamped by SNMP traffic to the detriment of mission-critical traffic. A rule of thumb is that network management traffic should not use more than 10% of any link’s capacity. Assuming a 200-byte SNMP packet size, you can multiply by the number of devices and divide by the polling interval to estimate the traffic SNMP polling adds to the network. Note that snmpCollect tries to reduce the number of SNMP requests by testing each device’s SNMP agent for the number of values it can return in one request. This reduces the overhead of sending multiple single-valued requests, which is a good thing, but it also pushes the average packet size well above the 200 bytes assumed here. Traffic overhead on the network is another reason to use larger polling intervals.
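
The rule of thumb turns into a quick back-of-the-envelope check. In the sketch below, the 200-byte packet size and the 10% budget come from the text; the device count, the five-minute interval, and the 56 kbit/s link speed are illustrative assumptions.

    # Estimate the load SNMP polling adds to a link and compare it with the
    # 10%-of-capacity rule of thumb. Packet size and budget are from the
    # text; device count, interval, and link speed are assumptions.
    packet_bytes = 200       # assumed average SNMP packet size
    devices = 100            # assumption: devices polled across this link
    interval_s = 300         # assumption: five-minute polling interval
    link_bps = 56_000        # assumption: 56 kbit/s serial link

    polling_bps = packet_bytes * 8 * devices / interval_s
    budget_bps = 0.10 * link_bps

    print(f"SNMP polling load: {polling_bps:.0f} bit/s")  # about 533 bit/s
    print(f"10% budget:        {budget_bps:.0f} bit/s")   # 5600 bit/s
    print("within budget" if polling_bps <= budget_bps else "over budget")

Shrink the interval to one second and the same hundred devices would need roughly 160 kbit/s, far more than a slow serial link can spare.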

Short polling intervals on many SNMP objects cause the snmpCollect daemon to consume more CPU time, which can hurt a small NNM system. Ideally, you want to keep snmpCollect below 10% CPU utilization. On the other hand, if too many concurrent data collections are attempted, snmpCollect can fall behind. Help snmpCollect keep up by configuring the -n numberconcurrentsnmp option in snmpCollect.lrf, and monitor the $OV_LOG/snmpCol.trace file for problems, because this option can be set too high. Keep the value well below maxfiles, the operating system’s limit on the number of open files. If maxfiles is 64, then -n 35 is found empirically to work well (but check $OV_LOG/snmpCol.trace to verify snmpCollect’s health). Once again, you have a reason to keep polling intervals on the high side.
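
There is no published formula for picking -n, but the 35-of-64 example suggests staying near half of maxfiles. The sketch below simply encodes that ratio as a starting point; the 55% figure is an inference from the example, not an NNM rule, so the trace file remains the final arbiter.

    # Starting point for snmpCollect's -n (concurrent SNMP operations),
    # kept well below the OS maxfiles limit. The 55% ratio is only an
    # inference from the empirical "-n 35 with maxfiles 64" example, not a
    # documented NNM rule; verify with $OV_LOG/snmpCol.trace afterwards.
    def suggest_concurrency(maxfiles, ratio=0.55):
        return max(1, int(maxfiles * ratio))

    print(suggest_concurrency(64))    # 35
    print(suggest_concurrency(128))   # 70 -- still re-check the trace file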

We’ve seen that excessive polling can negatively impact network devices, the network itself, and the NNM system. A one-second polling interval isn’t a good idea. But if you poll hourly, all the interesting variation in the network’s statistics is averaged into one fairly unrepresentative and virtually useless statistic. Review Figure 9-5 to appreciate how sampling rates affect the quality of the resulting data. When in doubt, choose a five-minute sampling interval. Experience shows this captures enough of the changing statistics of network metrics, yet doesn’t stress the network, the NNM system, or sensitive network devices.

Figure 9-5. How sample rates affect data quality.

Short sampling intervals such as one second, ten seconds, and one minute capture all of the interesting variation in the network metric, but such high sample rates are too costly for the reasons discussed above. The 10-minute samples, on the other hand, have removed all of the interesting information in the data. This is why a common sampling interval is five minutes.

