Strategies for Setting Threshold Values

Should you define threshold values for performance polling? What is the staff going to do about a threshold event? What is the local process? The usual answer is that nothing can be done or should be done. After all, if a user is downloading a large file or if a backup is in progress, is it appropriate to hunt down the offending system and shut down the switch port it’s connected to? This kind of draconian network policing is certain to cause vehement user reaction.

A threshold event should be classified as a mere warning and not acted upon directly by the staff. There is also an opportunity for an ECS-trained network manager to create a custom circuit that intelligently correlates multiple threshold events into a single, more useful event. For example, if a significant portion of the network is experiencing threshold events, the custom event correlator should detect this and generate a major event.
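The correlation idea can be sketched in a few lines of Python. This is only an illustration of the logic, not an ECS circuit: the `ThresholdEvent` record, the `correlate` function, and the 25% cutoff for "a significant portion" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdEvent:
    """Hypothetical record for one threshold event (not an NNM data structure)."""
    node: str       # device that reported the event
    interface: str  # interface on that device


def correlate(events, monitored_count, fraction=0.25):
    """Summarize a batch of threshold events as one severity.

    Returns "major" when the share of distinct interfaces reporting
    threshold events reaches `fraction` (an assumed cutoff), "warning"
    when fewer interfaces are affected, and None when there are no events.
    """
    if monitored_count <= 0:
        raise ValueError("monitored_count must be positive")
    # Count each interface once, however many events it produced.
    affected = {(e.node, e.interface) for e in events}
    if not affected:
        return None
    share = len(affected) / monitored_count
    return "major" if share >= fraction else "warning"
```

With two affected interfaces out of 100 monitored, `correlate` returns "warning"; the same two interfaces out of eight monitored reach the assumed 25% cutoff and return "major".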

Assuming you want to generate threshold events, how should the threshold values be set? Three approaches can be considered.

The easiest approach is to use published rules of thumb, such as those in Table 9-2 and in textbooks on capacity planning [2].

[2] John Blommers, Practical Planning for Network Growth, Prentice Hall, 1996, ISBN 0-13-206111-2.

A more difficult approach is to adjust the threshold levels upward until the event rate is acceptably low. This approach is clearly labor-intensive, since every interface has to be monitored and individually tweaked. Another name for this technique is baselining: you monitor the performance metrics of each device for a few weeks and then change the threshold levels accordingly. Note that these threshold levels can be entered manually in the $OV_CONF/snmpCol.conf file to avoid a lot of tedious work with the SNMP data collector configuration GUI. The format of this file changed at NNM 6.0, so take care to migrate the file to the new format when you upgrade your NNM system.
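One way to take some of the labor out of baselining is to derive a candidate threshold from the collected samples themselves. The sketch below is an assumption about how that might be done, not an NNM feature: the function name `baseline_threshold`, the choice of the 95th percentile, and the 10% headroom factor are all illustrative.

```python
import math


def baseline_threshold(samples, percentile=95.0, headroom=1.1):
    """Suggest a threshold from a few weeks of samples for one interface.

    Takes the nearest-rank percentile of the samples and multiplies it by
    a headroom factor, so that normal peaks stay below the threshold.
    Both defaults are assumed values to be tuned per site.
    """
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("at least one sample is required")
    # Nearest-rank percentile: smallest value with >= percentile% of samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    return ordered[rank - 1] * headroom
```

The result is only a starting point. The value still has to be entered for each interface and reviewed against the event rate it produces.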

A final approach to setting threshold values is based on analytic considerations. For example, suppose you model a serial circuit as a simple M/M/1 queue. Recall that at 90% utilization the average time a packet spends in the queue, waiting plus transmission, is 1/(1 - 0.9) = 10 times the transmission time on an idle circuit. That's a good reason to generate an alarm. For error rates and packet loss, consider the graph in Figure 9-7. It shows how the TCP throughput of an application performing a continuous TCP data transfer slows as packet loss increases. Even a one percent packet loss effectively reduces throughput to negligible levels.
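The M/M/1 argument is easy to check numerically. In an M/M/1 queue with utilization rho, the average time in the system is 1/(1 - rho) times the service time. The sketch below applies that standard result; the function names are made up for this example.

```python
def mm1_delay_factor(utilization):
    """Average time in an M/M/1 system as a multiple of the service time.

    Uses the standard M/M/1 result T = S / (1 - rho), where S is the
    service (transmission) time and rho is the utilization (0 <= rho < 1).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return 1.0 / (1.0 - utilization)


def utilization_for_delay_factor(factor):
    """Utilization at which the M/M/1 delay factor reaches `factor` (>= 1)."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    return 1.0 - 1.0 / factor
```

Calling `mm1_delay_factor(0.9)` gives a factor of about 10, matching the figure quoted in the text. Working in the other direction, `utilization_for_delay_factor(5)` shows that a site willing to tolerate only a fivefold delay would set its utilization threshold at about 80%.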

Figure 9-7. TCP throughput vs. packet loss.

An analytic approach to setting error thresholds involves examining empirical data as depicted here. This graph depicts a single streaming TCP data connection between workstations on the same LAN segment which is subject to packet loss. Clearly, the slightest loss percentage effectively negates the streaming advantages of TCP. An error threshold of one percent is justified by this data.


To avoid a random single error burst from generating an alarm, you can take advantage of NNM’s clever threshold features. By specifying that a given metric must exceed the threshold value for four consecutive samples, you ensure that only a sustained error condition will generate an alarm. At five-minute samples this means a 20-minute duration is required to trigger an alarm. To ensure the restoration of service is detected promptly, you can set the duration of the rearm interval to only two samples, or 10 minutes. The principles are explained in Figure 9-8.

Figure 9-8. NNM threshold parameters.

The example in the text requires four consecutive samples to exceed the arm threshold before a threshold event is generated. The sample labeled B will generate an event, but sample A will not, because it is a single sample above the arm threshold rather than four in a row. Two consecutive samples must be lower than the rearm threshold, so sample C is the only one that qualifies.
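The arm and rearm behavior described above can be modeled as a small state machine. The following Python sketch illustrates the principle only and is not NNM's implementation. The function name `threshold_events` and its parameters are assumptions, with defaults chosen to match the four-sample arm and two-sample rearm settings used in the text.

```python
def threshold_events(samples, arm, rearm, arm_count=4, rearm_count=2):
    """Return a list of (sample_index, kind) tuples, kind being
    "threshold" or "rearm".

    A "threshold" event fires once `arm_count` consecutive samples exceed
    `arm`. No further threshold event can fire until `rearm_count`
    consecutive samples fall below `rearm`, which produces a "rearm" event.
    """
    events = []
    armed = True   # True while a threshold event is allowed to fire
    above = 0      # consecutive samples above the arm threshold
    below = 0      # consecutive samples below the rearm threshold
    for index, value in enumerate(samples):
        if armed:
            above = above + 1 if value > arm else 0
            if above >= arm_count:
                events.append((index, "threshold"))
                armed, below = False, 0
        else:
            below = below + 1 if value < rearm else 0
            if below >= rearm_count:
                events.append((index, "rearm"))
                armed, above = True, 0
    return events
```

With five-minute samples, an isolated spike resets the counter and produces nothing, while a sustained run of four high samples produces a single threshold event 20 minutes after the condition began.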

