Reducing Event Storms with ECS

NNM 6.0 introduced event correlation services (ECS) to interpret the event stream intelligently, reduce its volume, and increase its quality. This addresses the problem of servicing events on a large network, where events can number in the hundreds per minute and a single outage can generate thousands. Because it’s impractical to react to them all, network managers would resort to one of the following:

  • ignore all messages except when a critical device goes down

  • reconfigure most messages as “log only”

  • handle critical alarms only

  • use the NNM event browser filter to limit events

  • buy a third-party tool for event filtering

The NNM event browser filter lets you configure the following criteria to limit events displayed:

  • severity level

  • source IP addresses

  • wildcard to specify a range of IP addresses or node names

  • acknowledged or unacknowledged alarms

  • alarm time span

  • message string word search

  • event type
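
As a rough illustration of how such criteria combine, the sketch below models a filter in which every configured criterion must match for an event to be displayed. It is not NNM code: the Event and BrowserFilter classes, their field names, and the severity strings are all hypothetical, chosen only to mirror the list above.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatchcase
from typing import List, Optional, Set


@dataclass
class Event:
    # Hypothetical event record; the fields are illustrative, not NNM's schema.
    severity: str
    source: str          # IP address or node name
    acknowledged: bool
    time: datetime
    message: str
    event_type: str


@dataclass
class BrowserFilter:
    # Each criterion is optional; None means "do not filter on this".
    severities: Optional[Set[str]] = None
    source_pattern: Optional[str] = None   # wildcard, e.g. "10.1.*"
    acknowledged: Optional[bool] = None
    start: Optional[datetime] = None       # alarm time span
    end: Optional[datetime] = None
    text: Optional[str] = None             # message string word search
    event_types: Optional[Set[str]] = None

    def matches(self, ev: Event) -> bool:
        # An event is shown only if it passes every configured criterion.
        if self.severities is not None and ev.severity not in self.severities:
            return False
        if self.source_pattern is not None and not fnmatchcase(ev.source, self.source_pattern):
            return False
        if self.acknowledged is not None and ev.acknowledged != self.acknowledged:
            return False
        if self.start is not None and ev.time < self.start:
            return False
        if self.end is not None and ev.time > self.end:
            return False
        if self.text is not None and self.text.lower() not in ev.message.lower():
            return False
        if self.event_types is not None and ev.event_type not in self.event_types:
            return False
        return True


def apply_filter(events: List[Event], flt: BrowserFilter) -> List[Event]:
    """Return only the events the filter would display."""
    return [ev for ev in events if flt.matches(ev)]
```

A filter built this way only hides events that fail a test; it has no notion of which events are causes and which are symptoms.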

Simple event filtering is a crude way to reduce the event stream to a trickle. Nuggets of valuable information are buried in the event stream, and the right way to find them is to take advantage of ECS. You want the cause of the problem identified; you don’t want to see all the symptomatic events that result. Study Figure 8-5 for an overview of the event stream within NNM. Note that ECS is active by default. Several correlations are bundled with NNM, as described in Table 8-2. The exact correlations provided vary with each version of NNM. Additional correlations may be obtained from third parties, from HP consultants, or by writing your own using the ECS Designer for NNM products.

Figure 8-5. Event flows in NNM [a.].

This very high-level diagram shows how data flows between the databases and daemons in NNM.


[a.] Copyright 1999 Hewlett-Packard Company. Reproduced with permission. Hewlett-Packard Company makes no warranty as to the accuracy or completeness of the foregoing material and hereby disclaims any responsibility therefor.

Table 8-2. Bundled Edition Correlations
Correlation Name | Description
Connector Down | Uses the network topology to isolate the root cause of a cascade of device failures to a specific connector. An event storm is avoided and one event is logged to the event browser. The root cause device is shown as a red icon, while affected devices are shown in blue (unknown status).
Scheduled Maintenance | When devices scheduled for maintenance go down, NNM will normally generate alarms. This correlation defines which devices or what range of IP addresses to ignore, starting at a given time and for a specific duration.
Repeated Event | Suppresses related events about a particular device. For example, without this correlation, when a device MAC address changes, NNM’s check of neighboring device ARP caches will cause NNM to report a mismatch for each one.
Pair Wise | Multiple traps are correlated over a time interval, and the parent trap is identified and reported in the alarm browser.
MgXServerDown | New with NNM 6.1, this circuit allows correlation of events between HP OpenView Network Node Manager and HP OpenView ManageX systems.
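
To make two of these ideas concrete, here is a minimal sketch of Scheduled Maintenance and Repeated Event style suppression. It does not reflect how the ECS circuits are actually implemented; the Event and MaintenanceWindow classes, the correlate function, and the choice to treat events with the same source and type as "repeated" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from ipaddress import ip_address, ip_network
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical event record, not NNM's event format.
    source: str          # IP address of the reporting device
    event_type: str
    time: datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    # Hypothetical window: ignore an IP range from `start` for `duration`.
    network: str         # CIDR range, e.g. "10.2.0.0/16"
    start: datetime
    duration: timedelta

    def covers(self, ev: Event) -> bool:
        in_range = ip_address(ev.source) in ip_network(self.network)
        return in_range and self.start <= ev.time < self.start + self.duration


def correlate(events: List[Event],
              windows: List[MaintenanceWindow],
              repeat_interval: timedelta) -> List[Event]:
    """Drop events from devices under maintenance, then drop repeats.

    Assumption: a "repeat" is an event with the same source and type that
    arrives less than `repeat_interval` after the previous such event.
    """
    last_seen: Dict[Tuple[str, str], datetime] = {}
    kept: List[Event] = []
    for ev in sorted(events, key=lambda e: e.time):
        if any(w.covers(ev) for w in windows):
            continue  # scheduled maintenance: ignore entirely
        key = (ev.source, ev.event_type)
        previous = last_seen.get(key)
        last_seen[key] = ev.time  # suppressed repeats also extend the quiet period
        if previous is not None and ev.time - previous < repeat_interval:
            continue  # repeated event: suppress
        kept.append(ev)
    return kept
```

Unlike a display filter, this kind of logic keeps state about earlier events, which is what lets one representative event stand in for many.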
