Chapter 9. Scope-based event grouping


This chapter describes a scenario for scope-based event grouping and includes the following topics:

•9.1, “Introduction”

•9.2, “Scenario description”

•9.3, “Scenario topology”

•9.4, “Scenario steps”

•9.5, “Summary”

9.1 Introduction

Scope-based event grouping is based on the premise that events that occur at the same place at the same time are likely to be related to the same problem. In this context, scope is another way of referring to the same place. In practice, this method has proven effective for grouping events.

Grouping events includes the following goals:

•Bring order to the event list by logically grouping events by incident

•Provide a mechanism to create only one incident ticket per incident

•Keep all the related event information together to aid problem diagnosis

•Reduce mean time to repair (MTTR) and operations costs

Although a scope that is based on geographic location is the natural choice for scope-based event grouping, the ScopeID field is a string and can be set to any value that makes sense in the context of the grouping scenario. Another way to think of the scope is as the field or reach of influence.

The aim is to make your scope wide enough to include all the events that might be related to a problem, but not so wide that the automation incorrectly groups too many events together. It is better to err on the conservative side by grouping too few events together rather than too many.

Scope-based event grouping seeks to group events that happen at the same place at the same time. The same time is defined in terms of a minimum period that must pass without any new events occurring before the incident is deemed to be finished. This period of “quiet time” is referred to as the QuietPeriod.

The term “new events” applies only to the occurrence of new, unique events. Recurrences of the same events (known as deduplication in Netcool terms) do not reset the QuietPeriod.
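To make the QuietPeriod rule concrete, the following Python sketch models it outside the ObjectServer. It is an illustration under stated assumptions, not the automation's actual implementation: the Event and Group types, the identifier used as a deduplication key, and the choice to close a group only when the gap strictly exceeds the QuietPeriod are all hypothetical. What it shows is that a recurrence of an existing event does not extend the group, whereas a new, unique event does.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    identifier: str  # hypothetical deduplication key; recurrences share it
    scope_id: str    # value that the Probe rules write into ScopeID
    timestamp: int   # seconds


@dataclass
class Group:
    scope_id: str
    last_new_event: int  # time of the most recent *new* (unique) event
    identifiers: List[str] = field(default_factory=list)


def group_events(events: List[Event], quiet_period: int = 600) -> List[Group]:
    """Hypothetical model: group events per ScopeID until the scope stays quiet."""
    open_groups: Dict[str, Group] = {}
    closed: List[Group] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        group = open_groups.get(ev.scope_id)
        # Assumption: the incident ends once the gap since the last new event
        # exceeds the QuietPeriod; the next event then starts a fresh group.
        if group is not None and ev.timestamp - group.last_new_event > quiet_period:
            closed.append(group)
            group = None
        if group is None:
            group = Group(ev.scope_id, ev.timestamp)
            open_groups[ev.scope_id] = group
        if ev.identifier not in group.identifiers:
            # Only a new, unique event resets the quiet-period clock.
            group.identifiers.append(ev.identifier)
            group.last_new_event = ev.timestamp
        # A recurrence (deduplication) is absorbed without resetting the clock.
    return closed + list(open_groups.values())


sample = [
    Event("linkdown-1", "AUCKLAND", 0),
    Event("linkdown-1", "AUCKLAND", 500),   # recurrence
    Event("power-2", "AUCKLAND", 550),      # new event, resets the clock
    Event("linkdown-1", "AUCKLAND", 1100),  # recurrence, does not reset
    Event("fan-3", "AUCKLAND", 1200),       # 650 s after the last new event
]
print([g.identifiers for g in group_events(sample)])
# [['linkdown-1', 'power-2'], ['fan-3']]
```

In the sample, the recurrence at 1100 seconds does not reset the clock, so fan-3 arrives 650 seconds after the last new event and opens a second group. With a QuietPeriod of 1000 seconds, all three unique events would share one group.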

One example of a real-world scenario is a telecommunications company that defined the scope to be the cell site identification code, which is encoded into every event. Implementing this scope was simple and involved adding one line of code to the Probe rules to set the ScopeID to the cell site identification code. Any events that came from that site at the same time were automatically grouped. This scope was convenient because each site produces events from several different equipment vendors in addition to building management events.

The company reduced the number of events that were presented to operators by 77% and had an average of 12 events per grouping. In this scenario, a QuietPeriod of 10 minutes (600 seconds) was found to be optimal.

Another example of a real-world scenario is a large bank that defined the scope to be the line-of-business identifier. Within the bank, there are many lines of business that represent the “customers” of the bank’s ITSM solution. Each line of business owns several servers that run business-critical applications. IBM Tivoli Monitoring (ITM) is used extensively on these servers and monitors everything from the applications to the hardware on which they run.

This configuration generates a large volume of ITM events in Netcool, which makes it challenging for operations to manage. The bank discovered that, by setting the ScopeID to the line-of-business ID, it reduced the number of events that were presented to operators by more than 99% and had an average of 210 events per grouping. In this scenario, a QuietPeriod of 10 minutes (600 seconds) was also found to be optimal.

The only settings that are required to start scope-based event grouping are an appropriate ScopeID and, if the default value is not appropriate, a QuietPeriod. An optional extension to the grouping is to activate automatic probable cause and effect determination via the weightings.

Note: The weighting function is optional and the grouping is not dependent on setting up this function.

9.2 Scenario description

Company A is a telecommunications company with an extensive wireless network. It uses equipment from several different vendors across its widely distributed cell sites. This equipment generates a high volume of network traffic, and Operations often must deal with many events. This challenge is exacerbated whenever there is a major outage.

Helen runs the tooling department for the Netcool-based ITSM solution at Company A. She identified the following challenges that Operations faces:

•There are too many events for operators to manage.

•There are often many events that relate to each incident.

•Events for multiple incidents are intermixed, which makes them difficult to manage.

•Multiple tickets are often opened from the many events that relate to one incident.

•The information from these events is often fragmented across tickets.

•It is costly to the business to close all of the duplicate tickets.

The net result of these challenges is that the MTTR is higher than it should be. The metaphorical “house” is messy and needs organizing. It is here that Helen believes that event grouping can help.

9.2.1 Business value

The scope-based event grouping feature helps Company A in the following ways:

•Automatically group events by incident, based on location and time

•Combine the selected event details into a single place for ticketing

•Allow the cutting of a single ticket per incident, which hopefully eliminates duplicates

By implementing these features, Helen predicts that she can considerably reduce the number of trouble tickets that are opened, and keep the related event information that pertains to an incident in one place. She believes that these changes can help operators pinpoint and resolve problems faster.

9.3 Scenario topology

For this scenario, we used the environment that is described in 1.4, “Our environment for the scenarios” on page 18 of this book.

9.4 Scenario steps

The following sections describe the step-by-step implementation of this scenario.

9.4.1 Analyzing the current event set

The operators review traditional filtered lists of events with field-based sorting applied. Although they often work on higher severity events first, they are mindful of ensuring that older events are dealt with in a timely fashion.

The operations team frequently struggles to stay on top of the high volumes of events that come through and invariably ends up cutting multiple duplicate trouble tickets, particularly whenever there is an event storm. Figure 9-1 shows the current environment.

Figure 9-1 Initial event listing

Scope-based event grouping works on the basis that events that occur at the same place at the same time are grouped. Helen identifies that the event stream includes encoded location information that might be used to define the scope.

Tip: In many cases, the location information is not present in the event stream. If it is available in a database (for example, an asset database), the location or scope can be enriched into the event stream by using Netcool/Impact.

9.4.2 Configuring the system

Helen completed the following steps to configure the development system and test how her plan to deploy scope-based event grouping might work:

1. Helen imports the scope-based event grouping automation into the Netcool/OMNIbus ObjectServer. She opens a command-line session to the ObjectServer and imports the automation functions, as shown in the following example:

$OMNIHOME/bin/nco_sql -server AGG_P -user root -password abc123 < $OMNIHOME/extensions/eventgrouping/objectserver/scope_event_grouping_aggregation.sql

Helen repeats the process on the backup ObjectServer and makes some additions to the bidirectional failover Aggregation Gateway so that the automation control information is replicated between the primary and backup Netcool/OMNIbus systems. For more information about this process, see this website:

http://ibm.biz/seg_install

2. Helen modifies the Probe rules to set up the scope. Because the event stream already contains the information that Helen needs for scope-based grouping, she makes a small addition to the Probe rules file to set up the ScopeID and, where possible, subgrouping.

The primary location is stored in a token called location, and some events also contain secondary location information in a token called suburb. Helen adds the following lines to the Probe rules file to assign the primary location to the ScopeID field and the suburb location to the SiteName field:

@ScopeID = $location

@SiteName = $suburb

Tip: Subgrouping is done automatically if the $suburb token is populated with a non-null value. If it is null, subgrouping is not done. However, subgrouping can be forced in all cases by setting the SEGNoSiteNameParentIfSiteNameBlank property to 0. With this setting, a subgrouping is created for events that do not have a subgrouping value defined.

3. Helen edits the Event Viewer view so that it includes the IBM Related Events relationship. This change allows the relationships to be rendered in the Event Viewer.

If Netcool Operations Insight is not installed, the IBM Related Events relationship might not exist. In this case, you can create the relationship, as shown in Figure 9-2 on page 158.

Figure 9-2 Create New Relationship window
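The subgrouping rule in the tip in step 2 can be summarized as a small decision function. The following Python sketch is a hypothetical model only: the function name and the placeholder key for blank SiteName values are invented for illustration, and the no_parent_if_blank argument stands in for the SEGNoSiteNameParentIfSiteNameBlank property.

```python
from typing import Optional

# Hypothetical placeholder key used in this sketch for events without a SiteName.
BLANK_SITE_SUBGROUP = "<no SiteName>"


def subgroup_key(site_name: Optional[str], no_parent_if_blank: int) -> Optional[str]:
    """Return the SiteName subgroup an event joins, or None if it is not subgrouped.

    no_parent_if_blank models the SEGNoSiteNameParentIfSiteNameBlank property.
    """
    if site_name:                  # populated (non-null, non-empty) SiteName
        return site_name
    if no_parent_if_blank == 0:    # property set to 0: force a subgroup for blanks
        return BLANK_SITE_SUBGROUP
    return None                    # otherwise the event is not subgrouped
```

For example, an event whose $suburb token is empty returns None while the property is nonzero, and joins the placeholder subgroup once the property is set to 0.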

9.4.3 Viewing the grouping

Helen adds the ScopeID and SiteName fields to her Event List view and renames the SiteName field to something more appropriate, such as Suburb. She then replays a sample of the organization's event data through the new Probe rules file (the results are shown in Figure 9-3).

Figure 9-3 List of groupings

The sample contains 159 unique events. By applying scope-based event grouping, this large number of events collapses to six rows (groups), which is comparable to the number of incidents to which the event set relates. For this sample set, the number of rows that are presented to operations is reduced by 96%.
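The 96% figure follows directly from the counts in the sample. A minimal Python check, assuming that the six group rows are the only rows operators see in place of the 159 events:

```python
def row_reduction(events: int, rows: int) -> float:
    """Percentage reduction in rows presented to operators after grouping."""
    return 100.0 * (events - rows) / events


print(f"{row_reduction(159, 6):.1f}%")  # prints 96.2%
```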

9.4.4 Modifying the properties

Helen performs the following steps to modify the properties:

1. She opens the AUCKLAND grouping and inspects the appearance of the groupings, as shown in Figure 9-4.

Figure 9-4 AUCKLAND grouping

Although Helen is pleased with the grouping that results, she wants to modify the appearance of the event groupings to provide more useful information to the operators.

Scope-based event grouping provides 43 properties that allow its appearance and behavior to be modified. The properties are stored in the master.properties table in the Netcool/OMNIbus ObjectServer and can be updated via nco_sql or the Netcool Administrator tool. For more information about the properties, see the following IBM Knowledge Center site:

http://ibm.biz/seg_docs

2. Helen wants to modify some of the properties to change the appearance of some elements of the synthetic parent events. She also wants to activate the journaling feature in scope-based event grouping so that a single ticket can be cut from each subgrouping.

Helen sets the following properties to the values that are shown:

– SEGSiteNamePrefix = SUBURB (CharValue)

– SEGScopeIDSitesAffectedLabel = suburb (CharValue)

– SEGJournalToScopeIDParent = 1 (IntValue)

– SEGJournalToSiteNameParent = 1 (IntValue)

These changes update the Event Viewer, as shown in Figure 9-5.

Figure 9-5 Updated Event Viewer

3. When Helen double-clicks a SiteName event (also known as a SUBURB event), she can see that the journals are now there and ready for ticketing, as shown in Figure 9-6.

Figure 9-6 Journals ready for ticketing: SiteName event

Tip: The default setting for the maximum number of events to send to the journal of a SiteName event is 10. This setting can be modified via the SEGMaxSiteNameJournals property.

4. When Helen double-clicks a ScopeID event, she can see that the journals are now there and ready for ticketing, as shown in Figure 9-7.

Figure 9-7 Journals ready for ticketing: ScopeID event

Tip: The default setting for the maximum number of events to send to the journal of a ScopeID event is 50. This setting can be modified via the SEGMaxScopeIDJournals property.

5. Helen verifies that when the synthetic parent event is assigned to a user or a group, or has a ticket assigned to it, the OwnerUID, OwnerGID, and TTNumber values each propagate to the child events, as shown in Figure 9-8.

Figure 9-8 OwnerUID, OwnerGID, and TTNumber propagate
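The propagation that Helen verifies in step 5 can be pictured as a copy down the grouping tree (ScopeID parent, then SiteName parent, then the real events). The following Python sketch is a hypothetical model: the Alert class and function name are invented, and recursing through every level is an assumption about how the automation reaches events that sit below a subgroup parent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The three columns that this chapter says propagate from parent to children.
PROPAGATED_FIELDS = ("OwnerUID", "OwnerGID", "TTNumber")


@dataclass
class Alert:
    identifier: str
    columns: Dict[str, object] = field(default_factory=dict)
    children: List["Alert"] = field(default_factory=list)


def propagate_to_children(parent: Alert) -> None:
    """Hypothetical model: copy ownership and ticket columns to all descendants."""
    for child in parent.children:
        for name in PROPAGATED_FIELDS:
            if name in parent.columns:
                child.columns[name] = parent.columns[name]
        # Assumption: a subgroup parent passes the values on to its own children.
        propagate_to_children(child)
```

For example, assigning a TTNumber to a ScopeID parent and calling propagate_to_children on it sets the same TTNumber on its SiteName parents and on every event below them.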

9.4.5 Using ScopeAlias

Although scope-based event grouping is data-driven, there are occasions where it makes sense to merge two or more different scopes and reflect them as a single entity. This merge can be done by defining a ScopeAlias.

Helen used the following process:

1. After reviewing the resulting groupings, Helen identifies that three of the smaller towns (KATIKATI, WAIHI, and TAURANGA) overlap in terms of the underlying network infrastructure. Helen decides to combine the events from these three scopes because together they make up the definition of same place, as shown in Figure 9-9.

Figure 9-9 Combining the events

2. Helen creates a ScopeAlias called BAY OF PLENTY for the three smaller towns, which is the name of the larger region in which they are located.

3. Helen adds one entry for each of the three towns via the Netcool Administrator. She also prepares an SQL file to check into the company version control system for future use, as shown in the following example:

insert into master.correlation_scopealias_members (ScopeAlias, ScopeID) values ('BAY OF PLENTY', 'TAURANGA');

insert into master.correlation_scopealias_members (ScopeAlias, ScopeID) values ('BAY OF PLENTY', 'WAIHI');

insert into master.correlation_scopealias_members (ScopeAlias, ScopeID) values ('BAY OF PLENTY', 'KATIKATI');

4. Upon replaying the test data through the system, Helen sees the results that are shown in Figure 9-10. The KATIKATI, WAIHI, and TAURANGA events each retain their original value in the ScopeID field; however, they are now grouped under the BAY OF PLENTY alias. The BAY OF PLENTY label is an alias for all three scopes, hence the term ScopeAlias.

Figure 9-10 Events that are grouped under BAY OF PLENTY alias
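The effect of the ScopeAlias table on grouping can be sketched as a lookup that replaces a ScopeID with its alias only for the purpose of choosing a group. In this hypothetical Python model, the dictionary mirrors the three rows inserted into master.correlation_scopealias_members in step 3, and each member keeps its own ScopeID value, as in Figure 9-10. The function names are invented for illustration, and the assumption that each ScopeID belongs to at most one alias is not stated in this chapter.

```python
from collections import defaultdict
from typing import Dict, List

# Mirrors the three rows inserted into master.correlation_scopealias_members.
SCOPE_ALIAS_MEMBERS: Dict[str, str] = {
    "TAURANGA": "BAY OF PLENTY",
    "WAIHI": "BAY OF PLENTY",
    "KATIKATI": "BAY OF PLENTY",
}


def grouping_key(scope_id: str) -> str:
    """Return the ScopeAlias if the ScopeID has one, else the ScopeID itself."""
    return SCOPE_ALIAS_MEMBERS.get(scope_id, scope_id)


def group_scopes(scope_ids: List[str]) -> Dict[str, List[str]]:
    """Group ScopeID values by grouping key; each member keeps its own ScopeID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for scope_id in scope_ids:
        groups[grouping_key(scope_id)].append(scope_id)
    return dict(groups)


print(group_scopes(["WAIHI", "AUCKLAND", "KATIKATI", "TAURANGA"]))
# {'BAY OF PLENTY': ['WAIHI', 'KATIKATI', 'TAURANGA'], 'AUCKLAND': ['AUCKLAND']}
```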

Using CauseWeight and ImpactWeight

Helen wants to enhance the resulting groupings with weightings so that the events can be prioritized for the operators in terms of cause and effect. This change helps the operators more easily pinpoint the events in a group that represent the probable causes of each incident. Until now, the synthetic subgrouping parent events showed CAUSE AND IMPACT: UNKNOWN in the Summary field because none of the child events had weightings assigned.

Tip: This cause and effect text can be switched on or off separately for the scope and subgrouping parent events via the Properties menu.

Scope-based event grouping provides a standard method to weight events in terms of the likelihood that each is a high-impact event for businesses or services, and the likelihood that it is a contributing cause of an incident. This information can then be used to enrich the Summary field of the synthetic parent events, both to guide operators and to provide more information for any ticket headlines.

Helen completed the following steps:

1. Helen copies the Probe rules template that is provided with scope-based event grouping into the main Probe directory so that she can work with it to allocate the event categories.

2. Helen edits the top section of the template to standardize the setting of the following values:

– ScopeID

– SiteName (where available)

– Event category (NormalisedAlarmCode)

– OSI level of the event

3. In the first subsection, Helen sets the default values of the following fields per the template:

# SET / INITIALISE MANDATORY VARIABLES

@ScopeID = $location

@NormalisedAlarmCode = 0

$OSILevel = 9

# SET / INITIALISE OPTIONAL VARIABLES

@SiteName = $suburb

4. Helen edits the next section of the file, which switches on $EventCode. In the case of Company A, $EventCode is not a valid token. Therefore, Helen modifies the switch statement to use other tokens to determine the event categorization and to set a sensible OSI level for the events:

switch ($MyField) {

case "INFO": # EXAMPLE - Informational events

@NormalisedAlarmCode = 10

$OSILevel = 3

case "A1400": # EXAMPLE - Workarounds in execution

@NormalisedAlarmCode = 20

$OSILevel = 3

...

Tip: For more information about the 16 event categories (from purely informational to controlled shutdown), see this website:

http://ibm.biz/seg_fields

The table at this site shows how categorizations and OSI levels combine to establish the weightings. These weightings are implemented in the second half of the Probe rules file template and are not edited. It is important to use the standard weighting method so that events from any source can be compared with one another in terms of cause and effect.

5. Helen modifies the following additional properties so that cause and effect analysis text automatically appears in the Summary fields of the synthetic containment events, wherever a parent event has direct child events:

– SEGUseScopeIDImpactCause = 1

– SEGUseSiteNameImpactCause = 1

6. Helen clears the ObjectServer and replays the event data through the Probe. She sees the events that are shown in Figure 9-11, with weightings present and cause and effect diagnosis text in the Summary fields of the ScopeIDParent and SiteNameParent events.

Figure 9-11 Cause and effect diagnosis text
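The switch statement in step 4 assigns a category and an OSI level per token value, and the defaults from step 3 remain in place when no case matches. The following Python sketch models that behavior as a lookup table. It is not Probe rules syntax, the categorise function is hypothetical, and the table holds only the two example cases shown in step 4; the cases elided by “...” are deliberately not reproduced.

```python
from typing import Dict, Tuple

# Defaults from the template's "SET / INITIALISE MANDATORY VARIABLES" section.
DEFAULT_CATEGORY = 0    # @NormalisedAlarmCode
DEFAULT_OSI_LEVEL = 9   # $OSILevel

# Hypothetical lookup mirroring only the two example cases from step 4.
CATEGORY_BY_TOKEN: Dict[str, Tuple[int, int]] = {
    "INFO": (10, 3),    # informational events
    "A1400": (20, 3),   # workarounds in execution
}


def categorise(token_value: str) -> Tuple[int, int]:
    """Return (NormalisedAlarmCode, OSILevel); unmatched values keep the defaults."""
    return CATEGORY_BY_TOKEN.get(token_value, (DEFAULT_CATEGORY, DEFAULT_OSI_LEVEL))
```

An event whose token value matches no case therefore keeps category 0 and OSI level 9, exactly as if the switch statement had fallen through.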

9.4.6 Using data from the highest ranked child event

Finally, Helen wants to enhance the Summary field of the subgroup parent events with the Node value of the highest weighted child. Although the properties limit which default items can go into the Summary field of the synthetic parent events, any other text that needs to be included can be added via the CustomText field.

The mechanism works as follows:

•We can put any text that we want in each child event’s CustomText field.

•We can auto-select the child with the highest weighted cause, the highest weighted impact, the earliest FirstOccurrence (that is, the first event in the group), or the latest LastOccurrence (the most recent recurrence in the group). The CustomText from the auto-selected event is copied to its direct parent event.

•We can opt to display a synthetic parent event’s CustomText in its Summary field.

The following properties perform the auto-select of the priority child event for a ScopeID parent event in the order of precedence listed:

•SEGPropagateTextToScopeIDParentCause = 1

•SEGPropagateTextToScopeIDParentImpact = 1

•SEGPropagateTextToScopeIDParentFirst = 1

•SEGPropagateTextToScopeIDParentLast = 1

Tip: If SEGPropagateTextToScopeIDParentCause is set to 1, the rest are ignored. Similarly, if SEGPropagateTextToScopeIDParentCause is set to 0 but SEGPropagateTextToScopeIDParentImpact is set to 1, the rest are ignored, and so on.
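The order of precedence in the tip can be expressed as a short selection function. The following Python sketch is a hypothetical model, not the automation's code: the Child class, the select_priority_child function, and the four flag arguments (standing in for the Cause, Impact, First, and Last propagation properties of one parent type) are invented, and ties are resolved by list order here, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Child:
    node: str
    cause_weight: int
    impact_weight: int
    first_occurrence: int
    last_occurrence: int
    custom_text: str


def select_priority_child(children: List[Child], cause: int, impact: int,
                          first: int, last: int) -> Optional[Child]:
    """Pick the child whose CustomText is copied to its direct parent.

    The first flag that is set to 1 wins; the remaining flags are ignored.
    """
    if not children:
        return None
    if cause == 1:
        return max(children, key=lambda c: c.cause_weight)
    if impact == 1:
        return max(children, key=lambda c: c.impact_weight)
    if first == 1:
        return min(children, key=lambda c: c.first_occurrence)
    if last == 1:
        return max(children, key=lambda c: c.last_occurrence)
    return None  # no propagation property set
```

With all four flags set to 1, only the cause weighting matters, which matches the tip: the other properties are ignored.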

Similarly, the following properties perform the auto-select of the priority child event for a SiteName parent event in the order of precedence listed:

•SEGPropagateTextToSiteNameParentCause = 1

•SEGPropagateTextToSiteNameParentImpact = 1

•SEGPropagateTextToSiteNameParentFirst = 1

•SEGPropagateTextToSiteNameParentLast = 1

The following properties enable showing the CustomText field of the synthetic parent in its own Summary field:

•SEGUseScopeIDCustomText = 1

•SEGUseSiteNameCustomText = 1

Helen completed the following steps:

1. Because Helen wants to propagate text from the child with the highest cause weighting to the subgrouping parent and display that child's Node in the parent event, she sets only the following properties:

– SEGPropagateTextToSiteNameParentCause = 1

– SEGUseSiteNameCustomText = 1

2. Helen sets the value of the CustomText field for the child events in the Probe rules so that, when the events are present in the ObjectServer, they already hold the text that they need to pass on if they happen to be the highest weighted child, as shown in the following example:

@CustomText = "high node: " + @Node

3. Helen clears the ObjectServer and replays the event data through the Probe and sees the groupings that are shown in Figure 9-12 with the top weighted child event Node displayed in the Summary line of each of the subgrouping synthetic parent events.

Figure 9-12 also shows that, because of the modifications Helen made, the Frankton and Claudelands subgroupings display top nodes of link42 and link65, respectively.

Figure 9-12 Subgroupings of Frankton and Claudelands
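All the properties set in this chapter are rows in master.properties, which can be updated via nco_sql. The following Python sketch generates the corresponding update statements so that they can be kept under version control like Helen's ScopeAlias file. It assumes that rows are keyed by a Name column and that values live in IntValue or CharValue, as the (IntValue) and (CharValue) annotations in 9.4.4 suggest, and that nco_sql runs a batch of semicolon-separated statements when it reads a go line. The function names are hypothetical.

```python
from typing import Dict, List, Union


def property_update(name: str, value: Union[int, str]) -> str:
    """Return one ObjectServer SQL update for a row in master.properties.

    Assumption: rows are keyed by a Name column; integer settings live in
    IntValue and string settings in CharValue.
    """
    if isinstance(value, int):
        return f"update master.properties set IntValue = {value} where Name = '{name}';"
    escaped = value.replace("'", "''")  # SQL-escape embedded single quotes
    return f"update master.properties set CharValue = '{escaped}' where Name = '{name}';"


def property_script(settings: Dict[str, Union[int, str]]) -> str:
    """Join the updates into one batch terminated by a 'go' line (assumed nco_sql usage)."""
    lines: List[str] = [property_update(n, v) for n, v in settings.items()]
    lines.append("go")
    return "\n".join(lines) + "\n"


# Values that Helen sets in 9.4.4 and 9.4.6.
helen_settings: Dict[str, Union[int, str]] = {
    "SEGSiteNamePrefix": "SUBURB",
    "SEGScopeIDSitesAffectedLabel": "suburb",
    "SEGJournalToScopeIDParent": 1,
    "SEGJournalToSiteNameParent": 1,
    "SEGPropagateTextToSiteNameParentCause": 1,
    "SEGUseSiteNameCustomText": 1,
}

print(property_script(helen_settings), end="")
```

The printed script could then be redirected into nco_sql in the same way as the import command in 9.4.2.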

9.5 Summary

Helen finished all her configuration modifications and is ready to begin user acceptance testing. She is confident that the new groupings will help the operations team more easily make sense of the large volumes of events because of the logical grouping and the automatic cause and effect analysis that the system performs. She expects ticket counts to drop significantly and MTTR to improve dramatically. This improvement will help the business save money and provide a better service to its customers.

Note: For more information about setting up scope-based event grouping, see the documentation that is available at the following IBM Knowledge Center site:

http://ibm.biz/seg_docs
