Incident Resolution and Prevention: A Service Establishment and Delivery Process Area at Maturity Level 3

Purpose

The purpose of Incident Resolution and Prevention (IRP) is to ensure timely and effective resolution of service incidents and prevention of service incidents as appropriate.



Introductory Notes

The Incident Resolution and Prevention process area involves the following activities:

• Identifying and analyzing service incidents

• Initiating specific actions to address incidents

• Monitoring the status of incidents, tracking their progress, and escalating as necessary

• Identifying and analyzing the underlying causes of incidents

• Identifying workarounds that enable service to continue

• Initiating specific actions to either address the underlying causes of incidents or to provide workarounds

• Communicating the status of incidents to relevant stakeholders

• Validating the complete resolution of incidents with relevant stakeholders



The term “incident” is used to mean “service incident” in this process area and in other areas of the model where the context makes the meaning clear. The term “service incident” is used in the glossary and in other parts of the model to clearly differentiate this specially defined term from the everyday use of the word “incident.” (See the definition of “service incident” in the glossary.)

Incidents are events that, if not addressed, eventually can cause the service provider organization to break its service commitments. Hence, the service provider organization should address incidents in a timely and effective manner according to the terms of the service agreement.

Addressing an incident can include the following activities:

• Removing an underlying cause or causes

• Minimizing the impact of an incident

• Monitoring the condition or series of events causing the incident

• Providing a workaround

Incidents can cause or be indications of interruptions or potential interruptions to a service.



Customer complaints are a special type of potential interruption. A complaint indicates that the customer perceives that a service does not meet his or her expectations, even if the customer is in error about what the agreement calls for. Therefore, complaints should be handled as incidents and are within the scope of the Incident Resolution and Prevention process area.

All incidents have one or more underlying causes, regardless of whether the service provider is aware of the cause or not. For example, each system outage has an underlying cause, whether it is a memory leak, a corrupt database, or an operator error.

An underlying cause of an incident is a condition or event that contributes to the occurrence of one or more incidents. Not all underlying causes result in incidents immediately. For example, a defect in an infrequently used part of a system may not result in an incident for a long time.

Underlying causes can be any of the following:

• Root causes that are within the service provider’s control and can and should be removed

• Positive or negative conditions of a service that may or may not be removed

• Conditions that the service provider cannot change (e.g., weather conditions)

Underlying causes and root causes (as described in the Causal Analysis and Resolution process area) are not synonymous. A root cause is a type of underlying cause that begins a chain of causes for some outcome of interest. We don’t normally look for the cause of a root cause and we normally expect to achieve the greatest reduction in the occurrence of incidents when we address a root cause.

Sometimes, we are unable to address a root cause for practical or budgetary reasons, and so instead we can focus on other non-root underlying causes. It doesn’t always make business sense to remove all underlying causes either. Under some circumstances, addressing incidents with workarounds or simply resolving incidents on a case-by-case basis can be more effective.

Effective practices for incident resolution start with developing a process for addressing incidents with the customers, end users, and other relevant stakeholders who report incidents. Organizations can have a collection of known incidents, underlying causes of incidents, and workarounds, as well as separate but related activities designed to create the actions for addressing selected incidents and underlying causes. Processing all incidents and analyzing selected incidents and their underlying causes to define approaches to addressing those incidents are two reinforcing activities that can be performed in parallel or in sequence.

Thus, the Incident Resolution and Prevention process area has three specific goals. The Prepare for Incident Resolution and Prevention goal helps to ensure an approach is established for timely resolution of incidents and effective prevention of incidents when possible. The specific practices of the goal to Identify, Control, and Address Individual Incidents are used to treat and close incidents as they occur, often by applying workarounds or other actions defined in the goal to Analyze and Address Causes and Impacts of Selected Incidents.

Related Process Areas

Refer to the Capacity and Availability Management process area for more information about monitoring and analyzing capacity and availability.

Refer to the Service Delivery process area for more information about establishing service agreements.

Refer to the Causal Analysis and Resolution process area for more information about determining causes of selected outcomes.

Refer to the Configuration Management process area for more information about tracking and controlling changes.

Refer to the Risk Management process area for more information about identifying and analyzing risks and mitigating risks.

Refer to the Work Monitoring and Control process area for more information about providing an understanding of the ongoing work so that appropriate corrective actions can be taken when the performance deviates significantly from the plan.



Specific Practices by Goal

SG 1 Prepare for Incident Resolution and Prevention

Preparation for incident resolution and prevention is conducted.

Establish and maintain an approach for ensuring timely and effective resolution and prevention of incidents to ensure the terms of the service agreement are met.

SP 1.1 Establish an Approach to Incident Resolution and Prevention

Establish and maintain an approach to incident resolution and prevention.

The approach to incident resolution and prevention describes the organizational functions involved in incident resolution and prevention, the procedures employed, the support tools used, and the assignment of responsibility during the lifecycle of incidents. Such an approach is typically documented.

Often, the amount of time needed to fully address an incident is defined before the start of service delivery and documented in a service agreement.

In many service domains, the approach to incident resolution and prevention involves a function called a “help desk,” “service desk,” or one of many other names. This function is typically the one that communicates with the customer, accepts incidents, applies workarounds, and addresses incidents. However, this function is not present in all service domains. In addition, other functional groups are routinely included to address incidents as appropriate.

Refer to the Service Delivery process area for more information about establishing service agreements.

Example Work Products

1. Incident management approach

2. Incident criteria

Subpractices

1. Define criteria for determining what an incident is.

To be able to identify valid incidents, criteria are defined that enable service providers to determine what is and what is not an incident. Typically, criteria also are defined for differentiating the severity and priority of each incident.

2. Define categories for incidents and criteria for determining which categories an incident belongs to.

The resolution of incidents is facilitated by having an established set of categories, severity levels, and other criteria for assigning types to incidents. These predetermined criteria can enable prioritization, assignment, and escalation actions quickly and efficiently.



Criteria are established that enable service staff to quickly and easily identify major incidents.



3. Describe how responsibility for processing incidents is assigned and transferred.



4. Identify one or more mechanisms that customers and end users can use to report incidents.

These mechanisms account for how groups and individuals can report incidents.

5. Define methods and acquire tools to use for incident management.

6. Describe how to notify all relevant customers and end users who may be affected by a reported incident.

How to communicate with customers and end users is typically documented in the service agreement.

7. Define criteria for determining severity and priority levels and categories of actions and responses to be taken based on severity and priority levels.



8. Identify requirements on the amount of time defined for the resolution of incidents in the service agreement.

Often, the minimum and maximum amounts of time needed to resolve an incident are defined and documented in the service agreement before the start of service delivery.

Refer to the Service Delivery process area for more information about establishing service agreements.

9. Document criteria that define when an incident should be closed.

Not all underlying causes of incidents are addressed and not all incidents have workarounds either. Incidents should not be closed until the documented criteria are met.

Often closure codes are used to classify each incident. These codes are useful when data are analyzed further.
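The criteria described in these subpractices can be captured in a simple data model. The following is a minimal sketch only: the severity scale, the category values, the response targets, and the names `Incident`, `priority`, and `is_major` are all hypothetical, and real values would come from the documented approach and the service agreement.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; a real one comes from the documented approach."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    MAJOR = 4


@dataclass
class Incident:
    summary: str
    category: str        # illustrative values such as "outage" or "complaint"
    severity: Severity
    users_affected: int


# Assumed response targets in hours per severity level (illustrative numbers only;
# actual values would be taken from the service agreement).
RESPONSE_TARGET_HOURS = {
    Severity.LOW: 72,
    Severity.MEDIUM: 24,
    Severity.HIGH: 8,
    Severity.MAJOR: 1,
}


def priority(incident: Incident) -> int:
    """Rank incidents: higher severity first, then more affected users."""
    return int(incident.severity) * 1000 + min(incident.users_affected, 999)


def is_major(incident: Incident) -> bool:
    """Criterion that lets service staff flag major incidents quickly."""
    return incident.severity == Severity.MAJOR
```

Keeping the criteria in one place like this means prioritization and escalation decisions can be applied consistently by whoever receives the incident.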

SP 1.2 Establish an Incident Management System

Establish and maintain an incident management system for processing and tracking incident information.

An incident management system includes the storage media, procedures, and tools for processing and tracking incident information. These storage media, procedures, and tools can be automated but are not required to be automated. For example, storage media might be a filing system where documents are stored. Procedures can be documented on paper and tools can be hand tools or instruments for performing work without automated help.

A collection of historical data covering addressed incidents, underlying causes of incidents, known approaches to addressing incidents, and workarounds should be available to support incident management.

Example Work Products

1. An incident management system with controlled work products

2. Access control procedures for the incident management system

Subpractices

1. Ensure that the incident management system allows the escalation and transfer of incidents among groups.

Incidents may need to be transferred or escalated between different groups because the group that entered the incident may not be best suited for taking action to address it.

2. Ensure that the incident management system allows the storage, update, retrieval, and reporting of incident information that is useful to the resolution and prevention of incidents.



3. Maintain the integrity of the incident management system and its contents.



4. Maintain the incident management system as necessary.

Maintenance should include removing obsolete information and consolidating redundant information that accumulates over time.
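Where the incident management system is automated, the capabilities named in these subpractices (storage, retrieval, reporting, and transfer among groups) might look like the following minimal sketch. It is an in-memory stand-in, not a real product; `IncidentRecord`, `IncidentStore`, and every method name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentRecord:
    incident_id: int
    description: str
    assigned_group: str
    status: str = "open"
    history: list = field(default_factory=list)  # audit trail of changes


class IncidentStore:
    """In-memory stand-in for an incident management system (illustrative only)."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def record(self, description, group):
        """Store a new incident and return its identifier."""
        rec = IncidentRecord(self._next_id, description, group)
        rec.history.append(f"recorded by {group}")
        self._records[rec.incident_id] = rec
        self._next_id += 1
        return rec.incident_id

    def get(self, incident_id):
        """Retrieve a stored incident record."""
        return self._records[incident_id]

    def transfer(self, incident_id, new_group, reason):
        """Transfer or escalate an incident to a better-suited group."""
        rec = self._records[incident_id]
        rec.history.append(f"{rec.assigned_group} -> {new_group}: {reason}")
        rec.assigned_group = new_group

    def open_incidents(self, group=None):
        """Report open incidents, optionally filtered by assigned group."""
        return [r for r in self._records.values()
                if r.status == "open" and (group is None or r.assigned_group == group)]
```

The `history` list is one simple way to keep a trail of who held the incident and why it moved, which supports the integrity of the record.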

SG 2 Identify, Control, and Address Individual Incidents

Individual incidents are identified, controlled, and addressed.

The focus of this goal is on managing individual incidents as they occur to restore service or otherwise resolve the incidents as quickly as possible. Managing individual incidents can also include handling multiple related incidents through actions that focus on completing or restoring already affected service delivery. The practices that comprise this goal include interaction with those who report incidents and those who are affected by them. The processing and tracking of incident data happens among these practices until the incident is addressed and closed.

Treatment of incidents can include collecting and analyzing data looking for potential incidents or simply waiting for incidents to be reported by end users or customers.

The specific practices of this goal can also depend on the practices in the goal to Analyze and Address Causes and Impacts of Selected Incidents. The practices in that goal are used to identify and define the range of approaches available to address individual incidents as called for in this goal.

Often, incidents involve work products that are under configuration management.

Refer to the Configuration Management process area for more information about tracking and controlling changes.

SP 2.1 Identify and Record Incidents

Identify incidents and record information about them.

Capacity, performance, or availability issues often signal potential incidents.

Refer to the Capacity and Availability Management process area for more information about monitoring and analyzing capacity and availability.

Example Work Products

1. Incident management record

Subpractices

1. Identify incidents that are in scope.



2. Record information about the incident.

When recording information about an incident, record sufficient information to properly support analysis and resolution activities.



3. Categorize the incident.

Using the categories established in the approach to incident resolution and prevention, assign the relevant categories to the incident in the incident management system. Communicating with those who reported the incident about its status enables the service provider to confirm incident information early.

SP 2.2 Analyze Individual Incident Data

Analyze individual incident data to determine a course of action.

The best course of action may be to do nothing, to address an incident as a unique case, to increase monitoring for other incidents, to educate an end user, or to employ a previously established workaround or other known reusable solution for handling similar incidents.

The analysis covered by this practice focuses on resolving incidents as they occur through a course of action that is both timely and effective enough to meet immediate service request needs. When more in-depth analyses and actions are required to mitigate future incidents, refer to the goal to Analyze and Address Causes and Impacts of Selected Incidents.

Example Work Products

1. Major incident report

2. Incident assignment report

Subpractices

1. Analyze incident data.

For known incidents, the analysis can be done by merely selecting the type of incident. For major incidents, a separate incident resolution team may be assembled to analyze the incident.

2. Determine which group is best suited to take action to address the incident.

Which group is best suited to take action to address the incident can depend on a number of different factors, including the type of incident, locations involved, and severity.



3. Determine actions that should be taken to address the incident.



4. Plan the actions to be taken.

SP 2.3 Resolve Incidents

Resolve incidents.

Incidents are resolved by following the course of action determined by individual incident analysis. It is possible that the initial selected course of action may fail to resolve an incident or may be only partially successful, in which case additional follow-up analyses and actions may be necessary.

Applying workarounds and other previously established reusable solutions can significantly reduce the impact of incidents, which would otherwise be handled on a case-by-case basis. The use of already known reusable solutions to resolve incidents helps to reduce the time required to resolve them, and can also improve the quality of resolutions. It is essential to have a single repository established that contains all previously established reusable solutions. This repository can be used to quickly determine the appropriate reusable solution to be used for related incidents.
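As a rough illustration of how such a repository supports resolution, the sketch below looks up a reusable solution by incident category and falls back to case-by-case handling when none exists. The repository contents, the category keys, and the function name `choose_course_of_action` are hypothetical.

```python
# Hypothetical repository: incident category -> previously verified reusable solution.
REUSABLE_SOLUTIONS = {
    "password-lockout": "Reset the account through the self-service portal.",
    "printer-offline": "Restart the print spooler, then resend the job.",
}


def choose_course_of_action(category: str) -> str:
    """Return a known reusable solution, or fall back to case-by-case handling."""
    solution = REUSABLE_SOLUTIONS.get(category)
    if solution is not None:
        return f"apply reusable solution: {solution}"
    return "no known solution: analyze and handle as a unique case"
```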

Example Work Products

1. Updated incident management record

Subpractices

1. Address the incident using the best course of action.

The best course of action can employ an applicable workaround or other previously established reusable solution if one is available.

2. Manage the actions until the impact of the incident is at an acceptable level.

3. Record the actions and result.

4. Review actions taken that resulted in service system changes to determine if further actions are needed to ensure traceability to requirements.

SP 2.4 Monitor the Status of Incidents to Closure

Monitor the status of incidents to closure.

Throughout the life of the incident, the status of the incident should be recorded, tracked, escalated as necessary, and closed.

Refer to the Work Monitoring and Control process area for more information about providing an understanding of the ongoing work so that appropriate corrective actions can be taken when the performance deviates significantly from the plan.

Example Work Products

1. Closed incident management records

Subpractices

1. Document actions and monitor and track the incidents until they meet the terms of the service agreement and satisfy the incident submitter as appropriate.

Monitor the responses to those who reported the incident and how the incident was addressed until it is resolved to the customer’s or organization’s satisfaction.

2. Escalate incidents as necessary.

The incident should be tracked throughout its life and escalated, as necessary, to ensure its resolution. Escalation may be required if relevant stakeholders are not satisfied with the resolution or if the resolution is urgent or requires non-standard processes or resources.

3. Review the resolution and confirm the results with relevant stakeholders.

Confirming that the underlying causes were successfully addressed can involve confirming with the person who reported the incident or others involved in analyzing the incident that the actions taken in fact resulted in the incident no longer occurring. Part of the result of addressing the incident can be the level of customer satisfaction.

Now that the incident has been addressed, it is confirmed that the service again meets the terms of the service agreement.

4. Close incidents that meet the criteria for closure.
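The escalation and closure decisions in these subpractices can be expressed as simple checks. This is a sketch under stated assumptions: the maximum resolution time is assumed to come from the service agreement, and the three closure conditions are illustrative examples of documented closure criteria, not a prescribed set. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def needs_escalation(opened_at, now, max_resolution, stakeholders_satisfied):
    """Escalate when the agreed resolution time is exceeded or stakeholders object."""
    overdue = now - opened_at > max_resolution
    return overdue or not stakeholders_satisfied


def may_close(resolved, confirmed_with_reporter, meets_service_agreement):
    """Assumed closure criteria: all three conditions must hold."""
    return resolved and confirmed_with_reporter and meets_service_agreement
```

For example, with an assumed eight-hour resolution limit, an incident opened at 09:00 and still unresolved at 18:00 would be flagged for escalation, while one that is resolved but not yet confirmed with the reporter would stay open.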

SP 2.5 Communicate the Status of Incidents

Communicate the status of incidents.

Communication is a critical factor when providing services, especially when incidents occur. Communication with the person who reported the incident and possibly those who were affected by it should be considered throughout the life of the incident record in the incident management system. Well-informed end users and customers are more understanding and can even be helpful in addressing the incident successfully.

Communication and coordination between incident resolution staff and service delivery staff may be appropriate to prevent incident resolution actions from interfering with ongoing service delivery.

Typically, the results of actions are reviewed with the person who reported the incident to verify that the actions indeed resolved the incident to the satisfaction of the submitter.

Example Work Products

1. Records of communication with customers and end users

2. Status reports

SG 3 Analyze and Address Causes and Impacts of Selected Incidents

Causes and impacts of selected incidents are analyzed and addressed.

The focus of this goal is on reducing the impact or occurrence of future incidents. The practices in this goal cover the analysis of selected incidents to define how to address similar incidents in the future. The results of this analysis are fed back to those who control and address incidents, and can also lead to the prevention of certain types of incidents.

All incidents have one or more underlying causes that trigger their occurrence. Addressing an underlying cause of some selected types of incidents can reduce the likelihood of service interference, reduce the workload on the service provider, or improve the level of service.

Underlying causes can be identified for selected incidents that have already happened, and for types of incidents that have never occurred but are possible.



The root cause of an incident is often different from the immediate underlying cause. For example, an incident can be caused by a faulty system component (the underlying cause), while the root cause of the incident is a suboptimal supplier selection process. This process area uses the term “underlying cause” flexibly, ranging from immediate causes or conditions to deeper root causes, to allow for a variety of possible solutions ranging from workarounds to complete prevention of a class of related incidents.

Refer to the Causal Analysis and Resolution process area for more information about determining causes of selected outcomes.

SP 3.1 Analyze Selected Incidents

Analyze the underlying causes of selected incidents.

The purpose of conducting causal analysis on incidents is to determine the best course of action to address incidents in the future so that their impact will be minimized most effectively. While completely preventing incidents is usually desirable, other business objectives can limit the extent to which incident prevention is effective. In some cases, it can be more effective to respond to certain incidents after they occur via reusable solutions than it is to try to reduce or prevent their occurrence in the first place. Therefore, a possible course of action includes not addressing an underlying cause at all and continuing to deal with selected incidents after they occur by using newly established or revised workarounds and other reusable solutions.

Often, analyzing incidents involves work products that are under configuration management.

It is essential to have a single repository established that contains all known incidents, their underlying causes, and approaches to addressing these underlying causes. This repository can be used to quickly determine the causes of related incidents.

Refer to the Configuration Management process area for more information about tracking and controlling changes.

Example Work Products

1. Report of underlying causes of incidents

2. Documented causal analysis activities

Subpractices

1. Identify underlying causes of incidents.



Refer to the Risk Management process area for more information about identifying and analyzing risks and mitigating risks.

2. Record information about the underlying causes of an incident or group of incidents.

When recording information about the underlying causes of an incident, record sufficient information to properly support causal analysis and resolution.



3. Conduct causal analysis with the people who are responsible for performing related tasks.

For underlying causes of major incidents, the analysis can involve assembling a separate team to analyze the underlying cause.

Refer to the Causal Analysis and Resolution process area for more information about determining causes of selected outcomes.

4. Determine the best overall approach for dealing with selected incidents in the future.

This approach can include service system changes that reduce or prevent the occurrence of similar incidents, that limit the impact of similar incidents through reusable solutions, or that combine some of these approaches.
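One possible way to decide which incidents to select for causal analysis is to count how often each recorded underlying cause appears among closed incidents. The sketch below assumes incident records are available as (identifier, cause) pairs, with the cause left as `None` when it is unknown; the data shape and the function name `rank_underlying_causes` are hypothetical.

```python
from collections import Counter


def rank_underlying_causes(closed_incidents):
    """Count closed incidents per recorded underlying cause, most frequent first."""
    counts = Counter(cause for _, cause in closed_incidents if cause is not None)
    return counts.most_common()
```

Frequency is only one input. Severity, cost of removal, and whether the cause is within the service provider's control also bear on the best overall approach.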

SP 3.2 Establish Solutions to Respond to Future Incidents

Establish and maintain solutions to respond to future incidents.

Reusable solutions such as workarounds are important mechanisms that enable service delivery to continue in spite of the occurrence of an incident. (A workaround is a less-than-optimal solution for a certain type of incident that is nevertheless effective enough to use until a better solution can be developed and deployed.) Therefore, it is important that workarounds and other reusable solutions be documented and confirmed to be effective before they are used to address incidents with customers and end users.

Example Work Products

1. Reusable solution description and instructions

2. Contribution to collection of workarounds for incidents

3. Workaround verification results

Subpractices

1. Determine which group is best suited to establish and maintain a reusable solution.

The group should be best suited to define the reusable solution, describe the steps involved, and communicate this information appropriately.

2. Plan and document the reusable solution.

3. Verify and validate the reusable solution to ensure that it effectively addresses the incident.

4. Communicate the reusable solution to relevant stakeholders.

SP 3.3 Establish and Apply Solutions to Reduce Incident Occurrence

Establish and apply solutions to reduce the occurrence of selected incidents.

After analysis has determined the underlying causes of incidents, the actions to be taken, if any, are planned and performed. Planning includes determining who will act, when, and how. All of this information is documented in an action proposal. The action proposal is used by those who take action to address the underlying causes of incidents, and the actions are managed to closure. The end result will be a reduction in the occurrence of the selected incidents.

Example Work Products

1. Action proposal

2. Contribution to collection of known approaches to addressing underlying causes of incidents

3. Updated incident management record

Subpractices

1. Determine which group is best suited to address the underlying cause.

Which group is best suited to address the underlying cause can depend on the type of underlying cause, configuration items involved, and the severity of the relevant incidents.



2. Determine the actions to be taken to address the underlying cause.

When analyzing standard incidents, the actions for addressing that standard incident can be documented as a standard action plan. If the incident is not standard, a historical collection of addressed incidents and known errors should be searched to see if the incident is related to others. This data should be maintained to allow this kind of analysis, thus saving time and leveraging effort.



Refer to the Decision Analysis and Resolution process area for more information about analyzing possible decisions using a formal evaluation process that evaluates identified alternatives against established criteria.

3. Document the actions to be taken in an action proposal.

4. Verify and validate the action proposal to ensure that it effectively addresses the underlying cause.

5. Communicate the action proposal to relevant stakeholders.

6. Address the underlying cause by implementing the action proposal that resulted from the analysis of the incidents’ underlying causes.

Often, the actions called for in an action proposal will include maintaining or changing the service system.

Refer to the Service Delivery process area for more information about maintaining the service system.


SSD Add

Refer to the Service System Development process area for more information about developing service systems.


7. Manage the actions until the underlying cause is addressed.

Managing the actions can include escalating the selected incidents as appropriate.



8. Record the actions and result.

The actions used to address the underlying cause of the selected incidents and the results of those approaches are recorded in the incident management system to support analyzing similar incidents in the future.
