Chapter 11. Problem Management

Introduction

Problems are a fact of life in all data centers. The sheer complexity of systems and diversity of services offered today all but guarantee that problems will occur. One of the many factors that separate world-class infrastructures from mediocre ones is how well they manage the variety and volume of problems they encounter.

This chapter discusses the entire scope of problem management. It begins as usual with a commonly accepted definition of this process followed by its interpretation and implications. Then we discuss the scope of problem management and show how it differs from change management and request management.

Next we look at length at the key steps required to develop a robust problem management process. Also included is a recent survey of infrastructure managers prioritizing their major issues with problem management in their own shops. The chapter concludes with a discussion on how to develop meaningful metrics for problem management. Just as important are some examples showing how one can easily perform trending analysis on these metrics.

Definition of Problem Management

Regardless of how well-designed its processes or how smooth-running its operations, even a world-class infrastructure will occasionally miss its targeted levels of service. The branch of systems management that deals with the handling of these occurrences is called problem management.

The identification of problems typically comes in the form of a trouble call from an end-user to a service desk, but problems may also be identified by programmers, analysts, or systems administrators. Problems are normally logged into a database for subsequent tracking, resolution, and analysis. The sophistication of the database and the depth of the analysis vary widely from shop to shop and are two key indicators of the relative robustness of a problem management process. We will discuss these two characteristics in more detail later in this chapter.

Scope of Problem Management

Several of my clients have struggled with distinguishing the three closely related processes of problem management, change management, and request management. While the initial input to these three processes may be similar, the methods for managing a problem, a change, or a request for service typically vary significantly. As a result, the scope of what actually constitutes problem management also varies significantly from shop to shop. Many infrastructures do agree that first-level problem handling, commonly referred to as tier 1, is the minimum basis for problem management. Table 11-1 shows some of the more common variations to this scheme.

Table 11-1. Variations of Problem Management Schemes

image

The most integrated approach is the last variation in the table, in which all three tiers of problem management are tightly coupled with change management and request management. Among other things, this means that all calls concerning problems, changes, and service requests go through a centralized help desk and are logged into the same type of database.

Distinguishing Between Problem, Change, and Request Management

Problem, change, and request management are three infrastructure processes that are closely related but distinct. Changes sometimes cause problems or can be the result of a problem. Expanding a database that is running out of storage space may be a good, proactive change, but it may cause backup windows to extend into production time, resulting in a scheduling problem for operations.

The same is true of problems causing or being the result of changes. Request management is usually treated as a subset of problem management, but it applies to individuals requesting services or enhancements, such as a specialized keyboard, a file restore, an advanced-function mouse, extra copies of a report, or a larger monitor. Table 11-2 shows a general delineation of problem, change, and service requests.

Table 11-2. General Delineation of Problem, Change, and Service Requests

image

To further delineate the differences between problem, change, and service requests, Table 11-3 provides examples of the three types of requests taken from actual infrastructure environments.

Table 11-3. Examples of Problem, Change, and Service Requests

image

Distinguishing Between Problem Management and Incident Management

Some IT shops draw a distinction between the notion of an incident and that of a problem, and by extension, they draw a distinction between incident management and problem management. In these cases, the delineation usually follows the guidelines offered by the IT Infrastructure Library (ITIL), which was discussed in more detail in Chapter 6, “Comparison to ITIL Processes.”

In accordance with ITIL best practices, service desk operators initially treat all calls as incidents. After a brief assessment, the operator may determine that the incident does not reflect any kind of failure in the infrastructure but instead is a request of some kind by the user calling in. In this case, the incident falls into the category of a service request, similar to the criteria described in the previous section.

Service desk operators analyze and attempt to resolve those incidents that are not classified as service requests. If the operator is successful in resolving such an incident, it is considered resolved and is designated as closed. If the service desk operator is not able to resolve the incident, it is turned over to a second level, or tier 2, support group for resolution and is then designated as a problem. In short, some IT shops designate tier 1 issues as incidents and their tier 2 issues as problems. The process described in this chapter can apply to both categories of incidents and problems.
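The triage flow just described can be sketched in a few lines of Python. This is only an illustration of the incident, service request, and problem distinction. The class names, field names, and sample calls are hypothetical and do not come from ITIL or from any particular call-tracking tool.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SERVICE_REQUEST = "service request"
    INCIDENT = "incident"  # resolved and closed at tier 1
    PROBLEM = "problem"    # turned over to tier 2


@dataclass
class Call:
    description: str
    is_infrastructure_failure: bool  # result of the operator's brief assessment
    resolved_at_tier1: bool = False  # set after the operator attempts a fix


def classify(call: Call) -> Category:
    """Classify a call using the tier 1 / tier 2 convention described above."""
    if not call.is_infrastructure_failure:
        return Category.SERVICE_REQUEST
    if call.resolved_at_tier1:
        return Category.INCIDENT
    return Category.PROBLEM


# Hypothetical sample calls.
calls = [
    Call("Needs a larger monitor", is_infrastructure_failure=False),
    Call("Print queue stalled", is_infrastructure_failure=True, resolved_at_tier1=True),
    Call("Database server unreachable", is_infrastructure_failure=True),
]
for call in calls:
    print(f"{call.description}: {classify(call).value}")
```

Every call starts as an incident in this sketch. The category only changes once the operator has either recognized a request or failed to resolve the issue at tier 1.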

Real Life Experience—What You See Not Always What You Get

A relatively large IT organization in a financial services company was determined to staff its service desk with very courteous and knowledgeable personnel. Much money was invested in technical training and telephone etiquette.

Part of the telephone etiquette training included an effective technique to ensure that service desk staff would answer their telephone with a calm, soothing voice. Each call taker had a mirror directly in front of them with the inscription, “What you see is what they hear.”

The mirrors were very popular and helped remind the service desk staff that the expressions on their faces often translate into the quality of their voices. A few months after the mirrors were introduced, a new service-desk person joined the staff. His demeanor and technical training qualified him perfectly for the job.

This new recruit invited a fair amount of teasing from his co-workers, however, due to his outrageously long, curled hair and beard. Several of his colleagues joked that he had a face for radio, while others chimed in that he had the perfect look for a service-desk agent. Many claimed he looked scary to them. He didn’t really believe that most of his co-workers found his look all that scary until one day he came to work and noticed someone had changed the inscription on his mirror to read, “What you see is what we all fear!”

The Role of the Service Desk

Almost all IT organizations have some type of service desk to offer assistance to end-users. In the past, IT managers used various names to designate this function, including help desk, call center, trouble desk, or technical support. As IT evolved to become more customer-oriented, the name of this area changed to customer support or the customer service center. With the emphasis today on IT service management, many organizations have renamed their help desks as service desks.

The service desk is a function, not a process. A function is defined within a department on an organization chart, has strict organizational boundaries, and is associated with numerous personnel management issues. For example, the implementation of a service desk involves hiring staff, training them, evaluating their performance, offering them career paths and promotions, and rectifying unacceptable behavior. A process has none of these characteristics. Table 11-4 summarizes some of these generic differences between a function and a process.

Table 11-4. Differences Between a Function and a Process

image

Because a service desk is a function and not a process, it is not designated as one of the 12 systems management processes. But the service desk does play a crucial role in problem management. It serves as the single point of contact into IT for all users and customers; it is also responsible for accurately logging all calls that come into the service desk, classifying them, handling them as well as possible, and tracking them to completion. The relative effectiveness of a problem management process is directly related to the caliber of the individuals who staff the service desk. It is hard to overstate the significant role the service desk plays in the success of a problem management process.

Segregating and Integrating Service Desks

As IT services grow in many companies, the number of service desks frequently grows. When PCs first came onto the scene, they were usually managed by a support group that was totally separate from the mainframe data center, including a separate service desk for PC-related problems. This pattern of multiple service desks was often repeated as networks grew in size, number, and complexity; the pattern was also prevalent when the Internet blossomed.

As IT organizations began recentralizing in the early and mid-1990s, infrastructure managers were faced with the decision of whether to segregate their service desks or integrate them. By this time, many IT shops had grown out of control because they used multiple call centers—companies with 10 or 15 IT service desks were not uncommon. If non-IT services were included in the mix, employees sometimes had as many as 40 or 50 service numbers to choose from for an inquiry. In one extreme example, a bank provided nearly 100 separate service desks for IT alone, having grown a new one each time an application was put into production.

Over time the benefits of integrating service desks gradually prevailed in most instances. This is not to say that segregated service desks are always worse than integrated ones. Highly diverse user populations, remote locations, and a wide range of services are some reasons why segregated service desks can sometimes be a better solution.

Table 11-5 summarizes the advantages and disadvantages of integrating service desks. Since the advantages and disadvantages of segregating service desks are almost the exact reverse of those for integrating them, the table is not repeated for segregated service desks.

Table 11-5. Advantages and Disadvantages of Integrated Service Desks

image

Real Life Experience—One-Stop Shopping, Eventually

A major motion picture studio had an IT department with seven different help desks that were slated for consolidation. An extensive marketing campaign was used to inform customers of the new single help-desk number with a catchy slogan that said, “One call does it all.” The slogan was posted all over the studio.

In order to enact a single service desk number that would provide all of the diverse services previously offered, automated menus were used. So many menus were provided initially that users often became confused and frustrated. Many posters throughout the studio were written over to read, “One call does not do it all!” or “These multiple menus will jerk and bend you!”

The frustrations eventually subsided as the menus were reduced and simplified and as more help-desk agents were added to the staff.

An integrated service desk is not only easier for customers to use; it also lends itself to the cross-training of staff. Furthermore, there is a cost savings because companies can standardize the call-tracking tool by collecting all call records on a single, centralized database. Even more is saved by utilizing staff more efficiently. Also, since there are fewer and more standardized help desks, management is easier and less complex. Finally, it is easier to integrate other systems management disciplines into an integrated service desk, particularly change management.

The primary advantage of a segregated service desk is the ability to customize specialized support for diverse applications, customers, and services. For example, some consolidated service desks have attempted to offer fax, telephone, facilities, office moves, and other IT-related services only to find they were attempting to do too much with too few resources; they were providing too little service to too many users. In this case, multiple service desks tend to work better, although communication among all of them is critical to their combined success.

A compromise hybrid solution is sometimes used in which all IT customers call a single service desk number that activates a menu system. The customer is then routed to the appropriate section of a centralized service desk depending on the specific service requested.

Key Steps to Developing a Problem Management Process

The following 11 key steps are required to develop a robust problem management process. We then discuss each step in detail.

  1. Select an executive sponsor.
  2. Assign a process owner.
  3. Assemble a cross-functional team.
  4. Identify and prioritize requirements.
  5. Establish a priority and escalation scheme.
  6. Identify alternative call-tracking tools.
  7. Negotiate service levels.
  8. Develop service and process metrics.
  9. Design the call-handling process.
  10. Evaluate, select, and implement the call-tracking tool.
  11. Review metrics to continually improve the process.

Step 1: Select an Executive Sponsor

A comprehensive problem management process involves individuals from an assortment of IT departments and from suppliers that are external to IT. An executive sponsor is needed to bring these various factions together and to ensure their support. The executive sponsor must also select the problem management process owner, address conflicts that the process owner cannot resolve, and provide executive leadership, direction, and support for the project.

Step 2: Assign a Process Owner

One of the most important duties of the executive sponsor is to assign the appropriate process owner. The process owner will be responsible for assembling and leading a cross-functional process design team, for implementing the final process design, and for the ongoing maintenance of the process. The selection of this individual is key to the success of this project, as this person must lead, organize, communicate, team-build, troubleshoot, and delegate effectively.

Table 11-6 shows, in priority order, several characteristics I recommend looking for when choosing a problem management process owner. The ability to promote teamwork and cooperation is extremely important due to the large number of level 2 support groups that become involved with problem resolution. This is also why the ability to work with diverse groups is necessary, though not as high in priority. Knowledge of desktops is key due to the overwhelming number of desktop calls. The ability to analyze metrics and trending reports ensures continual improvements to the quality of the process. Effectively communicating and meeting with users is essential to good customer feedback. Other lower-priority characteristics involve a rudimentary working knowledge of areas of likely problems, including database systems, system and network components, facilities issues, and backup systems.

Table 11-6. Problem Management Process Owner Characteristics

image

Step 3: Assemble a Cross-Functional Team

A well-designed problem management process involves the participation of several key groups. Mature, highly developed processes may have as many as a dozen areas serving as second-level, tier 2 support. Representatives from each of these areas, along with key user reps and tier 1 support, normally make up such a team.

This team is responsible for identifying and prioritizing requirements, establishing the priority scheme, negotiating internal SLAs, proposing internal process metrics and external service metrics, and finalizing the overall design of the call-handling process. The specific areas participating in this cross-functional process design team vary from shop to shop but there are some common groups that are usually involved in this type of project. The following list includes typical areas represented by such a team:

  1. Service desk
  2. Desktop hardware support
  3. Desktop software support
  4. Network support
  5. Operations support
  6. Applications support
  7. Server support (includes mainframe and midrange)
  8. Database administration
  9. Development groups
  10. Key user departments
  11. External suppliers

Step 4: Identify and Prioritize Requirements

Once the cross-functional team members have been identified and assembled, one of the team’s first orders of business is to identify and prioritize requirements. Specific requirements and their relative priorities depend on an organization’s current focus and direction, but several common attributes are usually designed into a robust problem management process. It should be noted that this step does not yet involve the actual implementation of requirements; it focuses on acquiring the team’s consensus as to a requirement’s inclusion and priority.

A variety of brainstorming and prioritizing techniques are available to do this effectively. In Chapter 19, “Developing Robust Processes,” I discuss brainstorming in more detail and list several helpful brainstorming ground rules. A sample of prioritized requirements from former clients is shown in Table 11-7.

Table 11-7. Typical Problem Management Requirements with Priorities

image

Step 5: Establish a Priority and Escalation Scheme

A well-designed priority scheme is one of the most important aspects of an effective problem management process. Priority schemes vary from shop to shop as to specific criteria, but almost all of them attempt to prioritize problems based on severity, impact, urgency, and aging. Closely related to the priority scheme is an escalation plan that prescribes how to handle high-priority but difficult-to-resolve problems. Using an even number of priority levels prevents a tendency to average toward the center, and employing descriptive names rather than numbers is more user-friendly. Table 11-8 is an example of a combined priority scheme and escalation plan from one of my former clients.

Table 11-8. Sample Priority and Escalation Scheme

image
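A combined priority and escalation scheme can be expressed as a small lookup table plus an aging check. The following is a minimal sketch under stated assumptions: the four level names and the escalation intervals are hypothetical illustrations, not the values from Table 11-8.

```python
from datetime import datetime, timedelta

# Hypothetical four-level scheme: an even number of levels with descriptive
# names. The intervals are illustrative assumptions, not values from Table 11-8.
ESCALATE_AFTER = {
    "critical": timedelta(minutes=30),
    "high": timedelta(hours=2),
    "moderate": timedelta(hours=8),
    "low": timedelta(days=2),
}


def needs_escalation(priority: str, opened_at: datetime, now: datetime) -> bool:
    """Return True when an open problem has aged past its escalation interval."""
    return now - opened_at >= ESCALATE_AFTER[priority]


opened = datetime(2024, 1, 8, 9, 0)
now = datetime(2024, 1, 8, 10, 0)  # the problem has been open for one hour
for priority in ESCALATE_AFTER:
    print(priority, needs_escalation(priority, opened, now))
```

A check like this would typically run on a schedule against all open problems, so that aging alone can trigger escalation without anyone having to remember to do it.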

Step 6: Identify Alternative Call-Tracking Tools

The call-tracking tool is the cornerstone of an effective problem management process. Requirements for the tools will have already been identified and prioritized in Step 4. Companies usually lean toward either having their tools custom developed or purchasing a commercially available application. I have seen several instances of both types of call-tracking tools, and each kind offers unique advantages and disadvantages. Commercial packages generally offer more flexibility and integration possibilities while custom-developed solutions normally cost less and can be more tailored to a particular environment. In either event, alternative solutions should be identified for later evaluation.

Step 7: Negotiate Service Levels

External service levels should be negotiated with key customer representatives. These agreements should be reasonable, enforceable, and mutually agreed upon by both the customer service department and IT. Internal service levels should be negotiated with internal level 2 support groups and external suppliers. A more detailed discussion of key customers and key suppliers is provided in Chapter 4, “Customer Service.”

Step 8: Develop Service and Process Metrics

Service metrics should be established to support the SLAs that will be in place with key customers. The following are some common problem management service metrics:

  1. Wait time when calling help desk
  2. Average time to resolve a problem at level 1
  3. Average time for level 2 to respond to a customer
  4. Average time to resolve problems of each priority type
  5. Percentage of time problem is not resolved satisfactorily
  6. Percentage of time problem is resolved at level 1
  7. Trending analysis of various service metrics
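Two of the service metrics listed above can be computed directly from call-tracking records. The sketch below assumes a simple record layout of priority, resolving level, and hours to resolve. The layout and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ticket records: (priority, resolving level, hours to resolve).
tickets = [
    ("high", 1, 0.5),
    ("high", 2, 6.0),
    ("low", 1, 1.0),
    ("low", 1, 3.0),
    ("low", 2, 20.0),
]

# Percentage of problems resolved at level 1.
level1_pct = 100 * sum(1 for _, level, _ in tickets if level == 1) / len(tickets)

# Average time to resolve problems of each priority type.
hours_by_priority = defaultdict(list)
for priority, _, hours in tickets:
    hours_by_priority[priority].append(hours)
avg_hours = {p: mean(h) for p, h in hours_by_priority.items()}

print(f"Resolved at level 1: {level1_pct:.1f}%")
for priority, hours in avg_hours.items():
    print(f"Average hours to resolve ({priority}): {hours:.2f}")
```

Storing these values per week or per month is what makes the trending analysis in the last list item possible.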

Process metrics should be established to support internal SLAs that will be in place with various level 2 support groups and with key external suppliers. The following list shows some common problem management process metrics:

  1. Abandon rate of calls
  2. Percentage of calls dispatched to wrong level 2 group
  3. Total number of calls per day, week, month
  4. Number of calls per level 1 analyst
  5. Percentage of calls by problem type, customer, or device
  6. Trending analysis of various process metrics
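Several of the process metrics above reduce to simple counts over the call log. The sketch below assumes a hypothetical record layout and made-up analyst names. It also assumes that the wrong-dispatch percentage is taken over dispatched calls only, which is one of several reasonable definitions.

```python
from collections import Counter

# Hypothetical call log. Each record notes whether the caller hung up before
# being answered, which level 1 analyst took the call, and whether the call
# was dispatched to level 2 and, if so, whether it went to the wrong group.
calls = [
    {"abandoned": False, "analyst": "ana", "dispatched": True, "wrong_group": False},
    {"abandoned": False, "analyst": "ben", "dispatched": True, "wrong_group": True},
    {"abandoned": True, "analyst": None, "dispatched": False, "wrong_group": False},
    {"abandoned": False, "analyst": "ana", "dispatched": False, "wrong_group": False},
]

# Abandon rate of calls, as a percentage of all calls received.
abandon_rate = 100 * sum(c["abandoned"] for c in calls) / len(calls)

# Percentage of dispatched calls that went to the wrong level 2 group
# (assumption: measured against dispatched calls, not all calls).
dispatched = [c for c in calls if c["dispatched"]]
wrong_dispatch_pct = 100 * sum(c["wrong_group"] for c in dispatched) / len(dispatched)

# Number of calls per level 1 analyst (abandoned calls have no analyst).
calls_per_analyst = Counter(c["analyst"] for c in calls if c["analyst"])

print(f"Abandon rate: {abandon_rate:.1f}%")
print(f"Dispatched to wrong group: {wrong_dispatch_pct:.1f}%")
print(dict(calls_per_analyst))
```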

Step 9: Design the Call-Handling Process

This is the central process of problem management and requires the participation of the entire cross-functional team. It dictates how problems are first handled, logged, and analyzed and later how they might be handed off to level 2 for resolution, closing, and customer feedback.

Step 10: Evaluate, Select, and Implement the Call-Tracking Tool

In this step, the alternative call-tracking tools are evaluated by the cross-functional team to determine the final selection. Work should then proceed on implementing the tool of choice. A small subset of calls is sometimes used as a pilot program during the initial phases of implementation.

Step 11: Review Metrics to Continually Improve the Process

All service and process metrics should be reviewed regularly to spot trends and opportunities for improvement. This usually becomes the ongoing responsibility of the process owner.

Opening and Closing Problems

The opening and closing of problems are two of the most critical activities of problem management. Properly opened tickets can lead to quick closing of problems or to appropriately dispatched level 2 support groups for timely resolution. Infrastructures with especially robust problem management processes tie in their call-tracking tool with both an asset management database and a personnel database. The asset database allows call agents to view information about the exact configuration of a desktop and its problem history. The personnel database allows agents to learn the caller’s logistics and their recent call history.

Closing a problem should be separate and distinct from resolving it. A problem is normally said to be resolved when an interrupted IT service has been restored. A problem is said to be closed when the customer sends confirmation in one form or another that the resolution activity was satisfactory and that the problem has not recurred. In general, a level 2 problem will be resolved by a level 2 support analyst, but it should be closed by the service-desk analyst who opened it.
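One way to keep resolving and closing distinct in a call-tracking tool is to model them as separate states with separate rules. The sketch below is a hypothetical illustration of that idea. The class, the method names, and the analyst identifiers are assumptions, not features of any specific product.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"
    RESOLVED = "resolved"  # the interrupted service has been restored
    CLOSED = "closed"      # the customer has confirmed the resolution


class Ticket:
    def __init__(self, opened_by: str):
        self.opened_by = opened_by
        self.status = Status.OPEN

    def resolve(self) -> None:
        """Mark the service as restored; any support analyst may do this."""
        if self.status is not Status.OPEN:
            raise ValueError("only an open ticket can be resolved")
        self.status = Status.RESOLVED

    def close(self, closed_by: str, customer_confirmed: bool) -> None:
        """Close only after resolution and customer confirmation."""
        if self.status is not Status.RESOLVED:
            raise ValueError("a ticket must be resolved before it is closed")
        if not customer_confirmed:
            raise ValueError("closing requires customer confirmation")
        if closed_by != self.opened_by:
            raise ValueError("the analyst who opened the ticket should close it")
        self.status = Status.CLOSED


ticket = Ticket(opened_by="desk_analyst_1")
ticket.resolve()  # for example, by a level 2 analyst
ticket.close(closed_by="desk_analyst_1", customer_confirmed=True)
print(ticket.status.value)
```

Because the two transitions are separate, reports can show how long problems sit in the resolved state waiting for customer confirmation.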

Analyzing a variety of trending data can help to continually improve the problem management process. This data may concern the types of problems resolved, the average time it took to resolve them, the devices involved, the customers involved, and root causes. Table 11-9 provides a sense of the variety and distribution of problems handled by a typical service desk. This data shows a recent representative month of IT problem calls to a service desk at a major motion picture studio.

Table 11-9. Monthly Distribution of Typical Service Desk Problem Types

image

Three items are noteworthy:

  1. The inclusion of the accumulative percent field gives an immediate snapshot of the distribution of calls. One-third of the calls are the result of problems with PC software; over half the calls are a result of desktop software problems in general (comprised of both PC and Macintosh software).
  2. The overwhelming majority of calls—92.3%—are desktop-related.
  3. The recent inclusion of fax service requests is showing a healthy start—over 100 requests a month.

Summary snapshot data such as this can be fed into weekly and monthly trend reports to analyze trends, patterns, and relationships. This analysis can then be used to formulate appropriate action plans to address issues for continual improvement.
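The accumulative percent column mentioned above is a running total of each problem type's share of calls, taken in descending order of volume. The sketch below shows the calculation. The problem types and counts are hypothetical and are not the Table 11-9 data.

```python
from itertools import accumulate

# Hypothetical monthly call counts by problem type (not the Table 11-9 data).
counts = {
    "Network": 200,
    "PC software": 400,
    "Other": 100,
    "PC hardware": 300,
}

total = sum(counts.values())

# Rank problem types from most to least frequent, then build a running total.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
running = accumulate(count for _, count in ranked)

rows = [
    (name, count, 100 * count / total, 100 * cumulative / total)
    for (name, count), cumulative in zip(ranked, running)
]

print(f"{'Type':<12} {'Calls':>5} {'Pct':>7} {'Accum':>7}")
for name, count, pct, accum_pct in rows:
    print(f"{name:<12} {count:>5} {pct:6.1f}% {accum_pct:6.1f}%")
```

Reading down the last column shows at a glance how few problem types account for most of the call volume, which is where improvement efforts usually pay off first.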

Client Issues with Problem Management

I conclude this chapter with the results from a recent survey of infrastructure managers and senior analysts on what they believed were the greatest exposures with their current problem management processes. The respondents represented more than 40 client companies. They identified, in their own words, 27 separate issues. Table 11-10 lists these issues and their percentage of occurrences.

Table 11-10. Issues with Problem Management Processes

image

image

Assessing an Infrastructure’s Problem Management Process

The worksheets shown in Figures 11-1 and 11-2 present a quick-and-simple method for assessing the overall quality, efficiency, and effectiveness of a problem management process. The worksheet shown in Figure 11-1 is used without weighting factors, meaning that all 10 categories are weighted evenly for the assessment of a problem management process. Sample ratings are inserted to illustrate the use of the worksheet. In this case, the problem management process scored a total of 26 points for an overall nonweighted assessment score of 65 percent, as compared to the second sample worksheet, which compiled a weighted assessment score of 69 percent.

Figure 11-1. Sample Assessment Worksheet for Problem Management Process

image

image

Figure 11-2. Sample Assessment Worksheet for Problem Management Process with Weighting Factors

image

image

One of the most valuable characteristics of these worksheets is that they are customized to evaluate each of the 12 processes individually. The worksheets in these figures apply only to the problem management process. However, the fundamental concepts applied in using these evaluation worksheets are the same for all 12 disciplines. As a result, the detailed explanation on the general use of these worksheets presented near the end of Chapter 7, “Availability,” also applies to the other worksheets in the book. Please refer to that discussion if you need more information.
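The arithmetic behind the two worksheets can be sketched as follows. The sketch assumes that each of the 10 categories is rated from 1 to 4, which is consistent with 26 points producing a nonweighted score of 65 percent (26 out of a possible 40). It also assumes the weighted score is the sum of weight times rating divided by the best possible weighted total. The individual ratings and all of the weights are hypothetical and are not copied from Figures 11-1 or 11-2.

```python
# Assumption: 10 categories, each rated 1 to 4 (26 / 40 = 65 percent).
MAX_RATING = 4

# Hypothetical ratings that sum to 26, and hypothetical weights from 1 to 5.
ratings = [3, 2, 3, 3, 2, 3, 2, 3, 2, 3]
weights = [5, 3, 3, 5, 1, 3, 1, 5, 1, 3]


def nonweighted_score(ratings):
    """Percentage of the maximum possible points, all categories equal."""
    return 100 * sum(ratings) / (MAX_RATING * len(ratings))


def weighted_score(ratings, weights):
    """Weight each rating, then divide by the best possible weighted total."""
    earned = sum(w * r for w, r in zip(weights, ratings))
    possible = MAX_RATING * sum(weights)
    return 100 * earned / possible


print(f"Nonweighted: {nonweighted_score(ratings):.0f}%")
print(f"Weighted:    {weighted_score(ratings, weights):.0f}%")
```

With these made-up weights the weighted score comes out to 70 percent, slightly above the nonweighted 65 percent, because the higher-rated categories happen to carry more weight. The 69 percent in the sample worksheet reflects its own ratings and weights.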

Measuring and Streamlining the Problem Management Process

We can measure and streamline the problem management process with the help of the assessment worksheet shown in Figure 11-1. We can measure the effectiveness of a problem management process with service metrics such as calls answered by the second ring, calls answered by a person, calls solved at level 1, response times of level 2, and feedback surveys. Process metrics—such as calls dispatched to wrong groups, calls requiring repeat follow-up, and the amount of overtime spent by level 2 and third-party vendors—help us gauge the efficiency of this process. And we can streamline the problem management process by automating actions such as escalation, paging, exception reporting, and the use of a knowledge database.

Summary

This chapter began in the usual manner with a formal definition of problem management, followed by a discussion of the differences between problem management, change management, and request management. Following this we showed how problem management differs from incident management and we described the important role that the service desk plays in the problem and incident management processes.

Next was the core of the chapter, in which we looked at the 11 key steps to developing a robust problem management process. These included preferred characteristics of a process owner, areas that should be represented on a cross-functional team, a sample of prioritized requirements, a representation of a problem priority scheme, and examples of service and process metrics.

We concluded by discussing proper methods to open and close problem tickets; advantages and disadvantages of integrated help desks; and candid results of a survey from over 40 infrastructure managers and analysts on what they believed were their major concerns with their own problem management processes.

Test Your Understanding

1. Problems are normally logged into a database for subsequent tracking, resolution, and analysis. (True or False)

2. One reason so many groups are involved with designing a problem management process is that many of them serve as second level, tier 2 support. (True or False)

3. A customer provides key participation during the time a problem is:

a. logged

b. tracked

c. resolved

d. closed

4. What types of analyses can be used to reduce the number and duration of problem calls?

5. Describe the advantages and disadvantages of consolidating help desks.

Suggested Further Readings

1. IT Problem Management; 2001; Walker, Gary; Prentice Hall

4. Effective Computer User Support: How to Manage the IT Help Desk; 2002; Bruton, Noel; Butterworth-Heinemann
