The Top Three Missing or Broken Processes

Now don't fall out of your chair when you read this. The top three missing or broken processes are:

  1. Client/server production acceptance (CSPA)

  2. Change control

  3. Problem management

We can understand the first one because it only evolved over the past five years. But numbers two and three? They've been around for over two decades. This is pretty hard to swallow in this day and age.

Change control and problem management are two of the most critical processes or disciplines in data processing as depicted in Table 5-1. They're on top for a reason, namely that they are necessary to keep the organization flowing. For this reason we highlight them here, and go into further detail in the following pages.

The big problem is that three-fourths of the companies we studied didn't even have an enterprise-wide change control process. Problem management was not as bad; 90 percent of the companies had something that resembled problem management. That's the good news. The bad news is that 65 percent of those processes were broken. How can this be? Is anything and everything that ever came out of the mainframe environment tossed aside? Unfortunately, the answer appears to be yes. Seventy-five percent of the companies we studied had a legacy environment. Why is everyone turning his or her back on a very successful mission-critical, production-support environment? We can turn our backs on many other aspects of the mainframe era but not on the way it provides RAS. Once again that perception rears its ugly ageless head.

But there's so much more that's missing than just these three processes. The remaining issues are discussed later in this section.

Table 5-1 lists the components and processes for managing a disciplined production environment. The purpose of this table is to highlight the importance of the top three processes that are missing or broken in most computing environments.

Table 5-1. Components and Processes for Managing a Disciplined Production Environment
Client/Server Production Acceptance (CSPA) | Problem Management | Change Control | Service Level Agreements
System and Network Security | Version Release Management | Network Management | Event Monitoring | Disaster Recovery
Capacity Planning | Performance Monitoring | Asset Management | Software Distribution | Job Scheduling
User Security Access | Console Management | Disk Management

Lack of an Enterprise-wide Change Control Process

Change Control Process: A process that coordinates any change that can potentially impact the operational production environment.

Seventy-five percent of the companies we studied did not have an enterprise-wide change control process. The ones that did had many problems. In Table 5-2 we highlight these problems and the percentage of occurrences from our elite group of 40.

Table 5-2. Problems with the Current Change Control Processes
Issues Percent of Occurrences
Not all changes are logged. 95%
Changes not thoroughly tested. 90%
Lack of enforcement. 85%
Lack of effective method for communicating within IT. 75%
Coordination within the groups is poor—only the person attending the change meetings is aware of the events—on many occasions the information is not disseminated throughout the organization. 65%
Lack of centralized ownership. 60%
Change control is not effective. In some instances changes are being made on the production servers without coordination or communication. 60%
Lack of approval policy. 50%
Hard copies of 'changes' kept in file cabinets. 50%
On many occasions notification to all after the fact. 40%
Managers and directors sign a hard copy of every change. 25%
Current process is only a form for notification. 20%
The process is bureaucratic and not user-friendly. 20%
Several different flavors of change control. 20%

A change is any addition or modification to the data-processing systems that could potentially affect the stability of the production environment. Areas of change include, but are not limited to, hardware, system software (OS), application software, networks, environment (heating, cooling, and so on), and documentation.

What is often overlooked when implementing change control, and what is the fatal flaw in distributed systems, is not a shortage of controls and practices but a lack of adequate checks and balances to detect unauthorized changes or unforeseen consequences. Checks and balances, especially for change control, help eliminate human error, improve the efficiency and effectiveness of the process, and ensure that we maintain high systems availability.

The basic elements in the change process are:

  • Notification

  • Review

  • Approvals

  • Scheduling

  • Implementation
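As a minimal sketch, the five elements above can be modeled as an ordered set of stages. The strictly sequential ordering, and the names `Stage`, `next_stage`, and `can_move`, are our assumptions for illustration; the list itself does not say that a stage can never be skipped.

```python
from enum import Enum


class Stage(Enum):
    """The five basic elements of the change process, in the order listed."""
    NOTIFICATION = 1
    REVIEW = 2
    APPROVALS = 3
    SCHEDULING = 4
    IMPLEMENTATION = 5


def next_stage(current: Stage) -> Stage:
    """Return the stage that follows `current`."""
    if current is Stage.IMPLEMENTATION:
        raise ValueError("IMPLEMENTATION is the final stage")
    return Stage(current.value + 1)


def can_move(current: Stage, target: Stage) -> bool:
    """Assumed rule: a change may only advance to the very next stage."""
    return target.value == current.value + 1
```

Under this assumption, a change sitting in Review can move to Approvals, but a change that has only been notified cannot jump straight to Implementation.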

When a change to a production system (server, application, network, etc.) is required, the individual responsible for implementing the change fills out a form that documents the answers to the following questions:

  • What's the business reason for change so it can be prioritized?

  • When can it be done?

  • When must it be done?

  • What are the changes?

  • What do they affect?

  • Are there prerequisites and/or corequisites?

  • Will the customer see the changes?

  • How will the changes be communicated to customers?

  • Will they change any of the operational parameters?

  • How long will the change take?

  • What is the status of the application, related applications, operating system, hardware, network, etc., during implementation of the change?

  • What are the symptoms of the change not working properly?

  • What will be the back-out procedure for the change?

  • How long will the back-out take?

  • What is the status of the application, related applications, operating system, hardware, network, etc., during the back-out?
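One way to keep such a form honest is to treat each question as a required field. The sketch below is illustrative only: the class name `ChangeRequest`, its field names, and the helper `unanswered` are hypothetical, and every answer is kept as free text for simplicity.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeRequest:
    # Hypothetical field names; each maps to one question on the form.
    business_reason: str = ""         # Why is the change needed?
    earliest_date: str = ""           # When can it be done?
    deadline: str = ""                # When must it be done?
    description: str = ""             # What are the changes?
    affected_systems: str = ""        # What do they affect?
    prerequisites: str = ""           # Prerequisites and/or corequisites
    customer_visible: str = ""        # Will the customer see the changes?
    customer_communication: str = ""  # How customers will be told
    operational_parameters: str = ""  # Operational parameters affected
    duration: str = ""                # How long will the change take?
    status_during_change: str = ""    # System status during implementation
    failure_symptoms: str = ""        # Symptoms of the change not working
    backout_procedure: str = ""       # How to back the change out
    backout_duration: str = ""        # How long will the back-out take?
    status_during_backout: str = ""   # System status during the back-out


def unanswered(request: ChangeRequest) -> list[str]:
    """Return the names of the questions that are still blank."""
    return [f.name for f in fields(request)
            if not getattr(request, f.name).strip()]
```

A request would be forwarded for approval only when `unanswered(request)` returns an empty list.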

Once the form has been completed, the individual acquires the proper approvals and then submits it to the group that owns the change control process. Changes are then reviewed and approved weekly. Bureaucratic? Sure it is. That's because properly designed checks and balances, when used, mitigate risk. We want to minimize problems and keep Murphy's Law at bay in order to maintain RAS.

Secure? Hardly, because regardless of any process that looks good on paper, certain individuals have the capability of making changes to the environment without following the process. A good example of this is the mainframe systems programmer. This individual could make a change to the MVS operating system whenever he or she wanted; systems programmers owned the key to the box. So why bother going through all this? What's the point? Because the MIS staff was mentored in these disciplines, and bypassing the process was unheard of.

To ease the bureaucratic delays and paper shuffling that can bog down controlled changes, the entire approval process, including authorizing signatures, can be put on e-mail. The person requesting a system change fills out an online change request form. The change request describes the proposed change and the impact it will have on the system, which unit or group owns the server, any special change instructions, and what procedures to follow to back out of the change if it fails.

Another option to ease paper shuffling, as well as managerial nerves, might be an automated, Web-based bulletin board that lets users see which requests are pending approval, which have been approved but are awaiting implementation, and which system changes were made in the past month. Additionally, this application can have alerts that e-mail change-control documents and updates regularly to IT personnel. Your goal should be to track and communicate (via e-mail) each and every change affecting your production environment. This process provides discipline, creates an audit trail for accountability, and mitigates risk so that IT's customers have a greater level of satisfaction.
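The three views on such a bulletin board amount to a simple grouping of change records. The following is a rough sketch under stated assumptions: the record layout (a dict with `id`, `status`, and `implemented_on` keys), the status values, and the reading of "a month" as 30 days are all ours.

```python
from datetime import date, timedelta


def board_view(changes, today):
    """Group change records into the three bulletin-board sections.

    Each record is a dict with hypothetical keys: 'id', 'status'
    ('pending', 'approved', or 'implemented') and, for implemented
    changes, 'implemented_on' (a datetime.date).
    """
    cutoff = today - timedelta(days=30)  # "a month" taken as 30 days
    view = {
        "pending_approval": [],
        "awaiting_implementation": [],
        "recent_changes": [],
    }
    for change in changes:
        if change["status"] == "pending":
            view["pending_approval"].append(change["id"])
        elif change["status"] == "approved":
            view["awaiting_implementation"].append(change["id"])
        elif (change["status"] == "implemented"
              and change["implemented_on"] >= cutoff):
            view["recent_changes"].append(change["id"])
    return view
```

Implemented changes older than the cutoff simply drop off the board; they would still live in whatever store holds the full history.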

Next, you can automatically monitor those servers to detect changes by using a simple command in Unix to compare a before and after snapshot of a server. The results are stored in a database for review by staff and management. You should know exactly what changes have been made to any server in the past 24 hours and who made them.

By storing all information in a database, retrieving information or audit trails by server becomes simpler and more accurate.
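The text does not name the Unix command or the database, so the following is only a sketch of the idea in Python. It takes a snapshot as a map of file paths to content hashes, compares two snapshots, and stores the differences in SQLite. The function names, the table name `detected_changes`, and the choice of SHA-256 and SQLite are all assumptions. Note that a snapshot comparison shows what changed, not who changed it; that would have to come from another source, such as the change request itself.

```python
import hashlib
import os
import sqlite3


def snapshot(root):
    """Map each file path under `root` to the SHA-256 digest of its contents."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                result[path] = hashlib.sha256(handle.read()).hexdigest()
    return result


def compare(before, after):
    """Return (path, kind) pairs; kind is 'added', 'removed', or 'modified'."""
    changes = []
    for path in sorted(set(before) | set(after)):
        if path not in before:
            changes.append((path, "added"))
        elif path not in after:
            changes.append((path, "removed"))
        elif before[path] != after[path]:
            changes.append((path, "modified"))
    return changes


def record(conn, server, changes):
    """Store detected changes so they can be retrieved later by server."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detected_changes ("
        "server TEXT, path TEXT, kind TEXT, "
        "detected_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.executemany(
        "INSERT INTO detected_changes (server, path, kind) VALUES (?, ?, ?)",
        [(server, path, kind) for path, kind in changes],
    )
    conn.commit()
```

A nightly job could call `snapshot` on the directories that matter, compare the result with the previous night's snapshot, and hand the differences to `record`. Reconciling those rows against approved change requests is what would expose an unauthorized change.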

Lack of an Effective Problem Management Process

Problem Management Process: A centralized process to manage and resolve user, network, application, and system problems.

Ninety percent of the companies we studied have some sort of problem management process, but many of them were riddled with problems as depicted in Table 5-3.

Table 5-3. Problem Management Issues
Issues Percentage of Occurrences
Customer satisfaction isn't measured after problem is resolved. (None of the companies we surveyed had a post-resolution customer survey process!) 100%
Multiple levels of support are not clearly defined for client/server environments. 100%
Support personnel have very little exposure to new systems when deployed. 95%
Level 2 analysts not putting in detailed description of problem resolution. 90%
Lack of written documentation on existing and new systems/applications. 90%
Roles and responsibilities not clearly defined for problem resolution. 90%
Service levels of the companies studied not defined for problem resolution. 90%
Lack of root-cause analysis. 85%
Help desk staff not properly trained on new releases of applications. 80%
Lack of centralized ownership. 70%
Lack of close-loop feedback. 70%
Lack of metrics (performance or quality incentives). 65%
Problems not followed through to closure. 60%
Perception is that the help desk is not responsive. 50%
Problem tracking is poor, leaving the user without a clear understanding of who owns the problem and how to follow up on the resolution process. 50%
Most of the Unix or NT problems bypass the help desk—going directly to senior technical staff instead. 50%
No common enterprise-wide problem management process (i.e., escalation); each group has its own requirements (e.g., "Don't call me between the hours of X and X on Saturday night, call X."). 40%
Help desk staff has very little authority. 40%
Many calls bypass the help desk. 35%
Lack of clear demarcation for problem resolution between desktop and LAN group. 30%
Help desk acts more like a dispatch center, leaving users frustrated, feeling like they simply get the runaround while no one is willing to solve problems. 30%
After-hours support is not clearly defined for the client/server environment. 25%
Escalation not clearly understood. 20%
Problems not documented by help desk personnel. 20%
The process is extremely bureaucratic. A high number of notification and escalation procedures. 15%
Sometimes problems just sit around for days or weeks. 15%
On-call list is on the mainframe, but not everyone has mainframe access. 10%

The purpose of problem management is to establish an ongoing process to resolve problems, minimize the impact affecting IT services, and optimize the time and effort spent in resolution. The problem management process facilitates immediate resolution without wasting time to figure out how to manage a problem. This is achieved by setting into motion a management process that encompasses an "interdepartmental" problem effort that will effectively manage the measurements of a tracking, escalation, resolution and reporting system (see the process flow diagram in Figure 5-1).
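To make the tracking, escalation, and resolution pieces concrete, here is a minimal sketch of a problem record. Everything specific in it is an assumption for illustration: the class name `Problem`, three support levels, and automatic escalation by one level for every four hours a problem stays open. It is not the process shown in Figure 5-1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical policy: escalate one level for every 4 hours a problem is open.
ESCALATION_INTERVAL = timedelta(hours=4)
MAX_LEVEL = 3  # assumed number of support levels


@dataclass
class Problem:
    summary: str
    owner: str                      # who currently owns the problem
    opened_at: datetime
    resolved_at: Optional[datetime] = None
    resolution: str = ""

    def level(self, now: datetime) -> int:
        """Support level that should own the problem at time `now`."""
        end = self.resolved_at or now
        steps = (end - self.opened_at) // ESCALATION_INTERVAL
        return min(1 + steps, MAX_LEVEL)

    def resolve(self, when: datetime, resolution: str) -> None:
        """Close the problem; a written resolution is mandatory."""
        if not resolution.strip():
            raise ValueError("a description of the resolution is required")
        self.resolved_at = when
        self.resolution = resolution
```

Requiring a resolution description and an explicit owner targets two of the issues in Table 5-3: analysts not recording how a problem was resolved, and users not knowing who owns their problem.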

The lack of a full problem management solution hinders IT's ability to measure service delivery and manage its customer base; thus customer service is actually limited. It also makes it difficult for IT to manage scarce resources because it is difficult to gauge where time is being spent firefighting rather than providing active service. Because users are calling technical staff directly, very little time is left to keep abreast of and plan for new technologies.

As a side note, we were shocked to find that some of the companies we surveyed didn't have established problem management processes in place in their IT shops. Some of these companies, however, have been highlighted in numerous news stories and books on how their public relations staffs handle major problems such as disasters, customer complaints, or customer returns. Ironic, isn't it?

Figure 5-1. Problem management process.


Lack of a Production Acceptance Process

Production Acceptance Process: A methodology to promote communication, standards, guidelines, and teamwork for deploying, implementing, and supporting mission-critical client/server distributed systems.

The client/server production acceptance (CSPA) process is also an operations runbook, service level agreement, and a working document defining everyone's roles and responsibilities for each new client/server production application. It is the QA process for production. The process should also be used for all major revisions to applications.

The CSPA will provide development and operations with the adhesive needed to bring the development and support parties together through structured communication and by setting expectations to implement and support mission-critical applications. The CSPA provides a checklist of requirements needed for operational groups to support a system installation in production. The CSPA also specifies everyone's roles and responsibilities for supporting mission-critical systems.

The CSPA is the cornerstone of all processes. We've been preaching about the needs for and benefits of this process since our first book, Rightsizing the New Enterprise, which we published in 1994. In the book we referred to this process as the Unix Production Acceptance (UPA) process. Very few companies use such a tool. In one of our earlier books in the Enterprise Institute Series, Software Development (Building Reliable Systems) (Prentice-Hall), we introduced the Web-centric Production Acceptance Process. We estimate only 10 percent (and that's being generous) in the industry use a production acceptance process to manage their enterprise. Sad, but it's true. Out of the 40 Fortune 1000 companies we studied for this book, only a few had something like it.

Everywhere we go executives talk about improving communications, but this is easier said than done. Doing it effectively is a whole different ballgame. Improving communications has to be driven by a process. Monthly meetings and quarterly get-togethers don't cut it. You need to have a single process to monitor service levels which covers topics such as:

  • User requirements

  • Business issues

  • New applications

  • Revisions to existing applications

  • Services provided, etc.

But don't burden these poor users with several different bureaucratic processes. They're trying to do more with less as well. What's needed is a single process to promote and instill effective communication practices. We discuss this process, which we refer to as production acceptance, in several books: Rightsizing the New Enterprise, Managing the New Enterprise, and Building the New Enterprise.

Although the name is misleading, it is so much more than a production acceptance process. Its primary focus is to instill and promote effective communication practices internally within IT and a process for IT to effectively communicate with its users on a daily basis. This is done by adhering to the CSPA checklist for each application being deployed into a production environment. It is a constant reminder to bring the groups together within IT and the customers to assure proper system deployment.
