CHAPTER 14

Managing Errors

(No Blame)

Human error has been studied extensively. Research has shown that most of the errors humans make are associated with a change or something out of the ordinary occurring in their environment.45 The absolute prediction and elimination of human error is elusive and will probably always remain so.i

What is known can be applied to reduce human error or, at the very least, to reduce its consequences.46

The traditional approach to human error is the one that “names, blames, and shames” an individual as “causing” the accident—often referred to as the person model. The underlying assumption is that mistakes and errors are the results of human negligence, inattention, carelessness, lack of skill or knowledge, lack of motivation, or one of a host of other faulty, negative mental processes.47

The person model uses fear and discipline to attempt to improve safety. It treats errors as a “moral” issue, on the premise that “bad things happen to bad people.”

On the other hand, the system model recognizes that systemic factors contribute to errors. This model acknowledges that the organizational culture, human-to-system interface design, and environmental elements can create latent failure conditions. These latent failure conditions contribute to human error. The system model recognizes human limitations and accepts that human mistakes are inevitable. Therefore, systems should be designed to anticipate human error and to mitigate its consequences.ii

Indeed, “errare humanum est” (to err is human)iii; therefore, in any business endeavor, planning for the probability that human error will happen is a reasonable management strategy because workers at all levels will make mistakes.

Most of those mistakes will not be serious. Some could involve forgetfulness (I forgot to ship on time), and others could be financial errors (the two numbers have been transposed).

Other mistakes will be administrative errors or omissions that result from incorrect procedures or misunderstandings, usually because of unclear communication.

These types of mistakes are easily remedied and corrected. Proper training will help minimize their repetition. These types of errors are useful reminders that the workplace is also a constant learning place. When errors are manageable, which can also mean fixable, they become constructive learning opportunities for all involved.

Other types of errors include those that happen because of events outside the control of the business and those that could have serious negative consequences for the company but were overlooked.

Situations outside the control of the business can include natural disasters, such as earthquakes, tornados, floods, and so on; economic disasters, such as recessions, inflation, currency fluctuations, and so on; political events, such as terrorism, rebellions, and even wars; and, of course, human misconduct, such as corruption, embezzlement, sabotage, and cyber- and hacker attacks.

For situations out of the business’s control, three management approaches can be considered.

The first approach is to just “forget about it,” which is the most common action small- and medium-sized businesses take. It can be too demanding for most small- and medium-sized businesses to plan for these types of contingencies.

The second approach is to buy some form of business insurance. This is not always affordable or available to small- and medium-sized businesses. Yet, this option is worth pursuing when the stakes become high enough.

The third approach is to plan for some of those inconceivable situations.

Vignette: Preparing for the Inconceivable

The inconceivable happened on September 11, 2001.

Since then, the global business community has been made aware of the effects of major disasters on business operations, disasters going far beyond those resulting only from technology service disruptions. Companies with solid business continuity plans (BCPs)iv survived. The others faltered.

As a partner of a technology consulting firm at the time, I had the opportunity to develop several IT disaster recovery plans (DRPs)48 for financial institutions and retail service companies.

The typical scenario involved a detailed analysis of the IT the company was using (hardware and software) and devising a set of procedures to follow if or when the computer system failed.

This included exploring the use of alternative hot or cold sites. A hot site would consist of a duplication of the company’s computer infrastructure, including the operation of its network (a costly approach). A cold site would be an empty shell of technology equipment, usually shared with other companies, that could be converted into a workable site in short order (within hours or days) in times of need.

Some options would also include mutual sharing of technology infrastructures when possible, which was rare because of costs or competition concerns.

Recently, cloud computingv and the Internet-based virtual computing environment49 have become other attractive options for most small- and medium-sized businesses.

Unfortunately, technology is not the only thing that can go wrong in business.

Although technology-based DRPs have been around for several decades because technology is generally considered a critical business function, functional DRPs, also called BCPs,50 have taken a back seat and have often been forgotten by small- and medium-sized business executives.

According to Jim Hoffer, CIO and vice president of management information systems at Synertech, Harrisburg, Pennsylvania, 80 percent of companies that have a DRP cover mainly their IT resources. About 50 percent cover their networks, and one-third tend to protect in some way the information residing on their personal computers.51

Public Safety Canada (PSC) has developed a comprehensive website on the topic of BCP.52

It is essential to understand the importance of going beyond a technology DRP and preparing a more comprehensive BCP in case of a major business disruption.

I do not intend to repeat here all the information available on PSC’s website, but I believe that it would be useful to review some of the major points covered by the PSC material.

The inconceivable is not.

Lessons Learned: 9/11

The lessons learned from the 9/11 event include

The importance of considering all types of threats

The value of ensuring key personnel backup

The significance of having alternative networks and telecommunications options

The availability of a comprehensive IT backup system

Other major BCP considerations include

The need for providing employee support (counseling) when needed

The need to update and frequently test the BCP

The need for alternative operation sites because sizable security perimeters may surround the scene of incidents involving national security or law enforcement and can prevent personnel from returning to office buildings

The need for planning longer recuperation periods because increased uncertainty (following a high-impact disruption such as terrorism) may lengthen the time until operation is normalized

Adapted from Public Safety Canada.

Lessons Learned: Murphy’s Law

I have seen many small, medium-sized, and big businesses forget that “anything that can go wrong will go wrong.” When the unplanned or inconceivable happens, the typical reaction is to find someone to blame.

On one of the main inside walls of my manufacturing company’s plant, I had a sign that read: “No blame.”

One version of Murphy’s Law states,

“The sandwich will always fall on the side of the jam.”

iResearch by Dr. James Reason of the University of Manchester in England found that humans commit an average of six errors per week.

iiA system model is the conceptual model that describes and represents a system. A system comprises multiple views such as planning, requirement (analysis), design, implementation, deployment, structure, behavior, input data, and output data views. A system model is required to describe and represent these multiple views. (Source: Wikipedia.)

iiiFrom a longer quote as follows: “Errare (Errasse) humanum est, sed in errare (errore) perseverare diabolicum,” attributed to Seneca, which translates to: “To err is human, but to persist in error (out of pride) is diabolical.” (Source: https://en.wiktionary.org/wiki/errare_humanum_est#Latin.)

ivBCP (also called business continuity and resiliency planning [BCRP])

vCloud computing is a recently evolved computing term or metaphor based on the utility and consumption of computing resources. Cloud computing involves deploying groups of networked remote servers and software that allow centralized data storage and online access to computer services or resources. Clouds can be classified as public, private, or hybrid. (Source: Wikipedia.)
