Institutional and individual customers have an increasingly broad awareness of products (and services) and are making smarter purchasing choices. Because society as a whole continues to become more knowledgeable about product performance, quality, reliability, and cost, these attributes have become market differentiators.
People are responsible for designing, manufacturing, testing, maintaining, and disposing of the products that we use in daily life. You may agree with Neville Lewis, who wrote, “Systems do not fail, parts and materials do not fail—people fail!” (Lewis 2003). It is the responsibility of people to have the knowledge and skills to develop products that function in an acceptably reliable manner. These concepts highlight the purpose of this book: to provide the understanding and methodologies to efficiently and cost-effectively develop reliable products and to assess and manage the operational availability of complex products, processes, and systems.
This chapter presents the basic definitions of reliability and discusses the relationship between quality, reliability, and performance. Consequences of having an unreliable product are then presented. The chapter concludes with a discussion of supplier–customer reliability objectives and responsibilities.
The word quality comes from the Latin qualis, meaning “how constituted.” Dictionaries define quality as the essential character or nature of something, and as an inherent characteristic or attribute. Thus, a product has certain qualities or characteristics, and a product's overall performance, or its effectiveness, is a function of these qualities.
Juran and Gryna (1980) looked at multiple elements of fitness for use and evaluated various quality characteristics (or “qualities”), such as technological characteristics (strength, weight, and voltage), psychological characteristics (sensory characteristics, aesthetic appeal, and preference), and time-oriented characteristics (reliability and maintainability). Deming (1982) also investigated several facets of quality, focusing on quality from the viewpoint of the customer.
The American Society for Quality (ASQC Glossary and Tables for Statistical Quality Control 1983) defines quality as the “totality of features and characteristics of a product or service that bear on its ability to satisfy a user's given needs.” Shewhart (1931) stated it this way:
The first step of the engineer in trying to satisfy these wants is, therefore, that of translating as nearly as possible these wants into the physical characteristics of the thing manufactured to satisfy these wants. In taking this step, intuition and judgment play an important role, as well as a broad knowledge of the human element involved in the wants of individuals. The second step of the engineer is to set up ways and means of obtaining a product which will differ from the arbitrary set standards for these quality characteristics by no more than may be left to chance.
One of the objectives of quality function deployment (QFD) is to achieve the first step proposed by Shewhart. QFD is a means of translating the “voice of the customer” into substitute quality characteristics, design configurations, design parameters, and technological characteristics that can be deployed (horizontally) through the whole organization: marketing, product planning, design, engineering, purchasing, manufacturing, assembly, sales, and service.
Products have several characteristics, and the “ideal” state or value of these characteristics is called the target value (Figure 1.1). QFD (Figure 1.2) is a methodology to develop target values for substitute quality characteristics that satisfy the requirements of the customer. Mizuno and Akao developed the necessary philosophy, system, and methodology to achieve this step.
Although there is a consensus that reliability is an important attribute of a product, there is no universally accepted definition of reliability. Dictionaries define reliability (noun) as the state of being reliable, and reliable (adjective) as something that can be relied upon or is dependable.
When we talk about reliability, we are talking about the future performance or behavior of the product. Will the product be dependable in the future? Thus, reliability has been considered a time-oriented quality (Kapur 1986; O'Connor 2000). Some other definitions for reliability that have been used in the past include:
As is evident from the listing, the term reliability is interpreted in various ways, usually depending on the context of the discussion. However, in any profession, we need an operational definition of reliability, because for improvement and management purposes, reliability must be precisely defined, measured, evaluated, computed, tested, verified, controlled, and sustained in the field.
Because there is always uncertainty about the future performance of a product, that performance can be treated as a random variable, and the mathematical theory of probability can be used to quantify the uncertainty. Probability can be estimated using statistics, and thus reliability needs both probability and statistics. Phrases such as “perform satisfactorily” and “function normally” suggest that a product must function within certain performance limits in order to be reliable. Phrases such as “under specified operating conditions” and “when used according to specified conditions” imply that reliability is dependent upon the environmental and application conditions in which a product is used. Finally, the terms “given period of time” and “expected lifetime” suggest that a product must properly function for a certain period of time.
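Because future performance is treated as a random variable whose probability is estimated from data, a small sketch may help. It assumes the common formalization $R(t) = P(T > t)$, where $T$ is a unit's time to failure (notation not introduced in this section), and estimates it from a hypothetical life test in which every unit was run to failure, so there is no censored data. The failure times are invented for illustration.

```python
def empirical_reliability(failure_times, t):
    """Estimate R(t) = P(T > t) as the fraction of units that survived past t.

    Assumes every unit in the sample was run until it failed
    (complete data, no censoring).
    """
    if not failure_times:
        raise ValueError("need at least one observed failure time")
    survivors = sum(1 for time in failure_times if time > t)
    return survivors / len(failure_times)


# Hypothetical times to failure (hours) for ten units on test.
times = [120, 340, 410, 560, 700, 820, 950, 1100, 1300, 1650]

for t in (100, 500, 1000):
    print(t, empirical_reliability(times, t))  # 1.0, 0.7, 0.3
```

With real test data, some units usually survive to the end of the test, and estimators such as Kaplan–Meier are used instead; the simple fraction above applies only to complete data.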
In this book, reliability is defined as follows:
Reliability is the ability of a product or system to perform as intended (i.e., without failure and within specified performance limits) for a specified time, in its life cycle conditions.
This definition encompasses the key concepts necessary for designing, assessing, and managing product reliability. This definition will now be analyzed and discussed further.
When a product is purchased, there is an expectation that it will perform as intended. The intention is usually stated by the manufacturer of the product in the form of product specifications, datasheets, and operations documents. For example, the product specifications for a cellular phone inform the user that the cell phone will be able to place a call so long as the user follows the instructions and uses the product within the stated specifications.1 If, for some reason, the cell phone cannot place a call when turned on, it is regarded as not having the ability to perform as intended, or as having “failed” to perform as intended.
In some cases, a product might “work,” but do so poorly enough to be considered unreliable. For example, the cell phone may be able to place a call, but if the cell phone speaker distorts the conversation and inhibits understandable communication, then the phone will be considered unreliable. Or consider the signal problems reported for Apple's iPhone 4 in 2010. The metal bands on the sides of the iPhone 4 also acted as antennas for the device. Some users reported diminished signal quality when gripping the phone in their hands and covering the black strip on the lower left side of the phone. The controversy caused Apple to issue free protective cases for the iPhone 4 for a limited time to quell consumer complaints (Ionescu 2010).
When a product is purchased, it is expected that it will operate for a certain period of time.2 Generally, a manufacturer offers a warranty, which states the period during which the product should not fail and guarantees the customer a replacement if it does. For a cell phone, the warranty period might be 6 months, but customer expectations might be 2 years or more. A manufacturer that designs only for the warranty period can have many unhappy customers if these expectations are not met. For example, most customers expect their car to operate for at least 10 years with proper maintenance.
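The gap between a 6-month warranty and a 2-year customer expectation can be made concrete with a small calculation. This sketch assumes a constant failure rate, so that reliability follows the exponential model $R(t) = e^{-\lambda t}$; both the model and the rate of 0.2 failures per unit-year are hypothetical choices, not values from the text.

```python
import math


def reliability_exponential(failure_rate_per_year, years):
    """R(t) = exp(-lambda * t), assuming a constant failure rate."""
    return math.exp(-failure_rate_per_year * years)


lam = 0.2  # hypothetical failure rate, failures per unit-year

warranty = reliability_exponential(lam, 0.5)  # 6-month warranty period
expected = reliability_exponential(lam, 2.0)  # 2-year customer expectation

print(f"survive warranty: {warranty:.3f}")
print(f"survive 2 years:  {expected:.3f}")
print(f"fail after warranty, before 2 years: {warranty - expected:.3f}")
```

Under these assumptions, about 90 percent of units survive the warranty but only about 67 percent reach the 2-year mark, so roughly 23 percent of customers would see a failure after the warranty has expired yet before their expectation is met.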
The reliability of a product depends on the conditions (environmental and usage loads) that are imposed on the product. These conditions arise throughout the life cycle of the product, including in manufacture, transport, storage, and operational use.3 If the conditions are severe enough, they can cause an immediate failure. For example, if we drop or sit on a cell phone, we may break the display. In other cases, the conditions may only weaken the product, for example, by loosening a screw, initiating a crack, or increasing an electrical resistance. Under subsequent loads, however, such weakening may result in the product not functioning as intended. For example, a loosened screw falls out and a connection separates; a crack grows until joined parts separate; or a change in electrical resistance causes a switch to operate intermittently or a button to fail to send a signal.
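The idea that loads can weaken a product without breaking it, so that later loads complete the failure, is often modeled as cumulative damage. This sketch uses the Palmgren–Miner linear damage rule, a technique swapped in here for illustration rather than one this section prescribes; the load blocks, cycle counts, and cycles-to-failure values are all hypothetical.

```python
def miner_damage(load_blocks):
    """Accumulate linear (Palmgren-Miner) damage, D = sum(n_i / N_i).

    Each block is (cycles_applied, cycles_to_failure_at_that_load_level).
    Failure is predicted once D reaches 1.0.
    """
    return sum(applied / to_failure for applied, to_failure in load_blocks)


# Hypothetical life-cycle load history for one unit.
history = [
    (2_000, 100_000),  # transport vibration: 2% of life consumed
    (500, 5_000),      # storage temperature swings: 10% of life consumed
    (40_000, 50_000),  # operational use: 80% of life consumed
]

damage = miner_damage(history)
print(f"accumulated damage: {damage:.2f}")  # weakened, but not yet failed
print("failed" if damage >= 1.0 else "still functioning")
```

Another load block contributing just 0.08 more damage would push $D$ to 1.0, which mirrors the point that an earlier weakening can set up a later failure.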
Reliability is a relative measure of the performance of a product. In particular, it is relative to the following:
Furthermore, the reliability of a product will be dependent, as a probability, on the following:
Many organizations have a document called “Failure Definitions and Scoring Criteria.” Such a document delineates how each incident or call for attention in a product will be handled with regard to reliability, maintainability, or safety.
For consumer products, quality has been traditionally associated with customer satisfaction or happiness. This interpretation of quality focuses on the total value or the utility that the customer derives from the product. This concept has also been used by the U.S. Department of Defense, focusing on system effectiveness as the overall ability of a product to accomplish its mission under specified operating conditions.
There are various characteristics (e.g., engineering, technological, psychological, cost, and delivery) that impact customer satisfaction. Thus, quality (Q) may be modeled as:

$$Q = f(x_1, x_2, \ldots, x_i, \ldots, x_n, \ldots)$$

where $x_i$ is the $i$th characteristic ($i = 1, 2, \ldots, n, \ldots$).
These qualities will impact the overall value perceived by the customer, as shown in Figure 1.3. In the beginning, we have ideal or target values of the characteristics $x_1, x_2, \ldots, x_i, \ldots, x_n, \ldots$. These values result in some measure of customer satisfaction. With time, changes in these qualities will impact customer satisfaction. Reliability as a “time-oriented” quality impacts customer satisfaction.
The undesirable and uncontrollable factors that cause a functional characteristic to deviate from its target value are called noise factors. Some examples of noise factors are:
A reliable product must be robust over time, as demonstrated in Figure 1.4.
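The link between target values, noise factors, and robustness over time can be sketched numerically. This is a minimal illustration assuming a single functional characteristic (a hypothetical 5.0 V output with a ±0.25 V limit) that drifts linearly at a unit-specific rate; the spread of drift rates stands in for the effect of noise factors, and every number is invented.

```python
def fraction_within_limits(drift_rates, target, tolerance, t):
    """Fraction of units whose characteristic is still within target +/- tolerance.

    Assumes each unit drifts linearly, x(t) = target + rate * t, where the
    unit-to-unit spread in `rate` stands in for the effect of noise factors.
    """
    in_spec = 0
    for rate in drift_rates:
        x_t = target + rate * t  # characteristic value at time t
        if abs(x_t - target) <= tolerance:
            in_spec += 1
    return in_spec / len(drift_rates)


# Hypothetical output-voltage drift rates (volts per 1000 h) for eight units.
rates = [0.01, 0.02, -0.03, 0.04, 0.06, -0.07, 0.09, 0.12]

for t in (0, 2, 5, 10):  # time in thousands of hours
    print(t, fraction_within_limits(rates, target=5.0, tolerance=0.25, t=t))
```

Every unit starts on target, yet the fraction within limits falls from 1.0 to 0.5 by 5000 h and to 0.25 by 10,000 h. A more robust design would narrow the spread of drift rates rather than only improve the starting value.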
Performance is usually associated with the functionality of a product—what the product can do and how well it can do it. For example, the functionality of a camera involves taking pictures. How well it can take pictures and the quality of the pictures involves performance parameters such as pixel density, color clarity, contrast, and shutter speed.
Performance is related to the question, “How well does a product work?” For example, for a race car, speed and handling are key performance requirements. The car will not win a race if it is not fast enough. Of course, the car must also be reliable enough to finish the race. After the race, the car can be maintained and even replaced, but winning is everything.4
For commercial aircraft, the safe transportation of humans is the primary concern. To achieve the necessary safety, the airplane must be reliable, even if its speed is not the fastest. In fact, other than cost, reliability is the driving force for most commercial aircraft design and maintenance decisions, and is generally more important than performance parameters, which may be sacrificed to achieve the required reliability.
Improving the performance of products usually requires adding technology and complexity. This can make the required reliability more difficult to achieve.
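The claim that added technology and complexity make reliability harder to achieve can be illustrated with a series-system model. This sketch assumes that every part must work for the product to work and that parts fail independently, so system reliability is the product of the part reliabilities; the 99 percent part reliability is hypothetical.

```python
import math


def series_reliability(component_reliabilities):
    """Reliability of a series system, in which every component must work.

    Assumes components fail independently, so R_sys = product of R_i.
    """
    return math.prod(component_reliabilities)


# Hypothetical: each part is 99% reliable over the mission time.
for n_parts in (10, 50, 100):
    print(n_parts, round(series_reliability([0.99] * n_parts), 3))
```

Under these assumptions, 10 parts give a system reliability of about 0.904, 50 parts about 0.605, and 100 parts about 0.366, even though no individual part became less reliable. (`math.prod` requires Python 3.8 or later.)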
Quality is associated with the workmanship of the product. For example, the quality metrics of a camera might include defects in its appearance or operation, and the camera's ability to meet the specified performance parameters when the customer first receives the product. Quality defects can result in premature failures of the product.
Reliability is associated with the ability of a product to perform as intended (i.e., without failure and within specified performance limits) for a specified time in its life cycle. In the case of the camera, the customer expects the camera to operate properly for some specified period of time beyond its purchase, which usually depends on the purpose and cost of the camera. A low-cost, throwaway camera may be used just to take one set of pictures. A professional camera may be expected to last (be reliable) for decades, if properly maintained.
“To measure quality, we make a judgment about a product today. To measure reliability, we make judgments about what the product will be like in the future” (Condra 2001). Quality in this way of thinking is associated primarily with manufacturing, and reliability is associated mostly with design and product operation. Figure 1.5 shows the role of quality and reliability in product development.
Product quality can impact product reliability. For example, if the material strength of a product is decreased due to defects, the product reliability may also be decreased, because even lower-than-expected life-cycle conditions could then cause failures. On the other hand, a high-quality product may not be reliable, even though it conforms to workmanship specifications. For example, a product may be unable to withstand environmental or operational conditions over time due to the poor selection of materials, even though the materials meet workmanship specifications. It is also possible that the workmanship specifications were not properly selected for the usage requirements.
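The mechanism described above, in which defects lower material strength so that even lower-than-expected loads can cause failure, is often quantified with a stress–strength interference model. This sketch assumes strength and load are independent and normally distributed, a modeling choice introduced here for illustration; the means and standard deviations (in MPa) are hypothetical.

```python
import math


def stress_strength_reliability(mean_strength, sd_strength, mean_load, sd_load):
    """P(strength > load) for independent, normally distributed strength and load.

    R = Phi((mu_S - mu_L) / sqrt(sd_S**2 + sd_L**2)), where Phi is the
    standard normal CDF, computed here from math.erf.
    """
    z = (mean_strength - mean_load) / math.hypot(sd_strength, sd_load)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values (MPa): same load, nominal vs. defect-weakened material.
nominal = stress_strength_reliability(400, 30, 250, 40)
defective = stress_strength_reliability(320, 30, 250, 40)
print(f"nominal material:   {nominal:.4f}")
print(f"defective material: {defective:.4f}")
```

With these numbers, the nominal material gives a reliability of about 0.9987, while a defect-driven drop in mean strength from 400 to 320 MPa reduces it to about 0.919, with the load distribution unchanged.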
Reliability activities should span the entire life cycle of the system. Figure 1.6 shows the major points of reliability practices and activities for the life cycle of a typical system. The activities presented in Figure 1.6 are briefly explained in the following sections.
There is always a risk of a product failing in the field. For some products, the consequences of failure can be minor, while for others, they can be catastrophic. Possible consequences include financial loss, personal injury, and various intangible costs. Under U.S. law, consequences of product failure may also include civil financial penalties levied by the courts and penalties under statutes, such as the Consumer Product Safety Act, building codes, and state laws. These penalties can include personal sanctions such as removal of professional licenses, fines, and jail sentences.
When a product fails, there is often a loss of service, a cost of repair or replacement, and a loss of goodwill with the customer, all of which either directly or indirectly involve some form of financial loss. Costs can come in the form of losses in market share due to damaged consumer confidence, increases in insurance rates, warranty claims, or claims for damages resulting from personal injury. If negative press follows a failure, a company's stock price or credit rating can also be affected.
Often, costs are not simple to predict. For example, a warranty claim may include not only the cost of replacement parts, but also the service infrastructure that must be maintained in order to handle failures (Dummer et al. 1997). Repair staff must be trained to respond to failures. Spare parts may be required, which increases inventory levels. Service stations must be maintained in order to handle product repairs.
As an example of a financial loss, in July 2000, a month after the release of its new 1.13 GHz Pentium III microprocessors, Intel was forced to make a recall (Jayant 2000). The chips had a hardware glitch that caused computers to freeze or crash under certain conditions. Although fewer than 10,000 units were affected, the recall was an embarrassment and Intel's reputation was called into question at a time when competition in the microprocessor market was fierce.
In January 2011, Intel discovered a design flaw in its 6 Series Cougar Point support chips. Intel found that some of the connection ports in those chipsets could degrade over time and interrupt the flow of data from disk drives and DVD drives. By the time it discovered this problem, Intel had already shipped over 8 million defective chips to customers. As a result, Intel expected its revenue for the first quarter of 2011 to be cut by $300 million, and expected to spend $700 million for repair and replacement of the affected chips. This problem was the costliest in Intel's history and affected products from top manufacturers, including Dell, Hewlett-Packard, and Samsung (Tibken 2011).
Another example was problematic graphics processing units that were made by Nvidia. Customers began observing and reporting intermittent failures in their computers to companies such as Hewlett-Packard, Toshiba, and Dell. However, the absence of an effective reliability process caused a delay in understanding the problems, the failure mechanisms, the root causes, and the available corrective actions. These delays resulted in the continued production and sale of defective units, ineffective solutions, consumer and securities lawsuits, and costs to Nvidia of at least $397 million.6
In December 2011, Honda announced a recall of over 300,000 vehicles due to a defect in the driver's airbag. This was the latest in a series of recalls that had taken place in November 2008, June 2009, and April 2011, and involved nearly 1 million vehicles. The defective airbags were recalled because they could deploy with too much pressure, possibly endangering the driver (Udy 2011).
Between 2009 and 2011, Toyota had a string of recalls totaling 14 million vehicles. The problems included steering defects and the highly publicized sudden acceleration problem. In 2010 alone, Toyota paid three fines totaling $48.8 million. As a result of these safety concerns and damage to its reputation, Toyota had the lowest growth of the major automakers in the United States during 2010, growing 0.2 percent in a year when the U.S. auto market grew by 11.2 percent. Between July and September 2011, Toyota's profits declined 18.5 percent to around $1 billion (Foster 2011; Roland 2010a). In November 2011, Toyota recalled 550,000 vehicles worldwide due to possible steering problems caused by misaligned rings in the vehicles' engines.
The cost of failure also often includes financial losses for the customer incurred as a result of failed equipment not being in operation. For some products, this cost may greatly exceed the actual cost of replacing or repairing the equipment. Some examples are provided in Table 1.1 (Washington Post 1999).
Table 1.1 Cost of lost service due to a product failure
Type of business | Average hourly impact |
---|---|
Retail brokerage | $6,450,000 |
Credit card sales authorization | $2,600,000 |
Home shopping channels | $113,750 |
Catalog sales center | $90,000 |
Airline reservation centers | $89,500 |
Cellular service activation | $41,000 |
Package shipping service | $28,250 |
Online network connect fees | $22,250 |
ATM service fees | $14,500 |
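Table 1.1 lends itself to a quick estimate of what an outage costs the customer. The sketch below copies the table's hourly figures and assumes the average hourly impact applies uniformly for the whole outage; the outage durations in the usage lines are hypothetical.

```python
# Average hourly impact of lost service, from Table 1.1 (Washington Post 1999).
HOURLY_IMPACT_USD = {
    "Retail brokerage": 6_450_000,
    "Credit card sales authorization": 2_600_000,
    "Home shopping channels": 113_750,
    "Catalog sales center": 90_000,
    "Airline reservation centers": 89_500,
    "Cellular service activation": 41_000,
    "Package shipping service": 28_250,
    "Online network connect fees": 22_250,
    "ATM service fees": 14_500,
}


def outage_cost(business, outage_hours):
    """Estimate lost-service cost as hourly impact times outage duration.

    Assumes the average hourly impact applies uniformly over the outage.
    """
    return HOURLY_IMPACT_USD[business] * outage_hours


print(outage_cost("Retail brokerage", 2))    # 12900000
print(outage_cost("ATM service fees", 0.5))  # 7250.0
```

Even a 2-hour outage at a retail brokerage would, under this linear assumption, cost far more than repairing or replacing most of the equipment involved.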
The National Society of Professional Engineers notes that “Engineers, in the fulfillment of their professional duties, shall hold paramount the safety, health, and welfare of the public” (National Society of Professional Engineers 1964). In many cases, public health, safety, and welfare are directly related to reliability.
On July 17, 1981, the second- and fourth-floor suspended walkways within the atrium of the Kansas City Hyatt Regency Hotel collapsed. This was the single largest structural disaster in terms of loss of life in U.S. history at that time. The hotel had only been open for a year. The walkway support connections failed, and both walkways collapsed onto the crowded first-floor atrium below. One hundred fourteen people were killed, and over 200 were injured. Millions of dollars in damages resulted from the collapse (University of Utah, Mechanical Engineering Department 1981). The accident occurred due to improper design of the walkway supports: the connections between the hanger rods and the main-carrying box beams of the walkways failed. Two errors contributed to the deficiency: a serious error in the original design of the connections, and a change in the hanger rod arrangement during construction, which doubled the load on the connection.
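The doubling of the load can be seen with a simple static load count. Assuming, for illustration only, that each walkway transmits the same load $P$ to its hanger connection (a simplification; actual walkway loads are not given here), the load carried by the fourth-floor box-beam connection is:

$$F_{\text{original}} = P, \qquad F_{\text{as built}} = P_{\text{4th floor}} + P_{\text{2nd floor}} = 2P$$

In the original design, a continuous rod from the ceiling carried the second-floor walkway's load past the fourth-floor beam. In the as-built arrangement, that load was hung from the fourth-floor beam itself, so the same connection had to carry both walkways.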
Another significant failure occurred on April 28, 1988, when a major portion of the upper crown skin of the fuselage of a 19-year-old Aloha Airlines 737 blew open at 24,000 ft. The structure separated in flight, causing an explosive decompression of the cabin that killed a flight attendant and injured eight other people. The airplane was determined to be damaged beyond repair. The National Transportation Safety Board (NTSB), which investigated the Aloha accident, concluded that the jet's roof and walls tore off in flight because there were multiple fatigue cracks in the jet's skin that had not been observed in maintenance. The cracks developed because the lap joints, which connect two overlapping metal sheets of the fuselage and were supposed to hold the fuselage together, corroded and failed (Stoller 2001).
In September 2011, the Federal Aviation Administration (FAA) fined Aviation Technical Services Inc. (ATS), a maintenance provider for Southwest Airlines, $1.1 million for making improper repairs to 44 Southwest Boeing 737-300 jetliners. The FAA had provided directives for finding and repairing fatigue cracks in the fuselage skins of the planes. The FAA alleged that ATS failed to properly install fasteners in all the rivet holes of the fuselage skins. In April 2011, a 5-ft hole was torn in the fuselage of a Southwest 737-300 in midflight at 34,000 ft. The pilot was able to make an emergency landing in Arizona, and none of the 122 people on board were seriously injured. While this plane was not among the ones repaired by ATS, this near-disaster highlighted the need for correct maintenance practices. After the incident, Southwest inspected 79 other Boeing 737s and found that five of them had fuselage cracks requiring repairs (Carey 2011).
On July 23, 2011, a high-speed train collided with a stalled train near the city of Wenzhou in southeastern China. It was reported that 40 people were killed and nearly 200 wounded. When he visited the scene of the accident, Chinese Premier Wen Jiabao said, “The high-speed railway development should integrate speed, quality, efficiency and safety. And safety should be in the first place. Without safety, high-speed trains will lose their credibility” (Dean et al. 2011).
There are a number of legal risks associated with product reliability and failure. A company can be sued for damages resulting from failures. A company can also be sued if it did not warn users of defects or reliability problems. In extreme cases of negligence, criminal charges can be brought in addition to civil damages.
Most states in the United States apply the theory of strict liability. Under this doctrine, a company is liable for damages resulting from a defect simply because the defect exists, and a plaintiff does not need to prove any form of negligence to win the case. Companies also have a duty to exercise “ordinary and reasonable care” to make their products safe and reliable; under a negligence claim, if a plaintiff can prove that a defect or risk existed with a product, that this defect or risk caused an injury, that this defect or risk was foreseeable, and that the company breached its duty of care, damages can be assessed. A defect, for legal purposes, can include manufacturing flaws, design oversights, or inadequacies in the documentation accompanying a product. Thus, almost every job performed by a designer or an engineer can be subjected to legal scrutiny.
An example of failure resulting in legal liability occurred with 22 million Ford vehicles built between 1983 and 1995 that had defective thick film ignition (TFI) modules. The TFI module was the electronic control in the ignition system that controlled the spark in the internal combustion process. Defects in the TFI could cause vehicles to stall and die on the highway at any time. Failure at highway speeds could cause the driver to lose control or result in a stalled vehicle being hit by another vehicle. In October 2001, Ford agreed to the largest automotive class-action settlement in history, promising to reimburse drivers for the faulty ignition modules. The settlement was estimated to have cost Ford as much as $2.7 billion (Castelli et al. 2003).
In 1999, Toshiba was sued for selling defective laptop computers (Pasztor and Landers 1999). More than five million laptops were built with a defective floppy disk drive controller chip that would randomly corrupt data without warning. Toshiba agreed to a $2.1 billion settlement to prevent the case from going to trial, as Toshiba felt that a verdict as high as $9 billion might have been imposed.
Another example of liability occurred with Toyota's vehicles. Toyota had a host of recalls in 2010, and it was required to pay over $32 million in fines because of the late timing of the recalls (Roland 2010b).
Depending on the expectations that customers have for a product, relations with customers can be greatly damaged when they experience a product failure. Failures can also damage the general reputation of a company. A reputation for poor reliability can discourage repeat and potential future customers from buying a product, even if the causes of past failures have been corrected.
In some cases, the effects of a lack of reliability can hurt the national psyche, for example, failures in space, military, and transportation applications. The higher the profile of a failure event, the greater its effect on society. Failures that affect public health and the environment can also create discontent with government and regulatory bodies.
The rapid pace of technological developments and the globalization of supply chains have made customers dependent upon worldwide suppliers who provide parts (materials), subassemblies, and final products. When customers have to wait until they receive their parts, subassemblies, or products to assess if they are reliable, this can be an expensive iterative process. An upfront evaluation of suppliers is a beneficial alternative. Measuring the reliability capability of a supplier yields important information about the likelihood that a reliable product can be produced (Tiku et al. 2007). Reliability capability can be defined as follows:
Reliability capability is a measure of the practices within an organization that contribute to the reliability of the final product, and the effectiveness of these practices in meeting the reliability requirements of customers.
To obtain optimal reliability and mutually beneficial results, suppliers and customers in the supply chain should cooperate. The IEEE Reliability Program Standard 1332 (IEEE Standards Project Editors 1998) identifies three reliability objectives between suppliers and customers:
Reliability pertains to the ability of a product to perform without failure and within specified performance limits for a specified time in its life-cycle application conditions. Performance and quality are related to reliability. Performance parameters typically describe the functional capabilities of a product. Quality parameters are commonly used to assess the manufacturing goodness and the ability of a product to work when first received by the customer.
Reliability engineering deals with preventing, assessing, and managing failures. The tools of reliability engineers include statistics, probability theory, and many fields of engineering and the sciences related to the problem domain.
1.1 Pick an example product and explain the differences between performance, quality, and reliability. Select a datasheet for a product and check what is listed in terms of performance, quality, and reliability. Document your observations.
1.2 Identify the reliability metrics provided in the specification sheets of a part or product. Discuss the relevance of these metrics.
1.3 Find an example of an actual product failure. Why did it occur? What was the root cause of the failure? What were the consequences? Can you put a value (e.g., time and money) on the consequences?
1.4 In some situations, the definition of failure may depend on both the performance specifications and expectations. Can you think of a past experience where you considered a product to have failed but it may not have been considered a failure according to the product specifications? Describe the situation. If you cannot think of a situation, report a hypothetical case.
1.5 Prepare a one-page brief on the “engineer's responsibility” laws and statutes of your country. If your country is the United States, choose another country's laws to report on.
1.6 Once a product leaves the manufacturer, it will be used in many different applications; some may not be for what the product was designed. From whom should product reliability/failure information be gathered? How should the many sources of reliability information be weighted?
1.7 In Section 1.6, four consequences of failure were introduced. Which of these do you think is most important? Why?