14.7 Examples of Fault Tree Analysis

14.7.1 Simple Example

The following examples show the fundamentals of FTA. We start by analyzing a simple cooling system flushing procedure. This procedure could also be analyzed using human reliability analysis (HRA) techniques, but we use FTA here. The procedure reads as follows:

Warning: cooling system must be below 100°F before draining.
1. Begin with the engine cold and ignition off.
2. Remove the radiator pressure cap.
Warning: ethylene glycol coolant is toxic and must be disposed of in an appropriate manner.
3. Open the petcock at the bottom of the radiator and drain the coolant into a bucket.
4. Close the petcock and fill the radiator with water.
5. Start the engine and turn the heater control to hot. Add cooling system cleaner and idle the engine for 30 min (or as per the instructions on container).
Warning: cooling system must be below 100°F before draining.
6. Stop the engine and allow it to cool for 5 min. Drain the system.
7. Close the petcock, fill the radiator with water, and let the engine idle for 5 min.
8. Open petcock and drain the water.
9. Repeat steps 6–8.
10. Close the petcock.
11. Fill cooling system with 50 : 50 mixture of water/nontoxic antifreeze/coolant.

The first step is to determine the credible top events. In this case, they are as follows:

Mechanic is burned.
Cooling flushing failed.

In fact, the “mechanic is burned” top event can be grouped under the “cooling flushing failed” top event because, if the mechanic were burned, the task would in a sense have failed.

From the procedure, we can identify several basic events. These are shown in Table 14.5, along with their credibility.

Table 14.5 Basic Events

| Failure | Description | Credible failure |
| --- | --- | --- |
| Engine not below 100°F before beginning flushing procedure | In this failure, the mechanic begins the coolant draining process without ensuring the engine is cool enough | This is a credible error. It happens all the time to professional as well as amateur mechanics. Since the system is under pressure, a severe burn can occur |
| Failure to remove radiator cap | As it says, the mechanic fails to remove the radiator cap | This is not a credible error, unless we are modeling the fact that the mechanic does not do the task at all |
| Failure to drain radiator | The mechanic fails to drain the radiator | This is not a credible error, unless, again, we are modeling the fact that the mechanic does not do the task at all |
| Failure to close petcock valve | This failure involves the mechanic not closing or incorrectly closing the petcock valve. This error can occur at least four times in the procedure | This is a credible error and can lead to an environmental spill |
| Failure to add flushing agent | The mechanic fails to add the flushing agent | This is a credible error because the mechanic can get busy and forget where they are in the process |
| Failure to remove flushing agent | The mechanic fails to remove the flushing agent | This is a credible error because, once again, the mechanic can get busy and forget where they are in the process. Alternatively, a shift change or job changeover occurs, and the second mechanic does not know where in the process the first one stopped. This happens in the airline industry every day |
| Failure to rinse engine | The mechanic fails to rinse the remaining flushing agent from the engine | This might not be a catastrophic error once the engine is drained of the flushing agent. It depends on how corrosive the flushing agent is |
| Failure to fill engine with 50/50 nontoxic antifreeze mix | The mechanic fails to fill the cooling system with the proper mixture, with the right amount of coolant, or with any coolant at all | This again is a credible error |

Once the credibility of the failures is assessed, the next step is to construct the fault tree. Errors that were deemed noncredible are not included. The top event is “cooling flushing failed,” and all the credible basic events are entry points into the tree. The tree is shown in Figure 14.3.

Figure 14.3 Fault tree analysis of coolant flushing task.


Notice that the fault tree was built using only an OR gate and the basic events. That is because any one of the credible failures is, by itself, enough to cause the task to fail.
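The single OR gate can be sketched in a few lines of Python. This is a minimal illustration, not the model behind Figure 14.3: the event names are shortened from Table 14.5, the probabilities are made-up placeholders, the rinse failure is included only as an assumption, and the probability formula assumes the basic events are independent.

```python
from math import prod

# Credible basic events from Table 14.5 (noncredible ones are left out).
# The rinse failure is included as an assumption; the table leaves it open.
# The probabilities are illustrative placeholders, not data from the text.
BASIC_EVENTS = {
    "engine not below 100F before flushing": 0.01,
    "failure to close petcock valve": 0.02,
    "failure to add flushing agent": 0.01,
    "failure to remove flushing agent": 0.01,
    "failure to rinse engine": 0.01,
    "failure to fill with 50/50 antifreeze mix": 0.01,
}


def or_gate(states):
    """An OR gate fires when at least one input event has occurred."""
    return any(states)


def or_gate_probability(probabilities):
    """P(at least one input event), assuming the events are independent."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def flush_failed(occurred):
    """Top event: the flush fails if any credible basic event occurred."""
    return or_gate(name in occurred for name in BASIC_EVENTS)


if __name__ == "__main__":
    print(flush_failed(set()))
    print(flush_failed({"failure to close petcock valve"}))
    print(or_gate_probability(BASIC_EVENTS.values()))
```

Because every basic event feeds the same OR gate, adding another credible failure to the dictionary can only raise the probability of the top event.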

Next is a model of a simple hardware system failure that most homeowners have experienced: a sprinkler system failure. The top event is “sprinkler system failure.” Table 14.6 contains the credible failures that can lead to the top event.

Table 14.6 Credible Failures

| Failure | Description | Credible failure |
| --- | --- | --- |
| Sprinkler head failure 1 | Sprinkler head fails because it wore out | Yes, it is a credible failure |
| Sprinkler head failure 2 | Sprinkler head fails because a neighbor hits it with their lawn mower | Yes, it is a credible failure |
| Sprinkler valve failure 1 | Sprinkler valve wears out | Yes, it is a credible failure |
| Sprinkler valve failure 2 | Sprinkler valve breaks due to freezing | Yes. This failure, though, is contingent on the system not being properly drained the fall before, so we will model it in this manner |
| Sprinkler controller failure 1 | The battery that backs up the memory fails, and after a power failure, the controller has lost its stored settings | Yes, it is a credible failure |
| Sprinkler controller failure 2 | The sprinkler controller fails | Yes, it is a credible failure |
| Sprinkler pipe failure 1 | The sprinkler pipe breaks due to freezing | Yes. This failure, though, is contingent on the system not being properly drained the fall before, so, again, we will model it in this manner |
| Sprinkler pipe failure 2 | The sprinkler pipe breaks due to digging in the yard | Yes, it is a credible failure |

In Figure 14.4, we have constructed a partial fault tree from these failures. A full tree was not constructed to save space.

Figure 14.4 Partial fault tree of sprinkler system failure.
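The contingent failures in Table 14.6 are where an AND gate becomes necessary: a freeze break only counts if the system was also not drained. The following Python sketch shows one way this could be expressed. The event names and the grouping by component are our assumptions for illustration and are not taken from Figure 14.4.

```python
def and_gate(*inputs):
    """Fires only when every input event has occurred."""
    return all(inputs)


def or_gate(*inputs):
    """Fires when at least one input event has occurred."""
    return any(inputs)


def sprinkler_system_failed(events):
    """Evaluate the top event from a dict of basic events (True = occurred).

    Event names are hypothetical labels for the rows of Table 14.6.
    """
    def e(name):
        return events.get(name, False)

    head = or_gate(e("head_worn_out"), e("head_hit_by_mower"))
    # Freeze damage is contingent on the system not having been drained.
    valve = or_gate(
        e("valve_worn_out"),
        and_gate(e("valve_freezes"), e("system_not_drained")),
    )
    controller = or_gate(e("controller_memory_lost"), e("controller_fails"))
    pipe = or_gate(
        and_gate(e("pipe_freezes"), e("system_not_drained")),
        e("pipe_broken_by_digging"),
    )
    return or_gate(head, valve, controller, pipe)


if __name__ == "__main__":
    # A hard freeze alone does not fail a properly drained system.
    print(sprinkler_system_failed({"pipe_freezes": True}))
    print(sprinkler_system_failed({"pipe_freezes": True,
                                   "system_not_drained": True}))
```

In this sketch, "system_not_drained" is shared by the valve and pipe branches, so a single maintenance lapse enables two different failure paths.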


14.7.2 Modeling Success Using Fault Tree Analysis

One of the useful attributes of FTA is that it can be used to model success paths in addition to its more traditional use of modeling failure paths. For instance, say that a hiker wants to climb Mt. Everest. What must happen in order to do such a climb? Or say that someone wants to pass a certification examination or even complete a project on time. What does that person need to do to succeed in those goals? Obviously, there are many project management techniques that are great at modeling success paths for projects. Network diagrams are one example of a tool that can be used. However, FTA can also be used to model this type of process.

So, let us say our goal is to write a technical book that has 12 chapters and meet the contractual requirements of the publisher. As with modeling in the failure space, we need to develop the list of credible events that must occur to succeed. Table 14.7 lists these.

Table 14.7 Credible Events

| Success | Description | Credible |
| --- | --- | --- |
| Author 1 completes six chapters | Author 1 completes the chapters assigned to him/her | Yes, this has to occur to succeed |
| Author 2 completes six chapters | Author 2 completes the chapters assigned to him/her | Yes, this has to occur to succeed |
| Editor's changes are appropriate | The Editor's changes must not change the technical content of the book and must be grammatically appropriate | Yes, this has to occur to succeed |
| Artwork meets requirements | The artwork has to meet the Publisher's requirements to be included | Yes, this has to occur to succeed |
| Manuscript is formatted correctly | Besides the book needing to meet technical and grammatical requirements, it also has to be formatted correctly | Yes, this has to occur to succeed |
| Manuscript is submitted on time | The manuscript has to be submitted on time to be accepted | Yes, this has to occur to succeed |

Although, in real life, an event tree is probably a better tool for modeling this process, we develop the model using a fault tree. Chapter 12 shows how this process can be modeled using an event tree. Figure 14.5 shows a fault tree for this process.

Figure 14.5 Fault tree for success model.
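Since every event in Table 14.7 has to occur for the book to succeed, a success model of this kind reduces to an AND gate over the required events. The Python sketch below illustrates that idea. The function names are hypothetical, and the probability calculation assumes the events are independent, which the text does not claim.

```python
from math import prod

# Success events from Table 14.7; all of them must occur.
REQUIRED_EVENTS = [
    "author 1 completes six chapters",
    "author 2 completes six chapters",
    "editor's changes are appropriate",
    "artwork meets requirements",
    "manuscript is formatted correctly",
    "manuscript is submitted on time",
]


def and_gate(states):
    """An AND gate fires only when every input event has occurred."""
    return all(states)


def book_succeeds(achieved):
    """Top event: success requires every event in REQUIRED_EVENTS."""
    return and_gate(event in achieved for event in REQUIRED_EVENTS)


def missing_events(achieved):
    """List the required events that have not yet occurred."""
    return [event for event in REQUIRED_EVENTS if event not in achieved]


def success_probability(p_by_event):
    """P(all events occur), assuming the events are independent."""
    return prod(p_by_event[event] for event in REQUIRED_EVENTS)


if __name__ == "__main__":
    done = set(REQUIRED_EVENTS) - {"manuscript is submitted on time"}
    print(book_succeeds(done))
    print(missing_events(done))
```

The AND gate makes the success model unforgiving: with six required events, even a high probability for each one multiplies down to a noticeably lower probability of overall success.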


14.7.3 Fault Tree Analysis for Use in Accident Investigation

The following provides a description of an actual accident involving TAM Linhas Aéreas Flight 3054. An FTA will be constructed from the information provided.

On July 17, 2007, TAM Linhas Aéreas Flight 3054, an Airbus A320-233, left Salgado Filho International Airport in Porto Alegre and crashed while landing in wet conditions at Congonhas-São Paulo International Airport in São Paulo, Brazil. It was raining when the flight touched down. The plane overran the runway, crossed a very busy main road during rush hour traffic, and crashed into the TAM Express warehouse, which was next to a gas station that exploded with the force of the impact (4). All 187 people on board and 12 people on the ground were killed, for a total of 199 fatalities (5), making this crash the deadliest aviation accident in Latin America. It was also the deadliest Airbus A320 crash anywhere in the world (6).

The Airbus A320-233 was registered as PR-MBK and had the manufacturer's serial number 789. It was powered by two International Aero Engines V2527E-A5 engines. The aircraft was built in February 1998, took its first flight in March 1998, and had its last flight in July 2007 (7). TAM Linhas Aéreas was the last of four companies to operate the aircraft in less than a decade and did not come into possession of it until December 2006. Data collected from Flight International show that, as of April 30, the aircraft had accumulated 20,379 flying hours and 9313 cycles (7).

The aircraft was dispatched for Flight 3054 with a jammed thrust reverser, a braking device on the aircraft. According to TAM, the fault in the thrust reverser did not make the landing any more dangerous, and the mechanical problem was not known at the time. It was later reported that the plane had trouble braking on the São Paulo runway on July 16, the day before the crash, indicating that the airline had prior knowledge that something was wrong with the braking system (7).

Once the aircraft touched down in São Paulo, the pilots were unable to slow it at a normal rate. The aircraft was still traveling at approximately 90 knots as it neared the end of the runway. It veered sharply to the left and overran the runway. Because the runway was elevated, the aircraft cleared the major roadway, but it then collided with the TAM Express building. Surveillance videos showed that the aircraft touched down at a normal speed and at a normal spot on the runway but failed to slow down properly (8).

Authorities recovered the flight data recorder, which contained information about what happened in the plane during the flight. The data showed the following. The thrust levers had been in the climb position just before touchdown because the engines were being controlled by the computer system (4). The computer gave an audio warning 2 seconds before touchdown, advising the pilots to take manual control of the throttle. When the aircraft touched down, one thrust lever was in the idle position, while the other was stuck in the climb position. For the spoilers to deploy and help slow the aircraft, both thrust levers must be in the idle position. The unequal thrust on the two sides caused the plane to veer uncontrollably to the left (7).

Before the accident, the airport had come under increased scrutiny following a mid-air collision in September 2006. The airport was known to have safety issues regarding operations in the rain, as well as runway characteristics that were poorly suited to the traffic going through it. One of these characteristics was the length of the runway (4). Many variables can affect the landing distance of an airplane, and the airport had failed to consider them.

For example, if the aircraft's approach speed is 20 knots higher than normal, it will take the aircraft 25% longer to slow down. The runway had been seen as a problem before the accident, and in February 2007, a judge had actually banned flights using Fokker 100s, Boeing 737-700s, and Boeing 737-800s, stating that the runway needed to be an additional 1275 ft long for them to operate safely. The A320 was not banned because the manufacturer stated a shorter braking distance than that of the banned aircraft. However, the ban was quickly lifted after the airline industry stated that it would inconvenience thousands of passengers (6).

The root causes of the crash were that one of the thrust reversers was known to be inoperative before the flight, the runway was wet, and the runway should have been longer. While TAM claims that the thrust reverser should not have caused the crash, it is obvious that, had the thrust reverser been functioning, the aircraft would most likely have been able to stop. Grooves cut into the pavement could have reduced the standing water on the runway and the risk of hydroplaning, and had the runway been longer, the aircraft would have had more distance in which to stop (4).

Pilot training might also have contributed to the accident, even though both pilots were very well trained and had plenty of experience. Commander Kleyber Aguiar Lima, from Porto Velho, was born on March 22, 1953, worked for TAM from November 1987 to July 2007, and had over 14,000 flight hours in his career. Commander Henrique Stephanini Di Sacco, from São Paulo, was born on October 29, 1954, joined TAM in 2006, and also had over 14,000 flight hours in his career (5). They knew that one thrust reverser was not functional and should have planned the landing as if neither thrust reverser would work. They also knew that the landing strip was short and that it was wet. The combination of the short, wet landing strip and the malfunctioning thrust reverser should have alerted the pilots to take a different course of action.

Precautions should be taken to improve traction during wet weather. These include cutting grooves in the pavement so that water flows off the runway, which increases traction when an aircraft lands. The airport had just finished major renovations on the landing strip. It should have been mandatory that the strip be 100% finished before it was allowed to be used.

Also, warnings from the government stating that the runway was much too short for larger airliners to land on were dismissed far too quickly. Governmental rulings should be respected, and the necessary changes must be made to ensure the safety of all the passengers on all the planes. The airport officials knew that the runways were too short to handle such large planes, and yet they continued to allow those planes to land so as not to disrupt the economy.

As with most aircraft crashes, several factors led to the crash. The following is a list of items that contributed to it. This list is then used to construct the fault tree that is depicted in Figure 14.6:

  • Runway was wet.
  • Rain in area.
  • No grooves cut in runway.
  • Thrust reverser broken.
  • Airline policy allowed aircraft to be flown with broken thrust reverser.
  • Runway too short.
  • Airport policy allowed larger planes to land on runway.
  • Experienced pilots had not had training in this situation.

Figure 14.6 Fault tree for TAM Linhas Aéreas Flight 3054.
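One way to explore a tree like this during an investigation is to switch off one contributing factor at a time and see whether the top event still occurs. The Python sketch below does that for the factors listed above. The gate structure is our assumption and is not read from Figure 14.6: "runway was wet" is treated as an intermediate event fed by rain and the lack of grooves, each hardware or runway condition is paired with the policy that allowed it, and all branches are joined by an AND gate.

```python
def and_gate(*inputs):
    """Fires only when every input event has occurred."""
    return all(inputs)


# Contributing factors from the list above (True = factor was present).
# The keys are hypothetical labels chosen for this sketch.
FACTORS = {
    "rain_in_area": True,
    "no_grooves_in_runway": True,
    "thrust_reverser_broken": True,
    "policy_allowed_flight_with_broken_reverser": True,
    "runway_too_short": True,
    "policy_allowed_larger_planes": True,
    "pilots_not_trained_for_situation": True,
}


def crash_occurs(f):
    """Top event under the assumed gate structure (not Figure 14.6 itself)."""
    runway_wet = and_gate(f["rain_in_area"], f["no_grooves_in_runway"])
    landed_without_reverser = and_gate(
        f["thrust_reverser_broken"],
        f["policy_allowed_flight_with_broken_reverser"],
    )
    large_plane_on_short_runway = and_gate(
        f["runway_too_short"], f["policy_allowed_larger_planes"]
    )
    return and_gate(
        runway_wet,
        landed_without_reverser,
        large_plane_on_short_runway,
        f["pilots_not_trained_for_situation"],
    )


def single_point_barriers(f):
    """Factors whose removal alone prevents the top event in this model."""
    if not crash_occurs(f):
        return []
    return [name for name in f if not crash_occurs({**f, name: False})]


if __name__ == "__main__":
    print(crash_occurs(FACTORS))
    for name in single_point_barriers(FACTORS):
        print("removing this factor breaks the chain:", name)
```

Under this all-AND assumption, every listed factor is a point at which the accident sequence could have been interrupted. A tree with OR gates in it would give a shorter list of such points.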


The fault tree is very useful in showing how all the individual factors come together to cause the flight to crash. Is this all of them? No, there are some decision processes that the pilots went through that are not shown. These can be better modeled using HRA techniques, which are discussed in Chapter 10.
