A propagated failure with global effect (PFGE) that originates from a system component causes the failure of the entire system [1]. As one type of common‐cause failure (CCF), PFGEs have been investigated intensively in the literature (see, e.g. [2–6]). Examples of causes of PFGEs include imperfect coverage (IPC) and destructive effects. Specifically, as discussed in Chapter 3, under IPC a component fault that is not detected or located successfully by the system recovery mechanism may propagate and cause an overall system failure even when adequate redundancy remains. In addition, certain types of failures originating from a system component can have destructive effects on other components; for example, fire, explosion, overheating, blackout, or short circuit may incapacitate or destroy all other system components, causing the failure of the entire system.
However, a PFGE does not always cause the failure of the entire system, particularly in systems that exhibit the Functional DEPendence (FDEP) behavior. As described in Chapter 5, with the FDEP, a trigger event, upon occurring, deterministically isolates the corresponding dependent components (making them unusable or inaccessible). Due to this isolation effect, a PFGE originating from a dependent component can be isolated without affecting other portions of the system. For example, in a clustered wireless sensor network (WSN) system, sensor nodes within a cluster are accessed through their cluster head [7]. In other words, these sensor nodes have FDEP on the cluster head. If the cluster head fails, PFGEs originating from any of the sensor nodes within the cluster are isolated from the rest of the WSN system. Note that the failure isolation effect can take place only when the trigger event occurs before any PFGE originating from the corresponding dependent components. On the other hand, if any PFGE from a dependent component occurs before the trigger event, the global failure propagation effect takes place, causing the entire system to fail.
In summary, competitions exist in the time domain between the failure isolation effect and the failure propagation effect; different occurrence sequences lead to different system statuses. In this chapter, a separable method for handling PFGEs in system reliability analysis is first discussed. Based on this approach, methods are then presented for addressing the competing effects in the reliability analysis of different types of nonrepairable systems, including single‐phase system with single FDEP group, single‐phase system with multiple dependent FDEP groups, single‐phase system subject to propagated failures (PFs) with global and selective effects, multi‐phase system (or phased‐mission system, PMS) with single FDEP group, and PMS with multiple independent or dependent FDEP groups.
Til and Tip are random variables respectively representing the time‐to‐local‐failure and the time‐to‐PFGE of a system component i. fil(t) and fip(t) represent the probability density function (pdf) of Til and Tip, respectively. qil(t) and qip(t) are unconditional local and propagated failure probabilities of component i at time t, respectively. Thus, $q_{il}(t) = \int_0^t f_{il}(\tau)\,d\tau$ and $q_{ip}(t) = \int_0^t f_{ip}(\tau)\,d\tau$.
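According to the simple and efficient algorithm (SEA) in Section 3.3.2 [2, 4], the system unreliability can be evaluated based on the total probability law as:
$$UR(t) = 1 - P_u(t) + Q(t)\,P_u(t) \tag{8.1}$$
with Pu(t) being defined and computed as
$$P_u(t) = P(\text{no PFGEs}) = \prod_{\forall i}\,[1 - q_{ip}(t)] \tag{8.2}$$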
Q(t) in 8.1 is the conditional system failure probability given that no PFGEs take place during the considered mission time. Evaluating Q(t) requires no consideration of the effects of PFGEs and can thus be performed using any approach that ignores PFGEs, e.g. the binary decision diagram (BDD)–based methods [4, 7] for single‐phase systems (Section 2.4) and PMSs (Section 3.5.3).
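The separation of PFGEs from the rest of the analysis can be sketched in a few lines of Python. This is only a minimal illustration, assuming that PFGEs of different components are s‐independent (so that Pu(t) is a product) and that Q(t) has already been obtained by some PFGE‐free method such as a BDD evaluation. The function names and the numeric inputs are hypothetical.

```python
from math import prod

def prob_no_pfge(q_p):
    """P_u(t): probability that no component experiences a PFGE.

    Assumes PFGEs of different components are s-independent.
    """
    return prod(1.0 - q for q in q_p)

def system_unreliability(q_cond, q_p):
    """Total probability law: a PFGE fails the system with probability 1,
    otherwise the system fails with the conditional probability Q(t)."""
    p_u = prob_no_pfge(q_p)
    return (1.0 - p_u) + q_cond * p_u

# Hypothetical inputs, for illustration only:
# Q(t) = 0.05 and two components with PFGE probabilities 0.01 and 0.02.
ur = system_unreliability(q_cond=0.05, q_p=[0.01, 0.02])
print(f"system unreliability = {ur:.5f}")
```

The design point is that `q_cond` can come from any solver that knows nothing about PFGEs; only the component probabilities fed into that solver need to be conditioned on no PFGE occurring.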
As in the SEA method, the evaluation of Q(t) requires the calculation of a conditional component failure probability qi(t) given that no PFGEs occur to the component. The evaluation method is illustrated for different statistical relationships between the local failure (LF) and PFGE in the following sections.
When the LF and PFGE of the same component are s‐independent, the conditional component failure probability is evaluated as
$$q_i(t) = P(\text{LF} \mid \text{no PFGE}) = q_{il}(t) \tag{8.3}$$
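When the LF and PFGE of the same component are s‐dependent, the conditional component failure probability is evaluated as
$$q_i(t) = P(\text{LF} \mid \text{no PFGE}) = \frac{[1 - q_{ip|l}(t)]\,q_{il}(t)}{1 - q_{ip}(t)} \tag{8.4}$$
where, with $q_{ip|l}(t)$ and $q_{ip|\bar{l}}(t)$ denoting the PFGE probability of component i conditioned on the occurrence and nonoccurrence of its LF, respectively,
$$q_{ip}(t) = q_{ip|l}(t)\,q_{il}(t) + q_{ip|\bar{l}}(t)\,[1 - q_{il}(t)] \tag{8.5}$$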
When the LF and PFGE of the same component are disjoint or mutually exclusive, the conditional component failure probability is evaluated as
$$q_i(t) = P(\text{LF} \mid \text{no PFGE}) = \frac{q_{il}(t)}{1 - q_{ip}(t)} \tag{8.6}$$
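The three statistical relationships can be compared side by side with a small Python sketch. It simply applies the definition of conditional probability, P(LF | no PFGE) = P(LF and no PFGE) / P(no PFGE), to each case. The function and argument names are hypothetical, and the s‐dependent case assumes the two conditional PFGE probabilities (given LF, given no LF) are available as inputs.

```python
def q_independent(q_l, q_p):
    """s-independent LF and PFGE: conditioning on 'no PFGE' changes nothing."""
    return q_l

def q_disjoint(q_l, q_p):
    """Disjoint LF and PFGE: an LF already implies no PFGE."""
    return q_l / (1.0 - q_p)

def q_dependent(q_l, q_p_given_l, q_p_given_not_l):
    """s-dependent LF and PFGE, via Bayes' rule.

    q_p_given_l / q_p_given_not_l: PFGE probability conditioned on the
    occurrence / nonoccurrence of the LF (assumed to be given).
    """
    q_p = q_p_given_l * q_l + q_p_given_not_l * (1.0 - q_l)
    return q_l * (1.0 - q_p_given_l) / (1.0 - q_p)

# Hypothetical probabilities, for illustration only.
print(q_independent(0.1, 0.2))
print(q_disjoint(0.1, 0.2))
print(q_dependent(0.1, 0.3, 0.2))
```

A useful sanity check is that the s‐dependent formula collapses to the s‐independent one when both conditional PFGE probabilities are equal.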
Based on the PFGE method, a combinatorial methodology is discussed in this section for analyzing the reliability of a single‐phase system subject to competing failures involving a single FDEP group or multiple independent (nonoverlapped) FDEP groups. The method is applicable to arbitrary time‐to‐failure (ttf) distributions of the system components.
Given that the trigger component(s) can only experience LFs, the method contains the following three steps:
Note that in the case of the trigger component being subject to PFGEs in addition to the LF, the PFGE method presented in Section 8.2 should be applied to separate the global failure propagation effect originating from the trigger component before Step 1. qAl(t) in 8.7 should be, respectively, replaced with qA(t) evaluated using 8.3, 8.4, or 8.6 when the LF and PFGE of the trigger component are independent, dependent, or disjoint. Accordingly, the pdf of time‐to‐LF of trigger component A involved in 8.11, i.e. fAl(τ2) should be evaluated as dqA(t)/dt.
where
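In general, for n components with their ttf r.v.s represented by X1, …, Xn, the probability of their sequential failures (X1 fails first, then X2, and so on up to Xn, all within mission time t) is evaluated as [8]:
$$P(X_1 \to X_2 \to \cdots \to X_n) = \int_0^t \int_{\tau_1}^t \cdots \int_{\tau_{n-1}}^t \prod_{k=1}^{n} f_{X_k}(\tau_k)\, d\tau_n \cdots d\tau_1 \tag{8.10}$$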
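For two components the sequential failure probability reduces to a single integral, which makes it easy to check numerically. The sketch below assumes s‐independent, exponentially distributed ttf r.v.s X (rate a) and Y (rate b) and compares a closed form with a midpoint‐rule integration of the same expression. The function names and the rates are hypothetical and chosen only for illustration.

```python
from math import exp

def seq_two_exponential(a, b, t):
    """Closed form of P(X < Y <= t) for independent exponential X, Y."""
    return a / (a + b) * (1.0 - exp(-(a + b) * t)) - exp(-b * t) * (1.0 - exp(-a * t))

def seq_two_numeric(pdf_x, cdf_y, t, n=20000):
    """Midpoint rule for the integral of f_X(x) * [F_Y(t) - F_Y(x)] over [0, t]."""
    h = t / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += pdf_x(x) * (cdf_y(t) - cdf_y(x))
    return total * h

a, b, t = 5e-5, 1e-4, 1000.0  # illustrative rates (/hr) and mission time (hr)
closed = seq_two_exponential(a, b, t)
numeric = seq_two_numeric(lambda x: a * exp(-a * x),
                          lambda y: 1.0 - exp(-b * y), t)
print(closed, numeric)
```

The numeric routine accepts any pdf and cdf, so the same check can be repeated for non‐exponential ttf distributions, for which no closed form may be available.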
Thus, 8.8 can be evaluated as
where
By definition, in the case of the dependent components undergoing no PFGEs, P(E2) = 0.
The LF probability and the PFGE probability are, respectively:
When the LF and PFGE of component B or C are s‐dependent, conditional PFGE failure rates conditioned on the occurrence or nonoccurrence of an LF ($\lambda_{Bp|l}$, $\lambda_{Bp|\bar{l}}$, $\lambda_{Cp|l}$, $\lambda_{Cp|\bar{l}}$) are given. Two types of dependencies can be modeled [11]: positive dependence takes place if the LF of a component causes an increased tendency of the component's PFGE (thus, e.g. $\lambda_{Bp|l} > \lambda_{Bp|\bar{l}}$); negative dependence takes place if the LF of a component causes a reduced tendency of the component's PFGE (thus, e.g. $\lambda_{Bp|l} < \lambda_{Bp|\bar{l}}$).
Input Parameters. The following values are used in the illustrative analysis: λAl = 0.0001/hr, λBl = λCl = λDl = 0.0002/hr. For component B or C, two sets of parameters are considered. If the LF and PFGE are s‐independent or disjoint, λBp = λCp = 0.00001/hr; if the LF and PFGE are s‐dependent, λBp | l=λCp | l=0.00003/hr and /hr (positive dependence).
Example Analysis. The s‐independent case is used to illustrate the combinatorial method in detail.
Further, based on 8.12,
P(system fails|E1): under E1, no failure isolation effect takes place. Figure 8.3 shows the FT after removing the FDEP gate and its trigger component A. Based on the FT in Figure 8.3, P(system fails|E1) is evaluated using the PFGE method (Section 8.2) as follows.
According to 8.1,
where, based on 8.2,
To evaluate Q(t) in 8.23, component conditional failure probabilities are computed. For the s‐independent case, 8.3 is adopted for the computation, that is,
Figure 8.4 shows the BDD model generated from the FT in Figure 8.3 for Q(t) in 8.23.
Evaluating the BDD of Figure 8.4 using the component conditional failure probabilities computed using 8.25, Q(t) is obtained as
Substituting 8.24 and 8.26 into 8.23 gives 8.27.
Under E2, the global failure propagation effect takes place. Therefore, P(system fails|E2) = 1.
Under E3, the failure isolation effect takes place. Figure 8.5 shows the reduced FT generated for evaluating P(system fails|E3). Thus,
In the case of LF and PFGE of component B or C being s‐dependent or disjoint, the combinatorial method presented in Section 8.3.1 can be similarly applied to derive the system unreliability.
Using the given parameter values, the unreliability of the example memory subsystem under the three cases is summarized in Table 8.1. The system unreliability in the s‐independent case is lower than that in the disjoint case. This is because, for the same component parameter values, the component reliability in the s‐independent case (calculated as [1 − qil(t)]·[1 − qip(t)]) exceeds that in the disjoint case (calculated as 1 − qil(t) − qip(t)) by the nonnegative term qil(t)·qip(t). The system unreliability in the s‐dependent case is higher than that in the s‐independent or disjoint case due to the positive dependence assumed in the example input parameters.
Table 8.1 Unreliability of the example memory sub‐system.
Mission time t (hrs) | 1000 | 5000 | 10 000 |
s‐dependent | 0.0943 | 0.6417 | 0.8949 |
s‐independent | 0.0889 | 0.6128 | 0.8757 |
disjoint | 0.0894 | 0.6207 | 0.8799 |
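The component‐level comparison behind Table 8.1 can be checked numerically. The sketch below uses the LF and PFGE rates listed for component B in the input parameters and assumes exponential ttf distributions, i.e. q(t) = 1 − exp(−λt); the variable names are hypothetical.

```python
from math import exp

lam_l, lam_p, t = 2e-4, 1e-5, 1000.0  # LF rate, PFGE rate (/hr), mission time (hr)

q_l = 1.0 - exp(-lam_l * t)  # unconditional LF probability
q_p = 1.0 - exp(-lam_p * t)  # unconditional PFGE probability

r_independent = (1.0 - q_l) * (1.0 - q_p)  # s-independent LF and PFGE
r_disjoint = 1.0 - q_l - q_p               # disjoint LF and PFGE

# The two component reliabilities differ exactly by the product q_l * q_p.
gap = r_independent - r_disjoint
print(r_independent, r_disjoint, gap)
```

Because the gap is the product of two small probabilities, the resulting difference in system unreliability is also small, which agrees with the closeness of the s‐independent and disjoint rows of the table.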
This section considers the reliability analysis of a single‐phase system subject to competing failures involved in multiple dependent FDEP groups. The method is applicable to any arbitrary ttf distributions for the system components.
The combinatorial method contains the following three steps [12]:
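P(system failure|CTEi)P(CTEi) can be evaluated as
$$P(\text{system failure} \mid CTE_i)\,P(CTE_i) = P(\text{system failure} \mid E_{1,i})\,P(E_{1,i}) + P(\text{system failure} \mid E_{2,i})\,P(E_{2,i}) \tag{8.33}$$
where P(system failure|E2,i) = 1 due to the global failure propagation effect.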
Under E1,i, all PFGEs from dependent components either do not happen or are isolated. A reduced FT is generated for evaluating P(system failure|E1,i) in 8.33. Under each considered CTEi, the trigger event and the corresponding FDEP gate are first removed from the original system FT. If a trigger event occurs, then events of the corresponding dependent components are replaced with constant 1 (TRUE); otherwise, events of the dependent components remain in the FT. Boolean algebra rules are then applied to simplify the FT. The reduced FT generated can be evaluated using the BDD method [13] to find P(system failure|E1,i).
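The FT reduction described above (replace isolated dependent events with TRUE, then simplify) can be sketched as a small recursive function. The tuple‐based FT encoding, the function name, and the example tree are hypothetical and are not the FT of the example system; only the constant‐propagation rules for AND and OR gates are applied, without further simplifications such as absorption.

```python
def reduce_ft(node, isolated):
    """Replace basic events in `isolated` with TRUE and simplify the gates.

    A node is either a basic-event name (str) or a tuple (gate, child, ...),
    where gate is "AND" or "OR". Returns True if the (sub)tree is certainly
    failed, otherwise the simplified (sub)tree.
    """
    if isinstance(node, str):
        return True if node in isolated else node
    gate, *children = node
    reduced = [reduce_ft(child, isolated) for child in children]
    if gate == "OR":
        if any(child is True for child in reduced):
            return True  # 1 OR x = 1
        kept = reduced
    elif gate == "AND":
        kept = [child for child in reduced if child is not True]  # 1 AND x = x
        if not kept:
            return True
    else:
        raise ValueError(f"unsupported gate: {gate}")
    return kept[0] if len(kept) == 1 else (gate, *kept)

# Hypothetical FT: the system fails if (MC1 and MC2) or (MC2 and MC3) fail.
ft = ("OR", ("AND", "MC1", "MC2"), ("AND", "MC2", "MC3"))
print(reduce_ft(ft, {"MC3"}))
print(reduce_ft(ft, {"MC2", "MC3"}))
```

A result of `True` means the system is failed under the considered combination of trigger events, so no BDD evaluation is needed for that case.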
Note that the above three‐step procedure does not address PFGEs from nondependent components. In the case of nondependent components undergoing PFGEs, a pre‐processing step 0 described below is applied based on the PFGE method (Section 8.2):
Input Parameters. The exponential distribution is assumed for this illustrative example. The pdf and cdf of the exponential distribution with failure rate λ are given in 8.35.
$$f(t) = \lambda e^{-\lambda t}, \qquad F(t) = 1 - e^{-\lambda t} \tag{8.35}$$
The three MCs undergo both LFs and PFGEs with constant rates given in Table 8.2. The LF and PFGE of the same MC are s‐independent. The two MIUs only experience LFs with rates also given in Table 8.2.
Table 8.2 Failure rates of the example memory system components (/hr).
Component | PFGE rate | LF rate |
MCi | 0.00005 | 0.0002 |
MIUi | 0 | 0.0001 |
Example Analysis. The unreliability of the example memory system at time t = 1000 hours is analyzed using the method of Section 8.4.1 as follows.
Under CTE0, if any PFGE from a dependent component occurs, E2,0 takes place. So,
Thus,
Figure 8.7 shows the reduced FT for evaluating P(system failure | E1,0). Figure 8.8 shows the BDD model generated from the FT. The evaluation of the BDD model in Figure 8.8 gives
Under CTE1, only MIU2 fails locally. Thus,
Under CTE1, if any PFGE from MC1 or MC2 happens, or PFGE from MC3 happens before MIU2 fails, then event E2,1 takes place. So
The evaluation of P(E2,1) involves a sequential event, whose probability can be evaluated using 8.10. With P(E2,1), one obtains P(E1,1) = P(CTE1) − P(E2,1) = 0.07603.
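The value of P(E1,1) can be reproduced with a short Python sketch using the rates of Table 8.2 and t = 1000 hours. The decomposition below is one reading of the event definitions: under CTE1, MIU1 survives, MIU2 fails, MC1 and MC2 produce no PFGE, and no PFGE of MC3 precedes the failure of MIU2. The helper names are hypothetical, and the two‐component sequential probability is the closed form of 8.10 for exponential ttf distributions.

```python
from math import exp

T = 1000.0       # mission time (hr)
LAM_MIU = 1e-4   # LF rate of each MIU (/hr), Table 8.2
LAM_MC_P = 5e-5  # PFGE rate of each MC (/hr), Table 8.2

def cdf(lam, t):
    """Exponential cdf."""
    return 1.0 - exp(-lam * t)

def seq(a, b, t):
    """P(X < Y <= t) for independent exponential X (rate a) and Y (rate b)."""
    return a / (a + b) * (1.0 - exp(-(a + b) * t)) - exp(-b * t) * (1.0 - exp(-a * t))

# CTE1: MIU2 fails locally, MIU1 does not.
p_cte1 = cdf(LAM_MIU, T) * (1.0 - cdf(LAM_MIU, T))

# E1,1: no PFGE from MC1 or MC2, and no PFGE from MC3 before MIU2 fails.
p_mc12_no_pfge = (1.0 - cdf(LAM_MC_P, T)) ** 2
p_miu2_fails_without_earlier_pfge = cdf(LAM_MIU, T) - seq(LAM_MC_P, LAM_MIU, T)
p_e11 = (1.0 - cdf(LAM_MIU, T)) * p_mc12_no_pfge * p_miu2_fails_without_earlier_pfge

p_e21 = p_cte1 - p_e11
print(f"P(CTE1) = {p_cte1:.5f}, P(E2,1) = {p_e21:.5f}, P(E1,1) = {p_e11:.5f}")
```

Under these assumptions the computed P(E1,1) rounds to 0.07603, in agreement with the value stated above. By symmetry of the rates, the same computation applies to P(E1,2).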
Figure 8.9 shows the reduced FT for evaluating P(system failure | E1,1). Figure 8.10 shows the BDD model generated from the FT. The evaluation of the BDD model in Figure 8.10 gives P(system failure | E1,1) = 0.32968.
Under CTE2, only MIU1 fails locally. Thus,
Under CTE2, if any PFGE from MC2 or MC3 happens, or PFGE from MC1 happens before MIU1 fails, then event E2,2 takes place. So,
Thus, P(E1,2) = P(CTE2) − P(E2,2) = 0.07603.
Figure 8.11 shows the reduced FT for evaluating P(system failure | E1,2). Figure 8.12 shows the BDD model generated from the FT. The evaluation of the BDD model in Figure 8.12 gives P(system failure | E1,2) = 0.32968.
Under CTE3, both trigger components fail. Thus,
Under CTE3, if at least one PFGE from the three MCs happens before the corresponding trigger component fails, then event E2,3 takes place. So,
Thus,
When both of the trigger components fail, the entire system fails. Therefore P(system failure | E1,3) = 1.
A PF that originates from a system component causes extensive damage to the rest of the system. A PFGE occurs when the PF causes the entire system to fail. There also exists the propagated failure with selective effect (PFSE), which takes place when the PF causes the failure of only a subset of system components. This section presents a combinatorial reliability analysis method for single‐phase systems subject to competing failures considering both global and selective propagation effects [14].
The combinatorial reliability analysis method can be described as a seven‐step procedure:
Based on the total probability law, the system unreliability is evaluated as
where Pu(t) = P(no PFGEs) and Q(t) = P(system fails|no PFGEs). While the PFGEs are separated from the solution combinatorics via the PFGE method, the PFSEs have to be addressed in the evaluation of Q(t) as follows.
Given that up to m independent PFSEs may occur when the trigger component functions, an event space with $2^m$ events (denoted by SEi) is constructed, each being a combination of occurrence or nonoccurrence of these m PFSEs. Based on the total probability law, Q(t) in 8.42 is computed as
$$Q(t) = \sum_{i=0}^{2^m - 1} P(\text{system fails} \mid SE_i)\,P(SE_i) \tag{8.43}$$
where P(system fails |SEi) can be obtained through the BDD‐based evaluation of a reduced FT. The reduced FT is generated by the following procedure:
Based on the total probability law, P(system fails|E2)P(E2) is calculated as
where P(E1) is computed at step 2, P(system fails|E1) is evaluated at step 3, P(Case a) is computed at step 5, and P(system fails ∩ Case b) is evaluated at step 6.
The above seven‐step procedure assumes that any nondependent components (including the trigger component) only undergo LFs. In the following, the procedure is extended to consider (1) PFGEs or (2) PFSEs or (3) both types of PFs for nondependent components.
where P'u(t) = P(no PFGEs from nondependent components), Q'(t) = P(system fails|no PFGEs from nondependent components). Q'(t) is then evaluated using the seven‐step procedure.
To evaluate P(system fails|SE8), a reduced FT is generated by replacing events of components affected by SE8 (including B, C, D) with constant 1 in the FT of Figure 8.15 , which leads to an FT containing failure events of active components {E} as shown in Figure 8.17. The evaluation of reduced FT gives
Using a similar procedure, all of P(SEi) and P(system fails|SEi) (i = 0, …, 15) can be evaluated. According to 8.43,
With Pu(t) and Q(t) being evaluated, according to 8.42, P(system fails|E1) is obtained as
Table 8.5 Event space for addressing PFSEs.
i | Definition of SEi' | Set of active components |
0 | No PFs happen before Al occurs | {E} |
1 | Only Bps1 occurs before Al occurs | {E} |
2 | Only Bps2 occurs before Al occurs | ∅ |
3 | Only Cps1 occurs before Al occurs | {E} |
4 | Only Cps2 occurs before Al occurs | ∅ |
5 | Only Bps1 and Bps2 occur before Al occurs | ∅ |
6 | Only Bps1 and Cps1 occur before Al occurs | {E} |
7 | Only Bps1 and Cps2 occur before Al occurs | ∅ |
8 | Only Bps2 and Cps1 occur before Al occurs | ∅ |
9 | Only Bps2 and Cps2 occur before Al occurs | ∅ |
10 | Only Cps1 and Cps2 occur before Al occurs | ∅ |
11 | Only Bps1, Bps2 and Cps1 occur before Al occurs | ∅ |
12 | Only Bps1, Bps2 and Cps2 occur before Al occurs | ∅ |
13 | Only Bps1, Cps1 and Cps2 occur before Al occurs | ∅ |
14 | Only Bps2, Cps1 and Cps2 occur before Al occurs | ∅ |
15 | Bps1, Bps2, Cps1 and Cps2 all occur before Al occurs | ∅ |
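The 16 rows of the event space can be generated mechanically from the four PFSEs. The following sketch enumerates all subsets in order of increasing size, which happens to reproduce the row order of Table 8.5; the function name is hypothetical, and only the event definitions are generated, not the sets of active components.

```python
from itertools import combinations

PFSES = ["Bps1", "Bps2", "Cps1", "Cps2"]

def event_space(pfses):
    """All 2**m combinations of occurring PFSEs, ordered by subset size."""
    events = []
    for size in range(len(pfses) + 1):
        events.extend(combinations(pfses, size))
    return events

events = event_space(PFSES)
for index, occurring in enumerate(events):
    label = ", ".join(occurring) if occurring else "no PFs"
    print(index, label)
```

Each tuple lists the PFSEs that occur before Al occurs; the PFSEs missing from the tuple are the ones that do not.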
Next the evaluation of P(SEi′) and P(system fails | SEi′) is illustrated using SE0′ and SE1′.
Under SE0′, no PF occurs before Al occurs. Thus,
where, according to 8.45
Thus,
The reduced FT for evaluating P(system fails | SE0′) is same as that in Figure 8.17 . Thus, .
Under SE1′, only Bps1 occurs before Al occurs. Thus,
The reduced FT for evaluating P(system fails | SE1′) is same as that in Figure 8.17 . Thus, .
Using a similar procedure, all of P(SEi′) and P(system fails | SEi′) (i = 0, …, 15) can be evaluated. P(system fails ∩ Case b) can thus be evaluated using 8.46. Then, according to 8.44, one obtains
Previous sections focus on single‐phase systems. However, many real‐world systems are PMSs, involving multiple, consecutive, and nonoverlapping phases of operations or tasks. Consideration of competing failures in PMSs is a challenging task because PMSs exhibit dynamics in system configuration and component behavior, as well as statistical dependencies across phases for a given component.
This section presents a combinatorial method to address the competing failure effects in the reliability analysis of nonrepairable binary‐state PMSs, where only one mission phase is subject to the FDEP behavior. As an example of such a PMS, consider a set of computers that work together to accomplish an M‐phase mission task. In M − 1 of these phases, only local computing is needed (no FDEPs are involved), while in one of the phases, some computers need to connect to the Internet to access external data. Thus, in this particular phase these computers have FDEP on the router.
The phase with FDEP is referred to as an FDEP phase; other phases are referred to as non‐FDEP phases. All PFs have the global effect, i.e. only PFGEs are considered. The LF and PFGE of the same component are s‐independent. Also, failure events of different components are s‐independent. There is only one FDEP group existing in the system. Thus, the PMS considered undergoes no cascading failure propagation process.
The reliability analysis method for PMSs subject to competing failure isolation and propagation effects involves the following five‐step procedure [15]:
Let N denote the set of NDCs undergoing PFGEs. Pu(t) in 8.49 is evaluated as
$$P_u(t) = \prod_{i \in N} [1 - q_{ip}(t)]$$
As explained in Section 8.2, the evaluation of Q(t) requires the use of a conditional LF probability for all the NDCs given that no PFGEs occur to the component during the mission. Since the LF and PFGE of the same component are s‐independent, 8.3 is applied to compute the conditional LF probability. The evaluation of Q(t) is conducted in steps 2–4.
Under E2, the global failure propagation effect takes place, causing the entire system failure. Thus, Q2C = 1.
Case 1: The trigger LF occurs before any PFGE from the DCs. Since, under E3, all PFGEs from the DCs occur in the FDEP phase, the trigger LF occurs either in a phase before the FDEP phase or in the FDEP phase but before any PFGE from the DCs occurs.
Case 2: The trigger LF either occurs after a PFGE from the DCs has occurred or does not occur at all. Under E3, if the trigger LF occurs, it can only occur either in a phase after the FDEP phase or in the FDEP phase but after a PFGE from the DCs has occurred.
Let Q3,iC (i = 1, 2) denote the conditional system failure probability, given that Case i occurs. Based on the total probability law, one obtains
where Q3,2C = 1 because the global failure propagation effect takes place under Case 2.
Assume the FDEP phase is phase m. To evaluate P(Case 1) Q3,1C, an event space with m events is constructed, with event i (i = 1, …, m – 1) representing that the trigger component fails locally in phase i and E3 occurs, and event m representing that the trigger component fails locally in phase m (the FDEP phase) before any PFGE from the DCs occurs in phase m. Based on these events, 8.55 can be evaluated as
The occurrence probabilities of the m events are
The sequential failure probability involved in 8.58 can be evaluated using integral formula 8.10, which can be solved using the MathCAD software [17].
in 8.56 can be computed by evaluating a reduced PMS FT, which is obtained through the following procedure:
The reduced FT generated is then evaluated using the PMS BDD method described in Section 3.5.3 to obtain .
P(Case 2) in 8.56 can be simply computed as
Input Parameters. Assume the phase durations for the three phases are independent of the system state and equal to 10, 30, and 20 hours, respectively. Therefore, the entire mission time is 60 hours.
All five components experience LFs in each of the three phases. Only computers A, B, and C can undergo PFGEs during the mission (e.g. due to computer viruses). Let and represent the conditional probability that component x fails locally and globally at phase i, respectively, given that the component has survived the previous phase. Their complements are represented as and , respectively.
For illustration, three types of ttf distributions are considered for evaluating and :
Table 8.6 Failure parameters for components A, B, and C.
Component | LF (phases 1, 2, 3) | PFGE (phases 1, 2, 3) |
A | λ = 1.5e−4 | λ = 1e−4 |
B | λ = 2e−4 | λ = 1e−4 |
C | λ = 2e−4 | λ = 1e−4 |
Table 8.7 Failure parameters for component E.
Parameter | Phase 1 | Phase 2 | Phase 3 |
λW | 2e−4 | 1e−4 | 1.5e−4 |
αW | 2 | 2 | 2 |
Let and represent the unconditional probability that component x fails locally and globally by the end of phase i, respectively. and can be evaluated as
Their complements are represented as and , respectively, and can be evaluated as
Example Analysis. Applying the five‐step procedure in Section 8.6.1, the unreliability of the example PMS is analyzed as follows:
The evaluation of Q(t) in 8.49 requires the use of a conditional LF probability for A given that no PFGEs occur to A during the mission, which is evaluated based on 8.3.
There are two DCs: B and C. Both of them can experience PFGEs. Thus, the three events are defined as:
Under E2, the global failure propagation effect from B or C occurs causing the PMS failure. Thus, Q2C = 1.
Under Case 1, the following two events are defined: event 1 – trigger A fails locally in phase 1 and E3 happens, event 2 – A fails locally in phase 2 and before B or C fails globally in phase 2. P(event 1) is evaluated as
To evaluate in 8.56, a reduced FT is generated in Figure 8.21.
Figure 8.22 shows the PMS BDD generated by the PMS BDD method using the variable order Bl1 < Cl1 < Dl3 < Dl1 < El3 < El2. The evaluation of the PMS BDD model gives
Let T2 represent the duration of phase 2. P(event 2) is evaluated as
The reduced FT and PMS BDD under event 2 are the same as those under event 1. Thus, .
According to 8.59, P(Case 2) in 8.56 is calculated as
According to 8.56, one obtains
This section presents a continuous‐time Markov chain (CTMC)‐based method for modeling competing failure propagation and isolation effects in reliability analysis of PMSs with multiple FDEP groups [18]. The exponential ttf distribution is assumed for system components. The LF, PFGE, and PFSE of the same component are s‐independent.
A trigger component failure in one phase, if occurring first, only makes dependent components belonging to the same FDEP group inaccessible in that phase; these dependent components are still available to use in other phases if they are accessible directly by the system function without involving the trigger component in those phases. Both PFGEs and PFSEs from the dependent component can be isolated by the trigger failure. An isolated PFGE or PFSE only affects the component itself. An isolated PFGE or PFSE in a previous phase may still propagate to other components in a later phase that does not involve operation of the related FDEP group.
In [19] a CTMC‐based method was developed for the reliability analysis of PMSs without FDEP and related competing failures. This section presents an extension of the CTMC‐based method for considering the competing failure effects in reliability analysis of PMSs with multiple FDEP groups.
The extended CTMC‐based method involves the following three‐step procedure:
Note that a component not appearing in a phase FT means that the LF of this component does not contribute to the mission failure in the phase. However, the PFGE or PFSE of the component can still affect the mission failure and thus should be considered for constructing the CTMC of the phase.
Repeat this step until the CTMCs of all the mission phases have been analyzed. The failure state probability obtained for the final phase is the failure probability, i.e. the unreliability, of the entire PMS.
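The phase‐by‐phase evaluation can be illustrated with a deliberately small Python sketch. It is not the CTMC of either example: it assumes a hypothetical three‐state chain (operating, trigger failed first so the PFGE is isolated, system failed by a PFGE), the same state space in every phase so that the end‐of‐phase probabilities of one phase are used directly as the initial probabilities of the next, and illustrative rates. Transient probabilities are computed by uniformization with a fixed number of series terms, which is adequate only when rate × duration is small, as it is here.

```python
from math import exp

def transient(p0, gen, duration, terms=60):
    """State probabilities after `duration`, by uniformization of generator `gen`."""
    n = len(p0)
    rate = max(-gen[i][i] for i in range(n))
    if rate == 0.0:
        return list(p0)  # no transitions are possible in this phase
    # Embedded discrete-time chain: I + gen / rate
    step = [[(1.0 if i == j else 0.0) + gen[i][j] / rate for j in range(n)]
            for i in range(n)]
    weight = exp(-rate * duration)  # Poisson weight for zero jumps
    vec = list(p0)
    out = [weight * v for v in vec]
    for k in range(1, terms):
        vec = [sum(vec[i] * step[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * duration / k
        out = [o + weight * v for o, v in zip(out, vec)]
    return out

def generator(lam_trigger, lam_pfge):
    """States: 0 operating, 1 trigger failed first (isolated), 2 system failed."""
    total = lam_trigger + lam_pfge
    return [[-total, lam_trigger, lam_pfge],
            [0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0]]

def run_mission(phases, p0=(1.0, 0.0, 0.0)):
    """Chain the phases: each phase starts from the previous phase's result."""
    probs = list(p0)
    for duration, gen in phases:
        probs = transient(probs, gen, duration)
    return probs

# Three phases of 10, 30, and 20 hours with hypothetical per-phase rates.
phases = [(10.0, generator(1.5e-4, 1e-4)),
          (30.0, generator(1.5e-4, 2e-4)),
          (20.0, generator(1.5e-4, 1e-4))]
final = run_mission(phases)
print("PMS unreliability (state 2):", final[2])
```

In a realistic model the state spaces of consecutive phases generally differ, so a mapping of states between phases would replace the direct hand‐over used in `run_mission`.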
Two examples are presented to illustrate the CTMC‐based method. The PMS in Example 8.5 contains dependent FDEP groups in different phases with dependent components undergoing LFs and PFGEs. The PMS in Example 8.6 contains dependent FDEP groups with dependent components undergoing LFs, PFGEs, and PFSEs.
In systems subject to the FDEP behavior, there exist competitions in the time domain between the failure isolation effect (caused by the trigger failure) and the failure propagation effect (caused by the PFGE/PFSE of the dependent components). As different occurrence sequences of the trigger failure and the dependent component PFGE/PFSE can lead to different system statuses, it is important to address the competing effects in the system reliability analysis. Combinatorial methods are presented for reliability analysis of single‐phase systems with a single FDEP group or multiple dependent FDEP groups, and for multi‐phase systems with a single FDEP group. A CTMC‐based method is discussed for addressing the competing effects in the reliability analysis of multi‐phase systems with multiple FDEP groups.