In order to produce a mature product, equipment manufacturers strive to reach a reliability level that is higher than the one specified and, above all, that remains constant throughout the product's operational life. Three situations can be encountered:
It is therefore essential to make every effort to avoid youth and aging failures. Youth failures can only be filtered out by specific tests (burn-in, run-in) conducted before delivery to the equipment manufacturer. As for aging failures, specific analyses or, as a last resort, tests can be proposed.
For a maximum reduction of the probability of these two dreaded scenarios, the following methodology is proposed, as illustrated in Figure 5.1. Each of the stages proposed in Figure 5.1 is defined in the following sections.
This stage involves the division of the product into various subsets based on the failure physics and architecture of the product for which reliability estimation is possible. This analysis generally leads to several levels, ranging from the highest, the product itself, to the lowest, a failure mechanism of a component (see Figure 5.2).
Among these various estimation methods, some are referred to as “classical”, in the sense that this estimation relies on the existing data (e.g. bill of materials + predicted reliability models), and the others are considered to be specific.
The allocation of the initial reliability involves the equidistribution of the reliability objective to each of the previously defined subsets. It takes two different forms, according to the type of maintenance undergone by the product.
Non-maintained products
The initial reliability is allocated using the objective survival function according to the following relation:
Ralloc(t) = [Robj(t)]^(1/n)
where n is the number of subsets.
Based on the estimations of reliability of the subsets referred to as “1st level”, the reliability objectives of the subsets referred to as “low level” can be reallocated.
Maintained products
The initial reliability is allocated using the following relation:
MTBFalloc = n × MTBFobj
According to these principles, all of the subsets of the product have an allocated reliability.
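These two allocation rules can be sketched as follows; for illustration, the figures of the example treated later in this chapter are used (Robj = 92%, MTBFobj = 10,000 hours and n = 3 first-level subsets):

```python
def allocate_non_maintained(r_obj: float, n: int) -> float:
    # Equidistribution of the survival objective: each of the n
    # subsets receives R_obj ** (1/n), so that their product is R_obj.
    return r_obj ** (1.0 / n)

def allocate_maintained(mtbf_obj: float, n: int) -> float:
    # Equidistribution of the failure rate: each of the n subsets
    # receives an MTBF objective of n * MTBF_obj.
    return n * mtbf_obj

r_subset = allocate_non_maintained(0.92, 3)    # ≈ 0.9726
mtbf_subset = allocate_maintained(10_000, 3)   # 30,000 hours
```

The product of the three allocated survival values recovers the 92% objective, which is the defining property of the equidistribution.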
Each operational subset must be classified according to Table 5.1.
Table 5.1. Classification of subsets
Estimation type | Estimation method |
Predicted reliability | FIDES/NPRD 2017 |
Feedback | Power law process (PLP) |
Manufacturer data | Reliability report |
Zero-failure demonstration | Weibull – GLL model |
Reliability tests | Weibull – GLL model |
Some methods for building product maturity have already been described. One of the conditions is having a realistic predicted reliability method. The term realistic means that it should be of the order of magnitude of the product reliability observed during its operation. The FIDES methodology was therefore chosen as it seemed to be the most appropriate.
Chapter 6 presents some other methods for the confirmation of product maturity. One of them involves the estimation of the observed product reliability.
It is therefore essential to make sure that these two estimations are homogeneous, meaning that there is no significant difference between them. This task is not as simple as it seems. Indeed, it is not sufficient to calculate the ratio between the predicted and operational failure rates in order to answer this question (Giraudeau 2017). From what value of this ratio should a difference be judged significant?
To propose a more rigorous method, the following example is considered.
Assume that coins are being manufactured. One of the requirements is that these coins are balanced, meaning that the probability of obtaining heads or tails must be equal to 50%. This is impossible in practice, and an uncertainty must therefore be accepted around this purely theoretical value. Assume that an uncertainty of 0.1% is deemed satisfactory.
In order to verify this, a test must be conducted on a sample of coins representative of those that will subsequently be put into circulation. A coin is tossed several times, and the result is illustrated as follows.
We can note that the number of heads and tails is not 50% each, as the results show 56% tails. Should we conclude that the coin is not well balanced?
To attempt to answer this, let us toss a coin several times once again. The results are illustrated in Figure 5.3.
The following can be noted:
Indeed, it is clear that a hypothesis must be formulated and tested with respect to the available data. For this example, the hypothesis (referred to as a null hypothesis) is:
“The coin is well balanced”.
Two errors are therefore possible:
Obviously, it would be ideal to find a process that minimizes both risks of error at the same time. Unfortunately, it can be shown that they vary in opposite directions: any process that reduces the first error generally increases the second, and vice versa. A choice must therefore be made as to which of the two errors it is more important to avoid.
This is a difficult choice, as it generally depends on several parameters. The risk of the first error can legitimately be minimized to avoid the introduction of coins that do not comply with specifications. However, from an economic perspective, it is possible to consider that the second error should be minimized.
Moreover, it is also important to consider the probability of rightly rejecting the hypothesis that the coin is well balanced when it is in fact not. This is referred to as the power of the test. This notion is important for two reasons:
Returning to the case of interest here, it is therefore preferable to minimize the risk of declaring that the predicted reliability is consistent with the experience feedback when it is not. In addition, we must make sure that the power of the test is equal to or greater than 50%.
The proposed method involves the execution of the following stages, given in the sections below.
It is important to make sure that the predicted failure rate lies within the confidence interval [λinf ; λsup] of the observed failure rate.
If λp ∈ [λinf ; λsup], then the predicted reliability is statistically consistent with the data.
If λp ∉ [λinf ; λsup], then the predicted reliability is not statistically consistent with the data.
The power of this test is P(λp ∉ [λinf ; λsup]), a probability that depends on the actual value of λ, which is unknown. The objective is then to calculate the value of this power as a function of the actual value of λ.
This calculation is generally intractable analytically, but a simulation-based numerical approach can yield this curve (or function). The following algorithm is proposed.
The parameters λp, N and T are known. Here, T is fixed and N is the realization of a Poisson random variable of parameter λ × T. For each value of a set of values of λ (around λp, but different from λp), a large number of random draws is simulated from a Poisson distribution of parameter λ × T, and the percentage of draws for which λp falls outside the resulting confidence interval is calculated.
This percentage is an estimation of the power of the test. It is known that if λ tends to λp, then this percentage tends to α (= 1 − CL). It is also known that as λ moves away from λp, this percentage tends to 1. This curve can be used to determine, for an a priori fixed power, the gap between λ and λp that the test is able to detect. We then obtain:
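The algorithm above can be sketched as follows. The interval construction assumes the classical two-sided chi-square confidence interval on a constant failure rate; this is a common choice and not necessarily the exact formula used here:

```python
import numpy as np
from scipy.stats import chi2

def ci_lambda(n_fail, T, alpha):
    # Two-sided chi-square confidence interval on a constant failure
    # rate, built from n_fail failures over a cumulative time T.
    lo = 0.0 if n_fail == 0 else chi2.ppf(alpha / 2, 2 * n_fail) / (2 * T)
    hi = chi2.ppf(1 - alpha / 2, 2 * n_fail + 2) / (2 * T)
    return lo, hi

def power_estimate(lam_true, lam_pred, T, alpha=0.10, n_sim=10_000, seed=0):
    # Monte Carlo estimate of the power at lam_true: fraction of
    # simulated campaigns in which lam_pred falls outside the interval.
    rng = np.random.default_rng(seed)
    failures = rng.poisson(lam_true * T, size=n_sim)
    rejected = sum(
        not (lo <= lam_pred <= hi)
        for lo, hi in (ci_lambda(int(n), T, alpha) for n in failures)
    )
    return rejected / n_sim

lam_p, T = 2.2e-3, 10_000
p_null = power_estimate(lam_p, lam_p, T)     # of the order of alpha
p_far = power_estimate(3 * lam_p, lam_p, T)  # close to 1
```

Sweeping lam_true over a grid around λp yields the power curve described above; at λ = λp the estimate sits slightly below α, the chi-square interval being conservative.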
EXAMPLE 5.1.– Consider the following data:
– Predicted failure rate | λp = 2.2 × 10⁻³; |
– Acceptable tolerance | Tol = 30%; |
– Risk level | α = 10%, CL = 1 − α = 90%; |
– Cumulative observation time | T = 4,000 hours; |
– Number of observed failures | Nb_Failures = 8; |
– Number of simulations | Nb_Simulation = 1,000. |
Therefore, the predicted failure rate interval is:
which is then:
The credibility curve in Figure 5.4 is obtained.
The credibility interval of the predicted failure rate is:
Since the credibility interval is larger than the tolerated uncertainty interval, no conclusion can be drawn on the validity of the predicted failure rate: the observation time T must be increased!
EXAMPLE 5.2.– Consider the following data:
– Predicted failure rate | λp = 2.2 × 10⁻³; |
– Acceptable tolerance | Tol = 30%; |
– Risk level | α = 10%, CL = 1 − α = 90%; |
– Cumulative observation time | T = 10,000 hours; |
– Number of failures observed | Nb_Failures = 10; |
– Number of simulations | Nb_Simulation = 1,000. |
Therefore, the predicted failure rate interval is:
which is then:
The credibility curve in Figure 5.5 is obtained.
The credibility interval of the predicted failure rate is:
This is a similar conclusion to the previous one, though the credibility interval is smaller.
EXAMPLE 5.3.– Consider the following data:
– Predicted failure rate | λp = 2.2 × 10⁻³; |
– Acceptable tolerance | Tol = 30%; |
– Risk level | α = 10%, CL = 1 − α = 90%; |
– Cumulative observation time | T = 50,000 hours; |
– Number of observed failures | Nb_Failures = 100; |
– Number of simulations | Nb_Simulation = 1,000. |
Therefore the predicted failure rate interval is:
which is then:
The credibility curve in Figure 5.6 is obtained.
The credibility interval of the predicted failure rate is:
Since the credibility interval is smaller than the tolerated uncertainty interval, a conclusion can be drawn regarding the validity of the predicted failure rate with respect to the experience feedback (REX).
For certain subsets, it was therefore possible to have a reliability estimation from the predicted reliability, experience feedback or data provided by the manufacturers of components. However, this reliability estimation was not possible for others, as a test was needed for this purpose (reliability demonstration, reliability test).
The optimal allocation of reliability therefore involves fixing the reliability of the subsets for which an estimation was possible and, based on these data, reallocating the reliability objectives to the subsets for which no estimation was available. These new objectives allow one of the following:
Assume that the architecture of the studied product is illustrated in Figure 5.7.
Reliability allocation for non-maintained products
Assume that the reliability objective is Robj = 92% at the time Tm = 10,000 hours. Since there are three subsets (DC/DC converter, fan and speed measurement), the reliability objective for the subsets of the first level is:
Robj,1(Tm) = Robj(Tm)^(1/3) = 0.92^(1/3) ≈ 97.26%
For the fan, given that there are no lower-level subsets, the following relation can be written:
For the subsets of the second level, the reliability objective is given by:
For the energy reserve, the booster and the digital electronics, given that there are no subsets of lower level, the following can be written:
For the level 3 subsets, the reliability objective is given by:
It can be written that:
A diagram is obtained (see Figure 5.8).
Reliability allocation for maintained products
Assume that the reliability objective is MTBFobj = 10,000 hours. Considering that there are three subsets (DC/DC converter, fan and speed measurement), the reliability objective for the level 1 subsets is:
MTBFobj,1 = 3 × MTBFobj = 30,000 hours
For the fan, since there are no low-level subsets, it can be written that:
For the level 2 subsets, the reliability objective is given by:
For the energy reserve, booster and digital electronics, since there are no low-level subsets, it can be written that:
It can be written:
A diagram is obtained (see Figure 5.9).
The reliability of the previously defined subsets should now be estimated. The estimations in Table 5.2 are proposed.
Table 5.2. Reliability estimation methods for the subsets
Subset | Estimation method |
Fan | Manufacturer data |
Energy reserve | FIDES |
Booster | FIDES |
Digital electronics | FIDES |
Speed sensor | Reliability demonstration |
Voltage reference | Accelerated test |
For the estimations made with FIDES, the results are:
For the fan, the manufacturer guarantees a failure rate of:
Non-maintained products
With the estimated failure rates being constant, the reliability can be estimated by the relation R(t) = exp(−λ × t).
This yields:
Given this data, the other subsets can be reallocated. The following can be written:
Hence:
Consequently, the reliability of the Speed_Sensor and voltage reference subset can be estimated as follows:
A new diagram is obtained (see Figure 5.10).
The objective of reliability demonstration for the speed sensor is therefore 97.393%. For the voltage reference, it must be shown that the reliability model obtained by accelerated tests also leads to a reliability of 97.393%.
Maintained products
With the estimated failure rates being constant, the MTBF of each subset can be estimated by the relation MTBF = 1/λ.
This yields:
Given this data, the other subsets can be reallocated. The following can be written:
Hence:
Consequently, the reliability of the Speed_Sensor and voltage reference subset can be estimated as follows:
A new diagram is obtained (see Figure 5.11).
The reliability demonstration objective for the speed sensor is therefore an MTBF of 20,630 hours. For the voltage reference, it should be shown that the reliability model obtained by accelerated tests also leads to an MTBF objective of 20,630 hours.
Reliability demonstration for the speed sensor
Assume that the speed sensor is temperature sensitive and its activation energy is given by Ea = 0.45 eV. On the other hand, the life profile of the product over a one year period is given in Table 5.3.
Table 5.3. Parameters of mission profile example
Description of the stage | Stage duration (hrs) | Stage temperature (°C) |
Non-operation | 2,760 | 15 |
Operation | 6,000 | 45 |
Based on Sedyakin’s principle (Sedyakin 1966; Bayle 2019), an equivalent temperature of Teq = 39.3°C can be found for this mission profile.
Assume that the test data are fixed by:
Consequently, the acceleration factor of the test is:
AF = exp((Ea/kB) × (1/Teq − 1/Ttest)) = exp((0.45/(8.617 × 10⁻⁵)) × (1/(273 + 39.3) − 1/(273 + 105))) ≈ 18.3
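These two computations can be sketched as follows (273.15 K is used for the temperature conversion, where the text rounds to 273):

```python
import math

KB = 8.617e-5  # Boltzmann constant (eV/K)

def equivalent_temperature(profile, ea):
    # Sedyakin/Arrhenius equivalent temperature of a mission profile,
    # where profile = [(duration_h, temperature_C), ...]
    total = sum(d for d, _ in profile)
    s = sum(d * math.exp(-ea / (KB * (t + 273.15))) for d, t in profile)
    return -ea / (KB * math.log(s / total)) - 273.15

def acceleration_factor(ea, t_use_c, t_test_c):
    # Arrhenius acceleration factor between use and test temperatures
    return math.exp(ea / KB * (1 / (t_use_c + 273.15)
                               - 1 / (t_test_c + 273.15)))

ea = 0.45
teq = equivalent_temperature([(2_760, 15), (6_000, 45)], ea)  # ≈ 39.3 °C
af = acceleration_factor(ea, teq, 105)                        # ≈ 18.3
```

Both values recover the figures of Table 5.3 and the acceleration factor quoted above.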
Non-maintained products
Knowing the objective to be demonstrated, the required test time is:
Maintained products
Knowing the objective to be demonstrated, the required test time is:
Reliability test for the voltage reference
Assume that the voltage reference is temperature sensitive. Our proposal is to conduct two tests on 12 parts under the following conditions:
The data in Table 5.4 was obtained.
Given the temperature sensitivity, a Weibull–Arrhenius model is chosen, whose survival function is given by:
The parameters of the model to be estimated are therefore:
The maximum likelihood method yields:
Figure 5.12 presents the Weibull paper plot.
Figure 5.13 represents the same plot, as a function of time and temperature.
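A sketch of the survival function underlying these plots. The fitted parameter values are not reproduced here, so β, A and Ea below are placeholders to be replaced by the maximum likelihood estimates:

```python
import math

KB = 8.617e-5  # Boltzmann constant (eV/K)

def survival(t_hours, temp_c, beta, a, ea):
    # Weibull-Arrhenius survival function:
    # R(t, T) = exp(-(t / eta(T))**beta), with the scale parameter
    # eta(T) = A * exp(Ea / (KB * T)) decreasing as temperature rises
    eta = a * math.exp(ea / (KB * (temp_c + 273.15)))
    return math.exp(-((t_hours / eta) ** beta))
```

On Weibull paper, each test temperature then maps to a straight line of slope β, shifted by the Arrhenius term, which is what Figures 5.12 and 5.13 display.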
Non-maintained products
Based on the data of the accelerated test, the survival function of the voltage reference under operational conditions can be estimated:
Consequently, the survival function at the analog electronics level is given by:
The survival function at the level of speed measurement is given by:
The survival function at the product level is given by:
or
The objective of 97.393% is not met at the level of voltage reference and therefore at the product level.
A diagram is obtained (Figure 5.14).
Maintained products
Using the data of the accelerated test, the survival function of the voltage reference under operational conditions can be estimated as follows:
The objective of 20,630 is met.
The MTBF at the product level can be estimated based on various estimated MTBF. For the analog electronics, the MTBF is given by:
For the speed measurement:
Therefore, at the product level:
or
Finally, a new diagram is obtained (see Figure 5.15).
For certain components whose technology evolves rapidly, it is reasonable to consider the possibility of observing failure mechanisms, all the more so as the life profile duration tends to increase in most of the industrial applications. This is particularly the case for complex digital components (microprocessor, flash memories, FPGA, etc.) with the well-known Moore’s law (Moore 1965). In order to avoid any aging problems, a manufacturer may decide to make electronics designers adhere to strict design rules. As an illustration, let us consider the virtual example of a digital component subjected to the following aging mechanisms:
Since these mechanisms are independent, the failure rate of the component is written as:
λ = λ1 + λ2 + … + λm
where m is the number of failure mechanisms.
Demonstration
NOTE.– This demonstration may be considered trivial, and therefore useless. Since it is very accessible, its introduction was deemed useful.
These failure mechanisms are competing, and are therefore all sensitive to the junction temperature θj of the component. Equation [5.3] should be written as:
Assume that the failure rate can be expressed in the form of an Arrhenius exponential model. Equation [5.6] is then written as:
where:
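A sketch of the relations above: each mechanism follows an Arrhenius exponential model in the junction temperature, and the component failure rate is the sum of the mechanism rates. The λ0 and Ea values below are placeholders, not data from the example:

```python
import math

KB = 8.617e-5  # Boltzmann constant (eV/K)

def mechanism_rate(lam0, ea, theta_j_c):
    # Arrhenius exponential model for one aging mechanism,
    # evaluated at the junction temperature theta_j (in °C)
    return lam0 * math.exp(-ea / (KB * (theta_j_c + 273.15)))

def component_rate(mechanisms, theta_j_c):
    # Independent competing mechanisms: the component failure rate
    # is the sum of the m mechanism failure rates
    return sum(mechanism_rate(lam0, ea, theta_j_c)
               for lam0, ea in mechanisms)

# Placeholder (lambda0 in /h, Ea in eV) pairs for three mechanisms
mechs = [(2.0e3, 0.70), (5.0e1, 0.45), (1.0e4, 0.80)]
service_life = 1.0 / component_rate(mechs, 85.0)  # hours
```

Plotting service_life against the junction temperature reproduces the kind of curve shown in Figure 5.16.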
EXAMPLE.– Consider the following data:
Consider the following life profiles:
The manufacturer being considered here has implemented design rules for digital circuits that date back several years, which are given by:
Quite often, these design rules rely on maximal values of temperature that are mentioned in the specifications and/or in the qualification stage. These rules are often very strict, as the maximal values of temperature are not undergone by the components throughout the duration of their operation, but only in several very specific cases.
Let us now plot the service life, considered here as the inverse of the failure rate, for the profiles in question. Figure 5.16 is obtained from the proposed data.
It can be noted that the previous design rules are rather strict with respect to the considered profiles. On the other hand, with the new proposed design rules, the temperature range is complete, which may prohibit the use of the component in certain cases when the junction temperature is too low, for example.
NOTE.– The hypothesis of an exponential law may be put into question for aging mechanisms. There are, however, several reasons for retaining this hypothesis:
In the industry, the development of a product or electronic system requires choosing the best components and associated technologies on the basis of several criteria and objectives. Among these selection criteria, reliability is essential, alongside price, performance and continuity of supply.
For each component/technology in the developed product, the predicted failure rate must be estimated prior to the product being put into operation. This depends on several factors:
All of these factors should be taken into account in the predicted estimation of reliability, which is theoretically complex. Strictly speaking, a “global all-manufacturer” model is needed, one that integrates all of the manufacturers and possible technologies. Such a model cannot be built from a theoretical/physical point of view, even if the reliability of these components were fully known.
It is clear that a “global” failure rate covering all of the component manufacturers should be found. It is therefore natural to consider a location indicator, in the statistical sense: a single number that best summarizes the reliability data of the various manufacturers and gives a general order of magnitude of the failure rate. A mean of the failure rates can be considered, and the most appropriate type of mean should be identified. Let us briefly review the various types of mean in order to verify whether one of them is best adapted to our situation.
Generalized mean
Assume that there is a continuous function f, defined over a set of positive real numbers, whose reciprocal function f⁻¹ exists. The generalized mean is given by:
M = f⁻¹((1/n) × Σ(i=1..n) f(xi))
where n is the number of observed values.
Based on this general definition, the following means can be defined:
Arithmetic mean
The arithmetic mean is obtained when the function f is defined by f(x) = x. Equation [5.8] is then written as:
M = (1/n) × Σ xi
Root mean square
The root mean square is obtained when the function f is defined by f(x) = x². Equation [5.8] is then written as:
M = sqrt((1/n) × Σ xi²)
Mean of order p
The mean of order “p” is obtained when the function f is defined by f(x) = x^p. Equation [5.8] is then written as:
M = ((1/n) × Σ xi^p)^(1/p)
Geometric mean
The geometric mean is obtained when the function f is defined by f(x) = ln(x). Equation [5.8] is then written as:
M = (x1 × x2 × … × xn)^(1/n)
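These definitions can be sketched directly from equation [5.8], each mean being the generalized mean for a particular choice of f:

```python
import math

def generalized_mean(values, f, f_inv):
    # Equation [5.8]: M = f_inv( (1/n) * sum of f(x_i) )
    return f_inv(sum(f(x) for x in values) / len(values))

xs = [1.0, 2.0, 4.0]
arithmetic = generalized_mean(xs, lambda x: x, lambda y: y)        # f(x) = x
rms = generalized_mean(xs, lambda x: x * x, math.sqrt)             # f(x) = x^2
mean_p = lambda p: generalized_mean(xs, lambda x: x ** p,
                                    lambda y: y ** (1 / p))        # f(x) = x^p
geometric = generalized_mean(xs, math.log, math.exp)               # f(x) = ln x
```

For xs = [1, 2, 4], the arithmetic mean is 7/3, the root mean square is sqrt(7) and the geometric mean is exactly 2.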
Consider the example of a component subjected to a single sensitive mechanism at constant temperature. An exponential–Arrhenius model can then be used in the form of a Cox model, whose failure rate is given by:
λ(T) = λ0 × exp(−Ea/(kB × T))
This modeling is possible because it is logical to think that, since the components of the manufacturers achieve the same function and have quite similar technologies, the proposed reliability model is the same for all manufacturers. Only the values of the parameters of the model, because of different reliabilities, differ from one manufacturer to another.
The objective is then to find a reliability model “globally applicable to all manufacturers” using the equation [5.13] according to the following form:
The difficulty is to express the unknown parameters λ0eq and Eaeq using the presumably known parameters Ea and λ0 of each manufacturer. It is important to avoid proposing numerical methods, which are generally applicable only over fixed temperature ranges, and for which neither the physical interpretation of the parameters of the considered model nor the magnitude of the error made is known. It is therefore preferable to use an analytical method, even an approximate one, that favorably addresses all of these criteria.
Based on equations [5.9]–[5.12], the only mean that meets the “analytical” criterion is the geometric mean. The failure rate can then be expressed as follows:
λeq(T) = λ0eq × exp(−Eaeq/(kB × T))
with λ0eq = (λ01 × λ02 × … × λ0n)^(1/n) and Eaeq = (Ea1 + Ea2 + … + Ean)/n.
Demonstration
Using equations [5.12] and [5.14], we obtain:
This is an interesting result, as it shows that the thus-defined component failure rate can be written in the sought-for form. This formulation has two important advantages:
NOTE.– It can be seen that the “equivalent” activation energy is expressed as the arithmetic mean of various activation energies.
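This identity can be checked numerically; the per-manufacturer parameters below are purely hypothetical and stand in for the data of Table 5.5:

```python
import math

KB = 8.617e-5  # Boltzmann constant (eV/K)

# Hypothetical Arrhenius parameters (lambda0 in /h, Ea in eV)
# for three manufacturers of the same component
manufacturers = [(5.0e2, 0.40), (1.2e3, 0.50), (8.0e2, 0.45)]

def lam(lam0, ea, t_c):
    # Arrhenius failure rate at temperature t_c (°C)
    return lam0 * math.exp(-ea / (KB * (t_c + 273.15)))

def lam_geometric(t_c):
    # Geometric mean of the n manufacturer failure rates
    n = len(manufacturers)
    return math.prod(lam(l0, ea, t_c) for l0, ea in manufacturers) ** (1 / n)

# Equivalent "all-manufacturer" model: geometric mean of the lambda0,
# arithmetic mean of the activation energies
n = len(manufacturers)
lam0_eq = math.prod(l0 for l0, _ in manufacturers) ** (1 / n)
ea_eq = sum(ea for _, ea in manufacturers) / n

for t in (25.0, 55.0, 85.0):
    # The geometric mean of Arrhenius rates is itself an Arrhenius rate
    assert math.isclose(lam_geometric(t), lam(lam0_eq, ea_eq, t))
```

The assertions hold at any temperature, which is precisely the advantage over a purely numerical fit on a fixed temperature range.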
EXAMPLE.– Consider the data in Table 5.5.
Figure 5.17 is obtained.
Manufacturer weightings
Using a more realistic approach, it can also be considered that product designers do not use components from all manufacturers uniformly. In other words, some manufacturers are used more often than others. In this case, a “weighted” geometric mean can be defined as follows:
Mw = Π(i=1..n) xi^wi, with Σ wi = 1
Based on equations [5.13] and [5.16], we obtain:
with λ0eq = Π λ0i^wi and Eaeq = Σ wi × Eai, where the weights wi sum to 1.
Demonstration
End
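The weighted variant can be checked in the same way, again with hypothetical parameters and usage weights standing in for the data of Table 5.6:

```python
import math

KB = 8.617e-5  # Boltzmann constant (eV/K)

manufacturers = [(5.0e2, 0.40), (1.2e3, 0.50), (8.0e2, 0.45)]  # hypothetical
weights = [0.6, 0.3, 0.1]  # usage shares of each manufacturer, sum = 1

def lam(lam0, ea, t_c):
    # Arrhenius failure rate at temperature t_c (°C)
    return lam0 * math.exp(-ea / (KB * (t_c + 273.15)))

def lam_weighted(t_c):
    # Weighted geometric mean of the per-manufacturer failure rates
    return math.prod(lam(l0, ea, t_c) ** w
                     for (l0, ea), w in zip(manufacturers, weights))

# Equivalent model: weighted geometric mean of the lambda0,
# weighted arithmetic mean of the activation energies
lam0_eq = math.prod(l0 ** w for (l0, _), w in zip(manufacturers, weights))
ea_eq = sum(w * ea for (_, ea), w in zip(manufacturers, weights))

assert math.isclose(lam_weighted(55.0), lam(lam0_eq, ea_eq, 55.0))
```

Setting all weights to 1/n recovers the unweighted case of the previous section.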
EXAMPLE.– Let us resume the data of the previous example and add the following weights to Table 5.6.
Figure 5.18 is obtained.