As already noted, a product in the manufacturing stage typically exhibits youth failures, notably during the first years of operation. Moreover, a product is considered mature provided that it has a constant reliability level that is higher than the specified objective. Prior to being delivered to the system manufacturer, the product should therefore be subjected to a test referred to as a burn-in test (Kececioglu and Sun 1999). The aim of this test is to eliminate as many as possible of the defects generated during the manufacturing stage. It is conducted on every product delivered, because the manufacturing process has a certain variability and may evolve (change of equipment, testing process, operator, variability in the testing means, etc.) throughout the period of product delivery.
In fact, for the method chosen in this book, this test is part of a more comprehensive method, namely the HALT/HASS method, which involves two distinct stages:
Burn-in testing presents a challenge, however. Eliminating as many latent youth defects as possible at reasonable cost (notably in testing duration) requires the application of high physical contributions, significantly higher than those to which the product is subjected during its operation. While this principle may be of interest for the products with latent defects, it is not useful for healthy products (without latent defects). Therefore, it should be verified that the burn-in test does not cause serious damage to healthy products. Moreover, it should also be verified that the burn-in test has a certain effectiveness (capacity to reveal latent defects), otherwise it would be useless.
Therefore, prior to the actual burn-in test, two tests are conducted on a certain number of representative products of the series:
This is illustrated in Figure 7.2.
The results of HALT tests, namely the estimation of the operational limits in temperature and vibration, can be used to give initial values for the levels of physical contributions of HASS tests. Any manufacturer willing to conduct a burn-in process must be aware of the following points referring to burn-in:
POS1 can be used to verify that applying a burn-in cycle in the series stage does not damage the service life of a product by more than “X%”. Hence, one or more pieces of equipment are subjected to several cycles. The equipment undergoing this test must be very similar to those to be delivered in the final series. Under no circumstances can this equipment be delivered as operational equipment.
This approach is generally used in HALT/HASS tests to justify the lack of damage of the products without latent defects. The lack of damage rests on a theoretical perspective rooted in Miner’s rule (Miner 1945). According to this rule, the cumulative damage of the product is linear. Hence, if one or more products are subjected to “n” burn-in cycles without observing any failures, then the burn-in test cycle to be applied to all the products will damage the healthy products by no more than 1/n. Damage of 2% is generally admitted. In this case, the POS1 test involves the application of 50 burn-in cycles, as illustrated in Figure 7.3.
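The arithmetic behind this linear damage argument can be sketched in a few lines of Python. This is only an illustration of the 1/n reasoning stated above; the function names are hypothetical and not taken from the method itself.

```python
import math


def cycles_for_damage_budget(max_damage_fraction: float) -> int:
    """Smallest number n of failure-free cycles such that 1/n <= max_damage_fraction."""
    if not 0.0 < max_damage_fraction <= 1.0:
        raise ValueError("max_damage_fraction must be in (0, 1]")
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(1.0 / max_damage_fraction, 9))


def damage_fraction(cycles_applied: int, cycles_demonstrated: int) -> float:
    """Upper bound on the life consumed under linear (Miner) damage accumulation."""
    return cycles_applied / cycles_demonstrated


n_pos1 = cycles_for_damage_budget(0.02)   # 2% admitted damage per burn-in cycle
print(n_pos1)                             # number of POS1 cycles to demonstrate
print(damage_fraction(1, n_pos1))         # damage bound for one burn-in cycle
```

With an admitted damage of 2%, the sketch returns the 50 cycles used for POS1.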
A second hypothesis concerns the estimation of damage of products submitted to a HASS cycle. From a reliability perspective, what determines the useful service life of the product (flat sector of the bathtub curve) is the minimum of random variables representing each aging failure mechanism (that can vary depending on the life profile).
This can be schematically represented by Figure 7.4.
If all the laws of reliability of failure mechanisms were known, it could be verified if the specified reliability objective has been met. Unfortunately, this is not the case, as the component manufacturers do not provide this information in the datasheet of the component. Though experience shows that, in general, aging failure mechanisms are not observed during product operation, the minimum of the aging mechanisms with respect to the product service life is not known. A conservative approach is therefore to take the case in which the minimum corresponds exactly to the end of the service life.
Following the failures observed during the “n” cycles, certain information can be extracted, as listed in Table 7.1.
If one or several failures are observed on the first burn-in cycle, this proves a certain effectiveness of burn-in. This scenario is however unlikely given the small number of products undergoing the POS1 test. Hence, POS2 is proposed.
There is obviously no physical law of failure taking into account the simultaneous effect of the thermal cycling, vibrations and energizing. Moreover, there is no recent model taking into account the influence of the slope in the thermal cycling, this being particularly important during the precipitation phase. However, from a conservative perspective, it is possible to take into account the aging occurring during the POS1 test by considering only the effect of thermal amplitude according to the Coffin–Manson law. The focus is on the acceleration factor between the POS1 conditions and those observed during operation by the end user. This is given by:
A small problem remains to be solved however, since the thermal amplitude is different between the precipitation and detection phases. However, based on the Sedyakin principle, an equivalent amplitude (Bayle 2019) can be found in terms of reliability, which is given by:
PROOF.–
According to the Sedyakin principle:
or
or
or
End
Consequently, the acceleration factor can be written as:
or even
Therefore, the 50 cycles of POS1 can be reflected by operational cycles in the following form:
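As a minimal numerical sketch, the usual Coffin–Manson form of the acceleration factor, AF = (ΔT_test/ΔT_use)^m, can be coded as follows. The amplitudes and the exponent m used here are purely hypothetical placeholders, not the values of this chapter, and the test amplitude stands in for the equivalent amplitude discussed above.

```python
def coffin_manson_af(delta_t_test: float, delta_t_use: float, exponent: float) -> float:
    """Acceleration factor between test and use thermal amplitudes (Coffin-Manson form)."""
    if delta_t_test <= 0 or delta_t_use <= 0:
        raise ValueError("thermal amplitudes must be positive")
    return (delta_t_test / delta_t_use) ** exponent


def equivalent_operational_cycles(n_test_cycles: int, af: float) -> float:
    """Number of operational cycles represented by n_test_cycles test cycles."""
    return n_test_cycles * af


# Hypothetical values for illustration only:
# equivalent test amplitude 100 degC, operational amplitude 25 degC, exponent m = 2
af = coffin_manson_af(100.0, 25.0, 2.0)
n_equiv = equivalent_operational_cycles(50, af)
print(af, n_equiv)
```

The exponent depends on the failure mechanism considered, so one acceleration factor is computed per mechanism.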
EXAMPLE.–
Consider an operational application with the life profile given in Table 7.2.
Element | ΔT (°C) | Number of cumulated cycles |
Brazed joints | 20 | 7,300 |
Bonding | 35 | 7,300 |
Assume the precipitation and detection phases have the following characteristics:
Let us analyze the aging generated by POS1 for the products without defects. For thermal cycling, the following failure mechanisms can be considered:
An equivalence in operational cycles is therefore obtained for POS1 for brazed joints
or
This result is below the 7,300 cycles of the life profile of the previous table. It is important to note, however, that the acceleration factor model used is very conservative. Moreover, what matters is the aging caused by a single burn-in cycle, which here represents 2,163/50 ≈ 43 operational cycles, or barely 0.6% of the life profile (43/7,300).
or
This result shows that, while being conservative, the POS1 test covers the 7,300 cycles of the life profile considered in this example. If no failures are observed, it is reasonable to believe there will be none during operation.
It is interesting to see what this means in terms of reliability for the POS1 test. Similar hypotheses to those in section 7.2.2 lead us to obtain:
According to equation:
where:
NC is the level of confidence;
AF is the acceleration factor from equation [7.1];
N is the number of products undergoing POS1 test;
Npos1 is the number of thermal cycles of POS1;
Nm is the number of thermal cycles of the mission;
β is the shape parameter of Weibull law.
Resuming the data in the previous example and moreover considering that N = 4, NC = 90% and β = 5, the following results are obtained:
Brazed joints
Bonding
According to the equation:
Assuming the same hypotheses as previously leads to:
Brazed joints
Bonding 40,195 cycles
to be compared to 7,300 of life profile.
As already seen in the previous section, the objective of the POS2 test is to show a certain level of effectiveness of the burn-in cycle. The POS2 test relies on four burn-in cycles, and it theoretically consumes only 8% of the life of healthy products according to Miner’s criterion (four cycles at 2% each). The four cycles are in fact a compromise between reasonable damage and a test duration that is long enough to trap as many latent failures as possible.
This test will be conducted on a series of products that could potentially be delivered to the equipment manufacturer. It is conducted in three successive stages on a certain number of products. As soon as a product is defective, the POS2 test is stopped, irrespective of the stage at which the failure occurs. Conducting the test in three stages makes it possible to limit the number of products undergoing the POS2 test if a failure is detected in stage 1 or 2.
When conducting the POS2 test, several scenarios are possible, as explained in the following table. In this table, we do not quantitatively distinguish the effect of the levels of physical contributions responsible for the occurrence of failures for each cycle. Only a qualitative analysis based on the occurrence of failures can indicate the effectiveness of burn-in testing. In the calculations that follow, the proportion of defect “p” is an unknown quantity. However, depending on the feedback for older products, a relatively accurate estimation of this parameter should be possible. The proposed process is illustrated in Figure 7.5.
All the possible scenarios are defined in Table 7.3.
POS2 testing is conducted without transfer of products from one stage to the next. Hence, it is possible to model the POS2 test by hypergeometric distribution. Hence:
The probability of observing “k” failures is given by:
What we are interested in here is estimating the probability of having at least one failure for POS2 in order to verify the burn-in effectiveness. This probability can be written as:
or:
Let us fix p = 2% and observe the evolution of this probability as a function of Q and n. Figure 7.6 is obtained.
It can be seen that the number of products Q subjected to burn-in does not influence the probability of detecting a defect during POS2 testing. Furthermore, it is important to acknowledge that it is necessary to have a significant number of products for POS2 testing in order to achieve a reasonable probability of detecting one or more latent defects.
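The behavior described above can be checked numerically with the standard hypergeometric expression for drawing n products, without replacement, from a batch of Q products containing Q·p defective ones. The sketch below is an illustration under that assumption; the function name and the (Q, n) grid are hypothetical.

```python
from math import comb


def prob_at_least_one_defect(q_total: int, p_defect: float, n_sample: int) -> float:
    """P(at least one defective product among n_sample drawn without replacement)."""
    if n_sample > q_total:
        raise ValueError("n_sample cannot exceed q_total")
    defective = round(q_total * p_defect)   # number of defective products in the batch
    healthy = q_total - defective
    # P(no defect) = C(healthy, n) / C(Q, n); comb() returns 0 when n > healthy
    return 1.0 - comb(healthy, n_sample) / comb(q_total, n_sample)


for q in (500, 1_000, 10_000):
    row = [round(prob_at_least_one_defect(q, 0.02, n), 3) for n in (10, 50, 100)]
    print(q, row)
```

For a fixed p, the rows differ only slightly from one batch size to another, whereas the probability rises markedly with n.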
Let us fix Q = 1,000 and observe the evolution of this probability as a function of p and n. Figure 7.7 is obtained.
It can be noted that the proportion of defects has a significant impact on the probability of detecting at least one defect during the POS2 test. Moreover, in order to have a reasonable probability of detecting a defect, the required number of products is all the larger as the defect proportion is smaller, which is in fact logical.
Indeed, since generally p ≪ 1, equation [7.6] can be approximated by:
Hence, the value of n corresponds to a given probability
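As a hedged illustration, assume the approximation takes the usual binomial form P ≈ 1 − (1 − p)^n, which is valid when the batch is much larger than the sample. This form is an assumption made for the sketch. The number of products needed to reach a target probability then follows by taking logarithms.

```python
import math


def prob_detect_approx(p_defect: float, n_sample: int) -> float:
    """Binomial approximation of the probability of at least one defect in n_sample."""
    return 1.0 - (1.0 - p_defect) ** n_sample


def sample_size_for_probability(p_defect: float, target_prob: float) -> int:
    """Smallest n such that 1 - (1 - p)^n >= target_prob."""
    if not (0.0 < p_defect < 1.0 and 0.0 < target_prob < 1.0):
        raise ValueError("p_defect and target_prob must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - p_defect))


n_needed = sample_size_for_probability(0.02, 0.90)
print(n_needed, prob_detect_approx(0.02, n_needed))
```

Halving the defect proportion roughly doubles the number of products required for the same probability.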
Based on equation [7.8], it is possible to set a probability to filter a defect and estimate the number of products “n” that must be subjected to POS2 testing. But how can this probability be fixed knowing that it is an ascending function of n? The economic aspect can then open up a possibility. Indeed, given:
it can be written that:
Indeed, the cost due to products having a defect depends on their number Q.p as well as on the fact that they were not filtered out by burn-in. Based on equation [7.8], it can be written that:
This equation is differentiated with respect to n:
This derivative is zero if:
or
or finally
Since “n” is a non-zero integer, the following inequality must be valid: .
NOTE.–
EXAMPLE.–
Ce = 20,000
Cp = 200
Cr = 10,000
Q = 10,000
p = 2%
The resulting optimal number of products for POS2 test is n* = 150. The total corresponding cost is ~ 420,000.
It should be recalled that the objective of the POS2 test is to estimate the effectiveness of burn-in. The number of products subjected to burn-in is set in advance, and therefore this parameter cannot be changed. Neither can the proportion “p” of defects. The only remaining degree of freedom is the number of products at each POS2 stage, which can be chosen to optimize the probability of having a failure in this test.
If no failure is observed after the third stage of POS2 testing, several hypotheses can be formulated to explain this fact:
The general rule is to wait for the operational feedback information.
Once the POS1 and POS2 tests are conducted, the HASS cycle involves two distinct stages.
The precipitation stage is conducted with the equipment in operation and under continuous monitoring. However, there is no need to stop the test during the precipitation stage if there are functional drifts. As a general rule, the temperature limits, HOL and LOL, during the precipitation stage are defined using the results of HALT with a 10°C reduction applied to the HOT and LOT limits:
A slightly different definition can sometimes be found in which the temperature limits are defined based on HALT results with a temperature reduction applied to the UTOL and LTOL limits of the HALT test defined as follows:
In the case of thermal cycling (type RCT), the temperature ramps are fixed at the value found during the tests for the VRT limit search; this can go up to 60°C/min in the chamber per piece of equipment (10°C/min locally on the burnt-in material). These temperature ramps are instructions of the HALT/HASS chambers (QUALMARK type). It is important to instrument any equipment confined without forced ventilation. In general, the maximum value reached is 35 to 45°C/min at the product level.
The vibration limit HV during the precipitation stage is determined from the HALT test and is decreased by 50%: HV = HVOL/2. It is important to instrument the equipment when several pieces of equipment are in the chamber in order to verify the homogeneity of vibration levels after installations.
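The derating rules above can be summarized in a short sketch. It assumes that the 10°C reduction moves each temperature limit toward the inside of the operating range (minus 10°C on the high side, plus 10°C on the low side). The HALT values used are hypothetical.

```python
def precipitation_limits(halt_high_temp_c: float,
                         halt_low_temp_c: float,
                         halt_vibration_grms: float,
                         temp_margin_c: float = 10.0) -> dict:
    """Precipitation-stage levels derived from HALT operational limits."""
    return {
        # assumption: the margin is taken toward the inside of the operating range
        "high_temp_c": halt_high_temp_c - temp_margin_c,
        "low_temp_c": halt_low_temp_c + temp_margin_c,
        # vibration level is half of the HALT vibration operational limit (HV = HVOL/2)
        "vibration_grms": halt_vibration_grms / 2.0,
    }


# Hypothetical HALT results: +110 degC, -60 degC, 30 Grms
print(precipitation_limits(110.0, -60.0, 30.0))
```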
In the detection stage, the constraints are generally applied at the specification level, making it possible to evidence obvious defects. The detection stage is conducted with the equipment under operation and under monitoring. The high and low limits of temperature during the detection stage rely on the maximal operational specification levels, and there is no notion of margin.
The temperature ramps can be slow, for example, 3°C or 10°C per minute, if the equipment is to be tested during these ramps. The higher limit of vibration is reduced by 60 to 80% of the value fixed in precipitation, with a minimum level, generally 5 Grms, below which the vibration is not lowered further.
The applied vibration can have a sawtooth form, ascending and descending by 5 Grms to X% of the value fixed in precipitation or in a constant manner.
At the end of each level, there is an on/off stage that may involve the rated voltages and currents and, in certain cases, the minimum and maximum levels of voltage and current.
Figure 7.9 proposes an example of the HASS cycle that combines temperature levels and random vibrations.
The overall HALT/HASS method is illustrated in Figure 7.10.
At first glance, the cost of a manufacturing process with burn-in seems higher than that of a process without burn-in. The overall cost must nevertheless integrate the costs induced by youth defects, which makes it possible to highlight the economic interest of burn-in.
We should therefore ask the question whether implementing a burn-in process is cost-efficient with respect to the objectives set. Implementing a burn-in process may be wise if the manufacturer expects to improve the following points:
Economic criteria are completed by certain manufacturing constraints that can guide the choice of conducting a burn-in test or not.
Several constraints extrinsic to the equipment manufacturer can be considered:
Several constraints intrinsic to the equipment manufacturer can be considered:
Hence, the economic criteria and the manufacturing constraints guide the choice for or against a burn-in process.
The overview diagram given in Figure 7.11 proposes a certain number of qualitative manufacturing criteria that may influence the decision of the equipment manufacturer to implement a burn-in process. Other criteria and constraints may be involved in the final decision (see below).
The final user can, according to their own experience, require a burn-in test to be conducted and include it in the specifications they send to the manufacturer. In this case, it is a contractual demand that states the necessity for the manufacturer to conduct a burn-in test. This situation is typically encountered in the case of space equipment, in applying the ECSS standards required by the users. If not specified by the final user, the choice in conducting a burn-in test is made according to the other criteria given in Figure 7.11.
If the manufactured equipment/product is such that:
If the manufacturer has the elements described in section 7.5.3 available, it is possible to calculate if implementing a burn-in process is of interest, depending on the number of parts to be manufactured. Indeed, depending on the coverage rate, the effectiveness of the burn-in process and the proportion of products with latent defects, the minimal quantity to be produced before detecting a defect may be higher than the overall production. In this latter case, implementing a burn-in process is of no interest. If the manufacturer does not have sufficient elements, the choice of conducting a burn-in process is made according to the other criteria in Figure 7.11.
If the equipment/manufactured product is integrated by a system manufacturer before delivery to the final client, the system manufacturer can, depending on its own experience or in response to client demand, require the implementation of a burn-in and include it in the specifications it sends to the manufacturer.
In the absence of demands or external elements leading to the implementation of a burn-in process, the feedback from the manufacturer on similar products/equipment can be used to guide its choice. If for a product/equipment of similar complexity, involving identical manufacturing processes, burn-in implementation has previously revealed a level of latent defects that was unacceptable with respect to the reliability level expected for the new product, a burn-in process would logically be once again implemented.
This analysis can be conducted when parameters such as coverage rate, effectiveness and proportion of products with latent defects are known or estimated.
The precondition for burn-in implementation is to have the possibility to observe products with youth defects during burn-in.
Several parameters are involved in this case:
The number of products Q for which a manufacturing defect can be revealed can then be estimated by:
The proportion of products Pr for which a manufacturing defect is observed is then:
For the probability of observing defects revealed by burn-in to be as high as possible, it is our proposal to estimate the lower bound of this proportion Pr for a given risk level α (Gau 2010).
This is given by:
where qF is the quantile of Fisher–Snedecor distribution (Gau 2010).
The quantile function of a distribution law is the inverse of the corresponding distribution function. The quantile of the Fisher–Snedecor law can be easily calculated, for example, using Excel with the function “INVERSE.LOI.F” (the French-language name of the function called FINV in English-language versions).
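Where no spreadsheet is at hand, the same lower bound can be obtained without the Fisher–Snedecor quantile by solving the exact binomial tail equation numerically, which is equivalent to the Clopper–Pearson bound. The sketch below makes one assumption that should be stressed: the tail probability is taken as α/2 (two-sided interval), because this choice yields about 0.36% for N = 100, Q = 2 and α = 10%, the value quoted in this section. The function names are hypothetical.

```python
from math import comb


def binom_tail_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X following a binomial law of parameters (n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))


def lower_bound_proportion(defects: int, n: int, tail_prob: float) -> float:
    """Proportion p_low such that P(X >= defects | p_low) = tail_prob (bisection)."""
    if defects == 0:
        return 0.0
    lo, hi = 0.0, defects / n
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if binom_tail_at_least(defects, n, mid) < tail_prob:
            lo = mid      # tail still too small: the bound lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha = 0.10
# assumption: two-sided interval, hence a tail probability of alpha / 2
p_low = lower_bound_proportion(defects=2, n=100, tail_prob=alpha / 2.0)
print(f"{100 * p_low:.2f}%")
```

Using α itself as the tail probability (one-sided bound) gives a somewhat higher lower bound, so the convention retained should be stated explicitly when reporting results.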
Hence, if N = 100, α = 10% and Q = 2, the proportion of defects amounts to 2%, but the lower bound at risk α = 10% amounts to 0.36%. The lower bound of the number of products having a youth defect revealed by the burn-in can now be estimated from the following relation:
EXAMPLE.–
Given the following data:
The curve shown in Figure 7.12 is obtained.
Hence, estimating that p = 5%, a burn-in presents interest provided that the number of defective products is above 100. Moreover, for a total of 100,000 products installed, even with a defect proportion of 1%, 677 defective products could be detected out of the 1,000 defective products generated by the manufacturing process.
Obviously, while the quantity of products to be delivered is known, the proportion of defective products cannot be known in advance. However, a reasonable estimation is possible thanks to the experience with products from previous generations. Setting a minimum of 10 potentially defective products, the curve shows that the minimum quantities to be delivered are:
Throughout the manufacturing stage, a certain number of tests with various objectives are conducted. It is important to know the coverage of these tests, meaning the capacity of a test to detect manufacturing defects.
This is referred to as the test coverage rate.
Indeed, it is important to be able to precipitate the latent failures into obvious failures, but it should be possible to observe the latter, hence the importance of the test coverage.
In general, the client requirements specify a certain coverage rate of the tests, but only during the operation stage. Nothing is specified during the manufacturing stage and, strictly speaking, it is up to the equipment manufacturer to propose certain demands on this subject.
This is not always (or often) done on each of the tests implemented during the manufacturing stage, and there is no dedicated generic methodology.
According to the general definition of coverage rate of a test, the following definitions are applicable:
Ratio between the number of defects detected by the test and the number of potential defects. Here, the term defect refers to a defect induced by the manufacturing process (improper component, missing component, improper value of the component, etc.).
Ratio between the sum of failure rates of the failure modes detected by the burn-in process and the failure rate of the product.
Ratio between the failure rate of the failure modes detected by the acceptance test and the failure rate of the product.
NOTE.– The various failure rates result from analyses of predictive reliability.
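As a simple illustration of the failure-rate-based definitions, the coverage rate can be computed from a predictive reliability breakdown by failure mode. The failure modes and rates below are hypothetical and serve only to show the ratio.

```python
def coverage_rate(failure_rates: dict, detected_modes: set) -> float:
    """Sum of the failure rates of detected modes divided by the product failure rate."""
    total = sum(failure_rates.values())
    if total <= 0:
        raise ValueError("the product failure rate must be positive")
    covered = sum(rate for mode, rate in failure_rates.items() if mode in detected_modes)
    return covered / total


# Hypothetical failure modes with failure rates in FIT
rates_fit = {
    "open_solder_joint": 40.0,
    "bond_lift": 10.0,
    "capacitor_short": 30.0,
    "connector_wear": 20.0,
}
detected = {"open_solder_joint", "capacitor_short"}
print(coverage_rate(rates_fit, detected))
```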
Indeed, a space application (such as Ariane rocket) does not have the same level of demand as a “consumer” application (e.g. mobile telephone).
Each manufacturing stage has a specific objective, and both the demand and the method for estimating the coverage rate differ from one stage to another.
Nowadays, the notion of coverage rate is present for the acceptance test but not always for the other stages of the manufacturing process. The coverage rate results from the FMEA analysis for the qualitative part and from the predictive reliability analyses for the quantitative part.
The following definitions are proposed:
The coverage rate of a test is generally defined by:
However, this definition does not seem to provide enough information, and it is preferable to associate it with the testability rate of the test, expressing the ability of a product to be tested. Moreover, a definition of the effectiveness rate of a test is proposed, which characterizes its capacity to detect detectable breakdowns, hence the following definition:
This definition is often used in the notion of coverage rate of a test.
A proportion “p” of products is considered to have youth defects. Consider an overall cost C(t) comprising the following:
The following notations are made:
The overall cost induced by the products depends on:
The overall cost is therefore given by:
or even:
The average number of breakdowns N2(t) during operation for the products without manufacturing defects can be calculated using the Poisson process theory. Indeed, the observed catastrophic breakdowns can be modeled by an exponential distribution of parameter λ. Considering the effect of maintenance, a homogeneous Poisson process is obtained, whose average number of breakdowns at instant “t” is given by:
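For a homogeneous Poisson process of rate λ, the expected number of events over [0, t] is λ·t. The sketch below compares this closed form with a small seeded simulation. The failure rate and the duration are hypothetical values chosen for illustration.

```python
import random


def expected_failures_hpp(rate_per_hour: float, hours: float) -> float:
    """Mean number of failures of a homogeneous Poisson process over [0, hours]."""
    return rate_per_hour * hours


def simulate_mean_failures(rate_per_hour: float, hours: float,
                           n_runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate: exponential times between failures, repair assumed instantaneous."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        t = rng.expovariate(rate_per_hour)      # time of the first failure
        while t <= hours:
            total += 1
            t += rng.expovariate(rate_per_hour)  # next failure after repair
    return total / n_runs


lam = 1e-4       # hypothetical failure rate (per hour)
t_op = 5_000.0   # hypothetical operating time (hours)
print(expected_failures_hpp(lam, t_op), simulate_mean_failures(lam, t_op, 20_000))
```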
The average number of breakdowns N1(t) during operation for the products with manufacturing defects can be calculated using the renewal process theory (Bayle 2019) and is given by:
This can be illustrated with the following diagram:
As already noted, the set of products can be divided into two categories:
Among the products having a latent youth defect, a certain number will be detected, while others will not, because the coverage rate of the tests is generally not 100%. Figure 7.13 illustrates the various possible cases for a product.
The overall cost C(t) induced by the products depends on:
The overall cost with burn-in is: CAD(t) = Cost_Burnin(t) + Cost_Operation(t), or equivalently:
To compare these two costs, it is important to study the difference between them. Let us note this difference by ΔC(t) = CAD(t) − CSD(t) or:
The calculation of N1 is complex and requires knowledge on the parameters of the Weibull law modeling youth breakdowns. To simplify the theoretical approach, the following approximation can be made (as they are differentiated only by a youth failure):
Hence:
Numerical applications
EXAMPLE 7.1.– Poor manufacturing process and effective burn-in:
The curve shown in Figure 7.14 is obtained.
The cost with burn-in is lower than that without it. This is a logical result, since there is a poor manufacturing process (p = 10%) and an effective burn-in (τc = 90% and Tb = 10 hrs). Indeed, due to the effectiveness of burn-in, most of the youth failures are filtered out and they will not occur during operation. Since the repair cost during operation is higher than that of burn-in, the overall result is positive.
EXAMPLE 7.2.– Very good manufacturing process and ineffective burn-in:
The curve shown in Figure 7.15 is obtained.
The cost with burn-in is higher than that without it. Indeed, the burn-in is short (Tb = 0.1 hrs, with τc = 90%) and the manufacturing process is very good (p = 0.1%), so burn-in filters out few youth failures and is therefore the more expensive option.
EXAMPLE 7.3.– Very good manufacturing process and effective burn-in:
The curve shown in Figure 7.16 is obtained.
In this example, the overall cost with burn-in is higher than that without it. This is a logical result, since the manufacturing process is very good (p = 0.1%) and therefore generates only a few youth failures during operation. Hence, burn-in adds an additional cost without return on investment.
EXAMPLE 7.4.– Poor manufacturing process and ineffective burn-in:
The curve shown in Figure 7.17 is obtained.
The costs with and without burn-in are nearly identical. Indeed, although the manufacturing process generates a significant proportion of products with latent failures (p = 10%), burn-in is ineffective and therefore filters out few youth failures.