Compliance Engineering

Using Accelerated Testing Methods to Improve Electronics Design

Alexander J. Porter

Measuring reliability is a critical part of electronic design. An understanding of the available testing methods can help engineers choose the right one.

Accelerated test methods that enable faster development and provide higher reliability have become a crucial part of evaluating products. Because accelerated testing covers a broad range of methods and equipment, determining which method is best for a particular application can be challenging. Applying that method within a business structure and selecting the correct equipment are also important.

Why Accelerated?

Measuring the reliability of products has become a mainstay in manufacturing businesses. Often, contracts are written based on meeting specifications, which are based on a given number of sample parts tested under narrowly described conditions with zero failures (e.g., 12 parts to 1 life). The advantages and influence of contractual requirements on the reliability tests are subtle but significant.

The ideal reliability test conducted in an engineering-driven test plan would entail testing a significant number of samples to failure and then conducting a reliability analysis on the time-to-failure of the samples. The result would provide an accurate measure of the product reliability.


Figure 1. As the stress level rises, the time-to-failure drops exponentially.


For products purchased for assembly into a larger system, time-to-market requirements and the development timeline dictate a fixed time under test. This limited time frame precludes using a test-to-failure approach for reliability tests. Instead, a given number of the product are tested to a fixed life, with zero failures allowed. This test demonstrates the product's minimum reliability, while allowing for a fixed time to conduct testing.

Contractual reliability is, therefore, a compromised reliability. The reliability measure of a product sample is never actually taken. As a result, the product may be overdesigned or marginally designed. In fact, it is quite simple to demonstrate that many standard contracted-reliability measures have an inherently uncertain result. For example, consider a case in which 12 samples are tested to 1 life with a requirement of zero failures. Passing this test demonstrates 95% reliability with a fairly low confidence of 46%. Put another way, if the product has a 95% reliability, there is a nearly 46% chance that at least 1 of the 12 samples will fail and a 54% chance that all 12 parts will pass. Testing 12 samples therefore gives almost even odds that the test will pass if the population is truly 95% reliable. Although such a test alone may not prove much, it allows for a clean, measurable contract.
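The 54% and 46% figures follow from treating each sample as an independent pass/fail trial. The short Python sketch below reproduces them under that assumption; the function names are illustrative and not part of any standard.

```python
def pass_probability(reliability: float, n_samples: int) -> float:
    """Chance that every sample survives one life, assuming independent parts."""
    return reliability ** n_samples


def demonstrated_confidence(reliability: float, n_samples: int) -> float:
    """Confidence that a zero-failure test demonstrates the stated reliability."""
    return 1.0 - reliability ** n_samples


p_pass = pass_probability(0.95, 12)
confidence = demonstrated_confidence(0.95, 12)
print(f"Chance all 12 parts pass: {p_pass:.0%}")
print(f"Chance of at least one failure (confidence): {confidence:.0%}")
```

Raising the sample count is the only way to raise the confidence in a zero-failure test of fixed length, which is why the 12-part contract remains a compromise.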

Speed It Up

Several accelerated test methods, including accelerated reliability, focus on overcoming limitations of traditional reliability tests. Accelerated reliability uses a simple premise and some complex math to estimate a product's reliability under particular conditions. As a particular source of stress is increased, time-to-failure decreases exponentially. A simple example of this effect is the thermal degradation of insulating material. The logarithmic rate of change is also affected by changes in the physics of failure. Both of these effects are used to design an accelerated reliability test (see Figure 1).

In an accelerated reliability test, several sets of parts are tested at stress levels much higher than the expected service level. For example, eight parts might be tested simultaneously (Test 1) until four parts fail (see Figure 2). This would establish one point on the accelerated reliability graph. Next, another eight parts would be tested at a slightly different stress level until four parts fail (Test 2). Testing groups of parts at different stress levels would continue until enough data were collected to extrapolate the service condition time-to-failure.
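The extrapolation step can be sketched in Python as a straight-line fit of log time-to-failure against stress level. The data below are hypothetical (the time at which the fourth of eight parts failed at each of four elevated temperatures), and the single log-linear relationship is an assumption. A real test would use whichever model fits the stress source and failure mechanism.

```python
import math


def fit_log_linear(stress, hours):
    """Least-squares fit of ln(hours) = intercept + slope * stress."""
    n = len(stress)
    logs = [math.log(h) for h in hours]
    mean_s = sum(stress) / n
    mean_l = sum(logs) / n
    sxx = sum((s - mean_s) ** 2 for s in stress)
    sxy = sum((s - mean_s) * (l - mean_l) for s, l in zip(stress, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_s
    return intercept, slope


def predict_hours(intercept, slope, stress_level):
    """Extrapolate the fitted line to another stress level."""
    return math.exp(intercept + slope * stress_level)


# Hypothetical results: test temperature (deg C) and hours until 4 of 8 parts failed.
test_temps = [120, 130, 140, 150]
hours_to_fourth_failure = [400, 200, 100, 50]

intercept, slope = fit_log_linear(test_temps, hours_to_fourth_failure)
service_hours = predict_hours(intercept, slope, 80)  # assumed service level of 80 deg C
print(f"Estimated time-to-failure at service conditions: {service_hours:.0f} h")
```

The farther the service level lies from the tested levels, the more the estimate depends on the assumed relationship holding outside the measured range.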

Figure 2. An accelerated reliability test. For each test, eight parts are tested until four parts fail.


The problem is that every stress source and failure mechanism has a different characteristic exponential relationship. Wayne Nelson provides several examples of empirically determined accurate math models of different stress and failure combinations [1]. Three models are shown here:

• Arrhenius-Weibull Model

$F(t, T) = 1 - \exp\{-[t \exp(-\gamma_0 - \gamma_1/T)]^{\beta}\}$

• Power-Lognormal Model

$F(t, V) = \Phi\{[\log(t) - \mu(x)]/\sigma\}$

• Cox (Proportional Hazards) Model

$R_0(t) = \exp[-\int_0^t h_0(t)\,dt]$
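As an illustration, the Arrhenius-Weibull model listed above can be evaluated directly. The Python sketch below uses the names gamma0, gamma1, and beta for the model's two coefficients and its shape parameter. The numeric values are hypothetical and chosen only to show how the failure fraction rises with temperature.

```python
import math


def arrhenius_weibull_cdf(t, temp_kelvin, gamma0, gamma1, beta):
    """Fraction failed by time t at absolute temperature temp_kelvin."""
    # exp(gamma0 + gamma1 / T) acts as the Weibull characteristic life at T.
    characteristic_life = math.exp(gamma0 + gamma1 / temp_kelvin)
    return 1.0 - math.exp(-((t / characteristic_life) ** beta))


# Hypothetical coefficients, for illustration only.
GAMMA0, GAMMA1, BETA = -10.0, 6000.0, 2.0

failed_cool = arrhenius_weibull_cdf(100.0, 400.0, GAMMA0, GAMMA1, BETA)
failed_hot = arrhenius_weibull_cdf(100.0, 450.0, GAMMA0, GAMMA1, BETA)
print(f"Fraction failed after 100 h at 400 K: {failed_cool:.3f}")
print(f"Fraction failed after 100 h at 450 K: {failed_hot:.3f}")
```

In practice the coefficients would be estimated from accelerated test data rather than assumed.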

Step It Up

Another method for accelerating tests increases the stress levels similar to the accelerated reliability test described above, but starts at service conditions after a traditional reliability test. Step stress testing is a combination of traditional reliability testing and overstress testing. The purpose of step stress testing is to demonstrate one life of a product and then overstress the product incrementally to find failure modes. The advantage of such a test for many supplier-purchaser relationships is the ease with which the contracted specification can be extended into a step stress test (see Figure 3).


Figure 3. For step stress testing, the range between the service level and the product's destruct limit for a given stress source is divided into 10 even steps.

The challenge of the test is in setting the amount by which the stress sources (such as temperature, voltage, and vibration) will be increased. Because each stress source affects different failure modes and each failure mode is accelerated at a different rate, setting a good proportional increase for each stress source is important.

A simple method for setting the stress levels is to determine the product's destruct limit for a given stress source (for example, 160°C) and divide the stress levels between the service level (80°C) and the destruct level (160°C) into 10 even steps. This method has two advantages. It is easy to implement, and it ensures that the product will exhibit a fairly even increase in damage from each stress source, provided that the product was experiencing an even amount of damage from each stress source at service conditions. A properly designed and optimized product accumulates stress damage evenly throughout the product, so this assumption is reasonable.
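For the temperature example above, the even-step method works out to 8°C per step. A minimal Python sketch, with an illustrative function name:

```python
def step_stress_levels(service, destruct, steps=10):
    """Evenly spaced stress levels from the service level up to the destruct limit."""
    step_size = (destruct - service) / steps
    return [service + i * step_size for i in range(steps + 1)]


levels = step_stress_levels(80.0, 160.0)
print(levels)  # starts at the 80 deg C service level and ends at the 160 deg C destruct limit
```

The same function would be applied to each stress source in turn, using that source's own service level and destruct limit.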

An alternative is to identify the exponential relationship between increasing each stress and the effect on failure modes to determine a step level in which the stress damage accumulated from each source is proportionate to service conditions. This alternative requires an extensive application of accelerated reliability tests for each stress source combination.

Forget Statistics

The two methods of accelerating tests address different problems encountered with traditional reliability tests. One way to overcome the problems with using the traditional reliability test is to measure the failure modes that produce the statistics, rather than measuring the statistics.

Highly accelerated life testing (HALT) is a test method usually applied to solid-state electronics that determines failure modes, operational limits, and destruct limits. This test method differs significantly from reliability tests. HALT does not determine a statistical reliability or (despite its acronym) determine an estimated life. HALT applies a single stress source to a product at elevated levels to determine the levels at which the product stops functioning but is not destroyed (operational limit), the levels at which the product is destroyed (destruct limit), and the failure modes that cause destruction.

HALT has a significant advantage over traditional reliability tests in identifying failure modes in a very short period. Traditional reliability tests take a long time (from a few days to several months). HALT typically takes two or three days. HALT also identifies several failure modes, providing significant information for design engineers to improve a product. Typical reliability tests provide only one or two failure modes, if the product fails at all.

HALT typically uses three stress sources: temperature, vibration, and electrical power. Each stress source is applied starting at some nominal level (for example, 30°C) and is then elevated in increments until the product stops functioning. The product is then brought back to the nominal conditions to see whether it is functional. If the product is still functional, then the level at which the product stopped functioning is labeled the operational limit. The product is then subjected to stress levels above the operational limit, returning to nominal levels each time until the product fails to function. The maximum level at which the product functioned before failing to operate at nominal conditions is labeled the destruct limit. This process is repeated for hot temperature, cold temperature, temperature ramp rate, vibration, and voltage. The process is also repeated for combined stresses.
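The step-and-return procedure for a single stress source can be sketched in Python as a simple loop. The SimulatedProduct class below is a hypothetical stand-in for a real unit under test. The sketch reads the destruct limit as the highest level from which the product still recovered at nominal conditions, which is one interpretation of the definition above.

```python
class SimulatedProduct:
    """Hypothetical unit: stops working above stop_level, is destroyed above destroy_level."""

    def __init__(self, stop_level, destroy_level):
        self.stop_level = stop_level
        self.destroy_level = destroy_level
        self.destroyed = False

    def apply(self, level):
        """Expose the product to a stress level; return True if it functions there."""
        if level > self.destroy_level:
            self.destroyed = True
        return not self.destroyed and level <= self.stop_level

    def works_at_nominal(self):
        """Return the product to nominal conditions and check whether it functions."""
        return not self.destroyed


def halt_step_search(product, nominal, step, max_level):
    """Step one stress source upward, returning to nominal after every step."""
    operational_limit = None
    destruct_limit = None
    last_survived = nominal
    level = nominal + step
    while level <= max_level:
        functioned = product.apply(level)
        if not product.works_at_nominal():
            destruct_limit = last_survived  # highest level the product recovered from
            break
        if not functioned and operational_limit is None:
            operational_limit = level  # stopped functioning but recovered at nominal
        last_survived = level
        level += step
    return operational_limit, destruct_limit


unit = SimulatedProduct(stop_level=95, destroy_level=125)
operational_limit, destruct_limit = halt_step_search(unit, nominal=30, step=10, max_level=200)
print(f"Operational limit: {operational_limit} deg C, destruct limit: {destruct_limit} deg C")
```

The resolution of both limits is set by the step size, so a smaller increment gives tighter limits at the cost of a longer test.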

HALT has two significant disadvantages. Without a statistical reliability measure, the method does not fit well into the requirements for contracting between suppliers and purchasers. This relationship requires an objective measure that can be written into a contract. Some schemes have been suggested that would allow the objective measure of the relationship between the operational limit and the service conditions. Another disadvantage to HALT is the amount of time the test method can take to address a significant number of stress sources. Each stress source tested requires about one day to test one or two sample products. Because HALT is usually applied to solid-state electronics, the stress sources are limited to hot, cold, ramp, vibration, and voltage. Covering all combinations requires six parts (including the combined environment test) and four or five days (eight-hour days). However, applying the method to 10 or 20 stress sources increases the number of parts to 11 or 21, respectively, and the days of testing to 10 or 20, respectively.

Verify the Failure

One challenge with testing-to-failure is accounting for the business environment. Questions include: How does one contract a test to failure? Which failures will be addressed and which ones allowed? A contract built around a statistical test is well understood. A contract based on failing the product is more difficult to explain.

To quantify a product's development and potential, the product is first tested using the failure mode verification testing (FMVT) process. During FMVT, the product is exposed to all known stress sources (sources of potential damage to the product) simultaneously. Stresses can include vibration, temperature, humidity, mechanical loads, electrical loads, radiant heat, pressure, etc. Stresses are applied so as to randomize their effects. For example, vibration would be a random six-axis profile. Mechanical loads would be random relative to each other, and electrical loads would be randomized. The goal is to apply all known stresses simultaneously to produce random stress throughout the product. Initially, the stresses are applied at the maximum expected service conditions. Over time, the stresses are increased.

As the test progresses, the product will experience failures. The failures will be at locations in the design that accumulate stress damage faster than the rest of the product. These locations are the weak points in the design. Because all known stress sources are applied randomly and increasingly, the relative order of the failure modes is approximately the order of significance of the failure modes. The resulting unique failure modes can be plotted for a single product (see Figure 4).


Figure 4. Failure mode progression.


With this information, design maturity can be quantified. The theory is based on a couple of simple assumptions. First, the product is feasible, which means that although the design may have weaknesses, the basic idea is viable and the design must only be iterated to work. Second, an optimized, robust design accumulates stress damage evenly throughout the product so that when one part of an optimized design fails, the rest of the design is near failure also. Based on these two assumptions, relative design maturity can be determined from the failure-mode progression. If the first failure mode occurs early and is separated from the rest of the failure modes, then the product is immature and can be improved. If all failure modes occur close together and after a significant period, then the product is mature. Figure 5 shows a mature design. If the first failure mode were addressed, the design's expected life would not change because the next failure mode occurs at nearly the same time. However, addressing the first failure mode in the product represented in Figure 4 would result in a significant improvement.

This potential for improvement can be quantified as design maturity (DM). DM is equal to the average time between failures after the first failure divided by the time to first failure. This calculation gives the average potential improvement in life under the accelerated test gained by fixing one failure mode. The product in Figure 4, which has a DM of 0.42, has an average potential improvement in life of 42% by fixing one failure. The product in Figure 5, however, has a potential improvement of only 2%. Clearly, the second design is more mature; that is, it has less room for improvement.
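The DM calculation is easy to express in Python. The failure times below are hypothetical values chosen to give the same DM figures quoted above (0.42 and 0.02). They are not the data behind Figures 4 and 5.

```python
def design_maturity(failure_times):
    """Average time between failures after the first, divided by time to first failure."""
    times = sorted(failure_times)
    if len(times) < 2:
        raise ValueError("need at least two failure modes")
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return (sum(gaps) / len(gaps)) / times[0]


# Hypothetical times (minutes) at which each unique failure mode first appeared.
immature_design = [100, 130, 190, 226]
mature_design = [200, 203, 206, 212]

dm_immature = design_maturity(immature_design)
dm_mature = design_maturity(mature_design)
print(f"Immature design DM: {dm_immature:.2f}")
print(f"Mature design DM:   {dm_mature:.2f}")
```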


Figure 5. Failure mode progression of a mature design.


DM tells only part of the story. The maturity of a design provides a measure of how much better the product could be under the accelerated stress conditions. A relative measure of a product's life is also needed if products are going to be compared. Figure 6 shows the failure-mode progression for a group of products that are assembled into a system. Although all components have a significant potential for improvement (37-60%), the products do not share the same potential. For example, Product 3 has a potential improvement of 49%, but in the accelerated environment, it can easily reach a life of more than 300 minutes by having one failure mode fixed. Although Product 4 has a 60% potential for improvement, fixing one failure mode would only give a life of around 175 minutes in the accelerated environment.


Figure 6. Failure mode progression for a system of products.


A quantification of the potential limit of the design is also needed. This technological limit can be defined by removing failure modes and recalculating the DM until it is <0.1. The time of the first remaining failure mode is the technological limit.

With Product 1, for example, eliminating the first failure mode would result in a DM of 0.06. Therefore, the technological limit is the time of the second failure mode, or 250 minutes. Using this definition, the technological limits of the components are 250, 240, 310, and 220 minutes for Products 1-4, respectively. In other words, if these designs are iterated, the best life they are expected to reach is their technological limit. As a system, then, it is only worthwhile to address the failure modes below the technological limit.

In addition, the maturity of a system as a whole can be determined. If it is assumed that the individual components will be brought up to their technological limit, how well balanced will the life of the components be? Using the technological limit to calculate a design maturity provides the system design maturity. In this case, it would be 0.13, which means that replacing the component with the worst potential (Product 4) would have a potential of improving the life of the system by 13%. This scenario assumes that the products have first been iterated to meet their technological limit.
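Both the technological limit and the system design maturity can be sketched in Python with the same DM calculation. The Product 1 failure times below are hypothetical, chosen so that removing the first failure mode gives the DM of 0.06 and the 250-minute limit quoted above. The four technological limits are the ones given in the text.

```python
def design_maturity(failure_times):
    """Average time between failures after the first, divided by time to first failure."""
    times = sorted(failure_times)
    if len(times) < 2:
        raise ValueError("need at least two failure times")
    mean_gap = (times[-1] - times[0]) / (len(times) - 1)
    return mean_gap / times[0]


def technological_limit(failure_times, threshold=0.1):
    """Drop the earliest failure mode until DM falls below the threshold."""
    remaining = sorted(failure_times)
    while len(remaining) > 1 and design_maturity(remaining) >= threshold:
        remaining = remaining[1:]
    return remaining[0]


# Hypothetical failure-mode times (minutes) standing in for Product 1.
product_1 = [150, 250, 260, 275, 295]
limit_1 = technological_limit(product_1)
print(f"Product 1 technological limit: {limit_1} min")

# Technological limits quoted in the text for Products 1 through 4.
system_limits = [250, 240, 310, 220]
system_dm = design_maturity(system_limits)
print(f"System design maturity: {system_dm:.3f}")  # about 0.136; the text quotes 0.13
```

Applying the DM formula to the four quoted limits gives a value close to the 0.13 stated above, which suggests the system figure was obtained in this way.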

Failing at Production

Highly accelerated stress screen (HASS) is a production screen usually applied to a product developed using the HALT method. Unlike the increasing stress levels in HALT, HASS uses a constant level significantly above the service conditions. HASS also may use a limited selection of stress sources such as temperature only. There are two types of HASS: a 100% screen and a sample check.

The premise of a 100% HASS screen is to apply a stress level for a set duration to all products from the production process so that good products experience a small reduction in expected life (around 1%), while bad products fail. This level is usually established from examining the results of the HALT and experimenting with known good products. There are two problems with the 100% HASS. A stress level cannot always be found that will reduce the life of a good product by a small amount. Even if a level can be found, bad products sometimes do not fail during the screen.

The sample-check HASS uses a small production sample to test the product to failure. The time-to-failure is tracked and is used to verify continuous improvement in the production process. A drop in the time-to-failure under the HASS would indicate a loss of production quality control.

FMVT can also be applied to the production line. Given a product with process variables, the optimum levels can be found by determining the best combination of levels that results in maximum product life and durability within product specifications. To accomplish this, a clear definition of failure must be quantified: A condition in which the product no longer meets its specification is a failure. In addition, a set of parts must be generated with the process variables changed to reflect different levels (high-low) about some assumed center.

The parts must be tested under identical conditions to determine the relative life of the product under the FMVT environment. From design iterations, the FMVT environment is established as the level (or one or two levels higher) at which the first failure mode occurred.

Based on these results, the combination of k variables that gives the steepest ascent is established as the new center. Another high-low set of parts is produced around that center, and the process is repeated until the optimum is found. The level by which the high and low is established around the center can be reduced to narrow the range of the optimum value. For example, for an injection-molded part with three process variables (mold temperature of 110°C, pressure of 500 MPa, and injection temperature of 150°C), the high and low values are set as follows:

  • Mold temperature: 100°C and 120°C (±10°C).
  • Pressure: 480 MPa and 520 MPa (±20 MPa).
  • Injection temperature: 140°C and 160°C (±10°C).

Eight parts are produced based on the above highs and lows (see Table I). Table II shows the results of the eight parts tested. From these results, choose the following conditions for the new center:

Part | Mold | Pressure | Injection
1 | Low | Low | High
2 | Low | High | Low
3 | High | Low | Low
4 | High | High | Low
5 | High | Low | High
6 | Low | High | High
7 | High | High | High
8 | Low | Low | Low

Table I. Production of injection-molded parts based on mold temperature, pressure, and injection temperature.
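The eight high-low combinations in Table I are the full set of two-level settings for three variables. A Python sketch that generates them from the centers and ± values given above follows. The variable names are illustrative, and the runs come out in a different order from the part numbering in Table I.

```python
from itertools import product

centers = {"mold_temp_C": 110, "pressure_MPa": 500, "injection_temp_C": 150}
deltas = {"mold_temp_C": 10, "pressure_MPa": 20, "injection_temp_C": 10}


def high_low_runs(centers, deltas):
    """Every combination of low (center - delta) and high (center + delta) settings."""
    names = list(centers)
    runs = []
    for signs in product((-1, +1), repeat=len(names)):
        runs.append({name: centers[name] + sign * deltas[name]
                     for name, sign in zip(names, signs)})
    return runs


runs = high_low_runs(centers, deltas)
for run in runs:
    print(run)
```

With k process variables the same function produces 2^k runs, so the number of parts per iteration doubles with each added variable.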

 

Part | Life (min) | Life (min) | Life (min) | Life (min) | Life (min) | Life (min) | Life (min) | Life (min)
1 | 56.97 | 8.79 | 100.55 | -52.42 | 0.004 | 533.48 | -- | --
2 | 71.29 | 8.79 | 108.92 | -45.87 | 0.31 | -- | -- | --
3 | 65.53 | 10.55 | 100.55 | -45.87 | 0.09 | -- | -- | --
4 | 72.36 | 10.55 | 108.92 | -45.87 | 1.54 | -- | -- | --
5 | 58.05 | 10.55 | 100.55 | -52.42 | 0.39 | -- | -- | --
6 | 63.80 | 8.79 | 108.92 | -52.42 | 2.22 | -- | -- | --
7 | 64.88 | 10.55 | 108.92 | -52.42 | 4.72 | -- | -- | --
8 | 64.46 | 8.79 | 100.55 | -45.87 | 0.98 | -- | -- | --

Table II. Test results of injection-molded parts.

  • Test result (life): 72.4 min.
  • Mold temperature: 120°C.
  • Pressure: 520 MPa.
  • Injection temperature: 140°C.

Reduce the ± value by half (see Table III). The new high and low values are as follows:

  • Mold temperature: 115°C and 125°C (±5°C).
  • Pressure: 510 MPa and 530 MPa (±10 MPa).
  • Injection temperature: 135°C and 145°C (±5°C).

Mold (°C) | Pressure (MPa) | Injection (°C) | Life (min)
115 | 510 | 145 | 88.5
115 | 530 | 135 | 66.3
125 | 510 | 135 | 68.3
125 | 530 | 135 | 62.2
125 | 510 | 145 | 84.4
115 | 530 | 145 | 84.3
125 | 530 | 145 | 78.3
115 | 510 | 135 | 72.4

Table III. Test results after reducing the ± values by half.
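One iteration of the re-centering step can be sketched in Python as follows. The sketch takes the longest-lived combination as the new center, which is what the worked example appears to do. This is a simplification of a full steepest-ascent calculation. The lives are the first results column of Table II, matched to the Table I settings.

```python
def next_iteration(results, deltas):
    """Pick the longest-lived run as the new center and halve the +/- values."""
    best_settings, _best_life = max(results, key=lambda item: item[1])
    new_deltas = {name: delta / 2 for name, delta in deltas.items()}
    levels = {name: (best_settings[name] - new_deltas[name],
                     best_settings[name] + new_deltas[name])
              for name in best_settings}
    return best_settings, new_deltas, levels


# (settings, life in minutes) for Parts 1 through 8, from Tables I and II.
first_round = [
    ({"mold": 100, "pressure": 480, "injection": 160}, 56.97),
    ({"mold": 100, "pressure": 520, "injection": 140}, 71.29),
    ({"mold": 120, "pressure": 480, "injection": 140}, 65.53),
    ({"mold": 120, "pressure": 520, "injection": 140}, 72.36),
    ({"mold": 120, "pressure": 480, "injection": 160}, 58.05),
    ({"mold": 100, "pressure": 520, "injection": 160}, 63.80),
    ({"mold": 120, "pressure": 520, "injection": 160}, 64.88),
    ({"mold": 100, "pressure": 480, "injection": 140}, 64.46),
]
first_deltas = {"mold": 10, "pressure": 20, "injection": 10}

center, new_deltas, levels = next_iteration(first_round, first_deltas)
print("New center:", center)
print("New low/high levels:", levels)
```

Repeating the call with each new set of results narrows the search until the ± values are smaller than the process can control.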

These values produce the test results shown in Tables IV and V.

Life (min) | Mold (°C) | Pressure (MPa) | Injection (°C) | ± Value
72.36 | 120 | 520 | 140 | 10
88.46 | 115 | 510 | 145 | 5

Table IV. Test results at ±5 and ±10 to mold temperature, pressure, and injection temperature.

Mold (°C) | Pressure (MPa) | Injection (°C) | Life (min)
112.5 | 505 | 147.5 | 80.1
112.5 | 515 | 142.5 | 98.6
117.5 | 505 | 142.5 | 79.1
117.5 | 515 | 142.5 | 91.0
117.5 | 505 | 147.5 | 72.5
112.5 | 515 | 147.5 | 91.9
117.5 | 515 | 147.5 | 84.3
112.5 | 505 | 142.5 | 86.7

Table V. Results of ± values of 2.5.

Reduce the value again by half and reduce the step (see Table V). For values with new centers of 112.5°C, 515 MPa, and 142.5°C, see Table VI. Note that the ideal (based on the perfect life) is 113°C, 515 MPa, and 145°C. Also note that the mold temperature has a small range of acceptable values. The process is not as sensitive to mold temperature as it is to pressure or injection temperature (see Table VII).

The total testing time under FMVT (assuming four samples at a time) is 16 hours. This time frame assumes that the ideal part lasts two hours under test conditions. Bad parts should last one hour, but a full two hours of testing should be planned. By comparison, traditional testing of eight parts to a life of 500,000 cycles (at 6 cycles/min at a given temperature) takes 231 days for the same round of four iterations to find optimal conditions. However, if eight parts are tested to 500,000 cycles and all pass (which they are very likely to do after the first two iterations), it is difficult to identify which process to use. If a part has a good design margin after the first iteration, several combinations will make it to the 500,000 cycles. So, the time needed for testing using the traditional cycle rate is indeterminate, but it is more than 231 days.
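The 16-hour and 231-day figures can be checked with a few lines of Python. The sketch assumes eight parts per iteration, four iterations, and that all eight parts in the traditional test are cycled at the same time.

```python
import math


def fmvt_hours(parts_per_iteration=8, parts_per_batch=4, hours_per_batch=2, iterations=4):
    """Total FMVT test time when parts are tested in batches."""
    batches = math.ceil(parts_per_iteration / parts_per_batch)
    return batches * hours_per_batch * iterations


def traditional_days(cycles=500_000, cycles_per_min=6, iterations=4):
    """Total cycling time when every part in an iteration runs in parallel."""
    minutes_per_iteration = cycles / cycles_per_min
    return iterations * minutes_per_iteration / (60 * 24)


print(f"FMVT: {fmvt_hours()} hours")
print(f"Traditional: {traditional_days():.0f} days")
```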

Warranty and Life Expectancy

FMVT is useful for troubleshooting warranty issues. A standard FMVT can be run on the product with the emphasis on identifying and applying all possible stress sources. Once the standard FMVT is run, two possibilities exist. If the warranty issue was reproduced, troubleshooting can go to the next stage. Otherwise, a significant fact has been established: The warranty issue is due to a stress source that was not identified or applied. If this is the case, then the additional stress sources must be identified, and FMVT must be run again.

Mold (°C) | Pressure (MPa) | Injection (°C) | Life (min)
111 | 513 | 143 | 100
111 | 517 | 141 | 84.1
113 | 513 | 141 | 90.6
113 | 517 | 141 | 84.1
113 | 513 | 143 | 100
111 | 517 | 143 | 93.4
113 | 517 | 143 | 93.4
111 | 513 | 141 | 90.6

Table VI. Values with new centers of 112.5°C, 515 MPa, and 142.5°C.

Once the warranty issue has been reproduced, a narrower test of limited stresses and levels should be determined that reproduces the warranty problem in a short period. A test that produces the warranty failure mode on the current design in only a few hours can usually be produced. This test can then be used to test design solutions. Once a design solution is identified, a full FMVT should be conducted again.

Using FMVT to estimate life requires the existence of field data for a similar design. The design with field data is tested first using a standard FMVT to establish the stress level at which a hard failure first occurs (see Figure 7). The new design and the old design (the design with field data) should then be tested side by side at a stress level one or two steps above the level at which the first hard failure occurred. The relative time-to-failure under the accelerated conditions of the test is then used to calculate the estimated time-to-failure for the new design.
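One simple way to carry out this calculation is to scale the old design's field life by the ratio of the two accelerated times-to-failure. The Python sketch below assumes that both designs are accelerated by the same factor, which is an assumption of this example, not a statement from the test method. All numbers are hypothetical.

```python
def estimate_field_life(old_field_life, old_test_ttf, new_test_ttf):
    """Scale the old design's field life by the ratio of accelerated times-to-failure."""
    return old_field_life * (new_test_ttf / old_test_ttf)


# Hypothetical values: 5 years of field life for the old design,
# 120 min (old) and 180 min (new) to failure in the side-by-side test.
estimated_years = estimate_field_life(5.0, old_test_ttf=120.0, new_test_ttf=180.0)
print(f"Estimated field life of the new design: {estimated_years} years")
```

The estimate is only as good as the similarity between the two designs, because a new failure mechanism in the new design could respond differently to the accelerated stresses.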

Figure 7. Field data tested using FMVT to establish the stress level at which hard failure occurs.

Test Equipment

A wide range of accelerated testing equipment is available. The following is just a sampling of the types of equipment available.

FMVT Machine. The FMVT machine (Entela) is a patent-pending six-axis vibration machine designed to produce large displacement as well as a relatively uniform distribution of vibration energy.

Air Hammer Tables. Air hammer tables (Qualmark, Envirotronics, Thermotron, SSI, and others) are patented six-axis vibration machines designed to produce quasi-random high-frequency vibration.

Servo Hydraulic Machines. Servo hydraulic machines (Team, Burke Porter, MTS, and others) are also six-axis vibration machines. These machines are capable of producing low-frequency vibration and feature a highly controllable vibration profile.

Chambers. Environmental chambers (Envirotronics, Thermotron, Espec, and others) come in a variety of sizes, temperature ranges, humidity controls, ramp rates, and control methods. A typical chamber will have circulating fans, electric heaters, a refrigeration system, and humidity control ranging from freezing to the maximum chamber temperature.

Other Equipment. Other vibration and cyclic equipment is available that can provide custom and standard single-axis vibration, six-axis vibration, and cyclic loading.

In general, the loading requirements of a particular test will govern the equipment used. It should be noted that several test methods require custom setups of cyclic testing of a component using one or more of the machines described.

For each piece of equipment, performance parameters can be used to determine which machine is most appropriate for a particular test method and information goal. For the basic vibration machines, the basic parameters are load, displacement range, frequency range, and resonance characteristics of the design (see Table VIII).

Parameter | FMVT Machine | Air Hammer | Servo Hydraulic
Load/displacement range | 100 lb max load, 60 G max, 4 in. displacement | 100 lb max load, 60 G max, <0.25 in. displacement | 2000 lb max load, 14 G max, 2 in. displacement
Resonance characteristics | 5-2500 Hz primary energy range (ramp down) | 500-10,000 Hz primary energy range (ramp up) | 2-200 Hz primary energy range (flat)

Table VIII. Parameters for vibration machines.

Conclusion

Because accelerated testing covers a broad range of methods and equipment, determining which method is best for a particular application can be challenging. Selecting the correct equipment and applying the appropriate method requires an understanding of available testing methods and how to use them. Understanding a product's failure modes can be the first step in improved electronic design.

Reference

1. Wayne Nelson, Accelerated Testing: Statistical Models, Test Plans, and Data Analyses (New York: Wiley Interscience, 1990).

Alexander J. Porter is business development manager with Entela Engineering and Testing Laboratories (Grand Rapids, MI). He can be reached at aporter@entela.com.
