Using Accelerated Testing Methods to
Improve Electronics Design
Alexander J. Porter
Measuring reliability is a critical
part of electronic design. An understanding of the available testing methods
can help engineers choose the right one.
Accelerated test methods that enable faster development
and provide higher reliability have become a crucial part of evaluating
products. Because accelerated testing covers a broad range of methods
and equipment, determining which method is best for a particular application
can be challenging. Applying that method within a business structure
and selecting the correct equipment are also important.
Measuring the reliability of products has become
a mainstay in manufacturing businesses. Often, contracts are written
based on meeting specifications, which are based on a given number of
sample parts tested under narrowly described conditions with zero failures
(e.g., 12 parts to 1 life). The advantages and influence of contractual
requirements on the reliability tests are subtle but significant.
The ideal reliability test conducted in an engineering-driven
test plan would entail testing a significant number of samples to failure
and then conducting a reliability analysis on the time-to-failure of
the samples. The result would provide an accurate measure of the product
reliability.
|

|
Figure 1. As the stress level rises, the time-to-failure drops
exponentially.
|
For products purchased for assembly into a larger
system, time-to-market requirements and the development timeline dictate
a fixed time under test. This limited time frame precludes using a test-to-failure
approach for reliability tests. Instead, a given number of samples of the product
are tested to a fixed life, with zero failures allowed. This test demonstrates
the product's minimum reliability while allowing for a fixed time
to conduct testing.
Contractual reliability is, therefore, a compromised
reliability. The reliability measure of a product sample is never actually
taken. As a result, the product may be overdesigned or marginally designed.
In fact, it is quite simple to demonstrate that many standard contracted-reliability
measures have an inherently uncertain result. For example, consider a
case in which 12 samples are tested to 1 life with a requirement of zero
failures. Passing this test demonstrates 95% reliability with a fairly
low confidence of about 46%. If the product has a 95% reliability, there is
a nearly 46% chance that at least 1 of the 12 samples will fail, whereas
there is a 54% chance that all 12 parts will pass. This means testing
12 samples provides almost even odds that the test will pass if the
population is truly 95% reliable. Although this alone may not prove
much, it allows for a clean, measurable contract.
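The 46% and 54% figures can be checked with a few lines of Python. This is a minimal sketch that assumes the samples fail independently and uses the standard zero-failure (success-run) relation, which the article does not state explicitly; the function names are illustrative.

```python
def pass_probability(reliability, n_samples):
    """Chance that all n_samples survive one life if each survives with `reliability`."""
    return reliability ** n_samples


def demonstrated_confidence(reliability, n_samples):
    """Confidence that the true reliability is at least `reliability`
    after n_samples are tested to one life with zero failures."""
    return 1.0 - reliability ** n_samples


p_pass = pass_probability(0.95, 12)
confidence = demonstrated_confidence(0.95, 12)
print(f"chance all 12 pass: {p_pass:.2f}")            # about 0.54
print(f"demonstrated confidence: {confidence:.2f}")   # about 0.46
```

Raising the sample count is the only way to raise the confidence under this model; for instance, `demonstrated_confidence(0.95, 59)` comes out just above 0.95.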
Several accelerated test methods, including accelerated
reliability, focus on overcoming limitations of traditional reliability
tests. Accelerated reliability uses a simple premise and some complex
math to achieve a reliability estimate of a product for particular conditions.
As a particular source of stress is increased, time-to-failure decreases
exponentially. This effect is used to design the accelerated reliability
test. A simple example of this effect is the thermal degradation of
insulating material. The logarithmic rate of change is also affected
by any change in the physics of failure as the stress rises. Both of these effects are used
in accelerated reliability (see Figure 1).
In an accelerated reliability test, several sets
of parts are tested at stress levels much higher than the expected service
level. For example, eight parts might be tested simultaneously (Test
1) until four parts fail (see Figure 2). This would establish one point
on the accelerated reliability graph. Next, another eight parts would
be tested at a slightly different stress level until four parts fail
(Test 2). Testing groups of parts at different stress levels would continue
until enough data were collected to extrapolate the service condition
time-to-failure.
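The extrapolation step can be sketched as a straight-line fit of log time-to-failure against stress. The sketch below assumes a simple log-linear life-stress relationship and treats the time at which four of eight parts fail as the median life at that stress level; the stress levels, failure times, and service level are hypothetical values chosen for illustration, not data from the article.

```python
import math


def fit_log_linear(stress_levels, times_to_failure):
    """Least-squares fit of ln(time) = intercept + slope * stress."""
    n = len(stress_levels)
    log_times = [math.log(t) for t in times_to_failure]
    mean_s = sum(stress_levels) / n
    mean_y = sum(log_times) / n
    sxx = sum((s - mean_s) ** 2 for s in stress_levels)
    sxy = sum((s - mean_s) * (y - mean_y)
              for s, y in zip(stress_levels, log_times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_s
    return intercept, slope


def predict_time_to_failure(intercept, slope, stress):
    """Extrapolate the fitted line to another stress level."""
    return math.exp(intercept + slope * stress)


# Hypothetical tests: temperature (deg C) and hours until 4 of 8 parts failed.
stress_levels = [140.0, 130.0, 120.0]
median_hours = [50.0, 100.0, 200.0]

intercept, slope = fit_log_linear(stress_levels, median_hours)
service_hours = predict_time_to_failure(intercept, slope, 80.0)
print(f"estimated median life at 80 deg C: {service_hours:.0f} h")
```

The further the service level lies from the tested levels, the more the estimate depends on the assumed relationship holding, which is why the choice of model matters.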
|
Figure 2.
An accelerated reliability test. For each test, eight parts are
tested until four parts fail.
|

|
The problem is that every stress source and failure
mechanism has a different characteristic exponential relationship. Wayne
Nelson provides several examples of accurate, empirically determined
math models for different stress and failure combinations [1].
Three models are shown here:
• Arrhenius-Weibull Model
$F(t, T) = 1 - \exp\{-[\,t \exp(-\gamma_0 - \gamma_1/T)\,]^{\beta}\}$
• Power-Lognormal Model
$F(t, V) = \Phi\{[\log(t) - \mu(x)]/\sigma\}$
• Cox (Proportional Hazards) Model
$R_0(t) = \exp[-\int_0^t h_0(u)\,du]$
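As an illustration of how such a model is used, the Arrhenius-Weibull form above can be evaluated directly. This is only a sketch: the parameter values `GAMMA0`, `GAMMA1`, and `BETA` are hypothetical, and in practice they would be estimated from test data.

```python
import math


def arrhenius_weibull_cdf(t, temp_kelvin, gamma0, gamma1, beta):
    """Fraction of the population failed by time t at absolute temperature temp_kelvin.

    Implements F(t, T) = 1 - exp{-[t * exp(-gamma0 - gamma1/T)]**beta}.
    """
    # exp(gamma0 + gamma1/T) is the Weibull characteristic life at temperature T.
    characteristic_life = math.exp(gamma0 + gamma1 / temp_kelvin)
    return 1.0 - math.exp(-((t / characteristic_life) ** beta))


# Hypothetical parameters, for illustration only.
GAMMA0, GAMMA1, BETA = -12.0, 8000.0, 2.0

life_at_400k = math.exp(GAMMA0 + GAMMA1 / 400.0)
f_at_characteristic_life = arrhenius_weibull_cdf(life_at_400k, 400.0, GAMMA0, GAMMA1, BETA)
f_cool = arrhenius_weibull_cdf(1000.0, 400.0, GAMMA0, GAMMA1, BETA)
f_hot = arrhenius_weibull_cdf(1000.0, 450.0, GAMMA0, GAMMA1, BETA)
print(f_at_characteristic_life, f_cool, f_hot)
```

With a positive `gamma1`, a higher temperature shortens the characteristic life, so the fraction failed at a fixed time rises with temperature.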
Another method for accelerating tests increases
the stress levels much as the accelerated reliability test described
above does, but it starts at service conditions, after a traditional reliability
test. Step stress testing is a combination of traditional reliability
testing and overstress testing. The purpose of step stress testing is
to demonstrate one life of a product and then overstress the product
incrementally to find failure modes. The advantage of such a test for
many supplier-purchaser relationships is the ease with which the required
contracted specification can be extended to conduct a step stress test
(see Figure 3).
|

|
Figure 3. For step stress
testing, the range between the service level and the product’s destruct
limit for a given stress source is divided into 10 even steps.
|
The challenge of the test is in setting the amount
by which the stress sources (such as temperature, voltage, and vibration)
will be increased. Because each stress source affects different failure
modes and each failure mode is accelerated at a different rate, setting
a good proportional increase for each stress source is important.
A simple method for setting the stress levels
is to determine the product's destruct limit for a given stress source
(for example, 160°C) and divide the stress levels between the service
level (80°C) and the destruct level (160°C) into 10 even steps.
This method has two advantages. It is easy to implement, and it ensures
that the product will exhibit a fairly even increase in damage from
each stress source, provided that the product was experiencing an even
amount of damage from each stress source at service conditions. A properly
designed and optimized product accumulates stress damage evenly throughout
the product, so this assumption is reasonable.
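The even-step method is easy to compute. The sketch below uses the 80°C service level and 160°C destruct limit from the example above; the function name is illustrative, and it assumes that "10 even steps" means 10 equal increments above the service level.

```python
def step_stress_levels(service_level, destruct_limit, n_steps=10):
    """Return the service level followed by n_steps evenly spaced levels
    ending at the destruct limit."""
    step = (destruct_limit - service_level) / n_steps
    return [service_level + i * step for i in range(n_steps + 1)]


levels = step_stress_levels(80.0, 160.0)
print(levels)  # starts at 80.0, rises in 8-degree increments, ends at 160.0
```

The same function can be applied to each stress source (voltage, vibration, and so on) so that all sources move from service level to destruct limit in the same number of steps.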
An alternative is to identify the exponential
relationship between increasing each stress and the effect on failure
modes to determine a step level in which the stress damage accumulated
from each source is proportionate to service conditions. This alternative
requires an extensive application of accelerated reliability tests for
each stress source combination.
The two methods of accelerating tests address
different problems encountered with traditional reliability tests. One
way to overcome the problems with using the traditional reliability
test is to measure the failure modes that produce the statistics, rather
than measuring the statistics.
Highly accelerated life testing (HALT) is a test
method usually applied to solid-state electronics that determines failure
modes, operational limits, and destruct limits. This test method differs
significantly from reliability tests. HALT does not determine a statistical
reliability or (despite its acronym) determine an estimated life. HALT
applies a single stress source to a product at elevated levels to determine
the levels at which the product stops functioning but is not destroyed
(operational limit), the levels at which the product is destroyed (destruct
limit), and the failure modes that cause destruction.
HALT has a significant advantage over traditional
reliability tests in identifying failure modes in a very short period.
Traditional reliability tests take a long time (from a few days to several
months). HALT typically takes two or three days. HALT also identifies
several failure modes, providing significant information for design
engineers to improve a product. Typical reliability tests provide only
one or two failure modes, if the product fails at all.
HALT typically uses three stress sources: temperature,
vibration, and electrical power. Each stress source is applied starting
at some nominal level (for example, 30°C) and is then elevated
in increments until the product stops functioning. The product is then
brought back to the nominal conditions to see whether it is functional.
If the product is still functional, then the level at which the product
stopped functioning is labeled the operational limit. The product is
then subjected to stress levels above the operational limit, returning
to nominal levels each time until the product fails to function. The
maximum level at which the product functioned before failing to operate
at nominal conditions is labeled the destruct limit. This process is
repeated for hot temperature, cold temperature, temperature ramp rate,
vibration, and voltage. The process is also repeated for combined stresses.
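The stepping procedure for a single stress source can be sketched as a loop. In this sketch the two callables stand in for the chamber and the functional test, and the simulated product (which stops working at 110°C and is permanently damaged at 150°C) is hypothetical.

```python
def halt_step_search(levels, functions_under_stress, functions_at_nominal_after):
    """Step through increasing stress levels and return
    (operational_limit, destruct_limit) as described in the text."""
    operational_limit = None
    destruct_limit = None
    for level in levels:
        stopped = not functions_under_stress(level)
        # Return to nominal conditions and check the product again.
        if not functions_at_nominal_after(level):
            break  # no recovery at nominal: the product has been destroyed
        if stopped and operational_limit is None:
            # First level at which the product stopped but recovered.
            operational_limit = level
        destruct_limit = level  # highest level survived so far
    return operational_limit, destruct_limit


# Hypothetical product behavior under hot-temperature stepping.
hot_levels = range(30, 170, 10)  # 30, 40, ..., 160 deg C
operational, destruct = halt_step_search(
    hot_levels,
    functions_under_stress=lambda temp: temp < 110,
    functions_at_nominal_after=lambda temp: temp < 150,
)
print(operational, destruct)
```

The same loop would be repeated with different level sequences for cold temperature, ramp rate, vibration, and voltage.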
HALT has two significant disadvantages. Without
a statistical reliability measure, the method does not fit well into
the requirements for contracting between suppliers and purchasers. This
relationship requires an objective measure that can be written into
a contract. Some schemes have been suggested that would allow the objective
measure of the relationship between the operational limit and the service
conditions. Another disadvantage to HALT is the amount of time the test
method can take to address a significant number of stress sources. Each
stress source tested requires about one day to test one or two sample
products. Because HALT is usually applied to solid-state electronics,
the stress sources are limited to hot, cold, ramp, vibration, and voltage.
Covering all combinations requires six parts (including the combined
environment test) and four or five days (eight-hour days). However,
applying the method to 10 or 20 stress sources increases the number
of parts to 11 or 21, respectively, and the days of testing to 10 or
20, respectively.
One challenge with testing-to-failure is accounting
for the business environment. Questions include: How does one contract
a test to failure? Which failures will be addressed and which ones allowed?
A contract built around a statistical test is well understood. A contract
based on failing the product is more difficult to explain.
To quantify a product's development and potential,
the product is first tested using the failure mode verification testing
(FMVT) process. During FMVT, the product is exposed to all known stress
sources (sources of potential damage to the product) simultaneously.
Stresses can include vibration, temperature, humidity, mechanical loads,
electrical loads, radiant heat, pressure, etc. Stresses are applied
so as to randomize their effects. For example, vibration would be a
random six-axis profile. Mechanical loads would be random relative to
each other, and electrical loads would be randomized. The goal is to
apply all known stresses simultaneously to produce random stress throughout
the product. Initially, the stresses are applied at the maximum expected
service conditions. Over time, the stresses are increased.
As the test progresses, the product will experience
failures. The failures will be at locations in the design that accumulate
stress damage faster than the rest of the product. These locations are
the weak points in the design. Because all known stress sources are
applied randomly and increasingly, the relative order of the failure
modes is approximately the order of significance of the failure modes.
The resulting unique failure modes can be plotted for a single product
(see Figure 4).
|
Figure 4. Failure mode progression.
|

|
With this information, design maturity can be
quantified. The theory is based on a couple of simple assumptions. First,
the product is feasible, which means that although the design may have
weaknesses, the basic idea is viable and the design must only be iterated
to work. Second, an optimized, robust design accumulates stress damage
evenly throughout the product so that when one part of an optimized
design fails, the rest of the design is near failure also. Based on
these two assumptions, relative design maturity can be determined from
the failure-mode progression. If the first failure mode occurs early
and is separated from the rest of the failure modes, then the product
is immature and can be improved. If all failure modes occur close together
and after a significant period, then the product is mature. Figure 5
shows a mature design. If the first failure mode were addressed, the
design's expected life would change very little because the next failure mode
is at nearly the same time. However, addressing the first failure mode
in the product represented in Figure 4 would result in a significant
improvement.
This potential for improvement can be quantified
as design maturity (DM). DM is equal to the average time between failures
after the first failure divided by the time to first failure. This calculation
gives the average potential improvement in life under the accelerated
test gained by fixing one failure mode. The product in Figure 4, which
has a DM of 0.42, has an average potential improvement in life of 42%
by fixing one failure. The product in Figure 5, however, has a potential
improvement of only 2%. Clearly, the second design is more mature; that
is, it has less room for improvement.
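The DM calculation can be sketched in a few lines of Python. The function name and the failure times below are hypothetical; the times were chosen only so that the results match the 0.42 and 0.02 values quoted above and are not the data behind Figures 4 and 5.

```python
def design_maturity(failure_times):
    """DM = average time between failures after the first / time to first failure."""
    times = sorted(failure_times)
    if len(times) < 2:
        raise ValueError("at least two failure modes are needed")
    first = times[0]
    # The gaps between consecutive failures sum to (last - first),
    # so their average is that span divided by the number of gaps.
    average_gap = (times[-1] - first) / (len(times) - 1)
    return average_gap / first


# Hypothetical failure-mode times in minutes.
immature_design = [100.0, 150.0, 180.0, 226.0]
mature_design = [200.0, 203.0, 208.0, 212.0]

dm_immature = design_maturity(immature_design)
dm_mature = design_maturity(mature_design)
print(round(dm_immature, 2), round(dm_mature, 2))  # 0.42 0.02
```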
|

|
Figure 5. Failure mode progression of a mature
design.
|
DM tells only part of the story. The maturity
of a design provides a measure of how much better the product could
be under the accelerated stress conditions. A relative measure of a
product's life is also needed if products are going to be compared.
Figure 6 shows the failure-mode progression for a group of products
that are assembled into a system. Although all components have a significant
potential for improvement (37–60%), the products do not share the
same potential. For example, Product 3 has a potential improvement of
49%, but in the accelerated environment, it can easily reach a life
of more than 300 minutes by having one failure mode fixed. Although
Product 4 has a 60% potential for improvement, fixing one failure mode
would only give a life of around 175 minutes in the accelerated environment.
|
Figure 6. Failure mode progression for a system of products.
|

|
A quantification of the potential limit of the
design is also needed. This technological limit can be defined by removing
failure modes and recalculating the DM until it is less than 0.1. The time
of the first remaining failure mode is the technological limit.
With Product 1, for example, eliminating the
first failure mode would result in a DM of 0.06. Therefore, the technological
limit is the time of the second failure mode, or 250 minutes. Using
this definition, the technological limits of each component are 250,
240, 310, and 220 minutes for Products 1–4, respectively. In other words,
if these designs are iterated, the best they are expected to get to
is their technological limit. As a system, then, it is only worthwhile
to address the failure modes below the technological limit.
In addition, the maturity of a system as a whole
can be determined. If it is assumed that the individual components will
be brought up to their technological limit, how well balanced will the
life of the components be? Using the technological limit to calculate
a design maturity provides the system design maturity. In this case,
it would be 0.13, which means that replacing the component with the
worst potential (Product 4) would have a potential of improving the
life of the system by 13%. This scenario assumes that the products have
first been iterated to meet their technological limit.
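Both the technological limit and the system design maturity follow from the DM definition, as the sketch below suggests. The failure times for `product_1` are hypothetical, chosen so that removing the first failure mode gives a DM of 0.06 and a limit of 250 minutes as in the text; the four technological limits are the ones quoted above.

```python
def design_maturity(failure_times):
    """DM = average time between failures after the first / time to first failure."""
    times = sorted(failure_times)
    return (times[-1] - times[0]) / (len(times) - 1) / times[0]


def technological_limit(failure_times, threshold=0.1):
    """Remove the earliest failure mode until DM drops below the threshold,
    then return the time of the first remaining failure mode."""
    times = sorted(failure_times)
    while len(times) > 1 and design_maturity(times) >= threshold:
        times = times[1:]
    return times[0]


# Hypothetical failure-mode times (minutes) for one product.
product_1 = [170.0, 250.0, 262.0, 280.0]
limit_1 = technological_limit(product_1)

# Technological limits (minutes) of Products 1 through 4, from the text.
system_limits = [250.0, 240.0, 310.0, 220.0]
system_dm = design_maturity(system_limits)
print(limit_1, round(system_dm, 3))
```

Under this reading of the definition the system DM comes out near 0.136, in line with the roughly 13% figure given above.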
Highly accelerated stress screen (HASS) is a
production screen usually applied to a product developed using the HALT
method. Unlike the increasing stress levels in HALT, HASS uses a constant
level significantly above the service conditions. HASS also may use
a limited selection of stress sources such as temperature only. There
are two types of HASS: a 100% screen and a sample check.
The premise of a 100% HASS screen is to apply
a stress level for a set duration to all products from the production
process so that good products experience a small reduction in expected
life (around 1%), while bad products fail. This level is usually established
from examining the results of the HALT and experimenting with known
good products. There are two problems with the 100% HASS. A stress level
cannot always be found that will reduce the life of a good product by
only a small amount. Even if such a level can be found, bad products sometimes
do not fail during the screen. The sample-check HASS uses a small production
sample to test the product to failure. The time-to-failure is tracked
and is used to verify continuous improvement in the production process.
A drop in the time-to-failure under the HASS would indicate a loss of
production quality control.
FMVT can also be applied to the production line.
Given a product with process variables, the optimum levels can be found
by determining the best combination of levels that results in maximum
product life and durability within product specifications. To accomplish
this, a clear definition of failure must be quantified: A condition
in which the product no longer meets its specification is a failure.
In addition, a set of parts must be generated with the process variables
changed to reflect different levels (high-low) about some assumed center.
The parts must be tested under identical conditions
to determine the relative life of the product under the FMVT environment.
From design iterations, the FMVT environment is established as the level
(or one or two levels higher) at which the first failure mode occurred.
Based on these results, the combination of k
variables that gives the steepest ascent is established as the new center.
Another high-low set of parts is produced around that center, and the
process is repeated until the optimum is found. The amount by which the
high and low values are offset from the center can be reduced to narrow
the range of the optimum value. For example, for an injection-molded
part with three process variables (mold temperature of 110°C, pressure
of 500 MPa, and injection temperature of 150°C), the high and low
values are set as follows:
- Mold temperature: 100°C and 120°C (±10°C).
- Pressure: 480 MPa and 520 MPa (±20 MPa).
- Injection temperature: 140°C and 160°C (±10°C).
Eight parts are produced based on the above highs
and lows (see Table I). Table II shows the results of the eight parts
tested. From these results, the conditions of the longest-lived part
(Part 4: 120°C, 520 MPa, and 140°C) are chosen as the new center.
| Part | Mold | Pressure | Injection |
| 1 | Low | Low | High |
| 2 | Low | High | Low |
| 3 | High | Low | Low |
| 4 | High | High | Low |
| 5 | High | Low | High |
| 6 | Low | High | High |
| 7 | High | High | High |
| 8 | Low | Low | Low |
Table I. Production of injection-molded parts based on mold temperature, pressure,
and injection temperature.
| Part | Life (min) | Life (min) | Life (min) | Life (min) | Life (min) | Life (min) | Life (min) | Life (min) |
| 1 | 56.97 | 8.79 | 100.55 | -52.42 | 0.004 | 533.48 | -- | -- |
| 2 | 71.29 | 8.79 | 108.92 | -45.87 | 0.31 | -- | -- | -- |
| 3 | 65.53 | 10.55 | 100.55 | -45.87 | 0.09 | -- | -- | -- |
| 4 | 72.36 | 10.55 | 108.92 | -45.87 | 1.54 | -- | -- | -- |
| 5 | 58.05 | 10.55 | 100.55 | -52.42 | 0.39 | -- | -- | -- |
| 6 | 63.80 | 8.79 | 108.92 | -52.42 | 2.22 | -- | -- | -- |
| 7 | 64.88 | 10.55 | 108.92 | -52.42 | 4.72 | -- | -- | -- |
| 8 | 64.46 | 8.79 | 100.55 | -45.87 | 0.98 | -- | -- | -- |
Table II. Test results of injection-molded parts.
Reduce the ± value by half (see Table III).
The new high and low values are as follows:
- Mold temperature: 115°C and 125°C (±5°C).
- Pressure: 510 MPa and 530 MPa (±10 MPa).
- Injection temperature: 135°C and 145°C (±5°C).
| Mold (°C) | Pressure (MPa) | Injection (°C) | Life (min) |
| 115 | 510 | 145 | 88.5 |
| 115 | 530 | 135 | 66.3 |
| 125 | 510 | 135 | 68.3 |
| 125 | 530 | 135 | 62.2 |
| 125 | 510 | 145 | 84.4 |
| 115 | 530 | 145 | 84.3 |
| 125 | 530 | 145 | 78.3 |
| 115 | 510 | 135 | 72.4 |
Table III. Test results after reducing the ± values by half.
These values produce the test results shown in
Tables IV and V.
| Life (min) | Mold (°C) | Pressure (MPa) | Injection (°C) | ± value |
| 72.36 | 120 | 520 | 140 | 10 |
| 88.46 | 115 | 510 | 145 | 5 |
Table IV. Test results at ±5 and ±10 to mold temperature, pressure, and injection
temperature.
| Mold (°C) | Pressure (MPa) | Injection (°C) | Life (min) |
| 112.5 | 505 | 147.5 | 80.1 |
| 112.5 | 515 | 142.5 | 98.6 |
| 117.5 | 505 | 142.5 | 79.1 |
| 117.5 | 515 | 142.5 | 91.0 |
| 117.5 | 505 | 147.5 | 72.5 |
| 112.5 | 515 | 147.5 | 91.9 |
| 117.5 | 515 | 147.5 | 84.3 |
| 112.5 | 505 | 142.5 | 86.7 |
Table V. Results of ± values of 2.5.
Reduce the ± value again by half and reduce the
step (see Table V). For values with new centers of 112.5°C, 515
MPa, and 142.5°C, see Table VI. Note that the ideal (based on the
perfect life) is 113°C, 515 MPa, and 145°C. Also note that
the mold temperature has a small range of acceptable values. The process
is not as sensitive to mold temperature as it is to pressure or injection
temperature (see Table VII).
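The first iteration of this search can be sketched in Python. The sketch assumes that the first value listed for each part in Table II is its life in minutes, and it simply takes the longest-lived run as the new center, which is a simplification of a full steepest-ascent calculation; the function name is illustrative.

```python
from itertools import product


def high_low_runs(center, delta):
    """All high/low combinations (2**k runs) around a center point."""
    runs = []
    for signs in product((-1, 1), repeat=len(center)):
        runs.append(tuple(c + s * d for c, s, d in zip(center, signs, delta)))
    return runs


# (mold temperature deg C, pressure MPa, injection temperature deg C)
center = (110, 500, 150)
delta = (10, 20, 10)

# Life in minutes for the eight parts of Table I, keyed by their settings
# (assumed to be the first value listed for each part in Table II).
lives = {
    (100, 480, 160): 56.97,  # Part 1: low, low, high
    (100, 520, 140): 71.29,  # Part 2: low, high, low
    (120, 480, 140): 65.53,  # Part 3: high, low, low
    (120, 520, 140): 72.36,  # Part 4: high, high, low
    (120, 480, 160): 58.05,  # Part 5: high, low, high
    (100, 520, 160): 63.80,  # Part 6: low, high, high
    (120, 520, 160): 64.88,  # Part 7: high, high, high
    (100, 480, 140): 64.46,  # Part 8: low, low, low
}

new_center = max(lives, key=lives.get)          # settings of the best run
new_delta = tuple(d / 2 for d in delta)         # halve the +/- values
next_runs = high_low_runs(new_center, new_delta)
print(new_center, new_delta)
for run in next_runs:
    print(run)
```

The runs generated around the new center use 115°C and 125°C, 510 MPa and 530 MPa, and 135°C and 145°C, which are the high and low values listed for the second iteration.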
The total testing time under FMVT (assuming four
samples at a time) is 16 hours. This time frame assumes that the ideal
part lasts two hours under test conditions. Bad parts should last one
hour, but a full two hours of testing should be planned. Traditional
testing of eight parts for a life of 500,000 cycles (at 6 cycles/min
at a given temperature) = 231 days of testing for the same round of
four iterations to find optimal conditions. However, if eight parts
are tested to 500,000 cycles and all pass (which they are very likely
to do after the first two iterations), it is difficult to identify which
process to use. If a part has a good design margin after the first iteration,
several combinations will make it to the 500,000 cycles. So, the time
needed for testing using the traditional cycle rate is indeterminate,
but it is more than 231 days.
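The 16-hour and 231-day figures can be reproduced with simple arithmetic. The sketch assumes that, in the traditional test, all eight parts of an iteration run in parallel around the clock; that assumption is not stated in the text but is what yields 231 days.

```python
ITERATIONS = 4
PARTS_PER_ITERATION = 8

# FMVT: four samples at a time, two hours planned per run.
fmvt_parts_at_once = 4
fmvt_hours_per_run = 2.0
fmvt_runs = ITERATIONS * PARTS_PER_ITERATION / fmvt_parts_at_once
fmvt_total_hours = fmvt_runs * fmvt_hours_per_run

# Traditional test: 500,000 cycles at 6 cycles per minute per iteration.
cycles_per_life = 500_000
cycles_per_minute = 6
minutes_per_day = 60 * 24
days_per_iteration = cycles_per_life / cycles_per_minute / minutes_per_day
traditional_days = ITERATIONS * days_per_iteration

print(f"FMVT: {fmvt_total_hours:.0f} hours")
print(f"Traditional: {traditional_days:.1f} days")
```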
Warranty and Life Expectancy
FMVT is useful for troubleshooting warranty issues.
A standard FMVT can be run on the product with the emphasis on identifying
and applying all possible stress sources. Once the standard FMVT is
run, two possibilities exist. If the warranty issue was reproduced,
troubleshooting can go to the next stage. Otherwise, a significant fact
has been established: The warranty issue is due to a stress source that
was not identified or applied. If this is the case, then the additional
stress sources must be identified, and FMVT must be run again.
| Mold (°C) | Pressure (MPa) | Injection (°C) | Life (min) |
| 111 | 513 | 143 | 100 |
| 111 | 517 | 141 | 84.1 |
| 113 | 513 | 141 | 90.6 |
| 113 | 517 | 141 | 84.1 |
| 113 | 513 | 143 | 100 |
| 111 | 517 | 143 | 93.4 |
| 113 | 517 | 143 | 93.4 |
| 111 | 513 | 141 | 90.6 |
Table VI. Values with new centers of 112.5°C, 515 MPa, and 142.5°C.
Once the warranty issue has been reproduced,
a narrower test of limited stresses and levels should be determined
that reproduces the warranty problem in a short period. A test that
produces the warranty failure mode on the current design in only a few
hours can usually be produced. This test can then be used to test design
solutions. Once a design solution is identified, a full FMVT should
be conducted again.
Using FMVT to estimate life requires the existence
of field data for a similar design. The design with field data is tested
first using a standard FMVT to establish the stress level at which a
hard failure first occurs (see Figure 7). The new design and the old
design (the design with field data) should then be tested side by side
at a stress level one or two steps above the level at which the first
hard failure occurred. The relative time-to-failure under the accelerated
conditions of the test is then used to calculate the estimated
time-to-failure for the new design.
|

|
Figure 7. A design with field data is tested using FMVT to establish the stress level at
which a hard failure first occurs.
|
A wide range of accelerated testing equipment
is available. The following is just a sampling of the types of equipment
available.
FMVT Machine. The FMVT machine (Entela)
is a patent-pending six-axis vibration machine designed to produce large
displacement as well as a relatively uniform distribution of vibration
energy.
Air Hammer Tables. Air hammer tables (Qualmark,
Envirotronics, Thermotron, SSI, and others) are patented six-axis vibration
machines designed to produce quasi-random high-frequency vibration.
Servo Hydraulic Machines. Servo hydraulic
machines (Team, Burke Porter, MTS, and others) are also six-axis vibration
machines. These machines are capable of producing low-frequency vibration
and feature a highly controllable vibration profile.
Chambers. Environmental chambers (Envirotronics,
Thermotron, Espec, and others) come in a variety of sizes, temperature
ranges, humidity controls, ramp rates, and control methods. A typical
chamber will have circulating fans, electric heaters, a refrigeration
system, and humidity control that operates from freezing up to the maximum chamber
temperature.
Other Equipment. Other vibration and cyclic
equipment is available that can provide custom and standard single-axis
vibration, six-axis vibration, and cyclic loading.
In general, the loading requirements of a particular
test will govern the equipment used. Several test methods require
custom cyclic-testing setups for a component that use one or more
of the machines described.
For each piece of equipment, performance parameters
can be used to determine which machine is most appropriate for a particular
test method and information goal. For the vibration machines,
the basic parameters are load, displacement range, frequency range,
and resonance characteristics of the design (see Table VIII).
| Distinguishing Parameters | FMVT Machine | Air Hammer | Servo Hydraulic |
| Load/displacement range | 100 lb max load, 60 G max, 4 in. displacement | 100 lb max load, 60 G max, <0.25 in. displacement | 2000 lb max load, 14 G max, 2 in. displacement |
| Resonance characteristics | 5-2500 Hz primary energy range (ramp down) | 500-10,000 Hz primary energy range (ramp up) | 2-200 Hz primary energy range (flat) |
Table VIII. Parameters for vibration machines.
Because accelerated testing covers a broad range
of methods and equipment, determining which method is best for a particular
application can be challenging. Selecting the correct equipment and
applying the appropriate method require an understanding of available
testing methods and how to use them. Understanding a product's failure
modes can be the first step in improved electronic design.
1. Wayne Nelson, Accelerated Testing: Statistical Models,
Test Plans, and Data Analyses (New York: Wiley-Interscience,
1990).
Alexander J. Porter is business development
manager with Entela Engineering and Testing Laboratories (Grand Rapids,
MI). He can be reached at aporter@entela.com.