Data centers are critical infrastructure for our increasingly connected world. However, the environments in which they operate can be harsh on the equipment inside them. Heat, humidity and dust can lead to system failures, downtime and data loss, which reinforces the need for thorough environmental testing.
Because data centers operate around the clock, the components that keep them running must be tested for reliability over long periods of time, a difficult requirement for continuous-duty devices with multi-year lifespans. By exposing components to conditions more extreme than those they would experience in typical operation, engineers can perform Accelerated Life Testing (ALT) to determine more quickly when a part will fail and to better optimize it for its intended environment.
But even ALT has its challenges. In data centers, air cooling has traditionally defined the operating environment. However, liquid immersion is emerging as a preferred cooling technique, and current standards and test methods do not address the unique variables it introduces.
How can today’s system architects and design engineers optimize their devices for long-term reliability? ALT is a great place to start.
What is Accelerated Life Testing?
Accelerated Life Testing (ALT) is the process of subjecting products or components to extreme conditions outside standard operating parameters to artificially age the item under test, identify faults and predict performance under normal operation. Typical stress factors include thermal cycling, humidity, shock and vibration, among others. In a data center, where systems and devices often operate continuously for extended periods, testing under normal conditions could take years. ALT expedites the process, allowing a manufacturer to significantly reduce testing time, accelerate product development and determine overall product lifespan.
Types of ALT
Although classifications vary, ALT can generally be divided into two categories, quantitative and qualitative, each containing a multitude of test types.
Quantitative ALT Methods
In quantitative ALT, the goal is to predict the lifespan of a device by shortening its time-to-failure and to produce data that measure its reliability under a specific stress. Typically, this is done using one of two general approaches:
Overstress acceleration – This is the preferred method for continuously operating or very high-usage products: the item under test is exposed to stresses exceeding those of normal use. For instance, a product or component may be exposed to very high temperatures on the assumption that extreme temperature exposure over a shortened period accurately simulates normal temperature exposure over the expected lifespan. Similar tests can be done for factors like humidity and vibration. Because data centers operate continuously, overstress acceleration tests are critical.
Usage rate acceleration – For products that do not operate continuously, these tests simulate failure more quickly by performing a function at a faster or more frequent rate. For instance, connectors are tested to determine their mating cycles, the number of times a connector can be connected and disconnected while still meeting performance specifications. To expedite the test, the connect and disconnect process can be performed more rapidly, provided the mechanical forces involved remain the same as under normal operating conditions and only the frequency changes.
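Both approaches come down to an acceleration factor: how much normal use one unit of test time represents. The article does not specify a model, so the following is only a sketch under stated assumptions. For thermal overstress it uses the Arrhenius relation, a widely used model for temperature-driven failure mechanisms; for usage rate acceleration the factor is simply the ratio of test cycling rate to field cycling rate. The function names, the 0.7 eV activation energy, and the temperatures and cycle rates below are hypothetical, not Molex or EIA-364 values.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617e-5


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Overstress acceleration factor from the Arrhenius model.

    ea_ev is the activation energy of the assumed failure mechanism. It is an
    assumption here; in practice it must be derived for that specific mechanism.
    The model only holds if the stress temperature triggers the same failure
    mechanism as normal use.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def usage_rate_af(cycles_per_day_use: float, cycles_per_day_test: float) -> float:
    """Usage rate acceleration factor: only frequency changes, so it is the rate ratio.

    Valid only if the forces and other conditions per cycle match normal operation.
    """
    return cycles_per_day_test / cycles_per_day_use


def equivalent_use_hours(test_hours: float, acceleration_factor: float) -> float:
    """Hours of normal operation represented by a test of the given length."""
    return test_hours * acceleration_factor


if __name__ == "__main__":
    # Hypothetical thermal case: 40 °C in service, 85 °C in test, Ea = 0.7 eV (assumed).
    af_thermal = arrhenius_af(0.7, 40.0, 85.0)
    years = equivalent_use_hours(1000.0, af_thermal) / 8760.0
    print(f"thermal AF ~ {af_thermal:.1f}; 1,000 test hours ~ {years:.1f} service years")

    # Hypothetical connector case: mated twice a day in service, 480 times a day in test.
    print(f"usage AF = {usage_rate_af(2.0, 480.0):.0f}")
```

With these assumed inputs the thermal factor comes out at roughly 26, so about 1,000 hours at 85 °C would stand in for roughly three years at 40 °C. Because the result is exponential in the activation energy, that parameter should come from data for the specific failure mechanism rather than a default value.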
Qualitative ALT Methods
Where quantitative ALT produces data showing how long a product can perform under specific stresses, qualitative ALT identifies the cause of failure and is often performed on a smaller sample. Qualitative ALT tests vary but may include:
Highly Accelerated Life Testing (HALT) – In HALT, a product is subjected to a variety of simultaneous and independent stresses, such as temperature and vibration, to identify where and why a failure occurs. Although the stresses may be the same as or similar to those in quantitative ALT, the goal of HALT is not to measure how long a product performs but to identify how it fails.
Highly Accelerated Stress Screen (HASS) – After HALT is complete and the design is finalized, HASS can act as a final test to ensure reliability at the start of manufacturing. Although HASS exposes the product under test to the same types of stresses as HALT, it is used specifically as part of the production screening process.
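HALT is commonly run as a step-stress test: one stress is raised in fixed increments until the unit stops working (its operating limit) and then until it is permanently damaged (its destruct limit). The article does not describe the procedure, so the following is a minimal sketch under that assumption that steps only hot temperature. HypotheticalUnit stands in for real hardware and its limits are invented; real HALT also applies stresses such as vibration and rapid thermal transitions, and determining why each failure happened still requires failure analysis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HypotheticalUnit:
    """Stand-in for a unit under test; its limits are invented, illustrative values."""
    operating_limit_c: float  # above this the unit malfunctions but recovers when cooled
    destruct_limit_c: float   # above this the unit is permanently damaged

    def functions_at(self, temp_c: float) -> bool:
        return temp_c <= self.operating_limit_c

    def destroyed_at(self, temp_c: float) -> bool:
        return temp_c > self.destruct_limit_c


def hot_step_stress(
    unit: HypotheticalUnit, start_c: float, step_c: float, max_c: float
) -> Tuple[Optional[float], Optional[float]]:
    """Raise temperature in fixed steps up to max_c.

    Returns (first step at which the unit stopped functioning,
             first step at which it was destroyed); None means that limit
    was not reached before max_c.
    """
    operating_fail_c: Optional[float] = None
    temp_c = start_c
    while temp_c <= max_c:
        if operating_fail_c is None and not unit.functions_at(temp_c):
            # The operating limit lies between this step and the previous one.
            operating_fail_c = temp_c
        if unit.destroyed_at(temp_c):
            # Stop: a destroyed unit cannot be stressed further.
            return operating_fail_c, temp_c
        temp_c += step_c
    return operating_fail_c, None


if __name__ == "__main__":
    unit = HypotheticalUnit(operating_limit_c=95.0, destruct_limit_c=120.0)
    print(hot_step_stress(unit, start_c=25.0, step_c=10.0, max_c=200.0))
```

With the invented limits above, the search reports (105.0, 125.0): the operating limit lies between 95 °C and 105 °C and the destruct limit between 115 °C and 125 °C. Smaller steps narrow those brackets at the cost of more test time.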
Variations of qualitative ALT tests include shake-and-bake testing, torture tests and elephant tests.
ALT for Connectors: EIA-364
The EIA-364 Electrical Connector/Socket Test Procedures Including Environmental Classifications standard establishes recommended minimum test sequences and procedures for electrical connectors and sockets, including ALT. Each test procedure in the EIA-364 series assesses a specific criterion, such as mating and unmating force (EIA-364-13), humidity (EIA-364-31), durability (EIA-364-09) or thermal cycling (EIA-364-110). Together, these procedures serve as a baseline for connector performance based on the environments in which the connectors will be deployed.
For data center equipment, EIA-364-1000, Environmental Test Methodology for Assessing the Performance of Electrical Connectors and Sockets Used in Controlled Environment Applications, is particularly applicable. Originally designed for business office applications, EIA-364-1000 covers relatively mild, controlled-environment use, such as devices within data centers.
Although EIA-364 tests are recommendations rather than requirements, they have become an industry standard and serve as the ALT guidelines for many manufacturers.
Challenges of ALT in Liquid Environments
While EIA-364 and other ALT standards provide clear reliability guidelines for traditional air environments, ALT for components used in liquid immersion cooling is much less defined. Complicating matters, more than a dozen proprietary dielectric liquids are already on the market, each performing differently. Does that mean a manufacturer needs to perform ALT for 12+ liquids in addition to air? Will different products need to be manufactured for each medium?
The Open Compute Project (OCP) Immersion Project, a working group devoted to liquid immersion cooling, draws on input and insights from industry experts to answer these questions and more. While air cooling has been the traditional method of lowering server temperatures in data centers, immersion cooling has proven more energy-efficient and cost-effective while requiring less space. Through the Immersion Project, OCP is working to establish standardized definitions, specifications, compatibility requirements and best practices for both immersion solutions and immersion-ready equipment.
Ideally, with guidance from organizations like OCP, manufacturers will be able to design one product that performs reliably across all liquid and air-cooled environments. For system architects and design engineers, this would simplify the bill of materials (BOM) and minimize the risk of confusion and error. And engineers are eager to get ahead of the game. In fact, a recent Molex Reliability and Hardware Design Survey found that 51% of the 756 respondents already strive to meet possible future industry reliability certifications and standards in addition to current requirements.
Molex is Paving the Way to More Reliable Data Centers
As an industry pioneer in high-speed data center applications, Molex invests heavily in ALT capabilities and is an active contributor to OCP, including the Immersion Project. We’re committed to ensuring reliable data center performance no matter the medium, and our broad portfolio of interconnect solutions is designed to meet current and evolving EIA-364 and OCP guidelines.
For more insights, explore the results of the Molex Reliability and Hardware Design Survey.