Study by Coolcentric and Purdue University Demonstrates Reliability and Availability of a Liquid Cooled Data Center

When Purdue University needed a retrofit for its HPC data center growth and densification, it turned to liquid cooling as the most performance- and cost-efficient solution. Liquid cooling at Purdue is provided by passive rack door heat exchangers, fed and controlled by Coolant Distribution Units (CDUs). Each CDU can effectively remove sensible heat from several 42U racks. The use of CDU technology represents a distributed heat removal paradigm, one that requires less energy than traditional computer room air conditioning methods.

A new white paper demonstrates that thoughtful configuration of rear door heat exchangers and CDUs provides reliable cooling and continued availability for the data center. A section of Purdue’s data center is designed so that racks are cooled by multiple CDUs and their accompanying rack door heat exchangers. The racks are interleaved into groups, with each group cooled by several CDUs, so that adjacent racks are not cooled by the same CDU. This design is intended to reduce "hot spots" in the room by spreading the heat over a wider area: if a CDU fails, the racks that have lost cooling sit next to racks that are still cooled, which can aid their overheated neighbors.
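The interleaving idea can be illustrated with a minimal sketch. It assumes a single row of racks and a simple round-robin assignment; the function names, rack count, and CDU count are hypothetical and are not taken from the white paper or from Purdue's actual layout.

```python
def assign_cdus(num_racks, num_cdus):
    """Round-robin assignment: rack i is served by CDU (i mod num_cdus)."""
    if num_cdus < 2:
        raise ValueError("interleaving needs at least two CDUs")
    return [rack % num_cdus for rack in range(num_racks)]


def adjacent_share_cdu(assignment):
    """Return True if any two neighboring racks are served by the same CDU."""
    return any(a == b for a, b in zip(assignment, assignment[1:]))


def racks_without_cooling(assignment, failed_cdu):
    """List the rack positions that lose door cooling when one CDU fails."""
    return [rack for rack, cdu in enumerate(assignment) if cdu == failed_cdu]


# Hypothetical row of 8 racks served by 2 CDUs.
layout = assign_cdus(num_racks=8, num_cdus=2)
print(layout)                                       # [0, 1, 0, 1, 0, 1, 0, 1]
print(adjacent_share_cdu(layout))                   # False
print(racks_without_cooling(layout, failed_cdu=0))  # [0, 2, 4, 6]
```

In this illustrative layout, every rack that loses cooling has at least one neighbor whose door is still cooled, which is the property the interleaved design relies on.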

To test whether the implementation had the redundancy needed for high-availability applications, Purdue and Coolcentric conducted a study that examined whether, in the event of a CDU failure, the remaining CDUs could provide reliable cooling and continued availability for the data center. By simulating the failure of a single CDU in a live HPC data center, the team was able to study the reactions of the remaining CDUs, the effects on the racks that lost cooling, and the effects on the racks still served by the remaining CDUs. The failure condition consisted of denying cooling water to half of the doors in the data center. Two recovery conditions were tested: (1) using a computer room air conditioning (CRAC) unit as backup, and (2) increasing water flow in the active (non-failed) CDU, thereby lowering the water temperature to compensate for the loss of cooling. In both scenarios, temperatures in the hot and cold aisles remained stable, as did the IT junction temperatures.
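As a rough illustration of why more flow or colder supply water increases the heat a door can carry away, the standard water-side sensible heat balance, Q = m_dot * c_p * (T_return - T_supply), can be sketched as follows. All flow rates and temperatures below are hypothetical values chosen for illustration, not measurements from the study, and the sketch makes the simplifying assumption that the return temperature stays fixed.

```python
WATER_CP_KJ_PER_KG_K = 4.18  # approximate specific heat of liquid water


def water_side_heat_kw(flow_kg_per_s, supply_c, return_c):
    """Sensible heat carried away by the water loop, in kW.

    Q = m_dot * c_p * (T_return - T_supply)
    """
    return flow_kg_per_s * WATER_CP_KJ_PER_KG_K * (return_c - supply_c)


# Hypothetical operating points (not from the study).
baseline = water_side_heat_kw(flow_kg_per_s=1.0, supply_c=18.0, return_c=24.0)
more_flow = water_side_heat_kw(flow_kg_per_s=1.5, supply_c=18.0, return_c=24.0)
colder_supply = water_side_heat_kw(flow_kg_per_s=1.0, supply_c=15.0, return_c=24.0)

print(f"baseline:      {baseline:.1f} kW")
print(f"more flow:     {more_flow:.1f} kW")
print(f"colder supply: {colder_supply:.1f} kW")
```

Under these assumptions, raising the flow rate and lowering the supply temperature both increase the heat removed on the water side. A real door's capacity also depends on air-side conditions, which this sketch does not model.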

The study shows that passive rear door heat exchangers can supply a significant amount of cooling in a modern compute environment. The impact of an individual component failure can be minimized by distributing the heat exchangers across a number of racks in the data center, so that any single CDU serves racks scattered over a wide area. Cooling can be maintained in the presence of a failure either with a small amount of supplementary cooling or by delivering more cooling from the non-failed CDU by lowering the door water supply temperature.