Thursday, November 21, 2024

Strategies for Maintaining Optimal Cooling in Data Centers

Reliable power and efficient cooling are critical to data center operations, whether the facility belongs to a cloud hyperscaler, a commercial colocation provider, or an enterprise running its own site. Without adequate power and effective heat removal, servers simply cannot run.

The need for power and cost-effective space, combined with improved connectivity and automation, has allowed data center operators to move away from urban locations. In Europe, this shift has meant leaving business districts, while in the United States operators have gravitated towards states such as Arizona, Nevada, and Texas, where land is more affordable. Nevertheless, advances in computing technology and the rising demands of services such as artificial intelligence (AI) are radically transforming both the logistics and the economics of building and running data centers.

Regulatory pressure on power grids and water supplies on both sides of the Atlantic is constraining development. The demand for power is projected to continue escalating as operators aim to accommodate more equipment, engineers design more compact server packages, and applications increasingly rely on power-intensive graphics processing units (GPUs) and other specialized processors.

Moreover, the rapid expansion of artificial intelligence is adding further pressure. A 2023 study from researchers at the University of California, Riverside, found that ChatGPT’s large language model (LLM) consumes 500 milliliters of water, almost entirely for cooling, for every five to 50 queries it handles. And according to Alvin Nguyen, a senior analyst at Forrester, generating an image with an LLM uses roughly as much energy as a gasoline-powered car needs to travel one mile.

Such trends are challenging a traditionally cautious data center sector and prompting CIOs to explore alternative technologies. An enterprise or data center operator can do little to improve the power grid itself, but it can adopt more energy-efficient chips; even so, the overall trend is that power consumption in data centers will keep rising.

Tony Lock, distinguished analyst at Freeform Dynamics, indicates that the increasing digitization of business processes makes this rise in power unavoidable. Tasks once performed manually are transitioning to computerized solutions, with workloads shifting from offices or data rooms to data centers or cloud environments. “As the data center becomes responsible for a growing share of business service delivery, the onus falls on the data center manager to address rising electricity costs,” he states.

One way to improve both financial performance and operational efficiency relatively quickly is to upgrade cooling systems. More efficient cooling allows operators to pack equipment more densely into the data center. This matters especially for the GPUs used in AI applications: Nvidia’s Blackwell platform, for instance, requires liquid cooling, but the company claims it can run large models at up to 25 times lower cost and energy consumption than its predecessor. Industry consensus is that around 40% of a data center’s power consumption goes on cooling.
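To put that 40% figure in context, cooling overhead maps directly onto power usage effectiveness (PUE), the ratio of total facility power to IT power. The sketch below is a simplified, hypothetical illustration of that relationship: the cooling shares are assumed values, and all non-cooling power is treated as IT load, which real facilities never quite achieve.

```python
# Hypothetical illustration: how the share of facility power spent on cooling
# maps to PUE (power usage effectiveness = total facility power / IT power).
# Simplifying assumption: everything that is not cooling is IT load.

def pue_from_cooling_share(cooling_share: float) -> float:
    """PUE if cooling takes `cooling_share` of total power and the rest is IT load."""
    return 1.0 / (1.0 - cooling_share)

for share in (0.40, 0.30, 0.20, 0.10):
    print(f"cooling at {share:.0%} of facility power -> PUE ~ {pue_from_cooling_share(share):.2f}")

# cooling at 40% of facility power -> PUE ~ 1.67
# cooling at 30% of facility power -> PUE ~ 1.43
# cooling at 20% of facility power -> PUE ~ 1.25
# cooling at 10% of facility power -> PUE ~ 1.11
```

Cutting the cooling share is therefore one of the most direct levers an operator has on overall efficiency.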

Cooling Approaches

Traditionally, data centers use air circulation for cooling, relying on fans installed in racks and servers, in addition to computer room air-conditioning (CRAC) or computer room air handler (CRAH) systems to maintain ambient temperatures. These systems typically discharge waste heat outdoors, and if evaporative cooling is employed, it consumes both electricity and water.

The design and capacity of CRAC and CRAH units often dictate the layout and dimensions of the data center. As David Watkins, solutions director at Virtus Data Centres, explains, each unit is engineered to cool a specific kilowatt (kW) capacity and has a limited effective air distribution range, which guides designs for the building footprint and placement of racks.
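A hypothetical sizing sketch shows how that constraint shapes a hall; the load, per-unit capacity and redundancy figures below are assumptions for illustration, not vendor specifications.

```python
import math

# Hypothetical sizing sketch: how CRAC/CRAH unit capacity drives hall design.
# All figures are illustrative assumptions, not vendor specifications.
hall_it_load_kw = 1200    # total heat load from IT equipment in the hall
crac_capacity_kw = 100    # cooling capacity of a single CRAC/CRAH unit
spare_units = 1           # extra unit for N+1 redundancy

units_needed = math.ceil(hall_it_load_kw / crac_capacity_kw) + spare_units
print(f"CRAC/CRAH units required (N+1): {units_needed}")
# CRAC/CRAH units required (N+1): 13

# Each unit also has a limited effective throw distance for cold air, so the
# unit count and placement end up dictating rack rows and floor area.
```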

Raised-floor designs support this airflow: cool air is pushed into the void beneath the floor, rises through perforated tiles and up through the racks, and the warmed exhaust returns to the CRAC intakes to be cooled again.

To improve air cooling efficiency, data center engineers also use hot- and cold-aisle layouts. Separating incoming cool air from outgoing warm air improves the airflow cycle, stabilizes ambient temperatures across the facility, and makes for a more comfortable working environment for staff. Even so, air cooling is noisy and costly, and it loses effectiveness as compute density rises; beyond around 40kW per rack, Watkins notes, air alone can no longer cool effectively.
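A rough sensible-heat calculation illustrates why air runs out of headroom at these densities. The sketch below uses the standard relation power = air density × specific heat × airflow × temperature rise; the rack powers and the assumed 10°C air temperature rise are illustrative, not measured values.

```python
# Rough estimate of the airflow needed to cool a rack with air alone, using
# the sensible-heat relation P = rho * c_p * Q * dT. Rack powers and the
# 10 K (10 degC) temperature rise are illustrative assumptions.

RHO_AIR = 1.2       # air density, kg/m^3 (approximate, room conditions)
CP_AIR = 1005.0     # specific heat of air, J/(kg*K)
M3S_TO_CFM = 2118.88

def airflow_m3s(rack_power_w: float, delta_t_k: float) -> float:
    """Volumetric airflow (m^3/s) needed to carry rack_power_w away at a rise of delta_t_k."""
    return rack_power_w / (RHO_AIR * CP_AIR * delta_t_k)

for rack_kw in (10, 40, 100):
    q = airflow_m3s(rack_kw * 1000, delta_t_k=10.0)
    print(f"{rack_kw:>3} kW rack -> {q:.1f} m^3/s (~{q * M3S_TO_CFM:,.0f} CFM)")

#  10 kW rack -> 0.8 m^3/s (~1,757 CFM)
#  40 kW rack -> 3.3 m^3/s (~7,028 CFM)
# 100 kW rack -> 8.3 m^3/s (~17,569 CFM)
```

At 40kW the requirement is already around 7,000 CFM through a single rack, which helps explain why air alone stops being practical at about that level.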

Consequently, data center operators are compelled to explore alternative solutions.

Liquid Cooling: A Viable Alternative

Air cooling has evolved over decades and become more efficient, yet it is still seen as limited. As Steve Wallage, managing director of Danseb Consulting, points out, the improvements have been incremental and the basic approach remains largely unchanged. Innovations such as KyotoCooling’s thermal wheel have demonstrated substantial savings, reportedly up to 80% compared with conventional cooling, but they have never been widely adopted and remain niche.

Liquid cooling has emerged as a prominent alternative, particularly for high-performance computing (HPC) and AI systems. Many high-performance systems now come equipped with built-in liquid cooling, which, despite its complexity, offers superior efficiency and often requires less water compared to air cooling setups.

Liquid cooling options include direct-to-chip systems, immersion cooling, in which devices are submerged in a non-conductive dielectric fluid, and various rack-level approaches that typically use specialized coolants or oils rather than plain water. Immersion systems often require close collaboration with hardware manufacturers to ensure equipment can operate safely without damage. Direct-to-chip setups usually arrive preconfigured, so data center operators mainly need to connect them to supporting infrastructure such as heat exchangers or coolant distribution units.
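The appeal of liquid at high densities comes down to basic physics: water carries far more heat per unit volume than air. The sketch below compares the coolant flow needed to remove the same heat load with water, as in a direct-to-chip loop fed by a coolant distribution unit, against air; the 40kW load and 10°C temperature rise are assumptions for illustration.

```python
# Sketch: coolant flow needed to remove the same heat load with water versus
# air, using P = rho * c_p * Q * dT. The load and temperature rise are
# illustrative assumptions, not specifications for any particular system.

HEAT_LOAD_W = 40_000   # e.g. one dense AI rack
DELTA_T_K = 10.0       # assumed coolant temperature rise

AIR = {"rho": 1.2, "cp": 1005.0}        # kg/m^3, J/(kg*K)
WATER = {"rho": 998.0, "cp": 4186.0}

def vol_flow_m3s(fluid: dict) -> float:
    """Volumetric flow (m^3/s) needed to absorb HEAT_LOAD_W at a rise of DELTA_T_K."""
    return HEAT_LOAD_W / (fluid["rho"] * fluid["cp"] * DELTA_T_K)

q_air, q_water = vol_flow_m3s(AIR), vol_flow_m3s(WATER)
print(f"Air:   {q_air:.2f} m^3/s")
print(f"Water: {q_water * 1000 * 60:.0f} litres/min")
print(f"Volume ratio (air/water): ~{q_air / q_water:,.0f}x")

# Air:   3.32 m^3/s
# Water: 57 litres/min
# Volume ratio (air/water): ~3,464x
```

A few tens of litres of water per minute does the work of several cubic metres of air per second, which is why pipework to the chip can support rack densities that air cannot.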

“There are existing technologies that leverage direct liquid cooling,” says Nguyen at Forrester, adding that space constraints can limit how densely these systems can be deployed, because they need pipework run to every heat-producing chip. Simpler options, such as rear-door or in-row cooling units, are easier to retrofit into existing facilities and avoid direct contact with the chips, reducing risk while still delivering good performance.

An Integrated Cooling Future

Experts say that while liquid cooling is a significant step forward, it cannot wholly displace air cooling in data centers yet. Alistair Barnes, head of mechanical engineering at Colt Data Centre Services, points out that air is still needed to carry away the heat that liquid cooling extracts from the chips. He advocates a hybrid approach in which the two methods work together to optimize power efficiency.

The challenges of adopting liquid cooling, such as heavier racks that may exceed a building's floor-loading limits, are prompting operators to explore still more unconventional options. Some data centers in cooler climates rely successfully on free-air cooling, while others have moved into salt mines, where humidity is minimal. Others are putting waste heat to work, for example by heating municipal buildings or swimming pools.

Not every data center can make radical changes or relocate somewhere unusual, but CIOs can still invest strategically in advanced cooling technologies and ask whether their cloud and colocation providers are using more economical and environmentally friendly cooling methods.