AI Makes Liquid Cooling Normal – And Necessary – Again

All trends with AI point up and to the right, and usually pretty sharply: the size of AI workloads, the number of AI companies popping up, and the size and complexity of AI systems, to name just a few.

The need to drive performance while keeping latencies between components low is pushing compute density ever higher. As a consequence, the amount of power is going way up while the space it has to be pushed into – and extracted out of – keeps going down. It might have been a shock to hear about Microsoft’s plans to go nuclear – via a currently shut-down reactor at Three Mile Island in Pennsylvania, no less – but really it shouldn’t have come as a surprise.

In a report earlier this year, Goldman Sachs analysts predicted that AI workloads – coupled with slowing efficiency gains – will fuel 160 percent growth in datacenter power demand by 2030, pushing the facilities’ share of global power consumption from between 1 percent and 2 percent now to between 3 percent and 4 percent by then. It is going to be an ongoing problem for everything from power consumption to the environment to costs for enterprises and cloud infrastructure providers, particularly as business adoption of AI skyrockets. A McKinsey survey in May found that 65 percent of respondents said their organizations regularly use generative AI, almost twice the number from a similar survey ten months earlier.
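
For a rough sense of what that projection implies, here is a quick back-of-envelope check. The 1.5 percent base share below is an assumed midpoint of the 1 percent to 2 percent range, and the math assumes total global power consumption stays roughly flat – both simplifications, not figures from the report:

```python
# Sanity check of the Goldman Sachs projection. The base share is an
# assumed midpoint of the 1-2 percent range; global consumption is
# assumed roughly flat. Both are simplifications for illustration.
current_share = 0.015          # datacenters' assumed share of global power today
demand_growth = 1.60           # 160 percent more datacenter demand by 2030
projected_share = current_share * (1 + demand_growth)
print(f"Projected 2030 share: {projected_share:.1%}")  # ~3.9%, inside the 3-4% band
```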

Given that, it’s not surprising that system makers are looking at more efficient ways to manage the thermal demands that come with AI and the increasingly dense systems – including those adopting components like Nvidia’s power-hungry “Blackwell” B100 and B200 GPU accelerators – that run them. That includes pushing liquid cooling for rack systems. Liquid cooling isn’t new; it has been offered for years. Now, however, the technology is taking on a greater urgency.

For many who remember the IBM mainframes of the 1960s, 1970s, and 1980s and Cray supercomputers from the 1970s and 1980s, this liquid cooling is a bit of a flashback. Get used to it.

In March, Nvidia rolled out two new versions of its DGX supercomputer, including the powerful – and liquid-cooled – SuperPOD, which is powered by Nvidia’s massive GB200 CPU-GPU complex. Over the past several weeks, Lenovo, Hewlett Packard Enterprise, Dell Technologies, and Supermicro all have pushed liquid cooling in servers designed for the new AI era.

“There are some unique requirements that we are seeing from these AI solutions,” Arunkumar Narayanan, senior vice president of compute and networking portfolio management at Dell, said during a briefing earlier this month, noting that with AI, the system is no longer a server. “It is a rack scale solution at a rack level, or even a collection of racks. That’s what the offer is changing to. In order to enable these offers, the thermal density of GPUs and CPUs are growing.”

Narayanan pointed to AMD’s new “Turin” Zen 5 chips coming in at 500 watts, adding that “we are getting to GPUs of wattages of 1 kilowatt, 1.2 kilowatts, 1.5 kilowatts. All of this is going to need a high degree of thermal efficiency. We need to integrate liquid cooling. That’s going to be a critical capability as we go forward.”

This week at the OCP Global Summit 2024 in San Jose, California, Dell announced the Integrated Rack 7000, a highly dense OCP-standard rack designed for liquid cooling and for the higher densities of GPUs and CPUs. One of the systems that can be housed in the new rack – which will be available in the first quarter of next year – will be the PowerEdge XE9712, aimed at both large language model (LLM) training and real-time inferencing for large-scale AI workloads. The system will be powered by the liquid-cooled GB200 NVL72, which connects up to 36 Nvidia Grace CPUs and 72 Blackwell GPUs via NVLink. Nvidia claims the liquid cooling helps make the GB200 NVL72 25 times more energy efficient than systems powered by its H100 GPUs.
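
Some quick, hedged math shows why a rack like that needs liquid. The per-device wattages in the sketch below are assumptions drawn from the GPU and CPU ranges Narayanan cites above, not published Nvidia specifications:

```python
# Rough estimate of the thermal load in a GB200 NVL72-class rack.
# Per-device wattages are assumptions based on the ~1-1.5 kW GPU and
# ~500 watt CPU figures cited earlier, not Nvidia specifications.
GPUS, CPUS = 72, 36
gpu_watts, cpu_watts = 1200, 500   # assumed per-device draw
compute_kw = (GPUS * gpu_watts + CPUS * cpu_watts) / 1000
overhead = 1.15                    # assumed 15% for switches and power conversion loss
rack_kw = compute_kw * overhead
print(f"Compute: {compute_kw:.0f} kW, estimated rack load: {rack_kw:.0f} kW")
# Roughly 104 kW of compute and 120 kW per rack -- several times what a
# typical air-cooled rack can shed, hence the move to direct liquid cooling.
```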

At its Tech World event this week in Seattle, Lenovo unveiled the ThinkSystem SC777 V4 Neptune, a system powered by Nvidia’s GB200s for running trillion-parameter AI models, built around the system maker’s new ThinkSystem N1380 Neptune chassis, the sixth generation of its liquid cooling technology. With the new Neptune system, which Lenovo says can cut power consumption by as much as 40 percent, datacenters with server racks consuming 100 kW or more don’t need air conditioning units to keep cool.

It will help enterprises that want to move into a hybrid AI model, where some AI workloads are run in the cloud but others can be done on premises, according to Flynn Maloy, chief marketing officer for Lenovo’s Infrastructure Solutions Group. The power in the datacenter is being used for the workloads, not for cooling, Maloy said during a briefing before the show.

The latest version of Neptune comes with a new vertical design and a new chassis built for industry-standard 19-inch racks, he said. It uses open-loop, direct warm-water cooling and includes eight trays, four 15 kW Titanium power conversion stations, and a redesigned water flow distribution system with an integrated manifold, a patented blind-mate mechanism, and aerospace-grade dripless connectors to the compute, according to the company.

“Most servers look like a bunch of pizza boxes that are stacked vertically,” Maloy said. “We’ve … turned those pizza boxes vertical so that the nodes are vertical. This basically allows gravity and pressure and allows more heat elimination through the process. We redesigned the backplane of the entire technology. This allows 100 percent heat removal.”

He said the engineering done by Lenovo supports “the highest accelerated computing available. When you reduce the power footprint like this, you can get more performance out of the footprint of your computing across the board.”

The Dell and Lenovo systems come after HPE and Supermicro made similar announcements. At its AI Day last week, HPE unveiled a completely fanless direct liquid cooling (DLC) architecture for AI systems, which CEO Antonio Neri boasted delivers a 90 percent improvement in cooling power consumption over traditional air-cooled systems. It’s an eight-element design that includes liquid cooling not only for the GPUs and CPUs but also the server blade, local storage, network fabric, rack cabinet, cluster, and coolant distribution unit.

This is needed given the huge increases in both transistor density (5x) and power consumption (33x) between 2007 and 2024. The company first used the 100 percent fanless DLC for its Cray EX supercomputers, with the architecture pumping “cooling fluid through cold plates through the GPUs, CPUs, memory, and rectifiers eliminating the need for fans. Switches and interconnects in these systems are also water cooled,” according to HPE documentation. “Why does it work so well? When you compare equal volumes of fluid and gaseous air, fluid has more than 3,000 times the cooling capacity when compared to air.”
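
That 3,000-times figure holds up against textbook physics. The sketch below compares the volumetric heat capacities of water and air at roughly room temperature; the exact ratio shifts with temperature and pressure, so treat it as an approximation:

```python
# Sanity check of the "more than 3,000 times" claim using textbook
# volumetric heat capacities (density * specific heat) at ~20C.
water_rho, water_cp = 1000.0, 4186.0   # kg/m^3, J/(kg*K)
air_rho, air_cp = 1.2, 1005.0          # kg/m^3, J/(kg*K)
ratio = (water_rho * water_cp) / (air_rho * air_cp)
print(f"Water absorbs ~{ratio:,.0f}x more heat per unit volume than air")
# Roughly 3,500x: for the same volume and temperature rise, water carries
# thousands of times the heat that air does, in line with HPE's claim.
```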

Days earlier, Supermicro unveiled a liquid cooling solution that includes coolant distribution units, cold plates, coolant distribution manifolds, cooling towers, and management software. The in-rack coolant distribution units deliver 250 kW of cooling capacity and have hot-swappable pumps and power supplies, the cold plates let liquid flow through microchannels to dissipate up to 1,600 watts apiece, and the distribution manifolds support a per-rack GPU density of up to 96 Nvidia B200s. The SuperCloud Composer software monitors the systems, racks, cooling towers, and other components.
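
Those figures hang together on a quick check. The 1,000 watt B200 draw below is an assumption in line with the roughly 1 kilowatt GPU wattages cited earlier in the piece, not an official specification:

```python
# Rough consistency check of Supermicro's numbers: a rack of 96 B200s
# against a 250 kW coolant distribution unit. The per-GPU wattage is an
# assumption, not an official specification.
gpus_per_rack = 96
gpu_watts = 1000                   # assumed B200 draw
cold_plate_watts = 1600            # per-plate dissipation figure from Supermicro
gpu_load_kw = gpus_per_rack * gpu_watts / 1000
print(f"GPU heat: {gpu_load_kw:.0f} kW of a 250 kW CDU budget")
print(f"Cold plate headroom: {cold_plate_watts - gpu_watts} W per GPU")
# About 96 kW of GPU heat leaves CDU headroom for CPUs, memory, and power
# conversion losses, and each 1,600 W cold plate covers a single GPU.
```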

In all, Supermicro says the solution saves 40 percent in infrastructure energy and 80 percent in datacenter space by removing the need for air conditioning units.
