Pushing AI System Cooling To The Limits Without Immersion

Here is a question for you. What is harder to get right now: 1,665 of Nvidia's "Blackwell" B200 GPU compute engines, or 10 megawatts of power on a four-year contract in the Northeast region of the United States?

Without question, it is the latter, not the former, and both will cost on the order of $66 million.

The fun bit is that those GPUs, configured as GB200 NVL72 rackscale systems, will probably actually take 13.4 megawatts of juice to operate. And if you don't need a rackscale coherent memory domain for the GPUs because you are using the machinery for AI training (which operates at a scale of tens of thousands of GPUs) rather than inference, you will burn about the same power, but you can do it with twice as much space and half the power density.
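
For those who want to see the arithmetic, here is a minimal sketch of what those two $66 million figures imply; the per-GPU price and the all-in electricity rate are back-calculated from the numbers above, not quoted prices:

```python
# Back-of-envelope arithmetic behind the figures above. Both the per-GPU
# price and the electricity rate are implied values derived from the
# article's numbers, not quoted prices.

GPUS = 1_665                # B200 GPUs in the comparison
BUDGET = 66_000_000         # dollars, for either the GPUs or the power
POWER_MW = 10               # size of the power contract
CONTRACT_YEARS = 4
HOURS_PER_YEAR = 8_760

implied_gpu_price = BUDGET / GPUS
energy_kwh = POWER_MW * 1_000 * CONTRACT_YEARS * HOURS_PER_YEAR
implied_rate = BUDGET / energy_kwh

print(f"Implied price per B200: ${implied_gpu_price:,.0f}")          # ~$39,640
print(f"Energy over the contract: {energy_kwh / 1e6:,.1f} GWh")      # ~350.4 GWh
print(f"Implied all-in electricity rate: ${implied_rate:.3f}/kWh")   # ~$0.188/kWh
```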

Here is another fun bit about modern AI datacenters: Nvidia will not sell you the GPUs until you can prove that you have the power allocated to you, and in a datacenter that is designed to handle the density of the system. And the word on the street last week, when we spoke at a conference at the NASDAQ exchange in New York City focused on AI in the financial services industry, was that power companies are now trying to stretch their gigawatts of power generation and are increasingly looking at how you are distributing power and doing the cooling in an AI datacenter before they make their allocations.

Increasingly, if you can’t prove you are using the power wisely, you don’t get it, or you don’t get as much as you want.

Add to all of this the fact that compute density is necessary in an AI system running chain of thought models, because these require coherent memory links between GPUs with super-low latency for AI inference, and we are in a situation where direct liquid cooling is not just inevitable in the future, but absolutely necessary right now. A lot of datacenters are not used to it, and those that were, way back in the IBM System/360 and System/370 mainframe days five and six decades ago, have not had liquid cooled iron on their floors in a long time.

Which is why companies like Supermicro have to push the envelope on direct liquid cooling for their GPU-accelerated systems.

“All of the customers that we talk to are thinking in terms of how many GPUs can they power and cool per megawatt,” Michael McNerney, senior vice president of marketing and network security at Supermicro, tells The Next Platform. “They tell us how many megawatts, and they want the maximum number of GPUs possible. The conversations are about GPU density and GPUs per megawatt, and it is not about how much money they can save on power but getting more GPUs to throw at the AI workload.”
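
To put a number on that metric using only the figures from the top of this story, here is a trivial sketch of GPUs per megawatt for the GB200 NVL72 example:

```python
# GPUs per megawatt, using only the figures from the top of this story:
# 1,665 B200s in GB200 NVL72 racks drawing roughly 13.4 MW all-in.

GPUS = 1_665
MEGAWATTS = 13.4
CONTRACT_MW = 10

density = GPUS / MEGAWATTS
print(f"~{density:.0f} GPUs per megawatt")                                        # about 124
print(f"A {CONTRACT_MW} MW contract supports ~{density * CONTRACT_MW:.0f} GPUs")  # about 1,243
```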

Supermicro developed its first generation of direct liquid cooling, with cold plates on the CPUs and GPUs, for its eight-GPU servers based on Nvidia's "Hopper" H100 GPUs in the fall of 2023, when it first became apparent that some of the cooling techniques that had been used for years in HPC systems needed to go mainstream in AI systems. Supermicro designed and manufactured the whole DLC system, including the cold plates, the coolant distribution units (CDUs) in the racks, and the chillers that provide cool water back to the equipment in the racks.

Notably, half of the “Colossus” system at xAI – comprising a total of 50,000 H100 GPUs – in its datacenter in Memphis was built by Supermicro using its DLC-1 technology. The other half of the system (with another 50,000 H100s) was built by Dell and is only air-cooled.

Those nodes in the Colossus machines have a pair of CPUs as well as eight of the H100 GPUs. The server nodes also have eight ConnectX-7 network interface cards (one for each GPU), a pair of lower-speed Ethernet interface cards for system management, PCI-Express switches for linking the GPU complex to the CPUs and the on-node storage, and a number of other components. The DLC-1 system used water at 30 degrees Celsius and could remove somewhere north of 70 percent of the heat from the system, which was a big improvement in efficiency and power savings. The CDUs in the DLC-1 setup were rated at 100 kilowatts.

But given the dearth of power out there around the globe, and its expense, Supermicro pushed harder with the DLC-2 liquid cooling system announced this week and debuting with the Blackwell B200 GPU nodes.

Here is what one of these new 4U nodes with the DLC-2 cooling looks like:

Technically, using the Supermicro naming convention, the machine above is the SYS-422GS-NBRT-LCC. The CDUs are more efficient and can deliver 250 kilowatts of cooling capacity, and importantly they can run on liquid as warm as 45 degrees Celsius, which means the loop can be cooled with outside cooling towers instead of chillers, cutting back on overall power requirements.
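
For a rough sense of what a 250 kilowatt CDU implies on the liquid side, here is a minimal sketch using the basic Q = m * c * delta-T relation; the 10 degree Celsius temperature rise across the loop is our assumption for illustration, not a Supermicro specification:

```python
# Rough coolant flow needed to carry away 250 kW of heat, using
# Q = m_dot * c * delta_T. The 10 C rise across the loop is an assumed
# value for illustration; the specific heat and density are for water.

Q_WATTS = 250_000       # CDU cooling capacity cited above
DELTA_T_C = 10.0        # assumed coolant temperature rise
C_WATER = 4186.0        # J/(kg*K)
KG_PER_LITER = 1.0      # near enough for warm water

mass_flow_kg_s = Q_WATTS / (C_WATER * DELTA_T_C)
liters_per_minute = mass_flow_kg_s / KG_PER_LITER * 60

print(f"Coolant flow: {mass_flow_kg_s:.1f} kg/s (~{liters_per_minute:.0f} liters per minute)")
```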

With the DLC-2 setup in the B200 HGX SuperServer, not only do the pair of Intel Xeon 6 CPUs and the eight Blackwell B200 GPUs have cold plates, but the main memory DIMMs, the PCI-Express switches in the node, the power supplies, and the voltage regulators are all equipped with cold plates to remove their heat directly, too.

And with the HGX B300 systems that Supermicro will ship later this year, the ConnectX-7 and later network interface cards will also be liquid cooled, and thus around 98 percent of the heat generated by the system will be removed by liquid, not air. The SuperServer B300 node, in fact, will have only two small fans, and it won't make much noise at all.
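
To see why pushing the share of heat captured by liquid from around 70 percent with DLC-1 to around 98 percent with DLC-2 on the B300 systems matters, here is a quick sketch of what is left for room air handling; the 100 kilowatt rack is an illustrative round number, not a Supermicro figure:

```python
# Heat left for room air handling at the liquid-capture ratios cited in
# this story. The 100 kW rack is an illustrative round number.

RACK_KW = 100  # assumed rack IT load

for label, liquid_share in [("DLC-1, ~70% to liquid", 0.70),
                            ("DLC-2 on B300, ~98% to liquid", 0.98)]:
    to_air_kw = RACK_KW * (1 - liquid_share)
    print(f"{label}: {to_air_kw:.0f} kW still has to be moved by air")
```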

The upshot of this is that the GPU systems using the DLC-2 cooling will take 40 percent less power to cool than the completely air-cooled HGX H100 systems from only two years ago. The power usage effectiveness (PUE) of the racks using the DLC-2 setup will also be driven very low. A normal, legacy rack in an enterprise datacenter has a PUE of 1.6 to 2.0, which means the datacenter burns 1.6X to 2X the power of the computational gear doing the work, with the extra power mostly going to cooling. With DLC-1, the Supermicro racks were down to about 1.2 for a PUE, and the target for DLC-2 is a very low 1.02.
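
For a feel of what those PUE numbers mean in practice, here is a minimal sketch for a nominal 1 megawatt of IT load; the 1.8 legacy figure is just the midpoint of the 1.6 to 2.0 range cited above:

```python
# What the PUE figures above mean for 1 MW of IT load
# (facility power = IT power * PUE; the extra is cooling and other overhead).

IT_MW = 1.0

for label, pue in [("legacy enterprise rack", 1.8),
                   ("Supermicro DLC-1", 1.2),
                   ("Supermicro DLC-2 target", 1.02)]:
    facility_mw = IT_MW * pue
    overhead_kw = (facility_mw - IT_MW) * 1_000
    print(f"{label:>24} (PUE {pue}): {facility_mw:.2f} MW from the grid, "
          f"{overhead_kw:.0f} kW of overhead")
```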

And the noise level for DLC-2 racks drops down to around 50 dB compared to around 75 dB for the DLC-1 racks. Normal conversation is around 60 dB, and heavy traffic (outside the car) is around 85 dB. A rock concert is on the order of 120 dB and a jet engine at takeoff is 140 dB.

The only way to get more efficient with cooling an AI system is to dunk it in a bath of baby oil or some other coolant that doesn't wreck computer components. And that is a very heavy solution, so to speak.
