
When Immersion Cooling is the Only Option

With only one, perhaps two, percent share of the broader datacenter cooling market, immersion and direct liquid cooling are still fringe technologies. The reasons for that are as nuanced as they are numerous, especially for immersion.

While immersion cooling has been around for a number of years, its reputation for maintenance headaches, leaks, and spills has quickly outpaced word of the benefits, which include almost 100 percent thermal efficiency. These potential drawbacks are compounded by concerns over floor space and facilities integration (supporting the weight of the immersion tubs, for instance) as well as how introducing liquids might change support contracts with system vendors.

But a day will come, quite possibly this decade, when it might be the only option – at least for high-density servers in datacenters supporting supercomputing applications or AI training, among other demanding workloads. A few HPC centers have gone the immersion route but again, these are the exceptions.

Lucas Beran, Principal Analyst for Dell’Oro’s datacenter infrastructure group, says right now less than one percent of the world’s servers are cooled by direct liquid or immersion cooling. “When I talk to engineers, they say they love the idea of immersion and liquid. But operationally, the barrier is the human element. Datacenter owners and operations don’t want to add liquids and oils to the datacenter environment. They are worried about mess and destroyed equipment.”

While the “human element” is keeping adoption of immersion technology at bay, Beran says the resistance cannot continue for much longer. “I don’t think we’ll get to broader immersion until we need to get there but we’re starting to be there. Densities are creeping up quickly and we’re fast approaching the tipping point.”

The movement to immersion will begin in environments with extreme densities. “Direct cooling a CPU or GPU direct to chip is not 100 percent heat capture. If you have a rack that generates 40kW of heat, you’ll still have around 4–8kW of heat that will escape, which means you’ll need hybrid cooling via attaching a rear door unit or air handler or some other form of air cooling to deal with that heat. In the future, with extreme rack densities in HPC up to 200kW if you’re only capturing 80 percent of that heat, there is still plenty to cool,” Beran says.
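The arithmetic behind Beran's figures can be sketched in a few lines of Python. This is an illustration only: the function name `residual_heat_kw` is hypothetical, and the 80 and 90 percent capture rates for the 40kW rack are assumptions inferred from the 4–8kW range he quotes, not measured values.

```python
def residual_heat_kw(rack_kw: float, capture_fraction: float) -> float:
    """Heat (in kW) that direct liquid cooling does not capture,
    and that some form of air cooling still has to remove."""
    if not 0.0 <= capture_fraction <= 1.0:
        raise ValueError("capture_fraction must be between 0 and 1")
    return rack_kw * (1.0 - capture_fraction)


# Assumed scenarios: a 40 kW rack at 90% and 80% capture (inferred from
# the quoted 4-8 kW range), and a 200 kW rack at the quoted 80% capture.
scenarios = [(40, 0.90), (40, 0.80), (200, 0.80)]

for rack_kw, capture in scenarios:
    leftover = residual_heat_kw(rack_kw, capture)
    print(f"{rack_kw} kW rack at {capture:.0%} capture: "
          f"about {leftover:.0f} kW left for air cooling")
```

Under these assumptions, the heat left over from a single 200kW rack at 80 percent capture is roughly 40kW, as much as the entire output of the 40kW rack in the first example.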

In other words, the time to start at least looking at what immersion cooling might require is now. Although implementation might be years away, architecting facilities and even support contracts with systems vendors ahead of time is critical. Air cooling might not be going away anytime soon but it will not be enough in some cases.

“We are really just at the beginning of a decade-long transition from air-based cooling to liquid-based,” Beran argues. “Perimeter is legacy technology, rack and row-level or rear-door exchanges, especially for high-density section or hotspots, are an intermediate piece in between that final frontier of thermal management, which is immersion cooling.”

If we are facing a future of immersion cooling, who are the vendors to watch now, how are they different, and how much room is there for startups to innovate and win market share? And could all the standalone company momentum now (as small as it is) be upended if an HPE or Dell, for instance, decided to integrate immersion into its offerings for some of those highest-density environments?

Beran says Green Revolution Cooling (GRC), which is installed at the supercomputing site TACC, for instance, is the leader now in immersion. Asperitas and Submer are honorable mentions with still others, including Iceotope, garnering some mindshare. “There are other startups in the space with a couple proofs of concept now and the market could change rapidly, but I’m confident in GRC and Asperitas. To dethrone either of those would be difficult, although it’s still early.”

Differentiation for these and future companies now centers around taking a single-phase or two-phase approach to immersion. In single-phase, liquid goes into the tank, captures heat, then gets pumped through a heat exchanger before going back in. For two-phase, once the liquid in the tank hits a certain temperature, it vaporizes, rises, is redirected for cooling elsewhere, then condenses back into a liquid. The latter has a higher CAPEX and while it provides a cooling improvement, so far that’s “marginal” according to Beran.

The other area of differentiation is in the engineered fluid that sits in the tank. Asperitas is working with Shell to refine its medium, and 3M is working on its own immersion fluid. But right now, it is hard to say how big a difference improvements in the fluid will make. In short, there may be room for startups to differentiate technology-wise – but for something that is already a tough sell, offering an incremental improvement on single-phase might not resonate, and going with a two-phase approach might add even more complications and up-front cost. So far, by the way, the leading immersion companies mentioned are all single-phase. Also, even though it’s GRC material, this is a nice explainer about the differences between single- and two-phase.

Perhaps the only differentiation that could make practical sense at this early stage is if immersion became a core offering, with full systems-level support from the major OEMs that provide the bulk of the systems for high-density datacenters in HPC or AI training. For instance, HPE/Cray could offer an immersion option with a guarantee covering the whole system: servers, storage, networks, and immersion tanks. Right now, a concern is that introducing immersion or direct liquid cooling could invalidate a support contract. If a major OEM bought a company like GRC, for instance, and offered immersion cooling on a specific line for these use cases, it would be a different set of considerations.

All of this leads back to the questions at the front of datacenter operators’ minds: What messes and dangers to gear do these tanks represent? What about safety due to slips? What about maintenance? There is a chicken/egg scenario here. More centers will have to adopt immersion cooling and share their challenges openly so others can gauge risk. But no one wants to go first.

Other than those most talked-about reasons, there are other practical concerns that keep immersion at bay, including on the facilities side. These tanks are not small – they take up a fair amount of datacenter floor space and do require some engineering to support. In other words, it takes planning to implement. And if everyone is waiting for someone else to go first, the whole rollout of more immersion examples is further delayed.

And speaking of delays, even though immersion might be the only way to cool high-performance hardware in the next decade, its growth has been further hindered by the pandemic. Beran says that it’s a “high-touch” purchase from the beginning that requires boots on the ground along the way. Even barring any further shutdowns, 2020–2021 could have pushed immersion’s entry into more mainstream cooling farther back still.

The last question is whether there is opportunity for any startup hoping to secure early footing in what looks to be one of the only options for near-100 percent heat capture. The answer depends on who gets acquired by a major OEM with reasonable grounding in HPC/AI server markets (HPE, Dell, Lenovo and to be polite, perhaps IBM). Internationally, companies like Fujitsu – which pioneered immersion cooling for supercomputing in particular – have already invested, although we still do not see a lot of large publicly-listed systems that feature immersion.

The OEMs have struck up partnerships with GRC in particular (Dell and HPE; the latter also has a partnership with Iceotope) but what big centers want is integrated support for systems with this unique and particularly risky technology. It is not out of the question that one of the immersion vendors will be bought, forcing a new competitive landscape in a game that’s still far too early to call.
