Cramming 5.4 petaflops of peak performance into 760 compute nodes puts a lot of computing capability in a small space, and it generates a considerable amount of heat. That is what Lawrence Livermore National Laboratory’s latest HPC system, aptly nicknamed “Magma” and procured under the Commodity Technology Systems 1 (CTS-1) program, can deliver. Along with an advanced processor and server architecture to drive performance, an innovative cooling design keeps it running optimally under numerous large workloads.
With this single installation, the National Nuclear Security Administration’s (NNSA) Tri-Labs, which include Lawrence Livermore plus Sandia National Laboratories and Los Alamos National Laboratory, expanded their combined computing capacity by about 25 percent to a total of over 25 petaflops. Their existing CTS-1 computing cycles were fully consumed, and the NNSA needed to maximize its high performance capacity computing to accommodate additional needs. Magma ranked 69 on the November 2019 Top500 list using only 650 of its final 760 nodes. Like the other CTS-1 procurements, Magma was built by Penguin Computing.
CTS-1 systems are used as the everyday workhorses for Tri-Lab scientists and engineers researching a range of problems in hydrodynamics, materials science, molecular dynamics, and particle transport. Some of the CTS-1 systems are also dedicated to institutional computing and collaborations with industry and academia. But for the Magma procurement, the NNSA needed additional capacity dedicated to the simulation of 2D and 3D physical systems for parametric studies.
The Largest US Cluster Using “Cascade Lake-AP” Processors
Previous CTS-1 systems used processors that are a couple of generations older than the Xeon processors in Magma. As it turns out, Magma is the largest cluster in the United States to be built on a new server architecture based on Intel’s Xeon SP-9242 Platinum processors and its Server System S9200WK system design, which we profiled in detail here when they were announced in April 2019. Each Xeon SP-9242 Platinum processor pairs two 24-core “Cascade Lake” chips in a single socket, for 48 cores running at 2.3 GHz and twelve memory channels per socket. A node therefore delivers almost twice as many cores and twice as much memory bandwidth as is possible with the plain vanilla “Cascade Lake” Xeon SPs launched in April 2019 or the “Cascade Lake-R” Xeon SP lineup announced in February 2020.
“We are continually tracking advancements in technologies and looking for capable and economic HPC solutions for scientists,” explains Matt Leininger, senior principal HPC strategist at Lawrence Livermore. “Our workloads – although they are intensive on the network – are most intensive on memory bandwidth. One of the things we like about the Intel Xeon Platinum processors is that they have a tremendous amount of memory bandwidth per node, and therefore we can remove that bottleneck from our application and deliver both capable and economical cycles to our mission critical applications.”
The Xeon SP-9242 Platinum processors have more cores, more socket-to-socket interconnect links, nearly twice the cache, and double the memory channels of other late-generation Xeon E5 and Xeon SP processors. This combination gives Lawrence Livermore’s HPC architects a new level of compute performance per node.
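The bandwidth comparison follows directly from the channel count. As a rough sketch, theoretical peak memory bandwidth is channels times transfer rate times bytes per transfer. The twelve channels per socket come from the description above. The DDR4-2933 speed, the 8-byte channel width, the six-channel baseline for a standard Cascade Lake socket, and the two-socket node are assumptions made here for illustration, and the function name is hypothetical. The two-to-one ratio holds whatever memory speed is assumed, as long as both node types run the same DIMMs.

```python
def peak_memory_bandwidth_gbs(channels_per_socket,
                              sockets_per_node=2,           # assumption: two-socket nodes
                              mega_transfers_per_sec=2933,  # assumption: DDR4-2933 DIMMs
                              bytes_per_transfer=8):        # assumption: 64-bit channel
    """Theoretical peak memory bandwidth of one node, in GB/s."""
    mb_per_sec = (channels_per_socket * sockets_per_node
                  * mega_transfers_per_sec * bytes_per_transfer)
    return mb_per_sec / 1000.0


# Twelve channels per socket (Cascade Lake-AP, from the article) versus
# six channels per socket (standard Cascade Lake, assumed baseline).
ap_node = peak_memory_bandwidth_gbs(12)
sp_node = peak_memory_bandwidth_gbs(6)

print(f"Cascade Lake-AP node: {ap_node:.1f} GB/s")
print(f"Cascade Lake node:    {sp_node:.1f} GB/s")
print(f"Ratio:                {ap_node / sp_node:.1f}x")
```

These are theoretical ceilings. Sustained bandwidth in real applications is lower and depends on the access pattern.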
But the processors are just the beginning of what’s cool about Magma.
“We also required the system to be liquid cooled,” Leininger added. “Liquid cooling allows Lawrence Livermore to utilize the higher performance processors in a high-density solution while also easing the air-cooling requirements within our datacenters.”
According to Leininger, Tri-Labs’ experience with liquid cooling shows that it makes a significant difference with modern processors. Prior to more advanced CPU designs, air cooling offered adequate thermal protection to run the systems at full performance. Today’s modern processors require liquid cooling to reach their maximum compute capability. Plus, Lawrence Livermore needed the system in production as soon as possible.
Cool Technology That Keeps Magma Cool
Anyone who has built their own home gaming system with liquid cooling knows the complexity, risk, and extra service work involved with having coolant pumped around the inside of a chassis. It’s great for performance, but when it leaks or you have to replace components like CPUs or memory because you’re pushing the technologies to their boundaries, it means disassembling cooling system parts to reach the computing components. Scale that out to hundreds or thousands of nodes and that’s what HPC computing centers have had to deal with when considering liquid cooling solutions for clusters.
The Tri-Labs are no strangers to liquid cooling. Several systems across the labs integrate direct-to-chip liquid cooling. It adds complexity – and thus service cost. System leaks or failures amplify the difficulties and costs. But that’s part of the consideration to achieve the levels of performance they need. With Magma, those challenges are pretty much eliminated with the CoolIT Systems liquid cooling design.
“We’re used to designing innovative, custom cooling systems for large clusters,” Jason Zeiler from CoolIT Systems comments. “Our own Cooling Distribution Units (CDU) interface between the facility liquid, subfloor piping, and the secondary side technology in the rack. Our offering to the market is as a technology leader, integration collaborator, and solution provider.”
According to Zeiler, memory failure rates in large systems are high across the industry, so easily serviceable memory cooling is a high priority going forward in HPC clusters. According to Leininger, DIMMs are the single most-replaced component in the CTS-1 clusters. Magma uniquely brings liquid cooling right to memory DIMMs.
“Our design allows for high-density memory heat capture,” Zeiler explains. “It provides very stable liquid cooling across the DIMMs. A key objective for our design was to provide both a cost-effective, high heat capture solution for memory while also maintaining very high serviceability. Our design allows for a high number of insertion cycles per DIMM, allowing them to be removed and replaced without any significant impact to the liquid cooling design.”
Another area of interest to IT admins at Lawrence Livermore and to Leininger himself is memory error rates. “Memory errors occur for a variety of reasons,” Leininger explains. “We are hoping that by liquid cooling the DIMMs on Magma, we will see a reduced rate of memory errors. But this is a bit of an experiment at the moment. We will be gathering data on Magma over its lifetime to see if that really holds true.”
Leak-Free Liquid Cooling
With the size of Magma and the dozen memory channels per processor that could support a large number of DIMMs per server, adding traditional liquid cooling directly to the DIMMs had the potential to significantly magnify service complexity. But CoolIT used innovative blind-mate, dry-break quick disconnect connectors to mate the component piping to the server board in each server and between the server and coolant manifold in the back of the rack.
Blind-mate, dry-break connectors mate automatically with a chassis manifold at the component level, so nobody has to connect or disconnect the plumbing by hand. Unplugging a server automatically disconnects and closes the coolant lines without leaking.
“The server design is very user friendly for liquid cooling with blind-mate connectors,” Zeiler explains.
“Our admins like the serviceability around the memory and that it’s being liquid cooled as well,” Leininger adds. “Lawrence Livermore was worried that the liquid cooling serviceability would be complex and potentially messy. However, CoolIT designed a clean and non-invasive solution.”
When designing the cooling system for the cluster, Intel, Penguin Computing, and CoolIT targeted 70 percent to 80 percent of the system to be water cooled and the rest air cooled. According to Leininger, that approach offers the most cost-effective method. Full liquid cooling adds complexity and cost with components like chilled doors, whereas air cooling can remove the remaining heat adequately to allow the system to run at full performance while protecting the components that are pushed the hardest.
Fast Design, Fast Deployment
Considering the size of Magma and the urgency with which the NNSA needed the added capacity, the system was on a fast track to deployment, which moved its design away from the OCP-based hardware that Penguin Computing had supplied previously.
“The increased memory bandwidth of the processors was compelling, and the quick availability allowing for a fast deployment were the two major driving factors in selecting the configuration,” Ken Gudenrath, DOE director at Penguin Computing, says. “To ensure a quick and complete cluster solution, we partnered with Intel using our Relion XE2142eAP 2U4N server in a standard EIA rack.”
Penguin Computing purchased fully integrated Intel Server System components with liquid cooling support and worked with CoolIT Systems to design the remaining liquid direct-to-chip cooling and Cooling Distribution Units for the datacenter. With Xeon SP Platinum 9200 processors with liquid cooling, Penguin was able to provide a high-density system with outstanding performance per core.
Like other CTS-1 systems, the fabric in the Magma cluster is based on 100 Gb/sec Omni-Path networks. But because each node is so much more capable, Lawrence Livermore chose a double-rail interface with two Intel Omni-Path host adapters for each node.
“The quick collaboration amongst all the stakeholders allowed for fast design, contract execution, delivery, and ultimate acceptance of Magma,” adds Gudenrath. “Our final goal was achieved when we completed several initial high performance LINPACK (HPL) runs and submitted these for qualifying on the November 2019 Top500 list.”
For that HPL run, Magma used 62,400 cores in 650 nodes running at 2.3 GHz, for a theoretical peak of 4.6 petaflops at double precision. It delivered 3.24 petaflops on the HPL test, a computational efficiency of 70.6 percent. When fully implemented, Magma’s theoretical peak will be 5.4 petaflops using all 760 compute nodes.
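The peak figures can be reproduced with a short back-of-the-envelope calculation: cores times clock speed times floating point operations per cycle. The core count, clock speed, node counts, and HPL result come from the numbers above. Two sockets per node is inferred from 62,400 cores across 650 nodes. The 32 double-precision operations per core per cycle (two AVX-512 fused multiply-add units) is an assumption not stated here, and the function and constant names are hypothetical.

```python
CORES_PER_SOCKET = 48     # from the article
SOCKETS_PER_NODE = 2      # inferred: 62,400 cores / 650 nodes = 96 cores per node
CLOCK_GHZ = 2.3           # from the article
DP_FLOPS_PER_CYCLE = 32   # assumption: 2 AVX-512 FMA units x 8 doubles x 2 ops


def core_count(nodes):
    """Total cores across the given number of nodes."""
    return nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET


def peak_petaflops(nodes):
    """Theoretical double-precision peak in petaflops."""
    gigaflops = core_count(nodes) * CLOCK_GHZ * DP_FLOPS_PER_CYCLE
    return gigaflops / 1e6  # 1 petaflops = 1,000,000 gigaflops


HPL_NODES = 650
FULL_NODES = 760
HPL_RESULT_PFLOPS = 3.24  # measured HPL result quoted in the article

peak_hpl = peak_petaflops(HPL_NODES)
peak_full = peak_petaflops(FULL_NODES)
efficiency = HPL_RESULT_PFLOPS / peak_hpl

print(f"Cores in HPL run:  {core_count(HPL_NODES):,}")
print(f"Peak at 650 nodes: {peak_hpl:.2f} petaflops")
print(f"Peak at 760 nodes: {peak_full:.2f} petaflops")
print(f"HPL efficiency:    {efficiency:.1%}")
```

Under these assumptions the sketch lands on the quoted 4.6 and 5.4 petaflops peaks after rounding. Using the rounded 3.24 petaflops result, the efficiency comes out at about 70.5 percent, in line with the quoted 70.6 percent once rounding of the inputs is allowed for.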
A Legacy Of Performance Continues Into The Future
“All the CTS-1 systems we’ve procured over the last four years, including Magma, will continue to deliver HPC cycles to our users,” Leininger states. “But scientists always have a growing demand for more systems like these. We continue to track technology roadmaps, and we are preparing for our next round of commodity technology systems procurements, called CTS-2, in the near future.”
Ken Strandberg is a technical story teller. He writes articles, white papers, seminars, web-based training, video and animation scripts, and technical marketing and interactive collateral for emerging technology companies, Fortune 100 enterprises, and multi-national corporations. Mr. Strandberg’s technology areas include Software, HPC, Industrial Technologies, Design Automation, Networking, Medical Technologies, Semiconductor, and Telecom. He can be reached at firstname.lastname@example.org.