The GPU “Expanse”: HPC Acceleration for the Masses

In the early days of GPU-accelerated supercomputing, accelerators were installed with the mission of delivering ultra-high performance for a select few codes. This was mostly because the ecosystem of NVIDIA GPU-enabled applications was still nascent, and early users of massive GPU-equipped systems, such as the Titan machine at Oak Ridge National Laboratory (one of the first top-ranked machines with accelerators), were still unsure how to use them broadly at scale.

The use of GPUs in HPC applications has grown significantly since Titan appeared in 2012 with an early server-class GPU for HPC, the NVIDIA K20. The number of HPC applications ready for GPU acceleration now runs into the hundreds, and 136 of the top 500 fastest systems on the planet deploy accelerators. Beyond that broad change in usability and application readiness, the more nuanced shift has been toward using GPUs to accelerate diverse, small, and moderate-sized jobs, delivering computing resources to many thousands of users across many research domains.

Serving mixed workloads on a large supercomputer with GPU acceleration has become an indispensable part of some HPC sites’ overall strategy. What was once a specialized accelerator for only a few key applications is now so widely used that one such site, the San Diego Supercomputer Center (SDSC) at the University of California San Diego, home of the new National Science Foundation (NSF)-funded Expanse supercomputer, will not go back to pure-CPU machines in the near future.

Part of what makes SDSC interesting in this context is the increased emphasis on GPUs for diverse workloads on Expanse, the follow-on to the NSF-funded Comet system, which will soon end its NSF tenure. Comet has just over 2,000 nodes, 72 of which have four NVIDIA K80 GPUs and another 72 of which have four P100 GPUs. Expanse will have 52 PowerEdge server nodes, each with four NVIDIA V100 Tensor Core GPUs, for a total of 208 GPUs across the system. So while Expanse carries fewer GPUs in total, it delivers a substantial increase in total compute power and, as GPU supercomputers go, a dense accelerator profile.
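To make the card-count comparison concrete, here is a minimal back-of-the-envelope sketch in Python that uses only the node and GPU figures quoted above; it is simple bookkeeping, not a benchmark, and no performance numbers are assumed.

```python
# Accelerator card counts for Comet and Expanse, using only the figures
# quoted in the article. Note that each K80 card itself packs two GPU dies,
# so "total GPUs" always depends on the counting convention.

comet_gpu_nodes = {"K80": 72, "P100": 72}   # GPU nodes per card type
comet_cards_per_node = 4

expanse_gpu_nodes = 52                      # PowerEdge C4140 nodes
expanse_cards_per_node = 4                  # V100 Tensor Core GPUs per node

comet_total_cards = sum(comet_gpu_nodes.values()) * comet_cards_per_node
expanse_total_cards = expanse_gpu_nodes * expanse_cards_per_node

print(f"Comet GPU cards:   {comet_total_cards}")    # 576
print(f"Expanse GPU cards: {expanse_total_cards}")  # 208
```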

For some, this might be unexpected given the workload profile at SDSC, which generally caters more to small- and mid-sized workloads than to massive parallel jobs that take over a significant portion of the machine. However, according to SDSC Deputy Director Shawn Strande, the lessons learned from Comet and its user patterns make this decision an obvious one, especially with future workloads, such as those using AI, on the horizon.

“Based on the nearly full utilization of our GPUs on Comet, we expect to see the same thing on Expanse. We see applications in a wide range of disciplines including molecular dynamics, neuroscience, materials science, protein structure analysis, and phylogenetics. Our GPUs on Comet are chock-a-block full with jobs and we routinely reduce by half the amount of time that we can allocate to GPU proposals versus how much time reviewers recommend,” Strande says. “From application readiness efforts that are underway now for systems such as Summit and Perlmutter, it’s clear that the demand is there for large GPU-based systems, and the growth in machine learning and AI will only increase this demand. Systems such as Expanse, though modest in comparison, also provide a vital onramp for NSF users and those who are planning for scale-up on these large systems,” he adds.

“An all-CPU system was never a consideration for us. There is a strong demand for NVIDIA GPUs as evidenced by our Comet user community and the general lack of GPU computing resources available to the NSF research community. We serve a diverse workload that includes applications from many domains that require GPUs. In our view, it’s important that we help our users move toward architectures such as GPUs, which have the potential for substantial price/performance benefits and represent an important pathway for users who are moving toward exascale.” – Shawn Strande, Deputy Director, San Diego Supercomputer Center

In terms of future workloads where NVIDIA GPUs will push new boundaries, Strande notes that AI is a small but growing part of the workload on Comet, which ends its NSF tenure in mid-2021. This is reflected by the 100+ projects that are using AI and ML techniques, though at relatively modest capacity thus far, complemented by a larger user base doing traditional modeling and simulation with GPUs. Given the mixed workload on the GPUs and the preponderance of applications that use double precision, the SDSC team chose to deploy double-precision GPUs across the system rather than a mix of single- and double-precision GPUs, which could result in underutilized resources.

The GPU configuration for Expanse is also indicative of the emphasis on accelerated compute, and was likewise based on lessons learned with Comet. Most of the GPU cycles are allocated to single-GPU jobs, so having more GPUs in a server is advantageous, Strande explains.

“By and large, most of the GPU applications are not CPU bound, and we’ve found that in most cases the 2:4 ratio of CPU to GPU on a node is about right. Expanse will use Dell EMC PowerEdge C4140 servers with NVIDIA NVLink, so the users who run on multiple GPUs will see a nice performance gain over what we have on Comet today. We certainly would like to have more GPUs, but again, in terms of capacity and utilization, the balance we have of 56x 2-socket CPU PowerEdge C6525 server nodes and 4x 4-way PowerEdge C4140 GPU nodes per scalable unit (essentially a rack) is ideal.”
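As a rough sketch of what that balance looks like on paper, the following Python snippet lays out one scalable unit as described in the quote above and derives the system-wide totals implied by the 52 GPU nodes; the number of scalable units here is an inference from the article’s figures, not an official specification.

```python
# One Expanse scalable unit (roughly a rack), per the description above.
cpu_nodes_per_su = 56            # 2-socket PowerEdge C6525 nodes
gpu_nodes_per_su = 4             # 4-way PowerEdge C4140 nodes
gpus_per_gpu_node = 4            # NVLink-connected V100s per node
cpu_sockets_per_gpu_node = 2     # the 2:4 CPU-to-GPU ratio Strande cites

nodes_per_su = cpu_nodes_per_su + gpu_nodes_per_su       # 60 nodes
gpus_per_su = gpu_nodes_per_su * gpus_per_gpu_node       # 16 GPUs

# System-wide totals implied by 52 GPU nodes across the machine.
total_gpu_nodes = 52
scalable_units = total_gpu_nodes // gpu_nodes_per_su     # 13 scalable units
total_cpu_nodes = scalable_units * cpu_nodes_per_su      # 728 CPU nodes
total_gpus = total_gpu_nodes * gpus_per_gpu_node         # 208 GPUs

print(scalable_units, nodes_per_su, total_cpu_nodes, total_gpus)
```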

It should be noted that the 60-node scalable unit is an interesting design point for the HDR 100 interconnect. It requires only one 40-port HDR 200 switch per scalable unit, providing a single HDR 100 connection down to each node and leaving 10x HDR 200 links to the core, which gives a favorable 3:1 blocking ratio between scalable units. It should provide excellent latency and bandwidth with a single HDR 200 switch per scalable unit, something other supercomputer sites will take note of over the next year or so.
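A quick way to see where the 3:1 figure comes from is to work the port budget of a single 40-port HDR 200 leaf switch, as sketched below; splitting HDR 200 switch ports into pairs of HDR 100 node links is assumed from the description above (the usual splitter-cable arrangement), and link speeds are in Gb/s.

```python
# Port and bandwidth budget for one scalable-unit leaf switch,
# based on the topology described above.
switch_ports_hdr200 = 40
nodes_per_su = 60              # 56 CPU nodes + 4 GPU nodes
hdr100_gbps = 100
hdr200_gbps = 200

# Each HDR 200 port can be split into two HDR 100 node links.
ports_for_nodes = nodes_per_su // 2                      # 30 ports serve 60 nodes
uplink_ports = switch_ports_hdr200 - ports_for_nodes     # 10 HDR 200 uplinks

down_bw = nodes_per_su * hdr100_gbps                     # 6,000 Gb/s toward the nodes
up_bw = uplink_ports * hdr200_gbps                       # 2,000 Gb/s toward the core

print(f"Blocking ratio: {down_bw / up_bw:.0f}:1")        # 3:1
```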

Architecturally, this is the same kind of balanced system that can be seen on other large supercomputers designed to run a few select workloads at massive scale, often taking over most of the machine. What is interesting is that the same concept of system balance and scalability works on a supercomputer designed for the “masses” rather than “the massive” with similar GPU profiles.

Of course, when it comes to building balanced systems like this, it helps to have a partner along the way to navigate compute density with the right cooling (in the case of Expanse, liquid cooling), the right memory configuration, and the setup most fitting for existing workload profiles. Strande says that Dell was the partner of choice for building an NVIDIA GPU-dense system. “We had a terrific group of experts who really understood the workloads, shared our design philosophy, and when the system is in production, we’ll be there to address any issues that may arise.”

The other story about Expanse is one of ecosystem collaboration. NVIDIA has worked tirelessly over the last decade to onboard countless applications for researchers using HPC systems. Dell Technologies has also built experience over many years with some of the largest supercomputer center collaborations on the planet. Bringing this kind of combined expertise to bear is what makes systems such as Expanse and many others possible, and allows researchers in the complex web of research areas at SDSC to have the right resource at the right time for mission-critical science.

More design and production philosophy for this machine can be found here.
