Tooling Up For Exascale

The importance of achieving exascale computing has been discussed at great length in this publication and elsewhere. Depending upon who you talk to, though, its significance has been characterized as anything between an arbitrary performance milestone and the biggest revolution ever to happen in high performance computing. The truth probably lies somewhere in between.

Perhaps it would be more accurate to say that exascale computing encompasses all the points along that philosophical continuum. That particular formulation is probably more reflective of what has become a rather slippery term. It also has the practical advantage of enabling vendors to define exascale on their own terms.

As we recently reported, Cray (now part of Hewlett Packard Enterprise) has cast exascale computing as the convergence of traditional HPC workloads with those of machine learning and data analytics. And that just so happens to dovetail nicely with the heterogeneous capabilities of Cray’s “Shasta” supercomputing platform and associated ClusterStor storage and “Slingshot” Ethernet interconnect. The fact that the largest Shasta machines to be deployed in the United States over the next few years – “Aurora,” “Frontier,” and “El Capitan” – will be capable of operating at an exaflops or more has become just another detail.

For an HPC software company like Altair, the notion of exaflops as a defining metric is even more superfluous. That’s especially true when you’re talking about the company’s PBS Works suite and the best-known offering in that product set: PBS Professional, aka PBS Pro. As an HPC workload manager and job scheduler, PBS Pro is certainly concerned with scale, but more as a matter of carving up a system to run many jobs than of bringing it to bear as a unit on a single job.

And because the scope of HPC workloads is growing, these tools also need to deal with more diverse platforms. In our recent conversation with Bill Nitzberg, chief technology officer of PBS Works at Altair Engineering, he notes that the effort to develop exascale systems has encouraged a more varied hardware landscape and one in which energy efficiency has become especially important.

“The system space just got really, really interesting with all this innovation happening in high performance computing driven by the race to exascale,” Nitzberg tells us. “With heterogeneous hardware, scheduling becomes more interesting. You are not scheduling one or two things – you are scheduling like six or eight different things.”

So instead of just looking at CPUs and memory, in many cases, the scheduler is going to have to account for accelerators like GPUs as well. Of course, GPUs have become well-established over the past several years, so most schedulers already support this kind of host-accelerator model. But Nitzberg says schedulers will also have to become power-aware for these exascale machines, which are expected to draw upwards of 30 megawatts. As a result, power is set to become a limited resource in these supercomputing centers and in some cases will need to be managed with a lot more precision than it has been in the past.
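To make the idea of power as a schedulable resource concrete, here is a minimal sketch of our own, not PBS Pro code. It assumes each job arrives with a power estimate and that the center has a fixed power budget; the `Job` fields, the `admit_jobs` function, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    gpus: int
    est_power_kw: float  # hypothetical per-job power estimate


def admit_jobs(queue, free_nodes, free_gpus, power_budget_kw):
    """Walk the queue in order and start every job that fits all three limits."""
    started = []
    for job in queue:
        fits = (
            job.nodes <= free_nodes
            and job.gpus <= free_gpus
            and job.est_power_kw <= power_budget_kw
        )
        if fits:
            started.append(job.name)
            free_nodes -= job.nodes
            free_gpus -= job.gpus
            power_budget_kw -= job.est_power_kw
    return started


# Illustrative numbers only: the second job has enough nodes and GPUs
# available but is held back because it would exceed the power budget.
queue = [
    Job("cfd", nodes=4000, gpus=16000, est_power_kw=12000.0),
    Job("ml-train", nodes=2000, gpus=8000, est_power_kw=15000.0),
    Job("post", nodes=500, gpus=0, est_power_kw=900.0),
]
started = admit_jobs(queue, free_nodes=10000, free_gpus=40000,
                     power_budget_kw=25000.0)
```

In this toy run the power budget, rather than node or GPU availability, is what keeps the middle job waiting, which is the kind of constraint Nitzberg is describing. A production scheduler would presumably work from measured rather than estimated power and apply far more elaborate policies.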

More generally, Altair has been continuously improving the performance of PBS Pro as systems have grown in size. In some cases, that is made possible by transforming synchronous operations into asynchronous ones; in other cases, by improving the speed of scheduling algorithms directly. But for exascale, the critical challenge continues to be one of scalability.

Even with the extra computational heft afforded by GPUs, most of these initial exascale machines will have 50,000 or more nodes. Nitzberg says the design point for PBS Pro is to support a system at least twice as large as the most powerful supercomputer, the idea being to stay ahead of the curve as machines inevitably grow bigger.

Altair developers are using a few different approaches toward this end. One that is already implemented is supporting multiple schedulers. That means you can now partition the system based on running more than one scheduler, which increases scalability through this sort of divide-and-conquer mechanism. Another upgrade, which is currently in progress, is to refactor the code in the main PBS Pro component (known as the Server) to improve horizontal scalability.
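The divide-and-conquer idea behind running multiple schedulers can be sketched in a few lines. This is an illustration under our own assumptions, not a description of how PBS Pro assigns nodes to schedulers: the `partition_nodes` helper and the choice of four contiguous, near-equal partitions are hypothetical.

```python
def partition_nodes(node_ids, num_schedulers):
    """Split node IDs into contiguous, near-equal partitions, one per scheduler."""
    base, extra = divmod(len(node_ids), num_schedulers)
    partitions = []
    start = 0
    for i in range(num_schedulers):
        # The first `extra` partitions take one additional node each.
        size = base + (1 if i < extra else 0)
        partitions.append(node_ids[start:start + size])
        start += size
    return partitions


nodes = list(range(50_000))
partitions = partition_nodes(nodes, 4)
sizes = [len(p) for p in partitions]  # each scheduler sees a quarter of the machine
```

Each scheduler then only has to reason about its own slice of the system, so the per-scheduler workload shrinks as schedulers are added. The tradeoff is that a job cannot easily span partitions without some extra coordination.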

The principal metric being used to assess the tool’s scalability is node count. At this point, the scheduler is rated to support up to 50,000 nodes and has been tested at up to 70,000 nodes before performance started to drop off. Interestingly, Altair did some of this testing on Amazon Web Services. The developers never actually spun up 50,000 nodes on AWS, instead using some virtualization tricks to simulate a much larger cluster. That virtualization capability, which has been productized by Altair, comes in handy, especially when you don’t have a $100 million machine at your disposal.

“I have told the team that I would like to get to 100,000 nodes,” says Nitzberg. “I think we’ll do that with the next release.”

He believes that two-fold jump in supported node count should be relatively easy to achieve and will likely just require some bug fixes and tweaking a few weak areas in the code. On the other hand, to get to 500,000 nodes (a 10X jump) – a level certainly foreseeable in second-generation and third-generation exascale systems – will necessitate a larger effort, probably entailing refactoring larger chunks of code, replacing algorithms, and even redesigning data structures.

In one instance, Nitzberg describes a table that links each node to every other node. When you scale that up to 100,000 nodes, now you have a 100,000-by-100,000 array that needs to be passed around.  At that point, you have to start thinking about replacing such a data structure with something less unwieldy.
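Some quick arithmetic shows why such a table becomes unwieldy. The sketch below is ours: the entry sizes (1 and 8 bytes) are assumptions, since the article does not say what the table stores, and the `SparseLinks` class is just one possible replacement, not something Altair has said it will use.

```python
from collections import defaultdict


def dense_table_bytes(num_nodes, bytes_per_entry):
    """Size of a full node-by-node table with one entry per pair."""
    return num_nodes * num_nodes * bytes_per_entry


entries = 100_000 ** 2                              # 10 billion entries
gb_small = dense_table_bytes(100_000, 1) // 10**9   # assumed 1-byte entries
gb_large = dense_table_bytes(100_000, 8) // 10**9   # assumed 8-byte entries


class SparseLinks:
    """Hypothetical alternative: store only the links that actually exist."""

    def __init__(self):
        self._links = defaultdict(set)

    def connect(self, a, b):
        self._links[a].add(b)
        self._links[b].add(a)

    def connected(self, a, b):
        return b in self._links.get(a, ())

    def stored_entries(self):
        return sum(len(peers) for peers in self._links.values())


# Toy topology: a ring in which each node is linked to its two neighbors.
n = 100_000
ring = SparseLinks()
for i in range(n):
    ring.connect(i, (i + 1) % n)
stored = ring.stored_entries()
```

Under these assumptions the dense table runs to 10 GB or 80 GB, while the sparse structure for the toy ring holds 200,000 entries. How much a sparse layout actually saves depends on how many of the node pairs really need an entry.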

Some of this upgrade effort will undoubtedly be the result of trial and error, namely, pushing the software until it breaks and then swatting away whatever bug caused it. That said, the PBS Pro developers aren’t just flying blind. They are familiar enough with the code to recognize problematic areas that will need to be shored up. “They know where some of the bodies are buried,” Nitzberg laughs.

Some of the work that will be required to push the software to this next level will be accomplished with the help of a new partnership with Argonne National Laboratory, which became interested in using the PBS Pro technology for the lab’s upcoming Aurora supercomputer being built by Intel and Cray.  With a deployment date in 2021, Aurora is slated to become the first exascale supercomputer to be up and running on US soil.

Nitzberg says the collaboration with Argonne, which began over the summer, is being shepherded through the Department of Energy’s Exascale Computing Project (ECP) effort, much of which is focused on developing a system software stack and toolset that can serve these next-generation machines. In this particular case, Altair is working with Bill Allcock’s Advanced Integration Group at Argonne to come up with a workload manager and scheduler suitable for the Aurora architecture and exascale computing more generally.

The lab is not without its own experience in this area.  Argonne, under Allcock’s direction, developed an in-house job scheduler, known as Cobalt, which was originally devised for the BlueGene platform when IBM was still building branded supercomputers. As a result, Cobalt is now based on 20-year-old technology. Nitzberg says that developing the second version of this scheduler would have entailed a “complete rewrite.”

Nonetheless, some of the Cobalt technology is expected to be funneled into the PBS Pro software through this new collaboration. In 2016, the PBS Pro software became dual-licensed, with an open source version aimed principally at the public sector, and a commercial version targeted at private organizations. Both are based on the common core of the software, which now stands to benefit from the expertise at Argonne. Nitzberg points to this DOE collaboration as a great proof point for this dual-license model.

The current plan is to use the enhanced version of the scheduler in the precursor system for Aurora. As of yet, there is no commitment to install it on the full-blown Aurora machine in 2021, although if the initial deployment is successful, it would logically follow to use the upscaled PBS Pro, or perhaps an offshoot of it, on the production system as well.

If you’re an Altair fan, or are interested in exascale scheduling and workload management more generally, Nitzberg and Allcock will be participating in the PBS Pro Open Source Project Community BoF at SC19 next week.
