When talking about high-end HPC systems, much of the attention is paid to the massive supercomputers being developed by system makers such as Cray (now part of Hewlett Packard Enterprise and the main contractor on two exascale systems), Fujitsu, Atos, and IBM, along with component makers Intel (the primary contractor on one exascale system), AMD, and Nvidia. The three component suppliers were among the vendors that shared $258 million over three years awarded by the US government in 2017 to work with the Department of Energy’s Exascale Computing Project (ECP) and its PathForward program to help accelerate the development of systems that are fast, powerful, and energy efficient.
The exascale systems may be the shiny things that draw the eye, but behind them is the complex work of developing the software, software development kits (SDKs), and applications that will make up the surrounding ecosystem. That ecosystem will need to be exascale-ready within the next four years and will have to integrate easily with systems that will rely heavily on accelerators and that are, in some ways, still in their early stages.
It is an area that The Next Platform has tackled before and one that ECP director Doug Kothe spoke about at length in an interview posted on the ECP site. In that wide-ranging talk, Kothe touches on a broad array of topics, from the software stack and SDKs that the Software Technology team is working on to the applications that need to achieve the 50-fold increase in performance that the exascale systems are promising. He also touches on the tight working relationships with the hardware makers to ensure that what is being done on the ecosystem side jells with what these supercomputers will enable.
At the same time, there is pressure on the ECP to ensure that what is being learned within the exascale project is disseminated throughout the larger US HPC community. Two key recommendations coming out of a recent DOE review of the project’s status and roadmap were to publish and share what has been learned and to share the program’s project management approach.
“We’ve learned a lot in terms of what to do, and probably more importantly, what not to do,” Kothe said during the interview with ECP communications manager Mike Bernhardt. “We also were given guidance to be more aggressive in our outreach … to really take seriously documenting our lessons learned and best practices for our software development, our applications development and our integration with facilities. We prepare a lot of detailed documentation for these reviews. The recommendation was to sift through that documentation, pull out the pieces that … the larger HPC community would benefit from with regards to preparing for exascale, preparing for accelerated nodes. [The recommendations were] basically saying, ‘Hey, we think you’re doing a pretty good job; get the material out that will help the broader community.’”
Software And SDKs
Within the Software Technology portfolio, ECP has about 70 products, many of which have been evolving on their own for years and are already in use. Project software engineers are picking up their development with the understanding that they will need to run on nodes with multiple kinds of accelerators, such as GPUs. These products are the foundation of the ECP’s Extreme-scale Scientific Software Stack, or E4S, the latest release of which came out in November 2019 with 50 full-release products and a half-dozen partial-release products, all of which can be used in four different kinds of containers. The engineers then were able to pull products together to create SDKs, Kothe said.
“We realized that many of these products have similar functionalities or they were meeting similar requirements, and by grouping these together – let’s say in programming models or in math libraries or in I/O or in DataVis – we can really ensure interoperability, a nice sort of horizontal integration, meaning applications can ideally plug and play some of these techniques, some of these technologies,” he said. “So we realized by grouping them together in five or six different related thematic areas that we could create software development kits along these themes – math libraries is probably our most mature – containerize them in different types of containers, and then deploy them for the community writ large.”
Key software projects include the Kokkos (developed by Sandia National Laboratories) and RAJA (created by Lawrence Livermore National Laboratory) abstraction layers, which are being used both internally by ECP engineers and by others outside of the project.
The abstraction layers make sure “that your data is laid out in a way that takes advantage of the accelerators and that certain FOR loops and DO loops are executed in ways that take advantage of the accelerators,” Kothe said, adding that “those details, whether it be for a particular GPU type, are really hidden from the applications and from software technologies wanting to use those layers. Now I can essentially call on Kokkos or RAJA to lay out the data for me, to execute certain floating-point operations for me, and whether I’m on an Intel or an AMD or an Nvidia GPU, that complexity is hidden. These abstraction layers essentially sit kind of on top of the metal, so to speak. Whether you’re using OpenMP or OpenACC or CUDA, those complexities are hidden.”
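The idea Kothe describes can be sketched in a few lines. This is an illustrative toy, not the Kokkos or RAJA API: a tiny `parallel_for` wrapper that hides which backend executes a loop, the way those libraries hide whether a loop runs via OpenMP, CUDA, or another model. The backend names here are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(n, body, backend="serial"):
    """Run body(i) for i in range(n) on the selected backend.

    Application code never mentions the backend; in a real abstraction
    layer like Kokkos or RAJA, swapping "serial" for a GPU backend would
    require no change to the loop body.
    """
    if backend == "serial":
        for i in range(n):
            body(i)
    elif backend == "threads":
        with ThreadPoolExecutor() as pool:
            # Consume the iterator so all loop bodies actually execute.
            list(pool.map(body, range(n)))
    else:
        raise ValueError(f"unknown backend: {backend}")

# Application-side code: a backend-agnostic "axpy" loop (out = 2*x + y).
x = [1.0] * 8
y = [2.0] * 8
out = [0.0] * 8
parallel_for(len(x), lambda i: out.__setitem__(i, 2.0 * x[i] + y[i]))
```

The application's loop body is identical no matter which backend runs it; that separation is what lets the same source code target different accelerators.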
On The Application Side
The project has 24 applications, 11 of which are committed to the 50-times performance improvement over what they could do in 2016 and the other 13 to other metrics. Of those 11, engineers are seeing anywhere from three- to 200-times improvements. Kothe said that doing this “isn’t easy because we’re not just riding a hardware curve, because a hardware curve won’t get us there. … In particular, I’ll call out three applications very briefly. One [ExaSMR] is a collection of applications in support of small modular nuclear reactor design and commercial licensing. Another [EXAALT] is in support of fundamental materials science for materials in extreme conditions. And a third is cosmological simulations [ExaSky].”
Those three applications are seeing performance gains between 25 and 300 times. The key to these applications is understanding how they work with the hardware and programming models, such as OpenMP or CUDA. Engineers also are reviewing algorithms to determine what improvements can be made there. As an example of what was done to improve the applications’ performance, a performance engineer working with EXAALT “looked carefully at how the molecular potentials were being calculated, peeled that off into a kernel that we call a proxy app, did a lot of analysis, and with various programming models asked, ‘Is there a way to speed up this potential?’ It was a small piece of code. And lo and behold a 40x speedup resulted from this detailed analysis. Then that kernel was imported back into the base code, and that really resulted in the EXAALT project now projecting a 200x performance improvement. This potential code had been around for years and not really been looked at with fresh eyes, with the point of view of, ‘How do I exploit the accelerators?’ So that’s just a great example of what ECP is about, which is getting fresh eyes on existing code, thinking about new programming models, thinking about new algorithms with an eye toward accelerators.”
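The proxy-app workflow Kothe describes can be illustrated with a toy example. The "potential" below is a hypothetical stand-in, not EXAALT's actual kernel: a naive pairwise calculation is peeled off, analyzed in isolation, and a tuned version that exploits symmetry is swapped back in, the kind of algorithmic change such a study can surface.

```python
def potential_naive(positions):
    """Toy pairwise 'potential': sum of inverse-square separations
    over all ordered pairs. Stands in for a hot kernel peeled out of
    an application into a proxy app for analysis."""
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i != j:
                d = positions[i] - positions[j]
                total += 1.0 / (d * d)
    return total

def potential_tuned(positions):
    """Same result, but exploits the symmetry of the pair interaction
    to halve the inner-loop work: each unordered pair is visited once
    and counted twice."""
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = positions[i] - positions[j]
            total += 2.0 / (d * d)
    return total

# Proxy-app step: verify the tuned kernel reproduces the original
# result before importing it back into the base code.
pts = [float(k) for k in range(1, 60)]
assert abs(potential_naive(pts) - potential_tuned(pts)) < 1e-9
```

The essential discipline is the same at any scale: isolate the kernel, prove the optimized version is numerically equivalent, and only then merge it back.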
Working With The System Makers
The ECP engineers with the Hardware and Integration group are tasked with supporting the PathForward vendors and ensuring that what is being done in software and applications meshes with what the hardware makers are doing with such systems as “Frontier” at Oak Ridge National Laboratory and “Aurora” at Argonne National Laboratory. Continuous integration of ECP’s products with the systems work, a model that ECP started more than a year ago, has been key, Kothe said.
“This is at a high level, an automated, ideally 24/7, daily/nightly deployment of our products onto the hardware – pre-exascale hardware now, but soon the early hardware and ultimately the exascale hardware – to test, test, test for robustness, for performance, the Software Technology products that come out of the E4S release and all of our applications as well,” he said. “It’s important to work closely with the vendors to understand their hardware roadmaps, to understand their portions of the software stack. But the rubber hits the road, so to speak, when you actually get your hands on early hardware. Early hardware we would view as one or two generations upstream in terms of time relative to the actual systems that are going to be deployed.”
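The nightly deployment-and-test loop Kothe describes might look, in skeletal form, like the script below. Everything here is a placeholder: the product list, the `build_and_test` stub, and the log file are invented for illustration and are not ECP's actual pipeline.

```shell
#!/bin/sh
# Hypothetical sketch of a nightly continuous-integration loop: build
# and test each software product on the target machine, record pass or
# fail, and emit a report. Product names and commands are stand-ins.
set -u

PRODUCTS="productA productB"      # stand-ins for E4S packages
LOG=nightly_report.txt
: > "$LOG"                        # start with an empty report

build_and_test() {
    # In a real pipeline this would configure, build, and run the test
    # suite of package "$1" on pre-exascale or early hardware. Here it
    # is stubbed to always succeed so the sketch is runnable.
    return 0
}

for p in $PRODUCTS; do
    if build_and_test "$p"; then
        echo "$p PASS" >> "$LOG"
    else
        echo "$p FAIL" >> "$LOG"
    fi
done

cat "$LOG"
```

The value of running this every night against early hardware is exactly what Kothe notes: robustness and performance regressions surface one or two hardware generations before the production systems arrive.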
As of the fall, ECP engineers were able to take an early look at some AMD hardware, and the project has been working with Intel on what the chip maker is doing. There also have been hack-a-thons and training sessions with the vendors to better prepare for what’s coming down the road.
That said, ECP’s Software Technology group is not narrowing down to one particular programming model, Kothe explained. “We do see a very diverse accelerated node ecosystem coming,” he said, “and we think that’s good for the community and good for us, meaning not just one type of accelerator but multiple types of accelerators, say, from Nvidia, AMD and Intel. That’s really forcing us – and I think this is for the good of the community and moving forward – to have a diverse, robust software stack that can enable applications to, ideally, seamlessly port and get performance on multiple GPUs. This is a very difficult and daunting task, but we’re now really getting into the details of how to develop, whether it’s abstraction layers or push for certain programming models that best allow our applications to achieve performance on these different types of accelerators.”
This kind of information about accelerated systems will be critical not only for ECP but also for anyone else in the HPC field.
“There are hundreds of people involved in ECP, as you know, but there’s a much larger community that we owe responsibility to in terms of getting this information out and sort of lowering the barrier to get onto not just exascale systems but accelerated node architectures in general, which are here to stay, from desktop to clusters to the largest systems in the world,” Kothe said.