Not so long ago, there was a question whether exascale supercomputers would be built from a very large number of thin nodes containing only modest amounts of parallelism or a smaller number of fat nodes powered by specialized accelerators and powerful manycore processors. As PGI’s Michael Wolfe pointed out, the latter model has won the day.
And that’s probably a good thing. As lovely as MPI is, distributed computing only takes you so far, given the inherent bandwidth and latency limitations of even the fastest system networks. And when you’re talking about hundreds of thousands of nodes, those limitations can constrict performance significantly. Of course, fat nodes have their own internal communication bottlenecks, but these are being mitigated by integrating more of the node hardware – memory, network interfaces, and controllers – onto the chip itself, or at least onto the processor package.
But the complexity of programming accelerators and manycore general-purpose processors is a significant challenge on its own. Fortunately, in OpenMP and OpenACC, we have established parallel programming frameworks to help relieve some of this burden.
The newer of the two, OpenACC, is a directive-based approach to parallel computing that rides atop languages like C, C++ and Fortran. It was inspired by the rise of GPUs as computational accelerators but has been extended to support multicore CPUs as well.
In its relatively short life, OpenACC has become fairly widely deployed, especially in GPU-equipped systems. Over the last five years, the number of applications using OpenACC has grown from 39 to 236. Those now include three of the top five HPC codes: Gaussian (computational chemistry), ANSYS Fluent (CFD), and VASP (ab initio quantum mechanics). According to OpenACC vice president Jack Wells, whose day job is Director of Science at Oak Ridge National Lab (ORNL), OpenACC is used in 18 percent of the INCITE codes on ORNL’s Summit supercomputer.
It’s also employed in the majority of climate codes and, for the first time, is being used in a weather prediction supercomputer: IBM’s Global High-Resolution Atmospheric Forecasting System, aka IBM GRAF. This also represents the first instance of a global weather prediction system being run operationally on GPUs, in this case, Nvidia V100 processors.
OpenACC 3.0 was recently released into the wild, and in it we see the beginnings of a new approach that aligns the standard more closely with some of the parallel constructs in existing high-level languages. According to OpenACC president Duncan Poole, the idea is to increase interoperability between the languages, the associated libraries, and the OpenACC interface.
“It’s not just about directives,” Poole tells The Next Platform. “As the base languages get smarter and more parallel in their nature, in many ways this makes the use of directives either optional or supportive to what’s already in the base language.”
The initial changes are fairly incremental, including support for C++ lambdas (anonymous functions defined inline); the addition of a modifier to zero out a device’s memory after it’s allocated; support for fat nodes with multiple devices through direct memory copies and synchronization; and an expanded list of directives that support the “if” clause.
More broadly, the OpenACC organization has also designated C18, C++17, and Fortran 2018 as supported base languages. Basically, that’s a statement of intent rather than a declaration of complete interoperability. But the idea is to bring these latest iterations of C, C++, and Fortran into alignment with the OpenACC specification.
To that end, the organization is now working closely with the language standard committees to bridge the gap between the languages and OpenACC and develop a roadmap for interoperability. “We want to engage broadly in ecosystem development and that’s the new pillar we’re adding to our work,” Wells told us. At present, the boundaries of OpenACC and the languages don’t always fit together cleanly, given the overlapping capabilities in terms of parallel programming support. In Fortran 2018, for example, the DO CONCURRENT construct could replace most of the functionality of OpenACC’s parallel loops, but they are not entirely equivalent.
Reconciling the languages with the directives is a big project. But the OpenACC members and users are behind this, said Wells. The prospect of making the parallel capabilities in the languages consistent with those of OpenACC is extremely attractive for developers, not to mention compiler vendors like PGI. The payoff is greater programmer productivity – something that both high-level languages and OpenACC aim for. The fact that the two have bifurcated is an accident of history, not a divergence of purpose.
From Wolfe’s perspective as a compiler writer, the ideal outcome would be to have OpenACC’s functionality completely subsumed by the base languages. “It should be a goal of all HPC compiler developers that over time programmers are able to use fewer directives,” he wrote, “either because of automation where the compiler becomes better at making decisions than the typical programmer, or because the parallel annotations become part of the underlying languages themselves.” For his part, Wolfe believes C/C++ and Fortran will eventually catch up and incorporate this parallel functionality into the semantics of the languages.
Until then, developers will have to make do with OpenACC. If you are interested in what is available in the latest iteration, you can access the API document for version 3.0 here.