As supercomputing centers look toward future exascale systems, one of the most pressing concerns, alongside power consumption, is adopting the right programming approach to scale applications across millions of cores.
And while this might sound like a big enough challenge on its own, it gets more complicated: a new programming model (or system) might not be the scalability and performance answer either. It could be that tweaking existing tools and methods can move programming evolution to programming revolution, that is, of course, if the supercomputing programmer community can agree.
Like all things in HPC, “it depends” is the key phrase here. Some application areas will need to re-architect their code almost from the ground up to benefit from performance possibilities. Others can take an evolutionary, step-change approach and still see great gains. There may be no single answer that fits all application realms in terms of where to go from here, but one thing is certain: the status quo for HPC programming will not hold for much longer.
This impending crisis of choosing the appropriate approach to programming pre-exascale and future exascale machines was the topic of a panel of leaders from the high performance computing programming community at this month’s International Supercomputing Conference (ISC16). Among the most pressing questions put to the panel were whether solutions to the programming challenges ahead exist (and what they are), what approaches might be a good fit but are not being explored (or are not well known, fast enough, or accessible enough), and why it seems impossible for anyone to develop the “perfect” system and garner broad adoption.
To set the stage, panel chair Dr. Bill Gropp, Chief Scientist at NCSA and one of the foremost authorities on parallel programming at extreme scale, outlined what is meant by the search for a programming model, a definition that bears repeating. “A programming model refers to an abstraction of a way to write a program, so message passing is an example of a model. A programming system, which can be libraries, APIs, and other parts pulled from programming models, is different; MPI is a programming system, it’s more than just message passing.” An execution model, at a higher level still, is an abstraction of the way a machine works: a Von Neumann architecture or a vector machine, for instance. This necessary level-setting helps tune the conversation across the entire system and brings hardware and software co-design in as an element.
In kicking off the panel, Gropp noted that over the last several years, an evolutionary approach to programming highly parallel machines has been successful. Incremental changes alone, however, will not be enough once the forthcoming exascale systems arrive. “One view is that we need a revolutionary approach to how we program these systems, and that might require a complete rewrite. That might not be feasible for applications with millions of lines of code, but new applications and areas where existing approaches are a bad fit could be a starting point.”
In fact, during the panel there was little confidence that such a slow, evolutionary approach would be able to keep pace with the increasing core counts and capabilities cooked into current and upcoming systems. According to Dr. Michael Heroux, parallel programming guru at Sandia (and co-lead of the HPCG benchmark effort):
“The most urgent thing is the effort that will go into re-architecting applications. We need to do this as soon as possible and as broadly as possible so we can understand what our programming models and environments will require.” Heroux likens what is happening now to the early days of HPF and distributed memory computing, when the first reaction of the HPC programming community was to quickly set about abstracting away the distributed memory computations, hiding all the complexity in the runtime, away from the user. This was good for ease of use, but bad for performance. All along, though, the programming tools were not as difficult to use as thought, and there was not such a need for a programming model that hid everything away. “I’ll argue we’re in a similar period with the next generation of parallel computers in that we need to fundamentally re-architect the applications to be task-based. Application architectures need to be such that we can arbitrarily scope the size of data and work, even on a node, and be able to change the size and the distribution, and expose dependencies across tasks so that when they’re executed on a parallel machine and mapped to it, that’s an explicit part of the application design.”
“We need to urgently refactor our applications to gain some knowledge about how to do it and see what programming models and environment support we need.” – Michael Heroux, Sandia National Lab
Although the HPF reference ruffled some feathers and sent a murmur through the group, many of whom remember those early days, it is difficult to find a more suitable parallel, even if agreement on how to set about re-architecting decades-old, million-plus-line codes is far from settled. While there do not appear to be agreed-upon solutions (although panelist Brad Chamberlain from Cray pointed to Chapel as one option and noted that HPC isn’t where widely adopted programming approaches are rooted anyway), Heroux does point to what is needed now beyond the “basics” of refactoring applications entirely.
“In re-architecting code completely, the best way to do this is with a small, representative set of the computations needed and, from there, to architect for tasking in a scalable way. You can then move (almost) wholesale functionality from a previous application to the new tasking environment, and in such an environment, within each task, there is sequential code.” Along with this, he also says that although there has been a lot of talk about tasking across nodes, a conversation about tasking on the node is needed. “In the same way, the taxi and shipping industries have different models: they are both transportation, but shipping moves big chunks long distances, while taxis move individuals integrated with other kinds of traffic.”
Heroux also argues that entirely new programming models are not necessarily needed to approach these problems. “I can use a lightweight asynchronous task launching capability, a lightweight control transfer under which a lot of the logic can be hidden, and a lightweight data store.” Specifically, Heroux is talking about using existing concepts and tools to extend parallel capabilities. For example, atomics, which are supported in hardware to allow multiple threads to safely update the same memory location, are not new; they are already baked into multicore processors, yet they hold great promise. The same is true of the programming concept of futures, placeholders for values still being computed, which can help overlap work and speed execution. Again, not revolutionary stuff for developers, but promising for those who are looking at efficiently re-architecting code to take advantage of far higher levels of performance.
While efforts like Chapel, which aims to create an entirely new (albeit familiar) programming system for HPC, are admirable and hold promise, getting the kind of traction needed to build a massive, widely adopted effort is going to be an immense challenge, and for those in HPC, this movie has been seen several times before (the shift to MPP, as but one example). Still, there is far less contention about whether change is needed and far more about the level of alteration that is coming to codes. Yes, it is highly expensive and cumbersome to refactor codes or, worse, to rewrite them, but at a certain point it simply won’t be a choice. Better to start now than pay the price later is the widely held assumption; it’s the “getting there” that’s the hard part.