The high performance computing world is set to become more diverse on the hardware front over the next several years, but for software developers this growing array of ever-faster options creates big challenges for existing codes.
While hardware advances may be moving too quickly for long-standing software to take full advantage of them, in some areas things are at a relative standstill in terms of how to approach this future. Is it better to keep optimizing old codes that could tick along with the X86 tocks, or does a new architectural landscape mean starting from scratch with scientific codes, even when the future of devices is still relatively uncertain?
The question is especially pressing for scientific packages that have evolved over several decades but have kept relative pace by relying on standard X86 hardware and on MPI to parallelize at scale. Climate and weather modeling is a prime example: it requires a massive amount of compute but is bound to bulky codes that are the result of millions of hours of development over many years.
A European consortium of weather and climate modeling centers has put together an extensive document detailing the mismatch between next-generation compute capability and existing software approaches, and pointing to some possible directions forward. Participants include leaders from major weather centers, including ECMWF, the Met Office in the UK, and Germany’s DKRZ, along with a number of research and academic centers in the UK and elsewhere, among them STFC, the STFC Hartree Centre, and the National Centre for Atmospheric Science. The document notes the relatively steady development trajectory along X86/MPI lines over the last decades, but looks ahead at what new architectural and software trends could yield greater performance, at the cost of productivity, at least at first.
“The coming computational diversity, coupled with ever increasing software complexity, leads to the very real possibility that weather and climate modeling will arrive at a chasm which will separate scientific aspiration from our ability to develop and/or rapidly adapt codes to the available hardware,” the authors explain. “Like getting across any chasm, the solutions which will need to underpin our future models will involve large community efforts in engineering and standardization; they won’t be built or sustained by relatively small groups acting alone—especially groups with the prime purpose of advancing science as opposed to software.”
“[Larger ensemble sizes and higher resolution models] and related scientific advances need to exploit the changing computing environment, where we also see increasing complexity: more cores per socket, multiple threads per core, the changing nature of a core itself, complicated memory hierarchies, and more sockets per platform. In the future any given platform may assemble compute, memory, and storage in a unique assemblage of components, leading to major code development and maintenance problems associated with the increasing need for engineering resources to exploit potential performance, particularly in energy efficient, robust, and fault-tolerant ways.”
Major challenges ahead include diverse architectures, sometimes on the same machine or even the same node. This, coupled with the ongoing issue of performance portability across architectures, makes the problem harder still. The group says new models are being developed, but the existing time scale from development to production science serving large communities is currently on the order of a decade.
On the books are ideas to begin developing a new model and its supporting ecosystem from scratch rather than continuing to tune old codes. The team also suggests it might be feasible to use models developed at other centers, to look to other infrastructure entirely, or to tweak existing models piecemeal.
“In most modelling groups, the “modeller” currently has to have full knowledge of how the entire software stack works with the hardware even if implementation details are buried in libraries. The new architectures present the “full-stack” modeler many challenges: Can one code to exploit the parallelism and the memory? Is there sufficient parallelism in the code itself to exploit the architecture? How can one obtain performance data upon which to make informed choices? Can one code to exploit multiple different architectures (and maintain that code)?”
The risks of these approaches range from not being able to move fast enough to meet science goals to losing a key capability because an external group has chosen either to do something completely different or to change key aspects of the interfaces upon which the model depends, the team explains.
“In the worst case, having outsourced some capability, when that is removed, the internal group may no longer have the capability to replace that capacity. Mainly for these reasons, large (primarily) national modelling endeavors prefer to keep as much development as possible in house. However, looking forward, it is not obvious that even large national groups have the internal resources to both keep up a production line of incremental model improvements associated with meeting near-term scientific (and/or operational) requirements and identify and take the requisite steps necessary to develop codes which can hit quality + performance + portability + productivity requirements using next generation computing, particularly when the latter is unlikely to be achievable with small steps. This is recognised in some large institutions, for example, the Met Office is investing in OASIS, NEMO and XIOS, all of which have or will replace internal developments.”
The team says that progress at the community level will require improved methods that allow the community to discuss, specify, design, develop, maintain, and document the necessary libraries and tools. They admit that, historically, the weather and climate community has not had a great track record of sharing such tools, although in recent years necessity has begun to influence practice, with tools such as OASIS becoming prominent in more places. “One of the reasons for this lack of sharing is in part the lack of a commonly deployed structured approach to sharing, one that maximizes delivery of requirements, while minimizing risk of future technical burden, the sort of approach that has delivered the MPI libraries upon which nearly all of high performance computing depends.” They add that while a fully fledged standards track is probably beyond the will of the community at this point, there is hope for this on the horizon.
“The difficulty we now face is the fact that because of the oncoming heterogeneity, even vendors are suggesting that the community have to face a world in which one can only have two of performance, portability, and productivity: a happy compromise is no longer possible.” The team adds that while this is to some extent hyperbole, it is certainly true that much more effort needs to be devoted to achieving all three simultaneously, and that quite major changes in approach (e.g. the use of DSLs) are needed. It is the amount of effort needed, the change in approaches required, and the timescales in play that lead to the “cliff/edge chasm” motif they describe.
The full paper can be accessed here
> “… have to face a world in which one can only have two of performance, portability, and productivity …”
The ‘road forward’ would seem twofold:
1. Update existing languages with either a new type or comments processed by the tokenizer to enable line-level or function-level optimization. For example, allow one to specify that a particular function should have two or more versions: one that runs on the main processor (e.g. x86), one that runs on a GPU, and possibly a third that runs on an FPGA or specialty hardware (with an associated ‘setup time’ and latency, but ultimately a much faster result than executing the code on the main processor).
2. Write a new language based upon the best constructs of a few known languages (for example, a mashup of C++, Fortran, Haskell, Lisp, and Prolog, as an off-the-cuff example) that can specify what is to be optimized (size or speed) by using variables that are prefixed or suffixed (for example) with a cost/latency/size ‘perspective’ (possibly using an alphanumeric that looks like a line number but is in fact a ‘specifier’). That’s just a ‘quick suggestion’ offered as an example; I have no doubt a better system could be proposed.
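To make the first option a little more concrete, here is a minimal sketch in Python of one logical function with several registered versions, where a dispatcher weighs each version's setup time against its per-item cost. Everything in it is an assumption made for illustration: the `MultiTarget` class, the backend labels, and the cost numbers are hypothetical, and the "GPU" version is a plain-Python stand-in rather than real device code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Variant:
    backend: str          # hypothetical label such as "cpu", "gpu", "fpga"
    func: Callable
    setup_cost: float     # assumed one-off cost (arbitrary units)
    per_item_cost: float  # assumed cost per element (arbitrary units)


class MultiTarget:
    """One logical function with several backend-specific versions."""

    def __init__(self, name: str):
        self.name = name
        self.variants: List[Variant] = []

    def variant(self, backend: str, setup_cost: float, per_item_cost: float):
        """Decorator that registers one version of the function."""
        def register(func: Callable) -> Callable:
            self.variants.append(
                Variant(backend, func, setup_cost, per_item_cost))
            return func
        return register

    def choose(self, n_items: int, available: Iterable[str]) -> Variant:
        """Pick the cheapest registered version among available backends."""
        allowed = set(available)
        candidates = [v for v in self.variants if v.backend in allowed]
        if not candidates:
            raise RuntimeError(f"no usable version of {self.name!r}")
        return min(candidates,
                   key=lambda v: v.setup_cost + v.per_item_cost * n_items)

    def __call__(self, data, available=("cpu",)):
        return self.choose(len(data), available).func(data)


scale = MultiTarget("scale")


@scale.variant("cpu", setup_cost=0.0, per_item_cost=1.0)
def scale_cpu(xs):
    return [2.0 * x for x in xs]


@scale.variant("gpu", setup_cost=1000.0, per_item_cost=0.01)
def scale_gpu(xs):
    # Stand-in only: a real version would hand the work to a device kernel.
    return [2.0 * x for x in xs]
```

With these made-up costs, `scale.choose(10, ["cpu", "gpu"])` selects the CPU version because the setup time dominates, while `scale.choose(10_000, ["cpu", "gpu"])` selects the GPU version. The science code only ever calls `scale(data)`, so the old algorithm stays in place while versions for new hardware are added beside it.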
By approaching the problem (accessing new Hardware both with traditional methods and by a new paradigm) bottom up and top down we can keep the old Code (and Algorithms), the decades of work, relevant and refine how we will address the problem of multiarchitecture and multiprocessor (many Cores) fusion.
PS: If some Standards Committee wants to call it “Paradigm programming language” (ppl, people) you’re welcome to use the name.