Weather modeling and forecasting centers are among the top users of supercomputing systems and sit near the top of the list of areas that could benefit from exascale-class compute power.
However, even for the modeling centers with the most powerful machines, a great deal of legwork remains, on the code front in particular, to scale to that potential. Still, many, including most recently the UK Met Office, have planted a stake in the ground for exascale, and they are looking beyond traditional architectures to meet the power and scalability demands they’ll be facing in the 2020-plus timeframe.
The Met Office has just released a report on its own requirements for exascale computing, which follows news from ISC16 last week about the vision for the 2017 systems to back its efforts. The report focuses on a number of elements for R&D emphasis: exploring new architectures, bolstering programming models, enhancing I/O and workflow, and coupling complex multi-scale and multi-timescale models. The domain-specific nature of the code work requires a more focused deep dive, but one of the key elements in the report is its attention to emerging architectures.
As the authors note, “Total power requirements suggest that CPUs will not be suitable commodity processors for supercomputers in the future.” In addition to looking at more common accelerators, including GPUs and Intel Xeon Phi, the Met Office is keen on watching 64-bit ARM developments and, most surprisingly, FPGAs as suitable offload engines.
“FPGAs are a well-established technology but difficult for applications developed using high level languages. Two potential research avenues include first developing a software stack to transform high level language code and transform it into hardware logic for FPGAs.”
There has already been work to develop the software backbone for FPGAs, but this is the first time we’ve seen public statements from a major weather center directing attention to it. Among the other novel architectures is the D-Wave quantum computer, which can “in principle, compute the entire space of a minimization problem, albeit with some non-trivial restrictions on how that minimization problem is defined.” The Met Office also notes that quantum optical devices are another potential avenue and says “research into how such devices might be exploited to perform simple computations and they can be coupled into existing software stacks and what new algorithms might be possible.”
Although the weather modeling and prediction arena might be interested in what comes after Moore’s Law, nearly all centers are operating with CPU-only machines. ECMWF and others have some GPUs on their systems, but these are used for research rather than production (as we understand based on our last check-in with the center) because of the nature of their codes. With some exceptions in the research arena, these codes are complex and not amenable to GPU acceleration, except for certain parts that can be offloaded. This is not to say that GPUs will never find a place in weather prediction, or FPGAs for that matter, but if the weather communities are serious about exploring new architectures, investing in the code to fit the next generation of systems will be a requirement.
As is the case in many other scientific computing domains, writing codes from scratch to fit new architectures is not an option. Many codes, including WRF and others, have been developed over the course of many decades, with tweaks along the way to take advantage of additional cores and new memory options. If integrating GPUs is difficult, one can only imagine the road ahead for FPGAs, which already have a reputation for being difficult to program.
For weather, we can predict more of the same ahead for the pre-exascale machines, and perhaps even for those in the 2020 timeframe. In 2016, just four years away from the time when some of the first exaflop-capable machines might be announced, there are many Top 500 supercomputers devoted to weather, many of which are in the top 100. Since these centers buy duplicate systems for continuity and research reasons, there are dual machines from ECMWF at the #17 and #18 spots and another duo at the #29 and #30 spots from the UK Met Office, which ran the Top 500 benchmark on its newest Cray XC40 systems for the first time to achieve the high ranking. Other machines in the top 50 of the list include the Korean Meteorological Administration’s twin systems (also Cray XC40), and NOAA’s new Cray systems sit just outside it at the #51 and #52 spots.
The full report from the UK Met Office sheds some light on the higher-level code work that needs to be done, but that work is aimed at optimizing for larger-scale CPU-only machines. The question we will be chasing in the coming months is how much effort, in terms of both codes and systems, will be required to move weather modeling beyond traditional system architectures in the post-2020 timeframe.