For those who follow supercomputing, weather forecasting is one area to watch for systems designed for maximum capability. Unlike many other areas of HPC, weather forecasting has not widely adopted GPUs. There are a few exceptions, but generally they exist only because the underlying codes are GPU-native.
Almost exactly two years ago, we talked with Todd Hutchinson, Director of Computational Meteorological Analysis and Prediction at The Weather Company, about the company's early adoption of GPUs for the majority of its numerical weather prediction system. As we've described before, hardware acceleration is not the norm in weather modeling and prediction, due in large part to million-plus-line legacy codes and physical element simulations that have been refined over decades.
The Weather Company had the advantage of starting natively with GPUs for its forecasts and partnered closely with NCAR to build the code that supports its GRAF model. "It took me a long time in weather modeling to become convinced we could get these models onto GPUs and have them running efficiently. This project has proven that it's possible," Hutchinson says. "I expect that while both GPUs and CPUs will get faster, there's a growing advantage in the next generation of GPUs for weather prediction."
GPUs are increasingly central to The Weather Company's forecasts. Over the last couple of years the team has continued optimizing for V100 GPUs, yielding a 30% performance improvement and allowing it to reach its goal of running forecasts across 30% of the world at an impressive 3km resolution. In a conversation this week, Hutchinson noted that the system can also handle separate weather prediction models in addition to the core production workloads. Of the 100 nodes used for forecast runs, he says, 88 have four GPUs per node (connected with NVLink), and all of them are harnessed for mission-critical forecasts.
AMD and Intel both offer GPU computing options, and those may see big enhancements before The Weather Company begins planning its next forecasting supercomputer. Still, Hutchinson says the company's codes were designed with Nvidia GPUs in mind, and OpenACC support for making the shift to other vendors isn't there yet. "As [OpenACC] evolves it will be great to have more options for whatever becomes available. At that time it will come down to two things: the speed of the GPUs themselves and the ability to push memory bandwidth capabilities."
Hutchinson says the next procurement will bring more powerful GPUs, but that will also mean rethinking the current storage architecture. Right now the disk-only setup is working, but he admits storage can be a bottleneck at times. "We are always watching this and trying to make processes such that the I/O is asynchronous to the core model so ideally we won't need to slow down the model due to I/O. At some point that breaks down, or you need a bigger I/O system to keep up, but it's all a balance. For now GPFS and disk has been good for us."
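The balance Hutchinson describes can be illustrated with a toy producer/consumer pattern. The sketch below is hypothetical and not The Weather Company's code: a model loop hands output snapshots to a background writer thread through a bounded queue, so computation continues while I/O proceeds. The function names, the queue size (`max_pending`), and the simulated write delay are all illustrative assumptions. When storage falls far enough behind that the buffer fills, `put` blocks and the model stalls, which is the point at which "that breaks down."

```python
import queue
import threading
import time

# Hypothetical sketch: decouple model time steps from output I/O with a
# bounded queue and a background writer thread. All names and sizes are
# illustrative assumptions, not an actual forecasting system's design.


def writer(q: "queue.Queue", sink: list, write_delay: float) -> None:
    """Drain snapshots from the queue and 'write' them (slow I/O simulated by sleep)."""
    while True:
        item = q.get()
        if item is None:          # sentinel: the model run has finished
            q.task_done()
            break
        time.sleep(write_delay)   # stand-in for a slow write to disk
        sink.append(item)
        q.task_done()


def run_model(steps: int, output_every: int, max_pending: int, write_delay: float):
    """Run a toy model loop; return written snapshots and an approximate stall count."""
    q: "queue.Queue" = queue.Queue(maxsize=max_pending)  # bounded model-to-I/O buffer
    written: list = []
    t = threading.Thread(target=writer, args=(q, written, write_delay))
    t.start()

    stalls = 0
    state = 0.0
    for step in range(1, steps + 1):
        state += 1.0              # stand-in for one model time step
        if step % output_every == 0:
            # Queue.full() is only approximate across threads; this counts
            # times the model likely had to wait on I/O.
            if q.full():
                stalls += 1
            q.put((step, state))  # blocks only when the buffer is full

    q.put(None)                   # tell the writer to finish
    t.join()
    return written, stalls
```

For example, `run_model(steps=100, output_every=10, max_pending=2, write_delay=0.001)` writes ten snapshots. Raising `write_delay` relative to compute time, or shrinking `max_pending`, makes stalls more likely. That is the trade-off between buffering and a bigger I/O system.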
When asked which storage technologies on the horizon might pair well with the GPU-dense cluster, he notes that the team is still somewhat out of the loop on what is available. He is aware that NVMe and other approaches might be a fit, but GPU performance, he says, remains the central concern. "It's not just about storage, but also data flow: getting data in and out of the system network-wise is also something we're looking at on the horizon," he adds.
With all of those GPUs at the ready, one might expect an AI/ML story, but for now that work is a research project rather than anything headed for production in the near term. "There are several places in the forecast that will ultimately take advantage of statistical and machine learning techniques," Hutchinson explains.
"The core weather model is a physically based, math-based model that is basically pushing air parcels around the world, generating precipitation and so on. That's well known and done: simulating the dynamics of the atmosphere. But when we think about how a raindrop or snowflake forms, the science is uncertain. You might get the same amount of moisture and temperatures in the atmosphere but a very different amount of snow based on how the crystals form in the clouds. That lends itself to ML techniques," he says. "There are pieces of the code that will ultimately be either improved or even replaced at some point."
We have looked in some depth at how different weather centers and forecasting sites are using GPUs and AI/ML (separately and combined). Both are still nascent at most centers but should gain traction in the years ahead, especially as AI/ML drives investments in GPU computing, even if only for R&D on future models.