In the past year or so, watching supercomputer maker Cray, which is now part of Hewlett Packard Enterprise, has been a bit like playing a country and western song backwards on the record player. Supercomputing is booming a little (we don’t want to jinx it), Cray has its own interconnect again but with an Ethernet twist, and all of the big machines that Cray is taking down for delivery in the next year or two are based on AMD’s Epyc processors.
Yes, we know that Argonne National Laboratory is going to get the world’s first exascale machine based on Cray’s “Shasta” systems and its “Slingshot” interconnect, and that it will use Intel’s Xeon CPUs and Xe GPU accelerators, and we know that Isambard2 at the University of Bristol will be a Cray machine that predominantly uses Fujitsu’s A64FX Arm-based compute engine for its flops. But so many other machines are based on AMD’s Epyc processors, including a new pair of systems that will be installed this fall by the National Oceanic and Atmospheric Administration.
NOAA does weather and climate forecasting on a regional, national, and global level for the United States and feeds its forecasts to organizations in the news, transportation, agricultural, and other interested industries that are affected by weather. We did a deep historical dive on its supercomputer installations back in January 2016, when a pair of new machines was announced, and then talked more deeply with the NOAA techies in the National Centers for Environmental Prediction (NCEP), which is the part of NOAA that installed and managed the supercomputers at various facilities around the country. With the awards that were recently announced for a substantial upgrade to the NOAA facilities, NCEP has actually contracted with a division of system integrator General Dynamics IT to have that organization build the facilities that will house two Cray Shasta machines based on AMD “Rome” Epyc 7002 processors, buy the machinery, and operate the whole shebang as a private cloud service for NOAA. So not only is NOAA gradually changing supercomputer vendors from IBM to Cray to Dell and back to Cray, but it is also gradually shifting its processors and networks as conditions warrant.
IBM won a five year contract with NOAA back in March 2011 which had a possible extension across another five years with a maximum contract value of $502 million. NOAA had several generations of Power-based clusters before this contract, and as part of this 2011 deal, IBM installed two generations of its iDataPlex clusters based on Xeon processors. When IBM sold off its System x server business to Lenovo, NOAA opted for a pair of Cray XC40 systems, called “Luna” and “Surge,” that were installed in early 2016 and that were located in NCEP facilities in Reston, Virginia and Orlando, Florida. These two machines used Xeon processors from Intel and the Aries interconnect from Cray, and had an aggregate peak performance of 2.06 petaflops each, and each system had an 8.1 PB parallel file system, based on IBM’s Spectrum Scale (GPFS), to store data. Two years later, with IBM still as the primary contractor, NOAA installed the “Mars” and “Venus” clusters based on PowerEdge servers from Dell and 100 Gb/sec EDR InfiniBand interconnects from Mellanox Technologies. Each machine had 1,212 nodes with 56 GB per node and a total of 33,936 cores for a peak theoretical capacity of 2.8 petaflops. It is interesting to note, as we did at the time, that IBM priced out the Mars and Venus machines with the “Broadwell” Xeon E5 v4 processors, which did not have vector engines as wide as those in the then-current “Skylake” Xeon SP processors, and yet on real HPC workloads, the Broadwell chips delivered better floating point bang for the US buck.
NOAA is always very conscientious about price/performance, and it has a fixed budget that does not allow it to indulge in grand science experiments when it comes to trying out radically new system architectures. NOAA has to get forecasts to the National Weather Service, and it has to do so like clockwork because the entire country is, to one degree or another, dependent on the weather forecast.
The pair of new machines, which have not yet been given their nicknames according to David Michaud, director of the National Weather Service’s Office of Central Processing, will be installed in new datacenters located in Manassas, Virginia and Phoenix, Arizona, and notably for the first time in the history of the NOAA’s computing, the facilities will be on completely separate power grids. The contract for capacity on the machines is with CSRA, a division of General Dynamics IT, and in this case the two machines have a five year cost of $150 million. The contract that NOAA negotiated with CSRA is a ten-year deal with a maximum contract value of $505.2 million, but it looks like these two new Cray Shasta machines will be in use for quite some time.
“That $505.2 million is the maximum amount that we could place orders against the contract,” Michaud tells The Next Platform. “That doesn’t necessarily mean that the current budget over the ten years is $505.2 million. As we put together contracts with the strategies that we have, this allows for any growth if we need to put additional budget on the contract over the ten year period, we would have the headroom to do it. That $150 million over five years really is a representation of the budget that we have over that five year period. And the next system we would expect to be for the back half of the contract for the last five year period.”
That does not necessarily mean that five years from now, in 2025, NOAA will plunk down the remaining $355.2 million it has in the budget to operate its next pair of machines. But what is clear is that NOAA thinks it can get by for $30 million a year to essentially have a service provider install and operate a pair of supercomputers that have 12.1 petaflops of raw capacity – more than four times the capacity of the current Mars and Venus systems – and there is little question in our minds that the aggressive pricing that AMD has with its Epyc processors is a big part of that equation.
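The budget arithmetic in that paragraph is easy to check. Here is a minimal sketch using only the dollar and petaflops figures quoted in this story; the variable names are our own and are purely illustrative.

```python
# Figures quoted in the article (millions of US dollars and petaflops).
MAX_CONTRACT_M = 505.2     # ten-year maximum contract value
FIRST_ORDER_M = 150.0      # five-year cost of the two Shasta machines
FIRST_ORDER_YEARS = 5
NEW_PETAFLOPS = 12.1       # capacity quoted for the new systems
OLD_PETAFLOPS = 2.8        # capacity quoted for Mars and Venus

# Headroom left on the contract after the first order (rounded to
# avoid floating point noise in the subtraction).
remaining_headroom_m = round(MAX_CONTRACT_M - FIRST_ORDER_M, 1)

# Average annual spend over the first five years.
annual_spend_m = FIRST_ORDER_M / FIRST_ORDER_YEARS

# Capacity jump relative to the current systems.
capacity_ratio = NEW_PETAFLOPS / OLD_PETAFLOPS

print(f"Remaining headroom: ${remaining_headroom_m} million")
print(f"Annual spend:       ${annual_spend_m} million")
print(f"Capacity ratio:     {capacity_ratio:.2f}X")
```

The capacity ratio works out to a little over 4.3X, which squares with the "more than four times" claim.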
The pair of Shasta machines that NOAA will start getting access to this fall are based on Shasta nodes that cram four two-socket servers based on the “Rome” Epyc 7742 processors. As we discussed with Michaud two years ago, NOAA is keeping an eye on GPU acceleration for its weather modeling applications and continues to do substantial experiments in this area, but once again, this pair of big production machines is based on a CPU-only architecture. These Epyc 7742 processors have 64 cores spinning at 2.25 GHz, and they are not the special Epyc 7H12 processors that AMD launched last fall that require water cooling. Each system has 2,562 nodes for a total of 327,936 cores, and significantly, each machine has 512 GB of main memory per node – nearly a factor of 10X increase – to help cover the factor of 4.6X increase in the number of cores. (The system is going from 2 GB per core to 4 GB per core, which shows you how much memory prices have come down in the past three years.) There will be around 67 nodes in the Shasta system that have 1 TB of memory, which will be used to do memory-heavy pre-processing and post-processing for the weather forecasts, according to Michaud.
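For those who like to check the math, the core and memory figures above can be reproduced from the node counts. This is a rough sketch using only numbers quoted in this story; the variable names are ours, and it assumes the 4.6X core increase is a per-node comparison (128 cores versus 28 cores), which is how the numbers line up.

```python
# New Shasta systems (per machine), as quoted in the article.
NEW_NODES = 2_562
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64            # Epyc 7742
NEW_MEM_GB_PER_NODE = 512

# Current Mars and Venus systems (per machine), as quoted in the article.
OLD_NODES = 1_212
OLD_TOTAL_CORES = 33_936
OLD_MEM_GB_PER_NODE = 56

new_cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
new_total_cores = NEW_NODES * new_cores_per_node
old_cores_per_node = OLD_TOTAL_CORES // OLD_NODES

# Assumption: the quoted 4.6X is cores per node, not total cores.
core_ratio_per_node = new_cores_per_node / old_cores_per_node
mem_ratio_per_node = NEW_MEM_GB_PER_NODE / OLD_MEM_GB_PER_NODE

old_gb_per_core = OLD_MEM_GB_PER_NODE / old_cores_per_node
new_gb_per_core = NEW_MEM_GB_PER_NODE / new_cores_per_node

print(f"Total cores per new machine: {new_total_cores:,}")
print(f"Cores per node: {old_cores_per_node} -> {new_cores_per_node} "
      f"({core_ratio_per_node:.1f}X)")
print(f"Memory per node: {mem_ratio_per_node:.1f}X")
print(f"Memory per core: {old_gb_per_core:.0f} GB -> {new_gb_per_core:.0f} GB")
```

On these numbers, memory per node grows by about 9.1X while cores per node grow by about 4.6X, which is why memory per core doubles.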
The nodes in the pair of NOAA machines will be linked using the Slingshot variant of Ethernet that Cray has created for HPC customers and that runs at 200 Gb/sec per port. Each machine will have a ClusterStor Lustre parallel file system made by Cray with 13.1 PB of capacity – which is 2.2X what the current Venus and Mars machines have at 5.9 PB. But there is another big difference. Each of the new systems will have a flash-based Lustre file system weighing in at 614 TB plus another 12.5 PB disk-based Lustre file system for archival storage.
One of the things we always want to know when a weather forecasting center is doing a big upgrade to its systems is how this is going to affect the forecast in terms of the granularity of the simulation and the time horizon of the forecast itself. When the Mars and Venus systems went operational in early 2018, NOAA had three pairs of production machines that totaled 8.4 petaflops of compute (measured at double precision floating point), and its research and development systems – including some capacity it shares on the “Gaea” system at Oak Ridge National Laboratory – totaled another 8.1 petaflops of compute. These two new Shasta systems are expected to be accepted by February 2021 and to go operational with applications no later than February 2022, and when some of the older machines are shaken out after the new ones go in, NOAA will have an aggregate of 40.4 petaflops of compute in production, research, and development – a substantial increase in performance. And frankly, NOAA is not exactly sure how it will make use of that big jump in compute just yet.
Prior to the Mars and Venus clusters coming online in early 2018, the Global Forecast Model created by NOAA and used by the National Weather Service to provide the basis for forecasting by other interested parties had a 384 hour (16-day) forecast that ran at 13 kilometer cell resolution with 64 levels. But to get the forecast done more quickly, it was only run at that resolution for the first ten days and then at a lower resolution for the following six days. With the Mars and Venus upgrades, the 13 kilometer higher resolution was allowed to run for the full 16 days thanks to the increased floating point processing power, and then the next generation Global Forecast System, often called the American Model in weather circles (in contrast to what is called the European Model after the European Centre for Medium-Range Weather Forecasts), delivered a finer resolution of 9 kilometer cells with 128 levels and a full 16-day forecast. The GFS is run four times a day, and as it is currently implemented, it provides an hourly forecast for the first four days and then every three hours for days five through sixteen.
The question now, Brian Gross, director of NOAA’s Environmental Modeling Center within the National Weather Service, tells The Next Platform, is figuring out how hard to pull three different levers in the Global Forecast System: higher resolution in the model, increasing complexity in the physics engines embodied in the model and in the number of processes that are captured, and the number of ensembles that are used to figure out the accuracy of the forecast.
“The higher resolution models can simulate and predict smaller scale events like mountain waves, windstorms, and thunderstorms – and here we can hopefully be more specific in finding the areas where there’s a risk of flash flooding,” explains Gross. “The second area, complexity, is often encapsulated in the form of model physics. How many of the complicated physical processes that are taking place in the atmosphere can we capture in our models? This relies heavily on the amount of computing we have available to us, and it will tell us if it is going to rain or if it is going to snow at the top of the mountain, for instance. Complexity also encompasses the amount of and the different types of observations we can actually ingest to give us a better picture of what the atmosphere looks like at a given time, which will be the starting point of any forecast. Ensembles are a number of slightly different forecasts, and the differences can be in the initial conditions or in the model physics that are used, and the idea behind these is that if you have a set of initial conditions that are fairly tightly clustered, you can watch how they diverge as the forecast progresses. And if they don’t diverge very much, then you’re probably pretty confident in the forecast. But if they start diverging and go to all different sorts of directions – such as you can get anywhere from a trace of rain to an inch of rain to a trace of snow to a foot of snow – it’s tough to be confident in any one of those forecasts.”
All three of these levers that Gross and his team want to pull are dependent on the floating point capacity of the systems. The short rule of thumb in weather forecasting is that if you double the horizontal resolution of a forecast, then it will require 8X the compute capacity to support that higher resolution. The production machines are going from 2.8 petaflops to 12.1 petaflops, so clearly there is a lot of room to increase the resolution of the American Model. The core of the GFS was already upgraded last June to be a more dynamic model, so that lever has just been pulled. And NOAA plans to add more ensembles to the mix and upgrade its model physics, as well as allow the model to ingest more data sources and more types of data sources. One area of focus is to jack up the resolution on relatively short term, regional forecasts to predict severe weather as it is unfolding to better protect people and property.
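To put a rough number on that rule of thumb, here is a back-of-the-envelope sketch. It assumes the 8X-per-doubling rule generalizes to cubic scaling (compute grows with the cube of the horizontal refinement factor), which is our extrapolation and not something NOAA stated, and it pretends the entire capacity increase goes into resolution, with nothing spent on physics or ensembles. The function names are hypothetical.

```python
def compute_factor(refinement: float) -> float:
    """Compute multiplier needed for a given horizontal refinement factor.

    Assumes cubic scaling, so that a 2X refinement costs 8X the compute,
    matching the rule of thumb quoted in the article.
    """
    return refinement ** 3


def affordable_refinement(compute_ratio: float) -> float:
    """Refinement factor a given compute multiplier could pay for,
    under the same cubic-scaling assumption."""
    return compute_ratio ** (1.0 / 3.0)


OLD_PETAFLOPS = 2.8    # current production machines, per the article
NEW_PETAFLOPS = 12.1   # new production machines, per the article

compute_ratio = NEW_PETAFLOPS / OLD_PETAFLOPS
refinement = affordable_refinement(compute_ratio)

print(f"Doubling resolution costs {compute_factor(2.0):.0f}X compute")
print(f"Compute ratio: {compute_ratio:.2f}X")
print(f"Affordable horizontal refinement: about {refinement:.2f}X")
```

Under those assumptions, the jump in capacity buys something like a 1.6X finer horizontal grid if every flop went to resolution, which is short of a full doubling and helps explain why resolution has to be traded off against the other two levers.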
“We want to plan for our model upgrades, and one of the key pieces of information, as you can imagine that we need to know, is the capacity of the supercomputers,” explains Gross. “So now that we know what the systems are going to look like, we can actually map out over the next few years which modeling systems we’re going to upgrade and how we can best utilize the system.”
There is an old adage that more data beats a better algorithm, and this certainly applies here at NOAA. And Gross provides a possible clue as to what might happen with the Global Forecast System once this new iron is up and running.
“Ideally, we would be able to use the same data that’s available to us in both of those scenarios so we can do an apples to apples comparison of the resolution side versus ensemble side,” he says. “But right now, I can tell you that a coarser resolution ensemble will beat a single, deterministic model at a little bit higher resolution.”