As we head toward the annual Supercomputing Conference season, we wanted to take a moment to level-set on exascale.
There has been much talk about reaching this pinnacle over the last several years, and while plenty of centers say they have reached exascale, that is only for single-precision peak theoretical performance. Entire countries have made the exascale claim as well, noting that their combined compute power adds up to an exaflop. Neither of these counts, at least by the standard definition: a true exascale system is a single, networked machine measured by peak double-precision floating point performance.
Today we want to look at where practical exascale is today — and what we can expect going forward.
The two areas where there is the most imminent exascale action are the US and China. Perhaps it comes as no surprise that while we know much about the status of the former, the latter is still a bit of a mystery architecturally and practically.
Exascale in the United States
Over the next three years, three systems representing close to $1.9 billion in hardware, software, services, and support will hit the floors of three US national labs. These machines will roll into production between 2022 and 2023, which is quite a bit behind schedule, especially for the predicted first exascale machine, the Aurora supercomputer at Argonne National Lab.
In July, we learned that the Aurora machine would be pushed out even further due to Intel's delays integrating the 7nm "Ponte Vecchio" GPUs into the architecture. The system will not appear until at least late 2022, with acceptance in 2023: a long wait for HPE/Cray, the subcontractor to Intel's prime.
Recall that the vendors do not get their slice of the payment until a machine is accepted. For massive, expensive machines this makes it difficult for smaller vendors to take down big system deals. In other words, without the HPE acquisition of Cray, this machine would possibly not have the Slingshot interconnect and Shasta system base, especially since Cray would have been stretched to the limit across the other major supercomputer deals in the US.
And what a few years it has been (and will be) for HPE/Cray. The other two exascale-class machines planned for the US feature HPE/Cray as the prime contractor. The "Frontier" machine at Oak Ridge National Lab, which with Aurora out of the running will be the first US exascale system, will hit the floor at the end of this year with acceptance in 2022. "El Capitan," a jointly-run NNSA and DOE system to be housed at Lawrence Livermore National Lab, will be installed at the end of 2022 with acceptance in 2023.
Cray Shasta systems are the common thread between all three machines, but Intel is standing alone with Aurora, which will have two "Sapphire Rapids" CPUs and six Xe "Ponte Vecchio" GPUs per node. AMD CPUs and GPUs, meanwhile, have rampaged back into the supercomputing world. Frontier, which will likely be capable of around 1.5 exaflops peak, is based on a custom AMD Epyc CPU and future Radeon Instinct GPUs (AMD has been omitting the "Radeon" as of late), with AMD's Infinity Fabric handling traffic between them. Frontier will be liquid-cooled and, like its exascale compatriots, is expected to consume 30-40MW.
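Those headline numbers pin down a rough sense of the machine's scale. A minimal back-of-envelope sketch, assuming a hypothetical 40 FP64 teraflops per GPU (the actual per-accelerator figure has not been made public) and the four-GPU node layout:

```python
# Back-of-envelope sizing for an exascale system like Frontier.
# PER_GPU_TF is a hypothetical assumption for illustration only;
# the real accelerator spec has not been disclosed.

TARGET_PEAK_TF = 1_500_000   # ~1.5 exaflops peak, expressed in teraflops
GPUS_PER_NODE = 4            # one Epyc CPU plus four Instinct GPUs per node
PER_GPU_TF = 40.0            # assumed FP64 teraflops per GPU (hypothetical)
POWER_MW = 35.0              # midpoint of the expected 30-40MW envelope

node_peak_tf = GPUS_PER_NODE * PER_GPU_TF        # GPUs dominate peak flops
nodes_needed = TARGET_PEAK_TF / node_peak_tf
gflops_per_watt = (TARGET_PEAK_TF * 1e3) / (POWER_MW * 1e6)

print(f"~{nodes_needed:,.0f} nodes, ~{gflops_per_watt:.0f} GF/W at peak")
# -> ~9,375 nodes, ~43 GF/W at peak
```

Under those assumptions the system lands in the high thousands of nodes, and the 30-40MW power envelope implies an efficiency in the tens of gigaflops per watt, well beyond what any fielded system delivered when these contracts were signed.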
The interesting architectural bit about Frontier is that Oak Ridge National Lab was the first large lab to deploy Nvidia GPUs at scale, and with its high-ranked "Summit" system it relied on Nvidia (and IBM) for accelerated compute. The shift to AMD GPUs is a clear sign that there are changes afoot both in large exascale systems and in pre-exascale machines globally. The point is, AMD is challenging both Nvidia and Intel for realsies, something we would not have predicted in 2015 when the first practical talk about exascale architectures began.
El Capitan's architecture is by nature more conservative since it has to match the mission-oriented nature of that system's workloads, namely simulating the future of the nuclear stockpile under NNSA directives. While Frontier will pair one Epyc processor with four Instinct GPUs over Infinity Fabric, El Capitan nodes will have a single AMD "Genoa" processor with the Zen 4 core and four next-generation Radeon GPUs.
By 2023, the US will boast a fleet of true exascale systems and will have helped float HPE/Cray and AMD to even greater heights. The plan in the US is clear. In Asia, things are quite a bit more muddled, but that is not to say there aren't rival capabilities brewing for the exact same timeframe. And unlike some other nations, the US doesn't have to worry about cooking up native chip design and production on top of simply budgeting its way to exascale.
Exascale in Asia
Japan has been the star of the HPC show with the dramatic performance and efficiency of the “Fugaku” supercomputer.
Built by Fujitsu, the system reaches almost half an exaflop of peak performance with CPUs alone, although the A64FX is anything but a standard processor. We've written about it at length (along with the Fugaku machine). Its vector-heavy design makes it a unique system, and if Japan is planning any follow-ons, the architecture is well-suited both to the Linpack benchmark by which machines are ranked globally and to a host of real-world applications.
China’s quest for an indigenous processor continues with several options on the table, none of which have been publicly selected as the base for their next system. Currently, the top system in the world is Fugaku with two Chinese machines in the top 10 — the Sunway TaihuLight system (#4) and Tianhe-2 (#7), both of which have had their day in the #1 slot in Top 500 supercomputer lists gone by.
The Sunway TaihuLight system held the top supercomputer distinction for two straight years (all of 2016-2017) at 125 petaflops peak, a far cry from exascale. The Tianhe line has gone through several iterations; when Tianhe-2 first appeared in 2013, it topped the list dramatically and held onto that position for three straight years.
Both of these big systems with tenacious hold on the Top 500 definitely sparked action globally, but especially in the US, and put supercomputing back on the map as a mainstream headline-generating competitive issue. We are still seeing the effects of that today — and China is keeping the race fast-paced although keeping its cards close to its chest.
The big question this year? Out of its three dominant HPC system architectures, which will China choose for its next Top 500 chart topper? And is it possible that, in an effort to thwart the US above all else, China will adopt the Fujitsu A64FX? We will leave that final question to your own debate, but it is worth considering. After all, we have not seen much from Sunway or Sugon more broadly in China.
The Sunway TaihuLight architecture is the most promising from our view. There is a deep dive here on how it's put together and on the chip at its heart: the homegrown Sunway SW26010 processor designed and manufactured by NRCPC in China. In addition to being an admittedly elegant processor with unique vectorization hooks and high efficiency, it is also the newest big architecture reveal from China. In the linked article above there is discussion about scaling it to exascale performance that is worth a read.
The other effort to watch in China centers around Chinese semiconductor and systems integration company Sugon. It has a number of machines in China based on US parts (Intel/Nvidia for the most part) for large institutions like the China Meteorological Administration and for a number of telcos. But what we are watching is how it will deploy its own Hygon-based processors in future systems. There is not a lot available about the architecture other than that these systems sport a unique 200Gb/sec 6D torus interconnect of their own devising and 32-core, 2GHz "Hygon" processors, which look an awful lot like AMD's Zen microarchitecture at the core.
There is only one large public system with this architecture, which appears to be in-house at Sugon proper. It first appeared in 2018 but we have yet to see more machines roll out with this at the heart. There could be many reasons for this but Sugon is a contender to build future exascale machines.
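The Zen resemblance also lets us guess at what a CPU-only Hygon path to exascale would take. A rough sketch, assuming the core really does mirror Zen 1's 8 FP64 FLOPs per cycle per core (an assumption based purely on the resemblance, not on any published spec):

```python
# Rough peak-FP64 estimate for a single Hygon socket, and what a
# CPU-only exaflop would require. FP64_FLOPS_PER_CYCLE is assumed
# to match AMD's Zen 1 core; this is a guess, not a published figure.

CORES = 32
CLOCK_GHZ = 2.0
FP64_FLOPS_PER_CYCLE = 8      # hypothetical, Zen-1-like

socket_tf = CORES * CLOCK_GHZ * FP64_FLOPS_PER_CYCLE / 1e3   # teraflops
sockets_per_exaflop = 1_000_000 / socket_tf                  # 1 EF in TF

print(f"{socket_tf:.3f} TF/socket, ~{sockets_per_exaflop:,.0f} sockets/exaflop")
# -> 0.512 TF/socket, ~1,953,125 sockets/exaflop
```

Roughly half a teraflop per socket would mean millions of sockets for a CPU-only exaflop, which is why accelerators, or a much beefier vector unit, would almost certainly have to be part of any Sugon exascale story.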
And Europe Being Europe …
With so many member states and organizations, rallying around a single system to reach exascale is more difficult in Europe than, say, in the US. After all, which country should be the host, and for what reasons? Cheap power seems like a good reason, and we suspect the big European ado about the LUMI system and its new, oversized datacenter in Finland might be a clue.
The EuroHPC effort, which formed in 2018, selected three sites for pre-exascale machines (LUMI is one; the others are in Spain and Italy) with a total spend of 60 million Euros, plus a scattering of smaller 5+ petaflop systems. The big pronouncement from EuroHPC then was that there would be three exascale machines in the 2022-2023 timeframe, which, as we know, is near at hand.
The sticking point to those plans is that one of those systems has to use a homegrown processor technology. That would be a RISC-V based design, and as we reported yesterday, EPI just got 143 test chips back from GlobalFoundries. The tests were successful, but if an entire exascale ecosystem, with all the necessary software to scale across an exaflop of performance, is to be supported, Europe had, if you'll pardon the Americanism, "better get crackin'."
And while it's bad form, apparently, to talk about the UK under the European subhead, the UK has big ambitions in exascale, reportedly formulating a plan to put nearly one billion Euros toward the idea, with the concept playing on broader public-private partnerships to support the ambitious effort.