US and EU Pushing Ahead With Exascale, China Efforts Remain Shrouded

The HPC industry, after years of discussions and anticipation and some relatively minor delays, is now fully in the era of exascale computing, with the United States earlier this year standing up Frontier, its first such supercomputer, and plans for two more next year.

As should be expected, exascale computing also is a popular topic at this week’s SC22 supercomputing show in Dallas, including Cerebras showing off its “Andromeda exascale-ish cluster of the company’s CS-2 systems powered by its huge Wafer-Scale Engine 2 (WSE-2) processor – 13.5 million cores in all – that can hit more than an exascale of performance, but of half-precision (FP16) floating point and against very sparse matrices. The “Frontier” machine at Oak Ridge National Laboratory is still the only publicly verified exascale system at 64-bit precision, hitting more than 1.1 exaflops of performance on the Linpack benchmark.

At the same time, SiPearl, which is designing chips for supercomputers in Europe, announced a partnership with AMD to develop an exascale offering the combines its Rhea HPC processor with AMD’s Radeon Instinct GPU accelerators. Also, Germany’s Leibniz Supercomputing Center (LRZ) will work with Hewlett Packard Enterprise and Lenovo to jointly develop an architecture for an exascale supercomputer.

They represent the latest developments in a global exascale computing market that will see new systems becoming operational over the next few years even as revenue will start to decline around 2025.

“This is not due to a reduction in exascale machines,” Earl Joseph, CEO of Hyperion Research, said during a presentation in the run-up to SC22 on the state of the HPC market. “It’s due to the price tag coming down. Fugaku came in at $1 billion, which was pre-exascale. The US’ first three were about $600 million each and we expect that migrate down to about $350 million in the future.”

Fugaku, introduced in 2014 by Fujitsu and RIKEN supercomputing center in Japan and completed last year, sits at number-two on the Top500 list.

Bob Sorensen, senior vice president of research at Hyperion, outlined what will happen in the United States and Europe in the coming years and then tried to decipher what’s going on in China based as much on what that country isn’t saying as what it is.

“The bottom line here is that China has not done any official announcements,” Sorensen said. “They have not made any entries [for the Top500 list] for June 2021, November 2021, or June 2022, and they may not this time. This is primarily, we believe, a political decision not to do anything that would further increase US-China high-tech trade friction.”

Frontier is made from HPE’s Cray EX235a systems and is powered by the 64-core “Trento” AMD Epyc 7003 chips running at 2 GHz chips and AMD Instinct MI250X GPUs, and was a good first step for the United States, he said, noting the performance (1.68 exaflops of peak performance) and 21.1-megawatt power envelope.

“It has a peak performance of about 1.68 exaflops and only used – I can’t believe I used the phrase ‘only’ – 21 megawatts to run Linpack, which is actually a relatively impressive power performance metric,” he said.

It will be fully operational in January.

Coming next is “Aurora” at the Argonne National Laboratory, another HPE system leveraging the Cray “Shasta” design that will be powered by more than 9,000 nodes. Each node will contain two of Intel’s upcoming next-generation “Sapphire Rapids” Xeon SP server chips and six Intel “Ponte Vecchio” Xe GPUs. In all there will be more than 20,000 Intel CPUs and more than 60,000 GPUs in Aurora, which will have a power envelope of 60 megawatts. Both of these compute engines are now known as the Max Series, and we delved into them quite a bit this week as part of the SC22 festivities.

The system, which was delayed for about a year, is set for delivery late this year, with acceptance coming next year. Intel CEO Pat Gelsinger last year said Aurora would deliver 2 exaflops of performance and that the chipmaker was aiming for zettaflops of performance by 2027. Sorensen didn’t mention Gelsinger’s predictions during his presentation.

El Capitan” is the third of the first three exascale systems in the United States, and will become fully operational in 2023 at Lawrence Livermore National Laboratory, comprising HPE Cray systems and AMD’s “Genoa” Epyc 9004 CPUs and Instinct MI300 GPUs in a hybrid APU configuration. El Capitan is expected to deliver more than 2 exaflops of performance.

The European Union is pushing ahead with plans for two exascale systems by 2024 that most likely will use EU technology, particularly a processor developed in the region, he said. Sorensen also said there will technologies used from other European national programs, most like from Germany and possibly from France.

At the same time, Europe – more than the United States – is interested in developing its exascale program so that it dovetails with its quantum computing efforts.

“It’s a much more integrated program,” he said. “Basically, their plans going forward in the post-exascale environment is to really think about hybrid classical-quantum systems as a way to achieve the potential high performance. I think that’s a really interesting aspect of what’s going in the EU.”

Something of a wild card is China, the United States’ chief rival in the exascale space. China at one time had been relatively open about its efforts. The country in 2018 outlined prototype exascale systems – Sunway Pro OceanLight, Tianhe-3, and Sugon – that used different architectures. However, in recent years China has been more secretive about its initiatives amid strained relations with the United States.

As The Next Platform reported a year ago, China may already have two exascale systems up and running, but there has been no official word about them.

Sorensen noted that what the country has said officially may differ from what is actually happening.

<<Hyperion China exascale>>

“What’s happened since 2018 has kind of changed, driven more by political machinations than any kind of technical implications,” he said. “The reality – or I guess the unofficial reality – of what’s going in the China right now is the Sunway Pro OceanLight system has been up and running since March 2021. The Tianhe-3 has been up and running since perhaps late last year. So in theory there may be two systems in China right now that have actually achieved exascale capability.”

There isn’t much information about the status of the Sugon system, which may be delayed, Sorensen said. He added that despite China’s official silence, there could be more than two exascale systems operating in the country as it ramps up its program.

“There have been some interesting developments of late with US export controls and some concerns within basically the defense community of China’s access to leading-edge technologies and what this has done is cause China to really drive their indigenous capability a little harder,” he said. “No one wants to fan the flames there by bringing Chinese systems out on the Top500 list, but we know that there’s pretty strong evidence that there could at least be five other Chinese systems that could make the top 10 list today – seven out of 10 – but again we have no official confirmation of that.”

The merging of classical and quantum computing will continue and liquid cooling will become more common. Sorensen also spoke about the sheer size of these initial exascale systems, but like Joseph is seeing a trend toward smaller systems with reduced price tags and shorter lead times. The days of huge, powerful, and expensive HPC systems “may have reached an evolutionary endpoint,” he said.

“Instead of buying one big machine that can serve perhaps a wide range of applications, you’ll be buying a series of smaller systems that are more efficient, that are better targeted, that are cheaper, and perhaps more energy efficient for the specific workloads at hand,” he said.

Sign up to our Newsletter

Featuring highlights, analysis, and stories from the week directly from us to your inbox with nothing in between.
Subscribe now

2 Comments

  1. Tachyum Prodigy is a nice concept (blank-slate redesign integrating CPU, FPU, GPU, and more) but have they been successful (if so, they should present their feeds and speeds at S22, for example)? Also, to me, the description from here (next platform): “2020/04/02/tachyum-starts-from-scratch-to-etch-a-universal-processor” didn’t seem to depict a claimed revolutionary architecture (I could be wrong of course).

Leave a Reply

Your email address will not be published.


*


This site uses Akismet to reduce spam. Learn how your comment data is processed.