And actually, one could say it is also far more than it appears.
Three years ago, a team from Oak Ridge National Laboratory (ORNL), Google, and NASA Ames published a paper showing the first glimmer of quantum supremacy. For those who don’t follow quantum computing, in a nutshell this means proving quantum systems can outperform traditional supercomputers.
The results came from running the 53-qubit “Sycamore” circuit (Google’s quantum architecture) on the quantum hardware and simulating that same circuit on the “Summit” supercomputer at ORNL. The findings were staggering at the time: simulating the full 53-qubit circuit on “Summit” would have taken an estimated ten thousand years with the algorithms then available, while “Sycamore” itself took around 200 seconds. That’s not only faster, but also about 10 million times more energy efficient.
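To put the two run times quoted above side by side, here is a minimal back-of-the-envelope sketch. The function name and the 365.25-day year are our own assumptions for illustration; only the ten-thousand-year and 200-second figures come from the original results.

```python
# Rough comparison of the two run times quoted in the text.
# Assumption: a year is 365.25 days; the source only says "ten thousand years".
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def speedup(classical_years: float, quantum_seconds: float) -> float:
    """Return how many times longer the classical estimate is."""
    return classical_years * SECONDS_PER_YEAR / quantum_seconds


ratio = speedup(10_000, 200)
print(f"Estimated speedup: {ratio:.2e}x")
```

Under those assumptions the gap works out to roughly 1.6 billion times, which is the scale of claim the later simulation work set out to close.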
The authors, surprised by the dramatic nature of the results, submitted an entry for the coveted annual supercomputing award: the Gordon Bell prize, which focuses on high-value applications at vast scale.
The paper did not even make it past the first phase of weed-outs.
Yet now, three years later, a supercomputer in China, one of the first two exascale systems on the planet (even though China did not publicly list validated HPL results), ran a similar simulation of the “Sycamore” circuit on its own architecture. That paper just won a Gordon Bell prize, the only one ever given for work on a supercomputer that is not publicly listed on the Top500.
What is more, the paper, appealingly titled “Closing the ‘Quantum Supremacy’ Gap: Achieving Real-Time Simulation of a Random Circuit Using a New Sunway Supercomputer”, appears to dramatically improve over ORNL’s “Summit” results with far more compute cores in the loop (an incredible 42 million) and with mixed precision. The problem is that the simulations have been dialed back in complexity compared to the original experiment – quite significantly, according to Dmitry Liakh from ORNL.
“In their Gordon Bell Prize-winning work, the Chinese researchers introduced a systematic design process that covers the algorithm, parallelization, and architecture required for the simulation. Using a new Sunway Supercomputer, the Chinese team effectively simulated a 10x10x (1+40+1) random quantum circuit (a new milestone for classical simulation of RQC). Their simulation achieved a performance of 1.2 Eflops (one quintillion floating-point operations per second) single-precision, or 4.4 Eflops mixed-precision, using over 41.9 million Sunway cores (processors).”
Liakh was part of the simulation team for the “Sycamore” experiments. His specific contribution was the development of a GPU-driven numerical tensor algebra library for the massive simulation on “Summit” in addition to other optimization and tuning. He says this new Gordon Bell prize winner, which claims to bring the simulation time down to 304 seconds, is doing so by leaving out key elements of the simulation, making it inferior to ORNL’s own simulation.
The team on the Sunway exascale system isn’t sampling the full space produced by the Sycamore circuit or any other random circuit, Liakh argues. Instead, it samples only from the space of 21 qubits, whereas the original experiments tried to simulate the full Sycamore circuit. “Because of the simplifications they used, without validation, that is how they reduced the simulation time to 304 seconds.”
“If you do this rigorously as we did and sample the full space of 53 qubits, then the best current estimates are in the order of a few days. With an exascale machine it might go down to one or two days. But if you aren’t sampling the full space, just using a sub-sampling with 21 qubits as they did, they can get this huge reduction in time to solution. But that’s not how this is supposed to work.”
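To give a sense of the gap Liakh is describing, here is a minimal sketch that simply counts the possible bitstrings for 53 versus 21 qubits. It is not a model of either team’s simulation, and run time does not scale directly with these counts, since both efforts rely on tensor-network methods rather than storing a full state vector. The helper name and the 8-bytes-per-amplitude figure (single-precision complex) are our assumptions.

```python
# Count the basis states (bitstrings) for the two qubit counts in the text.
# Assumption: 8 bytes per amplitude (single-precision complex), used only
# to illustrate scale; neither simulation stores a full state vector.
BYTES_PER_AMPLITUDE = 8


def state_space_size(num_qubits: int) -> int:
    """Number of distinct bitstrings an n-qubit register can produce."""
    return 2 ** num_qubits


full_space = state_space_size(53)   # full Sycamore output space
sub_space = state_space_size(21)    # subspace Liakh says was sampled
ratio = full_space // sub_space     # equals 2**32

full_pib = full_space * BYTES_PER_AMPLITUDE / 2 ** 50
sub_mib = sub_space * BYTES_PER_AMPLITUDE / 2 ** 20

print(f"53 qubits: {full_space:,} states (~{full_pib:.0f} PiB as a state vector)")
print(f"21 qubits: {sub_space:,} states (~{sub_mib:.0f} MiB as a state vector)")
print(f"Ratio: {ratio:,}")
```

The 53-qubit space is about 4.3 billion times larger than the 21-qubit one, which is why the choice of what to sample matters so much when comparing time-to-solution claims.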
The Gordon Bell award recognizes “outstanding achievement in high performance computing”. The organizers add that it is meant to track progress in parallel computing with “particular emphasis on rewarding innovation in applying HPC to applications in science, engineering, and large-scale data analysis”. One could argue that this prize winner for 2021 is showing remarkable parallel computing capabilities (in addition to porting this work to a novel architecture). But the application itself is really just a benchmark – and not necessarily one that has huge real-world value in an era of pandemics, global climate change, and major calamities that could benefit from massive simulation.
The award can also honor “peak performance or special achievements in scalability and time-to-solution on important science and engineering problems”. While the Sunway system has the former, the latter is worth questioning. The reaction might be different if this were not the first winning submission to run on a system that is not public. The Sunway and Tianhe-3 exascale supercomputers, which came to light quietly, are definitely not playing by the rules we expect in HPC, where big, public machines show off real-world scalability for problems that matter here and now.
Beyond the question of real-world value for this important prize in supercomputing, there is another aspect that sits ill with many in HPC: this paper was China’s way of “coming out” with word that it has an exascale system before the actual news broke. While the results in the paper are not truly exascale in the classical sense (they are based on low precision rather than double-precision/FP64 results), the paper is seen by many as a “humble brag”. It showcases a cherry-picked application to suggest the system can handle traditional HPC simulations (FP64) and also do AI workhorse tasks with mixed precision and a tensor/matrix math acceleration component that is an alternative to modern GPUs on supercomputers.
So remember how, up at the top, we said the paper is not all it appears and is also quite a bit more?
It turns out the simulation isn’t the story. For China, it was an attention-grabbing headline to show off a mixed-precision beast and to confirm to the world the system was real, despite the lack of public Top500 listing.
To be clear, for those on the outside, the Top500 listing’s exascale designation is based entirely on double precision. The Chinese Sunway system does meet the true definition of an exascale machine on a benchmark. However, benchmarks are often far from real-world application performance, which is how a system can be truly exascale-capable yet deliver top-level application performance that is significantly under that benchmarked peak. China wanted both stories to hit at the same time. And they did.
And here’s the thing: that a paper meant to stun the world with the details about a system’s capabilities (without China having to go through the actual rigor of publicly sharing its full HPL, HPCG, and Green500 results) actually won a coveted prize for its application work is surprising at best and, at worst, foolish on the part of the award committee.
The title got the attention of the press; the scalability (despite being low precision and not traditional HPC) got the attention of the prize committee. And what that means is that everyone did exactly what the powers-that-be in China (the same ones who decided to keep Top500 results quiet) wanted.