The choice of processors available for high performance computing has been growing for a number of years. There are no fewer than three major CPU architectures available for HPC duty – X86, Arm, and Power – with more than a half dozen credible suppliers in total, along with two – soon to be three – GPU architectures. Despite this embarrassment of riches, the vast majority of HPC systems in the field today are powered by Intel CPUs, sometimes paired with Nvidia GPUs. Beginning this year, that is going to start to change.
To be honest, the biggest source of diversity in the near-term will be within the X86 universe, where AMD’s Epyc resurgence will give Intel the stiffest competition it has had to deal with since the days of Opteron. In particular, the second-generation Epyc silicon, aka “Rome,” will almost certainly eat a significant chunk of Intel’s market share in the server space, in both HPC and elsewhere. Rome’s impressive price-performance was undoubtedly key to its selection for a number of high-profile supercomputer systems in the United States, the United Kingdom, Germany, and Finland, most of which will come online this year. The only question is how much the newer Xeons – the 14 nanometer “Cooper Lake” Xeon SPs and especially the future 10 nanometer “Ice Lake” Xeon SPs – will limit the damage to Intel’s market share.
Arm adoption has come slowly to HPC, mainly, we would argue, because the architecture doesn’t offer any particular advantages from a technical perspective compared to X86 or any other modern general-purpose processor. Arm’s strength is that its IP is licensable, and thus the architecture can be the basis of a large number of customized processors that address different markets, all bound together by a global software ecosystem.
But that kind of malleability is a long-term, rather than short-term, advantage. It took at least five years for Fujitsu to design and develop the A64FX, the first purpose-built HPC processor based on the Arm architecture. Its imminent debut in RIKEN Lab’s “Fugaku” 400 petaflops supercomputer will test the viability of the architecture and surrounding ecosystem for high-end HPC. Incidentally, it will also demonstrate the strengths and weaknesses of a system without either accelerators or external memory.
A more vanilla chip, Cavium’s ThunderX2 SoC, aims somewhat lower in the HPC hierarchy. The processor was launched in 2018, shortly before Cavium was acquired by Marvell, and became the basis for the first crop of Arm-based HPC clusters for a handful of early adopters in the United Kingdom and elsewhere. Although ThunderX2 was no speed demon flop-wise, it performed exceptionally well in applications limited by memory bandwidth, thanks to its generous allotment of integrated memory controllers. Marvell hopes to build on ThunderX2’s success with the ThunderX3, which is expected to be released into the wild early this year. The company is predicting this third-generation product, which will be built on 7 nanometer technology, will be competitive with AMD’s “Rome” Epyc 7002s and Intel’s Ice Lake Xeon SPs, offering more than twice the performance of the ThunderX2, with faster clocks and better energy efficiency.
But in 2020, there’s going to be another important option for Arm silicon in HPC machinery: Commercial systems based on the A64FX. For example, customers can now opt for A64FX-powered CS500 clusters from Cray/HPE, thanks to a partnership deal with Fujitsu. For the local Japanese market and perhaps Europe, Fujitsu will also offer its own A64FX-based systems: the FX700 and FX1000. If these systems attract enough followers in their respective geographies, we expect other OEMs to ink similar deals with Fujitsu.
The ramifications of commercial A64FX-based machines are already being felt. Isambard 2, the next iteration of the original ThunderX2-powered Isambard cluster at the University of Bristol, will be an A64FX Cray CS500. Although nothing has been announced, it wouldn’t surprise us if one (or more) of Europe’s three pre-exascale supercomputers end up with A64FX chips as well.
We think that some of the current enthusiasm for Arm-based clusters by both users and vendors is based on the fact that adoption appears to have hit an inflection point. Hyperion Research, which has been tracking Arm sales in HPC for a while, is forecasting a 64.7 percent CAGR in Arm processor revenue in this space over the next five years. While only about 50,000 Arm chips destined for HPC machines were sold in 2019, Hyperion expects that number to reach more than 233,000 in 2020 and more than 610,000 by 2024. A lot of those systems will be built outside the US, reflected by the fact that all the initial Arm-based exascale systems will be built and deployed in Europe, China, and Japan. Together, these geographies represent more than half of the total HPC market. That said, even if high growth rates can be sustained, X86 processors will continue to dominate the market for the next five years and probably much longer.
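For a sense of scale, it is easy to sketch what a flat 64.7 percent annual growth rate would imply for unit shipments. Note the caveats: Hyperion quotes that CAGR for revenue, not units, and its own unit forecast (233,000 in 2020) implies much faster near-term growth, so this is only an illustrative back-of-the-envelope calculation, not Hyperion’s methodology.

```python
# Illustrative sketch: compound a flat 64.7 percent annual rate from the
# 2019 baseline of ~50,000 Arm chips shipped into HPC (per Hyperion).
# Assumption: applying the revenue CAGR to unit counts, purely for scale.
base_units = 50_000
cagr = 0.647

for year in range(2019, 2025):
    units = base_units * (1 + cagr) ** (year - 2019)
    print(f"{year}: ~{units:,.0f} chips")
```

Compounding the 2019 baseline over five years at that rate lands at roughly 606,000 chips in 2024, in the same ballpark as the "more than 610,000 by 2024" projection, while the 2020 forecast of 233,000 clearly bakes in a much steeper initial ramp.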
On the HPC Power front, despite the aspirations of the OpenPower initiative, IBM is still the only game in town. The Power10 processor was due to launch this year, but now it looks like it will come out in 2021 and the company is not counting on HPC to drive its sales. While Power10 will almost certainly be an impressive piece of silicon for high performance computing, at this point there are no big systems in the pipeline that will be powered by this chip. (The Department of Energy passed on IBM and Power10 for the CORAL-2 contracts.) One potential bright spot is that the European Laboratory for Open Computer Architecture (LOCA) program has selected OpenPower as one of three architectures that it will use to develop open source HPC processors. Nevertheless, for the foreseeable future, the Power architecture seems destined to play a minor role in high performance computing.
The options for GPUs, and accelerators more generally, are certainly growing, especially if you take into account all the customized designs being pursued in China (Sugon’s DCU and the Matrix-3000 DSP), Europe (RISC-V and other domain-specific accelerators under the European Processor Initiative), and the myriad of AI accelerators entering the market, like Intel’s recently launched Neural Network Processors: NNP-T and NNP-I. And then, of course, there are the various iterations of FPGAs from Xilinx and Intel that can be used to implement semi-hardened HPC applications in silicon.
However, for mainstream HPC usage, GPUs will remain the accelerator platform of choice. Nvidia, of course, rules this space, but AMD, with its Radeon Instinct devices, is poised to grab at least some of that market. The top-of-the-line MI60 product offers 7.4 teraflops of raw 64-bit performance, 32 GB of HBM2 memory, and 200 GB/sec connectivity between GPUs via the Infinity Fabric. In a future iteration, that connectivity will be extended to AMD’s own Epyc CPUs, such that GPUs and CPUs can talk over the same fabric. That capability will be tested at scale in Oak Ridge National Lab’s “Frontier” exascale supercomputer, in which four Radeon Instinct GPUs and one Epyc CPU will be connected in each node through the Infinity Fabric. Frontier is scheduled to boot up in 2021.
That’s the same year the “Aurora” exascale machine is expected to come online at Argonne National Laboratory. That system will feature Intel’s Xe GPU, a coprocessor designed to accelerate both HPC and neural network training, just like Nvidia’s V100 and T4. As such, Aurora will be the first big test for this processor for HPC and AI work. Since none of the Xe processors are in the field today (client versions are slated to be released later this year), their performance is unknown, as is their ease of programming.
In that regard, Nvidia has the advantage, since the company has been methodically expanding its CUDA software ecosystem around its GPU hardware for more than a decade and has built a critical mass of developers and users. The company’s GPUs have also proven to be a rather elusive moving target, and with a new generation (“Ampere”) expected to launch later this year, Nvidia may once again leave its competitors in the dust. But at least now, it’s a three-way race. And that’s going to make accelerators a lot more interesting as we start the new decade.