If you are an HPC center in Europe, and particularly one that is publicly funded, you are thinking about Arm-based CPUs in your supercomputers. And that is despite Arm Holdings being a British company and all of the issues with the United Kingdom and its Brexit separation from the European Union.
Arm is still the closest thing to a European architecture that companies can deploy, and it is a licensable architecture – even if it is not an open one in the strictest sense – and that stands in stark contrast to the X86 architecture that has dominated HPC compute for three decades now.
This is particularly true given the A64FX processor designed by Fujitsu, with its fat SVE vector engines, which is used in the “Fugaku” supercomputer at RIKEN Lab in Japan, and given the intent by Arm Holdings to add substantial vector processing performance in its upcoming “Zeus” V1 core, which has already been added to the 64-core Graviton3 (code-name unknown) processor from Amazon Web Services.
But interestingly, the first use of the Arm architecture in stock HPC systems might be as a babysitter to accelerators, and that ironically means that Ampere Computing’s 80-core “Quicksilver” Altra CPUs and 128-core “Mystique” Altra Max CPUs could start seeing some action. That is particularly so given the high throughput, deterministic performance, and low price Ampere Computing is charging for these CPUs relative to X86 alternatives, as evidenced by the 40 percent to 45 percent better bang for the buck that Microsoft and Google are both delivering on Altra instances compared to Intel “Ice Lake” Xeon SP and AMD “Milan” Epyc 7003 instances. Every euro or pound not spent on the CPU in a hybrid CPU-GPU system is a euro or pound that can be spent on accelerators, memory, network, or storage.
And that is why E4 Computer Engineering, based outside of Milan in Italy and one of the scrappy supercomputer suppliers in Europe playing to its niches and often up against Atos, Hewlett Packard Enterprise, and Lenovo, is bringing Ampere Computing’s Altra and Altra Max CPUs to its systems.
As you well know, Ampere Computing has been very clear that it is designing processors expressly for hyperscalers and cloud builders, who want better security isolation between cores (and therefore instance types) and processors that have all their cores running at the same speed all the time so the performance is more predictable than with machines that set their own speeds based on workload. We have said all along that Ampere Computing’s path may lead it outside of its target hyperscaler and cloud builder customers, particularly given the success of the Graviton family at AWS, and that for many workloads, cheap cores with enough math and good throughput are what the HPC center will need in a CPU where the accelerator does most of the calculating work.
Eventually, we think, Ampere Computing will want a piece of the HPC and AI pie directly and will bring vector engines into some of its future processors so they can be used in all-CPU clusters running HPC and some AI workloads. Ampere Computing has its Altra and Altra Max CPUs in Alibaba, Baidu, Tencent, Microsoft, and Google and will not be able to sell into AWS but can probably make its way into Facebook and Apple. The point is, to expand its total addressable market, Ampere Computing is going to have to go where the market is leading it.
“At this moment, we see three driving forces for Arm in HPC and AI,” Fabrizio Magugliani, head of strategic planning and business development for E4 Computer Engineering, tells The Next Platform. “The first one is the European Processor Initiative, which has selected the Arm ISA for the ‘Rhea’ general purpose processor. E4 is a member of the European Processor Initiative, and we will integrate the Rhea CPU into systems. The second degree of freedom is the fact that for most of the scientific workloads today, the processor is basically the driver of the GPUs and both Rhea and Altra support Nvidia’s CUDA offload. And third, with AI applications, again the workload is driven mostly by GPUs, and an Arm CPU is a very good solution because it shows a good TDP while driving the same performance as the top-level Intel Xeon processors. So more and more HPC users will endorse the Arm ecosystem because it has a comparable level of performance as top level X86 CPUs and an overall lower total cost of ownership.”
Magugliani adds that E4 already has a couple of customers who have deployed Ampere Computing Altra Max CPUs into their systems, but says he cannot name names because of confidentiality agreements with these early adopters.
To help foster more widespread deployment of Altra and Altra Max CPUs in hybrid CPU-GPU systems, E4 has worked with Ampere Computing and Nvidia to put together what it calls the Nvidia Arm HPC Developer Kit, which puts an Altra CPU and an A100 GPU accelerator on a system node and bundles the Nvidia HPC SDK toolkit on top of it so customers can load and go with testing HPC workloads on accelerated systems. And, incidentally, Magugliani says that E4 has some other customers who are marrying the Altra and Altra Max processors with Xilinx FPGAs from AMD, too, so the adoption of Arm CPUs in hybrid systems is not restricted to GPUs, whether they come from Nvidia, AMD, or Intel. The EPI’s own STX accelerator, which we have written about here and which turbocharges the math used for stencil tensor operations commonly used in the oil and gas industry, could also be well paired to an Ampere Computing Arm processor.