UK’s ARCHER2 Supercomputer Aims Squarely At The CPU Bullseye

The government of the United Kingdom is plunking down £79 million to purchase and operate ARCHER2, the country’s soon-to-be national supercomputer on track to be installed at the University of Edinburgh next year. The system is expected to be five times as fast as the original ARCHER system deployed in 2013.

According to EPCC center director Mark Parsons, vendors have until May to get their bids in, with final selection scheduled to take place this summer. Parsons, who is not involved in procurement decisions, told us that he expects ARCHER2 will deliver somewhere between 20 petaflops and 25 petaflops of peak performance, which is between eight and ten times what the current ARCHER machine provides. The more conservative estimate of a five-fold increase in capability reflects expected application performance on the various numerical simulation workloads being run at the center. That encompasses an array of codes in life sciences, climate and weather modeling, energy production, and materials science, to name a few.

Although the contract for ARCHER2 has yet to be awarded, Parsons said they already have a pretty good idea of what the system will look like. According to Parsons, it will be powered exclusively by CPUs, forgoing GPUs or other types of accelerators, and this reflects the homogeneous architecture of the original ARCHER, a Cray XC30 machine outfitted with Intel “Ivy Bridge” Xeon chips.

Regarding processor choice, Parsons believes there are three legitimate contenders for the new system: Intel’s “Cascade Lake” Xeon SP, AMD’s “Rome” Epyc, and Arm, most likely Marvell’s ThunderX2. The Arm choice is a long shot, he admitted, since ThunderX2 is seen more as a stepping stone to a true high-performance Arm processor than as a genuine alternative to either of the current high-end offerings from Intel or AMD.

Parsons said many of the codes that run on ARCHER are limited by memory bandwidth. “The better the balance between memory bandwidth and flops, the more attractive the processor is,” he said. “That’s really where the battle lines will be drawn.”

The other consideration that will loom large is energy efficiency. Despite the expectation that ARCHER2 will deliver up to ten times the peak flops and five times the application flops of its predecessor, it will have to run in the same 4 megawatt datacenter. ARCHER currently burns 3.3 megawatts to deliver 1.6 petaflops of sustained performance on the Linpack test, but that was four processor generations ago. Either Cascade Lake or Rome should be able to hit the new numbers fairly easily. Even relying solely on the current-generation “Skylake” Xeon SP processor, Eni’s newest supercomputer is able to achieve 18.6 peak petaflops with just 1.3 megawatts.
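To put those power figures on a common footing, here is a rough back-of-the-envelope sketch in Python that converts the numbers quoted above into gigaflops per watt. The function name and variable names are our own, and the comparison is only indicative: the ARCHER figure is sustained Linpack performance while the others are peak, and treating the full 4 megawatts as available to the machine is an assumption that ignores cooling and other datacenter overhead.

```python
# Back-of-the-envelope energy efficiency from the figures quoted in the article.
# Caveat: ARCHER's number is sustained (Linpack); the others are peak.

def gigaflops_per_watt(petaflops: float, megawatts: float) -> float:
    """Return efficiency in gigaflops per watt.

    1 petaflops = 1e6 gigaflops and 1 megawatt = 1e6 watts,
    so the scale factors cancel and the ratio is simply PF / MW.
    """
    return petaflops / megawatts

# ARCHER today: 1.6 PF sustained on Linpack at 3.3 MW
archer_sustained = gigaflops_per_watt(1.6, 3.3)

# Eni's system as quoted: 18.6 PF peak at 1.3 MW
eni_peak = gigaflops_per_watt(18.6, 1.3)

# ARCHER2 targets: 20 to 25 PF peak, assuming (hypothetically) that the
# entire 4 MW datacenter budget is available to the machine
archer2_required_low = gigaflops_per_watt(20.0, 4.0)
archer2_required_high = gigaflops_per_watt(25.0, 4.0)

print(f"ARCHER (sustained):       {archer_sustained:.2f} GF/W")
print(f"Eni system (peak):        {eni_peak:.2f} GF/W")
print(f"ARCHER2 needs (peak) low: {archer2_required_low:.2f} GF/W")
print(f"ARCHER2 needs (peak) high:{archer2_required_high:.2f} GF/W")
```

Under these assumptions, ARCHER2 would need somewhere between 5 and 6.25 peak gigaflops per watt, which is well below the efficiency implied by the Eni figures quoted above.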

Of the £79 million allocated for the system, £40 million is being budgeted for hardware. That should be plenty of money to get ARCHER2 to 20 petaflops or more, with the assumption that its funders will endeavor to get the largest system possible for the money that fits into the power budget. The fact that they are limiting themselves to a non-accelerated system means they have sacrificed some performance in order to offer a more general-purpose platform.

Apparently, there was a significant amount of debate on whether to include GPUs or not. About 20 percent of the applications that run on ARCHER today can use GPU-enabled codes, so it would have made a certain amount of sense to equip a proportion of the system’s nodes with these accelerators. But the scientific community representing the two funding organizations – the Engineering and Physical Sciences Research Council (EPSRC) and the Natural Environment Research Council (NERC) – opted for a homogeneous system.

The lack of GPUs also means researchers will likely have to go elsewhere to do any significant work that involves machine learning. (That said, the upcoming Cascade Lake Xeon SPs will include the new Vector Neural Network Instructions, which are designed to accelerate deep learning inferencing, although not training.) The lack of GPUs in such a large public system is something of an oddity these days, given the level of interest in augmenting HPC simulations with neural networks.

But as Parsons explained to us, the UK government has a more segregated approach to HPC and machine learning than is found in other places. In the United States, China, and even elsewhere in Europe, government-funded organizations are building and deploying systems that can handle both traditional simulations and machine learning codes, the idea being that these two application areas are converging. Nvidia certainly has had a big hand in that, since its most powerful Tesla GPU accelerator, the V100, conveniently offers both types of compute in a single package.

Apparently though, the UK has decided to follow in the footsteps of the Japanese government, whose approach is to deploy separate machinery for numerical simulations and AI/machine learning. For example, a GPU-dense system, like the AI Bridging Cloud Infrastructure (ABCI) supercomputer, is geared mostly for machine learning workloads, while the country’s first exascale machine, the Post-K supercomputer, will be powered exclusively by a vector-enhanced Arm processor, and will be devoted to traditional HPC work. Even here though, the Post-K chip makes some allowance for machine learning, since it includes reduced-precision formats and instructions for FP16, INT16, and INT8 that can be applied to those codes.

“Without giving anything away, there is a separate large-scale AI strategy in the UK,” Parsons told us. “But there are no announcements around that yet.”
