It’s been just more than a decade since Arm executives began talking about bringing the company’s system-on-a-chip (SoC) architecture, long the dominant design for processors found in billions of smartphones and other mobile devices, into the datacenter. They boasted that within years they would be chipping away at Intel’s massive market share in the server chip space, driven by the growing demand among HPC organizations and enterprises for greater system power efficiency and density and for a viable alternative to Intel, if for no other reason than that competition always benefits the customer.
The Next Platform and other tech pubs have since chronicled the march by Arm and its chip manufacturing partners to develop the architecture, ecosystem, and processors that would get them into the datacenter: the rise and fall of Calxeda and Applied Micro (whose technology has been resurrected by Ampere Computing), Cavium’s continued development of its ThunderX chips (now owned by Marvell), and the flirtations of the likes of Qualcomm and Samsung before they bowed out.
There has been more than its share of stops and starts, steps forward and backward, but Arm – owned by Japanese multinational SoftBank Group – has been getting some traction over the past couple of years. Cavium/Marvell has come out with two generations of its ThunderX chips and is on the verge of ThunderX3; Ampere only this week unveiled its Altra line of chips for hyperscalers and cloud providers; and giant public cloud provider Amazon Web Services has its Graviton chips for cloud environments.
Notably, Fujitsu has developed its Arm-based A64FX manycore processor for its upcoming “Fugaku” Post-K supercomputer and is looking for OEMs who want to use the chip in their own systems. The Next Platform has done a few deep dives into the chip and Fugaku. Cray – now part of Hewlett Packard Enterprise – has made such a deal with Fujitsu and will feature the chip in its “Storm” line of CS500 clusters, spreading the goodness of the A64FX chip around the globe.
One of those places is the University of Bristol in the United Kingdom, home of the Isambard supercomputer, a cluster of 168 nodes based on the Cray XC50 system and powered by 10,752 ThunderX2 cores. Isambard 2 is coming this year and will not only expand the existing cluster – doubling the number of cores to 21,504 across 336 nodes – but also add a Cray partition powered by Fujitsu’s A64FX CPUs rather than ThunderX2 or upcoming ThunderX3 chips.
It is a change for the supercomputer, but the adoption of A64FX chips fits with Isambard’s mission, according to Simon McIntosh-Smith, principal investigator for the Isambard project and a professor of HPC at the university. It’s designated in the UK as a tier-two supercomputer – a regional system – which means it has a few missions.
“The most important aspect is diversity in architecture and to give us a good cross-section of all relevant future technologies that we want the UK community to be able to build up experience with,” McIntosh-Smith told The Next Platform. “In Isambard, it’s the Arm architecture that we really wanted to make available to people to try for real and any other techy sort of things like different kinds of interconnect, different kinds of storage technologies, including the non-volatile, memory-dense CPU systems, regular GPU systems. It could have a bit of everything. In tier two, we’ve got a broad range of everything that people might want to try.”
Isambard 2 And A64FX
Isambard already is running ThunderX2 processors. The A64FX chip fits in with the drive for technology diversity, he says. Once up and running, Isambard 2 will be the largest Arm-based supercomputer in Europe.
“ThunderX3 is actually really closely related to and actually quite similar to ThunderX2, so it’s basically the same core,” McIntosh-Smith says. “Slightly faster memory, but the same number of memory channels, so from a scientifically new and interesting point of view, it’s more of an evolution, whereas A64FX has two major new technologies that we’re really, really keen to get our hands on. One is Scalable Vector Extension, which is really wide vectors. That’s not in ThunderX3. And the other is high bandwidth memory on the CPU, HBM2, and that’s not on ThunderX3 either. If we were just building a brand-new capacity system and that’s all we wanted, ThunderX3 looks like it would be a really excellent choice for that. But we’ve already got the existing Isambard system. It’s kind of easy just to expand that and then add in the new technologies, which we’ve got today in A64FX. That’s why we went that way, because it’s more different and it’s got specifically those two new things which I think are going to be really important in the future. It has the wide vectors and the high-bandwidth memory, which we just can’t get on ThunderX3.”
Both technologies are important as the industry moves not only from pre-exascale to exascale systems, but eventually to general-purpose exascale systems that can run a wide array of workloads. SVE offers wide vectors that will help CPUs run workloads that today tend to run on GPUs, he says. However, some of those applications need to be tweaked to get them to run well on GPUs. Similarly, high-bandwidth memory is a significant benefit of GPUs. Now both are on an Arm-based CPU. GPUs have two key technologies, McIntosh-Smith says: wide vectors like SVE and a lot of memory bandwidth. Fujitsu put both onto A64FX.
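The appeal of pairing wide vectors with high-bandwidth memory can be made concrete with a simple roofline-style calculation. The figures below are approximate, illustrative numbers (roughly 2.7 DP TFLOPS peak and roughly 1 TB/s of HBM2 bandwidth are commonly cited for A64FX; the DDR4 comparison point is a generic six-channel server CPU, not any specific product):

```python
def machine_balance(peak_flops, mem_bw):
    """FLOPs the chip can perform per byte moved from memory.
    Kernels with lower arithmetic intensity than this are bandwidth-bound."""
    return peak_flops / mem_bw

def attainable_gflops(intensity, peak_flops, mem_bw):
    """Classic roofline: performance is capped by either compute or bandwidth."""
    return min(peak_flops, intensity * mem_bw)

# Approximate, illustrative figures -- not vendor-verified:
a64fx   = {"peak": 2700e9, "bw": 1024e9}  # ~2.7 DP TFLOPS, ~1 TB/s HBM2
ddr_cpu = {"peak": 2000e9, "bw": 140e9}   # a generic 6-channel DDR4 server CPU

# STREAM-triad-like kernel a[i] = b[i] + s*c[i]: 2 flops per 24 bytes moved
triad_intensity = 2 / 24
for name, chip in (("A64FX-class", a64fx), ("DDR4-class", ddr_cpu)):
    perf = attainable_gflops(triad_intensity, chip["peak"], chip["bw"]) / 1e9
    print(f"{name}: balance = {machine_balance(chip['peak'], chip['bw']):.1f} "
          f"flops/byte, triad ceiling ≈ {perf:.0f} GFLOPS")
```

On these assumed numbers, the bandwidth-bound triad ceiling of the HBM2 part is several times that of the DDR4 part despite similar peak flops, which is the point McIntosh-Smith is making about memory bandwidth on a CPU.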
“If that works, we’ve got CPUs with all the best bits of a GPU – so lots of floating point operations and lots of memory bandwidth – but it’s just a CPU and it can run MPI or anything else,” he says, adding that they are “the two fundamental enablers for me for what I would call general-purpose, easy-to-use exascale, which I think will take a little bit longer to get to. First-generation exascale, we’ll do anything we can to get there as soon as we can, even if that’s using technologies that are a bit more exotic, like GPUs. But eventually we want everyone to be able to use exascale and it needs to be a lot easier to use than that. SVE and high-bandwidth memory feel like two of the key technologies to explore along that path.”
Other Architectures As Well
A key role for tier-two supercomputers is using them to compare the performance and other capabilities of various technologies. So while Isambard and Isambard 2 are based on the Arm architecture, in Isambard 2 there will be an updated set of comparison systems running Intel’s “Cascade Lake” Xeon SP CPUs and AMD “Rome” Epyc chips, as well as the latest GPUs from Nvidia and AMD. The university already has IBM Power9 processors in Isambard. There also will be some ThunderX3 chips, and McIntosh-Smith says he is interested in checking out Ampere’s new chip as well as AWS’ upcoming Graviton2.
“It’s sort of the precursors to what’s going to be in the exascale machines in the United States that use AMD GPUs. We’re going to have some early versions of those,” he says. “Then in Isambard, you can do some really good sort of node-level performance comparisons between all of the major CPUs, all the major GPUs, including all of the Arm stuff, all in one system. You can make it as adaptable or as scientifically rigorous as you can. We want to take a suite of benchmarks and run them on all these different architectures and see which ones are stronger for what and then we’ll share that with everyone. Our users can find out what they want to run their workloads on. Lots of FLOPS will run on A64FX, so we’ll help people do these sorts of evaluations across the range of different implementations that are becoming available.”
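The node-level comparisons McIntosh-Smith describes usually start from simple memory-bandwidth kernels like STREAM triad, run identically on each partition. As a purely illustrative sketch (real evaluations would use the compiled STREAM benchmark, not Python), the measurement itself is just a timed loop and a byte count:

```python
import time

def triad_bandwidth(n=1_000_000, scalar=3.0, repeats=3):
    """Time a STREAM-triad-like loop (a[i] = b[i] + s*c[i]) and report
    effective memory bandwidth in GB/s. Pure-Python timing is illustrative
    only; it measures interpreter overhead as much as the memory system."""
    b = [1.0] * n
    c = [2.0] * n
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        a = [bi + scalar * ci for bi, ci in zip(b, c)]
        best = min(best, time.perf_counter() - t0)
    bytes_moved = 3 * 8 * n          # read b, read c, write a (8-byte floats)
    return bytes_moved / best / 1e9  # GB/s

print(f"triad effective bandwidth ≈ {triad_bandwidth():.2f} GB/s")
```

Running the same kernel on each architecture, then normalizing by peak bandwidth or by price, is what turns a pile of timings into the kind of cross-architecture comparison the Isambard team plans to publish.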
Networking And Storage
The current Isambard uses Cray’s “Aries” interconnect on the Dragonfly topology. McIntosh-Smith says there was thought given to using Fujitsu’s “Tofu” interconnect, but the Isambard engineers wanted to continue working with Cray compilers and libraries, so with Isambard 2 and the A64FX cluster, the interconnect will be InfiniBand from Mellanox Technologies. Eventually the plan is also to bring in a rack of Cray’s “Slingshot” interconnect developed for the OEM’s Shasta systems. Slingshot is being designed to run in both HPC and enterprise environments and to be highly scalable and high performing. Once that’s in place, the Isambard engineers will have three network technologies to compare.
There is about a petabyte of storage on Isambard, which uses a Cray Sonexion cabinet.
The timetable McIntosh-Smith is looking at is first expanding the current system in May by doubling the number of cores. The demand on the supercomputer is heavy and the goal is to keep Isambard running as the workhorse. In October, the rack of A64FX-based Cray systems will be installed and will run alongside the existing Isambard cluster. Once everything is in place, engineers will run rigorous comparisons of the various technologies, while also beginning to look toward Isambard 3.
The Isambard project got a significant financial boost last month. The same day the government in the United Kingdom announced a $1.56 billion investment in weather and climate supercomputing efforts, it also announced $5.3 million in funding for Isambard 2, which will be hosted by the Met Office. McIntosh-Smith said such investments in supercomputing systems could become more common in the post-Brexit era. With the UK now out of the European Union, the government will probably drive more funding into areas like technology as it looks to boost the economy.
McIntosh-Smith says he also expects to see demand for Arm-based HPC systems to pick up. An Arm-based system broke into the Top500 list of the fastest supercomputers in November 2018 with the Astra supercomputer, housed at the Sandia National Labs and powered by 135,328 ThunderX2 cores. A year later, Fujitsu’s A64FX prototype reached the top of the Green500 list of the most energy-efficient systems.
A question about Arm-based systems has always been how easy it would be to run workloads on them and how much work would need to be done on the code, he says. With Isambard, users found that their workloads just ran and were competitive in performance with x86. Some users didn’t even realize they were running on the Arm architecture, he says. At most there were some tweaks to the compilers.
The added competition Intel is seeing from AMD’s Epyc chips is helping to drive down overall costs for x86 processors, but even with the price drops, Arm still will provide better performance-per-dollar, giving HPC users “a way to get even more science done for our dollars over time,” he says.
“I think we’re going to see an increase in uptake and I think we’ll see a few more bigger systems start to pop into the Top500 based on Arm,” McIntosh-Smith says. “They’ll be worth watching out for. We’re not going to see everyone switch to it overnight, but I think you’ll see more this time. And then I think over the next couple of years, we’ll see Arm start to take more market share from the top. It’s going to be slow at first and then I think it is going to start to get faster, so watch this space for the level of adoption. I think this is just the beginning.”