Many of us are impatient for Arm processors to take off in the datacenter in general and in HPC in particular. And ever so slowly, it looks like it is starting to happen.
Every system buyer wants choice because choice increases competition, which lowers cost and mitigates risk. But no organization, no matter how large, can afford to build its own software ecosystem. Even hyperscalers like Google and Facebook, who literally make money on the apps running on their vast infrastructure, rely heavily on the open source community, taking as much as they give back. So it is with the parallel HPC community. It has been two decades since Linux first took distributed supercomputing by storm, by and large on machines powered by X86 processors, and that only happened thanks to the myriad personal, academic, governmental, and commercial collective efforts of researchers and IT experts around the world.
Arm Holdings, the chip designer division of IT conglomerate Softbank, has made no secret of its aspirations in HPC. Among other interesting developments, Cray is building a cluster, nicknamed “Isambard,” using its “Aries” dragonfly interconnect to lash together somewhere north of 300 of Cavium’s 32-core ThunderX2 Arm server processors for the University of Bristol, a machine that will bring over 10,000 cores to bear on HPC workloads. Arm is also working with Fujitsu on a set of vector extensions to the Arm architecture that will be deployed in a future exascale system in Japan, and Arm bought Allinea, a maker of debugger and profiling tools, to create its own compiler stack as an alternative to the open source GCC compilers.
Getting systems based on Arm chips into academia and among third party software developers is a key – and perhaps vital – step in moving Arm-based supercomputers from concept to production, and three universities in the United Kingdom, working in concert with Arm, Cavium, Hewlett Packard Enterprise, and SUSE Linux, are going to be among the first organizations in the world to get clusters based on Cavium’s next-generation ThunderX2 processors so they can help flesh out the HPC software ecosystem for Arm processors. If you don’t get the software running, they won’t come.
Working under the auspices of a collaborative effort called Catalyst UK, the University of Edinburgh, the University of Bristol, and the University of Leicester will each be getting clone clusters where they can help port and test HPC code running atop SUSE Linux Enterprise Server.
All of the parties in the Catalyst UK effort are throwing in people and wares to get this important work done. The universities are hosting the machines and providing the people to do software porting and testing. HPE is contributing six racks of Apollo 70 servers, which were launched last year, with the compute sleds used in the clusters based on its “Comanche” design. The Comanche server sleds have a pair of ThunderX2 processors from Cavium, but these are not the homegrown designs we told you about here in June 2016, which had 54 cores running at 3 GHz and six DDR4 memory controllers. Rather, they are derived from the “Vulcan” design at former rival Broadcom, which shuttered its Arm server chip development efforts in late 2016 and sold off the designs to Cavium.
This Vulcan Arm server chip design had 32 cores with a top potential speed of 3 GHz, but importantly had eight DDR4 memory controllers, providing a better balance between processing and memory bandwidth. The ThunderX2 chips inside the three clusters going into the Catalyst UK effort are spinning at a more modest 2.2 GHz. The server nodes are configured with 128 GB of memory using eight 16 GB memory sticks, which is not a lot of memory but it is not an unusually small amount for an HPC cluster node – especially given the high cost of main memory these days. The nodes have ConnectX-5 InfiniBand adapters from Mellanox Technologies and link out to 100 Gb/sec EDR InfiniBand switches; they run the HPC variant of SUSE Linux Enterprise Server. Each of the three clusters deployed at Edinburgh, Bristol, and Leicester comprises two racks of machines, with 32 nodes in each rack, for a total of 4,096 cores and 8 TB of distributed memory, all within a 30 kilowatt power envelope. The storage for the machines was not detailed.
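As a quick sanity check, the cluster totals quoted above follow directly from the per-node configuration. Here is a minimal back-of-the-envelope sketch; the figures come from the piece, while the layout of the arithmetic is ours:

```python
# Catalyst UK per-cluster totals, from the per-node configuration.
RACKS_PER_CLUSTER = 2
NODES_PER_RACK = 32
SOCKETS_PER_NODE = 2      # dual-socket Comanche sleds
CORES_PER_SOCKET = 32     # Vulcan-derived ThunderX2
DIMMS_PER_NODE = 8
GB_PER_DIMM = 16

nodes = RACKS_PER_CLUSTER * NODES_PER_RACK
cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
memory_tb = nodes * DIMMS_PER_NODE * GB_PER_DIMM / 1024

print(f"{nodes} nodes, {cores} cores, {memory_tb:.0f} TB of memory")
# 64 nodes, 4096 cores, 8 TB of memory
```

That matches the 4,096 cores and 8 TB of distributed memory cited for each of the three clusters.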
Cavium is not yet shipping these ThunderX2 processors, so we don’t have a lot of metrics, in terms of floating point operations per second at single or double precision, for these clusters. We have heard through the grapevine that the Comanche node delivers 1.13 teraflops peak at double precision and more than 240 GB/sec of memory bandwidth on the STREAM benchmark. In many cases – particularly for HPC codes – the memory bandwidth available on the ThunderX2 makes it perform better than you might otherwise expect relative to the latest “Broadwell” Xeon E5 and “Skylake” Xeon SP processors from Intel, at least based on the flops. Last fall, Simon McIntosh-Smith, one of the leads on the Isambard system, who also worked on the European Mont-Blanc Arm clustering project, put out benchmarks showing off the performance of the ThunderX2 against Broadwell and Skylake Xeon systems, with anywhere from 50 percent to 100 percent better oomph on many HPC workloads and parity or a little better on others.
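That rumored 1.13 teraflops figure is consistent with the node configuration if you assume each core retires eight double-precision flops per cycle – our assumption, corresponding to two 128-bit FMA pipes per core; the socket count, core count, and clock come from the article:

```python
# Sanity check on the rumored peak double-precision flops per Comanche node.
SOCKETS = 2
CORES_PER_SOCKET = 32
CLOCK_GHZ = 2.2
DP_FLOPS_PER_CYCLE = 8   # assumed: two 128-bit FMA units, 2 DP lanes each

peak_gflops = SOCKETS * CORES_PER_SOCKET * CLOCK_GHZ * DP_FLOPS_PER_CYCLE
print(f"{peak_gflops / 1000:.2f} teraflops peak per node")
# 1.13 teraflops peak per node
```

Note that peak flops is exactly the metric where a memory-bandwidth-rich design like this one undersells itself; the STREAM number is the more telling spec for many HPC codes.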
This will soon be a real contest, at long last.
The Apollo 70 systems from HPE are also not yet shipping, but the company thinks that this time around, thanks to the ecosystem work it has done with the Comanche server sled – which coincided with Red Hat finally supporting its Enterprise Linux distribution for real on the Arm architecture – and the work being done by the Catalyst UK collaborators, the uptake of Arm systems will be better than it was with the “Redstone” and “Moonshot” systems earlier in the decade.
“The impetus of the Catalyst program is to accelerate the software ecosystem around Arm,” Mike Vildibill, vice president of the Advanced Technologies Group at HPE, tells The Next Platform. “We are so early in the cycle that developing the ecosystem is first and foremost to its adoption. I can tell you first hand that there will be large scale Apollo 70 deployments, and these will be announced in the future. But this focus is on the ecosystem development so these larger deployments can take place. Technology alone does not drive adoption, and a lot of other factors, including the readiness of the ecosystem, are crucial to success. It is not just a feeds and speeds play.”
The three clusters at Edinburgh, Bristol, and Leicester will be installed later this summer, according to Vildibill. At this point these machines are strictly CPU-only clusters, and there are no plans to add GPU, FPGA, or DSP acceleration to them.
As we have pointed out many times before, supercomputing is inherently a political as well as technological and economic phenomenon. It is natural enough that a British company that was once called Advanced RISC Machines should see 64-bit processors based on its architecture end up as the main motors in a collection of clusters at three different universities that have been spearheading research in HPC for decades. We have nothing against SUSE Linux or Red Hat, but it would seem that Canonical, maker of the Ubuntu Server version of Linux, would be a perfect British pairing with these Arm systems. But SUSE Linux and Red Hat have done the work tuning up Linux for HPC, while Canonical has focused on hyperscale and AI workloads with its Linux.