Japan Strikes First in Exascale Supercomputing Battle

RIKEN and Fujitsu announced that they have finished the design of the Post-K exascale platform, paving the way for production of the hardware, followed by shipping and installation. With Post-K still in the running to be the world’s first exascale system, Japan’s Ministry of Education, Culture, Sports, Science and Technology (MEXT) is aiming to make the machine available to its users “around 2021 or 2022.”

For Fujitsu, the announcement has special significance since the company is now able to manufacture, ship, and install the hardware for the Post-K machine. Concurrently, Fujitsu also revealed that they will be offering a commercial version of Post-K for the HPC market, based on the same set of technologies.

It will be interesting to see what kind of buyers show up for the commercial Post-K offering, which presumably will be folded into Fujitsu’s PrimeHPC portfolio. We wouldn’t be at all surprised to see one or more European governments invest in a system or systems, given their predilection for Arm-powered supercomputing. The UK government, in particular, seems especially devoted to the processor architecture for public HPC duty, having deployed Isambard, which is currently the largest Arm-powered system in Europe. Some of Fujitsu’s existing domestic HPC customers would also be good prospects for such a product, and these include a whole slew of Japanese universities and government research agencies. In any case, potential customers won’t have to wait very long to get their hands on Post-K technology. Sales of these systems are slated to begin in the second half of fiscal 2019.

Based on our previous reporting on the Post-K rollout, that timeframe also lines up with when the first Post-K racks are expected to be installed at RIKEN. Given that it’s already April, we suspect the production and assembly of the server components are already well underway. That means whoever manufactures the Post-K’s A64FX processor for Fujitsu will need to churn out a few hundred thousand of them over the next year or so. According to the company’s announcement, the A64FX will deliver “over 2.7 teraflops,” which is the same number Fujitsu was touting last August, when it divulged the initial details about the processor. At the time, we thought Post-K might get a more advanced, powerful version of the A64FX, but at least for now, that doesn’t appear to be in the cards.

When you’re talking exascale, 2.7 teraflops per chip means you’re going to need about 370,000 of them to get to one peak exaflop, and 10 or 20 percent more than that to get to a Linpack exaflop. Since Post-K is built with just a single A64FX per node, the processor count also represents the number of server nodes in the system, which means a lot of the burden of keeping these chips working in unison will fall to the system’s new Tofu D interconnect, the network glue specifically devised for this kind of scalability.
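For readers who want to check the arithmetic, here is a minimal back-of-the-envelope sketch. It takes the quoted 2.7 teraflops as the per-chip peak (Fujitsu says "over" 2.7, so the real counts would be slightly lower) and treats the "10 or 20 percent more" as a simple multiplier on the peak count. The function names are our own, chosen purely for illustration.

```python
import math

# Figures taken from the article; "over 2.7 teraflops" is treated as exactly 2.7.
PEAK_TFLOPS_PER_CHIP = 2.7
EXAFLOP_IN_TFLOPS = 1_000_000  # 1 exaflop = 10^18 flops = 10^6 teraflops


def chips_for_peak(target_tflops, per_chip_tflops=PEAK_TFLOPS_PER_CHIP):
    """Chips needed to reach a given *peak* performance, rounded up."""
    return math.ceil(target_tflops / per_chip_tflops)


def chips_for_linpack(target_tflops, extra_fraction,
                      per_chip_tflops=PEAK_TFLOPS_PER_CHIP):
    """Chips needed if a Linpack run requires `extra_fraction` more hardware
    than the peak count (an assumption mirroring '10 or 20 percent more')."""
    return math.ceil(target_tflops / per_chip_tflops * (1 + extra_fraction))


if __name__ == "__main__":
    peak = chips_for_peak(EXAFLOP_IN_TFLOPS)
    print(f"Peak exaflop:           {peak:,} chips")
    for extra in (0.10, 0.20):
        count = chips_for_linpack(EXAFLOP_IN_TFLOPS, extra)
        print(f"Linpack exaflop (+{extra:.0%}): {count:,} chips")
```

Because Post-K uses one A64FX per node, the same numbers double as node counts, which is what puts so much weight on the interconnect.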

The A64FX is the first implementation of Arm’s Scalable Vector Extension (SVE) capability, and as such, is something of a test case for Arm and its ability to go head-to-head against the best and brightest from Intel and AMD. Intel’s upcoming Cascade Lake-AP and AMD Epyc “Rome” processors are likely to be just as floppy as the A64FX, if not more so. However, neither one of them comes with 32GB of on-package HBM2 memory, as the A64FX does, which gives the Fujitsu processor a terabyte per second of memory bandwidth. That kind of memory access speed is usually associated only with GPU accelerators, which also use on-package HBM2 to feed their multi-teraflop silicon.
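One rough way to see why that bandwidth matters is to compare it with the chip's compute rate. The sketch below uses only the figures quoted above (32GB of HBM2, roughly a terabyte per second, 2.7 teraflops) and assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes). The helper names are hypothetical, and the results are ballpark ratios rather than measured values.

```python
# Approximate figures from the article, in decimal units (an assumption).
HBM2_CAPACITY_BYTES = 32e9      # 32 GB on-package HBM2
MEM_BANDWIDTH_BYTES_S = 1e12    # "a terabyte per second"
PEAK_FLOPS = 2.7e12             # "over 2.7 teraflops", treated as 2.7


def bytes_per_flop(bandwidth_bytes_s, flops):
    """Memory bytes deliverable per peak floating point operation."""
    return bandwidth_bytes_s / flops


def full_sweep_seconds(capacity_bytes, bandwidth_bytes_s):
    """Time to stream the entire memory once at full bandwidth."""
    return capacity_bytes / bandwidth_bytes_s


if __name__ == "__main__":
    ratio = bytes_per_flop(MEM_BANDWIDTH_BYTES_S, PEAK_FLOPS)
    sweep = full_sweep_seconds(HBM2_CAPACITY_BYTES, MEM_BANDWIDTH_BYTES_S)
    print(f"Bytes per flop: {ratio:.2f}")
    print(f"Full HBM2 sweep: {sweep * 1000:.0f} ms")
```

The higher the bytes-per-flop ratio, the less time bandwidth-bound codes spend waiting on memory, which is the balance the on-package HBM2 is meant to provide.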

Which brings us to one of the more notable features of the Post-K supercomputer: it will be powered exclusively by its central processor, without any boost from a coprocessor. That kind of architecture is something of an outlier these days for exascale machinery, and in fact is more reminiscent of the vector supercomputers of old. Both of the first exascale systems in the US and China will be accelerated – the DOE’s Aurora system, with Intel’s Xe discrete GPUs, and the Chinese Tianhe-3 system, with the domestically-produced Matrix-3000 DSPs. And it’s a pretty good bet that one or both of the other two early US exascale systems – Frontier and El Capitan – will be accelerated by GPUs. Even the EU, via their European Processor Initiative, is looking to RISC-V as the basis for their domestic HPC accelerator for exascale systems on the continent.

All of which is a long way of saying that it’s easy to fixate on the novelty of Fujitsu’s Arm technology, but the monolithic design of Post-K may turn out to be its truly unique feature. In fact, if the x86 architecture had been offered with a more liberal licensing policy, Post-K might have ended up being powered by something that looked awfully close to a fourth-generation Xeon Phi.

Fujitsu is, of course, hoping to monetize all their exascale technology beyond the Post-K contract, which is why they are planning to develop a commercial HPC product line around it.  Apparently, the company is kicking around the idea of developing an entry-level model for this platform, although it didn’t really elaborate on what exactly that entailed. Obviously, not everyone is in the market for an exascale machine, so maybe you’ll be able to buy a rack or two of Post-K goodness with a simpler Tofu network. By the way, a single rack of 2.7-teraflop A64FX nodes will get you around a petaflop of performance.

Fujitsu is also considering supplying these technologies to other vendors.  We read this to mean that the company is mainly thinking about selling its A64FX chips to Arm server makers interested in the HPC market or licensing its A64FX IP to processor vendors looking for a quick path to Arm SVE. The latter case opens up a lot more possibilities, inasmuch as not everyone is interested in the on-chip Tofu D controller, but they may want to mix Arm SVE circuitry with other interesting IP blocks. Either approach could help Fujitsu amortize the enormous cost of developing the hardware and software for this processor.

More significantly, it would potentially make the Arm vector technology available to a much wider audience, much faster. Marvell is almost certainly considering an SVE implementation for ThunderX3, and maybe that’s already in the works. But we have to believe such a project will take more than a few years to reach fruition. On the other hand, if Cray, Bull, or other HPC system vendors could buy A64FX chips or their equivalent within the next couple of years from Fujitsu or other chipmakers and find a way to slot them into their own servers, that would change the dynamics of the Arm-HPC market rather dramatically. Obviously, whether any of this comes to pass remains to be seen, but it’s sure fun to think about.
