Exascale Might Prove To Be More Than A Grand Challenge
September 6, 2016 Timothy Prickett Morgan
The supercomputing industry is accustomed to 1,000X performance strides, and that is because people like to think in big round numbers and bold concepts. Every leap in performance is exciting not just because of the engineering challenges in bringing systems with kilo, mega, tera, peta, and exa scales into being, but because of the science that is enabled by such increasingly massive machines.
But every leap is getting a bit more difficult as imagination meets up with the constraints of budgets and the laws of physics. The exascale leap is proving to be particularly difficult, and not just because it is the one we are looking across the chasm at.
At the ISC supercomputing conference in Germany back in June, the Japanese government’s RIKEN supercomputing center and server maker Fujitsu made a big splash by divulging that the future Post-K supercomputer that is being developed under the Flagship 2020 program would be based on the ARM architecture, not the Sparc64 fx family of processors that Fujitsu has been creating for massively parallel systems since Project Keisoku was started back in September 2006. For the past several weeks, rumors have been going around that the Post-K supercomputer project was going to be delayed, and sources at RIKEN have confirmed to The Next Platform that the Post-K machine has indeed been delayed, by as much as one to two years.
These sources, speaking on the condition of anonymity, tell us that the issue has to do with the semiconductor design for the processors that are to be used in the Post-K machine, which will be based on a homegrown ARMv8-compatible core created by Fujitsu and which will also be using the vector extensions for the ARM architecture being developed by ARM Holdings in conjunction with Fujitsu and unnamed others.
The precise nature of the problem was not revealed, and considering that we do not know the process technology or fab partner that Fujitsu will be using – it is almost certainly a 10 nanometer part being etched by Taiwan Semiconductor Manufacturing Corp – it is hard to guess what the issue is. Our guess is that adding high bandwidth memory and Tofu 6D mesh torus interconnects to the ARM architecture is proving more difficult than expected. Fujitsu has already added HMC2 to the current Sparc64-XIfx processor, which is an impressive chip that many people are unaware of and which is a good foundation for a future ARM chip. Adding the SVE extensions to the ARM cores and also sufficient on-chip bandwidth for a future HMC iteration and a future Tofu interconnect might be the challenge, coupled to the normal difficulties of trying to get a very advanced 3D transistor process to yield.
When it comes to supercomputing, you have to respect ambition because without it we would never hit the successive performance targets at all.
The original Project Keisoku, which was also known as the Next Generation Supercomputer Project when it launched precisely a decade ago, was certainly ambitious and was meant to bring all three major Japanese server makers into the project. Being a follow-on to the Earth Simulator supercomputer, which was a massively parallel SX series vector machine, the Keisoku machine was supposed to have a very large chunk of its 10 petaflops of performance coming from future NEC vector motors. Earth Simulator cost $350 million to build and with 5,120 SX nodes it was able to reach 35.8 teraflops of performance.
The Keisoku machine had a $1.2 billion budget, including the development of a scalar processor from Fujitsu, which turned out to be the eight-core “Venus” Sparc64-VIIIfx, and the Tofu interconnect created by Hitachi and NEC. NEC pulled out of Project Keisoku in May 2009, when the Great Recession was slamming the financials of all IT suppliers, but commercialized some of the vector advances that it co-created with Hitachi. By November 2009, rumors were going around that the Japanese government, under extreme financial pressure, was going to cancel the Keisoku effort. Fujitsu started talking up its Venus chip, did a lot of financial jujitsu, and managed to talk the Japanese government and the Advanced Institute for Computational Science campus of RIKEN in Kobe into accepting a scalar-only super. In the process, it took control of the Tofu interconnect, and has built a tidy little HPC business from the parts of the original Project Keisoku effort, which resulted in a machine simply called K.
When the Flagship 2020 effort resulting in the Post-K machine was started in 2014, it had a $910 million budget, and that lower number was more about the exchange rate between the US dollar and the yen than the actual amounts budgeted. The K project had 115 billion yen allocated for it, and Post-K is costing 110 billion yen. So the financial cost is basically the same for a system (including facilities, running costs, and software development) that will have 100X more raw performance.
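The exchange-rate point can be checked with a little arithmetic. The sketch below uses only the dollar and yen figures quoted above, and it assumes that the $1.2 billion and 115 billion yen figures describe the same K budget. The variable names are ours, and the implied rates are back-of-the-envelope values, not official conversion rates.

```python
# Budget figures as quoted in this article (assumed to describe the same totals).
K_BUDGET_USD = 1.2e9         # Project Keisoku / K budget in US dollars
K_BUDGET_YEN = 115e9         # K budget in yen
POST_K_BUDGET_USD = 910e6    # Flagship 2020 / Post-K budget in US dollars
POST_K_BUDGET_YEN = 110e9    # Post-K budget in yen


def implied_yen_per_dollar(yen: float, usd: float) -> float:
    """Exchange rate implied by quoting one budget in both currencies."""
    return yen / usd


k_rate = implied_yen_per_dollar(K_BUDGET_YEN, K_BUDGET_USD)
post_k_rate = implied_yen_per_dollar(POST_K_BUDGET_YEN, POST_K_BUDGET_USD)

# Relative change in the budget, measured in each currency.
yen_change = POST_K_BUDGET_YEN / K_BUDGET_YEN - 1.0
usd_change = POST_K_BUDGET_USD / K_BUDGET_USD - 1.0

print(f"Implied rate for K:      {k_rate:.1f} yen per dollar")
print(f"Implied rate for Post-K: {post_k_rate:.1f} yen per dollar")
print(f"Budget change in yen:    {yen_change:+.1%}")
print(f"Budget change in USD:    {usd_change:+.1%}")
```

On these numbers, the yen budget shrinks by a little over 4 percent while the dollar figure drops by roughly 24 percent, so most of the gap does come from the currency swing.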
As you can see from the chart above, the basic design of the Post-K super was done last year, and it is not a coincidence that this was about the same time the rumors started going around that Fujitsu would be abandoning its Sparc64-fx chip in favor of an ARM design. Under that original schedule, design and implementation was supposed to run through early 2018, when manufacturing would commence and the machine was to be installed and tested in phases in 2018 and 2019 to be operational around the second quarter of 2020 – just in time for the ISC supercomputing event and a killer Top500 rating.
Now, that is not going to happen until 2021 or 2022, if all goes well, and RIKEN was looking forward to having bragging rights to having the first exascale-class machine in the field. The US government does not expect to get its first exascale machine into the field until 2023, and does not seem compelled to try to accelerate its exascale roadmap based on the ambitious schedules set by the governments of Japan and China. The Middle Kingdom has three different exascale machines under development right now – based on Shenwei, AMD (presumably Opteron), and ARM architectures, as we have previously reported. The Chinese government was set to deliver an exascale machine with 10 PB of memory, exabytes of storage, and an efficiency of 30 gigaflops per watt by 2020. Now, China is ahead of Japan and way ahead of the United States.
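To put the 30 gigaflops per watt target in perspective, here is a rough sketch of the power draw it implies. It assumes that "exascale" means exactly one exaflops and that the efficiency figure applies to the whole machine at that performance level; neither assumption comes from the Chinese specification itself.

```python
# Assumption: "exascale" is taken as exactly 1 exaflops (1e18 flops).
EXAFLOPS = 1e18
# Efficiency target quoted above: 30 gigaflops per watt.
TARGET_FLOPS_PER_WATT = 30e9


def implied_power_megawatts(flops: float, flops_per_watt: float) -> float:
    """Power in megawatts needed to deliver `flops` at a given efficiency."""
    watts = flops / flops_per_watt
    return watts / 1e6


power_mw = implied_power_megawatts(EXAFLOPS, TARGET_FLOPS_PER_WATT)
print(f"Implied power draw: {power_mw:.1f} MW")
```

Under those assumptions the machine would draw roughly 33 megawatts, which is why power efficiency looms so large in exascale planning.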
It is likely that there will be delays with any project of such scale, which is why China is backing three architectures and it is also why the US is probably going to back two architectures, as it is doing with the pre-exascale machines, “Summit” (a hybrid IBM Power-Nvidia Tesla system) and “Aurora” (based on the “Knights Hill” massively parallel processor from Intel). Europe has a number of exascale research projects, but thus far only Atos/Bull has an exascale system deal, in this case with CEA, the French atomic agency, for a massively scaled kicker to its Sequana family of systems by 2020.
We expect more exascale projects and more delays as the engineering challenges mount. But we also think that compromises will be made in the power consumption and thermals to get workable systems that do truly fantastic things with modeling and simulation. That’s just how the HPC community works, and even tens of billions of dollars for a dozen exascale machines is not too much for Earth to spend. It is a good investment for so many reasons.