China’s Triple Play For Pre-Exascale Systems

Before any country can deploy an exascale system, it has to get pre-exascale prototypes into the field to test out the underlying technologies and determine which approaches have the best chance of scaling up performance and being manufactured affordably. China appears to be planning three different pre-exascale systems, and none of them will use processors or accelerators made by US companies.

It is no secret that China has wanted to develop an indigenous capability to design chips and build supercomputer-class systems, and this was true even before the US government put the kibosh on selling Intel Xeon processors and Xeon Phi coprocessors to certain labs in China last year. That ban spawned what, from the outside, looks like a flurry of new chip development activity, and what is clear from the unveiling of the 93 petaflops Sunway TaihuLight supercomputer in June is that China now has a working system with a sophisticated and elegant processor that rivals anything an American, European, or Japanese company can put into the field. China has dabbled with Sparc and Alpha processors for years, and tried to create its own variant of the MIPS architecture with an X86 compatibility mode with the Godson chips. But with the Shenwei SW26010 processors used in the Sunway TaihuLight, which have 260 cores running at 1.45 GHz per socket and which deliver around 3 teraflops of number crunching power at double precision, China has a chip whose performance is, significantly, on par with Intel’s “Knights Landing” Xeon Phi processors, and that gives China a solid foundation on which to push upwards to exascale systems.

As it turns out, China is not betting solely on the Shenwei chips, and apparently has plans to build three different pre-exascale systems with three very different architectures, according to some Tweets put out by James Lin, vice director for the Center of HPC at Shanghai Jiao Tong University.

The most interesting statement made by Lin was that, due to the embargo on the Tianhe-2A system, all national-level supercomputer labs need to use processor technology that is “self-controllable.” (Those are his quotes, not ours, and it is not clear who Lin is quoting.) The Shenwei chips are absolutely under the control of the Chinese government and indigenous chip industry, and so is the Matrix2000 DSP accelerator that was revealed by the National University of Defense Technology at last year’s ISC supercomputing conference as a reaction to the embargo. That DSP delivers around 2.4 teraflops at double precision in a 200 watt power envelope. That’s not nearly as impressive as the Knights Landing or SW26010 in terms of performance per watt, so this DSP is going to have to crank up the performance without breaking the thermal envelope to compete at the pre-exascale level.

According to Lin, China is setting up a three-way horse race among different organizations to build pre-exascale clusters based on ARM, Shenwei, and AMD (presumably Opteron) technologies. The first pre-exascale machine is being created by NUDT, will use ARM-based processors, and will be deployed at the national supercomputer center in Tianjin, where the Tianhe-1A CPU-GPU hybrid was deployed in 2010 and gave China its first top spot on the Top 500 rankings of supercomputers. There is no mention of using the Matrix2000 DSP accelerator with this system, but unless NUDT plans to create its own ARM chip with a homegrown floating point accelerator and embed it on the die, it stands to reason that this first pre-exascale machine will be an ARM-DSP hybrid.

The second pre-exascale machine is being developed by the same people who put together the Sunway TaihuLight system, and it will be deployed in the national supercomputing center in Jinan, where its predecessor, the Sunway Bluelight system, currently runs.

The third pre-exascale machine, and perhaps equally interesting, will be built by Chinese system maker Sugon and will employ an X86 processor licensed from AMD. We presume this is a licensed variant of the future “Zen” Opteron chip, due in 2017 for servers. It is not clear who is doing the licensing of the X86 technology from AMD, but back in April, AMD announced that it had inked a deal worth $293 million to license X86 chip technology to Tianjin Haiguang Advanced Technology Investment Co, which is itself an investment consortium that is guided by the Chinese Academy of Sciences. (By the way, server maker Lenovo traces its roots back to the CAS as well.) AMD said back in April that it believes that the deal with THATIC does not violate its cross-licensing agreements with Intel or export regulations with the US government. (We will find that out soon enough if Intel or the US government do not agree.)

Each of these three pre-exascale machines will come in at around 2.5 petaflops of peak performance, according to Lin, and will have somewhere between 500 and 600 nodes.
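As a rough sanity check on those figures, the sketch below divides the quoted peak performance by the quoted node counts. It assumes the peak is spread evenly across identical nodes, which Lin did not state, and the function name is purely illustrative.

```python
# Back-of-the-envelope per-node peak for the pre-exascale prototypes.
# Inputs come from the figures quoted above; the even split across
# identical nodes is an assumption, not something Lin stated.

PEAK_PETAFLOPS = 2.5      # approximate peak per prototype system
NODE_COUNTS = (500, 600)  # low and high ends of the quoted node range


def per_node_teraflops(peak_petaflops: float, nodes: int) -> float:
    """Return the peak teraflops per node, assuming identical nodes."""
    return peak_petaflops * 1000.0 / nodes


if __name__ == "__main__":
    for nodes in NODE_COUNTS:
        tflops = per_node_teraflops(PEAK_PETAFLOPS, nodes)
        print(f"{nodes} nodes -> {tflops:.2f} teraflops per node")
```

Under that assumption, each node would need to deliver somewhere between roughly 4.2 and 5 teraflops of peak performance.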

Back in May, China committed to delivering an exascale-class machine by 2020 with 10 PB of memory, exabytes of storage, and 30 gigaflops per watt efficiency (about five times better than the new Sunway TaihuLight system), and greater than 60 percent efficiency on the Linpack Fortran benchmark test.
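To put those targets in perspective, here is a minimal sketch of the arithmetic they imply. It assumes "exascale" means exactly 1 exaflops of peak performance and that the efficiency target is measured against that same figure; neither assumption comes from the announcement, and the function names are illustrative.

```python
# Arithmetic implied by the 2020 exascale targets quoted above.
# Assumption (not from the announcement): the machine has exactly
# 1 exaflops of peak performance, measured against the same rate
# as the gigaflops-per-watt target.

EXAFLOPS = 1e18                 # assumed peak, in flops
TARGET_GFLOPS_PER_WATT = 30.0   # stated efficiency target
IMPROVEMENT_FACTOR = 5.0        # "about five times better" than TaihuLight
MIN_LINPACK_EFFICIENCY = 0.60   # stated floor on Linpack efficiency


def power_megawatts(flops: float, gflops_per_watt: float) -> float:
    """Power in megawatts needed to sustain `flops` at the given efficiency."""
    watts = flops / (gflops_per_watt * 1e9)
    return watts / 1e6


def min_linpack_exaflops(peak_flops: float, efficiency: float) -> float:
    """Minimum sustained Linpack rate, in exaflops, at the given efficiency."""
    return peak_flops * efficiency / 1e18


if __name__ == "__main__":
    power = power_megawatts(EXAFLOPS, TARGET_GFLOPS_PER_WATT)
    baseline = TARGET_GFLOPS_PER_WATT / IMPROVEMENT_FACTOR
    sustained = min_linpack_exaflops(EXAFLOPS, MIN_LINPACK_EFFICIENCY)
    print(f"Power budget at target efficiency: {power:.1f} MW")
    print(f"Implied TaihuLight efficiency: about {baseline:.0f} gigaflops per watt")
    print(f"Minimum sustained Linpack: {sustained:.2f} exaflops")
```

On these assumptions, the efficiency target works out to a power budget of roughly 33 megawatts, and the "five times better" comparison implies about 6 gigaflops per watt for the Sunway TaihuLight.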

It is interesting to note that a pre-exascale system based on Power chips is not, as far as we know, in the cards for this horse race to exascale. The US government is certainly betting on the combination of the Power processor and the Tesla coprocessor with the Department of Energy’s future “Summit” and “Sierra” systems, and China could take a kicker to the CP1 chip that is under development by Suzhou PowerCore and based on the Power8 architecture and use it as the CPU in a hybrid CPU-DSP machine. It is a bit of a mystery why the first pre-exascale machine did not do that, in fact. For whatever reason, the Chinese government seems to have opted for ARM over Power, if the statements by Lin are correct.


5 Comments

  1. The HPL (High Performance Linpack) benchmark that is used for the Top500 list is written in C and not Fortran.

  2. Well looks like China is the one who has clearly seen through the Power+GPU hype as it is what it is, just hype!

    • Not so much to do with hype (which is very much not hype as far as Power+GPU is concerned) as it is probably not getting a low enough licensing price from OpenPower for the Power design/s. And AMD is free to license the x86 64 bit ISA, sans the 16/32 bit parts that are licensed from Intel. The cross licensing agreement between AMD and Intel is of a more equal value with Intel getting access to the x86 64 bit ISA that AMD created, and AMD getting access to the legacy x86 16/32 bit ISA that Intel created. I’d guess that Intel is free to license that x86 16/32 bit ISA to anyone if it wanted, but that will not get Intel far in the x86 64 bit world, and AMD will be better off as far as most future HPC/supercomputer software being done for the 64 bit part of the x86 ISA anyways.

      Now take the AMD x86 64 bit ISA and add to that some GPU accelerator technology that AMD is also free to license and that still could result in some CPU, or APU variants without the x86 16/32 bit legacy and a smaller sized core that can still run any x86 64 bit code and send plenty of FP work over to the GPU accelerator, either a PCI based GPU or an interposer based APU with the x86 64 bit ISA only based Zen core (we will call it Zen legacy lite core without any 16/32 bit ISA fat) for an interposer package based HPC/server APU.

      I’m more inclined to think that AMD will not be allowing any licensee to make any third party business licensing AMD’s IP. I’m thinking that this Chinese deal is more similar to AMD’s deal with the console makers for AMD to allow the licensee to use its x86 64 bit ISA and that AMD will provide that licensee with a ready made GPU accelerator die that can be integrated via an interposer package with the x86 64 bit ISA core/s that the licensee can implement in any way that the licensee can achieve, much like the top tier ARMv8A ISA architectural licensees do with the ARMv8A ISA licensed from ARM Holdings. So the Chinese licensee is free to cook up its own micro-architecture that is engineered to run the x86 64 bit ISA, and only the 64 bit instructions. The Chinese licensee will be free to design up an interposer layout for the custom HPC APU on an interposer and may develop its own fabric/network on the interposer in which to connect up the x86 64 bit ISA based CPU’s cores to the GPU accelerator die supplied by AMD and create a very capable custom APU to meet the Chinese licensee’s needs. There is also the option for the Chinese licensee to simply develop its own x86 64 bit running custom micro-architecture and create its own socket and main-board design and use the PCI standard to connect the CPU up to any maker’s GPUs.

      AMD has a lot of Sea Micro IP that it can license for revenues, should the Chinese licensee need some very nice coherent fabric IP to complete a very nice HPC/server SKU, and AMD would probably be willing to help for a fee in assisting its IP licensee/s with some design integration services. AMD has a lot of IP to offer up for a limited semi-custom arrangement to any clients with the bankrolls to support all sorts of interesting CPU/GPU designs tailored to the clients’ needs, and that includes a certain x86 64 bit ISA that AMD created, or the client could source the CPU part the standard way that the console makers get their x86 (32/64) bit parts from AMD without any complaints from Intel. AMD has a lot more latitude with its X86 64 bit ISA and a lot more relevance towards future computing than any 16/32 bit legacy will have going forward.

      P.S China will be a big OpenPower customer as will Google, and Google’s main-boards for its Power9’s will come from China, and most likely the Licensed Power9 custom Google variants also!

      • That is a good point. The x64 license is actually AMD IP, so I think they are not breaking any laws as far as export to China is concerned. Intel did not create it and the US government restriction is only on Intel chips being sold to Chinese defense universities.
