China Intercepts U.S. Restrictions with Homegrown Supercomputer Chips
July 15, 2015 Nicole Hemsoth
When the U.S. blocked exports of Intel chips for key Chinese supercomputers, the expectation was that it would do little more than hasten the development of native architectures, effectively accelerating the timeline for Chinese chipmakers to find their own solutions to large-scale computing without Intel.
It appears that this is exactly what has happened. And far sooner than most might have thought.
Dr. Yutong Lu, system designer for multiple iterations of the Tianhe machines, revealed this week that the Tianhe-2 supercomputer, which took yet another number one placement by a wide margin on the twice-yearly Top 500 list of the most powerful supercomputers, will receive its upgrade within the next year. However, due to the trade restrictions, the team will not be boosting the machine with more Xeon Phi cores. Rather, a novel architecture they developed will deliver the extra 45 petaflops the system needs to continue its reign at the top of the list for the foreseeable future.
A rapt audience, including The Next Platform, listened during a session at the International Supercomputing Conference in Germany as Lu outlined the digital signal processor (DSP) basis for the new chips. These will power Tianhe-2A (the name of the upgraded system), which is now due within the next year instead of by the end of 2015, as was originally planned. In fact, there had been much speculation that China would have Tianhe-2A ready in time for the next benchmarking round to determine Top 500 placement, meaning the world might have seen the first (by a rather long shot) machine capable of 100 petaflops peak.
Dr. Lu has overseen the evolution of the Tianhe machines, beginning with the Tianhe-1A supercomputer, which took the world by surprise in 2010 when it toppled the then-dominant Jaguar system at Oak Ridge National Laboratory. She told the audience this week that the team at NUDT believes in the future of heterogeneous architectures and will move ahead as planned with the upgrade using this new accelerator. One can only imagine that the accelerator must have been in development at NUDT for some time if the upgraded machine can have its new chips within one year.
As seen in the chart Lu presented below, the upgrade path for the Tianhe machines has been clear for some time, including the fact that the complete system would be around the 100 petaflop range for peak possible performance. The difference between the chart presented below and the one that has been seen before is that instead of the latest generation Xeon Phi chips as accelerators, the “China Accelerator” is the powerhouse of the machine.
The Tianhe-2 machine (and its eventual successor sporting the DSP accelerators) is housed at the National University of Defense Technology (NUDT) in China. As a side note, the center is a major defense and national security research institution, and DSPs are being used there as co-processors in high performance systems, complete with the software stack and programming environments needed to support them. This suggests the technology has likely been in the works for military and defense systems within China. DSPs are frequently used in embedded military applications, including remote sensing and radar, and Lu did tell the group this week that NUDT has extensive experience with DSPs.
While we will cover the new DSP-based architecture in a separate article this afternoon once more information has been compiled (UPDATE: the architectural feature can now be found here), it is notable that the chips are not Chinese-developed GPUs or coprocessors from one of several possible Chinese chipmakers that provide on-package acceleration with familiar elements.
What is interesting is that the large Tianhe-1 and Tianhe-2 machines used processors from U.S. companies, including Intel and NVIDIA, and the site was planning to continue this trend with future upgrades featuring next-generation Intel chip technology. Exactly what sparked the trade restrictions is still unknown (and it is not as though the U.S. lacks a keen sense of what applications run on the systems it ships to China). It does appear that the center is still set to use the latest generation Xeon “Haswell” host processors alongside the China Accelerator, although it is quite possible that by the time the upgraded machine emerges it will sport a processor from ShenWei or another Chinese chip design and manufacturing house.
Lu says that the team at NUDT is still on track to continue work on key applications that consume large numbers of the machine’s 3,120,000 cores (each node combines two Intel “Ivy Bridge” processors and three Xeon Phi coprocessors). She pointed to application successes in computational fluid dynamics (CFD), the most popular code area on Tianhe-2. From scramjet combustion to large passenger and cargo aircraft simulations, the teams have scaled their codes past the million-core mark with parallel efficiency of close to 80%, Lu says.
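As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The node count (16,000) and per-chip core counts (12 per Ivy Bridge Xeon, 57 per Xeon Phi) are assumptions drawn from the published Tianhe-2 configuration rather than from Lu's talk, and the function names are purely illustrative.

```python
# Assumed figures from the published Tianhe-2 configuration (not from Lu's talk).
NODES = 16_000
XEON_PER_NODE, XEON_CORES = 2, 12   # Intel "Ivy Bridge" host processors
PHI_PER_NODE, PHI_CORES = 3, 57     # Xeon Phi coprocessors


def total_cores(nodes: int = NODES) -> int:
    """Cores across the machine: host plus coprocessor cores on every node."""
    per_node = XEON_PER_NODE * XEON_CORES + PHI_PER_NODE * PHI_CORES
    return nodes * per_node


def effective_cores(cores: int, efficiency: float) -> float:
    """Ideal-core equivalent of a run, taking efficiency as speedup / cores."""
    return cores * efficiency


if __name__ == "__main__":
    print(total_cores())                     # 3120000
    print(effective_cores(1_000_000, 0.80))  # roughly 800,000 ideal cores
```

Under these assumptions, a run on one million cores at 80% parallel efficiency does about as much useful work as 800,000 perfectly scaling cores would.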
Other research areas, including genomics (population genetics and biomedical applications are two that Lu described in detail), are also running on Tianhe-2 with unprecedented levels of scalability and parallel efficiency, validating the “neo-heterogeneous” approach (the combination of SMP processors and manycore accelerators) as the continued path forward for China’s supercomputers.
With the rollout of Tianhe-2A sometime next year, Lu says China will expand its capabilities in Hadoop and Spark as large-scale analytical platforms, as well as build out the Kylin Cloud—a multidisciplinary cloud platform that combines multiple research areas aimed at wellness (healthcare, social, public policy, etc.).
Although China has what is far and away the most powerful supercomputer on the planet, the number of Chinese systems on the list has dropped significantly over the last year. In November of 2014, the country claimed 61 systems on the Top 500, a number that has plummeted to 37 with the retirement of several systems that were toward the bottom. The United States is holding steady with 230 machines on the Top 500. That is an impressive number, but it is the fewest U.S. systems on the list apart from one other dip (down to 226) in the early 2000s.
Even so, China sees a path forward, and is working toward one that does not depend on American chip vendors like Intel. Over the last ten years, China has delivered a 1000x increase in system performance. While that may represent a competitive threat to the United States, or a national security one (no one is entirely sure which precipitated the restriction), it creates a new set of conditions for U.S. chip manufacturers, who could see their business shrink in a region that is making big investments in big systems.
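For a sense of scale, the 1000x figure over ten years can be converted into an annualized rate. The sketch below assumes steady compound growth, which is a simplification (real systems arrive in large steps), and the helper names are illustrative only.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over years."""
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor: float, years: float) -> float:
    """Years needed to double performance at that constant compound rate."""
    return years * math.log(2) / math.log(total_factor)


if __name__ == "__main__":
    # A 1000x gain in 10 years works out to just under 2x per year.
    print(round(annual_growth_factor(1000, 10), 2))
    print(round(doubling_time_years(1000, 10), 2))
```

In other words, under that assumption a 1000x gain in a decade amounts to performance roughly doubling every year.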