China Intercepts U.S. Restrictions with Homegrown Supercomputer Chips

When the U.S. blocked exports of Intel chips for key Chinese supercomputers, the expectation was that the move would do little more than hasten the development of native architectures, effectively accelerating the timeline for Chinese chipmakers to find their own solutions for large-scale computing without Intel.

It appears that this is exactly what has happened. And far sooner than most might have thought.

Dr. Yutong Lu, system designer for the successive iterations of the Tianhe machines, revealed this week that the Tianhe-2 supercomputer, which once again took the number one spot by a wide margin on the biannual Top 500 list of the world's most powerful supercomputers, will be upgraded next year. Because of the trade restrictions, however, the team will not be boosting the machine with more Xeon Phi cores. Instead, a novel architecture developed by the team will deliver the extra 45 petaflops the system needs to continue its reign at the top of the list for the foreseeable future.

A rapt audience, including The Next Platform, listened during a session at the International Supercomputing Conference in Germany as Lu outlined the digital signal processor (DSP) basis for the new chips that will extend the system, to be called Tianhe-2A, within the next year rather than by the end of 2015 as originally planned. There had been much speculation that China would have Tianhe-2A ready in time for the next Top 500 benchmarking round, which would have given the world its first machine capable of 100 petaflops peak performance, and by a rather long shot.

Dr. Lu has overseen the evolution of the Tianhe machines, beginning with Tianhe-1A, which took the world by surprise when it claimed the top spot in 2010, and continuing with Tianhe-2, which toppled the dominant Titan system at Oak Ridge National Laboratory in 2013. She told the audience this week that the NUDT team believes in the future of heterogeneous architectures and will move ahead with the upgrade as planned using this new accelerator. Given that the upgraded machine is expected to receive its new chips within a year, the accelerator has likely been in development at NUDT for some time.

As the chart Lu presented below shows, the upgrade path for the Tianhe machines has been clear for some time, including the expectation that the complete system would land in the 100 petaflop range for peak performance. The difference between this chart and earlier versions is that the “China Accelerator,” rather than the latest-generation Xeon Phi chips, is now the powerhouse of the machine.
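As a rough check on the "100 petaflop range" figure, the sketch below adds the 45 petaflops Lu quoted to Tianhe-2's existing theoretical peak. The 54.9 petaflop baseline is Tianhe-2's Rpeak as listed on the Top 500, not a number from Lu's talk, and the sketch assumes the quoted 45 petaflops is a net gain on top of that baseline.

```python
# Rough arithmetic for the Tianhe-2 -> Tianhe-2A theoretical peak, in petaflops.
# Assumption: 54.9 PF is Tianhe-2's Top 500 Rpeak (not stated in the article);
# the 45 PF figure is treated as a net gain on top of it.
TIANHE2_RPEAK_PF = 54.9
UPGRADE_GAIN_PF = 45.0


def upgraded_peak(baseline_pf: float, gain_pf: float) -> float:
    """Return the theoretical peak after adding a net gain to a baseline."""
    return baseline_pf + gain_pf


peak = upgraded_peak(TIANHE2_RPEAK_PF, UPGRADE_GAIN_PF)
print(f"Tianhe-2A peak ~ {peak:.1f} PF")  # prints: Tianhe-2A peak ~ 99.9 PF
print(f"Gain over Tianhe-2 ~ {peak / TIANHE2_RPEAK_PF:.2f}x")
```

Under these assumptions the upgraded system lands just under 100 petaflops peak, a bit less than double the current machine.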

The Tianhe-2 machine (and its eventual successor sporting the DSP accelerators) was developed by the National University of Defense Technology (NUDT) in China and is housed at the National Supercomputer Center in Guangzhou. As a side note, NUDT is a major defense and national security research center, and DSPs are being used here as co-processors in a high performance system, complete with the necessary software stack and programming environment. Together, these suggest the technology has likely been in the works for military and defense systems within China. DSPs are frequently used in embedded military applications, including remote sensing and radar, and Lu told the group this week that NUDT has extensive experience with DSPs.

We will cover the new DSP-based architecture in a separate article this afternoon once more information has been compiled (UPDATE: the architectural feature can now be found here). For now, it is notable that the chips are neither Chinese-developed GPUs nor coprocessors from one of the several Chinese chipmakers that offer on-package acceleration built from familiar elements.

What is interesting is that the large Tianhe-1A and Tianhe-2 machines used processors from U.S. companies, including Intel and NVIDIA, and the site had planned to continue this trend with future upgrades based on next-generation Intel chips. Exactly what sparked the trade restrictions is still unknown (and it is not as though the U.S. lacks a keen sense of which applications run on the systems it ships to China). Still, the center appears set to use the latest-generation Xeon “Haswell” host processors alongside its China Accelerator, although by the time the upgraded machine emerges it could well sport a processor from ShenWei or another Chinese chip design and manufacturing house.

Lu says the NUDT team is still on track with key applications that consume large numbers of the machine’s 3,120,000 cores (each node combines two Intel “Ivy Bridge” processors with three Xeon Phi coprocessors). She pointed to successes in computational fluid dynamics (CFD), the most popular application area on Tianhe-2. In workloads ranging from scramjet combustion to simulations of large passenger and cargo aircraft, the teams have scaled their codes past the million-core mark with parallel efficiency close to 80%, Lu says.
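The 3,120,000-core figure can be reconstructed from the per-node layout. The sketch below assumes the configuration given in Tianhe-2's public Top 500 documentation (16,000 nodes, 12-core Ivy Bridge Xeons, 57-core Xeon Phi cards); none of those three numbers come from Lu's talk. It also shows the common definition of strong-scaling parallel efficiency behind a figure like "close to 80%". The core counts and timings in the usage line are hypothetical, not Tianhe-2 measurements.

```python
# Reconstruct Tianhe-2's core count from its node layout.
# Assumptions (not from the article): 16,000 nodes, 12 cores per Ivy Bridge
# CPU, 57 cores per Xeon Phi coprocessor.
NODES = 16_000
CPUS_PER_NODE, CORES_PER_CPU = 2, 12
PHIS_PER_NODE, CORES_PER_PHI = 3, 57

cores_per_node = CPUS_PER_NODE * CORES_PER_CPU + PHIS_PER_NODE * CORES_PER_PHI
total_cores = NODES * cores_per_node
print(cores_per_node, total_cores)  # prints: 195 3120000


def parallel_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Strong-scaling efficiency of a p-core run relative to a p_ref-core run.

    E = (t_ref * p_ref) / (t_p * p). A value of 1.0 means perfect scaling:
    going from p_ref to p cores cut the runtime by exactly p / p_ref.
    """
    return (t_ref * p_ref) / (t_p * p)


# Hypothetical run: 8x more cores, runtime drops from 1000 s to 156.25 s.
print(parallel_efficiency(1000.0, 131_072, 156.25, 1_048_576))  # prints: 0.8
```

In this made-up example, eight times the cores delivers a 6.4x speedup, which is what 80% parallel efficiency means.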

Other research areas, including genomics (Lu described population genetics and biomedical applications in detail), are also running on Tianhe-2 with unprecedented levels of scalability and parallel efficiency, validating the “neo-heterogeneous” approach (combining SMP processors with manycore accelerators) as the path forward for China’s supercomputers.

With the rollout of Tianhe-2A sometime next year, Lu says China will expand its capabilities in Hadoop and Spark as large-scale analytical platforms, as well as build out the Kylin Cloud—a multidisciplinary cloud platform that combines multiple research areas aimed at wellness (healthcare, social, public policy, etc.).

Although China has far and away the most powerful supercomputer on the planet, the number of Chinese systems on the list has dropped significantly over the past year. In November 2014, the country claimed 61 systems on the Top 500; that number has plummeted to 37 with the retirement of several systems near the bottom of the list. The United States is holding steady with 230 machines on the Top 500. That is an impressive number, but it is also the fewest U.S. systems on the list apart from a single earlier dip (to 226) in the early 2000s.
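To put "plummeted" in numbers, the short sketch below computes percentage changes from the system counts quoted above. The helper name `pct_change` is purely illustrative.

```python
# Percentage change in Top 500 system counts, using the figures quoted above.
def pct_change(before: int, after: int) -> float:
    """Return the percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


china = pct_change(61, 37)     # November 2014 list -> current list
us_vs_low = pct_change(226, 230)  # early-2000s low -> current U.S. count
print(f"China: {china:.1f}%")                       # prints: China: -39.3%
print(f"U.S. vs. early-2000s low: {us_vs_low:+.1f}%")  # prints: U.S. vs. early-2000s low: +1.8%
```

China lost roughly two in five of its listed systems in a year, while the U.S. count sits less than 2% above its previous low.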

Even so, China sees a path forward and is working toward getting there without American chip vendors like Intel. Over the last ten years, China has achieved a 1000x increase in system performance. That may represent a competitive threat to the United States, or a national security one (no one is entirely sure which precipitated the restriction), but it also creates a new set of conditions for U.S. chip manufacturers, who could see their business shrink in a region that is making big investments in big systems.
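A 1000x gain over ten years is easier to grasp as a yearly rate. The sketch below converts it into a compound annual growth factor and the implied doubling time; it assumes smooth exponential growth, which real system deployments, arriving in discrete jumps, do not follow.

```python
import math

# Compound annual growth implied by a 1000x performance gain over 10 years.
# Assumption: growth is treated as smooth and exponential.
TOTAL_GAIN = 1000.0
YEARS = 10

annual_factor = TOTAL_GAIN ** (1 / YEARS)              # about 1.995x per year
doubling_time = math.log(2) / math.log(annual_factor)  # in years
print(f"{annual_factor:.3f}x per year, doubling every {doubling_time:.2f} years")
```

In other words, China's peak system performance has roughly doubled every year over the decade.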

Full details on the specifics of the DSP-based processor can be found here in our follow-up.

BlackDove says:

July 15, 2015 at 8:26 pm

The slide shows that with a 12% increase in the number of nodes, almost no increase in power consumption and the same Ivy Bridge E5s, that they can double the peak theoretical FLOPS of the machine.

However, Tianhe-2 is only about 55% efficient, with 33 PFLOPS in HPL, and considering its size and power consumption it’s not very good at HPCG (which is becoming much more relevant for pre-exascale systems).

I’m immediately skeptical of those architectural changes and their increases in peak FLOPS translating into almost 2x real application performance. Might be good for bragging rights if it translates into good HPL numbers though.

I don’t think you really have to wonder why they blocked the export of Knights Hill either. Some US company or agency is almost constantly detecting and dealing with intrusions involving China that have the purpose of industrial espionage.

Just giving them the latest technology to reverse engineer would be a ridiculous idea. Especially considering that the west in general is losing its competitive edge by outsourcing everything.

- Industrial espionage says:
  
  May 12, 2016 at 12:06 pm
  
  Industrial espionage, the same stuff the U.S. is, again, the leader of.
  
Nicole Hemsoth says:

January 6, 2022 at 10:47 pm

Even the dumbest comments work in German.

“Playing ‘War’” | The Chinese Executioner | der Freitag | www.Duwir.com: No one is more of a slave than one who believes himself free without being so.
