Researchers at South Korea’s Electronics and Telecommunications Research Institute (ETRI), working with Arm, are one step closer to designing and deploying a homegrown CPU that can handle both double-precision supercomputing applications and low-precision, low-power AI inference. For a country whose HPC resources are all tied to Intel processors, this could be a significant development for future large systems, if performance and efficiency projections play out as planned.
Youngsu Kwon, of ETRI’s AI Processor Research Department, says the design goal was a device capable of 2.5X the performance of accelerators (GPUs in particular, which are increasingly common on supercomputers) with 60% lower power consumption, thanks to a proprietary power-gating architecture built into the software. The team also set out to create a software stack that could manage power consumption (temperature control, clock, mixed precision, and so on) and allow seamless switching between the built-in accelerators, the double-precision compute, and frameworks like PyTorch and TensorFlow via OpenMP and OpenCL.
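Taken together, those two targets imply a performance-per-watt figure. The short Python sketch below works it out under an assumption the article does not state explicitly: that both the 2.5X and the 60% figures are measured against the same GPU baseline on the same workload. The function name `relative_perf_per_watt` is hypothetical and exists only for this illustration.

```python
def relative_perf_per_watt(perf_ratio: float, power_reduction: float) -> float:
    """Return performance per watt relative to a baseline device.

    perf_ratio: performance relative to the baseline (2.5 means 2.5X).
    power_reduction: fractional power saving vs. the baseline (0.6 means 60%).
    Assumption: both numbers refer to the same baseline and workload.
    """
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    relative_power = 1.0 - power_reduction  # 60% less power -> 0.4 of baseline
    return perf_ratio / relative_power


if __name__ == "__main__":
    # ETRI's stated design targets: 2.5X performance at 60% less power.
    print(f"{relative_perf_per_watt(2.5, 0.6):.2f}x perf/W")  # 6.25x perf/W
```

Under that assumption the targets work out to roughly 6.25X the performance per watt of the GPU baseline; if the power saving were quoted against a different reference, the ratio would differ.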
The result of the design effort is the K-AB21 (AB stands for “artificial brain”). The team says it has managed to pack 16 teraflops into each CPU, nearly all of it delivered by the dense matrix cores (XPUs) on the chip. A rack will deliver up to 1,600 teraflops, which Kwon says paves a path to Korean exascale.
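The per-chip and per-rack figures can be sanity-checked with simple arithmetic: 1,600 teraflops per rack at 16 teraflops per chip implies about 100 chips per rack. The Python sketch below also estimates how many such racks an exaflop (10^6 teraflops) would take. It assumes perfectly linear scaling with no interconnect or efficiency losses, and that the quoted teraflops are comparable to the precision an exascale claim would be measured in; the article does not specify that precision, so the result is a rough peak-only lower bound, not a system design. The helper names are hypothetical.

```python
TFLOPS_PER_CHIP = 16            # per-CPU figure quoted by ETRI
TFLOPS_PER_RACK = 1_600         # per-rack figure quoted by ETRI
EXAFLOP_IN_TFLOPS = 1_000_000   # 1 EF = 10**18 FLOP/s = 10**6 TFLOP/s


def chips_per_rack(per_chip: int, per_rack: int) -> int:
    """Chips implied by the rack figure, assuming linear scaling."""
    return per_rack // per_chip


def racks_for_target(target_tflops: int, per_rack: int) -> int:
    """Smallest whole number of racks whose combined peak reaches the target.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    return -(-target_tflops // per_rack)


if __name__ == "__main__":
    print(chips_per_rack(TFLOPS_PER_CHIP, TFLOPS_PER_RACK))      # 100
    print(racks_for_target(EXAFLOP_IN_TFLOPS, TFLOPS_PER_RACK))  # 625
```

That comes to roughly 625 racks of peak compute, before any of the efficiency losses a real machine would see.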
“The focus should be on single chip performance for low power chips and systems. From there you can integrate more chips, increasing the performance and reducing power consumed. Also, the integration of CPUs and accelerators into a single chip will allow more bandwidth, which can remove the data bandwidth bottleneck.”
A closer look at the architecture is below, highlighting the processor die with multiple HBM2 dies for high-bandwidth reads and writes as well as DDR5 for expanded capacity. The HBM and processor are integrated via their own interposer scheme, with a hierarchical memory structure composed of the HBM2 and DDR5. The interesting feature here is the Arm “Zeus” cores, which are coupled with ETRI’s many-threaded, scalable XPU AI/HPC cores; these are essentially the matrix math units that provide those 16 teraflops.
In the image below, a backbone mesh network provided by Arm runs through the center of the die, with the Zeus cores along the top and bottom edges. The MMU600 blocks connect to the accelerators (XPUs). Each sub-block of the XPU is called an XEMC; each has its own caches, load-store units, double-precision units, and programmable cores that can execute multiple threads simultaneously.
Each tile integrates XEMC blocks with Zeus cores, and the design has four tiles, flanked at the bottom by PCIe 5.0 interfaces that can act as standard PCIe interfaces for interchip communication. DDR and HBM controllers are arranged around the die.
The group is still finalizing elements of the chip but expects it to be available by the end of 2021, possibly just in time to usher in a new era for Korean supercomputing with a natively built chip that frees the country from its dependence on Intel.
The largest supercomputer in production in South Korea today is Nurion, the 17th most powerful system on the planet. It is based at the Korea Institute of Science and Technology Information (KISTI) and is built entirely on U.S. technology, including Intel CPUs (among them Xeon Phi, which shows its age), and was integrated by Cray before HPE acquired the company. Interestingly, Cray has a strong presence in South Korea: all three of its Top500-ranked systems there are CPU-only Cray machines built on Intel processors, including Nuri (#138), used for weather forecasting, and Miri, a similar weather-modeling system, both at the country’s Meteorological Administration.
Since weather codes rarely use GPUs for acceleration and AI is still nascent in that field, an architecture like the K-AB21 might not be a good fit for the Meteorological Administration’s systems. It is, however, a plausible replacement for Nurion, especially since that system will likely be reaching the end of its lifespan at about the time the K-AB21 is fully available and tested.