ARM has rearchitected its multi-core chips so they can better compete in a world where computing needs are becoming more specialized.
The new DynamIQ architecture will provide flexible compute, with up to eight different cores in a single cluster on a system on a chip. Each core can run at a different clock speed so a company making an ARM SoC can tailor the silicon to handle multiple workloads at varying power efficiencies. The DynamIQ architecture also adds faster access to accelerators for artificial intelligence or networking jobs, and a resiliency that allows it to be used in robotics, autonomous driving, or any other application where a device uses lots of compute locally rather than over the network.
Nandan Nayampally, vice president and general manager of ARM’s CPU Group, expects the new DynamIQ architecture to be released as part of the new generation of Cortex-A chips later this year and to first appear in products by 2018.
The new architecture is an evolution of ARM’s previous big.LITTLE architecture, which was introduced in 2011. The idea there was that a designer would pair a “big” processor that was powerful, but more power-hungry, with a “little” CPU that was able to perform some tasks without sucking up a phone’s battery life. The operating system would take advantage of these options by assigning the appropriate tasks to the right processor, and whichever one wasn’t in use would go to sleep. This big.LITTLE approach had advantages for the mobile phone world, where a task like sending a text required little CPU power, while editing a photo before posting it to Instagram required a lot more. But it was not made for the next big anticipated workload for computers (whether they are phones or robots). As we add more intelligence and computing to everyday objects, the ability to train and execute machine learning models is becoming essential.
This big.LITTLE architecture has not seen any adoption in datacenter processors, and the reason is obvious: servers are connected directly to power and the goal in the datacenter is to get them to do as much work as possible, all the time.
The interesting bit with regard to machine learning is that training usually happens in the datacenter (although one company, XNOR.ai, also maintains it will offer training on a device), while the inference, or execution, of the models happens both in the cloud and on specific devices. ARM anticipates that instructions it plans to add to the DynamIQ instruction set will boost AI performance on the SoC by 50X over the next three to five years, and that an upgraded bus will allow for 10X faster communication with an accelerator. This could mean that a lot of machine learning training shifts from datacenters to devices, and that could have radical effects on the growth of processing in the datacenter.
The new DynamIQ architecture will affect the use of Cortex-A series processors in phones, computers, networking gear, and other machines that require more robust processing. “On any Cortex-A device you now have eight CPUs (as opposed to four) which is good for high performance computing,” said Kevin Krewell, principal analyst at Tirias Research. “You also could use lower power cores for speed control and power management which is good for mobile phones.”
Krewell sees the new design being relatively weak for the sensors and wearables that can act as endpoints in the internet of things, since many of those run on the Cortex-M architecture. However, as those devices, or slightly smarter devices such as cameras, need to do more computer vision or implement other machine learning models, this technology could gain ground.
Krewell says this is a big program for ARM, and he expects the chip design firm to announce a new Cortex-A architecture based on DynamIQ even sooner than ARM’s annual tech conference, held in October.
It has reason to hurry. ARM’s decision to create a new architecture isn’t a surprise given the recognition among chipmakers that AI is the workload of the future. Nvidia was early in recognizing the opportunity AI offered for its massively multicore graphics processors, and Intel has been buying its way into this sector with more than $30 billion of acquisitions in the last three years. Intel’s $16.7 billion purchase of Altera in 2015 was its first foray into a new way of thinking about its silicon, and its acquisition of Nervana Systems last fall was another big move. Additionally, Intel purchased Movidius, which offers low-power, on-device computer vision for mobile platforms, and just this month said it would spend $15.3 billion on Mobileye, which provides chips that offer computer vision for automotive systems. ARM can’t afford to stand still as Intel and even Nvidia step up.
Today, ARM’s emphasis is on the flexibility that DynamIQ can provide, and it echoes Intel’s investment in FPGAs through the Altera deal. So while ARM cores still can’t compete on the raw performance a GPU or an x86 chip can offer, ARM is pitching the ability to customize the design and the new architecture’s fast access to a powerful accelerator to those who might care more about flexibility and power consumption than raw performance. As chips become more specialized and the emphasis on general purpose computing fades, ARM’s approach could turn out to be a good one, and this time around, even for servers.