There are several competing processor efforts targeting deep learning training and inference, but even for these specialized devices, the old performance ghosts that haunt other areas of computing haunt machine learning as well.
Some believe that the way around the specter of a slowing Moore’s Law, the end of Dennard scaling, and data movement limitations is to think outside of standard chip design and look to the human brain for inspiration. This idea has been tested over the years with various brain-inspired computing devices, IBM’s TrueNorth chips among them. However, deep learning presents new challenges for such devices, both architecturally and in terms of software development.
With GPUs dominating neural network training in particular (and other devices vying to tackle training and inference on the same device), the question is where brain-inspired approaches might fit in the market and what advantages they might have over GPUs, even in theory (or in software, for that matter).
We have already covered much of this work as it pertains to recent neuromorphic computing advances for deep learning and other areas, but there is still a long road ahead to commercializing these devices at scale. There are many competing efforts on the device side, everything from circuits to full-fledged chips, which means disparate research directions rather than a concerted effort, and most of these efforts can demonstrate only a limited number of neurons at that.
To begin answering these questions, a team from IBM Research has put together a detailed overview of the past, present, and future of neuromorphic devices and then tied that review to the requirements of deep learning, with an emphasis on what an analog memory device would need in order to speed up deep learning workloads. The team takes a detailed look at a number of memory approaches, including non-volatile memories such as Resistive RAM (RRAM) and memristors, Phase Change Memory (PCM) devices, Li-ion-based devices, and capacitor-based and other CMOS devices. In addition to these efforts focused on integrating analog resistive-type electronic memories onto CMOS wafers, the team also looks at photonic devices and systems and how these might fit into the deep learning landscape.
Of course, IBM has its own work to tout on this front, beginning with the aforementioned TrueNorth architecture and leading up to current work set to rival Intel’s emerging neuromorphic efforts. The authors acknowledge that most shops deploying deep learning are not concerned with neuromorphic computing, as they are more focused on “maximizing performance while finessing the limitations of commercially available hardware, VN [von Neumann] hardware, which up until recently has meant hardware that was originally designed for something other than deep learning.” They also note that some of the custom ASICs designed for deep learning are neuromorphic in nature, but many of these are “re-imagining the GPU as if it had been expressly designed for deep learning.”
“Leveraging conventional digital circuit design techniques, numerous design teams are seeking to deliver hardware acceleration for high energy-efficiency, high throughput (Tbit/second), and low latency DNN computation without sacrificing neural network accuracy [37,40]. Thus it is critical for researchers working on analog-memory-based hardware accelerators to both understand and take into account the advances that can be expected to arrive soon with such digital accelerators.”
Digital accelerators for deep learning are limited by data movement in particular, which is why the IBM team sees analog approaches built on non-volatile and other memory technologies as likely to succeed on applications unlike image processing, the area where GPUs already shine.
Analog memory-based deep learning accelerators can work well on LSTMs and GRUs, for instance, the team argues. “The heart of any analog-based accelerator is a memory array that can store the values of the weight matrix in an analog fashion. Weights are encoded into device conductances (i.e. the reciprocal of resistance), typically using NVM devices. In analog-based accelerators, the MAC [multiply-accumulate] operations within each VMM [vector-matrix multiplication] are performed in parallel at the location of the data, using the physics of Ohm’s Law and Kirchhoff’s Current Law. This can completely eliminate the need to move weight data at all.”
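To make the Ohm’s Law and Kirchhoff’s Current Law point concrete, here is a minimal, idealized sketch in Python; it is an illustration, not IBM’s design. It assumes each signed weight is stored as the difference of two conductances (a differential device pair, one common scheme), that inputs are applied as read voltages on the rows, and it uses hypothetical values for the maximum conductance (g_max) and read voltage (v_read). All function names are made up for this example. Each device passes a current I = G × V, and the currents on each column wire simply add, so every column delivers one multiply-accumulate result while the weights stay in place.

```python
def weights_to_conductances(weights, g_max=25e-6):
    """Map a signed weight matrix (rows = inputs, columns = outputs) onto two
    non-negative conductance matrices, in siemens, such that
    G_plus - G_minus = scale * W.

    Assumptions: one differential device pair per weight and a hypothetical
    maximum device conductance g_max; real devices and mappings differ.
    """
    w_max = max(abs(w) for row in weights for w in row)
    if w_max == 0:
        raise ValueError("weight matrix is all zeros")
    scale = g_max / w_max
    g_plus = [[max(w, 0.0) * scale for w in row] for row in weights]
    g_minus = [[max(-w, 0.0) * scale for w in row] for row in weights]
    return g_plus, g_minus, scale


def crossbar_column_currents(conductances, voltages):
    """Ideal crossbar read: each device passes I = G * V (Ohm's law), and the
    currents of all devices on one column wire add up (Kirchhoff's current law)."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def analog_vmm(weights, x, v_read=0.2, g_max=25e-6):
    """Compute x @ W in one parallel analog step, then convert the column
    currents back into numbers. The weights never leave the array."""
    g_plus, g_minus, scale = weights_to_conductances(weights, g_max)
    voltages = [xi * v_read for xi in x]  # inputs encoded as read voltages
    i_plus = crossbar_column_currents(g_plus, voltages)
    i_minus = crossbar_column_currents(g_minus, voltages)
    return [(ip - im) / (scale * v_read) for ip, im in zip(i_plus, i_minus)]
```

For example, analog_vmm([[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]], [1.0, 0.5, -2.0]) returns approximately [3.0, -3.875], the same result as the digital product of the input vector and the weight matrix. Signed inputs are mapped directly to signed voltages here, which is a simplification, and real arrays add noise, wire resistance, and device drift that this ideal model ignores.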
“Conventionally, NVM devices are used as digital memory devices. A high conductance or SET state might represent a digital ‘1’ and a low conductance or RESET state might represent a ‘0.’ In a crossbar array of such memory cells (Figure 5), access devices allow addressing of a single memory cell by appropriate activation of word- and bit-lines, for reading device conductance to retrieve stored data and for programming device conductance to update the stored digital data values. Such an NVM array can readily be used as an accelerator for deep neural networks.”
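The addressing scheme in that passage can be sketched the same way. The toy model below is an illustration rather than a description of any particular IBM array: the SET and RESET conductances, the read voltage, and the function names are hypothetical, and the access devices are assumed to isolate unselected rows perfectly. Reading one bit activates a single word line and compares the bit-line current against a midpoint reference; activating every word line at once turns that same bit-line current into a sum over the whole column, which is the property an accelerator exploits.

```python
SET_G = 50e-6    # hypothetical SET (high-conductance) state in siemens: a '1'
RESET_G = 1e-6   # hypothetical RESET (low-conductance) state in siemens: a '0'
V_READ = 0.2     # hypothetical read voltage in volts


def make_array(rows, cols):
    """A crossbar with every cell initialized to the RESET ('0') state."""
    return [[RESET_G] * cols for _ in range(rows)]


def program(array, row, col, bit):
    """Program one addressed cell to SET for a 1 or RESET for a 0."""
    array[row][col] = SET_G if bit else RESET_G


def bit_line_current(array, active_rows, col, v_read=V_READ):
    """Current sensed on bit line `col` when only the word lines in
    `active_rows` are activated. Access devices on every other row are
    assumed to be fully off, so those cells contribute nothing."""
    return sum(array[r][col] * v_read for r in active_rows)


def read(array, row, col):
    """Digital read: activate one word line, sense one bit line, and compare
    the current with a reference halfway between the SET and RESET currents."""
    reference = V_READ * (SET_G + RESET_G) / 2
    return 1 if bit_line_current(array, [row], col) > reference else 0
```

Calling read(array, row, col) isolates a single cell, while bit_line_current(array, range(len(array)), col) returns the summed current of the entire column.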
The IBM team concludes that even with the right networks defined for these approaches, early efforts to demonstrate equivalent accuracy at commercially interesting scales would likely involve either mixed hardware-software approaches or simulations that capture real device behavior with reasonable accuracy. They add that such experiments would greatly benefit from hardware-aware simulation frameworks that could map networks from deep learning platforms such as TensorFlow and Caffe onto real systems. “Small-scale hardware demonstrations investigating the core critical modules or tasks that will eventually be an integral part of a large-scale system (for example, implementing parallel vector-vector multiplies needed in LSTMs, or block-wise reconfigurable routing) will be an extremely important stepping stone.”
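As a rough illustration of what “hardware-aware” simulation means, the sketch below perturbs a weight matrix the way a non-ideal analog array might and measures how far the resulting vector-matrix product drifts from the ideal one. The two non-idealities modeled (a limited number of evenly spaced conductance levels and Gaussian programming noise), their parameter values, and all function names are assumptions chosen for illustration; the sketch does not use TensorFlow, Caffe, or any real simulation framework.

```python
import random


def program_with_nonidealities(weights, levels=32, noise_std=0.02, seed=0):
    """Return the weights a hypothetical non-ideal analog array would hold.

    Illustrative assumptions: weights are normalized by their largest
    magnitude, snapped to `levels` evenly spaced conductance levels spanning
    [-1, 1], then perturbed by Gaussian programming noise whose standard
    deviation `noise_std` is a fraction of full scale.
    """
    rng = random.Random(seed)
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    step = 2.0 / (levels - 1)
    programmed = []
    for row in weights:
        new_row = []
        for w in row:
            level = round((w / w_max + 1.0) / step) * step - 1.0  # nearest level
            new_row.append((level + rng.gauss(0.0, noise_std)) * w_max)
        programmed.append(new_row)
    return programmed


def matvec(x, weights):
    """y_j = sum_i x_i * W_ij, the operation the analog array performs."""
    return [sum(xi * row[j] for xi, row in zip(x, weights))
            for j in range(len(weights[0]))]


def relative_error(ideal, actual):
    """L2 norm of the difference, relative to the L2 norm of the ideal output."""
    diff = sum((a - b) ** 2 for a, b in zip(ideal, actual)) ** 0.5
    norm = sum(a ** 2 for a in ideal) ** 0.5
    return diff / norm if norm else diff


if __name__ == "__main__":
    rng = random.Random(1)
    W = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(64)]
    x = [rng.uniform(-1, 1) for _ in range(64)]
    ideal = matvec(x, W)
    for noise in (0.0, 0.02, 0.1):
        noisy = matvec(x, program_with_nonidealities(W, levels=32, noise_std=noise))
        print(f"noise_std={noise}: relative error {relative_error(ideal, noisy):.4f}")
```

Sweeping levels and noise_std in a model like this, and then measuring full-network accuracy instead of the error of a single product, is the kind of experiment such frameworks would make realistic by plugging in measured device behavior.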
“At the circuit and micro-architecture levels, there are several open avenues of research. For instance, analog-to-digital converters at the edge of each array might provide the maximum flexibility for implementing arbitrary neuron functions in the digital domain. However, the tradeoffs in power and performance need to be carefully quantified. There may still be good use cases for hybrid analog-digital chips, e.g. in the area of memcomputing. Memcomputing applications that could use the exact same hybrid chips designed for Forward-Inference of Deep Learning acceleration would be particularly attractive.”
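The flexibility argument for analog-to-digital converters at the edge of each array can be illustrated with an idealized model. In the sketch below, each column current is digitized by a uniform ADC with a hypothetical full-scale range and bit width, after which any neuron function can be applied in ordinary digital code. The function names and parameters are assumptions for illustration. The bit width sets the worst-case quantization error (half a step); the power and area cost of those extra bits, which the authors say must be carefully quantified, is not modeled here.

```python
import math


def adc(current, full_scale, bits):
    """Idealized uniform ADC.

    Clips a signed column current to [-full_scale, +full_scale], maps it to the
    nearest of 2**bits output codes, and returns that code rescaled to [-1, 1]
    (i.e. in units of full scale).
    """
    max_code = 2 ** bits - 1
    clipped = max(-full_scale, min(full_scale, current))
    code = round((clipped + full_scale) / (2 * full_scale) * max_code)
    return 2 * code / max_code - 1


def digital_neurons(column_currents, full_scale, bits, activation=math.tanh):
    """Digitize each column current, then apply an arbitrary neuron function
    (any Python callable) in the digital domain."""
    return [activation(adc(i, full_scale, bits)) for i in column_currents]


def hard_sigmoid(x):
    """Another neuron function, swapped in without touching the analog array."""
    return max(0.0, min(1.0, 0.5 * x + 0.5))
```

For instance, digital_neurons(currents, full_scale=5e-6, bits=8) applies tanh, and passing activation=hard_sigmoid changes the neuron function with no change to the analog hardware. Dropping from 8 to 4 bits raises the worst-case quantization error for in-range currents from about 0.4% to about 6.7% of full scale.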
The full review of existing devices and analysis from IBM can be found here.