We did not plan it, but today has become make-your-eyes-bleed-with-chip-architecture-patent-applications day. On that note, here is the TPU3 patent in its full glory. And while we are at it, an interesting look at how Facebook has re-architected the convolutional neural network as we know it.
But of particular interest now is a new patent application from researchers at IBM’s Almaden lab, which shows expanded efforts to make the company’s TrueNorth neuromorphic chips more suitable for deep learning as well as more scalable overall. We have been steadily following the path neuromorphic architectures have taken with the rise of new deep learning training and inference architectures, but until recently these efforts, IBM’s TrueNorth and Intel’s Loihi among them, have seemed more rooted in research than reality.
The full patent application, which appeared this week, also shows some of the first images to date of the structure and connection networks of the latest-generation TrueNorth devices. IBM has had silicon devices since 2014, but the chips are not publicly available. With similar architectures, which gain power advantages from a spiking neural network approach, now gaining traction for deep learning training and inference, one can reasonably assume the pace of development will pick up as IBM works to capture mindshare.
The real problem with TrueNorth, or any other neuromorphic architecture, is that it is esoteric from a programming and frameworks standpoint, and clicking several devices together has been a challenge from a development perspective. IBM appears to be working around those limitations with its programming language, its methods for partitioning problems across multiple devices, and its approach to handling spikes and how those are represented in the graph that becomes the neural network.
The patent pitch, which focuses on how graphs are partitioned and allocated to the cores and, more notably, across devices for scalability, also sheds light on how IBM converts signals to spikes for the neuron cores to process and how the Corelet Programming Language interface manages neural net input and output.
Among the elements meant to make the neuromorphic device more efficient and scalable is a new algorithm that determines placement, or how a neurosynaptic core in TrueNorth is mapped onto the chip or, now, across chips.
As the patent explains with stunning clarity (eyeroll), “A multi-chip neurosynaptic system may comprise a KxKxM configuration of neurosynaptic chips, where M corresponds to the number of KxK boards. The bandwidth between these chips is generally limited, according to the characteristics of the chips. To maximize the hardware throughput while minimizing the power consumption, a physical synthesis software is used to generate efficient core placement that minimizes the communication between cores across chips and maximizes communication between cores within each chip.”
In other words, scaling any chip without taking a performance hit is a tough problem, but they have cobbled together some code that automatically finds the best spot for each core on each chip and keeps the traffic between chips to a minimum, since the chips need to talk to one another over limited bandwidth. The idea is not new, but this is the first time we have seen how problems are mapped and partitioned onto the 4096 cores and how those cores communicate, even though the actual fabric is not described.
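To make the placement objective concrete, here is a minimal Python sketch of the idea in the quoted passage: given how much traffic flows between pairs of cores, choose a core-to-chip assignment that minimizes the traffic crossing chip boundaries. This is not IBM's physical synthesis software, which the application does not spell out at this level. The function names, the traffic table, and the exhaustive search are all our own assumptions, and brute force is only workable for a toy case like this one.

```python
from itertools import product

def cross_chip_traffic(traffic, placement):
    """Sum the traffic on core-to-core links whose endpoints sit on different chips."""
    return sum(w for (src, dst), w in traffic.items()
               if placement[src] != placement[dst])

def best_placement(traffic, cores, num_chips, capacity):
    """Try every core-to-chip assignment (toy sizes only) and keep the one
    with the least cross-chip traffic. Hypothetical helper, not IBM's method."""
    best, best_cost = None, None
    for assignment in product(range(num_chips), repeat=len(cores)):
        # Skip assignments that put more cores on a chip than it can hold.
        if any(assignment.count(chip) > capacity for chip in range(num_chips)):
            continue
        placement = dict(zip(cores, assignment))
        cost = cross_chip_traffic(traffic, placement)
        if best_cost is None or cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost

# Made-up traffic: two tightly coupled pairs joined by one light link.
traffic = {("a", "b"): 10, ("c", "d"): 10, ("b", "c"): 1}
placement, cost = best_placement(traffic, ["a", "b", "c", "d"],
                                 num_chips=2, capacity=2)
print(placement, cost)
```

In this toy case the search keeps each heavy pair on one chip and lets only the light link cross the boundary. A real tool working with thousands of cores per chip would need a graph partitioning heuristic rather than enumeration.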
Another new data point is that standard deep learning frameworks can be used and converted during offline training. “The trained networks are converted into corelets after training and the corelets are then converted into model files that can be then loaded onto the specific neurosynaptic substrate at hand. The model file stores the neuron configurations, crossbar states and neuron-axon connections for all the cores.”
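The application does not give the model file format, but the three things it says the file stores map naturally onto a small data structure. The Python sketch below is purely illustrative: the class name, the field names, and the use of JSON are our assumptions, not anything taken from the patent or from IBM's tooling.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CoreModel:
    """Hypothetical per-core record holding what the patent says a model file stores."""
    core_id: int
    neuron_configs: list  # one dict of neuron parameters per neuron (field names assumed)
    crossbar: list        # crossbar[axon][neuron] is 1 where a synapse is active
    targets: list         # targets[neuron] = [destination core, destination axon]

def dump_model(cores):
    """Serialize all cores into one text blob (JSON chosen only for illustration)."""
    return json.dumps({"cores": [asdict(c) for c in cores]}, sort_keys=True)

def load_model(text):
    """Rebuild the per-core records from the serialized form."""
    return [CoreModel(**c) for c in json.loads(text)["cores"]]

# A toy core with two axons and two neurons.
core = CoreModel(
    core_id=0,
    neuron_configs=[{"threshold": 1}, {"threshold": 2}],
    crossbar=[[1, 0], [0, 1]],
    targets=[[1, 0], [1, 1]],
)
restored = load_model(dump_model([core]))
print(restored[0] == core)
```

The point is only that a trained network, once converted, reduces to per-core tables that can be written out and loaded onto whatever neurosynaptic substrate is at hand.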
To explain this better, earlier in the application they note that “A deep convolution network of neurosynaptic cores can comprise of multiple layers of cores. It may be a feed-forward network comprising various type of layers such as convolution layers, splitter layers, or averaging (pooling) layers. Convolution layers perform three-dimensional convolution for a given patch size, stride, and group. In some embodiments, different TrueNorth cores are used for different topographic locations and groups. Such a construct provides natural sparsity in the network with convolution cores. Averaging layers may perform pooling on each feature map for a given patch size and stride. In some embodiments, a single TrueNorth core can pack large number of features from the same feature map, resulting in a densely-connected network. Similarly, in splitter layers, random choice of inputs for splitter cores may generate complex connections in the network.”
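For a sense of how patch size, stride, and group translate into core counts, here is a back-of-the-envelope Python sketch. It uses the standard formula for the number of patch positions along one dimension with no padding, and it assumes one core per topographic location per group, which is our simplified reading of the passage above rather than a figure from the application. The function names are hypothetical.

```python
def output_positions(size, patch, stride):
    """Number of patch positions along one dimension, assuming no padding."""
    if patch > size:
        return 0
    return (size - patch) // stride + 1

def cores_for_conv_layer(height, width, patch, stride, groups):
    """Core count if every (topographic location, group) pair gets its own core.
    This one-core-per-pair mapping is an assumption made for illustration."""
    locations = (output_positions(height, patch, stride)
                 * output_positions(width, patch, stride))
    return locations * groups

# A 32x32 input with 4x4 patches, stride 4, and 2 groups.
print(cores_for_conv_layer(32, 32, patch=4, stride=4, groups=2))
```

Under those assumptions, smaller strides or more groups multiply the core count quickly, which is one reason placement across chips matters.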
To us, this sounds a lot like what Graphcore is doing on the network side, mixed with what a spiking neural network hardware device like Brainchip might do once it’s released into the wild. It also shares much in common with Intel’s Loihi, based on the few architectural details we’ve been able to glean.
The point of this patent pointer is to show that neuromorphic architectures are finally coming to the fore and that IBM is working to bolster its device, which was the first of its kind (along with a handful of research prototypes at various universities), for the new onslaught of high-value deep learning applications. Being able to wrap a common deep learning framework on top, scale it across multiple high core count chips, and provide a useful programming interface are all keys to bringing a research product into the mainstream. No one has done that yet except Brainchip (although we are still waiting on silicon), but IBM could definitely bring such a product to market. Assuming, of course, there is a market for these by the time that happens.