With so many chip startups targeting the future of deep learning training and inference, one might expect it would be far easier for tech giant Hewlett Packard Enterprise to buy versus build. However, when it comes to select applications at the extreme edge (for space missions in particular), nothing in the ecosystem fits the bill.
In the context of a broader discussion about the company’s Extreme Edge program focused on space-bound systems, HPE’s Dr. Tom Bradicich, VP and GM of Servers, Converged Edge, and IoT systems, described a future chip that would be ideally suited for high performance computing under the intense power and physical space limitations characteristic of space missions. To be clear, he told us as much as he could. Very little is known about the architecture, but there were some key elements he described.
First, the architecture, called the Dot Product Engine (dig out your math notes on vectors and dot products, since they are relevant here), is less of a full processor and more like an accelerator, one that offloads certain multiplication operations common in neural network inference and broader HPC applications. Bradicich says that the prototype as it stands can be standalone for certain problems but could eventually also be PCIe or direct sensor attached to cut down on bandwidth and latency constraints. From his first description, it sounds very much like a tiny version of Nvidia’s tensor core, which (at its simplest) handles workloads built on 4x4 matrix operations well.
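For readers who did not keep those math notes, here is a minimal Python sketch of the idea. It is purely illustrative and is not HPE’s or Nvidia’s implementation; the function names `dot` and `matmul` and the sample matrix are our own assumptions. It shows that a 4x4 matrix product is nothing more than 16 dot products, which is why hardware that computes dot products quickly speeds up this whole class of workloads.

```python
def dot(u, v):
    """Dot product: the sum of element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matmul(a, b):
    """Matrix product built only from dot products (rows of a with columns of b)."""
    columns_of_b = list(zip(*b))  # transpose b so each column is one tuple
    return [[dot(row, col) for col in columns_of_b] for row in a]


# Hypothetical sample data: a 4x4 matrix and the 4x4 identity matrix.
m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

product = matmul(m, identity)  # 4 rows x 4 columns = 16 dot products
print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
print(product == m)               # multiplying by the identity returns m
```

An accelerator of this kind takes over the inner `dot` step, leaving the host to feed it vectors and collect results.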
With the power consumption constraints of extreme edge environments, something like a Volta GPU with the TensorCore, for instance, would be far too energy hungry, not to mention that it would demand a larger form factor. Other devices from the chip startup world have other limitations as well. HPE decided to roll its own and has been heads down cobbling together an early software stack to support the DPEs for eventual productization (assuming that happens). As Bradicich tells us, “These days you have to make a lot of make versus buy decisions and today, we buy a lot of Intel processors and a lot of Nvidia GPUs. But to get what we want, we are seeing we can’t get the performance and energy we need for this and we need to take a ‘make’ approach.”
The tough part about the discussion was that HPE insisted on calling this a neural network processor, but how that moniker squares with what we think of as neural network architectures is still unclear. In some ways, it almost seems like HPE could tie this to the “neuromorphic” term more easily than to neural networks, since it is a non-von Neumann architecture that sounds like it might be based on memristor memory concepts (see some of our work on The Machine if this is unfamiliar). It also sounds like there could be an FPGA angle here, but HPE is unable to comment on specifics of the hardware architecture. What they did say, after a second round of questioning about the neural network angle, is that the term is being used in a broader way.
“DPE is not a neural network per se, in the sense that it’s not a fixed configuration, but rather is reconfigurable, and can be used for inference of several types of neural networks (DNN, CNN, RNN). Hence it can do neural network jobs and workloads,” Bradicich clarifies. “DPE executes linear algebra in the analog domain, which is more efficient than digital implementations, such as dedicated ASICs. And, further, it has the advantage of reconfigurability on the fly. It’s fast because it accelerates vector * matrix math, dot product multiplication, by exploiting Ohm’s Law on a memristor array. It can also be used for other types of operations, such as FFT, DCT, and convolution.”
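Since HPE has not disclosed the hardware, the following is only a textbook-style sketch of how an idealized memristor crossbar could compute a vector * matrix product. It is not a model of the actual DPE. The assumptions are ours: input voltages drive the rows, each cell stores a conductance, Ohm’s Law gives each cell’s current (I = V * G), and the currents on each column wire simply add up. The function names `crossbar_vmm` and `convolution_matrix` and all values are hypothetical, and the sketch ignores noise, wire resistance, and the fact that a physical conductance cannot be negative.

```python
def crossbar_vmm(voltages, conductances):
    """Idealized crossbar: voltages[i] drives row i, conductances[i][j] sits at
    row i, column j. Returns the summed current on each column wire."""
    if len(voltages) != len(conductances):
        raise ValueError("need one input voltage per crossbar row")
    currents = [0.0] * len(conductances[0])
    for i, v in enumerate(voltages):
        for j, g in enumerate(conductances[i]):
            currents[j] += v * g  # Ohm's Law per cell, summed along the column
    return currents


def convolution_matrix(kernel, n_inputs):
    """Matrix T such that (input vector) * T is the full 1-D convolution
    of the input with the kernel, so convolution becomes one crossbar pass."""
    n_outputs = n_inputs + len(kernel) - 1
    t = [[0.0] * n_outputs for _ in range(n_inputs)]
    for i in range(n_inputs):
        for k, w in enumerate(kernel):
            t[i][i + k] = w
    return t


# Hypothetical values: volts in, siemens stored in the cells, amperes out.
column_currents = crossbar_vmm([1.0, 2.0], [[0.5, 0.25],
                                            [0.25, 0.5]])
print(column_currents)  # [1.0, 1.25]

# Convolution expressed as the same vector * matrix operation.
convolved = crossbar_vmm([1.0, 2.0, 3.0], convolution_matrix([1.0, 1.0], 3))
print(convolved)  # [1.0, 3.0, 5.0, 3.0]
```

In this idealized picture, every multiply-accumulate in a column happens at once in the analog domain, and rewriting the stored conductances is what would make such an array reconfigurable. Transforms like the DFT and DCT can be written as matrix products in the same way.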
We put a similar question to HPE Labs’ Rebecca Lewington, who provided a bigger picture view of what a systems level take on this process looks like. “You need to look at the fleet of IoT devices holistically. We believe that if architected and connected, you would have a central learning engine that can gather the experiences of the entire fleet of devices. To train the neural network, you would retrain a centralized model and then push that model to the edge devices again, which is where the inference is done. That way, every device in your fleet has the ability to learn from every other device in your fleet. In the next generation, we would incorporate the Memory-Driven Computing architecture, which would enable us to attach task specific accelerators, like the Dot-Product Engine, to a centralized pool of memory. That means that we can complete the model training much more rapidly and more energy efficiently, as well as more flexibly because we can accommodate new learning frameworks as they become available.”
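The loop Lewington describes (gather at the center, retrain, push back out, infer at the edge) can be sketched in a few lines of Python. This is a toy under loud assumptions: the `EdgeDevice` class, the `retrain` and `push` functions, and the data are all hypothetical, and a one-variable least-squares line stands in for the neural network. Nothing here reflects HPE’s actual software.

```python
class EdgeDevice:
    """Hypothetical edge node: collects samples locally and runs inference only."""

    def __init__(self, name):
        self.name = name
        self.samples = []          # (x, y) observations gathered in the field
        self.weights = (0.0, 0.0)  # (slope, intercept) pushed from the center

    def observe(self, x, y):
        self.samples.append((x, y))

    def infer(self, x):
        slope, intercept = self.weights
        return slope * x + intercept


def retrain(devices):
    """Central learning step: pool every device's samples and fit
    y = slope * x + intercept by ordinary least squares."""
    pooled = [sample for device in devices for sample in device.samples]
    n = len(pooled)
    mean_x = sum(x for x, _ in pooled) / n
    mean_y = sum(y for _, y in pooled) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pooled)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pooled)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def push(devices, weights):
    """Distribute the retrained model to every device in the fleet."""
    for device in devices:
        device.weights = weights


# Two devices see different parts of the same relationship, y = 2x + 1.
a, b = EdgeDevice("a"), EdgeDevice("b")
a.observe(0.0, 1.0)
a.observe(1.0, 3.0)
b.observe(2.0, 5.0)
b.observe(3.0, 7.0)

push([a, b], retrain([a, b]))
print(a.infer(4.0), b.infer(4.0))  # both devices now predict 9.0
```

The point of the sketch is the data flow: device `a` never saw the samples device `b` collected, yet after one retrain-and-push cycle both run the same model fitted to the pooled data.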
When asked to put this chip architecture in the context of similar devices (TensorCore in the Volta GPU, Nervana, Graphcore, Wave Computing, etc.), Bradicich said it is similar to several existing chip designs in that it can handle vectors well. But he says the way it is designed, from a hardware and software co-design standpoint, makes it far faster than anything on the market and a much better fit for the performance, power, and space requirements of extreme edge environments. “We are not afraid to buy or partner for technology like this but there is nothing like this on the marketplace horizon.”
We asked what, if any, neural network frameworks this might run inference for, whether or not it can do training, and of course, whether it is really conventional neural networks we’re talking about here. We got no answers on any of those that point to anything we think of here at The Next Platform as true neural nets. Neuromorphic, perhaps, but neural networks seems a stretch.
We dislike being shy on technical details but wanted to set the stage for more information set to emerge on this architecture following the prototype demo in Spain at the end of the month during one of HPE’s events. The interesting part for now is that some of the tech giants that we guessed would have started snapping up deep learning chip startups really haven’t. The reason is becoming clear: although the architectures are promising, perhaps what is needed is something less general purpose across many machine learning algorithms and instead tailored to specific applications or environments. This takes us back to the premise we started with a few years ago: IT overall is moving from homogeneous mega-clouds, datacenters, and chip concepts to specialization, novel architectures, and fine-tuned approaches to real-world problems.