Micron has a habit of building interesting research prototypes that offer a vague hope of commercialization for the sheer purpose of learning how to make its own memory and storage subsystem approaches more tuned to next generation applications.
We saw this a few years ago with the Automata processor, a neuromorphic-inspired piece of hardware focused on large-scale pattern recognition. That project has since been folded internally and moved into a privately funded startup aiming to make it market ready. That was a couple of years ago, and it has all but disappeared from view since.
There is more here for anyone interested in the Automata architecture, but for those curious about why Micron wants to get into the accelerator business with one-off silicon projects like that or its newly announced deep learning accelerator (DLA) for inference, it’s far less about commercial success than it is about learning how to tune memory and storage systems for AI on custom accelerators. In fact, the market viability of such a chip would be a delightful bonus, since the real value is getting a firsthand understanding of what deep learning applications need out of memory and storage subsystems.
This deep learning accelerator might be counted among those on the market (and that’s a list too long to keep these days) but we do not expect the company to make a concentrated push to go after a large share. This is for the same reasons we don’t expect much to emerge into IBM’s product line from its research divisions. They are all efforts to build better mainstream products. If there is commercial gain, great, but it is not the wellspring of motivation.
Nonetheless, it is worth taking a quick look at what Micron has done with its inference accelerator since it could set the tone for what we may see in other products functionally, especially for inference at the edge.
Last year, Micron bought a small FPGA-based startup spun out of Purdue University called FWDNXT (as in “Forward Next”). It also acquired the FPGA startup Pico Computing in 2015 and has since been hard at work looking for where reprogrammable devices will fit in future applications and what to bake into memory to make them perform better and more efficiently.
The FWDNXT technology is at the heart of Micron’s new FPGA-based deep learning accelerator, which gets some added internal expertise from Micron via the Pico assets. The architecture is similar to what we’ve seen in the market over the last few years for AI: a sea of multiply/accumulate units geared toward matrix-vector multiplication, plus the ability to do some of the key non-linear transfer functions. Micron took the FWDNXT platform against some tough problems and worked to do things like build tensor primitives inside the memory (so instead of floating point based scatter/gather, they could fetch a matrix sitting in a buffer rather than going out over memory). They have also used the platform to build a software framework that is hands-off from an FPGA programming perspective; the user just specifies the neural network.
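To make the "sea of multiply/accumulate units" idea concrete, here is a minimal software sketch of a matrix-vector multiply built from explicit multiply/accumulate steps and followed by a non-linear transfer function. It is an illustration only, not Micron's or FWDNXT's design: the function names (`mac_matvec`, `relu`, `dense_layer`), the choice of ReLU as the transfer function, and the sample values are all assumptions made for this example. In hardware the inner loop would be spread across many parallel units rather than run serially.

```python
def mac_matvec(weights, vector):
    """Matrix-vector product expressed as explicit multiply/accumulate steps.

    Hypothetical helper for illustration; each inner-loop iteration stands in
    for the work of one multiply/accumulate unit.
    """
    result = []
    for row in weights:
        if len(row) != len(vector):
            raise ValueError("row length must match vector length")
        acc = 0.0
        for w, x in zip(row, vector):
            acc += w * x  # one multiply/accumulate
        result.append(acc)
    return result


def relu(values):
    """ReLU, assumed here as an example of a non-linear transfer function."""
    return [max(0.0, v) for v in values]


def dense_layer(weights, bias, vector, activation=relu):
    """One fully connected layer: matrix-vector multiply, add bias, activate."""
    pre_activation = [a + b for a, b in zip(mac_matvec(weights, vector), bias)]
    return activation(pre_activation)


if __name__ == "__main__":
    # Made-up sample values, chosen only to exercise the functions.
    weights = [[1.0, -2.0], [0.5, 0.5]]
    bias = [0.0, 1.0]
    x = [3.0, 1.0]
    print(dense_layer(weights, bias, x))
```

Each output element needs one full pass over a row of weights, which is why the bandwidth available for fetching those weights matters as much as the number of multiply/accumulate units.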
Micron wants to target energy efficiency by going to the heart of the problem, data movement, with the performance goal of better memory bandwidth. All of this yields an accelerator that can be useful in its own right, but the larger payoff for Micron was seeing how to design future memory by working with FWDNXT to get the device ready.
“It became obvious that if we are tasked with building optimized memory and storage we need to come up with what is optimal rather than just throwing in a bag of chips and hoping it works,” explains Steve Pawlowski, VP of Advanced Technology at Micron. “We are learning about what we need to do in our memory and storage to make them a fit for the kinds of hard problems in neural networks we see ahead, especially at the edge.”
Pawlowski is one of the leads behind some of Micron’s most notable efforts in creating specialized or novel architectures like Automata. He previously led architectural research initiatives at Intel, where part of his job was to look at how prototype chips were solving emerging problems in interesting ways and whether those architectures held promise or competitive value. In the process, he developed an eye for building out new programs at Micron that took a research concept and tested its viability and its role in using or improving memory devices.
“By not having observability into the various networks on the compute side we could only guess if the things we were building into memory would be useful,” Pawlowski says. “The only way we could get real observability into how neural networks are executing was to have the entire pipeline so we could go in and instrument every piece of it. This is how we end up making better memory.”
He adds that they build this base of knowledge by looking at some of the most complex problems and architecting from there, including with a cancer center that is doing disease detection at scale. Here accuracy is the biggest challenge. They’ve also been working with a “very large high-energy physics entity” (venture a guess) where the drivers are performance and latency. By taking a view of solving problems with different optimization points (accuracy versus raw performance) Micron is hoping to strike a balance that can inform next generation memory.
During these research and productization experiments, Micron does get a forward look at what future memory might need for a rapidly evolving set of workloads like AI.
The funny thing is, what they’re learning is the inherent value of what they already built as a commercial product several years ago, something that had great potential but strong competition. That would be the hybrid memory cube (HMC), which has since been folded as a product as Micron focuses on what is next for that concept of memory stacked on top of logic.
As Micron looks at AI workloads, this exact approach, which exists in plenty of devices now as rival HBM, looks even more promising, even for inference. It might sound heavy-handed for an energy-efficiency-focused set of workloads, but more demands from the inference side will mean greater compute requirements. Doing all of that in a stacked memory device at the edge might seem like an expensive stretch, but Pawlowski says this is what he sees in his crystal ball.
“There may be a renaissance of memory stacked on logic doing neural networks at the edge. The need for higher memory bandwidth will matter more in the years ahead. There will also be a need to reduce memory interconnect power too,” Pawlowski says, adding, “I believe there will come a day when an architecture that is in the HMC style will be the right thing. By then it might not just be a memory device, it could be an accelerator. There will be other capabilities that come along there as well, including better ECC, for instance.”
It’s hard to tell where the research ends and the commercial potential begins with some of Micron’s research efforts for new chips or accelerators. If indeed these flow into the next instantiation of HMC, whatever that might be, this is interesting backstory. But when it comes to innovating in memory in a meaningful way that captures what AI chips need now, the market might move on before Micron has a chance to intercept it, with who knows what analog and other inference devices at the fore.