Hardware Slaves to the Master Algorithm
September 9, 2016 Nicole Hemsoth
Over the long course of IT history, the burden has been on the software side to keep pace with rapid hardware advances—to exploit new capabilities and boldly go where no benchmarks have gone before. However, as we swiftly ride into a new age where machine learning and deep learning take the place of more static applications and software advances are far faster than chipmakers can tick and tock to, hardware device makers are scrambling.
That problem is profound enough on its own, and is an entirely different architectural dance than general purpose devices have ever had to step to. Shrinking dies and increasing reliability and speed are fine arts that have been mastered. But with the new algorithms that rise and take shape, merging and evolving with an incomprehensibly young, rich code ecosystem, chipmakers are not even sure where to begin, let alone how to integrate their ages-old wisdom.
The easy way to address this period of transition is to buy the expertise. The hard way is to build it. And with a machine learning code base that is rich and shape-shifting, there has yet to be a standard “correct” approach. In the midst of all of this is a fundamental divide between what devices are being dreamed up to suit this new crop of users and what the codes they are deploying actually require, according to Dr. Pedro Domingos, University of Washington computer science professor and author of The Master Algorithm.
“In the past there was no real compelling reason to have machine learning chips, but that has changed,” Domingos tells The Next Platform. Just as machine learning and deep learning were seeing a resurgence, work on the hardware and software side on GPUs was proving itself at scale and, along the way, paving the way for other research areas to push into acceleration. And it also just so happened that GPUs were very good at the matrix multiplication-based problems deep learning was chewing on.
That convergence gave Nvidia a clear head start in the marketplace for deep learning training in particular, but as we have been reporting over the last couple of years, others with specialized architectures (FPGAs, custom ASICs, neuromorphic chips, and so on) have seen an opportunity to cater to the different hardware demands of this segment of the market. Of course, Intel has also seen the opportunity, snapping up Nervana Systems and Movidius, both device makers with machine learning optimized software stacks in tow. Despite all of this effort to play catch-up and fight over processor share among the wide variety of machine learning users, there is still a fundamental disconnect between the general purpose processor players and the evolving needs of the diverse machine learning community.
“The big companies right now, Intel and Nvidia, are still trying to figure this space out. It is a different mode of thinking. From a machine learning perspective, we can say what specific primitives are needed from the hardware makers, but the issue is deeper… The machine learning people can tell the hardware people what they want but the hardware people need to tell the machine learning people what they actually can and can’t do. It’s that interaction that will get interesting results.” As it stands now, throwing hardware devices at the wall to see what sticks is a wasteful approach, and an even less useful one with such a quickly evolving code base.
“Deep learning is just one type of machine learning. Just because there are chips that are good for deep learning doesn’t mean they will be good for other types of machine learning,” Domingos says. “The real question hardware makers need to consider is what will happen when yet a new wave of machine learning comes in; how can the hardware be flexible enough to support it and if there is some set of primitives that can be implemented in hardware, those could change over time as well.” There are some primitives that haven’t changed much over the last five years during the new golden age of machine learning, and by focusing on these, Domingos says, Intel and other companies can get an early foothold.
“The invention of the microchip was, and still is, an amazing thing; it is super-reliable, it’s completely deterministic, and it lets us keep building things because solid state electronics are so reliable. No other area has this amazing gift. Not chemical engineering, not mechanical engineering; they have to live with all the crap and noise and things that break. But in computer science, we get to live in the real world; a world where we build programs. And machine learning is taking us back to the real world. It is statistical. It is probabilistic. And we don’t always know why things work or don’t.”
It can be confusing even for those in machine learning, Domingos says, but for a company like Intel, which is used to things being reliable and rigid, machine learning takes a dramatically different mentality. “For machine learning where things are statistical to begin with, not everything needs to be in its place and defined. There is a big opportunity here, but the mental transition to this way of doing things, of letting things work 80% of the time versus near 100% in an application, is better than not having anything at all, and better than having a bunch of pre-programmed rules that only catch 5% of the potential use cases.”
But the general purpose hardware camps are planting their flags in machine learning now, building the software underbelly as they go, afraid to miss out on a potentially lucrative market. After all, it wasn’t long ago that Intel stated that over half of all workloads running in datacenters will have a machine learning component. On a technical level, that sounds overblown, and one has to wonder what Intel means by machine learning. Is this a catch-all definition for advanced analytics, or is it actually a new layer of technology sitting on top of all of the other database, data management, and other tools and applications as a top-level intelligence tie? If it’s not overblown, can we say that in the next five years we will see the death of static analytics? Either way, big companies are taking big steps to plant a stake in the ground, several stakes for different machine learning workloads, as it turns out: Intel with Knights Mill and the Nervana and Movidius acquisitions, and Nvidia with its many chips for deep learning training and inference (Pascal, M4/M40, Tesla series, etc.).
So with the understanding that general purpose processing options are still lagging behind in terms of real value to the extensive, growing list of machine learning applications, what will win and lose? One answer is to look to custom ASICs, which several of the startups, especially those focused on deep learning, are seeing as the path forward.
“The thing about machine learning that is key is that there are two sides to the problem: the learning and the model that’s produced from that learning. Once you’ve learned a model it’s really just a simple program and that is easy to implement in an ASIC. But the problem is, you don’t know what that will be until you have the data, so that means a different ASIC for the first learning part.” That can be expensive up-front and besides, models evolve, rendering an ASIC useless without an ability to rapidly reconfigure, which is something an FPGA should be well suited to.
“The data dictates the approach. The better algorithms are the flexible ones and the more flexible ones are harder to implement in an ASIC. That is different than what we are used to in computer science, but that is the essence of machine learning; it is not determined going in. This is the learning curve for hardware companies. It takes a different way of thinking entirely.”
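To make the split Domingos describes concrete, here is a minimal sketch in Python. It is not taken from the interview: the perceptron, the toy AND data, and the names `train_perceptron` and `predict` are all assumptions chosen for illustration. The point is only that the learning step depends on data that is unknown ahead of time, while the learned model reduces to a small, fixed computation.

```python
# Illustrative sketch only: a tiny perceptron trained on made-up data.
# All names and values here are hypothetical, not from the article.

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """The 'learning' side: the weights are unknown until the data is seen."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = y - predicted  # 0 when correct, +1 or -1 when wrong
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """The 'model' side: a fixed multiply-accumulate plus a threshold."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Toy, linearly separable data (logical AND), assumed for illustration.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

Once training finishes, `predict` is just a handful of multiplies, adds, and a comparison with constants baked in, which is the kind of simple program that is easy to freeze into silicon. Change the data, though, and the constants (or, for more flexible algorithms, the structure itself) change with it.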
FPGAs are another possible accelerator for deep learning in particular and in fact, FPGAs can be seen as a “soft” version of a neural network. The difficulty here, as with custom ASICs, is that the problem is not known until the data informs it. In other words, all you can get out of an FPGA or neural network is a subnetwork. So while it might do well for part of the workload, it can’t do it all.
Of course, GPUs, FPGAs and custom ASICs aren’t the only hardware trends on the horizon. Domingos points to neuromorphic devices as another area to watch. “Building a neuron out of digital devices is already vastly more efficient than generic hardware and software stacks for deep learning. Opinions are divided here but this is one promising path for efficient semiconductor devices that can tackle these workloads well.”
For Domingos, the hope is that we will start to see some core primitives baked into hardware and further unification of the software tooling to support various machine learning workloads. “We are going to see that unifying frameworks at the software level will migrate into the hardware (logic, graphical models, Bayesian networks, etc.). Things have to become standard for machine learning in key areas, particularly graphs and similarity computations. This is an area where there will be progress,” Domingos says, but, as he agrees, on the hardware front it’s still anyone’s game, and a game played on many fields to boot.