When it comes to traditional supercomputing, the tools, frameworks, and software stacks tend to be well codified, especially within the various domains that use high performance computing. In recent years, a new cadre of large-scale data analysis tooling has come into the HPC fold, but until recently, machine learning, deep neural networks, and other re-emerging artificial intelligence tools had not been looped into the supercomputing world.
We have already described how machine learning and AI frameworks are using various elements of high performance computing, particularly on the accelerator side, with companies like Nvidia dominating the computationally intensive training phase of machine learning via GPU acceleration, and others, including FPGA maker Xilinx, talking up the role of FPGAs for the inference portion of such workloads. However, outside of the use of these frameworks at hyperscale web companies, the connection between research-centric HPC and machine learning is still taking shape.
To be fair, machine learning, neural networks, and artificial intelligence are not new trends; various elements of this broad class of algorithms are already integrated into the pipelines of numerous application domains in scientific computing. But as we have described before, the computational capability, the power efficiency, and, ultimately, access to far larger, more complex, and richer data are present in ways they were not before. This is, in fact, the natural next step for supercomputing, following the push to create "big data" ready supercomputers that kicked into high gear a few years ago. An evolution, really.
This theme went hand in hand with a series of in-depth conversations and sessions at the annual Supercomputing Conference (SC15) this week around what the end of Moore's Law means for extreme scale computing, how MPI-bound applications and other code will scale to meet new capacity, and how applications themselves can be "smarter" and, by proxy, more efficient with more multi-leveled results. One look at the ways big companies are using machine learning to achieve efficiency and scalability gains while producing unique insights shows there is a reason for all of the fresh attention on machine learning. HPC is just getting into step in a bigger way.
The question is, are traditional supercomputers the right computational horses to drive machine learning, and if so, where is this set to be a fit for scientific applications? Or is the purpose-built Top 500-class machine too heavily laden with tooling (hardware and software alike) to serve deep learning and AI applications without significant paring down? There is little doubt that accelerators like GPUs are blurring the lines between HPC and deep learning, but what about the rest of the modern supercomputing stack?
All of this comes to our attention once again because, tucked away upstairs at SC15, far away from the vendor crowd, were rows of dense research posters, guarded by legions of talkative graduate students. All of the expected MPI, network, storage, and file system focused themes were plentiful. Other newer areas for HPC, including large-scale graph analysis and data mining, were also represented. But there was a completely new theme, one that has not been emphasized much in HPC, if at all: machine learning and artificial intelligence.
One such research presentation was on large-scale artificial neural network training using many GPUs. This is something that we know is already happening at Facebook, Google, and elsewhere, but given the fact that most CUDA development at extreme scale comes from the HPC world, there was some slightly different perspective. The emphasis of this research was to refine the training phase by reducing the overhead of the forward and backward passes, which hinge on matrix multiplication on GPUs and currently carry a performance cost. The team's solution is an out-of-core multi-GPU matrix multiply step that is tied into the core artificial neural network algorithm. Notable speedups over standard GPU-based training approaches were reached with Caffe.
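The poster's actual implementation is not public, but the core idea of an out-of-core matrix multiply can be sketched in a few lines: a result too large for any one device is built up tile by tile, with each tile small enough to fit in a single GPU's memory and independent enough to be dispatched to different GPUs. In this minimal, hypothetical sketch, NumPy stands in for the device; only the tiling logic is illustrated.

```python
import numpy as np

def out_of_core_matmul(A, B, tile=256):
    """Multiply A (m x k) by B (k x n) one tile at a time.

    Stands in for an out-of-core multi-GPU scheme: each (i, j) tile of the
    result fits in one device's memory, and the k-dimension is streamed in
    chunks so neither operand must be resident all at once. Distinct (i, j)
    tiles have no data dependencies, so they could be farmed out to
    different GPUs in parallel.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, tile):          # row block of A
        for j in range(0, n, tile):      # column block of B
            acc = np.zeros((min(tile, m - i), min(tile, n - j)), dtype=A.dtype)
            for p in range(0, k, tile):  # stream chunks of the shared dimension
                acc += A[i:i + tile, p:p + tile] @ B[p:p + tile, j:j + tile]
            C[i:i + tile, j:j + tile] = acc
    return C
```

On a real system, each `acc +=` step would be a cuBLAS GEMM call on a device-resident tile, with host-to-device transfers overlapped against compute; the arithmetic is identical.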
Another group of researchers tackled a more software-focused problem within machine learning at extreme scale in a presentation featuring a comparison of machine learning approaches for dealing with multi-collinearity in large-scale data analytics and data mining on high performance computing systems. For most large data analytics operations, variables are tightly correlated, and the bottleneck lies in pre-processing those many variables. Their work uses a research database and compares several machine learning techniques, including stacking, to determine the most efficient method for processing many thousands of variables for large-scale data mining.
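To see why that pre-processing stage is the bottleneck, consider the standard diagnostic for multi-collinearity, the variance inflation factor: each of the thousands of variables must be regressed against all of the others. This hypothetical sketch (not the poster team's code) shows the per-variable computation that has to be repeated at scale.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2), where R^2 comes from
    regressing that column on all the others. VIF >> 1 flags a variable
    that is nearly a linear combination of the rest, i.e. the
    multi-collinearity the pre-processing stage must detect. Note the
    cost: one least-squares solve per variable, which is what makes this
    stage expensive when there are many thousands of columns."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])   # add intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / max(1.0 - r2, 1e-12))     # guard exact collinearity
    return np.array(vifs)
```

A common rule of thumb treats VIF above roughly 5 to 10 as problematic; variables flagged this way can then be dropped or combined before the expensive model fit.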
Yet another effort to make machine learning well-suited for extreme scale platforms was presented by the team that created MLTUNE, a tool-chain for automating machine learning workflows, along with the requisite performance tuning. As the team describes, training complex models requires knowledge of statistical techniques that end users might not be skilled in deploying. MLTUNE does sacrifice some sophistication in favor of generalization and automation but, as they note, "the system's applicability is demonstrated with an auto-generated model for predicting profitable affinity configurations for parallel workloads." In other words, tools like this and others being developed to suit HPC platforms are designed to put machine learning within broader reach.
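The kernel of what such a tool automates can be shown in miniature: given data and a pool of candidate models, pick the one with the lowest held-out error, so the end user never has to make the statistical call themselves. This is a hypothetical sketch of that pattern, not MLTUNE's actual design; the `auto_select` and `ridge` names are invented for illustration.

```python
import numpy as np

def ridge(lam):
    """Return a fit function for ridge regression with penalty `lam`.
    Each candidate in the pool is just a fit(X, y) -> predict closure."""
    def fit(X, y):
        Z = np.column_stack([np.ones(len(y)), X])   # intercept column
        beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
        return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta
    return fit

def auto_select(X, y, candidates, seed=0):
    """Pick the candidate with the lowest mean squared error on a
    held-out split. The user supplies only data and a named pool of
    models; the tuning decision is made automatically."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    split = int(0.8 * len(y))
    tr, te = idx[:split], idx[split:]
    best_name, best_err = None, np.inf
    for name, fit_fn in candidates.items():
        predict = fit_fn(X[tr], y[tr])
        err = np.mean((predict(X[te]) - y[te]) ** 2)
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err
```

A production system would layer cross-validation, richer model families, and performance tuning on top, but the trade the MLTUNE team describes is visible even here: the selection logic is generic and automatic at the cost of statistical sophistication.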
These are just a few early hints that the traditional high performance computing crowd is seeing enough value in what lies ahead for deep neural networks and machine learning to warrant some research space during its largest annual showing. And as one might imagine, these are only a couple of examples of what is, if one looks through the peer-reviewed pool long enough, a growing trend. What is refreshing here is that, unlike the wave of "big data" everything we all somehow lived through (the need was valid, but the hype obscured meaning), machine learning is an actual "thing" we can watch. There are distinct algorithms and approaches, and the same is true of deep neural networks. We have to work harder to see how it fits, but it is narrow enough that the development can be tracked instead of casually lumped into some catch-all "big data" bucket. Or at least that's the view of someone whose job it is to classify all of this in narrative form.
And so yes, there have been other efforts to suit machine learning and HPC platforms to one another that were not on display at SC15. For instance, projects like FireCaffe for accelerated training on large clusters, and other work that takes advantage of very large core counts to do things like human facial recognition in very large crowds (something Google has already been working on), already exist. Returning to the original point: none of these things are new, but the time is ripe for work in HPC and machine learning to interplay more, and more publicly, if nothing else.
One could make the argument that what Google, Facebook, and others have built for their own machine learning workloads is already supercomputing. To circumvent a "what is a supercomputer" conversation here, let's say for now (with a follow-up on this later) that the systems that chew on those workloads at hyperscale datacenters are stripped down. They are purpose-built, and they are designed to run on a minimal software stack poised to do one (or maybe two) things really well. This is not the case with supercomputers at national labs or research centers, but that does not mean they cannot (or will not) be a fit for machine learning.
And as a side note, aside from the research itself, it's useful to remember that the graduate students behind these SC15 poster sessions are the new crop of computer science professionals, and they see the capabilities of large-scale supercomputers as applicable to the new wave of machine learning and neural network driven applications. Problems with older or existing tools and frameworks are giving way to a refreshed interest in Spark over MPI, in machine learning and deep neural networks over standard applications. Not in all cases, but enough to merit a pointer.