As we have written about extensively here at The Next Platform, there is no shortage of use cases in deep learning and machine learning where HPC hardware and software approaches have bled over to power next generation applications in image, speech, video, and other classification and learning tasks.
Since we focus on high performance computing systems here in their many forms, that trend has been exciting to follow, particularly watching GPU computing and matrix math-based workloads find a home outside of the traditional scientific supercomputing center.
This wider interest has been good for HPC as well, since it has brought new attention to a field that many outsiders tend to think of as academic (even if many of us know it is far broader than that). But what was interesting this week as we wrap up at the International Supercomputing Conference in Germany, which focused heavily on HPC and AI, was how much deep learning folks need from HPC, and how little there is (yet) for AI and machine learning to feed back into supercomputing.
To be fair, in our conversations, we asked users at several national labs and supercomputing sites if the emphasis this week at ISC ’16 on deep learning was relevant for their workloads. While many said there was interest in using machine learning to pre-process data in smarter ways, or to replace some applications entirely (especially in climate and other studies that are largely image classification and pattern matching workloads), these are still early days for HPC users and machine learning. But of course, for the deep learning pros who were on hand to talk to the supercomputing set about their challenges, HPC has far more to give than to gain, at least for the moment.
To break this down, here are some of our observations from listening to the deep learning and machine learning experts who presented at this traditionally supercomputing-focused conference, among them Andrew Ng, Chief Scientist at Baidu’s Silicon Valley Lab and well-known AI guru.
As Ng noted this week during his keynote at ISC16, training a speech recognition system like the one at Baidu takes around 10 exaflops of compute for the entire cycle across 4 terabytes of data. That scale of compute and data management is only found in HPC, and this is, as Ng said to the audience, “where machine learning could really use HPC’s help”. Of course, it’s not just about the raw compute or sheer volume of data. While Ng gave a brilliant talk about the co-evolution of HPC, CUDA and GPU computing, and hardware for deep learning, the main takeaway was that HPC has been great for AI, but more developments from the HPC community are needed to keep moving it forward.
What wasn’t said was what HPC might expect in return for its role in such developments, in terms of new approaches to standard scientific computing problems. In other words, how can both benefit going forward?
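To put that compute figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the "10 exaflops" refers to total floating-point operations for the whole training cycle (not a rate), and the sustained rates below are illustrative assumptions rather than figures from Ng's talk.

```python
# Rough estimate of wall-clock training time for a fixed amount of work.
# Assumption: "10 exaflops" means 10 x 10^18 floating-point operations in total.
TOTAL_FLOP = 10e18

SECONDS_PER_DAY = 86400.0


def training_days(total_flop, sustained_flop_per_s):
    """Days needed to finish total_flop at a given sustained rate."""
    return total_flop / sustained_flop_per_s / SECONDS_PER_DAY


# Hypothetical sustained rates, chosen only to show the spread.
for label, rate in (("1 TFLOP/s", 1e12), ("100 TFLOP/s", 1e14), ("1 PFLOP/s", 1e15)):
    print(f"{label:>12}: {training_days(TOTAL_FLOP, rate):8.2f} days")
```

The point of the sketch is only the ratio: every factor of ten in sustained throughput cuts the training cycle by the same factor, which is why this workload looks like an HPC problem.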
The challenges on the HPC side for AI became far clearer in some of the deeper-dive sessions, which looked at key pain points for machine learning shops beyond mere compute. One such weak point that stands to benefit from HPC is the difficulty of scaling deep learning applications. “The scale we’re operating at now is a joke compared to what HPC can do,” said Dr. Janis Keuper, a researcher in machine learning from the Fraunhofer Institute in Germany who outlined challenges that bridge the HPC and machine learning divide.
In many ways too, Keuper says, there are elements that look like HPC but actually aren’t in practice. “The common hardware setup is workstations and blades with maybe two to eight GPUs times as many processors as you can afford. But there are no distributed solutions. Many use single entities to solve these problems; so if you have 20 people in your group, it’s likely 5 nodes for each user’s playground and each will train one model on one machine.” But again, some of this is due to the lack of scalability of the codes they’re working with, something he says he hopes will change.
Further, he says, “even if you have eight GPU cards; PCIe can’t handle the data load. The computation time for one iteration is a single second, but you have to transfer the whole model twice at the same time. So that means hundreds of megabytes of data transfer at the time [of] computation. What we really need is more bandwidth in the box and of course, more network bandwidth.” This provides an interesting counterpoint to the experiences of Baidu and other companies we’ve covered, which can invest in far larger GPU-laden machines to power large-scale deep learning. For centers that are just on-boarding, HPC will be a good eventual landing point, but with the current GPU configurations they have, scaling beyond a certain point is out of the question.
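Keuper's bandwidth argument can be sketched with simple arithmetic. The numbers below are illustrative assumptions, not his measurements: a 250 MB model stands in for "hundreds of megabytes", 12 GB/s is an assumed effective bandwidth for a single PCIe link shared by all the cards, and transfers are assumed not to overlap with computation.

```python
def transfer_seconds(model_mb, n_gpus, link_gb_per_s, transfers_per_iter=2):
    """Seconds per iteration spent moving model copies over one shared link."""
    total_gb = model_mb * transfers_per_iter * n_gpus / 1000.0
    return total_gb / link_gb_per_s


def communication_fraction(compute_s, comm_s):
    """Share of an iteration spent on data transfer, assuming no overlap."""
    return comm_s / (compute_s + comm_s)


# All values below are hypothetical, chosen only to illustrate the trend.
MODEL_MB = 250        # "hundreds of megabytes" per model copy
LINK_GB_PER_S = 12.0  # assumed effective bandwidth of the shared PCIe link
COMPUTE_S = 1.0       # "a single second" of computation per iteration

for n_gpus in (1, 2, 4, 8):
    comm = transfer_seconds(MODEL_MB, n_gpus, LINK_GB_PER_S)
    frac = communication_fraction(COMPUTE_S, comm)
    print(f"{n_gpus} GPUs: {comm:.3f} s transfer, {frac:.0%} of each iteration")
```

Under these assumptions the transfer cost grows linearly with the number of cards while the compute time per iteration stays fixed, which is the scaling wall Keuper describes.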
One other area where, quite arguably, the HPC community can be most helpful is on the code side. “We need to work with much lower precision; these algorithms work well at 16-bit or even 8-bit in some cases but developing around this is difficult,” Keuper explains. Further, he says work on model compression and moving from dense matrix to sparse matrix computation, an area where HPC shines above all others, is critical. “Deep learning people want low precision and sparse matrix. We want better algorithms too. And at the same time, we need to increase the number of nodes and bandwidth to push a lot of data; it’s hundreds of terabytes to feed these networks and if the data is not coming, you have to find a solution.”
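For readers less familiar with the two techniques Keuper names, the following is a minimal, dependency-free sketch, not his implementation. It rounds values to 16-bit floating point using the standard library's half-precision format, and multiplies a matrix stored in compressed sparse row (CSR) form by a vector. The function names and the tiny weight matrix are hypothetical.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dense_to_csr(matrix):
    """Convert a dense list-of-lists matrix to CSR (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector, touching only the nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 weight matrix in which most entries are zero.
weights = [[0.5, 0.0, 0.0],
           [0.0, 0.0, -1.25],
           [0.0, 2.0, 0.0]]
values, col_idx, row_ptr = dense_to_csr(weights)

print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
print(len(values), "multiplies instead of", len(weights) * len(weights[0]))
print("0.1 stored in 16 bits:", to_float16(0.1))
```

The sparse product does work proportional to the number of nonzeros rather than the full matrix size, and the 16-bit value halves storage relative to 32-bit at the cost of a small rounding error. Making both fast on real hardware is the hard part that Keuper is asking the HPC community to help with.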
“It’s about fast interconnects, algorithms for matrix multiplication—both dense and sparse, low precision floating point capabilities, efficient distributed data access (using parallel file systems) and benefit from those decades of experience in optimizing and parallelizing difficult optimization problems. It’s all HPC knowledge,” Keuper concludes.
In short, high performance computing has already been extremely useful for deep learning on the hardware side, particularly with the development of CUDA and GPU-powered algorithms that run on high performance iron. The code work and optimizations, especially for matrix math problems, get into denser territory, but it is clear where the experts can be found: many are digging into similar problems for their scientific code.
Now that we have this baseline, the real question, which was especially pertinent for those at the ISC event this week to talk about their supercomputing centers and gear, is what AI and machine learning can give back to HPC.
So, this is where things get somewhat speculative, because outside of research interest and a few examples, AI really has not arrived for HPC. Even browsing the research poster sessions at ISC to see what’s next for HPC, most were focused on MPI, scalability, and other “run of the mill” problems and applications.
As we have described before, there are already some areas where traditional supercomputing centers are branching into machine learning and deep neural networks to advance scientific discovery. For instance, this is useful in image recognition for weather and climate modeling, as well as in a few other areas that attach to existing HPC workflows, as Intel noted when it opened up about its upcoming focus on machine learning.
From what we gathered this week, however, the high performance computing crowd has its hands full putting together the compute, data movement, and energy pieces that will lead to exascale computing over the next few years. And while there has been talk at every major HPC conference since last year’s SC event in November about the exaflops required to power next-generation machine learning-driven services (as Raj Hazra noted in the Intel keynote at ISC, referring to autonomous cars, the data they generate, and the compute they require to learn from drivers and the road), the real-world connections between these two still-disparate worlds remain somewhat nebulous.
To be fair, there is real interest from people we talked to in what’s next for deep learning for scientific and technical computing. But with legacy and existing codes that have been honed over the course of decades, the idea of changing on a dime (especially with hardware investments that match that application base) is not practical. Still, we are watching how some areas in HPC integrate machine learning into the front-end or post-processing steps of their workloads. Beyond that, it’s anyone’s guess.
As a final note, Intel and Nvidia are both making a lot of noise about their processors that can do double duty on HPC and deep learning workloads. The DGX-1 appliance from Nvidia, featuring Pascal, and Knights Landing, which Intel thinks can put training, inference, and HPC capabilities on a single platform, are both compelling stories. How HPC grabs all of this capability and goes forward will be a key story to watch this year. And we are.