The Hard Limits for Deep Learning in HPC
January 26, 2018 Nicole Hemsoth
If the hype is to be believed, there is no computational problem that cannot be tackled faster and better by artificial intelligence. But many of the supercomputing sites of the world beg to differ.
With that said, the deep learning boom has benefitted HPC in numerous ways, including bringing new cred to the years of hardware engineering around GPUs, software scalability tooling for complex parallel codes, and other feats of efficient performance at scale. And there are indeed areas of high performance computing that stand to benefit from integration of deep learning into the larger workflow, including weather, cosmology, molecular dynamics, and more. Still, for most standard HPC application areas, one tough fact stands in the face of all of AI’s promises: good old-fashioned physics.
Even though this limitation, coupled with the problem of a black box versus first principles approach to science, represents a severe set of limits, it has not stopped teams at the National Energy Research Scientific Computing Center (NERSC) from reaching for the most untested (and promising) of all AI approaches to apply to HPC problems: generative adversarial networks, or GANs. If you are a regular reader here you already know what these are, but in a nutshell, these are dueling neural networks that promise more accurate, complex results.
The AI on HPC systems efforts are led by Lawrence Berkeley National Lab and NERSC’s incredibly prolific Prabhat. He currently heads up the Big Data Center at the lab, where he studies performance and applications on the flagship Cori supercomputer (a Cray XC40) and other systems at the lab, works on algorithms and new use cases for both HPC and deep learning, and develops software tools, including a framework for automatic hyperparameter tuning, something that has been gathering investor attention in the AI startup world lately.
The early results from work on GANs for HPC simulations at NERSC have been promising, but not enough to make Prabhat think traditional HPC is in for a major shakeup anytime soon. “We have found that it’s possible that a well-trained GAN can produce data that statistically looks identical to what a simulation might produce,” Prabhat says, pointing to recent work by other teams at Berkeley Lab in cosmology and high-energy physics. “So does all this mean we can throw simulations away and rely on GANs? No. The frontier of deep learning in HPC is for now going to be centered on understanding the limits of deep learning.”
Neural networks are not good at simulating physics under different conditions and, even if they were, their results are difficult to trace back when validating scientific use cases, so Prabhat says they will be limited in HPC. “There are many things we have been getting right in the simulation community based on 30-40 years of applied math, and so a black box, as good as it may be, will not be suitable until these things are worked out.”
“There will be some specialized use cases in simulations where we use heavily parameterized or complex hand-tuned models where deep learning could replace some of the tuning there. Beyond that, for now, people will find deep learning is good at reproducing patterns, but those issues of baking in the physics and making models credible have to be solved for much beyond that.”
In terms of the system needs for an HPC center toying with deep learning ambitions, the placement of an AI research center at NERSC with a CPU-only supercomputer might seem like a bit of an odd fit since many deep learning efforts hinge on GPU clusters for training at scale.
The Cori supercomputer at NERSC, which has been in production for two years, is composed of Haswell and Knights Landing nodes, which are great for traditional simulation but face stiff competition from GPUs when it comes to machine learning model training in particular.
Prabhat says teams have been working closely with Intel to get deep learning training to scale and perform well on Knights Landing, and that they did succeed in scaling out across the entire Cori phase II partition of the machine (which added around 9600 Knights Landing nodes to the mix) using a software stack NERSC co-developed with Intel. While it is possible to get reasonable performance on a CPU-only machine like Cori, “if you take Knights Landing and compare it to Volta there is a performance gap; now we get 2-3 teraflops on a single Knights Landing node for scientific use cases but the number on Volta is likely to be quite a bit different for the same workload,” he says.
As a reminder, the Knights family of processors, which Intel had staked a large claim in HPC with, is no longer being pursued according to the original plan. Recall our conversation about the switched plans for the Aurora supercomputer and subsequent architectural shift with Intel and how it presented a more diverse architecture than had been planned for Knights Hill (which we detailed here when that was still in the works). While NERSC has put a great deal of effort into that Knights family trajectory, it could be that the change means better things for their work on deep learning, although how such a new architecture would benefit HPC remains to be seen as well.
For an Intel-oriented center like NERSC, another option could appear in the very near future. We have already seen what a GPU partnership between AMD and Intel looks like in terms of Radeon for PCs. It is not unreasonable to extend that to servers this year. This would support CUDA and OpenCL and could even have an NVLink-like capability with EMIB and shared high bandwidth memory (we say HBM because the only thing using HMC/MCDRAM is Knights Landing, which is dead in the water).
“AI workloads are well suited for HPC hardware and that’s a convergence that will accelerate,” Prabhat says. “Nvidia has been doing it for a few years now and increasingly we will see either conventional CPUs or GPUs or other accelerators showing up on HPC systems so that a single system can do well at supporting both conventional HPC and AI workloads,” he adds. It is interesting to compare this to the new post-Knights direction that Intel is expected to take, based on our chat with Intel’s Barry Davis about a multi-purpose architecture to fit all workloads, not just HPC as the original Knights family was targeting.
There is also the possibility of an architecture like the forthcoming Intel Nervana “Lake Crest” on the table, and while that is expected to perform well on AI workloads, there is no telling how suitable it will be for handling simulation workloads as well, or how important low precision and the Neon framework might be for mixed workloads.
Even with all of the different architectural directions NERSC could eventually take post-Knights Landing, Prabhat says the hardware side of all of this is far simpler than what is happening in software-land as HPC and AI mesh. “To begin with, there’s a divergence since all of the deep learning frameworks were not developed by HPC folks. Their datacenter systems are different than ours; they are not tightly coupled so there’s extra effort needed since these were designed for the commercial datacenter community.” It is here that the center wants to bridge the divide, using a classic HPC system to figure out intersection points between hyperscale-class AI and supercomputing. “We have a system to use for this with all the complexities: programming challenges, the interconnect, burst buffer, a manycore system.” The goal is to get deep learning running and scaling well on traditional HPC, and to see what fit there is for HPC applications and use cases.
More can be found on the NERSC deep learning in HPC projects here.