Over the last year in particular, we have documented the convergence of high performance computing and deep learning and the many hardware and software ties the two share. The coming year promises far more on both fronts. GPU maker Nvidia might not have foreseen this to such an extent when it was outfitting the former top-ranked “Titan” supercomputer with its GPUs, but the company sensed the two worlds would mesh once the first hyperscale deep learning shops began deploying CUDA and GPUs to train neural networks.
All of this portends an exciting year ahead, and for once, the mighty CPU is not the subject of the keenest interest. Instead, the action is unfolding around the CPU’s role alongside accelerators: everything from Intel’s approach to integrating the Nervana deep learning chips with Xeons, to Pascal and future Volta GPUs, to other novel architectures that have made waves. While Moore’s Law for traditional CPU-based computing is on the decline, Nvidia CEO Jen-Hsun Huang told The Next Platform at SC16 that we are on the cusp of a new Moore’s Law-like curve of innovation, one driven by traditional CPUs with accelerator kickers, mixed precision capabilities, new distributed frameworks for managing both AI and supercomputing applications, and an unprecedented volume of training data.
At the core of this uptick in new applications, enabled by a different way of thinking about problems at scale, are hardware and software co-design elements. In the room during the chat with Huang was an IBM lead for the Minsky architecture, which represents a shift in direction not only for HPC centers, as we have seen with the Summit supercomputer (which will integrate deep learning frameworks into traditional simulation workflows), but also for enterprises under pressure to integrate AI into conventional product design and beyond. Nvidia’s own DGX-1 Saturn-V supercomputer, which appeared on the most recent Top 500 supercomputer list, is showing enterprises the way to adopt AI, Huang says, because it handles both compute-intensive and data-throughput-intensive workloads that speed product development. It is also serving as an end-to-end development platform for an ambitious cancer research initiative that will apply exascale-class deep learning and simulation, using new frameworks, to intractable problems in medical research.
In short, Huang sees a new golden age of computing ahead, one so fast-moving that even our current exascale goals might be reached sooner than expected. While this might have a hyperbolic ring to it, after listening to HPC end users and vendors on both the hardware and software sides, we see a genuine and fortunate convergence of AI and supercomputing codes and systems. With a focus on lower precision, and with traditional simulation results offloaded to deep neural networks for large-scale inference, lower-power, higher-yield supercomputers may be just around the corner. Judging from the nodes on the forthcoming Summit supercomputer, which is setting the stage for machines that handle mixed precision and mixed HPC/AI workloads, this is really happening, and fast.
“Deep learning is a supercomputing challenge and a supercomputing opportunity,” Huang says. “Modern supercomputers should be designed as AI supercomputers. This means a system has to be good at computational science and data science and that requires an architecture that is good for both. We want to be able to support models that are very large and process many of those across multiple nodes, so interconnectivity is important. We have shown GPUs that can be shared in this way across massive GPU sets of nodes and the supercomputers of the future will be balanced by these two computational approaches with this architecture.”
We have already speculated on what future Volta GPU nodes might look like, and judging by the prior jump from Maxwell to Pascal, the leap could be quite significant. The question is whether Nvidia can stay on the same ramp. While Huang obviously couldn’t comment on Volta details, he did say that we are about to enter an era of “hyper Moore’s Law” advancement. “I know it sounds strange,” he says, “but here is an example: Between Maxwell and Pascal, we improved the capability and capacity by 65X. I believe what we will see will be a very big jump, and when it comes to performance, what we will see will not be possible with traditional computing techniques.”
“This is quite a moment for GPU computing. After a decade of evangelizing this, people are now realizing the full importance of this approach, and we have invested billions. The world has awoken to the potential of AI.”
In reference to the hyper-Moore’s Law pace of innovation enabled by non-traditional approaches to computing, he adds, “What has taken us 20 years in computer vision we have nailed in four. What’s taken twenty years in speech recognition has been nailed in three years.” He notes that these are software innovations as well, but ultimately, “we will see these leaps in scientific computing as well. It won’t just be standard numerical approaches, data science approaches will predict and infer through a large amount of training.” He pauses, laughs, and says, “there’s no question we are bringing sexy back to high performance computing, and I think this is the first time anyone’s said that about high performance computing.”
“This computing approach that we’ve been evangelizing for a long time has finally reached a tipping point. The CPU is still important for instruction throughput and processing and the GPU is the accelerator of choice for data throughput. We recognized this a long time ago because there is this large body of opportunities we’ve had plenty of time to explore with video games. We could therefore afford enormous R&D budgets: $15 billion since we started for pushing what is possible with combining data throughput and instruction throughput. Now, the GPU computing era has truly arrived.”
Of course, the AI supercomputing message extends well beyond scientific computing. “We can take these systems into areas where it has always been traditional science. The use of AI for unstructured data can potentially predict and accelerate computations that are largely computational. We can do more than accelerate scientific simulations, think about products. This is a new approach to delivering products to market,” Huang says, pointing as an example to manufacturers, including automakers, whose product development relies heavily on simulation.
“Our strategy is to create and advance AI and to build an end-to-end platform to reach as many as possible. A data throughput machine is a radically different architecture and the Minsky systems are designed for applications that enterprises are looking to. It is one of the first examples of what an AI supercomputer will look like,” Huang says, pointing to its viability both in the Summit supercomputer and in future enterprise settings. “We sold out of Minsky in the third quarter already, which is not a bad start for an AI supercomputer,” he adds.
“An architecture like we see with Minsky is good for both computational and data science. On top of that is all the software that goes in; the algorithms, libraries, tools, and frameworks. The first part of our strategy is to make sure we have those elements in place, then to evangelize this like we are doing in our deep learning institutes where we have done 50 or 60 large-scale training sessions because in the future, software engineers aren’t going to be writing code so much as training models. They won’t be writing software as we know it. Even inside Nvidia, we want every engineer to have a DGX-1 for this reason. Just like at other companies where there is one part of the company that builds product and another side that runs the company; the part of the company that is about developing new products is going to move very, very fast. My guess is that this is where Minsky is selling into.”
“Hundreds of applications are already GPU accelerated, every framework has been accelerated by it, it is used by researchers and developers all over the world; it is the most pervasive accelerator on the planet,” says Huang. The claim is, indeed, hyperbolic, but it rings truer now than ever. It is hard to argue with two clear facts at this point in computing history. First, deep learning is on track to revolutionize the way everything from scientific simulations to search engines work. It will take some footwork on the developer front, for sure, but the hardware will be there to meet those innovations. Second, when it comes to computing, acceleration is the new normal. That is not just because of the ability to offload work, but increasingly because that work can be done at lower precision, adding more oomph to existing compute infrastructure and applications.
Ultimately, it has been one hell of a year for Nvidia. At the beginning of 2015, we may have seen a few use cases in deep learning worth commenting upon. By the end of 2016, we are seeing a new path for GPU computing beyond HPC: enterprise requirements for AI will demand specialization, but with a solid software base to develop from. While Intel is hedging its bets with Knights Mill and future integration of Nervana technology, those efforts are still on the roadmap. DGX-1 appliances are on the market, Pascal is shipping, Volta is expected to appear in line with Intel announcements, and then, friends, things get even more interesting, don’t they?