Over the last five years, GPU computing has featured prominently in supercomputing, accelerating some of the world’s fastest machines. If some supercomputer makers are correct, GPUs will continue to play a major role in high performance computing, but the acceleration they provide will go beyond boosts to numerical simulations. This has been great news for Nvidia’s bottom line, since the market for GPU computing is swelling, and it could be an equal boon for HPC vendors that can integrate those GPUs and wrap the proper software stacks around both HPC and machine learning.
Deep learning and machine learning were the understated story just six months ago at the International Supercomputing Conference, but the framework and application-level connections between machine learning and HPC were not being made, even if the hardware was in place. Now, however, these ties are being proven out at scale as an enhancement to traditional simulations, and HPC hardware vendors, especially those whose bread and butter is supercomputing, don’t want to be left behind as the world marches past standard HPC operations.
With that in mind, Cray kicked off its presence at the annual Supercomputing Conference (SC16), which gets underway in Salt Lake City today, by announcing another in its line of XC supercomputers: the XC50, the first of its machines to feature the latest Nvidia “Pascal” P100 GPU accelerators.
The XC50 pairs each P100 one-to-one with an Intel “Broadwell” or “Haswell” Xeon E5 processor, which is how the XC systems tend to handle accelerators (as opposed to Cray’s Storm line of systems, which offer beefier GPU counts in a single node). The result is a machine capable of delivering a petaflops in a cabinet, Barry Bolding, Cray’s senior vice president and strategic lead, tells The Next Platform. This balance of compute to accelerators, matched with the “Aries” XC dragonfly interconnect, has led one of the company’s main users on the weather front (a segment where Cray supercomputers dominate) to be among the first to go down the XC50 road, albeit for purposes that extend beyond the traditional pure simulation use case.
The Swiss National Supercomputing Centre (CSCS) is upgrading its existing XC30 machine, “Piz Daint,” to a combination of XC50 and XC40 nodes to add new capabilities to its existing simulation workflows.
“The new Cray XC50 supercomputer will significantly accelerate our computational research capabilities, allowing our users to perform more advanced, data-intensive simulations, visualizations, and data analyses across a wide array of scientific studies,” says Dr. Thomas Schulthess, center director at CSCS. The addition of all of those Pascal generation GPUs, together with the growing stream of news about weather simulation centers integrating neural networks and machine learning into their overall workflows, makes this system choice something of a bellwether for other weather systems, not to mention those in other simulation-rich areas. We will describe this shift from pure simulation to a more neural network-driven workflow in several articles this coming week. Suffice it to say, Cray is seeing the writing on the wall, along with the wider set of customers it might reach with a dense GPU system that can handle machine learning and deep learning workflows and integrate them in lockstep with traditional HPC simulations.
This leads us to an important point about the XC supercomputing line, which will eventually come to an end with “Shasta” sometime in the next couple of years. The XC line represents more of a portfolio approach from Cray than a series of “upgrades” marked by successive numbers. There might not be any vast architectural or other differences between the XC30, XC40, and XC50 (the distinctions are mostly processor choices), but the increasing heterogeneity of workloads and the hardware needs that come with it are driving Cray to diversify now more than ever before. Although the XC50 is a differentiated product within the XC line rather than a dramatic addition to the architecture (the goal of XC from the beginning was to be static at the infrastructure level and extensible by nature), Bolding says there have been many smaller modifications to help users attain that petaflops in a cabinet peak figure. Cray teams have added image-based systems management for easier upgrades and enhanced the programming environment to keep up with the capabilities in the Pascal GPUs.
The arrival of deep learning and machine learning as critical workloads is pushing this diversification further, and it will be interesting to see how Cray’s approach to building scalable, high performance systems changes as the role of simulation as the heart of the HPC workflow (versus, say, deep learning) shifts in tandem.
The whole XC architecture, developed under the code name “Cascade” and launched in November 2012 with the Aries interconnect, was built with the vision that the same interconnect could persist no matter what leaps in compute came down the pike. The Cascade system architecture puts four two-socket Xeon compute nodes on a blade, 16 blades in an enclosure, and six enclosures across two cabinets to create a group, all linked with copper networking cables and Aries router chips in a two-level network. To make a bigger system, a third level of networking uses optical cables; it was designed to scale to hundreds of cabinets and, as Bolding said four years ago, to over 100 petaflops of compute. To add accelerators to the system, you take out half of the Xeon E5 CPUs and add four accelerators to each blade, and Aries just keeps doing the dragonfly topology as before. As we discussed with Cray chief technology officer Steve Scott back in January, the Aries interconnect is interesting in that it provides relatively consistent latency and easy system expansion as machines grow, which is not something you can say about 3D torus or fat tree networks.
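The arithmetic behind that layout can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not Cray tooling: the class and function names are hypothetical, the enclosure is read as holding 16 blades of four nodes each (an assumption on our part), and the petaflops-per-cabinet target is the peak figure Bolding cites. The last function simply divides that target across the nodes in a cabinet to show roughly what each CPU-plus-GPU node would need to contribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CascadeLayout:
    """Hypothetical model of the packaging hierarchy described in the article."""

    nodes_per_blade: int = 4          # from the article
    blades_per_enclosure: int = 16    # assumption: the "16" counts blades, not nodes
    enclosures_per_group: int = 6     # from the article
    cabinets_per_group: int = 2       # from the article
    accelerators_per_blade: int = 4   # from the article (accelerated blades only)

    @property
    def nodes_per_enclosure(self) -> int:
        return self.nodes_per_blade * self.blades_per_enclosure

    @property
    def nodes_per_group(self) -> int:
        return self.nodes_per_enclosure * self.enclosures_per_group

    @property
    def nodes_per_cabinet(self) -> int:
        # A group spans two cabinets, so each cabinet holds half of its nodes.
        return self.nodes_per_group // self.cabinets_per_group

    @property
    def accelerators_per_cabinet(self) -> int:
        blades_per_cabinet = (
            self.blades_per_enclosure * self.enclosures_per_group
        ) // self.cabinets_per_group
        return blades_per_cabinet * self.accelerators_per_blade


def required_node_teraflops(layout: CascadeLayout,
                            cabinet_petaflops: float = 1.0) -> float:
    """Peak teraflops each node must supply to reach the cabinet target."""
    return cabinet_petaflops * 1000.0 / layout.nodes_per_cabinet


if __name__ == "__main__":
    layout = CascadeLayout()
    print("nodes per enclosure:", layout.nodes_per_enclosure)
    print("nodes per cabinet:", layout.nodes_per_cabinet)
    print("nodes per group:", layout.nodes_per_group)
    print("accelerators per cabinet:", layout.accelerators_per_cabinet)
    print("teraflops needed per node:",
          round(required_node_teraflops(layout), 2))
```

Under these assumptions a cabinet holds 192 nodes and a group 384, so each node would need a little over 5 teraflops of peak to reach a petaflops per cabinet. Changing `blades_per_enclosure` or the target shows how sensitive that per-node figure is to the packaging density.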
“This XC50 is not something on an island. People can buy sets of nodes that are these with XC40 and Knights Landing; we’re not limiting things to GPUs as the accelerators. It’s about choosing the right architecture for the workload and there are many who want this diversity,” Bolding says. Ultimately, “the exciting thing here is that with Haswell, Broadwell, future Skylake or Knights Landing and Pascal, these are fitting into the same overall infrastructure and scaling to the top of the Top 500 list and now handling deep learning workloads.”
This is a new product line in Cray’s portfolio of XC supercomputers, and although it is not breaking traditional supercomputing scalability records, the underlying story is potentially richer. “We should expect to see some powerful scalability metrics when it comes to deep learning,” Bolding says. “The combination of the Aries interconnect and being able to load data into a global address space memory and run neural networks takes things one step further. There is a story here that goes beyond mere accelerators; we can solve not just partial differential equations quickly, but deep learning problems as well.”
We spent some time talking to Bolding about this shift toward (or integration of) deep learning in HPC, which we will present in an article tomorrow. Suffice it to say, Cray, like others in the HPC ecosystem who see the machine learning writing on the wall, regards keeping an open mind about architecture as critical. Of course, the more things change, the more they stay the same, at least with the XC line of systems. In particular, the Aries interconnect is here to stay, and it is proving itself on scale, performance, and power even with the addition of the compute capabilities in Pascal.
“The great longevity and scalability we and our users keep seeing with Aries, even when adding higher performance compute elements, is important,” Bolding said when asked if this is the final XC machine we will see with Aries, at least in its current variant. What we do know is that Scott told us without equivocation that the future 200 Gb/sec Omni-Path 2 interconnect, which Cray is developing with Intel and which is based on a mix of Aries and InfiniBand technologies, will be used in the future Shasta systems, interconnecting Intel processors of various types. We also know that Cray is looking at other interconnects and processor options, too.