The HPC market is opening up in a lot of different ways these days, and Cray is right smack dab in the middle of all of this change, embracing it.
There is a Cambrian explosion in compute under way, with a cornucopia of new processors and accelerators becoming available to radically improve the performance of traditional HPC simulation and modeling workloads and to bring machine learning into the HPC fold to augment the capabilities of these HPC systems.
At the SC18 supercomputing conference in Dallas, The Next Platform sat down with Steve Scott, chief technology officer at Cray, to talk about how the company’s forthcoming “Shasta” supercomputers converge the capabilities of its high-end XC systems and its more traditional CS clusters while at the same time embracing processors and accelerators of all stripes.
In this interview, we talk about how the Slingshot project was started, how it sits at the heart of the Shasta systems, and why it is a key element of Cray’s resurgence as a supercomputer maker that controls its own fate by offering customers not only choice, but also its own technology alongside technologies developed by others.
Scott, who also developed the prior “SeaStar” XT family of interconnects at Cray, is one of the world’s experts on network fabrics, and it is not a coincidence that he did some work on interconnects at Nvidia and followed that up with work at Google after Cray sold off the Gemini and Aries interconnect business to Intel back in early 2012. Scott has always been secretive about his work at Nvidia and Google, but he has definitely applied some of the knowledge about the needs of accelerators and of vast-scale computing that he picked up in those jobs since he returned to Cray as CTO four years ago. The moment that happened, we suspected that Scott was up to something with interconnects, although he could not talk to us directly about it when we did a very deep dive on interconnects in early 2016. That was, coincidentally, when the Slingshot project started and when Intel’s plan for Omni-Path 200, which was supposed to be the interconnect for Shasta, began to change.
At the heart of the Shasta systems, which will be the foundation of Cray’s business for the next ten years, is the company’s homegrown “Slingshot” interconnect, the first custom interconnect Cray has made since the “Aries” Dragonfly interconnect debuted back in 2012. While Cray will still support Intel’s Omni-Path and Mellanox Technologies’ InfiniBand in these future systems should customers desire them, Cray believes that many of its customers, particularly those using the Aries interconnect in its “Cascade” XC system from 2012 or its predecessor, the “Gemini” interconnect used in the “Baker” XE system from 2010, will adopt Slingshot. The reason is that Slingshot is a superset of 200 Gb/sec Ethernet that preserves and improves upon the adaptive routing of these prior interconnects and adds congestion control as well as true Ethernet compatibility.
With the Shasta systems running Slingshot, the compute and storage can be pulled into one network, with point-to-point connections between all elements, and the outside world can talk to the same fabric because Slingshot can talk plain vanilla Ethernet frames as well as the special variant Cray has cooked up to goose HPC and AI workloads in ways that normal Ethernet cannot. The best news is that all of this change will come without Cray customers having to tweak their application code. Those applications distribute data to code using the Message Passing Interface (MPI) protocol, and this abstraction layer has always ridden above the Gemini and Aries interconnects and now does so on Slingshot, too.
Because it is compatible with Ethernet and yet has lots of goodies for supporting networks with more than 250,000 endpoints, the Slingshot fabric might even be appealing to hyperscalers and cloud builders, who have spent enormous amounts of money to create vast networks with high bandwidth, low latency, and controllers that manage monitoring, congestion, and routing on them. And if Slingshot is not of interest to the Super Seven (Google, Amazon, Microsoft, Facebook, Alibaba, Baidu, and Tencent), then perhaps it will be to the upstart competitors looking for an edge to compete with them. That would be a kind of Revenge of the Nerds. . . .