Last week, when we talked to Nvidia co-founder and chief executive officer Jensen Huang about how the datacenter was becoming the unit of compute and how, in such a world, networking was critical, it was obvious that acquiring Mellanox Technologies for $6.9 billion was just the beginning of a strategy that will no doubt unfold in the coming months and years.
Huang didn’t wait long to make another move, with Nvidia acquiring open network software provider Cumulus Networks for an undisclosed sum and marrying it with Mellanox in its newly formed networking business unit.
Sometimes, to understand what a company is doing you have to take a really hard look at the things that key people at that company have seen and done in their careers. This is one of those cases.
Cumulus Networks was founded in 2010 by JR Rivers and Nolan Leake, and dropped out of stealth in the summer of 2013.
Rivers was the company’s chief executive officer and its face until recent years. Rivers got his start as an engineer at 3Com back in 1989, and then had gigs at Grand Junction Networks and Cisco Systems, rising in the engineering ranks in networking. After a decade at Cisco, Rivers moved to a short four-month stint at Google, and then went back to Cisco a few years before the Great Recession and stayed until it was mostly over, which is when Cumulus Networks was founded.
Leake’s first big job was as a member of the technical staff at VMware for three years, from 2002 through 2005, and then he took a job as technical director of software engineering at server startup 3Leaf Networks, which created NUMA big iron from InfiniBand switching and a homegrown distributed virtual machine in the same vein as those from ScaleMP, RNA Networks, and TidalScale. Leake was then a member of the office of the CTO at Nuova Systems, the Cisco spin-in that created the NX-OS network operating system and the Nexus line of switches, and which was spun back into Cisco as it launched the Nexus 5000 series in 2008, ahead of (and more or less concurrent with, and connected to) its “California” Unified Computing System blade servers, which converged servers and switching. Leake was the founder of Tile Networks, which created a virtualized storage fabric for compute clouds, and then joined up with Rivers to co-found Cumulus Networks in early 2010 as its chief technology officer.
Leake left Cumulus Networks in June 2016, at which point Josh Leslie, who had headed up sales at both VMware and Cumulus Networks, took over as chief executive officer of the network operating system startup and Rivers switched roles to CTO. Rivers held that post until he left in July last year to become a senior principal engineer on the network at Amazon Web Services.
Over the past decade, Cumulus Networks has been working on a number of fronts to try to break open the datacenter switch, prying its operating system away from the underlying hardware, something Facebook talked to us about at length when we first started The Next Platform five years ago. Some history and a survey of the terrain are in order to understand what Nvidia is doing and why it is doing it.
The hyperscalers and big public cloud providers all have their own network operating systems and the various kinds of controllers that comprise their networks. In some cases, as with the customers of Arista Networks, companies can use that company’s Extensible Operating System, a variant of Linux that is hardened for networking and that has features allowing them to add their own functions to the box, running on the CPU, FPGA, or switch ASIC inside it. In the case of Microsoft, it created the Switch Abstraction Interface, or SAI, and its own network operating system, called SONiC, which itself runs on Open Network Linux, a Linux distribution tuned for networking that was created by Big Switch Networks. Big Switch was eaten by Arista Networks in February, and its stack of software-defined networking software makes it a valuable asset.
There was a flurry of activity back in the 2015 and 2016 timeframe. Hewlett Packard Enterprise open sourced OpenSwitch, a new network operating system, based of course on the Linux kernel, that was inspired by (but distinct from) the Comware and ProVision NOSes deployed on its 3Com and HPE switches, respectively. Dell, which had acquired Force10 Networks in July 2011 to add datacenter-class networking to its portfolio, open sourced the FTOS NOS created by that switch company in early 2016, calling it OS10 and running it atop SAI as well. Mellanox Technologies, just acquired by Nvidia last week, had its own open source NOS interface tools, called SwitchDev, which date from the same early 2016 period and which worked in conjunction with Cumulus Linux as well as with the homegrown Onyx operating system from Mellanox (the successor to MLNX-OS) and the Microsoft Azure networking stack (which is not completely open source). MLNX-OS and Onyx are based on Linux.
There are a lot of open source Linux NOSes out there, or pieces that can be assembled into one. The NOS created by Cumulus Networks is arguably the most popular of the open source ones. Leslie tells The Next Platform it has thousands of customers and hundreds of thousands of ports under management. Importantly, it also has a staff of network experts who know how to make it work, and they will be valuable to Nvidia as it tries to realize its datacenter aspirations. But as we have pointed out in the past, networking doesn’t require a Linux kernel. Arrcus has created its own ArcOS from scratch, starting from a routing base, with code written by Cisco routing luminaries, and it is most certainly not open source, just as IOS and NX-OS from Cisco, EOS from Arista, and the homegrown OSes from the hyperscalers and cloud builders are not.
In addition to creating the ONIE NOS installer, which the entire industry uses on whitebox switches powered by ASICs from Broadcom, Mellanox, and others, Cumulus Networks also created a fork of the open source Quagga routing stack, called Free Range Routing, or FRR, which runs atop the Linux kernel as well as a bunch of Unix kernels and which addresses many of the shortcomings of Quagga. The Cumulus Linux 4.0 stack and its NetQ 3.0 telemetry software were last updated in November last year and in April 2020, respectively, adding support for the deep buffer “Qumran” ASICs from Broadcom and the “Spectrum-2” ASICs from Mellanox. To date, by the way, Cumulus itself supports 14 different ASICs from Broadcom and Mellanox (on 134 distinct switch platforms), and while it has been evaluating other ASICs, like those from Innovium and Barefoot Networks (now part of Intel), these are not yet supported.
All of this work is distinct from Dent, an edge Linux NOS aimed at retail stores and other kinds of edge locations, which Amazon (the online retailer, not its Amazon Web Services cloud unit) has been building with Cumulus Networks, Mellanox, Marvell, and others over at The Linux Foundation.
“There is really synergy here,” says Kevin Deierling, now senior vice president in the networking business unit at Nvidia following the Mellanox acquisition. “If you look at these technologies narrowly, you might say they are all competitive open platforms. But if you look at them broadly, there really is a huge amount of complementary as well as common technologies. We will see where each of these open networking platforms prevail, but more importantly, what this means is that Nvidia has embraced the open networking stack and it is going to accelerate the networking business within Nvidia.”
We suspect we will learn more about exactly how this will all map out during Jensen Huang’s GTC 2020 keynote, which is happening on May 14.