If the world doesn’t need another thing, one of those things that it doesn’t need is probably another switch operating system.
Cisco Systems, Arista Networks, Juniper Networks, Big Switch Networks, Mellanox Technologies, Hewlett Packard Enterprise, Dell, and Extreme Networks all have their own, tied to their own hardware. The big hyperscalers and cloud builders have their own network operating systems, and Dell, HPE, and Mellanox have open sourced their own to try to catch the open network wave that built momentum over the past few years thanks in large part to Cumulus Networks and its Linux-based switch OS.
The field is not just crowded, it is moated. Still.
So why does Arrcus, a startup that uncloaked from stealth mode recently, think it can make any inroads at all here? Particularly when everyone seemed to be thinking that an open source operating system, or at least one based on Linux, was going to be the network OS that ruled the datacenter in the long run, just as Linux is coming to dominate servers for new workloads. (Dell, HPE, Mellanox, and Big Switch have all opened up their switch OSes to a certain extent, and the Quagga project provides an open source router.)
“What we fundamentally want to do is give people a choice to use whitebox routers and switches with an operating system that represents what they get from the big three,” Devesh Garg, one of the company’s co-founders, tells The Next Platform. “It is more like what the big OEMs – Cisco, Arista, and Juniper – have in terms of the scale, performance, and functionality.” And ArcOS, as the company’s product is called (only one R there and an O instead of a U) is designed to be portable across all switch ASICs – at least those that Arrcus will port it to based on customer demand. So, in theory, companies could buy switches from many different vendors and run ArcOS on all of them.
Garg is no stranger to the eccentricities of the semiconductor business. During the dot-com boom and subsequent bust, Garg was general manager of the security chip business at Broadcom, which as everyone knows is the dominant supplier of merchant chips in the datacenter and has been for about a decade. Broadcom’s Trident, Tomahawk, and Jericho chips are the engines of the whitebox revolution in switching with a smattering of routing thrown in for good measure. Garg was also co-founder of Tilera, a maker of a massively parallel MIPS-inspired processor with a 2D mesh interconnect that blazed some trails, but did not take off on its own and ended up inside of EZchip, which in turn was acquired by Mellanox.
Mellanox has moved out the MIPS-alike cores on the Tilera chips and replaced them with Arm cores to create its “BlueField” processor, which it has high hopes for. EZchip bought Tilera for $170 million; at the time, Tilera had $40 million in revenues and over 100 companies kicking the tires on its chips or using them in production. Mellanox agreed to buy EZchip for $811 million in September 2015, and the deal closed in February 2016. That summer, Garg got together with Keyur Patel, who is chief technology officer and who was a distinguished engineer at Cisco for 14 years, and Derek Yeung, who is chief architect and who spent 25 years at Cisco in various engineering leadership roles. Both are among the world’s experts on various routing protocols, including the Border Gateway Protocol (BGP) that is favored by hyperscalers and cloud builders for their hybrid switch/routing gear.
That last bit is important. But hold on a second. We need to talk about money.
Importantly, Arrcus has secured $15 million in Series A funding from General Catalyst and Clear Ventures – notably with a nod of approval from Steve Herrod, a general partner at General Catalyst who was chief technology officer at VMware and was instrumental in the company’s acquisition of virtual networker Nicira in July 2012 for $1.26 billion. The Nicira products – notably Open vSwitch – are the foundation of VMware’s NSX product line, which complements its ESXi server virtualization and its vSAN storage virtualization. Herrod was one of the architects of the Transmeta morphing X86 processor as well as the MIPS R10000 processor at SGI way back in the day, and his seal of approval means much.
“Not all operating systems are created equal, and respectfully to all of those other suppliers that have come before, they really have not put any dent in the Big Three. The growth of all the other network operating systems has been muted because they don’t really offer a viable alternative. We see a large networking market that is dominated by a few vertically integrated suppliers, and they have a stranglehold on the market and that causes high prices and stunted innovation. And at the same time, we saw this tremendous pressure on the networks themselves – people have more connected devices and they need more and smarter networks with more visibility and control, and at massive scale.”
Any market that is dominated by a few companies that are vertically integrated over time leads to best-in-class horizontal segmentation. This certainly happened in the systems rack in the 1980s and 1990s and into the 2000s. Proprietary systems were the high point of this, and in the late 1980s and early 1990s, Unix systems and their RISC processors took away a lot of the workloads because they at least promised some sort of API cross-compatibility that allowed for applications to be ported. But then Linux came along, running on all popular processor architectures and it rode up the Unix wave and took it over, and pretty much all other processor architectures have come under pressure from the hegemony of the Intel Xeon. There are pockets of non-conformity, and they are growing. But as yet, there has not emerged a Linux equivalent for network operating systems.
There is, as we have noted in the past, tremendous innovation being done on switch ASICs, which is enabled by Moore’s Law advances in transistor density and the consequent lowering of costs of those transistors. Switch ASICs can do so many more things today than they could do a decade or two ago, and there are more ASIC suppliers and more ODMs coming into the market as the OEMs try to hold their ground. There is all of this innovation, but Garg says there is not really a viable network operating system that can span both switching and routing jobs and that can also span all of the merchant silicon and, maybe some day in the future, some of the captive chips if customers demand it.
So, Arrcus went back to the drawing board. Rather than starting with a minimalist and hardened Linux kernel and creating a switch layer on top of that with some Quagga routing functions bolted on the side, the Arrcus engineers created the ArcOS network operating system from scratch. But there are no plans to open it up and give the code away to the community. (At least not yet.) The main strategic difference between Arrcus and the other network operating systems out there in the datacenter is that the latter are very switch centric, and very much play down in Layer 2 with only a few fingers reaching up into Layer 3.
“Switching has to be smarter, and we are very routing centric,” explains Garg. “When you can capture the very high end of the routing capabilities set, it is really easy to come back down and solve the switching set.”
ArcOS has no open source code in it whatsoever, and it was created by a dozen or so of the top people in the world that understand these protocols using a mix of C and C++, as you might expect. This new network operating system prunes out all of the cruft that is no longer necessary in datacenters, and uses Tail-f Systems’ ConfD tool, which provides a command line interface that is similar enough to Cisco’s IOS and Juniper’s Junos that network administrators can do their work without hurting their heads. (Cisco bought Tail-f four years ago for $175 million.) On the northbound side of the operating system, ArcOS also provides an OpenConfig/YANG model with APIs that are harmonized such that it can support any Linux application, any command line interface, or any API such as REST, NETCONF, SNMP, or whatever.
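To make the idea of harmonized northbound interfaces a little more concrete, here is a minimal sketch in Python of one shared configuration model sitting behind two different front ends. Everything in it is an assumption made for illustration: the `ConfigStore` class, the `cli_front_end` and `rest_front_end` functions, and the path names are hypothetical, and none of this is ArcOS, ConfD, or OpenConfig code.

```python
class ConfigStore:
    """Hypothetical single source of truth, keyed by slash-separated paths."""

    def __init__(self):
        self._data = {}

    def set(self, path, value):
        # Normalize the path so "/a/b" and "a/b" address the same node.
        self._data[path.strip("/")] = value

    def get(self, path):
        return self._data[path.strip("/")]


def cli_front_end(store, line):
    """Parse a made-up CLI command of the form 'set <path> <value>'."""
    verb, path, value = line.split(maxsplit=2)
    if verb != "set":
        raise ValueError(f"unsupported command: {verb}")
    store.set(path, value)


def rest_front_end(store, method, url, body):
    """Handle a made-up REST call of the form PUT <path> {'value': ...}."""
    if method != "PUT":
        raise ValueError(f"unsupported method: {method}")
    store.set(url, body["value"])


store = ConfigStore()
cli_front_end(store, "set interfaces/eth0/mtu 9000")
rest_front_end(store, "PUT", "/interfaces/eth1/mtu", {"value": "1500"})
```

Both front ends write through the same `set` call, so a value entered at the command line is visible to any other interface that reads the model, which is the general property the harmonized APIs are described as providing.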
On the southbound side, Arrcus has created a Data Plane Adaptation Layer, or DPAL, that is akin to an intelligent hardware abstraction layer that allows it to hook into any merchant or captive silicon. And because ArcOS is the first network operating system to be ported to the “Jericho+” ASIC from Broadcom, which is the one that has the very large memory buffers, it can ingest a full routing table and store it and therefore be used in a variety of routing jobs. This is the kind of thing the hyperscalers have been doing on their own. And importantly, it has been able to create a multi-threaded implementation of BGP, and it can now scale as the core counts go up. These are huge differentiators compared to the alternatives.
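The general shape of a hardware abstraction layer like the one described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DPAL itself: the `DataPlaneBackend`, `InMemoryBackend`, and `AdaptationLayer` names are hypothetical, the in-memory table stands in for a real ASIC SDK, and the route limits are arbitrary numbers chosen for the example.

```python
from abc import ABC, abstractmethod
import ipaddress


class DataPlaneBackend(ABC):
    """Hypothetical per-ASIC driver interface (not the ArcOS API)."""

    @abstractmethod
    def program_route(self, prefix, next_hop):
        """Install one route in the forwarding table."""

    @abstractmethod
    def route_count(self):
        """Return the number of routes currently installed."""


class InMemoryBackend(DataPlaneBackend):
    """Stand-in for a vendor SDK; keeps routes in a dict with a size cap."""

    def __init__(self, name, max_routes):
        self.name = name
        self.max_routes = max_routes
        self.table = {}

    def program_route(self, prefix, next_hop):
        network = ipaddress.ip_network(prefix)
        # Reject new prefixes once the (made-up) table capacity is reached.
        if network not in self.table and len(self.table) >= self.max_routes:
            raise MemoryError(f"{self.name}: route table full")
        self.table[network] = ipaddress.ip_address(next_hop)

    def route_count(self):
        return len(self.table)


class AdaptationLayer:
    """The control plane talks to this class, never to a backend directly."""

    def __init__(self, backend):
        self.backend = backend

    def install(self, routes):
        for prefix, next_hop in routes:
            self.backend.program_route(prefix, next_hop)
        return self.backend.route_count()


routes = [("10.0.0.0/8", "192.0.2.1"), ("10.1.0.0/16", "192.0.2.2")]
large = AdaptationLayer(InMemoryBackend("large-table-asic", max_routes=1000))
installed = large.install(routes)
```

The point of the design is that the routing code above the adaptation layer stays the same when the chip changes; only the backend, with its own capacity and quirks, is swapped out.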
ArcOS is also the first network operating system to be ported to the “Trident-3” ASIC from Broadcom, which is used in enterprise-class switches that don’t need to do such heavy lifting at Layer 3.
At the moment, these are the two ASICs that ArcOS supports, but as customers need more, Arrcus will port it to more chips. For now, the other Broadcom ASICs are in the works, including the Helix and Tomahawk families as well as the earlier Trident-2 chip. It is significant that Barefoot Networks, Innovium, Mellanox, Cavium (XPliant), and Nephos are mentioned by name alongside Broadcom, and as we mentioned above, ArcOS could be hacked onto proprietary switch or router ASICs if the demand is high enough. We don’t think the demand will ever be high enough to justify ports to captive switch ASICs. On the compute side, ArcOS can hand off applications to X86 or Arm instruction sets for adjunct processing on the switch/router.
And we also think that to get the backing of the Super 8 hyperscalers and cloud builders and other large enterprises that swim in their wake, Arrcus is going to have to consider opening up at least the core of ArcOS. If Google can let go of Borg to create a Kubernetes community, then maybe Arrcus can let go of ArcOS to create a vast customer base and enthusiasm for its network operating system.