Why There’s Hard, Cold Cash For Soft, Disaggregated Routing

No matter whether you are talking about compute or networking, there are two opposing forces constantly at interplay on a field of green money. The first force is the desire to implement as many specialized functions as possible in transistors to accelerate the performance of those functions. The other force is the desire to abstract those functions in software as much as possible so they stay malleable over time and devices can keep meeting changing needs.

If performance were not an issue, we would be building switches and routers as well as compute farms out of massive banks of X86 servers, or maybe even out of server processors based on the Arm architecture like our smartphone, tablet, and increasingly PC clients as well as the slew of embedded devices that have long since moved off the Motorola 68K, X86, or Power architectures. But there’s the rub. Performance is a big issue, and this is particularly true in networking because you cannot mask latencies in the network as easily as you can with compute and storage.

This is why the experiments of the last decade in network function virtualization, or NFV, and virtual network functions, or VNFs, did not pan out as hoped. We do not have switches and routers running on server infrastructure, but rather on very specialized switch and routing ASICs, or sometimes hybrids that provide functions for both switching and routing. But don’t be confused. That does not mean – it certainly does not mean – that there is not a huge appetite for a more flexible and a more generic kind of switching and routing that looks more like datacenter compute and less like a bunch of dedicated network hardware appliances bought from Cisco Systems, Juniper Networks, and a slew of upstarts that have mostly been swallowed up into the gaping maw of the server incumbents (like Dell and Hewlett Packard Enterprise) or who have built their own software stacks, like Arista Networks, atop merchant networking silicon, usually from Broadcom.

The appetite for such open networking – and we did not say open source networking, but that is an important subset of open networking that may not make a big difference in the longest of runs – is increasing, and it is happening at an increasing rate. But network operators are a conservative bunch, and given the nature of the job – you can lose a few nodes here and there in a datacenter and you recover, but you can’t lose the network or the whole business is hosed – you can appreciate that. There is an appetite for both open switching and open routing, and while we have talked about open switches for a long time, we are just coming around to the need for open routing as credible companies are emerging to take on these tasks.

Open networking has the same chicken and egg problem that open serving did back in the late 1980s and early 1990s, and it took a good recession or two and the dot-com boom to get everyone on board with running some flavor of Unix on top of a RISC processor that offered better bang for the buck than the proprietary systems of the time. The hardware has to come first, and then the software moves onto it and drives the total cost of ownership down relative to fully proprietary solutions. This is ever the way in IT. It started happening with whitebox switching in a big way a decade ago, although no single network operating system has yet emerged here, and it is most definitely happening right now with whitebox routing as Arrcus, DriveNets, Volta Networks, and the open source FRR project take on routing in their own unique ways.

To its credit, Broadcom has pushed the hardest with its “Trident” and “Tomahawk” StrataXGS and “Jericho” StrataDNX ASICs, laying the foundation for merchant silicon that has replaced a lot of the captive switch ASICs and is starting to make a dent in the captive router ASICs now, too. This is why Cisco finally gave up and launched its own Silicon One ASIC in December 2019, starting with a single router chip, and then updated the Silicon One lineup in October 2020 with three additional router chips and three new switch chips. To a certain degree, Innovium, Marvell (XPliant), Nvidia (Mellanox Spectrum), Intel (Barefoot Networks), and Nephos are able to perform Layer 3 routing functions, but at the moment, most of the action is focused on creating some kind of whitebox implementation of Broadcom Jericho ASICs; Facebook and Microsoft are using Silicon One ASICs, and Cisco itself is using the initial Q100 chip in its own 8000 series routers. Juniper makes its own ASICs as well. And none of the open network operating systems are allowed to boot on Cisco or Juniper routing gear, although there is technically no reason why they could not be ported if that’s what Cisco and Juniper wanted to do. But that is really their nightmare, not their dream.

Against this backdrop, it is with keen interest that we saw DriveNets rake in $208 million last week in its Series B funding, with D1 Capital Partners in the lead, bringing its total funding raised to date up to $325 million including the Series A round led by Bessemer Venture Partners and Pitango Venture Capital. After this funding round, DriveNets is now riding herd with unicorns, with a valuation in excess of $1 billion.

The amount of funding that DriveNets, which was founded in 2016, has been able to gather up is significant. Cumulus Networks, which tried to make a network operating system focused on switching out of the Linux kernel, raised $134 million between 2012 and 2018, and last year was acquired by Nvidia as that company was still digesting its $6.9 billion acquisition of switch ASIC and switch maker and NOS provider Mellanox. In less than half the time, DriveNets has raised 2.5X as much money – and it is chasing a much smaller subset of the overall networking problem, focusing on disaggregated routing only at telcos and other service providers. To our way of thinking, the advent of Arrcus and DriveNets, which have come on strong in the past several years, plus the rise of Broadcom in switching, has forced Cisco’s hand to compete against Broadcom’s Jericho ASICs in routing with Silicon One. But now the pressure will be to have routing NOS options as well as routing ASIC options, and that means both Arrcus and DriveNets will run on Silicon One ASICs sooner or later. At that point, it will be interesting to see what Cisco does. Will it port its routing stack to Broadcom?

That seems highly unlikely. None of the Unix vendors really made the jump from their own RISC chips to X86, although Sun Microsystems made a half-hearted effort a few times and then reconsidered its position when it stopped making money. And then, Linux plus X86 came along and took all of the money anyway. The routing situation could play out similarly over the long haul, and the economics certainly argue for this kind of tectonic shift. Routers are among the most expensive, monolithic things in the datacenter today, and everybody has them – just like big Unix NUMA servers were the big, expensive things everyone had in the dot-com boom before distributed computing platform software – and we are not talking about web server farms here, but real things like Memcached, NoSQL databases and now true distributed SQL databases, Hadoop analytics, and Spark in-memory processing systems – came along and changed everything.

In the fullness of time, switching and routing in enterprise datacenters and at the service providers will be no different. It has already long since gone that way at the hyperscalers and cloud builders for switching, and routing is fast moving in this direction, as best as we can figure from the little these companies say and the rumors we hear.

All of this is what is driving the investment in virtual distributed routers, and the key is the economic shift from a scale up design to a scale out one, and from a proprietary stack focused on a vertical controlled by one vendor to a stack that, in theory and hopefully in practice, can span hardware platforms, various scales, and different use cases. Right now in networking, we have several operating systems, all distinct, for different switch and router ASICs, and a whole bunch of unsuccessful attempts to create open source stacks; one or two will emerge and span the iron and the use cases, just as has happened in datacenter and service provider compute. The biggest hyperscalers and largest public clouds will control their own networking fate – writing NOSes and maybe trying to foster communities like Microsoft has done with SONiC/SAI – until they don’t have to. It is too early to tell who is going to win or lose here, but what we can say is that the level of play is much higher and the money is coming in bigger – and that is an indication that this time, the bit is about to hit the fan.

“DriveNets has been on a five year journey,” Hillel Kobrinsky, co-founder and chief strategy officer, tells The Next Platform. “We started with a lot of knowledge of networking from our previous experiences, but we wanted to change the economics of networking. We looked from one end and saw that the hyperscalers moved compute and storage into the cloud, and at the other end we looked at networking from service providers and saw that there was no advancement. We thought that VNF and all of those experiments of moving network software into servers didn’t make sense from an economic perspective. We are proud of the way that we built a system. We went to the ODMs in Taiwan, to chip manufacturers like Broadcom, and to customers like AT&T, and we put everybody around one table and we designed the Network Cloud, which runs on whitebox routers.”

Kobrinsky founded a Web conferencing service provider called Interwise in Boston back in the dot-com boom, raising $88 million in six rounds of funding and selling the company to AT&T for $121 million a decade later. After that, Kobrinsky built and ran AT&T’s research and development center in Israel. The other co-founder of DriveNets is Ido Susan, the company’s chief executive officer and a self-described “technology prodigy” who began his career in the intelligence unit of the Israel Defense Forces. In 2008, Susan founded Intucell, which created a “self-organizing network” for mobile carriers to automagically manage the performance of those networks. In 2013, Cisco bought the Intucell business for $475 million, and last June it sold it off for $50 million to India’s HCL Technologies, where legacy systems go to make profits. Bessemer Venture Partners put in $6 million of the $8.5 million that Intucell raised, and clearly that was a big payoff that it wants to repeat with DriveNets.

The hooks into AT&T obviously have helped DriveNets make its case, particularly given AT&T’s own NOS effort based on the DANOS stack from Vyatta and supported by IP Infusion, another disaggregated network operating system that now lives at the Linux Foundation as an open source project. If AT&T picked DriveNets over DANOS, its own software stack, that speaks volumes about DriveNets’ routing capabilities.

AT&T has made no secret of its desire to make use of whitebox networking gear in its network, and has picked the DriveNets Network Operating System (DNOS), one component of the Network Cloud, to deploy a disaggregated core router. That does not mean that AT&T has replaced all of its router appliances with this disaggregated core, but rather that these machines have been added to the mix and are supporting MPLS traffic across that backbone. AT&T has its own SDN control plane, which DNOS hooks into to be told what to do.

This perfectly illustrates the principles of smashing up the network stack. First, you break the control plane from the data plane so they are separate and can scale independently, and the control plane can run fine on generic CPUs. But you also break the operating system free from the underlying switch and routing hardware and, if possible, make that operating system run across more than two and hopefully at least three different families of ASICs for both switching and routing. This latter bit will come with time, and will create very intense competition in routing.
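To make that split concrete, here is a minimal, hypothetical sketch of what a disaggregated NOS looks like in principle: a control plane that makes routing decisions on a generic CPU and pushes forwarding state through a thin hardware abstraction layer with interchangeable backends for different ASIC families. The class and method names are ours for illustration only; they do not correspond to DNOS, ArcOS, SONiC/SAI, or any vendor SDK.

```python
# Hypothetical sketch of control plane / data plane separation in a
# disaggregated NOS. All names are illustrative, not any vendor's real API.

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str    # e.g. "10.0.0.0/8"
    next_hop: str  # e.g. "192.168.1.1"
    out_port: int  # physical port on the forwarding ASIC


class ForwardingBackend(ABC):
    """Thin hardware abstraction layer over a routing ASIC family."""

    @abstractmethod
    def program_route(self, route: Route) -> None:
        ...


class Jericho2Backend(ForwardingBackend):
    def program_route(self, route: Route) -> None:
        # A real backend would call the ASIC SDK to write the FIB entry.
        print(f"[Jericho2] FIB add {route.prefix} -> port {route.out_port}")


class SiliconOneBackend(ForwardingBackend):
    def program_route(self, route: Route) -> None:
        print(f"[SiliconOne] FIB add {route.prefix} -> port {route.out_port}")


class ControlPlane:
    """Runs on a generic CPU; decides routes, then programs the hardware."""

    def __init__(self, backend: ForwardingBackend):
        self.backend = backend
        self.rib = []  # routing information base

    def learn_route(self, route: Route) -> None:
        self.rib.append(route)             # routing decision (e.g. via BGP)
        self.backend.program_route(route)  # data plane programming


if __name__ == "__main__":
    # The same control plane code can drive either ASIC family.
    cp = ControlPlane(Jericho2Backend())
    cp.learn_route(Route("10.0.0.0/8", "192.168.1.1", out_port=7))
```

The point of the abstraction layer is that swapping Jericho2Backend for SiliconOneBackend changes nothing above it, which is exactly the portability across ASIC families described above.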

Anyway, this Distributed Disaggregated Chassis, or DDC for short, designed by AT&T, which was submitted to the Open Compute Project, is perfectly analogous to taking a workload that ran on a monolithic device – a big RISC/Unix NUMA server – and breaking it up so it can run on a cluster of smaller devices lashed by network interfaces – a Linux/X86 cluster.

Instead of hard wiring the line cards based on the Jericho 2 ASICs and hard wiring the fabric interconnects (using the same Jericho 2 ASICs) and putting them all in a modular chassis, the DDC router from AT&T uses 400 Gb/sec Ethernet ports to link multiple line card systems, each with an aggregate of 4 Tb/sec of bandwidth. The largest configuration AT&T is having built has 48 line card systems and 13 fabric systems (based on the same design) that together deliver 192 Tb/sec of routing capacity. Interestingly, the mesh of routers uses a cell-based protocol to provide redundancy and distribute packets across the cluster, much like modern NUMA servers started doing in the 2000s. The idea is precisely what Arrcus supports with its ArcOS-based Virtual Distributed Router, which launched in July 2020.

The Jericho 2 chip from Broadcom has 9.6 Tb/sec of aggregate bandwidth and can drive 24 ports at 400 Gb/sec with deep buffers based on HBM stacked memory. It was announced in March 2018 and started shipping in production in February 2019. It is a mature product at this point, and one that has been awaiting a rival.
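As a quick sanity check, the arithmetic in those figures hangs together: 24 ports at 400 Gb/sec is the 9.6 Tb/sec the Jericho 2 is rated for, and 48 line card systems at 4 Tb/sec apiece is the 192 Tb/sec AT&T quotes for its largest DDC build. A back-of-the-envelope sketch, using only the numbers cited above:

```python
# Back-of-the-envelope check of the bandwidth figures quoted above.

JERICHO2_PORTS = 24     # 400 Gb/sec ports per Jericho 2
PORT_SPEED_GBPS = 400
print(f"Jericho 2: {JERICHO2_PORTS * PORT_SPEED_GBPS / 1000:.1f} Tb/sec")  # 9.6 Tb/sec

LINE_CARD_SYSTEMS = 48  # largest DDC configuration AT&T is having built
LINE_CARD_TBPS = 4      # aggregate bandwidth per line card system
print(f"Largest DDC: {LINE_CARD_SYSTEMS * LINE_CARD_TBPS} Tb/sec")         # 192 Tb/sec
```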

Now, here is the funny thing about DriveNets, and it makes sense based on its history. Kobrinsky says that DriveNets is going to focus completely and exclusively on the service provider space – forget large enterprises, hyperscalers, and cloud builders. DriveNets started at the core for routing, Kobrinsky adds, and has moved into the peering and aggregation layers of the network and has even moved out to the edge and is sometimes used in datacenter interconnects. But DriveNets has no desire to move into other routing use cases and has no interest in doing switching at all. At least for now. This is not the case for Arrcus, which believes its routing software can be used across enterprises, hyperscalers, cloud builders, and service providers and also believes it can span from core to edge and back. Volta Networks, which we have also just talked to, is moving from the edge inwards to peering and eventually hopes to get to aggregation and the core, and is similarly interested only in routing, not switching. If Cisco thinks one hardware architecture can span switching and routing at all scales, as it does with Silicon One, we have no trouble believing that one NOS can span switching and routing, too – and one might.

We shall see. A lot depends on the quality of the code and how switching is implemented in these NOSes that have routing as a foundation. (See The Switch-Router War Is Over, And The Hyperscalers Won for our thoughts on this.)

In the meantime, there is plenty of router business at the service providers that DriveNets can disrupt. Kobrinsky says that a Network Cloud cluster of routing nodes can scale to 200 whiteboxes with a total of 8,000 ports and a stupid amount of aggregate bandwidth. The game is about collapsing those service provider networks – broadband, enterprise data, 5G mobile, and so on – down to a single infrastructure that can be configured with services on the fly (many of them running on CPUs in the whiteboxes) while still having real routing ASICs that run like a bat out of hell, with deep buffers sitting right next to them.
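To put a number on “a stupid amount,” here is the rough math on a maxed-out cluster, assuming the 400 Gb/sec ports used elsewhere in these designs (the port speed is our assumption, not a DriveNets figure):

```python
# Rough aggregate bandwidth of a maxed-out Network Cloud cluster.
# The 400 Gb/sec port speed is our assumption, not a quoted spec.

WHITEBOXES = 200
TOTAL_PORTS = 8_000
PORT_SPEED_GBPS = 400

aggregate_pbps = TOTAL_PORTS * PORT_SPEED_GBPS / 1_000_000
print(f"{aggregate_pbps:.1f} Pb/sec across {WHITEBOXES} whiteboxes")  # 3.2 Pb/sec
```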

“We collapse everything down into one infrastructure, and you can have all of those functions running and you can use any port for any service,” says Kobrinsky.

Real routers – meaning old school appliances like those sold by Cisco and Juniper – are not like that. And Kobrinsky says that DriveNets plus whitebox routers can drop the total cost of ownership by 50 percent. And interestingly, DriveNets has come up with a revenue model that is synchronized with the ever-dropping revenue per bit that service providers have to deal with. The bits moved go up every year, the cost per bit goes down, and the price of the Network Cloud scales along with them, so service providers can predict their costs and their profits from the services they offer.
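Kobrinsky did not share the formula, but the shape of such a model is easy to sketch: the price tracks the bits the operator moves at a per-bit rate that declines year over year, so spend grows far more slowly than traffic. The growth and decline rates below are invented purely for illustration:

```python
# Illustrative only: pricing indexed to traffic with a declining per-bit
# rate. The growth and decline percentages are made-up example numbers.

traffic = 100.0          # bits moved in year one (arbitrary units)
rate_per_bit = 1.00      # arbitrary starting price per unit of traffic
TRAFFIC_GROWTH = 1.35    # traffic grows 35 percent per year (assumption)
RATE_DECLINE = 0.75      # per-bit price drops 25 percent per year (assumption)

for year in range(1, 6):
    spend = traffic * rate_per_bit
    print(f"Year {year}: traffic {traffic:7.1f}, spend {spend:6.1f}")
    traffic *= TRAFFIC_GROWTH
    rate_per_bit *= RATE_DECLINE
```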

At the moment, the DriveNets Network Cloud runs on the Jericho 2 ASICs from Broadcom, but the company is working to support Cisco’s Silicon One. Most of the service providers have had dual vendor strategies for routing for years, and Kobrinsky doesn’t think that will change. But it could be a pairing of merchant silicon and a mix of merchant NOSes (say from DriveNets and Arrcus) instead of Cisco and Juniper appliances going forward.

