Cisco Fights The Merchant Network Chip Makers On Their Own Turf

There is nothing wrong with buying a compute, storage, or networking appliance where the ASICs and software both come from the same supplier. There is, however, something wrong with this being the only choice and that is what the long battle to break open the switch by the hyperscalers and cloud builders, exemplified most publicly by Facebook, is all about.

And so we have seen the rise of merchant silicon ASIC suppliers, first for switches and then for routers, and also the rise of homegrown network operating systems at those hyperscalers and cloud builders that can afford such a large investment – and indeed, given the scale and complexity of their networks, they have had no choice but to make such investments and create the NOSes that are not available from the commercial switch and router suppliers. And just behind them, we have seen a wave of portable NOSes, some of them open source, some of them not, that can span multiple switch and router ASICs – and some of those chips don’t even come from Broadcom. . . .

Two decades after the rise of merchant chip suppliers and alternate NOSes not allied to any particular switch or router maker, routing pioneer and switching giant Cisco Systems put a stake in the ground, back in December 2019, with the launch of its Silicon One Q100 routing chip, which has been commercialized in the Cisco 8000 series routers and which is, to this date, according to Rakesh Chopra, a Cisco Fellow and architect on the Silicon One line of ASICs, the only product from Cisco itself based on Silicon One chippery. But what Chopra could not say is whether others, like the hyperscalers and cloud builders, are building their own switches and routers based on the expanded Silicon One lineup, which was widened in October 2020 and which is being further broadened today. But we suspect that more than a few of these companies have Silicon One silicon in proofs of concept, and that more than a few of the commercial NOS suppliers are also getting their hands on Silicon One to consider doing ports.

Before we get into the expanded Silicon One lineup, we would like to make a funny observation. A switch or a router, more than any other kind of device in the datacenter, has been a black box for a very long time and, as it turns out, for longer than anything else. The hardware and software components of networking devices are like guarded state secrets, proprietary and differentiating in the market, and what customers really got was a collection of ports at a given speed in specific form factors with a set of APIs and functions implemented in the software. While SDKs have been made available to the largest customers, opening up the platform for more malleability and customization for specific needs, the funny thing to us is that the silicon itself, merchant or not, is still largely a black box – very unlike the merchant CPU chips we are all familiar with or even those that are really only going to be used in specific machines by their proprietary vendor. The switch and router ASICs themselves are the last bastion of black boxery in the datacenter. You are lucky to find out the process technology used to etch them and to get a SERDES speed and a SERDES count out of Broadcom, Innovium, Intel, Marvell, Nvidia, and now Cisco. You very rarely see a block diagram, and even more rarely a die shot.

As you might imagine, we think this should change. But, until it does, we get package shots and some basic feeds and speeds, and after the addition of three new ASICs to the line today, Cisco’s Silicon One family looks like this:

In fifteen months, Cisco has rolled out switch and router ASICs that fulfill ten different roles in the network, which is a pretty good start in taking on Broadcom and the other merchant networking suppliers and their switch and router manufacturing allies that have given Cisco plenty of grief in the past decade.

We did not do our job properly in covering the original Q100 routing chip launch in December 2019, and we dropped the ball on the October 2020 announcements, too, when Cisco launched three additional routing chips and three switching chips, all based on the same Silicon One architecture. But we intend to do a good job from this point forward, and our apologies on that. (For most of us, 2020 was a very complicated year, and it was no different for The Next Platform.)

To get up to speed, we spent some time with Chopra, who has been intimately involved with the Silicon One effort since it was conceived nearly seven years ago. Chopra got his bachelor’s in computer engineering from Rensselaer Polytechnic Institute back in 1997, and has been at Cisco ever since, starting out as a hardware engineer and rising through the ranks.

“From a Silicon One perspective, we are starting to operate like a standalone silicon company,” Chopra tells The Next Platform. “And in this announcement and in the one from October last year, we did not announce any Cisco products that are consuming these chips. We are a standalone inside of Cisco and our charter is to grow the business of Silicon One and alternate business models. And when we have discussions with external customers, we go through a standard ROI discussion, looking at how much effort it takes for us to onboard somebody versus how much volume and how much revenue we are going to generate. And based on that, we will make a decision about whether or not we do it.”

It is not at all clear how many hyperscalers and cloud builders, or how many OEM or ODM switch makers, might be using or thinking about using Silicon One ASICs. But everyone is talking about it, and clearly from the financial results that Cisco chief executive officer Chuck Robbins talked about several weeks ago when Cisco’s most recent quarter closed, Cisco’s many-year effort to get back into the hyperscalers and cloud builders as well as other “web-scale” service providers and telcos, as Cisco calls them all, is paying off. Robbins said that sales to webscalers accounted for an average of 21 percent of total service provider sales in the prior four quarters, and hit 25 percent in the January 2021 quarter. Some customers are buying raw Silicon One ASICs, some are buying whole Cisco 8000 routers, and we presume they are doing proofs of concept on the ASICs announced last October and look forward to doing the same with the devices being announced now.

The trick to the Silicon One architecture, which is reflected in its name, is having a single, unified architecture that spans both switching and routing and that includes all features for all devices – unlike the balkanized approaches companies have to deal with when they use multiple products from Broadcom or Intel or Marvell or Nvidia. The feature sets are not always the same across families of chips – in the case of the Broadcom “Trident” and “Tomahawk” switch chips and “Jericho” router chips, this is absolutely intentional to drive different price points related to different capabilities – even if the software development kits are the same or similar enough.

“Silicon One is constructed in a way that is fundamentally different from any other technology in the industry,” explains Chopra, and we are going to let him talk here for a bit because you don’t often get much insight into the architecture of a network ASIC. “The way our memory structures work, the way our processing engines work, it allows us to do things which were just completely impossible before. So if you look at other silicon in the market, in the routing space, for instance, they are built up out of two pipeline stages, and they have two stages for dedicated fabric interfaces. We have a completely scalable architecture where we can have many, many slices that can scale out linearly with technology node. We can fully share all of our memory – we don’t have to replicate memory state. It’s a very fundamental shift in terms of building routing and switching silicon. We have laid this foundation, and we now have this infrastructure where all we do for one of these new pieces of silicon is modulate the number of slices, adjust the scale of various tables, adjust the memory, and then just fabricate it. And that’s why you see so many of these devices coming so quickly. All of these devices are really architecturally identical. We change the number of slices. They all run the same P4 forwarding code, they run the same SDK. We modulate some scale depending on if we are going after a webscale market or routing market. We adjust the size of packet buffers or add external buffers. But these are all functionally the same, so if you use one, you instantly can use any of the other ones. There’s no re-education that’s necessary. If you think about what that means to a webscaler, they can deploy a top of rack switch and their field techs can understand how to troubleshoot a top of rack switch, and that same field tech can troubleshoot a core router because it’s literally the same kind of ASIC.”

The same ASICs can be used in fixed port devices like the 1U and 2U switches common at the top of rack; in a disaggregated leaf/spine chassis, with some of the ASICs being used as fabric elements and some as line cards; or in a fixed modular switch, which implements the connections between the ASICs on the line cards and the ASICs on the fabric in a hard-coded fashion.

In the chart above, the Q100 chip that launched in December 2019 had 10.8 Tb/sec of aggregate capacity and was aimed at routing. In October 2020, Cisco added three new routing chips, with the Q200 running at 12.8 Tb/sec, the Q201 running at 6.4 Tb/sec, and the Q202 running at 3.2 Tb/sec. Companion switching chips at the same speeds – the Q200L, Q201L, and the Q202L – were also added, bringing the total number of Silicon One devices up to seven.

However Cisco did it – and Chopra is not saying – the Silicon One chips are delivering high performance and much lower power usage than the competition. We don’t know exactly what chips from which vendors Cisco is comparing to, but it would be interesting to see them labelled and also compared to prior generations of Cisco router chips as well. Anyway, the idea is that these are representative comparisons:

The Cisco Silicon One setup using the Q200 chip can provide those 32 ports at 400 Gb/sec in a 1U chassis, while the other two require a 2U or 3U chassis. And look at the power consumption:

And Chopra says that the 390 watt thermal envelope for the router based on the Q200 ASIC is “very conservative” and that actual power draw would likely be a lot lower in the field. And importantly, the switch ASIC power draw would be lower than the power draw of the transceivers needed for the optics to hook into a switch using the Q200. It is this overall number – the switch ASIC power plus the optics power – that really matters.
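To make the "ASIC power plus optics power" point concrete, here is a minimal back-of-the-envelope sketch. Only the 390 watt envelope and the 32 ports come from the figures above; the per-transceiver wattage is a placeholder we made up for illustration, and we are assuming the 390 watt figure does not already include pluggable optics, which Cisco did not spell out.

```python
def total_power_watts(envelope_watts: float, ports: int, watts_per_optic: float) -> float:
    """Device envelope plus one pluggable optic per port (assumes the envelope excludes optics)."""
    return envelope_watts + ports * watts_per_optic


def watts_per_port(envelope_watts: float, ports: int, watts_per_optic: float) -> float:
    """Overall power divided evenly across the ports."""
    return total_power_watts(envelope_watts, ports, watts_per_optic) / ports


if __name__ == "__main__":
    ENVELOPE_W = 390.0       # from the article: Q200-based router thermal envelope
    PORTS = 32               # from the article: 32 ports at 400 Gb/sec
    HYPOTHETICAL_OPTIC_W = 12.0  # placeholder, NOT a Cisco or vendor figure

    total = total_power_watts(ENVELOPE_W, PORTS, HYPOTHETICAL_OPTIC_W)
    per_port = watts_per_port(ENVELOPE_W, PORTS, HYPOTHETICAL_OPTIC_W)
    print(f"total: {total:.0f} W, per port: {per_port:.1f} W")
```

Swap in measured transceiver wattages to compare boxes on the number that matters, which is the overall draw per port and not the ASIC draw alone.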

If you are talking port density per rack unit going up and power consumption per port going down compared to the competition, this is a language that the hyperscalers and cloud builders, as well as their peers in the telcos and service providers, understand full well.

That brings us to the three new devices added today. First, there is a 25.6 Tb/sec switch ASIC, the G100, which matches what Broadcom and Nvidia can put into the field in terms of aggregate switching capacity, plus an 8 Tb/sec routing ASIC, the Q211, and an 8 Tb/sec switch ASIC, the Q211L, that a number of international webscale companies (meaning not in the United States and we presume that they are in China, but maybe not) have asked Cisco to supply to match the aggregate bandwidth of current ASICs they have in their devices. This will allow them to do a box for box replacement, presumably, and not have to rearchitect or rewire their networks.
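For those keeping score, the lineup as described in this article can be tallied in a few lines. This is only a sketch: the capacities and the routing/switching split are the ones quoted in the text, while the port counts are a simple upper bound (aggregate capacity divided by port speed) and not configurations that Cisco has published to us. The names `SILICON_ONE_GBPS` and `max_ports` are our own.

```python
# Aggregate capacities in Gb/sec, as quoted in this article.
# Integers avoid floating point rounding in the division below.
SILICON_ONE_GBPS = {
    "Q100": ("routing", 10_800),
    "Q200": ("routing", 12_800),
    "Q201": ("routing", 6_400),
    "Q202": ("routing", 3_200),
    "Q200L": ("switching", 12_800),
    "Q201L": ("switching", 6_400),
    "Q202L": ("switching", 3_200),
    "G100": ("switching", 25_600),
    "Q211": ("routing", 8_000),
    "Q211L": ("switching", 8_000),
}


def max_ports(chip: str, port_gbps: int = 400) -> int:
    """Upper bound on full-rate ports: aggregate capacity divided by port speed."""
    _, capacity_gbps = SILICON_ONE_GBPS[chip]
    return capacity_gbps // port_gbps


if __name__ == "__main__":
    for chip, (role, gbps) in SILICON_ONE_GBPS.items():
        print(f"{chip:6s} {role:9s} {gbps / 1000:5.1f} Tb/sec  up to {max_ports(chip)} x 400G")
    print(f"{len(SILICON_ONE_GBPS)} devices in total")
```

The Q200 entry works out to the 32 ports at 400 Gb/sec mentioned above, and the G100 doubles that at the same port speed.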

Add it all up, and Chopra says Silicon One has pretty good coverage in the datacenter across switching and routing, across service providers and webscalers, and across different form factors:

It will be interesting to see what the hyperscalers and cloud builders do. Some of them have spent a decade getting off of Cisco iron. Now, they have a chance to move their software to Cisco chips and unify their ASICs across the entire span of the datacenter and across them, or at the very least play Cisco off against Broadcom and the other upstarts.
