Making Mainstream Ethernet Switches More Malleable
August 4, 2017 Timothy Prickett Morgan
While the hyperscalers of the world are pushing the bandwidth envelope and are rolling out 100 Gb/sec gear in their Ethernet switch fabrics and looking ahead to the not-too-distant future when 200 Gb/sec and even 400 Gb/sec will be available, enterprise customers who make up the majority of switch revenues are still using much slower networks, usually 10 Gb/sec and sometimes even 1 Gb/sec, and 100 Gb/sec seems like a pretty big leap.
That is why Broadcom, which still has the lion’s share of switch ASIC sales in the datacenter, has revved its long-running Trident family of chips, which led the move to 10 Gb/sec over the past decade, while at the same time pushing these ASICs to support higher 50 Gb/sec and 100 Gb/sec speeds and the full programmability that many network administrators demand in their gear these days.
Broadcom has three different families of chips it makes for the switches that other companies build. The Tomahawk line is the direct result of the influences of the hyperscalers and the 25G standard they created a few years back to meet the bandwidth and port count needs of their massive Clos fabrics that span entire datacenters with 100,000 or more servers. The second generation of Tomahawk chips came out last October, offering 64 ports running at 100 Gb/sec, double that of the original Tomahawks that launched in early 2014. The Tomahawks are intended to offer the most bandwidth on a chip, with a more sparse protocol set but the best price/performance and the best performance per watt – exactly what a hyperscaler that wants minimalist iron requires.
The Jericho line of switch chips, which came to the company by virtue of its $178 million acquisition of Dune Networks in 2009 and which was last updated back in December 2016, has a different architecture with very deep packet buffers and expandable routing tables as well as a modular interconnect that makes it suited for carrier-grade networks that need monster devices. These are the most costly chips, on a per port basis, that Broadcom offers.
The Trident family of chips launched seven years ago, and they were the flagship datacenter product until the company forked its line with the introduction of the Tomahawk line for the hyperscalers and cloud builders. Now, the story is that the Trident family, which is being extended yet again, addresses the broader needs of enterprises that have to support a wider variety of protocols than the hyperscalers and cloud builders and who also want more programmability than is offered with the other Broadcom ASICs.
The original Trident chip came out in 2010 and offered 64 ports at 10 Gb/sec, and helped upstarts like Arista Networks take on the incumbents Cisco Systems, Juniper Networks, Hewlett Packard/3Com, and Dell/Force 10 just at the same time as the hyperscalers – Google, Amazon, Microsoft, Facebook, Baidu, Tencent, Alibaba, and China Mobile – were starting massive buildouts of their clouds. The Trident-2 was implemented in the same 40 nanometer processes as the first Trident, but doubled up the bandwidth so that it could support 32 ports running at 40 Gb/sec. The Trident-2+ chips, which came out in 2015, had a process shrink to 28 nanometers, which allowed them to run a little faster and a lot cooler and also to be geared for 100 Gb/sec switches as well as for 10 Gb/sec and 40 Gb/sec devices. Rochan Sankar, senior director of core switch group at Broadcom, tells The Next Platform that the Trident class of chips have shipped over 100 million ports combined across their OEM and ODM customers, and by our math that works out to well over 3 million switches – a very large installed base for a switch ASIC line.
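The installed-base estimate is simple division of ports shipped by ports per switch. Here is a rough sketch of that arithmetic, assuming an average of 32 ports per switch. That average is a figure picked here purely for illustration; the real mix of port counts across Trident-based switches is not given.

```python
# Back-of-envelope check of the installed-base estimate.
# PORTS_PER_SWITCH is an assumed average, not a figure from Broadcom.
TOTAL_PORTS_SHIPPED = 100_000_000  # "over 100 million ports", per Sankar
PORTS_PER_SWITCH = 32              # hypothetical average port count


def estimated_switches(total_ports: int, ports_each: int) -> int:
    """Return the whole number of switches implied by a port count."""
    return total_ports // ports_each


switches = estimated_switches(TOTAL_PORTS_SHIPPED, PORTS_PER_SWITCH)
print(f"{switches:,} switches")
```

Under that assumption the result is 3,125,000 switches, which squares with "well over 3 million." A higher average port count per switch would pull the estimate down, and a lower one would push it up.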
With the new Trident-3, Broadcom is trying to cover a lot more bases, which is necessary because, as Sankar explains, the networking market is increasingly sub-segmenting itself. This, as we know, is also happening with compute, with new devices being added to the mix to do very specific kinds of computation rather than relying so much on a general purpose CPU and software to implement certain functions.
To be specific, the Trident-3 was designed with a number of tight engineering constraints. First, it had to offer low latency – on the order of several hundreds of nanoseconds of latency for a port hop, according to Sankar. It also had to be affordable at 10 Gb/sec, 25 Gb/sec, and 100 Gb/sec speeds if enterprises would ever be encouraged to upgrade their switch gear. (Some enterprises might use Tomahawk devices, but they do not have the same broad protocol support as the Trident family.) The chips also had to be fully programmable – something all of the network upstarts are offering these days so new protocols can be quickly added without having to wait for a new ASIC and swap out the switch. VXLAN virtual networking took way too long to come to market because it was etched in the chips, and customers don’t want to wait for the next innovation for years as they had to for VXLAN. For its own purposes as well as for those of its customers, the Trident-3 also had to span a much wider range of throughputs and port counts than the prior Trident chips for a larger number of use cases. Take a look at the family here:
The original Trident chip could handle 320 Gb/sec of aggregate switch bandwidth, and this was stepped up to 640 Gb/sec with the Trident+ chip, to 720 Gb/sec with the Trident-2 chip, and 1.28 Tb/sec with the Trident-2+ chip. With the shrink to 16 nanometer processes with the Trident-3, Broadcom is able to boost the top-end performance to 3.2 Tb/sec for datacenter-class spine switches (half that of the Tomahawk-2), but it can also scale it down lower to a minuscule 200 Gb/sec that is suitable for WiFi access points and campus switches with 1 Gb/sec or 2.5 Gb/sec ports and 25 Gb/sec uplinks. It is reasonable to expect that Tomahawk-2+ is in development and will double up to 12.8 Tb/sec in aggregate switching bandwidth and that Trident-3+ will double up to 6.4 Tb/sec, but Sankar is making no such promises.
The aim with Trident-3 is to provide networking for enterprises (with standalone apps as well as those running on private clouds) that need broader Ethernet protocol support than the hyperscalers have as well as scaling down to the service provider edge where an increasing amount of compute is also getting done. These enterprises don’t run at the same upgrade cadence – about every two years – as the hyperscalers, and they want programmability so they can add new protocols as they emerge without having to swap out their gear as VXLAN, NVGRE, and other protocols required, and they do not want to pay a latency penalty for these changes to the protocols, either, as often happens with programmable switches. Sankar says that Broadcom is able to add programmability without sacrificing low latency, and that the Trident-3 has about the same latency as the Trident-2+ on similar workloads.
Add it all up, and ironically, switches using the Trident-3 ASIC could be in the field for many years – much longer than the five years a typical switch sits in an enterprise datacenter. This could slow down Broadcom’s sales because in the past new protocols, driven by enterprise customers, required new chips. Broadcom is going to try to make it up in volume, and frankly has little choice but to do so since other networking ASIC suppliers are adding programmability. This is the way of the networking world, and everyone has to adjust.
To make it up in volume, Broadcom is taking on the enterprise networking job, end to end:
So, now instead of focusing just on top of rack, spine, and converged core switches, Broadcom’s partners can make devices all based on Trident-3 that scale down to the distribution and wiring closet parts of the LAN as well as the datacenter core.
Broadcom is rolling out the Trident-3 from the top down, starting with the X5 and X7 devices, which are sampling now. The X7 chip has 128 SERDES running at 25 Gb/sec and can drive 32 ports running at 100 Gb/sec, while the X5 drops down to 80 SERDES and can drive 48 ports running at 25 Gb/sec plus eight 100 Gb/sec uplinks. At 32 MB, it has twice the buffer memory of the Trident-2+, which weighed in at 16 MB, and considerably more than the 12 MB of the Trident-2.
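The SERDES counts and port configurations quoted for the X7 and X5 can be cross-checked with a little arithmetic. The sketch below uses only the lane counts, lane speed, and port mixes given above; the function names are made up for the example.

```python
# Cross-check: do the SERDES lanes add up to the quoted port configurations?
LANE_GBPS = 25  # each SERDES lane runs at 25 Gb/sec


def aggregate_gbps(lanes: int, lane_gbps: int = LANE_GBPS) -> int:
    """Total bandwidth available across all SERDES lanes."""
    return lanes * lane_gbps


def port_mix_gbps(port_mix: dict[int, int]) -> int:
    """Bandwidth consumed by a port mix given as {speed_gbps: port_count}."""
    return sum(speed * count for speed, count in port_mix.items())


x7_total = aggregate_gbps(128)               # X7: 128 lanes
x7_ports = port_mix_gbps({100: 32})          # 32 ports at 100 Gb/sec
x5_total = aggregate_gbps(80)                # X5: 80 lanes
x5_ports = port_mix_gbps({25: 48, 100: 8})   # 48 x 25G plus 8 x 100G uplinks

print(f"X7: {x7_total} Gb/sec of lanes, {x7_ports} Gb/sec of ports")
print(f"X5: {x5_total} Gb/sec of lanes, {x5_ports} Gb/sec of ports")
```

Both configurations use every lane: the X7 comes to 3,200 Gb/sec (the 3.2 Tb/sec top-end figure) and the X5 to 2,000 Gb/sec, with each 100 Gb/sec port consuming four 25 Gb/sec lanes.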
Delta Group’s DNI networking division and Quanta Cloud Technology, both big ODMs, are going to be launching bare metal switches with the ONIE Linux network operating system loader but not including that NOS or any of the optics for the ports for a cost of under $3,000 for 32 ports running at 100 Gb/sec, or about $94 per port. All of the software and optics will raise the price, of course, but this is pretty inexpensive for such bandwidth.
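The per-port figure follows directly from the quoted price ceiling. A minimal sketch, treating $3,000 as the upper bound since the switches are priced "under" that amount:

```python
# Per-port cost of a bare metal 32-port 100 Gb/sec switch.
# The price is a ceiling, so the real per-port cost would be somewhat lower.
SWITCH_PRICE_USD = 3000  # "under $3,000", without NOS or optics
PORTS = 32

cost_per_port = SWITCH_PRICE_USD / PORTS
cost_per_gbps = cost_per_port / 100  # each port runs at 100 Gb/sec

print(f"${cost_per_port:.2f} per port")
print(f"${cost_per_gbps:.4f} per Gb/sec")
```

That works out to $93.75 per port, which rounds to the roughly $94 cited, or a bit under a dollar per Gb/sec of port bandwidth before software and optics are added.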
Perhaps equally importantly, because the power consumption of the networking gear is becoming an issue in the datacenter as bandwidths rise and power draw grows exponentially as bandwidth grows linearly, switches using the Trident-3 will sip juice rather than guzzle it like the prior Tridents did, and will even beat the Tomahawks by a smidgen, too. Sankar says that current modeling puts the power draw of 32 ports using QSFP28 optics at under 400 watts. Switches using the original Trident ASICs were about 2.5X that, and those using Trident-2+ were about 75 percent higher and Tomahawk was about 25 percent higher.
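One way to read those ratios is to apply them to the 400 watt ceiling. The sketch below does that, assuming the comparisons are normalized to an equivalent 32-port, 100 Gb/sec configuration. That normalization is an assumption, since the basis of comparison is not spelled out, so the outputs are implied equivalents and not the measured draw of any shipping switch.

```python
# Implied power draw for an equivalent configuration, relative to Trident-3.
# Multipliers come from the article; the normalization is an assumption.
TRIDENT3_WATTS = 400  # ceiling from Broadcom's current modeling

RELATIVE_DRAW = {
    "Trident (original)": 2.5,   # "about 2.5X"
    "Trident-2+": 1.75,          # "about 75 percent higher"
    "Tomahawk": 1.25,            # "about 25 percent higher"
}


def implied_watts(baseline_watts: float, multiplier: float) -> float:
    """Scale the Trident-3 baseline by a relative power multiplier."""
    return baseline_watts * multiplier


implied = {name: implied_watts(TRIDENT3_WATTS, m) for name, m in RELATIVE_DRAW.items()}
for name, watts in implied.items():
    print(f"{name}: about {watts:.0f} watts")
```

Read this way, the same port configuration would have needed on the order of 1,000 watts with original Trident silicon, 700 watts with Trident-2+, and 500 watts with Tomahawk.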
Like other switch ASIC makers, Broadcom is not supporting the P4 networking programming language created by Barefoot Networks. The Trident-3 chip is built on a set of programmable engines for packet parsing; programmable lookups, databases, and actions; and packet editing, and these are stitched together into a programmable datapath with a programmable metadata layer that drives the pipeline from the ingress of data to the egress.
Each block is programmed, but perhaps most importantly, these blocks are already coded to have full backwards compatibility with prior Trident chips while at the same time allowing for future proofing as new encapsulation schemes, load balancing approaches, new forwarding paradigms, and such come down the pike.
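To make the idea of a programmable datapath concrete, here is a toy software model of a parse, lookup, and edit pipeline in which each stage reads and writes shared metadata. It is only a conceptual sketch: it is not the Trident SDK, and every name in it (`PacketContext`, `FORWARDING_TABLE`, the stage functions) is hypothetical.

```python
# Toy model of a programmable pipeline: parse -> lookup/action -> edit.
# Illustrative only; real switch ASICs do this in hardware, not in Python.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PacketContext:
    raw: bytes                                     # packet bytes on the wire
    headers: dict = field(default_factory=dict)    # fields extracted by the parser
    metadata: dict = field(default_factory=dict)   # state carried between stages


Stage = Callable[[PacketContext], PacketContext]

# Hypothetical MAC-to-port table standing in for a lookup database.
FORWARDING_TABLE = {"aa:bb:cc:dd:ee:01": 7}


def parse_ethernet(ctx: PacketContext) -> PacketContext:
    """Parser stage: pull the Ethernet header fields out of the raw bytes."""
    ctx.headers["dst_mac"] = ctx.raw[0:6].hex(":")
    ctx.headers["src_mac"] = ctx.raw[6:12].hex(":")
    ctx.headers["ethertype"] = int.from_bytes(ctx.raw[12:14], "big")
    return ctx


def lookup_egress_port(ctx: PacketContext) -> PacketContext:
    """Lookup/action stage: choose an egress port; -1 means unknown destination."""
    ctx.metadata["egress_port"] = FORWARDING_TABLE.get(ctx.headers["dst_mac"], -1)
    return ctx


def push_vlan_tag(ctx: PacketContext, vlan_id: int = 100) -> PacketContext:
    """Editor stage: insert an 802.1Q tag after the source MAC address."""
    tag = b"\x81\x00" + vlan_id.to_bytes(2, "big")
    ctx.raw = ctx.raw[:12] + tag + ctx.raw[12:]
    return ctx


def run_pipeline(ctx: PacketContext, stages: list[Stage]) -> PacketContext:
    """Run the packet through each stage in order, ingress to egress."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx


frame = bytes.fromhex("aabbccddee01" "112233445566" "0800") + b"payload"
result = run_pipeline(
    PacketContext(raw=frame),
    [parse_ethernet, lookup_egress_port, push_vlan_tag],
)
print(result.metadata["egress_port"], result.raw[12:16].hex())
```

The point of the structure is that supporting a new encapsulation means swapping in or adding a stage function while `run_pipeline` stays untouched, which is loosely analogous to loading a new pipeline program onto a chip instead of waiting for a new ASIC.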
It is natural to ask why programmability is so important, so we did, and here is what Broadcom thinks:
“We see that the programmable pipeline is really geared for intercepting a few categories of innovation that we see rapidly evolving in the networking industry,” Sankar explains. “The first is in network instrumentation, whether that is new mechanisms such as in-band telemetry through active probes or other standardization efforts. The ability to build histograms on flows that flux through the switch via streaming flow tracking and the ability to detect anomalous events in the network such as microbursts, and to characterize packet and flow latencies and packet drop rates – all of this is key. Customers want to diagnose, troubleshoot, and manage their networks more effectively. The second category is new overlays and formats, and we have seen new protocols emerge and build momentum, such as service function chaining or VXLAN or Geneve or MPLS over various transport protocols. Forwarding schema is the last category, we have different potential forwarding paradigms, such as segment routing or source routing or policy-based routing, and these can be intercepted using the programmability features in the Trident-3.”
So the other obvious question is who is going to be doing all of this programming on the switch, which is heady stuff indeed. Broadcom will do the data plane part for all but the biggest and most ambitious customers, who are not afraid of the Trident SDK or getting into the guts of things. Most customers will want to leverage turnkey images for specific functions that can be upgraded over time. Exactly how this will be monetized remains to be seen, but you can bet that both Broadcom and its switch partners would love to have a recurring revenue stream and treat this like SaaS software, not hardware.
The one thing that Broadcom is not talking about as the Trident-3 launches is how it will take on rival switch chip makers Mellanox Technologies, with its Spectrum-2 chips, and Innovium, with its Teralynx chips, who are showing off Ethernet devices that range up to 200 Gb/sec and have a relatively quick path to 400 Gb/sec. We expect to see some 200 Gb/sec Tomahawk-2+ unveiling before too long so Broadcom can stay with the pack as it chases bigger bandwidth.