New Dune Chips Enable Heftier Switches
March 19, 2015 Timothy Prickett Morgan
One size does not fit all when it comes to merchant silicon chips that are used to build network devices for enterprise, hyperscale, and service provider customers.
Broadcom is more or less the merchant ASIC supplier of choice for the nascent open switch movement for datacenter switches, although Intel and Mellanox Technologies are active in this area as well. But the existing Trident+ and Trident-II ASICs and the future Tomahawk ASICs from Broadcom do not cover all of the use cases in high-scale, high-bandwidth networking. And so Broadcom is launching a line of switch ASICs, called StrataDNX, that sacrifice a bit on the bandwidth but offer deeper buffers, more packet processing, and expandability beyond 100 Tb/sec of bandwidth in a single switch.
In many of the areas that Broadcom is targeting with the StrataDNX line, device makers are using custom silicon that allows them to command a pretty hefty premium compared to the ASICs they use in fixed port, top of rack devices. Broadcom has about 65 percent market share for top of rack switch ASICs these days, and even companies that make their own ASICs, such as Hewlett-Packard and Cisco Systems, have opted for Broadcom ASICs in at least some of their top of rack switches. (In Cisco’s high-end Nexus 9000 line, the company uses Broadcom ASICs for the switching but adds in its own chips to implement the APIC protocol for application and network configuration management on top of that.)
The Trident-II ASIC delivers 1.28 Tb/sec of switching bandwidth and its follow-on Tomahawk ASIC, which debuted last year, boosts that by a factor of 2.5 to 3.2 Tb/sec. As big as that jump is, the Trident and Tomahawk ASICs have fixed-size packet buffers and fixed-size tables on the chip, which can be limiting, and they cannot be ganged up to create modular and aggregation switches.
Collectively, the StrataDNX chips are known by the code-name “Dune” internally at Broadcom. The company’s existing “Arad” switch ASIC and related “FE” fabric Ethernet chip in the Dune family are used to create modular switches and the high-scale fabrics that link line cards together within the chassis. The resulting machines not only sport more ports, but also have deeper buffers, larger tables, and quality of service features that allow capacity to be allocated on a per-subscriber basis across the switching capacity encapsulated within the chassis. The deeper buffers are key for networks with heavy congestion (which happens with cloud and hyperscale workloads) and larger tables are key for networks with lots of devices hanging off them.
The new StrataDNX chips launched this week are follow-ons to the Arad and FE devices, and Nick Kucharewski, vice president of product marketing at Broadcom, tells The Next Platform that they will not only allow for heftier modular switches to be created, but will also allow Broadcom to better attack the broader Ethernet market.
“Where it becomes relevant is when we are talking about network function virtualization,” explains Kucharewski. “We are bringing a datacenter compute model to service provider challenges. One of the attributes of this product is that it can work within an NFV-ready transport network on the service provider or carrier pipes or on the datacenter pipes because it has the service provider features that are needed as well as the bandwidth scale required by datacenters. So this particular chipset opens up some interesting opportunities for the manufacturing of equipment that combines some of the best attributes of carrier and datacenter gear.”
The off-chip expandability, such as dumping packet buffers or configuration tables off to DDR4 or GDDR5 memory on switch linecards, will allow switch makers to stop using the specialized network processors they have designed and fabbed in the past. As an example, the new StrataDNX in a single chip implementation would allow a top-of-rack switch with 48 ports running at 10 Gb/sec plus a few uplinks running at 40 Gb/sec or 100 Gb/sec, but the difference is that this switch would have gigabytes of packet buffers off the chip, and its forwarding tables could be scaled to millions of routes and MAC addresses. (The StrataXGS chips used in fixed top-of-rackers have a few megabytes of packet buffer and table capacity.) These deep buffers are particularly important on networks with mixed speeds of Ethernet running across the network fabric, because the speed mismatch would otherwise cause congestion on parts of the network.
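To put some rough numbers on the speed mismatch point, here is a back-of-the-envelope sketch in Python of how long a buffer can absorb a burst arriving from a 100 Gb/sec link and draining out a 10 Gb/sec port. The buffer sizes are assumptions picked only to stand in for "a few megabytes" and "gigabytes" and are not Broadcom specifications; the function name is hypothetical as well.

```python
def burst_absorb_seconds(buffer_bytes, ingress_gbps, egress_gbps):
    """Seconds a buffer can soak up traffic arriving faster than it drains."""
    excess_bits_per_sec = (ingress_gbps - egress_gbps) * 10**9
    if excess_bits_per_sec <= 0:
        return float("inf")  # egress keeps up, so the queue never grows
    return buffer_bytes * 8 / excess_bits_per_sec


# Illustrative sizes only (assumptions, not Broadcom specifications).
ON_CHIP_BYTES = 9 * 10**6    # stands in for "a few megabytes" on the ASIC
OFF_CHIP_BYTES = 4 * 10**9   # stands in for "gigabytes" of external memory

for label, size in (("on-chip", ON_CHIP_BYTES), ("off-chip", OFF_CHIP_BYTES)):
    ms = burst_absorb_seconds(size, ingress_gbps=100, egress_gbps=10) * 1e3
    print(f"{label}: about {ms:.1f} ms of sustained 100-to-10 Gb/sec burst")
```

Under these assumed sizes, the on-chip buffer fills in under a millisecond while the off-chip buffer lasts hundreds of milliseconds. Real switches share buffers across many ports and queues, so treat this as an order-of-magnitude illustration only.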
Broadcom will be arguing that the StrataDNX devices offer about the same level of expandability as these custom network processors, but with about four times the bandwidth. This extra oomph is what will allow the StrataDNX chips to be used in markets that are adjacent to the core datacenter switching where the Trident family of chips has done so well over the years, particularly during the 10 Gb/sec rollout.
The Dune family of chips was already popular for spine, end-of-rack, and core switches in the datacenter, for edge and core routers in carrier networks, for optical transport networks linking datacenters to each other, and for campus core switches. In addition to the top-of-rack example with deep buffers and tables cited above, the new Dune family of StrataDNX chips will likely see use in what Broadcom calls “megascale” datacenter core chassis switches as well as in compact carrier Ethernet aggregation switches, packet aggregation switches, and enterprise campus aggregation switches.
That top-of-rack switch with deep buffers mentioned above would be built using the QumranMX chip, which has the product number BCM88370 in the Broadcom catalog. This chip has full support for Layer 2 and Layer 3 protocols as well as support for MAC network interfaces running at 1 Gb/sec, 10 Gb/sec, 40 Gb/sec, and 100 Gb/sec. It supports four or six uplinks running at 40 Gb/sec or 100 Gb/sec and up to 48 downlinks running at 10 Gb/sec across its 800 Gb/sec of aggregate full duplex switching bandwidth. Support for 25 Gb/sec and 50 Gb/sec interfaces is also baked in for the hyperscale and cloud players that are pushing for these network speeds in their datacenters.
Scaling to 6,000 100 Gb/sec Ports
The other two chips in the updated StrataDNX family are called “Jericho” and “FE3600” and they are used in the scalable switches. The one that Broadcom has its eye on in particular is what it calls a hyperscale cloud chassis. Such a machine will use the Jericho chip, which is BCM88670 in the Broadcom catalog, in line cards and the FE3600 chip in the fabric modules. The Jericho chips peak out at 720 Gb/sec of switching bandwidth, and you can link up to four of them together without the fabric chip. For larger configurations, you can lash up to 144 of them into a single fabric for a wonking chassis, which would have 103.7 Tb/sec of aggregate bandwidth in a one-tier fabric topology and deliver at least 6,000 ports running at 100 Gb/sec. The FE3600 fabric chip has 10 GHz SERDES for 10 Gb/sec and 40 Gb/sec ports and 25 GHz SERDES for 25 Gb/sec, 50 Gb/sec, and 100 Gb/sec ports. The latter two will be important for aggregating capacity coming off top-of-rack switches that use the Tomahawk StrataXGS ASICs, which will link servers with 25 Gb/sec server ports to the Ethernet fabric.
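The aggregate bandwidth figure follows directly from the per-chip number, as this small Python sketch shows. It uses only the figures quoted above (720 Gb/sec per Jericho, up to four chips linked directly, up to 144 in a one-tier fabric); the function and constant names are hypothetical, and the sketch does not attempt to derive the 6,000-port figure, which is not broken down here.

```python
JERICHO_GBPS = 720        # peak switching bandwidth per Jericho chip
MAX_DIRECT_CHIPS = 4      # chips that can be linked without a fabric chip
MAX_FABRIC_CHIPS = 144    # chips in a single one-tier fabric


def aggregate_tbps(n_chips, per_chip_gbps=JERICHO_GBPS):
    """Aggregate switching bandwidth in Tb/sec for n_chips Jericho devices."""
    if not 1 <= n_chips <= MAX_FABRIC_CHIPS:
        raise ValueError(f"n_chips must be between 1 and {MAX_FABRIC_CHIPS}")
    return n_chips * per_chip_gbps / 1000


def needs_fabric_chip(n_chips):
    """True when the configuration is too large to link chips directly."""
    return n_chips > MAX_DIRECT_CHIPS


for n in (1, MAX_DIRECT_CHIPS, MAX_FABRIC_CHIPS):
    fabric = "with FE3600 fabric" if needs_fabric_chip(n) else "no fabric chip"
    print(f"{n:>3} chips: {aggregate_tbps(n):.2f} Tb/sec ({fabric})")
```

The 144-chip case works out to 103.68 Tb/sec, which rounds to the 103.7 Tb/sec cited for the full chassis.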
One interesting potential use case for the StrataDNX chips is for WAN traversal, which Kucharewski says is commonly done by an edge router these days, with MPLS or Layer 3 protocols used to tunnel between datacenters. “We have the qualities of the datacenter core equipment and the Ethernet feature set that is required for the wide area network. So there is potential for a new kind of device that does WAN traversal on the same equipment that you are doing for the datacenter core. This converges the edge router into the datacenter core and also provides higher bandwidth because you are no longer using separate equipment.” Thus far, no one has implemented such a design, says Kucharewski, but the three new Dune family chips are only just sampling now. It generally takes about a year to go from first sampling to finished products using a Broadcom ASIC, so we should expect to see the top of rack, hyperscale chassis, and maybe this converged edge router-datacenter core machine sometime in early 2016.