Hyperscalers Ready To Run Barefoot In The Datacenter
January 30, 2017 Timothy Prickett Morgan
Breaking into the switch market is not an easy task, whether you are talking about providing whole switches or just the chips that drive them. But there is always room for innovation, and that is why some of the upstarts have a pretty credible chance to shake up networking, the last bastion of proprietary technology in the datacenter.
Barefoot Networks is one of the up-and-coming switch chip makers, with its “Tofino” family of ASICs, which, among other things, have circuits and software that allow the data plane (the part of the device that controls how data moves through the switch) to be malleable and programmable rather than static, as it has been since the dawn of networking. This malleability is something that hyperscalers, their cloud building peers, and perhaps someday large enterprises and HPC centers will want as they run more and more diverse applications on their networks.
This programmability is not a new idea. Switches and their router cousins have been programmable for a long time, and the control plane, the part of the network that builds the forwarding tables and decides how devices are linked to each other, has famously been made programmable through protocols such as OpenFlow. (There are other approaches, of course.) What is interesting about Barefoot, and about the P4 programming language that it is championing along with researchers at Intel, Google, Microsoft, Stanford University, and Princeton University, is that P4 allows load balancers and firewalls, as well as pieces of databases and middleware, to be more network aware by integrating them directly into the data plane.
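To make the division of labor between the two planes concrete, here is a minimal Python sketch of a match-action table, the abstraction that P4 programs are built from. It is a toy model under our own assumptions: `LpmTable`, `forward`, and `drop` are hypothetical names, not P4 syntax or any real switch SDK. The point is who decides what: the data plane program declares which tables exist and which header fields they match on, while the control plane (OpenFlow, a switch OS, and so on) only fills in the entries at runtime.

```python
from ipaddress import IPv4Address, IPv4Network
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model: a packet is just a dict of header fields.
Packet = Dict[str, object]
Action = Callable[[Packet], None]


def forward(port: int) -> Action:
    """Build an action that sends the packet out of `port`."""
    def _act(pkt: Packet) -> None:
        pkt["egress_port"] = port
    return _act


def drop(pkt: Packet) -> None:
    """Default action: mark the packet as dropped (egress_port = None)."""
    pkt["egress_port"] = None


class LpmTable:
    """Longest-prefix-match table on one header field (illustrative, not P4)."""

    def __init__(self, key_field: str, default_action: Action = drop) -> None:
        self.key_field = key_field            # chosen by the data plane program
        self.default_action = default_action
        self.entries: List[Tuple[IPv4Network, Action]] = []

    def add_entry(self, prefix: str, action: Action) -> None:
        """Control plane call: install a rule at runtime."""
        self.entries.append((IPv4Network(prefix), action))

    def apply(self, pkt: Packet) -> None:
        """Data plane call: match the packet and run the winning action."""
        addr = IPv4Address(str(pkt[self.key_field]))
        hits = [(net, act) for net, act in self.entries if addr in net]
        if not hits:
            self.default_action(pkt)
            return
        # The most specific (longest) matching prefix wins.
        _, action = max(hits, key=lambda hit: hit[0].prefixlen)
        action(pkt)


# Data plane program: declare a table that matches on the destination IP.
ipv4_lpm = LpmTable(key_field="dst_ip")

# Control plane: populate the table with forwarding rules.
ipv4_lpm.add_entry("10.0.0.0/8", forward(1))
ipv4_lpm.add_entry("10.1.0.0/16", forward(2))

pkt: Packet = {"dst_ip": "10.1.2.3"}
ipv4_lpm.apply(pkt)
print(pkt["egress_port"])  # 2: the /16 is the longest matching prefix
```

In a fixed-function chip, only the `add_entry` side is open to the customer; with P4, the table declarations themselves are software too.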
The Tofino chips, as we explained back in June 2016 when they were announced, implement Ethernet switching with ports running at up to 100 Gb/sec across a range of aggregate bandwidths. They have been co-designed to run P4 really well and to let customers decide, for instance, whether they want to do fast forwarding or simple lookup as the data plane protocol, which affects the cut-through latency of the switch, and to set table sizes, which also has an important bearing on latency. After years of development, the Tofino chips taped out in the third quarter of 2016 and started sampling to switch makers at the end of the year.
The Tofino chip design has 260 SERDES blocks running at 25 Gb/sec that together deliver 6.5 Tb/sec of aggregate switching bandwidth. Depending on how the communication blocks are carved up and on the yields of the SERDES units themselves, Barefoot offers a variety of SKUs with different numbers and types of ports and different levels of aggregate bandwidth, as shown in the chart above. The important thing to note is the high radix of the chip, as they say in the networking industry, which is just a fancy way of saying that you can hang a lot of ports off the device. The more ports per device, the fewer switches you need to connect any given number of server ports, and, just as importantly, the fewer switches you need as spines or aggregation switches above the top of rack level. The four Tofino SKUs offer a good balance of port counts, port speeds, and aggregate switching bandwidth, as you can see.
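The bandwidth and radix arguments come down to simple arithmetic, sketched here in Python. The function names are ours, and the lane math is only an upper bound: real SKUs are also constrained by MAC counts, packaging, and yields. We assume the common convention that a 100 Gb/sec port uses four 25 Gb/sec lanes, and that a non-blocking two-tier leaf/spine fabric splits each leaf's ports evenly between servers and spine uplinks.

```python
# Back-of-envelope switch-chip arithmetic. Function names are hypothetical;
# real SKUs are also limited by MAC counts and packaging, not just lane math.

def aggregate_gbps(serdes_lanes: int, lane_gbps: int) -> int:
    """Aggregate switching bandwidth with every SERDES lane at full speed."""
    return serdes_lanes * lane_gbps


def ports_from_lanes(serdes_lanes: int, lanes_per_port: int) -> int:
    """How many ports of a given lane width the SERDES budget can feed."""
    return serdes_lanes // lanes_per_port


def two_tier_server_ports(radix: int) -> int:
    """Server-facing ports in a non-blocking two-tier leaf/spine fabric.

    Each leaf uses radix/2 ports for servers and radix/2 for spine uplinks.
    Each spine has `radix` ports, one per leaf, so the fabric tops out at
    `radix` leaves (with radix/2 spines).
    """
    leaves = radix
    return leaves * (radix // 2)


print(aggregate_gbps(260, 25))      # 6500 Gb/sec, or 6.5 Tb/sec
print(ports_from_lanes(260, 4))     # 65 ports at 100 Gb/sec (4 x 25 Gb/sec lanes)
print(two_tier_server_ports(32))    # 512
print(two_tier_server_ports(64))    # 2048
```

Because the server port count grows as radix²/2, doubling the radix from 32 to 64 ports quadruples the servers a two-tier fabric can reach, which is why high radix matters so much to hyperscalers.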
All of this engineering is only useful if switch makers step up and actually commit to building devices, and the big news is that two of them have. Edge-Core Networks, a division of whitebox switch maker Accton, and Wistron NeWeb Corporation (WNC), which is part of the Wistron original design manufacturing (ODM) behemoth, will build switches based on the Tofino chips alongside the switches they already make using the “Trident” and “Tomahawk” series of ASICs from Broadcom, the dominant supplier of chips for datacenter switching.
None of the companies that have invested in Barefoot has committed publicly to using the Tofino chips in their own datacenters, but the implication is that they are very keen on the idea. With Edge-Core and WNC committing to make switches based on these upstart chips, there will soon be whitebox suppliers who can sell devices to the hyperscalers, cloud builders, and large enterprises of the world that want open, programmable switching.
Barefoot raised $1.35 million in seed money back in December 2013 to get started, and, seeing promise in the idea and in the growing popularity of the P4 programming language, Lightspeed Venture Partners and Sequoia Capital kicked in $24 million in Series A funds the following May. In June, three unnamed partners kicked in dough (our guess is that they were hyperscalers of some sort), and in June 2016, when Barefoot dropped out of stealth and told people about the Tofino chips, Google and Goldman Sachs (which is big into Open Compute) ponied up another $57 million in Series C funding. That Series C round was extended in November last year, when Chinese hyperscalers Alibaba and Tencent kicked in another $23 million to Barefoot to continue development and the product ramp. It is no coincidence at all that Edge-Core and WNC are signing up to manufacture switches based on the Tofino chips. The hyperscalers backing Barefoot are exactly the potential customers: they put thousands of switches in each datacenter, and they have hundreds of datacenters across dozens of regions. But, as we say, no one has made a formal commitment to deploy Tofino chips in their datacenter switches.
The two Edge-Core switches based on the Tofino chips are basically variants of the “Wedge 100” switches that the company already makes, except that Edge-Core is replacing the Broadcom Tomahawk ASICs used in the original Wedges with Tofino chips. The Wedge 100 switch was designed by Facebook and is being deployed internally at the social network, but that doesn’t mean Facebook is endorsing the Tofino chips or buying the Edge-Core variants of the Wedge, which have the same power supplies, fans, and management modules as the original Wedges that Facebook does in fact deploy. The key point is that should Facebook want Wedge switches based on Tofino ASICs, it can get them from Edge-Core. Two Wedge variants, which we have detailed previously, are available from Edge-Core: one with 32 ports running at 100 Gb/sec and another with 64 ports running at 100 Gb/sec.
WNC has two new switches based on the Tofino chips. The OWS1800 switch, which will be available in the first quarter of this year, has six uplinks running at 100 Gb/sec and 48 downlinks that can run at 25 Gb/sec or 10 Gb/sec. The OWS6500 has a straight 65 ports running at 100 Gb/sec crammed into 2U of rack space, and it will be available in the second quarter.
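A quick way to read a top-of-rack configuration like the OWS1800 is its oversubscription ratio: server-facing bandwidth divided by uplink bandwidth. The Python sketch below uses the port counts given above; the function name is ours, and the calculation ignores traffic locality, so it describes worst-case contention only.

```python
from fractions import Fraction


def oversubscription(down_ports: int, down_gbps: int,
                     up_ports: int, up_gbps: int) -> Fraction:
    """Downlink (server-facing) bandwidth divided by uplink bandwidth."""
    return Fraction(down_ports * down_gbps, up_ports * up_gbps)


# OWS1800 port counts as described above: 48 downlinks, six 100 Gb/sec uplinks.
for down_speed in (25, 10):
    ratio = oversubscription(48, down_speed, 6, 100)
    print(f"48 x {down_speed} Gb/sec down vs 6 x 100 Gb/sec up: {float(ratio):.1f}:1")
```

At 25 Gb/sec, the downlinks can offer 1.2 Tb/sec against 600 Gb/sec of uplink, a 2:1 ratio; at 10 Gb/sec, they offer 480 Gb/sec and the uplinks have headroom. The OWS6500's 65 ports at 100 Gb/sec, by contrast, add up to the full 6.5 Tb/sec of the chip.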
The big difference is not feeds and speeds, or even price. It comes down to that programmable data plane, Ed Doe, vice president of products and strategy at Barefoot, explains to The Next Platform. “There are many chips out there from multiple vendors, and all of them have a fixed, hard-coded data plane. So you can’t update and change that data plane and have it do what you want it to do and you can’t upgrade it over time. As a new protocol comes along, such as an extension to VXLAN or a new forwarding protocol or some extension tags, you have to change the chip in these other devices and that takes years. In the world of disaggregation, you can decouple the switch OS from the hardware, and now you can also decide the nature of the forwarding plane on the switch chip, too. From a speeds and feeds perspective, the ASICs are similar, and the power draw, the performance, and the price points are roughly the same. You get the same performance and power density, but now you get programmability, too.”
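Doe's example of a VXLAN extension shows what a hard-coded data plane means in practice: the chip's parser has to recognize every header it will ever act on. The Python sketch below packs and parses the 8-byte VXLAN header defined in RFC 7348 (an I flag, a 24-bit VXLAN Network Identifier, and reserved bits, carried over UDP port 4789). The parser registry keyed on UDP destination port is a hypothetical stand-in for a programmable parser. It is not how P4 expresses parsers (P4 uses parser state machines), and it is not Barefoot's software.

```python
import struct
from typing import Callable, Dict

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN port (RFC 7348)

Parser = Callable[[bytes], Dict[str, int]]


def pack_vxlan(vni: int) -> bytes:
    """Build an 8-byte VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: I flag (0x08) in the top byte, remaining bits reserved (zero).
    # Word 2: VNI in the top 24 bits, low 8 bits reserved (zero).
    return struct.pack("!II", 0x08 << 24, vni << 8)


def parse_vxlan(payload: bytes) -> Dict[str, int]:
    """Extract the VNI from the first 8 bytes of a UDP payload."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & 0x08:
        raise ValueError("VXLAN I flag not set")
    return {"vni": vni_word >> 8}


# Hypothetical parser table: UDP destination port -> header parser.
# In fixed-function silicon this mapping is wired into the chip; in a
# programmable pipeline, adding an entry is a software change.
udp_parsers: Dict[int, Parser] = {}


def register_parser(udp_port: int, parser: Parser) -> None:
    udp_parsers[udp_port] = parser


def parse_udp_payload(dst_port: int, payload: bytes) -> Dict[str, int]:
    """Run the registered parser, or return no fields if none is known."""
    parser = udp_parsers.get(dst_port)
    return parser(payload) if parser else {}


hdr = pack_vxlan(5000)
print(parse_udp_payload(VXLAN_UDP_PORT, hdr))  # {}: no VXLAN parser installed yet
register_parser(VXLAN_UDP_PORT, parse_vxlan)
print(parse_udp_payload(VXLAN_UDP_PORT, hdr))  # {'vni': 5000}
```

The same packet goes from opaque to understood without any change to the "hardware", which is the property Doe is describing.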
It was probably too much to hope for switches based on Tofino chips to be cheaper as well as more malleable. (Grin.) But oddly enough, companies deploying switches based on malleable chips may end up paying less over time for their switching, or at least be able to wring more time out of the devices, and time is a kind of money. The average shelf life of a switch at a hyperscaler or cloud builder is on the order of two to three years, says Doe, which squares with what we have heard anecdotally from the big cloud and hyperscale players. By contrast, the average age of a switch in the enterprise, which changes infrastructure more slowly in general and switching in particular, is more on the order of five or six years. That’s several generations of iron.
With the data plane updatable through P4 and the control plane malleable through OpenFlow and other protocols, hyperscalers might be able to get three, four, or five years out of a switch for at least part of their network infrastructure. A lot depends on how ports will be aggregated as networks require more bandwidth. Enterprises tend to build networks with far fewer devices and much lower bisection bandwidth, so they can lag a bit. It will be interesting to see how this plays out, and how quickly new protocols for software containers will emerge and be implemented. It took years for support for the VXLAN and NVGRE protocols, which were necessary to make server virtualization practical and easy, to be added to switch chips. The programmable chips from Barefoot, plus the XPliant ones from Cavium, might have an edge over Broadcom Tomahawks in the container age as these as-yet unwritten protocols emerge.
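Bisection bandwidth, the capacity across a cut that splits the network into two equal halves, is what separates a hyperscale fabric from a typical enterprise one. The Python sketch below gives the usual back-of-envelope estimate for a two-tier leaf/spine fabric. The function name is ours, the model assumes an even leaf count and spines that are never the bottleneck, and the two example fabrics are made up for comparison, not figures from any real deployment.

```python
def leaf_spine_bisection_gbps(leaves: int, uplinks_per_leaf: int,
                              uplink_gbps: int) -> int:
    """Rough bisection bandwidth of a two-tier leaf/spine fabric, in Gb/sec.

    Split the leaves into two equal halves: traffic between the halves must
    climb through the spines, so the cut is bounded by one half's uplinks.
    Assumes an even leaf count and spines that are not themselves the
    bottleneck.
    """
    if leaves % 2:
        raise ValueError("expects an even number of leaves")
    return (leaves // 2) * uplinks_per_leaf * uplink_gbps


# Made-up fabrics for comparison, not figures from any real deployment.
print(leaf_spine_bisection_gbps(64, 16, 100))  # 51200 Gb/sec
print(leaf_spine_bisection_gbps(8, 2, 40))     # 320 Gb/sec
```

A small fabric with a handful of thin uplinks has orders of magnitude less bisection bandwidth, so it feels the pressure to upgrade ports, and protocols, much later.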
With these two ODMs signing up to use Tofino chips in their devices, it won’t be long before some of the big OEMs that make switches (and usually servers, too) announce their support for Tofino, though it could be two or three quarters before we see this. Both Barefoot and those prospective OEMs are being as quiet as church mice about it, and even after the OEMs adopt Tofino, they might keep quiet. But any switch that supports P4 for data plane programming will be a pretty good indication of what’s under its metal skin.