Open source software has done a lot to transform the IT industry, but perhaps more than anything else it has reminded those who architect complex systems that all elements of a datacenter have to be equally open and programmable if they are to make the customizations needed to run specific workloads efficiently, and therefore cost effectively.
Servers have been smashed wide open in large enterprises, HPC centers, hyperscalers, and cloud builders (excepting Microsoft Azure, of course) by the double whammy of the ubiquitous X86 server and the open source Linux operating system, and storage has followed suit and has arguably become just another kind of server workload on distributed systems. Networking, which is at the heart of the datacenter, is a bit trickier to bust open, but this is gradually happening, starting with the open source network operating systems, virtual switches, and control plane software that are all part of an evolving "software defined networking" stack.
The last bit of the network stack that needs to be opened up and made programmable – what is called the data plane or the forwarding plane – is still firmly controlled by those who make network chips and operating systems. But upstart switch chip maker Barefoot Networks, which uncloaks from stealth mode today, wants to change that, and its efforts will probably smash the proprietary switch and router as we know them for good.
Unix system upstart Sun Microsystems famously said ahead of the dot-com boom two decades ago that the network is the computer, and this adage has never been more true. Networking is the glue that makes distributed applications possible, and it is often the bottleneck when it comes to squeezing the most performance out of a cluster of machines. Much of the hullabaloo about software defined networking in the past decade has focused on extracting the control plane from individual switches and routers and centralizing it on a set of servers so traffic can be shaped dynamically in real time. The control plane is like a traffic cop, using elaborate forwarding tables to direct how data packets move between devices. But the data plane, which actually moves data through switches and routers, has remained static to a large extent. You would think that the data plane could stay fairly static, but at the very high end, where hyperscalers like Google and Facebook have hundreds of thousands of devices on a big, flat Clos network in a single datacenter, being able to dynamically change both the control plane and the data plane within devices is the next big thing that needs to be made malleable within the datacenter, and some would argue the last one.
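To make the control plane/data plane split concrete, here is a toy Python sketch that is not drawn from any real switch or SDN controller. A hypothetical Controller class plays the centralized control plane and pushes routes to hypothetical ForwardingTable objects, which play the data plane and do a longest-prefix-match lookup for each packet. The controller can rewrite table entries at will, but the lookup logic itself never changes; that fixed logic is the static data plane described above.

```python
import ipaddress
from typing import Dict, Optional


class ForwardingTable:
    """Data plane (hypothetical): maps IP prefixes to egress ports."""

    def __init__(self) -> None:
        self.entries: Dict[ipaddress.IPv4Network, int] = {}

    def install(self, prefix: str, port: int) -> None:
        # Only the control plane writes entries.
        self.entries[ipaddress.ip_network(prefix)] = port

    def lookup(self, dst: str) -> Optional[int]:
        # Fixed per-packet logic: longest-prefix match, most specific route wins.
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.entries if addr in net]
        if not matches:
            return None  # no route: the packet would be dropped
        return self.entries[max(matches, key=lambda net: net.prefixlen)]


class Controller:
    """Centralized control plane (hypothetical): programs many switches at once."""

    def __init__(self, switches: Dict[str, ForwardingTable]) -> None:
        self.switches = switches

    def push_route(self, prefix: str, port_by_switch: Dict[str, int]) -> None:
        # Traffic is reshaped by rewriting entries, not by changing lookup().
        for name, port in port_by_switch.items():
            self.switches[name].install(prefix, port)


switches = {"leaf1": ForwardingTable(), "leaf2": ForwardingTable()}
controller = Controller(switches)
controller.push_route("10.0.0.0/8", {"leaf1": 1, "leaf2": 3})
controller.push_route("10.1.0.0/16", {"leaf1": 2})

print(switches["leaf1"].lookup("10.1.2.3"))     # 2: the /16 beats the /8
print(switches["leaf1"].lookup("10.9.9.9"))     # 1
print(switches["leaf2"].lookup("192.168.0.1"))  # None: no route
```

Making the data plane programmable means being able to change what lookup() does, not just what is in the table.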
Programmability is not new to either switch ASICs or the chips at the heart of network adapters, but opening that programmability to everyone through an open source, domain specific language, as Barefoot Networks has done, is new.
That new language, called P4, was developed by techies from Barefoot Networks in conjunction with researchers at Intel, Google, and Microsoft as well as at Stanford University and Princeton University. Stanford is, of course, a hotbed for networking, and it is where the OpenFlow protocols for open source control plane software were cultivated over the past decade. Nick McKeown, who is chief scientist at Barefoot Networks and one of its co-founders, is a professor at Stanford and was also one of the founders of virtual networking upstart Nicira, which VMware acquired for $1.26 billion back in the summer of 2012 to get a virtual networking companion to its ESXi server virtualization and, now, its VSAN storage virtualization.
The research on the programmable network that resulted in the formation of Barefoot Networks began as a collaboration between Stanford and Texas Instruments back in 2011. Three years later, Barefoot Networks was co-founded by McKeown; Martin Izzard, who is the CEO at the firm and was previously in charge of the research and development labs at TI; and Pat Bosshart, CTO at Barefoot Networks, whose 37-year career at TI was capped by his being named a TI Fellow.
The 80-person company has raised over $130 million in three rounds of funding: two initial rounds from Sequoia Capital, Lightspeed Venture Partners, and Andreessen Horowitz, followed just now by a third round led by Google and Goldman Sachs.
P4 is a domain specific language that Barefoot Networks helped create and that is now supported by over 40 organizations, all of whom want to make parts of their software stack – ranging from whole load balancers and firewalls to pieces of databases and middleware – more network aware by integrating them into the data plane. The seminal P4 paper on programming protocol-independent packet processors, published in 2013, outlines the project founders' vision: it takes the relatively simple abstraction of OpenFlow (which abstracts the forwarding tables away from the switches) and raises it a whole level, to a full blown programming language that can control the data plane in switches and routers.
Everyone wants such programmability, of course, but as Ed Doe, vice president of sales and marketing at Barefoot Networks, explains to The Next Platform, the issue has always been that fixed function network ASICs could run 10X to 100X faster than more generic, programmable switch and router chips, which made the programmable chips impractical from an economic standpoint. You can't buy 10X to 100X more switches for a datacenter to get the same performance, and that is why Barefoot Networks has spent years developing a programmable switch ASIC, called Tofino, that is the companion to the P4 language, much as X86 CPUs are the targets for C and C++ compilers, GPU accelerators for OpenCL and CUDA, and DSPs for C and MATLAB.
“Networking has been late to the game,” says Doe. “A lot of SDN has been limited to the control plane but the data plane has been left out of it, with the exception of Nicira and its ability to control the virtual switch. You can think of this as the next wave of networking innovation, and we are trying to move away from the overused and diluted term SDN, and we simply focus on bringing programmability down to the data plane.”
The Tofino chip implements what Barefoot Networks calls a Protocol Independent Switch Architecture, or PISA for short, and this is analogous to the advent of RISC processors in a world formerly dominated by CISC chips. With CISC chips, you have very complex instructions that are specific to a particular system and its applications, but with RISC chips, you break instructions down into simpler elements and let compilers gang them together to do the same work as a CISC instruction. Because the RISC instructions are simpler, as long as you can build deep pipelines, you can crank up the clock speeds and get more aggregate throughput from a RISC chip in a lower thermal envelope than is possible with a CISC design. It is the RISC-iness of the Tofino chip that lets software give it a personality: P4 programs can turn it into a firewall, a load balancer, or an Ethernet switch as conditions dictate.
This is a very powerful idea, indeed. Rather than take a relatively static switch ASIC and pair it with an X86 processor, you make the switch ASIC itself a much more general and programmable device, and that turns the Ethernet protocol into just another programmable data stream.
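One rough way to picture the idea: a PISA-style device is a generic pipeline of match-action stages, and the program loaded into it decides what it is. The Python sketch below is an illustrative model only, not P4 and not how Tofino is built; Stage, Pipeline, set_port, and drop are hypothetical names, packets are assumed to arrive already parsed into a dict of header fields, and each stage does only exact matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Headers = Dict[str, object]            # already-parsed header fields
Action = Callable[[Headers], None]


def no_op(h: Headers) -> None:
    """Default action on a table miss: leave the packet alone."""


def drop(h: Headers) -> None:
    h["drop"] = True


def set_port(port: int) -> Action:
    def action(h: Headers) -> None:
        h["egress_port"] = port
    return action


@dataclass
class Stage:
    """One match-action stage: exact match on some fields, then run an action."""
    key_fields: Tuple[str, ...]
    table: Dict[tuple, Action] = field(default_factory=dict)
    default: Action = no_op

    def apply(self, h: Headers) -> None:
        key = tuple(h.get(f) for f in self.key_fields)
        self.table.get(key, self.default)(h)


@dataclass
class Pipeline:
    """Generic engine: it knows nothing about switching or firewalls."""
    stages: List[Stage]

    def process(self, h: Headers) -> Headers:
        for stage in self.stages:
            if h.get("drop"):
                break  # an earlier stage dropped the packet
            stage.apply(h)
        return h


# Program 1: a layer 2 switch that forwards on destination MAC address.
l2 = Stage(("dst_mac",))
l2.table[("aa:bb:cc:00:00:01",)] = set_port(1)
l2.table[("aa:bb:cc:00:00:02",)] = set_port(2)
switch = Pipeline([l2])

# Program 2: the same engine with an ACL stage in front of forwarding.
acl = Stage(("dst_port",))
acl.table[(23,)] = drop  # block telnet
firewall = Pipeline([acl, l2])

pkt = {"dst_mac": "aa:bb:cc:00:00:02", "dst_port": 23}
print(switch.process(dict(pkt)))    # gains egress_port 2
print(firewall.process(dict(pkt)))  # marked drop, never forwarded
```

The same Pipeline engine becomes a plain layer 2 switch or a simple firewall depending only on which tables and actions it is loaded with, which is the sense in which the program gives the chip its personality.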
“The Ethernet protocol was defined over twenty years ago, and the only thing that has really happened to boost performance is feeds and speeds of the ASICs,” says Doe. “But there really has not been a place to innovate at the protocol level on their own. We have come up with a way to do this with no overhead in terms of performance, or power, or cost. It is kind of hard to believe that all of a sudden we can make the switch from fixed function to programmable for high speed switches, but this is largely possible because of process technology. If you think of how the chip controlling the data plane is constructed, it is largely made of I/O, and this is the same whether it is fixed or programmable. The next biggest thing on the ASIC is memory, which is table memory or packet buffer memory, and those are the same whether the switch is fixed or programmable. The last thing is the logic, and we have been able to make it programmable and this does scale pretty well with the process node, generation over generation.”
The Tofino chip still frames data packets according to the Ethernet protocol, and at a peak of 6.4 Tb/sec it provides about twice the bandwidth of any switch ASIC out there. (The basic features of a programmable ASIC were outlined in this paper from the P4 backers in 2013.) The Tofino chip is implemented in a 16 nanometer process from Taiwan Semiconductor Manufacturing Corp, and it is this shrink that gives it enough headroom to outperform fixed function switch ASICs despite being programmable.
At the moment, there are four different variants of the Tofino chip in development, which come in speeds of 1.8 Tb/sec, 2.4 Tb/sec, 3.2 Tb/sec, and 6.4 Tb/sec. Tofino has 260 SERDES that run at 25 Gb/sec, and these can be stepped down to 10 Gb/sec. The SERDES can be grouped into bandwidth chunks that support ports running at 10 Gb/sec, 25 Gb/sec, 40 Gb/sec, 50 Gb/sec, and 100 Gb/sec. As for latency, end users can decide that for themselves, which is a powerful concept, too. On the latest Broadcom "Tomahawk" chips, for instance, the port-to-port hop latency is on the order of 450 to 550 nanoseconds, says Doe, while on Tofino you can get below 400 nanoseconds and decide whether your protocol does fast forwarding or a simple lookup. And because you get to decide the table sizes, you also control another important contributor to latency.
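The SERDES figures lend themselves to some back-of-envelope arithmetic. The sketch below assumes the common Ethernet lane mappings (100 Gb/sec as four 25 Gb/sec lanes, 50 Gb/sec as two, 40 Gb/sec as four 10 Gb/sec lanes) and treats the lane count as the only constraint, which real chips do not guarantee; the breakout function is hypothetical. Sixty-four 100 Gb/sec ports consume 256 of the 260 lanes, which is one way to arrive at the 6.4 Tb/sec peak figure; how the remaining lanes are used is not stated.

```python
from typing import Tuple

SERDES_LANES = 260   # per the text, each running at up to 25 Gb/sec

# Lanes per port and per-lane rate (Gb/sec), assuming the common Ethernet
# mappings; lane count is treated as the only constraint (an assumption).
PORT_MODES = {
    100: (4, 25),
    50: (2, 25),
    40: (4, 10),
    25: (1, 25),
    10: (1, 10),
}


def breakout(speed_gbps: int, lanes: int = SERDES_LANES) -> Tuple[int, float]:
    """Hypothetical helper: (max ports, aggregate Tb/sec) if all lanes use one mode."""
    lanes_per_port, _lane_rate = PORT_MODES[speed_gbps]
    ports = lanes // lanes_per_port
    return ports, ports * speed_gbps / 1000


print(f"raw: {SERDES_LANES * 25 / 1000} Tb/sec")   # 6.5
for speed in PORT_MODES:
    ports, tbps = breakout(speed)
    print(f"{speed:>3} Gb/sec x {ports:>3} ports = {tbps} Tb/sec")

# 64 x 100 Gb/sec ports use 256 lanes, matching the quoted 6.4 Tb/sec peak.
print(breakout(100, lanes=256))  # (64, 6.4)
```

The 40 Gb/sec and 10 Gb/sec rows also show why stepping the lanes down to 10 Gb/sec cuts the aggregate bandwidth of the chip.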
No surprise, then, that cloud builders, storage cluster makers, and financial services companies are all interested in Barefoot Networks, its P4 language, and its Tofino chips. But the hyperscalers are probably going to be the early adopters.
The P4 language that turns these Tofino chips into recognizable network devices like switches, firewalls, and load balancers is open source under an Apache v2 license and is being used by a number of organizations, including network equipment providers Netronome, Xilinx, and Huawei Technologies as well as AT&T. Google has also been playing around with it and is presumably going to be an early adopter of Tofino switches once the chips are available.
Doe says that the first generation of Tofino chips is being taped out now and will be back from the foundry in the fourth quarter. The hyperscalers and their ODM partners are designing their Tofino switches now, as are whitebox switch makers, he adds.
The truly interesting part of the technology developed by Barefoot Networks is that, through a complete P4 stack that includes compilers, debuggers, visualizers, and reference programs, it has been seeding application software created by Google, AT&T, and others long before the hardware that makes the best use of it comes to market. Those companies are compiling P4 code down to their own network processors, and therefore the functions will, in theory, be portable across many different network ASIC architectures, depending of course on the instructions and operations they support (much as a C program is portable across X86, ARM, Power, and Sparc architectures, with optimizations for each).
But perhaps more significantly, large organizations will be able to use P4 to add features and functions to protocols that might otherwise take years to push through standards bodies such as the IEEE, followed by another year or two of waiting before they appear in an ASIC from Broadcom, Cisco Systems, Hewlett-Packard Enterprise, Intel, or the few others who make switch chips. It took four years for the VXLAN overlay created by Cisco and VMware to make it into merchant silicon, and that is simply too long.
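To illustrate why a programmable parser shortens that cycle, here is a hypothetical Python model in which VXLAN support is simply one more state in a user-supplied parse graph rather than logic frozen into silicon. It is not P4, and parse, PARSE_GRAPH, and the parse_* functions are illustrative names; the header layouts, IPv4 protocol number 17 for UDP, and UDP destination port 4789 for VXLAN follow the published standards (RFC 7348 for VXLAN).

```python
import struct
from typing import Callable, Dict, Optional, Tuple

# A parser state reads one header and returns (fields, next_state, header_length).
ParseFn = Callable[[bytes], Tuple[dict, Optional[str], int]]


def parse_ethernet(b: bytes):
    dst, _src, ethertype = struct.unpack("!6s6sH", b[:14])
    nxt = "ipv4" if ethertype == 0x0800 else None
    return {"eth.dst": dst.hex(":"), "eth.type": ethertype}, nxt, 14


def parse_ipv4(b: bytes):
    ihl = (b[0] & 0x0F) * 4            # header length in bytes
    proto = b[9]
    nxt = "udp" if proto == 17 else None
    return {"ipv4.proto": proto}, nxt, ihl


def parse_udp(b: bytes):
    _sport, dport, _length, _csum = struct.unpack("!HHHH", b[:8])
    nxt = "vxlan" if dport == 4789 else None  # IANA-assigned VXLAN port
    return {"udp.dport": dport}, nxt, 8


def parse_vxlan(b: bytes):
    flags = b[0]
    vni = int.from_bytes(b[4:7], "big")      # 24-bit VXLAN network identifier
    # A fuller graph would continue into the encapsulated Ethernet frame.
    return {"vxlan.vni": vni, "vxlan.i_flag": bool(flags & 0x08)}, None, 8


PARSE_GRAPH: Dict[str, ParseFn] = {
    "ethernet": parse_ethernet,
    "ipv4": parse_ipv4,
    "udp": parse_udp,
    "vxlan": parse_vxlan,  # new protocol = new state, not new silicon
}


def parse(pkt: bytes, graph: Dict[str, ParseFn] = PARSE_GRAPH) -> dict:
    fields: dict = {}
    state: Optional[str] = "ethernet"
    offset = 0
    while state is not None and state in graph:
        hdr, state, length = graph[state](pkt[offset:])
        fields.update(hdr)
        offset += length
    return fields


# Build a minimal VXLAN-in-UDP-in-IPv4 packet (checksums left at zero).
eth = bytes.fromhex("aabbcc000001" "aabbcc000002") + struct.pack("!H", 0x0800)
ipv4 = bytes([0x45, 0, 0, 0, 0, 0, 0, 0, 64, 17]) + bytes(10)  # 20 bytes, proto 17
udp = struct.pack("!HHHH", 49152, 4789, 0, 0)
vxlan = bytes([0x08, 0, 0, 0]) + (5000).to_bytes(3, "big") + b"\x00"
pkt = eth + ipv4 + udp + vxlan

print(parse(pkt))
legacy = {k: v for k, v in PARSE_GRAPH.items() if k != "vxlan"}
print(parse(pkt, legacy))  # stops after UDP: no vxlan.* fields
```

Dropping the VXLAN state from the graph leaves a parser that stops at the UDP header, which is roughly the position of a fixed function ASIC that predates the protocol.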
The other neat thing is that Ethernet network specifications are written in English, which leaves lots of room for interpretation in how they are implemented, while data plane behavior written in P4 is code, which you can't really argue with or interpret. You can, however, improve it. So in some far off future where all network ASICs speak P4, the standards might be written in this language and be actual pieces of code that run on different hardware implementations where chip makers still innovate. This will be a much more user-centric future, if it should come to pass, and one that switch chip makers and switch makers may come to abhor because it makes it harder for them to control product cycles and prices, and therefore profits.
We might as well just come out and admit that if P4 succeeds on an array of programmable switch ASICs, then a whole bunch of profits are going to vanish.