Point To Point In The Datacenter With Andy Bechtolsheim

Most people in the IT community tend to their fields, making their living in their patches, but there are some who change the landscape, and still fewer who do it again and again. Without question, Andy Bechtolsheim is one of the people who has caused his share of tectonic shifts in the industry over the past four decades.

Bechtolsheim was one of the co-founders of a company that was a pioneer in scientific workstations, the first of which was designed at Stanford University, where Bechtolsheim was getting his PhD in the late 1970s and early 1980s. That company, of course, was Sun Microsystems, which was founded in 1982, went public in 1986, and reached $1 billion in sales by 1988, when the Unix server revolution in the datacenter grew out of the workstation market.

In 1995, when the dot-com boom was just getting rolling and Sun was the king of the Unix market, Bechtolsheim partnered with David Cheriton to found Gigabit Ethernet switch maker Granite Systems, which was sold to Cisco Systems a mere year later for $220 million and formed the foundation of its switching business. The pair were famously the first two angel investors in Google back in 1998. In 2001 they co-founded clustered system maker Kealia, which was then sold to Sun to reinvigorate its hardware business, and in 2005 Bechtolsheim and Cheriton tag teamed once again to found Arista Networks, the upstart switch maker that embraced merchant switch ASICs early on and created its own Linux-based network operating system. The combination of switches based on a mix of merchant chips plus the Extensible Operating System, which allows for adjunct X86 and FPGA processing on the switch, has been eagerly received by hyperscalers, cloud builders, and more than a few enterprises.

We were, of course, thrilled to have Bechtolsheim sit down with us at our recent The Next I/O Platform event in San Jose.

Right off the bat, we noted that in the past few years, switch ASIC makers have been revving up their cadence of product rollouts, getting onto the most advanced process nodes at the chip foundries, and have gotten networking more or less back on a Moore’s Law track. While many enterprises are still in the transition to 10 Gb/sec Ethernet – believe it or not – the hyperscalers and cloud builders have made the jump to 100 Gb/sec and are eyeing jumps to 200 Gb/sec and 400 Gb/sec in the coming years. The obvious question is: What do we need all of this bandwidth for? And moreover, what is the interplay between port bandwidth on the switch and bandwidth going into and out of servers? As we have talked about numerous times here at The Next Platform, as switches add bandwidth there is an opportunity to keep the bandwidth per port constant and increase the number of ports per switch, giving datacenter operators a chance to build flatter networks and to spend less money on connectivity.
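To make the flatter-network point concrete, here is a minimal sketch, assuming a textbook non-blocking two-tier leaf-spine (folded Clos) fabric built entirely from one switch ASIC, with each leaf splitting its ports evenly between servers and spines. The function name `two_tier_hosts` is hypothetical, and bandwidths are in Gb/sec. Under those assumptions a switch with k ports supports k × k/2 servers in two tiers, so doubling the port count at a constant per-port speed quadruples the servers a two-tier network can reach before a third tier is needed.

```python
def two_tier_hosts(switch_gbps: int, port_gbps: int) -> int:
    """Servers reachable in a non-blocking two-tier leaf-spine (assumed topology).

    Assumes every switch uses the same ASIC, each leaf sends half its ports
    down to servers and half up to spines, and each spine port reaches a
    different leaf, so the fabric has as many leaves as a spine has ports.
    """
    radix = switch_gbps // port_gbps  # ports per switch at this port speed
    leaves = radix                    # one spine port per leaf
    servers_per_leaf = radix // 2     # half of each leaf faces servers
    return leaves * servers_per_leaf


if __name__ == "__main__":
    # Hold the port speed at 100 Gb/sec and let the ASIC bandwidth double.
    for asic_gbps in (12_800, 25_600, 51_200):
        print(asic_gbps, two_tier_hosts(asic_gbps, 100))
```

With 100 Gb/sec ports, this gives 8,192 servers for a 12.8 Tb/sec ASIC and 32,768 for a 25.6 Tb/sec ASIC, all still two switch hops from each other.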

“That’s a very good question and there are various ways to look at it and think about this,” says Bechtolsheim. “Most of this growth is happening in the cloud, and there is this macro effect that a lot of applications are still moving to the cloud. If you look at Microsoft Azure and AWS cloud revenue growth, it is 50, 60, 70 percent every year, so as you deploy more applications on these cloud infrastructures, obviously you need more hardware – and more networking and more bandwidth – to support that. Now, in the search and social space, let’s call it, which is not commercial applications but large indexes and trying to figure out how to serve you the best ad, there is still more indexing, more content, more photos, more video, more social whatever. So these companies invest in whatever they can to stay ahead on the monetization curve and keep those eyeballs engaged.”

Little things done at scale, Bechtolsheim reminded us, can add up to lots of hardware being needed, and Google instant search is a good example. Instant search launched in 2010 and, based on our histories, took a guess at what we were searching for as we typed and preloaded search results to our computers from the Google search engine. At the time it was launched, according to Bechtolsheim, instant search tripled the hardware demand for Google’s search engine because it had to cache all of these results for all of its users all the time to make its guesses. “Even seemingly simple features take a lot of hardware to implement if you serve millions and millions of users simultaneously all the time.” (Luckily for Google, it stopped instant search in 2017, mainly because mobile phones could not handle it, although we didn’t notice it died back then because we had disabled it.)

Bechtolsheim says it is hard to reckon exactly, but the demand for networking in the cloud – by which he means what we call hyperscalers and cloud builders here at The Next Platform – has grown by about 50 percent annually for the past several years. Part of that growth has been driven by elastic demand: as the cost per bit transferred has gone down through several generations of 10 Gb/sec switching, 40 Gb/sec switching, and three generations of 100 Gb/sec switching, customers can afford to buy more bandwidth, and they do. The other factor driving network bandwidth, says Bechtolsheim, is the increasing throughput of processors: as you add more and more cores to processors, you need to balance out the networking into and out of those processors, and ditto for flash storage, which needs lots of bandwidth to make use of its very high IOPS.
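A quick way to see what sustained growth of about 50 percent a year adds up to is to compound it. This is a minimal sketch; the one-to-five-year horizon is an assumption, since the article says only "several years", and `compound_growth` is a hypothetical helper.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Total growth multiple after `years` of compounding (0.5 means 50 percent a year)."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # Assumed horizon of one to five years at the roughly 50 percent annual rate cited.
    for years in range(1, 6):
        print(years, round(compound_growth(0.5, years), 2))
```

At 50 percent a year, demand grows to about 7.6X its starting level over five years.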

“Flash by itself probably drove an order of magnitude [increase] in networking bandwidth, just because of NVM-Express and having this flash distributed all over the network,” he says.

The trick for any datacenter is to make a balanced investment, and that is not always an easy thing because compute, networking, and storage advance at their own paces, with speed bumps that do not always align. These companies are spending an enormous amount of money on servers, with about half of that going to main memory and flash storage and the rest mostly going to CPUs, the occasional GPUs and FPGAs, and the cost of the actual datacenters themselves. Only about 10 percent of the cost is being spent on the network, says Bechtolsheim, although we think companies worried for a long time that this share would rise higher than that.
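The 10 percent figure also shows why network spend is a lever rather than just a cost to minimize. Here is a minimal sketch, assuming all non-network spend stays fixed and that value scales with useful server throughput; `breakeven_gain` is a hypothetical helper, not a formula from Bechtolsheim. It returns the fractional change in throughput needed to hold cost per unit of work flat when network spend is scaled by some factor.

```python
def breakeven_gain(network_share: float, network_multiplier: float) -> float:
    """Fractional change in useful throughput needed to keep cost per unit of work flat.

    network_share: the network's fraction of total spend (about 0.10 here).
    network_multiplier: factor applied to network spend (2.0 doubles it, 0.5 halves it).
    Assumes every non-network cost stays the same.
    """
    new_total = (1.0 - network_share) + network_share * network_multiplier
    return new_total - 1.0


if __name__ == "__main__":
    # Doubling a 10 percent network budget raises total cost by about 10 percent.
    print(round(breakeven_gain(0.10, 2.0), 3))
    # Halving it saves only about 5 percent of total cost.
    print(round(breakeven_gain(0.10, 0.5), 3))
```

Under these assumptions, doubling network spend pays off only if the servers get roughly 10 percent more work done, while halving it saves just 5 percent, a saving that is wiped out if poorer utilization costs the servers more than about 5 percent of their throughput.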

“If you could get more juice out of your servers by doubling your network spend, you would actually do that,” says Bechtolsheim. “But you are not going to do that unless you get more out of it. On the other hand, you don’t want to spend less and waste that very expensive server and storage investment because you don’t get to very high utilization on those CPUs. So it is basically keeping the utilization high on all of these servers and storage elements with the right speed at the network, which at this point looks like a 25 Gb/sec or a dual 25 Gb/sec NIC is the most common way to interface a server to a top of rack switch that just happens to be 3.2 Tb/sec, and you can put 48 of these servers in a rack and have 800 Gb/sec coming out at a 3:1 oversubscription – that’s actually the cheapest thing you can do right now to connect a rack of servers to a network and still get very good performance.”
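The rack arithmetic in that quote can be checked directly. This is a minimal sketch, assuming every switch port not cabled to a server is used as an uplink; `rack_oversubscription` is a hypothetical helper, and bandwidths are in Gb/sec.

```python
def rack_oversubscription(servers: int, nic_ports: int, nic_gbps: int,
                          switch_gbps: int) -> tuple[int, int, float]:
    """Return (downlink Gb/sec, uplink Gb/sec, oversubscription ratio) for one rack."""
    downlink = servers * nic_ports * nic_gbps  # bandwidth facing the servers
    uplink = switch_gbps - downlink            # assumed: all remaining capacity goes up
    if uplink <= 0:
        raise ValueError("server ports use up the whole switch; no uplink capacity left")
    return downlink, uplink, downlink / uplink


if __name__ == "__main__":
    # 48 servers, each with a dual 25 Gb/sec NIC, on a 3.2 Tb/sec top of rack switch.
    print(rack_oversubscription(48, 2, 25, 3_200))  # (2400, 800, 3.0)
```

That is 2.4 Tb/sec down to the servers and 800 Gb/sec up, the 3:1 ratio Bechtolsheim describes.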

The interesting change in networking is that we are on a steady cadence of very large bandwidth increases. Switch ASICs with 12.8 Tb/sec of aggregate bandwidth are coming from Broadcom, Innovium, Marvell, Intel, and a few other merchant switch silicon players. At that aggregate bandwidth, you can have 128 ports running at 100 Gb/sec or you can split them and run 256 ports at 50 Gb/sec. This allows a single top of rack switch to span three, four, or five racks of servers – depending on the density of the servers and the length of the copper cables used – instead of one. And when 25.6 Tb/sec ASICs start coming out a year from now, provided that the port speed is still 50 Gb/sec coming out of the servers, you could have a single switch span six, seven, or even eight racks of servers. And a year after that, if all goes as planned, there will be 51.2 Tb/sec ASICs and the rack span of a single switch can potentially double again.
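The rack-span figures follow from the same kind of arithmetic. This is a minimal sketch, assuming each server has a single 50 Gb/sec port, racks hold 48 servers, and the switch keeps the 3:1 downlink-to-uplink ratio from the earlier quote; the article does not specify those parameters for this scenario, and `racks_spanned` is a hypothetical helper.

```python
def racks_spanned(switch_gbps: int, server_port_gbps: int,
                  servers_per_rack: int, oversub: int = 3) -> int:
    """Whole racks one switch can serve at an oversub:1 downlink-to-uplink ratio.

    All bandwidths are in Gb/sec. Cable reach limits are ignored here.
    """
    # Split switch bandwidth so that downlink:uplink == oversub:1.
    downlink_gbps = switch_gbps * oversub // (oversub + 1)
    server_ports = downlink_gbps // server_port_gbps
    # Count only fully populated racks.
    return server_ports // servers_per_rack


if __name__ == "__main__":
    for asic_gbps in (12_800, 25_600, 51_200):
        print(asic_gbps, racks_spanned(asic_gbps, 50, 48))
```

Under those assumptions, one switch spans 4 racks at 12.8 Tb/sec, 8 at 25.6 Tb/sec, and 16 at 51.2 Tb/sec. Denser racks shrink the span (64 servers per rack gives 3 racks at 12.8 Tb/sec), and moving servers to 100 Gb/sec ports halves it, so a 51.2 Tb/sec switch would still cover 8 racks of 48 servers.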

It will all come down to how many CPU cores and how much flash are in each box, and if companies jump to 100 Gb/sec on the server ports, there will still be plenty of bandwidth to span many racks with a single switch.

We talked about a lot more in the interview, so have a look.
