Signposts On The Roadmap Out To 10 Tb/sec Ethernet
September 8, 2017 Timothy Prickett Morgan
The world of Ethernet switching and routing used to be more predictable than just about any other part of the datacenter, but for the past decade the old adage – ten times the bandwidth for three times the cost – has not held. While 100 Gb/sec Ethernet was launched in 2010 and saw a fair amount of uptake amongst telecom suppliers for their backbones, the hyperscalers decided, quite correctly, that 100 Gb/sec Ethernet was too expensive and opted for 40 Gb/sec instead.
Now, we are sitting on the cusp of the real 100 Gb/sec Ethernet rollout among hyperscalers and enterprise datacenters, which John D’Ambrosia, chairman of the Ethernet Alliance trade group, says “will be the largest rollout that we have ever seen,” and that is true for a bunch of reasons. For one thing, the cost of 100 Gb/sec Ethernet switches, which often include routing functions and therefore allow standardization of iron across switching and routing workloads, is coming down fast as new ASICs enter the field based on the 25G signaling standard that the hyperscalers (primarily Microsoft and Google) rammed down the IEEE’s throat a few years back for the good of the entire industry. For another thing, there are machine learning and IoT workloads that are dependent on gathering up immense amounts of telemetry from every device known to man, from blenders to cars, and chewing on it back in the datacenter for insight, and that is putting bandwidth pressure on networks. And then, of course, there are the ever-embiggening media files that we use in our business and personal lives, the increasing cross connection between people, the increasingly distributed nature of applications, and the increasing population of the world.
It is no surprise, then, that with 100 Gb/sec Ethernet now at an affordable price, seven years after it entered the field, it is finally ready to take off. It is beyond overdue, given the pressure from compute and storage, which have been growing capacity faster than networking bandwidth rates in the past decade.
Gigabit Ethernet came out in 1998, running at the blazing speed of 1 Gb/sec, when the hyperscalers were not anywhere near the technology giants they are today. The 10 Gb/sec Ethernet standard came out in 2002, and it ramped pretty quickly in the datacenter and then down into other use cases, but the jump to 100 Gb/sec did not take the typical three to four years. That is because 25 Gb/sec lane signaling was not developed and vendors instead decided to gang up the 10 Gb/sec lanes used in the interim 40 Gb/sec Ethernet standard, which was really done to give hyperscalers a stopgap on the way to 100 Gb/sec. The telcos did not mind paying the price, because they can pass it on to us in our exorbitant phone and data plan bills. Those providing essentially free services at a massive scale cannot afford to do this. Hence the 25G Ethernet standard in 2014, which has led not only to 25 Gb/sec, 50 Gb/sec, and 100 Gb/sec switching and sometimes routing, but also to these speeds at a port density and price point that hyperscalers demand and that enterprises, lagging a few years behind, will be able to leverage.
Anyone who thinks that enterprises will embrace older 40 Gb/sec technologies when they make the jump from 10 Gb/sec gear is silly. They will go to 25 Gb/sec on their server ports and either 50 Gb/sec or 100 Gb/sec on their switches, and if we were them, and they were smart, they would be using cable splitters to use 25 Gb/sec on the switch today and then have the option to move up to 50 Gb/sec and 100 Gb/sec in the future just by changing the splitters and doubling and quadrupling their switches.
Enough cannot be said about how the 25G standard changed things. The networking folks at Alcatel-Lucent (the former Bell Labs where so many great technologies were invented) cooked up this comparison that makes it plain:
This is better both at the server level, where hyperscalers had to aggregate four 10 Gb/sec ports together or buy more expensive 40 Gb/sec adapter cards (they did the former at first and the latter after a few years of rolling out 40 Gb/sec on the switches), and at the switch level. To be sure, 25 Gb/sec ports coming out of the server do not provide as much bandwidth as the 40 Gb/sec option, but look at the difference in the switching, which is what really matters in the total cost of ownership calculation. Using 25 Gb/sec signaling, a typical ASIC with 3.2 Tb/sec of aggregate bandwidth can support 96 server ports running at 25 Gb/sec and eight 100 Gb/sec uplinks, and every bit per second of that switch is utilized; moreover, you only need 1,042 top of rack switches to cross-connect 100,000 servers, which is about the capacity of a hyperscale datacenter, give or take. (A region has multiple datacenters, of course.) This is compared to the 40 Gb/sec setup, which would require 3,572 switches for those 100,000 servers. And if you wanted to double up the port speeds to 50 Gb/sec on the servers, yielding 25 percent more bandwidth than the 40 Gb/sec ports, you would only need 2,084 switches using 25G devices, still a lot fewer than the 40 Gb/sec switch farm.
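The switch counts above come from dividing the number of servers by the server-facing ports on each switch and rounding up. Here is a minimal sketch of that arithmetic. The function name is ours, and the 28-port figure for the 40 Gb/sec case is an assumption we back out from the 3,572 number, since the port layout of that switch is not given here.

```python
import math

def tor_switches_needed(servers: int, server_ports_per_switch: int) -> int:
    """Top-of-rack switches needed if each server takes one server-facing port."""
    return math.ceil(servers / server_ports_per_switch)

SERVERS = 100_000

# 25G case from the text: 96 server ports at 25 Gb/sec plus eight 100 Gb/sec uplinks.
ports_25g = 96
aggregate_gbps = ports_25g * 25 + 8 * 100  # 3,200 Gb/sec, the full 3.2 Tb/sec ASIC

# 50G case: doubling the server port speed halves the server ports per switch.
ports_50g = ports_25g // 2

# 40G case: ASSUMED 28 server ports per switch (inferred, not stated in the text).
ports_40g_assumed = 28

for label, ports in [("25G", ports_25g), ("50G", ports_50g), ("40G", ports_40g_assumed)]:
    print(label, tor_switches_needed(SERVERS, ports))
```

Run as written, this reproduces the 1,042, 2,084, and 3,572 switch counts, and confirms that 96 ports at 25 Gb/sec plus eight 100 Gb/sec uplinks use exactly 3.2 Tb/sec.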
The 40 Gb/sec compromise was just that – a compromise until 25 Gb/sec signaling was available. The issue was that the switch ASIC makers and the IEEE were willing to wait for 25 Gb/sec signaling at the 200 Gb/sec Ethernet bump, and the hyperscalers were not.
The Unfolding Ethernet Roadmap
As we said above, predicting the progression of increasing Ethernet speeds used to be fairly easy, and then it got tough for a while when the base ten upgrades stopped happening in a predictable fashion. Now, rather than focusing on 10X upgrades, the industry is settling in to do 2X, 4X, and maybe 10X or 16X bandwidth improvements over the coming years.
Making such predictions is hard, though, because there is a lot of clever engineering that needs to be done in the switches, in the transceivers, and in the cables to make it all work. There are no sure bets here, because Moore’s Law issues affect switch chips every bit as much as they affect compute chips.
Just for fun, here is an Ethernet roadmap from 2007, showing the steady state that was expected a decade ago:
The demand projections were certainly there. In 2012, the IEEE Ethernet Working Group published a bandwidth assessment report that showed key types of workloads – scientific computing, financial services, telecom are the biggies – growing such that 1 Tb/sec Ethernet would be needed by 2015 and 10 Tb/sec by 2020. The normal cadence:
We obviously do not have 400 Gb/sec, much less 1 Tb/sec, switching available today. But as we have pointed out before, networking is getting back on track and is now accelerating faster than compute again. (It is very likely that compute and networking will not grow faster than storage, but the shift to faster media is mitigating this gap while also putting pressure on the network at the same time.)
In a talk about the future Ethernet roadmap, D’Ambrosia demonstrated that the I/O capacity of a single network ASIC has been doubling every 18 months or so, as expected, but also showed how the capacity in a switch ASIC is being constrained by practical limits on the size of chips. Take a look:
The upshot is this: To cram more bandwidth onto a switch ASIC, the only way to do it is to boost the lane signaling on the SERDES. We just jumped from 10 Gb/sec to 25 Gb/sec, and that was tough enough, but to keep on the Moore’s Law pace, that means switch ASIC makers have to get to 50 Gb/sec signaling in 2018 or so (and we have talked about some vendors who are trying to do that) and to 100 Gb/sec signaling by 2020. That would mean a switch ASIC that could drive a whopping 25.6 Tb/sec of aggregate bandwidth.
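The 25.6 Tb/sec figure is just lane count multiplied by the per-lane signaling rate. As a rough sketch, assume a fixed 256 SERDES lanes per ASIC. That lane count is our assumption, backed out from 25.6 Tb/sec at 100 Gb/sec per lane, and is not a number quoted above. On that basis, each signaling step works out like this:

```python
# ASSUMPTION: 256 SERDES lanes per ASIC, inferred from 25.6 Tb/sec / 100 Gb/sec.
ASSUMED_LANES = 256

def aggregate_gbps(lanes: int, lane_gbps: int) -> int:
    """Aggregate switch bandwidth in Gb/sec: lanes times per-lane signaling rate."""
    return lanes * lane_gbps

for lane_rate in (25, 50, 100):
    total = aggregate_gbps(ASSUMED_LANES, lane_rate)
    print(f"{lane_rate} Gb/sec lanes -> {total / 1000} Tb/sec aggregate")
```

With the lane count held constant, every doubling of the signaling rate doubles the aggregate bandwidth, which is why the SERDES is the lever that matters once chip size is capped.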
That would be pretty sweet to have about right now, and the fact that such capacity is not here is why chip makers and the hyperscalers that are egging them on are pushing the technology ahead of the standards. They can read a SERDES chart as well as the ASIC makers, and they can make their own assessments of how to best make use of signaling rates and lane aggregation to get a desired bandwidth per device and per port. Like this one:
Anywhere you want to take Ethernet in the next couple of years has to intersect a point on that chart above.
“In the past four years or so, we have come to recognize the importance of the SERDES,” explains D’Ambrosia, who is senior principal engineer at FutureWei Technologies and who was chief evangelist for Ethernet at Force10 Networks before Dell bought it several years ago and had a similar position at Dell. “So with 25 Gb/sec you get 100 Gb/sec Ethernet, with 50 Gb/sec you get 200 Gb/sec, and with 100 Gb/sec you get 400 Gb/sec Ethernet. There is a basic relationship that is going to be driving things, and we are following the SERDES. As we do this 1X and 4X approach, we are also finding that we are going to have to do 8X. We are not doing eight times 25 Gb/sec to get to 200 Gb/sec switching, but today we are doing eight by 50 Gb/sec to get to 400 Gb/sec and in the future, we will do four by 100 Gb/sec to get to 400 Gb/sec as well. That begs the question of what the next Ethernet speed is going to be after that. If we follow this logic, eight lanes is what we would jump to after four, and sixteen lanes is a little wide.”
No one is talking about ten lanes at 100 Gb/sec to hit 1 Tb/sec, which would have been the classical bandwidth bump beyond 100 Gb/sec. Everyone assumes that we will be filling in the gaps, making the best price/performance choices to drive down the cost per bit as the hyperscalers have compelled the industry to do. And that means the next speed after 400 Gb/sec is probably going to be 800 Gb/sec or 1.6 Tb/sec at the switch level, very likely sometime way out beyond 2020. That is hardly better than the 1 Tb/sec that was originally expected in 2015 and the 10 Tb/sec expected in 2020, mind you. Networking might be getting back to a Moore’s Law curve, as we have observed, but it did not make up for lost time.
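The relationship D’Ambrosia describes is simple enough to tabulate: port speed is the number of lanes times the SERDES rate. The short sketch below is illustrative only. The function name is ours, and the list holds only the lane and rate combinations mentioned in this article, not an official roadmap.

```python
def port_speed_gbps(lanes: int, serdes_gbps: int) -> int:
    """Ethernet port speed in Gb/sec for a given lane count and per-lane SERDES rate."""
    return lanes * serdes_gbps

# (lanes, SERDES Gb/sec) combinations discussed in the article.
combos = [
    (4, 25),    # 100 Gb/sec
    (4, 50),    # 200 Gb/sec
    (8, 50),    # 400 Gb/sec today
    (4, 100),   # 400 Gb/sec in the future
    (8, 100),   # 800 Gb/sec candidate
    (16, 100),  # 1.6 Tb/sec candidate, "a little wide"
    (10, 100),  # the classical 1 Tb/sec bump nobody is talking about
]

for lanes, rate in combos:
    print(f"{lanes} x {rate} Gb/sec = {port_speed_gbps(lanes, rate)} Gb/sec")
```

Laid out this way, it is clear why 800 Gb/sec falls out naturally from eight lanes at 100 Gb/sec, while 1 Tb/sec requires an awkward ten lanes.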
The Ethernet Alliance posted a roadmap in 2015, and then updated it again in 2016, and D’Ambrosia says that it will take in all the new developments in ASICs and transceivers and update the roadmap again in 2018. It would be interesting to see far out into the hills of 2025 and the mountain ranges of 2030 and have the industry take a stab at what networking might look like way far out. Compute and networking could hit the Moore’s Law wall at about the same time, and that is precisely what we expect.