Amazon Web Services, the juggernaut of cloud computing, may be forging its own path with Arm-based CPUs and associated DPUs thanks to its 2015 acquisition of Annapurna Labs for $350 million. But for the foreseeable future, it will have to offer X86 processors, probably from both Intel and AMD, because these are the chips that most IT shops in the world have most of their applications running on.
We talked about that, and about how AWS will be able to charge a premium for that X86 compute at some point in the future, in a recent analysis of its Graviton2 instances and how they compare to its X86 instances. Other cloud providers will follow suit. We already know that in China, Tencent and Alibaba are keen on Arm-based servers, and so is Microsoft, which has a huge cloud presence in North America and Europe.
There is no such explicit need to support a particular switch or routing ASIC for the sake of cloud customers as there is for CPUs. And that is why we believe that AWS might actually be considering making its own switch ASICs, as has been rumored. As we detailed way back when The Next Platform was established, AWS has been building custom servers and switches for a very long time, and it has been concerned about its supply chain of parts as well as vertical integration of its stack for the past decade. And we said six years ago we would not be surprised if all of the hyperscalers eventually took absolute control of those parts of their semiconductor usage that they could for internal use. Any semiconductor that ends up as part of back-end infrastructure that cloud users never see, or part of a platform service or software subscription that customers never touch, can be done with homegrown ASICs. And we fully expect this to happen at AWS, Microsoft, Google, and Facebook. And Alibaba, Tencent, and Baidu, too. And other cloud suppliers elsewhere in the world that are big enough.
This is certainly true of switch and router chippery. Network silicon is largely invisible to those who buy infrastructure services (and indeed anyone who buys any platform services that ride above the infrastructure services), and in fact, the network itself is largely invisible to them. Here is an example of how invisible it is. A few years back when we were visiting the Microsoft region in Quincy, Washington, we asked Corey Sanders, the corporate vice president in charge of Azure compute, about the aggregate bandwidth of the Microsoft network underpinning Azure. “You know, I honestly don’t know – and I don’t care,” Sanders told us. “It just appears infinite.”
The point is, whatever pushing and shoving is going on with AWS and Broadcom, it will never manifest itself as something that customers see or care about. This is really about two hard-nosed companies butting heads, and whatever engineering decisions have been already made and will be made in the future will have as much to do with ego as feeds and speeds.
There is a lot of chatter about the hyperscalers, so let’s start with the obvious. All of these companies have always hated any closed-box appliance that they cannot tear the covers off, rip apart, and massively customize for their own unique needs and scale. This is absolutely correct behavior. The hyperscalers and largest public clouds hit performance and scale barriers that most companies on Earth (as well as those orbiting Rigel and Sirius) will never, ever hit. That’s their need, not just their pride. The hyperscalers and biggest cloud builders have problems that the silicon suppliers and their OEMs and ODMs haven’t thought about, much less solved. Moreover, they can’t move at Cisco Systems speed, which means finding a problem and then waiting 18 to 24 months for a feature to land in the next-generation ASIC. This is why software defined networking and programmable switches matter to them.
Ultimately, these companies fought for disaggregated switching and routing to drive down the price of hardware and to allow them to move their own network switching and routing software stacks onto a wider variety of hardware. That way, they can grind ASIC suppliers and OEMs and now ODMs against each other. The reason is simple. Network costs were exploding. James Hamilton, the distinguished engineer at AWS who helps fashion much of its homegrown infrastructure, explained this all back in late 2014 at the re:Invent conference, which was five years after the cloud giant had started designing its own switches and routers and building its own global backbone, something that Hamilton talked about back in 2010 as this effort was just getting under way.
“Networking is a red alert situation for us right now,” Hamilton explained in his keynote address at re:Invent 2014. “The cost of networking is escalating relative to the cost of all other equipment. It is Anti-Moore. All of our gear is going down in cost, and we are dropping prices, and networking is going the wrong way. That is a super-big problem, and I like to look out a few years, and I am seeing that the size of the networking problem is getting worse constantly. At the same time that networking is going Anti-Moore, the ratio of networking to compute is going up.”
The timing is interesting. That was after AWS had embraced merchant silicon for switch and routing ASICs from Broadcom, and it was six months before Avago, a semiconductor conglomerate run by Hock Tan, one of the richest people in the IT sector, shelled out a whopping $37 billion to buy Broadcom and take its name.
You don’t build the world’s largest e-commerce company out of the world’s largest online bookseller, and then create an IT division spinout that becomes the world’s largest IT infrastructure supplier, by being a wimp – and Jeff Bezos is certainly not that. And neither is Tan, by all indications. And that’s why we think, looking at this from outside of a black box, that AWS and the new Broadcom have been pushing and shoving for quite some time. And this is probably equally true of all of the hyperscalers and big cloud builders. Which is why we saw the rise of Fulcrum Microsystems and Mellanox Technologies from 2009 forward (Fulcrum was eaten by Intel in 2011 and Mellanox by Nvidia in 2020), and then the next wave of merchant chip suppliers such as Barefoot Networks (bought by Intel in 2019), XPliant (bought by Cavium in 2014, which was itself bought by Marvell in 2018), Innovium (founded by people from Broadcom and Cavium), Xsight Labs, and Nephos. And of course, now Cisco Systems is trying to make up to them all by offering its Silicon One ASICs as merchant silicon.
Tan buys companies to extract profits, and he did not hesitate to sell off the “Vulcan” Arm server processors that Broadcom had under development to Cavium – which was itself eaten by Marvell, and which last year shut down its own “Triton” ThunderX3 chip because the hyperscaler and cloud builder customers it was counting on were going to build their own Arm server chips. And with the old Broadcom having basically created the modern merchant switch ASIC market with its “Trident” and “Tomahawk” ASICs, the new Broadcom, we speculate, wanted to price its ASICs more aggressively than the smaller old Broadcom would have felt comfortable doing. The new Broadcom has a bigger share of wallet at these hyperscalers and cloud builders, many of whom build other devices that need lots of silicon. So there is a kind of détente between buyer and seller.
“We’re not going to hurt each other, are we?” Something like that.
We also have to believe all of this competition has directly or indirectly hurt the Broadcom switch and router ASIC business. And hence we also believe Tan has asked the hyperscalers and cloud builders to pay more for their ASICs than they would like. And they have more options than they have had in the past, but change is always difficult and risky.
We don’t know what switch ASICs the hyperscalers and cloud builders use, but we have to assume that all of these companies have tried out their homegrown network operating systems on each and every one of them as they tape out and get to first silicon. They pick and choose what to roll out where in their networks, but the safe bet in recent years has been Broadcom Tomahawk ASICs for switching and Jericho ASICs for routing, with perhaps Mellanox or Innovium or Barefoot as a testbed and negotiating tactic.
This tactic may have run its course at AWS. If it has, the cause will be not only hard-headedness and pride, but also the success of the $350 million Annapurna Labs acquisition back in 2015 in demonstrating that homegrown chips can break the hegemony of Intel in server CPUs. That deal closed just as AWS was hitting a financial wall with networking, just as Avago was buying Broadcom, and just as the Tomahawk family was coming into being specifically for hyperscalers and cloud builders.
So that’s the landscape within which AWS may have decided to make its own network ASICs. Let’s look at this from a few angles. First, economics.
What we have heard is that AWS is only spending around $200 million a year for Broadcom switch and routing ASICs. We believe the number is larger than that, and if it isn’t today, it surely will be as AWS grows and its networking needs within each datacenter grow.
Let’s play with some numbers. Take a typical hyperscale datacenter with 100,000 servers. We don’t care if they are compute servers or storage servers; by and large, on average, there is something on the order of 200,000 CPUs in those machines. From the people we talk to who do server CPUs for a living, you need to consume somewhere between 400,000 and 500,000 servers a year – meaning 800,000 to 1 million CPUs a year – to justify the cost and trouble of designing your own chips, which runs somewhere between $50 million and $100 million per generation. This does not include the cost of fabbing these chips, packaging them up, and sending them to ODMs to build systems. AWS clearly consumes enough servers across its 25 regions and 80 availability zones (each of which has multiple datacenters at this scale).
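That break-even arithmetic can be sketched in a few lines. The volumes and design costs are the rough estimates from the text, not AWS-disclosed figures, and amortizing the design cost over a single year of volume is our own simplifying assumption:

```python
# Back-of-the-envelope break-even math for designing your own server CPU,
# using the estimates cited in the text (not AWS-disclosed figures).

servers_per_year = (400_000, 500_000)    # break-even consumption range
cpus_per_server = 2                      # roughly two sockets per machine
design_cost = (50e6, 100e6)              # dollars per chip generation

# CPUs consumed per year at the break-even points
cpus_per_year = tuple(s * cpus_per_server for s in servers_per_year)
print(cpus_per_year)                     # (800000, 1000000)

# Design cost amortized over one year of volume (an illustrative
# assumption; a generation actually ships for longer than a year)
best_case = design_cost[0] / cpus_per_year[1]    # low cost, high volume
worst_case = design_cost[1] / cpus_per_year[0]   # high cost, low volume
print(f"${best_case:.0f} to ${worst_case:.0f} of design cost per CPU")
```

Even at the pessimistic end, that is on the order of a hundred dollars of design cost per chip, which is small against what Intel charges for a server CPU.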
Now, depending on the network topology, those 100,000 servers with 200,000 server chips will require somewhere between 4,000 and 6,000 switch ASICs to make a leaf/spine Clos network interlinking all of those machines. Assuming an average of two datacenters per availability zone (a reasonable guess) across those 25 regions, and an average of around 75,000 machines per datacenter (not all of the datacenters are full at any given time), that’s 12 million servers and 24 million server CPUs. Depending on the topology, we are now talking about somewhere between 480,000 and 720,000 switch ASICs in the entire AWS fleet. Servers get replaced every three years, on average, but switches tend to hang on for as long as five years – sometimes longer. So that works out to something like 96,000 to 144,000 switch ASICs a year. Even if it is growing at 20 percent per year, it is nothing like the server CPU volumes.
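To make that fleet math concrete, here is the same arithmetic as a short sketch. All of the inputs are the guesses laid out above, not AWS figures:

```python
# Fleet-wide switch ASIC count, using the text's guessed inputs.

azs = 80                          # availability zones across 25 regions
dcs_per_az = 2                    # an assumed average
servers_per_dc = 75_000           # average; not all datacenters are full
asics_per_100k = (4_000, 6_000)   # leaf/spine Clos, topology-dependent
switch_life_years = 5             # switches linger longer than servers

servers = azs * dcs_per_az * servers_per_dc
cpus = servers * 2
print(servers, cpus)              # 12 million servers, 24 million CPUs

fleet_asics = tuple(servers // 100_000 * a for a in asics_per_100k)
print(fleet_asics)                # (480000, 720000)

per_year = tuple(a // switch_life_years for a in fleet_asics)
print(per_year)                   # (96000, 144000)
```
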
But, that is only counting datacenter switching. Those numbers do not include all of the switching AWS needs, which will be part of its Amazon Go stores and its Amazon warehouses, themselves massive operations. If the server fleet keeps growing, and these other businesses do, too, then Amazon’s overall datacenter and campus and edge switching needs could easily justify the cost and hassle of making networking chips. Add in routing, and a homegrown ASIC set with an architecture that spans both switching and routing as Cisco is doing with its own Silicon One (which Cisco no doubt would love to sell to AWS but good luck with that), and you can pretty easily justify an investment of around $100 million per generation of ASIC. (Barefoot Networks raised $225.4 million to do two generations of its Tofino ASICs, and Innovium raised $402.3 million to get three Teralynx ASICs out the door and have money to sell the stuff and work on the fourth.)
Now, let’s add some technical angles. What has made Annapurna Labs so successful inside of AWS is the initial “Nitro” Arm processor announced in 2016, which was used to create a SmartNIC – what many in the industry are now calling a Data Processing Unit or a Data Plane Unit, depending, but still a DPU either way – for virtualizing storage and networking and getting these off the hypervisors on the servers. The new Nitros get damned near all of the hypervisor off the CPU now, and are more powerful. These have spawned the Graviton and Graviton2 CPUs used for raw computing, the Inferentia accelerators for machine learning inference, and the Trainium accelerators for machine learning training. We would not be surprised to see an HPC variant with big fat vectors come out of AWS and also do double duty as an inference engine on hybrid HPC/AI workloads.
Homegrown CPUs started out in a niche and quickly spread all around the compute inside of AWS. The same could happen for networking silicon.
AWS controls its own network operating system stack for datacenter compute (we don’t know its name) and can port that stack to any ASIC it feels like. It has the open source Dent network operating system in its edge and Amazon Go locations.
Importantly, AWS may look at what Nvidia has done with its “Volta” and “Ampere” GPUs and decide it needs to create a switch that speaks memory protocols to create NUMA-like clusters of its Trainium chips to run ever-larger machine learning training models. It could start embedding switches in Nitro cards, or do composable infrastructure using Ethernet switching within racks and across racks. What if every CPU that AWS made had a cheap-as-chips Ethernet switch instead of an Ethernet port?
Here is the important thing to remember. The people from Annapurna Labs who made the move over to AWS have a deep history in networking, and some of their closest colleagues are now at Xsight Labs. So maybe this talk about homegrown network ASICs is all a feint while AWS tests out ASICs from Xsight Labs to see how they compete with Broadcom’s chips. Or maybe it is just a dance before AWS acquires Xsight Labs, as it did with Annapurna Labs after choosing that company to design and build its Nitro chips. Last December, Xsight Labs announced it was sampling two switch ASICs in its X1 family: one with 25.6 Tb/sec of aggregate bandwidth that can push 32 ports at 800 Gb/sec, and a 12.8 Tb/sec one that can push 32 ports at 400 Gb/sec using 100 Gb/sec SerDes with PAM4 encoding.
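As a quick sanity check on those X1 numbers, the aggregate bandwidth, port count, port speed, and SerDes lane count all have to agree, and they do:

```python
# Check that the X1 specs are internally consistent: ports x port speed
# must equal aggregate bandwidth, and each port must be built from an
# integer number of 100 Gb/sec PAM4 SerDes lanes.

x1_parts = [
    {"tbps": 25.6, "ports": 32, "port_gbps": 800},
    {"tbps": 12.8, "ports": 32, "port_gbps": 400},
]

for part in x1_parts:
    aggregate_gbps = part["ports"] * part["port_gbps"]
    assert aggregate_gbps / 1000 == part["tbps"]
    lanes = part["port_gbps"] // 100     # 100 Gb/sec per SerDes lane
    print(f'{part["tbps"]} Tb/sec = {part["ports"]} x {part["port_gbps"]} '
          f'Gb/sec ({lanes} SerDes lanes per port)')
```

So the 800 Gb/sec ports are built from eight 100 Gb/sec lanes and the 400 Gb/sec ports from four.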
It would be difficult, but not impossible, to put together a network ASIC team of the caliber that AWS needs. But as we pointed out, the Annapurna Labs people are a good place to start. And we fully realize that it takes a whole different set of skills to design a packet processing engine wrapped by SerDes than it takes to design an I/O and memory hub wrapped by a bunch of cores. (But when you say it that way. . . )
A little history is in order, we think. It all starts with Galileo Technology, which was founded in 1993 by Avigdor Willenz to focus on – wait for it – developing a high performance MIPS RISC CPU for the embedded market. The chip Galileo created ended up being used mostly in data communications gear, and it was eventually augmented with designs based on PowerPC cores, which came to rule the embedded market before Arm chips booted them out. In 1996, Galileo saw an opportunity and pivoted to create the GalNet line of Ethernet switch ASICs for LANs (launched in 1997), eventually extending that with the Horizon ASICs for WANs. At the height of the dot-com boom in early 2000, Willenz cashed out and sold Galileo to Marvell for $2.7 billion.
Among the many companies that Willenz has invested in with that money and helped propel up and to the right were Habana Labs, the AI accelerator company that Intel bought for $2 billion in 2019; the aforementioned Ethernet switch ASIC maker Xsight Labs; and Annapurna Labs, which ended up inside of AWS. Xsight Labs was founded by Guy Koren, Erez Sheizaf, and Gal Malach, who all worked at EZchip, a DPU maker that was eaten by Mellanox to create its SmartNICs and that is now at the heart of Nvidia’s DPU strategy. (Everybody knows everybody in the Israeli chip business.) Willenz is the link between them all, and he has a vested interest in flipping Xsight Labs just as he did Galileo Technology and Annapurna Labs (and no doubt hopes to do with distributed flash block storage maker Lightbits Labs, where he is chairman and an investor).
Provided the price is not too high, it seems just as likely to us that AWS will buy the Xsight Labs team as build its own from scratch. And if not, then maybe AWS has considered buying Innovium, which is also putting 400 Gb/sec Ethernet ASICs into the field. With its last round of funding, Innovium reached unicorn status, and oddly enough, that last round of money may make its $1.2 billion valuation a little too rich for AWS’s blood. A lot depends on how much traction Innovium can get selling Teralynx ASICs outside of whatever business we suspect it is already doing with AWS.
If you put a gun to our heads, we think AWS is definitely going to do its own network ASICs. It is just a matter of time for economic reasons that include the company’s desire to vertically integrate core elements of its stack. This may or may not be the time, despite all the rumors going around. Then again, everything just gets more expensive with time and scale. Whatever is going on, we suspect we will hear about custom network ASICs at some point at re:Invent – perhaps even this fall.