After many years of rumors, Microsoft has finally confirmed that it is following rivals Amazon Web Services and Google into the design of custom processors and accelerators for their clouds. That confirmation came today as Satya Nadella, Microsoft’s chief executive officer, announced the Cobalt 100 Arm server CPU and the Maia 100 AI accelerator chip.
The move is a surprise to precisely no one, because even if Microsoft does not deploy very many of its own chips, the very fact that they exist means it can negotiate for better pricing from chip makers Intel, AMD, and Nvidia. It is like spending hundreds of millions of dollars to save billions, which can be reinvested back into the infrastructure, including further development of homegrown chippery. This is particularly true given the relatively high cost of X86 server CPUs and the outrageous pricing for Nvidia “Hopper” H100 and H200 GPU accelerators and, we presume, for the forthcoming AMD “Antares” Instinct MI300X and MI300A GPU accelerators. With demand far in excess of supply, there is no incentive at all for AMD to undercut Nvidia on price with datacenter GPUs unless the hyperscalers and cloud builders give it one.
Which is why every hyperscaler and cloud builder is working on some kind of homegrown CPU and AI accelerator at this point. As we are fond of reminding people, this is precisely like the $1 million Amdahl coffee cup in the late 1980s and the 1990s, when IBM still had a monopoly on mainframes. Gene Amdahl, the architect of the System/360 and System/370 mainframes at IBM, founded a company bearing his name that made clone mainframe hardware that would run IBM’s systems software, and just having that cup on your desk when the IBM sales rep came to visit sent the message that you were not messing around anymore.
This is one of the reasons, but not the only one, that a decade ago Amazon Web Services came to the conclusion that it needed to do its own chip designs: eventually – and it surely has not happened yet – a server motherboard, including its CPU, memory, accelerators, and I/O, will be compressed down to a system on chip. As legendary engineer James Hamilton put it so well, what happens in mobile eventually happens in servers. (We would observe that sometimes the converse is also true.) Having an alternative always brings competitive price pressure to bear. But more than that, by having its own compute engines – Nitro, Graviton, Trainium, and Inferentia – AWS can take a full stack co-design approach and eventually co-optimize its hardware and software, boosting performance while hopefully reducing costs, thus pushing the price/performance envelope and stuffing it full of operating income cash.
Microsoft got a later start with custom servers, storage, and datacenters, but with the addition of the Cobalt and Maia compute engines, it is becoming a fast follower behind AWS and Google as well as others in the Super 8 who are making their own chips for precisely the same reason.
The move by Microsoft to design its own compute engines and have them fabbed was a long time coming, and frankly, we are surprised it didn’t happen a few years ago. It probably comes down to building a good team when everyone else – including a few CPU startups and a whole bunch of AI chip startups – is also trying to build a good design team and get in line at the foundries run by Taiwan Semiconductor Manufacturing Co.
“Being the world’s computer means that we need to be even the world’s best systems company across heterogeneous infrastructure,” Nadella explained in his opening keynote at the Microsoft Ignite 2023 conference. “We work closely with our partners across the industry to incorporate the best innovation from power to the datacenter to the rack to the network to the core compute, as well as the AI accelerators. And in this new age of AI, we are redefining everything across the fleet in the datacenter.”
Microsoft has wanted an alternative to the X86 architecture in its fleet for a long time, and way back in 2017 it said its goal was for Arm servers to be 50 percent of its server compute capacity. A few years back, Microsoft was an early customer of Cavium/Marvell with its “Vulcan” ThunderX2 Arm server CPUs and was on track to be a big buyer of the “Triton” ThunderX3 follow-on CPUs when Marvell decided in late 2020 or early 2021 to mothball ThunderX3. In 2022, Microsoft embraced the Altra line of Arm CPUs from Ampere Computing, and started putting them in its server fleet in appreciable numbers, but all that time there were persistent rumors that the company was working on its own Arm server CPU.
And so it was, and so here it is in Nadella’s hand:
We don’t know what Microsoft has been doing all of these years on the CPU front, but we do know that a group on the Azure Hardware Systems and Infrastructure (AHSI) team designed the chips. This is the same team that developed Microsoft’s “Cerberus” security chip for its server fleet and its “Azure Boost” DPU.
The company provided very little in the way of details about the internals of the Cobalt server chip, but the word on the street is that the Cobalt 100 is based on the “Genesis” Neoverse Compute Subsystems (CSS) N2 intellectual property package from Arm Ltd, which was announced back at the end of August. If that is the case, then Microsoft is taking two 64-core Genesis tiles, each with “Perseus” N2 cores and six DDR5 memory controllers, and lashing them together in a single socket. So that is 128 cores and a dozen memory controllers, which is reasonably beefy for 2023.
The “Perseus” N2 core meshes scale from 24 cores to 64 cores on a single chiplet, and four of these can be ganged up in a CSS N2 package to scale to a maximum of 256 cores in a socket, using UCI-Express (not CCIX) or proprietary interconnects between the chiplets as customers desire. The clock speeds of the Perseus cores can range from 2.1 GHz to 3.6 GHz, and Arm Ltd has optimized this design bundle of cores, mesh, I/O, and memory controllers to be etched in 5 nanometer processes from TSMC. Microsoft did confirm that the Cobalt 100 chip is indeed using these manufacturing processes. Microsoft also said that the Cobalt 100 would offer 40 percent more performance per core than previous Arm server CPUs available on the Azure cloud, and Nadella said that slices of Microsoft’s Teams, Azure Communication Services, and Azure SQL services were already running atop the Cobalt 100 CPUs.
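If the word on the street about the CSS N2 building blocks is right, the socket-level math is simple enough to sketch. The per-tile figures below are assumptions drawn from the rumored configuration, not confirmed Microsoft specifications:

```python
# Back-of-the-envelope tally for the rumored Cobalt 100 layout.
# Assumed (not confirmed): two CSS N2 tiles per socket, each with
# 64 "Perseus" N2 cores and six DDR5 memory controllers.
TILES_PER_SOCKET = 2
CORES_PER_TILE = 64
DDR5_CONTROLLERS_PER_TILE = 6

cores = TILES_PER_SOCKET * CORES_PER_TILE
controllers = TILES_PER_SOCKET * DDR5_CONTROLLERS_PER_TILE
print(f"{cores} cores, {controllers} DDR5 memory controllers per socket")
# prints "128 cores, 12 DDR5 memory controllers per socket"

# The CSS N2 package tops out at four 64-core chiplets, so the
# ceiling on this IP bundle would be:
max_cores = 4 * CORES_PER_TILE  # 256 cores in a socket
```

The same arithmetic shows why a four-tile CSS N2 build would hit the 256-core ceiling Arm Ltd quotes, and why Microsoft’s rumored two-tile choice lands at 128 cores.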
Here is a shot of some racks of servers in Microsoft’s Quincy, Washington datacenter using the Cobalt 100 CPUs:
Nadella said that next year, slices of servers based on the Cobalt 100 will be available for customers to run their own applications on.
The Maia 100 AI chip is probably the one developed under the code-name “Athena” that we have been hearing about for more than a year and that we brought up recently as OpenAI, Microsoft’s large language model partner, was rumored to be looking at creating its own AI accelerator, tuned specifically for its GPT generative AI models. This may have all been crossed wires, with Athena being the chip the OpenAI rumors were referring to, or maybe OpenAI is hedging its bets while also getting Microsoft to tune up an AI engine for GPT. Microsoft has been working on an AI accelerator for about four years, if the scuttlebutt is correct, and this may or may not be the one it intended to do back then.
Here is the Maia 100 AI accelerator chip that Nadella held up:
What we can tell you is that the Maia 100 chip is based on the same 5 nanometer processes from TSMC and includes a total of 105 billion transistors, according to Nadella. So it is no lightweight when it comes to transistor count. The Maia 100 chip is direct liquid cooled, has been running GPT 3.5, and is powering the AI copilot that is part of GitHub right now. Microsoft is building up racks with the Maia 100 accelerators, and they will be allowed to power outside workloads through the Azure cloud next year.
One of the neat things about the Maia effort is that Microsoft has designed an Open Compute compatible server, which holds four of the Maia accelerators and slides into the racks it has donated to OCP. Each rack has a companion sidekick rack with all of the liquid cooling pumps and compressors to keep these devices from overheating, allowing them to run hotter than they otherwise might with only air cooling. Take a look:
The Maia 100 is designed to do both AI training and AI inference, and is optimized for large language models – and presumably, given how fast this part of the IT industry is changing, is going to be able to support other models besides flavors of OpenAI’s GPT.
The other interesting thing is that Microsoft is going to be using Ethernet interconnects to lash together the Maia accelerators, not Nvidia’s InfiniBand.
We will be poking around to get more details on the Cobalt and Maia compute engines.