As we pointed out recently, there has been a certain amount of tumult and change in recent months in the Arm server processor space. Whatever is going on, Ampere Computing is moving full speed ahead on its roadmap and is just as happy that Marvell has left the field, and presumably Nuvia as well, which was just acquired by Qualcomm, a company that we think is not at all interested in taking on Intel and now AMD in the server CPU market.
Marvell did not really explain its exit, but had been dancing towards the door all last summer and fall. The company revealed the feeds and speeds of the 96-core, 384-thread “Triton” ThunderX3 server chip during the Hot Chips conference, which we drilled down into in August. By the end of September, as Marvell was talking up its semi-custom chip business for customers, it said that the ThunderX3 would only be available as a semi-custom part, sold through direct engagements only, and would not be launched with a broad SKU stack revealed to the public and open to all buyers. The word on the street is that Marvell has canceled the ThunderX3 chip entirely (although all the intellectual property is done if you want to buy some chips) and that a lot of the design team has moved over to Microsoft, which is rumored to be designing its own chips for servers as well as working with Ampere Computing.
Ampere Computing, of course, is a reconstituted and revitalized Arm server CPU effort that has former Intel president Renee James as its chief executive officer, who secured funding from The Carlyle Group and whose initial eMAG chip was based on the intellectual property of the former Applied Micro and its “Skylark” X-Gene 3 project. Since then, Ampere Computing has added a whole bunch of chip designers and marketing people from Intel and other companies and put together the Altra line of Arm server processors from scratch and a roadmap going out several years. The 80-core “Quicksilver” Altra chip is Ampere Computing’s first homegrown chip, and it follows the philosophy of providing cores with no threads and absolutely deterministic performance as opposed to all of this clock and power scaling and threading that can mess with the repeatability of performance. The 128-core “Mystique” Altra Max processor is in the works for this year. These chips are both etched with 7 nanometer processes at Taiwan Semiconductor Manufacturing Co, and a follow-on chip code-named “Siryn” is under construction that will come out in 2022 and that uses TSMC’s 5 nanometer transistor manufacturing.
Given the situation, we thought it would be a good idea to talk to Jeff Wittich, senior vice president of products at Ampere Computing, about what is going on with Nuvia and what it all might mean for the prospects for Arm servers in the datacenter and at the edge. Perhaps – and we are just conjecturing here – what Nuvia was going to try to do would work well in a client device, but maybe not so well in a server, and hence Qualcomm snapped up Nuvia for $1.4 billion to build out its Snapdragon processors as well as eliminate potential competition for Arm chips for clients. Like the rest of us who heard the Nuvia pitch – remember, it was explicitly and only attacking the server market – Wittich is perplexed by Nuvia. But not by what Ampere Computing is doing.
“We talk a lot about the fact that our architecture is all about scaling out with a lot of cores, with a good amount of performance in every core, because this makes a lot of sense in the cloud if a core is your unit of compute,” Wittich explains. “I always thought that the design point that Nuvia had been talking about didn’t make a lot of sense because it involved some pretty big cores and that means they would not be able to scale out to create a high performance cloud. The fact that Qualcomm is presumably using it for other applications and usages makes a lot of sense, and that confirms our respective design points, like the M1 from Apple, which is a great design for a client. Both are very complementary to us: They can have a really high performance client design, and they can use Ampere in the cloud.”
The issue now is who is going to vertically integrate their clouds, designing their own CPUs and DPUs and maybe other kinds of accelerators, and who is going to buy parts from third parties, design systems, and ship them off to ODMs for manufacturing? Amazon Web Services definitely wants to control a lot of its server compute fate, but because it is a cloud hosting applications for millions of corporations, its homegrown chips are only good for internal workloads or ancillary work, like a DPU, or for a small portion of the base that wants the better price/performance of an Arm server and has a software stack that it controls that runs on it.
The hyperscalers and cloud builders may design and contract out their own Arm server chips, or they may rally behind Ampere Computing at this point. (Marvell could do something really interesting and create an open source ThunderX3 chip. But we would not hold our breath on that one. . . .) We think that some of these homegrown compute engine projects are as much about providing leverage to buy commodity parts like Intel CPUs and Nvidia GPUs as they are about providing better bang for the buck for tailored architectures. But, as Wittich points out, we have not as yet seen a true horizontal Arm server CPU player, so we cannot presume that it won’t happen or that vertical integration among the hyperscalers and cloud builders – where they design and build their own processors and have absolute control – is the necessary end state.
“I think there is still a strong desire for a CPU that can be leveraged across the datacenter space,” Wittich says. “That is what Intel is doing, and that is what AMD is doing. There hasn’t been an Arm player who has actually built an Arm server that has the highest performance and that gets traction, and sticks with it long enough to get the wins. This is why there is this gravitation at times towards building their own. However, the horizontal play makes a ton of sense and creates a ton of value with x86 processors, particularly in ecosystem creation, and that is also valuable with Arm processors. Until Ampere Computing, no one has delivered, stuck with it, and then actually gotten some adoption.”
Oracle is the first big buyer of Altra processors, which it will be deploying on its eponymous cloud, and another one of the big clouds will be deploying Altra chips “over the next couple of months” and “a few other announcements are pending.” There are other design wins in the works in the United States and China, according to Wittich, and more server makers in addition to the “Mount Jade” two-socket Altra system that customers and benchmarkers have been playing with for the past several months and the single-socket “Mount Snow” server that is coming from Gigabyte.
The Mount Jade system is made by Taiwanese ODM WiWynn. (You can see early benchmarks on this machine from Serve The Home here and from Anandtech there.) The Mount Jade machine has a pair of the top-bin Q80-33 processors, which have those 80 cores running at 3.3 GHz with a thermal design point of 250 watts each but which burn more like 210 watts when running benchmarks such as SPECint_rate, according to Wittich.
The Mount Jade system has 160 Altra cores in total, which are fed by eight memory controllers and up to sixteen memory sticks per socket, for a maximum of 8 TB of main memory fully loaded with 256 GB DDR4 memory sticks; the maximum memory speed is 3.2 GHz. (It is far more likely for this machine to be configured with 32 GB or 64 GB memory sticks, which are a lot cheaper per unit of capacity and which would deliver the same bandwidth into and out of the CPUs for any given number of memory slots used.) The Mount Jade system has one Open Compute 3.0 PCI-Express 4.0 x16 slot, plus two other PCI-Express 4.0 x16 slots that are capable of supporting the CCIX protocol running at 25 Gb/sec for accelerator coupling. (We wonder if these CCIX slots could also be used for NUMA coupling to create an eight-socket system.) The machine also has six PCI-Express 4.0 x8 slots for other peripherals. The system has a pair of 2,000 watt power supplies.
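For those who want to check the memory math, here is a rough back-of-the-envelope sketch in Python. It is not anything from Ampere Computing, and it leans on a few assumptions of ours: that each of the eight memory controllers drives one standard 64-bit DDR4 channel, that the 3.2 GHz figure corresponds to DDR4-3200 (3,200 million transfers per second), and that all sixteen slots per socket are filled. The class and method names are made up for illustration. The bandwidth number is a theoretical peak, not a measured result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    """Hypothetical model of the Mount Jade memory subsystem."""
    sockets: int = 2
    dimms_per_socket: int = 16       # from the article: up to sixteen sticks per socket
    channels_per_socket: int = 8     # assumption: one channel per memory controller
    transfers_per_sec_m: int = 3200  # assumption: "3.2 GHz" read as DDR4-3200 (MT/sec)
    bytes_per_transfer: int = 8      # assumption: standard 64-bit DDR4 data bus

    def capacity_gb(self, dimm_gb: int) -> int:
        """Total system capacity with every slot filled with dimm_gb sticks."""
        return self.sockets * self.dimms_per_socket * dimm_gb

    def peak_bandwidth_gb_per_sec(self) -> float:
        """Theoretical peak bandwidth per socket; independent of stick capacity."""
        mb_per_sec = (self.channels_per_socket
                      * self.transfers_per_sec_m
                      * self.bytes_per_transfer)
        return mb_per_sec / 1000


mount_jade = MemoryConfig()
for dimm_gb in (32, 64, 256):
    total_gb = mount_jade.capacity_gb(dimm_gb)
    print(f"{dimm_gb:>3} GB sticks: {total_gb:>5} GB total, "
          f"{mount_jade.peak_bandwidth_gb_per_sec():.1f} GB/sec peak per socket")
```

With 256 GB sticks the capacity works out to 8,192 GB, which is the 8 TB maximum cited above, and the peak bandwidth per socket is the same no matter which stick capacity is chosen, which is why the cheaper 32 GB and 64 GB sticks are the more likely configuration.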
Importantly, the future 128-core Mystique processors will plug into the same slots in the Mount Jade and Mount Snow systems as the current 80-core Quicksilver parts do. This will not be the case with the future Siryn processors, which will support different memory and I/O – presumably DDR5 memory and PCI-Express 5.0 peripherals, but Ampere Computing is not saying – and therefore require a different socket.
One more thing: The 128-core Mystique Altra Max processors are sampling and out the door, and early hyperscale and cloud builder customers are playing with them now. They should be out in volume by the middle of this year.
“I think people are going to be shocked when they see the list of providers who are coming out with services based on Altra,” boasts Wittich.
We certainly hope so, and we expect one of them to be Microsoft. Hence the moves by Marvell to basically mothball ThunderX3. But that is just a guess, and maybe Microsoft will say more at the Open Compute Project Global Summit, which is being pushed out to early November this year and is scheduled to be an in-person event in San Jose.