We have a bad case of the silicon shakes and a worsening deficiency in iron here at The Next Platform, but the good news is that new CPU processors from AMD and Intel are imminent, and more processors are expected later this year from IBM and Ampere Computing, too. There will be others.
It was strange indeed for us a month ago to scan through the advanced program for the International Solid State Circuits 2021 virtual conference and not see one single abstract relating to a server CPU, or indeed any CPU we recognized from one of the major suppliers. There was an eight-core RISC-V “vector machine” running at 1.44 GHz implemented in 16 nanometer process technology out of the University of California at Berkeley, which we will circle back and find out more about.
Now, after nearly two years of waiting, we see that AMD is going to be announcing its third generation “Milan” Epyc 7003 processors on March 15, a week from now, and that should help cure our shakes a little. AMD president and chief executive officer Lisa Su will be hosting the webcast launch along with chief technology officer Mark Papermaster, general manager of the Datacenter and Embedded Solutions Business Group Forrest Norrod, and general manager of the Server Business Unit Dan McNamara, who joined AMD from Intel last year.
AMD is jumping the gun to get out ahead of Intel, which is widely expected to do its “Ice Lake” Xeon SP processor launch sometime in March or April. We have been expected a March launch, and said as much back in early January when we talked about the epic showdown between the Milan and Ice Lake processors that was coming this quarter. The truth is that these processors have been taped out and manufactured for many months and have been shipping to hyperscalers and cloud builders for several months. This launch is a formality for the biggest buyers and for the rest of us. By April, we will have the lay of the land for X86 compute in the datacenter, and all eyes will shift to the battle that is already brewing between AMD’s future “Genoa” Epyc 7004s and Intel’s “Sapphire Rapids” Xeon SPs, both expected about a year from now.
When the dust settles from the 2021 announcements from AMD and Intel, we expect that AMD will make some market share gains, as it has done, but also that the server market will continue to expand fast enough that Intel does not lose as much share as one might expect given the substantial price/performance differences that prevailed in the comparisons between the AMD “Rome” Epyc 7002s and the two generations (more like one and a half generations?) of “Cascade Lake” Xeon SPs. We expect that the gap will remain largely the same, but that really depends on the level of aggression that AMD and Intel expect from each other as they circle each other, wrists lashed together on one hand and knives drawn in the other.
AMD has been meeting or beating Intel’s performance per socket and, at list price at least, cleaning its clock cycles on bang for the buck with the Rome lineup compared to Cascade Lake, and we don’t think that gap is going to change much. And we also think that Intel has been able to do deals with the hyperscalers and cloud builders (and therefore their ODM partners) as well as some of the OEMs (who sell both Xeon SP and Epyc gear these days) to win deals and extend its revenues. We suspect that there are a lot of deals that involve the bundling of motherboards and network interface cards and perhaps even enterprise flash or an FPGA, that allow Intel to keep the CPU price high. But this amounts to a discount on the CPU price, no matter what the bean counters might say.
Back when I was a cub reporter when giant sloths, sabretooth cats, and mastodons roamed the Earth, replacing Triceratops and Brontosaurus and Tyrannosaurus Rex, my editor asked me this question: If you call a tail a leg, how many legs does a dog have? And the answer is and always is forever: Four, because calling a tail a leg does not make it a leg. Unless, of course, you are talking about that three-legged dog named Lucky, but that is a different joke. (That said, Lucky didn’t lose his tail, so maybe to some Lucky still has four legs.) This tail-leg riddle is a good principle to apply when analyzing the IT market, that is for sure. There are a lot of free-floating matrix math engines that are being called AI coprocessors, for instance. But I digress. We were talking about server CPUs here.
It is hard to say, but we were pretty sure last summer that the Ice Lake chips would be capped be capped at 28 cores per die with the small possibility of a very hot and very expensive dual-chip module with 56 cores, and that the Milan chips will be capped at 64 cores. We thought, therefore, that there could be a kind of detente in core counts and the differentiation would shift to the underlying cores. And therefore we expect some pretty big performance leaps, in terms of instructions per clock or IPC, from both AMD and Intel. But we remembered (after this tory first ran) that Intel’s Trish Damkroger, vice president and general manager of its High Performance Computing Group, was showing off a 32-core Ice Lake Xeon SP chip in her SC20 keynote last November:
Intel has not said what the top-bin core counts will be with Ice Lake, but there is a good chance that it will rise above 32 cores. Which probably means there will not be an Ice Lake-AP dual-chip module given the performance increases expected with Milan and the core count and per core performance increases expected with Ice Lake based on the data in the chart above. The X86 server race will be a-foot.
Next year with Sapphire Rapids Xeon SPs, Intel will used an advanced 10 nanometer processes and crank up its DLBoost mixed precision on its AVX-512 vector engines and will also add something called Advanced Matrix Extensions. With Genoa in 2022, AMD will shrink from 7 nanometers to 5 nanometers; both companies will be doing pretty substantial changes in their cores, too, and we may even see a true chiplet architecture with Sapphire Rapids so Intel can bring the list price per socket down as yields increase on smaller chips as AMD has been doing throughout the Epyc relaunch into datacenter compute that started in 2017.
What we are saying here is that the competition is going to heat up between Intel and AMD, not cool down, and as before, if the economies of the world stay reasonably stable and compute remains a scare and valuable commodity, both companies should be able to profit.
Considering that X86 processors comprise the vast majority of shipments in the server racket, the AMD Milan and Intel Ice Lake chips are pretty much going to be the big excitement for the year when it comes to CPU compute for servers. But there will be other battles starting out on the periphery.
As we have previously reported, Ampere Computing has been sampling its 128-core “Mystique” Altra Max processor to hyperscalers and cloud builders since the beginning of this year and will be shipping them in volume by the middle of this year. We look forward to digging into these chips and how they stack up to the AMD Milan and Intel Ice Lake processors, as well as to the Power10 chips that IBM plans to launch at the end of the year.
Big Blue plans to put the high-end 15-core Power10 chip with SMT8 threading (eight threads per core) into the field sometime in the fall of 2021 and it will only be available in its four-socket and larger NUMA boxes. (We did a deep dive on the Power10 architecture and possible system designs based on the chip back in August 2020.) That Power10 really has 16 cores in total, but IBM is using Samsung Electronics for its 7 nanometer fabbing and having a new core and a new foundry at the same time is a bit risky, so Big Blue is assuming one of the cores will be a dud from the get-go given the hefty size (602 square millimeters) the chip.
In any event, the Power E1050, as we are calling it, will have 30 SMT8 cores per dual-chip socket running at around 3.5 GHz, only a 12.5 drop from the 4 GHz target design of the Power10 chip. (IBM has been doing DCMs with the Power chips since 2005, so this is nothing new to Big Blue.) This machine will have a total of 120 cores and 128 DDR5 memory slots using the OpenCAPI Memory Interface (OMI) architecture we talked about at length here. This machine will also have 64 lanes of PCI-Express 5.0 I/O to the outside world and a total of 192 lanes of interconnect SerDes (some of it used up to create the eight-way NUMA interconnect that has one hop between all processors) for CPUs and various kinds of accelerators. If IBM prices this machine right, and tunes up the Red Hat OpenShift and Red Hat Enterprise Virtualization stacks on it, it could be a formidable container or VM machine as well as a very capable database engine, in memory or otherwise. With 120 cores, it has 960 total threads, and each thread can be a container or a VM or a thread underpinning a database.
IBM looks increasingly interested in shipping a balanced machine with heavy cores and lots of memory bandwidth and is less interested in playing in the Core Wars. We won’t see true 30-core variants of Power10 (which have SMT4 threading, or four threads per core) aimed at the entry and midrange of the server market until 2022. And by then, as we have said, all eyes will be looking ahead to AMD Genoa, Intel Sapphire Rapids, and Ampere “Siryn” server CPUs, the latter of which will be etched with 5 nanometer processes from Taiwan Semiconductor Manufacturing Corp. IBM’s Power11 will be around three years into the future from that point, and heaven only knows what the plan will be for then.
It depends on a lot of things, first and foremost how the legacy AIX and IBM i businesses continue to invest in Power Systems machinery, how Inspur is doing with its own Power machine sales in China (where IBM gave up because of political issues between the United States and China), and how the Red Hat stack is preferentially adopted on Power iron instead of X86 machinery.
Sign up to our Newsletter
Featuring highlights, analysis, and stories from the week directly from us to your inbox with nothing in between.
Though I’ve made this comment before, I think continued progress for IBM depends on a Power10 workstation with competitive price performance that is available to the hoards of open source developers who made Linux successful. It is the software ecosystem that pushes hardware further upscale as successful companies grow. For open source it is the entry-level hardware market share that drives the software development.
While I understand it’s crucial for IBM to protect established high-end customers with an upgrade path, Red Hat is not an island but instead tightly interlinked with many independent open source projects. Leaders of those projects need to be convinced that Power10 is relevant by market share in order to focus any effort on optimising for that architecture.
Maybe the tapeout of a new design at a new foundry will result in great yields. However, just like the student who aspires for a B grade often fails the course, I think expecting one core out of 16 to be defective on each chip is destined to lead to many chips where less than half the cores work. If so, maybe be those yield problems could be turned into new market share for cost-effective 6 and 8-core workstations.
By all accounts and sources, Sapphire Rapids is a 10nm part, not 7nm
Yes, I knew that and just fixed it.
I just read an article on a gigabyte Cooper Lake Server on servethehome. Those chips doubled the UPI bandwidth in the 4 socket design. They also support up to 18TB of memory per node, including the Optane DIMMs.
So … just curious why no mention of Cooper Lake in this article.
Also … Facebook specified the avx512 bfloat16 requirement for their Zion platform and Intel delivered in Cooper Lake.
We covered Cooper Lake, too. And that was last year. This was about this year coming up.
“AMD has been meeting or beating Intel’s performance per socket and, at list price at least, cleaning its clock cycles on bang for the buck with the Rome lineup compared to Cascade Lake, and we don’t think that gap is going to change much.”
I’m saying … you compare to Cascade Lake, as if there were no Cooper Lake. Cooper Lake did make big changes … doubling the UPI bandwidth, adding bfloat16 processing to double the ai training performance vs fp32.
But not in the two-socket space where AMD is playing.
“…as they circle each other, wrists lashed together on one hand and knives drawn in the other.”
You slay me, Timothy, you really do!
This is one of the funniest serious tech articles I have read in a long time. Well done with the humour!