We were complaining a few weeks ago that Intel had not put out a server processor roadmap of any substance in a long time, and instead of just leaving it at that, we created our own Xeon SP roadmap based on rumors, speculation, hunches, and desires. In the absence of a proper roadmap, someone has to try to figure out what might happen on the server front, and we figured readers would pipe up if we went first.
The engineers who design and the marketeers who sell Intel’s Xeon SP processors were mostly amused by, and somewhat annoyed with, this stunt – and make no mistake, we know it was a stunt designed explicitly to get people talking and maybe get some real roadmaps out of Intel.
Today, at the Investor Day event that Intel is holding mostly online – there are some onsite events for actual Wall Streeters – the company did indeed put out an official, updated, and wholly real roadmap for the Xeon SP server CPU lineup. And while the real Intel Xeon SP roadmap does not have a lot of detail on it, it does have two interesting things going on that you need to be aware of. We touched base with Ronak Singhal, senior fellow and chief architect for Xeon roadmap and technology – which is a funny way of saying that Singhal bridges between customers, Intel marketing, and Intel chip designers to get the right server CPUs into the field – to get a little more insight into the roadmap that was shown.
The first new item is that the Intel 3 process (what we might have in the past called a 5 nanometer process) is coming along so well in the labs that it is being pulled in by a year – from 2025 with the speculated “Diamond Rapids” Xeon SP v7 processor to 2024 with the now officially acknowledged “Granite Rapids” Xeon SP v6 CPUs.
“We are updating Granite Rapids to be on Intel 3, and this speaks to the health of Intel 3 being good enough that we can intercept it now with Xeons in 2024, versus our prior plans,” Singhal tells The Next Platform. “And Granite Rapids, as you expected, will be on a new platform, different from what Sapphire Rapids and its follow-on Emerald Rapids use, and we are not yet talking about the details of that platform. And as you can imagine, there will be things that follow after Granite Rapids.”
And so, without further ado, here is the official Xeon server chip roadmap as it stands today:
It is not clear what happened to the Intel 4 process (what we would have called a rev on the 7 nanometer process), but clearly if Intel is going to assert undeniable and unquestioned “transistor performance per watt leadership” by 2025, as it has promised, it needs to leapfrog its own process roadmap at Intel Foundry Services to catch up with what AMD and the Arm collective will be doing with transistor etching techniques from Taiwan Semiconductor Manufacturing Co. It is arguable that Intel 3 will not be able to keep pace, but jumping one process node ahead a year early is a step in the right direction.
The roadmap above is also the first time that Intel has acknowledged the existence of the “Emerald Rapids” CPU complex for servers, and we think this is just a rev on Sapphire Rapids with more of the cores on each chiplet activated, plus tweaks to the cores and to the other features that wrap around them.
This stands to reason. With the prior Xeon E5 and Xeon SP chips, Intel offered three unique chips – the Low Core Count (LCC) variant with 10 cores on a mesh, the High Core Count (HCC) variant with 18 cores on a mesh, and the Extreme Core Count (XCC) variant with 28 cores on a mesh. We think the “Ice Lake” Xeon SP v3 announced last year had a 40-core Ultra Core Count variant that looks like a Sapphire Rapids layout with 10-core compute blocks, but all laid out on one monolithic chip. Take a look:
If you rejigger the placement of the UltraPath Interconnect (UPI) and PCI-Express controllers a bit and snap this Ice Lake chip in half vertically and then snap those two pieces in half horizontally, you would get something very much like Sapphire Rapids, but with an older core. For all we know, that was the plan – or one of the many plans – at Intel as it kept changing its roadmap.
We think that the chiplet going into the Sapphire Rapids and Emerald Rapids Xeon SP v4 and v5 processors has 18 cores, but with Sapphire Rapids, due to yield issues with the SuperFin 10 nanometer Intel 7 process, only a maximum of 14 of these cores can be turned on. (At least four cores have boogers on them, so to speak.) With Emerald Rapids, Intel will be using a more refined SuperFin 10 nanometer process plus a new “Raptor Cove” core, and the yield will go up; we think 16 of the 18 cores will be available, for a maximum of 64 cores per socket. (If the yields really improve, 72 cores in theory could be available in Emerald Rapids Xeons.)
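For those who want to check our math, here is a quick back-of-the-envelope sketch in Python. The four-chiplet package count is our inference from the 64-core maximum, and the per-chiplet yields are our speculation, not anything Intel has confirmed:

    # Back-of-the-envelope core counts for the speculative four-chiplet package.
    # The 18-core chiplet and the usable-core counts are our guesses, not Intel's.
    CHIPLETS_PER_SOCKET = 4

    def max_cores(usable_cores_per_chiplet):
        return CHIPLETS_PER_SOCKET * usable_cores_per_chiplet

    print("Sapphire Rapids (14 of 18 cores usable):", max_cores(14))  # 56 cores
    print("Emerald Rapids (16 of 18 cores usable):", max_cores(16))   # 64 cores
    print("Perfect yield (18 of 18 cores usable):", max_cores(18))    # 72 cores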
Both Sapphire Rapids and Emerald Rapids plug into the same “Eagle Stream” server platform, which supports DDR5 memory and PCI-Express 5.0 peripheral slots, like this:
The move of Granite Rapids to Intel 3 probably doesn’t change the shape of the chip much, but it certainly will change the thermals and cost per transistor. We think Diamond Rapids in 2025 will stay on a refined Intel 3 process, but it may try to jump the process gun again. We shall see. The heat is clearly on with AMD fielding excellent processors, Nvidia and Ampere Computing entering the field with Arm server chips, and Amazon Web Services doing its own Graviton family of Arm chips. Intel cannot afford to be a foundry laggard here anymore, as we showed here.
One other thing: Singhal confirmed that there would not be an Advanced Platform version of the Sapphire Rapids chip because the “Cascade Lake” AP-9200 platform, where two whole Cascade Lake Xeons were put into a single socket, was done to get high memory bandwidth for a certain compute density. With the HBM variant of Sapphire Rapids, Intel can jack up the bandwidth without having to jam two processors into a socket.
The second interesting thing on the Intel Xeon server chip roadmap – and this is something that we have all been wondering about but we did not get into as part of our speculative roadmap story – is that Intel is indeed going to create Xeon SP CPU complexes that make use of its so-called E-core (short for energy efficient, and derived from the Atom line of processors). Up until now, Xeon processors, with the exception of the Xeon Phi “Knights” family of many-core processors, have all used heavy Xeon cores – what is now called a P-core (short for performance core) design.
You can put around four of the E-cores into the same space as a P-core, so that tells you how many more threads you can get into a socket in theory with the “Sierra Forest” Xeons, which will also be etched with the same 5 nanometer Intel 3 process used for Granite Rapids. Intel is not saying what it will do, but Singhal did say that the use cases for what we might call the Xeon EP family would evolve as customers see them and deploy them. Even though the Sierra Forest and Granite Rapids machines will use the same “Mountain Stream” server platform, you cannot mix and match the P-core and E-core processors inside of a single node, and unlike Intel’s client processors, Intel does not seem to be inclined to mix P-cores and E-cores in the same chiplet or socket. The idea is that the Granite Rapids and Sierra Forest processors will have the same memory channels and I/O and the same wattage range, and customers can decide which one to put where in their fleets. The socket scalability could change, with Xeons with P-cores going from 2 to 8 sockets and Xeons with E-cores maxing out at 2 or 4 sockets. That remains to be seen.
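To put rough numbers on that density argument – and to be clear, the thread counts are our assumption, borrowed from Intel’s client chips, where each P-core runs two threads with HyperThreading and each E-core runs one – the arithmetic looks like this:

    # Rough thread-density math for a hypothetical socket. The 4:1 area ratio
    # is from above; the threads per core mirror Intel's client chips and are
    # our assumption for these server parts.
    P_CORES = 64              # hypothetical P-core budget for the socket area
    E_CORES_PER_P_AREA = 4    # roughly four E-cores fit in one P-core's area
    THREADS_PER_P_CORE = 2    # HyperThreading on the P-cores
    THREADS_PER_E_CORE = 1    # no simultaneous multithreading on the E-cores

    e_cores = P_CORES * E_CORES_PER_P_AREA
    print(f"P-core socket: {P_CORES} cores, {P_CORES * THREADS_PER_P_CORE} threads")
    print(f"E-core socket: {e_cores} cores, {e_cores * THREADS_PER_E_CORE} threads")
    # 64 P-cores yield 128 threads; 256 E-cores yield 256 threads in the same area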
And don’t expect more than two types of cores, either. “Today, I believe that two provides the right amount of coverage,” Singhal explains. “My concern is this: If you had a third core between the P and the E, the thing in the middle always tends to get squeezed by the thing at the top coming down and the thing at the bottom coming up. So to have three – I’m not saying it can’t happen, or it won’t happen. But today, we are focused on two cores.”
And finally, this does not spell the end of the Xeon D processors, famously used by Facebook and a few other hyperscalers as well as some switch and storage array makers. That line will continue to evolve, even though it is not on this roadmap.
“The heat is clearly on with AMD fielding excellent processors…”
Intel showed a slide with SPR-HBM performing the OpenFOAM CFD benchmark at 2X the Milan-X performance. Does AMD have Zen 4 processors on the roadmap with in-package HBM? If not, how will AMD compete with what Raja described as a 4X memory bandwidth advantage vs Ice Lake Server?
Perhaps many solvers and datasets don’t fit in the relatively tiny 64 GB.
Yep, 64 GB is a joke for many workloads.
It’s telling that I’ve only heard of HBM actually deployed on server processors in the context of HPC, and even then it’s pretty rare. I’m not aware of A64FX selling in any sizable clusters beyond Fugaku. More bandwidth is great, but 64 GB of fast RAM on a processor with 64+ cores means less than 1 GB per core, which wasn’t that much even 20 years ago.
Could not agree more. Get it into the range of 256 GB or 512 GB and maybe I will be impressed. And by then, they will have 128 or 192 cores against it. I want eight HBM controllers, 16-high stacks, eight stacks per chip, and fat, fat, fat memory chips. With 16 GB chips, that is 2 TB of memory and a stupid amount of bandwidth. Can you imagine having 16 GB per core and 6.4 TB/sec of aggregate bandwidth? Against 128 cores, that would be 16 GB per core and 50 GB/sec per core.
What does an Nvidia A100 get? It has six stacks of memory with 96 GB, but only 80 GB and five controllers are activated, against a total of 128 SMs, of which only 108 are activated. That’s 758 MB per SM and only 19 GB/sec per SM.
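Here is a quick scratchpad to check that math, with the fat HBM Xeon being purely hypothetical and the A100 numbers taken from Nvidia’s published 80 GB SXM specs:

    # Per-core and per-SM memory math. The eight-stack HBM Xeon is hypothetical;
    # the A100 figures are Nvidia's published 80 GB SXM specs.
    def per_unit(capacity_gb, bandwidth_gb_sec, units):
        return capacity_gb / units, bandwidth_gb_sec / units

    # Hypothetical Xeon: 8 stacks x 16-high x 16 GB chips, 6.4 TB/sec, 128 cores
    cap, bw = per_unit(8 * 16 * 16, 6400, 128)
    print(f"Fat HBM Xeon: {cap:.0f} GB per core, {bw:.0f} GB/sec per core")  # 16, 50

    # Nvidia A100: five of six 16 GB HBM2e stacks active, 2,039 GB/sec, 108 SMs
    cap, bw = per_unit(80, 2039, 108)
    print(f"A100: {int(cap * 1024)} MB per SM, {bw:.0f} GB/sec per SM")  # 758, 19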
…and the Nvidia GPU is all userspace (in the HPC context), and codes that are too big fall back to the CPU memory.
Knights Landing had the option of both HBM and DDR DIMMs, which sounds like the best of both worlds. I’m sure there was a die-area penalty paid for this, of course. Also, it meant you had to constantly figure out if you wanted the HBM to be a cache for the DDR, or if you wanted them as separately managed address ranges (with a reboot if you switched). No good technology goes unpunished.
Typo: Sierra Forrest processors will has the same memory channels
Should be will have the same memory channels
Correct! I changed my mind halfway through the sentence.
I think this is more to pacify investors and the stock market than anything. I’d be extremely surprised if Intel 3 (5 nm) is really on track for 2024 like they say it is – they took so long to even get to 10 nm that it seems unlikely. EPYC, on the other hand, will probably be fully transitioned to ‘proper’ 3 nm by that point, so still well ahead.
Intel’s parts are still barely competitive with AMD’s, and Intel’s security missteps in the past couple of years (followed by the performance losses from the necessary patches) badly damaged its reputation for quality. Nevertheless, while AMD sources all of its processors from an area with significant geopolitical risks, Intel seems like the best bet for any large scale and/or long term design work. Once TSMC has its planned new fabs operational in the US and Europe, it will be a more level playing field for AMD and Intel. Currently, China could get huffy and AMD would be out of the CPU business 15 minutes later – then what happens to AMD’s customers, OEM and end user?
“Amr” -> “Arm”, and “Intle” -> “Intel”.
I kinda like Amr. And Intle. One is a dating site for chip nerds in the UK and the other is a unit of integer performance.
Intel always has a plan. The problem of the past 5-10 years has been executing the plan…
Did they mention anything about the missing CXL.mem support in the roadmap?