Sometimes, especially with Intel in the past several years, the fact that the CPU roadmap doesn’t change is the news. And in this case, during Intel’s Innovation 2023 event hosted this week in its stomping grounds of San Jose, the Xeon SP news is good and the messaging around its converged GPU-AI accelerator roadmap is being firmed up.
It was Intel chief executive officer Pat Gelsinger, one of the second generation of leaders at the chip designer and manufacturer who learned his trade directly from the company’s co-founders – Gordon Moore, Robert Noyce, and especially Andy Grove – who walked everyone through this new “siliconomy” that, as he put it, drives a $574 billion industry that in turn powers a global tech economy worth almost $8 trillion. Those numbers look like the personal and corporate compute, storage, networking, and datacenter spending that in turn drives all of the software and IT services suppliers as well as the telcos and service providers and the hyperscaler advertising and search businesses and maybe things like online retail. Gelsinger was not precise about what is included in this siliconomy, but he wanted to make sure developers know that they “rule” in this new era.
It seems to us that developers define the rope and then help pull it as well as maintain it, with a relative handful of developers who started software firms that have become behemoths, or who started hyperscalers and massive clouds, sitting up on high directing massive shifts in the economy at their whim and pleasure. It is pretty clear who is ruling and who is not. It is not the mass of several tens of millions of developers in the world, although the way they have transformed the world collectively – and utterly – is absolutely undeniable.
With that, let’s talk about datacenter compute engines from Intel, starting with the Xeon SP lineup:
The big news is that the “Sierra Forest” variant of the Xeon SP based on Intel’s “energy efficient” E-core will have a doubled-up configuration with a whopping 288 cores in the socket.
Sierra Forest will plug into the same “Birch Stream” server platform as the future “Granite Rapids” Xeon SP with “Redwood Cove” P-cores due in 2024, as will the follow-on E-core chip called “Clearwater Forest” due in 2025, which is based on an as-yet un-codenamed core. (The “Mountain Stream” alternative platform for these three server CPUs was not mentioned in Gelsinger’s presentation slide shown above.)
Here is what Sierra Forest looks like as Gelsinger held it up during his Innovation 2023 keynote:
We are still working on our deep dive on the E-core and P-core architectures, but the E-cores, as we reported back in March of this year, do not include AVX-512 vector units, AMX matrix units, or two threads per core via Intel’s HyperThreading implementation of simultaneous multithreading.
The Sierra Forest chips are etched with a 5 nanometer extreme ultraviolet (EUV) process called Intel 3, the name of which is meant to invoke comparisons to the 3 nanometer N3 process from Intel foundry rival Taiwan Semiconductor Manufacturing Co. Argue amongst yourselves about whether Intel 3 is 5 nanometer or 3 nanometer – we only know what the old roadmaps used to say about process geometries.
Up until now, Intel has said that the Sierra Forest chip would have 144 cores and six DDR5 memory channels, but Intel’s engineers have figured out a way to cram two of these Sierra Forest chiplets into a single Birch Stream socket, yielding a beast of a socket with 288 cores and a dozen DDR5 memory channels.
The P-cored Granite Rapids Xeon SPs are, like the Sierra Forest chips, expected sometime in 2024, with Sierra Forest being promised for 1H 2024 and Granite Rapids coming shortly after that. Granite Rapids is also etched using the Intel 3 process. Intel has been vague on any feeds and speeds for Granite Rapids thus far, and has only said that it will be a fast follower behind Sierra Forest. Which probably means May or June 2024, but that is just a guess. And Intel is not talking too much about Granite Rapids P-core chips right now because it doesn’t want to step on the follow-on to the current “Sapphire Rapids” Xeon SP v4 processors, which were only launched in January and which we now know will be replaced by the “Emerald Rapids” Xeon SP v5 kickers slated to launch on December 14.
That means the hyperscalers and cloud builders have had these chips for months already. . . . Emerald Rapids is etched using a refined Intel 7 (akin to a super-refined 10 nanometer process in some ways, but meant to invoke 7 nanometer comparisons) process that was used on the current Sapphire Rapids Xeon SP v4 chips.
Ronak Singhal, Intel Fellow and chief architect for the Xeon SP line, tells The Next Platform that the Emerald Rapids chips will have a modest core count increase (we think from 60 cores to 64 cores) plus higher DDR5 memory speeds and faster UltraPath Interconnect (UPI) links between processor sockets in multi-socketed systems. There are some microarchitecture changes in the “Raptor Cove” cores used in Emerald Rapids CPUs compared to the “Golden Cove” cores used in the Sapphire Rapids chips, but we don’t know what they are as yet. Gelsinger did say that Emerald Rapids would provide up to 40 percent more performance on key workloads like AI than Sapphire Rapids in the same thermal envelope.
“I remember when we produced the first four core products,” Gelsinger said in the keynote, and we do too, with the “Nehalem” Xeon 5500s in March 2009, when the Great Recession was hitting the global economy pretty hard. “288 cores – wow, I must be getting old or something – this is really incredible gains. 2024 is shaping up to be a really, really good year for the CPU and our Xeon customers.”
As for performance, Gelsinger said the Sierra Forest E-core processors are slated to offer 2.5X higher compute density in a rack and 2.4X better performance per watt over comparable Sapphire Rapids Xeon SP v4 chips, and the Granite Rapids Xeon SP v6 P-core processors are expected to offer 2X to 3X better AI performance than the Sapphire Rapids chips. Looks like there is an AMX matrix math boost coming.
Intel has not said jack about the “Diamond Rapids” Xeon SP v7 kicker to Granite Rapids that we expect in 2025. But if the Clearwater Forest Xeon SP v6 is coming in 2025 using Intel’s future Intel 18A process, we see no reason why Diamond Rapids should not also shift from Intel 20A down to Intel 18A (just like Granite Rapids moved from Intel 4 down to Intel 3) and get some of the benefits of a process shrink.
That brings us to the convergence of the Max GPU line and the Habana Gaudi matrix math accelerator line, which we have talked about before. There is not a lot to say here, but it confirms what we have been saying.
Intel is selling the 7 nanometer Gaudi 2 matrix engines now, which, as Gelsinger reminded us yet again, can compete on price and performance for a lot of AI workloads, and it still has the Gaudi 3 engine coming, a 5 nanometer shrink with architectural improvements that has already taped out. But after that, Intel will converge its GPU and NNP lines, and it looks like it will use the matrix math engines and software from Habana, as well as its integrated Ethernet networking, alongside its Xe GPU vector engines to create the future “Falcon Shores” compute engine. This is not the hybrid CPU-GPU compute engine that Intel had originally promised with Falcon Shores, but it is the one it can deliver and the one onto which it can converge its current (however few) GPU and NNP customers. But make no mistake – Falcon Shores is a GPU, not an NNP.
Intel is excited that its Developer Cloud is up and running with the Gaudi 2 devices as well as Sapphire Rapids CPUs, the HBM variant Max Series CPUs, and the “Ponte Vecchio” Max Series GPUs. And it was pleased to announce that it was building a hybrid CPU-NNP cluster with 4,000 Gaudi 2 devices doing the math and was working with startup Stability.ai – the ones who created the Stable Diffusion generative AI for images – as the anchor customer for what Gelsinger called a “top 15” AI supercomputer.
It doesn’t look like Stability.ai is buying it, but more like buying time on it, and it is not clear if this will be part of the Intel Developer Cloud or run by a third party service provider. At the moment, Stability.ai is using the Ezra-1 UltraCluster on Amazon Web Services to run its Stable Diffusion platform, which it bills as “the world’s fifth-largest supercomputer.”
We have said it before, and we will say it again. If you can etch a matrix math engine with a decent price and decent thermals, and it can run TensorFlow or PyTorch, then you can sell it – or better still, rent it.
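That argument boils down to software portability: if a framework like PyTorch can target the part, the customer’s model code barely has to change. Here is a minimal sketch of what that looks like in practice, assuming a recent PyTorch build; the backend names probed here (`cuda` for Nvidia, `xpu` for Intel GPUs, `hpu` for Habana Gaudi via its framework plugin) are illustrative, and everything falls back to plain CPU if no accelerator is present:

```python
import torch

def pick_device() -> torch.device:
    # Probe whichever accelerator backends this PyTorch build knows about,
    # guarding with hasattr() since "xpu" and "hpu" only exist when the
    # vendor's support is compiled in or plugged in.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    if hasattr(torch, "hpu"):  # Habana's plugin registers an "hpu" backend
        return torch.device("hpu")
    return torch.device("cpu")

device = pick_device()

# The matrix math itself is identical regardless of which engine was found.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
```

The point is that the engine underneath is interchangeable from the model’s perspective, which is exactly why a matrix math part with decent price and thermals is sellable, and rentable.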