Intel Rounds Out “Granite Rapids” Xeon 6 With A Slew Of Chips

It is no secret that chip maker Intel is having a tough time these days on a number of fronts, but it is important to remember that nearly two out of every three processors sold into the datacenter are Intel Inside. This is a good business that can be moderately profitable, and Intel can – and often does – compete with rival AMD on the X86 server CPU front, and also can offer benefits not embedded in the various homegrown Arm server CPUs that the hyperscalers and cloud builders have created and have fabbed at Taiwan Semiconductor Manufacturing Co.

Intel launched the “Sierra Forest” Xeon 6 processors back in June 2024, which are based on its E-core “efficient core” designs derived from Atom-style cores. These were the first of the Xeon 6 generation, and they represent the merging of the E-cores and the traditional Xeon cores (now called P-cores, short for performance cores) onto a single set of sockets with the same external feeds and speeds, based on the same I/O chiplets within the sockets to provide those feeds and speeds. The first of the “Granite Rapids” Xeon 6 processors based on the P-core compute chiplets were announced in September 2024; these were the high-end chips aimed mostly at the hyperscalers and cloud builders who want to cram as many cores as possible into a socket to make as few servers as possible for their massive fleets.

The differences between the E-core and P-core server processors happen inside of the chiplets that have the compute cores on them, and this is a sensible division of chippery even if you might argue some of the merits of maintaining the forking of compute between what are still essentially Atom and Xeon cores.

The market will decide what it wants. That’s what markets do.

The rest of the Xeon 6 family gets added today, and Ronak Singhal, an Intel fellow and long-time chief architect of the Xeon line and now its product manager as well, gave us a briefing on the remaining Xeon 6 processors ahead of the launch. The Granite Rapids SP variants, which bear the nomenclature Xeon 6500P and 6700P, are really the core of the Xeon 6 line, and they are aimed at enterprise customers, who still prefer Xeon chips by a wider margin than the overall market, where hyperscalers and cloud builders show a higher preference for AMD Epyc server CPUs when it comes to X86 architecture processors.

“This is really focused on the broad enterprise and all of the different use cases there, with a specific focus on AI and security and how do we provide our customers with something that improves their investment in their infrastructure, allowing them to consolidate old infrastructure into this new infrastructure that has better capabilities for new workloads and can reduce their power footprint,” Singhal explained, referring to the Xeon 6500P and 6700P processors. “Or as they are looking to launch new capabilities or new services with their infrastructure, why should they choose a Xeon.”

Before we dive into the Xeon 6300P, Xeon 6500P, and Xeon 6700P, let’s do some short housekeeping.

First, there is not going to be a big launch for the Sierra Forest Xeon 6900E based on the “Crestmont” E-cores. Intel revealed it was working on a Sierra Forest chip with up to 288 cores back in September 2023, and Singhal confirmed that the Xeon 6900E is ramping now.

“The 288 core is now in production,” Singhal said. “We actually have this deployed now with a large cloud customer, and when they are ready to talk about what they are doing there, I think it will be pretty interesting. We are really working on that 288 core chip closely with each of our customers to customize what we are building there for their needs. So you are not going to see us talk about it from a broad deployment scenario. It’s really built for those custom cloud scenarios first and foremost.”

Intel also, as promised, rolled out a system on chip variant of the Xeon 6 P-core platform aimed at telcos and other service providers for network and edge use cases. We are not going to spend a lot of time on this right now, being focused on the datacenter as we are.

And finally, a reminder that Intel cut prices on the existing Granite Rapids Xeon 6900P chips with up to 128 cores a few weeks ago. Here is the updated price and price/performance tables for the Xeon 6900P, which is handy when looking at the rest of the Granite Rapids lineup today:

With that, let’s dive into the details on the rest of the Granite Rapids lineup.

The Middle Of The Road

As has been the case for many, many generations of Xeon server processors, Intel doesn’t just etch one big chip and then flesh out the product line based on core and I/O yields on that design. The company designs multiple chips of different sizes because yields are generally far better on smaller chips. And even as Intel has moved into the chiplet era, each chiplet has its own yield curve (smaller is generally better), while using fewer chiplets per package improves packaging yields. In the Granite Rapids chiplet and socket designs, you see this interplay used to maximize yield as well as depth and breadth in the Xeon 6 product line without sacrificing profitability.

The four different Xeon 6 die packages are familiar in name: Ultra Core Count (UCC), Extreme Core Count (XCC), High Core Count (HCC), and Low Core Count (LCC). No matter which package is used, all of the Granite Rapids compute complexes are etched in the Intel 3 process (roughly akin to a 3 nanometer process from TSMC), and each package has one, two, or three of the compute complexes, of which there are three designs. The small core complex has 16 cores, the middle one has 48 cores, and interestingly, the third one, which is used in Granite Rapids sockets with more than one core complex, has 44 cores, leaving room for interconnects to link the core complexes and their caches to each other to create a virtual monolithic chip.

All of the Granite Rapids chips have a pair of I/O chiplets, which have DDR5 memory controllers, PCI-Express controllers, and various accelerators that were included with prior Xeon 4 and Xeon 5 CPUs and that get updated from time to time with new hashing or encryption algorithms or juiced in some other way. Those accelerators are outlined at the bottom of this salient characteristics table for the Xeon 6 lineup:

The Xeon 6500P and 6700P processors and their platforms announced today scale up to 86 cores, and they support both AVX-512 vector units and AMX tensor units, the former of which is important for both HPC and AI and the latter of which could prove a true differentiator for AI workloads and maybe HPC routines in the future.

We find it a bit perplexing that the Xeon 6700P processors that are used in machines with either four or eight sockets, linked by on-chip NUMA clustering technology (so-called glueless NUMA), only have four UltraPath Interconnect (UPI) links on each processor. Granted, these UPI links run at 24 GT/sec, which is amazingly fast. But the Xeon 6900P, which scales to only two processors in a single NUMA image and therefore does not need a lot of interconnect compared to a NUMA cluster with four or eight processors, has six UPI links running at 24 GT/sec. Ditto for the Sierra Forest Xeon 6700E and 6900E variants, which also have more UPI links but which only scale to a maximum of two sockets.

We would have thought that OEMs making big NUMA machines to run back-end relational or in-memory databases and their applications would have wanted to use the Xeon 6900P to couple the sockets together more tightly. More links are better because they reduce the hops across the NUMA memory. Six UPI links allow any one processor to directly link to six others, with a second hop required only to reach the eighth CPU in an eight-CPU machine. You can do an eight-way with four links, as Intel has done, by overlaying two four-ways and using the fourth UPI link to cross connect the two four-ways, as is shown in the image above. But with six links, you could do a glueless machine with 16 sockets in a single NUMA image, too, which would help Intel’s OEM customers better compete with IBM Power Systems iron.
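The twinned four-way topology described above can be sanity checked with a few lines of graph code. This is a sketch under our own assumption about the cross-connect wiring (each socket in one quad linking to its counterpart in the other quad, which matches the overlay description); the real board routing may differ.

```python
from collections import deque

def diameter(adj):
    """Worst-case shortest-path hop count over all socket pairs, via BFS."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

# Two fully meshed four-ways: three UPI links per socket inside each quad...
adj = {s: set() for s in range(8)}
for quad in (range(0, 4), range(4, 8)):
    for a in quad:
        for b in quad:
            if a != b:
                adj[a].add(b)

# ...with the fourth UPI link cross-connecting counterpart sockets
# (assumed wiring: 0-4, 1-5, 2-6, 3-7).
for s in range(4):
    adj[s].add(s + 4)
    adj[s + 4].add(s)

assert all(len(links) == 4 for links in adj.values())  # four UPI links each
print(diameter(adj))  # prints 2: any socket reaches any other in two hops
```

So even with only four links per socket, no memory access crosses more than two hops, which is presumably why Intel judged the topology acceptable for the eight-way case.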

Why anyone would want to link eight of the eight-core Scalable SKUs together in a node is beyond us, but such a machine could have 32 TB of main memory, 4.8 TB/sec of aggregate memory bandwidth, and 64 cores running at a 4 GHz base and 4.3 GHz Turbo Boost speed. That is a memory capacity and memory bandwidth monster of a CPU cluster, sort of the opposite of a single socket that has 128 or 288 cores. Maybe a memory muscle server is exactly what someone needs?
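The back-of-the-envelope math on that hypothetical eight-way, eight-core box works out like this, using only the aggregate figures above (decimal units assumed):

```python
# Per-socket and per-core arithmetic for an eight-way machine built from
# eight-core SKUs, from the aggregate figures cited above.
sockets = 8
cores_per_socket = 8
total_mem_tb = 32        # total main memory across the NUMA image
agg_bw_tb_s = 4.8        # aggregate memory bandwidth

mem_per_socket = total_mem_tb / sockets                           # TB
bw_per_socket = agg_bw_tb_s / sockets                             # TB/sec
bw_per_core = agg_bw_tb_s * 1000 / (sockets * cores_per_socket)   # GB/sec

print(mem_per_socket, bw_per_socket, bw_per_core)
# prints 4.0 0.6 75.0
```

That is 4 TB and 600 GB/sec per socket, and a whopping 75 GB/sec of memory bandwidth for every core, which is exactly the inverse of the many-cores-starved-for-bandwidth profile of a 128-core or 288-core single socket.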

Perhaps those hyperscalers and cloud builders acquiring Granite Rapids 6900P and 6700E and 6900E processors are doing something interesting with those extra two UPI links. They ain’t there by accident. . . . That much we know.

In addition to versions of the Granite Rapids chips that scale up to four or eight sockets, there are ones that are tuned to only run in a single-socket, datacenter-class use case that is distinct from and beefier than the Xeon 6 SoC that is aimed at the telcos and service providers.

The single socket variants of the Granite Rapids 6500P and 6700P chips are interesting, and they are a testament to the success that AMD has had peddling single-socket devices as HPC and AI head nodes and as more generic server sleds at the hyperscalers and cloud builders. (AMD does not have four-socket or higher NUMA configurations and has remained at a ceiling of two sockets. But that could change if AMD wants to get a piece of the SAP HANA and other big database action.)

The Xeon 65X1P and 67X1P – the X is a variable, and the 1 at the end means single socket – Granite Rapids chips for single socket servers span from 16 cores to 80 cores, which is not a huge number of cores, but it is sufficient to do certain kinds of computational work – think controllers for software-defined storage as a good example – and have lots of I/O.

“I think we’re seeing a lot of really strong interest in this platform today,” Singhal said, referring to the single socket design Intel has come up with. “We have seen some cases, in fact, where we are winning back designs with this platform from our competition already, and as we scale this out over time in the market, I expect to see more of that.”

Intel also has to cover SMB and edge compute cases where a real Xeon processor with strong P-cores is important, and the Xeon 6300P chips do that. These chips may find their way into the datacenter, but it will be inside a Trojan horse of sorts – say, perhaps, in a switch or another device. The cost of a unit of compute in the Xeon 6300P line is pretty low – about half that of the Performance SKUs and the 1 Socket SKUs, as Intel calls them, and about a quarter to half the price of the versions of the Xeon 6500P and 6700P chips used in four-way and eight-way machines.

The Xeon 6300P is limited to 128 GB of main memory, and that memory only runs at 4.8 GT/sec, so these chips are not for heavy memory workloads at all.

Where The Compiler Hits The Core

We will do a more thorough performance analysis later, but for now, here is the relative performance of the 64-core Xeon 6767P and 86-core Xeon 6787P in the Granite Rapids line versus the 64-core Xeon 5 8592+ in the prior “Emerald Rapids” line.

On like-for-like core counts, the Granite Rapids chip offers a performance bump of between 14 percent and 41 percent across a wide range of workloads that stress compute, memory bandwidth, and I/O differently. The incremental 22 cores in the 86-core top bin of the Granite Rapids 6700P line give a performance gain over the 64-core Emerald Rapids chip of between 30 percent and 54 percent.

By our eyes, it averages to around 25 percent for the same 64 core counts and 40 percent for the increase to 86 cores. But of course, performance always comes down to details, and while datacenters run a lot of workloads, they don’t average them. Each workload gets what it gets based on how well the system is configured for it. The best CPU in the world doesn’t mean much if it doesn’t have enough memory or I/O.
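For what it is worth, you can split that rough 40 percent average uplift on the top bin into a core-count component and a per-core component. This is an idealized sketch that assumes perfectly linear core scaling, which real workloads, clocks, and memory bandwidth will muddy:

```python
# Decompose the 86-core part's average uplift over the 64-core Emerald Rapids
# chip into core count versus per-core throughput (assumes linear scaling).
cores_new, cores_old = 86, 64
total_gain = 1.40                      # ~40 percent average uplift cited above
core_ratio = cores_new / cores_old     # how much of it is just more cores
per_core_gain = total_gain / core_ratio

print(f"{core_ratio:.2f}x cores, {per_core_gain:.2f}x per-core throughput")
# prints 1.34x cores, 1.04x per-core throughput
```

Under that idealization, most of the top bin’s gain comes from the added cores, and the per-core throughput is only a few percent better than Emerald Rapids – well short of the roughly 25 percent seen at iso-core counts, presumably because the 86-core part runs at lower clocks and shares the same memory bandwidth across more cores.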

Pushing absolute performance at 100 percent CPU utilization is generally not the goal of CPU designers, even among those who are making their own Arm server chips. They are trying to get the right balance of performance, thermals, and price for the typical workload. For many hyperscalers and cloud builders, 40 percent of peak CPU is the typical load, and so Intel has optimized the Granite Rapids P-core designs to be more efficient than their Emerald Rapids predecessors. Like this:

So, that gives you the high level shape of this announcement. Without further ado, here is the SKU table for the additional Granite Rapids chips announced today:

We will be drilling down into performance across various categories of workloads, with historical comparisons against prior Xeons as well as competitive analysis between the Granite Rapids Xeon 6s and AMD’s “Turin” Epyc 9005 line and their Zen 5 and Zen 5c cores. Stay tuned.


1 Comment

  1. A very well rounded lineup for the Xeon 6 IMHO! As noted though, the use of 4 UPI links in 8S chips is a bit puzzling when the 2S CPUs can have 6 of them … With 4 links, the best fully interconnected system would have 5 nodes, like a 4-dimensional tetrahedron (simplex), or 5-cell, or like the 5 fingers in one’s hand, hmmmm …

    Seeing how, with two hands, and 8 fingers (discounting the pinkies for convenience), and a loop of string, one can scientifically generate so many interesting NUMA topologies, from the cat’s cradle, to the grandfather clock, the fish in a dish, and two diamonds, then I think it is safe to conclude that 4 UPI links, like 640 KB, should indeed be right enough for everybody! d^8

    (key literature reference: https://en.wikipedia.org/wiki/List_of_string_figures )
