Intel Pushes Out Hybrid CPU-GPU Compute Beyond 2025

Timothy Prickett Morgan

1 year ago

One of the reasons why Intel can even think about entering the GPU compute space is that the IT market, and indeed just about any market we can think of, likes to have at least three competitors. With capital intensive businesses, there is an inevitable consolidation, and sometimes only two companies can be supported.

Nvidia claims to have invented the modern Graphics Processing Unit, which is silly since dedicated graphics chips were in arcade games back in the 1970s and were in PCs in the 1980s, and culminating in the Texas Instruments TMS34010 graphics chip in 1986 and IBM 8514 graphics card in 1987 for IBM’s own PS/2 systems and, interestingly enough, for clone machines. 2D graphics acceleration started with ATI Technologies in 1987, which spawned competitor S3 in 1991; 3D graphics acceleration – and the concentration of a tremendous amount of compute – started with S3 and ATI as well in 1995, but Nvidia was founded two years earlier to take them on because their cards lacked good performance and low cost according to company founders – Jensen Huang, Chris Malachowsky, and Curtis Priem. Nvidia absolutely embraced the idea of general purpose GPU compute as academics were hacking GPUs to offload parallel calculations from CPUs to the GPU shader engines, and it has unquestionably been the driving force in GPU compute in the datacenter.

So much so that AMD/ATI and Intel are chasing Nvidia to get some of that big, fat, juicy datacenter compute budget for AI and HPC workloads. AMD has already carved out a piece of the exascale HPC opportunity thanks to its aggressive price/performance with the combination of its “Aldebaran” Instinct MI250X GPU accelerators and “Trento” and “Genoa” Epyc CPUs and seeks to extend its advantages with the uncodenamed Instinct MI300 series, which has a pair of CPUs and six GPUs on a single package and which will ship later this year. Nvidia is going to respond in the second half of 2023 with its “Grace” Arm server chip paired with its “Hopper” GH100 GPU accelerator in a single package.

Intel’s first real GPU compute engine for the datacenter – we are not counting failed “Larrabee” X86 GPUs or their spinoff “Knights” family many-core CPUs – is, of course the “Ponte Vecchio” Max Series GPU, which is not yet shipping in volume and which is at the heart of the 2 exaflops “Aurora” supercomputer going into Argonne National Laboratory.

“Deployment is going well, with Intel collaborating closely on testing and development,” Jeff McVeigh, interim general manager of Intel’s Accelerated Computing Systems and Graphics business and general manager of the Super Compute Group, explained in a blog post last week. “Argonne expects the system to be accessible to early researchers by the third quarter of 2023.”

It is hard to say if that means another delay for the Ponte Vecchio GPU accelerator, which has pushed out the delivery of the Aurora system a bunch of times. (And the delays with Sapphire Rapids did not help, either.) This machine will have over 10,000 nodes and over 20,000 “Sapphire Rapids” Xeon SPs with HBM2e stacked DRAM memory (called the Max Series CPU) and over 60,000 Ponte Vecchio GPUs, all lashed together with Hewlett Packard Enterprise’s Slingshot 11 interconnect. Last November, McVeigh said that Ponte Vecchio will be delivered in the Aurora system first and then will be available for other HPC and AI system designs in early Q2 2023. That would be in about a month.

Intel was planning to launch a follow-on to called “Rialto Bridge” and then also get going with its own hybrid GPU-CPU compute package called “Falcon Shores” and presumably also in the Max Series like the Sapphire Rapids HBM CPU and the two discrete GPUs, Ponte Vecchio and Rialto Bridge.

With Intel shaking the can at the federal governments in the United States and Europe so it can get back to being a world-class foundry again and trying to cut costs anywhere that it can without affecting its market positions too much, Intel has decided to kill off the Rialto Bridge GPU, throwing it onto the same scrap heap as the “Tofino” switch ASICs it got from its June 2019 acquisition of Barefoot Networks.

The Rialto Bridge discrete GPU was due to come out this year, boosting the X^e graphics core count to 160, up 25 percent from the 128 X^e cores in the 52 teraflops Ponte Vecchio device. It was to be paired with the “Emerald Rapids” Xeon SP with HBM memory.

Given the difficulty that Intel has had getting the Ponte Vecchio package out the door, which has 47 chiplets using multiple chip manufacturing processes and interconnect methods, it is understandable that customers are probably a little skittish about committing to Rialto Bridge before they see it. And with both AMD and Nvidia both focusing on hybrid CPU-GPU packages and Intel needing to cut costs, you can understand why Intel is shifting its focus to the Falcon Shores effort, which we have called “Aurora in a socket” to drive the message home. (It is a wonder why the Emerald Rapids HBM chip in the Max Series CPU line has made the cut when Rialto Bridge did not.)

Rather than play catch up with an annual cadence of new products, McVeigh said that Intel would be moving to a two year cadence for datacenter GPUs, in both the Max Series for HPC and AI compute and the Flex Series that are aimed at video processing and AI inference.

“This matches customer expectations on new product introductions and allows time to develop their ecosystems,” said McVeigh in the post. “Targeted for introduction in 2025, Falcon Shores’ flexible chiplet-based architecture will address the exponential growth of computing needs for HPC and AI. We are working on variants for this architecture supporting AI, HPC and the convergence of these markets. This foundational architecture will have the flexibility to integrate new IP (including CPU cores and other chiplets) from Intel and customers over time, manufactured using our IDM 2.0 model. Rialto Bridge, which was intended to provide incremental improvements over our current architecture, will be discontinued.”

Intel never committed to a date for Falcon Shores publicly – the X axis in the roadmap never said 2024, but everyone expected it then – but clearly the hybrid CPU-GPU package from Intel is is supposed to compete against AMD’s Instinct MI300 and Nvidia’s Grace-Hopper combo, and now Falcon Shores is probably about two years away and probably on the order of 18 months behind them.

McVeigh added that the successor to the current “Arctic Sound” Flex Series GPUs, code named “Lancaster Sound” and expected this year, would be canceled and that development of the next one, code-named “Melville Sound,” will have a “significant architectural leap” and the canceling of the Lancaster Sound allows Intel “to accelerate development” on Melville Sound.

To cover the loss of Rialto Bridge, we fully expect for there to be a variant of the Falcon Shores GPU that has only X^e GPU cores in it, as Intel showed last June when talking about Falcon Shores:

Perhaps for there will indeed be five different Falcon Shores configurations, two more than the three shown above – one that is half and half CPU and GPU by chiplet area, one that is 25 percent CPU and 75 percent GPU, and one that is 25 percent GPU and 75 percent CPU would be logical, plus ones that are all CPU and ones that are all GPU. So, to one way of thinking about it, Intel is just doing another process shrink on Rialto Bridge, perhaps boosting GPU cores by another 25 percent, then offering different mixes of CPU and GPU on the package, and pushing the whole thing out by a year or more.

The word on the street from our pals at ServeTheHome is that the first Falcon Shores device in 2025 will only have GPU cores, so hybrid CPU-GPU variants are therefore slated for 2026. And frankly, an all-CPU variant could come out with HBM3 memory way ahead of that in the “Granite Rapids” Xeon SPs in 2024 or the “Diamond Rapids” Xeon SPs in 2025. An all GPU variant of a Shores package is effectively in the Bridge family and an all CPU variant in the Shores family is effectively in the Rapids family. So these distinctions are largely semantics.

But if we read all of this right, the important thing is that Intel is not going to deliver a hybrid CPU-GPU Shores device until 2026. And that is far too late for Intel’s Shores devices to compete effectively against AMD second generation Instinct MI400 hybrids and perhaps an Nvidia “Ada” CPU and “Lovelace” GPU hybrid at the same time. (The latest Nvidia GPU should not have been called Ada Lovelace because it breaks the symmetry of the Grace CPU and Hopper GPU.)

Given the trouble that Intel has had with Ponte Vecchio and Sapphire Rapids HBM, skipping Rialto Bridge and maybe Emerald Rapids HBM – should the latter come to pass, too – takes away a means for Intel to build credibility in the HPC and AI training communities. Even if it does time to the next wave of 5 exascale or so HPC systems that will probably be coming out around 2025 or so. Intel needs to build credibility, and it knows that, but it also needs to cut costs and not miss future deadlines. If there is a time to cut things out of the Intel roadmap, it is now.

And from here on out, Intel has to hit every single deadline, on spec and on time.