Intel Is Counting On AI Inference To Save The Xeon CPU

Timothy Prickett Morgan

7 months ago

There is little question that generative AI as well as other kinds of machine learning are going to augment applications in every industry and in every part of the application stack in the coming years. It is also pretty obvious that AI training for the most advanced models, with trillions of parameters in their neural networks and trillions of tokens of data, is very costly and that if generative AI is to be deployed in production, a way has to be found to do AI inference at a much lower cost.

With the largest GenAI models, it takes tens of thousands of GPUs to train those models over the course of somewhere around three or four months. And it takes a node or two with eight GPUs each to do the inferencing that creates the generative responses that will be embedded in the applications mentioned above. If GPUs do not get a lot smaller and cheaper, then CPU makers will be able to add a lot more matrix math capability to their devices to keep inference work from shifting even more than it has to GPUs or other kinds of matrix math accelerators. If the CPUs don’t get enough matrix oomph and take the business back from GPUs, there is a chance that GenAI will not be cheap enough to be widely deployed – at least not at what GPUs cost these days.

It is an interesting conundrum. And it is not clear how this might all play out. Intel has a relatively weak position in matrix math accelerators with its “Ponte Vecchio” Max Series GPUs – they are too hot and too expensive to make – and even with its well-regarded Gaudi 2 and Gaudi 3 neural network processor (NNP) chips, it is not at all clear how customers will adopt them for GenAI inference. The Gaudi line will be displaced by the future “Falcon Shores” converged GPU-NNP sometime in 2025, so it is a bit of a dead end and there is no reason to believe that Intel can build a better and cheaper GPU than either Nvidia or AMD can in 2025. Also, there is no indication that Intel is going to be wildly expanding the low-precision math capabilities of the AMX units in the future Xeon SP cores, either.

We continue to believe that companies want to run AI inference on their CPUs whenever possible, but the trouble is this might not be possible with the most advanced GenAI models, which need a lot of compute to achieve acceptably low latencies on responses to prompts.

It is against this AI backdrop, as well as increasing competition from AMD on the X86 server CPU front and the dominance of Nvidia in datacenter GPUs and the rise of AMD in datacenter GPUs, that we have to consider Intel’s current datacenter compute business. Yes, it did better than expected in the third quarter ended in September. Which is great for those of us who want intense competition to drive down prices for datacenter compute of all kinds. But this battle for the datacenter is far from over, and in fact may be a decades-long war that no vendor can ever win. The fact that Intel could control datacenter compute for so long is perhaps an anomaly that can never be repeated, even if it looks like Nvidia is setting the pace for the datacenter. There are a lot of workloads that have nothing to do with AI. But the question is how long will this remain true? Over the next four or five years, AI training and inference together could drive around half of server revenues by our estimates. We do not doubt this, but it is less clear where these AI training and inference workloads will run.

Pat Gelsinger, Intel’s chief executive officer and general manager of its datacenter business as well as its first chief technology officer in glory days gone by, talked a bit about this situation in going over the company’s financial results for the third quarter.

“While the industry has seen some wallet share shifts between CPU and accelerators over the last several quarters, as well as some inventory burn in the server market, we see signs of normalization as we enter Q4, driving modest sequential TAM growth,” Gelsinger explained. “Across most customers, we expect to exit the year at healthy inventory levels, and we see growth in compute cores returning to more normal historical rates off the depressed 2023. More importantly, our successful road map execution is strengthening our product portfolio with Gen 4 and Gen 5 Xeon, Sierra Forest and Granite Rapids positioning us well to win back share in the datacenter. In addition, we expect to capture a growing portion of the accelerator market in 2024 with our suite of AI accelerators led by Gaudi, which is setting leadership benchmark results with third parties like MLCommons and Hugging Face. We are pleased with the customer momentum we are seeing from our accelerator portfolio and Gaudi in particular, and we have nearly doubled our pipeline over the last ninety days.”

That pipeline is about a $2 billion opportunity, if we understand how Intel has talked about it over the past several quarters, and it is mostly centered on the Gaudi line of accelerators, which are seeing a resurgence in a world of extremely scarce GPU supplies from Nvidia and AMD. But we think that AI inference and training servers represent something close to $50 billion in revenues in 2023. And be careful of comparing a pipeline to actual sales – pipelines are always many factors larger than revenues, and more so when there are many different competitors with various devices chasing those opportunities.

As we have said before, if you can make a reasonable matrix math engine and run TensorFlow and PyTorch on it, you can sell it. The fact that Intel is putting 4,000 of the Gaudi 2 devices on a cloud and not selling them directly to an AI startup is interesting to us. You might jump to the conclusion that maybe Intel can’t sell this capacity directly to customers. But when AI processing capacity generates around 2.5X more revenue over multiple years than selling the raw iron itself, you can see now why Intel would be building its own cloud and getting Stability.ai, the maker of the Stable Diffusion generative image processing platform, as its anchor customer.

Given the dearth of Nvidia “Hopper” H100 GPUs and given that we really have no idea how many “Antares” Instinct MI300A and MI300X GPUs that AMD can make, small wonder that Intel can sell Gaudi 2 accelerators – and indeed, will be able to sell the Gaudi 3 accelerators that will double performance. So what? The question is will this revenue be material, and will these sales lay a foundation for the future Falcon Shores GPU or not?

The Data Center & AI group, or DCAI for short, had $3.81 billion in sales, down 9.4 percent, and posted an operating income of $71 million, 4.2X higher than in the year-ago period.

Intel is introducing a new abbreviation into our lives: MNCs, short for multinational corporations and what we used to just call “large enterprises” to make them distinct from SMBs, or small and medium businesses, and from hyperscalers and cloud builders as we call them, which Intel might abbreviate to HCBs if it wants to start sounding like a military organization. Anyway, Gelsinger said on the call that DCAI exceeded Intel’s forecasts by a little bit and that revenues were up modestly on a sequential basis, with the world’s ten largest CSPs – short for cloud service providers, which is apparently hyperscalers plus cloud builders – having the “Sapphire Rapids” fourth gen Xeon SPs, which launched in January of this year, in production. Intel broke through 1 million shipments of Sapphire Rapids at the beginning of this quarter and will break through 2 million shipments in November, according to Gelsinger, who also has high hopes for the sixth gen “Granite Rapids” Xeon SPs, which will have 2X to 3X the AI performance of Sapphire Rapids.

Gelsinger reminded everyone that the fifth gen “Emerald Rapids” Xeon SP, which is just a tweak on Sapphire Rapids, will be launched on December 14, and that the sixth gen “Sierra Forest” Xeon SP is based on its energy-efficient “Sierra Glen” E-cores rather than the “Redwood Cove” performance cores, or P-cores, that are coming in the Granite Rapids Xeon SPs, also a gen six product, with both sharing the same “Birch Stream” socket and platform. Sierra Forest, which will pack 144 cores on a die and which will come in a two-die socket with 288 cores, comes in the first half of 2024, with Granite Rapids following shortly after it. (We drilled down into both CPUs back in September.)

We shall see how many supply wins Intel gets and how many design wins it gets. It’s not like AMD is sitting still with Epyc CPUs, and it’s not like Nvidia and Ampere Computing are not competitive in some server segments, too. And Google and Microsoft are working on their own Arm CPUs alongside Amazon Web Services, which will be debuting its Graviton4 in November if history is any guide.

In its 10-Q filing with the US Securities and Exchange Commission, Intel provided a little more insight into its DCAI business. Intel said that server volumes (which means mostly CPUs but includes some motherboards and chipsets) were down 35 percent in the third quarter, which is a stunning number really and which Intel blamed on “a softening CPU datacenter market.” Which is fair, with the cloud builders and hyperscalers taking a pause as they pour a lot of their money into GPUs for GenAI workloads. Interestingly, thanks to that downshift in sales from the hyperscalers and cloud builders, server average selling prices (ASPs) were up 38 percent, a trend that was also boosted by the adoption of CPUs with higher core counts by all Xeon SP shoppers (including those hyperscalers and cloud builders).
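The volume and ASP figures can be sanity-checked against the reported revenue decline. Here is a minimal sketch, assuming server revenue moves roughly as unit volume times average selling price and ignoring the non-CPU products that also sit inside DCAI; the function name is ours, purely for illustration:

```python
def implied_revenue_change(volume_change: float, asp_change: float) -> float:
    """Return the fractional revenue change implied by volume and ASP changes.

    Assumption: revenue = units * ASP, so the two growth factors multiply.
    Inputs and output are fractions, e.g. -0.35 means "down 35 percent".
    """
    return (1.0 + volume_change) * (1.0 + asp_change) - 1.0


# Q3 2023 figures from Intel's 10-Q as cited in the text:
# server volumes down 35 percent, server ASPs up 38 percent.
q3_change = implied_revenue_change(-0.35, 0.38)
print(f"Implied Q3 server revenue change: {q3_change:+.1%}")
```

Under that simplification, the two numbers combine to a decline of roughly 10 percent, which is in the same neighborhood as the 9.4 percent drop reported for DCAI as a whole.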

Year to date through the end of the third quarter, DCAI revenues are off 22.5 percent to $11.54 billion, and Intel said in the 10-Q that server volumes are off 41 percent but ASPs are up 17 percent. Sales of FPGAs also helped boost revenues, but going forward, despite product launches, Intel seems to be entering a slowing period of FPGA sales. For the nine months, DCAI has an operating loss of $608 million, compared to an operating gain of $1.92 billion against $14.89 billion in sales in the first nine months of 2022.
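The year-earlier revenue figure can be backed out of the current figure and the percentage change. This is a small sketch of that arithmetic, with a helper name of our own choosing:

```python
def prior_period_value(current: float, change: float) -> float:
    """Back out the prior-period value from the current value and its change.

    `change` is a fraction, e.g. -0.225 means "off 22.5 percent".
    Rearranges current = prior * (1 + change).
    """
    return current / (1.0 + change)


# DCAI year-to-date revenue: $11.54 billion, off 22.5 percent.
prior_dcai_ytd = prior_period_value(11.54, -0.225)
print(f"Implied year-earlier DCAI revenue: ${prior_dcai_ytd:.2f} billion")
```

The result lands at about $14.89 billion, which matches the year-earlier sales figure quoted for DCAI.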

As far as we can tell, Q1 2023 was a local minimum for Intel, financially speaking, in the datacenter. It remains to be seen if it is an absolute minimum.

Now, DCAI covers a lot of the datacenter business at Intel, but not all of it. Its Network and Edge, or NEX, group also sells gear into the datacenter and its edge extension. In Q3, NEX sales were off 36 percent to $1.45 billion, and operating income was down 77.3 percent to $17 million. For the nine months, NEX sales are off 36.8 percent to $4.3 billion and, in terms of profits, the group has shifted from an operating income of $682 million in the first nine months of 2022 to an operating loss of $470 million in the first three quarters of 2023. Ouch.

Add DCAI and NEX together and you get a kind of proxy for what used to be called Data Center Group in the old days, and if you add up pieces of the old flash, storage, FPGA, and IoT businesses that Intel used to have, you can get a proxy for what Intel’s “real” datacenter business looked like over time and how it has changed in the wake of product divestitures, product shutdowns, and competitive pressures. Like this:

No one watching Intel rise through the 2000s and 2010s would have expected the Intel datacenter business to dip below that red line in the chart above into operating red ink. It seemed, as was said many times in The Princess Bride, inconceivable.

The discontinued Optane 3D XPoint persistent memory, which was used solely in servers, is now part of the Other revenue and operating income – well, operating loss – category, and we are being generous and not trying to allocate a portion of the $2.25 billion in operating losses Intel posted in the Other segment in Q3 2023 to the “datacenter” business as we show it above. Heaven only knows what the losses are for the Accelerated Computing & Graphics (AXG) business that was split up and apportioned to the DCAI and Client Computing (CCG) groups.

Funny how Intel doesn’t really talk about the Ponte Vecchio GPUs, which are deployed in the “Aurora” supercomputer at Argonne National Laboratory, any more. It’s all Gaudi 2 this and Gaudi 3 that and just wait until you see the converged Falcon Shores GPUs with Gaudi matrix math engines and Gaudi fat Ethernet pipes on the chip. . . .

By our math, Intel’s “real” datacenter business is down 19 percent to $5.17 billion and its operating income has been cut in half to $86 million, or 1.7 percent of revenues. That is a far cry from the peak Intel datacenter business, which posted $9.06 billion in sales in Q2 2020 and had an operating profit of $3.43 billion, or about 37.8 percent of revenues.