Intel Is Still Struggling In The Datacenter, But It Could Get Better

Intel has been pushing its two-core server CPU strategy for so long, in one form or another, that we have become accustomed to differentiating products the way Intel does and then trying to figure out what workloads these chips might be useful for.

The Atom and E-core chips, which have their heritage in Intel’s laptop processors and which are aimed at energy efficiency, are minimalist designs that deliver high throughput per socket on modest workloads. The true Xeon cores – now known as P-cores, short for performance – are distinct cores with different but overlapping features and higher throughput per core, which is important for the single-threaded workloads that are common in the IT estate.

With the upcoming “Diamond Rapids” Xeon 7 P-core variants, Intel’s chip architects – many of whom no longer work at the company – decided to remove simultaneous multithreading, known as HyperThreading in the Intel architecture, from the design. SMT allows two virtual threads per core, which can boost throughput at the expense of slightly lower single-threaded performance, and it also introduces another attack surface for security vulnerabilities – which is why many Arm server CPU designs do not have it. The idea, we surmise, was to take that overhead out of the design.
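As a rough illustration of the tradeoff just described – and to be clear, the efficiency figures in this toy model are our assumptions for the sketch, not measured Intel numbers – here is what the SMT bargain looks like per physical core:

```python
# Toy model of the SMT throughput tradeoff described above.
# The per-thread efficiency figure is an illustrative assumption,
# not a measured number for any real CPU.

def throughput_per_core(smt_enabled: bool,
                        single_thread_perf: float = 1.00,
                        smt_thread_efficiency: float = 0.65) -> float:
    """Aggregate throughput of one physical core, in arbitrary units.

    With SMT on, each of the two hardware threads runs somewhat slower
    than a dedicated core would, but the core as a whole does more work.
    """
    if not smt_enabled:
        return single_thread_perf
    # Two threads share the core's execution resources; each runs at a
    # fraction of full speed (0.65 here is a stand-in value).
    return 2 * single_thread_perf * smt_thread_efficiency

no_smt = throughput_per_core(False)
with_smt = throughput_per_core(True)
print(f"SMT off: {no_smt:.2f} units, SMT on: {with_smt:.2f} units "
      f"(+{(with_smt / no_smt - 1) * 100:.0f}% aggregate throughput, "
      f"but each thread runs below full single-thread speed)")
```

Under these assumed numbers, the core gets 30 percent more aggregate work done with SMT on, while any single thread runs at 65 percent of its solo speed – which is exactly why the choice hinges on whether the workload is throughput-bound or latency-bound.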

To SMT or not to SMT is a pesky question, and Intel has vacillated here. The original Atom processors from a decade and a half ago had it; then it was removed with the “Silvermont” cores in 2013 and was never added back to the E-cores (code-named “Gracemont,” “Crestmont,” and “Skymont”). Given that chips these days have a lot of physical cores, some of the P-core CPU designs for desktops and laptops had SMT removed to improve their performance and efficiency, and this carried into the high-end Xeon server CPU line with Diamond Rapids, which is based on Intel’s 18A process (roughly akin to 2 nanometers) and which is expected to ship in the second half of this year. High-bin Diamond Rapids Xeon 7 parts will have four compute tiles and a total of 192 cores.

Over the past several months, after Kevork Kechichian came from Arm to be general manager of the Data Center Group (they are no longer calling it DCAI except in the financial reporting), Intel has decided to can the eight-channel variants of Diamond Rapids and focus on high-end 16-channel parts that are aimed at big workloads, including database servers, HPC systems, and AI host nodes. With 18A still ramping and in relatively short supply, Intel has to pick its production targets very carefully to chase the dollars.

Lip-Bu Tan, Intel’s chief executive officer, said on a call with Wall Street analysts that Intel was working to accelerate the delivery of the follow-on “Coral Rapids” Xeon 8 P-core part, which would add SMT back into the design. This chip was originally slated for the second half of 2027 to early 2028, and we will see how quickly Intel can get it out the door.

We think that one of the ways Coral Rapids might be accelerated to market is to use an advanced node of 18A instead of the 14A process that was expected. So far, Intel Foundry has no external customers for 14A and the company is very clear that it needs one for the ramp to proceed. Hopefully, 18A is not the new 14 nanometers, a process that Intel was stuck at for way too long as Taiwan Semiconductor Manufacturing Co pushed down into 7 nanometers and 5 nanometers with its 6 nanometer and 4 nanometer tweaks.

To be fair, Intel is still kinda stuck at the 10 nanometer SuperFIN and Intel 7 processes for parts of its Xeon 6 chips even as it uses Intel 3 (something around a 4 nanometer to 3 nanometer process) for core tiles. With the “Clearwater Forest” Xeon 7, which is an E-core design expected in the first half of this year, the I/O tiles are etched using Intel 7, the base tiles are etched using Intel 3, and the core tiles are etched using 18A. This may be a choice based as much on the relatively low volumes expected for Clearwater Forest as on anything else. The E-core Xeon 6 processors have not exactly taken the world by storm, but there is some interest, and some manufacturing helps the 18A ramp and also helps cover the cost of that ramp.

Anyway, Coral Rapids might be the first Intel processor to integrate NVLink Fusion ports to attach to Nvidia memory fabric switches and GPUs in a coherent fashion. There is speculation that the Coral Rapids chip will support DDR6 main memory, and up to four memory sticks per channel for a big boost in main memory capacity for server nodes.
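For a sense of what four sticks per channel buys you, here is back-of-the-envelope capacity math. The channel count assumes Coral Rapids keeps the 16 channels of the high-end Diamond Rapids parts, and the 128 GB module size is purely our assumption – DDR6 module capacities have not been announced:

```python
# Back-of-the-envelope main memory capacity per socket.
# 16 channels assumes parity with high-end Diamond Rapids; the
# 128 GB DDR6 module size is a hypothetical placeholder.
channels = 16          # memory channels per socket (assumption)
dimms_per_channel = 4  # "up to four memory sticks per channel"
gb_per_dimm = 128      # hypothetical DDR6 module capacity

capacity_gb = channels * dimms_per_channel * gb_per_dimm
print(f"{capacity_gb} GB ({capacity_gb / 1024:.0f} TB) per socket")
```

Even with a conservative module size, that works out to 8 TB per socket, which is the kind of capacity big database servers and AI host nodes want.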

If there was one big bummer in the Intel financial report, it was the admission that Intel could not meet demand for its Xeon processors at any process because of supply constraints with Intel 7 and Intel 3 and the fact that it has to balance the needs of client device builders against server builders.

“Obviously, we are shifting as much as we can over the datacenter to meet the high demand,” said Dave Zinsner, Intel’s chief financial officer, on the call. “But we can’t completely vacate the client market. So we are trying to support both as best we can and obviously work our way out of this supply issue. I do believe that the first quarter is the trough. We will improve supply in the second quarter. And part of the challenge is that in the third and fourth quarter of 2025, we lived off of supply. But we also had a reasonable chunk of fixed finished goods inventory to also work through. Unfortunately, that is now down to kind of 40 percent of what it was at peak levels. So we don’t have that to rely on. It is just literally hand to mouth – what we can get out of the fab and what we can get to customers is how we are managing it.”

Elsewhere in the call, Zinsner said that Intel was prioritizing internal wafer supply to Xeons and leveraging an increased mix of externally sourced wafers for its client devices. It is good that Intel has that option, but that does not help the ramps and it might be more costly than using internal capacity at Intel Foundry. (Then again, it may be cheaper, and on second thought, we think it might be. . . . )

Everybody has their eyes on the 14A process, which Intel has said it will not put into production until it lines up external customers – perhaps later this year or early next. In the meantime, development continues so that Intel can do the ramp relatively quickly when it does get the go-ahead, and we strongly suspect that there will be political pressure on chip companies like Apple, Nvidia, and maybe even AMD to source some of their chips on 14A if the ramp is not terrible.

“Intel 14A development remains on track,” Tan said on the call. “We have taken meaningful steps to simplify our process flow and improve our rate of performance and yield improvement. We are developing a comprehensive IP portfolio on Intel 14A, and we continue to improve our design enablement approach. Importantly, our PDK is now viewed by customers as industry standard. Engagements with potential external customers on Intel 14A are active. We believe customers will begin to make firm supplier decisions starting in the second half of this year and extending into the first half of 2027. We also have the opportunity to provide strong differentiation in advanced packaging, particularly with EMIB and EMIB-T. We are focusing on improving quality and yield to support customer desire for ramps beginning in second half of 2026.”

In the meantime, AMD and Nvidia will be competing hard against Intel, and TSMC is absolutely not going to let up at all as it ramps its American fab capacity to do its part to help prevent World War III.

And with that said, let’s go over the numbers for Intel in the final quarter of 2025.

In the fourth quarter, Intel’s revenues were down 5.2 percent to $13.67 billion, and operating income shifted to a gain of $580 million versus a $401 million operating loss in the year ago period.
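Working backwards from those reported figures gives the year-ago baselines; this is a sanity check we like to do, and the results are approximate because the percentages are rounded:

```python
# Back out the year-ago figures implied by the reported Q4 2025 changes.
q4_2025_revenue = 13.67      # $ billion
yoy_decline = 0.052          # revenue down 5.2 percent year on year

implied_q4_2024_revenue = q4_2025_revenue / (1 - yoy_decline)
print(f"Implied Q4 2024 revenue: ${implied_q4_2024_revenue:.2f} billion")

# Operating income swung from a loss to a gain:
op_income_q4_2025 = 0.580    # $ billion gain
op_loss_q4_2024 = -0.401     # $ billion loss
swing = op_income_q4_2025 - op_loss_q4_2024
print(f"Operating income swing: ${swing * 1000:.0f} million")
```

That puts the year-ago quarter at roughly $14.42 billion in revenue, with a swing of nearly a billion dollars in operating income.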

Intel Foundry continues to be a drag on the company, with revenues – almost exclusively coming from the Client Computing Group and the Data Center Group – of $4.51 billion, up a tenth of a point year on year. But operating losses at the foundry, thanks to the ramps of Intel 7, Intel 3, and 18A and development for 14A, grew to $2.51 billion.

Intel’s current inhouse Core and Xeon CPU volumes can carry the company for a while if need be, and that seems to be the plan. All profits from the CPU products are propping up the foundry – which is how it has been for a decade and a half since Intel hit the 10 nanometer wall and was knocked flat.

Here is the big table showing the numbers since Q1 2023:

What we care about here at The Next Platform is the datacenter business, which has thankfully been consolidated back into a single Data Center Group thanks to the sale of the flash storage business and the spinoff of the Altera FPGA business. Now, everything datacenter is in one place and we don’t have to model it. We can see it.

What we see in Q4 2025 is that Intel had $4.74 billion in sales for what is still called the Data Center & AI group in the financial reports but which is using the old Data Center Group name, up 8.9 percent year on year and up 15.1 percent sequentially. This is not GenAI Boom growth, but it is not decline. Moreover, with $1.25 billion in operating profits, the Data Center Group has increased its profitability by a factor of 3.3X year on year and by nearly 30 percent sequentially. This is good, given all the circumstances, and is reflective of the demand for high-end CPUs for HPC and AI systems, which don’t have a lot of CPUs but do tend to use the most expensive ones given the task of keeping even more expensive GPUs or XPUs fed with data.
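The same back-out exercise on the Data Center Group numbers yields the implied prior-period figures; again, these are approximate given the rounding in the reported percentages:

```python
# Implied prior-period Data Center Group figures from the reported
# growth rates (approximate; the percentages are rounded).
q4_revenue = 4.74       # $ billion
q4_op_profit = 1.25     # $ billion

implied_yoy_revenue = q4_revenue / 1.089     # up 8.9% year on year
implied_seq_revenue = q4_revenue / 1.151     # up 15.1% sequentially
implied_yoy_profit = q4_op_profit / 3.3      # profits up 3.3X year on year
implied_seq_profit = q4_op_profit / 1.30     # up roughly 30% sequentially

print(f"Implied Q4 2024 DCG revenue: ${implied_yoy_revenue:.2f} billion")
print(f"Implied Q3 2025 DCG revenue: ${implied_seq_revenue:.2f} billion")
print(f"Implied Q4 2024 DCG profit:  ${implied_yoy_profit * 1000:.0f} million")
print(f"Implied Q3 2025 DCG profit:  ${implied_seq_profit * 1000:.0f} million")
```

That implies roughly $4.35 billion in DCG revenue and around $380 million in operating profit in the year-ago quarter – so the profit recovery is the bigger story than the top line.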

It is hard to say what the steady state rate of revenue and profitability will be for the Data Center Group as we go forward, but it will almost certainly never again attain the revenue levels of 2020 through 2022 or the profitability levels of 2017 through 2020. A steady state for this business might be somewhere around $6 billion a quarter in revenues and maybe $2 billion in operating profits – and that is if everything goes right. We think the future is Arm chips accounting for 25 percent of server revenues, with Intel and AMD arguing over the remaining 75 percent and fighting to be the one with 40 percent share compared to the other’s 35 percent share.

Last Note: AMD’s Epyc designs, which double up core counts by cutting cache in half, are a much cleaner way to get two different types of processors without having to change the feature set. Intel might want to think about that.


8 Comments

  1. Yikes a 17 percent drop in Intel market capitalization today!

    I think it’s true that Intel has too many different core designs. While this duplicated effort leads to an opportunity cost, at least each server processor consists of only one type of core and they look good. On the other hand, the desktop chips currently have a mix of three different types of cores on package and that makes scheduling tasks impossibly difficult for the operating system.

    The cache differences between the upcoming Zen 6 versus 6c cores are likely much less costly to engineer. Unfortunately, AMD follows Intel and includes both 6 and 6c cores in the next generation desktop packages. Who hasn’t noticed that the 9800X3D with uniform cache for all cores is more popular and runs faster than the 9950X3D where half the cores have less cache than the others?

  2. Intel q4 2025 channel of financial reconciliation DCG and CCG is posted at my Seeking Alpha comment line.

    https://seekingalpha.com/user/5030701

    Here’s DCG q4 discovery

    Xeon Sierra, Granite and Emerald all up $1K AWP on change in quantity q/q = $5753 + 112% for gross @ .264 = $1518.79, < R&D @ .235 = 357.74, < MG&A @ .085 = $130.40, no restructure, < Tax @ 4.9% = $74.53 for net take $956.32

    On q4 cumulating Sierra, Granite and Emerald all up $1K AWP = $4167.42 + 8% for gross $1100 < costs nets $692.75

    Throwing out Emerald rapids on meager q/q gain on supply + 5.3%, on revenue + 9.8%, Sierra and Granite alone,

    On cumulating $1K AWP = $5132.77 for gross $1355.05 < costs nets $942.18
On change q/q $1K AWP = $5347.64 for gross $1411.78 96 cores, includes Scalable on very little channel supply just 12% of full run to date. Where for 6 quarters because Hyper-cloud bridging from Sapphire or Emerald SF 10 that is 7, would skip Granite at Intel 5/3 intermediate on no stretch aiming for Diamond 18A platform validation. This is what Intel sales lives for, samples on customer into future contract commitments meeting spec is how Intel warps competitive time.

18A has been available in some form for 6 quarters. If Hyper-cloud said go 18A over 3, Intel provided 18A.

    Sierra and Granite Lake,

    q1 MC = $1632 and MR = $1632 and debatable on standard practice
    q2 MC = $883.74 and MR = $81.54
    q3 MC = $581.21 and MR = $25.92
q4 MC = $254.84 and MR = $1157.16 and a function of producing < 80C

    Over Sierra + Granite long run to date,

    80 to 144 cores = 5.32%
    40 to 64 cores = 36.94% steps up from “traditional servers”
    24 to 36 cores = 31.81% and “the traditional servers”
    4 to 16 cores = 25.93% and “the traditional servers”

On the question of Clearwater Forest viability, Sierra is 28.16% of all Granite Ridge channel available to date.

    More at my Seeking Alpha comment line.

    Mike Bruzzone, Camp Marketing

  3. Timothy Prickett Morgan wrote “Hopefully, 18A is not the new 14 nanometers, a process that Intel was stuck at for way too long”. This is what worries me most. If Intel can’t put 14A into production, TSMC will raise the price of their 14A process. Having a monopoly for leading-edge lithography equipment (ASML) that sells to a monopoly for leading-edge process technology (TSMC) will result in little to no improvement in the price/performance ratio of semiconductors over time.

    • I agree that the monopoly is scary. Yet tsmc has a healthy product margin, but nothing like that of Nvidia, which does have direct competitors. How is it that tsmc can’t squeeze more of the juice? Is it that Intel and Samsung are “close enough” to competitive that AMD/Nvidia/Apple/Etc could jump ship if prices went up 20% more?

      ASML truly is a monopoly, but their competitor is themselves last year. If they increase prices too much, their customers will still buy, just not as many units, and will only fab the most profitable wafers, or run the old lines longer. There’s a balance there to maximize total profit at the expense of per-unit profit.

    • Has there been an improvement in the price/performance of semiconductors in the last five years? They’re getting faster, but also getting more expensive.

      • There has been improvement in the price/performance ratio of semiconductors during the last five years. Most of that improvement has come from improvements in microarchitecture rather than improvements in the price/performance ratio of transistors. For example, the price/performance ratio improved when NVIDIA switched from FP16 to FP8 to FP4, even with little improvement in the price/performance ratio of transistors.

        Another example is AMD’s use of 3D V-Cache. 3D V-Cache improves performance while using older, less expensive process technology. Transistors on the leading-edge process will probably make up a smaller fraction of total transistors shipped in the future as chip designers stack a leading-edge die on top of one or more SRAM die built on an older process. If the improvement in price/performance ratio slows down, hardware will be upgraded less frequently, which will hurt the whole semiconductor industry.

        Your point about ASML maximizing total profit at the expense of per-unit profit also applies to TSMC. Investors are willing to fund huge AI data centers even though the AI frontier model makers are not currently profitable. TSMC can’t squeeze the AI chip makers too hard without making the AI bubble pop so TSMC forgoes some per-unit profit to maximize total profit. TSMC also wants to kill Rapidus and Intel Foundry before raising prices, just like Amazon did to retail stores.

        About 2/3 of NVIDIA engineers are software engineers so NVIDIA is really a hybrid software/hardware company. NVIDIA has to find ways to improve the price/performance ratio of AI applications because NVIDIA doesn’t want the AI bubble to pop either.

  4. I’ve never bought hyperthreading for my home workstation and generally heard at least dark muttering about the effects on servers. Never thought it was a great idea, but maybe it gave the CPU logic something to do while waiting out cache faults and such. If that’s it, I wish someone would just say so. And even then a full analysis of memory channel contention might still find single threaded cores scale up to multi-core more smoothly and just have fewer cache faults to wait around on.

• I think that certain kinds of code like threads, even the way SMT presents them. IBM has had eight-way SMT on Power cores for as long as I can remember, and on Java app servers and databases, it really does boost throughput performance by a lot.

      See: https://www.ibm.com/downloads/documents/us-en/10c31775c5d40fed

Just as one example: for the rPerf benchmark on AIX using what I presume is the Oracle database, a 60-core Power11 machine called the S1124 (two sockets, 30 cores each, running at a base 2.8 GHz and drifting up to 3.95 GHz as the thermals of the system allow) has a rPerf rating of 515.8 with one thread per core. Two threads per core gets you 1,052, four threads per core gets you 1,374.2, and eight threads per core gets you 1,736.8. Now, IBM obviously is tuning its software to run in SMT8 mode, and heaven only knows what optimizations it has done, but this scaling works on real world applications and IBM has a guarantee of it in its contracts.

      rPerf is short for relative performance and it is a variant of the TPC-C benchmark test with some changes. It can be I/O limited and a lot of times these days the whole benchmark can fit into main memory, which helps. But, this also helps with real DBMSes. Even decades ago, on the TPC-C test the hassle of running the benchmark was not getting a CPU, but putting together the 30,000 to 100,000 disk arms necessary to drive the test with the I/O levels that the real TPC-C test required. But that I/O is not necessary to do a relative CPU benchmark as long as customers know that in the real world, they will have to add enough disk or flash IOPS to get the expected work out of the machine.
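      For what it is worth, the rPerf ratings quoted above imply diminishing but still substantial returns per extra thread. A quick sketch of the scaling math, using only the numbers cited:

      ```python
      # Per-thread scaling efficiency of the quoted Power11 S1124 rPerf
      # ratings (60 cores; SMT1 through SMT8 modes).
      rperf = {1: 515.8, 2: 1052.0, 4: 1374.2, 8: 1736.8}

      base = rperf[1]
      for threads, rating in sorted(rperf.items()):
          speedup = rating / base
          # Efficiency relative to perfect linear scaling with thread count.
          efficiency = speedup / threads
          print(f"SMT{threads}: {speedup:.2f}x throughput, "
                f"{efficiency:.0%} of linear scaling")
      ```

      Interestingly, the jump from SMT1 to SMT2 is slightly better than linear on these ratings, while SMT4 and SMT8 land at roughly two-thirds and two-fifths of linear scaling respectively – still well worth having for throughput workloads.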
