How Long Before AI Servers Take Over The Market?

When hyperscalers and cloud builders think about their infrastructure, they talk about megawatts, and they think about the mix of servers and storage and the total capacity that is delivered in a megawatt of power. And of course they also think in terms of budgets, because money is, in fact, what makes the world go around.

We like feeds and speeds and slots and watts as much as anyone, but we like money more, because money is how you keep score. So after two years of not seeing the server trackers from either IDC or Gartner, when we happened to be searching for server forecasts on Google this week and saw a page with lots of juicy IDC data spanning 2022 through 2027, we got out the trusty Excel spreadsheet and went to work.

There was some commentary on the server forecast, which is generally not made public; a summary was released in late September 2022, which we also stumbled upon a few months later and wrote about, oddly enough just as we are doing now.

Here is the IDC chart, which shows worldwide server revenue data from 2022 and the forecast for 2023 through 2027:

We don’t like this chart because of the way IDC does the second Y axis with the growth rate: visually it looks like growth is heading to zero, but it is actually a 4.3 percent growth rate. So we redid the chart and added in the 2021 data, which we had from last year:

Here is the actual data in table form:

This particular dataset doesn’t just show server revenues and forecasts five years out, but breaks X86 and non-X86 servers out from each other, and for the past decade that would have been pretty boring for a lot of people, given that two-thirds or so of the non-X86 iron was comprised of Power Systems and System z mainframe sales from IBM and the rest was a mix of other proprietary machines and Arm servers. But with the rise of Arm servers at the hyperscalers and cloud builders, that non-X86 part of the business is getting interesting, and will continue to do so as RISC-V machinery becomes more normal in the decade ahead. So it is not as retro a way of thinking about it as you might think.

We actually had IBM System z and Power revenue figures for 2020 from IDC as part of its server tracker, which came to $4.98 billion, and that means Arm/Other comprised the remaining $3.87 billion in the non-X86 category. If you make some assumptions about the IBM products going forward, with an upgrade cycle in 2021 and 2022 for Power10 and z16 machines and another Power11 and z17 upgrade cycle in 2025, and a general decline in revenues as the amount of compute in a Power or z processor keeps growing faster than online transaction processing and other compute demands, you might get a gradual decline in IBM server hardware sales from just shy of $5 billion in 2020 to maybe $3.5 billion in 2026 and maybe $3.3 billion in 2027. If you do that and use the baseline IDC data, then the Arm/Other part of the non-X86 business will grow at a very healthy clip, again depending on the binge-digest cycles of the hyperscalers and cloud builders, who nonetheless have a baseline level of consumption that is unavoidable. By the end of the forecast period, Arm and RISC-V servers – and we think mostly Arm servers even way out there – will be in the range of $20 billion a year. That is about a 10 percent revenue share for Arm servers, which is not the same thing as the 20 percent or so Arm server shipment share we were talking about back in January when looking at some Gartner and Wells Fargo data that went out to 2026.
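For those who want to check our math, here is a minimal back-of-the-envelope sketch of that split in Python. The 2020 figures come from IDC; the 2027 IBM and Arm/Other figures, and the implied total market size, are our own assumptions rather than anything in the tracker:

```python
# Back-of-the-envelope split of the non-X86 server pie (revenues in billions of dollars).
# Only the 2020 figures come from IDC's tracker; everything else is our assumption.
ibm_2020, arm_other_2020 = 4.98, 3.87   # Power and System z vs Arm/Other in 2020
ibm_2027 = 3.3                          # our assumed glide path for IBM server hardware
arm_other_2027 = 20.0                   # where we think Arm (plus a little RISC-V) lands
total_market_2027 = 200.0               # assumed total, consistent with $20B being ~10 percent share

non_x86_2020 = ibm_2020 + arm_other_2020
non_x86_2027 = ibm_2027 + arm_other_2027
print(f"Non-X86 total: ${non_x86_2020:.2f}B in 2020 -> ${non_x86_2027:.1f}B in 2027")
print(f"Arm/Other revenue share in 2027: {arm_other_2027 / total_market_2027:.0%}")
```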

With so many hyperscalers and cloud builders working on custom Arm server CPUs and custom AI coprocessors, the options are wide open and the pressure is there to not just use X86 server CPUs and Nvidia GPUs for AI and other computationally intensive workloads.

Speaking of which, what we really want to know is how sales of AI servers – mostly for training but also for inference – are distinct from the rest of the server acquisitions. We also want to have some sense of the effect that inflation, which was let loose by server makers last year and into this year, is having on boosting revenues. GPU inflation is a big part of that, as too much demand is chasing too little supply.

IDC had this to say in its report: “Direct impact of inflation on servers was felt more strongly each subsequent quarter during 2022, with year over year ASP growth rates escalating to 29 percent year over year in the second quarter of 2023, while unit growth, which had been in the teens through most of 2022, dropped to a meager 1.4 percent in 2022 Q4, declined year over year in 2023 Q1 by 10 percent, and now by 19.9 percent in 2023 Q2.”
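Multiplying those two quoted rates together shows how a 29 percent jump in ASPs against a 19.9 percent drop in shipments still nets out to a small revenue gain, which is the squeeze the whole market is feeling. A quick sanity check:

```python
# Revenue growth is approximately (1 + ASP growth) * (1 + unit growth) - 1.
asp_growth = 0.29      # ASPs up 29 percent year over year in Q2 2023
unit_growth = -0.199   # shipments down 19.9 percent year over year in Q2 2023

revenue_growth = (1 + asp_growth) * (1 + unit_growth) - 1
print(f"Implied revenue growth: {revenue_growth:+.1%}")   # roughly +3.3 percent
```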

Those are very big shipment declines against pretty high ASP growth, driven by very expensive AI training and inference nodes with four or eight GPUs each that cost hundreds of thousands of dollars per node. The AI and non-AI servers really need to be separated from each other because these are very different parts of the market. So we took a stab at it based on the top-line server revenues from IDC from 2020 through 2027, like this:

We realize there is some guesswork in here, but we think this captures the shape of things to come, as well as the things that are happening now and that have happened in recent years.

The upshot is that unless something happens to slow down the growth in AI models and unless AI training and inference compute gets a lot cheaper, we think there is a non-zero chance that AI compute will comprise around half of server revenues by 2026 or 2027.

That model assumes a modest, GDP-like growth rate in non-AI server revenues each year after a pretty steep decline of 11.2 percent that started in 2022 and that will improve a bit to only a 5 percent decline in 2023. It also assumes a pretty staggering leap of nearly 5X in revenues for AI servers between 2022 and 2023, and then pretty healthy, steady growth in the range of 20 percent in 2024, slowing down to 15 percent in 2027. We didn’t force that growth for AI servers, but rather assumed modest growth/consumption cycles for non-AI machines like we have seen in the past, and then everything left over in the IDC totals was attributed to AI servers.
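Here is a minimal sketch of that model in Python. The starting revenue levels are hypothetical placeholders rather than IDC’s actual figures, and the year-by-year rates between the endpoints we cited are our own interpolation, but the mechanics are the ones we just described, and the AI share works out to roughly half of the total by 2026 or 2027:

```python
# Sketch of the AI vs non-AI revenue split described above (revenues in $B).
# The 2021/2022 baselines are HYPOTHETICAL placeholders; the growth rates
# follow the assumptions in the text (non-AI: -11.2% in 2022, -5% in 2023,
# then GDP-like ~3%; AI: ~5X leap into 2023, then 20% gliding down to 15%).
non_ai = {2021: 95.0}   # placeholder non-AI baseline
ai = {2022: 10.0}       # placeholder AI baseline
non_ai_growth = {2022: -0.112, 2023: -0.05, 2024: 0.03, 2025: 0.03, 2026: 0.03, 2027: 0.03}
ai_growth = {2023: 4.0, 2024: 0.20, 2025: 0.18, 2026: 0.16, 2027: 0.15}  # 4.0 means a 5X leap

for year in range(2022, 2028):
    non_ai[year] = non_ai[year - 1] * (1 + non_ai_growth[year])
    if year >= 2023:
        ai[year] = ai[year - 1] * (1 + ai_growth[year])
    total = non_ai[year] + ai[year]
    print(f"{year}: AI ${ai[year]:5.1f}B of ${total:5.1f}B total ({ai[year] / total:.0%})")
```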

So this is the explosion happening right now, and as Nvidia GPU supply goes up and prices come down, and as other brands of GPUs and other kinds of accelerators enter the market and get traction at volume, everything will level out and normalize a bit. Perhaps. And at a whole new level.

There is a question as to how much AI serving capacity the world will need, and we admit that predicting that four or five years out is very difficult indeed. If AI accelerators remain in short supply and prices stay high, then revenues will stay high. If volumes double or triple, prices will be cut by half or two thirds and revenues will be consistent nonetheless. Perhaps. We are tossing this idea out there for comment.
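The arithmetic behind that hunch is simple enough; a toy check:

```python
# Revenue is units times price, so doubling or tripling unit volumes while
# cutting prices by half or two thirds leaves revenue roughly where it was.
for volume_multiple, price_multiple in [(1, 1.0), (2, 0.5), (3, 1 / 3)]:
    revenue_multiple = volume_multiple * price_multiple
    print(f"{volume_multiple}X units at {price_multiple:.2f}X price -> {revenue_multiple:.2f}X revenue")
```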

Obviously, no one wants to make it up in volume when it comes to gross margins. But intense competition – such as that which Nvidia has brought upon itself by its own success – has a nasty habit of forcing companies to do just that.

It’s funny. It took a decade and a half – from 1985 to 2000 – for RISC/Unix machines, riding the advent of Internet technologies as well as aggressive replacements of mainframes and proprietary minicomputers, to reach a 45 percent share of server revenues. And it may take the same decade and a half – from 2010 through 2025 or 2011 through 2026, however you want to slice it – for AI servers to comprise around 45 percent of worldwide server revenues and for the AI workload to replace or augment just about every kind of application you can think of.


10 Comments

  1. Contemporary AI seems most suited to behavioral analysis and targeting (French: ciblage comportemental) with associated activities of recommendation systems, sentiment extraction, marketing analytics, targeted ad-generation, mass surveillance, and stock-market prediction. Some percentage of datacenter machinery had been dedicated to such workloads before the recent “craze” in AI (e.g. at Alphabet, Amazon, Meta, Berkshire Hathaway, the USA’s NSA/DHS/CIA, France’s DGSI/DGSE, etc.). I wonder how much of the estimated 40% share of AI servers corresponds to conversion to AI of these “classical” ops (mainly useful to large entities, specific corporations, or governments), versus how much might be for “new” applications of AI, by expansion into new fields of activity that may be of interest to individuals and small groups of people (families, friends, neighborhoods, news outlets, …). It seems to me that if the bulk of AI servers are conversions from classical ops, then their share will probably level off due to limited market size (the same relatively fixed size as before AI). Market growth (beyond replacement/upgrade) will require some new “killer-app” (KA), I think.

    We may need an AI KA that 2 billion people are each willing to shell out $500 for, essentially, to make this a proper trillion-dollar market. It would be coded in something like JavAIscript, WebAIssembly, or pAIthon … ?

    • The AI part of your killer-app will need to run in the datacenter, otherwise those servers won’t need GPUs (they’ll just be conventional servers). It seems to me that AI apps that run at the edge, or in the browser, won’t need AI servers at all.

      • Point taken! AI servers would likely be needed for TurnItIn/”TurnItOut”, the Yin/Yang of learning/cheating through academia. TurnItIn is already server-based and would be a conversion op, but “TurnItOut” may be a new and quickly growing killer-app, requiring brand new genAI servers, and for which many individual users would be willing to pay as it’d provide them with more free time in the evenings and weekends to enjoy life, play video games, post thoughtful material on social nets, experiment broadly, get better grades, jobs in positions of responsibility, higher wages, and improved self-esteem (ahem!).

        On a side-note, irrespective of whether the AI is run in servers or edge, if it accounts for, say, 1 TF/s per person then that corresponds to aggregate oomphs of 340 EF/s in the US, and 450 EF/s in the EU, both of which are larger than recently stated 2025 goals of 300 EF/s and 80 EF/s for China and India, respectively. Using TOPs (instead of TF/s) and noting that Qualcomm’s upcoming X Elite laptop NPU does 45 TOPs (INT4), suggests that those apparently astronomical 2025 goals may not be actually that outlandish … (?)

        • For my tax money, I’d say every EU citizen should get a free AMD RX7800XT (1.2 TF/s FP64 @ 263W) so we get this 450 EF/s goal done good (especially key with today’s news the EU will run out of drugs this winter!)! 8^p

          • If the EU had half the balls of Indiana, they’d give each citizen TWO Radeon RX7800XTs, and heed the impending rise of ZettaCthulhu! (eh-eh-eh!). All for less than half of the Netherlands’ GDP, or Nvidia’s market cap (incidentally, it would be 3x more expensive to do this FP64 ZettaFlopping using RTX4090s instead!).

            The EU should elect Lisa Su as president (a physicist, just like Angela Merkel) and get that Zettascale show on the road! With 4 million square km of EU surface area, the needed 263 GW can easily be provided by photovoltaics (need just 66 milliwatts per square meter)! (ih-ih-ih!)

          • I like your ideas (HuMo and Slim Jim) … but doubt they’ll work (kinda like giving every citizen 2 books to increase literacy). The EU’ll probably be more successful if it focuses on enhancing public infrastructure, with micro-, mini-, and normal- public datacenters, accessible to individual town residents, and housed in its many city halls (like computational libraries, or public transport). Those would provide 2 TF/s of oomph to each resident (for the Zettascale goal), using EPYC Zen or SR CPUs, each paired with three or four MI210 accelerators (22 TF/s FP64 @ 300W), as suggested here: https://www.nextplatform.com/2023/08/15/crafting-a-dgx-alike-ai-server-out-of-amd-gpus-and-pci-switches/ . Using your EU surface area numbers, that’s 4 mW/m^2 and thus eminently solar-powerable.

            A town of 300 folks would get a micro-datacenter (a node) with 10 CPUs and 30 GPUs (about €150k, 15kW), which scales great on both Hashcat and ResNet50, and, like a GigaIO SuperNode, would allow them to simulate 1 second of Concorde flight every 33 hours! They could also house their personal webpages on the system, and do whatever genAI job they desire. It is a vision for public AI datacenters that would surely spark decentralized innovation and computational literacy, most broadly, throughout the EU (in my mind)!

          • I have joked that every fast food joint in every town in the modern world should be using an overclocked supercomputer as a grill. Every home could use a smaller node as a heating unit? Instead of trying not to make heat, what if we made it on purpose, as a hot water heater?

  2. Good to hear about IBM’s 2025 This is Spinal Tap Power11 chip — hopefully a good 10% louder than Power10! :^b

  3. It looks to be a shorter time-frame than 15 years. The P100, arguably the first merchant accelerator specifically geared for machine learning (with HBM and NVLink), shipped in mid 2016. More specifically, not sure we can call any server workload “AI” before 2013, as the Cambrian explosion-like event of AlexNet’s domination of ImageNet that drove this resurgence in ML was late 2012.

    With respect to declining ASPs, can 70% corporate gross margins be maintained by Nvidia indefinitely? Likely not. But I wonder if the server landscape doesn’t end up looking like the PC Graphics landscape where one supplier earns healthy margins and the others either limp along or have exited.

    A lot of the competitive debate is often reduced to hardware, chip vs chip feeds and speeds if you will. I don’t believe it’s that simple. In this case Nvidia has created a platform (with millions of developers) with incumbency and leadership and the flexibility to fight competitive solutions with mature products. Meanwhile they continue to innovate, dropping products in at the top of the stack which preserves high GM transactions. (MI300 appears to be around 15 months behind H100, and MI300 performance remains to be quantified.)

    Unless competitors come up with a same-generation accelerator in a close time frame with near performance, I don’t see anything close to a halving of ASPs. A decline from these nosebleed levels? Sure. Intel basically maintained 60% GMs from 2010 to 2020, showing how dominant one supplier can be. Nvidia is still just getting ramped.

    • Software talent and companies are all over the place. There’s tons of competition there for Nvidia, from CSPs to hyperscalers doing their own, down to Hugging Face and all the other small AI companies. All they needed was an open framework that could be used on more than Nvidia GPUs. And as long as the hardware provider can optimize their hardware to those frameworks (OpenAI, ROCm, etc.), then the software lock-in Nvidia had with CUDA is gone. Then anyone can provide software solutions on any available hardware platform. So I think margins will crash for Nvidia. How much of their DC margin is in hardware vs. software services? They don’t break that out yet; both are inflated. Both will deflate as other hardware becomes available and as software developers compete using it. AMD is optimizing ROCm now for more than CDNA, extending it to RDNA, and will eventually incorporate FPGA, DPU, and CPU optimizations. That’s where AMD will have an edge: they’ll be able to sell their HSA hardware in many markets. They just need to keep hammering the optimization to the open frameworks. It could end up being a bit like the gaming GPU market, with Nvidia in the lead. But though AMD has a smaller share, they still make very good margins. Competitors need to be able to keep up on the optimizations (like drivers for gaming) to compete. That’s my take.
