AMD Will Need Another Decade To Try To Pass Nvidia

Lisa Su has turned in her first ten years at the helm of AMD, and what a hell of a run it has been.

The company was a mess when she came on board two years prior to being named chief executive officer, and she used the knowledge of the game console business she had garnered while at IBM Microelectronics to take that business away from Big Blue by competing hard. Within a year of being named CEO, AMD had a battle plan to re-enter the datacenter CPU business, and soon thereafter it started laying the foundations of a datacenter GPU business that could compete with archrival Nvidia.

To be sure, AMD got lucky in that Intel’s foundry business – and thus its dependent CPU business for clients and servers – foundered, but AMD’s roadmap execution for Epyc CPUs and Instinct GPUs has been flawless. The few changes that there have been in the roadmaps were made to better intersect with technologies that made the CPUs and GPUs better, and these days AMD is fielding consistently better CPU hardware than Intel and has GPU hardware that is at parity on raw features with Nvidia.

There is much work to be done to create full systems, not just compute engines, but every dollar that AMD makes and every share of that dollar it gets to keep after paying the bills has been earned with sweat and smarts. And, AMD has added Xilinx to the fold and put out respectable and competitive client CPUs and GPUs, too, which help create another virtuous cycle that can help AMD ride out bumpy spots in the datacenter, should they come to pass again. (And they will, fear not.)

In the quarter ended in September, AMD’s revenues rose by 17.6 percent to $6.82 billion, and net income rose by a factor of 2.6X to $771 million, which is 11.3 percent of revenues. This may not be AMD’s most profitable quarter – it had a killer Q4 2020 and a very good run for profits between Q3 2021 and Q1 2022 – but this is the largest revenue that the company has ever brought in during a 13 week period. And even after acquisitions and investing heavily in research and development for future CPU, GPU, DPU, and FPGA products, the company still had $4.54 billion in cash and investments in the bank.

This is the healthiest we have seen AMD since it re-entered the datacenter in 2015 and absolutely bests its first pass through the glass house in the early 2000s with the Opterons, a time when GPUs were only used to draw pretty pictures and when AMD spent $5.4 billion to buy graphics card maker and Nvidia rival ATI Technologies. That acquisition already paid for itself through sales of client GPUs, but datacenter GPU sales from the five quarters of Q4 2023 through Q4 2024, inclusive, will also more than pay for the ATI acquisition again.

That is because, as we have been expecting all year, Su & Company have raised their guidance for GPU sales for all of 2024 to more than $5 billion, an increase of $500 million from the forecast a quarter ago and a factor of 2.5X higher than AMD was telling Wall Street to expect way back in October 2023 ahead of the “Antares” MI300 series datacenter GPU launch that hit in December.

Here is a table of the various models we built since AMD started forecasting GPU revenues for 2024 late last year:

Our models show that AMD’s Instinct GPU ramp started off a bit more slowly than we expected – compare our best case scenario from January 2024 to the quarterly GPU sales at the bottom of the table above. But starting in the third quarter and with what we expect in the fourth quarter, the ramp for Instinct GPUs is accelerating. We also think that the blended $30,000 price tag for the MI300X series GPUs and the MI300A hybrid CPU-GPU was perhaps a bit high, and that means AMD is shipping more GPUs than we originally thought.

We think the average price of an MI300 series GPU is $22,500, and that means, given more than $5 billion in sales for 2024, AMD is shipping 224,222 units. Depending on how you measure the FP64 performance for El Capitan – either on vector or tensor cores – and depending on the peak performance you expect it to have (we are guessing 2.25 exaflops), the soon-to-be world’s fastest supercomputer, installed at Lawrence Livermore National Laboratory, will have either around 36,700 or 18,350 MI300A units. Assume they count the tensor core math throughput, which is 2X the vector core math, to give peak theoretical performance for El Capitan. That would leave somewhere around 206,000 other MI300X units on the market, which works out to around 25,750 eight-way universal baseboard GPU nodes sold during the year.
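The arithmetic above can be reproduced with a quick back-of-the-envelope model. The revenue, blended price, and El Capitan peak figures below are our assumptions from this article, not AMD disclosures; the MI300A FP64 throughput rates are AMD's published peak ratings.

```python
# Back-of-the-envelope Instinct shipment model. Dollar figures and the
# El Capitan peak are our assumptions, not AMD or LLNL disclosures.
revenue_2024 = 5.045e9   # assumed: "more than $5 billion" in 2024 GPU sales
blended_asp = 22_500     # assumed blended MI300 series price per unit

total_units = revenue_2024 / blended_asp              # ~224,222 units

# El Capitan's assumed peak FP64 throughput, expressed in teraflops
el_capitan_peak_tf = 2.25e6    # 2.25 exaflops
mi300a_fp64_vector_tf = 61.3   # AMD peak FP64 vector rating per MI300A
mi300a_fp64_tensor_tf = 122.6  # tensor/matrix rate is 2X the vector rate

units_on_tensor = el_capitan_peak_tf / mi300a_fp64_tensor_tf  # ~18,350
units_on_vector = el_capitan_peak_tf / mi300a_fp64_vector_tf  # ~36,700

# If El Capitan is rated on tensor math, the remainder of the year's
# shipments are MI300X parts in eight-way universal baseboard (UBB) nodes.
mi300x_units = total_units - units_on_tensor          # ~206,000 units
ubb_nodes = mi300x_units / 8                          # ~25,750 nodes
```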

This is but a fraction of what Nvidia will do in terms of revenues and volumes. But, AMD is going to turn in the best year in its history, too. It will take a long time – and perhaps a major and highly unlikely screwup by Nvidia – to let AMD catch it. Nvidia is not Intel, which let AMD catch it once with the Itanium debacle and then again with the foundry debacle. Nvidia co-founder and chief executive officer Jensen Huang, who is a distant cousin of Lisa Su, is a driven visionary and does not need to be paranoid to survive. Nvidia helped create the next wave of computing and is benefitting from first mover advantages, including massive revenue and profit streams.

The Datacenter group at AMD is nearly twice as large as the Client group that sells CPUs and GPUs for PCs, and it is 3.8X as profitable. The Datacenter group has operating profit margins that are three times the average for the company. But due to the expense of research, development, and manufacturing, the Instinct datacenter GPU line has operating income that is lower than the company at large and is a drag on profits garnered from datacenter CPUs, FPGAs, and DPUs.

This will change over time, and at some point, as AMD and its manufacturing partners get better at this and ramp volumes higher, the Instinct line will have higher profit margins than the other datacenter products from AMD and than AMD overall.

It will take time for the Datacenter group to be larger than the rest of the company, but that could happen in 2025 or 2026. A lot depends on how many Instinct GPU accelerators AMD can make.

What seems obvious is that the AMD Instinct datacenter GPU business, which is the fastest-ramping product in the company’s long history, will soon reach parity with its Epyc datacenter CPU business if current trends persist and if our model accurately reflects AMD’s reality.

On the call with Wall Street analysts going over the numbers, Su confirmed that Instinct GPU sales were above the $1.5 billion mark in Q3 2024, but she didn’t say by how much. Our best guess is that in Q3 2023, AMD had about $50 million in Instinct GPU sales, so the $1.57 billion in sales we think AMD posted in Q3 2024 is a factor of 30.4X higher. That is, as you see above, a pretty fast ramp and as steep as anything Nvidia has done. It just comes in a lot smaller chunks of revenue than Big Green has been taking down.
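The growth arithmetic, using our two estimates (both assumptions, since AMD did not disclose either figure), works out like this:

```python
# Instinct GPU revenue growth, year on year, from our estimated figures.
q3_2023_instinct = 50e6      # assumed: ~$50 million in Q3 2023
q3_2024_instinct = 1.57e9    # assumed: ~$1.57 billion in Q3 2024

# 31.4X the year-ago level, which is an increase of 30.4X over that base
multiple = q3_2024_instinct / q3_2023_instinct
growth_factor = (q3_2024_instinct - q3_2023_instinct) / q3_2023_instinct
```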

In the quarter, we think AMD did about $1.84 billion in Epyc CPU sales, up 24 percent year on year and up 9.9 percent sequentially. With the “Turin” CPUs launched, we will be very keen on what Intel has to say in a few days about sales of “Sierra Forest” and “Granite Rapids” Xeon 6 processors that have to compete against them. We could see 2025 be the year when AMD and Intel have equal revenue share of server X86 processors.

The question now is whether Su will stick around for another ten years to try to reach revenue parity with Nvidia. It may take that long, and at 55, she still has time to do it. And, importantly, Su is six years younger than her cousin also in the GPU business.

May the next decade also not be boring. Beating Nvidia is going to be a lot harder than beating Intel was.


5 Comments

  1. “Beating Nvidia is going to be a lot harder than beating Intel was.”
    AMD doesn’t need to beat Nvidia to be a highly profitable, important company. A close second to a goliath is still a giant. Moreover a close second can provide choice in the market, and help limit the premium Nvidia can charge for their products.

    • Exactly. Could not agree more. But that is not what Lisa Su is thinking about when she gets out of bed each morning between now and 2034.

  2. I just love those Instinct devices that AMD’s got going here. I mean, looking at the Great Danes comparison Table (TNP Oct. 23, 2024), El Capitan’s MI300A cost per FP64 TF/s is around 1/8th that of H100 solutions … and still advantageous at FP16! Hopefully they (AMD), and open source outfits, can resolve whatever AI software challenges might remain at this stage, since their tech does look to have the (much) better price/performance ratio across use cases, from FP64 HPC to FP16-and-less AI — IMHO.

  3. On AMD's change in DC revenue from q2 to q3, estimating what is not CPU revenue from the change in CPU channel volume q/q, I have q3 DC revenue split 50 : 50 between Epyc and Instinct. I am aware that on the financial call Arcuri queried Instinct revenue at $1.5 B, Rasgon tested $1.7 B, and Su tempered it to somewhere closer to above $1.5 B but not $1.7 B, with FPGA in DC revenue.

    For Instinct I rely on a gross of $16,500 and subsequently a $33K consulting VAR and/or design-build engineering ‘street price’ bundling in their cost of self-training, education, and customer-focused field / applications engineering assistance. Hyperscale and cloud have the same costs (consideration) and that difference needs to be reflected in the end buyer price. Accelerated compute is costly to set up, and the training cost in manpower and in-memory problem frameworks is expensive.

    For Nvidia H100 my q2 gross is $16,123 and street $21,456 so approximately the same gross as AMD. My H800 q2 gross is $11,106 and $14,780 street.

    Instinct quarterly volume on a net basis showing q2 gross < R&D, < MG&A, < Restructuring, < tax = $7050.60 that covers all of Instinct appliance with subsystems build cost.

    Instinct revenue on an even split of AMD q3 DC = $1,750,000,000, based on the change in CPU revenue q/q that is ZERO on channel supply, so Epyc q3 revenue is roughly the same as last quarter, from which I calculate volume.

    Epyc per unit price is up in q3, which affects volume. Su states +25% and I have that increase as much higher and am still working on it. My Client and Game q/q revenue +25% and <29% respectively on channel data are spot on with Su. I will state AMD q3 full line volume within the week.

    I will discount Dr. Su's guidance, said closer to $1.5 billion, and rely on channel data for Epyc q/q from total DC revenue showing other, and I'll give it all to Instinct. GPGPU, big FPGAs, they're all accelerators.

    q3 2024 at $1.75 B = 248,205 units down to Dr. Su's 212,748 units
    q2 2024 = 143,089
    q1 2024 = 152,127
    q4 2023 = 25,204

    Instinct all up to date = 561,534 units and let's just say AMD does another 248,205 units in q4 = 816,830 and since corporations like round numbers it's probably more like 1 million.

    Now you can divide my stated volume / 2 to get back from gross into revenue, or half the volume I stated, or 408,415 units, which is still +67% more than TPM's 242,222 units in 2024.

    But wait,

    $5,759,138,421 in Instinct 2024 revenue = 816,830 units at marginal revenue = $7050.60 = marginal cost $7050.60 that goes to TSMC = $14101.20 subtracted from gross $16,500 and AMD earns $2398.72 per card subsystem in appliance. However, AMD's take can be more.

    TPM's 2024 total for 242,222 units into $5,759,138,241 = $23,776.28 per subsystem in appliance, or at $5 B revenue = $20,642.21.

    How much more? In this infra-marginal example, R&D cost in q2 is 27% and in q3 24%, so there is a cost savings q/q, sans weighing q3's net price adjustment down.

    Mi300, like H100, sees its price slide at run end, supporting supply elasticity; this is a competitive market. AMD and Nvidia are not raising any generation's past-peak production volume price, they are lowering that price. No one's waiting around for Intel Gaudi or other acceleration options to ramp up.

    In q2 AMD R&D cost at 27% applied to Instinct Mi300 is $4476 and in q3 $3960. But is it really? This is where AMD, and Nvidia, actually make their profit. Whatever the actual R&D cost taken is in relation to the stated charge, that is an 'infra marginal' savings which falls to the bottom line as net profit.

    Mike Bruzzone, Camp Marketing

    • Clarification and correction; My above comment observes an AMD profit of $2398.72. Whoops, this is actually a premium over average marginal revenue; $7050.60 + the premium $2398.92 earns AMD $9449.52 per Mi3x0_ subsystem. On the flip side TSMC’s cut of Instinct total potential revenue at AMD gross price $16500 is $7050.60. The economic rule on stakeholder / 2 presumes AMD and TSMC split a product’s total revenue potential. On a gross basis that would be $8250 each. The rule goes on to presume that TSMC’s cost is one half of their revenue take or between $3525.30 and $4125. On the AMD side however the real profit potential comes from not taking the full R&D charge in time. For TSMC, reducing marginal cost per unit on manufacturing learning or input cost reductions over time increasing marginal revenue.

      We can drop $16500 into my Nvidia Ampere model based on actual (change in) quantiles supplied over the Ampere full run. I have Ada actual supply over time as well as RDNA lll with their precise change in end sale price data but have not yet updated the model to change in quantity and change in price.

      Note the Ampere model relies on a static suggested price so I am expecting a level of precision better when I complete the Ada and RDNA lll models. My completion trigger for Ada updated model is on the first indicator of gamer whine about Blackwell RTX card price so I can calm them down on logical reasoning.

      Here is Ampere change in quantities over 5 quarters with a cost optimized tail;

      0.15500
      0.19668
      0.32804
      0.20491
      0.11534

      $16500 cost over time on volume can now be estimated;

      At unit 1 = $16500
      Between 16% and 34% of full run = $13,003.70
      Between 35% and 67% of full run = $9,883.03
      Between 68% and 87% of full run = $6585.61
      Between 88% and 100% of full run or run end = $3686.93 that is TSMC cost of production

      At $16500 total revenue potential, marginal cost here is $8856.59 and marginal revenue = $7643.41.

      This model has worked to provide the marginal cost reduction of any Nvidia xxGPU subsystem over a five-period production run and exposes the dGPU kit price, to the AIB, or the card cost in the fifth period.

      Let’s look at RTX 4090 FE on its $1599 regulating price cap and the percentage over full run is the same as in my first example;

      $1599
      $1260.18
      $958.72
      $638.21
      $357.30 is the price of AD102 kit to the AIB
      Price to AIB is confirmed on a bottom up TSMC cost to produce assessment.

      4090 FE average marginal cost = $858.20
      4090 FE average marginal revenue = $740.72

      I’ve found this model works for any Nvidia dGPU or GPGPU subsystem to identify the cost of the key components, average marginal cost, and average marginal revenue of the dGPU or GPGPU subsystem.

      Mike Bruzzone, Camp Marketing
