
AMD Says “Helios” Racks And MI400 Series GPUs On Track For 2H 2026


While releasing an update to its InferenceX AI inference benchmark – formerly known as InferenceMax, and thus far only run on Nvidia and AMD systems – the analysts at SemiAnalysis correctly declared that only Nvidia, AWS, and Google have created rackscale systems that are deployed today, and that AMD was still working on it.

And to be more precise, let’s name the rackscale platforms. Nvidia has its “Oberon” NVL72 rackscale systems using its “Blackwell” B200 and B300 GPU accelerators. AWS has its Trainium2 and Trainium3 accelerators, and Google has seven generations of TPU systems that it has brought to market over more than a decade. (It depends on how you want to draw the lines with memory coherency and flexibility with TPUs – you could argue for fewer generations, because TPU v1, v2, and v3 used hard-wired torus interconnects, and only with TPU v4 did Google employ its “Palomar” optical circuit switch to create reconfigurable layers in the torus interconnect.)

As you well know, AMD has been working with Meta Platforms on the Open Rack Wide v3 specification, a double-wide rack that AMD calls “Helios” and that is the platform for delivering rackscale compute based on the “Altair” MI400 series GPU accelerators. This series includes the MI450, the MI430X, and the MI455X for the Helios rackscale systems, which will have 64, 72, or 128 GPUs per system, and possibly the MI440X for eight-way system nodes.

Here are the feeds and speeds for the Helios rack:

In the SemiAnalysis report, which we saw because we subscribe to the publication, the authors made this statement:

Engineering samples and low volume production of AMD’s first rack scale MI455X UALoE72 system will be in H2 2026, while due to manufacturing delays, the mass production ramp and first production tokens will only be generated on an MI455X UALoE72 by Q2 2027.

This was a bit perplexing to us, since AMD did not mention any delays in its most recent quarterly results from a few weeks ago. This would no doubt be a material delay, and one that would run the company afoul of the US Securities and Exchange Commission had it been true at the time. It was possible that the delay hit after the results were posted on February 3. The further rumors on the street were that the Helios rack system delays were due to some thermal issue in the design.

This is one of the reasons why we sat in on a conference call hosted by New Street Research last week with Forrest Norrod, general manager of the Data Center Solutions business group at AMD, and Doug Huang, one of the co-founders of ZT Systems, which AMD acquired last year to underpin its rackscale system engineering efforts, and now senior vice president in charge of the company’s Data Center Platform Engineering group.

Norrod was having none of it when asked about a delay with the Helios racks:

“I have no idea where this purported issue around thermals is coming from,” Norrod said unequivocally. “I literally have no idea. We have no significant thermal issue. The risk around the thermal design at the component level all the way through the rack level was retired quite some time ago. So, no idea where that’s coming from. I think the meta question that you are asking is when we expect to see the ramp, and are we on track. Lisa showed the first silicon, we are right on track with where we thought we would be, both in terms of the readiness of the overall solution as well as the readiness of the silicon. And we are highly confident of ramping Helios in high volume in the second half of the year.”

So that is that. The reason why AMD can be confident was explained by Huang, who showed off two charts and explained that AMD uses what amount to dummy hot plates to simulate the CPUs and GPUs long before the real chips come back from the fabs, so the company knows that chips matching the expected physical specs and thermals will fit into the racks and that the racks will work right.

And here is the Helios development and manufacturing model:

This may be AMD’s first rackscale design, just as the NVL36 and NVL72 rackscale machines that were supposed to be based on Hopper were Nvidia’s first – and those never came to market except for one machine sold to AWS. But this is not the first rackscale machine that ZT Systems has developed, and it is not the first one for Meta Platforms, either, which started the Open Compute Project back in 2011 with server, rack, and datacenter designs. Getting Helios out the door on time and manufactured in volume is why AMD spent $4.9 billion to acquire ZT Systems back in August 2024. Not wanting to be in the server manufacturing business is why AMD sold the manufacturing arm of ZT Systems to Sanmina for $3 billion last fall.

One of the big changes is that with rackscale systems, AMD will be delivering whole racks through what the industry calls a New Product Introduction, or NPI, partner. We are reasonably certain that this initial Helios NPI partner is none other than Sanmina, which now owns the ZT Systems manufacturing business – a business that, as we pointed out a year and a half ago, was a $10 billion system manufacturer with rackscale system cred when AMD bought it. (ZT was the biggest server maker you probably never heard of, one that never made the OEM or ODM lists.)

Every single component is tested, and then tested again when it is assembled into racks; the racks are tested before they ship, and the shipping process itself is tested before the real racks go out so they can handle the stress of travel to customer datacenters. All of this has to be simulated and mocked up long before the first real rack is assembled in the NPI facility – what Huang called “early risk retirement.”

The issue is that all of this mockup and testing has to be done in parallel because time needs to be compressed. You cannot wait for the real components to be available for prototypes and then early builds and then hope the rack works out.

“The factories nowadays are really extensions to the labs, because we go so fast that we don’t have the time that we have had before to do everything in series,” Huang explained. “So it is super-important to have an environment and a partner [that] understands that, and that the supply chain has so many components in this complex solution. We are focusing a lot of energy to make sure that the whole ecosystem is enabled.”