Space has always been at a premium in the datacenter, but the heat is on – quite literally – to drive up the density of GPU and XPU compute, not just because real estate is expensive, but because latency is perhaps more expensive. The closer you can get compute engines and their components to each other, the lower the latency between them and the higher you can, in theory, drive the utilization of those very expensive resources.
Hence the mad engineering dash to go from 30 kilowatts in a rack to 1 megawatt in a rack and the absolute inevitability of liquid cooling across the rack (already happened) and silicon photonics for rack interconnects (slated for production perhaps in 2028 or 2029, depending on who you ask).
As the biggest OEM supplier of traditional supercomputing equipment – in large part thanks to its acquisitions of Compaq, Silicon Graphics, and Cray and the abdication from the HPC market by IBM – Hewlett Packard Enterprise has its own ideas about how to build dense machinery, and like rival Atos has built systems in recent years that have cabinetry that is much larger than standard racks. With great difficulty, Nvidia is keeping to standard datacenter rack dimensions with its “Oberon” rack, which underpins its NVL72 and future NVL144 rackscale systems, and that is perhaps putting pressure on HPE to reduce the size of its racks while driving up density.
It is interesting that rather than try to cram everything into one rack, the AMD and Meta Platforms collective has created the “Helios” double-wide rack to take on Nvidia’s Oberon racks, striking a slightly different balance on density, but one that, over time, we think will be absolutely competitive with the future “Kyber” racks from Nvidia.
The Helios rack is starting out with 72 GPUs to match the compute and interconnect capacity of Nvidia’s “Vera Rubin” platform coming next year, and will eventually scale to 128 devices in a coherent memory domain, and we think it will go even denser. And that means Helios will be able to provide the density of Nvidia racks, which will eventually go double-wide with the Kyber racks, splitting out power and cooling into one rack and compute and networking into another sitting beside it.
When HPE previewed the “Discovery” GX5000 rackscale supercomputing platform a few weeks ago – we are giving the design the same code-name as the first HPC system that will use it, which is the “Discovery” OLCF-6 supercomputer at Oak Ridge National Laboratory – you might have thought that HPE would just use the Helios rack and be done with it, given that the machine was based on future AMD CPUs and GPUs. But, as it turns out, HPE started on its new Discovery rack design long before AMD bought ZT Systems and realized it needed to design its own rack for hyperscalers, cloud builders, and model builders.
“We are seeing this not so subtle increase in TDPs and power requirements from silicon providers,” Chris Davidson, HPE’s vice president of HPC and AI customer solutions, tells The Next Platform. “And it is really forcing the hand of the community to look at what we are doing from an infrastructure perspective. We heard customer feedback on the EX4000 that this was a very big machine, with a lot of dependencies, and that they needed a smaller rack – instead of being ultra-wide, it would be wide, but not so wide, and it would still allow for the increase in TDP and power requirements that we’re seeing from the silicon providers and support warmer water intake but also fit into normal customer datacenters. And we are trying to drive more similarities to what we are seeing through the Open Compute Project community, too.”
Ahead of the Supercomputing 2025 conference next week in St Louis, HPE is giving us a sneak peek into the GX5000 compute trays and confirming the speed and scale of the Slingshot Ethernet interconnect used in these systems. Like the prior “Shasta” Cray EX3000 and EX4000 racks, the “Discovery” GX5000 racks do not have a scale up network to create a memory fabric across the GPU accelerators in the rack, which is a big difference between Nvidia Oberon and AMD Helios racks. The assumption is more traditional scale out interconnects between GPUs for either AI or HPC workloads. This does not mean that HPE could not add GPU or XPU memory interconnects when they become available.
The Cray EX4000 rack was able to power up 293 kilowatts of compute and networking, and could hold up to 64 compute blades and 64 Slingshot blades and measured 2,489 mm high by 1,181 mm wide by 1,740 mm deep. That is 5.11 billion cubic millimeters of volume for the rack.
The Cray GX5000 rack can draw 400 kilowatts of power, and it is 2,045 mm high by 900 mm wide by 1,200 mm deep, which is 2.21 billion cubic millimeters and which represents a 56.8 percent reduction in cubic volume for a rack with essentially the same power draw. There are fewer compute blades in the chassis, however. The GX5000 will support up to 40 CPU-only compute blades, up to 28 AMD CPU-GPU blades using “Venice” Epyc CPUs and “Altair” MI400 GPUs, and up to 24 Nvidia CPU-GPU blades using “Vera” Arm CPUs and “Rubin” R200 GPUs.
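The volume comparison above is easy enough to verify. Here is a quick sketch of that arithmetic, using the rack dimensions as stated:

```python
# Rough check of the rack-volume math, with dimensions in millimeters
# as given for the EX4000 and GX5000 racks.
ex4000 = 2489 * 1181 * 1740   # EX4000: ~5.11 billion cubic mm
gx5000 = 2045 * 900 * 1200    # GX5000: ~2.21 billion cubic mm
reduction = 1 - gx5000 / ex4000

print(f"EX4000: {ex4000 / 1e9:.2f} billion mm^3")
print(f"GX5000: {gx5000 / 1e9:.2f} billion mm^3")
print(f"Volume reduction: {reduction:.1%}")  # ~56.8 percent
```

The numbers check out: the GX5000 packs roughly the same power draw into well under half the volume.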
The neat thing about the HPE GX5000 design is that a row of eight racks only needs two liquid cooling side-cars, unlike the one-to-one pairing coming with Nvidia’s Kyber racks. The GX5000 row mechanicals look like this:
It looks like Nvidia still has a few things it can learn from traditional supercomputing . . . .
At the moment, there are three different compute blades that HPE is revealing for the GX5000 rack:
The GX250 is the CPU-only blade and it will be equipped with the future “Venice” Epyc CPU from AMD. To be specific, it will have four two-socket Venice processors. The specs for the Venice chips have not been divulged as yet, but at its Financial Analyst Day this week in New York, Dan McNamara, general manager of compute and enterprise AI at AMD, said that the Venice chip would have greater than 1.3X more “thread density” than the “Turin” Epyc 9005 processors announced in October 2024. That should mean the Venice Zen 6 variants have around 172 cores (probably running around 2.6 GHz) and the Venice Zen 6c variants have 256 cores (probably running around 2.15 GHz).
Here is a zoom on the mechanicals of the GX250 blade:
There are eight Venice CPUs on a blade and 40 blades in a rack, and assuming our math is right based on what McNamara said in his presentation, that will be over 55,000 Zen 6 cores and nearly 82,000 Zen 6c cores in a GX5000 rack. All of us can remember when an entire supercomputer had a few tens of thousands of CPU cores and that was pushing the envelope. Not that anyone will do this, but 300 cabinets of Venice Zen 6 CPUs would be a “capability class” supercomputer and it would have 16.51 million cores.
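For those who want to check our math, here is the core-count arithmetic. Note that the 172-core Zen 6 and 256-core Zen 6c figures are our estimates derived from McNamara’s 1.3X thread density claim, not AMD specs:

```python
# Back-of-envelope GX5000 rack core counts. Per-CPU core counts are our
# estimates from AMD's ">1.3X thread density" claim, not confirmed specs.
cpus_per_blade = 8       # four two-socket Venice processors per GX250 blade
blades_per_rack = 40
cpus_per_rack = cpus_per_blade * blades_per_rack      # 320 CPUs per rack

zen6_cores_per_rack = cpus_per_rack * 172             # 55,040 cores
zen6c_cores_per_rack = cpus_per_rack * 256            # 81,920 cores
capability_class = 300 * zen6_cores_per_rack          # 16,512,000 cores

print(zen6_cores_per_rack, zen6c_cores_per_rack, capability_class)
```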
Here’s the funny question: Will there even be a Xeon 7 or Xeon 8 version of this blade? Given this CPU-only blade was not called the GX250a, with the “a” designating AMD, it would seem not. (That said, HPE should have called it the GX250a for the sake of consistency.)
The GX350a blade is the one that will be used in the future Discovery supercomputer, the bidding of which was opened up in October 2023 and the award of which was given to HPE last month by the US Department of Energy. This blade will have a single Venice processor – we presume the Zen 6 and not the Zen 6c version that has half the cache but twice the cores – acting as a host for and as part of the memory coherency domain (thanks to AMD’s Infinity Fabric) with four Altair Instinct MI430 accelerators. (Once again, we have given the MI400 series the Altair code-name because AMD’s GPU division hates synonyms, unlike its CPU division, which loves them.)
Not a lot is known about the MI430X accelerator coming next year except that it will lean more heavily into FP64 double-precision floating point computing than the FP4 and FP8 formats that are emphasized for AI compute in the MI455X GPU that is also expected next year from AMD. The Venice part used in the HPE Cray blades will burn somewhere well north of 500 watts and the MI430X will burn 2,000 watts each, and that is 8,550 watts to maybe 8,600 watts (and maybe a little more) just for the compute engines in the blade.
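The per-blade power budget implied above works out as follows. The 550 watt figure for the Venice host CPU is our assumption – the article only pegs it as “well north of 500 watts” – while the 2,000 watts per MI430X is as stated:

```python
# Sketch of the GX350a blade compute power budget. The CPU wattage is an
# assumption (only "well north of 500 watts" is known); GPU wattage is
# the stated 2,000 watts per MI430X.
cpu_watts = 550            # assumed Venice host CPU draw
gpu_watts = 2000           # stated MI430X draw
gpus_per_blade = 4

blade_compute_watts = cpu_watts + gpus_per_blade * gpu_watts
print(blade_compute_watts)  # 8,550 watts, within the quoted range
```

That leaves a fair amount of headroom under the 25 kilowatt per-blade ceiling of the GX5000 rack for memory, NICs, and flash.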
Here is the mechanical for the GX350a blade:
We are a little bit surprised that HPE could not get two Venice CPUs and eight Altair GPUs onto a single node, given that the GX5000 is designed to support up to 25 kilowatts per blade. The EX4000 blades were capped at 11 kilowatts, so we get why this would not have happened in the older systems. The GX5000 rack can hold 28 of these blades, which is a total of 28 CPUs and 112 GPUs. We do not know how many FP64 flops this might represent.
That leaves the GX440n blade, which is based on Nvidia “Vera” CV100 Arm server CPUs and “Rubin” R200 GPUs, which are sold in a 1 by 2 configuration by Nvidia just as the prior compute complexes based on the “Grace” CG100 CPU and “Blackwell” B200 GPU were.
The GX440n blade has four of the Vera-Rubin compute complexes, which are considerably smaller than the Grace-Blackwell units, which yields four CPUs and eight GPUs per blade. With 24 GX440n blades per GX5000 rack, that works out to 96 CPUs and 192 GPUs per rack, which is a lot more CPU and GPU density than is offered with the AMD blade – 1.71X more GPUs, to be precise. Again, this strikes us as odd, and when we asked about this we did not get a real answer from HPE.
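The GPU density gap between the two accelerated blade types comes straight from the blade counts and GPUs per blade:

```python
# GPU density comparison between the two accelerated GX5000 blade types,
# using the per-rack blade counts given in the article.
amd_gpus_per_rack = 28 * 4       # GX350a: 4 MI430X per blade -> 112 GPUs
nvidia_gpus_per_rack = 24 * 8    # GX440n: 8 Rubin per blade  -> 192 GPUs

ratio = nvidia_gpus_per_rack / amd_gpus_per_rack
print(amd_gpus_per_rack, nvidia_gpus_per_rack, round(ratio, 2))  # 112 192 1.71
```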
All of the blades above can have four or eight 400 Gb/sec Slingshot network interfaces each and also come with two E1.s ruler flash cards for local storage. The GX5000 rack has a blade chassis, presumably in the back of the rack, that can hold 8, 16, or 32 Slingshot switch blades that have a single switch ASIC supporting 64 ports each.
The Slingshot network uses a dragonfly topology to link all of the CPU nodes or the GPUs on the nodes to each other so they can share data and do distributed processing. It is not a tight coupling of memory, just like InfiniBand is not and unadulterated Ethernet definitely is not, but a more traditional interconnect that is perfectly fine for simulation and modeling and also fine for many kinds of AI training. Here is what that topology looks like:
In this particular configuration above, the Slingshot 400 dragonfly network has 2,086 GPU nodes and 804 CPU nodes, plus two ranks of storage ports and one rank for the head nodes that link the whole shebang to the outside world. This is not necessarily representative of a typical configuration.
With the dragonfly network, the more switch groups you have, the larger the number of network interface cards you can have. Then you can gang up the network interfaces to get more bandwidth to any particular device. So while we have been smart-alecks about HPE not having 800 Gb/sec Slingshot switches and adapters out yet, you can definitely double up the network cards to double the bandwidth into and out of any particular device. (This obviously reduces the number of devices you can support, however.)
At the low end, with two switches per dragonfly group, you get 33 groups and that yields a maximum of 1,056 Slingshot endpoints. With 32 switches per group and 257 groups, you can drive 263,168 Slingshot endpoints. You can find spots in-between these by dialing up or down the number of switches per dragonfly group and the number of groups. The EX4000s have tended to have either 16 or 32 switches per group, which is 37,120 or 263,168 maximum endpoints.
In any event, you add the switch blades into the chassis, and they act like a modular switch for interconnecting the compute in the rack and across the racks in that dragonfly topology. With eight switches, you have 512 ports per rack, 16 doubles it to 1,024 ports, and 32 doubles it again to 2,048 ports.
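The per-rack port counts and the endpoint maxima hang together arithmetically. Note that the endpoints-per-switch values below are backed out of the stated totals, not taken from an HPE spec sheet:

```python
# Per-rack Slingshot port counts: each switch blade has one 64-port ASIC.
PORTS_PER_ASIC = 64
for switch_blades in (8, 16, 32):
    print(switch_blades, "blades ->", switch_blades * PORTS_PER_ASIC, "ports")

# Endpoint maxima for a dragonfly: groups x switches per group x endpoints
# per switch. The endpoints-per-switch values here are inferred from the
# quoted totals, not from published HPE specs.
def endpoints(groups, switches_per_group, endpoints_per_switch):
    return groups * switches_per_group * endpoints_per_switch

print(endpoints(33, 2, 16))      # low end:  1,056 endpoints
print(endpoints(257, 32, 32))    # maximum:  263,168 endpoints
```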
Here is the whole hardware and software stack for the Cray GX5000 system:
This table says up to 36 compute blades per rack, but it is actually up to 40.
That leaves us with the K3000 Cray storage system, which is meant to be integrated onto the Slingshot fabric directly alongside the compute nodes in the GX5000 system. The K3000 also supports 400 Gb/sec InfiniBand and 400 Gb/sec Ethernet links as well as Slingshot 200 and Slingshot 400 links because sometimes storage is shared with other systems. The K3000 can run the Lustre parallel file system as well as the DAOS alternative, for which HPE has taken over commercial support from Intel.
The K3000 storage server, shown above, can support 8, 12, or 16 NVM-Express performance-optimized flash drives or up to 20 drives for capacity-optimized storage nodes. Drives come in 3.84 TB, 7.68 TB, and 15.36 TB capacities, and main memory on the storage server, which is based on the ProLiant DL360 Gen12 server from HPE, can be 512 GB, 1 TB, or 2 TB, depending on the drive size the memory is caching for.
The GX5000 blades are expected to be available in early 2027, with March being a good bet given that Nvidia’s annual GTC confab happens then. The Slingshot 400 switches and NICs will be available at the same time. The K3000 flash arrays, which will be available in early 2026, deliver up to 75 million IOPS per rack against a maximum of 12.3 PB of capacity in that rack.
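That 12.3 PB per rack figure implies something about the node count, if you assume capacity-optimized K3000 nodes fully loaded with the largest drives. HPE has not said how many storage nodes fit in a rack, so this is our inference:

```python
# What 12.3 PB per rack implies for K3000 node count, assuming (our
# assumption, not an HPE spec) capacity-optimized nodes fully populated
# with 20 of the largest 15.36 TB drives.
drives_per_node = 20
drive_tb = 15.36

node_tb = drives_per_node * drive_tb   # 307.2 TB per capacity node
rack_tb = 12_300                       # 12.3 PB expressed in TB

print(round(node_tb, 1), "TB per node ->", round(rack_tb / node_tb), "nodes per rack")
```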
This hardware is set to be the foundation of the Discovery supercomputer at Oak Ridge, which plans to have Discovery installed in 2028 and up and running in 2029. This seems pretty far out, which is why the “Lux” system is coming in to fill the gap after the current “Frontier” OLCF-5 exascale-class machine, which went operational in May 2022, and ahead of Discovery. Lux is based on unnamed Epyc CPUs paired (or quadded or octoed as the case might be) with “Antares+” MI355X GPUs, and will be composed of ProLiant XD685 servers using Pensando DPUs as NICs in the nodes. The interconnect between the nodes has not been revealed, but if it was Slingshot, we presume HPE would have said so.
Lux and Discovery have a budget of $1 billion, and will be paid for with “public and private funding,” which HPE, Oak Ridge, and the Department of Energy have yet to explain. Who is paying, and for what?