AI Eats The World, And Most Of Its Flash Storage

If you want to be in the DRAM and flash memory markets, you had better enjoy rollercoasters. Because the boom-bust cycles in these businesses are true white-knuckle events.

Just as the GenAI market was having its ChatGPT mainstreaming moment in November 2022, the buildout in both personal and datacenter infrastructure that had been driven by the coronavirus pandemic for nearly three years ran out of gas, and prices for both DRAM and flash dropped by half or more as demand dried up across the IT sector. The memory and flash players took it on the chin, and inventories piled up sky high.

The hyperscalers, cloud builders, and model builders that are now driving the GenAI boom are these days probably wishing that they had time machines because demand for DRAM and flash memory is now far outstripping supply, and prices are once again going through the roof and up into the clouds.

On the DRAM front, more than half of the servers shipping in the world need hundreds of gigabytes of stacked HBM memory, and that adds up across the millions of devices that ship. For eight-high HBM3 memory stacks, it has taken three DRAM chips for every one that ended up in a working stack, because the stacking often doesn’t work right and you can’t peel the memory apart in a junk stack and reuse it. The taller the stack, the harder it is to get a working stack and the lower the yield, and with each new HBM generation, the inherent yield is also lower. So while HBM is in hot demand, it burns a lot of chips, and that is capacity in the memory fab that might otherwise be allocated to high performance DDR5 memory for servers but isn’t.
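To make that yield compounding concrete, here is a toy model in Python. The per-step yield figure is purely an assumption we picked so that an eight-high stack lands near the roughly 3-to-1 chip burn described above; it is not vendor data.

```python
# Toy compounding-yield model, not vendor data: the per-step yield below is an
# assumption chosen so that an eight-high stack lands near a 3-to-1 chip burn.

def stack_yield(per_step_yield: float, height: int) -> float:
    """Probability that every die in an n-high stack bonds and tests good."""
    return per_step_yield ** height

per_step = 0.87  # hypothetical per-die bonding/test yield

for height in (8, 12, 16):
    y = stack_yield(per_step, height)
    print(f"{height}-high stack: {y:.0%} yield, ~{1 / y:.1f} chips consumed per good chip")
```

With those made-up inputs, an eight-high stack yields about a third, a twelve-high stack under a fifth, and a sixteen-high stack around a tenth, which is the shape of the problem even if the real numbers differ.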

The flash shortage that is now plaguing the IT sector has a different issue. As with DRAM, capacity at the foundries that make flash – Kioxia, Micron Technology, Samsung, SanDisk (fab partner to Kioxia), Solidigm, and YMTC if you want to count the indigenous Chinese supplier – cannot be ramped up quickly. They rebalance their production to chase the most money when they can. The big issue here is that demand is way outstripping supply.

“You know that 2023 was pretty bad, and was in fact the worst downturn in memory market history,” Greg Matson, head of products and marketing at Solidigm, tells The Next Platform. At the time, Solidigm’s fattest flash drives were 30 TB and 60 TB in capacity. “At the end of September 2023, products started shipping again and then suddenly in Q1 2024, products started flying off the shelves. At the same time, we build the highest capacity drive, and we thought it might be just a very small portion of our demand. As it turns out, it rapidly became one of the highest growing portions of our demand.”

This was happily perplexing to Solidigm and to its peers in the flash chip and flash drive businesses, which no doubt were seeing a similar rapid ramp of flash storage revenues in 2024, one that has continued through 2025 and now into 2026. Those flash makers have benefitted nicely from price increases on the order of 50 percent to 70 percent over the past two years (end of 2023 compared to end of 2025) as demand for flash drives has outpaced supply.

What is driving this demand? Tiered storage for what Nvidia calls AI factories and what we still call AI supercomputers. (A datacenter has always been an information factory.) While storage does not dominate the budget of AI supercomputers that are contracted in gigawatt units of capacity these days, memory and storage – particularly HBM, DRAM, and flash – are as important as raw serial, vector, and tensor compute to the AI supercomputing architecture.

Just for fun, Matson walked us through the math that he recently used to explain the current situation to upper management at Solidigm.

The Nvidia AI factory architecture has four tiers of storage, designated by the letter G for reasons that are not obvious, but perhaps it is because everything in the Nvidia universe is subservient to the GPU. The G1 level is the HBM memory on the GPU accelerator package, and G2 is the DRAM memory on the host server. The recommendation is that G2 be somewhere between 2X and 4X the size of G1 so it can absorb the overflow from G1 for large context windows as AI is processing, as the sketch below shows.
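Here is a minimal sketch of that 2X to 4X sizing rule. The HBM capacity per GPU and the node size are illustrative assumptions of ours, not Nvidia specifications.

```python
# Minimal sketch of the G2 = 2X to 4X G1 sizing rule. The HBM capacity and
# node size here are illustrative assumptions, not Nvidia specifications.

hbm_per_gpu_gb = 192     # assumed G1 (HBM) capacity per GPU package
gpus_per_node = 8        # assumed GPUs per host server

g1_gb = hbm_per_gpu_gb * gpus_per_node
print(f"G1 (HBM) per node:     {g1_gb} GB")
print(f"G2 (host DRAM) target: {2 * g1_gb} GB to {4 * g1_gb} GB")
```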

Flash comes into play in the next two tiers of storage. The G3 storage is node-level, which in the case of Nvidia NVL72 machines or AMD Helios racks will be a rackscale node. This G3 tier is used to store the intermediate processing data that is created and checkpointed periodically. This checkpointing is important because AI supercomputers run with synchronous communications between the GPUs and XPUs in the system, which means that if one of them fails, then the calculation – which might take days to months – fails. By checkpointing periodically, you can load intermediate data back into the GPUs and restart the calculation from just before the point of failure rather than starting the AI training run from the beginning.
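The article does not get into how often to checkpoint, but the standard back-of-envelope is the Young/Daly approximation, which balances checkpoint overhead against expected lost work. The figures below are hypothetical, just to show the shape of the tradeoff.

```python
# Not from the article: the Young/Daly rule of thumb, T_opt ~ sqrt(2 * C * MTBF),
# is a standard way to pick a checkpoint interval. C is the time to write one
# checkpoint; both numbers below are hypothetical.

import math

checkpoint_write_secs = 120      # assumed time to dump model and optimizer state to G3 flash
cluster_mtbf_secs = 6 * 3600     # assumed mean time between failures for the whole machine

optimal_interval_secs = math.sqrt(2 * checkpoint_write_secs * cluster_mtbf_secs)
print(f"Checkpoint roughly every {optimal_interval_secs / 60:.0f} minutes")  # ~38 minutes
```

The faster the G3 tier can absorb a checkpoint, the shorter the optimal interval and the less work is lost on a failure, which is one reason the checkpoint tier is flash rather than disk.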

With the “Vera” VC100 CPUs and “Rubin” R200 GPUs in the Vera-Rubin platforms coming later this year, Nvidia will introduce a new G3.5 tier called inference context memory storage, which basically uses BlueField-4 DPUs as storage controllers inside of the node/rack to be even faster as well as to deliver some local processing on the data.

The G4 level of storage in the Nvidia AI supercomputer architecture is the network storage that stores objects and files outside of the node or rackscale system (which is just a big node). VAST Data has tweaked its architecture so it can absorb the job of G3 storage for checkpointing, which is an interesting architectural choice and one that can save AI system architects some money.

We think the Nvidia architecture should include a G5 level of storage, based on very fat disk drives. This would follow the practices of the hyperscalers and cloud builders, who buy something like 95 percent of the world’s disk drive shipments these days. As far as we know, there is no G5 storage tier in the Nvidia reference architecture.

With that out of the way, let’s do some math on flash storage. A 1 gigawatt installation using Nvidia “Grace” GC100 CPUs and “Blackwell” B200 or B300 GPUs can power somewhere between 500,000 and 600,000 GPUs, depending on who you ask, on the options in the system, and on the cooling methods used. Matson took 550,000 as an average, which seems reasonable. Nvidia recommends 15 TB per GPU of G3 storage in the nodes for checkpointing and other functions and 30 TB per GPU of external networked storage to hold bulk data.

If you do the math, that’s 8.25 exabytes of internal flash capacity and 16.5 exabytes of networked flash capacity, or roughly 25 exabytes in total, for a 1 gigawatt installation.
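Reproducing that sizing math with the figures cited above:

```python
# Reproducing the 1 gigawatt sizing math with the figures cited above.

gpus_per_gigawatt = 550_000   # Matson's working average
g3_tb_per_gpu = 15            # node-local flash per GPU
g4_tb_per_gpu = 30            # external networked flash per GPU
tb_per_eb = 1_000_000         # decimal units: 1 EB = 1,000,000 TB

g3_eb = gpus_per_gigawatt * g3_tb_per_gpu / tb_per_eb   # 8.25 EB
g4_eb = gpus_per_gigawatt * g4_tb_per_gpu / tb_per_eb   # 16.5 EB
print(f"G3: {g3_eb:.2f} EB, G4: {g4_eb:.2f} EB, total: {g3_eb + g4_eb:.2f} EB per gigawatt")
```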

After poking around on the Internet for estimates and statements by the big GPU and XPU makers, it looks like there were somewhere around 3 million compute engines (by which we mean sockets) shipped in 2023, around 7 million in 2024, and around 10 million in 2025. At 45 TB of flash per GPU/XPU, using Nvidia as a guideline, that means around 135 exabytes of flash was consumed for these AI supercomputers in 2023, around 315 exabytes in 2024, and around 450 exabytes in 2025.
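Extending the same 45 TB per socket guideline to those shipment estimates gives the yearly totals above; the socket counts are our rough estimates, not official figures.

```python
# Extending the 45 TB per socket guideline to the rough shipment estimates above.

tb_per_socket = 45                      # 15 TB node-local plus 30 TB networked
shipments = {2023: 3_000_000, 2024: 7_000_000, 2025: 10_000_000}  # rough socket counts

for year, sockets in shipments.items():
    print(f"{year}: ~{sockets * tb_per_socket / 1_000_000:.0f} EB of flash")
# 2023: ~135 EB, 2024: ~315 EB, 2025: ~450 EB
```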

That’s a lot of flash. And 2026 will get worse as a lot more demand chases a modestly growing supply, and prices will rise accordingly. The flash chip and flash drive makers are going to be making some big bucks.