When It Comes To AI, Flash Is Getting Smaller And Bigger

Sponsored: We all know AI and machine learning rely on vast amounts of data, just as we all know the Internet is largely cat videos and other user-generated content no one ever looks at.

Although the latter may well account for a large amount of the 6.72 ZB of data currently stored worldwide, according to IDC’s StorageSphere report, the former is a driving factor behind the 19.2 percent CAGR in storage volumes projected between now and 2025.

Even as system builders weigh up the pros and cons of CPUs and GPUs, IDC predicts it is the storage element of AI hardware that is set to grow fastest this year, with revenues up 31.8 percent, compared to 26.4 percent for AI servers. To put that in context, revenues for the wider global data center systems market are expected to grow 7.7 percent in 2021, according to Gartner forecasts.

And when we talk about AI storage, we’re really talking about flash. Last year, IDC cited all-flash object stores, along with new data generation and the adoption of high performance parallel file systems, as one of the reasons why storage revenue growth would outstrip server growth.

When it comes to AI, machine learning, and other data hungry applications, the advent of NVM-Express SSDs has changed everything, opening up the bottlenecks that had often left the ever-growing number of cores in modern CPUs – and GPUs – woefully under-utilized.

“The AI engines are often driven by data that you want in DRAM,” says Eric Pike, senior director for cloud enterprise flash marketing at Western Digital. “But system limitations mean there’s only so much DRAM you can put in front of a processor.”

And even if those system limitations didn’t exist, there is no escaping DRAM’s substantial price premium. “Eventually,” says Pike, “you need to reach beyond that. That’s where very large scale SSDs have become really advantageous.”

But raw capacity alone is not the only factor to consider when matching SSD storage to AI and machine learning workloads. Machine learning and AI operations can, broadly speaking, be split into two key phases, and each phase has distinct requirements and imposes different demands on the storage component of the system.

In the training phase, where the model or algorithm is being built and refined, a framework is put to work on a large training dataset. For example, a movie streaming service building a recommendation engine might train its algorithm to predict users’ preferences, using a dataset of what users had watched – and not watched – in the past, along with their other interactions with the service.

Once the model or algorithm has been built, the focus turns to inference, or the use of the trained model or algorithm in production to deliver predictions or results. In the case of the movie recommendation engine, when a viewer uses the service, the algorithm predicts and highlights other items they should enjoy.
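The split between the two phases can be sketched in miniature. The toy Python example below is illustrative only – it is not the streaming service’s actual system – and uses a simple co-occurrence count as the “model”: training scans the historical watch data once, while inference answers a single viewer’s query against the trained counts.

```python
from collections import defaultdict
from itertools import combinations

# --- Training phase: scan the historical watch data (the training set)
# and count how often each pair of titles was watched by the same user.
watch_history = {
    "alice": ["heat", "ronin", "drive"],
    "bob":   ["heat", "drive"],
    "carol": ["ronin", "drive", "amelie"],
}

co_views = defaultdict(int)
for titles in watch_history.values():
    for a, b in combinations(sorted(set(titles)), 2):
        co_views[(a, b)] += 1

# --- Inference phase: given a title the viewer just watched, rank the
# other titles by how often they co-occurred with it during training.
def recommend(title, top_n=2):
    scores = defaultdict(int)
    for (a, b), n in co_views.items():
        if a == title:
            scores[b] += n
        elif b == title:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_n]]

print(recommend("heat"))  # -> ['drive', 'ronin']
```

Note the asymmetry that matters for storage: training touches every record in the dataset, while inference only holds one viewer’s data long enough to score it.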

Training is storage intensive, relying on a broad range of data to build the model. As Pike explains, this is where “you start talking about massive scale analytics, where we use terms like data lakes, where you start talking about hundreds of terabytes, petabytes and even exabytes of data.”

By contrast, Pike explains, inference is compute intensive. “You’re analyzing data in flight as you ingest it. You don’t need a large amount of capacity, because you basically just need to hold it long enough to analyze whatever you can immediately.”

Not All SSDs Are Created Equal

As Pike explains, several characteristics differentiate SSDs from one another, including latency and quality of service, as well as read-specific performance.

However, when you focus on SSD options, random write performance stands out as one of the distinguishing factors. “In the SSD market, you get significant write performance improvement if you go from SATA to SAS, and then from SAS to NVM-Express. But then even within NVM-Express, we have different kinds of categories or classifications… generally speaking, when you get to the top end, you get the best random write performance.”

“When you look at these different ways to define performance, the fascinating thing that we’re seeing is you don’t necessarily need that extreme high performance with AI training, or very large data lake-scale workloads.”

With training, for example, “you want to pull in as much data as you possibly can – the largest available capacity, all at once, sequential write – you’re dumping that data in that large scale analytic environment, and you’re reading from it randomly, many, many times.”

Conversely, compute intensive AI inferencing involves ingesting data, often from multiple sources, and analyzing it on the fly, so random write performance becomes significantly more important. In a compute intensive environment, access density becomes a key characteristic, says Pike. He describes access density as the ability to read or write the full capacity of a device. “During AI inferencing, the data is transient and therefore the size of the device is driven by how efficiently I can access the data. I don’t need more capacity than that.”
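One common way to put a number on access density is performance per unit of capacity – for example, IOPS per terabyte. The sketch below uses that formulation with made-up figures (they are not vendor specifications) to show why a smaller drive can be the better fit for transient, inference-style data: two drives with the same IOPS ceiling differ sharply in how often each terabyte can be touched per second.

```python
def access_density(iops: float, capacity_tb: float) -> float:
    """IOPS available per terabyte of capacity (one illustrative metric)."""
    return iops / capacity_tb

# Hypothetical drives; the IOPS and capacity numbers are invented for
# illustration, not taken from any datasheet.
small_fast = access_density(iops=800_000, capacity_tb=4)   # 200,000 IOPS/TB
large_deep = access_density(iops=800_000, capacity_tb=32)  #  25,000 IOPS/TB

# Same controller performance, but the smaller device lets the workload
# sweep its entire capacity eight times more often.
print(f"{small_fast:,.0f} vs {large_deep:,.0f} IOPS/TB")
```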

Some Data Comes And Goes In A Flash

This bifurcation at the device level is echoed at the storage array level. Devices targeted at the compute-centric, in-flight data space may have just a handful of lower capacity drives, reflecting the fact that access density is key.

The arrival of PCI-Express 4.0, with its obvious implications for NVM-Express, will be another factor on the compute side, Pike says. “You’re going to want to chase that, right? Because it increases your access, and therefore may increase your utilization.”

Another transition that will influence SSDs is the proliferation of devices designed to the Enterprise and Datacenter SSD Form Factor (EDSFF) standards, which address some of the shortcomings of earlier formats originally designed for client devices, such as the lack of hot-swap capability.

There are over a dozen variants within the EDSFF standard, and, Pike says, Western Digital is working to map these to architectural system functions and application use cases, “so we can simplify how we deliver value added technologies to different form factors.”

For example, Pike says, an E1.S (S for short) device caps out at 8 TB simply because of the physical limitations of fitting NAND into the space, with the sweet spot in the 2 TB to 4 TB range. This makes the E1.S form factor well suited to compute intensive workloads.

Meanwhile, the E1.L (L is for long) format allows for much more physical storage to be put behind the controller. As Pike explains, “Today in the market, you’re going to see E1.Ls that will start at 16 TB,” with 32 TB and 64 TB solutions on the roadmap horizon.

“At the rack level, with this E1.L form factor, you can put 24 E1.Ls inside a 1U system, and tens of E1.L chassis inside of the rack, and get this absolutely astronomical amount of data storage capability in a unit volume of space.”

This raises the possibility of 1 PB of flash storage in a chassis, and over 10 PB in a rack, all operating at NVM-Express speeds. That should present a formidably large dataset for model training.
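Those figures can be sanity-checked with simple arithmetic. The drive capacities and the 24-drives-per-1U chassis count are the ones quoted above; the assumption of ten chassis per rack is ours, a conservative reading of “tens of chassis”.

```python
drives_per_1u_chassis = 24   # "24 E1.Ls inside a 1U system" (quoted above)
chassis_per_rack = 10        # assumption: conservative take on "tens of chassis"

# Drive sizes quoted above: 16 TB shipping today, 32 TB and 64 TB on the roadmap.
for drive_tb in (16, 32, 64):
    chassis_tb = drives_per_1u_chassis * drive_tb
    rack_pb = chassis_tb * chassis_per_rack / 1000
    print(f"{drive_tb} TB drives -> {chassis_tb} TB per 1U chassis, "
          f"{rack_pb:.2f} PB per rack")
```

At the 64 TB roadmap capacity, a single 1U chassis crosses the 1 PB mark (1,536 TB) and ten such chassis exceed 15 PB, consistent with the chassis and rack figures above.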

But as AI/ML models and their underlying datasets continue to grow at an explosive rate, and the systems used for both training and inference encompass ever more data hungry CPU and GPU cores, the need for data at speed and at scale will only increase.

As Pike says, “We’re still scratching the surface.”

This article is sponsored by Western Digital.
