Existing Storage Infrastructure is the Bottleneck for AI

There is little question that storage and I/O are hot topics once again, with the arrival of a host of new technologies that have displaced the tried and true from decades past (disk, bulky parallel file systems, etc.).

This new array of tech, when coupled with the demanding and different workloads that AI training and inference present, adds a pressing quality to storage startups that might have been too early to market if not for deep learning. These include storage startups coupling 3D XPoint and QLC flash as well as NVMe over fabrics.

There are only a few such startups and we tried to integrate them into the storage and I/O section of The Next AI Platform event in May to provide a well-rounded view of how storage shifts with AI in the mix. Among those startups we interviewed live on stage at the sold-out event was Vast Data, whose founder and CEO, Renen Hallak, gave us a look inside how their company blended new and old to match emerging demand.

“Existing storage infrastructure is the bottleneck for AI,” Hallak told the audience.

“AI has been rejuvenated in the last few years and leans on some very old ideas and algorithms with a lot of new computation and lots of very fast access to massive datasets. Those are key pieces to AI’s success. But existing storage systems weren’t built for these workloads. HPC has been thrown onto the storage stage but HPC workflows are pretty much the opposite of AI workflows,” he explains.

“Traditional storage systems were always scale limited because of the way they were architected. NVMe over fabrics was one of the best new technologies that enabled us for the first time to build disaggregated shared-everything architectures. That shared-everything approach is key because it allows us to scale to tens or even hundreds of thousands of nodes under a single name space without partitioning data or needing to compromise on the linear uplift of performance,” Hallak adds.

“In the old workloads where you had to burst the data down very quickly without the need to read much of it, burst buffers and caching layers worked well. But for these new random read workloads that span sometimes hundreds of petabytes you need fast access to the entire dataset. The big challenge there is not performance, it’s cost. Because flash is a lot more expensive than disk and to build a multi-petabyte flash store you need new technology that lowers cost significantly.”

He says Vast Data’s advantage is that they started late. “We had a glimpse of the requirements for these new workloads in AI but also glimpses at some of the new technologies that, when architected properly, could answer the challenges.”

For a more in-depth look at Vast’s tech under the hood, check out this deep dive from earlier this year.
