The Rise of Flash Native Cache
March 8, 2017 Nicole Hemsoth
Burst buffers are growing up—and growing out of the traditional realm of large-scale supercomputers, where they were devised primarily to solve the problems of failure at scale.
As we described in an interview with Los Alamos National Lab’s Gary Grider, who created the burst buffer concept, the “simple” problem of checkpointing and restarting a massive system after a crash by using a fast caching layer would become more important as system sizes expanded, but the same approach could also extend to application acceleration. As the notion of burst buffers expanded beyond HPC, companies like EMC/NetApp, Cray, and DataDirect Networks (DDN) picked up the ball and rolled out commercial burst buffer offerings to snap into larger storage system deals. Meanwhile, large HPC sites like NERSC began experimenting with just how useful burst buffers could be beyond mere checkpointing.
The developments around burst buffers, coupled with growing budget allocations for flash, have pushed the few companies offering such fast caches further into the mainstream enterprise. In fact, the term “burst buffer” was so closely tied to HPC that DDN says it is looking for a way to recast the flash-based technology for wider appeal. DDN’s James Coomer says the company would rather describe burst buffers as a “flash-native cache” that can sit above fast storage and go far beyond checkpointing to include data protection, the ability to talk to anything based on POSIX or MPI-IO, and the ability to optimize data for a parallel file system (in DDN’s case, Lustre or GPFS).
“The prototype burst buffer idea was a checkpointing system, a big write cache that could absorb huge amounts of data from many threads and dump that into a flash-based system such that if the application fell over, it could re-read that checkpoint and start again. This was great for high-concurrency, near-exascale systems given the failures; it was an enabler of larger-scale systems,” says Coomer. However, that same idea is broadening into more of a generic distributed flash platform for all kinds of I/O. “People want a more generalized platform for flash and can now afford to have large scale-out flash deployments that can address far more problems: reads, writes, mixed I/O and random I/O.”
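The checkpoint/restart cycle Coomer describes can be sketched generically in Python. This is a minimal illustration under stated assumptions, not DDN’s IME interface: the function names, the pickle file format and the simulated crash are all hypothetical, and on a real system the checkpoint path would sit on the burst buffer tier so the write completes at flash speed. Writing to a temporary file and then renaming it means a crash in the middle of a write never leaves a half-written checkpoint behind.

```python
import os
import pickle
import tempfile

# Hypothetical helpers illustrating the generic checkpoint/restart pattern;
# this is not a vendor API.

def save_checkpoint(state, path):
    """Write state to a temp file in the same directory, then atomically rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to the device before renaming
        os.replace(tmp_path, path)  # readers see the old or new checkpoint, never a partial one
    except BaseException:
        os.unlink(tmp_path)
        raise

def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)

def run(path, total_steps, checkpoint_every, crash_at=None):
    """Toy 'simulation' that sums step numbers, checkpointing every few steps."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state
```

In this toy run, a failure at step 7 loses only the two steps since the last checkpoint at step 5; rerunning resumes from step 5 and reaches the same final state (step 10, total 45) as an uninterrupted run.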
So if burst buffers are growing up, where and how will they fit in, and for what high-value applications? First, consider what a time-series database in financial services, or a machine learning application, needs from a hardware perspective. For instance, one use case for DDN’s IME burst buffer involves the KX tick database for investment banking. In that case, a number of nodes run the in-memory database and perform rapid analysis on the contents of memory. The challenge is a wealth of historical data that must be fed into the nodes’ memory subsystems as fast as possible. The bottleneck, of course, is keeping that memory fed, and a flash cache fits the bill for large datasets with intense concurrent read operations.
Consider too the needs of machine learning, and neural networks in particular. These nodes are often equipped with accelerators, which add a great deal of compute performance, but the neural networks must be fed from a training set as fast as possible. There might be a hundred or even thousands of dense compute nodes that each need to be fed hundreds of examples at a time, with each iteration taking a few seconds. Across so many nodes, this becomes a big I/O problem, and one that is the opposite of the traditional write-heavy burst buffer problem: high concurrency, large datasets, and the need to feed the memory beast via large, concurrent read operations from the flash cache.
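The size of that read problem can be estimated with back-of-the-envelope arithmetic. The numbers below are illustrative only: 1,000 nodes, 256 examples per node per iteration and a two-second iteration loosely follow the ranges quoted above, while the 110 KB average example size is an assumption (roughly the scale of a compressed ImageNet JPEG), not a DDN figure.

```python
# Back-of-the-envelope read demand for distributed training.
# All inputs are illustrative assumptions, not measured values.

def aggregate_read_bandwidth(nodes, samples_per_iteration, bytes_per_sample, seconds_per_iteration):
    """Bytes per second the shared storage must deliver so no node waits for data."""
    if seconds_per_iteration <= 0:
        raise ValueError("seconds_per_iteration must be positive")
    bytes_per_iteration = nodes * samples_per_iteration * bytes_per_sample
    return bytes_per_iteration / seconds_per_iteration

def reads_per_second(nodes, samples_per_iteration, seconds_per_iteration):
    """Read operations per second, assuming one read per training example."""
    if seconds_per_iteration <= 0:
        raise ValueError("seconds_per_iteration must be positive")
    return nodes * samples_per_iteration / seconds_per_iteration

if __name__ == "__main__":
    bw = aggregate_read_bandwidth(nodes=1000, samples_per_iteration=256,
                                  bytes_per_sample=110_000, seconds_per_iteration=2.0)
    ops = reads_per_second(nodes=1000, samples_per_iteration=256, seconds_per_iteration=2.0)
    print(f"{bw / 1e9:.2f} GB/s")  # → 14.08 GB/s
    print(f"{ops:,.0f} reads/s")   # → 128,000 reads/s
```

Because every iteration repeats, this demand is sustained rather than bursty, and it arrives as many small concurrent reads, which is why the read side of a flash cache matters for these workloads.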
Coomer says that, even out of the box, machine learning shops and others with similar access requirements are already ahead. “All the training data or historical data is already sitting on low-latency flash, which is distributed with a shared namespace and available to all the nodes. It’s already a good start to have this sitting on low-latency media.” DDN says it is now working with an unnamed hyperscale company to prove out this concept and has run successful benchmarks on ImageNet, with more deep learning frameworks to be tested in the near future.
The use cases for the more mature burst buffer are still found in commercial and research HPC, with oil and gas, financial services, manufacturing, life sciences, and other market segments seeing its value for key applications. But DDN expects to see a rise in interest on the large-scale analytics side: the machine learning use cases described above, as well as more real-time or streaming analytics workloads inside the enterprise.
The burst buffer idea might have HPC appeal, but as DDN and others work to push the same concept out to a more general user base, there are still some questions about what it can provide. Mostly, Coomer says, people seem worried about integrating their applications, especially at smaller sites. The most frequent questions are what to do with the scheduler to make it burst buffer friendly, how to instruct users to get the most out of it, and what optimizations need to be made. He says that while it can run out of the box (especially for applications whose I/O demands could benefit from flash to begin with; not all can get a speedup), the toughest part is undoing all of the optimizations that were made to get around the limitations of parallel file systems. This involves increasing the number of I/O threads, and that change, along with scheduler adjustments, can be made relatively quickly. On the scheduling front, the goal is to manage the queue in mixed application environments so that applications that are slow on a parallel file system can be bumped to the buffer. Because DDN’s IME burst buffer is integrated with the file system, nothing looks any different from the user’s perspective after these optimizations.
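How a scheduler might decide which jobs to bump to the buffer can be sketched as a simple greedy policy. Everything here is a hypothetical illustration rather than the logic of DDN’s product or any real scheduler: the `Job` fields, the 30% I/O threshold and the capacity check are assumptions. Jobs that spend the largest share of their runtime waiting on the parallel file system are placed first, as long as their data still fits in the remaining cache.

```python
from dataclasses import dataclass

# Hypothetical job model and placement policy, for illustration only.

@dataclass
class Job:
    name: str
    io_seconds_on_pfs: float  # profiled time spent in I/O on the parallel file system
    compute_seconds: float    # profiled time spent computing
    footprint_gb: float       # data the job would stage into the flash cache

    @property
    def io_fraction(self) -> float:
        total = self.io_seconds_on_pfs + self.compute_seconds
        return self.io_seconds_on_pfs / total if total > 0 else 0.0

def assign_tiers(jobs, cache_capacity_gb, io_threshold=0.3):
    """Greedily send the most I/O-bound jobs to the flash cache until it is full."""
    placement = {}
    remaining = cache_capacity_gb
    # Consider the jobs that suffer most on the parallel file system first.
    for job in sorted(jobs, key=lambda j: j.io_fraction, reverse=True):
        if job.io_fraction >= io_threshold and job.footprint_gb <= remaining:
            placement[job.name] = "flash_cache"
            remaining -= job.footprint_gb
        else:
            placement[job.name] = "parallel_fs"
    return placement

if __name__ == "__main__":
    jobs = [
        Job("seismic_migration", io_seconds_on_pfs=600, compute_seconds=400, footprint_gb=300),
        Job("cfd_checkpointing", io_seconds_on_pfs=200, compute_seconds=800, footprint_gb=100),
        Job("tick_replay", io_seconds_on_pfs=500, compute_seconds=500, footprint_gb=500),
        Job("dl_training", io_seconds_on_pfs=450, compute_seconds=550, footprint_gb=200),
    ]
    for name, tier in assign_tiers(jobs, cache_capacity_gb=600).items():
        print(f"{name:20s} -> {tier}")
```

With a 600 GB cache, the seismic and training jobs land on flash, while the tick-replay job stays on the parallel file system despite being I/O-bound, because its 500 GB footprint no longer fits. A production policy would also weigh queue wait, priorities and staging time; greedy ordering is simply the easiest to reason about.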
Ultimately, Coomer says, the flash is just the hardware component; it takes a tuned software stack to get beyond the limitations of file systems, which are the real I/O problem many sites face. “The file system often gets in the way of the performance and potential of flash devices. Even if the application is able to issue huge IOPS or random I/O, there is still a bottleneck. The file system can also get in the way of shared file access, something we see in seismic applications, for instance.” The trick for burst buffer makers is to deliver a flash platform that compensates for the weaknesses of parallel file systems, along with support that helps users undo everything they have hard-coded to work around those issues, including limiting the number of I/O threads, which is unnecessary with the flash cache of a burst buffer.
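The shared-file access pattern common in seismic codes, where many writers fill disjoint regions of one large file, can be illustrated with Python threads standing in for processes spread across nodes. This is a minimal sketch under assumptions: it relies on POSIX `os.pwrite` (unavailable on Windows), the function name and parameters are hypothetical, and real applications would more typically use MPI-IO. The `num_writers` argument plays the role of the I/O thread count that sites often cap to work around parallel file system limits, and that a flash cache is meant to let them raise.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def write_shared_file(path, num_writers, block_size):
    """N-to-1 write: each writer fills its own non-overlapping block of one shared file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        def write_block(rank):
            # Writer `rank` owns bytes [rank * block_size, (rank + 1) * block_size).
            data = bytes([rank % 256]) * block_size
            offset = rank * block_size
            written = 0
            # pwrite may write fewer bytes than requested, so loop until the block is done.
            while written < block_size:
                written += os.pwrite(fd, data[written:], offset + written)
            return written

        # pwrite takes an explicit offset and the regions never overlap,
        # so the writers share no file pointer and need no lock.
        with ThreadPoolExecutor(max_workers=num_writers) as pool:
            total = sum(pool.map(write_block, range(num_writers)))
    finally:
        os.close(fd)
    return total

if __name__ == "__main__":
    import tempfile
    path = os.path.join(tempfile.mkdtemp(), "shared.dat")
    print(write_shared_file(path, num_writers=16, block_size=1 << 20))  # → 16777216
```

Whether raising `num_writers` actually speeds things up depends on the storage underneath; on many parallel file systems, concurrent writers to a single file contend for locks, which is the kind of shared-file bottleneck Coomer points to.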