What Sort of Burst Buffer Are You?
November 11, 2016 Dan Robinson
Burst buffer technology is closely associated with HPC applications and supercomputer sites as a means of ensuring that persistent storage, typically a parallel file system, does not become a bottleneck to overall performance, specifically where checkpoints and restarts are concerned. But attention is now turning to how burst buffers might find broader use cases beyond this niche, and how they could be used for accelerating performance in other areas where the ability to handle a substantial volume of data with high speed and low latency is key.
The term burst buffer is applied to this storage technology simply because this is the purpose for which it was originally developed, according to Laura Shepard, senior director of product and vertical marketing at high performance storage firm DataDirect Networks (DDN).
“A burst buffer is a layer of storage that has been introduced between compute and the traditional high performance storage layer, and its preliminary intention was to capture peak I/O so that a site would not have to size its storage infrastructure to be able to match this peak load. This allowed it to burst I/O into a level of storage that could then act as a buffer for those peaks and afterwards relay it out into the standard storage environment,” Shepard explains.
“However, there is a lot of development going on that is taking the products in that category beyond that original definition, and that’s certainly the case with IME,” she says, adding that there is as yet no accepted label that encompasses all of the use cases to which the technology might be applied, and so the burst buffer tag is being used even when this storage layer is not strictly implemented as a burst buffer.
IME, which stands for Infinite Memory Engine, is DDN’s burst buffer. It is essentially a software product that works with high speed flash memory, and it is available either as software only or as part of DDN’s all-flash storage appliances, the IME14K and the newly announced IME240.
“What we’ve done is we’ve built a layer of very, very fast storage between compute and traditional storage, and in this case it is based on NVMe. It arranges I/O to go into that very fast flash tier, and it can write into that fast flash tier without the constraints of a file system,” explains Shepard.
In essence then, DDN’s burst buffer is a super-fast extra layer of storage that sits between the compute nodes and the primary storage. Its capacity is typically much smaller than the primary storage, while its purpose is to absorb peak I/O throughput levels without the site having to go to the expense of engineering its entire primary storage to be capable of matching those peak levels. But once you have this fast extra storage layer, why not put it to other uses as well?
DDN believes that IME is more flexible than other burst buffer architectures and can easily be adapted to other purposes because it is primarily a software solution, and is independent of any particular compute platform. It is also application independent and requires no application changes, meaning that sites should be able to operate a heterogeneous environment where multiple applications can make use of one burst buffer, and no custom code effort is needed to leverage it.
One such additional use case that DDN sees for the technology is as a file system accelerator or an application optimizer, which represent two faces of the same use case, according to Shepard. Here, the burst buffer is essentially used as a sophisticated write cache, in order to optimize the performance of the primary storage by consolidating many small file operations into fewer, larger and well-ordered ones for greater efficiency.
“Much of what people are trying to do with a burst buffer was originally to spare the site from having to over-purchase storage performance capability, but another capability that has been added to the things a burst buffer is trying to address these days is to solve a performance issue as well, and that is the contention caused by the locking under a POSIX file system,” Shepard says.
Most large sites use parallel file systems because these can spread I/O from more nodes across more storage devices, and so deliver more performance than is possible with a traditional file system. But parallel file systems are still governed by the same POSIX rules that apply to traditional file systems, and specifically the requirement for some locking capability to ensure that one process does not overwrite the data of another process. What this means is that every update has to go through several steps, with a process acquiring access to a file, writing its updates, then releasing its lock on the file, and the upshot is a lot of latency whenever multiple processes are accessing the same data.
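The acquire, write, release sequence described above can be sketched in a few lines of Python. This is only an illustration using the local POSIX advisory byte-range lock exposed by `fcntl.lockf`; parallel file systems enforce consistency with their own distributed lock managers, and the helper name `locked_write` and the 4-byte records are assumptions made for the example.

```python
import fcntl
import os
import tempfile


def locked_write(fd: int, offset: int, data: bytes) -> None:
    """Write `data` at `offset` under an exclusive byte-range lock (hypothetical helper)."""
    # Step 1: acquire the lock; this blocks while another process holds the range.
    fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset)
    try:
        # Step 2: write the update at its offset.
        os.pwrite(fd, data, offset)
    finally:
        # Step 3: release the lock so that other writers can proceed.
        fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset)


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    # Four small writes, each paying the full lock/write/unlock round trip.
    for i in range(4):
        locked_write(fd, i * 4, b"abcd")
    result = os.pread(fd, 16, 0)
```

Every small write pays for all three steps, so when many processes target the same file the time spent on lock traffic can outweigh the time spent moving data.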
“This is the problem with file systems in getting to exascale, or beyond the current performance paradigm with standard file systems and parallel file systems. You have this POSIX locking that creates contention within the file system and the I/O and slows your performance. We’ve demonstrated this a number of times at various conferences – that when you have a lot of very fast, small I/O thrown at a parallel file system, you end up taking it down to a very small percentage of what its peak capability is supposed to be just because it’s dealing with all the requested operations,” Shepard says.
DDN says that its burst buffer can help out in this situation because it is not a standard file system. “With IME, data from the compute side is sharded anywhere it wants to, it writes anywhere within the very fast data tier as it’s coming out of the compute side, and then the burst buffer holds the data and aligns it so it can be written out in the most optimal way to the parallel file system, so the file system, instead of receiving lots of little requests that can make it thrash, is receiving large well-formed I/O that is optimal for it.”
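The general idea of holding small writes and flushing them as larger, ordered requests can be illustrated with a minimal Python sketch. This is not DDN’s implementation: the function name `coalesce`, the `(offset, payload)` representation and the assumption that buffered writes never overlap are all hypothetical simplifications.

```python
from typing import List, Tuple

Write = Tuple[int, bytes]  # (file offset, payload)


def coalesce(writes: List[Write]) -> List[Write]:
    """Merge buffered small writes into contiguous extents, ordered by offset.

    Assumes the buffered writes do not overlap (illustrative simplification).
    """
    merged: List[Tuple[int, bytearray]] = []
    for offset, data in sorted(writes, key=lambda w: w[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            # This write starts where the current extent ends: grow the extent.
            merged[-1][1].extend(data)
        else:
            # There is a gap: start a new extent.
            merged.append((offset, bytearray(data)))
    return [(off, bytes(buf)) for off, buf in merged]


# Eight 4-byte writes arriving out of order from the compute side.
small = [(o, bytes([o // 4]) * 4) for o in (12, 0, 8, 4, 36, 32, 44, 40)]
large = coalesce(small)
print(len(small), "small writes ->", len(large), "large writes")
```

In this toy case, eight small out-of-order writes reach the back-end file system as two contiguous 16-byte requests.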
Using this technique, writes in a parallel file system such as Lustre or IBM’s Spectrum Scale File System (formerly GPFS) can be 10, 20, 100 or up to 1000 times faster than without the use of the IME burst buffer, DDN asserts. And it is not just file storage that could be accelerated this way: DDN expects that the same technique might eventually be applied to other storage systems, such as object storage.
“Then you would have a very fast burst buffer and a massive active archive relying on object storage, and potentially nothing in between. Many sites are going to stay with multiple tiers of storage, but this does open the door to dealing with very fast data in one particular way and having what you like on the back end, escaping traditional tiered storage altogether for some sites and some use cases, but that’s some way down the road,” Shepard explains.
The performance increase gained from this technique can also enable the burst buffer to serve as an application optimizer tool. While this is effectively just a flip side of the same use case, “we have to measure them both because different folks will see themselves in one camp or the other; one site will know that their issue is locking contention in the parallel file system, but some will look at a similar situation and say that what is really bottlenecking is this application or specific set of applications, in which case we talk about the same capability as an application accelerator,” she adds.
The third use case for burst buffer technology is what DDN refers to as a core extender. This is aimed at delivering the required level of performance for applications with data sets that are too large to fit into main memory in the compute cluster, but for which the latency of shuffling data back and forth between memory and primary storage would be too high. In this scenario, the data is accessed from the burst buffer rather than primary storage, and so the burst buffer is used to accelerate both reads and writes.
“Core extender is something that will become increasingly important as systems get larger and larger, and the data sets for a specific problem get larger and larger,” Shepard says. “We all see this problem that now we have all this wonderful instrument data and there are supercomputers that can produce massive amounts of data for a single model, and we want to compute a lot of it together to increase the data set size to get more precise in the problem resolution.” In a lot of cases, this is going to lead to the data set growing to be larger than the amount of memory in the system itself.
“If the data set I want to compute is too large to hold in memory, and takes too much time to batch through in smaller pieces, but I don’t want to incur the latency of going all the way out to spinning media to compute every I/O for this problem, I can instead pull that data into the burst buffer and compute it off of a much faster media much more proximate to the processor, and without the contention rate of the standard tools that go all the way out to disk,” Shepard says.
Currently, this use case, alternatively known as a read-optimized application I/O accelerator, is not the largest that DDN sees for the burst buffer, according to Shepard, but demand for it is expected to grow in the future.
With additional use cases, DDN expects that burst buffer technology could find its way into places beyond the traditional HPC applications, with financial services, oil and gas, and the manufacturing sector all showing early interest in the burst buffer’s potential.
“In financial services, it is about dealing with very, very large models, having real-time analytics into existing large amounts of historic financial data to go for singular modelling. They have lots of different types of applications, lots of different data types, and they’re trying to compute it all at the same time,” Shepard says.
“It’s quite evident to me that this is going to become a more and more standard layer within HPC over the next several years, and it’s starting to cross the chasm into the commercial side for those who really need it. There is nothing else on the horizon that fills the same need, and that need is only going to increase over time.”