A Storage Model for the Exascale Crowd

One bit of news that almost got lost in the shuffle at the recent ISC 2019 conference was Intel’s announcement that it is bringing DAOS, the Distributed Asynchronous Object Storage platform, to supercomputing.

Its first big test will be in Aurora, Intel’s exascale supercomputer that is scheduled to boot up in 2021. During the conference, Raj Hazra, Intel’s Corporate VP of the Datacenter Group, mentioned DAOS rather briefly during his keynote, but the company appears to have pinned a good deal of its hopes on the open source technology for its HPC strategy going forward.

And they’re not the only ones. Over the last several years, DAOS has been something of a rallying cry in the supercomputing community, especially at the Department of Energy, where researchers are looking for a way to move HPC storage beyond the confines of Lustre and its POSIX-based model. The general idea is to provide a more all-encompassing storage paradigm that can support many different kinds of data access models and, while they’re at it, to make the storage system more performant and scalable.

That last goal was the original impetus behind the DOE’s interest in the technology, which resulted in the agency funding Whamcloud (later acquired by Intel), EMC, DataDirect Networks, Cray, and The HDF Group under its FastForward program. The rationale was that DAOS would be capable of replacing Lustre (while retaining support for it) on the DOE’s upcoming exascale machinery. So not only would it be possible to access POSIX files, but also things like HDF5 datasets, multi-dimensional arrays, and key-value stores. The paradigm is that everything becomes a container of objects.
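To make the “everything is a container of objects” idea concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not DAOS code. DAOS itself is programmed through a C library (libdaos). The classes below (`ToyContainer`, `ToyObject`, `write_file`, `read_file`) are hypothetical and do not mirror that API. The one borrowed idea is that DAOS addresses values inside an object with two keys, a distribution key (dkey) and an attribute key (akey). Mapping a file onto one dkey per fixed-size chunk is an assumption chosen for illustration.

```python
from typing import Dict, List, Union

# Hypothetical model, NOT the libdaos API: values inside an object are
# addressed by two keys, a distribution key (dkey) and an attribute key (akey).
Value = Union[bytes, int, float, str]


class ToyObject:
    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Value]] = {}

    def put(self, dkey: str, akey: str, value: Value) -> None:
        self._data.setdefault(dkey, {})[akey] = value

    def get(self, dkey: str, akey: str) -> Value:
        return self._data[dkey][akey]

    def akeys(self, dkey: str) -> List[str]:
        return sorted(self._data.get(dkey, {}))


class ToyContainer:
    """In this sketch a container is simply a namespace of named objects."""

    def __init__(self) -> None:
        self.objects: Dict[str, ToyObject] = {}

    def open_object(self, name: str) -> ToyObject:
        if name not in self.objects:
            self.objects[name] = ToyObject()
        return self.objects[name]


CHUNK = 4  # hypothetical chunk size in bytes; real systems use far larger chunks


def write_file(obj: ToyObject, payload: bytes) -> int:
    """Store a byte stream as one dkey per fixed-size chunk; return the chunk count."""
    nchunks = 0
    for i in range(0, len(payload), CHUNK):
        obj.put(str(i // CHUNK), "data", payload[i:i + CHUNK])
        nchunks += 1
    return nchunks


def read_file(obj: ToyObject, nchunks: int) -> bytes:
    """Reassemble the byte stream by fetching chunks in order."""
    return b"".join(obj.get(str(i), "data") for i in range(nchunks))


cont = ToyContainer()

# A POSIX-style file, a key-value record and an array cell live side by side.
f = cont.open_object("results.dat")
n = write_file(f, b"abcdefghij")

kv = cont.open_object("particles")
kv.put("p42", "x", 1.5)
kv.put("p42", "energy", 0.25)

grid = cont.open_object("temperature")
grid.put("row0", "col1", 300.0)
```

The point of the design is that a file, a record and an array cell are all reached the same way, by object name plus keys. Any of them could therefore be served by the same storage engine.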

DAOS has been in the research stage for a while. When we originally reported on it in 2015, the DOE saw it as a path to a post-Lustre world, bringing object storage to supercomputing. At the time, Gary Grider, head of high performance computing at Los Alamos National Lab, told us DAOS would enable the abstraction of data structures that “happen to live in persistent storage as opposed to memory.” The technologies that would make this possible were burst buffers and, more generally, solid state storage, which enable indexed data to be retrieved much more rapidly than is possible with spinning disks.

Of course, with NVMe rapidly penetrating datacenters and storage class memory, such as Intel’s Optane persistent memory, making its way into HPC servers, high-performance NVRAM storage is on its way to becoming mainstream in the exascale era. When we caught up with Grider recently, we found that his focus had shifted as well.

From his perspective, the biggest advantage to DAOS nowadays is that it will be able to handle both conventional block-oriented files and key-value, record-oriented data with equal dexterity. The latter is associated with AI and machine learning workloads, which, as readers of this publication are well aware, are also on their way to becoming mainstream in HPC. Post-simulation analytics is another big area that stands to benefit from moving from a file-oriented approach to a record-oriented one.

The downside is that for existing applications, using DAOS will require some development effort, which is no small thing for software with millions of lines of code. Grider admits that for traditional HPC, ditching Lustre for this new paradigm is going to happen slowly because of the amount invested in legacy codes. “For AI and machine learning, it’s all brand-new,” he tells us. “They can hit the ground running with a different I/O interface.”

That doesn’t mean the DOE is uninterested in upgrading its existing code base. Even in traditional HPC, the data deluge is stretching Lustre’s ability to handle the multitude of files now being created by scientific simulations. A recently published Los Alamos project illustrates some of the functionality of DAOS, in particular its ability to manage both object data and metadata much more efficiently. In this case, the researchers developed a storage management application called DeltaFS, which was able to create trillions of files associated with a particle simulation.

Grider told us that they were able to index the files as they were written, so that they could be retrieved much more quickly when they were read back for post-analysis, whether simple queries or more sophisticated data manipulations. He says the indexing added only a five percent performance penalty at write time. The savings come when you read the data back, since the index makes it possible to retrieve records in random order much more quickly than would be possible with a conventional file system.
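The write-time/read-time trade-off Grider describes can be illustrated with a generic index-on-write log. The article does not describe DeltaFS’s actual indexing scheme, and this is not it. `IndexedLog`, its record layout and its methods are hypothetical. Each record is appended with a small header. At the same moment, one index entry is recorded mapping its key to the payload’s offset and length. That entry is the modest extra cost paid at write time. Reads can then seek straight to a record instead of scanning the whole log.

```python
import io
import struct
from typing import BinaryIO, Dict, Tuple

# Hypothetical on-disk layout: key length (u16), payload length (u32), key, payload.
HEADER = struct.Struct("<HI")


class IndexedLog:
    def __init__(self, stream: BinaryIO) -> None:
        self.stream = stream
        # key -> (payload offset, payload length), built while writing
        self.index: Dict[str, Tuple[int, int]] = {}

    def append(self, key: str, payload: bytes) -> None:
        kb = key.encode()
        start = self.stream.seek(0, io.SEEK_END)
        self.stream.write(HEADER.pack(len(kb), len(payload)) + kb + payload)
        # The write-time overhead: one index entry per record.
        self.index[key] = (start + HEADER.size + len(kb), len(payload))

    def get(self, key: str) -> bytes:
        """Random read via the index: a single seek, no scan."""
        offset, length = self.index[key]
        self.stream.seek(offset)
        return self.stream.read(length)

    def scan(self, key: str) -> bytes:
        """Unindexed read: walk every record header until the key turns up."""
        self.stream.seek(0)
        while True:
            header = self.stream.read(HEADER.size)
            if len(header) < HEADER.size:
                raise KeyError(key)
            klen, plen = HEADER.unpack(header)
            if self.stream.read(klen).decode() == key:
                return self.stream.read(plen)
            self.stream.seek(plen, io.SEEK_CUR)  # skip this payload


# Usage: 1,000 hypothetical particle records, each three doubles (x, y, z).
log = IndexedLog(io.BytesIO())
for i in range(1000):
    log.append(f"particle-{i}", struct.pack("<ddd", float(i), 2.0 * i, 3.0 * i))
```

Both `get` and `scan` return the same bytes. `get` costs one seek regardless of where the record sits, while `scan` grows with the number of records in front of it. This sketch assumes unique keys; with duplicates, `get` would return the latest write and `scan` the first.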

Again, what enables all this is NVRAM, especially its use in storage class memory (SCM). Intel believes it has a significant advantage here, thanks to its 3D XPoint-based Optane persistent memory offerings, which is one reason it is leading the charge with this new storage paradigm. But since DAOS is open source, support for it can be extended to other hardware. Samsung, for example, has an SCM candidate in Z-NAND, while SK Hynix is reportedly working on a 3D XPoint-like device and Western Digital is developing its own SCM technology. Micron, of course, is still working toward getting its 3D XPoint products out the door, but lags Intel by at least two years.

All of this portends big changes for applications that want to take advantage of this random-access performance. But at a time when AI and other cutting-edge analytics are coming to the fore, legacy HPC applications are in danger of being left in the POSIX dust.

“We realize that we were going to have to change our applications to be record-based instead of file-based to utilize these next-generation storage devices that are upon us now and are getting better and better,” says Grider. “Of course, we’re a long way from converting applications to do that.”

Kelsey says:

June 26, 2019 at 7:32 pm

I do just want to point out that DAOS does provide integration for still using POSIX, so applications don’t necessarily have to do any work to gain some benefit. There are two POSIX modes planned: a more relaxed one for well-behaved I/O, which will be more performant, and a stricter one for applications that require it. Of course, there is more performance to be gained if one integrates more directly with DAOS, but this can provide a smooth migration path forward. With this POSIX interface, combined with support for HDF5, MPI-IO, and Apache Spark, many applications may not need to be modified at all.

Chris Martin says:

June 27, 2019 at 5:48 pm

What advantages does this have over Ceph? It seems like there is some duplication of effort going on, and Ceph is more mature, while supporting Object, Block, File, NFS, Samba and S3.

- Johann says:
  
  June 28, 2019 at 10:48 am
  
  The baseline object model is very different from RADOS. DAOS relies on a multi-level key-value store API built natively over persistent memory. The network stack is also built over libfabric.
