Where is Lustre Finding its Brightest Shine?

Lustre will continue to grow in enterprise settings, but as we described in part one and part two of this extended series on the state of the HPC-centric file system, it is unlikely to see a sudden, meteoric rise to the top of the large enterprise list among users who are not already invested in high performance computing infrastructure and applications.

Still, that is not to devalue the file system’s role in key commercial HPC segments. In fact, in these areas there are some signs of real growth for Lustre, especially in its more stable 2.5 incarnation. Beyond academia and research, Lustre is finding new enterprise HPC users in life sciences, oil and gas, and manufacturing in particular—and growing out to new potential areas beyond the web-scale and hyperscale settings we detailed previously.

Although both are heavily intertwined with high performance computing, the oil and gas industry and Lustre were not close mates until recently. These environments are overwhelmingly driven by GPFS or Panasas pNFS. Terascala CEO Steve Butler says that while Isilon and scale-out NAS offerings are also dominant here, Lustre is earning its stripes due to increased scalability demands. “Lustre has a native advantage here because so much work in hardware architecture has been done on Lustre to ensure it scales and retains high performance,” said Butler. As oil and gas companies continue to pile on the cores for massive simulations, having a robust file system that was developed in step with the newest supercomputing systems will become more important.

According to Cray’s VP of Business Development, Barry Bolding, “The major growth areas we’ve seen for Lustre are in commercial markets, such as oil and gas, where a high performance parallel file system complements the workflow in upstream and seismic applications.” The same holds true for Xyratex/Seagate, which has seen its biggest boost in the same realm.

As Petersen echoed, these customers are turning to Lustre “as their existing solutions are running out of steam and GPFS is becoming too expensive at scale and simply can’t scale as well as Lustre for these very large environments.” He says that while manufacturing is growing swiftly as well, oil and gas customers are demanding the scale, price, and added flexibility of an open source file system.

Oil and gas is also a booming area for DataDirect Networks. “Lustre is coming on strong at the top end of oil and gas,” said Shepard. “It’s expanding there due to the capacity and bandwidth requirements and also because these are large sites that have the type of staff where there can be a real benefit from the levels of customization possible with open source code like Lustre.” Shepard says that companies in oil and gas are always keeping an eye on the next generation of scalability requirements, in much the same way the national labs do.

Life sciences, specifically genomic sequencing, is a relatively new area for parallel file systems. Until recently, Butler says, many sequencing shops were using traditional and scale-out NFS before making the shift to scale-out NAS. Terascala’s customers, including TGen and Novartis, were looking for faster storage systems to handle their sequence data, but were bumping up against the limitations of existing environments. Recall, however, that Lustre is not a suitable fit for all genomics algorithms, especially those that go beyond mere sequencing.

The way Lustre handles metadata is not a perfect fit for smaller chunks of data, as we learned from talking to the Wellcome Trust Centre for Human Genetics. Still, for the growing number of gene sequencing companies and applications, Lustre’s scalability might prove a stronger force than its overall performance due to the way it addresses distributed metadata.

Another slower-growing but still lucrative area of early adoption is manufacturing, which arrived at the HPC party later than other areas that rely on large-scale simulations. Many manufacturing workloads are I/O intensive, which makes them favorable candidates for Lustre. However, according to Butler, many of these shops used to invest only in proprietary solutions across the stack. As the more general trend toward building commodity clusters and gluing the stack together with as much open source software as possible plays out, he expects Lustre adoption in manufacturing to grow, in much the same way as it has in oil and gas.

Another area that has been traditionally entrenched with proprietary tools is financial services, which one very often associates, at least for HPC financial simulations, with GPFS. Again, just as with manufacturing modeling and simulation applications, I/O is paramount. However, Butler says, when it comes to getting Wall Street to take a look at Lustre, the questions still revolve around high availability, security of their data, and the “why fix what isn’t broken” argument around their existing solutions.

These are all expected areas for Lustre growth, but what about where the tech development is really happening, in large-scale analytics? The potential for Lustre to extend here will depend on how well it can snap into Hadoop and other upcoming frameworks. Some users are finding their own path to swapping out the native Hadoop file system (HDFS) for Lustre, and Intel has built hooks for Lustre to do so, but few of the Lustre experts we spoke with felt that this was going to be a path to rapid, unprecedented adoption of the HPC file system.

In the next section of this series we will look at this adoption curve in the context of the community in flux behind Lustre as Intel and OpenSFS prepare to part ways.

For now, if you’ve missed other sections, read “Tracing the Enterprise Path for Lustre” and “What is Standing in the Way of Enterprise Lustre Adoption.”