As we described in detail in the previous section of this series on the state of Lustre and the roadmap for the parallel file system, beyond traditional HPC, the file system has some weaknesses for large-scale enterprise shops. These tend to come down to a matter of support, especially when compared to its closest rival, GPFS.
Like a number of other vendors who sell storage systems backed by both GPFS and Lustre across different product lines, DataDirect Networks (DDN) says that across the 10 to 12 different verticals they track with either a traditional or commercial HPC component, they are always surprised that one file system does not appear to completely eradicate the other, even if there is higher adoption in one market segment.
In a recent conversation with The Next Platform, DDN’s director of HPC markets, Laura Shepard, said that for its Exascaler and Gridscaler appliances, which respectively package Lustre and GPFS, Lustre shops account for only one-third of the business, with the rest dedicated to GPFS. But don’t get the wrong impression. Many Lustre shops roll their own file systems, while the GPFS shops pay DDN for licenses, and if you look at the business based on capacity, the split between Lustre and GPFS is a lot closer. While DDN has contributed a great deal of code to Lustre (the company uses Intel’s distribution) and has done a great deal of optimization and integration to make its Lustre boxes as easy as possible to get up and running, even with the capability for sites to do their own optimizations with the open source code, the difference boils down to manageability and, perhaps more surprisingly, cost.
Molly Rector, chief marketing officer at DDN, tells The Next Platform that some of the company’s customers begin investigating Lustre because they imagine it is a cheaper option than GPFS or another parallel file system. “Yes, it’s open source,” she says, “but when you look at the type of hardware alone required to deploy Lustre at scale, even in terms of the way it handles small files and thus driving up the number of metadata servers needed, the cost can be surprising.”
While noting that Lustre is a perfect fit for some large-scale HPC sites, particularly those that have the people required to administer the parallel file system and its underlying cluster as well as make custom optimizations, Rector does suggest that aside from the “hidden” cost issue, Lustre is “not as well packaged and ready as GPFS from an optimization of deployment or ease of install perspective.” The key is to have a large staff, or at least someone on staff that has worked with Lustre before—and that can be a limiting factor for some organizations.
The teams developing Lustre are working hard to stay ahead of GPFS every step of the way, with major updates including the handling of distributed metadata and features like hierarchical storage management (HSM), which had previously set GPFS apart. However, when it comes to more enterprise features, including the ability to do snapshots, Lustre still has a long climb.
“If you talk to someone whose exclusive focus is exascale or extremely large systems, over the next five to ten years, the main concern is ensuring pure performance and very large namespaces,” says Torben Kling Petersen, principal engineer of high performance computing and strategic engagements at Seagate, which last year acquired Xyratex, a storage company that sells an optimized tailoring of Lustre. “But for the standard enterprise user, these things aren’t much of a concern. What they are looking for is a fast file system and a large namespace, but they also want something that looks like an EMC solution on steroids. They want functionality and manageability and aren’t like the national labs that can hire PhDs to manage the file system.”
Petersen says that while he applauds all the work that’s been done in Lustre to make it more stable, it still lacks key functionality and stability features. The approach of adding functionality on top of functionality and increasing the complexity of the software stack, however, is, to Seagate, “a recipe for disaster.” And thus Seagate is working on Lustre to make it stable for the middle of the market, the “average” high performance computing environment in manufacturing and other areas.
While the added complexity with layering on Lustre features is one issue, Petersen says the real problem with Lustre’s poor reputation in some circles in terms of reliability and stability has far less to do with the code and much more to do with the user.
“There are an incredible amount of tunables inside of Lustre, but unfortunately, tuning Lustre is an art form—and a difficult one to master at that,” Petersen explains. “When people talk about the stability of Lustre not being good, it’s often not hard to look back and find where a system administrator read about performance speedups somewhere and decided to tune it on the fly.”
He said that while some vendors have focused on locking down their file system boxes to avoid this kind of tuning, Seagate’s approach is different—at least to a point. “We offer the ability to tune many parameters, but we’ve set controls inside the system where if a user changes a parameter they shouldn’t, it sets itself back to the original setting.”
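To make the idea of a self-resetting parameter concrete, here is a minimal Python sketch of that kind of guardrail. It is not Seagate's implementation; the class `GuardedTunables` and the parameter names `client_cache_mb` and `recovery_timeout_s` are hypothetical, invented purely for illustration.

```python
class GuardedTunables:
    """Toy model of tunables where protected parameters snap back to defaults.

    Hypothetical illustration only; not taken from any vendor's product.
    """

    def __init__(self, defaults, locked):
        self._defaults = dict(defaults)   # original, vendor-chosen settings
        self._values = dict(defaults)     # settings currently in effect
        self._locked = set(locked)        # parameters users should not change

    def set(self, name, value):
        """Apply a change; return True if it stuck, False if it was reverted."""
        if name not in self._values:
            raise KeyError(name)
        if name in self._locked:
            # Protected parameter: restore the original setting instead.
            self._values[name] = self._defaults[name]
            return False
        self._values[name] = value
        return True

    def get(self, name):
        return self._values[name]


# Hypothetical parameter names, chosen only for this example.
tunables = GuardedTunables(
    defaults={"client_cache_mb": 512, "recovery_timeout_s": 300},
    locked={"recovery_timeout_s"},
)
cache_changed = tunables.set("client_cache_mb", 1024)
timeout_changed = tunables.set("recovery_timeout_s", 5)
```

In this sketch the cache change is accepted, while the attempt to shorten the protected timeout is silently undone and the original value stays in force.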
For some end users, including one we discussed in detail at the Wellcome Centre for Human Genomics, the choice between GPFS and Lustre was complicated because of one potential limitation for workloads that deal with small bits of data. Lustre does not have a distributed metadata service that works like the one in GPFS. With the 2.5 release, Lustre began using multiple metadata servers, each of which claims responsibility for a certain part of the namespace that describes the data in the file system. This means that it is possible to allocate I/O to different metadata servers depending on need, which is useful for areas where the blocks of data are bigger. But for statistical genomics (in that Wellcome case) having core distributed metadata built into the file system made more sense. For others, particularly in oil and gas and other simulation-heavy segments, the Lustre 2.5 way is just as (if not more) suitable.
This is not to say that, at very large scale, either Lustre or GPFS has figured out a perfect solution. Consider the distributed metadata service advantage that some customers cited as the reason for choosing GPFS over Lustre.
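The namespace-partitioning idea can be pictured with a small Python sketch. This is a toy model, not Lustre code: the `NamespaceMap` class, the directory names, and the server numbers are all assumptions made for illustration. Each metadata server owns one or more directory subtrees, and a path is routed to whichever server owns its longest matching prefix.

```python
from pathlib import PurePosixPath


class NamespaceMap:
    """Toy model: each metadata server owns one or more directory subtrees."""

    def __init__(self, default_server=0):
        # The root of the namespace belongs to the default server.
        self._owners = {PurePosixPath("/"): default_server}

    def assign(self, subtree, server):
        """Hand responsibility for a subtree to a given metadata server."""
        self._owners[PurePosixPath(subtree)] = server

    def server_for(self, path):
        """Return the server owning the longest matching prefix of path."""
        p = PurePosixPath(path)
        # Walk from the path itself up toward the root; the first hit is
        # the most specific subtree that has an explicit owner.
        for candidate in (p, *p.parents):
            if candidate in self._owners:
                return self._owners[candidate]
        raise ValueError(f"not an absolute path: {path}")


# Hypothetical layout: two busy project trees get their own servers.
ns = NamespaceMap(default_server=0)
ns.assign("/projects/seismic", 1)
ns.assign("/projects/genomics", 2)

seismic_owner = ns.server_for("/projects/seismic/run1/out.dat")
genomics_owner = ns.server_for("/projects/genomics/sample42.vcf")
home_owner = ns.server_for("/home/alice/notes.txt")
```

The sketch also hints at the trade-off described above: metadata load spreads out only as far as the directory layout does, so millions of small files under a single subtree would still land on one server.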
As Gary Grider, a well-known pioneer in file and storage systems at Los Alamos National Lab, explains: “Since with GPFS everyone can do metadata updates, there is a huge locking problem which doesn’t exist in Lustre because you are shipping your metadata update requests over to the server-side and the server-side is doing the locking. The locking protocols in GPFS have had to become incredibly scalable and in some cases even have hardware assists to be able to deal with locking issues. IBM has done wonders in trying to mitigate that, but at large scales, despite all the great work they’ve done, it can be a real bottleneck. Some workloads bring this out and others don’t—there’s no perfect answer.”
On the business end of the spectrum, the limitation for Lustre in enterprise environments is a matter of support, says Grider. “When you get beyond the bleeding-edge supercomputing sites, Lustre is far more of a services offering and less of a product. You’re not really buying Lustre itself, you’re buying part of a person at Intel or Seagate, or wherever else to support you. Lustre grew up that way, though, which is different than GPFS or even another file system like that from Panasas.”
Grider says that for commercial sites that want to look to Lustre, there are a few options, but there is no “single pane of glass” to manage Lustre without loss of the flexibility and tunability one would get by installing Lustre on commodity hardware using in-house managers to get it up and running, itself a major task that takes expertise.
“If you buy into Lustre and you’re using it at any scale, you’re biting off costs for some part of a team or people at your site plus the initial support contract from an Intel, Seagate, DDN, or other vendor. From a cost perspective, that’s a huge step,” Grider explains. “Just to bring in that very first byte of storage, with the cost of the person to manage it, the support, and the hardware, you’re talking about a major upfront investment.” Using a company like Panasas, as an example, is different, he says. “It’s like a parallel NetApp–bring it in, give it an address, and you’re off and running.” The final word cost-wise, according to Grider, is that if you’re putting in anything less than a rack (and even beyond that) it’s “dumb to hire and manage it on your own.”
With so many limitations described here, it might seem that Lustre’s challenges are too profound to tackle. However, for its key segment, namely research and commercial HPC, it is alive and well. And it is in a growing range of use cases that the real scalability promise might be met by a more varied set of users. That is the subject of the next part of the series, Where is Lustre Finding its Shine?