Improving HPC File System Performance and Data Integrity

The precise point of departure in the history of almost anything of significance – of a country, a company or a technology – is usually imperceptible. A new chapter in the continually evolving HPC market segment began about 10 years ago when a few HPC visionaries foresaw a shift – one that would run parallel with the everlasting drive for more compute power. It was the need for significantly improved high-performance data storage, protection, movement and bandwidth concomitant with processing speed.

Of course, even in 2005 it was a long-standing truism that “the world is drowning in data and starving for insight.” But performance and prestige in the HPC industry were – and still are – largely measured by amped-up LINPACK numbers and rankings on the TOP500 list of the world’s most powerful supercomputers. R&D effort, system design and purchasing were all heavily weighted toward gaining more compute power.

It was around that time that a cadre of forward-leaning computer scientists saw that HPC systems were out of balance for some segments of the supercomputer market. They became thought leaders in the HPC data space: data analytics, data handling and data movement; keeping data close to the system when it was needed and pushing it away when not needed. They envisioned the need for a new storage architecture with a radically different price-performance paradigm – one that breaks away from dependence on the “secret sauce” of expensive RAID hardware and moves toward more flexible and less costly software-defined storage.

The new approach they envisioned would need to provide storage functionality beyond moving and storing bits on media to also include data resiliency, fail-over and other advanced storage services. It would need to do this as a Beowulf activity using commodity off-the-shelf products along with open source software.

The new system would need online error checking that, in the event of a hardware storage glitch, keeps the system operational while it detects errors on a disk, providing comprehensive data integrity – all the way from the hard drive up to the application.

Such a system would need to accommodate disk drives of up to 8 terabytes with delivered bandwidth of a gigabyte per second per drive – an entirely new level of performance.

Because it was something new, the technologists developing such a system would need to be risk-takers, with the confidence of early adopters and the competence to seamlessly move hundreds – even thousands – of users within a high production environment without interrupting ongoing science work.

Such a system, were it to exist, would need to be versatile and easy to use, requiring little support from the thousands of end-users running their applications and accessing their data.

SDSC’s Comet Offers Breakthrough Storage Technology

In California, at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego, that system now exists.

It’s called Comet, a petascale supercomputer launched in May 2015. (We profiled the server node and network configuration of Comet back in June.) The system offers a breakthrough storage technology based on Lustre (Intel Foundation edition), the most widely used open-source parallel distributed file system for large-scale cluster computing, combined with OpenZFS, advanced storage management software that provides protection against data corruption and support for high storage capacities and efficient data compression.

Comet is the culmination of SDSC’s ten-year migration toward something new in the HPC industry: becoming a world-class supercomputer center that is data-intensive as well as compute-intensive.

Comet joins SDSC’s Gordon supercomputer as a key resource within the National Science Foundation’s XSEDE (eXtreme Science and Engineering Discovery Environment) repertoire, which comprises a collection of some of the world’s most advanced integrated digital resources and services.

Designed to support up to 10,000 end users, Comet provides a solution for emerging research requirements referred to as the “long tail” of science, the idea that a large number of modest-sized, computationally-based research projects represents, in aggregate, a tremendous amount of research and resulting scientific impact and advance.

Core to Comet and other HPC clusters at SDSC are the “Data Oasis” Lustre-based parallel file systems designed and deployed by Aeon Computing, also of San Diego. Comet’s portion of Data Oasis is a 32-node system that enables it to retrieve or store 240 TB of data in about 20 minutes – an exceptional level of performance. “When we first built Data Oasis, we realized we were changing the landscape for cost effective Lustre performance,” said Jeff Johnson, co-founder of Aeon Computing. “When they brought us back to design the second generation of Data Oasis we were excited to design a Lustre file system with OpenZFS,” continued Johnson. “The system was architected with no hardware-based RAID technology and no hardware bottlenecks, specifically optimized for Lustre and OpenZFS.”

Early in 2015, SDSC, Aeon Computing, and Intel began working together to extend Data Oasis to include OpenZFS.

Now with the Lustre file system designed by Aeon, Comet users will have access to 7.6 petabytes of Lustre-based high-performance storage, with 200 gigabytes-per-second bandwidth to the cluster. It is split between a scratch file system and an allocated file system for persistent storage.

Lustre Plus OpenZFS Provide Software RAID Capability

The incorporation of ZFS-backed storage within Data Oasis delivers significantly improved performance and data integrity. ZFS continually monitors and repairs low-level blocks of data stored on disk, avoiding the silent data corruption that can occur with storage as large as Comet’s. Comet will have a second level of data reliability as well, since the first-generation Data Oasis servers are being consolidated and re-deployed to create a “nearline” replica of the active file systems.
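The idea of detecting and repairing silent corruption can be illustrated with a small sketch. This is a toy model, not ZFS code: the `MirroredStore` class, its two in-memory "disks", and the use of SHA-256 are all assumptions made for illustration. It shows the general technique of keeping a checksum apart from the data, verifying every read against it, and rewriting a bad copy from a good one.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Return a SHA-256 digest used to detect corrupted blocks."""
    return hashlib.sha256(block).hexdigest()


class MirroredStore:
    """Toy two-way mirror (hypothetical) that verifies reads against checksums."""

    def __init__(self) -> None:
        self.copies = [{}, {}]  # two independent "disks": block id -> bytes
        self.checksums = {}     # block id -> digest, kept apart from the data

    def write(self, block_id: int, data: bytes) -> None:
        """Store the block on both disks and record its checksum."""
        for disk in self.copies:
            disk[block_id] = data
        self.checksums[block_id] = checksum(data)

    def read(self, block_id: int) -> bytes:
        """Return a verified copy and repair any copy that fails verification."""
        expected = self.checksums[block_id]
        good = None
        bad_disks = []
        for disk in self.copies:
            if checksum(disk[block_id]) == expected:
                if good is None:
                    good = disk[block_id]
            else:
                bad_disks.append(disk)
        if good is None:
            raise IOError(f"block {block_id}: no valid copy left")
        for disk in bad_disks:
            disk[block_id] = good  # self-heal from the verified copy
        return good


if __name__ == "__main__":
    store = MirroredStore()
    store.write(7, b"science data")
    store.copies[0][7] = b"science dat\x00"  # simulate silent corruption
    print(store.read(7))
    print(store.copies[0][7] == store.copies[1][7])
```

The point of the design is that the application never sees the corrupted copy: the mismatch is caught on read, and the damaged copy is rewritten while the system keeps running.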

It is the integration of Lustre and OpenZFS, initially developed in partnership between Intel and Lawrence Livermore National Laboratory, that marks an important advance in data-intensive supercomputing and a milestone in the development of software-defined storage. It’s one of the first large-scale Lustre file systems to make full use of ZFS running directly on disk drives, without any hardware RAID technology.

Combining Lustre with OpenZFS storage management has important implications for the HPC industry. According to Earl Joseph, a top HPC industry analyst with IDC, “Along with IBM’s General Parallel File System (GPFS), Lustre is the most widely used file system. But Lustre is experiencing healthy growth in terms of market share while GPFS remains flat. Lustre is also supported by a large number of OEMs, providing the HPC community with a strong base for growth.”

“The largest challenge we faced in building Comet’s storage capability was hitting the performance target of 200 gigabytes per second,” said Rick Wagner, High Performance Computing Systems Manager at SDSC. “We have an Ethernet-based solution, which is unique; we use dual 40 gigabit Ethernet adapters in each server. We had to get almost 7 gigabytes-per-second out of each of our 32 servers, and it took a lot of work with input from both Aeon Computing and our Intel support team. They provided substantial help in identifying code bottlenecks, showing us what changes to the Lustre code base we needed to drive the ZFS performance the hardware was capable of, and in particular the gap between the network performance and disk performance – eliminating that bottleneck and bringing both in line.

“From a bandwidth perspective, ZFS enabled the file system to perform at a rate on par with the network,” Wagner added. “This is significant, because usually software gets in the way of performance, but in this case it kept pace and delivered more features and easier maintenance. Plus, we get to take advantage of any future improvements to ZFS and Lustre.”
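The bandwidth figures quoted in this article can be checked with some quick arithmetic. The sketch below is illustrative only and rests on two assumptions that are not stated in the article: that each adapter is a 40 gigabit-per-second Ethernet link, and that decimal units apply (1 TB = 1,000 GB, 8 bits per byte, protocol overhead ignored). The function names are hypothetical.

```python
TARGET_GB_PER_S = 200   # aggregate bandwidth target quoted in the article
SERVERS = 32            # storage servers in Comet's portion of Data Oasis
LINKS_PER_SERVER = 2    # "dual" adapters per server
LINK_GBIT_PER_S = 40    # assumption: 40 gigabit-per-second Ethernet links


def per_server_target(total_gb_s: float, servers: int) -> float:
    """Bandwidth each server must deliver if the load is split evenly."""
    return total_gb_s / servers


def link_ceiling_gb_s(links: int, gbit_per_link: float) -> float:
    """Raw network ceiling per server in gigabytes per second (8 bits per byte)."""
    return links * gbit_per_link / 8


def transfer_minutes(terabytes: float, gb_per_s: float) -> float:
    """Minutes needed to move a data set at a sustained rate (1 TB = 1000 GB)."""
    return terabytes * 1000 / gb_per_s / 60


if __name__ == "__main__":
    print(per_server_target(TARGET_GB_PER_S, SERVERS))
    print(link_ceiling_gb_s(LINKS_PER_SERVER, LINK_GBIT_PER_S))
    print(transfer_minutes(240, TARGET_GB_PER_S))
```

Under these assumptions an even split works out to 6.25 gigabytes per second per server against a raw link ceiling of 10 gigabytes per second, and moving 240 TB at 200 gigabytes per second takes 20 minutes, which agrees with the figure cited for Data Oasis. The even split sits a little below the "almost 7" that Wagner mentions, which may reflect headroom above the bare minimum.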

An impressive element of Comet’s storage capability is its performance in the SDSC user environment since its early operations phase began in April 2015. It’s a demanding environment because SDSC’s mission is to support large communities of users via “Science Gateways,” or web-based portals to community-developed tools, computing resources, applications, and other data for scientists engaged in cutting-edge research. Smoothly migrating thousands of SDSC users from Comet’s predecessor, Trestles, was a major task.


“With a 10,000 user goal, it’s simply not possible to hold everyone’s hand,” said Wagner.  “We put a lot of effort into making the transition to Comet as simple as possible.  When we brought Comet online into full operation, we had 100 users log in and start running jobs with zero communication with anyone at SDSC.  It was that easy for them.”

“What we’re providing is a high-performance file system with a single logical namespace that can support up to thousands of jobs running simultaneously,” said Wagner.  “We call it ‘supercomputing for the 99 percent.’ Comet’s goal is to support a diversity of users, a wide variety of scientific domains and Gateway applications across the NSF directorates. We need versatility, ease-of-use and reliability – and the Lustre-OpenZFS implementation has been a big part of that.”

According to Mark Seager, Intel Fellow and Chief Technology Officer for Intel’s Technical Computing Ecosystem, Comet’s storage strategy represents a major advance in HPC storage price-performance.

“Previously, to get high performance you had to have a custom RAID controller, which meant a custom high end box from companies such as EMC or IBM,” said Seager. “But at the rate at which Intel Architecture processors have been advancing, due to Moore’s Law, we were able to overtake the high performance capabilities faster than they were able to develop the next generation of their proprietary solutions. Over time, Moore’s Law ate them alive, so now all the storage value differentiation is in software.

“What we’re doing with SDSC,” Seager said, “is showing that you don’t need the ‘secret hardware-based sauce’ from high-end RAID vendors. Instead, you can do this as a Beowulf activity with commodity off-the-shelf Intel products plus open source software – Lustre and OpenZFS. It shows that in a production environment with very high bandwidth requirements we can deliver world class performance. They’re getting a leap-ahead bandwidth capability.”

Brent Gorda, General Manager of Intel’s High Performance Data Division, adds that software RAID – doing in OpenZFS software what used to be done in RAID hardware – is a big advantage.

“In a hardware RAID environment,” Gorda said, “there have been hundreds of thousands of people hours put into the algorithms and safety mechanisms built into the hardware to make sure your data is bulletproof. But software is much more malleable and easier to change, and it can be done by dramatically fewer people in much less time. Using Intel Xeon processors E5v2, these RAID calculations, which are really just mathematical operations, can be done in real time in the CPU and vectorized with AVX. You can get rid of the very expensive RAID card and start doing what ZFS calls ‘RAIDZ.’”
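To see why RAID calculations are "really just mathematical operations," consider single-parity protection, where the parity block is the byte-wise XOR of the data blocks. The sketch below is a minimal illustration of that general technique in plain Python. It does not reproduce RAIDZ's actual on-disk layout, and it is not vectorized; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list) -> bytes:
    """Compute the single parity block for one stripe of data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list, parity: bytes) -> bytes:
    """Recover the one lost data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]
    parity = make_parity(stripe)
    # Pretend the drive holding the second block has failed.
    recovered = rebuild_missing([stripe[0], stripe[2]], parity)
    print(recovered == stripe[1])
```

Because XOR is its own inverse, the same routine both generates parity and rebuilds a lost block. The operation works on independent bytes, which is the kind of arithmetic that CPU vector instructions can apply to many bytes at once.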

An important aspect of the SDSC-Aeon-Intel partnership is Intel’s back-end support of Lustre, drawing on Intel’s and Aeon’s technical knowledge and experience with complex Lustre configurations, ensuring that the Lustre implementation is stable and production-worthy for the lifetime of Comet.

As HPC systems evolve to become both compute-intensive and data-intensive, Comet’s incorporation of Lustre on top of OpenZFS has major implications for the future of HPC in general. According to Gorda, SDSC deserves praise for being an early adopter of this new storage architecture, for their hard work to blaze the trail, and for sharing the results with the HPC community.

“For a supercomputing center to say ‘we know this is the right technology path, and we’re going to be the ones that put our necks on the line and start using software RAID in a very large and very visible production system,’ is very commendable. These guys are willing to be the pioneers to make this stuff work, and they’ve deployed it in a big way.”

Wagner envisions continued development in partnership with Intel of the Lustre-OpenZFS storage strategy.

“We’ve been really pleased with Intel’s forward-looking dedication to open Lustre development,” he said. “We’re going to build on that relationship over the years to come, and it really came through for us in the Comet project. As the Lustre and OpenZFS code base is enhanced and more performance comes out of it, I want SDSC to be there right alongside, taking advantage of it and continuing to be an early customer. This is an area where we will continue to work with Intel and to show leadership.”
