French Nuclear Agency Bolstering Supercomputing I/O Might

The French Atomic Energy agency (CEA) is one of the leading centers for nuclear weapons and energy research in Europe and, as such, is one of the top sources in Europe to watch for extreme scale computing trends. Like other national labs focused on nuclear simulation, however, its continuing challenge has far less to do with compute capability than with I/O, an issue that will become more pronounced across extreme scale computing as the exascale era approaches.

France has eighteen large-scale supercomputers, and while many are at research institutions, including CEA, other big systems, including one for Airbus and one for an unnamed manufacturing company, also make the Top 500 grade. The largest machine in France, the SGI-built Pangea cluster for French energy giant Total, is at #33 on the list of the world’s most powerful supercomputers. The Tera-100 system at CEA, which still holds a top 100 spot (#74), will be retired this year to make way for a new system, Tera-1000, also made by French HPC company Bull/Atos.

Exactly half of the French supercomputers on the Top 500 are built by local HPC hardware vendor Bull, which was acquired by Atos in 2014. As Jacques-Charles Lafoucriere, Department Leader at CEA, tells The Next Platform, the new Tera-1000 system will arrive in two stages and is expected to land in the top 20, if not the top ten. While he was reluctant to share how many nodes or exactly which processors the system will have, leaving us to do the fun projected performance math ourselves, he does note that the first phase, due in 2016 (possibly in time for the summer Top 500 rankings), will feature 12-core Haswell Xeon E5 v3 processors, and that in 2017 Knights Landing processors will round out the full system.

On the storage side, the CEA team stuck with the Lustre file system but swapped vendors, choosing Seagate over DataDirect Networks, which supplied the storage cluster for the original Tera-100 machine in 2010. Lafoucriere says the team is satisfied with the performance of Lustre across its simulation workloads and that when the new Knights Landing processors hit the floor in 2017, it will be easy to add another Seagate Lustre appliance. “Today, we have 200 Gb/s throughput and will double that next year with the new system,” he notes.

Interestingly, the Seagate setup will be the primary storage for Tera-1000, but it will not be the faster tier. In 2017, with the arrival of more capacity to match the Knights Landing compute, CEA will install a swath of flash-based storage with throughput on the order of one to two TB/s, not far off from what is expected at Los Alamos on the Trinity supercomputer. Lafoucriere and his team are still in the process of selecting a vendor for that fast flash tier, but expect that decision to be made in the middle of the year.
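To put a one to two TB/s tier in perspective, the back-of-the-envelope sketch below converts sustained bandwidth into the ideal time needed to write out a block of data such as a checkpoint. The 100 TB checkpoint size is purely hypothetical and is not a CEA figure, the function name is ours, and the calculation ignores metadata, contention, and other real-world overheads.

```python
def drain_time_seconds(data_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Ideal time to write data_bytes at a sustained bandwidth (no overheads)."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return data_bytes / bandwidth_bytes_per_s


TB = 10**12  # decimal terabyte, in bytes

# Hypothetical checkpoint size, chosen only for illustration (not a CEA figure).
CHECKPOINT_BYTES = 100 * TB

# The flash tier range quoted in the article: one to two TB/s.
for label, bandwidth in [("1 TB/s", 1 * TB), ("2 TB/s", 2 * TB)]:
    seconds = drain_time_seconds(CHECKPOINT_BYTES, bandwidth)
    print(f"{label}: {seconds:.0f} s")
```

Under these assumptions the write takes 100 seconds at 1 TB/s and 50 seconds at 2 TB/s; any real system would see less than the ideal figure.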

For their set of applications, CEA requires that fast flash tier, as we’ve heard from Los Alamos and others. Nuclear weapons simulations are large parallel jobs and are I/O intensive. Accordingly, the few other centers in the world running nuclear simulations, including Los Alamos and Lawrence Livermore labs in the U.S., often focus extensively on their I/O strategies. As we have touched on recently in interviews about the pre-exascale storage and I/O systems at Los Alamos in particular, there is keen attention being paid to how next-generation pre-exascale machines will handle data movement to route around bottlenecks and keep the systems efficient. That means rethinking the role of the parallel file system and interfacing with potentially more efficient storage approaches, including object stores, all while working with a POSIX-rooted community of codes.

CEA will use its flash differently than Los Alamos, however, partly because it is starting with Lustre and will use the flash at the file system level (whereas Los Alamos is using its flash as a burst buffer at the application layer). CEA’s flash will essentially serve as a Lustre pool, in part to prepare the teams for how flash might move closer into the system with the successor to Tera-1000, which is expected sometime in 2020 or beyond. The team is looking at how their I/O layering and libraries need to evolve and what I/O patterns and workloads will need to change.

In an effort to work toward this, CEA has developed its own I/O redirection system software, which relies on I/O proxies to talk to the file system. They are also working on I/O delegation strategies and libraries. The goal is to compare two approaches: one works at the POSIX level, which means it will be transparent to the codes, while the other, an equally intensive effort, seeks to build a library that is well suited to the next generation of machines. All of this is backed by a Horizon 2020 funding effort called SAGE. Other efforts to bolster I/O capabilities for this upcoming system (and for future exascale systems) include exploring, as Los Alamos is, object storage for medium- and long-term archival. For this, they are continuing their work with object storage company Scality, which is also involved in the Los Alamos I/O systems as the platform upon which object stores for nuclear workloads might play out.
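For readers unfamiliar with the general idea of I/O redirection through proxies, here is a minimal single-machine sketch in Python. It is not CEA’s software, whose design was not detailed to us; the `IOProxy` class and `run_demo` function are hypothetical names, and threads stand in for compute nodes. The point is only to show the pattern: compute tasks hand their writes to a proxy, and the proxy is the sole component that touches the file system.

```python
import os
import queue
import tempfile
import threading


class IOProxy:
    """Hypothetical I/O proxy: the only component that touches the file system."""

    def __init__(self):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def write(self, path, offset, data):
        """Called by a compute task; returns as soon as the request is queued."""
        self._requests.put((path, offset, data))

    def close(self):
        """Process every queued request, then stop the proxy."""
        self._requests.put(None)  # sentinel marking the end of the stream
        self._worker.join()

    def _serve(self):
        flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_BINARY", 0)
        while True:
            request = self._requests.get()
            if request is None:
                break
            path, offset, data = request
            # Open without truncating so blocks from different tasks coexist.
            fd = os.open(path, flags, 0o644)
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                os.write(fd, data)
            finally:
                os.close(fd)


def run_demo():
    block = 8  # bytes written per task, an arbitrary illustrative size
    path = os.path.join(tempfile.mkdtemp(), "shared.dat")
    proxy = IOProxy()
    # Four simulated compute tasks each forward one block at their own offset.
    tasks = [
        threading.Thread(
            target=proxy.write,
            args=(path, rank * block, bytes([65 + rank]) * block),
        )
        for rank in range(4)
    ]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()
    proxy.close()
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    print(run_demo())
```

Because the tasks never open the file themselves, the proxy is free to reorder, batch, or retarget requests without the callers noticing, which is the appeal of keeping such a layer transparent to the codes. A production system would forward requests over the interconnect rather than an in-process queue.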

The focus on I/O and storage systems may be a less sexy story when it comes to large-scale supercomputing. Many of the new machines that will hit the floor beginning this year and through 2017 have some snappy processor and performance appeal, but on the ground, the real challenge is going to be the goopier middleware, storage, I/O, and parallel programming stuff that is less fun to read about but critical to how the overall supercomputing story plays out.
