France Merges Cascade Lake Xeons With Volta Tesla GPUs For AI Supercomputer

As part of France’s national strategy to advance to the forefront of artificial intelligence research in Europe, the government will install a 14 petaflops supercomputer for the French National Center for Scientific Research (CNRS). The system, built by Hewlett Packard Enterprise, is one of a new breed of dual-purpose supercomputers designed to support both HPC and AI applications.

The machine, which will be known as Jean Zay (named after an accomplished French politician and public servant), is being procured by GENCI and will be deployed at CNRS’s Institut du Développement et des Ressources en Informatique Scientifique (IDRIS). Installation is scheduled for June 2019, with full production of the system to follow in October.

In terms of hardware, Jean Zay is based on the HPE SGI 8600 platform and will comprise 1,528 CPU-only nodes outfitted with two “Cascade Lake” Xeon SP 6248 processors (20 cores at 2.5 GHz), plus an additional 261 GPU-accelerated nodes, each equipped with two of those same Cascade Lake processors and four Nvidia Tesla “Volta” V100 GPU accelerators. Each node will sport 192 GB of main memory, with each GPU outfitted with 32 GB of its own local memory. The system will be hooked together with Intel’s Omni-Path fabric in a quad-rail configuration that delivers 400 Gb/sec of aggregate bandwidth.
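
For those keeping score at home, the published figures pencil out as follows. This is a quick back-of-the-envelope sketch in Python; the 100 Gb/sec per-rail link speed is our assumption, inferred from the quoted 400 Gb/sec quad-rail aggregate.

```python
# Tally of Jean Zay's published configuration figures.
CPU_NODES = 1528           # dual-socket Cascade Lake nodes
GPU_NODES = 261            # dual-socket nodes with four V100s each
CORES_PER_XEON = 20        # Xeon SP 6248, 2.5 GHz
RAILS = 4                  # quad-rail Omni-Path
GBPS_PER_RAIL = 100        # assumed per-rail link speed

total_nodes = CPU_NODES + GPU_NODES              # 1,789 nodes
total_cores = total_nodes * 2 * CORES_PER_XEON   # 71,560 cores
total_gpus = GPU_NODES * 4                       # 1,044 V100 GPUs
node_bandwidth = RAILS * GBPS_PER_RAIL           # 400 Gb/sec aggregate

print(total_nodes, total_cores, total_gpus, node_bandwidth)
```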

DataDirect Networks will supply the storage subsystem, which will be based on the Spectrum Scale (formerly GPFS) parallel file system. It consists of six DDN GS18K appliances equipped with 432 SSDs (3.2 TB apiece) for a total net capacity of one petabyte.
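
As a rough check on those numbers, the raw capacity works out to well above the quoted net figure; the roughly 28 percent gap presumably goes to parity, spares, and formatting overhead, though DDN has not broken that down. A minimal sketch:

```python
# Raw versus net capacity of the DDN flash tier.
ssd_count = 432
ssd_tb = 3.2

raw_tb = ssd_count * ssd_tb        # ~1,382 TB raw
net_tb = 1000                      # one petabyte net, as quoted
overhead = 1 - net_tb / raw_tb     # ~28 percent, cause unspecified
print(f"{raw_tb:.0f} TB raw, {net_tb} TB net, {overhead:.0%} overhead")
```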

The machine’s dual-purpose design is a direct result of the French government’s stated intent to become a major player in the AI research arena. The goal is to vault the country into the top five of the world’s leading nations in AI research, as well as to make France the top dog in this technology on the European continent. Besides installing supercomputers like Jean Zay, the country will also leverage its well-regarded computer science and mathematics schools to develop and expand its AI-savvy workforce.

Like other machines of its kind, Jean Zay derives its AI prowess almost entirely from the Tensor Cores in the V100 GPUs, which are specifically designed for the kind of matrix math that dominates deep learning computations. Each of the NVLink-equipped V100s delivers 7.8 teraflops of double precision floating point (FP64) performance, while simultaneously providing 125 teraflops of mixed precision (FP16/FP32) deep learning performance. That works out to about 130 petaflops of deep learning number-crunching across the whole system.
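
The arithmetic behind that 130 petaflops figure is straightforward; here is a minimal sketch using only the node and per-GPU numbers cited above:

```python
# Aggregate Tensor Core (FP16/FP32) deep learning peak for Jean Zay.
gpu_nodes = 261
gpus_per_node = 4
tensor_tflops = 125        # per-V100 mixed precision peak

total_gpus = gpu_nodes * gpus_per_node          # 1,044 V100s
dl_pflops = total_gpus * tensor_tflops / 1000   # teraflops -> petaflops
print(f"{dl_pflops:.1f} petaflops")             # ~130.5 petaflops
```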

That puts Jean Zay in rather rare company. Only the Department of Energy’s “Summit,” “Sierra,” and “Lassen” supercomputers, the AI Bridging Cloud Infrastructure (ABCI) system at Japan’s National Institute of Advanced Industrial Science and Technology (AIST), and the Taiwania 2 machine at Taiwan’s National Center for High Performance Computing are equipped with more V100 GPUs than Jean Zay.

The Summit system, which also happens to be the most powerful supercomputer on the planet right now, exceeds them all, with 27,648 V100 GPUs. Together they provide more than 3 exaflops of deep learning performance. Unlike Jean Zay, Summit was not originally designed to be an AI powerhouse. That came about as a result of Nvidia’s decision to add the Tensor Core feature to the Volta-generation GPU architecture after the procurement for the machine was already underway.
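
The same back-of-the-envelope math bears out Summit’s number:

```python
# Summit's deep learning peak from its published V100 count.
summit_eflops = 27_648 * 125 / 1_000_000   # GPUs x Tensor Core TFLOPS, in exaflops
print(f"{summit_eflops:.2f} exaflops")     # ~3.46 exaflops, i.e. "more than 3"
```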

While the artificial intelligence aspect of the system is getting most of the attention, Jean Zay will also support traditional HPC simulations and modeling in areas such as particle physics and cosmology, biology, climate science, fusion energy, combustion engine research, drug discovery, healthcare, and public safety. Some of the codes in these areas are GPU-ready and so can use the V100 hardware to accelerate their simulations. Others, however, are not and will have to rely on the CPU-only nodes. In this regard, it is worth noting that the GPU-accelerated nodes contribute about eight petaflops of the system total, with the CPU nodes providing the remaining six petaflops.
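
That split falls directly out of the FP64 numbers: the V100s alone account for roughly eight petaflops, leaving the balance of the 14 petaflops peak to the CPU nodes. A quick sketch, again using only the figures quoted in this article:

```python
# FP64 peak split between GPU and CPU nodes, from the article's figures.
gpu_pflops = 261 * 4 * 7.8 / 1000     # ~8.1 PF from 1,044 V100s
cpu_pflops = 14 - gpu_pflops          # ~5.9 PF, the stated CPU-side remainder
print(f"GPU nodes: {gpu_pflops:.1f} PF, CPU nodes: {cpu_pflops:.1f} PF")
```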

Whether by design or happy coincidence, these dual-purpose supercomputers based on the V100 are becoming more numerous. That’s mainly a result of the increasing popularity of infusing HPC workflows with deep learning, a trend we have reported on with some regularity. Nvidia has cornered the market for these types of systems, given the ability of the ambidextrous V100 to deliver lots of flops for both FP64-based simulations and FP16/FP32-based deep learning codes.

When Jean Zay boots up later this year, it will be the second most powerful supercomputer in France, trailing only the 23 petaflops Tera 1000 BullSequana system at CEA, the French Alternative Energies and Atomic Energy Commission. It will, however, be the only petascale machine in the country with GPU acceleration. As a result, CNRS is set to become a national center for AI research and development for French researchers.

GENCI is paying €25 million for the new CNRS supercomputer, including support, with even more money to come. The French government, along with the European Commission, has allocated over €170 million from now until 2022 to buy computing resources dedicated to artificial intelligence. So there very well could be more such systems in the pipeline.
