Texas Advanced Computing Center Taps Latest HPC Tech

Building on the success of the Stampede1 supercomputer, the Texas Advanced Computing Center (TACC) has rolled out its next-generation HPC system, Stampede2. Over the course of 2017, Stampede2 will undergo further optimization phases with the support of a $30 million grant from the National Science Foundation (NSF). With the latest Intel Xeon Phi and Xeon (Skylake) processors, and enhanced networking provided by the Intel Omni-Path Architecture, the new flagship system is expected to deliver approximately 18 petaFLOPS, nearly doubling Stampede1’s performance.

Stampede2 continues Stampede1’s mission: enabling thousands of scientists and researchers across the United States to make breakthrough discoveries in science, engineering, artificial intelligence, the humanities, and more. Dr. Dan Stanzione, Executive Director of TACC, describes the breadth of the center’s work. “TACC provides large-scale support for our users who make significant engineering and research discoveries. As research goals advance, the problems supercomputers help solve are getting bigger and more complicated every year. For our team at TACC, delivering greater speed and capacity with Stampede2 are top concerns.”

A faster system means TACC can help scientists gain insights from their data more quickly. It also means Stampede2 can support many more projects. In 2016, TACC received five times more requests to use Stampede than its capacity could handle, which meant many important scientific projects did not have access to the resources they needed. “Stampede2 will be a big step forward, since we can accommodate more projects and complete them more quickly,” Stanzione continued.

TACC’s Stampede2 is deployed with support from vendor partners Dell, Intel, and Seagate Technology, and operated by a team of cyberinfrastructure experts at TACC, UT Austin, Clemson University, Cornell University, the University of Colorado at Boulder, Indiana University, and Ohio State University. “An HPC system the scope of Stampede represents an incredible team effort,” says Stanzione. “We are thankful for the many partner organizations and experts across the country who supported us.”

Asked about the challenges of rolling out such a major HPC system upgrade, Stanzione describes key considerations. “Much of the work in science and engineering is computation-focused, especially in fields such as weather, astrophysics, and aerodynamics. However, we are seeing exciting growth in demand from fields like biology and the humanities, and these new communities bring different needs. Today we are providing familiar web and application interfaces to supercomputing systems that naturally fit into established workflows. And we are building systems from the ground up with big-data in mind. Stampede has grown from being a big engine to being an end-to-end solution.”

Several design choices give Stampede2 an advantage over earlier systems. New applications require greater computational throughput, and Stampede2 supports more than 40 major science and engineering applications at any given time, alongside thousands of applications used by small groups or individual researchers. Stampede2 is a heterogeneous system: roughly one third of its processors are Intel Xeon processors, and the rest are Intel Xeon Phi processors. This combination allows a broader range of applications to take advantage of the system and speeds up parallel workloads. For compute-intensive applications such as visualization and ray tracing, Intel processors can also take the place of GPUs by using the OSPRay engine for interactive rendering.

Given the scope of the design, Stampede2 is being deployed in three phases. Phase one, based on the Intel Scalable System Framework, is already in place. Each node is a Dell PowerEdge C6320P server with an Intel Xeon Phi 7250 processor (code name Knights Landing). As with Stampede1.5, Intel Omni-Path Architecture provides 100 Gbps of bandwidth in a low-latency network fabric.

Although only part of the system has been deployed, phase one of Stampede2 ranks #12 on the June 2017 Top500.org list of the fastest supercomputers worldwide. Comprising 4,200 nodes, each with 68 cores, 96 GB of DDR RAM, and 16 GB of high-speed MCDRAM, Stampede2 achieved a peak performance of nearly 13 petaFLOPS.
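The phase-one figures can be sanity-checked with some quick arithmetic. The sketch below multiplies out the node, core, and memory counts quoted above, then estimates theoretical peak performance. The clock speed (1.4 GHz) and the 32 double-precision operations per cycle per core are assumptions drawn from the published Xeon Phi 7250 specifications, not from this article, so treat the result as a rough cross-check rather than an official number.

```python
# Figures quoted in the article for Stampede2 phase one.
NODES = 4_200
CORES_PER_NODE = 68
DDR_GB_PER_NODE = 96
MCDRAM_GB_PER_NODE = 16

# Assumptions NOT stated in the article (Xeon Phi 7250 published specs):
# 1.4 GHz base clock; 2 AVX-512 vector units per core, each handling
# 8 doubles with fused multiply-add -> 2 * 8 * 2 = 32 FLOPs per cycle.
CLOCK_HZ = 1.4e9
DP_FLOPS_PER_CYCLE_PER_CORE = 32


def phase_one_totals() -> dict:
    """Aggregate the per-node figures across all phase-one nodes."""
    total_cores = NODES * CORES_PER_NODE
    return {
        "cores": total_cores,
        "ddr_tb": NODES * DDR_GB_PER_NODE / 1000,        # decimal terabytes
        "mcdram_tb": NODES * MCDRAM_GB_PER_NODE / 1000,  # decimal terabytes
        "peak_pflops": total_cores * CLOCK_HZ * DP_FLOPS_PER_CYCLE_PER_CORE / 1e15,
    }


if __name__ == "__main__":
    for name, value in phase_one_totals().items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions the estimate comes to roughly 12.8 petaFLOPS across 285,600 cores, which is consistent with the "nearly 13 petaFLOPS" peak figure.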

Phase two will add 1,736 of the newest Intel Xeon processors (formerly code named Skylake). As part of the Intel Xeon Scalable processor family (formerly code named Purley), these new 28-core processors offer significantly faster performance for compute- and data-intensive workloads. Significant increases in memory and I/O bandwidth also contribute to the speedup.
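The phase-one and phase-two counts also give a way to check the one-third Xeon share mentioned earlier. This is a minimal sketch that takes the article's numbers at face value and assumes each of the 4,200 phase-one nodes and each of the 1,736 phase-two additions counts as one unit. The function name `xeon_share` is made up for illustration.

```python
# Counts quoted in the article.
XEON_PHI_UNITS = 4_200  # phase one (Knights Landing)
XEON_UNITS = 1_736      # phase two (Skylake); one unit each is an assumption


def xeon_share(phi_units: int = XEON_PHI_UNITS, xeon_units: int = XEON_UNITS) -> float:
    """Fraction of the combined system made up of Xeon (non-Phi) units."""
    return xeon_units / (phi_units + xeon_units)


if __name__ == "__main__":
    print(f"Xeon share: {xeon_share():.1%}")  # prints "Xeon share: 29.2%"
```

On this counting the Xeon share is a little under 30 percent, so "one third" is best read as a round figure.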

Additional efforts will take the system’s capability even further. Incorporation of 3D XPoint non-volatile memory will provide an important opportunity to evaluate the impact of this new approach to memory on a large scale, and is expected to nudge performance even higher. When complete, Stampede2 is expected to deliver about 18 petaFLOPS of computing capacity for open science, nearly doubling the maximum performance of Stampede1.

Researchers have used Stampede2 since May for diverse projects, including automated tumor identification from MRI data (The University of Texas at Austin), simulations supporting the LIGO gravitational wave discovery (Cambridge University), and real-time weather forecasting to direct storm chaser trucks (University of Oklahoma).

When asked about the planning and rollout of Stampede2, Stanzione replies, “Stampede2 has been TACC’s smoothest deployment so far. Our team has already accomplished many system upgrades, and we are working hard on further optimizations. One element that makes the deployment process easier is our early collaboration with Intel. We always want to be among Intel’s earliest adopters so we can realize the benefits of their new technologies right away. In turn, we can also offer feedback to Intel when we encounter unexpected challenges.” Intel, working with TACC and other early adopters, uses this feedback to ensure products have been tested thoroughly in production systems before broad release.

While Stanzione and his team strive for the greatest possible performance, they must also ground their ambitions in the reality of a fixed budget. “We are very fortunate to have the financial support of the NSF, and we want to use their grant very wisely,” reflects Stanzione. “By experimenting with the latest Intel technologies, we can better understand the price-performance impact of each, consider bottlenecks, and determine where Stampede will benefit most from upgrades. Working early with Intel helps us identify and adopt technologies that maximize our system’s capability while remaining cost effective. The latest Intel Xeon processors will give us quite a boost in performance, and we are excited to implement phase two very soon.”

Once phase two is complete, Stanzione expects a much greater number of applicants seeking time on the supercomputer. “Our team here at TACC is very excited to get our work done so we can better serve engineers and researchers who require the system’s full potential,” he adds. “We are the people behind the scenes; our reward comes when scientists achieve breakthrough discoveries.”

While the team at TACC cannot anticipate the multitude and diversity of scientific projects that will ultimately run on Stampede2, one thing is certain: HPC systems like Stampede2 serve as foundations for innovation, taking science forward into a very optimistic future.

Rob Johnson spent much of his professional career consulting for a Fortune 25 technology company. Currently, Rob owns Fine Tuning, LLC, a strategic marketing and communications consulting company based in Portland, Oregon. As a technology, audio, and gadget enthusiast his entire life, Rob also writes for TONEAudio Magazine, reviewing high-end home audio equipment.

BlackDove says:

July 26, 2017 at 12:55 am

I am interested to know how XPoint NVM will be integrated. I was under the impression that Intel was not supporting it with Skylake SP any more and it would need the Cascade Lake refresh in 2018.

I’m also interested to know how real workloads (as opposed to HPL) scale across the Xeon Phi and Skylake SP nodes; if they’ll work simultaneously on different parts of a problem or shuffle work back and forth depending on which processor is better for a given workload. Also, interconnect topology details please.
