An event as large and diverse as the annual Supercomputing Conference (SC16) presents a daunting array of content, even for those who specialize in a particular area within the wider HPC spectrum. For HPC programmers, there are many sub-tracks to follow, depending on where in the stack one sits.
The conference program includes a “Programming Systems” label that makes relevant sessions easy to find, but we wanted to highlight a few of them here based on their broader significance to the HPC programming ecosystem.
HPC programmers often face considerations in how they program that other fields do not. For example, nothing ruins a good cluster like a thermal event, so power-awareness matters at the scale of many thousands of cores. These three sessions offer insight into challenges unique to HPC programmers.
- (Tutorial) Power-Aware High Performance Computing: Challenges and Opportunities for Application and System Developers – Sunday, 8:30 AM – 12 PM
- (Tutorial) Secure Coding Practices and Automated Assessment Tools – Monday, 8:30 AM – 12 PM
- Development Effort Estimation in HPC – Tuesday, 10:30 – 11 AM
Tooling and languages
The HPC world is more than just application software; the tools and languages programmers use are as varied and complex as the applications themselves. These sessions cover a variety of topics related to getting the job done.
- (Tutorial) Managing HPC Software Complexity with Spack – Sunday, 1:30 – 5 PM
- Towards “Write Once, Run Anywhere” HPC via Automated Translation – Tuesday, 10 AM – 6 PM
- Bringing About HPC Open-Standards World Peace – Wednesday, 10:30 AM – 12 PM
- (BoF) The Message Passing Interface: On the Road to MPI 4.0 and Beyond – Wednesday, 12:15 – 1:15 PM
- PIPES: A Language and Compiler for Task-Based Programming on Distributed-Memory Clusters – Wednesday, 1:30 – 2 PM
- Approaches to Modernizing and Modularizing Fortran Codes Using Fortran 2003 – Wednesday, 4 – 4:30 PM
GPUs and accelerators
GPUs aren’t just the domain of specialized HPC centers and bitcoin miners anymore. Recent investments by Amazon Web Services and Microsoft Azure in GPU offerings presage mass-market adoption. Microsoft is also making a big play for FPGAs in machine learning. HPC programmers who haven’t yet had to write code for accelerators probably will soon. It’s no surprise that SC16 has a wealth of GPU and accelerator content.
- (Tutorial) Programming Intel’s 2nd Generation Xeon Phi (Knights Landing) – Sunday, 8:30 AM – 5 PM
- (Tutorial) Harnessing the Power of FPGAs with Altera’s SDK for OpenCL – Monday, 8:30 AM – 12 PM
- (Tutorial) Application Porting and Optimization on GPU-Accelerated POWER Architectures – Monday, 8:30 AM – 5 PM
- (Tutorial) Debugging and Performance Analysis on Native and Offload HPC Architectures – Monday, 8:30 AM – 5 PM
- Understanding Error Propagation in GPGPU Applications – Tuesday, 2:30 – 3 PM
Real-world applications
The point of HPC programs isn’t just to crunch numbers faster than anyone else. HPC is all about getting better answers to questions big and small. Seeing how others use HPC for their work is inspiring. Here are a few sessions on HPC in the “real world”:
- Measuring IT Success in Milliseconds on the F1 Track – Wednesday, 3:30 – 4 PM
- Ancestry.com: Using Machine Learning to Organize and Contextualize the Largest Consumer DNA Database in the World – Thursday, 1:30 – 2:10 PM
The future
Conferences aren’t just a snapshot of the state of the profession; they’re an opportunity to examine trends and take a peek into the future. In addition to the Emerging Technologies track, SC16 offers some sessions with an explicit focus on gazing into the crystal ball.
- (BoF) Emerging Trends in HPC Systems and Application Modernization – Tuesday, 5:15 – 7 PM
- The End of Von Neumann? What the Future Looks Like for HPC Application Developers – Wednesday, 3:30 – 5 PM
Programming environments, tools, and applications are the cornerstone on which all future exascale efforts rest. After all, even immense advances in hardware mean little without tuned and optimized programs, compilers, and tools. Over the course of SC16 week, keep an eye out for our writers, and if you have a moment, stop to say hello and let them know what matters most to you from a programming perspective so we can tune our coverage of these issues over the course of 2017.
An interesting sidelight of the Sunday tutorial by the TACC team: they measured core-to-core MPI performance between cores on the KNL over Omni-Path. Not surprisingly, performance appears uniform (and very good) except for two pairs of cores whose performance was well below the norm.
This initially puzzled the team, but on investigating with help from Intel the reason became clear: by default, those pairs of cores are the ones on the chip that handle the Omni-Path interrupts.
Effectively, this reduces the useful core count on the chip to 68, at least for open-source MPI.
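For readers curious how a core-to-core measurement like this is typically structured, here is a minimal MPI ping-pong sketch. It is not the TACC team's actual benchmark; it simply times round-trip message exchange between two ranks, with core pinning assumed to be handled externally by the launcher (for example, the MPI runtime's binding options) so that different core pairs can be compared.

```c
/* Minimal ping-pong latency sketch (illustrative, not the TACC benchmark).
 * Run with exactly two ranks, each pinned to a chosen core via the
 * launcher's binding options, and compare timings across core pairs. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;
    char buf[8] = {0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();

    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, (int)sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, (int)sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, (int)sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, (int)sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double elapsed = MPI_Wtime() - start;
    if (rank == 0)
        printf("average round-trip latency: %.3f us\n",
               elapsed / iters * 1e6);

    MPI_Finalize();
    return 0;
}
```

Repeating a loop like this for every pair of cores is what would expose outliers such as the interrupt-handling cores described above.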
The finding highlights one of the most prevalent criticisms of Intel Omni-Path vis-à-vis Mellanox InfiniBand: that Omni-Path loads the host processor unnecessarily.
TACC will also measure MPI performance over Mellanox IB, and it is greatly desired that a commercial code vendor step up with licenses the TACC team can run against both fabrics (on behalf of TACC, my firm Integral Engineering has solicited ANSYS, with at least preliminary support).