Wringing Cost and Complexity Out of HPC
November 18, 2016 Rob Farber
The race toward exascale supercomputing gets a lot of attention, as it should. Driving up top-end performance levels in high performance computing (HPC) is essential to generate new insights into the fundamental laws of physics, the origins of the universe, global climate systems, and more. The wow factor is huge.
There is another area of growth in HPC that is less glamorous, but arguably even more important. It is the increasing use of small to mid-sized clusters by individuals and small groups that have long been reliant on workstations to handle their most compute-intensive tasks. Instead of a small number of massive clusters, this new development will ultimately result in a massive number of small to mid-sized clusters. A million points of light, all shining brighter, will drive innovation not only faster than we have seen before, but on a much broader scale.
There are two major barriers slowing this sea change in HPC usage: cost and complexity. Building an optimized cluster for parallel application execution requires considerable expertise and a decent budget. Using and maintaining these systems is also challenging. But inroads are being made toward solving these issues.
Many organizations are leading the way, including vendors, the open source community, and forward-thinking organizations that use HPC. One such organization is the University of Pisa.
The IT Center at the prestigious university serves about 200 HPC users. Some of these users are studying the disciplines you might expect: quantum chemistry, nanophysics, genome sequencing, proteomics, and engineering. Others come from fields that are relatively new to HPC, such as big data analytics, data visualization, machine learning, and other business-focused areas.
As the diversity of users increases, so does the challenge of addressing their many different needs. The University of Pisa IT Center has developed strategies to serve its expanding user groups more effectively, while keeping costs down. These strategies include new approaches to both software and hardware.
End-user HPC applications come from a variety of sources and tend to have unique software dependencies, so supporting them all is no simple task. Each HPC software stack may include dozens of individual components that must be obtained, integrated, and tested with each other and with the end-user application. There are a lot of pieces to the puzzle, and the frequency of new releases ensures that the puzzle is in constant flux.
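To give a rough sense of why the puzzle grows so quickly, the sketch below counts the pairwise compatibility checks a stack would need if every component had to be tested against every other. That is a simplifying, worst-case assumption, not a description of how the IT Center actually tests. The component names and stack sizes are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import comb

# Hypothetical component names, used only for illustration.
stack = [
    "compiler",
    "mpi_library",
    "io_library",
    "math_library",
    "resource_manager",
    "provisioner",
]


def pairwise_checks(components):
    """Return every unordered pair of components (worst-case compatibility checks)."""
    return list(combinations(components, 2))


pairs = pairwise_checks(stack)
print(f"{len(stack)} components -> {len(pairs)} pairwise checks")

# The count grows quadratically: n * (n - 1) / 2 for n components.
for n in (12, 24, 48):
    print(f"{n} components -> {comb(n, 2)} pairwise checks")
```

Under this assumption, doubling the number of components roughly quadruples the number of checks, and each new release of a component reopens every pair it belongs to.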
To reduce the time and effort required for this process, the IT Center at the University of Pisa is evaluating the OpenHPC community system software stack. This pre-integrated software, which is hosted by the Linux Foundation, includes a set of commonly required HPC components, such as provisioning and resource management tools, I/O libraries, and scientific libraries. One goal of the developer community is to maintain a consistent application programming interface as the components in the platform evolve. The combination of rich functionality and a stable API makes it easier to support multiple applications on a single, modular software platform.
After working with OpenHPC in their test environment, the IT Center sees considerable potential for this new approach. According to Maurizio Davini, Chief Technical Officer for the University of Pisa, “OpenHPC allows us to focus on the problem instead of the plumbing. We can set up a new cluster 80 to 90 percent faster. This is very big for our team and our users.”
Although still in the testing phase, Davini believes that OpenHPC may help the university in multiple areas, including:
- Faster innovation. OpenHPC is supported by about 30 vendors and research organizations, and contributions are made by many others. This broad participation should accelerate development, which would help the IT Center deliver new capabilities and support new hardware advances more quickly.
- A better platform for new HPC users. Many of the vendors involved in OpenHPC development have a vested interest in expanding the use of HPC to new markets. Providing better support for users who have little or no experience with HPC will be key to this growth, and could ultimately make it easier for the university to support a broader range of its own users.
- Simpler application development. OpenHPC includes support for a variety of languages, including Fortran, C, and C++. Other languages, such as Python, Java, and .NET, are available through the base operating systems that OpenHPC is validated to run with. This broad support has the potential to help a new generation of developers write and optimize code for HPC.
Simplifying Hardware Delivery: A Private Cloud
The University of Pisa has five physical clusters onsite, ranging in size from 10 to 60 nodes. It also has an on-premises cloud based on Dell Hybrid Cloud System for Microsoft. This pre-integrated cloud solution uses Microsoft Azure Pack to provide on-premises services that are interoperable with Azure public clouds.
Davini and his team have verified that they can use OpenHPC for both bare metal and virtual deployment models, so they can use the same software tools across all their HPC hardware options. Says Davini, “Many of our users can’t justify the cost of a dedicated physical cluster, so we offer lower cost solutions through our cloud. We recently set up a small cluster for a team of acoustics researchers that needed to increase their computing resources for a short-term study. They used it for a few weeks and were thrilled with the results.”
The University of Pisa is not just a consumer of HPC innovation. They are also helping to drive change. As part of this effort, they have been working with several vendors to evaluate and test new technologies and to provide feedback. They began working with Intel and Dell in 2013. Says Davini, “We share a goal of simplifying HPC, so it makes sense to work together. We hope we can eventually provide cluster solutions that individual researchers and even small businesses can use.”
One aspect of their engagement is to test components of the Intel Scalable System Framework (Intel SSF). Intel SSF provides a blueprint for balanced cluster designs. The benefits of using this framework are much like the benefits of using OpenHPC. IT staff don’t have to spend as much time evaluating, integrating, and testing hardware components, because much of the heavy lifting has already been done.
The University of Pisa IT Center is working to evaluate and test:
- Intel HPC Orchestrator. This Intel-supported distribution of OpenHPC includes several additional software components to help system administrators and software developers. Intel also provides additional testing and validation.
- Intel Xeon Phi processors. These processors provide up to 72 cores per socket to support highly parallel workloads. They can run existing x86 code without recompilation and can be used as either accelerators or standalone processors.
- Intel Solid-State Drives (SSDs) for PCIe. These SSDs can be used to resolve the performance, latency, and bandwidth issues that are common with mechanical hard drives and even with SATA-connected SSDs.
- Intel Omni-Path Architecture. This new cluster fabric is designed to provide performance comparable to EDR InfiniBand, but with improved cost and scalability.
The University of Pisa IT Center provides feedback to the vendors and the larger HPC community through blogs and technical reports. Their documentation includes guidance on setup and use. In some cases, they have moved into performance testing, and provide information that can help organizations make more informed decisions based on workload requirements. For the latest information, visit the University of Pisa IT Center website at: http://www.itc.unipi.it/
Larger and more complex models, increasingly sophisticated algorithms, and growing data sets are driving a need for more computing power in both science and business. HPC is the obvious answer, and a transformation is underway to help make cluster solutions not only more powerful, but also simpler and more affordable.
With recent advances, designing physical clusters from the ground up may not be the best option anymore, at least in some scenarios. As Davini notes, “Building a one-off cluster for every research team doesn’t always make sense. New approaches to hardware and software are beginning to change the HPC landscape and expand the potential user base. It’s exciting to be a part of this.”
- University of Pisa IT Center. http://www.itc.unipi.it/
- As of October 13, 2016, Intel HPC Orchestrator is validated to run with Red Hat Enterprise Linux 7.2 and SUSE Linux Enterprise Server 12 SP1. Python and Java are available through these operating systems. .NET is available through the open source Mono project.
- Although applications running on Dell Hybrid Cloud System for Microsoft can theoretically run without change on Microsoft Azure public clouds, the lack of support for PXE boot in public clouds currently prevents easy replication. Future releases of Intel HPC Orchestrator may help to resolve this limitation.