Solving HPC Conflicts with Containers
March 2, 2017 Ben Cotton
It’s an unavoidable truth of information technology that the operators and users are sometimes at odds with each other.
Countless stories, comics, and television shows have driven home two very unpleasant stereotypes: the angry, unhelpful system administrator who can’t wait to say “no!” to a user request, and the clueless, clumsy user always a keystroke away from taking down the entire infrastructure. There is a kernel of truth to them. While both resource providers and resource users may want the same end result — the successful completion of computational tasks — they have conflicting priorities when it comes to achieving it.
System operators are tasked with keeping the resource available and performing well for all users. This includes ensuring and enforcing proper resource allocation. It also means that any changes to the system have to be thoroughly vetted to ensure that they do not negatively impact the availability and performance of the resource. As a result, the operators have gained a monopoly on configuration and deployment. All of this is true for any IT resource, so what makes this relevant to readers of The Next Platform? Simply put, high-performance computing is a more sophisticated endeavor than general-purpose computing. HPC, by its very nature, is inclined to be experimental and push boundaries. Thus the users’ need to try experimental software packages is directly at odds with the operators’ need to prevent those packages from taking the cluster offline.
Traditionally, HPC systems and operations have been designed around monolithic applications that are compute-heavy and often latency-sensitive. Weather modeling and computational fluid dynamics are two of the classic cases that still embody this paradigm today. The approach in this model is to throw as much homogeneous hardware at the problem as the budget will allow in order to increase the simulation resolution or shorten the time-to-results. These traditional applications fit well with the resources that have been developed to support them. Because they are compute-bound, they are scheduled by the number of CPUs and the walltime requested.
Over the years, a new model of HPC has begun taking root. The new class of HPC applications is often smaller with different resource requirements. Some jobs are data-intensive and require fast access to local storage in order to perform computation against the data. Other jobs may require access to remote network resources, making network bandwidth the constraining factor. A third type of job is dynamic in its resource needs, potentially changing the core count or walltime by large amounts depending on the input parameters. None of these applications are well-served by scheduling systems that depend on the up-front request of fixed CPU and walltime. Furthermore, the varying nature of the “secondary” resource (e.g. network bandwidth and local IOPS) requirements leaves jobs susceptible to interference and competition from other jobs on the same machine.
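The gap between CPU-only scheduling and the needs of these newer jobs can be illustrated with a small sketch. It is not taken from any real scheduler: the Resources class, its field names and units, and the two fit functions are all hypothetical, and walltime is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Resource vector; all field names and units are illustrative assumptions."""
    cpus: int
    net_mbps: int = 0      # network bandwidth
    local_iops: int = 0    # local storage operations per second


def fits_cpu_only(free: Resources, request: Resources) -> bool:
    """Traditional check: only the CPU count is considered."""
    return request.cpus <= free.cpus


def fits_all(free: Resources, request: Resources) -> bool:
    """Multi-resource check: every dimension must fit on the node."""
    return (
        request.cpus <= free.cpus
        and request.net_mbps <= free.net_mbps
        and request.local_iops <= free.local_iops
    )


# A node with idle cores but little local I/O headroom left.
node_free = Resources(cpus=8, net_mbps=1000, local_iops=500)
# A data-intensive job that needs few cores but heavy local I/O.
job = Resources(cpus=2, net_mbps=100, local_iops=20000)

print(fits_cpu_only(node_free, job))  # True: the job would be placed here
print(fits_all(node_free, job))       # False: it would contend for local IOPS
```

A scheduler that sees only the first check would co-locate this job with others and leave them to compete for local storage, which is the interference described above.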
To provide better support for this new class of HPC application, several projects are being developed. These projects use Linux containers (LXC) to allocate and enforce process-level resource utilization beyond CPU and memory, as well as to provide process isolation and application portability. Containers rely on the host kernel and thus are lighter weight than full-fledged virtual machines.
Docker is the best-known container platform, and it has seen wide adoption among hyperscalers. Although its primary use is in powering scalable web services, Docker offers some features that are compelling to HPC shops. Using the kernel cgroups feature to allocate and enforce resources means jobs can be scheduled to minimize or eliminate contention for non-CPU resources. Additionally, since the container is an isolated environment in which the user code runs, users can bundle the versions of applications and supporting libraries specific to the job in question. Thus, there’s no need for the resource operators to worry about conflicting MPI libraries. The container format also lowers the barrier to using federated resources, which have historically suffered from a lack of application and library standardization. The National Energy Research Scientific Computing Center’s Shifter and the Berkeley Lab-developed Singularity project are two containerized HPC approaches currently in use.
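To make the cgroups point concrete, here is a minimal sketch of how per-job limits might be expressed. It assumes the cgroup v2 interface files cpu.max, memory.max, and io.max; a given Docker installation may use the older v1 hierarchy instead, and nothing here reflects how Docker, Shifter, or Singularity set limits internally. The function name and parameters are hypothetical, and the code only builds the strings; it does not write to the cgroup filesystem.

```python
def cgroup_v2_limits(cpu_cores: float, memory_bytes: int,
                     device: str, read_bps: int, write_bps: int,
                     period_us: int = 100_000) -> dict[str, str]:
    """Return cgroup v2 interface file names mapped to the values to write.

    Hypothetical helper for illustration only; `device` is a block device
    given as "MAJOR:MINOR".
    """
    # cpu.max takes "<quota> <period>" in microseconds; a quota of twice
    # the period allows the equivalent of two full cores.
    quota_us = int(cpu_cores * period_us)
    return {
        "cpu.max": f"{quota_us} {period_us}",
        "memory.max": str(memory_bytes),
        "io.max": f"{device} rbps={read_bps} wbps={write_bps}",
    }


limits = cgroup_v2_limits(
    cpu_cores=2,
    memory_bytes=4 * 1024**3,      # 4 GiB
    device="8:0",                  # assumed device numbers for the local disk
    read_bps=50 * 1024**2,         # 50 MiB/s read cap
    write_bps=20 * 1024**2,        # 20 MiB/s write cap
)
for filename, value in limits.items():
    print(f"{filename}: {value}")
```

The point of the sketch is that storage throughput gets a limit alongside CPU and memory, so a job's declared I/O budget can be enforced rather than merely requested.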
Researchers from the University of Edinburgh and the University of St. Andrews present a third project in the HPC Docker ecosystem: cHPC. Like Shifter and Singularity, cHPC provides a mechanism by which HPC resource providers can make use of container technology to support what the authors call “second generation” HPC applications. cHPC also provides a telemetry layer that combines physical status (process placement, memory and CPU consumption, I/O activity, network activity, et cetera) and logical status (e.g. whether the job is running, idle, checkpointing, restoring, or in an error state). Combining these statuses gives operators and users alike a holistic view of jobs and resources, solving what the authors describe as an asymmetry of information.
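One way to picture the combined view is as a single record per job that carries both kinds of status. The sketch below is not cHPC's API or data model: every class, field, and threshold is a hypothetical stand-in, with the logical states taken from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class LogicalStatus(Enum):
    """Logical job states named in the text."""
    RUNNING = "running"
    IDLE = "idle"
    CHECKPOINTING = "checkpointing"
    RESTORING = "restoring"
    ERROR = "error"


@dataclass
class PhysicalStatus:
    """Physical measurements; field names and units are assumptions."""
    node: str              # process placement
    cpu_percent: float
    memory_mb: float
    io_mb_per_s: float
    net_mb_per_s: float


@dataclass
class JobView:
    """Combined record visible to both operators and users."""
    job_id: str
    logical: LogicalStatus
    physical: PhysicalStatus


def needs_attention(view: JobView, cpu_floor: float = 5.0) -> bool:
    """Flag jobs in error, or jobs reported as running that use almost no CPU."""
    if view.logical is LogicalStatus.ERROR:
        return True
    return (view.logical is LogicalStatus.RUNNING
            and view.physical.cpu_percent < cpu_floor)


stalled = JobView(
    job_id="job-42",
    logical=LogicalStatus.RUNNING,
    physical=PhysicalStatus("node07", cpu_percent=1.0, memory_mb=2048.0,
                            io_mb_per_s=0.0, net_mb_per_s=0.2),
)
print(needs_attention(stalled))  # True: "running" but nearly idle on CPU
```

Neither status alone would catch this case: the scheduler's logical view says the job is fine, and the node's physical view shows only a quiet process. Only the combination suggests a stalled job.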
Containerized HPC projects are an attempt to eliminate the conflict between the concerns of HPC resource providers and HPC resource users by enabling awareness of more resources and eliminating the need for operators to monopolize application deployment. Because these containers can be used alongside “first generation” HPC jobs in traditional schedulers, we do not expect to see the traditional HPC jobs adopt containers with any haste. However, the use of containers does solve real problems that many HPC shops face, and we will watch their adoption with keen interest.