When Will Containers Be the Total Package for HPC?
September 13, 2016 Nicole Hemsoth
While containers are old news in enterprise circles, high performance computing centers have, by and large, only recently begun to consider packaging up their complex applications. A few centers are known for their rapid progress in this area. For smaller sites, especially those that serve users from a diverse domain base via medium-sized HPC clusters, progress has been slower, even though containers could zap some serious deployment woes and make collaboration simpler.
When it comes to containers in HPC, a couple of noteworthy efforts go beyond the more enterprise-geared Docker and CoreOS options. These include Shifter out of NERSC and Singularity out of Lawrence Berkeley National Lab. There are also some publicized efforts to use containers in complex HPC workflows. One we wrote about is on the research genomics side. Another is at CERN, which uses Docker to handle high energy physics workload deployments for the Large Hadron Collider. Middleware makers that have historically catered to HPC (Univa, IBM Platform LSF, and others) have also embraced Docker and other container technologies by adding hooks that make them easier to adopt. These all represent big undertakings on either the environment development side or the deployment side. The real question is how "typical" HPC centers are approaching the implementation of Docker and other container technologies.
To get to the core, we talked to Purdue University, which recently detailed the results of container deployments in its own diverse scientific computing environments. Purdue represents the needs of a mid-sized HPC center quite well because it serves a broad base of users, both on-site and via science gateways and portals. Like other research centers, Purdue has recently seen a shift in applications and deployment needs. In some areas, there is a trend away from big MPI and batch-scheduled jobs. The move is toward workflows that have a gateway or portal component, involve real-time data analytics or data collection and dissemination, and require collaboration with remote teams.
Although those MPI and batch jobs will remain a mainstay of Purdue's HPC work for many years to come, the movement in the other direction got Michael Shuey, infrastructure architect at Purdue, and his teams thinking. They began asking how containers could be harnessed to speed deployments, preferably without losing any performance along the way.
"People are developing tools in house, they're developing in new environments and those environments are no longer what you would expect from a batch scheduled system. We are seeing a need to support whatever kind of software environment researchers come up with. We need to do that at scale, we need to do it in a relatively standard fashion, and we need to provision it rapidly," Shuey tells The Next Platform.
Of course, if adopting containers were a quick or easy proposition, more centers would do it. The challenges lie in security and the application ecosystem, Shuey says. On the security side, a great deal of progress has been made in recent Docker releases and in the Linux kernel itself. "We are at a point finally where we can build a hard fence between different users and disciplines." On the platform front, he notes: "If you look, early work with containers and microservices was heavily focused on web environments. That has not been the case with scientific computing…the maturity of the container platform has lagged behind a bit and it is just getting to the point where there is enough of an ecosystem, driven largely by industry, that is starting to be a real resource developers on scientific projects are looking to."
In addition to more general concerns about security and the richness (or lack thereof) of the container ecosystem, one might expect performance to be a critical concern. Shuey says advances are still needed in container support for HPC networks (RDMA, Omni-Path, and InfiniBand) so they work seamlessly with applications that need low latency. Container support for parallel file systems like Lustre and GPFS also needs work. Beyond that, getting the broader base of MPI codes into containers is still proving to be a challenge. In other words, some codes are unlikely to be (or simply cannot be) efficiently containerized. That actually suits Purdue, which is looking at splitting its infrastructure between batch-scheduled HPC nodes and nodes designated for containerized jobs.
Overall, Shuey says containerization is becoming easier as the technology and ecosystems mature and the microservices mindset moves from the enterprise to HPC. That same mindset is what is pushing development on RDMA networks and other key components of many HPC systems. “We are also seeing this move on the file system front too; a lot of web platforms don’t require the kind of extreme scale performance from their file systems you’d get with a Lustre or GPFS. If we move some applications into a container, we don’t want to lose the I/O performance of it. But that really becomes an issue of driver integration and the complexity of using two stacks that haven’t traditionally been used together.”
Interestingly, the performance overhead from containers is not as hefty as one might think, at least in Purdue's experience. It took some tweaking, but Shuey says they are seeing a 1% to 2% overhead in I/O throughput and network performance. That result does not come entirely out of the box, however. "With the file system, we worked with older hardware in our test lab but could still get half a GB/s per file stream to a particular server. When we containerized that application and tested the file connection inside Docker, we got slapped with a 50% performance penalty. But then we realized that by default, Docker was imposing an overlay file system layer, something that's perfectly acceptable in the enterprise or for web workloads. But that layer, built into the Linux kernel, couldn't keep up with the I/O we could get with a research grade NAS (not even a GPFS or Lustre file system)."
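The article does not say how Purdue measured that penalty or exactly how it got around the overlay layer. One common approach is to keep data-heavy paths off the overlay by bind-mounting a host directory into the container with Docker's `-v` option, since bind mounts bypass the storage driver. That approach is an assumption here, not a detail from Purdue. Below is a minimal sketch of a probe for comparing the two paths. It times a single-stream sequential write (with `fsync`) in each directory and reports the difference as a percentage. The script name `io_probe.py` and all paths are hypothetical, and it measures writes only, not the NAS read path Shuey describes.

```python
import os
import sys
import tempfile
import time


def write_throughput_gbps(directory: str, total_mb: int = 256, block_mb: int = 4) -> float:
    """Sequentially write about total_mb MiB to a temp file in `directory`; return GB/s.

    The file is fsync'ed before the clock stops so the figure reflects data
    reaching the file system, not just the page cache. Single stream, writes only.
    """
    block = os.urandom(block_mb * 1024 * 1024)
    n_blocks = max(1, total_mb // block_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)  # clean up the probe file
    return n_blocks * len(block) / elapsed / 1e9  # decimal GB/s


def overhead_pct(baseline: float, measured: float) -> float:
    """Percentage of baseline throughput lost in the measured configuration."""
    return (baseline - measured) / baseline * 100.0


def main(argv: list[str]) -> None:
    # Hypothetical usage: BASELINE_DIR is a bind-mounted host directory,
    # TEST_DIR is a path on the container's overlay root (e.g. /tmp).
    if len(argv) != 3:
        print(f"usage: {argv[0]} BASELINE_DIR TEST_DIR")
        return
    baseline = write_throughput_gbps(argv[1])
    measured = write_throughput_gbps(argv[2])
    print(f"baseline {baseline:.3f} GB/s, test {measured:.3f} GB/s, "
          f"overhead {overhead_pct(baseline, measured):.1f}%")


if __name__ == "__main__":
    main(sys.argv)
```

As a hypothetical run, `docker run --rm -v /scratch/probe:/data IMAGE python3 io_probe.py /data /tmp` would compare a bind-mounted host directory against `/tmp`, which normally sits on the overlay root. Here `IMAGE` stands for any image containing the script. At the 0.5 GB/s baseline quoted above, a 50% penalty means roughly 0.25 GB/s. The 1% to 2% overhead Purdue reports after tuning corresponds to about 0.49 to 0.495 GB/s.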
The point is that the team has worked on both the HPC system and container software stacks, learning how the two interact. That has let it work slowly but surely toward implementing containers. The move could make some of its workflows more efficient, especially the more "modern" ones built around web interfaces, science gateways, and other collaborative environments. Because many HPC centers are university or government institutions, however, things tend to move at a slower pace, especially when there are potential security or other concerns. This, coupled with the complexity of HPC workflows (especially those involving MPI codes), means progress is still slow going.