There has been quite a lot of Docker momentum in the datacenter over the last year, and a strong companion effort to move containers into a wider range of user bases. But at high performance computing sites, adoption of the container approach has been weaker, and for several good reasons.
Despite efforts from workload and resource management companies like IBM Platform Computing and Univa, there is still an uphill road ahead. But an effort based at the National Energy Research Scientific Computing Center (NERSC) is set to change that and open a Docker-like approach to a larger set of HPC users. The approach, called Shifter, takes the best elements of Docker and wraps them in a more HPC-friendly package, and it will be open sourced to push it into more HPC centers.
The question at hand is what is so wrong with Docker that NERSC decided to move past it and build something new. According to Shifter co-creator Doug Jacobsen, however, Shifter is more of an extension of a workable tool. In essence, Shifter generates read-only images for users to deploy on HPC platforms, but Docker is not cut out of the picture entirely, as it is the way these images are generated. As Jacobsen says, there is little reason to reinvent the wheel here since Docker already has a well-designed and documented way of letting people easily create and maintain images that can then be pushed into the cloud via Docker Hub or a private onsite hub. It is also good, he says, at solving a lot of interesting problems in the area of managing large parallel file systems, something HPC centers like NERSC certainly have.
Jacobsen, who is now a computer engineer at NERSC, used to be one of the center’s bioinformaticians and recalls installing four hundred different software packages for bioinformatics workloads. These were complex stacks with difficult dependencies, and ultimately users did not care about the option of using different versions of software. They just wanted their code to run. When the team first started working with Docker containers, the idea that these users could bring their own defined image and install it in a simple environment was revolutionary in terms of productivity, Jacobsen says. “We see no reason why people should need to have these complex paths to use our systems. They just cause performance, understanding, and other problems. Here, you bring your own software and container and while there are some modifications that have to happen still, new users can get going much faster.”
But why reinvent the wheel at all if Docker is a strong tool for users in HPC centers? Jacobsen tells The Platform they have used it with a number of smaller HPC clusters. However, it turns out that there are two fatal flaws for Docker on a big high performance computing system like the Cray Edison supercomputer. These boil down to security and, for the larger scope of HPC users at other sites, “simple” Linux versioning. A few other issues also crop up with weighty parallel file systems.
“Docker has met the needs of a lot of user communities, but for a lot of HPC sites, the fact that it requires Linux kernel version 3.10 (and previously required 3.8) leaves a lot of centers behind. There are plenty of HPC sites that are still using 2.6 and some go back to even earlier versions. It takes a long time to update.” This is one of several problems that Shifter addresses through its support for other operating system versions, but Docker as a whole is too far outside the update cycle for many HPC centers.
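For a site wondering which side of that line it falls on, the check is simple to script. The following is a minimal Python sketch, not anything from NERSC or Docker: the function names are hypothetical, and the only fact taken from the article is the 3.10 minimum that Jacobsen cites.

```python
import platform
import re

# Minimum kernel cited by Jacobsen for Docker at the time of writing.
DOCKER_MIN_KERNEL = (3, 10)


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '2.6.32-431.el6.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(release, minimum=DOCKER_MIN_KERNEL):
    """Return True if the kernel release is at least the given (major, minor)."""
    # Tuples compare element by element, so (3, 8) < (3, 10) as intended.
    return parse_kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # kernel release of the machine running this script
    verdict = "meets the minimum" if meets_minimum(release) else "is too old"
    print(f"{release} {verdict}")
```

Comparing integer tuples rather than strings matters here, since a plain string comparison would rank "3.8" above "3.10".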
Even with the right version, there is another weak point that was much discussed in enterprise Docker use cases and carries equal weight for HPC centers, especially since they tend to share large research systems among many users. You guessed it: security. It is on this topic that Jacobsen and his co-creator, Shane Canon, have put in a great deal of effort. Before the work evolved into Shifter, Canon developed a tool called MyDock to secure Docker by requiring users to operate as themselves. (Docker hands out contextual root access to the image, which is generally safe since users can only access things in the image, but it involves some heavy mapping in of many volumes.) This has been tested at scale in projects like the Dark Energy Survey and the work has landed inside Shifter.
While MyDock was useful and tested with some notable users, it did not scale up or scale out well enough to work efficiently on larger data-intensive systems, Jacobsen says. Further, during the MyDock and Docker era, it came down to Canon or another administrator to create and maintain all the images, so from a productivity standpoint, it wasn’t scalable in that way either. The goal, then, became to create a secure way for users to come in, create their environment, bring their software over, and get up and running with all the security features of MyDock in place.
The security features for Shifter build on what Docker (and then MyDock) does. Shifter gets around the security issues by using root privilege only to set up the container; the processes inside the container are only ever executed as that single user. “There is a controlled path for security management that meets the needs of HPC more directly than Docker,” Jacobsen notes.
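The general pattern described here, privileged setup followed by an irreversible switch to the user's own identity, is a familiar one on Unix systems. The Python sketch below illustrates that pattern only; it is not Shifter's code, and the function names, the `setup` callback, and the `osmod` parameter (which exists so the logic can be exercised without root) are all assumptions made for illustration.

```python
import os


def drop_privileges(uid, gid, osmod=os):
    """Permanently switch the current process from root to an ordinary user.

    Order matters: supplementary groups and the group ID must be changed
    while the process is still root, because setuid() takes away the right
    to change them afterwards.
    """
    osmod.setgroups([gid])  # discard root's supplementary groups
    osmod.setgid(gid)       # switch the primary group
    osmod.setuid(uid)       # switch the user last; root cannot be regained
    if osmod.getuid() != uid or osmod.geteuid() != uid:
        raise RuntimeError("privilege drop did not take effect")


def launch_as_user(uid, gid, argv, setup, osmod=os):
    """Run privileged setup, drop privileges, then exec the user's command."""
    setup()                          # hypothetical root-only step, e.g. mounting an image
    drop_privileges(uid, gid, osmod)
    osmod.execvp(argv[0], argv)      # replaces this process; runs as the user
```

Because the user ID is changed before the user's command is executed, nothing the user runs ever holds root, which is the property the paragraph above describes.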
Of course, these are high performance computing centers we’re talking about here, so what about the bottlenecks of the repositories and the networks? If a user wants to add one thousand images to start a job, how long does it take? It turns out that understanding some of the design constraints the Shifter creators bumped up against helps explain this. What NERSC wanted from Shifter was simple in theory: security and the correct versions. Beyond that, they wanted to make sure users would not have to go through sys admins to get things rolling, while still having access to all the resources on the system (file system, high speed network, etc.).
With Shifter, once the image is committed to the system, it’s immutable. In a case where there is, for example, a 5 GB image, it goes into the system and across the parallel file system, and the tuning features allow options for how many disk servers it will be spread across. That same file then gets mounted on all thousand nodes (if it’s a thousand-node calculation). Jacobsen said that startup time for both Python and MPI jobs improved dramatically, going from tens of minutes to seconds.
Shifter will be put to the test even further on the upcoming Cori supercomputer, which is set to hit the NERSC datacenter floor in its first phase this fall.