Bridging The Gap Between Grid And Containers
December 4, 2015 Timothy Prickett Morgan
At the moment, there are two types of software container users, but in the long run, there will probably only be one.
The first group of users are those who are starting from a clean slate. They are probably writing a brand new application and want to try out the new technologies for developing applications as a collection of discrete services wrapped in containers, so those services can be deployed and maintained piecemeal rather than in the all-or-nothing fashion of the monolithic applications coded in days gone by. Such containerization not only makes software easier to deploy and maintain, but it potentially allows for a much lighter-weight deployment environment compared to full-on server virtualization, while also allowing multiple applications to run on a single machine, driving up utilization and squeezing the most work out of the money invested in systems.
The second group of users has a long history in high performance computing, and we are not just talking about HPC in the national labs and academia, but the kind that has existed for many decades in the oil and gas, financial services, media and entertainment, manufacturing, and other industries. Such HPC shops are well acquainted with running multiple workloads on their compute clusters, and they have long used workload schedulers to automatically distribute a deep queue of jobs across the machines in those clusters over time, such that utilization on the cluster (as well as on each server) is as high as it can be and the throughput of work on the cluster is maximized.
As a provider of the commercial-grade Grid Engine cluster scheduling system and as a supporter of both the Mesos and the Kubernetes management frameworks, which were made famous by Twitter and Google, respectively, Univa is positioned to bridge the HPC and hyperscale worlds and bring a set of products that leverage the best of both worlds to the enterprise. (Univa is not, of course, the only organization trying to build this bridge. IBM has done similar work to get Docker containers to work on top of its Platform LSF scheduler, and Adaptive Computing has just updated its Moab scheduler with Docker support, which debuted at the SC15 supercomputing conference last month.)
This bridging of technologies is one of the founding principles of The Next Platform, of course, and is exactly what we expect to see happen as companies increasingly wrestle with scale.
This matters because keeping track of stuff as it flits around the datacenter is getting harder and harder. A company that had a few hundred servers running its applications two decades ago probably has a cluster with a few thousand virtual machines encapsulating its newer applications today. And as companies move to microservices architectures for their own applications – and as third party software vendors do the same for the ones they sell – they will potentially have many tens of thousands of containers to manage. For large enterprises, multiply those numbers by 100, and the number of moving parts will get exceedingly difficult to keep track of and will require automation – and, as it turns out, at many different levels.
Starting From Scratch
For those customers who are building greenfield software container infrastructure on bare metal clusters, Univa has put together a new set of tools being sold as the Navops suite. The first element of the suite is called Navops Launch, and it includes a subset of the company’s UniCloud provisioning tool, which is used to provision various cloud infrastructure so it can run the Grid Engine workload scheduler. In this case, the parts of UniCloud that allow it to deploy on bare metal are combined with Puppet system configuration tools and then mashed up with Docker containers, the Kubernetes container and pod management tool from Google, and the minimalist Atomic Host Linux from Red Hat (the Fedora development release, not the commercial Enterprise Linux version) to create a complete container management environment.
You can download Navops Launch at this site. Navops Launch itself downloads as three containers, and if you put it onto a Docker-ready machine, it installs itself, stands up a Kubernetes master node and control plane along with Kubernetes worker nodes running Fedora Atomic Host, and you are ready to start adding machines to the cluster to expand the Docker pods. Univa is distributing this software for free, but the UniCloud elements and the user interface that Univa has created for the stack are not open source, so the entire package is not open source. That may be a sticking point for open source purists, but probably not for the enterprise customers that Univa is after, who want containers to be easier. Univa is just finishing up a comparison study to show how much time companies will save by using Navops Launch compared to cobbling together the pieces of open source code themselves. (Stay tuned.)
Navops Launch 1.0 is in early access now and Gary Tyreman, CEO at Univa, tells The Next Platform that the company is looking for feedback from enterprise customers to see what kinds of additional features and functions they might need. “This is our way of helping people try Kubernetes and Docker containers by simplifying how they build them,” Tyreman says. The tool can be used to control on-premises Docker container setups, or those running on Google Compute Engine or Amazon Web Services. It is reasonable to assume that once Windows Server 2016 ships with Docker container support, it won’t be long before Univa can burst Windows-based Docker containers out to the Microsoft Azure cloud using Navops Launch, too.
There is obviously some enlightened self-interest going on here, as was the case with Google open sourcing the Kubernetes container controller system, which is based on ideas derived from its internal Borg cluster management system, as we discussed at length when Google set it free as part of the Linux Foundation back in July. Eventually, Navops Launch will be available with a support contract that is on the order of a few hundred dollars per node per year. (Exact pricing won’t be set until general availability, but Univa, unlike many companies and certainly unlike some of its peers in the Kubernetes game, is open about its pricing.)
The Navops Command part of the suite adds scheduling and policy management capabilities to Launch, and this was something that Tyreman hinted was in the works when the company debuted Grid Engine Container Edition back in September. (More on this in a moment.) Navops Command takes a slice of the Grid Engine scheduler and plugs it into the Kubernetes scheduling API, so Grid Engine can decide which part of the cluster is the best place to deploy a pod of containers, leaving Kubernetes to do what it does best: lifecycle management and resource management for that pod once it is running. With Borg, Google has a number of workload schedulers that system admins and software engineers use, depending on the application, and Borg has had a pluggable architecture for some time. Other workload schedulers will also be able to hook into the Kubernetes scheduler API, including ones from the open source community as well as proprietary ones like Grid Engine and maybe even Moab from Adaptive Computing and LSF from IBM’s Platform Computing unit.
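The mechanics of pointing Kubernetes at an external scheduler can be sketched with a pod spec. In Kubernetes releases after the period this article covers, a pod opts into an alternate scheduler by name; the exact hook Univa uses against the scheduling API is not detailed here, so treat the field placement and the scheduler name below as illustrative assumptions rather than Univa's actual integration:

```yaml
# Hypothetical pod spec: asks Kubernetes to hand placement of this pod
# to an externally registered scheduler (named "grid-engine" here as a
# placeholder), while Kubernetes still manages the pod's lifecycle and
# resources once it has been bound to a node.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  schedulerName: grid-engine   # assumed name registered by the external scheduler
  containers:
  - name: app
    image: nginx
```

Any pod that omits the field simply falls through to the default Kubernetes scheduler, which is what makes this kind of pluggable arrangement attractive for mixed environments.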
To get the Grid Engine orchestration and scheduling add-ons in Navops Command, customers using Navops Launch will simply have to upgrade to a premium support contract – the extra code is latent in the suite. This premium support will have a nominal uplift over the regular per-node annual support contract set for Navops Launch. The precise amount of that price delta has not been set.
Further down the road, Univa will add reporting and capacity planning analytics with the Navops Control module. This is expected sometime in 2016, but precisely when has not been divulged.
Starting From Grid
For customers that have Grid Engine already managing workloads on their clusters, Grid Engine Container Edition is probably the better option, says Tyreman. With Container Edition, the Grid Engine workload scheduler is able to reach into a Docker Hub repository (either the public one or a private one) and deploy a batch of containers like any other collection of Linux applications. The idea here is that such customers will want to run legacy, monolithic applications on Grid Engine, as they are doing now, side-by-side with modern containerized applications and they will not want two different sets of tooling.
This mix of containers and traditional grid is more complex, and therefore Grid Engine Container Edition will cost more than the regular Grid Engine, which costs $99 per core per year, and probably more than premium support for the Navops Launch/Command combo. Grid Engine Container Edition is itself still in beta and its pricing has not been set, so no one can say for sure yet. The two products also have different pricing models – one per core and one per node – so customers will have to do the math to see which licensing fits their needs.
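As a rough illustration of why the per-core versus per-node distinction matters, here is the kind of back-of-the-envelope math customers will have to do. Only the $99 per core per year figure for Grid Engine comes from the article; the per-node price and the cluster shape are made-up placeholders, since neither Navops nor Container Edition pricing has been announced:

```python
# Compare per-core and per-node licensing for the same hypothetical cluster.
# Only the $99/core/year Grid Engine figure is from the article; the rest
# are assumed placeholders for illustration.
nodes = 100
cores_per_node = 16

grid_engine_per_core = 99        # $/core/year (from the article)
navops_per_node = 500            # $/node/year (hypothetical placeholder)

grid_engine_cost = nodes * cores_per_node * grid_engine_per_core
navops_cost = nodes * navops_per_node

print(f"Per-core (Grid Engine): ${grid_engine_cost:,}/year")   # $158,400/year
print(f"Per-node (hypothetical): ${navops_cost:,}/year")       # $50,000/year
```

Note that per-node pricing gets relatively cheaper as core counts per node climb, which is one reason the same two price lists can come out differently for different shops.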
The interesting thing to contemplate is what customers will do once they gain some experience with the Navops tools and start playing with Grid Engine proper. It stands to reason that a lot of enterprises with cluster applications might want to shift them to Grid Engine as they work to containerize those applications. HPC applications – both traditional ones in the labs and academia and those in the enterprise – fall into this category.
“In order for Docker to penetrate the enterprise, companies have to be able to consume it, and I realize that this is stating the obvious,” says Tyreman. “Enterprises want to know how to orchestrate it, and that is where we come in. There are security and networking and other issues that really smart people are also working on, and that will all come together very, very quickly. Where we see the technology being deployed is in a pristine environment, and it can be in the same company but it may not be the same HPC person I am talking to today with Grid Engine. They do not necessarily want to mix their greenfield container cluster with their big data or HPC environment, and it will take time for this to evolve.”
Over time, we think there will be a mixing – and so does Univa – as companies seek to drive more work out of their clusters. This, after all, is why Google containerized its workloads and created Borg in the first place.
Enterprises will come at automated containers for another reason as well: they see containers as a way to get to the hybrid cloud that they cannot easily replicate with virtual machines.
“As for the enterprise side that we are more interested in, we have some very large accounts with resource bottlenecks and application dependency craziness that requires them to slow down the launching and runtime of workloads because their systems software is different for different workloads,” Tyreman explains. “This is where we are seeing Docker being applied, particularly because enterprises want to take what they run on premises and run it on the public cloud. Containers are offering a fantastic ramp to the cloud because the containers run the same binaries and have the same dependencies. This is going to be a really big pull, and I can tell you that the largest infrastructures inside of enterprises today, which are doing HPC in one form or another, are banging on the door trying to get to the cloud.”