Cluster Management Comes Full Circle

When it comes to cluster computing, what’s old is new again, at least for Global 2000 companies that are leaving behind monolithic enterprise systems and appliances and moving into the scale-out Linux cluster camp.

While the monolithic system from a single vendor won’t be going away anytime soon, as Hadoop and other experimental distributed system initiatives take production root, the commodity cluster is making headway into the large-scale enterprise datacenter. What this means, however, is that expertise in managing clusters at scale (a task that once fell to the on-board management tools from BMC or was cooked into Hewlett-Packard and Dell machines, for instance) is at best a new challenge and at worst a lost art. This is the current state of the enterprise shift to clusters, at least according to Tim McIntyre, whose company has its roots in managing scale-out commodity clusters for high performance computing systems. As one of the co-founders of StackIQ (formerly called ClusterCorp), he and an early team were part of the development force behind the Rocks cluster management suite. McIntyre said StackIQ’s ten-year window into cluster evolution gives it an advantage, especially at a time when commodity systems are making big headway in commercial settings.

“For a long while, we questioned whether the types of cluster management tools we had developed would find a place in the enterprise, but over the last few years, it’s really been proving itself out. We went from asking how bare metal management could be relevant to seeing how these tools are going to be even more relevant, especially as tools like Hadoop and OpenStack continue to grow.”

McIntyre says that beginning in 2013, the business shifted significantly away from HPC as a sole focus; today, 80 percent of StackIQ’s revenue comes from non-HPC deployments. Last October’s round of $6 million in Series B funding will be applied to pushing this new focus into deeper levels of automation and tooling for DevOps in large-scale enterprise environments. “We’re seeing those users who skipped the ten or more years of cluster management and are now, for the first time, transitioning from Teradata, EMC, Oracle, and the like to building scale-out Linux infrastructure. They’re finding that the tools they have been using for years aren’t appropriate for what they’re trying to do now.”

StackIQ’s focus is on the Global 2000, and it has already had solid traction at the top end of the market, snagging the top three credit card processors and the top three wireless carriers, according to McIntyre. All of these customers have production Hadoop clusters with thousands of nodes, and most of them started with StackIQ (which is integrated with all three of the major Hadoop distributions: Cloudera, MapR, and Hortonworks) simply because they planned on scaling beyond their original node count and didn’t want to have to work backwards, management-wise, as they added nodes.

“The tooling at the infrastructure management layer is very similar between HPC and enterprise users, but the overlap, in terms of users, is almost zero,” says McIntyre. “The notion some companies have that you can just take an HPC user base and they’ll evolve into things like Hadoop and cloud is incorrect. What has been true for us, however, is that at the infrastructure management layer, the problems of managing things like OpenStack and Hadoop at scale are identical. There’s a direct line from what we saw in the HPC space with the transition from traditional monolithic supercomputers to commodity clusters in the early 2000s. In that case, taking commodity infrastructure and going from bare metal to a running application workload needed to be completely automated, but the existing set of tools available back in 2000 in HPC weren’t up to the task, hence the birth of Rocks.”

The company is staking a claim to Hadoop infrastructure management in particular after watching the number of production nodes under its management grow 10X since 2013, growth fed by enterprise demand for cluster management tools that were already built for scale during a decade of evolving HPC clusters.

McIntyre points to Hadoop as a representative initiative that appears to be gaining a great deal of enterprise momentum, having moved from test clusters into full production. He says that as users move from Hadoop to OpenStack, Docker, and beyond, the need for cluster management that was built with the scalability of HPC datacenters in mind will be more important than ever, especially since none of the tools attached to these relatively new frameworks offer the same infrastructure management capabilities. StackIQ’s strategy is to integrate with the built-in management frameworks for the various Hadoop distributions (for instance, it recently announced it will play well with Apache Ambari, the Hortonworks-backed management tool) but to stand alone on all the bare metal aspects that those tools weren’t designed for at scale. “These vendors have no desire to build down the stack, it’s just not their core competency,” said McIntyre. “They need a StackIQ to go from bare metal to running disk config, network config, the 40 pages of prerequisites you typically have to go through manually before you can get a Cloudera Manager up and running. We found that layer of the stack to be wide open, and as applications like Hadoop, OpenStack, and even containers [take hold], what we do for real infrastructure management becomes ever more important.”
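To make the bare metal handoff concrete, the kind of host-level prerequisites McIntyre alludes to typically include things like kernel swappiness, transparent huge pages, and consistent hostname resolution on every node. The short Python sketch below is purely illustrative, not StackIQ or Cloudera code, and the exact checks vary by Hadoop distribution and version; it simply shows the flavor of preflight verification that cluster management tooling automates across thousands of machines.

```python
#!/usr/bin/env python3
"""Illustrative preflight checks of the sort a Hadoop host must pass.

A hypothetical sketch, not StackIQ or Cloudera code; the real
prerequisite list depends on the distribution being deployed.
"""
import pathlib
import socket


def check_swappiness(max_value: int = 1) -> bool:
    """Hadoop vendors generally recommend keeping vm.swappiness very low."""
    value = int(pathlib.Path("/proc/sys/vm/swappiness").read_text().strip())
    return value <= max_value


def check_transparent_hugepages() -> bool:
    """Transparent huge pages are commonly required to be disabled."""
    path = pathlib.Path("/sys/kernel/mm/transparent_hugepage/enabled")
    # The active setting is shown in brackets, e.g. "always madvise [never]".
    return "[never]" in path.read_text() if path.exists() else True


def check_hostname_resolution() -> bool:
    """Every node must resolve its own fully qualified hostname."""
    try:
        socket.gethostbyname(socket.getfqdn())
        return True
    except socket.error:
        return False


if __name__ == "__main__":
    checks = {
        "vm.swappiness": check_swappiness,
        "transparent hugepages": check_transparent_hugepages,
        "hostname resolution": check_hostname_resolution,
    }
    for name, check in checks.items():
        print(f"{name}: {'OK' if check() else 'FAIL'}")
```

Multiply checks like these by the full list of disk, network, and OS settings, and then by thousands of nodes, and the case for automating everything from bare metal upward becomes clear.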

StackIQ Boss is the company’s primary platform, which is powered by “pallets” (for former Rocks users, these are akin to Rolls). These pallets contain the functionality and configuration details needed to implement everything from one of the various Hadoop distributions to Docker, OpenStack, Puppet, and other frameworks. Each pre-configured pallet allows a user to install a machine with all the necessary software and configuration, some of which can be set in advance inside Boss and carried over. The idea is that clusters can be spun up quickly and then easily repeated and managed, since pallets contain all of the necessary information and can be rebuilt or altered en masse, upgraded at the pallet (and thus cluster) level, and monitored within Boss at the node and cluster level.
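As a rough mental model, a pallet can be thought of as a declarative bundle: the packages, settings, and roles needed to stand up a framework, captured once and then applied uniformly to every node. The Python sketch below is a hypothetical illustration of that concept only; the class and field names are invented and do not reflect StackIQ’s actual pallet format or API.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical, simplified model of a "pallet": a named bundle of software
# and configuration applied to every node in a cluster. Illustration only,
# not StackIQ's file format or API.


@dataclass
class Pallet:
    name: str
    version: str
    packages: list[str]
    # Configuration that can be set in advance and carried over at install time.
    settings: dict[str, str] = field(default_factory=dict)


@dataclass
class Cluster:
    nodes: list[str]
    pallets: list[Pallet] = field(default_factory=list)

    def apply(self, pallet: Pallet) -> None:
        """Simulate rolling a pallet out to every node in the cluster."""
        self.pallets.append(pallet)
        for node in self.nodes:
            print(f"{node}: installing {pallet.name}-{pallet.version} "
                  f"({len(pallet.packages)} packages, "
                  f"{len(pallet.settings)} settings)")


if __name__ == "__main__":
    hadoop = Pallet(
        name="hadoop-distro",
        version="2.x",
        packages=["hdfs", "yarn", "zookeeper"],
        settings={"dfs.replication": "3", "namenode": "node001"},
    )
    cluster = Cluster(nodes=[f"node{i:03d}" for i in range(1, 4)])
    cluster.apply(hadoop)
```

Because one manifest drives every node, a rebuild or an upgrade becomes a matter of reapplying or swapping the pallet rather than re-running a bespoke install, which is the property that matters at thousand-node scale.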

Even though the Global 2000 might be making a shift toward commodity clusters, this doesn’t mean they’re willing to run big experiments with the kinds of up-and-coming tooling found in web-scale datacenters. For instance, McIntyre said that although Mesos is great for a new generation of web applications, for the big-name companies shifting into new territory, an experiment in Hadoop alone is enough of an adventure for now. They’re not moving to experimental management tools just yet.
