A Clever Approach to Cultivating the HPC Stack
November 12, 2015 Nicole Hemsoth
A hypothetical scenario here.
So, let’s say you are far and away the world’s largest supplier of microprocessors.
Over the years, you branch out, slowly but steadily collecting an entire toolchain that complements your architecture and rides on the force of your chip business until, before long, there is an entire software stack, neatly bundled and broadly suited to a range of prime markets with enough wiggle room inside to cobble together quite sophisticated, tailored, but still ready-made platforms.
This stack is so tight and broadly appealing yet still laden with options that it can be shipped around to an entire ecosystem of system vendors, who can snip and tweak and ultimately build approximately similar clusters on stable platforms that have been tuned and turned to suit the needs of various verticals. This works, everyone has a business, because after all, technology business is driven by differentiation.
And if you are the chipmaker, the model is beautiful. And it is simple. And for what it’s worth, no one can touch it because there cannot be, as we have seen, two equally large suppliers of microprocessors and companion software stacks that are vying for the top position. Ahem.
And at another level, it’s even more brilliant and beautiful when it leverages open source. Or better still, integrates open source. Or, maybe most accurately, compels open source. It makes open source the center of integration and leverage. Accordingly, if that neat bundle of open source tied so nicely together is attractive enough to create a support model around, what are you left with? Assuming, once again, we are playing the hypothetical role of the world’s largest microprocessor supplier.
If you are one of the system integrators whose differentiation in the face of this sort of influence lies in those tunes and tweaks, this is, theoretically speaking, a rather uncomfortable position. The core, literally, of your machines is based on that microprocessor. And that software stack that created enough differentiation to make you worthwhile to customers (and to that big processor vendor) gets slowly pared away. You have to work harder than the hardware giant to beat them at software, all without ever openly endeavoring to beat them. And with so many open elements to cull a stack from, then bundle one around, then control from the top and decide what matters and what does not. Well.
The point is the platform. Without one, an integrated, preferably “open” one, a vendor in hardware or software is left prone to being a floating element instead of attached to the molecule. Or a free radical.
It is the unification of hardware and software, and the expertise and reach that come from the nips and tucks, that make such a model ripe for entire markets. If you do that, keep it open and stable, and innovate on top of that, it’s a win.
Take high performance computing, for example, an area that has historically (and even still) remained tied to specific tooling and packages, yet whose future growth depends on intensely close unification of hardware and software. And that software is hard to use, hard to integrate, and hard to extend beyond HPC. So much so that specialized vendors, all using the world’s largest microprocessor vendor’s chips, have to spend a great deal of research and innovation dollars in keeping it fresh, functional, and differentiated.
But what is the point of all of that hassle when the collective efforts across the software ecosystem can be bundled and managed under a single entity? And how much better could it be that the entity in question has primed all of those disparate tools to its own hardware—and used its extensive influence and reach to forge bonds at all of the trickier layers of the stack for HPC, the OS and scheduler and requisite tooling, for instance?
If you’re an end user, especially in HPC, where the software stack is fragmented across so many different compilers, tools, schedulers, resource managers, and more, having this put together and managed has its appeal. The overhead of managing the software stack is a full-time job and a revolving hassle. From several points of view, whether that of a system integrator, a research lab, or a general enterprise cloud user, cobbling together the various elements of a workable, stable software stack that is tuned around specific applications and needs is a resource-heavy task. It is for this reason a number of cluster management frameworks have arisen throughout the years, but for some users, particularly in high performance computing, being able to piece together the entire stack with stable components that are designed to play well together is advantageous.
So remember that story above? There is no value judgement in it. It benefits all and it does not benefit all. But what good does the isolated management of various stacks do to move anything forward either?
It is with these challenges in mind that Intel rolled out details around a new initiative launched off the Linux Foundation springboard called OpenHPC, which is geared toward maintaining a stable, open source software stack for the HPC community that allows for several options based on its partnerships with over thirty system and software vendors. This hit the news flow in advance of the largest gathering of supercomputing pros on the planet in Austin for SC15.
Although the initiative is anchored to the Linux Foundation, as several other open source efforts are as they gather momentum, Intel is the main driver as the founding member. As it stands, there is no one entity that can provide the various elements of the software stack for HPC, including through other cluster distributions like Rocks, for instance. As Intel’s general manager for high performance computing, Charles Wuischpard, tells The Next Platform, the goal is to continue building this out with extensible “plug and play” pieces from the various ISVs, integrators, and end users.
Wuischpard put the OpenHPC effort in some personal context given his background as a systems integrator by noting how resource-intensive building and maintaining that stack can be. “All the main systems vendors in HPC, from Penguin, SGI, Dell, HP, Cray, and others were spending a lot of effort on their own HPC stack; it was resource consuming.” In talking with these and other vendors, particularly on the ISV side, the need to pull together efforts on all sides to create a more stable stack became clear, garnering them quick buy-in. Although to be fair, when Intel comes knocking these days it is getting harder to say “no.”
All of the elements above, which will be culled from some of the partners you see in the graphic below, will be bundled together into “supported versions” from Intel in 2016 to create a truly unified, diverse, and stackable stack that HPC centers can pull from based on their unique requirements. And here’s the part where you see how clever Intel is from a business perspective, but also, how important it might be that they’ve taken this on. Because, and correct me if I’m wrong here, something like this is needed. Both on-site and to fit into the very thing that is happening in extreme scale computing that warranted Intel’s “master platform,” the Scalable System Framework.
Who else could do this? And what does it mean? And is it something that IBM with its OpenPower efforts can also execute just as evenly? To be explored from SC this week…
If this approach to gathering their integrators under a single banner for a more or less unified software stack sounds familiar, it is because this is the same concept that backed their formation of the Cluster Ready program, which had at its core the goal to make spinning up stable software stacks for general purpose clusters easier and less fraught with peril. Wuischpard says that for HPC in particular, especially with ever-larger systems coming down the pike beginning this year, the impetus is clear.
“As part of the CORAL delivery, we’re making investments already at extreme scale (handling 50,000-100,000 node systems and building the reliability and redundancy into that, for example). With so much research and development at that scale, along with lessons we’ve learned with efforts around OpenStack, for instance, the need for a software strategy that leverages all this work at the top end, and on the cloud side, is needed.”
Even though the stack includes several elements relevant to large-scale computing centers, it is possible to pick and choose at will. It’s all about options and stability, Wuischpard says, managing to skirt a question about whether or not this move gives Intel even more leverage (and a clear hand) over what happens with the many system integrators in HPC that are reliant on the chipmaker for the next few years to come, at the very least.
In practice, the idea is that there will be a host of available options for HPC centers, all of which have been validated and packaged together for what Wuischpard says will be “plug and play” capabilities. This means that no matter which of the various partner tools are used, and the goal is to allow a great deal of choice, they will operate seamlessly, much like Intel has already been doing with the Cluster Ready stack. “For instance, one center might prefer a different Linux distribution, so we will provide options there, beginning with SUSE but with Red Hat right behind. It might be CentOS.” He notes that whether a user goes the supported route or not, it is possible to bring the licenses over. A supported version of, say, SUSE would then include the OS support as part of the entire package, meaning a better pricing option collectively, Wuischpard explains.
While pricing details on the support program, set to be rolled out sometime in 2016, have not been announced, it is clear that Intel is still cobbling together more options and partners in addition to the various members contributing everything from schedulers and resource managers to core tools, as you can see above.
And it is a positive new force in the community: having supported, stable versions of software that is specific to HPC end users. Because beyond the obvious, it means a more stable platform for extreme scale computing, and for Intel, that means growth of a business segment that, while not their most profitable, is one of the purest sources of true innovation. For open source components in the stack, it is a seal of approval and a continued new set of users. For ISVs, it’s a valuable partnership to keep extending their reach (even if it means now more than ever they are at the mercy of the world’s largest microprocessor maker).
Intel is the prime contractor on the largest forthcoming supercomputer we know of. To have it passing its collective work around that stack down to the smaller centers is an advantage, but as with any company that fights to stay at the top, there is some enlightened self-interest as well. That is not a bad thing, or a good thing. But it is, indeed, a thing.