William Lu was witness to one of the most successful business and technology stories in modern supercomputing from his long-term position as one of the first developers of the Load Sharing Facility (LSF) software from Platform Computing. Like many who worked for the 550-strong Toronto company, Lu was there from the beginning—and held on until the drastic change.
While most of the mainstream tech world might have only heard of Platform Computing when IBM acquired it three years ago, it was one of only a handful of companies in the industry that managed to be wildly profitable every single year, with year-over-year growth of around 30 percent to 40 percent. There was quite a bit of surprise when the acquisition news emerged, especially since former CEO Songnian Zhou had fiercely defended his company’s stance of never going public and continuing to operate independently, with a roughly consistent run of around $73 million in revenues for most of the mid-2000s.
Lu says that Platform Computing’s once front-and-center CEO has disappeared from view into a quiet retirement. For Lu, who remembered the growth of LSF and understood its customer base well, made up mostly of commercial and research supercomputing sites, watching the IBM transition was rough.
“When IBM acquired us, there were probably 2,000 or so users of LSF. Among those 2,000, maybe 5 percent were not concerned about the future because they had strong ELA agreements with IBM, which meant they got excellent deals. These are their top customers, most with six and even seven figure deals. So that’s where the support focused—we watched this movie and saw that a lot of the second tier accounts, that big majority, were just not big enough to be on IBM’s radar—so without those great ELAs they started looking for alternatives.” Lu says that of that 2,000 base of LSF users, about 50 percent of the LSF revenue upon acquisition came in the EDA space, meaning it’s not hard to pare down who those customers likely are—especially at the six and seven figure range.
Interestingly, a few years before the IBM acquisition, Platform Computing released an open source variant of its Platform LSF product, called Platform Lava (now OpenLava), which had a very small user base that has since grown to what Lu thinks might be around 1,000 overall. While that is not a huge customer base, he says he and the small team of ex-Platform Computing folks, including James Pang, who ran product management at Platform for several years, could envision how that second tier, especially since those sites have all the existing LSF hooks in their datacenters, might want to look to a supported version of OpenLava, which builds on the Platform Lava tooling and provides a relatively (key word there, we’ll get to that in a moment) seamless way for users to move an existing cluster from IBM Platform scheduling and workload management tools over to OpenLava.
The small company, which has an undisclosed bit of capital to feed its initial push into the market as well as to drum up attention around OpenLava, got its start last July under the name Teraproc. Headquarters are just up the road from the old Platform Computing digs and, as one might imagine, will be the new home for a few more former Platformers if the OpenLava effort translates well for users.
The real hook here for Teraproc to capture existing LSF users boils down to compatibility—something that was a difficult hurdle for other companies that thought there would be a mass migration to a new scheduler once IBM took charge of Platform Computing. For these existing LSF deployments, OpenLava is, as Lu describes, “API, command line and configuration compatible with LSF. Users build a lot of scripts to submit jobs to these workload managers. These scripts were sometimes built a long time ago and have been used a long time so if you change the underlying workload management, you have to change these things. This is very hard and in many cases, the users don’t have knowledge of the script, which is why on the commercial side we see changing the user job submission script is a big deal.”
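To make the compatibility point concrete, here is a minimal sketch of the kind of submission wrapper Lu is describing. It assumes a site drives its jobs through the LSF-style `bsub` command with the common `-q`, `-n`, `-J`, and `-o` options; the helper name, queue name, job name, and script path are all hypothetical and are not taken from Teraproc or any real deployment. The sketch only builds the command line and does not contact a scheduler.

```python
import shlex


def build_bsub_command(job_script, queue, slots, job_name, output_file):
    """Build an LSF-style bsub argument list (hypothetical helper).

    Only widely used bsub options appear here:
      -q  queue to submit to
      -n  number of job slots requested
      -J  job name
      -o  file that receives the job's output
    """
    return [
        "bsub",
        "-q", queue,
        "-n", str(slots),
        "-J", job_name,
        "-o", output_file,
        job_script,
    ]


# Hypothetical EDA-style job; every value below is illustrative only.
cmd = build_bsub_command(
    job_script="./run_sim.sh",
    queue="normal",
    slots=16,
    job_name="timing_check",
    output_file="timing_check.%J.out",  # %J expands to the job ID in LSF
)

# Print the command rather than running it, so no scheduler is needed.
print(shlex.join(cmd))
```

If the replacement scheduler accepts the same command and options, a wrapper like this keeps working untouched, which is the argument Lu is making. A scheduler with a different command-line vocabulary would force a rewrite of every such script.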
As noted, EDA is a primary LSF base, there is a lot of expertise for LSF in that community, and all the ISVs in that area make sure their applications are tested against it, Pang says. Moving outward, the market opportunities for an LSF-like alternative do seem appealing. For instance, he estimates that manufacturing (automotive and aerospace in particular) is split roughly in half between PBS Pro and LSF, with a similarly integrated ecosystem among ISVs, which is a testament to LSF’s growth there, and that oil and gas shows a similar 50/50 split (Schlumberger, for instance, has standardized on LSF for reservoir simulations). The national labs, research centers, and government agencies are in transition: Moab used to be common, but now there’s much more use of SLURM, and PBS Pro has good adoption there, as does Grid Engine. Pang says Platform Computing products are a big winner in financial services, but that is mostly on the strength of Platform Symphony, the Java messaging platform.
In short, there is market room at the high end of systems, especially for a company like Teraproc, which has all the tools at the ready for users to make the transition from the IBM product to the supported open source version. Make no mistake, however, this is still not a simple process. Lu was quite forthcoming about the challenges in making a shift in workload managers, although it can be done in a few weeks and with minimal to no interruption to running processes via some tools they’ve built to manage migrations. However, he notes that the top five percent of LSF users tend to have much more challenging environments with custom features; they are unlikely to make a change anyway because of the strong ELAs, says Pang.
Lu says that Teraproc (which is a mashup of tera and processes) also noticed other trends among that second tier of LSF users, including the fact that cloud deployments for this base were gaining traction. They worked to make it possible for users to deploy clusters based on OpenLava’s framework on the Amazon cloud and have two other business angles selling OpenLava-driven R as a service with GPU hooks. Lu and the Teraproc team see HPC cloud as a burgeoning area (just as he did in 2010 when he was banging the drum for supercomputing clouds) and want to be out front, especially for the mid-sized customers that might have felt left behind once they moved from Platform Computing’s small town feel support-wise to having to make calls into the monolithic IBM division that handles Platform LSF, Symphony, and now IBM’s other related services.
Many of these users, especially in EDA and manufacturing, are not comfortable running workload management suites and schedulers without support, said Pang, who has now joined Lu to lead the push toward OpenLava development. “These are mission critical clusters. They need to bring products as fast as possible to the market, and that’s a key driver. Also though, there are new HPC users who need open source schedulers but they are new to this community and are not familiar with configuring and using; this is the base we are targeted toward.”
The goal of Teraproc, then, is not just to offer commercial support around OpenLava, but to help push some of its mystery capital into funding the community around the open source framework. Pricing is the interesting issue, however. The challenging part about understanding LSF pricing is that it has always been per core, but discounted heavily and tied to other major deals. Teraproc is tackling the market with per node licensing, which is the same no matter what one stuffs into the box, whether GPUs or high-end processors, says Lu.
“PBS Pro priced their product at $15 per core. LSF is quite a bit more than that. The list price is close to 10 times more, but this is not the street price, of course,” Pang says. He adds that the other 95 percent of IBM’s batch of lower end users, especially those toward the bottom of that stack, don’t have a ton of cores, so they do not get many discounts at all, making this a very expensive management framework.
With their hooks for LSF users and 5,000 downloads of the binary version of OpenLava through its early stage site (not all of those will use it in production, obviously, but that’s some decent tire-kicking), Lu and Pang say Teraproc will move past its initial “handful” of early users this year and keep pushing the OpenLava message to HPC and beyond.
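The arithmetic behind per core versus per node licensing can be sketched in a few lines. The $15 per core figure comes from Pang’s quote, and the LSF list figure simply treats his “close to 10 times more” as exactly 10 times, which is an approximation. The per node price is an invented placeholder, since Teraproc’s actual pricing is not given here, and the cluster sizes are hypothetical as well.

```python
def per_core_cost(nodes, cores_per_node, price_per_core):
    """License cost when every core in the cluster is billed."""
    return nodes * cores_per_node * price_per_core


def per_node_cost(nodes, price_per_node):
    """License cost when only the node count is billed."""
    return nodes * price_per_node


PBS_PRO_PER_CORE = 15.0    # from Pang's quote
LSF_LIST_PER_CORE = 150.0  # assumption: "close to 10 times" taken as exactly 10x
PLACEHOLDER_PER_NODE = 500.0  # invented for illustration, NOT Teraproc's price

nodes = 64  # hypothetical mid-sized cluster
for cores_per_node in (16, 32):
    pbs = per_core_cost(nodes, cores_per_node, PBS_PRO_PER_CORE)
    lsf = per_core_cost(nodes, cores_per_node, LSF_LIST_PER_CORE)
    flat = per_node_cost(nodes, PLACEHOLDER_PER_NODE)
    print(f"{cores_per_node} cores/node: per-core at $15 = {pbs:,.0f}, "
          f"per-core at 10x list = {lsf:,.0f}, per-node placeholder = {flat:,.0f}")
```

Whatever the real numbers turn out to be, the shape is the point: doubling the cores in each box doubles a per core bill, while a per node bill stays flat.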