If one had to make a guess about which of the big cloud infrastructure providers is sitting on eight million servers globally with a potential 1,400 petaflops of compute capacity, the list of companies is relatively short.
That is, at least, if the answer assumes that the hardware must be owned and maintained by a single entity providing monolithic services, as Amazon, Google, and Microsoft do. There are, however, other ways of navigating the IaaS road, and strategic hardware and software partnerships lie at every turn.
The infrastructure provider in question here is Rescale, which has been steadily adding resources to its pool since it emerged out of Silicon Valley in 2011 with an undisclosed amount of seed money. Some big names were writing the checks back then, including Richard Branson, and two more funding rounds followed, including one in late 2014. Even though its resources are scattered across the globe, in true cloud fashion the location of the hardware is mostly irrelevant to the business. The real magic is woven into the software: both the software that manages access to and use of those resources and, more subtly, the way application software is licensed in a manner that suits both users and ISVs.
The key to all of this is, in many ways, more political than technical. The middleware, metering, and monitoring are nothing to overlook, certainly. But the barrier to running supercomputing applications in the cloud has much more to do with aligning ISVs and resources, so the strategic story is the one worth telling.
As one might imagine, even though the potential capacity makes for an interesting press point, actually spinning all of it up at once to form a massive distributed supercomputer is unlikely, and not just because of the hefty logistics and cost. Those resources are spread across 30 global datacenters, including supercomputing centers as well as various public and private clouds, all of which are busy chewing through their own workloads. In a conversation with The Next Platform, Rescale CEO Joris Poort said that he is unable to share which supercomputing sites and infrastructure centers are at the heart of Rescale. He did say that these relationships are the result of a great deal of work on both sides, presumably with the supercomputing site or cloud provider benefiting financially by selling its unused capacity.
On that note, it has always stood out that one of Rescale's key investors, back when the company first got started with aerospace and defense end users, was Jeff Bezos. It is not clear whether some of the general infrastructure behind Rescale comes from spare or spot capacity inside AWS datacenters, or whether there is a strong relationship with Amazon. Either way, Poort says that each of the datacenters brings something different to the table. Some provide resources that exist solely because of their geographic importance to a certain set of engineering end users. Others are there because they are outfitted with GPU accelerators, fat memory, low-latency InfiniBand networking, and other more specialized hardware.
While the Rescale hardware story is an interesting one, to be fair, anyone with enough spare funds and good relationships with large-scale infrastructure providers could potentially wrangle enough servers to create a hardware cloud. Where Rescale is most interesting is on the software front, particularly in terms of how it has managed to wrap in as many engineering and simulation software providers as possible and offer them a way to match their business model to an on-demand or cloud-based environment.
As noted not long ago, when ANSYS announced a new way for larger enterprise users to run its simulation software on Amazon hardware, the challenge with HPC in the cloud is far less about securing the right infrastructure. This is especially true now that some providers, including AWS, are offering 10GbE and high-end Xeons (as well as GPUs) on the public cloud. It boils down to software licensing. That is a much less sexy topic when it comes to cloud-based supercomputing, but it is in fact the issue that keeps larger-scale simulations churning on on-premises hardware.
Growing use cases for cloud have meant big headaches for purveyors of HPC software packages, which have historically been incredibly expensive and are still often managed with physical dongles. Rescale recognized this. It also saw increased demand from users who wanted to run their applications without their own clusters, or to test and develop in an HPC environment. In response, it started to cultivate a base of big-name ISVs with HPC packages.
“For users, the real benefit is being able to get on-demand or hourly licensing—that removes a lot of constraints on what can be accomplished. For instance, at the most basic level, there is an engineering software package that a user wants to bring their license to—that is simple through our portal. But at the highest level, with something like Siemens PLM, we have that entire suite of simulation tools for purchase on demand or hourly directly through our interface. There’s an hourly price, although there is a premium because it is hourly versus annually, but it is right there and available immediately,” Poort explained.
“For ISVs, this is interesting because it is an opportunity for a new revenue stream. In the case of Siemens, the specific use cases they are targeting involve users running thousands of models in parallel at the same time. Nobody is going to buy that kind of licensing, so they’ve come up with a more flexible licensing model that is now open through this and other licensing mechanisms we are deploying.”
Poort says that other major engineering simulation vendors Rescale has partnered with, including LS-DYNA, CD-ADAPCO, and others, are also offering hourly licenses through the interface. These vendors can also expand their use cases by being more flexible with their pricing, even if that means charging a premium on those few hours. The full annual license can be prohibitively expensive for small engineering shops, or for a short-term need that would otherwise require a hefty license purchase to cover a spotty requirement.
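To make the licensing arithmetic concrete, here is a minimal Python sketch of the comparison a buyer might run. It assumes a simplified, hypothetical model: an annual license must cover every simultaneous job (one seat per concurrent run), while an hourly license bills only for the license-hours actually consumed. The function names and all prices are illustrative assumptions, not figures from Rescale, Siemens, or any other vendor.

```python
def annual_license_cost(peak_concurrent_jobs: int, annual_price_per_seat: float) -> float:
    """Cost of buying enough annual seats to cover the busiest moment.

    Hypothetical model: one annual seat per simultaneous job.
    """
    return peak_concurrent_jobs * annual_price_per_seat


def hourly_license_cost(license_hours: float, hourly_price: float) -> float:
    """Cost of paying only for the license-hours actually consumed."""
    return license_hours * hourly_price


def break_even_hours(annual_price_per_seat: float, hourly_price: float) -> float:
    """Hours of use per seat per year at which hourly and annual cost the same.

    Below this many hours, the premium-priced hourly license is still cheaper.
    """
    if hourly_price <= 0:
        raise ValueError("hourly_price must be positive")
    return annual_price_per_seat / hourly_price


if __name__ == "__main__":
    # Hypothetical burst: 1,000 models run in parallel for 5 hours each.
    jobs, hours_each = 1_000, 5
    seat_price, hourly_price = 20_000.0, 40.0  # illustrative prices only
    print("annual:", annual_license_cost(jobs, seat_price))
    print("hourly:", hourly_license_cost(jobs * hours_each, hourly_price))
    print("break-even hours per seat:", break_even_hours(seat_price, hourly_price))
```

With these made-up numbers, the parallel burst costs 100 times less on hourly terms. A single seat kept busy for more than 500 hours a year would still favor the annual license. Real license prices are not disclosed here, so the ratio illustrates only the shape of the trade-off, not actual savings.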
ISVs are not the only ones who stand to benefit from reaching users via the cloud, Poort says. Because Rescale provides a range of hardware flavors, users can test how their models perform on Nvidia Tesla GPUs and Intel Xeon Phi coprocessors, paying only the hourly license and hardware costs. That lets them evaluate what they might purchase for future on-premises systems. “While we do offer a recommended hardware configuration for different applications, it’s possible to try out different configurations, say for instance, running the job on Haswell versus Ivy Bridge, with or without InfiniBand, with the high memory options, or other specialty configurations. These are options for expert users but we do not want to place a lot of restrictions, especially since there is a lot of variation in terms of what HPC applications need.”
HPC has apparently come a long way in the cloud, something that many had been expecting and others have been skeptical about.
“When we first launched over four years ago, users were hesitant to try anything even related to cloud. Certainly HPC workloads had a bad reputation on top of that, but that conversation has changed a lot—both from a performance and security standpoint. On the performance side, the way virtualization works and the way we’ve been able to tap into supercomputing centers has allowed us to reach similar performance to on-premises.”
Poort also explains that security has evolved over the years. More importantly, the ability to choose among geographic regions has enhanced users' sense of security and helps them comply with national laws that require some work to stay within borders. He noted that the automotive market in Japan has become a hot customer set for Rescale. Those customers tend to keep most work inside Japan unless one of their engineering offices in the U.S. or Europe requires otherwise.
“We’ve seen a transition from users looking at Rescale as an overflow capability to them looking at this as a more permanent solution. A lot of that is because our pricing has also changed for these specialty jobs that would be hugely expensive in both hardware and software to run. We built ScaleX Enterprise for these users who want longer term capacity—these users have done the math between what they can get if they built and maintained a cluster themselves and are seeing that it can actually be cheaper to use Rescale.”
It’s difficult to turn a statement like that into a price comparison, since the one big factor, the software license cost, remains mostly unknown, as does the scale. The hardware pricing is on par with what AWS offers for general instances, though not for the specialty GPU and Xeon Phi options. Even so, the big cost lies in the size of the model the customer wants to run and what the licenses add to the overall price. Again, the benefit of using any of the infrastructure providers is that models can scale to new heights of complexity without the customer permanently adding nodes (and thus licenses), then spin back down when the work is done. This is a compelling model, but pesky software licenses have held things back, particularly in the HPC market.
In other words, on both the hardware and software fronts, and for both users and vendors, there is something to be gained, which is at the heart of how politics works. Poort and the Rescale team plan to keep campaigning to add more software options as HPC simulation software users move through their infrastructure election cycles.