HPC and the cloud have an uneasy, lukewarm relationship. Some corporations running HPC environments take the view that they already have the infrastructure and software capabilities they need to run their own, often massive, workloads, and that taking on the networking costs, security concerns, and management hassles of running applications and keeping data in the cloud doesn’t make sense. There is growing adoption of the cloud in HPC, but it pales in comparison to the aggressive push by many enterprises to move more of their workloads to Amazon Web Services (AWS), Microsoft Azure, Google Cloud, or any of the other providers.
There are those who see a role for the cloud in the HPC space. Pratt & Whitney, the giant manufacturer that designs, builds, and services jet engines for military and commercial aircraft and is part of the conglomerate United Technologies, has a significant HPC operation to run many of the complex workloads and algorithms its business requires. However, HPC is changing, and those changes are creating their share of challenges. Pratt & Whitney has been turning to the cloud, and AWS in particular, to help address those challenges, according to Pete Bradley, a Fellow with the company’s high-performance computing and modeling business.
The company has about 1,000 unique users every month on an HPC operation that runs on tens of thousands of processor cores, Bradley said. One example of that HPC work is Pratt & Whitney’s PurePower engine, which took about 20 years and $10 billion to create. The resulting PurePower PW1000G engine offers significant improvements in fuel efficiency, noise reduction, and emission reduction. Much of that work involves computational fluid dynamics (CFD) to drive maximum efficiency through the engine’s compressor, turbine, and combustor.
Pratt & Whitney is now working to adapt to what Bradley calls HPC+, an environment with more automated workflows, more simulation, and more data crunching. It spans not only traditional systems but also big data analytics, the Internet of Things (IoT), 3D inspections, and open-source technologies like MapReduce.
“We’re seeing HPC change a lot over the past few years,” Bradley said during a talk at the recent SC18 supercomputing conference in Dallas. “First of all, the original cadence for high-performance computing was really around the user. A particular user might have an airflow design or something like that and so they would take their airflow design, mash it up, they would run it through their computational fluid dynamics software in batch, get their results back, analyze them, make some tweaks, and go again. There’s still some of that work, but a lot of things are becoming much more automated now. Where before we had a lot of users in the loop, when the computers got faster, the users became the bottleneck, so the intersection between high-performance computing and statistical methods is really exciting and has allowed us to drive explorations of the design space to where now we define the rules and a model will basically explore a number of different cases to get to the optimum.”
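The shift Bradley describes, from a user tweaking one design at a time to a model exploring many cases under predefined rules, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the parameter names, the bounds, and the `simulate_efficiency` function, which is a made-up stand-in for a batch CFD run. Plain random sampling is used here only because it is the simplest option; the article does not say which statistical methods Pratt & Whitney actually uses.

```python
import random


def simulate_efficiency(blade_angle_deg, chord_mm):
    """Hypothetical stand-in for a batch CFD run.

    Returns a made-up efficiency score that peaks at an arbitrary design
    point; a real workflow would submit a solver job here instead.
    """
    return (1.0
            - ((blade_angle_deg - 32.0) / 40.0) ** 2
            - ((chord_mm - 55.0) / 100.0) ** 2)


def explore_design_space(bounds, n_cases, seed=0):
    """Sample n_cases designs inside the bounds and keep the best one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best_case, best_score = None, float("-inf")
    for _ in range(n_cases):
        # The "rules": each parameter must stay inside its allowed range.
        case = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        score = simulate_efficiency(**case)
        if score > best_score:
            best_case, best_score = case, score
    return best_case, best_score


# Illustrative bounds only; not taken from any real engine program.
BOUNDS = {"blade_angle_deg": (20.0, 45.0), "chord_mm": (40.0, 80.0)}

best_case, best_score = explore_design_space(BOUNDS, n_cases=1000)
print(best_case, best_score)
```

No user sits in the loop between cases, which is the point of the quote: once the rules are written down, throughput is limited by compute rather than by how quickly a person can review results and resubmit.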
The company simulates its parts hundreds of thousands of times before it begins to bend any metal. The cadence has become increasingly real-time, leveraging data from places as diverse as the shop floor and sensors on the engines to drive real-time knowledge back to the business. That changes operations, he said. Previously, the company could schedule eight hours of downtime for its systems; now they have to run all the time. In addition, HPC+ environments mix HPC with other systems. For example, the company can analyze huge amounts of data immediately or put the data in a data lake and analyze it later. This is where the cloud can help.
Pratt & Whitney uses the AWS cloud for multiple purposes, including capacity management. The company has a well-established process to manage priorities, but sometimes there is an immediate need for more compute power. The cloud also helps accelerate innovation.
“The major cloud providers are not just providing horsepower,” Bradley said. “They’re providing major software platforms on top of that, things like serverless computing and database-as-a-service and all kinds of interesting Legos and building blocks for you to build applications. When you think about workflows and these complex workflows that we’re putting together now to really understand our design space, there may actually be some opportunities to do that.”
Disaster resiliency is also important, and here the cloud offers not only compute elasticity but also geographic distribution.
At the same time, companies need to understand that there are holes in the myths around the advantages of the cloud. Public clouds are touted as a less expensive alternative to spending money upfront on infrastructure hardware, but there are additional costs associated with the providers, Bradley said. The providers employ people to write the various building blocks of their services, and the cost of that staff is built into the price. And at the end of the day, the cloud providers are in it to make money, and they’re making a lot of it.
“You’ve got to think about things that having cloud in your HPC portfolio allows you to do that you couldn’t do any other way. Where do those building blocks fit in? Where can you go faster? Where can you innovate faster? Another area is, where are the places that are just difficult from an investment point of view?” he said, noting that he was presented with a project that would need a petabyte of space for four to six months. “Do I want to go out and buy a petabyte of space for that project? It might sit on shelf after that, so that might be an over-investment. Another thing are machines that are infrequent and difficult to use. For example, if you don’t have a lot of GPU-based applications, but you need a couple of days a month.”
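The petabyte question comes down to simple arithmetic, sketched below in Python. Only the capacity and the project length come from Bradley’s example; both prices are hypothetical placeholders chosen to make the sums easy to follow, not real quotes from any vendor or cloud provider. The sketch also ignores power, administration, data-transfer charges, and any resale value, all of which would shift the result in practice.

```python
def rental_cost(capacity_tb, price_per_tb_month, months):
    """Total cost of renting the capacity for the given number of months."""
    return capacity_tb * price_per_tb_month * months


def break_even_months(purchase_cost, capacity_tb, price_per_tb_month):
    """Months of rental after which buying would have been cheaper."""
    return purchase_cost / (capacity_tb * price_per_tb_month)


CAPACITY_TB = 1000          # one petabyte, as in Bradley's example
PROJECT_MONTHS = 6          # upper end of the four-to-six-month project

# Hypothetical placeholder prices, for illustration only.
PRICE_PER_TB_MONTH = 20.0   # assumed rental price per terabyte per month
PURCHASE_COST = 300_000.0   # assumed upfront cost of buying a petabyte

rent = rental_cost(CAPACITY_TB, PRICE_PER_TB_MONTH, PROJECT_MONTHS)
months_to_break_even = break_even_months(
    PURCHASE_COST, CAPACITY_TB, PRICE_PER_TB_MONTH
)
cheaper = "rent" if rent < PURCHASE_COST else "buy"

print(f"Rent for the project: ${rent:,.0f}")
print(f"Buying pays off after {months_to_break_even:.1f} months of use")
print(f"Cheaper option for this project: {cheaper}")
```

With these placeholder numbers, renting wins because the project ends well before the break-even point. If the same storage would stay busy long after the project finishes, the comparison tips the other way, which is why Bradley frames it as a question of whether the hardware would sit on the shelf afterward.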
Other myths? There’s the idea of unlimited computing power, but cloud providers are no different from other businesses in wanting to get the most from the money they spend. Nobody wants to leave processors idle, so while the providers have a lot of compute power, much of it might be in use by other companies, and you will have to wait. And while organizations are told they can simplify via the cloud, there are a lot of new technologies to learn.
The keys to successfully leveraging the cloud in HPC include challenging everything that’s said about the cloud, performing your own benchmarks, not underestimating the difficulty, and understanding that the cloud isn’t always the answer. “We’ve done some really cool stuff,” Bradley said. “We’re going to continue to do some really cool stuff. But it’s not trivial. It’s definitely something that takes a learning curve.”