(Sponsored Content) HPC workloads are rapidly moving to the cloud. Market sizing from HPC analyst firm Hyperion Research shows a dramatic 60 percent rise in cloud spending, from just under $2.5 billion in 2018 to approximately $4 billion in 2019, and projects that HPC cloud revenue will reach $7.4 billion in 2023, a 24.6 percent compound annual growth rate.
While leading cloud providers offer similar services and fee structures, the risk of lock-in is real. HPC users fear losing control over their fastest-growing infrastructure budget line item. A few simple strategies can help organizations stay nimble and avoid cloud lock-in.
Use containers along with custom machine instances. To provide portability between on-premises environments and clouds, users commonly create VMs that encapsulate HPC applications. Standards such as VMware’s VMDK, Microsoft’s VHD, and the Open Virtualization Format (OVF) have made it easy to package and ship VMs to your favorite cloud, where they can be imported as managed images such as AMIs. While this is a big improvement over installing software directly on cloud instances, VMs can be unwieldy, and procedures for importing images vary by cloud provider. An alternative is to create a smaller set of virtualized base images containing only essential components such as the OS, libraries, and a preferred container runtime such as Singularity or Docker. Application-specific containers can then be pulled from a container registry, allowing the same machine image to be used for multiple applications. This will help you stay portable across clouds and on-premises environments, deploy environments faster, and significantly reduce the work involved in preparing and maintaining machine images.
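As a rough illustration of the "generic base image plus application containers" idea, the sketch below builds the command line that would launch an application container with either runtime. The function name, the registry address, and the solver arguments are hypothetical placeholders, not part of any product; only the basic `docker run` and `singularity exec` invocations are standard.

```python
import shlex


def container_command(runtime, image, app_args):
    """Build the command that runs one application container on a generic base image.

    runtime  -- "docker" or "singularity" (whichever the base image ships with)
    image    -- registry reference of the application container (hypothetical here)
    app_args -- the application command and its arguments, as a list of strings
    """
    if runtime == "docker":
        # --rm removes the container once the job finishes
        return ["docker", "run", "--rm", image, *app_args]
    if runtime == "singularity":
        # Singularity can pull from a Docker-compatible registry via a docker:// URI
        return ["singularity", "exec", f"docker://{image}", *app_args]
    raise ValueError(f"unsupported runtime: {runtime}")


if __name__ == "__main__":
    # Hypothetical image and arguments, for illustration only
    cmd = container_command(
        "singularity",
        "registry.example.com/hpc/solver:1.0",
        ["solver", "--input", "case.dat"],
    )
    print(shlex.join(cmd))
```

Because the application lives in the registry rather than in the machine image, the same base image can serve every application, and only the image reference changes from job to job.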
Stay as “down stack” as possible. While most HPC users tend to consume cloud services at the IaaS level, cloud providers offer increasingly impressive PaaS and SaaS offerings. For example, a customer deploying a machine learning environment may be tempted to turn to offerings such as Amazon SageMaker, Microsoft Azure Machine Learning Studio or Google Cloud AutoML. HPC users may be similarly tempted to look to cloud-specific batch services, elastic file systems, native container services, or functions. While these cloud services are capable and convenient, there is a price to be paid in terms of portability. Users can easily find themselves locked into cloud-specific software ecosystems. Also, when PaaS or SaaS offerings are deployed, each service generally consumes separate IaaS infrastructure, so even value-added services that are “free” (meaning that users pay only for infrastructure) tend to drive up costs. An alternative approach is to deploy containerized services on IaaS offerings where each instance can run multiple software components. This will take a little more effort, but it will help you move applications more easily between on-prem and cloud environments, reduce costs, and ensure that you can repatriate workloads back in-house should the need arise.
Beware of data gravity. In HPC applications, data is the elephant in the room. HPC operators constantly struggle with whether it is better to “bring the compute to the data” or “bring the data to the compute.” The issues are complex and depend on factors such as where the data originates, the cost and time required for transmission, short-term and long-term storage costs, and the type and level of access required (file, object, archival, etc.). If you’re storing significant amounts of data in the cloud, keep data egress costs in mind, and look for solutions that can help automate data movement. You’ll want to be able to automatically tear down storage services that are no longer needed, migrate data to lower-cost storage tiers, or automatically retrieve data to on-prem storage – especially in hybrid or multi-cloud environments.
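To make the egress question concrete, here is a minimal back-of-the-envelope sketch. The function name and every number in the example are assumptions chosen for illustration; actual egress prices and achievable throughput vary by provider, region, and contract, so substitute your own figures.

```python
def egress_estimate(data_tb, price_per_gb, link_gbps, efficiency=0.8):
    """Estimate the cost and time of moving data out of a cloud.

    data_tb      -- amount of data in terabytes (decimal, 1 TB = 1000 GB)
    price_per_gb -- egress price in dollars per GB (assumption, check your provider)
    link_gbps    -- nominal network bandwidth in gigabits per second
    efficiency   -- fraction of nominal bandwidth actually achieved (assumption)

    Returns a (cost_in_dollars, transfer_hours) tuple.
    """
    data_gb = data_tb * 1000
    cost = data_gb * price_per_gb
    effective_gbps = link_gbps * efficiency
    seconds = data_gb * 8 / effective_gbps  # 8 bits per byte
    return cost, seconds / 3600


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB at $0.05/GB over a 10 Gbps link
    cost, hours = egress_estimate(100, 0.05, 10)
    print(f"egress cost ${cost:,.0f}, transfer time {hours:.1f} hours")
```

Even a crude estimate like this helps decide whether a data set should be retrieved, moved to a cheaper tier, or left where it is and processed in place.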
Have a fall-back plan. Cloud computing offers many advantages for HPC users – instant access to state-of-the-art infrastructure, the ability to burst capacity as needed, and the ability to scale capacity rapidly for faster results on a variable-cost infrastructure. These capabilities come at a price, however. Seasoned HPC professionals routinely tell us that running HPC applications in the cloud on a sustained basis can be several times more expensive than operating on-premises HPC clusters – especially when cloud infrastructure isn’t well managed. This is why many HPC users tend to run hybrid clouds or deploy workloads to the cloud selectively. As you build bridges to one or more clouds, make sure that the bridge is bi-directional. Being able to fall back if necessary and run workloads in-house when capacity is available is perhaps even more important than staying portable across clouds.
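One way to picture a bi-directional bridge is as a placement rule that prefers in-house capacity and bursts to the cloud only when needed. The sketch below is a deliberately simple, hypothetical policy; the function name, inputs, and thresholds are assumptions, and a real workload manager would weigh many more factors such as data location, licenses, and deadlines.

```python
def choose_target(cores_needed, onprem_free_cores, est_cloud_cost, cloud_budget_left):
    """Pick where a job should run under a simple hybrid policy (illustrative only).

    Prefer on-prem whenever enough cores are free, burst to the cloud only
    if the estimated cost fits the remaining budget, otherwise wait in the queue.
    """
    if cores_needed <= onprem_free_cores:
        return "on-prem"
    if est_cloud_cost <= cloud_budget_left:
        return "cloud"
    return "queue"


if __name__ == "__main__":
    # Hypothetical jobs: (cores needed, estimated cloud cost in dollars)
    jobs = [(64, 40.0), (512, 900.0), (2048, 6000.0)]
    for cores, cost in jobs:
        target = choose_target(cores, onprem_free_cores=256,
                               est_cloud_cost=cost, cloud_budget_left=1000.0)
        print(f"{cores} cores -> {target}")
```

The point of the sketch is the ordering of the checks: workloads return in-house automatically as soon as local capacity frees up, so the cloud remains an option and not a dependency.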
Regardless of how you deploy high performance applications today, chances are good that more cloud computing is in your future. Following the four strategies above can help organizations ensure a smoother transition to the cloud, maintain flexibility, and avoid the risk of lock-in and cost surprises.
Rob Lalonde is vice president and general manager of Univa.