Microsoft, Stanford Researchers Tweak Cloud Economics Framework
March 2, 2017 Jeffrey Burt
Cloud computing makes a lot of sense for a rapidly growing number of larger enterprises and other organizations, and for any number of reasons. The increased application flexibility and agility engendered by creating a pool of shared infrastructure resources, the scalability and the cost efficiencies, are all key drivers in an era of ever-embiggening data.
With public and hybrid cloud environments, companies can offload the integration, deployment and management of the infrastructure to a third party, taking the pressure off their own IT staffs, and in private and hybrid cloud environments, they can keep their most business-critical data securely behind their own firewalls while still taking advantage of a cloud infrastructure.
For both public and private clouds, a challenge has been developing an economical and efficient way to allocate all that infrastructure. Cloud providers and operators have done a good job over time figuring out how to make these shared infrastructures available to multiple tenants and to secure the workloads. That work runs from enabling groups to share the resources, to using virtualization and container technologies to secure them, to creating ways to schedule tasks on the infrastructure over time. The scheduling goals themselves range from meeting service level objectives (SLOs) to making maximum use of the cloud clusters.
However, the allocation of the cloud resources has continued to be an issue. Organizations are still struggling to find the most economically efficient and fair ways to allocate all those costly systems and to resolve conflicts in resource demand from the multiple tenants that are sharing the infrastructure. A group of researchers primarily from Microsoft – with one from Stanford University – are saying they’ve developed technology that addresses the challenge by creating a pricing model that ensures infrastructure sharing and addresses scheduling issues for even high-value workloads. The group’s Economic Resource Allocation (ERA) framework is essentially based on the idea of reservations, similar to those people use every day to book hotels or find their way into busy restaurants, or in the supercomputer world, to schedule time on the clustered systems for their applications.
A key part of ERA is that it allows for flexibility in both time and pricing.
“We focus on the economic challenges of scheduling and pricing batch style computations with completion-time SLOs (deadlines) on a shared cloud infrastructure,” the researchers wrote in a paper titled ERA: A Framework for Economic Resource Allocation for the Cloud. “The basic model presented to the user is that of resource reservation.”
There are resource allocation tools that are currently used in both public and private clouds, but they have their limitations. In private clouds, these are in the form of fixed pre-paid guaranteed quotas, a technique that the Microsoft researchers argue forces the organizations to over-provision their clouds. They have to guarantee that every user has the resources available that they’ve paid for upfront, so the cloud must have sufficient resources to ensure the capacity that has been promised, though not all of that capacity will be used at the same time.
The lump-sum payment also means that “the users’ marginal cost for using their guaranteed resources is essentially zero, and so they will tend to use their capacity for ‘non-useful’ jobs whenever they do not fill their capacity with ‘useful’ jobs,” they wrote. “This often results in cloud systems that seem to be operating at full capacity from every engineering point of view, but are really working at very ‘low capacity’ from an economics point of view, as most of the time, most of the jobs have very low value.”
In public clouds, the primary technique is on-demand unit prices – usually fixed prices where users are charged per unit of resource used. The problem is that demand can be spiky – short bursts of peak demand followed by periods of low use – and that means cloud providers are forced to decide whether to spend the money to over-provision their environments to guarantee resources at all times or to give up on the idea of being able to guarantee capacity at peak times. As a result, users can’t risk having their high-value workloads unable to run when needed, so on-demand pricing is well suited only to low-value or time-flexible tasks. Companies have to buy long-term guaranteed access for their high-value production jobs.
Both fall short of meeting what the Microsoft researchers said is the key goal of the cloud: “to maximize the economic efficiency, that is, to maximize the total value that all users get from the system. … The optimal-allocation benchmark for a given cloud is that of an omniscient scheduler who has access to the complete information of all cloud users – including their internal costs and alternative options – and decides what resources to allocate to whom in a way that maximizes the efficiency goals of the owner of the cloud.” Efficiency is measured by the value obtained rather than the resources that are used.
With ERA, users put in a request at the time of reservation, essentially outlining their needs – how many containers with how much capacity and how many cores for each, for how long a time and in what time range (such as sometime between 6 a.m. and 6 p.m. on a particular day) and how much they’re willing to pay for all this. The flexible pricing is key: it drives greater flexibility and establishes the lowest price available for ensuring the system can support the request. The more flexible the user is willing to be, the better the chances they’ll get a good price, and the price is fixed at the time the reservation is accepted and the resources guaranteed. If the price request is too low for the environment, the request is denied. “The guarantee is to satisfy the request rather than provide a promise of specific resources at specific times,” they wrote.
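To make the reservation idea concrete, here is a minimal Python sketch of what such a request and an accept-or-deny decision might look like. It is an illustration only, not the ERA interface: the `ReservationRequest` fields, the `cheapest_quote` and `decide` functions, and the per-core hourly price list are all hypothetical names and assumptions, and the paper's actual request format and pricing rules may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ReservationRequest:
    """Hypothetical request shape; field names are assumptions, not ERA's API."""
    containers: int           # number of containers requested
    cores_per_container: int  # cores in each container
    duration_hours: int       # how long the job needs to run
    window_start: int         # earliest acceptable start hour (inclusive)
    window_end: int           # hour by which the job must finish (exclusive)
    max_price: float          # the most the user is willing to pay in total


def cheapest_quote(
    req: ReservationRequest, hourly_core_price: Sequence[float]
) -> Optional[Tuple[int, float]]:
    """Return (start_hour, total_price) for the cheapest placement, or None."""
    if req.window_end > len(hourly_core_price):
        raise ValueError("price list does not cover the requested window")
    cores = req.containers * req.cores_per_container
    best = None
    last_start = req.window_end - req.duration_hours
    for start in range(req.window_start, last_start + 1):
        hours = hourly_core_price[start:start + req.duration_hours]
        cost = cores * sum(hours)
        if best is None or cost < best[1]:
            best = (start, cost)
    return best  # None when the job cannot fit inside the window


def decide(
    req: ReservationRequest, hourly_core_price: Sequence[float]
) -> Optional[Tuple[int, float]]:
    """Accept at the cheapest quote, or deny (None) if it exceeds max_price."""
    quote = cheapest_quote(req, hourly_core_price)
    if quote is None or quote[1] > req.max_price:
        return None
    return quote


# Assumed per-core prices for 24 hours: cheap early morning, pricey midday.
prices = [2.0] * 6 + [1.0] * 3 + [3.0] * 9 + [2.0] * 6
flexible = ReservationRequest(2, 4, 3, window_start=6, window_end=18, max_price=30.0)
rigid = ReservationRequest(2, 4, 3, window_start=9, window_end=12, max_price=30.0)
print(decide(flexible, prices))  # (6, 24.0): the wide window reaches cheap hours
print(decide(rigid, prices))     # None: the only placement costs 72.0
```

The two sample requests differ only in their time window, which is the point of the sketch: the flexible request is accepted at a low price, while the rigid one is denied at the same budget. The start hour returned here is only a tentative placement used to compute the price; as the researchers note, the guarantee is to satisfy the request, not to promise specific resources at specific times.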
The ERA system sits between the user and the cloud infrastructure. It includes the ERA Cloud interface, which captures the key details of the resource management infrastructure and handles the allocation of those resources independently of the infrastructure. The researchers noted that it is essentially a complete software system in which users can experiment with the algorithms to drive innovation around it. ERA receives online reservation requests from multiple users, accepts some of them, and then works with the job scheduler to ensure that the accepted reservations get the promised resources. The software tells the cloud how to allocate the resources to the jobs; the cloud should follow those instructions and, ideally, report back on such issues as available capacity.
There are two outward-facing APIs: one that provides users with the reservation model, and another that addresses the low-level cloud resource manager. The second lets the cloud system step away from time-dependent scheduling and pricing concerns (ERA takes on those tasks) while freeing the ERA software from lower-level jobs such as swapping out processors or assigning specific processors to particular workloads. An internal API deals with the algorithmic scheduling, pricing and prediction modules. The scheduling and pricing algorithm dynamically computes future resource prices based on supply and demand, where demand includes both resources already committed to reserved workloads and predicted future requests. It also determines the price for each current request, looking for the least expensive option. “Our basic prediction model uses traces of previous runs to estimate future demand,” the authors wrote. “The flexible algorithmic API then allows for future algorithmic, learning, and economic optimizations.”
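The general shape of such supply-and-demand pricing can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the paper's algorithm: the per-slot averaging in `predict_demand`, the quadratic surge rule in `slot_prices`, and the `base_price` and `surge` parameters are all invented here for the example.

```python
from typing import List, Sequence


def predict_demand(traces: Sequence[Sequence[float]]) -> List[float]:
    """Estimate demand per time slot as the mean over traces of previous runs.

    Assumption: every trace covers the same slots in the same order.
    """
    runs = len(traces)
    return [sum(slot) / runs for slot in zip(*traces)]


def slot_prices(
    capacity: float,
    committed: Sequence[float],
    predicted: Sequence[float],
    base_price: float = 1.0,
    surge: float = 4.0,
) -> List[float]:
    """Price each slot higher as committed plus predicted load nears capacity.

    The quadratic surge rule is a made-up example, not ERA's pricing formula.
    """
    prices = []
    for used, expected in zip(committed, predicted):
        load = min((used + expected) / capacity, 1.0)  # cap at a full cluster
        prices.append(base_price * (1.0 + surge * load ** 2))
    return prices


# Two earlier runs, each with demand (in cores) for two time slots.
traces = [[10.0, 40.0], [30.0, 60.0]]
predicted = predict_demand(traces)             # [20.0, 50.0]
committed = [0.0, 50.0]                        # cores already reserved per slot
print(slot_prices(100.0, committed, predicted))
```

In this toy setup the second slot is fully spoken for once committed and predicted demand are added together, so it is priced at five times the base rate, while the lightly loaded first slot stays close to the base price. Any real system would need a calibrated pricing curve and a better forecaster; the sketch only shows how committed reservations and predictions can feed one price signal.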
The researchers ran ERA through multiple simulations and proofs of concept using Hadoop/YARN (also known as MapReduce 2.0) and Microsoft’s Azure Batch simulator. Details of ERA and the testing can be found in the group’s report, but the researchers said the results were good. In one comparison, a “greedy” algorithm that focuses on fixed pricing per unit rather than the value of the jobs themselves allocated most of a cluster to large, low-value jobs and captured only 10 percent of the total value of the jobs requested. ERA’s Basic-Econ 15 algorithm hit 51 percent of the requested value. They also were able to fully integrate ERA into Rayon, a cloud system that is part of Hadoop/YARN and that manages compute resource reservations. In addition, using the Azure Batch simulator, the researchers found that ERA’s algorithm performed better than other algorithms in multiple measures.
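The efficiency figures quoted above measure the share of total requested value that actually gets served. A toy Python sketch shows why a value-blind allocator can score poorly on that measure. The job list and numbers are invented for illustration and are not the paper's data, and the value-per-core ranking is a simple stand-in heuristic, not the Basic-Econ algorithm.

```python
from typing import List, Tuple

Job = Tuple[str, int, float]  # (name, cores needed, value to the user)


def first_fit(jobs: List[Job], capacity: int) -> List[Job]:
    """Value-blind: accept jobs in the given order whenever they still fit."""
    accepted, free = [], capacity
    for job in jobs:
        if job[1] <= free:
            accepted.append(job)
            free -= job[1]
    return accepted


def value_density_fit(jobs: List[Job], capacity: int) -> List[Job]:
    """Stand-in heuristic: consider jobs by value per core, highest first."""
    ranked = sorted(jobs, key=lambda job: job[2] / job[1], reverse=True)
    return first_fit(ranked, capacity)


def efficiency(accepted: List[Job], requested: List[Job]) -> float:
    """Share of the total requested value that was actually served."""
    return sum(j[2] for j in accepted) / sum(j[2] for j in requested)


# Invented workload: one large low-value job arrives ahead of two small
# high-value jobs on a 100-core cluster.
jobs = [("bulk", 80, 20.0), ("report", 30, 30.0), ("billing", 30, 50.0)]
print(efficiency(first_fit(jobs, 100), jobs))          # 0.2
print(efficiency(value_density_fit(jobs, 100), jobs))  # 0.8
```

Both allocators keep the cluster busy, but the value-blind one fills it with the large low-value job and turns the valuable ones away, which is the pattern the researchers describe for the greedy baseline.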
The authors noted that there is more work to be done with ERA in such areas as scheduling (such as enabling a tradeoff between the time to run a workload and the resources used), pricing (giving the user more flexibility in setting preferences, such as outlining a “preferred” deadline and a “latest” deadline), learning (given that available resources and demand are always growing, deciding whether to bundle the two forecasts or separate them), and robustness (what kinds of extreme-case guarantees can be given). Still, the initial results from the ERA tests were positive, which should be good news for cloud providers – and private cloud builders – that are looking for tools for getting the most economic value out of their infrastructure and for users seeking the most efficient way to run their workloads.