Wanted: An Energy-Aware Datacenter Application Scheduler

Around the world, the number and size of datacenters are both growing at a fast pace, and the devices housed in them are consuming more and more power to deliver ever-increasing performance. As a consequence, datacenters are using more and more of the planet’s energy to do our processing and storage as well as to keep us entertained and connected.

When we were talking to Liqid, which makes composable infrastructure fabrics, during the SC21 supercomputing conference about its latest hardware and software, Sumit Puri, the company’s co-founder and chief executive officer, threw the stats at us. They are relevant as we consider Treehouse, an energy-aware datacenter application scheduler being proposed by researchers at Microsoft Research, MIT, Columbia University, the University of Michigan, and the University of Washington, who just published a paper on the idea.

The numbers are striking. There were around 500,000 datacenters around the world in 2012, according to Puri, who is on a mission to make datacenters more efficient by driving up the utilization of CPUs, GPUs, and other ASICs, and by 2020, that number had grown to over 8 million. Depending on who you ask, these datacenters now consume anywhere from 2 percent to 5 percent of global energy, and there are projections that somewhere between 20 percent and 25 percent of the world’s energy will be consumed by datacenters by 2025. (Such stark projections have been made in the past and have not materialized, by the way, because of the relentless innovation in making datacenters and the systems inside of them more energy efficient.)

So just how much energy consumption by datacenters are we talking about? It’s a lot.

The hyperscalers and cloud builders have built ever-more-efficient infrastructure, but their appetite keeps growing. According to data compiled by Statista, these two groups together consumed 93.1 terawatt-hours of juice in 2015, compared to 97.6 terawatt-hours for traditional enterprise datacenters as a group. Power usage by traditional datacenters fell by two-thirds by 2021, to a mere 32.6 terawatt-hours, but power usage by the hyperscalers and cloud builders climbed by 70 percent over that same time to reach 158.2 terawatt-hours. With Moore’s Law running out of juice and devices getting hotter as they drive higher performance, datacenter power consumption can only keep rising.
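For readers who want to check the arithmetic behind "two-thirds" and "70 percent," here is a minimal Python sketch that uses only the four Statista figures quoted above. The variable and function names are ours, chosen purely for illustration.

```python
# Figures quoted above (Statista), in terawatt-hours per year.
traditional_2015, traditional_2021 = 97.6, 32.6
hyperscale_2015, hyperscale_2021 = 93.1, 158.2


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


traditional_change = percent_change(traditional_2015, traditional_2021)
hyperscale_change = percent_change(hyperscale_2015, hyperscale_2021)

# Combined consumption of the two groups in each year.
total_2015 = traditional_2015 + hyperscale_2015
total_2021 = traditional_2021 + hyperscale_2021

print(f"traditional datacenters: {traditional_change:+.1f} percent")
print(f"hyperscalers and clouds: {hyperscale_change:+.1f} percent")
print(f"combined: {total_2015:.1f} TWh in 2015, {total_2021:.1f} TWh in 2021")
```

The changes work out to a drop of roughly 66.6 percent and a rise of roughly 69.9 percent, which matches the rounded figures in the text. Taking the numbers as quoted, the two groups together come to about 190.7 terawatt-hours in 2015 and 190.8 terawatt-hours in 2021, so over this period the consumption shifted from enterprise datacenters to the hyperscalers and cloud builders.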

And that is the impetus behind the Treehouse project, which bills itself as a means to make datacenter software “carbon-aware” in keeping with the common language used among those who (rightly) espouse energy efficiency. We don’t like that term because in many cases, energy supplied to datacenters does not come from burning coal or natural gas, but quite purposefully comes from wind, solar, and hydroelectric generation, and even though no one talks about it, if you plug into the power grid in the United States, Europe, or Japan, you are getting juice that comes from nuclear reactors, too. Yes, we know that energy forms often get converted to carbon equivalents to make comparisons. Nonetheless, no one would argue that energy efficiency is not a big deal for datacenters, and we think “energy-aware” is a better term for what the Treehouse project is proposing.

According to the researchers who are starting the Treehouse project, datacenter application demands for compute, storage, and networking are growing so fast that they are going to outpace expected energy efficiency improvements in the coming decades. We have already done server consolidation using virtualization and containerization, and we have adopted new ways to distribute power to gear in the datacenter and innovative cooling techniques, all to keep datacenter energy usage from exploding more than it has. Now, argue the Treehouse researchers, the time has come to dig into the software and make it aware of the energy it consumes before it eats the watts. Here are the gaps they see:

In essence, the Treehouse effort seeks to treat energy like a first-class datacenter resource, alongside compute, storage, and networking. The way they see it, a number of things have to be developed and agreed upon by the IT community for energy-aware datacenter application scheduling – meaning not only finding the right, open resource for running a particular piece of software, but also being cognizant of its energy consumption while it is running – to be a reality. The first thing is to have a universal mechanism for tracking energy usage by software on all hardware. The second is to have a common interface for expressing the service level agreements (SLAs) for application software so that SLAs and energy efficiency can be traded off against each other. Sometimes, of necessity, a job needs to be run relatively inefficiently because more than anything else, it needs to run now. Every datacenter operator understands this, and every end user demands this. The researchers also say that we need to develop a common, fine-grained, fungible unit of execution called a μfunction to enable more efficient and portable usage of hardware.

The energy consumption of applications is tricky to pin down, say the researchers. There are direct energy costs for running code in user space on a processor with memory, but there are also associated energy costs from the operating system and middleware, storage device access, and network transport of data. They call this combined power consumption profile the energy provenance of an application, and it looks like it will require some AI assistance to get it right.

“Since it is difficult to directly measure the lifecycle energy provenance of individual applications through hardware mechanisms alone, we believe it will be necessary to construct a supervised machine learning model to estimate the energy provenance of the application, given its resource usage,” the Treehouse researchers write. “The input (or features) of the model will be metrics that are easily measured in software, including the network bandwidth (for switches and network interface cards), bytes of storage and storage bandwidth (for memory and persistent storage) and accelerator cycles, as well as the type and topology of hardware the application runs on.”
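To make the idea concrete, here is a minimal sketch of what such a model could look like, assuming the simplest possible supervised learner: an ordinary least-squares linear regression fitted in plain Python. This is our illustration, not the researchers' model. The three features, the per-unit energy costs in `TRUE_COSTS`, and the synthetic training data are all hypothetical, and a real model would also have to account for hardware type and topology.

```python
import random

# Hypothetical per-unit energy costs (joules), used only to synthesize data.
TRUE_COSTS = {"network_gb": 4.0, "storage_gb": 1.5, "accel_gcycles": 0.8}


def synthesize(n, seed=0):
    """Generate n fake (resource usage, measured energy) training samples."""
    rng = random.Random(seed)
    rows, energy = [], []
    for _ in range(n):
        usage = [rng.uniform(0, 10) for _ in TRUE_COSTS]
        joules = sum(c * u for c, u in zip(TRUE_COSTS.values(), usage))
        rows.append(usage)
        energy.append(joules + rng.gauss(0, 0.1))  # add measurement noise
    return rows, energy


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit(rows, energy):
    """Least-squares fit via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, energy)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, usage):
    """Estimate energy (joules) for one application's resource usage."""
    return sum(w * u for w, u in zip(weights, usage))


rows, energy = synthesize(200)
weights = fit(rows, energy)
for name, w in zip(TRUE_COSTS, weights):
    print(f"{name}: learned {w:.2f} J per unit")
```

The fitted weights should land close to the made-up costs used to generate the data, which is all this toy can demonstrate. The hard part in practice is collecting trustworthy energy labels to train against in the first place.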

Integrating SLAs into the datacenter-wide application scheduling stack is intuitively obvious, but the μfunctions may not be so obvious. As we all know, there is a lot of stranded resource in the datacenter due to static provisioning and overprovisioning, and the best way to improve energy efficiency is to fully utilize the resources that are burning juice. (You never turn a server off, but you always find it something to do, as James Hamilton of Amazon Web Services has said many, many times. Once you spend money on it, you have to work a server to its technical and economic death.) Applications are currently too coarse-grained to allow for efficient packaging and deployment on hardware, so Treehouse is trying to intersect with evolving containerized microservices and serverless application development and push this finer-grained approach and use it to bin pack smaller chunks of software onto shared hardware resources.
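A toy example shows why granularity matters for bin packing. The sketch below uses the classic first-fit-decreasing heuristic, which is our choice for illustration and not something the Treehouse paper prescribes. The 10-core servers and the core demands are hypothetical numbers.

```python
def first_fit_decreasing(demands, capacity):
    """Pack demands into as few bins of `capacity` as the heuristic finds."""
    bins = []
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds server capacity")
        for server in bins:
            if sum(server) + demand <= capacity:
                server.append(demand)  # fits on an already-running server
                break
        else:
            bins.append([demand])  # nothing fits, power on another server
    return bins


def utilization(bins, capacity):
    """Fraction of powered-on capacity that is actually doing work."""
    return sum(sum(server) for server in bins) / (len(bins) * capacity)


CORES_PER_SERVER = 10        # hypothetical server size
coarse = [6, 6, 6, 6]        # four monolithic apps, 6 cores each
fine = [2] * 12              # the same 24 cores of work in 2-core pieces

coarse_bins = first_fit_decreasing(coarse, CORES_PER_SERVER)
fine_bins = first_fit_decreasing(fine, CORES_PER_SERVER)

print(len(coarse_bins), utilization(coarse_bins, CORES_PER_SERVER))
print(len(fine_bins), utilization(fine_bins, CORES_PER_SERVER))
```

With these made-up numbers, the four coarse-grained applications each strand four cores and need four servers at 60 percent utilization, while the same work cut into smaller pieces fits on three servers at 80 percent.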

The plan is to use an RPC-based API for μfunctions and weave that together with the SLA and energy budget to determine when these μfunctions run and where they run. These μfunctions will, as the term suggests, operate on the scale of microseconds.
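Here is a minimal sketch of how an SLA and an energy estimate might jointly decide where a μfunction runs, assuming the simplest policy we can think of: among the targets that can meet the deadline, pick the one that burns the least energy. The `Target` and `Invocation` types, the `place` function, and all of the latency and energy numbers are hypothetical and are not taken from the Treehouse paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Target:
    """A place a μfunction could run, with hypothetical per-call estimates."""
    name: str
    latency_us: float  # estimated time to complete one invocation
    energy_uj: float   # estimated energy for one invocation, in microjoules


@dataclass(frozen=True)
class Invocation:
    """A request to run a μfunction under an SLA."""
    function: str
    deadline_us: float  # SLA: must complete within this many microseconds


def place(inv: Invocation, targets: Sequence[Target]) -> Optional[Target]:
    """Return the lowest-energy target that meets the deadline, if any."""
    feasible = [t for t in targets if t.latency_us <= inv.deadline_us]
    if not feasible:
        return None  # no target can honor the SLA
    return min(feasible, key=lambda t: t.energy_uj)


targets = [
    Target("big-core", latency_us=20, energy_uj=90),
    Target("little-core", latency_us=80, energy_uj=30),
    Target("accelerator", latency_us=40, energy_uj=45),
]

urgent = place(Invocation("resize", deadline_us=25), targets)
relaxed = place(Invocation("resize", deadline_us=100), targets)
impossible = place(Invocation("resize", deadline_us=10), targets)

print(urgent.name, relaxed.name, impossible)
```

The urgent call lands on the fast but power-hungry core because nothing else meets its deadline, which is the "it needs to run now" case. The relaxed call goes to the most frugal target, and the impossible one is rejected. A real scheduler would also have to decide when to run, not just where, and would work from measured rather than invented estimates.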

“In order to exploit fine-grained variations in resource usage and concurrency, we plan to support microsecond-scale invocations of µfunctions, an improvement of several orders of magnitude over existing serverless systems.”

To do this, the researchers say, will require some hacking. First, it takes too long to cold start serverless functions today. Even after optimizations, the Firecracker lightweight virtualization layer for serverless applications takes at least 125 milliseconds to fire up. The fact that RPC protocols are built on top of the HTTP Web protocol doesn’t help, either. It’s just too slow. There also needs to be some way to abstract μfunctions sufficiently that they can run across different processing engines and use different memory and storage as they become available, and this will require an agreement on an intermediate representation for both compute and memory. Something represented by this:

Isn’t that pretty? Charts are easy to make. New datacenter-spanning application and system architectures, not so much.

And, in the long run, the Treehouse researchers say they will need a new energy-aware and energy-optimized operating system and runtime environment for datacenter applications. Timothy Roscoe, professor in the Systems Group of the Computer Science Department at ETH Zurich, argued brilliantly last summer at the OSDI21 conference that it was time to start doing research and development in operating systems again, and this might be a good place to start.

Treehouse will, in short, be a massive project.
