How This Battery Cut Microsoft Datacenter Costs By A Quarter
March 13, 2015 Timothy Prickett Morgan
In yet another example of how distributed systems sometimes work better than centralized ones, the hardware engineers at Microsoft have come up with a new battery-backed power supply for their homegrown servers that allows for massive – and expensive – battery rooms to be eliminated from the cloud giant’s datacenters.
The new power supply, which Microsoft calls the Local Energy Storage (LES) unit, was designed as part of the Open Cloud Server hyperscale system that the company donated to the Open Compute Project last year and updated last October with some significant tweaks. In the spirit of openness that might seem a bit strange coming from Microsoft, the new LES specification is being opened up through the Open Compute community as well.
It is significant to note that the Open Compute designs put forth by Facebook three years ago had already moved batteries into the Open Rack design to gain efficiencies. And Google said way back in April 2009, in a rare look at its internal datacenters, that it had not only been using containerized datacenters to boost efficiency since 2005, but had put 12 volt battery packs on its servers so they could ride out failures on local, rather than centralized, stored power. That was a decade ago, just to show you how far ahead Google can sometimes be compared to its rivals. Supermicro and others offer power supplies with battery backups built in, too.
With the LES power supply-battery combination, Microsoft is making its engineering available to anyone, and it is explaining to people just how much more efficiently they can run their datacenters with this subtle shift from massive central batteries to distributed small ones. It is doing a little engineering, too.
Assault The Battery
In a traditional datacenter design, companies deploy uninterruptible power supply, or UPS, systems that are giant banks of lead acid batteries. The UPS provides power to the servers, storage and networks if there is a short glitch in the power feed that might otherwise cause the machinery to fail or reboot. The UPS sits in between the high voltage feed coming into the datacenter from the electrical grid substations and the server and storage machinery that runs at a much lower voltage inside the datacenter.
The innovation at Microsoft, explains Shaun Harris, the principal hardware engineer at Microsoft who invented the LES unit, was to use the same lithium ion battery cells in the server power supply that are used in battery-operated, rechargeable hand tools that were first adopted in the construction industry and then widely adopted in homes. Specifically, Microsoft is using the Panasonic 18650 lithium ion cell, which Harris says is a bit bigger than the AA battery commonly used in consumer electronics and better suited to the server job Microsoft has.
“This cell has a commodity price, and it carries UL certification, so it has been abused and it has high quality,” Harris said. “Scale forces simplicity and we must have scale to keep our costs low.”
Hence the use of commodity lithium ion batteries from power tools, of which Panasonic made over 100 million in 2014. These batteries cost around $8 a pop when you buy them in a four pack and a whole lot less when you buy millions for a Microsoft datacenter.
Microsoft's innovation on this idea was to hack into the switched mode power supply used in its Open Cloud Server machines and put the battery right into the existing circuits. So the battery is not hanging off to one side, as it did in the Google servers from 2009, but is embedded in the power supply without any extra circuit costs. And importantly, the batteries are not in the power path between the electrical source and the server motherboards and components. Rather, they extend the life of the bulk capacitors in the power supply in the event of a power failure in the main feeds.
The net result, said Harris, is that the cost of providing battery backup power to Microsoft’s server and storage fleet has been reduced by a factor of five, and the power usage effectiveness (PUE) of its datacenters has been improved by 15 percent.
A lot of that cost savings with the LES approach comes from not having to build a separate room, with thick walls and special venting for hydrogen gas, that houses the giant UPS batteries. Kushagra Vaid, general manager of server engineering for the Cloud and Enterprise Division at Microsoft, tells The Next Platform that a typical 25 megawatt datacenter weighing in at around 600,000 square feet (that is a dozen football fields, roughly) has about 150,000 square feet of that, or about 25 percent of the floor space, dedicated to UPS gear. At an average rate of $220 per square foot for construction, not having to build that UPS room works out to around $31 million in savings for that 25 megawatt facility. (The same ratio of savings applies for smaller facilities.)
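The floor space arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures Vaid quoted; the function names are made up for illustration. Note that multiplying the quoted numbers straight through gives $33 million, a little above the roughly $31 million cited, and the source does not say what accounts for the gap.

```python
# Figures quoted by Vaid for a typical 25 megawatt datacenter.
TOTAL_SQFT = 600_000      # whole facility
UPS_SQFT = 150_000        # floor space dedicated to UPS gear
COST_PER_SQFT = 220       # average construction cost, dollars


def ups_floor_share(ups_sqft, total_sqft):
    """Fraction of the facility's floor space taken up by UPS gear."""
    return ups_sqft / total_sqft


def construction_savings(ups_sqft, cost_per_sqft):
    """Dollars not spent if the UPS room is never built (hypothetical helper)."""
    return ups_sqft * cost_per_sqft


share = ups_floor_share(UPS_SQFT, TOTAL_SQFT)
savings = construction_savings(UPS_SQFT, COST_PER_SQFT)

print(f"UPS share of floor space: {share:.0%}")
print(f"Construction cost avoided: ${savings:,}")
```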
The move to distributed battery backup has some other cost reduction effects.
In a typical UPS backup scenario in a normal datacenter, the incoming power is converted from AC to DC so the battery can be charged, then converted back to AC coming out of the battery to be distributed out to the power distribution units, where it is stepped down to the 120 volts at which the servers consume it. By putting the batteries in the servers, Microsoft can do one AC to DC conversion and distribute 380 volts DC directly to the Open Cloud Server power supplies and then step it down to 12 volts for the server and storage nodes. Harris says that about 8 percent of the energy coming into the UPS is lost during the battery recharge process, that another 9 percent is lost in the AC-DC double conversions, and that another point or two is lost through power management devices in the chain between the UPS and the systems. By putting the batteries inside the power supply, the overhead on battery charge is only 2 percent. (He did not explain why it is so low, but perhaps the difference is the sum of the efficiencies gained by moving from lead acid to lithium ion batteries, added to the fact that the datacenter only needs one battery per machine instead of redundant shared batteries for the whole datacenter. There may be a little more that comes from charging the capacitors instead of being in the main power flow inside of the power supply.)
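To put the loss figures Harris cited side by side, here is a rough tally in Python. It assumes the losses can simply be added as percentage points of the energy coming into the UPS, which is a simplification (losses in a real chain compound), and the names are invented for illustration. Only the 2 percent battery charge overhead was given for the LES design, so the sketch compares that one stage rather than the whole chain.

```python
# Loss figures quoted by Harris, in percentage points of incoming energy.
UPS_RECHARGE_LOSS = 8        # central UPS battery recharge
DOUBLE_CONVERSION_LOSS = 9   # AC-DC double conversions
POWER_MGMT_LOSS_RANGE = (1, 2)  # "another point or two"
LES_CHARGE_OVERHEAD = 2      # battery charge overhead inside the power supply


def central_ups_loss_points(power_mgmt_loss):
    """Sum the quoted losses for the central UPS chain (simple addition assumed)."""
    return UPS_RECHARGE_LOSS + DOUBLE_CONVERSION_LOSS + power_mgmt_loss


low = central_ups_loss_points(POWER_MGMT_LOSS_RANGE[0])
high = central_ups_loss_points(POWER_MGMT_LOSS_RANGE[1])
charge_overhead_reduction = UPS_RECHARGE_LOSS - LES_CHARGE_OVERHEAD

print(f"Central UPS chain loses roughly {low} to {high} points")
print(f"Battery charge overhead drops by {charge_overhead_reduction} points")
```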
The other benefit of the LES hybrid is that the battery backup capacity automatically scales with the compute and storage in the racks. You don’t have to do datacenter-wide capacity planning for UPS gear and you don’t have to buy ahead of need. You don’t need chemical suits and protocols to fix a battery-backed power supply, as you do in a UPS room, so the mean time to repair is minutes instead of days, and the resulting setup is inherently more reliable because it has fewer parts.
“In all of our datacenters going forward, we have changed the architecture, and we have saved hundreds of millions of dollars so far,” says Harris.
Keeping Your Cool
You might notice in the presentations above another big change that Microsoft made in its datacenter designs, which Vaid said the company made five years ago. Normal datacenters have CRACs, which is short for computer room air conditioners, and these are exactly what the name suggests: giant air conditioners that suck in hot air and pump out cool air to keep the datacenter from overheating. These CRACs have chilled water pumps, water chillers, condenser water pumps, and cooling towers attached to them, and water pumps that take water from the ground to feed the entire system. All of this machinery takes a lot of energy to run, and it is one of the reasons why the PUE of most traditional datacenters is so high.
About five years ago, when Microsoft was just starting to ramp the Azure public cloud and build out its various online services to mammoth scale, its engineers replaced this entire old-fashioned cooling system with an adiabatic cooling system, which uses less water than a cooling tower and essentially works by blowing hot air through moving streams or mists of water (sometimes cardboard saturated with water) to have evaporation remove the heat from the air. Vaid said that Microsoft was also using outside air economizers in conjunction with the adiabatic cooling system, which is just a fancy way of saying that it sucks in cool air from outside the datacenter and exhausts hot air to keep the datacenter at a constant temperature. Facebook uses a homegrown combination of outside air economizers and adiabatic cooling using cardboard in its Forest City, North Carolina datacenter, which The Next Platform has seen in action.
Microsoft did not say how much energy, money, and space it has saved by moving to such adiabatic cooling, but it is very likely substantial.