While online auctioneer eBay does not run the largest search engine in the world, search is a key component of its service, which has over 162 million active buyers and 900 million product listings. Looking to improve the hardware infrastructure underpinning its search engine, eBay has tapped Dell to create a new water-cooled, hyperscale-style rack system that will let it overclock its servers and boost their performance on compute-intensive search algorithms.
The hyperscalers are all a bit cagey about the search engine infrastructure that they use because it is such a critical component of what they do, whether search itself is the business as it is with Google, Microsoft, and Baidu, which have public search engines that crawl the Internet, or if it is merely a part of the service they provide, as it is with Facebook, Amazon, or eBay. It is somewhat surprising that eBay is giving the outside world a peek at its search engine server infrastructure, but the system that was designed as a proof of concept for eBay by Dell shows off some clever thermal engineering and gives Dell’s Extreme Scale Infrastructure business, which peddles several billion dollars in custom iron a year to hyperscalers and smaller companies that emulate them to a certain degree, a new product line to push.
Dell developed the new water-cooled system, which is code-named Triton, to help eBay lower the footprint of its search engine infrastructure while at the same time dropping its total cost of ownership for those servers. In a briefing at the engineering labs at Dell’s facilities in the Austin, Texas suburbs, Austin Shelnutt, principal thermal architect for the Extreme Scale Infrastructure group, walked The Next Platform through the Triton system. Shelnutt said that Dell’s former Data Center Solutions group, which makes custom gear for hyperscalers and which was merged with its Datacenter Scalable Systems group last December to form the Extreme Scale Infrastructure group, has been working on water-cooling options for the past six years and had created several generations of liquid cooling technologies that DCS customers have experimented with or deployed in production.
Liquid cooling is very retro, and, by the way, so were 19-inch equipment racks when they came to X86 machinery in the late 1990s. With hyperscalers that have hundreds of thousands of servers in their datacenters, Shelnutt says that they can do all kinds of thermal tricks with baffles and fans and other kinds of power optimizations to try to squeeze more efficiency out of their machinery, but that sooner or later, to drive more efficiency and performance at the same time, they are forced into some kind of liquid cooling. The reason is simple: For hyperscalers, removing a few watts from the power budget of a system can add up to millions of dollars in savings per year. And a kilowatt cuts two ways in a datacenter – coming in as juice to power gear and having to be taken out as heat when the bits are flipped.
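To see how a few watts can turn into millions of dollars, here is a back-of-the-envelope sketch. The fleet size, the watts saved per server, the electricity price, and the PUE multiplier (used to count the energy spent removing the heat as well as supplying it) are all illustrative assumptions, not figures from Dell or eBay.

```python
HOURS_PER_YEAR = 24 * 365


def annual_savings_usd(servers, watts_saved_per_server, usd_per_kwh, pue=1.0):
    """Yearly electricity savings from trimming power on every server.

    pue scales the IT-side saving to include the facility overhead
    (mostly cooling) that no longer has to be spent on that power.
    """
    kw_saved = servers * watts_saved_per_server / 1000.0
    return kw_saved * pue * HOURS_PER_YEAR * usd_per_kwh


# Hypothetical inputs: 500,000 servers, 5 W saved each, $0.07/kWh, PUE of 1.5
savings = annual_savings_usd(500_000, 5, 0.07, pue=1.5)
print(f"${savings:,.0f} per year")
```

With these assumed inputs the saving lands at roughly $2.3 million a year, which is the scale of number Shelnutt is pointing to.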
Water has 4,000 times as much energy carrying capacity as air, but because water is so much denser than air it also takes a lot more energy to move it around the datacenter and across system components to remove heat. The upshot is that water is about 25 times as efficient as air at removing heat. But there are two problems with using water to cool systems. One, you can’t get water on electronic components or they fry, and two, the current crop of liquid-to-liquid heat exchangers is very inefficient. On top of that, cooling distribution units (CDUs) in the server racks add to cost, are difficult to deploy, and require pumps. With the Triton design, Dell worked with eBay to eliminate all of these steps, basically bringing the same water that normally feeds into CRACs (short for computer room air conditioners) all the way into the rack and extending it directly down to the water blocks that are mated to the processors in the Triton server sleds.
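The 4,000 figure can be sanity-checked with a rough calculation, assuming it refers to heat capacity per unit volume (the article does not say how the number was derived). The densities and specific heats below are approximate textbook values for room-temperature water and air, not figures supplied by Dell.

```python
# Approximate room-temperature properties (textbook values, assumed here)
WATER = {"density_kg_m3": 998.0, "cp_j_per_kg_k": 4180.0}
AIR = {"density_kg_m3": 1.2, "cp_j_per_kg_k": 1005.0}


def volumetric_heat_capacity(fluid):
    """Joules needed to warm one cubic meter of the fluid by one kelvin."""
    return fluid["density_kg_m3"] * fluid["cp_j_per_kg_k"]


ratio = volumetric_heat_capacity(WATER) / volumetric_heat_capacity(AIR)
print(f"water holds about {ratio:,.0f}x as much heat per unit volume as air")
```

The ratio comes out at roughly 3,500, the same order of magnitude as the figure quoted above; the exact value shifts with the temperature and pressure of the air.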
This water can be somewhat cruddy and can gum up the works in the piping used to move it into the rack and then into the server nodes, but this is what Dell and eBay wanted to do so they could eliminate the liquid-to-liquid heat exchanger that is normally used to keep the relatively dirty water from getting deep inside of the datacenter. In fact, the cold plates that latch onto the Xeon E5 processors in the Triton server sleds can handle pollutants up to 300 microns in size, but the filtration system used in the rack can clear out particles 100 microns or larger in size, so particles that large should never reach them anyway. Shelnutt says one layer of filters in the system can be replaced with charcoal filters so they can remove corrosive materials as well as particulate matter.
The one thing that you worry about with any water cooling is leaks, of course, and the torch-brazed copper tubing used in the Triton system to carry cooling liquid to the processors and voltage regulators (the hottest components in the machine) is rated at 5,000 pounds per square inch (and can be pushed as high as 10,000 PSI) even though the actual water pressure in the system is typically around 70 PSI. Dell has engineered the Triton to cope with 350 PSI if necessary. This leaves a wide margin of safety.
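The size of that margin is easy to put in numbers. The short sketch below uses only the pressures quoted above; the constant and function names are ours, chosen for illustration.

```python
OPERATING_PSI = 70         # typical water pressure in the system
DESIGN_PSI = 350           # pressure the Triton system is engineered to cope with
TUBING_RATED_PSI = 5_000   # rating of the brazed copper tubing
TUBING_MAX_PSI = 10_000    # upper limit the tubing can be pushed to


def margin(limit_psi, operating_psi=OPERATING_PSI):
    """How many times the operating pressure a given limit represents."""
    return limit_psi / operating_psi


for label, limit in [
    ("system design", DESIGN_PSI),
    ("tubing rating", TUBING_RATED_PSI),
    ("tubing maximum", TUBING_MAX_PSI),
]:
    print(f"{label}: {margin(limit):.0f}x operating pressure")
```

The system as a whole is designed for five times its typical operating pressure, and the tubing itself is rated for roughly 70 times that pressure.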
Leaks, if they occur, will most likely happen at seams where components meet, not in the pipe itself. The water pipes used in the server sleds have troughs underneath them to catch any water that might leak from tubes, and these troughs have sensors that kill water flow immediately when a leak is detected and also kill power to the node to prevent shorting out of circuits. The fat flexible pipes that lead into server enclosures from the outside water supply have covers that keep them from spraying other equipment and cleverly waterfall any leaks down to the bottom of the rack and away from gear. These hoses and pipes use military-grade, quick-release, dripless links for hooking the water blocks in the server sleds to a central water distribution unit in the Triton server chassis (which also has water catcher basins and leak detectors) and these in turn to the chiller water supply.
The water cooling system used in Triton eliminates about 80 percent of the heat generated by the server sleds, but the main memory and PCI-Express peripherals in each server sled are cooled by air. That air does not come from outside of the Triton rack, however, but rather from a liquid-to-air heat exchanger in the back of the chassis that blows cold air over these components. The combination of the cold plates and the air heat exchanger in the rack is rated at 120 percent of the capacity of the rack, so you can actually make it blow cold air out the back if you want . . . .
The water works in the Triton system are interesting, even if you don’t care much about plumbing, but the point is that this cooling system allows eBay to ramp up the performance density of its search engine infrastructure while actually reducing costs per unit of compute. The change in temperature – or delta T in the lingo – of Triton is on the order of 18 degrees to 20 degrees Celsius, and with that big of a change in temperature, you can move water more slowly through the system than if you are using water that doesn’t have as big of a delta. This saves on the water bill.
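The link between delta T and flow rate follows from the standard heat balance, heat removed = mass flow × specific heat × temperature rise. The sketch below applies it to a hypothetical 40 kW heat load, which is an assumption made for illustration (the article does not give the rack's power draw), using approximate textbook properties for water.

```python
WATER_CP = 4180.0       # specific heat of water, J/(kg*K), approximate
WATER_DENSITY = 998.0   # density of water, kg/m^3, approximate


def flow_liters_per_minute(heat_watts, delta_t_c):
    """Water flow needed to carry away heat_watts with a temperature rise of delta_t_c."""
    if delta_t_c <= 0:
        raise ValueError("delta T must be positive")
    kg_per_second = heat_watts / (WATER_CP * delta_t_c)
    cubic_meters_per_second = kg_per_second / WATER_DENSITY
    return cubic_meters_per_second * 1000.0 * 60.0


HEAT_LOAD_W = 40_000  # hypothetical heat load carried by the water

for delta_t in (5, 10, 20):
    lpm = flow_liters_per_minute(HEAT_LOAD_W, delta_t)
    print(f"delta T of {delta_t} C: {lpm:.1f} liters per minute")
```

Because flow is inversely proportional to delta T, a 20 degree rise needs half the water of a 10 degree rise and a quarter of the water of a 5 degree rise for the same heat load.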
The Triton machine is based on Dell’s G5 hyperscale racks, which have been commercialized within the Extreme Scale Infrastructure group as the DSS 9000 system. (We told you all about the G5 and DSS 9000s earlier this year.) In this case, eBay is using a configuration that has three server nodes across the 21-inch rack. With eight enclosures per rack, that works out to 96 nodes in total. The system can accommodate JBOD disk enclosures or GPU accelerators in the sleds if eBay wants, and the water blocks could be put on the GPUs if necessary. But eBay’s search engine is all about raw CPU compute.
To that end, the Triton cooling allows eBay to use a custom “Broadwell” Xeon E5-2679 v4 processor with 20 cores running at 3.3 GHz, which is rated at 200 watts. That clock speed is sustained, not a Turbo Boost speed that can only be reached if other parts of the chip are cooler, and it can be sustained over all 20 cores. eBay says that this chip has 59 percent greater throughput in terms of search queries per second processed than the stock Intel Xeon E5-2680 v4 chip, which has 14 cores running at 2.4 GHz. Intel’s list price for the latter is $1,745 a pop when bought in 1,000-unit quantities. (You can see our coverage of Broadwell Xeons here.) This custom Xeon chip used by eBay for search is also 70 percent faster than a “Haswell” Xeon E5-2680 v3 chip, which had 12 cores running at 2.5 GHz. Pricing was not announced on the custom SKU for eBay, but we would guess the price scales more or less with performance adjusted for chip volumes.
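One way to put those throughput numbers in context is to compare them with the raw increase in cores times clock speed. The sketch below does that with the core counts and frequencies quoted above; cores × GHz is a deliberately crude yardstick that ignores per-clock improvements between generations, cache, and memory bandwidth, and the `Chip` class is ours, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int
    ghz: float

    @property
    def core_ghz(self):
        """Aggregate cores x clock, a crude proxy for raw compute."""
        return self.cores * self.ghz


custom = Chip("Xeon E5-2679 v4 (custom)", 20, 3.3)
broadwell = Chip("Xeon E5-2680 v4", 14, 2.4)
haswell = Chip("Xeon E5-2680 v3", 12, 2.5)

# Search throughput of the custom chip relative to each baseline, per eBay
reported_gain = {broadwell.name: 1.59, haswell.name: 1.70}

for base in (broadwell, haswell):
    raw_gain = custom.core_ghz / base.core_ghz
    print(
        f"vs {base.name}: {raw_gain:.2f}x cores x GHz, "
        f"{reported_gain[base.name]:.2f}x reported search throughput"
    )
```

By this rough measure the custom part has about twice the aggregate cores × clock of either stock chip, while the reported search throughput rises by 59 percent and 70 percent, which suggests the workload scales somewhat less than linearly with raw compute.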
The resulting Triton system uses 97 percent less power on cooling than a typical air-cooled datacenter, according to Dell, and the cooling system uses 62 percent less power than does the Apollo 8000 water-cooled system from Hewlett Packard Enterprise, it says. The setup is, for all intents and purposes, a rack-level datacenter and has a power usage effectiveness of between 1.02 and 1.03, which is about as low as you can go. That’s pretty good for overclocked Xeons running at full, sustained throttle. The fun bit is this: compared to a regular air-cooled G5 or DSS 9000 rack, the Triton water-cooled version carries less than a 5 percent premium, according to Shelnutt.
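Power usage effectiveness is the ratio of total facility power to the power consumed by the IT gear itself, so the overhead is simply the part above 1.0. The sketch below works that out for the quoted PUE range; the 100 kW IT load is a hypothetical figure chosen to make the arithmetic easy to read.

```python
def overhead_kw(it_load_kw, pue):
    """Non-IT power (cooling, power conversion and so on) implied by a PUE figure."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * (pue - 1.0)


IT_LOAD_KW = 100.0  # hypothetical IT load

for pue in (1.02, 1.03):
    extra = overhead_kw(IT_LOAD_KW, pue)
    print(f"PUE {pue}: {extra:.1f} kW of overhead per {IT_LOAD_KW:.0f} kW of IT load")
```

At a PUE of 1.02 to 1.03, only 2 to 3 kW is spent on overhead for every 100 kW delivered to the servers.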
Dell is working on a “closed loop” version of the Triton system that will not require datacenter facility water to work and that will presumably have a built-in chiller for the rack, enabling free-standing, dense infrastructure that is water cooled at a very efficient level.