Lenovo Sees Expanding Market for Dense Water-Cooled HPC
February 22, 2018 Jeffrey Burt
The demand for more compute resources, power and density in HPC environments is fueling the need for innovative ways to cool datacenters that churn through petabytes of data to run modern simulation workloads touching on everything from healthcare and climate change to space exploration and oil and gas.
The top cooling technologies in most datacenters are air and chilled water. However, Lenovo is promoting its latest warm-water cooling system for HPC clusters with its ThinkSystem SD650 systems, which the company says will lower datacenter power consumption by 30 to 40 percent compared with more traditional cooling methods, provide up to 90 percent heat-removal efficiency, deliver twice the cooling capacity for processors and memory, and recover 5 percent more heat from the rack. The 6U NeXtScale n1200 Direct Water Cooling enclosure is aimed at HPC environments but can also be used in enterprise datacenters, according to the vendor.
The new cooling system comes at a time when HPC systems are being asked to process and analyze massive amounts of data as quickly as possible and run simulation workloads to model highly complex scientific and engineering problems. Processing power has increased almost 70 percent in the last six years, and performance has grown 10 times over that period, Lenovo said. Power consumption in HPC servers has doubled over the past generation or two, with processors that consume 200 watts or more.
Such figures put increasing demands on power, cooling and density. Lenovo engineers looked for ways to reduce the amount of power used to cool HPC environments, noting that air and chilled water either consume a lot of power or require expensive investment in a datacenter's physical infrastructure. In a traditional datacenter, cooling the servers takes about 60 percent of the server power; for chilled water, that number is 40 percent. Lenovo said it drops to less than 10 percent with warm-water cooling. For an HPC datacenter, that could mean more than $123,000 in annual savings.
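To see how cooling-overhead fractions translate into dollar figures like the one above, here is a back-of-envelope sketch. The overhead percentages (60, 40 and less than 10) are the article's; the 500-kilowatt IT load and $0.10/kWh electricity price are illustrative assumptions, not figures from Lenovo.

```python
# Back-of-envelope comparison of cooling overheads.
# Overhead fractions come from the article; the IT load and
# electricity price below are assumptions for illustration only.

HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.10   # USD per kWh, assumed
IT_LOAD_KW = 500       # assumed HPC datacenter IT load

overheads = {"air": 0.60, "chilled water": 0.40, "warm water": 0.10}

for method, frac in overheads.items():
    cooling_kw = IT_LOAD_KW * frac
    cost = cooling_kw * HOURS_PER_YEAR * PRICE_PER_KWH
    print(f"{method:>13}: {cooling_kw:6.0f} kW for cooling, ${cost:,.0f}/yr")

# Savings of warm water vs. chilled water at these assumed numbers
savings_kw = (overheads["chilled water"] - overheads["warm water"]) * IT_LOAD_KW
savings = savings_kw * HOURS_PER_YEAR * PRICE_PER_KWH
print(f"annual savings vs. chilled water: ${savings:,.0f}")
```

At these assumed numbers the chilled-water-to-warm-water savings land in the low six figures, the same ballpark as the $123,000 figure Lenovo cites; the exact result depends entirely on the load and power price assumed.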
The SD650 servers and their warm-water cooling system will be the foundation of the next-generation SuperMUC-NG supercomputer at the Leibniz Supercomputing Center (LRZ) in Germany. The 6,500-node system will be powered by Intel's Xeon "Skylake" Scalable Processors and tied together with the chip maker's Omni-Path fabric, and will come with a peak performance of 26.7 petaflops, which would be enough to put it in the number-three slot on the Top500 list released in November 2017. It will be the third generation of the SuperMUC supercomputer, with previous iterations also running servers from Lenovo and IBM. Lenovo bought IBM's x86 server business in 2014 for $2.1 billion, a move that instantly catapulted Lenovo into the top echelon of the world's system makers. In the last quarter of 2017, Lenovo saw revenue for its datacenter business jump 16.7 percent, to $1.2 billion, its highest quarter in two years.
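Those two headline figures imply roughly how much peak performance each node contributes, a quick sanity check worth running. The division below is mine, not Lenovo's or LRZ's:

```python
# Rough per-node peak implied by the article's figures:
# 26.7 petaflops spread across 6,500 nodes.
peak_petaflops = 26.7
nodes = 6500

tf_per_node = peak_petaflops * 1000 / nodes  # convert PF to TF, divide by nodes
print(f"~{tf_per_node:.1f} teraflops peak per node")  # ~4.1 TF/node
```

That works out to about 4.1 teraflops of peak per node, a plausible figure for a dual-socket Skylake-generation server.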
As we reported in The Next Platform, there were initial concerns in the HPC space after IBM sold its System x business that Lenovo would not be able to compete in the space and hold onto IBM's place in HPC, a message no doubt amplified by competitors like Hewlett Packard Enterprise and Dell. However, the growth of the HPC market in general and Lenovo's experience with hyperscalers and low-cost structures enabled the vendor to expand its HPC business.
The SuperMUC-NG was announced in December 2017, with Lenovo and Intel touting the power efficiency of the cluster. Lenovo engineers worked with Intel to develop a motherboard cooled by water that flows in through inlets, runs over the various components and is then carried out of the system. The water exiting the system is warm enough to be used to heat campus buildings, according to Lenovo.
LRZ has used previous generations of the vendor's warm-water-cooled servers, according to Lenovo. The SuperMUC-NG, which is expected to go online this year, will be used to run workloads in astrophysics, fluid dynamics, geophysics and life sciences. The system will be housed in dense racks of 72 servers each, with each rack consuming 43 kilowatts. The research center's current cluster delivers 6.8 petaflops of performance.
A single NeXtScale n1200 enclosure can hold up to 12 SD650 compute nodes with up to 24 processors, 9.2 TB of memory, 24 SFF solid-state drives (SSDs) or 12 SFF NVM-Express (NVMe) drives, and 24 M.2 boot drives. According to Lenovo, because of the efficiency delivered by the warm-water cooling system, the Intel CPUs can run nonstop in "turbo" mode, which means a 10 percent increase in chip performance. The system also leverages Lenovo's Energy Aware Run-Time software, which is designed to optimize power use while workloads are running. The SD650s run the Red Hat, SUSE and CentOS Linux distributions, support InfiniBand or Omni-Path, and come in a 1U full-tray form factor, with two Intel chips per node and two nodes per tray.
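Dividing the enclosure totals by the 12 nodes gives a per-node picture consistent with the two-socket design. The per-node arithmetic here is mine, derived from the enclosure figures quoted above:

```python
# Per-node breakdown of the n1200 enclosure figures from the article:
# 12 nodes, 24 processors, 9.2 TB memory, 24 SFF SSDs, 24 M.2 boot drives.
nodes = 12
cpus, mem_tb, ssds, boot_drives = 24, 9.2, 24, 24

print(f"CPUs per node:        {cpus // nodes}")                     # 2
print(f"Memory per node:      ~{mem_tb * 1000 / nodes:.0f} GB")     # ~767 GB
print(f"SSDs per node:        {ssds // nodes}")                     # 2
print(f"Boot drives per node: {boot_drives // nodes}")              # 2
```

The two-CPUs-per-node result matches the "two Intel chips per node" form factor the article describes, and roughly 770 GB of memory per node is what the 9.2 TB enclosure total implies.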