SPONSORED It is only natural that the world’s top supercomputing sites in climate and weather modeling should lead the charge toward more efficient, sustainable, and green datacenter practices. With the right approaches, these centers can show that power and performance need not be a game of trade-offs, and that systems can achieve radical performance with highly efficient cooling.
Power and cooling are usually treated as facility-level concerns. Lenovo, the leading provider of supercomputers on the TOP500 list, and the Korea Meteorological Administration (KMA) are proving what server-level liquid cooling can do for cutting-edge HPC efficiency.
KMA, South Korea’s national weather service, provides weather forecasts and issues warnings of adverse weather conditions across the region. The administration also conducts climate change research that informs Korean government policy. To do this work, KMA operates the National Center for Meteorological Supercomputer (NCMS), home to the largest supercomputer in Korea, which supports vital weather and climate forecasting.
At the heart of “Supercomputer No. 5”, KMA’s newest system, is Lenovo’s Neptune direct-water cooling (DWC) technology, a first in Korea. Neptune is a bellwether for what lies ahead at some of the most demanding HPC centers globally. It delivers water directly into the system through a copper loop to cool on-board server components, including the processors, memory, and PCIe devices.
This alternative approach to cooling is increasingly important on systems like Supercomputer No. 5 because of the sheer density of compute power in each node. That compute capability comes from HPC-focused 3rd Generation Intel Xeon Scalable processors (a.k.a. “Ice Lake”), in a variant designed specifically for liquid cooling. Built on the Xeon Platinum 8368Q, the new system gives KMA a nine-fold boost in peak performance potential over its previous system.
Mission-critical weather forecasting supercomputers are typically mirrored: two systems share exactly the same configuration, so that if one fails, the other can quickly pick up the model and avoid lapses in forecasting. KMA is no different. The “Maru” and “Guru” clusters each deliver 25 petaflops of peak performance, enough for each to rank among the 25 most powerful systems on the planet. Combined, their theoretical peak of 50 petaflops would be enough to rank in the top ten systems in the world.
The server density needed for this kind of performance is not just about the processor: Lenovo also integrated leading network and memory technologies, including HDR InfiniBand, into the architecture. Combined with the special 2.6GHz processors, this makes careful cooling of each node critical for sustained, efficient performance. In fact, as supercomputers keep growing in performance, direct-water cooling of the kind KMA selected in its Lenovo partnership will increasingly become the norm on TOP500 systems, because that combination of performance and density will be impossible to achieve with standard air-cooled systems.

Two separate two-socket Lenovo ThinkSystem SD650 V2 compute nodes fit on a single tray, with six trays in each 6U n1200 chassis. A rack can accommodate six n1200 chassis, for a total of 72 nodes and 144 processors per rack. Lenovo has also announced direct water cooling for GPUs with the Lenovo ThinkSystem SD650-N V2, a two-socket server with four NVIDIA A100 GPUs that fits in the same n1200 chassis.
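The per-rack figures follow directly from multiplying the layout counts quoted above. The short sketch below simply reproduces that arithmetic; the constant names and the `rack_density` helper are illustrative only and are not part of any Lenovo tool.

```python
# Layout counts as quoted in this article (illustrative constants, not a Lenovo API).
NODES_PER_TRAY = 2      # two SD650 V2 nodes share one tray
TRAYS_PER_CHASSIS = 6   # six trays per 6U n1200 chassis
CHASSIS_PER_RACK = 6    # six n1200 chassis per rack
SOCKETS_PER_NODE = 2    # each node is a two-socket server
CHASSIS_HEIGHT_U = 6    # each n1200 chassis occupies 6U


def rack_density(nodes_per_tray, trays_per_chassis, chassis_per_rack, sockets_per_node):
    """Return (nodes, processors) per rack for the given layout counts."""
    nodes = nodes_per_tray * trays_per_chassis * chassis_per_rack
    processors = nodes * sockets_per_node
    return nodes, processors


nodes, cpus = rack_density(NODES_PER_TRAY, TRAYS_PER_CHASSIS, CHASSIS_PER_RACK, SOCKETS_PER_NODE)
print(f"{nodes} nodes, {cpus} processors per rack")
print(f"{CHASSIS_PER_RACK * CHASSIS_HEIGHT_U}U of rack space used by the chassis")
```

Running it prints 72 nodes and 144 processors per rack, matching the article, with the six chassis occupying 36U.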
Why did KMA choose Lenovo ThinkSystem SD650 V2 nodes with direct warm-water cooling as the building blocks for Supercomputer No. 5? Because Neptune’s warm-water cooling removes more heat from the CPU, supporting processors of up to 300W. That is a major improvement over air-cooled approaches, which can handle anywhere from 165W to just over 200W, and only with special adaptations. With all server elements kept at an optimal temperature, performance is assured, and datacenter energy costs can be cut by 30 to 40 percent compared with traditional air cooling.
For weather centers like KMA, a backup system ready to take charge of mission-critical forecasts is essential. KMA split Supercomputer No. 5 into twin systems, named Maru and Guru, to achieve that redundancy. Without efficient cooling, thermal shutdowns and throttled performance are a constant threat that can cost time and even data. Beyond boosting the efficiency and performance of its twin supercomputers, efficient cooling gives KMA the added benefit of keeping operations at peak productivity.
Lenovo has been perfecting the art, and the science, of liquid cooling technologies for over a decade. The company’s work with DWC began in 2012, when it was challenged to deliver a high-density supercomputer built on x86 processor technology. At that time, the CPU was the main heat producer within the system, so that is what Lenovo’s innovative copper water loop targeted. Since then, the company has extended the loop to memory, storage, PCIe, power supplies, and now accelerators.
The end result of this decade-long research process is visible at KMA. The Lenovo ThinkSystem SD650 V2 pipes water directly into the compute nodes to remove heat. But Neptune goes beyond direct cooling: it also includes rear-door heat exchanger technology, plus thermal transfer modules (TTMs) and liquid-to-air (L2A) heat exchangers, which use liquid to augment air in air-cooled systems and achieve greater performance without adding plumbing to the datacenter.
Like any other meteorological agency, KMA faces the challenge of producing forecasts faster, with greater accuracy, and for multiple constituencies (farms, for example, use different data than airports do).
While it might seem a minor point amid such dramatic improvements in efficiency, performance, and reliability, the Lenovo ThinkSystem SD650 V2 nodes are far quieter because they have no system fans, and any datacenter operator will tell you that noise can be a major issue. For those facility operators, the density enabled by Lenovo’s Neptune technology also leaves plenty of room for growth.
Climate scientists and forecasting professionals now have far more compute for their models, and with it far higher resolution. The center has the assurance of system reliability and maximum performance thanks to efficient heat removal (up to 90 percent of all heat from server components), and the facility owners can rest easy when it comes time to scale their centers.
Direct water cooling is a win-win. KMA, with Lenovo by its side each step of the way, is paving the way for ultra-high-density supercomputing with an eye on sustainability, performance, efficiency, and future-proofing from a facilities standpoint.
“Lenovo is dedicated to helping the world’s brightest minds solve humanity’s greatest challenges with advanced technologies and innovation,” said Steve Shin, country manager of Lenovo Data Center Group Korea. “This strategic partnership not only accelerates KMA’s weather forecasting precision and climate change research but also places KMA on the cusp of Exascale computing.”
This article is sponsored by Lenovo.