A complex world demands complex systems.
Designing and improving new industrial systems, semiconductors, or vehicles, whether earth or space bound, presents massive engineering and manufacturing challenges. So does building the digital twins that help designers understand how structures, from the massive to the microscopic, will behave in practice.
An additional challenge is a customer base with increasing demands, whether we’re talking about companies that make electric vehicles, the chips that control them, or new manufacturing lines to produce them.
“The challenge is how do I bring products to market faster, better, reducing cost, and ahead of my competition,” explains Andre Gardinalli, HPC/AI manufacturing sales executive at Lenovo. “And to do that, there’s a lot of engineering that goes on behind the scenes.”
Artificial Intelligence (AI) definitely has its place. But when it comes to these specific industrial and manufacturing challenges, it tends to be fundamental engineering and physics that generate the answers – number crunching and data processing in the extreme.
That, in turn, means the engineers working to deliver more detailed test results, build more realistic prototypes, and run ever more fine-grained simulations turn to some of the most powerful high performance computing (HPC) systems to power their workloads.
What might have counted as an HPC system a decade, or even a few years, ago can quickly run out of steam. Computational fluid dynamics (CFD) applications often use thousands of CPU cores, points out Gardinalli. But it is not purely a question of throwing raw power (and dollars) at the issue. The real conundrum is how to map a wide range of different engineering domains, each of which requires different underlying infrastructure, onto the right hardware.
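To see why simply adding cores is not always the answer, consider a back-of-the-envelope strong-scaling estimate using Amdahl's law. The serial fractions below are illustrative assumptions, not figures from Lenovo or Gardinalli, but they show how sharply the payoff from thousands of cores depends on how well a code parallelizes.

```python
# Amdahl's law: speedup on n cores for a code whose serial fraction is s.
# The serial fractions below are hypothetical, chosen purely for illustration.

def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for cores in (64, 512, 4096):
    well_parallelized = amdahl_speedup(0.001, cores)   # e.g. a scale-out CFD solver
    poorly_parallelized = amdahl_speedup(0.05, cores)  # e.g. a largely serial workload
    print(f"{cores:5d} cores: {well_parallelized:7.1f}x vs {poorly_parallelized:5.1f}x")
```

The well-parallelized case keeps gaining well past a thousand cores, while the more serial one plateaus early, which is why matching the workload to the infrastructure matters as much as raw core counts.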
Finite element analysis (FEA), for example, focuses on working out how materials and structures will act under stress. It’s therefore critical to public infrastructure as well as to vehicle design and crash simulation. Likewise, CFD looks to analyze gas or liquid flows, which are again paramount to vehicle development. Electronic design automation (EDA) aims to streamline the design and manufacture of components such as semiconductors, which are amongst the most complex designs humans make.
All of these require – and produce – vast amounts of data. And this data grows as simulations become more detailed or analysis becomes more fine-grained. While engineers would once have turned to high performance workstations to crunch this data, Gardinalli says such systems are “not able to do the math that is required.”
Moreover, the nature of those diverse workloads requires different applications and places different demands on hardware, even as that work has moved into the datacenter.
The Right Kind Of Power
Consider systems running core CFD and explicit FEA applications such as Ansys Fluent, Siemens Star-CCM+, or Abaqus, which generally scale well across many cores and nodes. Typically, these will run on two 32-core CPUs with moderate clock frequencies, backed by high memory bandwidth and fast interconnects to underpin that scaling. Memory will typically be 2 GB to 4 GB per core.
Implicit FEA workloads, by contrast, will often run on systems with lower core counts, usually two 16-core or even two eight-core CPUs. But these will have higher clock speeds and large amounts of RAM, with 1 TB to 2 TB typical. They will use fast local storage or even run entirely from data held in RAM.
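To make the contrast concrete, here is a minimal sizing sketch based on the rules of thumb above. The workload figures (cell count and bytes per cell) are hypothetical, included only to show why a scale-out CFD job spans several nodes while an implicit FEA job prefers one fat node.

```python
# Illustrative node-sizing arithmetic based on the rules of thumb above.
# The CFD case size and bytes-per-cell figure are hypothetical examples.

def cfd_node(cores_per_cpu=32, cpus=2, gb_per_core=(2, 4)):
    """Memory range for a scale-out CFD / explicit FEA node."""
    cores = cores_per_cpu * cpus
    return cores, (cores * gb_per_core[0], cores * gb_per_core[1])

def implicit_fea_node(cores_per_cpu=16, cpus=2, ram_tb=(1, 2)):
    """A 'fat' implicit FEA node: fewer, faster cores and lots of RAM."""
    return cores_per_cpu * cpus, ram_tb

cores, (mem_lo, mem_hi) = cfd_node()
print(f"CFD node: {cores} cores, {mem_lo}-{mem_hi} GB RAM (2-4 GB/core)")

fea_cores, (ram_lo, ram_hi) = implicit_fea_node()
print(f"Implicit FEA node: {fea_cores} cores, {ram_lo}-{ram_hi} TB RAM")

# A hypothetical 500-million-cell CFD case at ~1 KB of solver state per cell
# needs roughly 500 GB of aggregate memory, i.e. several such nodes, which is
# why interconnect bandwidth matters so much for scale-out codes.
cells = 500_000_000
bytes_per_cell = 1_000  # assumption for illustration only
aggregate_gb = cells * bytes_per_cell / 1e9
print(f"~{aggregate_gb:.0f} GB aggregate -> ~{aggregate_gb / mem_lo:.0f} nodes at {mem_lo} GB each")
```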
So, meeting these challenges, and the demands of the engineers tackling them, means being able to match the right system, and associated infrastructure, to the right problem. It also means understanding one of the key issues facing both society as a whole and technologists in particular, explains Gardinalli: energy.
The drive to make compute denser is undeniable. Consolidating older servers onto newer, more powerful, more energy efficient systems allows IT decision makers to reduce the number of servers in a datacenter, saving space, reducing energy consumption, and improving overall ROI.
“Newer servers using processors with higher core counts and higher frequencies deliver the kind of performance customers need today,” Gardinalli says. “The challenge is finding the most energy efficient systems with the necessary thermal management capabilities to power the servers while keeping them cool. As newer CPUs allow for increased frequency and deliver a greater number of cores and memory channels, a key element becomes power and cooling in highly dense environments.”
Lenovo says it anticipated these issues over a decade ago and was an early innovator in direct liquid cooling. In fact, its Lenovo Neptune cooling technology was designed to address today’s power and thermal challenges from the start and to ensure customers can take advantage of today’s high-performance servers.
Liquid cooling is implemented in different ways across Lenovo’s HPC and AI systems, and at every scale of system. One thing Lenovo says it has done very differently from its competition is to keep its liquid cooling systems inside a traditional 600 millimeter-wide rack.
The 1U ThinkSystem SR645, for example, can use liquid-assisted cooling, while the ThinkSystem SD665 V3 and high-density ThinkSystem SD665-N V3 nodes both employ direct liquid cooling, not just for the CPU, but for memory, drives, GPUs, and networking cards. All these systems run on AMD Epyc CPUs, as does the ThinkSystem SR675 V3, which features air cooling and is designed to take eight full-sized GPUs.
“So, the benefit is performance, and the benefit is also sustainability by lowering your total energy bill or the power usage across your datacenter,” says Gardinalli. That allows customers to keep their floor footprint tight rather than spreading more weight over a wider space in the datacenter.
Datacenter consolidation and modernization initiatives also give IT leaders a powerful tool to ‘make room’ for new technology initiatives such as AI. By replacing aging, inefficient servers, they can re-host everyday workloads onto more performant and energy efficient infrastructure that consumes less real estate and power, ultimately doing more with less and freeing up valuable datacenter resources to support innovation, says Lenovo.
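As a rough illustration of that consolidation arithmetic, the sketch below estimates how many older servers a newer, higher-core-count system might replace and what that could mean for power draw. Every figure, core counts, wattages, and per-core gains alike, is an assumption for illustration rather than a Lenovo benchmark.

```python
# A hypothetical consolidation estimate: how many older 2 x 16-core servers
# a newer 2 x 64-core server could replace, and the resulting power saving.
# All figures below are illustrative assumptions, not Lenovo measurements.

OLD_CORES_PER_SERVER = 32      # two 16-core CPUs
OLD_WATTS_PER_SERVER = 500
NEW_CORES_PER_SERVER = 128     # two 64-core CPUs
NEW_WATTS_PER_SERVER = 900
PERF_PER_CORE_GAIN = 1.5       # assumed per-core improvement, generation on generation

old_fleet = 40
needed_capacity = old_fleet * OLD_CORES_PER_SERVER  # in "old-core equivalents"
per_new_server = int(NEW_CORES_PER_SERVER * PERF_PER_CORE_GAIN)
new_servers = -(-needed_capacity // per_new_server)  # ceiling division

old_power_kw = old_fleet * OLD_WATTS_PER_SERVER / 1000
new_power_kw = new_servers * NEW_WATTS_PER_SERVER / 1000
print(f"{old_fleet} old servers -> {new_servers} new servers")
print(f"Estimated power: {old_power_kw:.1f} kW -> {new_power_kw:.1f} kW")
```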
Racing Into The Future
How all of these factors come together is highlighted in the world of Formula 1 motor racing, where teams apply HPC for CFD to squeeze every bit of efficiency from their vehicles – but also contend with regulations that limit the amount of capacity they can use.
“We have a group of benchmark engineers that understand those applications really well,” says Gardinalli. “They understand the requirements not only around the technology, but around costs, because what they’re trying to do is cap costs and select the best price/performance solution within the parameters of the Formula 1 requirements.”
The same principles apply when it comes to commercial automotive customers, he explains. “You’re not limited in the same way but limited by the number of licenses that you might have for a service.” The imperative then is to extract the maximum performance from those licenses.
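A simple way to see the license argument: with a fixed pool of solver seats, throughput scales with how fast each job completes. The license count and runtimes below are hypothetical, chosen only to illustrate the point.

```python
# Why per-node performance matters when software licenses are the bottleneck:
# with a fixed number of seats, faster hardware means more jobs per license.
# License count and runtimes are hypothetical, for illustration only.

LICENSES = 8   # concurrent solver seats available
HOURS = 24

for label, job_hours in (("older nodes", 6.0), ("newer nodes", 4.0)):
    jobs_per_day = LICENSES * HOURS / job_hours
    print(f"{label}: {jobs_per_day:.0f} simulations/day from {LICENSES} licenses")
```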
That early focus on liquid cooling is not the only way Lenovo believes it has gotten ahead of its rivals when it comes to tackling power issues and datacenter design in general – it also works closely with chip vendor AMD to get insight into upcoming architectures.
AMD shares Lenovo’s view on the need for energy efficiency, which has informed what the company calls its “no-compromise” approach to developing Epyc processor designs. The latest AMD “Genoa” Epyc CPUs, for example, come in several variants tailored for business-critical workloads of any size or complexity. And the AMD Epyc CPUs inside Lenovo servers have been deliberately architected to improve performance per watt and core density, making them suitable for datacenter deployments that prize high productivity and energy efficiency.
That positions Lenovo to predict how those processor roadmaps will shape datacenter requirements over the long term, whether customers are updating their own datacenter real estate or looking to work with a colocation provider.
“You will be looking into the power density of the datacenter and cooling capacity of that datacenter and how to prepare it for the next generation,” he says.
The previous models no longer apply. “We’re at an inflection point,” Gardinalli says. “If you try to look to the past, it’s going to be hard to get right.”
Sponsored by Lenovo.