Over the years, Dell EMC has always had a hand in HPC and supercomputing. The Stampede system at the Texas Advanced Computing Center (TACC) was powered by more than 5,000 PowerEdge servers running on Intel Xeons and at one time had a place among the ten fastest supercomputers on the twice-yearly Top500 list. Stampede 2, comprising PowerEdge C6320P and C6420 servers and running 367,024 Intel Xeon Phi cores, currently holds the number 17 spot on the latest Top500 list, released this month at the SC18 show in Dallas.
However, the company’s commitment to HPC has ebbed and flowed over the years, though the on-again, off-again relationship is back on in a big way this year, starting with the promotion of Dell EMC veteran Thierry Pellegrino as vice president and general manager of the company’s HPC business. Dell EMC has repositioned the HPC business within the company, embraced AMD’s Epyc chips for some systems and broadened the accelerators it will use in its PowerEdge-C systems.
That is not to say Dell EMC hasn’t pursued the HPC space in the past. The TACC systems illustrate its efforts over the last decade, and in 2015 the company outlined for The Next Platform plans to attack the space in the wake of IBM selling its System x server business to Lenovo, opening up opportunity for Dell – as well as Hewlett Packard Enterprise and even Cisco Systems – in the supercomputing market. But according to Pellegrino, interest and effort have waned at times.
“You’ve probably seen us go through peaks and valleys of announcements over the years, wanting to go big in HPC and then there’s been a moment when we were trying to figure out how much investment we wanted to put into HPC,” Pellegrino told The Next Platform during the SC18 conference. “But something happened earlier this year where as a company we decided to be much more serious about HPC.”
That included moving the HPC business into the company’s server business unit, he says, adding that “now we’re making a pretty strong statement that we’re serious about HPC, bringing it so close to the top and having it as part of a BU. We’re not stating that HPC is only about servers, but if we’re going to host it somewhere in a BU, we might as well put it into the BU that has the majority of the components that are sold into high performance computers.”
He says that even through the past two decades, Dell EMC has kept a presence in HPC, remaining a relevant player. It has its PowerEdge-C systems aimed at HPC, deep learning, and data analytics workloads and offering such features as NVM-Express drives, high speed memory, automation capabilities, and liquid cooling, all of which help drive performance and density. In addition, the company has created the Dell EMC HPC Innovation Lab in Austin, Texas, a 13,000-square-foot datacenter specifically built for HPC that includes thousands of servers, a Top500 cluster, and storage and network systems. The staff of computer scientists and engineers works with partners, customers, and others in the HPC field to deliver early access to new technologies, create reference architectures, benchmark applications, tune HPC clusters, and develop best practices.
“Those clusters are where all of our engineering organization does the work and derives from there our reference architectures, but also unique offerings for customers that come in with their own data sets,” Pellegrino says. “At any point in time we can have four or five customers running a POC in our lab. That is a differentiator and something that our customers value tremendously.”
Dell EMC also has racked up some impressive wins in the supercomputer space, including the $60 million award from the National Science Foundation to develop the “Frontera” supercomputer that will be housed at TACC. The system, which was announced in August and which we discussed in depth here, will comprise PowerEdge systems featuring Intel’s next-generation Xeon SP processors, Mellanox’s 200 Gb/sec HDR InfiniBand interconnect, and the high-density Direct Contact Liquid Cooling system from CoolIT Systems. Among the early projects it will tackle will be analysis of particle collisions from CERN’s Large Hadron Collider, global climate modeling, hurricane forecasting, and multi-messenger astronomy.
The Frontera award adds to a list of other high-profile systems Dell EMC is building, including an OpenStack system in partnership with StackHPC at the University of Cambridge that will deliver more than 2 petaflops of performance, and the “Great Lakes” system at the University of Michigan, which will be used for everything from simulation workloads to artificial intelligence, machine learning and genomics. The Ohio Supercomputer Center will deploy its “Pitzer” cluster that will use PowerEdge systems with CoolIT to deliver almost twice the performance of the center’s most powerful system but require less than half the space and less power to run.
At SC18, Dell EMC also said it’s expanding the accelerators that will be available in its PowerEdge R640, R740, R740xd, and R7425 systems. The new accelerators include Nvidia’s Tesla T4 GPU accelerator, housed in a small PCI-Express form factor, consuming 70 watts of power, and designed for machine learning inference while also being capable of training. In addition, the OEM is adding to the field-programmable gate arrays (FPGAs) used in the servers. Along with the Arria 10 GX FPGAs from Intel, the systems now can run the Alveo U200 FPGA accelerator card from Xilinx, which at the supercomputing conference announced the addition of the Alveo U280 to its lineup of accelerator cards.
In talking with The Next Platform, Pellegrino noted the company’s efforts to provide a wide range of technology options to the HPC space. Right now, the company offers systems that run both Intel Xeons and AMD Epyc chips, GPUs from Nvidia and AMD, and FPGAs from both Intel and Xilinx. At the same time, the company also is evaluating other technologies, including Arm-based server chips like Cavium’s ThunderX2 as well as processors from other vendors and startups, such as Ampere. Whether to make systems powered by such technologies will depend in part on customer demand, he says. For the first time, an Arm-powered system, nicknamed “Astra,” made it onto the Top500 list. The supercomputer, housed at Sandia National Labs, was built by HPE and is powered by 135,328 ThunderX2 cores, and at 1.5 petaflops, sits at number 205.
“If I paint a bigger picture outside of what we have available, at any point in time, within our CTO office or the HPC organization, we always evaluate other technologies,” Pellegrino said. “We have had several discussions with Arm silicon vendors that have come up at times, but today we don’t have a platform with a SKU that customers can buy. It doesn’t preclude us from having that, and when customers ask why we wouldn’t have a portfolio of Arm-based solutions, my answer is typically, ‘Would you feel like you could integrate that into your IT environment or your HPC environment today?’ The disconnect around Arm is really around the ecosystem. You can’t just take an Arm server and run Exchange on it. It’s not where it would excel, and, second, it’s not supported, it’s not validated. It just doesn’t work.”
The ecosystem argument tends to trip up the idea of wide adoption of Arm systems in datacenters, though he says there are aspects – from such perspectives as size, efficiency, density and integration with some Internet of Things (IoT) workloads – that make the chips attractive.
“There are points that would potentially bring us to Arm, but it’s really customer-driven,” he says. “I will say that in the space of HPC or hyperscale, there’s probably a little lighter lift to going to a platform that can be used for commercial applications. It’s primarily because the scope of the application is a lot narrower. If you’re limited to one application stack, it’s a lot easier to validate and get that support than if you want very broad enterprise support. Intel and AMD are on the map already, Arm is always in the discussion.”
FPGAs and other kinds of accelerators, such as custom ASICs, as well as the growing list of smaller chip makers hoping to gain a foothold in the server space, will have to be evaluated as well, though which ones will pan out is a tough guessing game, he says.
“As much as Nvidia wants to be focused on graphics processors and Intel is a lot more generalist in nature, those FPGAs and those accelerators can be targeted for certain types of applications,” Pellegrino says. “You could have, for example, an FPGA targeted at image recognition, which could get you a performance advantage, and if you leverage that in a commercial environment where time to recognize something on your line is very critical, you could see value there. But trying to tell which of those players is going to be extremely relevant in five to ten years is like reading tea leaves. Mainstream will still continue to be primarily around processors. Accelerators have become the co-pilot of mainstream, and then you have the wild, wild west, all those other options that are probably pretty valuable, but we’re not sure which ones will prevail and which will map to what verticals and workloads.”