Companies Need On-Premises HPC – And For More Than AI, Too

Generative AI, with the capacity and latency demands it places on compute and storage, is muscling out almost every other topic when conversations turn to HPC and the enterprise. Intel, AMD, Nvidia, Arm, and other chip makers are boasting about what their latest products can do for AI workloads, and cloud providers like Microsoft, Google, and Amazon Web Services are racing to roll out AI services faster than their competitors.

That said, organizations and institutions running HPC environments are still doing the work – the non-AI work – they have always done, according to Gerald Kleyn, vice president of customer solutions for HPC and AI at Hewlett Packard Enterprise. Much of what such organizations need for compute, storage, and networking can be found in HPE’s expansive GreenLake hybrid cloud services platform, and much of the AI work being done by enterprises, institutions, and others is happening in the cloud, but a lot of HPC is still being done on premises.

That, in part, is what’s driving HPE’s expansion of its Cray server and storage systems, and it was the impetus for the IT giant’s recent launch of the Cray Storage Systems C500, a storage box that brings many of the features of the larger – and more expensive – Cray ClusterStor E1000 Storage System to a smaller and more affordable configuration.

It can support both HPC and AI compute clusters, but it is “targeted at classic HPC environments and departments running modelling and simulation like computational fluid dynamics for computer-aided engineering on HPE-built compute clusters,” Kleyn tells The Next Platform.

Enterprises with AI workloads can look to the cloud for file storage with the GreenLake for File Storage service, which is designed to cover the range of enterprise AI needs: efficient, high-performance capacity for data aggregation and preparation; model training, tuning, and inferencing; and generative AI and large language models. It also offers enough flexibility to adapt to the needs of organizations working with rapidly evolving generative AI technologies.

And HPE continues to build out its GreenLake storage capabilities, as seen this week with the introduction of GreenLake Block Storage for AWS, a software-defined storage service for managing block storage in hybrid clouds, and of NVM-Express support, scaling up to 5.6 PB, in GreenLake for Block Storage built on HPE’s Alletra Storage MP for midrange mission-critical storage.

In their report last month on HPC in 2023, Hyperion Research analysts described a changing HPC space that still has a prominent on-premises business but is also seeing fast growth in the cloud and in AI. The global HPC market – which HPE sits atop – was $37.2 billion last year, with on-prem servers still accounting for 40 percent of that and storage taking 17 percent. Cloud spending accounted for 20 percent.

They expect the on-premises HPC server market to grow from $15 billion in 2023 (a 2.7 percent year-over-year drop) to $16.3 billion this year. Overall, on-prem HPC spending will hit $32.2 billion, according to Hyperion, though economic pressure on buyers, tightening supply chains (for GPUs, for example), and delays in some exascale systems will pose challenges.
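As a rough cross-check, Hyperion’s percentage shares can be turned back into dollar amounts. The sketch below uses only the rounded figures quoted above, so its outputs are approximations rather than Hyperion’s own numbers: 40 percent of $37.2 billion comes to about $14.9 billion, consistent with the roughly $15 billion server figure, and the forecast rise to $16.3 billion works out to growth of roughly 8.7 percent. The function name `implied_spend` is purely illustrative.

```python
# Approximate dollar figures implied by the rounded 2023 shares of the
# $37.2 billion global HPC market quoted above (all values in $ billions).
TOTAL_2023 = 37.2

shares = {
    "on-prem servers": 0.40,
    "storage": 0.17,
    "cloud": 0.20,
}


def implied_spend(total: float, share: float) -> float:
    """Return the dollar amount (in $B) implied by a fractional share, rounded to cents of a billion."""
    return round(total * share, 2)


for segment, share in shares.items():
    print(f"{segment}: ~${implied_spend(TOTAL_2023, share)}B")

# Forecast growth of the on-prem HPC server market from 2023 to this year.
servers_2023, servers_2024 = 15.0, 16.3
growth_pct = round((servers_2024 / servers_2023 - 1) * 100, 1)
print(f"on-prem server growth: ~{growth_pct}%")
```

Because the inputs are themselves rounded, the small gap between $14.88 billion and $15 billion is expected rather than a discrepancy.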

In addition, the lower end of the on-prem datacenter market is struggling.

But some areas are driving growth, particularly predictive and generative AI and large language models (LLMs). In addition, cloud computing is becoming an option for a growing number of HPC workloads, the analysts found. Kleyn notes that HPE also sees new supercomputers being designed with AI model training as a priority workload alongside traditional HPC codes, a trend he expects to continue.

That said, his company also will continue to help organizations build out their on-prem HPC environments.

“HPE intends to not only provide servers to on-premises HPC users in public sector organizations and HPC departments within enterprises, but a complete solution that is inclusive of storage, interconnect, and middleware in addition to our HPC servers [and] delivered as a complete solution with lifecycle services from the HPE Services team,” Kleyn says.

That’s where the Cray Storage Systems C500 comes in. Five years ago, the Cray ClusterStor E1000 Storage System was launched, aimed at exascale, pre-exascale, and national AI supercomputers for sites leveraging Cray EX supercomputers, according to HPE. The powerful storage system included an embedded Lustre file system.

However, there are also enterprises running modeling and simulation workloads on smaller HPC clusters based on Cray XD2000 servers that need storage capabilities matching those environments in both size and price. The C500 includes the same software, 2U24 storage controllers, and 5U84 HDD enclosures as the E1000, but for the system management unit, it uses HPE’s ProLiant DL325 Gen11 server rather than a 2U24 controller.

Another difference is that the E1000 uses one 2U24 controller with 24 NVMe SSDs as the Metadata Unit (MDU) and another as the Scalable Storage Unit Flash (SSU-F). The C500 combines both into a single converged 2U24 MDU and SSU-F with 24 NVMe SSDs. In addition, the E1000 supports only fully populated storage enclosures, while the C500 can handle half-populated enclosures in certain configurations.

And while the E1000 is an exascale-class system, the C500 scales up to 2.6 PB in all-flash configurations and 4 PB of hybrid SSD and HDD capacity, a ceiling Kleyn calls “a testing limitation, not an architectural one.”

While the E1000 works for organizations using HPE’s large HPC clusters and supercomputers, “we received feedback from many users of entry-level and mid-range HPC clusters built on HPE servers that they would like to see an ‘entry version’ of this offering,” he says.

The C500 competes with Dell EMC Ready Solution for HPC PixStor Storage, IBM’s Storage Scale System 3500, and Lenovo’s Distributed Storage Solution for IBM Spectrum Scale, he says. All three embed IBM Spectrum Scale as the parallel file system, which is licensed per terabyte of capacity or per storage drive deployed, driving up costs.

The C500 instead embeds the open source Lustre file system, which Kleyn says is more cost-effective because organizations don’t have to pay such licensing fees.
