If anything is certain in the business of HPC, it’s that building and maintaining a profitable enterprise is tough, even for established supercomputer makers like Cray (later acquired by HPE). Penguin Computing has been no exception, but it managed to weather its own storms with tactics like borrowing over $30 million from Wells Fargo’s supply chain financing arm earlier in 2018.
Later that same year, SMART Global Holdings agreed to pay up to $85 million to buy Penguin Computing. That strange wording is part of the agreement: the purchase itself was wrapped up at $60 million, with an additional $25 million on the table if Penguin could show profitability within agreed targets.
The thing many miss about the HPC business is that margins are low and CAPEX is through the roof. That forces even “large” HPC-specific companies, like Cray in its pre-HPE days, to scramble despite a long list of deployments. In its SMART deal, Penguin got access to more capital, which meant it could take down bigger deals, perhaps even ones like the one we are finally seeing play out three years later. For SMART’s part, Penguin is the foundation for its Specialty Compute and Storage business unit, which itself is a foundational piece of the company’s specialized memory business, which it has tuned toward AI/ML in particular.
While it’s not clear if Penguin ever did hit that profitability mark, it has scored some wins via its alignment with Open Compute Project (OCP) designs and its TrueHPC platform, which is focused on quick deployment and ease of management.
Penguin has a total of seven systems on the most recent Top 500 list, with the largest at Sandia National Lab (#98 on the list). Of those seven, five are all-Intel CPU, Omni-Path “Tundra Extreme Scale” systems and the other two are the company’s “Relion” machines. While the Top 500 is by no means a complete cluster list, Penguin is best known for its delivery of small to mid-sized clusters in academia. It also provides an HPC cloud service with some enterprise use cases and has had wins in financial services and oil and gas. More recently, its partnership on the storage side with WekaIO has dominated company-specific headlines.
That same TrueHPC platform is at the heart of a $68 million contract with the Department of Defense in the US, which might breathe new life into Penguin and invigorate SMART’s confidence in its investment. The news also shows a shift away from Intel on the part of the DoD, at least at select sites for the Navy and Air Force. The new systems Penguin is set to deliver feature third-generation AMD Epyc processors coupled with Nvidia A100 GPUs. The contract includes compute and storage as well as managed services.
The balanced HPC systems and software significantly enhance the DoD’s ability to tackle the most demanding and computationally challenging problems in fluid dynamics, chemistry and materials science, electromagnetics and acoustics, climate/weather/ocean modeling and simulation, among other applications. Penguin Computing’s managed services team will bring additional capability to the DoD in emerging technologies, while also enabling DoD teams to focus on their research.
This is exactly the kind of contract Penguin Computing needs to gather momentum. At just over 8.5 petaflops for the Navy and 9 petaflops for the Air Force Research Lab, each would constitute higher performance than its current highest-ranking machine at Sandia.
It’s also worth noting that the storage selection for these systems breaks from the WekaIO partnership. DataDirect Networks (DDN) will provide more than 26PB of storage, including over 4PB of NVMe SSD, to the Navy at the Stennis Space Center in Mississippi, and more than 20PB of storage, including over a petabyte of NVMe, to the Air Force Research Lab at Wright-Patterson AFB.
According to Penguin Computing’s president, Sid Mair, “these complete and highly dense HPC resources will be among the most powerful supercomputers in the DoD HPCMP’s resources, providing a combined total of over 365,000 cores, more than 775TB of memory, and a total of 47PB of high-performance storage including over 5PB of high-performance Flash storage. Combined, these two systems provide a peak performance of over 17.6 petaFLOPs.”
As a standalone, private company, Penguin Computing might have had a difficult time securing the capital outlay to build the equivalent of a Top 25 supercomputer. It might have had an even harder time competing on price against HPE with all the Cray assets behind it.
The fact that the small Fremont, CA-based company, which got its start building clusters in the late ’90s, is still securing big wins in a market as tough as this one is a testament to resilience and resolve. And the economics of 2021, not to mention supply chain disruptions, increasing device costs, and firmer competition from giants like HPE on price and capability, make Penguin’s story all the more compelling. With that said, it might be the military rescuing Penguin Computing’s future, but its first responders, arriving at exactly the right time for Penguin at least, were the SMART folks.
If the DoD decides to run Linpack, Penguin should have shiny notches on its belt when the systems are delivered and benchmarked.
The Navy DSRC at Stennis Space Center in Mississippi will receive a Penguin Computing TrueHPC platform with 176,128 compute cores from 3rd Gen AMD Epyc processors and 144 Nvidia A100 Graphics Processing Units (GPUs). The system is interconnected by an Nvidia HDR 200Gb/s InfiniBand network and supported by more than 26PB of DataDirect Networks storage, including over 4PB of high-speed NVMe-based solid-state storage and 370TB of system memory, and will provide 8.5 petaflops of peak performance.
The Air Force Research Lab’s DSRC at Wright-Patterson Air Force Base in Dayton, Ohio will receive a Penguin Computing TrueHPC platform with 189,440 compute cores from 3rd Gen AMD Epyc processors and 152 Nvidia A100 GPUs. This system is interconnected by an Nvidia HDR 200Gb/s InfiniBand network and supported by more than 20PB of DataDirect Networks storage, including over a petabyte of high-speed NVMe-based solid-state storage and 405TB of system memory, and will provide 9 petaflops of peak performance.
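For readers who want to check the combined figures in Sid Mair’s quote against the per-system numbers, a minimal Python sketch along these lines adds them up. The dictionary layout and the `combined` helper are our own illustration, not anything from Penguin or the DoD, and the storage, NVMe, and memory values are the lower bounds quoted above ("more than", "over"), so the sums are floors rather than exact totals.

```python
# Per-system figures as quoted in the article. Storage, NVMe, and memory
# are stated as lower bounds, so the sums below are floors.
# The field names here are hypothetical, chosen only for this illustration.
systems = {
    "Navy DSRC": {
        "cores": 176_128, "gpus": 144, "storage_pb": 26,
        "nvme_pb": 4, "memory_tb": 370, "peak_pf": 8.5,
    },
    "AFRL DSRC": {
        "cores": 189_440, "gpus": 152, "storage_pb": 20,
        "nvme_pb": 1, "memory_tb": 405, "peak_pf": 9.0,
    },
}


def combined(systems):
    """Sum every numeric field across all systems."""
    fields = next(iter(systems.values())).keys()
    return {f: sum(s[f] for s in systems.values()) for f in fields}


totals = combined(systems)
for field, value in totals.items():
    print(f"{field}: {value}")
```

The sums come to 365,568 cores, 296 GPUs, 775TB of memory, 46PB of storage with 5PB of NVMe, and 17.5 petaflops. The slightly higher 47PB and 17.6 petaflops in the quote are consistent with the per-system numbers being rounded lower bounds.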