Data is by its nature a messy beast, and it has only become more so as workloads have found their way out of the datacenter and into the cloud and even all the way out to the edge.
This is the problem that San Mateo, California-based Hammerspace, founded by David Flynn of Fusion-io fame, has spent the past five years getting to grips with. And this week, the company, which is developing what it calls a “data orchestration system,” has lined up $56.7 million in Series A venture funding from Prosperity7 Ventures, Pier 88, and ARK Invest to help extend and commercialize this idea.
While Hammerspace deals with data and storage, it’s not exactly a storage company. Despite Flynn’s history, he doesn’t want to sell you some exotic all-flash storage system. Instead, Hammerspace’s software aims to stitch together customers’ existing data stores and present them as a single, logical file system.
According to Hammerspace, the software platform runs on commodity hardware or in the cloud. From there, it connects to and then indexes the contents of each storage system, whether it’s local NVM-Express flash block storage, an object storage bucket, or a NAS running Ceph, ZFS, or some other file system. This metadata is then replicated across each installation, providing a reference to where specific data resides at any given moment. Flynn likens the platform to DNS for data.
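To make the DNS analogy concrete, here is a minimal sketch of a metadata index that maps a logical file path to wherever the bytes currently live. It is purely illustrative and not Hammerspace's actual design or API: the `Location` and `NamespaceIndex` names, the store labels, and the paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    store: str  # hypothetical store label, e.g. "nvme-local" or "s3-archive"
    key: str    # where the data sits inside that store


class NamespaceIndex:
    """Toy metadata index: logical path -> current physical location."""

    def __init__(self):
        self._entries = {}

    def register(self, path, location):
        # Record (or update) where a logical path currently resides.
        self._entries[path] = location

    def resolve(self, path):
        # Like a DNS lookup: callers ask by name, not by storage system.
        return self._entries[path]


index = NamespaceIndex()
index.register("/projects/shot42.exr", Location("nvme-local", "blk/0091"))
index.register("/projects/shot07.exr", Location("s3-archive", "render/shot07.exr"))

print(index.resolve("/projects/shot42.exr").store)
```

Clients in this sketch only ever see the logical path; which storage system answers is a detail held in the index.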
However, this isn’t a static arrangement. Because this index is independent of the underlying storage system, the data it references can be dynamically shifted between different storage pools based on service-level objectives defined either on an individual basis or inherited from access control lists.
“There are statements that say when the file is recently used, make sure it’s on something that’s high performance, and when the file hasn’t been used in months, then demote it and put it in object storage,” Flynn told The Next Platform. “In a data-orchestrated world, data is perpetually in motion, and it’s available anywhere you need it, everywhere you need it. There is no data migration because data just moves under the covers.”
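The kind of objective Flynn describes can be sketched as a simple placement rule. This is an assumption-laden illustration rather than Hammerspace's policy language: the `choose_tier` function, the tier names, and the seven-day and 90-day thresholds are all invented for the example.

```python
from datetime import datetime, timedelta

# Assumed thresholds; the article only says "recently used" and "months".
HOT_WINDOW = timedelta(days=7)
COLD_WINDOW = timedelta(days=90)


def choose_tier(last_access, now, current_tier):
    """Return the tier a file should live on, given when it was last used."""
    idle = now - last_access
    if idle <= HOT_WINDOW:
        return "high-performance"  # recently used: keep it on fast storage
    if idle >= COLD_WINDOW:
        return "object-storage"    # untouched for months: demote it
    return current_tier            # in between: leave it where it is


now = datetime(2023, 7, 1)
print(choose_tier(datetime(2023, 6, 29), now, "object-storage"))
print(choose_tier(datetime(2023, 1, 15), now, "high-performance"))
```

In a real system a rule like this would run continuously against the metadata index, so that data moves between pools without the logical path changing.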
Implications For Distributed Workloads And HPC
The Hammerspace storage system has implications for a number of applications we deal with regularly at The Next Platform, like HPC and other distributed computing workloads.
“Hammerspace is unique in that it is a true parallel file system, but it’s standards based and client is built in,” Flynn explained. “It doesn’t require exotic proprietary clients; it comes in-box in Linux.”
By built-in, Flynn is referring to parallel NFS, which was introduced with version 4.1 of the NFS protocol and which is baked into many modern Linux distros. Flynn claims this simplicity allows Hammerspace to achieve higher performance compared to other parallel and HPC-focused file systems like WekaFS, Lustre, or GPFS. “Because this is built into Linux and is so highly tuned, we are showing an advantage of about 2:1 on the same hardware,” he says.
Flynn adds that the company recently won a major hyperscaler contract to support its work on large-language models, though he didn’t name which.
“We have, in just the last few months, become key to one of the largest players in the AI arms race – one of the hyperscale companies that is often doing press and making releases about their large-language models,” he says. “Hammerspace is feeding tens of thousands of GPUs from within a single file system and doing checkpoints off of that at rates of performance that are only seen in the most hero-scale supercomputing. We’re talking 60 Tb/sec, over hundreds and hundreds of storage nodes.”
While Hammerspace has uses in HPC and AI/ML, the nature of the platform tends to attract customers dealing with data distributed over multiple regions, Flynn explained.
“The benefit is that you can bring together far-flung datacenters around the world, which means we kind of have to go immediately towards an international presence,” he says. “The benefits grow exponentially the more complex, diverse, and distributed your environment is.”
Because of this, many of Hammerspace’s early customers have been media companies. For example, the company claims its file system was used in the production of the Netflix series Stranger Things and Disney’s The Mandalorian.
Media production often involves the copying and distribution of proxy footage used in editing, so it’s not hard to see where Hammerspace’s platform might be appealing.
Hammerspace has also been deployed by Jeff Bezos’ space venture Blue Origin, which is using the platform to manage data across multi-datacenter and hybrid cloud environments. The intent, Flynn explains, is to orchestrate data even across the vast distances and latencies of space.
Steps Toward An IPO
The funding round marks a change in Hammerspace’s strategy moving forward, as the company looks ahead to a potential IPO. Up to this point, the company has focused much of its attention on product development.
“There’ll be minimal investments on the engineering side, on the R&D side, but not so heavy,” Flynn explained. “Most of that has been paid upfront to get the stuff built in the first place.”
Instead, Hammerspace is shifting toward growing its customer base, which means larger investments in sales, marketing, and customer support to build confidence and familiarity with the product stack.
To this end, Flynn has his sights set on bringing the company to an IPO, though he admits this won’t happen overnight and certainly not in the next couple of months. “An IPO, or being public, is important because of our client base. They are very large companies and being public gives you credibility,” he says.