Object Storage Makes A Push Into HPC

Four years ago, Cloudian was a six-year-old startup in an object storage space that, while the technology had been around for more than a decade, was seeing a surge of interest from cloud providers desperate for a storage architecture that could scale to meet the demands of their rapidly growing datacenters, handle the massive amounts of data being generated, and make it easier to move that data between core on-premises datacenters and multiple cloud environments – and, in the coming years, the edge.

Object storage had gotten a boost from the S3 protocol introduced by Amazon Web Services – the cloud giant’s Simple Storage Service (S3) was among the first services AWS rolled out when the cloud business launched in March 2006. The protocol made it easier to move data back and forth – a key capability given the growing enterprise adoption in recent years of both multicloud and hybrid cloud environments – and has been embraced by most object storage vendors, essentially becoming a standard interface for object storage.
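Because S3 has effectively become that standard interface, the same client code can usually be pointed at AWS itself or at any S3-compatible store simply by changing the endpoint. A minimal sketch using Python’s boto3 library – the endpoint URL, bucket, key and credentials here are hypothetical placeholders rather than details from any particular vendor:

```python
import boto3

# The same code works against AWS S3 or an on-premises S3-compatible store;
# only the endpoint and credentials change. All values below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.com",
    aws_access_key_id="EXAMPLE_ACCESS_KEY",
    aws_secret_access_key="EXAMPLE_SECRET_KEY",
)

# Write an object, then read it back through the same standard API.
s3.put_object(Bucket="demo-bucket", Key="reports/2021/q1.csv", Body=b"col_a,col_b\n1,2\n")
obj = s3.get_object(Bucket="demo-bucket", Key="reports/2021/q1.csv")
print(obj["Body"].read().decode())
```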

The storage architecture offers high degrees of scalability and manageability – reducing management costs by as much as 20 percent – and Cloudian has been able to ride the growing popularity of object storage over the years. In 2017, the company had about 140 customers. In March, Cloudian said it had just ended its sixth consecutive fiscal year of record bookings and that it had grown its global customer base by 36 percent year-over-year, to more than 550. Total storage capacity jumped 63 percent.

Core to Cloudian’s portfolio is HyperStore, its S3-compatible scale-out object storage platform shown in the image above, with appliances designed for environments where high capacity and scale are key. The technology can be deployed in Cloudian HyperStore appliances – such as the 4U HyperStore 4100, shown here to the right – or in a software-defined storage (SDS) mode, where the software runs on third-party platforms. The company also offers HyperFile NAS storage and last fall added SSD support to bring flash into the HyperStore lineup.

Cloudian finds itself in competition with the likes of NetApp and Dell EMC as enterprises continue to see the amount of unstructured data growing in an increasingly distributed IT environment that runs from the datacenter to the cloud and edge, with that demand helping to lift all boats. Gartner analysts have said their customers report unstructured data growing 30 percent to 60 percent a year, and that by 2024 enterprises will triple the amount of unstructured data stored as file or object storage in the public cloud or at the edge.

Along with growing demand and the distributed nature of computing, Cloudian CTO Gary Ogasawara points to the growing number of partnerships with other tech vendors as key to the company’s growth over the past several years. Cloudian last year announced a partnership with backup software provider Veeam, with some organizations choosing to store their backups on premises in Cloudian storage rather than in the cloud with AWS S3, both for greater security protection against threats like ransomware and to save costs.

Cloudian chief technology officer Gary Ogasawara

Cloudian also has a growing partnership with VMware, offering ransomware protection for VMware’s Cloud Provider program – with growing adoption among such providers as Expedient and Green Cloud – and integration of Cloudian’s object storage software on VMware Cloud Foundation with Tanzu for a single, shared storage environment. Cloudian’s software also integrates with Splunk SmartStore.

The CTO says there has been an uptick in adoption of HyperStore both for ransomware protection – through its Veeam and CommVault partnerships – and in secure environments, as well as with its HyperStore Flash offering, which provides an S3-compatible flash solution at a third of the cost of competing flash offerings. A key reason for adding flash support – with systems like the 1U HyperStore Flash 1000, shown in the feature image at the top of this story – was to address the demands of modern workloads like machine learning and data analytics.

The security capabilities are particularly relevant for HPC deployments, another area where object storage – including HyperStore – is gaining traction, he says.

“Object storage in general is getting much higher performance,” Ogasawara says. “That’s been a big trigger for us and other vendors as well. For us, we’re software-based. For example, when I swap out a spinning drive with the flash drive, you automatically get better performance. But a lot of HPC use cases and opportunities we see now are taking advantage of tiering and multilevel storage infrastructure, so they might have one object storage cluster that’s replacing what they were doing with the filesystem before and have high-performance flash behind it. But it could also be networked and connected to a larger, cheaper and deeper layer that’s running your spinning drives. The advantage for the use cases is that they have the object storage’s single namespace, so they could just access any data they want using the S3 API and they don’t necessarily need to know whether it’s stored on flash or stored on spinning drives.”
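That single-namespace idea is straightforward to picture in code: the application addresses objects by bucket and key, and the same calls work whether the bytes currently sit on flash or on spinning disk. A brief sketch, again with boto3 and hypothetical names, that lists objects (with whatever storage class the system reports) and reads one back:

```python
import boto3

# Hypothetical endpoint and bucket; the point is that the application never
# has to specify which tier or media type it is reading from.
s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

resp = s3.list_objects_v2(Bucket="hpc-data", Prefix="simulations/")
for obj in resp.get("Contents", []):
    # StorageClass reflects where the object currently lives; the GET below
    # is the same call regardless.
    print(obj["Key"], obj.get("StorageClass", "STANDARD"))

data = s3.get_object(Bucket="hpc-data", Key="simulations/run-001/mesh.h5")["Body"].read()
```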

Cloudian has seen some recent wins in the HPC space. The company in January announced that NEC and Osaka University will use HyperStore object storage in a cloud-linked data analysis supercomputer at the university. The CTO says DDN is providing a parallel filesystem with Cloudian delivering the object storage, with both environments enabling a broad array of use cases. Key to the supercomputer is the ability to share data across different sites and research groups, he says.

The growth of object storage in HPC workloads is a natural evolution of the architecture as it adapts to the increasingly distributed nature of IT. Object storage started out in what Ogasawara describes as a “coldish warm tier,” but has moved – not only with HPC, but with other workloads as well – toward the hot tier, a direction in which Cloudian and other object storage vendors are heading. With its partnerships and the adoption of such capabilities as flash support, HyperStore can embrace broader use cases across the temperature range.

At the same time, there is still a need for tiering capabilities that provide separate storage clusters defined by different performance characteristics, he says. With that, organizations can do automatic data replication and transfer across those tiers, something the S3 API itself enables. In HyperStore, Cloudian implements S3 Cross Region Replication (CRR) to help with data transfer. S3 lifecycle rules also let organizations define, at per-bucket granularity, policies that move particular data to another S3 system after a set number of days.

“It’s the same logic that’s used for a hybrid on-premises and public cloud systems,” Ogasawara says. “We might have on-premises HyperStore and it says that after 30 days, move it to Amazon S3. That same type of lifecycle rules is now being used inside standard tiering, where the customer owns both systems and they’re just tiering across for economic reasons.”
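The “after 30 days” rule Ogasawara describes maps directly onto a standard per-bucket S3 lifecycle configuration. A rough sketch of what such a policy looks like through the S3 API – the bucket name is hypothetical, and the storage class an on-premises system uses for its cheaper tier is vendor-specific, so the standard AWS value "GLACIER" stands in here as a placeholder:

```python
import boto3

s3 = boto3.client("s3")  # or an S3-compatible endpoint, as in the earlier sketch

# Per-bucket lifecycle rule: after 30 days, transition objects to a cheaper tier.
s3.put_bucket_lifecycle_configuration(
    Bucket="backup-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-after-30-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```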

The migration to warmer tiers for object storage is due in part to the increasing adoption of HPC workloads by enterprises, he says. The rise of AI, machine learning and other emerging workloads is forcing enterprises to adopt HPC-like environments. Ogasawara points to climate model analysis as an application some enterprises are attempting even though they don’t have access to large supercomputing clusters.

“It might take more time because they don’t have the high-powered CPUs or specialized CPUs, but that’s the type of workloads that are more and more growing in the enterprise,” he says. “They’re looking for a standardized solution where they could access a whole lot of data and that’s where object storage is strongest, where you have a single namespace and you can access a lot of the data, and in particular, where they’re seeing the limitations of filesystems. They are not able to scale up those systems as they add more data, so instead of buying another filer and having to do extra programming to access both sets of old filer and new filer, they start to look at an object storage solution that’ll work and be able to scale over time without having to rewrite their programs to access the newer data.”

Data analytics is another important area. Some analytics jobs can take hours to run, and organizations still want to access data across disparate departments, archive the results and share them on the same system. Object storage, with multi-tenancy built in, fits well in environments where limits need to be set for different groups.

With the performance and latency of object storage improving, another driver of the architecture’s growing adoption in HPC is the understanding that data is key to AI and machine learning workloads. Pointing to the work of computer scientist Andrew Ng – an expert in AI and machine learning who has worked at such companies as Google and Baidu – Ogasawara says that the training data is at least as important to AI workloads as the algorithms and compute capabilities.

“If you think about having compute and storage, what Andrew is saying is, if you have your extra dollar to spend, spend it on adding more storage because having that extra data at a marginal cost is a much, much better bang for your buck than adding your next higher-level GPU or writing one more layer in your deep learning model,” he says. “That type of analysis … has increased the value and importance of mass storage and being able to access that mass storage easily, not in a siloed way, where parallel filesystems might be limited, but being able to expand that amount of addressable storage. That’s what a lot of deep learning models and deep learning-type systems are focusing on now, just adding much more ability for storage. In budgeting – how you split up your costs between CPUs, GPUs, networking and storage – storage … is sort of like the stepchild or third part of that in terms of spending. But now the evidence is showing in deep learning, certainly that storage is really the more important part. We’re going to see that type of evidence more and more in other types of HPC workloads, like simulations.”

With HPC gaining wider use in highly distributed environments as it spills more and more into the enterprise, the opportunity will only grow for object storage vendors like Cloudian that are expanding their portfolios to take on a broader range of use cases.
