Object storage is not a new concept, but this type of storage architecture is beginning to garner more attention from large organisations as they grapple with the difficulties of managing increasingly large volumes of unstructured data gathered from applications, social media, and myriad other sources.
The properties of object-based storage systems mean that they can scale easily to handle hundreds or even thousands of petabytes of capacity if required. Throw in the fact that object storage can impose less overhead (somewhere around 20 percent less, which translates into buying roughly 20 percent less capacity to store the same data), and it is not difficult to see the attraction.
Object storage systems manage data as binary objects rather than as files, and an object can be anything from a document to a video clip to a virtual machine image, stored alongside metadata that can be used to query or analyze that data.
Unlike NAS or SAN systems, object storage uses a single global namespace that allows any object to be stored or retrieved from anywhere in the data environment via a unique identifier. This approach allows object storage systems to scale as the environment grows simply by adding more nodes or more drives.
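The model described above, a flat global namespace keyed by unique identifiers, with metadata stored alongside the data, can be sketched in a few lines of Python. The class and field names here are illustrative, not any vendor's API:

```python
import uuid

class ObjectStore:
    """Minimal in-memory sketch of an object store: a single flat
    namespace where every object is addressed by a unique identifier,
    with no directory hierarchy or volume structure."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        # Assign a globally unique ID on write; the caller retrieves
        # the object by this ID from anywhere in the environment.
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str):
        return self._objects[object_id]

store = ObjectStore()
oid = store.put(b"video frame data", {"type": "video", "codec": "h264"})
data, meta = store.get(oid)
```

Because every node only needs to resolve IDs rather than maintain a shared directory tree, capacity can grow by adding nodes without restructuring a namespace.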
The first commercial object storage systems appeared around 2003, with EMC’s Centera family a good example. These were initially aimed at secure storage of documents in highly regulated environments, because objects in such systems are immutable; once written, an object cannot be modified, and any changes are instead stored as a new object.
Since that time, the launch of Amazon’s public cloud services platform, with its highly successful Simple Storage Service (Amazon S3), has raised the profile of object storage as a way to store very large volumes of data. This in turn has led to new interest in object storage systems for on-premises deployment, with a number of distinct use cases emerging.
These include active archive systems, high availability and disaster recovery, and even as a potential replacement for NAS systems serving traditional enterprise IT applications, but mostly for scale-out applications storing large volumes of unstructured data.
One of the early pioneers of object storage systems was DataDirect Networks (DDN), and the firm is still regarded as one of the leaders in this field with its Web Object Scaler (WOS) platform, according to a recently published report on object storage.
“Five years ago when people first started talking about object storage in a big commercial sense, it was being driven by these very large scale data stores in Web 2.0 infrastructures, and for many of these DDN is the back end infrastructure today,” says Laura Shepard, senior director of product and vertical marketing at DDN.
“That was what was driving object storage at the time; [customers] were looking for the architectural benefits of scale, and simplicity at scale translating into cost benefits for massive architectures, and object storage was really the only appropriate infrastructure for that,” she adds.
Simon Robinson, storage research director at 451 Research, agrees, saying that traditional storage architectures were simply not designed for the types and volumes of data that organisations are now having to deal with.
“If you think about NAS and SAN, they were designed in a terabyte era, when a few hundred terabytes was going to be more data than you would ever need, while now it is common to have multiple petabytes, and some organisations have hundreds of petabytes. You need a technology that is capable of scaling on that basis,” he explains.
With these apparent advantages, the question arises why object storage has not become more widely adopted already, and why it has not displaced NAS and SAN systems. The reality is that enterprise storage is notoriously complex, and different applications have differing requirements for data latency, bandwidth, and capacity.
“If you think about the storage environment in any large organisation, the big challenge is that the environment is too complex and too fragmented, and there are too many systems from different vendors all doing different things. So, coming in and saying that the answer is to buy object storage, well, that’s another silo you are introducing, so you would be making the environment even more fragmented to start off with,” says Robinson.
According to Shepard, some workloads will also not be appropriate for object storage for some time. “Things that have a massive amount of small IOPS, object storage is not good for that and can be slow. But for a large sequential I/O and massive archive; for wide area data sharing and data availability as a replacement for traditional backup, it is absolutely a strong candidate,” she explains.
Object storage systems are also accessed via a REST API, often those of Amazon S3 or OpenStack Swift, rather than the common file or block protocols used with NAS and SAN systems, and many existing enterprise applications do not natively support these. This is becoming less of an issue as object vendors add support for protocols such as NFS and SMB, enabling object storage to look like standard storage to applications.
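The difference in access model can be illustrated by sketching how basic object operations map onto S3-style REST calls. The bucket and key names below are made up for the example, and real clients add authentication headers on top:

```python
# Sketch of how object operations translate into S3-style REST requests.
# Applications written for NFS/SMB file semantics cannot issue calls
# like these natively, which is the compatibility gap described above.

def s3_request(operation: str, bucket: str, key: str = ""):
    """Return the (HTTP method, path) pair for a basic S3-style operation."""
    verbs = {
        "put_object": ("PUT", f"/{bucket}/{key}"),
        "get_object": ("GET", f"/{bucket}/{key}"),
        "delete_object": ("DELETE", f"/{bucket}/{key}"),
        "list_objects": ("GET", f"/{bucket}"),
    }
    return verbs[operation]

method, path = s3_request("put_object", "archive", "reports/q1.pdf")
```

A file-gateway layer of the kind the vendors are adding essentially performs this translation on the application's behalf, presenting NFS or SMB on one side and issuing REST calls like these on the other.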
However, Shepard believes that this is leading to a split developing in the object storage market, with many of the major vendors adapting their offerings to focus on serving standard enterprise IT applications, while DDN is keeping more of its focus on the use cases favoured by its traditional HPC customer base and more data-intensive enterprise applications.
“You’ve got a lot of the people in object storage headed for this very rich new opportunity they see for object storage which is as a pure IT play. DDN on the other hand, is focused more on our Web 2.0 customers, massive active archive, big data for collaboration over multiple sites, and creating new ways to improve data availability by replacing traditional backup and HA,” she says.
This matters, because the different use cases that vendors are focusing on will influence the features they prioritise when roadmaps for future development are being drawn up, according to Shepard.
For its customer base, DDN believes its WOS platform has a number of capabilities that differentiate it from rivals in key areas such as performance, efficiency, reliability and the ability to scale.
On the efficiency side, WOS is one of the few true object storage systems that has no underlying file system, according to the firm, while many rival platforms are actually built on top of the ext3 or ext4 file systems.
“We have an implementation we call NoFS, because there is no file system, just a pure object implementation, which means you don’t lose the space that you need to accommodate your file system and your volume manager, and in a typical scenario the NoFS approach avoids about 10 to 20 percent of disk space lost to block wastage as well,” Shepard claims.
“Basically, what it means is that if you buy a petabyte of WOS versus a petabyte of rival object storage, you’re going to be saving tens of thousands of dollars just in hardware costs alone, so efficiency really does matter,” she added.
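Taken at face value, the claimed savings can be checked with some back-of-envelope arithmetic. The price per terabyte below is an assumption for illustration only; the article does not give one:

```python
# Rough illustration of the vendor's claimed savings. The $/TB figure
# is an assumed value for the example, not a quoted price.
capacity_tb = 1000            # one petabyte of usable capacity
overhead_avoided = 0.15       # midpoint of the quoted 10-20 percent figure
price_per_tb = 100.0          # assumed raw hardware cost in dollars

extra_tb_needed = capacity_tb * overhead_avoided
savings = extra_tb_needed * price_per_tb
print(f"Extra capacity avoided: {extra_tb_needed:.0f} TB -> ${savings:,.0f} saved")
```

Under these assumptions the avoided purchase is on the order of tens of thousands of dollars per petabyte, which is consistent with the claim, though the real figure depends entirely on the hardware pricing.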
This also has a knock-on effect on performance, because with NoFS, WOS does not have to carry out the myriad underlying operations that a file system has to perform when writing and retrieving data.
“Who cares about object storage being performant? Well, at scale, everything is a performance problem. For WOS, the numbers we publish are from customers doing proof-of-concept deployments, so they’re real figures, not our lab test numbers, and they’ve published 1.2GB per second per node,” Shepard says.
Another key capability of WOS is a feature called Local Object Assure. It concerns DDN’s implementation of erasure coding for data protection, which also has knock-on effects on scalability and performance.
Erasure coding splits the object data into multiple shards, which are then encoded with redundant extra information so that the entire object can still be recovered, even if a number of the shards are lost or corrupted.
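The principle can be demonstrated with a toy single-parity code in Python. Production systems use stronger codes (Reed-Solomon variants, for example) that tolerate multiple simultaneous shard losses, but the rebuild-from-survivors idea is the same:

```python
# Toy erasure code: k data shards plus one XOR parity shard, able to
# survive the loss of any single shard. Illustrative only.

def encode(data: bytes, k: int = 4):
    """Split data into k equal shards plus one XOR parity shard."""
    size = -(-len(data) // k)  # ceiling division for shard size
    shards = [data[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(k)]
    parity = bytearray(size)
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    return shards + [bytes(parity)]

def recover(shards, lost_index):
    """Rebuild one missing shard by XOR-ing all the surviving ones."""
    size = len(next(s for s in shards if s is not None))
    rebuilt = bytearray(size)
    for idx, shard in enumerate(shards):
        if idx == lost_index:
            continue
        for i, b in enumerate(shard):
            rebuilt[i] ^= b
    return bytes(rebuilt)

original = b"object payload bytes"
encoded = encode(original)
lost = encoded[2]
encoded[2] = None            # simulate a lost data shard
assert recover(encoded, 2) == lost
```

Note that the recovery step reads every surviving shard, which is why rebuild traffic between nodes, or across a WAN, becomes such a concern at scale.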
According to DDN, this results in most object storage systems requiring a minimum installation of six to nineteen nodes, just to allow shards to be stored across enough nodes to allow for a rebuild. Local Object Assure, however, supports local erasure coding inside an individual node.
There are two implications of this: firstly, customers can start out with just a single appliance containing two nodes and scale up as required, and secondly, the ability to perform a local rebuild eliminates a large volume of network traffic that would otherwise be required to perform the same rebuild operation between nodes.
“Local Object Assure is just one of the included data protection options with WOS. Multiple options can be combined and different protection levels applied to different data – based on customer needs. With Local Object Assure, you can do any rebuilds within a single node. Some customers care about this, because they may have one node in the US, one in Dubai and one in London, and suddenly you are looking at a wide area network rebuild, and what do you imagine that would do to rebuild times? Even rebuilding locally across 10, 40 or 100Gbps Ethernet adds a huge amount of latency to a rebuild, versus rebuilding internally on the system across 6Gbps SAS channels,” Shepard explains.
Meanwhile, DDN also says that WOS has a proven ability to operate at massive scale, with Shepard claiming that one customer has a production deployment storing half a trillion objects. For comparison, Amazon stated in 2013 that its S3 cloud storage service was storing two trillion objects across its entire global infrastructure.
In the future, it looks like object storage is set for broader adoption, as the volumes of data that organisations have to manage continue to expand, and this could lead to a reshaping of the storage hierarchy within data centres.
“When you talk to people who have a lot of storage knowledge and are looking at what their lives are going to look like three to five years from now, the story that we hear is that fast storage, like our parallel file system appliances, diminishes, and what expands instead is a very fast flash layer – like DDN Infinite Memory Engine – for immediate data processing needs, and on the back is something that is inexpensive and can grow very large, and right now the best candidate for that is object storage,” Shepard says.
“I think the question is over the NAS market, and you have to look at that space and ask how much stale data is on a Tier-1 NAS box, how much of that content has not been accessed over the past six months to a year, and in theory you could move that stale data off the NAS and onto object and in that way reduce the size of your NAS investment, or potentially get rid of it altogether,” Robinson says.
Another aspect to consider is the development of open source storage systems, such as OpenStack’s Swift module and Ceph, which offer some of the capabilities of commercial systems at a dramatically lower cost. But while the software for these may be free, organisations still need hardware to run it on.
“While these got a lot of attention early on, they’ve arguably not sustained that over the last year or so. I think a lot of organisations took a look at Swift and Ceph a year or two ago and decided they are not ready for primetime yet,” Robinson says.
However, currently only about one in five large organisations has an on-premises object storage system, according to 451’s own figures. This means that there is still a large untapped potential market, and explains why vendors are keen to promote their object storage systems.