Ceph, the open source object storage born from a doctoral dissertation in 2005, has been aimed principally at highly scalable workloads found in HPC environments and, later, at hyperscalers who did not want to create their own storage anymore.
For years now, Ceph has given organizations object, block, and file-based storage in distributed and unified cluster systems well into the tens of petabytes and into the exabyte levels, storage that takes high levels of expertise to deploy, run, and manage. Building and managing these massive object storage clusters takes the kind of skills that HPC, hyperscaler, cloud builder, and other service providers tend to have. But large enterprises and many Tier 2 and Tier 3 service providers do not have such skills. And the workloads they need to run – either themselves or on behalf of clients – are driving demand for object storage among more mainstream enterprises, which want to leverage artificial intelligence, analytics, containers, and similar advanced technologies but do not have the expertise to manage complex Ceph environments.
Red Hat is looking to fix that. The company, a unit within IBM, has recently rolled out Red Hat Ceph Storage 4, with the goal of bringing petabyte-scale object storage to cloud-native development and data analytics workloads that are becoming more commonplace among enterprises and can take advantage of cloud-level economics. It also will help Red Hat broaden the markets for Ceph.
“Ceph has been used pretty much in the realm of the rocket scientist and PhD,” Pete Brey, marketing manager of hybrid cloud object storage at Red Hat, tells The Next Platform. “This will bring that into the realm of more junior administrators and more like everyday use, opening up the market addressability. In the past it’s been notorious that you had to be very, very careful how you set it up. Even if you’re experienced, you had to be very careful and you had to choose the right hardware in order to get the right performance and resiliency. There’s several different things that we’re doing in this launch that enable us to make both the installation experience much simpler but also the ongoing operational management experience.”
The rise of hybrid clouds has helped drive the development of object stores, not only from Red Hat but other vendors like Cloudian, Nutanix, and Dell EMC as well as open-source stores from the likes of Minio and SwiftStack. Brey says some estimates indicate that 70 percent of object store workloads can go to the public cloud. With Ceph 4, users will be able to deploy petabyte-scale object storage compatible with Amazon Web Services S3, the touchstone for object storage in the world.
The message around Ceph 4, which is based on last year’s Nautilus release of the Ceph open-source project, is that automation and other features within Ceph Storage 4 will make it easier to run without hindering performance or scalability, and that the data within the object store is secure, Brey says. In addition, Red Hat is looking to position Ceph Storage to work with other products, such as its OpenShift containerization software.
“Our strategy is going to be increasingly use Ceph, not just the technology, but Ceph as a product for OpenShift environments,” he says. “Our view of the world is that while today there’s a lot of excitement in the application development world for container technology, the reality is the data science side of the world is seeing the possibility of containers also. We’re seeing a lot of the open-source tools being ported to run on top of Kubernetes and so we want to be able to support that. Given the ability to support massively scalable environments, we think it’s the ideal platform. Without projecting too much, you’ll see us with that kind of positioning.”
Automation A Key To Ceph 4
Modern workloads, rapid data proliferation, and distributed environments are taxing enterprise storage capabilities. In short order, the amount of data created outside the traditional datacenter will swamp that being generated at the core, driving the need for scalable solutions such as Ceph. Making Ceph easier to use will be crucial to driving enterprise adoption, and automation is what enables that simplification. That includes automating the installation process and some operational management tasks. For example, the company put a GUI onto the installer, with the software looking at the hardware and ensuring there is enough memory, that the network interface cards can handle the load, and that the disk subsystem can deliver the needed performance. Red Hat also put in a dashboard for automated monitoring and for problem detection and resolution. It also can detect and mitigate noisy neighbors, those virtual machines that consume a lot of I/O.
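To make the idea of an installer preflight check concrete, here is a minimal Python sketch of the kind of test described above: given a few facts about a host, it reports whether memory, network, and disk clear a minimum bar. This is not Red Hat's installer code. The `HostFacts` type, the `preflight` function, and all three thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical minimums for illustration only; not Red Hat's actual requirements.
MIN_MEMORY_GB = 16
MIN_NIC_GBPS = 10
MIN_DISK_MBPS = 100


@dataclass
class HostFacts:
    """Facts an installer might gather about a candidate storage node."""
    name: str
    memory_gb: float
    nic_gbps: float
    disk_mbps: float


def preflight(host: HostFacts) -> List[str]:
    """Return a list of problems found; an empty list means the host passes."""
    problems = []
    if host.memory_gb < MIN_MEMORY_GB:
        problems.append(f"{host.name}: memory {host.memory_gb} GB < {MIN_MEMORY_GB} GB")
    if host.nic_gbps < MIN_NIC_GBPS:
        problems.append(f"{host.name}: NIC {host.nic_gbps} Gb/s < {MIN_NIC_GBPS} Gb/s")
    if host.disk_mbps < MIN_DISK_MBPS:
        problems.append(f"{host.name}: disk {host.disk_mbps} MB/s < {MIN_DISK_MBPS} MB/s")
    return problems
```

Calling `preflight(HostFacts("node2", 8, 1, 500))` would flag the memory and the network card but not the disk. A real installer would gather these facts from the machine itself rather than take them as arguments.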
“In the past, Ceph has always been very much CLI-driven configuration management interface because Ceph was developed for these massively scalable, hyperscaler-type of installations, where everything is scripted and it really doesn’t make sense in those enterprise environments,” Brey says. “But again, we’re trying to open up the market for Ceph and that’s why we’re making these changes and adding these features. For the mainstream user, we’re trying to make it simpler and we’ll give you guidance on what the exact configuration should look like. But under the covers, if you want to get in and pop the hood and you want to tweak knobs and you want to get the absolute best performance for your particular workload, you can still do that. We haven’t taken anything away from a crowd of people who know Ceph and like Ceph and know how to tune it.”
Automation also is found in the integrated bucket notifications that support Kubernetes-native serverless architectures. The goal is to create automated data pipelines. Brey uses the example of a person getting an X-ray during a doctor’s visit. The digital images are dropped into a container, which automatically triggers downstream serverless processes using Red Hat’s AMQ Streams, a productized version of the open-source Kafka data streaming platform. The image is analyzed and if the patient is at risk, another downstream serverless process is triggered to label the image and patient’s record and move them to a clinical bucket for another doctor to see and analyze.
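The X-ray pipeline can be sketched in a few lines of Python. The sketch assumes the notification arrives as an S3-style event, a `Records` list whose entries carry an `eventName` plus the bucket name and object key. It also assumes the analysis step is supplied as a scoring function. The `CLINICAL_BUCKET` name, the `handle_notification` function, and the 0.5 risk threshold are hypothetical, and the Kafka and Knative plumbing that would deliver the event is left out.

```python
from typing import Callable, Dict, List, Tuple

CLINICAL_BUCKET = "clinical-review"  # hypothetical destination bucket


def handle_notification(
    event: Dict,
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, str, str]]:
    """Decide which newly created objects should move to the clinical bucket.

    Returns (source_bucket, object_key, target_bucket) tuples. The actual
    copy and labeling steps are left to the caller.
    """
    moves = []
    for record in event.get("Records", []):
        # Only react to object-creation events; ignore deletes and others.
        if "ObjectCreated" not in record.get("eventName", ""):
            continue
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # score_fn stands in for the image-analysis step in the example.
        if score_fn(key) >= threshold:
            moves.append((bucket, key, CLINICAL_BUCKET))
    return moves
```

In this sketch the handler only decides what should happen. Keeping the decision separate from the storage calls makes the logic easy to test without a running cluster.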
The technology also can be used in other sectors. Red Hat is investigating use cases in financial services, and retail and government also show promise.
The bucket notifications feature also is an example of Red Hat pulling together products to create a full infrastructure that organizations can leverage. It sits atop OpenShift and uses Ceph for the bucket notifications, along with AMQ Streams and the open source Knative serverless computing technology, Brey says.
The Nautilus Effect
Ceph 4 is based on Nautilus, but Red Hat also brought in features from other open-source projects, like OpenATTIC, a management and monitoring system for Ceph that Red Hat worked with others to develop. More important was the inclusion of BlueStore, which Red Hat initially implemented in Ceph 3.3.
“In the original release of Ceph and the early versions of Ceph, that piece of the architecture was called FileStore,” Brey says. “It had some particular challenges in how it did write backs around caching and logging. Those were two of the biggest problems that BlueStore solved. What it boiled down was there is a double-write penalty with FileStore and BlueStore eliminated that write penalty. … The truth of this is we introduced BlueStore in our 3.3 launch, but it’s not until now because we had to execute all these benchmarks. It’s not until now that we’re able to go public with these benchmarks in this particular launch.”
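The double-write penalty Brey describes comes down to simple arithmetic, sketched here in Python. This is a deliberately simplified model, not a description of how either backend is implemented: it assumes a journaling backend writes every client byte twice (once to the journal, once to its final location) and a direct-write backend writes it once. It ignores metadata, replication, and any special handling of small writes. The function names are hypothetical.

```python
def device_bytes_written(client_bytes: int, journal_first: bool) -> int:
    """Bytes that reach the storage device for one client write (simplified model)."""
    # Journal-first: the data lands in the journal, then again in its final location.
    return client_bytes * 2 if journal_first else client_bytes


def effective_throughput(device_mbps: float, journal_first: bool) -> float:
    """Client-visible write throughput when the device is the bottleneck."""
    amplification = 2 if journal_first else 1
    return device_mbps / amplification
```

Under these assumptions, a device that sustains 500 MB/s delivers 250 MB/s to clients with the journal-first path and the full 500 MB/s without it.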
Red Hat sees a healthy market developing over the next few years, particularly around the use of data lakes, data repositories for analytics, AI and machine learning data, he says. Many enterprises are evaluating such technologies with tools like Apache Spark or TensorFlow.
“But the reality is that most mainstream organizations haven’t necessarily figured out how to use that technology to produce reproducible business results,” Brey says. “In the next three, four, five years, that it will become much more mainstream and the technologies underlying all of this will become much more proven. What that means for Ceph is it’s moving from the realm of big retailers or big financial services companies who we’ve worked with – and some of those are customers of ours, the bleeding front edge of this – to much more mainstream across enterprises, both big and small. My radical view of the world is that every business is going to need this technology at some point in one form or another and part of it is just the technology evolution problem. Part of it is also cultural with these organizations because it’s brand-new technologies, they’re trying to figure out how to use it.”
The ongoing use of Ceph with other Red Hat technologies also will be key in the future. With the growing adoption of containers and Kubernetes, Red Hat is investing heavily in OpenShift. There also is another product called OpenShift Container Storage, which handles a lot of the provisioning for containers. The latest release, version 4.2, launched in January and contains technology components from Ceph.
“We keep talking about opening new markets, but I want to talk about solving world problems for personas that haven’t been typical buyers of storage and the personas that I’m talking about, early application developers or data scientists,” Brey says. “I’m trying to figure out a way to message to them that it’s storage. You don’t have to think about it, it just works. We’re working on that together with our OpenShift team. They obviously have a lot of insight into particularly the application development world. From an application development perspective, that’s the vision of where we’re going. This Ceph 4 launch is almost like a brand new day because we’re going from the old world of what Ceph was and we’re adding a lot of these new features and we’re moving into this brave new world where I think it’s going to be fine. It’s going to be exciting to talk about how we solve not just for workloads. We solve real problems for application developers.”