The difficult part about storage these days is far less about capability than about adapting to change. Accordingly, the concept of programmable storage is getting more traction.
With such an approach, the internal services and abstractions of the storage stack can be treated as building blocks for higher level services. This may not be simple to implement, but it can eliminate duplication of the complex, unreliable software that is commonly used as a workaround for storage system deficiencies.
A team from the University of California Santa Cruz has developed a programmable storage platform called Malacology to counter these issues. It aims to allow users to build new services on top of the existing subsystem abstractions of the storage stack. To understand this better, we talked with two of the leads on the project, Michael Sevilla and Noah Watkins.
The Next Platform: Let’s talk first about what this is, how it works, and what applications it might have—and where. What problems does this solve in terms of an increasingly messy storage system and what were some of the shortcomings of building a programmable storage system?
Michael Sevilla: The motivation for this was that we realized software defined storage, like elastic databases and file systems, was taking over. And we think that this was because it’s extremely flexible: software defined storage can run on a variety of commodity hardware, and it can scale up and down depending on your needs. So, our research takes a principled approach to extending software defined storage.
We have tackled this by adding interfaces into the storage system to expose some of the common functionalities and abstractions that are already inside storage systems. And by doing this, it makes it easier to efficiently implement robust applications by using a lot of the code hardened subsystems inside the storage system already. So we feel like this has really great implications for developers, because it allows developers to customize their stack to get the performance that they want, while reducing the amount of redundant code that they write.
One of the big problems that we saw was that application developers were just bolting on layers of software to get different kinds of functionality, and a lot of that functionality was already in the storage system. We expose that with what we call interfaces. We feel that this also has implications for what we call proprietary storage systems, because they can provide storage systems that can adapt to the applications, and this is a way that they can keep up with the open source world where flexibility is king.
The Next Platform: Talk about how that works. How does it adapt to applications? What is the intelligence cooked into this, and where does it sit in that storage stack, or software stack?
Michael Sevilla: It’s more about exposing what’s already there. The storage system itself does not adapt. Traditionally when you would build a really complicated application you would bolt on a bunch of layers, but now we’re removing all of those and using the functionality that is already there. So, the result is a system with a lot fewer layers where essentially your application is running directly over your storage system, and when the application needs something specific it can reach into the storage system and pull out code that will do what it wants.
One of the things that was in the storage system already that we exposed was something that managed storage metadata. Inside the storage system, the code has a pretty good idea of which servers are up and down, which servers are responding fully, which ones are going rogue. And, to do that, a lot of the processes going on inside the storage system have to talk, and they talk using Paxos. But basically, applications up top want a lot of their information versioned in that way as well. The application will want to make sure that everyone’s agreeing on the same thing. So this consensus engine is already in the storage system and we exposed that to the application so it can use that for any kind of consensus needed.
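To make the idea of a versioned, agreed-upon state concrete, here is a minimal sketch in Python. It is an illustration only, not the Malacology or Ceph interface: the `VersionedMap` class and its `read` and `propose` methods are hypothetical names, and a single in-process object stands in for what would really be a replicated consensus service. It shows one property the answer describes: every accepted update produces a new version, and an update based on a stale version is rejected, so all clients converge on the same state.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedMap:
    """Toy stand-in for a consensus-backed, versioned key-value service.

    Hypothetical illustration: a real service would replicate this state
    across several servers with a protocol such as Paxos.
    """
    epoch: int = 0
    data: dict = field(default_factory=dict)

    def read(self):
        # Return a snapshot together with the epoch it belongs to.
        return self.epoch, dict(self.data)

    def propose(self, expected_epoch, key, value):
        # Accept the update only if the caller saw the latest epoch.
        if expected_epoch != self.epoch:
            return False
        self.data[key] = value
        self.epoch += 1
        return True


service = VersionedMap()
epoch, _ = service.read()

# Two application nodes race to set the same key from the same epoch.
first = service.propose(epoch, "leader", "app-node-1")
second = service.propose(epoch, "leader", "app-node-2")  # stale epoch

final_epoch, final_state = service.read()
```

In this sketch the second proposal fails because it was made against an epoch that is no longer current, which is the kind of agreement an application would otherwise have to build for itself in an extra layer.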
The Next Platform: What performance, efficiency, and other advantages can be gleaned from this when put against real world applications and storage environments?
Michael Sevilla: The performance impact should be pretty drastic, because there aren’t as many layers the requests have to go through, and there’s not as much code that requests have to traverse because you’re not bolting on extra code to get the functionality that you need. So, in a sense, the way that systems are traditionally built, when you’re adding a bunch of layers, you end up executing redundant things.
The application may build a load balancer into itself, and then the storage application will end up with load balancing logic underneath as well. You can just imagine a request that has to go through two load balancers instead of just one, so we’re reducing the code path, and that’s just high performance gains. But I think, equally important, the storage system makes a lot more sense because there’s a lot less code that developers have to understand and use.
Noah Watkins: I think he’s right when he talks about reduction of code paths. But let me be explicit about the code paths, since it is not self-evident why that’s important. We’re interested in new memory technologies like NVMe or persistent memory, and the performance capability of those memory devices is such that the bottlenecks are shifting slightly, and code paths are actually making a difference in the latencies that we see. For example, SPDK from Intel is running in user space because the devices can run so fast that they can’t go through the kernel. So code paths are relevant, and why is that relevant for programmability? Well, some applications are simply complex, and they will have a lot of code. If we’re looking at a storage system that provides a fixed set of interfaces, then even though an application may be able to map its semantics onto existing interfaces, an application specific interface, which may in many cases be much simpler, could provide the application with significantly reduced code paths.
It’s that application storage co-design that can reduce code paths, which in turn is really important for the next generation memories. And, there are also improved ways of using the storage system that could benefit performance. For example, POSIX provides no transactional semantics. If an application says, I want to update part of a file, then the storage system tends to not provide any help there. Any type of consistency, the applications have to manage themselves. But when you start to dig down into the storage system, you find that at the bottom layers, below the POSIX file system layer, at the distributed systems level, there are often things like atomic updates or transactions that it does itself to implement these higher level services. So, you could imagine things like read-copy-update as a useful primitive that applications could use. And having that performed all on the storage system side avoids more complex protocols at a higher level that may not be the best ways to achieve some consistency goal.
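The contrast between client-managed consistency and a storage-side atomic update can be sketched as follows. This is a simplified, hypothetical model, not the interface of any real storage system: `ToyObjectStore`, its `update` method, and the `append_record` helper are invented for illustration, and a lock inside one process stands in for the atomicity a storage server would provide for operations on a single object.

```python
import threading


class ToyObjectStore:
    """Hypothetical in-memory object store with a storage-side update hook."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def read(self, oid):
        with self._lock:
            return self._objects.get(oid, b"")

    def update(self, oid, fn):
        # The read-modify-write runs as one atomic step next to the data,
        # so clients need no locking protocol of their own.
        with self._lock:
            new_value = fn(self._objects.get(oid, b""))
            self._objects[oid] = new_value
            return new_value


def append_record(record):
    """Build an update function that appends one newline-terminated record."""
    def apply(old):
        return old + record + b"\n"
    return apply


store = ToyObjectStore()
threads = [
    threading.Thread(target=store.update,
                     args=("log", append_record(b"entry-%d" % i)))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

records = store.read("log").splitlines()
```

Because each update is applied in one step on the store side, all eight records survive regardless of thread interleaving. If each client instead read the object, modified it locally, and wrote it back, concurrent writers could overwrite each other unless the application added its own coordination.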
The Next Platform: You hit on a good point about the coming changes in storage and memory hardware. Are there any storage or file systems that are going to be a better or worse fit for what you’ve developed or is this all broadly applicable?
Noah Watkins: The work is sort of young enough that we had to push on Ceph pretty hard to get these things exposed, and Ceph is providing very strong consistency with target applications we were looking at. When you start to look at other storage systems, my personal view, and Mike may have a different view, is that when we’re looking at HPC storage systems or storage systems that are providing strong consistency, like things that are supporting say OpenStack block store, these things tend, across the board, to have a lot of these common subsystems.
I think it is reasonable to think that many of these subsystems that we identified in the Malacology paper are going to be present in a lot of other systems. But it’s easy to also go look at storage systems like any number of cloud based things that have eventual consistency. These may prove much more challenging for the examples we used in the paper, but it’s reasonable to think there’s a bunch of other classes of applications for which those systems also have sub services that are also useful. But the work is primitive enough that we don’t really have a formal model for any of this. We can’t definitively classify all of the services and applications and that compatibility matrix.
The Next Platform: Can you give us some context about how this might work in a real world application and storage system context?
Noah Watkins: Part of the challenge of all of this is to be able to find really good examples, you need to be able to open up existing software that’s using the storage system in a way that we’re advocating that they don’t.
It took us a long time to find good examples for this, not because they don’t exist, but because we just don’t have a comprehensive knowledge of all of the potential applications of this. So, we settled on the example that we used in the paper, which is the CORFU protocol, which is itself a storage abstraction. The question we were asking ourselves there was: this abstraction has received a lot of attention, it’s very powerful, but the authors of CORFU had to build an entirely new storage system to support it, so could we somehow instantiate this new interface on an existing system using the tools and building blocks?
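For readers unfamiliar with CORFU, the core of the abstraction is a shared log: a sequencer hands out the next free log position, and each position can be written exactly once. The sketch below is a rough, single-process illustration of that idea under simplifying assumptions; the `Sequencer`, `WriteOnceLog`, and `append` names are hypothetical, and it is neither the CORFU implementation nor the way Malacology builds the log on Ceph.

```python
class Sequencer:
    """Hands out monotonically increasing log positions."""

    def __init__(self):
        self._tail = 0

    def next_position(self):
        pos = self._tail
        self._tail += 1
        return pos


class WriteOnceLog:
    """Storage for log entries where each position may be written only once."""

    def __init__(self):
        self._entries = {}

    def write(self, pos, data):
        if pos in self._entries:
            raise ValueError("position %d already written" % pos)
        self._entries[pos] = data

    def read(self, pos):
        return self._entries[pos]


def append(sequencer, log, data):
    # Step 1: reserve a position. Step 2: write the entry at that position.
    pos = sequencer.next_position()
    log.write(pos, data)
    return pos


sequencer = Sequencer()
log = WriteOnceLog()
p0 = append(sequencer, log, b"create table")
p1 = append(sequencer, log, b"insert row")

try:
    log.write(p0, b"overwrite attempt")
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True
```

The two ingredients here, a small service that orders requests and a storage layer that enforces write-once semantics per position, are the kind of building blocks the interview suggests may already exist inside a storage system.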
If we look at existing systems like Hadoop, they have an incredibly tall stack where there are a number of subsystems, such as ZooKeeper, providing consensus, that are used to support a number of applications that would run on top of this system. They may have lots of things like file formats, or protocols that are used to manage storage on top of POSIX in a very application specific way. I think that with the introduction of something like Apache Kudu, we’re seeing movement away from that, where there’s more intelligence pushed down into the storage system.
The takeaway that I see is that especially when people are starting to build new applications, some of the very first things they encounter are how do I provide this service or that service, and we through our investigation have found that those services are already available, but not exposed. We don’t have a lot of specific examples of existing applications that you would want to go modify to use this, but we find the thought exercise of imagining new applications being built, or new storage interfaces, that tends to be powerful. I hope that answers your question. Unfortunately, we just don’t have an exhaustive list of existing things that this is really great for.
The full paper on Malacology programmable storage can be found here.