While the interface has changed little over time, Amazon’s Simple Storage Service (S3) is anything but basic on the backend. After more than fifteen years of development, the concept of “storage for the internet,” which spun out of retailer Amazon.com’s own interests, continues to evolve, driven these days by a sharp rise in machine-to-machine interaction.
“We launched S3 as ‘simple’ but I’m not convinced it’s simple anymore,” Amazon CTO, Werner Vogels, said with a chuckle during a conversation with The Next Platform.
“Everything we’ve added over time is far from simple, from lifecycle management, security properties, functions you can add to an object and so on. But we started off simple to make sure the basic API interfaces would be sufficient years later. APIs are forever. Once you’ve created them for your servers, it’s not something you decide you don’t want later. Businesses are built around them. But things do change over time, we’ve gone from people, say, putting thumbnails of products on eBay to storing literally petabytes of video, image, and other data and keeping it for good.”
The way access patterns for S3 have changed over the last few years will reshape the storage service’s future architecture. Vogels says that there has been a dramatic shift from consumer-oriented access to machine communication.
“More than 60% of current access to S3 is machine-to-machine now,” he tells us. “If you go back to when we launched S3, we began with hardcore distributed systems principles and built against those. But then, the networks weren’t as reliable as what we have today, partitioning was real, and we eventually had to work toward eventual consistency, making sure that under all circumstances we could still serve an object even if it wasn’t the latest version in a small window. For then it was fine, it was website materials and video and image content.”
“The internet of fifteen years ago, when S3 started, was mostly websites and imagery. We thought that would be the focus of this storage engine we were building. I remember being in the room during the first drawings of this trying to sort what scale it should have. We just added two zeros to the number of objects we thought we’d have for the heck of it and blew through that in the first month. Now, S3 holds on the order of over 100 trillion objects.”
Things are different now, to say the least, with petabyte-sized datasets, advanced analytics, and machine learning consuming what used to look a lot like static, relatively small sets. As that shift began three or four years ago with demand for high performance databases, analytics engines, and machine learning, Vogels says they noticed customers had to do far too much work on their own to avoid the pitfalls of eventual consistency, so they started to make strong consistency a priority.
Vogels points to a piece he wrote on this that describes the evolution of user requirements. As he explains, “We thought about strong consistency in the same way we think about all decisions we make: by starting with the customer. We considered approaches that would have required a tradeoff in cost, in the scope of which objects had consistency, or in performance. We didn’t want to make any of those tradeoffs. So, we kept working towards a higher bar: we wanted strong consistency with no additional cost, applied to every new and existing object, and with no performance or availability tradeoffs.”
“This was all quite a challenge, it wasn’t just that we wanted to implement strong consistency, but we wanted to make sure the guarantees we were giving customers were really true. It meant we had to focus on things like automatic reasoning and formal verification. We had hundreds of use cases, each of which had to be covered by our new protocol. So now we’ve launched strong consistency for S3 and can give the high availability guarantees we began with.”
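The difference between the two consistency models Vogels describes can be illustrated with a toy model. The sketch below is not how S3 is implemented. The class name, the two-copy layout, and the explicit `replicate()` step are all assumptions made for illustration. It only shows why a reader can briefly see an older version under eventual consistency, and what a read-after-write guarantee removes.

```python
class ToyObjectStore:
    """Hypothetical two-copy store: writes land on a primary copy, and a
    replica only catches up when replicate() is called."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        # The write is acknowledged before the replica has seen it.
        self.primary[key] = value

    def get(self, key, strong=False):
        # A strong read always reflects the latest acknowledged write.
        # A default read may be served by the lagging replica.
        source = self.primary if strong else self.replica
        return source.get(key)

    def replicate(self):
        # Stand-in for background propagation between copies.
        self.replica.update(self.primary)


store = ToyObjectStore()
store.put("thumbnail.jpg", "v1")
store.replicate()
store.put("thumbnail.jpg", "v2")  # overwrite; replica has not caught up yet

stale = store.get("thumbnail.jpg")               # older version, still served
fresh = store.get("thumbnail.jpg", strong=True)  # latest version
store.replicate()
converged = store.get("thumbnail.jpg")           # replica has now caught up

print(stale, fresh, converged)
```

In the toy model the stale read is harmless for a product thumbnail, but an analytics job that reads right after a write would have to add its own checks or retries, which is the extra customer work the article refers to.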
With the strong consistency piece in place to weather the machine-to-machine revolution in S3 usage, there are other challenges on the horizon—and opportunities too if AWS can continue its tradition of adapting along a defined architectural route.
When it comes to future challenges for S3, Vogels says there are shifting tides ahead in networks and latency. This means the cloud’s role in storage and compute has to rise to meet a new bottleneck, even if that means rethinking the idea of cloud services entirely in some of the most demanding cases.
“If we go back to the early days, the cloud was really just a collection of datacenters centralized in regions. We’re now up to 24 regions but still, it feels centralized by location. But now the cloud is changing, for some customers it means we’re delivering into the 5G access points directly for low latency access. Or if you look at things like Snowcone, which basically holds a complete S3 storage engine and compute in something the size of a few Kleenex boxes and allows customers doing research without network access to store and process, it’s cloud, but it’s the cloud everywhere, not necessarily a bunch of datacenters sitting in regions. The whole notion of cloud is changing; how we move data, where we do the computation, and with things like Outposts, it’s like the cloud in terms of AWS services but it’s on prem.”
Because we had him on the line, we had to talk about future directions in the traditional storage infrastructure too and how those reflect changes in usage and the very notion of cloud as a less centralized concept. For instance, with block, file, and object—what’s next? Something that meshes these together in some new, clean, AWS-specific way? And what about the hardware all that runs on? Will disk continue to be enough or do these new requirements push some heavier-duty investment in SSD, new data handling, or all of the above?
True to his cloud nature, Vogels says he doesn’t want S3 users to even think for a moment about spindles or magnetic hardware. He doesn’t want them to care about understanding what’s happening in those datacenters at all. It’s all about the services, the interfaces, and the flexibility of access, preferably with the strongest consistency and lowest latency when it really matters.
As for shifts in approaches to storage (block, file, object), the question itself is less important than how the largest volumes can be sliced and diced for specific versus mass access.
“Even in the early days at Amazon we were a data-driven company, the amount of data even our own website generated was huge, we had to decide what data to summarize and collapse, but with the cost and ease of storage of S3, we could keep it, move it to Glacier if needed,” Vogels explains.
Instead of talking about approaches to storage, the mechanisms themselves, he emphasizes that it’s less about how it’s delivered into S3 and more about refining access in the face of huge volumes with close compute availability. “We have customers with these petabytes and they’ll go in and grab the entire dataset when they only need one-fifth of it. Now, they can write a SQL query and only get the subset they need to process it. With Athena, our ad-hoc analytics engine that works directly with files stored on S3, complex analytics are done there.” In short, without actually answering the block, file, object question, his answer was simply that it’s about being able to find what piece of a huge dataset is needed without moving it around.
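The idea of asking for only the needed slice of a dataset, rather than pulling the whole thing, can be sketched without any AWS services. The example below is a local stand-in, not Athena and not an S3 API: the dataset, the column names, and both helper functions are hypothetical, and Python's built-in `sqlite3` plays the role of the engine that evaluates SQL next to the data. It compares how many bytes reach the client in each approach when only one-fifth of the rows are wanted.

```python
import csv
import io
import sqlite3

# Hypothetical dataset; in the article's scenario this would be a large object in S3.
# Every fifth row belongs to region "eu", so the wanted subset is one-fifth of the data.
raw_csv = "sensor,region,reading\n" + "".join(
    f"s{i},{'eu' if i % 5 == 0 else 'us'},{i * 0.5}\n" for i in range(1000)
)


def fetch_everything(data):
    """Client downloads the whole object, then filters locally."""
    rows = list(csv.DictReader(io.StringIO(data)))
    wanted = [r for r in rows if r["region"] == "eu"]
    return wanted, len(data.encode())  # bytes moved = the full object


def query_in_place(data, sql):
    """The storage side evaluates the SQL; only matching rows travel to the client."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obj (sensor TEXT, region TEXT, reading REAL)")
    reader = csv.reader(io.StringIO(data))
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO obj VALUES (?, ?, ?)", reader)
    result = conn.execute(sql).fetchall()
    conn.close()
    # Approximate the bytes returned as one CSV line per matching row.
    moved = sum(len(",".join(map(str, row)).encode()) + 1 for row in result)
    return result, moved


all_rows, full_bytes = fetch_everything(raw_csv)
subset, subset_bytes = query_in_place(
    raw_csv, "SELECT sensor, region, reading FROM obj WHERE region = 'eu'"
)

print(f"download then filter: {full_bytes} bytes moved for {len(all_rows)} rows")
print(f"query in place:       {subset_bytes} bytes moved for {len(subset)} rows")
```

Both paths end with the same 200 rows, but only the second avoids shipping the other four-fifths of the data to the client, which is the point Vogels makes about petabyte-scale datasets.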
For the hugely data- and compute-intensive AI training work, we at TNP are seeing a lot of new on-prem hardware purchases, in part to keep I/O costs in check (for ongoing, constant training runs) and to keep high-end accelerator-based systems fed with the fattest possible pipes. When asked how cloud-based I/O can outpace the capability of on-prem for this growing market, Vogels says I/O is expensive no matter how you slice it, and not just financially. It’s a matter of time to solution and efficiency, something he says Amazon can address by keeping compute and data as close as possible and access to anything in S3 clean, consistent, and ready for analysis.
The key to S3’s success can be a lesson for any systems builder, not just storage. “The thing we knew, almost on day one,” Vogels says, “is that as we went through scaling operations with Amazon.com in six months to two years we wouldn’t have the same architecture. With every order of magnitude of change in terms of operations, your architecture needs to evolve. We were fortunate to make the decision ahead of time that we knew we wouldn’t stay the same. We launched with what we’d call microservices now—eight of them—including things like storage and application engines, indexing, and load balancing, for example. Now S3 is over 300 microservices with very specific tasks. We built an architecture that allowed us to evolve.”