Battle Of The Document Databases

Cloud providers can be like sharks in that they have to keep moving forward – in their case, growing the number of services they can offer enterprises – or be overtaken by competitors.

That need to continually grow the service portfolio isn’t going to go away any time soon. Cloud adoption is accelerating, moving past the early adopter stage and into what Paul Teich, principal analyst with Liftr Cloud Insights and a contributor here at The Next Platform, calls the “early majority” phase, and enterprises are moving forward with strategies that involve leveraging more than one public cloud provider.

At the same time, the pile of data being generated by organizations is growing rapidly, and the cloud is becoming the place to collect, store, process, and analyze much of that data. Yet only about 20 percent of enterprise workloads are being run in the cloud. That means four out of five still run in on-premises environments, so there are still a lot of applications that need to make their way from behind the firewall and into the cloud.

Given that, it’s not surprising that cloud providers like Amazon Web Services, Microsoft Azure, Google Cloud Platform, IBM Cloud, and others are rapidly expanding the number of cloud-based services they can offer businesses, and that these services often revolve around data: management, migration, storage, and analytics. One of the challenges for cloud providers is that as they grow their services portfolios, they can come into conflict with technology partners who have similar services on those cloud platforms, particularly if those services are based on open source technologies. The worry is that it’s become too easy for cloud providers to build branded cloud services based on open source code.

AWS has come under criticism in recent months from open source advocates for just that, essentially poaching open source code to build a new service and putting its name on the new service without contributing back to the open source community. A recent example was the launch in November 2018 of a managed version of Apache Kafka, an open source streaming technology. The same criticism arose again this month when AWS unveiled DocumentDB, a cloud-based NoSQL database compatible with multiple versions of the popular MongoDB database tool. AWS has said the service will deliver greater performance, scalability, and availability than MongoDB while letting developers continue to use the same MongoDB application code and tools they have now, and that customers can move their MongoDB databases to it from either their on-premises environments or Amazon Elastic Compute Cloud (EC2).

MongoDB argued that what AWS is offering is a pale copy that is based on older MongoDB code. The company also said DocumentDB is just the latest example of AWS saying on one hand how much it loves the open source community while harming it on the other. In a statement to the media, MongoDB chief executive officer Dev Ittycheria said that “imitation is the sincerest form of flattery, so it’s not surprising that Amazon would try to capitalize on the popularity and momentum of MongoDB; however, developers are savvy enough to distinguish between the real thing and a poor imitation.”

AWS said customers like the MongoDB API and features such as its expressive query language, and use the database to collect, store, and access semi-structured data. However, they also find the technology frustrating. Managing MongoDB clusters can be complex, and it is difficult to build applications that can scale to fit their changing needs, such as to multiple terabytes and hundreds of thousands of reads and writes per second. Also, as applications grow, customers run into performance and availability issues. Enterprises end up spending a lot of time and money managing MongoDB clusters at scale, including securing, patching, and running the database.

According to AWS, DocumentDB emulates the APIs and runtime of the MongoDB 3.6 release. It does not include any open source MongoDB code, but it is made to look and feel like a MongoDB server. Because of this, enterprises that adopt DocumentDB can continue using their current MongoDB drivers and tools. The AWS service automatically scales storage from 10 GB to 64 TB of data per cluster, taking the responsibility for capacity planning and provisioning the storage infrastructure off customers’ hands. I/O demand is reduced because the database writes only changes to the storage layer rather than replicating data across network links. According to AWS, throughput is twice that available on MongoDB offerings, storage and compute are decoupled so that each can scale independently, and read capacity can scale quickly to millions of requests per second by adding up to 15 low-latency read replicas.
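To make the driver-compatibility claim concrete, here is a minimal sketch of what it would mean in practice: the application keeps building the same kind of MongoDB connection string and only the endpoint changes. The helper name, both hostnames, the credentials, and the `tls` option are illustrative assumptions, not values taken from AWS or MongoDB; the actual connection settings for a DocumentDB cluster should be taken from the AWS documentation.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(user, password, host, port=27017, database="", options=None):
    """Assemble a standard mongodb:// connection string from its parts.

    This is a hypothetical helper for illustration; it does not open a connection.
    """
    # Percent-encode credentials so characters such as '!' or '@' are URI-safe.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{credentials}@{host}:{port}/{database}{query}"


# Hypothetical self-managed MongoDB endpoint.
self_managed = build_mongo_uri(
    "app", "s3cret!", "mongo.internal.example.com", database="orders"
)

# Hypothetical DocumentDB cluster endpoint; the "tls" option is an assumption.
documentdb = build_mongo_uri(
    "app",
    "s3cret!",
    "docdb-cluster.example.amazonaws.com",
    database="orders",
    options={"tls": "true"},
)

print(self_managed)
print(documentdb)
```

If the compatibility claim holds for a given workload, either string could be handed to an existing MongoDB driver unchanged. Whether every query and feature behaves identically is exactly what MongoDB disputes.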

“Amazon DocumentDB uses a purpose-built, SSD-based storage layer, with 6X replication across three separate Availability Zones,” Jeff Barr, chief evangelist for AWS, wrote in a blog post. “The storage layer is distributed, fault-tolerant, and self-healing, giving you the performance, scalability, and availability needed to run production-scale MongoDB workloads.”

DocumentDB is one of a wide array of database services AWS offers. Others include the relational database services Aurora, RDS, and Redshift; DynamoDB for key-value databases; Neptune for graph databases; and ElastiCache for both Redis and Memcached. DocumentDB is the only AWS service for document databases.

MongoDB has its own cloud database service, called Atlas, that runs in 60 regions on AWS as well as on Azure and Google Cloud. Sahir Azam, senior vice president of cloud products for MongoDB, told The Next Platform that he credits AWS with helping to spread the gospel of document databases over traditional relational ones and shifting the landscape toward MongoDB and others. However, Azam pushed back at claims AWS made about MongoDB and DocumentDB. After tearing down and testing DocumentDB, MongoDB engineers found that it was closer to the 2.4 version of MongoDB, released in 2013, than to the 3.6 version released in 2017.

He also said that DocumentDB is more about AWS looking to take advantage of the popularity of Atlas, which according to the company’s latest quarterly numbers grew 300 percent year-over-year and is now 22 percent of MongoDB’s business after fewer than three years on the market. A key to Atlas is that it not only can help customers move data across 60 AWS regions, but also between cloud provider platforms. DocumentDB can lock users into AWS’ environment.

“Fundamentally what they’ve done is emulated in the MongoDB API, so it’s not a real implementation of Mongo, and they’ve bolted that on top of a back-end storage system that they’d built fundamentally for relational databases,” Azam said. “That has big implications around such thing as the geo-distribution. One of the things that Atlas supports is the ability to span the database across any number of AWS regions and to support the isolation of data in certain countries for data sovereignty, to move data closer to the customer for performance or latency reasons – either for writing data or reading data. The Aurora architecture that they’ve built this emulation on top of fundamentally doesn’t support anything like that and that’s because at the architecturally level, it’s still a scale-up architecture.”

He also said he’s not surprised that enterprises trying to self-manage MongoDB databases find them complex, adding that “a distributed database, like any database, is complicated to manage on your own. We completely agree. That’s why we built Atlas. Amazon’s certainly seen the success of Atlas in terms of going from nothing to 20 percent of MongoDB’s revenue – thousands of customers globally from large enterprise to innovative startups. So we agree that if somebody is in the cloud, then consuming database as a cloud service is much better than hiring an operations team and learn the skills to scale that application globally.”

MongoDB and others have taken steps in recent months to protect themselves against cloud providers that might want to poach open source technologies without giving back to the open source community. In October 2018, MongoDB changed the wording in its licensing to stipulate that if a company wants to build a database service using MongoDB’s IP, it must open source all the technology required to turn it into a service. AWS has become the focal point for “strip mining” open source technology, but there have been instances when other cloud providers have done the same, Azam said. Companies like MongoDB are just trying to protect themselves and the open source community against it, he said.

MongoDB offers a pricing calculator for Atlas, though pricing depends on usage. However, a MongoDB spokesman told The Next Platform that AWS lists DocumentDB pricing on its site on a per-instance basis, while Atlas is priced as a three-node replica set, so roughly three times whatever AWS lists for an instance is the equivalent of MongoDB pricing.
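The spokesman's rule of thumb is simple arithmetic, sketched below. The function name and the hourly rate are hypothetical placeholders, not actual AWS or Atlas prices, and the calculation deliberately ignores storage, I/O, and data transfer charges.

```python
def replica_set_equivalent(per_instance_hourly: float, nodes: int = 3) -> float:
    """Scale a per-instance price to an N-node replica set for comparison."""
    if nodes < 1:
        raise ValueError("a replica set needs at least one node")
    return per_instance_hourly * nodes


# Hypothetical per-instance hourly rate, used only to show the arithmetic.
hypothetical_rate = 0.25
comparable = replica_set_equivalent(hypothetical_rate)
print(f"Per instance: ${hypothetical_rate:.2f}/hr, three-node equivalent: ${comparable:.2f}/hr")
```

In other words, a listed per-instance figure should be multiplied by three before it is set against an Atlas quote for a three-node replica set.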
