A Database For All Locations, Models, And Scales

Enterprises are creating huge amounts of data and it is being generated, stored, accessed, and analyzed everywhere – in core datacenters, in the cloud distributed among various providers, at the edge, in databases from multiple vendors, in disparate formats, and for new workloads like artificial intelligence. In this fast-evolving space, a database vendor that is aiming for a broad reach is going to have to adapt quickly.

Aerospike is one of those vendors that has been constantly adapting. The company, now going on 14 years after starting off in 2009 as Citrusleaf and taking its current name in 2012, offers its eponymous database, which is the foundation of its Aerospike Real-Time Data Platform. The platform lets organizations instantly sprint through billions of transactions, power real-time applications at sub-millisecond speeds, climb to petabyte scale, and do so while reducing the server footprint by as much as 80 percent.

As we’ve mentioned in previous coverage, the goal is low latency for high-throughput jobs.

At the same time, the company understands that if it wants as many transactions as possible to run through its flash-based NoSQL platform and to pull into its gravity as many organizations as possible, it needs to be able to reach into and support as many data sources as possible.

“We have supported strong consistency, which is a little rare in the SQL database,” Lenley Hensarling, Aerospike’s chief product officer, tells The Next Platform. “We find that some people actually move back to an Oracle or DB2 and this is one of the reasons that PostgreSQL has kept growing. But in the NoSQL world, we have strong consistency. We have the ability to do that for up to 12 billion transactions a day and we have some people doing more than that. But all of that data gets captured by Aerospike. As that has happened, more and more customers have said, ‘We need to make use of this data and provide this data wherever it can be used.’”
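To put the 12 billion transactions a day in per-second terms, the arithmetic can be sketched in a few lines of Python. This is only a flat average over a 24-hour day; the function name is ours, and real workloads will peak well above the average at busy times.

```python
# Back-of-the-envelope conversion of a daily transaction count
# into an average per-second rate. Assumes load is spread evenly
# across the day, which real traffic rarely is.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_tps(transactions_per_day: int) -> float:
    """Return the average transactions per second for a daily total."""
    return transactions_per_day / SECONDS_PER_DAY


avg = average_tps(12_000_000_000)
print(f"{avg:,.0f} transactions per second on average")
# prints "138,889 transactions per second on average"
```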

Over the years, Aerospike has moved down multiple avenues to ensure its platform is covering those bases, including through its growing portfolio of connectors, which integrate the Aerospike Database with open-source frameworks such as Spark, Kafka, Pulsar, and Trino. The company this month unveiled Connect for Elasticsearch, a connector to the open-source search and analytics tool that will let data scientists, developers, and others rapidly run full-text searches of the real-time data in the vendor’s database.

“We built a connector for that,” Hensarling says. “We could have built textual search into our database, but one of the things we pride ourselves on is the efficiency in handling transactions, both read and write, and being able to handle that massive ingestion. We’re very cognizant that it’s a distributed application and it can connect to other distributed applications or infrastructure like Elasticsearch.”

The Elasticsearch connector dovetails with the change data notification and change data capture abilities that first appeared in Aerospike Database 5 and have continued in Database 6, which was released last spring.

<<Aerospike Database 6>>

The Aerospike technology is “key to streaming data,” he says. “It’s key to being able to push data off to where it can be used best. It’s also, in this case, used to update the indices in Elastic and to put the data in Elastic to use that textual, very flexible, fuzzy search capability. We push only those fields that are necessary to do that over into Elastic. We accompany that with the digest, which is essentially the address in our database. That’s one hop back to the actual record in Aerospike. If you want to do a textual search for all the things – and sometimes companies do it not just for what’s in Aerospike, but what might be in other databases, too – and they get back those results from Elastic. But for our data, they can go directly back using that digest to the actual record. And it’s incredibly fast to do that. … We have a number of customers who said, ‘We’re using Elasticsearch. We want to make your data accessible through Elasticsearch.’ That’s what we’ve done.”
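The pattern Hensarling describes, pushing only the searchable fields plus the digest into the search engine and then using the digest to hop back to the full record, can be sketched roughly as follows. This is a toy, in-memory illustration and not the connector's actual code: `RecordStore`, `SearchIndex`, `on_change`, and `make_digest` are hypothetical names, the SHA-1 hash is a stand-in for however Aerospike really computes a digest, and neither the Aerospike nor the Elasticsearch client API is used.

```python
import hashlib


def make_digest(set_name: str, user_key: str) -> bytes:
    # Stand-in hash for illustration only; the real digest is
    # computed by the database, not by this function.
    return hashlib.sha1(f"{set_name}:{user_key}".encode()).digest()


class RecordStore:
    """Toy stand-in for the database: full records addressed by digest."""

    def __init__(self):
        self._records = {}

    def put(self, set_name, user_key, bins):
        digest = make_digest(set_name, user_key)
        self._records[digest] = dict(bins)
        return digest

    def get(self, digest):
        # One hop from digest to the complete record.
        return self._records[digest]


class SearchIndex:
    """Toy stand-in for the search engine: searchable fields plus digest."""

    def __init__(self):
        self._docs = []

    def index(self, digest, fields):
        self._docs.append({"digest": digest, **fields})

    def search(self, term):
        # Naive case-insensitive substring match over indexed fields.
        term = term.lower()
        return [
            doc["digest"]
            for doc in self._docs
            if any(term in str(v).lower() for k, v in doc.items() if k != "digest")
        ]


def on_change(index, digest, bins, searchable):
    """Hypothetical change handler: forward only the searchable fields."""
    index.index(digest, {k: bins[k] for k in searchable if k in bins})


store = RecordStore()
index = SearchIndex()

bins = {
    "name": "Trail running shoe",
    "description": "Lightweight shoe for rocky trails",
    "price_cents": 12900,
    "stock": 41,
}
digest = store.put("products", "sku-1001", bins)
on_change(index, digest, bins, searchable=("name", "description"))

hits = index.search("rocky")                 # text search returns digests
records = [store.get(d) for d in hits]       # digest leads back to the record
```

The design point is that the search side stays small, holding only what is needed for text matching, while price, stock, and every other field live in one place and are fetched by address.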

Database 6 introduced highly parallel secondary indexes, giving queries that use them the same speed and efficiency found in primary indexes, and it also supports SQL through the Spark and Trino integrations.

The Elasticsearch connector follows other moves over the past year to cast a wide net over what the real-time platform and database can do and support. In Database 6, the company included support for JSON document models regardless of scale and enhanced support for Java programming models, including JSONPath for storing and searching datasets and workloads. Aerospike has supported document and object-style records for four or five years, but JSON helped the company continue to push into the mainstream, according to Hensarling.

Also last year, the vendor partnered with Trino-based Starburst to launch Aerospike SQL Powered by Starburst, an integrated distributed SQL analytics engine in Aerospike Database 6 based on Starburst Enterprise and leveraging Trino. The revamped secondary indexes in Database 6 gave Aerospike better search capabilities, which Hensarling told The Next Platform the company “can support through a push down from the Starburst worker and the connector that’s in that Starburst worker. That model allows us to do a lot of search and analytic capability through Starburst and make that open up our data to our customers, to new constituencies, like data engineers, compliance people, audit people.”

All these steps – from Elasticsearch and Starburst to the other connectors and more features in Database 6 – let Aerospike extend its reach into more of the datasets and data sources that enterprises are using and compete better with other NoSQL databases such as CouchDB and MongoDB.

“People talk about a data pipeline,” he says. “There’s not a data pipeline in any companies. There are literally hundreds or thousands of data pipelines for different users, and we can distribute the right data to the right places in some definition of real time. The global data mesh winds up being in sync, in real time to some extent. That’s our picture of the world and how we sit in that. More and more companies are starting to see that they’re being overrun with data, but we can’t just say we’re only going to take certain things. We have to handle it all and all at the same time, by some definition. It might be milliseconds. For some it might be seconds, for others it might be you go off and do machine learning training for hours someplace else.”

The vendor is seeing some momentum behind its efforts. Aerospike last February said that worldwide sales doubled in 2021 and that its roster of customers – which already included PayPal, Wayfair, and Yahoo – added others like Criteo in France, India-based Dream11, and Riskified in Israel. The first half of 2022 continued the trends, according to the company.

One number that stood out in 2021 was the 450 percent jump in year-over-year recurring revenue for Aerospike’s Cloud Managed Service. Hensarling says that, like other tech vendors, the company’s cloud services got a boost from the COVID-19 pandemic, when enterprises had to rapidly accelerate their cloud efforts. He describes the managed services as the “high-end production workload thing.”

The company this year is taking a deeper step into the cloud with its Database 6-based Aerospike Cloud database-as-a-service running on Amazon Web Services (AWS). Aerospike in November announced early availability and Hensarling says it will be in trials with early customers into the early part of the second quarter, with general availability later in the quarter. Customers were telling the vendor that more of their projects were starting in the cloud.

“That’s what’s driving this, being able to have an idea, go implement it, not have to think about buying cycles for hardware or availability, operations staff in your company and things like that,” he says. “That is something that we see as really critical to growth going forward.”

Aerospike this year will expand its capabilities among other data models, including graph databases. Enterprises are using more models as they triangulate to get more answers from the data. There is the push for “multi-modal database capabilities from one vendor,” Hensarling says. “That’s driven some of our investments. For 30 years, we converged on relational databases and that was the answer. Then all of a sudden we said, ‘No, there are other things. There are reasons to be more elastic and more scalable.’ So NoSQL started happening. But there’s also these new capabilities like graph databases to get different kinds of answers. But people don’t want to deal with five, six, seven different vendors.”

Similarly, they don’t want to be beholden to a single cloud provider, which means companies like Aerospike – which also supports Microsoft Azure, Google Cloud, and Kubernetes-based private clouds – will have to continue to be cloud-agnostic and expand to meet customer demands, he says.
