Like many technologies, a lot of the databases in use in the world today were born out of necessity, because some other database hit a performance, capacity, or latency wall, or sometimes all three at the same time. Such is the case with Aerospike and its eponymous NoSQL key-value database.
Aerospike gets its name from a rocket engine that does not have the usual bell-shaped nozzle, but rather an array of smaller exhaust ports that converge their explosive force across a “virtual” bell whose shape changes with the outside air pressure as the rocket climbs. It is in many ways a more efficient engine, particularly at low altitudes. Aerospike engines were designed into the X-33 prototype spaceplane from Lockheed Martin in the 1990s and were apparently considered for the Space Shuttle in the 1970s. A number of rockets with aerospike designs have been announced in recent years, but the idea has yet to be fully commercialized.
Not so with the Aerospike Database, which has carved out a niche for itself as a low-latency database for all kinds of high-throughput jobs.
Aerospike was founded in the fall of 2009, starting out as a company called Citrusleaf with a database of the same name. It has raised $78 million in funding over four rounds, starting with a modest $2 million in March 2011 and another $8 million in August 2012.
Citrusleaf was founded by Srini Srinivasan and Brian Bulkowski. Srinivasan was a senior engineer working on various kinds of database connectors and gateways in the early 1990s. During the dot-com boom, he was server architect at Liberate Technologies, a startup created by Larry Ellison, co-founder of relational database giant Oracle, to build thin clients for the rising commercial Internet. Liberate later morphed into an Internet TV set-top box business with the help of Bulkowski, who had been a network protocol engineering technical lead at Novell before joining. Srinivasan joined Yahoo in 2005 to work on its mobile applications, where he ran up against the scale limitations of the Oracle database, and in September 2009 he decided to do something about it with a key-value NoSQL database that Bulkowski had been working on for a year. Bulkowski remains an advisor to Aerospike, but since February 2019 he has been chief technology officer at Yellowbrick Data, which came out of stealth two years ago with an all-flash data warehouse. Srinivasan was chief technology officer at Aerospike until 2012 and has been chief product officer for the past two years.
That August 2012 funding round coincided with the acquisition of AlchemyDB, one of a handful of NewSQL databases that did not sacrifice SQL compliance for scalability or latency. AlchemyDB was based on the Redis key-value store and added SQL query features on top of it, plus relational and graph data formats. Citrusleaf then changed both its company name and its product name to Aerospike, and over the next two years the functionality from Citrusleaf and AlchemyDB was fused under the guidance of AlchemyDB creator Russell Sullivan, resulting in the Aerospike 3 database that launched in 2014. With that under its belt, Aerospike raised $20 million in funding that year and followed up with $32 million in November 2019.
John Dillon, a serial entrepreneur who has helped tech companies find buyers among the IT incumbents over the years, has been chief executive officer at Aerospike since early 2015. We recently had a chat with Dillon and Srinivasan about how Aerospike has changed over the years and what is going on in the database space.
Everybody starts somewhere, and Bulkowski and Srinivasan focused initially on creating a super-fast database specifically to address the needs of the Internet advertising industry. While that was a good reason to start, it is not sufficient for any company that wants to grow since so many of the largest advertising suppliers on the Internet – Google and Microsoft with their search engines, Facebook with its social network, and so on – create their own databases and datastores and take great pride in that.
“We started out in adtech because it was the first industry vertical to get swamped by the tsunami of Internet scale,” Dillon tells The Next Platform. “There was real time bidding for ads, and you had to serve a lot of data and make the experience intimate with each user on a Web site. Since then, we have focused on other verticals, including financial services (lovingly called fintech) and Internet of Things and telco and retail as well.” Interestingly, with Aerospike 5, launched in May of this year, the company has extended its database out to the edge, where devices have to make the same kind of split-second decisions that adtech and fintech systems have long made back in the datacenter.
Over the past five years, Dillon says, annual revenue at Aerospike has grown by between 40 percent and 50 percent, which is pretty good growth for a company of its vintage. Its experience in adtech and fintech is critical to breaking into new markets, because the competition is fierce: there are 358 databases listed in the DB-Engines rankings.
Sometimes legacy databases are replaced outright by Aerospike, and sometimes they are augmented, but either way, Aerospike has a “land and expand” strategy that is common in the tech market. You start with one project and you get an easy win, and then start taking down new projects across the vast estate of databases in the enterprise. IBM’s DB2, Oracle’s eponymous database, and Microsoft SQL Server are usually the big targets, but scale-out MySQL setups also feel the burn of the Aerospike engine sometimes, too.
For instance, companies that built a Hadoop stack with the HDFS file system and the HBase non-relational distributed database on top of it are hitting performance walls, and Aerospike has gotten traction replacing these systems wholesale. (HBase was championed, but not created, by Facebook and was in turn inspired by Google’s BigTable.) Marquee customers that made such a switch include online retailer Wayfair and Indian mobile adtech giant InMobi.
“Lots of times we are breathing new life into an old workhorse, like at Charles Schwab,” Dillon explains. “This is often with simple relational or first-generation NoSQL databases where they are running out of gas at scale. We can go into the front end and really make that application work better and therefore last longer. Schwab’s DB2 database was getting squashed by their mobile applications, and now we are the 24×7 brokerage system and we drip feed data back to the DB2 system, which is still used for SEC and compliance reporting. They didn’t have to redo that database. At a lot of companies, we start with a simple application, such as credit card fraud, where there is a lot of data, or like at Schwab, where we do risk mitigation on their stock portfolio, which used to take twelve hours and now it takes seven minutes.”
Here is how Aerospike looks at the legacy database architectures, and the problems it sees customers wrestling with:
And here is how it has addressed these issues with its own architecture, including a tight coupling with the Spark in-memory processing framework and its machine learning extensions; Spark has over 200,000 customers worldwide, which makes it a juicy target indeed:
In any given quarter, about 30 percent to 40 percent of Aerospike’s business comes from net-new customers, the land part of the strategy, and the remaining 60 percent to 70 percent is the expand part, where existing customers move more databases and datastores over to Aerospike. While Aerospike has some very large customers, the average client spends $150,000 a year on licenses and support and has saved a bundle of money that was previously being shelled out for other databases.
“We don’t really sell to anybody that doesn’t have two, three, or four different mission-critical databases,” Dillon says with a laugh. “And about half the time, the existing database is out of gas and while they may have a bit of headroom, they know they are going to have to redeploy and rearchitect because what they have is not going to scale to what they are going to need in a year or two. Inevitably, we find companies that were big fans of a particular relational system or even those early NoSQL databases. We have taken out a lot of Oracle, we take out MongoDB and Redis, and we take out a lot of Cassandra – what are essentially brownfield installations. We just give them a massively better mousetrap at a much, much lower cost.”
Aerospike has greenfield customers, too, in markets where new applications are being built from the ground up, such as at Bharti Airtel, the Indian telecom giant that provides landline and mobile phone services as well as Internet service in 18 countries in Asia and Africa. Bharti Airtel first used Aerospike to create a digital payment system, and then used it to create a data warehouse that gave a 360-degree view of its customers across the payment system, phones, and Internet.
Aerospike was one of the first companies to embrace the use of DRAM and flash to accelerate commercial databases, and it provides what we call “scale-in” performance as well as “scale-out” performance. By scale-in, we mean that a single server can deliver the kind of performance that takes dozens of machines running other databases to reach, and often at much lower latency. Then, through the data distribution layer in the database, Aerospike can scale out across multiple nodes to push performance even further.
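The article does not describe how Aerospike’s data distribution layer assigns data to nodes, so what follows is only a minimal sketch, assuming a common scheme for scale-out key-value stores: hash each key into a fixed number of partitions, then assign partitions to nodes. The partition count (4,096), the SHA-1 hash, the round-robin assignment, and the function names are all illustrative assumptions.

```python
import hashlib

# Illustrative assumption: a fixed partition count, decoupled from the node count.
N_PARTITIONS = 4096


def partition_of(key: str) -> int:
    """Map a key to a partition ID by hashing it (SHA-1 is an assumption here)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an integer, reduced mod the partition count.
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS


def build_partition_map(nodes: list[str]) -> list[str]:
    """Assign partitions round-robin: partition p is owned by nodes[p % len(nodes)]."""
    return [nodes[p % len(nodes)] for p in range(N_PARTITIONS)]


def node_for(key: str, partition_map: list[str]) -> str:
    """Route a key to the node that owns its partition."""
    return partition_map[partition_of(key)]


if __name__ == "__main__":
    pmap = build_partition_map(["node-a", "node-b", "node-c"])
    print(node_for("user:42", pmap))
    # Partitions are spread almost evenly across the three nodes: 1366, 1365, 1365.
    print([pmap.count(n) for n in ("node-a", "node-b", "node-c")])
```

Fixing the partition count means a key’s partition never changes; only the partition-to-node map changes as nodes join or leave. A production system would rebalance so that as few partitions as possible move when the cluster changes, which this round-robin map does not attempt.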
Srinivasan tells us that around the time Aerospike 3 launched, it was not unusual to see a single X86 server node process 1 million transactions per second. These days, thanks to Moore’s Law improvements in hardware (more cores, more threads, more memory, and more flash) as well as improvements in the Aerospike software, it is not uncommon for a single X86 server node to push 8 million transactions per second, and that can be boosted to 15 million transactions per second with the Application Device Queues (ADQ) feature of Intel’s Ethernet 800 Series adapters. These numbers are for terabyte-scale databases. Customers with larger, petabyte-scale databases can still get response times of 2 milliseconds to 3 milliseconds on transactions, compared to sub-millisecond responses on terabyte-scale databases.
The hybrid memory architecture of Aerospike, which has evolved over the years, is the key to its scale:
The hybrid storage architecture underpinning Aerospike has also been expanded to include Intel’s Optane 3D XPoint persistent memory, which allows a much larger in-memory footprint and, according to Srinivasan, another 5X of server compression compared to other databases. This is the real trick of Aerospike. For many in-memory databases and datastores, all of the data has to be in physical DRAM to get super-fast performance. Aerospike, by contrast, can get the same or better performance while keeping a lot of the data on flash and now, with Aerospike 5, on persistent memory.
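To make the hybrid idea concrete, here is a toy sketch, assuming a design in which a compact index held in DRAM maps each key to the location of its record on a larger, slower medium. Aerospike describes its own hybrid memory architecture as keeping the primary index in DRAM and record data on flash, but the hypothetical `HybridStore` class below, its append-only file standing in for a flash device, and its JSON record encoding are illustrative assumptions, not Aerospike’s implementation.

```python
import json
import os
import tempfile


class HybridStore:
    """Toy store: index in DRAM (a dict), record bodies in an append-only file standing in for flash."""

    def __init__(self, path: str):
        # key -> (byte offset, length); only this small entry per key stays in memory.
        self.index: dict[str, tuple[int, int]] = {}
        # Binary append mode with read access; creates the file if it does not exist.
        self.log = open(path, "ab+")

    def put(self, key: str, record) -> None:
        data = json.dumps(record).encode("utf-8")
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        self.log.write(data)
        self.log.flush()
        # Point the in-memory index at the newest copy; older bytes become garbage.
        self.index[key] = (offset, len(data))

    def get(self, key: str):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        # One positioned read from the "flash" file per lookup.
        self.log.seek(offset)
        return json.loads(self.log.read(length))

    def close(self) -> None:
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = HybridStore(os.path.join(d, "flash.log"))
        store.put("user:1", {"name": "Ada", "score": 7})
        store.put("user:2", {"name": "Lin", "score": 9})
        store.put("user:1", {"name": "Ada", "score": 8})  # newer version is appended
        print(store.get("user:1"))  # {'name': 'Ada', 'score': 8}
        store.close()
```

Each read costs one in-memory index lookup plus one positioned read from the file, which is what lets a design like this keep latency low without holding every record in DRAM. Overwrites simply append a new version and repoint the index, so a real system would also need a defragmentation pass to reclaim stale bytes; this sketch omits it.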
“Typically, hundreds of nodes of another system become a couple of dozen nodes of Aerospike,” says Srinivasan. PayPal was one such customer: it had 220 nodes per cluster running an unnamed database and moved to dozens of nodes with Aerospike. This consolidation can be significant. At Signal, an adtech customer that had over 450 nodes running Cassandra, the same load could be handled by 60 Aerospike servers, and the TCO savings over three years for that one advertising application alone were over $5 million. At Playtika, a gaming customer, more than 200 Couchbase servers running on a mix of DRAM and flash could be replaced with only 30 Aerospike servers, resulting in TCO savings of $4.2 million over three years.
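The consolidation figures above reduce to simple ratios. The short calculation below uses only the node counts reported in this article, taking “over 450” and “over 200” at face value, so the results are approximate. PayPal is left out because only “dozens” is given for its Aerospike footprint.

```python
# Node counts as reported in the article: (nodes before, Aerospike nodes after).
DEPLOYMENTS = {
    "Signal (Cassandra)": (450, 60),
    "Playtika (Couchbase)": (200, 30),
}


def consolidation_ratio(before: int, after: int) -> float:
    """How many of the old nodes each Aerospike node replaces."""
    return before / after


for name, (before, after) in DEPLOYMENTS.items():
    ratio = consolidation_ratio(before, after)
    print(f"{name}: {before} -> {after} nodes, {ratio:.1f}x consolidation")
# Signal (Cassandra): 450 -> 60 nodes, 7.5x consolidation
# Playtika (Couchbase): 200 -> 30 nodes, 6.7x consolidation
```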
At this point, Aerospike has several hundred paying customers, and many thousands of companies (estimated at close to 10,000) have downloaded the open source community edition of its database, which first came out in 2014 when Aerospike delivered its integrated platform after the acquisition of AlchemyDB. (It is never clear if open source downloaders are using the code or just looking at it.) About 50 percent of its license revenue and 40 percent of its customer count comes from companies deploying Aerospike on one of the big public clouds, and the remaining 50 percent of revenues and 60 percent of customers comes from companies deploying Aerospike in their own datacenters or co-los on their own Linux iron.
The way Srinivasan sees it, this is just the beginning of the second golden age of databases; after all, it took 30 years or so for relational databases to come to market and consolidate around a handful of products.
“But now, thanks to the Internet, the problems have changed so much, especially the user experience and the wide variety of applications that people use daily,” says Srinivasan. “Database technology has to keep up, and we have done a good job at that for Internet companies as well as some of the high-end enterprises. But a lot of enterprises are still in the early stages of migrating to this new level of user experience.”
And that is going to be a lot harder to pull off with legacy databases, as the number of devices at the edge goes through the roof and real-time decisions have to be made at the edge and in the datacenter so it all looks easy and simple to billions of users with increasingly short attention spans. As far as Aerospike is concerned, this is just stage one.