After Three Decades, You Can Finally Have A Distributed SQL Database

Without good technology, all the marketing in the world won’t get a company off the ground and keep it in the air, and conversely, without good marketing and sales, all of the technology in the world won’t do it, either. In our many decades of watching the IT sector here at The Next Platform, we have always had this binocular vision, watching technologies and the markets they pursue. We have always talked to both camps: Technology and Marketing.

In recent years, we have spent a bunch of time with the techies at Cockroach Labs, particularly Spencer Kimball, co-founder and chief executive officer at the company: first when its distributed SQL relational database, inspired by Google’s Spanner, dropped out of stealth in February 2017, and again in July 2019, when Kimball talked about how CockroachDB could take on relational database incumbents, particularly Oracle.

With the database market faring pretty well even during the pandemic, for reasons that will be obvious in a few minutes, we figured it was time to reach out to the marketing side of Cockroach Labs, which is headed by none other than chief marketing officer Peter Guagenti, who joined the company last year.

Guagenti is an unusual marketing executive in that he has been on both sides of the bargaining table. He started off his career as a webmaster for People magazine way back in the early dot-com boom days, when the commercial World Wide Web was shiny and new, and ended up working at several Web ad agencies that were acquired by biggies like Razorfish and Omnicom. He then moved to Accenture and Cap Gemini to do product and marketing strategy in the wake of the dot-com bust, came back to Razorfish for a few years to run ad tech with a $120 million book of business, and then jumped into software product management and marketing roles at Acquia (an early PaaS provider), Nginx (Web application server), Mesosphere (the platform that lost out to Kubernetes), and MemSQL (one of the first distributed SQL database providers and now a competitor).

With Cockroach Labs raising another $160 million with its Series E funding round this month, this is a good time to check in and see what the plan is for the years ahead and to take the pulse of the emerging database set.

Timothy Prickett Morgan: What about databases makes them the place to be right now? I feel it, too, and that is why we did The Next Database Platform event last summer, but I want to hear your answer.

Peter Guagenti: I have been in software startups for the last decade. I was in development from the beginning, and when I got to Nginx, what got me excited was data. If you really think about the evolution of applications, when we first started optimizing business processes and turning them into software, it was all about the data. And then the Web took over in the late 1990s and early 2000s, and it was not really about data. It was more about content and other things. These were not really data intensive applications for the most part. They were experiences.

TPM: They were user intensive applications. Lots of people hitting simple things. Which was hard enough and started requiring hyperscale like we have never seen.

Peter Guagenti: User intensive – that’s a great way to describe it. Then we went from user intensive to data intensive with Web 2.0, and things really started to pivot towards data. We saw the rise of Facebook, Google, and others who learned to use data – a lot of data – for competitive advantage. Some would argue to weaponize data. So when I was looking to find a path for myself, where I wanted to be, I became laser focused on data because I felt like this is where the most interesting things are going to happen.

If you look at IT spending data from Gartner, data in the broadest sense is the single largest category. Gartner is estimating something around $70 billion in the market in the next few years. And transactional data specifically makes up an estimated two thirds of that.

TPM: What’s the other data, if it’s not transactional? Analytical?

Peter Guagenti: Sort of. I think Gartner looks at it as historical and analytical, together, and operational and transactional, together, as the two general classes of data that are emerging now.

TPM: I thought transactional was a much smaller part of the pie.

Peter Guagenti: Interestingly enough, it is not. With IoT data, there’s a lot of operational data – just analytical sensors and things like that. And most of the transactional stuff we just don’t think about because it just happens. All the banking systems, all the CRM systems, every commerce and retail application. It is the stuff we expect to just work.

TPM: I started out on IBM mainframes and IBM AS/400 minicomputers, systems of record that were storing transactional data, and then I watched over the decades as this transactional world became analytical. It became data warehousing or OLAP serving, which was an admission that there was not enough money in the world to do this work on a mainframe or a minicomputer. I have been under the impression that the gap between transactional and analytical data was getting larger and larger and that transactional stuff was this super thin, very expensive layer.

Peter Guagenti: It’s actually going the other way. It’s going quite the other way.

TPM: I can’t be the only one with this misconception, so I’m glad we’re having this conversation.

Peter Guagenti: You’re definitely not the only one. I think it’s one of those things where we all think about data as this monolithic part of the IT organization. But we don’t think about the nuances associated with data or we don’t think about the importance of it. If your access to a deployment tool goes down, for instance, you have to do things manually, but it is not the end of the world, right? If your database goes down, it is the end of your world. It is absolutely a problem.

We are at this interesting convergence of circumstances. Everything is shifting to the cloud, and so every organization in the world is rethinking its entire stack. Digital transformation has been dramatically accelerated thanks to the coronavirus pandemic – the change we saw in the digital realm in the last nine months is as big as the last nine years of transformation. Data is the heart of every business and application now. We have always had transactional and analytical data inside of organizations. With the rise of the Web, we also got plain app data – think about stuff that would live in a NoSQL database. That is really just about the user intensive applications we were talking about earlier.

As the cloud emerged, we realized that the cloud has always been a calculus for a CIO around risk versus reward. If they’re going to get cost savings and accelerated time to market there, they’ll look at everything that they’re running and say if, when, and how should I move to the cloud.

TPM: I’m not sure they get cost savings in the long run. They pay more per unit of compute time and more per unit of storage. But they also don’t have to overprovision and if the budgets are the same after you add it all up, on premises versus cloud, and they get agility in the bargain, then they do it. And if you have to pay a little bit more, you would still do it.

Peter Guagenti: If you are one of the world’s biggest banks – pick one – that operates datacenters on every continent and you’re really good at it, cloud is not a place. It’s a capability. Some of it, then, is other people’s servers and some of it is your own servers. I think for most people, though, they’re looking at this and they want the flexibility of cloud. That’s really what they’re looking for.

TPM: Or, in some cases, and I have done this myself with my own infrastructure after running it for ten years, they come to a day and they think: “I don’t want to do this any more. This is not what I do.” At that point, even with a higher cost, the question becomes: Is it riskier to run your own datacenter than it is to use theirs? It wasn’t hard for me to figure out that it was, and when the costs aligned, I didn’t make a decision at all. The decision made me.

Peter Guagenti: That’s a great point. If you are one of the big banks, datacenters are actually part of your competitive advantage. Think about algorithmic trading systems and things like that. They have computer scientists and engineers that rival the wizards that Google has, and they probably make more money than Google pays. In this case, the datacenter is part of your core business model. But if you’re a big manufacturer or a mid-sized manufacturer or retailer, you’re already bad at it by comparison, so why would you keep going?

So in the data world, app data went to the cloud first in the early 2000s, and it was apps on AWS attaching to S3 buckets. The second wave was analytical data, because it had a limited number of users and a limited amount of concurrency, and it was super costly to maintain on premises because of the size of the data. So we saw Redshift explode, we saw BigQuery explode. And then you eventually see Snowflake, which is sort of the poster child of moving analytical data to the cloud because it works with streaming data, it works on all the stuff you really care about. And if you are a CIO and you have got all of these legacy systems like Teradata or Netezza that are all end of life, it was a no brainer to move to these data services because of the limited number of users. And as soon as Snowflake came out, they could go multicloud and they weren’t handcuffed to a vendor like they were to Oracle. They could play each of these clouds against each other.

Transactional data is the one thing that hasn’t moved. And why this is the case is an important question. Think about all of the systems you know about. A lot of stuff still lives on the mainframe because mainframes are fast and they work. But when you talk to some of the big banks – and I encourage you to do so – they are running up against the limits of mainframes. They are literally running out of space to hold data and still have applications be performant. There’s an interesting thing that’s going to happen in the next five to ten years, and that is that the biggest champions of mainframes are now looking at them and saying these things aren’t going to work much longer.

TPM: Come on, they can just Parallel Sysplex a whole bunch of them together and it will all just make a giant clustered database. [Laughter] I’ve done the math on it. You could – oh, wait. Never mind, that would be $100 million, or maybe $250 million. You realize I am joking. If this were true, Google would be running on an IBM Parallel Googleplex.

Peter Guagenti: If you are a large retail bank with business on four continents, this isn’t going to work. At some point, you have milked everything you possibly could out of that mainframe investment. Retail and commerce are exploding and the amount of transaction data is exploding because instead of having a bunch of distributors they are selling direct to consumer. And with that, there’s an expectation of zero latency.

TPM: Well, 200 milliseconds, not zero, which is the precise attention span and patience of a human being these days.

Peter Guagenti: I think you are being generous at 200 milliseconds [Laughter]. Perhaps the amount of time we take to blink.

TPM: So let me ask this. There’s an awful lot of transactional data in Oracle, DB2, SQL Server, and such. Is that your target? Do you think you can move that stuff, whether internally on premises or in the cloud? How are you going to conquer these guys and get their money?

Peter Guagenti: Is it our target? I’d say yes and no. And it’s not our focus, but it is our customers’ focus. Let me explain.

There are really three types of cloud applications. There’s wholly greenfield, which is actually probably still the majority of what is happening. We have probably gone through more change in the last 20 years than at any time since the start of the Industrial Revolution, and we have to be honest with ourselves about how much society has changed business. We are at least as transformative now as the invention of the steam engine was.

I don’t talk to a single Fortune 50 company that doesn’t have 30 new strategic initiatives on its agenda, and they are all going to include new applications. That’s our wheelhouse. But those new applications have the same demands and same requirements as the other mission critical systems that already operate in business because they are usually hyperconnected. We may be breaking up all these monoliths, moving to more microservices or even – heaven forbid we call it this any more – a services-oriented architecture. So for these apps, they are looking for something that is cloud native and modern, but they want something that is going to have the same consistency and reliability and durability that they’re getting from the traditional systems they have been working with for decades.

The second thing that is happening is modernization. A lot of the stuff that is running on Oracle, DB2, or SQL Server – let’s be honest – was written around the time of Web 1.0 or Web 2.0.

TPM: Some of it was written when I was a baby. I remember SQL Server 2000 being the OLAP server of choice for a lot of enterprises, including those that were absolutely and completely dedicated to IBM’s AS/400s or mainframes for their mission critical applications. It cost a lot less money to dump that on Windows Server, SQL Server, and a cheap X86 server, and who cares if it crashed every once in a while? It cost one-fifth as much to acquire and run it.

Peter Guagenti: The reality is, even on these legacy platforms, that stuff is constantly changing because of its importance to the business. It is always under modification. So what we see a lot of now is not just greenfield, but app modernization. Take a large manufacturer that starts selling direct to consumer instead of just going through distributors: I have to put new demands on that back-end system, but those systems aren’t going to hold up the way I want them to. I’m going to have to modernize that app and then layer some new things in on top of it. And that’s when they start taking a hard look at things like Oracle and DB2, too, and say, well, why would I keep building on top of this thing? Why don’t I start shrinking it by shrinking the workloads that run against it – without changing the inventory management system that works perfectly fine in the warehouse today, because why go and mess with that? CIOs are great at portfolio management. Where do I invest, where do I disinvest, where do I keep stable? And so for a lot of what happens in the database, they look and say, well, I’m going to keep this stable. It’s fine, it’s not broken. We’re going to leave it alone. But these new demands are going to require new systems because the old ones are not going to work well.

And that’s where I think we’re seeing this change to this third wave of database technology, which are these cloud native relational SQL distributed databases. It’s not just CockroachDB. It’s Google Spanner. It’s Microsoft Cosmos DB, and others.

There’s a lot of innovation happening in the space right now because we love these transactional systems, but they were only scale up. Scale out was a nightmare. We all went and dabbled in NoSQL. NoSQL works really great for our app data, so why don’t we try it for these other things? Then we realized this was absolutely the wrong fit. It does not work. The whole company speaks English (SQL), and now we’re expecting everybody to learn Esperanto (NoSQL) just because somebody decided that was the thing. So it has come full circle. We still want relational, we still want SQL, but we want scale out, not scale up, and we want resiliency.

TPM: Just like we had to go through server virtualization to get to containers, maybe we needed to go through NoSQL to get to distributed SQL? We didn’t know how to do it. And there were a lot of people that were trying, but there were always compromises made with NoSQL. I don’t believe that there is going to be a place for NoSQL in the long run.

Peter Guagenti: I would challenge that. For lightweight app use cases where you’re just working with objects, NoSQL actually works very well. But there is also an argument to be made for just using simple storage as opposed to a data store. As somebody who came from the experience side, I find that a lot easier to work with because it’s a bucket, right? I mean, the way I think about it is it’s like sorting things on the floor.

TPM: That’s the stuff I would say MongoDB is probably ideal for?

Peter Guagenti: I would agree.

TPM: MongoDB thinks most of the data that is being generated in the world can be stored in a document and rightfully should be and will be, and everything else will be stored in something like CockroachDB. They don’t claim to want to take over the whole world.

Peter Guagenti: They do, actually, but I think they may regret the decision. They made that same argument when they first launched, if you remember. MongoDB’s developer relations people between 2010 and 2012 were notorious for talking about all of the things it could replace. And then people tried, because they were advocates and evangelists, and the data was inconsistent and they lost things. And so then they pulled back.

TPM: With distributed SQL systems actually available and scalable and relatively affordable, are we done now?

Peter Guagenti: It is difficult to say because you don’t know what you don’t know. But I do think we’ve been heading this way for 25 years. This concept has been building for a long time. I think we’re finally at the level of computer science maturity and equipment maturity that we can actually make it real. So I think we’re going to start seeing some settling around this, because this is what the hardware we operate today is really optimized for.

TPM: Given all of this change, how is business at Cockroach Labs?

Peter Guagenti: Our business has been rapidly accelerating thanks to these macro trends and the application modernization that COVID-19 forced. Take home improvement retailers, which have been a great win for us. What used to look like the best day of the year for their home delivery orders became their business every day. And the one thing that failed for all of them was their data infrastructure, because as all of this load hit the transactional systems, scaling up and sharding the database weren’t going to work.

I joined Cockroach Labs the third week in March 2020 from MemSQL. And so I was the first employee of the company onboarded remotely. And I remember us pausing as an exec team and saying, we don’t know what’s going to happen. We just don’t know. You need to just be patient and keep supporting our customers and keep doing what we can to help them out. Fast forward ten months and we have had explosive growth.

We are a private company and we don’t disclose numbers for a lot of stuff, but we have more than doubled our customer base. In the last year, we’ve doubled our revenue and we have seen a fourfold increase in our cloud usage. More than half of our customer base is now running in the Cockroach Cloud. We have seen 2X renewal expansion within our existing customers.

TPM: Those are the kind of curves that Nutanix and Pure Storage were on a decade ago. Can you do me a favor, though? Could you make money before you get to ten years out? [Laughter]

Peter Guagenti: Well, this is always the game. And this is why we have raised $355.1 million so far. Having a valuation of more than $2 billion is pretty, pretty impressive for a company of our size. But the reality is, no, we won’t be profitable, because look at who we’re going up against and what we’re replacing. One of the reports I read estimated that a third of all IT spending went to data and data systems. A third. So we are going to keep raising money and keep doubling down on R&D because that’s the size of the market we’re chasing.

The nice thing about a database business – and you know this well – is that we could be in 100th place and still be a $1 billion revenue company. But we don’t want to be in 100th place. We want to be one of the brands that is the default go-to if you’re a developer focused on transactional data. And so that takes a lot of R&D investment to earn that level of trust and respect. That’s what we’re focused on.


