Google Wants Cloud Services Platform To Borg Your Datacenter

It has been our position from the beginning, when Google first open sourced the Kubernetes container controller, that it wanted this to be the controller that ruled the datacenter. Having created its own Borg and Omega controllers, which do all kinds of provisioning and orchestration of containerized applications that sit behind its search engine, advertising, and other businesses, Google started from scratch and created Kubernetes to not only get a fresh start, but to let others contribute to the effort and build a software stack quicker than it could do all by itself.

The effort has been wildly successful on many measures, and there are many system software stacks that have Kubernetes at their heart now – and some of them are unexpected, and that is just a measure of how pervasive Kubernetes and the Docker container format have become. But Google has a longer game than just getting Kubernetes out there and enthusiastically co-developed with the community and adopted by enterprises. Google wants to create and support, for a subscription fee that makes money, a full Kubernetes stack. This stack is called Cloud Services Platform, or CSP for short, and it went into alpha testing with a select number of early adopters last summer. That alpha has now been completed, and Google is moving this Kubernetes stack into a more open beta period, working towards a day when it will be generally available and a formidable alternative to other on-premises Kubernetes stacks and absolutely compatible with the Kubernetes services that run on the Google Cloud Platform public cloud.

We had a chat with Urs Hölzle, a Google Fellow and senior vice president of technical infrastructure at the search engine, advertising, and video streaming giant that is also one of the big three public cloud providers, second only to Amazon Web Services and Microsoft Azure. Cloud Services Platform – and why could they not give it a fun name like Kooble? – could be one of the most significant software products that Google has ever put together and it will be giving Red Hat, Microsoft, and VMware a run for the budget dollars as enterprises consider their hybrid cloud strategies.

Timothy Prickett Morgan: These transitions in the datacenter take time, which is a blessing as much as a curse because enterprises cannot absorb change and take risks like hyperscalers and cloud builders pretty much have to in order to stay in business. It has been the role of software vendors like Red Hat to take the risk out of using open source systems software. And it looks like this time around, with the container revolution, Google wants to be a stack player in its own right.

Urs Hölzle: As you know, Cloud Services Platform is our hybrid stack. We have been working on it for quite a while, starting with Kubernetes, and then a bunch of other hybrid things that we’ve done such as Stackdriver, Apigee, Orbitera, now Istio as an open source tool. The premise is that CSP makes hybrid normal, natural, not a weird state. It’s a normal state and it lets you go “cloud native” on premise. So you can use many of the same benefits of cloud in terms of better securing your services, monitoring applications, and doing continuous integration and continuous deployment, all in containers obviously, both on premise and on Google Cloud Platform.

Now, we are announcing the beta of CSP. You know we went to alpha release in August, and the alpha has had enormous interest. We’ve had literally thousands of customers, many of them very large companies, interested and kicking the tires. We have had one of the most successful alphas in the history of GCP with CSP, and we have gotten very, very strong feedback from customers and have made very good progress in the maturity of the system, which we are now ready to put into beta.

TPM: So did you have a bunch of Google Cloud Platform customers who then started kicking the tires on Cloud Services Platform, or did this bring in a wide net of people who weren’t already on Google Cloud Platform and they decided to start on premises?

Urs Hölzle: Our Early Access Program started with GCP customers in the early parts of the alpha test just because we know them and we give them preference. But we have seen a lot of interest from large customers who are not currently GCP customers or at least aren’t large GCP customers. They are kicking the tires a little bit on GCP but not much. And the interest really comes from all of the large companies, which still have 90 percent of their workloads running on premises.

They realize that they need to modernize, they realize that they will eventually move to the cloud but they don’t want to have all of these things tied together into one step. And also they don’t want to make a single vendor bet. And so our pitch has been from the beginning that CSP, because it is based on Kubernetes and Istio, is as safe a bet as Linux is.

If you choose Linux, you can pick what runs underneath it – Dell, Hewlett Packard, or GCP – and you can pick what is running on top, such as MySQL or Oracle. They all work. Because CSP is based on Kubernetes, Istio, Knative, and so forth, it gives you all that functionality in a managed form, but the core of the stack, all of the bits that are running, is actually open source so you could take it out and run it yourself. So you have an exit path and also you have this very open source set of APIs that is very broadly adopted by many players in the industry – you know, IBM, Red Hat, et cetera – and therefore is a safe bet for the next two decades. In the previous two decades, Linux and Windows Server were pretty much the two choices that you had.

What those large companies are really looking for is a new stack to standardize on. Even when they don’t have any intention to move to the cloud right now or if they do not want to move everything to the cloud, they really want something that’s open and that’s consistent so they can train their teams once and then apply those teams on premise or in the cloud and know it’s the same experience. So you don’t have to have two sets of teams and two ways of doing everything.

TPM: Will we end up with two stacks when it’s all done? Something based on the Kubernetes stack on Linux on public and private clouds on one side and then Microsoft with Windows Server and Azure Stack and the Azure cloud on the other? It is hard for me to imagine that the Windows Server base just goes away, even if it is shrinking slowly.

Urs Hölzle: Yes, I think that’s quite likely. People sometimes ask me if every platform reduces down to two players, and if that happens, then what is GCP doing as number three in the public cloud. And my answer to that is you’re asking the wrong question.

It’s not AWS versus Azure versus GCP versus Alibaba versus IBM. These are not stacks. These are clouds. And the two stacks have not emerged fully, but we certainly believe in the open source one that is emerging. And you can argue that in the Windows base, it has emerged with Azure Stack. Those are the two most likely to last for the next 20 years. And I would argue that CSP has a good chance to be the majority choice in those next 20 years just because Linux has grown quite a bit along with the whole open source ecosystem, and CSP plays right into that ecosystem.

TPM: Can CSP be a substrate that can run across all or most of the public clouds and private clouds together?

Urs Hölzle: Theoretically, yes, because Kubernetes, Istio, Knative – all of these things are fully open source, so you can actually run them yourself.

TPM: Ah, but that wasn’t the question. My question was: Can you, as Google, be the universal provider and provide that comfort level across all of those platforms with this Kubernetes stack?

Urs Hölzle: I think we could. Right now, what we’re focusing on is really GCP plus on premises. It’s hard enough to get this to work really, really seamlessly. That’s what we’re working on right now, that’s what the beta is and what the alpha has been. But from a technical perspective, there’s no reason CSP couldn’t work on AWS or Azure because they have Linux VMs and they run Kubernetes, so it ought to work. But it’s definitely not something that we support today and it’s also not something that we focus on with this alpha and beta period.

TPM: It is still early days, but clearly it is not stupid to think of Google as positioning CSP as a means to avoid vendor lock-in, and not because all of these open source things are available, because at that point you are relying on your own team or someone else to put it together. But if Google supports CSP on common virtualized environments in the enterprise and on all of the major public clouds, you can help enterprises manage it, keep it consistent, and then say, “Okay, AWS. What’s your move?”

Urs Hölzle: That is definitely not what we’re saying right now. Again, from a technical perspective there is no reason why this wouldn’t work. We haven’t tried it, so we may run into reasons why it might not work, but theoretically speaking if you use VMware on premise then you know it doesn’t look that different from AWS or Azure as a sea of VMs.

TPM: So you’re coming out of the alpha and today going into beta. How long do you think before CSP is generally available and has support contracts from Google?

Urs Hölzle: We have our stability requirements for GA, and we take GA seriously. Sometimes our customers tell us our beta is better than other people’s GA. The beta is pretty fast after the alpha, and that’s a good sign. It’s a big complicated system and our feedback has been very, very good from the alpha, which is why we’re confident to put it into beta now. GA depends on the broader use cases we will see with the open beta, and we will see what the feedback is both feature-wise and stability-wise before we go into fully supported GA.

The other prerequisite that we really need for a GA is a full certification process for the on premises infrastructure. We can’t support something GA in unknown environments. Our goal is for CSP to be only on certified platforms, but to be reasonably broad. Right now the beta runs on VMware because that’s the most broadly available enterprise environment, so that’s why we picked it. But it’s not deeply dependent on that. Really, for on premise support to work, we know it has to come with constraints so that when there is a problem we can replicate it in a test environment, and it’s also an environment that we actually fully know.

TPM: Are you going to offer it as bare metal as well at some point, and can you?

Urs Hölzle: It is not the plan. So currently CSP assumes there is something that provides the VMs for Kubernetes. We are not solving the infrastructure management problem at the lowest level. We are assuming that you have an existing solution and we are going to let you choose that to the extent that we can. There’s lots of companies that have VMware or Red Hat infrastructure tools, and we feel that that’s the customer’s choice. They provision the cluster, so to speak, and we’re really dealing with and managing the bits inside.

TPM: So then you can ride off the certifications of whatever provisioning tools and virtualization layers companies already have and that you certify against.

Urs Hölzle: To some extent when it comes to hardware, yes. We still have to worry about that container OS and the other bits and maybe performance benchmarks and other things like that just to make sure that companies can have a good experience in a certified environment. The hardware that you operate CSP upon has to be reasonably new.

TPM: What is your revenue model for this thing? What does Google get from this?

Urs Hölzle: It’s a licensed product. So it’s a per node license model similar to the revenue structure that we have for GKE.

TPM: Will system vendors and other channel partners sell it for you? Will it be like Red Hat in the early days, when all the system vendors got behind it?

Urs Hölzle: We are not announcing that yet, and there is not a channel for the beta. And that really is coupled to all those certifications that need to be done. We will definitely announce that at GA because that’s an important part of CSP working.

TPM: When you look out ahead, you have a certain number of GCP customers and there are plenty more to chase. Google is being aggressive with its datacenter rollouts this year. You want to be more competitive in getting customers who have not yet moved an appreciable number of their applications to the cloud to pick Google Cloud Platform. This is I think a good lever to do so. Microsoft has been – I won’t say brilliant but smart – about doing the same thing with the Windows base. They got people using Active Directory on the cloud without even knowing it. But the question I have now for Google is, how does this all map out? Do you envision for the short term you might have as much capacity on premises at customer sites as you have on GCP, and then over time it shifts back and forth depending on their moods and data sovereignty issues or whatever? How is this thing going to grow, and how will its shape change?

Urs Hölzle: It’s a good question. We haven’t really looked at it from this sort of infrastructure perspective because this is really a software stack and our goal is to win and be the default software stack for the next 20 years. That’s really what we want to do. You know, we want everyone to conclude that this is as safe a choice as Linux.

And the adoption will actually take a while, right. The reason why 90 percent of workloads are on premise is not because they’re all not suitable for the cloud or not suitable for containerization, et cetera. It’s just that it’s a lot of work and also requires a lot of training first. Most large enterprises are not actually able to do that today. They have too few trained people. And one of the big advantages of having a stack like CSP that is open and ubiquitous is that it now means you can really go and invest in training because you know what you want to train your people on, and at that point you do not have to have exactly everything in your cloud migration strategy figured out.

It is about mindshare with the developers, and mindshare with the CIOs and the CFOs who want an alternative that’s not vendor specific because they want to be in a safe position for the next 10 to 20 years.

If CSP is broadly deployed atop on premises infrastructure, it will be a significant upgrade to their security posture. But it is going to take a while because it really means companies have got to go containerize their binaries, and hopefully we can get that down to a few hours per binary. But it is still going to be a fair amount of work. The training is going to be a fair amount of work and actually a bottleneck to the speed of the migration. And whether it then turns out that most people go to the cloud immediately once they’ve done this or whether they have economic reasons to stay on premise, it is hard to tell. I think that’s a good problem to have because it means the stack has been established, right? So certainly if we had more cores under management at on premises clouds than in the public cloud, that would still be a huge success. And to some extent, given that 90 percent of workloads are on premises, that could happen at some point, right when it’s kind of peak adoption but not yet peak migration to the cloud.

TPM: This is what I’m anticipating. But if you said, “No, that’s absolutely not going to happen,” then I would have to pay attention.

Urs Hölzle: The thing is, 90 percent of applications still on premises is huge. Maybe 20 percent to 30 percent of them are actually really hard to convert because they are mainframe this and COBOL that. Our roadmap is really about getting this to GA, establishing an open stack, and using the success of Kubernetes – really this is in a way Kubernetes 3.0, right? Kubernetes now can run your stuff anywhere, and your stuff is a service, and there are ways to secure, monitor, and deploy your services that are containerized. Or actually that are not containerized. We have people using CSP without the full containerization and it works.

TPM: Was getting to CSP the plan all along when you opened up Kubernetes almost five years ago? When you started Kubernetes way back in the dawn of time, my understanding was that not everybody at Google was sure that that was the right thing to do. Some were, others weren’t.

Urs Hölzle: I would say “plan” is a strong word. Hope is also a four letter word.

TPM: It’s one of the good ones, though.

Urs Hölzle: The hope definitely was that in the long run we could help create a software stack and this was not about hosting, so to speak. There’s a really huge difference between the two. If cloud is just better hosting of applications that look more or less the same way as they do on premises – they have subnets and they have VMs and they have to patch OSes and all of that stuff – if that’s the only thing that cloud brings, then that really will be a disappointment because your people overhead and your technology problems are going to look pretty similar to what they looked like in the past decade. I think plan and vision are strong statements because they mean you had a complete map of how these things will work out. Hope basically says that while you were definitely hoping that it would evolve into that, you haven’t figured out all of the steps. So I think I claim hope.

TPM: How close is CSP to the experience that developers and administrators inside of Google, using its internal systems like Borg, have?

Urs Hölzle: You know, CSP is actually better in some aspects. The way services are described, managed, and secured is significantly cleaner than what we’re using in Borg. So that’s the one thing where, if you look at it and ask, “Do I have this in Borg?”, you say, “No, I wish I had this.”

TPM: So at what point do you throw out Borg and start using CSP internally?

Urs Hölzle: It’s possible. We have definitely had discussions of whether we want to port CSP to Borg. And in your question about cross-cloud – should we port this to AWS, et cetera – this might actually be something we would do first to test that you could have a large underlying cloud and have a consistent experience. Test that idea in-house. But that’s speculation at this point.

TPM: Well, that is my favoritest thing.

Urs Hölzle: Speculation. I see.
