How To Drive Infrastructure Like Uber Does

There are many things that the hyperscalers and cloud builders have taught enterprise IT to respect more than perhaps it had in years gone by. And this is because operating at scale brings its own set of challenges and requires a lot more monitoring and management of infrastructure because people costs cannot scale with capacity at these companies or they would go broke. Or, more precisely, they would have never been able to build the vast infrastructure they have to support hundreds of millions to billions of users.

The good news is that even if enterprise IT relied heavily on people and their ingenious scripting capabilities to run the iron and applications in their datacenters, they can benefit from the tooling that these scale pioneers create – provided it is made available either directly through open sourcing of the tools or through from-scratch development based on imitation of an existing at-scale tool. So it is with M3, a monitoring toolset that was initially created at Uber, is being commercialized by Chronosphere, and is now taking a run at the datacenters of the world.

You Can’t Manage What You Don’t Measure

To many, monitoring is not very exciting, almost an afterthought. But the lesson that needs to be learned from the hyperscalers and the cloud builders is to be absolutely obsessive about monitoring – and to do that as part of the creation of system and application software. This operational telemetry often meets or exceeds the actual load of the application itself, but by having many different points of observation within a system – and we are using the most top-level as well as finely grained meaning of the word system – there is then a chance to better manage that system to make it more resilient and more efficient as well as scale better. This IT game is, after all, about providing a service at the lowest cost and with the lowest amount of grief.

Software engineers who work on distributed computing systems tend to move around, and it gives them a great practical education in the theories on how to do this right. This was certainly the case with Chronosphere co-founders Martin Mao, the company’s chief executive officer, and Rob Skillington, its chief technology officer. Mao got his bachelor’s degree in engineering from the University of New South Wales in Australia back in 2010, and did internships at both Microsoft and Google before landing a gig at Microsoft working on the cloud version of the Office suite, which we all know as Office 365. This is where Mao and Skillington, who got his bachelor’s degree in engineering from the University of Melbourne in 2011, met each other, in fact.

Skillington eventually became the engineering lead at Groupon before coming to ride hailing transportation industry transformer Uber as a software engineer, where he was made lead on the M3DB metrics infrastructure platform that the company built. Mao eventually went to Amazon Web Services after working at Microsoft, and was the technical lead on running Windows Server on the EC2 compute service as well as on AWS Systems Manager, the monitoring and management console that AWS uses to span all of its services on that public cloud. Mao shared the technical lead job on the M3 stack with Skillington and eventually took over as manager of the developers that created the tool as well as the Site Reliability Engineering (SRE) team that keeps the Uber infrastructure, which runs on AWS and Google Cloud, humming.

To its credit, back in August 2018, Uber open sourced substantial parts of the M3 monitoring stack, and in this case, it released the M3DB time series database that was created to be a backend for the Prometheus monitoring tool, which itself was inspired by Google’s Borgmon and Monarch monitoring systems and which has been anointed by Google as the monitoring tool of choice for the Kubernetes container orchestration platform. The open source Prometheus tool, unlike Borgmon, only runs on a single node, and that was not going to work for Uber, and so Mao and Skillington led the creation of the M3DB backend for M3, which gave Prometheus scale. Technically speaking, M3DB is a distributed time series store and reverse index with configurable out-of-order writes. In addition to this back-end data store, Uber also open sourced M3 Coordinator, a storage interface that allows queries against subsets of the billions of datapoints of telemetry that a company like Uber manages, not only across its infrastructure but across its applications.
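The “reverse index” half of that description is the part that makes label-based queries fast: each label pair maps to the set of series that carry it, and a query is an intersection of those sets. Here is a minimal sketch of the idea in Python – the class and method names are illustrative only, not M3DB’s actual API:

```python
# Minimal sketch of a label-based reverse (inverted) index for time
# series, in the spirit of -- but not the actual API of -- M3DB.
from collections import defaultdict


class ReverseIndex:
    def __init__(self):
        # (label, value) pair -> set of series IDs carrying that pair
        self.postings = defaultdict(set)

    def index(self, series_id, labels):
        """Register a series under every one of its label pairs."""
        for pair in labels.items():
            self.postings[pair].add(series_id)

    def query(self, matchers):
        """Return series IDs matching ALL given label pairs."""
        sets = [self.postings[pair] for pair in matchers.items()]
        return set.intersection(*sets) if sets else set()


idx = ReverseIndex()
idx.index("s1", {"service": "rides", "dc": "us-east"})
idx.index("s2", {"service": "rides", "dc": "eu-west"})
idx.index("s3", {"service": "eats", "dc": "us-east"})
print(idx.query({"service": "rides", "dc": "us-east"}))  # {'s1'}
```

The real store distributes these postings (and the datapoints themselves) across nodes, which is where the “distributed” half of the description comes in.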

With the popularity of these open source components growing, and enterprises expressing interest in deploying the M3 stack in their datacenters, Mao and Skillington decided to create Chronosphere as a means of extending M3 and providing commercial support for it, and did the rounds to raise capital as one does in Silicon Valley. After securing $11 million in Series A funding, with a bit from Lux Capital and the majority from Greylock Partners – specifically from partner Jerry Chen, who notably ran VMware’s Cloud Foundry and vFabric businesses for a decade – Mao and Skillington uncloaked Chronosphere from stealth and have been ramping up the company and the M3 tools. (Incidentally, governance of the M3 project is split between Uber and Chronosphere at the moment.)

“What we realized as Uber was going through its cloud native transition, going from physical machines to VMs to containers and from monolithic code to microservices, was that the infrastructure and the requirements of the monitoring changed a lot,” Mao tells The Next Platform. “The big thing that we found out as you are making this cloud native migration is that a lot more data gets generated because you are monitoring more and more services, more and more containers – and a lot of this is ephemeral. None of the tools that were available in the market and nothing in open source could really handle the sheer volume and storage of the data that Uber was creating.”

This data explosion is dramatic, as illustrated in the table below, which shows how the metric count explodes as you move from monolithic to services-oriented architecture to microservices for the programming model:

So necessity was the mother of M3 invention. And now, the M3 platform inside of Uber is second only to Google’s Monarch monitoring system in terms of the size of the time series database and the number of datapoints it is ingesting, writing, and reading. We were thinking that other hyperscalers and cloud builders themselves would have more telemetry streaming out of their systems than Uber, and so did Mao and Skillington, but that is not the case.

“That was our thought as well, and we were quite surprised actually,” says Mao. “But in production, only the one inside Google – the internal system there called Monarch – stores and ingests more data than Uber does. Other companies like Amazon, Facebook, Netflix, and so on definitely produce a lot of metrics, for sure. But Uber didn’t just use M3 to monitor the infrastructure and services. Very quickly Uber realized that it had a reliable monitoring tool that could be used to monitor the business operations, and that ended up producing so much more data than a lot of these other companies.”

Google, Facebook, and others tend to have separate monitoring tools that are pinned to key applications and systems software, too, but Uber, with a much smaller engineering team, had to reuse M3 in as many ways as it could. This is another reason why Uber’s monitoring repository and system stack up against those of much bigger companies.

At the moment, Chronosphere is offering M3 as a service, which means it maintains and manages the M3 stack on behalf of customers out on the cloud, just like competitor Datadog, which came out of media giant NewsCorp and which went public last year. The other big player in modern monitoring is Wavefront, which was acquired by VMware back in May 2017.

Chronosphere does not yet offer support for customers who install the open source components in their own datacenter. We suspected that at some point, after M3 is more established and Chronosphere has time to do more fit and finish, it would offer an on-premises version of M3, and Mao confirmed that this is indeed the long-term plan without getting into specifics.

Like all open source projects that go commercial, there are features that the paying customers who use the Chronosphere service get that are not available in the open source components. The open source version does not have any governors on its scale, so it can store billions of time series just like the cloud service does, but there is no partitioning of the data across users or tracking of data across users or cost accounting and reporting for the resources being monitored, for instance. The commercial version of M3 has a visualization front end plus the alerting platform and anomaly detection add-ons, too.

“The M3 service automatically detects all of the endpoints and ingests metrics from them, recognizing the formats and creating the dashboards where you have alerts and anomaly detection,” explains Mao. “You can get monitoring up and running with a single click and you don’t have to do any manual configuration of the service.”

As for how much stuff the M3 service can have thrown at it, it doesn’t count just devices but all of the metrics you can pull off the infrastructure, including software services at every level and custom metrics like those that come out of business operations if you want to use M3 like Uber does. (And there is no reason to believe that companies won’t want a monitoring system that unifies infrastructure and business operations. Why should they be separate? Or more precisely, how can they be separate?) Inside of Uber, the M3 tool was able to handle 1 billion datapoint writes per second and 2 billion datapoint reads per second, which just sounds enormous, and the database could hold 12 billion time series, which amounted to many petabytes of data, stored in the system for as long as a year.
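A quick back-of-envelope calculation shows why numbers like that translate into petabytes. The bytes-per-datapoint figure below is an assumption for illustration, not anything Uber has published – real time series compression can push it lower:

```python
# Rough sizing sketch for the Uber-scale write rate quoted above.
# bytes_per_point is an illustrative assumption, not a published figure;
# time series compression schemes can do considerably better.
writes_per_sec = 1_000_000_000      # 1 billion datapoint writes per second
bytes_per_point = 2                 # assumed average size after compression
seconds_per_year = 365 * 24 * 3600  # datapoints retained for up to a year

bytes_per_year = writes_per_sec * bytes_per_point * seconds_per_year
petabytes = bytes_per_year / 1e15
print(f"{petabytes:.0f} PB per year")  # roughly 63 PB at these assumptions
```

Even with aggressive compression and downsampling of older data, “many petabytes” is clearly the right order of magnitude.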

While not speaking specifically about the M3 configuration inside of Uber, Mao did talk about the scalability that was offered by this monitoring system, which not only scales horizontally for performance (unlike Prometheus by itself) but also for fault tolerance.

The ingestion pipeline and the query front end for the M3 platform are running stateless, either in Kubernetes containers or on bare metal, and their performance can be scaled out horizontally independently of the M3DB time series database, which runs on its own nodes. This database needs a heavier mix of memory than the ingest and query nodes because a portion of the data is cached in main memory to lower latencies on reads and writes. With four or five nodes running the ingest and database workloads, such an M3 cluster might be able to store millions of time series and handle a few hundred thousand writes per second of telemetry; presumably the read performance would be about twice that, given the read/write ratios seen at Uber.

This commercial-grade M3 has come a long way since Mao and Skillington wove together the first version, which used the Cassandra database (out of Facebook) with time series extensions called Date Tiered Compaction Strategy (DTCS) as a data store, Elasticsearch for indexing, and the Statsite metric aggregation tool, a C implementation of the Statsd front end (which comes out of boutique retailer Etsy), feeding the Carbon metrics pipeline (used as an ingest front end) and the Graphite metric server (used as a query front end). Uber broke all of the scalability limits of all three of these open source tools, which is why it started from scratch and created its own time series database and the M3 Collector architecture, which distributes query processing across datacenters and regions as well as nodes in the cluster to boost its performance and to enhance reliability and resiliency.
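For context on that old front end, the Statsd wire protocol that Statsite implements is a deliberately simple text format: a metric name, a value, and a type (`c` for counter, `g` for gauge, `ms` for timing), sent as a UDP datagram. A minimal emitter looks like this – the host and port are placeholders:

```python
# Minimal Statsd-style metric emitter. The wire format is
# "name:value|type", e.g. "rides.completed:1|c" for a counter
# increment, sent as a fire-and-forget UDP datagram.
import socket


def format_metric(name, value, metric_type):
    return f"{name}:{value}|{metric_type}"


def emit(name, value, metric_type="c", host="127.0.0.1", port=8125):
    payload = format_metric(name, value, metric_type)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode(), (host, port))  # no ack, no retry
    sock.close()


print(format_metric("rides.completed", 1, "c"))  # rides.completed:1|c
```

The simplicity is the point – instrumented code pays almost nothing to emit a metric – but the aggregation and storage behind it is exactly where Uber hit the scalability wall.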

Importantly, when it moved to its homegrown architecture, Uber was able to deliver 99.99 percent uptime on the M3 stack and it reduced its hardware spending on supporting M3 by a factor of 10X. This becomes significant in that Mao estimates that M3 now consumes less than 2 percent of the infrastructure budget at Uber, which implies it would be running on somewhere between 15 percent and 20 percent of the aggregate Uber infrastructure if the old, original M3 system were still in place, and also that in the early days of the M3 stack, based on those other open source components, it was probably eating a lot of iron.

Chronosphere was founded by Mao and Skillington in May 2019, and had its first customer (not Uber, which manages its own custom M3 installation) booked in September 2019, before it got its venture funding and uncloaked. The company has ten private beta customers who are paying for it, and it is in a rapid iteration cycle to do that fit and finish before opening it up to enterprises anywhere.

As for pricing, this gets interesting. Given that this is a time series database, Chronosphere can take in data at very fine-grained resolution, down to nanoseconds, and store lots and lots of data. And the more data that customers store at higher resolution, the more it costs. But the time series database also has compression to keep the data from getting too capacious as well as the capability to dial down the resolution on a dataset or to truncate it after a set period of time to keep the costs from getting out of hand. Another aspect of the pricing model is that it takes into account the amount of reads and writes that are done through the M3 system, and this comes down to how much compute, networking, and storage is used. What Mao says for sure is that M3 is at least an order of magnitude lower in cost than either Datadog or Wavefront, and that is sure to cause some consternation among its modern monitoring competitors.
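Dialing down the resolution works like any time series downsampling: raw points get collapsed into coarser fixed windows, with one aggregate value kept per window. Here is a generic sketch of the idea, not Chronosphere’s actual policy engine:

```python
# Generic downsampling sketch: collapse raw (timestamp, value) points
# into fixed time windows, keeping the per-window average. This is the
# kind of resolution reduction that keeps long-retention costs down.
from collections import defaultdict


def downsample(points, window_secs):
    """points is a list of (unix_timestamp, value) tuples."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - (ts % window_secs)].append(value)
    return sorted((ts, sum(v) / len(v)) for ts, v in buckets.items())


# Four 10-second-resolution points collapse into two 60-second points.
raw = [(0, 10.0), (10, 20.0), (30, 30.0), (70, 40.0)]
print(downsample(raw, 60))  # [(0, 20.0), (60, 40.0)]
```

Whether the per-window aggregate should be an average, a max, or a last-value is a policy choice; the storage savings come from keeping one point per window instead of many.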

It would be just like Google to open source a variant of Monarch and really mix things up right about now. Or Microsoft to buy either Datadog or Chronosphere, for that matter. We shall see.
