For Future Systems, Coordination is the Next Big Bottleneck

For the last several decades, large-scale computing, whether for massive supercomputers or distributed enterprise systems, has been engaged in a never-ending game of whack-a-mole. Less colloquially, it has been chasing down one bottleneck bubble, only to find it revived elsewhere in the system with equal force.

That endless game is still being played, but as the scale of systems grows, those bubbles are spurting up at ever lower levels of the system. One generation’s hardware bottleneck bubbles have become this generation’s set of profound software challenges. But just as new processor, interconnect, and memory development sprang from necessity then, so too are some clever solutions emerging now. This time they are not coming from the national supercomputing labs or the vendor community. They are coming from the folks who know scale the best, and they don’t even need to be named here, do they?

From a research and practical perspective, the “generational” issues for future systems have defined the career of Berkeley professor Joe Hellerstein. From his early work on refining database systems in the pre-hyperscale and Hadoop era to, more recently, systems-level research on optimizing the software underbelly of distributed systems (seriously, go take a look), one could fairly say that Hellerstein is one of the foremost experts at playing bottleneck whack-a-mole. In addition to his startup roles at the large-scale analytics company Trifacta, he is still teaching at UC Berkeley, but he has moved beyond database-centric work to look at wider system-level problems. Much of his research is encapsulated in his teaching on what he calls “progressive systems,” which represent that dense combination of computing forces we cover here at The Next Platform: large-scale distributed infrastructure and the many pieces that make up the platforms that support it.

The concept goes beyond infrastructure at the scale of Google and Amazon. “Whether it’s at the single processor level where there are two cores communicating or at Google scale, when you look at where the real bottlenecks are, they all revolve around coordination,” says Hellerstein. This is not to say that coordination challenges are anything new, but many of the hardware-oriented bottlenecks (for instance, memory bandwidth and overall I/O capabilities on modern systems) are less pronounced than they once were, and the challenge becomes using the hardware efficiently, at the highest possible utilization.

Hellerstein uses one of the most common examples to explain bottlenecks: the freeway system. Freeways are built with the capacity for many people to travel at 70 mph and, in theory, that is what should happen. In practice, there are lane changes, car troubles, and accidents, and soon enough there is not enough coordination between vehicles for anyone to move at anything near true capacity. “The coordination and feedback from that coordination ripples through the system.”

“The question then is, how do we build systems where everything stays in the right lane, where the processors don’t have to talk to each other and make sure they’re not stepping on each other’s toes, where the databases don’t have to make sure two people aren’t updating the same record in two different places? This overhead of coordination is what is slowing systems down today.”

That coordination issue is being dealt with in some interesting ways. With the highway theme in mind, the idea is that every participant in the traffic flow could move at capacity if permitted to keep operating. If all the coordination (changing of lanes, so to speak) could happen in the background and update in sync with the rest of the system, things move along nicely. This sounds simple in theory—but we all know why the highway traffic metaphor is so persistent in computer science. Because it’s still essentially an unsolved problem.

The solution for distributed computing is a bit cleaner in that it removes the human element, and with it much of the unpredictability that creates the need for the constant feedback from coordination Hellerstein speaks of. Imagine instead that all lane changes or accidents could be noted or logged as they happened, while things continued to move along, and all the changes and coordination could happen at once in a sort of global update. There is still a hiccup there, but it creates less of a jam. At the systems level, this becomes a recipe for logs that are updated and shipped to make updates to systems (be they a single processor or an entire distributed system), which removes the coordination overhead of databases, processors, and applications as they report back to each other throughout the system. The overhead of all of these necessary but inefficient acts of coordination is the real bottleneck, and one that gets more complex as the stack continues to evolve.
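To make the log-shipping idea concrete, here is a minimal sketch in Python. It is only an illustration of the pattern described above, not code from Hellerstein or from any real system; the `Worker` class, the `apply_logs` function, and the lane names are all hypothetical. Each worker notes its changes in a private, append-only log and keeps going, and the shared state is brought up to date in one batch when the logs are shipped.

```python
from collections import Counter


class Worker:
    """Hypothetical worker that logs changes locally instead of locking shared state."""

    def __init__(self, name):
        self.name = name
        self.log = []  # private, append-only log; no other worker touches it

    def record(self, key, delta):
        # Note the change and keep moving; nothing is coordinated here.
        self.log.append((key, delta))

    def ship(self):
        # Hand over everything logged so far and start a fresh log.
        shipped, self.log = self.log, []
        return shipped


def apply_logs(state, logs):
    """Apply every shipped log entry to the shared state in one batch update."""
    for log in logs:
        for key, delta in log:
            state[key] += delta
    return state


workers = [Worker("a"), Worker("b")]
workers[0].record("lane1", 2)
workers[1].record("lane1", 3)
workers[1].record("lane2", 1)

# The only point where the workers' views meet is this single batch step.
state = apply_logs(Counter(), [w.ship() for w in workers])
print(dict(state))
```

The design choice worth noticing is that `record` never reads or writes anything shared, so workers never wait on one another; the cost of agreement is paid once, in `apply_logs`, rather than on every update.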

The concept of progressive computing is to set aside decades of traditional computer science practice (the updating of database records, memory cells on processors, etc.) and instead take a log-based approach to coordination. New technologies like Google’s Dataflow, which the search engine giant said earlier this year would bring an end to static databases, and similar in-memory computing approaches at Microsoft and elsewhere are where the next big leaps of progress for system efficiency will come from.

Ultimately, Hellerstein, who developed the Bloom language and the CALM theorem, believes that there is a wide class of problems that actually don’t need any coordination, meaning there is no mole in the system-wide game of whack-a-mole. This is why big infrastructure shops like Google see Dataflow as such a big deal: why use MapReduce or other database approaches with updates (even if they’re fast) if that whole step can simply be avoided?
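One way to picture the class of problems that needs no coordination is a computation that only ever accumulates facts, which is the monotonic case the CALM theorem is concerned with. The Python sketch below is an illustration under that assumption, not Bloom code and not anything taken from Hellerstein's work; the `merge` and `replay` helpers and the sample updates are hypothetical. Because set union is commutative, associative, and idempotent, the updates can arrive in any order and every replica still ends up with the same answer.

```python
from itertools import permutations


def merge(seen, incoming):
    """Monotonic update: the set of known facts only ever grows."""
    return seen | incoming


# Hypothetical updates arriving from three different nodes.
updates = [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"a", "c"})]


def replay(order):
    """Apply the updates in the given arrival order and return the final state."""
    state = frozenset()
    for update in order:
        state = merge(state, update)
    return state


# Try every possible arrival order; a coordination-free computation
# should produce one and only one distinct outcome.
results = {replay(order) for order in permutations(updates)}
print(results)
```

If `merge` were replaced with something order-sensitive, such as overwriting a record with the latest value, the set of results would contain more than one outcome, and that is the point at which some form of coordination becomes unavoidable.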

“Processors are cheap and fast. At global scale, at distributed systems scale, adding another node is not the problem. Networks are fast. Memory and I/O problems are progressing. But systems are not working. They are waiting. If you look at the utilization of systems, that becomes clear quickly. Hadoop clusters running at 10 percent utilization? Big datacenters not much better. It’s because of coordination, but what we are thinking about with progressive systems is that this does not need to be the case.”

So, why hasn’t any of this trickled down fully yet? Because it takes really clever programmers, for starters. The rest of the answer lies in the other side of Hellerstein’s CALM theorem: for the class of problems that are not candidates for erasing coordination, there is certainly a mole, and it will need to be whacked. That mole is hard to find and even harder to fix. Generally speaking, some of the problems where even finding a mole to whack is difficult seem, in theory, to be relatively simple.

Simply filtering through a database is one of the easier cases: it parallelizes nicely and is not coordination-centric. Step that case up one notch, say to counting all instances of a particular record, and things get trickier. In essence, if you tell the system to find all records about “Hellerstein,” all processors need to go to work on that problem. That part is simple. It is in the checking in that happens afterward, the coordination to ensure that all processors have finished and then agree with one another on the final tally, that the coordination bottleneck rears its ugly head.
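The difference between the two cases can be sketched in a few lines of Python. This is an illustration only, using the standard library's `concurrent.futures` thread pool as a stand-in for many processors; the partition contents and the `filter_partition` and `count_partition` helpers are made up for the example. Filtered matches are usable the moment any one partition finishes, while the count cannot be reported until every partition has checked in.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical records, split across three "processors".
partitions = [
    ["Hellerstein", "Beda", "Hellerstein"],
    ["Beda", "Hellerstein"],
    ["Beda"],
]
name = "Hellerstein"


def filter_partition(records, wanted):
    """Return the matching records from one partition."""
    return [r for r in records if r == wanted]


def count_partition(records, wanted):
    """Return how many records in one partition match."""
    return sum(1 for r in records if r == wanted)


with ThreadPoolExecutor(max_workers=3) as pool:
    # Filtering: each partition's matches are valid on their own, so they
    # can be passed along as soon as that partition is done, in any order.
    matches = []
    futures = [pool.submit(filter_partition, p, name) for p in partitions]
    for done in as_completed(futures):
        matches.extend(done.result())

    # Counting: a partial sum is not the answer. The total is only correct
    # once every partition has reported, which is the coordination step.
    partial_counts = list(pool.map(count_partition, partitions, [name] * len(partitions)))
    total = sum(partial_counts)

print(len(matches), partial_counts, total)
```

In this toy version the waiting is hidden inside `pool.map`, which does not yield a complete list until all workers are finished. At cluster scale that same wait is paid across a network, with the slowest node setting the pace for everyone.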

This is where approaches like Google’s Dataflow are setting the stage by condensing the coordination into chunks that can be shipped off to the wider system instead of clogging the traffic lanes. If this was of interest, ex-Googler Joe Beda mentions some of these elements in a piece here at The Next Platform where he talks about, well, the next platform at scale, and how refinements like those Hellerstein describes, along with tightening them into the stack, are defining next-generation systems at scale.
