A Decade of Hadoop: Creator Cutting on the Right Place for the Right Time
January 28, 2016 Nicole Hemsoth
Without discounting the extraordinary amount of work and creativity that goes into developing the next big architectural trend, there is something to be said for being at the right place at the right time—and having the perfect blend of the correct skills and experiences to meet the moment.
Hadoop creator, Doug Cutting, had more than a few words to say about the nature of luck, the particular blend of trends and solutions, and the needs of a changing technology and business landscape when he talked with The Next Platform about how the last decade of Hadoop has both shaped—and been shaped by—the wide-ranging, swiftly-moving, and all-encompassing meaning of data in the enterprise.
While much of the work he and his early collaborators at Yahoo and elsewhere did was focused on the needs of then-emerging Silicon Valley companies like Facebook, Twitter, LinkedIn, and many others, the migration of Hadoop from that point to general enterprise datacenters is a by-product of coalescence: a peculiar blend of the right conditions for the right time, among them, he admits, pure luck.
Of course, luck is a strange word here. Far more valuable in the equation is the ability to foresee how a relatively isolated project could potentially have much farther-reaching impacts. When Cutting and his early collaborator, Mike Cafarella, were working on the Nutch search engine, shortly after Google’s publication of its MapReduce methods in 2004, Cutting started to see the first glimmer of a larger opportunity, and the possibility of an ecosystem. After all, he says, his open source roots taught him that projects are not developed as comprehensive applications, but rather as components, with the expectation that some stitching together of pieces is required, expected even, to create something of broader functionality.
“This assembling of open source components from different open source projects was the context we created Hadoop in,” Cutting tells The Next Platform. “Components would naturally be embedded in other things; these were not standalone things. Pieces were meant to be built into other things. So folks at Facebook and Yahoo began to build an ecosystem around it versus reinventing it and solving again these hard problems—and doing it in a distributed, reliable way. They could build on top of Hadoop to handle that distributed level of the problem and then focus on other specialized aspects.”
The platform evolved from there, but Cutting says aside from recognizing early enough that he and his collaborator would need an institution that could feed them enough data, engineering knowledge, and hardware (which would be Yahoo in the 2005 timeframe), the next major step was Cloudera—the company that took the fledgling Hadoop and saw an opportunity to move it from Silicon Valley to Main Street, as he puts it.
“This move by Cloudera to give Hadoop a life outside of Silicon Valley is important. It’s really hard to imagine now why that was difficult to imagine then. But I came from this tradition of working on search engines and research institutions where we wrote software that was very different and shared few characteristics from what the Fortune 500 and enterprises were using. So much of it was based on relational databases and web companies didn’t really use them a lot. It was hodgepodge, a mix, much of it open source. From these two traditions, it was hard to imagine how this stuff could connect—how this hacker tradition could be embraced in the enterprise.”
Cloudera’s push to build an enterprise-class engine, along with that of the subsequent ventures aiming at the same thing (MapR and Hortonworks as the prime examples), had a tough road ahead. “It took an incredible amount of work to build all the connectors to existing software, to add the reliability and usability features that people expect from enterprise software. And it’s really quite amazing where it’s gone.”
“Now we have a situation where people expect enterprise software to be open source. They no longer expect everything to be in a relational database—and these are huge changes in the last ten years.”
“Yes. Absolutely. I was at the right place at the right time.”
“It was entirely fortuitous too that I had the right background. I was working with search engines, I could recognize the value of what Google was doing, had experience in open source—and these things are really chance. It’s chance that at the time we read the Google papers and could then put things together. But there were a number of trends at that moment too that made it all take off.”
At the heart of these trends, and it gets tiresome to use the phrase “big data” too often, is, well, big data. Remember when that was the new thing, the obvious thing, the thing definition-oriented folks argued about from 2008 until the last couple of years? It was a “thing” because it presented a wide array of unsolved problems against what looked like a limitless set of opportunities across a vast range of market segments. It needed its own infrastructure: its own tools, software, systems, and engineering prowess. And it needed all of that to be open source, open to components, additions, and extensions. In short, an ecosystem. This played out over a number of years, and while certain technologies took center stage, Hadoop chief among them, it has in fact been a holistic movement, “a sea change,” as Cutting said definitively.
As Hadoop emerged from a quiet project inside Yahoo, and with growing ferocity as it moved under the Cloudera banner, and far louder still with the emergence of new distribution vendors adding greater robustness to the platform, so too was business changing rapidly. Businesses that never considered themselves technology-driven were suddenly struck by a sense that, in fact, being technically aligned was a strategic imperative. Alongside Hadoop, and in some ways mirrored by (and mirroring) it, the concept of the new digital business emerged, or more appropriately, exploded. With such shifts, from healthcare and manufacturing to consumer goods and services and beyond, data was being generated, tabbed, fed, and processed. And handling it required more of a Google-style (although not Google-scale) approach to distributed computing than an Oracle-like, relational database approach.
“This is the future of most companies now and that’s a tremendous trend for this century. This transformation of every business becoming a digital business at its core. And data plays a very big role there. It’s about having the capability to store and analyze the data,” Cutting notes, emphatically stressing a culmination of trends that went far beyond business transformation, Hadoop as a platform, changes in how companies thought of open source, and their changing expectations. In short, he says, nothing happened independently; all of these things coalesced, fed by, enabled by, and reactive to the existence of new platforms like Hadoop.
Interestingly, for a technology set as wildly hyped as the one Hadoop encompasses, there is a dual meaning to the “right place at the right time” angle. While the last decade, and more thoroughly the last five years, with its wealth of new distributions, use cases, supporters, and detractors, has proven how a project can get off the ground quickly under certain conditions, the technology itself is bound by the same concept. Hadoop is not a pure fit for all workloads, certainly. When it is, and when the integration, security, performance, and efficiency are clear, it is the right tool for the right time. Cutting’s goal for Cloudera, and for the project he spearheaded, is to watch it dip into wider pools as the ecosystem becomes more robust. This next layer to the “right place at the right time” on an adoption, implementation, and technology level is the subject of a piece this afternoon where we weigh that against more concrete data about the nature, health, and viability of Hadoop’s next ten years.
The customer numbers and cluster counts, which we delve into in a great deal more depth with other founding members of the Hadoop community, may not create the sense that the hype met reality, especially in the last five years. But as a technology story, a tale of skill, luck, and seeing an opportunity in the wake of emerging trends, taking a higher-level view of the decade of progress seems quite worthwhile.