The Yahoo Behind Fresh Deep Learning Approaches at Flickr
September 3, 2015 Nicole Hemsoth
There are few more interesting trends in infrastructure right now than the pairing of high performance computing hardware with the software tapestries of deep neural networks. And it is not simply that it is all relatively new; rather, it is that the range of tools on both fronts is shifting, moving away from hardware and software platforms that have hit the limits of their efficiency or scalability.
When The Next Platform talked to Yahoo earlier in the year about the very issue of outpacing existing technologies, it was a matter of growing hardware infrastructure to meet data demands. And as the company expands its services for divisions like the photo-centric Flickr service, it is pulling together the best of several worlds in both hardware and software to cobble together yet another platform that keeps pace at scale and delivers on the ever-growing need for real-time user services.
What’s interesting about Flickr’s approach is that it leverages as many “in house” tools as possible to get the near-real-time service to sit up and hit the latency and availability levels users expect. For Yahoo, which has been at the center of Hadoop development since the beginning, this means tapping into MapReduce frameworks and continuing to build out from all of the components that are part of that ecosystem (even if not part of Hadoop proper). This means cobbling together a framework that leverages Storm for the real-time needs, HBase for ultra-fast queries, MapReduce for the large-scale batch computations, and a Lambda architecture to tie it all together. In short, even though there might be some novel deep learning approaches for the entire image recognition and analysis pipeline used at other companies, there is something of an implicit, internal political motivation for Yahoo and Flickr engineers to use their own tools to build on their own land.
When asked how the Flickr team is creating a new path to deep learning algorithms by using tools that were not necessarily designed for the job (Hadoop and the idea of “real time,” for instance, do not necessarily go hand in hand), Bhautik Joshi, Flickr’s Data Scientist and Senior Software Engineer, tells us that much of the model training and the complex deep learning algorithms are built into the computer vision pipeline, which only handles a relatively thin (but obviously critical) slice of the overall task. The real story here is what the team has been able to do with established open source data analytics platforms and tooling. In short, Flickr has developed a customized but entirely operational real-time deep learning framework to rival those of other large companies, which, arguably, have been at this for quite a bit longer, and with the flexibility to add and ditch tools as new things become available (without being tied to a specific set of tools, as in the case of Yahoo with its Hadoop affinity).
And to give a sense of scale here, consider that the working data size for Flickr’s archive of photos (the original, full JPG files are retained for quality) is around 40 petabytes. That is not insignificant, of course, and while it has been pared down to a working, compressed dataset of roughly 3 petabytes, meshing and tailoring models is still an undertaking. And although it might not be what the Googles and Facebooks of the world are doing, Joshi says that this close coupling of 400 nodes in a datacenter, tightly integrated with a Ceph-based storage stack and Hadoop to seamlessly mesh, store, and compute in near proximity, is the only approach that makes sense. It doesn’t hurt that Hadoop is a pet project at Yahoo, but Joshi says that the team’s internal expertise in that area (and in related spin-off projects like Storm, HBase, and the rest of the Hadoop-oriented stack) made this a clear choice.
With all of this in mind, is it time to stop thinking in batch and see what role Hadoop might play in the large-scale serving of a deep learning process for a petabyte-scale user base? As you might imagine, Yahoo thinks so….
New Architectures On Established Software Bridges
Back in 2013, the first buzz around near-real-time image auto-tagging and classification hit the mainstream with Google’s approach, which was based on technology developed by Dr. Geoffrey Hinton, who had recently demonstrated impressive results in the ImageNet computer vision competition. Since that time, there have been expanded efforts in computer vision and image recognition at Facebook (which classifies 2 billion images per day in near real time), Microsoft, and others, all of which put neural networks at the core of the model training approach; the trained models can then be spun out to run on commodity clusters using more accessible, common execution models.
What is missing, at least from what we know about how Facebook, Google, and others are serving up the products of those neural networks, is Hadoop and the related ecosystem tools (HBase, Storm, etc.). So what value do Yahoo and its companion engineering base at Flickr see in this addition?
In Flickr’s case, this new class of real-time demands is driven by a new capability that provides almost instant image classification. A new service for users called “MagicView” (because hey, all of this stuff happens by magic, right?) allows for instant categorization of uploaded photos. In other words, a user uploads a photo of a sunset and, lo and behold, it is classified as such. This sounds simple in theory, but as we have discussed in previous interviews with deep learning leaders like Yann LeCun, the training and development of image recognition systems is no simple task—and as Joshi told us this week, moving from trained models to production systems (not to mention the various refreshes, re-merges, and bug fixes) is not easy either.
And while indeed, companies like Google and Facebook are taking entirely different (arguably more experimental) deep learning approaches to their real-time image recognition and classification tasks, the story here is that Yahoo and Flickr are proving early success by doing things another way. The neural nets are there at the outset, powered by that quiet acquisition Yahoo made in 2013 of a small company called IQ Engines, which is now the backend computer vision pipeline that ingests the corpus of Flickr’s archived and incoming streams to work through the initial training and base classification. The challenging part infrastructure-wise comes after this base operation—and requires an evolving round of updates and meshing to remain on top of continuous uploads.
As Peter Welch, Principal Software Development Engineer at Yahoo, tells The Next Platform, that corpus is not just one made up of all the billions of photos, either. “Each user has his or her own corpus of photos, which can be further classified and characterized. Pictures of children, pets, landscapes—all tailored, all available in real time. It’s a major challenge, in part too because the data being served is so broad,” explains Joshi. “At the highest level, and going all the way back, we had never done anything of this scope in terms of backfill before—the initial backfill was everything from as far back as 2004.”
His team went back through the 10-billion-photo archive and, using the IQ Engines stack (based on Python and C++), backfilled computer vision tags for everything. They then turned that exact same code base around to run on a different stack, powered by Storm and HBase, to handle real-time streams with nearly instant classification of incoming photos. This approach works well when they are handling tens of millions of streaming photos, but for large-scale bug fixes or revamps of the base, the Flickr team turns to MapReduce to mesh and reintegrate before rolling out a finer-tuned method of classification.
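To make the pattern concrete, here is a minimal Python sketch of the idea of running one classification code path from both a batch backfill and a streaming handler. All names and the toy classifier are our own illustrative assumptions; Flickr's actual pipeline is the IQ Engines stack running on MapReduce and Storm with HBase as the store.

```python
# Illustrative sketch only: one tagging function shared by a batch
# backfill path and a streaming path. Names are hypothetical, and a
# plain dict stands in for HBase.

def compute_tags(photo_bytes):
    """Stand-in for a computer vision pipeline; a real system
    would run a trained model here."""
    return ["sunset"] if b"sun" in photo_bytes else ["untagged"]

def backfill(archive, store):
    """Batch path: walk the full archive (MapReduce in production)."""
    for photo_id, photo_bytes in archive.items():
        store[photo_id] = compute_tags(photo_bytes)

def on_upload(photo_id, photo_bytes, store):
    """Streaming path: tag one incoming photo (Storm in production)."""
    store[photo_id] = compute_tags(photo_bytes)

store = {}
backfill({"p1": b"sunset over water", "p2": b"cat"}, store)
on_upload("p3", b"sunrise", store)
```

The point of the pattern is that a bug fix or model upgrade to `compute_tags` immediately applies to both the backfill and the live stream, which is what lets the team turn the same code base around on two different execution stacks.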
One of the interesting innovations the Flickr team developed, and has been testing since April and May of this year, is using the Lambda architecture in conjunction with HBase (which they had “on hand” given the Hadoop approach) to combine batch and real-time processing into a much faster and more efficient whole. While you can read about it in detail here, the idea is to have the bulk data in one stream and the real-time data in another, and to get a consistent result by landing both in HBase, with a separate process that lets them query the same data from one location. The Flickr engineers we talked to have an in-depth view of the architecture backing MagicView.
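A rough Python sketch of that lambda-style read path might look like the following. The variable and function names are our own, and two dicts stand in for the batch and real-time views that Flickr keeps in HBase; the key idea is that one query function merges the two, with fresher real-time results taking precedence.

```python
# Hypothetical sketch of a lambda-architecture read path: a batch
# view and a real-time view are maintained separately (both live in
# HBase at Flickr), and a single query merges them so callers see
# one consistent answer.

batch_view = {"p1": ["sunset"], "p2": ["dog"]}            # bulk/backfill results
realtime_view = {"p2": ["dog", "beach"], "p3": ["cat"]}   # fresh stream results

def query_tags(photo_id):
    """Serve the freshest available tags for a photo: prefer the
    real-time view, fall back to the batch view."""
    if photo_id in realtime_view:
        return realtime_view[photo_id]
    return batch_view.get(photo_id, [])
```

In this sketch, a photo re-tagged by the stream (`p2`) is served from the real-time view, while older photos fall back to the batch view until the next bulk recompute folds the two together.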
At the crux of this is HBase and its associated heavy region servers, which are backed by HDFS. While (interestingly) the team has decided not to use YARN, for reasons they say would take a more extensive explanation from the Hadoop team at Yahoo, the value proposition is simple.
Training the model is an iterative art, but when it comes to merging and implementing a service like this, the Flickr team, with its cluster running MapReduce and communicating through the HBase API, can handle bulk loads across its 20 region servers with throughput of around 400,000 writes per second, which, if we may editorialize for a moment, is damned impressive. At its Nebraska datacenter, where Flickr has ten 40-node racks, the team cooks its CPUs at near full utilization and keeps iterating, over and again, on that 12 billion photo set to keep upgrading and refining the computer vision tags.
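Throughput numbers like that generally depend on batching writes rather than sending them one at a time. The sketch below shows the batching pattern in plain Python; the `BufferedWriter` class is a hypothetical stand-in for the batch facilities an HBase client exposes, not Flickr's code.

```python
# Illustrative write-batching sketch: mutations are buffered and
# flushed to the store in groups, the pattern behind high bulk-load
# throughput against region servers. BufferedWriter is hypothetical.

class BufferedWriter:
    def __init__(self, flush_fn, batch_size=1000):
        self.flush_fn = flush_fn      # callable that ships one batch
        self.batch_size = batch_size
        self.buffer = []

    def put(self, row_key, tags):
        """Queue one write; flush automatically when the batch fills."""
        self.buffer.append((row_key, tags))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Ship any buffered writes as a single batch."""
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []

# Toy usage: collect batches in a list instead of a real store.
batches = []
writer = BufferedWriter(batches.append, batch_size=2)
for i in range(5):
    writer.put(f"photo-{i}", ["tag"])
writer.flush()  # push the final partial batch
```

Amortizing network round trips over many rows per request is what lets a modest number of region servers absorb bulk reloads of a multi-billion-photo tag set.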
The point is, as both Welch and Joshi described, Hadoop, HBase, Storm, and of course the HDFS that backs it all are still under continuous development. And while the team sees that there are other approaches to delivering complex models as user services, there is still a lot of life and potential left in the elephant for real-time jobs Hadoop was never designed to tackle.
A working code sample of the simplified lambda architecture that you can download and try for yourself can be found here.