The arrival of augmented reality for the masses is imminent with Google Glass and other competing devices, but in many respects, the makers of these devices have put the proverbial cart before the horse. There is still a long wait ahead while the infrastructure backbone matures enough to support the diverse array of workloads that must run in tandem in near real time. And it’s not just a matter of application developers getting on board; the shortfalls begin in the datacenter.
According to Chris Rossbach, a Microsoft Research veteran who has been working on virtual and augmented reality systems for years (including on the early team that developed the Kinect), the “devil is in the details” when it comes to widespread adoption of augmented reality. Now with VMware Research, he and his team are exploring just how deep the chasm is between what augmented reality demands of the system stack and what has actually been implemented.
“Even if the devices that give us what we imagine is an augmented reality experience are there, which they are, most of the rest of the system stack is not. A lot of this is far harder to build than people realize. And while everyone knows what augmented reality is, no one can really define it, we are all just working from the devices and now toward building that interactive, immersive blend of the real and virtual worlds. But there’s a lot missing there.”
At the core level, there is a dramatic need for diverse processing elements that complement the CPU. In fact, with many of these ideally configured systems (as with other deep learning machines) the CPU is more of an orchestra conductor, handling data movement rather than doing a great deal of number crunching. Instead, multicore and specialized parallel processors are tasked with a majority of the recognition and processing, which makes these devices fast, but complicates the programmatic and hardware environments a great deal.
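The orchestration pattern described above can be sketched in a few lines. This is a toy illustration only: the worker names (`gpu`, `dsp`, `video_codec`) and the routing policy are invented stand-ins, not any real system's API. The point is the division of labor, where the CPU thread only classifies work and moves data between queues, while the “accelerators” do the actual processing.

```python
"""Toy sketch of the 'CPU as orchestra conductor' pattern, assuming
invented device names and a trivial routing policy."""
import queue
import threading

# One work queue per specialized processing element (stand-ins only).
queues = {name: queue.Queue() for name in ("gpu", "dsp", "video_codec")}
results = queue.Queue()

def accelerator(name, q):
    # Stand-in for a device that crunches whatever the CPU hands it.
    while True:
        task = q.get()
        if task is None:  # shutdown sentinel
            break
        kind, payload = task
        results.put((name, kind, payload.upper()))  # pretend "processing"
        q.task_done()

def route(kind):
    # The CPU's role here is orchestration: pick a device, don't compute.
    return {"image": "gpu", "audio": "dsp", "video": "video_codec"}[kind]

workers = [threading.Thread(target=accelerator, args=(n, q))
           for n, q in queues.items()]
for w in workers:
    w.start()

# The "CPU" dispatches incoming work to the right processing element.
for kind, payload in [("image", "frame0"), ("video", "clip0"), ("audio", "mic0")]:
    queues[route(kind)].put((kind, payload))

for q in queues.values():
    q.join()       # wait for each device to drain its queue
    q.put(None)    # then tell it to shut down
for w in workers:
    w.join()

processed = []
while not results.empty():
    processed.append(results.get())
for item in sorted(processed):
    print(item)
```

The dispatcher itself does almost no computation, which is why a general-purpose CPU suffices for that role even when the bulk of the recognition work demands parallel silicon.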
As Rossbach told The Next Platform, “All of this missing distributed infrastructure to support what people want from augmented reality is making a strong case for heterogeneity that is not as strongly made in other areas. The different kinds of compute in an application like this are not all well-served by a CPU; what works here might be a range of elements, from highly parallel processors like GPUs, other specific accelerators for video or compression, these sorts of things.”
“There’s a programmability and front end issue. There’s a heterogeneity problem. There needs to be a way to write applications and have a runtime that can hide some of the topological issues around distributed computing for these augmented reality goals.”
It is likely that as augmented reality devices reach wider markets, most, if not all, of the work will run in cloud-based datacenters, which poses some interesting questions, because the very economics that make the cloud viable are shot to hell once one starts tweaking the architectures to support specialized tasks. Different parts of the augmented reality workload run best on a combination (for different sections of code or application components) of GPUs (which themselves require major tuning at scale to support deep learning and image/video recognition), FPGAs, DSPs, and tuned accelerators for key tasks like video, image, and facial recognition. Unless there is a strong financial argument to be made for building out these custom machines, it is unlikely any of the big cloud providers, at least at this point, will be interested. The good news, however, is that many of these processing elements can be spun out to support different services. So, for instance, Google might see value in building a cluster to support augmented reality services because it can use the image recognition or video analysis features on some of those machines to support other Google services.
This is still a ways off, Rossbach guesses, but in his team’s evaluation of how much tuning is required at the core and cluster level to run augmented reality jobs in a high performance environment and at a price point that makes sense for the infrastructure provider, it became clear how much of a development investment these systems are. But if cloud infrastructure companies can turn one project’s costs into the benefit of another service, that might be a golden key to actually support widespread use of augmented reality devices.
For the infrastructure providers, it’s a matter of cost, but for users, it’s about having the immersive experience, and that means near real-time responses to input. “The question here is how we can provide the level of multi-tenancy required to support all of these users in a high performance way. While virtualization is high performance, nothing can compete with bare metal, but this is its own challenge, especially since what is needed doesn’t fit the commodity norm in these datacenters,” Rossbach says.
Making sure the next generation of heterogeneous architectures are present and primed to support new streams of augmented reality applications is only one part of the challenge ahead. The software stack for this area is also complex, in part because augmented reality requires a range of different applications that are both interdependent and distinct—meaning that meshing the codes to work together, even on suitable hardware with the appropriate accelerators, is no easy task. Rossbach and his team have developed a framework called Albatross to help create a reference for building both the hardware and software stacks to support augmented reality on existing devices like Google Glass, powered in part by a toolset called Dandelion, which lets users write .NET sequential code, then figures out how to parallelize that automatically to run on a GPU cluster.
“There is overhead with this,” Rossbach explains. “You can’t just take Dandelion out of the box and use it to solve this problem, but we’re showing that the front end programming problem can at least start to be addressed. There is quite a constellation of codes involved: Java for the Android portion, specialized accelerators for image detection, a bunch of C and C# code for the network, and some C and C++ code for facial recognition.”
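The appeal of the Dandelion-style model is easiest to see side by side. As a loose analogy only (the real system takes sequential .NET/LINQ code, not Python, and the function names below are invented for illustration): the developer writes one plain, sequential-looking pipeline, and a runtime in the Dandelion mold would carve that pipeline into stages and fan it out across a GPU cluster without the programmer writing any parallel code.

```python
"""Loose Python analogy for a Dandelion-style model: one sequential
pipeline, with parallelism supplied by a runtime rather than the
programmer. All names here are illustrative assumptions."""
from concurrent.futures import ProcessPoolExecutor

def detect_face(frame):
    # Stand-in for an accelerator-friendly recognition kernel.
    return frame["id"], "face" in frame["tags"]

def run_pipeline(frames, parallel=False):
    if parallel:
        # What a Dandelion-like runtime does on the programmer's behalf:
        # the same map, fanned out across workers (GPUs, in the real system).
        with ProcessPoolExecutor() as pool:
            return list(pool.map(detect_face, frames))
    # What the programmer actually writes: a plain sequential map.
    return [detect_face(f) for f in frames]

frames = [{"id": i, "tags": ["face"] if i % 2 else []} for i in range(4)]
print(run_pipeline(frames))
```

The overhead Rossbach mentions lives in that hidden branch: the runtime must decide how to partition data, where each stage runs, and how results flow back, which is exactly the part that does not come for free out of the box.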
While some of this might sound a bit dire in terms of the lacking distributed infrastructure to support wide adoption of augmented reality devices, it is not all doom and gloom. In fact, although Rossbach would not name the companies, he says that there are large-scale infrastructure providers who are already getting on board with heterogeneity, since the diverse hardware and software stacks needed to support the complex range of augmented reality applications can also be spun out to support other deep learning application areas.
“What is good for them to be able to support augmented reality is also likely good at doing video and image analysis, so it’s not just about needing to build systems just for this purpose.”