We tend to cover the various platforms that are being used at massive scale to tackle pressing enterprise and research problems, but once in a while, something out on the edge that shows how these platforms are being deployed catches our eye. And large-scale processing of distributed video and image data streams across a range of applications for potentially limitless purposes counts in that category.
Perhaps it’s just conditioning, but hearing the word “harvesting” applied to any data mining task immediately sets many people on edge. It sounds particularly sinister when that threshing of data applies to something as ubiquitous as networked cameras. There is a passive acceptance that almost anything can and might be recorded anywhere, but that tends to play out on the public stage when a recording is used to validate an act or event—it ends with the public posting on a social or video site.
But now, thanks to advances across the board (from the capabilities of these networks of cameras, to access to ultra-affordable storage and processing in the cloud, to advances in APIs that make it easier to tap into an ever-broadening array of sources), camera data can mean far more than ever. And the possibilities are endless.
Network cameras are those that are constantly connected to the internet (for both public and private viewing) and can include everything from highway and traffic cameras, to those set to monitor parks, construction sites, government buildings and so forth. There’s nothing shocking in the fact that these are found almost anywhere—and unlike the old days where the “footage” was simply archived in case of an incident, these real-time streams are almost instant, always on, and cheaper than ever to store for posterity.
On that front, as a team of researchers from Purdue University noted after calculating estimates on data storage for large volumes of MJPG and MPEG-4 information, “even though it is not cheap, it is technically and financially feasible for an organization to store the data from all the cameras in the world.” Even as the resolution of both data types (and new ones as they might emerge in coming years) increases, the cost is still within reach for large companies.
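To get a feel for the kind of arithmetic behind a storage estimate like that, here is a minimal back-of-envelope sketch. It is not the Purdue team's calculation: the function names are made up for illustration, and the camera count, frame size, and frame rate are hypothetical inputs chosen only to show how the numbers multiply out.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def yearly_storage_bytes(num_cameras: int, bytes_per_frame: int,
                         frames_per_second: float) -> float:
    """Raw bytes needed to keep every frame from every camera for one year."""
    return num_cameras * bytes_per_frame * frames_per_second * SECONDS_PER_YEAR


def to_petabytes(num_bytes: float) -> float:
    """Convert bytes to decimal petabytes (1 PB = 10**15 bytes)."""
    return num_bytes / 10**15


if __name__ == "__main__":
    # Hypothetical inputs, not figures from the researchers:
    # 10,000 cameras, 100 KB per frame, one frame per second.
    total = yearly_storage_bytes(num_cameras=10_000,
                                 bytes_per_frame=100_000,
                                 frames_per_second=1.0)
    print(f"{to_petabytes(total):.1f} PB per year")
```

The estimate scales linearly in every input, so doubling the resolution (and with it the bytes per frame) simply doubles the bill, which is why rising resolution raises the cost without changing the shape of the problem.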
So now that we have established, with adequate justification, why the word “harvesting” is particularly chilling in this context, the fun question becomes how this can be done.
Data from the public cameras is easy to tap into, but not so much for the HTTPS-protected portion of those networks. Massive companies are the only ones who might want to make use of these networks of watchers, and so let’s pop off the paranoia train for a moment and look at how data from these cameras might be used.
The Purdue team points to a series of innocuous uses of their proposed cloud-based system, which provides an API to allow for the analysis of millions of images from many thousands of cameras within a matter of hours. One example is to “help the environment”: cameras can track and monitor changes in flora and fauna, the movement of water, and animals and their various patterns, which is beneficial for researchers and, unlike the five thousand other uses for this, is not at all alarming.
So, in the interests of advancing necessary and useful adoption of the vast global web of networked cameras and their rich, accessible data for the health of the environment and nature (ahem), the team clearly laid out how this is best addressed. There are a few problems along the way, all of which are pretty common in the so-called big data world. They span ingestion, processing, storage, and I/O: the files come in multiple resolutions and varying types, and meaningful data has to be sorted out of them and meshed together to track and monitor individual elements or “themes” within these image and video data streams and stored pools.
To begin with, as the researchers point out, not all of the networked camera data is archived or stored; some is simply streamed (à la webcams monitoring traffic). Second, even if it were stored, it might be socked away in distributed locations where retrieving it would mean working with more formats and protocols. Third, when it comes to taking advantage of these data for the health and safety of animals and the planet, of course, the processing of those data volumes is not insignificant.
If we did not live in the age of infrastructure as a service, it would require a pretty large cluster to work through I/O and compute-heavy video analysis. The cloud is relatively inexpensive for short bursts of use like gathering, processing, then cold storing the results—and said storage in the petabyte range is still quite affordable.
Ideas around using these cameras for larger studies are not necessarily new, but the problem for, say, an ecologist studying weather and habitat patterns is that she must also become a part-time computer scientist. If there can first be a central repository of all the available networked cameras in the world (one big hurdle), that narrows the search gap, but things get more complicated after that. Researchers then need to be able to efficiently retrieve the data, store it for long-term or streaming analysis, make sense of the output (visualization, etc.) and, of course, manage the computational resources for these tasks, which itself is no picnic on AWS or any other public cloud provider’s iron.
The goal of the Purdue team was to package these tasks up, create the repository, and let the real analytics work begin without encumbering researchers with infrastructure. The solution is an API that lets them home in with their code on certain features: temporal ones, changes in frame rates, alterations in color or activity, and so forth. As they describe, “Researchers can select the cameras for their studies without knowing the details of the cameras. The interface can retrieve individual images from cameras in JPEG or MJPG [currently integrating H.264].”
“This API completely hides the details of data retrieval. Users do not have to handle the different network protocols needed to retrieve the data from heterogeneous cameras…A user may submit a program that analyzes images. This program is copied to cloud instances which retrieve data from the cameras and execute the analysis.”
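The pattern the researchers describe, where retrieval details are hidden and the user supplies only the analysis, might look roughly like the sketch below. To be clear, this is not the project's actual API: `Camera`, `RETRIEVERS`, `run_analysis`, and the fake fetch functions are all hypothetical names invented here, and the "retrieval" just returns placeholder bytes rather than touching a network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Every name below is hypothetical; none is taken from the Purdue system.

Frame = bytes


@dataclass(frozen=True)
class Camera:
    camera_id: str
    image_format: str  # e.g. "JPEG" or "MJPG"


def fetch_jpeg(camera: Camera) -> Frame:
    # Stand-in for requesting a single still image from the camera.
    return b"jpeg-frame-from-" + camera.camera_id.encode()


def fetch_mjpg(camera: Camera) -> Frame:
    # Stand-in for pulling one frame out of an MJPG stream.
    return b"mjpg-frame-from-" + camera.camera_id.encode()


# Format-specific retrieval lives behind a single lookup table.
RETRIEVERS: Dict[str, Callable[[Camera], Frame]] = {
    "JPEG": fetch_jpeg,
    "MJPG": fetch_mjpg,
}


def run_analysis(cameras: List[Camera],
                 analyze: Callable[[Frame], object]) -> Dict[str, object]:
    """Fetch one frame per camera and apply the user's analysis to it."""
    results = {}
    for camera in cameras:
        retrieve = RETRIEVERS[camera.image_format]
        results[camera.camera_id] = analyze(retrieve(camera))
    return results


# The "submitted program" is just a function of a frame; it never sees
# which format or protocol the frame arrived over.
def frame_size(frame: Frame) -> int:
    return len(frame)


cameras = [Camera("cam-1", "JPEG"), Camera("cam-2", "MJPG")]
sizes = run_analysis(cameras, frame_size)
```

In a real deployment the loop would presumably be spread across cloud instances rather than run in one process, but the division of labor is the point: supporting a new format means adding one entry to the table, and no user code changes.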
As a side note, there are already sixteen pre-written analysis engines as part of the project that can help users tap into key features they want to take notice of. Corner, motion, sunrise, and other detection is built in, and users can extend it with their own code. As of now, users are limited to Python with OpenCV.
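For a sense of what a simple motion detector can look like, here is a minimal frame-differencing sketch. It is an assumption about one common technique, not the project's own motion engine, and the function names, threshold, and synthetic test frames are made up for illustration. It uses plain NumPy arrays, which is also how OpenCV's Python bindings represent images.

```python
import numpy as np


def motion_fraction(prev_frame: np.ndarray, curr_frame: np.ndarray,
                    threshold: int = 25) -> float:
    """Fraction of pixels whose grayscale value changed by more than `threshold`."""
    # Widen to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(np.mean(diff > threshold))


def has_motion(prev_frame: np.ndarray, curr_frame: np.ndarray,
               threshold: int = 25, min_fraction: float = 0.01) -> bool:
    """Report motion when at least `min_fraction` of the pixels changed."""
    return motion_fraction(prev_frame, curr_frame, threshold) >= min_fraction


# Two synthetic 100x100 grayscale frames: a bright 20x20 square
# appears in the second one.
background = np.zeros((100, 100), dtype=np.uint8)
with_object = background.copy()
with_object[40:60, 40:60] = 200
changed = motion_fraction(background, with_object)
```

Both thresholds are knobs rather than constants: outdoor cameras see lighting drift and sensor noise, so a real study would likely tune them per camera or difference against a running background instead of the previous frame.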
You can give it a whirl in the context of over 50,000 cameras here and check out the demos of non-nefarious uses of this project while biding your time until you try to take over the world.