Today’s episode of “The Interview” with The Next Platform focuses on the evolution of stream processing, from its early days to the present, when vast volumes of social, financial, and other data challenge data analysts and systems designers alike.
Our guest is Nathan Trueblood, a veteran of companies including Mirantis, Western Digital, and EMC, and currently VP of product management at DataTorrent. The company is made up largely of ex-Yahoo engineers who worked on the Hadoop platform and have pushed that framework toward real-time requirements with Apache Apex.
Trueblood’s career has roots in high performance computing, but as many in HPC already know, this field is as much about efficient data management and processing as it is about the systems and software stacks that make it all happen.
In the audio interview above, we talk about the technical trickle-down from supercomputing to the rest of the enterprise analytics world; how much life is left in open source data frameworks like Hadoop and its Apache stream processing offshoots, including Storm and Apex; whether the next bottleneck for stream processing lies in hardware or software; and how to move past the legacy habit of thinking about stream processing from a batch processing perspective.
The conversation also covers how use cases and markets for stream processing are evolving. Financial services has long been a core market, but with the rise of IoT, ever more powerful recommendation and personalization platforms, and growing volumes of data from social and other user-generated sources, stream processing is more important than ever. The challenge now is building balanced systems and flexible frameworks that are both lightweight and robust.
Trueblood says DataTorrent’s biggest stream processing customers deploy almost exclusively on-premises. However, because many of the integrated tools for machine learning and other capabilities live on public clouds, smaller, more nimble stream processing workloads run in the cloud, even if that comes with a latency penalty.
All of these topics, including how machine learning will fit into the existing cadre of stream processing methods and tools, are addressed in the podcast, available in the player above and, of course, delivered directly to you via iTunes.