Breaking Data Silos Open With An Apache Arrow Platform

Voltron Data officially launched last month with a mission to make Apache Arrow easier to use for big data analytics projects, and in particular to focus on improving interoperability with other systems. The startup, which has raised a total of $110 million in funding, spoke to The Next Platform about how it aims to change the way people interact with data.

Apache Arrow was introduced in 2016 as a framework for developing high performance data analytics applications based around a columnar in-memory layer. Since then, a growing team of Arrow developers has built up the project into a toolbox of modular libraries that provide a foundation for advanced analytics processing.
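To make the "columnar in-memory" idea concrete, here is a minimal sketch in plain Python. It does not use Arrow's own libraries; the `rows` and `columns` names and the sensor data are invented for illustration. It only contrasts storing records one by one with storing each field as a single contiguous, typed buffer, which is the layout idea a columnar format is built around.

```python
from array import array

# Row-oriented layout: every record keeps all of its fields together.
rows = [
    {"sensor": 1, "temp": 20.5},
    {"sensor": 2, "temp": 21.0},
    {"sensor": 3, "temp": 19.5},
]

# Column-oriented layout: each field becomes one contiguous, typed buffer.
# "q" is a signed 64-bit integer and "d" is a 64-bit float.
columns = {
    "sensor": array("q", (r["sensor"] for r in rows)),
    "temp": array("d", (r["temp"] for r in rows)),
}


def mean_temp_rows(records):
    """Average temperature, touching every record to pull out one field."""
    return sum(r["temp"] for r in records) / len(records)


def mean_temp_columns(cols):
    """Average temperature, reading only the one buffer that is needed."""
    temps = cols["temp"]
    return sum(temps) / len(temps)
```

Both functions return the same answer, but the columnar version scans a single packed buffer rather than walking every record, which is broadly why analytics engines favor this layout.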

Voltron Data claims to employ the largest group of Apache Arrow contributors in the industry, and has just unveiled a new enterprise subscription service for companies developing solutions with Apache Arrow. This offers what Voltron Data calls a focused set of services designed to accelerate time to success with Arrow, and comes in three editions: a Free Edition, a Dev Edition, and a Pro Edition.

Josh Patterson, co-founder and chief executive officer, tells The Next Platform that many people just don’t fully understand the breadth and complexity of the Apache Arrow ecosystem.

“In Apache Arrow, there’s a project called Skyhook that allows predicate pushdowns, and more fine-grained data access on the Ceph file system, and that’s being worked on out of UC Santa Cruz,” Patterson says. “And the project as a whole is just a very large project. There’s things in Rust, there’s things in Go, there’s things in JavaScript, Python, and Java, there’s things like Rapids that is built on Apache Arrow, there’s integration points with a whole bunch of things. So when you start thinking about building these high performance systems, you know, it’s obvious in the HPC world to think about standards, like MPI, UCX, or OpenMP. And so what if we had the same level of standards in the data analytics system? What if we also made it easier for these standards to kind of cross pollinate with each other, where high performance computing could meet traditional big data? That’s kind of what we’re trying to do.”
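For readers unfamiliar with the term, predicate pushdown means handing the filter to the storage layer so that only matching data is returned to the caller. The sketch below illustrates the general idea in plain Python; it is not Skyhook's or Arrow's API, and `STORAGE`, `scan`, and `is_server_error` are hypothetical names with made-up log records.

```python
# Hypothetical log records standing in for data held by a storage system.
STORAGE = [
    {"host": "a", "status": 200, "bytes": 512},
    {"host": "b", "status": 500, "bytes": 128},
    {"host": "c", "status": 200, "bytes": 2048},
    {"host": "d", "status": 503, "bytes": 64},
]


def scan(storage, predicate=None, columns=None):
    """Storage-side scan: filter rows and select columns before returning."""
    out = []
    for row in storage:
        if predicate is not None and not predicate(row):
            continue  # row is dropped inside the storage layer
        names = columns if columns is not None else list(row)
        out.append({name: row[name] for name in names})
    return out


def is_server_error(row):
    """Example predicate: keep only HTTP 5xx responses."""
    return row["status"] >= 500


# Without pushdown: every row and column is returned, then filtered by the client.
everything = scan(STORAGE)
client_side = [{"host": r["host"]} for r in everything if is_server_error(r)]

# With pushdown: the filter and column selection run where the data lives.
pushed_down = scan(STORAGE, predicate=is_server_error, columns=["host"])
```

Both approaches produce the same result, but in the pushed-down case only two single-column rows leave the storage layer instead of four full records.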

Voltron Data is also aiming beyond the obvious customers for high-performance analytics, such as financial services, and sees a broader role for tools such as Apache Arrow in solving the kind of data problems that most organizations will have.

“It’s natural to think about the financial services industry as being one of the leading players in fast efficient analytics. But I think that it’s just more of a general thing that we should be doing, as it’s hard to find a Fortune 500 company that doesn’t have a big data problem,” Patterson says.

“When you start thinking about things like cybersecurity, the explosion of IoT data, log data as a whole is just one of the fastest growing types of data, and we have to analyze it – we can’t just say, oh well, this is too much data, or I can’t do this, or I can’t do that. And at some point, you need order of magnitude gains to process this order of magnitude increase in data that people are collecting,” he adds.

The solution, according to Voltron Data, will combine making systems more efficient with using new hardware effectively. The company also aims to make it simpler for data to move between application silos, bringing benefits across all of them.

“I think this is kind of an exciting time in general, for data analytics as a whole, because we do have things like Rapids and other innovation happening. What if we could reduce those barriers as well? What if we could reduce the walls for allowing new innovation to come to market and we bridge these systems better? We bridge the HPC world with the data analytics world with where machine learning and AI are going. I think that’s really exciting for how we can just build better systems that provide more value,” Patterson says.

Voltron Data co-founder and chief business officer Darren Haas goes further, saying that Apache Arrow would allow data collaboration across different business units or divisions, even if those units had standardized on different tools and languages.

“So GE had all these different business units. And they all wrote in different programming languages. They wrote in Java, they wrote in R, they wrote in Python. If you take a step back and look at Arrow, it’s not just like bringing together the languages, it’s actually bringing together the business silos. If you adopt Arrow and the ecosystem, your team over here is now able to talk to another team over here as data,” Haas claimed.

According to Voltron, organizations signing up for one of its enterprise subscriptions will be able to report and track issues through a customer portal, and a team of engineers will work to resolve them.

As with other developers offering commercial support for open-source projects, Voltron is offering to backport bugfixes into the latest major release of Arrow and ship hot-fixed stable builds packaged and delivered as per requirements.
