Keeping track of the many graph analytics platforms and databases that exist, and weighing their relative advantages, could be a full-time job. An informal count of relatively well-known offerings brings the total well past sixty. Most are vendor-backed efforts, but the list also includes some Apache-licensed projects, including Orly, Cayley, ArangoDB, and a handful of others.
As Adam Kocoloski, CTO of IBM’s Cloud Data Services, tells The Next Platform, “When we think about the market for graph database systems, the fact is, there are quite a few of them out there and this has led to people being a bit reticent to experiment with them. There are a lot of use cases and problems that are amenable to graph analysis but with so much variety in the market, all of this has a slower adoption track than it could.”
Kocoloski understands well what it means to bring a product to market in an already-crowded ecosystem. Before taking on his role in the cloud data division, he was CTO for IBM Cloudant, the company’s managed NoSQL database-as-a-service offering, which was devised and launched at a time when the options in that arena were also exploding. He does say, however, that by tuning existing frameworks within the IBM SoftLayer cloud, IBM has a chance to differentiate itself and stand out from the rest by integrating user data across the analytics pipeline.
Just as IBM has made investments in key open source projects in recent years, with Hadoop and Spark as the most touted examples, the company is getting behind another Apache project to bolster its graph appeal. This project, called TinkerPop, is in the incubator stage but has significant weight behind it, in part because its creators, who also spun out another graph project called TitanDB and the Gremlin interface, have managed to collect strong support. Kocoloski says the platform has broad buy-in across the industry. It will be promising, he adds, if IBM can help drive it forward as the de facto standard and do a lot of work under the covers so that users can tap into it without redesigning their applications every time a new innovation comes along. “It has the potential to do for graph engines what SQL did for relational database management systems.”
The most likely users of such a service will be developers seeking to add graph traversals directly into applications to build things like recommendation engines, fast online risk assessments, and fraud detection platforms. The goal is to allow for such online, individual traversals as opposed to the types of bulk synchronous processing that one finds in increasingly common graph frameworks like the Facebook-developed Giraph or GraphX, Spark’s API for graphs and graph-parallel computation.
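To make the distinction concrete, here is a minimal sketch of an online, individual traversal of the kind a recommendation engine might run for one user at request time. It is plain Python over an in-memory dictionary, not IBM Graph, TinkerPop, or Gremlin code; the purchase data and the names `PURCHASES`, `build_reverse_index`, and `recommend` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical purchase data (an assumption for this sketch): user -> items bought.
PURCHASES = {
    "alice": {"book", "lamp"},
    "bob": {"book", "desk"},
    "carol": {"lamp", "desk", "chair"},
    "dave": {"sofa"},
}


def build_reverse_index(purchases):
    """Map each item to the set of users who bought it (the reverse edges)."""
    buyers = {}
    for user, items in purchases.items():
        for item in items:
            buyers.setdefault(item, set()).add(user)
    return buyers


def recommend(user, purchases, buyers, limit=3):
    """Walk user -> items -> other buyers -> their items, starting from one vertex.

    Only the neighborhood of `user` is touched, which is what separates an
    online traversal from a bulk job that visits every vertex in the graph.
    """
    owned = purchases.get(user, set())
    scores = Counter()
    for item in owned:
        for other in buyers[item]:
            if other == user:
                continue
            for candidate in purchases[other]:
                if candidate not in owned:
                    scores[candidate] += 1
    # Rank by co-purchase count, breaking ties by name so the order is stable.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:limit]]


BUYERS = build_reverse_index(PURCHASES)
print(recommend("alice", PURCHASES, BUYERS))
```

A bulk synchronous framework would instead compute something for every user in one pass over the whole graph; the sketch above answers a single question for a single user, which is the access pattern an application embedding graph queries would typically need.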
“There are a lot of problems that are certainly amenable to graph problems, but with so much variety in that ecosystem, it’s been slower to adopt than we’d like to see. So the barriers we want to remove, those of installing, deploying and managing a graph database, are meant to encourage experimentation,” Kocoloski says. From recommendation engines and fraud and risk analytics to generalized routing in geospatial applications and, as always, intelligence work, there is a market for graph analytics, but it is a space that has not yet fully shaken out its winners and losers. Having a common platform on which to experiment, whether on an on-site cluster with some centralized data management or in IBM, Amazon, or other clouds, appears to be the key. It then boils down to which vendor or approach can provide the best way for users to snap together analytics from graph databases with Spark streaming data or other silos.
That larger goal was at the heart of IBM’s set of announcements this morning that expand the Cloud Data Services portfolio. Big Blue has added 25 new services to its Bluemix platform, designed for both analytics and, more importantly, data management across many moving parts. These include a central feature called IBM Compose Enterprise, which IBM describes as a “managed platform to help development teams build modern, web-scale applications faster by enabling them to deploy business ready open source databases quickly on their own dedicated cloud servers.”
In addition to the managed graph database service based on TinkerPop (called IBM Graph), IBM has also added a new predictive analytics platform and a new open data exchange, a repository of more than 150 public datasets that can be merged into applications.
Kocoloski admits that the space is crowded and says IBM is always surveying the ecosystem to decide where to invest effort. The areas where graph database approaches have good traction, he says, tend to be solid use cases, and those in turn feed the need for more robust solutions and for investment on IBM’s part in creating more manageable graph frameworks.