Connecting The Dots With Graph Databases
October 24, 2017 Jeffrey Burt
Graph querying of data housed in massive data lakes and data warehouses has been part of the big data and analytics scene for many years, but it hasn’t always been a particularly easy process. Deriving insight from graphs has in many ways been a highly manual process, and not all data scientists have had access to the Cypher graph database query language. Executives at graph company Neo4j are looking to change that.
At the GraphConnect New York show this week, Neo4j announced it has donated an early version of its Cypher for Apache Spark language toolkit to the openCypher project, a move the company says will enable developers that use the Spark data processing framework to bring Cypher graph querying into their work. At the same time, Neo4j also rolled out its Native Graph Platform, which brings together such capabilities as analytics, visualization, discovery, and data import and transformation to the company’s graph database.
The moves will bring graph querying of databases to a wider audience and give a broader range of developers and data scientists access to an approach that uses graphs to gain a better understanding of the data, Philip Rathle, vice president of product at Neo4j, tells The Next Platform.
“Until now, the full power of graph pattern matching has been unavailable to data scientists using Spark or for data wrangling pipelines,” Rathle said in an email. “Now, with Cypher for Apache Spark, these data scientists can iterate easier and connect adjacent data sources to their graph applications much more quickly. The contribution that graphs have to make is in using connections and context to derive causality and meaning. Cypher is a declarative query language much like SQL. Up until now, the graph technologies available in the Spark and Hadoop world have been imperative, programmatic, and iterative, requiring a deep understanding of graph theory and mathematics: much in the way that SAS and SPSS enable data scientists to carry out complex statistical analysis atop tabular data. What has been missing is a ‘SQL for graphs’: usable by a layperson, and based on basic principles of pattern matching and filtering. Cypher is the ‘SQL for graphs’ that has been missing in the Spark ecosystem, making the power of graph querying available to a much larger user base.”
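To make the declarative-versus-imperative distinction concrete, here is a minimal sketch in Python. It is a toy illustration only, not the Cypher for Apache Spark API: the sample data and the `match` helper are hypothetical. The caller states a pattern and a filter, and the helper hides the traversal, which is roughly what a Cypher `MATCH ... WHERE ... RETURN` query does.

```python
# Toy illustration only: a tiny in-memory property graph and a pattern
# matcher. The data and the match() helper are hypothetical and are not
# part of Neo4j, openCypher, or Cypher for Apache Spark.
#
# The equivalent Cypher query would read roughly:
#   MATCH (a:Person)-[:KNOWS]->(b:Person)
#   WHERE a.name = 'Alice'
#   RETURN b.name

nodes = {
    1: {"label": "Person", "name": "Alice"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Person", "name": "Carol"},
    4: {"label": "Company", "name": "Acme"},
}
edges = [
    (1, "KNOWS", 2),
    (1, "KNOWS", 3),
    (2, "KNOWS", 3),
    (1, "WORKS_AT", 4),
]


def match(src_label, rel_type, dst_label, where=lambda a, b: True):
    """Return (source, target) node pairs that fit the pattern and filter."""
    results = []
    for src_id, rel, dst_id in edges:
        a, b = nodes[src_id], nodes[dst_id]
        if (a["label"] == src_label and rel == rel_type
                and b["label"] == dst_label and where(a, b)):
            results.append((a, b))
    return results


# The caller describes what to find; the loop over edges stays hidden.
friends_of_alice = sorted(
    b["name"]
    for a, b in match("Person", "KNOWS", "Person",
                      where=lambda a, b: a["name"] == "Alice")
)
print(friends_of_alice)
```

The point of the sketch is the shape of the call: a pattern plus a filter, with no graph algorithm written by the user.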
Neo4j is among a number of organizations that offer graph database capabilities, with others including SAP with its HANA platform, Bitnine with its AgensGraph graph database, and the Redis project. In addition, parties involved in the openCypher effort are working to create an open specification for the Cypher language. An array of industries use graph technology to understand connections within their data, according to Rathle, who pointed to NASA using the technology in its projects involving Mars and eBay using it for its AI-powered ShopBot smart digital personal assistant. Neo4j has more than 250 customers, including Cisco Systems, Walmart and Comcast.
Currently, graph data technology is used by people such as engineers, developers and software architects, but with the Native Graph Platform, Neo4j is looking to bring “the benefits of a native graph architecture to more users in the enterprise, including data scientists, business analysts and business managers.”
In bringing Cypher into the Apache Spark developer world, Neo4j worked with openCypher to enable greater flexibility in how graph data is shaped, Rathle said. That includes allowing graphs to be split, snapshotted and linked together in what he called “processing chains,” and enabling graph queries to operate over the results of previous graph queries. Cypher for Apache Spark also includes new multiple-graph and composable-query features that have come out of the work by openCypher, which is hosting Cypher for Apache Spark as alpha-stage open source under the Apache 2.0 license.
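The idea of a processing chain, in which one graph query runs over the result of another, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not openCypher's multiple-graph syntax: the `project` and `reachable_from` functions and the sample data are hypothetical. What matters is that each step takes a graph and returns a graph, so steps can be chained.

```python
# Toy sketch of composable graph queries. A graph is a (nodes, edges) pair,
# and every query returns a new graph, so queries can be chained.
# project() and reachable_from() are hypothetical names for illustration.

nodes = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
    3: {"name": "Carol"},
    4: {"name": "Dave"},
    5: {"name": "Acme"},
}
edges = [
    (1, "KNOWS", 2),
    (2, "KNOWS", 3),
    (4, "KNOWS", 1),
    (1, "WORKS_AT", 5),
    (3, "WORKS_AT", 5),
]


def project(graph, rel_type):
    """Split out the subgraph made of one relationship type."""
    all_nodes, all_edges = graph
    kept = [e for e in all_edges if e[1] == rel_type]
    ids = {i for src, _, dst in kept for i in (src, dst)}
    return ({i: all_nodes[i] for i in ids}, kept)


def reachable_from(graph, start_id):
    """Keep the nodes reachable from start_id along outgoing edges."""
    all_nodes, all_edges = graph
    seen = {start_id}
    frontier = [start_id]
    while frontier:
        current = frontier.pop()
        for src, _, dst in all_edges:
            if src == current and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    kept = [e for e in all_edges if e[0] in seen and e[2] in seen]
    return ({i: all_nodes[i] for i in seen if i in all_nodes}, kept)


# Chain: the second query operates on the graph produced by the first.
social = project((nodes, edges), "KNOWS")
alice_network = reachable_from(social, 1)
names = sorted(n["name"] for n in alice_network[0].values())
print(names)
```

Because every step returns a graph rather than a table of rows, intermediate results such as `social` can be kept as snapshots and reused by later steps.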
The new Native Graph Platform brings performance boosts to the company’s Neo4j 3.3 Database, which makes greater use of native indexing and has a reworked Cypher query interpreter. The 3.3 Database has write and update performance that is as much as 55 percent faster than version 3.2 and 700 percent faster than version 2.2. The new database also includes support for intra-server encryption for all work and across regions and cloud zones, advanced analytics for AI and an Enterprise Edition that will have extract, transform and load (ETL) capabilities.
The ability of Native Graph Platform to pull in other players in the enterprise will bring benefits to those companies, Rathle said.
“First, it helps them better utilize their data across their entire organization, uncovering new connections than they could previously,” he said. “Secondly, it helps them prepare for a more connected future, whether that involves machine learning, intelligent devices and real-time activities like conversational commerce. Much like we have learned in the RDBMS world, the various actors who engage with the database each have a variety of specific needs of the database. Having a set of tools, connectors, and utilities that support each of the various roles so as to more easily fulfil those needs helps organizations gain more value from graphs in a shorter amount of time.”