MapR Technologies has been busy in recent years build out its capabilities as a data platform company that can support a broad range of open-source technologies, from Hadoop and Spark to Hive, and can reach from the data center through the edge and out into the cloud. At the center of its efforts is its Converged Data Platform, which comes with the MapR-FS Posix file system and includes enterprise-level database and storage that are designed to handle the emerging big data workloads.
At the Strata Data Conference in New York City Sept. 26, company officials are putting their focus on the database, rolling out a range of features for the MapR-DB technology that are aimed at making it easier for developers to manage rich, data-intensive applications. The features include greater support for secondary indexes and automation in applications, integrated machine learning (ML) and real-time processing, and Open JSON Application Interfaces (OJAI) 2.0 APIs. The goal is to give developers the tools they need for the latest generation of applications, according to Jack Norris, senior vice president of data and applications for MapR.
“What the advances [in MapR-DB] kind of add up to is really supporting the next set of applications,” Norris told The Next Platform. “Whether you call those context-rich, data-intensive, whether you call them translitic applications … it’s basically, ‘Ive got in place analytics and I’m applying those in real time to better add intelligence to my business processes, whether that’s top-line and customer-oriented or whether it’s risk and fraud [or] whether it’s operations.’ … There’s a lot of pressure on databases as they scale. What we have is a collection of different options that an organization or a developer can use.”
The pressure on databases will continue to grow as the amount of data being generated increases rapidly and businesses look to analyze the data in near real time to drive deeper insights and make decisions based on those insights to more quickly roll out products and services for their customers. Bigger data sets are helping to fuel the trend toward ML and AI to drive faster analytics of the data. The impact of the data growth can be seen in the rise in money being spent on big data and analytics technologies. IDC analysts are forecasting that spending in the market will hit $150.8 billion this year, which is a 12.4 percent jump over 2016. That growth will hold relatively steady through 2020, when spending will reach more than $210 billion.
The need for new data-intensive business applications that are intelligent, high performing and scalable are driving many of the new features MapR is putting into its database. Businesses need applications that enable them to be able to analyze what customers are doing in real time as the web pages they have clicked on are loading, Norris said.
“What this means from the database standpoint is support for multiple secondary indexes on the database,” he said. “It means I can query it from many different perspectives so I can get a response. So it might be across a returning customer or it might be a segment or it may be products they’ve already purchased. … So the ability to have those secondary indexes means that on a massive database I can get a sub-second response from a variety of angles. To do that appropriately and have that auto-propagate and auto-manage and auto-scale and have … the query engine determine what’s the right index to use, all of that is built into the product.”
The OJAI 2.0 APIs and smart query execution enable developers to directly query self-describing documents that are what Norris called a “de facto standard” for mobile and internet of things (IoT) applications. Other features include using the integrated SQL engine to enable in-place self-service SQL to explore data in the secondary indexes and to support automated applications with self-service business intelligence.
“It’s got a sort of in-place support and the ability to have the SQL engine support not only key value store but to have that SQL engine sit on JSON directly,” he said.
The real-time processing and ML includes native Spark and Hive connectivity, and real-time data integration and capture global data change is important at a time when the data is continuous and streaming. Businesses need to integrate with those data flows and act quickly on it, Norris said.
“But also, the database might be part of a broader process, so level-change data capture means that any update of the database targeting downstream applications, whether that’s a remote database or another stream or an update to a search index … there’s a series of features related to that,” he said.
The innovations to MapR-DB will be part of MapR 6.0, which is scheduled for release in the fourth quarter. It can be used on-premises or via the cloud.
Norris said MapR officials launched MapR-DB after seeing that a lot of customers were using Apache HBase, a Hadoop database for big, distributed environments.
“They had a lot of issues with HBase, and even though our underlying file system data platform was enterprise-grade, there was a lot of things about HBase,” he said. “It was kind of a brittle interface and we were getting many support calls related to how that was operating. Our first release of MapR-DB was basically making HBase enterprise-grade. What we did was rework all of the underlying database handling and exposed the HBase API.”
Customers embraced the database, and MapR later took JSON documents and put those into a key value store. It solved the problem losing fidelity when flattening a complex object. The advantage of supporting “the native JSON was to save the time of doing preprocessing before you can analyze it,” Norris said. “So [you’re] using the native document format directly and also supporting a lot of different analyses from applications because you’re not fine-tuning it just for the needs of a single application.”