Just a few years ago, using a new kind of tool such as the Hadoop data muncher was sufficient to gain competitive advantage in many industries. But now, incumbents in just about every sector of the global economy are facing challenges from upstarts, what Pivotal – the database, analytics, and platform cloud spinout from EMC and VMware – calls “the Uber-ization of industries.”
“I end up in meetings with these stalwarts of industry, whether it is the world’s largest thermostat manufacturer or titans of transportation, and they are all pretty scared because their industries are being disrupted,” Michael Cucchi, senior director of the data product group at Pivotal, tells The Next Platform. “A company comes in, they completely change the game. They start fresh and because they don’t have historical baggage, they create something new.”
The upstarts know that the heart of their business is software, and they don’t waste a lot of their time on the back office systems that count the money, but rather concentrate on the applications that provide the experience to customers and that cull mountains of data from them. Using this data, the upstarts make continuous and incremental tweaks to their applications, and therefore to their services, improving them over time in an iterative fashion instead of shipping a major update every year or two, as was the common practice across industries for decades.
For such upstarts, open source software is a given, and for those companies that are being forced to defend their turf against them, open source software is becoming the preferred platform for creating and running new applications.
The reason is obvious. Upstarts and incumbents alike want to avoid vendor lock-in and also want to be able to tweak their infrastructure software to suit any unique needs that might crop up. And that is why Pivotal is opening up more of its software stack, including some of the family jewels that EMC and VMware spent lots of money acquiring a few years back as they started building what has become the Pivotal platform. This platform has been welded together from various pieces and includes agile programming methodologies and tools, the Cloud Foundry platform cloud, and myriad data storage and querying tools including the Greenplum data warehouse, the GemFire in-memory NoSQL data store, the Pivotal HD Hadoop distribution, and its Greenplum-inspired HAWQ SQL query overlay.
“We are seeing SQL continuing to play a major role for the enterprise, whether it is for Hadoop or for more standard analytical tools,” explains Cucchi. “We had really good financial performance last year, and a lot of that was driven by our Greenplum database as well as HAWQ. We are also seeing a ton of in-memory momentum, and we saw customers lean in with GemFire, and I think that was because GemFire feeds massive scale. It is behind the largest booking systems in the world. We have seen customers reach tipping points with traditional database technologies, and they come to us because GemFire could break through those limitations.”
Databases and data stores are not the only drivers for Pivotal. “We are totally focused on streaming and machine learning applications, and we saw a lot of customers start to demand open source,” Cucchi says. “We even saw RFPs requesting open source software. We have huge service providers and huge telcos – some of the world’s largest companies – are rebasing, and as they do that, they want to go forward with the next-generation open source approach. We are seeing a huge trend there.”
Back when the Pivotal Initiative, as the spinout was initially called, was formed in late 2012 with the various tools outlined above from the VMware and EMC product lines, many of us were speculating that Pivotal would eventually open source the Greenplum database, which is a souped-up and distributed variant of the PostgreSQL database, itself open source. That day is finally coming, and now Pivotal is opening up GemFire and HAWQ as well. The Spring application framework, the Hadoop distribution, and the Cloud Foundry platform cloud were already available on an open source basis. Cucchi tells The Next Platform that the precise licensing of each of these elements of the Pivotal stack has yet to be decided and that the code will be opened up in stages over the coming months, with the expectation that the process will be completed by the end of the year. Cucchi adds that Pivotal had been driving to open up its stack “for a long time,” and that the first year of the company’s existence was about collecting all of the parts of the stack and fitting them together.
By the way, not all of the elements of the Pivotal analytics stack will be opened up. As is common for commercialized open source products, the core will be open but certain add-ons will remain closed. As an example, Cucchi said that while the core GemFire in-memory NoSQL data store would be opened up, the ability to replicate data across a wide area network or to run continuous queries across GemFire clusters might be a for-fee add-on.
Last year, Pivotal created what it calls the Big Data Suite, and it shifted from various licensing terms for the products when bought separately to an integrated model that sold them on a subscription basis and, most importantly, let customers move applications from Greenplum to Hadoop/HAWQ to GemFire transparently without having to pay extra money for the cores as the workload shifted. Moreover, Pivotal’s pricing model allows customers to move licenses from infrastructure in their own datacenter out to virtual infrastructure on public clouds as they see fit, or operate in a hybrid mode spanning the two, or move them back and forth.
This integration and flexible pricing seem to have paid off. Sales of the Big Data Suite went from zero to more than $40 million in the nine months it was available in 2014, says Cucchi. Overall, Pivotal posted more than $100 million in big data software bookings in 2014. Cucchi did not give overall revenues for last year for the complete Pivotal stack, which would be more useful than bookings, but said that around 80 percent of the deals for the data analytics tools at Pivotal were driven by Greenplum DB, HAWQ, and GemFire.
As for the augmented Big Data Suite in 2015, which will include a full stack to run Spark streaming and machine learning applications atop Pivotal’s Hadoop variant, Cucchi says that pricing will remain the same even as more tools are added to the suite. Pivotal does not release pricing for the Big Data Suite, but Cucchi says that it is “priced competitively” with other Hadoop distributions and that it is “less costly” than traditional enterprise data warehouses and relational databases. The Spark stack is expected to be integrated into the Big Data Suite during the first half of 2015.
Ganging up with Hortonworks, IBM, and GE
Open source is just one kind of open that Pivotal hopes to leverage as it seeks to become The Next Platform of choice for modern, data-driven, distributed applications. Open will also mean partnering with other platform providers, such as competitor Hortonworks, to certify key elements of its data analytics stack on Hadoop distributions other than its own Pivotal HD.
This latter effort is being done under an initiative called the Open Data Platform, and its long-term goal is to keep the Hadoop market from fragmenting and to get key tools developed by Pivotal, Hortonworks, and perhaps someday other Hadoop distributors to work on each other’s platforms. This is a lofty goal, of course, and one that we have seen tried across Unix platforms and then Linux distributions in days gone by. While Pivotal and Hortonworks are to be commended for ensuring that they deliver a core Hadoop stack that allows applications to run across them without modification, Cloudera and MapR Technologies have to come to the party, too. (IBM is participating in the Open Data Platform effort, and presumably its BigInsights variant of Hadoop will also be certified as compliant.) At the moment, the compliance testing is focused on the Hadoop Distributed File System, the MapReduce compute framework and its YARN follow-on, and the Ambari Hadoop management tool, with other elements of the Hadoop stack to follow. Manufacturing giant GE, which pumped $105 million into Pivotal when it was spun out, giving it a 10 percent stake in the company, is standing behind the Open Data Platform effort, as are a number of other industrial giants.
Under a partnership between Hortonworks and Pivotal, the HAWQ SQL query engine for Hadoop will also be ported to Hortonworks Data Platform, and there will be integrations made between the Hortonworks Hadoop distro and the GemFire NoSQL data store and the Greenplum database. Hortonworks is recommending the Pivotal tools for what Cucchi is calling “focused advanced analytics, machine learning, SQL on Hadoop, MPP analytics, and in-memory use cases,” while Pivotal is recommending Hortonworks for “broader Hadoop use cases.” This will, says Cucchi, also include joint engineering efforts and will result in a “drastically enhanced” Pivotal HD distro. Everyone has stopped short of saying that Pivotal will simply be shifting to the core Hortonworks Hadoop distribution over time, but it will not be surprising if this ultimately happens, allowing Pivotal to focus higher up the stack.