Cloudera Pivots To Data Management As Hadoop Fades

It was only two years ago that Cloudera, once one of the top vendors in what had been a white-hot Hadoop market, found itself fighting for survival. Hadoop, the open-source data analytics technology that a decade ago was seen as the answer to enterprises’ large-scale data analytics and management woes, was being batted about in a fast-evolving market that saw public cloud providers like Amazon Web Services, Microsoft Azure, and Google Cloud increasingly becoming the destination of choice for such workloads.

Enterprises were rapidly putting Hadoop in the rear-view mirror as they opted to ditch the costs and management headaches that came with running these applications on premises and instead manage, store and analyze their data in the public cloud. By 2018, the fate of Hadoop was coming into clear focus. Organizations not only had the cloud, but also a growing menu of other in-memory and accelerated databases and object stores that they could use rather than Hadoop and its HDFS file system.

The companies that had made their coin offering commercial Hadoop-based solutions – primarily Cloudera, Hortonworks, and MapR Technologies – were scrambling to find some traction somewhere in a muddy road that had them stuck in the muck. Cloudera in early 2019 closed its $5.2 billion merger with Hortonworks but that year still found bookings by existing customers slowing as competition from public cloud providers continued to grow.

In the middle of the year, MapR announced it was just weeks away from shutting down operations unless it could find either more financing or a buyer. In August 2019, Hewlett Packard Enterprise said it was buying MapR to bolster its big data portfolio and capabilities in artificial intelligence (AI), machine learning and analytics.

However, in March that same year, Cloudera launched its Cloudera Data Platform (CDP), a hybrid cloud and multicloud data management solution designed to make it easier for enterprises to run and move their AI and analytics applications and data from one location to another, stretching from the datacenter to the cloud and out to the edge and delivering real-time insights from the massive amounts of data modern organizations generate. It came with automation and intelligent migration capabilities, offered consistency in such areas as security, compliance and governance and worked with the leading public cloud providers.

Fast forward two years and CDP has become foundational to Cloudera’s resurgence. In its fiscal year 2021 third quarter, which ended Oct. 31, 2020, the company saw revenue jump year-over-year by 10 percent, to $217.9 million, and subscription revenue come in at $197.4 million, an 18 percent increase. Annualized recurring revenue grew 12 percent. During the quarter, the company saw a 40 percent rise in the number of paying customers for its CDP Public Cloud and released CDP Private Cloud, a hybrid cloud platform whose offerings include analytics running on a containerized compute cloud, a highly scalable object store and a secure data lake.

The CDP enterprise data cloud now offers cloud-native analytics for Data Engineering, Data Warehousing and Machine Learning and can run in AWS and Azure as well as in on-premises private clouds.

“As for Hadoop, it is just one of 30-plus open-source projects that underpins Cloudera Data Platform,” Krishna Maheshwari, senior director of product management at Cloudera, tells The Next Platform. “From a business point of view, it is a small portion, but it is still there. Today, we think of Hadoop more as a philosophy than a technology.”

The company this week is announcing the availability of the Operational Database for the CDP on AWS and Azure. The new offering is a fully managed cloud native operational database designed for scalability and reliability that also can run on premises in a private cloud. It automatically scales, heals and tunes depending on the workload, enabling application developers to deliver prototypes in less than an hour.

“When designing an application, the database that developers select can have major implications for what they have to consider and build into the applications vs. what they can leave to the database to manage on their behalf,” Maheshwari says, adding that there are three key challenges developers face when building new applications: the database architecture (scale-up or scale-out, sharding, partitioning, risk domains and failure modes), relational versus non-relational tradeoffs (consistency models and familiarity with and the ability to leverage existing experience) and schema design (how forward-looking it needs to be and how difficult it is to change both the database and application).

“In addition, developers need to think about where they need to consider their corporate policy about where they need to deploy their applications – on-prem in their datacenter or in one of the several public clouds vendor,” he says. “They also need to think through company policy of where the data will be hosted – in their company’s control or in the database vendor’s control. And ultimately, developers need to determine whether they will want, or will need, the flexibility to change that decision at a future point in time.”

CDP Operational Database is designed to address those problems. For database architecture, it auto-scales with no limit on size, automatically partitions the data so developers don’t have to worry about sharding and is highly available out of the box. It supports relational and non-relational models with a focus on standards compliance and also supports traditional relational schemas and evolutionary schemas. It works in a hybrid cloud environment, enables organizations to keep ownership of the data by having it live in their Virtual Private Cloud in AWS or Azure or in the CDP Private Cloud on premises and makes it easier for developers to rewrite portions of the application.

“Say a developer wants to build enterprise-class applications faster, with CDP Operational Database, they can provision a new database in just three clicks,” Maheshwari says. “CDP Operational Database simplifies the process of deploying a database, enabling developers to easily provision a new database within minutes and start building applications.”

By automating operations, “if a developer wants to eliminate operational overhead, CDP Operational Database can automatically improve database performance based on application requirements, and can resolve failures without manual intervention. Additionally, it automatically scales up or down based on application requirements, helping to optimize cloud costs.”

Cloudera is building out the platform to address the ongoing enterprise embrace of hybrid and multicloud environments, he says. The COVID-19 pandemic is accelerating the trend, with IDC predicting that by 2022, more than 90 percent of enterprises worldwide will be using a mix of on-premises and dedicated private clouds, multiple public clouds and legacy platforms for their infrastructure demands.

As The Next Platform recently observed, organizations that invested millions of dollars in infrastructure to support their Hadoop deployments are looking to maintain some of that hardware even as they move away from Hadoop. As enterprises shift to Cloudera’s CDP in a hybrid environment, they are “using the same infrastructure, as well as expanding into new data center infrastructure, cloud IaaS as well as our CDP cloud services,” Maheshwari says. “With CDP, our customers can now deploy via hybrid cloud so they have the same separation of compute and storage that they enjoy in the cloud. Customers can also leverage the existing skills and technology from their legacy operational database deployments. CDP enables them to manage all their data in any cloud, whether private or public.”
