Cloudera Puffs Up Analytics Database For Clouds
December 4, 2017 Jeffrey Burt
In many ways, public clouds like Amazon Web Services, Microsoft Azure, and Google Cloud Platform can be the great equalizers, giving enterprises access to computing and storage resources that they may not be able to afford to bring into their on-premises environments. Given the new compute-intensive workloads like data analytics and machine learning, and the benefits they can bring to modern businesses, this access to cloud-based platforms is increasingly critical to large enterprises.
Cloudera for several years has been pushing its software offerings – such as Data Science Workbench, Analytic DB, Operational DB, and Enterprise Data Hub – as a platform for machine learning and analytics workloads. In May, the company introduced Altus, a platform-as-a-service (PaaS) offering hosted on AWS that gives customers access to Cloudera services. Cloudera, the dominant Hadoop distributor, has since its founding used those services to help enterprises gain more insight and value from the massive amounts of data that they generate and gather.
The initial service on the Altus PaaS was Altus Data Engineering, which makes it easier for businesses to build elastic data pipelines and bring data engineering elements to such applications as business intelligence (BI) and data science. In September, the company brought the service to Azure.
Now Cloudera has added another service to the PaaS. At the Amazon Web Services re:Invent conference in Las Vegas last week, the company announced the upcoming beta release of Altus Analytic DB, a cloud-based data warehouse service that will offer self-service BI and SQL analytics. The database, based on the Apache Impala SQL query engine, will be available in beta on AWS by the end of the year. It will be available on Azure after that, though the company didn’t give a timeframe. Companies interested in joining the waiting list for the beta can sign up here.
The Analytic DB is the latest step by Cloudera to expand its presence in the cloud.
“Increasingly we see the cloud as the fastest-growing avenue for us,” Alex Gutow, senior product marketing manager, tells The Next Platform. “The cloud offers a lot of capabilities to very quickly develop services and offers fast access to data.”
Cloudera is seeing many of its customers eyeing the cloud for their applications. In a conference call in September about the recent financial quarter, CEO Tom Reilly said customers are increasingly moving workloads to the cloud. About 20 percent of customers are leveraging Cloudera’s cloud capabilities and 20 percent of workloads are running in the cloud. Cloudera is targeting Global 2000 companies and said that in the most recent quarter it added 45 new customers in this category, with a goal of adding 120 by the end of the year. Recent big customers include payroll and HR software provider ADP, DBS Bank in Singapore, and Cox Automotive.
“The organizations that extract the most value from that raw resource will be the winners in a new data-driven economy,” Reilly explained. “Whether developing self-driving vehicles, predicting health care outcomes, anticipating customer needs, or preventing fraud, the fuel for making all those things possible is the same: It is data. The tools of choice for maximizing the value of data are machine learning and analytics, increasingly delivered via the cloud.”
Key to the new Altus Analytic DB is that with the database already in the cloud, enterprises don’t have to move the data out of the cloud and into an on-premises database, saving time and costs, Gutow said. In addition, with the data hosted on AWS, businesses can leverage various features on the public cloud, such as the ability to isolate workloads in multi-tenant environments. They also can take advantage of the ability to scale up and down as needed, he said, adding that “in the cloud, workloads are not always long-term; they can be transient. They can be around for a few hours and then they disappear.”
Through the Altus Analytic DB, enterprises will be able to access their data that is stored in Simple Storage Service (S3). The service will run on specific Elastic Compute Cloud (EC2) instances, and enterprises will be able to quickly – in a matter of a few clicks – spin up analytic clusters in the cloud. They can run workloads on a cluster and then take it down when they are done with it, taking advantage of the elasticity offered by the cloud.
The data is managed through Cloudera’s Shared Data Experience (SDX), which was announced at the Strata Data show in September. The offering gives enterprises a single console for managing data both on-premises and in the cloud, and through it, the data and the policies and governance around it remain consistent. That means that everyone – from business and financial analysts to data scientists and data engineers – is working with the same data, schemas and structures, and can do so with whatever tools they want to use, including SQL, Python and R.
“The data is available for multiple or different workloads,” Gutow said. “Data engineers are working on the same data that BI [analysts] or data scientists can tap into.”