Easing The Pain Of Prepping Data For AI
November 7, 2017 Jeffrey Burt
Organizations are turning to artificial intelligence and deep learning in hopes of being able to more quickly make the right business decisions, to remake their business models and become more efficient, and to improve the experience of their customers. The fast-emerging technologies will let enterprises gain more insight into the massive amounts of data they are generating and find the trends that normally would have been hidden from them. And enterprises are quickly moving in that direction.
A Gartner survey found that 59 percent of organizations are gathering information to help them build out their AI strategies, while the rest have already piloted or adopted AI offerings. IDC analysts expect spending on cognitive and AI technologies to hit $12.5 billion this year – a 59.3 percent increase over 2016 – and to reach more than $46 billion by 2020.
But AI models are only as good as the data that they chew on. With that in mind, IBM has added features to its Watson Data Platform that are designed to make it easier for developers and data scientists to analyze and share enterprise data and get it ready for AI applications. IBM is focusing much of its latest transformation on what it calls cognitive computing, which encompasses AI and machine learning, and is using its Watson portfolio as the foundation for its efforts.
The Watson Data Platform is a combination of IBM’s cloud infrastructure and an array of data services, and leverages such open languages as Python and Spark SQL. The new features – which include Data Catalog, Data Refinery and Analytics Engine – will give enterprises better visibility into their data, help them enforce security and access policies, and let them move data across public and private clouds. Organizations have access to more data than ever, but that data can be located in different places and must be moved to power the increasing numbers of AI applications, according to IBM officials.
They noted that IDC analysts are predicting that by 2018, almost 75 percent of developers will build AI functionality into their apps. Their Gartner counterparts are seeing a similar trend. By 2020, AI will be in almost every new software product, they said.
That’s where the new functionality in the Watson Data Platform comes in, company officials said. Data Catalog is used to create a searchable index of structured and unstructured data throughout an enterprise’s environment, from existing on-premises systems to cloud platforms to the data streams created by the internet of things. Through metadata, it also enforces rules-based governance policies to control access to the data and ensure compliance, and – through machine learning capabilities – automatically profiles and categorizes the data.
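To make the idea of a metadata-driven catalog more concrete, here is a minimal Python sketch of a searchable index that applies a rules-based access policy. It is not IBM's Data Catalog API. The class names, the "pii" tag, the role names and the sample datasets are all hypothetical, chosen only to illustrate how metadata can drive both search and governance.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one dataset; the data itself stays where it lives."""
    name: str
    location: str          # hypothetical labels such as "on-premises" or "cloud"
    description: str
    tags: set = field(default_factory=set)


class Catalog:
    """Hypothetical in-memory catalog, not a real IBM interface."""

    def __init__(self, policies):
        # policies maps a tag to the set of roles allowed to see data with that tag
        self.policies = policies
        self.entries = []

    def register(self, entry):
        self.entries.append(entry)

    def can_access(self, role, entry):
        # A role must be allowed by every policy attached to the entry's tags.
        return all(role in self.policies[tag]
                   for tag in entry.tags if tag in self.policies)

    def search(self, keyword, role):
        # Match on name or description, then filter by the governance rules.
        keyword = keyword.lower()
        return [e.name for e in self.entries
                if (keyword in e.name.lower() or keyword in e.description.lower())
                and self.can_access(role, e)]


# Assumed policy: only these two roles may see datasets tagged "pii".
catalog = Catalog(policies={"pii": {"compliance_officer", "data_scientist"}})
catalog.register(CatalogEntry(
    "store_sales", "on-premises", "Point-of-sale purchase transactions"))
catalog.register(CatalogEntry(
    "customer_profiles", "cloud",
    "Customer purchase history and contact details", {"pii"}))

print(catalog.search("purchase", role="data_scientist"))
print(catalog.search("purchase", role="developer"))
```

In this sketch the data scientist role finds both datasets, while the developer role sees only store_sales, because the tag on customer_profiles triggers the access rule. A production catalog would also profile and classify data automatically, which this example does not attempt.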
The Data Refinery feature cleans and processes the data to prepare it to be shared widely and used by AI and machine learning applications. It also allows for quick discovery, visualization and sharing of the data so that data scientists, developers and business teams can work together in real time, according to IBM. Metadata pulled from Data Catalog and Data Refinery can be used to tag data and help enforce a client’s data governance policies. This gives teams a foundation to more easily identify risks when sharing sensitive data.
IBM also is making available the Analytics Engine, which leverages Apache Spark and Hadoop to create an intelligent repository for the data, making it easier for users to get a better gauge of the size, value and creation of the data. Through the Analytics Engine, developers and data scientists can work with the datasets without having to manage the infrastructure behind them. In addition, the Analytics Engine is powered by IBM’s Cloud Object Storage, which is designed to make data easily ready and available for processing and analysis.
“The combination of Cloud Object Storage and Analytics Engine separates compute and storage, enabling companies to take greater advantage of the agility and economics offered by the cloud,” Derek Schoettle, general manager of the Watson Data Platform, wrote in a post on the company blog.
In the post, Schoettle outlined a scenario involving a retail company using the new functionalities to look at customer buying patterns and increase sales. In the scenario, a data scientist needs to see purchase transaction data that’s in both on-site databases and in the cloud. The data scientist will use the data to create seasonal- or demographic-specific categories, analyze the data and correlate it with customer feedback. Using the new features in the Watson Data Platform, the data scientist can access all the data regardless of where it’s housed, shape it and build a machine learning model, and then share the model with a developer, who can deploy it into an AI application that markets season-specific clothing based on customer preferences.
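The retail scenario can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the workflow from Schoettle's post: the transaction records are invented, the two lists merely stand in for on-site and cloud sources, and a simple frequency count stands in for the machine learning model.

```python
from collections import Counter
from datetime import date

# Invented sample records standing in for two separate data sources.
on_prem = [
    {"date": date(2017, 1, 14), "item": "wool coat"},
    {"date": date(2017, 7, 3), "item": "swimsuit"},
    {"date": date(2017, 8, 9), "item": "swimsuit"},
]
cloud = [
    {"date": date(2017, 12, 20), "item": "wool coat"},
    {"date": date(2017, 6, 21), "item": "sandals"},
    {"date": date(2017, 2, 2), "item": "scarf"},
]


def season_of(d):
    """Map a date to a Northern Hemisphere meteorological season."""
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "autumn"


def top_item_by_season(transactions):
    """Return the most frequently purchased item for each season seen."""
    counts = {}
    for t in transactions:
        counts.setdefault(season_of(t["date"]), Counter())[t["item"]] += 1
    return {season: c.most_common(1)[0][0] for season, c in counts.items()}


# Combine both sources regardless of where the data is housed.
recommendations = top_item_by_season(on_prem + cloud)
print(recommendations)
```

The resulting mapping of season to best-selling item is the kind of artifact a data scientist might hand to a developer, who could then use it to decide which products an application promotes at a given time of year.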