For a topic that generates so much interest, it is surprisingly difficult to find a concise definition of machine learning that satisfies everyone. Complicating things further is the fact that much of machine learning, at least in terms of its enterprise value, looks somewhat like existing analytics and business intelligence tools.
To set the course for this three-part series that puts the scope of machine learning into enterprise context, we define machine learning as software that extracts high-value knowledge from data with little or no human supervision. Academics who work in formal machine learning theory may object to a definition that limits machine learning to software. In the enterprise, however, machine learning is software. Moreover, if we view machine learning as a type of software, we can evaluate it with the same considerations as any other enterprise software: licensing, usability, provisioning, security and so forth.
At first glance, BI tools seem to fit our working definition of machine learning; the key clause, however, is with little or no human supervision. A human user working with BI software needs only ten 2-way crosstabs to analyze relationships among five variables comprehensively. The same user needs 4,950 tables for 100 variables and half a million tables for a thousand variables. Machine learning software can find significant patterns in a fraction of the time it takes a human analyst and can identify patterns that elude a human analyst.
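The table counts above follow from the number of ways to pick two variables out of n. A minimal sketch of that arithmetic (the function name `pairwise_crosstabs` is ours, introduced only for illustration):

```python
from math import comb


def pairwise_crosstabs(n_variables: int) -> int:
    """Number of distinct 2-way crosstabs among n variables (n choose 2)."""
    return comb(n_variables, 2)


for n in (5, 100, 1000):
    print(n, pairwise_crosstabs(n))
# prints:
# 5 10
# 100 4950
# 1000 499500
```

The count grows roughly with the square of the number of variables, which is why manual exploration stops scaling long before the data does.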
It is a useful metaphor to think of machine learning as a manufacturing process in a value chain. The machine learning “factory” accepts low-value data as input; its output is high-value knowledge in the form of a mathematical equation, a set of rules, program code or some other useful object. The output from machine learning may be something that other machines can read, something that humans can interpret, or both.
A concise explanation of how machine learning works is as elusive as a singular definition. Within the field, there are dozens of different frameworks (such as decision tree learning, kernel-based methods or artificial neural networks) that define how algorithms learn; within each framework, there can be hundreds of distinct algorithms. A recent paper benchmarked more than 150 algorithms for classification alone.
In general, machine learning works in the following manner:
- A learning framework defines general rules for modeling a real-world problem, including one or more quantitative measures of performance.
- An optimization procedure searches for a set of parameters or rules, called a model, that delivers the best performance on a set of training data.
- Depending on the implementation, the algorithm may automatically validate the model with fresh data; otherwise, a human user performs this task.
- The model is now available for inference, either within the same software or exported to some other application.
Machine inference is the inverse of machine learning. In machine learning, the algorithm searches for the best-performing model with a fixed set of data; in machine inference, an algorithm uses a previously developed model to compute one or more unknowns with new information.
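The split between learning and inference can be sketched with one of the simplest possible frameworks, a straight line fitted by least squares. This is an illustrative toy under stated assumptions, not how any particular product works: the data is made up, the function names are ours, and the "search" for the best parameters happens to have a closed-form solution here.

```python
def fit_line(xs, ys):
    """Learning: find the slope and intercept that minimize squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept  # this pair of parameters is the "model"


def predict(model, x):
    """Inference: apply a previously learned model to new information."""
    slope, intercept = model
    return slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Performance measure used for both fitting and validation."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data that follows y = 2x + 1
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
# Fresh data held back to validate the model
holdout_x, holdout_y = [5.0, 6.0], [11.0, 13.0]

model = fit_line(train_x, train_y)
validation_error = mean_squared_error(model, holdout_x, holdout_y)
prediction = predict(model, 10.0)
print(model, validation_error, prediction)
```

Once `fit_line` has run, the training data is no longer needed: `predict` works from the two stored parameters alone, which is what makes it possible to export a model to another application.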
Most machine learning frameworks fall into one of three categories:
- Supervised learning, in which the goal is to accurately model the value of a single target variable from other variables in the data. This approach is useful in prediction problems, such as Cisco’s purchase propensity models described below.
- Feature learning, or unsupervised learning, where the goal is to model some multivariate characteristic of the data. For example, in clustering, the goal is to group similar cases together, so the performance measure to be optimized may be a measure of mathematical distance between clusters.
- Reinforcement learning frameworks learn through constant interaction with the environment, as in a robot or autonomous car. Reinforcement learning is especially appropriate when the best way to learn about the environment is to interact with it.
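To make the unsupervised case concrete, here is a minimal sketch of clustering with no target variable at all. It uses a one-dimensional, two-cluster version of k-means, which optimizes the distance from each case to its cluster center; that is one possible performance measure among several, chosen here for brevity. The data and the function name are hypothetical.

```python
def kmeans_1d(points, iterations=10):
    """Group 1-D points into two clusters by repeatedly re-centering."""
    # Start with the extremes as initial cluster centers (a simple assumption)
    centroids = [min(points), max(points)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        # Assign each point to its nearest center
        for p in points:
            nearest = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            clusters[nearest].append(p)
        # Move each center to the mean of its assigned points
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points)
print(centroids)  # prints [1.5, 9.5]
print(clusters)   # prints [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```

Nothing in the input says which group a point belongs to; the grouping emerges from the data and the chosen distance measure.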
Deep learning is a machine learning framework that models high-level patterns in multi-layered networks. Companies like Microsoft and Google use deep learning to solve problems in areas such as speech recognition, image recognition, 3-D object recognition, and natural language processing.
Machine Learning in Enterprise Action
Much of the recent news about machine learning focuses on emerging technologies, such as autonomous vehicles or machines that recognize human speech. These innovations are very exciting, but they are also in the early stages of commercialization. There are many other more prosaic applications across the enterprise where machine learning drives value today. Here are just a few examples:
- The Carolinas Healthcare System (CHS) uses machine learning to construct patient risk scores. Case managers use these scores to prioritize patient services and make discharge decisions. This system enables CHS to deploy its medical personnel more efficiently, prioritizing patients according to risk and complexity of the case. Since implementing the system, CHS has lowered its readmission rate from 21% to 14%.
- Cisco uses machine learning to build individual purchase propensity scores for each of its products. Sellers use the predictions to find the best sales and marketing prospects and determine the best products to offer each prospect. The results: more sales, fewer wasted sales calls, and satisfied sales reps.
- PayPal was losing $10 million per month to fraudsters until it started using machine learning to identify suspicious transactions in real time. Today, PayPal’s scientists constantly tune and improve models to detect more fraud and help investigators work more efficiently.
These examples have two key characteristics in common. First, the machine learning outputs – patient risk scores, propensity to buy scores, and fraud predictions – are substantially more valuable to each organization than the raw data. Second, they are valuable because they are trustworthy; a clinician at CHS, for example, relies on the patient risk score because she knows that it is tested, proven, reliable and carries the weight of authority. The same applies to the other examples, and any successful machine learning project.
As the companies cited above can attest, machine learning does four things very well.
- Identify complex interactions among variables.
- Learn low-level features from raw data.
- Predict high-cardinality class memberships, such as in image classification.
- Work with unlabeled data, such as bit-mapped images.
Some phenomena are the result of complex interactions among variables. For example, the incidence of birth control use is not merely a function of sex or age alone, but a combination of the two variables interacting with other factors. While experts can model these interactions with statistical modeling techniques, the process is labor-intensive and time-consuming. Machine learning detects interactions automatically, and without significant user supervision.
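A toy illustration of why interactions matter, assuming a made-up outcome that occurs only when exactly one of two yes/no factors is present. No rule based on a single variable does better than a coin flip, while an automated search over rules that use both variables finds a perfect one. The exhaustive search below stands in for a real learning algorithm and only works because the problem is tiny.

```python
from itertools import product

# All four combinations of two binary factors, with a hypothetical outcome
# that depends purely on their interaction (exclusive or)
inputs = list(product((0, 1), repeat=2))
labels = [a ^ b for a, b in inputs]


def accuracy(predictions):
    """Share of cases where the predicted outcome matches the actual one."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Rules that look at one variable at a time: a, not a, b, not b
single_variable_rules = [
    [a for a, b in inputs],
    [1 - a for a, b in inputs],
    [b for a, b in inputs],
    [1 - b for a, b in inputs],
]
best_single = max(accuracy(rule) for rule in single_variable_rules)

# Automated search over every possible rule on both variables
# (all 16 truth tables over the four input combinations)
best_joint = max(accuracy(table) for table in product((0, 1), repeat=4))

print(best_single, best_joint)  # prints 0.5 1.0
```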
Success with statistical techniques depends greatly on the user’s ability to prepare the data, a step that requires considerable domain knowledge and skill. As a rule, machine learning techniques are more robust with dirty and incomplete data than traditional statistical techniques.
Machine learning, and deep learning in particular, works well with data that has a vast number of distinct values, such as words in a text or sets of unique images. Practical applications include speech recognition, image recognition, and recommendation engines, where the best item to offer can be one of many.
Machine learning can learn from unlabeled data, which lacks a definite “meaning” pertinent to the problem at hand. Untagged images, videos, news items, tweets, and computer logs are all examples of unlabeled data.
Compared with statistical techniques, machine learning produces output that can be difficult for humans to interpret, which makes machine learning less useful when the goal of the analysis is attribution or analysis of variance. Practitioners address this “black-box” problem with thorough validation and simulation testing to explore how the model behaves when presented with new data. Techniques like partial dependence analysis make it possible to visualize how a machine learning model behaves.
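The idea behind probing a black box can be sketched in a few lines: force one input to a fixed value for every case, average the model's predictions, and repeat across a grid of values. The `black_box` function below is a hypothetical stand-in for a trained model, and the rows are made-up data; a real analysis would use a library implementation and a real model.

```python
def black_box(age, income):
    """Hypothetical stand-in for a trained model we treat as opaque."""
    return 0.1 * age + (0.5 if income > 50 else 0.0)


def partial_dependence(model, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # override only this feature
            total += model(*modified)
        curve.append(total / len(rows))
    return curve


# Made-up cases: (age, income)
rows = [(30, 40), (40, 60), (50, 80), (60, 20)]
age_grid = [20, 40, 60]
curve = partial_dependence(black_box, rows, 0, age_grid)
print(list(zip(age_grid, curve)))  # roughly 2.25, 4.25 and 6.25
```

Plotting the grid against the averaged predictions shows how the model responds to that one input, without requiring any knowledge of its internals.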
Another potential drawback of machine learning is a tendency to overfit, where the algorithm “memorizes” unique characteristics of the training data and then performs poorly on new data. Some machine learning algorithms have “built-in” controls to avoid or minimize this problem; for others, the user must manage the problem manually.
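A deliberately contrived sketch of overfitting, assuming made-up data in which the true pattern is "label 1 when x is at least 5" and two training labels are noisy. A model that memorizes the training set (here a 1-nearest-neighbor lookup) scores perfectly on the data it has seen and worse on fresh data, while a cruder rule does the opposite.

```python
# Hypothetical training data; (3, 1) and (7, 0) are noisy labels
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
# Fresh, noise-free data following the true pattern
test = [(0.5, 0), (1.5, 0), (2.9, 0), (3.2, 0), (4.2, 0),
        (5.8, 1), (6.8, 1), (7.1, 1), (8.5, 1), (9.5, 1)]


def memorizer(x):
    """1-nearest neighbor: return the label of the closest training case."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def simple_rule(x):
    """A coarse rule that ignores the quirks of individual cases."""
    return 1 if x >= 5 else 0


def accuracy(model, data):
    """Share of cases the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


print("memorizer:  ", accuracy(memorizer, train), accuracy(memorizer, test))
print("simple rule:", accuracy(simple_rule, train), accuracy(simple_rule, test))
```

Judged on training data alone, the memorizer looks like the better model, which is why validation on fresh data matters.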
Machine learning algorithms require complex computation, and models need a great deal of computing power to build. The cost of computing has declined radically; as a result, machine learning frameworks that were impractical ten years ago are practical today. Nevertheless, computing is not free, and managing machine learning workloads poses a challenge for IT.
Computational complexity also makes deployment difficult. After awarding a million dollars to the team that won the Netflix Challenge, Netflix determined that it was too expensive to deploy the winning model. A machine learning model must outperform an existing model by enough to justify its deployment costs.
The technical foundations of machine learning are more than fifty years old. Until recently, business applications were rare, and few people were aware of its capabilities. Several converging trends contribute to the recent surge of interest in the field:
- Cheap computing makes machine learning practical.
- New and innovative algorithms deliver better results than previous approaches.
- Practitioners have more experience making machine learning work.
Highly visible successes, such as Watson winning Jeopardy or AlphaGo beating a human champion, raise public interest and awareness. From the enterprise perspective, big data creates analysis problems that cannot be solved efficiently with other methods.
In part two of this series later this week, we will review the key challenges faced by organizations seeking to deploy machine learning at enterprise scale, and how these methods differ from existing approaches to more static analysis.
Thomas W. Dinsmore is an independent consultant and author, specializing in enterprise analytics. Thomas provides clients with intelligence about the analytics marketplace: competitor profiling, requirements assessment, product definition and communications.
Before launching his consultancy in 2015, Thomas served as an analytics expert for The Boston Consulting Group; Director of Product Management for Revolution Analytics (Microsoft); Solution Architect for IBM Big Data (Netezza), SAS and PriceWaterhouseCoopers. He has led or contributed to analytic solutions for more than five hundred clients across vertical markets and around the world, including AT&T, Banco Santander, Citibank, Dell, J.C.Penney, Monsanto, Morgan Stanley, Office Depot, Sony, Staples, United Health Group, UBS, and Vodafone. Thomas’ new book, Disruptive Analytics, published by Apress, is available on Amazon. He co-authored Modern Analytics Methodologies and Advanced Analytics Methodologies for FT Press and served as a reviewer for the Spark Cookbook.