It was only a few years ago that the foregone conclusion was that the future of mission-critical applications would hinge largely on Hadoop. Month after month brought new additions, spin-offs, and startups to prop up the burgeoning framework, expanding the ecosystem and sometimes simply pushing the elephant to do things it was never designed for (working in high performance computing environments, operating in near-real-time, handling a wealth of snap-in replacements for its native file system, and so on). As has happened with countless other panacea technologies, this led to an unrealistic set of expectations about what Hadoop could do; only now have we learned what it couldn’t, and never should, do.
While the expansion around Hadoop led to some noteworthy improvements, particularly performance-wise, the sole focus for some organizations has been on the roots of the project; in other words, looking at Hadoop as a viable tool for its basic components. Although it might still be too early to say the hype cycle, and all the money sloshing around it, is winding down, if some of the more recent examples of Hadoop implementations at larger organizations serve as any indication, the real value of the framework might be playing out to a more sustainable beat. Consider the insurance industry, for example, which is usually one of the last to integrate “risky” platforms into its workflows. The driver now, as we’ve all heard a bazillion times, is data. And for the insurance industry, the elusive promise of finding hidden gold nuggets in data streams remains a compelling one.
Among those magical “gold mines” of data Hadoop vendors consistently reference, there does appear to be something hidden in the mountains of insurance claims data, at least for a few large companies. As insurers add an ever-growing range of fields to the wealth of data around claims, including new sources that were otherwise untapped (personal external sources ranging from social to sensor to geographic), the potential to do everything from spotting fraudulent claims to improving how claims are paid, based on finer-tuned models with more input, is clear. The problem is that many insurers lack the infrastructure to handle the complex natural language processing and data management involved, as well as the overall architecture to map this into their existing environments.
What is interesting about the insurance use cases for Hadoop is that there appears to be little sense that there will be a wholesale shift to Hadoop for critical operations. Rather, what’s happening, at least according to Cindy Maike, who manages the insurance industry ties for Hortonworks, is that insurance companies are looking to Hadoop to solve a few very simple problems that the framework is historically best at tackling. The value on the vendor side, of course, is that once insurers see that the “hidden goldmine” concept isn’t entirely mythical, they move on to integrate Hadoop and its siblings into a growing set of operations, which in turn leads them to sniff out new sources of data (and thus the cycle begins anew).
For many of the property, casualty, and health insurance companies Maike works with, it’s not a matter of layering on Hadoop to replace existing operations. Rather, she said, it is viewed as an enhancement to the wealth of validated, time-tested tools for analysis and data management. What is interesting, however, is that once Hadoop is added to the stack, quite often to handle basic unstructured data at scale, insurers are finally able to see firsthand what all the big data hype machines mean when they talk about tapping into unseen veins of potentially useful data that once lay lost inside boulders of larger, functional information.
Maike described a few of the use cases around Hadoop with various insurers, including one health insurance company that had trouble matching the proper codes with incoming claims. Doing this accurately ensures the company pays out the appropriate amount given the existing fee schedules, and oftentimes, if a code assignment required a human being’s attention (with extra scrutiny if it appeared to be a potential fraud case), the claim was stuck in an ever-lengthening processing queue. To simplify this process, speed it along, and increase accuracy, the company combined Hadoop with a natural language processing engine to survey all known data associated with a claim and ensure its readings aligned with the claim code assigned. When one considers that this insurer handled over 300,000 claims per day, the auditing challenge takes on new weight.
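The article doesn’t detail how the insurer’s NLP engine works, but the core idea (checking whether a claim’s free text is consistent with its assigned billing code, and routing mismatches to a human) can be sketched in a few lines. Everything below is invented for illustration: the code descriptions, the similarity measure, and the threshold are assumptions, not the insurer’s actual system.

```python
# Hypothetical sketch of NLP-assisted claim-code auditing: flag claims whose
# free-text notes don't align with the assigned billing code. The code
# descriptions and threshold here are illustrative, not real fee-schedule data.
import re
from collections import Counter

CODE_DESCRIPTIONS = {
    "99213": "office outpatient visit established patient evaluation",
    "29881": "knee arthroscopy with meniscectomy surgery",
}

def tokens(text):
    """Lowercase word tokens, punctuation ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def overlap_score(claim_text, code):
    """Fraction of the code description's vocabulary present in the claim text."""
    desc = tokens(CODE_DESCRIPTIONS[code])
    claim = tokens(claim_text)
    shared = sum(min(claim[word], count) for word, count in desc.items())
    return shared / sum(desc.values())

def audit(claim_text, assigned_code, threshold=0.4):
    """Return (score, flagged); flagged claims would go to a human reviewer."""
    score = overlap_score(claim_text, assigned_code)
    return score, score < threshold
```

A production system would use a real language model and the full fee schedule rather than keyword overlap, but the shape of the pipeline (score every claim, escalate only the outliers) is the same one described above.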
Going back to that conservative “trial phase” argument, however, once the insurer could draw a direct line between claims payment accuracy and Hadoop with natural language processing, it started to look at other ways this same story could play out with new volumes of unstructured data. This not only meant the insurer could extend these tools; it also primed the pump for considering more data types to incorporate into the process.
Another company Maike described took a similar tack, starting with a “toe in the water” approach to separate the hype from the high value. The insurer simply had too many items to contend with during claims evaluation and was forced to throw out a great deal of potentially valuable side data that might have shed brighter light on a claim. Although its existing fraud detection, predictive analytics, and validation processes were able to sort through and flag key claims for further analysis, the “false positive” problem came to a head: too many items were being shuttled to the front of the queue as potential fraud. Using Spark, Storm, and Hadoop in concert, she said, the company was able to address these false positives in a streaming fashion, meaning a quicker response to actual fraud and a cleaner queue for the daily claims load.
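The article doesn’t say how the Spark/Storm pipeline was built, but the triage pattern it describes (a first-pass detector flags claims, then a cheaper second pass demotes likely false positives before anything reaches a human investigator) can be sketched in plain Python. This is a stand-in, not actual Spark or Storm code; the fields, scores, and thresholds are all invented for the example.

```python
# Illustrative stand-in for the streaming false-positive triage described
# above (plain Python, not actual Spark/Storm code). A second, cheaper pass
# re-scores flagged claims against side data to demote likely false positives.
# All field names and adjustments below are hypothetical.

def rescore(claim):
    """Lower the fraud score when side data offers a benign explanation."""
    score = claim["fraud_score"]
    if claim.get("prior_claims", 0) == 0:
        score -= 0.2  # first-time claimants often trip naive detectors
    if claim.get("provider_verified"):
        score -= 0.3  # claims from vetted providers are rarely fraud
    return max(score, 0.0)

def triage(stream, threshold=0.5):
    """Yield only the flagged claims that survive the second pass."""
    for claim in stream:
        if claim["fraud_score"] >= threshold and rescore(claim) >= threshold:
            yield claim

flagged = [
    {"id": 1, "fraud_score": 0.6, "prior_claims": 0, "provider_verified": True},
    {"id": 2, "fraud_score": 0.9, "prior_claims": 5, "provider_verified": False},
]
```

Here only claim 2 would survive triage; claim 1’s side data explains away its elevated score. In a streaming deployment the generator would be replaced by a Spark Streaming or Storm topology stage, but the filtering logic is the same.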
While there’s nothing fancy about the use models insurers are tapping into around Hadoop, and these companies are not buying the line that Hadoop is a magical cure for the world’s IT ills, what is notable here is how the natural progression of conservative technology adoption (starting with a basic need and the most efficient solution) shows that Hadoop might truly be coming of age.