IBM Wants to Make Mainframes Next Platform for Machine Learning
February 15, 2017 Nicole Hemsoth
Despite the emphasis on X86 clusters, large public clouds, accelerators for commodity systems, and the rise of open source analytics tools, there is a very large base of transactional processing and analysis that happens far from this landscape. This is the mainframe, and these fully integrated, optimized systems account for a large majority of the enterprise world’s most critical data processing for the largest companies in banking, insurance, retail, transportation, healthcare, and beyond.
With great memory bandwidth, I/O, powerful cores, and robust security, mainframes are still the supreme choice for business-critical operations at many Global 1000 companies, even if the world does not tend to hear much about them. Of course, as with everything in computing, there are tradeoffs. Cost and flexibility concerns top the list, but the open source push from the outside world is driving new thinking into an established area.
Companies that have invested in mainframes have sound cause to continue doing so. They are highly optimized for transaction processing, are as secure as a system can be, and have been the subject of many millions of dollars in code investment over the years. They are certainly not cheap, but neither is moving the bulk of business-critical applications to a new architecture. The only thing that might push a large company to do so is a perceived lack of capability and choice—something that mainframe users are willing to tolerate in favor of relative safety.
While the case for mainframes is still strong, they lack the flexibility that users of commodity X86 clusters enjoy. Those users can freely scale up and out and integrate the latest open source frameworks for analysis in a more seamless, agile way. Mainframe users are slower to adopt the newer open source frameworks that might give X86 shops a competitive edge.
To counter this gap in flexibility, IBM described an effort to bring the machine learning components from its Watson AI framework to the mainframe. The company had already announced Spark for mainframes; the new effort builds on Spark as the engine that delivers the machine learning capabilities, so users can run machine learning on the system itself rather than moving data off those boxes for separate analysis. As Rob Thomas, VP of Analytics at IBM, tells The Next Platform, this opens new doors for mainframe sites. He points to the example of Argus Health Systems, which manages a number of healthcare providers. Where its teams once relied on more static analysis run at set intervals, they can now get continuously evolving updates about patients and providers that can be fed quickly into the models, which are then rerun for new cost assessments that use the most recent combined data.
One could make the argument that mainframes have been chewing on machine learning problems for some time, given the rich host of statistical analysis tools that support such transactional system powerhouses. However, if the differentiation between those tools and machine learning as we know it now means the ability to refresh data and models continuously, the conversation changes.
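The distinction between analysis rerun at set intervals and a model refreshed as each record arrives can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the provider labels, and the cost figures are hypothetical, and it stands in a simple running average for whatever models IBM or its customers actually use.

```python
# Hypothetical illustration only: names and figures are invented for this
# sketch and do not come from IBM's products or from Argus Health Systems.

class RunningCostModel:
    """Tracks a per-provider average cost, updated one record at a time."""

    def __init__(self):
        self.count = {}
        self.mean = {}

    def update(self, provider, cost):
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        n = self.count.get(provider, 0) + 1
        old = self.mean.get(provider, 0.0)
        self.count[provider] = n
        self.mean[provider] = old + (cost - old) / n
        return self.mean[provider]


def batch_means(records):
    """Interval-style analysis: recompute everything from the full history."""
    totals, counts = {}, {}
    for provider, cost in records:
        totals[provider] = totals.get(provider, 0.0) + cost
        counts[provider] = counts.get(provider, 0) + 1
    return {p: totals[p] / counts[p] for p in totals}


records = [
    ("clinic_a", 100.0),
    ("clinic_b", 250.0),
    ("clinic_a", 140.0),
    ("clinic_a", 120.0),
]

model = RunningCostModel()
for provider, cost in records:
    # The estimate is current after every record, with no full rerun.
    model.update(provider, cost)

batch = batch_means(records)
```

Both approaches arrive at the same averages here; the difference is that the running model has a usable answer after every record, while the batch version is only as fresh as its last scheduled run.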
“The revolution in data science we saw over the last few years has been driven by open source. We’re bringing that revolution to the mainframe in terms of open frameworks and the ability to access data on the mainframe without having to move it off. Considering about 90% of the world’s most valuable enterprise data cannot be accessed because it is on mainframes and private clouds behind company firewalls, we have to work to bring that same openness.”
In response to the new wave of “big data” five years ago, IBM rolled out a sidecar appliance for the mainframe that was designed for large-scale analytics without forcing users to move their precious data outside of the mainframe architecture. Bringing that modern UI and traditional BI feel to the mainframe whetted the appetites of users who could now do far more without sacrificing safety, and has led to increasing demand for more open connectivity with other languages, tools, and frameworks. Thomas says that when the company rolled out Linux capabilities for the mainframe, that did open the playing field for adopting new frameworks from the open source world, but users were hesitant to move their z/OS data to the Linux partition due to security concerns.
The z System mainframe will now allow its users to adopt several languages not traditionally associated with the mainframe, including Scala, Java, and Python, and ties them into a number of open source machine learning frameworks, including SparkML, TensorFlow, and H2O. Thomas says these tools can reach any data type that resides on the system and will spare users the concern about moving mainframe data to a different system for the purposes of deeper analysis. That data movement alone would be striking, considering mainframes can handle over two billion transactions per day.
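To put the two-billion figure in perspective, a rough back-of-the-envelope calculation helps. The daily transaction count is the one cited above; the 1,000-byte record size is purely an assumption for illustration, since the article gives no record size.

```python
# Rough arithmetic only. TRANSACTIONS_PER_DAY comes from the article;
# ASSUMED_BYTES_PER_TRANSACTION is a hypothetical figure for illustration.

TRANSACTIONS_PER_DAY = 2_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60
ASSUMED_BYTES_PER_TRANSACTION = 1_000

# Average sustained rate if the load were spread evenly across the day.
avg_per_second = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY

# Data that would have to leave the system each day under the assumed size.
daily_terabytes = TRANSACTIONS_PER_DAY * ASSUMED_BYTES_PER_TRANSACTION / 1e12

print(f"Average rate: {avg_per_second:,.0f} transactions per second")
print(f"Daily volume at the assumed record size: {daily_terabytes:.1f} TB")
```

Even at that modest assumed record size, the average works out to roughly 23,000 transactions per second, and real workloads peak well above their daily average.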
While the focus here has been on the mainframe, IBM extended this same set of capabilities to users with private clouds. In both cases, Thomas stresses that while the company has extracted some of the underlying machine learning tech from Watson, the endpoint for these customers is moving toward AI—the difference being a more human interaction with the data versus the advanced analytics of machine learning.