Crunching Machine Learning And Databases Together On GPUs
May 8, 2017 Timothy Prickett Morgan
While it is always best to have the right tool for the job, it is better still if a tool can be used by multiple jobs and therefore have its utilization be higher than it might otherwise be. This is one of the reasons why general purpose, X86-based computing took over the datacenter. Economies of scale trumped the efficiency that can come from limited scope or just leaving legacy applications alone in place on alternate platforms.
The idea of offloading computational tasks from CPUs to GPU accelerators took off in academia a little more than a decade ago, and relatively quickly the high performance computing community and GPU maker Nvidia extended the existing Fortran and C++ frameworks commonly used on CPU-based parallel supercomputers so these could have their compute capacity dramatically expanded and the resulting systems therefore have much better bang for the buck. It wasn’t long before it was possible to use the same machinery to do the computations needed for simulation and models to also do the rendering to visualize the results of those simulations and models, and the next step in the evolution of supercomputing as we know it is to weave in machine learning to analyze the results of simulations and models and then cast them forward to do predictions. So there will be three different, but intertwined, layers of an application all running on the same accelerated iron.
A similar kind of revolution is getting underway in the world of databases and analytics, too, driven by GPUs for the most part but also with FPGA accelerators in some cases. Several years after GPUs took traditional HPC by storm, researchers working predominantly with hyperscalers like Google, Facebook, and Baidu discovered a means of applying neural network software that had been invented back in the 1980s to the problem of image recognition, running it in a massively parallel fashion on GPU-accelerated systems, and actually created software that surpassed the human ability to sort out what is in a picture. In the past five years, the state of the art in these machine learning frameworks has advanced at a break-neck pace, and the techniques for image recognition have been used for text, voice, and video recognition as well as translation services – not just between languages, but converting speech to text or images or video to text – with a stunning degree of accuracy.
The result is that the nascent Tesla GPU computing business that Nvidia formed in 2008 is now quadrupling in size, being driven by HPC and the explosion in machine learning, and reached an annual run rate of $1.2 billion at the end of Nvidia’s fourth quarter of fiscal 2017 in February of this year. Business is so good that rival AMD has finally gotten its act together with its Radeon Instinct GPU compute platform and its ROCm software platform, which not only supports OpenCL offload to its Radeon GPUs but also includes the ability to emulate Nvidia’s CUDA parallel programming environment for its GPUs on AMD’s own Radeon motors.
And now, there is another application that is emerging as an accelerated workload on GPUs, which is the relational database that is at the heart of most business systems and that has scale issues, just as HPC simulation and modeling and machine learning did, that can be addressed by the parallel nature of GPUs and their very high memory bandwidth. We have chronicled the rise of the two major vendors of GPU accelerated databases, namely Kinetica (formerly known as GPUdb) and its upstart challenger MapD, but there are others in the field such as Sqream, BlazingDB, Blazegraph, and PG-Strom. It is not clear if database acceleration will ever drive as much GPU iron as traditional HPC simulation and modeling or machine learning, each of which drives about half of Nvidia’s Tesla GPU compute revenues, but there is a chance that this could also be a significant part of the business, particularly on systems that can both chew on large datasets using standard SQL query methods and visualize the results of those queries on the same iron.
Ahead of the GPU Technology Conference hosted by Nvidia this week in San Jose, we talked to both MapD and Kinetica about the possibility that the emerging GPU databases could be mashed up with machine learning frameworks in corporate settings, with both workloads not only sharing the same GPU iron, but also feeding back and forth into each other much as simulations, visualization, and machine learning are doing now in the HPC space.
“This is going to be a huge trend,” Todd Mostak, founder of MapD, tells The Next Platform. “If we have the database, visualization, and machine learning on GPUs, why can’t they work together somehow? People have been asking us left and right to be able to run TensorFlow on top of MapD, or even to run regressions on data to connect the dots without having to leave the GPU ecosystem. Right now, people are using Spark, which is fine because it is connected to Hadoop, but to get any kind of performance with the datasets these companies are using, they have to do a massive scale out and then they have to move the results from the CPUs in the clusters to the GPUs. But if you keep everything in the GPU memory and pass results from different parts of the GPU ecosystem, that would be a huge win.”
One of the issues that has yet to be resolved in this mashup between GPU accelerated databases and machine learning frameworks that are also dependent on GPUs for radically boosting their performance is how the data will be stored in a cluster of machines. Do you have a partition of machines in a CPU-GPU cluster that are running the databases and another partition of machines running a machine learning framework like TensorFlow, Torch, Caffe, Theano, or CNTK, or do you actually try to store data underlying the machine learning algorithms in the GPU database itself?
One approach, says Mostak, is to use the GPU database as the store of record, and even if the data is not stored in a tensor format within the database, the database could be tweaked so it could output into that format and pass it on to a framework like TensorFlow. People then use plain old SQL to query the database to get a subset of the data and then pull it into the machine learning framework. “We see this all the time, where companies are pulling subsets of data into their training algorithms, and they need to do this very quickly,” he adds. The other method is to actually pull the native machine learning formats into the database itself, because the frameworks do expect structured data, not just a core dump of clickstream and image data or what have you in an object store. A tensor is, Mostak explains, basically a vector that is expressed in something analogous to a columnar database format, so there is already a good fit here.
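The first approach, pulling a subset with SQL and handing it to a training framework, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the standard library's `sqlite3` module stands in for a GPU database, and the `clicks` table, its columns, and the helper names `fetch_columns` and `to_tensor` are hypothetical, not part of MapD, Kinetica, or TensorFlow. It shows the query result held column by column, then reshaped into the row-by-feature grid that a framework would accept as a rank-2 tensor.

```python
import sqlite3

# Stand-in for the GPU database: an in-memory SQLite table (an assumption
# made for illustration; the article's products are not used here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clicks (user_id INTEGER, dwell_secs REAL, pages REAL, purchased INTEGER)"
)
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?, ?)",
    [
        (1, 12.0, 3.0, 0),
        (2, 45.5, 9.0, 1),
        (3, 7.25, 1.0, 0),
        (4, 60.0, 12.0, 1),
    ],
)


def fetch_columns(conn, sql, params=()):
    """Run a query and return the result column by column: {name: [values]}."""
    cur = conn.execute(sql, params)
    names = [d[0] for d in cur.description]
    rows = cur.fetchall()
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def to_tensor(columns, feature_names):
    """Stack the chosen columns into a rows-by-features nested list."""
    return [list(row) for row in zip(*(columns[n] for n in feature_names))]


# Plain old SQL selects the training subset.
cols = fetch_columns(
    conn,
    "SELECT dwell_secs, pages, purchased FROM clicks "
    "WHERE dwell_secs > ? ORDER BY user_id",
    (10.0,),
)
features = to_tensor(cols, ["dwell_secs", "pages"])  # shape: 3 rows x 2 features
labels = cols["purchased"]
```

In a real deployment the nested list would be replaced by whatever array type the framework expects, and the point Mostak makes is that the copy from database to framework would ideally never leave GPU memory, which this CPU-only sketch does not attempt to show.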
The convergence of artificial intelligence and business intelligence is something that is on the mind of Amit Vij, CEO and co-founder of GPU database maker Kinetica, a lot these days, too. And he concurs that the exact kind of GPU-heavy systems that companies are deploying to do the training of machine learning models have the same architecture as systems that would be perfect for running GPU accelerated databases like Kinetica. The hyperscalers have come at this problem from a consumer angle, trying to sort out our cat images and family videos, but Kinetica, formerly known as GIS Federal, has a far more serious background.
“With our background being incubated by the military, we have used machine learning and image recognition on entities being tracked by drones and feature extraction back on the bases to identify cars and other things,” says Vij. “We already have a database that is GPU accelerated and distributed, and that can converge AI and BI in a single platform.”
You can literally run Kinetica and a framework such as TensorFlow or Caffe or Torch on the same GPU clusters, but for workload management reasons, says Vij, it is best to run the database and the machine learning workloads on different partitions. (This is different from what MapD is talking about above, where you try to keep everything in the GPU memory and store and access it from there.) By partitioning the AI and BI workloads, you can keep the system from thrashing, says Vij, and keep the two workloads from adversely impacting each other. Kinetica has a container environment as well, which allows for each workload to be independently scaled up and down on the cluster and run side-by-side in a dynamic and virtualized fashion rather than by static bare metal means. Kinetica can store billions of rows of data in-memory (in either the CPUs or GPUs on a system) and has user-defined functions that allow for TensorFlow and other machine learning frameworks to train from datasets stored in tables in the very large databases. For instance, financial services firms can grab months or years of stock ticker data and do predictive analytics for stock prices given a wide variety of external conditions that can be correlated to those stock prices. (This is one use case that has been deployed by Kinetica.)
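The general idea of a user-defined function that computes a model statistic where the data lives, rather than exporting the table first, can be sketched with the aggregate-function hook in Python's standard `sqlite3` module. This is not Kinetica's UDF interface, which the article does not describe; the `ticks` table, the `slope` function name, and the toy prices are all assumptions made for illustration. The aggregate fits an ordinary least-squares trend line per ticker symbol, one row at a time, inside the query.

```python
import sqlite3


class Slope:
    """Aggregate UDF: least-squares slope of y on x, accumulated row by row."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def step(self, x, y):
        # Called once per row in the group; keeps only running sums.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def finalize(self):
        # slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2); NULL if x never varies.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            return None
        return (self.n * self.sxy - self.sx * self.sy) / denom


conn = sqlite3.connect(":memory:")
conn.create_aggregate("slope", 2, Slope)  # register the UDF with the engine

# Hypothetical ticker table with made-up prices.
conn.execute("CREATE TABLE ticks (symbol TEXT, day INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?)",
    [
        ("AAA", 1, 3.0), ("AAA", 2, 5.0), ("AAA", 3, 7.0), ("AAA", 4, 9.0),
        ("BBB", 1, 10.0), ("BBB", 2, 9.0), ("BBB", 3, 8.0),
    ],
)

# The regression runs inside the SQL query, grouped per symbol.
trends = dict(
    conn.execute("SELECT symbol, slope(day, price) FROM ticks GROUP BY symbol")
)
```

Here `trends` maps each symbol to its fitted daily price change. A production system would correlate prices against many external conditions and train a far richer model, but the shape is the same: the function travels to the table, not the other way around.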
In general, the early adopter customers who are mashing up the database and machine learning workloads have many more machines dedicated to machine learning than they have for database processing. The typical Kinetica cluster deployment is across 40 to 60 nodes, with lots of GPUs in the boxes, and these would make pretty good clusters to run machine learning algorithms, too, particularly as the machine learning frameworks are extended with Message Passing Interface (MPI) protocols, as is being done by researchers at Ohio State University, or using other methods, as Facebook has recently done with the Caffe2 framework that it has just open sourced. While companies can deploy Kinetica in cloud environments, it tends to be deployed on premises given the sensitive nature of the data being used in financial services, military, or manufacturing operations, and customers usually deploy active-active high availability clustering given the fact that the applications that depend on the GPU-accelerated database are mission critical and cannot go down.
To make this integration between AI and BI even easier, Vij says that Kinetica is “deeply aligning” with the TensorFlow framework open sourced by Google and making tensors a first-class citizen to be stored as a data format within the Kinetica database. This is not a hard task. The original GPUdb database created by what was then called GIS Federal – both the company and the product are called Kinetica now – started out as a geospatial and temporal engine that uses JSON objects to represent a point, line, or polygon, and having a native object for machine learning is perfect for the GPU database because it is operating on matrices.
“I see this as an ideal solution, with everything like an Apple product and encapsulated in a single piece of technology and making for an easy deployment for the end user customers,” says Vij. “And we will not just be consolidating technology, but people as well. One person can be the database administrator and machine learning expert and data scientist – they don’t have to be fluent in five or ten different technologies and they don’t have to resort to using open source frameworks on their own and which are more batch oriented anyway. We become the distributed GPU pipeline for all of these stacks and let developers use us as a database platform without having to move the data from one technology to another.”
No one has the time, money, or inclination to do all of this data movement. So there will be a strong case to be made for a converged platform for databases and machine learning. It doesn’t hurt that most companies resent the high software bills they pay for Oracle databases and extensions or their IBM and Microsoft equivalents.