The GPU Database Evolves Into An Analytics Platform

As the name of this publication suggests, we are system thinkers and we like to watch the evolution of a collection of tools into a platform. That’s the neat bit, when all the propellerheads have done the work on the components and the integration and when those tools can be useful for the rest of us.

It is with this in mind that we consider the evolution of MapD, an upstart maker of a visualization system and a database – both accelerated initially by GPUs – into OmniSci, which over the years has evolved into a platform for accelerated analytics in its own right, competing head-to-head with Spark for in-memory processing and leaving the MapReduce and Hadoop storage crowd in the dust.

It has been a while since we caught up with Todd Mostak, the creator of the MapD GPU-accelerated database and the chief executive officer of OmniSci, a name the company started using two years ago. That lapse in coverage about OmniSci is our fault, not his, but we have a fascination with databases and commit to you here that we will do a better job in this area because it is one of the most important areas of innovation in IT right now.

For those of you who don’t know the back story, you can read our initial coverage from June 2015 on MapD here to get all the details, but here’s the short version. Mostak was working on his master’s thesis at Harvard University and taking a course on database design at the Massachusetts Institute of Technology at the same time. The thesis was on how social media helped foment the Arab Spring revolution in Egypt. Mostak’s idea was to map the 400 million Tweets relating to this event that he had obtained from Twitter and test the idea that people were more radicalized in poorer areas in Egypt. The sentiment analysis for ranking the Tweets came from comparing them to forums and message boards. The initial system used a Postgres database and the PostGIS geo-spatial plug-in for it. This system crashed when trying to run the SQL queries Mostak created. Sam Madden, one of the co-creators of the Vertica parallel database, was teaching the database class at MIT, and because of this cross-pollination, Mostak came up with the idea of creating his own GPU-accelerated database that could handle the large row counts he needed and also have enough processing oomph to actually chew through that database to get answers.

And thus were the beginnings of the OmniSci accelerated analytics platform. Under the encouragement of Madden and database luminary Michael Stonebraker – who has created Ingres, C-Store (commercialized as Vertica), H-Store (commercialized as VoltDB), and SciDB, among others – MIT took Mostak under wing and incubated the database and visualization system. (We had a long chat with Stonebraker a few years back about the interplay of hardware and database design.)

Over the years we have talked with Mostak about the slow databases that wear everyone down because they can’t provide answers quickly enough or with enough sophistication, leaving companies with datasets that scale to tens or hundreds of billions of rows, or even trillions of rows, that they can’t really query. And then, a few years ago, when it became clear that machine learning was going to be the next important workload that needed a database underpinning to make it usable for enterprises, Mostak started reworking the database he created so it could store machine learning data in its native formats. As Mostak explained at the time, a tensor is basically a vector that is expressed in something analogous to a columnar database format, so there was a good fit to pour machine learning data into a GPU-accelerated database that could not only provide the information to feed training algorithms, but could be used to do other kinds of analysis and visualization with that data. This was an order of magnitude faster than storing the data in Spark, as we showed at the time, and that was also when MapD open sourced its database to try to build more momentum and community around it.
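To make the columnar point a little more concrete, here is a minimal sketch in plain Python of what pivoting rows into columns looks like. This is our illustration, not OmniSci code: the table, the column names, and the helper functions `to_columns` and `column_mean` are all hypothetical, and a real engine would hold these columns in GPU or CPU memory rather than in Python arrays.

```python
from array import array

# Hypothetical three-row table stored row-wise: (tweet_id, lat, lon).
rows = [
    (1, 30.04, 31.24),
    (2, 30.06, 31.25),
    (3, 29.97, 31.13),
]


def to_columns(rows, names):
    """Pivot row tuples into one contiguous typed array per column."""
    columns = {}
    for i, name in enumerate(names):
        values = [row[i] for row in rows]
        # Use 64-bit integers for all-integer columns, doubles otherwise.
        typecode = "q" if all(isinstance(v, int) for v in values) else "d"
        columns[name] = array(typecode, values)
    return columns


def column_mean(column):
    """An aggregate only has to walk the single column it needs."""
    return sum(column) / len(column)


columns = to_columns(rows, ["tweet_id", "lat", "lon"])
mean_lat = column_mean(columns["lat"])
```

Each column ends up as a flat, homogeneous vector, which is the same shape of data that a training framework or a vector unit wants to consume, and an aggregate over `lat` never touches `tweet_id` or `lon`.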

Now, OmniSci has a platform, and it looks like the business is ready to take off just as all of the pieces are coming together. The database, OmniSciDB, now runs on Nvidia GPUs (both Tesla and otherwise) and, through a partnership with Intel announced last October, on Xeon SP processors, so technically it is no longer just a GPU-accelerated database.

The rendering engine was part of Mostak’s original work, and it has always been paired with the GPU-accelerated database even if people didn’t talk about it explicitly. The OmniSci Render visualization engine, which is driven by the popular Vega declarative visualization API, runs on the same GPU-accelerated iron as the database. This is important because it means the data is already in GPU memory when it needs to be rendered, so pointmap, scatterplot, and polygon visualizations can be drawn lightning fast, and for geospatial visualization, the system can handle billions of points, lines, or polygons.

Immerse is an interface that rides on top of OmniSciDB and Render together, providing maps, charts, and heat maps in an interactive fashion and allowing for data to be cropped visually with a few clicks of the mouse and then processed, rendered, and visualized. And of course, both data at rest and streaming data can be sucked into the OmniSci platform, and applications can link to it through the usual interfaces, including ODBC and JDBC database interfaces, native JavaScript and Python calls, and the Apache Arrow and Apache Thrift analytics frameworks. Google TensorFlow, Facebook PyTorch, and other machine learning frameworks can sit on top of this stack, using Anaconda or Numba for data management and Apache Arrow for zero copy data interchange.

That brings us, more or less, to now, when OmniSci has turned in the best quarter in the company’s seven-year history, notably closing a deal with manufacturing giant Procter & Gamble, which has some of the most complex analytics in the world. (Selling soap is a tough business.) Being still a relatively small startup on the blade of the hockey stick and moving upwards – it is always hoped – to the handle, the company does not give out revenue figures or employee counts. But Mostak says that the company has somewhere on the order of five dozen enterprise customers running the OmniSci stack on premises and hundreds of customers running it on the public cloud, usually in an ephemeral fashion where they fire up workloads and shut them down when they are done.

“Over the years, the focus has been increasing performance, increasing concurrency, increasing scalability – the way we have differentiated ourselves,” Mostak tells The Next Platform. “Now, people expect that, and they are moving from ten users on a proof of concept to hundreds of daily users on the platform, and running heavy queries, and to take the next step and get into these bigger production deployments, we have to ensure the same level of performance. And our partnership with Intel should not be seen as Nvidia versus Intel, but rather that the GPU and the CPU are complementary in the sense that a lot of our customers have queries that run on a core set of data that they want blazingly fast to power interactive analytics or real-time alerting or things like this, but they want to be able to fall back to the CPU with a slow lane where they can handle even bigger workloads so they don’t need a Teradata data warehouse or something like that on the side of their operational databases. We are getting to the point where I think we can become the main analytics platform, and when we started out, we thought of ourselves as a kind of fast cache.”

Customers still tend to think of OmniSci as a GPU accelerated database, and they usually start there, but then they find they can do bigger workloads somewhat slower since the OmniSci stack runs atop Xeon SP processors and makes use of their AVX-512 vector engines to accelerate database functions – albeit not as fast as can be done with an Nvidia Tesla V100 GPU. The CPU has more main memory than the GPU, but the GPU has a lot more cores and a lot more floating point oomph.
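The fast lane and slow lane idea can be sketched as a simple placement rule. To be clear, this is a toy heuristic of our own and not how OmniSciDB actually decides where a query runs: the `Device` class, the `pick_lane` function, and the memory sizes below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class Device:
    name: str
    memory_bytes: int


def pick_lane(working_set_bytes, gpu, cpu):
    """Prefer the GPU when the working set fits in its memory,
    otherwise fall back to the larger but slower CPU memory pool."""
    if working_set_bytes <= gpu.memory_bytes:
        return gpu.name
    if working_set_bytes <= cpu.memory_bytes:
        return cpu.name
    raise ValueError("working set exceeds both memory pools")


# Illustrative capacities only, not a real server configuration.
gpu = Device("gpu", 32 * GIB)
cpu = Device("cpu", 768 * GIB)

hot_query = pick_lane(10 * GIB, gpu, cpu)    # fits in GPU memory
big_query = pick_lane(100 * GIB, gpu, cpu)   # spills to the CPU lane
```

The only signal used here is memory capacity, which is the tradeoff described above: the CPU has more main memory, the GPU has more cores and more floating point oomph. A production scheduler would presumably weigh far more than that.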

In the first quarter, OmniSci closed ten deals with large enterprises or government agencies, with half of the deals being expansions of existing database setups and half being new ones; all of these deals were in the low to high six-figure range. Procter & Gamble was an expansion of an existing setup done last year with one division, and now it is being rolled out across a number of business units. This is exactly how data warehouses and Hadoop clusters and Spark clusters grew in their days.

What we wanted to know is what GPU motors customers are using these days when they run the OmniSci platform. On the cloud, because there is still wide availability of those vintage “Kepler” K80 dual-GPU cards, a lot of customers still use these and even some in-house organizations do this because there are no cheaper FP64 flops in the world right now. (These are not the most energy efficient of FP64 flops, mind you.) For in-house setups, there are customers that are still building out their infrastructure using “Pascal” Tesla P100 accelerators. But for new customers, the preference by far is for the “Volta” V100 accelerators. There are a few customers who have gone with the Quadro RTX 8000 cards, based on the “Turing” TU102 GPUs, but these are weak on FP64 and you have to be careful that your data matches the GPU. The actual machines encapsulating these GPUs tend to come from Dell, Hewlett Packard Enterprise, and Supermicro – and Nvidia itself with its DGX line.

Which brings us to the new “Ampere” GA100 GPU and its A100 accelerator implementation, which Nvidia launched two weeks ago. Here are Mostak’s thoughts on the new device: “It’s pretty awesome. The new Tensor Cores are exciting, particularly as we get more into machine learning and we are thinking of ways that we can leverage it for certain database functions. There is much higher memory bandwidth, at 1.6 TB/sec for the A100 versus 900 GB/sec for the V100, and that pretty much gives you increased database performance right out of the box. The other big thing for us is the L2 cache, which goes from 6 MB to 40 MB, and that benefits random access workloads. Some things can run 8X to 10X faster if they can fit into the L2 cache just because of its even higher bandwidth. I would have loved to see more HBM2 memory.”
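The bandwidth comparison is easy to turn into back-of-the-envelope arithmetic. The sketch below assumes a purely bandwidth-bound scan of a single column and ignores latency, caching, compression, and kernel overhead, so treat it as a best case. The bandwidth figures are the ones Mostak cites; the column size (one billion 4-byte values) and the `scan_seconds` helper are hypothetical.

```python
def scan_seconds(column_bytes, bandwidth_bytes_per_sec):
    """Lower bound on the time to stream a column through memory once."""
    return column_bytes / bandwidth_bytes_per_sec


V100_BANDWIDTH = 900e9    # 900 GB/sec, as quoted above
A100_BANDWIDTH = 1.6e12   # 1.6 TB/sec, as quoted above

# Hypothetical column: one billion 4-byte values.
column_bytes = 1_000_000_000 * 4

v100_time = scan_seconds(column_bytes, V100_BANDWIDTH)
a100_time = scan_seconds(column_bytes, A100_BANDWIDTH)
speedup = v100_time / a100_time
```

Under those assumptions the scan drops from roughly 4.4 milliseconds to 2.5 milliseconds, a ratio of about 1.78X, which is just the ratio of the two bandwidth figures.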

OmniSci is not alone in this, but there is a latent 8 GB more of memory awaiting yield improvements in the Ampere design, and hopefully we will see that before too long. We were thinking that 64 GB would be even better than 48 GB, with eight banks of memory instead of the six that Nvidia put on there, balancing out one bank for each slice of the GA100 GPU. But that was clearly a bridge too far with the latest HBM2 memory and the packaging for the Tesla A100 accelerator, or Nvidia would have figured out a way to do it. This, too, could be coming in the future, and in fact, that’s what we expect since at this point the Ampere GPU is more memory capacity bound than anything else – it has plenty of compute and plenty of memory bandwidth and plenty of NVLink interconnect.

As for other CPUs and GPUs, OmniSci is keeping its options open. At the moment, there are no plans to support the AMD Radeon Instinct GPU accelerators, but as part of its partnership with Intel, it will be supporting the future “Ponte Vecchio” discrete Xe GPU, which Intel previewed last November and which is expected to be deployed first in the “Aurora A21” exascale-class supercomputer at Argonne National Laboratory along with the future “Sapphire Rapids” Xeon SP processors. By the way, Mostak says that porting to the AMD GPUs would not be a big deal, but it all comes down to customer pull and vendor assistance. So if AMD wants to make it happen, there is still time before the Intel GPUs hit the ground.
