Startup Builds GPU Native Custom Neural Network Framework
January 26, 2018 Nicole Hemsoth
It is estimated that each day over a million malicious files are created and kicked to every corner of the web.
While there are plenty of options for defending against these potential attacks, the pace, scope, and complexity of modern malicious files have left traditional detection in the dust, including methods based on heuristics or machine learning rather than signatures.
With those traditional methods falling short of what large enterprises need for multi-device and system security, the answer (to everything in IT in 2018, it seems) is to look to deep learning. But this is a slightly different story than you might be expecting.
While most of the services, cybersecurity and otherwise, that tout themselves as deep learning-based AI are using one of the handful of publicly available deep learning frameworks (TensorFlow, Caffe, etc.), the real innovation is happening among the very select few companies that build their own deep learning frameworks from scratch, both for training and multi-device inference, without using external libraries and all rooted in an optimal hardware architecture.
Specifically, we are talking about one of the few companies GPU maker Nvidia has backed, in large part because the custom framework is CUDA native for the highest GPU performance on training workloads. The company, Deep Instinct, is led by Eli David, who tells The Next Platform that there is nothing simple about building a custom AI framework but given the shortage of options for workloads outside the computer vision, speech, and text spheres, it was the only alternative for a truly competitive cybersecurity business.
We will get to his experiences with building the framework, scaling it across in-house GPUs for training (and which GPUs perform well), and more in a moment. But first, what is wrong with the current state of bad file detection, and why are the several existing AI frameworks not up to the task?
When it comes to detecting malicious files, which is the sole focus for the startup, David says the difference is that they can look at a file as a bunch of bytes: all files are treated the same, despite big differences when it comes to training time. This differs from traditional machine learning approaches to the same problem in several ways. For instance, traditional machine learning requires feature extraction before the data is fed into the model. Since that can be limited to just a few hundred features, it is easy for new malicious files to get around what those approaches look for.
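To make that contrast concrete, here is a minimal Python sketch of the two kinds of input. It is our own illustration, not Deep Instinct's code: the function names, the two example features, and the sample bytes are all assumptions chosen for brevity.

```python
import math
from collections import Counter
from typing import Dict, List


def handcrafted_features(raw: bytes) -> Dict[str, float]:
    """Hypothetical feature extractor: reduces a file to a few summary numbers."""
    counts = Counter(raw)
    total = len(raw)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {"size": float(total), "byte_entropy": entropy}


def raw_byte_input(raw: bytes) -> List[float]:
    """Scale every byte (0-255) to [0.0, 1.0] and keep all of them."""
    return [b / 255.0 for b in raw]


sample = b"MZ\x90\x00\x03\x00\x00\x00"  # illustrative bytes, not a real executable
features = handcrafted_features(sample)  # a handful of numbers, whatever the file size
vector = raw_byte_input(sample)          # one value per byte in the file
```

The feature dictionary stays the same size no matter how large the file is, which is why a file crafted to look normal on those few measures can slip through. The raw-byte vector grows with the file and discards nothing.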
David says that in his teaching experience the publicly available frameworks like TensorFlow are great for research but are not efficient enough for deployment in cybersecurity, which comes with its own host of challenges. “When applying deep learning for computer vision, speech, or text you’re relying on the correct assumption that the input has local correlations. For example, if you look at a real world image the adjacent pixels are correlated in their colors. The same thing happens with speech. So using convolutional neural nets is good with local correlations in the data,” David explains. In Deep Instinct’s branch of cybersecurity, the binary of the many executable file types would look like random noise with no local correlation. Here it might seem like a fully connected network would be a workaround but with millions of files, this would quickly become far too unwieldy and inefficient.
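The point about local correlation can be checked with a few lines of Python. This is a rough sketch under our own assumptions: a noisy brightness ramp stands in for one row of an image, and uniformly random bytes stand in for noise-like binary content. Neither is real image or executable data.

```python
import random


def lag1_correlation(values):
    """Pearson correlation between each element and its right-hand neighbour."""
    xs, ys = values[:-1], values[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Smooth ramp with a little jitter: neighbouring "pixels" have similar values.
image_row = [min(255, max(0, i // 4 + rng.randint(-3, 3))) for i in range(1024)]

# Uniformly random bytes: a neighbour tells you nothing about the next value.
noise = [rng.randrange(256) for _ in range(1024)]

smooth_corr = lag1_correlation(image_row)  # close to 1
noise_corr = lag1_correlation(noise)       # close to 0
```

Convolutional layers exploit the first kind of structure by sharing small filters across neighbouring positions. When the neighbour-to-neighbour correlation is near zero, that assumption buys little.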
There are other, more subtle challenges to using an open framework. When using TensorFlow or PyTorch for image recognition, for instance, it is necessary to resize or crop all images to a standard input size. This is impossible with files, which can range from 100KB to 100GB or more. Handling inputs of such varied size could not be done on standard frameworks and, as David adds, “even if you could, it would not be efficient during inference because of the large set of libraries required that would eat too much memory and slow response times.”
“The big AI frameworks are good for computer vision, speech, and text and if what you’re doing is cloud-based where you don’t need to care as much about the inefficiency or memory usage. The main issue of those frameworks is also when you want to apply deep learning at the edge rather than server-side. They aren’t efficient enough for edge computing and because more of them implement the building blocks of deep learning for those domains in vision and speech it is next to impossible to adapt them to other domains.”
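A small Python sketch shows what forcing files into one input size would cost. It is purely illustrative: the helper name, the 1,024-byte target, and the two toy file sizes are our assumptions, and it says nothing about how Deep Instinct's framework actually handles variable-length input.

```python
def force_fixed_length(raw: bytes, target_len: int) -> bytes:
    """Crop or zero-pad a byte string to exactly target_len bytes."""
    if len(raw) >= target_len:
        return raw[:target_len]  # everything past target_len is thrown away
    return raw + b"\x00" * (target_len - len(raw))  # filler carries no information


TARGET = 1024  # hypothetical fixed input size

small_file = bytes(100)     # stand-in for a small file
large_file = bytes(10_000)  # stand-in for a much larger file

small_fixed = force_fixed_length(small_file, TARGET)
large_fixed = force_fixed_length(large_file, TARGET)

padding_added = len(small_fixed) - len(small_file)  # bytes of filler
bytes_dropped = len(large_file) - len(large_fixed)  # bytes never seen by the model
```

With real files spanning six orders of magnitude in size, any single target length either discards most of the large files or makes the small ones almost entirely padding.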
Deep Instinct uses an in-house cluster based on an unspecified number of Pascal GPUs to train across hundreds of millions of malicious and legitimate files, with the inference part running on the many clients in an enterprise (from mobile to PC to servers). The training, which David says is redone every 3-4 months depending on threat activity, uses labels for each of the files it is fed and, on the other end, shows solid accuracy for users. We can’t attest to that, since we’re more focused on his build of the framework and choice of hardware. With those things in mind, it is of great interest to us that his framework trains 100X faster than the team could manage with CPUs only, a far bigger speedup than computer vision, speech, and text deep learning workloads see when trained on GPUs. “The same training cycle that would have taken us three months on CPU can be done in just over a day on our GPUs.”
The framework is written completely in C (not Python, we were surprised to hear) and all directly on CUDA. This could be part of the reason for the compelling performance the team gets from its framework.
David says early tests with the new Volta GPUs have shown more remarkable improvements. Further, paring precision down from 32-bit to 16-bit will make the new GPU a worthy investment. He also says that while their infrastructure is all in-house currently, he is not opposed to looking to cloud-based GPUs once the price comes closer to on-prem than it is now. Like many, he has been watching the gap between on-prem GPU clusters and cloud-based machines with the latest hardware shrink, but it is still not a short enough jump. Pascal is the basis for all the training runs, but his research team is stocked with Titan X cards, which David notes are a cheap way to get good performance for smaller development runs.
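For a sense of what the move from 32-bit to 16-bit precision trades away, the Python sketch below round-trips one value through both IEEE 754 formats using the standard `struct` module. The weight value and helper names are our own, and this only illustrates storage size and rounding error, not how the framework or Volta hardware performs the arithmetic.

```python
import struct


def to_float32(x: float) -> float:
    """Round-trip a value through IEEE 754 single precision (struct format 'f')."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_float16(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


weight = 0.1234567  # hypothetical network weight

bytes_fp32 = struct.calcsize("<f")  # 4 bytes per value
bytes_fp16 = struct.calcsize("<e")  # 2 bytes per value

err32 = abs(to_float32(weight) - weight)  # rounding error at 32-bit
err16 = abs(to_float16(weight) - weight)  # larger rounding error at 16-bit
```

Each value takes half the memory at 16-bit, which also halves the data moved per weight, at the cost of a coarser approximation of the original number.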
Again, deployment happens on any device with the pre-trained model, which takes up a “few tens of megabytes of memory” and delivers millisecond responses to incoming files.
None of this is to say it is a practical bet for all startups to get an edge by rolling their own deep learning framework from scratch, but in areas that fall outside of that sweet spot of correlations, we might see more of this happening. In tandem, there will be a trend toward more custom hardware for specific AI workloads, perhaps even hardware rooted in custom frameworks for specialized applications. David says he watches this space closely because any performance advantage big enough to matter to his bottom line could be important enough to retool his code for. On his watchlist in particular are Graphcore and Wave Computing, both companies we’ve covered here.