GPUs Set High Water Mark for Financial Trading Algorithm

In the fast-paced world of algorithmic trading, speed is of the essence, not just for executing the trades themselves, but also for developing the trading models, which are becoming obsolete in ever shorter timeframes.

A set of GPU-accelerated Python libraries running on Nvidia’s DGX-2 system has just demonstrated that these trading strategies can potentially be developed thousands of times faster than previously thought possible, which could transform how hedge funds and other financial institutions build their models in the years ahead.

The demonstration involved running a particular STAC-A3 benchmark on a single DGX-2 box. The machine, which Nvidia touts as an “AI supercomputer,” is powered by 16 of its V100 GPUs, two Intel Xeon Platinum 8168 CPUs, and a slew of NVMe flash drives. The benchmark, known as STAC-A3.β1.SWEEP.MAX60, was developed by the Securities Technology Analysis Center (STAC) Benchmark Council, a group of financial firms and technology providers that develops test suites for assessing technology aimed at the FinTech industry. As noted by STAC director Peter Lankford, algorithmic trading has become so entrenched in the financial services arena that most transactions these days are just “robots competing against robots.”

This particular benchmark measures how many parameter combinations of trading simulations can be performed in an hour, the idea being that the more parameter combinations can be tested, the more likely one is to find the optimal (that is, the most profitable) model for a trading strategy. The DGX-2 system ran 20 million simulations on a basket of 50 stocks in the hour allotted. That compares to 3,200 simulations achieved by the previous record holder, a 20 Xeon Skylake 64 vCPU configuration in the Google Cloud. In other words, the DGX-2 setup delivered 6,250 times as many simulations.

The system was also able to perform 10,000 simulations on a basket of 48 instruments in less than 6 minutes using this same benchmark. And according to Lankford, going from 1,000 simulations to 10,000 simulations added only 0.104 seconds of computation time. “That suggests that a quant can significantly increase their parameter space at little cost on this platform,” he concluded.
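The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported in this article; the variable names are illustrative, and the per-simulation marginal cost is a simple average that assumes the extra 0.104 seconds is spread evenly over the additional simulations.

```python
# Figures as reported in the article.
dgx2_sims_per_hour = 20_000_000
previous_sims_per_hour = 3_200

# Ratio between the two results, and throughput per second for each.
speedup = dgx2_sims_per_hour / previous_sims_per_hour
dgx2_sims_per_second = dgx2_sims_per_hour / 3600
previous_sims_per_second = previous_sims_per_hour / 3600

# Average marginal cost of growing the sweep from 1,000 to 10,000 simulations
# (assumes the extra time is spread evenly across the extra simulations).
extra_seconds = 0.104
extra_sims = 10_000 - 1_000
marginal_microseconds = extra_seconds / extra_sims * 1e6

print(f"speedup: {speedup:,.0f}x")
print(f"DGX-2: {dgx2_sims_per_second:,.1f} simulations/s")
print(f"previous record: {previous_sims_per_second:.3f} simulations/s")
print(f"marginal cost: {marginal_microseconds:.1f} microseconds per extra simulation")
```

On these numbers, each additional simulation in that range costs on the order of ten microseconds, which is what makes a much wider parameter sweep nearly free.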

Testing these models is extremely compute intensive, so the people who use them, the quantitative analysts, or quants, tend to reduce the number of parameter combinations in order to get a useful model in a reasonable amount of time. On a small cluster, these runs can often take all night, which means only a limited number of trading strategies can be tested in the rush to deploy the next model.
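To see why quants end up trimming the parameter space, consider how quickly a grid grows: the number of simulations is the product of the number of values tried for each parameter. The sketch below is purely illustrative. The parameter names (`fast_window`, `slow_window`, `stop_loss_pct`) and their ranges are hypothetical and are not taken from the STAC-A3 specification.

```python
from itertools import product


def grid_size(grid):
    """Number of simulations needed to cover every combination in the grid."""
    n = 1
    for values in grid.values():
        n *= len(values)
    return n


# Hypothetical coarse grid: the kind of cut-down sweep that fits an overnight run.
coarse = {
    "fast_window": range(5, 30, 5),
    "slow_window": range(50, 250, 50),
    "stop_loss_pct": [1, 2, 5],
}

# Hypothetical finer grid over the same parameters.
fine = {
    "fast_window": range(5, 30),
    "slow_window": range(50, 250, 5),
    "stop_loss_pct": [0.5, 1, 1.5, 2, 3, 5],
}

# Each tuple is one parameter combination, i.e. one simulation to run.
coarse_combos = list(product(*coarse.values()))

print("coarse grid:", grid_size(coarse), "simulations")
print("fine grid:  ", grid_size(fine), "simulations")
```

Refining each axis only modestly multiplies the total by a factor of 100 here, so a sweep that was tolerable overnight quickly becomes impractical without far higher simulation throughput.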

That doesn’t mean every bank and hedge fund should run out and install a $400,000 DGX-2 system in their machine room. John Ashley, Nvidia’s Director of Global Financial Services, said he expects this workload to scale with the number of GPUs, such that V100-based systems built by OEMs or even ODMs could deliver similar results.

But according to Ashley, the key technology is actually on the software side, specifically, Nvidia’s CUDA-X AI software along with its GPU data science software library suite, known as RAPIDS. By the way, none of what’s being used here has anything to do with machine learning; it’s strictly data analytics using integer and floating point (FP64) computations. The Numba package, in particular, allows developers to write Python code that gets compiled into CUDA, which means quants can more easily build their functionality on top of RAPIDS.
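As a rough illustration of what writing CUDA code in Python with Numba can look like, the sketch below runs one toy simulation per GPU thread. It is not the STAC-A3 workload and not Nvidia's code: the moving-average strategy and the names `sma_pnl`, `sweep_kernel`, and `sweep` are hypothetical, chosen only for this example. It assumes Numba and a CUDA-capable GPU are present, and falls back to plain Python when they are not.

```python
try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except ImportError:
    HAVE_GPU = False


def sma_pnl(prices, window):
    """Toy strategy in plain Python: hold for one step whenever the
    price is above its simple moving average over `window` points."""
    pnl = 0.0
    for t in range(window, len(prices) - 1):
        avg = 0.0
        for k in range(t - window + 1, t + 1):
            avg += prices[k]
        avg /= window
        if prices[t] > avg:
            pnl += prices[t + 1] - prices[t]
    return pnl


if HAVE_GPU:
    import numpy as np

    @cuda.jit
    def sweep_kernel(prices, windows, out):
        # One thread evaluates one parameter value (one simulation).
        i = cuda.grid(1)
        if i < windows.shape[0]:
            w = windows[i]
            pnl = 0.0
            for t in range(w, prices.shape[0] - 1):
                avg = 0.0
                for k in range(t - w + 1, t + 1):
                    avg += prices[k]
                avg /= w
                if prices[t] > avg:
                    pnl += prices[t + 1] - prices[t]
            out[i] = pnl


def sweep(prices, windows):
    """Run the toy simulation for every window length."""
    if HAVE_GPU:
        p = np.asarray(prices, dtype=np.float64)  # FP64, as in the article
        w = np.asarray(windows, dtype=np.int64)
        out = np.zeros(w.shape[0], dtype=np.float64)
        threads = 128
        blocks = (w.shape[0] + threads - 1) // threads
        # Numba copies the host arrays to the GPU and back automatically.
        sweep_kernel[blocks, threads](p, w, out)
        return out.tolist()
    # CPU fallback: same logic, one simulation at a time.
    return [sma_pnl(prices, w) for w in windows]


print(sweep([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3]))
```

The kernel body is ordinary Python loops and arithmetic, which is the point Ashley is making: the simulation logic stays readable to a quant while Numba handles compilation for the GPU.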

The V100 and the DGX-type systems have been around for some time, so it’s entirely possible that some CUDA-savvy quants have already figured out that this hardware can help them build their trading algorithms faster and have developed the relevant code in C/C++. According to Ashley, even Nvidia wouldn’t know whether that is true, since hedge funds and other investment institutions tend to keep that kind of information hidden from any potentially interested parties, even from their technology providers.

Nevertheless, making CUDA and Nvidia GPUs accessible from Python means a lot more quants will be able to use this technology to optimize their models. And if the benchmark blowout demonstrated here is any indication of what’s possible, there should be plenty of interest in giving GPUs a whirl.

“We think this result will encourage the hedge funds, banks, and other investment firms to push on with defining the rest of the benchmark,” said Ashley. “And we think we’re going to be able to provide some really interesting results in that direction as well, going forward.”
