There is something to be said for being at the right place at the right time.
While there were plenty of folks who were in the exact wrong spot when the financial crisis hit in 2007-2008, some technologies were uniquely well timed to meet the unexpected demands of a new era.
In the aftermath of the crash, major investment banks and financial institutions had a tough task ahead to keep up with the wave of regulations instituted to keep them straight. This had serious procedural impacts and also came with heady new demands on compute infrastructure. Post-regulation, investment banks that might have run valuation adjustments once per day were now required to run these massive-scale Monte Carlo simulations multiple times in a 24-hour span. This meant pricing the bank’s entire portfolio (billions of individual pricing data points) as fast as possible, a requirement that overloaded even the huge compute farms dedicated to such simulations.
At the same time, GPU computing was getting its legs, running at full bore on the then-top supercomputer at Oak Ridge National Lab and demonstrating that its programming environment was rich and capable enough to deliver strong results at scale. As banks scrambled to stretch existing CPU-only infrastructure farther for the new onslaught of valuation, risk, and other regulation-fed workloads, GPUs appeared in mature bloom at just the right moment. Such large-scale valuation adjustments (on top of the usual risk and other application sets) could have been managed on CPU-only machines, but GPU offload arrived at the right place at the right time for big banks.
“It’s not uncommon now to find a bank with tens of thousands of Tesla GPUs,” Hicham Lahlou tells The Next Platform. “And this wouldn’t have been the case without that mandatory push from regulation. Regulation is driving the need for speed at scale in all financial applications: market risk, collateral management, capital, and other areas.”
Lahlou has an inside view into the compute infrastructure powering some of the world’s leading investment banks, hedge funds, and asset management firms through his work as co-founder of Xcelerit, which helps such institutions with compute resource planning and deployment on both the hardware and software sides. Before focusing on banking IT infrastructure, he tuned supercomputing codes for the European Space Agency and researched code optimization strategies for GPU- and CPU-based systems. He has watched the application demands on big banks evolve and says that architecturally, big banks are looking for performance and portability, while internally they work to keep their code scalable and optimized. So far, the only architectural combination that fits the bill for this simulation-heavy field is CPU plus GPU. Now, with the arrival of Pascal-generation GPUs with NVLink and support for both double- and half-precision, the game is set to really heat up.
“The Pascal P100 GPUs are a serious jump, especially from what most of the banks we’ve worked with have, which are the K40s and K80s. The biggest difference is in raw performance, with 3X more double-precision performance, which is important because those Monte Carlo simulations need to keep getting faster and double-precision is the preferred precision mode.” He says most configurations have 2-4 GPUs per node, and he doesn’t expect to see many with 8 or 16 (as we see at some of the shops doing deep learning training). The 16GB of stacked memory on these GPUs also matters, because Monte Carlo simulations are both compute- and memory-intensive. In short, the Pascal GPUs, for this market at least, appear to promise a big ROI out of the gate for the workloads concerned.
“Typically the large investment banks have compute farms with tens of thousands of cores and GPU counts in the thousands. There are a few players with more than a thousand GPUs in production,” he says. While the need for speed at scale has pushed GPUs forward, he says other accelerators, including FPGAs, have not caught on in the same way. Lahlou was hard-pressed to think of any deployments he knew of personally that leveraged custom ASICs or FPGAs, although there have been some prototype efforts. He pointed to a noteworthy FPGA deployment at J.P. Morgan for risk analysis but says, “anyone close to that project can tell you that it was an experiment that is no longer being used.” Although he says FPGAs are an energy-efficient way to boost calculations, they lacked the robust software ecosystem, programmability, and portability that GPUs brought to market. That gap matters for the very reason his company exists: quants want to stick to their math rather than become hardware and software engineers. He does, however, point to real momentum in this market for ARM and Power9 as contenders, given early indications of what these architectures might bring to the table.
Just as GPUs came of age at a time when big banks needed the acceleration most, they are equally well positioned for another trend hitting investment banks and other financial services institutions: deep learning. For now, most of the training (which is generally GPU-accelerated) is done on separate prototyping clusters, but those workloads are expected to be integrated into the larger workflows running on the massive machines. “In investment banking and hedge funds, GPUs are being used for deep learning. In algorithmic trading, they are using it to come up with new trading strategies based on market prediction; in asset management, they’re using it for portfolio optimization; in investment banking, for more accurate Monte Carlo simulations (generative models in this case).” Lahlou says no single framework seems to dominate these uses. Caffe and TensorFlow were cited, and he says his team is working with some of these firms to tune those frameworks for GPU-based training in internal applications.
The financial services industry relies almost entirely on the big OEMs, such as HP and Dell, rather than building its own infrastructure, though no clear favorite among them has emerged. He says every one of the banks his organization has insight into is using a mix of CPUs and GPUs, and he expects GPU counts to tick up over time, especially with Pascal. The problem banks face, however, is making sure their code is future-proof and that the technologies they experiment with keep them competitive while still being able to move and scale as architectures shift. “It takes time to get the software optimizations needed to get full performance out of existing hardware, and the competition now between Intel, Nvidia, ARM, Power, and others means it’s also difficult to make big choices.”
In many ways, the GPU computing story carries this “right place, right time” theme throughout its history. In the early days, the graphics engine powering games could be adapted to take on computation offloaded from the CPU, allowing Nvidia to keep a profitable business in gaming while building another with the same basic device. Just as the financial industry needed a way to run larger, more frequent, more complex simulations on the same infrastructure, GPUs appeared ready to go. And just as the industry started to look to bolster results with deep learning, a new-generation GPU capable of both the double- and half-precision work required arrived.
In fact, some might say that Nvidia “lucked into” deep learning in general: the massive parallelism and mixed-precision capabilities it was building anyway just happened to be a perfect fit for deep learning, rather than the architecture being designed to meet those specific needs. Luck is often less random than it seems, of course.