The phrase “past results are not indicative of future returns” can be applied across the board in life, but in financial services, it has become a strategic, mathematical principle to live by.
Financial risk modeling via backtesting analyzes a particular strategy against historical data and trends before that strategy goes live. Despite its importance to a firm’s overall strategy, securing the on-premises cluster resources to run comprehensive backtesting models is not always straightforward. And for some smaller firms, the physical infrastructure needed to scale the business with sophisticated models is a cost-prohibitive capital expense. According to Justin White, CTO and co-founder of Boston-based financial tech firm Elsen, this is creating something of a slow revolution, putting newer firms, especially those managing under $5 million, on more level competitive ground.
Even for companies with significant investments in high performance computing infrastructure in-house, there are still bottlenecks. White pointed to experiences with backtesting teams whose simulations were pushed to the back of the queue in favor of other, larger-scale or more time-sensitive applications running on in-house clusters. In such cases, the ability to onboard rapidly and scale quickly meant more time could be spent optimizing the resulting strategy. Another client White referenced wanted to run 250,000 Monte Carlo simulations at a time, but only once per week. Here we have the “perfect business case” for the cloud: spotty but intensive use that replaces the cost and maintenance of a cluster that would otherwise sit mostly idle. Using cloud infrastructure backed by GPUs, this team could meet its scale and demand needs without having a cluster sit idle six out of seven days per week. “While it’s true you may spend a nice chunk of cash for that couple of hours, compared to having someone manage that in-house system the entire week, there are real cost savings there,” said White. “Even though our driving force is throughput, not cost necessarily, this is another case where this shines.”
Aside from that, one of the major limitations of in-house systems goes back to scale. If a cluster can only support ten of these simulations running simultaneously, there is both a time-to-result and an overall productivity argument to be made.
“It’s trivial for us to spin up a new set of instances with the code that was on the previous system; now there are no limits in theory to how many simultaneous simulations there are.” These situations, where quick access to instant scale is key, describe much of Elsen’s early but growing user base, which is using the startup’s cloud-based backtesting paired with GPUs from those cloud providers that offer them. (It’s a short list: Peer1 Hosting and Amazon Web Services are two of the main infrastructure providers with Nvidia Tesla GPU coprocessors in their systems.)
GPUs are a key differentiator for the “backtesting as a service” that Elsen is pushing. And for a small company with relatively young co-founders, Elsen brings quite a bit of GPU computing and financial services experience to bear. The company’s first prototype was based on GPUs for backtesting, which was something CEO and co-founder Zac Sheffer developed while building models at Credit Suisse. Along with White and a steadily growing team, Elsen began building more robust models for GPUs, even though not all calculations for backtesting can benefit from the parallelized boost. As is the case with everything simulation-driven, a lot depends on end user applications and requirements, but the GPU angle, coupled with the freedom from maintaining on-site hardware, is where Elsen hopes to put their experience into play.
In essence, the company’s system allows a user to build a strategy to test against, then run that strategy against a large amount of historical data before making the decision to move it into place against live data. Given the rapid, near-instant scalability of the cloud and handy access to a massive well of historical financial data, the idea is that users can quickly boost the volume of variables they look at over time. The scalability of adding more nodes, the ability to process more efficiently with GPUs, and the ability to tap into more data for backtesting at once are the drivers behind Elsen’s belief in this model as the future of backtesting at smaller funds.
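As a rough illustration of what “running a strategy against historical data” means in its simplest form, and not a description of Elsen’s actual code, the sketch below applies a basic moving-average crossover rule to one company’s price history and measures how the resulting positions would have performed:

```python
# Minimal single-company backtest sketch (illustrative only, not Elsen's code).
# A simple moving-average crossover rule is applied to one price history and
# the cumulative return of the resulting positions is computed.
import numpy as np

def backtest_crossover(prices, fast=20, slow=100):
    """Return the strategy's cumulative return over the price history."""
    prices = np.asarray(prices, dtype=float)
    fast_ma = np.convolve(prices, np.ones(fast) / fast, mode="valid")
    slow_ma = np.convolve(prices, np.ones(slow) / slow, mode="valid")
    fast_ma = fast_ma[len(fast_ma) - len(slow_ma):]    # align both averages to the same dates
    positions = (fast_ma > slow_ma).astype(float)      # hold the stock while fast MA > slow MA
    daily_returns = np.diff(prices[slow - 1:]) / prices[slow - 1:-1]
    strategy_returns = positions[:-1] * daily_returns  # yesterday's signal sets today's exposure
    return np.prod(1.0 + strategy_returns) - 1.0
```

For a single company the work is trivial; the interesting part, as the article goes on to describe, is repeating exactly this kind of calculation across thousands of companies at once.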
“Traditionally, people may have been used to testing across a portfolio of companies, but that topped out somewhere between 10 and 500 at the high end,” White tells The Next Platform. “Since we can scale so quickly, it’s irrelevant to us how many companies they want to test against. Now users can make the quick decision to look at the entire New York Stock Exchange or the entire S&P 500 index. That’s been difficult before, but it all comes down to how we parallelized this system.”
The system can use the GPU to treat each company under consideration as a “thread,” and since several thousand threads can run simultaneously, every company can be processed, or backtested, in lockstep with all the others, which allows backtesting at great scale and with many more variables considered in one go. “To the GPU, numbers are numbers; as long as the algorithm is not particularly path dependent, it will shine,” says White. “Because typically our users will be running against 2,000 or 5,000 companies, every single one of those companies has a timeseries of its own, which has the exact same set of calculations applied to it and is completely independent of the other timeseries around it. So as long as the algorithm or strategy is not path dependent, the GPU sings.”
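A hedged sketch of that structure, extending the earlier example rather than reproducing Elsen’s implementation: because the same arithmetic is applied independently to every company’s timeseries, the whole universe can be processed as one array operation per date, which is exactly the shape of work a GPU handles by giving each company its own thread.

```python
# Lockstep, per-company processing across a whole universe (illustrative only).
# prices has shape (num_days, num_companies): one column per company's timeseries.
# The same moving-average signal is computed for every column at once; on a GPU
# each company would simply map to its own thread doing identical, independent work.
import numpy as np

def crossover_signals(prices, fast=20, slow=100):
    """Boolean matrix: True wherever a company's fast MA is above its slow MA."""
    days, companies = prices.shape
    csum = np.vstack([np.zeros((1, companies)), np.cumsum(prices, axis=0)])
    fast_ma = (csum[fast:] - csum[:-fast]) / fast   # shape (days - fast + 1, companies)
    slow_ma = (csum[slow:] - csum[:-slow]) / slow   # shape (days - slow + 1, companies)
    return fast_ma[-slow_ma.shape[0]:] > slow_ma    # aligned on the later dates

# Five years of daily prices for 5,000 companies is a modest array; swapping in a
# GPU array library (for example CuPy) would run the same math on the accelerator.
signals = crossover_signals(np.random.rand(1250, 5000) + 1.0)
```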
The end result, according to Elsen, is financial modeling algorithms that run 600X faster than they would on CPU-only systems. Sumit Gupta, general manager of accelerated computing at Nvidia, got behind this Elsen benchmark, agreeing that “For many financial services firms, the ability to maximize the efficiency of computational resources and increase financial application performance is game-changing.”
White says the “secret sauce” is how Elsen prepares and stores data. While he couldn’t reveal much, he says the format in which Elsen stores data inside its Postgres database, and how it handles joins between tables for discovering data, are key; the company’s code yields massive improvements in how it procures and pulls the data. “The whole point is that people can focus on increasing and improving their strategies, instead of working on the plumbing to actually start testing their strategies.”
While Elsen can also deploy on a customer’s in-house private cloud, the hosted option is dominant since its pitch is far less about infrastructure than it is about software designed for massively parallel jobs that can handle multiple streams at once through a single API call. On the software side, a great deal of backtesting is done using tools like Excel, and one step up from that is Matlab, but in these cases, a great deal of the effort is spent on building strategies rather than using the time to optimize and understand them. According to Sheffer, “A lot of those generalized tools don’t always have the best support for parallelizing some of the actual functions. So, for example, Matlab has the distributed toolbox, which is great, but if you can’t get the data to the GPU fast enough, it doesn’t get you anywhere.”
It’s this type of bottleneck Elsen is countering. The goal is to take the entire chunk of data, drop it wholesale into the system, and let users make a single API call to do something as intensive as computing a relative strength index against the entire NASDAQ listing from the last two years. Instead of having to worry about how to implement the data, prepare it, and backtest it, and then write the actual signal before getting the results, users can make one call and Elsen takes care of the rest.
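What that single call might look like from the user’s side is sketched below; the endpoint, parameter names, and response shape are invented purely for illustration, since Elsen’s actual API is not documented here.

```python
# Hypothetical one-call backtest request. The URL, field names, and response
# format are placeholders invented for this example, not Elsen's documented API.
import requests

response = requests.post(
    "https://api.example.com/v1/indicators",   # placeholder endpoint
    json={
        "universe": "NASDAQ",                  # every listed company
        "indicator": "relative_strength_index",
        "window": 14,                          # conventional RSI lookback
        "start": "2013-05-01",                 # placeholder two-year span
        "end": "2015-05-01",
    },
    headers={"Authorization": "Bearer <api-key>"},
)
rsi_by_company = response.json()               # per-company RSI timeseries
```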
The core of the Elsen system is written in C with a couple of different interfaces, which are glued together with Python. The API is based on a “very webby based stack”, says Sheffer. This allows Elsen to bridge the divide between hosted and private cloud environments. Postgres and JavaScript round out the software stack. While Elsen is not accelerating Postgres (not yet, anyway—this is an area the company is working on), the cloud and performance angles are setting Elsen apart, says White. There are a large number of software companies catering to backtesting, but as of yet, there is little momentum among them to offer these services with GPU acceleration in a cloud environment where the infrastructure and software sides are abstracted away.
When one thinks about financial services in the cloud, the limiting factor is always performance, even though companies like Amazon Web Services have tried to correct that by adding high-end processors, 10 Gigabit Ethernet, and other performance enhancements to suit the HPC set. Although many in-house high performance computing clusters are used for backtesting strategies, the computational demands far outweigh I/O concerns, which means the cloud performance hit is less important. As Sheffer notes, “For most users, we’re looking at 100 GB to 200 GB of data. The network is seldom a bottleneck for these things and latency isn’t crucial, either. For high frequency trading, it’s different, but the real target here is throughput, something we’ve worked hard to push.”
For hedge funds, faster and more scalable processing confers the ability to run more complex models with more complex calculations, scenarios, and sensitivities. That can mean understanding certain risks better than a competitor, or finding a sustainable trading advantage. “Running more simulations increases both the quality of your results and your confidence in those results,” says Havell Rodrigues, chief revenue officer at Elsen. “That can translate to faster time to market with new or refined strategies.”