There have been numerous attempts to bring optical processors into the mainstream computing fold, but they remain far behind their silicon cousins for practical reasons, including limited programmability and the fact that host processors and a full supporting infrastructure are still required.
One of only a few companies seeking to commercialize light-based processing, UK-based Optalysys, is working with The Genome Analysis Centre (TGAC) to show how optical processors can handle certain data- and compute-heavy areas of genomics with higher performance, lower power, and greater accuracy.
But before we get too far ahead of ourselves here, let’s talk about…well, wheat. Basic bread wheat. Although it does start to appear far less “basic” if you look at it from a genomics perspective. The bread wheat genome has 5X the DNA characters of our own genes—and, as one might imagine, that means baking the full sequence is quite a demanding computational task.
To be fair, the computational side might be more interesting from a pure performance angle, but the real challenge for sequencing centers working with dense genomes like that of wheat is data movement. Research hubs like TGAC have tried to circumvent the massive data handling issue by looking to large shared memory machines to bear the burden of at least one of the static genomes. The idea is that if they can hold as much of one genome as possible within a system’s large pool of memory for the comparative analysis (sequence alignment), the expensive trips back and forth from disk can be reduced and the overall time to analysis shortened. All of this means better performance, but even for centers like TGAC that are taking the computationally efficient route to genome analysis and alignment, there are still massive power and cooling concerns to contend with. With all cores blazing and the system at full utilization, TGAC is burning 130 kilowatts, a major barrier to its ultimate goals for sequencing technology.
These are all issues that TGAC considered during its massive sequencing effort to unravel the “simple” bread wheat genome. To do this and more human-focused genomic analyses, the center became home to one of the top systems in the world, utilizing a massive shared memory machine. TGAC has an SGI UV 2000 system with 2,500 “Sandy Bridge” Xeon cores and 20TB of shared memory—the latter a necessary feature to keep as much of the genome data in place for analysis versus moving to and from disk during sequence alignment. In essence, during this critical element of bioinformatics, the system seeks strings of DNA characters within a larger string (typically a genome) to find similar genes and thus determine common ancestry, for example. It’s like a complex, memory-intensive “spot the difference” puzzle, which means it can be useful to keep one entire genome (if possible) entirely in memory.
But even with the performance and efficiency gains of a large shared memory machine, TGAC is still racking up major power and cooling costs. What if this type of processing could happen within a small desktop-sized machine that plugs into a standard mains socket and processes a human genome on the spot? If proven functional at scale, optical processors could displace standard clusters for gene sequencing in a far more power-efficient way, since they generate little heat, especially compared to silicon technologies. And even more interesting, what if memory and the scalability limits therein were no longer a concern?
These “what ifs” seem to present a rather tall order, but TGAC is working with Optalysys on a prototype processor that uses low-power lasers instead of standard electronics for processing. The goal is to do this genomics work using 95 percent less power than standard processing technologies.
“This project we got funded for with Optalysys is to design a new processor, in this case, an optical processor versus standard electronic processors. A current challenge with our HPC system is the amount it costs to power and cool large resources, which led us to ask how we provide the performance for our researchers but cut down on the major power costs. We’re hoping with this new processor that’s using light instead we can power a huge machine from standard mains supply. We don’t have the same issue with heat, there’s not the same resistance like electrons have when they’re traveling—these are low-powered lasers.”
Although Tim Stitt, Head of Scientific Computing at TGAC, has yet to see the prototype his team will be working with, there has already been a great deal of collaboration to understand the programming and other features of the Optalysys optical processor, and Optalysys is working to get functional hardware into TGAC’s hands. To be fair, it is better to think of the Optalysys device as a coprocessor, since a host CPU is still required to handle part of the processing and other back-end operations. The device can connect to nodes inside an HPC cluster using standard PCI-Express links.
Of course, this all leaves some questions open. On the hardware front, one question is how, even with rapid processing of sequence data, the data movement is handled. Stitt says the bottleneck is still getting the large genome data from a file system onto the device, but there is hope on the horizon following work some companies have done with FPGAs using SSDs and direct memory access (DMA) models to improve the performance of moving data from a remote device or file system to the processor. Without an efficient way to do this, all the efforts at making an efficient processing engine for sequence alignments would be lost, so a large chunk of TGAC’s funding for the project will be to further explore the data movement problem.
The optical processing is a bit different from what goes on in a CPU.
“The work is done on small 4000-pixel liquid crystal displays, which are cheap and a lot like the ones you can get on a television screen. In essence, you feed data into the pixels and using diffraction, it can perform scalable, very fast pattern matching—a perfect thing for us since we’re looking for patterns of DNA characters inside a much larger set of characters.”
Programming liquid crystal displays might sound a bit odd, but Stitt says they are able to use standard programming models, written in C primarily. It’s not the typical ones and zeros approach, but rather the data is encoded as an image onto the LCD.
“We’ll take an ASCII file of our genome data, encode it as an image on the LCD, which is less complicated than it sounds, and since we’re looking for a query sequence in a large reference genome, we can encode that reference genome in the LCD, give it our input (the query sequence) then query that. The Optalysys processor then compares the two and provides output via a similarity map showing peaks and valleys in similarity between the reference and query genome data.”
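To make the idea of a similarity map concrete, here is a minimal software sketch of the comparison Stitt describes. It is not Optalysys code and does not model the optics. It is a plain Python stand-in that encodes each sequence as four on/off channels (one per DNA base), slides the query along the reference, and counts matching characters at every offset. The function names, the one-hot encoding, and the toy sequences are all assumptions made for illustration.

```python
# Illustrative stand-in for the optical comparison; every name here is hypothetical.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as four 0/1 channels, one per base."""
    return {b: [1 if ch == b else 0 for ch in seq] for b in BASES}


def similarity_map(reference, query):
    """Count matching bases between query and reference at every offset."""
    ref = one_hot(reference)
    qry = one_hot(query)
    n, m = len(reference), len(query)
    scores = []
    for offset in range(n - m + 1):
        score = 0
        for b in BASES:
            ref_channel, qry_channel = ref[b], qry[b]
            # Product is 1 only where both sequences carry base b.
            score += sum(ref_channel[offset + i] * qry_channel[i] for i in range(m))
        scores.append(score)
    return scores


def peaks(scores, threshold):
    """Return the offsets whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


reference = "TTGACGTACCAGTGACGTTCA"  # toy "reference genome"
query = "GACGTT"                     # toy "query sequence"
scores = similarity_map(reference, query)
exact_hits = peaks(scores, len(query))
```

In this toy case the map peaks at offset 13, where all six characters match, and shows a slightly lower score of five at offset 2, where the reference differs from the query by one character. The software version visits each offset in turn; the appeal of the optical approach described here is that the comparisons happen in parallel.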
While this is all still in development, TGAC is confident in the prototypes Optalysys has built and demonstrated for other areas, including for the computational fluid dynamics (CFD) market, where certain types of spectral analysis operations are a good fit for massively parallel, almost instant comparisons of data. This addresses the power consumption problem by offloading the most complex calculations to be handled in light versus heat-generating electrons, and further, eliminates the memory wall that binds some genomic applications performance-wise.
If proven at TGAC, optical processing could change how we think about the scalability of machines for genomic analysis. Stitt says that scaling the system is only a matter of adding more LCDs, although it will be interesting to see what happens with the programming framework at that level—not to mention how data will be moved around the rest of the host system.
Eventually, Stitt says, there will be no heavy duty back-end system required to support these sequencing workloads.
“A major part of our plan is to be able to let hospitals do their own genome analysis, but it’s not realistic for them to have a giant HPC system in the basement. We wanted to look for ways to bring true personalized medicine to these centers without having them ship off data to high performance computing or research centers for analysis. They should be able to do it on the spot.”