It has been somewhat difficult to ascertain what problems Tianhe-2, the fastest supercomputer on the planet, has been chewing on since it was announced in 2013, but there are some signs that China is now pinning the machine’s mission on the future of genomics, among other areas.
This July, the twice-yearly list of the top systems in the world will refresh, and there is little doubt that the 3,120,000-core beast will retain its spot at the top. It has well over 33 petaflops of sustained performance (and approximately 54 petaflops of peak capability) at the ready, delivered by 16,000 nodes that each sport two Ivy Bridge generation processors backed by three Xeon Phi coprocessors. But the same questions about how the machine is being used (and, more specifically, how China is doing the software legwork to fully leverage such a monolith) will likely resurface during the list unveiling at ISC ’15 in Frankfurt.
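As a quick sanity check on those figures, the core count can be rebuilt from the node layout. The node count, the chips per node, and the petaflop numbers come from the text above; the per-chip core counts (12 per Ivy Bridge Xeon, 57 per Xeon Phi) are assumptions taken from the publicly listed Tianhe-2 configuration, and the function names are illustrative.

```python
# Node layout from the article; per-chip core counts are assumptions
# based on the publicly listed Tianhe-2 configuration.
NODES = 16_000
CPUS_PER_NODE = 2     # Ivy Bridge Xeons per node (from the article)
PHIS_PER_NODE = 3     # Xeon Phi coprocessors per node (from the article)
CORES_PER_CPU = 12    # assumption: 12-core Ivy Bridge part
CORES_PER_PHI = 57    # assumption: 57-core Xeon Phi part


def total_cores(nodes=NODES):
    """Return the machine-wide core count implied by the node layout."""
    per_node = CPUS_PER_NODE * CORES_PER_CPU + PHIS_PER_NODE * CORES_PER_PHI
    return nodes * per_node


def efficiency(sustained_pf, peak_pf):
    """Return sustained performance as a fraction of peak."""
    return sustained_pf / peak_pf


if __name__ == "__main__":
    print(total_cores())                 # cores across all nodes
    print(f"{efficiency(33, 54):.0%}")   # rough sustained-to-peak ratio
```

Under those assumptions each node contributes 195 cores, which multiplies out to the 3,120,000 quoted above, and the rounded petaflop figures put sustained performance at a little over 60 percent of peak.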
It is safe to assume plenty of what happens on Tianhe-2 stays within hushed circles, but what one team has been able to showcase promises a big breakthrough for genetic research. It also demonstrates how application teams are rallying to meet the demands and opportunities of dramatic boosts in core count, memory allocation, storage, and other resources, not to mention the need to keep the machines fully fed.
A team working on the Chinese supercomputer was able to achieve a 45x speedup on a single node of the system without a loss in precision by refining their approach to parallelization of a critical part of the genomic analysis pipeline. By revamping how a commonly used SNP detection tool shares the load, through what the team calls its mSNP framework, they could take this single-node performance and scale it to just over 4,000 nodes of the Xeon Phi-boosted super.
The existing tool is called SOAPsnp, which the team says took more than a full week to analyze one human genome with 20-fold coverage. To put the critical nature of this step in the larger genomics workflow into context, consider the role of SNP detection in the future of medicine. The single nucleotide polymorphism (SNP) is the genetic equivalent of a bit flip, a spot in the DNA sequence where variation can be spotted. These are useful to identify in sequences since they can pinpoint vulnerabilities to certain diseases, map more targeted pharmaceutical routes, and highlight other genetic markers of importance. And these are not few and far between: several million SNPs have been identified in the human genome alone. While we are more concerned with, say, sequential code than gene sequences here at The Next Platform, this is useful background for the main point of conversation, especially since speeding the detection of SNPs provides a significant performance, and thus efficiency, advantage for large-scale systems doing complex genomics research.
The researchers note that they achieved this speedup via mSNP by fully harnessing the coupling of the Ivy Bridge processors with the Xeon Phi. Interestingly, this is one of perhaps only a few projects to speed up this part of the workflow: the researchers state that they are not yet aware of any parallelized variant of the SNP detection process that can take advantage of the Xeon Phi (although there are GPU-accelerated frameworks targeting the same workload, some of which have also been developed in China). As the team describes in great detail, they “redesigned the key data structure of SOAPsnp, which significantly reduces the overhead of memory operations, then devised a coordinated parallel framework, in which the CPU collaborates with the Xeon Phi for higher hardware utilization.”
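The team's quote does not spell out how the CPU and the Xeon Phi divide the work, so the following is only a minimal sketch of one common way to keep both kinds of device busy: hand each a share of the work proportional to its relative throughput. The function name, the device names, and the throughput ratios are all hypothetical, and none of it is taken from mSNP itself.

```python
def split_work(n_items, throughputs):
    """Split n_items across devices in proportion to their throughput.

    throughputs maps a device name to a relative processing rate.
    Returns a dict of device name -> item count that sums to n_items.
    """
    total = sum(throughputs.values())
    if total <= 0:
        raise ValueError("throughputs must sum to a positive value")

    # Ideal (fractional) share per device, then round down.
    exact = {dev: n_items * rate / total for dev, rate in throughputs.items()}
    counts = {dev: int(share) for dev, share in exact.items()}

    # Hand the leftover items to the devices with the largest fractional parts.
    leftover = n_items - sum(counts.values())
    by_fraction = sorted(exact, key=lambda dev: exact[dev] - counts[dev],
                         reverse=True)
    for dev in by_fraction[:leftover]:
        counts[dev] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical ratios: each coprocessor is assumed 1.5x the CPU pair.
    rates = {"cpu": 1.0, "phi0": 1.5, "phi1": 1.5, "phi2": 1.5}
    print(split_work(1000, rates))
```

A static split like this only balances well when the throughput estimates are accurate; a real framework would more likely measure the rates at runtime or hand out work dynamically.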
As they will be describing in more detail in July during the Top 500 supercomputing announcement week in Frankfurt, the NUDT researchers took these optimizations and proposed “a read-based window division strategy to improve throughput and parallel scale on multiple nodes,” which, again, represents a first on the Xeon Phi.
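The details of that window division strategy are not given here, so what follows is one plausible reading, offered strictly as an illustration: instead of cutting the genome into windows of fixed length, cut the sorted reads into windows of roughly equal read count, so that each unit of work is about the same size, and then deal those windows out to nodes. The function names and the round-robin assignment are assumptions, not the researchers' method.

```python
def read_based_windows(read_starts, reads_per_window):
    """Group read start positions into windows of roughly equal read count.

    Returns a list of (first_start, last_start, n_reads) tuples.
    """
    if reads_per_window <= 0:
        raise ValueError("reads_per_window must be positive")
    starts = sorted(read_starts)
    windows = []
    for i in range(0, len(starts), reads_per_window):
        chunk = starts[i:i + reads_per_window]
        windows.append((chunk[0], chunk[-1], len(chunk)))
    return windows


def assign_round_robin(windows, n_nodes):
    """Deal windows out to nodes in turn; returns one list per node."""
    per_node = [[] for _ in range(n_nodes)]
    for i, window in enumerate(windows):
        per_node[i % n_nodes].append(window)
    return per_node


if __name__ == "__main__":
    # Toy read start positions: sparse at first, then a dense cluster.
    starts = [5, 1, 9, 100, 101, 102, 103]
    windows = read_based_windows(starts, reads_per_window=3)
    for node, work in enumerate(assign_round_robin(windows, n_nodes=2)):
        print(node, work)
```

The point of sizing windows by reads rather than by base pairs is that densely covered regions do not pile up on a single node. A production version would also have to handle reads that span a window boundary, which this sketch ignores.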
As the news will potentially shift away from the huge leaps in computing power for this list (assuming no influx of further Tianhe-2 nodes this Top 500 cycle), there will likely be more emphasis placed on the real application performance of the massive machine. The argument has often been made that while the system is no doubt impressive, it is incredibly complex, composed as it is of not just a stunning number of cores to contend with, but also a front-end system built on SPARC-variant (Galaxy FT-1500) CPUs and a homegrown Linux variant, both of which were developed at China's National University of Defense Technology (NUDT) with help from Chinese IT manufacturer and integrator Inspur.
And on the applications and usability front, one can expect the same questions about the other purposes of the machine, especially in the wake of the recent U.S. block on exports for supercomputers in China. While news like the genomics work does highlight that real and valuable progress is being made on large machines (no matter where they are based), the tone of the conversations about how this behemoth will prove itself over time is likely to take a different turn, especially with the blockade and what looks to be the continued dominance of the Chinese super for at least the next year. Remember, though, that in 2013 no one saw this big system coming, especially not with such an unforeseen peak petaflop rating.