As the theme goes this year, what’s old is new again. And not just in terms of technologies like deep learning or FPGAs, which are suddenly returning to life with new vigor. The trend appears to suit select companies as well.
Consider, for instance, Penguin Computing, which opened its doors in 1998 to kickstart the burgeoning Linux server space in the hope of tracking down customers in the scale-out enterprise world as well as among the emerging dot-com companies. By 2000, its vision for Linux clusters in the Fortune 1000 was realized as it captured a few key large accounts, including GE Appliances, heavy equipment maker Caterpillar, and a number of others in diverse areas.
What’s interesting about Penguin’s journey is that the company is back where it originally started, driven around the circle to its scale-out infrastructure starting point. Of course, the entire market has since shifted toward the very place Penguin began, and the company has learned a few lessons along the commodity hardware path. In that nearly eighteen-year stretch, it had to fiddle with the aperture, first by shifting focus to its supercomputing segment.
This is not to say that their efforts in high performance computing were short-lived or unproductive. Even now, roughly half of the company’s business is rooted in HPC, a figure that is expected to jump significantly following a major win announced late this year for a wide-ranging set of systems funded by the National Nuclear Security Administration (NNSA). This $39 million contract, signed to deliver the company’s Open Compute Project infrastructure for mission-critical workloads across three NNSA national labs, will power Penguin’s reach into more government-fed contracts, or at least that is the hope. The company’s CEO, Tom Coull, tells The Next Platform that there will be an explosion in their opportunity to grow their federal business, something that proved difficult without tier-1 vendor status but is slowly but surely changing.
The three main NNSA labs (Sandia, Livermore, and Los Alamos) that took Penguin’s Open Compute Project designs via the high performance computing focused Tundra ES expect anywhere from seven to nine petaflops to hit the floor beginning in 2016, with upgrades, including Knights Landing-based nodes (as detailed here), to follow eventually. The Advanced Simulation and Computing (ASC) program’s CTS-1 procurement will deliver 200 teraflops per “scalable unit,” a modular building block that can be plugged in as the labs demand. According to the contract, each of these units will contain between 150 and 200 nodes. The first of the systems will feature the Xeon E5-2695 v4 “Broadwell” processors, but as the processor ecosystem gets far more interesting in 2016, so too will Penguin’s lineup for Tundra.
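The figures quoted above imply a rough range for how many scalable units and nodes the labs would be taking. The following is a back-of-the-envelope sketch only: it assumes the full seven to nine petaflops is delivered entirely as 200-teraflop scalable units, which the contract details quoted here do not confirm, and the function and constant names are purely illustrative.

```python
# Figures quoted in the article. ASSUMPTION: the 7 to 9 petaflops total is
# made up entirely of 200-teraflop scalable units of 150 to 200 nodes each.
TFLOPS_PER_UNIT = 200
TOTAL_PFLOPS_RANGE = (7, 9)
NODES_PER_UNIT_RANGE = (150, 200)


def units_needed(total_pflops, tflops_per_unit=TFLOPS_PER_UNIT):
    """Scalable units needed to reach a petaflops total, rounded up."""
    total_tflops = total_pflops * 1000
    return -(-total_tflops // tflops_per_unit)  # ceiling division on integers


def summarize():
    """Return the implied ranges for units, nodes, and teraflops per node."""
    low_units = units_needed(TOTAL_PFLOPS_RANGE[0])
    high_units = units_needed(TOTAL_PFLOPS_RANGE[1])
    low_nodes, high_nodes = NODES_PER_UNIT_RANGE
    return {
        "units": (low_units, high_units),
        # Fewest nodes: fewest units at the smallest unit size, and vice versa.
        "nodes": (low_units * low_nodes, high_units * high_nodes),
        # A larger unit spreads the same 200 teraflops over more nodes.
        "tflops_per_node": (TFLOPS_PER_UNIT / high_nodes,
                            TFLOPS_PER_UNIT / low_nodes),
    }


if __name__ == "__main__":
    for name, (low, high) in summarize().items():
        print(f"{name}: {low} to {high}")
```

Under those assumptions the procurement works out to 35 to 45 scalable units, somewhere between 5,250 and 9,000 nodes, and roughly 1 to 1.3 teraflops per node. The article does not say whether the petaflops figures are peak or sustained, so these are order-of-magnitude checks rather than system specifications.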
This contract is noteworthy in terms of what it means for the future of open architectures at major national labs, but the business story for Penguin is also one worth telling.
Penguin, as a relatively small company, has managed to nimbly dart around a couple of key markets at the right times. Its primary focus on HPC began in earnest when it bought the Scyld cluster management framework in 2003 and extended until close to 2010. It was around that period that interest refreshed right back where the company started: in the scale-out market. And with open software and, increasingly, open hardware, as well as density, power management, and nimbleness being more valuable than ever, Penguin is positioned to keep its balancing act going. This is especially true now that the HPC side of its story is firming up with national lab contracts. The company has only one system on the Top 500 (at #435, at the University of Wyoming), but that is no measure of the real influence of the type of functional workhorses it is providing.
The company began its early run with the Open Compute Project in 2012. At the time, OCP was little more than a gleam in the eye for anyone but the largest hyperscalers, but momentum has built. For Penguin, the first year brought some thought leadership points, but by 2014 its OCP form factors were responsible for 8 percent of its business, and in 2015 they accounted for 30 percent. According to Matt Jacobs, who leads worldwide sales for the company, OCP will drive 45 percent of Penguin’s business this coming year. The span here will roughly correlate with the company’s half split between HPC (government and commercial) and enterprise sales. The primary drivers behind this growth are where one might expect them: in cutting-edge financial services circles and, as a newer addition, in media and entertainment. General purpose cloud is another area where Penguin is seeing adoption of its OCP products, with increasing interest from higher ed.
Jacobs has seen the Penguin story play out over his sixteen years and was the first to remark on the full-circle cycle, with the emphasis returning to scale-out. He says that OCP, and specifically the Tundra platform, is finding a fit anew as users consider Tundra a bridge to some of the converged architectures they’re looking to deploy. There is also a general move away from the Tier-1 vendors, at least according to Jacobs.
“We have multiple proofs of concept with several Fortune 500 companies as a direct result of the CTS-1 deal. It’s helped remove some barriers, especially as more are considering a move to a second tier vendor. The key for a lot of these is in flexibility. Tundra is delivering a heterogeneous configuration in a homogeneous wrapper and that is intriguing. And in the scale-out space, there’s a move toward lighter weight applications, bringing the environment and those applications closer to the user, and more of a devops mentality.”
As Dan Dowling, VP of engineering at Penguin, tells The Next Platform, the appeal of Open Compute extends across their HPC, general enterprise, and cloud lines. “For big science, the things they are looking at are the density and the centralized power. Those are huge factors there. For commercial HPC, it’s more focused on the common form factor and being able to leverage the same investment over many years and adopt Xeon Phi and upcoming processors as well.” For the big NNSA deal, however, it boiled down to openness and the associated flexibility.
“The fact that we accept a standard sized motherboard, can pick any processor or motherboard vendor we want, then fit it into a form factor really fast is key.” Since Penguin designs their own nodes from the sheet metal ground up, this is possible. “We don’t have to design an entire chassis and all the infrastructure around it. We design our own compute nodes, it’s easy to swap in different motherboards, and this is resonating.”
Interestingly, the three main labs share something in common with companies that have distributed sites where they collaborate on or share workloads. Having the same basic systems across sites, so that applications and people can move and have everything work, is a big factor, as are the simpler upgrades that come on top of that, Dowling argues.
The Tundra is noteworthy in its density, its ability to do both air and water cooling in the same system (cooling vendor Asetek is partnered with Penguin for the CTS-1 systems), and the centralized, single power system that provides redundant 12-volt power to the whole rack. This might make it sound like a good possible fit outside of big scientific computing and in large cloud or hyperscale datacenters. Jacobs says that while the hyperscale market drove a lot of their early ambitions and sense of how other datacenter operators might design future systems, their OCP work is a step down from the Googles and Facebooks of the world. Still, the trajectory is healthy, he says, and will grow along with HPC this coming year but will probably not outpace it.
“The hyperscale companies are working on designs of systems close to what they want and they’re directly involved with the manufacturers. That whole tier is morphing. And those margins? They are abysmal,” Jacobs says. “A lot of our competitors are taking a loss early with hopes of building into margin as they go. It’s a scaled game. But if you go one tier down from that and look at how many are emulating hyperscale, they appreciate what we’re doing.”
While a lot of what Penguin sees on the ground is what one might classify as “vanilla” systems, there is an uptick in GPU adoption in oil and gas and bioinformatics. Jacobs says they’re hearing a lot about the potential for deep learning but so far, the research being done there isn’t driving anyone to buy their systems, at least not yet.
The national labs have always been pioneers of open source software, and the chance to do this across the system was appealing, CEO Coull says. But this is stretching out to enterprise—and bringing more users off the ice from large, bulky, locked-down systems.
At the end of the day though, Jacobs says, what is really happening is that “the machines with a lot of engineering, proprietary interconnects, proprietary form factors, all of that is a shrinking island as far as the commercial computing base is concerned. They did all of that in HPC twenty years ago, they did it in enterprise with blades a decade ago.” So it’s back to the scale-out base for Penguin: a full-circle journey.