Open Hardware Means More Than Saving Money For Rackspace
February 19, 2016 Timothy Prickett Morgan
The religious fervor of open source software has settled down and reality has settled in. Generally speaking, new and complex systems tend to be built from open source, while the legacy stuff on which organizations still depend – and will for a long time to come, because the risk of changing working systems is higher than the value gained from that change – continues to be closed source.
The same settling in and settling down seems to be happening with the open source hardware movement that Facebook kicked off with its Open Compute Project back in April 2011. Over the next few weeks, The Next Platform is going to be talking to key players and users of Open Compute kit as we approach the Open Compute Summit on March 9 and 10 out in San Jose.
Facebook was experiencing explosive growth in the middle to late 2000s and decided to shift from buying custom gear from Dell’s Data Center Solutions division to designing its own servers, storage, and datacenters. When it opened its first homegrown and self-run datacenter in Prineville, Oregon nearly five years ago, it also opened up the plans for the servers, storage, and datacenters themselves for the world to use.
The generosity of Open Compute is a great example of enlightened self interest in action. Cloud computing and hosting provider Rackspace Hosting – one of the forces behind Open Compute as well as the OpenStack cloud controller and the OpenPower initiative to bring an alternative to the X86 processor to the high end of the server market – has been a big beneficiary of Open Compute, and has saved a whole bunch of money.
This was, of course, the plan. Like fellow hyperscaler Google and cloud builder Amazon, Facebook decided that it needed minimalist systems with everything ripped out of them except precisely what it needed for particular applications, and because it was not big enough to command supply chain volumes like Google and Amazon, Facebook figured that opening up the hardware designs would help it build a collective customer base for machines and a broad and deep supply chain to satisfy that base. In effect, it created a pan-company hyperscaler to consume specific designs. The input to the Open Compute effort from various large enterprise customers, other hyperscalers and cloud builders, and even some supercomputing centers would, Facebook hoped, create a flurry of engineering activity, pushing innovation that all could benefit from without having to pay the high overhead that comes from buying servers from tier one suppliers and at price points that would approach those of hyperscalers. Open Compute also meant not having to wait anymore for the tier one suppliers to innovate. Finally, vanity-free servers would reduce bills of material and lower electricity consumption and cooling, saving money on many fronts.
When the first Open Compute servers were adopted by Facebook for the Prineville datacenter, the company said that its designs were about 38 percent more efficient in terms of their resource utilization and cost 24 percent less than the conventional customized servers it had been buying from Dell. That is obviously a big step function in the price/performance curve, and something that all fast-growing companies have to watch out for. Facebook has said little about how its designs compare since the early days of the Open Compute project, but two years ago, at the Open Compute Summit, Jay Parikh, vice president of infrastructure at Facebook, revealed that the social network had saved around $1.2 billion in infrastructure costs over the prior three years by managing its own supply chain, designing its own servers and datacenters, and hiring original design manufacturers (ODMs) to build those servers. Our guess is that this is now close to $2 billion in savings – we will probably learn more at the summit.
Rack ‘Em And Stack ‘Em
The point of Open Compute was always larger than Facebook, of course. Rackspace Hosting is one of the world’s largest cloud builders, and one that is looking for any edge it can find against the juggernaut that is Amazon Web Services. (As far as we know, AWS does not participate directly in Open Compute, but for all we know, like Apple it could have been participating behind the scenes for a very long time.) But Rackspace has been there from the very beginning, in the Facebook cafeteria when the Open Compute ideas were sketched out on a napkin.
To save money on its own expanding infrastructure, Rackspace decided in 2011 that it was going to shift to whitebox machines, and two years later, at the annual Open Compute Summit, the company disclosed its own plans to move to customized Open Compute systems, with a set of iron manufactured by Quanta Computer and Wiwynn, two of the big suppliers of Open Compute and other customized machinery to the IT industry. The company’s top brass explained the rationale for moving to Open Compute in more detail to us a few months after that. At the time, Rackspace had 90,525 machines in its fleet and about 16,300 of them were based on Open Compute setups, hosting its private cloud business. As of the end of 2015, the Rackspace server fleet has grown to 188,177 machines, and a larger portion of the boxes rolling into its datacenters are based on Open Compute specs, according to Aaron Sullivan, the senior director and distinguished engineer at Rackspace who has been driving the company’s Open Compute server effort.
Because servers have depreciation schedules, like other capital assets, and because companies like Rackspace want to get the most out of their investments by having those assets run more efficiently for as long as possible to improve margins, no company can make the shift to Open Compute in a year. (We have seen various non-Open Compute machinery in Facebook’s own Forest City, North Carolina datacenter, for instance.) So Rackspace is being methodical about its Open Compute transformation.
“Our course of doing this has been to choose which applications to put on our Open Compute platforms, and we have a plan to pick up more and more applications that deploy and manage that way,” Sullivan tells The Next Platform. “When we started, we were focused primarily on our public cloud, and today well over half of our public cloud infrastructure in total, on a run rate basis – for our two big workloads – is pure Open Compute. We do not deploy infrastructure for our Virtual Cloud Servers or OnMetal servers that is not Open Compute, and that has been the case for over twelve months now, and we started winding down other machines about two years ago.”
The Virtual Cloud Servers, its virtualized public cloud, represents the biggest part of the overall Rackspace Cloud. Rackspace does not break out the public cloud and private cloud server and revenue numbers separately for competitive reasons. “But I can comfortably say that between those two and the other applications we deploy today, Open Compute is in the ballpark of half of the servers we deploy today.”
There are certain applications and certain elements of the Rackspace environment that do not lend themselves to Open Compute, says Sullivan, and some of that is just the difference between being a hyperscaler and being a public cloud provider and hoster for enterprise customers. “I don’t think we will build a storage array that runs EMC Symmetrix code, for instance, but just happens to be an Open Compute system. That is a portion of our offering, and some of our supporting systems in the dedicated hosting part of our business that are so tightly integrated that unless we work with those suppliers to provide the software to disintegrate and reintegrate those systems, we won’t do it. We have talked about that, but it will be a while.”
Interestingly, EMC already lets its ScaleIO distributed block storage run on any X86 iron, and it is moving to let its Isilon distributed network filer software run across generic servers, too. Maybe it is not so farfetched to think Symmetrix could be set free, too. But perhaps what really happens is that these monster SAN and NAS arrays just go away. That is what the whole software-defined storage movement is really about, after all.
“I could see Rackspace overall over the next few years getting to a run rate that is comfortably in the 80 percent to 85 percent of infrastructure and it would probably slow down from there until no one was running traditional storage arrays or de-dupe clusters or tape archive systems and things like that which tend to be tied to applications.”
Considering the extra engineering effort and cost and the hassle of building and maintaining a supply chain, it is reasonable to ask what Rackspace has gotten out of its shift to Open Compute iron in those portions of its datacenter that use the machines. Sullivan had a lot to say about this:
“We saved a lot of money, and we continue to save a lot of money doing it this way. Some of it is due to pure server design, pure bill of materials, and pure direct relationships that come in this Open Compute ecosystem. There are systems that are more efficient because of the way they are shaped or cooled or how electricity got distributed to them, and we have not made all of the same gains as Facebook in that regard because we have selectively opted to not pursue some of those efficiencies because of equipment serviceability or support for other kinds of equipment in the same datacenter on our electrical system. But I would say most of the gains Facebook saw we have appreciated and we continue to do it because we save money. That is probably the big driver, that is pretty cut and dried and easy. The other big driver is that our engineers from hardware up through software, once they get going with open, they tend to prefer it because of the freedom it gives them when they are building their solutions. That was the big story with OnMetal – we could do things there that we could not do otherwise, or that we could not do as fast. And that is our OpenPower story as well: We have freedom here that we don’t get anywhere else, and if we tie it into the other open source hardware and software efforts, it is compelling.”
Depending on the application that is running on the server and the configuration, Sullivan reckons that Rackspace saves anywhere from 25 to 40 percent on servers from its Open Compute suppliers compared to going to the tier one players. (And some of those tier ones, like Hewlett Packard Enterprise, will actually build and deliver those Open Compute machines if you want.) To put a number on it for but one example, Sullivan says that the move to Open Compute saved Rackspace around $40 million for its Virtual Cloud Servers platform. Sullivan adds that he does see the traditional OEMs, many of whom have hyperscale and cloud configurations with minimalist designs and low prices, sharpening their pencils and competing harder with Open Compute. (This indirect competitive pressure is also what makes the ARM server chip makers a lever for hyperscalers and cloud builders negotiating with Intel on chip configurations and prices.)
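To put those percentages in perspective, here is a quick back-of-the-envelope sketch. The $40 million savings figure and the 25 to 40 percent range come from Sullivan; the implied tier one spend calculation is our own extrapolation, not a number Rackspace has disclosed:

```python
# Back-of-the-envelope arithmetic on the figures quoted in the article.
# Rackspace reports 25 to 40 percent savings versus tier one server pricing,
# and roughly $40 million saved on the Virtual Cloud Servers platform.
# The implied-spend figures below are our own extrapolation.

def implied_tier_one_spend(savings: float, savings_pct: float) -> float:
    """Spend at tier one prices that a given dollar savings implies."""
    return savings / savings_pct

savings = 40e6  # ~$40M saved on Virtual Cloud Servers, per Sullivan

low = implied_tier_one_spend(savings, 0.40)   # if savings ran at 40 percent
high = implied_tier_one_spend(savings, 0.25)  # if savings ran at 25 percent

print(f"Implied tier one spend: ${low/1e6:.0f}M to ${high/1e6:.0f}M")
# → Implied tier one spend: $100M to $160M
```

In other words, at those savings rates, $40 million saved implies that the same fleet would have cost somewhere in the low nine figures at tier one prices.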
“Open Compute, in that sense, was not just about open source hardware, but it was about a business model that came with an ecosystem,” Sullivan explains. “If all of those OEM suppliers decided that they would operate at the same margins that the Open Compute ecosystem brought, the question would be why would they do it?” Sullivan asks rhetorically with a laugh. “And is that really the only part of the play? I think that if I am selling servers it is not terribly compelling to say that I am going to cut my margins. It is better to say that it is coming with something else. I think that Open Compute has influenced the industry in total by making it more oriented on engineering services. People are more open minded about customizing and working with partners for their special needs than in the past.”
But in the end, they are actually cutting margins and supplying customers with those engineering services. The idea is that the OEMs and ODMs doing custom gear (whether OCP-compliant or not) will be able to charge more money for lower unit volumes as machines go into large enterprises, telcos, service providers, and other companies that are running at scale, but not at hyperscale. The higher prices with these customers and an expanded supply chain will give OEMs and ODMs more overall volume and better margins at the same time as this technology trickles down from on high.
This sure beats trying to upsell over-engineered enterprise-class servers with designs from days gone by, when every server was sacred, to companies that are trying to scale out. That surely did not work at Rackspace, and it is not going to work at any company with thousands to tens of thousands of systems.