Open Source Is The Future Of EMC Software

Technologies are not the only things that pollinate across the HPC, hyperscale, cloud, and enterprise sectors of the IT economy. People do, too, and Joshua Bernstein is a prime example of a techie who is comfortable with the technologies from all of these fields and who, importantly, has actually worked in these sectors rather than just thinking about them in theory.

Bernstein is also an avid proponent of open source, and he was hired by EMC in May to start the process of opening up a slew of software technologies at the storage juggernaut. Neither EMC nor his prior employer, Apple, is known for opening up the software that it considers the family jewels. EMC now seems to be committed to opening up its wares in areas where it sees shortcomings and also an opportunity to make itself more relevant in the modern datacenter.

Starting In HPC, Ending Up In Hyperscale

Early in his career, Bernstein spent a few years in academia as a sysadmin and then eventually became a software engineering manager at HPC cluster builder Penguin Computing, where he was specifically in charge of its on-demand HPC cloud infrastructure. He is well versed in finite element analysis, computational fluid dynamics, and life sciences simulations, as well as in the Message Passing Interface (MPI) standard that is at the heart of HPC clusters.

While Bernstein was at Penguin Computing learning the HPC ropes, Apple was busy designing and launching its first iPhone smartphone, which was unveiled in January 2007. Three years later, Apple was looking for a killer app to give it an edge over other smartphones, so it shelled out more than $200 million to buy the Siri voice-activated personal assistant application from its eponymous developer. Soon after, Bernstein left Penguin Computing and joined Apple as manager of deployment and architecture for Siri, the personal assistant that is embedded in Apple’s iPhones and iPads.

While Apple has been using open source technologies with more frequency in recent years, so the rumors say, and has even joined the Open Compute Project, the company is extremely secretive, just as online retailer and cloud computing giant Amazon is and, for the most part, Google and Microsoft are, too. They all use open source, but they don’t tend to release their own code as open source projects. Traditional IT vendors like EMC often contribute to open source projects where it suits them, but they similarly don’t often open up whole stacks of code. EMC’s Pivotal analytics unit is an exception, and it looks like the Emerging Technologies division, where all the shiny new stuff at EMC lives, might become one, too.

Bernstein helped Apple scale up the Siri voice recognition system from a couple of racks of iron to the more than 60,000 servers it runs across today in Apple’s datacenters. He was there when Siri was moved from bare metal to VMware’s ESXi hypervisor (we are not sure when that happened, but it was after 2011), and in 2014 he also shepherded Siri off ESXi onto a homemade platform cloud called Jarvis, which runs atop the Apache Mesos framework and job scheduler and uses Docker containers.

As such, Bernstein is probably one of the few people on the planet who knows the ins and outs of supercomputing applications and job schedulers, VMware’s vSphere stack, and Mesos – spanning the HPC, enterprise, and hyperscale segments. And that is one of the reasons why he was put in charge of the team of open source evangelists, collectively called EMC{code}, that has a charter to open up the EMC software library and make it available and relevant to the open source community.

Bernstein is opening up two pieces of systems management code that EMC and its customers want to see more broadly adopted, and if he has his way, a whole bunch of the company’s most advanced software for scale out compute and storage – which is tucked away in its Emerging Technologies division – might be open before too long, just in time to give stiff competition to the open and closed source software that it competes against.

No, EMC is not opening up the ESXi hypervisor and its related management add-ons from its VMware unit, which would be fun. But as the hypervisor is increasingly commoditized, don’t be surprised if future virtualization tools coming out of VMware that are aimed at containerized applications rather than full virtualization are eventually opened up to spur adoption and to give VMware a chance to compete against Mesos and the many flavors of Kubernetes that are being developed.

EMC’s open source efforts are starting in system management, with tools aimed at servers and storage. Specifically, EMC is updating the code from the CoprHD project that it open sourced back in May – its first product to ever be opened up – and is releasing two more – RackHD and REX-Ray – into the wild.

“What we would like to do is extend this concept to the rest of our existing software ecosystem,” Bernstein declared to The Next Platform. “There is no reason why our object storage platform, called Elastic Cloud Storage, could not be open source – either all of it or part of it in some way. Our ScaleIO software-defined block storage, which competes directly with Ceph, there is no reason why all or part of that could not be open sourced. And so if I had my druthers and get my way, I think that you will see a lot more of our core software portfolio will become open source.”

This might sound strange in a world that is, as they say, being eaten by software. But, to be more accurate, the world is being eaten by free software that runs on hypercheap hardware, and ironically, the organizations that can most afford to pay for support on that software often do not. Frankly, they have the expertise to help make open source software better for the rest of the world as well as for themselves, and they need to be encouraged to contribute that expertise, even as the rest of us pay for the software through support contracts. This way, the software advances and people can still make a living, even if it does obliterate the old model of a perpetual license plus a 20 percent annual maintenance fee that has dominated the software world for decades. Now, vendors have to live on the 20 percent alone, and they rely on a community of self-interested users to help advance the code.
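
To make the licensing arithmetic concrete, here is a small sketch comparing cumulative vendor revenue under the two models described above. The 20 percent rate comes from the text; the list price of 100 units and the one, three, and five year horizons are purely illustrative assumptions, not EMC figures.

```python
def perpetual_revenue(license_price: float, years: int, maint_rate: float = 0.20) -> float:
    """Old model: an up-front perpetual license plus an annual maintenance fee."""
    return license_price + license_price * maint_rate * years


def support_only_revenue(license_price: float, years: int, maint_rate: float = 0.20) -> float:
    """Open source model: no license fee, only the annual support contract."""
    return license_price * maint_rate * years


if __name__ == "__main__":
    price = 100.0  # hypothetical list price, in arbitrary units
    for years in (1, 3, 5):
        old = perpetual_revenue(price, years)
        new = support_only_revenue(price, years)
        print(f"{years} yr: perpetual+maintenance={old:.0f}, support only={new:.0f}")
```

In this toy example the support-only model collects half as much over five years (100 versus 200 units), which is one reason a vendor living on "the 20 percent alone" leans on its user community to share the development work.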

“I think that customers demand open source software for fear of vendor lock in or some such,” continues Bernstein. “But the real value of open source software to users is to allow them to integrate these storage platforms, in this case, into their environments easier. Open software helps them fix bugs faster. It is basically a way to outsource product development and product support, to a certain extent, to the customers that are using the product. From my perspective, it is not about removing lock in, but allowing them to consume the software easier and faster and integrating it with their existing provisioning and monitoring tools and better leverage the software. That is what’s really what I am excited about going forward.”

Provisioning Hardware Is Not Sexy, Or Easy

The first new tool that EMC is opening up today is called Rack Hardware Director, or RackHD for short. It is the result of mashing up OnRack, an internal systems management project that EMC started in early 2014, with a tool called Monorail, which EMC got ahold of in March of this year when it acquired a small startup called Renasar Technologies. At the time, EMC was looking for a system provisioning and management tool that could update the firmware and BIOS for the many different kinds of platforms that underpin EMC’s products, and it was also looking to do a better job at bare metal provisioning than the Ironic plug-in for the OpenStack controller. (When you own VMware and Cloud Foundry, as EMC does, OpenStack is not the be-all and end-all, even if it is very important.)

“RackHD solves the deployment problem,” says Bernstein. “When a server rolls into the datacenter, what happens next? RackHD has workflows to kickstart the machines and provision them. A lot of customers struggle with this systems management problem. It is the least sexy, it is the furthest away from the application, but it also presents the biggest challenge, which I know from my experience at Apple.”
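
RackHD’s actual workflow definitions live in its own codebase and are not reproduced here. As a rough illustration of the idea Bernstein describes, the following sketch models bare metal provisioning as a graph of dependent tasks that run in dependency order. The task names, the `Node` record, and `run_workflow` are all hypothetical, chosen only to show the technique.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Node:
    """Hypothetical record for a server that has just rolled into the datacenter."""
    mac: str
    log: list[str] = field(default_factory=list)


# Hypothetical task graph: each task maps to the set of tasks that must finish first.
PROVISION_WORKFLOW = {
    "discover": set(),
    "update_bios": {"discover"},
    "update_firmware": {"discover"},
    "install_os": {"update_bios", "update_firmware"},
    "register_inventory": {"install_os"},
}


def run_workflow(node: Node, workflow: dict[str, set[str]]) -> list[str]:
    """Run every task once, only after all of its dependencies have run."""
    order = list(TopologicalSorter(workflow).static_order())
    for task in order:
        # A real system would PXE-boot the node, flash images, and so on;
        # this sketch only records which step ran against which machine.
        node.log.append(f"{task}:{node.mac}")
    return order


if __name__ == "__main__":
    server = Node(mac="00:11:22:33:44:55")
    print(run_workflow(server, PROVISION_WORKFLOW))
```

Expressing the steps as a dependency graph rather than a fixed script makes it easy to add a platform-specific firmware step without reordering everything else.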

In other words, RackHD is similar to the configuration and provisioning tools that all of the hyperscalers have created to automate this process: Google has Borg, Microsoft has Autopilot, Amazon has something it doesn’t talk about, and Apple had a souped-up version of an open source tool called Verdad that was used to provision machines and update their firmware and BIOS to get them ready for VMware ESXi installation. (The new provisioning tool at Apple, which does not have a name, is written from scratch in Ruby and is decidedly more open than Verdad in terms of being able to support technology changes above the hardware, but it has not been and probably will not be open sourced.)

The difference with RackHD, which has a JavaScript front end running on Node.js and a core back end written in C and C++, is that you can now get your hands on its code under an Apache 2 license.

“In the enterprise space, they just don’t have the budget or the focus that the hyperscalers do to create such tools,” says Bernstein. “This is our attempt to bring that level of capability into the enterprise.”

RackHD does not just configure the hardware, but also deploys hypervisors and operating systems, and is being used in EMC’s VxRack hyperconverged platforms, to install its ScaleIO hyperconverged block storage and Pivotal analytics software, and in its Virtustream cloud.

The CoprHD tool that EMC already open sourced in May – yes, it is pronounced “copperhead” but RackHD is not “rackhead” because, you know, consistency – is based on the ViPR cross-vendor storage management tool that the company launched in September 2013 and that has been generating tens of millions of dollars a year in revenues for EMC. ViPR and its open source offshoot, now its underpinnings, can provision as well as monitor and manage storage through a single set of APIs. CoprHD is written in Java.

As part of the open source festivities today, EMC is announcing CoprHD 2.4, which adds storage orchestration for the OpenStack cloud controller plus a new software development kit for the southbound part of the tool, the part that lets it talk to various storage arrays; EMC has used this SDK to create a plug-in for its ScaleIO scale-out block storage. Intel has joined the CoprHD community, is spearheading the OpenStack integration, and is also working on a quality of service add-on for the Cinder block storage layer for OpenStack. EMC’s Elastic Cloud Storage object storage and XtremIO all-flash arrays are now also brought under the control of CoprHD with the 2.4 release. Oregon State University is kicking in a lot of development effort on CoprHD now, too, by the way.
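
The split between a single northbound API and pluggable southbound drivers is the heart of a tool like CoprHD. The sketch below illustrates that pattern in miniature; it is not CoprHD’s Java SDK, and `StorageDriver`, `InMemoryBlockDriver`, and `Controller` are hypothetical names invented for this example.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical southbound driver interface: one implementation per array family."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> str:
        """Create a volume on the backing array and return its identifier."""


class InMemoryBlockDriver(StorageDriver):
    """Stand-in for a real array plug-in; it only records volumes in a dict."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.volumes: dict[str, int] = {}

    def create_volume(self, name: str, size_gb: int) -> str:
        vol_id = f"{self.prefix}-{name}"
        self.volumes[vol_id] = size_gb
        return vol_id


class Controller:
    """Hypothetical northbound API: one call, routed to whichever driver owns the array."""

    def __init__(self) -> None:
        self._drivers: dict[str, StorageDriver] = {}

    def register_driver(self, array_type: str, driver: StorageDriver) -> None:
        self._drivers[array_type] = driver

    def provision(self, array_type: str, name: str, size_gb: int) -> str:
        if size_gb <= 0:
            raise ValueError("size_gb must be positive")
        try:
            driver = self._drivers[array_type]
        except KeyError:
            raise LookupError(f"no southbound driver registered for {array_type!r}") from None
        return driver.create_volume(name, size_gb)


if __name__ == "__main__":
    ctl = Controller()
    ctl.register_driver("block", InMemoryBlockDriver("blk"))
    print(ctl.provision("block", "db01", 100))  # blk-db01
```

Adding support for a new kind of array then means writing one more driver class, while callers of the northbound `provision` call do not change.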

The second tool that EMC is open sourcing is called REX-Ray, and it can be thought of as a southbound plug-in to CoprHD, but it is written in Go and is specifically aimed at managing the storage for the combination of Mesos and Docker.

“One of the problems that we solved at Apple was how do you handle persistence with containers,” says Bernstein. “You don’t get persistence for free with Docker like you do with virtual machines.” That experience is part of what makes him valuable to EMC, and, importantly, container persistence is one of the problems that EMC is solving with REX-Ray.


REX-Ray is a little bit more than a year old as a project, and it makes the storage management for the Mesos and Docker combination a bit easier. CoprHD and REX-Ray are also released under the Apache 2 license. Presumably if customers want enterprise-class support for RackHD or REX-Ray, EMC is happy to comply, just like it sells supported versions of ViPR based on CoprHD.
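
Docker lets external storage drivers such as REX-Ray plug in through its volume plugin protocol, in which the daemon sends JSON requests to endpoints like `/VolumeDriver.Create` and `/VolumeDriver.Mount`. The sketch below handles a subset of those request bodies in memory to show why this solves the persistence problem: the volume’s lifecycle is separate from any one container’s. It is a hypothetical illustration, not REX-Ray’s code; `MOUNT_ROOT` is an assumed path, no real device is attached, and endpoints such as `Get`, `List`, and `Capabilities` are omitted.

```python
import json

MOUNT_ROOT = "/mnt/hypothetical-driver"  # assumption: where a real driver would mount volumes

SUPPORTED = {"/VolumeDriver.Mount", "/VolumeDriver.Unmount",
             "/VolumeDriver.Path", "/VolumeDriver.Remove"}


class VolumeDriver:
    """In-memory sketch of a few Docker volume plugin endpoints.

    A real driver would attach a block device from external storage and mount it
    on the host; this one only tracks which containers are using which volume.
    """

    def __init__(self) -> None:
        self.volumes: dict[str, set[str]] = {}  # volume name -> IDs of mounting containers

    def handle(self, endpoint: str, body: str) -> dict:
        """Take a JSON request body and return the response a plugin would serialize."""
        req = json.loads(body) if body else {}
        name = req.get("Name", "")
        if endpoint == "/VolumeDriver.Create":
            self.volumes.setdefault(name, set())
            return {"Err": ""}
        if endpoint not in SUPPORTED:
            return {"Err": f"unsupported endpoint: {endpoint}"}
        if name not in self.volumes:
            return {"Err": f"no such volume: {name}"}
        if endpoint == "/VolumeDriver.Mount":
            self.volumes[name].add(req["ID"])
            return {"Mountpoint": f"{MOUNT_ROOT}/{name}", "Err": ""}
        if endpoint == "/VolumeDriver.Unmount":
            self.volumes[name].discard(req["ID"])
            return {"Err": ""}
        if endpoint == "/VolumeDriver.Path":
            return {"Mountpoint": f"{MOUNT_ROOT}/{name}", "Err": ""}
        # Remove: refuse while any container still has the volume mounted.
        if self.volumes[name]:
            return {"Err": f"volume {name} is in use"}
        del self.volumes[name]
        return {"Err": ""}


if __name__ == "__main__":
    drv = VolumeDriver()
    drv.handle("/VolumeDriver.Create", json.dumps({"Name": "pgdata", "Opts": {}}))
    # A container mounts the volume and later unmounts it; the volume is not deleted.
    print(drv.handle("/VolumeDriver.Mount", json.dumps({"Name": "pgdata", "ID": "c1"})))
    drv.handle("/VolumeDriver.Unmount", json.dumps({"Name": "pgdata", "ID": "c1"}))
    print("pgdata" in drv.volumes)  # True: the volume outlives the container
```

Only an explicit `Remove` with no active mounts destroys the volume, which is the property that lets a stateful service be rescheduled onto a new container without losing its data.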

This is quite possibly the beginning of a very large movement to open source the hottest software at EMC – the parts of the business that The Next Platform actually cares about and that can be the foundation of modern infrastructure. The question, of course, is whether Dell, which is in the process of trying to acquire EMC for $67 billion, is going to be inclined to open up the software portfolio. Dell has dabbled with some open source here and there, but is not really a software vendor and does not seem to be inclined to be. Which, we think, is a bad strategy, particularly in the system, storage, and network provisioning, management, and monitoring area.

Dell would have been wise to have its own Linux and OpenStack distributions already, and might want to seriously consider opening up all kinds of software at EMC and converting to an open model. Anything else that it wants to sell into the Global 2000 under perpetual licensing is legacy and its years – not days, but years – are numbered.
