Lenovo has partnered with Excelero, an up-and-coming NVM-Express solution provider that sells a scale-out software-defined storage platform. The matchup brings Lenovo’s ThinkSystem hardware together with Excelero’s NVMesh2 software that can create a virtual shared storage pool from NVM-Express devices.
Essentially, NVMesh2 is able to aggregate a bunch of NVM-Express drives and make them accessible to applications as a shared storage array. The storage can be built either from drives installed locally (direct-attached) across a large number of servers, as in a cluster, or from drives installed in a more centralized fashion, as in a top-of-rack storage server.
We described the technology in greater detail here, but the bottom line is that users are able to gain the flexibility of low-latency NVM-Express storage as a shared resource without having to buy an NVM-Express over Fabrics appliance or, for that matter, any other additional hardware, which is the whole point of software-defined storage.
And if the servers are hooked together with an RDMA fabric, like InfiniBand or RoCE, only about 5 to 10 microseconds of extra latency are added to access remote drives on the target server. Since that’s a relatively small fraction of the latency already present in the NVM-Express devices themselves, not only does the shared storage appear as if it’s local to the client, it acts like it too.
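To put the "relatively small fraction" in perspective, here is a rough sketch of the arithmetic. The 5 to 10 microsecond figures come from the text above; the 100-microsecond device read latency is purely an assumed, illustrative value for a flash-based NVM-Express drive (the article does not give one), and the function name is hypothetical.

```python
# Rough sketch: how much does the RDMA hop add relative to the drive itself?
# ASSUMPTION: 100 us device read latency is illustrative, not from the article.
ASSUMED_DEVICE_LATENCY_US = 100.0

# Figures quoted in the article for remote access over an RDMA fabric.
EXTRA_LATENCY_LOW_US = 5.0
EXTRA_LATENCY_HIGH_US = 10.0


def overhead_fraction(extra_us: float, device_us: float) -> float:
    """Return the added fabric latency as a fraction of the device latency."""
    if device_us <= 0:
        raise ValueError("device latency must be positive")
    return extra_us / device_us


low = overhead_fraction(EXTRA_LATENCY_LOW_US, ASSUMED_DEVICE_LATENCY_US)
high = overhead_fraction(EXTRA_LATENCY_HIGH_US, ASSUMED_DEVICE_LATENCY_US)
print(f"Remote access overhead: {low:.0%} to {high:.0%} of device latency")
```

Under that assumption the fabric adds somewhere between a twentieth and a tenth of the drive's own latency. A faster device would make the relative overhead larger, so the result is only as good as the assumed device figure.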
Offering a software-defined product also has the practical advantage of adding value to other people’s hardware. That makes it particularly attractive to companies like Lenovo, who are already selling NVM-Express-equipped servers, like their ThinkSystem SR630 and SR650. These have a wide spectrum of use cases including cloud computing, HPC, machine learning, data analytics, and general enterprise applications. With regard to local NVM-Express storage, the SR650 is by far the denser of the two, supporting up to 24 drives per enclosure.
However, as we already mentioned, since the NVMesh2 client doesn’t require the drives to be local, it can bring a lot of NVM-Express capacity into storage-less servers as well. For example, Lenovo’s SR670, a 2U HPC/AI powerhouse that can be outfitted with up to four GPUs, doesn’t offer any NVM-Express ports. But if you connect one to a few SR650s filled to the brim with reasonably high-capacity NVM-Express drives, all of a sudden applications running on that SR670 have access to a petabyte or more of low-latency storage.
According to Excelero’s Patrick Guay, this new partnership is a big deal for his company, since NVMesh2 is now part of Lenovo’s standard ThinkSystem offering and is being sold directly by the OEM, as well as through its global channel partners. Guay is Excelero’s VP of Strategic Accounts, and it’s hard to imagine anything more strategic for a 66-person company than getting your software folded into a product line of a tier 1 vendor. Guay told us they have sold their solution through system manufacturers in the past, but this is “the first global agreement that we’ve signed with an OEM.”
Tapping into Lenovo’s global partner network also opens up other opportunities for Excelero, including integration and support services. That’s especially relevant for third-party solutions that involve things like high performance file systems or database acceleration, where Excelero’s expertise with their own technology and their ability to tweak their software could create additional benefits.
Joint Lenovo-Excelero deployments currently exist at SciNet in Canada and at a London-based machine learning firm. But now, with the closer partnership, Lenovo and Excelero have a better shot at securing even larger deals at places where low-latency I/O at scale is a critical driver – think hyperscale datacenters and top-tier supercomputers. In fact, Guay says they are working with Lenovo on some “pretty sizable” installations right now.
However, to operate comfortably at that level of scale, the Excelero technology will need a few more proof points to show its potential. The largest publicly known installation to date was at NASA (on a system that no longer exists), which used NVMesh across 128 nodes. Coincidentally, Excelero’s website specifies 128 nodes as its upper limit of scalability. That number has already been exceeded, though, by an installation of about 1,000 nodes for a hyperscale customer that wishes to remain anonymous.
Josh Goldenhar, Excelero’s VP of product management, says they have tested their target software on up to 256 nodes and their client software on up to 1,024 nodes. “[B]ut the architecture is scalable and there are not hard limits in the software for clients,” he explained to us. “So we could go to thousands of clients.”
Goldenhar also noted that even 256 target nodes will be able to supply a lot of NVM-Express storage capacity, especially if you use 24-drive servers, which are available from a number of different vendors. If you fill all 256 nodes with 24 drives apiece, using 15TB drives like the Toshiba CM5 or Samsung PM1725b, you would have 92 petabytes of raw storage. Goldenhar says the aggregate performance of such a system would be about 1,280 million IOPS (read) and six terabytes per second of bandwidth.
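The capacity figure follows directly from the node, drive, and capacity counts, and the same counts let you back out what the quoted aggregate numbers imply per drive. The sketch below uses only the figures cited above and assumes decimal units (1 PB = 1,000 TB); the variable and function names are illustrative, and the per-drive values are simple averages rather than vendor specifications.

```python
# Back-of-the-envelope check of the figures quoted in the article.
TARGET_NODES = 256
DRIVES_PER_NODE = 24
DRIVE_CAPACITY_TB = 15               # nominal capacity, decimal terabytes

AGGREGATE_READ_IOPS = 1_280_000_000  # "about 1,280 million IOPS (read)"
AGGREGATE_BANDWIDTH_TB_S = 6         # "six terabytes per second"


def raw_capacity_pb(nodes: int, drives_per_node: int, drive_tb: float) -> float:
    """Raw (unprotected) capacity in decimal petabytes."""
    return nodes * drives_per_node * drive_tb / 1000


total_drives = TARGET_NODES * DRIVES_PER_NODE
capacity_pb = raw_capacity_pb(TARGET_NODES, DRIVES_PER_NODE, DRIVE_CAPACITY_TB)

# Implied per-drive averages, assuming the load spreads evenly over all drives.
iops_per_drive = AGGREGATE_READ_IOPS / total_drives
gb_s_per_drive = AGGREGATE_BANDWIDTH_TB_S * 1000 / total_drives

print(f"Drives in pool:       {total_drives:,}")
print(f"Raw capacity:         {capacity_pb:.2f} PB")
print(f"Implied IOPS/drive:   {iops_per_drive:,.0f}")
print(f"Implied GB/s/drive:   {gb_s_per_drive:.2f}")
```

The product works out to 92.16 PB across 6,144 drives, which matches the 92 petabytes cited. Note that this is raw capacity, before any mirroring or other data protection is applied.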
However, where direct-attached NVM-Express drives are present in each server in a supercomputer or datacenter, greater scalability is needed. Lenovo’s largest supercomputer, the SuperMUC-NG, has 6,480 nodes, a number which itself is dwarfed by the largest cloud datacenters. Guay said the company has every intention of making sure its technology can operate in such environments. “Our objective is to scale into the thousands,” he said.