Making Remote NVM-Express Flash Look Local And Fast
March 1, 2017 Jeffrey Burt
Large enterprises are embracing NVM-Express flash as the storage technology of choice for their data-intensive and often highly unpredictable workloads. NVM-Express devices bring with them high performance – up to 1 million I/O operations per second – and low latency – less than 100 microseconds. And flash storage now has high capacity, too, making it a natural fit for such datacenter applications.
As we have discussed here before, all-flash arrays are quickly becoming mainstream, particularly within larger enterprises, as an alternative to disk drives in environments where tens or hundreds of petabytes of data – rather than the exabytes found with hyperscalers – are more common. NVM-Express will help accelerate the use of flash, and many server makers are banking on that.
That said, NVM-Express does not come without challenges. In particular, the range of disparate requirements from the broad array of applications that run in the typical datacenter puts constantly changing demands on the flash devices in terms of capacity and throughput, and designing systems that can strike a balance between CPU, memory and flash resources to address all these different workloads is nearly impossible. The result is a situation that has long haunted many of the resources deployed in datacenters – overprovisioning. Enterprises need to make sure there is enough flash storage within their environments to ensure it can handle the myriad workloads with their differing requirements. As a consequence, the NVM-Express devices within the datacenters are often underutilized, and organizations are saddled with total cost of ownership (TCO) that is higher than it needs to be.
One solution to these issues is enabling remote access to NVM-Express devices, which can improve utilization by letting applications use flash on machines that have capacity and bandwidth to spare, or on servers that hold large numbers of NVM-Express devices. Such remote access is commonplace with hard disks. There are a number of software systems that can make remote disks accessible as block devices, network file systems, distributed data stores or distributed file systems. Remote access to flash devices is not as easy. The challenges are delivering high performance at a low cost and keeping performance consistent and predictable when multiple tenants sharing a flash device interfere with one another. Hardware-accelerated techniques such as NVM-Express over RDMA fabrics can’t bring the necessary performance isolation or deployment flexibility, while current software-based technologies – such as iSCSI or event-based servers – come up short on the performance end.
However, a group of researchers from Stanford University argues that it has come up with a software-based flash storage server that delivers remote access to NVM-Express devices and addresses all those challenges. In a recent paper, the researchers – Ana Klimovic, Heiner Litz and Christos Kozyrakis – unveiled ReFlex, which they said uses a unique dataplane kernel that tightly integrates networking and storage. The result is a technology that allows remote access to NVM-Express devices at high performance, close to that of direct access to local flash.
“The dataplane design avoids the overhead of interrupts and data copying, optimizes for locality, and strikes a balance between high throughput (IOPS) and low tail latency,” the researchers said in their paper, ReFlex: Remote Flash ≈ Local Flash. “ReFlex includes a QoS scheduler that implements priorities and rate limiting in order to enforce service level objectives (SLOs) for latency and throughput for multiple tenants sharing a device. ReFlex provides both a user-level library and a remote block device driver to support client applications.”
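To make "priorities and rate limiting" concrete, here is a minimal Python sketch of one common way to combine the two: a token bucket per tenant, with latency-critical tenants served first. This is an illustration only, not the ReFlex scheduler, whose real implementation is in C. The `Tenant` and `QosScheduler` names, the `rate_iops` and `burst` parameters, and the one-token-per-request cost are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tenant:
    name: str
    rate_iops: float        # hypothetical SLO: tokens (requests) added per second
    burst: float            # cap on how many tokens can accumulate
    latency_critical: bool  # served ahead of best-effort tenants
    tokens: float = 0.0
    queue: deque = field(default_factory=deque)


class QosScheduler:
    def __init__(self, tenants):
        # Stable sort: latency-critical tenants come first.
        self.tenants = sorted(tenants, key=lambda t: not t.latency_critical)

    def submit(self, tenant, request):
        tenant.queue.append(request)

    def schedule(self, elapsed_s):
        """Refill tokens for the elapsed time and return the admitted requests."""
        admitted = []
        for t in self.tenants:
            t.tokens = min(t.burst, t.tokens + t.rate_iops * elapsed_s)
            # Assumption: every request costs exactly one token.
            while t.queue and t.tokens >= 1.0:
                t.tokens -= 1.0
                admitted.append((t.name, t.queue.popleft()))
        return admitted


# Toy rates chosen so the arithmetic is exact; real SLOs would be far higher.
db = Tenant("db", rate_iops=4.0, burst=4.0, latency_critical=True)
batch = Tenant("batch", rate_iops=2.0, burst=2.0, latency_critical=False)
sched = QosScheduler([batch, db])
for i in range(5):
    sched.submit(db, f"read-{i}")
    sched.submit(batch, f"write-{i}")

out = sched.schedule(elapsed_s=0.5)
print(out)  # [('db', 'read-0'), ('db', 'read-1'), ('batch', 'write-0')]
```

In half a second the "db" tenant earns two tokens and the "batch" tenant earns one, so two reads are admitted ahead of a single write and the remaining requests wait for the next round. A scheduler that has to handle varying read/write ratios would likely charge writes more than reads; that refinement is left out here.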
The ReFlex server hits 850,000 IOPS per core when running over commodity 10 Gb/sec Ethernet networks with TCP/IP. The server can support multiple NVM-Express devices and thousands of remote tenants, and can reach networking line rates at a low cost. According to the paper, the server’s unloaded latency is only 21 microseconds higher than direct access to local flash via NVM-Express queues. The QoS scheduler can ensure that tenants with SLOs get the latency and throughput they require, and that even larger legacy applications get performance that is almost the same as they would with local flash.
The code for the open source ReFlex server, which is written in C, is available at https://github.com/stanford-mast/reflex.
ReFlex has three primary components, including the remote flash server itself, which the researchers said is an extension to the open-source IX dataplane OS. They created an NVM-Express driver, built on Intel’s Storage Performance Development Kit, to communicate with flash devices and gain exclusive access to NVM-Express queue pairs. They also implemented an efficient two-step dataplane model that enables asynchronous access to flash. Other additions include the QoS scheduler as part of the first run-to-completion step and the system calls and events “to register and unregister tenants, submit and complete NVM-Express read and write commands, and manage NVM-Express errors.” The other two components are the clients and a control plane.
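A rough Python sketch can show why splitting the work into two polled steps lets flash access stay asynchronous. The article does not spell out what each step contains, so the split used here is an assumption: step one runs from network receive to flash submission, and step two runs from flash completion to network send. `FakeNic`, `FakeFlash`, `step_one` and `step_two` are hypothetical in-memory stand-ins, not IX, SPDK or ReFlex interfaces.

```python
from collections import deque


class FakeNic:
    """Stand-in for NIC hardware receive/transmit queues (in-memory only)."""
    def __init__(self):
        self.rx = deque()
        self.tx = deque()


class FakeFlash:
    """Stand-in for an NVM-Express submission/completion queue pair."""
    def __init__(self):
        self.blocks = {}
        self.sq = deque()  # submission queue
        self.cq = deque()  # completion queue

    def process(self):
        # Pretend the device completes every submitted command.
        while self.sq:
            conn, op, lba, data = self.sq.popleft()
            if op == "write":
                self.blocks[lba] = data
                self.cq.append((conn, "ok", None))
            else:
                self.cq.append((conn, "ok", self.blocks.get(lba)))


def step_one(nic, flash, batch=16):
    """Poll the NIC receive queue and submit flash commands without waiting."""
    for _ in range(min(batch, len(nic.rx))):
        flash.sq.append(nic.rx.popleft())


def step_two(nic, flash, batch=16):
    """Poll the flash completion queue and queue responses for transmission."""
    for _ in range(min(batch, len(flash.cq))):
        nic.tx.append(flash.cq.popleft())


nic, flash = FakeNic(), FakeFlash()
nic.rx.append(("conn-1", "write", 7, b"hello"))
nic.rx.append(("conn-1", "read", 7, None))

step_one(nic, flash)   # receive requests, submit to flash, return immediately
flash.process()        # the device works on its own schedule
step_two(nic, flash)   # pick up completions, send responses

responses = list(nic.tx)
print(responses)  # [('conn-1', 'ok', None), ('conn-1', 'ok', b'hello')]
```

Neither step blocks on the device: the core returns to polling as soon as a command is submitted, and completions are collected on a later pass. Under the assumed split, a QoS check such as a per-tenant token test would sit inside `step_one`, just before the submission.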
ReFlex “serves remote read/write requests for logical blocks of any size over general networking protocols like TCP and UDP,” wrote the authors of the report, which outlines details of ReFlex. “While predominately a software system, ReFlex leverages hardware virtualization capabilities in NICs and NVM-Express Flash devices to operate directly on hardware queues and efficiently forward requests and data between NICs and Flash devices without copying. Its polling-based execution model allows requests to be processed without interruptions, improving locality and reducing unpredictability. ReFlex uses a novel I/O scheduler to guarantee latency and throughput SLOs for tenants with varying ratios of read/write requests.”
The researchers tested ReFlex in a number of areas, including latency, throughput and CPU cost, performance QoS and isolation, scalability (in cores, tenants and connections), and performance for such Linux applications as FIO, FlashX and RocksDB. Overall, they found that remote access to NVM-Express devices via ReFlex achieves performance close to that of direct access to local NVM-Express – for example, 850,000 IOPS for ReFlex versus 870,000 IOPS on local flash, a gap of about 2.3 percent – and offers improvements through such technologies as the software-based QoS scheduler. As larger enterprises continue to adopt NVM-Express devices, being able to access them remotely and cost-effectively will also grow in importance. If the Stanford group’s ReFlex technology can deliver what it promises, it can offer organizations one more reason to embrace flash over disks.