The Redemption Of NFS
November 30, 2017 David Flynn
If thinking of NFS v4 puts a bad taste in your mouth, you are not alone. Or wrong. NFS v4.0 and v4.1 have had some valid, well-documented growing pains that include limited bandwidth and scalability. These problems stemmed from a failure to properly address performance in the v4.0 release.
File systems are the framework upon which the entire house is built, so these performance issues were not trivial problems for us in IT. Thanks to the dedication of the NFS developer community, NFS v4.2 solves the problems of v4.0 and v4.1 and also introduces a host of new features that make the file system ideal for the demands of the data-centric enterprise. Let’s take a closer look at how NFS v4.2 found redemption.
NFS earned its reputation for slow performance. The performance of NFS v3, while adequate, was often deemed unsuitable for high performance applications. Because NFS v3 is a stateless protocol, protecting data integrity required additional protocol operations. Data access requests typically required five to six round trips between the client and the NFS server for common file operations. In addition, NFS v3 could not effectively cache data on clients, so every access request required these multiple steps and the data had to traverse the network.
High Performance And Stateful
NFS v4 attempted to fix this problem by making the protocol stateful and enabling data caching on clients with the delegations feature. The problem with NFS v4 until now was that the rest of the architecture wasn’t designed to take advantage of this feature. The number of trips between the client and the NFS server actually increased, from five or six to around ten. This overly chatty approach offset the benefits of data caching, and performance and scalability got worse instead of better in the move from NFS v3 to NFS v4.
NFS v4.2 finally fixes those problems. At the protocol layer, NFS v4.2 is more efficient than NFS v3. It is able to leverage NFS compound operations, which means fewer round trips to the NFS server and therefore lower latency for metadata. If an application is accessing data cached on the client – which is very common – the data is accessed directly, without the need to check with the server at all. Data access to storage with NFS v4.2 is direct and uses NFS v3 for data read and write operations, which delivers compatibility with all existing NFS v3 storage. Data IOPS and throughput are limited only by the storage and network capabilities.
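To make the round-trip argument concrete, here is a minimal sketch in Python of how grouping operations into compound requests cuts metadata latency. The operation count, the batch sizes, and the 0.5 ms round-trip time are illustrative assumptions rather than measurements, and `round_trips` and `metadata_latency_ms` are hypothetical helper names, not part of any NFS tooling.

```python
import math


def round_trips(num_ops: int, ops_per_request: int) -> int:
    """Number of client/server exchanges needed to send num_ops operations."""
    return math.ceil(num_ops / ops_per_request)


def metadata_latency_ms(num_ops: int, ops_per_request: int, rtt_ms: float) -> float:
    """Network latency if every request costs one full round trip."""
    return round_trips(num_ops, ops_per_request) * rtt_ms


RTT_MS = 0.5  # assumed network round-trip time, not a measurement
NUM_OPS = 6   # assumed number of protocol operations for one file access

# One operation per request versus all operations in a single compound request.
one_at_a_time = metadata_latency_ms(NUM_OPS, 1, RTT_MS)
compounded = metadata_latency_ms(NUM_OPS, NUM_OPS, RTT_MS)

print(f"one operation per request: {one_at_a_time:.1f} ms")
print(f"single compound request:   {compounded:.1f} ms")
```

Under these assumptions the network cost falls from six round trips to one. The model ignores server processing time, so it only shows why a chatty protocol is sensitive to network latency.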
In addition, NFS v4.2 enables clients to access multiple storage devices in parallel with a feature called parallel NFS (pNFS). The performance improvements created by these features in aggregate are significant and make NFS v4.2 an excellent choice for mission-critical applications.
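The idea of reading from several storage devices at once can be illustrated with a toy model in Python. This is not the pNFS protocol: the in-memory "devices", the round-robin striping, the four-byte stripe unit, and the names `stripe` and `read_parallel` are all assumptions made for the sketch. In a real deployment the layout is handed to the client by the server and need not be striped this way.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

STRIPE_UNIT = 4  # bytes per stripe unit; tiny on purpose for the demo


def stripe(data: bytes, n_devices: int, unit: int = STRIPE_UNIT) -> List[Dict[int, bytes]]:
    """Spread data round-robin over n_devices in-memory 'storage devices'."""
    devices: List[Dict[int, bytes]] = [dict() for _ in range(n_devices)]
    for offset in range(0, len(data), unit):
        idx = offset // unit
        devices[idx % n_devices][idx] = data[offset:offset + unit]
    return devices


def read_parallel(devices: List[Dict[int, bytes]], n_units: int) -> bytes:
    """Fetch every stripe unit concurrently and reassemble them in order."""
    def fetch(idx: int) -> bytes:
        return devices[idx % len(devices)][idx]

    # One worker per device; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        return b"".join(pool.map(fetch, range(n_units)))


payload = b"abcdefghijklmnopqrstuvwxyz"
devices = stripe(payload, n_devices=3)
units = math.ceil(len(payload) / STRIPE_UNIT)
restored = read_parallel(devices, units)
print(restored == payload)
```

The point of the sketch is that no single device sits on the path of every read, which is why aggregate throughput can grow as devices are added.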
All of these improvements make NFS v4.2 significantly faster, but running NFS v3 benchmarks against it will not reflect these improvements. This is because the NFS v3 tests were designed for a stateless architecture and performed simple file operations, such as opening or creating a thousand files. NFS v4.2 outperforms NFS v3 dramatically in tests that measure real-world operations, such as creating a file that is then accessed by another application. These are the kinds of performance improvements that really impact business.
NFS v4.2 also offers a number of other improvements and innovations:
- Protecting business continuity with live data migration. The Flex Files feature in NFS v4.2 enables live files to be moved without impacting applications. Flex Files can non-disruptively recall layouts, which enables data access and data integrity to be maintained, even as files are being copied. This feature has enormous ramifications for enterprises as it can eliminate the downtime associated with migrations and upgrades. Enterprises can combine this capability with software, such as a metadata engine, that can virtualize data across heterogeneous storage types and automate the movement and placement of data according to IT-defined business objectives. This level of automation can dramatically reduce the incidence of human error that is currently the source of 60 percent to 80 percent of all downtime events.
- Free and accurate performance telemetry. All Linux clients running NFS v4.2 continuously report performance metrics on the underlying infrastructure. These metrics can be used by a metadata engine to optimize service levels, while minimizing costs. Notably, NFS clients require no additional software installation for enterprises to gain this ability, which makes it easy and safe for enterprises to deploy.
- Built-in support for file cloning. Server-side clone-and-copy enables cloning and snapshots of files by any NFS v4.2 storage server. If enterprises deploy the NFS storage server on an NVM-Express flash server, they can improve service levels by offloading these operations from storage, reserving more storage resources for serving data to applications.
- Enhanced security. NFS v4.2 access control lists (ACLs) are compatible with Windows ACLs. This greatly simplifies the secure sharing of data across Linux and Windows platforms. Security has also been improved with the ability to use RPCSEC_GSS for authentication and data access.
- Broad support from enterprise Linux distributions. NFS v4.2 is supported by all the major distributions, including Red Hat Enterprise Linux, CentOS, Oracle Linux, Canonical Ubuntu Server, and others. In fact, Red Hat recently abandoned Btrfs support in favor of NFS and its parallel access capabilities noted above.
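The server-side copy mentioned in the list above can be tried from a Linux client without any NFS-specific code. The sketch below uses Python's `os.copy_file_range`, which wraps the Linux `copy_file_range(2)` system call. On an NFS v4.2 mount a recent kernel can, as far as I understand it, hand that call to the server so the data never crosses the network; whether that actually happens depends on the kernel, the mount options, and the server, so treat it as an assumption to verify in your environment. The function name `copy_file` and the fallback to an ordinary copy are choices made for this example.

```python
import os
import shutil
import tempfile


def copy_file(src_path: str, dst_path: str, chunk: int = 1 << 30) -> int:
    """Copy src_path to dst_path, preferring copy_file_range(2).

    Returns the number of bytes written to the destination.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        try:
            while remaining > 0:
                # The kernel decides how to copy; the data does not pass
                # through this process's memory.
                n = os.copy_file_range(src.fileno(), dst.fileno(),
                                       min(chunk, remaining))
                if n == 0:  # source ended earlier than expected
                    break
                copied += n
                remaining -= n
        except (AttributeError, OSError):
            # Call unavailable or refused here: finish with a plain
            # read/write copy from the point already reached.
            src.seek(copied)
            dst.seek(copied)
            dst.truncate()
            shutil.copyfileobj(src, dst)
            copied = dst.tell()
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source.bin")
        target = os.path.join(tmp, "target.bin")
        with open(source, "wb") as f:
            f.write(b"example data " * 1000)
        print(copy_file(source, target), "bytes copied")
```

The same function works on a local file system, where the kernel simply performs the copy itself, so it is safe to use in code that may or may not be running against NFS v4.2 storage.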
NFS v4.2 transcends its legacy of slow performance and is now an excellent choice to support enterprises’ rapidly increasing scale-out file system needs. The Linux NFS wiki, which serves as a repository and documentation hub, and most Linux vendors’ documentation are good sources of more detailed information. Led by kernel maintainer and Primary Data principal system architect Trond Myklebust, our team at Primary Data has made the most NFS contributions since October 2013, and I am proud that we can help NFS continue to evolve to serve modern enterprises.
David Flynn is co-founder and chief technology officer of startup Primary Data and has been architecting disruptive computing platforms since his early work in supercomputing and Linux systems. Flynn pioneered the use of flash for enterprise application acceleration as founder and former CEO of Fusion-io. He designed several of the world’s fastest supercomputers after building flight simulation software for Department of Defense missile systems at CSC in his teenage years. Flynn holds more than 100 patents spanning web browser technologies, mobile device management, network switching and protocols, and distributed storage systems. He earned his B.Sc. in computer science at Brigham Young University.