Hyperscaling With Consumer Flash And NVM-Express
April 14, 2017 Timothy Prickett Morgan
There is no question that plenty of companies are shifting their storage infrastructure from giant NAS and SAN appliances to more generic file, block, and object storage running on plain vanilla X86 servers equipped with flash and disk. And similarly, companies are looking to the widespread availability of dual-ported NVM-Express drives on servers to give them screaming flash performance on those storage servers.
But the fact remains that very few companies want to build and support their own storage servers, and moreover, there is still room for an appliance approach to these commodity components for enterprises that want to buy rather than build. This is why flash array startup Pure Storage is still in business and continues to invest in making its own flash storage modules and is among the first out the door embracing NVM-Express to radically improve the performance of its flash storage.
As we have pointed out in the past, Pure Storage, which is one of the pioneers in enterprise-grade flash storage, has two product lines. The flagship FlashArray line is aimed at block and file storage, the kind underpinning databases and virtual machines at most enterprises these days, and was updated last in June 2016 with faster Xeon-based controllers and fatter flash drives. The newer FlashBlade line is aimed at driving down the cost and driving up the scalability of flash so it can be used for large scale object storage like the kind you see in HPC, analytics, and cloud storage workloads. Companies are now looking for flash drives to accelerate the performance of all kinds of key workloads that depend on object storage, including traditional HPC simulation and modeling as well as genomics sequencing and various kinds of data analytics workloads (avoiding the need to go to in-memory databases or data stores) that require petabyte-scale storage.
With this week’s upgrade of the FlashArray line, Pure Storage is making good on its promise to bring the NVM-Express protocol to its arrays. NVM-Express cuts out all of the junk in the driver and operating system software stack related to the legacy SCSI disk drive protocols that are still used for most flash cards and SSDs these days. With NVM-Express, you talk directly to flash memory from the compute complex, and you talk to it as flash and take advantage of its parallel addressing features to boost bandwidth and drop latency for I/O operations. This SCSI cruft, by the way, is part and parcel of the older FlashArray//m arrays that Pure Storage has been peddling for more than three years, but the shiny new FlashArray//x devices speak NVM-Express end to end. This is an important transition, and every array maker is working hard to make the jump.
“It is kind of humorous, but the same thing that killed disk drives is happening to flash,” Matt Kixmoeller, vice president of products at Pure Storage, tells The Next Platform. “Hard drives died because every year they got bigger, but they never got faster and so the performance per terabyte went down precipitously. We are now seeing the same thing with flash. You can buy a 1 TB or 15 TB or a 60 TB drive, but they all have the same performance, and so the 60 TB drive is actually 60 times slower. Add into that that you connect this flash SSD using SAS pipes that are a serial connection that has one queue, which is a bit like building a big soccer stadium and only having one entrance and exit and making everybody come and go in a single file line. Obviously this creates a bit of a bottleneck. The transition to NVM-Express is about removing those bottlenecks.”
Rather than wait for commercial dual-ported NVM-Express flash SSDs to come to market later this year at a reasonable price, as many vendors are doing (see the drilldown we did on NVMesh from startup Excelero for more on this), Pure Storage is creating its own flash storage modules that support a variant of the NVM-Express protocol and trying to get a jump on the market. This approach was first taken with the FlashBlade arrays for object storage last year, and is being refined with the DirectFlash modules in the FlashArray//x arrays for block and file storage now.
It may seem odd that Pure Storage is, once again, making its own flash modules. But this is all about pushing consumer pricing into enterprise products. While single-port NVM-Express drives used in PCs are widely available, the dual-ported NVM-Express drives needed for storage arrays (to give them resiliency and multiple data access paths) are twice as expensive as dual-ported SAS flash drives right now, according to Kixmoeller. There are those who argue that this gap will disappear by the end of the year, as Excelero has. Even so, the DirectFlash modules designed and manufactured by Pure Storage use consumer flash, so their raw flash is even cheaper, and presumably the finished dual-ported DirectFlash modules are on par with or even cheaper than what the company expects dual-port NVM-Express SSDs to cost later this year.
With the DirectFlash modules, Pure Storage is taking a page out of the hyperscale playbook and removing the control functions that would normally be part of each individual flash device and placing them in a software layer that runs on the FlashArray controller. This has a couple of benefits. For one thing, Pure Storage can manage the entire array of flash as a single unit, and this means that capacity and performance are managed across potentially hundreds of DirectFlash modules rather than locally in each flash card or SSD as is the case in all other all-flash arrays (including the prior generation of FlashArray//m systems). Flash memory still wears out with every write, but the centralization means that the Purity management software that runs on the controllers can deal with wear leveling across a larger pool of flash at the same granularity. The upshot is that it takes less overcapacity to meet the endurance that enterprises demand, while at the same time allowing Pure Storage to use consumer-grade MLC NAND flash (not the hot 3D NAND stuff) in its DirectFlash modules.
Kixmoeller says that the Purity software can see all of the flash in each card in their own box, while other arrays based on flash SSDs can’t see the spare capacity that flash vendors overprovision to meet endurance specs. With consumer-grade SSDs having around 8 percent overprovisioning and SAS and NVM-Express flash SSDs having as much as 50 percent overprovisioning, this can be a lot of flash capacity that never gets utilized.
Here is how it pans out for usable capacity. With a typical flash array using performance-grade SSDs, if you start with a 9.1 TB capacity, then about 2.7 TB goes right up the chimney for overprovisioning. Then with RAID data protection and other algorithms implemented for data protection, the usable capacity drops down to about 3.8 TB. Even with the FlashArray//m devices, if you started out with 9.1 TB of raw flash capacity, 1.5 TB got burned on overprovisioning to support endurance and the RAID overhead dropped the usable capacity down to 4.6 TB. With the new FlashArray//x controller and DirectFlash modules, there is no overprovisioning at all, and after the RAID-3A protection protocol and metadata overhead are taken out, the usable capacity is 5.23 TB for that 9.1 TB module. The combination of consumer flash plus much higher net usable capacity is what Pure Storage is leveraging to drive down the cost of enterprise flash.
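To make the comparison easier to eyeball, here is a small sketch that turns the capacity figures quoted above into usable fractions. The numbers come straight from the paragraph; the dictionary layout and function names are ours, purely for illustration, and the "RAID and metadata" figure is simply whatever is left after overprovisioning and usable capacity are subtracted from raw.

```python
# Capacity figures in TB as quoted in the article; the names are illustrative.
RAW_TB = 9.1

configs = {
    "typical SSD array": {"overprovisioned": 2.7, "usable": 3.8},
    "FlashArray//m": {"overprovisioned": 1.5, "usable": 4.6},
    "FlashArray//x": {"overprovisioned": 0.0, "usable": 5.23},
}


def usable_fraction(usable_tb, raw_tb=RAW_TB):
    """Share of raw flash that ends up as usable capacity."""
    return usable_tb / raw_tb


def protection_overhead(cfg, raw_tb=RAW_TB):
    """TB left over for RAID and metadata once overprovisioning is removed."""
    return raw_tb - cfg["overprovisioned"] - cfg["usable"]


for name, cfg in configs.items():
    print(
        f"{name}: {usable_fraction(cfg['usable']):.0%} usable, "
        f"{protection_overhead(cfg):.2f} TB for RAID and metadata"
    )
```

On these figures the usable share of raw flash climbs from roughly 42 percent to 51 percent to 57 percent across the three designs.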
The DirectFlash modules are also using the direct parallel access enabled by NVM-Express to drop latency and increase bandwidth in the FlashArrays. The way the NVM-Express protocol is implemented, it can have up to 64,000 parallel queues into a flash device with as many as 64,000 outstanding I/O requests in each of those queues, which is very wide indeed. In its prior FlashArray//m controllers using SAS protocols, the devices were set with a queue depth of eight, and with the NVM-Express DirectFlash modules, it is set at 256 (nowhere near what NVM-Express can do at the top end, in theory). This 256 queue depth gives each core running on the FlashArray controllers its own queue into each of the DirectFlash modules in a one-to-one setup, allowing for I/O accesses to be fairly deterministic.
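The one-to-one arrangement described above can be pictured with a toy model. This is only a sketch under our own assumptions, not Pure Storage's implementation: the `SubmissionQueue` class, the `build_queue_map` helper, and the core count are all hypothetical (the article does not say how many cores the controllers have), and only the depth of 256 comes from the text.

```python
from collections import deque

QUEUE_DEPTH = 256  # per-queue depth the article cites for DirectFlash


class SubmissionQueue:
    """Bounded FIFO standing in for one NVM-Express-style submission queue."""

    def __init__(self, depth=QUEUE_DEPTH):
        self.depth = depth
        self.pending = deque()

    def submit(self, request):
        """Queue a request; return False if the queue is already full."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append(request)
        return True

    def complete(self):
        """Retire the oldest outstanding request, or None if idle."""
        return self.pending.popleft() if self.pending else None


def build_queue_map(num_cores, num_modules):
    """One private queue per (core, module) pair, so cores never share one."""
    return {
        (core, module): SubmissionQueue()
        for core in range(num_cores)
        for module in range(num_modules)
    }


# Hypothetical sizes chosen for illustration only.
queues = build_queue_map(num_cores=4, num_modules=10)

# Core 0 tries to push 300 reads at module 3; only 256 fit in its queue.
accepted = sum(queues[(0, 3)].submit(("read", lba)) for lba in range(300))
print(len(queues), accepted)
```

Because each core owns its queue into each module, a burst from one core fills only that core's queue and leaves every other core's path to the same module untouched, which is one way to read the "fairly deterministic" claim.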
While Pure Storage is not providing the full feeds and speeds for the FlashArray//x NVM-Express arrays, it says that the latency of I/O operations on the new devices can be up to 50 percent lower on a mix of 65 percent read and 35 percent write transactions that is typical in the enterprise. The new controllers coupled to the new flash modules can deliver up to twice the performance and up to four times the performance density. It took 44 of the prior FlashArray//m modules to saturate the Xeon controllers, but it takes only ten of the new DirectFlash modules to saturate them in the FlashArray//x devices.
Pure Storage is not releasing pricing on the FlashArray//x yet, but Kixmoeller says that customers should not expect anything like a 2X premium over SAS flash products, and nothing like the 10X premium that Dell/EMC was trying to charge for its DSSD D5 array, which was killed off a few weeks ago, compared to its XtremIO all-flash block storage. “Clearly, if NVM-Express is 2X or 10X more expensive, it just limits the market,” says Kixmoeller. “So our goal here is to go for a really modest uptick in price over FlashArray//m, which will make it broadly affordable.”
One emerging use case that Pure Storage is seeing with the early adopter customers of the FlashArray//x is a top of rack flash deployment, replacing individual flash cards or flash drives that might have otherwise been deployed inside individual servers. (This is particularly popular with cloud infrastructure and SaaS software vendors.) “This will get a lot more interesting as NVM-Express over Fabrics gets more mainstream,” says Kixmoeller.
The FlashArray//x has a new //x70 controller module that is based on the same Xeon compute engines as those used in the FlashArray//m arrays, but which comes with NVM-Express links into the midplane of the FlashArray chassis, which was designed to support NVM-Express when it was engineered three years ago.
The FlashArray//x with the //x70 controller is a base 3U chassis that has two ten-module capacity packs in it; the 2U expansion shelves do not have the pair of two-socket Xeon controllers and have room for two capacity packs as well. The //x70 controller supports 16 Gb/sec Fibre Channel and 10 Gb/sec and 40 Gb/sec Ethernet links out to servers, plus it has 10 Gb/sec ports for replicating data across enclosures.
During its directed availability phase (where Pure Storage does a lot of hand-holding with early adopters), customers will be able to get 2.2 TB and 9.1 TB modules. When the FlashArray//x becomes generally available in the third quarter of this year, capacity will double up to 18.3 TB per module, and we can expect the capacity to double every year from here on out until there is a major product refresh, perhaps three years hence. With the RAID-3A and other overhead, Pure Storage will be able to cram 1 PB of effective, usable capacity (assuming hefty de-duplication, compression, and pattern removal on data) into a 3U form factor. That is 15 PB per rack (and hundreds of gigabytes per second of I/O bandwidth) for enterprise-class data, which is a substantial increase in density. Back in 2012, it would have taken 90 racks of Pure Storage gear using 128 GB flash modules to reach that 15 PB level.
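The density figures above are easy to sanity check with a few lines of arithmetic. The sketch below uses only numbers quoted in the paragraph; the variable names are ours, and we assume decimal units (1 PB = 1,000 TB) for the conversion.

```python
# Figures quoted in the article; variable names are illustrative.
EFFECTIVE_PB_PER_CHASSIS = 1   # effective capacity of one 3U FlashArray//x
CHASSIS_HEIGHT_U = 3
RACK_PB = 15                   # effective capacity claimed per rack
RACKS_2012 = 90                # racks needed for the same 15 PB in 2012

# How many 3U chassis, and how many rack units, 15 PB per rack implies.
chassis_per_rack = RACK_PB // EFFECTIVE_PB_PER_CHASSIS
rack_units_needed = chassis_per_rack * CHASSIS_HEIGHT_U

# Effective capacity per rack in 2012, assuming 1 PB = 1,000 TB.
tb_per_rack_2012 = RACK_PB * 1000 / RACKS_2012

# Same capacity in one rack instead of 90 is a 90-fold density gain.
density_gain = RACKS_2012 / 1

print(chassis_per_rack, rack_units_needed, round(tb_per_rack_2012, 1), density_gain)
```

Taken at face value, 15 PB per rack means fifteen 3U chassis, or 45U of gear, so the quoted figure presumably assumes a rack taller than the common 42U or involves some rounding.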
Pure Storage is a clever technology company in that it knows enterprise customers hate the technical and economic disruptions that come with upgrading technology. But the company also knows that flash technology is changing fast and use cases are proliferating, and as such, helping companies consume new technology in a way that doesn’t mess with the books or cause grief in the datacenter is a good idea. And that is what the Evergreen storage program is about, which is outlined below:
There are different levels of Evergreen support, but if you are willing to spend a little, you can be on an automatic plan to upgrade controllers and storage on a regular basis. The Evergreen Silver level is new, and it doesn’t have automatic hardware upgrades in it.
With Intel and Micron Technology ramping up their Optane 3D XPoint non-volatile memory, you might be thinking that Pure Storage might have a plan to upgrade the FlashArrays to this alternative to flash. Don’t hold your breath.
“We see 3D XPoint as more of a replacement for DRAM than for flash,” says Kixmoeller. “If you think NVM-Express flash is expensive, 3D XPoint is the next level up in expensive, and it is a lot closer to the DRAM price point than the commodity flash price point. In our system, we don’t really see a massive advantage. We already have an NV-RAM caching tier based on DRAM, so we could move those to Optane, but that would actually reduce performance. Until 3D XPoint is at a mainstream price point for mass usage, it is not that exciting to us. You will see some other vendors add it as a cache tier, but it is just a way to get more cache without having to buy a lot of DRAM, but we are not particularly big fans of the caching model and variable latencies.”