As the center of gravity shifts from compute to data, architectures are responding by moving the former a lot closer to the latter. That promises to increase not only application throughput but energy efficiency as well.
Here at The Next Platform, we have described this trend toward data centricity as being as profound as the historical shifts from mainframe to client/server, and from client/server to the web. In some cases, moving compute closer to the data also means the host CPU is relieved of a lot of processing drudgery, which can further boost overall performance. A prime example of this is Mellanox Technologies’ use of its switch and adapter ASICs to offload MPI primitives and other network-based operations from the host. Something similar is happening with storage, and SSDs in particular, under the moniker “computational storage.”
Moving computational resources into storage devices not only shortens the access path to data, thus lowering latency, but also alleviates the bottleneck at the I/O ports. When it comes to really big datasets – here, we’re talking petabytes – keeping everything on the storage side can make a huge difference. For example, transferring one petabyte of data from storage to main memory over 32 lanes of PCIe Gen3 takes a full nine hours. That time is cut in half for Gen4 and will be cut in half again for Gen5 when it arrives, but you’re still talking hours. If you have to move the data from a storage array to servers over a 100 Gbps network, you’re looking at over a day to load it. Computational storage means you only have to deal with the extremely fast buses within the device itself.
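The transfer times above come down to simple bandwidth arithmetic. Here is a rough Python sketch, assuming a decimal petabyte (10^15 bytes), roughly 0.985 GB/s of usable bandwidth per PCIe Gen3 lane after 128b/130b encoding, and a doubling with each generation. The function names are ours, and real-world throughput will be lower once protocol overhead is counted.

```python
PETABYTE = 10**15  # bytes; a decimal petabyte is assumed

# Approximate usable bandwidth of one PCIe Gen3 lane in bytes per second
# (assumption: 8 GT/s with 128b/130b encoding, no protocol overhead).
GEN3_LANE_BYTES_PER_SEC = 8e9 * (128 / 130) / 8


def pcie_bandwidth(gen, lanes):
    """Aggregate bandwidth in bytes/s, doubling per generation after Gen3."""
    return GEN3_LANE_BYTES_PER_SEC * 2 ** (gen - 3) * lanes


def transfer_hours(num_bytes, bytes_per_second):
    """Hours needed to move num_bytes at a sustained rate."""
    return num_bytes / bytes_per_second / 3600


for gen in (3, 4, 5):
    hours = transfer_hours(PETABYTE, pcie_bandwidth(gen, lanes=32))
    print(f"PCIe Gen{gen} x32: {hours:.1f} hours")

# 100 Gbps Ethernet at raw line rate (100e9 bits/s = 12.5e9 bytes/s).
network_hours = transfer_hours(PETABYTE, 100e9 / 8)
print(f"100 Gbps network: {network_hours:.1f} hours")
```

At raw line rate the 100 Gbps link works out to roughly 22 hours, so it is easy to see how protocol overhead and less-than-perfect link utilization would push the real figure past a day.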
There are a number of entrants in the computational storage market, including some big names like Samsung and Western Digital, but also a handful of intriguing startups, including Scaleflux, Eideticom, and NGD Systems. At this point, NGD appears to be the most ambitious of the bunch and is now offering a 16 TB NVM-Express SSD equipped with an Arm processor that is exclusively devoted to running application code. The presence of the Arm hardware, in this case a quad-core Cortex-A53 processor integrated into the SSD controller ASIC, represents a significant departure from most of the early computational storage offerings, which relied on FPGAs to supply the processing muscle.
NGD also broke new ground by manufacturing its ASIC on a 14 nanometer process – making it the first SSD controller chip to employ this node. According to NGD Systems chief technology officer and cofounder Vladimir Alves, using the more advanced node enabled them to provide the needed capability in a 2.5-inch form factor, while also drawing a minimum of power. The 16 TB NVMe device burns a total of 12 watts, which works out to less than a watt per terabyte and, according to the company, the highest energy efficiency in the industry.
Although the embedded Cortex-A53 processor is key to the solution, Alves says for them the bigger effort was developing the programming model to make sure regular application developers would be able to tap into the ASIC. That includes an API that provides high-level language support for in-situ computation, as well as the hooks to run Ubuntu Linux on the processor.
Application software running on X86 servers would, for the most part, just need to be cross-compiled to Arm to be ported to their SSD. We say for the most part because apparently a host agent is used as the go-between, driving the application from the CPU side. Since the Arm processor is running Ubuntu, in some cases users could bypass application software altogether and just develop Linux shell scripts to do things like pattern matching, filtering, or searching on the stored data. Linux file system support is currently limited to Ext4 and GFS2. However, a port to HDFS is also in the works, thanks to some encouragement from a large customer with Hadoop-based applications.
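To make the idea concrete, here is a minimal Python sketch of the kind of filter that could run on the drive’s Arm cores, so that only matching records, rather than the whole file, cross the bus to the host. This is not NGD’s API: the `filter_in_situ` function and the sample log data are hypothetical, and the host agent that would launch such a job is left out entirely.

```python
import os
import re
import tempfile


def filter_in_situ(path, pattern):
    """Scan a file line by line and yield only the lines matching pattern.

    Hypothetical stand-in for code running next to the data: the full file
    is read locally, and only the matches would be sent back to the host.
    """
    regex = re.compile(pattern)
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if regex.search(line):
                yield line.rstrip("\n")


if __name__ == "__main__":
    # Made-up sample data standing in for a large file stored on the SSD.
    sample = "ok request 1\nERROR disk full\nok request 2\nERROR timeout\n"
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write(sample)
    try:
        for match in filter_in_situ(tmp.name, r"^ERROR"):
            print(match)
    finally:
        os.remove(tmp.name)
```

The point of the generator is that the scan streams through the data where it lives, and the volume returned scales with the number of hits rather than the size of the dataset.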
One of the prime targets for the NGD technology is the big web service providers, especially for the kind of data-intensive work that is inherent to searching, pattern matching, and indexing. According to Cisco Systems’ Global Cloud Index (GCI), these hyperscale environments house slightly more than half of all the data stored in datacenters worldwide. And that proportion is expected to grow to 65 percent by 2021. In that same year, the total amount of data stored across all datacenters is projected to reach 1.3 zettabytes. All of which bodes well for computational storage.
Of course, a quad-core Cortex-A53 won’t deliver server-level performance, so it’s not simply a matter of just porting an application from the CPUs (or GPUs) over to the SSDs and expecting the code to magically speed up. And when you’re talking about machine learning codes, it’s probably more reasonable to expect to move inferencing smarts to the SSD, rather than training. But assuming these compute-enhanced SSDs are available across large numbers of nodes, developers will be able to take advantage of distributed computing to spread out the computation as needed. And since compute and storage are now integrated into a single device, scalability is built in.
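One way to picture that kind of distribution is a scatter-gather pattern: each drive searches only its own shard of the data and returns a small partial result, and the host merges the partials. The sketch below simulates this in plain Python for a nearest-neighbor search, with in-memory lists standing in for drives. The function names and data are hypothetical, and it uses neither FAISS nor any NGD interface.

```python
import heapq


def local_top_k(shard, query, k):
    """Work that would run on one drive: the k nearest vectors in its shard."""
    def squared_distance(vector):
        return sum((a - b) ** 2 for a, b in zip(vector, query))

    return heapq.nsmallest(
        k, ((squared_distance(vec), vec_id) for vec_id, vec in shard)
    )


def global_top_k(shards, query, k):
    """Work that stays on the host: merge the small per-drive results."""
    partials = [local_top_k(shard, query, k) for shard in shards]
    return heapq.nsmallest(k, (hit for part in partials for hit in part))


# Three made-up shards, each a list of (id, vector) pairs held by one "drive".
shards = [
    [("a", (0.0, 0.0)), ("b", (5.0, 5.0))],
    [("c", (1.0, 1.0)), ("d", (9.0, 9.0))],
    [("e", (0.5, 0.0)), ("f", (7.0, 2.0))],
]

nearest = global_top_k(shards, query=(0.0, 0.0), k=2)
print([vec_id for _, vec_id in nearest])  # ['a', 'e']
```

Each simulated drive hands back at most k candidates, so the traffic to the host depends on k and the number of drives, not on how much data each drive holds.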
NGD’s next big use case is edge storage, where data influx is huge, storage capacities are limited, and, perhaps most importantly, the ability to send reams of data back to a datacenter is severely constrained. The good news (from NGD’s perspective) is that edge devices are an even more data-rich environment than datacenters. Using Cisco’s GCI projections once again, IoT devices alone are expected to generate 847 zettabytes per year by 2021, although only a fraction of that will be stored. Here computational storage could play a pivotal role in not only reducing the amount of data that has to be saved, but also in performing analytics and inferencing in near real-time.
Content delivery is another market on NGD’s radar. Here computational storage can help with things like encryption/Digital Rights Management (used to verify the content can be accessed by the user) and locality of service. According to the company, putting compute on the SSD means you can do away with the database servers that normally perform these tasks. And considering you need fewer than a dozen 16 TB SSDs to hold the entire Netflix and Amazon Prime libraries, you can start to see how this changes the way content delivery might work.
To date, the company has published only a few instances of speed-ups for specific applications. In their most visible example, NGD demonstrated that the Facebook Artificial Intelligence Similarity Search (FAISS) software could be accelerated as much as 80 times by using in-situ processing on their SSD. In that case, execution times on the NGD platform barely increased as the dataset grew, while the X86 server saw its run-times increase dramatically.
Nevertheless, Alves maintains that wherever parallelism can be extracted, they can significantly improve both performance and energy efficiency. “Some workloads we see a 10X or 100X improvement in performance,” he says. “In other workloads, it could be 10 percent or 20 percent.”