For the past several years, Teuto.net, a small public and private cloud provider in Germany, has used the open source Ceph software as the backbone of its storage infrastructure. The software has met the bulk of the needs of its few hundred customers and the traditional workloads they run. But in a rapidly changing tech world, where emerging technologies like containers and Kubernetes are coming into play alongside new workloads based on artificial intelligence, machine learning and analytics, Teuto.net was beginning to see the limits of what Ceph could offer in terms of performance and, most importantly, latency.
“Ceph is very reliable and the scalability of Ceph is very good,” CEO Burkhard Noltensmeier tells The Next Platform. “We have relied on Ceph for the last four years and it is very solid storage and we continue to use it for our bigger demands for rotational storage, like object store. For that, we still use Ceph. We are satisfied with Ceph, but unfortunately it has got high latency and high CPU utilization, so it is not suitable for high transaction on the storage.”
However, Noltensmeier continues, more customers are “running Kubernetes on OpenStack and the demand for IOPS is ever increasing and we could not meet the IOPS demand from our customers with the Ceph storage that we’ve used up to now. Also, these AI and analytics workloads are on the rise and they will increase the need for IOPS for our customers. At the moment, even the traditional workloads can profit from these low latencies.”
To meet that demand, Teuto.net began looking for an alternative and, after evaluating some options, settled on the combination of Excelero’s NVMesh Server SAN block storage solution and Mellanox Technologies’ SN2100 25 Gb/sec Ethernet switches to create an environment that offers significant improvements in both performance and latency. The company began integrating NVMesh into its infrastructure in January and got it up and running in May.
We did a deep dive into Excelero’s NVMesh software, which essentially creates a pool of flash that supports block access protocols and puts NVM-Express at the center of its efforts. NVM-Express is the hot new protocol that is designed to boost the performance and power efficiency of flash and other non-volatile memory. The protocol has made its way into servers and it is now beginning to appear in external storage appliances, like Dell EMC’s PowerMax array that was introduced early last month. The real improvements in performance and latency will come with the adoption of NVM-Express-over-fabrics (NVMe-oF) and storage-class memory (SCM). Teuto.net ruled out products that didn’t support NVMe-oF.
To address its needs, Teuto.net reviewed a number of options.
“We had tried a hyperconverged infrastructure with Ceph, but Ceph takes a big toll on the CPU,” Noltensmeier says. “If you go with Intel CPUs, you go from 12 cores to 16 cores, and on CPUs, the last four cores are the most expensive. So if you go with a hyperconverged setup, it’s good not to go with too many CPUs. This is a big advantage for Excelero because [its Remote Direct Drive Access] talks directly to the NVMe drives so they won’t take a toll on the CPU nodes at all. Therefore, you can scale it more easily and it’s less expensive on the CPU side. When you grow with your compute resources, you can also grow with your storage resources, so you don’t need extra capacity planning with that.”
The company also tried several iSCSI products, but while iSCSI was “great from a performance perspective, because it was an appliance, we could not integrate it in our deployment.” The decision was to run NVMesh on Supermicro servers on the teutoStack Cloud with ConnectX-4 and ConnectX-5 NICs from Mellanox running two 25 Gb/sec ports to each node. OpenStack Cinder is used to manage the storage, and with Linux support from both Excelero and Ubuntu, the result is an all-Linux implementation, which makes it easier to integrate orchestration and monitoring. Noltensmeier says the company also is using Cumulus Networks software with the switches and the open-source Prometheus for monitoring.
According to Teuto.net, the result has been a 20X increase in performance and a 10X reduction in latency. The combination of Excelero and Mellanox delivers 8,000 IOPS per VM, compared with 400 IOPS with Ceph, according to the company. Through Excelero, latency is around 250 microseconds, compared with up to 2.5 milliseconds with Ceph.
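The ratios behind those figures can be checked with a few lines of arithmetic. The sketch below uses only the IOPS and latency numbers quoted above; the function name `improvement` is hypothetical and chosen purely for illustration. It also shows that a 20X jump corresponds to a 1,900 percent increase over the starting value.

```python
def improvement(before, after):
    """Return (ratio, percent_increase) when going from `before` to `after`."""
    ratio = after / before
    percent_increase = (after - before) / before * 100
    return ratio, percent_increase


# IOPS per VM as quoted by Teuto.net: 400 with Ceph, 8,000 with NVMesh.
iops_ratio, iops_percent = improvement(400, 8000)

# Latency as quoted: up to 2.5 ms (2,500 microseconds) with Ceph,
# around 250 microseconds with NVMesh. Lower is better, so divide old by new.
latency_ratio = 2500 / 250

print(f"IOPS: {iops_ratio:.0f}X, a {iops_percent:.0f} percent increase")
print(f"Latency: {latency_ratio:.0f}X lower")
```

Because the Ceph latency is quoted as "up to" 2.5 milliseconds, the 10X figure is best read as an upper bound rather than a typical value.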
“What concerns us most is the latency,” Noltensmeier says, adding that the company measured the latency in terms of “one consecutive I/O request after each other. So we measure our search performance [starting from] one. With this, we can see this big latency difference between Ceph and Excelero and where this percentage comes from. So if you’ve got a whole lot of requests, there might be a little less improvement in latency, but it’s this one single database that has to do its queries one after each other that will profit most from this low latency. Each customer needs this latency, maybe because of the traditional workloads. If you’ve got a database that already does parallel information storage, then you might not need so little latency because you can parallelize on the database. But if you have a more traditional approach with an SQL database, you’ll need this low latency at one point, so these traditional workloads profit very much from these low latencies.”
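To see why serial workloads benefit most, consider a simple model in which each request must wait for the previous one to complete, so throughput is just the inverse of latency. The sketch below is an illustration under that assumption, not Teuto.net's actual test method; the function names are hypothetical, and the second function applies Little's law to estimate how many requests must be in flight to reach a given IOPS figure.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def serial_iops(latency_us):
    """IOPS when requests are issued strictly one after another."""
    return MICROSECONDS_PER_SECOND / latency_us


def implied_outstanding_requests(iops, latency_us):
    """Little's law: requests in flight = throughput * latency."""
    return iops * latency_us / MICROSECONDS_PER_SECOND


# Latencies quoted in the article: up to 2,500 us for Ceph, about 250 us for NVMesh.
ceph_serial = serial_iops(2500)
nvmesh_serial = serial_iops(250)

# How many requests in flight would the quoted 8,000 IOPS per VM imply at 250 us?
nvmesh_in_flight = implied_outstanding_requests(8000, 250)

print(f"Ceph, one request at a time:   {ceph_serial:.0f} IOPS")
print(f"NVMesh, one request at a time: {nvmesh_serial:.0f} IOPS")
print(f"Requests in flight for 8,000 IOPS at 250 us: {nvmesh_in_flight:.0f}")
```

Under this model, a database that issues its queries one at a time is capped by latency alone, which is why a tenfold drop in latency translates directly into a tenfold gain for that kind of workload, while a parallel workload can hide some of the latency by keeping more requests in flight.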
Enterprise use of AI and analytics workloads will also increase, and Teuto.net is now recommending NVMesh to customers for their private clouds. At the same time, demand for hybrid cloud computing is growing, driving the adoption of Kubernetes, the CEO says.
“This is in the beginning, our customers using Kubernetes on workloads,” he says. “We’re beginning to offer a Kubernetes service to our customers. What is interesting from the hybrid cloud perspective is that there is a federation API in the making for Kubernetes where you can shift workloads very easily from one cloud to another, so Kubernetes may deliver on the promise of real hybrid clouds. In the next year we’ll see more hybrid cloud things from Kubernetes, which from an operations perspective are much easier solutions than [what we’ve had] up till now. Right now we have an integration with OpenStack and we would like to integrate Excelero directly on Kubernetes on bare metal and this will be a possibility in the future.”
This is good, but NVMesh (SDS) involves the overhead of maintaining a software agent. This still leaves room for storage “islands.” Basically, in a converged environment you are trying to maximize resource utilization without really increasing the compute and storage density, and the TCO stays the same. Look at Pavilion Data Systems, which provides a rack-scale flash array with millions of IOPS and more than 100 GB of bandwidth at the rack-scale level with dense storage. It also provides Kubernetes storage integration. No need to maintain the logical volumes anymore. https://paviliondata.com/