Forget in-memory computing for the moment because it requires a complete re-architecting of applications and most of the time the underlying hardware, too. What we really want is something more like in-memory storage – something that can be done immediately and provide performance benefits now.
This is what Formulus Black is doing with its Forsa in-memory storage, which is often mistakenly called in-memory computing. But that is not really what it is. Sort of.
ERP software powerhouse SAP is an illustration of how difficult it can be to embrace in-memory computing as it has been developed. Consider the substantial time it has taken SAP to port its own ERP software to run on its own HANA in-memory database and how long it will take its vast customer base of over 470,000 customers to move their applications to HANA – something that SAP is requiring them to do by 2025. To be fair, HANA has over 30,000 customers to date, and there are over 11,500 customers that are now using the S/4HANA ERP applications that run natively on SAP’s in-memory database. But at the current rate of converting 2,600 customers a year to S/4HANA, it will take roughly 180 years to get it done. HANA is technically elegant and selling well enough, but this transition to in-memory computing is taking too long, even with the substantial benefits that come with pulling the database inside of main memory and, these days, stretching that main memory even further on Intel Xeon SP systems with Optane 3D XPoint persistent memory.
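The 180-year figure is back-of-the-envelope arithmetic on the numbers quoted above. Here is a minimal sketch of it in Python, assuming the conversion rate stays flat at 2,600 customers a year; the function name is ours, purely for illustration.

```python
def years_to_convert(total_customers: int, converted: int, per_year: int) -> float:
    """Years needed to convert the remaining customers at a constant annual rate."""
    if per_year <= 0:
        raise ValueError("per_year must be positive")
    return max(total_customers - converted, 0) / per_year


# Figures quoted in the article; a flat conversion rate is an assumption.
whole_base = years_to_convert(470_000, 0, 2_600)       # counting the entire base
remaining = years_to_convert(470_000, 11_500, 2_600)   # net of S/4HANA customers so far

print(f"whole base: {whole_base:.1f} years, remaining customers: {remaining:.1f} years")
```

Either way you count it, the answer lands in the neighborhood of 180 years.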
The world needs an easier way to start using memory better, and Formulus Black, which dropped out of stealth in March with the 2.0 version of its in-memory storage, has just rejiggered the code after substantial feedback from early adopters and partners and rolled up a much better Forsa 3.0 release that makes accelerating applications through in-memory storage quite compelling. The Forsa 3.0 tool also helps speed up and stretch the capacity of systems using Optane persistent memory, which started shipping earlier this year with Intel’s “Cascade Lake” Xeon SP processors. Ironically, it might be possible for customers running SAP ERP application software to run it and the databases behind it in-memory without having to resort to a move to HANA.
The Forsa in-memory storage is not limited to any particular application type, of course, but databases, which still largely run on disk-based systems and which were tuned for the slow I/O of those systems, are an obvious place to start. The big database vendors have been doing in-memory computing inside of their relational databases by shifting to columnar formats and doing all kinds of compression and other techniques to squeeze all or a portion of database tables into the main memory of the system, which allows them to be queried faster. This provides a performance boost, but it only improves the performance of that particular database and it means companies are wrestling with column and row formats now, too.
The way that Formulus Black sees it, the best means to accelerate the database – and indeed, any other application that is used to hitting block storage – is to pull as much of the data on which it depends off disk and flash and get it into main memory. And to make that possible, Formulus Black has come up with a data encoding and reduction technique, which it calls Forsa bit markers, which it claims is neither data compression nor data de-duplication as we know it, that can shrink application code and data by anywhere from 4X to 24X, depending on the nature of that data. This data is stored in the DRAM of a server, which can now be extended further with Optane DIMMs at least on Cascade Lake Xeon SP systems, giving more capacity for in-memory storage. There is CPU overhead because the Forsa software requires processing to encode and decode all of this data that is stored in main memory, and there is also some overhead and time to do a snapshot of the contents of main memory to flash – what Formulus Black calls a blink – so it can be persisted and recovered in the event of application failure.
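To get a feel for what a 4X to 24X reduction means for capacity, here is a rough sketch in Python. The 1,024 GB of physical memory is a hypothetical configuration chosen for illustration, not a figure from Formulus Black, and the calculation ignores whatever memory the operating system and Forsa itself need.

```python
def effective_capacity_gb(physical_gb: float, reduction_ratio: float) -> float:
    """Logical data that fits in memory at a given data reduction ratio."""
    if physical_gb < 0 or reduction_ratio <= 0:
        raise ValueError("capacity must be non-negative and ratio positive")
    return physical_gb * reduction_ratio


PHYSICAL_GB = 1_024  # hypothetical server memory, DRAM plus any Optane DIMMs

# The 4X and 24X bounds are the range the company claims.
for ratio in (4, 24):
    logical = effective_capacity_gb(PHYSICAL_GB, ratio)
    print(f"{ratio}X reduction: about {logical:,.0f} GB of application data")
```

The spread between the two ends of the claimed range is wide, which is why the nature of the data matters so much when sizing a system.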
Among early adopters, customers are eager to accelerate various kinds of databases and datastores because Forsa allows any database to store data in main memory instead of on disk without having to modify the database. You can basically bring the database cache size down to the minimum setting and then allow the application to go directly to disk – what it thinks is a disk logical unit, or LUN, but which is actually a logical extension to memory, or LEM as Formulus Black calls it. Other customers are testing Forsa on a variety of HPC applications, ranging from ANSYS Fluent to Dassault Systemes Abaqus to CERN HTCondor. Others have homegrown applications in their processing pipelines that they want to accelerate by keeping all I/O in memory rather than going out to disk.
“There is not a single customer that has come back and said that they were not happy with the speed improvement,” Jing Xie, chief operating officer at Formulus Black, tells The Next Platform. “It’s always a 2X or 3X improvement, or in the case of the Looker data analytics tool, it was 70X performance improvement against the fastest storage in the market today. So when we talk about ROI with customers, we are talking about improving query transaction rates, completing more HPC simulations and models in the same unit of time, and also how much CPU utilization is pushing applications with Forsa versus another environment, such as VMware ESXi. We are seeing some very good numbers there.”
With the initial Forsa 2.0 release from earlier this spring, Forsa was rolled up with Ubuntu Server and run on bare metal servers. There was a dependency between Forsa and Ubuntu, and not every Linux shop wants to run Ubuntu Server just to accelerate their applications by moving their storage into main memory. To that end, with Forsa 3.0, which was recently announced, Formulus Black is breaking this dependency and running as a piece of systems software on any Linux distribution. CentOS is the first one to be added to Ubuntu Server, and Red Hat Enterprise Linux and SUSE Linux Enterprise Server are close behind.
The Forsa in-memory storage software can run on Intel’s “Haswell” Xeon v3 family of processors from 2014 and anything after that, including “Broadwell” Xeon v4, “Skylake” Xeon SP, and “Cascade Lake” Xeon SP processors. “This has been very good for us because a number of these proofs of concept are to make these old servers run extremely fast and dramatically improve the ROI of existing investments,” says Xie.
Support for Windows Server is probably coming up next, as is support for AMD Epyc processors and possibly RISC processors – IBM Power and various Arm chips are still relevant.
“Our architecture for Forsa is agnostic to the chip, and we do not have any lock-in to Intel processors,” explains Pradeep Balakrishnan, head of software development at Formulus Black. “As far as the Windows operating system goes, the memory table entries are not exposed by the kernel at this point, and we have not done any development for Windows on the client and server because of this, but it has been one of my personal goals to make Forsa run on my Windows laptop. In Windows Server 2019, Microsoft is revealing the memory interfaces and once they are publicly available, we will start to do Windows Server prototyping.”
Companies can run Windows Server as a guest on top of a Linux server with the KVM hypervisor, and many early adopters are using Forsa this way to goose the performance of SQL Server databases and other Windows Server applications.
Our immediate thought is that Forsa could be etched onto an ASIC or coded up into an FPGA and put on a memory controller inline to make this all utterly transparent to the system and not require burning CPU cycles on the host server.
“We have been thinking about putting Forsa on an FPGA or even an ASIC, and you can put it inside the logic of memory controllers, too,” says Balakrishnan. “The code is actually inline, so it can be infused into an embedded device as well. But we made a conscious decision to make this a software play at this point, but architecturally, we could do any of these things or license it to others so they can do it.”
It would be interesting to see AMD partner with – or outright acquire – Formulus Black as a counterpunch to Intel’s Optane persistent memory. (Any one of the main memory makers like Micron Technology or Samsung might want to buy Formulus Black and keep it out of the market, too.) Optane is a bit slower, less capacious, and quite a bit more expensive than Intel had hoped, and at this point Forsa can amplify the effectiveness of Optane persistent memory as well as main memory in a Cascade Lake system. Formulus Black has been testing Forsa in Intel’s labs running MySQL databases on top of Optane, and it boosted the performance 20 percent above and beyond what the expanded memory that comes from Optane memory sticks provided.
Optane persistent memory also provides another thing for Forsa users: peace of mind.
“Customers are understandably concerned about running their databases solely on DRAM, which loses its contents when the power goes out, even if they can persist it to local storage with our Blink feature,” Xie says. “They sleep better with Optane in the machines and knowing that their databases can run on it and be persisted as if they were using flash out on the periphery but it is being stored right on the memory bus.”
The Forsa software also helps customers using Optane persistent memory on Linux systems get around a sticky issue, says Balakrishnan, and that is transparent huge pages. “In the Linux kernel, transparent huge pages is a big problem. If you look at the best practices from MongoDB, Spark, Hadoop, and even Oracle, you will see that they quietly ask you to disable transparent huge pages in the Linux kernel to make their software work with Optane persistent memory. Not many admins want to touch a Linux kernel that has been working for years. But with Forsa, you don’t need to worry about this. You use the Linux distribution as it is, and Optane just works.”
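For admins who want to see where a given box stands before following any of those vendor best practices, the kernel reports its transparent huge pages mode through sysfs, with the active setting shown in square brackets. The sketch below only reads that standard Linux file and has nothing to do with Forsa itself; the helper names are ours.

```python
import re
from pathlib import Path
from typing import Optional

# Standard sysfs location on Linux; contents look like "always [madvise] never".
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> Optional[str]:
    """Return the bracketed (active) mode from the sysfs text, or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the active mode; None if the file is missing or unreadable."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(f"transparent huge pages: {current_thp_mode() or 'unknown'}")
```

A result of "always" or "madvise" means the feature is at least partly on, which is the state those best practice guides ask admins to change.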
With the updated release, Forsa is also starting to scale up and scale out so it can support larger sets of data and applications. The initial Forsa 2.0 could only run on two-socket Xeon machines with as much physical memory as you could cram into them. With the 3.0 release, Forsa can scale to a maximum of 64 sockets in a NUMA machine in theory. Formulus Black is getting its hands on a Superdome-X platform from Hewlett Packard Enterprise to test the scale up limits. Even on more typical four-socket and eight-socket servers, this will be a much larger memory pool to convert into block storage. The updated software also scales out horizontally with what is called, appropriately enough, a scale out LEM, which creates a virtual in-memory LUN that spans two different nodes (of any number of sockets) and pools the memory as if it were in a single POSIX block device. The architecture in Forsa 3.0 is set at a maximum of eight nodes for scaling out, and it has been tested up to four nodes in the labs.
Imagine pooling in-memory storage across eight interconnected eight-socket servers. . . .
There are a few other features that make Forsa more enterprise ready in the 3.0 release. The management console for Forsa can now handle eight distinct servers, and this will grow over time as the Forsa clusters do. Forsa could already underpin ESXi and KVM virtual machines and now it supports Docker containers, which means it plays nicely with the Docker Enterprise and Kubernetes container controllers. Moreover, with the update, different parts of the environment can be selectively blinked and scripted as such; the 2.0 release required the entire system to be backed up to flash – all VMs and all of the underlying LEMs, every time. Forsa still makes a master blink of the entire system and keeps it around on flash when it is first installed, and this can be updated at any frequency that provides a level of comfort.
Users can now migrate VMs and their LEMs across clustered machines, too, thanks to the support for linking multiple nodes into a shared memory pool. There is particular interest, says Xie, in using Forsa to migrate customers off of VMware ESXi and onto KVM while at the same time moving applications to memory-resident storage. What customers have realized, Xie says, is that in moving from ESXi to KVM backed by Forsa, the CPU utilization for the same workload is much lower, so they can get more work done and give more effective memory to applications at the same time. It’s a win-win-win. This VM migration is not dependent on live migration features of ESXi or KVM. You quiesce the VMs manually, unmount the VMDKs, and move them over to the new machine, mount them on a KVM VM, and it works. This HA replication feature doesn’t have to convert VMs from ESXi to KVM, and it is still somewhat experimental.
The Forsa software is licensed per socket on servers. Older systems with smaller memory can cost slightly less, but a license for a reasonably loaded modern server costs $10,000 per socket. That price is based on value provided, according to Xie. When running Forsa, a single machine can do the same work that would take two or three servers, and the license to Forsa should be less than the incremental savings from servers that customers don’t have to buy, which might cost $15,000 to $20,000 in a memory heavy configuration that also has a lot of compute. Even on more cheaply configured servers that might cost $10,000 with some zippy flash and reasonable memory and compute, you could make the economic case to spend $10,000 on the server, $10,000 on Forsa per socket, and get $30,000 worth of serving for $30,000 in a footprint that is one third the size it might otherwise be.
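The math behind that economic case can be sketched in a few lines of Python. The server prices and the $10,000 per-socket license come from the figures above; the two-socket machine and the three-to-one consolidation ratio are our assumptions, picked to match the $30,000 totals, and the function names are illustrative.

```python
LICENSE_PER_SOCKET = 10_000  # list price quoted for a modern server


def forsa_cost(server_price: int, sockets: int,
               license_per_socket: int = LICENSE_PER_SOCKET) -> int:
    """One server plus per-socket Forsa licenses."""
    return server_price + sockets * license_per_socket


def plain_cost(server_price: int, servers_replaced: int) -> int:
    """The servers it would take to do the same work without Forsa."""
    return server_price * servers_replaced


# Assumptions: two sockets per server, one Forsa server replaces three.
for price in (10_000, 20_000):
    with_forsa = forsa_cost(price, sockets=2)
    without = plain_cost(price, servers_replaced=3)
    print(f"${price:,} server: ${with_forsa:,} with Forsa vs ${without:,} without")
```

On the cheap server the two approaches break even on hardware and licenses, so the gain is the smaller footprint; on the memory heavy server the license comes in well under the cost of the machines avoided.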