Non Volatile Heaps And Object Stores In The Machine
February 8, 2016 Mark Funk
It is great that Hewlett Packard Enterprise is taking the storage bull by its horns with The Machine and working to create a complete compute and storage platform for the day when the needed technologies will all come together. Others, though, have also seen at least a part of this future and have been thinking about things like the programming model for such a system. When we have any form of non-volatile memory directly addressable by a processor, much of the programming model need not be affected, but some of it will and must be different.
In this sixth and final part in our series on the architecture of The Machine, we take a hard look at the heap, in this case a non-volatile heap. Call it an NV-Heap for short. For those of you running ahead of me here, you could even call it a persistent object store.
For those of you unfamiliar with the heap, think of it as temporarily used blocks of a program’s address space. If a program needs a temporary place to store some data, the program calls a heap manager and asks for the needed number of bytes from the available address space. When the program no longer needs that region of the address space, the program returns it to the heap manager for reuse later. In Java, C++, C#, and other languages, objects are created in the heap. In garbage-collected languages such as Java and C#, the program still asks for heap memory explicitly, but the blocks are returned to the heap automatically – via something called a garbage collector – once it is determined that the memory is no longer being used. (We bring this up here to provide some context for what is coming.)
Credit where credit is due: For more depth, you might want to refer to the paper NV-Heaps: Making Persistent Objects Fast and Safe with Next-Generation, Non-Volatile Memories.
The notion of the heap has been around for a long time as part of the programming model. All programmers have – or should have – some sense of its existence, and most use it daily. Although few have had to think much about it, the heap today is carved out of volatile memory. With non-volatile memory, the basic notion of the heap could be extended there as well; the heap is just regions of address space after all, from which smaller blocks are allocated (and to which they are later returned). There are basic differences, a simple one being the answer to the question: “Should the heap being used reside in a portion of the address space associated with volatile or with non-volatile memory?” Said differently, when your program creates an object, do you need to have that object persist, even when the process or the operating system is inactive?
Let’s next return to our linked list object residing in fabric memory that we saw in an MIT Scratch animation in a previous article (and provided here again).
In order for the list to reside in fabric memory, each of the nodes of the list had to have been allocated from fabric memory; let’s say that each was requested from an NV-Heap. Such a list is not a list – and scads of objects are not objects – without the links between nodes shown here. So, folks, what are these links?
What are those links when the nodes are allocated from a volatile heap as we know and love it today? Using the terminology introduced in the previous section, these links are process-local effective addresses (EAs). So, next, what happens to such an address when the process that created it no longer exists? The EAs linking these nodes are gone, lost, kaput, nonexistent; the address links still have a binary value, but their meaning was lost with the process. In short, so is the list. And what of the memory allocated for these nodes, having been sourced from fabric memory? The nodes continue to exist in fabric memory, indefinitely consuming NV-Heap space that could otherwise be reused, but no longer as part of a list, at least not one that can be understood as a list.
Yes, the program model is different. Addressing into the NV-Heap and amongst objects created there cannot be process-local addresses; the addressing used for the links must be able to persist as well.
So, what is an NV-Heap really? Or more to the point, what are all of the many NV-Heaps that will get created as needed by each of many processes in each of many operating systems running on The Machine?
An operating system and the processes with it will be asking for fairly straightforward regions of fabric memory of some arbitrary size, a region of fabric memory which will subsequently be internally managed by a heap manager. Again, the purpose of the heap manager is largely just to manage the used and available blocks of persistent memory within this entire region. If an object is to be constructed, the heap manager is requested – potentially repeatedly – to provide the blocks of this memory to be used in the object’s construction.
Even though the links among these objects cannot be stored as process-local EAs, EAs also happen to be the only way that a program can access an object; the processor’s instruction set uses EAs. Seems a bit of a Catch-22, doesn’t it? There is a solution, fortunately, but this too will show us another difference in the program models used for volatile and non-volatile memory.
Consider the following picture. This represents an NV-Heap. It is presented here as a single contiguous block of bytes of fabric memory. This is what the heap manager is managing. In this picture, a set of just six node objects has already been allocated from this arbitrarily sized region; think of the nodes each as a handful of bytes out of a region of megabytes to gigabytes. Every block allocated is therefore at some heap manager-selected byte location, a byte offset within this region.
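To make the idea of a heap manager handing out byte offsets a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name OffsetHeap, the first-fit policy, and the use of a bytearray as a stand-in for a region of fabric memory. It is not how The Machine's heap manager works; it only shows that "allocate a block" can mean "return an offset into the region."

```python
class OffsetHeap:
    """Hypothetical heap manager over one contiguous region.

    It hands out byte offsets into the region, never addresses.
    """

    def __init__(self, size):
        self.region = bytearray(size)  # stand-in for a fabric-memory region
        self.free = [(0, size)]        # free extents as (offset, length), sorted

    def alloc(self, nbytes):
        """Return the byte offset of a block of nbytes (first fit)."""
        for i, (off, length) in enumerate(self.free):
            if length >= nbytes:
                if length == nbytes:
                    del self.free[i]
                else:
                    self.free[i] = (off + nbytes, length - nbytes)
                return off
        raise MemoryError("region exhausted")

    def release(self, off, nbytes):
        """Return a block to the heap and merge adjacent free extents."""
        self.free.append((off, nbytes))
        self.free.sort()
        merged = [self.free[0]]
        for o, n in self.free[1:]:
            last_o, last_n = merged[-1]
            if last_o + last_n == o:
                merged[-1] = (last_o, last_n + n)
            else:
                merged.append((o, n))
        self.free = merged


heap = OffsetHeap(1024)
node_offsets = [heap.alloc(16) for _ in range(6)]  # six small nodes, as in the picture
```

Each value in node_offsets is simply a position within the region. Nothing about it depends on where, or whether, the region is currently mapped into a process.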
Separately, at the time that this region of fabric memory is first allocated as an NV-Heap, it is also allocated a Real Address range – for access by a node – as well as whatever types of addresses are needed for the processors to access this entire region. I am showing this in the figures as the entire region being represented by both a process-local EA and a system token representing the NV-Heap’s existence (for later reclamation). OK, good, a process allocating from this heap has an EA addressing the beginning of this NV-Heap and, for each block allocated from the NV-Heap, a byte offset relative to this EA; together these tell the program where the object is to be constructed. Every object on this heap can be known by this NV-Heap identifier – this EA – and by the byte offset.
So, as in the preceding figure, let’s connect these nodes together again as a linked list. What are these links? The values linking these nodes into the linked list are these very byte offsets, at least in this example. The program working its way along this linked list – with the processor requiring EAs to do so – would use the NV-Heap’s starting EA as a base and add in the byte offset of each node along the way.
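A small sketch may help here. The Python below stores a linked list in a byte region using offsets as links, then "maps" that same region at two different base positions to mimic two processes seeing it at different EAs. The node layout, the convention that offset 0 means "end of list," and the helper names build_list, map_at, and walk are all assumptions for illustration, and the bytearray address spaces are only a simulation of process-local addressing.

```python
import struct

NODE = struct.Struct("<qQ")  # assumed node layout: 64-bit value, 64-bit next offset
LINK = struct.Struct("<Q")   # one link; the region header at offset 0 is the list head


def build_list(region, values):
    """Lay nodes out in the region and link them by byte offset (0 = end)."""
    next_free = LINK.size          # trivial bump allocation after the header
    prev_link_at = 0               # where the offset of the next node gets written
    for v in values:
        off = next_free
        if off + NODE.size > len(region):
            raise MemoryError("region exhausted")
        next_free += NODE.size
        NODE.pack_into(region, off, v, 0)
        LINK.pack_into(region, prev_link_at, off)
        prev_link_at = off + 8     # the next-offset field follows the 8-byte value


def map_at(region, base_ea, space_size=4096):
    """Simulate a process whose address space holds the region at base_ea."""
    space = bytearray(space_size)
    space[base_ea:base_ea + len(region)] = region
    return space


def walk(space, base_ea):
    """Traverse the list; every access uses base EA + byte offset."""
    values = []
    (off,) = LINK.unpack_from(space, base_ea)
    while off != 0:
        value, off_next = NODE.unpack_from(space, base_ea + off)
        values.append(value)
        off = off_next
    return values


region = bytearray(256)
build_list(region, [10, 20, 30])
process_a = map_at(region, 1024)   # one process sees the heap at "EA" 1024
process_b = map_at(region, 2048)   # another sees it at "EA" 2048
# walk(process_a, 1024) and walk(process_b, 2048) both return [10, 20, 30]
```

Because the links are offsets, the list survives a change of base address. Had the links been the absolute positions valid in process_a, they would have pointed at the wrong bytes in process_b.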
Fine, this can be done, but this is not normally the way that a program working its way through a DRAM-based linked list would be doing it; the links there are very likely going to be EAs used directly. The fabric memory version of this same linked list instead requires the program to generate an EA for each node as it works its way through the list. In short, the program model – which includes the compiler – needs to know the type of heap being used for these objects so that it can generate the correct code.
Seems like a problem, right? Well, yes and no. If working in Assembler or a language like C or C++, awareness of this difference would seem a requirement. But other object-oriented languages (Java and C# come to mind) – those languages which seem to go out of their way to hide the notion of an address from the programmer – might not perceive much of a difference. The program model would need to be aware of the difference between allocating from volatile versus non-volatile heaps, and the compilers would similarly need to be aware of this difference, but then it would seem possible for the correct type of code to be generated for these effectively different types of objects, based upon the volatile / nonvolatile attribute on the heap.
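As a rough illustration of how a language runtime might hide this from the programmer, consider the following Python sketch. The NodeView class and its layout are hypothetical, not something taken from Java, C#, or The Machine. The point is only that the base-plus-offset arithmetic can live inside an accessor, so that the code walking the list reads just as it would for ordinary volatile objects.

```python
import struct


class NodeView:
    """Hypothetical typed view of one node at a byte offset in an NV-Heap region."""

    _LAYOUT = struct.Struct("<qQ")  # assumed layout: value, offset of next (0 = none)

    def __init__(self, region, offset):
        self._region = region
        self._offset = offset

    @property
    def value(self):
        return self._LAYOUT.unpack_from(self._region, self._offset)[0]

    @property
    def next(self):
        # The offset-to-location translation happens here, out of the caller's sight.
        off_next = self._LAYOUT.unpack_from(self._region, self._offset)[1]
        return NodeView(self._region, off_next) if off_next else None


def values_from(node):
    """Ordinary-looking traversal; no offsets are visible to this code."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.next
    return out


# A tiny region with two nodes, at offsets 16 and 32, written by hand.
region = bytearray(64)
struct.pack_into("<qQ", region, 16, 7, 32)
struct.pack_into("<qQ", region, 32, 9, 0)
first = NodeView(region, 16)
```

In a compiled language, the equivalent of the next accessor would be code the compiler emits whenever it knows an object lives on a non-volatile heap.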
All that we have really done here is to begin to show how a heap in volatile memory would need to be perceived as somewhat different from a heap in non-volatile memory. A more complete view of this can also be found in the paper NV-Heaps: Making Persistent Objects Fast and Safe with Next-Generation, Non-Volatile Memories. (Please note that this paper seems to assume that all cache is transparent to all memory, volatile or non-volatile, and so all cache is coherent across all processors.)
Before ending this series, we would like to refer back to another section in this series about fabric memory and ensuring recoverability. There we observed that if your intent is persistent objects, then not only would your objects need to be held in something like this NV-Heap, but persistence requires that:
- The objects actually reside in the fabric memory.
- There is provided along the way enough information to recover/restore an operation in the event of unexpected termination.
As a case in point, consider what it would take to delete that very same linked list residing in an NV-Heap. Successful destruction of that linked list means that the fabric memory backing each of the nodes is successfully returned to the NV-Heap (or to the whole of the system) for subsequent use. The destruction needs to be done in such a way that a restart after a power failure knows that recovery is even necessary and just where the process left off. We won’t take you through the details here, since the previously mentioned paper does a nice job, but here too is a further example that programming for persistence can be different from more traditional programming.
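For a feel of what "knowing where the process left off" might involve, here is a deliberately simplified Python sketch. It is not the mechanism described in the NV-Heaps paper. It assumes a hypothetical layout with a persistent "deleting" flag in the region header and an in-use flag in each node, assumes every 8-byte store is atomic and durable once it returns, and assumes freed nodes are not reused until the delete completes. A store budget simulates a power failure at an arbitrary point.

```python
import struct

Q = struct.Struct("<Q")
HEAD, DELETING, FIRST_NODE = 0, 8, 16   # assumed header fields, then the nodes
NODE_SIZE, NEXT, IN_USE = 24, 8, 16     # assumed node: value, next offset, in-use flag


class Crash(Exception):
    """Simulated power failure."""


class Region:
    def __init__(self, size):
        self.mem = bytearray(size)      # stand-in for fabric memory
        self.budget = None              # number of stores allowed before a "crash"

    def load(self, off):
        return Q.unpack_from(self.mem, off)[0]

    def store(self, off, value):
        # Assumption: an 8-byte store is atomic and durable once it returns.
        if self.budget is not None:
            if self.budget == 0:
                raise Crash()
            self.budget -= 1
        Q.pack_into(self.mem, off, value)


def build(region, count):
    """Create count linked nodes, each marked in use."""
    prev_link = HEAD
    for i in range(count):
        off = FIRST_NODE + i * NODE_SIZE
        region.store(off, i)
        region.store(off + IN_USE, 1)
        region.store(prev_link, off)
        prev_link = off + NEXT


def resume_delete(region):
    node = region.load(HEAD)
    while node != 0:
        off_next = region.load(node + NEXT)
        region.store(node + IN_USE, 0)  # free first; repeating this is harmless
        region.store(HEAD, off_next)    # then unlink
        node = off_next
    region.store(DELETING, 0)           # the operation is complete


def delete_list(region):
    region.store(DELETING, 1)           # durable record that a delete is under way
    resume_delete(region)


def recover(region):
    """Run at restart: finish a delete that was interrupted, if there was one."""
    if region.load(DELETING):
        resume_delete(region)
```

The flag answers "is recovery even necessary?" and the head offset answers "where did we leave off?" Each step is ordered so that repeating it after a restart does no harm. A real heap manager has far more to worry about, including cache flushing and concurrent allocation.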