Programming For Persistent Memory Takes Persistence
April 21, 2016 Mark Funk
Some techies are capable of writing programs in assembler, but all will agree that they are very glad that they don’t need to. More know that they are fully capable of writing programs which manage their own heap memory, but are often enough pleased that the program model of some languages allows them to avoid it. Since practically the beginning of computing, computer scientists have been creating simplifying abstractions which allow programmers to avoid managing the hardware’s quirks, to write code more rapidly, and even to enhance its maintainability.
And so it is with a relatively new, and indeed both evolutionary and revolutionary, concept beginning to appear in computer systems: persistent memory. At some level, this memory, which is both directly addressable by processors and does not lose its data upon power failure, is not all that different than what has come before. It is just memory and so the program model associated with it does not need to change. But such persistent memory, as rapidly accessible as it is, also has its quirks, especially in ensuring and managing that persistence. Although some software engineers might be comfortable with its differences and find it not at all strange, others will want the protection of abstractions provided by an enhanced programming model.
It is an overview of this program model which is the focus of this article.
This article will be using the persistent memory found in a new computing system from Hewlett-Packard Enterprise, called The Machine (deep dive on it here), and a proposed associated programming model targeting that system to help focus the article. Know, though, that HPE, being perhaps the first, will not be alone in both hardware and software support of systems containing persistent memory. Before continuing, I would like to thank Dhruva Chakrabarti and his team at Hewlett-Packard Labs for helping explain their view on the program model for persistent memory. This two-part article is a follow-on to a series of articles previously published on The Next Platform outlining various aspects of The Machine. Links to the articles in that series can be found at the end of this article. But this article will attempt to stand alone, providing here the information you will need to follow along.
Background On The Machine
You are aware that disk drives and, more recently and more rapidly accessed, flash memory are a form of persistent storage. When the power goes off, the data remains. These are I/O-based devices. Even the flash memory is packaged as a “flash drive.” Although certainly a key part of any computer system, the processors nonetheless cannot directly access these I/O-based devices; data is instead copied between the memory directly attached to the processors, across one or more I/O links (for example, directly over the PCI-Express bus using a controller or over InfiniBand or Ethernet networks) to and from the persistent storage of such devices. Although both are fast and becoming faster, the transport time and the time to set up such transport are annoying.
Persistent memory, as used in this article, similarly ensures that its contents are not volatile, remaining unchanged even in the event of a power failure. The difference – between persistent memory and I/O-based persistent storage – is that the processors can access persistent memory’s contents in much the same way as processors have accessed volatile DRAM memory. Just as a program can access individual bytes of DRAM, that same program running on that same processor can, in much the same way, access the bytes of persistent memory. Rather than digging deeper here, let’s just say that accessing I/O-based persistent storage is considerably more complex and very much slower.
That is the starting point. Other pertinent attributes of persistent memory include the following:
- Being directly attached to the processors, reads from and writes to persistent memory are many, many times faster than the page reads and writes of a disk drive.
- Although preferably reading persistent memory would be as fast as a similar access from DRAM, the persistent memory access takes a few times longer.
- As with DRAM accesses, far and away most accesses will be block accesses, into and out of a processor’s cache. And then, most subsequent accesses of what you would think of as from persistent memory are from and to that cache, making those accesses many times faster than even DRAM accesses.
- The contents of any processor’s cache are volatile; if the power goes off on a processor, the contents of the cache are also lost.
- In order for cached data to become non-volatile, changed cached data blocks must be written into persistent memory.
- Changed cached data blocks are aged out of the cache and into memory by hardware at most any time (including never), based on need of the cache by other subsequently accessed data blocks. Software need not be involved. If that were the only means, a program would never know when its changed data blocks return to memory. The processor architecture provides the means for software to force the contents of specific cached blocks out of the cache and into memory, but this is rarely used.
- A program, storing – using the processor’s store instructions – to a location associated with either DRAM or persistent memory, is actually normally storing its change into the cache, not at that time into either form of memory. It is the previously mentioned aging or cached block flush operations which return the changed block to memory, otherwise the change remains in the cache indefinitely.
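The gap between "stored" and "durable" can be sketched by analogy. The following is a minimal illustration, not The Machine's mechanism: it uses a memory-mapped file, where an ordinary store lands in the mapping and an explicit flush forces the change to the backing file. On real persistent memory the corresponding step would be a cache-block flush issued by the processor. The function name `store_and_flush` is hypothetical.

```python
import mmap
import os
import tempfile


def store_and_flush(path: str, payload: bytes) -> bytes:
    """Store bytes through a memory mapping, then force them to the backing file."""
    size = mmap.PAGESIZE
    with open(path, "wb") as f:
        f.write(b"\x00" * size)              # pre-size the backing file
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as region:
            # An ordinary store: the bytes land in the mapping (by analogy,
            # the cache) and are visible to this program immediately.
            region[0:len(payload)] = payload
            # The explicit "make it durable now" step. Without it, write-back
            # happens whenever the system gets around to it.
            region.flush()
    with open(path, "rb") as f:
        return f.read(len(payload))


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(store_and_flush(demo_path, b"hello"))
    finally:
        os.remove(demo_path)
```

The point of the analogy is only the ordering: the store and the flush are separate actions, and durability is promised only after the second.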
Without persistent memory, caches are designed so that, aside from performance, any access to or from them is very nearly completely transparent to any program. Whether the changed data was in the cache or in DRAM, the program functioned identically from its own point of view; it did not matter whether the program’s access was satisfied from DRAM or from some processor’s cache, the effect was the same. Performance is just a lot better when the needed data happens to reside in cache. If the needed data happens to reside in persistent memory, but the program does not need to ensure persistence, the effect is the same. But if, say after some number of changes, your program needs to ensure that the changes really are in persistent memory, something new is needed; the program needs to explain its intent to the hardware. This is part of what is new. This is part of what needs to be expressed and typically abstracted in the new program model.
You will also see that the manner in which those updates to persistent memory are supported is also a part of that program model. When your object in persistent memory requires that multiple actual changes make their way to persistent memory, and having allowed for a power failure in the meantime, that too is part of the program model. The folks in the know call this failure atomicity, a quite appropriate term.
Background On Persistent Storage
This difference is not really all that new, and yet it is.
Even at the highest levels of the user interface, you have been saving data into files. You select SAVE and, bam, your data is “saved,” residing out on disk, safe from a power failure. You don’t even need to think about all the complex processing, done at a lot of levels of the software and hardware to make this true. You also do not typically think about where in physical I/O space that file happens to reside. Heck, nowadays, the file could reside most anywhere in the world and still, bam, it’s saved.
With the introduction of persistent memory on The Machine, the persistent memory can hold a file system and therefore could hold your file. Same words, same abstraction; select SAVE and – an even faster bam – your file is saved. The abstraction, the program model, remains unchanged. Certainly, some skilled architect knows what it really takes to save your file, to really make it persist, assuring that no part of the saved file remains only in a processor’s cache. And, as we will see shortly, there is still much more to supporting this abstraction.
Or, moving slightly up the skill level, your database programmers know that when they have requested some number of changes to the database and then perceived their operation as complete, the database manager itself will ensure that before that “complete” point the requested changes will persist across subsequent failures. In effect, either the changed database data itself or something representing those changes must reside on persistent storage before the high level program can perceive the change as having been made. More on this shortly.
As with a file and its file system, the database can reside in persistent memory. Here, too, the changes had to all have made their way into – here – persistent memory before such database operations can be perceived as being complete. But it is not your high level DB programmers who need to worry about this; only the software engineers supporting the lowest level of the database manager need even an inkling of what is really going on to persist these changes. At this highest level of DB use, even persistent memory did not really require a change in how the database changes were made. The program model did not really need to change at this level. Using persistent memory, what your database and/or system admin might perceive, though, is that the database operations are completing a lot more quickly, allowing more such changes to be concurrently executing.
And At The Other End Of The Rainbow
From the point of view of most folks, making files and databases persistent and durable to failure is well hidden. Aside from a very significant boost in performance, did it really much matter whether these were in persistent memory versus I/O-based disk drives or SSDs? This seems hardly a matter for a modified program model.
At that high level, yes, not much difference. But persistent memory allows something new as well. Its processor-based byte addressability offers up a new option for the programmer, that of a persistent heap.
Picture it like this: All of those complex objects that are built in today’s volatile heap, and then later explicitly reorganized for writing or check-pointing out to non-volatile disk, can now instead be created and maintained directly in this persistent heap. Upon a power failure and then restart, done correctly, there it is, your object, waiting for you to continue roughly where you left off. This is, of course, as opposed to volatile memory’s having lost everything.
Certainly, part of this game is expressing your intent. Is your object to exist in volatile or in non-volatile heap? Is it OK, as today, to allow it to disappear or do you want it to persist? Programming languages need to know your intent. That distinction is part of the programming model. It, though, does not end – or even start – there.
This may seem counter-intuitive, but simply storing a change to an object residing in persistent heap does not immediately make the change durable to failure. Although well hidden from most of us, you know that stores are typically done first into a processor’s cache, where the change can reside for quite a while. Cache, though, is no more persistent than DRAM. An object’s changes which happen to still reside in a processor’s cache just disappear upon power failure, even if the object proper resides in persistent memory. As long as the power remains on, that change is perceivable to any processor in the same cache-coherent SMP. Once power is lost, though, the change is gone as well.
The point is, if persistence really is required for such objects, part of what is required is that such changes must first be forced from the cache into the persistent memory. With volatile memory-based objects, there normally is no such forcing; a subsequent access of the object, even if done on a different processor, will see the changed object even if it still resides in some processor’s cache; the change is visible everywhere implicitly. Cached data can, in the fullness of time, work its way into even persistent memory. The problem is that you don’t know when and power failures can occur at any time. So what is it that explicitly tells these temporarily cache-based objects to make their way into persistent memory and just when does that need to occur? In such explicit support, management of persistent objects also requires different code generation and, to some extent, a different design.
Fortunately, for the higher-level application programmer, full knowledge of what needs to occur requires not much more knowledge than identifying the use of a persistent heap for an object. But someone, in the support of the persistent memory’s program model, must know.
To explain, most modern languages provide for the support of already pre-packaged objects. As a case in point, consider the single class dictionary in Java, or C#, or even Python as a few examples. Before a dictionary could be used, someone implemented the many methods on these relatively complex objects. When you, at your higher level, construct or make a change to a dictionary, it is the dictionary class’ code making the needed – and often multiple – changes to maintain that object in a consistent state. The dictionary user is working with the semantics of a dictionary, but someone implemented for you what it actually takes to maintain consistently that dictionary object.
That, today, is for a dictionary in volatile memory; that is all that has been implemented. Now let’s add persistence, in the sense of persistent memory, to that dictionary. The multiple changes still need to be made, but then those – still sitting in the cache – must be pushed into the persistent memory. During this process, at any moment, the power fails. When you restart, is your dictionary’s state consistent? Did all, none, or worse – some – of the changes show up? And how do you know? Ensuring an expected consistent state of this persistent dictionary is now a further responsibility of the code in support of such persistent dictionary objects. And doing that on this and other objects leads us into the next section.
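To make the "some of the changes" case concrete, here is a toy sketch in which a single logical update needs two separate writes to reach persistence. Everything in it is hypothetical: `ToyPersistentDict`, its `durable` field (standing in for whatever survives a restart), and the `PowerFailure` exception that simulates losing power between the two writes.

```python
class PowerFailure(Exception):
    """Stands in for the power going off in the middle of an update."""


class ToyPersistentDict:
    """Toy model: `durable` holds whatever would survive a restart."""

    def __init__(self):
        self.durable = {"count": 0, "items": {}}

    def put(self, key, value, fail_after_first_write=False):
        # One logical update, two separate writes to persistence.
        self.durable["items"][key] = value                    # write 1
        if fail_after_first_write:
            raise PowerFailure()
        self.durable["count"] = len(self.durable["items"])    # write 2

    def is_consistent(self):
        return self.durable["count"] == len(self.durable["items"])


d = ToyPersistentDict()
d.put("a", 1)
try:
    d.put("b", 2, fail_after_first_write=True)
except PowerFailure:
    pass
print(d.is_consistent())  # prints False: the entry arrived, the count did not
```

After the simulated failure the surviving state holds two items but a count of one, which is exactly the partial outcome the dictionary's supporting code has to detect or prevent.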
Learning From History: Persistent Transaction Logs
Database managers have guaranteed that the database – perhaps consider it a database object – will remain in some consistent state, remaining durable to failure. This has been true for a lot of years now; it’s nothing new. In doing so, its design is built on the notion of its transactions having ACID properties (Atomicity, Consistency, Isolation, and Durability). For our purposes, let’s simply say that a database transaction, no matter how complex and how many changes, is always able to be perceived as either having never started OR of having executed to completion, even in the event of failure. For example, you wouldn’t want your bank transfer to have pulled your money without having it show up somewhere else: “I am sorry, sir, but we think we might have lost your money; the power failed during your money transfer transaction.”
In what we typically have today (i.e., sans persistent memory), a simplistic model has the database changes first made in the volatile DRAM memory and then forced out to disk, only there being persistent. If that is all there is, then given a power failure – say one occurring even when the changes began being forced out to disk – did all, or worse some, of the changes make their way out to disk? This is the Consistency and Atomicity part of ACID. For the Isolation part of ACID, consider, did anyone else see and use those database changes while still in memory, if none or only part of the changes made their way to disk? Again, a transaction is supposed to be perceived as either absolutely complete or as having never started, this even if there is a failure at any time along the way.
The preceding is database, but the concepts used there are applicable to other forms of persistent objects as well. So we continue; let’s require that other persistent objects move from one consistent state to another, that the object updates (the transactions) also follow the ACID properties, even in the event of a power failure. Once the object’s methods start execution, the results are either completed and are perceived as such or are perceived as never having started. (This is not required of all data residing in persistent memory, only of data whose changes you want to persist even across power failures.)
So given a failure at any time and the requirements of ACID transactions, how did they do it today? When the system restarts and the database is first re-accessed after the restart, how did they know that the database was still in or can be placed into an expected and consistent (ACID –> Consistency) state?
A nice description for one implementation can be found here, but we will attempt to provide a shortened version and build from there. Keep in mind that we are building on this for use in a program model used for objects within persistent memory.
Again, what are the enablers today?
The Transaction Log: Not only are objects changed, but such changes are separately recorded in a Transaction Log. You can think of the log as an ever growing list describing each and every change and the state of the data before the change. But here’s where the beauty comes in. The log entries associated with a transaction must have been guaranteed to reside in some form of persistent storage before the transaction is perceived as being complete, which also means before anyone else sees those changes. In a manner of speaking, as you will be seeing, these saved log entries are effectively the changes. Yes, we know, that last is a bit of a stretch at the moment.
To explain, realizing again that a failure can occur at any time during the execution of a transaction, let’s start by locking up the object being changed; with locking only one thread can see the object at a time. More precisely, the lock ensures that no other thread can access the object – or the changing portions of the object – during the entire period that our transaction is executing. With this serializing lock in place, we will record every change in the log while, indeed even before, the object changes are made. (These log records are done atomically, ensuring that if others are concurrently writing into the log, their stores to the log are done to other records without disrupting yours.)
Before freeing the object lock, all of the transaction’s log records are written to persistent storage, with the last such log record indicating that the transaction is committed; in doing so, the log fully describes both what is needed for that transaction to change the object and the state of the object before it was changed. After that commit, and after having completed all of the changes to the object, returning it to a consistent state, only then is the lock freed, allowing others to see the change. Stating it again as a list:
- Lock the object (or important portions of it).
- Record the object’s current and changing state in the log.
- Potentially concurrently make the changes on the object in volatile memory.
- Push the log entries, including an entry reading “committed,” to persistent storage.
- Unlock the object.
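The five steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any database manager's actual implementation: the class `LoggedBalances`, the JSON-lines record layout (`begin`, `set` with `old` and `new`, `commit`), and the use of an append-only file as the log are all hypothetical choices made for the example.

```python
import json
import os
import tempfile
import threading


class LoggedBalances:
    """Toy object whose updates follow the five steps listed above."""

    def __init__(self, log_path, balances):
        self.log_path = log_path
        self.balances = dict(balances)       # the object, in volatile memory
        self.lock = threading.Lock()

    def transfer(self, tx, src, dst, amount):
        if src == dst:
            raise ValueError("src and dst must differ")
        with self.lock:                                        # 1. lock
            old_src, old_dst = self.balances[src], self.balances[dst]
            records = [                                        # 2. old and new state
                {"tx": tx, "op": "begin"},
                {"tx": tx, "op": "set", "key": src,
                 "old": old_src, "new": old_src - amount},
                {"tx": tx, "op": "set", "key": dst,
                 "old": old_dst, "new": old_dst + amount},
                {"tx": tx, "op": "commit"},
            ]
            self.balances[src] = old_src - amount              # 3. change in memory
            self.balances[dst] = old_dst + amount
            with open(self.log_path, "a", encoding="utf-8") as log:
                for record in records:
                    log.write(json.dumps(record) + "\n")
                log.flush()                                    # Python buffer to OS
                os.fsync(log.fileno())                         # 4. OS to storage
        # 5. leaving the `with self.lock` block releases the lock


if __name__ == "__main__":
    fd, demo_log = tempfile.mkstemp()
    os.close(fd)
    try:
        accounts = LoggedBalances(demo_log, {"a": 100, "b": 0})
        accounts.transfer(1, "a", "b", 30)
        print(accounts.balances)
    finally:
        os.remove(demo_log)
```

Note what is and is not forced to storage: only the log is synced before the lock is released, while `balances` lives purely in volatile memory, matching the model described in the text.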
OK, fine, if successful we’ve made the object changes and recorded the process in a persistent log. But a power failure could have occurred at any moment along the way. How does this ensure that the object is either perceivable as having completed or had never occurred? Additionally, notice that we are still assuming IO-based persistent storage (e.g., disk drives). As such, when the lock was freed, the actual change to the object was guaranteed only to reside in volatile DRAM; it’s the log records that are now persistent. (We make no statement here about whether or not the object itself is now in persistent storage. All or part of the object’s changes might well be on disk for reasons that we won’t get into here, so we are only guaranteed that when we unlock the changes are in DRAM.)
While reading this section, keep in mind that we are assuming that persistent storage is today’s IO-based disk or flash drives.
Next, given this, we will look at a couple cases of what happens in the event of a power failure. Recovery occurs after power is restored, the OS is re-activated, and prior to re-accessing the object. The last is supported by code associated with such objects. Let’s say that:
- Power fails after the transaction’s log entries, including commit, are completely written to disk. When restarting, the log is scanned in a forward direction, looking for records identifying the start of the transaction and then the “committed” end of the transaction. Recall that there is no guarantee that the changes themselves are in persistent storage; all you know for sure here is that the transaction can be perceived as being complete, because the log here says so. If there had been no such failure, the transaction would have continued executing and completed the changes to the object proper; later asynchronously writing even these to disk, independent of the lock. Once the lock had been freed, those object changes, typically still only in volatile memory, are then able to be seen by others. That means that given some failure and restart, now that the recovery code is executing, the recorded changes need to be applied or made as described in the log; recall that if the changes had only been in memory at the failure, they would have been lost. All of the transaction’s changes are applied per the log. Additionally, this recovery includes ensuring that the changes then really do reside in persistent storage; you will see why shortly.
- Power failed before all of the transaction’s log entries are written to disk. When restarting, the log is scanned in a forward direction. It will find no commit record for this transaction. Sans commit, it must be as though the transaction had never started. That also means that no changes to the object proper are allowed to reside in persistent storage; this is even though, prior to the failure, some changes might have been made and inadvertently made their way to disk. Recovery’s job is to work backward through this transaction’s log records, restoring – or simply ensuring that the object is restored – the object to the pre-transaction state. Recall again that under normal processing (i.e., there was no failure) the object had been locked, ensuring that no one else saw any partial changes, even though those changes may have made their way to disk.
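Both recovery cases can be sketched together. The following is a simplified illustration, assuming a hypothetical record layout (dictionaries with `tx`, `op`, `key`, `old`, `new`) and assuming, as the locking discussion implies, that any uncommitted transaction was the last one to touch its keys. The function `recover` is an invented name, not part of any real database manager.

```python
def recover(log_records, durable_state):
    """Rebuild a consistent object from the log and whatever reached disk."""
    state = dict(durable_state)
    committed = {r["tx"] for r in log_records if r["op"] == "commit"}

    # Case 1: committed transactions. Scan forward and (re)apply their
    # changes, since the object updates may never have reached disk.
    for r in log_records:
        if r["op"] == "set" and r["tx"] in committed:
            state[r["key"]] = r["new"]

    # Case 2: no commit record. Work backward, restoring the old values,
    # since some partial changes may have leaked to disk.
    for r in reversed(log_records):
        if r["op"] == "set" and r["tx"] not in committed:
            state[r["key"]] = r["old"]
    return state


log = [
    {"tx": 1, "op": "begin"},
    {"tx": 1, "op": "set", "key": "a", "old": 100, "new": 70},
    {"tx": 1, "op": "set", "key": "b", "old": 0, "new": 30},
    {"tx": 1, "op": "commit"},
    {"tx": 2, "op": "begin"},
    {"tx": 2, "op": "set", "key": "a", "old": 70, "new": 50},
    # power fails here: transaction 2 never committed
]
# On disk: transaction 1's change to "b" never arrived, while transaction 2's
# uncommitted change to "a" did.
on_disk = {"a": 50, "b": 0}
print(recover(log, on_disk))  # prints {'a': 70, 'b': 30}
```

The recovered state reflects all of transaction 1 and none of transaction 2, regardless of which individual changes happened to be on disk at the moment of failure.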
To recap, in the current model, a record of what is and what it will become is saved in an object’s transaction log and saved to persistent storage before the transaction is considered to be completed. Each transaction is considered to have been committed when transaction processing learns that the transaction’s log entries have been saved to disk; being committed does not necessarily mean that the changed object itself is in persistent storage.
Before we go into the next section, which folds in persistent memory, let’s also observe that we need to keep around – and perhaps later process – all of these transaction log entries for as long as we are unsure whether the associated object changes themselves have made their way to disk. Notice that the only thing synchronous – and so within the bounds of the transaction – was that we ensured that it was the log records that made their way to persistent storage; we have no idea at that time whether or not the pages containing the actual object are on disk. Said differently, the object changes for any given transaction will make their way out to disk, but it could be many seconds, minutes, or even hours after the point where we considered the transaction complete. And, again, the log records need to be maintained, and potentially even reprocessed, if we don’t know for sure.
Fortunately, once we do know, those old log records can be trimmed off. The reason for trimming is that, if some subsequent recovery really is required, the very many unneeded log records do not need to be reprocessed. Of course, the database manager folks know all about these trade-offs. If the log is not kept trimmed, it can take a long time to recover a database from its log. There is a whole field of optimization and options with this in mind.
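Trimming can be sketched as dropping the leading run of log records that belong to transactions known to be both committed and fully written to disk. This is a deliberately simplified illustration: the record layout and the `flushed_txs` argument (the set of transactions whose object changes are known to be on disk) are assumptions for the example, and real database managers use more elaborate checkpointing schemes.

```python
def trim_log(log_records, flushed_txs):
    """Drop the leading records of transactions that no longer need recovery."""
    committed = {r["tx"] for r in log_records if r["op"] == "commit"}
    removable = committed & set(flushed_txs)
    keep_from = 0
    for index, record in enumerate(log_records):
        if record["tx"] not in removable:
            break                    # first record that recovery might still need
        keep_from = index + 1
    return log_records[keep_from:]


records = [
    {"tx": 1, "op": "begin"},
    {"tx": 1, "op": "set", "key": "a", "old": 100, "new": 70},
    {"tx": 1, "op": "commit"},
    {"tx": 2, "op": "begin"},
    {"tx": 2, "op": "set", "key": "a", "old": 70, "new": 50},
    {"tx": 2, "op": "commit"},
    {"tx": 3, "op": "begin"},
]
# Transaction 1's object changes are known to be on disk; 2 and 3 are not.
remaining = trim_log(records, flushed_txs={1})
print(len(remaining))  # prints 4
```

A shorter log means a later recovery scan has fewer records to reprocess, which is the trade-off the paragraph above describes.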
And the reason for deferring the objects writes to disk? It takes real time to gather up those changed pages and write them out. Do we really want to delay the transaction’s completion, holding locks all the while, until the transaction knows for sure that all of the changes are in persistent storage? If done fast enough, we just might.
As a quick aside, in order to increase data availability and guard against a real catastrophe, these same log entries might be replicated either synchronously or asynchronously to a completely different system. With a database residing there as well, these log records can be processed there to completely replicate the state of the database on that system. Beauty.
In the next article of this two-part series, we go on and consider how this program model might change if this relatively slow form of persistent storage were replaced by a much faster form of byte-addressable persistent memory.