Building New Bridges Across the Memory Gap
April 13, 2015 Nicole Hemsoth
There is little doubt that the memory ecosystem will heat up over the next five to ten years, with emerging technologies still in development that promise massive gains in bandwidth, capacity, and price. But in the nearer term, as growing data volumes put more pressure on all three (not to mention reliability and energy efficiency), there is a pressing need to improve technologies that are already on the market.
It’s simple to talk about memory capacity and bandwidth using hard numbers, but there are other problems that signal a troubled future for memory technology as we know it. Among the less discussed issues is overall reliability, which becomes more problematic as hardware, specifically DRAM, scales down in size. Shrinking might mean better energy efficiency and lower operational costs, but with each step down, the capacitors and access transistors get smaller. The capacitors hold less charge and become less reliable, the access transistors leak more, and memory cells therefore have to be refreshed more frequently.
This is problematic for two reasons. First, refreshing is a power-hungry process. Second, refreshes in particular, along with all the other times memory is hit as applications run, contribute to a breakdown in DRAM. Specifically, repeated access to the same row over a sustained period can cause bit flips in adjacent rows, mostly because tighter spacing means less isolation between components. This opens the system to unforeseen electrical disturbances in other areas of memory. The problem, known as the “row hammer” bug, drew attention earlier this month when Google researchers built an attack showing how easy (in theory) it would be to break into a system via DRAM.
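To make the mechanism concrete, here is a toy Python model of the disturbance effect described above. It is only a sketch: the threshold, the row count, and the function name `simulate_window` are invented for illustration, and real DRAM behavior depends on the chip, the data pattern, and the temperature.

```python
# Toy model of row hammer disturbance within one refresh window.
# HAMMER_THRESHOLD is a made-up number, not a property of any real chip.
HAMMER_THRESHOLD = 1000  # hypothetical neighbor activations a row tolerates


def simulate_window(accesses, num_rows, threshold=HAMMER_THRESHOLD):
    """Return the set of rows at risk of bit flips after one refresh window.

    accesses: sequence of row indices activated during the window.
    """
    disturbance = [0] * num_rows
    for row in accesses:
        # Each activation disturbs the physically adjacent rows.
        for neighbor in (row - 1, row + 1):
            if 0 <= neighbor < num_rows:
                disturbance[neighbor] += 1
    return {r for r, count in enumerate(disturbance) if count >= threshold}


# Alternately hammer rows 3 and 5; row 4 sits between them.
at_risk = simulate_window([3, 5] * 600, num_rows=8)

# Refreshing twice as often halves the accesses that fit into one window.
at_risk_fast_refresh = simulate_window([3, 5] * 300, num_rows=8)

print(at_risk, at_risk_fast_refresh)
```

In this model the victim row crosses the threshold only in the longer window, which is why refreshing more often works as a mitigation, at the price of extra refresh operations.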
What’s interesting about the row hammer problem (and more general reliability issues) with DRAM is that there are no simple fixes baked into the current technology. At this point, adding more refreshes seems to be the best fix. The problem, again, is that these refreshes consume more power and also mean more frequent hits against the DRAM, which can weaken it even faster over time.
With all this in mind, it may sound like DRAM has hit a scaling, performance, and reliability wall. But there is still a great deal of life left in the technology, with approaches that can zap these problems and target the overall energy efficiency and cost barriers.
Onur Mutlu and his team from Carnegie Mellon University have worked with Intel in the past to help address these and other problems in DRAM, but the real solutions for next-generation memory that can push past some of these reliability, performance, power, and other issues are just on the other side of tomorrow. Mutlu told The Next Platform that it’s time to start rethinking DRAM, which, aside from the flaws described so far, has great potential.
“So far, we’ve been designing DRAM as a dumb device. The processor communicates with it, commands are sent, and it gets the data. There’s not much intelligence in the DRAM but if we can put more intelligence into it so it can fix the errors internally, adjust the refresh rate according to how much the system really needs, and further tune it according to the application, there is a lot that DRAM will be able to do.”
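As a rough illustration of what adjusting the refresh rate to actual need could buy, the Python sketch below compares refreshing every row at one fixed interval with refreshing rows in bins according to how long they hold their data. The 64 ms baseline is a commonly cited DRAM refresh window. The row counts, the bin boundaries, and the function name `refreshes_per_second` are assumptions made up for this example, not measurements or part of any design Mutlu described.

```python
BASE_INTERVAL_MS = 64  # commonly cited refresh window, used as the baseline


def refreshes_per_second(rows_by_interval):
    """rows_by_interval maps a refresh interval in ms to the number of rows using it."""
    return sum(rows * 1000 / interval_ms
               for interval_ms, rows in rows_by_interval.items())


total_rows = 8192  # hypothetical bank size

# Conventional approach: every row is refreshed at the worst-case interval.
uniform = refreshes_per_second({BASE_INTERVAL_MS: total_rows})

# Hypothetical retention profile: a few weak rows need the base interval,
# the rest can safely wait two or four times as long.
binned = refreshes_per_second({64: 64, 128: 1024, 256: 7104})

savings = 1 - binned / uniform
print(f"uniform: {uniform:.0f}/s, binned: {binned:.0f}/s, saved: {savings:.1%}")
```

With these invented numbers, the binned scheme issues about 71 percent fewer refreshes. The real benefit would depend entirely on the retention profile of the actual cells.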
In essence, Mutlu wants to make DRAM capable of doing all the things that flash can do from an error correction point of view (remember that flash does all of this legwork in the controller) but without the latency and performance overhead. While flash is certainly cheaper, if DRAM reliability, cost, and performance can meet it in the middle, it might be possible to have the “best of both worlds” in terms of memory. Users will still need to keep flash around for persistent data in many cases, but being able to do more with smarter DRAM could add a new angle as we wait for emerging non-volatile technologies (memristors, phase change memory, resistive RAM, and so on) to arrive. Adding intelligence to memory and bringing a best-of-all-worlds approach to DRAM means finding a way to combine low latency and high bandwidth with a solution to errors. This still means adding a controller, and high bandwidth memory (HBM) and hybrid memory cube (HMC) implement it in different ways, each with its own benefits and drawbacks.
“The reliability problems with DRAM have been known for some time, but it was mostly kept quiet and as a research effort until the Row Hammer bug was brought to light this month,” Mutlu says, noting that his team has been working on the problem with Intel.
The main point is that the scaling issues Mutlu described in terms of reliability can be solved by implementing a controller, something that has been done with both HMC and HBM. The ability to correct errors within the system will be even more important for the largest-scale systems, including the forthcoming Aurora supercomputer at Argonne and other pre-exascale machines, where the reliability of all components, not just memory, is being put under the microscope, both in terms of ECC and of tolerating failure rates. For systems like this, which are data-intensive supercomputers as much as they are floating point powerhouses, the bandwidth of HBM and HMC type solutions represents a dramatic improvement. For instance, on the upcoming Knights Landing processors from Intel, the HBM on the package has around 400 GB/sec of aggregate bandwidth across eight segments of 2 GB of memory using a proprietary link. The two DDR memory controllers on the chip have around 90 GB/sec of bandwidth and max out at 384 GB of capacity. That works out to roughly 4.4X the bandwidth for HBM compared with DDR4.
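The trade-off in those figures is easy to check with a few lines of Python. The inputs are the approximate numbers quoted above, and the variable names are just labels chosen for this sketch. The bandwidth ratio comes out at roughly 4.4X, a touch under 4.5X, while the capacity comparison runs heavily the other way.

```python
# Approximate figures quoted for Knights Landing in the article.
hbm_bandwidth_gbs = 400   # on-package memory, aggregate GB/sec
ddr_bandwidth_gbs = 90    # two DDR memory controllers, GB/sec
hbm_segments = 8
hbm_segment_gb = 2
ddr_capacity_gb = 384     # maximum DDR capacity

bandwidth_ratio = hbm_bandwidth_gbs / ddr_bandwidth_gbs
hbm_capacity_gb = hbm_segments * hbm_segment_gb
capacity_ratio = ddr_capacity_gb / hbm_capacity_gb

print(f"on-package bandwidth advantage: {bandwidth_ratio:.1f}X")
print(f"on-package capacity: {hbm_capacity_gb} GB")
print(f"DDR capacity advantage: {capacity_ratio:.0f}X")
```

So the on-package memory is the fast, small tier (16 GB in total) and DDR is the slower, much larger one.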
In both of these cases, power is a concern, especially with HBM, where the processor, albeit a very simple one, is stacked in a way that doesn’t allow the heat to escape easily. “They are similar in theory,” Mutlu explains, “but HBM is more like an evolutionary step from existing memory in that the interface is not changed too much. That will make it easier to adopt and might be the reason why so many systems will be using that approach.” He notes that HMC, while offering similarly high bandwidth and low latency, has a different interface to memory, one that looks, at a high level, somewhat like a network in that it handles memory through a packet-based interface.
These are all technologies that we can see without a telescope, but Mutlu says there are far more promising memory solutions on the horizon. It will just be five to ten years before they find their way to market.
“The DRAM space is changing, it is really bifurcating. We used to just have commodity DRAM (DDR), but there is also DRAM in the mobile space, which is driving a lot of work since power is a concern, so there are new ways to adjust refresh to keep pace there. There is also work happening in graphics DDR (GDDR), which requires more bandwidth,” Mutlu explained. “As processing becomes more heterogeneous, so too does memory. We have more options in the near term driven by all of these things, so that will keep expanding.”
In terms of non-volatile technologies on the horizon, Mutlu said phase change memory appeared at first to be the most promising, but as his team explored further, its power consumption proved too high for it to compete with another proposed memory expected in the next five years (or later): spin-transfer torque magnetic memory, whose characteristics are closer to those of DRAM on chip, Mutlu says. “Phase change memory prototypes we’ve investigated have a 4X higher read latency. While write latencies are not higher with phase change, these aren’t as important. This, combined with the power requirements makes it hard to tell, but again, we’re still a long ways from seeing these things demonstrated.”
And, even if a memory technology does show promise as a system component, there is the whole problem of money that can get in the way.
“In the end, cost is the critical determinant—if it fits a good cost point at the level of the memory hierarchy it’s trying to replace,” says Mutlu. “Flash has been extremely successful because it’s much less costly than DRAM, but much higher than hard disks, so it’s filling the gap between DRAM and disk. Now we have a similar gap between DRAM and flash—if there’s a technology that can fit that it could work, it just has to prove itself out in terms of price, and that’s hard to say just now.”