25
Storage Class Memory

Geoffrey W. Burr1 and Paul Franzon2

1IBM, USA

2North Carolina State University, USA

25.1 Introduction

One challenge in computing systems is the need for new memory technologies that can improve overall performance. More and more frequently, the ability of a CPU to rapidly execute programs is limited by the rate at which data can arrive at the processor. Unfortunately, device scaling does not automatically solve this problem. One evolutionary solution has been to increase the size of cache memory, and thus the floor space that SRAM occupies on a CPU chip. However, this trend eventually leads to a decrease in net information throughput.

While DRAM offers higher density than SRAM, auxiliary circuitry is required to maintain the stored data. True nonvolatility has conventionally required external storage media (e.g., magnetic Hard Disk Drives (HDDs), optical CDs, etc.), with access times that are slower than the volatile memory by many orders of magnitude. Solid-State Disks (SSDs) based on NAND Flash have recently offered nonvolatility at significantly lower latencies than HDDs, but are block-based, much slower to erase than to program, and offer fairly poor cycle endurance.

The development of an electrically accessible nonvolatile memory with high speed, high density, and high endurance, referred to as “Storage Class Memory” or SCM, would initiate a revolution in computer architecture. By using CMOS-compatible fabrication technology scaled beyond the present limits of SRAM and Flash, such a memory technology could enable breakthroughs in both stand-alone and embedded memory applications. This development could potentially provide a significant increase in information throughput beyond the benefits traditionally associated with scaling CMOS devices into the nanoscale.

This chapter is structured as follows. Sections 25.2 and 25.3 motivate and define SCM. Sections 25.4 and 25.5 discuss target specifications and potential device solutions. Section 25.6 discusses architectural implications of SCM and open issues related to architectural exploitation.

25.2 Traditional Storage: HDD and Flash Solid-state Drives

Conventionally, magnetic hard-disk drives are used for nonvolatile data storage. The cost of HDD storage in US$/GB is extremely low and continues to decrease. Although the bandwidth with which contiguous data can be streamed is high, the poor random access time of HDDs limits the maximum number of I/O requests per second (IOPs). HDDs also have relatively high energy consumption and a large form factor, and are subject to mechanical reliability failures in ways that solid-state technologies are not. However, the sheer number and growth of HDD shipments per year (380 000 Petabytes in 2012, growing at 32% per year) means that magnetic disk storage is highly unlikely to be “replaced” by solid-state drives at any time in the foreseeable future [1].

Nonvolatile semiconductor memory in the form of NAND Flash has become a widely used alternative storage technology, offering faster access times, smaller size, and lower energy consumption than HDD. However, NAND Flash has several serious limitations for storage applications, such as poor endurance (10⁴–10⁵ erase cycles), modest retention (typically 10 years on a new device, but only 1 year at the end of the rated endurance lifetime), long erase time (∼ms), and high operating voltage (∼15 V). Another difficult challenge of NAND Flash SSDs stems from their page/block-based architecture. Because direct overwrite of data is not allowed, sophisticated garbage-collection, wear-leveling, and bulk-erase procedures are required. This in turn requires additional computation (limiting performance and increasing the cost and power associated with a local processor, RAM, and logic) and over-provisioning, which further increases the cost per effective user-bit of data [2].

Although Flash memory technology continues to project further density scaling, inherent performance characteristics such as read, write, and erase latencies have been nearly constant for over a decade [3]. While the introduction of multi-level cell (MLC) Flash devices extended Flash memory capacities by a small integral factor (2–4), the combination of scaling and MLC has degraded both retention time and endurance, two parameters critical for storage applications. The migration of NAND Flash into the vertical dimension above the silicon is expected to continue this trend of improving bit density (and thus cost per bit) while maintaining, or even slightly degrading, the latency, retention, and endurance characteristics of present-day NAND Flash.

This outlook for existing technologies has opened interesting opportunities for prototypical and emerging research memory technologies to enter the nonvolatile solid-state memory space.

25.3 What is Storage Class Memory?

Storage class memory (SCM) describes a device category that combines the benefits of solid-state memory, such as high performance and robustness, with the archival capabilities and low cost of conventional hard-disk magnetic storage [4,5]. Such a device requires a nonvolatile memory (NVM) technology that could be manufactured at a very low cost per bit.

A number of suitable NVM candidate technologies have long received research attention, originally under the motivation of readying a “replacement” for NAND Flash, should that prove necessary. Yet the scaling roadmap for NAND Flash has progressed steadily so far, without needing any replacement by such technologies. So long as the established commodity continues to scale successfully, there would seem to be little need to gamble instead on implementing an unproven replacement technology.

However, while these NVM candidate technologies are still relatively unproven compared to Flash, there is a strong opportunity for one or more of them to find success in applications that do not involve simply “replacing” NAND Flash. Storage Class Memory can be thought of as the realization that many of these emerging alternative nonvolatile memory technologies can potentially offer significantly more than Flash: higher endurance, significantly faster performance, and direct byte-access capability. In principle, Storage Class Memory could engender two entirely new and distinct levels within the memory and storage hierarchy. These levels would be differentiated from each other by access time, with both levels located within the latency gap of more than two orders of magnitude between off-chip DRAM (∼80 ns) and NAND Flash (20 μs).

Table 25.1 lists a representative set of target specifications for SCM devices and systems, compared against benchmark parameters offered by existing technologies (HDD, NAND Flash, and DRAM). Two target columns are shown: one for fast M-class (memory-type) SCM and one for slower S-class (storage-type) SCM.

Table 25.1 Target device and system specifications for SCM

Parameter | HDD (benchmark) | NAND Flash (benchmark) | DRAM (benchmark) | Memory-type SCM (target) | Storage-type SCM (target)
Read/write latency | 3–10 ms | ∼100 μs (block erase ∼1 ms) | <100 ns | <200 ns | 1–5 μs
Endurance (cycles) | Unlimited | 10³–10⁵ | Unlimited | >10⁹ | >10⁶
Retention | >10 yr | ∼10 yr | 64 ms | >5 d | ∼10 yr
ON power (W/GB) | 0.003–0.05 | ∼0.01–0.04 | 0.4 | <0.4 | <0.10
Standby power | ∼52–69% of ON power | <10% of ON power | ∼25% of ON power | <5% of ON power | <5% of ON power
Areal density (bit/cm²) | ∼10¹¹ | ∼10¹¹ | ∼10⁹ | >10¹⁰ | >10¹⁰
Cost (US$/GB) | ∼0.1–1.0 | 2 | 10 | <10 | <3–4

25.3.1 Storage-type SCM

The first new level, identified as S-type storage-class memory (S-SCM), would serve as a high-performance solid-state drive, accessed by the system I/O controller much like an HDD. S-SCM would need to provide at least the same data retention as Flash, allowing S-SCM modules to be stored offline, while offering new direct overwrite and random access capabilities (which can lead to improved performance and simpler systems) that NAND Flash devices cannot provide. However, it would be absolutely critical that the eventual device cost for S-SCM be no more than 1.5–2.0× higher than NAND Flash. While such costs need not be realized immediately at first introduction, it would need to be very clear early on that costs would steadily approach such a level relative to Flash, in order to guarantee large unit volumes and justify the sizeable up-front capital investment in an unproven new technology. Note however that such system cost reduction can come from sources other than the raw cost of the device technology: a slightly higher-cost NVM technology that enables a simple, low-cost SSD by eliminating or simplifying costly and/or performance-degrading overhead components would achieve the same overall goal. If the cost per bit could be driven low enough through ultrahigh memory density, ultimately such an S-SCM device could potentially replace magnetic hard-disk drives in enterprise storage server systems as well as in mobile computers.

25.3.2 Memory-type SCM

The second new level within the memory and storage hierarchy, termed M-type storage-class memory (M-SCM), should offer a read/write latency of less than ∼200 ns. These specifications would allow it to remain synchronous with a memory system, allowing direct connection from a memory controller and bypassing the inefficiencies of access through the I/O controller. The role of M-SCM would be to augment a small amount of DRAM, providing the same overall system performance as a DRAM-only system while offering moderate retention, lower power/GB, and lower cost/GB than DRAM. Again, as with S-SCM, the cost target is critical. It would be desirable for the same technology to find cross-use, either in embedded applications or as a standalone S-SCM, in order to spread the development risk of an M-SCM technology. The retention requirements for M-SCM are less stringent, since the role of nonvolatility might be primarily to provide full recovery from crashes or short-term power outages.

Particularly critical for M-SCM will be device endurance, since the time available for wear-leveling, error correction, and other similar techniques is limited. The volatile portion of the memory hierarchy will have effectively infinite endurance compared to any of the nonvolatile memory candidates that could become an M-SCM. Even if device endurance can be pushed well over 10⁹ cycles, it is quite likely that the role of M-SCM will need to be carefully engineered within a cascaded cache or other Hybrid Memory approach [6]. That said, M-SCM offers a host of new opportunities to system designers, opening up the possibility of programming with truly persistent data, committing critical transactions to M-SCM rather than to HDD, and performing commit-in-place database operations.

25.4 Target Specifications for SCM

Since the density and cost requirements of SCM transcend straightforward Moore's Law scaling, additional techniques will be needed to achieve the ultrahigh memory densities and extremely low cost demanded by SCM, such as: (1) 3D integration of multiple layers of memory, currently implemented commercially for write-once solid-state memory [7], and/or (2) multiple-bits-per-cell (MLC) techniques.
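As a rough illustration of how these two levers multiply density, the following back-of-the-envelope Python sketch computes effective areal density from cell footprint, layer count, and bits per cell. All numeric values are invented examples, not specifications from this chapter.

```python
# Illustrative only: how 3D stacking and MLC multiply the areal density
# of a baseline single-layer, 1-bit-per-cell array. Values are assumed.

def effective_density(cell_area_F2, feature_nm, layers=1, bits_per_cell=1):
    """Bits per cm^2 for a memory array.

    cell_area_F2  -- cell footprint in units of F^2 (e.g., 4 for a 4F^2 cell)
    feature_nm    -- half-pitch F in nanometers
    layers        -- number of stacked memory layers
    bits_per_cell -- MLC bits stored per physical cell
    """
    F_cm = feature_nm * 1e-7                   # 1 nm = 1e-7 cm
    cell_area_cm2 = cell_area_F2 * F_cm ** 2   # physical footprint per cell
    return layers * bits_per_cell / cell_area_cm2

# A hypothetical 4F^2 cell at F = 20 nm:
base = effective_density(4, 20)                             # ~6.3e10 bit/cm^2
scm = effective_density(4, 20, layers=4, bits_per_cell=2)   # 8x the baseline
print(f"single layer, 1 bit/cell: {base:.1e} bit/cm^2")
print(f"4 layers, 2 bits/cell:    {scm:.1e} bit/cm^2")
```

Even the single-layer baseline in this example sits near the >10¹⁰ bit/cm² target of Table 25.1; layers and MLC bits then multiply that figure directly.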

Table 25.1 lists a representative set of target specifications for SCM devices and systems compared with benchmark parameters of existing technologies (HDD, NAND Flash, and DRAM). As described above, SCM applications can be expected to separate naturally based on latency. Although S-class SCM is the slower of the two targeted specifications, read and write latencies should be in the 1–5 μs regime in order to provide a sufficient performance advantage over NAND Flash. Similarly, S-class SCM should offer an endurance of at least 10⁶ program–erase cycles, a distinct advantage over NAND Flash. In order to support offline storage, 10-year retention at 85 °C should be available.

To make overall system power usage competitive with NAND Flash and HDD, and since faster I/O interfaces can be expected to consume considerable power, the device-level power requirements must be minimal. This is particularly important because low latency is necessary but not sufficient for enabling high bandwidth; high parallelism is also required. This in turn mandates a sufficiently low power per bit access, both in the peripheral circuitry and in the device-level write and read power requirements. Finally, standby power should be made extremely low, offering opportunities for significant system power savings without loss of performance through rapid switching between active and standby states.
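To make the bandwidth–parallelism–power coupling concrete, here is a minimal sketch based on Little's law. Every number in it (latency, access size, bandwidth target, energy per bit) is an assumed placeholder, not a specification from this chapter.

```python
# Why low device-level power is mandatory: at microsecond latencies,
# high bandwidth is reached only through many concurrent accesses, so
# per-access energy multiplies. All values below are assumptions.

latency_s    = 2e-6   # assumed S-class SCM access latency (2 us)
access_bytes = 64     # assumed access granularity (one cache line)
target_Bps   = 10e9   # assumed target bandwidth: 10 GB/s

accesses_per_s = target_Bps / access_bytes   # ~1.6e8 accesses/s
in_flight = accesses_per_s * latency_s       # Little's law: ~313 concurrent

pJ_per_bit = 10       # assumed device-level write energy per bit
device_W = target_Bps * 8 * pJ_per_bit * 1e-12   # ~0.8 W at the devices

print(f"{in_flight:.0f} concurrent accesses in flight, {device_W:.1f} W")
```

Hundreds of accesses must be in flight at once to sustain the assumed bandwidth, so even a modest per-bit energy shows up multiplied at the system level.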

To achieve the desired cost target, within 1.5–2.0× of the cost of NAND Flash, the effective areal density will similarly need to be very close to that of NAND Flash. This low-cost structure would then need to be maintained by subsequent SCM generations, through some combination of further lateral scaling, more stacked layers, and more bits per cell.

Also shown in Table 25.1 are the target specifications for M-type SCM devices. Given the more aggressive latency target (which enables coherent access through a memory controller), program–erase cycle endurance must be higher, so that the overall nonvolatile memory system can offer a sufficiently long lifetime before needing replacement or upgrade. Although some studies have shown that a device endurance of 10⁷ cycles is sufficient to enable device lifetimes on the order of 3–10 years [8], we anticipate that the need for sufficient engineering margin would suggest a minimum cycle endurance of 10⁹ cycles. While such endurance levels support the use of M-class SCM in memory support roles, significantly higher endurance values would allow M-class SCM to be used in more varied memory applications, where the total number of memory accesses may become very large.
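Such lifetime estimates follow from simple arithmetic: with ideal wear-leveling, lifetime ≈ endurance × capacity / write rate. The sketch below uses assumed module parameters, not figures from [8].

```python
# Idealized lifetime bound: assumes perfectly uniform wear-leveling and
# a sustained write stream. Module parameters are assumptions.

endurance  = 1e7           # program-erase cycles per cell
capacity_B = 16 * 2**30    # assumed 16 GiB M-SCM module
write_Bps  = 1e9           # assumed sustained write rate: 1 GB/s

lifetime_years = endurance * capacity_B / write_Bps / (3600 * 24 * 365)
print(f"~{lifetime_years:.1f} years")   # ~5.4 years, within the cited range
```

Real lifetimes fall short of this bound to the degree that wear-leveling is imperfect, which is one reason for the extra engineering margin suggested above.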

As was discussed above, the expected use of M-class SCM within an online system greatly reduces the nonvolatility requirement. Here, the minimum retention required would be just enough to allow recovery of transactions or other critical data within a week or so of a localized or wide-scale power loss. Obviously, a longer retention time would support even more varied applications, including long-term offline storage. Active power should be competitive with or better than DRAM, with standby power again significantly lower. While M-class SCM will not need to be refreshed, this may not necessarily lead to large power and efficiency improvements: in existing DRAM systems, the power and overhead associated with refresh have typically been fairly modest. However, this may change as DRAM itself scales to more aggressive technology nodes. As with S-class SCM, the areal density of M-class SCM will need to be high in order to support a low cost relative to DRAM.

Note that S-class SCM could potentially fail to match NAND Flash in cost yet still “succeed” in the market, because it can offer inherently higher performance, higher endurance, and the additional benefits of direct byte-accessibility. In contrast, M-class SCM will almost certainly have to offer slower performance and lower cycle endurance than DRAM. While nonvolatility is an attractive feature that DRAM cannot provide, it is unlikely that M-class SCM will meet with widespread success unless it can also prove attractive in some combination of lower cost, higher bit density, and/or lower power usage, with respect to DRAM.

25.5 Device Candidates for SCM

As discussed above, numerous nonvolatile memory candidates have been researched as potential replacements for NAND Flash, or more recently, as potential enablers of Storage Class Memory. Some of these memory candidates have been successfully commercialized, yet are still unsuitable for enabling SCM. This is not surprising since the required combination of attributes – high nonvolatility (ranging from 1 week to 10 years), very low latencies (ranging from hundreds of nanoseconds up to tens of microseconds), physical durability during practical use, and most important, ultra-low cost per bit – is nontrivial to attain.

The necessary attributes of a memory device for storage-class memory applications, driven mainly by the requirement to minimize cost per bit, are Scalability, the potential for high areal density through MLC (multilevel cells) and/or 3D integration, low Fabrication cost, long-term Retention, low Latency, low Power, high cycle Endurance, and low Variability.

Table 25.2 shows the potential of the prototypical memory entries and the current emerging research memory entries for storage-class memory applications based on the above parameters. Each of the above-mentioned categories is qualitatively described as either a strength of that technology (represented with a green face), a nonstrength (yellow face), or a decided weakness (red face), in terms of suitability for SCM applications.

Table 25.2 Potential of prototypical and emerging research memory candidates for SCM applications.

[Graphical ratings not reproduced. Columns: prototypical candidates (FeRAM, STT-MRAM, PCRAM) and emerging candidates (emerging ferroelectric memory, plus the Redox RRAM subcategories: conducting bridge, metal oxide with bipolar filaments, metal oxide with unipolar filaments, and metal oxide with bipolar interface effects). Rows: Scalability, MLC, 3D integration, Fabrication cost, Retention, Latency, Power, Endurance, and Variability.]

The three prototypical memory candidates are FeRAM, STT-MRAM, and PCRAM. While FeRAM was the first to be commercialized, its difficulties with scalability, MLC, and 3D integration make it a poor candidate for SCM, despite its excellent latency, power, and endurance characteristics. STT-MRAM can be expected to be particularly good in latency and endurance, but achieving scalability while maintaining thermal stability, low write power, and sharply defined resistance distributions is a significant challenge. In particular, implementing MLC (by stacking multiple STT-MRAM cells with different, carefully tuned characteristics) can be expected to be quite difficult. While PCRAM has been shown to be scalable, capable of MLC, and in need of only a unipolar selection device for 3D integration, reducing power while maintaining good endurance, retention, latency, and low fabrication cost remains a work in progress. However, the recent release of PCRAM products as NOR replacements could provide an opportunity to improve these aspects for SCM applications.

Of the emerging research memory entries, the emerging ferroelectric memories such as FeFETs are quite similar to FeRAM but can be expected to be more scalable, while offering lower endurance due to charge-trapping effects. The large amount of work over the past 3–5 years on Redox Memories has made it clear that these memories need to be further subcategorized into Conducting bridge memories and Metal Oxide memories based on Bipolar Filaments, Unipolar Filaments, or Bipolar Interface effects. The metallic filaments of copper or silver in conducting bridge devices provide a large resistance contrast suitable for MLC, but also tend to lead to poor retention characteristics. While a unipolar Metal Oxide memory (such as NiO) can be more suitable for 3D integration, these memories also tend to exhibit high switching power and poor endurance. Since three of these subcategories involve filaments, the potential for scalability should be strong. However, while Bipolar Metal Oxide memories (such as HfOx or TaOx) have no other strong weaknesses, it is not yet clear whether their broad cycle-to-cycle variations in both resistance and switching voltage will support aggressive scaling to future technology nodes; so far, the low switching currents required at such nodes have tended to increase variability. Finally, Bipolar Interface Effects in Metal Oxides such as PCMO (PrCaMnO) avoid problems related to filaments, but the need to move ions across the entire device aperture tends to set up an unpleasant tradeoff between write speed and long-term data retention.

The next section discusses architectural issues in Storage Class Memory, touching on system design and potential application spaces for SCM.

25.6 Architectural Issues in SCM

In traditional computing, SRAM is used as a series of caches, which DRAM tries to refill as fast as possible. The entire system image is stored in a nonvolatile medium, traditionally a hard drive, and is then swapped to and from memory as needed. However, this situation has been changing rapidly. Application needs are both scaling in size and evolving in scope, and thus are rapidly exhausting the capabilities of the traditional memory hierarchy.

By combining the reliability, fast access, and endurance of a solid-state memory together with the low-cost archival capabilities and vast capacity of a magnetic hard disk drive, Storage Class Memory (SCM) offers several interesting opportunities for creating new levels in the memory hierarchies that could help with these problems. SCM offers compact yet robust nonvolatile memory systems with greatly improved cost/performance ratios relative to other technologies. S-class SCM represents ultra-fast, long-term storage, similar to an SSD but with higher endurance, lower latencies, and byte-addressable access. M-class represents dense and low-power nonvolatile memory at speeds close to DRAM.

Implementing SCM will require not only the emerging memory technologies themselves but also new interfaces and architectures, both to exploit the full potential of the various new memory technologies and to compensate for their weaknesses. In this section, we explore the Emerging Research Architecture implications and challenges associated with Storage Class Memory.

25.6.1 Challenges in Memory Systems

Current memory systems range in size from Gigabytes (low-volume ASIC systems, FPGAs, and mobile systems) through Terabytes (multicore systems that manage execution of many threads for personal or departmental computing), to Petabytes (for database, Big Data, cloud computing, and other data analytics applications), and up to Exabytes (next-generation, exascale scientific computing). In all cases, speed (both in terms of latency of data reads and writes as well as bandwidth), power consumption, and cost are absolutely critical. However, the importance of other system aspects can vary across these different application spaces.

Some applications such as data analytics and ASIC systems can benefit from having associative memories or content addressability, while other applications might gain little. Mobile systems can become even more compact if many different memory tiers can be combined on the same chip or package, including nonvolatile M-class or even S-class Storage Class Memory.

Many computer systems are not running at peak load continuously. Such systems (including mobile or data analytics) become much more efficient if power can be turned off rapidly while maintaining persistent stored data, since power usage can then become proportional to the computational load. This provides additional incentive for the nonvolatile storage aspect of SCM.

Access patterns in data-intensive computing can vary substantially. While some companies continue to use relational databases, others have switched to flat databases that must be separately indexed to create connections among entries. In general, database accesses tend to be fairly atomic (as small as a few Bytes) and can be widely distributed across the entire database. This is true for both reads and writes, and since the relative ratio of reads and writes varies widely by application, the optimality of any one design can depend strongly on the particular workload.

Total cost of ownership is influenced by cost to purchase, cost to maintain, and system lifetime. Current purchase-cost trends are that Hard Disk Drives (HDDs) cost roughly an order of magnitude less per bit than Flash memory, which in turn costs almost an order of magnitude less per bit than DRAM. However, cost to purchase is not the only consideration. It is anticipated that S-class SCM will consume considerably less power than HDD (both directly and in terms of required cooling) and will take up considerably less floor space. By 2020, if the main storage system of a data center is still built solely from HDD, the target performance of 8.4 G-SIO/s could consume as much as 93 MW and require 98 568 square feet (9157 m²) of floor space [5]. In contrast, the improved performance of emerging memories could supply this performance with only 4 kW and 12 square feet (1.11 m²) [5]. Given the cost of energy, this differential can easily shift the total cost advantage away from HDD to emerging memory, even if a cost-per-bit differential still exists.

Roughly one-third of the power in a large computer system is consumed in the memory subsystem [9]. Some portion of this is refresh power, required by the volatile nature of DRAM. As a result, modern data servers consume considerable power even when operating at low utilization rates. For example, Google [10] reports that servers are typically operating at over 50% of their peak power consumption even at very low utilization rates. The requirement for rapid transition to full operation precludes using a hibernate mode. Thus a persistent memory that did not require constant refresh would be valuable.

These requirements have led to considerable early investigation into new memory architectures that exploit emerging memory devices, often in conjunction with DRAM and HDDs in novel configurations. These new Storage Class Memories (SCM) are differentiated by whether they are intended to sit close to the CPU (M-class) or largely to supplement the hard drives and SSDs (S-class).

Because of the inherent speed of SCMs, software can easily become the limit on system performance. Thus, I/O software, from the file system through the operating system and up to applications, will have to be redesigned in order to best leverage SCMs. The number of software interactions must be reduced, and disk-centric features will need to be removed. Conventional software can account for anywhere from 70 to 94% of total I/O latency [11]. It is likely to be valuable to give application software direct access to the SCM interface, although this requires additional measures to protect the SCM device from malicious software.

25.6.2 Emerging Memory Architectures for M-Class SCM

Storage Class Memory architectures that are intended to replace, merge with, or support DRAM, sitting close to the CPU, are referred to as M-type or Memory-type SCM (M-SCM). The required properties of this memory have many similarities to DRAM, including its interfaces, architecture, endurance, and read and write speed. Since the write endurance of an emerging research memory is likely to be inferior to that of DRAM, considerable scope exists for architectural innovation. It will be necessary to choose how to integrate multiple memory technologies to optimize performance and power while maximizing lifetime. In addition, advanced load leveling that preserves the word-level interface, together with suitable error correction, will be needed.

The interface is likely to be a word-addressable bus, treating the entire memory system as one flat address space. Since the cost of adapting to a new memory interface is sizeable, an interface standard that could support multiple generations of M-SCM devices would be highly preferred. Many systems (such as in automobiles) might be deployed for a long time, so any new standard should be backward-compatible. Such a standard should be compatible with DRAM interfaces (though with simpler control commands), and should reuse existing controllers, PHYs (physical layers), and power supplies as much as possible. It should be power efficient, for example by supporting small page sizes, and should support future directions such as 3D master/slave configurations. The M-SCM device should indicate when writes have completed successfully. Finally, an M-SCM standard might have to support multiple data rates, such as a DDR-like speed for the DRAM and a slower rate for the NVRAM [12].

While wear-leveling in a block-based architecture requires significant overhead to track the number of writes to each block, simple techniques such as “Start-Gap” wear-leveling are available for direct byte-access memories such as PCM (Phase Change Memory) [13]. In this technique, a pair of registers identifies the location of the start point and of an empty gap within a region of memory. After some threshold number of write accesses, the gap register is moved through the region, with the start register incrementing each time the gap passes through the entire region [13]. Additional safeguards can be added to defend against attacks intended to deliberately wear out the memory [13].
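A minimal Python sketch of the Start-Gap mechanism follows. It is illustrative only: the register bookkeeping is simplified relative to the pseudocode in [13], and the movement threshold psi is a placeholder value.

```python
class StartGap:
    """Sketch of Start-Gap wear-leveling [13]: n logical lines are kept in
    n + 1 physical lines, and the one empty 'gap' line slowly rotates through
    the region so no logical line stays pinned to one physical line."""

    def __init__(self, n, psi=100):
        self.n = n                    # logical lines in the region
        self.psi = psi                # writes between gap movements (assumed)
        self.start = 0                # Start register: completed rotations
        self.gap = n                  # Gap register: physical index of the gap
        self.writes = 0
        self.mem = [None] * (n + 1)   # stand-in for the physical array

    def translate(self, la):
        """Logical address -> physical address: an add, a mod, a compare."""
        pa = (la + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def read(self, la):
        return self.mem[self.translate(la)]

    def write(self, la, value):
        self.mem[self.translate(la)] = value
        self.writes += 1
        if self.writes % self.psi == 0:   # every psi-th write, move the gap
            self._move_gap()

    def _move_gap(self):
        prev = (self.gap - 1) % (self.n + 1)   # cyclic predecessor of the gap
        self.mem[self.gap] = self.mem[prev]    # shift neighbor into the gap
        self.gap = prev
        if self.gap == self.n:                 # gap wrapped: one full rotation
            self.start = (self.start + 1) % self.n
```

Because translation is just an add, a wraparound, and a compare, the scheme costs two registers per region rather than a per-block remapping table.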

With such techniques, even an M-class SCM that is markedly slower than DRAM can offer improved performance by increasing available capacity and by reducing the occurrence of costly cache misses [8]. The presence of a small DRAM cache helps keep the slower speed of the M-class SCM from affecting overall system performance in many common workloads. Even with an endurance of 10⁷ cycles, device lifetime has been shown to be on the order of three years [8]. Techniques for reducing the write traffic back to the SCM device can improve this by as much as a factor of three under realistic workloads [8].

Direct replacement of DRAM with a slightly slower M-class SCM has also been considered, for the particular example of STT-MRAM [14]. Since individual byte-level writes to STT-MRAM consume more power than in DRAM, a direct replacement is not competitive in terms of energy or performance. However, by re-architecting the interaction between the output buffer and the STT-MRAM, unnecessary writes back to the NVM can be eliminated, producing a sizeable energy improvement at almost no loss in performance [14]. The use of write buffers, however, means that the device must be able to complete all writes back to nonvolatile memory in the event of power loss. Integrating PCM into the mobile environment, together with a redesigned memory management controller, is predicted to deliver a sixfold improvement in speed and a sixfold extension of memory lifetime [15].

Caches are intended to ensure that frequently needed data is located close to the processor in nearby, low-latency memory. In storage architectures, “hot” or frequently accessed data is identified and then moved to faster tiers of storage. However, as the number of tiers or caches increases, a significant amount of time and energy is spent moving data. An alternative approach is to completely rethink the hardware/software interface. By organizing the computational system around the data, data is not brought to the processor; instead, processing is performed in proximity to the stored data. One such emerging data-centric chip architecture, termed “Nanostores” [16], is predicted to offer 10–60× improvements in energy efficiency [17].

25.6.3 Emerging Storage Architectures for S-Class SCM

S-type (Storage) SCMs are intended to replace or supplement the hard-disk drive as main storage. Their main advantage will be speed, avoiding the seek-time penalty of mechanical drives. However, to succeed, their total cost of ownership needs to approach that of HDDs. Research issues include whether the SCM serves as a disk cache or is directly managed, how load leveling is implemented while retaining a sufficiently fast and flexible interface, how error correction is implemented, and what the optimal mix is of fast yet expensive and slow yet inexpensive storage technologies.

The effective performance of Flash SSD, itself slower than S-SCM, has been strongly affected by interface performance. The standard SATA (Serial Advanced Technology Attachment) interface, which is a commonly used interface for SSD, was originally designed for HDD and is not optimized for Flash SSD [18]. There are several approaches for novel interfaces or architectures that can take advantage of the native Flash SSD performance [18–20], including PCIe, Thunderbolt, and Infiniband [11].

A likely introduction of these new memory devices to the market would be as hybrid solid-state disks, in which the new memory technology complements traditional Flash memory to boost SSD performance. Experimental implementations of FeRAM/Flash [21] and PCRAM/Flash [22] hybrids have been explored. It was shown that the PCRAM/Flash hybrid improves SSD operation by decreasing energy consumption and increasing the lifetime of the Flash memory [22].

Additional open questions for S-SCM include storage management, interface, and architectural integration, including whether such a system should be treated like a fast disk drive or as a managed extension of main memory. To date, disk-like systems built using nonvolatile memories have had disk-like interfaces, with fixed-size blocks and a translation layer used to obtain block addresses. However, since the file system also performs a table lookup, some portion of SCM performance is sacrificed. In addition, non-NAND-Flash SCMs have randomly accessible bits and do not need to be organized as fixed-size blocks [23].

While preserving this two-table structure means that no changes to the operating system are required to use, or switch between, new S-SCM technologies, the full advantages of such fast storage devices cannot be realized. There are two alternative approaches to eliminate one of these lookup tables. In Direct Access mode, the translation table is removed, so that the operating system must then understand how to address the SCM devices. However, any change in how table entries are calculated (such as improvements in garbage collection or wear-leveling) would require changes in the operating system [23].

In contrast, in an Object-Based Access model, the file system is organized as a series of (key,value) objects. While this requires a one-time change to operating systems, all specific details of the SCM could be implemented at a low level. This model leads to greater efficiency in terms of both speed and effective “file” density, and also offers potential for enhanced reliability [23].
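The contrast between these access models can be sketched as follows. This is a hypothetical illustration (the class and method names are invented), not a real SCM or operating-system API.

```python
# Hypothetical sketch of the Object-Based Access model [23]: the host
# addresses named (key, value) objects, while allocation, wear-leveling,
# and garbage collection remain hidden inside the device.

class ObjectStoreSCM:
    def __init__(self):
        self._objects = {}           # device-internal mapping: key -> bytes

    def put(self, key, value: bytes):
        # The device may place 'value' anywhere and silently remap it later
        # (e.g., for wear-leveling) without any change visible to the OS.
        self._objects[key] = value

    def get(self, key) -> bytes:
        return self._objects[key]

# Under Direct Access, by contrast, the OS computes device addresses
# itself, so any change in the device's internal placement policy
# (garbage collection, wear-leveling) forces an operating-system change.
```

The design difference is where the placement policy lives: behind a stable object interface, SCM-specific details can evolve without touching the operating system, at the cost of a one-time change to adopt the (key, value) interface.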

Even first-generation PCM chips, although implemented without a DRAM cache, compare favorably with state-of-the-art SSDs implemented with NAND Flash, particularly for small (<2 KB) writes and for reads of all sizes [24]. The CPU overhead per input–output operation is also greatly reduced [24]. Another observation for even first-generation PCM chips is that, while the average read latency is similar to NAND Flash, the worst-case latency outliers for NAND Flash can be many orders of magnitude slower than the worst-case PCM access. This is particularly important considering that such S-class SCM systems will typically be used to increase system performance by improving the delivery of urgently needed “hot” data.

Another new software consideration for both S- and M-class SCM is the increased importance of avoiding memory corruption, either through memory leaks, pointer errors, or other issues related to memory allocation and deallocation [25]. Since part of the memory system is now nonvolatile, such issues are now pervasive and may be difficult to detect and remove without affecting stored user data.

25.7 Conclusions

Storage-class memory (SCM) describes a device category that combines the benefits of solid-state memory, such as high performance and robustness, with the archival capabilities and low cost of conventional hard-disk magnetic storage. M-class SCM is intended to be almost as fast as DRAM while providing at least five days of nonvolatility (for data recovery) at a similar or lower cost per bit. S-class SCM is intended to outperform NAND Flash at a comparable or slightly higher cost per bit. Promising device candidates for SCM include STT-MRAM, PCM, and some types of RRAM. Architecturally, M-class SCM is likely to be organized like DRAM, while S-class SCM is likely to be accessed through disk-type interfaces. Challenges remain in error management and wear-leveling, since these steps will have to be performed at much lower latencies and much higher data rates than is the case today for NAND Flash.

References

  1. Fontana, R.E. Jr., Decad, G.M., and Hetzler, S.R. (2013) The Impact of Areal Density and Millions of Square Inches (MSI) of Produced Memory on Petabyte Shipments of TAPE, NAND Flash, and HDD, MSS&T 2013 Conference Proceedings, May 2013.
  2. Deng, Y. and Zhou, J. (2011) Architectures and optimization methods of Flash memory based storage systems. Journal of Systems Architecture, 57, 214–227.
  3. Grupp, L.M., Caulfield, A.M., Coburn, J. et al. (2009) Characterizing Flash Memory: Anomalies, Observations, and Applications, MICRO'09, Dec. 12–16, 2009, New York, NY, USA, pp. 24–33.
  4. Burr, G.W., Kurdi, B.N., Scott, J.C. et al. (2008) Overview of candidate device technologies for storage-class memory. IBM Journal of Research and Development, 52(4/5), 449–464.
  5. Freitas, R.F. and Wilcke, W.W. (2008) Storage-class memory: the next storage system technology. IBM Journal of Research and Development, 52(4/5), 439–447.
  6. Franceschini, M., Qureshi, M., Karidis, J. et al. (2010) Architectural Solutions for Storage-Class Memory in Main Memory, CMRR Non-Volatile Memories Workshop, April 2010, http://cmrr.ucsd.edu/education/workshops/documents/Franceschini_Michael.pdf.
  7. Johnson, M., Al-Shamma, A., Bosch, D. et al. (2003) 512-Mb PROM with a three-dimensional array of diode/antifuse memory cells. IEEE Journal of Solid-State Circuits, 38(11), 1920–1928.
  8. Qureshi, M.K., Srinivasan, V., and Rivers, J.A. (2009) Scalable high performance main memory system using phase-change memory technology, ISCA '09: Proceedings of the 36th Annual International Symposium on Computer Architecture, ACM, pp. 24–33.
  9. DARPA (2007) Final Report, Exascale Study Group: Technology Challenges in Advanced Exascale Systems.
  10. Barroso, L.A. and Hölzle, U. (2007) The case for energy-proportional computing. IEEE Computer, 40(12), 33–37.
  11. Swanson, S. (2012) System Architecture Implications for M/S-Class SCMs, ITRS SCM Workshop, July 2012, http://www.itrs.net/ITWG/ERD_files.html.
  12. Kim, K.H. (2012) Memory Interfaces for M-Class SCMs, ITRS SCM Workshop, July 2012, http://www.itrs.net/ITWG/ERD_files.html.
  13. Qureshi, M.K., Karidis, J., Franceschini, M. et al. (2009) Enhancing lifetime and security of PCM-based main memory with Start-Gap wear leveling, MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, ACM, pp. 14–23.
  14. Kultursay, E., Kandemir, M., Sivasubramaniam, A., and Mutlu, O. (2013) Evaluating STT-RAM as an energy-efficient main memory alternative, Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
  15. Lee, H.G. (2010) High-performance NAND and PRAM hybrid storage design for consumer electronics. IEEE Transactions on Consumer Electronics, 56(1), 112–118.
  16. Ranganathan, P. (2011) From microprocessors to Nanostores: rethinking data-centric systems. IEEE Computer, 44, 39–48.
  17. Chang, J. (2012) Data-Centric Computing and Nanostores, ITRS SCM Workshop, July 2012, http://www.itrs.net/ITWG/ERD_files.html.
  18. Kim, D., Bang, K., Ha, S.-H. et al. (2010) Architecture exploration of high-performance PCs with a solid-state disk. IEEE Transactions on Computers, 59, 879–890.
  19. Fusion-io (2013) www.fusionio.com (accessed 16 July 2013).
  20. NVM Express (2013) http://download.intel.com/standards/nvmhci/NVM_Express_Explained.pdf (accessed 16 July 2013).
  21. Yoon, J.H., Nam, E.H., Seong, Y.J. et al. (2008) Chameleon: a high performance Flash/FRAM hybrid solid state disk architecture. IEEE Computer Architecture Letters, 7, 17–20.
  22. Lee, H.G. (2010) High-performance NAND and PRAM hybrid storage design for consumer electronics. IEEE Transactions on Consumer Electronics, 56, 112–118.
  23. Miller, E.L. (2012) Object-Based Interfaces for Efficient and Portable Access to S-Class SCMs, ITRS SCM Workshop, July 2012, http://www.itrs.net/ITWG/ERD_files.html.
  24. Akel, A., Caulfield, A.M., Mollov, T.I. et al. (2011) Onyx: a prototype phase change memory storage array. HotStorage 2011, 10–19.
  25. Coburn, J., Caulfield, A.M., Akel, A. et al. (2012) NV-Heaps: making persistent objects fast and safe with next-generation, nonvolatile memories. ACM SIGPLAN Notices, 47(4), 105–117.