IBM Real-time Compression Appliance
This chapter briefly describes the features and functions of the IBM Real-time Compression™ Appliance (RTCA).
For more information about IBM Real-time Compression, see the Introduction to the Redbooks publication, IBM Real-time Compression Appliance, which can be found at the following website:
The following topics are covered:
 
21.1 Introduction to data compression
The industry needs data compression that is fast, reliable, and scalable. To be adopted, a compression algorithm must ensure data consistency and achieve a very good compression ratio. The data compression solution must also be easy to implement, and compression must never impact production use of the data. A generic overview of the RTCA solution is presented in Figure 21-1.
Figure 21-1 Real-time Compression Appliance overview
21.2 IBM Real-time Compression
To understand the basic design of IBM Real-time Compression technology, we first need to review the basics of modern compression techniques.
The IBM Real-time Compression Appliance (IBM RTCA) is based on a reversible (lossless) data compression algorithm that operates in real time.
The IBM RTCA product compresses data when it is first written, so that less data is stored on primary storage. As a result, the storage system has less data to process, which reduces CPU overhead and disk spindle utilization. The storage system can therefore serve more requests from its read/write cache, and some reads can be served from the RTCA product’s read-ahead cache.
In addition to compressing data in real time, the IBM RTCA product lets customers compress existing data that is already saved to disk, non-disruptively, by using the Compression Accelerator utility. Compression Accelerator is a high-performance, policy-driven software application running on the IBM RTCA product. It compresses data that has already been saved to disk while that data remains online and accessible to applications and end users.
The policies allow users to throttle the rate at which existing uncompressed data is compressed, so that the process does not affect storage performance. Because data that is already stored can also be compressed, users realize the capacity savings sooner and see a faster return on their IBM RTCA investment. After an IBM RTCA product is first purchased, users can defer their purchase of new storage. When new storage must eventually be acquired, IT can purchase less than half of the capacity that would be required without compression. In this way, the IBM RTCA product reduces the overall storage investment.
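Policy-based throttling of background compression can be illustrated with a minimal Python sketch. It assumes a hypothetical policy expressed as a cap on uncompressed bytes processed per second and uses the standard `zlib` module as a stand-in codec. The names `ThrottlePolicy` and `compress_existing` are illustrative only and are not part of the Compression Accelerator interface. The clock and sleep functions are injectable so that the pacing can be tested without actually waiting.

```python
import time
import zlib
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class ThrottlePolicy:
    """Hypothetical policy: maximum uncompressed bytes compressed per second."""
    max_bytes_per_sec: float


def compress_existing(
    blobs: Iterable[bytes],
    policy: ThrottlePolicy,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, int]]:
    """Compress already-stored data without exceeding the policy's byte rate.

    Returns (original_size, compressed_size) for each blob. Writing the
    compressed result back to storage is out of scope for this sketch.
    """
    start = clock()
    processed = 0
    results = []
    for blob in blobs:
        compressed = zlib.compress(blob)
        processed += len(blob)
        results.append((len(blob), len(compressed)))
        # Earliest time at which this many bytes are allowed under the cap.
        allowed_at = start + processed / policy.max_bytes_per_sec
        delay = allowed_at - clock()
        if delay > 0:
            sleep(delay)  # back off so foreground I/O is not starved
    return results
```

Pacing against a running byte budget, rather than sleeping a fixed time per item, keeps the average rate at the cap even when item sizes vary.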
21.3 Benefits
IBM Real-time Compression Appliance solutions enable five key benefits:
Real-time efficient operation: Supports the performance and accessibility requirements of business-critical applications because data is compressed by up to 80% in real time, without performance degradation.
Transparency: 100% transparent to systems, storage, and applications. Provides compatibility with downstream processes such as backups, Snapshots, cloning, mirroring, and archiving. It also complements deduplicated environments.
Non-disruptive: Requires no changes to applications, servers or storage systems. All IT processes remain the same.
Lower cost, greener: Cost reduction benefits carry through the entire storage lifecycle: less storage to power, cool, and manage means lower cost and a greener data center.
Performance: Because compression is offloaded to the appliance, the storage controller does not perform it and has more processor cycles free for serving storage.
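The capacity figures in this section are simple arithmetic. The following Python sketch assumes that compression of "up to 80%" means an 80% reduction in consumed capacity, and converts a savings fraction into physical capacity and into the equivalent N:1 ratio. Any savings above 50% means buying less than half of the uncompressed capacity, and 80% savings corresponds to 5:1. The 80% figure is an upper bound quoted by the source; actual savings depend on the data. The function names are illustrative.

```python
def physical_capacity(logical: float, savings: float) -> float:
    """Capacity consumed after compression.

    `savings` is the fraction of capacity removed, e.g. 0.8 for 80%.
    """
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return logical * (1.0 - savings)


def compression_ratio(savings: float) -> float:
    """Equivalent N:1 ratio for a savings fraction (0.8 corresponds to 5:1)."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


for savings in (0.5, 0.8):
    print(f"{savings:.0%} savings: 100 TB logical needs "
          f"{physical_capacity(100.0, savings):.1f} TB physical "
          f"({compression_ratio(savings):.1f}:1)")
```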
21.4 IBM RTCA RACE technology
The IBM Random Access Compression Engine (RACE) technology is the core of IBM RTCA products for NAS. RACE technology is based on 35 patents that do not cover compression algorithms as such. Rather, they define how to make industry-standard LZ compression of primary storage operate in real time and allow random access. The RACE engine is the primary intellectual property behind the product. It runs on an appliance in front of any NFS or CIFS deployment, acting as an “intelligent cable” between the IP switch and the storage. No software agents or drivers are required on clients or servers.
The RACE technology (see Figure 21-2) is made up of three components:
Random Access Compression Engine (RACE): Enables random-access data compression without compromising performance.
Unified Protocol Manager (UPM): Enables transparent support of multiple storage and network protocols, including CIFS and NFS.
Monitoring and Reporting Manager (MRM): Enables online storage compression trending, analysis, and reporting.
Figure 21-2 RACE technology overview
Traditional compression technologies start from a fixed-size input and, after compression, produce a result of variable size. When large data chunks are used, the performance impact is high, because reading or updating any part of a chunk requires decompressing and recompressing the whole chunk. When small data chunks are used, the performance impact is small, but so is the compression ratio. Over time, several disadvantages can appear: the need for garbage collection, performance that degrades as the volume of data increases, and the loss of metadata such as the creation date, access date, user rights, or modification date. Fragmentation of the target storage space is another issue. It occurs because the file is first stored at its original size, the compressed result is then stored in a new area, and only then is the original input deleted.
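The chunk-size trade-off described above can be reproduced with a small Python experiment. It uses the standard `zlib` module (DEFLATE, an LZ77-based codec) as a stand-in for industry-standard LZ compression, compresses fixed-size chunks independently, and reports both the compression ratio and how many bytes must be decompressed to read a single byte at a random offset. The sample data, chunk sizes, and function names are arbitrary assumptions; this is not RACE's on-disk layout.

```python
import zlib


def make_sample(lines: int = 20_000) -> bytes:
    """Synthetic, fairly repetitive data standing in for typical file content."""
    return b"".join(f"record {i % 100:04d} status=OK\n".encode() for i in range(lines))


def compress_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Compress each fixed-size chunk as its own independent zlib stream."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def ratio(data: bytes, chunks: list[bytes]) -> float:
    """Original size divided by total compressed size."""
    return len(data) / sum(len(c) for c in chunks)


def read_byte(chunks: list[bytes], chunk_size: int, offset: int) -> tuple[int, int]:
    """Random read: decompress only the chunk that holds `offset`.

    Returns the byte value and the number of bytes that had to be decompressed.
    """
    plain = zlib.decompress(chunks[offset // chunk_size])
    return plain[offset % chunk_size], len(plain)


data = make_sample()
for size in (512, 4096, 65536, len(data)):
    chunks = compress_chunks(data, size)
    _, work = read_byte(chunks, size, 123_456)
    print(f"chunk={size:>7}  ratio={ratio(data, chunks):5.1f}  "
          f"bytes decompressed per random read={work}")
```

On this kind of input, small chunks lose ratio to per-chunk header overhead and to the matching context each chunk starts without, while large chunks force far more data to be decompressed for every random read.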
The Random Access Compression Engine (RACE) starts from an arbitrary data stream and compresses the data as it arrives from the host. The resulting compressed file keeps all the attributes of the original; its metadata is not changed. Because of this in-line approach, the storage system never has to write the original data, read it back, write the compressed result, and finally delete the original file. As a result, no garbage or fragmentation is left on the storage system. The performance required at the storage level also decreases, because reads and writes are performed on compressed data only, rather than on the full uncompressed data.
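The difference between in-line and post-process compression can be expressed as storage I/O. In the following Python sketch, zlib's streaming `compressobj` interface stands in for RACE and an in-memory `io.BytesIO` buffer stands in for the storage system. The in-line path writes only compressed bytes as the data arrives, whereas the post-process model counts the original write, the read-back, and the compressed write. The function names and the I/O accounting model are illustrative assumptions.

```python
import io
import zlib
from typing import BinaryIO, Iterable


def inline_write(stream: Iterable[bytes], target: BinaryIO) -> int:
    """Compress data as it arrives; only compressed bytes ever reach storage.

    Returns the number of bytes written to `target`.
    """
    comp = zlib.compressobj()
    written = 0
    for piece in stream:
        written += target.write(comp.compress(piece))
    written += target.write(comp.flush())  # emit buffered output and trailer
    return written


def post_process_io(stream: Iterable[bytes]) -> int:
    """Storage bytes moved when compressing after the fact.

    Write the original, read it back, then write the compressed copy.
    Deleting the original is not counted in this model.
    """
    original = b"".join(stream)
    compressed = zlib.compress(original)
    return len(original) + len(original) + len(compressed)


pieces = [b"GET /index.html 200\n" * 50 for _ in range(100)]
buf = io.BytesIO()
print("in-line bytes written:", inline_write(pieces, buf))
print("post-process bytes moved:", post_process_io(pieces))
```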
A logical overview of the Random Access Compression Engine is presented in Figure 21-3.
Figure 21-3 Random Access Compression Engine
RACE takes the incoming data streams, compresses the data within these data requests, and passes it to the storage array with the metadata left intact. The data is stored in the array, and the acknowledgement that the write has been committed is sent directly from the array back to the end user or application. This flow matters for data availability: it is imperative that the storage array, not the IBM RTCA product, acknowledges the write commitment. For this reason, the IBM RTCA product has no write cache. All storage commits come from the array, preserving the integrity of the data between the storage and the application.
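The write path can be sketched in Python as a pass-through that compresses each payload, forwards it synchronously, and relays the array's acknowledgement rather than issuing its own. `StorageArray` and `CompressingProxy` are hypothetical names; the real product works at the NFS and CIFS protocol level, and `zlib` stands in for its LZ compression. The point of the sketch is the control flow: with no write cache, the appliance never holds data that the array has not committed.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StorageArray:
    """Hypothetical back-end array: it alone decides whether a write is committed."""
    objects: Dict[str, bytes] = field(default_factory=dict)

    def write(self, path: str, data: bytes) -> bool:
        self.objects[path] = data
        return True  # commit acknowledgement originates here

    def read(self, path: str) -> bytes:
        return self.objects[path]


class CompressingProxy:
    """Hypothetical in-band compressor with no write cache.

    It compresses each payload, forwards it synchronously, and returns the
    array's acknowledgement instead of acknowledging the write itself.
    """

    def __init__(self, array: StorageArray) -> None:
        self.array = array

    def write(self, path: str, payload: bytes) -> bool:
        # The path (file name) passes through unchanged; only the data is compressed.
        return self.array.write(path, zlib.compress(payload))

    def read(self, path: str) -> bytes:
        return zlib.decompress(self.array.read(path))
```

Because nothing is acknowledged before the array commits, a failed array write surfaces to the client as a failure instead of being hidden behind an early acknowledgement.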
As stated before, the IBM RTCA technology uses industry-standard LZ compression algorithms. The “secret sauce” is not the compression algorithm that the RTCA product uses, but the manner in which the compression is applied. One of the key ways that the RTCA product achieves its high compression ratios and performance is by compressing data using random access techniques.
The benefits of compressing data using random access techniques are twofold. First, the RTCA product can read or write only the blocks of the compressed file that need to be read or modified, which makes these operations faster. When updating a file requires writing only a small piece of data rather than the whole file, storage performance is maximized. Second, updates to the file are made without disrupting the other blocks in the compressed file.
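Random-access reads and in-place updates can be sketched with a toy layout in which a file is stored as independently compressed fixed-size chunks. The `ChunkedCompressedFile` class below is hypothetical, uses `zlib` as the LZ codec, and supports only overwrites that do not change the file length; it illustrates the principle and is not RTCA's internal format. Reads decompress only the overlapping chunks, and writes recompress only the chunks they modify, leaving every other compressed chunk exactly as it was.

```python
import zlib
from typing import List


class ChunkedCompressedFile:
    """Toy file stored as independently compressed fixed-size chunks (hypothetical layout)."""

    def __init__(self, data: bytes, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.length = len(data)
        self.chunks = [zlib.compress(data[i:i + chunk_size])
                       for i in range(0, len(data), chunk_size)]

    def _span(self, offset: int, end: int) -> range:
        """Indices of the chunks overlapping the byte range [offset, end)."""
        if end <= offset:
            return range(0)
        return range(offset // self.chunk_size, (end - 1) // self.chunk_size + 1)

    def read(self, offset: int, size: int) -> bytes:
        """Decompress only the chunks that overlap the requested range."""
        end = min(offset + size, self.length)
        out = bytearray()
        for idx in self._span(offset, end):
            base = idx * self.chunk_size
            plain = zlib.decompress(self.chunks[idx])
            out += plain[max(offset - base, 0):end - base]
        return bytes(out)

    def write(self, offset: int, new: bytes) -> List[int]:
        """Overwrite bytes in place, recompressing only the chunks that change.

        Returns the indices of the rewritten chunks; all others are left as-is.
        """
        end = offset + len(new)
        if end > self.length:
            raise ValueError("this sketch supports in-place overwrites only")
        touched = list(self._span(offset, end))
        for idx in touched:
            base = idx * self.chunk_size
            plain = bytearray(zlib.decompress(self.chunks[idx]))
            lo, hi = max(offset, base), min(end, base + len(plain))
            plain[lo - base:hi - base] = new[lo - offset:hi - offset]
            self.chunks[idx] = zlib.compress(bytes(plain))
        return touched


f = ChunkedCompressedFile(bytes(range(256)) * 64, chunk_size=4096)
print(f.write(5000, b"XYZ"))  # -> [1]: only the second chunk is recompressed
```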
A common assumption is that upstream data compression significantly reduces downstream data deduplication ratios, and that it is therefore preferable to apply data deduplication to uncompressed data before performing a deduplicated backup. This can be true for data compressed with traditional techniques. However, the random access nature of IBM RTCA compression preserves data deduplication ratios, allowing end users to achieve maximum data optimization in both primary and downstream storage tiers.
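The effect on deduplication can be explored with a Python experiment under simplifying assumptions: a deduplicating store that hashes either fixed 4 KiB blocks of a single compressed stream or each independently compressed chunk as a unit, with `zlib` as the LZ codec. After a one-byte edit near the start of a file, every per-chunk compressed unit except the first is byte-for-byte identical to the previous version, because `zlib` produces the same output for the same input and settings, so the deduplicator still finds them. When the whole file is compressed as one stream, the edit usually changes the compressed bytes from that point on, so few blocks match. This illustrates the general principle only; it does not describe how RTCA lays out compressed data, and the helper names are hypothetical.

```python
import hashlib
import zlib


def blocks(data: bytes, size: int = 4096) -> list[bytes]:
    """Split data into the fixed-size blocks a deduplicating store might hash."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of `new` blocks whose SHA-256 already appears among `old` blocks."""
    seen = {hashlib.sha256(b).digest() for b in old}
    return sum(hashlib.sha256(b).digest() in seen for b in new) / len(new)


def compress_per_chunk(data: bytes, size: int = 4096) -> list[bytes]:
    """Compress each chunk independently, so unchanged chunks give identical output."""
    return [zlib.compress(c) for c in blocks(data, size)]


v1 = b"".join(f"invoice {i:06d} amount={i * 7 % 1000}\n".encode() for i in range(20_000))
edited = bytearray(v1)
edited[10] ^= 0xFF  # one-byte edit near the start of the file
v2 = bytes(edited)

stream = shared_fraction(blocks(zlib.compress(v1)), blocks(zlib.compress(v2)))
chunked = shared_fraction(compress_per_chunk(v1), compress_per_chunk(v2))
print(f"single stream: {stream:.0%} of blocks dedupe; per-chunk: {chunked:.0%}")
```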