Deduplication with VMware vSphere 4.1
This chapter provides information about Advanced Single Instance Storage (A-SIS) deduplication and the benefits of enabling it. It also walks you step by step through setting it up in a VMware vSphere 4.1 environment.
13.1 A-SIS deduplication overview
N series deduplication is a technology that can reduce the physical storage required to store a given amount of data. Typical data stored in a disk volume contains some redundancy in the form of identical data strings written to the volume multiple times. At a high level, the N series system reduces the storage cost of this data by examining it and eliminating those inherent redundancies, as shown in Figure 13-1.
Figure 13-1 A-SIS savings
N series deduplication is managed at the volume level, so individual volumes can be configured to use deduplication depending on the nature of the data they hold. N series deduplication operates at the block level, which gives it a high level of granularity and efficiency. During the deduplication process, the fingerprints of the individual blocks within a volume are compared to each other. When duplicate blocks are found, the system updates the pointers within the file system so that they reference a single copy of the block. The other copies are deleted to reclaim free space.
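The fingerprint-and-remap idea can be illustrated with a toy, post-process pass over an in-memory volume. This is a minimal sketch under stated assumptions, not the A-SIS implementation: SHA-256 stands in for the product's fingerprint, the volume is modeled as a dictionary of 4 KB blocks, and the names `dedup_pass`, `blocks`, and `pointers` are hypothetical. The sketch compares block contents before sharing a block, so a fingerprint match alone never merges different data.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed 4 KB block size for this toy volume


def dedup_pass(blocks, pointers):
    """Run one post-process deduplication pass over a toy volume (hypothetical model).

    blocks:   dict of physical block number -> block contents (bytes)
    pointers: dict of (file name, logical block) -> physical block number
    Returns the set of physical block numbers that were freed.
    """
    # Phase 1: fingerprint every block that is in use.
    groups = defaultdict(list)
    for pbn in sorted(blocks):
        groups[hashlib.sha256(blocks[pbn]).digest()].append(pbn)

    # Phase 2: in each fingerprint group, keep the first block and remap
    # identical copies to it (contents are compared, not just fingerprints).
    remap = {}
    for candidates in groups.values():
        keeper = candidates[0]
        for pbn in candidates[1:]:
            if blocks[pbn] == blocks[keeper]:
                remap[pbn] = keeper

    # Phase 3: repoint file references, then free the now-unreferenced duplicates.
    for key, pbn in list(pointers.items()):
        pointers[key] = remap.get(pbn, pbn)
    for pbn in remap:
        del blocks[pbn]
    return set(remap)


if __name__ == "__main__":
    blocks = {
        0: b"A" * BLOCK_SIZE,
        1: b"B" * BLOCK_SIZE,
        2: b"A" * BLOCK_SIZE,   # duplicate of block 0
        3: bytes(BLOCK_SIZE),   # zero-filled white space
        4: bytes(BLOCK_SIZE),   # duplicate of block 3
    }
    pointers = {("vm1.vmdk", 0): 0, ("vm1.vmdk", 1): 1,
                ("vm2.vmdk", 0): 2, ("vm2.vmdk", 1): 3, ("vm2.vmdk", 2): 4}
    freed = dedup_pass(blocks, pointers)
    print(sorted(freed), len(blocks))  # [2, 4] 3
```

Running the pass fingerprints everything first and only then rewrites pointers, which mirrors the post-process (not inline) behavior described in this chapter.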
The deduplication process does not occur at the time the data is written. Instead, it runs on a predetermined schedule or can be started manually at any time. Because the process runs after the data is written, its performance impact is low: while the storage system is busy or is accepting many new write operations, the only overhead is the lightweight fingerprinting process. The more I/O-intensive deduplication process can then be scheduled to run during a period of low activity.
The space savings from deduplication vary with the nature of the data being deduplicated. Savings anywhere between 10% and 90% can be seen, but 50% or more is common.
13.2 Storage consumption in virtualized environments
Although N series deduplication can effectively deduplicate any type of data, data in virtualized environments has several characteristics that make deduplication especially effective. For example, when a virtual disk is created, a file equal to the size of the virtual disk is created in a datastore. This virtual disk file consumes space equal to its size regardless of how much data is stored in the virtual disk. Any allocated but unused space (sometimes called white space) consists of identical, redundant blocks and is a prime candidate for deduplication.
Another characteristic of this data relates to the way that virtual machines are created. A common deployment method is to create templates and then deploy new virtual machines by cloning a template. The result is virtual machines with a high level of similarity in their data.
In a traditional deployment, each new virtual machine consumes new storage. N series deduplication can reduce the amount of storage required to store the virtual machine images. When two or more virtual machines are stored in the same datastore, any data they have in common (operating system binary files, application binary files, and free space) can be deduplicated. In some cases, that data can be deduplicated down to the equivalent of a single copy of it.
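The effect of cloning from a template can be shown with a small simulation. This is a hypothetical model, not a measurement: it assumes every clone keeps all template blocks unchanged, adds a few blocks of its own data, and carries zero-filled white space, and it counts each distinct 4 KB block once, as deduplication would. The helper names `make_template`, `clone_vm`, and `footprint` are illustrative only.

```python
BLOCK_SIZE = 4096  # assumed block size for the model


def make_template(num_blocks):
    # Hypothetical helper: distinct, non-zero contents for each OS/application block.
    return [f"template-{i}".encode().ljust(BLOCK_SIZE, b"t") for i in range(num_blocks)]


def clone_vm(template, vm_id, own_blocks, white_space_blocks):
    # A clone starts as a copy of the template, then gains its own data
    # and some allocated-but-unused, zero-filled space.
    own = [f"vm{vm_id}-data-{i}".encode().ljust(BLOCK_SIZE, b"d") for i in range(own_blocks)]
    return list(template) + own + [bytes(BLOCK_SIZE)] * white_space_blocks


def footprint(vms):
    """Return (logical blocks, physical blocks after deduplication)."""
    logical = sum(len(vm) for vm in vms)
    physical = len({block for vm in vms for block in vm})  # each distinct block stored once
    return logical, physical


if __name__ == "__main__":
    template = make_template(100)
    vms = [clone_vm(template, n, own_blocks=10, white_space_blocks=150) for n in range(5)]
    logical, physical = footprint(vms)
    print(logical, physical)  # 1300 151
```

In this toy case, five clones occupy 1,300 logical blocks but only 151 distinct ones (the template once, each clone's own data, and a single zero block), roughly 88% savings. Real savings depend on how far the guests diverge after cloning.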
13.3 When to run deduplication
As mentioned previously, the N series deduplication process does not occur at the time that the data is written to the storage device. Instead, the administrator can run it at any time after the data is written. Because the deduplication process can be resource-intensive, it is best to run it during a period of low activity.
You can start the deduplication process in several ways. It can start automatically on a fixed schedule, or automatically after a defined amount of new data has been written to the volume (20% by default). Alternatively, you can start it manually at any time. Run the deduplication process manually when a significant amount of data must be deduplicated, for example, after provisioning new virtual machines.
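The three ways of starting a run can be summarized as a small decision function. This is a hypothetical sketch, not the controller's scheduler: the function name `dedup_trigger`, the (hour, minute) schedule format, and especially the assumption that the 20% threshold is measured against the data already in the volume are illustrative choices that the text above does not specify.

```python
from datetime import datetime


def dedup_trigger(now, schedule, new_bytes, existing_bytes,
                  threshold=0.20, manual=False):
    """Return why a deduplication run should start ("manual", "schedule",
    or "threshold"), or None if it should not start yet. Hypothetical helper.

    schedule: set of (hour, minute) tuples for fixed-schedule runs.
    Assumption: the change threshold is a fraction of the data already in
    the volume; the real controller may measure it differently.
    """
    if manual:
        return "manual"
    if (now.hour, now.minute) in schedule:
        return "schedule"
    if existing_bytes > 0 and new_bytes / existing_bytes >= threshold:
        return "threshold"
    return None


if __name__ == "__main__":
    nightly = {(23, 0)}
    print(dedup_trigger(datetime(2011, 3, 1, 23, 0), nightly, 0, 100))   # schedule
    print(dedup_trigger(datetime(2011, 3, 1, 12, 0), nightly, 25, 100))  # threshold
    print(dedup_trigger(datetime(2011, 3, 1, 12, 0), nightly, 5, 100))   # None
```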
13.4 The effect of snapshots in deduplicated volumes
Although snapshots can be used in deduplicated volumes, note one operational difference. The deduplication process can identify and deduplicate redundant blocks that are held in a snapshot. However, the block reclamation process cannot return those blocks to free space while the snapshots exist. Because of this behavior, you might see lower than expected space savings when deduplicating data in a volume that has snapshots.
When all of the snapshots that were taken before the deduplication process ran are deleted, the deduplicated blocks are reclaimed as free space. For this reason, you might want to deduplicate new data before any snapshots are taken. However, this might not always be practical, especially in busy environments.
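The snapshot behavior comes down to reference tracking: a duplicate block is freed only when neither the active file system nor any snapshot still points to it. The following sketch models that rule with sets of block numbers. It is a simplified, hypothetical illustration (the function name `reclaimable` and the snapshot name `nightly.0` are invented), not how the storage system tracks block ownership internally.

```python
def reclaimable(duplicates, active_blocks, snapshots):
    """Return the duplicate blocks whose space can be returned to free space.

    duplicates:    block numbers that deduplication remapped away from
    active_blocks: block numbers the active file system still references
    snapshots:     dict of snapshot name -> set of block numbers it references
    """
    still_referenced = set(active_blocks)
    for snap_blocks in snapshots.values():
        still_referenced |= snap_blocks  # a snapshot pins every block it references
    return set(duplicates) - still_referenced


if __name__ == "__main__":
    duplicates = {2, 4}
    active = {0, 1, 3}                      # after deduplication
    snaps = {"nightly.0": {0, 1, 2, 3, 4}}  # hypothetical snapshot taken before deduplication
    print(reclaimable(duplicates, active, snaps))  # set()
    del snaps["nightly.0"]
    print(reclaimable(duplicates, active, snaps))  # {2, 4}
```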
13.5 Enabling deduplication on a volume
This section explains how to set up deduplication on an N series for use with VMware servers. It also provides information about storage reduction after enabling it for Network File System (NFS) and Fibre Channel Protocol (FCP) volumes.
13.5.1 Setting up deduplication on a volume
In this section, you go step by step through the process of setting up deduplication. The scenario is based on creating five identical guests of 10 GB each on both the NFS and the FCP datastores. For more information about how to set up FCP LUNs and NFS for ESX, see 5.3, “Preparing N series for the VMware ESXi Server” on page 63. The FCP LUN and the NFS share are 50 GB each.
The deduplication process
Figure 13-2 shows the original size of the NFS share as viewed through the ESX server management console.
Figure 13-2 NFS size on the vCenter management console before deduplication
Figure 13-3 shows the original size of the FCP LUN as viewed through the ESX server management console.
Figure 13-3 FCP size on vCenter management console before deduplication
Example 13-1 shows the size of the NFS share as viewed on the N series command line.
Example 13-1 NFS size on the N series CLI
itsotuc3> df -g /vol/nfs_vol
Filesystem               total    used   avail capacity  Mounted on
/vol/nfs_vol/             50GB    24GB    25GB      48%  /vol/nfs_vol/
/vol/nfs_vol/.snapshot     0GB     0GB     0GB     ---%  /vol/nfs_vol/.snapshot
Example 13-2 shows the size of the FCP LUN as viewed on the N series command line.
Example 13-2 LUN size on the N series CLI
itsotuc3> df -g /vol/fcp_vol
Filesystem               total    used   avail capacity  Mounted on
/vol/fcp_vol/             60GB    50GB     9GB      84%  /vol/fcp_vol/
/vol/fcp_vol/.snapshot     0GB     0GB     0GB     ---%  /vol/fcp_vol/.snapshot
To enable deduplication on a volume, enter the sis on <vol_name> command as follows:
For an NFS volume, enter the command as shown in Example 13-3.
Example 13-3 Enabling deduplication
itsotuc3> sis on /vol/nfs_vol
SIS for "/vol/nfs_vol" is enabled.
Already existing data could be processed by running "sis start -s /vol/nfs_vol".
itsotuc3>
For an FCP volume, follow these steps:
a. Set the fractional reserve to 0 (Example 13-4).
Example 13-4 Setting the fractional reserve
itsotuc3> vol options fcp_vol fractional_reserve 0
b. Enable deduplication on the FCP volume (Example 13-5).
Example 13-5 Enabling deduplication on the FCP volume
itsotuc3> sis on /vol/fcp_vol
SIS for "/vol/fcp_vol/" is enabled.
Already existing data could be processed by running "sis start -s /vol/fcp_vol".
c. Check the status (Example 13-6).
Example 13-6 Checking the status
itsotuc3> sis status
Path              State   Status   Progress
/vol/fcp_vol          Enabled  Active   670 MB Scanned
/vol/nfs_vol          Enabled  Active   9497 MB Scanned
Deduplicating existing data
You can start the deduplication process at any time by using the sis start <vol> command. By default, this command deduplicates only data that was written after deduplication was turned on for the volume.
To also deduplicate data that was written before deduplication was enabled, as in this scenario, start the process with the sis start -s <vol_name> command (Example 13-7).
Example 13-7 Starting the deduplication process
itsotuc3> sis start -s /vol/nfs_vol
The file system will be scanned to process existing data in /vol/nfs_vol.
This operation may initialize related existing metafiles.
Are you sure you want to proceed with scan (y/n)?y
Starting SIS volume scan on volume nfs_vol.
The SIS operation for "/vol/nfs_vol" is started
Example 13-8 shows how to start the deduplication process on a SAN volume.
Example 13-8 Starting the deduplication process on a SAN volume
itsotuc3> sis start -s /vol/fcp_vol
The file system will be scanned to process existing data in /vol/fcp_vol.
This operation may initialize related existing metafiles.
Are you sure you want to proceed with scan (y/n)?y
Starting SIS volume scan on volume fcp_vol.
The SIS operation for "/vol/fcp_vol" is started.
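If you prepare the same steps for several volumes, a small helper can assemble the command sequence used in this section. This is a hypothetical sketch (`dedup_setup_commands` is an invented name): it only builds the command strings shown in Examples 13-3 through 13-8, plus the LUN reservation command from Example 13-11. It does not connect to the controller, and you still answer the y/n prompt from sis start -s yourself.

```python
def dedup_setup_commands(volume, protocol, lun_paths=()):
    """Return, in order, the controller commands to enable and run deduplication.

    volume:    volume name without the /vol/ prefix, for example "nfs_vol"
    protocol:  "nfs" or "fcp"
    lun_paths: for FCP, LUN paths whose space reservation should be disabled
    """
    protocol = protocol.lower()
    if protocol not in ("nfs", "fcp"):
        raise ValueError(f"unsupported protocol: {protocol}")
    path = f"/vol/{volume}"
    commands = []
    if protocol == "fcp":
        commands.append(f"vol options {volume} fractional_reserve 0")
    commands.append(f"sis on {path}")
    commands.append(f"sis start -s {path}")  # prompts for y/n confirmation on the console
    commands.append("sis status")
    if protocol == "fcp":
        commands += [f"lun set reservation {lun} disable" for lun in lun_paths]
    return commands


if __name__ == "__main__":
    for cmd in dedup_setup_commands("fcp_vol", "fcp", ["/vol/fcp_vol/deduplication"]):
        print(cmd)
```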
13.5.2 Deduplication results
To check the progress of the deduplication process, use the sis status command, as shown in Example 13-9. If the status is Active, deduplication is still in progress. If the status is Idle, deduplication is complete.
Example 13-9 Checking status
itsotuc3> sis status
Path              State   Status   Progress
/vol/fcp_vol          Enabled  Idle    Idle for 02:18:36
/vol/nfs_vol          Enabled  Idle    Idle for 02:12:50
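When you script around the controller, you can decide whether deduplication has finished by parsing the sis status output. The sketch below assumes the four-column layout shown in Examples 13-6 and 13-9 (Path, State, Status, Progress); the function names are hypothetical, and other Data ONTAP releases might format the output differently.

```python
def parse_sis_status(output):
    """Map each volume path to (state, status, progress) from `sis status` text."""
    volumes = {}
    for line in output.splitlines():
        fields = line.split(None, 3)  # Progress may contain spaces, so split at most 3 times
        if len(fields) == 4 and fields[0].startswith("/vol/"):
            path, state, status, progress = fields
            volumes[path] = (state, status, progress.strip())
    return volumes


def deduplication_finished(output):
    """True when at least one volume is listed and every listed volume is Idle."""
    volumes = parse_sis_status(output)
    return bool(volumes) and all(status == "Idle" for _, status, _ in volumes.values())


SAMPLE = """Path              State   Status   Progress
/vol/fcp_vol          Enabled  Idle    Idle for 02:18:36
/vol/nfs_vol          Enabled  Idle    Idle for 02:12:50"""

if __name__ == "__main__":
    print(parse_sis_status(SAMPLE)["/vol/nfs_vol"])  # ('Enabled', 'Idle', 'Idle for 02:12:50')
    print(deduplication_finished(SAMPLE))            # True
```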
When the process is complete, you can view the space savings from the Virtual Infrastructure Client or on the storage controller. On the storage controller, use the df -s command; Example 13-10 adds the -g option to report sizes in gigabytes.
Example 13-10 N series node
itsotuc3> df -gs /vol/nfs_vol
Filesystem        used   saved   %saved
/vol/nfs_vol        2GB    21GB     91%
The space savings of NFS volumes are available immediately and can be observed from both the storage controller and Virtual Infrastructure Client. The NFS example (Example 13-10) starts with a total of 24 GB, which is reduced to 2 GB for a total savings of 91%.
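The %saved column can be checked with simple arithmetic. Assuming that df -s reports %saved as saved / (used + saved), the NFS figures give 21 / (2 + 21) ≈ 91%. Because df -g rounds every value to whole gigabytes, the starting size (24 GB) and the used-plus-saved total (23 GB) do not match exactly. The sketch below parses a df -gs data line such as the one in Example 13-10; the function names and the unit table are illustrative assumptions.

```python
UNIT_TO_GB = {"KB": 1 / 1024 ** 2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def to_gb(field):
    """Convert a df size field such as "21GB" to gigabytes (assumed unit suffixes)."""
    for unit, factor in UNIT_TO_GB.items():
        if field.endswith(unit):
            return float(field[: -len(unit)]) * factor
    raise ValueError(f"unrecognized size field: {field!r}")


def percent_saved(used_gb, saved_gb):
    """Space saved as a percentage of the data before deduplication (used + saved)."""
    total = used_gb + saved_gb
    return 100.0 * saved_gb / total if total else 0.0


def parse_df_s_line(line):
    """Split a `df -gs` data line into (filesystem, used GB, saved GB, reported %saved)."""
    filesystem, used, saved, pct = line.split()
    return filesystem, to_gb(used), to_gb(saved), int(pct.rstrip("%"))


if __name__ == "__main__":
    fs, used, saved, reported = parse_df_s_line("/vol/nfs_vol        2GB    21GB     91%")
    print(fs, round(percent_saved(used, saved)), reported)  # /vol/nfs_vol 91 91
```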
The savings displayed on the N series node match what is shown on the ESX management console. The highlighted area in Figure 13-4 shows that 47.71 GB of space is now available on the NFS share.
Figure 13-4 Savings display
13.5.3 Deduplication of LUNs
Deduplication is effective on VMFS datastores and LUNs. However, by default, a LUN on the N series storage system reserves space in the volume equal to the size of the LUN. Deduplication cannot reduce this reservation, so even with deduplication enabled, the space savings on the LUN cannot be realized. To realize them, disable the space reservation of the LUN. This option is set on each LUN individually, either in the GUI or with the lun set reservation command.
 
Space allocation on the VMFS file system: Deduplication reduces the amount of physical storage that the LUN consumes on the storage device. However, it does not change the logical allocation of space within the VMFS file system. This differs from an NFS datastore, where space savings are realized immediately and new data can be written to the datastore. For VMFS file systems, deduplication cannot change the total amount of data that can be stored in a VMFS datastore.
After deduplication is complete, you can use the free space gained in the volume to store new data. For example, you can create a LUN in the same volume and connect it as a new datastore. Alternatively, you can shrink the existing volume and use the space saved to grow other volumes or create new ones.
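The reason the LUN savings stay hidden can be shown with a toy model of a volume that holds one LUN. This is a simplified, hypothetical sketch: it ignores snapshots, metadata, and fractional reserve (set to 0 earlier in this section), and the 5 GB of physical data left after deduplication is an invented figure used only for illustration.

```python
def volume_free_gb(volume_gb, lun_gb, lun_physical_gb, space_reserved):
    """Free space in a volume that holds a single LUN (toy model, sizes in GB).

    With space reservation enabled, the volume sets aside the full LUN size,
    so blocks freed by deduplication inside the LUN never become free space.
    Without it, only the blocks the LUN physically consumes are charged.
    """
    charged = lun_gb if space_reserved else lun_physical_gb
    return volume_gb - charged


if __name__ == "__main__":
    # 60 GB volume and 50 GB LUN as in Example 13-2; 5 GB physical use is hypothetical.
    print(volume_free_gb(60, 50, 5, space_reserved=True))   # 10
    print(volume_free_gb(60, 50, 5, space_reserved=False))  # 55
```

Only the second case exposes the savings as free volume space, which is why the reservation is disabled in Example 13-11. Even then, as the note above explains, the VMFS datastore itself does not grow.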
To disable space reservation for the LUN, run the lun set reservation <lun_path> disable command (Example 13-11).
Example 13-11 Setting LUN reservation
itsotuc3> lun set reservation /vol/fcp_vol/deduplication disable
You can now see the storage savings on the volume that contains the LUN named deduplication (Example 13-12).
Example 13-12 Storage savings displayed
itsotuc3> df -gs /vol/fcp_vol
Filesystem        used   saved   %saved
/vol/fcp_vol/       20%    21GB     91%
Unlike with NFS, the FCP savings are not visible in the VMware vCenter management console.
 