Deduplication
This chapter describes the functions of deduplication and its benefits for space savings.
Deduplication is an optional feature of Data ONTAP that significantly reduces physical storage space consumption by eliminating duplicate data blocks within a FlexVol volume.
Deduplication works at the block level on the active file system, and uses the WAFL block-sharing mechanism. Each block of data has a digital signature that is compared with all other signatures in a data volume. If an exact block match exists, the duplicate block is discarded and its disk space is reclaimed.
Deduplication removes data redundancies, as shown in Figure 19-1.
Figure 19-1 Deduplication results
You can configure deduplication operations to run automatically or on a schedule. You can deduplicate new and existing data, or only new data, on a FlexVol volume.
 
Important: Starting with Data ONTAP 8.1, you can enable the deduplication feature without adding a license. For deduplication, no limit is imposed on the supported maximum volume size. The maximum volume size limit is determined by the type of storage system regardless of whether deduplication is enabled.
 
The following topics are covered:
How deduplication works
What deduplication metadata is
Guidelines for using deduplication
Deduplication commands
Performance considerations for deduplication
How deduplication works with other features and products
19.1 How deduplication works
Deduplication operates at the block level within the entire FlexVol volume, eliminating duplicate data blocks and storing only unique data blocks.
Data ONTAP writes all data to a storage system in 4-KB blocks. When deduplication runs for the first time on a FlexVol volume with existing data, it scans all the blocks in the FlexVol volume and creates a digital fingerprint for each of the blocks. Each of the fingerprints is compared to all other fingerprints within the FlexVol volume. If two fingerprints are found to be identical, a byte-for-byte comparison is done of all data within the block. If the byte-for-byte comparison confirms that the blocks are identical, the pointer to the duplicate block is updated to reference the existing block, and the duplicate block is freed.
Figure 19-2 shows how the process works.
Figure 19-2 Fingerprints and byte-for-byte comparison
Deduplication runs on the active file system. Therefore, as additional data is written to the deduplicated volume, fingerprints are created for each new block and written to a change log file. For subsequent deduplication operations, the change log is sorted and merged with the fingerprint file, and the deduplication operation continues with fingerprint comparisons as previously described.
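The fingerprint-and-verify process described above can be sketched in Python. This is a conceptual illustration only, not Data ONTAP code: the hash function and in-memory structures are assumptions standing in for WAFL's fingerprint database and block-sharing mechanism.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # Data ONTAP writes data in 4-KB blocks

def deduplicate(blocks):
    """Return (unique_blocks, pointers): each logical block maps to an
    index into the unique-block list, mimicking WAFL block sharing."""
    by_fingerprint = defaultdict(list)  # fingerprint -> indices of unique blocks
    unique, pointers = [], []
    for block in blocks:
        fp = hashlib.sha256(block).digest()   # digital fingerprint of the block
        shared = None
        for idx in by_fingerprint[fp]:
            if unique[idx] == block:          # byte-for-byte comparison on match
                shared = idx
                break
        if shared is None:                    # no identical block: keep it
            shared = len(unique)
            unique.append(block)
            by_fingerprint[fp].append(shared)
        pointers.append(shared)               # duplicate: only the pointer is kept
    return unique, pointers

data = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"A" * BLOCK_SIZE]
unique, pointers = deduplicate(data)
print(len(unique), pointers)  # 2 [0, 1, 0]
```

The byte-for-byte comparison after a fingerprint match is what prevents a hash collision from ever sharing two blocks that merely happen to have the same fingerprint.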
19.2 What deduplication metadata is
Deduplication uses fingerprints, which are digital signatures for every 4-KB data block in a FlexVol volume. The fingerprint database and the change logs form the deduplication metadata.
The fingerprint database and the change logs used by the deduplication operation are located outside the volume and in the aggregate. Therefore, the deduplication metadata is not included in the FlexVol volume Snapshot copies.
This approach enables deduplication to achieve higher space savings. However, some of the temporary metadata files created during the deduplication operation are still placed inside the volume and are deleted only after the deduplication operation is complete. The temporary metadata files, which are created during a deduplication operation, can be locked in the Snapshot copies. These temporary metadata files remain locked until the Snapshot copies are deleted.
While deduplication can provide substantial space savings, a percentage of storage overhead is associated with it, which you need to consider when sizing a FlexVol volume.
The deduplication metadata can occupy up to 6 percent of the total logical data of the volume, as follows:
Up to 2 percent of the total logical data of the volume is placed inside the volume.
Up to 4 percent of the total logical data of the volume is placed in the aggregate.
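As a quick sizing sketch of the overhead percentages above, using a hypothetical volume with 1000 GB of logical data:

```python
logical_gb = 1000                        # example: 1000 GB of logical data in the volume
in_volume_gb = logical_gb * 2 / 100      # up to 2% placed inside the volume
in_aggregate_gb = logical_gb * 4 / 100   # up to 4% placed in the aggregate
total_gb = in_volume_gb + in_aggregate_gb  # up to 6% overall
print(in_volume_gb, in_aggregate_gb, total_gb)  # 20.0 40.0 60.0
```

So a fully populated 1000-GB volume can require up to roughly 60 GB of space for deduplication metadata, split between the volume and its containing aggregate.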
19.3 Guidelines for using deduplication
When using deduplication, remember the following guidelines about system resources and free space:
Deduplication is a background process that consumes system resources during the operation. If the data does not change very often in a FlexVol volume, it is best to run deduplication less frequently. Multiple concurrent deduplication operations running on a storage system lead to a higher consumption of system resources.
Ensure that sufficient free space exists for deduplication metadata in the volumes and aggregates. Before running deduplication for the first time, you must ensure that the aggregate has free space of at least 4 percent of the total data usage for all volumes in the aggregate, in addition to 2 percent free space in each FlexVol volume. This free space enables additional storage savings by deduplicating new blocks against blocks that already exist. If sufficient space is not available in the aggregate, the deduplication operation fails with an error message. During a deduplication failure, there is no loss of data, and the volume remains available for read/write operations. However, depending on the space availability in the aggregate, fingerprints of the newly added data might be lost.
 
Tip: Use the df command to check free space on aggregates and volumes.
You cannot increase the size of a volume that contains deduplicated data beyond the maximum supported size limit, either manually or by using the autogrow option.
You cannot enable deduplication on a volume if it is larger than the maximum volume size. However, you can enable deduplication on a volume after reducing its size within the supported size limits.
Starting with Data ONTAP 8.0, FlexVol volumes can be either 32 bit or 64 bit. All FlexVol volumes created using releases earlier than Data ONTAP 8.0 are 32-bit volumes. A 32-bit volume, like its containing 32-bit aggregate, has a maximum size of 16 TB. A 64-bit volume has a maximum size as large as its containing 64-bit aggregate (up to 100 TB, depending on the storage system model).
 
Considerations:
Even in 64-bit volumes, the maximum size for LUNs and files is still 16 TB.
If you want to create a large number of small files in a volume, 32-bit volumes provide better performance.
64-bit aggregates have a larger address space and need more memory for their metadata, compared to 32-bit aggregates. This extra memory usage reduces the amount of memory available for user data. Therefore, for workloads that are memory intensive, you might experience a slight performance impact when running the workload from a FlexVol volume contained in a 64-bit aggregate compared to running the workload from a volume contained in a 32-bit aggregate.
Workloads that are highly random in nature typically access more metadata over a given period of time than sequential workloads. Random read workloads with a very large active data set might experience a performance impact when run on a FlexVol volume in a 64-bit aggregate, compared to when run on a FlexVol volume in a 32-bit aggregate. This is because the data set size combined with the increased metadata size can increase memory pressure on the storage system and result in more on-disk I/O. In such scenarios, if you want to run the random workload from a volume contained in a 64-bit aggregate, using PAM (or PAM II) improves the performance delivered by the storage system and helps alleviate any performance impact seen with 64-bit aggregates.
Note that just having a 64-bit aggregate on the storage system does not result in any sort of performance degradation. The effects on performance, if any, are seen when data in any FlexVol volume in the 64-bit aggregate starts to be accessed.
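The free-space guideline from the start of this section (at least 4 percent of the total data of all volumes free in the aggregate, plus 2 percent free in the FlexVol volume) can be expressed as a simple pre-flight check. This is a sketch with hypothetical figures, not a Data ONTAP command:

```python
def enough_space_for_dedup(aggr_free_gb, total_data_all_vols_gb,
                           vol_free_gb, vol_data_gb):
    """Check the documented rule of thumb: the aggregate needs free space
    of at least 4% of the total data of all its volumes, and the FlexVol
    volume needs 2% free space for its own deduplication metadata."""
    need_aggr = 0.04 * total_data_all_vols_gb
    need_vol = 0.02 * vol_data_gb
    return aggr_free_gb >= need_aggr and vol_free_gb >= need_vol

# Example: 5120 GB of data across the aggregate's volumes, 1024 GB in this volume
print(enough_space_for_dedup(aggr_free_gb=250, total_data_all_vols_gb=5120,
                             vol_free_gb=30, vol_data_gb=1024))  # True
print(enough_space_for_dedup(aggr_free_gb=100, total_data_all_vols_gb=5120,
                             vol_free_gb=30, vol_data_gb=1024))  # False
```

In practice you would feed this check with the figures reported by the df command for the aggregate and the volume.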
19.4 Deduplication commands
This section describes the main deduplication commands on Data ONTAP.
19.4.1 Activating the deduplication license
You need to activate the deduplication license before enabling deduplication.
You can activate the deduplication license by using the license add command after installing Data ONTAP.
Enter the following command:
license add <license_key>
Here, license_key is the code for the deduplication license.
You also need to add the NearStore option license. Run the following command:
license add <nearstore_option_license_key>
 
Important: The deduplication license is only supported with Data ONTAP 7.2.2 or later releases up to Data ONTAP 8.x.
Starting with Data ONTAP 8.1, you can enable the deduplication feature without adding a license. Also, with Data ONTAP 8.1, no limit is imposed on the supported maximum volume size for deduplication.
The maximum volume size limit is determined by the type of storage system regardless of whether deduplication is enabled.
19.4.2 Common deduplication operations
Here we show how to perform various deduplication operations:
Enable deduplication operations with the following command:
sis on <path>
Here, path is the complete path to the FlexVol volume.
Example 19-1 Enabling deduplication in a volume
itsotuc*> sis on /vol/flexvol
SIS for "/vol/flexvol" is enabled.
Start deduplication operations with the following command:
sis start [-s] [-f] [-d] [-sp] /vol/volname
The -s option scans the volume completely, and you are prompted to confirm whether deduplication should be started on the volume.
The -f option starts deduplication on the volume without any prompts.
The -d option starts a new deduplication operation after deleting the existing checkpoint information.
The -sp option initiates a deduplication operation by using the previous checkpoint regardless of how old the checkpoint is.
View the deduplication status of a volume with the following command:
sis status -l path
Here, path is the complete path to the FlexVol; for example, /vol/flexvol.
The sis status command is the basic command to view the status of deduplication operations on a volume. For more information about the sis status command, see the sis man page. Table 19-1 lists and describes status and progress messages that you might see after running the sis status -l command.
Table 19-1 Status and progress messages after sis commands
Idle (status and progress): No active deduplication operation is in progress.
Pending (status): The limit of maximum concurrent deduplication operations allowed for a storage system or a vFiler unit is reached. Any deduplication operation requested beyond this limit is queued.
Active (status): Deduplication operations are running.
size Scanned (progress): A scan of the entire volume is running; size is the amount already scanned.
size Searched (progress): A search for duplicated data is running; size is the amount already searched.
size (pct) Done (progress): Deduplication operations have saved size of data; pct is the percentage saved of the total duplicated data that was discovered in the search stage.
size Verified (progress): A verification of the metadata of processed data blocks is running; size is the amount already verified.
pct% Merged (progress): Deduplication operations have merged pct% (percentage) of all the verified metadata of processed data blocks into an internal format that supports fast deduplication operations.
View deduplication space savings as follows:
The df -s command displays the space savings in the active file system only. Space savings in Snapshot copies are not included in the calculation.
Enter the following command to view space savings with deduplication as shown in Example 19-2:
df -s volname
Here, volname is the name of the FlexVol volume. For example, vol2.
Example 19-2 Listing saved space in a volume
itsotuc*> df -s vol2
Filesystem used saved %saved
/vol/vol2/ 82402564 17942796 18%
itsotuc*>
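The %saved column can be reproduced from the used and saved figures, because savings are measured against the logical size of the data (used plus saved). Using the numbers from Example 19-2:

```python
used_kb = 82402564   # "used" column from the df -s output in Example 19-2
saved_kb = 17942796  # "saved" column

logical_kb = used_kb + saved_kb  # what the data would occupy without deduplication
pct_saved = round(100 * saved_kb / logical_kb)
print(pct_saved)  # 18, matching the %saved column
```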
 
Tip: Using deduplication does not affect volume quotas. Quotas are reported at the logical level, and remain unchanged.
Stop deduplication operations as follows:
Enter the following command to stop the deduplication operation as shown in Example 19-3:
sis stop path
Here, path is the complete path to the FlexVol volume. For example, /vol/flexvol.
Example 19-3 Stopping deduplication in a volume
itsotuc*> sis stop /vol/flexvol
Operation is currently idle: /vol/flexvol
itsotuc*>
Disable deduplication operations as follows:
If deduplication on a specific volume has a performance impact greater than the space savings achieved, you might want to disable deduplication on that volume. If you want to remove the deduplication license from your system, you must disable deduplication before removing the license.
 – If deduplication is in progress on the volume, enter the following command to abort the operation:
sis stop path
Here, path is the complete path to the FlexVol volume. For example, /vol/vol1.
 – Enter the following command to disable the deduplication operation:
sis off path
This command stops all future deduplication operations. See Example 19-4.
Example 19-4 Disabling the deduplication in a volume
itsotuc*> sis off /vol/flexvol
SIS for "/vol/flexvol" is disabled.
itsotuc*>
 
Tip: Before removing the deduplication license, you must disable deduplication on all the FlexVol volumes, using the sis off command. Otherwise, you will receive a warning message asking you to disable this feature. Any deduplication operation that occurred before removing the license will remain unchanged.
Deduplication checkpoint feature:
The checkpoint is used to periodically log the execution process of a deduplication operation. When a deduplication operation is stopped for any reason (such as system halt, panic, reboot, or last deduplication operation failed or stopped) and checkpoint data exists, the deduplication process can resume from the latest checkpoint file.
 – You can restart from the checkpoint by using the following commands:
sis start -s
sis start (manually or automatically)
 – You can view the checkpoint by using the following command:
sis status -l
The checkpoint is created at the end of each stage or sub-stage of the deduplication process. For the sis start -s command, the checkpoint is created every hour during the scanning phase.
If a checkpoint corresponds to the scanning stage (the phase when the sis start -s command is run) and is older than 24 hours, the deduplication operation will not resume from the previous checkpoint automatically. In this case, the deduplication operation will start from the beginning. However, if you know that significant changes have not occurred in the volume since the last scan, you can force continuation from the previous checkpoint using the -sp option.
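The resume rule described above can be sketched as follows. This is an illustration of the documented policy, not ONTAP internals; the 24-hour threshold applies only to checkpoints taken during the sis start -s scanning stage, and force_sp stands in for the -sp option:

```python
SCAN_CHECKPOINT_MAX_AGE_H = 24  # documented age limit for scanning-stage checkpoints

def resume_from_checkpoint(stage, age_hours, force_sp=False):
    """Decide whether a deduplication run resumes from its checkpoint.
    force_sp mirrors the -sp option, which uses the previous checkpoint
    regardless of how old it is."""
    if force_sp:
        return True
    if stage == "scanning" and age_hours > SCAN_CHECKPOINT_MAX_AGE_H:
        return False  # too old: the operation restarts from the beginning
    return True

print(resume_from_checkpoint("scanning", 30))                 # False
print(resume_from_checkpoint("scanning", 30, force_sp=True))  # True
print(resume_from_checkpoint("searching", 30))                # True
```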
19.5 Performance considerations for deduplication
Certain factors affect the performance of deduplication. You need to check the performance impact of deduplication in a test setup, including sizing considerations, before deploying deduplication in performance-sensitive or production environments.
The following factors affect the performance of deduplication:
Application and the type of data used
The data access pattern (for example, sequential versus random access, the size and pattern of the input and output)
The amount of duplicate data, the amount of total data, and the average file size.
 
Tip: To avoid performance problems, run the first deduplication operation and monitor it. If you notice any performance degradation, abort the operation with the sis stop <vol-name> command. See Example 19-3 on page 301.
The nature of data layout in the volume
The amount of changed data between deduplication operations
 
Tip: Use the df -s command between deduplication operations to determine the amount of changed data. See Example 19-2 on page 301.
The number of concurrent deduplication operations
 
Tip: You can run a maximum of eight concurrent deduplication operations on a system. If additional deduplication operations are scheduled, those operations are queued.
Hardware platform (system memory and CPU module)
Load on the system (for example, MBps)
Disk types (for example, ATA/FC, and RPM of the disk)
19.6 How deduplication works with other features and products
When using deduplication with other features, be mindful of the following considerations.
19.6.1 Deduplication and Snapshot copies
You can run deduplication only on the active file system. However, this data can get locked in Snapshot copies created before you run deduplication, resulting in reduced space savings.
Data can get locked in Snapshot copies in two ways:
One possibility is that the Snapshot copies were created before the deduplication operation is run. You can avoid this situation by always running deduplication before Snapshot copies are created.
When the Snapshot copy is created, a part of the deduplication metadata resides in the volume and the rest of the metadata resides in the aggregate outside the volume. The fingerprint files and the change-log files that are created during the deduplication operation are placed in the aggregate and are not captured in Snapshot copies, which results in higher space savings. However, some temporary metadata files that are created during a deduplication operation are still placed inside the FlexVol; these files are deleted after the deduplication operation is complete.
These temporary metadata files can get locked in Snapshot copies if the copies are created during a deduplication operation. The metadata remains locked until the Snapshot copies are deleted. If a Snapshot copy is locked, the snap delete operation fails until you execute a snapmirror release or snapvault release command to unlock the Snapshot copy. Snapshot copies are locked because SnapMirror or SnapVault is maintaining these copies for the next update. Deleting a locked Snapshot copy will prevent SnapMirror or SnapVault from correctly replicating a file or volume as specified in the schedule you set up. Example 19-5 shows how to delete a locked SnapMirror Snapshot copy, and Example 19-6 shows how to delete a locked SnapVault Snapshot copy.
Example 19-5 Deleting a locked SnapMirror Snapshot copy
itsotuc*> snap delete vol0 oldsnap
Can't delete oldsnap: snapshot is in use by snapmirror.
Use 'snapmirror destinations -s' to find out why.
itsotuc*> snapmirror destinations -s vol0
Path Destination
/vol/vol0 itsotuc0*:vol0
itsotuc*> snapmirror release vol0 itsotuc0*:vol0
itsotuc*> snap delete vol0 oldsnap
Example 19-6 Deleting a locked SnapVault Snapshot copy
itsotuc*> snap delete vol0 oldsnap
Can't delete oldsnap: snapshot is in use by snapvault.
Use 'snapvault status -l' to find out why.
itsotuc*> snapvault status -l
SnapVault client is ON.
Source: itsotuc*:/vol/vol0/qt3
Destination itsotuc0*:/vol/sv_vol/qt3...
itsotuc*> snapvault release /vol/vol0/qt3
itsotuc0*:/vol/sv_vol/qt3
itsotuc*> snap delete vol0 oldsnap
To avoid conflicts between deduplication and Snapshot copies, follow these guidelines:
Run deduplication before creating new Snapshot copies.
Remove unnecessary Snapshot copies stored in deduplicated volumes.
Reduce the retention time of Snapshot copies stored in deduplicated volumes.
Schedule deduplication only after significant new data has been written to the volume.
Configure appropriate reserve space for the Snapshot copies.
If snap reserve is 0, turn off the schedule for automatic creation of Snapshot copies (which is the case in most LUN deployments).
 
Tips:
To check the snap reserve value set, run the command:
snap reserve <vol-name>
To disable the automatic Snapshot copy, run the command:
vol options <vol-name> nosnap on
To enable the automatic Snapshot copy, run the command:
vol options <vol-name> nosnap off
19.6.2 Deduplication and volume SnapMirror
You can use volume SnapMirror to replicate a deduplicated volume.
When using volume SnapMirror with deduplication, consider the following information:
You need to enable both the deduplication and SnapMirror licenses.
 
Tips:
To enable the deduplication license, see 19.4.1, “Activating the deduplication license” on page 299
To enable the SnapMirror license, do the same procedure, entering the command:
license add xxxxxx, where xxxxxx is the license code you purchased.
You can enable deduplication on the source system, the destination system, or both systems.
 
Attention: A deduplication license is not required on the destination storage system. However, if the primary storage system is not available and the secondary storage system becomes the new primary, deduplication must be licensed on the secondary storage system for deduplication to continue. Therefore, you might want to license deduplication on both storage systems.
You can enable, run, and manage deduplication only from the primary storage system. However, the FlexVol volume in the secondary storage system inherits all the deduplication attributes and storage savings through SnapMirror:
The shared blocks are transferred only once. Therefore, deduplication also reduces the use of network bandwidth. The fingerprint database and the change logs that the deduplication process uses are located outside a volume, in the aggregate. Therefore, volume SnapMirror does not transfer the fingerprint database and change logs to the destination. This change provides additional network bandwidth savings.
If the source and destination volumes are on different storage system models, they might have different maximum volume sizes. The lower maximum applies. When creating a SnapMirror relationship between two different storage system models, ensure that the maximum volume size with deduplication is set to the lower maximum volume size limit of the two models.
The volume SnapMirror update schedule does not depend on the deduplication schedule. When configuring volume SnapMirror and deduplication, you need to coordinate the deduplication schedule and the volume SnapMirror schedule. Start the volume SnapMirror transfers of a deduplicated volume after the deduplication operation is complete. This schedule prevents the sending of undeduplicated data and additional temporary metadata files over the network. If the temporary metadata files in the source volume are locked in Snapshot copies, these files consume extra space in the source and destination volumes. Volumes whose size has been reduced to within the limit supported by deduplication can be part of the SnapMirror primary storage system and the secondary storage system.
19.6.3 Deduplication and qtree SnapMirror
You can use deduplication for volumes that use qtree SnapMirror.
Deduplication operations are supported with qtree SnapMirror. Qtree SnapMirror does not automatically initiate a deduplication operation at the completion of every individual qtree SnapMirror transfer. You can set up a deduplication schedule independent of your qtree SnapMirror transfer schedule.
 
Reference: The sis config command is used to configure and view deduplication schedules for flexible volumes. For more details about the sis command, see the
IBM System Storage N series Software Guide, SG24-7129.
When using qtree SnapMirror with deduplication, consider the following information:
You need to enable both the deduplication and SnapMirror licenses.
 
Tip: You can enable deduplication on the source system, the destination system, or both systems.
Even when deduplication is enabled on the source system, duplicate blocks are sent to the destination system. Therefore, no network bandwidth savings are achieved.
To recognize space savings on the destination system, run deduplication on the destination after the qtree SnapMirror transfer is complete.
You can set up a deduplication schedule independently of the qtree SnapMirror schedule. For example, on the destination system, the deduplication process does not start automatically after qtree SnapMirror transfers are finished.
Qtree SnapMirror recognizes deduplicated blocks as changed blocks. Therefore, when you run deduplication on an existing qtree SnapMirror source system for the first time, all the deduplicated blocks are transferred to the destination system. This process might result in a transfer several times larger than the regular transfers.
When using qtree SnapMirror with deduplication, ensure that qtree SnapMirror uses only the minimum number of Snapshot copies that it requires. To ensure this minimum, retain only the latest Snapshot copies.
19.6.4 Deduplication and SnapVault
The deduplication feature is integrated with the SnapVault secondary license. This feature increases the efficiency of data backup and improves the use of secondary storage.
The behavior of deduplication with SnapVault is similar to the behavior of deduplication with qtree SnapMirror, with the following exceptions:
Deduplication is also supported on the SnapVault destination volume.
The deduplication schedule depends on the SnapVault update schedule on the destination system. However, the deduplication schedule on the source system does not depend on the SnapVault update schedule, and it can be configured independently on a volume.
Every SnapVault update (baseline or incremental) starts a deduplication process on the destination system after the archival Snapshot copy is taken.
A new Snapshot copy replaces the archival Snapshot copy after deduplication finishes running on the destination system. (The name of this new Snapshot copy is the same as that of the archival copy, but the Snapshot copy uses a new timestamp, which is the creation time.)
You cannot configure the deduplication schedule on the destination system manually or run the sis start command. However, you can run the sis start -s command on the destination system as shown in Example 19-7.
Example 19-7 Starting deduplication in a volume
itsotuc*> sis start -s /vol/flexvol01
The file system will be scanned to process existing data in /vol/flexvol01.
This operation may initialize related existing metafiles.
Are you sure you want to proceed (y/n)? y
The SIS operation for "/vol/flexvol01" is started.
The SnapVault update does not depend on the deduplication operation. A subsequent incremental update is allowed to continue while the deduplication operation on the destination volume from the previous backup is still in progress. In this case, the deduplication operation continues; however, the archival Snapshot copy is not replaced after the deduplication operation is complete.
The SnapVault update recognizes the deduplicated blocks as changed blocks. Thus, when deduplication is run on an existing SnapVault source for the first time, all saved space is transferred to the destination system. The size of the transfer might be several times larger than the regular transfers. Running deduplication on the source system periodically helps prevent this issue for future SnapVault transfers. Run deduplication before the SnapVault baseline transfer.
 
Tip: You can run a maximum of eight concurrent deduplication operations on a system. This number includes the deduplication operations linked to SnapVault volumes and those that are not linked to SnapVault volumes.
19.6.5 Deduplication and SnapRestore
The metadata created during a deduplication operation is located in the aggregate. Therefore, when you initiate a SnapRestore operation on a volume, the metadata is not restored to the active file system. The restored data, however, retains the original space savings.
After a SnapRestore operation, if deduplication is enabled on the volume, any new data written to the volume continues to be deduplicated. However, space savings is obtained for only the new data.
To run deduplication for all the data on the volume, use the sis start -s command.
This command builds the fingerprint database for all the data in the volume. The amount of time this process takes depends on the size of the logical data in the volume. Before using the sis start -s command, you must ensure that the volume and the aggregate containing the volume have sufficient free space for the deduplication metadata. See 19.3, “Guidelines for using deduplication” on page 298.
19.6.6 Deduplication and volume copy
Volume copy is a method of copying both data in the active file system and data in storage systems from one volume to another. The source and destination volumes must both be FlexVol volumes.
When deduplicated data is copied by using the vol copy command, the copy of the data at the destination inherits all the deduplication attributes and storage savings of the source data.
 
Tip: When using the vol copy command, the destination volume must be in restrict mode. Use the vol restrict command to restrict it and allow the vol copy command.
The metadata created during a deduplication operation (fingerprint files and changelog files) are located outside the volume in the aggregate. Therefore, when you run the volume copy operation on a volume, the fingerprint files and change-log files are not restored to the active file system. After a volume copy operation, if deduplication is enabled on the volume, any new data written to the volume continues to be deduplicated. However, space savings are only obtained for the new data.
To run deduplication for all the data on the volume, use the sis start -s command.
This command builds the fingerprint database for all the data in the volume. The amount of time this process takes depends on the size of the logical data in the volume. Before using the sis start -s command, you must ensure that the volume and the aggregate containing the volume have sufficient free space for deduplication metadata. See 19.3, “Guidelines for using deduplication” on page 298.
19.6.7 Deduplication and FlexClone volumes
Deduplication is supported on FlexClone volumes. FlexClone volumes are writable clones of a parent FlexVol volume.
The FlexClone volume of a deduplicated volume is a deduplicated volume. The cloned volume inherits the deduplication configuration of the parent volume (for example, deduplication schedules).
The FlexClone volume of a non-deduplicated volume is a non-deduplicated volume. If you run deduplication on a clone volume, the clone is deduplicated, but the original volume remains nondeduplicated.
The metadata created during a deduplication operation (fingerprint files and change-log files) are located outside the volume in the aggregate; therefore, they are not cloned. However, the data retains the space savings of the original data.
Any new data written to the destination volume continues to be deduplicated and fingerprint files for the new data are created. Space savings is only obtained for the new data.
To run deduplication for all the data on the cloned volume, use the sis start -s command. The time the process takes to finish depends on the size of the logical data in the volume.
When a cloned volume is split from the parent volume, deduplication of all data in the clone that was part of the parent volume is undone after the volume-split operation. However, if deduplication is running on the clone volume, the data is deduplicated in the subsequent deduplication operation.
19.6.8 Deduplication in a High Availability pair
You can activate deduplication in a High Availability (HA) pair.
The maximum number of concurrent deduplication operations allowed on each node of an HA pair is eight. If one of the nodes fails, the other node takes over the operations of the failed node. In takeover mode, the working node continues with its deduplication operations as usual. However, the working node does not start any deduplication operations on the failed node.
 
Attention: Change logging for volumes with deduplication continues for the failed node in takeover mode. Therefore, you can perform deduplication operations on data written during takeover mode after the failed node is active, and there is no loss in space savings. To disable change logging for volumes that belong to a failed node, you can turn off deduplication on those volumes. You can also view the status of volumes with deduplication for a failed node in takeover mode.
19.6.9 Deduplication and VMware
You can run deduplication in VMware environments for efficient space savings.
While planning the Virtual Machine Disk (VMDK) and data store layouts, follow these guidelines:
Operating system VMDKs deduplicate efficiently because the binary files, patches, and drivers are highly redundant between virtual machines. You can achieve maximum savings by keeping these VMDKs in the same volume.
Application binary VMDKs deduplicate to varying degrees. Applications from the same vendor commonly have similar libraries installed; therefore, you can achieve moderate deduplication savings. Applications written by different vendors do not deduplicate at all.
Application datasets when deduplicated have varying levels of space savings and performance impact based on the application and intended use. Carefully consider what application data needs to be deduplicated.
Transient and temporary data, such as VM swap files, pagefiles, and user and system temp directories, does not deduplicate well and potentially adds significant performance impact when deduplicated. Therefore, it is best to keep this data on a separate VMDK and volume that are not deduplicated.
Application data has a major effect on the percentage of storage savings achieved with deduplication.
New installations typically achieve large deduplication savings.
 
Important: In VMware environments, proper partitioning and alignment of the VMDKs is important. Applications whose performance is impacted by deduplication operations are likely to have the same performance impact when you run deduplication in a VMware environment.
19.6.10 Deduplication and MultiStore
Deduplication commands are available in all the vFiler contexts. Deduplication support on vFiler units allows users to reduce redundant data blocks within vFiler units.
You can enable deduplication only on FlexVol volumes in a vFiler unit. Deduplication support on vFiler units ensures that volumes owned by a vFiler unit are not accessible to another vFiler unit.
Deduplication also supports disaster recovery and migration of vFiler units. If you enable deduplication on the volume in the source vFiler unit, the destination vFiler unit inherits all deduplication attributes.
You must license deduplication on the primary storage system. It is best that you also license deduplication on the secondary storage system. These licenses ensure that deduplication operations can continue without any disruption in case a failure causes the secondary vFiler unit to become the primary storage system.
To use the deduplication feature, activate the following licenses on the storage system:
multistore
a_sis
 
Licenses: See 19.4.1, “Activating the deduplication license” on page 299 to activate licenses for deduplication operations.
You can run deduplication commands using the RSH or SSH protocol. Any request is routed to the IP address and IP space of the destination vFiler unit.
 