IBM FlashSystem 900 client host attachment and implementation
This chapter provides installation, implementation, and other general information and guidelines for connecting client host systems to the IBM FlashSystem 900 (FlashSystem 900).
This chapter covers the following topics:
FlashSystem 900 sector size and host block size considerations
Partition and file alignment for best performance
Multipath support implementation for various host operating systems
Necessary drivers for several operating systems
Host integration for various operating systems
 
Notes about IBM SAN Volume Controller and IBM Spectrum Virtualize:
Some of the following sections mention IBM SAN Volume Controller, which delivers the functions of IBM Spectrum Virtualize, part of the IBM Spectrum Storage family.
IBM Spectrum Virtualize is industry-leading storage virtualization that enhances existing storage to improve resource utilization and productivity so you can achieve a simpler, more scalable and cost-efficient IT infrastructure. The functionality of IBM Spectrum Virtualize is provided by IBM SAN Volume Controller.
For more details, see the following web page:
5.1 Host implementation and procedures
The procedures to connect the IBM FlashSystem 900 to client hosts using various operating systems are described in the following sections.
5.2 Host connectivity
The IBM FlashSystem 900 can be attached to a client host by four methods:
Fibre Channel (FC)
Fibre Channel over Ethernet (FCoE)
InfiniBand
iSCSI
Always check the IBM System Storage Interoperation Center (SSIC) to get the latest information about supported operating systems, hosts, switches, and more:
If a configuration that you want is not available on the SSIC, request approval from IBM by submitting a Solution for Compliance in a Regulated Environment (SCORE) or request for price quotation (RPQ). To submit a SCORE/RPQ, contact your IBM FlashSystem marketing representative or IBM Business Partner.
The IBM FlashSystem 900 can be SAN-attached by using a switch or can be directly attached to a client host. Check the IBM SSIC for specific details. Several operating system and Fibre Channel (FC) driver combinations allow point-to-point direct access with 16 Gbps FC. Check your environment against the SSIC before you use 16 Gbps direct attachment to the host.
 
Note: The FlashSystem 900 16 Gbps FC attachment does not support arbitrated loop topology (direct connection to client hosts). The IBM FlashSystem 900 must be connected to a SAN switch when using 16 Gbps FC if the host operating system does not support point-to-point FC direct connections. At the time that this book was written, IBM AIX did not support point-to-point FC direct connections.
5.2.1 Fibre Channel SAN attachment
If you attach a host using a SAN switch to the FlashSystem 900, ensure that each host port is connected and zoned to both canisters of the FlashSystem 900. If only one FlashSystem 900 canister is connected to a host port, the host state will be shown as degraded. This will be referred to in the remainder of this chapter as the switch rule.
 
Note: When you use a switch, you must zone host ports according to the switch rule. With the exception of IBM i, a host port must be connected to each FlashSystem 900 canister. For IBM i, different switch zoning rules apply as described in 5.3.4, “IBM i and FlashSystem 900” on page 109.
Figure 5-1 on page 103 shows the correct SAN connection of an AIX server with two ports to the FlashSystem 900. In this example, four zones are set up:
AIX port 8a and FlashSystem 900 port 41
AIX port 8a and FlashSystem 900 port 61
AIX port 27 and FlashSystem 900 port 51
AIX port 27 and FlashSystem 900 port 71
Figure 5-1 SAN attachment
5.2.2 Fibre Channel direct attachment
If you attach the FlashSystem 900 directly to a host, the host must be attached to both canisters. If the host is not attached to both canisters, the host will be shown as degraded.
Figure 5-2 shows the correct direct attachment of an AIX server with two ports to the FlashSystem 900. This example shows two connections:
AIX port 8a directly attached to FlashSystem 900 port 41
AIX port 27 directly attached to FlashSystem 900 port 71
Figure 5-2 Direct attachment
If you use SAN attachment and direct attachment simultaneously on a FlashSystem 900, the direct-attached host state will be degraded. Using a switch will enforce the switch rule for all attached hosts, which means that a host port has to be connected to both FlashSystem canisters. Because a direct-attached host cannot connect one port to both canisters, it will not meet the switch rule and its state will be degraded.
 
Note: You can attach a host through a switch and simultaneously attach a host directly to the FlashSystem 900. But then, the direct-attached host will be shown as degraded.
5.2.3 General Fibre Channel attachment rules
These rules apply to FC connections:
If directly attached, a host must have ports connected to both canisters.
If connected to a switch, all host ports must have paths to both canisters. This is the switch rule.
If any FlashSystem 900 port is connected to a switch, the switch rule applies to all hosts except IBM i, regardless of whether a particular host is connected through a switch or attached directly.
5.3 Operating system connectivity and preferred practices
Detailed information about the IBM FlashSystem 900 client host connections using various operating systems is described in the following sections.
5.3.1 FlashSystem 900 sector size
In a traditional spinning disk, a sector refers to a physical part of the disk. The size of the sector is defined by the disk manufacturer and most often set to 512 bytes. The 512-byte sector size is supported by most operating systems.
The FlashSystem 900 does not have fixed physical sectors like spinning disks do. Data is written in the most effective way on flash. To maintain compatibility, however, the sector size matches that of most traditional spinning disks. Therefore, the default sector size that the FlashSystem 900 presents to the host is 512 bytes. Starting with firmware version 1.1.3.0, you can create volumes with a sector size of 4096 bytes by using the CLI mkvdisk command and the new -blocksize parameter. For details about this command, see the FlashSystem 900 web page in the IBM Knowledge Center:
The mkvdisk command -blocksize parameter specifies the SCSI logical unit sector size. The only two possible values are 512 (the default) and 4096:
Size 512 is the default. It is supported by most operating systems.
Size 4096 provides better performance but it might not be supported by your host operating system or application.
 
Note: Format all client host file systems on the storage system at 4 KB or at a multiple of 4 KB. Do this regardless of whether the sector size is 512 or 4096 bytes. For example, file systems that are formatted at an 8 KB allocation size or a 64 KB allocation size are satisfactory because those sizes are multiples of 4 KB.
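As an illustration only, creating and then listing a volume with 4096-byte sectors from the CLI might look like the following sketch. The volume name and capacity are placeholders, and the -name, -size, and -unit parameters are assumed to be available as described in the FlashSystem 900 command reference:
mkvdisk -name <volume name> -size <capacity> -unit gb -blocksize 4096
lsvdisk <volume name>
Use the lsvdisk output to confirm the attributes of the new volume before mapping it to a host.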
5.3.2 File alignment for the best RAID performance
File system alignment can improve performance for storage systems that use a RAID storage mode. File system alignment is a technique that matches file system I/O requests with important block boundaries in the physical storage system. Alignment is important in any system that implements a RAID layout. I/O requests that fall within the boundaries of a single stripe have better performance than I/O requests that affect multiple stripes. When an I/O request crosses the endpoint of one stripe into another stripe, the controller must modify both stripes to maintain their consistency.
Unaligned accesses include those requests that start at an address that is not divisible by 4 KB, or are not a multiple of 4 KB in size. These unaligned accesses are serviced at much higher response times, and they can also significantly reduce the performance of aligned accesses that were issued in parallel.
The IBM FlashSystem 900 provides 512-byte sector size support that greatly improves response times for I/O requests that cannot be forcibly aligned. However, alignment to 4 KB must be maintained whenever possible.
5.3.3 IBM AIX and FlashSystem 900
The IBM FlashSystem 900 can be attached to AIX client hosts by using Fibre Channel (FC).
The IBM FlashSystem 900 connects to AIX natively or through Node Port Identifier Virtualization (NPIV) and Virtual I/O Server (VIOS) modes.
Directly attached Fibre Channel topology for AIX
Configure the FlashSystem 900 FC controllers to arbitrated loop topology when the controllers are directly attached to the AIX hosts. You must check the SSIC for supported configurations. For more details about SSIC, see 5.2, “Host connectivity” on page 102.
 
Note: The FlashSystem 900 16 Gbps FC ports do not support direct connection to AIX client hosts. A SAN switch must be placed between the IBM FlashSystem 900 and any 16 Gbps-attached client host. If arbitrated loop is required by the client host, connect at 8 Gbps FC to the IBM FlashSystem 900.
Optimal logical unit number configurations for AIX
The number of logical unit numbers (LUNs) that you create on the IBM FlashSystem 900 can affect the overall performance of AIX.
Applications perform optimally if at least 32 LUNs are used in a volume group. If an application requires fewer volumes, use the Logical Volume Manager (LVM) to map the smaller number of logical volumes onto 32 logical units. This does not affect performance in any significant manner (LVM overhead is small).
 
Note: Use at least 32 LUNs in a volume group because this number is the best balance between good performance (the more I/Os that can be queued, the better the FlashSystem 900 performs) and minimizing overhead and complexity.
Sector size restrictions for AIX
The AIX operating system supports the 512-byte sector size, which the IBM FlashSystem 900 supports.
Auto Contingent Allegiance support
Certain host systems require the Auto Contingent Allegiance (ACA) support to run multiple concurrent commands. When using the round-robin multipathing algorithm, IBM AIX sends out extraneous ACA task management commands. ACA support on logical units is always enabled on the IBM FlashSystem 900.
Volume alignment
The IBM AIX operating system volumes align to 4 KB boundaries.
Implementing multipathing for IBM AIX hosts
Multipathing enables the host to access the FlashSystem 900 LUNs through different paths. This architecture helps to protect against I/O failures, such as port, cable, or other path issues.
 
Important: The latest updates for multipathing support on the IBM AIX operating system are at IBM Fix Central:
Resetting the host bus adapter and disk configuration
The following sections describe how to reconfigure the host bus adapters (HBAs) to implement multipathing. After you install the latest IBM AIX updates for support of the IBM FlashSystem 900, AIX must rescan the SCSI bus for the LUNs to recognize them as devices that support multipathing. Begin by reconfiguring the HBA and its attached disks.
To reset the HBA or disk configuration, complete the following steps:
 
Important: If other disks are attached to any HBA devices, the following commands remove the configuration for those disks and the HBA. If you are attempting to save the current configuration, skip these steps.
1. Determine the device names of the HBAs to which the storage system is connected, by entering this command:
lsdev -t efscsi
2. For each HBA device name, enter the following command to remove the HBA and the disk configuration that is associated with it:
rmdev -l <device name> -R
3. Determine whether any disks are already defined that must be removed before rescanning, by entering this command:
lsdev -C -c disk
4. If any LUNs are already defined as Other FC SCSI Disk Drive, remove the old definitions. For each disk name, enter the following command:
rmdev -l <disk name> -R
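If several HBAs and stale disk definitions must be removed, the preceding steps can be combined into a short shell sequence. This is only a sketch; it assumes that no other disks share the affected HBAs (see the Important note above) and that the stale LUNs are listed as Other FC SCSI Disk Drive, as in step 4:
# remove the configuration of each HBA protocol device and its attached disks
for hba in $(lsdev -t efscsi -F name); do
    rmdev -l $hba -R
done
# remove stale LUN definitions that are still listed as Other FC SCSI Disk Drive
for dsk in $(lsdev -C -c disk | grep "Other FC SCSI Disk Drive" | awk '{print $1}'); do
    rmdev -l $dsk -R
done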
Setting the fast fail recovery flag for the host bus adapter
You can set the fast fail recovery flag for the HBA to improve the failover response.
For the multipath I/O (MPIO) driver to fail over to an available path in a timely manner after a path failure, set the fast_fail recovery flag for the HBA devices to which the storage system is connected.
At a command prompt, enter this command:
chdev -a fc_err_recov=fast_fail -l <device name>
The <device name> is the device name of the HBA that is connected to the system.
Rescanning for the storage system logical unit numbers
After the host system is configured to recognize that the storage device supports multipathing, you must rescan for the LUNs.
At a command prompt, enter this command:
cfgmgr -vl <device name>
The <device name> is the device name of the HBA connected to the system.
Confirming the configuration
After you change the configuration to support multipathing, confirm that the configuration is working correctly.
To confirm the new configuration, complete the following steps:
1. Ensure that the configuration is successful by entering the following command to list all disks available to the system:
lsdev -C -c disk
All LUNs must use MPIO. They must show as MPIO IBM FlashSystem Disk.
The following command gives you detailed information about a LUN:
lscfg -vl <LUN>
Example 5-1 on page 108 shows the output of those two commands. This AIX system has four disks attached:
 – FlashSystem 820 LUN using MPIO
 – A LUN without multipathing
 – Another LUN without multipathing
 – FlashSystem 900 LUN using MPIO
Both IBM FlashSystem units are shown as MPIO IBM FlashSystem Disks. However, you can see the different models when you look at the Machine Type and Model attribute of the lscfg command output; for example, the fourth LUN is a FlashSystem 900.
Some output lines were removed for clarity (Example 5-1 on page 108).
Example 5-1 Check the AIX MPIO configuration
# lsdev -C -c disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available 00-00-02 MPIO IBM FlashSystem Disk
hdisk2 Available 00-01-02 Other FC SCSI Disk Drive
hdisk3 Available 01-01-02 Other FC SCSI Disk Drive
hdisk4 Available 00-01-02 MPIO IBM FlashSystem Disk
 
# lscfg -vl hdisk1
hdisk1 U78C0.001.DBJ2497-P2-C1-T1-W20040020C2117377-L0 MPIO IBM FlashSystem Disk
 
Manufacturer................IBM
Machine Type and Model......FlashSystem
...
 
# lscfg -vl hdisk4
hdisk4 U78C0.001.DBJ2497-P2-C1-T2-W500507605EFE0AD1-L0 MPIO IBM FlashSystem Disk
 
Manufacturer................IBM
Machine Type and Model......FlashSystem-9840
...
2. If disks are missing, or are extra, or the LUNs do not show as an MPIO IBM FlashSystem Disk, check that the connections and the storage system configuration are correct. You must then remove the configuration for the HBAs and complete the rescan again. For more information, see “Resetting the host bus adapter and disk configuration” on page 106.
3. To ensure that all the connected paths are visible, enter the following command:
lspath
Example 5-2 shows the paths for the IBM FlashSystem 900 used in Example 5-1.
Example 5-2 AIX lspath output
# lspath -l hdisk4
Enabled hdisk4 fscsi1
Enabled hdisk4 fscsi1
Enabled hdisk4 fscsi3
Enabled hdisk4 fscsi3
4. If paths are missing, check that the connections and the storage system configuration are correct. You must then remove the configuration for the HBAs and perform the rescan again. For more information, see “Resetting the host bus adapter and disk configuration” on page 106.
Configuring path settings
All paths on the IBM FlashSystem 900 are equal. All ports have access to the LUNs, and there is no prioritized port. Therefore, you can use them all at the same time. Set the distribution of the I/O load at the operating system level. The round-robin distribution is the ideal way to use all of the ports equally.
Set the algorithm attribute to round_robin before you add the hdisk to any volume group. All outgoing traffic is then spread evenly across all of the ports, as shown in the following example:
chdev -l <LUN> -a algorithm=round_robin
The shortest_queue algorithm is available in the latest technology levels of AIX for some devices. The algorithm behaves similarly to round_robin when the load is light. When the load increases, this algorithm favors the path that has the fewest active I/O operations. Therefore, if one path is slow because of congestion in the SAN, the other, less congested paths are used for more of the I/O operations. Using shortest_queue (if available) or round_robin enables the maximum use of the SAN resources. You can use the load_balance algorithm to spread the load equally across the paths.
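If your AIX level provides shortest_queue, it is set with the same chdev approach as round_robin. The following line is a sketch that uses the example hdisk from Example 5-1; as with round_robin, set the attribute before the hdisk is added to a volume group:
chdev -l hdisk4 -a algorithm=shortest_queue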
To list the attributes of the LUN, enter the following command:
lsattr -El <LUN>
Example 5-3 shows the output of the chdev and lsattr commands for the FlashSystem 900 that is used in Example 5-1 on page 108. Spacing in the output is adjusted for clarity.
Example 5-3 AIX chdev and lsattr commands
# chdev -l hdisk4 -a algorithm=round_robin
hdisk4 changed
 
# lsattr -El hdisk4
PCM PCM/friend/fcpother Path Control Module False
PR_key_value none Persistent Reserve Key Value True+
algorithm round_robin Algorithm True+
clr_q no Device CLEARS its Queue on error True
dist_err_pcnt 0 Distributed Error Percentage True
dist_tw_width 50 Distributed Error Sample Time True
hcheck_cmd test_unit_rdy Health Check Command True+
hcheck_interval 60 Health Check Interval True+
hcheck_mode nonactive Health Check Mode True+
location Location Label True+
lun_id 0x0 Logical Unit Number ID False
lun_reset_spt yes LUN Reset Supported True
max_coalesce 0x40000 Maximum Coalesce Size True
max_retry_delay 60 Maximum Quiesce Time True
max_transfer 0x80000 Maximum TRANSFER Size True
node_name 0x500507605efe0ad0 FC Node Name False
pvid none Physical volume identifier False
q_err yes Use QERR bit True
q_type simple Queuing TYPE True
queue_depth 64 Queue DEPTH True
reassign_to 120 REASSIGN time out value True
reserve_policy no_reserve Reserve Policy True+
rw_timeout 30 READ/WRITE time out value True
scsi_id 0x10100 SCSI ID False
start_timeout 60 START unit time out value True
timeout_policy fail_path Timeout Policy True+
unique_id 54361IBM FlashSystem-9840041263a20412-0000-0004-00006410FlashSystem-984003IBMfcp Unique device identifier False
ww_name 0x500507605e800e41 FC World Wide Name False
5.3.4 IBM i and FlashSystem 900
This section covers the attachment methods, zoning rules, hardware and software requirements, and configuration and performance considerations for IBM i and FlashSystem 900.
Attachment methods
The IBM FlashSystem 900 can be attached to IBM i hosts by using one of the following Fibre Channel (FC) methods:
NPIV attachment through the IBM Virtual I/O Server (VIOS) by using Node Port Identifier Virtualization (NPIV), together with the required NPIV-capable SAN switch.
Native attachment, that is, without the IBM VIOS, either with or without a SAN switch. The latter (without a SAN switch) is also known as direct attachment.
 
Note: IBM i attachment to the IBM FlashSystem 900 through the IBM VIOS using virtual SCSI is not supported.
Zoning rules
The same rules apply for IBM i direct-attachment to an IBM FlashSystem 900 as for other host operating systems. However, different zoning rules apply for SAN switch-attached IBM i host Fibre Channel initiators. Unlike with other operating systems, these initiators must not be zoned to both FlashSystem canisters.
For IBM FlashSystem 900 native or NPIV SAN switch attachment, use one-to-one zoning so that one IBM i Fibre Channel initiator port is zoned with one FlashSystem target port from a single FlashSystem canister, as shown on the right in Figure 5-3. Note that this also differs from IBM FlashSystem V9000, IBM Storwize series, or IBM SAN Volume Controller attachment, where a single IBM i initiator should be zoned to both storage controller nodes to support the SCSI Asymmetrical Logical Unit Access (ALUA) LUN affinity concept of these controller nodes, with preferred (active) and non-preferred (passive) paths. With the IBM FlashSystem 900, all paths are active.
Figure 5-3 IBM i FlashSystem 900 attachment and zoning rules
Hardware and software requirements
The minimum hardware and software requirements for IBM i attachment to the IBM FlashSystem 900 are summarized in Table 5-1 on page 111. For further interoperability information, see the IBM System Storage Interoperation Center (SSIC):
Table 5-1 IBM i and IBM FlashSystem 900 Minimum Requirements
IBM i version and release: IBM i 7.2 Technology Refresh 2 or later plus the latest HIPER PTF group (see note 1)
IBM Power Systems server: IBM POWER7® firmware level FW780_40 or later; IBM POWER8® firmware level FW810 or later (see note 2)
Attachment: VIOS NPIV; native using switches; native direct (see note 3)
VIOS level: 2.2.3.4 or later
Fibre Channel adapters and Fibre Channel over Ethernet adapters:
Native connection or VIOS NPIV connection:
8 Gb 2-port FC adapter #5735/#5273
8 Gb 2-port FC adapter #EN0G/#EN0F (VIOS NPIV only)
8 Gb 4-port FC adapter #5729 (VIOS NPIV only)
8 Gb 4-port FC adapter #EN12/#EN0Y (VIOS NPIV only)
16 Gb 2-port FC adapter #EN0A/#EN0B
VIOS NPIV connection via FCoE:
2 x 10 Gb FCoE and 2 x 1 GbE SFP+ adapter #EN0H/#EN0J
4-port (10 Gb FCoE and 1 GbE) LR and RJ45 adapter #EN0M/#EN0N
4-port (10 Gb FCoE and 1 GbE) copper and RJ45 adapter #EN0K/#EN0L
SAN switches: Brocade or Cisco
FlashSystem firmware: 1.2.0.11 or later

Notes:
1. See the IBM TechNote PTF listing for 4096 disk sector support at 7.2 with D/TD840 or D/T6B4E-050 drives for additional information about recommended PTFs:
http://www.ibm.com/support/docview.wss?uid=nas8N1020957
2. Attachment of IBM FlashSystem 900 to IBM i with Boot from SAN is supported only on Power Systems models that support the stated firmware levels.
3. Direct attachment can be done with 8 Gb ports in IBM i and FlashSystem (the 8 Gb ports in the FlashSystem must be configured as Fibre Channel Arbitrated Loop), or with 16 Gb ports in IBM i and FlashSystem.
Configuration and performance considerations
Any volume (LUN) configured and attached natively or through a VIOS NPIV connection to an IBM i partition must be created with 4096-byte sectors by using the FlashSystem CLI command mkvdisk with the parameter -blocksize 4096, or by using GUI version 1.3 or later, which supports the 4096-byte block size as shown in Figure 5-4 on page 112. IBM i compresses the tag information of its 4160-byte pages so that they fit into the 4096-byte sector format supported by the IBM FlashSystem 900.
Figure 5-4 FlashSystem 900 GUI creating 4096 bytes sector LUNs
 
Note: The 512 byte sector LUNs on an IBM FlashSystem 900 are not supported by IBM i unless this FlashSystem is virtualized by an IBM SAN Volume Controller.
Because IBM FlashSystem 900 generally expects a host initiator to log into both of its canisters, which is not applicable to IBM i, the IBM i host is shown with a state of degraded by the FlashSystem as shown in Figure 5-5. This is not a failure indication in this case and should be ignored.
Figure 5-5 FlashSystem 900 GUI reported IBM i Host State
As with other SAN storage systems, choose a moderate LUN size for IBM i on the IBM FlashSystem 900, in the approximate range of 40 - 300 GB, because IBM i uses a fixed queue depth per disk unit and path. I/O concurrency and performance typically benefit from having a reasonable number of LUNs configured in an IBM i auxiliary storage pool (ASP), especially for applications known to perform many file creates, opens, and closes.
Up to 64 LUNs are supported per IBM i physical or virtual Fibre Channel adapter port.
For further details about IBM PowerVM® Virtual I/O Server planning and implementation including NPIV attachment for IBM i see IBM PowerVM Virtualization Introduction and Configuration, SG24-7940.
With IBM PowerVM Virtual I/O Server NPIV attachment, usually no storage performance tunable parameters are available because VIOS merely acts as a Fibre Channel I/O pass-through from its owned physical Fibre Channel adapter through the IBM Power Systems hypervisor to a VIOS client partition such as IBM i. VIOS does not even “see” the NPIV client LUNs and thus it does not perform I/O multi-pathing for them, which should be done by the IBM i client, preferably across two VIOS partitions as shown in Figure 5-6 on page 113.
Figure 5-6 IBM i NPIV-attachment with two redundant Virtual I/O Servers
For VIOS attachment (NPIV attachment), the Fibre Channel adapter queue depth is a storage performance-related tunable parameter. Consider increasing it if you are not already using VIOS version 2.2.4.10 or later with its default rule set deployed for applying the IBM recommended device settings.
Example 5-4 shows how to display the current (adapter model-dependent) setting of the adapter queue depth and its allowed range, and how to increase it to its maximum supported value. Because the fcsX FC adapter port resource is usually in use, as implied in the example, VIOS still needs to be restarted for the permanent change of the adapter queue depth, which is made to its resource database, to become effective.
Example 5-4 Displaying and changing the VIOS FC adapter queue depth
$ lsdev -dev fcs0 -attr | grep num_cmd_elems
num_cmd_elems 500 Maximum number of COMMANDS to queue to the adapter True
 
$ lsdev -dev fcs0 -range num_cmd_elems
20...4096 (+1)
 
$ chdev -dev fcs0 -perm -attr num_cmd_elems=4096
fcs0 changed
For both native and VIOS NPIV-attachment, the IBM FlashSystem 900 LUNs report to IBM i as device type D840 disk units, as shown in Figure 5-7 on page 114.
Figure 5-7 on page 114 shows the disk configuration of a newly set up IBM i partition with its load source unit, that is, disk unit 1 being the only disk unit configured in the system ASP (ASP 1). All disk units are accessible from IBM i through two active Fibre Channel paths. The IBM i integrated multi-path driver distributes the I/O across all active paths of a disk unit using a round-robin algorithm with some load-balancing applied. Non-configured and non-initialized disk units, that is, those not assigned to an IBM i ASP yet, are still reported with DPHxxx resource names.
                       Display Disk Path Status
Serial Resource Path
ASP Unit Number Type Model Name Status
1 1 Y85FB50002E7 D840 040 DMP002 Active
Y85FB50002E7 D840 040 DMP001 Active
* * Y85FB50002E8 D840 040 DPH001 Active
Y85FB50002E8 D840 040 DPH010 Active
* * Y85FB50002E9 D840 040 DPH002 Active
Y85FB50002E9 D840 040 DPH011 Active
* * Y85FB50002EA D840 040 DPH003 Active
Y85FB50002EA D840 040 DPH012 Active
* * Y85FB50002EB D840 040 DPH004 Active
Y85FB50002EB D840 040 DPH013 Active
* * Y85FB50002EC D840 040 DPH005 Active
Y85FB50002EC D840 040 DPH014 Active
* * Y85FB50002ED D840 040 DPH006 Active
Y85FB50002ED D840 040 DPH015 Active
More...
Press Enter to continue.
F3=Exit F5=Refresh F9=Display disk unit details
F11=Display encryption status F12=Cancel
Figure 5-7 FlashSystem 900 LUNs reported on IBM i in SST
The least significant six digits of an IBM i disk unit serial number for a FlashSystem 900 LUN come from the volume unique identifier (UID) as assigned by the FlashSystem 900 as shown in Figure 5-8. The first five digits of the serial number following the letter “Y” are a unique hash value built by IBM i, which cannot be used to identify a particular storage system.
Figure 5-8 FlashSystem 900 GUI Properties for Volume
Unlike other 512-byte storage systems supported by IBM i, which require the allocation of 9 x 512-byte disk sectors to store a 4 KB IBM i memory page, the 4096-byte sector support of the IBM FlashSystem 900 makes almost the full storage volume capacity available for IBM i data.
Figure 5-9 shows an example of 16 x 80 GiB volumes from an IBM FlashSystem 900, each with 84577 MB usable capacity (80 x 1024 x 1024 x 1024 x 4096/4160 bytes) reported on IBM i.
                     Display Disk Configuration Capacity
----Protected--- ---Unprotected--
ASP Unit Type Model Threshold Overflow Size %Used Size %Used
1 90% No 0 0.00% 1353245 1.19%
1 D840 040 0 0.00% 84577 8.30%
2 D840 040 0 0.00% 84577 0.71%
3 D840 040 0 0.00% 84577 0.71%
4 D840 040 0 0.00% 84577 0.71%
5 D840 040 0 0.00% 84577 0.71%
6 D840 040 0 0.00% 84577 0.72%
7 D840 040 0 0.00% 84577 0.72%
8 D840 040 0 0.00% 84577 0.72%
9 D840 040 0 0.00% 84577 0.71%
10 D840 040 0 0.00% 84577 0.72%
11 D840 040 0 0.00% 84577 0.71%
More...
Press Enter to continue.
F3=Exit F5=Refresh F10=Display disk unit details
F11=Display disk configuration protection F12=Cancel
Figure 5-9 IBM i SST Display Disk Configuration Capacity
IBM FlashSystem 900 LUNs, despite their inherent RAID protection by the FlashSystem, are reported as unprotected disk units to IBM i as shown in Figure 5-9. Thus, additional storage system level protection can be implemented by using IBM i mirroring to a second IBM FlashSystem.
5.3.5 FlashSystem 900 and Linux client hosts
The FlashSystem 900 can be attached to Linux client hosts with the following methods:
Fibre Channel (FC)
InfiniBand
FCoE
iSCSI
The FlashSystem 900 benefits the most from operating systems where multipathing and logical volumes are supported. Most Linux distributions share the same optimum configurations. Specific Linux configuration settings are shown in the following sections.
Network topology guidelines
You can use an arbitrated loop or point-to-point topology on FC configurations for Linux hosts.
 
Note: The FlashSystem 900 16 Gbps FC ports do not support direct connection to client hosts. A SAN switch must be placed between the IBM FlashSystem 900 and any 16 Gbps-attached client host.
Aligning a partition using Linux
Use this procedure to improve performance by aligning a partition in the Linux operating system.
The Linux operating system defaults to a 63-sector offset. To align a partition in Linux using fdisk, complete the following steps:
1. At the command prompt (#), enter the fdisk /dev/mapper/<device> command.
2. To change the listing of the partition size to sectors, enter u.
3. To create a partition, enter n.
4. To create a primary partition, enter p.
5. To specify the partition number, enter 1.
6. To set the base sector value, enter 128.
7. Press Enter to use the default last sector value.
8. To write the changes to the partition table, enter w.
 
Note: The <device> is the FlashSystem 900 volume. Example 5-24 on page 153 shows how to create device names for a FlashSystem 900 volume.
The newly created partition now has an offset of 64 KB and works optimally with an aligned application.
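Alternatively, on distributions that include the parted utility, a non-interactive sketch such as the following creates a partition that starts at 1 MiB, which is also a multiple of 4 KB. The device name is an example only:
parted -s /dev/mapper/mpatha mklabel msdos
parted -s /dev/mapper/mpatha mkpart primary 1MiB 100%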
If you are installing the Linux operating system on the storage system, create the partition scheme before the installation process. For most Linux distributions, this process requires starting at the text-based installer and switching consoles (press Alt+F2) to get the command prompt before you continue.
Multipathing information for Linux
You can use MPIO to improve the performance of the Linux operating system. Linux kernels 2.6 and later support multipathing through device-mapper-multipath. This package can coexist with other multipathing solutions if the other storage devices are excluded from device-mapper.
For a template for the multipath.conf file, see Example 5-23 on page 152.
Because the storage system controllers provide true active/active I/O, the rr_min_io field in the multipath.conf file is set to 4. This results in the best distribution of I/O activity across all available paths. You can set it to 1 for a pure round-robin distribution. If the I/O activity is more sequential in nature, you can increase the rr_min_io value by factors of 2 for a performance gain when using buffered (non-direct) I/O.
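As an illustration only, a device section for the FlashSystem in the multipath.conf file might resemble the following sketch. The exact attribute names and values depend on your distribution and device-mapper-multipath version, so treat the tested template in Example 5-23 on page 152 as the reference:
devices {
    device {
        vendor "IBM"
        product "FlashSystem"
        path_grouping_policy multibus
        path_selector "round-robin 0"
        rr_min_io 4
    }
}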
Integrating InfiniBand controllers
To integrate with InfiniBand technology, the storage system provides block storage by using the SCSI Remote Direct Memory Access (RDMA) Protocol (SRP).
The Linux operating system requires several software modules to connect to the storage system through InfiniBand technology and SRP. In particular, make sure that you install the srp and srptools modules, and install drivers for the server’s host channel adapter (HCA). Use the OpenFabrics Enterprise Distribution (OFED) package from the following website to install these modules, either individually or by using the Install All option:
Figure 5-10 shows the setting of the InfiniBand /etc/infiniband/openib.conf configuration file for SRP.
# Load SRP module
SRP_LOAD=yes
# Enable SRP High Availability daemon
SRPHA_ENABLE=yes
SRP_DAEMON_ENABLE=yes
Figure 5-10 InfiniBand configuration file
These settings cause the SRP and the SRP daemons to load automatically when the InfiniBand driver starts. The SRP daemon automatically discovers and connects to InfiniBand SRP disks.
Use the SRPHA_ENABLE=yes setting. This setting triggers the multipath daemon to create a multipath disk target when a new disk is detected.
InfiniBand technology also requires a Subnet Manager (SM). An existing InfiniBand network already has an SM. In many cases, an InfiniBand switch acts as the SM. If an SM is needed, install OpenSM, which is included with the OFED package, and start it on a single server in the network by entering the following command:
# /etc/init.d/opensmd start
This script opens an SM on a single port only. If multiple ports directly connect to the storage system, a custom script is needed to start the SM on all ports.
5.3.6 FlashSystem 900 and Microsoft Windows client hosts
The FlashSystem 900 can be attached to Windows client hosts with the following methods:
Fibre Channel (FC)
FCoE
The IBM FlashSystem 900 sees the most benefit from operating systems where multipathing and logical volumes are supported. However, certain applications depend on operating systems that are designed for workstations, and they can still benefit from the storage system performance.
Network topologies for Windows hosts
Arbitrated loop or point-to-point topology can be used on the FC configuration for Windows hosts.
 
Note: The FlashSystem 900 16 Gbps FC ports do not support direct connection to client hosts. A SAN switch must be placed between the IBM FlashSystem 900 and any 16 Gbps-attached client host.
Implementing 4 KB alignment for Windows Server 2003
Use this procedure to improve performance by establishing a 4 KB alignment on a Windows operating system.
Before Windows Vista and Windows Server 2008, systems running the Windows operating systems offset the partition by 63 sectors, or 31.5 KB.
To align to the preferred 4 KB sector size, implement the offset by using the diskpart.exe utility:
1. Start the Windows diskpart.exe utility to open the DISKPART line prompt.
2. To view the list of available LUNs, enter the following command:
DISKPART> list disk
3. To select the LUN that holds the file system, enter the following command:
DISKPART> select disk <disk number>
4. To create a partition on the selected LUN, enter the following command:
DISKPART> create partition primary align=64
5. Use the Microsoft Management Console (MMC) or other method to assign a file system or drive letter (raw access) to the partition.
If you are installing the Windows XP or Windows Server 2003 operating system, create the partition on the LUN before the installation of the operating system. You can create the partition by using a Linux Live CD, or by presenting the LUN to another Windows host and disconnecting the drive after the partitioning is complete.
Windows Server 2003 multipathing
Configuring MPIO on a Windows Server 2003 operating system can improve reliability. Windows Server 2003 has a built-in MPIO driver that is provided by Microsoft. This driver is responsible for aggregating the links of storage systems and reporting the addition or removal of links to the kernel while online. To use this feature, a Device Specific Module (DSM) is provided to identify the storage system to the MPIO driver. All major storage vendors have support for the multipathing function and coexist safely because of the common driver in the Windows operating system.
To obtain a copy of the FlashSystem 900 driver for Windows Server 2003, a SCORE or RPQ must be submitted to IBM to request approval. To submit a SCORE or RPQ, contact your IBM representative or IBM Business Partner.
If you are using EMC PowerPath data path management software, it must be at version 4.6, or later, before the storage system DSM can be used.
Windows Server 2008 and Windows Server 2012 multipathing
Windows Server operating system versions that begin with Windows Server 2008 no longer require a separate DSM. Instead, the MPIO function must be installed on the server. For more information, see the Microsoft TechNet website:
You can enable multipathing by selecting the Server Manager → Features option. See Figure 5-11.
Figure 5-11 Windows 2008 example of activated multipathing
You can set vendor ID (IBM) and product ID (FlashSystem-9840) using Administrative Tools → MPIO. You enter the eight-character vendor ID and the 16-character product ID by using the MPIO Devices pane (Figure 5-12).
Figure 5-12 Windows MPIO vendor ID and product ID for the FlashSystem 900
 
Note: The vendor ID must be eight characters in length, including spaces. The product ID must be 16 characters in length, including spaces.
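As an alternative to the MPIO Devices pane, the built-in mpclaim utility can add the device IDs from an elevated command prompt. The following line is a sketch; the vendor ID is padded with spaces to eight characters and is followed by the 16-character product ID, and MPIO might still require a restart before it claims existing devices:
mpclaim -n -i -d "IBM     FlashSystem-9840"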
Table 5-2 shows the correct vendor ID and product ID for different IBM FlashSystem products.
Table 5-2 IBM FlashSystem SCSI standard inquiry data
IBM FlashSystem
Vendor identification
Product identification
IBM FlashSystem 900
IBM
FlashSystem-9840
IBM FlashSystem 840
IBM
FlashSystem-9840
IBM FlashSystem 820
IBM
FlashSystem
IBM FlashSystem 720
IBM
FlashSystem
IBM FlashSystem 810
IBM
FlashSystem
IBM FlashSystem 710
IBM
FlashSystem
After you install the MPIO function, set the load balance policy on all storage system LUNs to Least Queue Depth (Figure 5-13). All available paths to the LUNs are then used to aggregate bandwidth. The load balance policy is set through the Properties pane of each multipath disk device in the Windows Device Manager.
Figure 5-13 Windows MPIO queue configuration
Power option setting for the highest performance
Select Control Panel → Hardware → Power Options. In the window, set the Windows power plan to High performance (Figure 5-14).
Figure 5-14 Windows Power Options
Optimum disk command timeout settings
Adjust the disk TimeOutValue parameter on the Windows operating system for more reliable multipath access.
Windows operating systems have a default disk command TimeOutValue of 60 seconds. If a SCSI command does not complete, the application waits 60 seconds before an I/O request is tried again. This behavior can create issues with most applications, so you must adjust the disk TimeOutValue in the registry key to a lower value. For more information about setting this value, see the Microsoft TechNet website.
Adjust this key to your needs:
HKLM\System\CurrentControlSet\Services\Disk\TimeOutValue
For a disk in a nonclustered configuration, set TimeOutValue to 10.
For a disk in a clustered configuration, set TimeOutValue to 20.
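For example, the following sketch sets and then verifies the value for a nonclustered configuration from an elevated command prompt; use a value of 20 instead for a clustered configuration:
reg add HKLM\System\CurrentControlSet\Services\Disk /v TimeOutValue /t REG_DWORD /d 10 /f
reg query HKLM\System\CurrentControlSet\Services\Disk /v TimeOutValue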
Windows 2012 R2 fast volume formatting
Formatting a FlashSystem volume in Windows 2012 R2 can take a long time to complete. For example, a quick format of a 1.5 TB volume can take 30 minutes or more to complete.
This behavior is related to the way the Windows trim command (known as TRIM) works in conjunction with the TRIM feature of FlashSystem 900. TRIM, also known as UNMAP in the SCSI command set, allows an operating system to inform a solid-state drive (SSD) or flash drive which blocks of data are no longer considered in use and can be wiped internally.
On a new uninitialized FlashSystem 900, you can disable TRIM and the format will take 1 - 2 seconds to complete. TRIM should be re-enabled after the format is completed.
 
Important: The suggestion is to format a volume with TRIM enabled, except in the case of a new uninitialized FlashSystem 900.
On a new uninitialized FlashSystem 900, without any previously stored data, a customer can disable, then later re-enable TRIM from the Windows host:
To disable TRIM: fsutil behavior set DisableDeleteNotify 1
To re-enable TRIM: fsutil behavior set DisableDeleteNotify 0
Changes take effect immediately, except for requests that are already in flight. Implementing this change does not require a reboot, and it affects all volumes on the Windows server. After formatting of the volume is complete, be sure to re-enable TRIM.
5.3.7 FlashSystem 900 and client VMware ESX hosts
The FlashSystem 900 can be attached to VMware ESX client hosts by using the following methods:
Fibre Channel (FC)
FCoE
Arbitrated loop topology is required when you attach the IBM FlashSystem 900 directly to the client VMware ESX hosts.
 
Note: The FlashSystem 900 16 Gbps FC ports do not support direct connection to client hosts. A SAN switch must be placed between the IBM FlashSystem 900 and any 16 Gbps-attached client host. If arbitrated loop connection is required by the host, 8 Gbps FC must be used.
To configure round-robin multipathing in a VMware ESX environment, complete these steps:
1. In the vSphere client, select the Configuration tab.
2. In the Devices view, select each disk on which you want to change the path selection.
3. In the Manage Paths pane, change the Path Selection setting to Round Robin (VMware).
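On ESXi 5.x or later hosts, the same policy can also be set from the command line with esxcli. The following is a sketch; the device identifier is an example and must be replaced with the naa identifier of your FlashSystem 900 LUN:
esxcli storage nmp device list
esxcli storage nmp device set --device naa.6005076aa18e082aa000000006000007 --psp VMW_PSP_RR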
You must set the alignment in the guest operating system. VMware aligns its data stores to 64 KB, but guest virtual machines must still align their own presentation of the storage. Before you continue the installation of a guest Linux operating system or a guest Windows Server 2003 operating system, partition the storage to the aligned accesses.
VAAI unmap support
FlashSystem 900 supports the VMware ESXi VAAI unmap primitive. Using VAAI unmap, the ESXi host informs the storage system that files or VMs have been deleted or moved from a VMFS data store. VAAI unmap in turn uses the SCSI UNMAP command. This VAAI primitive is often used with thin-provisioned VMFS datastores, but it is not restricted to thin-provisioned volumes. The FlashSystem 900 does not provide thin-provisioned volumes.
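As a sketch, on ESXi 5.5 or later you can check whether the Delete (unmap) primitive is reported for a device and manually reclaim space on a datastore. The device identifier and datastore name are examples only:
esxcli storage core device vaai status get -d naa.6005076aa18e082aa000000006000007
esxcli storage vmfs unmap -l <datastore name>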
5.3.8 FlashSystem 900 and IBM SAN Volume Controller or Storwize V7000
For details about IBM SAN Volume Controller or Storwize V7000 product integration, considerations, and configuration with the IBM FlashSystem 900, see Chapter 8, “Product integration” on page 279.
5.3.9 FlashSystem iSCSI host attachment
Support for iSCSI hosts is planned for these operating systems:
Red Hat Enterprise Linux 6.5
SUSE Linux Enterprise Server 11 Service Pack 3
Microsoft Windows 2012 R2
You can find an excellent description of the host side of iSCSI connections in Implementing the IBM Storwize V7000 and IBM Spectrum Virtualize V7.6, SG24-7938.
 
Note: Always check the IBM System Storage Interoperation Center (SSIC) to get the latest information about supported operating systems, hosts, switches, and so on:
5.3.10 FlashSystem iSCSI configuration
Figure 5-15 shows the GUI I/O port information. In this picture, ports 3 and 11 are online.
 
Note: The FlashSystem 900 iSCSI port ID numbering starts with ID 3. The two ports with ID 1 and 2 are the management ports of the two canisters as shown in Example 5-5.
Figure 5-15 FlashSystem 900 iSCSI properties
Example 5-5 shows the configuration by using the lsportip command.
Example 5-5 FlashSystem 900 lsportip information
>lsportip -delim :
id:node_id:node_name:canister_id:adapter_id:port_id:IP_address:mask:gateway:IP_address_6:prefix_6:gateway_6:MAC:duplex:state:speed:failover:link_state:host:host_6
1:1:node1:0:0:0:::::0::40:f2:e9:4a:a3:34::management_only::no:active::
2:2:node2:0:0:0:::::0::40:f2:e9:4a:a2:84::management_only::no:active::
3:1:node1:1:1:1:192.168.6x.xxx:255.255.255.0:192.168.61.1::0::40:F2:E9:4A:01:99::online:10Gb:no:active::
4:1:node1:1:1:2:192.168.1.10:255.255.255.0:0.0.0.0::0::40:F2:E9:4A:01:9B::offline:NONE:no:inactive::
5:1:node1:1:1:3:192.168.1.10:255.255.255.0:0.0.0.0::0::40:F2:E9:4A:01:9D::offline:NONE:no:inactive::
6:1:node1:1:1:4:192.168.1.10:255.255.255.0:0.0.0.0::0::40:F2:E9:4A:01:9F::offline:NONE:no:inactive::
7:1:node1:1:2:1:192.168.1.10:255.255.255.0:0.0.0.0::0::40:16:88:06:00:01::offline:NONE:no:inactive::
8:1:node1:1:2:2:192.168.1.10:255.255.255.0:0.0.0.0::0::40:16:88:06:00:03::offline:NONE:no:inactive::
9:1:node1:1:2:3:192.168.1.10:255.255.255.0:0.0.0.0::0::40:16:88:06:00:05::offline:NONE:no:inactive::
10:1:node1:1:2:4:192.168.1.10:255.255.255.0:0.0.0.0::0::40:16:88:06:00:07::offline:NONE:no:inactive::
11:2:node2:2:1:1:192.168.6x.xxx:255.255.255.0:192.168.61.1::0::40:F2:E9:4A:01:91::online:10Gb:no:active::
12:2:node2:2:1:2:192.168.1.10:255.255.255.0:0.0.0.0::0::40:F2:E9:4A:01:93::offline:NONE:no:inactive::
13:2:node2:2:1:3:192.168.1.10:255.255.255.0:0.0.0.0::0::40:F2:E9:4A:01:95::offline:NONE:no:inactive::
14:2:node2:2:1:4:192.168.1.10:255.255.255.0:0.0.0.0::0::40:F2:E9:4A:01:97::offline:NONE:no:inactive::
15:2:node2:2:2:1:192.168.1.10:255.255.255.0:0.0.0.0::0::CC:CC:CC:CC:CC:CD::offline:NONE:no:inactive::
16:2:node2:2:2:2:192.168.1.10:255.255.255.0:0.0.0.0::0::CC:CC:CC:CC:CC:CF::offline:NONE:no:inactive::
17:2:node2:2:2:3:192.168.1.10:255.255.255.0:0.0.0.0::0::CC:CC:CC:CC:CC:D1::offline:NONE:no:inactive::
18:2:node2:2:2:4:192.168.1.10:255.255.255.0:0.0.0.0::0::CC:CC:CC:CC:CC:D3::offline:NONE:no:inactive::
5.3.11 Windows 2008 R2 and Windows 2012 iSCSI attachment
In Windows 2008 R2 and 2012, the Microsoft iSCSI software initiator is preinstalled. Enter iscsi in the search field of the Windows start menu (Figure 5-16) and click iSCSI Initiator.
Figure 5-16 Windows iSCSI initiator
Confirm the automatic startup of the iSCSI service (Figure 5-17).
Figure 5-17 Automatic startup of the iSCSI service
The iSCSI Configuration window opens. Select the Configuration tab (Figure 5-18). Write down the initiator name of your Windows host; you will use it later when you configure the host on the FlashSystem 900.
Figure 5-18 iSCSI Initiator Properties window
Open the FlashSystem 900 GUI and open the Add Host dialog by selecting Hosts → Hosts → Add Hosts.
Enter a name for the Windows host and add the initiator name of the Windows host. Figure 5-19 on page 127 shows the dialog with values entered.
Figure 5-19 iSCSI add host dialog
Create the FlashSystem 900 LUNs for this host and map them to this host as described in 6.3, “Volumes menu” on page 199.
Return to the Windows system’s iSCSI Initiator Properties window and select the Targets tab. Click Refresh; the FlashSystem 900 is now listed in the Discovered targets section (Figure 5-20). If FlashSystem 900 is not listed, you can add its IP addresses in the Target field and click Quick Connect.
Figure 5-20 iSCSi discovered targets
Click Connect and select Enable multi-path if the host is connected with multiple Ethernet ports to the FlashSystem 900 (Figure 5-21).
Figure 5-21 Connect to target dialog
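On Windows Server 2012, the same discovery and connection steps can also be scripted with the iSCSI PowerShell cmdlets. The following is a sketch; the portal addresses match the ones used in the Linux example later in this chapter, and the target name must be replaced with the target that your FlashSystem 900 reports:
# discover the FlashSystem 900 iSCSI portals (one per canister)
New-IscsiTargetPortal -TargetPortalAddress 192.168.61.215
New-IscsiTargetPortal -TargetPortalAddress 192.168.61.216
# list the discovered targets and connect with multipath enabled
Get-IscsiTarget
Connect-IscsiTarget -NodeAddress naa.500507605e807780 -IsMultipathEnabled $true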
5.3.12 Linux iSCSI attachment
Install or set up the iSCSI initiator software according to the instructions of your Linux distribution and version.
To configure the FlashSystem 900, you need the iSCSI initiator name. Example 5-6 shows how to find the initiator name on Red Hat Enterprise Linux 6.2.
Example 5-6 Red Hat 6.2 iSCSI initiator name
# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:21774cad37da
Open the FlashSystem 900 GUI and select Hosts → Hosts → Add Hosts. The Add Host dialog opens (Figure 5-22). Enter a name for the Linux host and add the initiator name of the host as shown in Example 5-6.
Figure 5-22 iSCSI Add Host dialog
Create the FlashSystem 900 LUNs for this host and map them to this host as described in 6.3, “Volumes menu” on page 199.
Return to the Linux host and start the iSCSI disk discovery. Example 5-7 shows the commands that are used to discover the LUNs created on the FlashSystem 900 for this host. In this example, two FlashSystem LUNs are attached to the Linux host. The Linux host has one path to each canister. Multipathing is then set up.
Example 5-7 Discover iSCSI targets and devices
#
# # use iscsiadm --mode discoverydb to detect iSCSI targets
# # detect a connection to each FlashSystem 900 to be able to set up multipathing
 
# # first FlashSystem canister
# iscsiadm --mode discoverydb --type sendtargets --portal 192.168.61.215 --discover
192.168.6x.xxx:3260,-1 naa.500507605e807780
 
# # second FlashSystem canister
# iscsiadm --mode discoverydb --type sendtargets --portal 192.168.61.216 --discover
192.168.6x.xxx:3260,-1 naa.500507605e807780
 
# # login to first target to detect iSCSI disks
# iscsiadm --mode node --targetname naa.500507605e807780 --portal 192.168.6x.xxx --login
Logging in to [iface: default, target: naa.500507605e807780, portal: 192.168.6x.xxx,3260] (multiple)
Login to [iface: default, target: naa.500507605e807780, portal: 192.168.6x.xxx,3260] successful.
 
# # login to second target to detect iSCSI disks
# iscsiadm --mode node --targetname naa.500507605e807780 --portal 192.168.6x.xxx --login
Logging in to [iface: default, target: naa.500507605e807780, portal: 192.168.6x.xxx,3260] (multiple)
Login to [iface: default, target: naa.500507605e807780, portal: 192.168.6x.xxx,3260] successful.
 
# # use iscsiadm to get detailed information about the target
# # both targets will only differ in the node addresses
# iscsiadm -m node --targetname=naa.500507605e807780 --portal=192.168.6x.xxx --op=show
# BEGIN RECORD 6.2.0-873.2.el6
node.name = naa.500507605e807780
... (lines left out)
node.discovery_address = 192.168.6x.xxx
... (lines left out)
node.conn[0].address = 192.168.6x.xxx
... (lines left out)
 
# # use iscsiadm -m session to get the attached iSCSIdisk
# # they are listed at the end of the command output
# # the two FlashSystem LUNs are seen by each portal
# iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 6.2.0-873.2.el6
Target: naa.500507605e807780
Current Portal: 192.168.6x.xxx:3260,50
Persistent Portal: 192.168.6x.xxx:3260,50
 
... (lines left out)
 
************************
Attached SCSI devices:
************************
Host Number: 16 State: running
scsi16 Channel 00 Id 0 Lun: 0
scsi16 Channel 00 Id 0 Lun: 2
Attached scsi disk sds State: running
scsi16 Channel 00 Id 0 Lun: 3
Attached scsi disk sdt State: running
Current Portal: 192.168.61.216:3260,18
Persistent Portal: 192.168.61.216:3260,18
 
... (lines left out)
 
************************
Attached SCSI devices:
************************
Host Number: 17 State: running
scsi17 Channel 00 Id 0 Lun: 0
scsi17 Channel 00 Id 0 Lun: 2
Attached scsi disk sdu State: running
scsi17 Channel 00 Id 0 Lun: 3
Attached scsi disk sdv State: running
You can set up Linux multipathing with the configuration file multipath.conf and the udev rules (as described in 5.5.3, “Linux configuration file multipath.conf example” on page 152 and Example 5-27 on page 156) by adding the FlashSystem 900 LUN Volume Unique Identifier to the blacklist_exceptions in the multipath.conf file.
Select Hosts → Volumes by Host, right-click the volume name you want, and then select Properties. Figure 5-23 shows this value in the properties window of the FlashSystem 900 LUN using the GUI.
Figure 5-23 FlashSystem 900 LUN Volume Unique Identifier
When using the CLI, you get this information from the lsvdisk command. Example 5-8 shows this value in the multipath.conf file.
Example 5-8 Blacklist exception for iSCSI
blacklist_exceptions {
wwid "36005076*"
}
This entry will allow all LUNs starting with "36005076" to be used by the Linux multipath daemon. This will also include the FlashSystem 900 FC LUNs. You must restart the Linux multipath daemon for this change to take effect, and then check for new devices as shown in Example 5-9.
Example 5-9 Restarting the Linux multipath daemon
# # add the Volume Unique Identifier to /etc/multipath.conf
# vi /etc/multipath.conf
 
# # restart multipathd
# service multipathd restart
ok
Stopping multipathd daemon: [ OK ]
Starting multipathd daemon: [ OK ]
 
# # check for new devices
# multipath -v 2
create: mpathd (36005076aa18e082aa000000006000007) undef IBM,FlashSystem-9840
size=130G features='0' hwhandler='0' wp=undef
`-+- policy='queue-length 0' prio=1 status=undef
|- 16:0:0:2 sds 65:32 undef ready running
`- 17:0:0:2 sdu 65:64 undef ready running
create: mpathe (36005076aa18e082aa000000007000008) undef IBM,FlashSystem-9840
size=131G features='0' hwhandler='0' wp=undef
`-+- policy='queue-length 0' prio=1 status=undef
|- 16:0:0:3 sdt 65:48 undef ready running
`- 17:0:0:3 sdv 65:80 undef ready running
 
# # list devices
# multipath -ll
mpathe (36005076aa18e082aa000000007000008) dm-5 IBM,FlashSystem-9840
size=131G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 16:0:0:3 sdt 65:48 active ready running
`- 17:0:0:3 sdv 65:80 active ready running
mpathd (36005076aa18e082aa000000006000007) dm-4 IBM,FlashSystem-9840
size=130G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 16:0:0:2 sds 65:32 active ready running
`- 17:0:0:2 sdu 65:64 active ready running
 
#
5.4 Miscellaneous host attachment
This section provides implementation and other general information for connecting client host systems to IBM FlashSystem 900.
 
Note: Always check the IBM SSIC website to get the latest information about supported operating systems, hosts, switches, adapters, and more:
If the IBM SSIC does not list the support, submit a SCORE or RPQ to IBM to request approval. To submit a SCORE or RPQ, contact your IBM representative or IBM Business Partner.
5.4.1 FlashSystem 900 and Solaris client hosts
The Oracle Solaris operating system has slight differences between x86 and SPARC support when the disks are partitioned. The Solaris Multiplexed I/O (MPxIO) multipathing setups, however, are nearly identical.
Network topology guidelines
Configure the settings of the FC ports in the system for the network topology that is used with the storage system.
If they are directly attached to a server, configure the FC ports in the storage system to an arbitrated loop topology. If configured through a switch, set the FC ports to point-to-point to correctly negotiate with the switch.
 
Note: The 16 Gbps FC ports of the FlashSystem 900 do not support direct connection to client hosts. A SAN switch must be placed between the FlashSystem 900 and any 16 Gbps-attached client host.
Sector size information for Solaris
The Solaris operating system supports a 512-byte sector size. The FlashSystem 900 uses a 512-byte sector size.
Aligning the partition for Solaris
Aligning the partition to a 4 KB boundary improves performance of the storage system.
Solaris SPARC aligns slices on logical unit numbers (LUNs) when the LUN is using an Oracle SUN label for a LUN smaller than 2 TB. If the 2 TB capacity is exceeded, the operating system must use an Extensible Firmware Interface (EFI) label. If you use the EFI label, the partitions that are created do not start on a 4 KB boundary. EFI disks default to 34-sector offsets that have the default partition table. To align the partition to a 4 KB boundary for optimal performance, complete the following steps:
1. After formatting the disk, select the partition option.
2. Choose the All Free Hog partitioning base option:
partition> 0
Select partitioning base:
0. Current partition table (unnamed)
1. All Free Hog
Choose base (enter number) [0]? 1
3. Change the base sector to 40.
4. Press Enter to use the default values for the remaining options:
partition> 0
Enter partition id tag[usr]:
Enter partition permission flags[wm]:
Enter new starting Sector[34]: 40
Enter partition size[10066067388b, 10066067427e, 4915071mb, 4799gb, 4tb]:
Part Tag Flag First Sector Size Last Sector
0 usr wm 40 4.69TB 10066067427
5. After the first sector is changed, save the configuration and then continue by using the newly created partition.
Multipathing information for Solaris 11 hosts
The method of implementing multipathing depends on the HBA used in the server. MPxIO is the built-in multipathing mechanism for Solaris and no longer requires an HBA that is branded Oracle SUN.
To enable MPxIO support, copy the /kernel/drv/scsi_vhci.conf file to the /etc/driver/drv/scsi_vhci.conf file and modify the /etc/driver/drv/scsi_vhci.conf file.
Example 5-10 shows the FlashSystem entry for Solaris 11.
Example 5-10 Example of /etc/driver/drv/scsi_vhci.conf for Solaris 11 that is reduced for clarity
name="scsi_vhci" class="root";
load-balance="round-robin";
auto-failback="enable";
ddi-forceload =
"misc/scsi_vhci/scsi_vhci_f_asym_sun",
"misc/scsi_vhci/scsi_vhci_f_asym_lsi",
"misc/scsi_vhci/scsi_vhci_f_asym_emc",
"misc/scsi_vhci/scsi_vhci_f_sym_emc",
"misc/scsi_vhci/scsi_vhci_f_sym_hds",
"misc/scsi_vhci/scsi_vhci_f_sym",
"misc/scsi_vhci/scsi_vhci_f_tpgs";
 
scsi-vhci-failover-override =
"IBM FlashSystem-9840", "f_sym";
 
spread-iport-reservation = "yes";
iport-rlentime-snapshot-interval = 30;
 
Note: The "IBM FlashSystem-9840" entry in Example 5-10 contains exactly five spaces.
Preferred read with Solaris
You can increase the speed of an application by accelerating the read I/Os. Implementing preferred read with the FlashSystem 900 gives you an easy way to deploy the FlashSystem 900 in an existing environment. This section describes how to set up preferred read with Solaris.
Use Solaris Volume Manager (SVM) to create mirrored volumes and to set up preferred read to the first disk in the mirrored volume. Use the following command:
metaparam -r first <device>
This command modifies the read option for a mirror. The option first specifies reading only from the first submirror.
The FlashSystem 900 must be the first disk in the mirrored volume. If you create a new volume, select the FlashSystem 900 metadevice as the first metadevice for the volume. If you have an existing mirrored volume, use these steps:
1. Add FlashSystem metadevice to the existing volume.
2. Synchronize the data.
3. Destroy the mirrored volume without losing data.
4. Re-create the mirrored volume and use the FlashSystem 900 metadevice as the first entry.
The next example shows the steps in creating a Solaris file system with mirrored devices and preferred read on the FlashSystem 900:
1. Check for the attached FlashSystem 900.
2. Create the partition on the attached FlashSystem 900.
3. Create a mirrored file system.
4. Set preferred read on the FlashSystem 900.
Check for attached FlashSystem 900
Use the fcinfo command to get information about the local FC ports and the devices that are attached to a port. Example 5-11 shows both use cases.
Example 5-11 Solaris FC port information
#
# # list local FC ports
# fcinfo hba-port
HBA Port WWN: 2100001b320f324e
Port Mode: Initiator
Port ID: 10800
OS Device Name: /dev/cfg/c9
Manufacturer: QLogic Corp.
Model: QLE2462
Firmware Version: 5.6.4
FCode/BIOS Version: BIOS: 1.29; fcode: 1.27; EFI: 1.09;
Serial Number: RFC0802K03058
Driver Name: qlc
Driver Version: 20120717-4.01
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb 4Gb
Current Speed: 4Gb
Node WWN: 2000001b320f324e
Max NPIV Ports: 127
NPIV port list:
 
# # list systems attached to local FC ports
# fcinfo remote-port -s -p 2100001b320f324e
Remote Port WWN: 500507605efe0ac2
Active FC4 Types: SCSI
SCSI Target: yes
        Port Symbolic Name: IBM FlashSystem-9840 0020
Node WWN: 500507605efe0ad0
LUN: 0
Vendor: IBM
Product: FlashSystem-9840
OS Device Name: /dev/rdsk/c10t500507605EFE0AC2d0s2
Create a partition on attached FlashSystem 900
The Solaris server in this example is based on the x86 architecture. On x86 servers, before you create Solaris slices, the disk must carry an fdisk partition that identifies the space used by each operating system. In Example 5-12, the entire disk is used for Solaris.
Some lines were removed for clarity.
Example 5-12 Create Solaris partition
#
# # Format disk
# format
Searching for disks...done
 
 
AVAILABLE DISK SELECTIONS:
0. c7d0 ...
/pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
1. c9t0d1 ...
/pci@0,0/pci8086,d13a@5/pci1077,138@0/fp@0,0/disk@w20080020c2117377,1
2. c9t0d2 ...
/pci@0,0/pci8086,d13a@5/pci1077,138@0/fp@0,0/disk@w20080020c2117377,2
3. c10t500507605EFE0AC2d0 <IBM-9840-0020 cyl 16716 alt 2 hd 224 sec 56>
/pci@0,0/pci8086,d13a@5/pci1077,138@0,1/fp@0,0/disk@w500507605efe0ac2,0
Specify disk (enter its number): 3
selecting c10t500507605EFE0AC2d0
[disk formatted]
No Solaris fdisk partition found.
 
 
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
fdisk - run the fdisk program
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show disk ID
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format> fdisk
No fdisk table exists. The default partition for the disk is:
 
a 100% "SOLARIS System" partition
 
Type "y" to accept the default partition, otherwise type "n" to edit the
partition table.
y
format> partition
 
 
PARTITION MENU:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
!<cmd> - execute <cmd>, then return
quit
partition> print
Current partition table (default):
Total disk cylinders available: 16715 + 2 (reserved cylinders)
 
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 16714 99.98GB (16715/0/0) 209672960
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
8 boot wu 0 - 0 6.12MB (1/0/0) 12544
9 unassigned wm 0 0 (0/0/0) 0
 
partition> 0
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
 
Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]:
Enter partition size[0b, 0c, 0e, 0.00mb, 0.00gb]: 16713e
partition> print
Current partition table (unnamed):
Total disk cylinders available: 16715 + 2 (reserved cylinders)
 
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 1 - 16713 99.97GB (16713/0/0) 209647872
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 16714 99.98GB (16715/0/0) 209672960
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
8 boot wu 0 - 0 6.12MB (1/0/0) 12544
9 unassigned wm 0 0 (0/0/0) 0
 
... < create a partition on slice 7, size 1 cylinder, to be used for metadb >
 
partition> label
Ready to label disk, continue? yes
 
... < partition written, quit format >
Create a mirrored file system
You can easily set up a mirrored file system using the Solaris Volume Manager (SVM) Soft Partitioning. Example 5-13 shows the steps to create SVM metadevices and use them to set up a mirrored file system. One mirror is on a spinning disk and the other mirror is on the FlashSystem 900.
Some lines were removed for clarity.
Example 5-13 Solaris mirrored file system
#
# # create an SVM database with the attached devices
# metadb -f -a c9t0d0s7 c10t500507605EFE0AC2d0s7
# metadb -i
flags first blk block count
a u 16 8192 /dev/dsk/c9t0d0s7
a u 16 8192 /dev/dsk/c10t500507605EFE0AC2d0s7
...
u - replica is up to date
...
a - replica is active, commits are occurring to this replica
...
 
# # create a one-on-one concatenation on spinning disk
# metainit d_disk 1 1 c9t0d0s0
d_disk: Concat/Stripe is setup
 
# # create a one-on-one concatenation on FlashSystem
# metainit d_FlashSystem 1 1 c10t500507605EFE0AC2d0s0
 
# # mirror setup. first on spinning disk
# metainit d_mirror -m d_disk
d_mirror: Mirror is setup
 
# # attach second disk
# metattach d_mirror d_FlashSystem
d_mirror: submirror d_FlashSystem is attached
 
# # check
# metastat -p d_mirror
d_mirror -m /dev/md/rdsk/d_disk /dev/md/rdsk/d_FlashSystem 1
d_disk 1 1 /dev/rdsk/c9t0d0s0
d_FlashSystem 1 1 /dev/rdsk/c10t500507605EFE0AC2d0s0
 
 
# # create FS
# newfs /dev/md/rdsk/d_mirror
newfs: construct a new file system /dev/md/rdsk/d_mirror: (y/n)? y
Warning: 512 sector(s) in last cylinder unallocated
/dev/md/rdsk/d_mirror: 115279360 sectors in 18763 cylinders of 48 tracks, 128 sectors
56288.8MB in 1173 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
.......................
super-block backups for last 10 cylinder groups at:
114328992, 114427424, 114525856, 114624288, 114722720, 114821152, 114919584,
115018016, 115116448, 115214880
 
# # mount
# mount /dev/md/dsk/d_mirror /mnt
 
# # check
# df -h | grep mirror
/dev/md/dsk/d_mirror 54G 55M 54G 1% /mnt
The file system is now on two mirrored devices.
Set preferred read on FlashSystem 900
Solaris Volume Manager uses the round-robin read algorithm as a default. To read from only one device, you can choose the first option. Then, the reads will be performed only from the first device. In this example, the first device that was used was on spinning disk. To use the FlashSystem 900 as the first device in the mirror, you must destroy the mirror and re-create the mirror with the FlashSystem 900 as the first disk.
 
Attention: Be extremely careful while performing the steps described in Example 5-14 because the data can be at risk. Ensure that you back up any critical data before performing any file system management activities.
Example 5-14 Solaris preferred read setup
#
# # destroy mirror, recreate mirror
# # first umount
# umount /dev/md/dsk/d_mirror
 
# # delete the mirror
# metaclear d_mirror
d_mirror: Mirror is cleared
 
# # recreate mirror, FlashSystem as first entry
# metainit d_mirror -m d_FlashSystem
d_mirror: Mirror is setup
 
# # attach spinning disk
# metattach d_mirror d_disk
d_mirror: submirror d_disk is attached
 
# # check
# metastat
d_mirror: Mirror
Submirror 0: d_FlashSystem
State: Okay
Submirror 1: d_disk
State: Resyncing
Resync in progress: 20 % done
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 115279360 blocks (54 GB)
 
d_FlashSystem: Submirror of d_mirror
State: Okay
Size: 115279360 blocks (54 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c10t500507605EFE0AC2d0s0 0 No Okay Yes
 
d_disk: Submirror of d_mirror
State: Resyncing
Size: 115279360 blocks (54 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c9t0d0s0 0 No Okay Yes
 
# # first disk preferred read
# metaparam -r first d_mirror
 
# # check changed read algorithm
# metaparam d_mirror
d_mirror: Mirror current parameters are:
Pass: 1
Read option: first (-r)
Write option: parallel (default)
 
# # mount
# mount /dev/md/dsk/d_mirror /mnt
 
# # check
# df -h | grep mirror
/dev/md/dsk/d_mirror 54G 10G 44G 19% /mnt
 
# # create a read process and check preferred read with iostat:
# # for example: iostat -xMnz 10
# # only md/d_FlashSystem and md/d_mirror will show activity.
 
Note: You must destroy and re-create a metadevice mirror to change the device that is used as the first disk for the preferred read.
5.4.2 FlashSystem 900 and HP-UX client hosts
This section discusses HP-UX client host with FlashSystem 900 configurations.
 
Note: Always check the IBM SSIC to ensure that the IBM FlashSystem 900 supports your required client host and version required.
HP-UX operating system configurations
Consider this checklist when you use HP-UX client hosts:
If using Veritas File System (VxFS), select a block size of 4 KB or greater.
If using HP physical volume links (PVLinks), add each disk and path into the same Volume Group under LVM.
For direct attachment of the FlashSystem, the FC ports on the FlashSystem must be set to arbitrated loop (AL) topology (this does not apply to 16 Gbps attachment).
For fabric-attached, use point-to-point topology or Auto.
LUNs greater than 2 TB have not been tested.
For LUNs that require sequential detection, you can use the mkvdiskhostmap command to explicitly assign LUN IDs so that the LUNs on a bus (FC port) start at LUN 0, then LUN 1, and so on (see the sketch after this list).
LUNs are historically limited to 0 - 7 on a bus.
When HP-UX is on traditional SCSI-2 storage architecture, you will not see LUN 8+ unless the SCSI driver in HP-UX is updated to SCSI-3 compliance.
The FlashSystem has no virtual bus or virtual target architecture; presenting more than eight LUNs requires LUN masking on overlapping I/O paths.
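The following sketch shows explicit LUN assignment with the mkvdiskhostmap command. The host name hpux_host, the volume names, and the SCSI IDs are placeholders, so substitute the objects defined on your FlashSystem 900:
# # map volumes to the HP-UX host with explicit, sequential SCSI LUN IDs
# mkvdiskhostmap -host hpux_host -scsi 0 hpux_vol_0
# mkvdiskhostmap -host hpux_host -scsi 1 hpux_vol_1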
Detecting LUNs on HP-UX server
Changes in the FlashSystem LUN configuration do not generate a CHECK_CONDITION to the host. You must rescan the SCSI bus manually in either of the following ways:
# ioscan -fnC disk
# ioinit -i (HP-UX 11i v1, 11i v2)
You might need to force CHECK_CONDITION by a link reset. You can use the chportfc command and its -reset option to perform a link reset on a dedicated port.
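The following sketch shows one way to issue the link reset from the FlashSystem 900 command line. The port ID 1 is a placeholder; list the ports first and verify the exact chportfc syntax on your code level:
# # list the FC ports and note the ID of the port that serves the HP-UX host
# lsportfc
# # reset the link on that port to force a CHECK_CONDITION
# chportfc -reset 1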
Alignment
HP-UX volumes will align to 4 KB boundaries. The VxFS file system must be set to a block size of 4096 or higher to keep alignment:
# mkfs -F vxfs -o bsize=4096 <disk>
5.5 FlashSystem 900 preferred read and configuration examples
Examples of implementing preferred read in different environments and of the Linux multipath.conf configuration file are shown in the following sections.
5.5.1 FlashSystem 900 deployment scenario with preferred read
Implementing preferred read with the IBM FlashSystem 900 gives you an easy way to deploy the IBM FlashSystem 900 in an existing environment. The data is secured by writing it to two separate storage systems. Data is read at the FlashSystem 900 speed, because it is always read from the FlashSystem 900. This implementation does not change the existing infrastructure concepts, for example, data security, replication, backup, disaster recovery, and so on. Preferred read can be implemented with the following techniques:
IBM SAN Volume Controller (IBM Spectrum Virtualize) or IBM Storwize V7000:
 – Virtual disk mirroring (also known as volume mirroring)
At the volume manager or operating system level:
 – IBM AIX
 – Linux LVM (native least queue read)
At the application level:
 – Oracle Automatic Storage Management (ASM)
 – Standby or reporting instance
 – SQL Server: The AlwaysOn Availability Groups feature maximizes the availability of a set of user databases for an enterprise. An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together.
The following examples are schemas that show the logical setup. You have to plan the FC, FCoE, or InfiniBand cabling and SAN setup, depending on your environment and needs.
An example of implementing preferred read with the operating system volume manager is shown in Figure 5-24 on page 141. It represents a schema of IBM AIX LVM mirroring.
Figure 5-24 Preferred read with AIX
An example of implementing preferred read on the application level is shown in Figure 5-25. It represents a schema of Oracle Automatic Storage Management (ASM) mirroring.
Figure 5-25 Preferred read with Oracle ASM
An example of implementing preferred read on a virtualization layer is shown in Figure 5-26 on page 142. It represents a schema of the IBM SAN Volume Controller.
Figure 5-26 Preferred read with the IBM SAN Volume Controller
5.5.2 Implementing preferred read
You can increase the speed of an application by accelerating the read I/Os. Implementing preferred read with the IBM FlashSystem 900 gives you an easy way to deploy the IBM FlashSystem 900 in an existing environment. The data is secured by writing it to two separate storage systems. Data is read at the FlashSystem 900 speed, because it is always read from the IBM FlashSystem 900. This implementation does not change the existing infrastructure concepts, for example, data security, replication, backup, disaster recovery, and so on. Preferred read is described in 5.5.1, “FlashSystem 900 deployment scenario with preferred read” on page 140.
Preferred read with AIX
On AIX, preferred read is implemented by the AIX Logical Volume Manager (LVM).
The following steps are illustrated in Example 5-15 on page 143 through Example 5-21 on page 150. The examples show the process of creating a preferred read configuration with the FlashSystem 900. The steps assume that the AIX server is cabled and zoned correctly.
1. Create a file system on spinning disk.
2. Add the IBM FlashSystem 900 as a mirrored copy to this file system.
3. Set the correct read and write policy.
4. Set preferred read to the IBM FlashSystem 900.
In the following steps (Example 5-15 on page 143 through Example 5-21 on page 150), two systems, which are attached through a SAN to the AIX host, are used. Their AIX hdisk information is listed:
hdisk1 - hdisk4: IBM MPIO FC 2145 (V7000)
hdisk5 - hdisk8: IBM FlashSystem 900 Storage
The following steps are based on AIX 7.1.
Create a file system on spinning disk
The steps in Example 5-15 create a file system on AIX. In this example, hdisk1 - hdisk4 are used. All commands are preceded by a comment to the next action. Always check the command parameters against your current AIX version.
Example 5-15 AIX file system creation
#
# # Create a file system on normal disks
 
# # list physical disks
# lsdev -C -c disk
hdisk0 Available Virtual SCSI Disk Drive
 
# # attach Disksystem to AIX server and check for new disks
# cfgmgr
 
# # list physical disks
# lsdev -C -c disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available 00-00-02 MPIO FC 2145
hdisk2 Available 00-00-02 MPIO FC 2145
hdisk3 Available 00-00-02 MPIO FC 2145
hdisk4 Available 00-00-02 MPIO FC 2145
 
# # set path policy to your needs: round_robin, load_balance, or shortest_queue
# # check path for all disks, hdisk1 as an example
# lsattr -El hdisk1 | grep algorithm
algorithm load_balance
 
# # use chdev if needed
# chdev -l hdisk1 -a algorithm=round_robin
# chdev -l hdisk1 -a algorithm=shortest_queue
# chdev -l hdisk1 -a algorithm=load_balance
 
# # create a volume group with the four IBM 2145 FC Disks
# mkvg -B -y test_vg_1 -t 8 hdisk1 hdisk2 hdisk3 hdisk4
0516-1254 mkvg: Changing the PVID in the ODM.
0516-1254 mkvg: Changing the PVID in the ODM.
0516-1254 mkvg: Changing the PVID in the ODM.
0516-1254 mkvg: Changing the PVID in the ODM.
test_vg_1
 
# # list the information
# lsvg
rootvg
test_vg_1
# lsvg test_vg_1
VOLUME GROUP: test_vg_1 VG IDENTIFIER: 00f6600100004c00000001464e967f3d
VG STATE: active PP SIZE: 64 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 3196 (204544 megabytes)
MAX LVs: 512 FREE PPs: 3196 (204544 megabytes)
LVs: 0 USED PPs: 0 (0 megabytes)
OPEN LVs: 0 QUORUM: 3 (Enabled)
TOTAL PVs: 4 VG DESCRIPTORS: 4
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 4 AUTO ON: yes
MAX PPs per VG: 130048
MAX PPs per PV: 8128 MAX PVs: 16
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
DISK BLOCK SIZE: 512
 
# # create a logical volume with file system type jfs2
# # name will be test_lv_1
# mklv -y test_lv_1 -t'jfs2' test_vg_1 3096 hdisk1 hdisk2 hdisk3 hdisk4
test_lv_1
 
# # create a file system on the logical volume
# # mount point /test/preferred_read will be created at the same time
# crfs -v jfs2 -d test_lv_1 -m /test/preferred_read
File system created successfully.
202893060 kilobytes total disk space.
New File System size is 405798912
 
# # mount new created file system
# mount /test/preferred_read
 
# # check
# df -g /test/preferred_read
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/test_lv_1 193.50 193.47 1% 4 1% /test/preferred_read
Add the FlashSystem 900 as a mirrored copy to this file system
The steps in Example 5-16 extend the existing volume group and use the new disks to create a mirror copy of the file system. In this example, the FlashSystem 900 disks hdisk5 through hdisk8 are used for the second copy.
All commands are preceded by a comment to the next action.
Example 5-16 Create a mirrored file system on AIX
#
# # Add FlashSystem 900 as a mirrored copy to this file system
 
# # attach FlashSystem 900 to AIX server and check for new disks
# cfgmgr
 
# # check for new FlashSystem 900 disk, will be hdisk5 hdisk6 hdisk7 hdisk8
# lsdev -C -c disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available 00-00-02 MPIO FC 2145
hdisk2 Available 00-00-02 MPIO FC 2145
hdisk3 Available 00-00-02 MPIO FC 2145
hdisk4 Available 00-00-02 MPIO FC 2145
hdisk5 Available 00-00-02 MPIO IBM FlashSystem Disk
hdisk6 Available 00-00-02 MPIO IBM FlashSystem Disk
hdisk7 Available 00-00-02 MPIO IBM FlashSystem Disk
hdisk8 Available 00-00-02 MPIO IBM FlashSystem Disk
 
# # set path policy to your needs:round_robin or shortest_queue
# # check path for all disks, hdisk5 as an example
# lsattr -El hdisk5 | grep algorithm
algorithm shortest_queue
 
# # use chdev if needed
# chdev -l hdisk5 -a algorithm=round_robin
# chdev -l hdisk5 -a algorithm=shortest_queue
 
# # list used Physical volume names
# lslv -m test_lv_1 | awk '{print $3, " ", $5, " ", $7}' | uniq
 
PV1 PV2 PV3
hdisk1
hdisk2
hdisk3
hdisk4
 
# # add FlashSystem 900 disk to volume group
# extendvg test_vg_1 hdisk5 hdisk6 hdisk7 hdisk8
0516-1254 extendvg: Changing the PVID in the ODM.
0516-1254 extendvg: Changing the PVID in the ODM.
0516-1254 extendvg: Changing the PVID in the ODM.
0516-1254 extendvg: Changing the PVID in the ODM.
 
# # create a mirror
# mklvcopy test_lv_1 2 hdisk5 hdisk6 hdisk7 hdisk8
 
# # list used Physical volume names and check mirror
# lslv -m test_lv_1 | awk '{print $3, " ", $5, " ", $7}' | uniq
 
PV1 PV2 PV3
hdisk1 hdisk5
hdisk2 hdisk6
hdisk3 hdisk7
hdisk4 hdisk8
 
# # check mirror state
# lsvg -l test_vg_1
test_vg_1:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
test_lv_1 jfs2 3096 6192 8 open/stale /test/preferred_read
loglv00 jfs2log 1 1 1 open/syncd N/A
 
# # the mirror is stale, synchronize it
# # this command will take some time depending on volume size
# syncvg -P 32 -v test_vg_1
 
# # check mirror state
# lsvg -l test_vg_1
test_vg_1:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
test_lv_1 jfs2 3096 6192 8 open/syncd /test/preferred_read
loglv00 jfs2log 1 1 1 open/syncd N/A
 
# # turn VG quorum off
# # always check your business needs, if VG quorum should be enabled or disabled
# # do this to ensure the VG will not go offline if a quorum of disks goes missing
# chvg -Q n test_vg_1
 
# # check VG state
# lsvg test_vg_1
VOLUME GROUP: test_vg_1 VG IDENTIFIER: 00f6600100004c00000001464e967f3d
VG STATE: active PP SIZE: 64 megabyte(s)
.
OPEN LVs: 2 QUORUM: 1 (Disabled)
.
Now, the file system data is mirrored onto two separate physical locations. The first copy is on spinning disk; the second copy is on the FlashSystem 900.
Set the correct read and write policy
IBM AIX LVM sets the scheduling policy for reads and writes to the storage systems. If you use mirrored logical volumes, the following scheduling policies for writing to disk can be set for a logical volume with multiple copies:
Sequential scheduling policy
Performs writes to multiple copies or mirrors in order. The multiple physical partitions representing the mirrored copies of a single logical partition are designated primary, secondary, and tertiary. In sequential scheduling, the physical partitions are written to in sequence. The system waits for the write operation for one physical partition to complete before starting the write operation for the next one. When all write operations are complete for all mirrors, the write operation is complete.
Parallel scheduling policy
Simultaneously starts the write operation for all the physical partitions in a logical partition. When the write operation to the physical partition that takes the longest to complete finishes, the write operation is complete. Specifying mirrored logical volumes with a parallel scheduling policy might improve I/O read-operation performance because multiple copies allow the system to direct the read operation to the least busy disk for this logical volume.
Parallel write with sequential read scheduling policy
Simultaneously starts the write operation for all the physical partitions in a logical partition. The primary copy of the read is always read first. If that read operation is unsuccessful, the next copy is read. During the read retry operation on the next copy, the failed primary copy is corrected by the LVM with a hardware relocation. This patches the bad block for future access.
Parallel write with round-robin read scheduling policy
Simultaneously starts the write operation for all the physical partitions in a logical partition. Reads are switched back and forth between the mirrored copies.
To get the preferred read performance of the IBM FlashSystem 900, set the policy to parallel write with sequential read. With this option, you get these functions:
Write operations are done in parallel to all copies of the mirror.
Read operations are always done on the primary copy of the devices in the mirror set.
Example 5-17 shows how to change the LVM scheduler to the parallel write with sequential read scheduling policy.
 
Important: Downtime of the file system is required when you change the LVM scheduler policy.
Example 5-17 Changing the scheduler to the parallel write with sequential read scheduling policy
# # check current state of the LVM scheduler
# lslv test_lv_1
LOGICAL VOLUME: test_lv_1 VOLUME GROUP: test_vg_1
LV IDENTIFIER: 00f6600100004c00000001464e967f3d.1 PERMISSION: read/write
VG STATE: active/complete LV STATE: opened/syncd
TYPE: jfs2 WRITE VERIFY: off
MAX LPs: 3096 PP SIZE: 64 megabyte(s)
COPIES: 2 SCHED POLICY: parallel
LPs: 3096 PPs: 6192
STALE PPs: 0 BB POLICY: relocatable
INTER-POLICY: minimum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 16
MOUNT POINT: /test/preferred_read LABEL: /test/preferred_read
DEVICE UID: 0 DEVICE GID: 0
DEVICE PERMISSIONS: 432
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?: NO
INFINITE RETRY: no
# lslv test_lv_1 | grep "SCHED POLICY"
COPIES: 2 SCHED POLICY: parallel
 
# # Logical volume must be closed.
# # If the logical volume contains a file system,
# # the umount command will close the LV device.
# umount /test/preferred_read
 
# # set scheduler to parallel write with sequential read scheduling policy
# # (parallel/sequential)
# # Note: mklv and chlv: The -d option cannot be used with striped logical volumes.
# chlv -d ps test_lv_1
# # check changed state of the LVM scheduler
# lslv test_lv_1
LOGICAL VOLUME: test_lv_1 VOLUME GROUP: test_vg_1
LV IDENTIFIER: 00f6600100004c00000001464e967f3d.1 PERMISSION: read/write
VG STATE: active/complete LV STATE: closed/syncd
TYPE: jfs2 WRITE VERIFY: off
MAX LPs: 3096 PP SIZE: 64 megabyte(s)
COPIES: 2 SCHED POLICY: parallel/sequential
LPs: 3096 PPs: 6192
STALE PPs: 0 BB POLICY: relocatable
INTER-POLICY: minimum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 16
MOUNT POINT: /test/preferred_read LABEL: /test/preferred_read
DEVICE UID: 0 DEVICE GID: 0
DEVICE PERMISSIONS: 432
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?: NO
INFINITE RETRY: no
# lslv test_lv_1 | grep "SCHED POLICY"
COPIES: 2 SCHED POLICY: parallel/sequential
 
# # mount file system
# mount /test/preferred_read
# # write some data to the filesystem /test/preferred_read
# # then read this data and check with iostat
 
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk1 0.0 147495.2 1036.4 737476 0
hdisk2 0.0 134964.8 736.6 674824 0
hdisk4 0.0 85035.2 448.6 425176 0
hdisk3 0.0 118422.4 684.2 592112 0
The logical volume is now set up for preferred read from the first copy of the mirror.
Set preferred read to the FlashSystem 900
Check which physical volumes hold the first, second, and third copies of the logical volume, if any. The full lslv -m listing can be long; Example 5-18 shows a command that reduces it to a quick overview of the involved hdisks.
Example 5-18 Get first, second, and third logical volume physical disk
#
# # Get first, second, and third logical volume's physical disk
# # Get reduced list of all involved hdisks
# # list used Physical volume names
# lslv -m test_lv_1 | awk '{print $3, " ", $5, " ", $7}' | uniq
PV1 PV2 PV3
hdisk1 hdisk5
hdisk2 hdisk6
hdisk3 hdisk7
hdisk4 hdisk8
Now, the spinning disk devices in the PV1 column are the primary devices. The reads will all be supported by the PV1 devices. During boot, the PV1 devices are the primary copy of the mirror, and they will be used as the sync point.
You must remove the copy on the PV1 disks so that the PV2 disks become primary. Then, you add the removed disks back and set up the mirror again.
Example 5-19 shows how to make the FlashSystem 900 the primary copy.
Example 5-19 Make the FlashSystem 900 the primary disk
# # remove the primary copy which is on spinning disk
# rmlvcopy test_lv_1 1 hdisk1 hdisk2 hdisk3 hdisk4
 
# # list used Physical volume names ; no mirror
# lslv -m test_lv_1 | awk '{print $3, " ", $5, " ", $7}' | uniq
 
PV1 PV2 PV3
hdisk5
hdisk6
hdisk7
hdisk8
 
# # add removed spinning disk to mirror data
# mklvcopy test_lv_1 2 hdisk1 hdisk2 hdisk3 hdisk4
 
# # list used Physical volume names ; first copy now on FlashSystem
# lslv -m test_lv_1 | awk '{print $3, " ", $5, " ", $7}' | uniq
 
PV1 PV2 PV3
hdisk5 hdisk1
hdisk6 hdisk2
hdisk7 hdisk3
hdisk8 hdisk4
 
# # check mirror state
# lsvg -l test_vg_1
test_vg_1:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
test_lv_1 jfs2 3096 6192 8 open/stale /test/preferred_read
loglv00 jfs2log 1 1 1 open/syncd N/A
 
# # the mirror is stale, synchronize it
# # this command will take some time depending on volume size
# syncvg -P 32 -v test_vg_1
 
# # check mirror state
# lsvg -l test_vg_1
test_vg_1:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
test_lv_1 jfs2 3096 6192 8 open/syncd /test/preferred_read
loglv00 jfs2log 1 1 1 open/syncd N/A
 
# # turn VG quorum off
# # always check your business need if VG quorum should be enabled or disabled
# # do this to ensure the VG will not go offline if a quorum of disks goes missing
# chvg -Q n test_vg_1
 
# # check VG state
# lsvg test_vg_1
VOLUME GROUP: test_vg_1 VG IDENTIFIER: 00f6600100004c00000001464e967f3d
VG STATE: active PP SIZE: 64 megabyte(s)
.
OPEN LVs: 2 QUORUM: 1 (Disabled)
.
 
# # again read the data and check with iostat
 
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk0 0.0 8.8 1.6 44 0
hdisk5 0.0 281169.6 5695.8 1405848 0
hdisk6 0.0 261426.4 5057.2 1307132 0
hdisk7 0.0 228819.2 4434.0 1144096 0
hdisk8 0.0 115506.4 2221.0 577532 0
You can use the iostat command to see the effects of the parallel/sequential scheduler settings and the performance of the spinning disk or the FlashSystem 900 as the primary disk. Execute this command with the normal disk and again later with the FlashSystem 900 as the primary disk:
iostat -DRlTV 3
Notice that only the first disk is used for reading and that you get a big performance increase with the FlashSystem 900.
Example 5-20 shows the iostat output before changing the primary disk. It shows hdisk1, which is a spinning disk. Example 5-21 on page 150 shows the result after changing the primary. You see that hdisk5, which is the FlashSystem disk, is used only for reading.
Output is shortened for clarity.
Example 5-20 The iostat command checks for preferred read on the spinning disk
#
# # the volume group has to be in syncd state
# # use dd command to read from the mirrored logical volume
# dd if=/dev/test_lv_1 of=/dev/zero bs=16k count=100000
 
# # execute iostat in another window
# iostat -DRlTV 3
Disks:
--------------
hdisk1
Example 5-21 The iostat command checks for preferred read on the IBM FlashSystem 900
#
# # the volume group has to be in syncd state
# # use dd command to read from the mirrored logical volume
# dd if=/dev/test_lv_1 of=/dev/zero bs=16k count=100000
 
# # execute iostat in another window
# iostat -DRlTV 3
 
Disks:
--------------
hdisk5
Setting the FC topology on the FlashSystem 900 and AIX
The IBM FlashSystem 900 can be directly attached to an AIX host without a switch. In this case, the FC ports of the FlashSystem 900 must be changed to arbitrated loop (AL) topology. You can use the chportfc command to change port settings on the IBM FlashSystem 900. On the AIX system, the ports also need to be changed to AL.
Example 5-22 shows changing two ports, fscsi0 and fscsi2, on the AIX system. This system has four FC ports. Ports 0 and 2 are directly attached to the FlashSystem 900 using AL and ports 1 and 3 are attached to a switch. The cfgmgr command detects the correct topology.
Example 5-22 Set the AIX port to arbitrated loop
# # before using these commands
# # you must first alter the port topology on the FlashSystem 900
# # all traffic has to be stopped before using this command
#
# # remove FC port fscsi0 and then configure it using cfgmgr
# rmdev -Rdl fscsi0
# cfgmgr -vl fcs0
 
# # remove FC port fscsi2 and then configure it using cfgmgr
# rmdev -Rdl fscsi2
# cfgmgr -vl fcs2
 
# check all 4 ports
# lsattr -El fscsi0 | grep attach
attach al
 
# lsattr -El fscsi1 | grep attach
attach switch
 
# lsattr -El fscsi2 | grep attach
attach al
 
# lsattr -El fscsi3 | grep attach
attach switch
 
Before you run the commands: The topology must be set on an attached system, switch, or storage device before you run these commands.
Preferred read with the IBM SAN Volume Controller
You can set up preferred read on the IBM FlashSystem 900 with the IBM SAN Volume Controller with only one mouse click. In the IBM SAN Volume Controller GUI, go to the Volumes menu and right-click the FlashSystem 900 disk of the mirrored volume. Then, select Make Primary. Notice the asterisk (*) next to your primary disk, which is now the preferred read disk.
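If you prefer the command line, the following sketch shows the equivalent change with the SAN Volume Controller CLI. The volume name mirrored_vol and the copy ID 1 are placeholders; confirm the ID of the FlashSystem 900 copy with lsvdiskcopy first:
# # list the copies of the mirrored volume and note the FlashSystem 900 copy ID
# lsvdiskcopy mirrored_vol
# # make that copy the primary (preferred read) copy
# chvdisk -primary 1 mirrored_vol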
Figure 5-27 shows the mirrored IBM SAN Volume Controller volume with preferred read on a spinning disk.
Figure 5-27 SAN Volume Controller mirrored VDisk with preferred read on spinning disk
Figure 5-28 shows the option menu to select the primary copy. The primary copy is identical to the preferred read disk.
Figure 5-28 SAN Volume Controller option Make Primary
Figure 5-29 shows the mirrored IBM SAN Volume Controller volume with preferred read using the FlashSystem 900.
Figure 5-29 SAN Volume Controller mirrored VDisk with preferred read on the FlashSystem 900
Preferred read with Oracle ASM
Oracle Automatic Storage Management (ASM) in Oracle 11g includes advanced features that can take advantage of the performance of the IBM FlashSystem 900. Preferred Mirror Read and Fast Mirror Resync are the two most prominent features in this category.
You can set up preferred read by using the following two features. You can get detailed information about these two features in the Oracle documentation:
Preferred Mirror Read
Fast Mirror Resync
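As an illustration of the Preferred Mirror Read feature, the following sketch sets the ASM initialization parameter that defines the preferred read failure groups. The disk group name DATA and the failure group FLASH900 are placeholders for the failure group that contains the FlashSystem 900 disks; check the Oracle documentation for the exact syntax on your release:
$ sqlplus / as sysasm
SQL> ALTER SYSTEM SET ASM_PREFERRED_READ_FAILURE_GROUPS = 'DATA.FLASH900';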
You can read a detailed guide about Oracle ASM in the “Administering ASM Disk Groups” topic of the Database Storage Administrator’s Guide from the Oracle Help Center:
5.5.3 Linux configuration file multipath.conf example
The names and values of several multipath.conf attributes changed from version 5 to version 6, and again from version 6 to version 6.2. The example is based on Linux 6.2 and includes commented values for other versions. Check your Linux version to determine the correct attribute names and wwid values.
Example 5-23 shows a Linux multipath.conf file for the IBM FlashSystem 900.
 
Important: Check for the correct SCSI inquiry string. The actual string is as follows:
"FlashSystem-9840"
Example 5-23 IBM FlashSystem 900 multipath.conf file for Linux 6.2
# multipath.conf for Linux 5.x and Linux 6.x
#
# Always check for the correct parameter names, other versions may use a different name.
# Check the correct names and values for your Linux environment.
#
 
defaults {
udev_dir /dev
polling_interval 30
checker_timeout 10
}
 
blacklist {
wwid "*"
}
 
blacklist_exceptions {
wwid "36005076*"
}
 
devices {
device {
vendor "IBM"
product "FlashSystem-9840"
# path_selector "round-robin 0" # Linux 5, Linux 6
path_selector "queue-length 0" # Linux 6.2, if available
path_grouping_policy multibus
path_checker tur
rr_min_io_rq 4 # Linux 6.x
# rr_min_io 4 # Linux 5.x
rr_weight uniform
no_path_retry fail
failback immediate
dev_loss_tmo 300
fast_io_fail_tmo 25
}
}
 
multipaths {
# Change these example WWIDs to match the FlashSystem LUNs.
multipath {
wwid 360050768018e9fc15000000006000000
alias FlashSystem_900_6
}
multipath {
wwid 360050768018e9fc15000000007000000
alias FlashSystem_900_7
}
}
Example of Linux commands to configure the FlashSystem 900
Example 5-24 shows the commands and their results after attaching two 103 GB FlashSystem 900 volumes to the Linux 6.2 host. In this example, a 100 GB FlashSystem 820 LUN was already attached to the system, and the two FlashSystem 900 LUNs are new. The multipath.conf file of Example 5-23 on page 152 must be extended with entries for the FlashSystem 820; the combined file is shown in Example 5-26 on page 155.
Example 5-24 Commands to create the FlashSystem 900 devices
#
# # list current devices
# multipath -l
mpatha (1ATA SAMSUNG HE161HJ S209J90S) dm-4 ATA,SAMSUNG HE161HJ
size=149G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
`- 0:0:0:0 sda 8:0 active undef running
FlashSystem_820_1 (20020c24001117377) dm-1 IBM,FlashSystem
size=100G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
|- 5:0:0:1 sdc 8:32 active undef running
`- 4:0:0:1 sdj 8:144 active undef running
 
# # check for new devices
# multipath
create: FlashSystem_900_7 (360050768018e9fc15000000007000000) undef IBM,FlashSystem-9840
size=103G features='0' hwhandler='0' wp=undef
`-+- policy='queue-length 0' prio=1 status=undef
|- 5:0:1:1 sde 8:64 undef ready running
|- 6:0:0:1 sdg 8:96 undef ready running
|- 7:0:0:1 sdi 8:128 undef ready running
`- 4:0:1:1 sdl 8:176 undef ready running
create: FlashSystem_900_6 (360050768018e9fc15000000006000000) undef IBM,FlashSystem-9840
size=103G features='0' hwhandler='0' wp=undef
`-+- policy='queue-length 0' prio=1 status=undef
|- 6:0:0:0 sdf 8:80 undef ready running
|- 5:0:1:0 sdd 8:48 undef ready running
|- 7:0:0:0 sdh 8:112 undef ready running
`- 4:0:1:0 sdk 8:160 undef ready running
 
# # list multipath devices
# multipath -l
FlashSystem_900_6 (360050768018e9fc15000000006000000) dm-2 IBM,FlashSystem-9840
size=103G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
|- 6:0:0:0 sdf 8:80 active undef running
|- 5:0:1:0 sdd 8:48 active undef running
|- 7:0:0:0 sdh 8:112 active undef running
`- 4:0:1:0 sdk 8:160 active undef running
mpatha (1ATA SAMSUNG HE161HJ S209J90S) dm-4 ATA,SAMSUNG HE161HJ
size=149G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
`- 0:0:0:0 sda 8:0 active undef running
FlashSystem_820_1 (20020c24001117377) dm-1 IBM,FlashSystem
size=100G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
|- 5:0:0:1 sdc 8:32 active undef running
`- 4:0:0:1 sdj 8:144 active undef running
FlashSystem_900_7 (360050768018e9fc15000000007000000) dm-0 IBM,FlashSystem-9840
size=103G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
|- 5:0:1:1 sde 8:64 active undef running
|- 6:0:0:1 sdg 8:96 active undef running
|- 7:0:0:1 sdi 8:128 active undef running
`- 4:0:1:1 sdl 8:176 active undef running
 
# # list devices with ls
# ls -l /dev/mapper/
lrwxrwxrwx. 1 root root 7 Oct 16 09:30 FlashSystem_820_1 -> ../dm-1
lrwxrwxrwx. 1 root root 7 Oct 16 09:58 FlashSystem_900_6 -> ../dm-2
lrwxrwxrwx. 1 root root 7 Oct 16 09:58 FlashSystem_900_7 -> ../dm-0
Example 5-25 shows the creation of an aligned partition in Linux 6.2.
Example 5-25 Creating a Linux partition
#
# fdisk /dev/mapper/FlashSystem_900_6
 
The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.
 
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
 
Command (m for help): u
Changing display/entry units to sectors
 
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (63-216006655, default 1024): 128
Last sector, +sectors or +size{K,M,G} (128-216006655, default 216006655):
Using default value 216006655
 
 
Command (m for help): w
The partition table has been altered!
 
Calling ioctl() to reread partition table.
Syncing disks.
Using FlashSystem 820 and FlashSystem 900 with Linux client hosts
The FlashSystem 820 and IBM FlashSystem 900 have different product names that must be configured in the multipath.conf file. FlashSystem 840 and FlashSystem 900 have the same product string ("FlashSystem-9840") and use the same multipath.conf configuration. Example 5-26 shows the configuration file for the FlashSystem 820 and FlashSystem 900.
Example 5-26 Multipath.conf file extension for the FlashSystem 820 and FlashSystem 900
# multipath.conf for Linux 5.x and Linux 6.x
#
# Always check for the correct parameter names; other versions may use different names.
# Check the correct names and values for your Linux environment.
#
 
defaults {
udev_dir /dev
polling_interval 30
checker_timeout 10
}
 
blacklist {
wwid "*"
}
 
blacklist_exceptions {
wwid "36005076*" # FlashSystem 900
wwid "20020c24*" # FlashSystem 710/810/720/820
}
 
devices {
# FlashSystem 840 and 900
device {
vendor "IBM"
product "FlashSystem-9840"
# path_selector "round-robin 0" # Linux 5, Linux 6
path_selector "queue-length 0" # Linux 6.2, if available
path_grouping_policy multibus
path_checker tur
rr_min_io_rq 4 # Linux 6.x
# rr_min_io 4 # Linux 5.x
rr_weight uniform
no_path_retry fail
failback immediate
dev_loss_tmo 300
fast_io_fail_tmo 25
}
# FlashSystem 710/810/720/820
device {
vendor "IBM"
product "FlashSystem"
# path_selector "round-robin 0" # Linux 5, Linux 6
path_selector "queue-length 0" # Linux 6.2, if available
path_grouping_policy multibus
path_checker tur
# rr_min_io_rq 1 # Linux 6.x, FlashSystem 710/810
# rr_min_io 1 # Linux 5.x, FlashSystem 710/810
rr_min_io_rq 4 # Linux 6.x, FlashSystem 720/820
# rr_min_io 4 # Linux 5.x, FlashSystem 720/820
rr_weight uniform
no_path_retry fail
failback immediate
dev_loss_tmo 300
fast_io_fail_tmo 25
}
}
 
multipaths {
 
# Change these example WWIDs to match the FlashSystem LUNs.
multipath {
wwid 360050768018e9fc15000000006000000
alias FlashSystem_900_6
}
multipath {
wwid 360050768018e9fc15000000007000000
alias FlashSystem_900_7
}
}
Linux tuning
The Linux kernel buffer file system writes data before it sends the data to the storage system. With the IBM FlashSystem 900, better performance can be achieved when the data is not buffered but is directly sent to the IBM FlashSystem 900. When setting the scheduling policy to no operation (NOOP), the fewest CPU instructions possible are used for each I/O. Setting the scheduler to NOOP gives the best write performance on Linux systems. You can use the following setting in most Linux distributions as a boot parameter:
elevator=noop
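In addition to the boot parameter, you can check and change the scheduler for a single device at run time through sysfs, as shown in the following sketch. The device name sdd is a placeholder for one of the FlashSystem 900 SCSI devices:
# # show the available schedulers; the active one is displayed in brackets
# cat /sys/block/sdd/queue/scheduler
# # switch this device to the noop scheduler
# echo noop > /sys/block/sdd/queue/scheduler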
Current Linux devices are managed by the device manager Udev. You can define how Udev will manage devices by adding rules to the /etc/udev/rules.d directory.
Example 5-27 shows the rules for the IBM FlashSystem 900 with Linux 6. This udev rules file contains two lines originally. It is divided into multiple lines for better readability. If you use this file, be sure that each line starts with the keyword ACTION.
Example 5-27 Linux device rules Linux 6.x
#
cat 99-IBM-FlashSystem.rules
 
ACTION=="add|change", SUBSYSTEM=="block",ATTRS{device/model}=="FlashSystem-9840",
ATTR{queue/scheduler}="noop",ATTR{queue/rq_affinity}="1",
ATTR{queue/add_random}="0",ATTR{device/timeout}="5"
 
ACTION=="add|change", KERNEL=="dm-*",
PROGRAM="/bin/bash -c 'cat /sys/block/$name/slaves/*/device/model | grep FlashSystem-9840'",
ATTR{queue/scheduler}="noop",ATTR{queue/rq_affinity}="1",ATTR{queue/add_random}="0"
Example 5-28 shows the rule for the IBM FlashSystem 900 with Linux 5. This udev rules file contains one line originally. It is divided into multiple lines for better readability. If you use this file, be sure that the line starts with the keyword ACTION.
Example 5-28 Linux device rules Linux 5.x
#
cat 99-IBM-FlashSystem.rules
 
ACTION=="add|change", SUBSYSTEM=="block",SYSFS{model}=="FlashSystem-9840",
RUN+="/bin/sh -c 'echo noop > /sys/$DEVPATH/queue/scheduler'"
You can apply the new rules by using the commands in Example 5-29.
Example 5-29 Restarting udev rules
# linux 6.2
/sbin/udevadm control --reload-rules
/sbin/start_udev
5.5.4 Example of a VMware configuration
You can set the number of I/Os for each path on VMware with this command, which sets 10 I/Os for each path:
esxcli nmp roundrobin setconfig --device <device> --iops=10 --type "iops"
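The command above uses the older esxcli nmp namespace. On later ESXi releases, the equivalent setting is typically made through the storage nmp psp namespace, as in the following sketch; the device identifier naa.xxxx is a placeholder, and you should verify the command against the esxcli reference for your ESXi version:
esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxx --iops 10 --type iops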
5.6 FlashSystem 900 and Easy Tier
You can implement the IBM FlashSystem 900 with IBM SAN Volume Controller Easy Tier. SAN Volume Controller Easy Tier automatically moves hot, frequently used data to the FlashSystem 900 and cold, less frequently used or unused data to the traditional disk system. An example of implementing SAN Volume Controller Easy Tier is shown in Figure 5-30 on page 158.
Figure 5-30 Easy Tier with SAN Volume Controller
For more details about the IBM FlashSystem 900 and SAN Volume Controller solution, see these sections:
5.7 Troubleshooting
Troubleshooting information for issues that you might encounter when configuring InfiniBand or creating file systems is described in this section.
5.7.1 Troubleshooting Linux InfiniBand configuration issues
The following list describes potential Linux configuration issues, troubleshooting guidance, and resolutions:
When you install OFED-X.X.X, an error occurs that indicates the ofa_kernel RPM Package Manager (originally Red Hat Package Manager) package failed to build.
The kernel that is used by the server might not be supported by OpenFabrics Enterprise Distribution (OFED). If the Install All option was chosen, try the Customize option in the OFED installation menu and select only the components that are needed. If this process does not work, try installing a different version of OFED.
Loading the driver module fails.
The HCA used might not be supported by OFED, or the driver was not installed correctly. Obtain the latest drivers for the host channel adapters (HCAs) from the HCA vendor’s website.
When you try to install OFED, an error message is displayed:
"<module 1> is required to build <module 2>"
This error means that certain dependencies that are required by OFED are not installed on the server. You must install all of the required dependencies:
 – To search for the necessary RPM (if the yum package-management tool is available), enter this command:
# yum provides <dependency_name>
 – To install the RPM, enter this command:
# yum install <dependency_rpm>
 
Note: If yum is not installed on the server, each dependency must be manually downloaded and installed.
When you try to run the srp_daemon command, a message indicates that an operation failed. Check these items:
 – Make sure that the storage system is physically connected to the network and that all components are powered on.
 – Make sure that the correct cable is used and that OpenSM is running. To confirm whether OpenSM is running, enter this command:
# /etc/init.d/opensmd status
Loading the ib_srp module fails.
Verify that OFED is installed correctly and that the necessary device drivers are also installed. If a custom OFED installation was done, be sure that ibutils and all packages that are related to srp were selected.
5.7.2 Linux fdisk error message
You might receive the following message when you create a partition in Linux:
"Re-reading the partition table failed"
Also, the corresponding device is not created in the /dev/mapper directory. Solve this problem by issuing the partprobe command, as depicted in Example 5-30.
Example 5-30 Solving the partition table failed with error 22
[root@localhost ~]# fdisk /dev/mapper/FlashSystem_900_3
 
<partition creation lines left out for clarity>
...
 
Command (m for help): w
The partition table has been altered!
 
Calling ioctl() to re-read partition table.
 
WARNING: Re-reading the partition table failed with error 22: Invalid argument.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
 
[root@localhost ~]# ls -l /dev/mapper/
lrwxrwxrwx. 1 root root 7 Oct 8 08:10 FlashSystem_900_2 -> ../dm-0
lrwxrwxrwx. 1 root root 7 Oct 8 09:20 FlashSystem_900_3 -> ../dm-2
 
[root@localhost ~]# partprobe
 
[root@localhost ~]# ls -l /dev/mapper/
lrwxrwxrwx. 1 root root 7 Oct 8 08:10 FlashSystem_900_2 -> ../dm-0
lrwxrwxrwx. 1 root root 7 Oct 8 09:20 FlashSystem_900_3 -> ../dm-2
brw-rw----. 1 root disk 253, 7 Oct 8 09:21 FlashSystem_900_3p1
You can search the /dev/mapper directory for newly generated partitions. After using the partprobe command, the new partition is generated.
5.7.3 Changing FC port properties
The FlashSystem 900 automatically detects the SAN topology and speed. If you want to explicitly set the speed (for example, 16 Gbps) or the topology (arbitrated loop or fabric), use the chportfc command. The lsportfc command lists the current settings.
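The following sketch lists the ports and then explicitly sets one of them. The port ID 1 and the parameter names -speed and -topology are assumptions for illustration, so verify them with the chportfc command help on your code level:
# # list the current FC port settings
# lsportfc
# # explicitly set port 1 to 16 Gbps, fabric topology (parameter names assumed)
# chportfc -speed 16 -topology fabric 1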
 
Note: The FlashSystem 900 16 Gbps FC attachment uses fabric topology only.
5.7.4 Changing iSCSI port properties
The FlashSystem 900 uses 10 Gb Ethernet for iSCSI connections. Use the chportip command to set the IP address, netmask, and gateway values of the iSCSI ports. The lsportip command lists the current settings.
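The following sketch lists the iSCSI ports and then sets the addressing of one port. The port ID and the address values are placeholders, and the parameter names -ip, -mask, and -gw are assumptions for illustration; verify them with the chportip command help on your code level:
# # list the current iSCSI port settings
# lsportip
# # set IP address, netmask, and gateway of iSCSI port 1 (values are examples)
# chportip -ip 192.168.10.21 -mask 255.255.255.0 -gw 192.168.10.1 1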