Troubleshooting iSCSI virtualization
This chapter focuses on troubleshooting problems with Internet Small Computer System Interface (iSCSI) targets. For an introduction to troubleshooting and the applicable data collection procedures for the SAN Volume Controller and IBM Storwize products, see Chapter 10, “Troubleshooting” on page 175.
This chapter describes the following topics:
16.1 Troubleshooting iSCSI target discovery
This section describes the basic troubleshooting methodology for iSCSI target discovery issues.
16.1.1 Problems with initial discovery
This section details the troubleshooting process for iSCSI target discovery.
A failed discovery of iSCSI storage indicates that the detectiscsistorageportcandidate command could not communicate with the specified IP address through the interfaces that were specified in the command. Therefore, the first item to investigate is why the Storwize initiator cannot access the target.
The lsiscsistorageportcandidate command displays the results of the most recently run detectiscsistorageportcandidate command. This information can be useful when you troubleshoot discovery issues.
Assume that you have a two-I/O group cluster in a standard topology. You initiate a cluster-level discovery operation from port 1 of each node in this cluster to target IP address 192.168.70.121. After the discovery completes, the iogroup_list field in the lsiscsistorageportcandidate command output is 1:0:-:-, as shown in Example 16-1.
Example 16-1 The lsiscsistorageportcandidate command output
IBM_2145:Redbooks_cluster1:superuser>lsiscsistorageportcandidate
id src_port_id target_ipv4    target_ipv6 target_iscsiname                  iogroup_list configured status
0  1           192.168.70.121             iqn.2005-10.com.xivstorage:041529 1:0:-:-      no         partial
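The iogroup_list field can also be decoded mechanically. The following Python sketch is an illustration, not an IBM tool, and the function names are hypothetical. It assumes, based on Example 16-1, that the field holds one colon-separated entry per I/O group in order 0 to 3, where 1 means that discovery succeeded on every node of that I/O group, 0 means that it failed on at least one node, and a dash is assumed to mean that the I/O group does not take part (for example, because it does not exist).

```python
# Sketch: decode the iogroup_list field of lsiscsistorageportcandidate output.
# Assumption: one entry per I/O group (0..3); "1" = discovered on all nodes,
# "0" = failed on at least one node, "-" = I/O group not taking part.

MEANING = {"1": "discovered", "0": "failed", "-": "not applicable"}


def decode_iogroup_list(field: str) -> dict:
    """Map each I/O group index to 'discovered', 'failed', or 'not applicable'."""
    result = {}
    for iogrp, flag in enumerate(field.strip().split(":")):
        if flag not in MEANING:
            raise ValueError(f"unexpected iogroup_list entry {flag!r}")
        result[iogrp] = MEANING[flag]
    return result


def failed_io_groups(field: str) -> list:
    """Return the I/O groups whose node ports need connectivity troubleshooting."""
    return [grp for grp, state in decode_iogroup_list(field).items() if state == "failed"]
```

For Example 16-1, decode_iogroup_list("1:0:-:-") maps I/O group 0 to "discovered" and I/O group 1 to "failed", and failed_io_groups("1:0:-:-") returns [1], which matches the interpretation that follows.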
From this output, you know that I/O group 0 successfully discovered the storage port from both nodes in that group. However, I/O group 1 failed to discover the storage port from one or both of its nodes. To ensure a successful discovery, you must determine why port 1 on the node or nodes in I/O group 1 cannot contact the IP address 192.168.70.121. To troubleshoot this type of issue, complete the following steps:
1. Ensure that the required interfaces are all online and are connected.
2. Validate the connectivity between the SAN Volume Controller or IBM Storwize system and the target storage controller. Connectivity can be validated by using the ping command.
3. Validate that the CHAP authentication parameters are correct.
4. Ensure that TCP port 3260, the default iSCSI port, is open on any firewalls between the devices.
5. Validate that the target storage controller is supported for iSCSI virtualization.
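Steps 2 and 4 can be partly checked from a management workstation. The following Python sketch, which is not part of the SAN Volume Controller or Storwize software, tries to open a TCP connection to port 3260 on the target; the function name and the 3-second default timeout are illustrative choices. Because it runs on a workstation rather than on a node, a success does not prove that the node ports can reach the target, but a failure from a host on the same subnet points to routing or firewall rules.

```python
# Sketch: test whether a TCP connection to an iSCSI target port can be opened
# from the machine that runs this script (not from the cluster node ports).
import socket

ISCSI_PORT = 3260  # default iSCSI target port


def can_reach_iscsi_target(ip: str, port: int = ISCSI_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within timeout seconds."""
    try:
        # create_connection performs the full TCP handshake; the context
        # manager closes the socket immediately afterward.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass), and unreachable networks.
        return False
```

For the target in Example 16-1, you would call can_reach_iscsi_target("192.168.70.121").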
16.1.2 Problems adding a storage port
If you run detectiscsistorageportcandidate and then wait for an extended period before you act on the results, changes might have been made to the back-end storage controller in the meantime. If changes were made since the last time that you ran detectiscsistorageportcandidate, the output of lsiscsistorageportcandidate contains stale data that is no longer valid. If you run addiscsistorageport against this stale data, the specified storage port candidate is added to the persistent storage port table, and the SAN Volume Controller or IBM Storwize system attempts to connect to that storage port, but it might fail to create an iSCSI session.
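To avoid acting on stale data, run detectiscsistorageportcandidate to rediscover the targets immediately before you attempt to add a storage port. After you add the storage port, run lsiscsistorageport to validate that the connections are as expected.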
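The following Python sketch shows that ordering as a script. It is an illustration only: run_cli is a hypothetical callable that sends one CLI command to the cluster (for example, over SSH) and returns the text output, and the -srcportid, -targetip, and -delim parameters are assumptions that you should confirm in the CLI reference for your code level. The sketch simply adds the first matching candidate; you might also inspect its iogroup_list and status fields before adding it.

```python
# Sketch: rediscover, then add the fresh candidate, then validate.
# run_cli and the CLI parameters shown are assumptions, not a documented API.
import csv
import io
from typing import Callable


def add_storage_port_fresh(run_cli: Callable[[str], str], src_port_id: int, target_ip: str) -> str:
    # 1. Rediscover immediately before adding so that the candidate list is current.
    run_cli(f"detectiscsistorageportcandidate -srcportid {src_port_id} -targetip {target_ip}")
    # 2. Read the fresh candidate list; a comma delimiter is used because
    #    iSCSI names and iogroup_list values contain colons.
    listing = run_cli("lsiscsistorageportcandidate -delim ,")
    rows = list(csv.DictReader(io.StringIO(listing)))
    matches = [r for r in rows
               if r["src_port_id"] == str(src_port_id) and r["target_ipv4"] == target_ip]
    if not matches:
        raise RuntimeError(f"no candidate for source port {src_port_id} to {target_ip}")
    # 3. Add the candidate that this discovery just produced.
    run_cli(f"addiscsistorageport {matches[0]['id']}")
    # 4. Return the storage port listing so that the caller can validate it.
    return run_cli("lsiscsistorageport")
```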
16.2 Troubleshooting a degraded or offline status
This section focuses on troubleshooting post-configuration issues with target storage controllers, including degraded and offline conditions.
16.2.1 Restoring an offline MDisk or storage controller
The SAN Volume Controller or IBM Storwize system marks a storage port offline if the target controller fails to respond within 5 seconds to a heartbeat that the system sends. This condition is typically caused by a connectivity issue between the back-end storage controller and the SAN Volume Controller or IBM Storwize system. To troubleshoot this situation, complete the following steps:
1. Validate that all the required ports are connected.
2. Validate that the SAN Volume Controller or IBM Storwize system can communicate with the target storage controller.
3. Validate that port 3260 is open between the SAN Volume Controller or IBM Storwize device and the target storage controller.
4. Validate that no changes were made to the target controller that affect its ability to be virtualized.
The SAN Volume Controller or IBM Storwize software automatically retries the paths every few seconds and recovers the storage port when it becomes available again. You can also force an immediate rediscovery by running the detectmdisk command.
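The offline and recovery behavior that is described in this section can be pictured with the following conceptual Python model. It is not how the product is implemented; the class and method names are hypothetical, and only the 5-second heartbeat limit comes from the text.

```python
# Conceptual model only: a storage port goes offline when a heartbeat gets no
# response within 5 seconds and comes back online when a later retry succeeds.
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_TIMEOUT_S = 5.0  # response limit stated in the text


@dataclass
class StoragePort:
    name: str
    online: bool = True

    def record_heartbeat(self, response_time_s: Optional[float]) -> None:
        """response_time_s is None when no response arrived at all."""
        if response_time_s is None or response_time_s > HEARTBEAT_TIMEOUT_S:
            self.online = False
        else:
            # A successful automatic retry (or forced rediscovery) recovers the port.
            self.online = True
```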
16.2.2 Restoring degraded MDisks or storage controllers
This section describes how to restore degraded MDisks or storage controllers.
Remote port that is excluded (error code 1230)
If the system detects an excessive number of I/O errors when it accesses an MDisk through a particular storage port, the system might exclude this path and generate error code 1230. This error indicates a communication problem between the SAN Volume Controller or IBM Storwize system and the back-end storage controller. To troubleshoot this type of issue, complete the following steps:
1. Ensure that no changes were made to the back-end storage controller that affect the SAN Volume Controller or IBM Storwize system’s ability to use the managed resources.
2. Check the health of the SAN Volume Controller or IBM Storwize interfaces. Because a port is excluded and not offline, all the ports are probably connected, so you must rely on interface statistics to assist you with your troubleshooting. To learn how to review the interface statistics on the SAN Volume Controller or IBM Storwize device, see 10.2.4, “Ethernet logs and statistics on IBM Storwize nodes” on page 193.
3. If the SAN Volume Controller or IBM Storwize device’s interfaces do not appear to be the problem, investigate your networking devices and the back-end storage controller.
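When you rely on interface statistics, the useful signal is usually which error counters grow between two readings rather than their absolute values. The following Python sketch compares two snapshots that you collect yourself; the snapshot layout and the counter names are hypothetical and must be mapped to the fields that your Ethernet statistics actually report.

```python
# Sketch: report error counters that increased between two snapshots.
# Counter names below are hypothetical placeholders.
from typing import Dict

Snapshot = Dict[str, Dict[str, int]]  # interface -> counter name -> value

ERROR_COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped", "rx_crc_errors")


def rising_errors(before: Snapshot, after: Snapshot) -> Dict[str, Dict[str, int]]:
    report = {}
    for iface, counters in after.items():
        old = before.get(iface, {})
        deltas = {name: counters[name] - old.get(name, 0)
                  for name in ERROR_COUNTERS if name in counters}
        # Only positive deltas are reported; a negative delta usually means
        # that the counter was reset between the two readings.
        rising = {name: d for name, d in deltas.items() if d > 0}
        if rising:
            report[iface] = rising
    return report
```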
After the communication issue is resolved, force a retry of the excluded path or paths either by running the directed maintenance procedure (DMP) or by running the following command:
includemdisk <mdisk name/id>
For more information about this command, see IBM Knowledge Center.
Single path failure of a redundant controller
If a path or storage port to a target controller goes offline but other paths or storage ports remain online, then the controller and associated MDisks are likely degraded. Resolve this condition by completing the following steps:
1. Validate that all the required ports are connected.
2. Validate that the SAN Volume Controller or IBM Storwize system can communicate with the target storage controller.
3. Validate that port 3260 is open between the SAN Volume Controller or IBM Storwize device and the target storage controller.
4. Validate that no changes were made to the target controller that affect its ability to be virtualized.
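The relationship between path states and controller status that this section describes can be summarized as a small rule. The following Python sketch is a simplified illustration with a hypothetical function name; the product evaluates status with more inputs than this.

```python
# Simplified illustration: derive a controller status from its storage ports.
from typing import Iterable


def controller_status(port_online: Iterable[bool]) -> str:
    states = list(port_online)
    if not states:
        raise ValueError("controller has no storage ports configured")
    if all(states):
        return "online"
    if any(states):
        return "degraded"  # some paths failed, others still online
    return "offline"
```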
16.3 Performance issues
If the SAN Volume Controller or IBM Storwize initiator system reports performance problems with the back-end storage system, use the same troubleshooting method as for any other performance problem, except that in this case the SAN Volume Controller or IBM Storwize device acts as the host system. Section 10.4.7, “Problem determination: Checking for performance problems” on page 207 might help you determine the source of a performance issue. When the performance problem appears to be coming from a controller that is virtualized over iSCSI, complete the following steps:
1. Ensure that the MTU size is set correctly on all devices.
2. Ensure that the port speeds match on every device in the path and on the end points, including the storage controllers, switches, and routers.
3. Review the SAN Volume Controller or IBM Storwize system performance data to try to identify the bottleneck:
a. Use IBM Spectrum Control if possible to review data over long periods.
b. If IBM Spectrum Control is not available, review the real-time performance data in the GUI, as described in 8.2.2, “Real-time performance monitoring with the GUI” on page 163.
4. Review the interface counters in the Ethernet trace files that are found in the snap file of the SAN Volume Controller or Storwize system, as shown in 10.2.5, “iSCSI logs on IBM Storwize nodes” on page 195.
5. Review the switch statistics to see whether the network is causing the delay.
6. Review the back-end storage controller to see whether this device is having performance problems. Any performance problem on the back-end storage controller is also seen by the SAN Volume Controller or IBM Storwize device, because it is the host of that storage.
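Steps 1 and 2 can be checked against an inventory of the devices in the path. The following Python sketch uses a hypothetical hop list that you fill in from each device's configuration, with the node port first and the storage controller port last. It assumes that all MTU values are expressed the same way (IP MTU) and treats an intermediate MTU as correct when it is at least as large as the end-point MTU, because switches are often configured above the host MTU.

```python
# Sketch: flag MTU and port-speed problems along an iSCSI path.
# The hop inventory is hypothetical input, not data read from any device.
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    device: str
    mtu: int           # bytes, IP MTU
    speed_gbps: float  # negotiated port speed


def path_mismatches(hops: List[Hop]) -> List[str]:
    if len(hops) < 2:
        raise ValueError("need at least the two end points")
    problems = []
    first, last = hops[0], hops[-1]
    if first.mtu != last.mtu:
        problems.append(f"end-point MTU differs: {first.device}={first.mtu}, {last.device}={last.mtu}")
    needed = max(first.mtu, last.mtu)
    for hop in hops[1:-1]:
        if hop.mtu < needed:
            problems.append(f"{hop.device}: MTU {hop.mtu} is smaller than end-point MTU {needed}")
    speeds = {hop.speed_gbps for hop in hops}
    if len(speeds) > 1:
        slowest = min(hops, key=lambda hop: hop.speed_gbps)
        problems.append(f"port speed mismatch {sorted(speeds)} Gbps; slowest: {slowest.device}")
    return problems
```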