RAS, monitoring, and troubleshooting
There are many ways to manage, monitor, and troubleshoot IBM Spectrum Virtualize. This chapter introduces useful, common procedures for maintaining IBM Spectrum Virtualize. It includes the following topics:
Reliability, availability, and serviceability
Shutting down IBM SAN Volume Controller
Configuration backup
Software update
Health Checker feature
Troubleshooting and fix procedures
Monitoring
13.1 Reliability, availability, and serviceability
Reliability, availability, and serviceability (RAS) are important concepts in the design of the IBM Spectrum Virtualize system. Hardware features, software features, design considerations, and operational guidelines all contribute to making the system reliable.
Fault tolerance and high levels of availability are achieved by the following methods:
The Redundant Array of Independent Disks (RAID) capabilities of the underlying disks
IBM SAN Volume Controller nodes clustering by using a Compass architecture
Auto-restart of hung nodes
Integrated Battery Backup Units (BBU) to provide memory protection if there is a site power failure
Host system failover capabilities using N-Port ID Virtualization (NPIV)
Hot Spare Node option to provide complete node redundancy and failover
High levels of serviceability are available through the following methods:
Cluster error logging
Asynchronous error notification
Automatic dump capabilities to capture software-detected issues
Concurrent diagnostic procedures
Directed Maintenance Procedures (DMP) with guided online replacement procedures
Concurrent log analysis and memory dump data recovery tools
Concurrent maintenance of all IBM SAN Volume Controller components
Concurrent upgrade of IBM Spectrum Virtualize Software and firmware
Concurrent addition or deletion of nodes in the clustered system
Automatic software version leveling when replacing a node
Detailed status and error conditions displayed by LED indicators
Error and event notification through Simple Network Management Protocol (SNMP), syslog, and email
Optional Remote Support Assistant
The heart of the IBM Spectrum Virtualize system is one or more pairs of nodes. The nodes share the read and write data workload from the attached hosts to the disk arrays. This section examines the RAS features of the IBM SAN Volume Controller system, monitoring, and troubleshooting.
13.1.1 IBM SAN Volume Controller Node
IBM SAN Volume Controller nodes work as a redundant clustered system. Each IBM SAN Volume Controller node is an individual server within the clustered system on which the Spectrum Virtualize software runs.
IBM SAN Volume Controller nodes are always installed in pairs, forming an I/O group. A minimum of one pair and a maximum of four pairs of nodes constitute a clustered SVC system. Many of the components that make up IBM SAN Volume Controller nodes include light-emitting diodes (LEDs) that indicate status and the activity of that component.
Figure 13-1 shows the rear view, ports, and indicator lights on the IBM SAN Volume Controller node model 2145-SV1.
Figure 13-1 Rear ports and indicators of IBM SAN Volume Controller model 2145-SV1
Host interface cards
The Fibre Channel or 10 Gbps Ethernet adapters are installed horizontally in the middle and on the right side of the node. The 2145-SV1 can accommodate up to four quad-port 16-gigabit per second (Gbps) Fibre Channel (FC) cards, or one four-port 10-gigabit Ethernet (GbE) adapter, either stand-alone or in combination with an FC card.
Table 13-1 lists the meaning of port LEDs for FC configuration.
Table 13-1 Fibre Channel link LED statuses
Port LED      Color   Meaning
Link status   Green   Link is up, connection established.
Speed         Amber   Link is not up, or there is a speed fault.
USB ports
Two active Universal Serial Bus (USB) connectors are available in the horizontal position. They are not numbered, and no indicators are associated with them. These ports can be used for initial cluster setup, encryption key backup, and node status or log collection.
Ethernet and LED status
Three 10 Gbps Ethernet ports are side by side on the node. They are logically numbered 1, 2, and 3 from left to right. Each port has two LEDs, and their status values are shown in Table 13-2. A fourth port, the Technician Port, is used for initial setup and can also be used for other service actions.
Table 13-2 Ethernet LED statuses
LED          Color   Meaning
Link state   Green   On when there is an Ethernet link.
Activity     Amber   Flashing when there is activity on the link.
Serial-attached SCSI ports
Four 12 Gbps serial-attached SCSI (SAS) ports are side by side on the left side of the node, with indicator LEDs next to them. They are logically numbered 1, 2, 3, and 4 from left to right.
Each port is associated with one green and one amber LED that indicate the status of its operation, as listed in Table 13-3.
Table 13-3 SAS LED statuses
LED     Meaning
Green   Link is connected and up.
Amber   Fault on the SAS link (disconnected, wrong speed, errors).
Node status LEDs
Five LEDs in a row in the upper right position of the IBM SAN Volume Controller node indicate the status and the functionality of the node. See Figure 13-2.
Figure 13-2 Rear view and indicators of IBM SAN Volume Controller model 2145-SV1
The next section explains the LED components and the condition that are associated with them.
Power LED
The Power LED has these statuses:
Off: When the Power LED is off, the IBM SAN Volume Controller node has no power at the power supply or both power supplies have failed.
On: When the Power LED is on, the IBM SAN Volume Controller node is on.
Flashing: When the Power LED is flashing, the IBM SAN Volume Controller is off, but it has power at the power supplies.
Identify LED
The Identify LED flashes when the identify feature is turned on. This function can be used to locate a specific node in the data center.
Node Fault LED
The Node Fault LED has these statuses:
Off: No fatal, critical, or warning events are shown in the IBM Spectrum Virtualize logs.
On: When the node fault LED is on, the IBM SAN Volume Controller node indicates a fatal node error.
Flashing: A warning or critical error is reported in the IBM Spectrum Virtualize logs.
Battery status LED
The Battery status LED has these statuses:
Off: Hardened data will not be saved if there is a power loss or the IBM Spectrum Virtualize is not running.
On: The battery charge level is sufficient for the hardened data to be saved twice if the power is lost for both nodes of the I/O group.
Slow flash: The battery charge level is sufficient for the hardened data to be saved once.
Fast flash: The battery charge is too low or batteries are charging.
Boot drive activity LED
The boot drive activity LED has these statuses:
Off: The drive is not ready for use.
On: The drive is ready for use, but is not in use.
Flashing: The drive is in use.
Boot drive status LED
The boot drive status LED has these statuses:
Off: The drive is in good state or has no power.
On: The drive is faulty.
Flashing: The drive is being identified.
13.1.2 Dense Drawer Enclosures LED
As Figure 13-3 shows, two 12 Gbps SAS ports are side by side on the canister of every enclosure. They are numbered 1 on the right and 2 on the left. Each Dense Drawer has two canisters side by side.
Figure 13-3 Dense Drawer LEDs
The SAS status LED indicators have the same meaning as the LED indicators of the SAS ports on the IBM SAN Volume Controller node. Table 13-4 shows the LED status values of the expansion canister.
Table 13-4 Expansion canister LEDs statuses
Position   Color   Name     State      Meaning
Right      Green   Power    On         The canister is powered on.
                            Off        No power is available to the canister.
Middle     Green   Status   On         The canister is operating normally.
                            Flashing   There is an error with the vital product data (VPD).
Left       Amber   Fault    On         There is an error logged against the canister, or the system is not running.
                            Flashing   The canister is being identified.
                            Off        No fault; the canister is operating normally.
13.1.3 Adding a node to an SVC system
Before you add a node to a system, ensure that you have configured the switch zoning so that the node you are adding is in the same zone as all other nodes in the system. If you are replacing a node and the switch is zoned by worldwide port name (WWPN) rather than by switch port, you must follow the service instructions carefully to continue to use the same WWPNs.
Complete the following steps to add a node to the SVC clustered system:
1. If the switch zoning is correct, you see the additional I/O Group as a gray empty frame on the Monitoring → System pane. Figure 13-4 shows this empty frame.
Figure 13-4 Available I/O Group or nodes on the System pane
2. Click the grayed-out node representing the empty io_grp1 to open the Add Nodes window.
3. In the Add Nodes window (Figure 13-5), you see the available nodes, which are in candidate mode and able to join the cluster.
Figure 13-5 Adding Available nodes to a new I/O group
 
Important: You must have at least two nodes in an I/O Group.
4. Ensure that you have selected the correct available nodes to be added and click Finish. If the existing system has encryption enabled, you are prompted to enable encryption on the selected nodes. Encryption licenses need to be installed for all nodes in the system.
5. The system display, as shown in Figure 13-6, shows the new nodes in I/O group 1, where one node has been added and the other node is still being added. This process can take approximately 30 minutes because the system automatically updates the code level on the new node if it does not match the system level.
Figure 13-6 Adding the node to the SVC Cluster
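The same task can be done from the CLI. The following is a minimal sketch, with a hypothetical WWNN, that assumes the candidate node is zoned correctly. The lsnodecandidate command lists nodes that are available to join the system, and addnode adds one to a specific I/O group:
IBM_2145:SVC_ESC:superuser>svcinfo lsnodecandidate
IBM_2145:SVC_ESC:superuser>svctask addnode -wwnodename 50050768010027E2 -iogrp io_grp1
As with the GUI procedure, the system automatically levels the code on the newly added node.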
13.1.4 Removing a node from an SVC clustered system
From the System window, complete the following steps to remove a node:
1. Click the front panel of the node you want to remove.
2. From the enlarged view of the node, right-click the front panel as shown in Figure 13-7. Select Remove from the menu.
Figure 13-7 Remove a node from the SVC clustered system action
3. A Warning window shown in Figure 13-8 opens. Read the warnings before continuing by clicking Yes.
Figure 13-8 Warning window when you remove a node
 
Warning: By default, the cache is flushed before the node is deleted to prevent data loss if a failure occurs on the other node in the I/O Group.
In certain circumstances, such as when the node is already offline, you can remove the specified node immediately without flushing the cache or ensuring that data loss does not occur. Select Bypass check for volumes that will go offline, and remove the node immediately without flushing its cache.
4. Click Yes to confirm the removal of the node. See the System Details window to verify a node removal, as shown in Figure 13-9.
Figure 13-9 System Details pane with one SVC node removed
5. If this node is the last node in the system, the warning message shown in Figure 13-10 is displayed. Before you delete the last node in the system, ensure that you want to delete the system. The user interface and any open CLI sessions are lost.
Figure 13-10 Warning window for the removal of the last node in the cluster
6. After you click OK, the node becomes a candidate, ready to be added back into an SVC cluster or to create a new system.
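A node can also be removed from the CLI. The following is a minimal sketch with a hypothetical node ID; lsnode shows the node IDs and identifies the configuration node:
IBM_2145:SVC_ESC:superuser>svcinfo lsnode
IBM_2145:SVC_ESC:superuser>svctask rmnode 2
As in the GUI, the cache is flushed before the node is removed. The -force parameter skips this protection and should be used with the same caution as the bypass option described above.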
13.1.5 Power
IBM SAN Volume Controller nodes and disk enclosures accommodate two power supply units (PSUs) for normal operation. For this reason, it is highly advisable to supply AC power to each PSU from a different Power Distribution Unit (PDU).
A fully charged battery is able to perform two fire hose dumps. It supports a power outage of up to 5 seconds before safety procedures are initiated.
Figure 13-11 shows the two PSUs present in the IBM SAN Volume Controller node. Each PSU has two green LEDs and one amber LED that report its status.
Figure 13-11 IBM SAN Volume Controller PSU 1 and 2
Power supplies in nodes, dense drawers, and regular expansion enclosures are hot-swappable and can be replaced without shutting down a node or the cluster.
If the power is interrupted in one node for less than 5 seconds, the node or enclosure does not perform a fire hose dump and continues operating from the battery. This feature is useful, for example, during maintenance of UPS systems in the data center, or when replugging the power to a different power source or PDU.
13.2 Shutting down IBM SAN Volume Controller
You can safely shut down an IBM SAN Volume Controller cluster by using either the GUI or the CLI.
 
Important: Never shut down your IBM SAN Volume Controller cluster by powering off the PSUs, removing both PSUs, or removing both power cables from the nodes. It can lead to inconsistency or loss of the data staged in the cache.
Before shutting down the cluster, stop all hosts that have volumes allocated from the device. This step can be skipped for hosts that have their volumes also provisioned with mirroring (host-based mirror) from different storage devices. However, skipping it incurs errors related to lost storage paths and disks in the host error log.
You can shut down a single node, or shut down the entire cluster. When you shut down only one node, all activities remain active on the partner node. When you shut down the entire cluster, you need to power on the nodes locally to start the system again.
If all input power to the SVC clustered system is removed for more than a few minutes (for example, if the machine room power is shut down for maintenance), it is important that you shut down the SVC system before you remove the power.
Shutting down the system while it is still connected to the main power ensures that the internal node batteries are still fully charged when the power is restored.
 
If you remove the main power while the system is still running, the internal batteries detect the loss of power and start the node shutdown process. This shutdown can take several minutes to complete. Although the internal batteries have sufficient power to perform the shutdown, you drain the node batteries unnecessarily.
When power is restored, the nodes start. However, if the node batteries have insufficient charge to survive another power failure and allow the node to perform another clean shutdown, the node enters service mode. You do not want the batteries to run out of power in the middle of a node's shutdown.
It can take approximately 3 hours to charge the batteries sufficiently for a node to come online.
 
Important: When a node shuts down because of a power loss, the node dumps the cache to an internal Flash drive so that the cached data can be retrieved when the system starts again.
The SAN Volume Controller (SVC) internal batteries are designed to survive at least two power failures in a short time. After that time, the nodes will not come online until the batteries have sufficient power to survive another immediate power failure.
During maintenance activities, if the batteries detect power and then detect a loss of power multiple times (the nodes start and shut down more than once in a short time), you might discover that you have unknowingly drained the batteries. In this case, you must wait until they are charged sufficiently before the nodes start again.
To shut down your SVC system, complete the following steps:
1. From the Monitoring → System pane, click Actions, as shown in Figure 13-12. Select Power Off System.
Figure 13-12 Action pane to power off the system
A confirmation window opens, as shown in Figure 13-13.
Figure 13-13 Confirmation window to confirm the shutdown of the system
2. Before you continue, ensure that you stopped all FlashCopy mappings, remote copy relationships, data migration operations, and forced deletions.
 
Attention: Pay special attention when encryption is enabled on some storage pools. You must either insert a USB drive with the stored encryption keys or ensure that your IBM SAN Volume Controller can reach the SKLM server or clone servers to retrieve the encryption keys. Otherwise, the data will not be readable after restart.
3. Enter the generated confirmation code and click OK to begin the shutdown process.
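The shutdown can also be initiated from the CLI. The following is a minimal sketch, assuming the stopsystem command at this code level (the confirmation prompt is paraphrased):
IBM_2145:SVC_ESC:superuser>svctask stopsystem
Are you sure that you want to continue with the shut down? (y/yes to confirm)
To shut down a single node instead of the entire cluster, add the -node parameter with the node ID.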
13.3 Configuration backup
You can download and save the configuration backup file using the IBM Spectrum Virtualize graphical user interface (GUI) or command-line interface (CLI). Perform this procedure manually on an ad hoc basis because the GUI saves the file directly to your workstation. The command-line option requires you to log in to the system and download the dumped file using Secure Copy Protocol (SCP). The command-line option is a good practice for automated backups of the configuration.
 
Important: Generally, perform a daily backup of the IBM Spectrum Virtualize configuration backup file. The best approach is to automate this task. Always perform an additional backup before any critical maintenance task, such as an update of the Spectrum Virtualize software version.
The backup file is updated by the cluster every day. Saving it after any changes to your system configuration is also important. It contains configuration data of arrays, pools, volumes, and so on. The backup does not contain any data from the volumes.
To successfully perform the configuration backup, follow the prerequisites and requirements:
All nodes must be online.
No independent operations that change the configuration can be running in parallel.
No object name can begin with an underscore.
 
Important: Ad hoc backup of configuration can be done only from the CLI using the svcconfig backup command. Then, the output of the command can be downloaded from the GUI.
13.3.1 Backup using the CLI
You can use the CLI to trigger a configuration backup either manually or by a regular automated process. The svcconfig backup command generates a new backup file. Triggering a backup by using the GUI is not possible. However, you can choose to save the automated 1 AM cron backup if you have not made any configuration changes since.
Example 13-1 shows output of the svcconfig backup command.
Example 13-1 Saving the configuration using the CLI
IBM_2145:SVC_ESC:superuser>svcconfig backup
................................................................................................................................................................................................................................................
CMMVC6155I SVCCONFIG processing completed successfully
IBM_2145:SVC_ESC:superuser>
The svcconfig backup command generates three files that provide information about the backup process and cluster configuration. These files are dumped into the /tmp directory on the configuration node. Use the lsdumps command to list them (Example 13-2).
Example 13-2 Listing the backup files in CLI
IBM_2145:ITSO DH8_B:superuser>lsdumps |grep backup
170 svc.config.backup.bak_KD8P1BP
265 svc.config.backup.xml_KD8P1BP
266 svc.config.backup.sh_KD8P1BP
267 svc.config.backup.log_KD8P1BP
IBM_2145:ITSO DH8_B:superuser>
Table 13-5 describes the three files that are created by the backup process.
Table 13-5 Files created by the backup process
File name
Description
svc.config.backup.xml
This file contains your cluster configuration data.
svc.config.backup.sh
This file contains the names of the commands that were issued to create the backup of the cluster.
svc.config.backup.log
This file contains details about the backup, including any error information that might have been reported.
Save the current backup to a secure and safe location. The files can be downloaded using scp on UNIX systems or pscp on Microsoft Windows, as shown in Example 13-3. Replace the IP address with the cluster IP address of your SVC and specify a local folder on your workstation. In this example, we are saving to C:\SVCbackups.
Example 13-3 Saving config backup files to your workstation
C:>"Program Files (x86)PuTTYpscp" -unsafe [email protected]:/dumps/svc.config.backup.* SVCbackups.
Using keyboard-interactive authentication.
Password:
svc.config.backup.bak_KD8 | 173 kB | 173.8 kB/s | ETA: 00:00:00 | 100%
svc.config.backup.log_KD8 | 15 kB | 15.5 kB/s | ETA: 00:00:00 | 100%
svc.config.backup.sh_KD8P | 5 kB | 5.2 kB/s | ETA: 00:00:00 | 100%
svc.config.backup.xml_KD8 | 78 kB | 78.2 kB/s | ETA: 00:00:00 | 100%
 
C:\>dir SVCbackups
Volume in drive C has no label.
Volume Serial Number is 1825-978F
 
Directory of C:\SVCbackups
 
04/10/2017 10:21 AM <DIR> .
04/10/2017 10:21 AM <DIR> ..
04/10/2017 10:21 AM 177,999 svc.config.backup.bak_KD8P1BP
04/10/2017 10:21 AM 15,892 svc.config.backup.log_KD8P1BP
04/10/2017 10:21 AM 5,278 svc.config.backup.sh_KD8P1BP
04/10/2017 10:21 AM 80,091 svc.config.backup.xml_KD8P1BP
4 File(s) 279,260 bytes
2 Dir(s) 5,615,493,120 bytes free
 
C:\>
The -unsafe option allows the use of a wildcard to download all of the svc.config.backup files in a single command.
 
Tip: If you encounter Fatal: Received unexpected end-of-file from server when using the pscp command, consider upgrading your version of PuTTY.
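Because the CLI lends itself to automation, a scheduled job can generate and collect the backup daily. The following is a minimal sketch for a UNIX workstation; the host name and destination path are hypothetical, and SSH key authentication to the superuser account is assumed:
#!/bin/sh
# Hypothetical daily SVC configuration backup job (run from cron).
SVC=superuser@svc-cluster.example.com
DEST=/backup/svc/$(date +%Y%m%d)

mkdir -p "$DEST"
# Generate a fresh configuration backup on the configuration node.
ssh "$SVC" svcconfig backup
# Download the generated files from the /dumps directory.
scp "$SVC:/dumps/svc.config.backup.*" "$DEST/"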
13.3.2 Saving the backup using the GUI
Although it is not possible to generate an ad hoc backup from the GUI, you can save the backup files using the GUI. To do so, complete the following steps:
1. Navigate to Settings → Support → Support Package.
2. Click the Manual Upload Instructions twistie to expand it.
3. Click Download Support Package, as shown in Figure 13-14.
Figure 13-14 Download Support Package
The Download New Support Package or Log File window opens, as shown in Figure 13-15.
Figure 13-15 Download Existing Package
4. Click Download Existing Package to launch a list of files found on the config node. We filtered the view by clicking in the Filter box, entering backup, and pressing Enter, as shown in Figure 13-16.
Figure 13-16 Filtering specific files for download
5. Select all of the files to include in the compressed file, then click Download. Depending on your browser preferences, you might be prompted where to save the file or it will download to your defined download directory.
13.4 Software update
This section describes the operations to update your IBM Spectrum Virtualize software to release version 8.1.
The format for the software update package name ends in four positive integers that are separated by dots. For example, a software update package might have the following name:
IBM_2145_INSTALL_8.1.0.0
13.4.1 Precautions before the update
This section describes the precautions that you should take before you attempt an update.
 
Important: Before you attempt any IBM Spectrum Virtualize code update, read and understand the concurrent compatibility and code cross-reference matrix. For more information, see the following website and click Latest IBM Spectrum Virtualize code:
During the update, each node in your clustered system is automatically shut down and restarted by the update process. Because each node in an I/O Group provides an alternative path to volumes, use the Subsystem Device Driver (SDD) to make sure that all I/O paths between all hosts and storage area networks (SANs) work.
If you do not perform this check, certain hosts might lose connectivity to their volumes and experience I/O errors.
13.4.2 IBM Spectrum Virtualize update test utility
The software update test utility checks for known issues that can cause problems during an IBM Spectrum Virtualize software update. More information about the utility is available at the following website:
Download the software update utility from this page where you can also download the firmware. This procedure ensures that you get the current version of this utility. You can use the svcupgradetest utility to check for known issues that might cause problems during a software update.
The software update test utility can be downloaded in advance of the update process. Alternatively, it can be downloaded and run directly during the software update, as guided by the update wizard.
You can run the utility multiple times on the same system to perform a readiness check in preparation for a software update. Run this utility for a final time immediately before you apply the IBM Spectrum Virtualize update to ensure that there were no new releases of the utility since it was originally downloaded.
The installation and use of this utility is nondisruptive, and does not require restart of any nodes. Therefore, there is no interruption to host I/O. The utility is only installed on the current configuration node.
System administrators must continue to check whether the version of code that they plan to install is the latest version. You can obtain the current information from the following website:
This utility is intended to supplement rather than duplicate the existing tests that are performed by the IBM Spectrum Virtualize update procedure (for example, checking for unfixed errors in the error log).
Concurrent software update of all components is supported through the standard Ethernet management interfaces. However, during the update process, most of the configuration tasks are restricted.
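When run from the CLI, the test utility is installed like a software package and then run against the intended target level. The following is a sketch with an example file name and version; the actual package name varies by utility release:
IBM_2145:SVC_ESC:superuser>svctask applysoftware -file IBM2145_INSTALL_upgradetest_<version>
IBM_2145:SVC_ESC:superuser>svcupgradetest -v 8.1.0.0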
13.4.3 Update procedure to IBM Spectrum Virtualize V8.1
To update the IBM Spectrum Virtualize software, complete the following steps:
1. Open a supported web browser and navigate to your cluster IP address. A login window opens (Figure 13-17).
Figure 13-17 IBM SAN Volume Controller GUI login window
2. Log in with superuser rights. The IBM SVC management home window opens. Move the cursor over Settings and click System (Figure 13-18).
Figure 13-18 Settings menu
3. In the System menu, click Update System. The Update System window opens (Figure 13-19).
Figure 13-19 Update System window
4. From this window, you can select either to run the update test utility and continue with the code update or just run the test utility. For this example, we click Update and Test.
 
My Notifications: Use the My Notifications tool to receive notifications of new and updated support information to better maintain your system environment. This feature is especially useful in an environment where a direct Internet connection is not possible.
Go to the following address (an IBM account is required) and add your system to the notifications list to be advised of support information, and to download the current code to your workstation for later upload:
5. Because you have previously downloaded both files from https://ibm.biz/BdjviZ, you can click each yellow folder, browse to the location where you saved the files, and upload them to the SVC cluster. If the files are correct, the GUI detects and updates the target code level as shown in Figure 13-20.
Figure 13-20 Upload option for both Test utility and Update package
6. Select the type of update you want to perform, as shown in Figure 13-21. Select Automatic update unless IBM Support has suggested a Service Assistant Manual update. The manual update might be preferable in cases where misbehaving host multipathing is known to cause loss of access. Click Finish to begin the update package upload process.
Figure 13-21 Software update type selection
7. When updating from a V8.1 or later level, an additional window is displayed at this point, allowing you to choose a fully automated update, one that pauses when half of the nodes have completed the update, or one that pauses after each node update, as shown in Figure 13-22. The pause options require the Resume button to be clicked to continue the update after each pause. Click Finish.
Figure 13-22 New V8.1 update pause options
8. After the update packages have uploaded, the update test utility looks for any known issues that might affect a concurrent update of your system. The GUI helps identify any detected issues, as shown in Figure 13-23.
Figure 13-23 Issue detected
9. Click Go to Update System to return to the Update System window. Here, click Read more (Figure 13-24).
Figure 13-24 Issues detected by the update test utility
The results pane opens and shows you what issues were detected (Figure 13-25). In our case, the warning is that email notification (call home) is not enabled. Although this is not a recommended condition, it does not prevent the system update from running. Therefore, we can click Close and proceed with the update. However, you might need to contact IBM Support to assist with resolving more serious issues before continuing.
Figure 13-25 Description of the warning from the test utility
10. Click Resume on the Update System window and the update proceeds as shown in Figure 13-26.
Figure 13-26 Resuming the system update
11. Because the utility detected issues, another warning appears to ensure that you have investigated them and are certain that you want to proceed, as shown in Figure 13-27. When you are ready to proceed, click Yes.
Figure 13-27 Warning before you can continue
12. The system begins updating the IBM Spectrum Virtualize software by taking one node offline and installing the new code. This process takes approximately 20 minutes. After the node returns from the update, it is listed as complete as shown in Figure 13-28.
Figure 13-28 Update process starts
13. After a 30-minute pause, to ensure that multipathing has recovered on all attached hosts, a node failover occurs and you temporarily lose connection to the GUI. A warning window displays, prompting you to refresh the current session, as shown in Figure 13-29.
 
Tip: If you are updating from V7.8 or later code, the 30-minute wait period can be adjusted by using the applysoftware CLI command with the -delay (mins) parameter to begin the update instead of using the GUI.
Figure 13-29 Node failover
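As a sketch of the CLI alternative mentioned in the tip above, the following command starts the update with a 10-minute pause between node updates; the package name is an example:
IBM_2145:SVC_ESC:superuser>svctask applysoftware -file IBM_2145_INSTALL_8.1.0.0 -delay 10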
You now see the new V8.1 GUI and the status of the second node updating in Figure 13-30.
Figure 13-30 New GUI after node failover
After the second node completes, the update is committed to the system, as shown in Figure 13-31.
Figure 13-31 Updating system level
14. The update process completes when all nodes and the system are committed. The final status indicates the new level of code that is installed in the system.
 
Note: If your nodes have more than 64 GB of memory before updating to V8.1, each node posts an 841 error after the update completes. Because V8.1 allocates memory differently, the memory must be accepted by running the fix procedure for the event or by issuing the CLI command svctask chnodehw <id> for each node. See the SAN Volume Controller IBM Knowledge Center for more information:
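For example, to accept the new memory configuration on each node in turn (the node IDs here are illustrative; list them with lsnode):
IBM_2145:SVC_ESC:superuser>svctask chnodehw 1
IBM_2145:SVC_ESC:superuser>svctask chnodehw 2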
13.4.4 Updating Spectrum Virtualize with a Hot Spare Node
Spectrum Virtualize V8.1 introduces a new optional feature called Hot Spare Node. This feature minimizes any performance impact and removes any redundancy risk during Spectrum Virtualize software updates. It does so by automatically swapping in a hot spare node after one minute to temporarily replace the node that is currently updating. After the original node has updated, it rejoins the cluster, and the hot spare node becomes spare again, ready for the next node to update.
To use this feature, the spare node must be either a DH8 or SV1 node type and have the same amount of memory and matching FC port configuration as the other nodes in the cluster. Up to four hot spare nodes can be added to a cluster and must be zoned as part of the SVC Cluster. Figure 13-32 shows how the GUI displays the hot spare node online while the original cluster node is offline for update.
Figure 13-32 Online Hot Spare Node GUI view
13.4.5 Updating IBM SAN Volume Controller internal drives code
After completing the Spectrum Virtualize software update detailed in 13.4, “Software update” on page 705, update the firmware of any SAN Volume Controller drive expansions that you have. The upgrade test utility has already identified that there are downlevel drives in the system, as shown in Figure 13-33. However, this fact does not prevent the system software update from proceeding.
Figure 13-33 Upgrade test utility drive firmware warning
To update the IBM SAN Volume Controller internal drives code, complete these steps:
1. Download the latest Drive firmware package for IBM SAN Volume Controller from this website:
2. On the Spectrum Virtualize GUI, navigate to Pools → Internal Storage and select All Internal.
3. Click Actions and select Upgrade all, as shown in Figure 13-34.
Figure 13-34 Upgrade all internal drives
 
Tip: The Upgrade all drives action is only displayed if you have not selected any individual drive in the list. If you have clicked an individual drive, the menu gives you individual drive actions, and selecting Upgrade upgrades only that drive's firmware. You can clear an individual drive selection by holding the Ctrl key and clicking the drive again.
4. The Upgrade all drives window opens, as shown in Figure 13-35. Click the small folder at the right end of the Upgrade package entry box and navigate to where you saved the file downloaded in step 1. Click Upgrade to upload the firmware package and begin upgrading any drives that are downlevel. Do not select the option to install firmware even if the drive is running a newer level; do this only under guidance from IBM Support.
Figure 13-35 Select the Drive Upgrade package
 
Note: The system upgrades member drives one at a time. Although the firmware upgrade is concurrent, it causes a brief reset of the drive. However, RAID technology allows the SAN Volume Controller to ride through this brief interruption. After a drive completes its update, there is a calculated wait time before the next drive update begins. This delay ensures that the previous drive is stable after upgrading, and it can vary depending on system load.
5. With the drive upgrades running, you can view the progress by clicking the Tasks icon and clicking View for the Drive Upgrade running task, as shown in Figure 13-36.
Figure 13-36 Selecting Drive upgrade running Task view
The Drive upgrade running task panel is displayed, listing drives pending upgrade and an estimated time of completion, as shown in Figure 13-37.
Figure 13-37 Drive upgrade progress
6. You can also view each drive's firmware level from the Pools → Internal Storage → All Internal window by enabling the drive firmware option after right-clicking the column header line, as shown in Figure 13-38.
Figure 13-38 Viewing Drive firmware levels
With the Firmware level column enabled, you can see the current level of each drive, as shown in Figure 13-39.
Figure 13-39 Drive firmware display
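Drive firmware can also be applied from the CLI. The following is a sketch, assuming the applydrivesoftware command at this code level; the package name and drive ID are hypothetical:
IBM_2145:SVC_ESC:superuser>svctask applydrivesoftware -file IBM2145_DRIVE_<package> -type firmware -drive 3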
13.4.6 Updating the IBM SAN Volume Controller system manually
This example assumes that you have an 8-node IBM SAN Volume Controller cluster, as illustrated in Table 13-6.
Table 13-6 The iogrp
iogrp (0)             iogrp (1)   iogrp (2)   iogrp (3)
node 1 (config node)  node 3      node 5      node 7
node 2                node 4      node 6      node 8
After uploading the update test utility and software update package to the cluster using PSCP, and running the test utility, complete the following steps:
1. Start by removing node 2, which is the partner node of the configuration node in iogrp 0, using either the cluster GUI or CLI.
2. Log in to the service GUI to verify that the removed node is in candidate status.
3. Select the candidate node and click Update Manually from the left pane.
4. Browse and locate the code that you already downloaded and saved to your PC.
5. Upload the code and click Update.
When the update is completed, a message caption indicating software update completion is displayed. The node then reboots and appears again in the service GUI in candidate status after approximately 20 - 25 minutes.
6. Select the node and verify that it is updated to the new code.
7. Add the node back by using either the cluster GUI or the CLI.
8. Select node 3 from iogrp1.
9. Repeat steps 1 - 7 to remove node 3, update it manually, verify the code, and add it back to the cluster.
10. Proceed to node 5 in iogrp 2.
11. Repeat steps 1 - 7 to remove node 5, update it manually, verify the code, and add it back to the cluster.
12. Move on to node 7 in iogrp 3.
13. Repeat steps 1 - 7 to remove node 7, update it manually, verify the code, and add it back to the cluster.
 
Note: At this point, the update is 50% completed. You now have one node from each iogrp updated manually with the new code. Always leave the configuration node for last during a manual Spectrum Virtualize software update.
14. Next, select node 4 from iogrp 1.
15. Repeat steps 1 - 7 to remove node 4, update it manually, verify the code, and add it back to the cluster.
16. Again, select node 6 from iogrp 2.
17. Repeat steps 1 - 7 to remove node 6, update it manually, verify the code, and add it back to the cluster.
18. Next, select node 8 in iogrp 3.
19. Repeat steps 1 - 7 to remove node 8, update it manually, verify the code, and add it back to the cluster.
20. Lastly, select and remove node 1, which is the configuration node in iogrp 0.
 
Note: A partner node becomes the configuration node when the original config node is removed from the cluster, keeping the cluster manageable.
The removed configuration node becomes a candidate, and you do not have to apply the code update manually. Simply add the node back to the cluster. It automatically updates itself and then adds itself back to the cluster with the new code.
21. After all the nodes are updated, you must confirm the update to complete the process. The confirmation restarts each node in order, which takes about 30 minutes to complete.
The update is complete.
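For reference, the manual node update performed in the service GUI can also be driven from the service CLI. The following is a sketch, assuming the satask installsoftware command and that the package was already uploaded to the node being updated; the file name is an example:
satask installsoftware -file IBM_2145_INSTALL_8.1.0.0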
13.5 Health Checker feature
The IBM Spectrum Control health checker feature runs in the IBM Cloud. Based on the weekly call home inventory reporting, it proactively creates recommendations. These recommendations are provided on the IBM Call Home Web, which is found at ibm.com. Click Support → My support → Call Home Web (Figure 13-40).
Figure 13-40 Call Home Web on ibm.com
For a video guide on how to set up and use IBM Call Home Web, see:
Another feature is the Critical Fix Notification function, which enables IBM to warn Spectrum Virtualize users that a critical issue exists in the level of code that they are using. The system notifies users when they log on to the GUI using a web browser connected to the internet.
Consider the following information about this function:
It warns users only about critical fixes, and does not warn them that they are running a previous version of the software.
It works only if the browser also has access to the internet. The IBM Storwize V7000 and IBM SAN Volume Controller systems themselves do not need to be connected to the internet.
The function cannot be disabled. Each time that it displays a warning, the warning must be acknowledged (with the option to not warn the user again for that issue).
The decision about what constitutes a critical fix is subjective and requires judgment, which is exercised by the development team. As a result, clients might still encounter bugs in code that were not deemed critical. They should continue to review information about new code levels to determine whether they should update, even without a critical fix notification.
 
Important: Inventory notification must be enabled and operational for these features to work. We strongly recommend enabling Call Home and inventory reporting on your Spectrum Virtualize clusters.
13.6 Troubleshooting and fix procedures
The management GUI of IBM Spectrum Virtualize is a browser-based GUI for configuring and managing all aspects of your system. It provides extensive facilities to help troubleshoot and correct problems. This section explains how to effectively use its features to avoid service disruption of your IBM SAN Volume Controller.
Figure 13-41 shows the Monitoring menu for System information, viewing Events, or seeing real-time Performance statistics.
Figure 13-41 Monitoring options
Use the management GUI to manage and service your system. Select Monitoring → Events to list events that should be addressed and maintenance procedures that walk you through the process of correcting problems. Information in the Events window can be filtered in three ways:
Recommended Actions
Shows only the alerts that require attention. Alerts are listed in priority order and should be resolved sequentially by using the available fix procedures. For each problem that is selected, you can do these tasks:
 – Run a fix procedure
 – View the properties
Unfixed Messages and Alerts
Displays only the alerts and messages that are not fixed. For each entry that is selected, you can perform these tasks:
 – Run a fix procedure
 – Mark an event as fixed
 – Filter the entries to show them by specific minutes, hours, or dates
 – Reset the date filter
 – View the properties
Show All
Displays all event types, whether they are fixed or unfixed. For each entry that is selected, you can perform these tasks:
 – Run a fix procedure
 – Mark an event as fixed
 – Filter the entries to show them by specific minutes, hours, or dates
 – Reset the date filter
 – View the properties
Some events require a certain number of occurrences in 25 hours before they are displayed as unfixed. If they do not reach this threshold in 25 hours, they are flagged as expired. Monitoring events are below the coalesce threshold, and are usually transient.
 
Important: The management GUI is the primary tool that is used to operate and service your system. Real-time monitoring should be established by using SNMP traps, email notifications, or syslog messaging in an automatic manner.
13.6.1 Managing event log
Regularly check the status of the system using the management GUI. If you suspect a problem, first use the management GUI to diagnose and resolve the problem.
Use the views that are available in the management GUI to verify the status of the system, the hardware devices, the physical storage, and the available volumes by completing these steps:
1. Click Monitoring → Events to see all problems that exist on the system (Figure 13-42).
Figure 13-42 Messages in the event log
2. Select Show All → Recommended Actions to display the most important events to be resolved (Figure 13-43). The Recommended Actions tab shows the highest priority maintenance procedure that must be run. Use the troubleshooting wizard so that IBM SAN Volume Controller can determine the proper order of maintenance procedures.
Figure 13-43 Recommended Actions
In this example, a “number of device logins reduced” error is listed (service error code 1630). Review the physical FC cabling to determine the issue, and then click Run Fix. At any time and from any GUI window, you can go directly to this menu by using the Alerts icon at the top of the GUI (Figure 13-44).
Figure 13-44 Status alerts
13.6.2 Running a fix procedure
If there is an error code for the alert, you should run the fix procedure to assist you in resolving the problem. These fix procedures analyze the system and provide more information about the problem. They suggest actions to take and walk you through the actions that automatically manage the system where necessary, while ensuring availability. Finally, they verify that the problem is resolved.
If an error is reported, always use the fix procedures from the management GUI to resolve the problem. Always use the fix procedures for both software configuration problems and hardware failures. The fix procedures analyze the system to ensure that the required changes will not cause volumes to become inaccessible to the hosts. The fix procedures automatically perform configuration changes that are required to return the system to its optimum state.
The fix procedure displays information that is relevant to the problem, and provides various options to correct the problem. Where possible, the fix procedure runs the commands that are required to reconfigure the system.
 
Note: After V7.4, you are no longer required to run the fix procedure for a failed internal enclosure drive. Hot plugging of a replacement drive will automatically trigger the validation processes.
The fix procedure also checks that any other existing problem will not result in volume access being lost. For example, if a power supply unit in a node enclosure must be replaced, the fix procedure checks and warns you if the integrated battery in the other power supply unit is not sufficiently charged to protect the system.
 
Hint: Always use the Run Fix button, which resolves the most serious issues first. Often, other alerts will be corrected automatically because they were the result of a more serious issue.
The following example demonstrates how to clear the error that is related to malfunctioning FC connectivity:
1. From the dynamic menu (the icons on the left), click Monitoring → Events, and then focus on the errors with the highest priority first. List only the recommended actions by selecting the filters in the Actions menu (Figure 13-45). Click Run Fix.
Figure 13-45 Initiate Run Fix procedure from the management GUI
2. The pop-up window prompts you to indicate whether the issue was caused by a planned change or maintenance task, or whether it appeared in an unexpected manner (Figure 13-46).
Figure 13-46 Determination of planned action
3. If you answer Yes, the fix procedure finishes, assuming the changes in the system were done on purpose and no other action is necessary. However, our example simulates a failed SFP in the SAN switch and we continue the fix procedure. Select No and click Next.
4. In the next window (Figure 13-47), the Spectrum Virtualize GUI lists suggested actions and which components must be checked to fix and resolve the error. When you are sure that all possible technical requirements are met (in our case, we replaced a failed SFP in the SAN switch), click Next.
Figure 13-47 Verification steps to eliminate single point of failure
The discovery of managed disks starts (Figure 13-48).
Figure 13-48 Starting the discovery of managed disks
If no other important issue exists, discovery should finish within 2 minutes, depending on the number of enclosures and installed disk drives (Figure 13-49).
Figure 13-49 Discovery complete
5. An event has been marked as fixed, and you can safely finish the fix procedure. Click Close and the event is removed from the list of events (Figure 13-50).
Figure 13-50 Correctly finished fix procedure
13.6.3 Resolve alerts in a timely manner
To minimize any impact to your host systems, always perform the recommended actions as quickly as possible after a problem is reported. Your system is designed to be resilient to most single hardware failures. However, if it operates for any period with a hardware failure, the possibility increases that a second hardware failure can result in volume data that is unavailable. If several unfixed alerts exist, fixing any one alert might become more difficult because of the effects of the others.
13.6.4 Event log details
Multiple views of the events and recommended actions are available. The GUI works like a typical Microsoft Windows pop-up menu, so the event log grid is manipulated through the row that contains the column headings (Figure 13-51). When you click the column icon at the right end of the table heading, a menu for the column choices opens.
Figure 13-51 Grid options of the event log
Select or remove columns as needed. You can then also extend or shrink the width of the column to fit your screen resolution and size. This is the way to manipulate it for most grids in the management GUI of IBM Spectrum Virtualize, not just the events pane.
Every field of the event log is available as a column in the event log grid. Several fields are useful when you work with IBM Support, notably the sequence number, the event count, and the fixed state. The preferred method in this case is to use the Show All filter, with events sorted by time stamp. Using Restore Default View sets the grid back to the defaults.
You might want to see more details about each critical event. Some details are not shown in the main grid. To access properties and sense data of a specific event, double-click the specific event anywhere in its row.
The properties window opens (Figure 13-52) with all the relevant sense data. This data includes the first and last time of an event occurrence, WWPN, and worldwide node name (WWNN), enabled or disabled automatic fix, and so on.
Figure 13-52 Event sense data and properties
For more information about troubleshooting options, see the IBM SAN Volume Controller Troubleshooting section in IBM Knowledge Center, which is available at:
13.7 Monitoring
An important step is to correct any issues that are reported by your IBM SAN Volume Controller as soon as possible. Configure your system to send automatic notifications when a new event is reported. To avoid having to monitor the management GUI for new events, select the type of event for which you want to be notified. For example, restrict notifications to just events that require action. Several event notification mechanisms exist:
Email An event notification can be sent to one or more email addresses. This mechanism notifies individuals of problems. Individuals can receive notifications wherever they have email access, including mobile devices.
SNMP An SNMP trap report can be sent to a data center management system, such as IBM Systems Director, that consolidates SNMP reports from multiple systems. With this mechanism, you can monitor your data center from a single workstation.
Syslog A syslog report can be sent to a data center management system that consolidates syslog reports from multiple systems. With this option, you can monitor your data center from a single location.
If your system is within warranty or if you have a hardware maintenance agreement, configure your IBM SAN Volume Controller cluster to send email events directly to IBM if an issue that requires hardware replacement is detected. This mechanism is known as Call Home. When this event is received, IBM automatically opens a problem ticket and, if appropriate, contacts you to help resolve the reported problem.
 
Important: If you set up Call Home to IBM, ensure that the contact details that you configure are correct and kept up to date. Personnel changes can cause delays in IBM making contact.
13.7.1 Email notifications and the Call Home function
The Call Home function of IBM Spectrum Virtualize uses email notification to send events to the specific IBM Support center. Therefore, the configuration is similar to sending emails to a specific person or system owner. The following procedure summarizes how to configure email notifications and emphasizes what is specific to Call Home:
1. Prepare the contact information that you want to use for the email notification and verify the accuracy of the data. From the GUI menu, click Settings → Notifications (Figure 13-53).
Figure 13-53 Notifications menu
2. Select Email and then click Enable Notifications (Figure 13-54).
For the correct functionality of email notifications, ask your network administrator whether Simple Mail Transfer Protocol (SMTP) is enabled on the management network and is not blocked by firewalls. Also, ensure that the foreign destination “@de.ibm.com” is not blacklisted.
Be sure to test accessibility to the SMTP server using the telnet command (port 25 for a non-secured connection, port 465 for Secure Sockets Layer (SSL)-encrypted communication) from any server in the same network segment.
Figure 13-54 Configuration of email notifications
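For example, from a host in the same network segment (the SMTP server name is hypothetical):
telnet smtp.example.com 25
A greeting banner beginning with 220 indicates that the SMTP port is reachable.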
3. After clicking Next on the welcome window, provide the information about the location of the system (Figure 13-55) and the contact information of the IBM SAN Volume Controller administrator (Figure 13-56 on page 731) so that IBM Support can make contact. Always keep this information current.
Figure 13-55 Location of the device
Figure 13-56 shows the contact information of the owner.
Figure 13-56 Contact information
4. Configure the IP address of your company SMTP server, as shown in Figure 13-57. When the correct SMTP server is provided, you can test the connectivity using Ping to its IP address. You can configure additional SMTP servers by clicking the + at the end of the entry line. When you are done, click Apply and Next.
Figure 13-57 Configure email servers and inventory reporting
5. A summary window is displayed. Verify all of the information, then click Finish. You are then returned to the Email Settings window where you can verify email addresses of IBM Support ([email protected]) and optionally add local users who also need to receive notifications. See Figure 13-58 for details.
The default support email address [email protected] is predefined by the system to receive Error Events and Inventory. Do not change these settings.
You can modify or add local users using Edit mode after the initial configuration is saved.
The Inventory Reporting function is enabled by default for Call Home. Rather than reporting a problem, an email is sent to IBM that describes your system hardware and critical configuration information. Object names and other information, such as IP addresses, are not included. By default the inventory email is sent weekly, allowing an IBM Cloud service to analyze and inform you if the hardware or software that you are using requires an update because of any known issue as detailed in 13.5, “Health Checker feature” on page 719.
Figure 13-58 shows the configured email notification and Call Home settings.
Figure 13-58 Setting email recipients and alert types
6. After completing the configuration wizard, test the email function. To do so, enter Edit mode, as illustrated in Figure 13-59. In the same window, you can define additional email recipients or alter any contact and location details as needed.
Figure 13-59 Entering edit mode
We strongly suggest that you keep the option to send inventory to IBM Support enabled. However, inventory might not be of interest to local users, although its content can serve as a basis for inventory and asset management.
7. In Edit mode, you can change any of the previously configured settings. After you are finished editing these parameters, adding more recipients, or just testing the connection, save the configuration to make the changes take effect (Figure 13-60).
Figure 13-60 Saving modified configuration
 
Note: The Test button will appear for new email users after first saving and then editing again.
13.7.2 Disabling and enabling notifications
At any time, you can temporarily or permanently disable email notifications, as shown in Figure 13-61. This is good practice when performing activities in your environment that might generate errors on your IBM Spectrum Virtualize system, such as SAN reconfiguration or replacement activities. After the planned activities, remember to re-enable the email notification function. The same results can be achieved with the CLI svctask stopmail and svctask startmail commands.
Figure 13-61 Disabling or enabling email notifications
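For example, wrapped around a planned maintenance window (the prompt is illustrative):
IBM_2145:SVC_ESC:superuser>svctask stopmail
(perform the disruptive SAN maintenance)
IBM_2145:SVC_ESC:superuser>svctask startmail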
13.7.3 Remote Support Assistance
Remote Support Assistance, introduced with V8.1, allows IBM Support to remotely connect to the SAN Volume Controller through a secure tunnel to perform analysis, log collection, and software updates. The tunnel can be opened ad hoc by the client, or a permanent connection can be enabled if wanted.
 
Note: Clients who have purchased Enterprise Class Support (ECS) are entitled to IBM support using Remote Support Assistance to quickly connect and diagnose problems. However, IBM Support might choose to use this feature on non-ECS systems at their discretion. Therefore, configure and test the connection on all systems.
If you are enabling Remote Support Assistance, then ensure that the following prerequisites are met:
1. Ensure that call home is configured with a valid email server.
2. Ensure that a valid service IP address is configured on each node of the Spectrum Virtualize system.
3. If your SAN Volume Controller is behind a firewall or if you want to route traffic from multiple storage systems to the same place, you must configure a Remote Support Proxy server. Before you configure remote support assistance, the proxy server must be installed and configured separately. During the setup for support assistance, specify the IP address and the port number for the proxy server on the Remote Support Centers window.
4. If you do not have firewall restrictions and the SAN Volume Controller nodes are directly connected to the internet, request your network administrator to allow connections to 129.33.206.139 and 204.146.30.139 on Port 22.
5. Both uploading support packages and downloading software require direct connections to the internet. A DNS server must be defined on your SAN Volume Controller for both of these functions to work.
6. To ensure that support packages are uploaded correctly, configure the firewall to allow connections to the following IP addresses on port 443: 129.42.56.189, 129.42.54.189, and 129.42.60.189.
7. To ensure that software is downloaded correctly, configure the firewall to allow connections to the following IP addresses on port 22: 170.225.15.105, 170.225.15.104, 170.225.15.107, 129.35.224.105, 129.35.224.104, and 129.35.224.107.
Figure 13-62 shows a pop-up window that appears in the GUI after updating to V8.1. It prompts you to configure your SAN Volume Controller for Remote Support. You can select not to enable it, to open a tunnel when needed, or to open a permanent tunnel to IBM.
Figure 13-62 Prompt to configure Remote Support Assistance
You can choose to configure the SAN Volume Controller, learn more about the feature, or just close the window by clicking the X. Figure 13-63 shows where to find the Setup Remote Support Assistance option if you have closed the window.
Figure 13-63 Remote Support Assistance menu
Choosing to set up support assistance opens a wizard to guide you through the configuration:
1. Figure 13-64 shows the first wizard window. Either leave remote assistance disabled by selecting I want support personnel to work on-site only, or enable it by selecting I want support personnel to access my system both on-site and remotely. Click Next.
 
Note: Selecting I want support personnel to work on-site only does not entitle you to expect IBM Support to attend on site for all issues. Most maintenance contracts are for customer-replaceable unit (CRU) support, where IBM diagnoses your problem and sends a replacement component for you to replace, if required. If you prefer to have IBM perform replacement tasks for you, contact your local sales representative to investigate an upgrade to your current maintenance contract.
Figure 13-64 Remote Support wizard enable or disable
2. The next window, shown in Figure 13-65, lists the IP addresses of the IBM Support centers and the SSH port that must be open in your firewall. You can also define a Remote Support Assistance Proxy if you have multiple Storwize V7000 or SAN Volume Controller systems in the data center, so that firewall configuration is required only for the proxy server rather than for every storage system. Because we do not have a proxy server, we leave the field blank and click Next.
Figure 13-65 Remote Support wizard proxy setup
3. The next window asks whether you want to open a tunnel to IBM permanently, allowing IBM to connect to your SAN Volume Controller At Any Time, or On Permission Only, as shown in Figure 13-66. On Permission Only requires a storage administrator to log on to the GUI and enable the tunnel when required. Click Finish.
Figure 13-66 Remote Support wizard access choice
4. After you complete the remote support setup, you can view the status of any remote connection, start a new session, test the connection to IBM, and reconfigure the setup. In Figure 13-67, we successfully tested the connection. Click Start New Session to open a tunnel through which IBM Support can connect.
Figure 13-67 Remote Support Status and session management
5. A pop-up window asks you to set a timeout value, which determines how long the tunnel remains open with no activity. As shown in Figure 13-68, the connection is established and waits for IBM Support to connect.
Figure 13-68 Remote Assistance tunnel connected
13.7.4 SNMP Configuration
SNMP is a standard protocol for managing networks and exchanging messages. The system can send SNMP messages that notify personnel about an event. You can use an SNMP manager to view the SNMP messages that are sent by the SVC.
You can configure an SNMP server to receive various informational, error, or warning notifications by entering the following information (Figure 13-69 on page 740):
IP Address
The address for the SNMP server.
Server Port
The remote port number for the SNMP server. The remote port number must be a value in the range 1 - 65535. The default port for SNMP is 162.
Community
The SNMP community is the name of the group to which devices and management stations that run SNMP belong. Typically, the default of public is used.
Event Notifications:
Consider the following points about event notifications:
 – Select Error if you want the user to receive messages about problems, such as hardware failures, that require prompt action.
 
Important: Browse to Recommended Actions to run the fix procedures on these notifications.
 – Select Warning if you want the user to receive messages about problems and unexpected conditions. Investigate the cause immediately to determine whether any corrective action is necessary, for example, when a space-efficient volume is running out of space.
 
Important: Browse to Recommended Actions to run the fix procedures on these notifications.
 – Select Info if you want the user to receive messages about expected events. No action is required for these events.
Figure 13-69 SNMP configuration
To add an SNMP server, click Actions → Add and fill out the Add SNMP Server window, as shown in Figure 13-70. To remove an SNMP server, click the line with the server you want to remove, and select Actions → Remove.
Figure 13-70 Add SNMP Server
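The same SNMP server definition can also be created from the CLI. The following sketch uses an illustrative IP address with the default community of public; verify the parameter names against the command reference for your code level:
svctask mksnmpserver -ip 9.10.11.12 -community public -error on -warning on -info off
svcinfo lssnmpserver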
13.7.5 Syslog notifications
The syslog protocol is a standard protocol for forwarding log messages from a sender to a receiver on an IP network. The IP network can be IPv4 or IPv6. The system can send syslog messages that notify personnel about an event.
You can configure a syslog server to receive log messages from various systems and store them in a central repository by entering the following information (Figure 13-71):
IP Address
The IP address for the syslog server.
Facility
The facility determines the format for the syslog messages. The facility can be used to determine the source of the message.
Message Format
The message format depends on the facility. The system can transmit syslog messages in the following formats:
 – The concise message format provides standard detail about the event.
 – The expanded format provides more details about the event.
Event Notifications
Consider the following points about event notifications:
 – Select Error if you want the user to receive messages about problems, such as hardware failures, that must be resolved immediately.
 
Important: Browse to Recommended Actions to run the fix procedures on these notifications.
 – Select Warning if you want the user to receive messages about problems and unexpected conditions. Investigate the cause immediately to determine whether any corrective action is necessary.
 
Important: Browse to Recommended Actions to run the fix procedures on these notifications.
 – Select Info if you want the user to receive messages about expected events. No action is required for these events.
Figure 13-71 Syslog configuration
To remove a syslog server, click the Minus sign (-).
To add another syslog server, click the Plus sign (+).
The syslog messages are sent in concise message format or expanded message format, depending on the facility level that is chosen.
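A syslog server can likewise be defined from the CLI. The following sketch uses an illustrative IP address; the facility value determines the message format, as described previously (verify the syntax against the command reference for your code level):
svctask mksyslogserver -ip 9.10.11.12 -facility 0 -error on -warning on -info on
svcinfo lssyslogserver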
Example 13-4 shows a compact format syslog message.
Example 13-4 Compact syslog message example
IBM2145 #NotificationType=Error #ErrorID=077001 #ErrorCode=1070 #Description=Node
CPU fan failed #ClusterName=SVCCluster1 #Timestamp=Wed Jul 02 08:35:00 2017 BST
#ObjectType=Node #ObjectName=Node1 #CopyID=0 #ErrorSequenceNumber=100
Example 13-5 shows an expanded format syslog message.
Example 13-5 Full format syslog message example
IBM2145 #NotificationType=Error #ErrorID=077001 #ErrorCode=1070 #Description=Node
CPU fan failed #ClusterName=SVCCluster1 #Timestamp=Wed Jul 02 08:35:00 2017 BST
#ObjectType=Node #ObjectName=Node1 #CopyID=0 #ErrorSequenceNumber=100 #ObjectID=2
#NodeID=2 #MachineType=2145SV1 #SerialNumber=1234567 #SoftwareVersion=8.1.0.0
(build 13.4.1709291021000) #FRU=fan 01EJ378, system board 01EJ381 #AdditionalData(0->63)=0000000021000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 #AdditionalData(64-127)=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
13.8 Audit log
The audit log is useful when analyzing past configuration events, especially when trying to determine, for example, how a volume ended up being shared by two hosts, or why the volume was overwritten. The audit log is also included in the svc_snap support data to aid in problem determination.
The audit log tracks action commands that are issued through a Secure Shell (SSH) session, through the management GUI, or through Remote Support Assistance. It provides the following entries:
Identity of the user who issued the action command
Name of the actionable command
Time stamp of when the actionable command was issued on the configuration node
Parameters that were issued with the actionable command
The following items are not documented in the audit log:
Commands that fail are not logged.
A result code of 0 (success) or 1 (success in progress) is not logged.
Result object ID of node type (for the addnode command) is not logged.
Views are not logged.
Several specific service commands are not included in the audit log:
dumpconfig
cpdumps
cleardumps
finderr
dumperrlog
dumpintervallog
svcservicetask dumperrlog
svcservicetask finderr
Figure 13-72 shows how to access the audit log. Click Audit Log in the left menu to see which configuration CLI commands have been run on the IBM SAN Volume Controller system.
Figure 13-72 Audit Log from Access menu
Figure 13-73 shows an example of the audit log after a FlashCopy volume was created, with one command highlighted. The Running Tasks button is available at the top of the window in the status pane. If you click that button, you can display the progress of the currently running tasks by clicking the associated View button.
Figure 13-73 Audit log
You can also change the view of the Audit Log grid by right-clicking the column headings (Figure 13-74). The grid layout and sorting are completely under the user’s control. Therefore, you can view everything in the audit log, sort different columns, and reset the default grid preferences.
Figure 13-74 Right-click audit log column headings
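If you prefer the CLI, the same information is available with the catauditlog command. For example, to list the five most recent audit log entries:
svcinfo catauditlog -first 5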
13.9 Collecting support information using the GUI and the CLI
Occasionally, if you have a problem and call the IBM Support Center, you will most likely be asked to provide a support package. You can collect and upload this package from the Settings → Support menu.
13.9.1 Collecting information using the GUI
To collect information using the GUI, complete the following steps:
1. Click Settings → Support and select the Support Package tab (Figure 13-75).
2. Click the Upload Support Package button.
Figure 13-75 Support Package option
Assume that the problem encountered was an unexpected node restart that logged a 2030 error. Collect the default logs plus the most recent statesave from each node to capture the data that is most relevant to support.
 
Note: When a node unexpectedly restarts, it first dumps its current statesave information before it restarts to recover from the error condition. This statesave is critical for support to analyze what happened. Collecting a snap type 4 creates new statesaves at the time of the collection, which are not useful for understanding the restart event.
3. The Upload Support Package window provides four options for data collection. If IBM Support contacted you because your system called home, or if you manually opened a call with IBM Support, you will have been given a PMR number. Enter that PMR number into the PMR field and select the snap type, often referred to as an option 1, 2, 3, or 4 snap, as requested by IBM Support (Figure 13-76). In our case, we enter our PMR number, select snap type 3 (option 3) because it automatically collects the statesave that was created when the node restarted, and click Upload.
 
Tip: You can use https://www.ibm.com/support/servicerequest to open a service request online.
Figure 13-76 Upload Support Package window
4. The procedure to create the snap on an IBM SAN Volume Controller system, including the latest statesave from each node, begins. This process might take a few minutes (Figure 13-77).
Figure 13-77 Task detail window
13.9.2 Collecting logs using the CLI
The CLI can be used to collect and upload a support package as requested by IBM Support by performing the following steps:
1. Log in to the CLI and issue the svc_snap command that matches the type of snap requested by IBM Support:
 – Standard logs (type 1):
svc_snap upload pmr=ppppp,bbb,ccc gui1
 – Standard logs plus one existing statesave (type 2):
svc_snap upload pmr=ppppp,bbb,ccc gui2
 – Standard logs plus most recent statesave from each node (type 3):
svc_snap upload pmr=ppppp,bbb,ccc gui3
 – Standard logs plus new statesaves (type 4):
svc_livedump -nodes all -yes
svc_snap upload pmr=ppppp,bbb,ccc gui3
2. We collect the type 3 (option 3) snap and have it automatically uploaded to the PMR number that IBM Support provided, as shown in Example 13-6.
Example 13-6 The svc_snap command
Password:
IBM_2145:ITSO_DH8_B:superuser>svc_snap upload pmr=04923,215,616 gui3
3. If you do not want the snap to be uploaded to IBM automatically, do not specify the upload pmr=ppppp,bbb,ccc part of the commands. When the snap creation completes, it creates a file with a name in the following format:
/dumps/snap.<panel_id>.YYMMDD.hhmmss.tgz
The snap file takes a few minutes to complete, and longer if statesaves are included.
4. The generated file can then be retrieved from the GUI by selecting Settings → Support, expanding the Manual Upload Instructions twisty, selecting Download Support Package, and then clicking Download Existing Package, as shown in Figure 13-78.
Figure 13-78 Download Existing Package
5. Click in the Filter box and enter snap to see a list of snap files, as shown in Figure 13-79. Locate the exact name of the snap generated by the svc_snap command issued earlier, select that file, and then click Download.
Figure 13-79 Filtering on snap to download
6. Save the file to a folder of your choice on your workstation.
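Alternatively, because the snap file is written to the /dumps directory on the configuration node, you can copy it to your workstation with any scp client. The following sketch uses the same placeholders as the file name format shown previously:
scp superuser@<cluster IP address>:/dumps/snap.<panel_id>.YYMMDD.hhmmss.tgz .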
13.9.3 Uploading files to the Support Center
If you chose not to have the system upload the support package automatically, it can still be uploaded for analysis by using the Enhanced Customer Data Repository (ECuRep). Any upload should be associated with a specific problem management report (PMR). The PMR is also known as a service request and is a mandatory requirement when uploading.
To upload information, use the following procedure:
1. Using a web browser, navigate to the ECuRep Secure Upload page (Figure 13-80).
Figure 13-80 ECuRep details
2. Complete the required fields:
 – PMR number (mandatory) as provided by IBM Support for your specific case. This number must be in the format ppppp,bbb,ccc (for example, 04923,215,616), using a comma (,) as a separator.
 – Upload is for (mandatory). Select Hardware from the menu.
 – Email address (optional). Enter your email address in this field to be notified automatically of a successful or unsuccessful upload.
3. When the form is completed, click Continue to open the input window (Figure 13-81).
Figure 13-81 ECuRep File upload
4. Select one or more files, click Upload to continue, and follow the directions.
13.10 Service Assistant Tool
The Service Assistant Tool (SAT) is a web-based GUI that is used to service individual nodes, primarily when a node has a fault and is in a service state. A node is not an active part of a clustered system while it is in service state.
Typically, an IBM Spectrum Virtualize cluster is initially configured with the following IP addresses:
One service IP address for each IBM SAN Volume Controller node.
One cluster management IP address, which is set when the cluster is created.
The SAT is available even when the management GUI is not accessible. You can view the following information and perform the following tasks with the Service Assistant Tool:
Status information about the connections and the IBM SAN Volume Controller nodes
Basic configuration information, such as configuring IP addresses
Service tasks, such as restarting the Common Information Model (CIM) object manager (CIMOM) and updating the WWNN
Details about node error codes
Details about the hardware, such as the IP address and Media Access Control (MAC) addresses
The SAT GUI is available by using a service assistant IP address that is configured on each SAN Volume Controller node. It can also be accessed through the cluster IP addresses by appending /service to the cluster management IP.
If the clustered system is down, the only method of communicating with the nodes is directly through the SAT IP address. Each node can have a single service IP address on Ethernet port 1, which should be configured on all nodes of the cluster, including any Hot Spare Nodes.
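As a hedged sketch, a service IP address can typically be set on a node with the satask chserviceip command, and the nodes can then be listed with sainfo lsservicenodes. The addresses are illustrative and <panel_name> identifies the target node; verify the syntax for your code level:
satask chserviceip -serviceip 10.18.228.101 -gw 10.18.228.1 -mask 255.255.255.0 <panel_name>
sainfo lsservicenodes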
To open the SAT GUI, enter one of the following URLs into any web browser:
http(s)://<cluster IP address of your cluster>/service
http(s)://<service IP address of a node>/service
Complete the following steps to access the SAT:
1. When you access the SAT by using <cluster IP address>/service, the SAT GUI login window of the configuration node opens. Enter the Superuser Password, as shown in Figure 13-82.
Figure 13-82 Service Assistant Tool Login GUI
2. After you are logged in, you see the Service Assistant Home window, as shown in Figure 13-83. The SAT can view the status of, and run service actions on, other nodes in addition to the node that the user is logged in to.
Figure 13-83 Service Assistant Tool GUI
3. The currently selected SAN Volume Controller node is displayed in the upper left corner of the GUI. In Figure 13-83, this is node ID 1. Select the node that you want to work on in the Change Node section of the window. The details in the upper left then change to reflect the selected node.
 
Note: The SAT GUI provides access to service procedures and shows the status of the nodes. Carry out these procedures only when you are directed to do so by IBM Support.
For more information about how to use the SAT, see the IBM SAN Volume Controller documentation in IBM Knowledge Center.