Chapter 12. RAS, monitoring, and troubleshooting


This chapter describes the reliability, availability, and serviceability (RAS) features and ways to monitor and troubleshoot the IBM Storwize V5000 Gen2.

Specifically, this chapter provides information about the following topics:

•Reliability, availability, and serviceability features

•System components

•Configuration backup

•System update

•Monitoring

•Audit log

•Event log

•Support Assistant

•Collecting support information

•Powering off the system and shutting down the infrastructure

12.1 Reliability, availability, and serviceability features

This section describes the reliability, availability, and serviceability (RAS) features of the IBM Storwize V5000 Gen2, as well as monitoring and troubleshooting. RAS features are important concepts in the design of the IBM Storwize V5000 Gen2. Hardware and software features, design considerations, and operational guidelines all contribute to make the IBM Storwize V5000 Gen2 reliable.

Fault tolerance and a high level of availability are achieved with the following features:

•The RAID capabilities of the underlying disk subsystems

•The software architecture that is used by the IBM Storwize V5000 Gen2 nodes

•Auto-restart of nodes that are stopped

•Battery units to provide cache memory protection in a site power failure

•Host system multipathing and failover support

High levels of serviceability are achieved with the following features:

•Cluster error logging

•Asynchronous error notification

•Dump capabilities to capture software-detected failures

•Concurrent diagnostic procedures

•Directed maintenance procedures

•Concurrent log analysis and memory dump data recovery tools

•Concurrent maintenance of all of the IBM Storwize V5000 Gen2 components

•Concurrent upgrade of IBM Storwize V5000 Gen2 software and microcode of drives

•Concurrent addition or deletion of a node canister in a cluster

•Software recovery through the Service Assistant Tool

•Automatic software version correction when a node is replaced

•Detailed status and error conditions that are displayed through the Service Assistant Tool

•Error and event notification through Simple Network Management Protocol (SNMP), syslog, and email

•Access to the Service Assistant Tool through the tech port for network connection problems

•Remote support personnel can access the system to complete troubleshooting and maintenance tasks

At the core of the IBM Storwize V5000 Gen2 is a redundant pair of node canisters. The two canisters share the load of transmitting and receiving data between the attached hosts and the disk arrays.

12.2 System components

This section describes each of the components that make up the IBM Storwize V5000 Gen2 system. The components are described in terms of location, function, and serviceability.

12.2.1 Enclosure midplane

The enclosure midplane connects the node or expansion canisters to the power supply units and to the drives. The midplane is part of the enclosure midplane assembly, which consists of the midplane and the front section of the enclosure.

During the basic system configuration, vital product data (VPD) is written to the enclosure midplane. On a control enclosure midplane, the VPD contains information, such as worldwide node name (WWNN) 1, WWNN 2, machine type and model, machine part number, and serial number. On an expansion enclosure midplane, the VPD contains information, such as machine type and model, machine part number, and serial number.

The enclosure midplane is initially generic and it is configured as a control enclosure midplane or expansion enclosure midplane only when the VPD is written. After the VPD is written, a control enclosure midplane is no longer interchangeable with an expansion enclosure midplane and vice versa.

Important: The enclosure midplane must be replaced only by a trained IBM service support representative (SSR).

For information about the midplane replacement process, see the IBM Storwize V5000 Gen2 Knowledge Center at:

https://ibm.biz/Bdjmbn

For a complete overview of maintenance tasks, see:

http://www.ibm.com/support/knowledgecenter/STHGUJ/welcome?lang=en

12.2.2 Node canisters

Two node canister slots are on the top of the unit. The left slot is canister 1, and the right slot is canister 2.

Figure 12-1 shows the rear view of a fully equipped control enclosure.

Figure 12-1 Rear view of a control enclosure with two node canisters (the Storwize V5020)

USB ports

Each node canister has one USB port. The location of the port is the same on every model, and no indicators are associated with it.

Figure 12-2 shows the location of the USB port.

Figure 12-2 Node canister USB port (the Storwize V5020)

The USB flash drive is not required to initialize the system configuration. However, it can be used for other functions. Using the USB flash drive is required in the following situations:

•When you cannot connect to a node canister in a control enclosure by using the service assistant or the technician port, and you want to see the status of the node or re-enable the technician port.

•When you do not know, or cannot use, the service IP address for the node canister in the control enclosure and must set the address.

•When you have forgotten the superuser password and must reset the password.

Ethernet ports

The Storwize V5010 and Storwize V5020 node canisters have two 100/1000 Mbps Ethernet ports. Both ports can be used for management, Internet Small Computer System Interface (iSCSI) traffic, and Internet Protocol (IP) replication. Additionally, port 2 can be used as a technician port (the white box with “T” in the center of the box) for system initialization and servicing. After initialization, the technician port is disabled. It can be reactivated later by using CLI commands.

Figure 12-3 shows the Ethernet ports on the Storwize V5010.

Figure 12-3 Storwize V5010 Ethernet ports

Figure 12-4 shows the Ethernet ports on the Storwize V5020.

Figure 12-4 Storwize V5020 Ethernet ports

Each Storwize V5030 node canister has two 1/10 Gbps Ethernet ports and one Ethernet technician port. Ports 1 and 2 can be used for management, iSCSI traffic, and IP replication. Port T can be used as a technician port for system initialization and service only.

Figure 12-5 shows the Ethernet ports on the Storwize V5030.

Figure 12-5 Storwize V5030 Ethernet ports

Each port has two LEDs that display the status of its activity. Their meanings are shown in Table 12-1.

Table 12-1 Ethernet port status LEDs

Name and position	Color	State	Meaning
Activity (left)	Green	Flashing	The link is active.
Activity (left)	Green	Off	The link is inactive.
Link speed (right)	Green	Solid	A connection exists to a remote device at 1 Gbps or more.
Link speed (right)	Green	Off	No connection exists to a remote device, or the link is connected at less than 1 Gbps.

Serial-attached SCSI ports

Each Storwize V5010 node canister uses one 12 Gbps serial-attached SCSI (SAS) port to connect optional expansion enclosures. This port does not support host attachment.

Figure 12-6 shows the SAS ports on the Storwize V5010.

Figure 12-6 Storwize V5010 SAS ports

Each Storwize V5020 node canister has three 12 Gbps SAS ports. Port 1 can be used to connect optional expansion enclosures, and ports 2 and 3 can be used for host attachment.

Figure 12-7 shows the SAS ports on the Storwize V5020.

Figure 12-7 Storwize V5020 SAS ports

Each Storwize V5030 node canister has two 12 Gbps SAS ports to connect optional expansion enclosures. These ports do not support host attachment.

Figure 12-8 shows the SAS ports on the Storwize V5030.

Figure 12-8 Storwize V5030 SAS ports

Each port has two LEDs that display the status of its activity. Their meanings are shown in Table 12-2.

Table 12-2 SAS port status LEDs

Name and position	Color	State	Meaning
Fault (left)	Amber	Solid	One of the following conditions has occurred: •One or more, but not all, of the four lanes are up. (If no lanes are up, the activity light is off.) •One or more of the lanes is running at a different speed from the others. •One or more of the up lanes are attached to a different address from the others. •An unsupported device is plugged into this SAS port.
Fault (left)	Amber	Off	No fault exists. All four lanes (phys) have a connection.
Link (right)	Green	Solid	A connection exists on at least one lane (phy).
Link (right)	Green	Off	None of the SAS connections are working.
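
The conditions in Table 12-2 can be read as a small decision rule. The following Python sketch is one interpretation of that table, not product code: the Lane class, the sas_port_leds function, and the address strings are hypothetical names that are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Lane:
    """One SAS lane (phy). A speed of None means that the lane is down."""
    speed_gbps: Optional[float] = None
    attached_address: Optional[str] = None


def sas_port_leds(lanes: List[Lane], unsupported_device: bool = False) -> dict:
    """Return the LED states that Table 12-2 implies for a SAS port."""
    up = [lane for lane in lanes if lane.speed_gbps is not None]
    # Link LED: solid when at least one lane has a connection.
    link_on = len(up) > 0
    # Fault LED: some but not all lanes up, mixed speeds, mixed addresses,
    # or an unsupported device.
    partial = 0 < len(up) < len(lanes)
    mixed_speed = len({lane.speed_gbps for lane in up}) > 1
    mixed_address = len({lane.attached_address for lane in up}) > 1
    fault_on = unsupported_device or partial or mixed_speed or mixed_address
    return {
        "link": "solid" if link_on else "off",
        "fault": "solid" if fault_on else "off",
    }
```

For example, a port with three of four lanes connected yields a solid link LED and a solid fault LED, which matches the first fault condition in the table.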

Battery status

Each node canister houses a battery, the status of which is displayed by two LEDs on the back of the unit, as shown in Figure 12-9.

Figure 12-9 Battery status LEDs (the Storwize V5020)

The meaning of each LED is described in Table 12-3.

Table 12-3 Battery status LEDs

Name and position	Color	State and Meaning
Battery status (left)	Green	•FAST BLINK The battery is charging. It does not have a sufficient charge to perform a “fire hose” dump. •BLINK The battery has sufficient charge to perform one “fire hose” dump. •ON The battery is fully charged and has sufficient charge to perform two “fire hose” dumps. •OFF The battery is not available for use.
Fault (right)	Amber	•OFF If the LED is off, no known conditions are preventing normal operation, unless the battery status LED is also on. •ON An active condition or fault could compromise normal operation. •SLOW BLINK There is a non-critical fault with the battery.
Battery in use	Green	•OFF The battery is not being used to power the canister. •FAST BLINK The battery is currently providing power for a “fire hose” dump.
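
As a quick reference, the battery status LED states in Table 12-3 can be mapped to the number of fire hose dumps that the current charge supports. This is a minimal sketch of that lookup. The dictionary and function names are hypothetical and are not part of any IBM tool.

```python
from typing import Optional

# Battery status LED (green, left) states from Table 12-3, mapped to the
# number of "fire hose" dumps that the charge can support.
# None means that the battery is not available for use.
FIRE_HOSE_DUMPS_BY_LED_STATE = {
    "FAST BLINK": 0,  # charging, not enough charge for a dump
    "BLINK": 1,       # enough charge for one dump
    "ON": 2,          # fully charged, enough for two dumps
    "OFF": None,      # battery not available
}


def dumps_supported(led_state: str) -> Optional[int]:
    """Look up how many fire hose dumps the observed LED state indicates."""
    key = " ".join(led_state.upper().split())
    if key not in FIRE_HOSE_DUMPS_BY_LED_STATE:
        raise ValueError(f"unknown battery status LED state: {led_state!r}")
    return FIRE_HOSE_DUMPS_BY_LED_STATE[key]
```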

Canister status

The status of each canister is displayed by three LEDs on the back of the unit, as shown in Figure 12-10.

Figure 12-10 Node canister status LEDs (the Storwize V5020)

The meaning of each LED is described in Table 12-4.

Table 12-4 Canister status LEDs

Name and position	Color	State and Meaning
Power (left)	Green	•OFF No power is available or power is coming from the battery. •SLOW BLINK Power is available but the main CPU is not running; the system is in standby mode. •FAST BLINK System is in self test. •ON Power is available and the system code is running.
Status (middle)	Green	•OFF Indicates one of the following conditions: – No power to the canister – Canister is in standby mode or self test – Operating system is loading •BLINK The canister is in candidate or service state. It is not performing I/O. It is safe to remove the node. •BLINK FAST The canister is carrying out a fire hose dump. •ON The canister is active, able to perform I/O, or starting. The system is part of a cluster.
Canister Fault (right)	Amber	•OFF The node is in candidate or active state. Any error that has been detected is not severe enough to stop the node participating in a cluster or performing I/O. •BLINK The canister is being identified. There might or might not be a fault condition. •ON The node is in service state or an error exists that might be stopping the system code from starting (node error 550). The node canister cannot become active in the system until the problem is resolved. The problem is not necessarily related to a hardware component.
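
The status (middle) LED row of Table 12-4 answers two practical questions: whether the canister might be performing I/O, and whether the table states that the node is safe to remove. The following sketch encodes only that row. The StatusReading type and its field names are hypothetical, and a value of False for stated_safe_to_remove means only that the table does not say that removal is safe.

```python
from typing import NamedTuple


class StatusReading(NamedTuple):
    summary: str
    may_perform_io: bool
    stated_safe_to_remove: bool


# Status LED (green, middle) states as described in Table 12-4.
STATUS_LED = {
    "OFF": StatusReading(
        "no power, standby mode, self test, or operating system loading",
        False, False),
    "BLINK": StatusReading("candidate or service state", False, True),
    "BLINK FAST": StatusReading("fire hose dump in progress", False, False),
    "ON": StatusReading("active or starting, part of a cluster", True, False),
}


def read_status_led(state: str) -> StatusReading:
    """Return the Table 12-4 interpretation of a status LED state."""
    key = " ".join(state.upper().split())
    if key not in STATUS_LED:
        raise ValueError(f"unknown status LED state: {state!r}")
    return STATUS_LED[key]
```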

Replaceable components

The IBM Storwize V5000 Gen2 node canister contains the following field-replaceable (client-replaceable) components:

•Host Interface Card

•Memory

•Battery

Figure 12-11 shows the location of these parts within the node canister.

Figure 12-11 Node canister client-replaceable components

Note: Because these components are inside the node canister, their replacement leads to a redundancy loss until the replacement is complete.

Host Interface Card replacement procedure

For information about the Host Interface Card (HIC) replacement process, see the IBM Storwize V5000 Gen2 Knowledge Center at this website:

https://ibm.biz/Bdjmpg

Figure 12-12 shows a HIC replacement.

Figure 12-12 HIC replacement

Memory replacement procedure

For information about the memory replacement process, see the IBM Storwize V5000 Gen2 Knowledge Center at this website:

https://ibm.biz/BdjmpJ

Figure 12-13 shows the location of the memory modules.

Figure 12-13 Location of memory modules

Figure 12-14 shows a memory replacement.

Note: The memory modules do not stand up. They lie in a cascading fashion.

Figure 12-14 Memory replacement

Battery Backup Unit replacement procedure

Attention: The battery is a lithium ion battery. To avoid a possible explosion, do not incinerate the battery. Exchange the battery only with the part that is approved by IBM.

Because the Battery Backup Unit (BBU) replacement leads to a redundancy loss until the replacement is complete, we advise that you replace the BBU only when you are instructed to replace it. We advise you to follow the Directed Maintenance Procedure (DMP).

During the procedure, while you lift and lower the battery, grasp the blue handle on each end of the battery and keep the battery parallel to the canister system board, as shown in Figure 12-15.

Figure 12-15 BBU replacement

Important: During the replacement, the battery must be kept parallel to the canister system board while the battery is removed or replaced. Keep equal force, or pressure, on each end.

For more information about the BBU replacement process, see the IBM Knowledge Center at this website:

https://ibm.biz/Bdjmp3

More replacement procedures can be found on this website:

https://ibm.biz/BdjmpT

12.2.3 Expansion canisters

Two expansion canister slots are on the top of the unit. As with the control enclosure, the left slot is canister 1 and the right slot is canister 2.

Figure 12-16 shows the rear view of a fully equipped expansion enclosure.

Figure 12-16 Rear view of an expansion enclosure with two expansion canisters

SAS ports

SAS ports are used to connect the expansion canister to the node canister or to an extra expansion canister in the chain. Figure 12-17 shows the SAS ports that are on the expansion canister.

Figure 12-17 Expansion canister SAS ports

Each port has two LEDs that display the status of its activity. Their meanings are shown in Table 12-5.

Table 12-5 SAS port status LEDs

Name and position	Color	State	Meaning
Fault (left)	Amber	Solid	One of the following errors exists: •Only 1, 2, or 3 lanes (phys) have a connection. •Not all of the lanes (phys) that have a connection are running at the same speed. •Not all of the lanes (phys) that have a connection are attached to the same address. •An unsupported device is connected to the port.
Fault (left)	Amber	Off	No fault exists. All four lanes (phys) have a connection.
Link (right)	Green	Solid	A connection exists on at least one lane (phy).
Link (right)	Green	Off	No connection exists on any lane (phy).

Canister status

The status of each expansion canister is displayed by three LEDs on the back of the unit, as shown in Figure 12-18.

Figure 12-18 Enclosure canister status LEDs

The meaning of each LED is described in Table 12-6.

Table 12-6 Expansion canister status LEDs

Name and position	Color	State	Meaning
Power (left)	Green	Solid	The canister is receiving power.
Power (left)	Green	Off	No power is available, or the power is coming from the battery.
Status (middle)	Green	Solid	The canister is running normally.
		Blinking	The canister is unable to read data from the midplane.
		Off	The system is off, in standby, or running a self-test, or the operating system is loading.
Fault (right)	Amber	Solid	A fault requires part replacement, or the canister is still starting.
		Blinking	The canister is being identified. A fault might or might not exist.
		Off	The canister has no faults that require part replacement.

12.2.4 Disk subsystem

This section describes the parts of the IBM Storwize V5000 Gen2 disk subsystem, which is made up of control and expansion enclosures.

The Storwize V5010 and Storwize V5020 can have one control enclosure. The Storwize V5030 can consist of 1 or 2 control enclosures.

Each Storwize V5010 and Storwize V5020 control enclosure can attach up to 10 expansion enclosures. Each Storwize V5030 control enclosure can attach up to 20 expansion enclosures.

SAS cabling

Expansion enclosures are attached to control enclosures and between each other by using SAS cables.

A set of correctly interconnected enclosures is called a chain. Each chain is made up of two strands. A strand runs through the canisters that are in the same position in each enclosure in the chain. Canister 1 of an enclosure is cabled to canister 1 of the downstream enclosure. Canister 2 of an enclosure is cabled to canister 2 of the downstream enclosure.

Each strand consists of four phys, and each phy operates at 12 Gbps. Therefore, a strand has a usable speed of 48 Gbps.
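
The strand speed is the product of the phy count and the per-phy rate. The short sketch below restates that arithmetic. The phys_up parameter is an illustrative assumption that shows how the aggregate line rate would scale if fewer phys had a connection; it is not a measured throughput figure.

```python
PHYS_PER_STRAND = 4
GBPS_PER_PHY = 12


def strand_gbps(phys_up: int = PHYS_PER_STRAND) -> int:
    """Aggregate line rate of a strand for the given number of connected phys."""
    if not 0 <= phys_up <= PHYS_PER_STRAND:
        raise ValueError("phys_up must be between 0 and 4")
    return phys_up * GBPS_PER_PHY
```

With all four phys connected, strand_gbps() returns 48, which is the figure that is quoted above.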

A strand starts with a SAS initiator chip inside an IBM Storwize V5000 Gen2 node canister and progresses through SAS expanders, which connect to the disk drives. Each canister contains an expander. Each drive has two ports, each of which is connected to a different expander and strand. This configuration means that both nodes directly access each drive, and no single point of failure exists.

At system initialization, when devices are added to or removed from strands (and at other times), the IBM Storwize V5000 Gen2 software performs a discovery process to update the state of the drive and enclosure objects.

The Storwize V5010 supports one SAS chain for each control enclosure, and up to 10 expansion enclosures can be attached to this chain. The node canister uses SAS port 1 for expansion enclosures.

Figure 12-19 shows the SAS cabling on a Storwize V5010 with three attached expansion enclosures.

Figure 12-19 SAS expansion cabling on the Storwize V5010

The Storwize V5020 supports one SAS chain for each control enclosure, and up to 10 expansion enclosures can be attached to this chain. The node canister uses SAS port 1 for expansion enclosures.

Figure 12-20 shows the SAS cabling on a Storwize V5020 with three attached expansion enclosures.

Figure 12-20 SAS expansion cabling on the Storwize V5020

The Storwize V5030 supports two SAS chains for each control enclosure, and up to 10 expansion enclosures can be attached to each chain. The node canister uses SAS ports 1 and 2 for expansion enclosures.
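
The per-model chain limits that are described in this section can be summarized in a small lookup. The following sketch uses only the figures that are stated in this chapter (chains for each control enclosure, 10 expansion enclosures for each chain, and the number of control enclosures for each model). The function name is hypothetical, and the result for two V5030 control enclosures is an arithmetic upper bound that is derived from those figures, not a separately documented system limit.

```python
SAS_CHAINS_PER_CONTROL_ENCLOSURE = {"V5010": 1, "V5020": 1, "V5030": 2}
MAX_CONTROL_ENCLOSURES = {"V5010": 1, "V5020": 1, "V5030": 2}
MAX_EXPANSIONS_PER_CHAIN = 10


def max_expansion_enclosures(model: str, control_enclosures: int = 1) -> int:
    """Upper bound on attached expansion enclosures for a configuration."""
    if model not in SAS_CHAINS_PER_CONTROL_ENCLOSURE:
        raise ValueError(f"unknown model: {model}")
    if not 1 <= control_enclosures <= MAX_CONTROL_ENCLOSURES[model]:
        raise ValueError(f"{model} does not support {control_enclosures} control enclosures")
    chains = control_enclosures * SAS_CHAINS_PER_CONTROL_ENCLOSURE[model]
    return chains * MAX_EXPANSIONS_PER_CHAIN
```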

Figure 12-21 shows the SAS cabling on a Storwize V5030 with six attached expansion enclosures (three enclosures in each chain).

Figure 12-21 SAS expansion cabling on the Storwize V5030

Important: When a SAS cable is inserted, ensure that the connector is oriented correctly by confirming that the following conditions are met:

•The pull tab must be below the connector.

•Insert the connector gently until it clicks into place. If you feel resistance, the connector is probably oriented the wrong way. Do not force it.

•When the connector is inserted correctly, the connector can be removed only by pulling the tab.

•Cabling is done top-down, starting from the control enclosure. Bottom-up cabling is not supported.

Drive slots

The IBM Storwize V5000 Gen2 has different types of enclosures, depending on the model, warranty, and number of drive slots. Table 12-7 shows the drive slots on each enclosure type.

Table 12-7 Drive slots for each enclosure type

Enclosure type	Drive slots
•Control enclosure 2077/2078-112 •Control enclosure 2077/2078-212 •Control enclosure 2077/2078-312 •Expansion enclosure 2077/2078-12F	12 x 3.5-inch slots
•Expansion enclosure 2077/2078-92F	92 x 3.5-inch slots (usage of 2.5-inch drives possible with carriers)
•Control enclosure 2077/2078-124 •Control enclosure 2077/2078-224 •Control enclosure 2077/2078-324 •Expansion enclosure 2077/2078-24F	24 x 2.5-inch slots

Drive replacement procedure

You can reseat or replace a failed drive in a Storwize V5000 Gen2 by removing it from its enclosure and replacing it with the correct new drive without requiring the Directed Maintenance Procedure to supervise the service action.

The system can automatically perform the drive hardware validation tests and can promote the drive into the configuration if these tests pass, automatically configuring the inserted drive as a spare. The status of the drive after the promotion can be recorded in the event log either as an informational message or an error if a hardware failure occurs during the system action.

For more information about the drive replacement process, see the following topics in the IBM Storwize V5000 Gen2 Knowledge Center:

Replacing a 3.5 inch drive assembly:

https://ibm.biz/Bdjm8J

Replacing a 2.5 inch drive assembly:

https://ibm.biz/Bdjm88

12.2.5 Power supply units

All enclosures require two power supply units (PSUs) for normal operation. A single PSU can power the entire enclosure for redundancy. We advise that you supply AC power to each PSU from different power distribution units (PDUs).

Figure 12-22 shows a fully equipped control enclosure with two power supply units. The PSUs are identical between the control and expansion enclosures.

Figure 12-22 Power supply units

The left PSU is numbered 1, and the right PSU is numbered 2.

Power supplies in both control and expansion enclosures are hot-swappable and replaceable without a need to shut down a node or cluster. If the power is interrupted in one node canister for less than 2.5 seconds, the canister does not perform a fire hose dump and continues operation from the battery.

PSU status

Each PSU has three LEDs that display the status of its activity. The LEDs are the same for the control and expansion units.

Figure 12-23 shows the PSU status LEDs.

Figure 12-23 PSU status LEDs

The meaning of each LED is shown in Table 12-8.

Table 12-8 PSU status LEDs

Name and position	Color	State	Meaning
Input status (top)	Green	Solid	Input power is available.
Input status (top)	Green	Off	No input power is available.
Output status (middle)	Green	Solid	PSU is providing DC output power.
Output status (middle)	Green	Off	PSU is not providing DC output power.
Fault (bottom)	Amber	Solid	A fault exists with the PSU.
		Blinking	The PSU is being identified. A fault might exist.
		Off	No fault is detected.

PSU replacement procedure

For information about the PSU replacement process, see the IBM Storwize V5000 Gen2 Knowledge Center at this website:

https://ibm.biz/Bdjm8C

12.3 Configuration backup

The configuration backup file must be used if a serious failure occurs that requires the system configuration to be restored. The file contains configuration data of arrays, pools, volumes, and so on (but no client data).

The configuration backup file can be downloaded and saved by using the graphical user interface (GUI) or the command-line interface (CLI). The CLI option requires you to log in to the system and download the file by using Secure Copy Protocol (SCP). The CLI option is the preferred practice for automating the configuration backup.

Important: Save the configuration files of the IBM Storwize V5000 Gen2 regularly. The best approach is to save daily and automate this task. Always perform an additional manual backup before you perform any critical maintenance task, such as an update of the microcode or software version.

The backup file is updated by the cluster every day and stored in the /dumps directory. Even so, it is important to start a manual backup after you change your system configuration.

To perform the configuration backup successfully, ensure that the following prerequisites are met:

•All nodes must be online.

•No independent operations that change the configuration can be running in parallel.

•No object name can begin with an underscore.
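
The object-name prerequisite is easy to check in advance if you already have a list of object names. The following minimal sketch assumes that the names were collected beforehand by some other means (for example, from CLI listing output). The function name and the sample names are hypothetical.

```python
from typing import Iterable, List


def names_blocking_backup(object_names: Iterable[str]) -> List[str]:
    """Return the object names that begin with an underscore."""
    return [name for name in object_names if name.startswith("_")]
```

An empty result means that this prerequisite is met. Any returned name must be changed before the backup is started.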

Important: You can perform an ad hoc backup of the configuration only from the CLI. However, the output of the command can be downloaded from both the CLI and the GUI.

12.3.1 Generating a manual configuration backup by using the CLI

You can use the CLI to trigger a configuration backup either manually on an ad hoc basis or regularly through an automated process. The svcconfig backup command generates a new backup file. Triggering a backup by using the GUI is not possible, but you can save the output from the GUI.

Example 12-1 shows the output of the svcconfig backup command.

Example 12-1 Triggering a backup by using the CLI

>svcconfig backup

........................................................................................................................................................................................

CMMVC6155I SVCCONFIG processing completed successfully

The svcconfig backup command creates three files that provide information about the backup process and cluster configuration. These files are created in the /dumps directory on the configuration node and can be retrieved by using SCP. Use the lsdumps command to list them, as shown in Example 12-2.

Example 12-2 Listing the backup files by using the CLI

>lsdumps

id filename

...

48 svc.config.backup.xml_781000E-1

49 svc.config.backup.sh_781000E-1

50 svc.config.backup.log_781000E-1

...
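
When the backup is automated, a script can confirm that all three files are present before it copies them. The following sketch parses listing text in the two-column layout that is shown in Example 12-2 (an id followed by a file name). It assumes that layout, and the function name is hypothetical.

```python
from typing import Dict

BACKUP_PREFIX = "svc.config.backup."


def find_backup_files(lsdumps_output: str) -> Dict[str, str]:
    """Map each backup file kind (xml, sh, log) to its file name."""
    found = {}
    for line in lsdumps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header, blank lines, and anything without a numeric id.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        name = parts[1].strip()
        if name.startswith(BACKUP_PREFIX):
            kind = name[len(BACKUP_PREFIX):].split("_", 1)[0]
            found[kind] = name
    return found
```

A result that contains the keys xml, sh, and log indicates that the listing includes the complete set of backup files.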

The three files that are created by the backup process are described in Table 12-9.

Table 12-9 Files that are created by the backup process

File name	Description
svc.config.backup.xml_<serial>	This file contains the cluster configuration data.
svc.config.backup.sh_<serial>	This file contains the names of the commands that were issued to create the backup of the cluster.
svc.config.backup.log_<serial>	This file contains details about the backup, including any error information that might be reported.
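
The trigger-and-retrieve sequence can be scripted for a regular backup. The following sketch only builds the command lines and does not run them. It assumes an OpenSSH-style ssh and scp client, key-based authentication, and that the svcconfig backup command can be run non-interactively over SSH. The user name, the address 192.0.2.10, and the destination directory are placeholders, and whether the wildcard is accepted depends on the SCP client that is used.

```python
import shlex
from typing import List


def backup_commands(user: str, host: str, dest_dir: str) -> List[List[str]]:
    """Build the commands that trigger a backup and copy the result files."""
    remote = f"{user}@{host}"
    return [
        # Step 1: generate a new configuration backup on the system.
        ["ssh", remote, "svcconfig", "backup"],
        # Step 2: copy the three backup files from the /dumps directory.
        ["scp", f"{remote}:/dumps/svc.config.backup.*", dest_dir],
    ]


def as_shell_lines(commands: List[List[str]]) -> List[str]:
    """Render the commands as safely quoted shell lines for review."""
    return [shlex.join(command) for command in commands]
```

For example, as_shell_lines(backup_commands("superuser", "192.0.2.10", "/backups")) returns the two lines that a scheduled job could run. Review them before you pass them to a scheduler or to subprocess.run.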

12.3.2 Downloading a configuration backup by using the GUI

The IBM Storwize V5000 Gen2 does not offer an option to initiate a backup from the GUI. However, you can download existing daily backups or manual backups that were triggered from the CLI.

To download a configuration backup file by using the GUI, complete the following steps:

1. Browse to Settings → Support → Support Package and select Manual Upload Instructions. See Figure 12-24.

Figure 12-24 Manual Upload Instructions

When you select Manual Upload Instructions, a window opens (Figure 12-25).

Figure 12-25 Download Support Package

2. Click Download Support Package to open the next window, where you can select the type of support package, as shown in Figure 12-26.

Figure 12-26 Download Support Package

3. Select Download Existing Package to get a list of all the available log files that are stored on the configuration node, as shown in Figure 12-27.

Figure 12-27 Full log listing option

4. Search for the files that are named svc.config.backup.xml_*, svc.config.backup.sh_*, and svc.config.backup.log_*. Select the files, right-click, and select Download, as shown in Figure 12-28.

Figure 12-28 Backup files download

5. Even though the configuration backup files are updated automatically daily, it might be useful to verify the time stamp of the actual file. Open the svc.config.backup.xml_xx file with a text editor and search for the string timestamp=, which is near the top of the file. Figure 12-29 shows the file and the timestamp information.

Figure 12-29 Timestamp in the backup xml file
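
The timestamp check can also be scripted. The following sketch searches the beginning of the file contents for the timestamp= string. It assumes that the value is enclosed in double quotation marks, which is not confirmed by this chapter, and it returns the value as raw text without interpreting its format. The function name is hypothetical.

```python
import re
from typing import Optional

# Assumption: the value follows timestamp= in double quotation marks.
_TIMESTAMP = re.compile(r'timestamp="([^"]*)"')


def backup_timestamp(xml_text: str, search_chars: int = 4096) -> Optional[str]:
    """Return the first timestamp value near the top of the backup file."""
    match = _TIMESTAMP.search(xml_text[:search_chars])
    return match.group(1) if match else None
```

A return value of None means that the string was not found in the first part of the file. In that case, open the file in a text editor as described in step 5.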

12.4 System update

The system update process involves updating the entire IBM Storwize V5000 Gen2 environment.

The node canister software and the drive firmware are updated separately, so these tasks are described in different topics.

Note: Storwize V5000 Gen1 hardware is not supported by IBM Spectrum Virtualize V8.1 or later. The V7.7.1 and V7.8.1 code streams will continue to be updated with critical fixes for this hardware.

12.4.1 Updating node canister software

For information about the latest software and to download the software package, go to the following website:

http://www.ibm.com/support/docview.wss?uid=ssg1S1004336

The GUI also shows whether a software update is available and the latest software level when you navigate to Settings → System → Update System, as shown in Figure 12-30.

Figure 12-30 Latest software level available

Important: Certain levels of code support updates only from specific previous levels. If you update to more than one level above your current level, you might be required to install an intermediate level. For information about update compatibility, see this website:

http://www.ibm.com/support/docview.wss?uid=ssg1S1004336

Preparing for the update

Allow sufficient time to plan your tasks, review your preparatory update tasks, and complete the update of the IBM Storwize V5000 Gen2 environment. The update procedures can be divided into the following general update tasks, as shown in Table 12-10.

Table 12-10 Software update tasks

Sequence	Upgrade tasks
1	Decide whether you want to update automatically or manually. During an automatic update procedure, the clustered system updates each of the nodes systematically. The automatic method is the preferred procedure for updating software on nodes. However, you can update each node manually.
2	Ensure that Common Information Model (CIM) object manager (CIMOM) clients are working correctly. When necessary, update these clients so that they can support the new version of the IBM Storwize V5000 Gen2 code. Examples can be operating system (OS) versions and options, such as FlashCopy Manager or VMware plug-ins.
3	Ensure that multipathing drivers in the environment are fully redundant. If you experience failover issues with multipathing driver support, resolve these issues before you start normal operations.
4	Update other devices in the IBM Storwize V5000 Gen2 environment. Examples might include updating the hosts and switches to the correct levels.
5	Update your IBM Storwize V5000 Gen2.

Important: Ensure that no unfixed errors are in the log and that the system date and time are correctly set before you start the update.

The amount of time that it takes to perform a node canister update can vary depending on the amount of preparation work that is required and the size of the environment. Generally, to update the node software, allow 20 - 40 minutes for each node canister and a single 30-minute wait when the update is halfway complete. One node in each I/O group can be upgraded to start, then the system can wait 30 minutes before it upgrades the second node in each I/O group. The 30-minute wait allows the recently updated node canister to come online and be confirmed as operational, and it allows time for the host multipath to recover.
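
For planning a maintenance window, the guideline above can be turned into a rough range. The following sketch adds the per-canister time for every node canister, one after another, and then adds the single 30-minute wait. Treating all canisters as sequential is a conservative assumption for systems with more than one I/O group, and the function name is hypothetical.

```python
from typing import Tuple

MIN_MINUTES_PER_CANISTER = 20
MAX_MINUTES_PER_CANISTER = 40
MIDPOINT_WAIT_MINUTES = 30


def update_window_minutes(node_canisters: int) -> Tuple[int, int]:
    """Return a (shortest, longest) estimate in minutes for the node update."""
    if node_canisters < 1:
        raise ValueError("at least one node canister is required")
    shortest = node_canisters * MIN_MINUTES_PER_CANISTER + MIDPOINT_WAIT_MINUTES
    longest = node_canisters * MAX_MINUTES_PER_CANISTER + MIDPOINT_WAIT_MINUTES
    return shortest, longest
```

For a single control enclosure with two node canisters, the estimate is 70 - 110 minutes, not including preparation work.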

The software update can be performed concurrently with normal user I/O operations. When the node that is being updated becomes unavailable, all I/O operations to that node fail, and the failed I/O operations are directed to the partner node of the working pair. Applications do not see any I/O failures.

The maximum I/O rate that can be sustained by the system might degrade while the code is uploaded to a node, the update is in progress, the node is rebooted, and the new code is committed because write caching is disabled during the node canister update process.

Important: Ensure that the multipathing drivers are fully redundant, with every path available and online. During the update, you might see path-related errors as paths go away (fail over), and the error count can increase. When the paths to the nodes return, the nodes fall back and the system becomes fully redundant again.

When new nodes are added to the system, the upgrade package is automatically downloaded to the new nodes from the IBM Storwize V5000 Gen2 system.

Update test utility

The Storwize V5000 Gen2 update test utility checks for known issues that can cause problems during a software update. You can download the utility and read more about it at this website:

http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S4000585

The software update test utility can be downloaded in advance of the update process, or it can be downloaded and run directly during the software update, as guided by the update wizard. You can run the utility multiple times on the same system to perform a readiness check in preparation for a software update.

The installation and use of this utility is nondisruptive, and it does not require a restart of any node. Therefore, host I/O is not interrupted. The utility is installed only on the current configuration node.

System administrators must continue to check whether the version of code that they plan to install is the latest version.

Updating the software automatically by using the GUI

Complete the following steps to automatically update the node canister software by using the GUI:

1. Browse to Settings → System → Update System and select Test and Update, as shown in Figure 12-31.

Figure 12-31 Update system panel

Alternatively, you can run only the test utility by selecting Test Only.

2. Select the test utility and update package files by clicking the folder icons, as shown in Figure 12-32. The code levels are entered automatically.

Figure 12-32 File selection

Alternatively, for the Test Only option, upload only the test utility and enter the code level manually.

3. Select Automatic update and click Next to proceed to the question about pausing the update, as shown in Figure 12-33. The Automatic update option is the default and advised choice.

Figure 12-33 Automatic update selection

4. Choose whether you want to pause the update, as shown in Figure 12-34. The default is Fully automatic. Click Finish to start the update.

Figure 12-34 Fully automatic

5. Wait for the test utility and update package to upload to the system, as shown in Figure 12-35.

Figure 12-35 File upload

6. After the files upload, the test utility is automatically run, as shown in Figure 12-36. The test utility verifies that no issues exist with the current system environment, such as failed components and drive firmware that is not at the latest level.

Figure 12-36 State while the test utility runs

7. If the test utility discovers any warnings or errors, a window opens to inform the user, as shown in Figure 12-37. Click Read more to get more information.

Figure 12-37 Warning about the issues that were detected

Figure 12-38 shows that in this example the test utility identified one warning.

Figure 12-38 Test utility results

Warnings do not prevent the software update from continuing, even if the recommended procedure is to fix all issues before you proceed.

8. Close the window and select either Resume or Cancel, as shown in Figure 12-39. Clicking Resume continues the software update. Clicking Cancel cancels the software update so that the user can correct any issues.

Figure 12-39 State after you run the test utility

9. Selecting Resume prompts the user to confirm the action, as shown in Figure 12-40.

Figure 12-40 Resume confirmation window

10. Wait for each node to be updated and rebooted, one at a time until the update process is complete. The GUI displays the overall progress of the update and the current state of each node, as shown in Figure 12-41.

Figure 12-41 Automatic update progress

During the update process, a node fails over and you can temporarily lose connection to the GUI. When this happens, a warning is displayed, as shown in Figure 12-42. Select Yes.

Figure 12-42 Configuration node failover warning

Updating the software manually by using the GUI and SAT

Important: We advise that you update the IBM Storwize V5000 Gen2 automatically by following the update wizard. If a manual update is used, ensure that you do not skip any steps.

Complete the following steps to manually update the software by using the GUI and Service Assistant Tool (SAT):

1. Browse to Settings → System → Update System and select Test and Update, as shown in Figure 12-43.

Figure 12-43 Update system panel

Alternatively, you can run the test utility by selecting Test Only.

2. Select the test utility and update package files by clicking the folder icons, as shown in Figure 12-44. The code levels are entered automatically.

Figure 12-44 File selection

Alternatively, for the Test Only option, upload only the test utility and enter the code level manually.

3. Select Service Assistant Manual update and click Finish, as shown in Figure 12-45.

Figure 12-45 Manual update selection

4. Wait for the test utility and update package to upload to the system, as shown in Figure 12-46.

Figure 12-46 File upload

5. After the files upload, the test utility is automatically run, as shown in Figure 12-47. The test utility verifies that no issues exist with the current system environment, such as failed components and drive firmware that is not at the latest level.

Figure 12-47 State while the test utility runs

If the utility identifies no issues, the system is ready for the user to initiate the manual upgrade, as shown in Figure 12-48.

Figure 12-48 State while you wait for the manual upgrade to start

6. Choose a node to update. Non-configuration nodes must be updated first. Update the configuration node last. Browse to Monitoring → System and hover over the canisters to confirm the nodes that are the non-configuration nodes, as shown in Figure 12-49.

Figure 12-49 Checking the configuration node status

7. Right-click the canister that contains the node that you want to update and select Remove, as shown in Figure 12-50.

Figure 12-50 Removing a node canister

Important: Ensure that you select the non-configuration nodes first.

8. A warning message appears to ask whether you want to remove the node, as shown in Figure 12-51. Click Yes.

Figure 12-51 Node removal confirmation window

The non-configuration node is removed from the management GUI Update System panel and is shown as Unconfigured when you hover over the node after you select Monitoring → System.

9. Open the Service Assistant Tool for the node that you removed. Enter the service IP address followed by /service into a browser window. Without /service, the browser opens the GUI that is associated with this service IP address. You do not need to enter http:// or https://.

Example: 172.163.18.34/service

10. In the Service Assistant Tool, ensure that the node that is ready for update is selected. The node can be in the Service status, display a 690 error, and show no available cluster information, as shown in Figure 12-52.

Figure 12-52 Node to update in the Service Assistant Tool

11. In the Service Assistant Tool, select Update Manually, and choose the required node canister software upgrade file, as shown in Figure 12-53.

Figure 12-53 Starting the update in the Service Assistant Tool

12. Click Update to start the update process on the first node and wait for the node to finish updating.

Non-configuration nodes can be reintroduced automatically into the system after the update finishes. Updating and adding the node again can last 20 - 40 minutes.

The management GUI shows the progress of the update, as shown in Figure 12-54.

Figure 12-54 Manual update progress

13. Repeat steps 7 - 12 for the remaining nodes, leaving the configuration node until last.

14. After you remove the configuration node from the cluster, you are asked whether you want to refresh the panel, as shown in Figure 12-55. Select Yes.

Figure 12-55 Configuration node failover warning

Important: The configuration node remains in the Service state when it is added to the cluster again. Therefore, you need to exit the Service state manually.

15. To exit the Service state, browse to the Home panel of the Service Assistant Tool and open the Actions menu. Select Exit Service State and click GO (Figure 12-56).

Figure 12-56 Exiting the Service state in the Service Assistant Tool
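The ordering rule of the manual procedure (update the non-configuration nodes first and the configuration node last) can be sketched in a few lines of Python. The Node type and the node names are hypothetical and are used only for illustration. The sketch does not query a real system.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str              # hypothetical node name
    is_config_node: bool   # True for the current configuration node


def manual_update_order(nodes):
    """Return the nodes in a safe manual update order.

    Non-configuration nodes come first and the configuration node is last.
    """
    config_nodes = [n for n in nodes if n.is_config_node]
    if len(config_nodes) != 1:
        raise ValueError("expected exactly one configuration node")
    # sorted() is stable, so non-configuration nodes keep their relative order.
    return sorted(nodes, key=lambda n: n.is_config_node)


nodes = [Node("node1", True), Node("node2", False)]
for node in manual_update_order(nodes):
    print(node.name)
```

Checking for exactly one configuration node before you start reflects the advice not to skip any steps: if the configuration node cannot be identified, the order cannot be determined safely.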

12.4.2 Updating the drive firmware

Drive firmware can be updated for all drives at the same time or individually.

To get the latest drive update package, go to the Supported Drive Types and Firmware Levels for the IBM Storwize V5000 website:

http://www.ibm.com/support/docview.wss?uid=ssg1S1004427

Note: Find the download link for the actual drive firmware at the bottom of the Web page.

Updating the firmware on individual drives

To update an individual drive, complete the following steps:

1. Navigate to Pools → Internal Storage, right-click the drive to update, and select Upgrade from the Actions menu, as shown in Figure 12-57.

Figure 12-57 Individual drive update

2. Select the upgrade package, which was downloaded from the IBM Support site, by clicking the folder icon, and click Upgrade, as shown in Figure 12-58.

Figure 12-58 Individual drive update file selection

The drive firmware update takes about 2 - 3 minutes for each drive.

3. To verify the new firmware level, right-click the drive and select Properties, as shown in Figure 12-59.

Figure 12-59 Individual drive update result

Updating the firmware on all drives

Here we show how to use the management GUI to update all of the drives in an IBM Storwize V5000 Gen2:

1. Go to Pools → Internal Storage.

2. Figure 12-60 shows how to update all drives through the Actions menu in the Internal Storage panel.

3. Under Drive Class Filter, click All Internal.

4. In the Actions menu, click Upgrade All.

Note: If any drives are selected, the Actions menu displays actions for the selected drives and the Upgrade All option does not appear. If a drive is selected, deselect it by holding down the Ctrl key and clicking the drive.

Figure 12-60 Update of multiple drives in the Internal Storage panel

5. After you initiate the drive upgrade process by either of the previous two options, the panel in Figure 12-61 is displayed. Select the drive upgrade package, which was downloaded from the IBM Support site, by clicking the folder icon, and click Upgrade. If directed by the support center, you can also override newer versions of firmware by selecting the option Install the firmware even if the drive is running a newer version.

Figure 12-61 Upload the software upgrade package for multiple drives

All drives that require an update can now be updated.

12.5 Monitoring

Any issue that is reported by your IBM Storwize V5000 Gen2 system must be fixed as soon as possible. Therefore, it is important to configure the system to send automatic notifications when a new event is reported. You can select the type of event for which you want to be notified. For example, you can restrict notifications to only events that require immediate action.

Several event notification mechanisms are available:

Email Email notifications can be configured to send emails to one or more email addresses. With this mechanism, individuals can receive notifications wherever they have email access, including mobile devices.

SNMP SNMP notifications can be configured to send Simple Network Management Protocol (SNMP) trap reports to a data center management system that consolidates SNMP reports from multiple systems. With this mechanism, you can monitor your data center from a single workstation.

Syslog Syslog notifications can be configured to send a syslog report to a data center management system that consolidates syslog reports from multiple systems. With this mechanism, you can monitor your data center from a single location.

If your system is within warranty, or you have a hardware maintenance agreement, configure your IBM Storwize V5000 Gen2 system to send email events directly to IBM if an issue that requires hardware replacement is detected. This mechanism is known as Call Home. When an event is received, IBM automatically opens a problem report and, if appropriate, contacts you to verify whether replacement parts are required.

Important: If you set up Call Home to the IBM Support Center, ensure that the contact details that you configured are correct and kept up-to-date when personnel change.

12.5.1 Email notifications and Call Home

The Call Home function of the IBM Storwize V5000 Gen2 uses the email notification mechanism to send emails to the specific IBM Support Center. You enable Call Home by configuring email notifications. Then, you can optionally add other email addresses to notify.

To configure Call Home and other optional email addresses, complete the following steps:

1. Browse to Settings → Notifications → Email and select Enable Notifications, as shown in Figure 12-62.

Figure 12-62 Enabling email notifications

2. For the correct functionality of email notifications, ensure that Simple Mail Transfer Protocol (SMTP) is enabled on the management network and not, for example, blocked by firewalls.

If email notification is not enabled, you periodically get a warning similar to the one that is shown in Figure 12-63.

Figure 12-63 Configure Call Home info

3. Configure the SMTP servers. You can add several servers by clicking the plus (+) sign, as shown in Figure 12-64.

Figure 12-64 Email Servers

4. Figure 12-65 shows the entry for Call Home. This email address is predefined and cannot be changed.

Figure 12-65 Call Home

5. You can add several recipients to receive notifications. Click the plus (+) sign to add a new email address. Figure 12-66 shows one entry.

Figure 12-66 Email Users

6. It is very important to add an email contact who is responsible for this storage system. Provide the contact information of the system owner, who can be contacted by the IBM Support Center when necessary. Figure 12-67 shows such an entry. Ensure that you always keep this information up-to-date.

Figure 12-67 Email Contact

7. The System Location is also important. Support personnel use this information to send a support representative to the failing system. For a minor problem, this information is used to send the CRU parts to the given address. Figure 12-68 shows how to complete the System Location panel. Ensure that you always keep this information up-to-date.

Figure 12-68 System Location

8. You can include an inventory report in the emails to check the current inventory of your system. Figure 12-69 shows where to select the check box to receive inventory details. The emails then include an inventory report that describes the system hardware and critical configuration information. Object names and other information, such as IP addresses, are not sent. Based on the information that is received, IBM can inform you whether the hardware or software that you are using requires an upgrade because of a known issue.

Figure 12-69 Inventory details

9. Click Save.

10. Select Edit → Call Home → Test to test the Call Home function, as shown in Figure 12-70.

Figure 12-70 Test Call Home

Disabling and enabling notifications

Email notifications can be temporarily or permanently disabled at any time, as shown in Figure 12-71. Disabling email notifications is a preferred practice when you run maintenance tasks, such as upgrading code or replacing parts. After the maintenance operation, remember to re-enable the email notification function.

Figure 12-71 Disabling email notifications

The same results can be achieved by using the CLI and entering the svctask stopmail and svctask startmail commands.
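When maintenance is scripted, the two CLI commands can be paired so that notifications are always re-enabled afterward. The following Python sketch is a minimal illustration under stated assumptions: run_cli is a hypothetical callable that sends one command string to the system CLI (for example, over an SSH session that you set up yourself). Only the svctask stopmail and svctask startmail command names come from the text.

```python
from contextlib import contextmanager


@contextmanager
def notifications_paused(run_cli):
    """Disable email notifications for the duration of a maintenance task.

    run_cli is a hypothetical callable that sends one CLI command string
    to the system. It is an assumption, not part of the product.
    """
    run_cli("svctask stopmail")
    try:
        yield
    finally:
        # Always re-enable notifications, even if the maintenance task fails.
        run_cli("svctask startmail")


# Usage example that records the commands instead of sending them.
issued = []
with notifications_paused(issued.append):
    issued.append("(maintenance task runs here)")
print(issued)
```

The finally clause addresses the reminder above: notifications are re-enabled even if the maintenance step raises an error.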

12.6 Audit log

The audit log is useful when you analyze past configuration events, especially when you try to determine, for example, how a volume ended up being shared by two hosts, or why the volume was overwritten. The audit log is included in the support package to aid in problem determination.

The audit log tracks action commands that are issued through the CLI or the management GUI. It provides the following entries:

•Name of the user who issued the action command

•Name of the actionable command

•Time stamp of when the actionable command was issued on the configuration node

•Parameters that were issued with the actionable command

Failed commands and view commands are not logged in the audit log. Certain service commands are not logged either. The svcconfig backup, cpdumps, and ping service commands are not logged.

To access the audit log by using the GUI, browse to Access → Audit Log, as shown in Figure 12-72.

Figure 12-72 Audit log panel

Right-clicking any column header opens the option menu in which you can select columns that are shown or hidden. It is also possible to click the Column icon on the far right of the column headers to open the option menu.

Figure 12-73 shows all of the possible columns that can be displayed in the audit log view.

Figure 12-73 Possible audit log columns

12.7 Event log

Whenever a significant change in the status of the IBM Storwize V5000 Gen2 is detected, an event is submitted to the event log. All events are classified as alerts or messages.

An alert is logged when the event requires action. Certain alerts have an associated error code that defines the service action that is required. The service actions are automated through the fix procedures. If the alert does not have an error code, the alert represents an unexpected change in the state. This situation must be investigated to see whether it is expected or represents a failure. Investigate an alert and resolve it when it is reported.

A message is logged when a change that is expected is reported, for instance, an IBM FlashCopy operation completes.

To check the event log, browse to Monitoring → Events, as shown in Figure 12-74.

Figure 12-74 Event log

12.7.1 Managing the event log

The event log has a size limit. After the event log is full, newer entries replace older entries that are no longer required. To prevent a repeated event from filling the event log, certain records in the event log refer to multiple occurrences of the same event. When event log entries are coalesced in this way, the time stamp of the first occurrence of the problem and the time stamp of the last occurrence of the problem are saved in the log entry. A count of the number of times that the error condition occurred is also saved in the log entry. Other data refers to the last occurrence of the event.
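The coalescing behavior that is described above can be illustrated with a small Python sketch. This is not the system's implementation: the LogEntry fields, the record_event function, the sample event ID, and the use of integer time stamps are all assumptions that are made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    event_id: str
    first_seen: int   # time stamp of the first occurrence
    last_seen: int    # time stamp of the last occurrence
    count: int        # number of occurrences
    data: str         # other data, taken from the last occurrence


def record_event(log, event_id, timestamp, data):
    """Add an occurrence to the log, coalescing repeats into one entry."""
    entry = log.get(event_id)
    if entry is None:
        log[event_id] = LogEntry(event_id, timestamp, timestamp, 1, data)
    else:
        entry.last_seen = timestamp
        entry.count += 1
        entry.data = data   # other data refers to the last occurrence
    return log[event_id]


log = {}
record_event(log, "085031", 100, "first report")
record_event(log, "085031", 200, "second report")
entry = record_event(log, "085031", 300, "third report")
print(entry)
```

One entry per repeated event keeps the log from filling with duplicates while it preserves when the problem started, when it was last seen, and how often it occurred.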

Event log panel columns

Figure 12-75 shows all of the possible columns that can be displayed in the event log view.

Figure 12-75 Possible event log columns

Event log filter options

The event log can be filtered by using the options that are shown in Figure 12-76.

Figure 12-76 Event log filter options

Each option is described:

•Recommended Actions (default)

Only events with Recommended Actions (Status Alert) are displayed. For each problem that is selected, you can:

– Run a fix procedure

– View the properties

•Unfixed Alerts

Displays only the alerts that are not fixed. For each entry that is selected, you can:

– Run a fix procedure on any alert with an error code

– Mark an event as fixed

– Filter the entries to show them by specific minutes, hours, or dates

– Reset the date filter

– View the properties

•Unfixed Messages and Alerts

This option lists unfixed events. This option is useful to find events that must be handled, but no actions are required or recommended. For each entry that is selected, you can:

– Run a fix procedure on any alert with an error code

– Mark an event as fixed

– Filter the entries to show them by specific minutes, hours, or dates

– Reset the date filter

– View the properties

•Show All

This option lists all available events. For each entry that is selected, you can:

– Run a fix procedure on any alert with an error code

– Mark an event as fixed

– Filter the entries to show them by specific minutes, hours, or dates

– Reset the date filter

– View the properties

Some events require a certain number of occurrences in 25 hours before they are displayed as unfixed. If they do not reach this threshold in 25 hours, they are flagged as expired. Monitoring events are below the coalesce threshold and are usually transient.
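The 25-hour threshold rule can be sketched as follows. This Python example is an interpretation, not the product logic: it assumes that the window starts at the first occurrence, and the threshold value and the state names are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=25)   # window length from the text


def classify(occurrences, threshold, now):
    """Classify an event from its occurrence time stamps.

    Assumption: the 25-hour window starts at the first occurrence.
    Returns "unfixed", "expired", or "monitoring".
    """
    if not occurrences:
        raise ValueError("at least one occurrence is required")
    first = min(occurrences)
    in_window = [t for t in occurrences if t - first <= WINDOW]
    if len(in_window) >= threshold:
        return "unfixed"       # threshold reached within 25 hours
    if now - first > WINDOW:
        return "expired"       # window passed without reaching the threshold
    return "monitoring"        # still below the threshold, window still open


start = datetime(2024, 1, 1, 8, 0)
print(classify([start], threshold=3, now=start + timedelta(hours=1)))
```

An event that stays below its threshold remains a monitoring event until the window closes, which is why such events are usually transient.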

You can also sort events by time or error code. When you sort by error code, the most serious events (those with the lowest numbers) are displayed first. You can select any event that is listed and select Actions → Properties to view details about the event.

Important: Check for this filter option if no event is listed. Events might exist that are not associated with recommended actions.

Figure 12-77 shows an event log with no items when the Recommended Actions filter was selected, which does not necessarily mean that the event log is clear. To check whether the log is clear, click Show All.

Figure 12-77 Event log with no recommended actions

Actions on a single event

Right-clicking a single event gives options that might be used for that specific event, as shown in Figure 12-78.

Figure 12-78 Possible actions on a single event

Each option is described:

•Run Fix Procedure

This option starts the fix procedure for this specific event. You can start a fix procedure even if the procedure is not the recommended next action. However, we advise that you fix the error with the highest priority first.

•Mark as Fixed

This option marks this specific event as fixed. Message events must be marked as fixed to stop them from showing in the event log.

•Filter by Date

This option limits the event log entries to the events that occurred within an interval that is defined by the user.

•Show entries within (minutes/hours/days)

This option limits the event log entries to the events that occurred within the last period:

– 1, 5, 10, 15, 30, or 45 minutes

– 1, 2, 5, or 12 hours

– 1, 4, 7, 15, or 30 days

•Reset Date Filter

This option clears the Filter by Date.

•Clear Log

This option clears the complete event log, even if only one event was selected.

Important: This action cannot be undone and might prevent the system from being analyzed when severe problems occur.

•Properties

This option provides more information for the selected event that is shown in the list.

Recommended actions

A fix procedure invokes a wizard that is known as a Directed Maintenance Procedure (DMP) that helps to troubleshoot and correct the cause of an error. Certain DMPs reconfigure the system based on your responses, ensure that actions are carried out in the correct sequence, and prevent or mitigate the loss of data. For this reason, you must always run the fix procedure to fix an error, even if the fix might seem obvious.

To run the fix procedure for the error with the highest priority, go to the Recommended Action panel at the top of the Events page and click Run Fix, as shown in Figure 12-79. When you fix higher-priority events first, the system often can automatically mark lower-priority events as fixed.

Figure 12-79 Next recommended action

12.7.2 Alert handling and recommended actions

All events that are in Alert status require attention. Alerts are listed in priority order. Alerts must be fixed sequentially by using the available fix procedures.

Example: Array mdisk not protected by sufficient spares

For example, look at an error that was raised by taking a drive offline in an array with redundancy of one.

This example shows how faults are represented in the event log, how information about the fault can be gathered, and how the Recommended Action (DMP) can be used to fix the error:

•Detecting the alert

The Health Status indicator shows a red alert. The Status Alerts indicator (on top of the GUI) shows one alert. Click the alert to retrieve the specific information, as shown in Figure 12-80.

Figure 12-80 Status alert for an individual entry

Review the event log for more information.

•Gathering additional information

More details about the event are available by clicking the event and selecting Details. This information might help you fix a problem or analyze a root cause. Figure 12-81 shows the properties for the previous event.

Figure 12-81 Alert properties

•Run the Recommended Action (DMP)

We highly advise that you use the DMP to fix any alerts. You can miss tasks that are running in the background when you bypass the DMP. Not all alerts have available DMPs.

Figure 12-82 shows how to start the DMP by selecting Run Fix at the top of the window. This option always runs the recommended action.

Figure 12-82 Starting the DMP (first option)

Figure 12-83 shows how to start the DMP by right-clicking the alert record and selecting Run Fix Procedure. You can use this option to run a fix procedure that might not be the recommended action.

Figure 12-83 Starting the DMP (second option)

The steps and panels of a DMP are specific to the error. When all of the steps of the DMP are processed successfully, the recommended action is complete and the problem is usually fixed. Figure 12-84 shows that the Health Status changed to green and both the Status Alerts indicator and the Recommended Action box disappeared, implying that no more actions must be taken.

Figure 12-84 Event log with no outstanding recommended action

Handling multiple alerts

Figure 12-85 shows the event log with multiple alerts.

Figure 12-85 Multiple alert events that are displayed in the event log

The Recommended Action function orders the alerts by severity and displays the events with the highest severity first. If multiple events have the same severity, they are ordered by date and the oldest event is displayed first.

Events are ordered by severity in the following way, with the most severe event first:

•Unfixed alerts (sorted by error code). The lowest error code has the highest severity.

•Unfixed messages.

•Monitoring events (sorted by error code). The lowest error code has the highest severity.

•Expired events.

•Fixed alerts and messages.

The less severe events are often fixed with the resolution of the most severe events.
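The ordering rules above map naturally to a sort key. The following Python sketch is illustrative only: the Event type, the category names, and the sample error codes and time stamps are hypothetical and are not read from a real system.

```python
from typing import NamedTuple


class Event(NamedTuple):
    category: str     # one of the keys of CATEGORY_RANK
    error_code: int   # lower error code means higher severity
    timestamp: int    # older events have smaller time stamps


# Rank of each category, most severe first, following the list above.
CATEGORY_RANK = {
    "unfixed_alert": 0,
    "unfixed_message": 1,
    "monitoring": 2,
    "expired": 3,
    "fixed": 4,
}

# Only these categories are described as sorted by error code.
SORTED_BY_CODE = {"unfixed_alert", "monitoring"}


def severity_key(event):
    code = event.error_code if event.category in SORTED_BY_CODE else 0
    # Ties on category and error code are broken by date, oldest first.
    return (CATEGORY_RANK[event.category], code, event.timestamp)


def order_events(events):
    return sorted(events, key=severity_key)


events = [
    Event("fixed", 1000, 5),
    Event("unfixed_alert", 2030, 2),
    Event("unfixed_alert", 1690, 3),
    Event("unfixed_message", 0, 1),
    Event("unfixed_alert", 1690, 1),
]
for event in order_events(events):
    print(event)
```

The first element of the ordered list corresponds to the event that the Recommended Action function would present first under these assumptions.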

12.8 Support Assistance

Support assistance enables support personnel to access the system to complete troubleshooting and maintenance tasks. You can configure either local support assistance, where support personnel visit your site to fix problems with the system, or remote support assistance. Both local and remote support assistance use secure connections to protect data exchange between the support center and system. More access controls can be added by the system administrator. Assistance can be provided at your location or through a remote connection to your system.

Local support assistance

Use local support assistance if you have restrictions that require on-site support only. Unlike other authentication methods, you can audit all actions that support personnel conduct on the system when local support assistance is configured. Support personnel can log on to your system by using a console or over your intranet. These users can be authenticated only by a challenge-response mechanism.

Support personnel obtain the challenge-response access either through virtual private network (VPN) or over a telephone call with another support person or the administrator at the support center. Note that if you want to enable remote support assistance or use the Assist On-Site tool, you must configure local support assistance.

Remote support assistance

With remote support assistance, support personnel can visit on site and they can also access the system remotely through a secure connection from the support center. However, before you enable remote support assistance between the system and support, you first need to configure local support assistance. You must ensure that call home is configured and a valid email server is specified.

Call home automatically contacts support when critical errors occur on the system. Call home sends a return email that communicates information back to the system such as a Problem Management Report (PMR) number that tracks the problem until it is resolved.

Note that you cannot enable remote support assistance and use the Assist On-Site tool at the same time.

In addition, a service IP address must be configured before you set up remote support assistance. During system initialization, you can optionally set up a service IP address and remote support assistance. If you did not configure a service IP address, go to Settings → Network → Service IPs to configure a service IP for each node on the system. You might also need to configure a proxy server if you use a firewall to protect your internal network.

When you enable remote support assistance, a shared-token is also generated by the system and sent to the support center. If the system needs support services, support personnel can be authenticated onto the system with a challenge-response mechanism. Use the chsra command to enable remote support assistance on the system.

After support personnel obtain the response code, it is entered to gain access to the system. Service personnel have three attempts to enter the correct response code. After three failed attempts, the system generates a new random challenge and support personnel must obtain a new response code.
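The three-attempt rule can be illustrated with a generic challenge-response sketch in Python. The actual algorithm that the system uses is not described here, so the HMAC-SHA256 computation, the class name, and the token value are stand-in assumptions. Only the limit of three attempts and the generation of a new random challenge come from the text.

```python
import hashlib
import hmac
import secrets

MAX_ATTEMPTS = 3   # from the text: three attempts per challenge


class ChallengeSession:
    """Generic challenge-response check with a limited number of attempts."""

    def __init__(self, shared_token):
        self._token = shared_token   # hypothetical shared token (bytes)
        self._new_challenge()

    def _new_challenge(self):
        self.challenge = secrets.token_hex(8)   # new random challenge
        self.failed_attempts = 0

    def expected_response(self):
        # Stand-in computation: HMAC-SHA256 over the challenge.
        digest = hmac.new(self._token, self.challenge.encode(), hashlib.sha256)
        return digest.hexdigest()

    def verify(self, response):
        if hmac.compare_digest(response, self.expected_response()):
            return True
        self.failed_attempts += 1
        if self.failed_attempts >= MAX_ATTEMPTS:
            # After three failed attempts, a new response code is required.
            self._new_challenge()
        return False


session = ChallengeSession(b"example-shared-token")
print(session.verify(session.expected_response()))
```

Replacing the challenge after repeated failures limits guessing: a response code that was derived for the old challenge no longer matches.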

Support roles

When you enable local support assistance, support personnel are assigned either the Monitor role or the Restricted Administrator role. The Monitor role can view, collect, and monitor logs and errors to determine the solution to problems on the system. The Restricted Administrator role gives support personnel access to administrator tasks to help solve problems on the system. However, this role restricts these users from deleting volumes or pools, unmapping hosts, or creating, deleting, or changing users.

Roles limit access of the assigned user to specific tasks on the system. Users with the service role can set the time and date on the system, delete dump files, add and delete nodes, apply service, and shut down the system. They can also view objects and system configuration but cannot configure, modify, or manage the system or its resources. They also cannot read user data.

12.8.1 Configuring support assistance

The Support Assistance panel is available under Settings → Support → Support Assistance, as shown in Figure 12-86.

Figure 12-86 Support Assistance

12.8.2 Set up Support Assistant

Support assistance enables support personnel to access the system to complete troubleshooting and maintenance tasks. You can configure either local support assistance, where support personnel visit your site to fix problems with the system, or remote support assistance. Both local and remote support assistance use secure connections to protect data exchange between the support center and system. More access controls can be added by the system administrator. The system supports both local and remote support assistance.

Use local support assistance if you have restrictions that require on-site support only. Unlike other authentication methods, you can audit all actions that support personnel conduct on the system when local support assistance is configured. With remote support assistance, support personnel can visit on site and they can also access the system remotely through a secure connection from the support center. However, before you enable remote support assistance between the system and support, you first need to configure local support assistance.

Support personnel rely on the support package, such as snaps, dumps, and various trace files, to troubleshoot issues on the system. The management GUI and the command-line interface support sending this data to the support center securely. Additionally, support personnel can download new builds, patches, and fixes automatically to the system with your permission.

To configure support assistance, complete the following steps:

1. In the management GUI, select Settings → Support → Support Assistance → Set Up Support Assistance. See Figure 12-87.

Figure 12-87 Support Assistance

2. If you selected to configure both local and remote support assistance, verify the pre-configured support centers.

Optionally, enter the name, IP address, and port for the proxy server on the Remote Support Centers page. A proxy server is used in systems where a firewall is used to protect your internal network or if you want to route traffic from multiple storage systems to the same place.

Enable local support

To enable local support, complete the following steps:

1. Select I want support personnel to work on-site only, as shown in Figure 12-88. Use this option if your system has certain restrictions that require on-site maintenance.

2. Click Finish to set up local support assistance.

Figure 12-88 Enable local support

The screen in Figure 12-89 appears.

Figure 12-89 Local Support Definitions

3. Under Support Assistance → Start New Session, select how long the remote support session can be idle before the system disconnects it, as shown in Figure 12-90.

Figure 12-90 Set Idle time before the line will be disconnected

4. Click Begin to start a new session.

5. Test Connection lets you test the connectivity, as shown in Figure 12-91.

Figure 12-91 Test connection

Figure 12-92 shows the pop-up window that tests the connection to the service center.

Figure 12-92 Test Service Center Connection

An overview of remote users is shown in Figure 12-93.

Figure 12-93 Support Users

6. To generate a new token, click Generate New Token, as shown in Figure 12-94.

Figure 12-94 Generate New Token

When you enable remote support assistance, the system generates a support assistance token. This shared security token is sent to the support center and is used for authentication during support assistance sessions. Updating a token is essentially overwriting the existing token, then sending it securely to the support assistance administration server in an email message. You specify the email addresses of the support assistance administration servers when you configure support assistance.

If the email is not received in time for a support incident or cannot be sent for some reason, a service engineer can manually add the token to the administration server. Before you can update a token, you must enable the support assistance feature. You can update the token periodically as a security practice, similar to how you update passwords.

7. To update a shared support assistance token, enter the following command:

svctask chsra -updatetoken

8. If your requirements change over time, click Reconfigure Settings to change the configuration, as shown in Figure 12-95.

Figure 12-95 Reconfigure Settings

Enable remote support

If you are configuring remote support assistance, ensure that the following prerequisites are met:

•Ensure that call home is configured with a valid email server.

•Ensure that a valid service IP address is configured on each node on the system.

•If your system is behind a firewall or if you want to route traffic from multiple storage systems to the same place, you must configure a Remote Support Proxy server. Before you configure remote support assistance, the proxy server must be installed and configured separately. During the set-up for support assistance, specify the IP address and the port number for the proxy server on the Remote Support Centers page.

•If you do not have firewall restrictions and the storage nodes are directly connected to the Internet, ask your network administrator to allow connections to 129.33.206.139 and 204.146.30.139 on port 22.

•Both uploading support packages and downloading software require direct connections to the Internet. A DNS server must be defined on your system for both of these functions to work.

•To ensure that support packages are uploaded correctly, configure the firewall to allow connections to the following IP addresses on port 443: 129.42.56.189, 129.42.54.189, and 129.42.60.189.

•To ensure that software is downloaded correctly, configure the firewall to allow connections to the following IP addresses on port 22: 170.225.15.105, 170.225.15.104, 170.225.15.107, 129.35.224.105, 129.35.224.104, and 129.35.224.107.
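The firewall prerequisites above can be checked from a workstation on the same network segment before you start the wizard. The following Python sketch is only an illustration: the function names and group labels are invented for this example, and a successful TCP connection from a workstation does not prove that the storage nodes themselves can reach the same addresses.

```python
import socket

# Addresses and ports are copied from the prerequisites above.
# The group labels are illustrative only.
ENDPOINTS = {
    "remote support (port 22)": [
        ("129.33.206.139", 22), ("204.146.30.139", 22),
    ],
    "support package upload (port 443)": [
        ("129.42.56.189", 443), ("129.42.54.189", 443), ("129.42.60.189", 443),
    ],
    "software download (port 22)": [
        ("170.225.15.105", 22), ("170.225.15.104", 22), ("170.225.15.107", 22),
        ("129.35.224.105", 22), ("129.35.224.104", 22), ("129.35.224.107", 22),
    ],
}


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints=ENDPOINTS, probe=can_connect):
    """Return {label: [(host, port), ...]} containing only unreachable targets."""
    failures = {}
    for label, targets in endpoints.items():
        blocked = [(host, port) for host, port in targets if not probe(host, port)]
        if blocked:
            failures[label] = blocked
    return failures
```

Calling check_endpoints() with no arguments probes every address. An empty dictionary means that all connections opened; any entries identify the firewall rules that still need attention.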

Using the management GUI

To configure remote support assistance, complete the following steps:

1. Select I want support personnel to access my system both on-site and remotely, as shown in Figure 12-96.

Figure 12-96 Enable remote support

2. Select this option to configure remote support assistance. Use this option to allow support personnel to access your system through a secure connection from the support center. Secure remote assistance requires a valid service IP address, call home, and an optional proxy server if a firewall is used to protect your internal network. If you select this option, click Next to specify IP addresses for the support center and optional proxy server. See Figure 12-97.

Figure 12-97 Support Centers

3. Click Next. On the Remote Support Access Settings page, select one of these options to control when support personnel can access your system to conduct maintenance and fix problems:

– At Any Time: Support personnel can access the system at any time. With this option, a remote support session does not need to be started manually, and sessions remain open continuously.

– On Permission Only: The system administrator must grant permission to support personnel before they can access the system. See Figure 12-98.

Figure 12-98 Remote Support Access Settings

4. Click Finish. After you configure remote support assistance with permission only, you can start sessions between the support center and the system. On the Support Assistance page, select Start New Session and specify the number of hours the session can be idle before the support user is logged off from the system. See Figure 12-99.

Figure 12-99 Idle Timeout Setting

5. If you plan to use the command-line interface to configure assistance, use the following commands:

– For local support assistance, enter the following command:

chsra -enable

– To configure remote support assistance, enter the following command:

chsra -remotesupport enable
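Because local support assistance must be configured before remote support assistance is enabled, the order of the two commands matters. The following Python sketch captures that ordering. The function name and the "local" and "remote" mode labels are hypothetical; only the command strings come from the text above.

```python
def support_assistance_commands(mode):
    """Return the CLI commands, in order, for the requested mode.

    mode is "local" or "remote" (labels invented for this sketch).
    """
    if mode == "local":
        return ["chsra -enable"]
    if mode == "remote":
        # Local support assistance is configured first, then remote is enabled.
        return ["chsra -enable", "chsra -remotesupport enable"]
    raise ValueError(f"unknown mode: {mode!r}")
```

The returned list can be pasted into an SSH session line by line or passed to whatever automation tool you already use.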

12.8.3 Disable Support Assistance

You can disable support assistance by using the command-line interface (CLI). When you disable support assistance, the support assistance token is deleted. All active secure remote access user sessions are closed immediately and a secure email message is sent to the administration server to indicate that secure remote access is disabled on the system.

To disable support assistance completely, enter the following command:

svctask chsra -disable

To disable remote support assistance only, enter the following command:

svctask chsra -remotesupport disable

12.9 Collecting support information

If you have an issue with an IBM Storwize V5000 Gen2 and call the IBM Support Center, you might be asked to provide support data as described in the next section.

12.9.1 Collecting support information by using the GUI

The following information describes how to collect supporting data for the IBM Support Center.

To open the Support Package page, select Settings → Support → Support Package. The page that is shown in Figure 12-100 opens.

Figure 12-100 Support Package

12.9.2 Automatic upload of Support Packages

You can use the management GUI or the command-line interface to upload support packages to the support center. If support assistance is configured on your systems, you can either automatically or manually upload new support packages to the support center to help analyze and resolve errors on the system. You can select individual logs to either download to review or send directly to the support center for analysis.

Before automatically uploading a support package, ensure that the following prerequisites are configured on the system:

1. Ensure that all of the nodes on the system have internet access.

2. Ensure that a valid service IP address is configured on each node on the system.

3. Configure at least one valid DNS server for domain name resolution. To configure a DNS server on the system, select Settings → System → DNS and specify valid IP addresses and names for one or more DNS servers. You can also use the mkdnsserver command to configure DNS servers.

4. Configure the firewall to allow connections to the following IP addresses on port 443: 129.42.56.189, 129.42.54.189, and 129.42.60.189. To test connections to the support center, select Settings → Support → Support Assistance. On the Support Assistance page, select Test Connection to verify connectivity between the system and the support center.

The management GUI supports uploading new or existing support packages to support automatically. See Figure 12-101.

Figure 12-101 Upload Support Package

5. Click Upload Support Package. The selection window opens, as shown in Figure 12-102.

Figure 12-102 Upload Support package

6. On the Upload Support Package page, enter the Problem Management Report (PMR) number that is associated with the support package that you are uploading. If you do not have a PMR number, click Don’t have a PMR? to open the Service Request (SR) tool to generate a PMR. You need an IBM Partner ID to register.

Note: If you are not sure if a PMR exists or do not want to create a new PMR, the package can still be sent to the support center. The machine serial number and type are used to route the package to the support center. However, specifying a PMR number can decrease response time for support personnel. You can call the IBM Support Line or use the IBM Support Portal to open a call. Go to the following address:

https://www.ibm.com/support/home/?brandind=Hardware

7. Specify the type of package that you want to generate and upload to the support center by selecting:

– Standard logs

This support package contains the most recent logs that were collected from the system. These logs are most commonly used by the IBM Support Center to diagnose and solve problems.

– Standard logs plus one existing statesave

This support package contains the standard logs from the system and the most recent statesave from any of the nodes in the system. Statesaves are also known as memory dumps or live memory dumps.

– Standard logs plus the most recent statesave from each node

This option is used most often by the support team for problem analysis. The package contains the standard logs from the system and the most recent statesave from each node in the system.

– Standard logs plus new statesave

This option might be requested by the IBM Support Center team for problem determination. It generates a new statesave (livedump) for all of the nodes and packages them with the most recent logs.

The support center will let you know which package they need.

8. Click Upload. After the new support package is generated, a summary panel displays the progress of the upload. If the upload is unsuccessful or encounters errors, verify the connection between the system and the support center and retry the upload.

9. To upload a support package that was generated earlier, use the Upload Existing Package function that is shown in Figure 12-103.

Figure 12-103 Upload Existing Package

10. Click Upload Existing Package to open the window that is shown in Figure 12-104.

Figure 12-104 Select Support Package to Upload

Using the command-line interface

To upload a support package or other file with the command-line interface, complete these steps:

1. Enter the following command:

satask supportupload -pmr pmr_number -filename fullpath/filename

where the pmr_number is the number of an existing PMR and fullpath/filename is the full path and the name of the file that you are uploading. The -pmr and -filename parameters are not required. If you do not specify a PMR number, the file is uploaded by using the machine serial and type to route the file to the support center. If you do not specify a file name, the latest support package is uploaded.

2. To verify the progress of the upload to the support center, enter the following command:

lscmdstatus

In the results of this command, verify that the supportupload_status is Complete, which indicates that the upload is successfully completed. Other possible values for this parameter include Active, Wait, Abort, and Failed. If the upload is Active, you can use the supportupload_progress_percent parameter to view the progress for the upload.
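If you capture the lscmdstatus output in a script, the status check that is described above can be automated. The following Python sketch assumes that the output consists of one "name value" pair per line separated by white space; that layout is an assumption, so compare it with the actual output of your code level before relying on it. The function names are hypothetical.

```python
def parse_cmdstatus(text):
    """Parse 'name value' lines into a dict (assumed output layout)."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts:
            fields[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return fields


def describe_upload(text):
    """Summarize the support upload state found in lscmdstatus output."""
    fields = parse_cmdstatus(text)
    raw = fields.get("supportupload_status", "")
    status = raw.lower()  # compare case-insensitively
    if status == "complete":
        return "upload finished"
    if status == "active":
        percent = fields.get("supportupload_progress_percent", "?")
        return f"upload in progress ({percent}%)"
    if status in ("wait", "abort", "failed"):
        return f"upload not complete: {raw}"
    return "no upload status found"
```

For example, text that contains the lines "supportupload_status Active" and "supportupload_progress_percent 40" is reported as an upload in progress at 40%.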

If you want to generate a new support package, complete these steps:

1. Enter the following command in the command-line interface:

satask snap -upload -pmr pmr_number

where pmr_number is the number of an existing PMR. The command generates a new support package and uploads it to the support center with the identifying PMR number. If you do not have a PMR number that corresponds with the support package, you can use the following command:

satask snap -upload

The command generates a new support package and uploads it to the support center by using the machine type and serial to route the package.

2. To verify the progress of the upload to the support center, enter the following command:

lscmdstatus
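Both satask commands in this section take optional parameters, so a small helper can keep the variants consistent in scripts. The following Python sketch only assembles the command strings that are described above. The function names are hypothetical, any PMR number or path that you pass in is your own value, and the helper applies no quoting or validation.

```python
def supportupload_command(pmr=None, filename=None):
    """Build 'satask supportupload', adding -pmr and -filename only when given."""
    parts = ["satask", "supportupload"]
    if pmr:
        parts += ["-pmr", pmr]
    if filename:
        parts += ["-filename", filename]
    return " ".join(parts)


def snap_upload_command(pmr=None):
    """Build 'satask snap -upload', adding -pmr only when a PMR number is given."""
    parts = ["satask", "snap", "-upload"]
    if pmr:
        parts += ["-pmr", pmr]
    return " ".join(parts)
```

With no arguments, the helpers return the short forms, which route the package by machine type and serial number as described in the steps above.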

12.9.3 Manual upload of Support Packages

You can use the management GUI or the command-line interface to manually upload support packages to the support center. If support assistance is configured on your systems, you can manually upload new support packages to the support center to help analyze and resolve errors on the system. You can select individual logs to either download to review or send directly to the support center for analysis.

Using the management GUI

The management GUI supports manually uploading support packages. A manual upload requires that you download either a new support package or an existing support package to your local computer and then upload the file to support directly.

To manually upload a new support package to the support center, complete these steps:

1. In the management GUI, select Settings → Support → Support Package.

2. On the Support Package page, expand Manual Upload Instructions, as shown in Figure 12-105.

Figure 12-105 Manual Upload Instructions

3. In the Manual Upload Instructions section, click Download Support Package. See Figure 12-106.

Figure 12-106 Download Support Package

4. On the Download New Support Package or Log File panel, select one of the types of support packages that are shown in Figure 12-107.

Figure 12-107 Download Support Package or Log file

The type to select depends on the event that is being investigated. For example, if you notice that a node is restarted, capture the snap file with the latest existing statesave. If needed, the IBM Support Center can notify you of the package that is required.

The following components are included in each type of support package:

– Standard logs

This support package contains the most recent logs that were collected from the system. These logs are most commonly used by the IBM Support Center to diagnose and solve problems.

– Standard logs plus one existing statesave

This support package contains the standard logs from the system and the most recent statesave from any of the nodes in the system. Statesaves are also known as memory dumps or live memory dumps.

– Standard logs plus the most recent statesave from each node

This option is used most often by the support team for problem analysis. The package contains the standard logs from the system and the most recent statesave from each node in the system.

– Standard logs plus new statesave

This option might be requested by the IBM Support Center team for problem determination. It generates a new statesave (livedump) for all of the nodes and packages them with the most recent logs.

5. Click Download to download the support package to your local computer.

6. After the download completes to your local computer, you can upload the package to the support center with one of the following methods shown in Figure 12-108.

Figure 12-108 Download portals

Each option opens a web page in your browser. The portals are as follows:

– BlueDiamond

Select the link to log in to the BlueDiamond portal. BlueDiamond provides enhanced security and support for healthcare clients. You must be a registered BlueDiamond client to use this option. After you accept the terms of service for the upload, log into the BlueDiamond portal with your user name and password.

– Upload over Browser

Use this option for small files under 200 MB. Select the link to upload the support package to the support website through the web browser. On the support website, complete the following steps:

a. Enter a valid PMR number that is associated with this support package.

b. In the Upload is for field, select Other.

c. Enter a valid email address for the contact for this package.

– FTP Transfer

Use this option for larger files. Select the link to send the package to support with File Transfer Protocol (FTP). You can send packages to support with standard FTP (non-secure), secure FTP, or SFTP, which is file transfer over the Secure Shell (SSH) protocol. On the support portal for FTP transfers, select the type of FTP you want to use and follow the instructions for that method.
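The choice among the three portals comes down to two facts: whether you are a registered BlueDiamond client and whether the file is under 200 MB. The following Python sketch expresses that rule. It assumes that "200 MB" means decimal megabytes and that registered BlueDiamond clients always use their portal; neither assumption is stated in the text, and the function name is hypothetical.

```python
# "200 MB" is interpreted here as decimal megabytes (an assumption).
BROWSER_LIMIT_BYTES = 200 * 1000 * 1000


def choose_upload_method(size_bytes, bluediamond_client=False):
    """Suggest a manual upload portal for a package of the given size."""
    if bluediamond_client:
        # Assumption: registered BlueDiamond clients use their own portal.
        return "BlueDiamond"
    if size_bytes < BROWSER_LIMIT_BYTES:
        return "Upload over Browser"
    return "FTP Transfer"
```

A snap with a new statesave from every node is usually far larger than a standard log package, so check the size of the downloaded file before you pick a portal.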

Using the command-line interface

To upload a support package or other file with the command-line interface, complete these steps:

1. Enter the following command:

satask supportupload -pmr pmr_number -filename fullpath/filename

2. To verify the progress of the upload to the support center, enter the following command:

lscmdstatus

If you want to generate a new support package, complete these steps:

1. Enter the following command in the command-line interface:

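satask snap -upload -pmr pmr_number

where pmr_number is the number of an existing PMR. If you do not have a PMR number, enter the following command instead:

satask snap -upload

This form of the command generates a new support package and uploads it to the support center by using the machine type and serial to route the package.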

2. To verify the progress of the upload to the support center, enter the following command:

lscmdstatus

12.9.4 Collecting support information by using the SAT

The IBM Storwize V5000 Gen2 management GUI collects information from all of the components in the system. The Service Assistant Tool (SAT) collects information from all node canisters. The snap file is the information that is collected and packaged in a single file.

If the package is collected by using the Service Assistant Tool, ensure that the node from which the logs are collected is the current node, as shown in Figure 12-109.

Figure 12-109 Accessing the Collect Logs panel in the Service Assistance Tool

Support information can be downloaded with or without the latest statesave, as shown in Figure 12-110.

Figure 12-110 Collect Logs panel in the Service Assistance Tool

Accessing the SAT by using the technician port

If your system or one of your node canisters is inaccessible through the administrative network, you can connect a personal computer directly to the technician port on the node canister to access the Service Assistant Tool.

Note: This procedure starts the initialization tool if the node canister that is being serviced is in the candidate state, if no system details are configured, and if the partner node is not in the active state.

Complete the following steps:

1. Configure Dynamic Host Configuration Protocol (DHCP) on the Ethernet port of the personal computer to connect to the node canister.

Alternatively, if the personal computer does not support DHCP, configure the static IPv4 address 192.168.0.2 on the port.

2. On the Storwize V5010 system or Storwize V5020 system, reenable the technician port by completing the following steps:

a. Create a text file with the satask chserviceip -techport enable -force command.

b. Save the file as satask.txt in the root directory of the Universal Serial Bus (USB) stick.

c. Insert the USB stick in the USB port of the node that you want to service.

d. Wait until no write activity is recognized and remove the USB stick.

Note: The Storwize V5030 systems have a dedicated technician port that is always enabled so this step is unnecessary.

3. Connect an Ethernet cable between the port on the personal computer and the technician port. The technician port is labeled with a T on the rear of the node canister.

4. Open a supported web browser on the personal computer and browse to the http://192.168.0.1 URL.

Note: If the cluster is active and you connect to the configuration node, this URL opens the management GUI. If you want to access the SAT in this case, browse to http://192.168.0.1/service.

5. Complete the correct procedure to service the canister.

6. Log out of the Service Assistant Tool and disconnect the Ethernet cable from the technician port.

7. On the Storwize V5010 system or Storwize V5020 system, disable the technician port by running the command that is shown in Example 12-3.

Example 12-3 Disabling the technician port

>satask chserviceip -techport disable

Ethernet port 2 can then be used again to provide extra Ethernet connectivity for system management, iSCSI, and IP replication.
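Step 2 of the procedure above, creating satask.txt on the USB stick, can be scripted so that the command is never mistyped. The following Python sketch writes the file to a directory that you supply. The function name is hypothetical, the mount point of the USB stick depends on your workstation, and the trailing newline is an assumption rather than a documented requirement.

```python
from pathlib import Path

# Command text is taken from step 2a above.
TECHPORT_ENABLE = "satask chserviceip -techport enable -force"


def write_satask_file(usb_root):
    """Write satask.txt to the root directory of the USB stick and return its path."""
    target = Path(usb_root) / "satask.txt"
    target.write_text(TECHPORT_ENABLE + "\n", encoding="ascii")
    return target
```

Pass the root directory of the mounted USB stick as usb_root, then eject the stick safely before you insert it into the node canister.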

12.10 Powering off the system and shutting down the infrastructure

The following sections describe the process to power off the system and to shut down and start an entire infrastructure that contains an IBM Storwize V5000 Gen2.

12.10.1 Powering off

Important: Never power off your IBM Storwize V5000 Gen2 system by powering off the power supply units (PSUs), removing both PSUs, or removing both power cables from a running system. It can lead to inconsistency or loss of the data that is staged in the cache.

You can power off a node canister or the entire system. When you power off only one node canister for each I/O group, all of the running tasks remain active while the remaining node takes over.

Powering off the system is typically planned only for site maintenance (power outage, building construction, and so on) because all components of the IBM Storwize V5000 Gen2 are redundant and replaceable while the system is running.

Important: If you are powering off the entire system, you lose access to all volumes that are provided by this system. Powering off the system also powers off all IBM Storwize V5000 Gen2 nodes. All data is flushed to disk before the power is removed.

Before you power off the system, stop all hosts with volumes that are allocated to this system. This step can be skipped for hosts with volumes that are provisioned with mirroring (host-based mirror) from different storage systems. However, skipping this step means that errors that relate to lost storage paths and disks can be logged on the host error log.

Note: If a canister or the system is powered off, a local visit can be required to either reseat the canister or power cycle the enclosures.

Powering off a node canister

To power off a canister by using the GUI, complete the following steps:

1. Browse to Monitoring → System and rotate the enclosure to the rear view, as shown in Figure 12-111.

Figure 12-111 Rotating the system image

2. Right-click the required canister and select Power Off Canister, as shown in Figure 12-112.

Figure 12-112 Powering off the canister

3. Confirm that you want to power off the canister by entering the confirmation code and clicking OK, as shown in Figure 12-113.

Figure 12-113 Canister power off confirmation window

4. After the node canister is powered off, you can confirm that it is offline in the System panel, as shown in Figure 12-114.

Figure 12-114 Checking the canister state

To power off a node canister by using the CLI, use the command that is shown in Example 12-4.

Example 12-4 Powering off a canister by using the CLI

>svctask stopsystem -node 2

Are you sure that you want to continue with the shut down? (y/yes to confirm)

Powering off the system

To power off the entire system by using the GUI, complete the following steps:

1. Browse to Monitoring → System and click Actions → Power Off System, as shown in Figure 12-115.

Figure 12-115 Powering off the system

2. Confirm that you want to power off the system by entering the confirmation code and clicking OK, as shown in Figure 12-116. Ensure that all FlashCopy, Metro Mirror, Global Mirror, data migration operations, and forced deletions are stopped or allowed to complete before you continue.

Figure 12-116 Power Off System confirmation window

3. To power off the system by using the CLI, use the command that is shown in Example 12-5. Ensure that all FlashCopy, Metro Mirror, Global Mirror, data migration operations, and forced deletions are stopped or allowed to complete before you continue.

Example 12-5 Powering off the system by using the CLI

>svctask stopsystem

Are you sure that you want to continue with the shut down? (y/yes to confirm)

4. Wait for the power LED on the node canisters to blink slowly, which indicates that the power off operation completed.

Note: When you power off an IBM Storwize V5000 Gen2, it does not automatically restart. You must manually restart the system by removing and reapplying the power cords.
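The pre-checks and the two stopsystem variants that are shown in this section can be combined in a small guard for maintenance scripts. The following Python sketch is illustrative only: the function name is hypothetical, and the caller must supply the set of active operations because the sketch does not query the system. It refuses to build the full-system command while any of the listed operations is still running.

```python
# Operations that must be stopped or complete before a full system power off.
BLOCKING_OPERATIONS = (
    "FlashCopy", "Metro Mirror", "Global Mirror",
    "data migration", "forced deletion",
)


def power_off_command(active_operations, node_id=None):
    """Return the stopsystem command for one node or for the whole system.

    active_operations is a collection of names from BLOCKING_OPERATIONS that
    the caller knows to be running. It is checked only for a full power off.
    """
    if node_id is not None:
        return f"svctask stopsystem -node {node_id}"
    blocking = [op for op in BLOCKING_OPERATIONS if op in active_operations]
    if blocking:
        raise RuntimeError("stop or wait for: " + ", ".join(blocking))
    return "svctask stopsystem"
```

The CLI still prompts for confirmation after the command is entered, as shown in Example 12-4 and Example 12-5.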

12.10.2 Shutting down and starting up the infrastructure

To shut down an entire infrastructure (storage, servers, and applications), complete the following steps:

1. Power off your servers and all applications.

2. Power off your IBM Storwize V5000 Gen2 system by using either the GUI or the CLI.

3. Remove the power cords that are connected to both power supplies in the rear of the enclosure on every control and expansion enclosure.

4. Power off your storage area network (SAN) switches.

To start an entire infrastructure, complete the following steps:

1. Power on your SAN switches and wait until the boot completes.

2. Power on any expansion enclosures by connecting the power cords to both power supplies in the rear of the enclosure or by turning on the power circuit.

3. Power on the control enclosures by connecting the power cords to both power supplies in the rear of the enclosure and by turning on the power circuits.

The system starts. Startup is complete when the status LEDs of all node canisters in the control enclosure are permanently on, which takes no longer than 10 minutes.

4. Power on your servers and start all applications.
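If the startup sequence is automated, the servers should not be started until the storage system is ready. The following Python sketch polls a readiness check for up to 10 minutes, which matches the startup time that is stated above. The function name and the is_ready callable are hypothetical: the LEDs cannot be read by a script, so is_ready stands in for whatever test you choose, such as the management GUI answering.

```python
import time


def wait_until_ready(is_ready, timeout_s=600, interval_s=10,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or timeout_s seconds elapse.

    Returns True when the check succeeds and False on timeout. The clock and
    sleep parameters exist so that the function can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A False result after the full timeout suggests that the system did not start normally and should be inspected before the servers are powered on.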
