Monitoring with Simple Network Management Protocol
This chapter provides information about Simple Network Management Protocol (SNMP) implementation and messages for the IBM DS8880 storage system.
This chapter covers the following topics:
SNMP implementation on the DS8880
SNMP notifications
SNMP configuration
14.1 SNMP implementation on the DS8880
SNMP, as used by the DS8880, is designed so that the DS8880 sends traps only if a notification is necessary. The traps can be sent to a defined IP address.
SNMP alert traps provide information about problems that the storage unit detects. You or the service provider must correct the problems that the traps detect.
The DS8880 does not include an installed SNMP agent that can respond to SNMP polling. The default Community Name parameter is set to public.
The management server configured to receive the SNMP traps receives all of the generic trap 6 and specific trap 3 messages, which are sent in parallel with the call home to IBM.
Before you configure SNMP for the DS8880, obtain the destination address for the SNMP traps and the port on which the trap daemon listens.
 
Standard port: The standard port for SNMP traps is port 162.
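To check that trap datagrams reach the management server, a small UDP listener can be useful. The following Python sketch is only an illustration: the function names are hypothetical, it receives raw datagrams without decoding the SNMP message, and it assumes that the process is allowed to bind to the chosen port (binding to port 162 usually requires elevated privileges).

```python
import socket

# Standard SNMP trap port. Binding to ports below 1024 usually needs elevated privileges.
SNMP_TRAP_PORT = 162


def open_trap_socket(bind_addr: str = "0.0.0.0", port: int = SNMP_TRAP_PORT,
                     timeout: float = 5.0) -> socket.socket:
    """Open a UDP socket on which raw trap datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # recvfrom raises socket.timeout if nothing arrives in time
    sock.bind((bind_addr, port))
    return sock


def receive_datagram(sock: socket.socket, bufsize: int = 65535) -> tuple[bytes, str]:
    """Wait for one datagram and return its raw bytes and the sender IP address."""
    data, (sender, _port) = sock.recvfrom(bufsize)
    return data, sender
```

A production environment normally relies on the trap daemon of the monitoring software instead. The sketch only confirms that datagrams arrive from the expected sender.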
14.1.1 Management Information Base file
The DS8880 storage system provides a Management Information Base (MIB) file that describes the SNMP trap objects. Load the file by using the software that is used for enterprise and SNMP monitoring.
The file is in the snmp subdirectory on the data storage command-line interface (DS CLI) installation CD, or on the DS CLI installation CD image that is available from this FTP site:
14.1.2 Predefined SNMP trap requests
An SNMP agent can send SNMP trap requests to SNMP managers to inform them about the change of values or status on the IP host where the agent is running. Seven predefined types of SNMP trap requests exist, as shown in Table 14-1.
Table 14-1 SNMP trap request types
Trap type              Value  Description
coldStart              0      Restart after a crash.
warmStart              1      Planned restart.
linkDown               2      Communication link is down.
linkUp                 3      Communication link is up.
authenticationFailure  4      Invalid SNMP community string was used.
egpNeighborLoss        5      Exterior Gateway Protocol (EGP) neighbor is down.
enterpriseSpecific     6      Vendor-specific event happened.
Each trap message contains an object identifier (OID) and a value, as shown in Table 14-1, to notify you about the cause of the trap message. You can also use type 6, the enterpriseSpecific trap type, when you must send messages that do not fit the predefined trap types. For example, the DS8880 uses this type for the notifications that are described in this chapter.
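The generic trap types in Table 14-1 map naturally to an enumeration in trap-handling scripts. The following Python sketch shows one possible representation. The class and function names are hypothetical, and the descriptions are copied from the table.

```python
from enum import IntEnum


class GenericTrap(IntEnum):
    """Generic SNMP trap types and their values, as listed in Table 14-1."""
    COLD_START = 0
    WARM_START = 1
    LINK_DOWN = 2
    LINK_UP = 3
    AUTHENTICATION_FAILURE = 4
    EGP_NEIGHBOR_LOSS = 5
    ENTERPRISE_SPECIFIC = 6


DESCRIPTIONS = {
    GenericTrap.COLD_START: "Restart after a crash.",
    GenericTrap.WARM_START: "Planned restart.",
    GenericTrap.LINK_DOWN: "Communication link is down.",
    GenericTrap.LINK_UP: "Communication link is up.",
    GenericTrap.AUTHENTICATION_FAILURE: "Invalid SNMP community string was used.",
    GenericTrap.EGP_NEIGHBOR_LOSS: "Exterior Gateway Protocol (EGP) neighbor is down.",
    GenericTrap.ENTERPRISE_SPECIFIC: "Vendor-specific event happened.",
}


def describe_generic_trap(value: int) -> str:
    """Return the table description for a generic trap value (ValueError if unknown)."""
    return DESCRIPTIONS[GenericTrap(value)]


def is_vendor_specific(value: int) -> bool:
    """True for generic trap 6, the type that the DS8880 notifications use."""
    return value == GenericTrap.ENTERPRISE_SPECIFIC
```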
14.2 SNMP notifications
The Management Console, which is also known as the Hardware Management Console (HMC), of the DS8880 sends an SNMPv1 trap in the following cases:
A serviceable event is reported to IBM by using call home.
An event occurs in the Copy Services configuration or processing.
The Global Mirror operation pauses on the consistency group boundary.
The Global Mirror operation fails to unsuspend one or more Global Copy members.
A space-efficient repository or an over-provisioned volume reaches a user-defined warning watermark.
A rank reaches I/O saturation.
Encryption Key Management issues an alert that communication between the control unit and one or more Encryption Key Manager (EKM) servers is lost or reconnected.
A serviceable event is posted as a generic trap 6, specific trap 3 message. The specific trap 3 is the only event that is sent for serviceable events and hardware service-related actions (data offload and remote secure connection). For reporting Copy Services events, generic trap 6 and specific traps 100, 101, 102, 200, 202, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 225, or 226 are sent.
 
Note: Consistency group traps (200 and 201) must be prioritized above all other traps. They must be surfaced in less than 2 seconds from the real-time incident.
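A monitoring script can route DS8880 notifications by their specific trap number. The following Python sketch is a hypothetical lookup table that is built only from the trap numbers and titles in this chapter. It flags traps 200 and 201 as urgent because of the preceding note. The names DS8880_SPECIFIC_TRAPS, URGENT_TRAPS, and classify_trap are assumptions for illustration and are not part of any IBM interface.

```python
ENTERPRISE_SPECIFIC = 6  # generic trap type that the DS8880 uses

# Specific trap numbers and short summaries, taken from this chapter.
DS8880_SPECIFIC_TRAPS = {
    3: "Serviceable event",
    100: "Remote Mirror and Copy links degraded",
    101: "Remote Mirror and Copy links are inoperable",
    102: "Remote Mirror and Copy links are operational",
    200: "LSS pair consistency group Remote Mirror and Copy pair error",
    202: "Primary Remote Mirror and Copy devices on LSS suspended due to error",
    210: "Global Mirror initial consistency group successfully formed",
    211: "Global Mirror session is in a fatal state",
    212: "Global Mirror consistency group failure: retry is attempted",
    213: "Global Mirror consistency group successful recovery",
    214: "Global Mirror master terminated",
    215: "Global Mirror FlashCopy at remote site unsuccessful",
    216: "Global Mirror subordinate termination unsuccessful",
    217: "Global Mirror paused",
    218: "Global Mirror number of consistency group failures exceeds threshold",
    219: "Global Mirror first successful consistency group after prior failures",
    220: "Global Mirror number of FlashCopy commit failures exceeds threshold",
    221: "Space-efficient repository or over-provisioned volume reached a warning watermark",
    223: "Extent pool capacity reached a warning threshold",
    224: "Rank saturation status changed",
    225: "Global Mirror paused on the consistency group boundary",
    226: "Global Mirror unsuspend members failed",
}

# Consistency group traps that the note says must be prioritized above all others.
URGENT_TRAPS = frozenset({200, 201})


def classify_trap(generic: int, specific: int) -> tuple[str, bool]:
    """Return (summary, urgent) for a trap identified by generic and specific numbers."""
    if generic != ENTERPRISE_SPECIFIC:
        return "Not an enterpriseSpecific trap", False
    summary = DS8880_SPECIFIC_TRAPS.get(specific, "Unknown specific trap")
    return summary, specific in URGENT_TRAPS
```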
14.2.1 Serviceable event that uses specific trap 3
Example 14-1 shows the contents of generic trap 6, specific trap 3. The trap holds the following information:
Serial number of the DS8880
Event number that is associated with the manageable events from the HMC
Reporting storage facility image (SFI)
System reference code (SRC)
Location code of the part that is logging the event
The SNMP trap is sent in parallel with a call home for service to IBM and email notification (if configured).
Example 14-1 SNMP specific trap 3 of a DS8880
Manufacturer=IBM
ReportingMTMS=2107-981*1300960
ProbNm=3084
LparName=SF1300960ESS01
FailingEnclosureMTMS=2107-981*1300960
SRC=BE80CB13
EventText=Recovery error,the device hardware error.
FruLoc=Part Number 98Y4317 FRU CCIN U401
FruLoc=Serial Number 1731000A39FC
FruLoc=Location Code U2107.D03.G367012-P1-D2
For open events in the event log, a trap is sent every eight hours until the event is closed.
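The text of a specific trap 3 message consists of key=value lines, so a script can turn it into a dictionary for ticketing or logging. The following Python sketch assumes that the trap text reaches the script as newline-separated lines in the layout of Example 14-1 and that the MTMS fields use an asterisk between the type-model and the serial number, as in that example. The function names are hypothetical.

```python
def parse_serviceable_event(text: str) -> dict[str, list[str]]:
    """Parse key=value lines. Values are kept in lists because FruLoc can repeat."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, _, value = line.partition("=")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def split_mtms(mtms: str) -> tuple[str, str]:
    """Split a value such as '2107-981*1300960' into (type-model, serial number)."""
    type_model, _, serial = mtms.partition("*")
    return type_model, serial


sample = """Manufacturer=IBM
ReportingMTMS=2107-981*1300960
ProbNm=3084
SRC=BE80CB13
FruLoc=Part Number 98Y4317 FRU CCIN U401
FruLoc=Serial Number 1731000A39FC
FruLoc=Location Code U2107.D03.G367012-P1-D2"""

event = parse_serviceable_event(sample)
```

Because open events are re-sent every eight hours, the ProbNm value might serve as a key to suppress duplicate tickets. That use is an assumption and is not documented behavior.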
14.2.2 Copy Services event traps
For state changes in a remote Copy Services environment, 13 traps are implemented. The 1xx traps are sent for a state change of a physical link connection. The 2xx traps are sent for state changes in the logical Copy Services setup. For all of these events, no call home is generated and IBM is not notified.
This chapter describes only the messages and the circumstances when traps are sent by the DS8880. For more information about these functions and terms, see IBM DS8870 Copy Services for IBM z Systems, SG24-6787, and IBM DS8870 Copy Services for Open Systems, SG24-6788.
Physical connection events
Within the trap 1xx range, a state change of the physical links is reported. The trap is sent if the physical remote copy link is interrupted. The Link trap is sent from the primary system. The PLink and SLink columns are only used by the 2105 Enterprise Storage Server disk unit.
If one or several links (but not all links) are interrupted, trap 100 is posted, as shown in Example 14-2. Trap 100 indicates that the redundancy is degraded. The RC column in the trap shows the return code for the interruption of the link.
Example 14-2 Trap 100: Remote Mirror and Copy links degraded
PPRC Links Degraded
UNIT: Mnf Type-Mod SerialNm LS
PRI: IBM 2107-981 75-ZA571 12
SEC: IBM 2107-981 75-CYK71 24
Path: Type PP PLink SP SLink RC
1: FIBRE 0143 XXXXXX 0010 XXXXXX 15
2: FIBRE 0213 XXXXXX 0140 XXXXXX OK
If all of the links are interrupted, a trap 101 (as shown in Example 14-3) is posted. This event indicates that no communication between the primary and the secondary system is possible.
Example 14-3 Trap 101: Remote Mirror and Copy links are inoperable
PPRC Links Down
UNIT: Mnf Type-Mod SerialNm LS
PRI: IBM 2107-981 75-ZA571 10
SEC: IBM 2107-981 75-CYK71 20
Path: Type PP PLink SP SLink RC
1: FIBRE 0143 XXXXXX 0010 XXXXXX 17
2: FIBRE 0213 XXXXXX 0140 XXXXXX 17
After one or more of the interrupted links are available again and the DS8880 can communicate over any of them, trap 102 is sent, as shown in Example 14-4.
Example 14-4 Trap 102: Remote Mirror and Copy links are operational
PPRC Links Up
UNIT: Mnf Type-Mod SerialNm LS
PRI: IBM 2107-981 75-ZA571 21
SEC: IBM 2107-981 75-CYK71 11
Path: Type PP PLink SP SLink RC
1: FIBRE 0010 XXXXXX 0143 XXXXXX OK
2: FIBRE 0140 XXXXXX 0213 XXXXXX OK
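The path rows in traps 100, 101, and 102 share one layout, so a script can derive the overall link state from the RC column. The following Python sketch assumes that each path row has seven whitespace-separated fields as in Examples 14-2 through 14-4 and that an RC value of OK marks a working link. The names LinkPath, parse_paths, and link_state are hypothetical.

```python
from typing import NamedTuple


class LinkPath(NamedTuple):
    index: int
    link_type: str
    primary_port: str    # PP column
    secondary_port: str  # SP column
    rc: str              # return code, or "OK" for a working link


def parse_paths(text: str) -> list[LinkPath]:
    """Extract path rows such as '1: FIBRE 0143 XXXXXX 0010 XXXXXX 15'."""
    paths = []
    for line in text.splitlines():
        parts = line.split()
        # Path rows have seven fields and start with a numeric index followed by a colon.
        if len(parts) == 7 and parts[0].endswith(":") and parts[0][:-1].isdigit():
            paths.append(LinkPath(int(parts[0][:-1]), parts[1], parts[2], parts[4], parts[6]))
    return paths


def link_state(paths: list[LinkPath]) -> str:
    """Summarize the paths as 'operational', 'degraded', or 'inoperable'."""
    if not paths:
        raise ValueError("no path rows found")
    failed = [p for p in paths if p.rc != "OK"]
    if not failed:
        return "operational"
    return "inoperable" if len(failed) == len(paths) else "degraded"


trap_100 = """PPRC Links Degraded
UNIT: Mnf Type-Mod SerialNm LS
PRI: IBM 2107-981 75-ZA571 12
SEC: IBM 2107-981 75-CYK71 24
Path: Type PP PLink SP SLink RC
1: FIBRE 0143 XXXXXX 0010 XXXXXX 15
2: FIBRE 0213 XXXXXX 0140 XXXXXX OK"""

state = link_state(parse_paths(trap_100))
```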
Remote Mirror and Copy events
If you configured consistency groups and a volume within a consistency group is suspended because of a write error to the secondary device, trap 200 is sent, as shown in Example 14-5. One trap is sent for each logical subsystem (LSS) that is configured with the consistency group option. This trap can be handled by automation software, such as Copy Services Manager, to freeze the consistency group. The SR column in the trap shows the suspension reason code, which explains the cause of the error that suspended the Remote Mirror and Copy (Peer-to-Peer Remote Copy (PPRC)) group. The suspension reason codes are listed in Table 14-2.
Example 14-5 Trap 200: LSS pair consistency group Remote Mirror and Copy pair error
LSS-Pair Consistency Group PPRC-Pair Error
UNIT: Mnf Type-Mod SerialNm LS LD SR
PRI: IBM 2107-981 75-ZA571 84 08
SEC: IBM 2107-981 75-CYM31 54 84
Trap 202, as shown in Example 14-6, is sent if a Remote Copy pair goes into a suspend state. The trap contains the serial number (SerialNm) of the primary and secondary machine, the LSS (LS), and the logical device (LD). To avoid SNMP trap flooding, the number of SNMP traps for the LSS is throttled. The complete suspended pair information is represented in the summary.
The last row of the trap represents the suspend state for all pairs in the reporting LSS. The suspended pair information contains a hexadecimal string of 64 characters. By converting this hex string into binary code, each bit represents a single device. If the bit is 1, the device is suspended. Otherwise, the device is still in full duplex mode.
Example 14-6 Trap 202: Primary Remote Mirror and Copy devices on LSS suspended due to error
Primary PPRC Devices on LSS Suspended Due to Error
UNIT: Mnf Type-Mod SerialNm LS LD SR
PRI: IBM 2107-981 75-ZA571 28 00 01
SEC: IBM 2107-981 75-CZM21 a8 00
Start: 2015/11/14 10:30:32 CST
PRI Dev Flags (1 bit/Dev, 1=Suspended):
C000000000000000000000000000000000000000000000000000000000000000
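The device flags in trap 202 can be decoded with a few lines of code. The 64 hexadecimal characters represent 256 bits, one for each device in the LSS. The following Python sketch assumes that the leftmost bit corresponds to device 00, which is consistent with LD 00 in Example 14-6 but is not stated explicitly in this chapter. The function name is hypothetical.

```python
def suspended_devices(hex_flags: str) -> list[str]:
    """Return the two-digit hex IDs of devices whose flag bit is 1 (suspended)."""
    if len(hex_flags) != 64:
        raise ValueError("expected a string of 64 hexadecimal characters")
    total_bits = 256  # 64 hex characters x 4 bits
    value = int(hex_flags, 16)
    devices = []
    for device in range(total_bits):
        # Assumption: the leftmost (most significant) bit is device 00.
        if (value >> (total_bits - 1 - device)) & 1:
            devices.append(f"{device:02X}")
    return devices


flags = "C" + "0" * 63  # the flag string from Example 14-6
print(suspended_devices(flags))  # ['00', '01']
```

Under this assumption, the leading C (binary 1100) in the example means that devices 00 and 01 are suspended and that all other devices in the LSS remain in full duplex mode.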
Trap 210, as shown in Example 14-7, is sent when a consistency group in a Global Mirror environment was successfully formed.
Example 14-7 Trap 210: Global Mirror initial consistency group successfully formed
Asynchronous PPRC Initial Consistency Group Successfully Formed
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Session ID: 4002
As shown in Example 14-8, trap 211 is sent if the Global Mirror setup is in a severe error state in which no attempts are made to form a consistency group.
Example 14-8 Trap 211: Global Mirror Session is in a fatal state
Asynchronous PPRC Session is in a Fatal State
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-CYM21
Session ID: 4002
Trap 212, as shown in Example 14-9, is sent when a consistency group cannot be created in a Global Mirror relationship for one of the following reasons:
Volumes were taken out of a copy session.
The Remote Copy link bandwidth might not be sufficient.
The Fibre Channel (FC) link between the primary and secondary system is not available.
Example 14-9 Trap 212: Global Mirror consistency group failure: Retry is attempted
Asynchronous PPRC Consistency Group Failure - Retry will be attempted
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Session ID: 4002
Trap 213, as shown in Example 14-10, is sent when a consistency group in a Global Mirror environment can be formed after a previous consistency group formation failure.
Example 14-10 Trap 213: Global Mirror consistency group successful recovery
Asynchronous PPRC Consistency Group Successful Recovery
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Session ID: 4002
Trap 214, as shown in Example 14-11, is sent if a Global Mirror session is ended by using the DS CLI rmgmir command or the corresponding graphical user interface (GUI) function.
Example 14-11 Trap 214: Global Mirror master terminated
Asynchronous PPRC Master Terminated
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Session ID: 4002
As shown in Example 14-12, trap 215 is sent if, in the Global Mirror environment, the master detects a failure to complete the FlashCopy commit. The trap is sent after many commit retries fail.
Example 14-12 Trap 215: Global Mirror FlashCopy at remote site unsuccessful
Asynchronous PPRC FlashCopy at Remote Site Unsuccessful
A UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-CZM21
Session ID: 4002
Trap 216, as shown in Example 14-13, is sent if a Global Mirror master cannot end the Global Copy relationship at one of its subordinates. This error might occur if the master is ended by using the rmgmir command but the master cannot end the copy relationship on the subordinate.
You might need to run a rmgmir command against the subordinate to prevent any interference with other Global Mirror sessions.
Example 14-13 Trap 216: Global Mirror subordinate termination unsuccessful
Asynchronous PPRC Slave Termination Unsuccessful
UNIT: Mnf Type-Mod SerialNm
Master: IBM 2107-981 75-ZA571
Slave:  IBM 2107-981 75-CYM31
Session ID: 4002
Trap 217, as shown in Example 14-14, is sent if a Global Mirror environment is suspended by the DS CLI command pausegmir or the corresponding GUI function.
Example 14-14 Trap 217: Global Mirror paused
Asynchronous PPRC Paused
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-CYM31
Session ID: 4002
As shown in Example 14-15, trap 218 is sent if a Global Mirror session exceeded the allowed threshold for failed consistency group formation attempts.
Example 14-15 Trap 218: Global Mirror number of consistency group failures exceeds threshold
Global Mirror number of consistency group failures exceed threshold
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Session ID: 4002
Trap 219, as shown in Example 14-16, is sent if a Global Mirror session successfully formed a consistency group after one or more formation attempts previously failed.
Example 14-16 Trap 219: Global Mirror first successful consistency group after prior failures
Global Mirror first successful consistency group after prior failures
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Session ID: 4002
Trap 220, as shown in Example 14-17, is sent if a Global Mirror session exceeded the allowed threshold of failed FlashCopy commit attempts.
Example 14-17 Trap 220: Global Mirror number of FlashCopy commit failures exceeds threshold
Global Mirror number of FlashCopy commit failures exceed threshold
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Session ID: 4002
Trap 225, as shown in Example 14-18, is sent when a Global Mirror operation paused on the consistency group boundary.
Example 14-18 Trap 225: Global Mirror paused on the consistency group boundary
Global Mirror operation has paused on the consistency group boundary
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-CYM31
Session ID: 4002
Trap 226, as shown in Example 14-19, is sent when a Global Mirror operation failed to unsuspend one or more Global Copy members.
Example 14-19 Trap 226: Global Mirror unsuspend members failed
Global Mirror operation has failed to unsuspend one or more Global Copy members
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-CYM31
Session ID: 4002
Table 14-2 shows the Copy Services suspension reason (SR) codes.
Table 14-2 Copy Services suspension reason codes
Suspension reason code  Description
03  The host system sent a command to the primary volume of a Remote Mirror and Copy volume pair to suspend copy operations. The host system might specify an immediate suspension or a suspension after the copy completed and the volume pair reached a full duplex state.
04  The host system sent a command to suspend the copy operations on the secondary volume. During the suspension, the primary volume of the volume pair can still accept updates, but updates are not copied to the secondary volume. The out-of-sync tracks that are created between the volume pair are recorded in the change recording feature of the primary volume.
05  Copy operations between the Remote Mirror and Copy volume pair were suspended by a primary storage unit secondary device status command. This system resource code can be returned only by the secondary volume.
06  Copy operations between the Remote Mirror and Copy volume pair were suspended because of internal conditions in the storage unit. This system resource code can be returned by the control unit of the primary volume or the secondary volume.
07  Copy operations between the Remote Mirror and Copy volume pair were suspended when the secondary storage unit notified the primary storage unit of a state change transition to the simplex state. The specified volume pair between the storage units is no longer in a copy relationship.
08  Copy operations were suspended because the secondary volume became suspended because of internal conditions or errors. This system resource code can be returned only by the primary storage unit.
09  The Remote Mirror and Copy volume pair was suspended when the primary or secondary storage unit was rebooted or when the power was restored. The paths to the secondary storage unit might not be disabled if the primary storage unit was turned off. If the secondary storage unit was turned off, the paths between the storage units are restored automatically, if possible. After the paths are restored, issue the mkpprc command to resynchronize the specified volume pairs. Depending on the state of the volume pairs, you might need to issue the rmpprc command to delete the volume pairs and reissue a mkpprc command to reestablish the volume pairs.
0A  The Remote Mirror and Copy pair was suspended because the host issued a command to freeze the Remote Mirror and Copy group. This system resource code can be returned only if a primary volume was queried.
14.2.3 I/O Priority Manager SNMP
When the I/O Priority Manager Control switch is set to Monitor or Managed, an SNMP trap alert message also can be enabled. The DS8880 Licensed Internal Code (LIC) monitors for rank saturation. If a rank is overdriven to the point of saturation (high usage), an SNMP trap alert message 224 is posted to the SNMP server.
The following SNMP rules apply:
Up to eight SNMP traps are sent for each SFI server in a 24-hour period (a maximum of 16 in 24 hours for each SFI).
The rank enters the saturation state if it is in saturation for five consecutive 1-minute samples.
The rank exits the saturation state if it is not in saturation for three of five consecutive 1-minute samples.
The SNMP message 224 is reported when a rank enters saturation or every eight hours if the rank is in saturation. The message identifies the rank and SFI. See Example 14-20.
Example 14-20 Trap 224: Rank saturation status changed
Rank Saturated
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Rank ID: R21
Saturation Status: 0
 
Important: To receive traps from I/O Priority Manager (IOPM), set the IOPM to manage SNMP by issuing the following command:
chsi -iopmmode managesnmp <Storage_Image>
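The enter and exit rules for rank saturation that are described in this section can be modeled as a small state machine over a window of five 1-minute samples. The following Python sketch illustrates those rules only and is not the Licensed Internal Code implementation. It assumes that "three of five" means at least three unsaturated samples in the five most recent samples, and the class name is hypothetical.

```python
from collections import deque


class RankSaturationTracker:
    """Track the saturation state of one rank from 1-minute samples."""

    WINDOW = 5          # number of most recent samples that are considered
    EXIT_THRESHOLD = 3  # unsaturated samples in the window that end saturation

    def __init__(self) -> None:
        self.window: deque[bool] = deque(maxlen=self.WINDOW)
        self.saturated = False

    def add_sample(self, in_saturation: bool) -> bool:
        """Record one sample and return the resulting saturation state."""
        self.window.append(in_saturation)
        if len(self.window) == self.WINDOW:
            if not self.saturated and all(self.window):
                # Five consecutive saturated samples: enter the saturation state.
                self.saturated = True
            elif self.saturated and list(self.window).count(False) >= self.EXIT_THRESHOLD:
                # At least three of the last five samples are unsaturated: exit.
                self.saturated = False
        return self.saturated


tracker = RankSaturationTracker()
states = [tracker.add_sample(s) for s in [True] * 5 + [False] * 3]
print(states)  # [False, False, False, False, True, True, True, False]
```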
14.2.4 Thin provisioning SNMP
The DS8880 can trigger two specific SNMP trap alerts that relate to the thin provisioning feature. These traps are sent when certain extent pool capacity thresholds are reached, which causes a change in the extent status attribute. A trap is sent under the following conditions:
The extent status is not zero (available space is already below threshold) when the first extent space-efficient (ESE) volume is configured.
ESE volumes are configured in the extent pool.
Example 14-21 shows an example of generated event trap 221.
Example 14-21 Trap 221: Space-efficient repository or over-provisioned volume reached a warning
Space Efficient Repository or Over-provisioned Volume has reached a warning watermark
Unit: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Volume Type: repository
Reason Code: 1
Extent Pool ID: f2
Percentage Full: 100%
Example 14-22 shows an example of generated event trap 223.
Example 14-22 Trap 223: Extent pool capacity reached a warning threshold
Extent Pool Capacity Threshold Reached
UNIT: Mnf Type-Mod SerialNm
IBM 2107-981 75-ZA571
Extent Pool ID: P1
Limit: 95%
Threshold: 95%
Status: 0
14.3 SNMP configuration
The SNMP for the DS8880 is designed to send traps as notifications. The DS8880 does not include an installed SNMP agent that can respond to SNMP polling. Also, the SNMP community name for Copy Services-related traps is fixed and set to public.
14.3.1 SNMP preparation
During the planning for the installation (see 8.3.4, “Monitoring DS8880 with the Management Console”), the IP addresses of the management system are provided to IBM service personnel. The IBM service support representative (SSR) must apply this information during the installation. Also, the IBM SSR can configure the HMC to send a notification for every serviceable event or for only those events that call home to IBM.
The network management server that is configured on the HMC receives all of the generic trap 6, specific trap 3 messages, which are sent in parallel with any events that call home to IBM.
The SNMP alerts can contain a combination of a generic and a specific alert trap. The Traps list outlines the explanations for each of the possible combinations of generic and specific alert traps. The format of the SNMP traps, the list, and the errors that are reported by SNMP are available in the “Generic and specific alert traps” section of the Troubleshooting section of the IBM Knowledge Center for the DS8880 at the following site:
SNMP alert traps provide information about problems that the storage unit detects. You or the IBM SSR must perform corrective action for the related problems.
14.3.2 SNMP configuration with the HMC
Clients can configure the SNMP alerting by logging in to the DS8880 Service web user interface (WUI). The Service WUI can be started from the DS8000 Storage Management Console (HMC) (https://<HMC_ip_address>/service) remotely through a web browser. Access the Service Management Console and log in with the following client credentials:
User ID: customer
Password: cust0mer (default password)
Complete the following steps to configure SNMP at the HMC:
1. Log in to the Service Management section on the HMC, as shown in Figure 14-1.
Figure 14-1 HMC Service Management
2. Select Manage Serviceable Event Notification, as shown in Figure 14-2, and enter the TCP/IP information of the SNMP server in the SNMP Trap Configuration folder.
Figure 14-2 HMC Manage Serviceable Event Notification
3. To verify the successful setup of your environment, create a Test Event on your DS8880 Management Console. Select Storage Facility Management > Services Utilities > Test Problem Notification (PMH, SNMP, Email), as shown in Figure 14-3.
Figure 14-3 HMC test SNMP trap
4. The test generates the Service Reference Code BEB20010 and the SNMP server receives the SNMP trap notification, as shown in Figure 14-4.
Figure 14-4 HMC SNMP trap test
14.3.3 SNMP configuration with the DS CLI
Perform the configuration process for receiving the operation-related traps, such as for Copy Services, thin provisioning, Encryption, or I/O Priority Manager, by using the DS CLI. Example 14-23 shows how SNMP is enabled by using the chsp command.
Example 14-23 Configuring the SNMP by using dscli
dscli> chsp -snmp on -snmpaddr 10.10.10.1,10.10.10.2
CMUC00040I chsp: Storage complex IbmStoragePlex successfully modified.
 
dscli> showsp
Name IbmStoragePlex
desc -
acct -
SNMP Enabled
SNMPadd 10.10.10.1,10.10.10.2
emailnotify Disabled
emailaddr -
emailrelay Disabled
emailrelayaddr -
emailrelayhost -
numkssupported 4
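When several storage complexes must be configured, the chsp command line from Example 14-23 can be generated by a script. The following Python sketch validates the destination addresses and assembles the command string. It uses only the -snmp and -snmpaddr parameters that are shown in the example. The function name is hypothetical, and how the string is passed to the DS CLI is left open.

```python
import ipaddress


def build_chsp_snmp_command(addresses: list[str]) -> str:
    """Build the chsp command that enables SNMP for the given trap destinations."""
    if not addresses:
        raise ValueError("at least one SNMP destination address is required")
    # ipaddress.ip_address raises ValueError for anything that is not a valid IP address.
    validated = [str(ipaddress.ip_address(addr)) for addr in addresses]
    return "chsp -snmp on -snmpaddr " + ",".join(validated)


print(build_chsp_snmp_command(["10.10.10.1", "10.10.10.2"]))
# chsp -snmp on -snmpaddr 10.10.10.1,10.10.10.2
```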
SNMP preparation for the management software
To enable the trap-receiving software to display the correctly decoded message in a human-readable format, load the DS8880 specific MIB file.
The MIB file that is delivered with the latest DS8880 DS CLI CD is compatible with all previous levels of DS8880 LIC. Therefore, ensure that you load the latest MIB file that is available.