SNMP notifications
This appendix describes the SNMP traps that are sent in remote copy and mirror environments. It repeats some of the SNMP trap information that is available in IBM DS8900F Architecture and Implementation, SG24-8456 and IBM DS8880 Architecture and Implementation (Release 8.51), SG24-8323. It includes the following topics:
IBM Copy Services Manager SNMP traps
SNMP overview
The DS8000 sends SNMP traps when a state change occurs in a remote Copy Services environment. Eighteen traps are implemented. Traps in the 1xx range are sent for a state change of a physical link connection. Traps in the 2xx range are sent for state changes in the logical Copy Services setup.
The DS HMC can be set up to send SNMP traps to up to two defined IP addresses. Copy Services Manager (see Chapter 29, “IBM Copy Services Manager” on page 353) listens for the SNMP traps of the DS8000. In addition, network management programs can be used to catch and process the SNMP traps.
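As a minimal sketch of how a receiver might route DS8000 traps by number range, the following Python function assumes that the trap number has already been extracted from the SNMP message as an integer. The function name `trap_category` is hypothetical. Traps 221 to 224 are handled separately because, as the note after the Global Mirror traps explains, they are used for purposes other than Copy Services.

```python
def trap_category(trap: int) -> str:
    """Classify a DS8000 trap number by range (illustrative sketch)."""
    if 221 <= trap <= 224:
        # Used for ESE/extent pool warnings, Encryption Key Management
        # alerts, and rank I/O saturation, not Copy Services events.
        return "other"
    if 100 <= trap <= 199:
        # 1xx: state change of a physical link connection.
        return "physical link"
    if 200 <= trap <= 299:
        # 2xx: state change in the logical Copy Services setup.
        return "logical copy services"
    return "unknown"
```

For example, `trap_category(101)` returns "physical link", and `trap_category(202)` returns "logical copy services".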
Physical connection events
Traps in the 1xx range report a state change of the physical links. The trap is sent if the physical remote copy link is interrupted, and it is sent from the primary system. The PLink and SLink columns are used only by one of our former products, the 2105 Enterprise Storage Server® (ESS). If one or several links (but not all links) are interrupted, trap 100 is posted, as shown in Example B-1, and indicates that the redundancy is degraded. The RC column in the trap represents the return code for the interruption of the link; return codes are listed in Table B-1 on page 586.
Example B-1 Trap 100 - the Remote Mirror and Copy links are degraded
PPRC Links Degraded
UNIT: Mnf Type-Mod SerialNm LS
PRI: IBM 2107-922 75-20781 12
SEC: IBM 2107-9A2 75-ABTV1 24
Path: Type PP PLink SP SLink RC
1: FIBRE 0143 XXXXXX 0010 XXXXXX 15
2: FIBRE 0213 XXXXXX 0140 XXXXXX OK
If all links are interrupted, trap 101 is posted, as shown in Example B-2. This event indicates that communication between the primary and the secondary system is no longer possible.
Example B-2 Trap 101 - the Remote Mirror and Copy links are inoperable
PPRC Links Down
UNIT: Mnf Type-Mod SerialNm LS
PRI: IBM 2107-922 75-20781 10
SEC: IBM 2107-9A2 75-ABTV1 20
Path: Type PP PLink SP SLink RC
1: FIBRE 0143 XXXXXX 0010 XXXXXX 17
2: FIBRE 0213 XXXXXX 0140 XXXXXX 17
When one or more of the interrupted links become available again, so that the DS8000 can communicate by using any of the links, trap 102 is sent, as shown in Example B-3.
Example B-3 Trap 102 - Remote Mirror and Copy links are operational
PPRC Links Up
UNIT: Mnf Type-Mod SerialNm LS
PRI: IBM 2107-9A2 75-ABTV1 21
SEC: IBM 2107-922 75-20781 11
Path: Type PP PLink SP SLink RC
1: FIBRE 0010 XXXXXX 0143 XXXXXX OK
2: FIBRE 0140 XXXXXX 0213 XXXXXX OK
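The distinction between traps 100, 101, and 102 can be reproduced on the receiving side from the RC column. The following Python sketch assumes that the trap text is available as plain text in the layout shown in Examples B-1 to B-3; the regular expression, the function names `parse_paths` and `link_state`, and the field names are hypothetical. A path counts as failed when its RC is anything other than OK.

```python
import re
from typing import Dict, List

# Path rows look like "1: FIBRE 0143 XXXXXX 0010 XXXXXX 15":
# path number, type, primary port (PP), PLink, secondary port (SP), SLink, RC.
PATH_ROW = re.compile(
    r"^\s*(\d+):\s+(\w+)\s+([0-9A-F]{4})\s+\S+\s+([0-9A-F]{4})\s+\S+\s+(\w+)\s*$"
)


def parse_paths(trap_text: str) -> List[Dict[str, str]]:
    """Extract the path rows from a trap 100/101/102 text body."""
    paths = []
    for line in trap_text.splitlines():
        match = PATH_ROW.match(line)
        if match:
            number, ptype, pp, sp, rc = match.groups()
            paths.append({"path": number, "type": ptype, "pp": pp, "sp": sp, "rc": rc})
    return paths


def link_state(paths: List[Dict[str, str]]) -> str:
    """Summarize the RC column the way traps 100, 101, and 102 distinguish it."""
    if not paths:
        return "unknown"
    failed = [p for p in paths if p["rc"] != "OK"]
    if not failed:
        return "up"        # every path reports OK, as in Example B-3
    if len(failed) < len(paths):
        return "degraded"  # some but not all paths failed, as in trap 100
    return "down"          # all paths failed, as in trap 101
```

Applied to the text of Example B-1, `parse_paths` returns two rows, and `link_state` reports "degraded" because only path 1 has a non-OK return code.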
Table B-1 on page 586 shows the Remote Mirror and Copy return codes.
Table B-1 Remote Mirror and Copy return codes
Return code
Description
02
The initialization failed. The ESCON link reject threshold is exceeded when you attempt to send ELP or RID frames.
03
There is a timeout. No reason is available.
04
There are no resources available in the primary storage unit for establishing logical paths because the maximum number of logical paths is already established.
05
There are no resources available in the auxiliary storage unit for establishing logical paths because the maximum number of logical paths is already established.
06
There is an auxiliary storage unit sequence number, or logical subsystem number, mismatch.
07
There is a secondary LSS subsystem identifier (SSID) mismatch, or failure of the I/O that collects the secondary information for validation.
08
The ESCON link is offline. This situation is caused by the lack of light detection that is coming from a host, peer, or switch.
09
The establish failed. The path remains in the attempt-to-establish state and is retried until the establish path operation succeeds or the remove Remote Mirror and Copy paths command is run for the path.
0A
The primary storage unit port or link cannot be converted to channel mode if a logical path is already established on the port or link. The establish paths operation is not retried within the storage unit.
10
Configuration error. The source of the error is one of the following conditions:
The specification of the SA ID does not match the installed ESCON adapters in the primary controller.
For ESCON paths, the auxiliary storage unit destination address is zero and an ESCON Director (switch) was found in the path.
For ESCON paths, the auxiliary storage unit destination address is not zero and an ESCON director does not exist in the path. The path is a direct connection.
14
The Fibre Channel path link is down.
15
The maximum number of Fibre Channel path retry operations is exceeded.
16
The Fibre Channel path secondary adapter is not Remote Mirror and Copy capable. This situation might be caused by one of the following conditions:
The secondary adapter is not configured correctly or does not have the current firmware installed.
The secondary adapter is already a target of 32 different logical subsystems (LSSs).
17
The secondary adapter Fibre Channel path is not available.
18
The maximum number of Fibre Channel path primary login attempts is exceeded.
19
The maximum number of Fibre Channel path secondary login attempts is exceeded.
1A
The primary Fibre Channel adapter is not configured correctly or does not have the correct firmware level installed.
1B
The Fibre Channel path is established but is degraded because of a high failure rate.
1C
The Fibre Channel path was removed because of a high failure rate.
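A receiver that parses the RC column can translate it into the descriptions in Table B-1. The following Python sketch abridges those descriptions; the names `RETURN_CODES` and `describe_rc` are hypothetical, and RC values are assumed to be printed as two hexadecimal characters (or OK), as in the examples.

```python
# Abridged descriptions from Table B-1, keyed by the RC value in the trap.
RETURN_CODES = {
    "02": "Initialization failed: ESCON link reject threshold exceeded",
    "03": "Timeout, no reason available",
    "04": "No logical path resources left in the primary storage unit",
    "05": "No logical path resources left in the auxiliary storage unit",
    "06": "Auxiliary storage unit sequence number or LSS number mismatch",
    "07": "Secondary LSS SSID mismatch or secondary validation I/O failed",
    "08": "ESCON link offline (no light detected)",
    "09": "Establish failed; retried until it succeeds or the path is removed",
    "0A": "Port or link cannot be converted to channel mode",
    "10": "Configuration error",
    "14": "Fibre Channel path link down",
    "15": "Fibre Channel path retry limit exceeded",
    "16": "Secondary adapter not Remote Mirror and Copy capable",
    "17": "Secondary adapter Fibre Channel path not available",
    "18": "Primary login attempt limit exceeded",
    "19": "Secondary login attempt limit exceeded",
    "1A": "Primary adapter misconfigured or at the wrong firmware level",
    "1B": "Path established but degraded by a high failure rate",
    "1C": "Path removed because of a high failure rate",
}


def describe_rc(rc: str) -> str:
    """Translate an RC value from a path row into a short description."""
    code = rc.strip().upper()  # tolerate lowercase hex such as "1b"
    if code == "OK":
        return "Path operational"
    return RETURN_CODES.get(code, f"Unlisted return code {code}")
```

For path 1 in Example B-1, `describe_rc("15")` returns "Fibre Channel path retry limit exceeded"; for both paths in Example B-2, RC 17 points to the secondary adapter.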
Remote Mirror and Copy events
If you configured consistency groups and a volume within a consistency group is suspended because of a write error to the secondary volume, trap 200 is sent, as shown in Example B-4. One trap is sent per LSS that is configured with the consistency group option. Automation software such as Copy Services Manager can handle this trap to freeze the consistency group. The SR column in the trap represents the suspension reason code, which explains the cause of the error that suspended the Remote Mirror and Copy group. Suspension reason codes are listed in Table B-2 on page 591.
Example B-4 Trap 200: LSS-Pair consistency group Remote Mirror and Copy pair error
LSS-Pair Consistency Group PPRC-Pair Error
UNIT: Mnf Type-Mod SerialNm LS LD SR
PRI: IBM 2107-922 75-03461 56 84 08
SEC: IBM 2107-9A2 75-ABTV1 54 84
Trap 202, as shown in Example B-5, is sent if a remote copy pair goes into a suspended state. The trap contains the serial number (SerialNm) of the primary and secondary systems, the logical subsystem or LSS (LS), and the logical device (LD). To avoid SNMP trap flooding, the number of SNMP traps for the LSS is throttled, and the complete suspended pair information is represented in a summary. The last row of the trap represents the suspended state of all pairs in the reporting LSS as a hexadecimal string of 64 characters. When this hex string is converted to binary, each of its 256 bits represents a single device. If the bit is 1, the device is suspended; otherwise, the device is still in the Full Duplex mode.
 
Triggering this alert: This alert can also be triggered depending on your actions. For example, the alert is triggered if you manually suspend the replication.
Example B-5 Trap 202: Primary Remote Mirror and Copy devices on the LSS suspended due to error
Primary PPRC Devices on LSS Suspended Due to Error
UNIT: Mnf Type-Mod SerialNm LS LD SR
PRI: IBM 2107-922 75-20781 11 00 03
SEC: IBM 2107-9A2 75-ABTV1 21 00
Start: 20xx/11/14 09:48:05 CST
PRI Dev Flags (1 bit/Dev, 1=Suspended):
C000000000000000000000000000000000000000000000000000000000000000
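The PRI Dev Flags string can be decoded into a list of suspended device numbers. The following Python sketch assumes that the leftmost bit of the string corresponds to device 00 and the rightmost bit to device FF. This bit ordering is an assumption, although it is consistent with Example B-5, where the string starts with C (binary 1100) and device 00 is reported. The function name `suspended_devices` is hypothetical.

```python
from typing import List


def suspended_devices(flags_hex: str) -> List[int]:
    """Return the device numbers whose bit is 1 in a PRI Dev Flags string.

    Assumption: most significant (leftmost) bit = device 0x00.
    """
    flags_hex = flags_hex.strip()
    if len(flags_hex) != 64:
        raise ValueError("expected 64 hex characters (256 devices)")
    # int(..., 16) rejects non-hex characters; zfill restores leading zeros.
    bits = bin(int(flags_hex, 16))[2:].zfill(256)
    return [device for device, bit in enumerate(bits) if bit == "1"]
```

Under this assumption, the flags in Example B-5 decode to devices 00 and 01; `[f"{d:02X}" for d in suspended_devices(flags)]` formats them in the same two-digit hex notation as the LD column.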
Global Mirror related SNMP traps
Trap 210, as shown in Example B-6, is sent when the initial consistency group in a Global Mirror environment is successfully formed.
Example B-6 Trap 210 - Global Mirror initial consistency group successfully formed
Asynchronous PPRC Initial Consistency Group Successfully Formed
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-922 75-20781
Session ID: 4002
Trap 211, as shown in Example B-7, is sent if the Global Mirror setup goes into a severe error state where no attempts are made to form a consistency group.
Example B-7 Trap 211: The Global Mirror session is in a fatal state
Asynchronous PPRC Session is in a Fatal State
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-922 75-20781
Session ID: 4002
Trap 212, as shown in Example B-8, is sent when a consistency group cannot be created in a Global Mirror relationship. Possible reasons include the following:
Volumes are taken out of a copy session.
The remote copy link bandwidth is not sufficient.
The FC link between the primary and secondary system is not available.
Example B-8 Trap 212: Global Mirror consistency group failure - a retry is attempted
Asynchronous PPRC Consistency Group Failure - Retry will be attempted
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-922 75-20781
Session ID: 4002
Trap 213, as shown in Example B-9, is sent when a consistency group in a Global Mirror environment can be formed after a previous consistency group formation failure.
Example B-9 Trap 213: Global Mirror consistency group successful recovery
Asynchronous PPRC Consistency Group Successful Recovery
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-9A2 75-ABTV1
Session ID: 4002
Trap 214, as shown in Example B-10, is sent if a Global Mirror session is terminated by running the DS CLI rmgmir command.
Example B-10 Trap 214: Global Mirror Master terminated
Asynchronous PPRC Master Terminated
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-922 75-20781
Session ID: 4002
Trap 215, as shown in Example B-11, is sent if, in the Global Mirror environment, the master detects a failure to complete the FlashCopy commit. The trap is sent after a number of commit retries fail.
Example B-11 Trap 215: Global Mirror FlashCopy at the remote site is unsuccessful
Asynchronous PPRC FlashCopy at Remote Site Unsuccessful
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-9A2 75-ABTV1
Session ID: 4002
Trap 216, as shown in Example B-12, is sent if a Global Mirror master cannot terminate the Global Copy relationship at one of its subordinates. This situation might occur if the master is terminated by running rmgmir but the master cannot terminate the copy relationship on the subordinate. You might need to run rmgmir against the subordinate to prevent any interference with other Global Mirror sessions.
Example B-12 Trap 216: Global Mirror subordinate termination unsuccessful
Asynchronous PPRC Slave Termination Unsuccessful
UNIT: Mnf Type-Mod SerialNm
Master: IBM 2107-922 75-20781
Slave:  IBM 2107-921 75-03641
Session ID: 4002
Trap 217, as shown in Example B-13, is sent if a Global Mirror environment is paused by running the DS CLI pausegmir command.
Example B-13 Trap 217: Global Mirror paused
Asynchronous PPRC Paused
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-9A2 75-ABTV1
Session ID: 4002
Trap 218, as shown in Example B-14, is sent if a Global Mirror session exceeds the allowed threshold for failed consistency group formation attempts.
Example B-14 Trap 218: Global Mirror number of consistency group failures exceed threshold
Global Mirror number of consistency group failures exceed threshold
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-9A2 75-ABTV1
Session ID: 4002
Trap 219, as shown in Example B-15, is sent if a Global Mirror session successfully forms a consistency group after one or more formation attempts previously failed.
Example B-15 Trap 219: Global Mirror first successful consistency group after prior failures
Global Mirror first successful consistency group after prior failures
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-9A2 75-ABTV1
Session ID: 4002
Trap 220, as shown in Example B-16, is sent if a Global Mirror session exceeds the threshold of failed FlashCopy commit attempts.
Example B-16 Trap 220: Global Mirror number of FlashCopy commit failures exceed threshold
Global Mirror number of FlashCopy commit failures exceed threshold
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-9A2 75-ABTV1
Session ID: 4002
Trap 225, as shown in Example B-17 on page 590, is sent if a Global Mirror operation pauses on a consistency group boundary.
Example B-17 Trap 225: Global Mirror paused on consistency group boundary
Global Mirror operation paused on the consistency group boundary.
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-981 75-FAW81
Session ID: 4002
Trap 226, as shown in Example B-18, is sent if a Global Mirror operation fails to unsuspend one or more Global Copy members.
Example B-18 Trap 226: Global Mirror unsuspend members failed
Global Mirror unsuspend members failed.
UNIT: Mnf Type-Mod SerialNm
      IBM 2107-981 75-FAW81
Session ID: 4002
 
 
Note: The SNMP traps 221 to 224 are used for other purposes:
SNMP traps 221 and 223 are sent for Extent Space Efficient (ESE) volume or extent pool warnings.
SNMP trap 222 is sent if Encryption Key Management has issued an alert.
SNMP trap 224 is used when a rank reaches I/O saturation.
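The following Python sketch collects the 18 Copy Services traps that are described in this appendix into a lookup table, using shortened forms of the titles in Examples B-1 to B-18. The names `DS8000_COPY_TRAPS`, `NON_COPY_TRAPS`, and `describe_trap` are hypothetical, and the table is a convenience for a receiver, not an IBM-supplied MIB.

```python
# Shortened trap titles from Examples B-1 to B-18.
DS8000_COPY_TRAPS = {
    100: "PPRC Links Degraded",
    101: "PPRC Links Down",
    102: "PPRC Links Up",
    200: "LSS-Pair Consistency Group PPRC-Pair Error",
    202: "Primary PPRC Devices on LSS Suspended Due to Error",
    210: "Global Mirror initial consistency group successfully formed",
    211: "Global Mirror session is in a fatal state",
    212: "Global Mirror consistency group failure, retry will be attempted",
    213: "Global Mirror consistency group successful recovery",
    214: "Global Mirror master terminated",
    215: "Global Mirror FlashCopy at remote site unsuccessful",
    216: "Global Mirror subordinate termination unsuccessful",
    217: "Global Mirror paused",
    218: "Global Mirror consistency group failures exceed threshold",
    219: "Global Mirror first successful consistency group after prior failures",
    220: "Global Mirror FlashCopy commit failures exceed threshold",
    225: "Global Mirror paused on consistency group boundary",
    226: "Global Mirror unsuspend members failed",
}

# Traps 221 to 224 are used for other purposes (see the note above).
NON_COPY_TRAPS = range(221, 225)


def describe_trap(trap: int) -> str:
    """Return a short title for a DS8000 trap number."""
    if trap in NON_COPY_TRAPS:
        return "Not a Copy Services trap (ESE/extent pool, encryption, or rank saturation)"
    return DS8000_COPY_TRAPS.get(trap, f"Trap {trap} is not described in this appendix")
```

The table has exactly 18 entries, which matches the number of implemented Copy Services traps that is stated in the SNMP overview.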
 
Table B-2 shows the Copy Services suspension reason codes.
Table B-2 Copy Services suspension reason codes
Suspension reason code (SRC)
Description
03
The host system sent a command to the primary volume of a Remote Mirror and Copy volume pair to suspend copy operations. The host system might have specified either an immediate suspension or a suspension after the copy completed and the volume pair reached a Full Duplex state.
04
The host system sent a command to suspend the copy operations on the secondary volume. During the suspension, the primary volume of the volume pair can still accept updates but updates are not copied to the secondary volume. The out-of-sync tracks that are created between the volume pair are recorded in the change recording feature of the primary volume.
05
Copy operations between the Remote Mirror and Copy volume pair were suspended by a primary storage unit secondary device status command. This suspension reason code can be returned only by the secondary volume.
06
Copy operations between the Remote Mirror and Copy volume pair were suspended because of internal conditions in the storage unit. This suspension reason code can be returned by the control unit of either the primary volume or the secondary volume.
07
Copy operations between the Remote Mirror and Copy volume pair were suspended when the auxiliary storage unit notified the primary storage unit of a state change transition to the simplex state. The specified volume pair between the storage units is no longer in a copy relationship.
08
Copy operations were suspended because the secondary volume was suspended as a result of internal conditions or errors. This suspension reason code can be returned only by the primary storage unit.
09
The Remote Mirror and Copy volume pair was suspended when the primary or auxiliary storage unit was rebooted or when the power was restored.
The paths to the auxiliary storage unit might not be disabled if the primary storage unit was turned off. If the auxiliary storage unit was turned off, the paths between the storage units are restored automatically, if possible. After the paths are restored, run mkpprc to resynchronize the specified volume pairs. Depending on the state of the volume pairs, you might have to run rmpprc to delete the volume pairs and run mkpprc to reestablish the volume pairs.
0A
The Remote Mirror and Copy pair was suspended because the host issued a command to freeze the Remote Mirror and Copy group. This suspension reason code can be returned only if a primary volume was queried.
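For traps 200 and 202, the SR column of the PRI row can be translated into the descriptions in Table B-2. The following Python sketch assumes the PRI row layout that is shown in Examples B-4 and B-5 (manufacturer, type-model, serial number, LS, LD, SR). The regular expression and the names `SUSPENSION_REASONS` and `suspension_reason` are hypothetical, and the descriptions are abridged.

```python
import re
from typing import Optional, Tuple

# Abridged descriptions from Table B-2, keyed by the SR value in the trap.
SUSPENSION_REASONS = {
    "03": "Host command suspended the primary volume",
    "04": "Host command suspended the secondary volume",
    "05": "Suspended by a primary storage unit secondary device status command",
    "06": "Suspended because of internal conditions in the storage unit",
    "07": "Auxiliary storage unit transitioned the pair to simplex",
    "08": "Secondary volume suspended because of internal conditions or errors",
    "09": "Primary or auxiliary storage unit rebooted or power restored",
    "0A": "Host issued a freeze of the Remote Mirror and Copy group",
}

# "PRI: IBM 2107-922 75-03461 56 84 08" -> LS=56, LD=84, SR=08
PRI_ROW = re.compile(
    r"^PRI:\s+\S+\s+\S+\s+\S+\s+([0-9A-F]{2})\s+([0-9A-F]{2})\s+([0-9A-F]{2})\s*$"
)


def suspension_reason(trap_text: str) -> Optional[Tuple[str, str, str]]:
    """Return (LSS, device, reason) from a trap 200/202 text, or None."""
    for line in trap_text.splitlines():
        match = PRI_ROW.match(line.strip())
        if match:
            lss, device, sr = match.groups()
            return lss, device, SUSPENSION_REASONS.get(sr, f"Unlisted reason code {sr}")
    return None  # no PRI row with an SR column, e.g. a link trap
```

Applied to Example B-4, the sketch reports LSS 56, device 84, and reason code 08 (secondary volume suspended because of internal conditions or errors). For the link traps, whose PRI row has no LD or SR column, it returns None.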
Copy Services Manager related SNMP traps
As described in Chapter 29, “IBM Copy Services Manager” on page 353, Copy Services Manager is software that manages and monitors Copy Services. It can handle FlashCopy, Safeguarded Copy, Metro Mirror, Global Copy, Global Mirror, Metro/Global Mirror, and Multi-target PPRC. It is also important to monitor Copy Services Manager itself, because it produces messages about all the sessions that are created.
Other network management software can also receive these traps and act on them. Many companies use such software to monitor their environment and take support actions that are based on the reported problems.
The traps can be classified as the following general events:
Session state change
Configuration change
Suspending-event notification
Communication failure
Management Server state change
Scheduled task notification
Table B-3 describes the trap alerts that Copy Services Manager can produce, along with a short description of the change that is made to a session.
Table B-3 Session State traps: These traps are sent only by an active server
Object ID (OID)
Description
1.3.6.1.4.1.2.6.208.0.1
The state of session X has transitioned to Defined.
1.3.6.1.4.1.2.6.208.0.2
The state of session X has transitioned to Preparing.
1.3.6.1.4.1.2.6.208.0.3
The state of session X has transitioned to Prepared.
1.3.6.1.4.1.2.6.208.0.4
The state of session X has transitioned to Suspended.
1.3.6.1.4.1.2.6.208.0.5
The state of session X has transitioned to Recovering.
1.3.6.1.4.1.2.6.208.0.6
The state of session X has transitioned to Target Available.
1.3.6.1.4.1.2.6.208.0.19
The state of session X has transitioned to Suspending.
1.3.6.1.4.1.2.6.208.0.20
The state of session X has transitioned to SuspendedH2H3.
1.3.6.1.4.1.2.6.208.0.21
The state of session X has transitioned to SuspendedH1H3.
1.3.6.1.4.1.2.6.208.0.22
The state of session X has transitioned to Flashing.
1.3.6.1.4.1.2.6.208.0.23
The state of session X has transitioned to Terminating.
1.3.6.1.4.1.2.6.208.0.26
The recovery point objective (RPO) for the role pair of X in session Y has passed the warning threshold of Z seconds.
1.3.6.1.4.1.2.6.208.0.27
The RPO for the role pair of X in session Y has passed the severe threshold of Z seconds.
1.3.6.1.4.1.2.6.208.0.28
A suspend event occurred triggering the auto restart feature for session X. The session will be restarted in Y seconds.
1.3.6.1.4.1.2.6.208.0.29
Session X was enabled for auto restart. However, the session could not be restarted.
1.3.6.1.4.1.2.6.208.0.33
The state of session X has transitioned to Protected.
1.3.6.1.4.1.2.6.208.0.34
The state of session X has transitioned to Unprotected.
Table B-4 describes the Configuration change traps.
Table B-4 Configuration change traps: These traps are sent only by an active server
Object ID (OID)
Description
1.3.6.1.4.1.2.6.208.0.7
One or more copy sets have been added to or deleted from this session.
An event is sent for each session at least every 15 minutes.
1.3.6.1.4.1.2.6.208.0.8
Peer-to-Peer Remote Copy (PPRC) path definitions have been changed. An event is sent for each path configuration change.
1.3.6.1.4.1.2.6.208.0.30
One or more logical paths between storage systems have entered an error state or have been removed.
Table B-5 describes a suspending-event trap.
Table B-5 Suspending-event traps are sent by both the active and standby servers
Object ID (OID)
Description
1.3.6.1.4.1.2.6.208.0.9
The session is in a severe state due to an unexpected error.
 
Table B-6 describes the Communication-failure traps.
Table B-6 Communication-failure traps: These traps are sent by both the active and standby servers
Object ID (OID)
Description
1.3.6.1.4.1.2.6.208.0.10
Server X has timed out attempting to communicate with storage system Y.
1.3.6.1.4.1.2.6.208.0.11
Server X has encountered errors attempting to communicate with storage system Y.
1.3.6.1.4.1.2.6.208.0.12
Active server X has terminated communication with standby server Y as a result of communication errors.
1.3.6.1.4.1.2.6.208.0.13
Standby server X has encountered communication errors with active server Y.
 
Important: For Communication-failure traps, after an SNMP trap for a failure is sent, it is not resent unless communication is reestablished and failed again.
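Because a failure trap is not repeated, a receiver should not interpret the absence of further traps as recovery. The following Python sketch models the send-once rule as it is stated above, keyed by server and storage system; the class and method names are hypothetical, and the model is not the Copy Services Manager implementation.

```python
from typing import Set, Tuple


class CommFailureNotifier:
    """Model of the send-once rule for communication-failure traps."""

    def __init__(self) -> None:
        # (server, target) pairs whose failure has already been reported.
        self._reported: Set[Tuple[str, str]] = set()

    def on_failure(self, server: str, target: str) -> bool:
        """Return True if this failure would produce a trap."""
        key = (server, target)
        if key in self._reported:
            return False  # still failed since the last trap: no resend
        self._reported.add(key)
        return True

    def on_recovery(self, server: str, target: str) -> None:
        """Communication reestablished: a later failure is reported again."""
        self._reported.discard((server, target))
```

In this model, repeated failures between the same server and storage system produce a single trap until `on_recovery` is called, after which the next failure produces a new trap.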
Table B-7 describes the management server traps.
Table B-7 Management server traps: These traps are sent by both active and standby servers
Object ID (OID)
Description
1.3.6.1.4.1.2.6.208.0.14
The copy services management server HA connection X  Y has changed state to Unknown (previously Offline).
1.3.6.1.4.1.2.6.208.0.15
The copy services management server HA connection X  Y has changed state to Synchronized.
1.3.6.1.4.1.2.6.208.0.16
The copy services management server HA connection X  Y has changed state to Disconnected Consistent (previously Consistent Offline).
1.3.6.1.4.1.2.6.208.0.17
The copy services management server HA connection X  Y has changed state to Synchronization Pending.
1.3.6.1.4.1.2.6.208.0.18
The copy services management server HA connection X  Y has changed state to Disconnected.
Table B-8 describes the Copy Services Manager scheduled task notification traps.
Table B-8 Scheduled task notification traps
Object ID (OID)
Description
1.3.6.1.4.1.2.6.208.0.31
The scheduled task ibmTPCRtaskname has finished running.
1.3.6.1.4.1.2.6.208.0.32
Scheduled task ibmTPCRtaskname failed due to an error that was encountered while it was running.
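All Copy Services Manager traps in Tables B-3 to B-8 share the prefix 1.3.6.1.4.1.2.6.208.0, and the last arc identifies the event. The following Python sketch maps a trap OID to the general event categories listed earlier, using only the OIDs in these tables. The names `CATEGORIES`, `CATEGORY_BY_ARC`, and `csm_trap_category` are hypothetical, and the sketch assumes the OID arrives as a dotted string, optionally with a leading dot.

```python
from typing import Optional

CSM_TRAP_PREFIX = "1.3.6.1.4.1.2.6.208.0."

# Last OID arc for each category, taken from Tables B-3 to B-8.
CATEGORIES = {
    "session state": (1, 2, 3, 4, 5, 6, 19, 20, 21, 22, 23, 26, 27, 28, 29, 33, 34),
    "configuration change": (7, 8, 30),
    "suspending event": (9,),
    "communication failure": (10, 11, 12, 13),
    "management server": (14, 15, 16, 17, 18),
    "scheduled task": (31, 32),
}
CATEGORY_BY_ARC = {arc: name for name, arcs in CATEGORIES.items() for arc in arcs}


def csm_trap_category(oid: str) -> Optional[str]:
    """Return the event category of a Copy Services Manager trap OID, or None."""
    oid = oid.strip().lstrip(".")
    if not oid.startswith(CSM_TRAP_PREFIX):
        return None  # not a Copy Services Manager trap
    suffix = oid[len(CSM_TRAP_PREFIX):]
    if not suffix.isdigit():
        return None
    return CATEGORY_BY_ARC.get(int(suffix))  # None for OIDs not in the tables
```

For example, `csm_trap_category("1.3.6.1.4.1.2.6.208.0.30")` returns "configuration change" (logical paths in an error state or removed).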
Correlating remote copy traps and possible actions
When you work with Metro Mirror, Global Copy, Global Mirror, or Metro/Global Mirror, SNMP messages can often be interpreted in more than one way. Table B-9 on page 595 describes the principal traps and the possible actions to take or checks to run to identify the root cause of your problem.
 
Terminology note: Due to space constraints, Table B-9 on page 595 uses the terms MM for Metro Mirror, GM for Global Mirror, and MGM for Metro/Global Mirror.
Table B-9 Remote copy traps and possible actions
Error trap code
Remote Copy type
Source error message
Target error message
Possible actions
Trap 100
MM/GM/MGM
PPRC Links Degraded
PPRC Links Degraded
Check connectivity on both sides. The message describes which IOPORT failed. Check whether connectivity is still available for the mentioned ports. At least one path is still available.
Trap 101
MM/GM/MGM
PPRC Links Down
PPRC Links Down
Check connectivity on both sides. The message describes which IOPORT failed. Check whether connectivity is still available for the mentioned ports. All paths are down.
Trap 202
MM/GM/MGM
Primary PPRC Devices on LSS Suspended Due to Error
PPRC Links Down. This error can come first on any of the available systems that are reporting the message.
This error message can be reported after a Trap 102. In this case, it means that the copy was suspended on the primary because of connectivity problems. Recheck the connectivity and resume the PPRC copy.
Trap 202
MM/GM/MGM
Primary PPRC Devices on LSS Suspended Due to Error
None
The PPRC relationship was manually paused, or an error with the volume on the primary system caused the suspension. Check the volume status and connectivity, and resume operations after you correct the issues. If the volume status depends on the DDM status, call IBM for a complete health check before you continue. This message can also be issued after you remove a PPRC relationship. In this case, no further action is needed.
Trap 218
GM/MGM
Global Mirror Number of consistency group failures exceed threshold
None
Check the connectivity and bandwidth capacities. Recheck your GM/MGM session configuration. This error can also appear when the secondary site has issues with the LUNs.
Trap 219
GM/MGM
Global Mirror First successful Consistency Group after prior failures
None
No action is needed for this trap. It indicates only that consistency group formation recovered after a previous failure.
Trap 214
GM/MGM
Global Mirror Master terminated
None
The session was manually terminated. Establish a new session between the primary and secondary (or tertiary) sites.
Trap 221
GM/MGM
Devices on LSS suspended due to error
Space-Efficient Repository or Over-provisioned Volume has reached a warning watermark.
Check the sizes of your ESE repositories. Resume PPRC operations after you correct the problem.
Trap 223
MM/GM/MGM
Extent Pool capacity threshold reached
None or Extent Pool capacity threshold reached.
Check the sizes of your ESE repositories. Resume PPRC operations after you correct the problem.
 