Cluster partition management
This chapter describes cluster partition management on PowerHA SystemMirror 7.1.2 Enterprise Edition (EE).
10.1 Cluster partitioning
This section describes cluster partitioning considerations.
The terms cluster partitioning, split-brain, and node isolation all refer to a situation where more than one cluster node activates resources as though it were the primary node. Such a situation can occur in the following scenarios:
Failure of all links between sites
Multiple failures within a site
 – Requires failures of Ethernet, SAN, and repository access
Although cluster partitioning can occur within a site, partitioning between sites is considered more likely because inter-site heartbeating relies solely on IP network connectivity.
Figure 10-1 illustrates an overview of cluster partitioning between sites.
Figure 10-1 Cluster partitioning
IBM PowerHA defines the following terminologies for these specific situations:
Split When communication is lost between the cluster nodes, and each site believes that it is the only one still online.
Merge When the split partitions in the cluster attempt to reform as a single cluster because the links between them have been restored.
When using sites with a PowerHA cluster, avoid unplanned split events. Not only can such events lead to undesirable results, but data divergence is also a risk: both sites might bring the resources online, with each site writing to its own local disks and thus creating inconsistent copies. Manual intervention is required to recover from this situation, and it can be difficult to perform.
In general, cluster partitions are not detected until a merge event occurs. Although previous versions of PowerHA implemented safeguard functions for merge events, during the split there is no function to prevent data divergence.
10.2 Methods to avoid cluster partitioning
A partitioned cluster is one of the worst scenarios that can occur in a clustered environment. This condition is dangerous because each node can independently run the applications and acquire the data from its own storage copy. If this occurs, you risk losing the integrity of the data on the disks and potentially experiencing data divergence.
The basic way to prevent partitioning between sites is to define multiple IP networks between the sites. Having multiple network communication paths between the sites minimizes the risk of a false fallover. To achieve true redundancy, the networks must be backed by separate network infrastructures.
If that is not possible, there might be little benefit in defining a separate network. Instead, the multiple interfaces can be aggregated into a single redundant logical interface, which adds redundancy at the communication interface level.
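On AIX, such aggregation is typically done with an EtherChannel (Link Aggregation) pseudo-adapter. The following is only a minimal sketch: it assumes ent0 and ent1 are the physical adapters to combine, and the attribute names should be verified on your AIX level before use.
# Minimal sketch: combine ent0 (primary) and ent1 (backup) into a single
# EtherChannel pseudo-adapter so that one interface failure does not remove
# the communication path. Verify the device type and attributes first.
mkdev -c adapter -s pseudo -t ibm_ech \
      -a adapter_names=ent0 \
      -a backup_adapter=ent1
# The resulting entX device is then configured with the cluster IP address
# in the usual way (for example, through smitty chinet).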
However, even with these methods, a cluster partition cannot be prevented when there is a total loss of IP connectivity between the sites.
IBM PowerHA SystemMirror 7.1.2 Enterprise Edition now provides a new function: the tie breaker disk. This new function allows PowerHA to decide which partition of the cluster should survive in case of split events.
Tie breaker disk overview
You can use the tie breaker option to specify a SCSI-3 Persistent Reserve (PR)-capable disk that is used by the split and merge policies.
A tie breaker disk is used when a group of nodes in a cluster cannot communicate with each other. This communication failure results in the cluster splitting the nodes into two or more partitions. If failure occurs because the cluster communication links are not responding, both partitions attempt to lock the tie breaker disk. The partition that acquires the tie breaker disk continues to function, while the other partition reboots.
 
Tie breaker accessibility considerations: The disk that is identified as the tie breaker must be accessible to all nodes in the cluster.
When partitions that were part of the cluster are brought back online after the communication failure, they must be able to communicate with the partition that owns the tie breaker disk. If a partition that is brought back online cannot communicate with the tie breaker disk, it does not join the cluster. The tie breaker disk is released when all nodes in the configuration rejoin the cluster.
10.3 Planning to avoid cluster partitioning
This section discusses how to plan to avoid cluster partitioning.
Stretched cluster versus linked cluster
Although split and merge events can happen in either stretched clusters or linked clusters, they are considered more critical in a linked cluster because it usually involves data replication.
Adding a tie breaker disk to a stretched cluster is generally less beneficial, because a stretched cluster has additional heartbeat paths through the SAN and the repository disk, which make a partition less likely.
In comparison, the risk of total loss of IP connectivity between sites is higher in a linked cluster. In such configurations, using a tie breaker disk configuration is suggested to prevent cluster partitions when a split event occurs.
Split handling policy
The split handling policy attribute describes the type of handling performed by PowerHA when a split occurs within a cluster. The possible choices for this attribute are described here:
None (Default) With this option, each partition that is created by the cluster split event becomes an independent cluster, and each partition can start a workload independently of the other partition.
Note that, for linked clusters, do not use this option if your environment is configured to use HyperSwap for PowerHA SystemMirror.
TieBreaker With this option, each partition attempts to acquire the tie breaker by placing a lock on the tie breaker disk. The tie breaker is a SCSI disk that is accessible to all nodes in the cluster. The partition that cannot lock the disk is rebooted.
Note that if you use this option, the merge policy configuration must also use the tie breaker option.
Merge handling policy
The merge handling policy attribute controls the behavior of the PowerHA cluster when the cluster partitions or splits, and later, when the cluster partitions are attempting to merge. The possible choices for this attribute are described here:
Majority (Default) With this option, the partition with the highest number of nodes remains online. If each partition has the same number of nodes, the partition that contains the node with the lowest node ID is chosen. The partition that does not remain online is rebooted.
This is the default policy, and it is identical to that provided by PowerHA 6.1 and prior releases.
TieBreaker With this option, each partition attempts to acquire the tie breaker by placing a lock on the tie breaker disk. The tie breaker is a SCSI disk that is accessible to all nodes in the cluster. The partition that cannot lock the disk is rebooted.
Note that if you use this option, your split policy configuration must also use the tie breaker option.
 
Important: The current release only supports the following combinations:
Split Policy: None, Merge Policy: Majority
Split Policy: TieBreaker, Merge Policy: TieBreaker
Split and merge action plan
This policy setting describes what action is taken on the nodes in the losing partitions when the split or merge happens. The only configurable method in the current release is listed here:
Reboot (Default) In this case, the AIX operating system is rebooted.
Tie breaker disk requirements
Note the following requirements for tie breaker disks:
A disk supporting SCSI-3 persistent reserve that is accessible by all nodes.
The repository disk cannot be used as a tie breaker.
You can verify whether the disks are SCSI-3 persistent reserve-capable with the lsattr -Rl hdiskX -a reserve_policy command. The PR_exclusive value should appear as shown in Example 10-1.
Example 10-1 Checking reserve capability of the disks
# lsattr -Rl hdisk9 -a reserve_policy
no_reserve
single_path
PR_exclusive
PR_shared
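Because the tie breaker must be accessible and PR-capable from every node, it is worth running the same check everywhere. The following is a minimal sketch, assuming password-less ssh between the nodes of this test cluster and that the candidate disk is hdisk9 on each of them; in practice, match the disks by PVID if the hdisk numbering differs between nodes.
# Minimal sketch: confirm on every cluster node that the candidate tie
# breaker disk accepts the PR_exclusive reserve policy (SCSI-3 PR capable).
# Node names and hdisk9 are taken from the test environment in this chapter.
for node in glvma1 glvmb1 glvmb2; do
    if ssh "$node" lsattr -Rl hdisk9 -a reserve_policy | grep -qx PR_exclusive; then
        echo "$node: hdisk9 supports PR_exclusive"
    else
        echo "$node: hdisk9 does not list PR_exclusive - choose another disk"
    fi
done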
10.4 Detailed behavior of cluster partitioning
In this section, we discuss the detailed behavior of cluster partitioning through actual testing.
10.4.1 Test environment overview
This scenario used a three-node cluster that was configured with IPv6 and IPv4 (dual stack configuration). The cluster name is glvma1_cluster. The cluster consisted of one node on siteA, and two nodes on siteB.
Two networks were configured: an XD_data network and an Ether network. The XD_data network was connected with IPv4 addresses. The Ether network was connected with IPv6 addresses.
The disks were replicated through a synchronous GLVM function. The tie breaker disk was configured as a Fibre Channel-attached DS3400 device that was accessible from both sites. Figure 10-2 shows the overview of the cluster configuration.
Figure 10-2 Test environment overview
A routed network was configured between the two sites. To test the cluster partitioning, we shut down the routers (Figure 10-3). This isolated the two sites, initiating the split event.
Figure 10-3 Disabling the routers
Next, we reactivated the routers for the two sites to re-establish the IP link. This initiated the merge event.
10.4.2 Configuring the split and merge policy
To configure the split and merge policy, we used smitty sysmirror → Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy.
Figure 10-4 shows the corresponding SMIT panel.
      Configure Cluster Split and Merge Policy for a Linked Cluster
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy None +
Merge Handling Policy Majority +
Split and Merge Action Plan Reboot +
Select Tie Breaker +
Figure 10-4 SMIT panel for split and merge policy
To configure the tie breaker disk, we pressed F4 on the Select Tie Breaker field. A list of available disks that can be configured as the tie breaker disk was displayed (Figure 10-5).
For information about how to get your disk to show up in this list, see “Tie breaker disk requirements” on page 439.
 +--------------------------------------------------------------------------+
| Select Tie Breaker |
| |
| Move cursor to desired item and press Enter. |
| |
| None |
| hdisk9 (00f70c9952af7567) on all cluster nodes |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 10-5 Tie breaker disk selection
To confirm this setting, we checked the HACMPsplitmerge ODM database (Example 10-2).
Example 10-2 HACMPsplitmerge ODM database
# odmget HACMPsplitmerge
 
HACMPsplitmerge:
id = 0
policy = "action"
value = "Reboot"
 
HACMPsplitmerge:
id = 0
policy = "split"
value = "TieBreaker"
 
HACMPsplitmerge:
id = 0
policy = "merge"
value = "TieBreaker"
 
HACMPsplitmerge:
id = 0
policy = "tiebreaker"
value = "00f70c9952af7567"
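The same information can be pulled out in a more compact form. The following is a minimal sketch that filters the ODM output shown above, assuming the object class layout stays as in Example 10-2.
# Minimal sketch: print only the configured split/merge policies and the
# tie breaker PVID by pairing the policy and value fields of HACMPsplitmerge.
odmget HACMPsplitmerge | awk '
    /policy =/ { gsub(/"/, ""); p = $3 }
    /value =/  { gsub(/"/, ""); print p ": " $3 }'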
10.4.3 Split policy: None, merge policy: Majority
Our first test was performed using the configuration shown in Figure 10-6.
      Configure Cluster Split and Merge Policy for a Linked Cluster
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy None +
Merge Handling Policy Majority +
Split and Merge Action Plan Reboot +
Select Tie Breaker +
Figure 10-6 Configuring split policy: None, merge policy: Majority
When the cluster was first activated, glvma1 acquired the resource group rg1 (Example 10-3).
Example 10-3 Resource group acquisition
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA ONLINE OFFLINE
glvmb1@siteB OFFLINE ONLINE SECONDARY
glvmb2@siteB OFFLINE OFFLINE
Testing the split
We created a split situation by disabling the routers. As soon as the routers were disabled, a split_merge_prompt split event occurred on every node. Table 10-1 on page 444 lists the events that occurred on nodes glvma1 and glvmb1.
Notice that false and mismatched events, such as site_down and node_down, occurred on each node. Because glvmb1 falsely determined that siteA was down, it acquired the resource group without it being released on glvma1.
Table 10-1 Events during the split
Cluster events during the split on glvma1:
EVENT START: split_merge_prompt split
EVENT START: node_down glvmb1
EVENT COMPLETED: node_down glvmb1 0
EVENT START: site_down siteB
EVENT COMPLETED: split_merge_prompt split 0
EVENT START: site_down_remote siteB
EVENT COMPLETED: site_down_remote siteB 0
EVENT COMPLETED: site_down siteB 0
EVENT START: node_down glvmb2
EVENT COMPLETED: node_down glvmb2 0
EVENT START: rg_move_release glvma1 1
EVENT START: rg_move glvma1 1 RELEASE
EVENT COMPLETED: rg_move glvma1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvma1 1 0
EVENT START: rg_move_fence glvma1 1
EVENT COMPLETED: rg_move_fence glvma1 1 0
EVENT START: node_down_complete glvmb1
EVENT COMPLETED: node_down_complete glvmb1 0
EVENT START: node_down_complete glvmb2
EVENT COMPLETED: node_down_complete glvmb2 0
Cluster events during the split on glvmb1:
EVENT START: split_merge_prompt split
EVENT COMPLETED: split_merge_prompt split 0
EVENT START: site_down siteA
EVENT START: site_down_remote siteA
EVENT COMPLETED: site_down_remote siteA 0
EVENT COMPLETED: site_down siteA 0
EVENT START: node_down glvma1
EVENT COMPLETED: node_down glvma1 0
EVENT START: rg_move_release glvmb1 1
EVENT START: rg_move glvmb1 1 RELEASE
EVENT COMPLETED: rg_move glvmb1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_release glvmb1 1
EVENT START: rg_move glvmb1 1 RELEASE
EVENT COMPLETED: rg_move glvmb1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_acquire glvmb1 1
EVENT START: rg_move glvmb1 1 ACQUIRE
EVENT START: acquire_takeover_addr
EVENT COMPLETED: acquire_takeover_addr 0
EVENT COMPLETED: rg_move glvmb1 1 ACQUIRE 0
EVENT COMPLETED: rg_move_acquire glvmb1 1 0
EVENT START: rg_move_complete glvmb1 1
EVENT COMPLETED: rg_move_complete glvmb1 1 0
EVENT START: node_down_complete glvma1
EVENT COMPLETED: node_down_complete glvma1 0
The error log on each node showed that a split occurred (Example 10-4).
Example 10-4 Split errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
4BDDFBCC 1207041112 I S ConfigRM The operational quorum state of the acti
A098BF90 1207041112 P S ConfigRM The operational quorum state of the acti
77A1A9A4 1207041112 I O ConfigRM ConfigRM received Site Split event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: CONFIGRM_HASQUORUM_
IDENTIFIER: 4BDDFBCC
 
Date/Time: Fri Dec 7 04:11:13 2012
Sequence Number: 883
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
The operational quorum state of the active peer domain has changed to HAS_QUORUM.
In this state, cluster resources may be recovered and controlled as needed by
management applications.
 
Probable Causes
One or more nodes have come online in the peer domain.
 
User Causes
One or more nodes have come online in the peer domain.
 
Recommended Actions
None
 
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.110,18993
ERROR ID
REFERENCE CODE
---------------------------------------------------------------------------
LABEL: CONFIGRM_PENDINGQUO
IDENTIFIER: A098BF90
 
Date/Time: Fri Dec 7 04:11:13 2012
Sequence Number: 882
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: PERM
WPAR: Global
Resource Name: ConfigRM
 
Description
The operational quorum state of the active peer domain has changed to PENDING_QUORUM.
This state usually indicates that exactly half of the nodes that are defined in the
peer domain are online. In this state cluster resources cannot be recovered although
none will be stopped explicitly.
 
Probable Causes
One or more nodes in the active peer domain have failed.
One or more nodes in the active peer domain have been taken offline by the user.
A network failure is disrupted communication between the cluster nodes.
 
Failure Causes
One or more nodes in the active peer domain have failed.
One or more nodes in the active peer domain have been taken offline by the user.
A network failure is disrupted communication between the cluster nodes.
 
Recommended Actions
Ensure that more than half of the nodes of the domain are online.
Ensure that the network that is used for communication between the nodes is functioning correctly.
Ensure that the active tie breaker device is operational and if it set to
'Operator' then resolve the tie situation by granting ownership to one of
the active sub-domains.
 
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.110,18997
ERROR ID
REFERENCE CODE
 
---------------------------------------------------------------------------
LABEL: CONFIGRM_SITE_SPLIT
IDENTIFIER: 77A1A9A4
 
Date/Time: Fri Dec 7 04:11:12 2012
Sequence Number: 880
Machine Id: 00F70C994C00
Node Id: glvma1
Class: O
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
ConfigRM received Site Split event notification
 
Probable Causes
Networks between sites may have been disconnected
 
Failure Causes
Networks between sites may have been disconnected
 
Recommended Actions
Check the network connectivity between sites
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,1398
ERROR ID
REFERENCE CODE
DIAGNOSTIC EXPLANATION
At this point, the resource group was acquired on both siteA and siteB, and the volume group was activated on both nodes (Table 10-2). The cluster state was no longer synchronized between the sites, and each node was able to write to its own copy of the data.
Table 10-2 Resource group state after the split
Resource group state in glvma1:
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA ONLINE OFFLINE
glvmb1@siteB OFFLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
# lspv
hdisk0 00f70c99e24ff9ff altinst_rootvg
hdisk1 00f70c9901259917 rootvg active
hdisk2 00f70c992405114b caavg_private active
hdisk3 00f70c990580a411 glvmvg active
hdisk4 00f70c990580a44c glvmvg active
hdisk5 00f70c990580a486 glvmvg active
hdisk6 00f6f5d005808d31 glvmvg active
hdisk7 00f6f5d005808d6b glvmvg active
hdisk8 00f6f5d005808da5 glvmvg active
hdisk9 00f70c9952af7567 None
 
Resource group state in glvmb1:
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA OFFLINE OFFLINE
glvmb1@siteB ONLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
# lspv
hdisk0 00f6f5d0e24f6303 altinst_rootvg
hdisk1 00f6f5d0012596cc rootvg active
hdisk2 00f6f5d023ec219d caavg_private active
hdisk3 00f6f5d005808d31 glvmvg active
hdisk4 00f6f5d005808d6b glvmvg active
hdisk5 00f6f5d005808da5 glvmvg active
hdisk6 00f70c990580a411 glvmvg active
hdisk7 00f70c990580a44c glvmvg active
hdisk8 00f70c990580a486 glvmvg active
hdisk9 00f70c9952af7567 None
 
In summary, the cluster partition was not prevented during the total loss of IP connectivity. This is the expected behavior when the split handling policy is set to None.
Testing the merge
Next, we created a merge situation by reactivating the routers. Because siteB was configured with two nodes, the expected behavior was that the resource group would remain online on siteB.
Shortly after the routers had been reactivated, a split_merge_prompt merge event occurred on every node. Table 10-3 shows the events that occurred on nodes glvma1 and glvmb1.
Table 10-3 Events during the merge
Cluster events during the merge on glvma1:
EVENT START: split_merge_prompt merge
EVENT COMPLETED: split_merge_prompt merge 0
Cluster events during the merge on glvmb1:
EVENT START: split_merge_prompt merge
EVENT COMPLETED: split_merge_prompt merge 0
The error log on each node indicated that a merge occurred (Example 10-5).
Example 10-5 Merge errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
1BD32427 1207042512 I O ConfigRM ConfigRM received Site Merge event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: CONFIGRM_SITE_MERGE
IDENTIFIER: 1BD32427
 
Date/Time: Fri Dec 7 04:25:41 2012
Sequence Number: 764
Machine Id: 00F6F5D04C00
Node Id: glvmb2
Class: O
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
ConfigRM received Site Merge event notification
 
Probable Causes
Networks between sites may have been reconnected
 
Failure Causes
Networks between sites may have been reconnected
 
Recommended Actions
Verify the network connection between sites
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,1436
ERROR ID
REFERENCE CODE
DIAGNOSTIC EXPLANATION
Shortly after these events, glvma1 rebooted. The errlog indicated the cause of the reboot (Example 10-6).
Example 10-6 Merge errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
AFA89905 1207042712 I O cthags Group Services daemon started
DE84C4DB 1207042612 I O ConfigRM IBM.ConfigRM daemon has started.
A6DF45AA 1207042612 I O RMCdaemon The daemon is started.
2BFA76F6 1207042612 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1207042612 T O errdemon ERROR LOGGING TURNED ON
24126A2B 1207042512 P O cthags Group Services daemon exit to merge/spli
F0851662 1207042512 I S ConfigRM The sub-domain containing the local node
1BD32427 1207042512 I O ConfigRM ConfigRM received Site Merge event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: GS_SITE_DISSOLVE_ER
IDENTIFIER: 24126A2B
 
Date/Time: Fri Dec 7 04:25:44 2012
Sequence Number: 886
Machine Id: 00F70C994C00
Node Id: glvma1
Class: O
Type: PERM
WPAR: Global
Resource Name: cthags
 
Description
Group Services daemon exit to merge/split sites
 
Probable Causes
Network between two sites has repaired
 
Failure Causes
CAA Services has been partitioned and/or merged.
 
Recommended Actions
Check the CAA policies.
Verify that CAA has been restarted
Call IBM Service if problem persists
 
Detail Data
DETECTING MODULE
RSCT,NS.C,1.107.1.68,4894
ERROR ID
6fca2Y.MMPkE/kn8.rE4e.1...................
REFERENCE CODE
DIAGNOSTIC EXPLANATION
NS::Ack(): The master requests to dissolve my domain because of the merge with other domain 65535.Ni
---------------------------------------------------------------------------
LABEL: CONFIGRM_MERGE_ST
IDENTIFIER: F0851662
 
Date/Time: Fri Dec 7 04:25:44 2012
Sequence Number: 885
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
The sub-domain containing the local node is being dissolved because another
sub-domain has been detected that takes precedence over it. Group services
will be ended on each node of the local sub-domain which will cause the
configuration manager daemon (IBM.ConfigRMd) to force the node offline and
then bring it back online in the surviving domain.
 
Probable Causes
A merge of two sub-domain is probably caused by a network outage being
repaired so that the nodes of the two sub-domains can now communicate.
 
User Causes
A merge of two sub-domain is probably caused by a network outage being
repaired so that the nodes of the two sub-domains can now communicate.
 
Recommended Actions
No action is necessary since the nodes will be automatically synchronized
and brought online in the surviving domain.
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,919
ERROR ID
REFERENCE CODE
In summary, the resource group stayed online on siteB, and siteA was brought offline during the merge event. This is the expected behavior when the merge policy is set to Majority, because siteB had the majority of the nodes in the cluster.
10.4.4 Split policy: TieBreaker, merge policy: TieBreaker
Then we tested with the configuration shown in Figure 10-7.
Configure Cluster Split and Merge Policy for a Linked Cluster
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy TieBreaker +
Merge Handling Policy TieBreaker +
Split and Merge Action Plan Reboot +
Select Tie Breaker (00f70c9952af7567) +
Figure 10-7 Configuring split policy: TieBreaker, merge policy: TieBreaker
The disk with PVID 00f70c9952af7567 was a DS3400 disk configured on all nodes in the cluster. On the DS3400, the LUN is named glvm_tiebreaker. Example 10-7 shows some of the tie breaker disk's properties.
Example 10-7 Tie breaker disk names
# lspv | grep 00f70c9952af7567
hdisk9 00f70c9952af7567 None

# lsdev -Cc disk -l hdisk9
hdisk9 Available 80-T1-01 MPIO Other DS3K Array Disk

# mpio_get_config -l hdisk9
Storage Subsystem Name = 'DS3400POK-1'
hdisk# LUN # Ownership User Label
hdisk9 0 A (preferred) glvm_tiebreaker
When the cluster was first activated, glvma1 acquired the resource group rg1 (Example 10-8).
Example 10-8 Resource group acquisition
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA ONLINE OFFLINE
glvmb1@siteB OFFLINE ONLINE SECONDARY
glvmb2@siteB OFFLINE OFFLINE
To check whether there was a persistent reserve on the tie breaker disk, we used the devrsrv command. The output after the cluster activation is shown in Example 10-9.
Notice that the ODM Reservation Policy displayed PR EXCLUSIVE, and the Device Reservation State displayed NO RESERVE.
Example 10-9 The devrsrv command output after cluster activation
# devrsrv -c query -l hdisk9
Device Reservation State Information
==================================================
Device Name : hdisk9
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 5071576869747560161
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0xd SIP_C ATP_C PTPL_C
PR Capabilities Byte[3] : 0x0
PR Types Supported : NOT VALID
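During the following split and merge tests, it can be useful to watch when a reservation appears on, or is released from, the tie breaker disk. The following is a minimal sketch that polls the state shown in Example 10-9, assuming hdisk9 is the tie breaker.
# Minimal sketch: poll the tie breaker reservation state so the moment a
# partition locks or releases the disk is visible during the tests.
while true; do
    printf '%s  ' "$(date)"
    devrsrv -c query -l hdisk9 | grep 'Device Reservation State'
    sleep 10
done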
We also checked whether there was a persistent reserve on the DS3400 after the cluster was activated. To check the persistent reserve, we right-clicked the storage subsystem in the Storage Manager GUI and selected Execute Script (Figure 10-8).
Figure 10-8 Execute Script from the Storage Manager GUI
Then we executed show logicalDrives [logicalDrivelabel] reservations (Figure 10-9).
Figure 10-9 Checking the persistent reserve
From the output, we confirmed that no persistent reserve existed on this LUN.
 
Checking persistent reservation: The method of checking the persistent reservation differs depending on the storage subsystem and device driver that you are using.
Testing the split
Next, we created a split situation by disabling the routers. Shortly after the routers were disabled, a split_merge_prompt split event occurred on every node. Table 10-4 on page 453 shows the events that occurred on nodes glvma1 and glvmb1 after the split.
Notice that the split_merge_prompt event is the only event on glvma1, because a reboot occurred shortly after the event happened. In contrast to 10.4.3, “Split policy: None, merge policy: Majority” on page 442, there are no mismatched or false events, because all nodes on siteA were rebooted.
Table 10-4 Events during the split
Cluster events during the split on glvma1:
EVENT START: split_merge_prompt split
EVENT COMPLETED: split_merge_prompt split 0
Cluster events during the split on glvmb1:
EVENT START: split_merge_prompt split
EVENT COMPLETED: split_merge_prompt split 0
EVENT START: site_down_remote siteA
EVENT COMPLETED: site_down_remote siteA 0
EVENT COMPLETED: site_down siteA 0
EVENT START: node_down glvma1
EVENT COMPLETED: node_down glvma1 0
EVENT START: rg_move_release glvmb1 1
EVENT START: rg_move glvmb1 1 RELEASE
EVENT COMPLETED: rg_move glvmb1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_release glvmb1 1
EVENT START: rg_move glvmb1 1 RELEASE
EVENT COMPLETED: rg_move glvmb1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_acquire glvmb1 1
EVENT START: rg_move glvmb1 1 ACQUIRE
EVENT START: acquire_takeover_addr
EVENT COMPLETED: acquire_takeover_addr 0
EVENT COMPLETED: rg_move glvmb1 1 ACQUIRE 0
EVENT COMPLETED: rg_move_acquire glvmb1 1 0
EVENT START: rg_move_complete glvmb1 1
EVENT COMPLETED: rg_move_complete glvmb1 1 0
EVENT START: node_down_complete glvma1
EVENT COMPLETED: node_down_complete glvma1 0
Additionally, beyond the errlog entries shown in Example 10-4 on page 444, failed attempts to access the tie breaker disk (hdisk9) can be observed on glvma1 before the reboot (Example 10-10).
Example 10-10 Split errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
AFA89905 1207053112 I O cthags Group Services daemon started
DE84C4DB 1207052812 I O ConfigRM IBM.ConfigRM daemon has started.
A6DF45AA 1207052812 I O RMCdaemon The daemon is started.
2BFA76F6 1207052812 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1207052812 T O errdemon ERROR LOGGING TURNED ON
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
A098BF90 1207052712 P S ConfigRM The operational quorum state of the acti
77A1A9A4 1207052712 I O ConfigRM ConfigRM received Site Split event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: SC_DISK_ERR9
IDENTIFIER: B0EE9AF5
 
Date/Time: Fri Dec 7 05:27:49 2012
Sequence Number: 925
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: TEMP
WPAR: Global
Resource Name: hdisk9
 
Description
REQUESTED OPERATION CANNOT BE PERFORMED
 
Probable Causes
MEDIA
 
User Causes
MEDIA DEFECTIVE
RESOURCE NOT AVAILABLE
 
Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES
 
Failure Causes
MEDIA
DISK DRIVE
 
Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES
 
Detail Data
PATH ID
1
SENSE DATA
0600 1A00 7F00 FF00 0000 0000 0000 0000 0000 0000 0000 0000 0118 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0000 0000 0093 0000
0000 009C 009C
 
Tips: We can determine that there is a reserve conflict from the sense data of the SC_DISK_ERR errlog entry. Refer to the following layout:
SENSE DATA LAYOUT
LL00 CCCC CCCC CCCC CCCC CCCC CCCC CCCC CCCC RRRR RRRR RRRR VVSS AARR DDDD KKDD
In Example 10-10 on page 453, notice VV=01, SS=18. This indicates the following:
VV 01 indicates that the SCSI status field (SS) is valid.
SS 18 indicates that the SCSI device is reserved by another host.
For details about SCSI3 protocol errlog layout, refer to PCI Fibre Channel Adapter, SCSI-3 Protocol (Disk, CD-ROM, Read/Write Optical Device) Error Log Sense Information in Fibre Channel Planning and Integration: User’s Guide and Service Information at:
http://publibfp.boulder.ibm.com/epubs/pdf/c2343293.pdf
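Extracting those two bytes directly from the error log saves reading the raw hex dump. The following is a minimal sketch that assumes the word positions follow the layout above and uses the error identifier from Example 10-10.
# Minimal sketch: pull the VV/SS bytes (13th word of the first sense data
# row) from the SC_DISK_ERR9 entries; SS=18 indicates a reservation conflict.
errpt -a -j B0EE9AF5 | awk '
    /^SENSE DATA/ { getline; print "VV=" substr($13, 1, 2) " SS=" substr($13, 3, 2) }'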
The devrsrv command shows a persistent reservation on the disk (Example 10-11). The Device Reservation State now shows PR EXCLUSIVE. The node that acquired the tie breaker can be determined by comparing the ODM PR Key Value (which is unique on each cluster node) with the PR Holder Key Value.
Example 10-11 The devrsrv command after the split event
# devrsrv -c query -l hdisk9
Device Reservation State Information
==================================================
Device Name : hdisk9
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 5071576869747560161
Device Reservation State : PR EXCLUSIVE
PR Generation Value : 1274
PR Type : PR_WE_RO (WRITE EXCLUSIVE, REGISTRANTS ONLY)
PR Holder Key Value : 5071576869747560161
Registered PR Keys : 5071576869747560161 5071576869747560161
PR Capabilities Byte[2] : 0xd SIP_C ATP_C PTPL_C
PR Capabilities Byte[3] : 0x1 PTPL_A
PR Types Supported : NOT VALID
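A minimal sketch of that key comparison follows, assuming hdisk9 is the tie breaker and using only the fields shown in Example 10-11.
# Minimal sketch: report whether the local node holds the tie breaker
# reservation by comparing the local ODM PR key with the PR holder key.
ODM_KEY=$(devrsrv -c query -l hdisk9 | awk -F: '/ODM PR Key Value/ {gsub(/ /,"",$2); print $2}')
HOLDER=$(devrsrv -c query -l hdisk9 | awk -F: '/PR Holder Key Value/ {gsub(/ /,"",$2); print $2}')
if [ -n "$HOLDER" ] && [ "$ODM_KEY" = "$HOLDER" ]; then
    echo "This node holds the tie breaker reservation"
else
    echo "Tie breaker is not held by this node (holder key: ${HOLDER:-none})"
fi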
Also, using the Storage Manager GUI, we now observed the persistent reserve on hdisk9 (Figure 10-10). As discussed in “Tie breaker disk overview” on page 437, this reservation is designed to last until every node rejoins the cluster.
Figure 10-10 Persistent reserve on the tie breaker disk
Because glvma1 rebooted, glvmb1 is now the only node that has access to the data (Example 10-12).
Example 10-12 Resource group state on glvmb1
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA OFFLINE OFFLINE
glvmb1@siteB ONLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
 
# lspv
hdisk0 00f6f5d0e24f6303 altinst_rootvg
hdisk1 00f6f5d0012596cc rootvg active
hdisk2 00f6f5d023ec219d caavg_private active
hdisk3 00f6f5d005808d31 glvmvg active
hdisk4 00f6f5d005808d6b glvmvg active
hdisk5 00f6f5d005808da5 glvmvg active
hdisk6 00f70c990580a411 glvmvg active
hdisk7 00f70c990580a44c glvmvg active
hdisk8 00f70c990580a486 glvmvg active
hdisk9 00f70c9952af7567 None
In summary, this scenario shows that cluster partitioning was prevented. However, although the resource group was initially started on siteA, siteB acquired the resource group, resulting in an unplanned resource group movement. Because the TieBreaker split handling policy is a “winner takes all” policy, this outcome is working as designed.
Testing the merge
After the reboot completed on glvma1, we reactivated the cluster services with smitty clstart. Because IP connectivity had not been restored between the sites, both nodes acquired the resource group at the same time (Table 10-5).
Table 10-5 Resource group state before the merge
Resource group state in glvma1:
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA ONLINE OFFLINE
glvmb1@siteB OFFLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
# lspv
hdisk0 00f70c99e24ff9ff altinst_rootvg
hdisk1 00f70c9901259917 rootvg active
hdisk2 00f70c992405114b caavg_private active
hdisk3 00f70c990580a411 glvmvg active
hdisk4 00f70c990580a44c glvmvg active
hdisk5 00f70c990580a486 glvmvg active
hdisk6 00f6f5d005808d31 glvmvg active
hdisk7 00f6f5d005808d6b glvmvg active
hdisk8 00f6f5d005808da5 glvmvg active
hdisk9 00f70c9952af7567 None
 
Resource group state in glvmb1:
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA OFFLINE OFFLINE
glvmb1@siteB ONLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
# lspv
hdisk0 00f6f5d0e24f6303 altinst_rootvg
hdisk1 00f6f5d0012596cc rootvg active
hdisk2 00f6f5d023ec219d caavg_private active
hdisk3 00f6f5d005808d31 glvmvg active
hdisk4 00f6f5d005808d6b glvmvg active
hdisk5 00f6f5d005808da5 glvmvg active
hdisk6 00f70c990580a411 glvmvg active
hdisk7 00f70c990580a44c glvmvg active
hdisk8 00f70c990580a486 glvmvg active
hdisk9 00f70c9952af7567 None
 
 
Important: As the previous test result shows, the TieBreaker split handling policy cannot prevent cluster partitioning if cluster services are restarted before IP connectivity between the sites is regained. In real-world scenarios, confirm that IP connectivity has been restored between the two sites before you perform additional operations.
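A simple way to honor this recommendation is to verify reachability of the surviving site before restarting cluster services on the rebooted node. The following is a minimal sketch, assuming glvmb1 is a node at the surviving site.
# Minimal sketch: check IP connectivity to the surviving site before
# restarting cluster services on the node that lost the tie breaker.
REMOTE=glvmb1
if ping -c 3 "$REMOTE" >/dev/null 2>&1; then
    echo "$REMOTE is reachable; it is safe to start cluster services"
else
    echo "$REMOTE is not reachable; starting cluster services now would partition the cluster again"
fi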
Upon reactivating the routers, a split_merge_prompt merge event occurred on every node. Table 10-6 shows the events that occurred on nodes glvma1 and glvmb1 after the merge.
Table 10-6 Events during the merge
Cluster events during the merge on glvma1:
EVENT START: split_merge_prompt merge
EVENT COMPLETED: split_merge_prompt merge 0
Cluster events during the merge on glvmb1:
EVENT START: split_merge_prompt merge
EVENT COMPLETED: split_merge_prompt merge 0
Shortly after these events, node glvma1 rebooted. The errlog is shown in Example 10-13 on page 458. Observe that after the merge events, there were several failed attempts to access the tie breaker disk, hdisk9, before the reboot was initiated.
Example 10-13 Merge errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
AFA89905 1207054812 I O cthags Group Services daemon started
DE84C4DB 1207054712 I O ConfigRM IBM.ConfigRM daemon has started.
A6DF45AA 1207054712 I O RMCdaemon The daemon is started.
2BFA76F6 1207054712 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1207054712 T O errdemon ERROR LOGGING TURNED ON
24126A2B 1207054612 P O cthags Group Services daemon exit to merge/spli
F0851662 1207054612 I S ConfigRM The sub-domain containing the local node
65DE6DE3 1207054612 P S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
1BD32427 1207054512 I O ConfigRM ConfigRM received Site Merge event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: GS_SITE_DISSOLVE_ER
IDENTIFIER: 24126A2B
 
Date/Time: Fri Dec 7 05:46:29 2012
Sequence Number: 1652
Machine Id: 00F70C994C00
Node Id: glvma1
Class: O
Type: PERM
WPAR: Global
Resource Name: cthags
 
Description
Group Services daemon exit to merge/split sites
 
Probable Causes
Network between two sites has repaired
 
Failure Causes
CAA Services has been partitioned and/or merged.
 
Recommended Actions
Check the CAA policies.
Verify that CAA has been restarted
Call IBM Service if problem persists
 
Detail Data
DETECTING MODULE
RSCT,NS.C,1.107.1.68,4894
ERROR ID
6fca2Y.3YQkE/f1o/rE4e.1...................
REFERENCE CODE
DIAGNOSTIC EXPLANATION
NS::Ack(): The master requests to dissolve my domain because of the merge with other dom
ain 65535.Ni
---------------------------------------------------------------------------
LABEL: CONFIGRM_MERGE_ST
IDENTIFIER: F0851662
 
Date/Time: Fri Dec 7 05:46:29 2012
Sequence Number: 1651
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
The sub-domain containing the local node is being dissolved because another
sub-domain has been detected that takes precedence over it. Group services
will be ended on each node of the local sub-domain which will cause the
configuration manager daemon (IBM.ConfigRMd) to force the node offline and
then bring it back online in the surviving domain.
 
Probable Causes
A merge of two sub-domain is probably caused by a network outage being
repaired so that the nodes of the two sub-domains can now communicate.
 
User Causes
A merge of two sub-domain is probably caused by a network outage being
repaired so that the nodes of the two sub-domains can now communicate.
 
Recommended Actions
No action is necessary since the nodes will be automatically synchronized
and brought online in the surviving domain.
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,919
ERROR ID
REFERENCE CODE
---------------------------------------------------------------------------
LABEL: SC_DISK_ERR10
IDENTIFIER: 65DE6DE3
 
Date/Time: Fri Dec 7 05:46:25 2012
Sequence Number: 1650
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: PERM
WPAR: Global
Resource Name: hdisk9
 
Description
REQUESTED OPERATION CANNOT BE PERFORMED
 
Probable Causes
DASD DEVICE
 
User Causes
RESOURCE NOT AVAILABLE
UNAUTHORIZED ACCESS ATTEMPTED
 
Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES
 
Failure Causes
MEDIA
DISK DRIVE
 
Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES
 
Detail Data
PATH ID
1
SENSE DATA
0A00 2A00 0000 0000 0000 0804 0000 0000 0000 0000 0000 0000 0118 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0004 0200 0000 0000 0000 0000 0000 0000 0000 0093 0000
0000 003D 0017
---------------------------------------------------------------------------
LABEL: CONFIGRM_SITE_MERGE
IDENTIFIER: 1BD32427
 
Date/Time: Fri Dec 7 05:45:26 2012
Sequence Number: 1595
Machine Id: 00F70C994C00
Node Id: glvma1
Class: O
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
ConfigRM received Site Merge event notification
 
Probable Causes
Networks between sites may have been reconnected
 
Failure Causes
Networks between sites may have been reconnected
 
Recommended Actions
Verify the network connection between sites
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,1436
ERROR ID
REFERENCE CODE
DIAGNOSTIC EXPLANATION
 
---------------------------------------------------------------------------
After the merge event, observe from the devrsrv command output (Example 10-14) and from the Storage Manager GUI (Figure 10-11) that the persistent reservation has been released.
Example 10-14 Devrsrv command output after merge event
# devrsrv -c query -l hdisk9
Device Reservation State Information
==================================================
Device Name : hdisk9
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 5071576869747560161
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0xd SIP_C ATP_C PTPL_C
PR Capabilities Byte[3] : 0x0
PR Types Supported : NOT VALID
Figure 10-11 Persistent reserve release
 
Important: Although there are several methods to manually release the persistent reserve, do not perform a reserve operation outside of PowerHA. If the persistent reserve leads to a problem, contact IBM support first.
In summary, from this test scenario we can observe that the tie breaker disk prevents other nodes from acquiring the resources after a split event occurs.