Cluster partition management
This chapter describes cluster partition management on PowerHA SystemMirror 7.1.2 Enterprise Edition (EE).
10.1 Cluster partitioning
This section describes cluster partitioning considerations.
The terms cluster partitioning, split-brain, and node isolation all refer to a situation where more than one cluster node activates resources as though it were the primary node. Such a situation can occur in the following scenarios:
Failure of all links between sites
Multiple failures within a site
 – Requires failures of Ethernet, SAN, and repository access
Although cluster partitioning can occur within a site, partitioning between sites is considered more likely because inter-site heartbeating relies solely on IP network connectivity.
Figure 10-1 illustrates an overview of cluster partitioning between sites.
Figure 10-1 Cluster partitioning
IBM PowerHA defines the following terminologies for these specific situations:
Split When communication is lost between the cluster nodes, and each site believes that it is the only one still online.
Merge When the split partitions in the cluster attempt to reform as a single cluster because the links between them have been restored.
When using sites with a PowerHA cluster, avoid unplanned split events. Not only can such events lead to undesirable results, but data divergence is also a risk: both sites might bring the resources online, with each site writing to its own local disks and thus creating inconsistent copies. Manual intervention is required to recover from this situation, and it can be difficult to perform.
In general, cluster partitions are not detected until a merge event occurs. Although previous versions of PowerHA implemented safeguard functions for merge events, during the split there is no function to prevent data divergence.
10.2 Methods to avoid cluster partitioning
A partitioned cluster is one of the worst scenarios that can occur in a clustered environment. This condition is dangerous because each node can independently run the applications and acquire the data from its own storage copy. If this occurs, you risk losing the integrity of the data on the disks and potentially experiencing data divergence.
The basic way to prevent partitioning between sites is to define multiple IP networks between the sites. Having multiple network communication paths between the sites minimizes the risk of a false fallover. To achieve true redundancy, the networks must be backed by separate network infrastructures.
If that is not possible, there might be little benefit in defining a separate network. Instead, the multiple interfaces can be aggregated into a single redundant logical interface, which adds redundancy at the communication interface level.
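On AIX, such aggregation is typically done with an EtherChannel (Link Aggregation) pseudo-adapter. The following is only a minimal sketch: it assumes ent0 and ent1 are the physical adapters to combine, and the attribute names should be verified on your AIX level before use.
# Minimal sketch: combine ent0 (primary) and ent1 (backup) into a single
# EtherChannel pseudo-adapter so that one interface failure does not remove
# the communication path. Verify the device type and attributes first.
mkdev -c adapter -s pseudo -t ibm_ech \
      -a adapter_names=ent0 \
      -a backup_adapter=ent1
# The resulting entX device is then configured with the cluster IP address
# in the usual way (for example, through smitty chinet).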
However, even with these methods, a cluster partition cannot be prevented when there is a total loss of IP connectivity between the sites.
IBM PowerHA SystemMirror 7.1.2 Enterprise Edition now provides a new function: the tie breaker disk. This new function allows PowerHA to decide which partition of the cluster should survive in case of split events.
Tie breaker disk overview
You can use the tie breaker option to specify a SCSI-3 Persistent Reserve (PR)-capable disk that is used by the split and merge policies.
A tie breaker disk is used when a group of nodes in a cluster cannot communicate with each other. This communication failure results in the cluster splitting the nodes into two or more partitions. If failure occurs because the cluster communication links are not responding, both partitions attempt to lock the tie breaker disk. The partition that acquires the tie breaker disk continues to function, while the other partition reboots.
 
Tie breaker accessibility considerations: The disk that is identified as the tie breaker must be accessible to all nodes in the cluster.
When partitions that were part of the cluster are brought back online after the communication failure, they must be able to communicate with the partition that owns the tie breaker disk. If a partition that is brought back online cannot communicate with the tie breaker disk, it does not join the cluster. The tie breaker disk is released when all nodes in the configuration rejoin the cluster.
10.3 Planning to avoid cluster partitioning
This section discusses how to plan to avoid cluster partitioning.
Stretched cluster versus linked cluster
Although split and merge events can happen in either stretched clusters or linked clusters, they are considered more critical in a linked cluster because it usually involves data replication.
Adding a tie breaker disk to a stretched cluster is generally less beneficial, because a stretched cluster has additional heartbeat paths through the SAN and the repository disk, which make a partition less likely.
In comparison, the risk of total loss of IP connectivity between sites is higher in a linked cluster. In such configurations, using a tie breaker disk configuration is suggested to prevent cluster partitions when a split event occurs.
Split handling policy
The split handling policy attribute describes the type of handling performed by PowerHA when a split occurs within a cluster. The possible choices for this attribute are described here:
None (Default) With this option, each partition that is created by the cluster split event becomes an independent cluster, and each partition can start a workload independently of the other partition.
Note that, for linked clusters, do not use this option if your environment is configured to use HyperSwap for PowerHA SystemMirror.
TieBreaker With this option, each partition attempts to acquire the tie breaker by placing a lock on the tie breaker disk. The tie breaker is a SCSI disk that is accessible to all nodes in the cluster. The partition that cannot lock the disk is rebooted.
Note that if you use this option, the merge policy configuration must also use the tie breaker option.
Merge handling policy
The merge handling policy attribute controls the behavior of the PowerHA cluster when the cluster partitions or splits, and later, when the cluster partitions are attempting to merge. The possible choices for this attribute are described here:
Majority (Default) With this option, the partition with the highest number of nodes remains online. If each partition has the same number of nodes, the partition that contains the node with the lowest node ID is chosen. The partition that does not remain online is rebooted.
This is the default policy, and it is identical to that provided by PowerHA 6.1 and prior releases.
TieBreaker With this option, each partition attempts to acquire the tie breaker by placing a lock on the tie breaker disk. The tie breaker is a SCSI disk that is accessible to all nodes in the cluster. The partition that cannot lock the disk is rebooted.
Note that if you use this option, your split policy configuration must also use the tie breaker option.
 
Important: The current release only supports the following combinations:
Split Policy: None, Merge Policy: Majority
Split Policy: TieBreaker, Merge Policy: TieBreaker
Split and merge action plan
This policy setting describes what action is taken on the nodes in the losing partitions when the split or merge happens. The only configurable method in the current release is listed here:
Reboot (Default) In this case, the AIX operating system is rebooted.
Tie breaker disk requirements
Note the following requirements for tie breaker disks:
A disk supporting SCSI-3 persistent reserve that is accessible by all nodes.
The repository disk cannot be used as a tie breaker.
You can verify whether the disks are SCSI-3 persistent reserve-capable with the lsattr -Rl hdiskX -a reserve_policy command. The PR_exclusive value should appear as shown in Example 10-1.
Example 10-1 Checking reserve capability of the disks
# lsattr -Rl hdisk9 -a reserve_policy
no_reserve
single_path
PR_exclusive
PR_shared
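Because the tie breaker must be accessible and PR-capable from every node, it is worth running the same check everywhere. The following is a minimal sketch, assuming password-less ssh between the nodes of this test cluster and that the candidate disk is hdisk9 on each of them; in practice, match the disks by PVID if the hdisk numbering differs between nodes.
# Minimal sketch: confirm on every cluster node that the candidate tie
# breaker disk accepts the PR_exclusive reserve policy (SCSI-3 PR capable).
# Node names and hdisk9 are taken from the test environment in this chapter.
for node in glvma1 glvmb1 glvmb2; do
    if ssh "$node" lsattr -Rl hdisk9 -a reserve_policy | grep -qx PR_exclusive; then
        echo "$node: hdisk9 supports PR_exclusive"
    else
        echo "$node: hdisk9 does not list PR_exclusive - choose another disk"
    fi
done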
10.4 Detailed behavior of cluster partitioning
In this section, we discuss the detailed behavior of cluster partitioning through actual testing.
10.4.1 Test environment overview
This scenario used a three-node cluster that was configured with IPv6 and IPv4 (dual stack configuration). The cluster name is glvma1_cluster. The cluster consisted of one node on siteA, and two nodes on siteB.
Two networks were configured: an XD_data network and an Ether network. The XD_data network was connected with IPv4 addresses. The Ether network was connected with IPv6 addresses.
The disks were replicated through a synchronous GLVM function. The tie breaker disk was configured as a Fibre Channel-attached DS3400 device that was accessible from both sites. Figure 10-2 shows the overview of the cluster configuration.
Figure 10-2 Test environment overview
A routed network was configured between the two sites. To test the cluster partitioning, we shut down the routers (Figure 10-3). This isolated the two sites, initiating the split event.
Figure 10-3 Disabling the routers
Next, we reactivated the routers for the two sites to re-establish the IP link. This initiated the merge event.
10.4.2 Configuring the split and merge policy
To configure the split and merge policy, we used smitty sysmirror → Custom Cluster Configuration → Cluster Nodes and Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge Policy.
Figure 10-4 shows the corresponding SMIT panel.
      Configure Cluster Split and Merge Policy for a Linked Cluster
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy None +
Merge Handling Policy Majority +
Split and Merge Action Plan Reboot +
Select Tie Breaker +
Figure 10-4 SMIT panel for split and merge policy
To configure the tie breaker disk, we pressed F4 on the Select Tie Breaker field. A list of available disks that can be configured as the tie breaker disk was displayed (Figure 10-5).
For information about how to get your disk to show up in this list, see “Tie breaker disk requirements” on page 439.
 +--------------------------------------------------------------------------+
| Select Tie Breaker |
| |
| Move cursor to desired item and press Enter. |
| |
| None |
| hdisk9 (00f70c9952af7567) on all cluster nodes |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 10-5 Tie breaker disk selection
To confirm this setting, we checked the HACMPsplitmerge ODM database (Example 10-2).
Example 10-2 HACMPsplitmerge ODM database
# odmget HACMPsplitmerge
 
HACMPsplitmerge:
id = 0
policy = "action"
value = "Reboot"
 
HACMPsplitmerge:
id = 0
policy = "split"
value = "TieBreaker"
 
HACMPsplitmerge:
id = 0
policy = "merge"
value = "TieBreaker"
 
HACMPsplitmerge:
id = 0
policy = "tiebreaker"
value = "00f70c9952af7567"
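The same information can be pulled out in a more compact form. The following is a minimal sketch that filters the ODM output shown above, assuming the object class layout stays as in Example 10-2.
# Minimal sketch: print only the configured split/merge policies and the
# tie breaker PVID by pairing the policy and value fields of HACMPsplitmerge.
odmget HACMPsplitmerge | awk '
    /policy =/ { gsub(/"/, ""); p = $3 }
    /value =/  { gsub(/"/, ""); print p ": " $3 }'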
10.4.3 Split policy: None, merge policy: Majority
Our first test was performed using the configuration shown in Figure 10-6.
      Configure Cluster Split and Merge Policy for a Linked Cluster
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy None +
Merge Handling Policy Majority +
Split and Merge Action Plan Reboot +
Select Tie Breaker +
Figure 10-6 Configuring split policy: None, merge policy: Majority
When the cluster was first activated, glvma1 acquired the resource group rg1 (Example 10-3).
Example 10-3 Resource group acquisition
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA ONLINE OFFLINE
glvmb1@siteB OFFLINE ONLINE SECONDARY
glvmb2@siteB OFFLINE OFFLINE
Testing the split
We created a split situation by disabling the routers. As soon as the routers were disabled, a split_merge_prompt split event occurred on every node. Table 10-1 on page 444 lists the events that occurred on nodes glvma1 and glvmb1.
Notice that false and mismatched events, such as site_down and node_down, occurred on each node. Because glvmb1 falsely determined that siteA was down, it acquired the resource group without it being released on glvma1.
Table 10-1 Events during the split
Cluster events during the split on glvma1:
EVENT START: split_merge_prompt split
EVENT START: node_down glvmb1
EVENT COMPLETED: node_down glvmb1 0
EVENT START: site_down siteB
EVENT COMPLETED: split_merge_prompt split 0
EVENT START: site_down_remote siteB
EVENT COMPLETED: site_down_remote siteB 0
EVENT COMPLETED: site_down siteB 0
EVENT START: node_down glvmb2
EVENT COMPLETED: node_down glvmb2 0
EVENT START: rg_move_release glvma1 1
EVENT START: rg_move glvma1 1 RELEASE
EVENT COMPLETED: rg_move glvma1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvma1 1 0
EVENT START: rg_move_fence glvma1 1
EVENT COMPLETED: rg_move_fence glvma1 1 0
EVENT START: node_down_complete glvmb1
EVENT COMPLETED: node_down_complete glvmb1 0
EVENT START: node_down_complete glvmb2
EVENT COMPLETED: node_down_complete glvmb2 0
Cluster events during the split on glvmb1:
EVENT START: split_merge_prompt split
EVENT COMPLETED: split_merge_prompt split 0
EVENT START: site_down siteA
EVENT START: site_down_remote siteA
EVENT COMPLETED: site_down_remote siteA 0
EVENT COMPLETED: site_down siteA 0
EVENT START: node_down glvma1
EVENT COMPLETED: node_down glvma1 0
EVENT START: rg_move_release glvmb1 1
EVENT START: rg_move glvmb1 1 RELEASE
EVENT COMPLETED: rg_move glvmb1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_release glvmb1 1
EVENT START: rg_move glvmb1 1 RELEASE
EVENT COMPLETED: rg_move glvmb1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_acquire glvmb1 1
EVENT START: rg_move glvmb1 1 ACQUIRE
EVENT START: acquire_takeover_addr
EVENT COMPLETED: acquire_takeover_addr 0
EVENT COMPLETED: rg_move glvmb1 1 ACQUIRE 0
EVENT COMPLETED: rg_move_acquire glvmb1 1 0
EVENT START: rg_move_complete glvmb1 1
EVENT COMPLETED: rg_move_complete glvmb1 1 0
EVENT START: node_down_complete glvma1
EVENT COMPLETED: node_down_complete glvma1 0
The error log on each node showed that a split occurred (Example 10-4).
Example 10-4 Split errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
4BDDFBCC 1207041112 I S ConfigRM The operational quorum state of the acti
A098BF90 1207041112 P S ConfigRM The operational quorum state of the acti
77A1A9A4 1207041112 I O ConfigRM ConfigRM received Site Split event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: CONFIGRM_HASQUORUM_
IDENTIFIER: 4BDDFBCC
 
Date/Time: Fri Dec 7 04:11:13 2012
Sequence Number: 883
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
The operational quorum state of the active peer domain has changed to HAS_QUORUM.
In this state, cluster resources may be recovered and controlled as needed by
management applications.
 
Probable Causes
One or more nodes have come online in the peer domain.
 
User Causes
One or more nodes have come online in the peer domain.
 
Recommended Actions
None
 
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.110,18993
ERROR ID
REFERENCE CODE
---------------------------------------------------------------------------
LABEL: CONFIGRM_PENDINGQUO
IDENTIFIER: A098BF90
 
Date/Time: Fri Dec 7 04:11:13 2012
Sequence Number: 882
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: PERM
WPAR: Global
Resource Name: ConfigRM
 
Description
The operational quorum state of the active peer domain has changed to PENDING_QUORUM.
This state usually indicates that exactly half of the nodes that are defined in the
peer domain are online. In this state cluster resources cannot be recovered although
none will be stopped explicitly.
 
Probable Causes
One or more nodes in the active peer domain have failed.
One or more nodes in the active peer domain have been taken offline by the user.
A network failure is disrupted communication between the cluster nodes.
 
Failure Causes
One or more nodes in the active peer domain have failed.
One or more nodes in the active peer domain have been taken offline by the user.
A network failure is disrupted communication between the cluster nodes.
 
Recommended Actions
Ensure that more than half of the nodes of the domain are online.
Ensure that the network that is used for communication between the nodes is functioning correctly.
Ensure that the active tie breaker device is operational and if it set to
'Operator' then resolve the tie situation by granting ownership to one of
the active sub-domains.
 
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.110,18997
ERROR ID
REFERENCE CODE
 
---------------------------------------------------------------------------
LABEL: CONFIGRM_SITE_SPLIT
IDENTIFIER: 77A1A9A4
 
Date/Time: Fri Dec 7 04:11:12 2012
Sequence Number: 880
Machine Id: 00F70C994C00
Node Id: glvma1
Class: O
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
ConfigRM received Site Split event notification
 
Probable Causes
Networks between sites may have been disconnected
 
Failure Causes
Networks between sites may have been disconnected
 
Recommended Actions
Check the network connectivity between sites
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,1398
ERROR ID
REFERENCE CODE
DIAGNOSTIC EXPLANATION
At this point, the resource group was acquired on both siteA and siteB, and the volume group was activated on both nodes (Table 10-2). The cluster state was no longer synchronized between the sites, and each node was able to write to its own copy of the data.
Table 10-2 Resource group state after the split
Resource group state in glvma1:
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA ONLINE OFFLINE
glvmb1@siteB OFFLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
# lspv
hdisk0 00f70c99e24ff9ff altinst_rootvg
hdisk1 00f70c9901259917 rootvg active
hdisk2 00f70c992405114b caavg_private active
hdisk3 00f70c990580a411 glvmvg active
hdisk4 00f70c990580a44c glvmvg active
hdisk5 00f70c990580a486 glvmvg active
hdisk6 00f6f5d005808d31 glvmvg active
hdisk7 00f6f5d005808d6b glvmvg active
hdisk8 00f6f5d005808da5 glvmvg active
hdisk9 00f70c9952af7567 None
 
Resource group state in glvmb1:
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA OFFLINE OFFLINE
glvmb1@siteB ONLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
# lspv
hdisk0 00f6f5d0e24f6303 altinst_rootvg
hdisk1 00f6f5d0012596cc rootvg active
hdisk2 00f6f5d023ec219d caavg_private active
hdisk3 00f6f5d005808d31 glvmvg active
hdisk4 00f6f5d005808d6b glvmvg active
hdisk5 00f6f5d005808da5 glvmvg active
hdisk6 00f70c990580a411 glvmvg active
hdisk7 00f70c990580a44c glvmvg active
hdisk8 00f70c990580a486 glvmvg active
hdisk9 00f70c9952af7567 None
 
In summary, the cluster partition was not prevented during the total loss of IP connectivity. This is the expected behavior when the split handling policy is set to None.
Testing the merge
Next, we created a merge situation by reactivating the routers. Because siteB was configured with two nodes, the expected behavior was that the resource group would remain online on siteB.
Shortly after the routers had been reactivated, a split_merge_prompt merge event occurred on every node. Table 10-3 shows the events that occurred on nodes glvma1 and glvmb1.
Table 10-3 Events during the merge
Cluster events during the merge on glvma1:
EVENT START: split_merge_prompt merge
EVENT COMPLETED: split_merge_prompt merge 0
Cluster events during the merge on glvmb1:
EVENT START: split_merge_prompt merge
EVENT COMPLETED: split_merge_prompt merge 0
The error log on each node indicated that a merge occurred (Example 10-5).
Example 10-5 Merge errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
1BD32427 1207042512 I O ConfigRM ConfigRM received Site Merge event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: CONFIGRM_SITE_MERGE
IDENTIFIER: 1BD32427
 
Date/Time: Fri Dec 7 04:25:41 2012
Sequence Number: 764
Machine Id: 00F6F5D04C00
Node Id: glvmb2
Class: O
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
ConfigRM received Site Merge event notification
 
Probable Causes
Networks between sites may have been reconnected
 
Failure Causes
Networks between sites may have been reconnected
 
Recommended Actions
Verify the network connection between sites
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,1436
ERROR ID
REFERENCE CODE
DIAGNOSTIC EXPLANATION
Shortly after these events, glvma1 rebooted. The errlog indicated the cause of the reboot (Example 10-6).
Example 10-6 Merge errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
AFA89905 1207042712 I O cthags Group Services daemon started
DE84C4DB 1207042612 I O ConfigRM IBM.ConfigRM daemon has started.
A6DF45AA 1207042612 I O RMCdaemon The daemon is started.
2BFA76F6 1207042612 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1207042612 T O errdemon ERROR LOGGING TURNED ON
24126A2B 1207042512 P O cthags Group Services daemon exit to merge/spli
F0851662 1207042512 I S ConfigRM The sub-domain containing the local node
1BD32427 1207042512 I O ConfigRM ConfigRM received Site Merge event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: GS_SITE_DISSOLVE_ER
IDENTIFIER: 24126A2B
 
Date/Time: Fri Dec 7 04:25:44 2012
Sequence Number: 886
Machine Id: 00F70C994C00
Node Id: glvma1
Class: O
Type: PERM
WPAR: Global
Resource Name: cthags
 
Description
Group Services daemon exit to merge/split sites
 
Probable Causes
Network between two sites has repaired
 
Failure Causes
CAA Services has been partitioned and/or merged.
 
Recommended Actions
Check the CAA policies.
Verify that CAA has been restarted
Call IBM Service if problem persists
 
Detail Data
DETECTING MODULE
RSCT,NS.C,1.107.1.68,4894
ERROR ID
6fca2Y.MMPkE/kn8.rE4e.1...................
REFERENCE CODE
DIAGNOSTIC EXPLANATION
NS::Ack(): The master requests to dissolve my domain because of the merge with other domain 65535.Ni
---------------------------------------------------------------------------
LABEL: CONFIGRM_MERGE_ST
IDENTIFIER: F0851662
 
Date/Time: Fri Dec 7 04:25:44 2012
Sequence Number: 885
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
The sub-domain containing the local node is being dissolved because another
sub-domain has been detected that takes precedence over it. Group services
will be ended on each node of the local sub-domain which will cause the
configuration manager daemon (IBM.ConfigRMd) to force the node offline and
then bring it back online in the surviving domain.
 
Probable Causes
A merge of two sub-domain is probably caused by a network outage being
repaired so that the nodes of the two sub-domains can now communicate.
 
User Causes
A merge of two sub-domain is probably caused by a network outage being
repaired so that the nodes of the two sub-domains can now communicate.
 
Recommended Actions
No action is necessary since the nodes will be automatically synchronized
and brought online in the surviving domain.
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,919
ERROR ID
REFERENCE CODE
In summary, the resource group stayed online on siteB, and siteA was brought offline during the merge event. This is the expected behavior when the merge policy is set to Majority, because siteB had the majority of the nodes in the cluster.
10.4.4 Split policy: TieBreaker, merge policy: TieBreaker
Then we tested with the configuration shown in Figure 10-7.
Configure Cluster Split and Merge Policy for a Linked Cluster
 
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Split Handling Policy TieBreaker +
Merge Handling Policy TieBreaker +
Split and Merge Action Plan Reboot +
Select Tie Breaker (00f70c9952af7567) +
Figure 10-7 Configuring split policy: TieBreaker, merge policy: TieBreaker
The disk with PVID 00f70c9952af7567 was a DS3400 disk configured on all nodes in the cluster. On the DS3400, the LUN is named glvm_tiebreaker. Example 10-7 shows some of the tie breaker disk's properties.
Example 10-7 Tie breaker disk names
# lspv | grep 00f70c9952af7567
hdisk9 00f70c9952af7567 None

# lsdev -Cc disk -l hdisk9
hdisk9 Available 80-T1-01 MPIO Other DS3K Array Disk

# mpio_get_config -l hdisk9
Storage Subsystem Name = 'DS3400POK-1'
hdisk# LUN # Ownership User Label
hdisk9 0 A (preferred) glvm_tiebreaker
When the cluster was first activated, glvma1 acquired the resource group rg1 (Example 10-8).
Example 10-8 Resource group acquisition
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA ONLINE OFFLINE
glvmb1@siteB OFFLINE ONLINE SECONDARY
glvmb2@siteB OFFLINE OFFLINE
To check whether there was a persistent reserve on the tie breaker disk, we used the devrsrv command. The output after the cluster activation is shown in Example 10-9.
Notice that the ODM Reservation Policy displayed PR EXCLUSIVE, and the Device Reservation State displayed NO RESERVE.
Example 10-9 The devrsrv command output after cluster activation
# devrsrv -c query -l hdisk9
Device Reservation State Information
==================================================
Device Name : hdisk9
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 5071576869747560161
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0xd SIP_C ATP_C PTPL_C
PR Capabilities Byte[3] : 0x0
PR Types Supported : NOT VALID
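During the following split and merge tests, it can be useful to watch when a reservation appears on, or is released from, the tie breaker disk. The following is a minimal sketch that polls the state shown in Example 10-9, assuming hdisk9 is the tie breaker.
# Minimal sketch: poll the tie breaker reservation state so the moment a
# partition locks or releases the disk is visible during the tests.
while true; do
    printf '%s  ' "$(date)"
    devrsrv -c query -l hdisk9 | grep 'Device Reservation State'
    sleep 10
done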
We also checked whether there was a persistent reserve on the DS3400 after the cluster was activated. To check the persistent reserve, we right-clicked the storage subsystem in the Storage Manager GUI and selected Execute Script (Figure 10-8).
Figure 10-8 Execute Script from the Storage Manager GUI
Then we executed show logicalDrives [logicalDrivelabel] reservations (Figure 10-9).
Figure 10-9 Checking the persistent reserve
From the output, we confirmed that no persistent reserve existed on this LUN.
 
Checking persistent reservation: The method of checking the persistent reservation differs depending on the storage subsystem and device driver that you are using.
Testing the split
Next, we created a split situation by disabling the routers. Shortly after the routers were disabled, a split_merge_prompt split event occurred on every node. Table 10-4 on page 453 shows the events that occurred on nodes glvma1 and glvmb1 after the split.
Notice that the split_merge_prompt event is the only event on glvma1, because a reboot occurred shortly after the event happened. In contrast to 10.4.3, “Split policy: None, merge policy: Majority” on page 442, there are no mismatched or false events, because all nodes on siteA were rebooted.
Table 10-4 Events during the split
Cluster events during the split on glvma1:
EVENT START: split_merge_prompt split
EVENT COMPLETED: split_merge_prompt split 0
Cluster events during the split on glvmb1:
EVENT START: split_merge_prompt split
EVENT COMPLETED: split_merge_prompt split 0
EVENT START: site_down_remote siteA
EVENT COMPLETED: site_down_remote siteA 0
EVENT COMPLETED: site_down siteA 0
EVENT START: node_down glvma1
EVENT COMPLETED: node_down glvma1 0
EVENT START: rg_move_release glvmb1 1
EVENT START: rg_move glvmb1 1 RELEASE
EVENT COMPLETED: rg_move glvmb1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_release glvmb1 1
EVENT START: rg_move glvmb1 1 RELEASE
EVENT COMPLETED: rg_move glvmb1 1 RELEASE 0
EVENT COMPLETED: rg_move_release glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_fence glvmb1 1
EVENT COMPLETED: rg_move_fence glvmb1 1 0
EVENT START: rg_move_acquire glvmb1 1
EVENT START: rg_move glvmb1 1 ACQUIRE
EVENT START: acquire_takeover_addr
EVENT COMPLETED: acquire_takeover_addr 0
EVENT COMPLETED: rg_move glvmb1 1 ACQUIRE 0
EVENT COMPLETED: rg_move_acquire glvmb1 1 0
EVENT START: rg_move_complete glvmb1 1
EVENT COMPLETED: rg_move_complete glvmb1 1 0
EVENT START: node_down_complete glvma1
EVENT COMPLETED: node_down_complete glvma1 0
Additionally, beyond the errlog entries shown in Example 10-4 on page 444, failed attempts to access the tie breaker disk (hdisk9) can be observed on glvma1 before the reboot (Example 10-10).
Example 10-10 Split errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
AFA89905 1207053112 I O cthags Group Services daemon started
DE84C4DB 1207052812 I O ConfigRM IBM.ConfigRM daemon has started.
A6DF45AA 1207052812 I O RMCdaemon The daemon is started.
2BFA76F6 1207052812 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1207052812 T O errdemon ERROR LOGGING TURNED ON
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207052712 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
A098BF90 1207052712 P S ConfigRM The operational quorum state of the acti
77A1A9A4 1207052712 I O ConfigRM ConfigRM received Site Split event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: SC_DISK_ERR9
IDENTIFIER: B0EE9AF5
 
Date/Time: Fri Dec 7 05:27:49 2012
Sequence Number: 925
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: TEMP
WPAR: Global
Resource Name: hdisk9
 
Description
REQUESTED OPERATION CANNOT BE PERFORMED
 
Probable Causes
MEDIA
 
User Causes
MEDIA DEFECTIVE
RESOURCE NOT AVAILABLE
 
Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES
 
Failure Causes
MEDIA
DISK DRIVE
 
Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES
 
Detail Data
PATH ID
1
SENSE DATA
0600 1A00 7F00 FF00 0000 0000 0000 0000 0000 0000 0000 0000 0118 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0000 0000 0093 0000
0000 009C 009C
 
Tips: We can determine that there is a reserve conflict from the sense data of the SC_DISK_ERR errlog entry. Refer to the following layout:
SENSE DATA LAYOUT
LL00 CCCC CCCC CCCC CCCC CCCC CCCC CCCC CCCC RRRR RRRR RRRR VVSS AARR DDDD KKDD
In Example 10-10 on page 453, notice VV=01, SS=18. This indicates the following:
VV 01 indicates that the SCSI status field (SS) is valid.
SS 18 indicates that the SCSI device is reserved by another host.
For details about SCSI3 protocol errlog layout, refer to PCI Fibre Channel Adapter, SCSI-3 Protocol (Disk, CD-ROM, Read/Write Optical Device) Error Log Sense Information in Fibre Channel Planning and Integration: User’s Guide and Service Information at:
http://publibfp.boulder.ibm.com/epubs/pdf/c2343293.pdf
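Extracting those two bytes directly from the error log saves reading the raw hex dump. The following is a minimal sketch that assumes the word positions follow the layout above and uses the error identifier from Example 10-10.
# Minimal sketch: pull the VV/SS bytes (13th word of the first sense data
# row) from the SC_DISK_ERR9 entries; SS=18 indicates a reservation conflict.
errpt -a -j B0EE9AF5 | awk '
    /^SENSE DATA/ { getline; print "VV=" substr($13, 1, 2) " SS=" substr($13, 3, 2) }'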
The devrsrv command shows a persistent reservation on the disk (Example 10-11). The Device Reservation State now shows PR EXCLUSIVE. The node that acquired the tie breaker can be determined by comparing the ODM PR Key Value (which is unique on each cluster node) with the PR Holder Key Value.
Example 10-11 The devrsrv command after the split event
# devrsrv -c query -l hdisk9
Device Reservation State Information
==================================================
Device Name : hdisk9
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 5071576869747560161
Device Reservation State : PR EXCLUSIVE
PR Generation Value : 1274
PR Type : PR_WE_RO (WRITE EXCLUSIVE, REGISTRANTS ONLY)
PR Holder Key Value : 5071576869747560161
Registered PR Keys : 5071576869747560161 5071576869747560161
PR Capabilities Byte[2] : 0xd SIP_C ATP_C PTPL_C
PR Capabilities Byte[3] : 0x1 PTPL_A
PR Types Supported : NOT VALID
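A minimal sketch of that key comparison follows, assuming hdisk9 is the tie breaker and using only the fields shown in Example 10-11.
# Minimal sketch: report whether the local node holds the tie breaker
# reservation by comparing the local ODM PR key with the PR holder key.
ODM_KEY=$(devrsrv -c query -l hdisk9 | awk -F: '/ODM PR Key Value/ {gsub(/ /,"",$2); print $2}')
HOLDER=$(devrsrv -c query -l hdisk9 | awk -F: '/PR Holder Key Value/ {gsub(/ /,"",$2); print $2}')
if [ -n "$HOLDER" ] && [ "$ODM_KEY" = "$HOLDER" ]; then
    echo "This node holds the tie breaker reservation"
else
    echo "Tie breaker is not held by this node (holder key: ${HOLDER:-none})"
fi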
Also, using the Storage Manager GUI, we now observed the persistent reserve on hdisk9 (Figure 10-10). As discussed in “Tie breaker disk overview” on page 437, this reservation is designed to last until every node rejoins the cluster.
Figure 10-10 Persistent reserve on the tie breaker disk
Because glvma1 rebooted, glvmb1 is now the only node that has access to the data (Example 10-12).
Example 10-12 Resource group state on glvmb1
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA OFFLINE OFFLINE
glvmb1@siteB ONLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
 
# lspv
hdisk0 00f6f5d0e24f6303 altinst_rootvg
hdisk1 00f6f5d0012596cc rootvg active
hdisk2 00f6f5d023ec219d caavg_private active
hdisk3 00f6f5d005808d31 glvmvg active
hdisk4 00f6f5d005808d6b glvmvg active
hdisk5 00f6f5d005808da5 glvmvg active
hdisk6 00f70c990580a411 glvmvg active
hdisk7 00f70c990580a44c glvmvg active
hdisk8 00f70c990580a486 glvmvg active
hdisk9 00f70c9952af7567 None
In summary, this scenario shows that cluster partitioning was prevented. However, although the resource group was initially started on siteA, siteB acquired the resource group, resulting in an unplanned resource group movement. Because the TieBreaker split handling policy is a “winner takes all” policy, this outcome is working as designed.
Testing the merge
After the reboot completed on glvma1, we reactivated the cluster services with smitty clstart. Because IP connectivity had not been restored between the sites, both nodes acquired the resource group at the same time (Table 10-5).
Table 10-5 Resource group state before the merge
Resource group state in glvma1:
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA ONLINE OFFLINE
glvmb1@siteB OFFLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
# lspv
hdisk0 00f70c99e24ff9ff altinst_rootvg
hdisk1 00f70c9901259917 rootvg active
hdisk2 00f70c992405114b caavg_private active
hdisk3 00f70c990580a411 glvmvg active
hdisk4 00f70c990580a44c glvmvg active
hdisk5 00f70c990580a486 glvmvg active
hdisk6 00f6f5d005808d31 glvmvg active
hdisk7 00f6f5d005808d6b glvmvg active
hdisk8 00f6f5d005808da5 glvmvg active
hdisk9 00f70c9952af7567 None
 
Resource group state in glvmb1:
# /usr/es/sbin/cluster/utilities/clRGinfo -p
 
Cluster Name: glvma1_cluster
 
Resource Group Name: rg1
Node Primary State Secondary State
---------------------------- ---------------
glvma1@siteA OFFLINE OFFLINE
glvmb1@siteB ONLINE OFFLINE
glvmb2@siteB OFFLINE OFFLINE
 
# lspv
hdisk0 00f6f5d0e24f6303 altinst_rootvg
hdisk1 00f6f5d0012596cc rootvg active
hdisk2 00f6f5d023ec219d caavg_private active
hdisk3 00f6f5d005808d31 glvmvg active
hdisk4 00f6f5d005808d6b glvmvg active
hdisk5 00f6f5d005808da5 glvmvg active
hdisk6 00f70c990580a411 glvmvg active
hdisk7 00f70c990580a44c glvmvg active
hdisk8 00f70c990580a486 glvmvg active
hdisk9 00f70c9952af7567 None
 
 
Important: As the previous test result shows, the TieBreaker split handling policy cannot prevent cluster partitioning if cluster services are restarted before IP connectivity between the sites is regained. In real-world scenarios, confirm that IP connectivity has been restored between the two sites before you perform additional operations.
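A simple way to honor this recommendation is to verify reachability of the surviving site before restarting cluster services on the rebooted node. The following is a minimal sketch, assuming glvmb1 is a node at the surviving site.
# Minimal sketch: check IP connectivity to the surviving site before
# restarting cluster services on the node that lost the tie breaker.
REMOTE=glvmb1
if ping -c 3 "$REMOTE" >/dev/null 2>&1; then
    echo "$REMOTE is reachable; it is safe to start cluster services"
else
    echo "$REMOTE is not reachable; starting cluster services now would partition the cluster again"
fi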
Upon reactivating the routers, a split_merge_prompt merge event occurred on every node. Table 10-6 shows the events that occurred on nodes glvma1 and glvmb1 after the merge.
Table 10-6 Events during the merge
Cluster events during the merge on glvma1:
EVENT START: split_merge_prompt merge
EVENT COMPLETED: split_merge_prompt merge 0
Cluster events during the merge on glvmb1:
EVENT START: split_merge_prompt merge
EVENT COMPLETED: split_merge_prompt merge 0
Shortly after these events, node glvma1 rebooted. The errlog is shown in Example 10-13 on page 458. Observe that after the merge events, there were several failed attempts to access the tie breaker disk, hdisk9, before the reboot was initiated.
Example 10-13 Merge errlog
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
AFA89905 1207054812 I O cthags Group Services daemon started
DE84C4DB 1207054712 I O ConfigRM IBM.ConfigRM daemon has started.
A6DF45AA 1207054712 I O RMCdaemon The daemon is started.
2BFA76F6 1207054712 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1207054712 T O errdemon ERROR LOGGING TURNED ON
24126A2B 1207054612 P O cthags Group Services daemon exit to merge/spli
F0851662 1207054612 I S ConfigRM The sub-domain containing the local node
65DE6DE3 1207054612 P S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
B0EE9AF5 1207054612 T S hdisk9 REQUESTED OPERATION CANNOT BE PERFORMED
1BD32427 1207054512 I O ConfigRM ConfigRM received Site Merge event notif
 
# errpt -a
---------------------------------------------------------------------------
LABEL: GS_SITE_DISSOLVE_ER
IDENTIFIER: 24126A2B
 
Date/Time: Fri Dec 7 05:46:29 2012
Sequence Number: 1652
Machine Id: 00F70C994C00
Node Id: glvma1
Class: O
Type: PERM
WPAR: Global
Resource Name: cthags
 
Description
Group Services daemon exit to merge/split sites
 
Probable Causes
Network between two sites has repaired
 
Failure Causes
CAA Services has been partitioned and/or merged.
 
Recommended Actions
Check the CAA policies.
Verify that CAA has been restarted
Call IBM Service if problem persists
 
Detail Data
DETECTING MODULE
RSCT,NS.C,1.107.1.68,4894
ERROR ID
6fca2Y.3YQkE/f1o/rE4e.1...................
REFERENCE CODE
DIAGNOSTIC EXPLANATION
NS::Ack(): The master requests to dissolve my domain because of the merge with other dom
ain 65535.Ni
---------------------------------------------------------------------------
LABEL: CONFIGRM_MERGE_ST
IDENTIFIER: F0851662
 
Date/Time: Fri Dec 7 05:46:29 2012
Sequence Number: 1651
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
The sub-domain containing the local node is being dissolved because another
sub-domain has been detected that takes precedence over it. Group services
will be ended on each node of the local sub-domain which will cause the
configuration manager daemon (IBM.ConfigRMd) to force the node offline and
then bring it back online in the surviving domain.
 
Probable Causes
A merge of two sub-domain is probably caused by a network outage being
repaired so that the nodes of the two sub-domains can now communicate.
 
User Causes
A merge of two sub-domain is probably caused by a network outage being
repaired so that the nodes of the two sub-domains can now communicate.
 
Recommended Actions
No action is necessary since the nodes will be automatically synchronized
and brought online in the surviving domain.
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,919
ERROR ID
REFERENCE CODE
---------------------------------------------------------------------------
LABEL: SC_DISK_ERR10
IDENTIFIER: 65DE6DE3
 
Date/Time: Fri Dec 7 05:46:25 2012
Sequence Number: 1650
Machine Id: 00F70C994C00
Node Id: glvma1
Class: S
Type: PERM
WPAR: Global
Resource Name: hdisk9
 
Description
REQUESTED OPERATION CANNOT BE PERFORMED
 
Probable Causes
DASD DEVICE
 
User Causes
RESOURCE NOT AVAILABLE
UNAUTHORIZED ACCESS ATTEMPTED
 
Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES
 
Failure Causes
MEDIA
DISK DRIVE
 
Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES
 
Detail Data
PATH ID
1
SENSE DATA
0A00 2A00 0000 0000 0000 0804 0000 0000 0000 0000 0000 0000 0118 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0004 0200 0000 0000 0000 0000 0000 0000 0000 0093 0000
0000 003D 0017
---------------------------------------------------------------------------
LABEL: CONFIGRM_SITE_MERGE
IDENTIFIER: 1BD32427
 
Date/Time: Fri Dec 7 05:45:26 2012
Sequence Number: 1595
Machine Id: 00F70C994C00
Node Id: glvma1
Class: O
Type: INFO
WPAR: Global
Resource Name: ConfigRM
 
Description
ConfigRM received Site Merge event notification
 
Probable Causes
Networks between sites may have been reconnected
 
Failure Causes
Networks between sites may have been reconnected
 
Recommended Actions
Verify the network connection between sites
 
Detail Data
DETECTING MODULE
RSCT,ConfigRMGroup.C,1.331,1436
ERROR ID
REFERENCE CODE
DIAGNOSTIC EXPLANATION
 
---------------------------------------------------------------------------
After the merge event, observe from the devrsrv command output (Example 10-14) and from the Storage Manager GUI (Figure 10-11) that the persistent reservation has been released.
Example 10-14 Devrsrv command output after merge event
# devrsrv -c query -l hdisk9
Device Reservation State Information
==================================================
Device Name : hdisk9
Device Open On Current Host? : NO
ODM Reservation Policy : PR EXCLUSIVE
ODM PR Key Value : 5071576869747560161
Device Reservation State : NO RESERVE
Registered PR Keys : No Keys Registered
PR Capabilities Byte[2] : 0xd SIP_C ATP_C PTPL_C
PR Capabilities Byte[3] : 0x0
PR Types Supported : NOT VALID
Figure 10-11 Persistent reserve release
 
Important: Although there are several methods to manually release the persistent reserve, do not perform a reserve operation outside of PowerHA. If the persistent reserve leads to a problem, contact IBM support first.
In summary, from this test scenario we can observe that the tie breaker disk prevents other nodes from acquiring the resources after a split event occurs.