Correcting a DRBD split brain

One looming danger when running any replication system is that of node status conflicts. These arise when more than one node has been primary and we then want to reestablish the previous mirror state. This can happen in many ways, but a common scenario occurs when the existing primary node experiences a sudden failure and the remaining secondary node is promoted to primary status.

Once we repair the old primary node, we can't simply reattach it to the DRBD network and expect successful synchronization. When the last known status of each node is primary, DRBD will not resolve the conflict automatically. It is our job to choose the best primary from the available nodes and manually reattach the other one.

In this recipe, we'll explore the steps necessary to reattach a malfunctioning node to an existing DRBD architecture. We can't have a highly available PostgreSQL cluster with only one functional node.

Getting ready

Since we're working with DRBD and need a fully established mirror, please follow the steps in all the recipes up to Adding block-level replication before continuing. In addition, we need to simulate a split brain. A very easy way to do this is to put both nodes in the primary state while they are disconnected from each other.

Assuming that we have nodes pg1 and pg2, where pg1 is the current primary node, follow these instructions as the root user to cause a split brain:

  1. On both nodes, disconnect from DRBD with this command:
    drbdadm disconnect pg
    
  2. On pg2, execute this command to force it into primary status:
    drbdadm primary --force pg
    

If we were to use drbdadm to attempt to connect the nodes now, we would see the following message in the system logs:

Split-Brain detected but unresolved, dropping connection!
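
Rather than hunting through the logs, we can also confirm the condition from the command line. The following is a minimal check using the pg resource from the earlier recipes; drbdadm cstate reports the current connection state, and the kernel ring buffer should contain the message above:

    # Report the connection state of the pg resource on each node.
    drbdadm cstate pg

    # Search recent kernel messages for the split-brain warning.
    dmesg | grep -i 'split-brain'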

How to do it...

Follow these instructions as the root user to repair a split-brain scenario:

  1. First, decide which node should be the new primary. This should be relatively easy, since some event likely precipitated the node mismatch. For the remainder of this recipe, we will assume pg2 should be the new primary node.
  2. Prepare each server by ensuring that each is disconnected from the other:
    drbdadm disconnect pg
    
  3. Disable the VG_POSTGRES volume with vgchange on pg1:
    vgchange -a n VG_POSTGRES
    
  4. Use drbdadm to downgrade pg1 to secondary status:
    drbdadm secondary pg
    
  5. Execute this command on pg1 to connect while discarding metadata:
    drbdadm connect --discard-my-data pg
    
  6. Execute this command on pg2 to connect to DRBD:
    drbdadm connect pg
    

How it works...

The first step is clearly the most critical. We need to determine which node has the most recent valid data. In almost all cases, there should be sufficient logs to make this determination. However, in some network disruption scenarios coupled with automated failover solutions, this may not be obvious. Unfortunately, the possible scenarios are too varied to address adequately in a simple guide.
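
When the system logs are inconclusive, DRBD itself offers some evidence. The following is a minimal sketch using the pg resource; each node reports its own view of its disk state, and its recent DRBD kernel messages often show when it was last promoted or disconnected:

    # Show this node's disk state; run on both pg1 and pg2.
    drbdadm dstate pg

    # Review the most recent DRBD activity in the kernel log.
    dmesg | grep -i drbd | tail -n 20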

Tip

If you are unsure of how to continue following an extremely complicated failure scenario, we strongly recommend contacting Linbit, which maintains the DRBD software. Their support information is available at this URL:

http://www.linbit.com/en/products-and-services/drbd-support

For our example, we manually promoted the pg2 node, so it should be the new primary. With that in mind, there are many states DRBD could be in right now, and we want one in particular: StandAlone. By disconnecting both nodes, we don't have to worry about aborted or premature connection attempts disrupting our progress. We want both nodes to report StandAlone as the connection state (cs) in /proc/drbd.
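
A quick way to verify this on each node is to check the cs field directly:

    # Both pg1 and pg2 should report StandAlone before we proceed.
    grep cs: /proc/drbd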


Our next step is actually related to LVM. If DRBD is primary on a node, the second LVM layer is probably active as well. Since LVM uses the underlying DRBD device, we can't demote this node to secondary status until we use vgchange to set the active (-a) state of VG_POSTGRES to no (n).
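
Before demoting, it is worth confirming that nothing in VG_POSTGRES is still active on pg1. The following is a minimal sketch; the lv_attr column reported by lvs indicates whether each logical volume remains active:

    # Deactivate the volume group, then verify its volumes are inactive.
    vgchange -a n VG_POSTGRES
    lvs -o lv_name,lv_attr VG_POSTGRES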

Given that there are no other elements connected to /dev/drbd0, we can set its status to secondary with drbdadm. While in the secondary state, we can attempt to connect to the DRBD network with drbdadm connect. Since both nodes were primary at some point, each maintained its own map of modified blocks, and these maps will not match. When that happens, DRBD refuses to connect to the network and reverts to the StandAlone status.

To prevent that, we add --discard-my-data to the connect operation. This option acknowledges the situation and tells the secondary node to ignore its own change map in favor of whatever the primary node contains. If the secondary node is too far out of date for the change map to be useful, DRBD will simply resynchronize all data on the device.

Of course, none of this will happen until we invoke drbdadm connect from the new primary node. We do this last because, until the primary connects, we can still change our minds and abort the process. Had we connected the primary first, the secondary's existing change map would have been discarded and resynchronization would already be underway as soon as we connected it.
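
Once both nodes are connected, we can follow the resynchronization from either node. A minimal sketch:

    # /proc/drbd reports synchronization progress while the resync runs.
    watch -n 5 cat /proc/drbd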
