Adding DRBD to cluster management

DRBD is one of the most difficult resources to manage with Pacemaker. Unlike a regular service, which is started or stopped depending on where it should be active, DRBD runs on every node at all times. The only thing that changes between two nodes running DRBD is the Primary or Secondary state ascribed to each.

Due to this complication, DRBD is not one resource, but two:

  • A DRBD resource to manage starting and stopping DRBD
  • A master/slave resource to control which node acts as the Primary

In this recipe, we'll allocate both of these resources so that Pacemaker can manage DRBD properly.

Getting ready

As we're continuing to configure Pacemaker, make sure you've followed all previous recipes.
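Before continuing, it can help to confirm that the command-line tools this recipe relies on are on the PATH. The following is a minimal sketch, not part of the original recipe: the helper name missing_tools is our own, and the tool list (crm and drbdadm) is an assumption based on the commands used in this and earlier recipes.

```shell
#!/bin/sh
# Print the name of every tool given as an argument that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report what is missing, or confirm that everything is in place.
missing=$(missing_tools crm drbdadm)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```

If anything is reported as missing, revisit the earlier recipes before creating any resources.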

How to do it...

In the previous chapter, we created a DRBD resource named pg. With this in mind, follow these steps as the root user to add DRBD to Pacemaker:

  1. Create a basic Pacemaker primitive for DRBD with this command:
    crm configure primitive drbd_pg ocf:linbit:drbd \
        params drbd_resource="pg" \
        op monitor interval="15" role="Master" \
        op monitor interval="20" role="Slave" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="120"
    
  2. Create a master/slave resource with this command:
    crm configure ms ms_drbd_pg drbd_pg \
        meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"
    
  3. Clean up any errors that might have accumulated with crm:
    crm resource cleanup drbd_pg
    
  4. Display the status of our new resources with crm:
    crm resource status
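The four steps can also be collected into one script that stops at the first failure. This is only a sketch under our own assumptions: the run helper, the add_drbd_to_pacemaker function, and the DRY_RUN variable are names we made up for illustration. By default the script only prints the commands it would run, so it is safe to try on a machine without a cluster.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # The shell has already removed the quotes, so values print bare.
        echo "$*"
    else
        "$@"
    fi
}

# Run the recipe's four steps in order, stopping at the first failure.
add_drbd_to_pacemaker() {
    run crm configure primitive drbd_pg ocf:linbit:drbd \
        params drbd_resource="pg" \
        op monitor interval="15" role="Master" \
        op monitor interval="20" role="Slave" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="120" || return 1
    run crm configure ms ms_drbd_pg drbd_pg \
        meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true" || return 1
    run crm resource cleanup drbd_pg || return 1
    run crm resource status
}

add_drbd_to_pacemaker
```

Review the printed commands first, then rerun as root with DRY_RUN=0 to apply them.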
    

How it works...

Most of the resources we create in subsequent sections are called primitives. These should be considered the base resource element that Pacemaker controls, as they have a one-to-one relationship with each service. The first of these we create is for our DRBD service.

When creating new configuration entries with crm, we declare them with configure primitive, and then we must supply a name. To keep things simple, we named this resource drbd_pg. After the name, we must supply a resource agent to actually manage this service. Pacemaker is shipped with several, but we are specifically interested in the ocf:linbit:drbd agent, as it was written by the makers of DRBD themselves.

Next, we configure the resource agent. The params keyword introduces the parameters the agent recognizes; here, we only set drbd_resource to pg. Each op keyword then defines an operation that Pacemaker performs on the resource. We define one monitor operation for the master role and another, slightly less frequent one for the slave role. The two intervals must differ, because Pacemaker tells operations on the same resource apart by their name and interval. Finally, we override the start and stop timeouts so that they match the minimum values that crm expects for this agent. It will complain if we use values lower than this, but feel free to increase them.

Next, we create the master/slave resource that controls how Pacemaker views the drbd_pg resource. Instead of adding and configuring a primitive, this time we configure an ms (master/slave) resource and name it ms_drbd_pg. After naming our ms resource, we designate drbd_pg as the primitive to treat as a master or slave service. The entries after the meta keyword are easy to confuse with one another, so we hope these pointers help:

  • By setting master-max to 1, we tell Pacemaker that only one copy of this resource in the whole cluster can be promoted to master at any time.
  • Similarly, setting master-node-max to 1 limits each node to a single promoted copy of this resource.
  • The clone-max setting describes the total number of active copies of this resource in the cluster, which is 2 in our case.
  • The clone-node-max setting is the per-node counterpart of clone-max: it limits how many copies can run on a single node. We set this to 1 as well to safeguard the DRBD resource from potential Pacemaker bugs or future changes in default settings.
  • Finally, setting notify to true makes Pacemaker inform every copy of the resource before and after any copy is started, stopped, promoted, or demoted, so that the new status of the shared resource is known everywhere it is running.
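The relationship between these limits can be made concrete with a small check. The two master settings count promoted copies (cluster-wide and per node), while the two clone settings count all copies (cluster-wide and per node), so a master limit above its matching clone limit can never be reached. The following sketch encodes just that reasoning; check_ms_meta is a hypothetical helper of our own, and these are sanity checks that follow from the definitions, not rules we claim Pacemaker enforces.

```shell
#!/bin/sh
# check_ms_meta MASTER_MAX MASTER_NODE_MAX CLONE_MAX CLONE_NODE_MAX
# Prints a message for each inconsistency; returns non-zero if any is found.
check_ms_meta() {
    master_max=$1 master_node_max=$2 clone_max=$3 clone_node_max=$4
    status=0
    # Promoted copies are a subset of all copies in the cluster.
    if [ "$master_max" -gt "$clone_max" ]; then
        echo "master-max ($master_max) exceeds clone-max ($clone_max)"
        status=1
    fi
    # Promoted copies on one node are a subset of the copies on that node.
    if [ "$master_node_max" -gt "$clone_node_max" ]; then
        echo "master-node-max ($master_node_max) exceeds clone-node-max ($clone_node_max)"
        status=1
    fi
    return $status
}

# The values used in this recipe: 1 master, 1 per node, 2 copies, 1 per node.
check_ms_meta 1 1 2 1 && echo "meta values are consistent"
```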

What do we mean by a resource copy? Pacemaker runs a separate instance of the primitive on each node where the resource is active. In Pacemaker lingo, these instances are referred to as clones, and a master/slave resource is a clone whose instances can additionally be promoted to the master role. The crm system hides most of these details from us, but they're still very real and difficult to manage.

Most of the values we chose for the meta options are Pacemaker defaults. The exceptions are notify, which defaults to false and so must be set explicitly, and clone-max, which defaults to the number of nodes in the cluster. We could have omitted the rest, but a high-availability system cannot remain safe while it is at the mercy of malleable defaults. We set these in stone now to prevent Pacemaker upgrades from potentially causing problems in the future.

When adding new resources, Pacemaker sometimes enters an undefined state and lists errors that aren't actually valid. We can clear these out with the resource cleanup subcommand, targeting the drbd_pg primitive. It's always a good idea to keep the Pacemaker status clean to avoid possible conflicts later.

Our final job is to view the status of all configured resources by calling crm with resource status. Our test system showed this output:

[Screenshot: output of crm resource status, showing the ms_drbd_pg set]

Even though we created two resources, we only see one entry: ms_drbd_pg. Note, however, that it represents the drbd_pg resource. We can also see the Masters and Slaves for this Set, though there should never be more than one of each with the configuration we used.
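The one-master, one-slave expectation can be checked mechanically. The sketch below counts the nodes listed on the Masters and Slaves lines of a status listing. The sample text is illustrative only: the node names pg1 and pg2 are hypothetical, the layout is an assumption based on the fields described above, and newer Pacemaker releases may label these lines differently. The count_role helper is our own.

```shell
#!/bin/sh
# count_role ROLE: read status text on stdin and print how many nodes are
# listed for ROLE (Masters or Slaves), for lines like "Masters: [ pg1 ]".
count_role() {
    # Fields are: "ROLE:", "[", node names..., "]", so nodes = NF - 3.
    awk -v role="$1:" '$1 == role { n += NF - 3 } END { print n + 0 }'
}

# Illustrative sample; on a live cluster, pipe in `crm resource status`.
sample=' Master/Slave Set: ms_drbd_pg [drbd_pg]
     Masters: [ pg1 ]
     Slaves: [ pg2 ]'

masters=$(printf '%s\n' "$sample" | count_role Masters)
slaves=$(printf '%s\n' "$sample" | count_role Slaves)
echo "masters=$masters slaves=$slaves"
```

With the configuration from this recipe, anything other than one master and at most one slave deserves a closer look.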

There's more...

In Pacemaker, resource agents can be viewed separately with the crm program, and many are available. To get a list of all the LSB resource agents (scripts in /etc/init.d) Pacemaker can see, use this command:

crm ra list lsb

For a list of Pacemaker-specific agents, use this command:

crm ra list ocf

By itself, this information isn't entirely helpful. Knowing that the agents exist does not tell us what parameters they have. To see this, we need to view the meta information for the agent. We used the ocf:linbit:drbd agent in this recipe, and we can view its usage information with this command:

crm ra meta ocf:linbit:drbd

If this is not convenient enough, we can actually use the man command for most agents as well. If we know the class, provider, and name of an agent, we can view its Unix manual. For example, to see the manual for the ocf:heartbeat:nginx agent, we could use this command:

man ocf_heartbeat_nginx
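The manual page name is simply the agent name with its colons replaced by underscores. As a small convenience, that conversion can be wrapped in a function; ra_man_page is a hypothetical helper name of our own, and whether a page is actually installed for a given agent depends on the packages on the system.

```shell
#!/bin/sh
# ra_man_page CLASS:PROVIDER:NAME
# Print the manual page name for a resource agent by replacing ':' with '_'.
ra_man_page() {
    printf '%s\n' "$1" | tr ':' '_'
}

ra_man_page ocf:heartbeat:nginx   # prints ocf_heartbeat_nginx
```

The result can be passed straight to man, for example: man "$(ra_man_page ocf:heartbeat:nginx)".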
