Switching live stack systems

At this point, we have our data located simultaneously on two servers. The second system can fulfill many possible roles. It can replace the current node in case of hardware failure, or allow us to perform server maintenance or upgrades with very little downtime.

Regardless of our intent, properly utilizing the second system is the key to a highly available database server. In this recipe, we'll discuss the proper method for activating the second server in a two-node pair so that we can make changes to one or both nodes.

Getting ready

By now, we need the full stack in place, as well as a fully active database server. Follow all the recipes up to Tweaking XFS performance before starting here.

How to do it...

For this recipe, we will need two PostgreSQL servers, pg1 and pg2, where pg1 is the currently active node. Follow these steps on the system indicated to move an active PostgreSQL service from one node to the other; run the pg_ctl commands as the postgres user and everything else as the root user. A consolidated sketch of the full sequence follows the list:

  1. Stop the PostgreSQL service with pg_ctl on pg1:
    pg_ctl -D /db/pgdata stop -m fast
    
  2. Unmount the /db filesystem on pg1:
    umount /db
    
  3. Mark the VG_POSTGRES group as inactive using vgchange on pg1:
    vgchange -a n VG_POSTGRES
    
  4. Demote DRBD status to secondary with drbdadm on pg1:
    drbdadm secondary pg
    
  5. Promote DRBD status to primary with drbdadm on pg2:
    drbdadm primary pg
    
  6. Mark the VG_POSTGRES group as active using vgchange on pg2:
    vgchange -a y VG_POSTGRES
    
  7. Mount the /db/pgdata filesystem on pg2:
    mount -t xfs -o noatime,nodiratime -o logbsize=256k,allocsize=1m /dev/VG_POSTGRES/LV_DATA /db
    
  8. Start PostgreSQL on pg2:
    pg_ctl -D /db/pgdata start
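
For reference, here is the same sequence collected into two blocks, one per node. This is only a sketch of the manual procedure above: it performs no error checking, and it assumes the postgres operating system user owns the data directory, which is why the pg_ctl calls are wrapped in su when issued from a root shell.

    # On pg1 (current primary): tear the stack down
    su - postgres -c "pg_ctl -D /db/pgdata stop -m fast"
    umount /db
    vgchange -a n VG_POSTGRES
    drbdadm secondary pg

    # On pg2 (new primary): build the stack back up
    drbdadm primary pg
    vgchange -a y VG_POSTGRES
    mount -t xfs -o noatime,nodiratime -o logbsize=256k,allocsize=1m \
        /dev/VG_POSTGRES/LV_DATA /db
    su - postgres -c "pg_ctl -D /db/pgdata start"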
    

How it works...

There is very little in this recipe that we have not already done in this chapter; what we have done here is formalize the steps necessary to tear down and build up an active stack. We start the process by stopping the PostgreSQL service with pg_ctl, as we can't move the data while it's still in use.

Next, we use umount to decouple the /dev/VG_POSTGRES/LV_DATA device from the /db directory. With no locks on the storage volume, we can use vgchange with the -a parameter set to n to deactivate any volume in the VG_POSTGRES group. Since the VG_POSTGRES group actually resides on the DRBD device, it can only be active on one node at a time.
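
If umount or vgchange complains that the device is busy, something still has the filesystem open. A quick check, assuming the commonly available fuser and lvs utilities are installed, might look like this:

    # List any processes still holding files open under /db
    fuser -vm /db

    # After deactivation, no volume in VG_POSTGRES should show an 'a'
    # in the fifth (state) position of the lvs attribute column
    lvs VG_POSTGRES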

Once the volumes are no longer active, we can set the DRBD status to secondary with drbdadm. After we perform this step, the /dev/VG_POSTGRES directory and any corresponding devices disappear, because a DRBD device in secondary status is only active within DRBD itself. We can confirm this by examining /proc/drbd on either node.
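
The exact contents vary by DRBD version and activity, but assuming an 8.4-series module with the pg resource on device minor 0 (version banner and transfer counters omitted), the relevant line resembles this:

    cat /proc/drbd
     0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----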


DRBD sees the device as Secondary on both nodes; currently, neither node can access our PostgreSQL data. From this point, we merely reverse the process to reactivate all of these resources on pg2 instead.

We begin reactivating PostgreSQL by promoting the storage to primary status with drbdadm on the pg2 node. This causes the requisite VG_POSTGRES volume group to appear on pg2, making it a candidate for activation with vgchange.
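
Before activating the volume group, it is worth confirming that the promotion succeeded. Both checks below rely only on tools that ship with DRBD and LVM, though the exact output formatting varies by version:

    # The local role should now be Primary (the peer is shown second)
    drbdadm role pg

    # The volume group should be visible again on pg2
    vgs VG_POSTGRES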

Now we simply reuse the mount command that we discussed in the Tweaking XFS performance recipe on the pg2 node, making the data available to us once again. If we start PostgreSQL with the pg_ctl control script, our database will begin running as if it were still on the pg1 node. PostgreSQL does not know anything has changed.
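
To verify that the service really is available on pg2, a couple of quick checks, assuming they are run as the postgres user with default connection settings, could look like this:

    # Report the postmaster status and PID from the data directory
    pg_ctl -D /db/pgdata status

    # Run a trivial query to prove connections are being accepted
    psql -c "SELECT 1"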

There's more...

Since data can switch nodes arbitrarily as demonstrated here, upgrades and maintenance to server hardware are much easier. What can we do with the extra node? We can reboot it, apply firmware or kernel updates, apply security patches, or even update the database software to a bug-fix release.

Following any required or suggested changes to the secondary node, we merely promote it to run PostgreSQL in place of the current server. Then, we can repeat modifications on the other node. With this, we can limit outages to a matter of seconds while still providing high uptime guarantees, all without skipping system maintenance.

In fact, this process is so standardized that we will be exploring it in great detail in the next chapter. Once this tear-down and build-up procedure is automated, maintaining or replacing servers is even easier.
