Setting up a hot standby

It is a very good practice, if not an outright requirement, to have a second online copy of a PostgreSQL server in high availability clusters. Without one, recovering from an outage may require hours of incident response, backup restoration, and server provisioning. We have everything to gain from keeping extra online servers.

In addition, the process of setting up a hot standby acts as the basis for creating PostgreSQL streaming replicas. This means that we can reuse this recipe over and over again anytime we need to create PostgreSQL mirrors, provision extra backup copies, set up test instances, and so on.

All of this is made possible by the pg_basebackup command.

Getting ready

A hot standby server should have similar, if not identical, specifications to the PostgreSQL server it subscribes to. Try to accomplish this if possible. Also refer to the previous Securing the WAL stream recipe, as we will be consuming WAL files in this recipe.

How to do it...

For this scenario, the server at 192.168.56.10 is the primary PostgreSQL server, and 192.168.56.20 will be the new copy. Once again, arc_server will be the location of the archive server with old WAL files. On all PostgreSQL servers, our data directory should be located at /db/pgdata.

Follow these steps to build a PostgreSQL hot standby:

  1. Ensure that the pg_hba.conf file on the primary server contains this line:
    host   replication   rep_user   192.168.56.20/32   trust
  2. Reload the configuration files on the primary server with the following command as the postgres user:
    pg_ctl -D /db/pgdata reload
    
  3. Ensure that the wal_level and max_wal_senders settings in postgresql.conf are set as follows on the primary server:
    wal_level = hot_standby
    max_wal_senders = 5
  4. Create the replication user if it doesn't already exist with this SQL statement:
    CREATE USER rep_user WITH REPLICATION;
  5. On the new server replica, create the /db/pgdata and /db/pg_archived directories with these commands as a root-level user:
    sudo mkdir -p /db/pgdata /db/pg_archived
    sudo chown postgres:postgres /db/*
    sudo chmod 0700 /db/pgdata /db/pg_archived
    
  6. Create a file named /etc/cron.d/postgres with the following contents in a single line:
    * * * * * postgres flock /tmp/wal_sync rsync -aq --del arc_server::wal_fetch/ /db/pg_archived
  7. Copy the primary server data with this command on the secondary server as the postgres user:
    pg_basebackup -D /db/pgdata -h 192.168.56.10 -U rep_user
    
  8. Create a file named /db/pgdata/recovery.conf and fill it with the following contents:
    standby_mode = on
    restore_command = 'pg_standby /db/pg_archived %f %p'
  9. Ensure that the postgresql.conf file on the standby server contains the following setting:
    hot_standby = on
  10. Start the PostgreSQL server on the standby server with this command:
    pg_ctl -D /db/pgdata start
    
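Once the standby is running, we can confirm that it entered continuous recovery rather than normal operation. Connecting to the standby at 192.168.56.20 with psql and running the following query should return t:

```sql
-- Run on the standby; returns 't' while the server is replaying WAL files
SELECT pg_is_in_recovery();
```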

How it works...

The first thing we do with this recipe is allow the new PostgreSQL server to retrieve data from the primary server. There are a few ways to do this, but for the sake of demonstration, we created a rule for the server at 192.168.56.20 to connect to the replication pseudo-database. This allows tools such as pg_basebackup to copy database files from the primary database when we initialize the replica. Once we reload the configuration files, rep_user should have sufficient access to copy PostgreSQL data files.

In a related concern, we must ensure that the wal_level setting of the primary server is set to hot_standby and that max_wal_senders is greater than 0. Earlier chapters on configuring PostgreSQL already made this suggestion, but this recipe won't work at all if these parameters are set incorrectly.
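Before copying any data, it may save time to verify both parameters from a psql session on the primary; these statements only read the current configuration:

```sql
-- Run on the primary; both values must match the recipe's requirements
SHOW wal_level;        -- should be hot_standby
SHOW max_wal_senders;  -- should be greater than 0
```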

Next, we should make sure that rep_user exists. Earlier chapters contained instructions to create this user, but it doesn't hurt to double-check. Regardless of what user we use to copy data, it must have the replication permission used in the CREATE USER syntax.
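To double-check the user without attempting a CREATE USER statement that might fail, we can inspect the pg_roles catalog on the primary; the rolreplication column should be true:

```sql
-- Run on the primary; rolreplication must be 't' for rep_user
SELECT rolname, rolreplication
  FROM pg_roles
 WHERE rolname = 'rep_user';
```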

Next, the new child server needs the same data directory as its parent. We also want a location to synchronize WAL files so that the copy can process them and remain up to date. We set the permissions so that only the postgres user can view the contents of either directory.
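The chmod 0700 step matters: PostgreSQL will refuse to start if its data directory is accessible to other users. The expected permissions can be sketched without root access by recreating the layout under a throwaway prefix (the scratch prefix here stands in for /db):

```shell
# Recreate the directory layout under a scratch prefix to illustrate
# the permissions; no root access is required
PREFIX=$(mktemp -d)
mkdir -p "$PREFIX/db/pgdata" "$PREFIX/db/pg_archived"
chmod 0700 "$PREFIX/db/pgdata" "$PREFIX/db/pg_archived"

# Only the owner may enter either directory
stat -c '%a %n' "$PREFIX/db/pgdata" "$PREFIX/db/pg_archived"
```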

With these two directories in place, it's time to copy WAL files from the archive server. To accomplish this, we create a file in /etc/cron.d that will execute an rsync command every minute. This rsync command will copy WAL files from the archive server to the /db/pg_archived directory. The -a parameter ensures that it will include file permissions and ownership, and -q will suppress non-error messages so it's easier to tell if something went wrong. We have also added the --del setting, which will cause rsync to delete any files that don't exist on the archive server.

Tip

Why every minute? It prevents the hot standby from falling too far behind, without making use of pure PostgreSQL replication. If you want to use this server as an insurance policy, it might be a good idea to delay it behind the source database by an hour. This way, mistakes will not appear on the standby for an hour, giving us a chance to fix problems before they taint database copies. To sync every hour, change the * * * * * portion of the cron entry to 0 * * * *.

As we're launching rsync asynchronously, we use flock to create a temporary lock file in the /tmp directory. This way, if the primary server produced a large burst of WAL files, we won't have two conflicting rsync commands trying to copy the files to /db/pg_archived.
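The locking behavior is easy to demonstrate. Here, one flock holds the lock while a second attempt, using -n to fail immediately rather than wait, reports that the lock is busy (in our cron entry, the second rsync would instead block until the first finishes):

```shell
LOCKFILE=$(mktemp)

# Hold the lock for two seconds in the background, standing in
# for a long-running rsync
flock "$LOCKFILE" sleep 2 &
sleep 0.5

# -n makes flock give up immediately instead of waiting its turn
if flock -n "$LOCKFILE" true; then
    echo "lock acquired"
else
    echo "lock busy"
fi
wait
```

This prints "lock busy", because the background command still holds the lock.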

Once we've established a stream for WAL files, we need to copy the actual database. For this, we use the pg_basebackup command. While pg_basebackup is, theoretically, a backup utility, it serves a dual purpose. When launched with the -D parameter, it copies the server data files from the host indicated by the -h parameter and saves them to the indicated directory. Thus, our pg_basebackup command copied files from 192.168.56.10 to /db/pgdata. This produces a PostgreSQL data directory capable of hosting a running database. We also used the -U setting to use the rep_user user that we created specifically for replication-related tasks.

Next, we want to start the PostgreSQL hot standby, but first we need to tell it how to recover WAL files. We create a file named recovery.conf, and if this file exists, PostgreSQL will enter recovery mode instead of normal operation. In this recovery mode, it expects to process WAL files until there are no more available. However, we set standby_mode to on in this file, which tells PostgreSQL to wait forever under the assumption that more WAL files will arrive later. This is continuous recovery, and this is what makes a hot standby work.
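For later promotion, pg_standby also accepts a -t flag naming a trigger file; touching that file tells pg_standby to stop waiting, so PostgreSQL can finish recovery and come online as a full primary. A slightly extended recovery.conf might look like this (the trigger file path is our own choice, not a fixed convention):

```
standby_mode = on
restore_command = 'pg_standby -t /tmp/postgresql.trigger /db/pg_archived %f %p'
```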

Another setting that we use in recovery.conf is restore_command. Here, we use the pg_standby utility to regularly consume WAL files in the /db/pg_archived directory. We could have simply copied the files with cp, but whenever the next WAL file has not yet arrived, cp fills our logs with complaints that the file cannot be found.

These errors do nothing but add useless noise to the logs. We could suppress these errors from cp, but if there was an actual error, we would miss it. Using pg_standby is just easier.

Before we start the PostgreSQL hot standby, there's one more thing to confirm. Simply having a standby is useful, but having a readable standby is better. By enabling hot_standby in the postgresql.conf file, we can execute basic SELECT statements against the standby database while it replays WAL files.

Once we start the database on the replica, we should have a fully functional hot standby PostgreSQL server.

See also

As this is such a common configuration, the PostgreSQL documentation discusses it at great length. We also made extensive use of the pg_basebackup and pg_standby commands; refer to the official PostgreSQL documentation for more information on both.
