It is very good practice, if not an outright requirement, to have a second online copy of a PostgreSQL server in high availability clusters. Without such a server, recovery from an outage may require hours of incident response, backup restoration, and server provisioning. We have everything to gain by keeping extra online servers.
In addition, the process of setting up a hot standby acts as the basis for creating PostgreSQL streaming replicas. This means that we can reuse this recipe over and over again anytime we need to create PostgreSQL mirrors, provision extra backup copies, set up test instances, and so on.
All of this is made possible by the pg_basebackup command.
A hot standby server should have similar, if not identical, specifications to the PostgreSQL server it subscribes to. Try to accomplish this if possible. Also refer to the previous Securing the WAL stream recipe, as we will be consuming WAL files here.
For this scenario, the server at 192.168.56.10 is the primary PostgreSQL server, and 192.168.56.20 will be the new copy. Once again, arc_server is the location of the archive server that holds old WAL files. On all PostgreSQL servers, the data directory should be located at /db/pgdata.
Follow these steps to build a PostgreSQL hot standby:

1. Ensure that the pg_hba.conf file on the primary server contains this line:

       host replication rep_user 192.168.56.20/32 trust

2. Reload the server configuration with this command as the postgres user:

       pg_ctl -D /db/pgdata reload

3. Ensure that the wal_level and max_wal_senders settings in postgresql.conf are set as follows on the primary server:

       wal_level = hot_standby
       max_wal_senders = 5

4. Create the replication user on the primary server if it does not already exist:

       CREATE USER rep_user WITH REPLICATION;

5. On the standby server, create the /db/pgdata and /db/pg_archived directories with these commands as a root-level user:

       sudo mkdir -p /db/pgdata /db/pg_archived
       sudo chown postgres:postgres /db/*
       sudo chmod 0700 /db/pgdata /db/pg_archived

6. On the standby server, create a file named /etc/cron.d/postgres with the following contents in a single line:

       * * * * * postgres flock /tmp/wal_sync rsync -aq --del arc_server::wal_fetch/ /db/pg_archived

7. Copy the primary server's data files to the standby with this command as the postgres user:

       pg_basebackup -D /db/pgdata -h 192.168.56.10 -U rep_user

8. On the standby server, create a file named /db/pgdata/recovery.conf and fill it with the following contents:

       standby_mode = on
       restore_command = 'pg_standby /db/pg_archived %f %p'

9. Ensure that the postgresql.conf file on the standby server contains the following setting:

       hot_standby = on

10. Start the standby server with this command as the postgres user:

        pg_ctl -D /db/pgdata start
The first thing we do in this recipe is allow the new PostgreSQL server to retrieve data from the primary server. There are a few ways to do this, but for the sake of demonstration, we created a rule allowing the server at 192.168.56.20 to connect to the replication pseudo-database. This allows tools such as pg_basebackup to copy database files from the primary database when we initialize the replica. Once we reload the configuration files, rep_user should have sufficient access to copy PostgreSQL data files.
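The trust method used in this rule lets rep_user connect without a password, which is only appropriate for a demonstration. A hedged variant of the same pg_hba.conf rule, using md5 password authentication instead (the password itself would need to be set on rep_user separately), might look like this:

```
host  replication  rep_user  192.168.56.20/32  md5
```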
In a related concern, we must ensure that the wal_level setting of the primary server is set to hot_standby and that max_wal_senders is a value greater than 0. Earlier chapters on configuring PostgreSQL already made this suggestion, but this recipe won't work at all if these parameters are set incorrectly.
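Since both values live in postgresql.conf, they can be checked before reloading. This sketch demonstrates the check against a throwaway sample file; on a real primary, conf would point at /db/pgdata/postgresql.conf instead:

```shell
# Sketch: confirm the two replication settings in postgresql.conf.
# The sample file here stands in for /db/pgdata/postgresql.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
wal_level = hot_standby
max_wal_senders = 5
EOF
# Print only the two lines we care about; both should appear.
grep -E '^(wal_level|max_wal_senders)' "$conf"
```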
Next, we should make sure that rep_user exists. Earlier chapters contained instructions to create this user, but it doesn't hurt to double-check. Regardless of which user we use to copy data, it must have the replication permission shown in the CREATE USER syntax.
Next, the new standby server needs the same data directory as its parent. We also want a location to synchronize WAL files so that the copy can process them and remain up to date. We set the permissions so that only the postgres user can view the contents of these directories; both should end up owned by postgres with 0700 permissions.
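Ownership and permissions can be confirmed afterward with stat. This sketch demonstrates the mode check on throwaway directories; on the standby itself, you would run stat against /db/pgdata and /db/pg_archived directly:

```shell
# Sketch: verify directory modes the way we would on the standby.
# Throwaway directories stand in for /db/pgdata and /db/pg_archived.
d=$(mktemp -d)
mkdir -m 0700 "$d/pgdata" "$d/pg_archived"
stat -c '%a' "$d/pgdata" "$d/pg_archived"   # both lines should print 700
```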
With these two directories in place, it's time to copy WAL files from the archive server. To accomplish this, we create a file in /etc/cron.d that executes an rsync command every minute. This rsync command copies WAL files from the archive server to the /db/pg_archived directory. The -a parameter ensures that file permissions and ownership are preserved, and -q suppresses non-error messages so it's easier to tell if something went wrong. We have also added the --del flag, which causes rsync to delete any local files that no longer exist on the archive server.
Why every minute? It prevents the hot standby from falling too far behind without resorting to pure PostgreSQL replication. If you want to use this server as an insurance policy, it might be a good idea to delay it behind the source database by an hour. This way, mistakes will not appear on the standby for an hour, giving us a chance to fix problems before they taint the database copy. To sync every hour instead, change the * * * * * portion of the rsync cron entry to 0 * * * *.
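Under that hourly scheme, the /etc/cron.d/postgres entry becomes:

```
0 * * * * postgres flock /tmp/wal_sync rsync -aq --del arc_server::wal_fetch/ /db/pg_archived
```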
As we're launching rsync asynchronously, we use flock to create a temporary lock file in the /tmp directory. This way, if the primary server produces a large burst of WAL files, we won't have two conflicting rsync commands trying to copy files into /db/pg_archived at the same time.
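The serialization flock provides can be demonstrated with the same file-then-command form used in the cron entry. In this sketch, a sleep stands in for a long-running rsync, and the -n flag makes the second attempt give up immediately instead of queueing:

```shell
# Sketch: flock prevents two syncs from overlapping.
lock=$(mktemp)                    # stands in for /tmp/wal_sync
flock "$lock" sleep 1 &           # a "sync" holding the lock for a second
sleep 0.3                         # give the background job time to take the lock
# A second sync started while the first still runs fails to get the lock:
result=$(flock -n "$lock" echo "second sync ran" || echo "second sync skipped")
wait
echo "$result"
```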
Once we've established a stream of WAL files, we need to copy the actual database. For this, we use the pg_basebackup command. While pg_basebackup is, theoretically, a backup utility, it serves a dual purpose. When launched with the -D parameter, it copies the server data files from the host indicated by the -h parameter and saves them to the indicated directory. Thus, our pg_basebackup command copied files from 192.168.56.10 to /db/pgdata. This produces a PostgreSQL data directory capable of hosting a running database. We also used the -U parameter to connect as the rep_user user that we created specifically for replication-related tasks.
Next, we want to start the PostgreSQL hot standby, but first we need to tell it how to recover WAL files. We do this by creating a file named recovery.conf; if this file exists, PostgreSQL enters recovery mode instead of normal operation. In recovery mode, it processes WAL files until there are no more available. However, we set standby_mode to on in this file, which tells PostgreSQL to wait forever under the assumption that more WAL files will arrive later. This is continuous recovery, and it is what makes a hot standby work.
Another setting that we use in recovery.conf is restore_command. Here, we use the pg_standby utility to regularly consume WAL files in the /db/pg_archived directory. We could have simply copied the files with cp, but every time PostgreSQL requests a WAL file that has not yet arrived, cp writes a 'No such file or directory' error to the logs. These errors do nothing but add useless noise. We could suppress them, but then we might miss an actual error. Using pg_standby is just easier, as it waits quietly for each requested file to appear.
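As an aside, pg_standby also accepts a -t option naming a trigger file; creating that file ends recovery so the standby can be promoted. A hypothetical variant of our restore_command line using it (the trigger path here is made up) would be:

```
restore_command = 'pg_standby -t /tmp/pgsql.trigger /db/pg_archived %f %p'
```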
Before we start the PostgreSQL hot standby, there's one more thing to confirm. Simply having a standby is useful, but having a readable standby is better. By enabling hot_standby in the postgresql.conf file, we can execute basic SELECT statements against the standby database while it recovers.
Once we start the database on the replica, we should have a fully functional hot standby PostgreSQL server.