Since the release of PostgreSQL 9.0, DBAs have had access to asynchronous streaming replication. Unlike the older hot standby methods used in earlier versions, replica servers can connect to an upstream PostgreSQL server and consume data modifications directly. With low network latency and fast transactions, this means that it is fairly common for streaming replicas to lag behind the master by only a few milliseconds.
In the context of high availability, this means we can scale horizontally by copying the database to multiple servers. Of course, this means that we need to copy the entire database to each server. For small-to medium-sized database instances, this is a relatively minor requirement. This also means that we can produce up-to-date backups, perform ad hoc queries on practically live data, and aggregate information into reports without disrupting our primary database.
This recipe will explain how to set up a streaming asynchronous database replica and explore some of the hidden caveats of doing so.
We will be continuing the work we performed in the Setting up a hot standby recipe, so please refer to that recipe to build a working hot standby. We will alter the standby setup to include streaming replication, and better security.
For this scenario, the server at 192.168.56.10
is the primary PostgreSQL server, and 192.168.56.20
will be the asynchronous replica. Follow these steps to build a PostgreSQL asynchronous replica:
rep_user
user a password with this SQL statement:ALTER USER rep_user WITH PASSWORD 'newpass';
pg_hba.conf
line and remove any references to the rep_user
user. Then, add this line:host replication rep_user 192.168.56.20/32 md5
postgres
user:pg_ctl -D /db/pgdata reload
.pgpass
in the postgres
user's home directory with the following contents:192.168.56.10:*:replication:rep_user:newpass
.pgpass
file to have the correct permissions with this command:Chmod 600 ~/.pgpass
recovery.conf
file on the recovery server to match these lines:standby_mode = on primary_conninfo = 'host=192.168.56.10 user=rep_user' restore_command = 'cp /db/pg_archived/%f %p 2>/dev/null'
postgres
user:pg_ctl -D /db/pgdata reload
SELECT client_addr, usename, state FROM pg_stat_replication;
Using trust
authentication is not generally a recommended practice. It is one thing to copy the database without a password once, but quite another to leave a long-term security hole for all database replicas. This means it is time to ensure that the rep_user
user has a password. We also need to change pg_hba.conf
to reflect the fact that we want to use regular md5
authentication instead of trust
. Once we reload the configuration files on the primary server, we move on to the streaming replica.
To get into the practice of using .pgpass
files, we create one on the replica server for the rep_user
user. The line we created in this file will send our desired password when the sections match; in this case, if we connect to 192.168.56.10
on any port to the replication
database as the rep_user
user, authentication will succeed automatically If any of these are different, the PostgreSQL client libraries will not send a password, and the client will receive an error. This is a fairly easy way to automate password submissions securely. PostgreSQL will also ignore this file if the permissions are wrong, so we set the control flags with chmod
so that only the postgres
user can access it.
Next, we rewrite the contents of the recovery.conf
file to include primary_conninfo
. This line is used to specify the connection information for establishing streaming replication. Since our password is in the .pgpass
file, we don't need to enter it here. We also removed pg_standby
in favor of a regular cp
command with the errors suppressed. Now that our primary method of WAL consumption is directly from another server, we only need WAL files from /db/pg_archived
as a fail back in case the stream is disrupted.
Why do we use .pgpass
instead of entering the password in the recovery.conf
file? It is very common for system automation tools to distribute configuration files to dozens or even hundreds of servers. Using .pgpass
, we can settle on and redistribute passwords easily. In addition, tools that build recovery.conf
will not need to know the password for the replication user.
Once we reload the standby server, it should become a streaming replica instead of a regular hot standby. We can confirm this with the SQL statement that checks the pg_stat_replication
view on the primary server. We should get output similar to this:
Though streaming replication has existed since PostgreSQL 9.0, recent changes to 9.3 include two very helpful additional features:
pg_basebackup
tool puts PostgreSQL in backup mode by invoking the pg_start_backup()
function. As this function writes to the database, it normally can't be used on a streaming server. However, the developers made changes in 9.3 that make it possible to use pg_basebackup
on standby servers. This can greatly simplify the backup process and reduce overhead on the primary server.There are good resources within the PostgreSQL documentation and Wiki regarding streaming replication. For more information, please visit these URLs:
3.144.244.250