Software in the server space is normally fairly stable, yet security updates and bug fixes must still be applied. Highly available servers can't be stopped often, but without important upgrades, they could crash or suffer a breach, which would be far more serious.
How, then, do we ensure that updates can be applied safely while maintaining consistent availability? Once again, this comes down to preparation. We prepare by keeping duplicate online copies of the data and by abstracting the access path. With such an architecture in place, we can switch to a backup server while upgrading the primary; thus, the database never actually goes offline.
We'll explore this scenario here, especially as it will be a very common one.
For this section, we need at least one extra server with PostgreSQL installed. This server should be running a copy of our database. We can follow the Managing system migrations recipe to build a copy if we don't already have one available. We will also use ideas introduced in the Exploring the magic of virtual IPs recipe. Reviewing these recipes now might be a good idea.
For this scenario, assume that we have two servers with the addresses 192.168.1.10 and 192.168.1.20, where 192.168.1.10 is currently the primary server. In addition, we have a virtual IP address of 192.168.1.30 on the eth0:pgvip Ethernet device. To upgrade the PostgreSQL software on both nodes, follow these steps:

1. Stop the PostgreSQL service on 192.168.1.20 as the postgres user using this command:

    pg_ctl -D /path/to/database stop -m fast

2. Perform any necessary software upgrades on 192.168.1.20. For example, to install PostgreSQL 9.3 on a Debian-based system:

    sudo apt-get install postgresql-9.3

3. Start the PostgreSQL service on 192.168.1.20 as the postgres user:

    pg_ctl -D /path/to/database start

4. On 192.168.1.10, stop the virtual IP address with the following command:

    sudo ifconfig eth0:pgvip down

5. Connect to PostgreSQL on 192.168.1.10 and issue a CHECKPOINT command:

    CHECKPOINT;

6. Connect to PostgreSQL on 192.168.1.10 and issue the following query to check replication status:

    SELECT sent_location, replay_location
      FROM pg_stat_replication
     WHERE usename = 'rep_user';

7. Repeat the previous query until sent_location and replay_location match.

8. As postgres, stop the PostgreSQL service on 192.168.1.10 with this command:

    pg_ctl -D /path/to/database stop -m fast

9. As postgres, promote the PostgreSQL replica on 192.168.1.20 with this command:

    pg_ctl -D /path/to/database promote

10. On 192.168.1.20, start the virtual IP address with the following command:

    sudo ifconfig eth0:pgvip 192.168.1.30 up

11. Perform any necessary software upgrades on 192.168.1.10 as already performed on 192.168.1.20.

12. Erase the existing database on 192.168.1.10 as the postgres user this way:

    rm -Rf /path/to/database

13. Use pg_basebackup on 192.168.1.10 to make a copy of the upgraded database from 192.168.1.20:

    pg_basebackup -U rep_user -h 192.168.1.20 -D /path/to/database

14. Create a file named recovery.conf in /path/to/database with the following contents:

    standby_mode = 'on'
    primary_conninfo = 'host=192.168.1.20 port=5432 user=rep_user'

15. Start the PostgreSQL service as the postgres user on 192.168.1.10 using the following command:

    pg_ctl -D /path/to/database start
This entire process is very long, but we hope to illustrate that it is actually very straightforward. The first step is to upgrade the mirror copy of the database under the assumption that it is not actively utilized by applications or users; the secondary node's role in this case is to act as an emergency backup for the primary database node. As it's not being used, we can stop the database, perform any necessary updates, and then start it again so that it resynchronizes with the primary.
Afterwards, we isolate the primary database node by disabling the virtual IP address. This allows the streaming replica to replay the last few active transactions so that it's fully synchronized before we make it the new primary database. We accomplish this by issuing CHECKPOINT
and watching the replication status until it matches on both systems. When the replication status matches, we can stop the primary PostgreSQL server; its role in the process is complete.
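The wait for matching replication positions can be automated rather than watched by hand. Here is a minimal sketch, assuming the 'X/Y' hexadecimal transaction log location format that these columns return in PostgreSQL 9.x; the helper names are our own:

```shell
# Convert a transaction log location of the form 'X/Y' (both parts
# hexadecimal) into a single number so two locations can be compared.
lsn_to_num() {
    hi=${1%/*}
    lo=${1#*/}
    echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# Succeeds when the replica has replayed everything the primary sent.
# Intended use (illustrative): feed it the two columns returned by
#   SELECT sent_location, replay_location FROM pg_stat_replication;
caught_up() {
    [ "$(lsn_to_num "$1")" -eq "$(lsn_to_num "$2")" ]
}
```

A polling loop would simply rerun the query and call caught_up on the result, sleeping briefly between attempts until it succeeds.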
As software upgrades may take some time to complete or require a server restart, we need to immediately make the secondary node available as the primary database. We start by promoting the replica to become the new primary by sending the promote
command to pg_ctl
. Once the database is writable, we reinstate the 192.168.1.30
virtual IP address so that applications and users can reconnect safely.
This process of node switching is fairly quick, provided we already have a replica ready to take over. With the replica acting as a primary, the next step is to perform any upgrades necessary, just as we did on the secondary node. After the upgrades are finished, we cannot simply restart the primary database again, as the replica has been acting as a primary database for a period of time.
This means that we need to rebuild the primary database as a new replica. This makes both nodes ready for the next upgrade and maintains the two-node relationship. We start this process by erasing the old contents of the database and then use pg_basebackup
to copy the current primary database. Then, we create a new recovery.conf
file and direct it to act as a new replica. Once the replica is started, we have the same configuration as we had earlier, but now, the roles are reversed; 192.168.1.20
is the primary, and 192.168.1.10
is the replica.
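Because the roles swap with every upgrade cycle, it can help to generate recovery.conf rather than hand-edit it each time. A small sketch (the make_recovery_conf helper is hypothetical; it assumes the rep_user account and port used in this recipe):

```shell
# Hypothetical helper that emits the recovery.conf contents used in this
# recipe, parameterized by whichever server is currently the primary, so
# the same function serves both directions of the role reversal.
make_recovery_conf() {
    cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$1 port=5432 user=rep_user'
EOF
}

# e.g. make_recovery_conf 192.168.1.20 > /path/to/database/recovery.conf
```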
Astute readers may have noticed that using pg_basebackup
to copy the entire database following a minor upgrade is somewhat wasteful. We agree! In later recipes, we will make use of rsync
or PostgreSQL-specific software to perform these tasks instead. This recipe was already pretty long, and setting up rsync
properly for this operation would have added quite a bit more time. The point is to show you the switching process; feel free to substitute better methods you know for synchronizing data.
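As a rough illustration of the rsync approach, a helper like the following (the resync_from name is our own, and a real resync must also handle WAL files and put the source in backup mode first) transfers only changed files instead of recopying the whole database:

```shell
# Hypothetical helper: mirror a source data directory into a local one,
# deleting files that no longer exist on the source. Only changed files
# are transferred, which is the advantage over a full pg_basebackup.
# postmaster.pid is excluded (and thereby protected from --delete) so
# the rebuilt replica does not inherit the old server's lock file.
resync_from() {
    rsync -a --delete --exclude=postmaster.pid "$1/" "$2/"
}

# For example (paths as used in this recipe):
#   resync_from postgres@192.168.1.20:/path/to/database /path/to/database
```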
In addition to rsync, a newer utility named pg_rewind can make resetting replicas much easier. It is beyond the scope of this chapter, so we recommend that you read more about it at https://github.com/vmware/pg_rewind.