The primary mechanism that PostgreSQL uses to provide a data durability guarantee is through its Write Ahead Log (WAL). All transactional data is written to this location before ever being committed to database files. Once WAL files are no longer necessary for crash recovery, PostgreSQL will either delete or archive them. For the purposes of a highly available server, we recommend that you keep these important files as long as possible. There are several reasons for this; they are as follows:
In order to gain these benefits, we need to enable PostgreSQL WAL archiving and save these files until we no longer need them. This section will address our recommendations for long-term storage of WAL files.
In order to properly archive WAL files, we recommend that you provision a server dedicated to backups or file storage. Depending on the transaction volume, an active PostgreSQL database might produce thousands of these on a daily basis. At 16 MB apiece, this is not an idle concern. For instance, for a 1 TB database, we recommend at least 3 TB of storage space.
In addition, we will be using rsync as a daemon on this archive server. To install this on a Debian-based server, execute the following command as a root-level user:
sudo apt-get install rsync
Red-Hat-based systems will need this command instead:
sudo yum install rsync xinetd
Our archive server has a 3 TB mount at the /db
directory and is named arc_server
on our network. The PostgreSQL source server resides at 192.168.56.10
. Follow these steps for long-term storage of important WAL files on an archive server:
/etc/default/rsync
file and change the RSYNC_ENABLE
variable to true
./etc/xinet.d/rsync
file and change the disable
parameter to no
.postgres
user with these commands:sudo mkdir /db/pg_archived sudo chown postgres:postgres /db/pg_archived
/etc/rsyncd.conf
and fill it with the following contents:[wal_store] path = /db/pg_archived comment = DB WAL Archives uid = postgres gid = postgres read only = false hosts allow = 192.168.56.10 hosts deny = *
sudo service rsync start
. sudo service xinetd start
.archive_mode
and archive_command
parameters in postgresql.conf
to read the following:archive_mode = on archive_command = 'rsync -aq %p arc_server::wal_store/%f'
pg_ctl -D $PGDATA restart
The rsync utility is normally used to transfer files between two servers. However, we can take advantage of using it as a daemon to avoid connection overhead imposed by using SSH as an rsync protocol. Our first step is to ensure that the service is not disabled in some manner, which would make the rest of this guide moot.
Next, we need a place to store archived WAL files on the archive server. Assuming that we have 3 TB of space in the /db
directory, we simply claim /db/pg_archived
as the desired storage location. There should be enough space to use /db
for backups as well, but we won't discuss that in this recipe.
Next, we create a file named /etc/rsyncd.conf
, which will configure how rsync operates as a daemon. Here, we name the /db/pg_archived
directory wal_store
so that we can address the path by its name when sending files. We give it a human-readable name and ensure that files are owned by the postgres
user, as this user also controls most of the PostgreSQL-related services.
The next, and possibly the most important step, is to block all hosts but the primary PostgreSQL server from writing to this location. We set hosts deny
to *
, which blocks every server. Then, we set hosts allow
to the primary database server's IP address so that only it has access. If everything goes well, we can start the rsync (or xinetd
on Red Hat systems) service and we can see that in the following screenshot:
Next, we enable archive_mode
by setting it to on
. With archive mode enabled, we can specify a command that will execute when PostgreSQL no longer needs a WAL file for crash recovery. In this case, we invoke the rsync
command with the -a
parameter to preserve elements such as file ownership, timestamps, and so on.
In addition, we specify the -q
setting to suppress output, as PostgreSQL only checks the command exit status to determine its success. In the archive_command
setting, the %p
value represents the full path to the WAL file, and %f
resolves to the filename. In this context, we're sending the WAL file to the archive server at the wal_store
module we defined in rsyncd.conf
.
Once we restart PostgreSQL, it will start storing all the old WAL files by sending them to the archive server.
In case any rsync
command fails because the archive server is unreachable, PostgreSQL will keep trying to send it until it is successful. If the archive server is unreachable for too long, we suggest that you change the archive_command
setting to store files elsewhere. This prevents accidentally overfilling the PostgreSQL server storage.
As we will likely want to use the WAL files on other servers, we suggest that you make a list of all the servers that could need WAL files. Then, modify the rsyncd.conf
file on the archive server and add this section:
[wal_fetch] path = /db/pg_archived comment = DB WAL Archive Retrieval uid = postgres gid = postgres read only = true hosts allow = host1, host2, host3 hosts deny = *
Now, we can fetch WAL files from any of the hosts in hosts allow
. As these are dedicated PostgreSQL replicas, recovery servers, or other defined roles, this makes the archive server a central location for all our WAL needs.
archive_mode
and archive_command
settings on the PostgreSQL site. We've included a link here: http://www.postgresql.org/docs/9.3/static/runtime-config-wal.htmlrsyncd.conf
file should also have its own manual page. Read it with this command to learn more about the available settings:man rsyncd.conf
3.147.27.131