In the previous section, you have seen that enabling archiving takes just a handful of lines and offers a great deal of flexibility. In this section, we will see how to create a so called base backup, which can be used to apply XLOG later on. A base backup is an initial copy of the data.
In PostgreSQL, there are two main options to create an initial base backup:
pg_basebackup
rsync
based methodsThe following two sections will explain in detail how a base backup can be created:
The first and most common method to create a backup of an existing server is to run a command called pg_basebackup
, which was introduced in PostgreSQL 9.1.0. Basically pg_basebackup
is able to fetch a database base backup directly over a database connection. When executed on the slave, pg_basebackup
will connect to the database server of your choice and copy all the data files in the data directory over to your machine. There is no need to log into the box anymore, and all it takes is one line of code to run it; pg_basebackup
will do all the rest for you.
In this example, we will assume that we want to take a base backup of a host called sample.postgresql-support.de
. The following steps must be performed:
pg_hba.conf
to allow replicationpg_hba.conf
changes into accountpg_basebackup
To allow remote boxes to log into a PostgreSQL server and to stream XLOG, you have to explicitly allow replication.
In PostgreSQL, there is a file called pg_hba.conf
, which tells the server which boxes are allowed to connect using which type of credentials. Entire IP ranges can be allowed or simply discarded through pg_hba.conf
.
To enable replication, we have to add one line for each IP range we want to allow. The following listing contains an example of a valid configuration:
# TYPE DATABASE USER ADDRESS METHOD
In this case we allow replication connections from 192.168.0.34
. The IP range is identified by 32
(which simply represents a single server in our case). We have decided to use MD5 as our authentication method. It means that the pg_basebackup
has to supply a password to the server. If you are doing this in a non-security critical environment, using trust
as authentication method might also be an option.
What happens if you actually have a database called replication
in your system? Basically, setting the database to replication
will just configure your streaming behavior, if you want to put in rules dealing with the database called replication
, you have to quote the database name as follows: "replication"
. However, we strongly advise not to do this kind of trickery to avoid confusion.
Once pg_hba.conf
has been changed, we can tell PostgreSQL to reload the configuration. There is no need to restart the database completely. We have three options to make PostgreSQL reload pg_hba.conf
:
SELECT pg_reload_conf();
kill –HUP 4711
(with 4711
being the process ID of the master)pg_ctl
: pg_ctl –D $PGDATA reload
(with $PGDATA
being the home directory of your database instance)Once we have told the server acting as data source to accept streaming connections, we can move forward and run pg_basebackup
.
pg_basebackup
is a very simple-to-use command-line tool for PostgreSQL. It has to be called on the target system and will provide you with a ready-to-use base backup, which is ready to consume the transaction log for Point-In-Time-Recovery.
The syntax of pg_basebackup
is as follows:
iMac:dbhs$ pg_basebackup --help
pg_basebackup takes a base backup of a running PostgreSQL server.
Usage:
pg_basebackup [OPTION]...
Options controlling the output:
-D, --pgdata=DIRECTORY receive base backup into
directory
-F, --format=p|t output format (plain (default),
tar)
-x, --xlog include required WAL files in
backup (fetch mode)
-X, --xlog-method=fetch|stream
include required WAL files with
specified method
-z, --gzip compress tar output
-Z, --compress=0-9 compress tar output with given
compression level
General options:
-c, --checkpoint=fast|spread
set fast or spread checkpointing
-l, --label=LABEL set backup label
-P, --progress show progress information
-v, --verbose output verbose messages
-V, --version output version information, then exit
-?, --help show this help, then exit
Connection options:
-h, --host=HOSTNAME database server host or
socket directory
-p, --port=PORT database server port number
-s, --status-interval=INTERVAL
time between status packets sent to server (in seconds)
-U, --username=NAME connect as specified database
user
-w, --no-password never prompt for password
-W, --password force password prompt (should
happen automatically)
A basic call to pg_basebackup
would look like this:
iMac:dbhs$ pg_basebackup -D /target_directory -h sample.postgresql-support.de
In this example, we will fetch the base backup from sample.postgresql-support.de
and put it into our local directory called /target_directory
. It just takes this simple line to copy an entire database instance to the target system.
When we create a base backup as shown in this section, pg_basebackup
will connect to the server and wait for a checkpoint to happen before the actual copy process is started. In this mode, this is necessary because the replay process will start exactly at this point in the XLOG. The problem is that it might take a while until a checkpoint kicks in; pg_basebackup
does not enforce a checkpoint on the source server straight away to make sure that normal operations are not disturbed.
By default, a plain base backup will be created. It will consist of all the files in directories found on the source server. If the base backup should be stored on tape, we suggest to give –-format=t
a try. It will automatically create a TAR archive (maybe on a tape). If you want to move data to a tape, you can save an intermediate step easily this way. When using TAR, it is usually quite beneficial to use it in combination with --gzip
to reduce the size of the base backup on disk.
Usually a base backup without XLOG is pretty worthless. This is because the base backup is taken while the master is fully operational. While the backup is taken, those storage files in the database instance might have been modified heavily. The purpose of the XLOG is to fix those potential issues in the data files reliably.
But, what if we want to create a base backup, which can live without (explicitly archived) XLOG? In this case, we can use the --xlog-method=stream
option. If this option has been chosen, pg_basebackup
will not just copy the data as it is but it will also stream the XLOG being created during the base backup to our destination server. This will provide us with just enough XLOG to allow us to start a base backup made that way directly. It is self-sufficient and does not need additional XLOG files. This is not Point-In-Time-Recovery but it can come in handy in case of trouble. Having a base backup, which can be started right away, is usually a good thing and it comes at fairly low cost.
If you are planning to use Point-In-Time-Recovery and if there is absolutely no need to start the backup as it is, you can safely skip the XLOG and save some space this way (default mode).
18.227.79.241