Taking base backups

In the previous section, you have seen that enabling archiving takes just a handful of lines and offers a great deal of flexibility. In this section, we will see how to create a so called base backup, which can be used to apply XLOG later on. A base backup is an initial copy of the data.

Tip

Keep in mind that the XLOG itself is more or less worthless. It is only useful in combination with the initial base backup.

host	replication  all     192.168.0.34/32      md5

In PostgreSQL, there are two main options to create an initial base backup:

  • Using pg_basebackup
  • Traditional copy/rsync based methods

The following two sections will explain in detail how a base backup can be created:

Using pg_basebackup

The first and most common method to create a backup of an existing server is to run a command called pg_basebackup, which was introduced in PostgreSQL 9.1.0. Basically pg_basebackup is able to fetch a database base backup directly over a database connection. When executed on the slave, pg_basebackup will connect to the database server of your choice and copy all the data files in the data directory over to your machine. There is no need to log into the box anymore, and all it takes is one line of code to run it; pg_basebackup will do all the rest for you.

In this example, we will assume that we want to take a base backup of a host called sample.postgresql-support.de. The following steps must be performed:

  • Modify pg_hba.conf to allow replication
  • Signal the master to take pg_hba.conf changes into account
  • Call pg_basebackup

Modifying pg_hba.conf

To allow remote boxes to log into a PostgreSQL server and to stream XLOG, you have to explicitly allow replication.

In PostgreSQL, there is a file called pg_hba.conf, which tells the server which boxes are allowed to connect using which type of credentials. Entire IP ranges can be allowed or simply discarded through pg_hba.conf.

To enable replication, we have to add one line for each IP range we want to allow. The following listing contains an example of a valid configuration:

# TYPE  DATABASE     USER    ADDRESS           METHOD

In this case we allow replication connections from 192.168.0.34. The IP range is identified by 32 (which simply represents a single server in our case). We have decided to use MD5 as our authentication method. It means that the pg_basebackup has to supply a password to the server. If you are doing this in a non-security critical environment, using trust as authentication method might also be an option.

Tip

What happens if you actually have a database called replication in your system? Basically, setting the database to replication will just configure your streaming behavior, if you want to put in rules dealing with the database called replication, you have to quote the database name as follows: "replication". However, we strongly advise not to do this kind of trickery to avoid confusion.

Signaling the master server

Once pg_hba.conf has been changed, we can tell PostgreSQL to reload the configuration. There is no need to restart the database completely. We have three options to make PostgreSQL reload pg_hba.conf:

  • By running an SQL command: SELECT pg_reload_conf();
  • By sending a signal to the master: kill –HUP 4711 (with 4711 being the process ID of the master)
  • By calling pg_ctl: pg_ctl –D $PGDATA reload (with $PGDATA being the home directory of your database instance)

Once we have told the server acting as data source to accept streaming connections, we can move forward and run pg_basebackup.

pg_basebackup – basic features

pg_basebackup is a very simple-to-use command-line tool for PostgreSQL. It has to be called on the target system and will provide you with a ready-to-use base backup, which is ready to consume the transaction log for Point-In-Time-Recovery.

The syntax of pg_basebackup is as follows:

iMac:dbhs$ pg_basebackup --help
pg_basebackup takes a base backup of a running PostgreSQL server.

Usage:
pg_basebackup [OPTION]...

Options controlling the output:
  -D, --pgdata=DIRECTORY receive base backup into 
directory
  -F, --format=p|t       output format (plain (default),
tar)
  -x, --xlog             include required WAL files in 
backup (fetch mode)
  -X, --xlog-method=fetch|stream
include required WAL files with
specified method
  -z, --gzip             compress tar output
  -Z, --compress=0-9     compress tar output with given 
compression level

General options:
  -c, --checkpoint=fast|spread
set fast or spread checkpointing
  -l, --label=LABEL      set backup label
  -P, --progress         show progress information
  -v, --verbose          output verbose messages
  -V, --version          output version information, then exit
  -?, --help             show this help, then exit

Connection options:
  -h, --host=HOSTNAME    database server host or 
socket directory
  -p, --port=PORT        database server port number
  -s, --status-interval=INTERVAL
time between status packets sent to server (in seconds)
  -U, --username=NAME    connect as specified database
user
  -w, --no-password      never prompt for password
  -W, --password         force password prompt (should 
happen automatically)

A basic call to pg_basebackup would look like this:

iMac:dbhs$ pg_basebackup -D /target_directory 
-h sample.postgresql-support.de

In this example, we will fetch the base backup from sample.postgresql-support.de and put it into our local directory called /target_directory. It just takes this simple line to copy an entire database instance to the target system.

When we create a base backup as shown in this section, pg_basebackup will connect to the server and wait for a checkpoint to happen before the actual copy process is started. In this mode, this is necessary because the replay process will start exactly at this point in the XLOG. The problem is that it might take a while until a checkpoint kicks in; pg_basebackup does not enforce a checkpoint on the source server straight away to make sure that normal operations are not disturbed.

Tip

If you don't want to wait on a checkpoint, consider using --checkpoint=fast. It will enforce an instant checkpoint and pg_basebackup will start copying instantly.

By default, a plain base backup will be created. It will consist of all the files in directories found on the source server. If the base backup should be stored on tape, we suggest to give –-format=t a try. It will automatically create a TAR archive (maybe on a tape). If you want to move data to a tape, you can save an intermediate step easily this way. When using TAR, it is usually quite beneficial to use it in combination with --gzip to reduce the size of the base backup on disk.

Tip

There is also a way to see a progress bar while doing the base backup but we don't recommend to use this option (--progress) because it requires pg_basebackup to determine the size of the source instance first, which can be costly.

pg_basebackup – self-sufficient backups

Usually a base backup without XLOG is pretty worthless. This is because the base backup is taken while the master is fully operational. While the backup is taken, those storage files in the database instance might have been modified heavily. The purpose of the XLOG is to fix those potential issues in the data files reliably.

But, what if we want to create a base backup, which can live without (explicitly archived) XLOG? In this case, we can use the --xlog-method=stream option. If this option has been chosen, pg_basebackup will not just copy the data as it is but it will also stream the XLOG being created during the base backup to our destination server. This will provide us with just enough XLOG to allow us to start a base backup made that way directly. It is self-sufficient and does not need additional XLOG files. This is not Point-In-Time-Recovery but it can come in handy in case of trouble. Having a base backup, which can be started right away, is usually a good thing and it comes at fairly low cost.

Tip

Please note that --xlog-method=stream will require two database connections to the source server, not just one. You have to keep that in mind when adjusting max_wal_senders on the source server.

If you are planning to use Point-In-Time-Recovery and if there is absolutely no need to start the backup as it is, you can safely skip the XLOG and save some space this way (default mode).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.79.241