Point-in-time Recovery

When you make a change to a PostgreSQL database, the PostgreSQL server records your changes in the shared-buffer pool, in the write-ahead log (WAL), and eventually, in the file that holds the table that you've changed. The WAL stores a complete record of every change that you make. PostgreSQL's point-in-time recovery (PITR) mechanism uses the modification history stored in the WAL files to roll forward changes made since the most recent cluster backup. You can think of PITR as an incremental backup scheme: you start with a complete backup and then, periodically, archive the changes. To recover from a crash, you restore the complete backup and then apply the changes, in sequence, until you've recovered all of the data that you want to restore.

Point-in-time recovery (PITR) can seem very intimidating if you start out reading the documentation (not that the documentation is bad, it just assaults you with a lot of detail before you get the big picture). To give you a broad overview of PostgreSQL's PITR, I'll cook up a new database, make it crash, and then recover the data using the PITR mechanism. You can follow along if you want, but you'll need some spare disk space.

I'll start by creating a new database cluster so I can experiment without harming any real data:

$ export PGDATA=/usr/local/pgPITR/data
$ initdb
The files belonging to this database system will be owned by user "pg".
This user must also own the server process.
...
Success. You can now start the database server using:

    postmaster -D /usr/local/pgPITR/data
or
    pg_ctl -D /usr/local/pgPITR/data -l logfile start

Next, I'll change the $PGDATA/postgresql.conf configuration file to enable PITR. The only change that I have to make is to define the archive_command parameter. archive_command tells PostgreSQL how to archive the WAL files generated by the server. Since I don't really care about the data in this cluster, I'll just squirrel away the WAL files in the /tmp/wals directory:

archive_command = 'cp %p /tmp/wals/%f'

PostgreSQL executes the archive_command instead of simply recycling (or deleting) the WAL files as it normally would. The %p and %f macros expand to the complete pathname of a WAL file (%p) and to the filename component of that file (%f).
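
If you're worried about overwriting a segment that was already archived, you can make the command refuse to clobber an existing file (this mirrors the pattern suggested in the PostgreSQL documentation); a nonzero exit status tells PostgreSQL that the archive attempt failed, and it will simply retry later:

archive_command = 'test ! -f /tmp/wals/%f && cp %p /tmp/wals/%f'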

Now I'll create the archive directory (/tmp/wals), start the postmaster, and create a database that I can work in:

$ mkdir /tmp/wals
$ pg_ctl -l /tmp/pg.log start
postmaster starting
$ createdb test
CREATE DATABASE

At this point, I have a fully functional PostgreSQL cluster, based in $PGDATA, with a sacrificial database named test. When I make changes to the database, those changes are recorded in the WAL files stored in $PGDATA/pg_xlog (just like any other PostgreSQL cluster). When a WAL file fills, PostgreSQL copies the file into the /tmp/wals directory for safekeeping (if this were a real, live database, I would employ a much more secure archive method). The only difference between a regular PostgreSQL cluster and a PITR-enabled cluster is that PostgreSQL archives the WAL files instead of recycling or deleting them.
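
In a production environment, a more secure archive method usually means getting each segment onto a different machine as soon as it fills. Here's a minimal sketch of such an archive script; the host name backuphost and the paths are hypothetical, and a real script would need more thorough error handling:

$ cat /usr/local/bin/archive_wal.sh
#!/bin/sh
# $1 is the full pathname of the WAL segment (%p); $2 is its file name (%f).
# Exit nonzero on failure so PostgreSQL knows to retry the archive later.
scp "$1" backuphost:/wal_archive/"$2" || exit 1

With that script installed, the archive_command becomes:

archive_command = '/usr/local/bin/archive_wal.sh %p %f'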

Because the PITR mechanism works by applying changes recorded in the WAL files, I'll generate some WAL data by creating a few dummy tables. It really doesn't matter what I put in the dummy tables; they just have to generate enough WAL data to fill a few WAL segments (each segment is 16MB). Since the whole point of PITR is to recover to a specific point in time, I'll make a note of the time that I COMMIT each transaction:

$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# BEGIN WORK;
BEGIN
test=# CREATE TABLE dummy1 AS SELECT * FROM pg_class, pg_attribute;
SELECT
test=# COMMIT; -- executed at 12:30:00pm
COMMIT
test=# \q

The CREATE TABLE command produced a table that holds more than 245,000 records and generated enough WAL data to fill a few segments. You can see the archived segments by looking in the /tmp/wals directory:

$ ls /tmp/wals
000000010000000000000000
000000010000000000000001
000000010000000000000002
000000010000000000000003
000000010000000000000004

Next, I'll create a complete backup of my database cluster. To simplify this example, I'll simply create a tarball and save it in the /tmp directory instead of writing the whole thing to tape. Before I start the backup, I'll tell PostgreSQL what I'm about to do by calling the pg_start_backup() function:

$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# select pg_start_backup( 'full backup - Monday' );
 pg_start_backup
----------------
 0/52EA2B8
(1 row)
test=# \q

$ tar -zcvf /tmp/pgdata.tgz $PGDATA
tar: Removing leading '/' from member names
/usr/local/pgPITR/data/
/usr/local/pgPITR/data/pg_hba.conf
/usr/local/pgPITR/data/pg_subtrans/
...

The argument that you give to pg_start_backup() is simply a label that helps you remember where the archive came from; after calling this function, you'll find a file named $PGDATA/backup_label, and the label is stored inside that file.
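
On my system, the backup_label file looks something like this (the WAL positions and timestamp shown here are illustrative; yours will differ):

$ cat $PGDATA/backup_label
START WAL LOCATION: 0/52EA2B8 (file 000000010000000000000005)
CHECKPOINT LOCATION: 0/52EA2F0
START TIME: 2005-06-22 12:35:00 EST
LABEL: full backup - Monday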

When the backup completes, I'll tell PostgreSQL that I'm finished by calling the pg_stop_backup() function:

$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# select pg_stop_backup();
 pg_stop_backup
--------------
 0/52EA2F4
(1 row)

test=# \q

At this point, I have a complete backup of the database cluster, PostgreSQL is still running (I did not have to stop the postmaster), any database changes are recorded in the WAL files, and the WAL files are still being archived in /tmp/wals. Pretend that I've locked the backup (/tmp/pgdata.tgz) in a safe place somewhere.
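
Before I lock that backup away, it's worth a quick check that the tarball is readable and contains what I expect (notice that tar stripped the leading / from the member names):

$ tar -ztf /tmp/pgdata.tgz | head -3
usr/local/pgPITR/data/
usr/local/pgPITR/data/pg_hba.conf
usr/local/pgPITR/data/pg_subtrans/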

To generate a few more WAL files, I'll create another dummy table:

$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# BEGIN WORK;
BEGIN
test=# CREATE TABLE dummy2 AS SELECT * FROM pg_class, pg_attribute;
SELECT
test=# COMMIT; -- executed at 12:38:00pm
COMMIT

Now, just to make things more interesting, I'll add a third dummy table and then drop it again. If the recovery runs past the point of the DROP, you would expect that table to disappear:

test=# BEGIN WORK;
BEGIN
test=# CREATE TABLE dummy3 AS SELECT * FROM pg_class, pg_attribute;
SELECT
test=# COMMIT; -- executed at 12:39:00pm
COMMIT
test=# BEGIN WORK;
BEGIN
test=# DROP TABLE dummy3;
DROP TABLE
test=# COMMIT; -- executed at 12:40:00pm
COMMIT
test=# \q

As you would expect, PostgreSQL has copied a few more WAL files to /tmp/wals:

$ ls /tmp/wals
000000010000000000000000
000000010000000000000001
000000010000000000000002
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000005.002EA2B8.backup
000000010000000000000006
000000010000000000000007
000000010000000000000008
000000010000000000000009
00000001000000000000000A
00000001000000000000000B
00000001000000000000000C
00000001000000000000000D
00000001000000000000000E

At this point, disaster strikes. My power goes out, an unexpected hurricane arrives, or my computer bursts into flames (and everything except my /tmp directory is destroyed—hey, we're just pretending). To simulate disaster, I'll kill off the postmaster:

$ kill -9 $(head -1 $PGDATA/postmaster.pid)
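
If kill -9 makes you nervous, an immediate-mode shutdown also aborts the server without writing a clean shutdown checkpoint, which is close enough for practice:

$ pg_ctl stop -m immediate

Either way, the cluster is left looking like it crashed.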

Now it's time to recover the entire cluster. I'll start by renaming the damaged cluster:

$ mv $PGDATA $PGDATA.old

Next, I'll restore the backup from its archive:

$ pushd / && tar -zxvf /tmp/pgdata.tgz && popd

That leaves me with the damaged cluster in $PGDATA.old and the backup cluster in $PGDATA. I'll clean up the (freshly restored) cluster by removing the old WAL files and the postmaster.pid file:

$ rm -f $PGDATA/pg_xlog/0*
$ rm -f $PGDATA/postmaster.pid

To ensure that I can recover as much data as possible, I'll copy the WAL files from the damaged cluster into the restored cluster:

$ cp $PGDATA.old/pg_xlog/0* $PGDATA/pg_xlog/

If the damaged cluster is not available (maybe it burned up when my computer caught fire), I can still recover; I just can't recover as much. I can recover all transactions committed before the most recent WAL file was archived. In a busy database, that means losing only the last few minutes of work. In a quiescent database, the interval is longer, but the amount of lost data is the same (at most one WAL segment's worth, or about 16MB).

Now I can start the PostgreSQL recovery process, but before I do, here's another look at the timeline that I've followed:

  • 12:30:00pm— I created the dummy1 table and committed the change

  • 12:38:00pm— I created the dummy2 table and committed the change

  • 12:39:00pm— I created the dummy3 table and committed the change

  • 12:40:00pm— I dropped dummy3 and committed the change

Sometime between 12:30 and 12:38, I backed up the entire cluster. When I start the recovery process, I can recover all changes, or I can tell PostgreSQL to stop at a certain point in time. If the recovery stops before 12:38, I should find the dummy1 table, but not the dummy2 (or dummy3) table. If the recovery stops before 12:39, I should find dummy1 and dummy2 (but not dummy3). If the recovery stops before 12:40, I should find all three tables. If I let PostgreSQL recover all changes, the dummy3 table should disappear (because I dropped that table before the lights went out).

To control the recovery process, I'll create a file named $PGDATA/recovery.conf that tells PostgreSQL how to proceed. recovery.conf looks like this:

$ cat $PGDATA/recovery.conf
restore_command = 'cp /tmp/wals/%f %p'
recovery_target_time = '2005-06-22 12:39:01 EST'

The restore_command parameter tells PostgreSQL how to retrieve the WAL files that were tucked away in /tmp/wals (in a real-world environment, the restore_command might be a shell script that asks you to mount a specific tape). The recovery_target_time parameter tells PostgreSQL when to stop. If you want to recover all changes, simply omit the recovery_target_time parameter. In this case, I've asked PostgreSQL to stop the recovery process just after it restores the 12:39pm change, so I expect to find dummy1, dummy2, and dummy3 in the database when recovery completes.
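
To give you an idea of what that might look like, here's a hypothetical sketch of a tape-based restore script; the mount point /mnt/tape is a placeholder, and it assumes an operator has already mounted the correct tape:

$ cat /usr/local/bin/restore_wal.sh
#!/bin/sh
# $1 is the file name PostgreSQL wants (%f); $2 is the full pathname
# where PostgreSQL expects to find it (%p). Exit nonzero on failure.
cp /mnt/tape/"$1" "$2" || exit 1

The matching restore_command would be:

restore_command = '/usr/local/bin/restore_wal.sh %f %p'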

PostgreSQL starts the recovery process as soon as you start the postmaster (the postmaster knows it has extra work to do because it finds the $PGDATA/recovery.conf file):

$ pg_ctl -l /tmp/pg.log start
postmaster starting

If you're running on a Linux/Unix system (or you have the Cygwin tools installed), you can follow the postmaster's progress by watching the server's log file:

$ tail -f /tmp/pg.log
LOG:  starting archive recovery
LOG:  restore_command = "cp /tmp/wals/%f %p"
LOG:  recovery_target_time = 2005-06-22 12:39:01-05
...
LOG:  restored log file "000000010000000000000006" from archive
LOG:  restored log file "000000010000000000000007" from archive
LOG:  restored log file "000000010000000000000008" from archive
LOG:  restored log file "000000010000000000000009" from archive
...
LOG:  archive recovery complete
LOG:  database system is ready

When the recovery process completes, PostgreSQL has renamed the recovery.conf file (to recovery.done) to avoid an accidental recovery the next time you start the postmaster.
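
A quick look at the cluster directory confirms the rename:

$ ls $PGDATA/recovery.*
/usr/local/pgPITR/data/recovery.done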

I can now connect to the database and see that the dummy1, dummy2, and dummy3 tables are in place:

$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# \d
        List of relations
 Schema |  Name  | Type  | Owner
--------+--------+-------+------
 public | dummy1 | table | pg
 public | dummy2 | table | pg
 public | dummy3 | table | pg
(3 rows)

If I had omitted the recovery_target_time parameter, the dummy3 table would disappear because I dropped that table (and, most importantly, committed the change) before my system crashed.
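
In other words, a complete recovery needs a recovery.conf that contains nothing more than the restore_command:

restore_command = 'cp /tmp/wals/%f %p'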

You can see that PITR is easy to configure (just define an archive_command in the postgresql.conf file). The process of actually recovering from a crash can be tricky, but as long as you follow the procedure I've outlined, you should be in good shape. I strongly encourage you to practice the whole process before you really need it. Just create a sacrificial cluster, configure PITR recovery, create some data, and fake a few crashes. It won't take much time and you'll be happy that you've learned the procedure if your disk drives start making funny noises.

To summarize the setup procedure:

  • Configure PITR recovery by defining an archive_command in $PGDATA/postgresql.conf

  • Restart the postmaster to start the WAL archive process

  • Connect to the database and invoke the pg_start_backup( label ) function

  • Back up the entire cluster (you don't have to shut down the database); a scripted sketch of the backup steps follows this list
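
Here's a minimal sketch that strings the backup steps together. It uses the same throwaway paths as this example; a real script would store the tarball somewhere safer than /tmp and would make sure pg_stop_backup() runs even if the tar command fails:

$ cat /usr/local/bin/backup_cluster.sh
#!/bin/sh
# Bracket the file-level copy with pg_start_backup()/pg_stop_backup().
psql -d test -c "SELECT pg_start_backup('full backup - Monday')" || exit 1
tar -zcf /tmp/pgdata.tgz $PGDATA
psql -d test -c "SELECT pg_stop_backup()"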

To summarize the recovery procedure (a scripted sketch follows the list):

  • If possible, rename the damaged cluster so you can recover as much data as possible

  • Restore the complete cluster backup from the archive

  • Clean up the restored cluster (remove the $PGDATA/pg_xlog files and $PGDATA/postmaster.pid)

  • If possible, copy the WAL files from the damaged cluster into the freshly restored cluster

  • Create the $PGDATA/recovery.conf file (at minimum, this file must define a restore_command)

  • Start the postmaster

  • Verify that the data you expect to find in the database is really there
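
And here's a sketch of the recovery steps as one script. It assumes PGDATA is still set, uses the same paths as this example, and performs a complete recovery; add a recovery_target_time line to recovery.conf if you want to stop earlier:

$ cat /usr/local/bin/recover_cluster.sh
#!/bin/sh
mv $PGDATA $PGDATA.old                      # set the damaged cluster aside
( cd / && tar -zxf /tmp/pgdata.tgz )        # restore the base backup
rm -f $PGDATA/pg_xlog/0* $PGDATA/postmaster.pid
cp $PGDATA.old/pg_xlog/0* $PGDATA/pg_xlog/  # salvage unarchived WAL files
echo "restore_command = 'cp /tmp/wals/%f %p'" > $PGDATA/recovery.conf
pg_ctl -l /tmp/pg.log start                 # recovery begins automatically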

The most important part of the PITR scheme is the archive_command. The PostgreSQL reference documentation (Chapter 22) provides a few guidelines regarding the archive_command (and the restore_command). You'll also find a detailed explanation of the parameters that you can include in the recovery.conf file. Be sure to read that chapter before you try to configure a real-world server that contains important data.
