Replaying the transaction log

Once we have created ourselves a shiny initial base backup, we can collect the XLOG files created by the database. When the time has come, we can take all those XLOG files and perform our desired recovery process. This works as described in this section.

Performing a basic recovery

In PostgreSQL, the entire recovery process is governed by a file named recovery.conf, which has to reside in the main directory of the base backup. It is read during startup and tells the database server where to find the XLOG archive, when to end replay, and so forth.

To get you started, we have decided to include a simple recovery.conf sample file for performing a basic recovery process:

restore_command = 'cp /archive/%f %p'
recovery_target_time = '2013-10-10 13:43:12'

The restore_command is essentially the exact counterpart of the archive_command you have seen before. While the archive_command is supposed to put data into the archive, the restore_command is supposed to provide the recovering instance with the data file by file. Again, it is a simple shell command or a simple shell script providing one chunk of XLOG after the other. The options you have here are only limited by imagination; all PostgreSQL does is to check for the return code of the code you have written, and replay the data provided by your script.

Just like in postgresql.conf, we have used %p and %f as placeholders; the meaning of those two placeholders is exactly the same.

To tell the system when to stop recovery, we can set the recovery_target_time. The variable is actually optional. If it has not been specified, PostgreSQL will recover until it runs out of XLOG. In many cases, simply consuming the entire XLOG is a highly desirable process; if something crashes, you want to restore as much data as possible. But, it is not always so. If you want to make PostgreSQL stop recovery at a specific point in time, you simply have to put the proper date in. The crucial part here is actually to know how far you want to replay XLOG; in a real work scenario this has proven to be the trickiest question to answer.

Tip

If you happen to a recovery_target_time, which is in the future, don't worry, PostgreSQL will start at the very last transaction available in your XLOG and simply stop recovery. The database instance will still be consistent and ready for action. You cannot break PostgreSQL, but, you might break your applications in case data is lost because of missing XLOG.

Before starting PostgreSQL, you have to run chmod 700 on the directory containing the base backup, otherwise, PostgreSQL will error out:

iMac:target_directoryhs$ pg_ctl -D /target_directory
  start
server starting
FATAL:  data directory "/target_directory" has group or world access
DETAIL:  Permissions should be u=rwx (0700).

This additional security check is supposed to make sure that your data directory cannot be read by some user accidentally. Therefore an explicit permission change is definitely an advantage from a security point of view (better safe than sorry).

Now that we have all the pieces in place, we can start the replay process by starting PostgreSQL:

iMac:target_directoryhs$ pg_ctl –D /target_directory 
start
server starting
LOG:  database system was interrupted; last known up at 2013-03-10 18:04:29 CET
LOG:  creating missing WAL directory "pg_xlog/archive_status"
LOG:  starting point-in-time recovery to 2013-10-10 13:43:12+02
LOG:  restored log file "000000010000000000000006" from archive
LOG:  redo starts at 0/6000020
LOG:  consistent recovery state reached at 0/60000B8
LOG:  restored log file "000000010000000000000007" from archive
LOG:  restored log file "000000010000000000000008" from archive
LOG:  restored log file "000000010000000000000009" from archive
LOG:  restored log file "00000001000000000000000A" from archive
cp: /tmp/archive/00000001000000000000000B: No such file or directory
LOG:  could not open file "pg_xlog/00000001000000000000000B" (log file 0, segment 11): No such file or directory
LOG:  redo done at 0/AD5CE40
LOG:  last completed transaction was at log time 2013-03-10 18:05:33.852992+01
LOG:  restored log file "00000001000000000000000A" from archive
cp: /tmp/archive/00000002.history: No such file or directory
LOG:  selected new timeline ID: 2
cp: /tmp/archive/00000001.history: No such file or directory
LOG:  archive recovery complete
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

The amount of log produced by the database tells us everything we need to know about the restore process and it is definitely worth investigating this information in detail.

The first line indicates that PostgreSQL has found out that it has been interrupted and that it has to start recovery. From the database instance point of view, a base backup looks more or less like a crash needing some instant care by replaying XLOG; this is precisely what we want.

The next couple of lines (restored log file ...) indicate that we are replaying one XLOG file after the other that have been created since the base backup. It is worth mentioning that the replay process starts at the sixth file. The base backup knows where to start, so PostgreSQL will automatically look for the right XLOG file.

The message displayed after PostgreSQL reaches the sixth file (consistent recovery state reached at 0/60000B8) is of importance. PostgreSQL states that it has reached a consistent state. This is important. The reason is that the data files inside a base backup are actually broken by definition (please refer to our previous chapter about the XLOG here), but, the data files are not broken beyond repair. As long as we have enough XLOG to recover, we are very well off. If you cannot reach a consistent state, your database instance will not be usable and your recovery cannot work without providing additional XLOG.

Tip

Practically speaking, not being able to reach a consistent state usually indicates a problem somewhere in your archiving process and your system setup. If everything up to now has been working properly, there is no reason not to reach a consistent state.

Once we have reached a consistent state, one file after the other will be replayed successfully until the system finally looks for the 00000001000000000000000B file. The problem is that this file has not been created by the source database instance. Logically, an error pops up.

Tip

Not finding the last file is absolutely normal; this type of error is expected if the recovery_target_time does not ask PostgreSQL to stop recovery before it reaches the end of the XLOG stream. Don't worry, your system is actually fine. You have successfully replayed everything to the file showing up exactly before the error message.

As soon as all the XLOG has been consumed and the error message discussed earlier has been issued, PostgreSQL reports the last transaction it was able or supposed to replay, and starts up. You have a fully recovered database instance now and you can connect to the database instantly. As soon as the recovery has ended, recovery.conf will be renamed by PostgreSQL to recovery.done to make sure that it does not do any harm when the new instance is restarted later on at some point.

More sophisticated positioning in the XLOG

Up to now, we have recovered a database up to the very latest moment available in our 16 MB chunks of transaction log. We have also seen that you can define the desired recovery timestamp. But the question now is: How do you know which point in time to perform the recovery to? Just imagine somebody has deleted a table during the day. What if you cannot easily determine the recovery timestamp right away? What if you want to recover to a certain transaction?

recovery.conf has all you need. If you want to replay until a certain transaction, you can refer to recovery_target_xid. Just specify the transaction you need and configure recovery_target_inclusive to include this very specific transaction or not. Using this setting is technically easy but as mentioned before, it is not easy by far to find the right position to replay to.

In a typical setup, the best way to find a reasonable point to stop recovery is to use pause_at_recovery_target. If this is set to true, PostgreSQL will not automatically turn into a productive instance if the recovery point has been reached. Instead, it will wait for further instructions from the database administrator. This is especially useful if you don't know exactly how far to replay. You can replay, log in, see how far the database is, change to the next target time, and continue replaying in small steps.

Tip

You have to set hot_standby = on in postgresql.conf to allow reading during recovery.

Resuming recovery after PostgreSQL has paused can be done by calling a simple SQL statement: SELECT pg_xlog_replay_resume(). It will make the instance move to the next position you have set in recovery.conf. Once you have found the right place, you can set the pause_at_recovery_target back to false and call pg_xlog_replay_resume. Alternatively, you can simply utilize pg_ctl –D ... promote to stop recovery and make the instance operational.

Was this explanation too complicated? Let us boil it down to a simple list:

  • Add a restore_command to the recovery.conf file.
  • Add recovery_target_time to the recovery.conf file.
  • Set pause_at_recovery_target to true in the recovery.conf file.
  • Set hot_standby to on in postgresql.conf file.
  • Start the instance to be recovered.
  • Connect to the instance once it has reached a consistent state and as soon as it stops recovering.
  • Check if you are already where you want to be.
  • If you are not:
    • Change recovery_target_time.
    • Run SELECT pg_xlog_replay_resume().
    • Check again and repeat this section if it is necessary.

Tip

Keep in mind that once recovery has finished and once PostgreSQL has started up as a normal database instance, there is (as of 9.2) no way to replay XLOG later on.

Instead of going through this process, you can of course always use filesystem snapshots. A filesystem snapshot will always work with PostgreSQL because when you restart a snapshotted database instance, it will simply believe that it had crashed before and recover normally.

Cleaning up the XLOG on the way

Once you have configured archiving, you have to store the XLOG being created by the source server. Logically, this cannot happen forever. At some point, you really have to get rid of this XLOG; it is essential to have a sane and sustainable cleanup policy for your files.

Keep in mind, however, that you must keep enough XLOG so that you can always perform recovery from the latest base backup. But if you are certain that a specific base backup is not needed anymore, you can safely clean out all the XLOG that is older than the base backup you want to keep.

How can an administrator figure out what to delete? The best method is to simply take a look at your archive directory:

000000010000000000000005
000000010000000000000006
000000010000000000000006.00000020.backup
000000010000000000000007
000000010000000000000008

Check out the filename in the middle of the listing. The .backup file has been created by the base backup. It contains some information about the way the base backup has been made and tells the system where to continue replaying the XLOG. If the backup file belongs to the oldest base backup you need to keep around, you can safely erase all the XLOG lower than file number 6; in this case, file number 5 could be safely deleted.

In our case, 000000010000000000000006.00000020.backup contains the following information:

START WAL LOCATION: 0/6000020 (file 000000010000000000000006)
STOP WAL LOCATION: 0/60000E0 (file 000000010000000000000006)
CHECKPOINT LOCATION: 0/6000058
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2013-03-10 18:04:29 CET
LABEL: pg_basebackup base backup
STOP TIME: 2013-03-10 18:04:30 CET

The .backup file will also provide you with relevant information such as the time the base backup has been made. It is plain there and so it should be easy for ordinary users to read this information.

As an alternative to deleting all the XLOG files at one point, it is also possible to clean them up during replay. One way is to hide an rm command inside your restore_command. While this is technically possible, it is not necessarily wise to do so (what if you want to recover again?).

Also, you can add the recovery_end_command command to your recovery.conf file. The goal of recovery_end_command is to allow you to automatically trigger some action as soon as the recovery ends. Again, PostgreSQL will call a script doing precisely what you want. You can easily abuse this setting to clean up the old XLOG when the database declares itself active.

Switching the XLOG files

If you are going for an XLOG file-based recovery, you have seen that one XLOG will be archived every 16 MB. What would happen if you never manage to create 16 MB of changes? What if you are a small supermarket, which just makes 50 sales a day? Your system will never manage to fill up 16 MB in time.

However, if your system crashes, the potential data loss can be seen as the amount of data in your last unfinished XLOG file. Maybe this is not good enough for you.

A postgresql.conf setting on the source database might help. The archive_timeout tells PostgreSQL to create a new XLOG file at least every x seconds. So, if you are this little supermarket, you can ask the database to create a new XLOG file every day shortly before you are heading for home. In this case, you can be sure that the data of the day will safely be on your backup device already.

It is also possible to make PostgreSQL switch to the next XLOG file by hand. A procedure named pg_switch_xlog() is provided by the server to do the job:

test=# SELECT pg_switch_xlog();
pg_switch_xlog
----------------
 0/17C0EF8
(1 row)

You might want to call this procedure when some important patch job has finished or if you want to make sure that a certain chunk of data is safely in your XLOG archive.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.37.38