Chapter 8. Fixing Backups and Replication

In this chapter, you will be guided through replication and learn about simple, basic backups. The following topics will be covered:

  • Using pg_dump for text dumps
  • Creating custom format dumps
  • Performing binary base backups
  • Understanding point-in-time recovery
  • Setting up asynchronous replication
  • Upgrading to synchronous replication

This chapter provides a practical, straightforward introduction to backups and replication. The focus is on simple guidelines.

Using pg_dump

The pg_dump command is one of the most important commands in PostgreSQL. It can be used to create textual as well as binary backups. Sooner or later, every system administrator will come into contact with it. Here is how it works.

Creating textual dumps

Creating a textual dump is how most administrators digging into PostgreSQL get started. This is an essential, yet easy-to-manage, task.

The idea behind pg_dump is simple; it connects to the database that should be saved and transforms the content to a text format. Here is an example of its use:

pg_dump test > /backup/dump.sql

The simplest form of a backup is to send the SQL output created by pg_dump directly to a plain text file.
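Because plain text dumps can become fairly large, they are often piped through a compression program on the fly. Here is a minimal sketch using gzip (the target path is just an example):

pg_dump test | gzip > /backup/dump.sql.gz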

It is important to mention that a dump is always consistent. Internally, the entire dump runs as one large transaction in the repeatable read isolation level. A dump therefore represents a snapshot of the data taken at the moment the dump starts: if the dump needs an hour to complete, any changes made during that hour will not be part of it.

In the next example, the dump is restored into a database called new_test, which has to exist already (it can be created with createdb new_test):

psql new_test < /backup/dump.sql

Note

Keep in mind that all you are doing is replaying text.
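Because the replay is nothing but SQL text being executed by psql, the usual psql switches apply. If the restore should either succeed completely or fail as a whole, it can be wrapped in a single transaction and stopped at the first error. Here is a small sketch, again assuming that new_test already exists:

psql --single-transaction --set ON_ERROR_STOP=on new_test < /backup/dump.sql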

In general, a dump is a non-blocking operation; ordinary reads and writes can go on while it runs. However, a dump holds locks on the tables it processes, and these conflict with DDL statements such as ALTER TABLE or DROP TABLE. In general, it is a good idea to be careful when mixing DDLs and dumps, as they can block each other.
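Here is a minimal sketch of this interaction, assuming a table called t_test exists in the test database. If the dump is still running in the background, a DDL statement in a second session will simply wait until the dump's transaction has finished:

pg_dump test > /backup/dump.sql &
psql test -c "ALTER TABLE t_test ADD COLUMN note text;"

The ALTER TABLE statement needs an exclusive lock on t_test and therefore blocks until the dump has released its lock on the table.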

Taking care of blobs

Large objects (blobs) are not automatically included in every dump: as soon as only parts of a database are dumped (for example, with -t, -n, or --schema-only), they are left out. In some cases, this leads to trouble. Therefore, it is recommended to use the -b (--blobs) option whenever blobs are needed. If -b is activated, pg_dump will include those blobs even in textual formats and ensure that your dump is complete in every aspect.
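Here is a small sketch, assuming a hypothetical table called t_images whose rows reference large objects. Without -b, dumping just this table would silently skip the blobs themselves:

pg_dump -b -t t_images test > /backup/dump_images.sql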

Handling passwords

So far, we have assumed that the connection has been set to trust. No passwords have been used up to now. However, this is far from reality. In a real setup, there is always some sort of security mechanism.

In many cases, a file called .pgpass, located in the user's home directory, can be used. Alternatively, the .pgpass file can also be referenced by setting the PGPASSFILE environment variable.

This file should contain lines of the following format:

hostname:port:database:username:password

Just create the desired entries. No password prompt will be shown anymore.
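Here is a small sketch of what such a file might contain; the hostname, username, and password are purely hypothetical. Keep in mind that PostgreSQL ignores the file unless it is only readable by its owner:

$ cat ~/.pgpass
db1.example.com:5432:test:hs:secret_password
$ chmod 600 ~/.pgpass

A literal * can be used in the hostname, port, database, or username field to match anything. If the file is supposed to live somewhere other than the home directory, the PGPASSFILE environment variable can point to it instead.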

Why doesn't PostgreSQL just provide some sort of --put-password-here parameter? The reason is that any password passed on the command line to a program, such as pg_dump or psql, would automatically show up in the process table, where other users could see it. This has to be avoided at all costs.

Creating custom format dumps

In many cases, a plain textual dump is not what you would really want to use. The main problems with textual dumps are:

  • They tend to be fairly large
  • It is pretty hard to extract just a subset of data

This is exactly where custom format dumps come to the rescue. A custom format dump is basically a compressed dump that includes a table of contents (TOC). The beauty of this format is that it is easily possible to extract just a subset of tables, a single index, or maybe a single procedure. In addition to that, it allows the replay process to be scaled out to more than one CPU.

Here is how a custom format dump can be created:

$ pg_dump test -Fc > /tmp/dump.fc

The -Fc option is the only option that has to be added to create a custom format dump.

Once the dump has been created, it is possible to take a look at the content of the dump file:

$ pg_restore --list /tmp/dump.fc
*snip*
175; 1259 16447 TABLE public a hs
176; 1259 16450 TABLE public b hs
174; 1259 16445 SEQUENCE public seq_a hs
172; 1259 16436 TABLE public t_test hs
173; 1259 16442 TABLE public x hs
*snip*

The --list option will return the list of objects in the dump. It is now possible to selectively restore data, instead of replacing the entire database.

Here is how a single table can be extracted from the dump again:

pg_restore -t t_test /tmp/dump.fc 

The SQL representation of the data will be displayed on screen. To load the data into a database once again, a simple pipe or -d can be used:

$ pg_restore -t t_test /tmp/dump.fc | psql xy
$ pg_restore -t t_test /tmp/dump.fc -d xy
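Alternatively, the output of --list can drive the restore itself: save the listing to a file, comment out unwanted entries with a leading semicolon, and pass the edited file back using -L (--use-list). Here is a short sketch, again assuming the target database is called xy:

$ pg_restore --list /tmp/dump.fc > /tmp/dump.toc
$ # edit /tmp/dump.toc and put a ';' in front of every entry that should be skipped
$ pg_restore -L /tmp/dump.toc -d xy /tmp/dump.fc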

Making use of many CPUs

So far, only one CPU has been used to replay the dump. This is true for the text dump as well as the custom format dump. However, especially in the case of large databases, it can be hard to replay the data within a reasonable amount of time on a single core. Building indexes is expensive and accounts for a large part of the replay time.

The -j option allows custom format dumps to be replayed using more than one CPU. Here is how it works:

$ pg_restore -j 4 /tmp/dump.fc -d xy

This example assumes that the data will be replayed into a database called xy. In this case, pg_restore will try to use up to four CPU cores concurrently to perform the restore.

Note

Note that parallel restore can only help if there is more than one object to work on. With just a single table, the additional cores have nothing to do.
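The dump itself can also make use of more than one core. For this purpose, pg_dump offers the directory format (-Fd), which writes one file per table and accepts -j just like pg_restore does. Here is a brief sketch, assuming that the directory /tmp/dump_dir does not exist yet:

$ pg_dump test -Fd -j 4 -f /tmp/dump_dir
$ pg_restore -j 4 -d xy /tmp/dump_dir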
