Mixing streaming and file-based recovery

Life is not always just black or white; sometimes there are also some shades of gray. For some cases, streaming replication might be just perfect. In other cases, file-based replication and PITR are all you need. But there are also many cases in which you need a little bit of both. One example: if you interrupt replication for a longer period of time, you might want to resync the slave using the archive instead of performing yet another full base backup. It might also be useful to keep an archive around for later investigation or replay operations.

The good news is that PostgreSQL allows you to mix file-based and streaming-based replication. You don't have to decide whether streaming-based or file-based replication is better; you can have the best of both worlds at the same time.

How can you do that? In fact, you have seen all the ingredients already; we just have to put them together in the right way.

To make this easier, we have compiled a complete example for you.

The master configuration

On the master, we can use the following configuration in postgresql.conf:

wal_level = hot_standby
        # minimal, archive, or hot_standby
        # (change requires restart)
archive_mode = on
        # allows archiving to be done
        # (change requires restart)
archive_command = 'cp %p /archive/%f'
        # command to use to archive a logfile segment
        # placeholders: %p = path of file to archive
        #               %f = file name only
max_wal_senders = 5
        # we used five here to have some spare capacity
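
A small refinement worth considering: if the archive might already contain a file of the same name (say, after a crash during archiving), it is safer not to overwrite it. The PostgreSQL documentation suggests a variant along these lines (again, /archive is just our example path):

archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
        # refuse to overwrite an existing segment; a non-zero
        # exit code tells PostgreSQL to retry archiving later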

In addition to that, we have to add some config lines to pg_hba.conf to allow streaming. Here is an example:

# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     hs   trust
host    replication     hs   127.0.0.1/32      trust
host    replication     hs   ::1/128           trust

host    replication     all  192.168.0.0/16    md5

In our case, we have simply opened an entire network to allow replication (to keep the example simple).
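
The entries above assume that a user called hs exists on the master and is allowed to replicate. If you have not created it yet, something along these lines will do (the password is, of course, just a placeholder):

psql -U postgres -c "CREATE ROLE hs LOGIN REPLICATION PASSWORD 'secret'"
        # the REPLICATION attribute is required for a role
        # to open a streaming replication connection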

Once we have made those changes, we can restart the master and take a base backup as shown earlier in this chapter.
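
In case you want a quick refresher, the procedure might look like this (hostnames and paths are placeholders):

pg_ctl -D /data/master restart
        # wal_level, archive_mode, and max_wal_senders
        # only take effect after a restart
pg_basebackup -D /data/slave -h master.example.com -U hs
        # stream a base backup into the future slave's data directory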

The slave configuration

Once we have configured our master and taken a base backup, we can start to configure our slave system. Let us assume for the sake of simplicity that we are only using a single slave; we will not cascade replication to other systems.

We only have to change a single line in postgresql.conf on the slave:

hot_standby = on     # to make the slave readable

In the next step, we can write a simple recovery.conf file and put it into the main data directory:

restore_command = 'cp /archive/%f %p'
standby_mode = on
primary_conninfo = ' host=sample.postgresql-support.de port=5432 '
trigger_file = '/tmp/start_me_up.txt'
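
With this file in place, we can start the slave just like any other PostgreSQL instance (the path is a placeholder for your slave's data directory):

pg_ctl -D /data/slave start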

When we fire up the slave, the following things will happen:

  1. PostgreSQL will call the restore_command to fetch the transaction log from the archive.
  2. It will do so until no more files can be found in the archive.
  3. PostgreSQL will try to establish a streaming connection.
  4. It will stream if data exists.
    • If no data is present, it will call the restore_command to fetch the transaction log from the archive.
    • It will do so until no more files can be found in the archive.
    • It will try the streaming connection again.

You can keep streaming as long as necessary. If you want to turn the slave into a master, you can again use pg_ctl promote or the trigger_file defined in recovery.conf.
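
Either way, promotion is a one-liner (the data directory and the trigger file are taken from our example):

pg_ctl -D /data/slave promote
        # or, equivalently, create the trigger file defined in recovery.conf:
touch /tmp/start_me_up.txt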

Error scenarios

The most important advantage of a dual strategy is that you can create a cluster that offers a higher level of security than plain streaming-based or plain file-based replay. If streaming does not work for some reason, you can always fall back to files.

In this section, we will discuss some typical error scenarios in a dual-strategy cluster:

Network connection between the master and slave is dead

If the network is dead, the master might not be able to perform the archive_command operation successfully anymore. The history of the XLOG files must remain continuous, so the master has to queue up those XLOG files for later archiving. This can be a dangerous (yet necessary) scenario because you might run out of space for XLOG on the master if the stream of files is interrupted permanently.
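
Because of this, it is a good idea to keep an eye on the master while the archive is unreachable. A minimal sketch, assuming the master's data directory is /data/master (PostgreSQL marks segments that still have to be archived with .ready files in the archive_status directory):

ls /data/master/pg_xlog/archive_status/*.ready | wc -l
        # number of XLOG segments still waiting to be archived
df -h /data/master/pg_xlog
        # disk space left for queued-up XLOG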

If the streaming connection fails, PostgreSQL will try to keep syncing itself through the file-based channel. Should the file-based channel also fail, the slave will sit there and wait for the network connection to come back. It will then try to fetch the XLOG and simply continue once this is possible again.

Tip

Keep in mind that the slave needs an uninterrupted stream of XLOG; it can only continue to replay XLOG if not a single XLOG file is missing, or if the streaming connection can still provide the slave with the XLOG it needs to operate.

Rebooting the slave

Rebooting the slave will not do any harm as long as the archive has the XLOG to bring the slave back up. The slave will simply start up again and try to get the XLOG from any source available. There won't be corruption or any other problem of this sort.

Rebooting the master

If the master reboots, the situation is pretty uncritical as well. The slave will notice through the streaming connection that the master is gone. It will try to fetch the XLOG through both channels, but it won't be successful until the master is back. Again, nothing bad such as corruption can happen. Operations can simply resume after the reboot on both boxes.

Corrupted XLOG in the archive

If the XLOG in the archive becomes corrupted, we have to distinguish between two scenarios:

  1. The slave is streaming: If the stream is okay and intact, the slave will not notice that some XLOG file in the archive has been corrupted. The slave never needs to read from the archived XLOG files as long as the streaming connection is operational.
  2. The slave is replaying from the archive: If we are not streaming but replaying from a file, PostgreSQL will inspect every XLOG record and check whether its checksum is correct. If anything goes wrong, the slave will not continue to replay the corrupted XLOG. This ensures that no problems can propagate and that no broken XLOG can be replayed. Your database might not be complete, but it will be sane and consistent up to the point of the error.
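
PostgreSQL will not verify archived files for you before they are needed, so if you want to catch such corruption early, you can record a checksum at archiving time and check it later. The following is just a sketch built on our /archive example; the MD5SUMS file name is our own invention:

archive_command = 'cp %p /archive/%f && md5sum /archive/%f >> /archive/MD5SUMS'
        # record a checksum along with every archived segment

md5sum -c /archive/MD5SUMS
        # later: verify all archived segments against the recorded checksums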

Surely, there is a lot more that can go wrong, but given those likely cases, you can see clearly that the design has been made as reliable as possible.
