Conflict management

In PostgreSQL, the streaming replication data flows in one direction only. The XLOG is provided by the master to a handful of slaves, which consume the transaction log and provide you with a nice copy of the data. You might wonder how this could ever lead to conflicts. Well, there can be conflicts.

Consider the following scenario: As you know, data is replicated with a very small delay. So, the XLOG ends up at the slave after it has been made on the master. This tiny delay can cause the scenario shown in the following picture:

Conflict management

Let us assume that a slave starts to read a table. It is a long read operation. In the meantime, the master receives a request to actually drop the table. This is a bit of a problem because the slave will still need this data to perform its SELECT statement. On the other hand, all the requests coming from the master have to be obeyed under any circumstances. This is a classical conflict.

Tip

In case of a conflict, PostgreSQL will issue the following error message:

Terminating connection due to conflict with recovery

There are two choices to solve the problem:

  1. Don't replay the conflicting transaction log before the slave has terminated the operation in question.
  2. Kill the query on the slave to resolve the problem.

The first option might lead to ugly delays during the replay process, especially if the slave performs fairly long operations. The second option might frequently kill queries on the slave. The database instance cannot know by itself what is best for your application, so you have to find a proper balance between delaying replay and killing queries.

To find this delicate balance, PostgreSQL offers two parameters in postgresql.conf:

max_standby_archive_delay = 30s        
     # max delay before canceling queries
     # when reading WAL from archive;
     # -1 allows indefinite delay
max_standby_streaming_delay = 30s      
     # max delay before canceling queries
     # when reading streaming WAL;
     # -1 allows indefinite delay

The max_standby_archive_delay parameter will tell the system how long to suspend the XLOG replay when there is a conflicting operation. In the default setting, the slave will delay the XLOG replay for up to 30 seconds if a conflict is found. This setting is valid if the slave is replaying the transaction log from files.

The max_standby_streaming_delay tells the slave for how long to suspend the XLOG replay if the XLOG is coming in through streaming. If the time has expired, and if the conflict is still there, PostgreSQL will cancel the statement due to a problem with recovery causing the problem in the slave system and resume the XLOG recovery to catch up.

In the previous example, we have shown that a conflict may show up if a table is dropped. This is an obvious scenario; however, it is by far not the most common one. It is much more likely that a row is removed by VACUUM or HOT-UPDATE somewhere, causing conflicts on the slave.

Conflicts popping up once in a while can be really annoying and trigger bad behavior of your applications. In other words, if possible, conflicts should be avoided. We have already seen how replaying the XLOG can be delayed. These are not the only mechanisms provided by PostgreSQL. There are two more settings we can use.

The first and older one of the two is the setting called vacuum_defer_cleanup_age. It is measured in transactions and tells PostgreSQL when to remove a line of data. Normally a line of data can be removed by VACUUM if no more transactions can see the data anymore. The vacuum_defer_cleanup_age tells VACUUM to not clean up a row immediately but wait for some more transactions before it can go away. Deferring cleanups will keep a row around a little longer than needed. This helps the slave to complete queries that are relying on old rows. Especially if your slave is the one handling some analytical work, this will help a lot to make sure that no queries have to die in vain.

One more method to control conflicts is to make use of hot_standby_feedback. The idea is that a slave reports transaction IDs to the master, which can, in turn, use this information to defer VACUUM. This is one of the easiest methods to avoid cleanup conflicts on the slave.

Tip

Keep in mind, however, that deferring cleanups can lead to increased space consumption and some other side effects, which have to be kept in mind under any circumstances. The effect is basically the same as running a long transaction on the master.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
52.14.82.217