In PostgreSQL, the streaming replication data flows in one direction only. The XLOG is provided by the master to a handful of slaves, which consume the transaction log and provide you with a nice copy of the data. You might wonder how this could ever lead to conflicts. Well, there can be conflicts.
Consider the following scenario: As you know, data is replicated with a very small delay. So, the XLOG ends up at the slave after it has been made on the master. This tiny delay can cause the scenario shown in the following picture:
Let us assume that a slave starts to read a table. It is a long read operation. In the meantime, the master receives a request to actually drop the table. This is a bit of a problem because the slave will still need this data to perform its SELECT
statement. On the other hand, all the requests coming from the master have to be obeyed under any circumstances. This is a classical conflict.
There are two choices to solve the problem:
The first option might lead to ugly delays during the replay process, especially if the slave performs fairly long operations. The second option might frequently kill queries on the slave. The database instance cannot know by itself what is best for your application, so you have to find a proper balance between delaying replay and killing queries.
To find this delicate balance, PostgreSQL offers two parameters in postgresql.conf
:
max_standby_archive_delay = 30s # max delay before canceling queries # when reading WAL from archive; # -1 allows indefinite delay max_standby_streaming_delay = 30s # max delay before canceling queries # when reading streaming WAL; # -1 allows indefinite delay
The max_standby_archive_delay
parameter will tell the system how long to suspend the XLOG replay when there is a conflicting operation. In the default setting, the slave will delay the XLOG replay for up to 30 seconds if a conflict is found. This setting is valid if the slave is replaying the transaction log from files.
The
max_standby_streaming_delay
tells the slave for how long to suspend the XLOG replay if the XLOG is coming in through streaming. If the time has expired, and if the conflict is still there, PostgreSQL will cancel the statement due to a problem with recovery causing the problem in the slave system and resume the XLOG recovery to catch up.
In the previous example, we have shown that a conflict may show up if a table is dropped. This is an obvious scenario; however, it is by far not the most common one. It is much more likely that a row is removed by VACUUM
or HOT-UPDATE
somewhere, causing conflicts on the slave.
Conflicts popping up once in a while can be really annoying and trigger bad behavior of your applications. In other words, if possible, conflicts should be avoided. We have already seen how replaying the XLOG can be delayed. These are not the only mechanisms provided by PostgreSQL. There are two more settings we can use.
The first and older one of the two is the setting called vacuum_defer_cleanup_age
. It is measured in transactions and tells PostgreSQL when to remove a line of data. Normally a line of data can be removed by VACUUM
if no more transactions can see the data anymore. The
vacuum_defer_cleanup_age
tells VACUUM
to not clean up a row immediately but wait for some more transactions before it can go away. Deferring cleanups will keep a row around a little longer than needed. This helps the slave to complete queries that are relying on old rows. Especially if your slave is the one handling some analytical work, this will help a lot to make sure that no queries have to die in vain.
One more method to control conflicts is to make use of
hot_standby_feedback
. The idea is that a slave reports transaction IDs to the master, which can, in turn, use this information to defer VACUUM
. This is one of the easiest methods to avoid cleanup conflicts on the slave.
13.59.141.75