Bulletproofing with synchronous replication

Sometimes, in order to provide acceptable data durability, a high availability configuration must use synchronous commits. Beginning with PostgreSQL 9.1, a database server can refuse to commit a transaction until at least one standby server confirms that it has written the transaction's WAL data. Whereas asynchronous replication treats this confirmation as optional, synchronous replication enforces the requirement to a fault.

Discussions on the PostgreSQL mailing lists suggest a long-standing misconception that synchronous replication is similar to RAID-1 operation. In RAID-1, exactly the same data exists on two disks (or two disk sets), and if one half of the pair fails, the array continues to operate in degraded mode until the problem is addressed. This is absolutely not the case with PostgreSQL synchronous replication.

Unlike a RAID-1, PostgreSQL replicas can exist on different servers, on different networks, or even in different countries. PostgreSQL synchronous replication is a guarantee that data is written to at least two servers. Despite the necessary increase in latency to confirm this, the guarantee is upheld at all times.

This recipe is for databases that need this kind of extreme durability.

Getting ready

We will be continuing the work we performed in the Upgrading to asynchronous replication recipe, so please refer to that section to build a working asynchronous replica. We will alter the standby setup to include synchronous streaming replication.

How to do it...

For this scenario, the server at 192.168.56.10 is still the primary PostgreSQL server. Follow these steps to change an asynchronous PostgreSQL server into a synchronous replica:

  1. Modify the recovery.conf file on the recovery server to match these lines:
    standby_mode = on
    primary_conninfo = 'host=192.168.56.10 user=rep_user application_name=node2'
    restore_command = 'cp /db/pg_archived/%f %p 2>/dev/null'
  2. Restart the streaming server with the following command as the postgres user:
    pg_ctl -D /db/pgdata restart
    
  3. Change the synchronous_standby_names setting in the postgresql.conf file on the primary server to read the following:
    synchronous_standby_names = 'node2'
  4. Reload the configuration files on the primary server with the following command as the postgres user:
    pg_ctl -D /db/pgdata reload
    
  5. Confirm that the standby is connected by executing this SQL on the primary PostgreSQL server:
    SELECT client_addr, state, sync_state, application_name
      FROM pg_stat_replication;

How it works...

Promoting an asynchronous standby server to synchronous mode is actually a fairly simple procedure. We begin by modifying the primary_conninfo setting in the standby's recovery.conf file to include the application_name value. PostgreSQL differentiates replicas by their stated application name, so if we change this, we can specifically target that particular replica. Any other synchronous standby nodes should be assigned different names.
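
For example, a hypothetical second standby named node3 would use the same connection settings as node2, but with its own name:

primary_conninfo = 'host=192.168.56.10 user=rep_user application_name=node3'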

Once we restart the PostgreSQL server on the streaming standby, it will reconnect to the primary server with the new application_name that we assigned. From this point onward, we can refer to the standby server as node2. Thus, when we alter the synchronous_standby_names variable in the primary server's postgresql.conf file, we use the same name there.
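
With more than one named standby, synchronous_standby_names also accepts a comma-separated priority list; the first listed standby that is currently connected acts as the synchronous replica, and the others serve as fallbacks. Using the hypothetical node3 from the earlier example, such a list might look like this:

synchronous_standby_names = 'node2, node3'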

Any time we want to change the synchronous_standby_names variable, we merely need to tell PostgreSQL to reload its configuration files. Once we do, node2 should act as a synchronous standby server, and transactions will only commit after their data has been written to this server as well as the primary one.
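
As an alternative to pg_ctl reload, a superuser can trigger the same configuration reload from a SQL session on the primary server:

SELECT pg_reload_conf();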

Tip

This last point is extremely important. If, for any reason, the synchronous standby becomes unavailable, commits on the primary server will hang as well! If you are performing maintenance on the secondary server, we suggest that you set synchronous_standby_names to a blank value and reload the PostgreSQL server, as shown below. This breaks the synchronous guarantee until the standby can be reconnected, but it keeps the primary server writable.
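
Assuming the same paths used earlier in this recipe, that temporary change amounts to setting the following in the primary server's postgresql.conf file:

synchronous_standby_names = ''

followed by a reload as the postgres user:

pg_ctl -D /db/pgdata reload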

Once we have reloaded the primary server's configuration files, we can check the pg_stat_replication view again to observe how streaming is currently functioning. After executing the query, we should see something like this:

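The exact values depend on the environment, but with a hypothetical standby address of 192.168.56.20, the output should resemble the following, with sync_state reported as sync:

 client_addr   |   state   | sync_state | application_name
---------------+-----------+------------+------------------
 192.168.56.20 | streaming | sync       | node2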

As we can see in this example, the primary server sees node2 as a synchronous streaming replica.

There's more...

We really want to confirm that synchronous replication works as advertised. To do this, let's shut down the standby server with this command:

pg_ctl -D /db/pgdata stop -m fast

Then, try to write to the primary server. This simple SQL statement should wait indefinitely:

CREATE TABLE foo ( bar INT );

If we then restart the streaming replica using the following command, we should see the transaction complete:

pg_ctl -D /db/pgdata start

As you might imagine, this can be problematic in true high availability architectures that handle thousands of transactions per second. As such, we don't actually recommend that you use synchronous replication on OLTP servers. As these comprise the bulk of highly available PostgreSQL clusters, opportunities to take advantage of this level of data durability are somewhat slim.

However, synchronous commit is actually somewhat optional. If we want to try the experiment again, we can first issue this SQL statement before trying a basic write query:

SET synchronous_commit TO false;

This disables the synchronous commit guarantee for the current session only; WAL data still streams to the standby when it is available, but the primary no longer waits for its confirmation. Subsequent write queries in this connection should succeed normally, as if the remote server were a standard asynchronous copy.
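
As an illustration (using a hypothetical table name), a session like the following should complete immediately even while the standby is stopped:

SET synchronous_commit TO false;
CREATE TABLE foo2 ( bar INT );

The same setting can also be limited to a single transaction by issuing SET LOCAL synchronous_commit TO false inside a transaction block.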

See also

There are good resources within the PostgreSQL documentation and Wiki regarding streaming and synchronous replication; refer to them for more information.
