Chapter 5. Setting up Synchronous Replication

Up to now we have dealt with file-based replication (or log shipping) and a simple streaming-based setup. In both cases, data is sent to and received by the slave(s) after the transaction has been committed on the master. During the time between the master's commit and the point when the slave has fully received the data, the data can still be lost.

In this chapter we will learn about the following topics:

  • Making sure that no single transaction can be lost
  • Configuring PostgreSQL for synchronous replication
  • Understanding and using application_name
  • The performance impact of synchronous replication
  • Optimizing replication for speed

Setting up synchronous replication

As mentioned before, synchronous replication has been made to protect your data at all costs. The core idea of synchronous replication is that a transaction must be on at least two servers before the master returns success to the client.

Setting up synchronous replication works just like setting up asynchronous replication. Only a handful of parameters discussed in this chapter have to be changed to enjoy the blessings of synchronous replication. However, if you are about to build a setup based on synchronous replication, we recommend getting started with an asynchronous setup and gradually extending your configuration, turning it into synchronous replication. This will allow you to debug things more easily and avoid problems down the road.

Understanding the downside of synchronous replication

The most important thing you have to know about synchronous replication is that it is simply expensive. Do you remember our first chapter about the CAP theorem, the speed of light, and so on? Synchronous replication and its downsides are one of the core reasons why we decided to include all this background information in this book. It is essential to understand the physical limitations of synchronous replication; otherwise, you might end up in deep trouble.

When setting up synchronous replication, try to keep the following things in mind:

  • Minimize the latency
  • Make sure you have redundant connections
  • Synchronous replication is more expensive than asynchronous replication

Understanding the application_name parameter

The application_name parameter plays an important role in a synchronous setup. In a typical application, people use application_name for debugging purposes. It can help to track bugs, identify what an application is doing, and so on:

test=# SHOW application_name;
 application_name 
------------------
 psql
(1 row)

test=# SET application_name TO 'whatever';
SET
test=# SHOW application_name;
 application_name 
------------------
 whatever
(1 row)

As you can see, it is possible to set the application_name parameter freely. The setting is valid for the session we are in, and will be gone as soon as we disconnect. The question now is: What does application_name have to do with synchronous replication?

Well, the story goes like this: When a slave connects to the master through streaming, it can optionally send an application_name as part of the primary_conninfo setting. If this application_name matches an entry in the master's synchronous_standby_names setting, the slave becomes a synchronous one (the first matching standby in the list is the synchronous standby; the others are candidates that can take over).
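Putting the two sides together, the pairing can be sketched as follows (the hostname is a placeholder; book_sample is the name used throughout this chapter):

# postgresql.conf on the master
synchronous_standby_names = 'book_sample'

# recovery.conf on the slave
primary_conninfo = 'host=master.example.com port=5432 application_name=book_sample'
standby_mode = on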

Tip

In the case of cascaded replication (which means that a slave is again connected to a slave), the cascaded slave is not treated synchronously anymore.

With all this information in mind, we can move forward and configure our first synchronous replication.

Making synchronous replication work

To show you how synchronous replication works, this chapter will include a full, working example, outlining all the relevant configuration parameters.

A couple of changes have to be made to the master. The following settings will be needed in postgresql.conf on the master:

wal_level = hot_standby
max_wal_senders = 5    # or any number
synchronous_standby_names = 'book_sample'
hot_standby = on       # used on the slave to make it readable

Then we have to adapt pg_hba.conf just as we have already seen in the previous chapters. After that, the server can be restarted and the master is ready for action.

Tip

We recommend setting wal_keep_segments as well to keep more transaction log on the master database. This makes the entire setup considerably more robust.
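A possible setting might look like this (the value of 256 is just an assumption; with 16 MB segments it retains roughly 4 GB of transaction log, so size it to your environment):

wal_keep_segments = 256    # in postgresql.conf on the master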

In the next step, we can perform a base backup just as we have done before. We have to call pg_basebackup on the slave. Ideally, we include the transaction log in the base backup itself (--xlog-method=stream). This allows us to fire things up quickly and without any great risk.
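Such a call might look as follows (the host, user, and target directory are placeholders; adjust them to your setup):

iMac:slavehs$ pg_basebackup -D /target_directory \
    -h master.example.com -p 5432 -U hs \
    --xlog-method=stream --checkpoint=fast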

Tip

--xlog-method=stream and wal_keep_segments are a good combo, and should in our opinion be used in most cases to ensure that a setup works flawlessly and safely.

We have recommended setting hot_standby on the master already; the config file will be replicated anyway, so you save yourself one trip to postgresql.conf to change this setting. Of course, this is not fine art, but an easy and pragmatic approach.

Once the base backup has been performed, we can move ahead and write a simple recovery.conf file suitable for synchronous replication:

iMac:slavehs$ cat recovery.conf
primary_conninfo = 'host=localhost application_name=book_sample port=5432'
standby_mode = on

The config file looks just like before. The only difference is that we have added application_name to the scenery. Note that the application_name parameter must match an entry in the synchronous_standby_names setting on the master.

Once we have finished writing recovery.conf, we can fire up the slave.

In our example, the slave is on the same server as the master. In this case, you have to ensure that the two instances use different TCP ports; otherwise, the instance started second will fail to start. The port can easily be changed in postgresql.conf.
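For example, the slave's postgresql.conf could simply use a different port (5433 is an arbitrary free port):

port = 5433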

After those steps, the database instance can be started. The slave will check out its connection info and connect to the master. Once it has replayed all relevant transaction logs, it will be in the synchronous state; the master and the slave will hold exactly the same data from then on.

Checking replication

Now that we have started the database instance, we can connect to the system and see if things are working properly.

To check for replication, we can connect to the master and take a look at pg_stat_replication. For this check, we can connect to any database inside our (master) instance:

postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 62871
usesysid         | 10
usename          | hs
application_name | book_sample
client_addr      | ::1
client_hostname  | 
client_port      | 59235
backend_start    | 2013-03-29 14:53:52.352741+01
state            | streaming
sent_location    | 0/30001E8
write_location   | 0/30001E8
flush_location   | 0/30001E8
replay_location  | 0/30001E8
sync_priority    | 1
sync_state       | sync

This system view will show exactly one line per slave attached to your master system.

Tip

\x will make the output more readable for you. If you don't use \x to transpose the output, the lines will be so long that it is pretty hard to comprehend the content of this table. In expanded display mode, each column is shown on its own line instead.

You can see that the application_name parameter has been taken from the connect string passed to the master by the slave (in our example, book_sample). As the application_name parameter matches the master's synchronous_standby_names setting, we have convinced the system to replicate synchronously; no transaction can be lost anymore, because every transaction will end up on two servers before the commit returns to the client. The sync_state setting will tell you precisely how data is moving from the master to the slave.

Tip

You can also use a list of application names or simply a * in synchronous_standby_names to indicate that the first slave has to be synchronous.
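Both variants are sketched here (the standby names slave1 and slave2 are examples):

synchronous_standby_names = 'slave1, slave2'  # slave1 is synchronous; slave2 takes over if slave1 is gone
synchronous_standby_names = '*'               # whichever standby connects first becomes synchronous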

Understanding performance issues

At various points in this book we have already pointed out that synchronous replication is an expensive thing to do. Remember, we have to wait for a remote server, not just for the local system; the network between the two nodes is definitely not going to speed things up. Writing to more than one node is always more expensive than writing to just one. Therefore, we definitely have to keep an eye on speed; otherwise, you might face some pretty nasty surprises.

Tip

Consider what we have learned about the CAP theorem earlier in this book; synchronous replication is exactly where the physical limitations become most obvious, and their impact on performance is serious.

The main question you really have to ask yourself is: Do you really want to replicate all transactions synchronously? In many cases, you don't. To prove our point, let us imagine a typical scenario: A bank wants to store accounting-related data as well as some logging data. We definitely don't want to lose a couple of million just because a database node goes down. This kind of data might be worth the effort, so we can replicate it synchronously. The logging data is quite different, however. It might be far too expensive to cope with the overhead of synchronous replication, so we want to replicate this data in an asynchronous way to ensure maximum throughput.
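Applied to this scenario, a session writing log data could simply relax its own durability requirements (the table t_log is purely illustrative):

test=# SET synchronous_commit TO off;
SET
test=# INSERT INTO t_log (message) VALUES ('some logging data');
INSERT 0 1

The accounting sessions simply leave synchronous_commit at its default and keep enjoying full synchronous replication.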

How can we configure a system to handle important as well as not-so-important transactions nicely? The answer lies in a variable you have already seen earlier in this book: synchronous_commit.

Setting synchronous_commit to on

In the default PostgreSQL configuration, synchronous_commit has been set to on. In this case, commits will wait until a reply from the current synchronous standby indicates it has received the commit record of the transaction and flushed it to the disk. In other words, both servers must report that the data has been written safely. Unless both servers crash at the same time, your data will survive potential problems (crashing both servers should be pretty unlikely).

Setting synchronous_commit to remote_write

Flushing to both disks can be highly expensive. In many cases, it is enough to know that the remote server has accepted the XLOG and passed it on to the operating system, without flushing it to disk on the slave. As we can be pretty certain that we won't lose two servers at the very same time, this is a reasonable compromise between performance and data protection.

Setting synchronous_commit to off

We have already dealt with this setting in a previous chapter. The idea is to delay WAL writing to reduce disk flushes. It can be used if performance is more important than durability. In the case of replication, it means that we are not replicating in a fully synchronous way.

Keep in mind that this can have a serious impact on your application. Imagine a transaction committing on the master and you want to query that data instantly on one of the slaves. There is still a tiny window during which you can actually get outdated data.

Setting synchronous_commit to local

local will flush locally but will not wait for the replica to respond. In other words, it will turn your transaction into an asynchronous one.

Setting synchronous_commit to local can also cause a small time window during which the slave can actually return slightly outdated data. This phenomenon has to be kept in mind when you decide to offload reads to the slave.
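If you do offload reads to the slave, you can at least observe the size of this window on the master by comparing the current XLOG position with the slave's replay position (this is just one way to do it; the functions shown are available in PostgreSQL 9.2):

test=# SELECT application_name,
              pg_current_xlog_location() AS master_position,
              replay_location
       FROM pg_stat_replication;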

In short, if you want to replicate synchronously, you have to ensure that synchronous_commit is set to either on or remote_write.

Changing durability settings on the fly

Changing the way data is replicated on the fly is easy. In this chapter, we have already set up a full synchronous replication infrastructure by adjusting synchronous_standby_names (master) along with the application_name (slave) parameter. The good thing about PostgreSQL is that you can change your durability requirements on the fly:

test=# BEGIN;
BEGIN
test=# CREATE TABLE t_test (id int4);
CREATE TABLE
test=# SET synchronous_commit TO local;
SET
test=# \x
Expanded display is on.
test=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 62871
usesysid         | 10
usename          | hs
application_name | book_sample
client_addr      | ::1
client_hostname  | 
client_port      | 59235
backend_start    | 2013-03-29 14:53:52.352741+01
state            | streaming
sent_location    | 0/3026258
write_location   | 0/3026258
flush_location   | 0/3026258
replay_location  | 0/3026258
sync_priority    | 1
sync_state       | sync

test=# COMMIT;
COMMIT

In this example, we have changed the durability requirements on the fly. Setting synchronous_commit to local makes sure that this very specific transaction will not wait for the slave at all. Note that, as you can see, sync_state has not changed. Don't be fooled by what you see here; you can completely rely on the behavior outlined in this section. PostgreSQL is perfectly able to handle each transaction separately. This is a unique feature of this wonderful open source database; it puts you in control and lets you decide which durability requirements each transaction should have.
