Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Running pgpool with streaming replication

pgpool can also be used with streaming instead of statement-level replication. It is perfectly fine to use PostgreSQL onboard-replication and utilize pgpool just for load balancing and connection pooling.

In fact, it can even be beneficial to do so because you don't have to worry about side-effects of functions or potential other issues. The PostgreSQL transaction log is always right and it can be considered to be the ultimate law.

pgpool statement-level replication was a good feature to replicate data before streaming replication was introduced into the core of PostgreSQL.

In addition to that it can be beneficial to have just one master. The reason for that is simple. If you have just one master, it is hard to face inconsistencies. Also, pgpool will create full replicas so data has to be replicated anyway. There is absolutely no win if data must end up on both servers anyway—writing to two nodes will not make things scale any better in this case.

How can you run pgpool without replication? The process is basically quite simple:

Set up PostgreSQL streaming replication (synchronous or asynchronous).
Change replication_mode in the pool config to off.
Set master_slave to on.
Set master_slave_sub_mode to stream.
Start pgpool as described earlier in this chapter.

In a basic setup, pgpool will assume that node number 0 will be the master. So, you have to make sure that your two nodes are listed in the config in the right order.

For a basic setup these small changes to the config are perfectly fine.

Optimizing pgpool configuration for master/slave mode

pgpool offers a handful of parameters to tweak the configuration to your needs. One of the most important things we have to take into consideration is that PostgreSQL supports both synchronous and asynchronous replication. Why is this relevant? Well, let us assume a simple scenario. Somebody wants to register on a website:

A write request comes in. pgpool will dispatch you to node 0 because we are facing a write.
The user clicks on the Save button.
The user will reach the next page; a read request will be issued
- If we end up on node 0 we will be fine—the data is expected to be there.
- If we end up on node 1 we might not see the data at this point if we are replicating asynchronously. Theoretically there can also be a small window if you are using synchronous replication in this case.

This can lead to strange behavior on the client side. A typical case of strange behavior would be: A user creates a profile. In this case a row is written. At the very next moment the user wants to visit his or her profile and check the data. If he or she happens to read from a replica, the data might not be there already. If you are writing a web application you must keep this in the back if your mind.

To get around this issue, you have two choices:

Replicate synchronously, which is a lot more expensive
Set delay_threshold in the pooler config

delay_threshold defines the maximum lag a slave is allowed to have to still receive reads. The setting is defined in bytes of changes inside the XLOG. So, if you set this to 1024 a slave is only allowed to be 1 KB of XLOG behind the master. Otherwise it will not receive read requests.

Of course, unless this has been set to zero it is pretty hard to make it totally impossible that a slave can ever return data that is too old, however, a reasonable setting can make it very unlikely. In many practical applications this might very well be enough.

How does pgpool know how far a slave is behind? The answer is that this can be configured easily:

sr_check_period: This variable defines, how often the system should check those XLOG positions to figure out if the delay is too high or not. The unit used here is seconds.
sr_check_user: The name of the user to connect to the primary via streaming to check for the current position in the XLOG.
sr_check_password: The password for this user.

Tip

If you really want to make sure that load balancing will always provide you with up-to-date data, it is necessary to replicate synchronously, which can be expensive.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Running pgpool with streaming replication

Create new playlist

Sign In

Sign Up

Running pgpool with streaming replication

Optimizing pgpool configuration for master/slave mode

Tip

Table of Contents for
Running pgpool with streaming replication