pgpool can also be used with streaming instead of statement-level replication. It is perfectly fine to use PostgreSQL onboard-replication and utilize pgpool just for load balancing and connection pooling.
In fact, it can even be beneficial to do so because you don't have to worry about side-effects of functions or potential other issues. The PostgreSQL transaction log is always right and it can be considered to be the ultimate law.
pgpool statement-level replication was a good feature to replicate data before streaming replication was introduced into the core of PostgreSQL.
In addition to that it can be beneficial to have just one master. The reason for that is simple. If you have just one master, it is hard to face inconsistencies. Also, pgpool will create full replicas so data has to be replicated anyway. There is absolutely no win if data must end up on both servers anyway—writing to two nodes will not make things scale any better in this case.
How can you run pgpool without replication? The process is basically quite simple:
replication_mode
in the pool config to off
.master_slave
to on
.master_slave_sub_mode
to stream.pgpool
as described earlier in this chapter.In a basic setup, pgpool will assume that node number 0
will be the master. So, you have to make sure that your two nodes are listed in the config in the right order.
For a basic setup these small changes to the config are perfectly fine.
pgpool
offers a handful of parameters to tweak the configuration to your needs. One of the most important things we have to take into consideration is that PostgreSQL supports both synchronous and asynchronous replication. Why is this relevant? Well, let us assume a simple scenario. Somebody wants to register on a website:
pgpool
will dispatch you to node 0
because we are facing a write.0
we will be fine—the data is expected to be there.This can lead to strange behavior on the client side. A typical case of strange behavior would be: A user creates a profile. In this case a row is written. At the very next moment the user wants to visit his or her profile and check the data. If he or she happens to read from a replica, the data might not be there already. If you are writing a web application you must keep this in the back if your mind.
To get around this issue, you have two choices:
delay_threshold
in the pooler config
delay_threshold
defines the maximum lag a slave is allowed to have to still receive reads. The setting is defined in bytes of changes inside the XLOG. So, if you set this to 1024
a slave is only allowed to be 1 KB of XLOG behind the master. Otherwise it will not receive read requests.
Of course, unless this has been set to zero it is pretty hard to make it totally impossible that a slave can ever return data that is too old, however, a reasonable setting can make it very unlikely. In many practical applications this might very well be enough.
How does pgpool
know how far a slave is behind? The answer is that this can be configured easily:
sr_check_period:
This variable defines, how often the system should check those XLOG positions to figure out if the delay is too high or not. The unit used here is seconds.sr_check_user:
The name of the user to connect to the primary via streaming to check for the current position in the XLOG.sr_check_password:
The password for this user.3.144.37.12