To set up pgpool, we can simply take an existing sample file containing a typical configuration, copy it to our configuration directory and modify it:
$ cp /usr/local/etc/pgpool.conf.sample /usr/local/etc/pgpool.conf
It is a lot easier to just adapt this config file than to write things from scratch. In the following listing you will see a sample config you can use for a simple two-node setup:
listen_addresses = 'localhost'
port = 9999
socket_dir = '/tmp'
pcp_port = 9898
pcp_socket_dir = '/tmp'
backend_hostname0 = 'localhost'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/hs/db'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'localhost'
backend_port1 = 5433
backend_weight1 = 1
backend_data_directory1 = '/home/hs/db2'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = off
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off
num_init_children = 32
max_pool = 4
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = on
replicate_select = off
insert_lock = on
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
Let us now discuss these settings in detail and see what each setting means:
pid_file_name
: Just like most software components, pgpool will write a PID file. We can explicitly define the location of this file. Usually PID files reside somewhere under /var/.

listen_addresses
: This setting is the exact counterpart of PostgreSQL's own listen_addresses setting. The idea is to define which IP addresses to listen on.

port
: This defines the TCP port on which the system will listen.

socket_dir
: There is no hard requirement to use TCP; UNIX sockets will work perfectly well too. socket_dir defines the location where these UNIX sockets reside.

pcp_port
: The TCP port on which the administration interface will listen.

pcp_socket_dir
: The UNIX socket directory the administration interface will use.

backend_hostname0
: The hostname of the first database in our setup.

backend_port0
: The TCP port of this system.

backend_weight0
: In pgpool we can assign weights to individual nodes. A higher weight automatically makes sure that more requests are sent there.

backend_data_directory0
: The PGDATA directory belonging to this instance.

backend_flag
: This setting tells pgpool whether a node is allowed to fail over or not. Two settings are allowed: ALLOW_TO_FAILOVER and DISALLOW_TO_FAILOVER.

enable_pool_hba
: If this is set to true, pgpool will use pool_hba.conf for authentication. pgpool follows the same concept as PostgreSQL here.

pool_passwd
: The password file for pgpool.

authentication_timeout
: Defines the timeout for pool authentication.

ssl
: If this is set to true, SSL will be enabled for client and backend connections. ssl_key and ssl_cert must be set as well to make this work.

num_init_children
: When pgpool is started, a number of connections will be pre-forked to make sure that response times stay low. This setting defines the number of initial children. The default value is 32.

max_pool
: This setting defines the maximum size of the pool per child. Please be aware that the number of connections from pgpool processes to the backends may reach num_init_children * max_pool. This parameter can only be set at server start.

child_life_time
: This defines the number of seconds a child is allowed to be idle before it is terminated.

child_max_connections
: After this number of connections to the very same child, it will be terminated. In other words, a process handles this many connections before it is recycled.

connection_life_time
: This defines how long a connection may live before it is recycled.

client_idle_limit
: Disconnect a client if it has been idle for this amount of time.

connection_cache
: If this is set to true, connections to the (storage) backends will be cached.

reset_query_list
: This defines a list of commands which have to be executed when a client exits a session. It is used to clean up a connection.

replication_mode
: This turns replication explicitly on. The default value is false.

replicate_select
: Shall we replicate SELECT statements or not?

insert_lock
: When replicating tables with sequences (data type serial), pgpool has to make sure that those numbers stay in sync.

load_balance_mode
: Shall pgpool split the load across all hosts in the system? The default setting is false.

ignore_leading_white_space
: Shall leading whitespace of a query be ignored or not?

white_function_list
: When pgpool runs a stored procedure, it has no idea what that procedure actually does. SELECT func() can be a read or a write; there is no way to tell from the outside what will actually happen. white_function_list allows us to teach pgpool which functions can safely be load balanced. If a function writes data, it must not be load balanced; otherwise the data on those servers would go out of sync. Being out of sync must be avoided at any cost.

black_function_list
: This is the opposite of white_function_list. It tells pgpool which functions must be replicated to make sure that things stay in sync.

Keep in mind that there is an important relation between max_pool and a child process of pgpool: a single child process can handle up to max_pool connections.
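The relation between children and pool size can be sketched in a few lines of shell, using the values from the sample configuration shown earlier:

```shell
# Upper bound on the number of backend connections pgpool may open:
# num_init_children * max_pool (values taken from the sample config above).
num_init_children=32
max_pool=4
echo $(( num_init_children * max_pool ))   # prints 128
```

With the sample values, the backends must therefore be prepared to accept up to 128 connections from pgpool; max_connections on the PostgreSQL side has to be sized accordingly.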
Once we have come up with a working config for pgpool, we can move ahead and configure authentication. In our case we want to add one user called hs. The password of hs should simply be hs. The pool_passwd file will be in charge of storing passwords. The format of the file is simple: it holds the name of the user, a colon, and the MD5-hashed password.
To hash a password, we can use the pg_md5 script:
$ pg_md5 hs
789406d01073ca1782d86293dcfc0764
Then we can add all of this to the config file storing users and passwords. In the case of pgpool this file is called pcp.conf
:
# USERID:MD5PASSWD
hs:789406d01073ca1782d86293dcfc0764
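If pg_md5 is not at hand, the same entry can be assembled with standard tools. This is a sketch under the assumption that pg_md5 (without further options) simply prints the plain MD5 of the password:

```shell
# Sketch: build a pcp.conf entry with standard tools instead of pg_md5.
# Assumption: pg_md5 (without -m) emits the plain MD5 hash of the password.
user=hs
password=hs
hash=$(printf '%s' "$password" | md5sum | cut -d ' ' -f 1)
echo "${user}:${hash}"
```

The resulting line can be appended to pcp.conf directly.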
Now that we have all components in place, we can start pgpool:
$ pgpool -f /usr/local/etc/pgpool.conf
If there is no error we should see a handful of processes waiting for some work from those clients out there:
$ ps ax | grep pool
30927 pts/4  S+  0:00 pgpool -n
30928 pts/4  S+  0:00 pgpool: wait for connection request
30929 pts/4  S+  0:00 pgpool: wait for connection request
30930 pts/4  S+  0:00 pgpool: wait for connection request
30931 pts/4  S+  0:00 pgpool: wait for connection request
30932 pts/4  S+  0:00 pgpool: wait for connection request
As you can clearly see, pgpool
will show up as a handful of processes in the process table.
Basically we could already connect to pgpool and fire queries, but this would instantly lead to disaster and inconsistency. Before we can move on to some real action, we should check the status of the nodes participating in the cluster. To do so we can utilize a tool called pcp_node_info:
$ pcp_node_info 5 localhost 9898 hs hs 0
localhost 5432 3 0.500000
$ pcp_node_info 5 localhost 9898 hs hs 1
localhost 5433 2 0.500000
The format of this call to pcp_node_info
is a little complicated and not too easy to read if you happen to see it for the first time.
Note that the weights are 0.5 here. In the configuration, we have given both backends a weight of 1. pgpool has automatically adjusted the weight so that they add up to 1.
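The normalization is simply each configured weight divided by the sum of all weights. For our two backends with weight 1 each:

```shell
# Weight normalization as pgpool reports it: configured weight divided by
# the sum of all weights. Two backends with weight 1 each yield 0.500000.
awk 'BEGIN { w0 = 1; w1 = 1; printf "%.6f\n", w0 / (w0 + w1) }'   # prints 0.500000
```

A backend configured with weight 3 next to one with weight 1 would accordingly show up as 0.750000 and 0.250000.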
Here is the syntax of pcp_node_info:
pcp_node_info - display a pgpool-II node's information

Usage: pcp_node_info [-d] timeout hostname port# username password nodeID
  -d, --debug : enable debug message (optional)
  timeout     : connection timeout value in seconds. command exits on timeout
  hostname    : pgpool-II hostname
  port#       : PCP port number
  username    : username for PCP authentication
  password    : password for PCP authentication
  nodeID      : ID of a node to get information for

Usage: pcp_node_info [options]
  Options available are:
  -h, --help    : print this help
  -v, --verbose : display one line per information with a header
The first parameter is the timeout. It defines the maximum time for the request. Then we specify the host and the port of the PCP infrastructure. Finally we pass a username and a password, as well as the ID of the node we want information about. The system responds with a hostname, a port, a status, and the weight of the node. In our example we have to focus our attention on the status column. It can return four different values:
0
: This state is only used during initialization. PCP will never display it.

1
: Node is up. No connections yet.

2
: Node is up. Connections are pooled.

3
: Node is down.

In our example we can see that the first node (node ID 0) is returning status 3; it is down. This is clearly a problem because if we executed a write now, it would not end up in both nodes but just in one of them.
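Since the status is just the third field of the response line, a hypothetical shell helper can pull it out for scripting (the field positions match the output shown above):

```shell
# Hypothetical helper: extract the status column (third field) from a
# pcp_node_info output line such as "localhost 5432 3 0.500000".
node_status() { echo "$1" | awk '{ print $3 }'; }
node_status "localhost 5432 3 0.500000"   # prints 3
```

A monitoring script could call this in a loop and raise an alert whenever a node reports status 3.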
To fix the problem, we can call pcp_attach_node
and enable the node:
$ pcp_attach_node 5 localhost 9898 hs hs 0
$ pcp_node_info 5 localhost 9898 hs hs 0
localhost 5432 1 0.500000
Once we have added the node we can check its status again. It will be up and running.
To test our setup we can use psql and display a list of all databases in the system:
$ psql -l -p 9999
                        List of databases
   Name    | Owner | Encoding  |   Collate   | Ctype ...
-----------+-------+-----------+-------------+-- ...
 postgres  | hs    | SQL_ASCII | en_US.UTF-8 | C ...
 template0 | hs    | SQL_ASCII | en_US.UTF-8 | C ...
 template1 | hs    | SQL_ASCII | en_US.UTF-8 | C ...
(3 rows)
The answer is as expected. We can see an empty database instance.