Setting up replication and load balancing

To set up pgpool, we can simply take an existing sample file containing a typical configuration, copy it to our configuration directory and modify it:

$ cp /usr/local/etc/pgpool.conf.sample /usr/local/etc/pgpool.conf

It is a lot easier to adapt this config file than to write everything from scratch. The following listing shows a sample configuration for a simple two-node setup:

listen_addresses = 'localhost'
port = 9999
socket_dir = '/tmp'
pcp_port = 9898
pcp_socket_dir = '/tmp'

backend_hostname0 = 'localhost'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/hs/db'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'localhost'
backend_port1 = 5433
backend_weight1 = 1
backend_data_directory1 = '/home/hs/db2'
backend_flag1 = 'ALLOW_TO_FAILOVER'

enable_pool_hba = off
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off
num_init_children = 32
max_pool = 4
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0

connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'

replication_mode = on
replicate_select = off
insert_lock = on
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'

Let us now discuss these settings in detail and see what each setting means:

  • pid_file_name: Just like most server software, pgpool will write a PID file. We can explicitly define the location of this file (it is not shown in the listing above). Usually, PID files reside somewhere under /var/.
  • listen_addresses: This setting is the exact counterpart of PostgreSQL's own listen_addresses setting: it defines which IP addresses pgpool will listen on.
  • port: This will define the TCP port on which the system will listen.
  • socket_dir: There is no hard requirement to use TCP. UNIX sockets will be perfectly fine as well. socket_dir will define the location where these UNIX sockets will reside.
  • pcp_port: The TCP port on which the administration interface will listen.
  • pcp_socket_dir: The UNIX sockets directory the administration interface will use.
  • backend_hostname0: The hostname of the first database in our setup.
  • backend_port0: The TCP port of this system.
  • backend_weight0: In pgpool we can assign weights to individual nodes. A node with a higher weight will automatically receive a larger share of the requests.
  • backend_data_directory0: The PGDATA directory belonging to this instance.
  • backend_flag0: This setting tells pgpool whether a node is allowed to fail over or not. Two settings are allowed: ALLOW_TO_FAILOVER and DISALLOW_TO_FAILOVER.
  • enable_pool_hba: If this is set to true, pgpool will use pool_hba.conf for authentication. pgpool follows the same concept as PostgreSQL here.
  • pool_passwd: Password file for pgpool.
  • authentication_timeout: Defines the timeout for pool authentication.
  • ssl: If this has been set to true, SSL will be enabled for client and backend connections. ssl_key and ssl_cert must be set as well to make this work.
  • num_init_children: When pgpool is started, a number of child processes will be pre-forked to make sure that response times stay low. This setting defines the number of these initial children. The default value is 32.
  • max_pool: This setting defines the maximum size of the pool per child. Please be aware that the number of connections from pgpool processes to the backends may reach num_init_children * max_pool. This parameter can only be set at server start.
  • child_life_time: This defines the number of seconds a child is allowed to be idle before it is terminated.
  • child_max_connections: After a child process has served this number of connections, it will be terminated and recycled. A value of 0 disables this recycling.
  • connection_life_time: This defines how long (in seconds) a cached backend connection may live before it is recycled. 0 disables the timeout.
  • client_idle_limit: Disconnect a client if it has been idle for this number of seconds. Again, 0 disables the feature.
  • connection_cache: If this is set to true, connections to the (storage) backend will be cached.
  • reset_query_list: This defines a list of SQL commands that will be executed when a client exits a session, in order to clean up the connection before it is reused.
  • replication_mode: This turns replication explicitly on. The default value is false.
  • replicate_select: Shall we replicate SELECT statements or not?
  • insert_lock: When replicating tables with sequences (data type serial), pgpool has to make sure that those numbers stay in sync on all nodes. When this is on, pgpool acquires a lock during INSERT statements so that identical sequence values are generated everywhere.
  • load_balance_mode: Shall pgpool split the load to all hosts in the system? The default setting is false.
  • ignore_leading_white_space: Shall leading whitespaces of a query be ignored or not?
  • white_function_list: When pgpool runs a stored procedure, pgpool will have no idea what it actually does. SELECT func() can be a read or a write; there is no way to tell from the outside what will actually happen. white_function_list allows you to teach pgpool which functions can safely be load balanced. If a function writes data, it must not be load balanced; otherwise the data on those servers will go out of sync. Being out of sync must be avoided at any cost.
  • black_function_list: This is the opposite of white_function_list. It tells pgpool which functions write data, so that statements calling them are sent to all nodes to keep things in sync. A short sketch of how to use these lists follows this list.
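To make this concrete, here is a minimal sketch of how such a list could look. The function name log_event is made up for illustration and stands in for any function of yours that writes data; as in the sample configuration, we only fill in one of the two lists:

# log_event is a hypothetical function that writes data, so
# queries calling it must never be load balanced:
white_function_list = ''
black_function_list = 'nextval,setval,log_event'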

Keep in mind that there is an important relation between max_pool and the child processes of pgpool: each child process caches up to max_pool backend connections.
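Based on our sample configuration, a quick calculation shows the worst case. The exact headroom needed on the PostgreSQL side (for example, for superuser_reserved_connections) depends on your setup:

# num_init_children * max_pool = 32 * 4 = 128
# so each backend must accept up to 128 connections from pgpool;
# set max_connections on the PostgreSQL side accordingly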

Password authentication

Once you have come up with a working config for pgpool, we can move ahead and configure authentication. In our case we want to add one user called hs, whose password should simply be hs. The pcp.conf file will be in charge of storing these users and passwords. The format of the file is simple: each line holds the name of a user, a colon, and the MD5 hash of the password.

To hash a password, we can use the pg_md5 tool:

$ pg_md5 hs
789406d01073ca1782d86293dcfc0764

Then we can add all of this to the config file storing users and passwords. In the case of pgpool this file is called pcp.conf:

# USERID:MD5PASSWD
hs:789406d01073ca1782d86293dcfc0764
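If you want to avoid copying the hash around manually, the entry can also be appended in a single step. This is merely a convenience sketch and assumes that pcp.conf resides in /usr/local/etc, which may differ on your system:

$ echo "hs:$(pg_md5 hs)" >> /usr/local/etc/pcp.conf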

Firing up pgpool and testing the setup

Now that we have all components in place, we can start pgpool:

$ pgpool -f /usr/local/etc/pgpool.conf

If there is no error, we should see a handful of processes waiting for connection requests from clients:

$ ps ax | grep pool
30927 pts/4    S+     0:00 pgpool -n
30928 pts/4    S+     0:00 pgpool: wait for connection request
30929 pts/4    S+     0:00 pgpool: wait for connection request
30930 pts/4    S+     0:00 pgpool: wait for connection request
30931 pts/4    S+     0:00 pgpool: wait for connection request
30932 pts/4    S+     0:00 pgpool: wait for connection request

As you can clearly see, pgpool will show up as a handful of processes in the process table.
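For the sake of completeness: pgpool can be shut down again using the same binary. The -m option selects the shutdown mode (smart waits for clients to disconnect, fast does not):

$ pgpool -f /usr/local/etc/pgpool.conf -m fast stop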

Attaching hosts

Basically, we could already connect to pgpool and fire queries, but this could instantly lead to disaster and inconsistency. Before we move on to some real action, we should check the status of the nodes participating in the cluster. To do so, we can utilize a tool called pcp_node_info:

$ pcp_node_info 5 localhost 9898 hs hs 0
localhost 5432 3 0.500000
$ pcp_node_info 5 localhost 9898 hs hs 1
localhost 5433 2 0.500000

The format of this call to pcp_node_info is a little cryptic and not easy to read if you see it for the first time.

Note that the weights are 0.5 here. In the configuration, we have given both backends a weight of 1. pgpool has automatically adjusted the weight so that they add up to 1.
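To illustrate this normalization, assume we had configured unequal weights; the values below are made up for demonstration:

backend_weight0 = 2
backend_weight1 = 1

# pcp_node_info would then report a weight of 0.666667 for
# node 0 and 0.333333 for node 1 (2/3 and 1/3 of the total)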

Here is the syntax of pcp_node_info:

pcp_node_info - display a pgpool-II node's information

Usage: pcp_node_info [-d] timeout hostname port# username password nodeID
  -d, --debug    : enable debug message (optional)
  timeout        : connection timeout value in seconds; command exits on timeout
  hostname       : pgpool-II hostname
  port#          : PCP port number
  username       : username for PCP authentication
  password       : password for PCP authentication
  nodeID         : ID of a node to get information for

Usage: pcp_node_info [options]
  Options available are:
  -h, --help     : print this help
  -v, --verbose  : display one line per information with a header

The first parameter is the timeout, which defines the maximum time allowed for the request. Then we specify the host and the port of the PCP infrastructure. Finally, we pass a username and a password, as well as the ID of the node we want information about. The system will respond with a hostname, a port, a status, and the weight of the node. In our example, we have to focus our attention on the status column. It can return four different values:

  • 0: This state is only used during the initialization. PCP will never display it.
  • 1: Node is up. No connections yet.
  • 2: Node is up. Connections are pooled.
  • 3: Node is down.

In our example we can see that node 0 is returning status 3: it is down. This is clearly a problem, because if we executed a write now, it would not end up on both nodes but just on one of them.

To fix the problem, we can call pcp_attach_node and enable the node:

$ pcp_attach_node 5 localhost 9898 hs hs 0
$ pcp_node_info 5 localhost 9898 hs hs 0
localhost 5432 1 0.500000

Once we have attached the node, we can check its status again. As the output shows, it is up and running (status 1, no connections pooled yet).

To test our setup, we can use psql to display a list of all databases in the system:

$ psql -l -p 9999
                    List of databases
   Name   | Owner | Encoding  |   Collate   | Ctype ...
----------+-------+-----------+-------------+-- ...
postgres  | hs    | SQL_ASCII | en_US.UTF-8 | C ...
template0 | hs    | SQL_ASCII | en_US.UTF-8 | C ... 
template1 | hs    | SQL_ASCII | en_US.UTF-8 | C ...  
(3 rows)

The answer is as expected. We can see an empty database instance.
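To verify that replication really works, we can issue a write through pgpool and then look at both backends directly on their own ports. The table name t_test is just a made-up example:

$ psql -p 9999 -c "CREATE TABLE t_test (id int)" postgres
CREATE TABLE
$ psql -p 5432 -c "\dt t_test" postgres
$ psql -p 5433 -c "\dt t_test" postgres

If everything is in sync, the last two calls will list t_test on both backends.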
