Evaluating PgBouncer pool health

Though PgBouncer provides similar information regarding both server and client database connections, the status and health of each pool are also available. If we didn't already clarify, PgBouncer pools are separated by username, database name, and the server's hostname. Thus, each PostgreSQL server may have as many connection pools as there are different databases a user might access via PgBouncer.

PgBouncer supplies somewhat detailed information when seeking server or client status. However, these are not database views, so we can't summarize or aggregate the output to make it more usable. When running a highly available database server, we need to monitor aggregate values, if possible, to watch for potential patterns of misconfiguration or abuse.

Unfortunately, since PgBouncer acts as a proxy, we can't rely on the pg_stat_activity system view for summaries. This means PgBouncer and its administrative console are the main sources of debugging and status information. Thankfully, there is quite a lot of useful information. Let's explore.

Getting ready

As before, we continue to use the PgBouncer administration console, so we recommend following the Listing PgBouncer client connections recipe before continuing here. Remember to use the pgbouncer database name to enter the administration console.

How to do it...

Follow these steps to get the status of PgBouncer clients:

  1. Connect to the pgbouncer database on port 6432 of the PostgreSQL server as the postgres user.
  2. Issue the following query for pool status:
    SHOW POOLS;
  3. Issue the following query for pool statistics:
    SHOW STATS;

How it works...

Connecting to the pgbouncer database name on port 6432 connects us to PgBouncer using a simulated database that doesn't actually exist. This name tells PgBouncer that we want the administration console. If we configured PgBouncer according to the Configuring PgBouncer safely recipe, the postgres user is the only database user allowed to use the console.

By sending SHOW POOLS as a query, PgBouncer responds with a row for every PostgreSQL database to which it is acting as a proxy. Each column is a summary for various client and server metrics, mainly related to activity or status. Here is a detailed summary of the columns:

  • cl_active: This column shows the number of clients that are currently assigned a server connection. This number should not exceed the value we get by adding default_pool_size and reserve_pool_size from the pgbouncer.ini configuration file. If the total is regularly below the maximum, we may consider reducing the pool size.
  • cl_waiting: It denotes the number of clients waiting for a server connection. Since this is a snapshot of the current activity, the number can fluctuate drastically between checks. However, if it regularly remains above zero, and the maxwait column is increasing, the pools are probably too small.
  • sv_active: This column details how many PostgreSQL server connections are assigned to the PgBouncer clients. These clients are not necessarily active, just associated with the connection. The cl_active and sv_active columns should always be equal.
  • sv_idle: This column provides a count of PostgreSQL server connections that are not in use at all. PgBouncer marks connections as idle after it sends a reset query to clear out the allocated objects and settings. Thus, not only is the connection idle but it's also immediately ready for assignment. If there are several of these, it's because PgBouncer doesn't need them; think about reducing the pool size.
  • sv_used: This indicates the count of dirty PostgreSQL server connections. These connections are actually idle, but they have not yet been reset by PgBouncer for reuse. This means we need to add sv_used to sv_idle to get the real count of idle connections for this database and user combination. As with sv_idle, a large amount of used connections indicate reducing pool size limits.
  • maxwait: This column outlines the maximum number of seconds a client has waited for a connection. Combined with the cl_waiting cumulative total, we can infer either an excess or shortage of throughput based on the connection availability. This statistic is constantly updated, so if no clients are waiting, it will show zero. This kind of live feedback allows us to adjust our pool sizes to ideal levels.

By sending SHOW STATS as a query, PgBouncer responds with a row for every PostgreSQL database to which it is acting as a proxy. Each column is a summary of various network and time metrics. Here is a detailed summary of these columns:

  • total_requests: This column represents the total number of transactions that PgBouncer has directed through the pool. The documentation suggests that the SQL requests are summarized here, but this is probably a miscommunication. Tests clearly show that only queries outside of transactions, or transactions themselves, increase the counter. As transactions are more expensive than simple queries, they can represent a larger ratio of excess work.
  • total_received: This column tracks the total amount of data in bytes sent to PgBouncer through the network for this database and user combination. In order to have a healthy pool, we need to illustrate high throughput. Thus, we must also examine the next column.
  • total_sent: This column tracks the total amount of data in bytes sent from PgBouncer to the clients accessing the database. The ratio of this value to total_received can indicate that PgBouncer is handling too many large queries, which reduces pool connection throughput. It's also possible that a misconfigured batch job is improperly accessing the database via PgBouncer.
  • total_query_time: This is the amount of time in microseconds that PgBouncer has spent communicating with a client in this pool. This can be a particularly difficult column to read because it's cumulative, based on all clients accessing PostgreSQL connections. For now, we suggest ignoring it.
  • avg_req: This column shows the average number of requests per second since the last stat update. As with total_requests, this is the amount of transactions, not queries, handled by PgBouncer.
  • avg_recv: This column details the average number of bytes sent to PgBouncer by each client since the last stat update. In low activity pools, this may reset to zero between samples.
  • avg_sent: This column indicates the average amount of bytes that PgBouncer has sent to each client since the last stat update. In low activity pools, this may reset to zero between samples. Along with avg_recv, we can again obtain a ratio of sent bandwidth versus received to look for potential excessive query output.
  • avg_query: This column provides the average query duration in microseconds for all connections in this pool. This is a much more useful metric than total_query_time as it actually tells us the average throughput of the pool. If the average query time is 50 ms, for example, we can expect each PostgreSQL connection to handle 20 clients per second. This is valuable data to properly size the connection pools.

Feel free to browse the PgBouncer documentation for other available fields.

There's more...

We've mentioned adjusting pool size several times in this recipe. Since pgpool acts as a single proxy for several database and user combinations, we can actually override the default in cases where pools require more direct management. For instance, if we change our entry in /etc/pgbouncer.ini for the postgres database to postgres = host=localhost pool_size=5, no user connecting to the postgres database can use more than five connections, even if the default is 50 per pool. Keep this in mind when analyzing the pools, clients, servers, and other statistics that PgBouncer collects on our behalf. We will most likely need several adjustments before reaching an ideal state that won't overwhelm the PostgreSQL server, yet adequately supplies client requirements.

See also

We know we've listed these documentation links before, but we're still working with complicated configuration settings and usage. We've listed them again for convenience:

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.40.189