Another source of valuable troubleshooting information is PostgreSQL itself. There are numerous views, tables, and functions dedicated to tracking and reporting various statistics and operating statuses for each hosted database. Principal among these is the pg_stat_activity
view.
This view tells us what every database client is doing, where it is connected from, which user account it is operating under, and other important values. When administering a highly available database, we must either have an iron control over what executes in the database or the ability to quickly and easily assess its execution state. Besides using this data to track suspicious activity, we can also cancel long-running queries or Cartesian Products, or simply examine the connection turnover.
We probably use this view into the database more than any other, and it forms the backbone of several monitoring utilities as well. Let's explore just why this system catalog is so indispensable.
While any user can view the contents of the pg_stat_activity
view, only a superuser can freely examine the contents of every column. To avoid security exploits, regular users cannot view the current query activity, any connection information, or fields related to query time or status.
To get the most out of this view safely, we want to grant elevated privileges to specific users dedicated to monitoring and status checks. In order to do this, we must first connect to the database as a superuser (such as the postgres
user) for the duration of this recipe.
Perform the following steps to prepare pg_stat_activity
for generalized use:
CREATE OR REPLACE FUNCTION pg_stat_activity() RETURNS SETOF pg_stat_activity AS $$ SELECT * FROM pg_stat_activity; $$ LANGUAGE sql SECURITY DEFINER;
REVOKE ALL ON FUNCTION pg_stat_activity() FROM PUBLIC;
CREATE USER db_mon WITH PASSWORD 'somepass';
GRANT EXECUTE ON FUNCTION pg_stat_activity() TO db_mon;
Now, connect to PostgreSQL as the db_mon
user and examine the contents of pg_stat_activity
by executing this SQL query:
SELECT * FROM pg_stat_activity();
The pg_stat_activity
view is a wealth of information for a database administrator. However, it is all but useless for monitoring due to the security measures that encumber it. However, these fields are obfuscated specifically to prevent system compromises and data leaks. So, our entire goal is to prevent abusing the view while still loosening the security.
The first step we take is to create a function that is capable of returning a set of rows similar to the pg_stat_activity
view itself. The SETOF
modifier tells PostgreSQL that our function does exactly that. It's no coincidence that the body of our function is merely a SELECT
statement on the pg_stat_activity
view.
Why did we use a function to abstract the view? After all, it seems excessive to create a whole function for such a simple statement. The answer is in the SECURITY DEFINER
function modifier that we added; it allows the function to execute as the user that created it. Thus, if we create the function as the postgres
user, it runs as if the postgres
user invoked it. As the postgres
user is a superuser, the function can see all of the hidden columns, no matter who runs the function.
By default, all new functions are available to all users. However, this function executes as a superuser, and we don't want just anyone to execute it and see what everyone else is doing. So, we revoke all permissions from the PUBLIC
context. At this point, only a superuser can call our function.
As we want to be able to monitor database status values, we create a user for this very purpose. We named our user db_mon
, but any user name works just as well. As long as it has a secure password or is only used locally, our security exposure is minimal. Then, we grant EXECUTE
privileges on the pg_stat_activity
function, and our work is complete. The db_mon
user can now view all user queries. We can also grant EXECUTE
to other DBAs or support staff who may need it.
What data is available? Important fields include, but are not limited to, the following:
pid
or procpid
: In versions of PostgreSQL 9.2 and above, this field is named pid
; all older versions use procpid
. This tells us that the process ID assigned to the backend server process by the operating system is extremely valuable for debugging or connection-management purposes.username
: This displays the name of the user who owns this connection.backend_start
: This provides the date and time when the connection was established.xact_start
: This tracks the date and time when the current transaction started, if any.query_start
: This reports the date and time of the last query submitted.waiting
: This tells us whether or not the connection is currently blocked by something and will show either t
for true or f
for false.state
: In versions of PostgreSQL 9.2 and above, this column reports the current state of the connection. States marked as active
are executing a query; the idle
ones are not. If a connection is marked idle in a transaction
, look carefully at the query_start
and xact_start
fields for excessive delays. If a connection was in a transaction and encountered an error, it will report idle in the transaction (aborted)
; applications should catch errors and either roll back the transaction or disconnect, so idle
aborted transactions are a possible source of trouble. Unfortunately, this field does not exist in older versions, so a certain context is lost during investigation.query
: In versions of PostgreSQL 9.1 and above, this column contains most or all of the last known query this connection executed. This field does not exist in older versions.current_query
: In versions of PostgreSQL 9.1 and below, this column contains most or the entire last known query that this connection executed. In newer versions, this field was split into the state
and query
fields to provide better insight into the connection activity during transactions.Mind the version! PostgreSQL versions below 9.2 do not have the state
or query
fields and supply only the current_query
column. While it might be tempting to use query
and current_query
interchangeably, older PostgreSQL versions are strictly at a disadvantage.
In PostgreSQL 9.1 and below, queries are only reflected in the pg_stat_activity
view while they are actually executing. As soon as the query finishes, the current_query
column will be empty or report idle in transaction
if the query was part of a transaction. This means we lose a lot of operating context unless we just happened to be logging every database query.
On very high-volume OLTP systems, recording every query is not feasible. We've personally administered databases that handle over 1 billion queries per day, at a rate of 60,000 per second. Even with a conservative query length of 50 characters, we would produce over 50 GB of logs every day.
Troubleshooting stuck, idle, or otherwise faulty connections is much easier in the newer versions of PostgreSQL. If at all possible, upgrade to 9.2 or above.
pg_stat_activity
view is described in more depth there, so take a look at http://www.postgresql.org/docs/9.3/static/monitoring-stats.html.3.14.251.128