Checking the pg_stat_activity view

Another source of valuable troubleshooting information is PostgreSQL itself. There are numerous views, tables, and functions dedicated to tracking and reporting various statistics and operating statuses for each hosted database. Principal among these is the pg_stat_activity view.

This view tells us what every database client is doing, where it is connected from, which user account it is operating under, and other important values. When administering a highly available database, we must either have an iron control over what executes in the database or the ability to quickly and easily assess its execution state. Besides using this data to track suspicious activity, we can also cancel long-running queries or Cartesian Products, or simply examine the connection turnover.

We probably use this view into the database more than any other, and it forms the backbone of several monitoring utilities as well. Let's explore just why this system catalog is so indispensable.

Getting ready

While any user can view the contents of the pg_stat_activity view, only a superuser can freely examine the contents of every column. To avoid security exploits, regular users cannot view the current query activity, any connection information, or fields related to query time or status.

To get the most out of this view safely, we want to grant elevated privileges to specific users dedicated to monitoring and status checks. In order to do this, we must first connect to the database as a superuser (such as the postgres user) for the duration of this recipe.

How to do it...

Perform the following steps to prepare pg_stat_activity for generalized use:

  1. Execute this SQL statement as a database superuser to create a function:
    CREATE OR REPLACE FUNCTION pg_stat_activity()
    RETURNS SETOF pg_stat_activity AS $$
        SELECT * FROM pg_stat_activity;
    $$ LANGUAGE sql SECURITY DEFINER;
  2. Execute this SQL statement to secure the function we created:
    REVOKE ALL ON FUNCTION pg_stat_activity() FROM PUBLIC;
  3. Create a user dedicated to monitoring with this SQL statement:
    CREATE USER db_mon WITH PASSWORD 'somepass';
  4. Grant the monitoring user the ability to use our function with this SQL statement:
    GRANT EXECUTE ON FUNCTION pg_stat_activity() TO db_mon;

Now, connect to PostgreSQL as the db_mon user and examine the contents of pg_stat_activity by executing this SQL query:

SELECT * FROM pg_stat_activity();

How it works...

The pg_stat_activity view is a wealth of information for a database administrator. However, it is all but useless for monitoring due to the security measures that encumber it. However, these fields are obfuscated specifically to prevent system compromises and data leaks. So, our entire goal is to prevent abusing the view while still loosening the security.

The first step we take is to create a function that is capable of returning a set of rows similar to the pg_stat_activity view itself. The SETOF modifier tells PostgreSQL that our function does exactly that. It's no coincidence that the body of our function is merely a SELECT statement on the pg_stat_activity view.

Why did we use a function to abstract the view? After all, it seems excessive to create a whole function for such a simple statement. The answer is in the SECURITY DEFINER function modifier that we added; it allows the function to execute as the user that created it. Thus, if we create the function as the postgres user, it runs as if the postgres user invoked it. As the postgres user is a superuser, the function can see all of the hidden columns, no matter who runs the function.

By default, all new functions are available to all users. However, this function executes as a superuser, and we don't want just anyone to execute it and see what everyone else is doing. So, we revoke all permissions from the PUBLIC context. At this point, only a superuser can call our function.

As we want to be able to monitor database status values, we create a user for this very purpose. We named our user db_mon, but any user name works just as well. As long as it has a secure password or is only used locally, our security exposure is minimal. Then, we grant EXECUTE privileges on the pg_stat_activity function, and our work is complete. The db_mon user can now view all user queries. We can also grant EXECUTE to other DBAs or support staff who may need it.

What data is available? Important fields include, but are not limited to, the following:

  • pid or procpid: In versions of PostgreSQL 9.2 and above, this field is named pid; all older versions use procpid. This tells us that the process ID assigned to the backend server process by the operating system is extremely valuable for debugging or connection-management purposes.
  • username: This displays the name of the user who owns this connection.
  • backend_start: This provides the date and time when the connection was established.
  • xact_start: This tracks the date and time when the current transaction started, if any.
  • query_start: This reports the date and time of the last query submitted.
  • waiting: This tells us whether or not the connection is currently blocked by something and will show either t for true or f for false.
  • state: In versions of PostgreSQL 9.2 and above, this column reports the current state of the connection. States marked as active are executing a query; the idle ones are not. If a connection is marked idle in a transaction, look carefully at the query_start and xact_start fields for excessive delays. If a connection was in a transaction and encountered an error, it will report idle in the transaction (aborted); applications should catch errors and either roll back the transaction or disconnect, so idle aborted transactions are a possible source of trouble. Unfortunately, this field does not exist in older versions, so a certain context is lost during investigation.
  • query: In versions of PostgreSQL 9.1 and above, this column contains most or all of the last known query this connection executed. This field does not exist in older versions.
  • current_query: In versions of PostgreSQL 9.1 and below, this column contains most or the entire last known query that this connection executed. In newer versions, this field was split into the state and query fields to provide better insight into the connection activity during transactions.

There's more...

Mind the version! PostgreSQL versions below 9.2 do not have the state or query fields and supply only the current_query column. While it might be tempting to use query and current_query interchangeably, older PostgreSQL versions are strictly at a disadvantage.
In PostgreSQL 9.1 and below, queries are only reflected in the pg_stat_activity view while they are actually executing. As soon as the query finishes, the current_query column will be empty or report idle in transaction if the query was part of a transaction. This means we lose a lot of operating context unless we just happened to be logging every database query.

On very high-volume OLTP systems, recording every query is not feasible. We've personally administered databases that handle over 1 billion queries per day, at a rate of 60,000 per second. Even with a conservative query length of 50 characters, we would produce over 50 GB of logs every day.

Troubleshooting stuck, idle, or otherwise faulty connections is much easier in the newer versions of PostgreSQL. If at all possible, upgrade to 9.2 or above.

See also

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.136.17.12