Transforming foreign tables into local tables

Remote tables provide an easy and convenient way to access remote data in a PostgreSQL database. This is good for highly available systems, as a properly compartmentalized system invites segmented maintenance. Yet, remote data comes with a rather drastic cost regarding data fetching and handling overhead.

With the release of PostgreSQL 9.3 comes internal support of materialized views. Traditionally, materialized views merely instantiate a view into a physical structure to avoid expensive or complicated query plans and result sets. They also make it possible to index or optimize a view in ways not normally possible. Now, imagine what we can do with such a structure when utilizing foreign tables.

In this recipe, we will explore how materialized views can drastically increase local data access capability within a PostgreSQL database.

Getting ready

As we will be using the pgbench_accounts foreign table in this recipe, please follow all recipes up to Creating a foreign table before proceeding.

How to do it...

For this recipe, we will focus on the pg-report database server. All queries should be performed by the postgres user in the pgbench database. Follow these steps to create and use a materialized view:

  1. Rename the pgbench_accounts foreign table with this SQL statement:
    ALTER FOREIGN TABLE pgbench_accounts
          RENAME TO remote_accounts;
  2. Use this SQL statement to create a materialized view:
    CREATE MATERIALIZED VIEW pgbench_accounts AS
    SELECT *
      FROM remote_accounts
     WHERE bid = 5
      WITH DATA;
  3. Add an index to pgbench_accounts to make it usable:
    CREATE INDEX idx_pgbench_accounts_aid
        ON pgbench_accounts (aid);
  4. Execute this SQL statement to produce a simple query plan:
    EXPLAIN ANALYZE
     SELECT *
       FROM pgbench_accounts
      WHERE aid BETWEEN 400001 AND 400050;

How it works...

When it comes to this recipe, we begin by moving the existing pgbench_accounts table out of the way. The intent in this case is to prove that we can treat a materialized view similar to a local table. To do this, we want to create it with the same name the foreign table currently uses. Thus, pgbench_accounts becomes remote_accounts and better illustrates its relationship with the foreign server as a bonus.

Next, we create the actual materialized view. We could define all of the columns manually, but in this case, we want it to simply mirror the remote table. Think of this as object-oriented programming; we have a class named pgbench_remote, and we will instantiate it as pgbench_accounts.

Notice, however, that we added a WHERE clause to restrict the results to rows where bid is 5. For our particular set of test data, this represents only 100,000 rows of the total 20 million. We did this to illustrate that we could have a central repository of data and maintain only a small subset on each local server for better scalability purposes. By finishing the statement with WITH DATA, PostgreSQL executes the query and stores the result in our new materialized view. If we had omitted this, the view would be empty and unusable.

At this point, we created an index on the aid column. This reflects the primary key that exists on the remote table, and it means any local queries that expect it will perform normally. To prove this, our final step is to perform a basic query that retrieves 50 rows from the table and examines the path that PostgreSQL used to execute our request.

Our test system produced this output:

How it works...

We can see a few important things from this EXPLAIN output. First, our results are being supplied by the idx_pgbench_accounts_aid index we created. The query run time is reported as 0.024 ms, which is less than 1/40th of a millisecond. This is the performance we would expect from an indexed retrieval with such a small amount of rows.

There's more...

There are a few unfortunate aspects of materialized views that we must consider:

  • The contents are completely static
  • They cannot be the target of INSERT, UPDATE, or DELETE statements
  • Refreshing their contents may be slow

By static, we mean that the rows stored in the materialized view are the result of the SELECT statement we used to define it. It would be a great way to bootstrap a reporting table of some kind, but then, we see the next item in our list: no modifications. A natural consequence of this is that we can't build manual maintenance procedures designed to top off the contents. This means we must refresh the contents of the materialized view all at once with this statement:

REFRESH MATERIALIZED VIEW pgbench_accounts;

If the query that builds the output is slow and we have several materialized views like it, maintenance times could increase dramatically. Some contributed materialized view architectures do not have this limitation, and it's entirely possible future versions of PostgreSQL will also improve this aspect. For now though, we'll want to limit our materialized view definitions to queries that are very well optimized.

Note

Refreshing a materialized view requires an exclusive lock, because its entire contents are replaced during the refresh. Be wary of queries or batch jobs that depend on these views, as they may be temporarily blocked until the refresh is complete.

See also

The PostgreSQL documentation does a pretty good job of explaining materialized views. Please refer to these resources to learn more:

You can also build your own materialized view library. The techniques described at http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views worked well for several users before the feature was included in PostgreSQL 9.3.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.68.159