Remote tables provide an easy and convenient way to access remote data in a PostgreSQL database. This is good for highly available systems, as a properly compartmentalized system invites segmented maintenance. Yet, remote data comes with a rather drastic cost regarding data fetching and handling overhead.
With the release of PostgreSQL 9.3 comes internal support of materialized views. Traditionally, materialized views merely instantiate a view into a physical structure to avoid expensive or complicated query plans and result sets. They also make it possible to index or optimize a view in ways not normally possible. Now, imagine what we can do with such a structure when utilizing foreign tables.
In this recipe, we will explore how materialized views can drastically increase local data access capability within a PostgreSQL database.
As we will be using the pgbench_accounts
foreign table in this recipe, please follow all recipes up to Creating a foreign table before proceeding.
For this recipe, we will focus on the pg-report
database server. All queries should be performed by the postgres
user in the pgbench
database. Follow these steps to create and use a materialized view:
pgbench_accounts
foreign table with this SQL statement:ALTER FOREIGN TABLE pgbench_accounts RENAME TO remote_accounts;
CREATE MATERIALIZED VIEW pgbench_accounts AS SELECT * FROM remote_accounts WHERE bid = 5 WITH DATA;
pgbench_accounts
to make it usable:CREATE INDEX idx_pgbench_accounts_aid ON pgbench_accounts (aid);
EXPLAIN ANALYZE SELECT * FROM pgbench_accounts WHERE aid BETWEEN 400001 AND 400050;
When it comes to this recipe, we begin by moving the existing pgbench_accounts
table out of the way. The intent in this case is to prove that we can treat a materialized view similar to a local table. To do this, we want to create it with the same name the foreign table currently uses. Thus, pgbench_accounts
becomes remote_accounts
and better illustrates its relationship with the foreign server as a bonus.
Next, we create the actual materialized view. We could define all of the columns manually, but in this case, we want it to simply mirror the remote table. Think of this as object-oriented programming; we have a class named pgbench_remote
, and we will instantiate it as pgbench_accounts
.
Notice, however, that we added a WHERE
clause to restrict the results to rows where bid
is 5
. For our particular set of test data, this represents only 100,000 rows of the total 20 million. We did this to illustrate that we could have a central repository of data and maintain only a small subset on each local server for better scalability purposes. By finishing the statement with WITH DATA
, PostgreSQL executes the query and stores the result in our new materialized view. If we had omitted this, the view would be empty and unusable.
At this point, we created an index on the aid
column. This reflects the primary key that exists on the remote table, and it means any local queries that expect it will perform normally. To prove this, our final step is to perform a basic query that retrieves 50 rows from the table and examines the path that PostgreSQL used to execute our request.
Our test system produced this output:
We can see a few important things from this EXPLAIN
output. First, our results are being supplied by the idx_pgbench_accounts_aid index
we created. The query run time is reported as 0.024 ms
, which is less than 1/40th of a millisecond. This is the performance we would expect from an indexed retrieval with such a small amount of rows.
There are a few unfortunate aspects of materialized views that we must consider:
INSERT
, UPDATE
, or DELETE
statementsBy static, we mean that the rows stored in the materialized view are the result of the SELECT
statement we used to define it. It would be a great way to bootstrap a reporting table of some kind, but then, we see the next item in our list: no modifications. A natural consequence of this is that we can't build manual maintenance procedures designed to top off the contents. This means we must refresh the contents of the materialized view all at once with this statement:
REFRESH MATERIALIZED VIEW pgbench_accounts;
If the query that builds the output is slow and we have several materialized views like it, maintenance times could increase dramatically. Some contributed materialized view architectures do not have this limitation, and it's entirely possible future versions of PostgreSQL will also improve this aspect. For now though, we'll want to limit our materialized view definitions to queries that are very well optimized.
The PostgreSQL documentation does a pretty good job of explaining materialized views. Please refer to these resources to learn more:
You can also build your own materialized view library. The techniques described at http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views worked well for several users before the feature was included in PostgreSQL 9.3.
3.12.163.175