PostgreSQL is one of the most ANSI-SQL compliant databases on the market. It even supports many of the additions introduced with the SQL:2006+ standard. PostgreSQL goes much further and adds constructs that range from mundane syntax shorthands to avant-garde features that break the bounds of traditional SQL. In this chapter, we’ll cover some SQL constructs not often found in other databases. For this chapter, you should have a working knowledge of SQL; otherwise, you may not appreciate the labor-saving tidbits that PostgreSQL brings to the table.
Like most relational databases, PostgreSQL supports views. Some
things have changed over the years on how views work and how you can
update the underlying tables via updates on views. In pre-PostgreSQL 9.1,
views were updatable but required INSTEAD OF
UPDATE
, DELETE
rules
on the view. In PostgreSQL 9.1, the preferred way of updating data via a
view is to use INSTEAD OF
triggers instead of rules, though
rules are still supported. The trigger approach is standards compliant and
more along the lines of what you’ll find in other databases that support
triggers and updatable views.
Unlike Microsoft SQL Server and MySQL, simple views are not automatically updatable and require writing an instead-of rule or trigger to make them updatable. On the plus side, you have great control over how the underlying tables will be updated. We’ll cover triggers in more detail in Chapter 8. You can see an example of building updatable views using rules in Database Abstraction with Updateable Views.
Views are most useful for encapsulating common joins. In this next example, we’ll join our lookup with our fact data.
Example 7-1. View census.vw_facts
CREATE OR REPLACE VIEW census.vw_facts AS SELECT lf.fact_type_id, lf.category, lf.fact_subcats, lf.short_name , f.tract_id, f.yr, f.val, f.perc FROM census.facts As f INNER JOIN census.lu_fact_types As lf ON f.fact_type_id = lf.fact_type_id;
To make this view updatable with a trigger, you can define one or more instead of triggers. We first define the trigger function(s). There is no standard in naming of the functions and a trigger function can be written in any language that supports triggers. For this example, we’ll use PL/pgSQL to write our trigger function as shown in Example 7-2.
Example 7-2. Trigger function for vw_facts to update, delete, insert
CREATE OR REPLACE FUNCTION census.trig_vw_facts_ins_upd_del() RETURNS trigger AS $$ BEGIN IF (TG_OP = 'DELETE') THEN DELETE FROM census.facts AS f WHERE f.tract_id = OLD.tract_id AND f.yr = OLD.yr AND f.fact_type_id = OLD.fact_type_id; RETURN OLD; END IF; IF (TG_OP = 'INSERT') THEN INSERT INTO census.facts(tract_id, yr, fact_type_id, val, perc) SELECT NEW.tract_id, NEW.yr, NEW.fact_type_id, NEW.val, NEW.perc; RETURN NEW; END IF; IF (TG_OP = 'UPDATE') THEN IF ROW(OLD.fact_type_id, OLD.tract_id, OLD.yr, OLD.val, OLD.perc) != ROW(NEW.fact_type_id, NEW.tract_id, NEW.yr, NEW.val, NEW.perc) THEN UPDATE census.facts AS f SET tract_id = NEW.tract_id, yr = NEW.yr , fact_type_id = NEW.fact_type_id , val = NEW.val, perc = NEW.perc WHERE f.tract_id = OLD.tract_id AND f.yr = OLD.yr AND f.fact_type_id = OLD.fact_type_id; RETURN NEW; ELSE RETURN NULL; END IF; END IF; END; $$ LANGUAGE plpgsql VOLATILE;
Next, we bind the trigger function to the view as shown in Example 7-3.
Example 7-3. Bind trigger function to vw_facts view insert,update,delete events
CREATE TRIGGER trip_01_vw_facts_ins_upd_del INSTEAD OF INSERT OR UPDATE OR DELETE ON census.vw_facts FOR EACH ROW EXECUTE PROCEDURE census.trig_vw_facts_ins_upd_del();
Now when we update, delete, or insert into our view, it will update
the underlying facts
table instead:
UPDATE census.vw_facts SET yr = 2012 WHERE yr = 2011 AND tract_id = '25027761200';
This will output a note:
Query returned successfully: 51 rows affected, 21 ms execution time.
If we tried to update one of the fields in our lookup table, because of our row compare the update will not take place, as shown here:
UPDATE census.vw_facts SET short_name = 'test';
Therefore, the output message would be:
Query returned successfully: 0 rows affected, 931 ms execution time.
Although we have just one trigger function to handle multiple events, we could have just as easily created a separate trigger and trigger function for each event.
Window functions are a common ANSI-SQL feature supported in PostgreSQL since 8.4. A window function has the unusual knack to see and use data beyond the current row, hence the term window. Without window functions, you’d have to resort to using joins and subqueries to poll data from adjacent rows. On the surface, window functions do violate the set-based operating principle of SQL, but we mollify the purist by claiming them to be a short-hand. You can find more details and examples in the section Window Functions.
Here’s a quick example to get started. Using a window function, we
can obtain the average value for all records with
fact_type_id
of 86 in one simple
SELECT
.
Example 7-4. The basic window
SELECT tract_id, val, AVG(val) OVER () as val_avg FROM census.facts WHERE fact_type_id = 86;
tract_id | val | val_avg ------------+-----------+----------------------- 25001010100 | 2942.000 | 4430.0602165087956698 25001010206 | 2750.000 | 4430.0602165087956698 25001010208 | 2003.000 | 4430.0602165087956698 25001010304 | 2421.000 | 4430.0602165087956698 : :
Notice how we were able to perform an aggregation without having to
use GROUP BY
. Furthermore, we were able to rejoin the
aggregated result back with the other variables without using a formal
join. The OVER ()
converted our conventional
AVG()
function into a window function. When PostgreSQL
sees a window function in a particular row, it will actually scan all rows
fitting the WHERE
clause, perform the aggregation, and
output the value as part of the row.
You can embellish the window into separate panes using the
PARTITION BY
clause. This instructs PostgreSQL to
subdivide the window into smaller panes and then to take the aggregate
over those panes instead of over the entire set of rows. The result is
then output along with the row depending on which pane it belongs to. In
this next example, we repeat what we did in Example 7-4, but partition our window into
separate panes by county code.
Example 7-5. Partition our window by county code
SELECT tract_id, val, AVG(val) OVER (PARTITION BY left(tract_id,5)) As val_avg_county FROM census.facts WHERE fact_type_id = 86 ORDER BY tract_id;
tract_id | val | val_avg_county ------------+-----------+----------------------- 25001010100 | 2942.000 | 3787.5087719298245614 25001010206 | 2750.000 | 3787.5087719298245614 : 25003900100 | 3389.000 | 3364.5897435897435897 25003900200 | 4449.000 | 3364.5897435897435897 25003900300 | 2903.000 | 3364.5897435897435897 :
The left
function was introduced in
PostgreSQL 9.1. If you are using a lower version, you can use
substring
instead.
Window functions also allow an ORDER BY
clause.
Without getting too abstruse, the best way to think about this is that
all the rows in the window will be ordered and the window function will
only consider rows from the first row to the current row. The classic
example uses the ROW_NUMBER()
function, which is
found in all databases supporting window functions. It sequentially
numbers rows based on some ordering and or partition. In Example 7-6, we demonstrate how to number our
census tracts in alphabetical order.
Example 7-6. Number alphabetically
SELECT ROW_NUMBER() OVER(ORDER BY tract_name) As rnum, tract_name FROM census.lu_tracts ORDER BY rnum LIMIT 4;
rnum | tract_name -----+-------------------------------------------------- 1 | Census Tract 1, Suffolk County, Massachusetts 2 | Census Tract 1001, Suffolk County, Massachusetts 3 | Census Tract 1002, Suffolk County, Massachusetts 4 | Census Tract 1003, Suffolk County, Massachusetts
You can combine ORDER BY
with
PARTITION BY
. Doing so will restart the ordering for
each partition. We return to our example of county codes.
Example 7-7. Partition our window and ordering by value
SELECT tract_id, val , AVG(val) OVER (PARTITION BY left(tract_id,5) ORDER BY val) As avg_county_ordered FROM census.facts WHERE fact_type_id = 86 ORDER BY left(tract_id,5), val;
tract_id | val | avg_county_ordered -------------+-----------+------------------------ 25001990000 | 0.000 | 0.00000000000000000000 25001014100 | 1141.000 | 570.5000000000000000 25001011700 | 1877.000 | 1006.0000000000000000 25001010208 | 2003.000 | 1255.2500000000000000 : 25003933200 | 1288.000 | 1288.0000000000000000 25003934200 | 1306.000 | 1297.0000000000000000 25003931300 | 1444.000 | 1346.0000000000000000 25003933300 | 1509.000 | 1386.7500000000000000 :
The key observation with output is to notice how the average
changes from row to row. The ORDER BY clause means that the average will
only be taken from the beginning of the partititon to the current row.
For instance, if your row is in the 5th row in the 3rd partition, the
average will only cover the first five rows in the 3rd partition. We put
an ORDER BY left(tract_id,5), val
at the end of the query
so you could easily see the pattern, but keep in mind that the
ORDER BY
of the query is independent of the
ORDER BY
in each OVER
. You can
explicitly control the rows under consideration within a frame by
explicitly putting in a RANGE
or
ROWS
clause of the form ROWS BETWEEN CURRENT
ROW AND 5 FOLLOWING
. For more details, we ask that you refer to
SQL
SELECT official documentation.
PostgreSQL also supports window naming which is useful if you have
the same window for each of your window columns. In Example 7-8, we demonstrate how to define
named windows as well as showing a record value before and after for a
given window frame, using the LEAD
and LAG
standard ANSI window functions.
Example 7-8. Named windows and lead lag
SELECT * FROM (SELECT ROW_NUMBER() OVER wt As rnum , substring(tract_id,1, 5) As county_code, tract_id , LAG(tract_id,2) OVER wt As tract_2_before , LEAD(tract_id) OVER wt As tract_after FROM census.lu_tracts WINDOW wt AS (PARTITION BY substring(tract_id,1, 5) ORDER BY tract_id ) ) As foo WHERE rnum BETWEEN 2 and 3 AND county_code IN('25007', '25025') ORDER BY county_code, rnum;
rnum | county_code | tract_id | tract_2_before | tract_after -----+-------------+-------------+----------------+------------- 2 | 25007 | 25007200200 | | 25007200300 3 | 25007 | 25007200300 | 25007200100 | 25007200400 2 | 25025 | 25025000201 | | 25025000202 3 | 25025 | 25025000202 | 25025000100 | 25025000301
Both LEAD()
and LAG()
take
an optional step argument that defines how many to skip forward or
backward; the step can be positive or negative. Also
LEAD()
and LAG()
will return NULL
when trying to retrieve rows outside the window partition. This is a
possibility that you always have to account for when applying these two
functions.
Before leaving the discussion on window functions, we must mention
that in PostgreSQL, any aggregate function you create can be used as a
window function. Other databases tend to limit window functions to using
built-in aggregates like AVG()
,
SUM()
, MIN()
,
MAX()
etc.
In its essence, common table expressions (CTE
)
allow you to define a query that can be reused in a larger query.
PostgreSQL has supported this feature since PostgreSQL 8.4 and expanded
the feature in 9.1 with the introduction of writeable CTEs. You’ll find a
similar feature in SQL Server 2005+, Oracle 11 (Oracle 10 and below
implemented this features using CORRESPONDING BY
), IBM DB2,
and Firebird. This features doesn’t exist in MySQL of any version. There
are three different ways to use CTEs:
The standard non-recursive, non-writable CTE. This is your unadorned CTE used for the purpose of readability of your SQL or to encourage the planner to materialize an expensive sub result for better performance.
Writeable CTEs. This is an extension of the standard CTE with
UPDATE
and INSERT
constructs.
Common use is to delete rows and then return rows that have been
deleted.
The recursive CTE. This put an entirely new whirl on standard CTE. With recursive CTEs, the rows returned by the CTE actually varies during the execution of the query. PostgreSQL allows you to have a CTE that is both updatable and recursive.
Your basic CTE construct looks as shown in Example 7-9.
Example 7-9. Basic CTE
WITH cty_with_tot_tracts AS ( SELECT tract_id, substring(tract_id,1, 5) As county_code , COUNT(*) OVER(PARTITION BY substring(tract_id,1, 5)) As cnt_tracts FROM census.lu_tracts) SELECT MAX(tract_id) As last_tract, county_code, cnt_tracts FROM cty_with_tot_tracts WHERE cnt_tracts > 100 GROUP BY county_code, cnt_tracts;
You can stuff as many table expressions as you want in a
WITH
clause, just be sure to separate each by a comma. The
order of the CTEs matter in that CTEs defined later can use CTEs defined
earlier, but never vice versa.
Example 7-10. CTE with more than one table expression
WITH cty_with_tot_tracts AS ( SELECT tract_id, substring(tract_id,1, 5) As county_code , COUNT(*) OVER(PARTITION BY substring(tract_id,1, 5)) As cnt_tracts FROM census.lu_tracts) , cty AS (SELECT MAX(tract_id) As last_tract , county_code, cnt_tracts FROM cty_with_tot_tracts WHERE cnt_tracts < 8 GROUP BY county_code, cnt_tracts) SELECT cty.last_tract, f.fact_type_id, f.val FROM census.facts As f INNER JOIN cty ON f.tract_id = cty.last_tract;
The writeable CTE was introduced in 9.1 and extends the CTE to allow for update, delete, insert statements. We’ll revisit our logs tables that we created in Example 6-2. We’ll add another child table and populate it.
CREATE TABLE logs_2011_01_02(PRIMARY KEY(log_id) , CONSTRAINT chk_y2011_01_02 CHECK(log_ts >= '2011-01-01' AND log_ts < '2011-03-01')) INHERITS (logs_2011);
In Example 7-11, we’ll move data from our parent 2011 table to our new child Jan-Feb 2011 table.
Example 7-11. Writeable CTE moves data from one branch to another
t1 AS (DELETE FROM ONLY logs_2011 WHERE log_ts < '2011-03-01' RETURNING *) INSERT INTO logs_2011_01_02 SELECT * FROM t1;
A common use case for the writeable CTE is for repartioning of data in one step. Examples of this and other writeable CTEs are covered in David Fetter’s Writeable CTEs, The Next Big Thing.
The official documentation for PostgreSQL describes it best: The
optional RECURSIVE
modifier changes CTE from a mere
syntactic convenience into a feature that accomplishes things not
otherwise possible in standard SQL. A more interesting CTE is one that
uses a recursively defining construct to build an expression. PostgreSQL
recursive CTEs utilize UNION ALL
. To turn a basic CTE
to a recursive one, add the RECURSIVE
modifier after the
WITH
. Within a WITH RECURSIVE
, you can have a
mix of recursive and non-recursive table expressions. In most other
databases, the RECURSIVE
keyword is not necessary to denote
recursion. A common of recursive CTEs is for message threading and other
tree like structures. We have an example of this in Recursive
CTE to Display Tree Structures.
Here is an example that lists all the table relationships we have in our database:
Example 7-12. Recursive CTE
WITH RECURSIVE tbls AS ( SELECT c.oid As tableoid, n.nspname AS schemaname , c.relname AS tablename FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace LEFT JOIN pg_inherits As th ON th.inhrelid = c.oid WHERE th.inhrelid IS NULL AND c.relkind = 'r'::"char" AND c.relhassubclass = true UNION ALL SELECT c.oid As tableoid, n.nspname AS schemaname , tbls.tablename || '->' || c.relname AS tablename FROM tbls INNER JOIN pg_inherits As th ON th.inhparent = tbls.tableoid INNER JOIN pg_class c ON th.inhrelid = c.oid LEFT JOIN pg_namespace n ON n.oid = c.relnamespace LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace ) SELECT * FROM tbls ORDER BY tablename;
tableoid | schemaname | tablename ----------+------------+---------------------------------- 3152249 | public | logs 3152260 | public | logs->logs_2011 3152272 | public | logs->logs_2011->logs_2011_01_02
Get list of all tables that have child tables but have no parent table.
This is the recursive part; gets all children of tables in
tbls
.
Child table name starts with the ancestral tree name.
Return parents and all child tables. Since sorting by table name which has parent prefix appended, all child tables will follow their parents.
Although PostgreSQL is fairly ANSI-SQL compliant, it does have a few unique constructs you probably won’t find in other databases. Many are simply shortcuts without which you’d have to write subqueries to achieve the same results. In this regard, if you opt to stick with ANSI-SQL compliance, simply avoid these shorthands.
One of our favorites is the DISTINCT ON
clause. It
behaves like an SQL DISTINCT
, except that it allows
you to define what columns to consider distinct, and in the case of the
remaining columns, an order to designate the preferred one. This one
little word replaces numerous lines of additional code necessary to
achieve the same result.
In Example 7-13, we demonstrate how to get the details of the first tract for each county.
Example 7-13. DISTINCT ON
SELECT DISTINCT ON(left(tract_id, 5)) left(tract_id, 5) As county , tract_id, tract_name FROM census.lu_tracts ORDER BY county, tract_id LIMIT 5;
county | tract_id | tract_name --------+-------------+---------------------------------------------------- 25001 | 25001010100 | Census Tract 101, Barnstable County, Massachusetts 25003 | 25003900100 | Census Tract 9001, Berkshire County, Massachusetts 25005 | 25005600100 | Census Tract 6001, Bristol County, Massachusetts 25007 | 25007200100 | Census Tract 2001, Dukes County, Massachusetts 25009 | 25009201100 | Census Tract 2011, Essex County, Massachusetts
The ON
modifier can take on multiple columns, all
will be considered to determine uniqueness. Finally, the ORDER
BY
clause has to start with the set of columns in the
DISTINCT ON
, then you can follow with your preferred
ordering.
LIMIT
and OFFSET
are clauses in your
query to limit the number of rows returned. They can be used in tandem
or separately. These constructs are not unique to PostgreSQL and are in
fact copied from MySQL. You’ll find it in MySQL and SQLite and probably
various other databases. SQL Server adopted something similar in its
2012 version with a slightly different naming. An OFFSET
of
zero is the same as leaving out the clause entirely. A positive offset
means start the output after skipping the number of rows specified by
the offset. You’ll usually use these two clauses in conjuction with an
ORDER BY
clause. In Example 7-13, we demonstrate with a positive
offset.
Example 7-14. First tract for counties 2 to 5
SELECT DISTINCT ON(left(tract_id, 5)) left(tract_id, 5) As county, tract_id, tract_name FROM census.lu_tracts ORDER BY county, tract_id LIMIT 3 OFFSET 2;
county | tract_id | tract_name --------+-------------+-------------------------------------------------- 25005 | 25005600100 | Census Tract 6001, Bristol County, Massachusetts 25007 | 25007200100 | Census Tract 2001, Dukes County, Massachusetts 25009 | 25009201100 | Census Tract 2011, Essex County, Massachusetts
ANSI-SQL specs define a construct called CAST
, which
allows you to cast one data type to another. For example,
CAST('2011-10-11' AS date)
will cast the text
2011-10-11
to a date. PostgreSQL has a shorthand for doing
this using a pair of colons as in '2011-10-11'::date
. If
you don’t care about being coss-database agnostic, the PostgreSQL syntax
is easier to write, especially when chaining casts like
somexml::text::integer
for cases where you can’t directly
cast from one type to another without going through an intermediary
type.
PostgreSQL is case sensitive, similar to Oracle. However, it does
have mechanisms in place to do a case insensitive search. You can apply
the UPPER()
function to both sides of the
ANSI-compliant LIKE
operator, or you can simply use the
ILIKE
operator found only in PostgreSQL. Here is an
example:
SELECT tract_name FROM census.lu_tracts WHERE tract_name ILIKE '%duke%';
which produces:
tract_name ------------------------------------------------ Census Tract 2001, Dukes County, Massachusetts Census Tract 2002, Dukes County, Massachusetts Census Tract 2003, Dukes County, Massachusetts Census Tract 2004, Dukes County, Massachusetts Census Tract 9900, Dukes County, Massachusetts
PostgreSQL allows functions that return sets to appear in the
SELECT
clause of an SQL statement. This is not true
of many other databases where only scalar functions may appear in the
SELECT
. In fact, to circumvent the restriction, SQL
Server 2005+ introduced a CROSS APPLY command. The PostgreSQL solution
is much cleaner, but we advise you to use this freedom responsibly.
Interweaving set returning functions inside an already complicated query
could easily produce results that are beyond what you expect, since
using set returning functions usually results in row creation or
deletion. You must anticipate this if you’ll be using the results as a
subquery. In Example 7-15, we
demonstrate this with a temporal version of generate_series.
We will use a table we construct with the following:
CREATE TABLE interval_periods(i_type interval); INSERT INTO interval_periods(i_type) VALUES ('5 months'), ('132 days'), ('4862 hours'),
Example 7-15. Set returning function in SELECT
SELECT i_type , generate_series('2012-01-01'::date,'2012-12-31'::date,i_type) As dt FROM interval_periods;
i_type | dt -----------+------------------------ 5 months | 2012-01-01 00:00:00-05 5 months | 2012-06-01 00:00:00-04 5 months | 2012-11-01 00:00:00-04 132 days | 2012-01-01 00:00:00-05 132 days | 2012-05-12 00:00:00-04 132 days | 2012-09-21 00:00:00-04 4862 hours | 2012-01-01 00:00:00-05 4862 hours | 2012-07-21 15:00:00-04
When you query from a table that has child tables, the query
drills down, unionizing all the child records satisfying the query
condition. DELETE
and UPDATE
work
the same way, drilling down the hierarchy for victims. Sometimes this is
not desirable and you want data to only come from the table you
specified without the kids tagging along. This is where the
ONLY
keyword comes in handy. We saw an example of its use
in Example 7-11, where we only wanted to
delete records from the y2011
table that weren’t already
migrated to the logs_2011_01_02
table. Without the
ONLY
modifier, we’d end up deleting records from the child
table that might have been moved previously.
The RETURNING
clause is supported by ANSI-SQL
standards, but not found in many databases. We saw an example of it in
Example 7-11, where we returned the records
deleted. RETURNING can also be used for INSERT
and
UPDATE
. For inserts into tables with serial keys, it is
particularly handy since it returns you the key value of the new row(s).
Though RETURNING is often accompanied by *
for all
fields, you can limit the fields as we do in Example 7-16.
Example 7-16. RETURNING changed records of an UPDATE
UPDATE census.lu_fact_types AS f SET short_name = Replace(Replace(Lower(f.fact_subcats[4]),' ','_'),':','') WHERE f.fact_subcats[3] = 'Hispanic or Latino:' AND f.fact_subcats[4] > '' RETURNING fact_type_id, short_name;
fact_type_id | short_name --------------+------------------------------------------------- 96 | white_alone 97 | black_or_african_american_alone 98 | american_indian_and_alaska_native_alone 99 | asian_alone 100 | native_hawaiian_and_other_pacific_islander_alone 101 | some_other_race_alone 102 | two_or_more_races
Composites provide a lot of flexibility to PostgreSQL. The first time you see a query with composites, you might be surprised. In fact, you might come across their versatility by accident when making a typo in an SQL statement. Try the following query:
SELECT X FROM census.lu_fact_types As X LIMIT 2;
At first glance, you might think that we left out a
.*
by accident, but check out the result:
x ------------------------------------------------------------------ (86,Population,"{D001,Total:}",d001) (87,Population,"{D002,Total:,""Not Hispanic or Latino:""}",d002)
Recall from an earlier section All Tables Are Custom where we demonstrated that
PostgreSQL automatically create composite types of all tables in
PostgreSQL. Instead of erroring out, our above example returns the
canonical representation of an lu_fact_type
object.
Looking at the first record: 86
is the fact_type_id,
Population
is the category, and {D001,Total:}
is the fact_subcats property, which happens to be an array in its own
right.
In addition to being able to output a row as a single object,
there are several functions that can take a composite or row as an
input. For example, you can feed a row into the
array_agg
, hstore
, and countless
other functions. If you are using PostgreSQL 9.2 or above, and are
building AJAX apps, you can take advantage of the built-in JavaScript Object Notation (JSON) support
and use a combination of array_agg
and
array_to_json
to output a whole query as a single
JSON object, as we demonstrate in Example 7-17.
Example 7-17. Query to JSON output
SELECT array_to_json(array_agg(f) ) As ajaxy_cats FROM (SELECT MAX(fact_type_id) As max_type, category FROM census.lu_fact_types GROUP BY category) As f;
This will give you an output of:
ajaxy_cats ---------------------------------------------------- [{"max_type":102,"category":"Population"}, {"max_type":153,"category":"Housing"}]
3.138.204.186