As with most databases, you can string a series of SQL statements together and treat them
as a unit. Different databases ascribe different names for this unit—stored procedures, modules,
macros, prepared statements, and so on. PostgreSQL calls them functions. Aside from simply
unifying various SQL statements, these units often add the capability to control the execution
of the SQL statements through using procedural language (PL). In PostgreSQL, you have your
choice of languages when it comes to writing functions. Often packaged along with binary
installers are SQL, C, PL/pgSQL, PL/Perl, PL/Python. In version 9.2, you’ll also find plv8js, which will allow
you to write procedural functions in JavaScript. plv8js
should be an exciting
addition to web developers and a nice companion to the built-in JSON type.
You can always install additional languages such as PL/R, PL/Java, PL/sh, and even experimental ones geared for high-end processing and AI, such as PL/Scheme or PgOpenCL. A list of available languages can be found here:Procedural Languages
Regardless which language you choose to write a particular function, they all share a similar structure.
Example 8-1. Basic Function Structure
CREATE OR REPLACE FUNCTIONfunc_name
(arg1
arg1_datatype
) RETURNSsome_type | setof sometype | TABLE (..)
AS $$BODY of function
$$ LANGUAGElanguage_of_function
Functional definitions can include additional qualifiers to optimize execution and to enforce security. We describe these below:
LANGUAGE
has to be one you have installed in your database. You
can get a list with the query: SELECT lanname FROM pg_language;
.
VOLATILITY
defaults to VOLATILE
if not specified.
It can be set to STABLE
, VOLATILE
, or IMMUTABLE
.
This setting gives the planner an idea if the results of a function can be cached.
STABLE
means that the function will return the same value for the same
inputs within the same query. VOLATILE
means the function may return
something different with each call, even with same inputs. Functions that change data or
are a function of other environment settings like time, should be marked as
VOLATILE
. IMMUTABLE
means given the same inputs
the function is guaranteed to return the same result. The volatility setting is merely a
hint to the query planner. It may choose to not cache if it concludes caching is less
cost effective than recomputation. However, if you mark a function as
VOLATILE
it will always recompute.
STRICT
. A function is assumed to be not strict unless adorned
with STRICT
. A strict function will always return NULL if any inputs
are NULL and doesn’t bother evaluating the function so saves some processing. When
building SQL functions, you should be careful about using STRICT
as
it will prevent index usage as described in STRICT on SQL Functions
COST
is a relative measure of computational intensiveness. SQL
and PL/pgSQL functions default to 100
and C functions to 1
.
This affects the order functions will be evaluated in a WHERE
clause and
also the likeliness of caching. The higher the value, the more costly the function is
assumed to be.
ROWS
is only set for set returning functions and is an estimate
of how many rows will be returned; used by planner to arrive at best strategy. Format
would be ROWS 100
.
SECURITY DEFINER
is an optional clause that means run under the
context of the owner of the function. If left out, a function runs under the context of
the user running the function. This is useful for giving people rights to update a table
in a function but not direct update access to a table and also tasks that normally
require super user rights.
Function languages can be divided into two levels of trust. Many—but not all—languages offer both a trusted and untrusted version.
Trusted
—Trusted languages are languages that don’t have access
to the filesystem beyond the data cluster and can’t execute OS commands. They can be
created by any user. Languages like SQL, PL/pgSQL, PL/Perl are trusted. It basically
means they can’t do damage to the underlying OS.
Untrusted
—Untrusted languages are those that can interact with
the OS and even call on webservices or execute functions on the OS. Functions written in
these languages can only be created by super users, however, a super user can delegate
rights for another user to use them by using the SECURITY DEFINER
setting.
By convention, these languages have a u
at the end of the name to denote
that they’re untrusted—for instance, PL/PerlU
,
PL/PythonU
.
Writing SQL functions[2] is fast and easy. Take your existing SQL statements, add a functional header and footer, and you’re done. The ease does mean you’ll sacrifice flexibility. You won’t have fancy control languages to create conditional execution branches. More restrictively, you can’t run dynamic SQL statements that you piece together based on parameters as you can in most other languages. On the positive side, SQL functions are often inlined by the query planner into the overall plan of a query since the planner can peek into the function. Functions in other languages are always treated as blackboxes. Inlining allows SQL functions to take advantage of indexes and collapse repetitive computations.
Your basic scalar returning SQL function is shown in Example 8-2:
Example 8-2. SQL function to return key of inserted record
CREATE OR REPLACE FUNCTION ins_logs(param_user_name varchar, param_description text) RETURNS integer AS $$ INSERT INTO logs(user_name, description) VALUES($1, $2) RETURNING log_id; $$ LANGUAGE 'sql' VOLATILE;
To call function Example 8-2, we would execute:
SELECT ins_logs('lhsu', 'this is a test') As new_id;
Similarly, you can update data with an SQL function and return a scalar or void as shown in Example 8-3.
Example 8-3. SQL function to update a record
CREATE OR REPLACE FUNCTION upd_logs(log_id integer, param_user_name varchar, param_description text) RETURNS void AS $$ UPDATE logs SET user_name = $2, description = $3, log_ts = CURRENT_TIMESTAMP WHERE log_id = $1;$$ LANGUAGE 'sql' VOLATILE;
To execute:
SELECT upd_logs(12,'robe', 'Change to regina'),
Prior to 9.2, SQL functions could only use the ordinal position of the input arguments
in the body of the function. After 9.2, you have the option of using named arguments, for
example you can write param_1
, param_2
instead of $1
,
$2
. SQL functions are the only ones that retained this limitation until
now.
Functions in almost all languages can return sets. SQL functions can also return sets.
There are three common approaches of doing this, using ANSI-SQL standard RETURNS
TABLE
syntax, using OUT
parameters, or returning a composite
data type. The RETURNS TABLE
approach requires PostgreSQL 8.3 or above, but
is closer to what you’ll see in other relational databases. In Example 8-4, we demonstrate how to write the same function in three
different ways.
Example 8-4. Examples of function returning sets
Using returns table:
CREATE FUNCTION sel_logs_rt(param_user_name varchar) RETURNS TABLE (log_id int, user_name varchar(50), description text, log_ts timestamptz) AS $$ SELECT log_id, user_name, description, log_ts FROM logs WHERE user_name = $1; $$ LANGUAGE 'sql' STABLE;
Using OUT parameters:
CREATE FUNCTION sel_logs_out(param_user_name varchar, OUT log_id int , OUT user_name varchar, OUT description text, OUT log_ts timestamptz) RETURNS SETOF record AS $$ SELECT * FROM logs WHERE user_name = $1; $$ LANGUAGE 'sql' STABLE;
Using composite type:
CREATE FUNCTION sel_logs_so(param_user_name varchar) RETURNS SETOF logs AS $$ SELECT * FROM logs WHERE user_name = $1; $$ LANGUAGE 'sql' STABLE;
All functions in Example 8-4 can be called using:
SELECT * FROM sel_logs_rt('lhsu'),
When your functional needs exceed reaches beyond SQL, PL/pgSQL is the most common
option. PL/pgSQL stands apart from SQL in that you can declare local variables using
DECLARE
, you can have control flow, and the body of the function needs
be enclosed in a BEGIN..END
block. To demonstrate the difference, we have
rewritten Example 8-4 as a PL/pgSQL function.
Example 8-5. Function to return a table using PL/pgSQL
CREATE FUNCTION sel_logs_rt(param_user_name varchar) RETURNS TABLE (log_id int, user_name varchar(50), description text, log_ts timestamptz) AS $$ BEGIN RETURN QUERY SELECT log_id, user_name, description, log_ts FROM logs WHERE user_name = param_user_name; END; $$ LANGUAGE 'plpgsql' STABLE;
Python is a slick language with a vast number of available libraries. PostgreSQL is the only database we know of that’ll let you compose functions using Python. PostgreSQL 9.0+ supports both Python 2 and Python 3.
You can have both plpython2u
andplpython3u
installed in the same database, but you can’t use them in the same session. This means that
you can’t write a query that contains both plpython2u
and
plpython3u
-written functions.
In order to use PL/Python, you first need to install Python on your server. For Windows and Mac OS, Python installers are available at http://www.python.org/download/. For Linux/Unix systems, Python binaries are usually available via the various distros. For details, refer to PL/Python. After you have Python on your server, proceed to install the PostgreSQL Python extension using the commands below:
CREATE EXTENSION plpython2u; CREATE EXTENSION plpython3u;
(You will find a third extension called plpythonu
, which is an alias
for plpython2u
and intended for backwards compatibility.) Make sure you have
Python properly running on your server before attempting to install the extension or else you
will run into errors.
You should install a minor version of Python that matches what your
plpythonu
extensions were compiled against. For example, if your
plpython2u
is compiled against 2.7, then you’ll need to install Python
2.7.
PostgreSQL automatically converts PostgreSQL datatypes to Python datatypes and back. PL/Python is capable of returning arrays and even composite types. You can use PL/Python to write triggers and create aggregate functions. We’ve demonstrated some of these on the Postgres OnLine Journal, in PL/Python Examples.
Python allows you to perform feats that aren’t possible in PL/pgSQL. In Example 8-6, we demonstrate how to write a PL/Python function that does a text search of the online PostgreSQL document resource site.
Example 8-6. Searching PostgreSQL docs using PL/Python
CREATE OR REPLACE FUNCTION postgresql_help_search(param_search text) RETURNS text AS $$ import urllib, re response = urllib.urlopen('http://www.postgresql.org/search/?u=%2Fdocs%2Fcurrent%2F&q=' + param_search) raw_html = response.read() result = raw_html[raw_html.find("<!-- docbot goes here -->"):raw_html.find("<!-- pgContentWrap -->") - 1] result = re.sub('<[^<]+?>', '', result).strip() return result $$ LANGUAGE plpython2u SECURITY DEFINER STABLE;
Import the libraries we’ll be using.
Web search concatenating user input parameters.
Read response and save html to a variable called raw_html
.
Save the part of the raw_html
that starts with <!-- docbot goes
here -->
and ends just before the beginning of <!-- pgContentWrap -->
.
Strip HTML and white space from front and back and then re-save back to variable
called result
.
Return final result
.
Calling Python functions is no different than functions written in other languages. In Example 8-7, we use the function we created in Example 8-6 to output the result with three search terms.
Example 8-7. Using Python function in a query
SELECT search_term, left(postgresql_help_search(search_term), 125) As result FROM (VALUES ('regexp_match'), ('pg_trgm'), ('tsvector')) As X(search_term);
search_term | result regexp_match | Results 1-7 of 7. 1. PostgreSQL: Documentation: Manuals: Pattern Matching [1.46] ...matching a POSIX regular pg_trgm|Results 1-8 of 8. 1. PostgreSQL: Documentation: Manuals: pg_trgm [0.66] ...pg_trgm The pg_trgm module provide tsvector | Results 1-20 of 32. Result pages: 1 2 Next 1. PostgreSQL: Documentation: Manuals: Text Search Functions (3 rows)
Recall that PL/Python is an untrusted language without a trusted counterpart. This means it’s capable of interacting with the filesystem of the OS and a function can only be created by super users. Our next example uses PL/Python to retrieve file listings from a directory. Keep in mind that PL/Python function runs under the context of the postgres user account, so you need to be sure that account has adequate access to the relevant directories.
Example 8-8. List files in directories
CREATE OR REPLACE FUNCTION list_incoming_files() RETURNS SETOF text AS $$ import os return os.listdir('/incoming') $$ LANGUAGE 'plpython2u' VOLATILE SECURITY DEFINER;
You can run the function in Example 8-8 with the query below:
SELECT filename FROM list_incoming_files() As filename WHERE filename ILIKE '%.csv'
No database of merit should be without triggers to automatically detect and handle
changes in data. Triggers can be added to both tables and views. PostgreSQL offers both
statement-level triggers and row-level triggers. Statement triggers run once per statement,
while row triggers run for each row called. For instance, suppose we execute an
UPDATE
command that affects 1,500 rows. A statement-level trigger will fire
only once, whereas the row-level trigger can fire up to 1,500 times. More distinction is made
between a BEFORE
, AFTER
, and INSTEAD OF
trigger. A
BEFORE
trigger fires prior to the execution of the command giving you a chance
to cancel and change data before it changes data. An AFTER
trigger fires
afterwards giving you a chance to retrieve revised data values. AFTER
triggers
are often used for logging or replication purposes. The INSTEAD OF
triggers run
instead of the normal action. INSTEAD OF
triggers can only be used with views.
BEFORE
and AFTER
triggers can only be used with tables. To gain a
better understanding of the interplay between triggers and the underlying command, we refer
you to the official documentation Overview of Trigger Behavior. We demonstrated an example
of a view trigger in Example 7-2.
PostgreSQL offers specialized functions to handle triggers. These trigger functions act
just like any other function and have the same basic structure as your standard function.
Where they differ are in the input parameter and the output type. A trigger function never
takes a literal input argument though internally they have access to the trigger state data
and can modify it. They always output a datatype called a trigger
.
Because PostgreSQL trigger functions are just another function, you can reuse the same
trigger function across different triggers. This is usually not the case for other databases
where each trigger has its own non-reusable handler code. Each trigger must have exactly one
associated triggering function. To apply multiple triggering functions, you must create
multiple triggers against the same event. The alphabetical order of the trigger name
determines the order of firing and each trigger passes the revised trigger state data to the
next trigger in the list.
You can use almost any language to write trigger functions, with SQL being the notable exception. Our example below uses PL/pgSQL, which is by far the most common language for writing triggers. You will see that we take two steps: First, we write the trigger function. Next, we attach the trigger function to the appropriate trigger, a powerful extra step that decouples triggers from trigger functions.
Define the trigger function. This function can be used on any table that has a
upd_ts
column . It changes the value of the upd_ts
field of
the new record before returning. Trigger functions that change values of a row should only
be called in the BEFORE
event, because in the AFTER
event, all updates to the NEW
record will be ignored.
The trigger will fire before the record is committed.
This is a new feature introduced in PostgreSQL 9.0+ that allows us to limit the
firing of the trigger only if specified columns have changed. In prior versions, you
would do this in the trigger function itself using a series of comparison between
OLD.some_column
and NEW.some_column
. This feature is not
supported for INSTEAD OF
triggers.
Binds the trigger to the table.
Aggregates are another type of specialized function offered up by PostgreSQL. In many
other databases, you’re limited to ANSI-SQL aggregate functions such as MIN()
,
MAX()
, AVG()
, SUM()
, and COUNT()
. You
can define your own aggregates in PostgreSQL. Don’t forget that any aggregate function in
PostgreSQL can be used as a window function. Altogether this makes PostgreSQL the most
customizable database in existence today.
You can write aggregates in almost any language, SQL included. An aggregate is generally composed of one or more functions. It must have at least a state transition function to perform the computation and optional functions to manage initial and final states. You can use a different language for each of the functions should you choose. We have various examples of building aggregates using PL/pgSQL, PL/Python, and SQL in PostgreSQL Aggregates.
Regardless of which languages you code the aggregate, the glue that brings them all together looks the same for all and is of the form:
CREATE AGGREGATEmyagg
(datatype_of_input
) (SFUNC=state_function_name
, STYPE=state_type
, FINALFUNC=final_func_name
, INITCOND=optional_init_state_value
);
The final function is optional, but if specified must take as input the result of the
state function. The state function always takes as input the
datatype_of_input
and result of last state function call. The
initial condition is also optional. When present, it is used to initialize the state value.
Aggregates can be multi-column as well, as we describe in How to Create Multi-Column Aggregates.
Although SQL functions are the simplest of functions to write, you can still go pretty
far with them. In this section, we’ll demonstrate how to create a geometric mean aggregate
function with SQL. A geometric
mean is the nth root of a product of n positive numbers
((x1*x2*x3...Xn)^(1/n)
). It has various uses in finance, economics, and
statistics. A geometric mean may have more meaning than an arithmetic mean when the numbers
are of vastly different scales. A more suitable computational formula uses logarithm to
convert a multiplicative process to an additive one (EXP(SUM(LN(x))/n)
). We’ll be
using this method in our example.
For our geometric mean aggregate, we’ll use two functions: a state function to add the logs and a final exponential function to convert the logs back. We will also specify an initial condition of zero when we put everything together.
Example 8-10. Geometric mean aggregate: State function
CREATE OR REPLACE FUNCTION geom_mean_state(prev numeric[2], next numeric) RETURNS numeric[2] AS $$ SELECT CASE WHEN $2 IS NULL or $2 = 0 THEN $1 ELSE ARRAY[COALESCE($1[1],0) + ln($2), $1[2] + 1] END; $$ LANGUAGE sql IMMUTABLE;
Our transition state function, as shown in Example 8-10, takes two inputs: the previous state passed in as a one-dimensional array with two elements and also the next element in the aggregation process. If the next element is NULL or zero, the state function returns the prior state. Otherwise, it’ll return an array where the first element is the logarithmic sum and the second being the current count. We will need a final function that takes the sum from the state transition and divides by the count.
Example 8-11. Geometric mean aggregate: Final function
CREATE OR REPLACE FUNCTION geom_mean_final(numeric[2]) RETURNS numeric AS $$ SELECT CASE WHEN $1[2] > 0 THEN exp($1[1]/$1[2]) ELSE 0 END; $$ LANGUAGE sql IMMUTABLE;
Now we stitch all the pieces together in our aggregate definition. Note that our aggregate has an initial condition that is the same as what is returned by our state function.
Example 8-12. Geometric mean aggregate: Putting all the pieces together
CREATE AGGREGATE geom_mean(numeric) (SFUNC=geom_mean_state, STYPE=numeric[] , FINALFUNC=geom_mean_final, INITCOND='{0,0}'),
Let’s take our geom_mean()
function for a test drive. We’re going to
compute a heuristic rating for racial diversity and list the top five most racially diverse
counties in Massachusetts.
Example 8-13. Top five most racially diverse counties using geometric mean
SELECT left(tract_id,5) As county, geom_mean(val) As div_county FROM census.vw_facts WHERE category = 'Population' AND short_name != 'white_alone' GROUP BY county ORDER BY div_county DESC LIMIT 5;
county | div_county -------+--------------------- 25025 | 85.1549046212833364 25013 | 79.5972921427888918 25017 | 74.7697097102419689 25021 | 73.8824162064128504 25027 | 73.5955049035237656
Let’s put things into overdrive and try our new aggregate function as a window aggregate.
Example 8-14. Top five most racially diverse census tracts with average
WITH X AS (SELECT tract_id, left(tract_id,5) As county , geom_mean(val) OVER(PARTITION BY tract_id) As div_tract , ROW_NUMBER() OVER(PARTITION BY tract_id) As rn , geom_mean(val) OVER(PARTITION BY left(tract_id,5)) As div_county FROM census.vw_facts WHERE category = 'Population' AND short_name != 'white_alone') SELECT tract_id, county, div_tract, div_county FROM X WHERE rn = 1 ORDER BY div_tract DESC, div_county DESC LIMIT 5;
tract_id | county | div_tract | div_county ------------+--------+----------------------+--------------------- 25025160101 | 25025 | 302.6815688785928786 | 85.1549046212833364 25027731900 | 25027 | 265.6136902148147729 | 73.5955049035237656 25021416200 | 25021 | 261.9351057509603296 | 73.8824162064128504 25025130406 | 25025 | 260.3241378371627137 | 85.1549046212833364 25017342500 | 25017 | 257.4671462282508267 | 74.7697097102419689
3.137.217.220