Chapter 8. Writing Functions

As with most databases, you can string a series of SQL statements together and treat them as a unit. Different databases ascribe different names for this unit—stored procedures, modules, macros, prepared statements, and so on. PostgreSQL calls them functions. Aside from simply unifying various SQL statements, these units often add the capability to control the execution of the SQL statements through using procedural language (PL). In PostgreSQL, you have your choice of languages when it comes to writing functions. Often packaged along with binary installers are SQL, C, PL/pgSQL, PL/Perl, PL/Python. In version 9.2, you’ll also find plv8js, which will allow you to write procedural functions in JavaScript. plv8js should be an exciting addition to web developers and a nice companion to the built-in JSON type.

You can always install additional languages such as PL/R, PL/Java, PL/sh, and even experimental ones geared for high-end processing and AI, such as PL/Scheme or PgOpenCL. A list of available languages can be found here:Procedural Languages

Anatomy of PostgreSQL Functions

Function Basics

Regardless which language you choose to write a particular function, they all share a similar structure.

Example 8-1. Basic Function Structure

CREATE OR REPLACE FUNCTION func_name(
	arg1 arg1_datatype)
RETURNS some_type | setof sometype | TABLE (..) AS
$$
BODY of function
$$
LANGUAGE language_of_function

Functional definitions can include additional qualifiers to optimize execution and to enforce security. We describe these below:

  • LANGUAGE has to be one you have installed in your database. You can get a list with the query: SELECT lanname FROM pg_language;.

  • VOLATILITY defaults to VOLATILE if not specified. It can be set to STABLE, VOLATILE, or IMMUTABLE. This setting gives the planner an idea if the results of a function can be cached. STABLE means that the function will return the same value for the same inputs within the same query. VOLATILE means the function may return something different with each call, even with same inputs. Functions that change data or are a function of other environment settings like time, should be marked as VOLATILE. IMMUTABLE means given the same inputs the function is guaranteed to return the same result. The volatility setting is merely a hint to the query planner. It may choose to not cache if it concludes caching is less cost effective than recomputation. However, if you mark a function as VOLATILE it will always recompute.

  • STRICT. A function is assumed to be not strict unless adorned with STRICT. A strict function will always return NULL if any inputs are NULL and doesn’t bother evaluating the function so saves some processing. When building SQL functions, you should be careful about using STRICT as it will prevent index usage as described in STRICT on SQL Functions

  • COST is a relative measure of computational intensiveness. SQL and PL/pgSQL functions default to 100 and C functions to 1. This affects the order functions will be evaluated in a WHERE clause and also the likeliness of caching. The higher the value, the more costly the function is assumed to be.

  • ROWS is only set for set returning functions and is an estimate of how many rows will be returned; used by planner to arrive at best strategy. Format would be ROWS 100.

  • SECURITY DEFINER is an optional clause that means run under the context of the owner of the function. If left out, a function runs under the context of the user running the function. This is useful for giving people rights to update a table in a function but not direct update access to a table and also tasks that normally require super user rights.

Trusted and Untrusted Languages

Function languages can be divided into two levels of trust. Many—but not all—languages offer both a trusted and untrusted version.

  • Trusted—Trusted languages are languages that don’t have access to the filesystem beyond the data cluster and can’t execute OS commands. They can be created by any user. Languages like SQL, PL/pgSQL, PL/Perl are trusted. It basically means they can’t do damage to the underlying OS.

  • Untrusted—Untrusted languages are those that can interact with the OS and even call on webservices or execute functions on the OS. Functions written in these languages can only be created by super users, however, a super user can delegate rights for another user to use them by using the SECURITY DEFINER setting. By convention, these languages have a u at the end of the name to denote that they’re untrusted—for instance, PL/PerlU, PL/PythonU.

Writing Functions with SQL

Writing SQL functions[2] is fast and easy. Take your existing SQL statements, add a functional header and footer, and you’re done. The ease does mean you’ll sacrifice flexibility. You won’t have fancy control languages to create conditional execution branches. More restrictively, you can’t run dynamic SQL statements that you piece together based on parameters as you can in most other languages. On the positive side, SQL functions are often inlined by the query planner into the overall plan of a query since the planner can peek into the function. Functions in other languages are always treated as blackboxes. Inlining allows SQL functions to take advantage of indexes and collapse repetitive computations.

Your basic scalar returning SQL function is shown in Example 8-2:

Example 8-2. SQL function to return key of inserted record

CREATE OR REPLACE FUNCTION ins_logs(param_user_name varchar, param_description text) 
 RETURNS integer AS
$$ INSERT INTO logs(user_name, description) VALUES($1, $2)
RETURNING log_id; $$
LANGUAGE 'sql' VOLATILE;

To call function Example 8-2, we would execute:

SELECT ins_logs('lhsu', 'this is a test') As new_id;

Similarly, you can update data with an SQL function and return a scalar or void as shown in Example 8-3.

Example 8-3. SQL function to update a record

CREATE OR REPLACE FUNCTION upd_logs(log_id integer, param_user_name varchar, param_description text) 
 RETURNS void AS
$$ UPDATE logs SET user_name = $2, description = $3, log_ts = CURRENT_TIMESTAMP 
 WHERE log_id = $1;$$
LANGUAGE 'sql' VOLATILE;

To execute:

SELECT upd_logs(12,'robe', 'Change to regina'),

Note

Prior to 9.2, SQL functions could only use the ordinal position of the input arguments in the body of the function. After 9.2, you have the option of using named arguments, for example you can write param_1, param_2 instead of $1, $2. SQL functions are the only ones that retained this limitation until now.

Functions in almost all languages can return sets. SQL functions can also return sets. There are three common approaches of doing this, using ANSI-SQL standard RETURNS TABLE syntax, using OUT parameters, or returning a composite data type. The RETURNS TABLE approach requires PostgreSQL 8.3 or above, but is closer to what you’ll see in other relational databases. In Example 8-4, we demonstrate how to write the same function in three different ways.

Example 8-4. Examples of function returning sets

Using returns table:

CREATE FUNCTION sel_logs_rt(param_user_name varchar)
RETURNS TABLE (log_id int, user_name varchar(50), description text, log_ts timestamptz) AS
$$
SELECT log_id, user_name, description, log_ts FROM logs WHERE user_name = $1;
$$
LANGUAGE 'sql' STABLE;

Using OUT parameters:

CREATE FUNCTION sel_logs_out(param_user_name varchar, OUT log_id int
 , OUT user_name varchar, OUT description text, OUT log_ts timestamptz)
RETURNS SETOF record AS
$$
SELECT * FROM logs WHERE user_name = $1;
$$
LANGUAGE 'sql' STABLE;

Using composite type:

CREATE FUNCTION sel_logs_so(param_user_name varchar)
RETURNS SETOF logs AS
$$
SELECT * FROM logs WHERE user_name = $1;
$$
LANGUAGE 'sql' STABLE;

All functions in Example 8-4 can be called using:

SELECT * FROM sel_logs_rt('lhsu'),

Writing PL/pgSQL Functions

When your functional needs exceed reaches beyond SQL, PL/pgSQL is the most common option. PL/pgSQL stands apart from SQL in that you can declare local variables using DECLARE, you can have control flow, and the body of the function needs be enclosed in a BEGIN..END block. To demonstrate the difference, we have rewritten Example 8-4 as a PL/pgSQL function.

Example 8-5. Function to return a table using PL/pgSQL

CREATE FUNCTION sel_logs_rt(param_user_name varchar)
RETURNS TABLE (log_id int, user_name varchar(50), description text, log_ts timestamptz) AS
$$
BEGIN
	RETURN QUERY
    SELECT log_id, user_name, description, log_ts 
      FROM logs WHERE user_name = param_user_name;
END;
$$
LANGUAGE 'plpgsql' STABLE;

Writing PL/Python Functions

Python is a slick language with a vast number of available libraries. PostgreSQL is the only database we know of that’ll let you compose functions using Python. PostgreSQL 9.0+ supports both Python 2 and Python 3.

Note

You can have both plpython2u andplpython3u installed in the same database, but you can’t use them in the same session. This means that you can’t write a query that contains both plpython2u and plpython3u-written functions.

In order to use PL/Python, you first need to install Python on your server. For Windows and Mac OS, Python installers are available at http://www.python.org/download/. For Linux/Unix systems, Python binaries are usually available via the various distros. For details, refer to PL/Python. After you have Python on your server, proceed to install the PostgreSQL Python extension using the commands below:

CREATE EXTENSION plpython2u;
CREATE EXTENSION plpython3u;

(You will find a third extension called plpythonu, which is an alias for plpython2u and intended for backwards compatibility.) Make sure you have Python properly running on your server before attempting to install the extension or else you will run into errors.

You should install a minor version of Python that matches what your plpythonu extensions were compiled against. For example, if your plpython2u is compiled against 2.7, then you’ll need to install Python 2.7.

Basic Python Function

PostgreSQL automatically converts PostgreSQL datatypes to Python datatypes and back. PL/Python is capable of returning arrays and even composite types. You can use PL/Python to write triggers and create aggregate functions. We’ve demonstrated some of these on the Postgres OnLine Journal, in PL/Python Examples.

Python allows you to perform feats that aren’t possible in PL/pgSQL. In Example 8-6, we demonstrate how to write a PL/Python function that does a text search of the online PostgreSQL document resource site.

Example 8-6. Searching PostgreSQL docs using PL/Python

CREATE OR REPLACE FUNCTION postgresql_help_search(param_search text)
RETURNS text AS 
$$
import urllib, re  1
response = urllib.urlopen('http://www.postgresql.org/search/?u=%2Fdocs%2Fcurrent%2F&q=' + param_search)   2
raw_html = response.read() 3
result = raw_html[raw_html.find("<!-- docbot goes here -->"):raw_html.find("<!-- pgContentWrap -->") - 1]   4
result = re.sub('<[^<]+?>', '', result).strip()  5
return result 6
$$
LANGUAGE plpython2u SECURITY DEFINER STABLE;
1

Import the libraries we’ll be using.

2

Web search concatenating user input parameters.

3

Read response and save html to a variable called raw_html.

4

Save the part of the raw_html that starts with <!-- docbot goes here --> and ends just before the beginning of <!-- pgContentWrap -->.

5

Strip HTML and white space from front and back and then re-save back to variable called result.

6

Return final result.

Calling Python functions is no different than functions written in other languages. In Example 8-7, we use the function we created in Example 8-6 to output the result with three search terms.

Example 8-7. Using Python function in a query

SELECT search_term, left(postgresql_help_search(search_term), 125) As result FROM (VALUES ('regexp_match'), ('pg_trgm'), ('tsvector')) As X(search_term);
search_term  |  result
regexp_match  |  Results 1-7 of 7.
    1. PostgreSQL: Documentation: Manuals: Pattern Matching [1.46]
    ...matching a POSIX regular
pg_trgm|Results 1-8 of 8.


    1. PostgreSQL: Documentation: Manuals: pg_trgm [0.66]
    ...pg_trgm
The
pg_trgm
module provide
tsvector  |  Results 1-20 of 32.
   Result pages: 1 2 Next
    1. PostgreSQL: Documentation: Manuals: Text Search Functions
(3 rows)

Recall that PL/Python is an untrusted language without a trusted counterpart. This means it’s capable of interacting with the filesystem of the OS and a function can only be created by super users. Our next example uses PL/Python to retrieve file listings from a directory. Keep in mind that PL/Python function runs under the context of the postgres user account, so you need to be sure that account has adequate access to the relevant directories.

Example 8-8. List files in directories

CREATE OR REPLACE FUNCTION list_incoming_files()
RETURNS SETOF text AS
$$
import os
return os.listdir('/incoming')
$$
LANGUAGE 'plpython2u' VOLATILE SECURITY DEFINER;

You can run the function in Example 8-8 with the query below:

SELECT filename FROM list_incoming_files() As filename WHERE filename ILIKE '%.csv'

Trigger Functions

No database of merit should be without triggers to automatically detect and handle changes in data. Triggers can be added to both tables and views. PostgreSQL offers both statement-level triggers and row-level triggers. Statement triggers run once per statement, while row triggers run for each row called. For instance, suppose we execute an UPDATE command that affects 1,500 rows. A statement-level trigger will fire only once, whereas the row-level trigger can fire up to 1,500 times. More distinction is made between a BEFORE, AFTER, and INSTEAD OF trigger. A BEFORE trigger fires prior to the execution of the command giving you a chance to cancel and change data before it changes data. An AFTER trigger fires afterwards giving you a chance to retrieve revised data values. AFTER triggers are often used for logging or replication purposes. The INSTEAD OF triggers run instead of the normal action. INSTEAD OF triggers can only be used with views. BEFORE and AFTER triggers can only be used with tables. To gain a better understanding of the interplay between triggers and the underlying command, we refer you to the official documentation Overview of Trigger Behavior. We demonstrated an example of a view trigger in Example 7-2.

PostgreSQL offers specialized functions to handle triggers. These trigger functions act just like any other function and have the same basic structure as your standard function. Where they differ are in the input parameter and the output type. A trigger function never takes a literal input argument though internally they have access to the trigger state data and can modify it. They always output a datatype called a trigger. Because PostgreSQL trigger functions are just another function, you can reuse the same trigger function across different triggers. This is usually not the case for other databases where each trigger has its own non-reusable handler code. Each trigger must have exactly one associated triggering function. To apply multiple triggering functions, you must create multiple triggers against the same event. The alphabetical order of the trigger name determines the order of firing and each trigger passes the revised trigger state data to the next trigger in the list.

You can use almost any language to write trigger functions, with SQL being the notable exception. Our example below uses PL/pgSQL, which is by far the most common language for writing triggers. You will see that we take two steps: First, we write the trigger function. Next, we attach the trigger function to the appropriate trigger, a powerful extra step that decouples triggers from trigger functions.

Example 8-9. Trigger function to timestamp new and changed records

CREATE OR REPLACE FUNCTION trig_time_stamper() RETURNS trigger AS 1
$$
BEGIN
    NEW.upd_ts := CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql VOLATILE;

CREATE TRIGGER trig_1
BEFORE INSERT OR UPDATE 2 OF session_state, session_id 3
 ON web_sessions 4
 FOR EACH ROW
 EXECUTE PROCEDURE trig_time_stamper();
1

Define the trigger function. This function can be used on any table that has a upd_ts column . It changes the value of the upd_ts field of the new record before returning. Trigger functions that change values of a row should only be called in the BEFORE event, because in the AFTER event, all updates to the NEW record will be ignored.

2

The trigger will fire before the record is committed.

3

This is a new feature introduced in PostgreSQL 9.0+ that allows us to limit the firing of the trigger only if specified columns have changed. In prior versions, you would do this in the trigger function itself using a series of comparison between OLD.some_column and NEW.some_column. This feature is not supported for INSTEAD OF triggers.

4

Binds the trigger to the table.

Aggregates

Aggregates are another type of specialized function offered up by PostgreSQL. In many other databases, you’re limited to ANSI-SQL aggregate functions such as MIN(), MAX(), AVG(), SUM(), and COUNT(). You can define your own aggregates in PostgreSQL. Don’t forget that any aggregate function in PostgreSQL can be used as a window function. Altogether this makes PostgreSQL the most customizable database in existence today.

You can write aggregates in almost any language, SQL included. An aggregate is generally composed of one or more functions. It must have at least a state transition function to perform the computation and optional functions to manage initial and final states. You can use a different language for each of the functions should you choose. We have various examples of building aggregates using PL/pgSQL, PL/Python, and SQL in PostgreSQL Aggregates.

Regardless of which languages you code the aggregate, the glue that brings them all together looks the same for all and is of the form:

CREATE AGGREGATE myagg(datatype_of_input)
(SFUNC=state_function_name, STYPE=state_type, FINALFUNC=final_func_name, INITCOND=optional_init_state_value);

The final function is optional, but if specified must take as input the result of the state function. The state function always takes as input the datatype_of_input and result of last state function call. The initial condition is also optional. When present, it is used to initialize the state value. Aggregates can be multi-column as well, as we describe in How to Create Multi-Column Aggregates.

Although SQL functions are the simplest of functions to write, you can still go pretty far with them. In this section, we’ll demonstrate how to create a geometric mean aggregate function with SQL. A geometric mean is the nth root of a product of n positive numbers ((x1*x2*x3...Xn)^(1/n)). It has various uses in finance, economics, and statistics. A geometric mean may have more meaning than an arithmetic mean when the numbers are of vastly different scales. A more suitable computational formula uses logarithm to convert a multiplicative process to an additive one (EXP(SUM(LN(x))/n)). We’ll be using this method in our example.

For our geometric mean aggregate, we’ll use two functions: a state function to add the logs and a final exponential function to convert the logs back. We will also specify an initial condition of zero when we put everything together.

Example 8-10. Geometric mean aggregate: State function

CREATE OR REPLACE FUNCTION geom_mean_state(prev numeric[2], next numeric)
RETURNS numeric[2] AS
$$
SELECT CASE WHEN $2 IS NULL or $2 = 0 THEN $1 
  ELSE ARRAY[COALESCE($1[1],0) + ln($2), $1[2] + 1] END;
$$
LANGUAGE sql IMMUTABLE;

Our transition state function, as shown in Example 8-10, takes two inputs: the previous state passed in as a one-dimensional array with two elements and also the next element in the aggregation process. If the next element is NULL or zero, the state function returns the prior state. Otherwise, it’ll return an array where the first element is the logarithmic sum and the second being the current count. We will need a final function that takes the sum from the state transition and divides by the count.

Example 8-11. Geometric mean aggregate: Final function

CREATE OR REPLACE FUNCTION geom_mean_final(numeric[2])
RETURNS numeric AS
$$
SELECT CASE WHEN $1[2] > 0 THEN exp($1[1]/$1[2]) ELSE 0 END;
$$
LANGUAGE sql IMMUTABLE;

Now we stitch all the pieces together in our aggregate definition. Note that our aggregate has an initial condition that is the same as what is returned by our state function.

Example 8-12. Geometric mean aggregate: Putting all the pieces together

CREATE AGGREGATE geom_mean(numeric) (SFUNC=geom_mean_state, STYPE=numeric[]
 , FINALFUNC=geom_mean_final, INITCOND='{0,0}'),

Let’s take our geom_mean() function for a test drive. We’re going to compute a heuristic rating for racial diversity and list the top five most racially diverse counties in Massachusetts.

Example 8-13. Top five most racially diverse counties using geometric mean

SELECT left(tract_id,5) As county, geom_mean(val) As div_county 
FROM census.vw_facts 
WHERE category = 'Population' AND short_name != 'white_alone'
GROUP BY county
ORDER BY div_county DESC LIMIT 5;
county |     div_county
-------+---------------------
25025  | 85.1549046212833364
25013  | 79.5972921427888918
25017  | 74.7697097102419689
25021  | 73.8824162064128504
25027  | 73.5955049035237656

Let’s put things into overdrive and try our new aggregate function as a window aggregate.

Example 8-14. Top five most racially diverse census tracts with average

WITH X AS (SELECT tract_id, left(tract_id,5) As county
    , geom_mean(val) OVER(PARTITION BY tract_id) As div_tract
    , ROW_NUMBER() OVER(PARTITION BY tract_id) As rn
    , geom_mean(val) OVER(PARTITION BY left(tract_id,5)) As div_county 
  FROM census.vw_facts WHERE category = 'Population' AND short_name != 'white_alone')
SELECT tract_id, county, div_tract, div_county 
 FROM X 
 WHERE rn = 1 
 ORDER BY div_tract DESC, div_county DESC LIMIT 5;
tract_id    | county |      div_tract       |     div_county
------------+--------+----------------------+---------------------
25025160101 | 25025  | 302.6815688785928786 | 85.1549046212833364
25027731900 | 25027  | 265.6136902148147729 | 73.5955049035237656
25021416200 | 25021  | 261.9351057509603296 | 73.8824162064128504
25025130406 | 25025  | 260.3241378371627137 | 85.1549046212833364
25017342500 | 25017  | 257.4671462282508267 | 74.7697097102419689


[2] SQL in this context really means a language for writing functions.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.217.220