Quick introduction to PL/Python

In the previous chapters, we discussed PL/pgSQL, one of the standard procedural languages distributed with PostgreSQL. PL/pgSQL is a language unique to PostgreSQL and was designed to add blocks of computation and SQL inside the database. While it has grown in its breadth of functionality, it still lacks the completeness of syntax of a full programming language. PL/Python allows your database functions to be written in Python, with all the depth and maturity of writing Python code outside the database.

A minimal PL/Python function

Let's start from the very beginning, yet again:

CREATE FUNCTION hello(name text)
  RETURNS text
AS $$
    return 'hello %s !' %  name
$$ LANGUAGE plpythonu;

Here, we see that creating the function starts the same way as any other PostgreSQL function, with a RETURNS definition, in this case of type text:

CREATE FUNCTION hello(name text)
  RETURNS text

The difference from what we have seen before is that the language specified is plpythonu (the language ID for the PL/PythonU language):

$$ LANGUAGE plpythonu;

Inside the function body, it is very much normal Python code: it returns the name passed as an argument formatted into the string 'hello %s !' using the standard Python formatting operator %:

    return 'hello %s !' %  name

Finally, let's test how this works:

hannu=# SELECT hello('world');
     hello     
---------------
 hello world !
(1 row)

And yes, it returns exactly what is expected!

Data type conversions

The first and last things happening when a PL function is called by PostgreSQL, are converting argument values between the PostgreSQL and PL types. The PostgreSQL types need to be converted to the PL types on entering the function, and then the return value needs to be converted back into the PostgreSQL types.

Except for PL/pgSQL, which uses PostgreSQL's own native types in computations, the PLs are based on existing languages with their own understanding of what types (integer, string, date, and so on) are, how they should behave, and how they are represented internally. They are mostly similar to PostgreSQL's understanding but quite often are not exactly the same. PL/Python converts data from PostgreSQL types to Python types, as shown in the following table:

PostgreSQL                                      Python 2   Python 3   Comments
int2, int4                                      int        int
int8                                            long       int
real, double, numeric                           float      float      This may lose precision for numeric values.
bytea                                           str        bytes      No encoding conversion is done, nor should any encoding be assumed.
text, char(), varchar(), and other text types   str        str        On Python 2, the string will be in server encoding; on Python 3, it is a unicode string.
All other types                                 str        str        PostgreSQL's type output function is used to convert to this string.

Inside the function, all computation is done using Python types, and the return value is converted back to a PostgreSQL type using the following rules (these rules are direct quotes from the official PL/Python documentation at http://www.postgresql.org/docs/current/static/plpython-data.html):

  • When the PostgreSQL return type is boolean, the return value will be evaluated for truth according to the Python rules. That is, 0 and an empty string are false, but notably the string 'f' is true.
  • When the PostgreSQL return type is bytea, the return value will be converted to a string (Python 2) or bytes (Python 3) using the respective Python built-ins, with the result being converted to bytea.
  • For all other PostgreSQL return types, the returned Python value is converted to a string using Python's built-in str, and the result is passed to the input function of the PostgreSQL data type.

Strings in Python 2 are required to be in the PostgreSQL server encoding when they are passed to PostgreSQL. Strings that are not valid in the current server encoding will raise an error. But not all encoding mismatches can be detected, so garbage data can still result when this is not done correctly. Unicode strings are converted to the correct encoding automatically, so it can be safer and more convenient to use those. In Python 3, all strings are Unicode strings.

In other words, for the boolean conversion: 0, False, an empty sequence (including the empty string ''), and an empty dictionary all become PostgreSQL false; anything else becomes true.

One notable exception to this is that the check for None is done before any other conversion. Even for booleans, None is always converted to NULL and never to the boolean value false.
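To make the boolean rule concrete, here is a minimal sketch (the function name to_bool is ours, not from the original text): the Python value returned by the function is simply evaluated for truth.

CREATE FUNCTION to_bool(s text)
  RETURNS boolean
AS $$
    # the return value is evaluated by Python's truth rules:
    # '' is false, any non-empty string (even 'f') is true,
    # and None becomes NULL before any boolean conversion happens
    return s
$$ LANGUAGE plpythonu;

SELECT to_bool(''), to_bool('f'), to_bool(NULL); should accordingly return false, true, and NULL.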

For the bytea type (the PostgreSQL byte array), the conversion from Python's string representation is an exact copy, with no encoding or other conversions applied.
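To see the argument conversions from the preceding table in action, you can write a small probe function (a sketch; the name py_argtypes is ours) that simply reports the Python type name of each argument it receives:

CREATE FUNCTION py_argtypes(a int8, b numeric, c bytea, d text)
  RETURNS text
AS $$
    # report the Python type each PostgreSQL value was converted to
    return ', '.join(type(v).__name__ for v in (a, b, c, d))
$$ LANGUAGE plpythonu;

On Python 2, SELECT py_argtypes(1, 3.14, 'abc'::bytea, 'x'); should report long, float, str, str.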

Writing simple functions in PL/Python

Writing functions in PL/Python is not much different in principle from writing functions in PL/pgSQL. You still have the exact same syntax around the function body in $$, and the argument names, types, and RETURNS clause all mean the same thing, regardless of the exact PL language used.

A simple function

So, a simple add_one() function in PL/Python looks like this:

CREATE FUNCTION add_one(i int) 
  RETURNS int AS $$
return i + 1
$$ LANGUAGE plpythonu;
usm=# SELECT add_one(2);
 add_one 
---------
    3
(1 row)

It can't get any simpler than that, can it?

What you see here is that the arguments are converted to the appropriate Python types before being passed to the Python code, and the result is converted back to the appropriate PostgreSQL type for the return value.

Functions returning a record

To return a record from a Python function, you can use:

  • A sequence or list of values in the same order as the fields in the return record
  • A dictionary with keys matching the fields in the return record
  • A class or type instance with attributes matching the fields in the return record

Here are samples of the three ways to return a record:

First, using an instance:

CREATE OR REPLACE FUNCTION userinfo(
                    INOUT username name, 
                    OUT user_id oid, 
                    OUT is_superuser boolean)
AS $$
    class PGUser:
        def __init__(self,username,user_id,is_superuser):
            self.username = username
            self.user_id = user_id
            self.is_superuser = is_superuser
    u = plpy.execute("""
            select usename,usesysid,usesuper
              from pg_user
             where usename = '%s'""" % username)[0]
    user = PGUser(u['usename'], u['usesysid'], u['usesuper'])
    return user
$$ LANGUAGE plpythonu;

Then, a little simpler one using a dictionary:

CREATE OR REPLACE FUNCTION userinfo(
                    INOUT username name, 
                    OUT user_id oid, 
                    OUT is_superuser boolean)
AS $$
    u = plpy.execute("""
            select usename,usesysid,usesuper
              from pg_user
             where usename = '%s'""" % username)[0]
    return {'username':u['usename'], 'user_id':u['usesysid'], 'is_superuser':u['usesuper']}
$$ LANGUAGE plpythonu;

Finally, using a tuple:

CREATE OR REPLACE FUNCTION userinfo(
                  INOUT username name, 
                  OUT user_id oid, 
                  OUT is_superuser boolean)
AS $$
    u = plpy.execute("""
            select usename,usesysid,usesuper
              from pg_user
             where usename = '%s'""" % username)[0]
    return (u['usename'], u['usesysid'], u['usesuper'])
$$ LANGUAGE plpythonu;

Notice [0] at the end of u = plpy.execute(...)[0] in all the examples. It is there to extract the first row of the result, as even for one-row results plpy.execute still returns a list of results.

Tip

Danger of SQL injection!

As we have neither prepared the query with plpy.prepare() and passed the username to plpy.execute() as a parameter, nor used the plpy.quote_literal() function (both techniques are discussed later) to safely quote the username before merging it into the query text, we are open to a security flaw known as SQL injection. So, make sure that you only let trusted users call this function or supply the username argument.
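As a preview, here is a sketch of the same function rewritten with a prepared plan and a query parameter, so that the username is never merged into the query text (the technique itself is covered in the Using prepared queries section later in this chapter):

CREATE OR REPLACE FUNCTION userinfo(
                    INOUT username name,
                    OUT user_id oid,
                    OUT is_superuser boolean)
AS $$
    -- the username is passed as parameter $1, so no quoting is needed
    plan = plpy.prepare("""
            select usename,usesysid,usesuper
              from pg_user
             where usename = $1""", ["text"])
    u = plpy.execute(plan, [username])[0]
    return (u['usename'], u['usesysid'], u['usesuper'])
$$ LANGUAGE plpythonu;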

Calling the function defined via any of these three CREATE commands will look exactly the same:

hannu=# SELECT * FROM userinfo('postgres');
 username | user_id | is_superuser
----------+---------+--------------
 postgres |      10 | t
(1 row)

It usually does not make sense to declare a class inside a function just to return a record value. This possibility is included mostly for cases where you already have a suitable class with a set of attributes matching the ones the function returns.

Table functions

When returning a set from PL/Python functions, you have three options:

  • Return a list or any other sequence of values of the return type
  • Return an iterator or generator
  • Use the yield keyword; in Python, a function that contains yield returns a generator

Here, we have three ways to generate all even numbers up to the argument value using these different styles:

First, returning a list of integers:

CREATE FUNCTION even_numbers_from_list(up_to int)
  RETURNS SETOF int 
AS $$
    return range(0,up_to,2)
$$ LANGUAGE plpythonu;

libro=# SELECT * FROM even_numbers_from_list(10);
 even_numbers_from_list 
------------------------
          0
          2
          4
          6
          8
(5 rows)

The list here is returned by the built-in Python function range, which returns all integers from zero up to (but not including) the argument, in steps of two. This gets returned as a table of integers, one integer per row, from the PostgreSQL function. If the RETURNS clause of the function definition said int[] instead of SETOF int, the same function would return the even integers as a single PostgreSQL array value.
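For illustration, here is a sketch of that array-returning variant (the name even_numbers_as_array is ours); it relies on PL/Python mapping a Python list to a PostgreSQL array, which is available in PostgreSQL 9.1 and later:

CREATE FUNCTION even_numbers_as_array(up_to int)
  RETURNS int[]
AS $$
    # the returned Python list becomes a single PostgreSQL array value
    return range(0, up_to, 2)
$$ LANGUAGE plpythonu;

SELECT even_numbers_as_array(10); should then return the single value {0,2,4,6,8}.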

The next function returns a similar result using a generator and returning both the even number and the odd one following it. Also, notice the different PostgreSQL syntax RETURNS TABLE(...) used this time for defining the return set:

CREATE FUNCTION even_numbers_from_generator(up_to int)
  RETURNS TABLE (even int, odd int) 
AS $$
    return ((i,i+1) for i in xrange(0,up_to,2))
$$ LANGUAGE plpythonu;
libro=# SELECT * FROM even_numbers_from_generator(10);
 even | odd 
------+-----
    0 |   1
    2 |   3
    4 |   5
    6 |   7
    8 |   9
(5 rows)

The generator is constructed using a generator expression (x for x in <seq>). Finally, the third function is a generator written with explicit yield syntax, and yet another PostgreSQL syntax is used for returning SETOF RECORD, with the record structure this time defined by OUT parameters:

CREATE FUNCTION even_numbers_with_yield(up_to int,
                                     OUT even int, 
                                     OUT odd int)
  RETURNS SETOF RECORD 
AS $$
    for i in xrange(0,up_to,2):
        yield i, i+1
$$ LANGUAGE plpythonu;

The important part here is that you can use any of the preceding ways to define a PL/Python set-returning function, and they all work the same. Also, you are free to return a mixture of different types for each row of the set:

CREATE FUNCTION birthdates(OUT name text, OUT birthdate date)
  RETURNS SETOF RECORD 
AS $$
    return (
        {'name': 'bob', 'birthdate': '1980-10-10'},
        {'name': 'mary', 'birthdate': '1983-02-17'},
        ['jill', '2010-01-15'],
    )
$$ LANGUAGE plpythonu;

This yields the following result:

hannu=# SELECT * FROM birthdates();
 name | birthdate  
------+------------
 bob  | 1980-10-10
 mary | 1983-02-17
 jill | 2010-01-15
(3 rows)

As you can see, the data-returning part of PL/PythonU is much more flexible than returning data from a function written in PL/pgSQL.

Running queries in the database

If you have ever accessed a database in Python, you know that most database adapters conform to a somewhat loose standard called the Python Database API Specification v2.0, or DB API 2 for short. You can find the reference online at http://legacy.python.org/dev/peps/pep-0249/.

The first thing you need to know about database access in PL/Python is that in-database queries do not follow this API.

Running simple queries

Instead of using the standard API, all database access goes through just two functions: plpy.execute() for running a query, and plpy.prepare() for turning a query text into a query plan, or prepared query.

The simplest way to do a query is with:

res = plpy.execute(<query text>, [<row count>])

This takes a textual query and an optional row count, and returns a result object, which emulates a list of dictionaries, one dictionary per row.

As an example, if you want to access a field 'name' of the third row of the result, you use:

res[2]['name']

The index is 2 and not 3 because Python lists are indexed starting from 0, so the first row is res[0], the second row res[1], and so on.
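As a small sketch of how this looks inside a function (the function name list_users is ours), the following runs a query with a row limit and turns the result rows into a set of text values:

CREATE FUNCTION list_users()
  RETURNS SETOF text
AS $$
    # fetch at most 100 rows; each row behaves like a dictionary
    res = plpy.execute("select usename from pg_user order by usename", 100)
    return [res[i]["usename"] for i in range(res.nrows())]
$$ LANGUAGE plpythonu;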

Using prepared queries

In an ideal world, this would be all that is needed, but plpy.execute(query, cnt) has two shortcomings:

  • It does not support parameters
  • The plan for the query is not saved, requiring the query text to be parsed and run through the optimizer at each invocation

We will show a way to properly construct a query string later, but for most uses simple parameter passing is enough. So, the execute(query, [maxrows]) call becomes a set of two statements:

plan = plpy.prepare(<query text>, <list of argument types>)
res = plpy.execute(plan, <list of values>, [<row count>])

For example, to query if a user 'postgres' is a superuser, use the following:

plan = plpy.prepare("select usesuper from pg_user where  usename = $1", ["text"])
res = plpy.execute(plan, ["postgres"])
print res[0]["usesuper"]

The first statement prepares the query, which parses the query string into a query tree, optimizes this tree to produce the best query plan available, and returns a prepared query object. The second statement uses the prepared plan to query for a specific user's superuser status.

The prepared plan can be used multiple times; for example, you could go on to check whether user bob is a superuser:

res = plpy.execute(plan, ["bob"])
print res[0]["usesuper"]

Caching prepared queries

Preparing the query can be quite an expensive step, especially for more complex queries where the optimizer has to choose from a rather large set of possible plans. So, it makes sense to re-use results of this step, if possible.

The current implementation of PL/Python does not automatically cache query plans (prepared queries), but you can do it yourself easily.

try:
    plan = SD['is_super_qplan']
except:
    SD['is_super_qplan'] = plpy.prepare("....
    plan = SD['is_super_qplan']
<the rest of the function>

The global dictionary SD is available to store data between function calls; it is private to each function. The global dictionary GD is shared by all Python functions within a session, so use it with care. The values in SD[] and GD[] only live for the duration of a single database session, so caching there makes sense only if you have long-lived connections.
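Putting the pieces together, a complete sketch of this caching pattern might look as follows (the function name is_superuser is ours; the query is the one used in the previous section):

CREATE OR REPLACE FUNCTION is_superuser(username text)
  RETURNS boolean
AS $$
    try:
        plan = SD['is_super_qplan']
    except KeyError:
        # first call in this session: prepare the plan and cache it
        plan = plpy.prepare(
            "select usesuper from pg_user where usename = $1", ["text"])
        SD['is_super_qplan'] = plan
    rv = plpy.execute(plan, [username])
    if rv.nrows() == 0:
        return None            # no such user
    return rv[0]["usesuper"]
$$ LANGUAGE plpythonu;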

Writing trigger functions in PL/Python

As with other PLs, PL/PythonU can be used to write trigger functions. A trigger function differs from an ordinary function only in that it is declared with the return type RETURNS TRIGGER. So, a simple trigger function that just notifies the caller that it was indeed called looks like this:

CREATE OR REPLACE FUNCTION notify_on_call()
  RETURNS TRIGGER
AS $$
plpy.notice('I was called!')
$$ LANGUAGE plpythonu;

After creating this function, we can test it by attaching it to a table as a trigger:

hannu=# CREATE TABLE ttable(id int);
CREATE TABLE
hannu=# CREATE TRIGGER ttable_notify BEFORE INSERT ON ttable EXECUTE PROCEDURE notify_on_call();
CREATE TRIGGER
hannu=# INSERT INTO ttable VALUES(1);
NOTICE:  I was called!
CONTEXT:  PL/Python function "notify_on_call"
INSERT 0 1

Of course, the preceding trigger function is quite useless, as will be any trigger that does not know when and on what data change it was called. All the data needed by a trigger when it is called is passed in via the trigger dictionary called TD. In TD, you have the following values:

  • TD["event"]: The event the trigger function is called for; one of the strings INSERT, UPDATE, DELETE, or TRUNCATE.
  • TD["when"]: One of BEFORE, AFTER, or INSTEAD OF.
  • TD["level"]: ROW or STATEMENT.
  • TD["old"]: The before-command image of the row. For row-level UPDATE and DELETE triggers, this contains a dictionary of the triggering row's values before the changes made by the command. It is None in other cases.
  • TD["new"]: The after-command image of the row. For row-level INSERT and UPDATE triggers, this contains a dictionary of the triggering row's values after the changes made by the command. It is None in other cases. If you are in a BEFORE or INSTEAD OF trigger, you can make changes to this dictionary and then signal PostgreSQL to use the changed tuple by returning the string MODIFY from the trigger function.
  • TD["name"]: The trigger name from the CREATE TRIGGER command.
  • TD["table_name"]: The name of the table on which the trigger occurred.
  • TD["table_schema"]: The schema of the table on which the trigger occurred.
  • TD["relid"]: The object identifier (OID) of the table on which the trigger occurred.
  • TD["args"]: If the CREATE TRIGGER command included arguments, they are available from TD["args"][0] to TD["args"][n-1].

In addition to doing anything you can do in ordinary PL/Python functions, such as modifying data in tables, writing to files and sockets, and sending e-mails, you can also affect the behavior of the triggering command.

If TD["when"] is ("BEFORE", "INSTEAD OF") and TD["level"] == "ROW", you can return SKIP to abort the event. Returning None or OK indicates that the row is unmodified and it is OK to continue. Returning None is also the default behavior for Python if the function does a simple return or runs to the end without a return statement, in which case, you don't need to do anything.

If you have modified values in TD["new"] and you want PostgreSQL to continue with the new values, return MODIFY to indicate to PL/Python that you've modified the new row. This only makes sense if TD["event"] is INSERT or UPDATE; otherwise, the return value is ignored.
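Here is a sketch of a row-level BEFORE trigger function that uses both return values (the function name and the assumption of a text column named data are ours; the test table created in the next section would fit):

CREATE OR REPLACE FUNCTION clean_data_trigger()
  RETURNS TRIGGER
AS $$
    # assumes a row-level BEFORE INSERT OR UPDATE trigger on a table
    # with a text column named "data"
    if TD["new"]["data"] is None:
        return "SKIP"              # silently drop rows with no data
    TD["new"]["data"] = TD["new"]["data"].strip()
    return "MODIFY"                # tell PostgreSQL to use the changed row
$$ LANGUAGE plpythonu;

Attached with CREATE TRIGGER ... BEFORE INSERT OR UPDATE ... FOR EACH ROW EXECUTE PROCEDURE clean_data_trigger();, it would either clean up or skip each incoming row before it is stored.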

Exploring the inputs of a trigger

The following trigger function is useful when developing triggers, so that you can easily see what the trigger function is really getting when called:

CREATE OR REPLACE FUNCTION explore_trigger()
  RETURNS TRIGGER 
AS $$
import pprint
nice_data = pprint.pformat(
  (
    ('TD["table_schema"]' , TD["table_schema"] ),
    ('TD["event"]'        , TD["event"] ),
    ('TD["when"]'         , TD["when"] ),
    ('TD["level"]'        , TD["level"] ),
    ('TD["old"]'          , TD["old"] ),
    ('TD["new"]'          , TD["new"] ),
    ('TD["name"]'         , TD["name"] ),
    ('TD["table_name"]'   , TD["table_name"] ),
    ('TD["relid"]'        , TD["relid"] ),
    ('TD["args"]'         , TD["args"] ),
  )
)
plpy.notice('explore_trigger:\n' + nice_data)
$$ LANGUAGE plpythonu;

This function formats all the data passed to the trigger in TD using pprint.pformat, and then sends it to the client as a standard NOTICE message using plpy.notice(). To test this out, we create a simple table and then put an AFTER … FOR EACH ROW … trigger using this function on that table:

CREATE TABLE test(
    id serial PRIMARY KEY,
    data text,
    ts timestamp DEFAULT clock_timestamp()
);

CREATE TRIGGER test_explore_trigger
 AFTER INSERT OR UPDATE OR DELETE ON test
   FOR EACH ROW
EXECUTE PROCEDURE explore_trigger('one', 2, null);

Now, we can explore what the trigger function actually gets:

hannu=# INSERT INTO test(id,data) VALUES(1, 'firstrowdata');
NOTICE:  explore_trigger:
(('TD["table_schema"]', 'public'),
 ('TD["event"]', 'INSERT'),
 ('TD["when"]', 'AFTER'),
 ('TD["level"]', 'ROW'),
 ('TD["old"]', None),
 ('TD["new"]',
  {'data': 'firstrowdata', 'id': 1, 'ts': '2013-05-13 12:04:03.676314'}),
 ('TD["name"]', 'test_explore_trigger'),
 ('TD["table_name"]', 'test'),
 ('TD["relid"]', '35163'),
 ('TD["args"]', ['one', '2', 'null']))
CONTEXT:  PL/Python function "explore_trigger"
INSERT 0 1

Most of this is expected and corresponds well to the TD dictionary values described in the previous table. What may be a little unexpected is the fact that the arguments given in the CREATE TRIGGER statement are all converted to strings, even the NULL. When developing your own triggers, either in PL/Python or any other language, it may be useful to put this trigger on the table as well, to check that the inputs to the trigger are as expected. For example, it is easy to see that if you omit the FOR EACH ROW part, TD['old'] and TD['new'] will both be None, as the trigger definition defaults to FOR EACH STATEMENT.

A log trigger

Now, we can put this knowledge to work and write a trigger that logs changes to the table either to a file or to a special log-collector process over the network. Logging to a file is the simplest way to permanently record changes even from transactions that are later rolled back; if the changes were logged to a log table instead, a ROLLBACK would also remove the log records. This may be a crucial audit requirement for your business.

Of course, this also has a downside. You will be logging the changes that may not be permanent due to the transaction being rolled back. Unfortunately, this is the price you have to pay for not losing the log records.

CREATE OR REPLACE FUNCTION log_trigger()
RETURNS TRIGGER AS $$
    args = tuple(TD["args"])
    if not SD.has_key(args):
        protocol = args[0]
        if protocol == 'udp':
            import socket
            sock = socket.socket( socket.AF_INET,
                                  socket.SOCK_DGRAM )
            def logfunc(msg, addr=args[1],
                             port=int(args[2]), sock=sock):
                sock.sendto(msg, (addr, port))
        elif protocol == 'file':
            f = open(args[1], 'a+')
            def logfunc(msg,f=f):
                f.write(msg + '\n')
                f.flush()
        else:
            raise ValueError, 'bad logdest in CREATE TRIGGER'
        SD[args] = logfunc
        SD['env_plan'] = plpy.prepare("""
             select clock_timestamp(),
                    txid_current(),
                    current_user,
                    current_database()""", [])
    logfunc = SD[args]
    env_info_row = plpy.execute(SD['env_plan'])[0]
    import json
    log_msg = json.dumps(
        {'txid' : env_info_row['txid_current'],
         'time' : env_info_row['clock_timestamp'],
         'user' : env_info_row['current_user'],
         'db'   : env_info_row['current_database'],
         'table' : '%s.%s' % (TD['table_schema'],
                              TD['table_name']),
         'event' : TD['event'],
         'old' : TD['old'],
         'new' : TD['new'],
        }
    )
    logfunc(log_msg)
$$ LANGUAGE plpythonu;

First, this trigger checks if it already has a logger function defined and cached in the function's local dictionary SD[]. As the same trigger may be used with many different log destinations, the log function is stored under the key constructed as a Python tuple from the trigger function arguments in the CREATE TRIGGER statement. We cannot use the TD["args"] list directly as a key, as Python dictionary keys have to be immutable, which a list is not, but a tuple is.

If the key is not present, meaning this is the first call to this particular trigger, we have to create an appropriate log function and store it. To do this, we examine the first argument for the log destination type.

For the udp log type, we create a UDP socket for writing. Then, we define a function passing in this socket and also the other two trigger arguments as default arguments for the function. This is the most convenient way to create a closure and to bundle a function with some data values in Python.

For the file type, we just open this file in the append mode (a+) and also create a log function. The log function writes a message to this file and flushes the write, so the data is written to the file immediately and not some time later when the write buffer fills up (this is not preferable for performance critical systems). The log function created in either of these cases is stored in SD[tuple(TD["args"])].

At this point, we also prepare and save a query plan for getting other data we want to log and save this in SD['env_plan']. Now that we are done with the one-time preparations, we can proceed with the actual logging part, which is really very simple.

Next, we retrieve the logging function (logfunc = SD[args]) and get the row of the other logged data:

env_info_row = plpy.execute(SD['env_plan'])[0]

Finally, we convert all the logged data into one JSON object (log_msg = json.dumps({...})) and then use the logging function to send it to the log, logfunc(log_msg).

And that's it.

Next, let's test it out to see how it works by adding another trigger to our test table we created earlier:

CREATE TRIGGER test_audit_trigger
 AFTER INSERT OR UPDATE OR DELETE ON test
   FOR EACH ROW
EXECUTE PROCEDURE log_trigger('file', '/tmp/test.json.log');

Any changes to the table done via INSERT, UPDATE, or DELETE are logged into /tmp/test.json.log. This file is initially owned by the user running the server, usually postgres; so, to look at it, you either need to be that user or the root user, or you have to change the permissions on the file to allow reading.

If you want to test the UDP logging part, you just have to define another trigger with different arguments:

CREATE TRIGGER test_audit_trigger_udp
   AFTER INSERT OR UPDATE OR DELETE ON test
   FOR EACH ROW
   EXECUTE PROCEDURE log_trigger('udp', 'localhost', 9999);

Of course, you need something to listen at the UDP port there. A minimalist UDP listener is provided for testing in the log_udp_listener.py file under chapter07/logtrigger/. Just run it, and it prints any UDP packets received to stdout.
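If you do not have the book's code at hand, a listener along these lines should work as well (our own sketch, not necessarily identical to log_udp_listener.py):

# minimal UDP listener: prints every datagram received on port 9999
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('localhost', 9999))
while True:
    msg, addr = sock.recvfrom(65535)
    print msg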

Constructing queries

PL/Python does a good job of managing values passed to prepared query plans, but a standard PostgreSQL query plan can take an argument in only a limited number of places; for example, you can't have an argument for a table name or a field name. Sometimes, therefore, you may want to construct whole queries, not just pass values to predefined queries.

So, how would you proceed if you want to construct a query from the function's arguments and be sure that everything is quoted properly and no SQL injection would be possible? PL/Python provides three functions to help you with proper quoting of identifiers and data, just for this purpose.

The function plpy.quote_ident(name) is meant for quoting identifiers, that is, anything that names a database object or its attribute, such as a table, a view, a field name, or a function name. It surrounds the name with double quotes and takes care of properly escaping anything inside the string that would break the quoting:

hannu=# DO LANGUAGE plpythonu $$ plpy.notice(plpy.quote_ident(r'5" "'))
$$;
NOTICE:  "5"" """
CONTEXT:  PL/Python anonymous code block
DO

And yes, 5" " is a legal table or field name in PostgreSQL; you just have to always quote it if you use it in any statement.

Note

The DO syntax creates an anonymous block inside your database session. It is a very handy way to run some procedural language code without needing to create a function.

The other two functions are for quoting literal values. The function plpy.quote_literal(litvalue) is for quoting strings, and plpy.quote_nullable(value_or_none) is for quoting a value that may be None. Both of these functions quote strings in a similar way, by enclosing them in single quotes (str becomes 'str') and doubling any single quotes or backslashes:

hannu=# DO LANGUAGE plpythonu $$ plpy.notice(plpy.quote_literal(r" ' "))
$$;
NOTICE:  E' '' '
CONTEXT:  PL/Python anonymous code block
DO

The only difference between these two is that plpy.quote_nullable() can also take the value None, which is rendered as the string NULL without any surrounding quotes. The argument to both of these has to be a string or a unicode string. If you want them to work with a value of any Python type, wrapping the value in str(value) usually works well.
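As a sketch of how these fit together when constructing a whole query (the function name count_rows is ours), the following counts the rows of any table whose name is passed in as an argument:

CREATE OR REPLACE FUNCTION count_rows(tablename text)
  RETURNS bigint
AS $$
    # quote_ident() makes the table name safe to embed in the query text
    query = "select count(*) as cnt from %s" % plpy.quote_ident(tablename)
    return plpy.execute(query)[0]["cnt"]
$$ LANGUAGE plpythonu;

Thanks to the quoting, even awkward table names such as 5" " are handled correctly.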

Handling exceptions

With any bit of code, you need to make sure you handle errors when they occur, and your PL/Python functions are no exception.

Before Version 9.1 of PostgreSQL, any error in an SQL query caused the surrounding transaction to be rolled back:

DO LANGUAGE plpythonu $$
plpy.execute('insert into ttable values(1)')
plpy.execute('fail!')
$$;
ERROR:  spiexceptions.SyntaxError: syntax error at or near "fail"
LINE 1: fail!
        ^
QUERY:  fail!
CONTEXT:  Traceback (most recent call last):
  PL/Python anonymous code block, line 3, in <module>
    plpy.execute('fail!')
PL/Python anonymous code block

You can manually use SAVEPOINT commands to control the boundaries of the rolled-back block, at least as far back as Version 8.4 of PostgreSQL. This will reduce the amount of the transaction that is rolled back:

CREATE OR REPLACE FUNCTION syntax_error_rollback_test() 
  RETURNS void
AS $$
plpy.execute('insert into ttable values(1)')
try:
   plpy.execute('SAVEPOINT foo;')
   plpy.execute('insert into ttable values(2)')
   plpy.execute('fail!')
except:
   pass
plpy.execute('insert into ttable values(3)')
$$ LANGUAGE plpythonu;

hannu=# SELECT syntax_error_rollback_test();
 syntax_error_rollback_test
---------------------------

(1 row)

When the SAVEPOINT foo; command is executed in PL/Python, an SQL error will not cause a full ROLLBACK but the equivalent of ROLLBACK TO SAVEPOINT foo;, so only the effects of the commands between the SAVEPOINT and the error are rolled back:

hannu=# SELECT * FROM ttable ;
 id 
----
  1
  3
(2 rows)

Starting in Version 9.1, there are two important changes in how PostgreSQL exceptions are handled. If no SAVEPOINT or subtransaction is used, each invocation of plpy.prepare() and plpy.execute() is run in its own subtransaction, so that an error will only roll back this subtransaction and not the whole surrounding transaction. Since using a separate subtransaction for each database interaction involves extra costs, and you may want to control the subtransaction boundaries anyway, a new Python context manager, plpy.subtransaction(), is provided.

For an explanation of Python's context managers, refer to http://docs.python.org/library/stdtypes.html#context-manager-types. Using the with statement in Python 2.6 or newer, you can wrap a group of database interactions in one subtransaction in a more Pythonic way:

hannu=# CREATE TABLE test_ex(i int);
CREATE TABLE
DO LANGUAGE plpythonu $$
plpy.execute('insert into test_ex values(1)')
try:
    with plpy.subtransaction():
        plpy.execute('insert into test_ex values(2)')
        plpy.execute('fail!')
except plpy.spiexceptions.SyntaxError:
    pass  # silently ignore, avoid doing this in prod. code
plpy.execute('insert into test_ex values(3)')
$$;
DO
hannu=# SELECT * FROM test_ex;
 i 
---
 1
 3
(2 rows)

Atomicity in Python

While the subtransactions manage data changes in the PostgreSQL database, the variables on the Python side of the fence live their separate lives. Python does not provide even single-statement-level atomicity, as demonstrated by the following:

>>> a = 1
>>> a[1] = a = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object does not support item assignment
>>> a
1
>>> a = a[1] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object does not support item assignment
>>> a
2

As you can see, even a single multi-assignment statement can be executed only halfway through. This means that you have to be prepared to fully manage your Python data yourself; plpy.subtransaction() won't help you in any way with managing Python variables.

Debugging PL/Python

First, let's start by stating that there is no debugger support when running functions in PL/Python; so, it is a good idea to develop and debug a PL/Python function as a pure Python function as much as possible and only do the final integration in PL/Python. To help with this, you can have a similar environment in your Python development environment using the plpy module.

Just put the module in your path and do import plpy before you try running your prospective PL/PythonU functions in an ordinary interpreter. If you use any of the plpy.execute(...) or plpy.prepare() functions, you also need to set up a database connection before using these, by calling plpy.connect(<connectstring>).

Using plpy.notice() to track the function's progress

The debugging technique I use most often in any language is printing out intermediate values as the function progresses. If the printout rolls past too fast, you can slow it down by sleeping for a second or two after each print.

In standard Python, it would look like this:

def fact(x):
    f = 1
    while (x > 0):
        f = f * x
        x = x - 1
        print 'f:%d, x:%d' % (f, x)
    return f

It will print out all intermediate values for f and x as it runs:

>>> fact(3)
f:3, x:2
f:6, x:1
f:6, x:0
6

If you try to use print in a PL/Python function, you will discover that nothing is printed. In fact, there is no single logical place to print to when running a pluggable language inside a PostgreSQL server.

The closest thing to print in PL/Python is the function plpy.notice(), which sends a PostgreSQL NOTICE to the client and also to the server log if log_min_messages is set to the value notice or smaller:

CREATE FUNCTION fact(x int) RETURNS int
AS $$
    global x
    f = 1
    while (x > 0):
        f = f * x
        x = x - 1
        plpy.notice('f:%d, x:%d' % (f, x))
    return f
$$ LANGUAGE plpythonu;

Running this is much more verbose than the version with print, because each NOTICE also includes information about the CONTEXT from where the NOTICE comes:

hannu=# SELECT fact(3);
NOTICE:  f:3, x:2
CONTEXT:  PL/Python function "fact"
NOTICE:  f:6, x:1
CONTEXT:  PL/Python function "fact"
NOTICE:  f:6, x:0
CONTEXT:  PL/Python function "fact"
 fact
------
    6
(1 row)

Tip

PL/PythonU function arguments are passed in as globals

If you compare the fact(x) function in Python and PL/Python, you will notice an extra line at the beginning of the PL/Python function:

    global x

This is needed to overcome an implementation detail that often surprises PL/PythonU developers: the function arguments are not function arguments in the Python sense, and neither are they locals. They are passed in as variables in the function's global scope.
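To see why the declaration matters, consider a hypothetical variant without it (the name fact_broken is ours): because x is assigned inside the body, Python treats x as a local name for the whole block, and the very first read of it fails with UnboundLocalError.

CREATE FUNCTION fact_broken(x int) RETURNS int
AS $$
    # no "global x": the assignment below makes x local to the block,
    # so the while condition raises UnboundLocalError at runtime
    f = 1
    while (x > 0):
        f = f * x
        x = x - 1
    return f
$$ LANGUAGE plpythonu;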

Using assert

Similar to ordinary Python programming, you can also use Python's assert statement to catch conditions which should not happen:

CREATE OR REPLACE FUNCTION fact(x int) 
  RETURNS int
AS $$
    global x
    assert x>=0, "argument must be a positive integer"
    f = 1
    while (x > 0):
        f = f * x
        x = x - 1
    return f
$$ LANGUAGE plpythonu;

To test this, call fact() with a negative number:

hannu=# SELECT fact(-1);
ERROR:  AssertionError: argument must be a positive integer
CONTEXT:  Traceback (most recent call last):
  PL/Python function "fact", line 3, in <module>
    assert x>=0, "argument must be a positive integer"
PL/Python function "fact"

You will get a message about AssertionError, together with the location of the failing line number.

Redirecting sys.stdout and sys.stderr

If all the code you need to debug is your own, the preceding two techniques will cover most of your needs. However, what do you do in cases where you use some third-party libraries that print out debug information to sys.stdout and/or sys.stderr?

Well, in those cases you can replace Python's sys.stdout and sys.stderr with your own pseudo file object that stores everything written there for later retrieval. Here is a pair of functions: the first captures sys.stdout (or restores the original if called with the argument do_capture set to false), and the second returns everything captured so far:

CREATE OR REPLACE FUNCTION capture_stdout(do_capture bool)
  RETURNS text
AS $$
    import sys
    if do_capture:
        try:
            sys.stdout = GD['stdout_to_notice']
        except KeyError:
            class WriteAsNotice:
                def __init__(self, old_stdout):
                    self.old_stdout = old_stdout
                    self.printed = []
                def write(self, s):
                    self.printed.append(s)
                def read(self):
                    text = ''.join(self.printed)
                    self.printed = []
                    return text
            GD['stdout_to_notice'] = WriteAsNotice(sys.stdout)
            sys.stdout = GD['stdout_to_notice']
        return "sys.stdout captured"
    else:
        sys.stdout = GD['stdout_to_notice'].old_stdout
        return "restored original sys.stdout"
$$ LANGUAGE plpythonu;

CREATE OR REPLACE FUNCTION read_stdout()
  RETURNS text
AS $$
    return GD['stdout_to_notice'].read()
$$ LANGUAGE plpythonu;

Here is a sample session using the preceding functions:

hannu=# SELECT capture_stdout(true);
   capture_stdout    
---------------------
 sys.stdout captured
(1 row)

DO LANGUAGE plpythonu $$
print 'TESTING sys.stdout CAPTURING'
import pprint
pprint.pprint( {'a':[1,2,3], 'b':[4,5,6]} )
$$;
DO
hannu=# SELECT read_stdout();
           read_stdout            
----------------------------------
 TESTING sys.stdout CAPTURING    +
 {'a': [1, 2, 3], 'b': [4, 5, 6]}+
 
(1 row)