Procedures and indexing

In this section, you will learn the basic pitfalls of indexing and procedures. One of the most common issues is that people forget to define their procedures. In PostgreSQL, there are four types of procedures:

  • VOLATILE
  • STABLE
  • IMMUTABLE
  • [NOT] LEAKPROOF

A volatile procedure may return different outputs for the same input parameters within the same transaction:

test=# SELECT random(), random();
      random       |      random       
-------------------+-------------------
 0.906597905792296 | 0.368819046299905
(1 row)

The random() function is supposed to return different values all the time. This is the core purpose of a random generator. Logically, this has implications when it comes to indexing:

SELECT * FROM tab WHERE field = random();

Can PostgreSQL use an index here? The answer is no, because what will the engine look up in the B-tree? The value is supposed to change by definition for each row. Therefore, indexes are forbidden here. All existing indexes on the column will simply be ignored.

Tip

VOLATILE is a major source of performance problems because people are left wondering why no indexes are used.

STABLE tells PostgreSQL that this function is providing a more deterministic output than a VOLATILE function:

test=# BEGIN;
BEGIN
test=# SELECT now();
              now              
-------------------------------
 2014-10-15 11:04:51.555648+02
(1 row)

test=# SELECT now();
              now              
-------------------------------
 2014-10-15 11:04:51.555648+02
(1 row)

test=# COMMIT;
COMMIT

Within the same transaction, the function will always return the same output, given the same input. The now() function is a classic candidate here because within the same transaction, it will always return the same time (transaction time).

Tip

The clock_timestamp() function is the volatile counterpart of now() because it returns the real time.

IMMUTABLE is the most stable level a function can reach. It basically means that the result is always the same, regardless of the transaction:

test=# SELECT cos(10);
        cos         
--------------------
 -0.839071529076452
(1 row)

The cosine of 10 is always -0.839, regardless of transactional content, age of the programmer, mood of the customer, or the database system in use. It is a law of nature—a fact.

STABLE and IMMUTABLE functions can be indexed, like this:

SELECT * FROM tab WHERE field = now()

The previously mentioned query is a perfect candidate for using an index on a field. Because now()stays constant throughout the transaction, PostgreSQL actually knows what to look up in the index and will therefore consider indexes as an optimization. The same would apply to pi(), sin(), cos(), and so on.

Always define a function properly (if possible) to avoid performance-related problems:

test=# CREATE FUNCTION dummy(int) RETURNS int AS 
$$ 
  SELECT $1; 
$$ LANGUAGE 'sql' IMMUTABLE;
CREATE FUNCTION

LEAKPROOF versus NOT LEAKPROOF

LEAKPROOF is, unfortunately, a bit more complicated than the other three volatility types. However, it is as important as they are.

To understand LEAKPROOF, it is important to dig a little deeper into the working of the planner. Consider the following statement:

SELECT * FROM tab WHERE a = 10 AND b = 20;

Should the system filter on a or on b first? Well, it depends. If b promises a greater reduction in the amount of data returned, it is better to filter on b first to ensure that there is not much data left when the second filter on a is applied.

In other words, PostgreSQL is allowed to reorder those quals (restrictions) and decide on the best way to execute those conditions. Why is this relevant? Well, here is an example:

CREATE FUNCTION slow_func(int4) RETURNS int4 AS $$
        BEGIN
                EXECUTE 'SELECT pg_sleep(1)';
                RAISE NOTICE 'slow_func: %', $1;
                RETURN $1;
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE
        COST  10;
 
CREATE FUNCTION fast_func(int4) RETURNS int4 AS $$
        BEGIN
                RAISE NOTICE 'fast_func: %', $1;
                RETURN $1;
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE
        COST  100;

We have two functions; one is fast and the other is slow. To prove my point, I assigned more penalty points (costs) to the faster function. All that the two functions do is wait for some time to pass. Here is the demo data:

CREATE TABLE t_test (id int4); 
INSERT INTO t_test SELECT * FROM generate_series(1, 20);

Now a view is created. All it does is return even numbers and call the faster function to see whether it returns zero:

CREATE VIEW v AS 
        SELECT  *
        FROM    t_test
        WHERE   id % 2 = 0
                AND fast_func(id) = 0;

In the next step, the view is used and a filter is applied on top of the view:

SELECT * FROM v WHERE slow_func(id) = 0;

PostgreSQL is facing a couple of choices now: Which function should be called first, the fast one or the slow one? And in which order should the filters be applied?

Here is what happens:

	iming
SELECT * FROM v WHERE slow_func(id) = 0;
NOTICE:  slow_func: 2
NOTICE:  slow_func: 4
 *snip*
NOTICE:  slow_func: 20
 id
----
(0 rows) 
Time: 10012.580 ms

PostgreSQL will go for the slow function first because it promises to be faster due to lower costs.

Now the trouble is as follows: imagine you are not allowed to see the complete content of t_test. For some security reasons, you are allowed to see only even numbers. PostgreSQL is allowed to reorder those filters. Who says that id%2 and fast_func are executed first? What if PostgreSQL would decide on calling slow_func first? A simple RAISE NOTICE function would reveal the data that you are actually not allowed to see. In our case, PostgreSQL figures out that %2 should be the first operation, but if this was not there, there would be a major security problem.

This is exactly when LEAKPROOF comes into play. It actually tells the planner whether a function can be pushed through a so-called security barrier or not.

The next listing shows how a security barrier can be created:

CREATE VIEW v WITH (security_barrier) AS 
        SELECT  *
        FROM    t_test
        WHERE   id % 2 = 0 
                AND fast_func(id) = 0;

Just add WITH (security_barrier) to the definition of the view, and PostgreSQL won't be allowed anymore to push NOT LEAKPROOF functions to the view to reorder things. LEAKPROOF functions can be reordered freely.

If you are dealing with security-critical data or many different users and procedures, it is definitely important to think about LEAKPROOF as against NOT LEAKPROOF to avoid nasty surprises.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.179.85