Chapter 5. Getting Transactions and Locking Right

Transactions are a core technique of every professional relational database system. In fact, it is pretty hard to imagine a world without transactions these days. Atomic deletions, proper locking, and all the functionalities provided by a modern relational system are simple and expected from a modern system, and many applications rely on them. Therefore, this chapter is all about transactions and locking.

The following topics will be covered:

  • The PostgreSQL transaction model
  • Basic locking
  • FOR UPDATE, FOR SHARE, and NOWAIT
  • Locking tables
  • Understanding transaction isolation levels
  • Indexes and foreign keys
  • Transactions and sequences

The PostgreSQL transaction model

The PostgreSQL transaction model differs quite a lot from other systems. In many cases, these differences are a root source of trouble. In this section, some basic examples have been included to make you aware of the way PostgreSQL works.

The first thing to take into account is that a transaction block has to be started with BEGIN statement and finished with a simple COMMIT statement All statements inside the transaction block have to be correct:

test=# BEGIN;
BEGIN
test=# SELECT now();
              now              
-------------------------------
 2014-09-22 13:27:24.952578+02
(1 row)

test=# SELECT now();
              now              
-------------------------------
 2014-09-22 13:27:24.952578+02
(1 row)

test=# COMMIT;
COMMIT

Note that now() will return the transaction time. Therefore it is always the same, regardless of how often it is used. Many people like to use DEFAULT now() in their column definitions. Always keep in mind that DEFAULT now() sets the column to transaction time, and it stays constant even in a long transaction. This is a handy way to determine whether all rows have been written in the same transaction.

The second important thing to note is that a transaction must be free of mistakes:

test=# BEGIN;
BEGIN
test=# SELECT 1;
 ?column? 
----------
        1
(1 row)

test=# SELECT 1 / 0;
ERROR:  division by zero
test=# SELECT 1;
ERROR:  current transaction is aborted, commands 
  ignored until end of transaction block
test=# SELECT 1;
ERROR:  current transaction is aborted, commands 
  ignored until end of transaction block
test=# COMMIT;
ROLLBACK

In this example, an error occurs after a successful start of the transaction. The main issue here is that the transaction will never recover, even if properly working statements are issued after the mistake! Remember that a transaction has to be correct from the beginning to the end. Otherwise, all commands after the error will be ignored by the system.

Understanding savepoints

If the conditions we just mentioned cannot be achieved, it is wise to consider using savepoints. Basically, a savepoint is a mechanism used to jump back inside a transaction. It can be used effectively to avoid errors and to ensure success. The following example shows how it works:

test=# BEGIN;
BEGIN
test=# SELECT 1;
 ?column? 
----------
        1
(1 row)

test=# SAVEPOINT s1;
SAVEPOINT
test=# SELECT 1 / 0;
ERROR:  division by zero
test=# ROLLBACK TO SAVEPOINT s1;
ROLLBACK
test=# COMMIT;
COMMIT

After the initial SELECT statement, a savepoint is created. Note that the savepoint, as the name suggests, is used later on to identify the spot to return to. After an error occurs, it is still possible to jump to a specific savepoint. The main advantage is that the transaction can be committed nicely in this case.

Tip

Keep in mind that a savepoint is only valid inside a transaction. It does not exist after a transaction has committed or rolled back.

Understanding basic locking and deadlocks

Now that you have learned about the basic PostgreSQL model followed by the PostgreSQL system, it is time to dig into locking. In many cases, locking turns out to be a bottleneck. So, it is definitely worth inspecting things.

To get started, a simple demo table can be created like this:

test=# CREATE TABLE t_test AS SELECT 1 AS id;
SELECT 1

The table contains only one row, which is enough to demonstrate some basic concepts of locking. In the first example, the goal is to demonstrate what happens when two users try to increment the value in the table at the same time:

User 1

User 2

BEGIN;

 
 

BEGIN;

UPDATE t_test SET id = id +1 
     RETURNING *;

 
 

UPDATE t_test SET id = id +1 
     RETURNING *;

 

wait ...

COMMIT;

wait ...

 

COMMIT;

The goal is to allow two users to increment the value in the table at the same time. The important thing to observe is that the second UPDATE statement has to wait until the first UPDATE statement has committed. It is important to mention that we are talking about a row level lock and not a table lock here. PostgreSQL only locks those rows affected by the UPDATE statement, ensuring that other people are not facing serious bottlenecks.

The second important observation is the result; it is ensured that the outcome at the end of the example is always 3. As soon as the first transaction commits, the second UPDATE rereads the now-committed row and increments it. While this may seem like a logical issue, it is not widely known to many people in spite of its wide implications.

Let's expand the example a little:

test=# INSERT INTO t_test VALUES (5);
INSERT 0 1
test=# SELECT * FROM t_test;
 id 
----
  3
  5
(2 rows)

Let's perform two updates now:

User 1:

User 2:

BEGIN;

BEGIN;

UPDATE t_test SET id = 4 
   WHERE id = 3;

UPDATE t_test SET id = 6 
    WHERE id = 4;

 

UPDATE t_test SET id = 4 
   WHERE id = 3;

UPDATE t_test SET id = 6 
   WHERE id = 5;

 

… deadlock detected ...

 
 

COMMIT;

The first UPDATE statement of both the users works as expected. The second UPDATE statement of User 2 has to wait for User 1 because the same row is changed. The important part, however, is the final UPDATE. Both users wait on each other, so nobody can win this game. Therefore, PostgreSQL steps in and resolves the hopeless situation. The exact error message is as follows:

test=# UPDATE t_test SET id = 6 WHERE id = 5;
ERROR:  deadlock detected
DETAIL:  Process 27988 waits for ShareLock on
   transaction 1131992; blocked by process 28084.
Process 28084 waits for ShareLock on transaction
   1131990; blocked by process 27988.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (0,5) in relation
  "t_test"

The deadlock is resolved automatically. All that the user has to do is to catch the error and try again.

Tip

Many users think that a deadlock is something nasty, horribly dangerous, and disastrous. Actually, this is not the case. A deadlock is a totally natural thing, and it simply happens once in a while. Don't panic! Instead, think logically about the reasons for it.

How can deadlocks be avoided? Actually, there is no general rule. Personally, I found it useful to try to update data in a sorted order. In many cases, this works well and cures the reasons for a deadlock (which is out of order changes). Besides this, there is not much that can be done to get around this issue.

When talking about deadlocks, many people instantly ask about a PostgreSQL parameter called deadlock_timeout:

test=# SHOW deadlock_timeout;
 deadlock_timeout 
------------------
 1s
(1 row)

There is a general misconception regarding this parameter in the heads of many. The parameter actually tells us how long the system waits before checking for a deadlock. Deadlock detection is fairly expensive, and therefore, PostgreSQL waits for one second before it initializes the check.

Locking in FOR UPDATE mode

In this section you will be introduced to a problem that has bugged generations of database developers—wrong or missing locking.

Let's assume you are reading some data and you definitely intend to change it later on, once you have read it. But what if the data was already gone by the time you were performing your UPDATE? I assume it would not do you any good.

In PostgreSQL, reading can occur concurrently, and many people can read the same piece of data at the same time without interfering with each other. Therefore, reading does not provide sufficient protection against other folk who are planning to modify data as well. The result can be nasty. Consider the following example:

BEGIN;
SELECT * FROM tab WHERE foo = bar;
UPDATE tab SET foo = "value made in the application";
COMMIT;

A transaction like what is shown here is not safe. The reason is simple: What if people execute the same thing concurrently? It may happen that they overwrite each other's changes instantly. Clearly, this would cause major issues.

To prevent this from happening, SELECT … FOR UPDATE comes to the rescue.

Consider the following example:

User 1:

User 2:

BEGIN;

BEGIN;

SELECT * FROM t_test 
   WHERE 1 > 0 FOR UPDATE;

 

… do some work ...

SELECT * FROM t_test 
   WHERE 1 > 0 FOR UPDATE;

… do some work ...

wait ...

COMMIT;

 
 

… returns latest data ...

 

… do some work and commit ...

In this example, SELECT … FOR UPDATE locks the rows returned by the query, just like UPDATE would've locked those rows. The beauty here is that the second transaction has to wait until the first transaction either issues a COMMIT statement or exits. The advantage here is that the second transaction can already build on the changes of the first query, ensuring that nothing is lost.

Avoiding performance bottlenecks

However, if used carelessly, SELECT … FOR UPDATE can cause serious performance issues. Consider the following example:

SELECT … FROM a, b, c, d, e, f, g WHERE … FOR UPDATE;

Let's assume all you want to do is to modify a. The rest of the tables are just used as lookup tables. PostgreSQL has no way to figure out that you are planning to modify just one table. Therefore, it has to lock all rows returned by the query, in all tables. While this is a necessary thing for PostgreSQL to do, it is likely to cause nasty contentions involving lookup tables.

Tip

Contention means you may witness any of the following: comparatively low CPU usage, little disk wait, bad performance, and so on. If the CPU is not working at its top speed and if nothing is moving forward, you are most likely facing contentions caused by locking somewhere.

There are two approaches to fix this problem. The first is to use SELECT … FOR UPDATE NOWAIT. This will make a query stop if a lock cannot be obtained. It can come in handy if you want to avoid waiting indefinitely. The second option is even more sophisticated. Consider the following example:

SELECT … FROM a, b, c, d, e, f, g WHERE … 
  FOR UPDATE OF a, b;

In this case, PostgreSQL knows that only two tables will most likely be updated in the next couple of steps, which will improve locking substantially.

Avoiding table locks

In some cases, SELECT … FOR UPDATE is not enough, and locking an entire table is necessary. To create a table lock, PostgreSQL provides a simple command:

test=# h LOCK
Command:     LOCK
Description: lock a table
Syntax:
LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

    ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE 
  | SHARE UPDATE EXCLUSIVE | SHARE 
  | SHARE ROW EXCLUSIVE | EXCLUSIVE 
  | ACCESS EXCLUSIVE

Note that PostgreSQL has eight lock levels here, ranging from ACCESS SHARE all the way up to ACCESS EXCLUSIVE. These allow you to define locking in a very fine-grained way. Suppose you want to make sure that you are the only person allowed to read and write a table. Then the lock mode needed is ACCESS EXCLUSIVE. It perfectly ensures that nobody else can even look at that table. If nobody is supposed to modify a table but reading is perfectly fine, EXCLUSIVE is the option of choice.

On the other end of the spectrum, there is a lock held by a simple read called ACCESS SHARE. It can nicely coexist with other reads and does not harm others. It only conflicts with ACCESS EXCLUSIVE, which is needed by DROP TABLE and similar commands at http://www.postgresql.org/docs/9.4/static.

Locking a table is sometimes necessary, but it should never be done in a careless way because it can have nasty side effects and block other transactions.

Tip

More information about locking and a detailed overview of all potential conflicts can be found at:

http://www.postgresql.org/docs/9.4/static/explicit-locking.html.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.149.250.11