Transaction isolation

In this section, you will be introduced to an important topic called transaction isolation. In my long career as a PostgreSQL database consultant, I have seen countless scenarios in which transaction isolation has caused enormous problems. Sometimes, people even desperately replaced their hardware to get around issues that were definitely not related to hardware.

Demonstrating read committed mode

So, what is transaction isolation? The idea behind transaction isolation is to give users a tool to control what they see inside a transaction. A report might have different requirements than a simple OLTP application, but let's not get lost in plain theory. Instead, let's look at a real example with some data:

test=# CREATE TABLE t_test (id int);
CREATE TABLE
test=# INSERT INTO t_test VALUES (4), (5);
INSERT 0 2

We've got two rows. Now let's see what happens if two transactions battle for the data at the same time:

User 1:                                 User 2:

BEGIN;

SELECT sum(id) FROM t_test;
 sum 
-----
   9
(1 row)

                                        INSERT INTO t_test VALUES (9);
                                        INSERT 0 1

SELECT sum(id) FROM t_test;
 sum 
-----
  18
(1 row)

COMMIT;

The first SELECT statement returns 9, which is the expected value. While the first transaction is still active, a second user adds more data. Most people now expect the second SELECT in the first transaction to return 9 again, but this is not the case. Just because user 1 is inside a transaction does not imply that their view of the data never changes. By default, PostgreSQL uses something called read committed mode: every time a new statement is started, the transaction will see all changes that have been committed by somebody else in the meantime.

Tip

Note that a statement that is already running never changes its view of the data just because somebody else commits a change. In read committed mode, the snapshot is taken at the beginning of each statement; once the statement is running, its snapshot stays as it is.
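
If you want to double-check which isolation level your session is currently using, you can simply ask PostgreSQL. The output below assumes a stock configuration in which the default has not been changed:

test=# SHOW transaction_isolation;
 transaction_isolation 
-----------------------
 read committed
(1 row)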

The fact that the result changes because somebody else commits can have serious implications. If a report consists of more than one query, it can mean that the data shown on page 5 no longer matches what is printed on page 29 of the report. Of course, such discrepancies don't show up all the time, but only sometimes, namely when somebody happens to commit a change in another session at just the wrong moment.

The support requests reaching us at Cybertec Schönig & Schönig GmbH often sound something like this: "Sometimes, PostgreSQL returns the wrong result. We have already updated the database, but sometimes there are still mistakes. Interestingly, this only happens on the live system. We have tested things on the development system and everything seems to work just fine. We even replaced the hardware of our production system because it really felt like broken RAM. The bug still exists, but it seems to be less likely now."

There are a couple of observations to be made about this little (fake) support request. Yes, the "problem" never shows up in the lab. The reason is that there are far fewer concurrent users in the lab than in production, so nobody changes data while you are testing. Upgrading the database does not help, because PostgreSQL is behaving exactly as designed. Replacing the hardware does not change things either, but it does explain why "the problem" became less likely: new hardware is usually faster, and the faster the hardware, the less likely it is that two transactions overlap in time, which reduces the odds of triggering the problem in the first place.

Using repeatable read

To ensure that a transaction does not change its view of the world, REPEATABLE READ mode can be used. Here is a table showing how it works:

User 1:                                 User 2:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN

SELECT sum(id) FROM t_test;
 sum 
-----
  18
(1 row)

                                        DELETE FROM t_test;
                                        DELETE 3

SELECT sum(id) FROM t_test;
 sum 
-----
  18
(1 row)

COMMIT;
In this case, the transaction sticks to the data it sees. The concurrent DELETE statement, which removes all data from the table, does not harm the report. In other words, side effects can be avoided by turning to the repeatable read mode.

Tip

People often ask whether repeatable read carries a performance penalty compared to read committed mode; it doesn't. Both isolation levels provide essentially the same throughput, so there is no performance reason to avoid repeatable read.

As mentioned before, repeatable read is perfect for all kinds of reports. However, it is not a silver bullet. Normal OLTP operations are still best done in read committed mode to ensure that the data seen is as up to date as possible.
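
If an entire session, such as a reporting connection, is supposed to use repeatable read throughout, the isolation level can also be configured per session instead of being repeated in every BEGIN statement. Here is a minimal sketch; the setting affects all transactions started afterward in this session:

test=# SET default_transaction_isolation TO 'repeatable read';
SET
test=# SHOW transaction_isolation;
 transaction_isolation 
-----------------------
 repeatable read
(1 row)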

Beyond repeatable read

There is more to transaction isolation than just read committed and repeatable read. The highest isolation level supported by PostgreSQL is serializable. The original idea behind serializable isolation, back in the early days of databases, was to execute transactions one after another so that they could not interact at all.

However, executing transactions strictly one after another is clearly not what you would expect from a modern relational database such as PostgreSQL. The problem with sequential execution is scalability: there is no way to scale out if a system executes one transaction at a time while everything else waits. In PostgreSQL, serializable mode is far more sophisticated. It behaves as if transactions were executed sequentially, but internally it runs them with as much parallelism as possible (the technique is known as Serializable Snapshot Isolation).

What is the purpose of a serializable transaction? Sometimes, freezing your snapshot is not enough. Consider the following example:

CREATE TABLE t_prison (guard text);
INSERT INTO t_prison VALUES ('Alice'), ('Bob');

Let's assume that you want to ensure that there is always at least one guard around (for the sake of the example, no primary key is allowed). Now, a transaction may look like this:

  BEGIN;
  SELECT count(*) FROM t_prison;
  -- the application checks the count; if more than one guard
  -- is present, it removes one of them, for example:
  DELETE FROM t_prison WHERE guard = 'Bob';
  COMMIT;

What happens if two folks execute this at the same time? Both would see that two people are in the table, and each of them would decide to delete one. If we are unlucky, one transaction removes Bob while the other removes Alice. Voilà! All the doors of the prison are wide open. It does not matter whether repeatable read or read committed is used: both sessions can check the table's contents at the same time, see the same data, and therefore come to the same conclusion.

The solution to this kind of problem is serializable mode. It ensures that only transactions that are independent of each other can run in parallel. Transactions that involve the same data (or the same predicates) will conflict with each other, and PostgreSQL will abort one of them with a serialization error.
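
To see this in action, both users can run the guard transaction in serializable mode. The following is a sketch that assumes the two guards inserted above; the exact error text can vary slightly between PostgreSQL versions, but whichever transaction commits last is typically the one that gets aborted:

User 1:                                 User 2:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

                                        BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT count(*) FROM t_prison;

                                        SELECT count(*) FROM t_prison;

DELETE FROM t_prison WHERE guard = 'Bob';

                                        DELETE FROM t_prison WHERE guard = 'Alice';

COMMIT;

                                        COMMIT;
                                        ERROR:  could not serialize access due to
                                        read/write dependencies among transactions
                                        HINT:  The transaction might succeed if retried.

Instead of silently ending up with zero guards, PostgreSQL aborts one of the two transactions; the failed one can simply be run again and will then see that only one guard is left.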

The beauty of this is that you can basically get around a table lock, thus improving scalability. However, your application will need a retry loop, because transactions that are aborted due to a serialization conflict (SQLSTATE 40001) simply have to be run again.
