The internal structure of the XLOG

We will use the transaction log throughout this book, and to give you a deeper insight into how things work on a technical level, we have added this section dealing exclusively with the internal workings of the XLOG machinery. We will avoid going down to the C level, as this would be way beyond the scope of this book, but we will provide you with insights that are hopefully deep enough.

Understanding the XLOG records

Changes made to the XLOG are record-based. What does that mean? Let us assume you are adding a row to a table:

test=# INSERT INTO t_test VALUES (1, 'hans');
INSERT 0 1

In this example, we are inserting into a table containing two columns. For the sake of this example, we assume that both columns are indexed.

Remember what we learned before: the purpose of the XLOG is to keep those data files safe. So, this operation will trigger a series of XLOG entries. First, the entries related to the table's data file(s) will be written. Then, the entries related to the indexes will be created. Finally, a COMMIT record is sent to the log.

Not all XLOG records are equal: various types of XLOG records exist (for example, heap, btree, clog, storage, gin, and standby records, to name just a few).

XLOG records are chained backwards. So, each entry points to the previous entry in the file. This way, we can be perfectly sure that we have found the end of a record as soon as we have found the pointer to the previous entry.
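The backward chain can be sketched in a few lines of Python. This is a toy model, not PostgreSQL's actual on-disk record format: each record simply stores the offset of its predecessor, so a reader can walk from the newest record back to the oldest and always knows where one record ends and the previous one begins.

```python
# Toy model of backward-chained log records: each record remembers
# the offset of the previous record (None for the very first one).

records = []   # our toy "XLOG file"
offset = 0     # byte position of the next record
prev = None    # offset of the previously written record

def append(payload: bytes):
    """Append a record that points back to its predecessor."""
    global offset, prev
    records.append({"offset": offset, "prev": prev, "payload": payload})
    prev = offset
    offset += len(payload)

append(b"heap insert")
append(b"btree insert")
append(b"commit")

# Walk the chain backwards, newest record first:
rec = records[-1]
chain = []
while rec is not None:
    chain.append(rec["payload"])
    rec = next((r for r in records if r["offset"] == rec["prev"]), None)

print(chain)  # newest to oldest
```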

Making the XLOG deterministic

As you can see, a single change can trigger a large number of XLOG entries. This is true for all kinds of statements; a large DELETE statement, for instance, can easily cause a million changes. The reason is that PostgreSQL cannot simply put the SQL itself into the log; it really has to log the physical changes made to the table.

Why is that? Just consider the following example:

test=# DELETE FROM t_test WHERE id > random();
DELETE 5335

The random() function can produce a different output every time it is called; therefore, we cannot simply put the SQL into the log, because it is not guaranteed to provide us with the same outcome when it is executed during replay.
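This can be illustrated with a small, entirely hypothetical Python model (it does not resemble how PostgreSQL stores anything): replaying the recorded row changes is deterministic, while re-running the original statement with its call to random() would not be.

```python
import random

# A toy table of ten rows, keyed by id.
table = {i: f"row {i}" for i in range(10)}

# Original execution of something like
#   DELETE FROM t_test WHERE id > random() * 10;
# We record WHICH rows were actually deleted.
threshold = random.random() * 10
deleted_ids = [i for i in table if i > threshold]

# "Physical" replay on a replica: re-delete exactly those rows.
# This is deterministic no matter how often we replay it.
replica = dict(table)
for i in deleted_ids:
    del replica[i]

# Replaying the SQL text instead would call random() again and
# could delete a different set of rows -- which is why the log
# must contain the physical changes, not the statement.
```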

Making the XLOG reliable

The XLOG itself is one of the most critical and sensitive parts in the entire database instance. Therefore we have to take special care to make sure that everything possible is done to protect it. In the case of a crash, a database instance is usually doomed if there is no XLOG around.

Internally, PostgreSQL takes some special precautions to handle the XLOG:

  • Using CRC32 checksums
  • Disabling signals
  • Space allocation

First of all, each XLOG record contains a CRC32 checksum. This allows us to check the integrity of the log at startup. It is perfectly feasible that the last write operations before a crash were no longer sound, and a checksum can therefore help to sort out problems straight away. The checksum is computed automatically by PostgreSQL; users don't have to take care of this feature explicitly.
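The idea behind per-record checksums can be sketched with Python's standard zlib.crc32 function. PostgreSQL's actual record layout and CRC computation differ; this only shows the principle: store a checksum next to each record, and refuse to trust a record whose checksum does not match during recovery.

```python
import zlib

def write_record(payload: bytes) -> tuple[int, bytes]:
    """Store a CRC32 checksum alongside the record payload."""
    return zlib.crc32(payload), payload

def read_record(crc: int, payload: bytes) -> bytes:
    """Verify the checksum before trusting the record."""
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch: record is damaged")
    return payload

crc, payload = write_record(b"heap insert: rel 16394, tid 0/1")
assert read_record(crc, payload) == payload

# A torn or partial write is detected immediately; recovery would
# treat this point as the end of the valid log.
damaged = payload[:-1] + b"X"
try:
    read_record(crc, damaged)
    detected = False
except ValueError:
    detected = True
```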

In addition to checksums, PostgreSQL temporarily disables signals while writing to the XLOG. This adds an extra level of safety and reduces the risk of corner-case problems.

Finally, PostgreSQL uses a fixed-size XLOG. The size of the XLOG is determined by checkpoint_segments as well as by checkpoint_completion_target.

The size of the PostgreSQL transaction log is calculated as follows:

(2 + checkpoint_completion_target) * checkpoint_segments + 1

An alternative way to calculate the size is:

checkpoint_segments + wal_keep_segments + 1 files
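Using hypothetical settings, the two formulas above can be turned into an actual size in megabytes. Each XLOG segment file is 16 MB by default, and the formulas yield a number of segment files:

```python
# Hypothetical configuration values, for illustration only.
checkpoint_segments = 3
checkpoint_completion_target = 0.5
wal_keep_segments = 0
SEGMENT_SIZE_MB = 16  # default XLOG segment file size

# First formula: (2 + checkpoint_completion_target) * checkpoint_segments + 1
files = (2 + checkpoint_completion_target) * checkpoint_segments + 1
size_mb = files * SEGMENT_SIZE_MB        # 8.5 files -> 136 MB

# Alternative estimate: checkpoint_segments + wal_keep_segments + 1 files
alt_files = checkpoint_segments + wal_keep_segments + 1
alt_size_mb = alt_files * SEGMENT_SIZE_MB  # 4 files -> 64 MB
```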

The important point is that something of a fixed size can rarely run out of space.

Tip

In the case of transaction-log-based replication, we can run out of space on the XLOG directory if the transaction log cannot be archived.

You can learn more about this topic in the next chapter.

LSNs and shared buffer interaction

If you want to replay the transaction log, you have to make sure that you do so in the correct order; it would be a disaster if a row was deleted before it actually came into existence. Therefore, the XLOG provides you with the order of all the changes. Internally, this order is reflected through the Log Sequence Number (LSN). The LSN is essential to the XLOG. Each XLOG entry is assigned an LSN straight away.

In one of the previous sections, we have discussed consistency level. With synchronous_commit set to off, a client will get an okay even if the XLOG record has not been flushed to disk yet. Still, since a change must be reflected in cache and since the XLOG must be written before the data table, the system has to make sure that not all the blocks in the shared buffer can be written out instantly. The LSN will guarantee that we can only write blocks from the shared buffer to the data file if the corresponding change has already made it to the XLOG. Writing to the XLOG is fundamental and a violation of this rule would certainly lead to problems after a crash.
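This "WAL before data" rule can be sketched as follows. The names are hypothetical and this is a toy model, not PostgreSQL's buffer manager: a dirty page in the shared buffer may only be written to the data file once the XLOG up to that page's LSN has been flushed to disk.

```python
flushed_wal_lsn = 0  # how far the XLOG has been flushed to disk

def flush_wal(up_to_lsn: int):
    """Flush the XLOG up to (at least) the given position."""
    global flushed_wal_lsn
    flushed_wal_lsn = max(flushed_wal_lsn, up_to_lsn)

def can_write_page(page_lsn: int) -> bool:
    # page_lsn is the position of the last XLOG record that modified
    # the page; that record must already be durable on disk before
    # the page itself may leave the shared buffer.
    return page_lsn <= flushed_wal_lsn

dirty_page_lsn = 1_000
assert not can_write_page(dirty_page_lsn)  # XLOG not flushed yet
flush_wal(1_000)
assert can_write_page(dirty_page_lsn)      # now the page may be written
```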

Debugging the XLOG and putting it all together

Now that we have seen how the XLOG basically works, we can put it all together and actually look into the XLOG. As of PostgreSQL 9.2, this works as follows: we have to compile PostgreSQL from source. Before we do that, we should modify the file located at src/include/pg_config_manual.h. At around line 250, we can uncomment WAL_DEBUG and compile as usual. This then allows us to set a client variable called wal_debug:

test=# SET client_min_messages TO log;
SET
test=# SET wal_debug TO on;
SET

In addition to that, we have to set client_min_messages to make sure that LOG messages will reach our client.

We are using the following table structure for our test:

test=# \d t_test
     Table "public.t_test"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | 
 name   | text    | 
Indexes:
    "idx_id" btree (id)
    "idx_name" btree (name)

If PostgreSQL has been compiled properly (and only then), we will see some information about the XLOG on the screen:

test=# INSERT INTO t_test VALUES (1, 'hans');
LOG:  INSERT @ 0/17C4680: prev 0/17C4620; xid 1009; len 36: Heap - insert(init): rel 1663/16384/16394; tid 0/1
LOG:  INSERT @ 0/17C46C8: prev 0/17C4680; xid 1009; len 20: Btree - newroot: rel 1663/16384/16397; root 1 lev 0
LOG:  INSERT @ 0/17C4700: prev 0/17C46C8; xid 1009; len 34: Btree - insert: rel 1663/16384/16397; tid 1/1
LOG:  INSERT @ 0/17C4748: prev 0/17C4700; xid 1009; len 20: Btree - newroot: rel 1663/16384/16398; root 1 lev 0
LOG:  INSERT @ 0/17C4780: prev 0/17C4748; xid 1009; len 34: Btree - insert: rel 1663/16384/16398; tid 1/1
LOG:  INSERT @ 0/17C47C8: prev 0/17C4780; xid 1009; len 12: Transaction - commit: 2013-02-25 18:20:46.633599+01
LOG:  XLOG flush request 0/17C47F8; write 0/0; flush 0/0

Just as stated in this chapter, PostgreSQL will first add a row to the table itself (heap). Then the XLOG contains all entries that are index related. Finally, a commit record is added.

Altogether, 156 bytes have made it to the XLOG; this is far more than the data we have actually added. Consistency, performance (indexes), and reliability come with a price tag.
