Different types of replication

Now that you are fully armed with the basic understanding of physical and theoretical limitations, it is time to learn about different types of replication.

Synchronous versus asynchronous replication

The first distinction we can make is whether to replicate synchronously or asynchronously.

What does this mean? Let us assume we have two servers and we want to replicate data from one server (the master) to the second server (the slave). The following diagram illustrates the concept of synchronous and asynchronous replication:

Synchronous versus asynchronous replication

We could use a simple transaction like the one shown in the listing:

BEGIN:
INSERT INTO foo VALUES ('bar'),
COMMIT;

In the case of asynchronous replication, the data can be replicated after the transaction has been committed on the master. In other words, the slave is never ahead of the master, and in the case of writing, usually a little behind the master. This delay is called lag.

Synchronous replication enforces higher rules of consistency. If you decide to replicate synchronously (how this is done practically will be discussed in Chapter 5, Setting up Synchronous Replication), the system has to ensure that the data written by the transaction will be at least on two servers at the time the transaction commits. This implies that the slave does not lag behind the master and that the data seen by the end users will be identical on both the servers.

Tip

Some systems will also use a quorum server to decide. So, it is not always about just two or more servers. If a quorum is used, more than half of the servers must agree on action inside the cluster.

Understanding replication and data loss

When a transaction is replicated from a master to a slave, many things have to be taken into consideration, especially when it comes to things such as data loss.

Let us assume we are replicating data asynchronously in the following manner:

  1. A transaction is sent to the master.
  2. It commits on the master.
  3. The master dies before the commit is sent to the slave.
  4. The slave will never get this transaction.

In the case of asynchronous replication, there is a window (lag) during which data can essentially be lost. The size of this window might vary depending on the type of setup. Its size can be very short (maybe as short as a couple of milliseconds) or long (minutes, hours, or days). The important fact is that data can be lost. A small lag will only make data loss less likely, but, any lag larger than zero is susceptible to data loss.

If you want to make sure that data can never be lost, you have to switch to synchronous replication. As you have seen in this chapter already, a synchronous transaction is synchronous because it will only be valid if it commits to at least two servers.

Considering the performance issues

As you have learned in our section about the speed of light and latency, sending unnecessary messages over the network can be expensive and time consuming. If a transaction is replicated in a synchronous way, PostgreSQL has to make sure that data has reached the second node, and this will lead to latency issues.

Synchronous replication can be more expensive than asynchronous replication in many ways, and therefore people should think twice if this overhead is really needed and justified.

Tip

Only use synchronous replication when it is really needed.

Single-master versus multi-master replication

A second way to classify various replication setups is to distinguish between single- and multi-master replication.

Single master means that writes can go to exactly one server, which distributes the data to the slaves inside the setup. Slaves may only receive reads but no writes.

In contrast to single-master replication, multi-master replication allows writes to all the servers inside the cluster. The following diagram shows how things work on a conceptual level:

Single-master versus multi-master replication

Having the ability to write to any node inside the cluster sounds like an advantage but it is not necessarily one. The reason for that is multi-master replication adds a lot of complexity to the system. In the case of just one master, it is totally clear which data is correct, which direction data will flow, and there are rarely conflicts during replication. Multi-master replication is quite different, as writes can go to many nodes at the same time and the cluster has to be perfectly aware of conflicts and handle them gracefully. An alterative would be to use locks to solve the problem but this approach will have its own problems.

Tip

Keep in mind that the need to resolve conflicts will cause network traffic, and this can instantly turn into scalability issues caused by latency.

Logical versus physical replication

One more way to classify replication is to distinguish between logical and physical replication.

The difference is subtle but highly important: Physical replication means that the system will move data as is to the remote box. So, if something is inserted, the remote box will get data in binary format, not via SQL.

Logical replication means that a change, which is equivalent to the data coming in, is replicated.

Let us look at an example to fully understand the difference:

test=# CREATE TABLE t_test (t date);
CREATE TABLE
test=# INSERT INTO t_test VALUES (now()) 
RETURNING *;
t
------------
 2013-02-08
(1 row)

INSERT 0 1

We see two transactions going on here: The first transaction creates a table. Once this is done, the second transaction adds a simple date to the table and commits.

In the case of logical replication, the change will be sent to some sort of queue in logical form, so the system does not send plain SQL but maybe something such as follows:

test=# INSERT INTO t_test VALUES ('2013-02-08'),
INSERT 0 1

Note that the function call has been replaced with the real value. It would be a total disaster if the slave were to calculate now() once again because the date on the remote box might be a totally different one.

Tip

Some systems do use statement-based replication as the core technology. MySQL, for instance, uses a so called bin-log to replicate, which is actually not too binary but more some form of logical replication.

Physical replication will work in a totally different way: Instead of sending some SQL (or whatever) over, which is logically equivalent, the system will send binary changes made by PostgreSQL internally.

Here are some of the binary changes our two transactions might have triggered (but by far, not a complete list):

  1. Add an 8k block to pg_class and put a new record there (to indicate the table is present).
  2. Add rows to pg_attribute to store the column names.
  3. Perform various changes inside the indexes on those tables.
  4. Record the commit status, and so on.

The goal of physical replication is to create a copy of your system that is (largely) identical on the physical level. This means that the same data will be in the same place inside your tables on all boxes. In the case of logical replication, the content should be identical but it makes no difference if it is in the same place or not.

When to use physical replication

Physical replication is very convenient to use and especially easy to set up. It is widely used when the goal is to have identical replicas of your system (to have a backup or to simply scale up).

In many setups, physical replication is the standard method, which exposes the end user to the lowest complexity possible. It is ideal to scale out the data.

When to use logical replication

Logical replication is usually a little harder to set up but it offers greater flexibility. It is also especially important when it comes to upgrading an existing database. Physical replication is totally unsuitable for version jumps because you cannot simply rely on the fact that every version of PostgreSQL has the same on-disk layout. The storage format might change over time and therefore a binary copy is clearly not feasible to jump from one version to the next.

Logical replication allows de-coupling the way data is stored from the way it is transported and replicated. By using a neutral protocol, which is not bound to a certain version of PostgreSQL, it is easy to jump from one version to the next.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.46.92