Chapter 9. Replication

MariaDB supports built-in replication; it can be used for several purposes. The most common reason to build a replication environment is to increase data redundancy for improving the fault tolerance. Also, while replication does not replace a good backup plan, a slave data can sometimes be used as a backup for the master in case of data loss. Another use of replication is writing data into the master and spreading the queries through two or more slaves, to improve performance.

In the previous chapter, we discussed backups. Knowing this topic is very important now, because replication, just as with some backup types, is based on the binary log.

In this chapter we will learn:

  • How replication works in MariaDB
  • Setting up a master and a slave
  • Loading data into a slave or a new master
  • Configuring masters and slaves
  • Rotating replication logs
  • Checking the slaves' data integrity
  • Solving the most common replication problems

An overview of replication

MariaDB supports built-in replication. This feature is one of the most ancient and an advanced MariaDB feature. The first version of the code saw the light in MySQL 3.23.15, in May 2000. At that time, MySQL did not even include InnoDB, and did not support important features such as views or the UNION statement. Of course, the first version of replication was quite poor. Basically, a master just logged SQL statements and sent the log entries to the slaves. However, the age of this feature reveals how stable it is nowadays.

MariaDB replication is based on the binary log. The binary log keeps track of the events that modify the databases. The binary log supports three formats:

  • STATEMENT
  • ROW
  • MIXED

With the STATEMENT format, events are all SQL statements that do or could modify some data. With the ROW format, events are all modifications that occur as a consequence of such statements. The MIXED format records the statements when possible, but it can also record the modifications. Depending on the format, we commonly define the replication as statement-based or row-based. The binary log will be explained in detail in this chapter. It has already been mentioned in Chapter 8, Backup and Disaster Recovery.

MariaDB's built-in replication is called asynchronous replication. This means that there is no need for a permanent connection between the slaves and the master.

Note

It is possible to stop the replication at any moment to obtain a snapshot of the master's data. This can be done, for example, to perform a fast backup. When the slave is started again, it will receive all the events that occurred while the replication was not working. The same happens if the slave had crashed for some reason.

The slave can be queried by clients. Read-heavy workloads can greatly benefit from this feature. Connecting several slaves to one master allows us to distribute queries on the slaves.

Each slave can also be a master. For example, server A can be the master for server B, while B can also be the master for server C. If B crashes, replication to C will be temporarily stopped; however, if B loses data, C can be used as a backup. Also, in our example, C could be a master of A. This kind of configuration forms a ring, and it is called circular replication. It allows modifying or reading data on any server. Data consistency is guaranteed, because each modification will eventually be replicated by all servers in the ring. The main disadvantage is that none of the servers contain a version of the data that is always up-to-date.

MariaDB 10.0 supports multisource replication. This feature allows each slave to replicate data from multiple masters. There is no conflict handling in MariaDB. Thus, the master must contain different data. It is not possible to replicate the same database from two or more masters. Multisource replication allows you to use one machine, or a limited number of machines, to replicate data from several masters. The cost of the hardware can be reduced with this technique.

Ideally, in a replication environment all the masters and slaves should use the same MariaDB version. Replication from an older master to a newer slave does not work. For example, a 5.5 master cannot replicate to a 10.0 slave. Replication from a newer master to an older slave is generally supported, but it may cause problems. MySQL servers can be present in the replication topology.

How replication works

In this section, we will be provided with information on how replication is implemented in MariaDB. In particular, the read operation will learn which threads are used and which logs are kept. This is necessary before proceeding with the following section, which explains how to set up the master and slave servers in a replication environment, and how to maintain them.

Replication threads

In MariaDB replication, three kinds of threads are used, as shown in the following table:

Where it runs

Thread name

master

Binlog dump thread

slave

SQL I/O thread

slave

Slave SQL thread

The connections between each slave and a master are requested by the slaves. When a slave is started, it creates the SQL I/O thread. This thread connects to the master and requests events that must be replicated.

On the master, a Binlog dump thread runs. This thread is a daemon that accepts requests from the slave's SQL I/O threads and sends them the binary log events. In the output of SHOW SLAVE STATUS, this thread is called Slave_IO_running. The output of SHOW PROCESSLIST shows this thread as Binlog Dump.

The SQL I/O thread does not execute events directly. It just writes them in a log on the slave called the slave relay log.

The Slave SQL thread reads the relay log and executes the events in the database.

Parallel replication

Before MariaDB 10.0, only one slave SQL thread was started for each slave server. This could lead to poor performance, because normally a master executes the same write operations using several parallel threads. Sometimes a single thread is not sufficient to replicate the master's workload with acceptable performance.

In MariaDB 10.0, a feature called parallel replication has been introduced. Oracle introduced a similar feature in MySQL 5.6. However, MySQL users should note that MariaDB and MySQL use different implementations of parallel replication, configured in different ways. For example, the most important server variable for MariaDB parallel replication is @@slave_parallel_threads, that is not present in MySQL; and the most important server startup option for MySQL parallel replication is --slave-parallel-workers, which is not present in MariaDB.

Parallel replication consists of starting a pool of threads that is able to apply many events in a parallel way. Each thread in this pool is called a worker thread. Note that not all the events can be applied by parallel threads. MariaDB will still execute some operations sequentially, when this is necessary to correctly replicate the data.

This feature is optional and is not enabled by default. To use it, it is necessary to configure it on the master, by setting the @@slave_parallel_threads server variable. This value is the number of worker threads that will be started on each slave. As a consequence, all the slaves replicating data from the same master will have the same number of worker threads. Also, if a slave replicates multiple masters, the same number of worker threads must be configured on all the masters.

Slave logs

Slaves need to record information about the replication configuration and current progress. This information must not be lost, even in the event of a crash. So, each slave maintains three logs:

  • The relay log contains the events that were received by the master's binary logs. As explained previously, this log is written by the slave I/O thread and read by the slave SQL thread, or by pool worker threads if parallel replication is used.
  • The master log stores the information that is necessary to connect to the master, as well as the master's binary logs coordinates. The coordinates consist of a logfile name and the position of the last binary log event that has been received.
  • The relay log info log stores information about the last relay log event that has been applied by the slave SQL thread or the worker threads.

The relay log is always written into files. The master log information and the relay log information can be written in files or into system tables in the mysql database.

Even in multisource replication, each slave has only one log for each type.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.189.180.43