Using the binary log for incremental backups

The binary log is a series of files that store the events that modify the data. It is used for incremental backups and for replication. The purpose of the binary log is being able to apply the changes again to a database. In the case of replication, its use is intuitive: the binary log events are sent by the master to the slaves, so that they can apply the same changes and always mirror the master's data (not necessarily immediately). Replication cannot work if the binary log is not enabled.

In the case of backups, the binary logs files are used as incremental backups. If a disaster happens, the data can be restored to the most recent and complete backup. It can be a physical or a logical backup. After that, the data will probably be a bit old. However, if the binary log is used, more recent incremental backups could exist. They contain changes that can be sent to the server, so that they are applied to the complete backup.

An example will make the procedure clearer. We have an online shop that uses MariaDB. Every day, the data changes: new products are sold or bought, new users register, the website traffic statistics are collected, and so on. We obviously need to back up the data frequently. But maybe the database is quite big and the website's traffic is high, so we do not want to slow down the website and use a great quantity of disks performing frequent and complete backups. We choose to perform a complete backup once a week, at the day and time when the traffic is statistically lower. But if the complete backup is taken on Sunday and the database gets corrupted on Saturday night, we do not want to lose all data changes that were made during the last week! So, we also want to take incremental backups every day when the traffic is lower. The most convenient way to do this is by rotating the binary log. Rotating the log means that a new logfile is created, and the currently used file remains in the same directory for archive purposes. For extra security, we generally also want to backup that file on a removable storage device. An old logfile that remains in the same directory will most probably be useful if the data gets corrupted. But what if the disk gets damaged? The old logs should be stored on more than one disk. Also, when a new, complete backup is performed, we will also delete archived logfiles from the disk to save space.

The binary logs events can be stored in three formats:

  • Statement-based
  • Row-based
  • Mixed

With the statement-based format, SQL statements are logged. This format has many limitations: if it used, some SQL statements or functions should be avoided because they can produce different results when reapplied. For example, the result of NOW() depends on the current date and time. With the row-based format, modifications to data are logged. The mixed format logs statements when they are safe; when they are not safe, the mixed format logs data modifications. Choosing the proper format is very important to implement an efficient replication, so the formats will be more thoroughly discussed in Chapter 9, Replication.

The binary log will also be more thoroughly described in Chapter 9, Replication. However, in this chapter we will discuss the most important concepts about the binary log.

To enable the binary log, we can start the server with the --log-bin startup option or by specifying log-bin in the configuration file. In both cases, it is possible to specify a file base name instead of simply setting the option to ON. The default base name is the server's hostname followed by -bin. The logfiles also have numerical extensions that can be used to order them.

Note

If the binary log is enabled, the @@log_bin server variable is ON. The @@log_bin_basename variable contains the current logfile's basename.

The SHOW MASTER STATUS statement shows the complete name of the current logfile. It works even if replication is not used. For example:

MariaDB [(none)]> SHOW MASTER STATUS;
+---------------+----------+--------------+------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------+----------+--------------+------------------+
| binlog.000114 |      323 |              |                  |
+---------------+----------+--------------+------------------+
1 row in set (0.00 sec)

The SHOW BINARY LOGS statement shows the current and the old binary logfiles:

MariaDB [(none)]> SHOW BINARY LOGS;
+---------------+-----------+
| Log_name      | File_size |
+---------------+-----------+
| binlog.000001 |       849 |
| binlog.000002 |       342 |

| binlog.000113 |       342 |
| binlog.000114 |       323 |
+---------------+-----------+
114 rows in set (0.01 sec)

The binary logfiles are written in a compact format that cannot be read by a human or by MariaDB. To examine the files, or apply the changes, we can use mysqlbinlog. This utility is included in all MariaDB distributions, and it translates a logfile into readable SQL statements.

The mysqlbinlog command can be used to send these statements to the mysql client so that it executes them:

../bin/mysqlbinlog binlog.000113 binlog.000114 | mysql -uroot -proot

Note that it is possible to translate multiple files with one invocation, just as in the previous example. The files must be specified in the correct order. Also, we should never send files to the server with multiple calls. This is because a file could try to use a temporary table that has been created in another file. If we use separate invocations, the temporary tables are lost at the end of the file execution and will not be available for the next file.

Sometimes, we may want to recover data that has been destroyed with a wrong statement, such as DELETE, DROP TABLE, or DROP DATABASE. In that case, we need to restore the last complete backup and apply the changes from the recent binary logfiles. But, we also need to avoid executing again the statement that destroyed important data. To do so, we can write the output of mysqlbinlog into a file, manually edit the file, and then send the statements to the server. For example:

root@this:/usr/local/mysql/data# ../bin/mysqlbinlog binlog.000113 > apply
root@this:/usr/local/mysql/data# ../bin/mysqlbinlog binlog.000114 >> apply
root@this:/usr/local/mysql/data# gedit apply
root@this:/usr/local/mysql/data# mysql -uroot -proot < apply

We can also select the events to be applied based on their datetimes. We can specify a start datetime, an end datetime, or both. For example:

../bin/mysqlbinlog binlog.000114 --start-datetime="2014-04-08 14:10:00" --stop-datetime="2014-04-08 15:01:30" | mysql -uroot -proot

This technique can be used to avoid executing a set of statements that delete important data. Generally, we need to examine the output of mysqlbinlog (perhaps written in a file) to be sure about the datetimes of the statements that we do not want to execute.

For many workloads, the datetimes are not precise enough. Since these timestamps do not have a sub-second precision, several events will probably occur at the same datetime, and we generally do not want to exclude all of them. Fortunately, each event has a number indicating its position in the binary log. By examining the output of mysqlbinlog, we can find the positions of the events that we do not want to execute. They are written as comments along with other metainformation:

#140408 15:26:14 server id 1  end_log_pos 694     Query    thread_id=4    exec_time=0    error_code=0
SET TIMESTAMP=1396963574/*!*/;
INSERT INTO myisam1 VALUES (3,3)
/*!*/;
# at 694

In this example, the INSERT statement is at position 694.

Adding the --start-datetime and --stop-datetime options to mysqlbinlog can help us find the exact positions of those statements. Then, we can specify the exact positions of the statements to skip, just as in the following example:

../bin/mysqlbinlog binlog.000114 --start-position= 25392 --stop-position=25399 | mysql -uroot -proot
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.171.212