Checking the replication for errors

Calculating checksums is the best way to be sure that a server and a slave contain exactly the same data as the server. This check can be used in two situations:

  • After loading data into a slave, to be sure that everything worked properly
  • On running servers, on a regular basis, or when we suspect that a replication error happened

The second case is more complicated, because during the normal execution slaves can lag behind their masters. However, a tool explained next is able to perform this check automatically, by waiting until slaves reach a certain binary log event.

There are at least three methods to do this:

  • Using the CHECKSUM TABLE statement
  • Using the Percona pt-table-checksum tool
  • Calculating a checksum of the physical files (only for physical backups)

Sometimes we only want to check a relatively small subset of data. If so, instead of checking the whole tables, we can write a query that returns that data and calculate the MD5 checksum of the result set.

The CHECKSUM TABLE statement

This statement returns the checksums for one or more tables. It has the following syntax:

CHECKSUM TABLE <table_list> [ QUICK | EXTENDED ]

The QUICK option only takes effect with MyISAM and Aria tables. These storage engines calculate a live checksum for the tables that have been created with the CHECKSUM or TABLE_CHECKSUM option set to 1. With the QUICK option, CHECKSUM TABLE returns the live value.

The EXTENDED option calculates a checksum of the table by reading each individual row. This can be very slow.

If no option is specified, QUICK takes effect.

Some storage engines do not support this statement. In this case, NULL is returned.

If the checksum is 0, the table is empty. Consider the following example:

MariaDB [test]> CHECKSUM TABLE customers, orders, products;
+----------------+------------+
| Table          | Checksum   |
+----------------+------------+
| test.customers |          0 |
| test.orders    | 2720624778 |
| test.products  | 3036305396 |
+----------------+------------+
3 rows in set (0.00 sec)

The pt-table-checksum tool

Similar to other tools that we have already discussed, pt-table-checksum is included in the Percona Toolkit suite. Its purpose is to report a reliable checksum and the number of rows in each table. It does this without slowing down the server too much, even with big databases.

The pt-table-checksum tool calculates a checksum for each table. Tables are never read with a long-running locking query. The pt-table-checksum tool uses relatively small queries to divide tables in smaller subsets. Based on the server's response times, pt-table-checksum composes queries that are not too heavy for its current workload. Moreover, it sets @@innodb_lock_wait to 1 second at session level, so that it disconnects when a table is locked by another session for a long time.

By default, pt-table-checksum also detects running slaves and connects to them to execute the checksums. The tool periodically executes a SHOW PROCESSLIST statement to monitor the connected slaves. If some of them lag behind the master or disconnect, pt-table-checksum waits until they recover. For this reason, this tool is probably the best way to periodically check the data integrity of the running servers.

After calculating the checksum for one table, the tool connects to the slaves and calculates the same checksum to verify that the tables are identical. Then, it prints out the checksum and the number of rows. It does not begin another table's checksum until this work is finished.

While pt-table-checksum has good fault tolerance, it could sometimes stop because of an error it cannot handle. In this case, it is possible to restart it with the --resume option specified, so that the work it already did is not lost.

Files checksum

Calculating file checksums is a good way to check that the copy of a physical backup worked properly, before starting a slave. Linux systems usually include the md5sum command, which reports the checksum of one or more files. For example:

root@this:/usr/local/mysql/data/open_fatture# md5sum --binary fornitori.frm fornitori.ibd
4582a1f51dea7980cb739b1a055d3ba7 *fornitori.frm
de8ffec76f0303d0c129a536e015e14d *fornitori.ibd

The --binary option informs md5sum that the specified files contain binary data by default, or if the --text option is specified. It treats the file contents as texts. This is only useful with CONNECT and CSV tables, and logfiles.

Query checksum

In some cases, we can write a query that only returns the recently inserted rows, and perhaps the ones that are modified recently. We can use such queries to quickly check that no error occurred while replicating recent data. To do this, we can use again the md5sum Linux program.

Here is an example:

root@this:/usr/local/mysql# bin/mysql -uroot -proot --execute="SELECT * FROM gest_pescara.orders WHERE o_time > NOW() - INTERVAL 2 DAY;" | md5sum
a50786099fb580c0dcb564323103bee2  -
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.2.225