Troubleshooting

This section provides some hints to solve the most common replication errors.

A slave does not start

When we execute START SLAVE, we do not receive any error if the slave cannot connect to a master. By executing SHOW SLAVE STATUS we will know what the I/O thread is doing.

If the replication does not start, or if it crashes, we will need to find out the reason and solve the problem. The following list of question can be used to find out the most common trivial problems:

  • Is the slave version equal to or minor than the master version?
  • Is the binary log enabled on the master?
  • Are the server IDs unique?
  • Is a variable name mistyped in the configuration file? Can the file be used to successfully start a standalone server?
  • Are the master's address, port, and login credentials correctly configured in the slave?
  • Is an account for the slave configured in the server?
  • Does the account have the REPLICATION SLAVE permission?
  • Is there a firewall that blocks the slave connections?
  • Are the replication filters correctly set?
  • Is the slave's disk full?

When a slave starts, it should not stop replicating unless it receives a STOP SLAVE statement. However, if the slave threads are not running, the slave probably encountered a replication error. Another possibility is that we hit a MariaDB bug—this is unlikely, but it is always possible. Anyway, we should start our investigation from the slave's error log. Then, using the mysqlbinlog utility against the latest relay log file, we should try to find out which statement caused the slave threads to crash.

A slave lags behind

Executing a SHOW BINARY LOGS statement on a master shows which binary log files currently exist. If there are too many files, at least one slave is lagging behind. It is impossible here to define "too many": the DBA knows how much delay he/she can tolerate, or even desires, but this strictly depends on the workload on the master servers.

If we determine that we have too many binary logfiles, we will want to find out which slaves are lagging behind. To do this, we can examine the output of SHOW SLAVE STATUS. We should look at the filenames shown in the Master_Log_File and Relay_Log_File columns.

The Master_Log_File column shows which binary logfile is being read by each slave's I/O thread. If a filename is too old, the I/O thread is lagging behind. This is not a common situation. Probably, the network is too slow for our workload. There could be a network problem that is worth being investigated. Anyway, using connection compression should solve the problem. We should also wonder whether we chose the best binary log format, because this factor affects the quantity of information that is sent through the network.

The Relay_Log_File column shows which relay logfile is being processed by the SQL file. Having an SQL thread which lags behind is a quite common situation. Before MariaDB 10.0, this problem was much harder to solve. Having only one thread that executed the same workload that the master spreads through several threads is less than ideal. MariaDB 10.0 parallel replication should help greatly.

If enabling parallel replication, the most common problem consists of poor query optimization. We should examine the master's slow query log and find the problems.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.51.246