Complete and incremental backups

If we work with a considerable amount of data, performing a backup can take a long time; during this period, the tables may be locked, depending on the backup method. The backup may also need a lot of space. To reduce the backup time and the required space, we can use incremental backups (sometimes also called partial backups). An incremental backup is a copy of the changes that were made to the data since a known point in time, usually the time of the previous backup.

Of course, we don't want to restore the data by applying all the incremental backups that have been performed since the server was started for the first time! Such an operation is theoretically possible, but it would be slow, require a lot of space for backups, and be error-prone. Thus, regular complete backups are still necessary.

A mix of complete and incremental backups is usually a good strategy. For example, we can take a complete backup once a week and an incremental backup each night. To restore the data after a disaster, we restore the most recent complete backup and then apply all subsequent incremental backups (if any) in chronological order.
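The restore order described above can be sketched in a few lines of Python. This is only an illustration, not a feature of MariaDB or of any backup tool: the `Backup` class, its `kind` labels, and the `restore_chain` function are hypothetical names, and we assume that each backup is recorded with the time at which it was taken.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "complete" or "incremental" (hypothetical labels)


def restore_chain(backups: List[Backup]) -> List[Backup]:
    """Return the backups to restore, in the order they must be applied."""
    # Sort by time so that the input order does not matter.
    ordered = sorted(backups, key=lambda b: b.taken_at)
    complete_positions = [
        i for i, b in enumerate(ordered) if b.kind == "complete"
    ]
    if not complete_positions:
        raise ValueError("no complete backup available; cannot restore")
    # Start from the most recent complete backup. Everything after it
    # is an incremental backup that must be applied on top of it.
    return ordered[complete_positions[-1]:]
```

With a weekly complete backup and nightly incremental backups, the returned list never contains more than one complete backup followed by six incremental ones, which is what keeps the restore time bounded.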

Backups and replication

Backups and replication are related topics. They both duplicate data to allow us to recover it if a disaster occurs.

However, it is important to remember that replication does not replace a good backup strategy. In fact, there is an important conceptual difference between these techniques. A backup is a static, consistent snapshot of the data; it will never change. A replication slave repeats all the operations performed by the master, so its databases constantly change. This includes destructive operations: if a table is dropped by mistake on the master, it is dropped on the slaves too, and only a backup allows us to recover it.

In a replication environment, we can choose which server to use for backups. Creating backups from the master is often a bad idea, because that server is used by the applications, and we should avoid slowing it down or even stopping it, if possible. A slave is usually a better choice, especially if it does not work as a master for other slaves. However, we must also keep in mind that slaves can lag behind their master by some hours or even by several days. While this can be acceptable for replication, backups should always contain very recent data. So, a slave should only be used for backups if it does not lag significantly behind. Replication will be discussed in Chapter 9, Replication.
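As a minimal sketch of this rule, the following Python function decides whether a slave is recent enough to be used for a backup. It assumes that the lag is read from the Seconds_Behind_Master column returned by SHOW SLAVE STATUS, which is NULL (None in Python) when the lag cannot be determined, for example because replication is not running. The function name and the 60-second threshold are hypothetical; a suitable limit depends on how recent our backups need to be.

```python
from typing import Optional

# Hypothetical limit: the maximum lag we tolerate for a backup source.
MAX_LAG_SECONDS = 60


def replica_usable_for_backup(
    seconds_behind_master: Optional[int],
    max_lag_seconds: int = MAX_LAG_SECONDS,
) -> bool:
    """Decide whether a slave is recent enough to take a backup from."""
    if seconds_behind_master is None:
        # The lag is unknown, so we cannot trust the data to be recent.
        return False
    return seconds_behind_master <= max_lag_seconds
```

A backup script could call this function before starting and, when it returns False, either wait or fall back to another server.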

A database cluster is a complex, very reliable replication setup. In Chapter 12, MariaDB Galera Cluster, we will discuss the most common clustering solution for MariaDB. Galera replication is virtually synchronous, so all the nodes hold up-to-date data. In this case, if one of the nodes does not normally receive queries from the clients, it is a good choice for backups. Otherwise, we can probably choose the server with the most powerful hardware.

Steps to be followed before performing backups

Up to this point, we have discussed backup types and the benefits provided by each type. The coming sections discuss in detail how to perform these backups in practice. But before that detailed discussion, let's ask an important question: what should we do after choosing a backup strategy?

For each backup method that we are going to implement, we should take the following steps:

  1. Write the necessary scripts:

    Backups need to be automatic, so we will create cron jobs and other scripts to make them take place regularly.

  2. Test data backups:

    We will use development servers for this. We will set up test data, perform a backup, and check that the backup has been created. We will also check that the time required by the backup methods we have chosen is acceptable.

  3. Test data restoration:

    At this point, we will perform an operation that heavily modifies the database, and then we will restore the backup. We will check that everything is in place. This step is useful to make sure that we know exactly what to do when a disaster occurs. We must take the correct actions, and we will probably need to take them quickly.

  4. Document all the procedures:

    Even the best backup and restore methods are useless if we do not remember how to use them. Document all possible problems and how to solve them.

  5. Switch to production:

    This should be done only when we are really ready!
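The checks described in step 2 can be automated. The following Python sketch is only an illustration under stated assumptions: `check_backup` and its parameters are hypothetical names, and `run_backup` stands for whatever command or script actually produces the backup file at the given path.

```python
import os
import time
from typing import Callable


def check_backup(
    run_backup: Callable[[str], None],
    target_path: str,
    max_seconds: float,
) -> float:
    """Run a backup, then verify that it exists and was fast enough.

    Returns the elapsed time in seconds, or raises RuntimeError.
    """
    started = time.monotonic()
    run_backup(target_path)
    elapsed = time.monotonic() - started

    # The backup must exist and must not be empty.
    if not os.path.isfile(target_path) or os.path.getsize(target_path) == 0:
        raise RuntimeError(f"backup file missing or empty: {target_path}")
    # The backup must complete within the time we consider acceptable.
    if elapsed > max_seconds:
        raise RuntimeError(
            f"backup took {elapsed:.1f}s, limit is {max_seconds:.1f}s"
        )
    return elapsed
```

A non-empty file only proves that something was written. It does not prove that the data can be restored, which is why step 3 is still necessary.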

This book is specifically about MariaDB. It does not cover cron jobs, the system shell, programming, or testing methods. In the following sections, we will only discuss the heart of the topic: how to perform backups and restore data. But when putting these techniques into practice, we will need to follow the preceding steps to make sure that backups always work as expected.
