Delimited text backups

A backup of a single table can be a delimited text file: a human-readable text file in which the column values are separated by a specific character. The most common example is the CSV format, where the values are separated by a comma. MariaDB supports the following ways to create text-delimited files:

  • The mysqldump command with the --tab option
  • The SELECT … INTO OUTFILE command
  • The CSV storage engine
  • The CONNECT storage engine

MariaDB also supports the following methods to restore a text-delimited backup:

  • The mysqlimport command
  • The LOAD DATA INFILE command
  • The CSV storage engine
  • The CONNECT storage engine

The --tab option of the mysqldump command

The mysqldump command, when invoked with the --tab option, produces two files for each dumped table. Both files are named after the original table, followed by an extension. One file has a .sql extension and contains the CREATE TABLE statement that is necessary to recreate the empty table. Of course, this file needs to be executed first to restore a dump, unless the table already exists. The other file has a .txt extension and contains the delimited text backup. By default, a tab character is used to separate values and a newline character is used to separate lines, but these characters, as well as many other file characteristics, can be configured using some mysqldump options. These options will be discussed later in this section, because they are used by multiple tools and statements.

The --tab option specifies the path where the .sql and .txt files will be located. For example: --tab=/tmp/backup.

Loading a dump file with the mysqlimport command

The mysqlimport command is a tool that is complementary to mysqldump and can be used to import delimited text backups. Like mysqldump, it is included in all MariaDB distributions and is located in the bin directory. Its syntax is as follows:

mysqlimport [options] db_name file [file …]

A database name must be specified so that mysqlimport knows where the table is stored. Then, at least one file to import must be specified. The base name of each file must match the name of the target table. The extension, if present, is not relevant (it can be .csv, .txt, or whatever we prefer). This has an interesting consequence: two files with the same base name and different extensions refer to the same table. For very big tables, it can be convenient to split the rows across multiple files. The mysqlimport command also has several options that specify which characters are used to separate columns and rows, and other characteristics of the files to import. These options are described later in this section.
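
This naming rule can be sketched in Python (the exact handling of file names belongs to mysqlimport itself; the function name and the stripping from the first dot are illustrative assumptions):

```python
import os

def table_for_file(path):
    """Sketch of mysqlimport's naming rule: the target table is the
    file's base name, with any extension stripped."""
    base = os.path.basename(path)
    # The extension is irrelevant; strip everything after the first dot.
    return base.split(".", 1)[0]

print(table_for_file("/tmp/customer.txt"))  # customer
print(table_for_file("/tmp/customer.csv"))  # customer
```

Both files above would therefore be imported into the same customer table.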

By default, the delimited text file is expected to be on the server. If mysqlimport is executed remotely, the file can also be located on the client; in this case, the --local option must be specified.

It is possible to skip the first lines in the source file by specifying the --ignore-lines option, for example, --ignore-lines=1. This is useful when the first line is composed of the column names, or the file begins with some informative lines (like the timestamp of its creation or the name of the software that produced it).

Another important option is --delete, which empties the tables before importing the rows.

In case of duplicate values, the --replace option causes the imported rows to replace the existing rows in the tables, while the --ignore option leaves the existing rows untouched but avoids producing an error.

When importing a large amount of data, it may be useful to do it in parallel. The --use-threads option specifies how many threads are used to import data. For example, if we specify --use-threads=2, mysqlimport will use two threads.
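
Since files with the same base name and different extensions all target the same table, a big dump can be split into chunks that mysqlimport can then load in parallel. A minimal sketch of such a split (the chunk file names customer.0, customer.1, and so on are hypothetical):

```python
def split_rows(lines, n_chunks):
    """Distribute the lines of a delimited text file across n chunks,
    so each chunk can be saved as customer.0, customer.1, ... and
    every chunk will still be imported into the customer table."""
    return [lines[i::n_chunks] for i in range(n_chunks)]

rows = ["1,David\n", "2,Ritchie\n", "3,Ian\n", "4,Jon\n", "5,Roger\n"]
chunks = split_rows(rows, 2)
# Every row ends up in exactly one chunk; no row is lost or duplicated.
```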

Creating a text-delimited file with the SELECT … INTO OUTFILE command

The SELECT statement has an INTO OUTFILE clause, which causes the result set to be written to a file. By default, the file is saved in the MariaDB install directory (not the data directory); however, a path can be specified with the filename. Remember that the MariaDB user needs the FILE privilege to write or read files. Also, the system user that runs MariaDB (usually mysql) needs write access to the directory where the file is saved. On Linux systems, the /tmp directory is usually a good candidate. Note that, if the file already exists, an error is produced. Note also that the file is created on the server: with SELECT … INTO OUTFILE, we cannot create a file on the client or on any other host.

The result set is not sent to the client; instead, the client receives the number of rows written (or an error).

Here is a simple example:

MariaDB [test]> SELECT *
    -> FROM information_schema.TABLES
    -> ORDER BY TABLE_SCHEMA, TABLE_NAME
    -> INTO OUTFILE '/tmp/tables.txt';

By default, a tab character is used to separate columns values, and a new line character is used to separate rows. Several clauses exist to use different separators or other file characteristics. They will be discussed later, in the Separator options and clauses section, together with the corresponding options of mysqldump and mysqlimport.

The SELECT … INTO OUTFILE command is mostly used to exchange data between servers or between MariaDB and other software. Using it to create a backup is not very common. However, this method proves its flexibility when we want to back up only a subset of tables data, for example, using JOIN operations or WHERE clauses.

Running a SELECT … INTO OUTFILE statement with the default separators produces the same output as running the query directly from the command line using the following syntax:

mysql -e "SELECT …" > file_name

Consider the following example:

mysql -e "SELECT * FROM information_schema.TABLES ORDER BY TABLE_SCHEMA, TABLE_NAME" > /tmp/tables.txt

This syntax is very convenient when we want to save the result set of a query to a file from a shell script. It also allows creating a text-delimited file on the client, instead of the server.
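
A file produced this way is tab-separated and, by default, begins with a line of column names. As a hedged sketch (the function name is an assumption; it simply parses that layout), such a file can be read back from a script like this:

```python
def parse_batch_output(text):
    """Parse the tab-separated output of `mysql -e "SELECT ..." > file`:
    by default, the first line holds the column names and each
    following line holds one row."""
    lines = text.rstrip("\n").split("\n")
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]

sample = "TABLE_SCHEMA\tTABLE_NAME\nmysql\tuser\ntest\tcustomer\n"
rows = parse_batch_output(sample)  # list of dictionaries keyed by column name
```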

Dumping a table definition with the SHOW CREATE TABLE command

Sometimes, we want to obtain the SQL statement that allows us to recreate a table's structure, and not only the data. The statement that does this is very simple; here is an example:

MariaDB [test]> SHOW CREATE TABLE customer \G
*************************** 1. row ***************************
       Table: customer
Create Table: CREATE TABLE `customer` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `hire_date` date NOT NULL,
  `first_name` varchar(50) DEFAULT NULL,
  `last_name` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

If we need this dump, we probably want to obtain the statement that allows recreating the database, shown as follows:

MariaDB [test]> SHOW CREATE DATABASE test;
+----------+---------------------------------------------------------------+
| Database | Create Database                                               |
+----------+---------------------------------------------------------------+
| test     | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+---------------------------------------------------------------+
1 row in set (0.00 sec)

Using both these statements, we will be able to recover data from a SELECT … INTO OUTFILE statement, if the database has been accidentally destroyed. More commonly, these statements allow the exchange of data between servers.

SHOW CREATE TABLE and SHOW CREATE DATABASE do not support a clause to save their results to a file, as SELECT does. However, we can write a simple script to do this.
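
Fetching the statements is the job of whatever client library the script uses; given the (name, statement) pairs that SHOW CREATE TABLE returns, assembling the .sql file is trivial. A minimal sketch (the function name and output layout are assumptions):

```python
def dump_create_statements(pairs):
    """Given (name, create_statement) pairs, as returned by
    SHOW CREATE TABLE through any MariaDB client library, build the
    text of a .sql file that recreates the structures."""
    return "\n".join(f"{stmt};" for _name, stmt in pairs) + "\n"

sql_text = dump_create_statements([
    ("customer", "CREATE TABLE `customer` (`id` int(11) NOT NULL)"),
])
# sql_text can then be written to a file and replayed with the mysql client.
```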

Loading a dump file with the LOAD DATA INFILE statement

The LOAD DATA INFILE statement is complementary to SELECT … INTO OUTFILE. It loads data from a delimited text file into an existing table.

The general syntax is:

LOAD DATA [LOW_PRIORITY | CONCURRENT]
  [LOCAL] INFILE 'file'
  [REPLACE | IGNORE]
  INTO TABLE tab_name [PARTITION (p_name, …)]
  [CHARACTER SET charset]
  [other_options]
  [IGNORE n {LINES | ROWS}]
  [(column, …)]
  [SET column = expr, …]

In this syntax description, the options that specify the separator characters are grouped in the other_options placeholder. They are identical to the ones used for SELECT … INTO OUTFILE and will be discussed later in this section.

The LOW_PRIORITY and CONCURRENT clauses are only useful with non-transactional tables. The LOW_PRIORITY clause causes the statement to have a lower priority than the read operations. The CONCURRENT clause means that the MyISAM concurrent inserts should be used. Both these clauses may slow down the statement itself, but they will not block concurrent statements: queries from other sessions will have the priority if LOW_PRIORITY is used, and concurrent inserts will be allowed if CONCURRENT is used.

The LOCAL clause means that the specified file must be sent by the client to the server. If this clause is not specified, the file is supposed to be on the server. In this case, the MariaDB user needs to have the FILE privilege. Also, the system user used by MariaDB needs permissions to read the file. Another difference is that, with the LOCAL keyword, duplicate key errors are turned into warnings and do not abort the whole operation.

The REPLACE and IGNORE clauses are used to handle duplicate values. With REPLACE, the new rows replace the existing rows. With IGNORE, the existing rows are left untouched, and no duplicate key error is produced.
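
These duplicate-handling semantics can be sketched outside the server (the function and its mode argument are illustrative, not a MariaDB API):

```python
def load_rows(table, incoming, mode=None):
    """Sketch of LOAD DATA INFILE's duplicate handling: with REPLACE the
    new row wins, with IGNORE the existing row wins, and without either
    clause a duplicate key aborts the load with an error."""
    result = dict(table)
    for key, row in incoming:
        if key in result:
            if mode == "REPLACE":
                result[key] = row
            elif mode == "IGNORE":
                pass
            else:
                raise ValueError(f"Duplicate entry {key!r} for key 'PRIMARY'")
        else:
            result[key] = row
    return result

existing = {1: ("David", "Coverdale")}
assert load_rows(existing, [(1, ("Ian", "Paice"))], "REPLACE")[1] == ("Ian", "Paice")
assert load_rows(existing, [(1, ("Ian", "Paice"))], "IGNORE")[1] == ("David", "Coverdale")
```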

The INTO TABLE clause specifies the target table and, optionally, one or more target partitions.

Note

The CHARACTER SET clause should always be present, and it indicates the character set used by the file. The default value is read from the character_set_database session variable, which depends on the default database and cannot be reliably modified by the user.

It is possible to skip the first lines in the source file by specifying IGNORE n LINES. This is mainly useful if the first line of the file contains the column headers.

By default, the server assumes that the columns are ordered in the same way both in the file and in the table. The order of a table's columns is the order in which they appeared in the CREATE TABLE statement unless ALTER TABLE explicitly changed their order. The order can be seen with a simple DESC statement (which shows a table's columns).

The order can (and usually should) be explicitly specified between the parentheses in the same way we specify them in the INSERT statements.

It is also possible to populate one or more columns with a calculated value using the SET clause. For example, if a product table has a price column, it could also have a sales_tax column that is 20 percent of price. Since MariaDB supports VIRTUAL and PERSISTENT calculated columns, there is usually no need to insert the calculated value with LOAD DATA INFILE. But we may still want to insert such values for some reason: for example, because the database was designed years ago, when these features did not exist, and we do not want to modify it; for MySQL compatibility; or because the expression that calculates the values is not deterministic, so it cannot be used for a VIRTUAL column. The LOAD DATA INFILE statement has a SET clause that can be used to insert calculated values:

SET sales_tax = price / 100 * 20
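
The same per-row computation, sketched outside the server (the function is illustrative; the 20 percent rate simply mirrors the SET expression above):

```python
def with_sales_tax(rows, rate_percent=20):
    """Apply the equivalent of SET sales_tax = price / 100 * 20 to each
    row before loading it; the rate is just the chapter's example."""
    return [dict(row, sales_tax=row["price"] / 100 * rate_percent)
            for row in rows]

rows = with_sales_tax([{"price": 50.0}, {"price": 200.0}])
# rows[0]["sales_tax"] == 10.0, rows[1]["sales_tax"] == 40.0
```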

Separator options and clauses

The mysqldump and mysqlimport command-line tools, and the SELECT … INTO OUTFILE and LOAD DATA INFILE SQL statements, have a set of options that can be used to specify the characters used to: separate values, enclose strings, escape special characters in strings, and separate rows.

These options are the same for all these tools, except that the SQL syntax is slightly different, and it is slightly more flexible for line separators.

The following table shows the options' syntax and their meaning. For each entry, the first line is the mysqldump and mysqlimport option, the second line is the equivalent SELECT … INTO OUTFILE and LOAD DATA INFILE clause, and the third line describes its effect:

  • --fields-terminated-by=string
    FIELDS TERMINATED BY 'string'
    Values are separated by this sequence of characters.

  • --fields-enclosed-by=string, --fields-optionally-enclosed-by=string
    FIELDS [OPTIONALLY] ENCLOSED BY 'string'
    String values are quoted using the specified sequence of characters. With the OPTIONALLY keyword, the quotes are omitted when they are unnecessary.

  • --fields-escaped-by=char
    FIELDS ESCAPED BY 'char'
    The specified character is used to escape the special characters defined with the other options, NULL values, and the ASCII NUL character (0x00).

  • (no equivalent command-line option)
    LINES STARTING BY 'string'
    This is only used by LOAD DATA INFILE. The rows begin with the specified string, which will be ignored.

  • --lines-terminated-by=string
    LINES TERMINATED BY 'string'
    Lines are separated by this sequence of characters.

In SQL statements, when using multiple field or line clauses, the FIELDS and LINES keywords must not be repeated. For example, a correct syntax is FIELDS TERMINATED BY ',' ESCAPED BY '|'. All the SQL clauses are optional; if present, they must appear in the same order as in the preceding table. Here, COLUMNS is a synonym for FIELDS.
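
Python's csv module is not MariaDB, and its escaping rules are not guaranteed to match byte for byte, but a quick round-trip with the same separators used in this chapter's examples (fields terminated by ',', enclosed by ' and escaped by /) helps visualize what these options do:

```python
import csv
import io

# One row of the example customer table.
row = ["5", "2010-02-28", "Roger", "Glover"]

# Write it with the chapter's separators: ',' delimiter, ' quotes, / escapes.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", quotechar="'",
                    escapechar="/", quoting=csv.QUOTE_ALL)
writer.writerow(row)

# Read it back with the same settings and recover the original values.
buf.seek(0)
reader = csv.reader(buf, delimiter=",", quotechar="'", escapechar="/")
assert next(reader) == row
```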

An example to create and restore dump files

We discussed how to create a dump file and how to restore it when necessary, using both SQL statements and command-line tools. Now, let's look at a simple example: we will create a logical backup of a table using SELECT … INTO OUTFILE, and then restore the data with LOAD DATA INFILE.

First, let's create a small table with some example rows:

MariaDB [test]> CREATE TABLE customer (
    -> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> hire_date DATE NOT NULL,
    -> first_name VARCHAR(50),
    -> last_name VARCHAR(50)
    -> )
    -> ENGINE = InnoDB,
    -> CHARACTER SET = 'utf8';
Query OK, 0 rows affected (0.41 sec)
MariaDB [test]> INSERT INTO customer (hire_date, first_name, last_name) VALUES
    -> ('2011-05-07', 'David', 'Coverdale'),
    -> ('2010-01-20', 'Ritchie', 'Blackmore'),
    -> ('2012-11-15', 'Ian', 'Paice'),
    -> ('2011-06-01', 'Jon', 'Lord'),
    -> ('2010-02-28', 'Roger', 'Glover');
Query OK, 5 rows affected (0.10 sec)
Records: 5  Duplicates: 0  Warnings: 0

Now, let's create a delimited text file using mysqldump and check that the first rows are correct. The table is very small, so we could easily check all of its rows; in a more realistic case, however, checking only the first rows is much more practical. We will also check that the table files exist, as follows:

root@this:/usr/local/mysql# mysqldump -uroot -proot test customer --tab=/tmp --fields-terminated-by=, --fields-enclosed-by="'" --fields-escaped-by=/
root@this:/usr/local/mysql# ls /tmp
customer.sql
customer.txt
root@this:/usr/local/mysql# tail --lines 3 /tmp/customer.txt
'3','2012-11-15','Ian','Paice'
'4','2011-06-01','Jon','Lord'
'5','2010-02-28','Roger','Glover'

Everything seems to be okay. Now, let's back up the same table with SELECT … INTO OUTFILE. The statement we are going to issue is equivalent to the former example, except that it does not generate a table definition file. Consider the following code snippet:

MariaDB [test]> SELECT *
    -> FROM customer
    -> INTO OUTFILE '/tmp/customer.2.txt'
    -> FIELDS
    -> TERMINATED BY ','
    -> ENCLOSED BY '\''
    -> ESCAPED BY '/';
Query OK, 5 rows affected (0.00 sec)

Now, we want to check that the files produced are identical:

root@this:/usr/local/mysql# md5sum /tmp/customer.txt
d6b2c04587f9dc56a82a8b9784abe5fe  /tmp/customer.txt
root@this:/usr/local/mysql# md5sum /tmp/customer.2.txt
d6b2c04587f9dc56a82a8b9784abe5fe  /tmp/customer.2.txt

Since the MD5 sums of the two files are identical, we can assume that the files are identical too.
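
The same check can be scripted; a minimal sketch of an md5sum-style helper using Python's hashlib (the function name is an assumption):

```python
import hashlib

def md5_of(path):
    """Hex MD5 digest of a file, as printed by md5sum, reading in
    chunks so that large dump files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Two dumps are almost certainly identical when their digests match:
# md5_of("/tmp/customer.txt") == md5_of("/tmp/customer.2.txt")
```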

Before trying to restore the file, we need to empty the table:

MariaDB [test]> TRUNCATE TABLE customer;
Query OK, 0 rows affected (0.25 sec)

Now, let's restore the table from a delimited text file:

root@this:/usr/local/mysql# bin/mysqlimport -uroot -proot --fields-terminated-by=, --fields-enclosed-by="'" --fields-escaped-by=/ test /tmp/customer.txt
test.customer: Records: 5  Deleted: 0  Skipped: 0  Warnings: 0

The equivalent LOAD DATA INFILE statement is the following, but we will need to empty the table again, before issuing it:

MariaDB [test]> LOAD DATA INFILE '/tmp/customer.txt'
    -> INTO TABLE test.customer
    -> FIELDS
    -> TERMINATED BY ','
    -> ENCLOSED BY '\''
    -> ESCAPED BY '/';
Query OK, 5 rows affected (0.08 sec)                 
Records: 5  Deleted: 0  Skipped: 0  Warnings: 0

Performing a backup using the CONNECT or CSV engine

The tables created with the CSV engine use normal comma-separated data files, which can be used for backups or data exchange. The CONNECT engine is more complex: it supports several table types. Each table type is in a different data format. Supported formats include CSV, XML, HTML, and data files created by dBASE. The CONNECT engine can even read and write data from or to a remote database server using the native protocol if it is a MariaDB or MySQL server or using the ODBC standards for other DBMS types.

Creating a backup from a table using CSV or CONNECT is very simple. The next example shows how to do this with CSV:

MariaDB [test]> CREATE TABLE customer_bkp ENGINE = CSV SELECT * FROM customer;
ERROR 1178 (42000): The storage engine for the table doesn't support nullable columns

But this did not work! The example shows that CSV cannot be used if a table contains nullable columns. This is a very important limitation, and it is not the only one. So, we should usually prefer CONNECT, which is much more advanced and flexible. Probably the only reason to use CSV instead is that CONNECT was introduced with MariaDB 10 and cannot be installed on older versions.

Since CONNECT is not installed by default, we may need to install it as follows:

MariaDB [test]> INSTALL SONAME 'ha_connect';
Query OK, 0 rows affected (0.00 sec)

Then, we can use it to perform the backup. We will use the CONNECT storage engine's CSV table type, because it is a good and efficient way to store this data. We could use more structured or exotic formats, but there is no reason to do so in this case. Pay attention to the table options in the following example:

MariaDB [test]> CREATE TABLE customer_bkp
    -> ENGINE = CONNECT
    -> TABLE_TYPE = CSV
    -> FILE_NAME = '/tmp/customer.csv'
    -> HUGE = 0
    -> COMPRESS = 1
    -> READONLY = 1
    -> DATA_CHARSET = 'utf8'
    -> SEP_CHAR = ','
    -> ENDING = 1
    -> QUOTED = 1
    -> QCHAR = '"'
    -> HEADER = 1
    -> SELECT * FROM customer;
Query OK, 0 rows affected (0.10 sec)

In this example, we used all the options that are relevant for the CSV format. They are:

  • TABLE_TYPE: As explained earlier, this indicates the data source type for the table (in this case, a CSV file).
  • FILE_NAME: This indicates the name and, optionally, path of the data file.
  • HUGE: The default value is 0. If the table is bigger than 2 GB, it makes sense to inform CONNECT by setting it to 1.
  • COMPRESS: Since this is a backup, we want the table to be compressed. As with InnoDB, CONNECT uses the zlib library and the LZ77 algorithm.
  • READONLY: Since this is a backup, making the table read-only is much safer.
  • DATA_CHARSET: This indicates the character set to be used.
  • SEP_CHAR: This indicates the columns separator.
  • ENDING: This indicates the length of the line terminator in characters. It is 1 on Unix systems (lines end with \n) and 2 on Windows (lines end with \r\n).
  • QUOTED: This indicates that strings are quoted. It could be omitted, since QCHAR is specified.
  • QCHAR: This indicates the quoting character.
  • HEADER: This indicates that the first row contains the column names.

Restoring the backup is really simple: we just need to delete the data file in /tmp and replace it with the backup. No further actions are needed to let CONNECT use the backup. Then, we can copy the backup contents into the original table using a normal INSERT … SELECT or CREATE TABLE … SELECT statement.
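
One advantage of this format is that the file CONNECT writes (with HEADER=1, QUOTED=1 and QCHAR='"') is ordinary quoted CSV, readable by any tool. A hedged sketch reading such content back (the sample data below is invented for illustration, and the COMPRESS=1 option, which would gzip the file, is ignored here):

```python
import csv
import io

# Content resembling what the CONNECT CSV table above writes:
# a header row, then quoted values separated by commas.
data = ('"id","hire_date","first_name","last_name"\n'
        '"4","2011-06-01","Jon","Lord"\n')

rows = list(csv.DictReader(io.StringIO(data)))
assert rows[0]["last_name"] == "Lord"
```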
