CHAPTER 38
Importing and Exporting Data

Back in the Stone Age, cavemen never really had any issues with data incompatibility, as stones and one's own memory were the only storage medium. Copying data involved pulling out the old chisel and getting busy on a new slab of granite. Now, of course, the situation is much different. Hundreds of data storage strategies exist, the most commonplace of which include spreadsheets and various types of databases. Working in a complex, often convoluted fashion, you often need to convert data from one storage type to another, say between a spreadsheet and a relational database, or between an Oracle database and MySQL. If this is done poorly, you could spend hours, and even days and weeks, massaging the converted data into a usable format. This chapter seeks to eliminate that conundrum, by introducing MySQL's data import and export utilities, as well as various techniques and concepts central to lessening the pain involved in performing such tasks.

By the conclusion of this chapter, you will be familiar with the following topics:

  • Common data-formatting standards recognized by most mainstream storage products
  • The SELECT INTO OUTFILE SQL statement
  • The LOAD DATA INFILE SQL statement
  • The mysqlimport utility
  • How to use PHP to mimic MySQL's built-in import utilities

Before delving into the core topics, take a moment to review the sample data used as the basis for examples presented in this chapter. Afterward, several basic concepts surrounding MySQL's import and export strategies are introduced.

Sample Table

If you would like to execute the examples as you proceed through the chapter, the following sales table will be the focus of several examples in this chapter:

CREATE TABLE sales (
   id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
   client_id SMALLINT UNSIGNED NOT NULL,
   order_time TIMESTAMP NOT NULL,
   sub_total DECIMAL(8,2) NOT NULL,
   shipping_cost DECIMAL(8,2) NOT NULL,
   total_cost DECIMAL(8,2) NOT NULL,
   PRIMARY KEY(id);

This table is used to track basic sales information. Although it lacks many of the columns you might find in a real-world implementation, the additional detail is omitted in an attempt to keep the focus on the concepts introduced in this chapter.

Using Data Delimitation

Even if you're a budding programmer, you're probably already quite familiar with software's exacting demands when it comes to data. All i's must be dotted and all t's must be crossed; a single out-of-place character is enough to produce unexpected results. Therefore, you can imagine the issues that might arise when attempting to convert data from one format to another. Thankfully, a particularly convenient formatting strategy has become commonplace: delimitation.

Information structures like database tables and spreadsheets share a similar conceptual organization. Each is broken down into rows and columns, each of which is further broken down into cells. Therefore, you can convert between formats as long as you institute a set of rules for determining how the columns, rows, and cells are recognized. An application utility will read in this data and, based on these rules, make the conversions necessary for adapting the data to its own formatting standards. Typically, a character or a character sequence is used as a delimiter, separating each cell within a row, and each row from the following row. For example, the sales table might be delimited in a format that separates each field by a comma and each row by a newline character:

12309,45633,2007-12-19 01:13:42,22.04,5.67,27.71
12310,942,2007-12-19 01:15:12,11.50,3.40,14.90
12311,7879,2007-12-19 01:15:22,95.99,15.00,110.99
12312,55521,2007-12-19 01:30:45,10.75,3.00,13.75

Many data import and export utilities, including MySQL's, revolve around the concept of data delimitation.

Importing Data

If you've invested the time to read this book, you likely are planning to make MySQL a significant part of your application infrastructure. Therefore, you're probably already thinking about how you're going to migrate existing data, possibly from several sources, into your new database environment. In this section, you'll learn about the two built-in tools MySQL offers for importing delimited data sets into a table: LOAD DATA INFILE and mysqlimport.


Tip You might consider using the mysqlimport client in lieu of LOAD DATA INFILE when you need to create batch imports executed from a cron job.


Importing Data with LOAD DATA INFILE

The LOAD DATA INFILE statement, a command that is executed much like a query is executed within the mysql client, is used to import delimited text files into a MySQL table. Its generalized syntax follows:

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[CHARACTER SET charset_name]
[FIELDS
   [TERMINATED BY 'character'] [[OPTIONALLY] ENCLOSED BY 'character']
   [ESCAPED BY 'character']
]
[LINES
   [STARTING BY 'character'] [TERMINATED BY 'character']
]
[IGNORE number lines]
[SET column_name = expression, ...)]

Certainly one of MySQL's longer query commands you've seen thus far, isn't it? Yet it's this wide array of options that makes this feature so powerful. Each option is introduced next:

  • LOW PRIORITY: This option forces execution of the command to be delayed until no other clients are reading from the table.
  • CONCURRENT: Used in conjunction with a MyISAM table, this option allows other threads to retrieve data from the target table while the command is executing.
  • LOCAL: This option declares that the target infile must reside on the client side. If omitted, the target infile must reside on the same server hosting the MySQL database. When LOCAL is used, the path to the file can be either absolute or relative according to the present location. When omitted, the path can be absolute, local, or, if not present, assumed to reside in MySQL's designated database directory or in the presently chosen database directory.
  • REPLACE: This option results in the replacement of existing rows with new rows possessing identical primary or unique keys.
  • IGNORE: Including this option has the opposite effect of REPLACE. Read-in rows with primary or unique keys matching an existing table row will be ignored.
  • CHARACTER SET charset_name: MySQL will presume the input file contains characters matching the character set assigned to the system variable character_set_database. If the characters do not match this setting, use this option to identify the file's character set.
  • FIELDS TERMINATED BY 'character': This option signals how fields will be terminated. Therefore, TERMINATED BY ',' means that each field will end with a comma, like so:
    12312,55521,2007-12-19 01:30:45,10.75,3.00,13.75
  • The last field does not end in a comma because it isn't necessary, as typically this option is used in conjunction with the LINES TERMINATED BY 'character' option. Encountering the character specified by this other option by default also delimits the last field in the file, as well as signals to the command that a new line (row) is about to begin.
  • [OPTIONALLY] ENCLOSED BY 'character': This option signals that each field will be enclosed by a particular character. This does not eliminate the need for a terminating character. Revising the previous example, using the option FIELDS TERMINATED BY ',' ENCLOSED BY '"' implies that each field is enclosed by a pair of double quotes and delimited by a comma, like so:
    "12312","55521","2007-12-19 01:30:45","10.75","3.00","13.75"
  • The optional OPTIONALLY flag denotes that character strings only require enclosure by the specified character pattern. Fields containing only integers, floats, and so on need not be enclosed.
  • ESCAPED BY 'character': If the character denoted by the ENCLOSED BY option appears within any of the fields, it must be escaped to ensure that the field is not incorrectly read in. However, this escape character must be defined by ESCAPED BY so that it can be recognized by the command. For example, FIELDS TERMINATED BY ',' ENCLOSED BY ''' ESCAPED BY '' would allow the following fields to be properly parsed:
    '[email protected]','Excellent product! I'll return soon!','2007-12-20'
  • Note that because the backslash is treated by MySQL as a special character, you need to escape any instance of it by prefixing it with another backslash in the ESCAPED BY clause.
  • LINES: The following two options are pertinent to how lines are started and terminated, respectively:
    • STARTING BY 'character': This option defines the character intended to signal the beginning of a line, and thus a new table row. Use of this option is generally skipped in preference to the next option.
    • TERMINATED BY 'character': This option defines the character intended to signal the conclusion of a line, and thus the end of a table row. Although it could conceivably be anything, this character is most often the newline ( ) character. In many Windows-based files, the newline character is often represented as .
  • IGNORE number LINES: This option tells the command to ignore the first x lines. This is useful when the target file contains header information.
  • [(SET column_name = expression,...)]: If the number of fields located in the target file does not match the number of fields in the target table, you need to specify exactly which columns are to be filled in by the file data. For example, if the target file containing sales information consists of only four fields, id, client_id, order_time, and total_cost, rather than the six fields used in prior examples (id, client_id, order_time, sub_total, shipping_cost, and total_cost), yet in the target table all six fields remain, the command would have to be written like so:
    LOAD DATA LOCAL INFILE "sales.txt"
    INTO TABLE sales (id, client_id, order_time, total_cost);

Keep in mind that such attempts could fail should one or several of the missing columns be designated as NOT NULL in the table schema. On such occasions, you need to either designate DEFAULT values for the missing columns or further manipulate the data file into an acceptable format.

You can also set columns to variables such as the current timestamp. For example, presume the sales table was modified to include an additional column named added_to_table:

LOAD DATA LOCAL INFILE "sales.txt"
INTO TABLE sales (id, client_id, order_time, total_cost)
SET added_to_table = CURRENT_TIMESTAMP;

Tip If you would like the order of the fields located in the target file to be rearranged as they are read in for insertion into the table, you can do so by rearranging the order via the [(column_name, ...)] option.


A Simple Data Import Example

This example is based upon the ongoing sales theme. Suppose you want to import a file titled productreviews.txt, which contains the following information:

'43','[email protected]','I love the new Website!'
'44','[email protected]','Why don't you sell shoes?'
'45','[email protected]','The search engine works great!'

The target table, aptly titled productreviews, consists of three fields, and they are in the same order (comment_id, email, comment) as the information found in productreviews.txt:

LOAD DATA LOCAL INFILE 'productreviews.txt' INTO TABLE productreviews FIELDS
   TERMINATED BY ',' ENCLOSED BY ''' ESCAPED BY ''
   LINES TERMINATED BY ' ';

Once the import is completed, the productreviews table will look like this:


+------------+---------------------------+--------------------------------+
| comment_id | email                     | comment                        |
+------------+---------------------------+--------------------------------+
|        43  | [email protected]         | I love the new Website!        |
|        44  | [email protected]       | Why don't you sell shoes?      |
|        45  | [email protected] | The search engine works great! |
+------------+---------------------------+--------------------------------+

Choosing the Target Database

You might have noticed that the preceding example referenced the target table but did not clearly define the target database. The reason is that LOAD DATA INFILE assumes that the target table resides in the currently selected database. Alternatively, you can specify the target database by prefixing it with the database name, like so:

LOAD LOCAL DATA INFILE 'productreviews.txt' into table corporate.productreviews;

If you execute LOAD DATA INFILE before choosing a database, or without explicitly specifying the database in the query syntax, an error will occur.

Security and LOAD DATA INFILE

There are a few security issues that you should keep in mind regarding LOAD DATA INFILE:

  • If LOCAL is not used (meaning the file will be read from the server side), the executing user must possess the FILE privilege.
  • To disable LOAD DATA LOCAL INFILE, start the MySQL daemon with the−-local-infile=0 option.

Importing Data with mysqlimport

The mysqlimport client is really just a command-line version of the LOAD DATA INFILE SQL query. Its general syntax follows:

mysqlimport [options] database textfile1 [textfile2 ... textfileN]

Useful Options

Before reviewing any examples, take a moment to review many of the most commonly used mysqlimport options:

  • --columns, -c: This option should be used when the number or ordering of the fields in the target file does not match that found in the table. For example, suppose you were inserting the following target file, which orders the fields as id, order_id, sub_total, shipping_cost, total_cost, and order_time: 45633,12309,22.04,5.67,27.71,2007-12-19 01:13:42
    942,12310,11.50,3.40,14.90,2007-12-19 01:15:12
    7879,12311,95.99,15.00,110.99,2007-12-19 01:15:22
  • Yet the sales table presented at the beginning of this chapter lists the fields in this order: id, client_id, order_time, sub_total, shipping_cost, and total_cost. You can rearrange the input fields during the parsing process so that the data is inserted in the proper location, by including this option: --columns=id,order_id,sub_total,shipping_cost,total_cost,and order_time
  • --compress, -C: Including this option compresses the data flowing between the client and the server, assuming that both support compression. This option is most effective if you're loading a target file that does not reside on the same server as the database.
  • --debug, -#: This option is used to create trace files when debugging.
  • --delete, -d: This option deletes the target table's contents before importing the target file's data.
  • --fields-terminated-by=, --fields-enclosed-by=, --fields-optionally-enclosed-by=, --fields-escaped-by=: These four options determine mysqlimport's behavior in terms of how both fields and lines are recognized during the parsing procedure. See the section "Importing Data with LOAD DATA INFILE," earlier in this chapter, for a complete introduction.
  • --force, -f: Including this option causes mysqlimport to continue execution even if errors occur during execution.
  • --help, -?: Including this option generates a short help file and a comprehensive list of the options discussed in this section.
  • --host, -h: This option specifies the server location of the target database. The default is localhost.
  • --ignore, -i: This option causes mysqlimport to ignore any rows located in the target file that share the same primary or unique key as a row already located in the table.
  • --ignore-lines=n: This option tells mysqlimport to ignore the first n lines of the target file. It's useful when the target file contains header information that should be disregarded.
  • --lines-terminated-by=: This option determines how mysqlimport will recognize each separate line in the file. See the section "Importing Data with LOAD DATA INFILE," earlier in this chapter, for a complete introduction.
  • --lock-tables, -l: This option write-locks all tables located in the target database for the duration of mysqlimport's execution.
  • --local, -L: This option specifies that the target file is located on the client. By default, it is assumed that this file is located on the database server; therefore, you need to include this option if you're executing this command remotely and have not uploaded the file to the server.
  • --low-priority: This option delays execution of mysqlimport until no other clients are reading from the table.
  • --password=your_password, -pyour_password: This option is used to specify the password component of your authentication credentials. If the your_password part of this option is omitted, you will be prompted for the password.
  • --port, -P: If the target MySQL server is running on a nonstandard port (MySQL's standard port is 3306), you need to specify that port value with this option.
  • --replace, -r: This option causes mysqlimport to overwrite any rows located in the target file that share the same primary or unique key as a row already located in the table.
  • --silent, -s: This option tells mysqlimport to output only error information.
  • --socket, -S: This option should be included if a nondefault socket file had been declared when the MySQL server was started.
  • --ssl: This option specifies that SSL should be used for the connection. This would be used in conjunction with several other options that aren't listed here. See Chapter 29 for more information about SSL and the various options used to configure this feature.
  • --user, -u: By default, mysqlimport compares the name/host combination of the executing system user to the mysql privilege tables, ensuring that the executing user possesses adequate permissions to carry out the requested operation. Because it's often useful to perform such procedures under the guise of another user, you can specify the "user" component of credentials with this option.
  • --verbose, -v: This option causes mysqlimport to output a host of potentially useful information pertinent to its behavior.
  • --version, -V: This option causes mysqlimport to output version information and exit.

mysqlimport Data Import Examples

A simple demonstration of mysqlimport involves the update of inventory audit information from the workstation of an accountant located in a company's fiscal department to the database:

%>mysqlimport -h intranet.example.com -u fiscal -p --replace
->--compress --local company inventory.txt

This command results in the compression and transmission of the data found in the local text file (C:auditinventory.txt) to the table inventory located in the company database. Note that mysqlimport strips the extension from each text file and uses the resulting name as the table into which to import the text file's contents.

Writing a mysqlimport Script

Some years ago, I was involved in the creation of a corporate Web site for a pharmaceutical corporation that, among other things, allowed buyers to browse descriptions and pricing information for roughly 10,000 products. This information was maintained on a mainframe, and the data was synchronized on a regular basis to the MySQL database residing on the Web server. To accomplish this, a one-way trust was created between the machines, along with two shell scripts. The first script, located on the mainframe, was responsible for dumping the data (in delimited format) from the mainframe and then pushing this data file via sftp to the Web server. The second script, located on the Web server, was responsible for executing mysqlimport, loading this file to the MySQL database. This script was quite trivial to create, and looked like this:

#!/bin/sh
/usr/local/mysql/bin/mysqlimport --delete --silent
--fields-terminated-by=' ' --lines-terminated-by=' '
products /ftp/uploads/products.txt

To keep the logic involved to a bare minimum, a complete dump of the entire mainframe database was executed each night, and the entire MySQL table was deleted before beginning the import. This ensured that all new products were added, existing product information was updated to reflect changes, and any products that were deleted were removed. To prevent the credentials from being passed in via the command line, a system user named productupdate was created, and a my.cnf file was placed in the user's home directory, which looked like this:

[client]
host=localhost
user=productupdate
password=secret

The permissions and ownership on this file were changed, setting the owner to mysql and allowing only the mysql user to read the file. The final step involved adding the necessary information to the productupdate user's crontab, which executed the script each night at 2 a.m. The system ran flawlessly from the first day.

Loading Table Data with PHP

For security reasons, ISPs often disallow the use of LOAD DATA INFILE, as well as many of MySQL's packaged clients like mysqlimport. However, such limitations do not necessarily mean that you are out of luck when it comes to importing data, because you can mimic LOAD DATA INFILE and mysqlimport functionality using a PHP script. The following script uses PHP's file-handling functionality and a handy function known as fgetcsv() to open and parse the delimited sales data found at the beginning of this chapter:

<?php
    // Connect to the MySQL server and select the corporate database
    $mysqli = new mysqli("localhost","someuser","secret","corporate");

    // Open and parse the sales.csv file
    $fh = fopen("sales.csv", "r");

    while ($line = fgetcsv($fh, 1000, ","))
    {
        $id = $line[0];
        $client_id = $line[1];
        $order_time = $line[2];
        $sub_total = $line[3];
        $shipping_cost = $line[4];
        $total_cost = $line[5];

        // Insert the data into the sales table
        $query = "INSERT INTO sales SET id='$id',
            client_id='$client_id', order_time='$order_time',
            sub_total='$sub_total', shipping_cost='$shipping_cost',
            total_cost='$total_cost'";

        $result = $mysqli->query($query);
    }

    fclose($fh);
    mysqli_close();

?>

Keep in mind that execution of such a script might time out before completing the insertion of a particularly large data set. If you think that this might be the case, set PHP's max_execution_time configuration directive at the beginning of the script. Alternatively, consider using PHP or Perl or another solution to do the job from the command line.

The next section switches directions of the data flow, explaining how to export data from MySQL into other formats.

Exporting Data

As your computing environment grows increasingly complex, you'll probably need to share your data among various disparate systems and applications. Sometimes you won't be able to cull this information from a central source; rather, it must be constantly retrieved from the database, prepped for conversion, and finally converted into a format recognized by the target. This section shows you how to easily export MySQL data using the SQL statement SELECT INTO OUTFILE.


Note Another commonly used data export tool, mysqldump, is introduced in Chapter 27. Although officially it's intended for data backup, it serves a secondary purpose as a great tool for creating data export files.


SELECT INTO OUTFILE

The SELECT INTO OUTFILE SQL statement is actually a variant of the SELECT query. It's used when you want to direct query output to a text file. This file can then be opened by a spreadsheet application, or imported into another database like Microsoft Access, Oracle, or any other software that supports delimitation. Its general syntax format follows:

SELECT [SELECT OPTIONS] INTO {OUTFILE | DUMPFILE} filename
  EXPORT_OPTIONS
  FROM TABLES [ADDITIONAL SELECT OPTIONS]

The following list summarizes the key options:

  • OUTFILE: Selecting this option causes the query result to be output to the text file. The formatting of the query result is dependent upon how the export options are set. These options are introduced below.
  • DUMPFILE: Selecting this option over OUTFILE results in the query results being written as a single line, omitting column or line terminations. This is useful when exporting binary data such as a graphic or a Word file. Keep in mind that you cannot choose OUTFILE when exporting a binary file, or the file will be corrupted. Also, note that a DUMPFILE query must target a single row; combining output from two binary files doesn't make any sense, and an error will be returned if you attempt it. Specifically, the error returned is, "Result consisted of more than one row."
  • EXPORT OPTIONS: The export options determine how the table fields and lines will be delimited in the outfile. Their syntax and rules match exactly those used in LOAD DATA INFILE, introduced earlier in this chapter. Rather than repeat this information, please see the earlier section "Importing Data with LOAD DATA INFILE" for a complete dissertation.

Usage Tips

There are several items worth noting regarding use of SELECT INTO OUTFILE:

  • If a target file path is not specified, the directory of the present database is used.
  • The executing user must possess the selection privilege (SELECT_PRIV) for the target table(s).
  • If a target file path is specified, the MySQL daemon owner must possess adequate privileges to write to the target directory.
  • The process leaves the target file world-readable and -writeable, an unexpected side effect. Therefore, if you're scripting the backup process, you'll probably want to change the file permissions programmatically once the query has completed.
  • The query will fail if the target text file already exists.
  • Export options cannot be included if the target text file is a dump file.

A Simple Data Export Example

Suppose you want to export December, 2007 sales data to a tab-delimited text file, consisting of lines delimited by newline characters:

SELECT * INTO OUTFILE "/backup/corporate/sales/1207.txt"
       FIELDS TERMINATED BY ' ' LINES TERMINATED BY ' '
       FROM corporate.sales
       WHERE MONTH(order_time) = '12' AND YEAR(order_time) = '2007';

Assuming that the executing user has SELECT privileges for the sales table found in the corporate database, and the MySQL daemon process owner can write to the /backup/corporate/sales/ directory, the file 1207.txt will be created, with the following data written to it:

12309  45633  2007-12-19  01:13:42  22.04  5.67    27.71
12310  942    2007-12-19  01:15:12  11.50  3.40    14.90
12311  7879   2007-12-19  01:15:22  95.99  15.00   110.99
12312  55521  2007-12-19  01:30:45  10.75  3.00    13.75

Note that the spacing found between each column does not consist of white space, but rather is due to the tab ( ) character. Also, at the conclusion of each line is the invisible newline ( ) character.

Exporting MySQL Data to Microsoft Excel

Of course, by itself, outputting data to a text file really doesn't accomplish anything except migrate it to a different format. So how do you do something with the data? For instance, suppose employees in the marketing department would like to draw a parallel between a recent holiday sales campaign and a recent rise in sales. To do so, they require the sales data for the month of December. To sift through the data, they'd like it provided in Excel format. Because Excel can convert delimited text files into spreadsheet format, you execute the following query:

SELECT * INTO OUTFILE "/analysis/sales/1207.xls"
   FIELDS TERMINATED BY ' ', LINES TERMINATED BY ' ' FROM corporate.sales
   WHERE MONTH(order_time) = '7' YEAR(order_time) = '2007';

This file is then retrieved via a predefined folder located on the corporate intranet, and opened in Microsoft Excel. A window similar to Figure 38-1 will appear.

If it isn't already selected, choose the Delimited radio button, and click Next to proceed to the next window, the second step of the Text Import Wizard. That window is shown in Figure 38-2.

image

Figure 38-1. Microsoft Excel's Text Import Wizard

In the second step of the Text Import Wizard, choose the cell delimiter specified in the SELECT INTO OUTFILE statement. Clicking Next takes you to the final screen, where you have the opportunity to convert any of the imported columns to one of Excel's supported data formats; this task is not always necessary, but consider experimenting with the supported formats in case there is something more applicable for your particular situation. Click Finish, and the data will open in normal Excel fashion.

image

Figure 38-2. Choosing the delimiter in Excel

Summary

MySQL's data import and export utilities offer powerful solutions for migrating data to and from your MySQL database. Using them effectively can mean the difference between a maintenance nightmare and a triviality.

This concludes the book. Best of luck!

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.217.208.72