Back in the Stone Age, cavemen never really had any issues with data incompatibility, as stones and one's own memory were the only storage media. Copying data involved pulling out the old chisel and getting busy on a new slab of granite. Now, of course, the situation is much different. Hundreds of data storage strategies exist, the most commonplace of which include spreadsheets and various types of databases. Because these technologies are often combined in complex, even convoluted ways, you'll frequently need to convert data from one storage type to another, say between a spreadsheet and a relational database, or between an Oracle database and MySQL. If this is done poorly, you could spend hours, even days or weeks, massaging the converted data into a usable format. This chapter seeks to eliminate that conundrum by introducing MySQL's data import and export utilities, as well as various techniques and concepts central to lessening the pain involved in performing such tasks.
By the conclusion of this chapter, you will be familiar with the following topics:
The SELECT INTO OUTFILE SQL statement
The LOAD DATA INFILE SQL statement
The mysqlimport utility

Before delving into the core topics, take a moment to review the sample data used as the basis for examples presented in this chapter. Afterward, several basic concepts surrounding MySQL's import and export strategies are introduced.
If you would like to execute the examples as you proceed, create the following sales table, which will be the focus of several examples in this chapter:
CREATE TABLE sales (
id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
client_id SMALLINT UNSIGNED NOT NULL,
order_time TIMESTAMP NOT NULL,
sub_total DECIMAL(8,2) NOT NULL,
shipping_cost DECIMAL(8,2) NOT NULL,
total_cost DECIMAL(8,2) NOT NULL,
PRIMARY KEY(id)
);
This table is used to track basic sales information. Although it lacks many of the columns you might find in a real-world implementation, the additional detail is omitted in an attempt to keep the focus on the concepts introduced in this chapter.
Even if you're a budding programmer, you're probably already quite familiar with software's exacting demands when it comes to data. All i's must be dotted and all t's must be crossed; a single out-of-place character is enough to produce unexpected results. Therefore, you can imagine the issues that might arise when attempting to convert data from one format to another. Thankfully, a particularly convenient formatting strategy has become commonplace: delimitation.
Information structures like database tables and spreadsheets share a similar conceptual organization. Each is broken down into rows and columns, each of which is further broken down into cells. Therefore, you can convert between formats as long as you institute a set of rules for determining how the columns, rows, and cells are recognized. An application utility will read in this data and, based on these rules, make the conversions necessary for adapting the data to its own formatting standards. Typically, a character or a character sequence is used as a delimiter, separating each cell within a row, and each row from the following row. For example, the sales
table might be delimited in a format that separates each field by a comma and each row by a newline character:
12309,45633,2007-12-19 01:13:42,22.04,5.67,27.71
12310,942,2007-12-19 01:15:12,11.50,3.40,14.90
12311,7879,2007-12-19 01:15:22,95.99,15.00,110.99
12312,55521,2007-12-19 01:30:45,10.75,3.00,13.75
Many data import and export utilities, including MySQL's, revolve around the concept of data delimitation.
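To make the delimitation rules concrete, here is a short sketch (in Python, chosen purely for illustration) that applies the comma-and-newline rules described above to two of the sample sales rows, recovering the row-and-cell structure any import utility must reconstruct:

```python
import csv
import io

# Two of the comma-delimited sales rows shown above: a newline ends
# each row, and a comma separates each cell within a row.
raw = """12309,45633,2007-12-19 01:13:42,22.04,5.67,27.71
12310,942,2007-12-19 01:15:12,11.50,3.40,14.90"""

# csv.reader applies the delimiting rules, converting the flat text
# back into rows of cells.
rows = list(csv.reader(io.StringIO(raw), delimiter=","))

for row in rows:
    print(row)
```

Any utility that imports delimited data performs some variation of this parsing step before mapping the recovered cells onto its own storage format.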
If you've invested the time to read this book, you likely are planning to make MySQL a significant part of your application infrastructure. Therefore, you're probably already thinking about how you're going to migrate existing data, possibly from several sources, into your new database environment. In this section, you'll learn about the two built-in tools MySQL offers for importing delimited data sets into a table: LOAD DATA INFILE and mysqlimport.
Tip You might consider using the mysqlimport client in lieu of LOAD DATA INFILE when you need to create batch imports executed from a cron job.
Importing Data with LOAD DATA INFILE

The LOAD DATA INFILE statement, a command that is executed much like a query within the mysql client, is used to import delimited text files into a MySQL table. Its generalized syntax follows:
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[CHARACTER SET charset_name]
[FIELDS
[TERMINATED BY 'character'] [[OPTIONALLY] ENCLOSED BY 'character']
[ESCAPED BY 'character']
]
[LINES
[STARTING BY 'character'] [TERMINATED BY 'character']
]
[IGNORE number LINES]
[SET column_name = expression, ...]
Certainly one of MySQL's longer query commands you've seen thus far, isn't it? Yet it's this wide array of options that makes this feature so powerful. Each option is introduced next:
LOW_PRIORITY: This option forces execution of the command to be delayed until no other clients are reading from the table.

CONCURRENT: Used in conjunction with a MyISAM table, this option allows other threads to retrieve data from the target table while the command is executing.

LOCAL: This option declares that the target infile must reside on the client side. If omitted, the target infile must reside on the same server hosting the MySQL database. When LOCAL is used, the path to the file can be either absolute or relative according to the present location. When omitted, the path can be absolute, local, or, if not present, assumed to reside in MySQL's designated database directory or in the presently chosen database directory.

REPLACE: This option results in the replacement of existing rows with new rows possessing identical primary or unique keys.

IGNORE: Including this option has the opposite effect of REPLACE. Read-in rows with primary or unique keys matching an existing table row will be ignored.

CHARACTER SET charset_name: MySQL will presume the input file contains characters matching the character set assigned to the system variable character_set_database. If the characters do not match this setting, use this option to identify the file's character set.

FIELDS TERMINATED BY 'character': This option signals how fields will be terminated. Therefore, TERMINATED BY ',' means that each field will end with a comma, like so:

12312,55521,2007-12-19 01:30:45,10.75,3.00,13.75
The exception is the final field on each line, which is instead delimited by the character defined by the LINES TERMINATED BY 'character' option. Encountering the character specified by that option by default also delimits the last field in the file, as well as signals to the command that a new line (row) is about to begin.

[OPTIONALLY] ENCLOSED BY 'character': This option signals that each field will be enclosed by a particular character. This does not eliminate the need for a terminating character. Revising the previous example, using the option FIELDS TERMINATED BY ',' ENCLOSED BY '"' implies that each field is enclosed by a pair of double quotes and delimited by a comma, like so:

"12312","55521","2007-12-19 01:30:45","10.75","3.00","13.75"
The OPTIONALLY flag denotes that only character strings require enclosure by the specified character pattern. Fields containing only integers, floats, and so on need not be enclosed.

ESCAPED BY 'character': If the character denoted by the ENCLOSED BY option appears within any of the fields, it must be escaped to ensure that the field is not incorrectly read in. However, this escape character must be defined by ESCAPED BY so that it can be recognized by the command. For example, FIELDS TERMINATED BY ',' ENCLOSED BY '\'' ESCAPED BY '\\' would allow the following fields to be properly parsed:

'[email protected]','Excellent product! I\'ll return soon!','2007-12-20'

Note that because the backslash is treated by MySQL as a special character, it must itself be escaped ('\\') when declared in the ESCAPED BY clause.

LINES:
The following two options are pertinent to how lines are started and terminated, respectively:
STARTING BY 'character': This option defines the character intended to signal the beginning of a line, and thus a new table row. Use of this option is generally skipped in preference to the next option.

TERMINATED BY 'character': This option defines the character intended to signal the conclusion of a line, and thus the end of a table row. Although it could conceivably be anything, this character is most often the newline (\n) character. In many Windows-based files, the line ending is instead represented as \r\n.

IGNORE number LINES: This option tells the command to ignore the first number lines of the file. This is useful when the target file contains header information.

[(column_name, ...)] and [SET column_name = expression, ...]:
If the number of fields located in the target file does not match the number of fields in the target table, you need to specify exactly which columns are to be filled in by the file data. For example, if the target file containing sales information consists of only four fields, id, client_id, order_time, and total_cost, rather than the six fields used in prior examples (id, client_id, order_time, sub_total, shipping_cost, and total_cost), yet in the target table all six fields remain, the command would have to be written like so:

LOAD DATA LOCAL INFILE "sales.txt"
INTO TABLE sales (id, client_id, order_time, total_cost);
Keep in mind that such attempts could fail should one or several of the missing columns be designated as NOT NULL
in the table schema. On such occasions, you need to either designate DEFAULT
values for the missing columns or further manipulate the data file into an acceptable format.
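One way to "further manipulate the data file into an acceptable format" is a small preprocessing pass that pads each four-field row out to the table's six columns before running LOAD DATA INFILE. The Python sketch below is only an illustration; the 0.00 placeholder values are assumptions, not values taken from this chapter:

```python
import csv
import io

# A four-field input row: id, client_id, order_time, total_cost.
src = io.StringIO("12312,55521,2007-12-19 01:30:45,13.75\n")
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")

for id_, client_id, order_time, total_cost in csv.reader(src):
    # Pad the missing NOT NULL columns (sub_total, shipping_cost)
    # with placeholder values so each row matches the six-column
    # sales table layout.
    writer.writerow([id_, client_id, order_time, "0.00", "0.00", total_cost])

print(out.getvalue())
```

In practice you would read from and write to real files, then point LOAD DATA INFILE at the padded output.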
You can also set columns to variables such as the current timestamp. For example, presume the sales table was modified to include an additional column named added_to_table
:
LOAD DATA LOCAL INFILE "sales.txt"
INTO TABLE sales (id, client_id, order_time, total_cost)
SET added_to_table = CURRENT_TIMESTAMP;
Tip If you would like the order of the fields located in the target file to be rearranged as they are read in for insertion into the table, you can do so by rearranging the order via the [(column_name, ...)] option.
A Simple Data Import Example
This example is based upon the ongoing sales theme. Suppose you want to import a file titled productreviews.txt
, which contains the following information:
'43','[email protected]','I love the new Website!'
'44','[email protected]','Why don\'t you sell shoes?'
'45','[email protected]','The search engine works great!'
The target table, aptly titled productreviews
, consists of three fields, and they are in the same order (comment_id, email, comment)
as the information found in productreviews.txt:
LOAD DATA LOCAL INFILE 'productreviews.txt' INTO TABLE productreviews
FIELDS TERMINATED BY ',' ENCLOSED BY '\'' ESCAPED BY '\\'
LINES TERMINATED BY '\n';
Once the import is completed, the productreviews
table will look like this:
+------------+---------------------------+--------------------------------+
| comment_id | email | comment |
+------------+---------------------------+--------------------------------+
| 43 | [email protected] | I love the new Website! |
| 44 | [email protected] | Why don't you sell shoes? |
| 45 | [email protected] | The search engine works great! |
+------------+---------------------------+--------------------------------+
Choosing the Target Database
You might have noticed that the preceding example referenced the target table but did not clearly define the target database. The reason is that LOAD DATA INFILE
assumes that the target table resides in the currently selected database. Alternatively, you can specify the target database by prefixing it with the database name, like so:
LOAD DATA LOCAL INFILE 'productreviews.txt' INTO TABLE corporate.productreviews;
If you execute LOAD DATA INFILE
before choosing a database, or without explicitly specifying the database in the query syntax, an error will occur.
Security and LOAD DATA INFILE
There are a few security issues that you should keep in mind regarding LOAD DATA INFILE:
If LOCAL is not used (meaning the file will be read from the server side), the executing user must possess the FILE privilege.

To disable use of LOAD DATA LOCAL INFILE, start the MySQL daemon with the --local-infile=0 option.

Importing Data with mysqlimport

The mysqlimport client is really just a command-line version of the LOAD DATA INFILE SQL query. Its general syntax follows:
mysqlimport [options] database textfile1 [textfile2 ... textfileN]
Useful Options
Before reviewing any examples, take a moment to review many of the most commonly used mysqlimport
options:
--columns, -c: This option should be used when the number or ordering of the fields in the target file does not match that found in the table. For example, suppose you were inserting the following target file, which orders the fields as id, order_id, sub_total, shipping_cost, total_cost, and order_time:

45633,12309,22.04,5.67,27.71,2007-12-19 01:13:42
942,12310,11.50,3.40,14.90,2007-12-19 01:15:12
7879,12311,95.99,15.00,110.99,2007-12-19 01:15:22

The sales table presented at the beginning of this chapter lists the fields in this order: id, client_id, order_time, sub_total, shipping_cost, and total_cost. You can rearrange the input fields during the parsing process so that the data is inserted in the proper location, by including this option:

--columns=id,order_id,sub_total,shipping_cost,total_cost,order_time
--compress, -C: Including this option compresses the data flowing between the client and the server, assuming that both support compression. This option is most effective if you're loading a target file that does not reside on the same server as the database.

--debug, -#: This option is used to create trace files when debugging.

--delete, -d: This option deletes the target table's contents before importing the target file's data.

--fields-terminated-by=, --fields-enclosed-by=, --fields-optionally-enclosed-by=, --fields-escaped-by=: These four options determine mysqlimport's behavior in terms of how both fields and lines are recognized during the parsing procedure. See the section "Importing Data with LOAD DATA INFILE," earlier in this chapter, for a complete introduction.

--force, -f: Including this option causes mysqlimport to continue execution even if errors occur during execution.

--help, -?: Including this option generates a short help file and a comprehensive list of the options discussed in this section.

--host, -h: This option specifies the server location of the target database. The default is localhost.

--ignore, -i: This option causes mysqlimport to ignore any rows located in the target file that share the same primary or unique key as a row already located in the table.

--ignore-lines=n: This option tells mysqlimport to ignore the first n lines of the target file. It's useful when the target file contains header information that should be disregarded.

--lines-terminated-by=: This option determines how mysqlimport will recognize each separate line in the file. See the section "Importing Data with LOAD DATA INFILE," earlier in this chapter, for a complete introduction.

--lock-tables, -l: This option write-locks all tables located in the target database for the duration of mysqlimport's execution.

--local, -L: This option specifies that the target file is located on the client. By default, it is assumed that this file is located on the database server; therefore, you need to include this option if you're executing this command remotely and have not uploaded the file to the server.

--low-priority: This option delays execution of mysqlimport until no other clients are reading from the table.

--password=your_password, -p your_password: This option is used to specify the password component of your authentication credentials. If the your_password part of this option is omitted, you will be prompted for the password.

--port, -P: If the target MySQL server is running on a nonstandard port (MySQL's standard port is 3306), you need to specify that port value with this option.

--replace, -r: This option causes mysqlimport to overwrite any rows located in the target file that share the same primary or unique key as a row already located in the table.

--silent, -s: This option tells mysqlimport to output only error information.

--socket, -S: This option should be included if a nondefault socket file had been declared when the MySQL server was started.

--ssl: This option specifies that SSL should be used for the connection. This would be used in conjunction with several other options that aren't listed here. See Chapter 29 for more information about SSL and the various options used to configure this feature.

--user, -u: By default, mysqlimport compares the name/host combination of the executing system user to the mysql privilege tables, ensuring that the executing user possesses adequate permissions to carry out the requested operation. Because it's often useful to perform such procedures under the guise of another user, you can specify the "user" component of credentials with this option.

--verbose, -v: This option causes mysqlimport to output a host of potentially useful information pertinent to its behavior.

--version, -V: This option causes mysqlimport to output version information and exit.

mysqlimport Data Import Examples
A simple demonstration of mysqlimport
involves the update of inventory audit information from the workstation of an accountant located in a company's fiscal department to the database:
%>mysqlimport -h intranet.example.com -u fiscal -p --replace \
--compress --local company inventory.txt

This command results in the compression and transmission of the data found in the local text file (C:\audit\inventory.txt) to the table inventory located in the company database. Note that mysqlimport strips the extension from each text file and uses the resulting name as the table into which to import the text file's contents.
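That naming rule is easy to reproduce: the table name is simply the file name with its directory and extension stripped. A minimal Python sketch of the derivation (illustrative only; this is not code from mysqlimport itself):

```python
import os

def target_table(path):
    """Derive the table name the way mysqlimport does: drop the
    directory portion and the file extension."""
    return os.path.splitext(os.path.basename(path))[0]

print(target_table("/ftp/uploads/products.txt"))  # -> products
```

This is why the file in the example above must be named inventory.txt: the base name determines the target table.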
Writing a mysqlimport Script
Some years ago, I was involved in the creation of a corporate Web site for a pharmaceutical corporation that, among other things, allowed buyers to browse descriptions and pricing information for roughly 10,000 products. This information was maintained on a mainframe, and the data was synchronized on a regular basis to the MySQL database residing on the Web server. To accomplish this, a one-way trust was created between the machines, along with two shell scripts. The first script, located on the mainframe, was responsible for dumping the data (in delimited format) from the mainframe and then pushing this data file via sftp to the Web server. The second script, located on the Web server, was responsible for executing mysqlimport
, loading this file to the MySQL database. This script was quite trivial to create, and looked like this:
#!/bin/sh
/usr/local/mysql/bin/mysqlimport --delete --silent \
--fields-terminated-by='\t' --lines-terminated-by='\n' \
products /ftp/uploads/products.txt
To keep the logic involved to a bare minimum, a complete dump of the entire mainframe database was executed each night, and the entire MySQL table was deleted before beginning the import. This ensured that all new products were added, existing product information was updated to reflect changes, and any products that were deleted were removed. To prevent the credentials from being passed in via the command line, a system user named productupdate was created, and a .my.cnf file was placed in the user's home directory, which looked like this:
[client]
host=localhost
user=productupdate
password=secret
The permissions and ownership on this file were changed, setting the owner to mysql
and allowing only the mysql
user to read the file. The final step involved adding the necessary information to the productupdate
user's crontab
, which executed the script each night at 2 a.m. The system ran flawlessly from the first day.
For security reasons, ISPs often disallow the use of LOAD DATA INFILE
, as well as many of MySQL's packaged clients like mysqlimport
. However, such limitations do not necessarily mean that you are out of luck when it comes to importing data, because you can mimic LOAD DATA INFILE
and mysqlimport
functionality using a PHP script. The following script uses PHP's file-handling functionality and a handy function known as fgetcsv()
to open and parse the delimited sales data found at the beginning of this chapter:
<?php
// Connect to the MySQL server and select the corporate database
$mysqli = new mysqli("localhost","someuser","secret","corporate");
// Open and parse the sales.csv file
$fh = fopen("sales.csv", "r");
while ($line = fgetcsv($fh, 1000, ","))
{
$id = $line[0];
$client_id = $line[1];
$order_time = $line[2];
$sub_total = $line[3];
$shipping_cost = $line[4];
$total_cost = $line[5];
// Insert the data into the sales table
$query = "INSERT INTO sales SET id='$id',
client_id='$client_id', order_time='$order_time',
sub_total='$sub_total', shipping_cost='$shipping_cost',
total_cost='$total_cost'";
$result = $mysqli->query($query);
}
fclose($fh);
$mysqli->close();
?>
Keep in mind that execution of such a script might time out before completing the insertion of a particularly large data set. If you think that this might be the case, set PHP's max_execution_time
configuration directive at the beginning of the script. Alternatively, consider using PHP or Perl or another solution to do the job from the command line.
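If neither LOAD DATA INFILE nor mysqlimport is available, another command-line approach is to translate the delimited file into plain INSERT statements and pipe them into the mysql client. The following Python sketch generates the statements for the sales table used throughout this chapter; the minimal quote escaping is illustrative only, and a real script should rely on a database driver's parameter binding instead:

```python
import csv
import io

COLUMNS = ("id", "client_id", "order_time", "sub_total",
           "shipping_cost", "total_cost")

def row_to_insert(row):
    # Minimal escaping for backslashes and single quotes; parameter
    # binding via a driver is the safer choice in production code.
    escaped = [v.replace("\\", "\\\\").replace("'", "\\'") for v in row]
    values = ", ".join("'%s'" % v for v in escaped)
    return "INSERT INTO sales (%s) VALUES (%s);" % (", ".join(COLUMNS), values)

sample = "12310,942,2007-12-19 01:15:12,11.50,3.40,14.90\n"
statements = [row_to_insert(r) for r in csv.reader(io.StringIO(sample))]
print(statements[0])
```

The generated statements can then be redirected into the mysql client in a single batch.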
The next section switches directions of the data flow, explaining how to export data from MySQL into other formats.
As your computing environment grows increasingly complex, you'll probably need to share your data among various disparate systems and applications. Sometimes you won't be able to cull this information from a central source; rather, it must be constantly retrieved from the database, prepped for conversion, and finally converted into a format recognized by the target. This section shows you how to easily export MySQL data using the SQL statement SELECT INTO OUTFILE.
Note Another commonly used data export tool, mysqldump, is introduced in Chapter 27. Although officially it's intended for data backup, it serves a secondary purpose as a great tool for creating data export files.
The SELECT INTO OUTFILE
SQL statement is actually a variant of the SELECT
query. It's used when you want to direct query output to a text file. This file can then be opened by a spreadsheet application, or imported into another database like Microsoft Access, Oracle, or any other software that supports delimitation. Its general syntax format follows:
SELECT [SELECT OPTIONS] INTO {OUTFILE | DUMPFILE} filename
EXPORT_OPTIONS
FROM TABLES [ADDITIONAL SELECT OPTIONS]
The following list summarizes the key options:
OUTFILE: Selecting this option causes the query result to be output to the text file. The formatting of the query result is dependent upon how the export options are set. These options are introduced below.

DUMPFILE: Selecting this option over OUTFILE results in the query results being written as a single line, omitting column or line terminations. This is useful when exporting binary data such as a graphic or a Word file. Keep in mind that you cannot choose OUTFILE when exporting a binary file, or the file will be corrupted. Also, note that a DUMPFILE query must target a single row; combining output from two binary files doesn't make any sense, and an error will be returned if you attempt it. Specifically, the error returned is, "Result consisted of more than one row."

EXPORT OPTIONS: The export options determine how the table fields and lines will be delimited in the outfile. Their syntax and rules match exactly those used in LOAD DATA INFILE, introduced earlier in this chapter. Rather than repeat this information, please see the earlier section "Importing Data with LOAD DATA INFILE" for a complete dissertation.

Usage Tips
There are several items worth noting regarding use of SELECT INTO OUTFILE:

The executing user must possess the SELECT privilege (SELECT_PRIV) for the target table(s).

The target file is created on the database server host, and it must not already exist; MySQL will not overwrite an existing file.

Suppose you want to export December, 2007 sales data to a tab-delimited text file, consisting of lines delimited by newline characters:
SELECT * INTO OUTFILE "/backup/corporate/sales/1207.txt"
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
FROM corporate.sales
WHERE MONTH(order_time) = '12' AND YEAR(order_time) = '2007';
Assuming that the executing user has SELECT privileges for the sales table found in the corporate database, and the MySQL daemon process owner can write to the /backup/corporate/sales/ directory, the file 1207.txt will be created, with the following data written to it:
12309 45633 2007-12-19 01:13:42 22.04 5.67 27.71
12310 942 2007-12-19 01:15:12 11.50 3.40 14.90
12311 7879 2007-12-19 01:15:22 95.99 15.00 110.99
12312 55521 2007-12-19 01:30:45 10.75 3.00 13.75
Note that the spacing found between each column does not consist of white space, but rather is due to the tab (\t) character. Also, at the conclusion of each line is the invisible newline (\n) character.
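Any tab-aware tool can now consume the export. As a quick sanity check, the following Python sketch re-parses the four exported rows (reproduced inline, with \t standing in for the literal tab character) and totals the final column:

```python
import csv
import io

# The exported rows: tab-separated fields, newline-terminated lines.
export = ("12309\t45633\t2007-12-19 01:13:42\t22.04\t5.67\t27.71\n"
          "12310\t942\t2007-12-19 01:15:12\t11.50\t3.40\t14.90\n"
          "12311\t7879\t2007-12-19 01:15:22\t95.99\t15.00\t110.99\n"
          "12312\t55521\t2007-12-19 01:30:45\t10.75\t3.00\t13.75\n")

rows = list(csv.reader(io.StringIO(export), delimiter="\t"))

# Sum the total_cost column to confirm the parse recovered every cell.
grand_total = round(sum(float(r[-1]) for r in rows), 2)
print(grand_total)  # -> 167.35
```

The same parsing step is what a spreadsheet performs behind the scenes when it opens a tab-delimited file.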
Exporting MySQL Data to Microsoft Excel
Of course, by itself, outputting data to a text file really doesn't accomplish anything except migrate it to a different format. So how do you do something with the data? For instance, suppose employees in the marketing department would like to draw a parallel between a recent holiday sales campaign and a recent rise in sales. To do so, they require the sales data for the month of December. To sift through the data, they'd like it provided in Excel format. Because Excel can convert delimited text files into spreadsheet format, you execute the following query:
SELECT * INTO OUTFILE "/analysis/sales/1207.xls"
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
FROM corporate.sales
WHERE MONTH(order_time) = '12' AND YEAR(order_time) = '2007';
This file is then retrieved via a predefined folder located on the corporate intranet, and opened in Microsoft Excel. A window similar to Figure 38-1 will appear.
If it isn't already selected, choose the Delimited radio button, and click Next to proceed to the next window, the second step of the Text Import Wizard. That window is shown in Figure 38-2.
Figure 38-1. Microsoft Excel's Text Import Wizard
In the second step of the Text Import Wizard, choose the cell delimiter specified in the SELECT INTO OUTFILE
statement. Clicking Next takes you to the final screen, where you have the opportunity to convert any of the imported columns to one of Excel's supported data formats; this task is not always necessary, but consider experimenting with the supported formats in case there is something more applicable for your particular situation. Click Finish, and the data will open in normal Excel fashion.
Figure 38-2. Choosing the delimiter in Excel
MySQL's data import and export utilities offer powerful solutions for migrating data to and from your MySQL database. Using them effectively can mean the difference between a maintenance nightmare and a triviality.
This concludes the book. Best of luck!