Back in the Stone Age, cavemen never really had any issues with data incompatibility—stones and one’s own memory were the only storage media. Copying data involved pulling out the old chisel and getting busy on a new slab of granite. Now, of course, the situation is much different. Hundreds of data storage strategies exist, the most commonplace of which includes spreadsheets and various types of relational databases. Working in a complex, even convoluted fashion, you often need to convert data from one storage type to another, say between a spreadsheet and a database, or between an Oracle database and MySQL. If this is done poorly, you could spend hours, and even days and weeks, massaging the converted data into a usable format. This chapter seeks to eliminate that conundrum by introducing MySQL’s data import and export utilities, as well as various techniques and concepts central to lessening the pain involved in performing such tasks.
Common data-formatting standards recognized by most mainstream storage products
The SELECT INTO OUTFILE SQL statement
The LOAD DATA INFILE SQL statement
The mysqlimport utility
How to use PHP to mimic MySQL’s built-in import utilities
Before delving into the core topics, take a moment to review the sample data used as the basis for examples presented in this chapter. Afterward, several basic concepts surrounding MySQL’s import and export strategies are introduced.
Sample Table
This table is used to track basic sales information. Although it lacks many of the columns you might find in a real-world implementation, the additional detail is omitted in an attempt to keep the focus on the concepts introduced in this chapter.
Using Data Delimitation
Even if you’re a budding programmer, you’re probably already quite familiar with software’s exacting demands when it comes to data. All i’s must be dotted and all t’s must be crossed, with a single misplaced character enough to produce unexpected results. Therefore, you can imagine the issues that might arise when attempting to convert data from one format to another. Thankfully, a particularly convenient formatting strategy has become commonplace: delimitation.
Of course, the newline character would be invisible when viewing the file from within a text editor; I am just displaying it here for reason of illustration. Many data import and export utilities, including MySQL’s, revolve around the concept of data delimitation.
Importing Data
In this section, you’ll learn about the two built-in tools MySQL offers for importing delimited data sets into a table: LOAD DATA INFILE and mysqlimport.
Tip
You might consider using the mysqlimport client in lieu of LOAD DATA INFILE when you need to create batch imports executed from a cron job.
Importing Data with LOAD DATA INFILE
LOW PRIORITY : This option forces execution of the command to be delayed until no other clients are reading from the table.
CONCURRENT : Used in conjunction with a MyISAM table, this option allows other threads to retrieve data from the target table while the command is executing.
LOCAL : This option declares that the target infile must reside on the client side. If omitted, the target infile must reside on the same server hosting the MySQL database. When LOCAL is used, the path to the file can be either absolute or relative according to the present location. When omitted, the path can be absolute; local; or, if not present, assumed to reside in MySQL’s designated database directory or in the presently chosen database directory.
REPLACE : This option results in the replacement of existing rows with new rows possessing identical primary or unique keys.
IGNORE : Including this option has the opposite effect of REPLACE. Read-in rows with primary or unique keys matching an existing table row will be ignored.
CHARACTER SET charset_name : MySQL will presume the input file contains characters matching the character set assigned to the system variable character_set_database . If the characters do not match this setting, use this option to identify the file’s character set.
FIELDS TERMINATED BY 'character' : This option signals how fields will be terminated. Therefore, FIELDS TERMINATED BY ',' means that each field will end with a comma, like so:
[OPTIONALLY] ENCLOSED BY 'character' : This option signals that each field will be enclosed by a particular character. This does not eliminate the need for a terminating character. Revising the previous example, using the option FIELDS TERMINATED BY ',' ENCLOSED BY '"' implies that each field is enclosed by a pair of double quotes and delimited by a comma, like so:
- ESCAPED BY 'character' : If the character denoted by the ENCLOSED BY option appears within any of the fields, it must be escaped to ensure that the field is not incorrectly read in. However, this escape character must be defined by ESCAPED BY so that it can be recognized by the command. For example, FIELDS TERMINATED BY ',' ENCLOSED BY "' ESCAPED BY '' would allow the following fields to be properly parsed:'[email protected]','Excellent product! I'll return soon!','2010-12-20'
Note that because the backslash is treated by MySQL as a special character, you need to escape any instance of it by prefixing it with another backslash in the ESCAPED BY clause .
- LINES : The following two options are pertinent to how lines are started and terminated, respectively:
STARTING BY 'character' : This option defines the character intended to signal the beginning of a line, and thus a new table row. Use of this option is generally skipped in preference to the next option.
TERMINATED BY 'character' : This option defines the character intended to signal the conclusion of a line, and thus the end of a table row. Although it could conceivably be anything, this character is most often the newline ( ) character. In many Windows-based files, the newline character is often represented as .
IGNORE number LINES : This option tells the command to ignore the first x lines. This is useful when the target file contains header information.
[(SET column_name = expression,...)] : If the number of fields located in the target file does not match the number of fields in the target table, you need to specify exactly which columns are to be filled in by the file data. For example, if the target file containing sales information consists of only four fields (id, client_id, order_time, and total_cost) rather than the six fields used in prior examples (id, client_id, order_time, sub_total, shipping_cost, and total_cost), yet in the target table all six fields remain, the command would have to be written like so:
Keep in mind that such attempts could fail should one or several of the missing columns be designated as NOT NULL in the table schema. On such occasions, you need to either designate DEFAULT values for the missing columns or further manipulate the data file into an acceptable format.
Tip
If you would like the order of the fields located in the target file to be rearranged as they are read in for insertion into the table, you can do so by rearranging the order via the [(column_name, ...)] option .
A Simple Data Import Example
Choosing the Target Database
If you execute LOAD DATA INFILE before choosing a database, or without explicitly specifying the database in the query syntax, an error will occur.
Security and LOAD DATA INFILE
If LOCAL is not used, the executing user must possess the FILE privilege. This is due to the potential implications of allowing the user to read a file residing on the server, which must either reside in the database directory or be world-readable.
To disable LOAD DATA LOCAL INFILE , start the MySQL daemon with the --local-infile=0 option. You can later enable it as needed from the MySQL client by passing the --local-infile=1 option.
Importing Data with mysqlimport
Useful Options
- --columns , -c : This option should be used when the number or ordering of the fields in the target file does not match that found in the table. For example, suppose you were inserting the following target file, which orders the fields as id, order_id, sub_total, shipping_cost, total_cost, and order_time:45633,12309,22.04,5.67,27.71,2010-12-19 01:13:42942,12310,11.50,3.40,14.90,2010-12-19 01:15:127879,12311,95.99,15.00,110.99,2010-12-19 01:15:22
- Yet the sales table presented at the beginning of this chapter lists the fields in this order: id, client_id, order_time, sub_total, shipping_cost, and total_cost. You can rearrange the input fields during the parsing process so that the data is inserted in the proper location, by including this option:--columns=id,order_id,sub_total,shipping_cost,total_cost,and order_time
--compress , -C : Including this option compresses the data flowing between the client and the server, assuming that both support compression. This option is most effective if you’re loading a target file that does not reside on the same server as the database.
--debug , -# : This option is used to create trace files when debugging.
--delete , -d : This option deletes the target table’s contents before importing the target file’s data.
--fields-terminated-by= , --fields-enclosed-by= , --fields-optionally-enclosed-by= , --fields-escaped-by= : These four options determine mysqlimport’s behavior in terms of how both fields and lines are recognized during the parsing procedure. See the section “Importing Data with LOAD DATA INFILE” earlier in this chapter for a complete introduction.
--force , -f : Including this option causes mysqlimport to continue execution even if errors occur during execution.
--help , -? : Including this option generates a short help file and a comprehensive list of the options discussed in this section.
--host , -h : This option specifies the server location of the target database. The default is localhost.
--ignore , -i : This option causes mysqlimport to ignore any rows located in the target file that share the same primary or unique key as a row already located in the table.
--ignore-lines=n : This option tells mysqlimport to ignore the first n lines of the target file. It’s useful when the target file contains header information that should be disregarded.
--lines-terminated-by= : This option determines how mysqlimport will recognize each separate line in the file. See the section “Importing Data with LOAD DATA INFILE” earlier in this chapter for a complete introduction.
--lock-tables , -l : This option write-locks all tables located in the target database for the duration of mysqlimport‘s execution.
--local , -L : This option specifies that the target file is located on the client. By default, it is assumed that this file is located on the database server; therefore, you need to include this option if you’re executing this command remotely and have not uploaded the file to the server.
--low-priority : This option delays execution of mysqlimport until no other clients are reading from the table.
--password=your_password , -pyour_password : This option is used to specify the password component of your authentication credentials. If the your_password part of this option is omitted, you will be prompted for the password.
--port , -P : If the target MySQL server is running on a nonstandard port (MySQL’s standard port is 3306), you need to specify that port value with this option.
--replace , -r : This option causes mysqlimport to overwrite any rows located in the target file that share the same primary or unique key as a row already located in the table.
--silent , -s : This option tells mysqlimport to output only error information.
--socket , -S : This option should be included if a nondefault socket file had been declared when the MySQL server was started.
--ssl : This option specifies that SSL should be used for the connection. This would be used in conjunction with several other options that aren’t listed here. See Chapter 29 for more information about SSL and the various options used to configure this feature.
--user , -u : By default, mysqlimport compares the name/host combination of the executing system user to the mysql privilege tables, ensuring that the executing user possesses adequate permissions to carry out the requested operation. Because it’s often useful to perform such procedures under the guise of another user, you can specify the “user” component of credentials with this option.
--verbose , -v : This option causes mysqlimport to output a host of potentially useful information pertinent to its behavior.
--version , -V : This option causes mysqlimport to output version information and exit.
This command results in the compression and transmission of the data found in the local text file (c:auditinventory.txt) to the table inventory located in the company database. Note that mysqlimport strips the extension from each text file and uses the resulting name as the table into which to import the text file’s contents.
Writing a mysqlimport Script
The permissions and ownership on this file were changed, setting the owner to mysql and allowing only the mysql user to read the file. The final step involved adding the necessary information to the productupdate user’s crontab, which executed the script each night at 2 a.m. The system ran flawlessly from the first day.
Loading Table Data with PHP
Keep in mind that execution of such a script might time out before completing the insertion of a particularly large dataset. If you think that this might be the case, set PHP’s max_execution_time configuration directive at the beginning of the script. Alternatively, consider using PHP, Perl, or another solution to do the job from the command line. The PHP-CLI version defaults max_execution_time to 0 and therefore there is no timeout. Input from files should be treated as any other input nd sanitized before use.
The next section switches directions of the data flow, explaining how to export data from MySQL into other formats.
Exporting Data
As your computing environment grows increasingly complex, you’ll probably need to share your data among various disparate systems and applications. Sometimes you won’t be able to cull this information from a central source; rather, it must be constantly retrieved from the database, prepped for conversion, and finally converted into a format recognized by the target. This section shows you how to easily export MySQL data using the SQL statement SELECT INTO OUTFILE .
Note
Another commonly used data export tool is mysqldump. Although officially it’s intended for data backup, it serves a secondary purpose as a great tool for creating data export files.
SELECT INTO OUTFILE
OUTFILE : Selecting this option causes the query result to be output to the text file. The formatting of the query result is dependent upon how the export options are set. These options are introduced below.
DUMPFILE : Selecting this option over OUTFILE results in the query results being written as a single line, omitting column or line terminations. This is useful when exporting binary data such as a graphic or a Word file. Keep in mind that you cannot choose OUTFILE when exporting a binary file, or the file will be corrupted. Also, note that a DUMPFILE query must target a single row; combining output from two binary files doesn’t make any sense, and an error will be returned if you attempt it. Specifically, the error returned is, “Result consisted of more than one row.”
EXPORT OPTIONS : The export options determine how the table fields and lines will be delimited in the outfile. Their syntax and rules match exactly those used in LOAD DATA INFILE , introduced earlier in this chapter. Rather than repeat this information, please see the earlier section “Importing Data with LOAD DATA INFILE” for a complete dissertation.
Usage Tips
If a target file path is not specified, the directory of the present database is used.
The executing user must possess the selection privilege (SELECT_PRIV) for the target table(s). Further, the user must possess the FILE privilege because this query will result in a file being written to the server.
If a target file path is specified, the MySQL daemon owner must possess adequate privileges to write to the target directory.
The process leaves the target file world-readable and -writeable, an unexpected side effect. Therefore, if you’re scripting the backup process, you’ll probably want to change the file permissions programmatically once the query has completed.
The query will fail if the target text file already exists.
Export options cannot be included if the target text file is a dump file.
A Simple Data Export Example
Note that the spacing found between each column does not consist of spaces, but rather is due to the tab ( ) character. Also, at the conclusion of each line is the invisible newline ( ) character.
Exporting MySQL Data to Microsoft Excel
Note that the file created is a tab-separated values file (TSV). It is possible to use tsv or xls as the file extension and Excel will be able to open both. This file is then retrieved via a predefined folder located on the corporate intranet, and opened in Microsoft Excel.
Summary
MySQL’s data import and export utilities offer powerful solutions for migrating data to and from your MySQL database. Using them effectively can mean the difference between a maintenance nightmare and a triviality.
This concludes the book. If, or rather, when you need more information or help about PHP and MySQL, you will find yourself searching for answers and examples. Both the PHP and MySQL online documentation are great sources for both technical documentation and examples. Many modern text editors include code completion and quick references to functions and parameters. Find an editor that fits your style and budget. Many of them offer a free version and subscription-based version with support and upgrades.
If you have questions, I highly recommend your local PHP Meetup or other user group. They exist all over the world and provide a great opportunity to share knowledge. Online code sharing services like GitHub ( https://github.com ) and Packagist ( https://packagist.org ) are great places to search for sample code and to share your own code.
Best of luck!