© Frank M. Kromann 2018
Frank M. KromannBeginning PHP and MySQLhttps://doi.org/10.1007/978-1-4302-6044-8_35

35. Importing and Exporting Data

Frank M. Kromann1 
(1)
Aliso Viejo, CA, USA
 

Back in the Stone Age, cavemen never really had any issues with data incompatibility—stones and one’s own memory were the only storage media. Copying data involved pulling out the old chisel and getting busy on a new slab of granite. Now, of course, the situation is much different. Hundreds of data storage strategies exist, the most commonplace of which includes spreadsheets and various types of relational databases. Working in a complex, even convoluted fashion, you often need to convert data from one storage type to another, say between a spreadsheet and a database, or between an Oracle database and MySQL. If this is done poorly, you could spend hours, and even days and weeks, massaging the converted data into a usable format. This chapter seeks to eliminate that conundrum by introducing MySQL’s data import and export utilities, as well as various techniques and concepts central to lessening the pain involved in performing such tasks.

By the conclusion of this chapter, you will be familiar with the following topics:
  • Common data-formatting standards recognized by most mainstream storage products

  • The SELECT INTO OUTFILE SQL statement

  • The LOAD DATA INFILE SQL statement

  • The mysqlimport utility

  • How to use PHP to mimic MySQL’s built-in import utilities

Before delving into the core topics, take a moment to review the sample data used as the basis for examples presented in this chapter. Afterward, several basic concepts surrounding MySQL’s import and export strategies are introduced.

Sample Table

If you would like to execute the examples as you proceed through the chapter, the following sales table will be the focus of several examples in this chapter:
CREATE TABLE sales (
   id SMALLINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
   client_id SMALLINT UNSIGNED NOT NULL,
   order_time TIMESTAMP NOT NULL,
   sub_total DECIMAL(8,2) NOT NULL,
   shipping_cost DECIMAL(8,2) NOT NULL,
   total_cost DECIMAL(8,2) NOT NULL
);

This table is used to track basic sales information. Although it lacks many of the columns you might find in a real-world implementation, the additional detail is omitted in an attempt to keep the focus on the concepts introduced in this chapter.

Using Data Delimitation

Even if you’re a budding programmer, you’re probably already quite familiar with software’s exacting demands when it comes to data. All i’s must be dotted and all t’s must be crossed, with a single misplaced character enough to produce unexpected results. Therefore, you can imagine the issues that might arise when attempting to convert data from one format to another. Thankfully, a particularly convenient formatting strategy has become commonplace: delimitation.

Information structures like database tables and spreadsheets share a similar conceptual organization. These structures are typically conceptualized as consisting of rows and columns, each of which is further broken down into cells. Therefore, you can convert between formats as long as you institute a set of rules for determining how the columns, rows, and cells are recognized. One of the most important rules involves the establishment of a character or a character sequence that will be used as a delimiter, separating each cell within a row, and each row from the following row. For example, the sales table might be delimited in a format that separates each field by a comma and each row by a newline character:
12309,45633,2010-12-19 01:13:42,22.04,5.67,27.71
12310,942,2010-12-19 01:15:12,11.50,3.40,14.90
12311,7879,2010-12-19 01:15:22,95.99,15.00,110.99
12312,55521,2010-12-19 01:30:45,10.75,3.00,13.75

Of course, the newline character would be invisible when viewing the file from within a text editor; I am just displaying it here for reason of illustration. Many data import and export utilities, including MySQL’s, revolve around the concept of data delimitation.

Importing Data

In this section, you’ll learn about the two built-in tools MySQL offers for importing delimited data sets into a table: LOAD DATA INFILE and mysqlimport.

Tip

You might consider using the mysqlimport client in lieu of LOAD DATA INFILE when you need to create batch imports executed from a cron job.

Importing Data with LOAD DATA INFILE

The LOAD DATA INFILE statement , a command that is executed much like a query typically from within the mysql client, is used to import delimited text files into a MySQL table. Its generalized syntax follows:
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[CHARACTER SET charset_name]
[FIELDS
   [TERMINATED BY 'character'] [[OPTIONALLY] ENCLOSED BY 'character']
   [ESCAPED BY 'character']
]
[LINES
   [STARTING BY 'character'] [TERMINATED BY 'character']
]
[IGNORE number lines]
[(column_name, ...)]
[SET column_name = expression, ...)]
Certainly, one of the longer MySQL query commands seen thus far, isn’t it? Yet it’s this wide array of options that makes it so powerful. Each option is introduced next:
  • LOW PRIORITY : This option forces execution of the command to be delayed until no other clients are reading from the table.

  • CONCURRENT : Used in conjunction with a MyISAM table, this option allows other threads to retrieve data from the target table while the command is executing.

  • LOCAL : This option declares that the target infile must reside on the client side. If omitted, the target infile must reside on the same server hosting the MySQL database. When LOCAL is used, the path to the file can be either absolute or relative according to the present location. When omitted, the path can be absolute; local; or, if not present, assumed to reside in MySQL’s designated database directory or in the presently chosen database directory.

  • REPLACE : This option results in the replacement of existing rows with new rows possessing identical primary or unique keys.

  • IGNORE : Including this option has the opposite effect of REPLACE. Read-in rows with primary or unique keys matching an existing table row will be ignored.

  • CHARACTER SET charset_name : MySQL will presume the input file contains characters matching the character set assigned to the system variable character_set_database . If the characters do not match this setting, use this option to identify the file’s character set.

  • FIELDS TERMINATED BY 'character' : This option signals how fields will be terminated. Therefore, FIELDS TERMINATED BY ',' means that each field will end with a comma, like so:

12312,55521,2010-12-19 01:30:45,10.75,3.00,13.75
The last field does not end in a comma because it isn’t necessary, as typically this option is used in conjunction with the LINES TERMINATED BY 'character' option . Encountering the character specified by this other option by default also delimits the last field in the file, as well as signals to the command that a newline (row) is about to begin.
  • [OPTIONALLY] ENCLOSED BY 'character' : This option signals that each field will be enclosed by a particular character. This does not eliminate the need for a terminating character. Revising the previous example, using the option FIELDS TERMINATED BY ',' ENCLOSED BY '"' implies that each field is enclosed by a pair of double quotes and delimited by a comma, like so:

"12312","55521","2010-12-19 01:30:45","10.75","3.00","13.75"
The optional OPTIONALLY flag denotes that character strings only require enclosure by the specified character pattern. Fields containing only integers, floats, and so on need not be enclosed.
  • ESCAPED BY 'character' : If the character denoted by the ENCLOSED BY option appears within any of the fields, it must be escaped to ensure that the field is not incorrectly read in. However, this escape character must be defined by ESCAPED BY so that it can be recognized by the command. For example, FIELDS TERMINATED BY ',' ENCLOSED BY "' ESCAPED BY '' would allow the following fields to be properly parsed:
    '[email protected]','Excellent product! I'll return soon!','2010-12-20'
  • Note that because the backslash is treated by MySQL as a special character, you need to escape any instance of it by prefixing it with another backslash in the ESCAPED BY clause .

  • LINES : The following two options are pertinent to how lines are started and terminated, respectively:
    • STARTING BY 'character' : This option defines the character intended to signal the beginning of a line, and thus a new table row. Use of this option is generally skipped in preference to the next option.

    • TERMINATED BY 'character' : This option defines the character intended to signal the conclusion of a line, and thus the end of a table row. Although it could conceivably be anything, this character is most often the newline ( ) character. In many Windows-based files, the newline character is often represented as .

  • IGNORE number LINES : This option tells the command to ignore the first x lines. This is useful when the target file contains header information.

  • [(SET column_name = expression,...)] : If the number of fields located in the target file does not match the number of fields in the target table, you need to specify exactly which columns are to be filled in by the file data. For example, if the target file containing sales information consists of only four fields (id, client_id, order_time, and total_cost) rather than the six fields used in prior examples (id, client_id, order_time, sub_total, shipping_cost, and total_cost), yet in the target table all six fields remain, the command would have to be written like so:

LOAD DATA INFILE "sales.txt"
INTO TABLE sales (id, client_id, order_time, total_cost);

Keep in mind that such attempts could fail should one or several of the missing columns be designated as NOT NULL in the table schema. On such occasions, you need to either designate DEFAULT values for the missing columns or further manipulate the data file into an acceptable format.

You can also set columns to variables such as the current timestamp. For example, presume the sales table was modified to include an additional column named added_to_table :
LOAD DATA INFILE "sales.txt"
INTO TABLE sales (id, client_id, order_time, total_cost)
SET added_to_table = CURRENT_TIMESTAMP;

Tip

If you would like the order of the fields located in the target file to be rearranged as they are read in for insertion into the table, you can do so by rearranging the order via the [(column_name, ...)] option .

A Simple Data Import Example

This example is based upon the ongoing sales theme. Suppose you want to import a file titled productreviews.txt, which contains the following information:
'43','[email protected]','I love the new Website!'
'44','[email protected]','Why don't you sell shoes?'
'45','[email protected]','The search engine works great!'
The target table, aptly titled product_reviews, consists of three fields, and they are in the same order (comment_id, email, comment) as the information found in productreviews.txt:
LOAD DATA INFILE 'productreviews.txt' INTO TABLE product_reviews FIELDS
   TERMINATED BY ',' ENCLOSED BY '" ESCAPED BY ''
    LINES TERMINATED BY ' ';
Once the import is completed, the product_reviews table will look like this:
+------------+---------------------------+---------------------------------+
| comment_id | email                     | comment                        |
+------------+---------------------------+--------------------------------+
|        43  | [email protected]         | I love the new Website!        |
|        44  | [email protected]       | Why don't you sell shoes?      |
|        45  | [email protected] | The search engine works great! |
+------------+---------------------------+--------------------------------+

Choosing the Target Database

You might have noticed that the preceding example referenced the target table but did not clearly define the target database. The reason is that LOAD DATA INFILE assumes that the target table resides in the currently selected database. Alternatively, you can specify the target database by prefixing it with the database name, like so:
LOAD DATA INFILE 'productreviews.txt' into table corporate.product_reviews;

If you execute LOAD DATA INFILE before choosing a database, or without explicitly specifying the database in the query syntax, an error will occur.

Security and LOAD DATA INFILE

Using the LOCAL keyword, it’s possible to load a file that resides on the client. This keyword will cause MySQL to retrieve the file from the client computer. Because a malicious administrator or user could exploit this feature by manipulating the target file path, there are a few security issues that you should keep in mind when using this feature:
  • If LOCAL is not used, the executing user must possess the FILE privilege. This is due to the potential implications of allowing the user to read a file residing on the server, which must either reside in the database directory or be world-readable.

  • To disable LOAD DATA LOCAL INFILE , start the MySQL daemon with the --local-infile=0 option. You can later enable it as needed from the MySQL client by passing the --local-infile=1 option.

Importing Data with mysqlimport

The mysqlimport client is just a command-line version of the LOAD DATA INFILE statement . Its general syntax follows:
mysqlimport [options] database textfile1 [textfile2 ... textfileN]

Useful Options

Before reviewing any examples, take a moment to review many of the most commonly used mysqlimport options:
  • --columns , -c : This option should be used when the number or ordering of the fields in the target file does not match that found in the table. For example, suppose you were inserting the following target file, which orders the fields as id, order_id, sub_total, shipping_cost, total_cost, and order_time:
    45633,12309,22.04,5.67,27.71,2010-12-19 01:13:42
    942,12310,11.50,3.40,14.90,2010-12-19 01:15:12
    7879,12311,95.99,15.00,110.99,2010-12-19 01:15:22
  • Yet the sales table presented at the beginning of this chapter lists the fields in this order: id, client_id, order_time, sub_total, shipping_cost, and total_cost. You can rearrange the input fields during the parsing process so that the data is inserted in the proper location, by including this option:
    --columns=id,order_id,sub_total,shipping_cost,total_cost,and order_time
  • --compress , -C : Including this option compresses the data flowing between the client and the server, assuming that both support compression. This option is most effective if you’re loading a target file that does not reside on the same server as the database.

  • --debug , -# : This option is used to create trace files when debugging.

  • --delete , -d : This option deletes the target table’s contents before importing the target file’s data.

  • --fields-terminated-by= , --fields-enclosed-by= , --fields-optionally-enclosed-by= , --fields-escaped-by= : These four options determine mysqlimport’s behavior in terms of how both fields and lines are recognized during the parsing procedure. See the section “Importing Data with LOAD DATA INFILE” earlier in this chapter for a complete introduction.

  • --force , -f : Including this option causes mysqlimport to continue execution even if errors occur during execution.

  • --help , -? : Including this option generates a short help file and a comprehensive list of the options discussed in this section.

  • --host , -h : This option specifies the server location of the target database. The default is localhost.

  • --ignore , -i : This option causes mysqlimport to ignore any rows located in the target file that share the same primary or unique key as a row already located in the table.

  • --ignore-lines=n : This option tells mysqlimport to ignore the first n lines of the target file. It’s useful when the target file contains header information that should be disregarded.

  • --lines-terminated-by= : This option determines how mysqlimport will recognize each separate line in the file. See the section “Importing Data with LOAD DATA INFILE” earlier in this chapter for a complete introduction.

  • --lock-tables , -l : This option write-locks all tables located in the target database for the duration of mysqlimport‘s execution.

  • --local , -L : This option specifies that the target file is located on the client. By default, it is assumed that this file is located on the database server; therefore, you need to include this option if you’re executing this command remotely and have not uploaded the file to the server.

  • --low-priority : This option delays execution of mysqlimport until no other clients are reading from the table.

  • --password=your_password , -pyour_password : This option is used to specify the password component of your authentication credentials. If the your_password part of this option is omitted, you will be prompted for the password.

  • --port , -P : If the target MySQL server is running on a nonstandard port (MySQL’s standard port is 3306), you need to specify that port value with this option.

  • --replace , -r : This option causes mysqlimport to overwrite any rows located in the target file that share the same primary or unique key as a row already located in the table.

  • --silent , -s : This option tells mysqlimport to output only error information.

  • --socket , -S : This option should be included if a nondefault socket file had been declared when the MySQL server was started.

  • --ssl : This option specifies that SSL should be used for the connection. This would be used in conjunction with several other options that aren’t listed here. See Chapter 29 for more information about SSL and the various options used to configure this feature.

  • --user , -u : By default, mysqlimport compares the name/host combination of the executing system user to the mysql privilege tables, ensuring that the executing user possesses adequate permissions to carry out the requested operation. Because it’s often useful to perform such procedures under the guise of another user, you can specify the “user” component of credentials with this option.

  • --verbose , -v : This option causes mysqlimport to output a host of potentially useful information pertinent to its behavior.

  • --version , -V : This option causes mysqlimport to output version information and exit.

Considering some of these options, the following mysqlimport example illustrates a scenario involving the update of inventory audit information residing on the workstation of a company accountant:
%>mysqlimport -h intranet.example.com -u accounting -p --replace
> --compress --local company c:auditinventory.txt

This command results in the compression and transmission of the data found in the local text file (c:auditinventory.txt) to the table inventory located in the company database. Note that mysqlimport strips the extension from each text file and uses the resulting name as the table into which to import the text file’s contents.

Writing a mysqlimport Script

Some years ago, I was involved in the creation of a corporate website for a pharmaceutical corporation that, among other things, allowed buyers to browse descriptions and pricing information for roughly 10,000 products. This information was maintained on a mainframe, and the data was synchronized on a regular basis to the MySQL database residing on the web server. To accomplish this, a one-way trust was created between the machines, along with two shell scripts. The first script, located on the mainframe, was responsible for dumping the data (in delimited format) from the mainframe and then pushing this data file via sftp to the web server. The second script, located on the web server, was responsible for executing mysqlimport, loading this file to the MySQL database. This script was quite trivial to create, and looked like this:
#!/bin/sh
/usr/local/mysql/bin/mysqlimport --delete --silent
--fields-terminated-by=' ' --lines-terminated-by=' '
products /ftp/uploads/products.txt
To keep the logic involved to a bare minimum, a complete dump of the entire mainframe database was executed each night, and a new empty MySQL table was created before beginning the import. The table would have the different name but the same definition. When the import was complete and verified, the old table was dropped and the new table renamed in a single transaction. This ensured that all new products were added, existing product information was updated to reflect changes, and any products that were deleted were removed. To prevent the credentials from being passed in via the command line, a system user named productupdate was created, and a my.cnf file was placed in the user’s home directory, which looked like this:
[client]
host=localhost
user=productupdate
password=secret

The permissions and ownership on this file were changed, setting the owner to mysql and allowing only the mysql user to read the file. The final step involved adding the necessary information to the productupdate user’s crontab, which executed the script each night at 2 a.m. The system ran flawlessly from the first day.

Loading Table Data with PHP

For security reasons, ISPs often disallow the use of LOAD DATA INFILE , as well as many of MySQL’s packaged clients like mysqlimport. However, such limitations do not necessarily mean that you are out of luck when it comes to importing data; you can mimic LOAD DATA INFILE and mysqlimport functionality using a PHP script. The following script uses PHP’s file-handling functionality and a handy function known as fgetcsv() to open and parse the delimited sales data found at the beginning of this chapter:
<?php
    // Connect to the MySQL server and select the corporate database
    $mysqli = new mysqli("localhost","someuser","secret","corporate");
    // Open and parse the sales.csv file
    $fh = fopen("sales.csv", "r");
    while ($fields = fgetcsv($fh, 1000, ","))
    {
        $id = $ fields[0];
        $client_id = $fields[1];
        $order_time = $fields[2];
        $sub_total = $fields[3];
        $shipping_cost = $fields[4];
        $total_cost = $fields[5];
        // Insert the data into the sales table
        $query = "INSERT INTO sales SET id='$id',
            client_id='$client_id', order_time='$order_time',
            sub_total='$sub_total', shipping_cost='$shipping_cost',
            total_cost='$total_cost'";
        $result = $mysqli->query($query);
    }
    fclose($fh);
    $mysqli->close();
?>

Keep in mind that execution of such a script might time out before completing the insertion of a particularly large dataset. If you think that this might be the case, set PHP’s max_execution_time configuration directive at the beginning of the script. Alternatively, consider using PHP, Perl, or another solution to do the job from the command line. The PHP-CLI version defaults max_execution_time to 0 and therefore there is no timeout. Input from files should be treated as any other input nd sanitized before use.

The next section switches directions of the data flow, explaining how to export data from MySQL into other formats.

Exporting Data

As your computing environment grows increasingly complex, you’ll probably need to share your data among various disparate systems and applications. Sometimes you won’t be able to cull this information from a central source; rather, it must be constantly retrieved from the database, prepped for conversion, and finally converted into a format recognized by the target. This section shows you how to easily export MySQL data using the SQL statement SELECT INTO OUTFILE .

Note

Another commonly used data export tool is mysqldump. Although officially it’s intended for data backup, it serves a secondary purpose as a great tool for creating data export files.

SELECT INTO OUTFILE

The SELECT INTO OUTFILE SQL statement is actually a variant of the SELECT query. It’s used when you want to direct query output to a text file. This file can then be opened by a spreadsheet application, or imported into another database like Microsoft Access, Oracle, or any other software that supports delimitation. Its general syntax format follows:
SELECT [SELECT OPTIONS] INTO OUTFILE filename
  EXPORT_OPTIONS
  FROM tables [ADDITIONAL SELECT OPTIONS]
The following list summarizes the key options:
  • OUTFILE : Selecting this option causes the query result to be output to the text file. The formatting of the query result is dependent upon how the export options are set. These options are introduced below.

  • DUMPFILE : Selecting this option over OUTFILE results in the query results being written as a single line, omitting column or line terminations. This is useful when exporting binary data such as a graphic or a Word file. Keep in mind that you cannot choose OUTFILE when exporting a binary file, or the file will be corrupted. Also, note that a DUMPFILE query must target a single row; combining output from two binary files doesn’t make any sense, and an error will be returned if you attempt it. Specifically, the error returned is, “Result consisted of more than one row.”

  • EXPORT OPTIONS : The export options determine how the table fields and lines will be delimited in the outfile. Their syntax and rules match exactly those used in LOAD DATA INFILE , introduced earlier in this chapter. Rather than repeat this information, please see the earlier section “Importing Data with LOAD DATA INFILE” for a complete dissertation.

Usage Tips

There are several items worth noting regarding use of SELECT INTO OUTFILE :
  • If a target file path is not specified, the directory of the present database is used.

  • The executing user must possess the selection privilege (SELECT_PRIV) for the target table(s). Further, the user must possess the FILE privilege because this query will result in a file being written to the server.

  • If a target file path is specified, the MySQL daemon owner must possess adequate privileges to write to the target directory.

  • The process leaves the target file world-readable and -writeable, an unexpected side effect. Therefore, if you’re scripting the backup process, you’ll probably want to change the file permissions programmatically once the query has completed.

  • The query will fail if the target text file already exists.

  • Export options cannot be included if the target text file is a dump file.

A Simple Data Export Example

Suppose you want to export December 2017 sales data to a tab-delimited text file consisting of lines delimited by newline characters:
SELECT * INTO OUTFILE "/backup/corporate/sales/1217.txt"
  FIELDS TERMINATED BY ' ' LINES TERMINATED BY ' '
  FROM corporate.sales
  WHERE MONTH(order_time) = '12' AND YEAR(order_time) = '2017';
The directory separator used here is the Linux/Unix stile. On a Windows-based system you should use the backslash instead. Also, the line ending on a Windows-based system is instead of as used in the example above. Assuming that the executing user has SELECT privileges for the sales table found in the corporate database, and the MySQL daemon process owner can write to the /backup/corporate/sales/ directory, the file 1217.txt will be created with the following data written to it:
12309  45633  2010-12-19  01:13:42  22.04  5.67    27.71
12310  942    2010-12-19  01:15:12  11.50  3.40    14.90
12311  7879   2010-12-19  01:15:22  95.99  15.00   110.99
12312  55521  2010-12-19  01:30:45  10.75  3.00    13.75

Note that the spacing found between each column does not consist of spaces, but rather is due to the tab ( ) character. Also, at the conclusion of each line is the invisible newline ( ) character.

Exporting MySQL Data to Microsoft Excel

Of course, by itself, outputting data to a text file really doesn’t accomplish anything except migrate it to a different format. So how do you do something with the data? For instance, suppose employees in the marketing department would like to draw a parallel between a recent holiday sales campaign and a recent rise in sales. To do so, they require the sales data for the month of December. To sift through the data, they’d like it provided in Excel format. Because Excel can convert delimited text files into spreadsheet format, you execute the following query:
SELECT * INTO OUTFILE "/analysis/sales/1217.xls"
   FIELDS TERMINATED BY ' ', LINES TERMINATED BY ' ' FROM corporate.sales
   WHERE MONTH(order_time) = '12' YEAR(order_time) = '2017';

Note that the file created is a tab-separated values file (TSV). It is possible to use tsv or xls as the file extension and Excel will be able to open both. This file is then retrieved via a predefined folder located on the corporate intranet, and opened in Microsoft Excel.

As discussed in Chapter 24. MySQL comes with two clients that are used to export data. These are mysqldump and mysqlhotcopy. Mysql dump is a database backup application that can dump an entire database into a file. The contents of the file will be a series of SQL commands that can be used to re-create the database as it was at the time of the dump. The syntax for using the mysqldump command looks like this:
$ mysqldump -u <user> -p <database? >database.sql
Alternatively, you might want to use the mysqlhotcopy command. It only supports MyISAM and Archive tables, and it works by flushing the tables to disk and performing a copy of the files in the file system. This is a very fast way to copy a table or a database, but it can only be done on the server where the files are located. In contrast mysqldump can be used to create a database dump of a remote database. The syntax for mysqlhotcopy is:
$ mysqlhotcopy db_name [/path/to/new_directory]

Summary

MySQL’s data import and export utilities offer powerful solutions for migrating data to and from your MySQL database. Using them effectively can mean the difference between a maintenance nightmare and a triviality.

This concludes the book. If, or rather, when you need more information or help about PHP and MySQL, you will find yourself searching for answers and examples. Both the PHP and MySQL online documentation are great sources for both technical documentation and examples. Many modern text editors include code completion and quick references to functions and parameters. Find an editor that fits your style and budget. Many of them offer a free version and subscription-based version with support and upgrades.

If you have questions, I highly recommend your local PHP Meetup or other user group. They exist all over the world and provide a great opportunity to share knowledge. Online code sharing services like GitHub ( https://github.com ) and Packagist ( https://packagist.org ) are great places to search for sample code and to share your own code.

Best of luck!

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.108.54