By default, LOAD
DATA
assumes that datafiles contain lines
that are terminated by
linefeed (newline) characters and that data values
within a line are separated by tab characters. The following statement
does not specify anything about the format of the datafile, so MySQL
assumes the default format:
mysql>LOAD DATA LOCAL INFILE 'mytbl.txt' INTO TABLE mytbl;
Two LOAD
DATA
clauses provide explicit information
about the datafile format. A FIELDS
clause describes the characteristics of fields within a line, and a
LINES
clause specifies the
line-ending sequence. The following LOAD
DATA
statement indicates that the input file contains data values separated
by colons and lines terminated by carriage returns:
mysql>LOAD DATA LOCAL INFILE 'mytbl.txt' INTO TABLE mytbl
->FIELDS TERMINATED BY ':'
->LINES TERMINATED BY ' ';
Each clause follows the table name. If both are present, the
FIELDS
clause must precede the
LINES
clause. The line and field
termination indicators can contain multiple characters. For example,
indicates that lines are
terminated by carriage return/linefeed pairs.
The LINES
clause also has
a
STARTING
BY
subclause. It specifies the sequence to
be stripped from each input record. Like
TERMINATED
BY
, the sequence can have multiple
characters. If TERMINATED
BY
and STARTING
BY
both are present in the LINES
clause, they can appear in any
order.
Note that for STARTING
BY
, everything
up
to the given sequence is
stripped from each line. If you specify STARTING
BY
'X'
and an input line begins with abcX
,
all four leading characters are stripped.
If you use mysqlimport,
command options provide the format specifiers. mysqlimport commands that correspond to the
preceding two LOAD
DATA
statements look like this:
%mysqlimport --local cookbook mytbl.txt
%mysqlimport --local --fields-terminated-by=":" --lines-terminated-by=" "
cookbook mytbl.txt
The order in which you specify the options doesn’t matter for mysqlimport.
You can use hex notation to specify arbitrary format characters
for FIELDS
and LINES
clauses. This can be useful for
loading datafiles that use binary format codes. Suppose that a
datafile has lines with Ctrl-A between fields and Ctrl-B at the end of
lines. The ASCII values for Ctrl-A and Ctrl-B are 1 and 2, so you
represent them as 0x01
and 0x02
:
FIELDS TERMINATED BY 0x01 LINES TERMINATED BY 0x02
mysqlimport also
understands hex constants for format specifiers. You may find this
capability helpful if you don’t like remembering how to type escape
sequences on the command line or when it’s necessary to use quotes
around them. Tab is 0x09
, linefeed
is 0x0a
, and carriage return is
0x0d
. Here’s an example that
indicates that the datafile contains tab-delimited lines terminated by
CRLF pairs:
%mysqlimport --local --lines-terminated-by=0x0d0a
--fields-terminated-by=0x09 cookbook mytbl.txt
When you import datafiles, don’t assume that LOAD
DATA
(or mysqlimport) knows more than it
does. It’s important always to keep in mind that LOAD
DATA
knows nothing at all about the format of your datafile. And always
make sure that you do know what its format is. If the file has been
transferred from one machine to another, its contents may have been
changed in subtle ways of which you’re not aware.
Some LOAD
DATA
frustrations occur because people
expect MySQL to know things that it cannot possibly know. LOAD
DATA
makes certain assumptions about the structure of input files,
represented as the default settings for the line and field
terminators, and for the quote and escape character settings. If your
input doesn’t match those assumptions, you need to tell MySQL about
it.
When in doubt, check the contents of your datafile using a hex
dump program or other utility that displays a visible representation
of whitespace characters like tab, carriage return, and linefeed.
Under Unix, programs such as
od or hexdump can display file contents in a
variety of formats. If you don’t have these or some comparable
utility, the transfer directory
of the recipes
distribution
contains hex dumpers written in Perl, Ruby, and Python (hexdump.pl, hexdump.rb, and hexdump.py), as well as programs that
display printable representations of all characters of a file
(see.pl, see.rb, and see.py). You may find them useful for
examining files to see what they really contain. In some cases, you
may be surprised to discover that a file’s contents are different from
what you think. This is, in fact, quite likely if the file has been
transferred from one machine to another:
An FTP transfer between machines running different
operating systems typically translates line endings to those that
are appropriate for the destination machine if the transfer is
performed in text mode rather than in binary (image) mode. Suppose
that you have tab-delimited linefeed-terminated records in a
datafile that load into MySQL on a Unix system just fine using the
default LOAD
DATA
settings. If you copy the file to a
Windows machine with FTP using a text transfer mode, the linefeeds
might be converted to carriage return/linefeed pairs. On that
machine, the file will not load properly with the same LOAD
DATA
statement if its contents have been
changed. Does MySQL have any way of knowing that? No. So it’s up
to you to tell it, by adding a LINES
TERMINATED
BY
'
'
clause to the statement.
Transfers between any two systems with dissimilar default line
endings can cause these changes.
Data files pasted into email messages often do not survive intact. Mail software may wrap (break) long lines or convert line-ending sequences. If you must transfer a datafile by email, it’s best sent as an attachment.
3.20.224.107