Someone gives you a datafile and says, “Here, put this into MySQL for me.” But no table yet exists to hold the data.
Write the
CREATE
TABLE
statement yourself. Or use a utility
that guesses the table structure by examining the contents of the
datafile.
Sometimes you need to import data into MySQL for which no table
has yet been set up. You can create the table yourself, based on any
knowledge you might have about the contents of the file. Or you might
be able to avoid some of the work by using guess_table.pl, a utility located in the
transfer directory of the
recipes
distribution. guess_table.pl reads the datafile to see
what kind of information it contains, and then attempts to produce an
appropriate CREATE
TABLE
statement that matches the contents of
the file. This script is necessarily imperfect, because column
contents sometimes are ambiguous. (For example, a column containing a
small number of distinct strings might be a VARCHAR
column or an ENUM
.) Still, it’s often easier to tweak the
CREATE
TABLE
statement that guess_table.pl produces than to write the
entire statement from scratch. This utility also has a diagnostic
function, although that’s not its primary purpose. For example, you
might believe a column contains only numbers, but if guess_table.pl indicates that it should be
created using a VARCHAR
type, that
tells you the column contains at least one nonnumeric value.
guess_table.pl assumes that
its input is in tab-delimited, linefeed-terminated format. It also
assumes valid input because any attempt to guess data types based on
possibly flawed data is doomed to failure. This means, for example,
that if a date column is to be recognized as such, it should be in ISO
format. Otherwise, guess_table.pl
may characterize it as a VARCHAR
column. If a datafile doesn’t satisfy these assumptions, you may be
able to reformat it first using the cvt_file.pl and cvt_date.pl utilities described in Recipes
and .
guess_table.pl understands the following options:
--labels
Interpret the first input line as a row of column labels,
and use them for table column names. If this option is omitted,
guess_table.pl uses default
column names of c1
, c2
, and so forth.
Note that if the file contains a row of labels, and you
neglect to specify this option, the labels will be treated as
data values by guess_table.pl. The likely result is
that the script will characterize all
columns as VARCHAR
columns
(even those that otherwise contain only numeric or temporal
values), due to the presence of a nonnumeric or nontemporal
value in the column.
--lower
,
--upper
Force column names in the CREATE
TABLE
statement to be lowercase or
uppercase.
--quote-names
,
--skip-quote-names
Quote or do not quote table and column identifiers in the
CREATE
TABLE
statement with `
characters (for example, `mytbl`
). This can be useful if an
identifier is a reserved word. The default is to quote
identifiers.
--report
Generate a report rather than a CREATE
TABLE
statement. The script displays
the information that it gathers about each column.
--table
=
tbl_name
Specify the table name to use in the CREATE
TABLE
statement. The default name is
t
.
Here’s an example of how guess_table.pl works. Suppose that a file named stockdat.csv is in CSV format and has the following contents:
commodity,trade_date,shares,price,change sugar,12-14-2006,1000000,10.50,-.125 oil,12-14-2006,96000,60.25,.25 wheat,12-14-2006,2500000,8.75,0 gold,12-14-2006,13000,103.25,2.25 sugar,12-15-2006,970000,10.60,.1 oil,12-15-2006,105000,60.5,.25 wheat,12-15-2006,2370000,8.65,-.1 gold,12-15-2006,11000,101,-2.25
The first row indicates the column labels, and the following
rows contain data records, one per line. The values in the trade_date
column are dates, but they are in
MM-DD-CCYY
format rather than the ISO
format that MySQL expects. cvt_date.pl can convert these dates to ISO
format. However, both cvt_date.pl
and guess_table.pl require input in
tab-delimited, linefeed-terminated format. So first let’s use cvt_file.pl to convert the input to
tab-delimited, linefeed-terminated format. Then we can convert the
dates with cvt_date.pl:
%cvt_file.pl --iformat=csv stockdat.csv > tmp1.txt
%cvt_date.pl --iformat=us tmp1.txt > tmp2.txt
Then feed the resulting file, tmp2.txt, to guess_table.pl:
%guess_table.pl --labels --table=stocks tmp2.txt > stocks.sql
The CREATE
TABLE
statement that guess_table.pl writes to stocks.sql looks like this:
CREATE TABLE `stocks` ( `commodity` VARCHAR(5) NOT NULL, `trade_date` DATE NOT NULL, `shares` INT UNSIGNED NOT NULL, `price` DOUBLE UNSIGNED NOT NULL, `change` DOUBLE NOT NULL );
guess_table.pl produces that statement based on deductions such as the following:
If a column contains only numeric values, it’s assumed to be
an INT
if no values contain a
decimal point, and DOUBLE
otherwise.
If a numeric column contains no negative values, the column
is likely to be UNSIGNED
.
If a column contains no empty values, guess_table.pl assumes that it’s
probably NOT
NULL
.
Columns that cannot be classified as numbers or dates are
taken to be VARCHAR
columns,
with a length equal to the longest value present in the
column.
You might want to edit the CREATE
TABLE
statement that guess_table.pl produces, to make
modifications such as increasing the size of character fields,
changing VARCHAR
to CHAR
, or adding indexes. Another reason to
edit the statement is that if a column has a name that is a reserved
word in MySQL, you can rename it.
To create the table, use the statement produced by guess_table.pl:
%mysql cookbook < stocks.sql
Then you can load the datafile into the table (skipping the initial row of labels):
mysql>LOAD DATA LOCAL INFILE 'tmp2.txt' INTO TABLE stocks
->IGNORE 1 LINES;
The resulting data after import looks like this:
mysql>SELECT * FROM stocks;
+-----------+------------+---------+--------+--------+
| commodity | trade_date | shares | price | change |
+-----------+------------+---------+--------+--------+
| sugar | 2006-12-14 | 1000000 | 10.5 | -0.125 |
| oil | 2006-12-14 | 96000 | 60.25 | 0.25 |
| wheat | 2006-12-14 | 2500000 | 8.75 | 0 |
| gold | 2006-12-14 | 13000 | 103.25 | 2.25 |
| sugar | 2006-12-15 | 970000 | 10.6 | 0.1 |
| oil | 2006-12-15 | 105000 | 60.5 | 0.25 |
| wheat | 2006-12-15 | 2370000 | 8.65 | -0.1 |
| gold | 2006-12-15 | 11000 | 101 | -2.25 |
+-----------+------------+---------+--------+--------+
3.144.17.137