Use a join—that is, a query that lists multiple tables in
its
FROM
clause and
tells MySQL how to match information from them.
The essential idea behind a join is that it combines rows in one table with rows in one or more other tables. Joins enable you to combine information from multiple tables when each table contains only part of the information in which you’re interested. Output rows from a join contain more information than rows from either table by itself.
A complete join that produces all possible row combinations is
called a
Cartesian product. For example,
joining each row in a 100-row table to each row in a 200-row table
produces a result containing 100 ×
200, or 20,000 rows. With larger tables, or joins between more than
two tables, the result set for a Cartesian product can easily become
immense. Because of that, and because you rarely want all the
combinations anyway, a join normally includes an ON
or USING
clause that specifies how to join rows
between tables. (This requires that each table have one or more
columns of common information that can be used to link them together
logically.) You can also include a WHERE
clause that restricts which of the
joined rows to select. Each of these clauses narrows the focus of the
query.
This recipe introduces basic join syntax and demonstrates how
joins help you answer specific types of questions when you are looking
for matches between tables. Later recipes show how to identify
mismatches between tables (Finding Rows with No Match in Another Table) and how to compare a table to
itself (Comparing a Table to Itself). The examples
assume that you have an art collection and use the following two
tables to record your acquisitions. artist
lists those painters whose works you
want to collect, and painting
lists
each painting that you’ve actually purchased:
CREATE TABLE artist ( a_id INT UNSIGNED NOT NULL AUTO_INCREMENT, # artist ID name VARCHAR(30) NOT NULL, # artist name PRIMARY KEY (a_id), UNIQUE (name) ); CREATE TABLE painting ( a_id INT UNSIGNED NOT NULL, # artist ID p_id INT UNSIGNED NOT NULL AUTO_INCREMENT, # painting ID title VARCHAR(100) NOT NULL, # title of painting state VARCHAR(2) NOT NULL, # state where purchased price INT UNSIGNED, # purchase price (dollars) INDEX (a_id), PRIMARY KEY (p_id) );
You’ve just begun the collection, so the tables contain only the following rows:
mysql>SELECT * FROM artist ORDER BY a_id;
+------+----------+ | a_id | name | +------+----------+ | 1 | Da Vinci | | 2 | Monet | | 3 | Van Gogh | | 4 | Picasso | | 5 | Renoir | +------+----------+ mysql>SELECT * FROM painting ORDER BY a_id, p_id;
+------+------+-------------------+-------+-------+ | a_id | p_id | title | state | price | +------+------+-------------------+-------+-------+ | 1 | 1 | The Last Supper | IN | 34 | | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | 3 | Starry Night | KY | 48 | | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | 5 | The Rocks | IA | 33 | | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+------+-------------------+-------+-------+
The low values in the price
column of the painting
table betray
the fact that your collection actually contains only cheap facsimiles,
not the originals. Well, that’s all right: who can afford the
originals?
Each table contains partial information about your collection.
For example, the artist
table
doesn’t tell you which paintings each artist produced, and the
painting
table lists artist IDs but
not their names. To use the information in both tables, you can ask
MySQL to show you various combinations of artists and paintings by
writing a query that performs a join. A join names two or more tables
after the FROM
keyword. In the
output column list, you can name columns from any or all the joined
tables, or use expressions that are based on those columns,
tbl_name
.*
to select all columns from a given table,
or *
to select all columns from all
tables.
The simplest join involves two tables and selects all columns from each. With no restrictions, the join
generates output for all combinations of rows (that is, the Cartesian
product). The following complete join between the artist
and painting
tables shows this:
mysql>SELECT * FROM artist, painting;
+------+----------+------+------+-------------------+-------+-------+
| a_id | name | a_id | p_id | title | state | price |
+------+----------+------+------+-------------------+-------+-------+
| 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 |
| 2 | Monet | 1 | 1 | The Last Supper | IN | 34 |
| 3 | Van Gogh | 1 | 1 | The Last Supper | IN | 34 |
| 4 | Picasso | 1 | 1 | The Last Supper | IN | 34 |
| 5 | Renoir | 1 | 1 | The Last Supper | IN | 34 |
| 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 |
| 2 | Monet | 1 | 2 | The Mona Lisa | MI | 87 |
| 3 | Van Gogh | 1 | 2 | The Mona Lisa | MI | 87 |
| 4 | Picasso | 1 | 2 | The Mona Lisa | MI | 87 |
| 5 | Renoir | 1 | 2 | The Mona Lisa | MI | 87 |
| 1 | Da Vinci | 3 | 3 | Starry Night | KY | 48 |
| 2 | Monet | 3 | 3 | Starry Night | KY | 48 |
| 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 |
| 4 | Picasso | 3 | 3 | Starry Night | KY | 48 |
| 5 | Renoir | 3 | 3 | Starry Night | KY | 48 |
| 1 | Da Vinci | 3 | 4 | The Potato Eaters | KY | 67 |
| 2 | Monet | 3 | 4 | The Potato Eaters | KY | 67 |
| 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 |
| 4 | Picasso | 3 | 4 | The Potato Eaters | KY | 67 |
| 5 | Renoir | 3 | 4 | The Potato Eaters | KY | 67 |
| 1 | Da Vinci | 3 | 5 | The Rocks | IA | 33 |
| 2 | Monet | 3 | 5 | The Rocks | IA | 33 |
| 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 |
| 4 | Picasso | 3 | 5 | The Rocks | IA | 33 |
| 5 | Renoir | 3 | 5 | The Rocks | IA | 33 |
| 1 | Da Vinci | 5 | 6 | Les Deux Soeurs | NE | 64 |
| 2 | Monet | 5 | 6 | Les Deux Soeurs | NE | 64 |
| 3 | Van Gogh | 5 | 6 | Les Deux Soeurs | NE | 64 |
| 4 | Picasso | 5 | 6 | Les Deux Soeurs | NE | 64 |
| 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 |
+------+----------+------+------+-------------------+-------+-------+
The statement output illustrates why a complete join generally is not useful: it produces a lot of output, and the result is not meaningful. Clearly, you’re not maintaining these tables to match every artist with every painting, which is what the preceding statement does. An unrestricted join in this case produces nothing of value.
To answer questions meaningfully, you must combine the two
tables in a way that produces only the relevant matches. Doing so is a
matter of including appropriate join conditions. For example, to
produce a list of paintings together with the artist names, you can
associate rows from the two tables using a simple WHERE
clause that matches up values in the
artist ID column that is common to both tables and that serves as the
link between them:
mysql>SELECT * FROM artist, painting
->WHERE artist.a_id = painting.a_id;
+------+----------+------+------+-------------------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+----------+------+------+-------------------+-------+-------+ | 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 | | 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 | | 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 | | 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+----------+------+------+-------------------+-------+-------+
The column names in the
WHERE
clause include
table qualifiers to make it clear which a_id
values to compare. The output indicates
who painted each painting, and, conversely, which paintings by each
artist are in your collection.
Another way to write the same join is to use
INNER
JOIN
rather than the comma operator and
indicate the matching conditions with an
ON
clause:
mysql>SELECT * FROM artist INNER JOIN painting
->ON artist.a_id = painting.a_id;
+------+----------+------+------+-------------------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+----------+------+------+-------------------+-------+-------+ | 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 | | 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 | | 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 | | 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+----------+------+------+-------------------+-------+-------+
In the special case that the matched columns have the same name
in both tables and are compared using the =
operator, you can use an INNER JOIN
with a USING
clause instead. This requires no table qualifiers, and each join
column is named only once:
mysql>SELECT * FROM artist INNER JOIN painting
->USING(a_id);
+------+----------+------+-------------------+-------+-------+ | a_id | name | p_id | title | state | price | +------+----------+------+-------------------+-------+-------+ | 1 | Da Vinci | 1 | The Last Supper | IN | 34 | | 1 | Da Vinci | 2 | The Mona Lisa | MI | 87 | | 3 | Van Gogh | 3 | Starry Night | KY | 48 | | 3 | Van Gogh | 4 | The Potato Eaters | KY | 67 | | 3 | Van Gogh | 5 | The Rocks | IA | 33 | | 5 | Renoir | 6 | Les Deux Soeurs | NE | 64 | +------+----------+------+-------------------+-------+-------+
Note that when you write a query with a USING
clause, SELECT
*
returns only one instance of each join column (a_id
).
Any of ON
, USING
, or WHERE
can include comparisons, so how do you
know which join conditions to put in each clause? As a rule of thumb,
it’s conventional to use ON
or
USING
to specify how to join the
tables, and the WHERE
clause to
restrict which of the joined rows to select. For example, to join
tables based on the a_id
column,
but select only rows for paintings obtained in Kentucky, use an
ON
(or USING
) clause to match the rows in the two
tables, and a WHERE
clause to test
the state
column:
mysql>SELECT * FROM artist INNER JOIN painting
->ON artist.a_id = painting.a_id
->WHERE painting.state = 'KY';
+------+----------+------+------+-------------------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+----------+------+------+-------------------+-------+-------+ | 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 | | 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 | +------+----------+------+------+-------------------+-------+-------+
The preceding queries use SELECT
*
to select all columns. To be more selective about which columns a
statement should display, provide a list that names only those columns
in which you’re interested:
mysql>SELECT artist.name, painting.title, painting.state, painting.price
->FROM artist INNER JOIN painting
->ON artist.a_id = painting.a_id
->WHERE painting.state = 'KY';
+----------+-------------------+-------+-------+ | name | title | state | price | +----------+-------------------+-------+-------+ | Van Gogh | Starry Night | KY | 48 | | Van Gogh | The Potato Eaters | KY | 67 | +----------+-------------------+-------+-------+
You’re not limited to two tables when writing joins. Suppose
that you prefer to see complete state names rather than abbreviations
in the preceding query result. The states
table used in earlier chapters maps
state abbreviations to names, so you can add it to the previous query
to display names:
mysql>SELECT artist.name, painting.title, states.name, painting.price
->FROM artist INNER JOIN painting INNER JOIN states
->ON artist.a_id = painting.a_id AND painting.state = states.abbrev;
+----------+-------------------+----------+-------+ | name | title | name | price | +----------+-------------------+----------+-------+ | Da Vinci | The Last Supper | Indiana | 34 | | Da Vinci | The Mona Lisa | Michigan | 87 | | Van Gogh | Starry Night | Kentucky | 48 | | Van Gogh | The Potato Eaters | Kentucky | 67 | | Van Gogh | The Rocks | Iowa | 33 | | Renoir | Les Deux Soeurs | Nebraska | 64 | +----------+-------------------+----------+-------+
Another common use of three-way joins is for enumerating many-to-many relationships. See Enumerating a Many-to-Many Relationship for an example.
By including appropriate conditions in your joins, you can answer very specific questions, such as the following:
Which paintings did Van Gogh paint? To answer this question,
use the a_id
value to find
matching rows, add a WHERE
clause to restrict output to those rows that contain the artist
name, and select the title from those rows:
mysql>SELECT painting.title
->FROM artist INNER JOIN painting ON artist.a_id = painting.a_id
->WHERE artist.name = 'Van Gogh';
+-------------------+ | title | +-------------------+ | Starry Night | | The Potato Eaters | | The Rocks | +-------------------+
Who painted the Mona Lisa? Again you
use the a_id
column to join the
rows, but this time the WHERE
clause restricts output to those rows that contain the title, and
you select the artist name from those rows:
mysql>SELECT artist.name
->FROM artist INNER JOIN painting ON artist.a_id = painting.a_id
->WHERE painting.title = 'The Mona Lisa';
+----------+ | name | +----------+ | Da Vinci | +----------+
Which artists’ paintings did you purchase in Kentucky or
Indiana? This is somewhat similar to the previous statement, but
it tests a different column (a_id
) in the painting
table to determine which rows
to join with the artist
table:
mysql>SELECT DISTINCT artist.name
->FROM artist INNER JOIN painting ON artist.a_id = painting.a_id
->WHERE painting.state IN ('KY','IN'),
+----------+ | name | +----------+ | Da Vinci | | Van Gogh | +----------+
The statement also uses DISTINCT
to display each artist name
just once. Try it without DISTINCT
and you’ll see that Van Gogh is
listed twice; that’s because you obtained two Van Goghs in
Kentucky.
Joins can also be used with aggregate functions to produce summaries. For example, to find out how many paintings you have per artist, use this statement:
mysql>SELECT artist.name, COUNT(*) AS 'number of paintings'
->FROM artist INNER JOIN painting ON artist.a_id = painting.a_id
->GROUP BY artist.name;
+----------+---------------------+ | name | number of paintings | +----------+---------------------+ | Da Vinci | 2 | | Renoir | 1 | | Van Gogh | 3 | +----------+---------------------+
A more elaborate statement might also show how much you paid for each artist’s paintings, in total and on average:
mysql>SELECT artist.name,
->COUNT(*) AS 'number of paintings',
->SUM(painting.price) AS 'total price',
->AVG(painting.price) AS 'average price'
->FROM artist INNER JOIN painting ON artist.a_id = painting.a_id
->GROUP BY artist.name;
+----------+---------------------+-------------+---------------+ | name | number of paintings | total price | average price | +----------+---------------------+-------------+---------------+ | Da Vinci | 2 | 121 | 60.5000 | | Renoir | 1 | 64 | 64.0000 | | Van Gogh | 3 | 148 | 49.3333 | +----------+---------------------+-------------+---------------+
Note that the summary statements produce output only for those
artists in the artist
table for
whom you actually have acquired paintings. (For example, Monet is
listed in the artist
table but is
not present in the summary because you don’t have any of his paintings
yet.) If you want the summary to include all artists, even if you have
none of their paintings yet, you must use a different kind of
join—specifically, an outer join:
Joins written with the comma operator or INNER
JOIN
are inner
joins, which means that they produce a result only for
values in one table that match values in another table.
An outer join can produce those matches as well, but also can show you which values in one table are missing from the other. Finding Rows with No Match in Another Table introduces outer joins.
The tbl_name.col_name
notation that qualifies a column name with a table name is always
allowable in a join but can be shortened to just
col_name
if the name appears in only one of
the joined tables. In that case, MySQL can determine without ambiguity
which table the column comes from, and no table name qualifier is
necessary. We can’t do that for the following join. Both tables have
an a_id
column, so the column
reference is ambiguous:
mysql>SELECT * FROM artist INNER JOIN painting ON a_id = a_id;
ERROR 1052 (23000): Column 'a_id' in on clause is ambiguous
By contrast, the following query is unambiguous. Each instance
of a_id
is qualified with the
appropriate table name, only artist
has a name
column, and only
painting
has title
and state
columns:
mysql>SELECT name, title, state FROM artist INNER JOIN painting
->ON artist.a_id = painting.a_id;
+----------+-------------------+-------+ | name | title | state | +----------+-------------------+-------+ | Da Vinci | The Last Supper | IN | | Da Vinci | The Mona Lisa | MI | | Van Gogh | Starry Night | KY | | Van Gogh | The Potato Eaters | KY | | Van Gogh | The Rocks | IA | | Renoir | Les Deux Soeurs | NE | +----------+-------------------+-------+
To make the meaning of a statement clearer to human readers, it’s often useful to qualify column names even when that’s not strictly necessary as far as MySQL is concerned. I tend to use qualified names in join examples for that reason.
If you don’t want to write complete table names when qualifying column references, give each table a short alias and refer to its columns using the alias. The following two statements are equivalent:
SELECT artist.name, painting.title, states.name, painting.price FROM artist INNER JOIN painting INNER JOIN states ON artist.a_id = painting.a_id AND painting.state = states.abbrev; SELECT a.name, p.title, s.name, p.price FROM artist AS a INNER JOIN painting AS p INNER JOIN states AS s ON a.a_id = p.a_id AND p.state = s.abbrev;
In AS
alias_name
clauses, the AS
is optional.
For complicated statements that select many columns, aliases can save a lot of typing. In addition, aliases are not only convenient but necessary for some types of statements, as will become evident when we get to the topic of self-joins (Comparing a Table to Itself).
3.145.202.27