You want to find rows in one table that have no match in another. Or you want to produce a list on the basis of a join between tables, and you want the list to include an entry for every row in the first table, even when there are no matches in the second table.
Finding Rows in One Table That Match Rows in Another focused on inner
joins, which are joins that find matches between two tables. However,
the answers to some questions require determining which rows do
not have a match (or, stated another way, which
rows have values that are missing from the other table). For example,
you might want to know which artists in the artist
table you don’t yet have any
paintings by. The same kind of question occurs in other contexts. Some
examples:
You’re working in sales. You have a list of potential customers, and another list of people who have placed orders. To focus your efforts on people who are not yet actual customers, you want to find people in the first list who are not in the second.
You have one list of baseball players, and another list of players who have hit home runs, and you want to know which players in the first list have not hit a home run. The answer is determined by finding those players in the first list who are not in the second.
For these types of questions, it’s necessary to use an
outer join. Like an inner join, an outer join can
find matches between tables. But unlike an inner join, an outer join
can also determine which rows in one table have no match in another.
Two types of outer join are LEFT
JOIN
and RIGHT
JOIN
.
To see why outer joins are useful, let’s consider the problem of
determining which artists in the artist
table are missing from the painting
table. At present, the tables are
small, so it’s easy to examine them visually:
mysql>SELECT * FROM artist ORDER BY a_id;
+------+----------+ | a_id | name | +------+----------+ | 1 | Da Vinci | | 2 | Monet | | 3 | Van Gogh | | 4 | Picasso | | 5 | Renoir | +------+----------+ mysql>SELECT * FROM painting ORDER BY a_id, p_id;
+------+------+-------------------+-------+-------+ | a_id | p_id | title | state | price | +------+------+-------------------+-------+-------+ | 1 | 1 | The Last Supper | IN | 34 | | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | 3 | Starry Night | KY | 48 | | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | 5 | The Rocks | IA | 33 | | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+------+-------------------+-------+-------+
By looking at the tables, you can see that you have no paintings
by Monet or Picasso (there are no painting
rows with an a_id
value of 2 or 4). But as you acquire
more paintings and the tables get larger, it won’t be so easy to
eyeball them and answer the question by inspection. Can you answer it
using SQL? Sure, although first attempts at a solution generally look
something like the following statement, which uses a not-equal
condition to look for mismatches between the two tables:
mysql>SELECT * FROM artist INNER JOIN painting
->ON artist.a_id != painting.a_id;
+------+----------+------+------+-------------------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+----------+------+------+-------------------+-------+-------+ | 2 | Monet | 1 | 1 | The Last Supper | IN | 34 | | 3 | Van Gogh | 1 | 1 | The Last Supper | IN | 34 | | 4 | Picasso | 1 | 1 | The Last Supper | IN | 34 | | 5 | Renoir | 1 | 1 | The Last Supper | IN | 34 | | 2 | Monet | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | Van Gogh | 1 | 2 | The Mona Lisa | MI | 87 | | 4 | Picasso | 1 | 2 | The Mona Lisa | MI | 87 | | 5 | Renoir | 1 | 2 | The Mona Lisa | MI | 87 | | 1 | Da Vinci | 3 | 3 | Starry Night | KY | 48 | | 2 | Monet | 3 | 3 | Starry Night | KY | 48 | | 4 | Picasso | 3 | 3 | Starry Night | KY | 48 | | 5 | Renoir | 3 | 3 | Starry Night | KY | 48 | | 1 | Da Vinci | 3 | 4 | The Potato Eaters | KY | 67 | | 2 | Monet | 3 | 4 | The Potato Eaters | KY | 67 | | 4 | Picasso | 3 | 4 | The Potato Eaters | KY | 67 | | 5 | Renoir | 3 | 4 | The Potato Eaters | KY | 67 | | 1 | Da Vinci | 3 | 5 | The Rocks | IA | 33 | | 2 | Monet | 3 | 5 | The Rocks | IA | 33 | | 4 | Picasso | 3 | 5 | The Rocks | IA | 33 | | 5 | Renoir | 3 | 5 | The Rocks | IA | 33 | | 1 | Da Vinci | 5 | 6 | Les Deux Soeurs | NE | 64 | | 2 | Monet | 5 | 6 | Les Deux Soeurs | NE | 64 | | 3 | Van Gogh | 5 | 6 | Les Deux Soeurs | NE | 64 | | 4 | Picasso | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+----------+------+------+-------------------+-------+-------+
That output obviously is not correct. (For example, it falsely
indicates that each painting was painted by several different
artists.) The problem is that the statement produces a list of all
combinations of values from the two tables in which the artist ID
values aren’t the same, whereas what you really need is a list of
values in artist
that aren’t
present at
all in painting
. The trouble here is that an inner
join can only produce results based on combinations of values that are
present in both tables. It can’t tell you anything about values that
are missing from one of them.
When faced with the problem of finding values in one table that
have no match in (or that are missing from) another table, you should
get in the habit of thinking, “Aha, that’s a LEFT
JOIN
problem.” A LEFT
JOIN
is one type of outer join: it’s similar
to an inner join in that it attempts to match rows in the first (left)
table with the rows in the second (right) table. But in addition, if a
left table row has no match in the right table, a LEFT
JOIN
still produces a row—one in which all the columns from the right table
are set to NULL
. This means you can
find values that are missing from the right table by looking for
NULL
. It’s easier to understand how
this happens by working in stages. Begin with an inner join that
displays matching rows:
mysql>SELECT * FROM artist INNER JOIN painting
->ON artist.a_id = painting.a_id;
+------+----------+------+------+-------------------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+----------+------+------+-------------------+-------+-------+ | 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 | | 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 | | 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 | | 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+----------+------+------+-------------------+-------+-------+
In this output, the first a_id
column comes from the artist
table and the second one comes from
the painting
table.
Now compare that result with the output you get from a LEFT
JOIN
. A LEFT
JOIN
is written much like an INNER
JOIN
:
mysql>SELECT * FROM artist LEFT JOIN painting
->ON artist.a_id = painting.a_id;
+------+----------+------+------+-------------------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+----------+------+------+-------------------+-------+-------+ | 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 | | 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 | | 2 | Monet | NULL | NULL | NULL | NULL | NULL | | 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 | | 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 | | 4 | Picasso | NULL | NULL | NULL | NULL | NULL | | 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+----------+------+------+-------------------+-------+-------+
The output is similar to that from the inner join, except that
the LEFT
JOIN
also produces at least one output row
for every artist
row, including
those that have no painting
table
match. For those output rows, all the columns from painting
are set to NULL
. These are rows that the inner join
does not produce.
Next, to restrict the output only to the nonmatched artist
rows, add a WHERE
clause that looks for NULL
values in any painting
column that cannot otherwise
contain NULL
. This filters out the
rows that the inner join produces, leaving those produced only by the
outer join:
mysql>SELECT * FROM artist LEFT JOIN painting
->ON artist.a_id = painting.a_id
->WHERE painting.a_id IS NULL;
+------+---------+------+------+-------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+---------+------+------+-------+-------+-------+ | 2 | Monet | NULL | NULL | NULL | NULL | NULL | | 4 | Picasso | NULL | NULL | NULL | NULL | NULL | +------+---------+------+------+-------+-------+-------+
Finally, to show only the artist
table values that are missing from
the painting
table, shorten the
output column list to include only columns from the artist
table. The result is that the
LEFT
JOIN
lists those left-table rows containing
a_id
values that are not present in
the right table:
mysql>SELECT artist.* FROM artist LEFT JOIN painting
->ON artist.a_id = painting.a_id
->WHERE painting.a_id IS NULL;
+------+---------+ | a_id | name | +------+---------+ | 2 | Monet | | 4 | Picasso | +------+---------+
A similar kind of operation can be used to report each
left-table value along with an indicator as to whether it’s present in
the right table. To do this, perform a LEFT
JOIN
that counts the number of times each left-table value occurs in the
right table. A count of zero indicates that the value is not present.
The following statement lists each artist from the artist
table and shows whether you have any
paintings by the artist:
mysql>SELECT artist.name,
->IF(COUNT(painting.a_id)>0,'yes','no') AS 'in collection'
->FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id
->GROUP BY artist.name;
+----------+---------------+ | name | in collection | +----------+---------------+ | Da Vinci | yes | | Monet | no | | Picasso | no | | Renoir | yes | | Van Gogh | yes | +----------+---------------+
A RIGHT
JOIN
is another kind of outer join. It is
like LEFT
JOIN
but reverses the roles of the left and
right tables. Semantically, RIGHT
JOIN
forces the matching process to
produce a row from each table in the right table, even in the absence
of a corresponding row in the left table. Syntactically,
tbl1
LEFT
JOIN
tbl2
is equivalent to
tbl2
RIGHT
JOIN
tbl1
. This
means that you would rewrite the preceding LEFT
JOIN
as follows to convert it to a RIGHT
JOIN
that produces the same
results:
mysql>SELECT artist.name,
->IF(COUNT(painting.a_id)>0,'yes','no') AS 'in collection'
->FROM painting RIGHT JOIN artist ON artist.a_id = painting.a_id
->GROUP BY artist.name;
+----------+---------------+ | name | in collection | +----------+---------------+ | Da Vinci | yes | | Monet | no | | Picasso | no | | Renoir | yes | | Van Gogh | yes | +----------+---------------+
Elsewhere in this book, I’ll generally refer only to LEFT
JOIN
for brevity, but such references apply to RIGHT
JOIN
as well if you reverse the roles of the
tables.
As shown in this section, LEFT
JOIN
is useful for finding values with no match in another table or for
showing whether each value is matched. LEFT
JOIN
may also be used to produce a summary that includes all items in a
list, even those for which there’s nothing to summarize. This is very
common for characterizing the relationship between a master table and
a detail table. For example, a LEFT
JOIN
can produce “total sales
per customer” reports that list all customers, even those who
haven’t bought anything during the summary period. (See Producing Master-Detail Lists and Summaries for information about
master-detail lists.)
You can also use LEFT
JOIN
to perform consistency
checking when you receive two datafiles that are supposed to be
related, and you want to determine whether they really are. (That is,
you want to check the integrity of their relationship.) Import each
file into a MySQL table, and then run a couple of LEFT
JOIN
statements to determine whether there are unattached rows in one table
or the other—that is, rows that have no match in the other table.
Identifying and Removing Mismatched or Unattached Rows discusses how to
identify (and optionally delete) these unattached rows.
3.16.66.156