You want to compare rows in a table to other rows in the same
table. For example, you want to find all paintings in your collection
by the artist who painted The Potato Eaters. Or
you want to know which states listed in the states
table joined the Union in the same
year as New York. Or you want to know which states did not join the
Union in the same year as any other state.
Problems that require comparing a table to itself involve an operation known as a self-join. It’s performed much like other joins, except that you must always use table aliases so that you can refer to the same table different ways within the statement.
A special case of joining one table to another occurs when both tables are the same. This is called a self-join. Although many people find the idea confusing or strange to think about at first, it’s perfectly legal. It’s likely that you’ll find yourself using self-joins quite often because they are so important.
A tip-off that you need a self-join is when you want to know which pairs of elements in a table satisfy some condition. For example, suppose that your favorite painting is The Potato Eaters, and you want to identify all the items in your collection that were done by the artist who painted it. Do so as follows:
Identify the row in the painting
table that contains the title
The Potato Eaters, so that you can refer to
its a_id
value.
Use the a_id
value to
match other rows in the table that have the same a_id
value.
Display the titles from those matching rows.
The artist ID and painting titles that we begin with look like this:
mysql>SELECT a_id, title FROM painting ORDER BY a_id;
+------+-------------------+
| a_id | title |
+------+-------------------+
| 1 | The Last Supper |
| 1 | The Mona Lisa |
| 3 | Starry Night |
| 3 | The Potato Eaters |
| 3 | The Rocks |
| 5 | Les Deux Soeurs |
+------+-------------------+
A two-step method for picking out the right titles without a join is to look up the artist’s ID with one statement and then use the ID in a second statement to select rows that match it:
mysql>SELECT @id := a_id FROM painting WHERE title = 'The Potato Eaters';
+-------------+ | @id := a_id | +-------------+ | 3 | +-------------+ mysql>SELECT title FROM painting WHERE a_id = @id;
+-------------------+ | title | +-------------------+ | Starry Night | | The Potato Eaters | | The Rocks | +-------------------+
A different solution that requires only a single statement is to use a self-join. The trick to this lies in figuring out the proper notation to use. First attempts at writing a statement that joins a table to itself often look something like this:
mysql>SELECT title
->FROM painting INNER JOIN painting
->ON a_id = a_id;
->WHERE title = 'The Potato Eaters';
ERROR 1066 (42000): Not unique table/alias: 'painting'
The problem with that statement is that the column references
are ambiguous. MySQL can’t tell which instance of the painting
table any given column name refers
to. The solution is to give at least one instance of the table an
alias so that you can distinguish column references by using different
table qualifiers. The following statement shows how to do this, using
the aliases p1
and p2
to refer to the painting
table different ways:
mysql>SELECT p2.title
->FROM painting AS p1 INNER JOIN painting AS p2
->ON p1.a_id = p2.a_id
->WHERE p1.title = 'The Potato Eaters';
+-------------------+ | title | +-------------------+ | Starry Night | | The Potato Eaters | | The Rocks | +-------------------+
The statement output illustrates something typical of self-joins: when you begin with a reference value in one table instance (The Potato Eaters) to find matching rows in a second table instance (paintings by the same artist), the output includes the reference value. That makes sense: after all, the reference matches itself. If you want to find only other paintings by the same artist, explicitly exclude the reference value from the output:
mysql>SELECT p2.title
->FROM painting AS p1 INNER JOIN painting AS p2
->ON p1.a_id = p2.a_id
->WHERE p1.title = 'The Potato Eaters' AND p2.title != 'The Potato Eaters';
+--------------+ | title | +--------------+ | Starry Night | | The Rocks | +--------------+
A more general way to exclude the reference value without naming it literally is to specify that you don’t want output rows to have the same title as the reference, whatever that title happens to be:
mysql>SELECT p2.title
->FROM painting AS p1 INNER JOIN painting AS p2
->ON p1.a_id = p2.a_id
->WHERE p1.title = 'The Potato Eaters' AND p2.title != p1.title
+--------------+ | title | +--------------+ | Starry Night | | The Rocks | +--------------+
The preceding statements use comparisons of ID values to match
rows in the two table instances, but any kind of value can be used.
For example, to use the states
table to answer the question “Which states joined the Union in
the same year as New York?,” perform a
temporal pairwise comparison
based on the year part of the dates in the table’s statehood
column:
mysql>SELECT s2.name, s2.statehood
->FROM states AS s1 INNER JOIN states AS s2
->ON YEAR(s1.statehood) = YEAR(s2.statehood)
->WHERE s1.name = 'New York'
->ORDER BY s2.name;
+----------------+------------+ | name | statehood | +----------------+------------+ | Connecticut | 1788-01-09 | | Georgia | 1788-01-02 | | Maryland | 1788-04-28 | | Massachusetts | 1788-02-06 | | New Hampshire | 1788-06-21 | | New York | 1788-07-26 | | South Carolina | 1788-05-23 | | Virginia | 1788-06-25 | +----------------+------------+
Here again, the reference value (New York) appears in the
output. If you want to prevent that, add to the ON
expression a term that explicitly
excludes the reference:
mysql>SELECT s2.name, s2.statehood
->FROM states AS s1 INNER JOIN states AS s2
->ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name != s2.name
->WHERE s1.name = 'New York'
->ORDER BY s2.name;
+----------------+------------+ | name | statehood | +----------------+------------+ | Connecticut | 1788-01-09 | | Georgia | 1788-01-02 | | Maryland | 1788-04-28 | | Massachusetts | 1788-02-06 | | New Hampshire | 1788-06-21 | | South Carolina | 1788-05-23 | | Virginia | 1788-06-25 | +----------------+------------+
Like the problem of finding other paintings by the painter of
The Potato Eaters, the statehood problem can be
solved by using a user-defined variable and two statements. That will
always be true when you’re seeking matches for just one particular row
in your table. Other problems require finding matches for several
rows, in which case the two-statement method will not work. Suppose
that you want to find each pair of states that joined the Union in the
same year. In this case, the output potentially can include any pair
of rows from the states
table.
There is no fixed reference value, so you cannot store the reference
in a variable. A self-join is perfect for this problem:
mysql>SELECT YEAR(s1.statehood) AS year,
->s1.name AS name1, s1.statehood AS statehood1,
->s2.name AS name2, s2.statehood AS statehood2
->FROM states AS s1 INNER JOIN states AS s2
->ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name != s2.name
->ORDER BY year, s1.name, s2.name;
+------+----------------+------------+----------------+------------+ | year | name1 | statehood1 | name2 | statehood2 | +------+----------------+------------+----------------+------------+ | 1787 | Delaware | 1787-12-07 | New Jersey | 1787-12-18 | | 1787 | Delaware | 1787-12-07 | Pennsylvania | 1787-12-12 | | 1787 | New Jersey | 1787-12-18 | Delaware | 1787-12-07 | | 1787 | New Jersey | 1787-12-18 | Pennsylvania | 1787-12-12 | | 1787 | Pennsylvania | 1787-12-12 | Delaware | 1787-12-07 | | 1787 | Pennsylvania | 1787-12-12 | New Jersey | 1787-12-18 | ... | 1912 | Arizona | 1912-02-14 | New Mexico | 1912-01-06 | | 1912 | New Mexico | 1912-01-06 | Arizona | 1912-02-14 | | 1959 | Alaska | 1959-01-03 | Hawaii | 1959-08-21 | | 1959 | Hawaii | 1959-08-21 | Alaska | 1959-01-03 | +------+----------------+------------+----------------+------------+
The condition in the ON
clause that requires state pair names not to be identical eliminates
the trivially duplicate rows showing that each state joined the Union
in the same year as itself. But you’ll notice that each remaining pair
of states still appears twice. For example, there is one row that
lists Delaware and New Jersey, and another that lists New Jersey and
Delaware. This is often the case with self-joins: they produce pairs
of rows that contain the same values, but for which the values are not
in the same order. For techniques that eliminate these “near
duplicates” from the query result, see Eliminating Duplicates from a Self-Join Result.
Some self-join problems are of the “Which values are
not matched by other rows in the table?”
variety. An instance of this is the question “Which states did
not join the Union in the same year as any other state?”
Finding these states is a “nonmatch” problem, which is
the type of problem that typically involves a LEFT
JOIN
. In this case, the solution uses a
LEFT
JOIN
of the states
table to itself:
mysql>SELECT s1.name, s1.statehood
->FROM states AS s1 LEFT JOIN states AS s2
->ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name != s2.name
->WHERE s2.name IS NULL
->ORDER BY s1.name;
+----------------+------------+ | name | statehood | +----------------+------------+ | Alabama | 1819-12-14 | | Arkansas | 1836-06-15 | | California | 1850-09-09 | | Colorado | 1876-08-01 | | Illinois | 1818-12-03 | | Indiana | 1816-12-11 | | Iowa | 1846-12-28 | | Kansas | 1861-01-29 | | Kentucky | 1792-06-01 | | Louisiana | 1812-04-30 | | Maine | 1820-03-15 | | Michigan | 1837-01-26 | | Minnesota | 1858-05-11 | | Mississippi | 1817-12-10 | | Missouri | 1821-08-10 | | Nebraska | 1867-03-01 | | Nevada | 1864-10-31 | | North Carolina | 1789-11-21 | | Ohio | 1803-03-01 | | Oklahoma | 1907-11-16 | | Oregon | 1859-02-14 | | Rhode Island | 1790-05-29 | | Tennessee | 1796-06-01 | | Utah | 1896-01-04 | | Vermont | 1791-03-04 | | West Virginia | 1863-06-20 | | Wisconsin | 1848-05-29 | +----------------+------------+
For each row in the states
table, the statement selects rows in which the state has a statehood
value in the same year, not
including that state itself. For rows having no such match, the
LEFT
JOIN
forces the output to contain a row
anyway, with all the s2
columns set
to NULL
. Those rows identify the
states with no other state that joined the Union in the same
year.
3.145.101.109