You want to find which row within each group of rows in a table contains the maximum or minimum value for a given column. For example, you want to determine the most expensive painting in your collection for each artist.
Create a temporary table to hold the per-group maximum or
minimum values, and then join the temporary table with the original
one to pull out the matching row for each group. If you prefer a
single-query solution, use a subquery in the
FROM
clause rather
than a temporary table.
Many questions involve finding largest or smallest values in a
particular table column, but it’s also common to want to know what the
other values are in the row that contains the value. For example, when
you are using the artist
and
painting
tables, it’s possible to
answer questions like “What is the most expensive painting in
the collection, and who painted it?” One way to do this is to
store the highest price in a user-defined variable and then use the
variable to identify the row containing the price so that you can
retrieve other columns from it:
mysql>SET @max_price = (SELECT MAX(price) FROM painting);
mysql>SELECT artist.name, painting.title, painting.price
->FROM artist INNER JOIN painting
->ON painting.a_id = artist.a_id
->WHERE painting.price = @max_price;
+----------+---------------+-------+ | name | title | price | +----------+---------------+-------+ | Da Vinci | The Mona Lisa | 87 | +----------+---------------+-------+
The same thing can be done by creating a temporary table to hold the maximum price and then joining it with the other tables:
mysql>CREATE TABLE tmp SELECT MAX(price) AS max_price FROM painting;
mysql>SELECT artist.name, painting.title, painting.price
->FROM artist INNER JOIN painting INNER JOIN tmp
->ON painting.a_id = artist.a_id
->AND painting.price = tmp.max_price;
+----------+---------------+-------+ | name | title | price | +----------+---------------+-------+ | Da Vinci | The Mona Lisa | 87 | +----------+---------------+-------+
The techniques of using a user-defined variable or a temporary table as just shown were illustrated originally in Finding Values Associated with Minimum and Maximum Values. Their use here is similar except that now we are applying them to multiple tables.
On the face of it, using a temporary table and a join is just a more complicated way of answering the
question than with a user-defined variable. Does this technique have
any practical value? Yes, it does, because it leads to a more general
technique for answering more difficult questions. The previous
statements show information only for the single most expensive
painting in the entire painting
table. What if your question is, “What is the most expensive
painting for each artist?” You can’t use a
user-defined variable to answer that question, because the answer
requires finding one price per artist, and a variable can hold only a
single value at a time. But the technique of using a temporary table
works well, because the table can hold multiple rows, and a join can
find matches for all of them.
To answer the question, select each artist ID and the
corresponding maximum painting price into a temporary table. The table
will contain not just the maximum painting price but the maximum
within each group, where “group” is defined as
“paintings by a given artist.” Then use the artist IDs
and prices stored in the tmp
table
to match rows in the painting
table, and join the result with the artist
table to get the artist names:
mysql>CREATE TABLE tmp
->SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id;
mysql>SELECT artist.name, painting.title, painting.price
->FROM artist INNER JOIN painting INNER JOIN tmp
->ON painting.a_id = artist.a_id
->AND painting.a_id = tmp.a_id
->AND painting.price = tmp.max_price;
+----------+-------------------+-------+ | name | title | price | +----------+-------------------+-------+ | Da Vinci | The Mona Lisa | 87 | | Van Gogh | The Potato Eaters | 67 | | Renoir | Les Deux Soeurs | 64 | +----------+-------------------+-------+
To obtain the same result with a single statement, use a
subquery in the FROM
clause that
retrieves the same rows contained in the temporary table:
mysql>SELECT artist.name, painting.title, painting.price
->FROM artist INNER JOIN painting INNER JOIN
->(SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id)
->AS tmp
->ON painting.a_id = artist.a_id
->AND painting.a_id = tmp.a_id
->AND painting.price = tmp.max_price;
+----------+-------------------+-------+ | name | title | price | +----------+-------------------+-------+ | Da Vinci | The Mona Lisa | 87 | | Van Gogh | The Potato Eaters | 67 | | Renoir | Les Deux Soeurs | 64 | +----------+-------------------+-------+
Yet another way to answer maximum-per-group questions is to use
a
LEFT
JOIN
that joins a table to itself. The
following statement identifies the highest-priced painting per artist
ID (we are using IS
NULL
to select all the rows from p1
for which there is
no row in p2
with a higher price):
mysql>SELECT p1.a_id, p1.title, p1.price
->FROM painting AS p1 LEFT JOIN painting AS p2
->ON p1.a_id = p2.a_id AND p1.price < p2.price
->WHERE p2.a_id IS NULL;
+------+-------------------+-------+ | a_id | title | price | +------+-------------------+-------+ | 1 | The Mona Lisa | 87 | | 3 | The Potato Eaters | 67 | | 5 | Les Deux Soeurs | 64 | +------+-------------------+-------+
To display artist names rather than ID values, join the result
of the LEFT
JOIN
to the artist
table:
mysql>SELECT artist.name, p1.title, p1.price
->FROM painting AS p1 LEFT JOIN painting AS p2
->ON p1.a_id = p2.a_id AND p1.price < p2.price
->INNER JOIN artist ON p1.a_id = artist.a_id
->WHERE p2.a_id IS NULL;
+----------+-------------------+-------+ | name | title | price | +----------+-------------------+-------+ | Da Vinci | The Mona Lisa | 87 | | Van Gogh | The Potato Eaters | 67 | | Renoir | Les Deux Soeurs | 64 | +----------+-------------------+-------+
The self–LEFT
JOIN
method is perhaps somewhat less
intuitive than using a temporary table or a subquery.
The techniques just shown work for other kinds of values, such
as temporal values. Consider the driver_log
table that lists drivers and
trips that they’ve taken:
mysql>SELECT name, trav_date, miles
->FROM driver_log
->ORDER BY name, trav_date;
+-------+------------+-------+ | name | trav_date | miles | +-------+------------+-------+ | Ben | 2006-08-29 | 131 | | Ben | 2006-08-30 | 152 | | Ben | 2006-09-02 | 79 | | Henry | 2006-08-26 | 115 | | Henry | 2006-08-27 | 96 | | Henry | 2006-08-29 | 300 | | Henry | 2006-08-30 | 203 | | Henry | 2006-09-01 | 197 | | Suzi | 2006-08-29 | 391 | | Suzi | 2006-09-02 | 502 | +-------+------------+-------+
One type of maximum-per-group problem for this table is “show the most recent trip for each driver.” It can be solved with a temporary table like this:
mysql>CREATE TABLE tmp
->SELECT name, MAX(trav_date) AS trav_date
->FROM driver_log GROUP BY name;
mysql>SELECT driver_log.name, driver_log.trav_date, driver_log.miles
->FROM driver_log INNER JOIN tmp
->ON driver_log.name = tmp.name AND driver_log.trav_date = tmp.trav_date
->ORDER BY driver_log.name;
+-------+------------+-------+ | name | trav_date | miles | +-------+------------+-------+ | Ben | 2006-09-02 | 79 | | Henry | 2006-09-01 | 197 | | Suzi | 2006-09-02 | 502 | +-------+------------+-------+
You can also use a subquery in the FROM
clause like this:
mysql>SELECT driver_log.name, driver_log.trav_date, driver_log.miles
->FROM driver_log INNER JOIN
->(SELECT name, MAX(trav_date) AS trav_date
->FROM driver_log GROUP BY name) AS tmp
->ON driver_log.name = tmp.name AND driver_log.trav_date = tmp.trav_date
->ORDER BY driver_log.name;
+-------+------------+-------+ | name | trav_date | miles | +-------+------------+-------+ | Ben | 2006-09-02 | 79 | | Henry | 2006-09-01 | 197 | | Suzi | 2006-09-02 | 502 | +-------+------------+-------+
Which technique is better: the temporary table or the subquery
in the FROM
clause? For small
tables, there might not be much difference either way. If the
temporary table or subquery result is large, a general advantage of
the temporary table is that you can index it after creating it and
before using it in a join.
This recipe shows how to answer maximum-per-group questions by
selecting summary information into a temporary table and joining that
table to the original one or by using a subquery in the FROM
clause. These techniques have
application in many contexts. One of them is calculation of team
standings, where the standings for each group of teams are determined
by comparing each team in the group to the team with the best record.
Computing Team Standings discusses how to do
this.
18.219.182.76