SQL can group rows based on matching values in specified columns and computer summary measures for each group. When these grouping queries are combined with the set functions that you saw earlier in this chapter, SQL can provide simple reports without requiring any special programming.
Forming Groups
To form a group, you add a GROUP BY clause to a SELECT statement, followed by the columns whose values are to be used to form the groups. All rows whose values match on those columns will be placed in the same group.
For example, if someone at the rare book store wants to see how many copies of each book edition have been sold, he or she can use a query like
The query forms groups by matching ISBNs. It displays the ISBN and the number of rows in each group (see
Figure 7-2).
There is a major restriction that you must observe with a grouping query: You can display values only from columns that are used to form the groups. As an example, assume that someone at the rare book store wants to see the number of copies of each title that have been sold. A working query could be written
WHERE volume.isbn = book.isbn
AND book.work_numb = work.work_numb
The result appears in
Figure 7-3. The problem with this approach is that titles may duplicate. Therefore, it would be better to group by the work number. However, given the restriction as to what can be displayed, you wouldn't be able to display the title.
The solution is to make the DBMS do a bit of extra work: Group by both the work number and the title. The DBMS will
then form groups that have the same values in both columns. There is only one title per work number, so the result will be the same as that in
Figure 7-3 if there are no duplicated titles. We therefore gain the ability to display the title when grouping by the work number. The query could be written
SELECT work.work_numb title, COUNT (*)
WHERE volume.isbn = book.isbn
AND book.work_numb = work.work_numb
GROUP BY work_numb, title
As you can see in
Figure 7-4, the major difference between the two results is the appearance of the work number column.
You can use any of the set functions in a grouping query. For example, someone at the rare book store could generate the total cost of all sales with
SELECT sale_id, SUM (selling_price)
The result can be seen in
Figure 7-5. Notice that the last line of the result has nulls for both output values. This occurs because those volumes that haven't been sold have null for the sale ID and selling price. If you wanted to clean up the output, removing rows with nulls, you could add a WHERE clause:
SELECT sale_id, SUM (selling_price)
WHERE NOT (sale_id IS NULL)
Including the title as part of the GROUP BY clause was a trick to allow us to display the title in the result. However, more commonly we use multiple columns to created nested groups. For example, if someone at the rare book store wanted to see the total cost of purchases made by each customer per day, the query could be written
SELECT customer.customer_numb, sale_date, SUM (selling_price)
FROM customer, sale, volume
WHERE customer.customer_numb = sale.customer_numb
AND sale.sale_id = volume.sale_id
GROUP BY customer.customer_numb, sale_date;
Because the
customer_numb column is listed first in the GROUP BY clause, its values are used to create the outer groupings. The DBMS then groups orders by date
within customer numbers. The default output (see
Figure 7-6) is somewhat hard to interpret because the outer groupings are not in order. However, if you add an ORDER BY clause to sort the output by customer number, you can see the ordering by date within each customer (see
Figure 7-7).
Restricting Groups
The grouping queries you have seen to this point include all the rows in the table. However, you can restrict the rows that are included in grouped output using one of two strategies:
◊ Restrict the rows before groups are formed.
◊ Allow all groups to be formed and then restrict the groups.
The first strategy is performed with the WHERE clause in the same way we have been restricting rows to this point. The second requires a HAVING clause, which contains a predicate that applies to groups after they are formed.
Assume, for example, that someone at the rare book store wants to see the number of books ordered at each price over $75. One way to write the query is to use a WHERE clause to throw out rows with a selling price less than or equal to $75:
SELECT selling_price, count (*)
Alternatively, you could let the DBMS form the groups and then throw out the groups that have a cost less than or equal to $75 with a HAVING clause:
SELECT selling_price, count (*)
HAVING selling_price > 75;
The result in both cases is the same (see
Figure 7-8). However, the way in which the query is processing is different.