Chapter 13. Advanced Group Operations

Group operations aggregate data over multiple rows. We discussed the GROUP BY clause and basic group operations in Chapter 4. Decision-support systems require more complex group operations. Data warehousing applications involve aggregation over multiple dimensions of data. To enable effective decision support, you need to summarize transaction data at various levels. We discuss advanced group operations used by decision-support systems in this chapter.

Oracle provides several handy SQL features to summarize data. These include the following:

  • A ROLLUP function to generate totals and subtotals in the summarized results.

  • A CUBE function to generate subtotals for all possible combinations of grouped columns.

  • A GROUPING SETS function to generate summary information at the level you choose without including all the rows produced by the regular GROUP BY operation.

  • The GROUPING, GROUPING_ID and GROUP_ID functions to help you correctly interpret results generated using ROLLUP, CUBE, and GROUPING SETS.

Multiple Summary Levels

In Chapter 4, you saw how the GROUP BY clause, along with the aggregate functions, can be used to produce summary results. For example, if you want to print the monthly total sales for each region, you would probably execute the following query:

            SELECT r.name region, 
                   TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)
            FROM all_orders o JOIN region r
            ON r.region_id = o.region_id
            GROUP BY r.name, o.month;

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
New England          January            1527645
New England          February           1847238
New England          March              1699449
New England          April              1792866
New England          May                1698855
New England          June               1510062
New England          July               1678002
New England          August             1642968
New England          September          1726767
New England          October            1648944
New England          November           1384185
New England          December           1599942
Mid-Atlantic         January            1832091
Mid-Atlantic         February           1286028
Mid-Atlantic         March              1911093
Mid-Atlantic         April              1623438
Mid-Atlantic         May                1778805
Mid-Atlantic         June               1504455
Mid-Atlantic         July               1820742
Mid-Atlantic         August             1381560
Mid-Atlantic         September          1178694
Mid-Atlantic         October            1530351
Mid-Atlantic         November           1598667
Mid-Atlantic         December           1477374
Southeast US         January            1137063
Southeast US         February           1855269
Southeast US         March              1967979
Southeast US         April              1830051
Southeast US         May                1983282
Southeast US         June               1705716
Southeast US         July               1670976
Southeast US         August             1436295
Southeast US         September          1905633
Southeast US         October            1610523
Southeast US         November           1661598
Southeast US         December           1841100

36 rows selected.

As expected, this report prints the total sales for each region and month combination. However, in a more complex application, you may also want to have the subtotal for each region over all months, along with the total for all regions, or you may want the subtotal for each month over all regions, along with the total for all months. In short, you may need to generate subtotals and totals at more than one level.

UNION

In data warehouse applications, you frequently need to generate summary information over various dimensions, and subtotal and total across those dimensions. Generating and retrieving this type of summary information is a core goal of almost all data warehouse applications.

By this time, you have realized that a simple GROUP BY query is not sufficient to generate the subtotals and totals described in this section. To illustrate the complexity of the problem, let’s attempt to write a query that would return the following in a single output:

  • Sales for each month for every region

  • Subtotals over all months for every region

  • Total sales for all regions over all months

One way to generate multiple levels of summary (the only way prior to Oracle8i) is to write a UNION query. For example, the following UNION query will give us the desired three levels of subtotals:

               SELECT r.name region, 
                      TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY r.name, o.month
               UNION ALL
               SELECT r.name region, NULL, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY r.name
               UNION ALL
               SELECT NULL, NULL, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id;

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
New England          January            1527645
New England          February           1847238
New England          March              1699449
New England          April              1792866
New England          May                1698855
New England          June               1510062
New England          July               1678002
New England          August             1642968
New England          September          1726767
New England          October            1648944
New England          November           1384185
New England          December           1599942
Mid-Atlantic         January            1832091
Mid-Atlantic         February           1286028
Mid-Atlantic         March              1911093
Mid-Atlantic         April              1623438
Mid-Atlantic         May                1778805
Mid-Atlantic         June               1504455
Mid-Atlantic         July               1820742
Mid-Atlantic         August             1381560
Mid-Atlantic         September          1178694
Mid-Atlantic         October            1530351
Mid-Atlantic         November           1598667
Mid-Atlantic         December           1477374
Southeast US         January            1137063
Southeast US         February           1855269
Southeast US         March              1967979
Southeast US         April              1830051
Southeast US         May                1983282
Southeast US         June               1705716
Southeast US         July               1670976
Southeast US         August             1436295
Southeast US         September          1905633
Southeast US         October            1610523
Southeast US         November           1661598
Southeast US         December           1841100
Mid-Atlantic                           18923298
New England                            19756923
Southeast US                           20605485
                                       59285706

40 rows selected.

This query produced 40 rows of output, 36 of which are the sales for each month for every region. The last four rows are the subtotals and the total. The three rows with region names and NULL values for the month are the subtotals for each region over all the months, and the last row with NULL values for both the region and month is the total sales for all the regions over all the months.

Now that you have the desired result, try to analyze the query a bit. You have a very small all_orders table with only 1440 rows in this example. You wanted to have summary information over just two dimensions—region and month. You have 3 regions and 12 months. To get the desired summary information from this table, you have to write a query consisting of three SELECT statements combined together using UNION ALL. The execution plan for this query is:

PLAN_TABLE_OUTPUT
-----------------------------------------------------

-----------------------------------------------------
| Id | Operation                  | Name            |
-----------------------------------------------------
|  0 | SELECT STATEMENT           |                 |
|  1 | UNION-ALL                  |                 |
|  2 | SORT GROUP BY              |                 |
|  3 | MERGE JOIN                 |                 |
|  4 | TABLE ACCESS BY INDEX ROWID| REGION          |
|  5 | INDEX FULL SCAN            | REGION_PK       |
|* 6 | SORT JOIN                  |                 |
|  7 | TABLE ACCESS FULL          | ALL_ORDERS      |
|  8 | SORT GROUP BY              |                 |
|  9 | MERGE JOIN                 |                 |
|  10| TABLE ACCESS BY INDEX ROWID| REGION          |
|  11| INDEX FULL SCAN            | REGION_PK       |
|* 12| SORT JOIN                  |                 |
|  13| TABLE ACCESS FULL          | ALL_ORDERS      |
|  14| SORT AGGREGATE             |                 |
|  15| NESTED LOOPS               |                 |
|  16| TABLE ACCESS FULL          | ALL_ORDERS      |
|* 17| INDEX UNIQUE SCAN          | REGION_PK       |
-----------------------------------------------------

As indicated by the execution plan output, Oracle needs to perform the following operations to get the results:

Three FULL TABLE scans on all_orders
Three INDEX scan on region_pk (Primary key of table region)
Two Sort-Merge Joins
One NESTED LOOPS JOIN
Two SORT GROUP BY operations
One SORT AGGREGATE operation
One UNION ALL

In any practical application the all_orders table will consist of millions of rows, and performing all these operations would be time-consuming. Even worse, if you have more dimensions for which to prepare summary information than the two shown in this example, you have to write an even more complex query. The bottom line is that such a query badly hurts performance.

ROLLUP

Oracle8i introduced several new features for generating multiple levels of summary information with one query. One such feature is a set of extensions to the GROUP BY clause. In Oracle8i, the GROUP BY clause comes with two extensions: ROLLUP and CUBE. Oracle9i introduced another extension: GROUPING SETS. We discuss ROLLUP in this section. CUBE and GROUPING SETS are discussed later in this chapter.

ROLLUP is an extension to the GROUP BY clause, and therefore can only appear in a query with a GROUP BY clause. The ROLLUP operation groups the selected rows based on the expressions in the GROUP BY clause, and prepares a summary row for each group. The syntax of ROLLUP is:

SELECT  . . . 
FROM  . . . 
GROUP BY ROLLUP (ordered list of grouping columns)

Using ROLLUP, you can generate the summary information discussed in the previous section in a much easier way than in our UNION ALL query. For example:

               SELECT r.name region, 
                      TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY ROLLUP (r.name, o.month);

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
New England          January            1527645
New England          February           1847238
New England          March              1699449
New England          April              1792866
New England          May                1698855
New England          June               1510062
New England          July               1678002
New England          August             1642968
New England          September          1726767
New England          October            1648944
New England          November           1384185
New England          December           1599942
New England                            19756923
Mid-Atlantic         January            1832091
Mid-Atlantic         February           1286028
Mid-Atlantic         March              1911093
Mid-Atlantic         April              1623438
Mid-Atlantic         May                1778805
Mid-Atlantic         June               1504455
Mid-Atlantic         July               1820742
Mid-Atlantic         August             1381560
Mid-Atlantic         September          1178694
Mid-Atlantic         October            1530351
Mid-Atlantic         November           1598667
Mid-Atlantic         December           1477374
Mid-Atlantic                           18923298
Southeast US         January            1137063
Southeast US         February           1855269
Southeast US         March              1967979
Southeast US         April              1830051
Southeast US         May                1983282
Southeast US         June               1705716
Southeast US         July               1670976
Southeast US         August             1436295
Southeast US         September          1905633
Southeast US         October            1610523
Southeast US         November           1661598
Southeast US         December           1841100
Southeast US                           20605485
                                       59285706

40 rows selected.

As you can see in this output, the ROLLUP operation produced subtotals across the specified dimensions and a grand total. The argument to the ROLLUP operation is an ordered list of grouping columns. Since the ROLLUP operation is used in conjunction with the GROUP BY clause, it first generates aggregate values based on the GROUP BY operation on the ordered list of columns. It then generates higher-level subtotals and finally a grand total. ROLLUP not only simplifies the query, it results in more efficient execution. The execution plan for this ROLLUP query is as follows:

PLAN_TABLE_OUTPUT
----------------------------------------------------

----------------------------------------------------
| Id  | Operation                     | Name       |
----------------------------------------------------
|   0 | SELECT STATEMENT              |            |
|   1 |  SORT GROUP BY ROLLUP         |            |
|   2 |   MERGE JOIN                  |            |
|   3 |    TABLE ACCESS BY INDEX ROWID| REGION     |
|   4 |     INDEX FULL SCAN           | REGION_PK  |
|*  5 |    SORT JOIN                  |            |
|   6 |     TABLE ACCESS FULL         | ALL_ORDERS |
----------------------------------------------------

Rather than the multiple table scans, joins, and other operations required by the UNION ALL version of the query, the ROLLUP query needs just one index scan on region_pk, one full table scan on all_orders, and one join to generate the required output.

If you want to generate subtotals for each month instead of for each region, all you need to do is change the order of columns in the ROLLUP operation, as in the following example:

               SELECT r.name region, 
               TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY ROLLUP (o.month, r.name);

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
New England          January            1527645
Mid-Atlantic         January            1832091
Southeast US         January            1137063
                     January            4496799
New England          February           1847238
Mid-Atlantic         February           1286028
Southeast US         February           1855269
                     February           4988535
New England          March              1699449
Mid-Atlantic         March              1911093
Southeast US         March              1967979
                     March              5578521
New England          April              1792866
Mid-Atlantic         April              1623438
Southeast US         April              1830051
                     April              5246355
New England          May                1698855
Mid-Atlantic         May                1778805
Southeast US         May                1983282
                     May                5460942
New England          June               1510062
Mid-Atlantic         June               1504455
Southeast US         June               1705716
                     June               4720233
New England          July               1678002
Mid-Atlantic         July               1820742
Southeast US         July               1670976
                     July               5169720
New England          August             1642968
Mid-Atlantic         August             1381560
Southeast US         August             1436295
                     August             4460823
New England          September          1726767
Mid-Atlantic         September          1178694
Southeast US         September          1905633
                     September          4811094
New England          October            1648944
Mid-Atlantic         October            1530351
Southeast US         October            1610523
                     October            4789818
New England          November           1384185
Mid-Atlantic         November           1598667
Southeast US         November           1661598
                     November           4644450
New England          December           1599942
Mid-Atlantic         December           1477374
Southeast US         December           1841100
                     December           4918416
                                       59285706

49 rows selected.

Adding dimensions does not result in additional complexity. The following query rolls up subtotals for the region, the month, and the year for the first quarter:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY ROLLUP (o.year, o.month, r.name);

      YEAR MONTH     REGION               SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
      2000 January   New England                   1018430
      2000 January   Mid-Atlantic                  1221394
      2000 January   Southeast US                   758042
      2000 January                                 2997866
      2000 February  New England                   1231492
      2000 February  Mid-Atlantic                   857352
      2000 February  Southeast US                  1236846
      2000 February                                3325690
      2000 March     New England                   1132966
      2000 March     Mid-Atlantic                  1274062
      2000 March     Southeast US                  1311986
      2000 March                                   3719014
      2000                                        10042570
      2001 January   New England                    509215
      2001 January   Mid-Atlantic                   610697
      2001 January   Southeast US                   379021
      2001 January                                 1498933
      2001 February  New England                    615746
      2001 February  Mid-Atlantic                   428676
      2001 February  Southeast US                   618423
      2001 February                                1662845
      2001 March     New England                    566483
      2001 March     Mid-Atlantic                   637031
      2001 March     Southeast US                   655993
      2001 March                                   1859507
      2001                                         5021285
                                                  15063855

27 rows selected.

Partial ROLLUPs

In a ROLLUP query with N dimensions, the grand total is considered the top level. The various subtotal rows of N-1 dimensions constitute the next lower level, the subtotal rows of N-2 dimensions constitute yet another level down, and so on. In the most recent example, you have three dimensions (year, month, and region), and the total row is the top level. The subtotal rows for the year represent the next lower level, because these rows are subtotals across two dimensions (month and region). The subtotal rows for the year and month combination are one level lower, because these rows are subtotals across one dimension (region). The rest of the rows are the result of the regular GROUP BY operation (without ROLLUP), and form the lowest level.

If you want to exclude some subtotals and totals from the ROLLUP output, you can only move top to bottom, i.e., exclude the top-level total first, then progressively go down to the next level subtotals, and so on. To do this, you have to take out one or more columns from the ROLLUP operation, and put them in the GROUP BY clause. This is called a partial ROLLUP.

As an example of a partial ROLLUP, let’s see what happens when you take out the first column, which is o.year, from the previous ROLLUP operation and move it into the GROUP BY clause.

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY o.year, ROLLUP (o.month, r.name);

      YEAR MONTH     REGION               SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
      2000 January   New England                   1018430
      2000 January   Mid-Atlantic                  1221394
      2000 January   Southeast US                   758042
      2000 January                                 2997866
      2000 February  New England                   1231492
      2000 February  Mid-Atlantic                   857352
      2000 February  Southeast US                  1236846
      2000 February                                3325690
      2000 March     New England                   1132966
      2000 March     Mid-Atlantic                  1274062
      2000 March     Southeast US                  1311986
      2000 March                                   3719014
      2000                                        10042570
      2001 January   New England                    509215
      2001 January   Mid-Atlantic                   610697
      2001 January   Southeast US                   379021
      2001 January                                 1498933
      2001 February  New England                    615746
      2001 February  Mid-Atlantic                   428676
      2001 February  Southeast US                   618423
      2001 February                                1662845
      2001 March     New England                    566483
      2001 March     Mid-Atlantic                   637031
      2001 March     Southeast US                   655993
      2001 March                                   1859507
      2001                                         5021285

26 rows selected.

The query in this example excludes the grand-total row from the output. By taking out o.year from the ROLLUP operation, you are asking the database not to roll up summary information over the years. Therefore, the database rolls up summary information on region and month. When you proceed to remove o.month from the ROLLUP operation, the query will not generate the roll up summary for the month dimension, and only the region-level subtotals will be printed in the output. For example:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY o.year, o.month, ROLLUP (r.name);

      YEAR MONTH     REGION               SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
      2000 January   New England                   1018430
      2000 January   Mid-Atlantic                  1221394
      2000 January   Southeast US                   758042
      2000 January                                 2997866
      2000 February  New England                   1231492
      2000 February  Mid-Atlantic                   857352
      2000 February  Southeast US                  1236846
      2000 February                                3325690
      2000 March     New England                   1132966
      2000 March     Mid-Atlantic                  1274062
      2000 March     Southeast US                  1311986
      2000 March                                   3719014
      2001 January   New England                    509215
      2001 January   Mid-Atlantic                   610697
      2001 January   Southeast US                   379021
      2001 January                                 1498933
      2001 February  New England                    615746
      2001 February  Mid-Atlantic                   428676
      2001 February  Southeast US                   618423
      2001 February                                1662845
      2001 March     New England                    566483
      2001 March     Mid-Atlantic                   637031
      2001 March     Southeast US                   655993
      2001 March                                   1859507

24 rows selected.

CUBE

The CUBE extension of the GROUP BY clause takes aggregation one step further than ROLLUP. The CUBE operation generates subtotals for all possible combinations of the grouping columns. Therefore, output of a CUBE operation will contain all subtotals produced by an equivalent ROLLUP operation and also some additional subtotals. For example, if you are performing ROLLUP on columns region and month, you will get subtotals for all months for each region, and a grand total. However, if you perform the corresponding CUBE, you will get:

  • The regular rows produced by the GROUP BY clause

  • Subtotals for all months on each region

  • A subtotal for all regions on each month

  • A grand total

Like ROLLUP, CUBE is an extension of the GROUP BY clause, and can appear in a query only along with a GROUP BY clause. The syntax of CUBE is:

SELECT  . . . 
FROM  . . . 
GROUP BY CUBE (list of grouping columns)

For example, the following query returns subtotals for all combinations of regions and months in the all_orders table:

               SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 
               SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY CUBE(r.name, o.month);

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
                                       59285706
                     January            4496799
                     February           4988535
                     March              5578521
                     April              5246355
                     May                5460942
                     June               4720233
                     July               5169720
                     August             4460823
                     September          4811094
                     October            4789818
                     November           4644450
                     December           4918416
New England                            19756923
New England          January            1527645
New England          February           1847238
New England          March              1699449
New England          April              1792866
New England          May                1698855
New England          June               1510062
New England          July               1678002
New England          August             1642968
New England          September          1726767
New England          October            1648944
New England          November           1384185
New England          December           1599942
Mid-Atlantic                           18923298
Mid-Atlantic         January            1832091
Mid-Atlantic         February           1286028
Mid-Atlantic         March              1911093
Mid-Atlantic         April              1623438
Mid-Atlantic         May                1778805
Mid-Atlantic         June               1504455
Mid-Atlantic         July               1820742
Mid-Atlantic         August             1381560
Mid-Atlantic         September          1178694
Mid-Atlantic         October            1530351
Mid-Atlantic         November           1598667
Mid-Atlantic         December           1477374
Southeast US                           20605485
Southeast US         January            1137063
Southeast US         February           1855269
Southeast US         March              1967979
Southeast US         April              1830051
Southeast US         May                1983282
Southeast US         June               1705716
Southeast US         July               1670976
Southeast US         August             1436295
Southeast US         September          1905633
Southeast US         October            1610523
Southeast US         November           1661598
Southeast US         December           1841100

52 rows selected.

Note that the output contains not only the subtotals for each region, but also the subtotals for each month. You can get the same result from a query without the CUBE operation. However, that query would be lengthy and complex and, of course, very inefficient. Such a query would look like:

               SELECT NULL region, NULL month, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id 
               UNION ALL
               SELECT NULL, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY o.month
               UNION ALL
               SELECT r.name region, NULL, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY r.name
               UNION ALL
               SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 
               SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY r.name, o.month;

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
                                       59285706
                     January            4496799
                     February           4988535
                     March              5578521
                     April              5246355
                     May                5460942
                     June               4720233
                     July               5169720
                     August             4460823
                     September          4811094
                     October            4789818
                     November           4644450
                     December           4918416
Mid-Atlantic                           18923298
New England                            19756923
Southeast US                           20605485
New England          January            1527645
New England          February           1847238
New England          March              1699449
New England          April              1792866
New England          May                1698855
New England          June               1510062
New England          July               1678002
New England          August             1642968
New England          September          1726767
New England          October            1648944
New England          November           1384185
New England          December           1599942
Mid-Atlantic         January            1832091
Mid-Atlantic         February           1286028
Mid-Atlantic         March              1911093
Mid-Atlantic         April              1623438
Mid-Atlantic         May                1778805
Mid-Atlantic         June               1504455
Mid-Atlantic         July               1820742
Mid-Atlantic         August             1381560
Mid-Atlantic         September          1178694
Mid-Atlantic         October            1530351
Mid-Atlantic         November           1598667
Mid-Atlantic         December           1477374
Southeast US         January            1137063
Southeast US         February           1855269
Southeast US         March              1967979
Southeast US         April              1830051
Southeast US         May                1983282
Southeast US         June               1705716
Southeast US         July               1670976
Southeast US         August             1436295
Southeast US         September          1905633
Southeast US         October            1610523
Southeast US         November           1661598
Southeast US         December           1841100

52 rows selected.

Since a CUBE produces aggregate results for all possible combinations of the grouping columns, the output of a query using CUBE is independent of the order of columns in the CUBE operation, if everything else remains the same. This is not the case with ROLLUP. If everything else in the query remains the same, ROLLUP(a,b) will produce a slightly different result set than ROLLUP(b,a). However, the result set of CUBE(a,b) will be the same as that of CUBE(b,a). The following example illustrates this by taking the example at the beginning of this section and reversing the order of columns in the CUBE operation:

               SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 
               SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY CUBE(o.month, r.name);

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
                                       59285706
New England                            19756923
Mid-Atlantic                           18923298
Southeast US                           20605485
                     January            4496799
New England          January            1527645
Mid-Atlantic         January            1832091
Southeast US         January            1137063
                     February           4988535
New England          February           1847238
Mid-Atlantic         February           1286028
Southeast US         February           1855269
                     March              5578521
New England          March              1699449
Mid-Atlantic         March              1911093
Southeast US         March              1967979
                     April              5246355
New England          April              1792866
Mid-Atlantic         April              1623438
Southeast US         April              1830051
                     May                5460942
New England          May                1698855
Mid-Atlantic         May                1778805
Southeast US         May                1983282
                     June               4720233
New England          June               1510062
Mid-Atlantic         June               1504455
Southeast US         June               1705716
                     July               5169720
New England          July               1678002
Mid-Atlantic         July               1820742
Southeast US         July               1670976
                     August             4460823
New England          August             1642968
Mid-Atlantic         August             1381560
Southeast US         August             1436295
                     September          4811094
New England          September          1726767
Mid-Atlantic         September          1178694
Southeast US         September          1905633
                     October            4789818
New England          October            1648944
Mid-Atlantic         October            1530351
Southeast US         October            1610523
                     November           4644450
New England          November           1384185
Mid-Atlantic         November           1598667
Southeast US         November           1661598
                     December           4918416
New England          December           1599942
Mid-Atlantic         December           1477374
Southeast US         December           1841100

52 rows selected.

This query produced the same results as the earlier query; only the order of the rows happens to be different.

Partial CUBE

To exclude some subtotals from the output, you can do a partial CUBE, (similar to a partial ROLLUP) by taking out column(s) from the CUBE operation and putting them into the GROUP BY clause. Here’s an example:

               SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 
               SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               GROUP BY r.name, CUBE(o.month);

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
New England                            19756923
New England          January            1527645
New England          February           1847238
New England          March              1699449
New England          April              1792866
New England          May                1698855
New England          June               1510062
New England          July               1678002
New England          August             1642968
New England          September          1726767
New England          October            1648944
New England          November           1384185
New England          December           1599942
Mid-Atlantic                           18923298
Mid-Atlantic         January            1832091
Mid-Atlantic         February           1286028
Mid-Atlantic         March              1911093
Mid-Atlantic         April              1623438
Mid-Atlantic         May                1778805
Mid-Atlantic         June               1504455
Mid-Atlantic         July               1820742
Mid-Atlantic         August             1381560
Mid-Atlantic         September          1178694
Mid-Atlantic         October            1530351
Mid-Atlantic         November           1598667
Mid-Atlantic         December           1477374
Southeast US                           20605485
Southeast US         January            1137063
Southeast US         February           1855269
Southeast US         March              1967979
Southeast US         April              1830051
Southeast US         May                1983282
Southeast US         June               1705716
Southeast US         July               1670976
Southeast US         August             1436295
Southeast US         September          1905633
Southeast US         October            1610523
Southeast US         November           1661598
Southeast US         December           1841100

39 rows selected.

If you compare the results of the partial CUBE operation with that of the full CUBE operation, discussed at the beginning of this section, you will notice that the partial CUBE has excluded the subtotals for each month and the grand total from the output. If you want to retain the subtotals for each month, but want to exclude the subtotals for each region, you can swap the position of r.name and o.month in the GROUP BY . . . CUBE clause, as shown here:

SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 
SUM(o.tot_sales)
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
GROUP BY o.month, CUBE(r.name);

One interesting thing to note is that if you have one column in the CUBE operation, it produces the same result as the ROLLUP operation. Therefore, the following two queries produce identical results:

SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 
SUM(o.tot_sales)
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
GROUP BY r.name, CUBE(o.month);

SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 
SUM(o.tot_sales)
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
GROUP BY r.name, ROLLUP(o.month);

The GROUPING Function

ROLLUP and CUBE produce extra rows in the output that contain subtotals and totals. When a row represents a summary over a given column or set of columns, those columns will contain NULL values. Output containing NULLs and indicating subtotals doesn’t make sense to an ordinary person who is unware of the behavior of ROLLUP and CUBE operations. Does your corporate vice president (VP) care about whether you used ROLLUP or CUBE or any other operation to get him the monthly total sales for each region? Obviously, he doesn’t. That’s exactly why you are reading this page and not your VP.

If you know your way around the NVL function, you would probably attempt to translate each NULL value from CUBE and ROLLUP to some descriptive value, as in the following example:

               SELECT NVL(TO_CHAR(o.year), 'All Years') year,
               NVL(TO_CHAR(TO_DATE(o.month, 'MM'), 'Month'), 'First Quarter') month,
               NVL(r.name, 'All Regions') region, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY ROLLUP (o.year, o.month, r.name);

YEAR         MONTH         REGION         SUM(O.TOT_SALES)
------------ ------------- -------------- ----------------
2000         January       New England             1018430
2000         January       Mid-Atlantic            1221394
2000         January       Southeast US             758042
2000         January       All Regions             2997866
2000         February      New England             1231492
2000         February      Mid-Atlantic             857352
2000         February      Southeast US            1236846
2000         February      All Regions             3325690
2000         March         New England             1132966
2000         March         Mid-Atlantic            1274062
2000         March         Southeast US            1311986
2000         March         All Regions             3719014
2000         First Quarter All Regions            10042570
2001         January       New England              509215
2001         January       Mid-Atlantic             610697
2001         January       Southeast US             379021
2001         January       All Regions             1498933
2001         February      New England              615746
2001         February      Mid-Atlantic             428676
2001         February      Southeast US             618423
2001         February      All Regions             1662845
2001         March         New England              566483
2001         March         Mid-Atlantic             637031
2001         March         Southeast US             655993
2001         March         All Regions             1859507
2001         First Quarter All Regions             5021285
All Years    First Quarter All Regions            15063855

27 rows selected.

The NVL function works pretty well for this example. However, if the data itself contains some NULL values, it becomes impossible to distinguish whether a NULL value represents unavailable data or a subtotal row. The NVL function will cause a problem in such a case. The following data can be used to illustrate this problem:

               SELECT * FROM disputed_orders;

ORDER_NBR   CUST_NBR SALES_EMP_ID SALE_PRICE ORDER_DT  EXPECTED_ STATUS
---------- ---------- ------------ ---------- --------- --------- ---------
      1001          1         7354         99 22-JUL-01 23-JUL-01 DELIVERED
      1000          1         7354            19-JUL-01 24-JUL-01
      1002          5         7368            12-JUL-01 25-JUL-01
      1003          4         7654         56 16-JUL-01 26-JUL-01 DELIVERED
      1004          4         7654         34 18-JUL-01 27-JUL-01 PENDING
      1005          8         7654         99 22-JUL-01 24-JUL-01 DELIVERED
      1006          1         7354            22-JUL-01 28-JUL-01
      1007          5         7368         25 20-JUL-01 22-JUL-01 PENDING
      1008          5         7368         25 21-JUL-01 23-JUL-01 PENDING
      1009          1         7354         56 18-JUL-01 22-JUL-01 DELIVERED
      1012          1         7354         99 22-JUL-01 23-JUL-01 DELIVERED
      1011          1         7354            19-JUL-01 24-JUL-01
      1015          5         7368            12-JUL-01 25-JUL-01
      1017          4         7654         56 16-JUL-01 26-JUL-01 DELIVERED
      1019          4         7654         34 18-JUL-01 27-JUL-01 PENDING
      1021          8         7654         99 22-JUL-01 24-JUL-01 DELIVERED
      1023          1         7354            22-JUL-01 28-JUL-01
      1025          5         7368         25 20-JUL-01 22-JUL-01 PENDING
      1027          5         7368         25 21-JUL-01 23-JUL-01 PENDING
      1029          1         7354         56 18-JUL-01 22-JUL-01 DELIVERED

20 rows selected.

Note that the column status contains NULL values. If you want the summary status of orders for each customer, and you executed the following query (note the application of NVL to the status column), the output might surprise you.

               SELECT NVL(TO_CHAR(cust_nbr), 'All Customers') customer, 
               NVL(status, 'All Status') status,
               COUNT(*) FROM disputed_orders
               GROUP BY CUBE(cust_nbr, status);

CUSTOMER                                 STATUS                 COUNT(*)
---------------------------------------- -------------------- ----------
All Customers                            All Status                    6
All Customers                            All Status                   20
All Customers                            PENDING                       6
All Customers                            DELIVERED                     8
1                                        All Status                    4
1                                        All Status                    8
1                                        DELIVERED                     4
4                                        All Status                    4
4                                        PENDING                       2
4                                        DELIVERED                     2
5                                        All Status                    2
5                                        All Status                    6
5                                        PENDING                       4
8                                        All Status                    2
8                                        DELIVERED                     2

15 rows selected.

This output doesn’t make any sense. The problem is that any time the status column legitimately contains a NULL value, the NVL function returns the string “All Status.” Obviously, NVL isn’t useful in this situation. However, don’t worry—Oracle provides a solution to this problem through the GROUPING function.

The GROUPING function is meant to be used in conjunction with either a ROLLUP or a CUBE operation. The GROUPING function takes a grouping column name as input and returns either 1 or 0. A 1 is returned if the column’s value is NULL as the result of aggregation (ROLLUP or CUBE); otherwise, 0 is returned. The general syntax of the GROUPING function is:

SELECT  . . .  [GROUPING(grouping_column_name)]  . . . 
FROM  . . . 
GROUP BY  . . .  {ROLLUP | CUBE} (grouping_column_name)

The following example illustrates the use of GROUPING function in a simple way by returning the GROUPING function results for the three columns passed to ROLLUP:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales),
               GROUPING(o.year) y, GROUPING(o.month) m, GROUPING(r.name) r
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY ROLLUP (o.year, o.month, r.name);

YEAR MONTH     REGION               SUM(O.TOT_SALES)    Y     M     R
----- --------- -------------------- ---------------- ---- ----- -----
 2000 January   New England                   1018430    0     0     0
 2000 January   Mid-Atlantic                  1221394    0     0     0
 2000 January   Southeast US                   758042    0     0     0
 2000 January                                 2997866    0     0     1
 2000 February  New England                   1231492    0     0     0
 2000 February  Mid-Atlantic                   857352    0     0     0
 2000 February  Southeast US                  1236846    0     0     0
 2000 February                                3325690    0     0     1
 2000 March     New England                   1132966    0     0     0
 2000 March     Mid-Atlantic                  1274062    0     0     0
 2000 March     Southeast US                  1311986    0     0     0
 2000 March                                   3719014    0     0     1
 2000                                        10042570    0     1     1
 2001 January   New England                    509215    0     0     0
 2001 January   Mid-Atlantic                   610697    0     0     0
 2001 January   Southeast US                   379021    0     0     0
 2001 January                                 1498933    0     0     1
 2001 February  New England                    615746    0     0     0
 2001 February  Mid-Atlantic                   428676    0     0     0
 2001 February  Southeast US                   618423    0     0     0
 2001 February                                1662845    0     0     1
 2001 March     New England                    566483    0     0     0
 2001 March     Mid-Atlantic                   637031    0     0     0
 2001 March     Southeast US                   655993    0     0     0
 2001 March                                   1859507    0     0     1
 2001                                         5021285    0     1     1
                                             15063855    1     1     1

27 rows selected.

Look at the y, m, and r columns in this output. Row 4 is a region-level subtotal for a particular month and year, and therefore, the GROUPING function results in a value of 1 for the region and a value 0 for the month and year. Row 26 (the second to last) is a subtotal for all regions and months for a particular year, and therefore, the GROUPING function prints 1 for the month and the region and 0 for the year. Row 27 (the grand total) contains 1 for all the GROUPING columns.

With a combination of GROUPING and DECODE (or CASE), you can produce more readable query output when using CUBE and ROLLUP, as in the following example:

               SELECT DECODE(GROUPING(o.year), 1, 'All Years', o.year) Year,
               DECODE(GROUPING(o.month), 1, 'All Months', 
               TO_CHAR(TO_DATE(o.month, 'MM'), 'Month')) Month,
               DECODE(GROUPING(r.name), 1, 'All Regions', r.name) Region, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY ROLLUP (o.year, o.month, r.name);

YEAR             MONTH      REGION               SUM(O.TOT_SALES)
---------------- ---------- -------------------- ----------------
2000             January    New England                   1018430
2000             January    Mid-Atlantic                  1221394
2000             January    Southeast US                   758042
2000             January    All Regions                   2997866
2000             February   New England                   1231492
2000             February   Mid-Atlantic                   857352
2000             February   Southeast US                  1236846
2000             February   All Regions                   3325690
2000             March      New England                   1132966
2000             March      Mid-Atlantic                  1274062
2000             March      Southeast US                  1311986
2000             March      All Regions                   3719014
2000             All Months All Regions                  10042570
2001             January    New England                    509215
2001             January    Mid-Atlantic                   610697
2001             January    Southeast US                   379021
2001             January    All Regions                   1498933
2001             February   New England                    615746
2001             February   Mid-Atlantic                   428676
2001             February   Southeast US                   618423
2001             February   All Regions                   1662845
2001             March      New England                    566483
2001             March      Mid-Atlantic                   637031
2001             March      Southeast US                   655993
2001             March      All Regions                   1859507
2001             All Months All Regions                   5021285
All Years        All Months All Regions                  15063855

27 rows selected.

By using DECODE with GROUPING, we produced the same result that was produced by using NVL at the beginning of the section. However, the risk of mistreating a NULL data value as a summary row is eliminated by using GROUPING and DECODE. You will notice this in the following example, in which NULL data values in subtotal and total rows are treated differently by the GROUPING function than the NULL values in the summary rows:

               SELECT DECODE(GROUPING(cust_nbr), 1, 'All Customers', cust_nbr) customer,
               DECODE(GROUPING(status), 1, 'All Status', status) status, COUNT(*)
               FROM disputed_orders
               GROUP BY CUBE(cust_nbr, status);

CUSTOMER                                 STATUS                 COUNT(*)
---------------------------------------- -------------------- ----------
All Customers                                                          6
All Customers                            All Status                   20
All Customers                            PENDING                       6
All Customers                            DELIVERED                     8
1                                                                      4
1                                        All Status                    8
1                                        DELIVERED                     4
4                                        All Status                    4
4                                        PENDING                       2
4                                        DELIVERED                     2
5                                                                      2
5                                        All Status                    6
5                                        PENDING                       4
8                                        All Status                    2
8                                        DELIVERED                     2

15 rows selected.

GROUPING SETS

Earlier in this chapter, you saw how to generate summary information using ROLLUP and CUBE. However, the output of ROLLUP and CUBE include the rows produced by the regular GROUP BY operation along with the summary rows. Oracle9i introduced another extension to the GROUP BY clause called GROUPING SETS that you can use to generate summary information at the level you choose without including all the rows produced by the regular GROUP BY operation.

Like ROLLUP and CUBE, GROUPING SETS is also an extension of the GROUP BY clause, and can appear in a query only along with a GROUP BY clause. The syntax of GROUPING SETS is:

SELECT  . . . 
FROM  . . . 
GROUP BY GROUPING SETS (list of grouping columns)

Let’s take an example to understand the GROUPING SETS operation further:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY GROUPING SETS (o.year, o.month, r.name);

      YEAR MONTH     REGION               SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
                     Mid-Atlantic                  5029212
                     New England                   5074332
                     Southeast US                  4960311
           January                                 4496799
           February                                4988535
           March                                   5578521
      2000                                        10042570
      2001                                         5021285

8 rows selected.

This output contains only the subtotals at the region, month, and year levels, but that none of the normal, more detailed, GROUP BY data is included. The order of columns in the GROUPING SETS operation is not critical. The operation produces the same output regardless of the order of the columns. For example:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales)
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY GROUPING SETS (o.month, r.name, o.year);

      YEAR MONTH     REGION               SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
                     Mid-Atlantic                  5029212
                     New England                   5074332
                     Southeast US                  4960311
           January                                 4496799
           February                                4988535
           March                                   5578521
      2000                                        10042570
      2001                                         5021285

8 rows selected.

Pushing the GROUPING Envelope

The grouping examples you have seen so far represent simple ways of aggregating data using Oracle’s extensions of the GROUP BY clause. These simple mechanisms were introduced in Oracle8i. In Oracle9i Database, Oracle enhanced this new functionality in some interesting and useful ways. Oracle now allows for:

  • Repeating column names in the GROUP BY clause

  • Grouping on composite columns

  • Concatenated groupings

Repeated Column Names in the GROUP BY Clause

In Oracle8i, repeating column names are not allowed in a GROUP BY clause. If the GROUP BY clause contains an extension (i.e., ROLLUP or CUBE), you cannot use the same column inside the extension as well as outside the extension. The following SQL is invalid in Oracle8i:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY o.year, ROLLUP (o.year, o.month, r.name);
                         *
ERROR at line 6:
ORA-30490: Ambiguous expression in GROUP BY ROLLUP or CUBE list

However, the same query works in Oracle9i Database and later:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY o.year, ROLLUP (o.year, o.month, r.name);

      YEAR MONTH     REGION                    TOTAL
---------- --------- -------------------- ----------
      2000 January   Mid-Atlantic            1221394
      2000 January   New England             1018430
      2000 January   Southeast US             758042
      2000 January                           2997866
      2000 February  Mid-Atlantic             857352
      2000 February  New England             1231492
      2000 February  Southeast US            1236846
      2000 February                          3325690
      2000 March     Mid-Atlantic            1274062
      2000 March     New England             1132966
      2000 March     Southeast US            1311986
      2000 March                             3719014
      2001 January   Mid-Atlantic             610697
      2001 January   New England              509215
      2001 January   Southeast US             379021
      2001 January                           1498933
      2001 February  Mid-Atlantic             428676
      2001 February  New England              615746
      2001 February  Southeast US             618423
      2001 February                          1662845
      2001 March     Mid-Atlantic             637031
      2001 March     New England              566483
      2001 March     Southeast US             655993
      2001 March                             1859507
      2000                                  10042570
      2001                                   5021285
      2000                                  10042570
      2001                                   5021285

28 rows selected.

Repetition of o.year in the GROUP BY clause as well as in the ROLLUP operation repeats the summary rows of each year in the output and suppresses the grand total. Repetition of column names in a GROUP BY clause isn’t very useful, but it’s worth knowing that such constructs are allowed in Oracle9i and later.

Grouping on Composite Columns

Oracle8i supports grouping on individual columns only. Oracle9i extends the grouping operations to include grouping on composite columns. A composite column is a collection of two or more columns, but their values are treated as one for the grouping computation. Oracle8i allows group operations of the form ROLLUP (a,b,c), while, Oracle9i allows group operations of the form ROLLUP (a,(b,c)) as well. In this case, (b,c) is treated as one column for the purpose of the grouping computation. For example:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY ROLLUP ((o.year, o.month),r.name);

      YEAR MONTH     REGION                    TOTAL
---------- --------- -------------------- ----------
      2000 January   Mid-Atlantic            1221394
      2000 January   New England             1018430
      2000 January   Southeast US             758042
      2000 January                           2997866
      2000 February  Mid-Atlantic             857352
      2000 February  New England             1231492
      2000 February  Southeast US            1236846
      2000 February                          3325690
      2000 March     Mid-Atlantic            1274062
      2000 March     New England             1132966
      2000 March     Southeast US            1311986
      2000 March                             3719014
      2001 January   Mid-Atlantic             610697
      2001 January   New England              509215
      2001 January   Southeast US             379021
      2001 January                           1498933
      2001 February  Mid-Atlantic             428676
      2001 February  New England              615746
      2001 February  Southeast US             618423
      2001 February                          1662845
      2001 March     Mid-Atlantic             637031
      2001 March     New England              566483
      2001 March     Southeast US             655993
      2001 March                             1859507
                                            15063855

25 rows selected.

In this example, two columns (o.year, o.month) are treated as one composite column. This causes Oracle to treat the combination of year and month as one dimension, and the summary rows are computed accordingly. Although this query is not allowed in Oracle8i, you can fake composite column groupings in Oracle8i by using the concatenation operator (||) to combine two columns and treat the result as one composite column. Oracle8i can then produce the same result as the previous query in Oracle 9i. For example:

               SELECT TO_CHAR(o.year)||' '||TO_CHAR(TO_DATE(o.month,'MM'),'Month') 
                      Year_Month, 
                      r.name region, SUM(o.tot_sales) 
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY 
               ROLLUP (TO_CHAR(o.year)||' '||
                       TO_CHAR(TO_DATE(o.month,'MM'),'Month'), r.name);

YEAR_MONTH           REGION               SUM(O.TOT_SALES)
-------------------- -------------------- ----------------
2000 February        Mid-Atlantic                   857352
2000 February        New England                   1231492
2000 February        Southeast US                  1236846
2000 February                                      3325690
2000 January         Mid-Atlantic                  1221394
2000 January         New England                   1018430
2000 January         Southeast US                   758042
2000 January                                       2997866
2000 March           Mid-Atlantic                  1274062
2000 March           New England                   1132966
2000 March           Southeast US                  1311986
2000 March                                         3719014
2001 February        Mid-Atlantic                   428676
2001 February        New England                    615746
2001 February        Southeast US                   618423
2001 February                                      1662845
2001 January         Mid-Atlantic                   610697
2001 January         New England                    509215
2001 January         Southeast US                   379021
2001 January                                       1498933
2001 March           Mid-Atlantic                   637031
2001 March           New England                    566483
2001 March           Southeast US                   655993
2001 March                                         1859507
                                                  15063855

25 rows selected.

This query converts the numeric month into the string expression of the name of the month and concatenates it with the string representation of the year. The same expression has to be used in the SELECT list and the ROLLUP clause. The expression TO_CHAR(o.year)||' '||TO_CHAR(TO_DATE( o.month,'MM'),'Month') is treated as one composite column.

Concatenated Groupings

With Oracle9i and later, you can have multiple ROLLUP, CUBE, or GROUPING SETS operations, or a combination of these under the GROUP BY clause in a query. This is not allowed in Oracle8i. You will get an error message if you attempt the following query in Oracle8i:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY ROLLUP (o.year, o.month), ROLLUP(r.name);
                                   *
ERROR at line 6:
ORA-30489: Cannot have more than one rollup/cube expression list

However, the same query works in Oracle9i and later:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY ROLLUP (o.year, o.month), ROLLUP(r.name);

      YEAR MONTH     REGION                    TOTAL
---------- --------- -------------------- ----------
      2000 January   Mid-Atlantic            1221394
      2000 January   New England             1018430
      2000 January   Southeast US             758042
      2000 January                           2997866
      2000 February  Mid-Atlantic             857352
      2000 February  New England             1231492
      2000 February  Southeast US            1236846
      2000 February                          3325690
      2000 March     Mid-Atlantic            1274062
      2000 March     New England             1132966
      2000 March     Southeast US            1311986
      2000 March                             3719014
      2000           Mid-Atlantic            3352808
      2000           New England             3382888
      2000           Southeast US            3306874
      2000                                  10042570
      2001 January   Mid-Atlantic             610697
      2001 January   New England              509215
      2001 January   Southeast US             379021
      2001 January                           1498933
      2001 February  Mid-Atlantic             428676
      2001 February  New England              615746
      2001 February  Southeast US             618423
      2001 February                          1662845
      2001 March     Mid-Atlantic             637031
      2001 March     New England              566483
      2001 March     Southeast US             655993
      2001 March                             1859507
      2001           Mid-Atlantic            1676404
      2001           New England             1691444
      2001           Southeast US            1653437
      2001                                   5021285
                     Mid-Atlantic            5029212
                     New England             5074332
                     Southeast US            4960311
                                            15063855

36 rows selected.

When you have multiple grouping operations (ROLLUP, CUBE, or GROUPING SETS) in a GROUP BY clause, what you have is called a concatenated grouping. The result of the concatenated grouping is to produce a cross-product of groupings from each grouping operation. Therefore, the query:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY ROLLUP(o.year),  ROLLUP (o.month), ROLLUP (r.name);

behaves as a CUBE and produces the same result as the query:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY CUBE (o.year, o.month, r.name);

Since a CUBE contains aggregates for all possible combinations of the grouping columns, the concatenated grouping of CUBES is no different from a regular CUBE, and all the following queries return the same result as the query shown previously:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY CUBE (o.year, o.month), CUBE (r.name);

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY CUBE (o.year), CUBE (o.month, r.name);

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY CUBE (o.year), CUBE (o.month), CUBE (r.name);

Concatenated groupings with GROUPING SETS

Concatenated groupings come in handy while using GROUPING SETS. Since GROUPING SETS produces only the subtotal rows, you can specify just the aggregation levels you want in your output by using a concatenated grouping of GROUPING SETS. The concatenated grouping of GROUPING SETS (a,b) and GROUPING SETS (c,d) will produce aggregate rows for the aggregation levels (a,c), (a,d), (b,c), and (b,d). The concatenated grouping of GROUPING SETS (a,b) and GROUPING SETS (c) will produce aggregate rows for the aggregation levels (a,c) and (b,c). For example:

                  SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
                  r.name region, SUM(o.tot_sales) total
                  FROM all_orders o JOIN region r
                  ON r.region_id = o.region_id
                  WHERE o.month BETWEEN 1 AND 3
                  GROUP BY GROUPING SETS (o.year, o.month), GROUPING SETS (r.name);

      YEAR MONTH     REGION                    TOTAL
---------- --------- -------------------- ----------
      2000           Mid-Atlantic            3352808
      2000           New England             3382888
      2000           Southeast US            3306874
      2001           Mid-Atlantic            1676404
      2001           New England             1691444
      2001           Southeast US            1653437
           January   Mid-Atlantic            1832091
           January   New England             1527645
           January   Southeast US            1137063
           February  Mid-Atlantic            1286028
           February  New England             1847238
           February  Southeast US            1855269
           March     Mid-Atlantic            1911093
           March     New England             1699449
           March     Southeast US            1967979

15 rows selected.

The concatenated grouping GROUP BY GROUPING SETS (O.YEAR, O.MONTH), GROUPING SETS (R.NAME) in this example produces rows for aggregate levels (O.YEAR, R.NAME) and (O.MONTH, R.NAME). Therefore, you see aggregate rows for (Year, Region) and (Month, Region) combinations in the output. The following example extends the previous query:

                  SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
                  r.name region, SUM(o.tot_sales) total
                  FROM all_orders o JOIN region r
                  ON r.region_id = o.region_id
                  WHERE o.month BETWEEN 1 AND 3
                  GROUP BY GROUPING SETS (o.year, o.month), GROUPING SETS (o.year, r. name);

          YEAR MONTH     REGION                    TOTAL
    ---------- --------- -------------------- ----------
 1:      2000                                  10042570
 2:      2001                                   5021285
 3:      2000 January                           2997866
 4:      2000 February                          3325690
 5:      2000 March                             3719014
 6:      2001 January                           1498933
 7:      2001 February                          1662845
 8:      2001 March                             1859507
 9:      2000           Mid-Atlantic            3352808
10:      2000           New England             3382888
11:      2000           Southeast US            3306874
12:      2001           Mid-Atlantic            1676404
13:      2001           New England             1691444
14:      2001           Southeast US            1653437
15:           January   Mid-Atlantic            1832091
16:           January   New England             1527645
17:           January   Southeast US            1137063
18:           February  Mid-Atlantic            1286028
19:           February  New England             1847238
20:           February  Southeast US            1855269
21:           March     Mid-Atlantic            1911093
22:           March     New England             1699449
23:           March     Southeast US            1967979

23 rows selected.

This example produces four grouping combinations. Table 13-1 describes the various grouping combinations produced by this query and references their corresponding row numbers in the output.

Table 13-1. Grouping combinations

Grouping combination

Corresponding rows

(o.year, o.year)

1-2

(o.year, r.name)

9-14

(o.month, o.year)

3-8

(o.month, r.name)

15-23

The GROUPING SETS operation is independent of the order of columns. Therefore, the following two queries will produce the same results as shown previously:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY GROUPING SETS (o.year, r.name), GROUPING SETS (o.year, o.month);

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY GROUPING SETS (o.month, o.year), GROUPING SETS (r.name, o.year);

It is permissible to have a combination of ROLLUP, CUBE, and GROUPING SETS in a single GROUP BY clause, as in the following example:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY GROUPING SETS (o.month, o.year), ROLLUP(r.name), CUBE (o.year);

However, the output from such queries seldom makes any sense. You should carefully evaluate the need for such a query if you intend to write one.

ROLLUP and CUBE as arguments to GROUPING SETS

Unlike the ROLLUP and CUBE operations, the GROUPING SETS operation can take a ROLLUP or a CUBE as its argument. As you have seen earlier, GROUPING SETS produces only subtotal rows. However, there are times when you may need to print the grand total along with the subtotals. In such situations, you can perform the GROUPING SETS operation on ROLLUP operations, as in the following example:

                  SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
                  r.name region, SUM(o.tot_sales) total
                  FROM all_orders o JOIN region r
                  ON r.region_id = o.region_id
                  WHERE o.month BETWEEN 1 AND 3
                  GROUP BY GROUPING SETS (ROLLUP (o.year), 
                                          ROLLUP (o.month), 
                                          ROLLUP (r. name));

      YEAR MONTH     REGION                    TOTAL
---------- --------- -------------------- ----------
                     Mid-Atlantic            5029212
                     New England             5074332
                     Southeast US            4960311
           January                           4496799
           February                          4988535
           March                             5578521
      2000                                  10042570
      2001                                   5021285
                                            15063855
                                            15063855
                                            15063855

11 rows selected.

This example produces the subtotals for each dimension, as expected from the regular GROUPING SETS operations. Also, it produces the grand total across all the dimensions. However, you get three identical grand-total rows. The grand-total rows are repeated because they are produced by each ROLLUP operation inside the GROUPING SETS. If you insist on only one grand-total row, you may use the DISTINCT keyword in the SELECT clause:

                  SELECT Distinct o.year, 
                                  TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
                                  r.name region, SUM(o.tot_sales) total
                  FROM all_orders o JOIN region r ON r.region_id = o.region_id
                  WHERE o.month BETWEEN 1 AND 3
                  GROUP BY GROUPING SETS (ROLLUP (o.year), ROLLUP (o.month), 
                                          ROLLUP (r. name));

 YEAR MONTH     REGION                    TOTAL
----- --------- -------------------- ----------
 2000                                  10042570
 2001                                   5021285
       February                         4988535
       January                          4496799
       March                            5578521
                Mid-Atlantic            5029212
                New England             5074332
                Southeast US            4960311
                                       15063855

9 rows selected.

In this example, the DISTINCT keyword eliminated the duplicate grand-total rows. You can also eliminate duplicate rows by using the GROUP_ID function, as discussed later in this chapter.

If you are interested in subtotals and totals on composite dimensions, you can use composite or concatenated ROLLUP operations within GROUPING SETS, as in the following example:

                  SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
                  r.name region, SUM(o.tot_sales) total
                  FROM all_orders o JOIN region r
                  ON r.region_id = o.region_id
                  WHERE o.month BETWEEN 1 AND 3
                  GROUP BY GROUPING SETS (ROLLUP (o.year, o.month), ROLLUP(r.name));

     YEAR MONTH     REGION                    TOTAL
--------- --------- -------------------- ----------
                    Mid-Atlantic            5029212
                    New England             5074332
                    Southeast US            4960311
     2000 January                           2997866
     2000 February                          3325690
     2000 March                             3719014
     2000                                  10042570
     2001 January                           1498933
     2001 February                          1662845
     2001 March                             1859507
     2001                                   5021285
                                           15063855
                                           15063855

13 rows selected.

This query generates subtotals for (year, month) combinations, subtotals for the region, subtotals for the year, and the grand total. Note that there are duplicate grand-total rows because of the multiple ROLLUP operations within the GROUPING SETS operation.

The GROUPING_ID and GROUP_ID Functions

Earlier in this chapter, you saw how to use the GROUPING function to distinguish between the regular GROUP BY rows and the summary rows produced by the GROUP BY extensions. Oracle9i extended the concept of the GROUPING function and introduced two more functions that you can use with a GROUP BY clause:

  • GROUPING_ID

  • GROUP_ID

These functions can be used only with a GROUP BY clause. However, unlike the GROUPING function that can only be used with a GROUP BY extension, the GROUPING_ID and GROUP_ID functions can be used in a query, even without a GROUP BY extension.

Tip

Although it is legal to use these two functions without a GROUP BY extension, using GROUPING_ID and GROUP_ID without ROLLUP, CUBE, or GROUPING SETS doesn’t produce any meaningful output, because GROUPING_ID and GROUP_ID are 0 for all regular GROUP BY rows.

The following sections discuss these two functions in detail.

GROUPING_ID

The syntax of the GROUPING_ID function is as follows:

SELECT  . . .  , GROUPING_ID(ordered_list_of_grouping_columns)
FROM  . . . 
GROUP BY  . . .

The GROUPING_ID function takes an ordered list of grouping columns as input, and computes the output by working through the following steps:

  1. It generates the results of the GROUPING function as applied to each of the individual columns in the list. The result of this step is a set of ones and zeros.

  2. It puts these ones and zeros in the same order as the order of the columns in its argument list to produce a bit vector.

  3. Treating this bit vector (a series of ones and zeros) as a binary number, it converts the bit vector into a decimal (base 10) number.

  4. The decimal number computed in Step 3 is returned as the GROUPING_ID function’s output.

The following example illustrates this process and compares the results from GROUPING_ID with those from GROUPING:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total,
               GROUPING(o.year) y, GROUPING(o.month) m, GROUPING(r.name) r,
               GROUPING_ID (o.year, o.month, r.name) gid
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY CUBE (o.year, o.month, r.name);

YEAR MONTH     REGION              TOTAL   Y    M   R    GID
---- --------- -------------- ---------- --- ---- --- ------
2000 January   Mid-Atlantic      1221394   0    0   0      0
2000 January   New England       1018430   0    0   0      0
2000 January   Southeast US       758042   0    0   0      0
2000 January                     2997866   0    0   1      1
2000 February  Mid-Atlantic       857352   0    0   0      0
2000 February  New England       1231492   0    0   0      0
2000 February  Southeast US      1236846   0    0   0      0
2000 February                    3325690   0    0   1      1
2000 March     Mid-Atlantic      1274062   0    0   0      0
2000 March     New England       1132966   0    0   0      0
2000 March     Southeast US      1311986   0    0   0      0
2000 March                       3719014   0    0   1      1
2000           Mid-Atlantic      3352808   0    1   0      2
2000           New England       3382888   0    1   0      2
2000           Southeast US      3306874   0    1   0      2
2000                            10042570   0    1   1      3
2001 January   Mid-Atlantic       610697   0    0   0      0
2001 January   New England        509215   0    0   0      0
2001 January   Southeast US       379021   0    0   0      0
2001 January                     1498933   0    0   1      1
2001 February  Mid-Atlantic       428676   0    0   0      0
2001 February  New England        615746   0    0   0      0
2001 February  Southeast US       618423   0    0   0      0
2001 February                    1662845   0    0   1      1
2001 March     Mid-Atlantic       637031   0    0   0      0
2001 March     New England        566483   0    0   0      0
2001 March     Southeast US       655993   0    0   0      0
2001 March                       1859507   0    0   1      1
2001           Mid-Atlantic      1676404   0    1   0      2
2001           New England       1691444   0    1   0      2
2001           Southeast US      1653437   0    1   0      2
2001                             5021285   0    1   1      3
     January   Mid-Atlantic      1832091   1    0   0      4
     January   New England       1527645   1    0   0      4
     January   Southeast US      1137063   1    0   0      4
     January                     4496799   1    0   1      5
     February  Mid-Atlantic      1286028   1    0   0      4
     February  New England       1847238   1    0   0      4
     February  Southeast US      1855269   1    0   0      4
     February                    4988535   1    0   1      5
     March     Mid-Atlantic      1911093   1    0   0      4
     March     New England       1699449   1    0   0      4
     March     Southeast US      1967979   1    0   0      4
     March                       5578521   1    0   1      5
               Mid-Atlantic      5029212   1    1   0      6
               New England       5074332   1    1   0      6
               Southeast US      4960311   1    1   0      6
                                15063855   1    1   1      7

48 rows selected.

Note that the GROUPING_ID is the decimal equivalent of the bit vector generated by the individual GROUPING functions. In this output, the GROUPING_ID has values 0, 1, 2, 3, 4, 5, 6, and 7. Table 13-2 describes these aggregation levels.

Table 13-2. Result of GROUPING_ID(o.year, o.month, r.name)

Aggregation level

Bit vector

GROUPING_ID

Regular GROUP BY rows

0 0 0

0

Subtotal for Year-Month, aggregated at (Region)

0 0 1

1

Subtotal for Year-Region, aggregated at (Month)

0 1 0

2

Subtotal for Year, aggregated at (Month, Region)

0 1 1

3

Subtotal for Month-Region, aggregated at (Year)

1 0 0

4

Subtotal for Month, aggregated at (Year, Region)

1 0 1

5

Subtotal for Region, aggregated at (Year, Month)

1 1 0

6

Grand total for all levels, aggregated at (Year, Month, Region)

1 1 1

7

The GROUPING_ID function can be used effectively in a query to filter rows according to your requirement. Let’s say you want only the summary rows to be displayed in the output, and not the regular GROUP BY rows. You can use the GROUPING_ID function in the HAVING clause to do this by restricting output to only those rows that contain totals and subtotals (i.e., for which GROUPING_ID > 0):

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY CUBE (o.year, o.month, r.name)
               HAVING GROUPING_ID (o.year, o.month, r.name) > 0;

     YEAR MONTH     REGION                    TOTAL
--------- --------- -------------------- ----------
                                           15063855
                    New England             5074332
                    Mid-Atlantic            5029212
                    Southeast US            4960311
          January                           4496799
          January   New England             1527645
          January   Mid-Atlantic            1832091
          January   Southeast US            1137063
          February                          4988535
          February  New England             1847238
          February  Mid-Atlantic            1286028
          February  Southeast US            1855269
          March                             5578521
          March     New England             1699449
          March     Mid-Atlantic            1911093
          March     Southeast US            1967979
     2000                                  10042570
     2000           New England             3382888
     2000           Mid-Atlantic            3352808
     2000           Southeast US            3306874
     2000 January                           2997866
     2000 February                          3325690
     2000 March                             3719014
     2001                                   5021285
     2001           New England             1691444
     2001           Mid-Atlantic            1676404
     2001           Southeast US            1653437
     2001 January                           1498933
     2001 February                          1662845
     2001 March                             1859507

30 rows selected.

As you can see, GROUPING_ID makes it easier to filter the output of aggregation operations. Without the GROUPING_ID function, you have to write a more complex query using the GROUPING function to achieve the same result. For example, the following query uses GROUPING rather than GROUPING_ID to display only totals and subtotals. Note the added complexity in the HAVING clause.

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY CUBE (o.year, o.month, r.name)
               HAVING GROUPING(o.year) > 0
               OR GROUPING(o.month) > 0
               OR GROUPING(r.name) > 0;

   YEAR MONTH     REGION                    TOTAL
------- --------- -------------------- ----------
                                         15063855
                  New England             5074332
                  Mid-Atlantic            5029212
                  Southeast US            4960311
        January                           4496799
        January   New England             1527645
        January   Mid-Atlantic            1832091
        January   Southeast US            1137063
        February                          4988535
        February  New England             1847238
        February  Mid-Atlantic            1286028
        February  Southeast US            1855269
        March                             5578521
        March     New England             1699449
        March     Mid-Atlantic            1911093
        March     Southeast US            1967979
   2000                                  10042570
   2000           New England             3382888
   2000           Mid-Atlantic            3352808
   2000           Southeast US            3306874
   2000 January                           2997866
   2000 February                          3325690
   2000 March                             3719014
   2001                                   5021285
   2001           New England             1691444
   2001           Mid-Atlantic            1676404
   2001           Southeast US            1653437
   2001 January                           1498933
   2001 February                          1662845
   2001 March                             1859507

30 rows selected.

GROUPING and GROUPING_ID in ORDER BY

The GROUPING and GROUPING_ID functions not only help you filter rows returned from queries using CUBE and ROLLUP, they can also help you to order those rows in a meaningful way. The order of the rows in a query’s output is not guaranteed unless you use an ORDER BY clause in the query. However, if you order the results of a CUBE or ROLLUP query by one dimension, the order of the results may not be meaningful with respect to other dimensions. In such an aggregate query, you may prefer to order the results based on the number of dimensions involved rather than by individual dimensions. For example, when executing the previous section’s query, you may prefer to see the output rows in the following order:

  1. Those rows representing an aggregate in one dimension

  2. Those rows representing an aggregate in two dimensions

  3. Those rows representing an aggregate in three dimensions

To achieve this ordering of rows, you need to use an ORDER BY clause that uses a combination of GROUPING and GROUPING_ID functions, as shown in the following example:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total,
               GROUPING_ID (o.year, o.month, r.name) gid,
               GROUPING(o.year) + GROUPING(o.month) + GROUPING(r.name) sum_grouping
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY CUBE (o.year, o.month, r.name)
               HAVING GROUPING(o.year) > 0
               OR GROUPING(o.month) > 0
               OR GROUPING(r.name) > 0
               ORDER BY (GROUPING(o.year) + GROUPING(o.month) + GROUPING(r.name)),
               GROUPING_ID (o.year, o.month, r.name);

  YEAR MONTH     REGION              TOTAL   GID SUM_GROUPING
------ --------- -------------- ---------- ----- ------------
  2000 January                     2997866     1            1
  2000 February                    3325690     1            1
  2000 March                       3719014     1            1
  2001 March                       1859507     1            1
  2001 February                    1662845     1            1
  2001 January                     1498933     1            1
  2000           New England       3382888     2            1
  2001           Mid-Atlantic      1676404     2            1
  2001           Southeast US      1653437     2            1
  2001           New England       1691444     2            1
  2000           Mid-Atlantic      3352808     2            1
  2000           Southeast US      3306874     2            1
       January   New England       1527645     4            1
       January   Mid-Atlantic      1832091     4            1
       January   Southeast US      1137063     4            1
       February  Southeast US      1855269     4            1
       March     Mid-Atlantic      1911093     4            1
       March     New England       1699449     4            1
       February  Mid-Atlantic      1286028     4            1
       February  New England       1847238     4            1
       March     Southeast US      1967979     4            1
  2000                            10042570     3            2
  2001                             5021285     3            2
       January                     4496799     5            2
       March                       5578521     5            2
       February                    4988535     5            2
                 New England       5074332     6            2
                 Mid-Atlantic      5029212     6            2
                 Southeast US      4960311     6            2
                                  15063855     7            3

In this output, the aggegate rows for individual dimensions, region, month, and year are shown first. These are followed by the aggregate rows for two dimensions: month and region, year and region, and year and month, respectively. The last row is the one aggregated over all three dimensions.

GROUP_ID

As you saw in previous sections, Oracle9i Database allows you to have repeating grouping columns and multiple grouping operations in a GROUP BY clause. Some combinations could result in duplicate rows in the output. The GROUP_ID distinguishes between otherwise duplicate result rows.

The syntax of the GROUP_ID function is:

SELECT  . . .  , GROUP_ID( )
FROM  . . . 
GROUP BY  . . .

The GROUP_ID function takes no argument, and returns 0 through n - 1, where n is the occurrence count for duplicates. The first occurrence of a given row in the output of a query will have a GROUP_ID of 0, the second occurrence of a given row will have a GROUP_ID of 1, and so forth. The following example illustrates the use of the GROUP_ID function:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total, GROUP_ID( )
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY o.year, ROLLUP (o.year, o.month, r.name);

      YEAR MONTH     REGION                    TOTAL GROUP_ID( )
---------- --------- -------------------- ---------- ----------
      2000 January   Mid-Atlantic            1221394          0
      2000 January   New England             1018430          0
      2000 January   Southeast US             758042          0
      2000 January                           2997866          0
      2000 February  Mid-Atlantic             857352          0
      2000 February  New England             1231492          0
      2000 February  Southeast US            1236846          0
      2000 February                          3325690          0
      2000 March     Mid-Atlantic            1274062          0
      2000 March     New England             1132966          0
      2000 March     Southeast US            1311986          0
      2000 March                             3719014          0
      2001 January   Mid-Atlantic             610697          0
      2001 January   New England              509215          0
      2001 January   Southeast US             379021          0
      2001 January                           1498933          0
      2001 February  Mid-Atlantic             428676          0
      2001 February  New England              615746          0
      2001 February  Southeast US             618423          0
      2001 February                          1662845          0
      2001 March     Mid-Atlantic             637031          0
      2001 March     New England              566483          0
      2001 March     Southeast US             655993          0
      2001 March                             1859507          0
      2000                                  10042570          0
      2001                                   5021285          0
      2000                                  10042570          1
      2001                                   5021285          1

28 rows selected.

Note that the value 1 is returned by the GROUP_ID function for the last two rows. These rows are indeed duplicates of the previous two rows. If you don’t want to see the duplicates in your result set, restrict your query’s results to GROUP_ID 0:

               SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
               r.name region, SUM(o.tot_sales) total
               FROM all_orders o JOIN region r
               ON r.region_id = o.region_id
               WHERE o.month BETWEEN 1 AND 3
               GROUP BY o.year, ROLLUP (o.year, o.month, r.name)
               HAVING GROUP_ID( ) = 0;

      YEAR MONTH     REGION                    TOTAL
---------- --------- -------------------- ----------
      2000 January   New England             1018430
      2000 January   Mid-Atlantic            1221394
      2000 January   Southeast US             758042
      2000 January                           2997866
      2000 February  New England             1231492
      2000 February  Mid-Atlantic             857352
      2000 February  Southeast US            1236846
      2000 February                          3325690
      2000 March     New England             1132966
      2000 March     Mid-Atlantic            1274062
      2000 March     Southeast US            1311986
      2000 March                             3719014
      2001 January   New England              509215
      2001 January   Mid-Atlantic             610697
      2001 January   Southeast US             379021
      2001 January                           1498933
      2001 February  New England              615746
      2001 February  Mid-Atlantic             428676
      2001 February  Southeast US             618423
      2001 February                          1662845
      2001 March     New England              566483
      2001 March     Mid-Atlantic             637031
      2001 March     Southeast US             655993
      2001 March                             1859507
      2000                                  10042570
      2001                                   5021285

26 rows selected.

This version of the query uses HAVING GROUP_ID( ) = 0 to eliminate the two duplicate totals from the result set. GROUP_ID is only meaningful in the HAVING clause, because it applies to summarized data. You can’t use GROUP_ID in a WHERE clause, and it wouldn’t make sense to try.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.225.255.187