Chapter 13: Optimizer Transformations

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

CHAPTER 13

Optimizer Transformations

We have seen throughout this book that there are many ways to transform one SQL query (or DML subquery) into another without in any way changing the query’s meaning. For all but the most trivial of queries the CBO will consider a number of such transformations, sometimes estimating costing, to determine the best SQL statement on which to perform the final state optimization that we just covered in Chapter 12. This chapter is dedicated to the topic of optimizer transformations; you will see that there are a bewildering number of such transformations. Although not all optimizer transformations are included in this chapter, the vast majority that you will encounter in your day-to-day life are, as well as some that you will rarely encounter.

You may wonder why all this effort has been put into CBO transformations. After all, you may feel that it is a programmer’s responsibility to write SQL in an efficient way, and you would be correct in that feeling. However, you will see as this chapter progresses that there are some ways to rewrite a SQL statement that a mere mortal is unlikely to consider, and that these esoteric rewrites sometimes yield highly efficient execution plans.

However, some CBO transformations, such as join elimination, which I will cover later in this chapter, seem only to apply to very poorly written code. There are several reasons that such transformations are important:

Queries often reference data dictionary views designed for a multitude of purposes. Once these views are expanded into a statement, the result is often obviously inefficient.
Sometimes SQL code is generated by graphical tools or other language processors. Automatically generated code often seems hopelessly naïve.
Well, there are actually quite a lot of bad SQL programmers out there!

There is a special class of transformations that I call no-brainers that I will discuss very shortly. With the exception of the no-brainers, all optimizer transformations can be disabled in one fell swoop by the NO_QUERY_TRANSFORMATION hint. This hint is sometimes useful for analysis and debugging, but I have never used it in a production SQL statement. However, in some cases, you will need to force or disable specific transformations in particular contexts. With the exception of the no-brainers, all optimizer transformations can be forced, where legal, with a hint that is specific to that transformation and can be disabled with another hint that is usually, but not always, generated by prefixing the characters NO_ to the start of the hint. For example, we have already discussed view merging in Chapter 8: the MERGE hint forces view merging while the NO_MERGE hint disables it. Two exceptions to this rule are:

The or-expansion transformation is forced with the USE_CONCAT hint and disabled with the NO_EXPAND hint.
Factored subquery materialization is forced with the MATERIALIZE hint and disabled with the INLINE hint.

Most of the query transformations explained in the chapter can be expressed in terms of legal SQL; in other words, you might have written the transformed SQL yourself in the first place. However, we have already seen in Chapter 11 that subquery unnesting can generate semi-joins and anti-joins that have no equivalent legal SQL syntax, and here we will discuss group by pushdown, another example of a transformation that cannot be expressed by legal SQL. Please bear in mind as you read the listings in this chapter that are designed to demonstrate individual transformations that the transformed query text may not be an exact representation. The listings show the original and, where possible, the transformed query together with the associated execution plans and the hints that force and disable the transformations in question.

The fact that the CBO can perform a number of transformations, one after the other, makes the task of grasping what the CBO is up to somewhat difficult. The good news is that, with the exception of the no-brainers, the hints that are associated with the transformations that the CBO has chosen will appear in the outline section of the plans displayed as a result of the DBMS_XPLAN functions, as I described in Chapter 8. You will, however, never see hints like NO_MERGE or NO_EXPAND in the outline section of a displayed execution plan because the presence of OUTLINE and OUTLINE_LEAF hints means that the absence of hints like MERGE or USE_CONCAT implies that the transformations haven’t been applied.

This introduction has had to make special mention of no-brainer optimizer transformations on several occasions. Let us get to those now.

No-brainer Transformations

I am sure you will not be surprised to read that Oracle does not officially refer to any of the CBO transformations as no-brainers; this is just a term that I have coined myself to refer to a special set of transformations that have some unusual properties. These transformations are so fundamental that they have no associated hints, so they cannot be forced or disabled. In fact, these transformations aren’t even disabled by the NO_QUERY_TRANSFORMATION hint. So invisible are these transformations that there are only two ways to know that they exist:

You can wade through a 10053 trace file. That is what Christian Antognini did.
You can read Christian’s book, Troubleshooting Oracle Performance. That is what I did!

There is some merit to discussing a couple of these no-brainers, but I won’t attempt to be exhaustive. Let us start with the count transformation.

Count Transformation

Take a look at the two queries and their associated execution plans, shown in Listing 13-1.

Listing 13-1. Count transformation

SELECT COUNT (cust_income_level) FROM sh.customers;
----------------------------------------
| Id  | Operation          | Name      |
----------------------------------------
|   0 | SELECT STATEMENT   |           |
|   1 |  SORT AGGREGATE    |           |
|   2 |   TABLE ACCESS FULL| CUSTOMERS |
----------------------------------------
 
SELECT COUNT (cust_id) FROM sh.customers;
--------------------------------------------------------------
| Id  | Operation                     | Name                 |
--------------------------------------------------------------
|   0 | SELECT STATEMENT              |                      |
|   1 |  SORT AGGREGATE               |                      |
|   2 |   BITMAP CONVERSION COUNT     |                      |
|   3 |    BITMAP INDEX FAST FULL SCAN| CUSTOMERS_GENDER_BIX |
--------------------------------------------------------------

Both the execution plans and the results of the two queries are different. It turns out that the CUST_INCOME_LEVEL column in SH.CUSTOMERS does not have a NOT NULL constraint and as such the column may be, and indeed is, NULL for some rows. Such rows are not counted in the first query of Listing 13-1. Since CUST_INCOME_LEVEL is not indexed, the only way to execute COUNT (CUST_INCOME_LEVEL) is to perform a full table scan. However, CUST_ID is NOT NULL and as such COUNT (CUST_ID) can be transformed to COUNT (*). We could implement COUNT (*) by performing a full table scan, but we can also implement COUNT (*) by means of a bitmap index or a B-tree index on a NOT NULL column. In the case of the second query, the index CUSTOMERS_GENDER_BIX is a bitmap index on a non-null column so it is a doubly suitable, cheap alternative to a full table scan.

Getting bored? The next transformation is more interesting, but not by much.

Predicate Move-around

Consider the query and execution plan in Listing 13-2.

Listing 13-2. Predicate move-around

WITH cust_q
     AS (  SELECT cust_id, promo_id, SUM (amount_sold) cas
             FROM sh.sales
         GROUP BY cust_id, promo_id)
    ,prod_q
     AS (  SELECT prod_id, promo_id, SUM (amount_sold) pas
             FROM sh.sales
            WHERE promo_id = 999
         GROUP BY prod_id, promo_id)
SELECT promo_id
      ,prod_id
      ,pas
      ,cust_id
      ,cas
  FROM cust_q NATURAL JOIN prod_q;
 
----------------------------------------
| Id  | Operation              | Name  |
----------------------------------------
|   0 | SELECT STATEMENT       |       |
|*  1 |  HASH JOIN             |       |
|   2 |   VIEW                 |       |
|   3 |    HASH GROUP BY       |       |
|   4 |     PARTITION RANGE ALL|       |
|*  5 |      TABLE ACCESS FULL | SALES |
|   6 |   VIEW                 |       |
|   7 |    HASH GROUP BY       |       |
|   8 |     PARTITION RANGE ALL|       |
|*  9 |      TABLE ACCESS FULL | SALES |
----------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   1 - access("CUST_Q"."PROMO_ID"="PROD_Q"."PROMO_ID")
   5 - filter("PROMO_ID"=999)
   9 - filter("PROMO_ID"=999)

The query in Listing 13-2 doesn’t do anything meaningful. For each combination of product and customer it lists the total amount of sales for PROMO_ID 999 for the customer and the total amount of sales for PROMO_ID 999 for the product. There are two factored subqueries and only one WHERE clause, but the natural join means that, by using transitive closure, we can apply the predicate in the other subquery as well. However, Listing 13-3 makes the query a little more complicated and thus the predicate move-around transformation ceases to function.

Listing 13-3. Predicate move-around not recognized

WITH cust_q
     AS (  SELECT cust_id
                 ,promo_id
                 ,SUM (amount_sold) cas
                 ,MAX (SUM (amount_sold)) OVER (PARTITION BY promo_id) max_cust
             FROM sh.sales
         GROUP BY cust_id, promo_id)
    ,prod_q
     AS (  SELECT prod_id
                 ,promo_id
                 ,SUM (amount_sold) pas
                 ,MAX (SUM (amount_sold)) OVER (PARTITION BY promo_id) max_prod
             FROM sh.sales
            WHERE promo_id = 999
         GROUP BY prod_id, promo_id)
SELECT promo_id
      ,prod_id
      ,pas
      ,cust_id
      ,cas
  FROM cust_q NATURAL JOIN prod_q
 WHERE cust_q.cas = cust_q.max_cust AND prod_q.pas = prod_q.max_prod;
 
------------------------------------------
| Id  | Operation                | Name  |
------------------------------------------
|   0 | SELECT STATEMENT         |       |
|   1 |  MERGE JOIN              |       |
|*  2 |   VIEW                   |       |
|   3 |    WINDOW BUFFER         |       |
|   4 |     SORT GROUP BY        |       |
|   5 |      PARTITION RANGE ALL |       |
|   6 |       TABLE ACCESS FULL  | SALES |
|*  7 |   SORT JOIN              |       |
|*  8 |    VIEW                  |       |
|   9 |     WINDOW BUFFER        |       |
|  10 |      SORT GROUP BY       |       |
|  11 |       PARTITION RANGE ALL|       |
|* 12 |        TABLE ACCESS FULL | SALES |
------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - filter("CUST_Q"."CAS"="CUST_Q"."MAX_CUST")
   7 - access("CUST_Q"."PROMO_ID"="PROD_Q"."PROMO_ID")
       filter("CUST_Q"."PROMO_ID"="PROD_Q"."PROMO_ID")
   8 - filter("PROD_Q"."PAS"="PROD_Q"."MAX_PROD")
  12 - filter("PROMO_ID"=999)

Listing 13-3 does something a lot more meaningful: it identifies the one product that sold the most for PROMO_ID 999 and the one customer for PROMO_ID 999 that bought the most. But the addition of the analytic functions into the subqueries has caused the CBO to lose confidence in the correctness of predicate move-around. To make the query efficient you will have to add the WHERE clause to the CUST_Q factored subquery yourself.

There are one or two other no-brainer transformations:

Filter pushdown is a simpler variant of predicate move-around.
Distinct elimination removes a redundant DISTINCT operation from a query when the operands involve all columns of the primary key, all columns of a unique key that are NOT NULL, or the ROWID.
Select list pruning removes items from the select list of a subquery that aren’t referenced.

The key point involving these transformations is that there is no possible downside to their application. For example, there is no chance that you can make a query slower by removing unreferenced items from the select list in a subquery. All of this is of some interest, but not very much, so let us move on to the second group of transformations: set and join transformations.

Set and Join Transformations

There are several transformations related to set and join operations, most of which have only been implemented recently, which is surprising given how useful many of them are. Let us get going with join elimination.

Join Elimination

Join elimination is one of those transformations that can be demonstrated with inline views and factored subqueries but is really intended for general purpose data dictionary views. I will demonstrate this focus on data dictionary views in Listing 13-4, but I will assume the point is understood in future examples.

Listing 13-4. Join elimination with data dictionary views

CREATE OR REPLACE VIEW cust_sales
AS
   WITH sales_q
        AS (  SELECT cust_id
                    ,SUM (amount_sold) amount_sold
                    ,AVG (amount_sold) avg_sold
                    ,COUNT (*) cnt
                FROM sh.sales s
            GROUP BY cust_id)
   SELECT c.*
         ,s.cust_id AS sales_cust_id
         ,s.amount_sold
         ,s.avg_sold
         ,cnt
     FROM sh.customers c JOIN sales_q s ON c.cust_id = s.cust_id;
 
SELECT sales_cust_id
      ,amount_sold
      ,avg_sold
      ,cnt
  FROM cust_sales;
 
-----------------------------------------------
| Id  | Operation              | Name         |
-----------------------------------------------
|   0 | SELECT STATEMENT       |              |
|   1 |  NESTED LOOPS          |              |
|   2 |   VIEW                 |              |
|   3 |    HASH GROUP BY       |              |
|   4 |     PARTITION RANGE ALL|              |
|   5 |      TABLE ACCESS FULL | SALES        |
|*  6 |   INDEX UNIQUE SCAN    | CUSTOMERS_PK |
-----------------------------------------------
 
CREATE OR REPLACE VIEW cust_sales
AS
   WITH sales_q
        AS (  SELECT cust_id
                    ,SUM (amount_sold) amount_sold
                    ,AVG (amount_sold) avg_sold
                    ,COUNT (*) cnt
                FROM sh.sales s
            GROUP BY cust_id)
   SELECT c.*
         ,s.cust_id AS sales_cust_id
         ,s.amount_sold
         ,s.avg_sold
         ,cnt
     FROM sh.customers c RIGHT JOIN sales_q s ON c.cust_id = s.cust_id;
 
SELECT /*+   eliminate_join(@SEL$13BD1B6A c@sel$2) */
       /* no_eliminate_join(@SEL$13BD1B6A c@sel$2) */sales_cust_id
      ,amount_sold
      ,avg_sold
      ,cnt
  FROM cust_sales;
 
-- Untransformed execution plan (NO_ELIMINATE_JOIN)
 
-----------------------------------------------
| Id  | Operation              | Name         |
-----------------------------------------------
|   0 | SELECT STATEMENT       |              |
|   1 |  NESTED LOOPS OUTER    |              |
|   2 |   VIEW                 |              |
|   3 |    HASH GROUP BY       |              |
|   4 |     PARTITION RANGE ALL|              |
|   5 |      TABLE ACCESS FULL | SALES        |
|*  6 |   INDEX UNIQUE SCAN    | CUSTOMERS_PK |
-----------------------------------------------
 
-- Transformed query
 
  SELECT cust_id AS sales_cust_id
        ,SUM (amount_sold) amount_sold
        ,AVG (amount_sold) avg_sold
        ,COUNT (*) cnt
    FROM sh.sales s
GROUP BY cust_id;
 
-- Transformed execution plan (default)
 
--------------------------------------
| Id  | Operation            | Name  |
--------------------------------------
|   0 | SELECT STATEMENT     |       |
|   1 |  HASH GROUP BY       |       |
|   2 |   PARTITION RANGE ALL|       |
|   3 |    TABLE ACCESS FULL | SALES |
--------------------------------------

Listing 13-4 begins by creating data dictionary view CUST_SALES that adds some columns to the SH.CUSTOMERS example schema table, which are obtained by performing aggregation on the SH.SALES table. The query then uses the CUST_SALES view but is only interested in the columns from the SH.SALES table. Unfortunately, the aggregation operation has caused the CBO to lose track of the significance of the referential integrity constraint that ensures that all rows in SH.SALES have exactly one corresponding row in SH.CUSTOMERS. As a result of this the CBO does not apply the join elimination optimization.

Listing 13-4 continues by redefining the CUST_SALES view to use an outer join. This confirms to the CBO that the join of the aggregated SH.SALES data with SH.CUSTOMERS will not result in the loss of any rows and the fact that the join column CUST_ID is the primary key of SH.CUSTOMERS means that the join will not result in the addition of any rows either. The CBO is now confident that the elimination of the join from SH.SALES to SH.CUSTOMERS will not affect the result and proceeds with the transformation.

The second query in Listing 13-4 includes two comments. The first is a hint to force the default behavior. If you remove the first comment and add a plus sign to the second you can disable the transformation. Unfortunately, because the join happens inside a view you have to look at the outline section of DBMS_XPLAN.DISPLAY to identify the query block and generate a global hint if you really want to disable the transformation. Listing 13-4 also shows the execution plan associated with the untransformed query, the transformed query, and the transformed execution plan.

Have a careful look at Listing 13-4 because I will be using this approach as I go through other transformations.

There is one other interesting application of the join elimination that applies to nested tables. Because nested tables offer no performance or administrative benefits they aren’t seen much in schemas designed in the 21st century. But just in case you are ever need to work with a 20th-century schema, take a look at Listing 13-5, which references the PM.PRINT_MEDIA example schema.

Listing 13-5. Join elimination on a nested table

  SELECT /*+   eliminate_join(parent_tab) */
         /* no_eliminate_join(parent_tab) */
         nested_tab.document_typ, COUNT (*) cnt
    FROM pm.print_media parent_tab
        ,TABLE (parent_tab.ad_textdocs_ntab)nested_tab
GROUP BY nested_tab.document_typ;
 
-- Untransformed execution plan (NO_ELIMINATE_JOIN)
 
--------------------------------------------------
| Id  | Operation           | Name               |
--------------------------------------------------
|   0 | SELECT STATEMENT    |                    |
|   1 |  HASH GROUP BY      |                    |
|   2 |   NESTED LOOPS      |                    |
|   3 |    TABLE ACCESS FULL| TEXTDOCS_NESTEDTAB |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0011792       |
--------------------------------------------------
 
-- Transformed execution plan (default)
 
------------------------------------------------------------------------
| Id  | Operation                            | Name                    |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                         |
|   1 |  HASH GROUP BY                       |                         |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| TEXTDOCS_NESTEDTAB      |
|*  3 |    INDEX FULL SCAN                   | SYS_FK0000091775N00007$ |
------------------------------------------------------------------------

What is interesting about Listing 13-5 is that prior to the introduction of the join elimination transformation in 10gR2 the possibility of referencing a nested table without its parent didn’t exist. There is no valid SQL syntax to express the results of the join elimination transformation!

You might think that join elimination should be a no-brainer: there can never be any performance benefit to retaining the join. However, I suspect that the creation of hints to control transformations is now a standard, and, if so, I approve. Seeing hints in the outline section of DBMS_XPLAN functions means that you can work out what is happening more easily and allows you to disable the transformation for educational or investigative purposes.

Outer Join to Inner Join

Sometimes an outer join can’t be eliminated, but it can be converted to an inner join. Let us take another look at the second definition of the CUST_SALES view in Listing 13-4. Although the change to the view definition helped queries that only needed aggregated SH.SALES data, it made queries that do require columns from SH.CUSTOMERS less efficient. If we change such queries we can eliminate that overhead. Take a look at Listing 13-6.

Listing 13-6. Outer join to inner join transformation

SELECT /*+   outer_join_to_inner(@SEL$13BD1B6A c@sel$2) */
       /* no_outer_join_to_inner(@SEL$13BD1B6A c@sel$2) */
      *
 FROM cust_sales c
WHERE cust_id IS NOT NULL;
 
-- Untransformed execution plan (NO_OUTER_JOIN_TO_INNER)
---------------------------------------------
| Id  | Operation               | Name      |
---------------------------------------------
|   0 | SELECT STATEMENT        |           |
|*  1 |  FILTER                 |           |
|*  2 |   HASH JOIN OUTER       |           |
|   3 |    VIEW                 |           |
|   4 |     HASH GROUP BY       |           |
|   5 |      PARTITION RANGE ALL|           |
|   6 |       TABLE ACCESS FULL | SALES     |
|   7 |    TABLE ACCESS FULL    | CUSTOMERS |
---------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   1 - filter("C"."CUST_ID" IS NOT NULL)
   2 - access("C"."CUST_ID"(+)="S"."CUST_ID")
 
-- Transformed query
    
WITH sales_q
        AS (  SELECT cust_id
                    ,SUM (amount_sold) amount_sold
                    ,AVG (amount_sold) avg_sold
                    ,COUNT (*) cnt
                FROM sh.sales s
            GROUP BY cust_id)
   SELECT c.*
         ,s.cust_id AS sales_cust_id
         ,s.amount_sold
         ,s.avg_sold
         ,cnt
     FROM sh.customers c JOIN sales_q s ON c.cust_id = s.cust_id;
 
-- Transformed execution plan (default)
 
--------------------------------------------
| Id  | Operation              | Name      |
--------------------------------------------
|   0 | SELECT STATEMENT       |           |
|*  1 |  HASH JOIN             |           |
|   2 |   VIEW                 |           |
|   3 |    HASH GROUP BY       |           |
|   4 |     PARTITION RANGE ALL|           |
|   5 |      TABLE ACCESS FULL | SALES     |
|   6 |   TABLE ACCESS FULL    | CUSTOMERS |
--------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
  
   1 - access("C"."CUST_ID"="S"."CUST_ID")

Leaving aside the referential integrity constraint for a minute, the potential difference between an outer join and an inner join with SH.CUSTOMERS is that extra rows might be added with C.CUST_ID NULL. The presence of the WHERE cust_id IS NOT NULL predicate means that any such additional rows would be eliminated, so the CBO will apply the transformation from an outer join to an inner join unless prohibited from doing so by a hint.

Full Outer Join to Outer Join

A logical extension to the outer-join-to-inner-join transformation is the full-outer-join-to-outer-join transformation. Precisely the same principles apply. If you understood Listing 13-6, Listing 13-7 should be self-explanatory.

Listing 13-7. Full outer join to outer join

SELECT /*+    full_outer_join_to_outer(cust) */
       /*  no_full_outer_join_to_outer(cust) */
      *
 FROM sh.countries FULL OUTER JOIN sh.customers cust USING (country_id)
WHERE cust.cust_id IS NOT NULL;
 
-- Untransformed execution plan (NO_FULL_OUTER_JOIN_TO_OUTER)
-------------------------------------------
| Id  | Operation             | Name      |
-------------------------------------------
|   0 | SELECT STATEMENT      |           |
|*  1 |  VIEW                 | VW_FOJ_0  |
|*  2 |   HASH JOIN FULL OUTER|           |
|   3 |    TABLE ACCESS FULL  | COUNTRIES |
|   4 |    TABLE ACCESS FULL  | CUSTOMERS |
-------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   1 - filter("CUST"."CUST_ID" IS NOT NULL)
   2 - access("COUNTRIES"."COUNTRY_ID"="CUST"."COUNTRY_ID")
 
-- Transformed query
    
SELECT *
  FROM sh.countries RIGHT OUTER JOIN sh.customers cust USING (country_id);
 
-- Transformed execution plan (default)
 
-------------------------------------------
| Id  | Operation             | Name      |
-------------------------------------------
|   0 | SELECT STATEMENT      |           |
|*  1 |  HASH JOIN RIGHT OUTER|           |
|   2 |   TABLE ACCESS FULL   | COUNTRIES |
|   3 |   TABLE ACCESS FULL   | CUSTOMERS |
-------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   1 - access("COUNTRIES"."COUNTRY_ID"(+)="CUST"."COUNTRY_ID")

Of course, if we were to add a predicate such as WHERE COUNTRIES.COUNTRY_NAME IS NOT NULL to Listing 13-7 we could apply the outer-join-to-inner-join transformation as well and we would get an inner join.

Semi-Join to Inner Join

Although the semi-join-to-inner-join transformation has been around for some time, the hint to manage the transformation is new to 11.2.0.3. Listing 13-8 demonstrates.

Listing 13-8. Semi-join to inner join

SELECT *
  FROM sh.sales s
 WHERE EXISTS
          (SELECT /*+   semi_to_inner(c) */
                  /*  no_semi_to_inner(c) */
                 1
            FROM sh.customers c
           WHERE     c.cust_id = s.cust_id
                 AND cust_first_name = 'Abner'
                 AND cust_last_name = 'Everett'),
 
-- Untransformed execution plan (NO_SEMI_TO_INNER)
 
------------------------------------------
| Id  | Operation            | Name      |
------------------------------------------
|   0 | SELECT STATEMENT     |           |
|*  1 |  HASH JOIN RIGHT SEMI|           |
|*  2 |   TABLE ACCESS FULL  | CUSTOMERS |
|   3 |   PARTITION RANGE ALL|           |
|   4 |    TABLE ACCESS FULL | SALES     |
------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   1 - access("C"."CUST_ID"="S"."CUST_ID")
   2 - filter("CUST_FIRST_NAME"='Abner' AND "CUST_LAST_NAME"='Everett')
 
-- Transformed query (approximate)
 
WITH q1
     AS (SELECT DISTINCT cust_id
           FROM sh.customers
          WHERE cust_first_name = 'Abner' AND cust_last_name = 'Everett')
SELECT s.*
  FROM sh.sales s JOIN q1 ON s.cust_id = q1.cust_id;
 
-- Transfomed execution plan (default)
-------------------------------------------------------------
| Id  | Operation                          | Name           |
-------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                |
|   1 |  NESTED LOOPS                      |                |
|   2 |   NESTED LOOPS                     |                |
|   3 |    SORT UNIQUE                     |                |
|*  4 |     TABLE ACCESS FULL              | CUSTOMERS      |
|   5 |    PARTITION RANGE ALL             |                |
|   6 |     BITMAP CONVERSION TO ROWIDS    |                |
|*  7 |      BITMAP INDEX SINGLE VALUE     | SALES_CUST_BIX |
|   8 |   TABLE ACCESS BY LOCAL INDEX ROWID| SALES          |
-------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   4 - filter("CUST_FIRST_NAME"='Abner' AND "CUST_LAST_NAME"='Everett')
   7 - access("C"."CUST_ID"="S"."CUST_ID")

The transformation shown in Listing 13-8 allows additional join orders for nested loops and merge semi-joins. Prior to the creation of this transformation semi-joins required that the unnested subquery be the probe row source of a nested loops or merge semi-join (join inputs can be swapped for hash joins). Now that we can transform the semi-join to an inner join, we can use the subquery as the driving row source.

Notice that there is a SORT UNIQUE operation in the transformed execution plan. The purpose of this operation is to ensure that any row from SH.SALES appears at most once in the result set. However, if you take the transformed query and run DBMS_XPLAN.DISPLAY on that query you will not see this SORT UNIQUE operation. This is because the no-brainer distinct elimination transformation is performed if the transformed query is directly executed or explained. I imagine that the redundant SORT UNIQUE operation introduced by the transformation will disappear in a future release.

Subquery Unnesting

We have already seen the subquery unnesting transformation in the context of semi-joins and anti-joins in Chapter 11. We will also be looking at unnesting of subqueries in the select list in Chapter 14. However, there are one or two other variants of subquery unnesting that I want to address here. Take a look at Listing 13-9.

Listing 13-9. Subquery unnesting with decorrelation

SELECT c.cust_first_name, c.cust_last_name, s1.amount_sold
  FROM sh.customers c, sh.sales s1
 WHERE     s1.amount_sold = (SELECT /*     unnest */
                                    /*+ no_unnest */
                                    MAX (amount_sold)
                               FROM sh.sales s2
                              WHERE s1.cust_id= s2.cust_id)
       AND c.cust_id = s1.cust_id;
 
-- Untransformed execution plan (NO_UNNEST)
 
-----------------------------------------------------------------------
| Id  | Operation                                    | Name           |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                |
|*  1 |  FILTER                                      |                |
|*  2 |   HASH JOIN                                  |                |
|   3 |    TABLE ACCESS FULL                         | CUSTOMERS      |
|   4 |    PARTITION RANGE ALL                       |                |
|   5 |     TABLE ACCESS FULL                        | SALES          |
|   6 |   SORT AGGREGATE                             |                |
|   7 |    PARTITION RANGE ALL                       |                |
|   8 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES          |
|   9 |      BITMAP CONVERSION TO ROWIDS             |                |
|* 10 |       BITMAP INDEX SINGLE VALUE              | SALES_CUST_BIX |
-----------------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   1 - filter("S1"."AMOUNT_SOLD"= (SELECT /*+ NO_UNNEST */
              MAX("AMOUNT_SOLD") FROM "SH"."SALES" "S2" WHERE "S2"."CUST_ID"=:B1))
   2 - access("C"."CUST_ID"="S1"."CUST_ID")
  10 - access("S2"."CUST_ID"=:B1
 
-- Transformed query
 
WITH vw_sq_1
     AS (  SELECT cust_id AS sq_cust_id, MAX (amount_sold) max_amount_sold
             FROM sh.sales s2
         GROUP BY cust_id)
SELECT c.cust_first_name, c.cust_last_name, s1.amount_sold
  FROM sh.sales s1, sh.customers c, vw_sq_1
 WHERE     s1.amount_sold = vw_sq_1.max_amount_sold
       AND s1.cust_id = vw_sq_1.sq_cust_id
       AND s1.cust_id = c.cust_id;
 
-- Transfomed execution plan (default)
-----------------------------------------------------
| Id  | Operation                    | Name         |
-----------------------------------------------------
|   0 | SELECT STATEMENT             |              |
|   1 |  NESTED LOOPS                |              |
|   2 |   NESTED LOOPS               |              |
|*  3 |    HASH JOIN                 |              |
|   4 |     VIEW                     | VW_SQ_1      |
|   5 |      HASH GROUP BY           |              |
|   6 |       PARTITION RANGE ALL    |              |
|   7 |        TABLE ACCESS FULL     | SALES        |
|   8 |     PARTITION RANGE ALL      |              |
|   9 |      TABLE ACCESS FULL       | SALES        |
|* 10 |    INDEX UNIQUE SCAN         | CUSTOMERS_PK |
|  11 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS    |
-----------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   3 - access("S1"."AMOUNT_SOLD"="MAX(AMOUNT_SOLD)" AND
              "S1"."CUST_ID"="ITEM_1")
  10 - access("C"."CUST_ID"="S1"."CUST_ID")

The original subquery in the WHERE clause is a correlated subquery, meaning that in the untransformed query that subquery is logically executed for every row returned by the join. However, in the transformed query the aggregation is performed just once, aggregating for every value of CUST_ID.

Subquery unnesting is rather unusual in that it is sometimes applied as a cost-based transformation and sometimes as a heuristic transformation. If you use a construct such as WHERE C1 = (<subquery>), subquery unnesting is cost based. However, in all the listings in this book, subquery unnesting is applied as a heuristic transformation, meaning that the CBO does not use the cost-based framework to determine whether to apply the transformation. In common with all the other transformations we have discussed so far (except the semi-to-inner transformation), subquery unnesting is always performed (unless hinted) regardless of whether or not performance is expected to improve. In the cases of all the optimizer transformations that we have discussed so far, there hasn’t been any chance that the transformations would harm performance. It definitely is possible for subquery unnesting to harm performance in some cases. We will return to the topic of the potential performance impact of heuristic transformations in Chapters 14 and 18, but for now just make a mental note of it. We need to turn our attention to another case of subquery unnesting, seen in Listing 13-10.

Listing 13-10. Subquery unnesting using window functions

SELECT c.cust_first_name, c.cust_last_name, s1.amount_sold
  FROM sh.customers c, sh.sales s1
 WHERE     amount_sold = (SELECT /*     unnest */
                                 /*+ no_unnest */
MAX (amount_sold)
                            FROM sh.sales s2
                           WHERE c.cust_id= s2.cust_id)
       AND c.cust_id = s1.cust_id;
 
-- Untransformed execution plan (NO_UNNEST)
-----------------------------------------------------------------------
| Id  | Operation                                    | Name           |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                |
|*  1 |  FILTER                                      |                |
|*  2 |   HASH JOIN                                  |                |
|   3 |    TABLE ACCESS FULL                         | CUSTOMERS      |
|   4 |    PARTITION RANGE ALL                       |                |
|   5 |     TABLE ACCESS FULL                        | SALES          |
|   6 |   SORT AGGREGATE                             |                |
|   7 |    PARTITION RANGE ALL                       |                |
|   8 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES          |
|   9 |      BITMAP CONVERSION TO ROWIDS             |                |
|* 10 |       BITMAP INDEX SINGLE VALUE              | SALES_CUST_BIX |
-----------------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   1 - filter("AMOUNT_SOLD"= (SELECT /*+ NO_UNNEST */
              MAX("AMOUNT_SOLD") FROM "SH"."SALES" "S2" WHERE "S2"."CUST_ID"=:B1))
   2 - access("C"."CUST_ID"="S1"."CUST_ID")
  10 - access("S2"."CUST_ID"=:B1)
 
-- Transformed query
 
WITH vw_wif_1
     AS (SELECT c.cust_first_name
               ,c.cust_last_name
               ,s.amount_sold
               ,MAX (amount_sold) OVER (PARTITION BY s.cust_id) AS item_4
           FROM sh.customers c, sh.sales s
          WHERE s.cust_id = c.cust_id)
SELECT cust_first_name, cust_last_name, amount_sold
  FROM vw_wif_1
 WHERE CASE WHEN item_4 = amount_sold THEN ROWID END IS NOT NULL;
 
-- Transformed execution plan (default)
--------------------------------------------
| Id  | Operation              | Name      |
--------------------------------------------
|   0 | SELECT STATEMENT       |           |
|*  1 |  VIEW                  | VW_WIF_1  |
|   2 |   WINDOW SORT          |           |
|*  3 |    HASH JOIN           |           |
|   4 |     TABLE ACCESS FULL  | CUSTOMERS |
|   5 |     PARTITION RANGE ALL|           |
|   6 |      TABLE ACCESS FULL | SALES     |
--------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   1 - filter("VW_COL_4" IS NOT NULL)
   3 - access("C"."CUST_ID"="S1"."CUST_ID")

The difference in the queries shown in Listing 13-9 and Listing 13-10 is difficult to spot. In fact, the correlating column in Listing 13-9 is S1.CUST_ID and in Listing 13-10 it is C.CUST_ID. Given that there is also a predicate in the queries that equates S1.CUST_ID and C.CUST_ID, it may be a surprise that the resulting execution plan in Listing 13-10 differs so much from that in Listing 13-9. This is an example of a problem with transitive closure, and we will return to this topic in Chapter 14. For now, let us just accept that because the correlating column (C.CUST_ID) is from one table and the subquery is equated with a column from another table (S1.AMOUNT_SOLD), Listing 13-10 uses a different variant of subquery unnesting than did Listing 13-9. We can see that in Listing 13-9 we have two full scans of SH.SALES (bad) but no sort (good). In Listing 13-10 we have only one scan of SH.SALES (good) as well as a sort (bad). If you have a query like the ones in the last two listings it might be best to experiment with a code change to see which execution plan performs best.

The transformed query in Listing 13-10 sorts all the rows from the result of the join and then selects just the highest ranking. Why the convoluted predicate involving a case expression? I wouldn’t want to speculate.

Partial Joins

Partial joins are a new feature of 12cR1, but in my opinion they haven’t been explained well. Let me see if I can provide another angle on things. Take a look at Listing 13-11.

Listing 13-11. Partial join transformation

  SELECT /*+    partial_join(iv) */
         /*  no_partial_join(iv) */
         product_id, MAX (it.quantity)
    FROM oe.order_items it JOIN oe.inventories iv USING (product_id)
GROUP BY product_id;
 
-- Untransformed execution plan (NO_PARTIAL_JOIN)
-----------------------------------------------
| Id  | Operation              | Name         |
-----------------------------------------------
|   0 | SELECT STATEMENT       |              |
|   1 |  HASH GROUP BY         |              |
|*  2 |   HASH JOIN            |              |
|   3 |    TABLE ACCESS FULL   | ORDER_ITEMS  |
|   4 |    INDEX FAST FULL SCAN| INVENTORY_IX |
-----------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - access("IT"."PRODUCT_ID"="IV"."PRODUCT_ID")
-- Transformed query
 
  SELECT product_id, MAX (quantity)
    FROM oe.order_items it
   WHERE EXISTS
            (SELECT 1
               FROM oe.inventories iv
              WHERE it.product_id = iv.product_id)
GROUP BY it.product_id;
 
-- Transformed execution plan (default)
-----------------------------------------------
| Id  | Operation              | Name         |
-----------------------------------------------
|   0 | SELECT STATEMENT       |              |
|   1 |  HASH GROUP BY         |              |
|*  2 |   HASH JOIN SEMI       |              |
|   3 |    TABLE ACCESS FULL   | ORDER_ITEMS  |
|   4 |    INDEX FAST FULL SCAN| INVENTORY_IX |
-----------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - access("IT"."PRODUCT_ID"="IV"."PRODUCT_ID")

As you can see, the partial join converts an inner join to a semi-join. But earlier we had the semi-to-inner-join transformation, didn’t we? Are you confused? I was. Let me unravel the mystery.

The benefits of the partial join are best explained with a join of two tables that have a many-to-many relationship, so in Listing 13-11 I have temporarily abandoned the SH example schema and used two tables from the OE example schema that have a many-to-many relationship.

Let us take PRODUCT_ID 1797 as an example and see first how the untransformed query processes it. This product forms part of three orders, and the values of QUANTITY in the three matching rows in OE.ORDER_ITEMS are 7, 9, and 12, respectively. PRODUCT_ID 1797 is also in stock at three warehouses, so there are three rows matching PRODUCT_ID 1797 in OE.INVENTORIES. When we join OE.ORDER_ITEMS and OE.INVENTORIES using the PRODUCT_ID we get nine rows for PRODUCT_ID 1797 with values of QUANTITY 7, 9, 12, 7, 9, 12, 7, 9, and 12 (the three rows from OE.ORDER_ITEMS being replicated three times). When we input these nine rows into our HASH GROUP BY operation we get a value of 12 for MAX (QUANTITY). If we had included MIN (QUANTITY), SUM (DISTINCT QUANTITY), AVG (QUANTITY), and SUM (QUANTITY) in our select list we would have got 7, 28, 9⅓, and 84, respectively.

The transformation to the semi-join prevents the duplication of rows from OE.ORDER_ITEMS as a result of any duplicate values of PRODUCT_ID 1797 in OE.INVENTORIES, so the values input to the HASH GROUP BY operation are just 7, 9, and 12. The value of MAX (PRODUCT_ID) is 12, precisely the same as if we had processed all nine rows. The values of MIN (QUANTITY), SUM (DISTINCT QUANTITY), AVG (QUANTITY), and SUM (QUANTITY) based on these rows would be 7, 28, 9⅓, and 28, respectively. All my example aggregate values are the same for the three rows in the transformed query as for the nine rows in the untransformed query, except SUM (QUANTITY), which yields a different result. As a consequence, if you try adding SUM (QUANTITY) to the select list of the query in Listing 13-11 you will find that the partial join is suddenly illegal. In 12.1.0.1 the AVG (QUANTITY) function will also disable partial joins, and I speculate that this is because COUNT (QUANTITY) and SUM (QUANTITY) are both genuinely illegal. This is the sort of thing that might get fixed in a later release.

The main benefit of the partial join is to reduce the number of rows being input to a GROUP BY or DISTINCT operation. In the case of the HASH JOIN SEMI operation in the transformed execution plan, this is accomplished by removing the entries from the hash cluster as soon as one match is found. From then on, no further rows from OE.INVENTORIES with the same PRODUCT_ID will match.

The partial join also has some interesting side benefits when applied to nested loops joins. Take a look at Listing 13-12, which displays edited runtime statistics along with the execution plan.

Listing 13-12. Partial join transformation with nested loops join

BEGIN
   FOR r IN (  SELECT/*+ TAGPJ gather_plan_statistics */
                     cust_id, MAX (s.amount_sold)
                 FROM sh.customers c JOIN sh.sales s USING (cust_id)
                WHERE s.amount_sold > 1782 AND s.prod_id = 18
             GROUP BY cust_id)
   LOOP
      NULL;
   END LOOP;
END;
 
 SET LINES 200 PAGES 0
 
SELECT p.*
  FROM v$sql s
      ,TABLE (
          DBMS_XPLAN.display_cursor (s.sql_id
                                    ,s.child_number
                                    ,'BASIC IOSTATS LAST')) p
 WHERE sql_text LIKE 'SELECT /*+ TAGPJ%';
 
----------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Starts | E-Rows | A-Rows |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |      1 |        |     83 |
|   1 |  HASH GROUP BY               |                |      1 |      3 |     83 |
|   2 |   NESTED LOOPS SEMI          |                |      1 |      3 |    108 |
|   3 |    PARTITION RANGE ALL       |                |      1 |      3 |    108 |
|*  4 |     TABLE ACCESS BY LOCAL IND| SALES          |     28 |      3 |    108 |
|   5 |      BITMAP CONVERSION TO ROW|                |     16 |        |   9591 |
|*  6 |       BITMAP INDEX SINGLE VAL| SALES_PROD_BIX |     16 |        |     16 |
|*  7 |    INDEX UNIQUE SCAN         | CUSTOMERS_PK   |     83 |  55500 |     83 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
4 - filter("SALES"."AMOUNT_SOLD">1782)
6 - access("SALES"."PROD_ID"=18)
7 - access("CUSTOMERS"."CUST_ID"="SALES"."CUST_ID")

At first glance the partial join applied to the nested loops join in Listing 13-12 seems pointless. There is a primary key constraint on the CUST_ID column of SH.CUSTOMERS, so there will only be one row obtained by the INDEX UNIQUE SCAN on line 7, so what benefit comes from converting NESTED LOOPS to NESTED LOOPS SEMI? The answer can be seen by looking at the STARTS column of operation 7. Despite the fact that there are 108 rows from SH.SALES that match the predicates SH.PROD_ID=18 and SH.AMOUNT_SOLD > 1782, the INDEX UNIQUE SCAN was only performed 83 times! This is because there are only 83 distinct values of CUST_ID in those 108 rows and the NESTED LOOPS SEMI and NESTED LOOPS ANTI join operations can take advantage of scalar subquery caching. A regular NESTED LOOPS join cannot!

But perhaps we are getting distracted. The main purpose of the partial join transformation is to eliminate rows as early in the process as possible that would otherwise be pointlessly input to a DISTINCT or GROUP BY operation. This requires the conversion of an inner join to a semi-join. On the other hand, the purpose of the semi-to-inner-join transformation is to allow the subquery to act as the driving row source in a NESTED LOOPS or MERGE join operation. These two goals are mutually exclusive. Fortunately, both the partial join and the semi-to-inner-join transformations are cost based, and the CBO will not perform a partial join transformation when it thinks that doing so would create a disadvantageous join order.

Join Factorization

The purpose of the join factorization transformation is to remove common components from the operands of a UNION ALL operation and then apply them once—after the operation. Listing 13-13 provides an example.

Listing 13-13. Join factorization

SELECT /*+   factorize_join(@set$1(s@u1 S@u2)) qb_name(u1) */
       /* no_factorize_join(@set$1) */
       *
  FROM sh.sales s, sh.customers c
 WHERE     c.cust_first_name = 'Abner'
       AND c.cust_last_name = 'Everett'
       AND s.cust_id = c.cust_id
/*AND prod_id = 13
AND time_id = DATE '2001-09-13' */
 
UNION ALL
SELECT /*+ qb_name(u2) */
       *
  FROM sh.sales s, sh.customers c
 WHERE     c.cust_first_name = 'Abigail'
       AND c.cust_last_name = 'Ruddy'
       AND s.cust_id = c.cust_id
/*AND prod_id = 13
AND time_id = DATE '2001-09-13' */;
 
-- Untransformed execution plan (NO_FACTORIZE_JOIN)
 
--------------------------------------------------------------
| Id  | Operation                           | Name           |
--------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |
|   1 |  UNION-ALL                          |                |
|   2 |   NESTED LOOPS                      |                |
|   3 |    NESTED LOOPS                     |                |
|*  4 |     TABLE ACCESS FULL               | CUSTOMERS      |
|   5 |     PARTITION RANGE ALL             |                |
|   6 |      BITMAP CONVERSION TO ROWIDS    |                |
|*  7 |       BITMAP INDEX SINGLE VALUE     | SALES_CUST_BIX |
|   8 |    TABLE ACCESS BY LOCAL INDEX ROWID| SALES          |
|   9 |   NESTED LOOPS                      |                |
|  10 |    NESTED LOOPS                     |                |
|* 11 |     TABLE ACCESS FULL               | CUSTOMERS      |
|  12 |     PARTITION RANGE ALL             |                |
|  13 |      BITMAP CONVERSION TO ROWIDS    |                |
|* 14 |       BITMAP INDEX SINGLE VALUE     | SALES_CUST_BIX |
|  15 |    TABLE ACCESS BY LOCAL INDEX ROWID| SALES          |
--------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   4 - filter("C"."CUST_FIRST_NAME"='Abner' AND
              "C"."CUST_LAST_NAME"='Everett')
   7 - access("S"."CUST_ID"="C"."CUST_ID")
  11 - filter("C"."CUST_FIRST_NAME"='Abigail' AND
              "C"."CUST_LAST_NAME"='Ruddy')
  14 - access("S"."CUST_ID"="C"."CUST_ID")
 
-- Transformed query
 
WITH vw_jf
     AS (SELECT *
           FROM sh.customers c
          WHERE c.cust_first_name = 'Abner' AND c.cust_last_name = 'Everett'
         UNION ALL
         SELECT *
           FROM sh.customers c
          WHERE c.cust_first_name = 'Abigail' AND c.cust_last_name = 'Ruddy')
SELECT *
  FROM sh.sales s, vw_jf
 WHERE s.cust_id = vw_jf.cust_id;
 
-- Transformed execution plan (default)
 
---------------------------------------------------
| Id  | Operation            | Name               |
---------------------------------------------------
|   0 | SELECT STATEMENT     |                    |
|*  1 |  HASH JOIN           |                    |
|   2 |   VIEW               | VW_JF_SET$F472D255 |
|   3 |    UNION-ALL         |                    |
|*  4 |     TABLE ACCESS FULL| CUSTOMERS          |
|*  5 |     TABLE ACCESS FULL| CUSTOMERS          |
|   6 |   PARTITION RANGE ALL|                    |
|   7 |    TABLE ACCESS FULL | SALES              |
---------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   6 - access("TIME_ID"=TO_DATE(' 2001-09-13 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   7 - access("PROD_ID"=13)
  10 - filter("C"."CUST_FIRST_NAME"='Abner' AND
              "C"."CUST_LAST_NAME"='Everett')
  11 - access("C"."CUST_ID"="S"."CUST_ID")
  12 - filter("C"."CUST_FIRST_NAME"='Abigail' AND
              "C"."CUST_LAST_NAME"='Ruddy')
  13 - access("C"."CUST_ID"="S"."CUST_ID")

The two branches of the original query in Listing 13-13 both reference the SH.SALES table in identical ways. The transformed query extracts the joins in the two branches and performs just one join after the UNION ALL operation. As you can imagine, this transformation may generate huge savings, but in this case we have replaced two indexed accesses with one full table scan and the benefit is small. In fact, if you add the commented out predicates it would actually be disadvantageous to perform join factorization. Fortunately, join factorization is a cost-based transformation and you will find that when the additional predicates are included the transformation will not be applied unless forced with a hint.

Hinting join factorization is tricky because the hint applies to a set query block. This creates a series of cascading complications.

It is not possible to use a local FACTORIZE_JOIN hint, and we need to resort to global hint syntax.
The global hint syntax means that we need to refer to the row sources being factorized using the query blocks in which they are contained. Although it wasn’t strictly necessary, I have named the query blocks in the UNION ALL branches as U1 and U2 using QB_NAME hints to make the example clearer.
QB_NAME hinting is only practical with traditional join syntax so I have had to abandon ANSI join syntax in Listing 13-13.

Set to Join

It is possible to express INTERSECT and MINUS set operations as joins. Listing 13-14 shows how this is done.

Listing 13-14. Set to join transformation

SELECT /*+ set_to_join(@set$1) */ /* no_set_to_join(@set$1) */
       prod_id
  FROM sh.sales s1
 WHERE time_id < DATE '2000-01-01' AND cust_id = 13
MINUS
SELECT prod_id
  FROM sh.sales s2
 WHERE time_id >=DATE '2000-01-01';
 
-- Untransformed execution plan (default)
 
------------------------------------------------------------------------------------
| Id  | Operation                                    | Name           | Cost (%CPU)|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                |  2396  (99)|
|   1 |  MINUS                                       |                |            |
|   2 |   SORT UNIQUE                                |                |    24   (0)|
|   3 |    PARTITION RANGE ITERATOR                  |                |    24   (0)|
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES          |    24   (0)|
|   5 |      BITMAP CONVERSION TO ROWIDS             |                |            |
|*  6 |       BITMAP INDEX SINGLE VALUE              | SALES_CUST_BIX |            |
|   7 |   SORT UNIQUE                                |                |  2372   (1)|
|   8 |    PARTITION RANGE ITERATOR                  |                |   275   (2)|
|   9 |     TABLE ACCESS FULL                        | SALES          |   275   (2)|
------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
  
   6 - access("CUST_ID"=13)
 
-- Transformed query
 
SELECT DISTINCT s1.prod_id
  FROM sh.sales s1
 WHERE     time_id < DATE '2000-01-01'
       AND cust_id = 13
       AND prod_id NOT IN (SELECT prod_id
                             FROM sh.sales s2
                            WHERE time_id >=DATE '2000-01-01'),
 
-- Transfomed execution plan (SET_TO_JOIN)
 
------------------------------------------------------------------------------------
| Id  | Operation                                    | Name           | Cost (%CPU)|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                |   301   (2)|
|   1 |  HASH UNIQUE                                 |                |   301   (2)|
|*  2 |   HASH JOIN ANTI                             |                |   301   (2)|
|   3 |    PARTITION RANGE ITERATOR                  |                |    24   (0)|
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES          |    24   (0)|
|   5 |      BITMAP CONVERSION TO ROWIDS             |                |            |
|*  6 |       BITMAP INDEX SINGLE VALUE              | SALES_CUST_BIX |            |
|   7 |    PARTITION RANGE ITERATOR                  |                |   275   (2)|
|   8 |     TABLE ACCESS FULL                        | SALES          |   275   (2)|
------------------------------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - access("PROD_ID"="PROD_ID")
   6 - access("CUST_ID"=13)

The query in Listing 13-14 identifies the products that CUST_ID 13 bought in the 20th century that were not sold to anyone in the 21st century. The untransformed query performs unique sorts on each of the two branches of the UNION ALL operation before identifying the elements of the result set; this is how MINUS and INTERSECT set operations are always implemented. The application of the transformation means that one of these two sorts can be avoided. But take a look at operation 2 in the execution plan of the transformed query—the workarea for the HASH JOIN ANTI operation includes duplicate values, so we need to ask ourselves whether this transformation is, in fact, worthwhile. In this case CUST_ID 13 only made 19 transactions for 6 products in the 20th century, so eliminating 13 rows from this workarea isn’t a big deal. However, the transformation has avoided a unique sort of the 72 products from the 492,064 rows from SH.SALES relating to transactions in the 21st century. The transformation is a good idea on this occasion.

Despite the fact that the cost estimates show that the CBO believes (correctly) that the set-to-join transformation results in a superior execution plan, the transformation will not be applied to the unhinted query and will normally never be applied unless hinted!

To understand the reason why we need to use the SET_TO_JOIN hint in Listing 13-14, we need to understand a little more about the difference between cost-based transformations and heuristic transformations. When the CBO considers cost-based transformations, it attempts to determine whether the transformation will reduce the elapsed time of the query; if the transformation will reduce the estimated elapsed time of the query—i.e., the cost is lower—the transformation will be applied, otherwise the transformation will not be applied. A heuristic transformation works by applying a rule of some kind. So far all the heuristic transformations that we have come across in this chapter are applied unconditionally unless disabled by a hint. The working assumption in these cases is that the hint will probably, if not certainly, help.,

In the case of the set-to-join heuristic transformation, the heuristic rule is that the transformation is never applied unless hinted, the working assumption being that the cases where the transformation is needed are rare. Given the need for the CBO to come up with a plan in milliseconds, the CBO will not waste its time looking at the set-to-join transformation unless you hint it.

If you like playing around with hidden initialization parameters and don’t care about the support implications of doing so, you can change the value of "_convert_set_to_join". If you change the value of this parameter from its default value of FALSE to TRUE at the session level, the heuristic rule will be altered so that the transformation is always applied. This is the only circumstance in which you would need to use the NO_SET_TO_JOIN hint to disable the transformation.

The set-to-join transformation provides a fitting end to our discussion of set- and join-related transformations. We can now turn our attention to a new class of transformations relating to aggregate operations.

Aggregation Transformations

Aggregations are a surprisingly complex matter in Oracle databases and figure heavily in the life of a SQL tuning specialist. We need to understand what information needs to be aggregated, when it is to be aggregated, and how it is to be aggregated. In recent releases Oracle has included a number of aggregate-related transformations that not only help query performance but also act as excellent educational guides that you can use to aid your analysis of queries containing aggregations. Let us get started with a look at how distinct aggregations can be optimized.

Distinct Aggregation

I don’t know about you, but I have always felt that an aggregate function including the DISTINCT keyword is inherently inefficient and that using it is just a bit lazy. I don’t have to worry now because these days the CBO will convert my concise query into something less concise but more efficient. Take a look at Listing 13-15.

Listing 13-15. Distinct aggregation transformation

SELECT /*+   transform_distinct_agg */
       /* no_transform_distinct_agg */
      COUNT (DISTINCT cust_id) FROM sh.sales;
 
-- Untransformed execution plan (NO_TRANSFORM_DISTINCT_AGG)
----------------------------------------------------------------------
| Id  | Operation                      | Name           | Cost (%CPU)|
----------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |   407   (0)|
|   1 |  SORT GROUP BY                 |                |            |
|   2 |   PARTITION RANGE ALL          |                |   407   (0)|
|   3 |    BITMAP CONVERSION TO ROWIDS |                |   407   (0)|
|   4 |     BITMAP INDEX FAST FULL SCAN| SALES_CUST_BIX |            |
----------------------------------------------------------------------
-- Transformed query
 
WITH vw_dag
     AS (  SELECT cust_id
             FROM sh.sales
         GROUP BY cust_id)
SELECT COUNT (cust_id)
  FROM vw_dag;
 
-- Transformed execution plan (default)
 
------------------------------------------------------------------------
| Id  | Operation                        | Name           | Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                |   428   (5)|
|   1 |  SORT AGGREGATE                  |                |            |
|   2 |   VIEW                           | VW_DAG_0       |   428   (5)|
|   3 |    HASH GROUP BY                 |                |   428   (5)|
|   4 |     PARTITION RANGE ALL          |                |   407   (0)|
|   5 |      BITMAP CONVERSION TO ROWIDS |                |   407   (0)|
|   6 |       BITMAP INDEX FAST FULL SCAN| SALES_CUST_BIX |            |
------------------------------------------------------------------------

The distinct aggregation transformation was introduced in 11gR2 and, as far as I can see, is a heuristic transformation that is unconditionally applied where legal. The estimated cost is higher with the transformation than without, but by separating the job of identifying the distinct values from the job of counting them the query is actually much faster.

Distinct Placement

Distinct placement is a cost-based transformation used from 11gR2 onwards to eliminate duplicate rows as soon as possible. Conceptually, distinct placement and group by placement, which we will discuss in the next section, are very similar. Listing 13-16 shows the basic idea.

Listing 13-16. Distinct placement

SELECT /*+no_partial_join(s1) no_partial_join(s2)     place_distinct(s1) */
       /*  no_partial_join(s1) no_partial_join(s2) no_place_distinct(s1) */
      DISTINCT cust_id, prod_id
 FROM sh.sales s1 JOIN sh.sales s2 USING (cust_id, prod_id)
WHERE s1.time_id < DATE '2000-01-01' AND s2.time_id >=DATE '2000-01-01';
 
-- Untransformed execution plan (NO_PLACE_DISTINCT)
---------------------------------------------------------
| Id  | Operation                  | Name  | Cost (%CPU)|
---------------------------------------------------------
|   0 | SELECT STATEMENT           |       |  5302   (1)|
|   1 |  HASH UNIQUE               |       |  5302   (1)|
|*  2 |   HASH JOIN                |       |  1787   (1)|
|   3 |    PARTITION RANGE ITERATOR|       |   246   (2)|
|   4 |     TABLE ACCESS FULL      | SALES |   246   (2)|
|   5 |    PARTITION RANGE ITERATOR|       |   275   (2)|
|   6 |     TABLE ACCESS FULL      | SALES |   275   (2)|
---------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - access("S1"."PROD_ID"="S2"."PROD_ID" AND
              "S1"."CUST_ID"="S2"."CUST_ID")
-- Transformed query
 
WITH vw_dtp
     AS (SELECT /*+ no_partial_join(s1) no_merge */
                DISTINCT s1.cust_id, s1.prod_id
           FROM sh.sales s1
          WHERE s1.time_id < DATE '2000-01-01')
SELECT /*+ no_partial_join(s2) */
       DISTINCT cust_id, prod_id
  FROM vw_dtp NATURAL JOIN sh.sales s2
 WHERE s2.time_id >=DATE '2000-01-01';
 
-- Transformed execution plan (default)
 
---------------------------------------------------------------------
| Id  | Operation                    | Name            | Cost (%CPU)|
---------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |  5595   (1)|
|   1 |  HASH UNIQUE                 |                 |  5595   (1)|
|*  2 |   HASH JOIN                  |                 |  3326   (1)|
|   3 |    VIEW                      | VW_DTP_6DE9D1A7 |  2138   (1)|
|   4 |     HASH UNIQUE              |                 |  2138   (1)|
|   5 |      PARTITION RANGE ITERATOR|                 |   246   (2)|
|   6 |       TABLE ACCESS FULL      | SALES           |   246   (2)|
|   7 |    PARTITION RANGE ITERATOR  |                 |   275   (2)|
|   8 |     TABLE ACCESS FULL        | SALES           |   275   (2)|
---------------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - access("ITEM_2"="S2"."PROD_ID" AND "ITEM_1"="S2"."CUST_ID")

The query in Listing 13-16 finds the combinations of customer and product that occurred in both the 20th and 21st centuries. To keep things simple I have disabled the partial join transformation that would have taken place without the NO_PARTIAL_JOIN hints. The addition of the extra HASH UNIQUE operation means that the workarea used by the HASH JOIN operation on line 2 in the transformed execution plan contains only 72 rows as opposed to the 426,779 rows from SH.SALES associated with 20th century sales, which are placed in the workarea associated with operation 2 of the untransformed query. The cost estimates suggest that the transformation isn’t worth the trouble, but somehow the CBO performs the transformation anyway and the query runs measurably faster.

Group by Placement

Group by placement is implemented in the same way and for the same reasons as distinct placement. Listing 13-17 highlights some new points, however.

Listing 13-17. Group by placement

  SELECT /*+   place_group_by((s p)) */
         /* no_place_group_by */
         cust_id
        ,c.cust_first_name
        ,c.cust_last_name
        ,c.cust_email
        ,p.prod_category
        ,SUM (s.amount_sold) total_amt_sold
    FROM sh.sales s
         JOIN sh.customers c USING (cust_id)
         JOIN sh.products p USING (prod_id)
GROUP BY cust_id
        ,c.cust_first_name
        ,c.cust_last_name
        ,c.cust_email
        ,p.prod_category;
 
-- Untransformed execution plan (NO_PLACE_GROUP_BY)
 
----------------------------------------------------------------------
| Id  | Operation                | Name                 | Cost (%CPU)|
----------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                      | 19957   (1)|
|   1 |  HASH GROUP BY           |                      | 19957   (1)|
|*  2 |   HASH JOIN              |                      |  2236   (1)|
|   3 |    VIEW                  | index$_join$_004     |     2   (0)|
|*  4 |     HASH JOIN            |                      |            |
|   5 |      INDEX FAST FULL SCAN| PRODUCTS_PK          |     1   (0)|
|   6 |      INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX |     1   (0)|
|*  7 |    HASH JOIN             |                      |  2232   (1)|
|   8 |     TABLE ACCESS FULL    | CUSTOMERS            |   423   (1)|
|   9 |     PARTITION RANGE ALL  |                      |   517   (2)|
|  10 |      TABLE ACCESS FULL   | SALES                |   517   (2)|
----------------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - access("S"."PROD_ID"="P"."PROD_ID")
   4 - access(ROWID=ROWID)
   7 - access("S"."CUST_ID"="C"."CUST_ID")
 
-- Transformed query
 
WITH vw_gbc
     AS (  SELECT /*+ no_place_group_by */
                 s.cust_id
                 ,p.prod_category
                 ,SUM (s.amount_sold) total_amt_sold
             FROM sh.sales s JOIN sh.products p USING (prod_id)
         GROUP BY s.cust_id, p.prod_category, prod_id)
  SELECT /*+ no_place_group_by leading(vw_gbc)
            use_hash(c) no_swap_join_inputs(c) */
         cust_id
        ,c.cust_first_name
        ,c.cust_last_name
        ,c.cust_email
        ,vw_gbc.prod_category
        ,SUM (vw_gbc.total_amt_sold) total_amt_sold
    FROM vw_gbc JOIN sh.customers c USING (cust_id)
GROUP BY cust_id
        ,c.cust_first_name
        ,c.cust_last_name
        ,c.cust_email
        ,vw_gbc.prod_category;
 
-- Transformed execution plan (default)
 
-------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |  4802   (1)|
|   1 |  HASH GROUP BY              |                      |  4802   (1)|
|*  2 |   HASH JOIN                 |                      |  4319   (1)|
|   3 |    VIEW                     | VW_GBC_1             |  3682   (1)|
|   4 |     HASH GROUP BY           |                      |  3682   (1)|
|*  5 |      HASH JOIN              |                      |   521   (2)|
|   6 |       VIEW                  | index$_join$_004     |     2   (0)|
|*  7 |        HASH JOIN            |                      |            |
|   8 |         INDEX FAST FULL SCAN| PRODUCTS_PK          |     1   (0)|
|   9 |         INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX |     1   (0)|
|  10 |       PARTITION RANGE ALL   |                      |   517   (2)|
|  11 |        TABLE ACCESS FULL    | SALES                |   517   (2)|
|  12 |    TABLE ACCESS FULL        | CUSTOMERS            |   423   (1)|
-------------------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - access("ITEM_1"="C"."CUST_ID")
   5 - access("S"."PROD_ID"="P"."PROD_ID")
   7 - access(ROWID=ROWID)

The query in Listing 13-17 identifies the total sales by each customer for each product category. The transformed query performs an aggregation after the join of SH.SALES with SH.PRODUCTS so that the workarea for the HASH JOIN on line 2 is small.

There seem to be some restrictions¹ on the application of the group by placement transformation, not all of which I have been able to work out. However, it does seem that on this occasion we can’t group the rows from SH.SALES without joining them at least once.

In Listing 13-17 the two operands of the join that produce the final result are:

The intermediate result set formed by the join of SH.SALES and SH.PRODUCTS
The table SH.CUSTOMERS

It is possible to perform group by placement on either or both of these operands. You can see the syntax for forcing group placement on the first of these operands in Listing 13-17. The hint PLACE_GROUP_BY ((S P) (C)) would be used to perform group placement on both operands.

Listing 13-18 shows what happens when we remove the hints from the transformed query in Listing 13-17 and then run EXPLAIN PLAN.

Listing 13-18. Group placement in a subquery

WITH vw_gbc
     AS (  SELECT /*+   place_group_by((s)) */
                  /* no_place_group_by */
                  s.cust_id, p.prod_category, SUM (s.amount_sold) total_amt_sold
             FROM sh.sales s JOIN sh.products p USING (prod_id)
         GROUP BY s.cust_id, p.prod_category, prod_id)
  SELECT /*+ place_group_by((vw_gbc)) */
         /* no_place_group_by leading(vw_gbc)use_hash(c) no_swap_join_inputs(c) */
         cust_id
        ,c.cust_first_name
        ,c.cust_last_name
        ,c.cust_email
        ,vw_gbc.prod_category
        ,SUM (vw_gbc.total_amt_sold) total_amt_sold
    FROM vw_gbc JOIN sh.customers c USING (cust_id)
GROUP BY cust_id
        ,c.cust_first_name
        ,c.cust_last_name
        ,c.cust_email
        ,vw_gbc.prod_category;
 
-- Untransformed execution plan (two NO_PLACE_GROUP_BY)
 
-------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      | 29396   (1)|
|   1 |  HASH GROUP BY              |                      | 29396   (1)|
|*  2 |   HASH JOIN                 |                      | 11675   (1)|
|   3 |    VIEW                     |                      |  9046   (1)|
|   4 |     HASH GROUP BY           |                      |  9046   (1)|
|*  5 |      HASH JOIN              |                      |   521   (2)|
|   6 |       VIEW                  | index$_join$_002     |     2   (0)|
|*  7 |        HASH JOIN            |                      |            |
|   8 |         INDEX FAST FULL SCAN| PRODUCTS_PK          |     1   (0)|
|   9 |         INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX |     1   (0)|
|  10 |       PARTITION RANGE ALL   |                      |   517   (2)|
|  11 |        TABLE ACCESS FULL    | SALES                |   517   (2)|
|  12 |    TABLE ACCESS FULL        | CUSTOMERS            |   423   (1)|
-------------------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - access("VW_GBC"."CUST_ID"="C"."CUST_ID")
   5 - access("S"."PROD_ID"="P"."PROD_ID")
   7 - access(ROWID=ROWID)
 
-- Transformed query
 
WITH vw_gbc_1
     AS (  SELECT s.cust_id, s.prod_id, SUM (s.amount_sold) AS total_amt_sold
             FROM sh.sales s
         GROUP BY s.cust_id, s.prod_id)
    ,vw_gbc_2
     AS (  SELECT vw_gbc_1.cust_id, vw_gbc_1.total_amt_sold, p.prod_category
             FROM vw_gbc_1 JOIN sh.products p USING (prod_id)
         GROUP BY vw_gbc_1.cust_id
                 ,vw_gbc_1.total_amt_sold
                 ,p.prod_category
                 ,prod_id)
    ,vw_gbc
     AS (  SELECT vw_gbc_2.cust_id
                 ,SUM (vw_gbc_2.total_amt_sold) AS total_amt_sold
                 ,vw_gbc_2.prod_category
             FROM vw_gbc_2
         GROUP BY vw_gbc_2.cust_id, vw_gbc_2.prod_category)
  SELECT cust_id
        ,c.cust_first_name
        ,c.cust_last_name
        ,c.cust_email
        ,vw_gbc.prod_category
        ,SUM (vw_gbc.total_amt_sold) total_amt_sold
    FROM vw_gbc JOIN sh.customers c USING (cust_id)
GROUP BY cust_id
        ,c.cust_first_name
        ,c.cust_last_name
        ,c.cust_email
        ,vw_gbc.prod_category;
 
-- Transformed execution plan (default)
 
---------------------------------------------------------------------------
| Id  | Operation                     | Name                 | Cost (%CPU)|
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                      |  7224   (1)|
|   1 |  HASH GROUP BY                |                      |  7224   (1)|
|*  2 |   HASH JOIN                   |                      |  6741   (1)|
|   3 |    VIEW                       | VW_GBC_3             |  6104   (1)|
|   4 |     HASH GROUP BY             |                      |  6104   (1)|
|   5 |      VIEW                     |                      |  6104   (1)|
|   6 |       HASH GROUP BY           |                      |  6104   (1)|
|*  7 |        HASH JOIN              |                      |  3022   (1)|
|   8 |         VIEW                  | index$_join$_002     |     2   (0)|
|*  9 |          HASH JOIN            |                      |            |
|  10 |           INDEX FAST FULL SCAN| PRODUCTS_PK          |     1   (0)|
|  11 |           INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX |     1   (0)|
|  12 |         VIEW                  | VW_GBC_2             |  3019   (1)|
|  13 |          HASH GROUP BY        |                      |  3019   (1)|
|  14 |           PARTITION RANGE ALL |                      |   517   (2)|
|  15 |            TABLE ACCESS FULL  | SALES                |   517   (2)|
|  16 |    TABLE ACCESS FULL          | CUSTOMERS            |   423   (1)|
---------------------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   2 - access("ITEM_1"="C"."CUST_ID")
   7 - access("ITEM_1"="P"."PROD_ID")
   9 - access(ROWID=ROWID)

Listing 13-18 shows that the transformed query from Listing 13-17 will undergo further transformations if we run or explain it directly. Now that we have two separate query blocks, the CBO can aggregate the data from SH.SALES before any join. The data is aggregated again after we join with SH.PRODUCTS and before we join with SH.CUSTOMERS. There is an extra and unnecessary aggregation introduced at this stage: the operations on lines 4 and 6 could be merged.

This analysis leads to an interesting SQL tuning technique: by looking at what transformations the CBO has performed, we may be able to further improve matters by running the transformation process iteratively. It may be that not all of these transformations will work out in practice, but at least you will have some ideas to try out!

Group by Pushdown

Group by pushdown does for parallel aggregations what distinct and group by placements do for join aggregations. The idea is the same: do as much aggregation as possible as early in the process as you can. Listing 13-19 shows the group by pushdown transformation in action.

Listing 13-19. Group by pushdown

  SELECT /*+ parallel gby_pushdown */ /* parallel no_gby_pushdown */
        prod_id
        ,cust_id
        ,promo_id
        ,COUNT (*) cnt
    FROM sh.sales
   WHERE amount_sold > 100
GROUP BY prod_id, cust_id, promo_id;
-- Untransformed execution plan (NO_GBY_PUSHDOWN)
 
---------------------------------------------------------
| Id  | Operation               | Name     | Cost (%CPU)|
---------------------------------------------------------
|   0 | SELECT STATEMENT        |          |   289   (3)|
|   1 |  PX COORDINATOR         |          |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001 |   289   (3)|
|   3 |    HASH GROUP BY        |          |   289   (3)|
|   4 |     PX RECEIVE          |          |   288   (2)|
|   5 |      PX SEND HASH       | :TQ10000 |   288   (2)|
|   6 |       PX BLOCK ITERATOR |          |   288   (2)|
|*  7 |        TABLE ACCESS FULL| SALES    |   288   (2)|
---------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   7 - filter("AMOUNT_SOLD">100)
 
-- Concept of the transformed query
 
WITH pq
     AS (  SELECT prod_id
                 ,cust_id
                 ,promo_id
                 ,COUNT (*) cnt
             FROM sh.sales
            WHERE amount_sold > 100 AND time_id < DATE '2000-01-01'
         GROUP BY prod_id, cust_id, promo_id
         UNION ALL
           SELECT prod_id
                 ,cust_id
                 ,promo_id
                 ,COUNT (*) cnt
             FROM sh.sales
            WHERE amount_sold > 100 AND time_id >=DATE '2000-01-01'
         GROUP BY prod_id, cust_id, promo_id)
  SELECT prod_id
        ,cust_id
        ,promo_id
        ,SUM (cnt) AS cnt
    FROM pq
GROUP BY prod_id, cust_id, promo_id;
 
-- Transformed execution plan
 
----------------------------------------------------------
| Id  | Operation                | Name     | Cost (%CPU)|
----------------------------------------------------------
|   0 | SELECT STATEMENT         |          |   325   (1)|
|   1 |  PX COORDINATOR          |          |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001 |   325   (1)|
|   3 |    HASH GROUP BY         |          |   325   (1)|
|   4 |     PX RECEIVE           |          |   325   (1)|
|   5 |      PX SEND HASH        | :TQ10000 |   325   (1)|
|   6 |       HASH GROUP BY      |          |   325   (1)|
|   7 |        PX BLOCK ITERATOR |          |   144   (2)|
|*  8 |         TABLE ACCESS FULL| SALES    |   144   (2)|
----------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
  
   8 - filter("AMOUNT_SOLD">100)

The query in Listing 13-19 is a straightforward COUNT aggregation performed on SH.SALES in parallel. Without the transformation, each row from SH.SALES where AMOUNT_SOLD > 100 would be sent from one parallel query server to another for aggregation. The concept of this transformation is shown in Listing 13-19 by a UNION ALL factored subquery. Each branch of the UNION ALL is supposed to relate to a parallel query server that reads a portion of the SH.SALES table. The conceptual query uses centuries to partition the data between branches, whereas the parallel query would use block-range granules.

In the conceptual query, each branch of the UNION ALL performs its own COUNT aggregation before the main query performs a SUM aggregation on the result to obtain the final result. In the parallel query, each parallel query slave that reads SH.SALES performs a COUNT aggregation on its subset of data so that the number of rows sent to DFO:TQ10001 for a final SUM aggregation is much reduced.

Once again, we can see that the CBO has performed a transformation that actually increases the estimated cost. The evidence is mounting that the cost-based transformation framework doesn’t actually perform final state optimization on each possible transformation, but instead performs some kind of approximate costing calculation.

Group by pushdown is a cost-based transformation that works for DISTINCT operations as well as for GROUP BY operations. Since GROUP BY and DISTINCT operations almost always significantly reduce cardinality, group by pushdown is almost always applied where legal.

The group by pushdown transformation is the last of the aggregation transformations that I wanted to cover. It is time now to look at subqueries.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 13: Optimizer Transformations

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 13: Optimizer Transformations