CHAPTER 18

Miscellaneous SQL Constructs

The SQL language offers a wide variety of constructs—from the very simple to the extremely complex. In this book, we looked at many examples that demonstrate the core topics. This chapter is devoted to a review of several use cases for constructs that should help round out your knowledge.

Conditional Logic Constructs

Oracle supports many different ways to implement conditional IF–THEN–ELSE logic in SQL statements. Sometimes you need to determine conditionally which data are emitted in the column list. Sometimes you need a more complex condition in the predicate to determine which rows are returned. Wherever you need to apply conditional logic, you can choose from several constructs, which I describe in this section:

  • DECODE
  • CASE
  • NVL, NVL2
  • COALESCE
  • NULLIF

Using DECODE

DECODE is a proprietary Oracle function that provides a simple conditional construct. Prior to Oracle version 8.1.6, when the CASE statement was introduced, DECODE was the only way to implement conditional logic in SQL. Because DECODE is limited to equality comparisons, a CASE statement is certainly more flexible. But DECODE can be useful for writing short, simple logical comparisons, as shown in Listing 18-1.

Listing 18-1.  Using a Simple DECODE Construct

SQL> select ename, decode (deptno, 10, 'Accounting',
  2  20, 'Research',
  3  30, 'Sales',
  4  'Unknown') as dept
  5  from scott.emp
  6  where rownum < 6 ;

ENAME      DEPT
---------- ----------
SMITH      Research
ALLEN      Sales
WARD       Sales
JONES      Research
MARTIN     Sales

One difference between DECODE and CASE is that DECODE does not expect datatype consistency in the conditions and resulting statements, as shown in Listing 18-2.

Listing 18-2.  DECODE Using Different Datatype Comparisons

SQL>select decode(42,42,1,
  2                 '42','2',
  3                 3) tst
  4  from dual ;

       TST
----------
         1

1 row selected.

SQL>select case 42 when 42  then 1
  2                when '42' then '2'
  3                else 3   end tst
  4  from dual ;
              when '42' then '2'
                   *
ERROR at line 2:
ORA-00932: inconsistent datatypes: expected NUMBER got CHAR

Although this isn’t a big factor most of the time, it is something to keep in mind. Also note that DECODE’s leniency about mismatched datatypes applies only to number and string datatypes. If we try to mix a date datatype with a number or string, we get the same error as with CASE.

Another thing to keep in mind about DECODE is how it handles null values: DECODE(null,null,'NULL','NOT NULL') returns 'NULL'. This differs from most null comparisons, in which null does not equal null (the comparison null = null is never TRUE). In reality, DECODE isn’t the proper function for null checks; NVL, NVL2, or CASE is better suited for that. Generally speaking, CASE is a more flexible, easier-to-read alternative, but DECODE remains an old standby for simple logic.
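The null-matching behavior is easy to see side by side. The following is a quick sketch (not from the original listings) contrasting DECODE’s null handling with an equivalent equality test in CASE:

```sql
select decode(null, null, 'NULL', 'NOT NULL') as decode_result,
       case when null = null then 'NULL'
            else 'NOT NULL' end as case_result
from dual;

-- DECODE treats the two nulls as equal and returns 'NULL';
-- the comparison null = null is never TRUE, so the CASE falls
-- through to the ELSE branch and returns 'NOT NULL'.
```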

Using CASE

A CASE statement evaluates a set of conditions and returns the result associated with the first condition that matches. There are two types of CASE statements: simple and searched. For simple CASE statements, a single expression is evaluated and compared with several possible values, as shown in Listing 18-3.

Listing 18-3.  A Simple CASE Statement

SQL>create table grades (
  2  student_id number,
  3  subject_id number,
  4  grade      char(1));

Table created.

SQL>insert into grades values (1,1,'A');

SQL>insert into grades values (2,1,'C');

SQL>insert into grades values (3,1,'A');

SQL>insert into grades values (4,1,'D');

SQL>insert into grades values (5,1,'F');

SQL>insert into grades values (6,1,'B');

SQL>insert into grades values (7,1,'C');

SQL>insert into grades values (8,1,'C');

SQL>select student_id,
  2  case grade
  3  when 'A' then 'Very Good'
  4  when 'B' then 'Good'
  5  when 'C' then 'Fair'
  6  when 'D' then 'Poor'
  7  when 'F' then 'Failure'
  8  else 'Withdrawn' end as grade
  9  from grades;

STUDENT_ID GRADE
---------- ---------
         1 Very Good
         2 Fair
         3 Very Good
         4 Poor
         5 Failure
         6 Good
         7 Fair
         8 Fair

8 rows selected.

A searched CASE statement differs from the simple type in that it allows for more complex conditions to be evaluated. With a searched CASE statement, multiple—possibly differing—Boolean expressions are evaluated, and the first one with a value of TRUE is chosen, as shown in Listing 18-4.

Listing 18-4. A Searched CASE Statement

SQL>select student_id,
  2  case when grade <= 'B' then 'Honor Roll'
  3       when grade = 'F' then 'Needs Tutoring'
  4       else 'Satisfactory' end grade_category
  5  from grades ;

STUDENT_ID GRADE_CATEGORY
---------- --------------
         1 Honor Roll
         2 Satisfactory
         3 Honor Roll
         4 Satisfactory
         5 Needs Tutoring
         6 Honor Roll
         7 Satisfactory
         8 Satisfactory

8 rows selected.

In both the simple and searched CASE constructs, the conditions are evaluated sequentially from top to bottom and execution halts after the first match is found. This means if more than one condition is TRUE, only the first result action is taken.
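To see the top-to-bottom evaluation in action, consider this sketch (hypothetical, reusing the grades table created in Listing 18-3), in which two branches are TRUE for grade 'A' yet only the first one fires:

```sql
select student_id,
       case when grade in ('A','B') then 'Honor Roll'
            when grade = 'A'        then 'Top Marks'  -- never reached: 'A' matched above
            else 'Other' end as category
from grades;
```

Every 'A' student is reported as Honor Roll; the 'Top Marks' branch is dead code unless it is moved above the broader condition.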

CASE statements can be used throughout a SQL statement and not just in the SELECT column list, as shown in the previous two listings. One way to use a CASE statement is to eliminate repeating accesses on the same table (or tables), as shown in Listing 18-5.

Listing 18-5.  Using CASE to Eliminate Repeated Table Accesses

-- Original SQL using UNION
select customer_id, order_total, 0 as disc_rate
from
(select c.customer_id,  nvl(sum(o.order_total),0) as order_total
  from oe.customers c, oe.orders o
 where c.customer_id = o.customer_id(+)
 group by c.customer_id ) t2
where order_total = 0
union
select customer_id, order_total, .1 as disc_rate
from
(select c.customer_id,  nvl(sum(o.order_total),0) as order_total
  from oe.customers c, oe.orders o
 where c.customer_id = o.customer_id
 group by c.customer_id ) t2
where order_total > 0 and order_total < 100000
union
select customer_id, order_total, .15 as disc_rate
from
(select c.customer_id,  nvl(sum(o.order_total),0) as order_total
  from oe.customers c, oe.orders o
 where c.customer_id = o.customer_id
 group by c.customer_id ) t2
where order_total >= 100000 and order_total <= 500000
union
select customer_id, order_total, .2 as disc_rate
from
(select c.customer_id,  nvl(sum(o.order_total),0) as order_total
  from oe.customers c, oe.orders o
 where c.customer_id = o.customer_id
 group by c.customer_id ) t2
where order_total > 500000
;

SQL>@dcplan
Enter value for sql_id: c7ngvkwthdaak
Enter value for child_no:
Enter value for format: BASIC +PREDICATE +ROWS

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------
Plan hash value: 729697565

---------------------------------------------------------------------
| Id  | Operation                         | Name            | Rows  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                 |       |
|   1 |  SORT UNIQUE                      |                 |    10 |
|   2 |   UNION-ALL                       |                 |       |
|*  3 |    FILTER                         |                 |       |
|   4 |     SORT GROUP BY NOSORT          |                 |     1 |
|   5 |      MERGE JOIN OUTER             |                 |   377 |
|   6 |       INDEX FULL SCAN             | CUSTOMERS_PK    |   319 |
|*  7 |       SORT JOIN                   |                 |   105 |
|   8 |        TABLE ACCESS BY INDEX ROWID| ORDERS          |   105 |
|*  9 |         INDEX RANGE SCAN          | ORD_CUSTOMER_IX |   105 |
|* 10 |    FILTER                         |                 |       |
|  11 |     SORT GROUP BY NOSORT          |                 |     3 |
|  12 |      TABLE ACCESS BY INDEX ROWID  | ORDERS          |   105 |
|  13 |       INDEX FULL SCAN             | ORD_CUSTOMER_IX |   105 |
|* 14 |    FILTER                         |                 |       |
|  15 |     SORT GROUP BY NOSORT          |                 |     3 |
|  16 |      TABLE ACCESS BY INDEX ROWID  | ORDERS          |   105 |
|  17 |       INDEX FULL SCAN             | ORD_CUSTOMER_IX |   105 |
|* 18 |    FILTER                         |                 |       |
|  19 |     SORT GROUP BY NOSORT          |                 |     3 |
|  20 |      TABLE ACCESS BY INDEX ROWID  | ORDERS          |   105 |
|  21 |       INDEX FULL SCAN             | ORD_CUSTOMER_IX |   105 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(NVL(SUM("O"."ORDER_TOTAL"),0)=0)
   7 - access("C"."CUSTOMER_ID"="O"."CUSTOMER_ID")
       filter("C"."CUSTOMER_ID"="O"."CUSTOMER_ID")
   9 - access("O"."CUSTOMER_ID">0)
  10 - filter((NVL(SUM("O"."ORDER_TOTAL"),0)>0 AND
              NVL(SUM("O"."ORDER_TOTAL"),0)<100000))
  14 - filter((NVL(SUM("O"."ORDER_TOTAL"),0)>=100000 AND
              NVL(SUM("O"."ORDER_TOTAL"),0)<=500000))
  18 - filter(NVL(SUM("O"."ORDER_TOTAL"),0)>500000)

-- Rewritten SQL using CASE
select c.customer_id, nvl(sum(o.order_total),0) as order_total,
case when nvl(sum(o.order_total),0) = 0 then 0
     when nvl(sum(o.order_total),0) < 100000 then .1
     when nvl(sum(o.order_total),0) between 100000 and 500000 then .15
     when nvl(sum(o.order_total),0) > 500000 then .2
     else 0 end
as disc_rate
from oe.customers c, oe.orders o
where c.customer_id = o.customer_id(+)
group by c.customer_id
;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------

Plan hash value: 3685486572

------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |       |
|   1 |  SORT GROUP BY NOSORT          |                 |    47 |
|   2 |   MERGE JOIN OUTER             |                 |   377 |
|   3 |    INDEX FULL SCAN             | CUSTOMERS_PK    |   319 |
|*  4 |    SORT JOIN                   |                 |   105 |
|   5 |     TABLE ACCESS BY INDEX ROWID| ORDERS          |   105 |
|*  6 |      INDEX RANGE SCAN          | ORD_CUSTOMER_IX |   105 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("C"."CUSTOMER_ID"="O"."CUSTOMER_ID")
       filter("C"."CUSTOMER_ID"="O"."CUSTOMER_ID")
   6 - access("O"."CUSTOMER_ID">0)

Notice how much simpler the SQL becomes when using CASE. Not only is it simpler, it is also more efficient: the two tables, orders and customers, have to be accessed only once instead of once per condition. Keep in mind that CASE is a great alternative when you need different sets of data from the same tables.

Using NVL, NVL2, and COALESCE

The functions NVL, NVL2, and COALESCE all deal specifically with nulls. Although you could use DECODE or CASE to formulate null comparison logic, these functions are specifically available to provide special, simple treatment for nulls. Consider the syntax for each:

NVL(expr1, expr2)
If expr1 is null, then NVL returns expr2.
If expr1 is not null, then NVL returns expr1.

NVL2(expr1, expr2, expr3)
If expr1 is null, then NVL2 returns expr3.
If expr1 is not null, then NVL2 returns expr2.

COALESCE(expr[,expr]...)
Returns the first non-null expr in the expression list.

Now take a look at the examples of their use in Listing 18-6.

Listing 18-6.  NVL, NVL2, COALESCE, CASE, and DECODE Examples

SQL>select nvl(comm,0)
  2  from scott.emp
  3  where comm is null
  4  and rownum = 1;

NVL(COMM,0)
-----------
          0

1 row selected.

SQL>select nvl2(comm,comm,0)
  2  from scott.emp
  3  where comm is null
  4  and rownum = 1 ;

NVL2(COMM,COMM,0)
-----------------
                0

1 row selected.

SQL>select coalesce(comm,0)
  2  from scott.emp
  3  where comm is null
  4  and rownum = 1 ;

COALESCE(COMM,0)
----------------
               0

1 row selected.

SQL>select case when comm is null then 0
  2             else comm end comm
  3  from scott.emp
  4  where comm is null
  5  and rownum = 1 ;

      COMM
----------
         0

1 row selected.

SQL>select decode(comm,null,0,comm) comm
  2  from scott.emp
  3  where comm is null
  4  and rownum = 1 ;

      COMM
----------
         0

1 row selected.

In each of the examples, the answer is the same. The decision regarding which one to use is really just a matter of personal preference. When the comparison is truly intended to work with nulls, I prefer to avoid CASE and DECODE just because they require a bit more typing. Plus, by using NVL, NVL2, or COALESCE, the intent to deal specifically with nulls is clear. Listing 18-7 shows one of my favorite ways to use NVL in scripts when I want to have a dynamic WHERE clause.

Listing 18-7.  Using NVL for a Dynamic WHERE Clause

SQL>select sql_id, child_number, plan_hash_value plan_hash, executions execs,
  2  (elapsed_time/1000000)/decode(nvl(executions,0),0,1,executions) avg_etime,
  3  buffer_gets/decode(nvl(executions,0),0,1,executions) avg_lio,
  4  sql_text
  5  from v$sql s
  6  where upper(sql_text) like upper(nvl('&sql_text',sql_text))
  7  and sql_id like nvl('&sql_id',sql_id)
  8  order by 1, 2, 3
  9  /
Enter value for sql_text: %karen%
Enter value for sql_id:

no rows selected

The very simple technique shown here allows me to enter either value as input. Or, I can leave both values empty; the resulting null causes each column to be compared with itself, so all rows match. I discussed similar techniques in Chapter 5, if you want to refer back for more detail. As I said, this is a quick way to write a single SQL statement that covers several inputs. However, I caution you about using this technique when you’re concerned about performance. It can make it hard for the optimizer to produce the best plan for every combination of inputs. Be careful and test thoroughly!

Using NULLIF

If you’ve ever had to write a SQL statement that needed to include expressions that could result in a “divide by zero” error, NULLIF should be your friend. The syntax for NULLIF is as follows:

NULLIF(expr1,expr2)
If expr1 = expr2, NULLIF returns null.
If expr1 <> expr2, NULLIF returns expr1.

Typically, this function is used for numeric comparisons. You may use nonnumeric datatypes as well, as long as both expressions have the same datatype. As I mentioned, one of the reasons I love NULLIF is that it makes it easy to avoid the “divide by zero” error, as shown in Listing 18-8.

Listing 18-8.  Using NULLIF to Avoid the “Divide by Zero” Error

SQL> select sql_id, child_number, plan_hash_value plan_hash, executions execs,
  2  (elapsed_time/1000000)/nullif(executions,0) avg_etime,
  3  buffer_gets/nullif(executions,0) avg_lio,
  4  sql_text
  5  from v$sql s
  6  where upper(sql_text) like upper(nvl('&sql_text',sql_text))
  7  and sql_id like nvl('&sql_id',sql_id)
  8  order by 1, 2, 3
  9  /
Enter value for sql_text:
Enter value for sql_id: c7ngvkwthdaak

SQL_ID        CHILD_NUMBER  PLAN_HASH EXECS  AVG_ETIME AVG_LIO SQL_TEXT
------------- ------------ ---------- ----- ---------- ------- -----------
c7ngvkwthdaak            0  729697565     1    .012356      13 select cust

1 row selected.

Did you notice that the SQL statement I used here is the same one I used in Listing 18-7 with a combination of DECODE and NVL? This syntax yields the same result but is much simpler and easier to read.

Conditional logic constructs provide a level of flexibility similar to what you’d find in PL/SQL or other programming languages. However, the ability to include conditional processing directly in a SQL statement allows us to exploit fully the power of set-based data access. Several options can be used interchangeably, and which to use is mainly a matter of personal preference. However, I encourage you to review options you’ve never used before to see whether they help you write simple, easy-to-understand code.

PIVOT/UNPIVOT Queries

Pivoting is a common technique that allows you to write cross-tabulation (also called crosstab, matrix, or transposed) queries that rotate rows of data into aggregated columns of data. Data may also be “unpivoted” (in other words, rotated from columns into rows) using similar techniques, but unpivoting is not the exact reverse of pivoting: because pivoting creates aggregates, unpivoting cannot undo those aggregations to recover the original detail rows.

Using PIVOT

Prior to Oracle 11g, pivoting data required bulky and tedious SQL. Beginning with Oracle 11g, however, the PIVOT SQL function provides a much more elegant solution. Listing 18-9 shows a simple example of a pivot query written the pre-11g way and then using the PIVOT function.

Listing 18-9.  Comparing Old Pivot Query Formulation with the PIVOT Function

-- Old way

SQL>select *
  2  from (select job, sum(decode(deptno,10,sal)) dept10,
  3                    sum(decode(deptno,20,sal)) dept20,
  4                    sum(decode(deptno,30,sal)) dept30,
  5                    sum(decode(deptno,40,sal)) dept40
  6          from scott.emp
  7         group by job)
  8  order by job ;

JOB           DEPT10     DEPT20     DEPT30     DEPT40
--------- ---------- ---------- ---------- ----------
ANALYST                    6000
CLERK           1300       1900        950
MANAGER         2450       2975       2850
PRESIDENT       5000
SALESMAN                              5600

5 rows selected.

-- New way using PIVOT function

SQL>select * from
  2   (select job, deptno, sum(sal) sal from scott.emp group by job, deptno)
  3     PIVOT ( sum(sal) FOR deptno IN (10, 20, 30, 40) )
  4  order by job;

JOB               10         20         30         40
--------- ---------- ---------- ---------- ----------
ANALYST                    6000
CLERK           1300       1900        950
MANAGER         2450       2975       2850
PRESIDENT       5000
SALESMAN                              5600

5 rows selected.

As shown in the listing, pivoting data is the process of taking multiple rows, aggregating them, and transposing them into columns, with each column representing a different range of aggregate data. And, as you can see, using the PIVOT function is much simpler than the old way. The syntax of a PIVOT statement is as follows:

SELECT ...
FROM    ...
PIVOT [XML]          <-- note the placement between the FROM and WHERE clauses
        ( pivot_clause
          pivot_for_clause
          pivot_in_clause )
WHERE   ...

The three clauses used with the PIVOT keyword are the following:

  1. pivot_clause: defines the columns to be aggregated
  2. pivot_for_clause: defines the columns to be grouped and pivoted
  3. pivot_in_clause: defines the range of values to which to limit the results; the resulting aggregations for each value are transposed into a separate column

Using the PIVOT query from Listing 18-9 as an example, the clauses are as follows:

select * from
  (select job, deptno, sum(sal) sal
     from scott.emp group by job, deptno)
          PIVOT ( sum(sal)             <-- pivot_clause
           FOR deptno                  <-- pivot_for_clause
           IN (10, 20, 30, 40) )       <-- pivot_in_clause
order by job;

In this query, the aggregate salary totals by department have been transposed into columns. Our example used a single aggregation on the sal column, but you can specify multiple columns if desired. Another thing to keep in mind is that pivot operations perform an implicit GROUP BY using any columns not in the pivot_clause. In our example, we used an inline view to group by the data first, but this actually was not necessary because of the implicit GROUP BY, as shown in Listing 18-10.

Listing 18-10.  Implicit GROUP BY When Using the PIVOT Function

SQL>select * from
  2  (select job, deptno, sal from scott.emp)
  3   pivot (sum(sal) for deptno in (10,20,30,40)) ;

JOB               10         20         30         40
--------- ---------- ---------- ---------- ----------
CLERK           1300       1900        950
SALESMAN                              5600
PRESIDENT       5000
MANAGER         2450       2975       2850
ANALYST                    6000

5 rows selected.

The data are pivoted on the deptno aggregations only, and the range of values is limited to the hardcoded pivot_in_clause list. As you can see, the column names match the values in the pivot_in_clause. This is the default behavior, and it can be changed by using column aliases if desired, as shown in Listing 18-11.

Listing 18-11.  Using Aliases with the PIVOT Function

SQL>select * from
  2  (select job, deptno, sal from scott.emp)
  3  pivot (sum(sal) as sum_sal
  4  for deptno
  5  in (10 as dept10,
  6      20 as dept20,
  7      30 as dept30,
  8      40 as dept40)) ;

JOB       DEPT10_SUM_SAL DEPT20_SUM_SAL DEPT30_SUM_SAL DEPT40_SUM_SAL
--------- -------------- -------------- -------------- --------------
CLERK               1300           1900            950
SALESMAN                                          5600
PRESIDENT           5000
MANAGER             2450           2975           2850
ANALYST                            6000

5 rows selected.

Notice how Oracle concatenates the aliases to form the column names. This means if we don’t alias the values in the pivot_in_clause, the column names are a concatenation of the list of values and our aggregated column alias (for example, 10_SUM_SAL).
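For example, if we drop the aliases from the pivot_in_clause of Listing 18-11, the generated names should follow that pattern. This is a sketch; the column names shown are what I would expect, not output reproduced from a live run:

```sql
select * from
 (select job, deptno, sal from scott.emp)
 pivot (sum(sal) as sum_sal
 for deptno in (10, 20, 30, 40)) ;

-- The result columns should come back named "10_SUM_SAL", "20_SUM_SAL",
-- "30_SUM_SAL", and "40_SUM_SAL". Note that such names must be enclosed
-- in double quotation marks if referenced, because they begin with a digit.
```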

You can also use multiple aggregates in the pivot_clause, but be careful: each additional aggregate multiplies the number of result columns (one column per aggregate for each value in the pivot_in_clause). Listing 18-12 shows the addition of one new aggregate and a limiting WHERE clause as well.

Listing 18-12.  Using Multiple Aggregates for the PIVOT Function

SQL> select * from
  2  (select job, deptno, sal from scott.emp)
  3  pivot (sum(sal) as sum, count(sal) as ct
  4  for deptno
  5  in (10 as dept10,
  6      20 as dept20,
  7      30 as dept30))
  8  where job = 'MANAGER';

JOB       DEPT10_SUM  DEPT10_CT DEPT20_SUM  DEPT20_CT DEPT30_SUM  DEPT30_CT
--------- ---------- ---------- ---------- ---------- ---------- ----------
MANAGER         2450          1       2975          1       2850          1

1 row selected.

There are a few things to be aware of when using PIVOT:

  • Any columns referenced only in the pivot_clause cannot be used in the SELECT column list.
  • Any columns referenced only in the pivot_for_clause cannot be used in the SELECT column list.
  • All columns in the pivot_clause must be aggregate functions.

Last, whenever you use a PIVOT function, it is typically reflected in the execution plan, as shown in Listing 18-13.

Listing 18-13.  Execution Plan for the PIVOT Function

---------------------------------------------------
| Id  | Operation                  | Name | Rows  |
---------------------------------------------------
|   0 | SELECT STATEMENT           |      |       |
|   1 |  SORT GROUP BY NOSORT PIVOT|      |     1 |
|*  2 |   TABLE ACCESS FULL        | EMP  |     3 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("JOB"='MANAGER')

Depending on how we limit our pivot data, it is possible for the optimizer not to use the PIVOT operation. When this happens, the optimizer determines that it is more effective to do a normal AGGREGATE operation instead. However, most of the time, the PIVOT operation is chosen as the most effective.

One thing you may have noticed is that the examples have used a known list of values for the pivot_in_clause. Unfortunately, there isn’t support for dynamic lists of values. Listing 18-14 shows the resulting error message if you try to use a subquery to provide a list of values.

Listing 18-14.  Dynamic pivot_in_clause Is Not Allowed

SQL>select * from scott.emp
  2  pivot (sum(sal)
  3  for deptno in (select deptno from scott.dept));
for deptno in (select deptno from scott.dept))
               *
ERROR at line 3:
ORA-00936: missing expression

But never fear! There is a remedy! When I listed the PIVOT syntax earlier in this section, did you notice the optional XML keyword? If you add the XML keyword, the generated pivot set provides the results in XML format. The root element is <PivotSet>, and each value is identified by a name–value pair, as shown in Listing 18-15.

Listing 18-15.  Using the XML Option

SQL>select *
  2  from (select job, deptno, sal from scott.emp)
  3   pivot XML
  4  (sum(sal)
  5   for deptno in (ANY));

JOB       DEPTNO_XML
--------- --------------------------------------------------
ANALYST   <PivotSet><item><column name =
          "DEPTNO">20</column><column name =
          "SUM(SAL)">6000</column></item></PivotSet>

CLERK     <PivotSet><item><column name =
          "DEPTNO">10</column><column name =
          "SUM(SAL)">1300</column></item><item><column name
          = "DEPTNO">20</column><column name =
          "SUM(SAL)">1900</column></item><item><column name
          = "DEPTNO">30</column><column name =
          "SUM(SAL)">950</column></item></PivotSet>

MANAGER   <PivotSet><item><column name =
          "DEPTNO">10</column><column name =
          "SUM(SAL)">2450</column></item><item><column name
          = "DEPTNO">20</column><column name =
          "SUM(SAL)">2975</column></item><item><column name
          = "DEPTNO">30</column><column name =
          "SUM(SAL)">2850</column></item></PivotSet>

PRESIDENT <PivotSet><item><column name =
          "DEPTNO">10</column><column name =
          "SUM(SAL)">5000</column></item></PivotSet>

SALESMAN  <PivotSet><item><column name =
          "DEPTNO">30</column><column name =
          "SUM(SAL)">5600</column></item></PivotSet>

 
5 rows selected.

The resulting data can now be manipulated with XPath or XQuery expressions. You can also use any subquery you desire in place of the ANY keyword. If you use a subquery, make sure the result set is unique; otherwise, an error occurs. Another thing to keep in mind about generating XML elements is that quite a bit of data above and beyond the results themselves are generated. Consider your requirements carefully, and weigh the overhead of this method against how you’ll process the XML results to make sure your performance doesn’t suffer.
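If you do need to get relational rows back out of a PivotSet, one option is the XMLTABLE function. The following is only a sketch under an assumption: that the aggregate alias SAL_SUM surfaces as the column name attribute in the generated XML (without an alias, Listing 18-15 shows it appears as "SUM(SAL)"). Adjust the XQuery paths to match the XML you actually get:

```sql
select p.job, x.deptno, x.sal_sum
from (select * from
       (select job, deptno, sal from scott.emp)
       pivot xml (sum(sal) as sal_sum for deptno in (any))) p,
     xmltable('/PivotSet/item'
              passing p.deptno_xml
              columns deptno  number path 'column[@name="DEPTNO"]',
                      sal_sum number path 'column[@name="SAL_SUM"]') x;
```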

Using UNPIVOT

The UNPIVOT function also appeared with PIVOT in Oracle 11g. To unpivot data means to rotate data from columns back to rows. As mentioned earlier, if you PIVOT data, you can’t simply UNPIVOT the pivoted dataset and expect it to return the output as it was prior to pivoting. UNPIVOT has no capability to convert aggregates back to their raw data elements. The syntax of an UNPIVOT statement is as follows:

SELECT ...
FROM   ...
UNPIVOT [INCLUDE|EXCLUDE NULLS]
        ( unpivot_clause
          unpivot_for_clause
          unpivot_in_clause )
WHERE   ...

The three clauses used with the UNPIVOT keyword are the following:

  1. unpivot_clause: defines the name of the column that holds the unpivoted measure values
  2. unpivot_for_clause: defines the name of the column that holds the names of the columns being unpivoted
  3. unpivot_in_clause: defines the list of pivoted columns (not values) to be unpivoted

To demonstrate how UNPIVOT works, let’s start by creating a PIVOTed set of data using our example query, as shown in Listing 18-16.

Listing 18-16.  Creating a Table with PIVOTed Data  

SQL>create table pivot_tab as
  2  select * from
  3   (select job, deptno, sal from scott.emp)
  4   pivot (sum(sal) as sum_sal
  5   for deptno
  6   in (10 as dept10,
  7       20 as dept20,
  8       30 as dept30,
  9       40 as dept40)) ;

Table created.

SQL>select * from pivot_tab ;

JOB       DEPT10_SUM_SAL DEPT20_SUM_SAL DEPT30_SUM_SAL DEPT40_SUM_SAL
--------- -------------- -------------- -------------- --------------
CLERK               1300           1900            950
SALESMAN                                          5600
PRESIDENT           5000
MANAGER             2450           2975           2850
ANALYST                            6000

5 rows selected.

Now, let’s unpivot our dataset, as shown in Listing 18-17.

Listing 18-17.  Using UNPIVOT

SQL>select * from pivot_tab
  2  unpivot ( sal_amt
  3  for deptsal_desc
  4  in (dept10_sum_sal, dept20_sum_sal, dept30_sum_sal, dept40_sum_sal)) ;

JOB       DEPTSAL_DESC      SAL_AMT
--------- -------------- ----------
CLERK     DEPT10_SUM_SAL       1300
CLERK     DEPT20_SUM_SAL       1900
CLERK     DEPT30_SUM_SAL        950
SALESMAN  DEPT30_SUM_SAL       5600
PRESIDENT DEPT10_SUM_SAL       5000
MANAGER   DEPT10_SUM_SAL       2450
MANAGER   DEPT20_SUM_SAL       2975
MANAGER   DEPT30_SUM_SAL       2850
ANALYST   DEPT20_SUM_SAL       6000

9 rows selected.

Note that the list of column names from our pivoted table provides the values for the unpivot_for_clause column named deptsal_desc. The column values themselves populate the sal_amt column, as indicated in the unpivot_clause. Although we used our example table of pivoted data, we can unpivot the columns of any table or view.

It is useful to alias the columns in the unpivot_in_clause because it allows us to change the descriptive data to something different from the original column name. Unlike the PIVOT function, the column or columns referred to in the unpivot_in_clause are the only ones that can be aliased. This makes sense, because the unpivot_clause (sal_amt) is the name we wish to use and it doesn’t need further aliasing. Listing 18-18 shows an example of using an alias with a bit of creativity to reverse engineer back to deptno.

Listing 18-18.  Using Aliases with UNPIVOT  

SQL> select * from pivot_tab
  2  unpivot ( sal_amt
  3  for deptno
  4  in (dept10_sum_sal as '10',
  5      dept20_sum_sal as '20',
  6      dept30_sum_sal as '30',
  7      dept40_sum_sal as '40')) ;

JOB       DEPTNO    SAL_AMT
--------- ------ ----------
CLERK     10           1300
CLERK     20           1900
CLERK     30            950
SALESMAN  30           5600
PRESIDENT 10           5000
MANAGER   10           2450
MANAGER   20           2975
MANAGER   30           2850
ANALYST   20           6000

9 rows selected.

Similar to the PIVOT function, the optimizer identifies the presence of the UNPIVOT function in the execution plan using an UNPIVOT operation, as shown in Listing 18-19.

Listing 18-19.  Execution Plan for the UNPIVOT Function

-------------------------------------------------
| Id  | Operation           | Name      | Rows  |
-------------------------------------------------
|   0 | SELECT STATEMENT    |           |       |
|*  1 |  VIEW               |           |    20 |
|   2 |   UNPIVOT           |           |       |
|   3 |    TABLE ACCESS FULL| PIVOT_TAB |     5 |
-------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("unpivot_view_007"."SAL_AMT" IS NOT NULL)

Note the presence of the VIEW operation that uses a filter predicate to remove the null values for sal_amt. This is the default behavior, resulting from the EXCLUDE NULLS clause. If we specify INCLUDE NULLS instead, the filter is removed and rows with null values are returned as well.
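For instance, rerunning the query from Listing 18-18 with INCLUDE NULLS should return all 20 rows (5 jobs times 4 department columns), with a null sal_amt wherever no salary exists. This sketch has not been run against the book’s data:

```sql
select * from pivot_tab
unpivot include nulls ( sal_amt
for deptno
in (dept10_sum_sal as '10',
    dept20_sum_sal as '20',
    dept30_sum_sal as '30',
    dept40_sum_sal as '40')) ;
```

This also explains the row estimate of 20 at the VIEW step in Listing 18-19: it presumably reflects the 5 × 4 candidate rows before the null filter trims them down to the 9 non-null rows actually returned.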

Another nifty use of UNPIVOT, related to unpivoting a previously pivoted dataset, is to rotate particularly wide output so that it reads down the page instead of across, as shown in Listing 18-20.

Listing 18-20.  Using the UNPIVOT Function

SQL>select column_name, column_value from
  2  (select to_char(empno) as empno, ename, job, to_char(mgr) mgr,
  3   to_char(hiredate,'mm/dd/yyyy') hiredate, to_char(sal) sal,
  4   to_char(comm) comm, to_char(deptno) deptno
  5  from scott.emp where rownum = 1)
  6  unpivot (column_value
  7  for column_name
  8  in (empno, ename, job, mgr, hiredate, sal, comm, deptno));

COLUMN_N COLUMN_VALUE
-------- ----------------------------------------
EMPNO    7369
ENAME    SMITH
JOB      CLERK
MGR      7902
HIREDATE 12/17/1980
SAL      800
DEPTNO   20

7 rows selected.

In this example, note how the results are now displayed down the page instead of across. Also note that all the columns have been converted to string datatypes in the inline view. This is because of a restriction on how an UNPIVOT query is processed: unless all the columns have the same datatype, the query fails with “ORA-01790: expression must have same datatype as corresponding expression.” This restriction certainly makes things tedious but, with a little effort, we could write a dynamic SQL wrapper for this type of query and use it to unpivot data from any table. I leave that exercise to you for additional practice.
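As a starting point for that exercise, here is one possible sketch (the function name and approach are my own, not a built-in): it reads the column list from the user_tab_columns dictionary view, wraps each column in to_char to satisfy the datatype restriction, and returns the finished unpivot query text.

create or replace function unpivot_query (p_table in varchar2)
  return varchar2
is
  l_cols  varchar2(4000);  -- column list for the unpivot_in_clause
  l_exprs varchar2(4000);  -- to_char() select list for the inline view
begin
  for r in (select column_name
              from user_tab_columns
             where table_name = upper(p_table)
             order by column_id)
  loop
    l_cols  := l_cols  || case when l_cols is not null then ', ' end
                       || r.column_name;
    l_exprs := l_exprs || case when l_exprs is not null then ', ' end
                       || 'to_char(' || r.column_name || ') as ' || r.column_name;
  end loop;
  return 'select column_name, column_value from (select ' || l_exprs ||
         ' from ' || p_table || ') ' ||
         'unpivot (column_value for column_name in (' || l_cols || '))';
end unpivot_query;
/

The returned text could then be run with EXECUTE IMMEDIATE or pasted into SQL*Plus; a production version would also need to handle quoted identifiers and explicit formats for DATE columns.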

SQL to Generate Test Data

There are times when you may find it difficult to performance test your SQL because of a lack of appropriate test data. This could be because your application is new and lacks real data that can be copied for development use, or because your development environment loads only partial datasets and not a full copy of production. Or, maybe you just want to test how a particular bit of SQL works and need only dummy data. Using some very simple SQL constructs, you can generate random test data quite easily.

What to Watch Out For

Before you start generating any data, make sure you know what to watch out for. The main thing to keep in mind is to keep the overhead required to create your data to a minimum. One of the most frequently used techniques can be a real resource hog if you’re not careful: using an actual “big” table as a driver to create your test data, as shown in Listing 18-21.

Listing 18-21.  Generating Rows Using a “Big” Table

SQL>create table from_big as
  2  select * from dba_source ;

Table created.

Elapsed: 00:00:10.40

SQL>select count(*) from from_big ;

  COUNT(*)
----------
    403941

1 row selected.

In this case, we get a good-size set of data, but it took more than ten seconds and had to read from a system view. This isn’t too bad if you plan on generating only a single copy of the data. But, what happens when you want to create ten times as much? There are several other techniques to consider that produce data without requiring you to query preexisting tables:

  • CONNECT BY clause
  • MODEL clause
  • WITH clause
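For completeness, if you do stay with the big-table technique and want ten times the data, one sketch (assuming the from_big table created in Listing 18-21) is a Cartesian join against a small row generator:

-- Multiply the existing rows tenfold via a Cartesian join
-- with a ten-row generator; b.* keeps only the original columns.
create table from_big_x10 as
select b.*
from from_big b,
     (select rownum rn from dual connect by rownum <= 10);

Note that this still carries the cost of materializing ten copies of the source data, which is why the generator techniques that follow are usually preferable.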

Using CONNECT BY

You can use CONNECT BY to generate data using FAST DUAL as the source. Listing 18-22 shows how to use this method most efficiently.

Listing 18-22.  Generating Rows Using CONNECT BY

SQL>create table conn_by as
  2  select count(*) ct from
  3  (select rownum rn from
  4   (select rownum rn from dual connect by rownum <= 1000) t1,
  5   (select rownum rn from dual connect by rownum <= 1000) t2,
  6   (select rownum rn from dual connect by rownum <= 1000) t3
  7   where rownum <= 1000000);

Table created.

Elapsed: 00:00:03.15

SQL> @dcplan
Enter value for sql_id: 84s1cjrn9sumf
Enter value for child_no: 0
Enter value for format: BASIC

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
create table conn_by

Plan hash value: 2015107946

-------------------------------------------------------
| Id  | Operation                              | Name |
-------------------------------------------------------
|   0 | CREATE TABLE STATEMENT                 |      |
|   1 |  LOAD AS SELECT                        |      |
|   2 |   OPTIMIZER STATISTICS GATHERING       |      |
|   3 |    SORT AGGREGATE                      |      |
|   4 |     VIEW                               |      |
|   5 |      COUNT STOPKEY                     |      |
|   6 |       MERGE JOIN CARTESIAN             |      |
|   7 |        MERGE JOIN CARTESIAN            |      |
|   8 |         VIEW                           |      |
|   9 |          COUNT                         |      |
|  10 |           CONNECT BY WITHOUT FILTERING |      |
|  11 |            FAST DUAL                   |      |
|  12 |         BUFFER SORT                    |      |
|  13 |          VIEW                          |      |
|  14 |           COUNT                        |      |
|  15 |            CONNECT BY WITHOUT FILTERING|      |
|  16 |             FAST DUAL                  |      |
|  17 |        BUFFER SORT                     |      |
|  18 |         VIEW                           |      |
|  19 |          COUNT                         |      |
|  20 |           CONNECT BY WITHOUT FILTERING |      |
|  21 |            FAST DUAL                   |      |
-------------------------------------------------------

In this example, it took just over three seconds to generate a one million-row table. You may be wondering why there are three inline views being Cartesian joined to create the data instead of just a single CONNECT BY ROWNUM <= 1000000. This method is used to prevent the process from running out of private memory, as shown in Listing 18-23.

Listing 18-23.  Memory Error When Using CONNECT BY

SQL>select count(*) from
  2  (select rownum rn from dual connect by rownum <= 10000000) ;
    (select rownum rn from dual connect by rownum <= 10000000)
                       *
ERROR at line 2:
ORA-30009: Not enough memory for CONNECT BY operation

As shown, I used a larger number of rows (ten million instead of one million), but this simple statement can consume enough UGA/PGA to error out from lack of memory. CONNECT BY is recursive in nature and therefore consumes more memory as you increase the number of rows you ask for in a single shot. So, to create large amounts of data, you simply have to use CONNECT BY with a lower recursion depth. Writing the statement to use MERGE JOIN CARTESIAN does the trick!

Personally, I prefer a method that combines CONNECT BY with the WITH clause, although it doesn’t use a recursive WITH (discussed later). Instead, it simply uses the WITH clause to avoid repeating the inline query shown in Listing 18-22. Listing 18-24 demonstrates how to combine the WITH and CONNECT BY clauses to formulate an easy-to-read, understandable, and efficient data generator.

Listing 18-24.  Using WITH and CONNECT BY

SQL>create table with_conn_by as
  2  with data_gen as
  3  (
  4  select /*+ materialize */ rownum rid
  5  from dual
  6  connect by level < 1000
  7  )
  8  select mod(rownum-1, 1000) as num_mod1,
  9    trunc(dbms_random.value(0,1000)) rand_val1,
 10    trunc(dbms_random.value(0,1000)) rand_val2,
 11    rpad('z',100) pad_data
 12  from data_gen t1, data_gen t2, data_gen t3
 13  where rownum <= 1000000 ;

Table created.

Elapsed: 00:00:32.97

Notice that this example also demonstrates how to populate additional columns with various types of data. Using variations of this example, you can create any quantity and type of data you wish.

Using the MODEL Clause

We discussed the MODEL clause in great detail in Chapter 9. Although it may seem a bit intimidating at first, the MODEL clause is quite powerful, and it is particularly useful (and simple) for generating dummy data, as shown in Listing 18-25.

Listing 18-25.  Generating Rows Using the MODEL Clause

SQL>create table model_tab as (
  2   select *
  3   from dual
  4   model
  5     dimension by (0 d)
  6     measures (0 rnum)
  7     rules iterate (1000000)
  8     (rnum[ITERATION_NUMBER] = ITERATION_NUMBER+1)
  9   )  ;

Table created.

Elapsed: 00:05:09.26

The MODEL clause method took significantly longer than the CONNECT BY method to produce the same one million-row dataset (more than five minutes vs. a few seconds!). This time difference results from the way the two constructs use memory. CONNECT BY doesn’t use work-area memory, so normal PGA size restrictions such as those imposed by PGA_AGGREGATE_TARGET don’t apply to it. The MODEL clause, however, does use work-area memory in the traditional way, and if there is too much information to retain in the private memory allocation, it spills over to disk (TEMP). For this reason, the MODEL clause is more limiting in terms of the time and resources required. I have used this method only a handful of times, and I find it most useful when I want to create a limited amount of data with numerous columns, using spreadsheet-like formulas to populate them.
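As a sketch of that multi-column use case (the column names and formulas here are my own invention), each measure can be populated by its own rule, and one measure can be derived from others much like spreadsheet cells:

-- A smaller iteration count with several measures;
-- amt is computed from qty and price, spreadsheet style.
create table model_multi as (
  select *
  from dual
  model
    dimension by (0 d)
    measures (0 id, 0 qty, 0 price, 0 amt)
    rules iterate (1000)
    ( id[ITERATION_NUMBER]    = ITERATION_NUMBER + 1,
      qty[ITERATION_NUMBER]   = mod(ITERATION_NUMBER, 10) + 1,
      price[ITERATION_NUMBER] = trunc(dbms_random.value(1, 100)),
      amt[ITERATION_NUMBER]   = qty[ITERATION_NUMBER] * price[ITERATION_NUMBER]
    )
) ;

Because the row count here is modest, the work-area memory concerns noted earlier don’t come into play.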

Using the Recursive WITH Clause

Subfactored queries were discussed in Chapter 10. Listing 18-26 shows an example of using a recursive WITH clause to generate test data.

Listing 18-26.  Generating Rows Using the Recursive WITH Clause

SQL>create table with_tab as
  2  with data_gen(rn)
  3  as
  4  (select 1 rn from dual
  5   union all
  6   select rn+1 from data_gen
  7   where rn < 1000000 )
  8  select * from data_gen;

Table created.

Elapsed: 00:00:26.43

Once again, the time it takes with a recursive WITH clause is significantly greater than with the CONNECT BY construct, but it is also significantly less than with the MODEL clause. Always keep in mind that the more rows you wish to create, the longer it takes and the more resources (particularly CPU) it uses; this is especially true if you are creating random data (similar to what I showed in Listing 18-24) rather than just a one-column table, as shown in most of the examples.

Data Generator Wrap-up

In this section, I showed you a few of the more common ways to generate data. There are others, including a pipelined PL/SQL function, that can be used as well. The construct you choose certainly makes a difference with regard to the time it takes to generate the amount and type of data you want, as well as the database resources used. Always keep in mind that both your time and your database’s resources are valuable. Don’t waste either one!

Summary

The SQL language offers many constructs to help you build and extract data from your database. Many constructs have similar functionality and serve the same purpose, such as NVL, NVL2, and COALESCE. Others, such as PIVOT and UNPIVOT, help you manipulate data to report it more understandably. Using the examples in this chapter as a guide, I urge you to continue to explore out-of-the-box uses for the SQL language that help you do your job more proficiently.
