CHAPTER 13

Beyond the SELECT

This chapter is a collection of topics involving SQL statements that are not straight SELECTs. These statements are often referred to as Data Manipulation Language (or DML) statements. In this chapter, I cover some of the less well-known options to the standard DML commands—namely, INSERT, UPDATE, DELETE, and MERGE. I also focus on alternate approaches, with an eye toward improving performance.

INSERT

INSERT is the primary command used in the SQL language to load data. If you are reading this book, you probably already have a pretty good handle on the INSERT command. In this section, I talk about some of the less often used options, some of which I rarely, if ever, see in the wild. I believe this is because of a lack of familiarity more than a lack of functionality.

There are two basic methods that Oracle uses for inserts. For simplicity, let’s call them the slow way and the fast way. The slow way is usually called conventional. With this mechanism, the data go through the buffer cache, empty space in existing blocks is reused, undo is generated for all data and metadata changes, and redo is generated for all changes by default. This is a lot of work, which is why I call it the slow way. The fast way is also called direct path. This method does not look for space in existing blocks; it just starts inserting data above the high-water mark. The fast way protects the data dictionary with undo and redo for metadata changes, but it generates no undo for data changes. It can also avoid redo generation for data changes in some cases, such as nologging operations. Keep in mind that, by default, indexes on tables loaded with direct path inserts still generate both undo and redo.

Direct Path Inserts

Direct path inserts can be invoked by using the APPEND hint (parallel inserts do this by default, by the way). In Oracle Database 11g Release 2, the APPEND_VALUES hint was added and can be used for inserts that specify a VALUES clause as opposed to using SELECT to provide the values for inserting. Listing 13-1 shows a simple example of both forms.

Listing 13-1.  Simple Insert APPEND and APPEND_VALUES

insert /*+ append */ into big_emp select * from hr.employees;

insert /*+ append_values */ into dual (dummy) values ('Y');
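
If redo for the data itself is not needed (and you accept that the loaded rows cannot be recovered from the redo stream until the next backup), the target table can be placed in NOLOGGING mode before the direct path load. A minimal sketch, reusing the big_emp table from Listing 13-1:

alter table big_emp nologging;
insert /*+ append */ into big_emp select * from hr.employees;
commit;
alter table big_emp logging;   -- restore the default; take a fresh backup afterward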

There are a few issues with the fast approach, however.

  • Only one direct path write can occur on a table at any given time.
  • Data are inserted above the high-water mark, so any available space in the blocks below the high-water mark is not used by direct path inserts.
  • The session that performs the INSERT APPEND can’t do anything with the table (even select from it) after the INSERT until a commit or rollback is issued.
  • Some of the less frequently used data structures (object types, IOTs, and so on) are not supported.
  • Referential constraints are not supported (in other words, they cause the INSERT to be executed using the conventional method).

The first item in the list is the biggest issue. In an OLTP-type system with many small inserts occurring frequently, the direct path mechanism just does not work. The second bulleted item is also a big issue. It makes no sense for small inserts to be applied in empty blocks above the high-water mark. This results in a huge waste of space. In fact, in Oracle Database 11g, the behavior of the APPEND hint was modified to allow it to be used in INSERT statements using the VALUES clause (prior to 11g, it was ignored unless the insert statement had a SELECT clause). This behavior change resulted in a bug being logged because it was using so much space for small inserts. The eventual resolution was to return the APPEND hint to its original behavior and introduce the APPEND_VALUES hint in Oracle Database 11gR2. At any rate, note that direct path inserts are designed for large, “bulk” inserts only.

Note also that, as with most hints, the APPEND hint is silently ignored if for any reason it is not possible for Oracle to obey the hint. When this occurs with the APPEND hint, the insert is done using the conventional mechanism. Listing 13-2 shows an example of the APPEND hint being ignored because of a foreign key constraint.

Listing 13-2.  Disabled APPEND Hint

SQL> @constraints
Enter value for owner: KRM
Enter value for table_name: BIG_EMP
Enter value for constraint_type:

TABLE_NAME  CONSTRAINT_NAME      C SEARCH_CONDITION                    STATUS
----------- -------------------- - ---------------------------------- -------
BIG_EMP     BIG_EMP_MANAGER_FK   R                                    ENABLED
BIG_EMP     SYS_C0026608         C "JOB_ID" IS NOT NULL               ENABLED
BIG_EMP     SYS_C0026607         C "HIRE_DATE" IS NOT NULL            ENABLED
BIG_EMP     SYS_C0026606         C "EMAIL" IS NOT NULL                ENABLED
BIG_EMP     SYS_C0026605         C "LAST_NAME" IS NOT NULL            ENABLED

SQL> @mystats
Enter value for name: write direct

NAME                                                     VALUE
------------------------------------------------------- -------
physical writes direct                                   0

SQL>
SQL>
SQL> insert /*+ append */ into big_emp select * from hr.employees;

107 rows created.

SQL> @mystats
Enter value for name: direct

NAME                                                     VALUE
------------------------------------------------------- -------
physical writes direct                                   0

SQL> select count(*) from big_emp;

COUNT(*)
--------
     107

The APPEND hint definitely does not do what it is intended to do in this case. The inserts are not done with direct path writes, as shown by the physical writes direct statistic and by the fact that you can select from the table after the insert. (If the insert had been done with direct path writes, you would have had to issue a commit or rollback before you could select from the table.) Listing 13-3 shows the expected behavior if you disable the foreign key constraint responsible for disabling the APPEND hint.

Listing 13-3.  Disabling Constraint Enables APPEND Hint

SQL> alter table big_emp disable constraint BIG_EMP_MANAGER_FK;

Table altered.

SQL> insert /*+ append */ into big_emp select * from hr.employees;

107 rows created.

SQL> @mystats
Enter value for name: direct

NAME                                                                   VALUE
------------------------------------------------------------ ---------------
physical writes direct                                                     2

SQL> select count(*) from big_emp;
select count(*) from big_emp
                     *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

The direct path method is clearly used in this example, as you can see from the statistics and from the fact that you cannot select from the table without issuing a commit first. By the way, the error message is a bit of a red herring. It says that the object was modified in parallel, which in this case is not true. This is a holdover from an earlier version in which a parallel insert was the only way to do an insert above the high-water mark.
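
The mystats script used in these listings comes from the online code suite; it simply reports session-level statistics whose names match the string you supply. If you don’t have it handy, a rough stand-in is a query along these lines:

select n.name, s.value
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name = 'physical writes direct';

Next I discuss a couple of unusual variants on the INSERT statement.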

Multitable Inserts

The multitable insert is rarely used, even though it has been around since at least version 9i. This construct can be useful for ETL-type processing when data are staged and then rearranged as they are loaded into permanent structures. In these cases, it is fairly common to stage data in a nonnormalized format that is later split into multiple tables or some other more normalized structure. The multitable insert is a convenient way to accomplish this type of work without having to write a bunch of procedural code. The syntax is very straightforward; just use INSERT ALL and then supply multiple INTO clauses.

These clauses can specify the same or different tables. The source of the rows must be a single subquery (you can’t use a simple VALUES clause as the source of a multitable insert), but the individual columns it returns can be reused in several INTO clauses or not used at all. Listing 13-4 shows an example of the syntax inserting into a single table. (Note that the scripts to create the people and denormalized_people tables are provided in the online code suite.)

Listing 13-4.  Basic Multitable Insert into a Single Table

INSERT ALL
  INTO people (person_id, first_name, last_name) -- the parent
    VALUES (person_id, first_name, last_name)
  INTO people (first_name, last_name, parent_id) -- the child
    VALUES (child1, last_name, person_id)
  INTO people (first_name, last_name, parent_id) -- the child
    VALUES (child2, last_name, person_id)
  INTO people (first_name, last_name, parent_id) -- the child
    VALUES (child3, last_name, person_id)
  INTO people (first_name, last_name, parent_id) -- the child
    VALUES (child4, last_name, person_id)
  INTO people (first_name, last_name, parent_id) -- the child
    VALUES (child5, last_name, person_id)
  INTO people (first_name, last_name, parent_id) -- the child
    VALUES (child6, last_name, person_id)
  SELECT person_id, first_name, last_name,
         child1, child2, child3, child4, child5, child6
FROM denormalized_people;

This example shows that multiple INTO clauses can be used, although in this case all the INTO clauses referenced the same table. You can just as easily insert into multiple tables (hence the term multitable insert), as shown in Listing 13-5.

Listing 13-5.  Basic Multitable Insert

INSERT ALL
  INTO parents (person_id, first_name, last_name)
    VALUES (person_id, first_name, last_name)
  INTO children (first_name, last_name, parent_id)
    VALUES (child1, last_name, person_id)
  INTO children (first_name, last_name, parent_id)
    VALUES (child2, last_name, person_id)
  INTO children (first_name, last_name, parent_id)
    VALUES (child3, last_name, person_id)
  INTO children (first_name, last_name, parent_id)
    VALUES (child4, last_name, person_id)
  INTO children (first_name, last_name, parent_id)
    VALUES (child5, last_name, person_id)
  INTO children (first_name, last_name, parent_id)
    VALUES (child6, last_name, person_id)
SELECT person_id, first_name, last_name,
       child1, child2, child3, child4, child5, child6
FROM denormalized_people;

Conditional Insert

The INSERT command also has the ability to do conditional processing. It’s like having a CASE statement embedded in the INSERT statement.

In the previous example, you inserted a record for every child, but most likely some of the child columns are null in this kind of a repeating column layout. So, it would be nice if we could avoid creating these records without having to write procedural code. This is exactly the situation a conditional insert was built for. By the way, this type of data layout is often seen when loading files from external systems. Creating external tables on files is an excellent way to load them, and it allows these less common insert options to be applied directly to the data-loading process, rather than after the files have been staged in an Oracle table. Listing 13-6 shows an example of the conditional insert in which the parent fields are always loaded but the child fields are loaded only if they have data in them.

Listing 13-6.  Conditional Insert

INSERT ALL
WHEN 1=1 THEN -- always insert the parent
  INTO people (person_id, first_name, last_name)
    VALUES (person_id, first_name, last_name)
WHEN child1 is not null THEN -- only insert non-null children
  INTO people (first_name, last_name, parent_id)
    VALUES (child1, last_name, person_id)
WHEN child2 is not null THEN
  INTO people (first_name, last_name, parent_id)
    VALUES (child2, last_name, person_id)
WHEN child3 is not null THEN
  INTO people (first_name, last_name, parent_id)
    VALUES (child3, last_name, person_id)
WHEN child4 is not null THEN
  INTO people (first_name, last_name, parent_id)
    VALUES (child4, last_name, person_id)
WHEN child5 is not null THEN
  INTO people (first_name, last_name, parent_id)
    VALUES (child5, last_name, person_id)
WHEN child6 is not null THEN
  INTO people (first_name, last_name, parent_id)
    VALUES (child6, last_name, person_id)
SELECT person_id, first_name, last_name,
  child1, child2, child3, child4, child5, child6
FROM denormalized_people;
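
As mentioned before Listing 13-6, data in this repeating-column layout often arrives as a flat file, and an external table lets the conditional insert read the file directly. The following is only a sketch—the directory, file name, and field definitions are made up for illustration:

create or replace directory load_dir as '/tmp/load';   -- hypothetical directory

create table denormalized_people_ext (
  person_id   number,
  first_name  varchar2(30),
  last_name   varchar2(30),
  child1      varchar2(30),
  child2      varchar2(30),
  child3      varchar2(30),
  child4      varchar2(30),
  child5      varchar2(30),
  child6      varchar2(30)
)
organization external (
  type oracle_loader
  default directory load_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
    missing field values are null
  )
  location ('denormalized_people.csv')
);

The conditional insert in Listing 13-6 could then select from denormalized_people_ext instead of from a staged table.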

DML Error Logging

And now for something really cool: DML error logging. This feature provides a mechanism for preventing your one million-row insert from failing because a few rows had problems. This feature was introduced in 10gR2 and it’s similar to the SQL*Loader error logging feature. DML error logging basically diverts any records that otherwise cause the statement to fail, placing them in an errors table. This is an extremely useful feature that is rarely used, which is a little surprising because it’s very easy to implement. DML error logging also provides excellent performance and saves a lot of coding. Without this feature, we have to create a bad records table, write procedural code to handle any exceptions raised by any single record, insert the problem records into the bad records table, and preserve the integrity of the transaction by handling the error records in an autonomous transaction, which is a lot of work. By the way, the LOG ERRORS clause works with the other DML statements as well (UPDATE, DELETE, and MERGE).

Here is how to enable DML error logging:

  1. Create the error log table using DBMS_ERRLOG.CREATE_ERROR_LOG.
  2. Specify the LOG ERRORS clause on the INSERT.

That’s it. Listing 13-7 shows how the CREATE_ERROR_LOG procedure works.

Listing 13-7.  CREATE_ERROR_LOG

SQL> EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('big_emp', 'big_emp_bad');

PL/SQL procedure successfully completed.

SQL> desc big_emp
 Name                                            Null?    Type
 ----------------------------------------------- -------- ----------------
 EMPLOYEE_ID                                              NUMBER(6)
 FIRST_NAME                                               VARCHAR2(20)
 LAST_NAME                                       NOT NULL VARCHAR2(25)
 EMAIL                                           NOT NULL VARCHAR2(25)
 PHONE_NUMBER                                             VARCHAR2(20)
 HIRE_DATE                                       NOT NULL DATE
 JOB_ID                                          NOT NULL VARCHAR2(10)
 SALARY                                                   NUMBER(8,2)
 COMMISSION_PCT                                           NUMBER(2,2)
 MANAGER_ID                                               NUMBER(6)
 DEPARTMENT_ID                                            NUMBER(4)

SQL> desc big_emp_bad
 Name                                            Null?    Type
 ----------------------------------------------- -------- -----------------
 ORA_ERR_NUMBER$                                          NUMBER
 ORA_ERR_MESG$                                            VARCHAR2(2000)
 ORA_ERR_ROWID$                                           ROWID
 ORA_ERR_OPTYP$                                           VARCHAR2(2)
 ORA_ERR_TAG$                                             VARCHAR2(2000)
 EMPLOYEE_ID                                              VARCHAR2(4000)
 FIRST_NAME                                               VARCHAR2(4000)
 LAST_NAME                                                VARCHAR2(4000)
 EMAIL                                                    VARCHAR2(4000)
 PHONE_NUMBER                                             VARCHAR2(4000)
 HIRE_DATE                                                VARCHAR2(4000)
 JOB_ID                                                   VARCHAR2(4000)
 SALARY                                                   VARCHAR2(4000)
 COMMISSION_PCT                                           VARCHAR2(4000)
 MANAGER_ID                                               VARCHAR2(4000)
 DEPARTMENT_ID                                            VARCHAR2(4000)

As you can see, all the columns in the errors table are created as VARCHAR2(4000), which allows columns of most datatypes to be inserted into the errors table, even if records are failing as a result of data being too large to fit into a column or because of inconsistent datatype issues, such as number columns that contain nonnumeric data. There are also a few extra columns for the error number, the error message, the rowid, and the operation type. Last, there is a column called ORA_ERR_TAG$ that allows user-defined data to be placed in the row for debugging purposes (such as the step the ETL process was on, or something of that nature).
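
The tag is supplied as an expression in parentheses after the errors table name (the full syntax is covered next), so a load job can record which step it was on. A minimal sketch, reusing the big_emp and big_emp_bad tables created earlier:

insert into big_emp (employee_id, first_name, last_name, email, hire_date, job_id)
  select employee_id, first_name, last_name, email, hire_date, job_id
    from hr.employees
  log errors into big_emp_bad ('nightly load, step 3')
  reject limit unlimited;

Querying ORA_ERR_TAG$ afterward shows which run (or which step within a run) produced each bad record.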

The syntax is very straightforward. You simply add the keywords LOG ERRORS INTO and specify the name of your errors table. As another option, you can tell Oracle how many errors to allow before giving up and canceling the statement. This is done with the REJECT LIMIT clause. Note that, by default, REJECT LIMIT is set to zero, so if you hit one error, the statement aborts and rolls back (just the statement, not the transaction). The single error is preserved in the errors table, though. In most cases, you probably want to set REJECT LIMIT to UNLIMITED, which allows the INSERT statement to complete regardless of how many records are diverted to the errors table. It is somewhat surprising that UNLIMITED is not the default, because this is the most common usage. Listing 13-8 shows a simple example.

Listing 13-8.  Insert Error Logging

SQL>
SQL> insert into big_emp
  2      (employee_id, first_name, last_name,
  3       hire_date, email, department_id)
  4    values (300,'Bob', 'Loblaw',
  5           '01-jan-10', '[email protected]', 12345)
  6    log errors into big_emp_bad;
         '01-jan-10', '[email protected]', 12345)
                                                    *
ERROR at line 5:
ORA-12899: value too large for column "KRM"."BIG_EMP"."EMAIL" (actual: 26, maximum: 25)

SQL> insert into big_emp
  2      (employee_id, first_name, last_name,
  3       hire_date, email, department_id)
  4    values (301,'Bob', 'Loblaw',
  5           '01-jan-10', '[email protected]', 12345)
  6    log errors into big_emp_bad;
         '01-jan-10', '[email protected]', 12345)
                                          *
ERROR at line 5:
ORA-01400: cannot insert NULL into ("KRM"."BIG_EMP"."JOB_ID")

SQL> insert into big_emp
  2      (employee_id, first_name, last_name,
  3       hire_date, email, department_id,job_id)
  4    values (302,'Bob', 'Loblaw',
  5           '01-jan-10', '[email protected]', 12345, 1)
  6    log errors into big_emp_bad;
         '01-jan-10', '[email protected]', 12345, 1)
                                          *
ERROR at line 5:
ORA-01438: value larger than specified precision allowed for this column

SQL> insert into big_emp
  2       (employee_id, first_name, last_name,
  3        hire_date, email, department_id,job_id)
  4    values (303,'Bob', 'Loblaw',
  5           '01-jan-10', '[email protected]', '2A45', 1)
  6    log errors into big_emp_bad;
         '01-jan-10', '[email protected]', '2A45', 1)
                                          *
ERROR at line 5:
ORA-01722: invalid number

SQL>
SQL> SELECT ORA_ERR_MESG$, ORA_ERR_TAG$, employee_id FROM big_emp_bad;

ORA_ERR_MESG$
--------------------------------------------------------------------------
ORA_ERR_TAG$
--------------------------------------------------------------------------
EMPLOYEE_ID
--------------------------------------------------------------------------
ORA-01438: value larger than specified precision allowed for this column

302

ORA-01722: invalid number

303

ORA-12899: value too large for column "KRM"."BIG_EMP"."EMAIL" (actual: 26, maximum: 25)

300

ORA-01400: cannot insert NULL into ("KRM"."BIG_EMP"."JOB_ID")

301

Improving Insert Error Logging

The example in Listing 13-8 shows several insert statements, all of which fail. The records that fail to be inserted into the table, regardless of the error that caused the failure, are inserted automatically into the errors table. Because I didn’t specify a value for REJECT LIMIT, all the statements are rolled back when they encounter the first error. Therefore, no records are actually inserted into the big_emp table. All the error records are preserved, though. I did this to demonstrate that a single errors table can be reused for multiple loads, preserving the records across multiple insert statements. Note that error logging is rarely used in this manner in real life; typically, REJECT LIMIT is set to UNLIMITED. Listings 13-9 and 13-10 show better examples, using a multirow INSERT statement. Listing 13-9 shows what happens to an insert when a record fails without the error logging clause; Listing 13-10 shows how it works with the error logging clause.

Listing 13-9.  Better Insert Error Logging

SQL> set echo on
SQL> create table test_big_insert as select * from dba_objects where 1=2;

Table created.

SQL>
SQL> desc test_big_insert
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------
 OWNER                                              VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(128)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                          NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                            DATE
 LAST_DDL_TIME                                      DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)
 NAMESPACE                                          NUMBER
 EDITION_NAME                                       VARCHAR2(30)

SQL>
SQL> alter table test_big_insert modify object_id number(2);

Table altered.

SQL>
SQL> insert into test_big_insert
  2    select * from dba_objects
  3    where object_id is not null;
  select * from dba_objects
         *
ERROR at line 2:
ORA-01438: value larger than specified precision allowed for this column

Because I set up the situation, I have a pretty good idea which column is causing the problem: the object_id column, which was narrowed to NUMBER(2) in the listing. But, in real life, the troublesome column is not usually so obvious. In fact, without the Error Logging clause, it can be quite difficult to determine which row is causing the problem.

The error message doesn’t give me any information about which column or which record is causing the failure. I can determine which column is causing the problem by specifying the column names manually in the SELECT. However, there is no way to know which rows are causing the problem. The Error Logging clause in Listing 13-10 solves both problems. Remember that all the column values are saved in the errors table, along with the error messages, making it easy to determine where the problem lies.

Listing 13-10.  Better Insert Error Logging (continued)

SQL>
SQL> EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('test_big_insert', 'tbi_errors');

PL/SQL procedure successfully completed.

SQL>
SQL> insert into test_big_insert
  2    select * from dba_objects
  3    where object_id is not null
  4    log errors into tbi_errors
  5    reject limit unlimited;

98 rows created.

SQL>
SQL> select count(*) from dba_objects
  2  where object_id is not null
  3  and length(object_id) < 3;

  COUNT(*)
----------
        98
SQL> select count(*) from test_big_insert;

  COUNT(*)
----------
        98
SQL>
SQL> select count(*) from dba_objects
  2  where object_id is not null
  3  and length(object_id) > 2;

  COUNT(*)
----------
     73276
SQL> select count(*) from tbi_errors;

  COUNT(*)
----------
     73276
SQL> rollback;

Rollback complete.

SQL> select count(*) from test_big_insert;

  COUNT(*)
----------
         0
SQL> select count(*) from tbi_errors;

  COUNT(*)
----------
     73282

This example shows the Error Logging clause with REJECT LIMIT at UNLIMITED, which allows the statement to complete despite the fact that most of the records fail to be inserted. In addition, you can see that, although a rollback removed the records from the base table, the error records remain.

DML Error Logging Restrictions

Although DML error logging is extremely robust, you should be aware of the following caveats:

  • The LOG ERRORS clause does not cause implicit commits. The insert of error records is handled as an autonomous transaction—meaning, you can commit or roll back the entire set of records inserted into the base table (along with other pending changes), even if errors are returned and bad records are inserted into the errors table. The records loaded into the errors table are preserved even if the transaction is rolled back.
  • The LOG ERRORS clause does not disable the APPEND hint. Inserts into the base table are done using the direct path write mechanism if the APPEND hint is used. However, any inserts into the errors table do not use direct path writes. In general, this is not a problem because you rarely expect to load a lot of data into an errors table.
  • Direct path insert operations that violate a unique constraint or index cause the statement to fail and roll back.
  • Any UPDATE operation that violates a unique constraint or index causes the statement to fail and roll back.
  • Any operation that violates a deferred constraint causes the statement to fail and roll back.
  • The LOG ERRORS clause does not track the values of LOBs, LONGs, or object-type columns. It can be used with tables that contain these unsupported types of columns, but the unsupported columns are not added to the errors table. To create the errors table for a table that contains unsupported column types, you must use the skip_unsupported parameter of the CREATE_ERROR_LOG procedure. The default for this parameter is FALSE, which causes the procedure to fail when attempting to create an errors table for a table with unsupported column types. Listing 13-11 shows the proper syntax for creating an errors table when there are unsupported column types in the base table.

Listing 13-11.  DBMS_ERRLOG.CREATE_ERROR_LOG Parameters

exec DBMS_ERRLOG.CREATE_ERROR_LOG(err_log_table_owner => '&owner', -
                             dml_table_name => '&table_name', -
                             err_log_table_name => '&err_log_table_name', -
                             err_log_table_space => NULL, -
                             skip_unsupported => TRUE);

As you can see, the INSERT statement has several options that are rarely used. The most useful of these features, in my opinion, is DML error logging (which can also be used with the other DML commands). It allows very difficult problems, such as corruption issues, to be identified fairly easily, and it provides excellent performance compared with the row-by-row processing that is required without it. Note also the fairly extreme performance improvement provided by direct path inserts vs. conventional inserts; there are drawbacks with regard to recoverability and serialization, but for bulk loading of data, the positives generally far outweigh the negatives.

UPDATE

Massive updates are almost always a bad idea. I recently reviewed a system that updates a billion-plus rows in a single table every night—a full-year forecast in which every single value is recalculated every night. Aside from the observation that forecasting that far into the future is not necessary for items that have a 90-day turnaround time, it’s much faster to load a billion records from scratch than to update a billion records.

Using CTAS vs. UPDATE

The traditional method when doing this type of processing is to do a truncate and then a reload. But, what if the truncate-and-reload method just won’t work? One alternative is to use Create Table As Select (CTAS) to create a new table and then just replace the original table with the newly created one. It sounds easy if you say it fast. Of course, there are many details that must be addressed. Listing 13-12 shows a quick demonstration of the potential difference in performance between these two approaches.

Listing 13-12.  Performance Delta between UPDATE and CTAS

SQL> set autotrace on
SQL> set timing on
SQL> update skew2 set col1 = col1*1;

32000004 rows updated.

Elapsed: 00:27:56.41

Execution Plan
----------------------------------------------------------
Plan hash value: 1837483169

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |       |    32M|   793M| 28370   (1)| 00:05:41 |
|   1 |  UPDATE            | SKEW2 |       |       |            |          |
|   2 |   TABLE ACCESS FULL| SKEW2 |    32M|   793M| 28370   (1)| 00:05:41 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
       1908  recursive calls
   32743098  db block gets
     163363  consistent gets
     317366  physical reads
 8187521328  redo size
       1373  bytes sent via SQL*Net to client
       1097  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
   32000004  rows processed

SQL> create table skew_temp as
  2  select pk_col, col1*1 col1, col2, col3, col4 from skew2;

Table created.

Elapsed: 00:00:44.30

SQL> set timing off
SQL>
SQL> select count(*) from skew_temp;

  COUNT(*)
----------
  32000004

SQL> @find_sql_stats
Enter value for sql_text: %skew2%
Enter value for sql_id:

SQL_ID        ROWS_PROCESSED  AVG_ETIME  AVG_PIO      AVG_LIO SQL_TEXT
------------- -------------- ---------- -------- ------------ --------------
2aqsvr3h3qrrg       32000004   1,676.78  928,124   65,409,243 update skew2
                                                              set col1 = col
4y4dquf0mkhup       32000004      44.30  162,296      492,575 create table
                                                              skew_temp as s

As you can see, the update took almost 30 minutes (1676.78 seconds) whereas the CTAS took less than a minute (44.30 seconds). So it’s clear that there are significant performance benefits to be had by recreating the table vs. updating all the records. And, as you might already expect from the previous example, recreating a table can also be more efficient than updating a relatively small portion of the rows. Listing 13-13 shows a comparison of the two methods when updating approximately 10 percent of the rows.

Listing 13-13.  Performance Delta between UPDATE and CTAS: 10 Percent

SQL> select count(*) from skew2 where col1 = 1;

  COUNT(*)
----------
   3199971

Elapsed: 00:00:10.90
SQL> select 3199971/32000004 from dual;

3199971/32000004
----------------
      .099999081

Elapsed: 00:00:00.01
SQL> -- about 10% of the rows col1=1
SQL>
SQL> update skew2 set col1=col1*1 where col1 = 1;

3199971 rows updated.

Elapsed: 00:03:11.63
SQL> drop table skew_temp;

Table dropped.

Elapsed: 00:00:00.56
SQL> create table skew_temp as
  2  select pk_col, case when col1 = 1 then col1*1 else col1 end col1,
  3  col2, col3, col4 from skew2;

Table created.

Elapsed: 00:01:23.62

SQL> alter table skew2 rename to skew_old;

Table altered.

Elapsed: 00:00:00.06
SQL> alter table skew_temp rename to skew2;

Table altered.

Elapsed: 00:00:00.05

In this example, I recreated a table using CTAS in less than half the time it took to update about 10 percent of the records. Obviously, there are many details that are ignored in the previous two examples. These examples had no constraints or indexes or grants to deal with, making them considerably less complicated than most real-life situations. Each of these complications can be dealt with in an automated fashion, however.

Using INSERT APPEND vs. UPDATE

Listing 13-14 shows a more realistic example of using an INSERT APPEND technique to replace an UPDATE statement. For this example, I use a script from the online code suite called recreate_table.sql. It uses the dbms_metadata package to generate a script with the necessary Data Definition Language (DDL) to recreate a table and its dependent objects. It then uses an INSERT APPEND in place of UPDATE to move the data. The last step is to use ALTER TABLE RENAME to swap the new table for the original one. After the script is generated, it should be edited to customize how the steps are performed. For example, you may want to comment out the swap of the tables via the RENAME at the end until you’re sure everything works as expected. Note that the particulars of the INSERT APPEND also have to be built when editing the script. Note also that the script renames all existing indexes because you cannot have duplicate index names, even if they are on different tables.

Listing 13-14.  INSERT APPEND Instead of a Mass UPDATE

SQL>
SQL> @recreate_table
Enter value for owner: KRM
Enter value for table_name: SKEW2

... Output suppressed for readability

SQL> @recreate_SKEW2.sql
SQL> ALTER INDEX SYS_C0029558 RENAME TO SYS_C0029558_OLD;

Index altered.

Elapsed: 00:00:00.02
SQL> ALTER INDEX SKEW2_COL1 RENAME TO SKEW2_COL1_OLD;

Index altered.

Elapsed: 00:00:00.02
SQL> ALTER INDEX SKEW2_COL4 RENAME TO SKEW2_COL4_OLD;

Index altered.

Elapsed: 00:00:00.02
SQL>
SQL>   CREATE TABLE "KRM"."SKEW2_TEMP"
  2     (    "PK_COL" NUMBER,
  3          "COL1" NUMBER,
  4          "COL2" VARCHAR2(30),
  5          "COL3" DATE,
  6          "COL4" VARCHAR2(1)
  7     ) SEGMENT CREATION IMMEDIATE
  8    PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  9    STORAGE(INITIAL 1483735040 NEXT 1048576
               MINEXTENTS 1 MAXEXTENTS 2147483645
 10    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
 11    DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 12    TABLESPACE "USERS" ;

Table created.

Elapsed: 00:00:00.10
SQL>
SQL>   INSERT /*+APPEND*/ INTO SKEW2_TEMP SELECT /*+PARALLEL(a 4)*/
  2  PK_COL,
  3  COL1,
  4  case when COL1 = 2 then 'ABC' else COL2 end,
  5  COL3,
  6  COL4
  7  FROM SKEW2 a;

32000004 rows created.

Elapsed: 00:00:52.87
SQL>
SQL>   CREATE INDEX "KRM"."SKEW2_COL1" ON "KRM"."SKEW2_TEMP" ("COL1")
  2    PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING COMPUTE STATISTICS
  3    STORAGE(INITIAL 595591168 NEXT 1048576 MINEXTENTS 1
  4    MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  5    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  6    TABLESPACE "USERS"
  7    PARALLEL 8 ;

Index created.

Elapsed: 00:01:40.16
SQL>
SQL>   CREATE INDEX "KRM"."SKEW2_COL4" ON "KRM"."SKEW2_TEMP" ("COL4")
  2    PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  3    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  4    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
  5    FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  6    TABLESPACE "USERS"
  7    PARALLEL 8 ;

Index created.

Elapsed: 00:01:11.05
SQL>
SQL>   CREATE UNIQUE INDEX "KRM"."SYS_C0029558"
  2           ON "KRM"."SKEW2_TEMP" ("PK_COL")
  3    PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING COMPUTE STATISTICS
  4    STORAGE(INITIAL 865075200 NEXT 1048576
  5            MINEXTENTS 1 MAXEXTENTS 2147483645
  6    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
  7    FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  8    TABLESPACE "USERS"
  9    PARALLEL 8 ;

Index created.

Elapsed: 00:01:34.26
SQL>
SQL> -- Note: No Grants found!
SQL> -- Note: No Triggers found!
SQL>
SQL>
SQL> ALTER TABLE "KRM"."SKEW2_TEMP" ADD PRIMARY KEY ("PK_COL")
  2    USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
  3    COMPUTE STATISTICS
  4    STORAGE(INITIAL 865075200 NEXT 1048576
  5            MINEXTENTS 1 MAXEXTENTS 2147483645
  6    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
  7    FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  8    TABLESPACE "USERS"  ENABLE;

Table altered.

Elapsed: 00:00:15.16
SQL>
SQL>   ALTER TABLE SKEW2 RENAME TO SKEW2_ORIG;

Table altered.

Elapsed: 00:00:00.04
SQL>
SQL> ALTER TABLE SKEW2_TEMP RENAME TO SKEW2;

Table altered.

Elapsed: 00:00:00.03

The order of the steps is very important. In general, it is much faster to defer the creation of indexes and the enabling of constraints until after loading the data. Be aware that you need to drop the old table manually (maybe after a day or two, when everyone is quite sure the operation works correctly). By the way, I think it’s a really bad idea to drop objects in a script. As a matter of fact, I recommend commenting out the last two statements that do the RENAME. It’s safer to run them interactively after you make sure everything works as planned. For comparison, Listing 13-15 shows the timing of the same change made by using a standard UPDATE statement.

Listing 13-15.  Mass UPDATE Timings for Comparison

SQL> select my_rows, total_rows,
  2  100*my_rows/total_rows row_percent from
  3  (select sum(decode(col1,1,1,0)) my_rows, count(*) total_rows
  4* from skew2)

   MY_ROWS TOTAL_ROWS ROW_PERCENT
---------- ---------- -----------
   8605185   32000004        26.9

1 row selected.

Elapsed: 00:00:01.29
SQL> update /*+ parallel 4 */ skew2 set col2 = 'ABC' where col1 = 2;

8605185 rows updated.

Elapsed: 00:12:37.53

To sum up this example, when modifying roughly 27 percent of the rows in the table, the straight UPDATE took about 12.5 minutes and the rebuild with INSERT APPEND took about 5.5 minutes. Keep in mind that there are many variables I have not covered in detail. Every situation has differences in the number of dependent objects and the percentage of rows affected by the update. These factors have a large effect on the outcome, so test thoroughly in your environment with your specific data.

In this section, you learned that it can be considerably faster to rebuild a table than to update a large percentage of its rows. Obviously, making use of direct path writes via the APPEND hint is an important part of that. The biggest negative to this approach is that the table must be offline for the entire time the rebuild is taking place—or at least protected in some manner from concurrent modifications. This does not usually present a major obstacle because these types of mass updates are rarely done while users are accessing the table. In cases when concurrent access is required, partitioning or materialized views can provide the necessary isolation.
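
One common way to get that isolation for a partitioned table is a partition exchange: build the reworked data in a standalone table (with a direct path insert or CTAS) and then swap it into place as a single, nearly instantaneous DDL operation. The following is only a sketch; the sales table, its p2024 partition, and the column names are made up for illustration:

-- sales is range partitioned; p2024 holds the rows to be reworked
create table sales_stage as
  select order_id,
         case when status = 'OPEN' then 'CLOSED' else status end status,
         order_date, amount
    from sales partition (p2024);   -- apply the "update" in the select list

-- create indexes on sales_stage to match the table's local indexes, then swap
alter table sales
  exchange partition p2024 with table sales_stage
  including indexes without validation;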

DELETE

Just like massive UPDATEs, massive DELETEs are almost always a bad idea. In general, it is faster (if somewhat more complicated) to recreate a table or partition (without the rows you wish to eliminate) than it is to delete a large percentage of the rows. The biggest downside to the approach of recreating is that the object must be protected from other changes while it is being rebuilt. This is basically the same approach I used in the previous section with the UPDATE command, but DELETEs can be even more time-consuming.

The basic idea is pretty much the same as with the mass UPDATEs:

  1. Create a temporary table.
  2. Insert the records that are not to be deleted into the temporary table.
  3. Recreate the dependent objects (indexes, constraints, grants, triggers).
  4. Rename the tables.

I use the recreate_table.sql script again to generate a script that I can edit, and then I modify the INSERT statement so that it selects only the records that would remain after my DELETE. Listing 13-16 shows an example of how a DELETE statement compares with a rebuild using a reciprocal INSERT statement.

Listing 13-16.  Mass DELETE

SQL> delete from skew2 where col1=1;
  
3199972 rows deleted.

Elapsed: 00:04:12.64

SQL> rollback;

Rollback complete.

Elapsed: 00:01:48.59

SQL> @recreate_SKEW3.sql
SQL> set timing on
SQL>
SQL> ALTER INDEX SYS_C0029558 RENAME TO SYS_C0029558_OLD;

Index altered.

Elapsed: 00:00:00.03
SQL> ALTER INDEX SKEW2_COL1 RENAME TO SKEW2_COL1_OLD;

Index altered.

Elapsed: 00:00:00.04
SQL> ALTER INDEX SKEW2_COL4 RENAME TO SKEW2_COL4_OLD;

Index altered.

Elapsed: 00:00:00.02
SQL>
SQL>   CREATE TABLE "KRM"."SKEW2_TEMP"
  2     (    "PK_COL" NUMBER,
  3          "COL1" NUMBER,
  4          "COL2" VARCHAR2(30),
  5          "COL3" DATE,
  6          "COL4" VARCHAR2(1)
  7     ) SEGMENT CREATION IMMEDIATE
  8    PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  9    STORAGE(INITIAL 1483735040 NEXT 1048576
 10    MINEXTENTS 1 MAXEXTENTS 2147483645
 11    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
 12    FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 13    TABLESPACE "USERS" ;

Table created.

Elapsed: 00:00:00.11
SQL>
SQL>   INSERT /*+APPEND*/ INTO SKEW2_TEMP SELECT /*+PARALLEL(a 4)*/
  2  PK_COL,
  3  COL1,
  4  COL2,
  5  COL3,
  6  COL4
  7  FROM SKEW2 a where col1 != 1;

28800032 rows created.

Elapsed: 00:00:42.30
SQL>
SQL>   CREATE INDEX "KRM"."SKEW2_COL1" ON "KRM"."SKEW2_TEMP" ("COL1")
  2    PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING COMPUTE STATISTICS
  3    STORAGE(INITIAL 595591168 NEXT 1048576
  4            MINEXTENTS 1 MAXEXTENTS 2147483645
  5    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
  6    FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  7    TABLESPACE "USERS"
  8    PARALLEL 8 ;

Index created.

Elapsed: 00:01:36.50
SQL>
SQL>   CREATE INDEX "KRM"."SKEW2_COL4" ON "KRM"."SKEW2_TEMP" ("COL4")
  2    PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  3    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  4    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
  5    FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  6    TABLESPACE "USERS"
  7    PARALLEL 8 ;

Index created.

Elapsed: 00:01:09.43
SQL>
SQL>   CREATE UNIQUE INDEX "KRM"."SYS_C0029558"
  2           ON "KRM"."SKEW2_TEMP" ("PK_COL")
  3    PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING COMPUTE STATISTICS
  4    STORAGE(INITIAL 865075200 NEXT 1048576
  5            MINEXTENTS 1 MAXEXTENTS 2147483645
  6    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
  7    FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  8    TABLESPACE "USERS"
  9    PARALLEL 8 ;

Index created.

Elapsed: 00:01:26.30
SQL>
SQL>    -- Note: No Grants found!
SQL>    -- Note: No Triggers found!
SQL>
SQL>   ALTER TABLE "KRM"."SKEW2_TEMP" ADD PRIMARY KEY ("PK_COL")
  2    USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
  3    NOLOGGING COMPUTE STATISTICS
  4    STORAGE(INITIAL 865075200 NEXT 1048576
  5            MINEXTENTS 1 MAXEXTENTS 2147483645
  6    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
  7    FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  8    TABLESPACE "USERS"  ENABLE;

Table altered.

Elapsed: 00:00:20.42

Similar to the comparison with the UPDATE statement, the rebuild provides a viable alternative. In this example, I deleted roughly 10 percent of the records. The DELETE took about 4.25 minutes and the rebuild took about 5.25 minutes. In this case, the straight DELETE was actually faster. But, as the number of records to be deleted increases, the time to rebuild remains basically the same whereas the time to run the DELETE increases. Eventually, there is a point at which the rebuild becomes much cheaper than the DELETE.

Truncate

I am sure you are aware of the TRUNCATE command, but I should mention it here anyway. If you need to delete all the rows from a table or a partition, the TRUNCATE command is the way to do it. Truncating a table moves the high-water mark rather than actually changing all the blocks that hold records. It is blazingly fast compared with using the DELETE command. There are only a few very minor negatives.

  • TRUNCATE is a DDL command, so it issues an implicit commit; once a table is truncated, there is no going back.
  • You cannot flash back to the state of the table prior to the truncate.
  • TRUNCATE is all or nothing: it operates on an entire table or partition, so you cannot truncate a subset of rows.
  • In a RAC environment, all the nodes must be sent a message to invalidate the buffers of the table (which is not the case with a DELETE).

In addition to completing extremely quickly, the TRUNCATE command makes a big difference for future queries on the table. Because full table scans read every block up to the high-water mark, and the DELETE command has no effect on the high-water mark, choosing DELETE over TRUNCATE may mean giving up performance gains for future statements.
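
For reference, the basic forms look like this (the sales table and its p2024 partition are placeholders; REUSE STORAGE keeps the extents allocated to the segment instead of deallocating them):

truncate table big_emp;                          -- whole table; resets the high-water mark
truncate table big_emp reuse storage;            -- whole table, but keep the allocated extents
alter table sales truncate partition p2024;      -- a single partition of a partitioned table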

MERGE

The MERGE statement was introduced in Oracle Database 9i. It provides the classic UPSERT functionality: MERGE updates the record if one already exists, or inserts a new record if one doesn’t already exist. (Oracle Database 10g enhanced the MERGE command to allow it to delete records as well.) The idea is to eliminate the extra code necessary to do error checking, and to eliminate the additional round-trips to the database when it’s necessary to issue additional SQL statements (such as writing a piece of code that attempts an update, checks how many rows the update touched, and, if it touched none, then issues the insert). The MERGE statement does all this at the database level without all the additional code. Obviously, it performs better than the procedural code version.
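
To make the comparison concrete, the procedural pattern that MERGE replaces looks roughly like this. This is only a PL/SQL sketch, reusing the hr.employees and big_emp tables from earlier examples; a real loader would handle more columns and error conditions:

begin
  for s in (select * from hr.employees) loop
    update big_emp t
       set t.salary = s.salary, t.job_id = s.job_id
     where t.employee_id = s.employee_id;
    if sql%rowcount = 0 then             -- no matching row, so insert instead
      insert into big_emp (employee_id, first_name, last_name, email,
                           hire_date, job_id, salary)
      values (s.employee_id, s.first_name, s.last_name, s.email,
              s.hire_date, s.job_id, s.salary);
    end if;
  end loop;
  commit;
end;
/

A single MERGE does the same work in one statement and one round-trip, and it lets the optimizer use a set-based plan rather than row-by-row processing.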

Syntax and Usage

The syntax of the typical MERGE statement is relatively easy to follow. The following is the basic syntax of a MERGE statement:

MERGE INTO table_name
USING (subquery) ON (subquery.column = table.column)
WHEN MATCHED THEN UPDATE ...
WHEN NOT MATCHED THEN INSERT ...

The first part of the MERGE statement looks just like an INSERT, specifying the table (or view) that is to be the target of the inserted, updated, or deleted data. The USING keyword specifies a data source (usually a subquery, although it could be a staging table as well) and a join condition that tells Oracle how to determine whether a record already exists in the target table. In addition, you must add an UPDATE clause or an INSERT clause or both. In most cases, you see both because there is little value in using the MERGE statement without both clauses. Now let’s move on to the UPDATE and INSERT clauses (they probably should have called these the WHEN MATCHED and WHEN NOT MATCHED clauses instead).

The UPDATE clause tells Oracle what to do when a matching record is found. In most cases, finding a matching record results in an update to that record. There is also an optional WHERE clause that can be used to limit which records are updated, even if there is a match. Alternatively, you can delete matching records by adding a DELETE WHERE clause. Note that the records to be deleted must pass the criteria in both the UPDATE clause’s WHERE clause and the DELETE WHERE clause.

The DELETE clause is not actually used that often. It can be handy, though, for a job that needs to do more than just load data. For example, some ETL processes also perform cleanup tasks. For the DELETE portion of the UPDATE clause to kick in, a matching record must be found that passes the WHERE clause in the UPDATE clause as well as the WHERE clause associated with the DELETE. Listing 13-17 shows the MERGE command with an UPDATE clause that contains a DELETE.

Listing 13-17.  MERGE with UPDATE Clause

MERGE INTO big_emp t
USING (select * from hr.employees) s
ON (t.employee_id = s.employee_id)
WHEN MATCHED THEN UPDATE SET
--  t.employee_id = s.employee_id, -- ON clause columns not allowed
  t.first_name = s.first_name,
  t.last_name = s.last_name ,
  t.email = s.email ,
  t.phone_number = s.phone_number ,
  t.hire_date = s.hire_date ,
  t.job_id = s.job_id ,
  t.salary = s.salary ,
  t.commission_pct = s.commission_pct ,
  t.manager_id = s.manager_id ,
  t.department_id = s.department_id
     WHERE (S.salary <= 3000)
DELETE WHERE (S.job_id = 'FIRED');

The INSERT clause tells Oracle what to do when a matching record is not found. In general, this means “do an insert.” However, the INSERT clause can be left off altogether. There is also an optional WHERE clause that can be applied, so it is not always the case that an insert is done if a match is not found. Listing 13-18 shows two versions of a MERGE statement with an INSERT clause.

Listing 13-18.  MERGE with INSERT Clause

MERGE INTO big_emp t
USING (select * from hr.employees) s
ON (t.employee_id = s.employee_id)
WHEN NOT MATCHED THEN INSERT
(t.employee_id ,
 t.first_name ,
 t.last_name ,
 t.email ,
 t.phone_number ,
 t.hire_date ,
 t.job_id ,
 t.salary ,
 t.commission_pct ,
 t.manager_id ,
 t.department_id)
VALUES
(s.employee_id ,
 s.first_name ,
 s.last_name ,
 s.email ,
 s.phone_number ,
 s.hire_date ,
 s.job_id ,
 s.salary ,
 s.commission_pct ,
 s.manager_id ,
 s.department_id)
     WHERE (S.job_id != 'FIRED');

MERGE INTO big_emp t
USING (select * from hr.employees where job_id != 'FIRED') s
ON (t.employee_id = s.employee_id)
WHEN NOT MATCHED THEN INSERT
(t.employee_id ,
 t.first_name ,
 t.last_name ,
 t.email ,
 t.phone_number ,
 t.hire_date ,
 t.job_id ,
 t.salary ,
 t.commission_pct ,
 t.manager_id ,
 t.department_id)
VALUES
(s.employee_id ,
 s.first_name ,
 s.last_name ,
 s.email ,
 s.phone_number ,
 s.hire_date ,
 s.job_id ,
 s.salary ,
 s.commission_pct ,
 s.manager_id ,
 s.department_id);

The statements accomplish the same thing but use a slightly different mechanism. One qualifies the set of records to be merged in the WHERE clause inside the INSERT clause; the other qualifies it in the subquery in the USING clause. Be aware that these two forms can have different performance characteristics and may even result in different plans. Listing 13-19 shows a more realistic example with both the INSERT clause and the UPDATE clause. Note that the UPDATE clause also contains a DELETE WHERE clause that cleans up records of employees who have been fired.

Listing 13-19.  Full MERGE

SQL>
SQL> -- delete from big_emp where employee_id > 190;
SQL> -- insert into hr.jobs select 'FIRED', 'Fired', 0, 0 from dual;
SQL> -- update hr.employees set job_id = 'FIRED' where employee_id=197;
SQL> MERGE /*+ APPEND */ INTO big_emp t
  2  USING (select * from hr.employees) s
  3  ON (t.employee_id = s.employee_id)
  4  WHEN MATCHED THEN UPDATE SET
  5  --  t.employee_id = s.employee_id,
  6    t.first_name = s.first_name,
  7    t.last_name = s.last_name ,
  8    t.email = s.email ,
  9    t.phone_number = s.phone_number ,
 10    t.hire_date = s.hire_date ,
 11    t.job_id = s.job_id ,
 12    t.salary = s.salary ,
 13    t.commission_pct = s.commission_pct ,
 14    t.manager_id = s.manager_id ,
 15    t.department_id = s.department_id
 16       WHERE (S.salary <= 3000)
 17       DELETE WHERE (S.job_id = 'FIRED')
 18  WHEN NOT MATCHED THEN INSERT
 19  (t.employee_id ,
 20   t.first_name ,
 21   t.last_name ,
 22   t.email ,
 23   t.phone_number ,
 24   t.hire_date ,
 25   t.job_id ,
 26   t.salary ,
 27   t.commission_pct ,
 28   t.manager_id ,
 29   t.department_id)
 30  VALUES
 31  (s.employee_id ,
 32   s.first_name ,
 33   s.last_name ,
 34   s.email ,
 35   s.phone_number ,
 36   s.hire_date ,
 37   s.job_id ,
 38   s.salary ,
 39   s.commission_pct ,
 40   s.manager_id ,
 41   s.department_id)
 42       WHERE (S.job_id != 'FIRED');

88140 rows merged.

Elapsed: 00:00:06.51

Performance Comparison

So how does the MERGE statement compare with a straight INSERT or CTAS operation? Obviously, there is some inherent overhead in the MERGE statement that makes such a comparison an unfair test, but MERGE is no slouch. Keep in mind that just like with the INSERT command, the fastest way to load a lot of data is to make sure it uses the direct path mechanism by using the APPEND hint. Listing 13-20 compares the performance of INSERT, MERGE, and CTAS. It also demonstrates that all are capable of doing direct path writes.

Listing 13-20.  INSERT, MERGE, and CTAS Performance Comparison

SQL> @compare_insert_merge_ctas.sql

Table dropped.

Elapsed: 00:00:00.69
SQL> @flush_pool
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.46
SQL> select name, value from v$mystat s, v$statname n
  2  where n.statistic# = s.statistic# and name = 'physical writes direct';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical writes direct                                                    0

Elapsed: 00:00:00.03
SQL> create /* compare_insert_merge_ctas.sql */ table skew3
  2  as select * from skew;

Table created.

Elapsed: 00:00:32.92
SQL> select name, value from v$mystat s, v$statname n
  2  where n.statistic# = s.statistic# and name = 'physical writes direct';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical writes direct                                               163031

Elapsed: 00:00:00.03
SQL>
SQL> truncate table skew3 drop storage;

Table truncated.

Elapsed: 00:00:01.01
SQL> INSERT /*+ APPEND */ /* compare_insert_merge_ctas.sql */
  2 INTO skew3 select * from skew;

32000004 rows created.

Elapsed: 00:00:31.23
SQL> select name, value from v$mystat s, v$statname n
  2  where n.statistic# = s.statistic# and name = 'physical writes direct';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical writes direct                                               326062

Elapsed: 00:00:00.03
SQL>
SQL> truncate table skew3 drop storage;

Table truncated.

Elapsed: 00:00:00.84
SQL> MERGE /*+ APPEND */ /* compare_insert_merge_ctas.sql */
  2  INTO skew3 t
  3  USING (select * from skew) s
  4  ON (t.pk_col = s.pk_col)
  5  WHEN NOT MATCHED THEN INSERT
  6  (t.pk_col, t.col1, t.col2, t.col3, t.col4)
  7       VALUES (s.pk_col, s.col1, s.col2, s.col3, s.col4);

32000004 rows merged.

Elapsed: 00:00:49.07
SQL> select name, value from v$mystat s, v$statname n
  2  where n.statistic# = s.statistic# and name = 'physical writes direct';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical writes direct                                               489093

Elapsed: 00:00:00.01

SQL> @fss2
Enter value for sql_text: %compare_insert%
Enter value for sql_id:

SQL_ID        AVG_ETIME AVG_CPU  AVG_PIO  AVG_LIO SQL_TEXT
------------- --------- ------- -------- -------- --------------------
6y6ms28kzzb5z     49.07   48.48  162,294  490,664 MERGE /*+ APPEND */
g1pf9b564j7yn     31.22   30.93  162,296  489,239 INSERT /*+ APPEND */
g909cagdbs1t5     32.91   31.25  162,294  494,480 create /* compare_in

In this very simple test, you can see that all three approaches are able to use the direct path writes, and that CTAS and INSERT are very similar in performance. The MERGE statement is considerably slower, as expected because of its additional capabilities and the necessary overhead associated with them. But, the MERGE statement provides the most flexibility, so don’t overlook the fact that this single statement can perform multiple types of DML with a single execution.

Summary

There are four SQL commands for modifying data: INSERT, UPDATE, DELETE, and MERGE (and the latter is actually capable of performing all three functions). In this chapter I discussed these commands briefly and focused on one key performance concept: Direct path inserts are much, much faster than conventional inserts. There is a good reason for this difference in performance. Direct path inserts do a lot less work. There are a number of drawbacks when using the technique, however. The biggest drawback is that it is a serial operation; only one process can be engaging in a direct path insert on a table at any given time, and any other process that wishes to do the same simply has to wait. Another big drawback is that available space that’s already allocated to the table is not used by direct path inserts. For these reasons, it’s only applicable to large batch-type loading operations. Nevertheless, it is the fastest way to insert data into a table and, as such, should be considered whenever performance is among the most important decision-making criteria. Techniques have been developed for using direct path inserts in the place of updates and deletes. We explored a couple of these techniques in this chapter as well. Last, I explained several of the lesser known options of the DML commands, including the extremely powerful Error Logging clause, which can be applied to all four of the DML commands.
