CHAPTER 10
Database Tables

In this chapter, we will discuss the various types of database tables and cover when you might want to use each type (i.e., when one type of table is more appropriate than another). We will concentrate on the physical storage characteristics of the tables: how the data is organized and stored.

Once upon a time, there was only one type of table, really: a "normal" table. It was managed in the same way a "heap of stuff" is managed (the definition of which appears in the next section). Over time, Oracle added more sophisticated types of tables. Now, in addition to the heap organized table, there are clustered tables (three types of those), index organized tables, nested tables, temporary tables, and object tables. Each table type has different characteristics that make it suitable for use in different application areas.

Types of Tables

We will define each type of table before getting into the details. There are nine major table types in Oracle:

  • Heap organized tables: These are "normal," standard database tables. Data is managed in a heap-like fashion. As data is added, the first free space found in the segment that can fit the data will be used. As data is removed from the table, space becomes available for reuse by subsequent INSERTs and UPDATEs. This is where the "heap" in the name comes from: a heap is a bunch of space, and it is used in a somewhat random fashion.
  • Index organized tables: These tables are stored in an index structure. This imposes physical order on the rows themselves. Whereas in a heap the data is stuffed wherever it might fit, in index organized tables (IOTs) the data is stored in sorted order, according to the primary key.
  • Index clustered tables: Clusters are groups of one or more tables, physically stored on the same database blocks, with all rows that share a common cluster key value being stored physically "near" each other. Two goals are achieved in this structure. First, many tables may be stored physically joined together. Normally, you would expect data from only one table to be found on a database block, but with clustered tables, data from many tables may be stored on the same block. Second, all data that contains the same cluster key value, such as DEPTNO=10, will be physically stored together. The data is "clustered" around the cluster key value. A cluster key is built using a B*Tree index.
  • Hash clustered tables: These tables are similar to clustered tables, but instead of using a B*Tree index to locate the data by cluster key, the hash cluster hashes the key to the cluster to arrive at the database block the data should be on. In a hash cluster, the data is the index (metaphorically speaking). These tables are appropriate for data that is read frequently via an equality comparison on the key.
  • Sorted hash clustered tables: This table type is new in Oracle 10g and combines some aspects of a hash clustered table with those of an IOT. The concept is as follows: you have some key value that rows will be hashed by (say, CUSTOMER_ID), and then a series of records related to that key that arrive in sorted order (timestamp-based records) and are processed in that sorted order. For example, a customer places orders in your order entry system, and these orders are retrieved and processed in a first in, first out (FIFO) manner. In such a system, a sorted hash cluster may be the right data structure for you.
  • Nested tables: These are part of the object-relational extensions to Oracle. They are simply system-generated and -maintained child tables in a parent/child relationship. They work in much the same way as EMP and DEPT in the SCOTT schema. EMP is considered to be a child of the DEPT table, since the EMP table has a foreign key, DEPTNO, that points to DEPT. The main difference is that they are not "stand-alone" tables like EMP.
  • Temporary tables: These tables store scratch data for the life of a transaction or the life of a session. These tables allocate temporary extents, as needed, from the current user's temporary tablespace. Each session will see only the extents that session allocates; it will never see any of the data created in any other session.
  • Object tables: These tables are created based on an object type. They have special attributes not associated with non-object tables, such as a system-generated REF (object identifier) for each row. Object tables are really special cases of heap, index organized, and temporary tables, and they may include nested tables as part of their structure as well.
  • External tables: These tables are not stored in the database itself; rather, they reside outside of the database in ordinary operating system files. External tables in Oracle9i and above give you the ability to query a file residing outside the database as if it were a normal table inside the database. They are most useful as a means of getting data into the database (they are a very powerful data-loading tool). Furthermore, in Oracle 10g, which introduces an external table unload capability, they provide an easy way to move data between Oracle databases without using database links. We will look at external tables in some detail in Chapter 15.
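
To make a few of these types concrete before we get into terminology, here is a minimal, hedged sketch of the DDL that selects some of these organizations (table and column names are purely illustrative; the heap table needs no special clause because it is the default):

create table heap_demo ( x int );                  -- heap organized: the default

create table iot_demo
( x int primary key,
  y date
)
organization index;                                 -- index organized

create global temporary table temp_demo ( x int )
on commit delete rows;                              -- temporary: transaction duration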

Here is some general information about tables, regardless of their type:

  • A table can have up to 1,000 columns, although I recommend against a design that actually uses the maximum number of columns unless there is some pressing need. Tables are most efficient with far fewer than 1,000 columns. Oracle will internally store a row with more than 254 columns in separate row pieces that point to each other and must be reassembled to produce the entire row image.
  • A table can have a virtually unlimited number of rows, although you will hit other limits that prevent this from happening. For example, typically a tablespace can have at most 1,022 files (although there are new BIGFILE tablespaces in Oracle 10g that will get you beyond these file size limits, too). Say you have 32GB files—that is to say, 32,704GB per tablespace. This would be 2,143,289,344 blocks, each of which is 16KB in size. You might be able to fit 160 rows of between 80 to 100 bytes per block. This would give you 342,926,295,040 rows. If you partition the table, though, you can easily multiply this number many times. For example, consider a table with 1,024 hash partitions—that would be 1024 × 342,926,295,040 rows. There are limits, but you'll hit other practical limitations before even coming close to these figures.
  • A table can have as many indexes as there are permutations of columns (and permutations of functions on those columns). With the advent of function-based indexes, the true number of indexes you could create theoretically becomes infinite! Once again, however, practical restrictions will limit the actual number of indexes you will create and maintain.
  • There is no limit to the number of tables you may have, even within a single database. Yet again, practical limits will keep this number within reasonable bounds. You will not have millions of tables (creating and managing that many would be impractical), but you may have thousands of tables.

In the next section, we'll look at some of the parameters and terminology relevant to tables. After that, we'll jump into a discussion of the basic heap-organized table, and then move on to examine the other types.

Terminology

In this section, we will cover the various storage parameters and terminology associated with tables. Not all parameters are used for every table type. For example, the PCTUSED parameter is not meaningful in the context of an IOT. We'll cover the relevant parameters as part of the discussion of each individual table type. The goal is to introduce the terms and define them. As appropriate, more information on using specific parameters is covered in subsequent sections.

Segment

A segment in Oracle is an object that consumes storage on disk. While there are many segment types, the most popular are as follows:

  • Cluster: This segment type is capable of storing tables. There are two types of clusters: B*Tree and hash. Clusters are commonly used to store related data from multiple tables "prejoined" on the same database block and to store related information from a single table together. The term "cluster" refers to this segment's ability to cluster related information physically together.
  • Table: A table segment holds data for a database table and is perhaps the most common segment type used in conjunction with an index segment.
  • Table partition or subpartition: This segment type is used in partitioning and is very similar to a table segment. A table partition or subpartition segment holds just a slice of the data from a table. A partitioned table is made up of one or more table partition segments, and a composite partitioned table is made up of one or more table subpartition segments.
  • Index: This segment type holds an index structure.
  • Index partition: Similar to a table partition, this segment type contains some slice of an index. A partitioned index consists of one or more index partition segments.
  • Lob partition, lob subpartition, lobindex, and lobsegment: The lobindex and lobsegment segments hold the structure of a large object, or LOB. When a table containing a LOB is partitioned, the lobsegment will be partitioned as well—the lob partition segment is used for that. It is interesting to note that there is not a lobindex partition segment type—for whatever reason, Oracle marks the partitioned lobindex as an index partition (one wonders why a lobindex is given a special name!).
  • Nested table: This is the segment type assigned to nested tables, a special kind of "child" table in a master/detail relationship that we'll discuss later.
  • Rollback and Type2 undo: This is where undo data is stored. Rollback segments are those manually created by the DBA. Type2 undo segments are automatically created and managed by Oracle.

So, for example, a table may be a segment. An index may be a segment. I stress the words "may be" because we can partition an index into separate segments. So, the index object itself would just be a definition, not a physical segment—and the index would be made up of many index partitions, and each index partition would be a segment. A table may be a segment or not. For the same reason, we might have many table segments due to partitioning, or we might create a table in a segment called a cluster. Here the table will reside, perhaps with other tables in the same cluster segment.

The most common case, however, is that a table will be a segment and an index will be a segment. That is the easiest way to think of it for now. When you create a table, you are normally creating a new table segment and, as discussed in Chapter 3, that segment consists of extents, and extents consist of blocks. That is the normal storage hierarchy. But it is important to note that only the "common" case has this one-to-one relationship. For example, consider this simple CREATE TABLE statement:

Create table t ( x int primary key, y clob, z blob );

This statement creates six segments. If you issue this CREATE TABLE statement in a schema that owns nothing, you'll observe the following:

ops$tkyte@ORA10G> select segment_name, segment_type
  2  from user_segments;
no rows selected

ops$tkyte@ORA10G> create table t ( x int primary key, y clob, z blob );
Table created.

ops$tkyte@ORA10G> select segment_name, segment_type
  2  from user_segments;

SEGMENT_NAME                   SEGMENT_TYPE
------------------------------ ------------------
SYS_IL0000063631C00002$$       LOBINDEX
SYS_LOB0000063631C00003$$      LOBSEGMENT
SYS_C009783                    INDEX
SYS_IL0000063631C00003$$       LOBINDEX
SYS_LOB0000063631C00002$$      LOBSEGMENT
T                              TABLE

6 rows selected.

The table itself created a segment in this example: the last row in the output. The primary key constraint created an index segment in this case in order to enforce uniqueness.


Note A unique or primary key constraint may or may not create a new index. If there is an existing index on the constrained columns, and these columns are on the leading edge of the index, the constraint can and will use them.
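
A hedged sketch of that behavior (table and index names are illustrative):

create table t2 ( a int, b int );

create index t2_idx on t2(a,b);

alter table t2 add constraint t2_pk primary key(a);
-- no new INDEX segment appears; the constraint enforces itself
-- using the existing T2_IDX, since A is on its leading edge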


Additionally, each of the LOB columns created two segments: one segment to store the actual chunks of data pointed to by the character large object (CLOB) or binary large object (BLOB) pointer, and one segment to "organize" them. LOBs provide support for very large chunks of information, up to many gigabytes in size. They are stored in chunks in the lobsegment, and the lobindex is used to keep track of where the LOB chunks are and the order in which they should be accessed.

Segment Space Management

Starting in Oracle9i, there are two methods for managing space in segments:

  • Manual Segment Space Management: You set various parameters such as FREELISTS, FREELIST GROUPS, PCTUSED, and others to control how space is allocated, used, and reused in a segment over time. I will refer to this space management method in this chapter as MSSM, but bear in mind that that is a made-up abbreviation that you will not find in the Oracle documentation.
  • Automatic Segment Space Management (ASSM): You control one parameter relating to how space is used: PCTFREE. The others are accepted when the segment is created, but they are ignored.

MSSM is the legacy implementation in Oracle. It has been around for many years, over many versions. ASSM was first introduced in Oracle9i Release 1; it was designed to eliminate the need to fine-tune the myriad parameters used to control space allocation, while providing high concurrency. For example, with the FREELISTS parameter left at its default of 1, your insert/update-intensive segments may suffer from contention on free space allocation. When Oracle goes to insert a row into a table, or update an index key entry, or update a row causing the row to migrate (more on that in a moment as well), it may need to get a block from the list of free blocks associated with the segment. If there is only one list, only one transaction at a time may review and modify this list—they would have to wait for each other. Multiple FREELISTS and FREELIST GROUPS serve the purpose of increasing concurrency in such a case, as the transactions may each be looking at different lists and not contending with each other.

When I discuss the storage settings shortly, I will mention which are for manual and which are for automatic segment space management, but in the area of storage/segment characteristics, the only storage settings that apply to ASSM segments are as follows:

  • BUFFER_POOL
  • PCTFREE
  • INITRANS
  • MAXTRANS (only in 9i; in 10g this is ignored for all segments)

The remaining storage and physical attribute parameters do not apply to ASSM segments.

Segment space management is an attribute inherited from the tablespace in which a segment is contained (and segments never span tablespaces). For a segment to use ASSM, it would have to reside in a tablespace that supported that method of space management.
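
You can verify which method a given tablespace uses (and hence which method your segments will inherit) with a simple dictionary query:

select tablespace_name, segment_space_management
  from user_tablespaces;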

High-Water Mark

This is a term used with table segments stored in the database. If you envision a table, for example, as a "flat" structure or as a series of blocks laid one after the other in a line from left to right, the high-water mark (HWM) would be the rightmost block that ever contained data, as illustrated in Figure 10-1.

Figure 10-1 shows that the HWM starts at the first block of a newly created table. As data is placed into the table over time and more blocks get used, the HWM rises. If we delete some (or even all) of the rows in the table, we might have many blocks that no longer contain data, but they are still under the HWM, and they will remain under the HWM until the object is rebuilt, truncated, or shrunk (shrinking of a segment is a new Oracle 10g feature that is supported only if the segment is in an ASSM tablespace).

The HWM is relevant since Oracle will scan all blocks under the HWM, even when they contain no data, during a full scan. This will impact the performance of a full scan—especially if most of the blocks under the HWM are empty. To see this, just create a table with 1,000,000 rows (or create any table with a large number of rows), and then execute a SELECT COUNT(*) from this table. Now, DELETE every row in it and you will find that the SELECT COUNT(*) takes just as long (or longer, if you need to clean out the block; refer to the "Block Cleanout" section of Chapter 9) to count 0 rows as it did to count 1,000,000. This is because Oracle is busy reading all of the blocks below the HWM to see if they contain data. You should compare this to what happens if you used TRUNCATE on the table instead of deleting each individual row. TRUNCATE will reset the HWM of a table back to "zero" and will truncate the associated indexes on the table as well. If you plan on deleting every row in a table, TRUNCATE—if it can be used—would be the method of choice for this reason.

image

Figure 10-1. HWM depiction
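
A minimal sketch of that experiment follows (I use ALL_OBJECTS as a convenient large row source; any table with many rows will do):

create table big_table
as
select * from all_objects;       -- load tens of thousands of rows

select count(*) from big_table;  -- full scans every block below the HWM

delete from big_table;
commit;

select count(*) from big_table;  -- counts 0 rows but reads just as many blocks

truncate table big_table;        -- resets the HWM

select count(*) from big_table;  -- now reads almost no blocks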

In an MSSM tablespace, segments have a definite HWM. In an ASSM tablespace, however, there is an HWM and a low HWM (see Figure 10-2). In MSSM, when the HWM is advanced (e.g., as rows are inserted), all of the blocks are formatted and valid, and Oracle can read them safely. With ASSM, however, when the HWM is advanced, Oracle doesn't format all of the blocks immediately—they are only formatted and made safe to read upon their first use. So, when full scanning a segment, we have to know if the blocks to be read are "safe" or unformatted (meaning they contain nothing of interest and we do not process them). To make it so that not every block in the table need go through this safe/not safe check, Oracle maintains a low HWM and an HWM. Oracle will full scan the table up to the HWM—and for all of the blocks below the low HWM, it will just read and process them. For blocks between the low HWM and the HWM, it must be more careful and refer to the ASSM bitmap information used to manage these blocks to see which of them it should read and which it should just ignore.

image

Figure 10-2. Low HWM depiction

FREELISTS

When you use an MSSM tablespace, the freelist is where Oracle keeps track of blocks under the HWM for objects that have free space on them.


Note Freelists and freelist groups do not pertain to ASSM tablespaces at all; only MSSM tablespaces use this technique.


Each object will have at least one freelist associated with it, and as blocks are used, they will be placed on or taken off of the freelist as needed. It is important to note that only blocks under the HWM of an object will be found on the freelist. The blocks that remain above the HWM will be used only when the freelists are empty, at which point Oracle advances the HWM and adds these blocks to the freelist. In this fashion, Oracle postpones increasing the HWM for an object until it has to.

An object may have more than one freelist. If you anticipate heavy INSERT or UPDATE activity on an object by many concurrent users, then configuring more than one freelist can have a major positive impact on performance (at the cost of possible additional storage). Having sufficient freelists for your needs is crucial.

FREELISTS can be a huge positive performance influence (or inhibitor) in an environment with many concurrent inserts and updates. An extremely simple test can show the benefits of setting FREELISTS correctly. Consider this relatively simple table:

ops$tkyte@ORA10GR1> create table t ( x int, y char(50) ) tablespace MSSM;
Table created.

Using five concurrent sessions, we start inserting into this table like wild. If we measure the systemwide wait events for block-related waits both before and after inserting, we will find large waits, especially on data blocks (trying to insert data). This is frequently caused by insufficient freelists on tables (and on indexes, but we'll cover that in detail in the next chapter). I used Statspack for this—I took a statspack.snap, executed a script that started the five concurrent SQL*Plus sessions, and waited for them to exit before taking another statspack.snap. The script these sessions ran was simply as follows:

begin
    for i in 1 .. 100000
    loop
        insert into t values ( i, 'x' );
    end loop;
    commit;
end;
/
exit;

Now, this is a very simple block of code, and I'm the only user in the database here. I should get the best possible performance. I have plenty of buffer cache configured, my redo logs are sized appropriately, indexes won't be slowing things down, and I'm running on a machine with two hyperthreaded Xeon CPUs—this should run fast. What I discovered afterward, however, is the following:

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:       793 29-Apr-05 13:45:36       15       3.9
  End Snap:       794 29-Apr-05 13:46:34       15       5.7
   Elapsed:                0.97 (mins)

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                      % Total
Event                                               Waits    Time (s) Call Time
-------------------------------------------- ------------ ----------- ---------
CPU time                                                          165     53.19
buffer busy waits                                 368,698         119     38.43
log file parallel write                             1,323          21      6.86
latch: cache buffers chains                           355           2       .67
enq: HW - contention                                2,828           1       .24

I collectively waited 119 seconds, or about 24 seconds per session, on buffer busy waits. These waits are caused entirely by the fact that there are not enough freelists configured on my table for the type of concurrent activity that is taking place. I can eliminate most of that wait time easily, just by creating the table with multiple freelists:

ops$tkyte@ORA10GR1> create table t ( x int, y char(50) )
   2  storage( freelists 5 ) tablespace MSSM;
Table created.

or by altering the object:

ops$tkyte@ORA10GR1> alter table t storage ( FREELISTS 5 );
Table altered.

You will find that the buffer busy waits go way down, and the amount of CPU needed (since you are doing less work here; competing for a latched data structure can really burn CPU) also goes down, along with the elapsed time:

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:       809 29-Apr-05 14:04:07       15       4.0
  End Snap:       810 29-Apr-05 14:04:41       14       6.0
   Elapsed:                0.57 (mins)

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                      % Total
Event                                               Waits    Time (s) Call Time
-------------------------------------------- ------------ ----------- ---------
CPU time                                                          122     74.66
buffer busy waits                                  76,538          24     14.94
log file parallel write                               722          14      8.45
latch: cache buffers chains                           144           1       .63
enq: HW - contention                                  678           1       .46

What you want to do for a table is try to determine the maximum number of concurrent (truly concurrent) inserts or updates that will require more space. What I mean by "truly concurrent" is how often you expect two people at exactly the same instant to request a free block for that table. This is not a measure of overlapping transactions; it is a measure of how many sessions are doing inserts at the same time, regardless of transaction boundaries. You want to have about as many freelists as concurrent inserts into the table to increase concurrency.

You should just set freelists really high and then not worry about it, right? Wrong—of course, that would be too easy. When you use multiple freelists, there is a master freelist and process freelists. If a segment has a single freelist, then the master and process freelists are one and the same thing. If you have two freelists, you'll really have one master freelist and two process freelists. A given session will be assigned to a single process freelist based on a hash of its session ID. Now, each process freelist will have very few blocks on it—the remaining free blocks are on the master freelist. As a process freelist is used, it will pull a few blocks from the master freelist as needed. If the master freelist cannot satisfy the space requirement, then Oracle will advance the HWM and add empty blocks to the master freelist. So, over time, the master freelist will fan out its storage over the many process freelists (again, each of which has only a few blocks on it). So, each process will use a single process freelist. It will not go from process freelist to process freelist to find space. This means that if you have ten process freelists on a table, and the one your process is using exhausts the free blocks on its list, it will not go to another process freelist for space—even if the other nine process freelists have five blocks each (45 blocks in total), it will go to the master freelist. Assuming the master freelist cannot satisfy the request for a free block, Oracle would advance the HWM or, if the HWM cannot be advanced (all the space is used), extend the table (get another extent). The process will then continue to use the space on its freelist only (which is no longer empty). There is a tradeoff to be made with multiple freelists. On one hand, use of multiple freelists is a huge performance booster. On the other hand, it will probably cause the table to use slightly more disk space than absolutely necessary. You will have to decide which is less bothersome in your environment.

Do not underestimate the usefulness of the FREELISTS parameter, especially since you can alter it up and down at will with Oracle 8.1.6 and later. What you might do is alter it to a large number to perform some load of data in parallel with the conventional path mode of SQL*Loader. You will achieve a high degree of concurrency for the load with minimum waits. After the load, you can reduce the value to some more reasonable, day-to-day number. The blocks on the many existing freelists will be merged into the one master freelist when you alter the space down.
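
As a hedged sketch, the load-time pattern described above might look like this (the freelist counts are illustrative, not recommendations):

alter table t storage ( freelists 10 );

-- run the concurrent conventional path SQL*Loader sessions here

alter table t storage ( freelists 2 );
-- the blocks on the ten process freelists are merged back into
-- the one master freelist by this ALTER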

Another way to solve the previously mentioned issue of buffer busy waits is to use an ASSM managed tablespace. If you take the preceding example and create the table T in an ASSM managed tablespace as follows:

ops$tkyte@ORA10GR1> create tablespace assm
  2 datafile size 1m autoextend on next 1m
  3 segment space management auto;
Tablespace created.

ops$tkyte@ORA10GR1> create table t ( x int, y char(50) ) tablespace ASSM;
Table created.

you'll find the buffer busy waits, CPU time, and elapsed time to have decreased for this case as well—without having to figure out the optimum number of required freelists:

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:       812 29-Apr-05 14:12:37       15       3.9
  End Snap:       813 29-Apr-05 14:13:07       15       5.6
   Elapsed:                0.50 (mins)

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                      % Total
Event                                               Waits    Time (s) Call Time
-------------------------------------------- ------------ ----------- ---------
CPU time                                                          107     78.54
log file parallel write                               705          12      9.13
buffer busy waits                                  12,485          12      8.52
latch: library cache                                   68           1       .70
LGWR wait for redo copy                             3,794           1       .47

This is one of ASSM's main purposes: to remove the need to manually determine the correct settings for many key storage parameters.

PCTFREE and PCTUSED

In general, the PCTFREE parameter tells Oracle how much space should be reserved on a block for future updates. By default, this is 10 percent. If there is a higher percentage of free space than the value specified in PCTFREE, then the block is considered to be "free." PCTUSED tells Oracle the percentage of free space that needs to be present on a block that is not currently "free" in order for it to become free again. The default value is 40 percent.

As noted earlier, when used with a table (but not an IOT, as we'll see), PCTFREE tells Oracle how much space should be reserved on a block for future updates. This means if we use an 8KB blocksize, as soon as the addition of a new row onto a block causes the free space on the block to drop below about 800 bytes, Oracle will use another block from the FREELIST instead of the existing block. This 10 percent of the data space on the block is set aside for updates to the rows on that block.


Note PCTFREE and PCTUSED are implemented differently for different table types. Some table types employ both, whereas others only use PCTFREE, and even then only when the object is created. IOTs use PCTFREE upon creation to set aside space in the table for future updates, but do not use PCTFREE to decide when to stop inserting rows into a given block, for example.


The exact effect of these two parameters varies depending on whether you are using ASSM or MSSM tablespaces. When you are using MSSM, these parameter settings control when the block will be put on and taken off the freelist. If you are using the default values for PCTFREE (10) and PCTUSED (40), then a block will remain on the freelist until it is 90 percent full (10 percent free space). Once it hits 90 percent, it will be taken off the freelist and remain off the freelist until the free space on the block exceeds 60 percent of the block.

When you are using ASSM, PCTFREE still controls whether a new row may be inserted into a block, but it does not control whether a block is on a freelist or not, as ASSM does not use freelists at all. In ASSM, PCTUSED is simply ignored.

There are three settings for PCTFREE: too high, too low, and just about right. If you set PCTFREE for blocks too high, you will waste space. If you set PCTFREE to 50 percent and you never update the data, you have just wasted 50 percent of every block. On another table, however, 50 percent may be very reasonable. If the rows start out small and tend to double in size, setting PCTFREE too small will cause row migration as you update the rows.

Row Migration

What exactly is row migration? Row migration is when a row is forced to leave the block it was created on because it grew too large to fit on that block with the rest of the rows. I'll illustrate a row migration in this section. We'll start with a block that looks like Figure 10-3.

image

Figure 10-3. Data block before update

Approximately one-seventh of the block is free space. However, we would like to more than double the amount of space used by row 4 via an UPDATE (it currently consumes one-seventh of the block). In this case, even if Oracle coalesced the space on the block as shown in Figure 10-4, there is still insufficient room to double the size of row 4, because the size of the free space is less than the current size of row 4.

image

Figure 10-4. Data block as it would appear after coalescing free space

If the row fit into the coalesced space, then this would have happened. This time, however, Oracle will not perform this coalescing and the block will remain as it is. Since row 4 would have to span more than one block if it stayed on this block, Oracle will move, or migrate, the row. However, Oracle cannot just move the row—it must leave behind a "forwarding address." There may be indexes that physically point to this address for row 4. A simple update will not modify the indexes as well. (Note that there is a special case with partitioned tables that a rowid, the address of a row, will change. We will look at this case in Chapter 13.) Therefore, when Oracle migrates the row, it will leave behind a pointer to where the row really is. After the update, the blocks might look as shown in Figure 10-5.

image

Figure 10-5. Migrated row depiction

So, a migrated row is a row that had to move from the block it was inserted into, onto some other block. Why is this an issue? Your application will never know; the SQL you use is no different. It only matters for performance reasons. If you go to read this row via an index, the index will point to the original block. That block will point to the new block. Instead of doing the two or so I/Os to read the index plus one I/O to read the table, you'll need to do yet one more I/O to get to the actual row data. In isolation, this is no big deal—you won't even notice it. However, when you have a sizable percentage of your rows in this state, with lots of users accessing them, you'll begin to notice this side effect. Access to this data will start to slow down (additional I/Os and the associated latching that goes with the I/O add to the access time), your buffer cache efficiency goes down (you need to buffer two blocks instead of just the one you would if the rows were not migrated), and your table grows in size and complexity. For these reasons, you do not want migrated rows.

It is interesting to note what Oracle will do if the row that was migrated from the block on the left to the block on the right, in Figure 10-5, has to migrate again at some future point in time. This would be due to other rows being added to the block it was migrated to and then updating this row to make it even larger. Oracle will actually migrate the row back to the original block and, if there is sufficient space, leave it there (the row might become "unmigrated"). If there isn't sufficient space, Oracle will migrate the row to another block altogether and change the forwarding address on the original block. As such, row migrations will always involve one level of indirection.

So, now we are back to PCTFREE and what it is used for: when set properly, it is the setting that will help you minimize row migration.

Setting PCTFREE and PCTUSED Values

Setting PCTFREE and PCTUSED is an important—and greatly overlooked—topic. In summary, PCTUSED and PCTFREE are both crucial when using MSSM; with ASSM, only PCTFREE is. On one hand, you need to use them to avoid too many rows from migrating. On the other hand, you use them to avoid wasting too much space. You need to look at your objects and describe how they will be used, and then you can come up with a logical plan for setting these values. Rules of thumb may very well fail you on these settings; they really need to be set based on usage. You might consider the following (keeping in mind that "high" and "low" are relative terms, and that when using ASSM only PCTFREE applies):

  • High PCTFREE, low PCTUSED: This setting is for when you insert lots of data that will be updated and the updates will increase the size of the rows frequently. This setting reserves a lot of space on the block after inserts (high PCTFREE) and makes it so that the block must almost be empty before getting back onto the freelist (low PCTUSED).
  • Low PCTFREE, high PCTUSED: This setting is for tables where you tend only to INSERT or DELETE, or where, if you do UPDATE, the UPDATE tends to shrink the row in size.
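
To make those two profiles concrete, here is a minimal sketch (MSSM assumed; the table names and percentages are illustrative, not recommendations):

create table grows_a_lot
( id    int,
  notes varchar2(4000)
)
pctfree 25 pctused 10;           -- rows inserted small, updated to be much larger

create table insert_delete_only
( id    int,
  notes varchar2(40)
)
pctfree 5 pctused 80;            -- rows never grow; pack the blocks tightly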

LOGGING and NOLOGGING

Normally objects are created in a LOGGING fashion, meaning all operations performed against them that can generate redo will generate it. NOLOGGING allows certain operations to be performed against that object without the generation of redo; we covered this in the last chapter in some detail. NOLOGGING affects only a few specific operations, such as the initial creation of the object, or direct path loads using SQL*Loader, or rebuilds (see the Oracle SQL Reference manual for the database object you are working with to see which operations apply).

This option does not disable redo log generation for the object in general—only for very specific operations. For example, if I create a table using CREATE TABLE AS SELECT with NOLOGGING and then INSERT INTO THAT_TABLE VALUES ( 1 ), the INSERT will be logged, but the table creation might not have been (the DBA can force logging at the database or tablespace level).
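
For instance, a minimal sketch of that difference (assuming the database and tablespace are not in FORCE LOGGING mode; the table name is illustrative):

create table t_nolog nologging
as
select * from all_objects;       -- direct load at creation: redo may be skipped

insert into t_nolog
select * from all_objects;       -- conventional insert: fully logged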

INITRANS and MAXTRANS

Each block in a segment has a block header. Part of this block header is a transaction table. Entries will be made in the transaction table to describe which transactions have what rows/elements on the block locked. The initial size of this transaction table is specified by the INITRANS setting for the object. For tables, this defaults to 2 (indexes default to 2). This transaction table will grow dynamically as needed up to MAXTRANS entries in size (given sufficient free space on the block, that is). Each allocated transaction entry consumes 23 to 24 bytes of storage in the block header. Note that as of Oracle 10g, MAXTRANS is ignored—all segments have a MAXTRANS of 255.
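
As a hedged sketch (the table name is illustrative), INITRANS may be set at creation time, and it may be raised later, but the ALTER affects only blocks formatted from that point on:

create table hot_blocks
( x int,
  y varchar2(100)
)
initrans 5;                      -- five transaction slots preallocated per block

alter table hot_blocks initrans 10;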

Heap Organized Tables

A heap organized table is probably used 99 percent (or more) of the time in applications, although that might change over time with the advent of IOTs, since they can themselves be indexed. A heap organized table is the type of table you get by default when you issue the CREATE TABLE statement. If you want any other type of table structure, you need to specify that in the CREATE statement itself.

A heap is a classic data structure studied in computer science. It is basically a big area of space, disk, or memory (disk in the case of a database table, of course), which is managed in an apparently random fashion. Data will be placed where it fits best, rather than in any specific sort of order. Many people expect data to come back out of a table in the same order it was put into it, but with a heap, this is definitely not assured. In fact, rather the opposite is guaranteed: the rows will come out in a wholly unpredictable order. This is quite easy to demonstrate.

In this example, I will set up a table such that in my database I can fit one full row per block (I am using an 8KB blocksize). You do not need to have the case where you only have one row per block—I am just taking advantage of that to demonstrate a predictable sequence of events. The following sort of behavior (that rows have no order) will be observed on tables of all sizes, in databases with any blocksize:

ops$tkyte@ORA10GR1> create table t
  2  ( a int,
  3    b varchar2(4000) default rpad('*',4000,'*'),
  4    c varchar2(3000) default rpad('*',3000,'*')
  5  )
  6  /
Table created.

ops$tkyte@ORA10GR1> insert into t (a) values ( 1);
1 row created.

ops$tkyte@ORA10GR1> insert into t (a) values ( 2);
1 row created.

ops$tkyte@ORA10GR1> insert into t (a) values ( 3);
1 row created.

ops$tkyte@ORA10GR1> delete from t where a = 2 ;
1 row deleted.

ops$tkyte@ORA10GR1> insert into t (a) values ( 4);
1 row created.

ops$tkyte@ORA10GR1> select a from t;

     A
----------
     1
     4
     3

Adjust columns B and C to be appropriate for your blocksize if you would like to reproduce this. For example, if you have a 2KB blocksize, you do not need column C, and column B should be a VARCHAR2(1500) with a default of 1,500 asterisks. Since data is managed in a heap in a table like this, as space becomes available, it will be reused.


Note When using ASSM or MSSM, you'll find rows end up in "different places." The underlying space management routines are very different, and the same operations executed against a table in ASSM and MSSM may well result in different physical order. The data will logically be the same, but it will be stored in different ways.


A full scan of the table will retrieve the data as it hits it, never in the order of insertion. This is a key concept to understand about database tables: in general, they are inherently unordered collections of data. You should also note that I do not need to use a DELETE in order to observe this effect; I could achieve the same results using only INSERTs. If I insert a small row, followed by a very large row that will not fit on the block with the small row, and then a small row again, I may very well observe that the rows come out by default in the order "small row, small row, large row." They will not be retrieved in the order of insertion. Oracle will place the data where it fits, not in any order by date or transaction.

If your query needs to retrieve data in order of insertion, you must add a column to the table that you can use to order the data when retrieving it. That column could be a number column, for example, maintained with an increasing sequence (using the Oracle SEQUENCE object). You could then approximate the insertion order using a SELECT that did an ORDER BY on this column. It will be an approximation, because the row with sequence number 55 may very well have committed before the row with sequence number 54, and was therefore officially "first" in the database.
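
A minimal sketch of that approach (names are illustrative):

create sequence audit_seq;

create table audit_trail
( seq_no  number,
  payload varchar2(80)
);

insert into audit_trail values ( audit_seq.nextval, 'first event' );
insert into audit_trail values ( audit_seq.nextval, 'second event' );

select payload
  from audit_trail
 order by seq_no;                -- approximates, but does not guarantee, commit order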

You should think of a heap organized table as a big unordered collection of rows. These rows will come out in a seemingly random order, and depending on other options being used (parallel query, different optimizer modes, and so on), they may come out in a different order with the same query. Do not ever count on the order of rows from a query unless you have an ORDER BY clause on your query!

That aside, what is important to know about heap tables? Well, the CREATE TABLE syntax spans some 72 pages in the Oracle SQL Reference manual, so there are lots of options that go along with them. There are so many options that getting a hold on all of them is pretty difficult. The "wire diagrams" (or "train track" diagrams) alone take 18 pages to cover. One trick I use to see most of the options available to me in the CREATE TABLE statement for a given table is to create the table as simply as possible, for example:

ops$tkyte@ORA10GR1> create table t
  2  ( x int primary key,
  3    y date,
  4    z clob
  5  )
  6  /
Table created.

Then, using the standard supplied package DBMS_METADATA, I query the definition of it and see the verbose syntax:

ops$tkyte@ORA10GR1> select dbms_metadata.get_ddl( 'TABLE', 'T' ) from dual;

DBMS_METADATA.GET_DDL('TABLE','T')
-------------------------------------------------------------------------------
  CREATE TABLE "OPS$TKYTE"."T"
   (    "X" NUMBER(*,0),
        "Y" DATE,
        "Z" CLOB,
         PRIMARY KEY ("X")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "USERS" ENABLE
   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "USERS"
 LOB ("Z") STORE AS (
  TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10
  NOCACHE
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT))

The nice thing about this trick is that it shows many of the options for my CREATE TABLE statement. I just have to pick data types and such, and Oracle will produce the verbose version for me. I can now customize this verbose version, perhaps changing the ENABLE STORAGE IN ROW to DISABLE STORAGE IN ROW, which would disable the storage of the LOB data in the row with the structured data, causing it to be stored in another segment. I use this trick all of the time to save the couple minutes of confusion I would otherwise have if I were trying to figure this all out from the huge wire diagrams. I can also use this technique to learn what options are available to me on the CREATE TABLE statement under different circumstances.
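
As a related aside, DBMS_METADATA can also be asked to suppress the storage and segment attribute clauses when you want terser output; a hedged sketch using its documented transform parameters:

begin
    dbms_metadata.set_transform_param
    ( dbms_metadata.session_transform, 'SEGMENT_ATTRIBUTES', false );
end;
/
select dbms_metadata.get_ddl( 'TABLE', 'T' ) from dual;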

Now that you know how to see most of the options available to you on a given CREATE TABLE statement, which are the important ones you need to be aware of for heap tables? In my opinion, there are two with ASSM and four with MSSM:

  • FREELISTS: MSSM only. Every table manages the blocks it has allocated in the heap on a freelist. A table may have more than one freelist. If you anticipate heavy insertion into a table by many concurrent users, configuring more than one freelist can have a major positive impact on performance (at the cost of possible additional storage). Refer to the previous discussion and example in the section "FREELISTS" for the sort of impact this setting can have on performance.
  • PCTFREE: Both ASSM and MSSM. A measure of how full a block can be is made during the INSERT process. As shown earlier, this is used to control whether a row may be added to a block or not based on how full the block currently is. This option is also used to control row migrations caused by subsequent updates and needs to be set based on how you use the table.
  • PCTUSED: MSSM only. A measure of how empty a block must become before it can be a candidate for insertion again. A block that has less than PCTUSED space used is a candidate for insertion of new rows. Again, like PCTFREE, you must consider how you will be using your table to set this option appropriately.
  • INITRANS: Both ASSM and MSSM. The number of transaction slots initially allocated to a block. If set too low (it defaults to and has a minimum of 2), this option can cause concurrency issues in a block that is accessed by many users. If a database block is nearly full and the transaction list cannot be dynamically expanded, sessions will queue up waiting for this block, as each concurrent transaction needs a transaction slot. If you believe you will have many concurrent updates to the same blocks, you should consider increasing this value.

Note LOB data that is stored out of line in the LOB segment does not make use of the PCTFREE/PCTUSED parameters set for the table. These LOB blocks are managed differently: they are always filled to capacity and returned to the freelist only when completely empty.


These are the parameters you want to pay particularly close attention to. With the introduction of locally managed tablespaces, which are highly recommended, I find that the rest of the storage parameters (such as PCTINCREASE, NEXT, and so on) are simply not relevant anymore.

Index Organized Tables

Index organized tables (IOTs) are, quite simply, tables stored in an index structure. Whereas a table stored in a heap is unorganized (i.e., data goes wherever there is available space), data in an IOT is stored and sorted by primary key. IOTs behave just like "regular" tables do as far as your application is concerned; you use SQL to access them as normal. They are especially useful for information retrieval, spatial, and OLAP applications.

What is the point of an IOT? You might ask the converse, actually: what is the point of a heap organized table? Since all tables in a relational database are supposed to have a primary key anyway, isn't a heap organized table just a waste of space? We have to make room for both the table and the index on the primary key of the table when using a heap organized table. With an IOT, the space overhead of the primary key index is removed, as the index is the data and the data is the index. The fact is that an index is a complex data structure that requires a lot of work to manage and maintain, and the maintenance requirements increase as the width of the row to store increases. A heap, on the other hand, is trivial to manage by comparison. There are efficiencies in a heap organized table over an IOT. That said, IOTs have some definite advantages over their heap counterparts. For example, I remember once building an inverted list index on some textual data (this predated the introduction of interMedia and related technologies). I had a table full of documents, and I would parse the documents and find words within them. I had a table that then looked like this:

create table keywords
( word varchar2(50),
  position  int,
  doc_id int,
  primary key(word,position,doc_id)
);

Here I had a table that consisted solely of columns of the primary key. I had over 100 percent overhead; the size of my table and primary key index were comparable (actually, the primary key index was larger since it physically stored the rowid of the row it pointed to, whereas a rowid is not stored in the table—it is inferred). I only used this table with a WHERE clause on the WORD or WORD and POSITION columns. That is, I never used the table—I used only the index on the table. The table itself was no more than overhead. I wanted to find all documents containing a given word (or "near" another word, and so on). The heap table was useless, and it just slowed down the application during maintenance of the KEYWORDS table and doubled the storage requirements. This is a perfect application for an IOT.

Another implementation that begs for an IOT is a code lookup table. Here you might have ZIP_CODE to STATE lookup, for example. You can now do away with the heap table and just use an IOT itself. Anytime you have a table that you access via its primary key exclusively, it is a candidate for an IOT.
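
For example, a ZIP code lookup implemented as an IOT might be no more than this (a minimal sketch; names are illustrative):

create table zip_lookup
( zip_code varchar2(5) primary key,
  state    varchar2(2)
)
organization index;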

When you want to enforce co-location of data or you want data to be physically stored in a specific order, the IOT is the structure for you. For users of Sybase and SQL Server, this is where you would have used a clustered index, but IOTs go one better. A clustered index in those databases may have up to a 110 percent overhead (similar to the previous KEYWORDS table example). Here, we have a 0 percent overhead since the data is stored only once. A classic example of when you might want this physically co-located data would be in a parent/child relationship. Let's say the EMP table had a child table containing addresses. You might have a home address entered into the system when the employee is initially sent an offer letter for a job, and later he adds his work address. Over time, he moves and changes the home address to a previous address and adds a new home address. Then he has a school address he added when he went back for a degree, and so on. That is, the employee has three or four (or more) detail records, but these details arrive randomly over time. In a normal heap-based table, they just go "anywhere." The odds that two or more of the address records would be on the same database block in the heap table are very near zero. However, when you query an employee's information, you always pull the address detail records as well. The rows that arrive over time are always retrieved together. To make the retrieval more efficient, you can use an IOT for the child table to put all of the records for a given employee "near" each other upon insertion, so when you retrieve them over and over again, you do less work.

An example will easily show the effects of using an IOT to physically co-locate the child table information. Let's create and populate an EMP table:

ops$tkyte@ORA10GR1> create table emp
  2  as
  3  select object_id   empno,
  4         object_name ename,
  5         created     hiredate,
  6         owner       job
  7    from all_objects
  8  /
Table created.

ops$tkyte@ORA10GR1> alter table emp add constraint emp_pk primary key(empno)
  2  /
Table altered.

ops$tkyte@ORA10GR1> begin
  2     dbms_stats.gather_table_stats( user, 'EMP', cascade=>true );
  3  end;
  4  /
PL/SQL procedure successfully completed.

Next, we'll implement the child table two times: once as a conventional heap table and again as an IOT:

ops$tkyte@ORA10GR1> create table heap_addresses
  2  ( empno     references emp(empno) on delete cascade,
  3    addr_type varchar2(10),
  4    street    varchar2(20),
  5    city      varchar2(20),
  6    state     varchar2(2),
  7    zip       number,
  8    primary key (empno,addr_type)
  9  )
 10  /
Table created.

ops$tkyte@ORA10GR1> create table iot_addresses
  2  ( empno     references emp(empno) on delete cascade,
  3    addr_type varchar2(10),
  4    street    varchar2(20),
  5    city      varchar2(20),
  6    state     varchar2(2),
  7    zip       number,
  8    primary key (empno,addr_type)
  9 )
 10 ORGANIZATION INDEX
 11 /
Table created.

I populated these tables by inserting into them a work address for each employee, then a home address, then a previous address, and finally a school address. A heap table would tend to place the data at "the end" of the table; as the data arrives, the heap table would simply add it to the end, since the data is just arriving and no data is being deleted. Over time, if addresses are deleted, the inserts would become more random throughout the table. But suffice it to say that the odds that an employee's work address would be on the same block as his home address in the heap table are near zero. For the IOT, however, since the key is on EMPNO,ADDR_TYPE, we'll be pretty sure that all of the addresses for a given EMPNO are located on one or maybe two index blocks together. The inserts used to populate this data were

ops$tkyte@ORA10GR1> insert into heap_addresses
  2  select empno, 'WORK', '123 main street', 'Washington', 'DC', 20123
  3    from emp;
48250 rows created.

ops$tkyte@ORA10GR1> insert into iot_addresses
  2  select empno, 'WORK', '123 main street', 'Washington', 'DC', 20123
  3    from emp;
48250 rows created.

I did that three more times, changing WORK to HOME, PREV, and SCHOOL in turn. Then I gathered statistics:

ops$tkyte@ORA10GR1> exec dbms_stats.gather_table_stats( user, 'HEAP_ADDRESSES' );
PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR1> exec dbms_stats.gather_table_stats( user, 'IOT_ADDRESSES' );
PL/SQL procedure successfully completed.

Now we are ready to see what measurable difference we could expect to see. Using AUTOTRACE, we'll get a feeling for the change:

ops$tkyte@ORA10GR1> set autotrace traceonly
ops$tkyte@ORA10GR1> select *
  2    from emp, heap_addresses
  3   where emp.empno = heap_addresses.empno
  4     and emp.empno = 42;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=8 Card=4 Bytes=336)
   1    0   NESTED LOOPS (Cost=8 Card=4 Bytes=336)
   2    1 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2 Card=1...
   3    2   INDEX (UNIQUE SCAN) OF 'EMP_PK' (INDEX (UNIQUE)) (Cost=1 Card=1)
   4    1 TABLE ACCESS (BY INDEX ROWID) OF 'HEAP_ADDRESSES' (TABLE) (Cost=6...
   5    4   INDEX (RANGE SCAN) OF 'SYS_C008078' (INDEX (UNIQUE)) (Cost=2 Card=4)

Statistics
----------------------------------------------------------
...
         11   consistent gets
...
          4   rows processed

That is a pretty common plan: go to the EMP table by primary key; get the row; then using that EMPNO, go to the address table; and using the index, pick up the child records. We did 11 I/Os to retrieve this data. Now running the same query, but using the IOT for the addresses

ops$tkyte@ORA10GR1> select *
  2    from emp, iot_addresses
  3   where emp.empno = iot_addresses.empno
  4     and emp.empno = 42;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=4 Bytes=336)
   1    0   NESTED LOOPS (Cost=4 Card=4 Bytes=336)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2 Card=1...
   3    2       INDEX (UNIQUE SCAN) OF 'EMP_PK' (INDEX (UNIQUE)) (Cost=1 Card=1)
   4    1     INDEX (RANGE SCAN) OF 'SYS_IOT_TOP_59615' (INDEX (UNIQUE)) (Cost=2...

Statistics
----------------------------------------------------------
...
          7    consistent gets
...
          4    rows processed

ops$tkyte@ORA10GR1> set autotrace off

we did four fewer I/Os (the four should have been guessable); we skipped four TABLE ACCESS (BY INDEX ROWID) steps. The more child records we have, the more I/Os we would anticipate skipping.

So, what is four I/Os? Well, in this case it was over one-third of the I/O performed for the query, and if we execute this query repeatedly, that would add up. Each I/O and each consistent get requires an access to the buffer cache, and while it is true that reading data out of the buffer cache is faster than disk, it is also true that the buffer cache gets are not free and not totally cheap. Each will require many latches of the buffer cache, and latches are serialization devices that will inhibit our ability to scale. We can measure both the I/O reduction as well as latching reduction by running a PL/SQL block such as this:

ops$tkyte@ORA10GR1> begin
  2      for x in ( select empno from emp )
  3      loop
  4          for y in ( select emp.ename, a.street, a.city, a.state, a.zip
  5                       from emp, heap_addresses a
  6                      where emp.empno = a.empno
  7                        and emp.empno = x.empno )
  8          loop
  9              null;
 10          end loop;
 11       end loop;
 12  end;
 13  /
PL/SQL procedure successfully completed.
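
For the TKPROF portions of this test, I enabled SQL trace around the blocks. A minimal sketch of that setup (the generated trace file name will vary by system):

alter session set timed_statistics = true;
alter session set sql_trace = true;

REM run the blocks, then format the trace file from the operating system:
REM   $ tkprof <generated_trace_file>.trc report.txt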

Here, we are just emulating a busy period and running the query some 48,000 times, once for each EMPNO. If we run that for the HEAP_ADDRESSES and IOT_ADDRESSES tables, TKPROF shows us the following:

SELECT EMP.ENAME, A.STREET, A.CITY, A.STATE, A.ZIP
  FROM EMP, HEAP_ADDRESSES A
 WHERE EMP.EMPNO = A.EMPNO AND EMP.EMPNO = :B1

call     count      cpu    elapsed       disk      query    current       rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1     0.00       0.00          0          0          0          0
Execute  48244     7.66       7.42          0          0          0          0
Fetch    48244      6.29      6.56          0     483393          0     192976
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    96489    13.95      13.98          0     483393          0     192976

Rows    Row Source Operation
------- ---------------------------------------------------
 192976 NESTED LOOPS (cr=483393 pr=0 pw=0 time=5730335 us)
  48244  TABLE ACCESS BY INDEX ROWID EMP (cr=144732 pr=0 pw=0 time=1594981 us)
  48244   INDEX UNIQUE SCAN EMP_PK (cr=96488 pr=0 pw=0 time=926147 us)...
 192976  TABLE ACCESS BY INDEX ROWID HEAP_ADDRESSES (cr=338661 pr=0 pw=0 time=...
 192976   INDEX RANGE SCAN SYS_C008073 (cr=145685 pr=0 pw=0 time=1105135 us)...
********************************************************************************
SELECT EMP.ENAME, A.STREET, A.CITY, A.STATE, A.ZIP
  FROM EMP, IOT_ADDRESSES A
 WHERE EMP.EMPNO = A.EMPNO AND EMP.EMPNO = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  48244      8.17       8.81          0          0          0           0
Fetch    48244      4.31       4.12          0     292918          0      192976
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    96489     12.48      12.93          0     292918          0      192976

Rows    Row Source Operation
------- ---------------------------------------------------
 192976 NESTED LOOPS (cr=292918 pr=0 pw=0 time=3429753 us)
  48244  TABLE ACCESS BY INDEX ROWID EMP (cr=144732 pr=0 pw=0 time=1615024 us)
  48244   INDEX UNIQUE SCAN EMP_PK (cr=96488 pr=0 pw=0 time=930931 us)...
 192976  INDEX RANGE SCAN SYS_IOT_TOP_59607 (cr=148186 pr=0 pw=0 time=1417238 us)...

Both queries fetched exactly the same number of rows, but the HEAP table performed considerably more logical I/O. As the degree of concurrency on the system goes up, we would expect the CPU used by the HEAP table to go up more rapidly as well, while the query possibly waits for latches on the buffer cache. Using runstats (a utility of my own design), we can measure the difference in latching. On my system, I observed the following:

STAT...consistent gets             484,065     293,566    -190,499
STAT...no work - consistent re     194,546       4,047    -190,499
STAT...consistent gets from ca     484,065     293,566    -190,499
STAT...session logical reads       484,787     294,275    -190,512
STAT...table fetch by rowid        241,260      48,260    -193,000
STAT...buffer is not pinned co     337,770      96,520    -241,250
LATCH.cache buffers chains         732,960     349,380    -383,580

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
990,344     598,750    -391,594    165.40%

where Run1 was the HEAP_ADDRESSES table and Run2 was the IOT_ADDRESSES table. As you can see, there was a dramatic and repeatable decrease in the latching taking place, mostly due to the cache buffers chains latch (the one that protects the buffer cache). The IOT in this case would provide the following benefits:

  • Increased buffer cache efficiency, as any given query needs to have fewer blocks in the cache
  • Decreased buffer cache access, which increases scalability
  • Less overall work to retrieve our data, so retrieval is faster
  • Possibly less physical I/O per query, as fewer distinct blocks are needed for any given query and a single physical I/O of the addresses most likely retrieves all of them (not just one of them, as the heap table implementation does)

The same would be true if you frequently use BETWEEN queries on a primary or unique key. Having the data stored physically sorted will increase the performance of those queries as well. For example, I maintain a table of stock quotes in my database. Every day, for hundreds of stocks, I gather the stock ticker, date, closing price, day's high, day's low, volume, and other related information. The table looks like this:

ops$tkyte@ORA10GR1> create table stocks
  2  ( ticker      varchar2(10),
  3    day         date,
  4    value       number,
  5    change      number,
  6    high        number,
  7    low         number,
  8    vol         number,
  9    primary key(ticker,day)
 10  )
 11  organization index
 12  /
Table created.

I frequently look at one stock at a time for some range of days (e.g., computing a moving average). If I were to use a heap organized table, the probability of two rows for the stock ticker ORCL existing on the same database block is almost zero. This is because every night, I insert the records for the day for all of the stocks. That fills up at least one database block (actually, many of them). Therefore, every day I add a new ORCL record, but it is on a block different from every other ORCL record already in the table. If I query as follows:

Select * from stocks
 where ticker = 'ORCL'
   and day between sysdate-100 and sysdate;

Oracle would read the index and then perform table access by rowid to get the rest of the row data. Each of the 100 rows I retrieve would be on a different database block due to the way I load the table—each would probably require a physical I/O. Now consider that I have this same data in an IOT. That same query only needs to read the relevant index blocks, and it already has all of the data. Not only is the table access removed, but all of the rows for ORCL in a given range of dates are physically stored "near" each other as well. We incur less logical I/O and less physical I/O.
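
As a sketch of the sort of moving-average query I have in mind (the 50-row window and column alias are illustrative), an analytic function does the computation in one pass over those rows:

select ticker, day,
       avg(value) over (order by day
                        rows between 49 preceding and current row) avg_50
  from stocks
 where ticker = 'ORCL'
 order by day;

With the IOT, every row feeding that window comes from the same few leaf blocks.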

Now you understand when you might want to use IOTs and how to use them. What you need to understand next is what the options are with these tables. What are the caveats? The options are very similar to the options for a heap organized table. Once again, we'll use DBMS_METADATA to show us the details. Let's start with the three basic variations of the IOT:

ops$tkyte@ORA10GR1> create table t1
  2  (  x int primary key,
  3     y varchar2(25),
  4     z date
  5  )
  6  organization index;
Table created.

ops$tkyte@ORA10GR1> create table t2
  2  (  x int primary key,
  3     y varchar2(25),
  4     z date
  5  )
  6  organization index
  7  OVERFLOW;
Table created.

ops$tkyte@ORA10GR1> create table t3
  2  (  x int primary key,
  3     y varchar2(25),
  4     z date
  5  )
  6  organization index
  7  overflow INCLUDING y;
Table created.

We'll get into what OVERFLOW and INCLUDING do for us, but first let's look at the detailed SQL required for the first table:

ops$tkyte@ORA10GR1> select dbms_metadata.get_ddl( 'TABLE', 'T1' ) from dual;

DBMS_METADATA.GET_DDL('TABLE','T1')
--------------------------------------------------------------------------------

  CREATE TABLE "OPS$TKYTE"."T1"
   (    "X" NUMBER(*,0),
        "Y" VARCHAR2(25),
        "Z" DATE,
         PRIMARY KEY ("X") ENABLE
   )
ORGANIZATION INDEX
NOCOMPRESS
PCTFREE 10 INITRANS 2 MAXTRANS 255 LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
        PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS"
PCTTHRESHOLD 50

This table introduces two new options, NOCOMPRESS and PCTTHRESHOLD, which we'll look at in a moment. You might have noticed that something is missing from the preceding CREATE TABLE syntax: there is no PCTUSED clause, but there is a PCTFREE. This is because an index is a complex data structure that isn't randomly organized like a heap, so data must go where it "belongs." Unlike in a heap, where blocks are sometimes available for inserts, blocks are always available for new entries in an index. If the data belongs on a given block because of its values, it will go there regardless of how full or empty the block is. Additionally, PCTFREE is used only when the object is created and populated with data in an index structure. It is not used like it is in the heap organized table. PCTFREE will reserve space on a newly created index, but not for subsequent operations on it, for much the same reason as PCTUSED is not used at all. The same considerations for freelists we had on heap organized tables apply in whole to IOTs.

Now, on to the newly discovered option NOCOMPRESS. This option is available to indexes in general. It tells Oracle to store each and every value in an index entry (i.e., do not compress). If the primary key of the object were on columns A, B, and C, every occurrence of A, B, and C would physically be stored. The converse to NOCOMPRESS is COMPRESS N, where N is an integer that represents the number of columns to compress. This removes repeating values and factors them out at the block level, so that the values of A and perhaps B that repeat over and over are no longer physically stored. Consider, for example, a table created like this:

ops$tkyte@ORA10GR1> create table iot
  2  ( owner, object_type, object_name,
  3    primary key(owner,object_type,object_name)
  4  )
  5  organization index
  6  NOCOMPRESS
  7  as
  8  select owner, object_type, object_name from all_objects
  9  /
Table created.

If you think about it, the value of OWNER is repeated many hundreds of times. Each schema (OWNER) tends to own lots of objects. Even the value pair of OWNER,OBJECT_TYPE repeats many times—a given schema will have dozens of tables, dozens of packages, and so on. Only all three columns together do not repeat. We can have Oracle suppress these repeating values. Instead of having an index block with the values shown in Table 10-1, we could use COMPRESS 2 (factor out the leading two columns) and have a block with the values shown in Table 10-2.

Table 10-1. Index Leaf Block, NOCOMPRESS

Sys,table,t1 Sys,table,t2 Sys,table,t3 Sys,table,t4
Sys,table,t5 Sys,table,t6 Sys,table,t7 Sys,table,t8
... ... ... ...
Sys,table,t100 Sys,table,t101 Sys,table,t102 Sys,table,t103


Table 10-2. Index Leaf Block, COMPRESS 2

Sys,table t1 t2 t3
t4 t5 ... ...
... t103 t104 ...
t300 t301 t302 t303

That is, the values SYS and TABLE appear once, and then the third column is stored. In this fashion, we can get many more entries per index block than we could otherwise. This does not decrease concurrency—we are still operating at the row level in all cases—or functionality at all. It may use slightly more CPU horsepower, as Oracle has to do more work to put together the keys again. On the other hand, it may significantly reduce I/O and allow more data to be cached in the buffer cache, since we get more data per block. That is a pretty good tradeoff.

Let's demonstrate the savings by doing a quick test of the preceding CREATE TABLE as SELECT with NOCOMPRESS, COMPRESS 1, and COMPRESS 2. We'll start by creating our IOT without compression:

ops$tkyte@ORA10GR1> create table iot
  2  ( owner, object_type, object_name,
  3    constraint iot_pk primary key(owner,object_type,object_name)
  4  )
  5  organization index
  6  NOCOMPRESS
  7  as
  8  select distinct owner, object_type, object_name
  9    from all_objects
 10  /
Table created.

Now we can measure the space used. We'll use the ANALYZE INDEX VALIDATE STRUCTURE command for this. This command populates a dynamic performance view named INDEX_STATS, which will contain only one row at most with the information from the last execution of that ANALYZE command:

ops$tkyte@ORA10GR1> analyze index iot_pk validate structure;
Index analyzed.

ops$tkyte@ORA10GR1> select lf_blks, br_blks, used_space,
  2         opt_cmpr_count, opt_cmpr_pctsave
  3    from index_stats;

   LF_BLKS    BR_BLKS USED_SPACE OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
---------- ---------- ---------- -------------- ----------------
       284          3    2037248              2               33

So, that shows our index is currently using 284 leaf blocks (where our data is) and 3 branch blocks (blocks Oracle uses to navigate the index structure) to find the leaf blocks. The space used is about 2MB (2,037,248 bytes). The other two oddly named columns are trying to tell us something. The OPT_CMPR_COUNT (optimum compression count) column is trying to say, "If you made this index COMPRESS 2, you would achieve the best compression." The OPT_CMPR_PCTSAVE (optimum compression percentage saved) is telling us that if we did the COMPRESS 2, we would save about one-third of the storage and the index would consume just two-thirds the disk space it is consuming now.


Note The next chapter covers the index structure in more detail.


To test that theory, we'll rebuild the IOT with COMPRESS 1 first:

ops$tkyte@ORA10GR1> alter table iot move compress 1;
Table altered.

ops$tkyte@ORA10GR1> analyze index iot_pk validate structure;
Index analyzed.

ops$tkyte@ORA10GR1> select lf_blks, br_blks, used_space,
  2         opt_cmpr_count, opt_cmpr_pctsave
  3    from index_stats;

   LF_BLKS    BR_BLKS USED_SPACE OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
---------- ---------- ---------- -------------- ----------------
       247          1    1772767              2               23

As you can see, the index is in fact smaller: about 1.7MB, with fewer leaf blocks and many fewer branch blocks. But now it is saying, "You still can get another 23 percent off," as we didn't chop off that much yet. Let's rebuild with COMPRESS 2:

ops$tkyte@ORA10GR1> alter table iot move compress 2;
Table altered.

ops$tkyte@ORA10GR1> analyze index iot_pk validate structure;
Index analyzed.

ops$tkyte@ORA10GR1> select lf_blks, br_blks, used_space,
  2         opt_cmpr_count, opt_cmpr_pctsave
  3    from index_stats;

   LF_BLKS    BR_BLKS USED_SPACE OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
---------- ---------- ---------- -------------- ----------------
       190          1    1359357              2                0

Now we are significantly reduced in size, both by the number of leaf blocks as well as overall used space, about 1.3MB. If we go back to the original numbers,

ops$tkyte@ORA10GR1> select (2/3) * 2037248 from dual;

(2/3)*2037248
-------------
   1358165.33

then we can see the OPT_CMPR_PCTSAVE was dead-on accurate. The preceding example points out an interesting fact with IOTs. They are tables, but only in name. Their segment is truly an index segment.

I am going to defer discussion of the PCTTHRESHOLD option at this point, as it is related to the next two options for IOTs: OVERFLOW and INCLUDING. If we look at the full SQL for the next two sets of tables, T2 and T3, we see the following (I've used a DBMS_METADATA routine to suppress the storage clauses, as they are not relevant to the example):

ops$tkyte@ORA10GR1> begin
  2    dbms_metadata.set_transform_param
  3    ( DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE', false );
  4  end;
  5  /
PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR1> select dbms_metadata.get_ddl( 'TABLE', 'T2' ) from dual;

DBMS_METADATA.GET_DDL('TABLE','T2')
--------------------------------------------------------------------------------
  CREATE TABLE "OPS$TKYTE"."T2"
   (    "X" NUMBER(*,0),
        "Y" VARCHAR2(25),
        "Z" DATE,
         PRIMARY KEY ("X") ENABLE
   ) ORGANIZATION INDEX NOCOMPRESS PCTFREE 10 INITRANS 2 MAXTRANS 255 LOGGING
  TABLESPACE "USERS"
 PCTTHRESHOLD 50 OVERFLOW
 PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 LOGGING
  TABLESPACE "USERS"

ops$tkyte@ORA10GR1> select dbms_metadata.get_ddl( 'TABLE', 'T3' ) from dual;

DBMS_METADATA.GET_DDL('TABLE','T3')
--------------------------------------------------------------------------------
  CREATE TABLE "OPS$TKYTE"."T3"
   (    "X" NUMBER(*,0),
        "Y" VARCHAR2(25),
        "Z" DATE,
         PRIMARY KEY ("X") ENABLE
   ) ORGANIZATION INDEX NOCOMPRESS PCTFREE 10 INITRANS 2 MAXTRANS 255 LOGGING
  TABLESPACE "USERS"
 PCTTHRESHOLD 50 INCLUDING "Y" OVERFLOW
 PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 LOGGING
  TABLESPACE "USERS"

So, now we have PCTTHRESHOLD, OVERFLOW, and INCLUDING left to discuss. These three items are intertwined, and their goal is to make the index leaf blocks (the blocks that hold the actual index data) able to store data efficiently. An index typically is on a subset of columns. You will generally find many times more row entries on an index block than you would on a heap table block. An index counts on being able to get many rows per block; otherwise, Oracle would spend large amounts of time maintaining the index, as each INSERT or UPDATE would probably cause an index block to split in order to accommodate the new data.

The OVERFLOW clause allows you to set up another segment (making an IOT a multisegment object, much like having a CLOB column does) into which the row data for the IOT can overflow when it gets too large.


Note The columns making up the primary key cannot overflow—they must be placed on the leaf blocks directly.
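
You can see the multiple segments in the data dictionary. A sketch (the SYS_IOT_TOP and SYS_IOT_OVER segment names embed object numbers, so yours will differ):

select segment_name, segment_type
  from user_segments
 where segment_name like 'SYS_IOT%';

You should find an INDEX segment for the IOT itself and a TABLE segment for its overflow.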


Notice that an OVERFLOW reintroduces the PCTUSED clause to an IOT when using MSSM. PCTFREE and PCTUSED have the same meanings for an OVERFLOW segment as they did for a heap table. The conditions for using an overflow segment can be specified in one of two ways:

  • PCTTHRESHOLD: When the amount of data in the row exceeds that percentage of the block, the trailing columns of that row will be stored in the overflow. So, if PCTTHRESHOLD was 10 percent and your blocksize was 8KB, any row that was greater than about 800 bytes in length would have part of it stored elsewhere, off the index block.
  • INCLUDING: All of the columns in the row up to and including the one specified in the INCLUDING clause are stored on the index block, and the remaining columns are stored in the overflow.

Given the following table with a 2KB blocksize:

ops$tkyte@ORA10GR1> create table iot
  2  (  x  int,
  3     y  date,
  4     z  varchar2(2000),
  5     constraint iot_pk primary key (x)
  6  )
  7  organization index
  8  pctthreshold 10
  9  overflow
 10  /
Table created.

Graphically, it could look as shown in Figure 10-6.

The gray boxes are the index entries, part of a larger index structure (in Chapter 11, you'll see a larger picture of what an index looks like). Briefly, the index structure is a tree, and the leaf blocks (where the data is stored) are in effect a doubly linked list to make it easier to traverse the nodes in order once we've found where we want to start in the index. The white box represents an OVERFLOW segment. This is where data that exceeds our PCTTHRESHOLD setting will be stored. Oracle will work backward from the last column up to but not including the last column of the primary key to find out what columns need to be stored in the overflow segment. In this example, the number column X and the date column Y will always fit in the index block. The last column, Z, is of varying length. When it is less than about 190 bytes or so (10 percent of a 2KB block is about 200 bytes; subtract 7 bytes for the date and 3 to 5 for the number), it will be stored on the index block. When it exceeds 190 bytes, Oracle will store the data for Z in the overflow segment and set up a pointer (a rowid, in fact) to it.

Figure 10-6. IOT with overflow segment, PCTTHRESHOLD clause
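
To make that threshold arithmetic concrete, here is a sketch of two inserts against that table (the 190-byte cutover is approximate, as computed above):

-- Z is small; the entire row stays on the index leaf block
insert into iot values ( 1, sysdate, rpad('x',100,'x') );

-- Z pushes the row past PCTTHRESHOLD; Z goes to the overflow segment,
-- and the leaf block keeps X, Y, and a rowid pointer to it
insert into iot values ( 2, sysdate, rpad('x',1000,'x') );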

The other option is to use the INCLUDING clause. Here we are stating explicitly what columns we want stored on the index block and which should be stored in the overflow. Given a CREATE TABLE like this:

ops$tkyte@ORA10GR1> create table iot
  2  (  x  int,
  3     y  date,
  4     z  varchar2(2000),
  5     constraint iot_pk primary key (x)
  6  )
  7  organization index
  8  including y
  9  overflow
 10  /
Table created.

what we can expect to find is illustrated in Figure 10-7.

Figure 10-7. IOT with OVERFLOW segment, INCLUDING clause

In this situation, regardless of the size of the data stored in it, Z will be stored "out of line" in the overflow segment.

Which is better, then: PCTTHRESHOLD, INCLUDING, or some combination of both? It depends on your needs. If you have an application that always, or almost always, uses the first four columns of a table and rarely accesses the last five columns, using INCLUDING would be appropriate. You would include up to the fourth column and let the other five be stored out of line. At runtime, if you need them, the columns will be retrieved in much the same way as a migrated or chained row would be. Oracle will read the "head" of the row, find the pointer to the rest of the row, and then read that. If, on the other hand, you cannot say that you almost always access these columns and hardly ever access those columns, you should give some consideration to PCTTHRESHOLD. Setting PCTTHRESHOLD is easy once you determine the number of rows you would like to store per index block on average. Suppose you wanted 20 rows per index block. Well, that means each row should be one-twentieth (5 percent). Your PCTTHRESHOLD would be 5, and each chunk of the row that stays on the index leaf block should consume no more than 5 percent of the block.

The last thing to consider with IOTs is indexing. You can have an index on an IOT itself—sort of like having an index on an index. These are called secondary indexes. Normally, an index contains the physical address of the row it points to, the rowid. An IOT secondary index cannot do this; it must use some other way to address the row. This is because a row in an IOT can move around a lot, and it does not "migrate" in the way a row in a heap organized table would. A row in an IOT is expected to be at some position in the index structure, based on its primary key value; it will only be moving because the size and shape of the index itself is changing. (We'll cover more about how index structures are maintained in the next chapter.) To accommodate this, Oracle introduced a logical rowid. These logical rowids are based on the IOT's primary key. They may also contain a "guess" as to the current location of the row, although this guess is almost always wrong because after a short while, data in an IOT tends to move. The guess is the physical address of the row in the IOT when it was first placed into the secondary index structure. If the row in the IOT has to move to another block, the guess in the secondary index becomes "stale." Therefore, an index on an IOT is slightly less efficient than an index on a regular table. On a regular table, an index access typically requires the I/O to scan the index structure and then a single read to read the table data. With an IOT, typically two scans are performed: one on the secondary structure and the other on the IOT itself. That aside, indexes on IOTs provide fast and efficient access to the data in the IOT using columns other than the primary key.
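
For example, to access the IOT_ADDRESSES table from earlier by a column other than the primary key, and later to refresh any stale guesses, a sketch would be (the index name is mine; ALTER INDEX ... UPDATE BLOCK REFERENCES recomputes the stored guesses):

create index iot_addresses_state_idx
    on iot_addresses(state);

alter index iot_addresses_state_idx
  update block references;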

Index Organized Tables Wrap-Up

Getting the right mix of data on the index block versus data in the overflow segment is the most critical part of the IOT setup. Benchmark various scenarios with different overflow conditions, and see how they will affect your INSERTs, UPDATEs, DELETEs, and SELECTs. If you have a structure that is built once and read frequently, stuff as much of the data onto the index block as you can. If you frequently modify the structure, you will have to achieve some balance between having all of the data on the index block (great for retrieval) versus reorganizing data in the index frequently (bad for modifications). The freelist consideration you had for heap tables applies to IOTs as well. PCTFREE and PCTUSED play two roles in an IOT. PCTFREE is not nearly as important for an IOT as for a heap table, and PCTUSED doesn't come into play normally. When considering an OVERFLOW segment, however, PCTFREE and PCTUSED have the same interpretation as they do for a heap table; set them for an overflow segment using the same logic as you would for a heap table.

Index Clustered Tables

I generally find people's understanding of what a cluster is in Oracle to be inaccurate. Many people tend to confuse a cluster with a SQL Server or Sybase "clustered index." They are not the same. A cluster is a way to store a group of tables that share some common column(s) in the same database blocks and to store related data together on the same block. A clustered index in SQL Server forces the rows to be stored in sorted order according to the index key, similar to an IOT as just described. With a cluster, a single block of data may contain data from many tables. Conceptually, you are storing the data "prejoined." It can also be used with single tables, where you are storing data together grouped by some column. For example, all of the employees in department 10 will be stored on the same block (or as few blocks as possible, if they all don't fit). It is not storing the data sorted—that is the role of the IOT. It is storing the data clustered by some key, but in a heap. So, department 100 might be right next to department 1, and very far away (physically on disk) from departments 101 and 99.

Graphically, you might think of it as shown in Figure 10-8. On the left side of the image, we are using conventional tables. EMP will be stored in its segment. DEPT will be stored on its own. They may be in different files and different tablespaces, and they are definitely in separate extents. On the right side of the image, we see what would happen if we clustered these two tables together. The square boxes represent database blocks. We now have the value 10 factored out and stored once. Then, all of the data from all of the tables in the cluster for department 10 is stored in that block. If all of the data for department 10 does not fit on the block, then additional blocks will be chained to the original block to contain the overflow, in the same fashion as the overflow blocks for an IOT.

Figure 10-8. Index clustered data

So, let's look at how we might go about creating a clustered object. Creating a cluster of tables is straightforward. The definition of the storage of the object (PCTFREE, PCTUSED, INITIAL, and so on) is associated with the CLUSTER, not the tables. This makes sense since there will be many tables in the cluster, and they will be on the same block. Having different PCTFREEs would not make sense. Therefore, a CREATE CLUSTER looks a lot like a CREATE TABLE with a small number of columns (just the cluster key columns):

ops$tkyte@ORA10GR1> create cluster emp_dept_cluster
  2  ( deptno number(2) )
  3  size 1024
  4  /
Cluster created.

Here, we have created an index cluster (the other type being a hash cluster, which we'll look at in the next section). The clustering column for this cluster will be the DEPTNO column. The columns in the tables do not have to be called DEPTNO, but they must be NUMBER(2), to match this definition. We have, on the cluster definition, a SIZE 1024 option. This is used to tell Oracle that we expect about 1,024 bytes of data to be associated with each cluster key value. Oracle will use that to compute the maximum number of cluster keys that could fit per block. Given that we have an 8KB blocksize, Oracle will fit up to seven cluster keys (but maybe fewer if the data is larger than expected) per database block. That is, the data for departments 10, 20, 30, 40, 50, 60, and 70 would tend to go onto one block, and as soon as we insert department 80, a new block will be used. That does not mean that the data is stored in a sorted manner; it just means that if we inserted the departments in that order, they would naturally tend to be put together. If we inserted the departments in the order 10, 80, 20, 30, 40, 50, 60, and then 70, the final department, 70, would tend to be on the newly added block. As we'll see shortly, both the size of the data and the order in which the data is inserted will affect the number of keys we can store per block.

The SIZE parameter therefore controls the maximum number of cluster keys per block. It is the single largest influence on the space utilization of our cluster. Set the size too high, and we'll get very few keys per block and we'll use more space than we need. Set the size too low, and we'll get excessive chaining of data, which defeats the purpose of the cluster: to store all of the data together on a single block. It is the most important parameter for a cluster.

Now for the cluster index on our cluster. We need to index the cluster before we can put data in it. We could create tables in the cluster right now, but we're going to create and populate the tables simultaneously, and we need a cluster index before we can have any data. The cluster index's job is to take a cluster key value and return the block address of the block that contains that key. It is a primary key in effect, where each cluster key value points to a single block in the cluster itself. So, when we ask for the data in department 10, Oracle will read the cluster key, determine the block address for that, and then read the data. The cluster key index is created as follows:

ops$tkyte@ORA10GR1> create index emp_dept_cluster_idx
  2  on cluster emp_dept_cluster
  3  /
Index created.

It can have all of the normal storage parameters of an index and can be stored in another tablespace. It is just a regular index, so it can be on multiple columns; it just happens to index into a cluster and can also include an entry for a completely null value (see Chapter 11 for the reason why this is interesting to note). Note that we do not specify a list of columns in this CREATE INDEX statement—that is derived from the CLUSTER definition itself. Now we are ready to create our tables in the cluster:

ops$tkyte@ORA10GR1> create table dept
  2  ( deptno  number(2) primary key,
  3    dname   varchar2(14),
  4    loc     varchar2(13)
  5  )
  6  cluster emp_dept_cluster(deptno)
  7  /
Table created.

ops$tkyte@ORA10GR1> create table emp
  2  ( empno    number primary key,
  3    ename    varchar2(10),
  4    job      varchar2(9),
  5    mgr      number,
  6    hiredate date,
  7    sal      number,
  8    comm     number,
  9    deptno number(2) references dept(deptno)
 10  )
 11  cluster emp_dept_cluster(deptno)
 12  /
Table created.

Here, the only difference from a "normal" table is that we used the CLUSTER keyword and told Oracle which column of the base table will map to the cluster key in the cluster itself. Remember, the cluster is the segment here; therefore, this table will never have segment attributes such as TABLESPACE, PCTFREE, and so on—they are attributes of the cluster segment, not the table we just created. We can now load them up with the initial set of data:

ops$tkyte@ORA10GR1> begin
  2      for x in ( select * from scott.dept )
  3      loop
  4          insert into dept
  5          values ( x.deptno, x.dname, x.loc );
  6          insert into emp
  7          select *
  8            from scott.emp
  9           where deptno = x.deptno;
 10      end loop;
 11  end;
 12  /
PL/SQL procedure successfully completed.

You might be wondering, "Why didn't we just insert all of the DEPT data and then all of the EMP data, or vice versa? Why did we load the data DEPTNO by DEPTNO like that?" The reason is in the design of the cluster. We are simulating a large, initial bulk load of a cluster. If we had loaded all of the DEPT rows first, we definitely would have gotten our seven keys per block (based on the SIZE 1024 setting we made), since the DEPT rows are very small (just a couple of bytes). When it came time to load up the EMP rows, we might have found that some of the departments had many more than 1,024 bytes of data. This would cause excessive chaining on those cluster key blocks. Oracle would chain, or link together, a list of blocks to contain this information. By loading all of the data for a given cluster key at the same time, we pack the blocks as tightly as possible and start a new block when we run out of room. Instead of Oracle putting up to seven cluster key values per block, it will put as many as can fit.

A quick example will show the difference between these two approaches. We'll add a large column to the EMP table: a CHAR(1000). This column will be used to make the EMP rows much larger than they are now. We'll load the cluster tables in two ways: we'll load DEPT and then EMP. The second time, we'll load by department number: a DEPT row, then all the EMP rows that go with it, and then the next DEPT. We'll look at the blocks each row ends up on, in the given case, to see which one best achieves the goal of co-locating the data by DEPTNO. Our EMP table looks like this:

ops$tkyte@ORA10GR1> create table emp
  2  ( empno    number primary key,
  3    ename    varchar2(10),
  4    job      varchar2(9),
  5    mgr      number,
  6    hiredate date,
  7    sal      number,
  8    comm     number,
  9    deptno   number(2) references dept(deptno),
 10    data     char(1000)
 11  )
 12  cluster emp_dept_cluster(deptno)
 13  /
Table created.

When we load the data into the DEPT and the EMP tables, we see that many of the EMP rows are not on the same block as the DEPT row anymore (DBMS_ROWID is a supplied package useful for peeking at the contents of a row ID):

ops$tkyte@ORA10GR1> insert into dept
  2  select * from scott.dept;
4 rows created.

ops$tkyte@ORA10GR1> insert into emp
  2  select emp.*, '*' from scott.emp;
14 rows created.

ops$tkyte@ORA10GR1> select dept_blk, emp_blk,
  2         case when dept_blk <> emp_blk then '*' end flag,
  3             deptno
  4    from (
  5  select dbms_rowid.rowid_block_number(dept.rowid) dept_blk,
  6         dbms_rowid.rowid_block_number(emp.rowid) emp_blk,
  7         dept.deptno
  8    from emp, dept
  9   where emp.deptno = dept.deptno
 10         )
 11   order by deptno
 12  /

  DEPT_BLK    EMP_BLK F     DEPTNO
---------- ---------- - ----------
      4792       4788 *         10
      4792       4788 *         10
      4792       4791 *         10
      4792       4788 *         20
      4792       4788 *         20
      4792       4792           20
      4792       4792           20
      4792       4791 *         20
      4792       4788 *         30
      4792       4792           30
      4792       4792           30
      4792       4792           30
      4792       4792           30
      4792       4788 *         30

14 rows selected.

More than half of the EMP rows are not on the block with the DEPT row. Loading the data using the cluster key instead of the table key, we get the following:

ops$tkyte@ORA10GR1> begin
  2      for x in ( select * from scott.dept )
  3      loop
  4          insert into dept
  5          values ( x.deptno, x.dname, x.loc );
  6          insert into emp
  7          select emp.*, 'x'
  8            from scott.emp
  9           where deptno = x.deptno;
 10      end loop;
 11  end;
 12  /
PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR1> select dept_blk, emp_blk,
  2         case when dept_blk <> emp_blk then '*' end flag,
  3             deptno
  4    from (
  5  select dbms_rowid.rowid_block_number(dept.rowid) dept_blk,
  6         dbms_rowid.rowid_block_number(emp.rowid) emp_blk,
  7         dept.deptno
  8    from emp, dept
  9   where emp.deptno = dept.deptno
 10         )
 11   order by deptno
 12  /

  DEPT_BLK    EMP_BLK F     DEPTNO
---------- ---------- - ----------
        12         12           10
        12         12           10
        12         12           10
        11         11           20
        11         11           20
        11         11           20
        11         12 *         20
        11         11           20
        10         10           30
        10         10           30
        10         10           30
        10         10           30
        10         10           30
        10         11 *         30

14 rows selected.

Note Your mileage may vary here, as the order in which the rows are fetched from the SCOTT.DEPT table can and will change the results, and the use of ASSM versus MSSM may as well. The concept should be clear, however: if you put the row for DEPTNO=n on a given block, and then the employee rows for DEPTNO=n, you should achieve the best clustering possible.


Most of the EMP rows are on the same block as the DEPT rows. This example is somewhat contrived in that I woefully undersized the SIZE parameter on the cluster to make a point, but the approach suggested is correct for an initial load of a cluster. It will ensure that if you exceed the estimated SIZE for some of the cluster keys, you will still end up with most of the data clustered on the same block. If you load a table at a time, you will not.

This technique applies only to the initial load of a cluster—after that, you would use it as your transactions deem necessary. You will not adapt your application to work specifically with a cluster.

Here is a bit of a puzzle to amaze and astound your friends with. Many people mistakenly believe a rowid uniquely identifies a row in a database, and that given a rowid you can tell what table the row came from. In fact, you cannot. You can and will get duplicate rowids from a cluster. For example, after executing the preceding code you should find

ops$tkyte@ORA10GR1> select rowid from emp
  2  intersect
  3  select rowid from dept;

ROWID
------------------
AAAOniAAJAAAAAKAAA
AAAOniAAJAAAAAKAAB
AAAOniAAJAAAAALAAA
AAAOniAAJAAAAAMAAA

Every rowid assigned to the rows in DEPT has been assigned to the rows in EMP as well. That is because it takes a table and row ID to uniquely identify a row. The rowid pseudo-column is unique only within a table.

I also find that many people believe the cluster object to be an esoteric object that no one really uses—everyone just uses normal tables. The fact is, you use clusters every time you use Oracle. Much of the data dictionary is stored in various clusters, for example:

sys@ORA10GR1> break on cluster_name
sys@ORA10GR1> select cluster_name, table_name
  2  from user_tables
  3  where cluster_name is not null
  4  order by 1;

CLUSTER_NAME                   TABLE_NAME
------------------------------ ------------------------------
C_COBJ#                        CCOL$
                               CDEF$
C_FILE#_BLOCK#                 UET$
                               SEG$
C_MLOG#                        MLOG$
                               SLOG$
C_OBJ#                         ICOL$
                               CLU$
                               COL$
                               TYPE_MISC$
                               VIEWTRCOL$
                               ATTRCOL$
                               SUBCOLTYPE$
                               COLTYPE$
                               LOB$
                               TAB$
                               IND$
                               ICOLDEP$
                               OPQTYPE$
                               REFCON$
                               LIBRARY$
                               NTAB$
C_OBJ#_INTCOL#                 HISTGRM$
C_RG#                          RGROUP$
                               RGCHILD$
C_TOID_VERSION#                TYPE$
                               COLLECTION$
                               METHOD$
                               RESULT$
                               PARAMETER$
                               ATTRIBUTE$
C_TS#                          TS$
                               FET$
C_USER#                        USER$
                               TSQ$
SMON_SCN_TO_TIME               SMON_SCN_TIME

36 rows selected.

As you can see, most of the object-related data is stored in a single cluster (the C_OBJ# cluster): 16 tables sharing the same block. It is mostly column-related information stored there, so all of the information about the set of columns of a table or index is stored physically on the same block. This makes sense, as when Oracle parses a query, it wants to have access to the data for all of the columns in the referenced table. If this data were spread all over the place, it would take a while to gather it together. Here, it is typically on a single block and readily available.

When would you use a cluster? It is easier perhaps to describe when not to use one:

  • If you anticipate the tables in the cluster will be modified heavily: You must be aware that an index cluster will have certain negative side effects on DML performance (INSERT statements in particular). It takes more work to manage the data in a cluster.
  • If you need to perform full scans of tables in clusters: Instead of just having to full scan the data in your table, you have to full scan the data for (possibly) many tables. There is more data to scan through, so full scans will take longer.
  • If you believe you will frequently need to TRUNCATE and load the table: Tables in clusters cannot be truncated. That is obvious—since the cluster stores more than one table on a block, you must delete the rows in a cluster table.

So, a cluster would be appropriate if you have data that is mostly read (that does not mean "never written"; it is perfectly OK to modify cluster tables), that is read via indexes (either the cluster key index or other indexes you put on the tables in the cluster), and that you frequently join together. Look for tables that are logically related and always used together, like the people who designed the Oracle data dictionary did when they clustered all column-related information together.

Index Clustered Tables Wrap-Up

Clustered tables give you the ability to physically "prejoin" data together. You use clusters to store related data from many tables on the same database block. Clusters can help read-intensive operations that always join data together or access related sets of data (e.g., everyone in department 10).

Clustered tables reduce the number of blocks that Oracle must cache. Instead of keeping ten blocks for ten employees in the same department, Oracle will put them in one block and therefore increase the efficiency of your buffer cache. On the downside, unless you can calculate your SIZE parameter setting correctly, clusters may be inefficient with their space utilization and can tend to slow down DML-heavy operations.

Hash Clustered Tables

Hash clustered tables are very similar in concept to the index clustered tables just described with one main exception: the cluster key index is replaced with a hash function. The data in the table is the index; there is no physical index. Oracle will take the key value for a row, hash it using either an internal function or one you supply, and use that to figure out where the data should be on disk. One side effect of using a hashing algorithm to locate data, however, is that you cannot range scan a table in a hash cluster without adding a conventional index to the table. In an index cluster, the query

select * from emp where deptno between 10 and 20

would be able to make use of the cluster key index to find these rows. In a hash cluster, this query would result in a full table scan unless you had an index on the DEPTNO column. Only exact equality searches (including in lists and subqueries) may be made on the hash key without using an index that supports range scans.
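
For instance, if the EMP table lived in a hash cluster keyed on DEPTNO and you needed that range predicate, the fix would simply be a conventional B*Tree index on the column (a sketch; the index name is mine):

create index emp_deptno_idx on emp(deptno);

Equality searches on DEPTNO would still use the hash; the BETWEEN query now has an index range scan available to it.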

In a perfect world, with nicely distributed hash key values and a hash function that distributes them evenly over all of the blocks allocated to the hash cluster, we can go straight from a query to the data with one I/O. In the real world, we will end up with more hash key values hashing to the same database block address than fit on that block. This will result in Oracle having to chain blocks together in a linked list to hold all of the rows that hash to this block. Now, when we need to retrieve the rows that match our hash key, we might have to visit more than one block.

Like a hash table in a programming language, hash tables in the database have a fixed "size." When you create the table, you must determine the number of hash keys your table will have, forever. That does not limit the number of rows you can put in there.

Figure 10-9 shows a graphical representation of a hash cluster with table EMP created in it. When the client issues a query that uses the hash cluster key in the predicate, Oracle will apply the hash function to determine which block the data should be in. It will then read that one block to find the data. If there have been many collisions, or the SIZE parameter to the CREATE CLUSTER was underestimated, Oracle will have allocated overflow blocks that are chained off the original block.

Figure 10-9. Hash cluster depiction

When you create a hash cluster, you'll use the same CREATE CLUSTER statement you used to create the index cluster, with different options. You'll just be adding a HASHKEYS option to it to specify the size of the hash table. Oracle will take your HASHKEYS value and round it up to the nearest prime number (the number of hash keys will always be a prime). Oracle will then compute a value based on the SIZE parameter multiplied by the modified HASHKEYS value. It will allocate at least that much space in bytes for the cluster. This is a big difference from the preceding index cluster, which dynamically allocates space as it needs it. A hash cluster pre-allocates enough space to hold HASHKEYS/trunc(blocksize/SIZE) blocks of data. For example, if you set your SIZE to 1,500 bytes and you have a 4KB blocksize, Oracle will expect to store two keys per block. If you plan on having 1,000 HASHKEYs, Oracle will allocate 500 blocks.
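
We can verify that arithmetic right in SQL (a sketch; note that Oracle would first round the 1,000 HASHKEYS up to the prime 1,009, so the true allocation is slightly larger):

select trunc(4096/1500) keys_per_block,
       ceil(1000/trunc(4096/1500)) blocks_needed
  from dual;

KEYS_PER_BLOCK BLOCKS_NEEDED
-------------- -------------
             2           500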

It is interesting to note that unlike a conventional hash table in a computer language, it is OK to have hash collisions—in fact, it is desirable in many cases. If you take the same DEPT/EMP example from earlier, you could set up a hash cluster based on the DEPTNO column. Obviously, many rows will hash to the same value, and you expect them to (they have the same DEPTNO). This is what the cluster is about in some respects: clustering like data together. This is why Oracle asks you to specify the HASHKEYs (how many department numbers you anticipate over time) and SIZE (what the size of the data is that will be associated with each department number). It allocates a hash table to hold HASHKEY number of departments of SIZE bytes each. What you do want to avoid is unintended hash collisions. It is obvious that if you set the size of the hash table to 1,000 (really 1,009, since the hash table size is always a prime number and Oracle rounds up for you), and you put 1,010 departments in the table, there will be at least one collision (two different departments hashing to the same value). Unintended hash collisions are to be avoided, as they add overhead and increase the probability of block chaining occurring.

To see what sort of space hash clusters take, we'll use a small utility stored procedure, SHOW_SPACE (for details on this procedure, see the "Setup" section at the beginning of the book), that we'll use in this chapter and the next. This routine just uses the supplied DBMS_SPACE package to get details about the storage used by segments in the database.

Now if we issue a CREATE CLUSTER statement such as the following, we can see the storage it allocated:

ops$tkyte@ORA10GR1> create cluster hash_cluster
  2  ( hash_key number )
  3  hashkeys 1000
  4  size 8192
  5  tablespace mssm
  6  /
Cluster created.

ops$tkyte@ORA10GR1> exec show_space( 'HASH_CLUSTER', user, 'CLUSTER' )
Free Blocks.............................               0
Total Blocks............................           1,024
Total Bytes.............................       8,388,608
Total MBytes............................               8
Unused Blocks...........................              14
Unused Bytes............................         114,688
Last Used Ext FileId....................               9
Last Used Ext BlockId...................           1,033
Last Used Block.........................             114

PL/SQL procedure successfully completed.

We can see that the total number of blocks allocated to the table is 1,024. Fourteen of these blocks are unused (free). One block goes to table overhead, to manage the extents. Therefore, 1,009 blocks are under the HWM of this object, and these are used by the cluster. The prime 1,009 just happens to be the next prime larger than 1,000, and since the blocksize is 8KB, we can see that Oracle did in fact allocate at least 8192 × 1009 bytes. The actual figure is a little higher than this, due to the way extents are rounded and/or the use of locally managed tablespaces with uniformly sized extents.
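
A quick sanity check of that figure, in the same style as before:

ops$tkyte@ORA10GR1> select 8192 * 1009 from dual;

 8192*1009
----------
   8265728

The difference between that and the 8,388,608 bytes actually allocated is 122,880 bytes: exactly the 14 unused blocks plus the 1 overhead block (15 × 8,192).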

This example points out one of the issues with hash clusters you need to be aware of. Normally, if you create an empty table, the number of blocks under the HWM for that table is 0. If you full scan it, it reaches the HWM and stops. With a hash cluster, the tables will start out big and will take longer to create, as Oracle must initialize each block, an action that normally takes place as data is added to the table. They have the potential to have data in their first block and their last block, with nothing in between. Full scanning a virtually empty hash cluster will take as long as full scanning a full hash cluster. This is not necessarily a bad thing; you built the hash cluster to have very fast access to the data by a hash key lookup. You did not build it to full scan it frequently.

Now we can start placing tables into the hash cluster in the same fashion we did with index clusters:

ops$tkyte@ORA10GR1> create table hashed_table
  2  ( x number, data1 varchar2(4000), data2 varchar2(4000) )
  3  cluster hash_cluster(x);
Table created.

To see the difference a hash cluster can make, I set up a small test. I created a hash cluster, loaded some data in it, copied this data to a "regular" table with a conventional index on it, and then did random reads on each table (the same "random" reads on each). Using runstats, SQL_TRACE, and TKPROF, I was able to determine the characteristics of each. The following is the setup I performed, followed by the analysis:

ops$tkyte@ORA10GR1> create cluster hash_cluster
  2  ( hash_key number )
  3  hashkeys 75000
  4  size 150
  5  /
Cluster created.

ops$tkyte@ORA10GR1> create table t_hashed
  2  cluster hash_cluster(object_id)
  3  as
  4  select *
  5  from all_objects
  6  /
Table created.

ops$tkyte@ORA10GR1> alter table t_hashed add constraint
  2  t_hashed_pk primary key(object_id)
  3  /
Table altered.

ops$tkyte@ORA10GR1> begin
  2    dbms_stats.gather_table_stats( user, 'T_HASHED', cascade=>true );
  3  end;
  4  /
PL/SQL procedure successfully completed.

I created the hash cluster with a SIZE of 150 bytes. This is because I determined the average row size for a row in my table would be about 100 bytes, but would vary up and down based on the data. I then created and populated a table in that cluster as a copy of ALL_OBJECTS.

Next, I created the "conventional clone" of the table:

ops$tkyte@ORA10GR1> create table t_heap
  2  as
  3  select *
  4  from t_hashed
  5  /
Table created.

ops$tkyte@ORA10GR1> alter table t_heap add constraint
  2  t_heap_pk primary key(object_id)
  3  /
Table altered.

ops$tkyte@ORA10GR1> begin
  2    dbms_stats.gather_table_stats( user, 'T_HEAP', cascade=>true );
  3  end;
  4  /
PL/SQL procedure successfully completed.

Now, all I needed was some "random" data to pick rows from each of the tables with. To achieve that, I simply selected all of the OBJECT_IDs into an array and had them sorted randomly, to hit the table all over in a scattered fashion. I used a PL/SQL package to define and declare the array and a bit of PL/SQL code to "prime" the array, to fill it up:

ops$tkyte@ORA10GR1> create or replace package state_pkg
  2  as
  3      type array is table of t_hashed.object_id%type;
  4      g_data array;
  5  end;
  6  /
Package created.

ops$tkyte@ORA10GR1> begin
  2      select object_id bulk collect into state_pkg.g_data
  3        from t_hashed
  4       order by dbms_random.random;
  5  end;
  6  /
PL/SQL procedure successfully completed.

To see the work performed by each, I used the following block of code (if you replace occurrences of the word HASHED with HEAP, you have the other block of code you need to test against):

ops$tkyte@ORA10GR1> declare
  2      l_rec t_hashed%rowtype;
  3  begin
  4      for i in 1 .. state_pkg.g_data.count
  5      loop
  6          select * into l_rec from t_hashed
  7          where object_id = state_pkg.g_data(i);
  8      end loop;
  9  end;
 10  /
PL/SQL procedure successfully completed.

Next, I ran the preceding block of code three times (and the copy of that block of code where HASHED is replaced with HEAP as well). The first run was to "warm up" the system, to get any hard parses out of the way. The second time I ran the blocks of code, I used runstats to see the material differences between the two: running first the hashed implementation and then the heap. The third time I ran the blocks of code, I did so with SQL_TRACE enabled so I could see a TKPROF report.
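
The runstats pass was bracketed like this (a sketch of how I drive the runstats package; the anonymous blocks are the ones shown earlier):

exec runstats_pkg.rs_start;
REM run the T_HASHED block shown earlier
exec runstats_pkg.rs_middle;
REM run the T_HEAP variant of that block
exec runstats_pkg.rs_stop(10000);

The runstats run reported the following: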

ops$tkyte@ORA10GR1> exec runstats_pkg.rs_stop(10000);
Run1 ran in 263 hsecs
Run2 ran in 268 hsecs
run 1 ran in 98.13% of the time

Name                                  Run1        Run2        Diff
LATCH.cache buffers chains          99,891     148,031      48,140
STAT...Cached Commit SCN refer      48,144           0     -48,144
STAT...no work - consistent re      48,176           0     -48,176
STAT...cluster key scans            48,176           0     -48,176
STAT...cluster key scan block       48,176           0     -48,176
STAT...table fetch by rowid              0      48,176      48,176
STAT...rows fetched via callba           0      48,176      48,176
STAT...buffer is not pinned co      48,176      96,352      48,176
STAT...index fetch by key                0      48,176      48,176
STAT...session logical reads        48,901     145,239      96,338
STAT...consistent gets              48,178     144,530      96,352
STAT...consistent gets from ca      48,178     144,530      96,352
STAT...consistent gets - exami           1     144,529     144,528

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
347,515     401,961      54,446     86.45%

Now, these two simulations ran in about the same amount of time by the wall clock. As I had a buffer cache more than large enough to cache these results, I was expecting that. The material difference to note, however, is the large reduction in cache buffers chains latches. The first implementation (hashed) used significantly fewer, meaning the hashed implementation should scale better in a read-intensive environment, since it needs fewer resources that require some level of serialization. This was due entirely to the fact that the I/O needed by the hashed implementation was significantly reduced over the HEAP table—the statistic consistent gets in that report bears this out. The TKPROF shows it even more clearly:

SELECT * FROM T_HASHED WHERE OBJECT_ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  48174      4.77       4.83          0          2          0           0
Fetch    48174      1.50       1.46          0      48174          0       48174
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    96349      6.27       6.30          0      48176          0       48174

Rows     Row Source Operation
-------  ---------------------------------------------------
  48174  TABLE ACCESS HASH T_HASHED (cr=48174 pr=0 pw=0 time=899962 us)
********************************************************************************
SELECT * FROM T_HEAP WHERE OBJECT_ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  48174      5.37       5.02          0          0          0           0
Fetch    48174      1.36       1.32          0     144522          0       48174
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    96349      6.73       6.34          0     144522          0       48174

Rows     Row Source Operation
-------  ---------------------------------------------------
  48174  TABLE ACCESS BY INDEX ROWID T_HEAP (cr=144522 pr=0 pw=0 time=1266695 us)
  48174  INDEX UNIQUE SCAN T_HEAP_PK (cr=96348 pr=0 pw=0 time=700017 us)(object ...

The HASHED implementation simply converted the OBJECT_ID passed into the query into a FILE/BLOCK to be read and read it—no index. The HEAP table, however, had to do two I/Os on the index for each row. The cr=96348 in the TKPROF Row Source Operation line shows us exactly how many consistent reads were done against the index. Each time Oracle looked up OBJECT_ID = :B1, it had to get the root block of the index and then find the leaf block containing the location of that row. Then, it had to take the leaf block information, which included the ROWID of that row, and access that row in the table for a third I/O. The HEAP table did three times the I/O of the HASHED implementation.

The points of interest here are as follows:

  • The hash cluster did significantly less I/O (query column). This is what we anticipated. The query simply took the random OBJECT_IDs, performed the hash on them, and went to the block. The hash cluster has to do at least one I/O to get the data. The conventional table with an index had to perform index scans followed by a table access by rowid to get the same answer. The indexed table has to do at least three I/Os in this case to get the data.
  • The hash cluster query took the same amount of CPU for all intents and purposes, even though it went to the buffer cache one-third as many times. This, too, could be anticipated. The act of performing a hash is very CPU intensive. The act of performing an index lookup is I/O intensive. It was a tradeoff here. However, as we scale up users, we would expect the hash cluster query to scale better, as it has to get in line to access the buffer cache less frequently.

This last point is the important one. When working with computers, it is all about resources and their utilization. If we are I/O bound and perform queries that do lots of keyed reads like I just did, a hash cluster may improve performance. If we are already CPU bound, a hash cluster may possibly decrease performance since it needs more CPU horsepower to hash, but if the extra CPU we are burning is due to spinning on cache buffers chains latches, the hash cluster could significantly reduce the CPU needed. This is one of the main reasons why rules of thumb do not work on real-world systems: what works for you might not work for others in similar but different conditions.

There is a special case of a hash cluster called a single table hash cluster. This is an optimized version of the general hash cluster we've already looked at. It supports only one table in the cluster at a time (you have to DROP the existing table in a single table hash cluster before you can create another). Additionally, if there is a one-to-one mapping between hash keys and data rows, the access to the rows is somewhat faster as well. These hash clusters are designed for those occasions when you want to access a table by primary key and do not care to cluster other tables with it. If you need fast access to an employee record by EMPNO, a single table hash cluster might be called for. I did the preceding test on a single table hash cluster as well and found the performance to be even better than that of the general hash cluster.

You could go a step further with this example, however, and take advantage of the fact that Oracle will allow you to write your own specialized hash function (instead of using the default one provided by Oracle). You are limited to using only the columns available in the table, and you may use only the Oracle built-in functions (e.g., no PL/SQL code) when writing these hash functions. By taking advantage of the fact that OBJECT_ID is a number between 1 and 75,000 in the preceding example, I made my "hash function" simply be the OBJECT_ID column itself. In this fashion, I am guaranteed to never have a hash collision. Putting it all together, I'll create a single table hash cluster with my own hash function via

ops$tkyte@ORA10GR1> create cluster hash_cluster
  2  ( hash_key number(10) )
  3  hashkeys 75000
  4  size 150
  5  single table
  6  hash is HASH_KEY
  7  /
Cluster created.

I've simply added the keywords SINGLE TABLE to make it a single table hash cluster. My HASH IS function is simply the HASH_KEY cluster key in this case. This is a SQL function, so I could have used trunc(mod(hash_key/324+278,555)/abs(hash_key+1)) if I wanted (not that this is a good hash function—it just demonstrates that we can use a complex function there if we wish). I used a NUMBER(10) instead of just a NUMBER because the hash value must be an integer; it cannot have any fractional components. Then, I create the table in that cluster

ops$tkyte@ORA10GR1> create table t_hashed
  2  cluster hash_cluster(object_id)
  3  as
  4  select OWNER, OBJECT_NAME, SUBOBJECT_NAME,
  5         cast( OBJECT_ID as number(10) ) object_id ,
  6         DATA_OBJECT_ID, OBJECT_TYPE, CREATED,
  7         LAST_DDL_TIME, TIMESTAMP, STATUS, TEMPORARY,
  8         GENERATED, SECONDARY
  9    from all_objects
 10  /
Table created.

to build the hashed table. Note the use of the CAST built-in function to make the datatype of OBJECT_ID be what it must be. I ran the test as before (three runs of each block), and this time the runstats output was consistently even more positive:

Run1 ran in 224 hsecs
Run2 ran in 269 hsecs
run 1 ran in 83.27% of the time

Name                                  Run1        Run2        Diff
STAT...index fetch by key                0      48,178      48,178
STAT...buffer is not pinned co      48,178      96,356      48,178
STAT...table fetch by rowid              0      48,178      48,178
STAT...cluster key scans            48,178           0     −48,178
STAT...session logical reads        48,889     145,245      96,356
STAT...consistent gets              48,184     144,540      96,356
STAT...consistent gets from ca      48,184     144,540      96,356
STAT...consistent gets - exami      48,184     144,540      96,356
LATCH.cache buffers chains          51,663     148,019      96,356

Run1 latches total versus runs -- difference and pct
Run1       Run2      Diff      Pct
298,987    402,085   103,098   74.36%

PL/SQL procedure successfully completed.

This single table hash cluster required even less latching against the buffer cache to process (it can stop looking for data sooner, and it has more information). As a result, the TKPROF report shows a measurable decrease in CPU utilization this time around:

SELECT * FROM T_HASHED WHERE OBJECT_ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  48178      4.45       4.52          0          2          0           0
Fetch    48178      0.67       0.82          0      48178          0       48178
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    96357      5.12       5.35          0      48180          0       48178

Rows   Row Source Operation
------- ---------------------------------------------------
  48178 TABLE ACCESS HASH T_HASHED (cr=48178 pr=0 pw=0 time=551123 us)
********************************************************************************
SELECT * FROM T_HEAP WHERE OBJECT_ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  48178      5.38       4.99          0          0          0           0
Fetch    48178      1.25       1.65          0     144534          0       48178
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    96357      6.63       6.65          0     144534          0       48178

Rows    Row Source Operation
------- ---------------------------------------------------
  48178 TABLE ACCESS BY INDEX ROWID T_HEAP (cr=144534 pr=0 pw=0 time=1331049 us)
  48178  INDEX UNIQUE SCAN T_HEAP_PK (cr=96356 pr=0 pw=0 time=710295 us)(object...

Hash Clustered Tables Wrap-Up

That is the nuts and bolts of a hash cluster. Hash clusters are similar in concept to index clusters, except a cluster index is not used. The data is the index in this case. The cluster key is hashed into a block address and the data is expected to be there. The important things to understand about hash clusters are as follows:

  • The hash cluster is allocated right from the beginning. Oracle will compute HASHKEYS/trunc(blocksize/SIZE) blocks and allocate and format that space right away. As soon as the first table is put in that cluster, any full scan will hit every allocated block. This is different from every other table type in this respect (see the sketch following this list).
  • The number of HASHKEYs in a hash cluster is fixed. You cannot change the size of the hash table without a rebuild of the cluster. This does not in any way limit the amount of data you can store in this cluster; it simply limits the number of unique hash keys that can be generated for this cluster. That may affect performance due to unintended hash collisions if the value was set too low.
  • Range scanning on the cluster key is not available. Predicates such as WHERE cluster_key BETWEEN 50 AND 60 cannot use the hashing algorithm. There are an infinite number of possible values between 50 and 60, and the server would have to generate them all to hash each one and see if there was any data there. This is not possible. The cluster will be full scanned if you use a range on a cluster key and have not indexed it using a conventional index.
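
As a quick illustration of the up-front allocation point, consider the following minimal sketch (the cluster and table names are my own). An empty hash cluster already owns all of its blocks, which a plain USER_SEGMENTS query can confirm:

create cluster demo_hash_cluster
( k number )
hashkeys 1000
size 1024;

create table demo_hashed
( k number, data varchar2(30) )
cluster demo_hash_cluster(k);

-- No rows have been inserted, yet the cluster segment is already
-- fully allocated and formatted:
select segment_name, blocks
  from user_segments
 where segment_name = 'DEMO_HASH_CLUSTER';

A full scan of DEMO_HASHED at this point would read every one of those allocated blocks, and a range predicate on K would likewise fall back to that full scan unless K were indexed conventionally.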

Hash clusters are suitable in the following situations:

  • You know with a good degree of accuracy how many rows the table will have over its life, or you have some reasonable upper bound. Getting the HASHKEYS and SIZE parameters right is crucial to avoid a rebuild.
  • DML, especially inserts, is light with respect to retrieval. This means you have to balance optimizing data retrieval with new data creation. Light inserts might be 100,000 per unit of time for one person and 100 per unit of time for another—all depending on their data retrieval patterns. Updates do not introduce significant overhead, unless you update the HASHKEY, which would not be a good idea, as it would cause the row to migrate.
  • You access the data by the HASHKEY value constantly. For example, say you have a table of parts, and these parts are accessed by part number. Lookup tables are especially appropriate for hash clusters.

Sorted Hash Clustered Tables

Sorted hash clusters are new in Oracle 10g. They combine the qualities of the hash cluster just described with those of an IOT. They are most appropriate when you constantly retrieve data using a query similar to this:

Select *
  From t
 Where KEY=:x
 Order by SORTED_COLUMN

That is, you retrieve the data by some key and need that data ordered by some other column. Using a sorted hash cluster, Oracle can return the data without performing a sort at all. It accomplishes this by storing the data upon insert in sorted order physically—by key. Suppose you have a customer order table:

ops$tkyte@ORA10G> select cust_id, order_dt, order_number
  2  from cust_orders
  3  order by cust_id, order_dt;

CUST_ID ORDER_DT                     ORDER_NUMBER
------- ---------------------------- ------------
      1 31-MAR-05 09.13.57.000000 PM        21453
        11-APR-05 08.30.45.000000 AM        21454
        28-APR-05 06.21.09.000000 AM        21455
      2 08-APR-05 03.42.45.000000 AM        21456
        19-APR-05 08.59.33.000000 AM        21457
        27-APR-05 06.35.34.000000 AM        21458
        30-APR-05 01.47.34.000000 AM        21459

7 rows selected.

The table is stored in a sorted hash cluster, whereby the HASH key is CUST_ID and the field to sort on is ORDER_DT. Graphically, it might look like Figure 10-10, where 1, 2, 3, 4, ... represent the records stored sorted on each block.

image

Figure 10-10. Sorted hash cluster depiction

Creating a sorted hash cluster is much the same as the other clusters. To set up a sorted hash cluster capable of storing the above data, we could use the following:

ops$tkyte@ORA10G> CREATE CLUSTER shc
  2  (
  3     cust_id   NUMBER,
  4     order_dt  timestamp SORT
  5  )
  6  HASHKEYS 10000
  7  HASH IS cust_id
  8  SIZE 8192
  9  /
Cluster created.

We've introduced a new keyword here: SORT. When we created the cluster, we identified the HASH IS CUST_ID and we added an ORDER_DT of type timestamp with the keyword SORT. This means the data will be located by CUST_ID (where CUST_ID=:X) and physically retrieved and sorted by ORDER_DT. Technically, it really means we'll store some data that will be retrieved via a NUMBER column and sorted by the TIMESTAMP. The column names here are not relevant, as they were not in the B*Tree or HASH clusters, but convention would have us name them after what they represent.

The CREATE TABLE for our CUST_ORDERS would look like this:

ops$tkyte@ORA10G> CREATE TABLE cust_orders
  2  (  cust_id       number,
  3     order_dt      timestamp SORT,
  4     order_number  number,
  5     username      varchar2(30),
  6     ship_addr     number,
  7     bill_addr     number,
  8     invoice_num   number
  9  )
 10  CLUSTER shc ( cust_id, order_dt )
 11  /
Table created.

We've mapped the CUST_ID column of this table to the hash key for the sorted hash cluster and the ORDER_DT column to the SORT column. We can observe using AUTOTRACE in SQL*Plus that the normal sort operations we expect are missing when accessing the sorted hash cluster:

ops$tkyte@ORA10G> set autotrace traceonly explain
ops$tkyte@ORA10G> variable x number
ops$tkyte@ORA10G> select cust_id, order_dt, order_number
  2    from cust_orders
  3   where cust_id = :x
  4   order by order_dt;

Execution Plan
----------------------------------------------------------
  0       SELECT STATEMENT Optimizer=ALL_ROWS (Cost=0 Card=4 Bytes=76)
  1     0    TABLE ACCESS (HASH) OF 'CUST_ORDERS' (CLUSTER (HASH))

ops$tkyte@ORA10G> select job, hiredate, empno
  2    from scott.emp
  3   where job = 'CLERK'
  4   order by hiredate;

Execution Plan
----------------------------------------------------------
  0       SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=3 Bytes=60)
  1     0   SORT (ORDER BY) (Cost=3 Card=3 Bytes=60)
  2     1     TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2 Card=3 ...
  3     2       INDEX (RANGE SCAN) OF 'JOB_IDX' (INDEX) (Cost=1 Card=3)

ops$tkyte@ORA10G> set autotrace off

I added the query against the normal SCOTT.EMP table (after indexing the JOB column for this demonstration) to compare what we normally expect to see: the SCOTT.EMP query plan versus what the sorted hash cluster can do for us when we want to access the data in a FIFO mode (like a queue). As you can see, the sorted hash cluster has one step: it takes the CUST_ID=:X, hashes the input, finds the first row, and just starts reading the rows, as they are in order already. The regular table is much different: it finds all the JOB='CLERK' rows (which could be anywhere in that heap table), sorts them, and then returns the first one.

So, the sorted hash cluster has all the retrieval aspects of the hash cluster, in that it can get to the data without having to traverse an index, and many of the features of the IOT, in that the data will be sorted within that key by some field of your choice. This data structure works well when the input data arrives "in order" of the sort field within each key. That is, over time the data arrives in increasing sort order for any given key value. Stock market data is one example that fits this requirement. Every night you get a new file full of stock symbols, the date (the date would be the sort key and the stock symbol would be the hash key), and related information. You receive and load this data in sort key order. The stock data for stock symbol ORCL for yesterday does not arrive after today's—you would load yesterday's value, then today's value, and later tomorrow's value. If the information arrives randomly (not in sort order), this data structure quickly breaks down during the insert process, as much data has to be moved to put the rows physically in order on disk. A sorted hash cluster is not recommended in that case (an IOT, on the other hand, could well be useful for that data).

When considering using this structure, you should employ the same considerations from the hash cluster section, in addition to the constraint that the data should arrive sorted for each key value over time.
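
To make the stock example concrete, here is a minimal sketch following the SHC pattern above (the names and sizing are my own, not from any particular system):

create cluster stock_shc
( symbol    varchar2(10),
  trade_dt  timestamp SORT
)
hashkeys 5000
size 4096;

create table stock_quotes
( symbol    varchar2(10),
  trade_dt  timestamp SORT,
  close_px  number
)
cluster stock_shc( symbol, trade_dt );

Loaded in TRADE_DT order for each symbol, a query of the form WHERE SYMBOL = :X ORDER BY TRADE_DT can return the rows with no SORT (ORDER BY) step, just as we saw with CUST_ORDERS.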

Nested Tables

Nested tables are part of the object-relational extensions to Oracle. A nested table, one of the two collection types in Oracle, is very similar to a child table in a traditional parent/child table pair in the relational model. It is an unordered set of data elements, all of the same data type, which could be either a built-in data type or an object data type. It goes one step further, however, since it is designed to give the illusion that each row in the parent table has its own child table. If there are 100 rows in the parent table, then there are virtually 100 nested tables. Physically, there is only the single parent and the single child table. There are large syntactic and semantic differences between nested tables and parent/child tables as well, and we'll look at those in this section.

There are two ways to use nested tables. One is in your PL/SQL code as a way to extend the PL/SQL language. The other is as a physical storage mechanism for persistent storage of collections. I personally use them in PL/SQL all of the time, but have never used them as a permanent storage mechanism.
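
To make the first use concrete, here is a minimal sketch of a nested table serving purely as a PL/SQL collection (the type name and values are hypothetical):

declare
    type numtab is table of number;      -- a nested table type local to this block
    l_nums numtab := numtab( 10, 20, 30 );
begin
    l_nums.extend;                       -- make room for a fourth element
    l_nums(4) := 40;
    dbms_output.put_line( 'count = ' || l_nums.count );
end;
/

No database table is involved at all here; the nested table lives only in the session's memory for the duration of the block.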

In this section, I'll briefly introduce the syntax to create, query, and modify nested tables. Then we'll look at some of the implementation details and what is important to know about how Oracle really stores nested tables.

Nested Tables Syntax

The creation of a table with a nested table is fairly straightforward—it is the syntax for manipulating it that gets a little complex. Let's use the simple EMP and DEPT tables to demonstrate. We're familiar with that little data model that is implemented relationally as follows:

ops$tkyte@ORA10GR1> create table dept
  2  (deptno number(2) primary key,
  3   dname     varchar2(14),
  4   loc       varchar2(13)
  5  );
Table created.

ops$tkyte@ORA10GR1> create table emp
  2  (empno       number(4) primary key,
  3   ename       varchar2(10),
  4   job         varchar2(9),
  5   mgr         number(4) references emp,
  6   hiredate    date,
  7   sal         number(7, 2),
  8   comm        number(7, 2),
  9   deptno      number(2) references dept
 10  );
Table created.

with primary and foreign keys. We'll do the equivalent implementation using a nested table for the EMP table:

ops$tkyte@ORA10GR1> create or replace type emp_type
  2  as object
  3  (empno       number(4),
  4   ename       varchar2(10),
  5   job         varchar2(9),
  6   mgr         number(4),
  7   hiredate    date,
  8   sal         number(7, 2),
  9   comm        number(7, 2)
 10  );
 11  /
Type created.

ops$tkyte@ORA10GR1> create or replace type emp_tab_type
  2  as table of emp_type
  3  /
Type created.

To create a table with a nested table, we need a nested table type. The preceding code creates a complex object type, EMP_TYPE, and a nested table type of that, EMP_TAB_TYPE. In PL/SQL, this will be treated much like an array would. In SQL, it will cause a physical nested table to be created. Here is the simple CREATE TABLE statement that uses it:

ops$tkyte@ORA10G> create table dept_and_emp
  2  (deptno number(2) primary key,
  3   dname     varchar2(14),
  4   loc       varchar2(13),
  5   emps      emp_tab_type
  6 )
  7 nested table emps store as emps_nt;
Table created.

ops$tkyte@ORA10G> alter table emps_nt add constraint
  2  emps_empno_unique unique(empno)
  3  /
Table altered.

The important part of this CREATE TABLE is the inclusion of the column EMPS of EMP_TAB_TYPE and the corresponding NESTED TABLE EMPS STORE AS EMPS_NT. This created a real physical table, EMPS_NT, separate from and in addition to the table DEPT_AND_EMP. We add a constraint on the EMPNO column directly on the nested table to make the EMPNO unique as it was in our original relational model. We cannot implement our full data model, however; consider the self-referencing constraint:

ops$tkyte@ORA10G> alter table emps_nt add constraint mgr_fk
  2 foreign key(mgr) references emps_nt(empno);
alter table emps_nt add constraint mgr_fk
*
ERROR at line 1:
ORA-30730: referential constraint not allowed on nested table column

This will simply not work. Nested tables do not support referential integrity constraints, as they cannot reference any other table—even themselves. So, we'll just skip that for now. Next, we'll populate this table with the existing EMP and DEPT data:

ops$tkyte@ORA10G> insert into dept_and_emp
  2  select dept.*,
  3     CAST( multiset( select empno, ename, job, mgr, hiredate, sal, comm
  4                       from SCOTT.EMP
  5                       where emp.deptno = dept.deptno ) AS emp_tab_type )
  6     from SCOTT.DEPT
  7  /
4 rows created.

There are two things to notice here:

  • Only "four" rows were created. There are really only four rows in the DEPT_AND_EMP table. The 14 EMP rows don't exist independently.
  • The syntax is getting pretty exotic. CAST and MULTISET is syntax most people have never used. You will find lots of exotic syntax when dealing with object-relational components in the database. The MULTISET keyword is used to tell Oracle the subquery is expected to return more than one row (subqueries in a SELECT list have previously been limited to returning one row). CAST is used to instruct Oracle to treat the returned set as a collection type—in this case, we CAST the MULTISET to be an EMP_TAB_TYPE. CAST is a general-purpose routine not limited to use in collections. For example, if we wanted to fetch the EMPNO column from EMP as a VARCHAR2(20) instead of a NUMBER(4) type, we may use the query SELECT CAST( EMPNO AS VARCHAR2(20) ) E FROM EMP.

We're now ready to query the data. Let's see what one row might look like:

ops$tkyte@ORA10G> select deptno, dname, loc, d.emps AS employees
  2  from dept_and_emp d
  3  where deptno = 10
  4  /

    DEPTNO DNAME          LOC           EMPLOYEES(EMPNO, ENAME, JOB,
---------- -------------- ------------- ----------------------------
        10 ACCOUNTING     NEW YORK      EMP_TAB_TYPE(EMP_TYPE(7782,
                                        'CLARK', 'MANAGER', 7839, '0
                                        9-JUN-81', 2450, NULL), EMP_
                                        TYPE(7839, 'KING', 'PRESIDEN
                                        T', NULL, '17-NOV-81', 5000,
                                         NULL), EMP_TYPE(7934, 'MILL
                                        ER', 'CLERK', 7782, '23-JAN-
                                        82', 1300, NULL))

All of the data is there, in a single column. Most applications, unless they are specifically written for the object-relational features, will not be able to deal with this particular column. For example, ODBC doesn't have a way to deal with a nested table (JDBC, OCI, Pro*C, PL/SQL, and most other APIs and languages do). For those cases, Oracle provides a way to un-nest a collection and treat it much like a relational table:

ops$tkyte@ORA10G> select d.deptno, d.dname, emp.*
  2 from dept_and_emp D, table(d.emps) emp
  3 /

DEPTNO DNAME       EMPNO ENAME      JOB         MGR HIREDATE    SAL  COMM
------ ----------- ----- ---------- --------- ----- --------- ----- -----
    10 ACCOUNTING   7782 CLARK      MANAGER    7839 09-JUN-81  2450
    10 ACCOUNTING   7839 KING       PRESIDENT       17-NOV-81  5000
    10 ACCOUNTING   7934 MILLER     CLERK      7782 23-JAN-82  1300
    20 RESEARCH     7369 SMITH      CLERK      7902 17-DEC-80   800
    20 RESEARCH     7566 JONES      MANAGER    7839 02-APR-81  2975
    20 RESEARCH     7788 SCOTT      ANALYST    7566 09-DEC-82  3000
    20 RESEARCH     7876 ADAMS      CLERK      7788 12-JAN-83  1100
    20 RESEARCH     7902 FORD       ANALYST    7566 03-DEC-81  3000
    30 SALES        7499 ALLEN      SALESMAN   7698 20-FEB-81  1600   300
    30 SALES        7521 WARD       SALESMAN   7698 22-FEB-81  1250   500
    30 SALES        7654 MARTIN     SALESMAN   7698 28-SEP-81  1250  1400
    30 SALES        7698 BLAKE      MANAGER    7839 01-MAY-81  2850
    30 SALES        7844 TURNER     SALESMAN   7698 08-SEP-81  1500     0
    30 SALES        7900 JAMES      CLERK      7698 03-DEC-81   950

14 rows selected.

We are able to cast the EMPS column as a table and it naturally did the join for us—no join conditions were needed. In fact, since our EMP type doesn't have the DEPTNO column, there is nothing for us apparently to join on. Oracle takes care of that nuance for us.

So, how can we update the data? Let's say we want to give department 10 a $100 bonus. We would code the following:

ops$tkyte@ORA10G> update
  2    table( select emps
  3             from dept_and_emp
  4                    where deptno = 10
  5             )
  6  set comm = 100
  7  /
3 rows updated.

Here is where the "virtually a table for every row" concept comes into play. In the SELECT shown earlier, it may not have been obvious that there was a table per row; especially since the joins and such aren't there, it looks a little like "magic." The UPDATE statement, however, shows that there is a table per row. We selected a discrete table to UPDATE—this table has no name, only a query to identify it. If we use a query that does not SELECT exactly one table, we will receive the following:

ops$tkyte@ORA10G> update
  2    table( select emps
  3             from dept_and_emp
  4               where deptno = 1
  5        )
  6  set comm = 100
  7  /
update
*
ERROR at line 1:
ORA-22908: reference to NULL table value

ops$tkyte@ORA10G> update
  2    table( select emps
  3             from dept_and_emp
  4               where deptno > 1
  5       )
  6  set comm = 100
  7  /
  table( select emps
         *
ERROR at line 2:
ORA-01427: single-row subquery returns more than one row

If we return fewer than one row (zero nested table instances), the update fails. Normally an update of zero rows is OK, but not in this case—it returns an error the same as if we left the table name off the update. If we return more than one row (more than one nested table instance), the update also fails. Normally an update of many rows is perfectly OK. This shows that Oracle considers each row in the DEPT_AND_EMP table to point to another table, not just another set of rows as the relational model does.

This is the semantic difference between a nested table and a parent/child relational table. In the nested table model, there is one table per parent row. In the relational model, there is one set of rows per parent row. This difference can make nested tables somewhat cumbersome to use at times. Consider the model we are using, which provides a very nice view of the data from the perspective of a single department. It is a terrible model if we want to ask questions like "What department does KING work for?", "How many accountants do we have working for us?", and so on. These questions are best asked of the EMP relational table, but in this nested table model we can only access the EMP data via the DEPT data. We must always join; we cannot query the EMP data alone. Well, we cannot do it in a supported, documented way, but we can use a trick (more on this trick later). If we needed to update every row in EMPS_NT, we would have to do four updates: one for each row in DEPT_AND_EMP, to update the virtual table associated with each row.

Another thing to consider is that when we updated the employee data for department 10, we were semantically updating the EMPS column in the DEPT_AND_EMP table. We understand that physically there are two tables involved, but semantically there is only one. Even though we updated no data in the department table, the row that contains the nested table we did modify is locked from update by other sessions. In a traditional parent/child table relationship, this would not be the case.

These are the reasons why I tend to stay away from nested tables as a persistent storage mechanism. It is the rare child table that is not queried stand-alone. In the preceding example, the EMP table should be a strong entity. It stands alone, so it needs to be queried alone. I find this to be the case almost all of the time. I tend to use nested tables via views on relational tables.

So, now that we have seen how to update a nested table instance, inserting and deleting are pretty straightforward. Let's add a row to the nested table instance department 10 and remove a row from department 20:

ops$tkyte@ORA10G> insert into table
  2  ( select emps from dept_and_emp where deptno = 10 )
  3  values
  4  ( 1234, 'NewEmp', 'CLERK', 7782, sysdate, 1200, null );
1 row created.

ops$tkyte@ORA10G> delete from table
  2  ( select emps from dept_and_emp where deptno = 20 )
  3  where ename = 'SCOTT';
1 row deleted.

ops$tkyte@ORA10G> select d.dname, e.empno, ename
  2  from dept_and_emp d, table(d.emps) e
  3  where d.deptno in ( 10, 20 );

DNAME               EMPNO ENAME
-------------- ---------- ----------
ACCOUNTING           7782 CLARK
ACCOUNTING           7839 KING
ACCOUNTING           7934 MILLER
RESEARCH             7369 SMITH
RESEARCH             7566 JONES
RESEARCH             7876 ADAMS
RESEARCH             7902 FORD
ACCOUNTING           1234 NewEmp

8 rows selected.

That is the basic syntax of how to query and modify nested tables. You will find many times that you must un-nest these tables as we just did, especially in queries, to make use of them. Once you conceptually visualize the "virtual table per row" concept, working with nested tables becomes much easier.

Previously I stated, "We must always join; we cannot query the EMP data alone," but then I followed that up with a caveat: we can if we really need to. It is undocumented and not supported, so use it only as a last-ditch method. Where it will come in most handy is if you ever need to mass update the nested table (remember, you would otherwise have to do that through the DEPT table with a join). There is an underdocumented hint, NESTED_TABLE_GET_REFS, that is used by various tools such as EXP and IMP to deal with nested tables. It is also a way to see a little more about the physical structure of the nested tables. If you use this hint, you can query to get some "magic" results. The following query is what EXP (a data unload utility) uses to extract the data from this nested table:

ops$tkyte@ORA10G> SELECT /*+NESTED_TABLE_GET_REFS*/
  2         NESTED_TABLE_ID,SYS_NC_ROWINFO$
  3   FROM "OPS$TKYTE"."EMPS_NT"
  4 /

NESTED_TABLE_ID                  SYS_NC_ROWINFO$(EMPNO, EN
-------------------------------- -------------------------
F60DEEE0FF7D7BC1E030007F01001321 EMP_TYPE(7782, 'CLARK', '
                                 MANAGER', 7839, '09-JUN-8
                                 1', 2450, 100)

F60DEEE0FF7D7BC1E030007F01001321 EMP_TYPE(7839, 'KING', 'P
                                 RESIDENT', NULL, '17-NOV-
                                 81', 5000, 100) ...

Well, this is somewhat surprising if you describe this table:

ops$tkyte@ORA10G> desc emps_nt
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 EMPNO                                  NUMBER(4)
 ENAME                                  VARCHAR2(10)
 JOB                                    VARCHAR2(9)
 MGR                                    NUMBER(4)
 HIREDATE                               DATE
 SAL                                    NUMBER(7,2)
 COMM                                   NUMBER(7,2)

These two columns don't even show up. They are part of the hidden implementation of nested tables. The NESTED_TABLE_ID is really a foreign key to the parent table DEPT_AND_EMP. DEPT_AND_EMP actually has a hidden column in it that is used to join to EMPS_NT. The SYS_NC_ROWINFO$ "column" is a magic column; it is more of a function than a column. The nested table here is really an object table (it is made of an object type), and SYS_NC_ROWINFO$ is the internal way Oracle references the row as an object, instead of referencing each of the scalar columns. Under the covers, all Oracle has done for us is implement a parent/child table with system-generated primary and foreign keys. If we dig a little deeper, we can query the "real" data dictionary to see all of the columns in the DEPT_AND_EMP table:

sys@ORA10G> select name
  2    from sys.col$
  3   where obj# = ( select object_id
  4                    from dba_objects
  5                   where object_name = 'DEPT_AND_EMP'
  6                     and owner = 'OPS$TKYTE' )
  7  /

NAME
------------------------------
DEPTNO
DNAME
EMPS
LOC
SYS_NC0000400005$

Selecting this column out from the nested table, we'll see something like this:

ops$tkyte@ORA10G> select SYS_NC0000400005$ from dept_and_emp;

SYS_NC0000400005$
--------------------------------
F60DEEE0FF887BC1E030007F01001321
F60DEEE0FF897BC1E030007F01001321
F60DEEE0FF8A7BC1E030007F01001321
F60DEEE0FF8B7BC1E030007F01001321

The weird-looking column name, SYS_NC0000400005$, is the system-generated key placed into the DEPT_AND_EMP table. If we dig even deeper, we will find that Oracle has placed a unique index on this column. Unfortunately, however, it neglected to index the NESTED_TABLE_ID in EMPS_NT. This column really needs to be indexed, as we are always joining from DEPT_AND_EMP to EMPS_NT. This is an important thing to remember about nested tables if you use them with all of the defaults as just done: always index the NESTED_TABLE_ID in the nested tables!
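
A minimal sketch of the index that addresses this follows (the index name is my own invention):

create index emps_nt_ntid_idx
    on emps_nt( nested_table_id );

NESTED_TABLE_ID is the system-generated foreign key back to DEPT_AND_EMP, so without this index every retrieval of a department's employees full scans EMPS_NT.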

I've gotten off track, though, at this point—I was talking about how to treat the nested table as if it were a real table. The NESTED_TABLE_GET_REFS hint does that for us. We can use the hint like this:

ops$tkyte@ORA10G> select /*+ nested_table_get_refs */ empno, ename
  2    from emps_nt where ename like '%A%';

     EMPNO ENAME
---------- ----------
      7782 CLARK
      7876 ADAMS
      7499 ALLEN
      7521 WARD
      7654 MARTIN
      7698 BLAKE
      7900 JAMES
7 rows selected.

ops$tkyte@ORA10G> update /*+ nested_table_get_refs */ emps_nt
  2   set ename = initcap(ename);
14 rows updated.

ops$tkyte@ORA10G> select /*+ nested_table_get_refs */ empno, ename
  2  from emps_nt where ename like '%a%';

     EMPNO ENAME
---------- ----------
      7782 Clark
      7876 Adams
      7521 Ward
      7654 Martin
      7698 Blake
      7900 James
6 rows selected.

Again, this is not a thoroughly documented and supported feature. It exists specifically so that EXP and IMP can work, and that is the only environment in which it is assured to work. Use it at your own risk, and resist putting it into production code. In fact, if you find you need to use it, then by definition you didn't mean to use a nested table at all! It is the wrong construct for you. Use it for one-off fixes of data or to see what is in the nested table out of curiosity. The supported way to report on the data is to un-nest it like this:

ops$tkyte@ORA10G> select d.deptno, d.dname, emp.*
  2  from dept_and_emp D, table(d.emps) emp
  3  /

This is what you should use in queries and production code.

Nested Table Storage

We have already seen some of the storage of the nested table structure. In this section, we'll take an in-depth look at the structure created by Oracle by default and what sort of control over that we have. Working with the same CREATE statement as before

ops$tkyte@ORA10G> create table dept_and_emp
  2  (deptno  number(2) primary key,
  3   dname      varchar2(14),
  4   loc        varchar2(13),
  5   emps       emp_tab_type
  6  )
  7  nested table emps store as emps_nt;
Table created.

ops$tkyte@ORA10G> alter table emps_nt add constraint emps_empno_unique
  2             unique(empno)
  3  /
Table altered.

we know that Oracle really creates a structure like the one shown in Figure 10-11.

image

Figure 10-11. Nested table physical implementation

The code created two real tables. The table we asked to have is there, but it has an extra hidden column (we'll have one extra hidden column by default for each nested table column in a table). It also created a unique constraint on this hidden column. Oracle created the nested table, EMPS_NT, for us. This table has two hidden columns, one of which, SYS_NC_ROWINFO$, is not really a column but a virtual column that returns all of the scalar elements as an object. The other is the foreign key called NESTED_TABLE_ID, which can be joined back to the parent table. Notice the lack of an index on this column. Finally, Oracle added an index on the DEPTNO column in the DEPT_AND_EMP table to enforce the primary key. So, we asked for a table and got a lot more than we bargained for. If you look at it, it is a lot like what you might create for a parent/child relationship, but you would have used the existing primary key on DEPTNO as the foreign key in EMPS_NT instead of generating a surrogate RAW(16) key.

If we look at the DBMS_METADATA.GET_DDL dump of our nested table example, we see the following:

ops$tkyte@ORA10G> begin
  2     dbms_metadata.set_transform_param
  3     ( DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE', false );
  4  end;
  5  /
PL/SQL procedure successfully completed.

ops$tkyte@ORA10G> select dbms_metadata.get_ddl( 'TABLE', 'DEPT_AND_EMP' ) from dual;

DBMS_METADATA.GET_DDL('TABLE','DEPT_AND_EMP')
-------------------------------------------------------------------------------

  CREATE TABLE "OPS$TKYTE"."DEPT_AND_EMP"
   (    "DEPTNO" NUMBER(2,0),
        "DNAME" VARCHAR2(14),
        "LOC" VARCHAR2(13),
        "EMPS" "OPS$TKYTE"."EMP_TAB_TYPE",
         PRIMARY KEY ("DEPTNO")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
  TABLESPACE "USERS"  ENABLE
   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  TABLESPACE "USERS"
 NESTED TABLE "EMPS" STORE AS "EMPS_NT"
 (PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 LOGGING
  TABLESPACE "USERS" ) RETURN AS VALUE

The only new thing here so far is the RETURN AS VALUE. It is used to describe how the nested table is returned to a client application. By default, Oracle will return the nested table by value to the client; the actual data will be transmitted with each row. This can also be set to RETURN AS LOCATOR, meaning the client will get a pointer to the data, not the data itself. If—and only if—the client dereferences this pointer will the data be transmitted to it. So, if you believe the client will typically not look at the rows of a nested table for each parent row, you can return a locator instead of the values, saving on the network round-trips. For example, if you have a client application that displays the lists of departments and when the user double-clicks a department it shows the employee information, you may consider using the locator. This is because the details are usually not looked at—that is the exception, not the rule.
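
If we decide after the fact that a locator is more appropriate, the setting can be changed with an ALTER TABLE. A minimal sketch (verify the exact syntax against your release's documentation):

alter table dept_and_emp
   modify nested table emps
   return as locator;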

So, what else can we do with the nested table? First, the NESTED_TABLE_ID column must be indexed. Since we always access the nested table from the parent to the child, we really need that index. We can index that column using CREATE INDEX, but a better solution is to use an IOT to store the nested table. The nested table is another perfect example of what an IOT is excellent for. It will physically store the child rows co-located by NESTED_TABLE_ID (so retrieving the table is done with less physical I/O). It will remove the need for the redundant index on the RAW(16) column. Going one step further, since the NESTED_TABLE_ID will be the leading column in the IOT's primary key, we should also incorporate index key compression to suppress the redundant NESTED_TABLE_IDs that would be there otherwise. In addition, we can incorporate our UNIQUE and NOT NULL constraint on the EMPNO column into the CREATE TABLE command. Therefore, if we take the preceding CREATE TABLE and modify it slightly, as follows:

ops$tkyte@ORA10G> CREATE TABLE "OPS$TKYTE"."DEPT_AND_EMP"
  2  ("DEPTNO" NUMBER(2, 0),
  3   "DNAME"  VARCHAR2(14),
  4   "LOC"    VARCHAR2(13),
  5  "EMPS" "EMP_TAB_TYPE")
  6  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 LOGGING
  7  STORAGE(INITIAL 131072 NEXT 131072
  8          MINEXTENTS 1 MAXEXTENTS 4096
  9          PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
 10          BUFFER_POOL DEFAULT)
 11  TABLESPACE "USERS"
 12  NESTED TABLE "EMPS"
 13     STORE AS "EMPS_NT"
 14     ( (empno NOT NULL, unique (empno), primary key(nested_table_id,empno))
 15       organization index compress 1 )
 16     RETURN AS VALUE
 17  /
Table created.

we now get the following set of objects. Instead of having a conventional table EMPS_NT, we have an IOT EMPS_NT, as signified by the index structure overlaid on the table in Figure 10-12.

image

Figure 10-12. Nested table implemented as an IOT

Since EMPS_NT is now an IOT using compression, it should take less storage than the original default nested table, and it has the index we badly need.

Nested Tables Wrap-Up

I do not use nested tables as a permanent storage mechanism myself, for the following reasons:

  • The unnecessary storage overhead of the RAW(16) columns that are added. Both the parent and child table will have this extra column. The parent table will have an extra 16-byte RAW for each nested table column it has. Since the parent table typically already has a primary key (DEPTNO in my examples), it makes sense to use this key in the child tables, not a system-generated key.
  • The unnecessary overhead of the additional unique constraint on the parent table, when it typically already has a unique constraint.
  • The nested table is not easily used by itself, without using unsupported constructs (NESTED_TABLE_GET_REFS). It can be un-nested for queries, but not mass updates. I have yet to find a table in real life that isn't queried "by itself."

I do use nested tables heavily as a programming construct and in views. This is where I believe they are in their element. As a storage mechanism, I much prefer creating the parent/child tables myself. After creating the parent/child tables, we can, in fact, create a view that makes it appear as if we had a real nested table. That is, we can achieve all of the advantages of the nested table construct without incurring the overhead.
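
As a minimal sketch of that approach, a view such as the following (reusing the EMP_TAB_TYPE from earlier over the conventional EMP and DEPT tables) presents a relational parent/child pair as if it were a nested table:

create or replace view dept_and_emp_v
as
select d.deptno, d.dname, d.loc,
       cast( multiset( select empno, ename, job, mgr,
                              hiredate, sal, comm
                         from emp
                        where emp.deptno = d.deptno )
             as emp_tab_type ) emps
  from dept d;

Object-relational clients see the EMPS collection through the view, while everything else queries EMP and DEPT directly, with no hidden RAW(16) keys or extra constraints.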

If you do use a nested table as a storage mechanism, be sure to make it an IOT to avoid the overhead of an index on the NESTED_TABLE_ID and the nested table itself. See the previous section on IOTs for advice on setting them up with overflow segments and other options. If you do not use an IOT, make sure to create an index on the NESTED_TABLE_ID column in the nested table to avoid full scanning it to find the child rows.

Temporary Tables

Temporary tables are used to hold intermediate resultsets for the duration of either a transaction or a session. The data held in a temporary table is only ever visible to the current session—no other session will see any other session's data, even if the current session COMMITs the data. Multiuser concurrency is not an issue with regard to temporary tables either, as one session can never block another session by using a temporary table. Even if we "lock" the temporary table, it will not prevent other sessions from using their temporary table. As we observed in Chapter 9, temporary tables generate significantly less redo than regular tables. However, since they must generate undo information for the data they contain, they will generate some amount of redo. UPDATEs and DELETEs will generate the largest amount; INSERTs and SELECTs the least amount.

Temporary tables will allocate storage from the currently logged-in user's temporary tablespace, or if they are accessed from a definer rights procedure, the temporary tablespace of the owner of that procedure will be used. A global temporary table is really just a template for the table itself. The act of creating a temporary table involves no storage allocation; no INITIAL extent is allocated, as it would be for a regular table. Rather, at runtime when a session first puts data into the temporary table, a temporary segment for that session will be created. Since each session gets its own temporary segment (not just an extent of an existing segment), every user might be allocating space for her temporary table in different tablespaces. USER1 might have his temporary tablespace set to TEMP1, so his temporary tables will be allocated from this space. USER2 might have TEMP2 as her temporary tablespace, and her temporary tables will be allocated there.

Oracle's temporary tables are similar to temporary tables in other relational databases, with the main exception being that they are "statically" defined. You create them once per database, not once per stored procedure in the database. They always exist—they will be in the data dictionary as objects, but they will always appear empty until your session puts data into them. The fact that they are statically defined allows you to create views that reference temporary tables, to create stored procedures that use static SQL to reference them, and so on.

Temporary tables may be session based (data survives in the table across commits but not a disconnect/reconnect). They may also be transaction based (data disappears after a commit). Here is an example showing the behavior of both. I used the SCOTT.EMP table as a template:

ops$tkyte@ORA10G> create global temporary table temp_table_session
  2  on commit preserve rows
  3  as
  4  select * from scott.emp where 1=0
  5  /
Table created.

The ON COMMIT PRESERVE ROWS clause makes this a session-based temporary table. Rows will stay in this table until my session disconnects or I physically remove them via a DELETE or TRUNCATE. Only my session can see these rows; no other session will ever see "my" rows, even after I COMMIT.

ops$tkyte@ORA10G> create global temporary table temp_table_transaction
  2  on commit delete rows
  3  as
  4  select * from scott.emp where 1=0
  5  /
Table created.

The ON COMMIT DELETE ROWS makes this a transaction-based temporary table. When my session commits, the rows disappear. The rows will disappear by simply giving back the temporary extents allocated to my table—there is no overhead involved in the automatic clearing of temporary tables.

Now, let's look at the differences between the two types:

ops$tkyte@ORA10G> insert into temp_table_session select * from scott.emp;
14 rows created.

ops$tkyte@ORA10G> insert into temp_table_transaction select * from scott.emp;
14 rows created.

We've just put 14 rows into each TEMP table, and this shows we can "see" them:

ops$tkyte@ORA10G> select session_cnt, transaction_cnt
  2    from ( select count(*) session_cnt from temp_table_session ),
  3         ( select count(*) transaction_cnt from temp_table_transaction );
SESSION_CNT TRANSACTION_CNT
----------- ---------------
         14              14

ops$tkyte@ORA10G> commit;

Since we've committed, we'll see the session-based rows but not the transaction-based rows:

ops$tkyte@ORA10G> select session_cnt, transaction_cnt
  2    from ( select count(*) session_cnt from temp_table_session ),
  3         ( select count(*) transaction_cnt from temp_table_transaction );

SESSION_CNT TRANSACTION_CNT
----------- ---------------
         14               0

ops$tkyte@ORA10G>
ops$tkyte@ORA10G> disconnect
Disconnected from Oracle Database 10g Enterprise Edition Release 10.1.0.3.0
With the Partitioning, OLAP and Data Mining options
ops$tkyte@ORA10G> connect /
Connected.

Since we've started a new session, we'll see no rows in either table:

ops$tkyte@ORA10G> select session_cnt, transaction_cnt
  2    from ( select count(*) session_cnt from temp_table_session ),
  3         ( select count(*) transaction_cnt from temp_table_transaction );

SESSION_CNT TRANSACTION_CNT
----------- ---------------
          0               0

If you have experience with temporary tables in SQL Server and/or Sybase, the major consideration for you is that instead of executing SELECT X, Y, Z INTO #TEMP FROM SOME_TABLE to dynamically create and populate a temporary table, you will

  • Create all your global temporary tables once, as part of the application installation, just as you create permanent tables.
  • In your procedures, simply INSERT INTO TEMP (X,Y,Z) SELECT X,Y,Z FROM SOME_TABLE.

Just to drive home the point, the goal here is to not create tables in your stored procedures at runtime. That is not the proper way to use temporary tables in Oracle. DDL is an expensive operation; you want to avoid doing that at runtime. The temporary tables for an application should be created during the application installation—never at runtime.

Temporary tables can have many of the attributes of a permanent table. They may have triggers, check constraints, indexes, and so on (a quick sketch follows the list below). Features of permanent tables that they do not support include the following:

  • They cannot have referential integrity constraints. Neither can they be the target of a foreign key, nor can they have a foreign key defined on them.
  • They cannot have NESTED TABLE type columns. In Oracle9i and earlier, they cannot have VARRAY type columns either; this restriction was lifted in Oracle 10g.
  • They cannot be IOTs.
  • They cannot be in a cluster of any type.
  • They cannot be partitioned.
  • They cannot have statistics generated via the ANALYZE table command.
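
Here is the quick sketch promised above (the names are my own): indexes and check constraints are created on a global temporary table exactly as on a permanent one:

create global temporary table gtt_demo
( x number check ( x > 0 ),
  y varchar2(30)
)
on commit preserve rows;

create index gtt_demo_idx on gtt_demo(x);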

One of the drawbacks of a temporary table in any database is the fact that the optimizer has no real statistics on it normally. When using the cost-based optimizer (CBO), valid statistics are vital to the optimizer's success (or failure). In the absence of statistics, the optimizer will make guesses as to the distribution of data, the amount of data, and the selectivity of an index. When these guesses are wrong, the query plans generated for queries that make heavy use of temporary tables could be less than optimal. In many cases, the correct solution is to not use a temporary table at all, but rather to use an INLINE VIEW (for an example of an INLINE VIEW, refer to the SELECT just run—it has two of them) in its place. In this fashion, Oracle will have access to all of the relevant statistics for a table and can come up with an optimal plan.

I find many times people use temporary tables because they learned in other databases that joining too many tables in a single query is a "bad thing." This is a practice that must be unlearned for Oracle development. Rather than trying to outsmart the optimizer and breaking what should be a single query into three or four queries that store their subresults into temporary tables, and then combining the temporary tables, you should just code a single query that answers the original question. Referencing many tables in a single query is OK; the temporary table crutch is not needed in Oracle for this purpose.
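
For instance, rather than staging an aggregate in a temporary table and joining to it in a second query, an inline view computes it within the one statement (a sketch against the familiar SCOTT demo tables):

select d.dname, x.avg_sal
  from scott.dept d,
       ( select deptno, avg(sal) avg_sal
           from scott.emp
          group by deptno ) x
 where d.deptno = x.deptno;

The optimizer sees the real EMP and DEPT statistics and can cost the whole plan at once.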

In other cases, however, the use of a temporary table in a process is the correct approach. For example, I once wrote a Palm sync application to synchronize the date book on a Palm Pilot with calendar information stored in Oracle. The Palm gives me a list of all records that have been modified since the last hot synchronization. I must take these records and compare them against the live data in the database, update the database records, and then generate a list of changes to be applied to the Palm. This is a perfect example of when a temporary table is very useful. I used a temporary table to store the changes from the Palm in the database. I then ran a stored procedure that bumps the Palm-generated changes against the live (and very large) permanent tables to discover what changes need to be made to the Oracle data, and then to find the changes that need to come from Oracle back down to the Palm. I have to make a couple of passes on this data. First, I find all records that were modified only on the Palm and make the corresponding changes in Oracle. Next, I find all records that were modified on both the Palm and my database since the last synchronization and rectify them. Then I find all records that were modified only on the database and place their changes into the temporary table. Lastly, the Palm sync application pulls the changes from the temporary table and applies them to the Palm device itself. Upon disconnection, the temporary data goes away.

The issue I encountered, however, is that because the permanent tables were analyzed, the CBO was being used. The temporary table had no statistics on it (you can analyze the temporary table, but no statistics are gathered), and the CBO would "guess" many things about it. I, as the developer, knew the average number of rows I might expect, the distribution of the data, the selectivity of the indexes, and so on. I needed a way to inform the optimizer of these better guesses. There are two ways to give the optimizer statistics on global temporary tables: one is via dynamic sampling (new in Oracle9i Release 2 and above), and the other is the DBMS_STATS package, which has multiple methods to accomplish this. First, let's look at dynamic sampling.

Dynamic sampling is the optimizer's ability, when hard parsing a query, to scan segments in the database (sample them) to collect statistics useful in optimizing that particular query. It is akin to doing a "miniature gather statistics" command during a hard parse. In Oracle 10g, dynamic sampling will work out of the box, because the default setting has been increased from 1 to 2, and at level 2, the optimizer will dynamically sample any unanalyzed object referenced in a query processed by the optimizer prior to evaluating the query plan. In 9i Release 2, the setting of 1 would cause dynamic sampling to be used much less often. We can use an ALTER SESSION|SYSTEM command in Oracle9i Release 2 to make it behave the way Oracle 10g does by default, or we can use the dynamic sampling hint as follows:

ops$tkyte@ORA9IR2> create global temporary table gtt
  2  as
  3  select * from scott.emp where 1=0;
Table created.

ops$tkyte@ORA9IR2> insert into gtt select * from scott.emp;
14 rows created.

ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select /*+ first_rows */ * from gtt;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=17 Card=8168 Bytes...
   1    0   TABLE ACCESS (FULL) OF 'GTT' (Cost=17 Card=8168 Bytes=710616)

ops$tkyte@ORA9IR2> select /*+ first_rows dynamic_sampling(gtt 2) */ * from gtt;

Execution Plan
----------------------------------------------------------
  0   SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=17 Card=14 Bytes=1218)
  1  0  TABLE ACCESS (FULL) OF 'GTT' (Cost=17 Card=14 Bytes=1218)

ops$tkyte@ORA9IR2> set autotrace off

With the hint, we set dynamic sampling to level 2 for the table GTT in that query alone. Left to itself, the optimizer guessed 8,168 rows would be returned from the table GTT. Using dynamic sampling, the estimated cardinality will be much closer to reality (which leads to better query plans overall). Using the level 2 setting, the optimizer quickly scans the table to come up with more-realistic estimates of the true size of this table. In Oracle 10g, we should find this to be less of a problem, because the defaults will cause dynamic sampling to take place:

ops$tkyte@ORA10G> create global temporary table gtt
  2  as
  3  select * from scott.emp where 1=0;
Table created.

ops$tkyte@ORA10G> insert into gtt select * from scott.emp;
14 rows created.

ops$tkyte@ORA10G> set autotrace traceonly explain
ops$tkyte@ORA10G> select * from gtt;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=14 Bytes=1218)
   1    0   TABLE ACCESS (FULL) OF 'GTT' (TABLE (TEMP)) (Cost=2 Card=14 Bytes=1218)

ops$tkyte@ORA10G> set autotrace off

We get the right cardinality without having to ask for it. Dynamic sampling does not come free, however—there is a cost associated with having to perform it at query parse time. If we gathered appropriate representative statistics ahead of time, we could avoid that cost at hard parse time. That leads us into DBMS_STATS.

There are three methods to use DBMS_STATS to gather representative statistics. The first way is to use DBMS_STATS with the GATHER_SCHEMA_STATS or GATHER_DATABASE_STATS call. These procedures allow you to pass in a parameter, GATHER_TEMP, which is a Boolean and defaults to FALSE. When set to TRUE, any ON COMMIT PRESERVE ROWS global temporary table will have statistics gathered and stored (this technique will not work on ON COMMIT DELETE ROWS tables). Consider the following (note that this was done in an empty schema; the only objects are those you see created):

ops$tkyte@ORA10G> create table emp as select * from scott.emp;
Table created.

ops$tkyte@ORA10G> create global temporary table gtt1 ( x number )
  2  on commit preserve rows;
Table created.

ops$tkyte@ORA10G> create global temporary table gtt2 ( x number )
  2  on commit delete rows;
Table created.

ops$tkyte@ORA10G> insert into gtt1 select user_id from all_users;
38 rows created.

ops$tkyte@ORA10G> insert into gtt2 select user_id from all_users;
38 rows created.

ops$tkyte@ORA10G> exec dbms_stats.gather_schema_stats( user );
PL/SQL procedure successfully completed.

ops$tkyte@ORA10G> select table_name, last_analyzed, num_rows from user_tables;

TABLE_NAME                     LAST_ANAL   NUM_ROWS
------------------------------ --------- ----------
EMP                            01-MAY-05         14
GTT1
GTT2

As we can see, only the EMP table was analyzed in this case; the two global temporary tables were ignored. We can change that behavior by calling GATHER_SCHEMA_STATS with GATHER_TEMP => TRUE:

ops$tkyte@ORA10G> insert into gtt2 select user_id from all_users;
38 rows created.

ops$tkyte@ORA10G> exec dbms_stats.gather_schema_stats( user, gather_temp=>TRUE );
PL/SQL procedure successfully completed.

ops$tkyte@ORA10G> select table_name, last_analyzed, num_rows from user_tables;

TABLE_NAME                     LAST_ANAL   NUM_ROWS
------------------------------ --------- ----------
EMP                            01-MAY-05         14
GTT1                           01-MAY-05         38
GTT2                           01-MAY-05          0

Notice that the ON COMMIT PRESERVE ROWS table has accurate statistics, but the ON COMMIT DELETE ROWS table does not. DBMS_STATS commits, and that wipes out any information in that table. Do note, however, that GTT2 does now have statistics, which in itself is a bad thing, because the statistics are very much incorrect: it is doubtful the table will have 0 rows in it at runtime. So, if you use this approach, be aware of two things:

  • Make sure to populate your global temporary tables with representative data in the session that gathers the statistics. If not, they will appear empty to DBMS_STATS.
  • If you have ON COMMIT DELETE ROWS global temporary tables, this approach should not be used, as you will definitely gather inappropriate values.
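
If you have already stored misleading statistics this way (as we just did for GTT2), you can remove them so that the optimizer falls back to dynamic sampling or its defaults. A minimal sketch using the DBMS_STATS.DELETE_TABLE_STATS procedure:

exec dbms_stats.delete_table_stats( user, 'GTT2' );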

The second technique that works with ON COMMIT PRESERVE ROWS global temporary tables is to use GATHER_TABLE_STATS directly on the table. You would populate the global temporary table as we just did, and then execute GATHER_TABLE_STATS on that global temporary table, as sketched below. Note that just as before, this does not work for ON COMMIT DELETE ROWS global temporary tables, as the same issues as just described would come into play.
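
A minimal sketch of this second technique, reusing the GTT1 table from the example above and populating it with representative data in the same session:

insert into gtt1 select user_id from all_users;  -- representative data
exec dbms_stats.gather_table_stats( user, 'GTT1' );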

The last technique is to use DBMS_STATS to manually populate the data dictionary with representative statistics for our temporary tables. For example, if on average the number of rows in the temporary table will be 500, the average row size will be 100 bytes, and the number of blocks will be 7, we could simply use the following:

ops$tkyte@ORA10G> create global temporary table t ( x int, y varchar2(100) );
Table created.

ops$tkyte@ORA10G> begin
  2     dbms_stats.set_table_stats( ownname => USER,
  3                                 tabname => 'T',
  4                                 numrows => 500,
  5                                 numblks => 7,
  6                                 avgrlen => 100 );
  7  end;
  8  /
PL/SQL procedure successfully completed.

ops$tkyte@ORA10G> select table_name, num_rows, blocks, avg_row_len
  2               from user_tables
  3              where table_name = 'T';

TABLE_NAME                       NUM_ROWS     BLOCKS AVG_ROW_LEN
------------------------------ ---------- ---------- -----------
T                                     500          7         100

Now, the optimizer won't use its best guess—it will use our best guess for this information.

Temporary Tables Wrap-Up

Temporary tables can be useful in an application where you need to temporarily store a set of rows to be processed against other tables, for either a session or a transaction. They are not meant to be used as a means to take a single larger query and "break it up" into smaller result sets that would be combined back together (which seems to be the most popular use of temporary tables in other databases). In fact, you will find in almost all cases that a single query broken up into smaller temporary table queries performs more slowly in Oracle than the single query would have. I've seen this behavior time and time again: when given the opportunity to rewrite the series of INSERTs into temporary tables as SELECTs in the form of one large query, the resulting single query executes much faster than the original multistep process.
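
To make that pattern concrete, here is a hedged illustration (the table and column names are hypothetical, invented for this sketch). Instead of the multistep approach

-- temp_results, big_table, and dept are hypothetical names for this illustration
insert into temp_results
select dept_id, sum(sal) total_sal
  from big_table
 group by dept_id;

select d.dname, t.total_sal
  from temp_results t, dept d
 where t.dept_id = d.deptno;

the same answer can usually be obtained much faster with a single query using an inline view:

select d.dname, t.total_sal
  from ( select dept_id, sum(sal) total_sal
           from big_table
          group by dept_id ) t, dept d
 where t.dept_id = d.deptno;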

Temporary tables generate a minimum amount of redo, but they still generate some redo, and there is no way to disable that. The redo is generated for the rollback data, and in most typical uses it will be negligible. If you only INSERT and SELECT from temporary tables, the amount of redo generated will not be noticeable. Only if you DELETE or UPDATE a temporary table heavily will you see large amounts of redo generated.
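
You can verify this for yourself by watching the 'redo size' statistic for your session before and after DML against a temporary table (a minimal sketch; the exact numbers will vary):

select b.name, a.value
  from v$mystat a, v$statname b
 where a.statistic# = b.statistic#
   and b.name = 'redo size';

Run this query, perform the INSERT, UPDATE, or DELETE of interest against the temporary table, and run it again; the difference in VALUE is the redo your statement generated.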

Statistics used by the CBO can be gathered on a temporary table, but only with care; a better approach is to set a representative set of statistics on the temporary table using the DBMS_STATS package, or to have the optimizer collect them dynamically at hard parse time using dynamic sampling.

Object Tables

We have already seen a partial example of an object table with nested tables. An object table is a table that is created based on a TYPE, not as a collection of columns. Normally, a CREATE TABLE would look like this:

create table t ( x int, y date, z varchar2(25) );

An object table creation statement looks more like this:

create table t of Some_Type;

The attributes (columns) of T are derived from the definition of SOME_TYPE. Let's quickly look at an example involving a couple of types, and then we'll review the resulting data structures:

ops$tkyte@ORA10G> create or replace type address_type
  2  as object
  3  ( city    varchar2(30),
  4    street  varchar2(30),
  5    state   varchar2(2),
  6    zip     number
  7  )
  8  /
Type created.

ops$tkyte@ORA10G> create or replace type person_type
  2  as object
  3  ( name             varchar2(30),
  4    dob              date,
  5    home_address     address_type,
  6    work_address     address_type
  7  )
  8  /
Type created.

ops$tkyte@ORA10G> create table people of person_type
  2  /
Table created.

ops$tkyte@ORA10G> desc people
Name                                     Null?    Type
---------------------------------------- -------- ----------------------------
NAME                                              VARCHAR2(30)
DOB                                               DATE
HOME_ADDRESS                                      ADDRESS_TYPE
WORK_ADDRESS                                      ADDRESS_TYPE

That's all there is to it. We create some type definitions, and then we can create tables of that type. The table appears to have four columns representing the four attributes of the PERSON_TYPE we created. We are at the point where we can now perform DML on the object table to create and query data:

ops$tkyte@ORA10G> insert into people values ( 'Tom', '15-mar-1965',
  2  address_type( 'Reston', '123 Main Street', 'Va', '45678' ),
  3  address_type( 'Redwood', '1 Oracle Way', 'Ca', '23456' ) );
1 row created.

ops$tkyte@ORA10G> select * from people;

NAME  DOB       HOME_ADDRESS(CITY, S WORK_ADDRESS(CITY, S
----- --------- -------------------- --------------------
Tom   15-MAR-65 ADDRESS_TYPE('Reston ADDRESS_TYPE('Redwoo
                ', '123 Main Street' d', '1 Oracle Way',
                , 'Va', 45678)       'Ca', 23456)

ops$tkyte@ORA10G> select name, p.home_address.city from people p;

NAME  HOME_ADDRESS.CITY
----- ------------------------------
Tom   Reston

We're starting to see some of the object syntax necessary to deal with object types. For example, in the INSERT statement we had to wrap the HOME_ADDRESS and WORK_ADDRESS values in calls to ADDRESS_TYPE. That is, we create an ADDRESS_TYPE instance for each of them by using the default constructor for the ADDRESS_TYPE object.
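
The same default constructor is available procedurally as well; a minimal PL/SQL sketch (the variable name is mine):

declare
   -- l_addr is an arbitrary variable name; the constructor is the type's default one
   l_addr address_type := address_type( 'Reston', '123 Main Street', 'Va', 45678 );
begin
   dbms_output.put_line( l_addr.city );
end;
/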

Now, as far as the external face of the table is concerned, there are four columns in our table. By now, after seeing the hidden magic that took place for the nested tables, we can probably guess that there is something else going on. Oracle stores all object-relational data in plain old relational tables—at the end of the day it is all in rows and columns. If we dig into the "real" data dictionary, we can see what this table really looks like:

ops$tkyte@ORA10G> select name, segcollength
  2    from sys.col$
  3   where obj# = ( select object_id
  4                    from user_objects
  5                   where object_name = 'PEOPLE' )
  6  /

NAME                      SEGCOLLENGTH
------------------------- ------------
SYS_NC_OID$                         16
SYS_NC_ROWINFO$                      1
NAME                                30
DOB                                  7
HOME_ADDRESS                         1
SYS_NC00006$                        30
SYS_NC00007$                        30
SYS_NC00008$                         2
SYS_NC00009$                        22
WORK_ADDRESS                         1
SYS_NC00011$                        30
SYS_NC00012$                        30
SYS_NC00013$                         2
SYS_NC00014$                        22
 14 rows selected.

This looks quite different from what DESCRIBE tells us. Apparently, there are 14 columns in this table, not 4. In this case, they are

  • SYS_NC_OID$: This is the system-generated object ID of the table. It is a unique RAW(16) column. It has a unique constraint on it, and there is a corresponding unique index created on it as well.
  • SYS_NC_ROWINFO$: This is the same "magic" function we observed with the nested table. If we select that from the table, it returns the entire row as a single column:
    ops$tkyte@ORA10G> select sys_nc_rowinfo$ from people;

    SYS_NC_ROWINFO$(NAME, DOB, HOME_ADDRESS(CITY,STREET,STATE,ZIP), WORK_ADDRESS
    ----------------------------------------------------------------------------
    PERSON_TYPE('Tom', '15-MAR-65', ADDRESS_TYPE('Reston', '123 Main Street',
    'Va', 45678), ADDRESS_TYPE('Redwood', '1 Oracle Way', 'Ca', 23456))
  • NAME, DOB: These are the scalar attributes of our object table. They are stored much as we would expect, as regular columns.
  • HOME_ADDRESS, WORK_ADDRESS: These are "magic" functions as well. They return the collection of columns they represent as a single object. These consume no real space except to signify NULL or NOT NULL for the entity.
  • SYS_NCnnnnn$: These are the scalar implementations of our embedded object types. Since the PERSON_TYPE had the ADDRESS_TYPE embedded in it, Oracle needed to make room to store them in the appropriate type of columns. The system-generated names are necessary since a column name must be unique, and there is nothing stopping us from using the same object type more than once as we did here. If the names were not generated, we would have ended up with the ZIP column twice.
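
Note that you do not need to query SYS.COL$ directly to see these hidden columns. The USER_TAB_COLS view (unlike USER_TAB_COLUMNS, it includes hidden columns) will show them as well; a minimal sketch:

select column_name, hidden_column
  from user_tab_cols
 where table_name = 'PEOPLE'
 order by internal_column_id;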

So, just like with the nested table, there is a lot going on here. A pseudo primary key of 16 bytes was added, there are virtual columns, and an index was created for us. We can change the default behavior with regard to the value of the object identifier assigned to an object, as we'll see in a moment. First, let's look at the full verbose SQL that would generate our table for us. Again, this was generated using EXP/IMP since I wanted to easily see the dependent objects, including all of the SQL needed to re-create this object. This was achieved via the following:

[tkyte@localhost tkyte]$ exp userid=/ tables=people rows=n
Export: Release 10.1.0.3.0 - Production on Sun May 1 14:04:16 2005
Copyright (c) 1982, 2004, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in WE8ISO8859P1 character set and AL16UTF16 NCHAR character set
Note: table data (rows) will not be exported

About to export specified tables via Conventional Path ...
. . exporting table             PEOPLE
Export terminated successfully without warnings.

[tkyte@localhost tkyte]$ imp userid=/ indexfile=people.sql full=y
Import: Release 10.1.0.3.0 - Production on Sun May 1 14:04:33 2005
Copyright (c) 1982, 2004, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

Export file created by EXPORT:V10.01.00 via conventional path
import done in WE8ISO8859P1 character set and AL16UTF16 NCHAR character set
Import terminated successfully without warnings.

Review of the people.sql file that results would show this:

CREATE TABLE "OPS$TKYTE"."PEOPLE"
OF "PERSON_TYPE" OID 'F610318AC3D8981FE030007F01001464'
OIDINDEX (PCTFREE 10 INITRANS 2 MAXTRANS 255
          STORAGE(INITIAL 131072 NEXT 131072
                  MINEXTENTS 1 MAXEXTENTS 4096
                  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
                  BUFFER_POOL DEFAULT)
TABLESPACE "USERS")
PCTFREE 10 PCTUSED 40
INITRANS 1 MAXTRANS 255
LOGGING STORAGE(INITIAL 131072 NEXT 131072
                MINEXTENTS 1 MAXEXTENTS 4096
                PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
                BUFFER_POOL DEFAULT) TABLESPACE "USERS" NOCOMPRESS
/

ALTER TABLE "OPS$TKYTE"."PEOPLE" MODIFY
("SYS_NC_OID$" DEFAULT SYS_OP_GUID())
/

This gives us a little more insight into what is actually taking place here. We see the OIDINDEX clause clearly now, and we see a reference to the SYS_NC_OID$ column. This is the hidden primary key of the table. The function SYS_OP_GUID is the same as the function SYS_GUID. They both return a globally unique identifier that is a 16-byte RAW field.
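
For example, SYS_GUID may be invoked directly; each call returns a new 16-byte RAW value:

select sys_guid() from dual;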

The OID '<big hex number>' syntax is not documented in the Oracle documentation. All it does is ensure that, during an EXP and subsequent IMP, the underlying type PERSON_TYPE is in fact the same type. It protects against the error that would occur if we performed the following steps:

  1. Create the PEOPLE table.
  2. Export the table.
  3. Drop the table and the underlying PERSON_TYPE.
  4. Create a new PERSON_TYPE with different attributes.
  5. Import the old PEOPLE data.

Obviously, this export cannot be imported into the new structure—it will not fit. This check prevents that from occurring.

If you remember, I mentioned that we can change the behavior of the object identifier assigned to an object instance. Instead of having the system generate a pseudo primary key for us, we can use the natural key of an object. At first, this might appear self-defeating—the SYS_NC_OID$ will still appear in the table definition in SYS.COL$ and, in fact, it will appear to consume massive amounts of storage as compared to the system-generated column. Once again, however, there is "magic" at work here. The SYS_NC_OID$ column for an object table that is based on a primary key and not system generated is a virtual column and consumes no real storage on disk.

Here is an example that shows what happens in the data dictionary and demonstrates that there is no physical storage consumed for the SYS_NC_OID$. We'll start with an analysis of the system-generated OID table:

ops$tkyte@ORA10G> create table people of person_type
  2  /
Table created.

ops$tkyte@ORA10G> select name, type#, segcollength
  2    from sys.col$
  3   where obj# = ( select object_id
  4                    from user_objects
  5                   where object_name = 'PEOPLE' )
  6     and name like 'SYS\_NC\_%' escape ''
  7  /

NAME                           TYPE# SEGCOLLENGTH
------------------------- ---------- ------------
SYS_NC_OID$                       23           16
SYS_NC_ROWINFO$                  121            1

ops$tkyte@ORA10G> insert into people(name)
  2  select rownum from all_objects;
48217 rows created.

ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'PEOPLE' );
PL/SQL procedure successfully completed.

ops$tkyte@ORA10G> select table_name, avg_row_len from user_object_tables;

TABLE_NAME                     AVG_ROW_LEN
------------------------------ -----------
PEOPLE                                  23

We see here that the average row length is 23 bytes: 16 bytes for the SYS_NC_OID$ and 7 bytes for the NAME. Now, let's do the same thing, but this time use a primary key on the NAME column as the object identifier (after dropping the PEOPLE table, so we can re-create it):

ops$tkyte@ORA10G> CREATE TABLE "PEOPLE"
  2  OF "PERSON_TYPE"
  3  ( constraint people_pk primary key(name) )
  4  object identifier is PRIMARY KEY
  5  /
Table created.

ops$tkyte@ORA10G> select name, type#, segcollength
  2    from sys.col$
  3   where obj# = ( select object_id
  4                    from user_objects
  5                   where object_name = 'PEOPLE' )
  6     and name like 'SYS\_NC\_%' escape ''
  7  /

NAME                                TYPE# SEGCOLLENGTH
------------------------------ ---------- ------------
SYS_NC_OID$                            23           81
SYS_NC_ROWINFO$                       121            1

According to this, instead of a small 16-byte column, we have a large 81-byte column! In reality, there is no data stored in there. It will be null. The system will generate a unique ID based on the object table, its underlying type, and the value in the row itself. We can see this in the following:

ops$tkyte@ORA10G> insert into people (name)
  2  values ( 'Hello World!' );
1 row created.

ops$tkyte@ORA10G> select sys_nc_oid$ from people p;

SYS_NC_OID$
-------------------------------------------------------------------------------
F610733A48F865F9E030007F0100149A00000017260100010001002900000000000C07001E01000
02A00078401FE000000140C48656C6C6F20576F726C642100000000000000000000000000000000
0000

ops$tkyte@ORA10G> select utl_raw.cast_to_raw( 'Hello World!' ) data
  2  from dual;

DATA
-------------------------------------------------------------------------------
48656C6C6F20576F726C6421

ops$tkyte@ORA10G> select utl_raw.cast_to_varchar2(sys_nc_oid$) data
  2  from people;

DATA
-------------------------------------------------------------------------------
<garbage bits and bytes..>Hello World!

If we select out the SYS_NC_OID$ column and inspect the HEX dump of the string we inserted, we see that the row data itself is embedded in the object ID. Converting the object ID into a VARCHAR2, we can confirm that visually. Does that mean our data is stored twice, with a lot of overhead? No, it is not; it is just factored into that magic thing that is the SYS_NC_OID$ column upon retrieval. Oracle synthesizes the data as we select from the table.

Now for an opinion. The object-relational components (nested tables and object tables) are primarily what I call "syntactic sugar." They are always translated into good old relational rows and columns. I prefer not to use them as physical storage mechanisms personally. There are too many bits of "magic" happening—side effects that are not clear. You get hidden columns, extra indexes, surprise pseudo columns, and so on. This does not mean that the object-relational components are a waste of time—on the contrary, I use them in PL/SQL constantly. I use them with object views. I can achieve the benefits of a nested table construct (less data returned over the network for a master/detail relationship, conceptually easier to work with, and so on) without any of the physical storage concerns. That is because I can use object views to synthesize my objects from my relational data.

This solves most of my concerns with object tables/nested tables in that the physical storage is dictated by me, the join conditions are set up by me, and the tables are available as relational tables (which is what many third-party tools and applications will demand) naturally. The people who require an object view of relational data can have it, and the people who need the relational view can have it. Since object tables are really relational tables in disguise, we are doing the same thing Oracle does for us behind the scenes, only we can do it more efficiently, since we don't have to do it generically as they do. For example, using the types defined earlier, I could just as easily use the following:

ops$tkyte@ORA10G> create table people_tab
  2  (  name        varchar2(30) primary key,
  3     dob         date,
  4     home_city   varchar2(30),
  5     home_street varchar2(30),
  6     home_state  varchar2(2),
  7     home_zip    number,
  8     work_city   varchar2(30),
  9     work_street varchar2(30),
 10     work_state  varchar2(2),
 11     work_zip    number
 12  )
 13  /
Table created.

ops$tkyte@ORA10G> create view people of person_type
  2  with object identifier (name)
  3  as
  4  select name, dob,
  5    address_type(home_city,home_street,home_state,home_zip) home_address,
  6    address_type(work_city,work_street,work_state,work_zip) work_address
  7    from people_tab
  8  /
View created.

ops$tkyte@ORA10G> insert into people values ( 'Tom', '15-mar-1965',
  2  address_type( 'Reston', '123 Main Street', 'Va', '45678' ),
  3  address_type( 'Redwood', '1 Oracle Way', 'Ca', '23456' ) );
1 row created.

However, I achieve very much the same effect, and I know exactly what is stored, how it is stored, and where it is stored. For more complex objects, we may have to code INSTEAD OF triggers on the object views to allow for modifications through the view.
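
A minimal sketch of such a trigger for the PEOPLE view just created (the trigger name is mine, and only the INSERT case is shown; UPDATE and DELETE triggers would be analogous):

create or replace trigger people_io_insert  -- illustrative trigger name
instead of insert on people
for each row
begin
   -- route the object attributes back into the underlying relational columns
   insert into people_tab
   ( name, dob,
     home_city, home_street, home_state, home_zip,
     work_city, work_street, work_state, work_zip )
   values
   ( :new.name, :new.dob,
     :new.home_address.city, :new.home_address.street,
     :new.home_address.state, :new.home_address.zip,
     :new.work_address.city, :new.work_address.street,
     :new.work_address.state, :new.work_address.zip );
end;
/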

Object Tables Wrap-Up

Object tables are used to implement an object-relational model in Oracle. A single object table will typically create many physical database objects and add additional columns to your schema to manage everything. There is some amount of "magic" associated with object tables. Object views allow you to take advantage of the syntax and semantics of "objects," while at the same time retaining complete control over the physical storage of the data and allowing for relational access to the underlying data. In that fashion, you can achieve the best of both the relational and object-relational worlds.

Summary

Hopefully after reading this chapter, you have come to the conclusion that not all tables are created equal. Oracle provides a rich variety of table types that you can exploit. In this chapter, we have covered many of the salient aspects of tables in general and explored the many different table types Oracle provides for us to use.

We began by looking at some terminology and storage parameters associated with tables. We looked at the usefulness of freelists in a multiuser environment where a table is frequently inserted/updated by many people simultaneously, and how the use of ASSM tablespaces could make it so we don't even have to think about that. We investigated the meaning of PCTFREE and PCTUSED, and developed some guidelines for setting them correctly.

Then we got into the different types of tables, starting with the common heap. The heap organized table is by far the most commonly used table in most Oracle applications, and it is the default table type. We moved on to examine index organized tables (IOTs), which provide us with the ability to store our table data in an index structure instead of a heap table. We saw how these are applicable for various uses, such as lookup tables and inverted lists, where a heap table would just be a redundant copy of the data. Later, we saw how IOTs can really be useful when mixed with other table types, specifically the nested table type.

We looked at cluster objects, of which Oracle has three kinds: index, hash, and sorted hash. The goals of the cluster are twofold:

  • To give us the ability to store data from many tables together on the same database block(s).
  • To give us the ability to force like data to be stored physically "together" based on some cluster key. In this fashion all of the data for department 10 (from many tables) may be stored together.

These features allow us to access related data very quickly, with minimal physical I/O. We observed the main differences between index clusters and hash clusters, and discussed when each would (and would not) be appropriate.

Next, we moved on to cover nested tables. We reviewed the syntax, semantics, and usage of nested tables. We saw how they are in fact a system-generated and -maintained parent/child pair of tables, and we discovered how Oracle physically does this for us. We looked at using different table types for nested tables, which by default use a heap-based table. We found that there will probably never be a reason not to use an IOT instead of a heap table for nested tables.

Then we looked into the ins and outs of temporary tables, including how to create them, where they get their storage from, and the fact that they introduce no concurrency-related issues at runtime. We explored the differences between session-level and transaction-level temporary tables, and we discussed the appropriate method for using temporary tables in an Oracle database.

This chapter finished with a look at the inner workings of object tables. As with nested tables, we discovered there is a lot going on under the covers with object tables in Oracle. We discussed how object views on top of relational tables can give us the functionality of an object table, while at the same time offering easy access to the underlying relational data.
