CHAPTER 4

Storage Indexes

Storage Indexes are the most useful Exadata feature that you never hear about. They are not indexes that are stored in the database like Oracle’s traditional B-Tree or bitmapped indexes. In fact, they are not indexes at all in the traditional sense. They are not capable of identifying a set of records that has a certain value in a given column. Rather, they are a feature of the storage server software that is designed to eliminate disk I/O. They are sometimes described as “reverse indexes.” That’s because they identify locations where the requested records are not, instead of the other way around. They work by storing minimum and maximum column values for disk storage units, which are 1 Megabyte (MB) by default. Because SQL predicates are passed to the storage servers when Smart Scans are performed, the storage software can check the predicates against the Storage Index metadata (maximum and minimum values) before doing the requested I/O. Any storage region that cannot possibly have a matching row is skipped. In many cases, this can result in a significant reduction in the amount of I/O that must be performed. Keep in mind that since the storage software needs the predicates to compare to the maximum and minimum values in the Storage Indexes, this optimization is only available for Smart Scans.

The storage software provides no documented mechanism for altering or tuning Storage Indexes (although there are a few undocumented parameters that can be set prior to starting cellsrv on the storage servers). In fact, there is not even much available in the way of monitoring. For example, there is no wait event that records the amount of time spent when a Storage Index is accessed or updated. Even though there are no commands to manipulate Storage Indexes, they are an extremely powerful feature and can provide dramatic performance improvements. For that reason it is important to understand how they work.

Kevin Says: To keep the role of Storage Indexes straight in my mind, I generally picture an optimization that dramatically improves searching for a needle in a haystack as opposed to finding where certain pieces of straw exist in a haystack.

Structure

Storage Indexes consist of a minimum and a maximum value for up to eight columns. This structure is maintained for 1MB chunks of storage (storage regions). Storage Indexes are stored in memory only and are never written to disk.

Kevin Says: Storage Indexes are stored in the heap of cellsrv, so technically speaking they could end up on disk (swap) under insane conditions…so, not never…

Figure 4-1 shows a conceptual view of the data contained in a Storage Index.


Figure 4-1. Conceptual diagram of a Storage Index

As you can see in the diagram, the first storage region in the Customer table has a maximum value of 77, indicating that it’s possible for it to contain rows that will satisfy the query predicate (cust_age > 35). The other storage regions in the diagram do not have maximum values that are high enough to contain any records that will satisfy the query predicate. Therefore, those storage regions will not be read from disk.

In addition to the minimum and maximum values, there is a flag to indicate whether any of the records in a storage region contain nulls. The fact that nulls are represented at all is somewhat surprising given that nulls are not stored in traditional Oracle indexes. This ability of Storage Indexes to track nulls may actually have repercussions for design and implementation decisions. There are systems that don’t use nulls at all. SAP, for example, uses a single space character instead of nulls. SAP does this simply to ensure that records can be accessed via B-Tree indexes (which do not store nulls). At any rate, Storage Indexes provide the equivalent of a bit-mapped index on nulls, which makes finding nulls a very efficient process (assuming they represent a low percentage of the values).
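The per-region metadata can be sketched with a toy simulation. This is purely illustrative Python, not the actual cellsrv data structure; the region contents, names, and sizes here are invented stand-ins for real 1MB storage regions:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RegionIndex:
    """Hypothetical per-region metadata: min/max values plus a null flag."""
    lo: int
    hi: int
    has_nulls: bool

def build_region_indexes(regions: List[List[Optional[int]]]) -> List[RegionIndex]:
    # Assumes each region holds at least one non-null value (a sketch simplification).
    indexes = []
    for rows in regions:
        non_null = [v for v in rows if v is not None]
        indexes.append(RegionIndex(lo=min(non_null), hi=max(non_null),
                                   has_nulls=any(v is None for v in rows)))
    return indexes

def regions_to_read(indexes: List[RegionIndex], predicate) -> List[int]:
    # A region is skipped only when its metadata proves no row can match.
    return [i for i, ridx in enumerate(indexes) if predicate(ridx)]

# Three tiny "regions" of cust_age values, echoing Figure 4-1
regions = [[12, 77, None, 40], [5, 9, 30, 22], [18, 25, 11, 33]]
idx = build_region_indexes(regions)

# cust_age > 35: only regions whose maximum exceeds 35 can match
print(regions_to_read(idx, lambda r: r.hi > 35))    # [0]
# cust_age IS NULL: only regions flagged as containing nulls need to be read
print(regions_to_read(idx, lambda r: r.has_nulls))  # [0]
```

Note that the metadata can only prove absence; a region that survives the check may still contain no matching rows (a false positive), which is why Storage Indexes eliminate I/O rather than locate rows.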

Monitoring Storage Indexes

The ability to monitor Storage Indexes is very limited. The optimizer doesn’t know whether a Storage Index will be used for a particular SQL statement. Nor do AWR or ASH capture any information about whether Storage Indexes were used by particular SQL statements. There is a single statistic that tracks Storage Index usage at the database level and an undocumented tracing mechanism.

Database Statistics

There is only one database statistic related to Storage Indexes. The statistic, cell physical IO bytes saved by storage index, keeps track of the accumulated I/O that has been avoided by the use of Storage Indexes. This statistic is exposed in v$sesstat, v$sysstat, and related views. It’s a strange statistic that calculates a precise value for something it didn’t do. Nevertheless, it is the only easily accessible indicator as to whether Storage Indexes have been used. Unfortunately, since the statistic is cumulative, like all statistics in v$sesstat, it must be checked before and after a given SQL statement in order to determine whether Storage Indexes were used on that particular statement. Here is an example:

SYS@EXDB1> set echo on
SYS@EXDB1> @si
SYS@EXDB1> col name for a70
SYS@EXDB1> col value for 99999999999999
SYS@EXDB1> select name, value
  2  from v$mystat s, v$statname n
  3  where n.statistic# = s.statistic#
  4  and name like '%storage%';


NAME                                                    VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index               0

SYS@EXDB1> select avg(pk_col) from kso.skew2 where col1 is null;

AVG(PK_COL)
-----------
   32000001

SYS@EXDB1> set echo off
SYS@EXDB1> @si

NAME                                                    VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index      3984949248

SYS@EXDB1> select avg(pk_col) from kso.skew2 where col1 is null;

AVG(PK_COL)
-----------
   32000001


SYS@EXDB1> @si

NAME                                                    VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index      7969898496

As you can see, the si.sql script queries v$mystat for a statistic that contains the word “storage.” The value for this statistic will be 0 until a SQL statement that uses a Storage Index has been executed in the current session. In our example, the query used a Storage Index that eliminated about 4 billion bytes of disk I/O. This is the amount of additional I/O that would have been necessary without Storage Indexes. Note that v$mystat is a view that exposes cumulative statistics for your current session. So if you run the statement a second time, the value should increase to twice the value it had after the first execution. Of course, disconnecting from the session (by exiting SQL*Plus for example) resets most statistics exposed by v$mystat, including this one, to 0.

Tracing

There is another way to monitor what is going on with Storage Indexes at the individual storage cell level. The cellsrv program has the ability to create trace files whenever Storage Indexes are accessed. This tracing can be enabled by setting the _CELL_STORAGE_INDEX_DIAG_MODE parameter to TRUE in the cellinit.ora file on one of the storage cells. You will need to restart the cellsrv program once this parameter is set. In general this should not cause any interruption to the clusterware or databases running on the database machines, thanks to the redundancy provided by ASM. However, you should be aware that if you have issues on other storage cells, it’s possible that restarting a cellsrv process could cause an outage. Tracing can also be enabled on all storage servers by setting the hidden database parameter, _KCFIS_STORAGEIDX_DIAG_MODE to a value of 2. Since this tracing mechanism is completely undocumented, it should not be used without approval from Oracle support. Better safe than sorry.

Because the cellsrv process is multithreaded, the tracing facility creates many trace files. The result is similar to tracing a select statement that is executed in parallel on a database server, in that there are multiple trace files that need to be combined to show the whole picture. The naming convention for the trace files is svtrc_, followed by a process ID, followed by a thread identifier. The process ID matches the operating system process ID of the cellsrv process. Since cellsrv enables only 100 threads by default, the file names are reused rapidly as requests come into the storage cells. Because of this rapid reuse, it’s quite easy to wrap around the thread number portion of the file name. Such wrapping around doesn’t wipe out the previous trace file, but rather appends new data to the existing file. Appending happens with trace files on Oracle database servers as well, but is much less common because the process ID portion of the default file name comes from the user’s shadow process. So basically each session gets its own number.

There is another related cellsrv parameter, _CELL_SI_MAX_NUM_DIAG_MODE_DUMPS, that sets a maximum number of trace files that will be created before the tracing functionality is turned off. The parameter defaults to a value of 20. Presumably the parameter is a safety mechanism to keep the disk from getting filled by trace files, since a single query can create a large number of files.

Here is a snippet from a trace file generated on our test system:

Trace file
/opt/oracle/cell11.2.2.2.0_LINUX.X64_101206.2/log/diag/asm/cell/enkcel03/trace/svtrc_13253_100.trc
ORACLE_HOME = /opt/oracle/cell11.2.2.2.0_LINUX.X64_101206.2
System name:    Linux
Node name:      enkcel03.enkitec.com
Release:        2.6.18-194.3.1.0.3.el5
Version:        #1 SMP Tue Aug 31 22:41:13 EDT 2010
Machine:        x86_64
CELL SW Version:        OSS_11.2.0.3.0_LINUX.X64_101206.2

*** 2010-12-17 11:40:41.127
UserThread: LWPID: 13834 userId: 100 kernelId: 100 pthreadID: 0x2aae6b689940
2010-12-17 12:11:26.491971*: FenceMaster: OSS_IOCTL_FENCE_ENTITY is called, host enkdb02.enkitec.com[pid:8519] number of fencing in progress 1 reid cid=3cb0d13cdb9cff5eff8b8bfb091c6fe9,icin=171990399,nmn=2,lnid=171990399,gid=23,gin=1,gmn=1,umemid=1,opid=48,opsn=1,lvl=process hdr=0xfece0100
2010-12-17 12:11:26.497277*: FenceMaster: OSS_IOCTL_FENCE_ENTITY is set, number of fencing in progress 0 reid cid=3cb0d13cdb9cff5eff8b8bfb091c6fe9,icin=171990399,nmn=2,lnid=171990399,gid=23,gin=1,gmn=1,umemid=1,opid=48,opsn=1,lvl=process hdr=0xfece0100
2010-12-17 12:45:25.914281*: FenceMaster: OSS_IOCTL_FENCE_ENTITY is called, host enkdb01.enkitec.com[pid:9326] number of fencing in progress 1 reid cid=3cb0d13cdb9cff5eff8b8bfb091c6fe9,icin=171990399,nmn=1,lnid=171990399,gid=-2147483642,gin=1,gmn=3,umemid=3,opid=42,opsn=3,lvl=process hdr=0xfece0100
2010-12-17 12:45:25.915592*: FenceMaster: OSS_IOCTL_FENCE_ENTITY is set, number of fencing in progress 0 reid cid=3cb0d13cdb9cff5eff8b8bfb091c6fe9,icin=171990399,nmn=1,lnid=171990399,gid=-2147483642,gin=1,gmn=3,umemid=3,opid=42,opsn=3,lvl=process hdr=0xfece0100
2010-12-17 12:45:41.118778*: FenceMaster: OSS_IOCTL_FENCE_ENTITY is called, host enkdb01.enkitec.com[pid:9326] number of fencing in progress 1 reid cid=3cb0d13cdb9cff5eff8b8bfb091c6fe9,icin=171990399,nmn=1,lnid=171990399,gid=-2147483642,gin=1,gmn=3,umemid=3,opid=0,opsn=0,lvl=member hdr=0xfece0100
2010-12-17 12:45:41.122256*: FenceMaster: OSS_IOCTL_FENCE_ENTITY is set, number of fencing in progress 0 reid cid=3cb0d13cdb9cff5eff8b8bfb091c6fe9,icin=171990399,nmn=1,lnid=171990399,gid=-2147483642,gin=1,gmn=3,umemid=3,opid=0,opsn=0,lvl=member hdr=0xfece0100
2010-12-21 12:12:34.465398*: FenceMaster: OSS_IOCTL_FENCE_ENTITY is called, host enkdb02.enkitec.com[pid:8519] number of fencing in progress 1 reid cid=3cb0d13cdb9cff5eff8b8bfb091c6fe9,icin=171990399,nmn=2,lnid=171990399,gid=-2147483643,gin=1,gmn=1,umemid=1,opid=39,opsn=1,lvl=process hdr=0xfece0100
2010-12-21 12:12:34.471408*: FenceMaster: OSS_IOCTL_FENCE_ENTITY is set, number of fencing in progress 0 reid cid=3cb0d13cdb9cff5eff8b8bfb091c6fe9,icin=171990399,nmn=2,lnid=171990399,gid=-2147483643,gin=1,gmn=1,umemid=1,opid=39,opsn=1,lvl=process hdr=0xfece0100
2010-12-23 09:17:38.277822*: RIDX (0x2aae2f29feec) for SQLID 6dx247rvykr72 filter 1
2010-12-23 09:17:38.277822*: RIDX (0x2aae2f29feec) : st 2 validBitMap 7fffffffffffffff tabn 0 id {75759 4 3314771398}
2010-12-23 09:17:38.277822*: RIDX: strt 32 end 2048 offset 533652848640 size 1032192 rgnIdx 508931 RgnOffset 16384 scn: 0x0000.073fd1a7 hist: 0x9
2010-12-23 09:17:38.277822*: RIDX validation history: 0:PartialRead 1:PartialRead 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
2010-12-23 09:17:38.277822*: Col id [2] numFilt 20 flg 2:
2010-12-23 09:17:38.277822*: lo: c1 2 0 0 0 0 0 0
2010-12-23 09:17:38.277822*: hi: c3 64 51 4b 0 0 0 0
2010-12-23 09:17:38.277822*: Col id [3] numFilt 0 flg 2:
2010-12-23 09:17:38.277822*: lo: 61 73 64 64 73 61 64 61
2010-12-23 09:17:38.277822*: hi: 61 73 64 64 73 61 64 61
2010-12-23 09:17:38.277822*: Col id [5] numFilt 0 flg 2:
2010-12-23 09:17:38.277822*: lo: 4e 0 0 0 0 0 0 0
2010-12-23 09:17:38.277822*: hi: 59 0 0 0 0 0 0 0
. . .
2010-12-23 09:17:38.291153*: RIDX (0x2aadfea3d2d0) for SQLID 6dx247rvykr72 filter 1
2010-12-23 09:17:38.291153*: RIDX (0x2aadfea3d2d0) : st 2 validBitMap ffffffffffffffff tabn 0 id {75759 4 3314771398}
2010-12-23 09:17:38.291153*: RIDX: strt 0 end 2048 offset 546303901696 size 1048576 rgnIdx 520996 RgnOffset 0 scn: 0x0000.073fd34b hist: 0x1a
2010-12-23 09:17:38.291153*: RIDX validation history: 0:FullRead 1:PartialWrite 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
2010-12-23 09:17:38.291153*: Col id [2] numFilt 20 flg 2:
2010-12-23 09:17:38.291153*: lo: c1 2 0 0 0 0 0 0
2010-12-23 09:17:38.291153*: hi: c3 64 5c 8 0 0 0 0
2010-12-23 09:17:38.291153*: Col id [3] numFilt 0 flg 2:
2010-12-23 09:17:38.291153*: lo: 61 73 64 64 73 61 64 61
2010-12-23 09:17:38.291153*: hi: 61 73 64 64 73 61 64 61
2010-12-23 09:17:38.291153*: Col id [5] numFilt 0 flg 2:
2010-12-23 09:17:38.291153*: lo: 59 0 0 0 0 0 0 0
2010-12-23 09:17:38.291153*: hi: 59 0 0 0 0 0 0 0
2010-12-23 09:17:38.292459*: RIDX (0x2aadfea3d3dc) for SQLID 6dx247rvykr72 filter 1
2010-12-23 09:17:38.292459*: RIDX (0x2aadfea3d3dc) : st 2 validBitMap ffffffffffffffff tabn 0 id {75759 4 3314771398}
2010-12-23 09:17:38.292459*: RIDX: strt 0 end 2048 offset 546304950272 size 1048576 rgnIdx 520997 RgnOffset 0 scn: 0x0000.073fd34b hist: 0x1a
2010-12-23 09:17:38.292459*: RIDX validation history: 0:FullRead 1:PartialWrite 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
2010-12-23 09:17:38.292459*: Col id [2] numFilt 20 flg 2:
2010-12-23 09:17:38.292459*: lo: c1 2 0 0 0 0 0 0
2010-12-23 09:17:38.292459*: hi: c3 64 27 4f 0 0 0 0
2010-12-23 09:17:38.292459*: Col id [3] numFilt 0 flg 2:
2010-12-23 09:17:38.292459*: lo: 61 73 64 64 73 61 64 61
2010-12-23 09:17:38.292459*: hi: 61 73 64 64 73 61 64 61
2010-12-23 09:17:38.292459*: Col id [5] numFilt 0 flg 2:
2010-12-23 09:17:38.292459*: lo: 59 0 0 0 0 0 0 0
2010-12-23 09:17:38.292459*: hi: 59 0 0 0 0 0 0 0

Several things are worth pointing out in this trace file:

  • The first several lines are the standard trace file header with file name and software version.
  • The lines that begin with timestamps describe the Storage Index metadata; the lines sharing a single timestamp are all associated with the same storage region.
  • The SQLID of the statement that generated the trace file is identified for each storage region.
  • Each line containing the SQLID keyword begins data for a new storage region.
  • The line below the SQLID line contains an id field that includes the data_object_id from DBA_OBJECTS for the table being scanned.
  • The size fields show that the storage regions really are 1MB (1,048,576 bytes).
  • It looks like each Storage Index entry occupies 2K of memory based on the strt and end field values.
  • For each column evaluated there is an id field that correlates to its position in the table.
  • For each column evaluated there is a flg field. It appears that it is the decimal representation of a bit mask. It also appears that the first bit indicates whether nulls are contained in the current column of the storage region. (That is, 1 and 3 both indicate that nulls are present.)
  • For each column evaluated there is a lo and a hi value (stored as hex).
  • The lo and hi values are only 8 bytes, indicating that the Storage Indexes will be ineffective on columns where the leading portion of the values are not distinct (empirical evidence bears this out, by the way).

While generating and reading trace files is very informative, it is not very easy to do and requires direct access to the storage servers. On top of that, the approach is completely undocumented. It is probably best used for investigations in nonproduction environments.

Monitoring Wrap Up

Neither the database statistic nor the tracing is a particularly satisfying way of monitoring Storage Index usage. It would be nice to be able to track Storage Index usage at the statement level, via a column in V$SQL for example. In the meantime, the cell physical IO bytes saved by storage index statistic is the best option we have.

Controlling Storage Indexes

There is not much you can do to control Storage Index behavior. However, the developers have built in a few hidden parameters that provide some flexibility.

There are three database parameters that deal with Storage Indexes (that we’re aware of):

  • _kcfis_storageidx_disabled (default is FALSE)
  • _kcfis_storageidx_diag_mode (default is 0)
  • _cell_storidx_mode (default is EVA)

None of these parameters are documented, so you need to be careful with the methods we discuss in this section. Nevertheless, we’ll tell you a little bit about some of these parameters and what they can do.

_kcfis_storageidx_disabled

The _kcfis_storageidx_disabled parameter allows Storage Indexes to be disabled. As with all hidden parameters, it’s best to check with Oracle support before setting it, but as hidden parameters go, this one is relatively innocuous. We have used it extensively in testing and have not experienced any negative consequences.

You can set the parameter at the session level with the alter session statement:

alter session set "_kcfis_storageidx_disabled"=true;

Note that although setting _kcfis_storageidx_disabled to TRUE disables Storage Indexes for reads, the setting does not disable the maintenance of existing Storage Indexes. That is to say that existing Storage Indexes will still be updated when values in a table are changed, even if this parameter is set to TRUE.

_kcfis_storageidx_diag_mode

The second parameter, _KCFIS_STORAGEIDX_DIAG_MODE, looks eerily like the cellinit.ora parameter _CELL_STORAGE_INDEX_DIAG_MODE, which was discussed earlier. As you might expect, setting this parameter at the database layer causes trace files to be generated across all the affected storage cells. Setting it to a value of 2 enables tracing. Oddly, setting it to a value of 1 disables Storage Indexes. The trace files are still created on the storage cells, but this method of generating them is much less intrusive than restarting the cellsrv process on a storage server.

You can set the parameter at the session level with the alter session statement:

alter session set "_kcfis_storageidx_diag_mode"=2;

There may be other valid values for the parameter that enable different levels of tracing. Keep in mind that this will produce a large number of trace files on every storage cell that is involved in a query that uses Storage Indexes.

_cell_storidx_mode

The _CELL_STORIDX_MODE parameter was added in the second point release of Oracle Database 11gR2 (11.2.0.2). While this parameter is undocumented, it appears that it controls where Storage Indexes will be applied. There are three valid values for this parameter (EVA, KDST, and ALL). EVA and KDST are Oracle kernel function names.

You can set the parameter at the session level with the alter session statement:

alter session set "_cell_storidx_mode"=ALL;

The effects of this parameter have varied across releases. As of cellsrv version 11.2.2.3.0, EVA (the default) supports all the valid comparison operators. You should note that in older versions, the EVA setting did not support the IS NULL comparison operator. It’s also important to keep in mind that the database patching is tied to the storage software patching. Upgrading the version of cellsrv without patching the database software can result in unpredictable behavior (disabling Storage Indexes, for example).

Storage Software Parameters

In addition to the database parameters, there are also a number of undocumented storage software parameters that are related to Storage Index behavior. These parameters can be modified by adding them to the cellinit.ora file and then restarting cellsrv. Note that cellinit.ora will be discussed in more detail in Chapter 8. Here is a list of the cellinit.ora Storage Index parameters along with their default values.

  • _cell_enable_storage_index_for_loads=TRUE
  • _cell_enable_storage_index_for_writes=TRUE
  • _cell_si_max_num_diag_mode_dumps=20
  • _cell_storage_index_columns=0
  • _cell_storage_index_diag_mode=FALSE
  • _cell_storage_index_partial_rd_sectors=512
  • _cell_storage_index_partial_reads_threshold_percent=85
  • _cell_storage_index_sizing_factor=2

You’ve already seen the tracing parameters (_CELL_STORAGE_INDEX_DIAG_MODE and _CELL_SI_MAX_NUM_DIAG_MODE_DUMPS) in the section “Monitoring Storage Indexes.” These two parameters are the most useful in our opinion, although you should get the idea from the list that there is also some built-in ability to modify behaviors such as the amount of memory to allocate for Storage Indexes and the number of columns that can be indexed per table.

Behavior

There is not a lot you can do to control when Storage Indexes are used and when they are not. There is no specific hint to enable or disable their use, and unfortunately the OPT_PARAM hint does not work with the _KCFIS_STORAGEIDX_DISABLED parameter, either. The fact that there is no way to force the use of a Storage Index makes it even more important to understand when this powerful optimization will and will not be used.

In order for a Storage Index to be used, a query must include or make use of all the following:

Smart Scan: Storage Indexes can only be used with statements that do Smart Scans. This comes with a whole set of requirements, as detailed in Chapter 2. The main requirements are that the optimizer must choose a full scan and that the I/O must be done via the direct path read mechanism.

At Least One Predicate: In order for a statement to use a Storage Index, there must be a WHERE clause with at least one predicate.

Simple Comparison Operators: Storage Indexes can be used with the following set of operators:

=, <, >, BETWEEN, >=, <=, IN, IS NULL, IS NOT NULL
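Each of these operators qualifies because it can be answered conservatively from a region’s minimum and maximum (or null flag) alone. The following sketch is illustrative Python, not Oracle code; it shows the kind of per-region test the storage software can apply, where a region is skipped only when the metadata proves no match is possible:

```python
def region_can_skip(lo, hi, has_nulls, op, *args):
    """Return True when min/max metadata proves no row in the region matches.
    False only means the region *might* contain a match (conservative sketch)."""
    if op == '=':
        return args[0] < lo or args[0] > hi
    if op == '<':
        return lo >= args[0]
    if op == '>':
        return hi <= args[0]
    if op == 'between':                      # BETWEEN args[0] AND args[1]
        return args[1] < lo or args[0] > hi
    if op == 'in':
        return all(v < lo or v > hi for v in args[0])
    if op == 'is null':
        return not has_nulls                 # the dedicated null flag
    raise ValueError('operator not covered by this sketch')

# A region holding values between 10 and 20, with no nulls:
print(region_can_skip(10, 20, False, '=', 35))           # True: skip the region
print(region_can_skip(10, 20, False, 'between', 1, 5))   # True
print(region_can_skip(10, 20, False, 'in', [3, 50]))     # True
print(region_can_skip(10, 20, False, '=', 15))           # False: must be read
print(region_can_skip(10, 20, False, 'is null'))         # True
```

A != predicate, by contrast, can almost never be disproved this way (any region whose minimum and maximum differ may contain a non-matching row), which is consistent with the limitations listed later in this section.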

If a query meets the requirements of having at least one predicate involving simple comparison operators, and if that query’s execution makes use of a Smart Scan, then the storage software can make use of Storage Indexes. They can be applied to any of the following aspects of the query:

Multi-Column Predicates: Storage Indexes can be used with multiple predicates on the same table.

Joins: Storage Indexes can be used on statements accessing multiple tables to minimize disk I/O before the Join operations are carried out.

Parallel Query: Storage Indexes can be used by parallel query slaves. In fact, since direct path reads are required to enable Storage Indexes, parallel queries are very useful for ensuring that Storage Indexes can be used.

HCC: Storage Indexes work with HCC compressed tables.

Bind Variables: Storage Indexes work with bind variables. The values of the bind variables appear to be passed to the storage cells with each execution.

Partitions: Storage Indexes work with partitioned objects. Individual statements can benefit from both partition elimination and Storage Indexes during the same execution.

Sub-queries: Storage Indexes work with predicates that compare a column to a value returned by a sub-query.

Encryption: Storage Indexes work on encrypted tables.

There are, of course, limitations. Following are some features and syntax that prevent the use of Storage Indexes:

CLOBs: Storage Indexes are not created on CLOBs.

!=: Storage Indexes do not work with predicates that use the != comparison operator.

Wildcards: Storage Indexes do not work on predicates that use the % wildcard.

A further limitation is that Storage Indexes may contain only eight columns; they are created and maintained for at most eight columns per table. This does not mean that queries with more than eight predicates cannot make use of Storage Indexes. In such cases, the storage software can use the indexes that exist, but by default a maximum of eight columns per table can be indexed. It does appear that the developers have parameterized this setting, so it may be possible to change this value with help from Oracle support.

Finally, bear in mind that Storage Indexes are not persisted to disk. The storage cell must rebuild them whenever the cellsrv program is restarted. They are generally created during the first Smart Scan that references a given column after a storage server has been restarted. They can also be created when a table is created via a CREATE TABLE AS SELECT statement, or during other direct-path loads. And of course, the storage cell will update Storage Indexes in response to changes that applications make to the data in the tables.

Performance

Storage Indexes provide some of the most dramatic performance benefits available on the Exadata platform. Depending on the clustering factor of a particular column (that is, how well the column’s data is sorted on disk), the results can be spectacular. Here’s a typical example showing the performance of a query with and without the benefit of Storage Indexes:

SYS@EXDB1> alter session set cell_offload_processing=false;

Session altered.

SYS@EXDB1> alter session set "_kcfis_storageidx_disabled"=true;

Session altered.


SYS@EXDB1> select count(*) from kso.skew3;


  COUNT(*)
----------
 384000048

1 row selected.

Elapsed: 00:01:45.39

SYS@EXDB1> alter session set cell_offload_processing=true;

Session altered.

Elapsed: 00:00:00.00
SYS@EXDB1> select count(*) from kso.skew3;

  COUNT(*)
----------
 384000048

1 row selected.

Elapsed: 00:00:23.70

SYS@EXDB1> select count(*) from kso.skew3 where pk_col = 7000;

  COUNT(*)
----------
        12

Elapsed: 00:00:13.74

SYS@EXDB1> alter session set "_kcfis_storageidx_disabled"=false;

Session altered.

Elapsed: 00:00:00.00

SYS@EXDB1> select count(*) from kso.skew3 where pk_col = 7000;

  COUNT(*)
----------
        12

Elapsed: 00:00:01.06

At the start of this demonstration, all offloading was disabled via the database initialization parameter, CELL_OFFLOAD_PROCESSING. Storage Indexes were also disabled, via the hidden parameter _KCFIS_STORAGEIDX_DISABLED. A query without a WHERE clause was run and was completed using direct path reads, but without offloading. That query took 1 minute and 45 seconds to do the full table scan and returned entire blocks to the database grid, just as it would on non-Exadata storage environments. Offloading was then re-enabled and the query was repeated. This time it completed in about 24 seconds. The improvement in elapsed time was primarily due to column projection since the storage layer only had to return a counter of rows, instead of returning any of the column values.

A very selective WHERE clause was then added to the query; it reduced the time to about 14 seconds. This improvement was thanks to predicate filtering. Remember that Storage Indexes were still turned off. A counter for only 12 rows had to be returned to the database machine, but the storage cells still had to read all the data to determine which rows to return. Finally the Storage Indexes were re-enabled, by setting _KCFIS_STORAGEIDX_DISABLED to FALSE, and the query with the WHERE clause was executed again. This time the elapsed time was only about 1 second. While this performance improvement seems extreme, it is relatively common when Storage Indexes are used.

Special Optimization for Nulls

Nulls are a special case for Storage Indexes. There is a separate flag in the Storage Index structure that is used to indicate whether a storage region contains nulls or not. This separate flag makes queries looking for nulls (or the absence of nulls) even more efficient than the normal minimum and maximum comparisons that are typically done. Here’s an example comparing typical performance with and without the special null optimization.

SYS@EXDB1> set timing on
SYS@EXDB1> select count(*) from kso.skew3 where col1 =10000;

  COUNT(*)
----------
         0

Elapsed: 00:00:14.47
SYS@EXDB1> @si

NAME                                                    VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index         3915776

Elapsed: 00:00:00.00
SYS@EXDB1> select count(*) from kso.skew3 where col1 is null;

  COUNT(*)
----------
         4

Elapsed: 00:00:00.12
SYS@EXDB1> @si

NAME                                                    VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index     15954714624

Elapsed: 00:00:00.00

In this example you can see that retrieval of a few nulls was extremely fast. This is because there is no possibility that any storage region that doesn’t contain a null will have to be read, so no false positives will slow down this query. With any other value (except the minimum or maximum value for a column), there will most likely be storage regions that can’t be eliminated, even though they don’t actually contain a value that matches the predicates. This is exactly the case in the previous example, where no records were returned for the first query, even though it took 14 seconds to read all the data from disk. Notice also that the amount of I/O saved by the null query is almost 16 gigabytes (GB), while the amount saved by the first query was only about 3MB. That means that the first query found almost no storage regions that it could eliminate.

Physical Distribution of Values

Storage Indexes behave very differently from normal indexes. They maintain a fairly coarse picture of the values that are stored on disk. However, their mechanism can be very effective at eliminating large amounts of disk I/O in certain situations while still keeping the cost of maintaining them relatively low. It’s important to keep in mind that the physical distribution of data on disk will have a large impact on how effective the Storage Indexes are. An illustration will make this clearer.

Suppose you have a table that has a column with unique values (that is, no value is repeated). If the data is stored on disk in such a manner that the rows are ordered by that column, then there will be one and only one storage region for any given value of that column. Any query with an equality predicate on that column will have to read at most one storage region. Figure 4-2 shows a conceptual picture of a Storage Index for a sorted column.


Figure 4-2. A Storage Index on a sorted column

As you can see from the diagram, if you wanted to retrieve the record where the value was 102, you would only have one storage region that could possibly contain that value.

Suppose now that the same data set is stored on disk in a random order. How many storage regions would you expect to have to read to locate a single row via an equality predicate? It depends on the number of rows that fit into a storage region, but the answer is certainly much larger than the 1 storage region that would be required with the sorted data set.
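The difference is easy to demonstrate with a toy model. This is illustrative Python, not a measurement on Exadata hardware; the region size and data set are invented, and each "region" simply holds a fixed number of consecutive rows:

```python
import random

ROWS_PER_REGION = 100  # invented stand-in for however many rows fit in 1MB

def regions_for_equality(values, target):
    """Count regions whose [min, max] range could contain target."""
    count = 0
    for i in range(0, len(values), ROWS_PER_REGION):
        chunk = values[i:i + ROWS_PER_REGION]
        if min(chunk) <= target <= max(chunk):
            count += 1
    return count

data = list(range(10_000))   # unique values, 100 regions
random.seed(42)
shuffled = data[:]
random.shuffle(shuffled)

# Sorted on disk: exactly one region's range can hold any given value
print(regions_for_equality(data, 5_000))      # 1
# Random order: nearly every region's min/max straddles a mid-range value,
# so almost no regions can be eliminated
print(regions_for_equality(shuffled, 5_000))
```

With randomly ordered data, each region’s minimum tends toward the low end of the overall range and its maximum toward the high end, so a mid-range target falls inside nearly every region’s bounds; sorting collapses that to a single region.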

images Kevin Says: Choosing to sort data at load time is a “pay now or pay later” proposition. If sorting data fits within the opportunity window for loading data, the extra effort will pay dividends in increased Storage Index effectiveness and, quite often, improved Hybrid Columnar Compression ratios.

It’s just that simple: Storage Indexes are more effective on sorted data. From a performance perspective, the better sorted the data is on disk, the faster the average access time will be when using Storage Indexes. For a column that is completely sorted, access should be very fast, with little variation regardless of which values are requested. For unsorted data, access times will be faster toward the ends of the range of values, because few storage regions will have min/max ranges containing values near the extremes. Average access times for values in the middle of the distribution will vary widely. Figure 4-3 compares Storage Index access times for sorted and unsorted data.

images

Figure 4-3. Storage Index access times – sorted vs. unsorted

As you can see, sorted data will provide better and more consistent results. While we’re on the subject, I should point out that there are many cases where several columns can benefit from this behavioral characteristic of Storage Indexes. It’s common in data warehouse environments for data to be partitioned on a date column, and there are often other columns that track the partition key, such as associated dates (order date, ship date, insert date, and return date, for example) or sequentially generated numbers like order numbers. Queries against these columns are often problematic because partition elimination cannot help them. Storage Indexes can provide a benefit similar to partition elimination, as long as care is taken to ensure that the data is pre-sorted prior to loading.

Potential Issues

There’s no such thing as a free puppy. As with everything in life, there are a few issues with Storage Indexes that you should be aware of.

Incorrect Results

By far the biggest issue with Storage Indexes has been that in early releases of the Exadata Storage Software there were a handful of bugs regarding incorrect results. That is to say that in certain situations, usage of Storage Indexes could eliminate storage regions from consideration that actually contained records of interest. This incorrect elimination could occur due to timing issues with concurrent DML while a Smart Scan was being done using Storage Indexes. These bugs have been addressed in 11.2.0.2 and the latest patches on the storage servers. If you run into this issue, disabling Storage Index use via the hidden parameter, _KCFIS_STORAGEIDX_DISABLED, may be the only option available until the proper patches are applied. This parameter can be set with an alter session command, so that only problematic queries are affected. Of course you should check with Oracle Support before enabling any hidden parameters.

Moving Target

Storage Indexes can be a little frustrating because they don’t always kick in when you expect them to. And without any hints to tell Oracle that you really want a Storage Index to be used, there is little you can do other than try to understand why they are not used in certain circumstances so you can avoid those conditions in the future.

In early versions of the storage server software, one of the main reasons that Storage Indexes were disabled was due to implicit data type conversions. Over the years, Oracle has gotten better and better at doing “smart” data type conversions that don’t have negative performance consequences. For example, if you write a SQL statement with a WHERE clause that compares a date field to a character string, Oracle will usually apply a to_date function to the character string instead of modifying the date column (which could have the unpleasant side effect of disabling an index). Unfortunately, when the Exadata storage software was relatively new, all the nuances had not been worked out, at least to the degree we’re used to from the database side. Dates have been particularly persnickety. Here is an example using cellsrv 11.2.1.2.6:

SYS@EXDB1> select count(*) from kso.skew3 where col3 = '20-OCT-05';      

  COUNT(*)
----------
         0

Elapsed: 00:00:14.00
SYS@EXDB1> @si

NAME                                                    VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index               0


Elapsed: 00:00:00.01
SYS@EXDB1> select count(*) from kso.skew3 where col3 = '20-OCT-2005';

  COUNT(*)
----------
         0

Elapsed: 00:00:00.07
SYS@EXDB1> @si

NAME                                                    VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index     15954337792

Elapsed: 00:00:00.01

In this very simple example, two queries compare a date column (col3) to a string containing a date. In one case, the string contained a four-digit year; in the other, only two digits were used. Only the query with the four-digit year format used the Storage Index. Let’s look at the plans for the statements to see why the two queries were treated differently:

SYS@EXDB1> @fsx2
Enter value for sql_text: select count(*) from kso.skew3 where col3 = %
Enter value for sql_id:
Enter value for inst_id:


SQL_ID            AVG_ETIME  PX OFFLOAD IO_SAVED% SQL_TEXT
------------- ------------- --- ------- --------- ----------------------------------------
2s58n6d3mzkmn           .07   0 Yes        100.00 select count(*) from kso.skew3 where
                                                  col3 = '20-OCT-2005'

fuhmg9hqdbd84         14.00   0 Yes         99.99 select count(*) from kso.skew3 where
                                                  col3 = '20-OCT-05'

2 rows selected.

SYS@EXDB1> select * from table(dbms_xplan.display_cursor('&sql_id','&child_no','typical'));
Enter value for sql_id: fuhmg9hqdbd84
Enter value for child_no:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID  fuhmg9hqdbd84, child number 0
-------------------------------------
select count(*) from kso.skew3 where col3 = '20-OCT-05'

Plan hash value: 2684249835


-------------------------------------------------------------------------
| Id|Operation                  |Name  | Rows|Bytes|Cost (%CPU)|Time    |
-------------------------------------------------------------------------
|  0|SELECT STATEMENT           |      |     |     |  535K(100)|        |
|  1| SORT AGGREGATE            |      |    1|   8 |           |        |
|* 2|  TABLE ACCESS STORAGE FULL|SKEW3 |  384|3072 |  535K  (2)|01:47:04|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("COL3"='20-OCT-05')
       filter("COL3"='20-OCT-05')


20 rows selected.

SYS@EXDB1> /
Enter value for sql_id: 2s58n6d3mzkmn
Enter value for child_no:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID  2s58n6d3mzkmn, child number 0
-------------------------------------
select count(*) from kso.skew3 where col3 = '20-OCT-2005'

Plan hash value: 2684249835

------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |       |       |   531K(100)|          |
|   1 |  SORT AGGREGATE            |       |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| SKEW3 |   384 |  3072 |   531K  (1)| 01:46:24 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("COL3"=TO_DATE(' 2005-10-20 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
       filter("COL3"=TO_DATE(' 2005-10-20 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


22 rows selected.

It appears that the optimizer didn’t recognize the two-digit year format as a date. At any rate, the optimizer failed to apply the to_date function to the literal, and so the Storage Index was not used. Fortunately, most of these data type conversion issues have been resolved in later releases. Here’s the same test using cellsrv 11.2.2.2.0:

SYS@SANDBOX> @si

NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                                        0

SYS@SANDBOX> select count(*) from kso.skew3 where col3 = '20-OCT-05';

  COUNT(*)
----------
         0

SYS@SANDBOX> @si

NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                              16024526848

So as you can see, this conversion issue has been resolved. So why bring it up? The point is that the behavior of Storage Indexes has undergone numerous changes as the product has matured. As a result, we have built a set of test cases that we use in our lab to verify behavior after each patch. Our test cases primarily verify comparison operators (=, <, LIKE, IS NULL, and so on) and a few other special cases, such as LOBs, compression, and encryption. Of course, it’s always good practice to test application behavior after any patching, but if you have specific cases where Storage Indexes are critical to your application, you may want to take special care to test those parts of your application.

Partition Size

Storage Indexes depend on Smart Scans, which depend on direct path reads. As we discussed in Chapter 2, Oracle will generally use serial direct path reads for large objects. However, when an object is partitioned, Oracle may fail to recognize that the object is “large,” because the size test is applied to each individual segment. This may result in some partitions not being read via the Smart Scan mechanism, which disables any Storage Index benefit for those partitions. When historical partitions are compressed, the problem becomes even more noticeable, as the reduced size of the compressed partitions makes them even less likely to trigger serial direct path reads. You can work around this issue by not relying on the serial direct path read algorithm: specify a degree of parallelism for the object, or use a hint to force the desired behavior.
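The per-segment nature of the decision is the key point, and it can be illustrated with a toy model. The threshold value and partition names below are hypothetical; the real algorithm and thresholds are Oracle-internal and version-dependent:

```python
# Illustrative model only: the serial direct path read decision is applied
# per segment (per partition), not to the table as a whole. The threshold
# here is a made-up number; Oracle's actual algorithm is internal and
# version-dependent.

SMALL_SEGMENT_THRESHOLD = 2_000  # hypothetical cutoff, in blocks

def uses_direct_path(segment_blocks, threshold=SMALL_SEGMENT_THRESHOLD):
    """A segment qualifies for serial direct path reads only if it is
    'large' by itself -- the sizes of sibling partitions don't count."""
    return segment_blocks > threshold

# Hypothetical partitioned table: a large current partition and a small
# compressed historical partition.
partitions = {"p_current": 50_000, "p_2005_compressed": 1_200}
for name, blocks in partitions.items():
    path = "direct path (Smart Scan possible)" if uses_direct_path(blocks) \
        else "buffer cache (no Smart Scan, no Storage Index)"
    print(name, "->", path)
```

The compressed historical partition falls below the cutoff and is read through the buffer cache, so scans of that partition get no Storage Index benefit even though the table as a whole is huge.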

Incompatible Coding Techniques

Finally, there are some coding techniques that can disable Storage Indexes. Here’s an example showing the effect of the trunc function on date columns:

SYS@EXDB1> select count(*) from kso.skew3 where trunc(col3) = '20-OCT-2005';

  COUNT(*)
----------
         4


1 row selected.

Elapsed: 00:00:57.51
SYS@EXDB1> @fsx2
Enter value for sql_text: select count(*) from kso.skew3 where trunc(col3)%
Enter value for sql_id:

SQL_ID            AVG_ETIME  PX OFFLOAD IO_SAVED% SQL_TEXT
------------- ------------- --- ------- --------- ----------------------------------------
3c1w96cayhut9         56.94   0 Yes         99.99 select count(*) from kso.skew3 where
                                                  trunc(col3) = '20-OCT-2005'

3c1w96cayhut9         57.50   0 Yes         99.99 select count(*) from kso.skew3 where
                                                  trunc(col3) = '20-OCT-2005'


2 rows selected.

Elapsed: 00:00:00.01
SYS@EXDB1> select * from table(dbms_xplan.display_cursor('&sql_id','&child_no','typical'));
Enter value for sql_id: 3c1w96cayhut9                                   
Enter value for child_no:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID  3c1w96cayhut9, child number 1
-------------------------------------
select count(*) from kso.skew3 where trunc(col3) = '20-OCT-2005'

Plan hash value: 2684249835

------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |       |       |   541K(100)|          |
|   1 |  SORT AGGREGATE            |       |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| SKEW3 |     4 |    32 |   541K  (3)| 01:48:24 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage(TRUNC(INTERNAL_FUNCTION("COL3"))=TO_DATE(' 2005-10-20
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter(TRUNC(INTERNAL_FUNCTION("COL3"))=TO_DATE(' 2005-10-20
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
cardinality feedback used for this statement

In this example, a function was applied to a date column, which, as you might expect, disables the Storage Index. That applying a function to a column disables the Storage Index is not too surprising, but the trunc function is a commonly seen coding technique. Many dates have a time component, and many queries want data for a specific day. It is well known that truncating a date in this manner will disable normal B-Tree index usage. In the past, that generally didn’t matter; queries in many data warehouse environments were designed to do full scans anyway, so there was no real need to worry about disabling an index. Storage Indexes change the game from this perspective and may force us to rethink some of our approaches. We’ll discuss this issue in more detail in Chapter 16.
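Why the function defeats the Storage Index is clear in the min/max model: the index stores the minimum and maximum of the raw column, so it has nothing to compare a predicate on trunc(col) against. Rewriting the predicate as a range on the raw column (col >= day AND col < day + 1) asks a question the stored min/max can answer. The sketch below is a simplified model of that idea, with made-up region boundaries:

```python
# Simplified model: region min/max are kept for the RAW column, so a
# predicate on a transformed value (trunc(col) = day) cannot be checked
# against them, and no region can be eliminated. A range predicate on the
# raw column can be. Region dates below are hypothetical.
from datetime import datetime, timedelta

def can_prune(region_min, region_max, lo, hi):
    """Region is skippable if its [min, max] misses the half-open [lo, hi)."""
    return region_max < lo or region_min >= hi

day = datetime(2005, 10, 20)

# A region whose raw-column values all fall in Q1 2005:
q1_region = (datetime(2005, 1, 1), datetime(2005, 3, 31))
# A region spanning Q4 2005, which does contain the target day:
q4_region = (datetime(2005, 10, 1), datetime(2005, 12, 31))

# Range rewrite of trunc(col3) = '20-OCT-2005':
lo, hi = day, day + timedelta(days=1)
print(can_prune(*q1_region, lo, hi))  # True  -- Q1 region skipped
print(can_prune(*q4_region, lo, hi))  # False -- Q4 region must be read
```

With the original trunc(col3) predicate, this model has no stored min/max of trunc(col3) to test, so every region would have to be read, which matches the 57-second full scan shown above.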

Summary

Storage Indexes are an optimization technique that is available when the database is able to utilize Smart Table Scans. They can provide dramatic performance improvements. Storage Indexes can be thought of as an alternate partitioning strategy, but without the normal restrictions associated with partitioning. They are especially effective with queries that access data via an alternate key that tracks the primary partition key.

How the data is physically stored is an important consideration and has a dramatic impact on the effectiveness of Storage Indexes. Care should be taken when migrating data to the Exadata platform to ensure that the data is clustered on disk in a manner that will allow Storage Indexes to be used effectively.
