ANALYZE TABLE . . . STATISTICS FOR COLUMNS

Sometimes the distribution of data in a column is not uniform. For example, a column may have three distinct values, A, B, and C, with relative distributions of 1%, 49%, and 50%, respectively. In this case, using an index to find all rows with a value of A would make sense, but it would not work for the values B and C. The results of the ANALYZE INDEX command would indicate that there are three distinct values, but would give no indication of the relative distribution of the data.

This is where histograms come into play. By grouping the values into buckets, Oracle can determine the distribution of data. With a histogram in place, Oracle would know that if it were looking for rows with a value of A (in the example above), an index would work just fine, but if it were looking for rows with either a B or C value, a full table scan would be more efficient.

Oracle uses a height-balanced histogram. That is, once it determines the number of buckets to use, it divides the total number of rows by the number of buckets. Each bucket then represents that many rows in sorted order. Oracle keeps track of the distribution of keys by storing the highest key value for each bucket. This is different from a width-balanced histogram, where the same number of distinct keys are stored in each bucket with a separate indication of the number of rows represented by each bucket.

The operation of the ANALYZE TABLE command changed slightly in Oracle7 Release 7.2. Since that time, if you do not include a FOR TABLE or a FOR COLUMNS clause, Oracle computes two bucket histograms for all columns in the table. This takes significantly more time to complete, and provides the optimizer with extraneous, often confusing information.

Warning

The EXP (Export) utility saves information about whether a table has been analyzed so the IMP (Import) can reanalyze the table after all rows are inserted. However, EXP has not been updated to reflect this new syntax, so it does not include the FOR TABLE clause. Thus, when you import a table that has been analyzed, IMP forces the creation of two bucket histograms for every column. For this reason, we recommend that you include the STATISTICS=NONE clause in the EXP parameter file, and that you separately generate statements to reanalyze the tables. See Chapter 16, for a discussion of syntax.

Although we recommend that you analyze all tables and indexes, you should choose carefully the columns that you analyze. You should only analyze columns that are indexed, that are used in a WHERE clause, and that have a skewed distribution. You do not need to analyze unique columns or columns with a relatively high number of distinct keys, as those columns are analyzed when an ANALYZE INDEX operation is performed. You want to analyze columns where the distribution of data is such that Oracle uses the index only for specific key values.

An example of using the ANALYZE command to generate histograms is shown here:

ANALYZE TABLE scott.emp COMPUTE STATISTICS FOR COLUMNS empno, deptno;

When you analyze a table to generate column information, Oracle7 populates columns in the following views:

DBA_TAB_COLUMNS
USER_TAB_COLUMNS
ALL_TAB_COLUMNS

In Oracle8, the columns are updated in the views:

DBA_TAB_COL_STATISTICS
DBA_PART_COL_STATISTICS

These are the view columns that Oracle populates:

NUM_DISTINCT

The number of distinct key values.

LOW_VALUE

The lowest keyed value for the column.

HIGH_VALUE

The highest keyed value for the column.

DENSITY

A representation of the distribution of key values within the column.

NUM_NULLS

The number of NULL values.

NUM_BUCKETS

The number of histogram buckets used.

LAST_ANALYZED

The date the column was last analyzed.

SAMPLE_SIZE

The sample size used to calculate the histogram. A value of indicates that the values were computed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.148.107.255