How to Use the SYSSTAT Tables to Perform What-if Modeling and Analysis

Because DB2 maintains the SYSIBM and SYSCAT system catalog tables and views, they are read-only. However, you can update the SYSSTAT views. This is a special set of views that you can use to update database statistics. The information contained in these views affects how the DB2 optimizer chooses access plans when executing a query.

For planning purposes, you can change the statistical information in the SYSSTAT tables so that they do not reflect the actual state of tables and indexes. This lets you:

  • Model query performance on a development system using production system statistics.

  • Perform what-if query performance analysis by examining various possible changes to the query access plan.

You must have explicit DBADM authority for the database to modify statistics for tables and indexes and their components. That is, your user ID must have DBADM authority in the SYSCAT.DBAUTH table.

Table D.2 provides information about the system catalog tables that contain catalog statistics and the RUNSTATS options that collect specific statistics.

Table D.2. Table Statistics (SYSCAT.TABLES and SYSSTAT.TABLES)
  RUNSTATS Option
StatisticDescriptionTableIndexes
FPAGESNumber of pages being used by a table.YesYes
NPAGESNumber of pages containing rows.YesYes
OVERFLOWNumber of rows that overflow.YesNo
CARDNumber of rows in a table (cardinality).YesYes[1]
ACTIVE_BLOCKSFor MDC tables, the total number of occupied blocks.YesNo

[1] If the table does not have any indexes defined and you request statistics for indexes, no new CARD statistics are updated. The previous CARD statistics are retained.

The Yes or No in the RUNSTATS Option column indicates whether you need to execute the RUNSTATS command on the table, the indexes, or both to collect the statistics specified in the Statistic column. For example, if you want to collect the statistics on FPAGES, you must execute the RUNSTATS command on both the table and indexes. On the other hand, if you want to collect the statistics for OVERFLOW, you need to execute the RUNSTATS command on the table only.

Let's walk you through a what-if scenario: If the EMPLOYEE table had a lot more rows that it has now, which access plan would the DB2 optimizer choose?

First, you collect the statistics on the EMPLOYEE table using the RUNSTATS command:

					RUNSTATS ON TABLE SYLVIAQ.EMPLOYEE
				

After it is completed, obtain the statistics by querying the SYSSTAT.TABLES view:

					SELECT * FROM SYSSTAT.TABLES
					WHERE TABSCHEMA = 'SYLVIAQ' AND TABNAME = 'EMPLOYEE'
				

Figure D.8 shows the output.

Figure D.8. Statistics for the EMPLOYEE table


In Figure D.8, the CARD column indicates that the EMPLOYEE table currently has 32 rows. To update the statistics for the EMPLOYEE table to reflect a bigger table, issue:

					UPDATE SYSSTAT.TABLES
					SET CARD   = 10000,
					NPAGES = 1000,
					FPAGES = 1000,
					OVERFLOW = 2
					WHERE TABSCHEMA = 'SYLVIAQ' AND TABNAME = 'EMPLOYEE'
				

After this is completed, you can run your query against the EMPLOYEE table and get the access plan in text-based format using the db2exfmt command, or in graphic format using the Visual Explain GUI tool (see Chapter 4, Using the DB2 Tools).

You must be careful when manually updating catalog statistics: arbitrary changes can seriously affect the performance of subsequent queries. You can use any of the following methods to revert your changes back.

  • ROLLBACK the unit of work in which the changes have been made (assuming the unit of work has not been committed).

  • Use the RUNSTATS utility to recalculate and refresh the catalog statistics.

  • Update the catalog statistics to indicate that statistics have not been gathered. (For example, setting column NPAGES to –1 indicates that the number-of-pages statistic has not been collected.)

  • Replace the catalog statistics with the data they contained before you made any changes. This method is possible only if you used the db2look command to capture the statistics before you made any changes.

In some cases, the optimizer may determine that some particular statistical value or combination of values is not valid, and it will use the default values and issue a warning. Such circumstances are rare, however, since most of the validation is done when updating the statistics.

Tables D.3 through D.8 briefly describe the rest of the updatable SYSSTAT views.

Table D.3. Column Statistics (SYSCAT.COLUMNS and SYSSTAT.COLUMNS)
  RUNSTATS Option
StatisticDescriptionTableIndexes
COLCARDColumn cardinality.YesYes[1]
AVGCOLLENAverage length of a column.YesYes[1]
HIGH2KEYSecond highest value in a column.YesYes[1]
LOW2KEYSecond lowest value in a column.YesYes[1]
NUMNULLSNumber of NULLs in a column.YesYes[1]
SUB_COUNTAverage number of subelements.YesNo[2]
SUB_DELIM_LENGTHAverage length of each delimiter separating each subelement.YesNo[2]

[1] Column statistics are gathered for the first column in the index key.

[2] These statistics provide information about data in columns that contain a series of subfields or subelements that are delimited by blanks. The SUB_COUNT and SUB_DELIM_LENGTH statistics are collected only for single-byte character set string columns of type CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC.

Table D.4. Multicolumn Statistics (SYSCAT.COLGROUPS and SYSSTAT.COLGROUPS)
  RUNSTATS Option
StatisticDescriptionTableIndexes
COLGROUPCARDCardinality of the column group.YesNo

The multicolumn distribution statistics listed in Tables D.5 and D.6 are not collected by RUNSTATS. You can update them manually, however.

Table D.5. Multicolumn Distribution Statistics (SYSCAT.COLGROUPDIST and SYSSTAT.COLGROUPDIST)
  RUNSTATS Option
StatisticDescriptionTableIndexes
TYPEF = Frequency value. Q = Quantile value.YesNo
ORDINALOrdinal number of the column in the group.YesNo
SEQNOSequence number n that represents the nth TYPE value.YesNo
COLVALUEThe data value as a character literal or a null value.YesNo

Table D.6. Multicolumn Distribution Statistics 2 (SYSCAT.COLGROUPDISTCOUNTS and SYSSTAT.COLGROUPDISTCOUNTS)
  RUNSTATS Option
StatisticDescriptionTableIndexes
TYPEF = Frequency value. Q = Quantile value.YesNo
SEQNOSequence number n that represents the nth TYPE value.YesNo
VALCOUNTIf TYPE = F, VALCOUNT is the number of occurrences of COLVALUEs for the column group identified by this SEQNO. If TYPE = Q, VALCOUNT is the number of rows whose value is less than or equal to COLVALUEs for the column group with this SEQNO.YesNo
DISTCOUNTIf TYPE = Q, this column contains the number of distinct values that are less than or equal to COLVALUEs for the column group with this SEQNO. Null if unavailable.YesNo

Table D.7. Index Statistics (SYSCAT.INDEXES and SYSSTAT.INDEXES)
  RUNSTATS Option
StatisticDescriptionTableIndexes
NLEAFNumber of index leaf pages.NoYes
NLEVELSNumber of index levels.NoYes
CLUSTERRATIODegree of clustering of table data.NoYes[2]
CLUSTERFACTORFiner degree of clustering.NoSee [1], [2]
DENSITYRatio (percentage) of SEQUENTIAL_ PAGES to the number of pages in the range of pages occupied by the index.[3]NoYes
FIRSTKEYCARDNumber of distinct values in the first column of the index.NoYes
FIRST2KEYCARDNumber of distinct values in the first two columns of the index.NoYes
FIRST3KEYCARDNumber of distinct values in the first three columns of the index.NoYe
FIRST4KEYCARDNumber of distinct values in the first four columns of the index.NoYes
FULLKEYCARDNumber of distinct values in all columns of the index, excluding any key value in a type-2 index for which all record identifiers (RIDs) are marked deleted.NoYes
PAGE_FETCH_PAIRSPage fetch estimates for different buffer sizes.NoSee [1], [2]
SEQUENTIAL_PAGESNumber of leaf pages located on disk in index key order, with few or no large gaps between them.NoYes
AVERAGE_SEQUENCE_PAGESAverage number of index pages accessible in sequence. This is the number of index pages that the prefetchers can detect as being in sequence.NoYes
AVERAGE_RANDOM_PAGESAverage number of random index pages between sequential page accesses.NoYes
AVERAGE_SEQUENCE_GAPGap between sequences.NoYes
AVERAGE_SEQUENCE_FETCH_PAGESAverage number of table pages accessible in sequence. This is the number of table pages that the prefetchers can detect as being in sequence when they fetch table rows using the index.NoYes[4]
AVERAGE_RANDOM_FETCH_PAGESAverage number of random table pages between sequential page accesses when fetching table rows using the index.NoYes[4]
AVERAGE_SEQUENCE_FETCH_GAPGap between sequences when fetching table rows using the index.NoYes[4]
NUMRIDSNumber of record identifiers (RIDs) in the index, including deleted RIDs in type-2 indexes.NoYes
NUMRIDS_DELETEDTotal number of RIDs marked deleted in the index, except RIDs on leaf pages on which all record identifiers are marked deleted.NoYes
NUM_EMPTY_LEAFSTotal number of leaf pages on which all record identifiers are marked deleted.NoYes

[2] CLUSTERFACTOR and PAGE_FETCH_PAIRS are not collected with the DETAILED clause unless the table is of a respectable size. If the table is greater than about 25 pages, then CLUSTERFACTOR and PAGE_FETCH_ PAIRS statistics are collected. In this case, CLUSTERRATIO is –1 (not collected). If the table is a relatively small table, only CLUSTERRATIO is filled in by RUNSTATS while CLUSTERFACTOR and PAGE_FETCH_PAIRS are not. If the DETAILED clause is not specified, only the CLUSTERRATIO statistic is collected.

[1] Detailed index statistics are gathered by specifying the DETAILED clause on the RUNSTATS command.

[3] This statistic measures the percentage of pages in the file containing the index that belongs to that table. For a table having only one defined index, DENSITY should normally be 100. DENSITY is used by the optimizer to estimate how many irrelevant pages from other indexes might be read, on average, if the index pages were prefetched.

[4] These statistics cannot be computed when this table is in a DMS table space.

Column distribution statistics (listed in Table D.8) are gathered by specifying the WITH DISTRIBUTION clause on the RUNSTATS command. Note that distribution statistics cannot be gathered unless there is a sufficient lack of uniformity in the column values.

Table D.8. Column Distribution Statistics (SYSCAT.COLDIST and SYSSTAT.COLDIST)
  RUNSTATS Option
StatisticDescriptionTableIndexes
DISTCOUNTIf TYPE = Q, the number of distinct values that are less than or equal to COLVALUE statistics.Distribution[1]No
TYPEIndicator of whether row provides frequent-value or quantile statistics.DistributionNo
SEQNOFrequency ranking of a sequence number to help uniquely identify the row in the table.DistributionNo
COLVALUEData value for which frequency or quantile statistics is collected.DistributionNo
VALCOUNTFrequency with which the data value occurs in columns. For quantiles, the number of values is less than or equal to the data value (COLVALUE).DistributionNo

[1] DISTCOUNT is collected only for columns that are the first key column in an index.

Table D.9. Function Statistics (SYSCAT.FUNCTIONS and SYSSTAT.FUNCTIONS)
StatisticDescription
IOS_PER_INVOCEstimated number of read/write requests executed each time a function is executed.
INSTS_PER_INVOCEstimated number of machine instructions executed each time a function is executed.
IOS_PER_ARGBYTEEstimated number of read/write requests executed per input argument byte.
INSTS_PER_ARGBYTESEstimated number of machine instructions executed per input argument byte.
PERCENT_ARGBYTESEstimated average percent of input argument bytes that the function will actually process.
INITIAL_IOSEstimated number of read/write requests executed only the first/last time the function is invoked.
INITIAL_INSTSEstimated number of machine instructions executed only the first/last time the function is invoked.
CARDINALITYEstimated number of rows generated by a table function.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.88.7