Chapter 9 – Collecting Statistics

“Fall seven times – Stand up eight.”

Japanese Proverb

 

The Basics on Collecting Statistics

image

The GENERATE STATISTICS SQL command should be run on a regular basis for tables. It collects all levels of statistics, including dispersion values (if the table is not larger than 500 million rows). Dispersion is the inverse of cardinality (the number of distinct values in a column).

 

Best Practices for Generating Statistics

image

Above are some best practices for generating statistics.

 

Syntax to Collect Statistics

image

When Collecting Statistics on certain columns in a table, you should choose those columns used in WHERE, ORDER BY, GROUP BY, and JOIN clauses. There is absolutely no reason to collect on columns used only in the SELECT clause. You need to be the owner of the database or table or have the GenStats permission to execute GENERATE STATISTICS. As you can see from our examples, the GENERATE STATISTICS may be run for every table in the database, specific tables, or specific columns within a table. You collect statistics for only certain columns to reduce the time needed to run GENERATE STATISTICS.

 

Syntax to Collect Express Statistics

image

Although the “express” operation is much faster, it is not as accurate. The major difference is that, when doing an express generation, the GENERATE does an approximation to determine the number of distinct values for each column without reading each and every row and column value. What does this mean regarding accuracy? If the table is evenly distributed across the SPUs, it does not mean much in the way of efficient execution plans out of the optimizer. However, if the data is skewed, it can mean that execution plans are not as efficient as they might otherwise be given exact demographics of the data.

 

The Basics on Collecting Statistics

image

Following a CTAS operation (CREATE TABLE AS…) the Netezza software will GENERATE STATISTICS to collect additional statistics on the created table. However, for small tables, the GENERATE STATISTICS process can often take more time to run than the CTAS operation itself. Therefore, if the CTAS insert operation yields a row count that is less than 10,000 (ctas_auto_stats_min_rows default setting), then GENERATE STATISTICS is not run.

If you use the GROOM command to reclaim space from (logically) deleted records, the system leaves the min/max, null, and dispersion values unchanged, and updates the zone maps. Truncating a table will reset all statistics except dispersion values.

 

Collecting Full Statistics

image

Updating statistics is the most important ongoing database administration task for optimal query performance. Netezza is designed to perform with minimal intervention and tuning. Just keep the statistics up to date and Netezza will handle the rest.

 

Just-In-Time (JIT) Statistics

image

JIT statistics improve selectivity estimations when a table contains data skew. It can also improve the optimizer’s plan when there are complex column restrictions. JIT statistics are also used to avoid broadcast of large tables that were estimated to be small based on the available statistics. The overhead is worth the time because broadcasting a large table can have a huge consequence. JIT statistics doesn’t use full generation of statistics, but instead uses a sampler scan function and then leverages Zone Map information to collect the number of maximum extents on the data slice with the greatest data skew, total number of extents, estimated number of rows, and Dispersion values for columns used in joins or group by processing.

 

How Netezza Collects Statistics on Small Tables

image

 

How Netezza Collects Statistics on Medium Tables

image

 

How Netezza Collects Statistics on Large Tables

image

 

Generating Statistics using NzAdmin

image

NzAdmin can be used to generate statistics. Just select the Database tab in the NzAdmin main window. Click the Database tab on the NzAdmin tool’s main window. In the left pane, click Databases > select database > Tables. In the right pane, right-click a table, then click Generate Statistics. In the GENERATE STATISTICS dialog box (shown above), select the columns on which you want to generate statistics. If you do not select any columns, the system processes all columns.

 

You Cannot Generate Statistics Within a Begin-End Block

image

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.45.5