Chapter 6 – CTAS and CBT

“A different world can not be achieved through indifferent people.”

Chinese Proverb

 

CTAS (Create Table AS)

image

Create Table AS (CTAS) is a way to create new tables from existing tables without the reloading data. The table create is based solely on query results. This is a brilliant way to place tables distribution keys in the order you want to performance tune queries.

 

Using the CTAS (Create Table AS) Table For Co-Location

image

Because both tables are joined on the key Customer_Number, and both tables are using Customer_Number as their Distribution Key, the matching rows are physically stored on the same SPU. This has to happen for two rows to be joined. They must be on the same SPU and physically touch in that SPUs memory. It is called Co-Location, and we “on purpose” made our CTAS tables both distribute to achieve Co-Location.

 

CTAS Facts

image

 

Altering a CTAS Table to Rename It

“The only true wisdom is in knowing you know nothing.”

– Socrates

image

 

FPGA Card and Zone Maps – The Netezza Secret Weapon

image

The first 200 columns of every table are automatically zone mapped. A Zone Map lists the min and max value for each column. The Field Programmable Gate Array (FPGA card) analyzes the query and before ordering the block into memory, checks the Zone Map to see if the block cannot possibly have that data. It can then skip reading that block. Where most databases use indexes to determine where data is, Netezza uses Zone Maps to determine where data is NOT! Our query above is looking for data WHERE Order_Total < 300. The Zone Map shows this block will contain rows, and therefore it will be moved into memory for processing. Each SPU has their own separate Zone Map and FPGA card. I have colored the min and max values to illustrate how it all works. No matter what column is in the WHERE clause, Netezza can exclude blocks.

 

How A CTAS with ORDER BY Improves Queries

image

he zone maps will eliminate the need to read the majority of blocks (3 MB extents). Since the data is sorted on the column Order_Date, the zone maps min and max values have tight date ranges. Our query only wants January data, so the majority of data blocks won’t need to be read. The FPGA card reads the zone map and only reads blocks with January data.

 

A CTAS Major Sort Benefits over the Minor Sort

image

The zone maps will eliminate the need to read the majority of blocks (3 MB extents). Since the data is sorted on the column Dept_No first (major sort), the zone maps min and max values have tight ranges. Any query using Dept_No in the WHERE clause will allow the majority of blocks to be eliminated by the FPGA card because the zone map will show the min and max values for the Dept_No and the range of values should be small. Because Dept_No is our major sort and Last_Name is our minor sort (and this is a sort with precedence), then Last_Name’s zone maps won’t benefit nearly as much as Dept_No’s.

 

A CBT (Cluster Based Table) Orders Data Without Precedence

image

A Cluster Based Table, referred to as a CBT will take up to four columns (usually three) and allow them to essentially be the sort keys, but each sorted column gets equal benefit in the zone map. This is called sorting without precedence. Netezza uses the Hilbert Space-Filling Curve to hand this. An example awaits on the following page.

 

A CBT (Cluster Based Table) in Theory

image

Above, you can see that in our example on the left when our query searches for customer_number 007 a normal table (Ordered by Order_Date) has to read every extent in it’s Order_Table. This is because the zone map has min and max values, and 007 could not be eliminated from any zone map extent. Our CBT example on the right places like data in our CBT sorts and gives each sort equal precedence. Now, when we look for customer_number 007 only three extents need to be read.

 

Creating a Cluster Based Table (CBT

image

It is the ORGANIZE ON clause that builds the CTAS table as a CBT. You can specify up to four columns you want to ORGANIZE ON, but the max is often three. This might or might not help in the queries, but worth proofing.

 

Creating a Temp Table as a Cluster Based Table (CBT)

image

It is the ORGANIZE ON clause that builds the CTAS table as a CBT. You can specify up to four columns you want to ORGANIZE ON, but the max is often three. This might or might not help in the queries, but worth proofing.

 

Comparing Extents That Are Sorted Vs. A CBT

image

Above is an example of a table with 8 extents (24 MB) held by a single SPU. On the left, the table has been sorted by Order_Date. On the right, the table is a CBT table that is organized on Customer_Number and Order_Date. Imagine if we were to query for all records involving Customer_Number 1. The table on the left would require the SPU to look in every extent. The table on the right would only need to look through three extents to find Customer_Number 1 orders. Remember, the zone map for each extent has the min and max values for both Customer_Number and Order_Date. The system knows where data does NOT reside and the blocks are not read into memory. That is Netezza’s secret sauce!

 

Benefits Of A Cluster Based Table (CBT)

image

It is important that you implement and test when and where a CBT helps performance of a table. Above are some concepts

 

Altering a Cluster Based Table (CBT) back to a Normal Table

image

It is the ALTER Table and ORGANIZE ON NONE clause that alters the CTAS table from a CBT back to a normal table

 

GROOM Command is Used to Physically Change the Table

image

The GROOM TABLE command performs many functions, but relating to CBTs it physically re-organizes or clusters the CBT records. Execute the GROOM TABLE command to implement the clustering that you specified when you created or altered the table. Also, when new records have been inserted or Updated, you periodically execute the GROOM TABLE command.

 

After Creating a CBT, You Must GROOM The Table

image

The GROOM TABLE command processes and reorganizes the table records in each data slice in a series of “steps”. It is designed this way so users can still perform queries such as SELECT, UPDATE, DELETE, and INSERT operations while the online data grooming is in progress.

 

What The GROOM Does for a Table

image

Before you GROOM a table, you should make sure that backups of the table have been taken. Once the GROOM happens, then the deleted data is gone. If you want that deleted data gone, then just run GROOM.

 

Groom Command Syntax

There are only two different ways to run the GROOM command:

image

You can only run the GROOM command if you are the owner of the table or you have been given GROOM privilege. GROOM runs as a background process, so full table access is available while the records are being groomed.

 

Checking Groom Progress

image

Because GROOM TABLE operations are done as background processes, you may want or need to check the progress of currently running commands. There are two ways that you can do this and they are listed above.

 

How to Know if your CBT Table Needs to be GROOMED?

image

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.22.23