Chapter 3 – How Netezza Distributes the Data

“The man who has no imagination has no wings.”

Muhammad Ali

 

Netezza Tables - Distribution Key or Random Distribution

image

The next couple of pages will show you how Netezza spreads the data among the SPUs.

 

Table CREATE Examples with 4 different Distribution Keys

image

A table can have only one Distribution Key, so picking the right one is essential. Above are four different Examples for your consideration.

 

Netezza Uses a Hash Formula

image

There is one Hashing Formula in Netezza, and it is consistent. The concept is to take the value of a row’s Distribution Key and run it through the Hash Formula. It will produce a Row Hash number. That Row Hash will stay with the row and reside as the RowID. The Row Hash also determines which SPU owns the row.

 

The Hash Map Determines which SPU will own the Row

image

A row will be placed on a SPU after the loading node hashes the row’s Distribution key value. The output of the Hashing Algorithm is a numeric number. The numeric number goes to a bucket in the Hash Map and is assigned to a SPU.

 

The Hash Formula, Hash Map and SPU

image

The above example hashed Emp_No 1001 (Distribution Key value), and the output was a numeric number of 13. Netezza counted over to bucket 13 in the Hash Map, and it has the number five (5) inside that bucket. This means that this row will go to SPU 5.

 

Placing rows on the SPU

image

The above example hashed Emp_No 1002 (Distribution Key value), and the output was a numeric number of 9. Netezza counted over to bucket 9 in the Hash Map, and it has the number one (1) inside that bucket. This means that this row will go to SPU 1.

 

Placing rows on the SPU Continued

image

The above example hashed Emp_No 1003 (Primary Index value), and the output was a numeric number of 8. Netezza counted over to bucket 8 in the Hash Map, and it has the number eight (8) inside that bucket. This means that this row will go to SPU 8.

 

A Review of the Hashing Process

image

Take a look at the row hash for each row, and notice it corresponds with the Hash Map. Data that is unique will provide perfectly even distribution.

 

Like Data Hashes to the Same SPU

image

Netezza Hashes the Distribution Key with a single formula, and then distributes the rows among the SPUs. All like values go to the same SPU. Be careful what Distribution Key you choose.

 

Distribution Keys

image

A Unique Distribution Key spreads the rows of a table evenly across the SPUs.

 

Distribution Key in WHERE Clause – 1 SPU Retrieve

image

Use the Distribution Key column in your SQL WHERE clause for a single SPU retrieve. Netezza is fastest when the distribution key is used in the WHERE clause.

 

A Non-Unique Distribution Key

image

A Non-Unique Distribution Key will have duplicates grouped together on the same SPU, so data will always be skewed (uneven). The above skew is reasonable.

 

Distribution Key in the WHERE Clause – 1 SPU Retrieve

image

Use the Distribution Key in your SQL WHERE clause, and only 1 SPU retrieves.

 

A conceptual example of a Multi-Column Distribution Key

image

A table can have only one Distribution Key, but you can combine multiple columns together to form one Multi-Column Distribution Key.

 

Distribution Key in the WHERE Clause – 1 SPU Retrieve

image

Use all columns of the Distribution Key in the WHERE clause if you want to get a Single SPU retrieve.

 

A conceptual example of a Table with Random Distribution

image

A Table that specifically states Distribute on Random will receive no Distribution Key. It will distribute the data evenly but randomly.

 

A Full Table Scan – or a Sequential Scan

image

Since this table had Random Distribution, the data spread evenly, but Netezza did a Full Table Scan (Sequential Scan) on the query. So, all-SPUs read all of the rows they own.

 

What happens when you forget the Distribution Key?

image

When you forget to define the Distribution Key, Netezza will default to the first column in the table. Clearly define what you want!

 

Fully Qualifying an Object

image

Object names within a database should be unique. To access a table in the current database, you can use just the object name. But for cross database access, names need to be fully qualified to the object level.

 

Checking for Skew

image

You can use the query above to check for data distribution. The more even the distribution, the better the performance on queries that involve a Full Table Scan (FTS).

 

Educate the Business on the Business by Sharing the Model

“If I have seen farther than others, it is because I was standing on the shoulders of giants.”

- Isaac Newton

image

Great companies model the business, but fail to allow the masses to reap its rewards. Allow your business users to stand on the shoulders of giants and understand the business. Start first by allowing users to visually see what tables join to what tables. The Nexus Query Chameleon shows you tables visually. Then, it shows you what joins to what. You just click on the columns you want on your report, and Nexus writes the SQL perfectly every time.

 

Load Your Models and have the SQL Built Automatically

“Following the light of the sun, we left the Old World.”

- Christopher Columbus

image

Don’t let Christopher Columbus follow the sun and navigate the SQL West because your profits will soon be going South. Some users like to build their SQL by hand, but this makes as much sense as assembly line bakers wanting to churn their own butter. Load the logical models for every system in your enterprise, show the tables and views visually, allow users to click on the columns they want, and then let the tool build the SQL perfectly every time. You can save magnitudes of order in development costs.

 

Netezza is Massively Parallel

“To escape criticism – do nothing, say nothing, be nothing.”

– Elbert Hubbard

image

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.14.131.212