Chapter 2 – A Chip Off The Old Block

“The difference between genius and stupidity is that genius has its limits.”

Albert Einstein

 

Netezza Stores Data in Tables

image

Netezza stores data in tables, much like an Excel spreadsheet: each table has rows, and each row has many columns. The difference is in how Netezza stores and processes the data. Unlike a spreadsheet, the rows are not stored together. Each row is physically separated from its neighbors and sent to a particular SPU. The SPUs store the rows they have been assigned and retrieve them when the host orders them to do so. The host then collects the data from the SPUs, puts the report together, and returns the result to the user.

 

Each SPU is Assigned Specific Rows

“Great ability develops and reveals itself increasingly with every new assignment.”

- Baltasar Gracian

image

“Great scalability develops as different SPUs are assigned different rows.”

- Guru of the obvious

Netezza takes the rows of every table and spreads them among the SPUs. The table above had 12 rows, so with three SPUs each SPU was assigned four rows. Every SPU in a Netezza system receives a portion of the rows.
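
The spreading of rows can be pictured with a small sketch. The chapter does not say how Netezza picks the SPU for each row, so the round-robin rule below is only an assumption chosen for illustration, and `distribute_rows` is a hypothetical helper, not a Netezza function. The three SPUs follow from the example above (12 rows, four per SPU).

```python
from collections import defaultdict

def distribute_rows(rows, num_spus):
    """Assign each row to one SPU in round-robin order (illustrative assumption)."""
    assignment = defaultdict(list)
    for position, row in enumerate(rows):
        spu_id = position % num_spus      # row 0 -> SPU 0, row 1 -> SPU 1, ...
        assignment[spu_id].append(row)
    return dict(assignment)

rows = [f"row_{n}" for n in range(1, 13)]  # the 12 rows of the example table
spus = distribute_rows(rows, num_spus=3)

for spu_id, owned in spus.items():
    print(f"SPU {spu_id} owns {len(owned)} rows: {owned}")
```

Whatever rule is used, the point is the same: no SPU holds the whole table, and each one holds roughly an equal share.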

 

Each SPU Organizes the Rows inside a Data Block (Extent)

image

When you go to the airport, you pack your things inside your luggage. Each SPU has a separate luggage bag for each table. This keeps things organized. In general terms this luggage is a data block; Netezza calls it an extent. Each piece of luggage (extent) is 3 MB.

 

SPUs Must Transfer Their Data Blocks to Memory

image

Even if a SPU needs to read only one row, or just one column, it must move the entire block (extent) from disk into memory.

When you go to the airport and pack your things inside your luggage, you can’t get at those things once the bag is checked in. If you forgot your medicine because it was packed inside your luggage, you would need to retrieve the entire bag to get it. In the same way, Netezza retrieves the whole piece of luggage to read just one row.

 

As Tables Get Bigger the SPU Uses Multiple Extents

image

When you go on vacation for two weeks, you might pack a lot of clothes, so you take two suitcases. Netezza works the same way: when a table holds more data than fits in one extent, the SPU packs more suitcases. Every extent still has to be moved into memory to be read. As you can see above, each SPU now has two extents.
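
The suitcase arithmetic is easy to sketch. The 3 MB extent size comes from the text; treating a megabyte as 1,048,576 bytes and the helper name `extents_needed` are assumptions made for this example only.

```python
import math

EXTENT_BYTES = 3 * 1024 * 1024  # 3 MB per extent (size from the text, MB = 2**20 bytes assumed)

def extents_needed(table_bytes_on_spu):
    """Number of whole extents a SPU needs for its share of one table."""
    return math.ceil(table_bytes_on_spu / EXTENT_BYTES)

two_mb = 2 * 1024 * 1024
five_mb = 5 * 1024 * 1024

print(extents_needed(two_mb))   # fits in one suitcase
print(extents_needed(five_mb))  # needs a second suitcase
```

A SPU holding 5 MB of a table cannot squeeze it into a single 3 MB extent, so it uses two, just like the second suitcase for a two-week trip.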

 

SPUs Process A Table One Block at a Time

image

At the airport luggage counter, each bag needs to be weighed. You put bag one on the scale first, and after it has been processed you put bag two on. That is how Netezza SPUs process data: one data block at a time.

 

The Slowest Processing is a Full Table Scan

image

A Full Table Scan (FTS) means every SPU must read every row it owns for a particular table. That means every block the SPU holds for that table must be brought into memory, one block at a time. For large tables this is extremely expensive and time-consuming. Above, you can see that the Employee_Table has grown to 12 extents per SPU.
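
A rough model of what one SPU does during a Full Table Scan may help. Everything here is a simplified stand-in: the extents are plain Python lists, copying a list stands in for the disk-to-memory transfer, and the `emp_no` and `dept_no` columns are hypothetical sample data.

```python
def full_table_scan(extents, predicate):
    """Read every extent, one at a time, and keep the rows that match."""
    matches = []
    extents_read = 0
    for extent in extents:              # one block at a time
        in_memory = list(extent)        # stands in for moving the extent into memory
        extents_read += 1
        matches.extend(row for row in in_memory if predicate(row))
    return matches, extents_read

# 12 extents on one SPU, three rows each; dept_no cycles through 100, 200, 300, 400.
extents = [
    [{"emp_no": e * 10 + i, "dept_no": 100 * (e % 4 + 1)} for i in range(3)]
    for e in range(12)
]

matches, extents_read = full_table_scan(extents, lambda row: row["dept_no"] == 400)
print(f"Rows found: {len(matches)}, extents read: {extents_read}")
```

Only three of the twelve extents hold any Dept_No 400 rows, yet the scan still moves all twelve into memory. That wasted work is what makes the Full Table Scan the slowest way to answer a query.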

 

The FPGA Card and the Zone Maps Eliminate Extents

image

Each extent has a zone map that provides the min and max value for each column in the extent. The FPGA card reads the zone map, and if Netezza determines that the block could not possibly have the data it is looking for, then Netezza skips bringing that block into memory.

 

The FPGA Card and the Zone Map Enlightenment

image

There is NO WAY this extent has Dept_No 400, because the min Dept_No is 100 and the max Dept_No is 300.
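
The zone map check can be sketched in a few lines. This is a minimal illustration in ordinary Python, not the FPGA logic itself; the function names and the sample `emp_no` values are assumptions, while the Dept_No values 100, 300, and 400 come from the example above.

```python
def build_zone_map(extent, column):
    """Record the min and max value of one column within one extent."""
    values = [row[column] for row in extent]
    return min(values), max(values)

def may_contain(zone_map, value):
    """False means the extent cannot possibly hold the value and can be skipped."""
    low, high = zone_map
    return low <= value <= high

def scan_with_zone_maps(extents, zone_maps, column, value):
    """Read only the extents whose zone map allows the value."""
    matches, extents_read = [], 0
    for extent, zone_map in zip(extents, zone_maps):
        if not may_contain(zone_map, value):
            continue                    # skipped: never brought into memory
        extents_read += 1
        matches.extend(row for row in extent if row[column] == value)
    return matches, extents_read

extent_a = [{"emp_no": 1, "dept_no": 100}, {"emp_no": 2, "dept_no": 200}, {"emp_no": 3, "dept_no": 300}]
extent_b = [{"emp_no": 4, "dept_no": 300}, {"emp_no": 5, "dept_no": 400}, {"emp_no": 6, "dept_no": 500}]
extents = [extent_a, extent_b]

# Zone maps are built once, ahead of time, not on every query.
zone_maps = [build_zone_map(extent, "dept_no") for extent in extents]

matches, extents_read = scan_with_zone_maps(extents, zone_maps, "dept_no", 400)
print(zone_maps)                 # [(100, 300), (300, 500)]
print(matches, extents_read)     # one matching row, one extent read
```

Note that a zone map can only rule an extent out. An extent whose Dept_No values run from 100 to 500 would still be read for Dept_No 400, even if no row in it actually has that value.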

 

Netezza Systems Can Grow Forever

“It’s always been and always will be the same in the world: the horse does the work and the coachman is tipped.”

- Anonymous

image

“It’s always been and always will be the same with Netezza: the SPUs do the work and the cows get tipped.”

- Farmer Tera-Tom (out standing in his field)

If you need to double the speed of your Full Table Scans, just add more hardware and double your SPUs. The data in each table is redistributed across all of the SPUs, so each SPU holds half as many rows and the scan finishes in about half the time. Netezza’s number one weapon for processing massive amounts of data is linear scalability, which means that performance improves in proportion to the number of SPUs you add. Other systems can only get so big before they max out!
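
The arithmetic behind linear scalability can be sketched with an idealized model. It assumes the extents are spread evenly, that every SPU works in parallel, and that each extent takes the same time to read; host overhead is ignored. The function name and all of the numbers are hypothetical.

```python
import math

def scan_milliseconds(total_extents, num_spus, ms_per_extent):
    """Idealized Full Table Scan time: each SPU reads its own extents one at a time."""
    extents_per_spu = math.ceil(total_extents / num_spus)
    return extents_per_spu * ms_per_extent

before = scan_milliseconds(total_extents=1200, num_spus=100, ms_per_extent=10)
after = scan_milliseconds(total_extents=1200, num_spus=200, ms_per_extent=10)

print(before)  # 12 extents per SPU
print(after)   # 6 extents per SPU after doubling the SPUs
```

With twice the SPUs, each one owns half as many extents, so in this model the scan takes half as long.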
