Chapter 11 – Creating Tables

“You know you’re old if they have discontinued your blood type.”

Phyllis Diller

 

CREATE TABLE Syntax

image

The CREATE table statement establishes the name of the table, a name of each column in the table, its data type, and any data attributes. It also defines the Distribution key for the table. In addition, you may specify data protection features, data attributes, and constraints that dictate data quality. The WITH CHECK OPTION is used to specify that Referential Integrity (RI) is only checked at the end of a batch. The NO option indicates that RI is not enforced.

 

Viewing the DDL

image

Once you create a table, you can view it in the Nexus tree by right clicking on the table and selecting View DDL.

 

Netezza Tables - Distribution Key or Random Distribution

image

The next couple of pages will show you how Netezza spreads the data among the SPUs.

 

Table CREATE Examples with 4 different Distribution Keys

image

A table can have only one Distribution Key, so picking the right one is essential. Above are four different Examples for your consideration.

 

The Worst Mistake You Can Make For A Distribution Key

image

If you make the Distribution Key a Boolean, only two SPUs hold the data

Imagine that you just made the Distribution Key on the column GENDER. There are only two values (Boolean) and those are ‘M’ for male or ‘F’ for Female. All of the rows in your table will go to only two SPUs. It is a good idea to choose the distribution key on a column with more distinct values than the number of SPUs. It is also a good idea to make the Distribution Key the same column as the joining column to another table it joins with so all joins are co-located on the same SPU. When you can’t find a good column that will distribute well, it is a good idea to then go with RANDOM.

 

Good things to know about Table and Object Names

image

Above is some knowledge that might be good to know.

 

Netezza Data Types

image

Listed above are the data types that you can utilize in Netezza.

 

Netezza Data Types in More Detail

image

 

Netezza Data Type Extensions

image

These data types are all available for use within NPS. Notice that there are fixed and variable length data formats. The fixed data types always require the entire defined length on disk for the column. The variable types can be used to maximize data storage within a block by storing only the data provided within a row by the client software.

 

Reserved Names Within A Table

image

Whether the system is executing a single or multi-statement transaction, it is identified with a TransactionID. Each row in a NPS table has four extra columns: SLICEID, ROWID, CREATEXID and DELETEXID. The columns uniquely identify every row in every table. Once SLICEID, ROWID, and CREATEXID are assigned, they are never changed during the life of that row. However, DELETEXID is up for grabs. Obviously, it is changed by the transaction that deletes the row. A deleted row stays in the table but is not available to users because DELETEXID is greater than zero.

There is another time when DELETEXID gets set. Whenever a column in an existing row gets changed, the current row is effectively deleted, and a “new” row inserted. During the life a transaction, the current row’s DELETEXID remains active, and the new row has the identification of the transaction in progress. At the end of the transaction, the system NPS changes the DELETEXID of the new row to zero and the DELETEXID of the original row to the ID of the transaction. At that point, the transaction is finished, the original row is inaccessible, and the new row is ready for use.

 

How To Query and See Non-Active Rows

image

When Netezza does a DELETE or an UPDATE, it creates a new row and logically deletes the old row. The logically deleted rows won’t show up in a query unless you set the SHOW_DELETED_RECORDS to TRUE. You can easily spot these non-active rows because they have a DELETEXID other than zero. These non-active rows take up space in a table. Once the GROOM Tablename command is run, the non-active rows are physically deleted. Netezza uses this strategy so they don’t have to have a journal to handle rollbacks. It is also valuable because there are ways to recover deleted rows.

 

Column Attributes

image

Alfred North Whitehead once said, “We think in generalities, but we live in details.” We have seen examples in this chapter of creating tables with simple column definitions. These examples have been generalities, but often we need to model our columns in great detail. It is time to take the next step. When defining a table, it is normally advantageous to be more specific regarding the definition of the columns and their attributes. In the above example, the columns have been further defined using column attributes. The emp column is defined with the NOT NULL attribute. This means that a NULL value cannot be stored in the dept column. The lname column has been further defined to default to the user session name. The fname column defaults to TeraTom, salary to zero and hire_date to today’s date.

 

Constraints

image

Sometimes boundaries are necessary! At times, it is advisable to add restrictions to the table and columns within a table. This is done to provide data integrity, availability, and ease of use to the table and its data. Users can still use their imagination and ask any question, but on all inserts, updates, and deletes NPS will be watching.

Relational theory describes columns and the attributes a column can have as part of a domain. The domain includes the data type and valid value boundaries based on business requirements or restrictions. Because ANSI does not support a rigorous atomic definition of a domain, it is up to the database designer to define the domains for a table and their legal values by creating constraints on one or more columns. Constraints keep the table in check to enforce certain rules.

 

Constraints

image

The great writer Mark Twain was quoted as saying, “Whenever you find you are on the side of the majority, it is time to pause and reflect.” Sometimes it is good to be on the side of the majority, but at other times it is best to be UNIQUE. The UNIQUE constraint is used to enforce uniqueness of values stored within the column(s). This means that no two rows in the table can have the same value for the column or columns utilizing the UNIQUE constraint. That is the idea behind a constraint. If data fall outside of the constraint, then the system pauses and reflects and sends an error message.

 

Column Level Constraint Example

image

In the above table, the emp column is defined with a primary key constraint and therefore must have a value because it cannot be null and it must also be unique. The definition would be the same if these two individual constraints were used. The dept column has an Referential Integrity (RI) constraint named Ref_1 on the column called dept in the Dept_Table. This means that a dept cannot be entered into the employee table unless that dept exists in the department table. This is referential integrity as its best!

The dept column referenced in the department table must be defined as unique in order for this constraint to be defined. The last two constraints are on the soc_sec column. First, there is the NOT NULL that requires a value to be stored there. Then, the NameUniq constraint requires that the value be different from any other value in other rows because it must be unique.

 

Defining Constraints at the Table Level

image

Besides using column level constraints, table level constraints can also be used. This is the only way to implement multi-column constraints. A multi-column constraint involves more than one column. All table level constraints should always be named. Table level constraints are established after the column definitions.

 

Utilizing Default Values for a Table

image

A default value control phrase determines the action to be taken when you do not supply a value for a field. Default value control phrases are only valid when used with the columns defined in the CREATE TABLE and ALTER TABLE statements. A default value control phrase determines the action to be taken when you do not supply a value for a field. Instead of placing a NULL, the system will place the default value listed in the CREATE or ALTER table command. All columns without a DEFAULT phrase defined for them contain a NULL. However, if a column does not have a DEFAULT defined and has an attribute of NOT NULL, an error is returned and the insert fails.

 

CTAS (Create Table AS)

image

Create Table AS (CTAS) is a way to create new tables from existing tables without the reloading data. The table create is based solely on query results. This is a brilliant way to place tables distribution keys in the order you want to performance tune queries.

 

CTAS Facts

image

 

Using the CTAS (Create Table AS) Table For Co-Location

image

Because both tables are joined on the key Customer_Number and both tables are using Customer_Number as their Distribution Key, the matching rows are physically stored on the same SPU. This has to happen for two rows to be joined. They must be on the same SPU and physically touch in that SPUs memory. It is called Co-Location, and we “on purpose” made our CTAS tables both distribute to achieve Co-Location.

 

Altering a CTAS Table to Rename It

“The only true wisdom is in knowing you know nothing.”

– Socrates

image

 

FPGA Card and Zone Maps – The Netezza Secret Weapon

image

The first 200 columns of every table are automatically zone mapped. A Zone Map lists the min and max value for each column. The Field Programmable Gate Array (FPGA card) analyzes the query, and before ordering the block into memory, checks the Zone Map to see if the block cannot possibly have that data. It can then skip reading that block. Where most databases use indexes to determine where data is, Netezza uses Zone Maps to determine where data is NOT! Our query above is looking for data WHERE Order_Total < 300. The Zone Map shows this block will contain rows and therefore it will be moved into memory for processing. Each SPU has their own separate Zone Map and FPGA card. I have colored the min and max values to illustrate how it all works. No matter what column is in the WHERE clause, Netezza can exclude blocks.

 

How A CTAS with ORDER BY Improves Queries

image

The zone maps will eliminate the need to read the majority of blocks (3 MB extents). Since the data is sorted on the column, Order_Date the zone maps min and max values have tight date ranges. Our query only wants January data, so the majority of data blocks won’t need to be read. The FPGA card reads the zone map and only reads blocks with January data.

 

A CTAS Major Sort Benefits over the Minor Sort

image

The zone maps will eliminate the need to read the majority of blocks (3 MB extents). Since the data is sorted on the column Dept_No first (major sort), the zone maps min and max values have tight ranges. Any query using Dept_No in the WHERE clause will allow the majority of blocks to be eliminated by the FPGA card because the zone map will show the min and max values for the Dept_No, and the range of values should be small. Because Dept_No is our major sort and Last_Name is our minor sort (and this is a sort with precedence), then Last_Name’s zone maps won’t benefit nearly as much as Dept_No’s.

 

Altering A Table

image

The only thing in life we can consistently count on is change. This is especially true in a data warehouse environment. As business requirements change, sometimes it is necessary to reflect those changes into the tables. Netezza allows for modification of a table at either the table or column level using the ALTER command.

 

Altering a Table Examples

image

There are some restrictions when renaming columns. First of all, a new column name cannot match any existing column name in the same table. Secondly, the affected column cannot be part of a referential integrity constraint.

 

Drop Table, Truncate, and Delete Compared

image

If you want to delete all the rows in the table but maintain the table structure, it is recommended to Truncate the table. Otherwise, you will need to delete the rows and then run the GROOM command…

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.41.235