Chapter 6 - Indexes

“Only he who attempts the ridiculous may achieve the impossible.”

– Don Quixote

There are Only Three Types of Scans

Sequential scan – This is a Full Table Scan. Each vworker that holds a portion of a table will begin reading from the first row to the last row. This is the slowest of all scans.

Index scan – When an index has been placed on a column used in the WHERE clause of the SQL, the optimizer will consider using the index subtable to perform the scan. The index will hold the secondary index values and contain pointers to the base rows. If the Postgres optimizer thinks that the answer set will contain < 20% of the base rows, it will use the index scan. An index scan is much like an index at the back of a book. If you only want to read a couple pages, the index is how you look something up fast. If you have to read most of the book anyway, then there is no sense in using the index and then reading the entire book. It is important to run the ANALYZE command so that statistics are available on the table so the optimizer can make a good decision about using the index.

Bitmap index scan – If multiple columns in the WHERE clause of the SQL are ANDed together and these columns are also secondary indexes, then the optimizer can use a bitmap scan. This means that Aster will read each index table and set a bit if they get a hit for that value. If there is a bit flipped for a particular row for all scans, they know the row qualifies. The Postgres optimizer must believe that the filtering on these columns will select out <10-20% of the base fact table). Once again, it is important that the table has had the ANALYZE command run to ensure the statistics are current and accurate.

Above are the three types of scans an Aster Database will run.

Guidelines for Indexes

image

Create Indexes after Inserting Data. Otherwise, indexes are updated with each insert.

image

Understand when an index is appropriate. It will only be used if the optimizer thinks that the returning rows will be < 20% of the entire table. The less the percentage of returning rows the more attractive the index. You can even index a Distribution Key.

image

Order Index Columns for Performance: When creating an index with a composite (multicolumn) key, the most popular columns (in the WHERE clause) should be first.

image

Limit the Number of Indexes for each table because an index will speed up SELECTs, but will slow down DMLs such as INSERTs, DELETEs, and UPDATEs. Tables that rarely load new data will benefit from indexes. Tables that get modified very often will do better with fewer indexes.

image

Drop indexes that are not being used. Check out the explain to see if the indexes are utilized in queries. If not, then drop them.

image

Keep statistics up to date. Run ANALYZE when the data changes by more than 10%.

Above are some great guidelines for your knowledge.

An Index Syntax Example

image

Define indexes on a table if the column is used in the WHERE clause of the SQL to find a single row or typically only a few rows. Indexes can also be very useful for speeding up a group-by clause and can even be useful in many joins. A secondary index can slow down the load process, so it is best to create the index after the table has been loaded. If you are maintaining a table with a great deal of inserts, updates, and deletes, then drop the index and re-create it after the maintenance is complete.

The B-Tree Index

image

Indexes allow you to avoid a Full Table Scan. If the query has the indexed column in its WHERE clause and Aster estimates it will have to select less than 20% of the rows in the table, then Aster will choose the index scan rather than the Full Table Scan. A B-tree index is organized like an upside-down tree. The roots at the bottom level of the index holds the actual data values and then there are pointers to the corresponding rows. In a sense, it is like an index at the end of a book. Each value points to the page where each entry is present.

Which Columns Might You Create an Index?

Flight_Table

image

Choose the best four secondary index choices from the above table.

Indexes allow you to avoid a Full Table Scan. If the query has the indexed column in its WHERE clause and Aster estimates it will have to select less than 20% of the rows in the table, then Aster will choose the index scan rather than the Full Table Scan. Which four secondary index choices would you most likely create for the Flight_Table? Turn the pages to see our opinion.

Answer - Which Columns Might You Create an Index?

Flight_Table

image

image

CREATE INDEX choice1_idx ON Flight_Table (Cust_No);

image

CREATE INDEX choice2_idx ON Flight_Table (First_Name);

image

CREATE INDEX choice3_idx ON Flight_Table (Last_Name);

image

CREATE INDEX choice4_idx ON Flight_Table (First_Name, Last_Name);

Above are the best choices for indexing. Notice that we also created a multi-column index in example four.

A Visual of an Index (Conceptually)

image

The b-tree Subtable is vworker local because all values in the subtable are of those in the base table (owned by that vworker).

The concept of an index is that each index subtable is sorted by the index value. Inside the index subtable are both the index value and a pointer to the base row. Each vworker can perform a search of their subtable quickly because it is in ascending order. Then, they can find the rows from the base table they need to satisfy the query.

A Query Using an Index Uses All vworkers

image

All vworkers move their index block into memory and do a binary search. For any vworker who finds they have a Sushma, they then move their base block into memory and retrieve the row. The optimizer won't consider an index unless it thinks it will bring back less than 20% of the rows. This is why the ANALYZE command is so important. ANALYZE updates the statistics of a table.

Multicolumn indexes

image

Multicolumn indexes, often referred to as composite indexes, are useful when a set of columns are used together in the WHERE clause of the SQL statement. They become extremely beneficial when a particular combination of buckets often occur together (Cust_No between 4 and 8 AND Flight_No between 100 and 200 occurring together in many scenarios). List the columns used the most in the WHERE clause first in a Multi-Column index. Why? A multicolumn index can be used with query conditions that involve any subset of the indexed columns, but the index is most efficient when there are constraints on the leading (left most) columns. Up to 32 fields may be specified in a Multi-Column index by default.

A NUSI BITMAP Theory

image

When multiple indexes are used in the SQL, Aster can bring both index subtables into a vworker's memory. The indexes are scanned simultaneously. Above, the First_Name subtable is scanned by each vworker for the name 'Mo' while simultaneously the Dept_No index subtable is scanned for Dept_No 100. The base table Row IDs are placed into memory. If they match, then Aster knows that both conditions were met and the row is returned.

A NUSI Bitmap in Action

image

The Queen directs the vworkers to do binary searches on their index subtables and record the Row-ID of the base when a condition is met. If all conditions are met, then the vworker is directed to place the base table block holding the row inside its memory. It can then retrieve the base row. Think of this technique as binary searches in parallel.

Indexes on Expressions

image

image

CREATE INDEX low_idx ON Flight_Table (lower(First_Name));

image

SELECT * FROM Flight_Table WHERE lower(First_Name) = 'anna';

The example above shows the creation of the index and then the SQL query that follows. The optimizer could use the index for the SQL query above.

Indexes on Extracts of Dates

image

image

CREATE INDEX month_idx ON Order_Table
(EXTRACT(MONTH FROM Order_Time_Stamp));

image

SELECT * FROM Order_Table
WHERE EXTRACT(Month FROM Order_Time_Stamp) = 01 ;

The example above shows the creation of the index and then the SQL query that follows. The optimizer could use the index for the SQL query above.

GiST Indexes

Aster Data only supports b-tree indexes and GiST indexes. A GiST index lets you perform lookups via the index that take a given IP address and search through a table of IP address ranges to find the ranges that include that address.

Aster Database supports the ip4 data type for IPv4 network addresses, and the ip4range data type for ranges of network addresses. The GiST index type is only used for creating an index on a column of type ip4range.

Here are two syntactically correct ways to create a GiST index:

CREATE INDEX indexname ON tablename USING gist (column);

CREATE INDEX indexname ON tablename USING gist
(ip4range(column1::ip4,column2::ip4));

A GiST index can only be used with IP Range type queries.

Five Operational Tips for Efficient Indexing

image

Index only after loading – Wait until after you have loaded the data before you define indexes. The presence of the indexes will slow your loading down significantly. If you are loading to an existing table, it is best to drop the index, load your data, and then re-create the index.

image

Remove indexes before bulk deleting – Drop all indexes of a table before you perform a bulk delete on that table. The presence of the indexes will slow your delete operation down significantly.

image

Indexes on logically partitioned tables vs. parent/child tables with inheritance - If you are placing indexes on logically partitioned tables, you need only create your indexes on the top level partitioned table because indexes are inherited.

image

Reindexing – When you INSERT, UPDATE, or DELETE a large number of rows on a table, you should run the REINDEX command to rebuild the table’s indexes. This removes fragmentation from the index.

image

Test your indexes with EXPLAIN – An index that is not going to be used by the Optimizer is taking up space and time. Test queries to make sure the index is being used, otherwise drop it.

When indexes are created, a subtable is created that is maintained with the base table. When a row is updated in the base table, the row is also updated in the index subtable. This takes time, so on load and maintenance jobs, it is often best to drop the indexes and re-create them afterward.

REINDEX

image

REINDEX TABLE Order_Table ;

image

REINDEX INDEX month_idx ;

If you suspect corruption of an index on a user table, you can simply rebuild that index, or all indexes on the table, using REINDEX INDEX or REINDEX TABLE.

REINDEX is similar to a drop and recreate of the index because the index is rebuilt from scratch. To REINDEX a single index or a table, you need to be the owner of that index or table. Superusers can REINDEX anything.

Aster Database does not support REINDEX for the entire database. To REINDEX your database, run REINDEX on each table in the database.

createCompressedIndexOnCompressedTableByDefault Flag

There is a configuration flag that is named:

createCompressedIndexOnCompressedTableByDefault

The DBA can set this flag using the queen configuration flag interface. This flag automatically creates compressed indexes for compressed tables. To override this flag you can select compression in the CREATE INDEX statement.

The default value of createCompressedIndexOnCompressedTableByDefault is OFF. When it is set to ON:

1. Indexes implicitly created through the CREATE TABLE statement will be compressed automatically, using the same compression level as the base table. This does not apply to tables created using CREATE TABLE AS SELECT (CTAS).

2. If the COMPRESS clause isn’t specified when issuing CREATE INDEX, a compressed index with the same compression level as the base table will be created.

Yes, indexes can be compressed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.57.172