Hour 16. Using Indexes to Improve Performance


What You’ll Learn in This Hour:

How indexes work

How to create an index

The different types of indexes

When to use indexes

When not to use indexes


In this hour, you learn how to improve SQL statement performance by creating and using indexes. You begin with the CREATE INDEX command and learn how to use indexes that have been created on tables.

What Is an Index?

Simply put, an index is a pointer to data in a table. An index in a database is similar to an index in the back of a book. For example, if you want to reference all pages in a book that discuss a certain topic, you first refer to the index, which lists all topics alphabetically, and it refers you to one or more specific page numbers. An index in a database works the same way in that a query is pointed to the exact physical location of data in a table. You are actually being directed to the data’s location in an underlying file of the database, but as far as you are concerned, you are referring to a table.

Which would be faster, looking through a book page by page for some information or searching the book’s index and getting a page number? Of course, using the book’s index is the most efficient method. It can save a lot of time, especially if the book is large. If you have a book of just a few pages, however, it might be faster to check the pages for the information than to flip back and forth between the index and pages of the book. When a database does not use an index, it is performing what is typically called a full table scan, the same as flipping through a book page by page. Full table scans are discussed in Hour 17, “Improving Database Performance.”

An index is typically stored separately from the table for which the index was created. An index’s main purpose is to improve the performance of data retrieval. Indexes can be created or dropped with no effect on the data. However, after an index is dropped, performance of data retrieval might be slowed. Indexes do take up physical space and can often grow larger than the table. Therefore, you should consider them when estimating your database storage needs.

How Do Indexes Work?

When an index is created, it records the location of values in a table that are associated with the column that is indexed. Entries are added to the index when new data is added to the table. When a query is executed against the database and a condition is specified on a column in the WHERE clause that is indexed, the index is first searched for the values specified in the WHERE clause. If the value is found in the index, the index returns the exact location of the searched data in the table. Figure 16.1 illustrates the functioning of an index.

Suppose the following query was issued:

SELECT *
FROM TABLE_NAME
WHERE NAME = 'SMITH';

Figure 16.1. Table access using an index.

image

As shown in Figure 16.1, the NAME index is referenced to resolve the location of all names equal to SMITH. After the location is determined, the data can quickly be retrieved from the table. The data, in this case, names is alphabetized in the index.


By the Way: Variations of Index Creation

Indexes can be created during table creation in certain implementations. Most implementations accommodate a command, aside from the CREATE TABLE command, used to create indexes. Check your particular implementation for the exact syntax for the command, if any, that is available to create an index.


A full table scan occurs if there is no index on the table and the same query is executed, which means that every row of data in the table is read to retrieve information pertaining to all individuals with the name SMITH.

An index is faster because it typically stores information in an orderly tree-like format. Consider if we have a list of books upon which we place an index. The index has a root node, which is the beginning point of each query. Then it is split into branches. Maybe in our case there are two branches, one for letters A–L and the other for letters M–Z. Now if you ask for a book with a name that starts with the letter M, you enter the index at the root node and immediately travel to the branch containing letters M–Z. This effectively cuts your time to find the book by eliminating close to half the possibilities.

The CREATE INDEX Command

The CREATE INDEX statement, as with many other statements in SQL, varies greatly among different relational database vendors. Most relational database implementations use the CREATE INDEX statement:

CREATE INDEX INDEX_NAME ON TABLE_NAME

The syntax is where the vendors start varying greatly on the CREATE INDEX statement options. Some implementations allow the specification of a storage clause (as with the CREATE TABLE statement), ordering (DESC||ASC), and the use of clusters. You must check your particular implementation for its correct syntax.

Types of Indexes

You can create different types of indexes on tables in a database, all of which serve the same goal: to improve database performance by expediting data retrieval. This hour discusses single-column indexes, composite indexes, and unique indexes.

Single-Column Indexes


Did You Know?: Best Places for Single-Column Indexes

Single-column indexes are most effective when used on columns that are frequently used alone in the WHERE clause as query conditions. Good candidates for a single-column index are an individual identification number, a serial number, or a system-assigned key.


Indexing on a single column of a table is the simplest and most common manifestation of an index. Obviously, a single-column index is one that is created based on only one table column. The basic syntax is as follows:

CREATE INDEX INDEX_NAME
ON TABLE_NAME (COLUMN_NAME)

For example, if you want to create an index on EMPLOYEE_TBL for employees’ last names, the command used to create the index looks like the following:

CREATE INDEX NAME_IDX
ON EMPLOYEE_TBL (LAST_NAME);

Unique Indexes

You use unique indexes for performance and data integrity. A unique index does not allow duplicate values to be inserted into the table. Otherwise, the unique index performs the same way a regular index performs. The syntax is as follows:

CREATE UNIQUE INDEX INDEX_NAME
ON TABLE_NAME (COLUMN_NAME)

If you want to create a unique index on EMPLOYEE_TBL for an employee’s last name, the command used to create the unique index looks like the following:

CREATE UNIQUE INDEX NAME_IDX
ON EMPLOYEE_TBL (LAST_NAME);

The only problem with this index is that every individual’s last name in EMPLOYEE_TBL must be unique, which is impractical. However, a unique index should be created for a column, such as an individual’s Social Security number, because that number would be unique for each individual.

You might be wondering, “What if an employee’s Social Security number is the primary key for a table?” An index is usually implicitly created when you define a primary key for a table. However, a company can use a fictitious number for an employee ID but maintain each employee’s Social Security number for tax purposes. You probably want to index this column and ensure that all entries into this column are unique values.

When working with objects such as unique indexes, it is often beneficial to create the indexes on empty tables during the creation of the database structure. This ensures that the data going into the structure already meets the demand of the constraints you want to place on it. If you are working with existing data, you will want to analyze the impact of whether the data needs to be adjusted to be able to properly apply the index.

Composite Indexes

A composite index is an index on two or more columns of a table. You should consider performance when creating a composite index, because the order of columns in the index has a measurable effect on the data retrieval speed. Generally, the most restrictive value should be placed first for optimum performance. However, the columns that are always specified in your queries should be placed first. The syntax is as follows:

CREATE INDEX INDEX_NAME
ON TABLE_NAME (COLUMN1, COLUMN2)

An example of a composite index follows:

CREATE INDEX ORD_IDX
ON ORDERS_TBL (CUST_ID, PROD_ID);

In this example, you create a composite index based on two columns in the ORDERS_TBL: CUST_ID and PROD_ID. You assume that these two columns are frequently used together as conditions in the WHERE clause of a query.


Did You Know?: Unique Index Constraints

You can only create a unique index on a column in a table whose values are unique. In other words, you cannot create a unique index on an existing table with data that already contains records on the indexed key that are nonunique. Similarly, you cannot create a unique index on a column that allows for NULL values. If you attempt to create a unique index on a column that violates one of these principles, the statement fails.


In deciding whether to create a single-column index or a composite index, consider the column(s) that you might use frequently in a query’s WHERE clause as filter conditions. If only one column is used, choose a single-column index. If two or more columns are frequently used in the WHERE clause as filters, a composite index would be the best choice.

Implicit Indexes

Implicit indexes are indexes that are automatically created by the database server when an object is created. Indexes are automatically created for primary key constraints and unique constraints.

Why are indexes automatically created for these constraints? Imagine that you are the database server. A user adds a new product to the database. The product identification is the primary key on the table, which means that it must be a unique value. To efficiently make sure the new value is unique among hundreds or thousands of records, the product identifications in the table must be indexed. Therefore, when you create a primary key or unique constraint, an index is automatically created for you.


Did You Know?: Best Places for Composite Indexes

Composite indexes are most effective on table columns that are used together frequently as conditions in a query’s WHERE clause.


When Should Indexes Be Considered?

Unique indexes are implicitly used in conjunction with a primary key for the primary key to work. Foreign keys are also excellent candidates for an index because you often use them to join the parent table. Most, if not all, columns used for table joins should be indexed.

Columns that you frequently reference in the ORDER BY and GROUP BY clauses should be considered for indexes. For example, if you are sorting on an individual’s name, it is quite beneficial to have an index on the name column. It renders an automatic alphabetical order on every name, thus simplifying the actual sort operation and expediting the output results.

Furthermore, you should create indexes on columns with a high number of unique values, or columns that, when used as filter conditions in the WHERE clause, return a low percentage of rows of data from a table. This is where trial and error might come into play. Just as you should always test production code and database structures before implementing them into production, so should you test indexes. Your testing should center on trying different combinations of indexes, no indexes, single-column indexes, and composite indexes. There is no cut-and-dried rule for using indexes. The effective use of indexes requires a thorough knowledge of table relationships, query and transaction requirements, and the data itself.

When Should Indexes Be Avoided?


By the Way: Plan for Indexing Accordingly

You should plan your tables and indexes. Don’t assume that because an index has been created, all performance issues are resolved. The index might not help at all (it might actually hinder performance) and might just take up disk space.


Although indexes are intended to enhance a database’s performance, sometimes you should avoid them. The following guidelines indicate when you should reconsider using an index:

• You should not use indexes on small tables. This is because indexes have an overhead associated with them in terms of query time to access them. In the case of small tables, it is usually faster for the query engine to do a quick scan over the table rather than look at an index first.

• You should not use indexes on columns that return a high percentage of data rows when used as a filter condition in a query’s WHERE clause. For instance, you would not have an entry for the words the or and in the index of a book.

• You can index tables that have frequent, large batch update jobs run. However, the batch job’s performance is slowed considerably by the index. You can correct the conflict of having an index on a table that is frequently loaded or manipulated by a large batch process by dropping the index before the batch job and then re-creating the index after the job has completed. This is because the indexes are also updated as the data is inserted, causing additional overhead.

• You should not use indexes on columns that contain a high number of NULL values. This is because indexes operate best on columns that have a higher uniqueness of the data between rows. If there are a lot of NULL values, the index will be skewed toward the NULL values and might affect performance.

You should not index columns that are frequently manipulated. Maintenance on the index can become excessive.

You can see in Figure 16.2 that an index on a column, such as gender, might not prove beneficial. For example, suppose the following query was submitted to the database:

SELECT *
FROM TABLE_NAME
WHERE GENDER = 'FEMALE';

Figure 16.2. An example of an ineffective index.

image

By referring to Figure 16.2, which is based on the previous query, you can see that there is constant activity between the table and its index. Because a high number of data rows is returned for WHERE GENDER = 'FEMALE' (or 'MALE'), the database server constantly has to read the index, and then the table, and then the index, and then the table, and so on. In this case, it might be more efficient for a full table scan to occur because a high percentage of the table must be read anyway.


Watch Out!: Indexes Can Sometimes Lead to Performance Problems

Caution should be taken when creating indexes on a table’s extremely long keys because performance is inevitably slowed by high I/O costs.


As a general rule, do not use an index on a column used in a query’s condition that returns a high percentage of data rows from the table. In other words, do not create an index on a column such as gender, or any column that contains few distinct values. This is often referred to as a column’s cardinality, or the uniqueness of the data. High cardinality means very unique and therefore describes things such as identification numbers. Low-cardinality values are not very unique and refer to columns such as the gender example.

Altering an Index

You can alter an index after it has been created using syntax that is similar to the CREATE INDEX syntax. The types of alterations that you can manage with the statement differ between implementations but handle all the basic variations of an index in terms of columns, ordering, and such. The syntax is as follows:

ALTER INDEX INDEX_NAME

You should take care when altering an existing index on production systems. This is because in most cases the index is immediately rebuilt, which obviously creates an overhead in terms of resources. Additionally, on most basic implementations, while the index is being rebuilt it cannot be utilized for queries that might put an additional hindrance upon the performance of your system.

Dropping an Index

An index can be dropped rather simply. Check your particular implementation for the exact syntax, but most major implementations use the DROP command. You should take care when dropping an index because performance might be slowed drastically (or improved!). The syntax is as follows:

DROP INDEX INDEX_NAME

MySQL uses a slightly different syntax; you also specify the table name of the table that you are dropping the index from:

DROP INDEX INDEX_NAME ON TABLE_NAME

The most common reason for dropping an index is an attempt to improve performance. Remember that if you drop an index, you can re-create it later. You might need to rebuild an indexes to reduce fragmentation. It is often necessary to experiment with the use of indexes in a database to determine the route to best performance, which might involve creating an index, dropping it, and eventually re-creating it, with or without modifications.


Watch Out!: Be Careful with Your Indexes

Indexes are often good for performance, but that’s not always true. Refrain from creating indexes on columns that contain few unique values, such as gender and state of residence.



By the Way: Differences in How to Drop Indexes

MySQL uses the ALTER TABLE command to drop indexes. However, you can still use the DROP INDEX syntax and MySQL maps it to an appropriate ALTER TABLE statement. Again, different SQL implementations might vary widely in syntax, especially when dealing with indexes and data storage.


Summary

You have learned that you can use indexes to improve the overall performance of queries and transactions performed within the database. Database indexes, like an index of a book, enable specific data to be quickly referenced from a table. The most common method for creating indexes is through use of the CREATE INDEX command. Different types of indexes are available among SQL implementations. Unique indexes, single-column indexes, and composite indexes are among those types of indexes. You need to consider many factors when deciding on the index type that best meets the needs of your database. The effective use of indexes often requires some experimentation, a thorough knowledge of table relationships and data, and a little patience—but patience when you create an index can save minutes, hours, or even days of work later.

Q&A

Q. Does an index actually take up space the way a table does?

A. Yes. An index takes up physical space in a database. In fact, an index can become much larger than the table for which the index was created.

Q. If you drop an index so a batch job can complete faster, how long does it take to re-create the index?

A. Many factors are involved, such as the size of the index being dropped, the CPU usage, and the machine’s power.

Q. Should all indexes be unique?

A. No. Unique indexes allow no duplicate values. There might be a need for the allowance of duplicate values in a table.

Workshop

The following workshop is composed of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises are intended to afford you the opportunity to apply the concepts discussed during the current hour, as well as build upon the knowledge acquired in previous hours of study. Please take time to complete the quiz questions and exercises before continuing. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.

Quiz

1. What are some major disadvantages of using indexes?

2. Why is the order of columns in a composite important?

3. Should a column with a large percentage of NULL values be indexed?

4. Is the main purpose of an index to stop duplicate values in a table?

5. True or false: The main reason for a composite index is for aggregate function usage in an index.

6. What does cardinality refer to? What is considered a column of high-cardinality?

Exercises

1. For the following situations, decide whether an index should be used and, if so, what type of index should be used.

a. Several columns, but a rather small table

b. Medium-sized table; no duplicates should be allowed

c. Several columns, very large table, several columns used as filters in the WHERE clause

d. Large table, many columns, a lot of data manipulation

2. Write the SQL statement to create an index called EP_POSITION in EMPLOYEE_PAY_TBL on the POSITION column.

3. Create a statement to alter the index you just created to make it unique. What do you need to do to create a unique index on the SALARY column? Write the SQL statements that you need to run them in the sequence.

4. Study the tables used in this book. What are some good candidates for indexed columns based on how a user might search for data?

5. Create a multicolumn index on ORDERS_TBL. Include the following columns: CUST_ID, PROD_ID, and ORD_DATE.

6. Create some additional indexes on your tables as desired.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.198.183