Indexes

An index, also known as a key, provides a means of rapidly accessing specific rows of data within a table.

Indexes Explained

Suppose that you have a users table that holds details about many people. When you ask MySQL to retrieve all rows matching a given criterion, it reads through the entire table, row by row, retrieving data from all rows that match the criterion. Even if you want to retrieve only a single specific row, it must read every row of the table.

Reading through an entire table like this—an operation known as a table scan—does the job but is highly inefficient. That's where an index can help.

Say that we have a query in which we want to retrieve the row referenced by a given user_id in table users. The table might look like this:

+---------+----------------+
| user_id | name           |
+---------+----------------+
|       3 | Clare Scriven  |
|       1 | Tony Butcher   |
|       2 | John Schreiber |
|       5 | Peter Stamp    |
+---------+----------------+

As you can see, the user_id numbers are not in any particular order, and we've no idea how many rows to retrieve. Our query may look like this:

SELECT * FROM users WHERE user_id = '2'

Without an index, MySQL has to read every row, looking at the user_id for each person and seeing whether it matches the one we're looking for.

Figure 9.1 illustrates the effect an index has on this table, when applied to the user_id column.

Figure 9.1. Using an index on the user_id column.


An index is like an additional column on a table that holds the contents of the original column, but sorted into order. Every entry in the index points back to the row from which it is derived, so by identifying which row you want from the index, you can quickly find the corresponding row in the table.

Why is this more efficient? Values in the index are kept in order, so if a query tries to retrieve data from a row with a given user_id, MySQL looks in the index for that user_id rather than scan the entire table. A search algorithm within MySQL helps it find that value in the index quickly, helped by the fact that the values are sorted. After MySQL finds the relevant item in the index, it does a seek to the related table row and retrieves the data. MySQL can stop searching the index as soon as it encounters a different value (because the index is sorted!).

If the user_id column is declared as unique, this further helps efficiency because MySQL knows there's only one row to retrieve. For this reason, unique indexes generally improve performance more than non-unique indexes.

In the preceding example, you saw what happens in a query when you apply an index to the user_id column. However, in general, tables may have many more columns that are used by queries to find a given row. Such a table might therefore have many indexes, some unique and some non-unique, according to the data they hold. Then, any query that tries to look up data according to the value of an indexed column, will benefit from the index being present.

When and When Not to Use Indexes

You will have realized by now that applying an index—or even several indexes—to a table is generally a good idea. However, let's spend a few moments understanding when they're useful, what goes on to create an index, and why they are not always desirable.

When to Use Indexes

Use an index for all columns referred to in a WHERE clause in a SELECT statement. For example:

SELECT name FROM users WHERE user_id = 3

indicates that the user_id column in table users should be indexed. Similarly, if your query tries to find the maximum or minimum value of a column, it would run much more quickly with an index on the column named in the MAX() or MIN() function.

Wherever you use ORDER BY or GROUP BY, an index would be an advantage. Because both operations perform a sort on the values in the column they specify, having that column's values presorted as an index is clearly an advantage.

Finally, columns on which you perform a join should be indexed. So if you write this:

SELECT u.name, u.auth_group, a.headline FROM users AS u
INNER JOIN articles AS a ON u.user_id = a.user_id;

the user_id columns in both tables ought to be indexed. The columns to be indexed are those appearing after ON or USING. The process of matching values across these columns requires multiple comparisons, and an index reduces the workload for MySQL.

Always use a unique index rather than a non-unique index if you can. As well as enforcing uniqueness in a column (which may or may not be a necessity), they speed up read operations because MySQL knows in advance exactly how many rows it expects to retrieve.

Where columns cannot be unique, columns whose values are nearer to being unique are better candidates for an index than columns with frequently occurring values. For example, if you have a table of people who will be looked up by name (such as a phone directory), a “last name” column would be a stronger candidate than a “first initial” column because last names are more diverse than first initials (although in practice you might index both).

When Not to Use Indexes

Indexes are beneficial to database queries that read data, but when a write operation is taking place, they actually create extra work. For example, if you perform an INSERT to create a new row of data in a table, all the indexes for that table will have to be updated. So in general, indexes should not be applied on tables that have many write operations compared to reads.

If a SELECT statement has to retrieve more than about a third of the rows in a table, an index will not help. For example, selecting on gender from a table of people might retrieve about half the rows, so indexing the gender column would be pointless. When using an index, MySQL has to perform a disk seek to go from each index entry to the corresponding row in the table, and if it has to do this for a high proportion of rows, it might as well do a table scan.

A further consideration is that indexes take up disk space. If an index does not benefit performance, it might as well be dropped because it will make your database larger than it needs to be.

Finally, there will be no performance advantage from applying indexes to tables with just a few rows of data; a table with just a few rows would not be worth indexing.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.156.202