Chapter 28 introduced the utility of the PRIMARY
and UNIQUE
keys, defining the role of each and showing you how to recognize and incorporate them into your table structures. However, indexing plays such an important role in database development that it's worth devoting some additional time to this feature. In this chapter, the following topics are covered:
An index is essentially an ordered (or indexed) subset of table columns, with each row entry pointing to its corresponding table row. Generally speaking, introducing indexing into your MySQL database development strategy gives you three advantages:
name
column, in addition to any other column that will be frequently searched.There are four general categories of indexes: primary, unique, normal, and full-text. Each type is introduced in this section.
The primary key index is the most common type of index found in relational databases. It's used to uniquely identify each row as a result of the primary key's own uniqueness. Therefore, the key must be either a value that the entity represented by the row uniquely possesses, or a unique value generated by the database, such as an automatically incrementing integer value. As a result, regardless of whether preexisting rows are subsequently deleted, every row will have a unique primary index. For example, suppose you want to create a database of useful Web sites for your company's IT team. This table might look like the following:
CREATE TABLE bookmarks (
id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(75) NOT NULL,
url VARCHAR(200) NOT NULL,
description MEDIUMTEXT NOT NULL,
PRIMARY KEY(id));
Because the id
column automatically increments (beginning with 1) with each insertion, it's not possible for the bookmarks
table to ever contain multiple rows containing exactly the same cells. For instance, consider the following three queries:
INSERT INTO bookmarks (name, url, description)
VALUES("Apress", "www.apress.com", "Computer books");
INSERT INTO bookmarks (name, url, description)
VALUES("Google", "www.google.com", "Search engine");
INSERT INTO bookmarks (name, url, description)
VALUES("W. Jason Gilmore", "www.wjgilmore.com", "Jason's website");
Executing these three queries and retrieving the table produces the following output:
+-------+------------------+-------------------+-----------------+
| id | name | url | description |
+------ +------------------+-------------------+-----------------+
| 1 | Apress | www.apress.com | Computer books |
| 2 | Google | www.google.com | Search engine |
| 3 | W. Jason Gilmore | www.wjgilmore.com | Jason's website |
+-------+------------------+-------------------+-----------------+
Note how the id column increments with each insertion, ensuring row uniqueness.
Note You can have only one automatically incrementing column per table, and that column must be designated as the primary key. Furthermore, any column designated as a primary key cannot hold NULL
values; even if not explicitly declared as NOT NULL
, MySQL will automatically assign this trait.
It is typically ill-advised to create a primary index that allows the developer to divine some information about the row it represents. The reason why is demonstrated with an illustration. Rather than use an integer value as the bookmarks
table's primary index, suppose you decide to use the url
instead. The repercussions involved in making such a decision should be obvious. First, what happens if the URL changes, due to a trademark issue or an acquisition, for example? Even Social Security numbers, values once taken for granted as being unique, can be changed due to the repercussions of identity theft. Save yourself the hassle and always use a primary index that offers no insight into the data it represents; it should be an autonomous vehicle with the sole purpose of ensuring the ability to uniquely identify a data record.
Like a primary index, a unique index prevents duplicate values from being created. However, the difference is that only one primary index is allowed per table, whereas multiple unique indexes are supported. With this possibility in mind, consider again the bookmarks
table from the previous section. Although it's conceivable that two Web sites could share the same name—for example, "Great PHP resource"—it wouldn't make sense to repeat URLs. This sounds like an ideal unique index:
CREATE TABLE bookmarks (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(75) NOT NULL,
url VARCHAR(200) NOT NULL UNIQUE,
description MEDIUMTEXT NOT NULL,
PRIMARY KEY(id));
As mentioned, it's possible to designate multiple fields as unique in a given table. For instance, suppose you want to prevent contributors to the link repository from repeatedly designating nondescriptive names ("cool site," for example) when inserting a new Web site. Again returning to the bookmarks
table, define the name
column as unique:
CREATE TABLE bookmarks (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(75) NOT NULL UNIQUE,
url VARCHAR(200) NOT NULL UNIQUE,
description MEDIUMTEXT NOT NULL,
PRIMARY KEY(id));
You can also specify a multiple-column unique index. For example, suppose you want to allow your contributors to insert duplicate url
values, and even duplicate name
values, but you do not want duplicate name
and url
combinations to appear. You can enforce such restrictions by creating a multiple-column unique index. Revisiting the original bookmarks
table:
CREATE TABLE bookmarks (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(75) NOT NULL,
url VARCHAR(200) NOT NULL,
UNIQUE(name,url),
description MEDIUMTEXT NOT NULL,
PRIMARY KEY(id));
Given this configuration, the following name
and url
value pairs could all simultaneously reside in the same table:
Apress site, http://www.apress.com/
Apress site, http://blogs.apress.com/
Blogs, http://www.apress.com/
Apress blogs, http://blogs.apress.com/
However, attempting to insert any of the preceding combinations again will result in an error because duplicate combinations of name
and url
are illegal.
Quite often you'll want to optimize searches on fields other than those designated as primary or even unique. Because this is a commonplace occurrence, it only makes sense that it should be possible to optimize such searches by indexing these fields. Such indexes are typically called normal, or ordinary.
Single-Column Normal Indexes
A single-column normal index should be used if a particular column in your table will be the focus of a considerable number of your selection queries. For example, suppose an employee profile table consists of four columns: a unique row ID, first name, last name, and e-mail address. You know that the majority of the searches will be specific to either the employee's last name or the e-mail address. You should create one normal index for the last name and a unique index for the e-mail address, like so:
CREATE TABLE employees (
id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
firstname VARCHAR(35) NOT NULL,
lastname VARCHAR(35) NOT NULL,
email VARCHAR(55) NOT NULL UNIQUE,
INDEX (lastname),
PRIMARY KEY(id));
Building on this idea, MySQL offers the feature of creating partial-column indexes, based on the idea that the first N characters of a given column often are enough to ensure uniqueness, where N is specified within the index creation parameters. Creating partial-column indexes requires less disk space and is considerably faster than indexing the entire column. Revisiting the previous example, you can imagine that using the first five characters of the last name suffices to ensure accurate retrieval:
CREATE TABLE employees (
id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
firstname VARCHAR(35) NOT NULL,
lastname VARCHAR(35) NOT NULL,
email VARCHAR(55) NOT NULL UNIQUE,
INDEX (lastname(5)),
PRIMARY KEY(id));
Often, however, selection queries are a function of including multiple columns. After all, more complex tables might require a query consisting of several columns before the desired data can be retrieved. Run time on such queries can be decreased greatly through the institution of multiple-column normal indexes, discussed next.
Multiple-Column Normal Indexes
Multiple-column indexing is recommended when you know that a number of specified columns will often be used together in retrieval queries. MySQL's multiple-column indexing approach is based upon a strategy known as leftmost prefixing. Leftmost prefixing states that any multiple-column index including columns A, B, and C will improve performance on queries involving the following column combinations:
Here's how you create a multiple-column MySQL index:
CREATE TABLE employees (
id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
lastname VARCHAR(35) NOT NULL,
firstname VARCHAR(35) NOT NULL,
email VARCHAR(55) NOT NULL UNIQUE,
INDEX name (lastname, firstname),
PRIMARY KEY(id));
This creates two indexes (in addition to the primary key index). The first is the unique index for the e-mail address. The second is a multiple-column index, consisting of two columns, lastname
and firstname
. This is quite useful, because it increases the search speed when queries involve any of the following column combinations:
lastname
, firstname
lastname
Driving this point home, the following queries would benefit from the multiple-column index:
SELECT email FROM employees WHERE lastname="Geronimo" AND firstname="Ed";
SELECT lastname FROM employees WHERE lastname="Geronimo";
The following query would not benefit:
SELECT lastname FROM employees WHERE firstname="Ed";
To gain performance on this query, you'd need to create separate indexes for the firstname
column.
Full-text indexes offer an efficient means for searching text stored in CHAR
, VARCHAR
, or TEXT
datatypes. Before delving into examples, a bit of background regarding MySQL's special handling of this index is in order.
Because MySQL assumes that full-text searches will be implemented for sifting through large amounts of natural-language text, it provides a mechanism for retrieving data that produces results that best fit the user's desired result. More specifically, if a user were to search using a string like Apache is the world's most popular web server
, the words is
and the
should play little or no role in determining result relevance. In fact, MySQL splits searchable text into words, by default eliminating any word of fewer than four characters. You'll learn how to modify this behavior later in this section.
Creating a full-text index is much like creating indexes of other types. As an example, revisit the bookmarks
table created earlier in this chapter, indexing its description
column using the full-text variant:
CREATE TABLE bookmarks (
id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(75) NOT NULL,
url VARCHAR(200) NOT NULL,
description MEDIUMTEXT NOT NULL,
FULLTEXT(description),
PRIMARY KEY(id));
In addition to the typical primary index, this example creates a full-text index consisting of the description
column. For demonstration purposes, Table 36-1 presents the data found in the bookmarks
table.
id | name | url | description |
1 | Python.org | http://www.python.org/ | The official Python Web site |
2 | MySQL manual | http://dev.mysql.com/doc/ | The MySQL reference manual |
3 | Apache site | http://httpd.apache.org/ | Includes Apache 2 manual |
4 | PHP: Hypertext | http://www.php.net/ | The official PHP Web site |
5 | Apache Week | http://www.apacheweek.com/ | Offers a dedicated Apache 2 section |
Whereas creating full-text indexes is much like creating other types of indexes, retrieval queries based on the full-text index are different. When retrieving data based on full-text indexes, SELECT
queries use two special MySQL functions, MATCH()
and AGAINST()
. With these functions, natural-language searches can be executed against the full-text index, like so:
SELECT name,url FROM bookmarks WHERE MATCH(description) AGAINST('Apache 2'),
The results returned look like this:
+-------------+----------------------------+
| name | url |
+------------------------------------------+
| Apache site | http://httpd.apache.org |
| Apache Week | http://www.apacheweek.com |
+-------------+----------------------------+
This lists the rows in which "Apache" is found in the description
column, in order of highest relevance. Remember that the "2" is ignored because of its length. When MATCH()
is used in a WHERE
clause, relevance is defined in terms of how well the returned row matches the search string. Alternatively, the functions can be incorporated into the query body, returning a list of weighted scores for matching rows; the higher the score, the greater the relevance. An example follows:
SELECT MATCH(description) AGAINST('Apache 2') FROM bookmarks;
Upon execution, MySQL will search every row in the bookmarks
table, calculating relevance values for each row, like so:
+----------------------------------------+
| match(description) against('Apache 2') |
+----------------------------------------+
| 0 |
| 0 |
| 0.57014514171969 |
| 0 |
| 0.38763393589171 |
+----------------------------------------+
You can also take advantage of a feature known as query expansion, which is particularly useful when the user is making certain presumptions that might not otherwise necessarily be built into the application's search logic. For example, suppose the user was searching for the term "football." Logically he would also be interested in rows including terms such as "Pittsburgh Steelers," "Ohio State Buckeyes," and "Woody Hayes." To compensate for this, you can include the WITH QUERY EXPANSION
clause, which will first retrieve all rows including the term "football," and then will search all rows again, this time retrieving all rows having any of the words found in the rows of the first set of results.
Therefore, returning to the example, a row including "Pittsburgh" would be retrieved in the second search even if it didn't also contain the term "football," provided a row found in the first search included the terms "football" and "Pittsburgh." While this can certainly result in more thorough searches, it could of course have unexpected side effects, such as a row being returned because it has the term "Pittsburgh" in it, yet having absolutely nothing to do with football.
Starting with MySQL version 4.01, Boolean-oriented, full-text searches were introduced. This feature is introduced later in this section.
Stopwords
As mentioned earlier, MySQL by default ignores any keywords of fewer than four characters. These words, along with those found in a predefined list built into the MySQL server, are known as stopwords, or words that should be ignored. You can exercise a good deal of control over stopword behavior by modifying the following MySQL variables:
ft_min_word_len:
You can qualify as stopwords words that don't meet a particular length. You can specify the minimum required length using this parameter. If you change this parameter, you need to restart the MySQL server daemon and rebuild the indexes.ft_max_word_len:
You can also qualify as stopwords words that exceed a particular length. You can specify this length using this parameter. If you change this parameter, you need to restart the MySQL server daemon and rebuild the indexes.ft_stopword_file:
The file assigned to this parameter contains a list of 544 English words that are automatically filtered out of any search keywords. You can change this to point to another list by setting this parameter to the path and name of the requested list. Alternatively, if you have the option of recompiling the MySQL source, you can modify this list by opening myisam/ft_static.c
and editing the predefined list. In the first case, you need to restart MySQL and rebuild the indexes, whereas in the second case you need to recompile MySQL according to your specifications and rebuild the indexes.Note Rebuilding MySQL's indexes is accomplished with the command REPAIR TABLE table_name USE_FRM
, where table_name
represents the name of the table that you would like to rebuild.
The reason that stopwords are ignored by default is that they presumably occur too frequently in common language to be considered relevant. This can have unintended effects, because MySQL also automatically filters out any keyword that is found to exist in over 50 percent of the records. Consider what happens, for example, if all contributors add a URL pertinent to the Apache Web server, and all include the word Apache
in the description. Executing a full-text search looking for the term Apache
will produce what are surely unexpected results: no records found. If you're working with a small result set, or for other reasons require that this default behavior be ignored, use MySQL's Boolean full-text searching capability, introduced next.
Boolean Full-Text Searches
Boolean full-text searches offer more granular control over search queries, allowing you to explicitly identify which words should and should not be present in candidate results (however, although the 50 percent threshold is ignored, the stopword list is still applicable). For example, Boolean full-text searches can retrieve rows that possess the word Apache
, but not Navajo
, Woodland
, or Shawnee
. Similarly, you can ensure that results include at least one keyword, all keywords, or no keywords; you are free to exercise considerable filtering control over returned results. Such control is maintained via a number of recognized Boolean operators. Several of these operators are presented in Table 36-2.
Table 36-2. Full-Text Search Boolean Operators
Operator | Description |
+ |
A leading plus sign ensures that the ensuing word is present in every result row. |
− |
A leading minus sign ensures that the ensuing word is not present in any row returned. |
* |
A tailing asterisk allows for keyword variations, provided that the variation begins with the string specified by the preceding word. |
" " |
Surrounding double quotes ensure that result rows contain that enclosed string, exactly as it was entered. |
< > |
Preceding greater-than and less-than symbols are used to decrease and increase an ensuing word's relevance to the search rankings, respectively. |
( ) |
Parentheses are used to group words into subexpressions. |
Consider a few brief examples. The first example returns rows containing Apache
, but not manual
:
SELECT name,url FROM bookmarks WHERE MATCH(description)
AGAINST('+Apache -manual' in boolean mode);
The next example returns rows containing the word Apache, but not Shawnee or Navajo:
SELECT name, url FROM bookmarks WHERE MATCH(description)
AGAINST('+Apache -Shawnee -Navajo' in boolean mode);
The final example returns rows containing web
and scripting
, or php
and scripting
, but ranks web scripting
lower than php scripting
:
SELECT name, url FROM bookmarks WHERE MATCH(description)
AGAINST(+(<web >php) +scripting);
Note that this last example will only work if you lower the ft_min_word_len
parameter to 3
.
The following list offers a few tips that you should always keep in mind when incorporating indexes into your database development strategy:
WHERE
and ORDER BY
clauses. Indexing columns in abundance will only result in unnecessary consumption of hard drive space, and will actually slow performance when altering table information. Performance degradation will occur on indexed tables because every time a record is changed, the indexes must be updated.INDEX(firstname, lastname)
, don't create INDEX(firstname)
, because MySQL is capable of searching an index prefix. However, keep in mind that only the prefix is relevant; this multiple-column index will not apply for searches that only target lastname
.NOT NULL
for those columns in which you plan on indexing, so that NULL
values will never be stored.−-log-long-format
option to log queries that aren't using indexes. You can then examine this log file and adjust your queries accordingly.EXPLAIN
statement helps you determine how MySQL will execute a query, showing you how and in what order tables are joined. This can be tremendously useful for determining how to write optimized queries, and whether indexes should be added. Please consult the MySQL manual for more information about the EXPLAIN
statement.The ability to easily drill down into a Web site using hyperlinks is one of the behaviors that made the Web such a popular medium. However, as both Web sites and the Web grew exponentially in size, the ability to execute searches based on user-supplied keywords evolved from convenience to necessity. This section offers several examples demonstrating how easy it is to build Web-based search interfaces for searching a MySQL database.
Many effective search interfaces involve a single text field. For example, suppose you want to provide the human resources department with the ability to look up employee contact information by last name. To implement this task, the query will examine the lastname
column found in the employees table. A sample interface for doing so is shown in Figure 36-1.
Figure 36-1. A simple search interface
Listing 36-1 implements this interface, passing the requested last name into the search query. If the number of returned rows is greater than zero, each is output; otherwise, an appropriate message is offered.
Listing 36-1. Searching the Employee Table (simplesearch.php)
<p>
Search the employee database:<br />
<form action="simplesearch.php" method="post">
Last name:<br />
<input type="text" name="lastname" size="20" maxlength="40" value="" /><br />
<input type="submit" value="Search!" />
</form>
</p>
<?php
// If the form has been submitted with a supplied last name
if (isset($_POST['lastname'])) {
// Connect to server and select database
$mysqldb = new mysqli("localhost","websiteuser","secret","corporate");
// Set the posted variable to a convenient name
$lastname = mysqli_real_escape_string($_POST['lastname']);
// Create the query
$query = "SELECT firstname, lastname, email FROM employees WHERE
lastname='$lastname'";
// Query the employees table
$result = $mysqldb->query($query);
// If records found, output firstname, lastname, and email field of each
if ($result->num_rows > 0) {
while ($row = $result->fetch_object())
echo "$row->lastname, $row->firstname ($row->email)<br />";
} else {
echo "No results found.";
}
}
?>
Therefore, entering Gilmore
into the search interface would return results similar to the following:
Gilmore, Jason ([email protected])
Although this simple search interface is effective, what happens if the user doesn't know the employee's last name? What if the user knows another piece of information, such as the e-mail address? Listing 36-2 modifies the original example so that it can handle input originating from the form depicted in Figure 36-2.
Figure 36-2. The search form revised
Listing 36-2. Extending the Search Capabilities (searchextended.php)
<p>
Search the employee database:<br />
<form action="searchextended.php" method="post">
Keyword:<br />
<input type="text" name="keyword" size="20" maxlength="40" value="" /><br />
Field:<br />
<select name="field">
<option value="">Choose field:</option>
<option value="lastname">Last Name</option>
<option value="email">Email Address</option>
</select>
<input type="submit" value="Search!" />
</form>
</p>
<?php
// If the form has been submitted with a supplied keyword
if (isset($_POST['field'])) {
// Connect to server and select database
$mysqldb = new mysqli("localhost","websiteuser","secret","corporate");
// Set the posted variables to convenient names
$keyword = $mysqldb->mysqli_real_escape_string($_POST['keyword']);
$field = $mysqldb->mysqli_real_escape_string($_POST['field']);
// Create the query
if ($field == "lastname" ) {
$result = $mysqldb->query("SELECT firstname, lastname, email
FROM employees WHERE lastname='$keyword'");
} elseif ($field == "email") {
$result = $mysqldb->query("SELECT firstname, lastname, email
FROM employees WHERE email='$keyword'");
}
// If records found, output firstname, lastname, and email field
if ($result->num_rows > 0) {
while ($row = $result->fetch_object())
echo "$row->lastname, $row->firstname ($row->email)<br />";
} else {
echo "No results found.";
}
}
?>
Therefore, setting the field to E-mail Address and inputting [email protected] as the keyword would return results similar to the following:
Gilmore, Jason ([email protected])
Of course, in both examples, you'd need to put additional controls in place to sanitize data and ensure that the user receives detailed responses if he supplies invalid input. Nonetheless, the basic search process should be apparent.
Performing a full-text search is really no different from executing any other selection query; only the query looks different, a detail that remains hidden from the user. As an example, Listing 36-3 implements the search interface depicted in Figure 36-3, demonstrating how to search the bookmarks
table's description
column.
Figure 36-3. A full-text search interface
Listing 36-3. Implementing Full-Text Search
<p>
Search the online resources database:<br />
<form action="fulltextsearch.php" method="post">
Keywords:<br />
<input type="text" name="keywords" size="20" maxlength="40" value="" /><br />
<input type="submit" value="Search!" />
</form>
</p>
<?php
// If the form has been submitted with supplied keywords
if (isset($_POST['keywords'])) {
// Connect to server and select database
$mysqldb = new mysqli("localhost","websiteuser","secret","corporate");
// Retrieve the search keyword string
$keywords = $mysqldb->mysqli_real_escape_string($_POST['keywords']);
// Create the query
$result = $mysqldb->query("SELECT name, url FROM bookmarks
WHERE MATCH(description) AGAINST('$keywords')");
// Output retrieved rows or display appropriate message
if ($result->num_rows > 0) {
while ($row = $result->fetch_object())
echo "<a href="$row->url">$row->name</a><br />";
} else {
echo "No results found.";
}
}
?>
To extend the user's full-text search capabilities, consider offering a help page demonstrating MySQL's Boolean search features.
Table indexing is a sure-fire way to optimize queries. This chapter introduced table indexing and showed you how to create primary, unique, normal, and full-text indexes. You then learned just how easy it is to create PHP-enabled search interfaces for querying your MySQL tables.
The next chapter introduces MySQL's transaction-handling feature and shows you how to incorporate transactions into your Web applications.
18.222.53.26