CHAPTER 36

Indexes and Searching

Chapter 28 introduced the utility of the PRIMARY and UNIQUE keys, defining the role of each and showing you how to recognize and incorporate them into your table structures. However, indexing plays such an important role in database development that it's worth devoting some additional time to this feature. In this chapter, the following topics are covered:

  • Database indexing: The first half of this chapter introduces general database indexing terminology and concepts, and discusses primary, unique, normal, and full-text MySQL indexes.
  • Forms-based searches: The second half of this chapter shows you how to create PHP-enabled search interfaces for querying your newly indexed MySQL tables.

Database Indexing

An index is essentially an ordered (or indexed) subset of table columns, with each row entry pointing to its corresponding table row. Generally speaking, introducing indexing into your MySQL database development strategy gives you three advantages:

  • Query optimization: Data is stored in a table in the same order in which you enter it. However, this order may not coincide with the order in which you'd like to access it. For instance, suppose you batch-insert a list of products ordered according to SKU. Chances are your online store visitors will search for these products according to name. Because database searches can be most efficiently executed when the target data is ordered (in this case alphabetically), it makes sense to index the product's name column, in addition to any other column that will be frequently searched.
  • Uniqueness: Often, a means is required for identifying a data row based on some value or set of values that is known to be unique to that row. For example, consider a table that stores information about company staff members. This table might include information about each staff member's first and last name, telephone number, and Social Security number. Although it's possible that two or more staff members could share the same name (John Smith, for example) or share the same phone number (if they share an office, for example), you know that no two employees will possess the same Social Security number, thereby guaranteeing uniqueness for each row.
  • Text searching: With the introduction of full-text indexes in MySQL version 3.23.23, users now have the opportunity to optimize searching against even large amounts of text located in any field indexed as such.

There are four general categories of indexes: primary, unique, normal, and full-text. Each type is introduced in this section.

Primary Key Indexes

The primary key index is the most common type of index found in relational databases. It's used to uniquely identify each row as a result of the primary key's own uniqueness. Therefore, the key must be either a value that the entity represented by the row uniquely possesses, or a unique value generated by the database, such as an automatically incrementing integer value. As a result, regardless of whether preexisting rows are subsequently deleted, every row will have a unique primary index. For example, suppose you want to create a database of useful Web sites for your company's IT team. This table might look like the following:

CREATE TABLE bookmarks (
   id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
   name VARCHAR(75) NOT NULL,
   url VARCHAR(200) NOT NULL,
   description MEDIUMTEXT NOT NULL,
   PRIMARY KEY(id));

Because the id column automatically increments (beginning with 1) with each insertion, it's not possible for the bookmarks table to ever contain multiple rows containing exactly the same cells. For instance, consider the following three queries:

INSERT INTO bookmarks (name, url, description)
       VALUES("Apress", "www.apress.com", "Computer books");
INSERT INTO bookmarks (name, url, description)
       VALUES("Google", "www.google.com", "Search engine");
INSERT INTO bookmarks (name, url, description)
       VALUES("W. Jason Gilmore", "www.wjgilmore.com", "Jason's website");

Executing these three queries and retrieving the table produces the following output:


+-------+------------------+-------------------+-----------------+
| id    | name             | url               | description     |
+------ +------------------+-------------------+-----------------+
|     1 | Apress           | www.apress.com    | Computer books  |
|     2 | Google           | www.google.com    | Search engine   |
|     3 | W. Jason Gilmore | www.wjgilmore.com | Jason's website |
+-------+------------------+-------------------+-----------------+

Note how the id column increments with each insertion, ensuring row uniqueness.


Note You can have only one automatically incrementing column per table, and that column must be designated as the primary key. Furthermore, any column designated as a primary key cannot hold NULL values; even if not explicitly declared as NOT NULL, MySQL will automatically assign this trait.


It is typically ill-advised to create a primary index that allows the developer to divine some information about the row it represents. The reason why is demonstrated with an illustration. Rather than use an integer value as the bookmarks table's primary index, suppose you decide to use the url instead. The repercussions involved in making such a decision should be obvious. First, what happens if the URL changes, due to a trademark issue or an acquisition, for example? Even Social Security numbers, values once taken for granted as being unique, can be changed due to the repercussions of identity theft. Save yourself the hassle and always use a primary index that offers no insight into the data it represents; it should be an autonomous vehicle with the sole purpose of ensuring the ability to uniquely identify a data record.

Unique Indexes

Like a primary index, a unique index prevents duplicate values from being created. However, the difference is that only one primary index is allowed per table, whereas multiple unique indexes are supported. With this possibility in mind, consider again the bookmarks table from the previous section. Although it's conceivable that two Web sites could share the same name—for example, "Great PHP resource"—it wouldn't make sense to repeat URLs. This sounds like an ideal unique index:

CREATE TABLE bookmarks (
   id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
   name VARCHAR(75) NOT NULL,
   url VARCHAR(200) NOT NULL UNIQUE,
   description MEDIUMTEXT NOT NULL,
   PRIMARY KEY(id));

As mentioned, it's possible to designate multiple fields as unique in a given table. For instance, suppose you want to prevent contributors to the link repository from repeatedly designating nondescriptive names ("cool site," for example) when inserting a new Web site. Again returning to the bookmarks table, define the name column as unique:

CREATE TABLE bookmarks (
   id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
   name VARCHAR(75) NOT NULL UNIQUE,
   url VARCHAR(200) NOT NULL UNIQUE,
   description MEDIUMTEXT NOT NULL,
   PRIMARY KEY(id));

You can also specify a multiple-column unique index. For example, suppose you want to allow your contributors to insert duplicate url values, and even duplicate name values, but you do not want duplicate name and url combinations to appear. You can enforce such restrictions by creating a multiple-column unique index. Revisiting the original bookmarks table:

CREATE TABLE bookmarks (
   id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
   name VARCHAR(75) NOT NULL,
   url VARCHAR(200) NOT NULL,
   UNIQUE(name,url),
   description MEDIUMTEXT NOT NULL,
   PRIMARY KEY(id));

Given this configuration, the following name and url value pairs could all simultaneously reside in the same table:

Apress site, http://www.apress.com/
Apress site, http://blogs.apress.com/
Blogs, http://www.apress.com/
Apress blogs, http://blogs.apress.com/

However, attempting to insert any of the preceding combinations again will result in an error because duplicate combinations of name and url are illegal.

Normal Indexes

Quite often you'll want to optimize searches on fields other than those designated as primary or even unique. Because this is a commonplace occurrence, it only makes sense that it should be possible to optimize such searches by indexing these fields. Such indexes are typically called normal, or ordinary.

Single-Column Normal Indexes

A single-column normal index should be used if a particular column in your table will be the focus of a considerable number of your selection queries. For example, suppose an employee profile table consists of four columns: a unique row ID, first name, last name, and e-mail address. You know that the majority of the searches will be specific to either the employee's last name or the e-mail address. You should create one normal index for the last name and a unique index for the e-mail address, like so:

CREATE TABLE employees (
   id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
   firstname VARCHAR(35) NOT NULL,
   lastname VARCHAR(35) NOT NULL,
   email VARCHAR(55) NOT NULL UNIQUE,
   INDEX (lastname),
   PRIMARY KEY(id));

Building on this idea, MySQL offers the feature of creating partial-column indexes, based on the idea that the first N characters of a given column often are enough to ensure uniqueness, where N is specified within the index creation parameters. Creating partial-column indexes requires less disk space and is considerably faster than indexing the entire column. Revisiting the previous example, you can imagine that using the first five characters of the last name suffices to ensure accurate retrieval:

CREATE TABLE employees (
   id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
   firstname VARCHAR(35) NOT NULL,
   lastname VARCHAR(35) NOT NULL,
   email VARCHAR(55) NOT NULL UNIQUE,
   INDEX (lastname(5)),
   PRIMARY KEY(id));

Often, however, selection queries are a function of including multiple columns. After all, more complex tables might require a query consisting of several columns before the desired data can be retrieved. Run time on such queries can be decreased greatly through the institution of multiple-column normal indexes, discussed next.

Multiple-Column Normal Indexes

Multiple-column indexing is recommended when you know that a number of specified columns will often be used together in retrieval queries. MySQL's multiple-column indexing approach is based upon a strategy known as leftmost prefixing. Leftmost prefixing states that any multiple-column index including columns A, B, and C will improve performance on queries involving the following column combinations:

  • A, B, C
  • A, B
  • A

Here's how you create a multiple-column MySQL index:

CREATE TABLE employees (
   id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
   lastname VARCHAR(35) NOT NULL,
   firstname VARCHAR(35) NOT NULL,
   email VARCHAR(55) NOT NULL UNIQUE,
   INDEX name (lastname, firstname),
   PRIMARY KEY(id));

This creates two indexes (in addition to the primary key index). The first is the unique index for the e-mail address. The second is a multiple-column index, consisting of two columns, lastname and firstname. This is quite useful, because it increases the search speed when queries involve any of the following column combinations:

  • lastname, firstname
  • lastname

Driving this point home, the following queries would benefit from the multiple-column index:

SELECT email FROM employees WHERE lastname="Geronimo" AND firstname="Ed";
SELECT lastname FROM employees WHERE lastname="Geronimo";

The following query would not benefit:

SELECT lastname FROM employees WHERE firstname="Ed";

To gain performance on this query, you'd need to create separate indexes for the firstname column.

Full-Text Indexes

Full-text indexes offer an efficient means for searching text stored in CHAR, VARCHAR, or TEXT datatypes. Before delving into examples, a bit of background regarding MySQL's special handling of this index is in order.

Because MySQL assumes that full-text searches will be implemented for sifting through large amounts of natural-language text, it provides a mechanism for retrieving data that produces results that best fit the user's desired result. More specifically, if a user were to search using a string like Apache is the world's most popular web server, the words is and the should play little or no role in determining result relevance. In fact, MySQL splits searchable text into words, by default eliminating any word of fewer than four characters. You'll learn how to modify this behavior later in this section.

Creating a full-text index is much like creating indexes of other types. As an example, revisit the bookmarks table created earlier in this chapter, indexing its description column using the full-text variant:

CREATE TABLE bookmarks (
   id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
   name VARCHAR(75) NOT NULL,
   url VARCHAR(200) NOT NULL,
   description MEDIUMTEXT NOT NULL,
   FULLTEXT(description),
   PRIMARY KEY(id));

In addition to the typical primary index, this example creates a full-text index consisting of the description column. For demonstration purposes, Table 36-1 presents the data found in the bookmarks table.

Table 36-1. Sample Table Data

id name url description
1 Python.org http://www.python.org/ The official Python Web site
2 MySQL manual http://dev.mysql.com/doc/ The MySQL reference manual
3 Apache site http://httpd.apache.org/ Includes Apache 2 manual
4 PHP: Hypertext http://www.php.net/ The official PHP Web site
5 Apache Week http://www.apacheweek.com/ Offers a dedicated Apache 2 section

Whereas creating full-text indexes is much like creating other types of indexes, retrieval queries based on the full-text index are different. When retrieving data based on full-text indexes, SELECT queries use two special MySQL functions, MATCH() and AGAINST(). With these functions, natural-language searches can be executed against the full-text index, like so:

SELECT name,url FROM bookmarks WHERE MATCH(description) AGAINST('Apache 2'),

The results returned look like this:


+-------------+----------------------------+
| name        | url                        |
+------------------------------------------+
| Apache site | http://httpd.apache.org    |
| Apache Week | http://www.apacheweek.com  |
+-------------+----------------------------+

This lists the rows in which "Apache" is found in the description column, in order of highest relevance. Remember that the "2" is ignored because of its length. When MATCH() is used in a WHERE clause, relevance is defined in terms of how well the returned row matches the search string. Alternatively, the functions can be incorporated into the query body, returning a list of weighted scores for matching rows; the higher the score, the greater the relevance. An example follows:

SELECT MATCH(description) AGAINST('Apache 2') FROM bookmarks;

Upon execution, MySQL will search every row in the bookmarks table, calculating relevance values for each row, like so:


+----------------------------------------+
| match(description) against('Apache 2') |
+----------------------------------------+
|                                      0 |
|                                      0 |
|                       0.57014514171969 |
|                                      0 |
|                       0.38763393589171 |
+----------------------------------------+

You can also take advantage of a feature known as query expansion, which is particularly useful when the user is making certain presumptions that might not otherwise necessarily be built into the application's search logic. For example, suppose the user was searching for the term "football." Logically he would also be interested in rows including terms such as "Pittsburgh Steelers," "Ohio State Buckeyes," and "Woody Hayes." To compensate for this, you can include the WITH QUERY EXPANSION clause, which will first retrieve all rows including the term "football," and then will search all rows again, this time retrieving all rows having any of the words found in the rows of the first set of results.

Therefore, returning to the example, a row including "Pittsburgh" would be retrieved in the second search even if it didn't also contain the term "football," provided a row found in the first search included the terms "football" and "Pittsburgh." While this can certainly result in more thorough searches, it could of course have unexpected side effects, such as a row being returned because it has the term "Pittsburgh" in it, yet having absolutely nothing to do with football.

Starting with MySQL version 4.01, Boolean-oriented, full-text searches were introduced. This feature is introduced later in this section.

Stopwords

As mentioned earlier, MySQL by default ignores any keywords of fewer than four characters. These words, along with those found in a predefined list built into the MySQL server, are known as stopwords, or words that should be ignored. You can exercise a good deal of control over stopword behavior by modifying the following MySQL variables:

  • ft_min_word_len: You can qualify as stopwords words that don't meet a particular length. You can specify the minimum required length using this parameter. If you change this parameter, you need to restart the MySQL server daemon and rebuild the indexes.
  • ft_max_word_len: You can also qualify as stopwords words that exceed a particular length. You can specify this length using this parameter. If you change this parameter, you need to restart the MySQL server daemon and rebuild the indexes.
  • ft_stopword_file: The file assigned to this parameter contains a list of 544 English words that are automatically filtered out of any search keywords. You can change this to point to another list by setting this parameter to the path and name of the requested list. Alternatively, if you have the option of recompiling the MySQL source, you can modify this list by opening myisam/ft_static.c and editing the predefined list. In the first case, you need to restart MySQL and rebuild the indexes, whereas in the second case you need to recompile MySQL according to your specifications and rebuild the indexes.

Note Rebuilding MySQL's indexes is accomplished with the command REPAIR TABLE table_name USE_FRM, where table_name represents the name of the table that you would like to rebuild.


The reason that stopwords are ignored by default is that they presumably occur too frequently in common language to be considered relevant. This can have unintended effects, because MySQL also automatically filters out any keyword that is found to exist in over 50 percent of the records. Consider what happens, for example, if all contributors add a URL pertinent to the Apache Web server, and all include the word Apache in the description. Executing a full-text search looking for the term Apache will produce what are surely unexpected results: no records found. If you're working with a small result set, or for other reasons require that this default behavior be ignored, use MySQL's Boolean full-text searching capability, introduced next.

Boolean Full-Text Searches

Boolean full-text searches offer more granular control over search queries, allowing you to explicitly identify which words should and should not be present in candidate results (however, although the 50 percent threshold is ignored, the stopword list is still applicable). For example, Boolean full-text searches can retrieve rows that possess the word Apache, but not Navajo, Woodland, or Shawnee. Similarly, you can ensure that results include at least one keyword, all keywords, or no keywords; you are free to exercise considerable filtering control over returned results. Such control is maintained via a number of recognized Boolean operators. Several of these operators are presented in Table 36-2.

Table 36-2. Full-Text Search Boolean Operators

Operator Description
+ A leading plus sign ensures that the ensuing word is present in every result row.
A leading minus sign ensures that the ensuing word is not present in any row returned.
* A tailing asterisk allows for keyword variations, provided that the variation begins with the string specified by the preceding word.
" " Surrounding double quotes ensure that result rows contain that enclosed string, exactly as it was entered.
< > Preceding greater-than and less-than symbols are used to decrease and increase an ensuing word's relevance to the search rankings, respectively.
( ) Parentheses are used to group words into subexpressions.

Consider a few brief examples. The first example returns rows containing Apache, but not manual:

SELECT name,url FROM bookmarks WHERE MATCH(description)
   AGAINST('+Apache -manual' in boolean mode);

The next example returns rows containing the word Apache, but not Shawnee or Navajo:

SELECT name, url FROM bookmarks WHERE MATCH(description)
   AGAINST('+Apache -Shawnee -Navajo' in boolean mode);

The final example returns rows containing web and scripting, or php and scripting, but ranks web scripting lower than php scripting:

SELECT name, url FROM bookmarks WHERE MATCH(description)
   AGAINST(+(<web >php) +scripting);

Note that this last example will only work if you lower the ft_min_word_len parameter to 3.

Indexing Best Practices

The following list offers a few tips that you should always keep in mind when incorporating indexes into your database development strategy:

  • Only index those columns that are required in WHERE and ORDER BY clauses. Indexing columns in abundance will only result in unnecessary consumption of hard drive space, and will actually slow performance when altering table information. Performance degradation will occur on indexed tables because every time a record is changed, the indexes must be updated.
  • If you create an index such as INDEX(firstname, lastname), don't create INDEX(firstname), because MySQL is capable of searching an index prefix. However, keep in mind that only the prefix is relevant; this multiple-column index will not apply for searches that only target lastname.
  • Use the attribute NOT NULL for those columns in which you plan on indexing, so that NULL values will never be stored.
  • Use the −-log-long-format option to log queries that aren't using indexes. You can then examine this log file and adjust your queries accordingly.
  • The EXPLAIN statement helps you determine how MySQL will execute a query, showing you how and in what order tables are joined. This can be tremendously useful for determining how to write optimized queries, and whether indexes should be added. Please consult the MySQL manual for more information about the EXPLAIN statement.

Forms-Based Searches

The ability to easily drill down into a Web site using hyperlinks is one of the behaviors that made the Web such a popular medium. However, as both Web sites and the Web grew exponentially in size, the ability to execute searches based on user-supplied keywords evolved from convenience to necessity. This section offers several examples demonstrating how easy it is to build Web-based search interfaces for searching a MySQL database.

Performing a Simple Search

Many effective search interfaces involve a single text field. For example, suppose you want to provide the human resources department with the ability to look up employee contact information by last name. To implement this task, the query will examine the lastname column found in the employees table. A sample interface for doing so is shown in Figure 36-1.

image

Figure 36-1. A simple search interface

Listing 36-1 implements this interface, passing the requested last name into the search query. If the number of returned rows is greater than zero, each is output; otherwise, an appropriate message is offered.

Listing 36-1. Searching the Employee Table (simplesearch.php)

<p>
Search the employee database:<br />
<form action="simplesearch.php" method="post">
   Last name:<br />
   <input type="text" name="lastname" size="20" maxlength="40" value="" /><br />
   <input type="submit" value="Search!" />
</form>
</p>

<?php

   // If the form has been submitted with a supplied last name
   if (isset($_POST['lastname'])) {

      // Connect to server and select database

      $mysqldb = new mysqli("localhost","websiteuser","secret","corporate");

      // Set the posted variable to a convenient name
      $lastname = mysqli_real_escape_string($_POST['lastname']);

      // Create the query

      $query = "SELECT firstname, lastname, email FROM employees WHERE
                lastname='$lastname'";

      // Query the employees table
      $result = $mysqldb->query($query);

      // If records found, output firstname, lastname, and email field of each
      if ($result->num_rows > 0) {

         while ($row = $result->fetch_object())
            echo "$row->lastname, $row->firstname ($row->email)<br />";
      } else {
         echo "No results found.";
      }
   }
?>

Therefore, entering Gilmore into the search interface would return results similar to the following:


Gilmore, Jason ([email protected])

Extending Search Capabilities

Although this simple search interface is effective, what happens if the user doesn't know the employee's last name? What if the user knows another piece of information, such as the e-mail address? Listing 36-2 modifies the original example so that it can handle input originating from the form depicted in Figure 36-2.

image

Figure 36-2. The search form revised

Listing 36-2. Extending the Search Capabilities (searchextended.php)

<p>
Search the employee database:<br />
<form action="searchextended.php" method="post">
   Keyword:<br />
   <input type="text" name="keyword" size="20" maxlength="40" value="" /><br />
   Field:<br />
   <select name="field">
      <option value="">Choose field:</option>
      <option value="lastname">Last Name</option>
      <option value="email">Email Address</option>
      </select>
   <input type="submit" value="Search!" />
</form>
</p>
<?php
   // If the form has been submitted with a supplied keyword
   if (isset($_POST['field'])) {

      // Connect to server and select database
      $mysqldb = new mysqli("localhost","websiteuser","secret","corporate");

      // Set the posted variables to convenient names
      $keyword = $mysqldb->mysqli_real_escape_string($_POST['keyword']);
      $field = $mysqldb->mysqli_real_escape_string($_POST['field']);

      // Create the query
      if ($field == "lastname" ) {
         $result = $mysqldb->query("SELECT firstname, lastname, email
                                    FROM employees WHERE lastname='$keyword'");
      } elseif ($field == "email") {
         $result = $mysqldb->query("SELECT firstname, lastname, email
                                    FROM employees WHERE email='$keyword'");
      }

      // If records found, output firstname, lastname, and email field
      if ($result->num_rows > 0) {
         while ($row = $result->fetch_object())
            echo "$row->lastname, $row->firstname ($row->email)<br />";
      } else {
         echo "No results found.";
      }
   }

?>

Therefore, setting the field to E-mail Address and inputting [email protected] as the keyword would return results similar to the following:


Gilmore, Jason ([email protected])

Of course, in both examples, you'd need to put additional controls in place to sanitize data and ensure that the user receives detailed responses if he supplies invalid input. Nonetheless, the basic search process should be apparent.

Performing a Full-Text Search

Performing a full-text search is really no different from executing any other selection query; only the query looks different, a detail that remains hidden from the user. As an example, Listing 36-3 implements the search interface depicted in Figure 36-3, demonstrating how to search the bookmarks table's description column.

image

Figure 36-3. A full-text search interface

To extend the user's full-text search capabilities, consider offering a help page demonstrating MySQL's Boolean search features.

Summary

Table indexing is a sure-fire way to optimize queries. This chapter introduced table indexing and showed you how to create primary, unique, normal, and full-text indexes. You then learned just how easy it is to create PHP-enabled search interfaces for querying your MySQL tables.

The next chapter introduces MySQL's transaction-handling feature and shows you how to incorporate transactions into your Web applications.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.53.26