More Advanced Uses of SELECT

So far today you have seen the fundamental uses of the SELECT query. You'll now see how to use this versatile query in some more sophisticated ways.

Restricting the Resultset with HAVING

On some occasions when you're writing a query, you want to restrict the resultset on the basis of some data in the resultset that has been aggregated.

For example, suppose that you want to find, out of the table of child_products, which age range has products costing less than $10.00.

You want to write this:

SELECT age, MIN(price) FROM child_products
  WHERE MIN(price)<10.00
  GROUP BY age

This query, if it worked, would group the products by age, use MIN() to find the minimum price in each group, and return results where the minimum price is less than $10.00.

But it doesn't work; the WHERE clause just won't allow it. Even if you use an alias, such as SELECT ... MIN(price) AS mp, it won't work.

The way around the problem is to use a HAVING clause. You can think of it like this: a HAVING clause sits back and lets the rest of the query do its work and create an initial resultset, and then applies a restriction to the resultset.

HAVING is like an extra WHERE clause, applied to the resultset rather than the tables. Its syntax is this:

SELECT
  select_expression,...
  ...
  [HAVING where_definition]
  ...

HAVING uses a where_definition that you can construct in the same way as in a WHERE clause. You must place HAVING after any GROUP BY clause but before any ORDER BY clause.

Here's how you could use it to solve the example problem of finding products whose minimum group price is less than $10.00:

mysql> SELECT age, MIN(price) FROM child_products
    -> GROUP BY age
    -> HAVING MIN(price)<10.00;
+------+------------+
| age  | MIN(price) |
+------+------------+
| 3-5  |       7.50 |
+------+------------+

As you can see, the query groups the data by age, finds the minimum price in each group, and then applies the HAVING condition to the minimum price.

MySQL also lets you use an alias to do the same thing (although this would not be allowed in the ANSI-92 standard). Here the alias is called lowest:

mysql> SELECT age, MIN(price) AS lowest FROM child_products
    -> GROUP BY age
    -> HAVING lowest<10.00;
+------+--------+
| age  | lowest |
+------+--------+
| 3-5  |   7.50 |
+------+--------+
						

Caution

MySQL optimizes the WHERE clause so as to run the SELECT query in the most efficient way. However, the HAVING clause is not optimized.

Therefore, never place a condition in a HAVING clause that could otherwise be placed in a WHERE clause.


Selecting into Another Table or File

Instead of running a query and sending the resultset to the user, the mysql console, or an application program, you may want to use the resultset to populate another table, or even to save it in a file.

You should recall from yesterday's lesson how we did this:

mysql> INSERT INTO subscribers
    -> SELECT * FROM tmp;
						

You just have to place an INSERT statement before a SELECT statement, and the resultset from the SELECT is inserted as rows into the table specified by the INSERT.

Remember that the number of columns and their types must correspond between the resultset data and the table you want to populate. Go back and study yesterday's lesson again if you need to recap on this.

Saving your resultset into a file is even simpler. Place the following INTO syntax in your SELECT query:

SELECT
  select_expression,...
  [INTO {OUTFILE | DUMPFILE} '/path/to/filename' export_options]
  ...

The keywords INTO OUTFILE cause the resultset to be written to a file given by /path/to/filename (which must not exist already).

For example, you could extract the whole child_products table to a file like this:

mysql> SELECT * INTO OUTFILE '/tmp/prods.txt'
							-> FROM child_products;
Query OK, 5 rows affected (0.02 sec)

The file produced would look like this by default (using the Unix cat program to display it):

$ cat prods.txt
101     Toy train       3-5     12.99
102     Racing car      3-5     9.99
103     Spinning top    3-5     7.50
104     Teddy bear      0-2     12.50
105     Kitchen         2-3     24.99

You can use export_options to control the field delimiters, end-of-line characters, and other formatting. The options are the same as for LOAD DATA INFILE, which you met in yesterday's lesson. LOAD DATA INFILE is ideally suited for reading a data file that has been created by SELECT INTO OUTFILE.

Note

SELECT INTO OUTFILE is the SQL equivalent of mysqldump, a useful command-line utility. You will learn how to use mysqldump in Day 15, “Administration.”


The INTO DUMPFILE option is a little more specialized. It is for writing a single row of data to a file, without any line terminations or escaping of special characters. You would rarely use this, except if you need to extract BLOB data from a MySQL table and create a file of it.

For example, if you have an image stored as a BLOB in a table pictures, you might do this:

SELECT image INTO DUMPFILE '/path/to/image_filename/'
FROM pictures WHERE id=123

Note

The mysql user needs write permission to the directory where you want to create the output file, and the user executing the query needs the File_priv privilege. See Day 14, “Security,” for more about user privileges.


Full-text Searching

You can use MySQL to perform full-text searching. Introduced in version 3.23.23, and improved in version 4.0.1, this feature of MySQL is developing quickly and becoming increasingly powerful.

Full-text searching allows you to create queries that search text columns in a similar way to a search engine on the Web. You can search for rows containing a given string, with the result being weighted according to either natural language rules, or Boolean logic.

Full-text searching has the advantage of not needing such a precise match as = (equals) or LIKE, both of which require the entire search string to exist in the table to find a match. With a full-text search, the words of the search pattern are considered individually and not case-sensitively, and only one of the search words needs to match for a row to be retrieved. Results are by default returned in order of match quality, with the best match first.

A full-text search system is ideal for databases with large quantities of textual data, such as a content management system, in which articles have hundreds or thousands of words to be searched.

However, to illustrate things so that you can see what's going on clearly, let's consider a sample scenario of a table of cities. You want to be able to give a city name to a SELECT query and retrieve cities with similar names. (This might be useful for a “yellow pages” type directory; if the search engine isn't sure exactly what city you mean, it might reply with “Do you mean...?”)

To run a full-text search, you have to have a FULLTEXT index applied to the column or columns you want to search in. (Don't worry too much about how indexes work right now; you'll cover FULLTEXT and other types of index in Day 9, “Joins and Indexes.”)

To add a FULLTEXT index to a table of cities, on the column city_name, you would do this:

mysql> ALTER TABLE cities ADD FULLTEXT(city_name);
						

If the table is large, this may take a little time because MySQL has to read every word in the given column and create an entry in the index for it.

The full-text search operation itself involves a SELECT with a MATCH...AGAINST operator. The MATCH clause has the following syntax:

MATCH (column_name1[, column_name2, ...]) AGAINST ('search_string')

Let's say that you want to search the cities table for cities with names like “Palm Springs.” You could write a full-text query like this:

mysql> SELECT city_name FROM cities
    -> WHERE MATCH (city_name) AGAINST ('Palm Springs'),
+-----------------------+
| city_name             |
+-----------------------+
| Palm Springs          |
| Palm Bay              |
| Palm Desert           |
| Palm Coast            |
| Palm Beach            |
| North Palm Beach      |
| Palm Beach Shores     |
| West Palm Beach       |
| Hot Springs           |
...
| West Siloam Springs   |
| White Sulphur Springs |
+-----------------------+

This is the simplest type of full-text search. The MATCH...AGAINST was placed in the WHERE clause. The search compares the given words of the search_string with the FULLTEXT index on city_name. The search is efficient and takes a short time to run compared to using LIKE.

The results look sensible: they all contain at least one of the search words, and they're sorted into the order of best match first. The first row of the resultset is a perfect match, later results match only one of the words, and the worst results are those in which the search words occupy a smaller proportion of the row's value. (The row values are longer, so the search terms appear less “relevant.”) A kind of fuzzy logic is taking place, and MySQL has been fine-tuned to do this in a way that makes the results appear most meaningful.

Caution

MySQL tries to look for relevant matches of the search terms in the table rows, but it tries to ignore certain words (in the search string) that appear too common to be worthy of meaning. It won't see any relevance for:

  • A search word fewer than four characters long

  • Common English words, as defined by its stop-word list

  • Words that occur in more than 50% of the table's rows

For example, if you try to search on the word “Key” you'll get no result because the word is just too short:

mysql> SELECT city_name FROM cities
    -> WHERE MATCH (city_name) AGAINST ('Key'),
Empty set (0.00 sec)

You may consider this unsatisfactory. This behavior can be changed by modifying the MySQL variable ft_min_word_len. You can see what this is set to on your system by doing SHOW VARIABLES, like this:

mysql> SHOW VARIABLES LIKE 'ft%';
+--------------------------+----------------+
| Variable_name            | Value          |
+--------------------------+----------------+
| ft_min_word_len          | 4              |
| ft_max_word_len          | 254            |
| ft_max_word_len_for_sort | 20             |
| ft_boolean_syntax        | + -><()~*:""&| |
+--------------------------+----------------+

You'll see how to adjust this variable in Day 15. It's also possible to edit the stop-word list, but you can do this only on a source distribution. In the myisam/ directory under the MySQL source directory, you will find a file called ft_static.c. You will need to edit the word list in this file and recompile MySQL to make the change take effect.

To overcome the situation of rows not being returned if the search string occurs in more than 50% of rows, you can perform a full-text search in Boolean mode. You'll see how to do this in a moment.


It's possible to see what level of relevance is being calculated by MySQL when it performs its full-text comparison. You just need to repeat the MATCH...AGAINST expression in the select_expression:

mysql> SELECT city_name, MATCH (city_name) AGAINST ('Palm Springs') FROM cities
    -> WHERE MATCH (city_name) AGAINST ('Palm Springs'),
+-----------------------+--------------------------------------------+
| city_name             | MATCH (city_name) AGAINST ('Palm Springs') |
+-----------------------+--------------------------------------------+
| Palm Springs          |                            14.176851272583 |
| Palm Bay              |                            7.9398336410522 |
| Palm Desert           |                            7.8505783081055 |
| Palm Coast            |                            7.8505783081055 |
| Palm Beach            |                            7.8505783081055 |
| North Palm Beach      |                             7.763307094574 |
| Palm Beach Shores     |                             7.763307094574 |
| West Palm Beach       |                             7.763307094574 |
| Hot Springs           |                            6.3981976509094 |
...
| West Siloam Springs   |                               6.2559466362 |
| White Sulphur Springs |                               6.2559466362 |
+-----------------------+--------------------------------------------+

Because the MATCH...AGAINST expression is the same in both places, the MySQL query optimizer recognizes this and only runs the full-text code once, so it's not wasteful of system resources.

As you can see in the preceding output, the right-hand column of the resultset is a non-negative floating-point number representing a relevance value of the search string compared with each row.

Although the default behavior is to list best matches first (as it has done in the preceding example), you can override this with other sort criteria if you prefer. You just need to add an ORDER BY clause.

As well as running full-text searches using the fuzzy matching process, you can run them in Boolean mode. Boolean, or logic-based full-text searching, is done by placing the words IN BOOLEAN MODE in the parentheses of the AGAINST clause, like this:

mysql> SELECT city_name FROM cities
    -> WHERE MATCH (city_name) AGAINST ('+Palm +Springs' IN BOOLEAN MODE);
+--------------+
| city_name    |
+--------------+
| Palm Springs |
+--------------+

This returned just one result, and a precise match! How did it do this?

The AGAINST clause contains a + (plus sign) before each of the words: the example used +Palm +Springs.

The + before a word specifies that rows must contain that word for the row to be considered a match. If several words are preceded by a +, all words must be present (but in any order).

You could instead put a - (minus sign) before the word, specifying that rows must not contain that word. For example:

mysql> SELECT city_name FROM cities
    -> WHERE MATCH (city_name) AGAINST ('+Palm -Springs' IN BOOLEAN MODE);
+-------------------+
| city_name         |
+-------------------+
| North Palm Beach  |
| Palm Beach        |
| West Palm Beach   |
| Palm Beach Shores |
| Palm Desert       |
| Palm Bay          |
| Palm Coast        |
+-------------------+

This time the search looked for cities containing Palm, but without Springs.

Caution

When performing full-text searching in Boolean mode, MySQL does not ignore matches on words present in more than 50% of rows, but does still consider stop-words and short words (less than four characters by default) irrelevant and returns an empty set if nothing more interesting is given in the search string.


In addition to the + and - modifiers in Boolean mode, several more modifiers can be used to fine-tune the way searching takes place. Table 8.1 lists the modifiers and their meanings.

Table 8.1. Modifiers for Full-text Searching in Boolean Mode
ModifierMeaning
+Word must be present in all rows returned.
-Word must not be present in any rows returned.
()Used to group words and modifiers.
>Give this word a greater relevance.
<Give this word a lesser relevance.
*Can be added to the end of a search word for wildcard matching.
"Apply the match on the words between the quotes.

The following query looks for cities containing the word South, and either Fork or Kensington, with Kensington being declared as rated higher (using>) than Fork (using <). It uses parentheses to process the Kensington and Fork comparisons together:

mysql> SELECT city_name,
    -> MATCH (city_name)
    -> AGAINST ('+South +(>Kensington <Fork)' IN BOOLEAN MODE) AS relevance
    -> FROM cities
    -> WHERE MATCH (city_name)
    -> AGAINST ('+South +(>Kensington <Fork)' IN BOOLEAN MODE)
    -> ORDER BY relevance DESC;
+------------------+------------------+
| city_name        | relevance        |
+------------------+------------------+
| South Kensington |             1.25 |
| South Fork       | 0.83333337306976 |
+------------------+------------------+

The preceding example also deliberately sorted on relevance (notice how the third line created an alias, relevance, and how the query has an ORDER BY...DESC clause). In Boolean mode, results are not sorted by default.

In the following example, the * is added after a word as a wildcard, to match all cities beginning with London:

mysql> SELECT city_name FROM cities
    -> WHERE MATCH (city_name) AGAINST ('+London*' IN BOOLEAN MODE);
+---------------+
| city_name     |
+---------------+
| London        |
| London Colney |
| New London    |
| Londonderry   |
+---------------+

Note that these are not necessarily in order of relevance because this is not done by default in Boolean mode.

You can use double quotes (") to combine several words, but only look for the precise pattern, like this:

mysql> SELECT city_name, MATCH (city_name)
    -> AGAINST ('+"South Fork"' IN BOOLEAN MODE) AS relevance FROM cities
    -> WHERE MATCH (city_name) AGAINST ('+"South Fork"' IN BOOLEAN MODE);
+------------+-----------+
| city_name  | relevance |
+------------+-----------+
| South Fork |         1 |
+------------+-----------+

This returns only rows that contain South Fork, with the words in the order given.

How to Perform Subselects

As of version 4.0.2 of MySQL, it is not possible to perform subselects in MySQL. However, by the time you read this, version 4.1 will probably be available. The new version is due to have the facility for subselects.

What is a subselect? A subselect is basically a SELECT within a SELECT. It's a way of nesting one SELECT query within another.

With a subselect, you can do a SELECT from a resultset that itself has just been produced by another SELECT.

Subselects are likely to look like this:

SELECT * FROM child_products
  WHERE id IN
    (SELECT id FROM safe_products)

This query would first run a SELECT on a table of safe_products (finding products classified as “safe”). Then the SELECT in the first line would be run, selecting data from the child_products table where the id is present in the resultset of the first query.

(This is a simple example and can actually be performed without necessarily needing a subselect. It's written as a subselect so as to show you the format.)

Subselects are an elegant way of writing complex, often multitable, queries in one go. However, you can get by without them because the same subselect query can be implemented in other ways:

  • If you're selecting from two or more tables, consider using a join; you'll learn about these in Day 9.

  • If that doesn't work, create a temporary table and select the data into it; then, for the final SELECT query, select rows out of the temporary table to get the final resultset.

Any query, no matter how complex, can be implemented without subselects (though not always with the same efficiency or convenience). This is why MySQL has not suffered for not having them so far.

Combining Resultsets with UNION

Introduced in MySQL 4.0.0, UNION is a keyword that can be used to glue two or more SELECT queries together. Each SELECT query is performed independently, but the resultsets are joined together.

The UNION syntax looks like this:

SELECT select_query1
  UNION [ALL]
  SELECT select_query2
    [UNION [ALL}
    SELECT select_query3]

You can join any number of resultsets together. With this syntax, select_query1 can be unioned with the result of select_query2, which itself may be unioned with select_query3, and so on.

For example, imagine that you want to list the cheapest products from the child_ products table, and the cheapest products from the teenage_products table.

Perhaps teenage_products looks like this:

mysql> SELECT * FROM teenage_products;
+-----+-------------+-------+--------+
| id  | name        | age   | price  |
+-----+-------------+-------+--------+
| 101 | Bicycle     | 10-15 |  69.00 |
| 102 | Make-up kit | 12-16 |  19.00 |
| 103 | Trampoline  | 12-18 | 120.00 |
+-----+-------------+-------+--------+

Now let's write a query that gets the cheapest from each table and unions the resultsets together:

mysql> SELECT 'teens',MIN(price) FROM teenage_products
    -> UNION
    -> SELECT 'kids',MIN(price) FROM child_products;
+-------+------------+
| teens | MIN(price) |
+-------+------------+
| teens |      19.00 |
| kids  |       7.50 |
+-------+------------+

Because of the UNION, the two SELECT queries are executed and their resultsets combined to produce the output you see here. The data formats from the resultsets must be identical or an error will result. Note also that the column names of the resultset (here, the string constant teens, and the aggregating function MIN(price)) get their names from the first SELECT query.

Normally, each SELECT works as if DISTINCT has been specified, thus yielding unique rows in the resultset. However, UNION can optionally be given the keyword ALL, in which case it returns all rows that have been extracted. For example:

mysql> SELECT id FROM teenage_products
    -> UNION ALL
    -> SELECT id FROM child_products;
+-----+
| id  |
+-----+
| 101 |
| 102 |
| 103 |
| 101 |
| 102 |
| 103 |
| 104 |
| 105 |
+-----+

This query now selects product ids from both tables. If there are duplicates in the resultset, they will appear.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.176.99