Chapter 3. Selecting Data from Tables

3.0 Introduction

This chapter focuses on using the SELECT statement to retrieve information from your database. You will find the chapter helpful if your SQL background is limited or to find out about the MySQL-specific extensions to SELECT syntax.

There are many ways to write SELECT statements; we’ll look at only a few. Consult the MySQL Reference Manual or a general MySQL text for more information about SELECT syntax and the functions and operators available to extract and manipulate data.

Many examples in this chapter use a table named mail that contains rows that track mail message traffic between users on a set of hosts. The following shows how that table was created:

CREATE TABLE mail
(
  t       DATETIME,    # when message was sent
  srcuser VARCHAR(8),  # sender (source user and host)
  srchost VARCHAR(20),
  dstuser VARCHAR(8),  # recipient (destination user and host)
  dsthost VARCHAR(20),
  size    BIGINT,      # message size in bytes
  INDEX (t)
);

The mail table contents look like this:

mysql> SELECT * FROM mail;
+---------------------+---------+---------+---------+---------+---------+
| t                   | srcuser | srchost | dstuser | dsthost | size    |
+---------------------+---------+---------+---------+---------+---------+
| 2014-05-11 10:15:08 | barb    | saturn  | tricia  | mars    |   58274 |
| 2014-05-12 12:48:13 | tricia  | mars    | gene    | venus   |  194925 |
| 2014-05-12 15:02:49 | phil    | mars    | phil    | saturn  |    1048 |
| 2014-05-12 18:59:18 | barb    | saturn  | tricia  | venus   |     271 |
| 2014-05-14 09:31:37 | gene    | venus   | barb    | mars    |    2291 |
| 2014-05-14 11:52:17 | phil    | mars    | tricia  | saturn  |    5781 |
| 2014-05-14 14:42:21 | barb    | venus   | barb    | venus   |   98151 |
| 2014-05-14 17:03:01 | tricia  | saturn  | phil    | venus   | 2394482 |
| 2014-05-15 07:17:48 | gene    | mars    | gene    | saturn  |    3824 |
| 2014-05-15 08:50:57 | phil    | venus   | phil    | venus   |     978 |
| 2014-05-15 10:25:52 | gene    | mars    | tricia  | saturn  |  998532 |
| 2014-05-15 17:35:31 | gene    | saturn  | gene    | mars    |    3856 |
| 2014-05-16 09:00:28 | gene    | venus   | barb    | mars    |     613 |
| 2014-05-16 23:04:19 | phil    | venus   | barb    | venus   |   10294 |
| 2014-05-19 12:49:23 | phil    | mars    | tricia  | saturn  |     873 |
| 2014-05-19 22:21:51 | gene    | saturn  | gene    | venus   |   23992 |
+---------------------+---------+---------+---------+---------+---------+

To create and load the mail table, change location into the tables directory of the recipes distribution and run this command:

% mysql cookbook < mail.sql

This chapter also uses other tables from time to time. Some were used in previous chapters, whereas others are new. To create any of them, do so the same way as for the mail table, using the appropriate script in the tables directory. In addition, many of the other scripts and programs used in this chapter are located in the select directory. The files in that directory enable you to try the examples more easily.

Many of the statements shown here can be executed from within the mysql program, which is discussed in Chapter 1. A few examples involve issuing statements from within the context of a programming language. See [Link to Come] for information on programming techniques.

3.1 Specifying Which Columns and Rows to Select

Problem

You want to display specific columns and rows from a table.

Solution

To indicate which columns to display, name them in the output column list. To indicate which rows to display, use a WHERE clause that specifies conditions that rows must satisfy.

Discussion

The simplest way to display columns from a table is to use SELECT * FROM tbl_name. The * specifier is a shortcut that means all columns:

mysql> SELECT * FROM mail;
+---------------------+---------+---------+---------+---------+---------+
| t                   | srcuser | srchost | dstuser | dsthost | size    |
+---------------------+---------+---------+---------+---------+---------+
| 2014-05-11 10:15:08 | barb    | saturn  | tricia  | mars    |   58274 |
| 2014-05-12 12:48:13 | tricia  | mars    | gene    | venus   |  194925 |
| 2014-05-12 15:02:49 | phil    | mars    | phil    | saturn  |    1048 |
| 2014-05-12 18:59:18 | barb    | saturn  | tricia  | venus   |     271 |
…

Using * is easy, but you cannot select only certain columns or control column display order. Naming columns explicitly enables you to select only the ones of interest, in any order. This query omits the recipient columns and displays the sender before the date and size:

mysql> SELECT srcuser, srchost, t, size FROM mail;
+---------+---------+---------------------+---------+
| srcuser | srchost | t                   | size    |
+---------+---------+---------------------+---------+
| barb    | saturn  | 2014-05-11 10:15:08 |   58274 |
| tricia  | mars    | 2014-05-12 12:48:13 |  194925 |
| phil    | mars    | 2014-05-12 15:02:49 |    1048 |
| barb    | saturn  | 2014-05-12 18:59:18 |     271 |
…

Unless you qualify or restrict a SELECT query in some way, it retrieves every row in your table. To be more precise, provide a WHERE clause that specifies one or more conditions that rows must satisfy.

Conditions can test for equality, inequality, or relative ordering. For some types of data, such as strings, you can use pattern matches. The following statements select columns from rows in the mail table containing srchost values that are exactly equal to the string 'venus' or that begin with the letter 's':

mysql> SELECT t, srcuser, srchost  FROM mail WHERE srchost = 'venus';
+---------------------+---------+---------+
| t                   | srcuser | srchost |
+---------------------+---------+---------+
| 2014-05-14 09:31:37 | gene    | venus   |
| 2014-05-14 14:42:21 | barb    | venus   |
| 2014-05-15 08:50:57 | phil    | venus   |
| 2014-05-16 09:00:28 | gene    | venus   |
| 2014-05-16 23:04:19 | phil    | venus   |
+---------------------+---------+---------+
mysql> SELECT t, srcuser, srchost FROM mail WHERE srchost LIKE 's%';
+---------------------+---------+---------+
| t                   | srcuser | srchost |
+---------------------+---------+---------+
| 2014-05-11 10:15:08 | barb    | saturn  |
| 2014-05-12 18:59:18 | barb    | saturn  |
| 2014-05-14 17:03:01 | tricia  | saturn  |
| 2014-05-15 17:35:31 | gene    | saturn  |
| 2014-05-19 22:21:51 | gene    | saturn  |
+---------------------+---------+---------+

The LIKE operator in the previous query performs a pattern match, where % acts as a wildcard that matches any string. [Link to Come] discusses pattern matching further.

A WHERE clause can test multiple conditions and different conditions can test different columns. The following statement finds messages sent by barb to tricia:

mysql> SELECT * FROM mail WHERE srcuser = 'barb' AND dstuser = 'tricia';
+---------------------+---------+---------+---------+---------+-------+
| t                   | srcuser | srchost | dstuser | dsthost | size  |
+---------------------+---------+---------+---------+---------+-------+
| 2014-05-11 10:15:08 | barb    | saturn  | tricia  | mars    | 58274 |
| 2014-05-12 18:59:18 | barb    | saturn  | tricia  | venus   |   271 |
+---------------------+---------+---------+---------+---------+-------+

Output columns can be calculated by evaluating expressions. This query combines the srcuser and srchost columns using CONCAT() to produce composite values in email address format:

mysql> SELECT t, CONCAT(srcuser,'@',srchost), size FROM mail;
+---------------------+-----------------------------+---------+
| t                   | CONCAT(srcuser,'@',srchost) | size    |
+---------------------+-----------------------------+---------+
| 2014-05-11 10:15:08 | barb@saturn                 |   58274 |
| 2014-05-12 12:48:13 | tricia@mars                 |  194925 |
| 2014-05-12 15:02:49 | phil@mars                   |    1048 |
| 2014-05-12 18:59:18 | barb@saturn                 |     271 |
…

You’ll notice that the email address column label is the expression that calculates it. To provide a better label, use a column alias (see Recipe 3.2).

Tip

As of MySQL 8.0.19 you can use statement TABLE to select all columns from the table. TABLE supports ORDER BY (see Recipe 3.3) and LIMIT (see Recipe 3.11) clauses, but does not allow any other filtering of columns or rows.

mysql> TABLE mail ORDER BY size DESC LIMIT 3;
+---------------------+---------+---------+---------+---------+---------+
| t                   | srcuser | srchost | dstuser | dsthost | size    |
+---------------------+---------+---------+---------+---------+---------+
| 2014-05-14 17:03:01 | tricia  | saturn  | phil    | venus   | 2394482 |
| 2014-05-15 10:25:52 | gene    | mars    | tricia  | saturn  |  998532 |
| 2014-05-12 12:48:13 | tricia  | mars    | gene    | venus   |  194925 |
+---------------------+---------+---------+---------+---------+---------+
3 rows in set (0.00 sec)
          

3.2 Naming Query Result Columns

Problem

The column names in a query result are unsuitable, ugly, or difficult to work with, so you want to name them yourself.

Solution

Use aliases to choose your own column names.

Discussion

When you retrieve a result set, MySQL gives every output column a name. (That’s how the mysql program gets the names you see displayed in the initial row of column headers in result set output.) By default, MySQL assigns the column names specified in the CREATE TABLE or ALTER TABLE statement to output columns, but if these defaults are not suitable, you can use column aliases to specify your own names.

This recipe explains aliases and shows how to use them to assign column names in statements. If you’re writing a program that must determine the names, see [Link to Come] for information about accessing column metadata.

If an output column comes directly from a table, MySQL uses the table column name for the output column name. The following statement selects four table columns, the names of which become the corresponding output column names:

mysql> SELECT t, srcuser, srchost, size FROM mail;
+---------------------+---------+---------+---------+
| t                   | srcuser | srchost | size    |
+---------------------+---------+---------+---------+
| 2014-05-11 10:15:08 | barb    | saturn  |   58274 |
| 2014-05-12 12:48:13 | tricia  | mars    |  194925 |
| 2014-05-12 15:02:49 | phil    | mars    |    1048 |
| 2014-05-12 18:59:18 | barb    | saturn  |     271 |
…

If you generate a column by evaluating an expression, the expression itself is the column name. This can produce long and unwieldy names in result sets, as illustrated by the following statement that uses one expression to reformat the dates in the t column, and another to combine srcuser and srchost into email address format:

mysql> SELECT
    -> DATE_FORMAT(t,'%M %e, %Y'), CONCAT(srcuser,'@',srchost), size
    -> FROM mail;
+----------------------------+-----------------------------+---------+
| DATE_FORMAT(t,'%M %e, %Y') | CONCAT(srcuser,'@',srchost) | size    |
+----------------------------+-----------------------------+---------+
| May 11, 2014               | barb@saturn                 |   58274 |
| May 12, 2014               | tricia@mars                 |  194925 |
| May 12, 2014               | phil@mars                   |    1048 |
| May 12, 2014               | barb@saturn                 |     271 |
…

To choose your own output column name, use an AS name clause to specify a column alias (the keyword AS is optional). The following statement retrieves the same result as the previous one, but renames the first column to date_sent and the second to sender:

mysql> SELECT
    -> DATE_FORMAT(t,'%M %e, %Y') AS date_sent,
    -> CONCAT(srcuser,'@',srchost) AS sender,
    -> size FROM mail;
+--------------+---------------+---------+
| date_sent    | sender        | size    |
+--------------+---------------+---------+
| May 11, 2014 | barb@saturn   |   58274 |
| May 12, 2014 | tricia@mars   |  194925 |
| May 12, 2014 | phil@mars     |    1048 |
| May 12, 2014 | barb@saturn   |     271 |
…

The aliases make the column names more concise, easier to read, and more meaningful. Aliases are subject to a few restrictions. For example, they must be quoted if they are SQL keywords, entirely numeric, or contain spaces or other special characters (an alias can consist of several words if you want to use a descriptive phrase). The following statement retrieves the same data values as the preceding one but uses phrases to name the output columns:

mysql> SELECT
    -> DATE_FORMAT(t,'%M %e, %Y') AS 'Date of message',
    -> CONCAT(srcuser,'@',srchost) AS 'Message sender',
    -> size AS 'Number of bytes' FROM mail;
+-----------------+----------------+-----------------+
| Date of message | Message sender | Number of bytes |
+-----------------+----------------+-----------------+
| May 11, 2014    | barb@saturn    |           58274 |
| May 12, 2014    | tricia@mars    |          194925 |
| May 12, 2014    | phil@mars      |            1048 |
| May 12, 2014    | barb@saturn    |             271 |
…

If MySQL complains about a single-word alias, the word probably is reserved. Quoting the alias should make it legal:

mysql> SELECT 1 AS INTEGER;
You have an error in your SQL syntax near 'INTEGER'
mysql> SELECT 1 AS 'INTEGER';
+---------+
| INTEGER |
+---------+
|       1 |
+---------+

Column aliases also are useful for programming purposes. If you write a program that fetches rows into an array and accesses them by numeric column indexes, the presence or absence of column aliases makes no difference because aliases don’t change the positions of columns within the result set. However, aliases make a big difference if you access output columns by name because aliases change those names. Exploit this fact to give your program easier names to work with. For example, if your query displays reformatted message time values from the mail table using the expression DATE_FORMAT(t,'%M %e, %Y'), that expression is also the name you must use when referring to the output column. In a Perl hashref, for example, you’d access it as $ref->{"DATE_FORMAT(t,'%M %e, %Y')"}. That’s inconvenient. Use AS date_sent to give the column an alias and you can refer to it more easily as $ref->{date_sent}. Here’s an example that shows how a Perl DBI script might process such values. It retrieves rows into a hash and refers to column values by name:

$sth = $dbh->prepare ("SELECT srcuser,
                       DATE_FORMAT(t,'%M %e, %Y') AS date_sent
                       FROM mail");
$sth->execute ();
while (my $ref = $sth->fetchrow_hashref ())
{
  printf "user: %s, date sent: %s
", $ref->{srcuser}, $ref->{date_sent};
}

In Java, you’d do something like this, where the argument to getString() names the column to access:

Statement s = conn.createStatement ();
s.executeQuery ("SELECT srcuser,"
                + " DATE_FORMAT(t,'%M %e, %Y') AS date_sent"
                + " FROM mail");
ResultSet rs = s.getResultSet ();
while (rs.next ())  // loop through rows of result set
{
  String name = rs.getString ("srcuser");
  String dateSent = rs.getString ("date_sent");
  System.out.println ("user: " + name + ", date sent: " + dateSent);
}
rs.close ();
s.close ();

[Link to Come] shows for each of our programming languages how to fetch rows into data structures that permit access to column values by name. The select directory of the recipes distribution has examples that show how to do this for the mail table.

You cannot refer to column aliases in a WHERE clause. Thus, the following statement is illegal:

mysql> SELECT t, srcuser, dstuser, size/1024 AS kilobytes
    -> FROM mail WHERE kilobytes > 500;
ERROR 1054 (42S22): Unknown column 'kilobytes' in 'where clause'

The error occurs because an alias names an output column, whereas a WHERE clause operates on input columns to determine which rows to select for output. To make the statement legal, replace the alias in the WHERE clause with the same column or expression that the alias represents:

mysql> SELECT t, srcuser, dstuser, size/1024 AS kilobytes
    -> FROM mail WHERE size/1024 > 500;
+---------------------+---------+---------+-----------+
| t                   | srcuser | dstuser | kilobytes |
+---------------------+---------+---------+-----------+
| 2014-05-14 17:03:01 | tricia  | phil    | 2338.3613 |
| 2014-05-15 10:25:52 | gene    | tricia  |  975.1289 |
+---------------------+---------+---------+-----------+

3.3 Sorting Query Results

Problem

You want to control how your query results are sorted.

Solution

MySQL can’t read your mind. Use an ORDER BY clause to tell it how to sort result rows.

Discussion

When you select rows, the MySQL server is free to return them in any order unless you instruct it otherwise by saying how to sort the result. There are lots of ways to use sorting techniques, as [Link to Come] explores in detail. Briefly, to sort a result set, add an ORDER BY clause that names the column or columns to use for sorting. This statement names multiple columns in the ORDER BY clause to sort rows by host and by user within each host:

mysql> SELECT * FROM mail WHERE dstuser = 'tricia'
    -> ORDER BY srchost, srcuser;
+---------------------+---------+---------+---------+---------+--------+
| t                   | srcuser | srchost | dstuser | dsthost | size   |
+---------------------+---------+---------+---------+---------+--------+
| 2014-05-15 10:25:52 | gene    | mars    | tricia  | saturn  | 998532 |
| 2014-05-14 11:52:17 | phil    | mars    | tricia  | saturn  |   5781 |
| 2014-05-19 12:49:23 | phil    | mars    | tricia  | saturn  |    873 |
| 2014-05-11 10:15:08 | barb    | saturn  | tricia  | mars    |  58274 |
| 2014-05-12 18:59:18 | barb    | saturn  | tricia  | venus   |    271 |
+---------------------+---------+---------+---------+---------+--------+

To sort a column in reverse (descending) order, add the keyword DESC after its name in the ORDER BY clause:

mysql> SELECT * FROM mail WHERE size > 50000 ORDER BY size DESC;
+---------------------+---------+---------+---------+---------+---------+
| t                   | srcuser | srchost | dstuser | dsthost | size    |
+---------------------+---------+---------+---------+---------+---------+
| 2014-05-14 17:03:01 | tricia  | saturn  | phil    | venus   | 2394482 |
| 2014-05-15 10:25:52 | gene    | mars    | tricia  | saturn  |  998532 |
| 2014-05-12 12:48:13 | tricia  | mars    | gene    | venus   |  194925 |
| 2014-05-14 14:42:21 | barb    | venus   | barb    | venus   |   98151 |
| 2014-05-11 10:15:08 | barb    | saturn  | tricia  | mars    |   58274 |
+---------------------+---------+---------+---------+---------+---------+

3.4 Removing Duplicate Rows

Problem

Output from a query contains duplicate rows. You want to eliminate them.

Solution

Use DISTINCT.

Discussion

Some queries produce results containing duplicate rows. For example, to see who sent mail, query the mail table like this:

mysql> SELECT srcuser FROM mail;
+---------+
| srcuser |
+---------+
| barb    |
| tricia  |
| phil    |
| barb    |
| gene    |
| phil    |
| barb    |
| tricia  |
| gene    |
| phil    |
| gene    |
| gene    |
| gene    |
| phil    |
| phil    |
| gene    |
+---------+

That result is heavily redundant. To remove the duplicate rows and produce a set of unique values, add DISTINCT to the query:

mysql> SELECT DISTINCT srcuser FROM mail;
+---------+
| srcuser |
+---------+
| barb    |
| tricia  |
| phil    |
| gene    |
+---------+

To count the number of unique values in a column, use COUNT(DISTINCT):

mysql> SELECT COUNT(DISTINCT srcuser) FROM mail;
+-------------------------+
| COUNT(DISTINCT srcuser) |
+-------------------------+
|                       4 |
+-------------------------+

DISTINCT works with multiple-column output, too. The following query shows which dates are represented in the mail table:

mysql> SELECT DISTINCT YEAR(t), MONTH(t), DAYOFMONTH(t) FROM mail;
+---------+----------+---------------+
| YEAR(t) | MONTH(t) | DAYOFMONTH(t) |
+---------+----------+---------------+
|    2014 |        5 |            11 |
|    2014 |        5 |            12 |
|    2014 |        5 |            14 |
|    2014 |        5 |            15 |
|    2014 |        5 |            16 |
|    2014 |        5 |            19 |
+---------+----------+---------------+

See Also

[Link to Come] revisits DISTINCT and COUNT(DISTINCT). [Link to Come] discusses duplicate removal in more detail.

3.5 Working with NULL Values

Problem

You’re trying to to compare column values to NULL, but it isn’t working.

Solution

Use the proper comparison operators: IS NULL, IS NOT NULL, or <=>.

Discussion

Conditions that involve NULL are special because NULL means unknown value. Consequently, comparisons such as value = NULL or value <> NULL always produce a result of NULL (not true or false) because it’s impossible to tell whether they are true or false. Even NULL = NULL produces NULL because you can’t determine whether one unknown value is the same as another.

To look for values that are or are not NULL, use the IS NULL or IS NOT NULL operator. Suppose that a table named expt contains experimental results for subjects who are to be given four tests each and that represents tests not yet administered using NULL:

+---------+------+-------+
| subject | test | score |
+---------+------+-------+
| Jane    | A    |    47 |
| Jane    | B    |    50 |
| Jane    | C    |  NULL |
| Jane    | D    |  NULL |
| Marvin  | A    |    52 |
| Marvin  | B    |    45 |
| Marvin  | C    |    53 |
| Marvin  | D    |  NULL |
+---------+------+-------+

You can see that = and <> fail to identify NULL values:

mysql> SELECT * FROM expt WHERE score = NULL;
Empty set (0.00 sec)
mysql> SELECT * FROM expt WHERE score <> NULL;
Empty set (0.00 sec)

Write the statements like this instead:

mysql> SELECT * FROM expt WHERE score IS NULL;
+---------+------+-------+
| subject | test | score |
+---------+------+-------+
| Jane    | C    |  NULL |
| Jane    | D    |  NULL |
| Marvin  | D    |  NULL |
+---------+------+-------+
mysql> SELECT * FROM expt WHERE score IS NOT NULL;
+---------+------+-------+
| subject | test | score |
+---------+------+-------+
| Jane    | A    |    47 |
| Jane    | B    |    50 |
| Marvin  | A    |    52 |
| Marvin  | B    |    45 |
| Marvin  | C    |    53 |
+---------+------+-------+

The MySQL-specific <=> comparison operator, unlike the = operator, is true even for two NULL values:

mysql> SELECT NULL = NULL, NULL <=> NULL;
+-------------+---------------+
| NULL = NULL | NULL <=> NULL |
+-------------+---------------+
|        NULL |             1 |
+-------------+---------------+

Sometimes it’s useful to map NULL values onto some other value that has more meaning in the context of your application. For example, use IF() to map NULL onto the string Unknown:

mysql> SELECT subject, test, IF(score IS NULL,'Unknown', score) AS 'score'
    -> FROM expt;
+---------+------+---------+
| subject | test | score   |
+---------+------+---------+
| Jane    | A    | 47      |
| Jane    | B    | 50      |
| Jane    | C    | Unknown |
| Jane    | D    | Unknown |
| Marvin  | A    | 52      |
| Marvin  | B    | 45      |
| Marvin  | C    | 53      |
| Marvin  | D    | Unknown |
+---------+------+---------+

This IF()-based mapping technique works for any kind of value, but it’s especially useful with NULL values because NULL tends to be given a variety of meanings: unknown, missing, not yet determined, out of range, and so forth. Choose the label that makes the most sense in a given context.

The preceding query can be written more concisely using IFNULL(), which tests its first argument and returns it if it’s not NULL, or returns its second argument otherwise:

SELECT subject, test, IFNULL(score,'Unknown') AS 'score'
FROM expt;

In other words, these two tests are equivalent:

IF(expr1 IS NOT NULL,expr1,expr2)
IFNULL(expr1,expr2)

From a readability standpoint, IF() often is easier to understand than IFNULL(). From a computational perspective, IFNULL() is more efficient because expr1 need not be evaluated twice, as happens with IF().

See Also

NULL values also behave specially with respect to sorting and summary operations. See [Link to Come] and [Link to Come].

3.6 Writing Comparisons Involving NULL in Programs

Problem

You’re writing a program that looks for rows containing a specific value, but it fails when the value is NULL.

Solution

Choose the proper comparison operator according to whether the comparison value is or is not NULL.

Discussion

Recipe 3.5 discusses the need to use different comparison operators for NULL values than for non-NULL values in SQL statements. This issue leads to a subtle danger when constructing statement strings within programs. If a value stored in a variable might represent a NULL value, you must account for that when you use the value in comparisons. For example, in Python, None represents a NULL value, so to construct a statement that finds rows in the expt table matching some arbitrary value in a score variable, you cannot do this:

cursor.execute("SELECT * FROM expt WHERE score = %s", (score,))

The statement fails when score is None because the resulting statement becomes:

SELECT * FROM expt WHERE score = NULL

A comparison of score = NULL is never true, so that statement returns no rows. To take into account the possibility that score could be None, construct the statement using the appropriate comparison operator like this:

operator = "IS" if score is None else "="
cursor.execute("SELECT * FROM expt WHERE score {} %s".format(operator), (score,))

This results in statements as follows for score values of None (NULL) or 43 (not NULL):

SELECT * FROM expt WHERE score IS NULL
SELECT * FROM expt WHERE score = 43

For inequality tests, set operator like this instead:

operator = "IS NOT" if score is None else "<>"

3.7 Using Views to Simplify Table Access

Problem

You want to refer to values calculated from expressions without writing the expressions each time you retrieve them.

Solution

Use a view defined such that its columns perform the desired calculations.

Discussion

Suppose that you retrieve several values from the mail table, using expressions to calculate most of them:

mysql> SELECT
    -> DATE_FORMAT(t,'%M %e, %Y') AS date_sent,
    -> CONCAT(srcuser,'@',srchost) AS sender,
    -> CONCAT(dstuser,'@',dsthost) AS recipient,
    -> size FROM mail;
+--------------+---------------+---------------+---------+
| date_sent    | sender        | recipient     | size    |
+--------------+---------------+---------------+---------+
| May 11, 2014 | barb@saturn   | tricia@mars   |   58274 |
| May 12, 2014 | tricia@mars   | gene@venus    |  194925 |
| May 12, 2014 | phil@mars     | phil@saturn   |    1048 |
| May 12, 2014 | barb@saturn   | tricia@venus  |     271 |
…

If you must issue such a statement often, it’s inconvenient to keep writing the expressions. To make the statement results easier to access, use a view, which is a virtual table that contains no data. Instead, it’s defined as the SELECT statement that retrieves the data of interest. The following view, mail_view, is equivalent to the SELECT statement just shown:

mysql> CREATE VIEW mail_view AS
    -> SELECT
    -> DATE_FORMAT(t,'%M %e, %Y') AS date_sent,
    -> CONCAT(srcuser,'@',srchost) AS sender,
    -> CONCAT(dstuser,'@',dsthost) AS recipient,
    -> size FROM mail;

To access the view contents, refer to it like any other table. You can select some or all of its columns, add a WHERE clause to restrict which rows to retrieve, use ORDER BY to sort the rows, and so forth. For example:

mysql> SELECT date_sent, sender, size FROM mail_view
    -> WHERE size > 100000 ORDER BY size;
+--------------+---------------+---------+
| date_sent    | sender        | size    |
+--------------+---------------+---------+
| May 12, 2014 | tricia@mars   |  194925 |
| May 15, 2014 | gene@mars     |  998532 |
| May 14, 2014 | tricia@saturn | 2394482 |
+--------------+---------------+---------+

Stored programs provide another way to encapsulate calculations (see [Link to Come]).

3.8 Selecting Data from Multiple Tables

Problem

The answer to a question requires data from more than one table, so you need to select data from multiple tables.

Solution

Use a join or a subquery.

Discussion

The queries shown so far select data from a single table, but sometimes you must retrieve information from multiple tables. Two types of statements that accomplish this are joins and subqueries. A join matches rows in one table with rows in another and enables you to retrieve output rows that contain columns from either or both tables. A subquery is one query nested within another, to perform a comparison between values selected by the inner query against values selected by the outer query.

This recipe shows a couple brief examples to illustrate the basic ideas. Other examples appear elsewhere: subqueries are used in various examples throughout the book (for example, Recipe 3.10 and [Link to Come]). [Link to Come] discusses joins in detail, including some that select from more than two tables.

The following examples use the profile table introduced in [Link to Come]. Recall that it lists the people on your buddy list:

mysql> SELECT * FROM profile;
+----+---------+------------+-------+-----------------------+------+
| id | name    | birth      | color | foods                 | cats |
+----+---------+------------+-------+-----------------------+------+
|  1 | Sybil   | 1970-04-13 | black | lutefisk,fadge,pizza  |    0 |
|  2 | Nancy   | 1969-09-30 | white | burrito,curry,eggroll |    3 |
|  3 | Ralph   | 1973-11-02 | red   | eggroll,pizza         |    4 |
|  4 | Lothair | 1963-07-04 | blue  | burrito,curry         |    5 |
|  5 | Henry   | 1965-02-14 | red   | curry,fadge           |    1 |
|  6 | Aaron   | 1968-09-17 | green | lutefisk,fadge        |    1 |
|  7 | Joanna  | 1952-08-20 | green | lutefisk,fadge        |    0 |
|  8 | Stephen | 1960-05-01 | white | burrito,pizza         |    0 |
+----+---------+------------+-------+-----------------------+------+

Let’s extend use of the profile table to include another table named profile_contact. This second table indicates how to contact people listed in the profile table via various social media services and is defined like this:

CREATE TABLE profile_contact
(
  profile_id   INT UNSIGNED NOT NULL, # ID from profile table
  service      VARCHAR(20) NOT NULL,  # social media service name
  contact_name VARCHAR(25) NOT NULL,  # name to use for contacting person
  INDEX (profile_id)
);

The table associates each row with the proper profile row via the profile_id column. The service and contact_name columns name the media service and the name to use for contacting the given person via that service. For the examples, assume that the table contains these rows:

mysql> SELECT * FROM profile_contact ORDER BY profile_id, service;
+------------+----------+--------------+
| profile_id | service  | contact_name |
+------------+----------+--------------+
|          1 | Facebook | user1-fbid   |
|          1 | Twitter  | user1-twtrid |
|          2 | Facebook | user2-msnid  |
|          2 | LinkedIn | user2-lnkdid |
|          2 | Twitter  | user2-fbrid  |
|          4 | LinkedIn | user4-lnkdid |
+------------+----------+--------------+

A question that requires information from both tables is, For each person in the profile table, show me which services I can use to get in touch, and the contact name for each service. To answer this question, use a join. Select from both tables and match rows by comparing the id column from the profile table with the profile_id column from the profile_contact table:

mysql> SELECT id, name, service, contact_name
    -> FROM profile INNER JOIN profile_contact ON id = profile_id;
+----+---------+----------+--------------+
| id | name    | service  | contact_name |
+----+---------+----------+--------------+
|  1 | Sybil   | Twitter  | user1-twtrid |
|  1 | Sybil   | Facebook | user1-fbid   |
|  2 | Nancy   | Twitter  | user2-fbrid  |
|  2 | Nancy   | Facebook | user2-msnid  |
|  2 | Nancy   | LinkedIn | user2-lnkdid |
|  4 | Lothair | LinkedIn | user4-lnkdid |
+----+---------+----------+--------------+

The FROM clause indicates the tables from which to select data, and the ON clause tells MySQL which columns to use to find matches between the tables. In the result, rows include the id and name columns from the profile table, and the service and contact_name columns from the profile_contact table.

Here’s another question that requires both tables to answer: List all the profile_contact records for Nancy. To pull the proper rows from the profile_contact table, you need Nancy’s ID, which is stored in the profile table. To write the query without looking up Nancy’s ID yourself, use a subquery that, given her name, looks it up for you:

mysql> SELECT * FROM profile_contact
    -> WHERE profile_id = (SELECT id FROM profile WHERE name = 'Nancy');
+------------+----------+--------------+
| profile_id | service  | contact_name |
+------------+----------+--------------+
|          2 | Twitter  | user2-fbrid  |
|          2 | Facebook | user2-msnid  |
|          2 | LinkedIn | user2-lnkdid |
+------------+----------+--------------+

Here the subquery appears as a nested SELECT statement enclosed within parentheses.

3.9 Selecting Rows from the Beginning, End, or Middle of Query Results

Problem

You want only certain rows from a result set, such as the first one, the last five, or rows 21 through 40.

Solution

Use a LIMIT clause, perhaps in conjunction with an ORDER BY clause.

Discussion

MySQL supports a LIMIT clause that tells the server to return only part of a result set. LIMIT is a MySQL-specific extension to SQL that is extremely valuable when your result set contains more rows than you want to see at a time. It enables you to retrieve an arbitrary section of a result set. Typical LIMIT uses include the following kinds of problems:

  • Answering questions about first or last, largest or smallest, newest or oldest, least or most expensive, and so forth.

  • Splitting a result set into sections so that you can process it one piece at a time. This technique is common in web applications for displaying a large search result across several pages. Showing the result in sections enables display of smaller, easier-to-understand pages.

The following examples use the profile table shown in Recipe 3.8. To see the first n rows of a SELECT result, add LIMIT n to the end of the statement:

mysql> SELECT * FROM profile LIMIT 1;
+----+-------+------------+-------+----------------------+------+
| id | name  | birth      | color | foods                | cats |
+----+-------+------------+-------+----------------------+------+
|  1 | Sybil | 1970-04-13 | black | lutefisk,fadge,pizza |    0 |
+----+-------+------------+-------+----------------------+------+
mysql> SELECT * FROM profile LIMIT 3;
+----+-------+------------+-------+-----------------------+------+
| id | name  | birth      | color | foods                 | cats |
+----+-------+------------+-------+-----------------------+------+
|  1 | Sybil | 1970-04-13 | black | lutefisk,fadge,pizza  |    0 |
|  2 | Nancy | 1969-09-30 | white | burrito,curry,eggroll |    3 |
|  3 | Ralph | 1973-11-02 | red   | eggroll,pizza         |    4 |
+----+-------+------------+-------+-----------------------+------+

LIMIT n means return at most n rows. If you specify LIMIT 10, and the result set has only four rows, the server returns four rows.

The rows in the preceding query results are returned in no particular order, so they may not be very meaningful. A more common technique uses ORDER BY to sort the result set and LIMIT to find smallest and largest values. For example, to find the row with the minimum (earliest) birth date, sort by the birth column, then add LIMIT 1 to retrieve the first row:

mysql> SELECT * FROM profile ORDER BY birth LIMIT 1;
+----+--------+------------+-------+----------------+------+
| id | name   | birth      | color | foods          | cats |
+----+--------+------------+-------+----------------+------+
|  7 | Joanna | 1952-08-20 | green | lutefisk,fadge |    0 |
+----+--------+------------+-------+----------------+------+

This works because MySQL processes the ORDER BY clause to sort the rows, then applies LIMIT.

To obtain rows from the end of a result set, sort them in the opposite order. The statement that finds the row with the most recent birth date is similar to the previous one, except that the sort order is descending:

mysql> SELECT * FROM profile ORDER BY birth DESC LIMIT 1;
+----+-------+------------+-------+---------------+------+
| id | name  | birth      | color | foods         | cats |
+----+-------+------------+-------+---------------+------+
|  3 | Ralph | 1973-11-02 | red   | eggroll,pizza |    4 |
+----+-------+------------+-------+---------------+------+

To find the earliest or latest birthday within the calendar year, sort by the month and day of the birth values:

mysql> SELECT name, DATE_FORMAT(birth,'%m-%d') AS birthday
    -> FROM profile ORDER BY birthday LIMIT 1;
+-------+----------+
| name  | birthday |
+-------+----------+
| Henry | 02-14    |
+-------+----------+

You can obtain the same information by running these statements without LIMIT and ignoring everything but the first row. The advantage of LIMIT is that the server returns only the first row, and the extra rows don’t cross the network at all. This is much more efficient than retrieving an entire result set, only to discard all but one row.

To pull rows from the middle of a result set, use the two-argument form of LIMIT, which enables you to pick an arbitrary section of rows. The arguments indicate how many rows to skip and how many to return. This means that you can use LIMIT to do such things as skip two rows and return the next one, thus answering questions such as What is the third-smallest or third-largest value? These are questions that MIN() or MAX() are not suited for, but are easy with LIMIT:

mysql> SELECT * FROM profile ORDER BY birth LIMIT 2,1;
+----+---------+------------+-------+---------------+------+
| id | name    | birth      | color | foods         | cats |
+----+---------+------------+-------+---------------+------+
|  4 | Lothair | 1963-07-04 | blue  | burrito,curry |    5 |
+----+---------+------------+-------+---------------+------+
mysql> SELECT * FROM profile ORDER BY birth DESC LIMIT 2,1;
+----+-------+------------+-------+-----------------------+------+
| id | name  | birth      | color | foods                 | cats |
+----+-------+------------+-------+-----------------------+------+
|  2 | Nancy | 1969-09-30 | white | burrito,curry,eggroll |    3 |
+----+-------+------------+-------+-----------------------+------+

The two-argument form of LIMIT also makes it possible to partition a result set into smaller sections. For example, to retrieve 20 rows at a time from a result, issue a SELECT statement repeatedly, but vary its LIMIT clause like so:

SELECT ... FROM ... ORDER BY ... LIMIT 0, 20;
SELECT ... FROM ... ORDER BY ... LIMIT 20, 20;
SELECT ... FROM ... ORDER BY ... LIMIT 40, 20;

Web developers often use LIMIT this way to split a large search result into smaller, more manageable pieces so that it can be presented over several pages.

To determine the number of rows in a result set so that you can determine the number of sections, issue a COUNT() statement first. For example, to display profile table rows in name order, three at a time, you can find out how many there are with the following statement:

mysql> SELECT COUNT(*) FROM profile;
+----------+
| COUNT(*) |
+----------+
|        8 |
+----------+

That tells you that there are three sets of rows (the last with fewer than three rows), which you can retrieve as follows:

SELECT * FROM profile ORDER BY name LIMIT 0, 3;
SELECT * FROM profile ORDER BY name LIMIT 3, 3;
SELECT * FROM profile ORDER BY name LIMIT 6, 3;

You can also fetch the first part of a result set and determine at the same time how big the result would have been without the LIMIT clause. To fetch the first three rows from the profile table, and then obtain the size of the full result, run these statements:

SELECT SQL_CALC_FOUND_ROWS * FROM profile ORDER BY name LIMIT 4;
SELECT FOUND_ROWS();

The keyword SQL_CALC_FOUND_ROWS in the first statement tells MySQL to calculate the size of the entire result set even though the statement requests that only part of it be returned. The row count is available by calling FOUND_ROWS(). If that function returns a value greater than three, there are other rows yet to be retrieved.

See Also

LIMIT is useful in combination with RAND() to make random selections from a set of items. See [Link to Come].

You can use LIMIT to restrict the effect of a DELETE or UPDATE statement to a subset of the rows that would otherwise be deleted or updated, respectively. For more information about using LIMIT for duplicate row removal, see [Link to Come].

3.10 What to Do When LIMIT Requires the Wrong Sort Order

Problem

LIMIT usually works best in conjunction with an ORDER BY clause that sorts rows. But sometimes that sort order differs from what you want for the final result.

Solution

Use LIMIT in a subquery to retrieve the desired rows, then use the outer query to sort them.

Discussion

If you want the last four rows of a result set, you can obtain them easily by sorting the set in reverse order and using LIMIT 4. The following statement returns the names and birth dates for the four people in the profile table who were born most recently:

mysql> SELECT name, birth FROM profile ORDER BY birth DESC LIMIT 4;
+-------+------------+
| name  | birth      |
+-------+------------+
| Ralph | 1973-11-02 |
| Sybil | 1970-04-13 |
| Nancy | 1969-09-30 |
| Aaron | 1968-09-17 |
+-------+------------+

But that requires sorting the birth values in descending order to place them at the head of the result set. What if you want the output rows to appear in ascending order instead? Use the SELECT as a subquery of an outer statement that re-sorts the rows in the desired final order:

mysql> SELECT * FROM
    -> (SELECT name, birth FROM profile ORDER BY birth DESC LIMIT 4) AS t
    -> ORDER BY birth;
+-------+------------+
| name  | birth      |
+-------+------------+
| Aaron | 1968-09-17 |
| Nancy | 1969-09-30 |
| Sybil | 1970-04-13 |
| Ralph | 1973-11-02 |
+-------+------------+

AS t is used here because any table referred to in the FROM clause must have a name, even a derived table produced from a subquery.

3.11 Calculating LIMIT Values from Expressions

Problem

You want to use expressions to specify the arguments for LIMIT.

Solution

Sadly, you cannot. LIMIT arguments must be literal integers—unless you issue the statement in a context that permits the statement string to be constructed dynamically. In that case, you can evaluate the expressions yourself and insert the resulting values into the statement string.

Discussion

Arguments to LIMIT must be literal integers, not expressions. Statements such as the following are illegal:

SELECT * FROM profile LIMIT 5+5;
SELECT * FROM profile LIMIT @skip_count, @show_count;

The same no expressions permitted principle applies if you use an expression to calculate a LIMIT value in a program that constructs a statement string. You must evaluate the expression first, and then place the resulting value in the statement. For example, if you produce a statement string in Perl or PHP as follows, an error will result when you attempt to execute the statement:

$str = "SELECT * FROM profile LIMIT $x + $y";

To avoid the problem, evaluate the expression first:

$z = $x + $y;
$str = "SELECT * FROM profile LIMIT $z";

Or do this (don’t omit the parentheses or the expression won’t evaluate properly):

$str = "SELECT * FROM profile LIMIT " . ($x + $y);

To construct a two-argument LIMIT clause, evaluate both expressions before placing them into the statement string.

Another issue related to LIMIT (or other syntax constructions that require literal integer values) occurs when you use prepared statements from an API that quotes all data values as strings when binding them to parameter markers. Suppose that you prepare and execute a statement like this in PDO:

$sth = $dbh->prepare ("SELECT * FROM profile LIMIT ?,?");
$sth->execute (array (2, 4));

The resulting statement is as follows, with quoted LIMIT arguments, so statement execution fails:

SELECT * FROM profile LIMIT '2','4'

To avoid this problem, evaluate the LIMIT arguments and place them in the statement yourself, as just described. Alternatively, if your API has type-hinting capability, use it to indicate that the LIMIT arguments are integers to prevent them from being quoted.

3.12 Combining Two or More SELECT Results

Problem

You want to combine rows, retrieved by two or more SELECT statements into one result set.

Solution

Use UNION clause.

Discussion

The mail table stores user names and hosts of the email senders and recipients. But what if we want to know all the user and host combinations possible?

Naive approach would be to choose either sender or receiver pairs. But if we perform even very basic test by comparing number of unique user-host combinations we will find out that it is different for each of directions.

mysql> select count(distinct srcuser, srchost) from mail;
+----------------------------------+
| count(distinct srcuser, srchost) |
+----------------------------------+
|                                9 |
+----------------------------------+
1 row in set (0.01 sec)

mysql> select count(distinct dstuser, dsthost) from mail;
+----------------------------------+
| count(distinct dstuser, dsthost) |
+----------------------------------+
|                               10 |
+----------------------------------+
1 row in set (0.00 sec)
        

We also do not know if our table stores emails from users who only send them and for users who receive but never send.

To get the full list we need to select pairs for both sender and receiver, then remove duplicates. SQL clause UNION DISTINCT and its short form UNION does exactly that. It combines results of two or more SELECT queries that select the same number of columns of the same type.

By default UNION uses column names of the first SELECT for the full result set header, but we can also use aliases as discussed in Recipe 3.2.

mysql> SELECT DISTINCT srcuser AS user, srchost AS host FROM mail
    -> UNION
    -> SELECT DISTINCT dstuser AS user, dsthost AS host FROM mail;
+--------+--------+
| user   | host   |
+--------+--------+
| barb   | saturn |
| tricia | mars   |
| phil   | mars   |
| gene   | venus  |
| barb   | venus  |
| tricia | saturn |
| gene   | mars   |
| phil   | venus  |
| gene   | saturn |
| phil   | saturn |
| tricia | venus  |
| barb   | mars   |
+--------+--------+
12 rows in set (0.00 sec)
        

You may sort as individual query, participating in UNION, as well the whole result. If you do not want to remove duplicates from the result use clause UNION ALL.

To demonstrate this lets create a query that will find four users who sent the highest number of emails and four users who recieved the highest number of emails, then sort result of the union by the user name.

mysql> (SELECT CONCAT(srcuser, '@', srchost) AS user, COUNT(*) AS emails 1
    -> FROM mail GROUP BY srcuser, srchost ORDER BY emails DESC LIMIT 4) 2
    -> UNION ALL
    -> (SELECT CONCAT(dstuser, '@', dsthost) AS user, COUNT(*) AS emails 
    -> FROM mail GROUP BY dstuser, dsthost ORDER BY emails DESC LIMIT 4) 3
    -> ORDER BY user;4
+---------------+--------+
| user          | emails |
+---------------+--------+
| barb@mars     |      2 |
| barb@saturn   |      2 |
| barb@venus    |      2 |
| gene@saturn   |      2 |
| gene@venus    |      2 | 5
| gene@venus    |      2 |
| phil@mars     |      3 |
| tricia@saturn |      3 |
+---------------+--------+
8 rows in set (0.00 sec)
        

1

Concatenate user and host into email address of the user.

2

Order first SELECT result by number of emails descending and limit number of retrieved rows.

3

Order result of the second SELECT.

4

Order result of the UNION by the user email address.

5

We used clause UNION ALL instead of UNION [DISTINCT], therefore we have two entries for gene@venus in the result. This user is in the top list of those who send emails and also of those who recieve emails.

3.13 Selecting Results of Subqueries

Problem

You want to retrieve not only table columns, but also results of queries that use these columns.

Solution

Use subquery in the column list.

Discussion

Suppose that you want to know not only how many emails sent a particular user, but also how many emails they received. You cannot do it without accessing the table mail two times: one to count how many emails were sent and second to count how many emails were received.

One of solutions for this issue is to use subqueries in the column list.

mysql> SELECT CONCAT(srcuser, '@', srchost) AS user, COUNT(*) AS mails_sent, 1
    -> (SELECT COUNT(*) FROM mail d WHERE d.dstuser=m.srcuser AND d.dsthost=m.srchost)  2
    -> AS mails_received 3
    -> FROM mail m 
    -> GROUP BY  srcuser, srchost  4
    -> ORDER BY mails_sent DESC;
+---------------+------------+----------------+
| user          | mails_sent | mails_received |
+---------------+------------+----------------+
| phil@mars     |          3 |              0 |
| barb@saturn   |          2 |              0 |
| gene@venus    |          2 |              2 |
| gene@mars     |          2 |              1 |
| phil@venus    |          2 |              2 |
| gene@saturn   |          2 |              1 |
| tricia@mars   |          1 |              1 |
| barb@venus    |          1 |              2 |
| tricia@saturn |          1 |              3 |
+---------------+------------+----------------+
9 rows in set (0.00 sec)
        

1

First we are retrieved a user name and a host of the sender and count number of emails that they sent.

2

To find the number of emails that this user received we are using subquery to the same table mail. In the WHERE clause we select only those rows where receiver has the same credentials as the sender in the main query.

3

A subquery in the column list must have its own alias.

4

To display statistics per user we use clause GROUP BY, so result is groupped by each user name and host. We discuss GROUP BY clause in details in [Link to Come].

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.202.4