This chapter focuses on using the SELECT
statement to retrieve information from your database. You will find the
chapter helpful if your SQL background is limited or to find out about the
MySQL-specific extensions to SELECT
syntax.
There are many ways to write SELECT
statements; we’ll look at only a few.
Consult the MySQL Reference Manual or a general MySQL
text for more information about SELECT
syntax and the functions and operators available to extract and manipulate
data.
Many examples in this chapter use a table named mail
that contains rows that track mail message
traffic between users on a set of hosts. The following shows how that table was created:
CREATE
TABLE
(
t
DATETIME
,
#
when
message
was
sent
srcuser
VARCHAR
(
8
),
#
sender
(
source
user
and
host
)
srchost
VARCHAR
(
20
),
dstuser
VARCHAR
(
8
),
#
recipient
(
destination
user
and
host
)
dsthost
VARCHAR
(
20
),
size
BIGINT
,
#
message
size
in
bytes
INDEX
(
t
)
);
The mail
table contents look like
this:
mysql> SELECT * FROM mail;
+---------------------+---------+---------+---------+---------+---------+
| t | srcuser | srchost | dstuser | dsthost | size |
+---------------------+---------+---------+---------+---------+---------+
| 2014-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |
| 2014-05-12 12:48:13 | tricia | mars | gene | venus | 194925 |
| 2014-05-12 15:02:49 | phil | mars | phil | saturn | 1048 |
| 2014-05-12 18:59:18 | barb | saturn | tricia | venus | 271 |
| 2014-05-14 09:31:37 | gene | venus | barb | mars | 2291 |
| 2014-05-14 11:52:17 | phil | mars | tricia | saturn | 5781 |
| 2014-05-14 14:42:21 | barb | venus | barb | venus | 98151 |
| 2014-05-14 17:03:01 | tricia | saturn | phil | venus | 2394482 |
| 2014-05-15 07:17:48 | gene | mars | gene | saturn | 3824 |
| 2014-05-15 08:50:57 | phil | venus | phil | venus | 978 |
| 2014-05-15 10:25:52 | gene | mars | tricia | saturn | 998532 |
| 2014-05-15 17:35:31 | gene | saturn | gene | mars | 3856 |
| 2014-05-16 09:00:28 | gene | venus | barb | mars | 613 |
| 2014-05-16 23:04:19 | phil | venus | barb | venus | 10294 |
| 2014-05-19 12:49:23 | phil | mars | tricia | saturn | 873 |
| 2014-05-19 22:21:51 | gene | saturn | gene | venus | 23992 |
+---------------------+---------+---------+---------+---------+---------+
To create and load the mail
table, change location into the tables directory of the recipes
distribution and run this
command:
% mysql cookbook < mail.sql
This chapter also uses other tables from time to time. Some were
used in previous chapters, whereas others are new. To create any of them,
do so the same way as for the mail
table, using the appropriate script in the tables directory. In addition, many of the
other scripts and programs used in this chapter are located in the
select directory. The files in that
directory enable you to try the examples more easily.
Many of the statements shown here can be executed from within the mysql program, which is discussed in Chapter 1. A few examples involve issuing statements from within the context of a programming language. See [Link to Come] for information on programming techniques.
The simplest way to display columns from a table is to use
SELECT
*
FROM
tbl_name
. The *
specifier is a shortcut that means all
columns
:
mysql> SELECT * FROM mail;
+---------------------+---------+---------+---------+---------+---------+
| t | srcuser | srchost | dstuser | dsthost | size |
+---------------------+---------+---------+---------+---------+---------+
| 2014-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |
| 2014-05-12 12:48:13 | tricia | mars | gene | venus | 194925 |
| 2014-05-12 15:02:49 | phil | mars | phil | saturn | 1048 |
| 2014-05-12 18:59:18 | barb | saturn | tricia | venus | 271 |
…
Using *
is easy, but you cannot
select only certain columns or control column display order. Naming
columns explicitly enables you to select only the ones of interest, in
any order. This query omits the recipient columns and displays the
sender before the date and size:
mysql> SELECT srcuser, srchost, t, size FROM mail;
+---------+---------+---------------------+---------+
| srcuser | srchost | t | size |
+---------+---------+---------------------+---------+
| barb | saturn | 2014-05-11 10:15:08 | 58274 |
| tricia | mars | 2014-05-12 12:48:13 | 194925 |
| phil | mars | 2014-05-12 15:02:49 | 1048 |
| barb | saturn | 2014-05-12 18:59:18 | 271 |
…
Unless you qualify or restrict a SELECT
query in some way, it retrieves every
row in your table. To be more precise, provide a WHERE
clause that specifies one or more
conditions that rows must satisfy.
Conditions can test for equality, inequality, or relative
ordering. For some types of data, such as strings, you can use pattern
matches. The following statements select columns from rows in the
mail
table containing srchost
values that are exactly equal to the
string 'venus'
or that begin with the
letter 's'
:
mysql>SELECT t, srcuser, srchost FROM mail WHERE srchost = 'venus';
+---------------------+---------+---------+ | t | srcuser | srchost | +---------------------+---------+---------+ | 2014-05-14 09:31:37 | gene | venus | | 2014-05-14 14:42:21 | barb | venus | | 2014-05-15 08:50:57 | phil | venus | | 2014-05-16 09:00:28 | gene | venus | | 2014-05-16 23:04:19 | phil | venus | +---------------------+---------+---------+ mysql>SELECT t, srcuser, srchost FROM mail WHERE srchost LIKE 's%';
+---------------------+---------+---------+ | t | srcuser | srchost | +---------------------+---------+---------+ | 2014-05-11 10:15:08 | barb | saturn | | 2014-05-12 18:59:18 | barb | saturn | | 2014-05-14 17:03:01 | tricia | saturn | | 2014-05-15 17:35:31 | gene | saturn | | 2014-05-19 22:21:51 | gene | saturn | +---------------------+---------+---------+
The LIKE
operator in the previous query performs a pattern match, where %
acts as
a wildcard that matches any string. [Link to Come] discusses pattern matching
further.
A WHERE
clause can test
multiple conditions and different conditions can test different columns.
The following statement finds messages sent by barb
to tricia
:
mysql> SELECT * FROM mail WHERE srcuser = 'barb' AND dstuser = 'tricia';
+---------------------+---------+---------+---------+---------+-------+
| t | srcuser | srchost | dstuser | dsthost | size |
+---------------------+---------+---------+---------+---------+-------+
| 2014-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |
| 2014-05-12 18:59:18 | barb | saturn | tricia | venus | 271 |
+---------------------+---------+---------+---------+---------+-------+
Output columns can be calculated by evaluating expressions. This
query combines the srcuser
and
srchost
columns using CONCAT()
to produce composite values in email address format:
mysql> SELECT t, CONCAT(srcuser,'@',srchost), size FROM mail;
+---------------------+-----------------------------+---------+
| t | CONCAT(srcuser,'@',srchost) | size |
+---------------------+-----------------------------+---------+
| 2014-05-11 10:15:08 | barb@saturn | 58274 |
| 2014-05-12 12:48:13 | tricia@mars | 194925 |
| 2014-05-12 15:02:49 | phil@mars | 1048 |
| 2014-05-12 18:59:18 | barb@saturn | 271 |
…
You’ll notice that the email address column label is the expression that calculates it. To provide a better label, use a column alias (see Recipe 3.2).
As of MySQL 8.0.19 you can use statement TABLE
to select all columns from the table. TABLE
supports ORDER BY
(see Recipe 3.3) and LIMIT
(see Recipe 3.11) clauses, but does not allow any other filtering of columns or rows.
mysql> TABLE mail ORDER BY size DESC LIMIT 3;
+---------------------+---------+---------+---------+---------+---------+
| t | srcuser | srchost | dstuser | dsthost | size |
+---------------------+---------+---------+---------+---------+---------+
| 2014-05-14 17:03:01 | tricia | saturn | phil | venus | 2394482 |
| 2014-05-15 10:25:52 | gene | mars | tricia | saturn | 998532 |
| 2014-05-12 12:48:13 | tricia | mars | gene | venus | 194925 |
+---------------------+---------+---------+---------+---------+---------+
3 rows in set (0.00 sec)
Use aliases to choose your own column names.
When you retrieve a result set, MySQL gives every output column a
name. (That’s how the mysql program
gets the names you see displayed in the initial row of column headers in
result set output.) By default, MySQL assigns the column names specified
in the CREATE
TABLE
or ALTER
TABLE
statement to output columns, but if these defaults are not suitable, you
can use column aliases to specify your own names.
This recipe explains aliases and shows how to use them to assign column names in statements. If you’re writing a program that must determine the names, see [Link to Come] for information about accessing column metadata.
If an output column comes directly from a table, MySQL uses the table column name for the output column name. The following statement selects four table columns, the names of which become the corresponding output column names:
mysql> SELECT t, srcuser, srchost, size FROM mail;
+---------------------+---------+---------+---------+
| t | srcuser | srchost | size |
+---------------------+---------+---------+---------+
| 2014-05-11 10:15:08 | barb | saturn | 58274 |
| 2014-05-12 12:48:13 | tricia | mars | 194925 |
| 2014-05-12 15:02:49 | phil | mars | 1048 |
| 2014-05-12 18:59:18 | barb | saturn | 271 |
…
If you generate a column by evaluating an expression, the
expression itself is the column name. This can produce long and unwieldy
names in result sets, as illustrated by the following statement that
uses one expression to reformat the dates in the t
column, and another to combine srcuser
and srchost
into email address format:
mysql>SELECT
->DATE_FORMAT(t,'%M %e, %Y'), CONCAT(srcuser,'@',srchost), size
->FROM mail;
+----------------------------+-----------------------------+---------+ | DATE_FORMAT(t,'%M %e, %Y') | CONCAT(srcuser,'@',srchost) | size | +----------------------------+-----------------------------+---------+ | May 11, 2014 | barb@saturn | 58274 | | May 12, 2014 | tricia@mars | 194925 | | May 12, 2014 | phil@mars | 1048 | | May 12, 2014 | barb@saturn | 271 | …
To choose your own output column name, use an AS
name
clause to
specify a column alias (the keyword AS
is
optional). The following statement retrieves the same result as the
previous one, but renames the first column to date_sent
and the second to sender
:
mysql>SELECT
->DATE_FORMAT(t,'%M %e, %Y') AS date_sent,
->CONCAT(srcuser,'@',srchost) AS sender,
->size FROM mail;
+--------------+---------------+---------+ | date_sent | sender | size | +--------------+---------------+---------+ | May 11, 2014 | barb@saturn | 58274 | | May 12, 2014 | tricia@mars | 194925 | | May 12, 2014 | phil@mars | 1048 | | May 12, 2014 | barb@saturn | 271 | …
The aliases make the column names more concise, easier to read, and more meaningful. Aliases are subject to a few restrictions. For example, they must be quoted if they are SQL keywords, entirely numeric, or contain spaces or other special characters (an alias can consist of several words if you want to use a descriptive phrase). The following statement retrieves the same data values as the preceding one but uses phrases to name the output columns:
mysql>SELECT
->DATE_FORMAT(t,'%M %e, %Y') AS 'Date of message',
->CONCAT(srcuser,'@',srchost) AS 'Message sender',
->size AS 'Number of bytes' FROM mail;
+-----------------+----------------+-----------------+ | Date of message | Message sender | Number of bytes | +-----------------+----------------+-----------------+ | May 11, 2014 | barb@saturn | 58274 | | May 12, 2014 | tricia@mars | 194925 | | May 12, 2014 | phil@mars | 1048 | | May 12, 2014 | barb@saturn | 271 | …
If MySQL complains about a single-word alias, the word probably is reserved. Quoting the alias should make it legal:
mysql>SELECT 1 AS INTEGER;
You have an error in your SQL syntax near 'INTEGER' mysql>SELECT 1 AS 'INTEGER';
+---------+ | INTEGER | +---------+ | 1 | +---------+
Column aliases also are useful for programming purposes. If you
write a program that fetches rows into an array and accesses them by
numeric column indexes, the presence or absence of column aliases makes
no difference because aliases don’t change the positions of columns
within the result set. However, aliases make a big difference if you
access output columns by name because aliases change those names.
Exploit this fact to give your program easier names to work with. For
example, if your query displays reformatted message time values from the
mail
table using the expression
DATE_FORMAT(t,'%M %e, %Y')
, that
expression is also the name you must use when referring to the output
column. In a Perl hashref, for example, you’d access it as $ref->{"DATE_FORMAT(t,'%M %e, %Y')"}
.
That’s inconvenient. Use AS
date_sent
to give the column an alias and you
can refer to it more easily as $ref->{date_sent}
. Here’s an example that
shows how a Perl DBI script might process such values. It retrieves rows
into a hash and refers to column values by name:
$sth
=
$dbh
->
prepare
(
"SELECT srcuser,
DATE_FORMAT(t,'%M %e, %Y') AS date_sent
FROM mail"
);
$sth
->
execute
();
while
(
my
$ref
=
$sth
->
fetchrow_hashref
())
{
printf
"user: %s, date sent: %s "
,
$ref
->
{
srcuser
},
$ref
->
{
date_sent
};
}
In Java, you’d do something like this, where the argument to getString()
names the column to access:
Statement
s
=
conn
.
createStatement
();
s
.
executeQuery
(
"SELECT srcuser,"
+
" DATE_FORMAT(t,'%M %e, %Y') AS date_sent"
+
" FROM mail"
);
ResultSet
rs
=
s
.
getResultSet
();
while
(
rs
.
next
())
// loop through rows of result set
{
String
name
=
rs
.
getString
(
"srcuser"
);
String
dateSent
=
rs
.
getString
(
"date_sent"
);
System
.
out
.
println
(
"user: "
+
name
+
", date sent: "
+
dateSent
);
}
rs
.
close
();
s
.
close
();
[Link to Come] shows for each of our
programming languages how to fetch rows into data structures that permit
access to column values by name. The select directory of the recipes
distribution has examples that show
how to do this for the mail
table.
You cannot refer to column aliases in a WHERE
clause. Thus, the following statement is illegal:
mysql>SELECT t, srcuser, dstuser, size/1024 AS kilobytes
->FROM mail WHERE kilobytes > 500;
ERROR 1054 (42S22): Unknown column 'kilobytes' in 'where clause'
The error occurs because an alias names an
output column, whereas a WHERE
clause operates on
input columns to determine which rows to select for
output. To make the statement legal, replace the alias in the WHERE
clause with the same column or
expression that the alias represents:
mysql>SELECT t, srcuser, dstuser, size/1024 AS kilobytes
->FROM mail WHERE size/1024 > 500;
+---------------------+---------+---------+-----------+ | t | srcuser | dstuser | kilobytes | +---------------------+---------+---------+-----------+ | 2014-05-14 17:03:01 | tricia | phil | 2338.3613 | | 2014-05-15 10:25:52 | gene | tricia | 975.1289 | +---------------------+---------+---------+-----------+
You want to control how your query results are sorted.
When you select rows, the MySQL server is free to return them in
any order unless you instruct it otherwise by saying how to sort the
result. There are lots of ways to use sorting techniques, as [Link to Come] explores in detail. Briefly, to sort a result set,
add an ORDER
BY
clause that names the column or columns to
use for sorting. This statement names multiple columns in the ORDER
BY
clause to sort rows by host and by user within each host:
mysql>SELECT * FROM mail WHERE dstuser = 'tricia'
->ORDER BY srchost, srcuser;
+---------------------+---------+---------+---------+---------+--------+ | t | srcuser | srchost | dstuser | dsthost | size | +---------------------+---------+---------+---------+---------+--------+ | 2014-05-15 10:25:52 | gene | mars | tricia | saturn | 998532 | | 2014-05-14 11:52:17 | phil | mars | tricia | saturn | 5781 | | 2014-05-19 12:49:23 | phil | mars | tricia | saturn | 873 | | 2014-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 | | 2014-05-12 18:59:18 | barb | saturn | tricia | venus | 271 | +---------------------+---------+---------+---------+---------+--------+
To sort a column in reverse (descending) order, add the keyword DESC
after its
name in the ORDER
BY
clause:
mysql> SELECT * FROM mail WHERE size > 50000 ORDER BY size DESC;
+---------------------+---------+---------+---------+---------+---------+
| t | srcuser | srchost | dstuser | dsthost | size |
+---------------------+---------+---------+---------+---------+---------+
| 2014-05-14 17:03:01 | tricia | saturn | phil | venus | 2394482 |
| 2014-05-15 10:25:52 | gene | mars | tricia | saturn | 998532 |
| 2014-05-12 12:48:13 | tricia | mars | gene | venus | 194925 |
| 2014-05-14 14:42:21 | barb | venus | barb | venus | 98151 |
| 2014-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |
+---------------------+---------+---------+---------+---------+---------+
Use DISTINCT
.
Some queries produce results containing duplicate rows. For example, to
see who sent mail, query the mail
table like this:
mysql> SELECT srcuser FROM mail;
+---------+
| srcuser |
+---------+
| barb |
| tricia |
| phil |
| barb |
| gene |
| phil |
| barb |
| tricia |
| gene |
| phil |
| gene |
| gene |
| gene |
| phil |
| phil |
| gene |
+---------+
That result is heavily redundant. To remove the duplicate rows and
produce a set of unique values, add DISTINCT
to the query:
mysql> SELECT DISTINCT srcuser FROM mail;
+---------+
| srcuser |
+---------+
| barb |
| tricia |
| phil |
| gene |
+---------+
To count the number of unique values in a column, use COUNT(DISTINCT)
:
mysql> SELECT COUNT(DISTINCT srcuser) FROM mail;
+-------------------------+
| COUNT(DISTINCT srcuser) |
+-------------------------+
| 4 |
+-------------------------+
DISTINCT
works with
multiple-column output, too. The following query shows which dates are
represented in the mail
table:
mysql> SELECT DISTINCT YEAR(t), MONTH(t), DAYOFMONTH(t) FROM mail;
+---------+----------+---------------+
| YEAR(t) | MONTH(t) | DAYOFMONTH(t) |
+---------+----------+---------------+
| 2014 | 5 | 11 |
| 2014 | 5 | 12 |
| 2014 | 5 | 14 |
| 2014 | 5 | 15 |
| 2014 | 5 | 16 |
| 2014 | 5 | 19 |
+---------+----------+---------------+
[Link to Come] revisits DISTINCT
and COUNT(DISTINCT)
. [Link to Come]
discusses duplicate removal in more detail.
Conditions that involve NULL
are special because NULL
means
unknown value.
Consequently, comparisons such as
value
=
NULL
or
value
<>
NULL
always produce a result of NULL
(not true or false) because it’s
impossible to tell whether they are true or false. Even NULL
=
NULL
produces NULL
because you can’t determine whether one
unknown value is the same as another.
To look for values that are or are not NULL
, use the IS
NULL
or
IS
NOT
NULL
operator. Suppose that a table named expt
contains experimental results for
subjects who are to be given four tests each and that represents tests
not yet administered using NULL
:
+---------+------+-------+ | subject | test | score | +---------+------+-------+ | Jane | A | 47 | | Jane | B | 50 | | Jane | C | NULL | | Jane | D | NULL | | Marvin | A | 52 | | Marvin | B | 45 | | Marvin | C | 53 | | Marvin | D | NULL | +---------+------+-------+
You can see that =
and <>
fail to identify NULL
values:
mysql>SELECT * FROM expt WHERE score = NULL;
Empty set (0.00 sec) mysql>SELECT * FROM expt WHERE score <> NULL;
Empty set (0.00 sec)
Write the statements like this instead:
mysql>SELECT * FROM expt WHERE score IS NULL;
+---------+------+-------+ | subject | test | score | +---------+------+-------+ | Jane | C | NULL | | Jane | D | NULL | | Marvin | D | NULL | +---------+------+-------+ mysql>SELECT * FROM expt WHERE score IS NOT NULL;
+---------+------+-------+ | subject | test | score | +---------+------+-------+ | Jane | A | 47 | | Jane | B | 50 | | Marvin | A | 52 | | Marvin | B | 45 | | Marvin | C | 53 | +---------+------+-------+
The MySQL-specific <=>
comparison operator, unlike the =
operator, is true even for two NULL
values:
mysql> SELECT NULL = NULL, NULL <=> NULL;
+-------------+---------------+
| NULL = NULL | NULL <=> NULL |
+-------------+---------------+
| NULL | 1 |
+-------------+---------------+
Sometimes it’s useful to map NULL
values onto some other value that has
more meaning in the context of your application. For example, use IF()
to map
NULL
onto the string Unknown
:
mysql>SELECT subject, test, IF(score IS NULL,'Unknown', score) AS 'score'
->FROM expt;
+---------+------+---------+ | subject | test | score | +---------+------+---------+ | Jane | A | 47 | | Jane | B | 50 | | Jane | C | Unknown | | Jane | D | Unknown | | Marvin | A | 52 | | Marvin | B | 45 | | Marvin | C | 53 | | Marvin | D | Unknown | +---------+------+---------+
This IF()
-based mapping
technique works for any kind of value, but it’s especially useful with
NULL
values because NULL
tends to be given a variety of meanings:
unknown, missing, not yet determined, out of range, and so forth. Choose
the label that makes the most sense in a given context.
The preceding query can be written more concisely using IFNULL()
, which
tests its first argument and returns it if it’s not NULL
, or returns its second argument
otherwise:
SELECT
subject
,
test
,
IFNULL
(
score
,
'Unknown'
)
AS
'score'
FROM
expt
;
In other words, these two tests are equivalent:
IF(expr1
IS NOT NULL,expr1
,expr2
) IFNULL(expr1
,expr2
)
From a readability standpoint, IF()
often is easier to understand than
IFNULL()
. From a computational
perspective, IFNULL()
is more
efficient because expr1
need not be evaluated
twice, as happens with IF()
.
You’re writing a program that looks for rows containing a specific
value, but it fails when the value is NULL
.
Choose the proper comparison operator according to whether the
comparison value is or is not NULL
.
Recipe 3.5 discusses the need to
use different comparison operators for NULL
values than for non-NULL
values in SQL statements. This issue
leads to a subtle danger when constructing statement strings within
programs. If a value stored in a variable might represent a NULL
value, you must account for that when you
use the value in comparisons. For example, in Python, None
represents a NULL
value, so to
construct a statement that finds rows in the expt
table matching some arbitrary value in a
score
variable, you cannot do
this:
cursor
.
execute
(
"SELECT * FROM expt WHERE score =
%s
"
,
(
score
,))
The statement fails when score
is None
because the resulting
statement becomes:
SELECT
*
FROM
expt
WHERE
score
=
NULL
A comparison of score
=
NULL
is
never true, so that statement returns no rows. To take into account the
possibility that score
could be
None
, construct the statement using
the appropriate comparison operator like this:
operator
=
"IS"
if
score
is
None
else
"="
cursor
.
execute
(
"SELECT * FROM expt WHERE score {}
%s
"
.
format
(
operator
),
(
score
,))
This results in statements as follows for score
values of None
(NULL
) or 43 (not NULL
):
SELECT
*
FROM
expt
WHERE
score
IS
NULL
SELECT
*
FROM
expt
WHERE
score
=
43
For inequality tests, set operator
like this instead:
operator
=
"IS NOT"
if
score
is
None
else
"<>"
Use a view defined such that its columns perform the desired calculations.
Suppose that you retrieve several values from the mail
table, using expressions to calculate
most of them:
mysql>SELECT
->DATE_FORMAT(t,'%M %e, %Y') AS date_sent,
->CONCAT(srcuser,'@',srchost) AS sender,
->CONCAT(dstuser,'@',dsthost) AS recipient,
->size FROM mail;
+--------------+---------------+---------------+---------+ | date_sent | sender | recipient | size | +--------------+---------------+---------------+---------+ | May 11, 2014 | barb@saturn | tricia@mars | 58274 | | May 12, 2014 | tricia@mars | gene@venus | 194925 | | May 12, 2014 | phil@mars | phil@saturn | 1048 | | May 12, 2014 | barb@saturn | tricia@venus | 271 | …
If you must issue such a statement often, it’s inconvenient to
keep writing the expressions. To make the statement results easier to
access, use a view, which is a virtual table that contains no data.
Instead, it’s defined as the SELECT
statement that retrieves the data of interest. The following view,
mail_view
, is equivalent to the
SELECT
statement just shown:
mysql>CREATE VIEW mail_view AS
->SELECT
->DATE_FORMAT(t,'%M %e, %Y') AS date_sent,
->CONCAT(srcuser,'@',srchost) AS sender,
->CONCAT(dstuser,'@',dsthost) AS recipient,
->size FROM mail;
To access the view contents, refer to it like any other table. You
can select some or all of its columns, add a WHERE
clause to restrict which rows to retrieve, use ORDER
BY
to
sort the rows, and so forth. For example:
mysql>SELECT date_sent, sender, size FROM mail_view
->WHERE size > 100000 ORDER BY size;
+--------------+---------------+---------+ | date_sent | sender | size | +--------------+---------------+---------+ | May 12, 2014 | tricia@mars | 194925 | | May 15, 2014 | gene@mars | 998532 | | May 14, 2014 | tricia@saturn | 2394482 | +--------------+---------------+---------+
Stored programs provide another way to encapsulate calculations (see [Link to Come]).
The queries shown so far select data from a single table, but sometimes you must retrieve information from multiple tables. Two types of statements that accomplish this are joins and subqueries. A join matches rows in one table with rows in another and enables you to retrieve output rows that contain columns from either or both tables. A subquery is one query nested within another, to perform a comparison between values selected by the inner query against values selected by the outer query.
This recipe shows a couple brief examples to illustrate the basic ideas. Other examples appear elsewhere: subqueries are used in various examples throughout the book (for example, Recipe 3.10 and [Link to Come]). [Link to Come] discusses joins in detail, including some that select from more than two tables.
The following examples use the profile
table introduced in [Link to Come]. Recall that it lists the people on your buddy
list:
mysql> SELECT * FROM profile;
+----+---------+------------+-------+-----------------------+------+
| id | name | birth | color | foods | cats |
+----+---------+------------+-------+-----------------------+------+
| 1 | Sybil | 1970-04-13 | black | lutefisk,fadge,pizza | 0 |
| 2 | Nancy | 1969-09-30 | white | burrito,curry,eggroll | 3 |
| 3 | Ralph | 1973-11-02 | red | eggroll,pizza | 4 |
| 4 | Lothair | 1963-07-04 | blue | burrito,curry | 5 |
| 5 | Henry | 1965-02-14 | red | curry,fadge | 1 |
| 6 | Aaron | 1968-09-17 | green | lutefisk,fadge | 1 |
| 7 | Joanna | 1952-08-20 | green | lutefisk,fadge | 0 |
| 8 | Stephen | 1960-05-01 | white | burrito,pizza | 0 |
+----+---------+------------+-------+-----------------------+------+
Let’s extend use of the profile
table to include another table named profile_contact
. This second table indicates
how to contact people listed in the profile
table via various social media
services and is defined like this:
CREATE
TABLE
profile_contact
(
profile_id
INT
UNSIGNED
NOT
NULL
,
#
ID
from
profile
table
service
VARCHAR
(
20
)
NOT
NULL
,
#
social
media
service
name
contact_name
VARCHAR
(
25
)
NOT
NULL
,
#
name
to
use
for
contacting
person
INDEX
(
profile_id
)
);
The table associates each row with the proper profile
row via the profile_id
column. The service
and contact_name
columns name the media service
and the name to use for contacting the given person via that service.
For the examples, assume that the table contains these rows:
mysql> SELECT * FROM profile_contact ORDER BY profile_id, service;
+------------+----------+--------------+
| profile_id | service | contact_name |
+------------+----------+--------------+
| 1 | Facebook | user1-fbid |
| 1 | Twitter | user1-twtrid |
| 2 | Facebook | user2-msnid |
| 2 | LinkedIn | user2-lnkdid |
| 2 | Twitter | user2-fbrid |
| 4 | LinkedIn | user4-lnkdid |
+------------+----------+--------------+
A question that requires information from both tables is,
For each person in the
To answer this question, use a join. Select from both tables and match
rows by comparing the profile
table, show me which services I
can use to get in touch, and the contact name for each service.id
column from
the profile
table with the profile_id
column from the profile_contact
table:
mysql>SELECT id, name, service, contact_name
->FROM profile INNER JOIN profile_contact ON id = profile_id;
+----+---------+----------+--------------+ | id | name | service | contact_name | +----+---------+----------+--------------+ | 1 | Sybil | Twitter | user1-twtrid | | 1 | Sybil | Facebook | user1-fbid | | 2 | Nancy | Twitter | user2-fbrid | | 2 | Nancy | Facebook | user2-msnid | | 2 | Nancy | LinkedIn | user2-lnkdid | | 4 | Lothair | LinkedIn | user4-lnkdid | +----+---------+----------+--------------+
The FROM
clause indicates the
tables from which to select data, and the ON
clause tells MySQL which columns to use to
find matches between the tables. In the result, rows include the
id
and name
columns from the profile
table, and the service
and contact_name
columns from the profile_contact
table.
Here’s another question that requires both tables to answer:
List all the
To pull the proper rows from the profile_contact
records for Nancy.profile_contact
table, you need Nancy’s ID,
which is stored in the profile
table.
To write the query without looking up Nancy’s ID yourself, use a
subquery that, given her name, looks it up for you:
mysql>SELECT * FROM profile_contact
->WHERE profile_id = (SELECT id FROM profile WHERE name = 'Nancy');
+------------+----------+--------------+ | profile_id | service | contact_name | +------------+----------+--------------+ | 2 | Twitter | user2-fbrid | | 2 | Facebook | user2-msnid | | 2 | LinkedIn | user2-lnkdid | +------------+----------+--------------+
Here the subquery appears as a nested SELECT
statement
enclosed within parentheses.
MySQL supports a LIMIT
clause
that tells the server to return only part of a result set. LIMIT
is a MySQL-specific extension to SQL
that is extremely valuable when your result set contains more rows than
you want to see at a time. It enables you to retrieve an arbitrary
section of a result set. Typical LIMIT
uses include the following kinds of
problems:
Answering questions about first or last, largest or smallest, newest or oldest, least or most expensive, and so forth.
Splitting a result set into sections so that you can process it one piece at a time. This technique is common in web applications for displaying a large search result across several pages. Showing the result in sections enables display of smaller, easier-to-understand pages.
The following examples use the profile
table shown in Recipe 3.8. To see the first
n
rows of a SELECT
result, add LIMIT
n
to the end
of the statement:
mysql>SELECT * FROM profile LIMIT 1;
+----+-------+------------+-------+----------------------+------+ | id | name | birth | color | foods | cats | +----+-------+------------+-------+----------------------+------+ | 1 | Sybil | 1970-04-13 | black | lutefisk,fadge,pizza | 0 | +----+-------+------------+-------+----------------------+------+ mysql>SELECT * FROM profile LIMIT 3;
+----+-------+------------+-------+-----------------------+------+ | id | name | birth | color | foods | cats | +----+-------+------------+-------+-----------------------+------+ | 1 | Sybil | 1970-04-13 | black | lutefisk,fadge,pizza | 0 | | 2 | Nancy | 1969-09-30 | white | burrito,curry,eggroll | 3 | | 3 | Ralph | 1973-11-02 | red | eggroll,pizza | 4 | +----+-------+------------+-------+-----------------------+------+
LIMIT
n
means return at
most
If you
specify n
rows.LIMIT
10
, and the result set has only four rows, the
server returns four rows.
The rows in the preceding query results are returned in no
particular order, so they may not be very meaningful. A more common
technique uses ORDER
BY
to sort the result set and LIMIT
to find smallest and largest values. For
example, to find the row with the minimum (earliest) birth date, sort by
the birth
column, then add LIMIT
1
to
retrieve the first row:
mysql> SELECT * FROM profile ORDER BY birth LIMIT 1;
+----+--------+------------+-------+----------------+------+
| id | name | birth | color | foods | cats |
+----+--------+------------+-------+----------------+------+
| 7 | Joanna | 1952-08-20 | green | lutefisk,fadge | 0 |
+----+--------+------------+-------+----------------+------+
This works because MySQL processes the ORDER
BY
clause to sort the rows, then applies LIMIT
.
To obtain rows from the end of a result set, sort them in the opposite order. The statement that finds the row with the most recent birth date is similar to the previous one, except that the sort order is descending:
mysql> SELECT * FROM profile ORDER BY birth DESC LIMIT 1;
+----+-------+------------+-------+---------------+------+
| id | name | birth | color | foods | cats |
+----+-------+------------+-------+---------------+------+
| 3 | Ralph | 1973-11-02 | red | eggroll,pizza | 4 |
+----+-------+------------+-------+---------------+------+
To find the earliest or latest birthday within the calendar year,
sort by the month and day of the birth
values:
mysql>SELECT name, DATE_FORMAT(birth,'%m-%d') AS birthday
->FROM profile ORDER BY birthday LIMIT 1;
+-------+----------+ | name | birthday | +-------+----------+ | Henry | 02-14 | +-------+----------+
You can obtain the same information by running these statements
without LIMIT
and ignoring everything
but the first row. The advantage of LIMIT
is that the server returns only the
first row, and the extra rows don’t cross the network at all. This is
much more efficient than retrieving an entire result set, only to
discard all but one row.
To pull rows from the middle of a result set, use the two-argument
form of LIMIT
, which enables you to
pick an arbitrary section of rows. The arguments indicate how many rows
to skip and how many to return. This means that you can use LIMIT
to do such things as skip two rows and
return the next one, thus answering questions such as What is the
third-smallest or
third-largest value?
These are questions
that MIN()
or MAX()
are not suited for, but are easy with
LIMIT
:
mysql>SELECT * FROM profile ORDER BY birth LIMIT 2,1;
+----+---------+------------+-------+---------------+------+ | id | name | birth | color | foods | cats | +----+---------+------------+-------+---------------+------+ | 4 | Lothair | 1963-07-04 | blue | burrito,curry | 5 | +----+---------+------------+-------+---------------+------+ mysql>SELECT * FROM profile ORDER BY birth DESC LIMIT 2,1;
+----+-------+------------+-------+-----------------------+------+ | id | name | birth | color | foods | cats | +----+-------+------------+-------+-----------------------+------+ | 2 | Nancy | 1969-09-30 | white | burrito,curry,eggroll | 3 | +----+-------+------------+-------+-----------------------+------+
The two-argument form of LIMIT
also makes it possible to partition a result set into smaller sections.
For example, to retrieve 20 rows at a time from a result, issue a
SELECT
statement repeatedly, but vary
its LIMIT
clause like so:
SELECT
...
FROM
...
ORDER
BY
...
LIMIT
0
,
20
;
SELECT
...
FROM
...
ORDER
BY
...
LIMIT
20
,
20
;
SELECT
...
FROM
...
ORDER
BY
...
LIMIT
40
,
20
;
…
Web developers often use LIMIT
this way to split a large search result into smaller, more manageable
pieces so that it can be presented over several pages.
To determine the number of rows in a result set so that you can
determine the number of sections, issue a COUNT()
statement first. For example, to display profile
table rows in name order, three at a
time, you can find out how many there are with the following
statement:
mysql> SELECT COUNT(*) FROM profile;
+----------+
| COUNT(*) |
+----------+
| 8 |
+----------+
That tells you that there are three sets of rows (the last with fewer than three rows), which you can retrieve as follows:
SELECT
*
FROM
profile
ORDER
BY
name
LIMIT
0
,
3
;
SELECT
*
FROM
profile
ORDER
BY
name
LIMIT
3
,
3
;
SELECT
*
FROM
profile
ORDER
BY
name
LIMIT
6
,
3
;
You can also fetch the first part of a result set and determine at
the same time how big the result would have been without the LIMIT
clause. To fetch the first three rows
from the profile
table, and then
obtain the size of the full result, run these statements:
SELECT
SQL_CALC_FOUND_ROWS
*
FROM
profile
ORDER
BY
name
LIMIT
4
;
SELECT
FOUND_ROWS
();
The keyword SQL_CALC_FOUND_ROWS
in the first statement tells MySQL to calculate the size of
the entire result set even though the statement requests that only part
of it be returned. The row count is available by calling FOUND_ROWS()
.
If that function returns a value greater than three, there are other
rows yet to be retrieved.
LIMIT
is useful in combination
with RAND()
to make random selections from a set of items. See [Link to Come].
You can use LIMIT
to restrict
the effect of a DELETE
or
UPDATE
statement to a subset of the
rows that would otherwise be deleted or updated, respectively. For more
information about using LIMIT
for
duplicate row removal, see [Link to Come].
WrongSort Order
If you want the last four rows of a result set, you can obtain
them easily by sorting the set in reverse order and using LIMIT
4
.
The following statement returns the names and birth dates for the four
people in the profile
table who were
born most recently:
mysql> SELECT name, birth FROM profile ORDER BY birth DESC LIMIT 4;
+-------+------------+
| name | birth |
+-------+------------+
| Ralph | 1973-11-02 |
| Sybil | 1970-04-13 |
| Nancy | 1969-09-30 |
| Aaron | 1968-09-17 |
+-------+------------+
But that requires sorting the birth
values in descending order to place them
at the head of the result set. What if you want the output rows to
appear in ascending order instead? Use the SELECT
as a subquery of an outer statement
that re-sorts the rows in the desired final order:
mysql>SELECT * FROM
->(SELECT name, birth FROM profile ORDER BY birth DESC LIMIT 4) AS t
->ORDER BY birth;
+-------+------------+ | name | birth | +-------+------------+ | Aaron | 1968-09-17 | | Nancy | 1969-09-30 | | Sybil | 1970-04-13 | | Ralph | 1973-11-02 | +-------+------------+
AS
t
is used here because any table referred to in the FROM
clause must
have a name, even a derived
table produced from a
subquery.
Sadly, you cannot. LIMIT
arguments must be literal integers—unless you issue the statement in a
context that permits the statement string to be constructed dynamically.
In that case, you can evaluate the expressions yourself and insert the
resulting values into the statement string.
Arguments to LIMIT
must be
literal integers, not expressions. Statements such as the following are
illegal:
SELECT
*
FROM
profile
LIMIT
5
+
5
;
SELECT
*
FROM
profile
LIMIT
@
skip_count
,
@
show_count
;
The same no expressions permitted
principle applies
if you use an expression to calculate a LIMIT
value in a program that constructs a
statement string. You must evaluate the expression first, and then place
the resulting value in the statement. For example, if you produce a
statement string in Perl or PHP as follows, an error will result when
you attempt to execute the statement:
$str
=
"SELECT * FROM profile LIMIT $x + $y"
;
To avoid the problem, evaluate the expression first:
$z
=
$x
+
$y
;
$str
=
"SELECT * FROM profile LIMIT $z"
;
Or do this (don’t omit the parentheses or the expression won’t evaluate properly):
$str
=
"SELECT * FROM profile LIMIT "
.
(
$x
+
$y
);
To construct a two-argument LIMIT
clause, evaluate both expressions before
placing them into the statement string.
Another issue related to LIMIT
(or other syntax constructions that require literal integer values)
occurs when you use prepared statements from an API that quotes all data
values as strings when binding them to parameter markers. Suppose that
you prepare and execute a statement like this in PDO:
$sth
=
$dbh
->
prepare
(
"SELECT * FROM profile LIMIT ?,?"
);
$sth
->
execute
(
array
(
2
,
4
));
The resulting statement is as follows, with quoted LIMIT
arguments, so statement execution
fails:
SELECT
*
FROM
profile
LIMIT
'2'
,
'4'
To avoid this problem, evaluate the LIMIT
arguments and place them in the
statement yourself, as just described. Alternatively, if your API has
type-hinting capability, use it to indicate that the LIMIT
arguments are integers to prevent them
from being quoted.
You want to combine rows, retrieved by two or more SELECT
statements into one result set.
Use UNION
clause.
The mail
table stores user names and hosts of the email senders and recipients. But what if we want to know all the user and host combinations possible?
Naive approach would be to choose either sender or receiver pairs. But if we perform even very basic test by comparing number of unique user-host combinations we will find out that it is different for each of directions.
mysql>select count(distinct srcuser, srchost) from mail;
+----------------------------------+ | count(distinct srcuser, srchost) | +----------------------------------+ | 9 | +----------------------------------+ 1 row in set (0.01 sec) mysql>select count(distinct dstuser, dsthost) from mail;
+----------------------------------+ | count(distinct dstuser, dsthost) | +----------------------------------+ | 10 | +----------------------------------+ 1 row in set (0.00 sec)
We also do not know if our table stores emails from users who only send them and for users who receive but never send.
To get the full list we need to select pairs for both sender and receiver, then remove duplicates. SQL clause UNION DISTINCT
and its short form UNION
does exactly that. It combines results of two or more SELECT
queries that select the same number of columns of the same type.
By default UNION
uses column names of the first SELECT
for the full result set header, but we can also use aliases as discussed in Recipe 3.2.
mysql>SELECT DISTINCT srcuser AS user, srchost AS host FROM mail
->UNION
->SELECT DISTINCT dstuser AS user, dsthost AS host FROM mail;
+--------+--------+ | user | host | +--------+--------+ | barb | saturn | | tricia | mars | | phil | mars | | gene | venus | | barb | venus | | tricia | saturn | | gene | mars | | phil | venus | | gene | saturn | | phil | saturn | | tricia | venus | | barb | mars | +--------+--------+ 12 rows in set (0.00 sec)
You may sort as individual query, participating in UNION
, as well the whole result. If you do not want to remove duplicates from the result use clause UNION ALL
.
To demonstrate this lets create a query that will find four users who sent the highest number of emails and four users who recieved the highest number of emails, then sort result of the union by the user name.
mysql>(SELECT CONCAT(srcuser, '@', srchost) AS user, COUNT(*) AS emails
->FROM mail GROUP BY srcuser, srchost ORDER BY emails DESC LIMIT 4)
->UNION ALL
->(SELECT CONCAT(dstuser, '@', dsthost) AS user, COUNT(*) AS emails
->FROM mail GROUP BY dstuser, dsthost ORDER BY emails DESC LIMIT 4)
->ORDER BY user;
+---------------+--------+ | user | emails | +---------------+--------+ | barb@mars | 2 | | barb@saturn | 2 | | barb@venus | 2 | | gene@saturn | 2 | | gene@venus | 2 | | gene@venus | 2 | | phil@mars | 3 | | tricia@saturn | 3 | +---------------+--------+ 8 rows in set (0.00 sec)
Concatenate user and host into email address of the user.
Order first SELECT
result by number of emails descending and limit number of retrieved rows.
Order result of the second SELECT
.
Order result of the UNION
by the user email address.
We used clause UNION ALL
instead of UNION [DISTINCT]
, therefore we have two entries for gene@venus
in the result. This user is in the top list of those who send emails and also of those who recieve emails.
You want to retrieve not only table columns, but also results of queries that use these columns.
Use subquery in the column list.
Suppose that you want to know not only how many emails sent a particular user, but also how many emails they received. You cannot do it without accessing the table mail
two times: one to count how many emails were sent and second to count how many emails were received.
One of solutions for this issue is to use subqueries in the column list.
mysql>SELECT CONCAT(srcuser, '@', srchost) AS user, COUNT(*) AS mails_sent,
->(SELECT COUNT(*) FROM mail d WHERE d.dstuser=m.srcuser AND d.dsthost=m.srchost)
->AS mails_received
->FROM mail m
->GROUP BY srcuser, srchost
->ORDER BY mails_sent DESC;
+---------------+------------+----------------+ | user | mails_sent | mails_received | +---------------+------------+----------------+ | phil@mars | 3 | 0 | | barb@saturn | 2 | 0 | | gene@venus | 2 | 2 | | gene@mars | 2 | 1 | | phil@venus | 2 | 2 | | gene@saturn | 2 | 1 | | tricia@mars | 1 | 1 | | barb@venus | 1 | 2 | | tricia@saturn | 1 | 3 | +---------------+------------+----------------+ 9 rows in set (0.00 sec)
First we are retrieved a user name and a host of the sender and count number of emails that they sent.
To find the number of emails that this user received we are using subquery to the same table mail
. In the WHERE
clause we select only those rows where receiver has the same credentials as the sender in the main query.
A subquery in the column list must have its own alias.
To display statistics per user we use clause GROUP BY
, so result is groupped by each user name and host. We discuss GROUP BY
clause in details in [Link to Come].
3.15.202.4