The Basics of SELECT Queries

The SELECT query can be run with a range of possible arguments. Its syntax, showing the majority of its options (but not all), looks like this:

SELECT
  [DISTINCT]
  select_expression,...
  [INTO {OUTFILE | DUMPFILE} '/path/to/filename' export_options]
  [FROM table_references
  [WHERE where_definition]
  [GROUP BY {column_name | column_alias}
    [ASC | DESC], ...]
  [HAVING where_definition]
  [ORDER BY {column_name | column_alias}
    [ASC | DESC], ...]
  [LIMIT [offset,] num_rows]

Some arguments have been omitted for clarity where they are less frequently used. See Appendix B, “SQL Reference,” for the full detail.

You will often use just a small part of its syntax; often just this:

SELECT
 select_expression,...
 [FROM table_name]
 [WHERE where_definition]

Let's consider this simpler form first, before exploring some of the more complex possibilities.

SELECT Queries Made Simple

In simple terms, the SELECT query lets you read data from a table. If you're reading from a table and add a WHERE clause, you can require that the returned table rows match a condition that you specify, denoted in the preceding syntax by where_definition.

You can select from a single table or several tables, and you can write expressions around column names, so that table values are processed in some way before reaching the resultset. You can also have a SELECT query with no tables and purely evaluate expressions. You'll see some examples of these things in a moment.

Let's look at a query that reads every row from a single table, products, which has columns for id, product_name, and price. SELECT reads each row of the table in turn and puts the data into a resultset. The resultset can then be displayed to the user (as we'll do in our examples, in the mysql console), but it could be returned to an application that is querying MySQL through an API.

You can use a wildcard * to retrieve the data in every column of a table. Here's the sample query:

mysql> SELECT * FROM products;
						

This is identical to naming all the columns:

mysql> SELECT id, product_name, price FROM products;
						

(Remember from Day 4, “Getting Hands-On with MySQL,” that the trailing semicolon; after the query is not part of the query itself and just tells the mysql console that you've finished your query.)

The output looks like this:

+------+--------------------+-------+
| id   | product_name       | price |
+------+--------------------+-------+
|  102 | Light brown jacket | 10.00 |
|  103 | Leather jacket     | 28.00 |
|  104 | Charcoal trousers  | 39.50 |
|  113 | Dark grey jacket   | 10.00 |
+------+--------------------+-------+

The names of the columns in products are returned as column headers in the resultset, with the data for every row in the body of the resultset.

You can restrict the data you retrieve to include just a subset of the columns. For example, to get only the column product_name, you would name just this column, like this:

mysql> SELECT product_name FROM products;
+--------------------+
| product_name       |
+--------------------+
| Light brown jacket |
| Leather jacket     |
| Charcoal trousers  |
| Dark grey jacket   |
+--------------------+

You can control which rows are retrieved by adding a WHERE condition. For example, to only retrieve products with an id equal to 102, and to display product_name and price, you would do this:

mysql> SELECT product_name, price FROM products
    -> WHERE product_id=102;
+--------------------+-------+
| product_name       | price |
+--------------------+-------+
| Light brown jacket | 10.00 |
+--------------------+-------+

You can ask SELECT to evaluate expressions for you, without needing to refer to a table. For example:

mysql> SELECT 12.50 * 7, 1+2, 'hello world';
+-----------+-----+-------------+
| 12.50 * 7 | 1+2 | hello world |
+-----------+-----+-------------+
|     87.50 |   3 | hello world |
+-----------+-----+-------------+

In this example, the query has multiplied two numbers together using the * multiplication operator, added two numbers using +, and returned a constant string.

SELECT can also evaluate functions, of which MySQL has a large library. For example, it can process dates and times, and here's an example of it getting the current time using the function NOW():

mysql> SELECT NOW();
+---------------------+
| NOW()               |
+---------------------+
| 2002-09-07 20:05:45 |
+---------------------+

Operators and functions are explained more thoroughly in Day 10, “Operators and Functions in MySQL.”

SELECT in More Detail

Now that you've seen how SELECT works in its most simple form, it's time to look at more of its possibilities. Take a look at its syntax again:

SELECT
  [DISTINCT]
  select_expression,...
  [FROM table_references
  [WHERE where_definition]
  [GROUP BY {column_name | column_alias}
    [ASC | DESC], ...]
  [HAVING where_definition]
  [ORDER BY {column_name | column_alias}
    [ASC | DESC], ...]
  [LIMIT [offset,] num_rows]

Let's go through the preceding syntax again, this time a little more thoroughly:

  • select_expression specifies a list of things to be retrieved or evaluated: names of columns, constants, or the result of evaluating operators and functions (comma-separated if more than one, and optionally with alias names).

  • table_references specifies a list of tables in which any named columns reside (comma-separated if more than one table, and optionally with table alias names and/or join specifications).

  • where_definition specifies the conditions for inclusion of rows.

  • DISTINCT specifies that only unique rows should be returned.

  • The GROUP BY clause specifies how results should be grouped, by a column given by column_name or column_alias; can be in either ascending (ASC) or descending (DESC) order.

  • The HAVING clause specifies how additional processing should be applied to a resultset, just before being sent to the user.

  • The ORDER BY clause specifies how results should be sorted, by a column given by column_name or column_alias; can be in either ascending (ASC) or descending (DESC) order.

  • The LIMIT clause specifies how the resultset should be limited, by allowing only a subset of the total rows it may comprise given by num_rows, optionally omitting offset rows.

Don't worry about taking all this in at once! We'll spend much of today going through these options one by one.

In the syntax just now, aliases were mentioned. You should recall from Day 6, “Manipulating a Database,” that an alias is a way of calling a table, a column, or some expression by another name. You use the keyword AS to declare an alias. The meaning of the alias will be the same as the original thing, but the alias is a shorter or more convenient name.

The syntax also mentioned joins. What's a join? SELECT can retrieve data from several tables at once, and in doing so can also cross-reference rows in one table with rows in another. You'll learn how to perform joins in Day 9, “Joins and Indexes.”

If you use a where_definition in a SELECT, it defines what limitations you placed on the rows that are retrieved. You already saw in this lesson how to limit the rows retrieved from the products table by placing a restriction on their names. You can have several conditions in a WHERE clause, including operators, functions, and expressions, which are brought together using AND and OR conditions, among other things.

Here's an example of a more complex SELECT statement, which combines column names, expressions, and a WHERE clause:

mysql> SELECT product_name, price, price*(17.5/100)
    -> FROM products
    -> WHERE product_name LIKE 'Leather%'
    -> AND price < 30;
+----------------+-------+------------------+
| product_name   | price | price*(17.5/100) |
+----------------+-------+------------------+
| Leather jacket | 28.00 |            4.900 |
+----------------+-------+------------------+

In this example, the select_expression contains not just two column names, but an expression that multiplies price by 17.5/100 (for calculating sales tax, for example).

Rows will only be retrieved where the overall where_definition resolves to a logical “true.” The where_definition contains two conditions, combined with an AND clause. So products must have a name starting with the word Leather and have a price of under 20 to be included in the resultset.

When executing the SELECT query, MySQL considers each row of the table products and checks that the where_definition evaluates to “true.” You could of course write this:

mysql> SELECT product_name, price, price*(12.5/100)
    -> FROM products
    -> WHERE 1;
						

It would retrieve every row (as if you had omitted the WHERE clause) because the where_condition is 1—a logical true—for every row.

You've seen how to define what will be retrieved by a query, and how to apply a WHERE clause. But what about the other elements of a SELECT? Let's look at those now.

Controlling the Resultset of a SELECT Query

How else can you control what is returned by SELECT? Here are a few other controls that can apply:

  • Only retrieve rows that are distinct—that is, you want no duplicate rows in your resultset.

  • Sort the resultset. You can have the resultset data listed in the order you specify.

  • Group the data in the resultset.

  • Retrieve only the first n rows of the total resultset, or retrieve n rows starting from a given row number.

The tools that enable you to control the resultset in these ways are DISTINCT, ORDER BY, GROUP BY, and LIMIT. Let's look at each one in turn.

Selecting with DISTINCT

The keyword DISTINCT tells SELECT to retrieve only unique rows in the resultset. Duplicate rows appear only once in the output.

Look at the sample products table again:

mysql> SELECT * FROM products;
+------+--------------------+-------+
| id   | product_name       | price |
+------+--------------------+-------+
|  102 | Light brown jacket | 10.00 |
|  103 | Leather jacket     | 28.00 |
|  104 | Charcoal trousers  | 39.50 |
|  113 | Dark grey jacket   | 10.00 |
+------+--------------------+-------+

Suppose that you want to list prices only, but you want each price to appear only once in the resultset, regardless of how many products have that price. You would use DISTINCT, like this:

mysql> SELECT DISTINCT price FROM products;
+-------+
| price |
+-------+
| 10.00 |
| 28.00 |
| 39.50 |
+-------+

Although there's more than one product whose price is 10.00, the number 10.00 is returned only once.

Note, however, that MySQL looks for entire rows of the resultset to be distinct. So the following query now returns items with the same price multiple times, because their product_name is different between the rows:

mysql> SELECT DISTINCT price, product_name FROM products;
+-------+--------------------+
| price | product_name       |
+-------+--------------------+
| 10.00 | Light brown jacket |
| 28.00 | Leather jacket     |
| 39.50 | Charcoal trousers  |
| 10.00 | Dark grey jacket   |
+-------+--------------------+
							

Sorting the Resultset with ORDER BY

It's common to want to sort the resultset. You can sort rows on numerical or string values, and even sort on several things in a defined order of precedence.

Here's a simple example, listing the products in price order:

mysql> SELECT * FROM products ORDER BY price;
+------+--------------------+-------+
| id   | product_name       | price |
+------+--------------------+-------+
|  102 | Light brown jacket | 10.00 |
|  113 | Dark grey jacket   | 10.00 |
|  103 | Leather jacket     | 28.00 |
|  104 | Charcoal trousers  | 39.50 |
+------+--------------------+-------+

As you can see, the rows are sorted with the lowest price first. They're in ascending order (lowest first).

You can use the keyword ASC if you want to be absolutely clear when declaring the sort order, but it's not necessary because ascending is the default. This does the same thing:

mysql> SELECT * FROM products ORDER BY price ASC;
							

To sort in descending order, use DESC, like this:

mysql> SELECT * FROM products ORDER BY price DESC;
+------+--------------------+-------+
| id   | product_name       | price |
+------+--------------------+-------+
|  104 | Charcoal trousers  | 39.50 |
|  103 | Leather jacket     | 28.00 |
|  102 | Light brown jacket | 10.00 |
|  113 | Dark grey jacket   | 10.00 |
+------+--------------------+-------+

Tip

Don't overuse the * wildcard in SELECT queries when you're writing applications. You should retrieve all columns if you need them, but not otherwise.

Retrieving columns that your application doesn't need just wastes system resources and slows down performance.


You can sort just as easily on a text column, like this, which puts items in alphabetical order:

mysql> SELECT * FROM products ORDER BY product_name;
+------+--------------------+-------+
| id   | product_name       | price |
+------+--------------------+-------+
|  104 | Charcoal trousers  | 39.50 |
|  113 | Dark grey jacket   | 10.00 |
|  103 | Leather jacket     | 28.00 |
|  102 | Light brown jacket | 10.00 |
+------+--------------------+-------+

Note

Particular rules define the order of sorting.

For example, A always comes before (is less than) B. With nonbinary data types lowercase letters are treated as equal to uppercase letters. But if you use a BLOB type or the BINARY operator when declaring the column, the cases will be treated differently, and lowercase letters will come before uppercase.

The rules for sorting are explained in Day 10.


You can sort on several columns and impose an order of sort precedence by listing several resultset columns in your ORDER BY clause. Put them in order of sort precedence.

For example, to list the items by price first (descending) and product_name second (ascending), you would do this:

mysql> SELECT * FROM products ORDER BY price DESC, product_name;
+------+--------------------+-------+
| id   | product_name       | price |
+------+--------------------+-------+
|  104 | Charcoal trousers  | 39.50 |
|  103 | Leather jacket     | 28.00 |
|  113 | Dark grey jacket   | 10.00 |
|  102 | Light brown jacket | 10.00 |
+------+--------------------+-------+
							

Grouping the Resultset with GROUP BY

You may want to return your data in groups as well as sorting it. The principal reason for doing this is to apply an aggregating function, which provides information about each group rather than about individual rows.

To explain this, consider a new example table called child_products:

mysql> SELECT * FROM child_products;
+-----+--------------+------+-------+
| id  | name         | age  | price |
+-----+--------------+------+-------+
| 101 | Toy train    | 3-5  | 12.99 |
| 102 | Racing car   | 3-5  |  9.99 |
| 103 | Spinning top | 3-5  |  7.50 |
| 104 | Teddy bear   | 0-2  | 12.50 |
| 105 | Kitchen      | 2-3  | 24.99 |
+-----+--------------+------+-------+

Each item has a unique id number, a name, the age range of its intended customers, and a price.

You can use a GROUP BY clause to group items by age, like this:

mysql> SELECT * FROM child_products GROUP BY age;
+-----+------------+------+-------+
| id  | name       | age  | price |
+-----+------------+------+-------+
| 104 | Teddy bear | 0-2  | 12.50 |
| 105 | Kitchen    | 2-3  | 24.99 |
| 101 | Toy train  | 3-5  | 12.99 |
+-----+------------+------+-------+

But this is not very useful! It has made the age column distinct, but the row it chooses to display with each distinct age range has effectively been chosen at random. Using GROUP BY in this way and retrieving several columns is seldom helpful.

Now consider this:

mysql> SELECT age FROM child_products GROUP BY age;
+------+
| age  |
+------+
| 0-2  |
| 2-3  |
| 3-5  |
+------+

This time products are grouped by age, but only age has been displayed in the result. In other words, the resultset contains only group level data, and not the misleading row-level data from within each group.

But note that this behavior is the same as SELECT DISTINCT, which you saw earlier today. The following query gives the same result:

mysql> SELECT DISTINCT age FROM child_products;
							

Note

Notice that GROUP BY is not just grouping but sorting as well. It has the same possibilities as ORDER BY—that is, ASC (the default) and DESC.

So you could do this, to sort the groups in descending order:

									mysql> SELECT age FROM child_products GROUP BY age DESC;
+------+
| age  |
+------+
| 3-5  |
| 2-3  |
| 0-2  |
+------+

However, it's probably clearer to use GROUP BY age ORDER BY age DESC, even though it's a little more verbose.


The most common use of GROUP BY is in conjunction with aggregating functions. These perform some operation on the values retrieved within each group (even though the individual values do not appear in the resultset).

SUM() and COUNT() are commonly used aggregating functions, which add up items by group and count the number of items in each group, respectively.

Here's an example of GROUP BY with SUM(). It adds up the total cost of items in each age range:

mysql> SELECT age, SUM(price) FROM child_products GROUP BY age;
+------+------------+
| age  | SUM(price) |
+------+------------+
| 0-2  |      12.50 |
| 2-3  |      24.99 |
| 3-5  |      30.48 |
+------+------------+

You can use COUNT() in much the same way, to return a count of items in each group:

mysql> SELECT age, COUNT(*) FROM child_products GROUP BY age;
+------+----------+
| age  | COUNT(*) |
+------+----------+
| 0-2  |        1 |
| 2-3  |        1 |
| 3-5  |        3 |
+------+----------+

Why is there a * in COUNT(*)? The asterisk (*) tells the query to look for any row that exists and count it. It doesn't care what the row's values are, just that the row is there.

You can name a column, COUNT(price) for example, which would only count items where price is not NULL or empty. * is used here just in case any product has no price; this query would still count it.

As well as SUM() and COUNT(), the expressions that may be used with GROUP BY include the following:

  • AVG() Return the mean, or average, value of data in the group

  • MAX() The maximum value

  • MIN()— The minimum value

  • STD()— The standard deviation

Caution

Don't name columns in your GROUP BY clause that are also used by your aggregating function. It defeats the object of grouping!

For example, this is nonsense because nothing gets grouped:

mysql> SELECT age, SUM(price) FROM child_products GROUP BY price;
+------+------------+
| age  | SUM(price) |
+------+------------+
| 3-5  |       7.50 |
| 3-5  |       9.99 |
| 0-2  |      12.50 |
| 3-5  |      12.99 |
| 2-3  |      24.99 |
+------+------------+

You would also get a misleading result if there are several products at the same price because a value for age would be returned at random.


Here's a slightly more complex example, combining the averaging function AVG(), GROUP BY, and ORDER BY, and using an alias:

mysql> SELECT age, AVG(price) AS avg_p FROM child_products
    -> GROUP BY age
    -> ORDER BY avg_p;
+------+-----------+
| age  | avg_p     |
+------+-----------+
| 3-5  | 10.160000 |
| 0-2  | 12.500000 |
| 2-3  | 24.990000 |
+------+-----------+

This query finds the average price of items in each age group and then sorts by the average price (the alias avg_p).

A GROUP BY clause must always be placed before an ORDER BY clause.

Restricting the Resultset Using LIMIT

When you run a SELECT query, you may not want to return the entire resultset to your application.

You may just want the first row, or a small number of rows, and it's more efficient to restrict this in the SELECT than to return a large resultset to your application and discard all but a few rows.

Alternatively, your application may be a kind of search engine, which has a paginated output. You want a handy way of selecting the first 20 rows for display to the user, or perhaps the second 20 rows, and so on.

These kinds of controls are made easy with the LIMIT keyword. You use LIMIT like this:

SELECT
  select_expression,...
  ...
  [LIMIT [offset,] num_rows]

As you can see, you may pass it just one parameter, an integer num_rows; this tells SELECT the maximum number of rows to return.

For example, here's a query that limits the resultset to two rows:

mysql> SELECT * FROM child_products LIMIT 2;
+-----+------------+------+-------+
| id  | name       | age  | price |
+-----+------------+------+-------+
| 101 | Toy train  | 3-5  | 12.99 |
| 102 | Racing car | 3-5  |  9.99 |
+-----+------------+------+-------+

(Note that LIMIT specifies the maximum number to return; if the entire resultset has fewer rows than this, you'll only get what's there!)

LIMIT is commonly used with either an ORDER BY or GROUP BY clause. It's not essential, but without something to control the order of the output, you will get a random selection of rows. So this usage is common:

mysql> SELECT * FROM child_products ORDER BY id LIMIT 2;
+-----+------------+------+-------+
| id  | name       | age  | price |
+-----+------------+------+-------+
| 101 | Toy train  | 3-5  | 12.99 |
| 102 | Racing car | 3-5  |  9.99 |
+-----+------------+------+-------+

This time the resultset is ordered by id, and because this has unique values, the query will return the same data every time.

You can also specify an offset: in other words, the number of rows to ignore before capturing num_rows rows. In the following example, four rows are ignored before looking for two rows to return:

mysql> SELECT * FROM child_products ORDER BY id LIMIT 4,2;
+-----+---------+------+-------+
| id  | name    | age  | price |
+-----+---------+------+-------+
| 105 | Kitchen | 2-3  | 24.99 |
+-----+---------+------+-------+

This skips four rows and then tries to return two. But because only five rows are in the table, it returns only one. If your offset was greater than the number of rows in the table, the query would return an empty resultset.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.134.79.121