A MySQL Tutorial

You have all the background you need now; it's time to put MySQL to work!

This section will help you familiarize yourself with MySQL by providing a tutorial for you to try out. As you work through the tutorial, you will create a sample database and the tables within it, then interact with the database by adding, retrieving, deleting, and modifying information. In addition, during the process of working with the sample database, you will learn the following things:

  • How to communicate with a MySQL server using the mysql client program.

  • The basics of the SQL language that MySQL understands. (If you already know SQL from having used some other RDBMS, it would be a good idea to skim through this tutorial to see whether or not MySQL's version of SQL differs from the version with which you are familiar.)

As noted in the previous section, MySQL operates using a client/server architecture in which the server runs on the machine containing the databases and clients connect to the server over a network. This tutorial is based largely on the use of the mysql client program. mysql reads SQL queries from you, sends them to the server to be executed, and displays the results so you can see what happened. mysql runs on all platforms supported by MySQL and provides the most direct means of interacting with the server, so it's the logical client to begin with.

Throughout this book, we'll use samp_db as the name of our sample database. However, you may need to use a different database name as you work through the examples. Someone else on your system may be using the name samp_db already, or your administrator may assign you a different database name. In either case, substitute the actual name of your database for samp_db whenever you see the latter in examples.

Table names can be used exactly as shown in the examples, even if multiple people on your system have their own sample databases. In MySQL, it doesn't matter if someone else uses the same table names. As long as each of you has your own database, MySQL will keep the names straight and prevent you from interfering with each other.

Preliminary Requirements

To try the examples in this tutorial, MySQL must be installed. In particular, you must have access to MySQL clients and to some MySQL server. The client programs must be located on your machine. You'll need at least mysql, and mysqlimport is useful as well. The server can be located on your machine, though that is not required. As long as you have permission to connect to it, the server can be located anywhere.

If the server does happen to be running on your machine, the client programs likely are installed as well, and you should be ready to go. If you need to get MySQL, see Appendix A, "Obtaining and Installing Software," for instructions. Either refer to that chapter (if you're installing MySQL yourself) or show it to your system administrator. If your network access comes through an Internet service provider (ISP), check whether or not your provider has MySQL. If your ISP doesn't offer MySQL service and won't install it, check Appendix J, "Internet Service Providers," for some advice on choosing a more suitable provider.

In addition to the MySQL software, you'll need permission to create the sample database and its tables. If you don't have permission, ask your MySQL administrator. The administrator can give you this permission by running mysql and issuing commands like these:

GRANT ALL ON samp_db.* TO paul@localhost IDENTIFIED BY "secret"
GRANT ALL ON samp_db.* TO paul@% IDENTIFIED BY "secret"

The Difference Between MySQL and mysql

To avoid confusion, I should point out that "MySQL" refers to the entire MySQL RDBMS and "mysql" is the name of a particular client program. They sound the same if you pronounce them, but they're distinguished here by capitalization and typeface differences.

Speaking of pronunciation, MySQL is pronounced "my-ess-queue-ell." We know this because the MySQL Reference Manual says so. On the other hand, SQL is pronounced "sequel" or "ess-queue-ell," depending on who you ask. I'm not going to take sides. Pronounce it how you like, but be prepared for the eventuality that you'll run into someone who will correct you and inform you of the "proper" pronunciation!


The first command gives paul complete access to the samp_db database and all tables in it when paul connects from localhost (the same host the server is running on). It also assigns a password of secret. The second command is similar but allows paul to connect from any host ('%' is a wildcard character). You could also substitute a specific hostname for '%' to allow paul to connect from just that host. (Such a GRANT statement may be necessary if your server allows anonymous access from localhost, due to the way the server searches the grant tables for matches to incoming connections.) More information on the GRANT statement and setting up MySQL user accounts may be found in Chapter 11, "General MySQL Administration."

Obtaining the Sample Database Distribution

This tutorial refers at certain points to files from the "sample database distribution." These are files containing queries or data that will help you set up the sample database. To get this distribution, see Appendix A. When you unpack the distribution, it will create a directory named samp_db containing the files you'll need. I recommend that you move into that directory whenever you're trying out examples pertaining to the sample database.

Establishing and Terminating Connections to the Server

To connect to your server, invoke the mysql program from your shell (that is, from your UNIX prompt, or from a DOS console under Windows). The command looks like this:

% mysql
							options
						

I use '%' throughout this book to indicate the shell prompt. That's one of the standard UNIX prompts; another is '$'. Under Windows, the prompt is something like 'C:>'

The options part of the mysql command line may be empty, but more likely you'll have to issue a command something like this:

% mysql -h
							host_name
							-u
							user_name
							-p
						

You may not need to supply all those options when you invoke mysql; ask your MySQL administrator for the exact command to use. You'll probably need to specify at least a name and password.

When you're just starting to learn MySQL, you'll probably consider its security system to be an annoyance because it makes it harder to do what you want. (You must obtain permission to create and access a database, and you must specify your name and password whenever you connect to the server.) However, after you move beyond the sample database to entering and using your own records, your perspective will change radically. Then you'll appreciate that MySQL keeps other people from snooping through (or worse, destroying!) your stuff.

Here's what the options mean:

  • -h host_name (alternate form: --host=host_name)

    The server host you want to connect to. If the server is running on the same machine that you are running mysql on, this option normally may be omitted.

  • -u user_name (alternate form: --user=user_name)

    Your MySQL user name. If you're using UNIX and your MySQL username is the same as your login name, you can omit this option; mysql will use your login name as your MySQL name.

    Under Windows, the default user name is ODBC. This is not likely to be very useful. Either specify a name on the command line, or set a default in your environment by setting the USER variable. For example, I can use the following set command to specify a user name of paul:

    set USER=paul
    
  • -p (alternate form:--password)

    This option tells mysql to prompt you for your MySQL password. Note: you can give your password on the command line by typing it as -pyour_password (alternate form: --password=your_password). However, for security reasons, it's best not to do that. Specifying -p with no password following it tells mysql to prompt you for a password when it starts up. For example:

    % mysql -h
    									host_name
    									-u
    									user_name
    									-p
    Enter password:
    

    When you see Enter password:, type in your password. (It won't be echoed to the screen, in case someone's looking over your shoulder.) Note that your MySQL password is not necessarily the same as your UNIX or Windows password.

    If you omit the -p option entirely, mysql assumes you don't need one and doesn't prompt for it.

    Note: The -h and -u options are associated with the words that follow them, whether or not there is a space between the option and the following word. This is not true for -p. If you specify the password on the command line, there must be no space between -p and the password.

Example: Suppose my MySQL user name and password are paul and secret, and that I want to connect to the server that is running on the same machine that I'm logged in on. The mysql command to do this is:

% mysql -u paul -p
Enter password: ******
						

After I enter the command, mysql prints Enter password: to prompt for my password, and I type it in (the ****** indicates where I type secret).

If all goes well, mysql prints a greeting and a 'mysql>' prompt indicating that it is waiting for me to issue queries. The full startup sequence looks like this:

% mysql -u paul -p
Enter password: ******
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 1805 to server version: 3.22.25-log

Type 'help' for help.

mysql>

To connect to a server running on some other machine, I need to specify the hostname using -h. If that host is pit-viper.snake.net, the command looks like this:

% mysql -h pit-viper.snake.net -u paul -p
						

In most of the examples that follow that show a mysql command line, I'm going to leave out the -h, -u, and -p options for brevity. It's assumed that you'll supply whatever options are necessary.

There are ways to set up your account so you don't have to type in connection parameters each time you run mysql. This is discussed in "Tips for Interacting with mysql." You may want to skip ahead to that section right now to get an idea of some of the possibilities for making it easier to connect to the server.

After you establish a connection to the server, you can terminate your session any time by typing QUIT:

mysql> QUIT
Bye

You can also quit by typing Control-D, at least on UNIX.

Issuing Queries

After you're connected to the server, you're ready to issue queries. This section describes some general things you should know about interacting with mysql.

To enter a query in mysql, just type it in. At the end of the query, type a semicolon (';') and press Enter. The semicolon tells mysql that the query is complete. (You can also use 'g' to terminate queries if you prefer to type two characters rather than one.)

After you've entered a query, mysql sends it to the server to be executed. The server processes the query and sends the results back to mysql, which displays the result for you.

Here's an example of a simple query and the result:

mysql> SELECT NOW();
+---------------------+
| NOW()               |
+---------------------+
| 1999-07-24 11:02:36 |
+---------------------+
1 row in set (0.00 sec)

This tells me the current date and time. (The NOW() function is not so useful by itself, but it can be used in expressions—for example, to calculate the difference between the current date and another date.)

mysql also displays a count of the number of rows in the result. I usually will not show this count in examples.

Because mysql waits for the semicolon before sending the query to the server, you don't need to enter it on a single line. You can spread a query over several lines if you want:

mysql> SELECT NOW(),
    -> USER(),
    -> VERSION()
    -> ;
+---------------------+----------------+------------------+
| NOW()               | USER()         | VERSION()        |
+---------------------+----------------+------------------+
| 1999-07-24 11:06:16 | paul@localhost | 3.23.1-alpha-log |
+---------------------+----------------+------------------+

Note how the prompt changes from 'mysql' to '->' after you enter the first line of the query; this tells you that mysql thinks you're still entering the query. That's important feedback because if you forget the semicolon at the end of a query, the prompt helps you realize that mysql is still waiting for something. Otherwise, you'll be waiting, wondering why it's taking MySQL so long to execute your query, and mysql will be waiting, wondering why it's taking you so long to finish entering your query!

For the most part, it doesn't matter whether you enter queries using uppercase, lowercase, or mixed case. These queries are all equivalent:

SELECT USER()
select user()
SeLeCt UsEr()

The examples in this book use uppercase for SQL keywords and function names, and lowercase for database, table, and column names.

When you invoke a function in a query, there must be no space between the function name and the following parenthesis:

mysql> SELECT NOW();
+---------------------+
| NOW()               |
+---------------------+
| 1999-07-17 12:44:52 |
+---------------------+
mysql> SELECT NOW ();
ERROR 1064: You have an error in your SQL syntax near '()' at line 1

These two queries look similar, but the second one fails because the parenthesis doesn't immediately follow the function name.

If you've begun typing in a multiple-line query and decide you don't want to execute it, type 'c' to clear (cancel) it:

mysql> SELECT NOW(),
    -> VERSION(),
    -> c
mysql>

Notice that the prompt changes back to 'mysql>' to indicate that mysql is ready for a new query.

You can store queries in a file and tell mysql to read queries from the file rather than waiting for you to type them in at the keyboard. Use your shell's input redirection facilities for this. For example, if I have queries stored in a file my_file.sql, I can execute its contents like this:

% mysql < my_file.sql
						

You can call the file whatever you want. I use the '.sql' suffix to indicate that the file contains SQL statements.

Executing mysql this way is something that will come up in "Adding New Records" when we enter data into the samp_db database. It's a lot more convenient to load a table by having mysql read INSERT statements from a file than to type in each statement manually.

The remainder of this tutorial shows many queries that you can try out for yourself. These are indicated by the 'mysql>' prompt before the query and the terminating semicolon after it, and such examples are usually accompanied by the output of the query. You should be able to type in these queries as shown, and the resulting output should be the same.

Queries that are shown without a prompt or the semicolon statement terminator are intended simply to illustrate a point, and you need not execute them. (You can try them out if you like, but if you do, remember to add a semicolon at the end.)

In later chapters of this book, I usually will not show the 'mysql>' prompt or the semicolon with SQL statements. The reason for this is that you can issue queries in contexts other than the mysql client program (for example, in Perl scripts or in PHP scripts), and in those contexts, no prompt appears and no semicolon is needed. I will indicate cases in which it is specifically intended that you enter a query in mysql.

Creating the Database

We'll begin by creating the samp_db sample database and the tables within it, populating its tables, and performing some simple queries on the data contained in those tables.

Using a database involves several steps:

  1. Creating (initializing) the database

  2. Creating the tables within the database

  3. Interacting with the tables by inserting, retrieving, modifying, or deleting data

Retrieving existing data is easily the most common operation performed on a database. The next most common operations are inserting new data and updating or deleting existing data. Less frequent are table creation operations, and least frequent of all is database creation.

We're beginning from scratch, so we must begin with database creation, the least common thing, and work our way through table creation and insertion of our initial data before we get to where we can do the really common thing—retrieving data.

To create a new database, connect to the server using mysql and then issue a CREATE DATABASE statement that specifies the database name:

mysql> CREATE DATABASE samp_db;
						

You'll need to create the samp_db database before you can create any of the tables that will go in it or do anything with the contents of those tables.

Does creating the database make it the current (default) database? No, it doesn't, as you can see by executing the following query:

mysql> SELECT DATABASE();
+------------+
| DATABASE() |
+------------+
|            |
+------------+

To make samp_db current, issue a USE statement:

mysql> USE samp_db
						

USE is one of the few statements that require no terminating semicolon, although you can add it if you want. HELP is another, and if you issue a HELP statement, it will show you the full list of statements that don't need a semicolon.

After you issue the USE statement, samp_db is the default database:

mysql> SELECT DATABASE();
+------------+
| DATABASE() |
+------------+
| samp_db    |
+------------+

The other way to make a database current is to name it on the command line when you invoke mysql:

% mysql samp_db
						

That is, in fact, the usual way to name the database you want to use. If you need any connection parameters, specify them before the database name. For example, the following two commands allow me to connect to the samp_db database on the local host and on pit-viper.snake.net:

% mysql -u paul -p samp_db
% mysql -h pit-viper.snake.net -u paul -p samp_db
						

Unless specified otherwise, all the examples that follow assume that when you invoke mysql, you name the samp_db database on the command line to make it the current database. If you invoke mysql but forget to name the database on the command line, just issue a USE samp_db statement.

Creating Tables

In this section, we'll build the tables needed for the samp_db sample database. First, we'll consider the tables needed for the Historical League. Then we'll consider those needed for the grade-keeping project. This is the part where some database books start talking about Analysis and Design, Entity-Relationship Diagrams, Normalization Procedures, and other such stuff. There's a place for all that, but I prefer just to say we need to think a bit about what our database will look like: what tables it should contain, what the contents of each table should be, and some of the issues involved in deciding how to represent our data.

The choices made here about data representation are not absolute. In other situations, you might well elect to represent similar data in a different way, depending on the requirements of your applications and the uses to which you intend to put your data.

Tables for the Historical League

Table layout for the Historical League is pretty simple:

  • A president table. This contains a descriptive record for each US president. We'll need this for the online quiz on the League Web site (the interactive analog to the printed quiz that appears in the children's section of the League's newsletter).

  • A member table. This is used to maintain current information about each member of the League. It'll be used for creating printed and online versions of the member directory, sending automated membership renewal reminders, and so forth.

The president Table

The president table is simpler, so let's discuss it first. This table will contain some basic biographical information about each United States president:

  • Name. Names can be represented in a table several ways. For example, we could have a single column containing the entire name, or separate columns for the first and last name. It's certainly simpler to use a single column, but that limits you in some ways:

    • If you enter the names with the first name first, you can't sort on last name.

    • If you enter the names with the last name first, you can't display them with the first name first.

    • It's harder to search for names. For example, to search for a particular last name, you must use a pattern and look for names that match the pattern. This is less efficient and slower than looking for an exact last name.

    Our member table will use separate columns for the first and last name to avoid these limitations.

    The first name column will also hold the middle name or initial. This shouldn't break any sorting we might do because it's not likely we'll want to sort on middle name (or even first name). Name display should work properly, too, whether a name is printed in "Bush, George W." or in "George W. Bush" format. The middle name immediately follows the first name either way.

    There is another slight complication. One president (Jimmy Carter) has a "Jr." at the end of his name. Where does that go? Depending on the format in which names are printed, this president's name is displayed as "James E. Carter, Jr.," or "Carter, James E., Jr." The "Jr." doesn't associate with either first or last name, so we'll create another column to hold a name suffix. This illustrates how even a single value can cause problems when you're trying to determine how to represent your data. It also shows why it's a good idea to know as much as possible about the type of data values you'll be working with before you put them in a database. If you have incomplete knowledge of what your data look like, you may have to change your table structure after you've already begun to use it. That's not necessarily a disaster, but in general it's something you want to avoid.

  • Birthplace (city and state). Like the name, this too can be represented using a single column or multiple columns. It's simpler to use a single column, but as with the name, separate columns allow you to do some things you can't do easily otherwise. For example, it's easier to find records for presidents born in a particular state if city and state are listed separately.

  • Birth date and death date. The only special problem here is that we can't require the death date to be filled in because some presidents are still alive. MySQL provides a special value NULL that means "no value"— we can use that in the death date column to mean "still alive."

The member Table

The member table for the Historical League membership list is similar to the president table in the sense that each record contains basic descriptive information for a single person. But each member record contains more columns:

  • Name. We'll use the same three-column representation as for the president table: last name, first name (and middle if available), suffix.

  • ID number. This is a unique value assigned to each member when a membership first begins. The League hasn't ever used ID numbers before, but now that the records are being made more systematic, it's a good time to start. (I am anticipating that you'll find MySQL beneficial and that you'll think of other ways to apply it to the League's records. When that happens, it'll be easier to associate records in the member table with other member-related tables you may create if you use numbers rather than names.)

  • Expiration date. Members must renew their memberships periodically to avoid having them lapse. For some applications, you might use the date of the most recent renewal, but this is not suitable for the League's purposes. Memberships can be renewed for a variable number of years (typically one, two, three, or five years), and a date for the most recent renewal wouldn't tell you when the next renewal must take place. In addition, the League allows lifetime memberships. We could represent these with a date far in the future, but NULL seems more appropriate because "no value" logically corresponds to "never expires."

  • Email address. For members that have email addresses, this will allow them to communicate with each other easily. For your purposes as League secretary, this will allow you to send out membership renewal notices electronically rather than by postal mail. This should be easier than going to the post office and less expensive, too. You'll also be able to use email to send members the current contents of their directory entries and ask them to update the information as necessary.

  • Postal address. This is needed for contacting members that don't have email (or who don't respond to it). We'll use columns for street address, city, state, and Zip code. The street address column can double for box number for members that have an address such as P.O. Box 123 rather than 123 Elm St.

    I'm assuming that all League members live in the United States. For organizations with a membership that is international in scope, that assumption is an oversimplification, of course. If you want to deal with addresses from multiple countries, you'll run into some sticky issues having to do with the different address formats used for different countries. For example, Zip code is not an international standard, and some countries have provinces rather than states.

  • Phone number. Like the address fields, this is useful for contacting members.

  • Special interest keywords. Every member is assumed to be have a general interest in US history, but members probably also have some special areas of interest. This column records those interests. Members can use it to find other members with similar interests.

Creating the Tables

Now we're ready to create the Historical League tables. For this we use the CREATE TABLE statement, which has the following general form:

CREATE TABLE tbl_name ( column_specs )

tbl_name indicates the name you want to give the table. column_specs provides the specifications for the columns in the table, as well as any indexes (if you have any). Indexes make lookups faster; we'll discuss them further in Chapter 4, "Query Optimization."

The CREATE TABLE statement for the president table looks like this:

CREATE TABLE president
(
    last_name VARCHAR(15) NOT NULL,
    first_name VARCHAR(15) NOT NULL,
    suffix VARCHAR(5) NULL,
    city VARCHAR(20) NOT NULL,
    state VARCHAR(2) NOT NULL,
    birth DATE NOT NULL,
    death DATE NULL
)

If you want to type that statement in yourself, invoke mysql, making samp_db the current database:

% mysql samp_db
								

Then enter the CREATE TABLE statement as just shown. (Remember to add a semicolon at the end or mysql won't know where the end of the statement is.)

To create the president table by using a prewritten description file from the sample database distribution (see "Obtaining the Sample Database Distribution"), run this command from the shell:

% mysql samp_db < create_president.sql
								

Whichever way you invoke mysql, specify any connection parameters you may need (hostname, username, or password) on the command line preceding the database name.

Each column specification in the CREATE TABLE statement consists of the column name, the type (the kind of values the column will hold), and possibly some column attributes.

The two column types used in the president table are VARCHAR and DATE. VARCHAR(n) means the column contains variable-length character (string) values, with a maximum length of n characters each. You choose the value of n according to how long you expect your values to be. state is declared as VARCHAR(2); that's all we need if states are entered using their two-character abbreviations. The other string-valued columns need to be wider to accommodate longer values.

The other column type we've used is DATE. This type indicates, not surprisingly, that the column holds date values. However, what may be surprising to you is that dates are represented with the year first. The standard format is "YYYY-MM-DD" (for example, "1999-07-18"). This is the ANSI SQL standard for date representation.

The only column attributes we're using for the president table are NULL (values can be missing) and NOT NULL (values must be filled in).Most columns are NOT NULL, because we'll always have a value for them. The two columns that can have NULL values are suffix (most names don't have one), and death (some presidents are still alive, so there is no date of death).

The CREATE TABLE statement for the member table looks like this:

CREATE TABLE member
(
    last_name VARCHAR(20) NOT NULL,
    first_name VARCHAR(20) NOT NULL,
    suffix VARCHAR(5) NULL,
    expiration DATE NULL DEFAULT "0000-00-00",
    email VARCHAR(100) NULL,
    street VARCHAR(50) NULL,
    city VARCHAR(50) NULL,
    state VARCHAR(2) NULL,
    zip VARCHAR(10) NULL,
    phone VARCHAR(20) NULL,
    interests VARCHAR(255) NULL
)

Type that statement into mysql or execute the following shell command:

% mysql samp_db < create_member.sql
								

In terms of column types, the member table is not very interesting: every column except one is created as a variable-length string. The exception, expiration, is a DATE. The expiration value has a default value of "0000-00-00", which is a non-NULL value that means no legal date has been entered. The reason for this is that expiration can be NULL to indicate that a member has a lifetime membership. However, because the column can be NULL, that becomes the default value unless you specify a different default. If you created a new member record but forgot to specify the expiration date, the member would become a lifetime member! By using a default value of "0000-00-00", we avoid this problem. It also gives us a value we can search for periodically to find records for which the expiration date was never properly entered.

Note that I've "forgotten" to put in a column for the membership ID number.

That's because I need an excuse to use the ALTER TABLE statement later on, and leaving out this column provides that excuse.

Let's make sure that MySQL created the tables as we expect. In mysql, issue the following query:

mysql> DESCRIBE president;
+------------+-------------+------+-----+------------+-------+
| Field      | Type        | Null | Key | Default    | Extra |
+------------+-------------+------+-----+------------+-------+
| last_name  | varchar(15) |      |     |            |       |
| first_name | varchar(15) |      |     |            |       |
| suffix     | varchar(5)  | YES  |     | NULL       |       |
| city       | varchar(20) |      |     |            |       |
| state      | char(2)     |      |     |            |       |
| birth      | date        |      |     | 0000-00-00 |       |
| death      | date        | YES  |     | NULL       |       |
+------------+-------------+------+-----+------------+-------+

As of MySQL 3.23, the output includes another column showing access privilege information. I've not shown it here, as it makes the lines too long to show easily.

The output looks pretty much as we'd expect, except that the information for the state column says its type is CHAR(2). That's odd; wasn't it declared as VARCHAR(2)? Yes, it was, but MySQL has silently changed the type from VARCHAR to CHAR. The reason for this has to do with efficiency of storage space for short character columns, which I won't go into here. If you want the details, check the discussion of the ALTER TABLE statement in Chapter 3, "MySQL SQL Syntax and Use," provides the details. For our purposes here, there is no difference between the two types.

If you issue a DESCRIBE member query, mysql will show you similar information for the member table.

DESCRIBE is useful when you forget the name of a column in a table, need to know the column type, need to know how wide the column is, and so forth. It's also useful for finding out the order in which MySQL stores columns in table rows. That order is important when you use INSERT or LOAD DATA statements that expect column values to be listed in the default column order.

DESCRIBE may be abbreviated as DESC, or, if you prefer to type more rather than less, SHOW COLUMNS FROM tbl_name is a synonym for DESCRIBE tbl_name.

What if you forget the names of your tables? Then you can use SHOW TABLES. For the samp_db database, with the two tables we've created so far, the output looks like this:

mysql> SHOW TABLES;
+-------------------+
| Tables_in_samp_db |
+-------------------+
| member            |
| president         |
+-------------------+

If you can't even remember the name of your database, invoke mysql without specifying a database name on the command line, then issue a SHOW DATABASES query:

mysql> SHOW DATABASES;
+-----------+
| Database  |
+-----------+
| menagerie |
| mysql     |
| samp_db   |
| test      |
+-----------+

The list of databases varies from server to server, but you should see at least samp_db and mysql; the latter database holds the grant tables that control MySQL access privileges.

The DESCRIBE and SHOW queries have command-line equivalents that you can use from the shell:

% mysqlshowList all databases, like SHOW DATABASES
% mysqlshow db_nameList tables in given database, like SHOW TABLES
%mysqlshow db_name tbl_nameList columns in given table, like DESCRIBE tbl_name

Tables for the Grade-Keeping Project

To see what tables we need for the grade-keeping project, let's look at how you might write down scores when you use a paper-based gradebook. Figure 1.2 shows a page from your gradebook. The main body of this page is a matrix for recording scores. There is also other information necessary for making sense of the scores. Student names and ID numbers are listed down the side of the matrix. (For simplicity, only four students are shown.) Along the top of the matrix, you put down the dates when you give quizzes and tests. The figure shows that you've given quizzes on September 3, 6, 16, and 23, and tests on September 9 and October 1.

Figure 1.2. Example gradebook.


To keep track of this kind of information using a database, we need a score table. What should records in this table contain? That's easy. For each row, we need student name, the date of the quiz or test, and the score. Figure 1.3 shows how some of the scores from the gradebook look when represented in a table like this. (Dates are written the way MySQL represents them, in "YYYY-MM-DD" format.)

Figure 1.3. Initial score table layout.


However, there seems to be a problem with setting up the table in this way. It appears to leave out some information. For example, looking at the records in Figure 1.3, we can't tell whether scores are for a quiz or a test. It could be important to know score types when determining final grades if quizzes and tests are weighted differently. We might try to infer the type from the range of scores on a given date (quizzes usually are worth fewer points than a test), but that's ugly because it relies on inference and not something explicit in the data.

It's possible to distinguish scores by recording the type in each record, for example, by adding a column to the score table that contains 'T' or 'Q' for each row to indicate "test" or "quiz," as in Figure 1.4. This has the advantage of making the type of score explicit in the data. The disadvantage is that this information is somewhat redundant. Observe that for all records with a given date, the score type column always has the same value. The scores for September 23 all have a type of 'Q', and the scores for October 1 all have a type of 'T'. This is unappealing. If we record a set of scores for a quiz or test this way, not only will we be putting in the same date for each new record in the set, we'll be putting in the same score type over and over again. Ugh. Who wants to enter all that redundant information?

Figure 1.4. score table layout, revised to include score type.


Let's try an alternative representation. Instead of recording score types in the score table, we'll figure them out from the dates. We can keep a list of dates and use it to keep track of what kind of "grade event" (quiz or test) occurred on each date. Then we can determine whether any given score was from a quiz or a test by combining it with the information in our event list: Just match the date in the score table record with the date in the event table to get the event type. Figure 1.5 shows this table layout and demonstrates how the association works for a score table record with a date of September 23. By matching the record with the corresponding record in the event table, we see that the score is from a quiz.

Figure 1.5. score and event table, linked on date.


This is much better than trying to infer the score type based on some guess; instead, we're deriving the type directly from data recorded explicitly in our database. It's also preferable to recording score types in the score table because we must record each type only once.

However, if you're like me, when you first hear about this kind of thing (combining information from multiple tables), you think, "Yeah, that's a cute idea, but isn't it a lot of work to do all that looking up all the time; doesn't it just make things more complicated?"

In a way, you're right. Keeping two lists of records is more complicated than keeping one list. But take another look at your gradebook (see Figure 1.2). Aren't you already keeping two sets of records? Consider these facts:

  • You keep track of scores using the cells in the score matrix, where each cell is indexed by student name and date (down the side and along the top of the matrix). This represents one set of records; it's analogous to the contents of the score table.

  • How do you know what kind of event each date represents? You've written a little 'T' or 'Q' above the date! Thus, you're also keeping track of the association between date and score type along the top of the matrix. This represents a second set of records; it's analogous to the event table contents.

In other words, even though you may not think about it as such, you're really not doing anything different with the gradebook than what I'm proposing to do by keeping information in two tables. The only real difference is that the two kinds of information aren't so explicitly separated in the gradebook.

The page in the gradebook illustrates something about the way we think of information, and about the difficulty of figuring out how to put information in a database: We tend to integrate different kinds of information and interpret them as a whole. Databases don't work like that, which is one reason they sometimes seem artificial and unnatural. Our natural tendency to unify information makes it quite difficult sometimes even to realize when we have multiple types of data instead of just one. Because of this, you may find it a challenge to "think as a database thinks" about how your data should be represented.

One requirement imposed on the event table by the layout shown in Figure 1.5 is that the dates be unique because the date is used to link together records from the score and event tables. In other words, you cannot give two quizzes on the same day, or a quiz and a test. If you do, you'll have two sets of records in the score table and two records in the event table, all with the same date, and you won't be able to tell how to match score records with event records.

That's a problem that will never come up if there is never more than one grade event per day, but is it really valid to assume that will never happen? It might seem so; after all, you don't consider yourself sadistic enough to give a quiz and a test on the same day. But I hope you'll pardon me if I'm skeptical. I've often heard people claim about their data, "That odd case will never occur." Then it turns out the odd case does occur on occasion, and usually you have to redesign your tables to fix problems that the odd case causes.

It's better to think about the possible problems in advance and anticipate how to handle them. So, let's suppose you might need to record two sets of scores for the same day sometimes. How can we handle that? As it turns out, this problem isn't so difficult to solve. With a minor change to the way we lay out our data, multiple events on a given date won't cause trouble:

  1. Add a column to the event table and use it to assign a unique number to each record in the table. In effect, this gives each event its own ID number, so we'll call this the event_id column. (If this seems like an odd thing to do, consider that your gradebook in Figure 1.2 already has this property: The event ID is just like the column number in your gradebook score matrix. The number might not be written down explicitly there and labeled "event ID," but that's what it is.)

  2. When you put scores in the score table, record the event ID rather than the date.

The result of these changes is shown in Figure 1.6. Now you link together the score and event tables using the event ID rather than the date, and you use the event table to determine not just the type of each score, but also the date on which it occurred. Also, it's no longer the date that must be unique in the event table, it's the event ID. This means you can have a dozen tests and quizzes on the same day, and you'll be able to keep them straight in your records. (No doubt your students will be thrilled to hear this.)

Figure 1.6. score and event tables, linked on event ID.


Unfortunately, from a human standpoint, the table layout in Figure 1.6 seems less satisfactory than the previous ones. The score table is more abstract because it contains fewer columns with a readily apparent meaning. The table layout shown earlier in Figure 1.4 was easy to look at it and understand because the score table had columns for both dates and score types. The current score table shown in Figure 1.6 has columns for neither. This seems highly removed from anything we can think about easily. Who wants to look at a score table that has "event IDs" in it? That just doesn't mean much to us.

At this point you reach a crossroads. You're intrigued by the possibility of being able to perform grade-keeping electronically and not having to do all kinds of tedious manual calculations when assigning grades. But after considering how you actually would represent score information in a database, you're put off by how abstract and disconnected the representation seems to make that information.

This leads naturally to a question: "Would it be better not to use a database at all? Maybe MySQL isn't for me." As you might guess, I will answer that question in the negative because otherwise this book will come to a quick end. But when you're thinking about how to do a job, it's not a bad idea to consider various alternatives and to ask whether you're better off using a database system, such as MySQL, or something else, such as a spreadsheet:

  • The gradebook has rows and columns, and so does a spreadsheet. This makes the gradebook and a spreadsheet conceptually and visually very similar.

  • A spreadsheet can perform calculations, so you could total up each student's scores using a calculation field. It might be a little tricky to weight quizzes and tests differently, but you could do it.

On the other hand, if you want to look at just part of your data (quizzes only or tests only, for example), perform comparisons such as boys versus girls, or display summary information in a flexible way, it's a different story. A spreadsheet doesn't work so well, whereas relational database systems perform those operations easily.

Another point to consider is that the abstract and disconnected nature of your data as represented in a relational database is not really that big of a deal, anyway. You have to think about that representation when setting up the database so that you don't lay out your data in a way that doesn't make sense for what you want to do with it. However, after you determine the representation, you're going to rely on the database engine to pull together and present your data in a way that is meaningful to you. You're not going to look at it as a bunch of disconnected pieces.

For example, when you retrieve scores from the score table, you don't want to see event IDs; you want to see dates. That's not a problem. The database will look up dates from the event table based on the event ID and show them to you. You may also want to see whether the scores are for tests or quizzes. That's not a problem, either. The database will look up score types the same way—using event ID. Remember, that's what a relational database system like MySQL is good at: relating one thing to another to pull out information from multiple sources to present you with what you really want to see. In the case of our grade-keeping data, MySQL does the thinking about pulling information together using event IDs so that you don't have to.

Now, just to provide a little advance preview of how you'd tell MySQL to do this relating of one thing to another, suppose you want to see the scores for September 23, 1999. The query to pull out scores for an event given on a particular date looks like this:

SELECT score.name, event.date, score.score, event.type
FROM score, event
WHERE event.date = "1999-09-23"
AND score.event_id = event.event_id

Pretty scary, huh? This query retrieves the student name, the date, score, and the type of score by joining (relating) score table records to event table records. The result looks like this:

+--------+------------+-------+------+
| name   | date       | score | type |
+--------+----------- +-------+------+
| Billy  | 1999-09-23 |    15 | Q |
| Missy  | 1999-09-23 |    14 | Q |
| Johnny | 1999-09-23 |    17 | Q |
| Jenny  | 1999-09-23 |    19 | Q |
+--------+------------+-------+------+

Notice anything familiar about the format of that information? You should; it's the same as the table layout shown in Figure 1.4! And you don't need to know the event ID to get this result. You specify the date you're interested in and let MySQL figure out which score records go with that date. If you've been wondering whether all the abstraction and disconnectedness loses us anything when it comes to getting information out of the database in a form that's meaningful to us, it doesn't.

Of course, after looking at that query, you might be wondering something else, too. Namely, it looks kind of long and complicated; isn't writing something like that a lot of work to go to just to find the scores for a given date? Yes, it is. However, there are ways to avoid typing several lines of SQL each time you want to issue a query. Generally, you figure out once how to perform a query such as that one and then you store it so that you can repeat it easily as necessary. We'll see how to do this in "Tips for Interacting with mysql."

I've actually jumped the gun a little bit in showing you that query. It is, believe it or not, a little simpler than the one we're really going to use to pull out scores. The reason for this is that we need to make one more change to our table layout. Instead of recording student name in the score table, we'll use a unique student ID. (That is, we'll use the value from the "ID" column of your gradebook rather than from the "Name" column.) Then we create another table called student that contains name and student_id columns (Figure 1.7).

Figure 1.7. student, score, and event tables, linked on student ID and event ID.


Why make this modification? For one thing, there might be two students with the same name. Using a unique student ID number helps you tell their scores apart. (This is exactly analogous to the way you can tell scores apart for a test and quiz given on the same day by using a unique event ID rather than the date.)

After making this change to the table layout, the query we'll actually use to pull out scores for a given date becomes a little more complex:

SELECT student.name, event.date, score.score, event.type
FROM event, score, student
WHERE event.date = "1999-09-23"
AND event.event_id = score.event_id
AND score.student_id = student.student_id

If you're concerned because you don't find the meaning of that query immediately obvious, don't be. Most people wouldn't. We'll see the query again after we get further along into this tutorial, but the difference between now and later is that later you'll understand it. And, no, I'm not kidding.

You'll note from Figure 1.7 that I added something to the student table that wasn't in your gradebook: It contains a column for sex. This will allow for simple things like counting the number of boys and girls in the class or more complex things like comparing scores for boys and girls.

We're almost done with the tables for the grade-keeping project. We need just one more table to record absences for attendance purposes. Its contents are relatively straightforward: a student ID number and a date (see Figure 1.8). Each row in the table indicates that the given student was absent on the given date. At the end of the grading period, we'll call on MySQL's counting abilities to summarize the table's contents to tell us how many times each student was absent.

Figure 1.8. absence table.


We're ready to create our grade-keeping tables now that we know what they should look like.

The CREATE TABLE statement for the student table looks like this:

CREATE TABLE student
(
    name VARCHAR(20) NOT NULL,
    sex ENUM('F','M') NOT NULL,
    student_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
)

Type that statement into mysql or execute the following shell command:

% mysql samp_db < create_student.sql
							

The CREATE TABLE statement creates a table named student with three columns: name, sex, and student_id.

name is a variable-length string column that can hold up to 20 characters. This name representation is simpler than the one used for the Historical League tables; it uses a single column rather than separate first name and last name columns. That's because I know in advance that no grade-keeping query examples will need to do anything that would work better with separate columns. (Yes, that's cheating. I admit it.)

sex represents whether a student is a boy or a girl. It's an ENUM (enumeration) column, which means it can take on only one of the values explicitly listed in the column specification: 'F' or 'M' for female or male. ENUM is useful when you have a restricted set of values that a column can hold. We could have used CHAR(1) instead, but ENUM makes it more explicit what the column values can be. If you issue a DESCRIBE tbl_name statement for a table that contains an ENUM column, MySQL will show you exactly what the possible values are.

By the way, values in an ENUM column need not be just a single character. The type column could have been declared as something like ENUM('female','male') instead.

student_id is an integer column that will contain unique student ID numbers. Normally, you'd probably get ID numbers for your students from a central source, such as the school office, but we'll just make them up. The student_id column declaration has several parts, even though it just contains numbers:

  • INT signifies that the column holds integers (values with no fractional part).

  • UNSIGNED disallows negative numbers.

  • NOT NULL means that the column value must be filled in. (No student can be without an ID number.)

  • AUTO_INCREMENT is a special attribute in MySQL. It works like this: If the value for the student_id column is missing (or NULL) when you create a new student table record, MySQL automatically generates a unique number that is one greater than the maximum value currently in the column. We'll use this fact when we load the student table, by specifying values only for the name and sex columns and letting MySQL generate the student_id column value for us.

  • PRIMARY KEY means that the column is indexed for fast lookups and that each value in the column must be unique. This prevents us from using the same ID twice by mistake, which is a desirable property for student ID numbers. (Not only that, but MySQL requires that every AUTO_INCREMENT column have a unique index.)

If you don't understand that stuff about AUTO_INCREMENT and PRIMARY KEY, just think of them as giving us a magic way of generating ID numbers for each student. It doesn't particularly matter what the values are, as long as they're unique.

Note: If you really were going to get student ID numbers from the office (and not generate them automatically), you could declare the student_id column the same way, except that you'd leave out the AUTO_INCREMENT attribute.

The event table looks like this:

CREATE TABLE event
(
    date DATE NOT NULL,
    type ENUM('T','Q') NOT NULL,
    event_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
)

Type that statement into mysql or execute the following shell command:

% mysql samp_db < create_event.sql
							

All the columns are declared as NOT NULL because none of them can be missing.

The date column holds a standard MySQL DATE value, in "YYYY-MM-DD" (year-first) format.

type represents score type. Like sex in the student table, type is an enumeration column. The allowable values are 'T' and 'Q', representing "test" and "quiz."

event_id is an AUTO_INCREMENT column, similar to student_id in the student table. Using AUTO_INCREMENT allows us to generate unique event ID values easily. As with the student_id column in the student table, the particular values are less important than that they be unique.

The score table looks like this:

CREATE TABLE score
(
    student_id INT UNSIGNED NOT NULL,
    event_id INT UNSIGNED NOT NULL,
    score INT NOT NULL,
    PRIMARY KEY (event_id, student_id)
)

Type that statement into mysql or execute the following shell command:

% mysql samp_db < create_score.sql
							

score is an INT (integer) column. That is, I'm assuming score values are always integers. If you wanted to allow scores such as 58.5 that have a fractional part, you'd use one of the floating-point column types such as FLOAT or DECIMAL instead.

The student_id and event_id columns are integers representing which student and event each score is for. By using them to link to the student and event tables, we'll be able to tell the student name and event date. We've also made the combination of the two columns a PRIMARY KEY. This ensures that we won't have duplicate scores for a student for a given quiz or test. Also, it'll be easier to change a score later. For example, when a score is entered incorrectly, we can clobber the old record when we put in the new records by using MySQL's REPLACE statement. It's not necessary to do a DELETE coupled with an INSERT; MySQL does it for us.

Note that it's the combination of event_id and student_id that is unique. In the score table, neither value is unique by itself. There will be multiple score records for each event_id value (one per student), and multiple records for each student_id value (one for each quiz and test).

The absence table for attendance looks like this:

CREATE TABLE absence
(
    student_id INT UNSIGNED NOT NULL,
    date DATE NOT NULL,
    PRIMARY KEY (student_id, date)
)

Type that statement into mysql or execute the following shell command:

% mysql samp_db < create_absence.sql
							

The student_id and date columns are both declared as NOT NULL to disallow missing values. We make the combination of the two columns a primary key so that we don't accidentally create duplicate records. After all, it's not fair to count a student absent twice on the same day!

Adding New Records

At this point, our database and its tables have been created, and in the next section, "Retrieving Information," we'll see how to pull data out of the database. But first, we need to put some records into the tables.

There are several ways to add data to a database. You can insert records into a table manually by issuing INSERT statements. You can also add records by reading them from a file, either as raw data values that you load using the LOAD DATA statement or the mysqlimport utility, or in the form of pre-written INSERT statements that you feed to mysql.

This section demonstrates each method of inserting records into your tables. What you should do is play with the various methods to see how they work. Then go to the end of the section and run the commands you find there to clear out the tables and reload them. By doing so, you'll make sure the tables contain the same records that I was working with while writing the next section, and you'll get the same results.

Let's start adding records by using INSERT, a SQL statement for which you specify the table into which you want to insert a row of data and the values to put in the row. The INSERT statement has several forms:

  • You can specify values for all the columns:

    INSERT INTO tbl_name VALUES(value1,value2,…)
    

    For example:

    mysql> INSERT INTO student VALUES('Kyle','M',NULL);
    mysql> INSERT INTO event VALUES("1999-9-3","Q",NULL);
    								

    The word INTO is optional as of MySQL 3.22.5. (This is true for the other forms of INSERT as well.) The VALUES list must contain a value for each column in the table, in the order that the columns are stored in the table. (Normally, this is the order in which the columns were specified when the table was created. Use DESCRIBE tbl_name to find out the order if you're not sure what it is.)

    You can quote string and date values in MySQL using either single or double quotes. The NULL values in the preceding example are for the AUTO_INCREMENT columns in the student and event tables. (Inserting a "missing value" causes the next student_id or event_id number to be generated.)

    MySQL versions from 3.22.5 up allow you to insert several rows into a table with a single INSERT statement by specifying multiple value lists:

    INSERT INTO tbl_name VALUES(…),(…),…
    

    For example:

    mysql> INSERT INTO student VALUES('Abby','F',NULL),("Kyle","M",NULL);
    								

    This involves less typing than multiple INSERT statements, and also is more efficient for the server to execute.

  • You can name the columns to which you want to assign values, then list the values. This is useful when you want to create a record for which only a few columns need to be set up initially.

    INSERT INTO tbl_name (col_name1,col_name2,…) VALUES(value1,value2,…)
    

    For example:

    mysql> INSERT INTO member (last_name,first_name) VALUES('Stein','Waldo'),
    								

    As of MySQL 3.22.5, this form of INSERT allows multiple value lists, too:

    mysql> INSERT INTO student (name,sex) VALUES('Abby','F'),("Kyle","M");
    								

    Any column not named in the column list is assigned a default value.

  • As of MySQL 3.22.10, you can name columns and values in col_name=value form.

INSERT INTO tbl_name SET col_name1=value1, col_name2=value2, …

For example:

mysql> INSERT INTO member SET last_name='Stein',first_name='Waldo';
						

Any column not named in the SET clause is assigned a default value.

You cannot insert multiple rows using this form of INSERT.

Another method for loading records into a table is to read the data values directly from a file. You can load records this way with the LOAD DATA statement or with the mysqlimport utility.

The LOAD DATA statement acts as a bulk loader that reads data from a file. Use it from within mysql:

mysql> LOAD DATA LOCAL INFILE "member.txt" INTO TABLE member;
						

This statement reads the contents of the data file member.txt located in your current directory on the client host and sends it to the server to be loaded into the member table.

LOAD DATA LOCAL won't work if your MySQL is older than version 3.22.15 because that's when the capability of reading files from the client was added to LOAD DATA. (Without the LOCAL keyword, the file must be located on the server host and you need a server access privilege that most MySQL users don't have.)

By default, the LOAD DATA statement assumes that column values are separated by tabs and that lines end with newlines. It also assumes that the values are present in the order that columns are stored in the table. It's possible to read files in other formats or to specify a different column order. See the entry for LOAD DATA in Appendix D, "SQL Syntax Reference" for more details.

The mysqlimport utility acts as a command line interface to LOAD DATA. You invoke mysqlimport from the shell, and it generates a LOAD DATA statement for you:

% mysqlimport --local samp_db member.txt
						

mysqlimport generates a LOAD DATA statement for you that causes member.txt file to be loaded into the member table. This won't work if your MySQL is older than version 3.22.15 because the --local option requires LOAD DATA LOCAL. As with mysql, if you need to specify connection parameters, indicate them on the command line preceding the database name.

mysqlimport derives the table name from the name of the data file. (It uses everything up to the first period of the filename as the table name.) For example, member.txt would be loaded into the member table, and president.txt would be loaded into the president table. If you have multiple files that you want to load into a single table, choose the filenames carefully or mysqlimport won't use the correct table name. With names like member1.txt and member2.txt, mysqlimport would think the table names are member1 and member2. You could, however, use names like member.1.txt and member.2.txt, or member.txt1 and member.txt2.

After you have tried these record-adding methods, you should clear out the tables and reload them so that their contents are the same as what the next section assumes.

From your shell, execute the following commands:

% mysql samp_db < insert_president.sql
% mysql samp_db < insert_member.sql
% mysql samp_db < insert_student.sql
% mysql samp_db < insert_score.sql
% mysql samp_db < insert_event.sql
% mysql samp_db < insert_absence.sql
						

Each file contains a DELETE statement to delete any records you may have inserted into the table, followed by a set of INSERT statements to reinitialize the table contents. If you don't want to type those commands individually, try this:

% cat insert_*.sql| mysql samp_db

Retrieving Information

Our tables are created and loaded with data now, so let's see what we can do with that data. The SELECT statement allows you to retrieve and display information from your tables, in as general or specific a manner as you like. You can display the entire contents of a table:

SELECT * FROM president

Or you can select as little as a single column of a single row:

SELECT birth_date FROM president WHERE last_name = "Eisenhower"

The SELECT statement has several clauses (parts), which you combine as necessary to retrieve the information in which you're interested. Each of these clauses can be simple or complex, so SELECT statements as a whole can be simple or complex. However, you may rest assured that you won't find any page-long queries that take an hour to figure out in this book. (When I see arm-length queries in something I'm reading, I generally skip right over them, and I'm guessing you do the same.)

The general form of SELECT is:

SELECT what to select
FROM table or tables
WHERE conditions that data must satisfy
						

Remember that SQL is a free-format language, so when you write your own SELECT queries, you need not put line breaks in the same places I do.

To write a SELECT statement, specify what you want to retrieve and then some optional clauses. The clauses just shown (FROM, WHERE) are the ones most commonly used, although others can be specified as well, such as GROUP BY, ORDER BY, and LIMIT.

The FROM clause is usually present, but it need not be if you're not selecting data from tables. For example, the following query simply displays the values of some expressions that can be calculated without reference to any table, so no FROM is required:

mysql> SELECT 2+2, "Hello, world", VERSION();
+-----+--------------+------------------+
| 2+2 | Hello, world | VERSION()        |
+-----+--------------+------------------+
|   4 | Hello, world | 3.23.0-alpha-log |
+-----+--------------+------------------+

When you do use a FROM clause to specify a table from which to retrieve data, the most "generic" form of SELECT is to retrieve everything. To do this, use '*', which is shorthand for "all columns." The following query retrieves all rows from the student table and displays them:

mysql> SELECT * FROM student;
+-----------+-----+------------+
| name      | sex | student_id |
+-----------+-----+------------+
| Megan     | F   |          1 |
| Joseph    | M   |          2 |
| Kyle      | M   |          3 |
| Katie     | F   |          4 |
…

The columns are returned in the order that MySQL stores them in the table. This is the same order in which the columns are listed when you issue a DESCRIBE student statement. (The '' at the end of the example indicates that the query returns more rows than are shown.)

You can explicitly name the column or columns you want to see. To select just student names, do this:

mysql> SELECT name FROM student;
+-----------+
| name      |
+-----------+
| Megan     |
| Joseph    |
| Kyle      |
| Katie     |
…

If you name more than one column, separate them with commas. The following statement is equivalent to SELECT * FROM student, but each column is named explicitly:

mysql> SELECT name, sex, student_id FROM student;
+-----------+-----+------------+
| name      | sex | student_id |
+-----------+-----+------------+
| Megan     | F   |          1 |
| Joseph    | M   |          2 |
| Kyle      | M   |          3 |
| Katie     | F   |          4 |
…

You can name columns in any order:

SELECT name, student_id FROM student
SELECT student_id, name FROM student

You can even name a column more than once if you like, although generally that's kind of pointless.

Column names are not case sensitive in MySQL. The following queries are equivalent:

SELECT name, student_id FROM student
SELECT NAME, STUDENT_ID FROM student
SELECT nAmE, sTuDeNt_Id FROM student

Database and table names may be case sensitive; it depends on the file system used on the server host. A server running on UNIX treats database and table names as case sensitive because UNIX filenames are case sensitive. Windows filenames are not case sensitive, so a server running on Windows does not treat database and table names as case sensitive.

MySQL allows you to select columns from more than one table at a time. We'll get to this in "Retrieving Information from Multiple Tables."

Specifying Retrieval Criteria

To restrict the set of records retrieved by the SELECT statement, use a WHERE clause that specifies criteria for selecting rows. You can select rows by looking for column values that satisfy various criteria.

You can look for numeric values:

mysql> SELECT * FROM score WHERE score > 95;
+------------+----------+-------+
| student_id | event_id | score |
+------------+----------+-------+
|          5 |        3 |    97 |
|         18 |        3 |    96 |
|          1 |        6 |   100 |
|          5 |        6 |    97 |
|         11 |        6 |    98 |
|         16 |        6 |    98 |
+------------+----------+-------+

Alternatively, you can look for string values. (Note that string comparisons normally are not case sensitive.)

mysql> SELECT last_name, first_name FROM president
    -> WHERE last_name="ROOSEVELT";
+-----------+-------------+
| last_name | first_name  |
+-----------+-------------+
| Roosevelt | Theodore    |
| Roosevelt | Franklin D. |
+-----------+-------------+
mysql> SELECT last_name, first_name FROM president
    -> WHERE last_name="roosevelt";
+-----------+-------------+
| last_name | first_name  |
+-----------+-------------+
| Roosevelt | Theodore    |
| Roosevelt | Franklin D. |
+-----------+-------------+

Or you could look for date values:

mysql> SELECT last_name, first_name, birth FROM president
    -> WHERE birth < "1750-1-1";
+------------+------------+------------+
| last_name  | first_name | birth      |
+------------+------------+------------+
| Washington | George     | 1732-02-22 |
| Adams      | John       | 1735-10-30 |
| Jefferson  | Thomas     | 1743-04-13 |
+------------+------------+------------+

Or you can search for a combination of values:

mysql> SELECT last_name, first_name, birth, state FROM president
    -> WHERE birth < "1750-1-1" AND (state="VA" OR state="MA");
+------------+------------+------------+-------+
| last_name  | first_name | birth      | state |
+------------+------------+------------+-------+
| Washington | George     | 1732-02-22 | VA    |
| Adams      | John       | 1735-10-30 | MA    |
| Jefferson  | Thomas     | 1743-04-13 | VA    |
+------------+------------+------------+-------+

Expressions in WHERE clauses can use arithmetic operators , as in Table 1.1, comparison operators, as in Table 1.2, and logical operators, as in Table 1.3. You can also use parentheses to group parts of an expression. Operations can be performed using constants, table columns, and function calls. We will have occasion to use several of MySQL's functions in queries throughout this tutorial, but there are far too many to show here. See Appendix C, "Operator and Function Reference," for a complete list.

Table 1.1. Arithmetic Operators
OperatorMeaning
+Addition
-Subtraction
*Multiplication
/Division

Table 1.2. Comparison Operators
OperatorMeaning
<Less than
<=Less than or equal to
=Equal to
!= or <>Not equal to
>=Greater than or equal to
>Greater than

Table 1.3. Logical Operators
OperatorMeaning
ANDLogical AND
ORLogical OR
NOTLogical negation

When you're formulating a query that requires logical operators, take care not to confuse the meaning of the logical AND operator with the way we use "and" in everyday speech. Suppose you want to find "presidents born in Virginia and presidents born in Maryland." Note how that is phrased using "and"—do you therefore write the query as follows?:

SELECT last_name, first_name, state FROM president
WHERE state="VA" AND state="MA"

No, because what that query means is "Select presidents who were born both in Virginia and in Maryland," which makes no sense. In English, you might express the query using "and," but in SQL, you connect the two conditions with OR:

mysql> SELECT last_name, first_name, state FROM president
   -> WHERE state="VA" OR state="MA";
+------------+-------------+-------+
| last_name  | first_name  | state |
+------------+-------------+-------+
| Washington | George      | VA    |
| Adams      | John        | MA    |
| Jefferson  | Thomas      | VA    |
| Madison    | James       | VA    |
| Monroe     | James       | VA    |
| Adams      | John Quincy | MA    |
| Harrison   | William H.  | VA    |
| Tyler      | John        | VA    |
| Taylor     | Zachary     | VA    |
| Wilson     | Woodrow     | VA    |
| Kennedy    | John F      | MA    |
| Bush       | George W.   | MA    |
+------------+-------------+-------+

This is something to be aware of, not just when formulating your own queries, but also when writing queries for other people. It's best to listen carefully as they describe what they want to retrieve, but you don't necessarily want to transcribe their descriptions into SQL using the same logical operators. For the example just described, the proper English equivalent for the query is "Select presidents who were born either in Virginia or in Maryland."

The NULL Value

The NULL value is special; because it means "no value," you can't assess it against known values the way you can assess two known values against each other. If you attempt to use NULL with the usual arithmetic comparison operators, the result is undefined:

mysql> SELECT NULL < 0, NULL = 0, NULL != 0, NULL > 0;
+----------+----------+-----------+----------+
| NULL < 0 | NULL = 0 | NULL != 0 | NULL > 0 |
+----------+----------+-----------+----------+
|     NULL |     NULL |      NULL |     NULL |
+----------+----------+-----------+----------+

In fact, you can't even compare NULL against itself because the result of comparing two unknown values cannot be known:

mysql> SELECT NULL = NULL, NULL != NULL;
+-------------+--------------+
| NULL = NULL | NULL != NULL |
+-------------+--------------+
|        NULL |         NULL |
+-------------+--------------+

To perform searches for NULL values, you must use a special syntax. Instead of using = or != to test for equality or inequality, use IS NULL or IS NOT NULL.

For example, because we have represented the death date for living presidents as NULL, you can find them like this:

mysql> SELECT last_name, first_name FROM president WHERE death IS NULL;
+-----------+------------+
| last_name | first_name |
+-----------+------------+
| Ford      | Gerald R   |
| Carter    | James E.   |
| Reagan    | Ronald W.  |
| Bush      | George W.  |
| Clinton   | William J. |
+-----------+------------+

To find names that have a suffix part, use IS NOT NULL:

mysql> SELECT last_name, first_name, suffix
    -> FROM president WHERE suffix IS NOT NULL;
+-----------+------------+--------+
| last_name | first_name | suffix |
+-----------+------------+--------+
| Carter    | James E.   | Jr.    |
+-----------+------------+--------+

MySQL 3.23 and up has a special MySQL-specific '<=>' comparison operator that is true even for NULL-to-NULL comparisons. The preceding two queries can be rewritten to use this operator as follows:

mysql> SELECT last_name, first_name FROM president WHERE death <=> NULL;
+-----------+------------+
| last_name | first_name |
+-----------+------------+
| Ford      | Gerald R   |
| Carter    | James E.   |
| Reagan    | Ronald W.  |
| Bush      | George W.  |
| Clinton   | William J. |
+-----------+------------+
mysql> SELECT last_name, first_name, suffix
    -> FROM president WHERE NOT (suffix <=> NULL);
+-----------+------------+--------+
| last_name | first_name | suffix |
+-----------+------------+--------+
| Carter    | James E.   | Jr.    |
+-----------+------------+--------+

Sorting Query Results

Sometimes people notice that if you issue a SELECT * FROM tbl_name query on a table after loading an initial data set into it, the rows are retrieved in the same order in which they were inserted. You can't count on that always being true. If you delete and insert rows after loading the table initially, you'll find that that changes the order in which the server returns the table's rows. (Deleting records puts "holes" of unused space in the table, which MySQL tries to fill later when you insert new records.)

By default, when you select rows, the server makes no guarantees about the order in which rows will be returned. To sort rows, use an ORDER BY clause:

mysql> SELECT last_name, first_name FROM president
    -> ORDER BY last_name;
+------------+---------------+
| last_name  | first_name    |
+------------+---------------+
| Adams      | John          |
| Adams      | John Quincy   |
| Arthur     | Chester A.    |
| Buchanan   | James         |
…

You can specify whether to sort a column in ascending or descending order by using the ASC or DESC keywords after column names in the ORDER BY clause. For example, to sort president names by reverse (descending) name order, use DESC like this:

mysql> SELECT last_name, first_name FROM president
    -> ORDER BY last_name DESC;
+------------+---------------+
| last_name  | first_name    |
+------------+---------------+
| Wilson     | Woodrow       |
| Washington | George        |
| Van Buren  | Martin        |
| Tyler      | John          |
…

Ascending order is the default if you specify neither ASC nor DESC for acolumn name in an ORDER BY clause.

If you sort a column that may contain NULL values, any NULL values appear at the beginning of the column, or at the end if you sort in descending order.

Query output can be sorted on multiple columns, and each column can be sorted in ascending or descending order independently of any other. The following query retrieves rows from the president table, sorts them by reverse state of birth, and by last name within each state:

mysql> SELECT last_name, first_name, state FROM president
    -> ORDER BY state DESC, last_name ASC;
+------------+---------------+-------+
| last_name  | first_name    | state |
+------------+---------------+-------+
| Arthur     | Chester A.    | VT    |
| Coolidge   | Calvin        | VT    |
| Harrison   | William H.    | VA    |
| Jefferson  | Thomas        | VA    |
| Madison    | James         | VA    |
| Monroe     | James         | VA    |
| Taylor     | Zachary       | VA    |
| Tyler      | John          | VA    |
| Washington | George        | VA    |
| Wilson     | Woodrow       | VA    |
| Eisenhower | Dwight D.     | TX    |
| Johnson    | Lyndon B.     | TX    |
…

Limiting Query Results

When a query returns many rows, but you want to see only a few of them, the LIMIT clause is useful, especially in conjunction with ORDER BY. MySQL allows you to limit the output of a query to the first n rows of the result that would otherwise be returned. The following query selects the five presidents who were born first:

mysql> SELECT last_name, first_name, birth FROM president
    -> ORDER BY birth LIMIT 5;
+------------+------------+------------+
| last_name  | first_name | birth      |
+------------+------------+------------+
| Washington | George     | 1732-02-22 |
| Adams      | John       | 1735-10-30 |
| Jefferson  | Thomas     | 1743-04-13 |
| Madison    | James      | 1751-03-16 |
| Monroe     | James      | 1758-04-28 |
+------------+------------+------------+

If you sort in reverse order, using ORDER BY birth DESC, you'd get the five most recently born presidents instead.

LIMIT also allows you to pull a section of records out of the middle of a result set. To do this, you must specify two values. The first value is the number of the initial record in the result set that you want to see. (The first record is numbered 0, not 1.) The second value indicates the number of records to return. The following query is similar to the previous one but returns five records beginning with the eleventh row of the result:

mysql> SELECT last_name, first_name, birth FROM president
    -> ORDER BY birth LIMIT 10, 5;
+-----------+------------+------------+
| last_name | first_name | birth      |
+-----------+------------+------------+
| Tyler     | John       | 1790-03-29 |
| Buchanan  | James      | 1791-04-23 |
| Polk      | James K.   | 1795-11-02 |
| Fillmore  | Millard    | 1800-01-07 |
| Pierce    | Franklin   | 1804-11-23 |
+-----------+------------+------------+

As of MySQL 3.23.2, you can order query results according to a formula. For example, to pull a randomly selected record from the president table, use ORDER BY RAND() in conjunction with LIMIT:

mysql> SELECT last_name, first_name FROM president
    -> ORDER BY RAND() LIMIT 1;
+-----------+------------+
| last_name | first_name |
+-----------+------------+
| McKinley  | William    |
+-----------+------------+

Calculating and Naming Output Column Values

Most of the preceding queries have produced output by retrieving values from tables. MySQL also allows you to calculate an output column value as the result of an expression. Expressions can be simple or complex. The following query evaluates a simple expression (a constant) and a more complex expression involving several arithmetic operations and a couple of function calls:

mysql> SELECT 17, FORMAT(SQRT(3*3+4*4),0);
+----+-------------------------+
| 17 | FORMAT(SQRT(3*3+4*4),0) |
+----+-------------------------+
| 17 | 5                       |
+----+-------------------------+

Expressions may also refer to table columns:

mysql> SELECT CONCAT(first_name," ",last_name),CONCAT(city,", ",state)
    -> FROM president;
+----------------------------------+-------------------------+
| CONCAT(first_name," ",last_name) | CONCAT(city,", ",state) |
+----------------------------------+-------------------------+
| George Washington                | Wakefield, VA           |
| John Adams                       | Braintree, MA           |
| Thomas Jefferson                 | Albemarle County, VA    |
| James Madison                    | Port Conway, VA         |
…

This query formats president names as a single string by concatenating first and last names separated by a space and formats birthplaces as the birth cities and states separated by a comma.

When you use an expression to calculate a column value, the expression is used for the column heading. That can lead to a very wide column if the expression is long (as the preceding query illustrates). To deal with this, the column can be assigned a name for the heading, using the AS name construct. Such names are called column aliases. The output from the previous query can be made more meaningful like this:

mysql> SELECT CONCAT(first_name," ",last_name) AS Name,
    -> CONCAT(city,", ",state) As Birthplace
    -> FROM president;
+-----------------------+-------------------------+
| Name                  | Birthplace              |
+-----------------------+-------------------------+
| George Washington     | Wakefield, VA           |
| John Adams            | Braintree, MA           |
| Thomas Jefferson      | Albemarle County, VA    |
| James Madison         | Port Conway, VA         |
…

If the column alias contains spaces, you'll need to put it in quotes:

mysql> SELECT CONCAT(first_name," ",last_name) AS "President Name",
    -> CONCAT(city,", ",state) As "Place of Birth"
    -> FROM president;
+-----------------------+-------------------------+
| President Name        | Place of Birth          |
+-----------------------+-------------------------+
| George Washington     | Wakefield, VA           |
| John Adams            | Braintree, MA           |
| Thomas Jefferson      | Albemarle County, VA    |
| James Madison         | Port Conway, VA         |
…

Working With Dates

The principal thing to keep in mind when it comes to dates in MySQL is that they are always represented with the year first. July 27, 1999 is represented as "1999-07-27". It is not represented as "07-27-1999" or as "27-07-1999", as you might be more used to writing.

MySQL provides several ways to perform operations on dates. Some of the things you can do are as follows:

  • Sort by date. (We've seen this several times already.)

  • Look for particular dates or a range of dates.

  • Extract parts of a date value, such as the year, month, or day.

  • Calculate the difference between dates.

  • Compute a date by adding or subtracting an interval from another date.

Some examples of these operations follow.

To look for particular dates, either by exact value or compared to another value, compare a DATE column to the value you're interested in:

mysql> SELECT * FROM event WHERE date = "1999-10-01";
+------------+------+----------+
| date       | type | event_id |
+------------+------+----------+
| 1999-10-01 | T    |        6 |
+------------+------+----------+
mysql> SELECT last_name, first_name, death
    -> FROM president
    -> WHERE death >= "1970-01-01" AND death < "1980-01-01";
+-----------+------------+------------+
| last_name | first_name | death      |
+-----------+------------+------------+
| Truman    | Harry S.   | 1972-12-26 |
| Johnson   | Lyndon B.  | 1973-01-22 |
+-----------+------------+------------+

To test or retrieve parts of dates, you can use functions such as YEAR(), MONTH(), or DAYOFMONTH(). For example, I can find presidents who were born in the same month that I was (March) by looking for dates with a month value of 3:

mysql> SELECT last_name, first_name, birth
    -> FROM president WHERE MONTH(birth) = 3;
+-----------+------------+------------+
| last_name | first_name | birth      |
+-----------+------------+------------+
| Madison   | James      | 1751-03-16 |
| Jackson   | Andrew     | 1767-03-15 |
| Tyler     | John       | 1790-03-29 |
| Cleveland | Grover     | 1837-03-18 |
+-----------+------------+------------+

The query can also be written in terms of the month name:

mysql> SELECT last_name, first_name, birth
    -> FROM president WHERE MONTHNAME(birth) = "March";
+-----------+------------+------------+
| last_name | first_name | birth      |
+-----------+------------+------------+
| Madison   | James      | 1751-03-16 |
| Jackson   | Andrew     | 1767-03-15 |
| Tyler     | John       | 1790-03-29 |
| Cleveland | Grover     | 1837-03-18 |
+-----------+------------+------------+

To be more specific—down to the day—I can combine tests for MONTH() and DAYOFMONTH() to find presidents born on my birthday:

mysql> SELECT last_name, first_name, birth
    -> FROM president WHERE MONTH(birth) = 3 AND DAYOFMONTH(birth) = 29;
+-----------+------------+------------+
| last_name | first_name | birth      |
+-----------+------------+------------+
| Tyler     | John       | 1790-03-29 |
+-----------+------------+------------+

This is the kind of query you'd use to generate one of those "these people have birthdays today" lists such as you see in the Entertainment section of your newspaper. However, you don't have to plug in a specific day the way the previous query did. To check for presidents born today, no matter what day of the year today is, compare their birthdays to the value of CURRENT_DATE:

SELECT last_name, first_name, birth
FROM president WHERE MONTH(birth) = MONTH(CURRENT_DATE)
AND DAYOFMONTH(birth) = DAYOFMONTH(CURRENT_DATE)

You can subtract one date from another. This allows you to find the interval between dates, which is useful for determining ages. For example, to determine which presidents lived the longest, subtract the birth date from the death date. To do this, convert birth and death to days using the TO_DAYS() function, take the difference, and divide by 365 to yield approximate age in years:

mysql> SELECT last_name, first_name, birth, death,
    -> FLOOR((TO_DAYS(death) - TO_DAYS(birth))/365) AS age
    -> FROM president WHERE death IS NOT NULL
    -> ORDER BY age DESC LIMIT 5;
+-----------+------------+------------+------------+------+
| last_name | first_name | birth      | death      | age  |
+-----------+------------+------------+------------+------+
| Adams     | John       | 1735-10-30 | 1826-07-04 |   90 |
| Hoover    | Herbert C. | 1874-08-10 | 1964-10-20 |   90 |
| Truman    | Harry S.   | 1884-05-08 | 1972-12-26 |   88 |
| Madison   | James      | 1751-03-16 | 1836-06-28 |   85 |
| Jefferson | Thomas     | 1743-04-13 | 1826-07-04 |   83 |
+-----------+------------+------------+------------+------+

The FLOOR() function used in this query chops off any fractional part from the age to produce an integer.

Taking the difference between dates is also useful for determining how far dates are from some reference date. That's how you can tell which Historical League members need to renew their memberships soon. Compute the difference between their expiration dates and the current date, and if it's less than some threshold value, a renewal will soon be needed. The following query finds memberships that are due for renewal within 60 days:

SELECT last_name, first_name, expiration FROM member
WHERE (TO_DAYS(expiration) - TO_DAYS(CURRENT_DATE)) < 60

As of MySQL 3.22, you can use DATE_ADD() or DATE_SUB() to calculate one date from another. These functions take a date and an interval and produce a new date. For example:

mysql> SELECT DATE_ADD("1970-1-1", INTERVAL 10 YEAR);
+----------------------------------------+
| DATE_ADD("1970-1-1", INTERVAL 10 YEAR) |
+----------------------------------------+
| 1980-01-01                             |
+----------------------------------------+
mysql> SELECT DATE_SUB("1970-1-1", INTERVAL 10 YEAR);
+----------------------------------------+
| DATE_SUB("1970-1-1", INTERVAL 10 YEAR) |
+----------------------------------------+
| 1960-01-01                             |
+----------------------------------------+

A query shown earlier in this section selected presidents who died during the 1970s, using literal dates for the endpoints of the selection range. That query can be rewritten to use a literal starting date and an ending date calculated from the starting date and an interval:

mysql> SELECT last_name, first_name, death
    -> FROM president
    -> WHERE death >= "1970-1-1"
    -> AND death < DATE_ADD("1970-1-1", INTERVAL 10 YEAR);
+-----------+------------+------------+
| last_name | first_name | death      |
+-----------+------------+------------+
| Truman    | Harry S.   | 1972-12-26 |
| Johnson   | Lyndon B.  | 1973-01-22 |
+-----------+------------+------------+

The membership-renewal query can be written in terms of DATE_ADD():

SELECT last_name, first_name, expiration FROM member
WHERE expiration < DATE_ADD(CURRENT_DATE, INTERVAL 60 DAY)

Earlier in this chapter, a query was presented for determining which of a dentist's patients haven't been in for their checkups in a while:

SELECT last_name, first_name, last_visit FROM patient
WHERE last_visit < DATE_SUB(CURRENT_DATE,INTERVAL 6 MONTH)

This query may not have meant much to you then. Is it more meaningful now?

Pattern Matching

MySQL allows you to look for values that match a pattern. In this way, you can select records without supplying an exact value. To perform a pattern matching operation, you use special operators (LIKE and NOT LIKE), and you specify a string containing wild card characters. The character '_' matches any single character, and '%' matches any sequence of characters (including an empty sequence). Pattern matches using LIKE or NOT LIKE are not case sensitive.

This pattern matches last names that begin with 'W' or 'w':

mysql> SELECT last_name, first_name FROM president
    -> WHERE last_name LIKE "W%";
+------------+------------+
| last_name  | first_name |
+------------+------------+
| Washington | George     |
| Wilson     | Woodrow    |
+------------+------------+

This pattern match is erroneous:

mysql> SELECT last_name, first_name FROM president
    -> WHERE last_name = "W%";
Empty set (0.00 sec)

The query demonstrates a common error, which is to use a pattern with an arithmetic comparison operator. The only way for such a comparison to succeed is for the column to contain exactly the string "W%" or "w%".

This pattern matches last names that contain 'W' or 'w' anywhere in the name:

mysql> SELECT last_name, first_name FROM president
    -> WHERE last_name LIKE "%W%";
+------------+------------+
| last_name  | first_name |
+------------+------------+
| Washington | George     |
| Wilson     | Woodrow    |
| Eisenhower | Dwight D.  |
+------------+------------+

This pattern matches last names that contain exactly four characters:

mysql> SELECT last_name, first_name FROM president
    -> WHERE last_name LIKE "____";
+------------+------------+
| last_name  | first_name |
+------------+------------+
| Polk       | James K.   |
| Taft       | William H. |
| Ford       | Gerald R.  |
| Bush       | George W.  |
+------------+------------+

MySQL also provides another form of pattern matching based on extended regular expressions. Regular expressions are described in the discussion of the REGEXP operator in Appendix C.

Generating Summaries

One of the most useful things MySQL can do for you is to boil down lots of raw data and summarize it. MySQL becomes a powerful ally when you learn to use it to generate summaries because that is an especially tedious, time-consuming, error-prone activity when done manually.

One simple form of summarizing is to determine which unique values are present in a set of values. Use the DISTINCT keyword to remove duplicate rows from a result. For example, the different states in which presidents have been born can be found like this:

mysql> SELECT DISTINCT state FROM president ORDER BY state;
+-------+
| state |
+-------+
| AK    |
| CA    |
| GA    |
| IA    |
| IL    |
| KY    |
| MA    |
| MO    |
| NC    |
| NE    |
| NH    |
| NJ    |
| NY    |
| OH    |
| PA    |
| SC    |
| TX    |
| VA    |
| VT    |
+-------+

Another form of summarizing involves counting, using—you guessed it—the COUNT() function. If you use COUNT(*), it tells you the number of rows selected by your query. If a query has no WHERE clause, COUNT(*) tells you the number of rows in your table.

The following query shows how many US presidents there have been:

mysql> SELECT COUNT(*) FROM president;
+----------+
| COUNT(*) |
+----------+
|       41 |
+----------+

If a query does have a WHERE clause, COUNT(*) tells you how many rows the clause selects. This query shows how many quizzes you have given to your class so far:

mysql> SELECT COUNT(*) FROM event WHERE type = 'Q';
+----------+
| COUNT(*) |
+----------+
|        4 |
+----------+

COUNT(*) counts every row selected. By contrast, COUNT(col_name) counts only non-NULL values. The following query demonstrates these differences:

mysql> SELECT COUNT(*),COUNT(suffix),COUNT(death) FROM president;
+----------+---------------+--------------+
| COUNT(*) | COUNT(suffix) | COUNT(death) |
+----------+---------------+--------------+
|       41 |             1 |           36 |
+----------+---------------+--------------+

This shows that there have been 41 presidents in all, that only one of them has a name suffix, and that most presidents are no longer living.

As of MySQL 3.23.2, you can combine COUNT() with DISTINCT to count the number of distinct values in a result. For example, to count the number of different states in which presidents have been born, do this:

mysql> SELECT COUNT(DISTINCT state) FROM president;
+-----------------------+
| COUNT(DISTINCT state) |
+-----------------------+
|                    19 |
+-----------------------+

You can break down counts by individual values in the summarized column. For example, you may know the overall number of students in your class as a result of running this query:

mysql> SELECT COUNT(*) FROM student;
+----------+
| COUNT(*) |
+----------+
|       31 |
+----------+

But how many students are boys and how many are girls? One way to find out is by asking for a count for each sex separately:

mysql> SELECT COUNT(*) FROM student WHERE sex='f';
+----------+
| COUNT(*) |
+----------+
|       15 |
+----------+
mysql> SELECT COUNT(*) FROM student WHERE sex='m';
+----------+
| COUNT(*) |
+----------+
|       16 |
+----------+

However, although this approach works, it's tedious and not really very well suited for columns that might have several different values. Consider how you'd determine the number of presidents born in each state this way. You'd have to find out which states are represented so as not to miss any (SELECT DISTINCT state FROM president), then run a SELECT COUNT(*) query for each state. That is clearly something you don't want to do.

Fortunately, MySQL can count, using a single query, how many times each distinct value occurs in a column: For our student list, we can count boys and girls like this:

mysql> SELECT sex, COUNT(*) FROM student GROUP BY sex;
+-----+----------+
| sex | COUNT(*) |
+-----+----------+
| M   |       16 |
| F   |       15 |
+-----+----------+

The same form of query tells us how many presidents were born in each state:

mysql> SELECT state, COUNT(*) FROM president GROUP BY state;
+-------+----------+
| state | COUNT(*) |
+-------+----------+
| AK    |        1 |
| CA    |        1 |
| GA    |        1 |
| IA    |        1 |
| IL    |        1 |
| KY    |        1 |
| MA    |        4 |
| MO    |        1 |
| NC    |        2 |
| NE    |        1 |
| NH    |        1 |
| NJ    |        1 |
| NY    |        4 |
| OH    |        7 |
| PA    |        1 |
| SC    |        1 |
| TX    |        2 |
| VA    |        8 |
| VT    |        2 |
+-------+----------+

When you count values this way, the GROUP BY clause is necessary; it tells MySQL how to cluster values before counting them. You'll just get an error if you omit it.

The use of COUNT(*) with GROUP BY to count values has a number of advantages over counting occurrences of each distinct column value individually:

  • You don't have to know in advance what values are present in the column you're summarizing.

  • You need only a single query, not several.

  • You get all the results with a single query, so you can sort the output.

The first two advantages are important for expressing queries more easily. The third advantage is important because it affords you flexibility in displaying your results. When you use a GROUP BY clause, the results are sorted on the columns you group by, but you can use ORDER BY to sort in a different order. For example, if you want number of presidents grouped by state of birth, but sorted with the most well-represented states first, you can use an ORDER BY clause as follows:

mysql> SELECT state, COUNT(*) AS count FROM president
    -> GROUP BY state ORDER BY count DESC;
+-------+-------+
| state | count |
+-------+-------+
| VA    |     8 |
| OH    |     7 |
| MA    |     4 |
| NY    |     4 |
| NC    |     2 |
| VT    |     2 |
| TX    |     2 |
| SC    |     1 |
| NH    |     1 |
| PA    |     1 |
| KY    |     1 |
| NJ    |     1 |
| IA    |     1 |
| MO    |     1 |
| CA    |     1 |
| NE    |     1 |
| GA    |     1 |
| IL    |     1 |
| AK    |     1 |
+-------+-------+

When the column you want to sort by is determined by calculation, you can give the column an alias and refer to the alias in the ORDER BY clause. The preceding query demonstrates this; the COUNT(*) column is aliased as count. Another way to refer to such a column is by its position in the output. The previous query could have been written like this instead:

SELECT state, COUNT(*) FROM president
GROUP BY state ORDER BY 2 DESC

I don't find that referring to columns by position is very readable. If you add, remove, or reorder output columns, you must remember to check the ORDER BY clause and fix the column number if it has changed. Aliases don't have this problem.

If you want to use GROUP BY with a calculated column, you should refer to it using an alias or column position, just as with ORDER BY. The following query determines how many presidents were born in each month of the year:

mysql> SELECT MONTH(birth) as Month, MONTHNAME(birth) as Name,
    -> COUNT(*) AS count
    -> FROM president GROUP BY Name ORDER BY Month;
+-------+-----------+-------+
| Month | Name      | count |
+-------+-----------+-------+
|     1 | January   |     4 |
|     2 | February  |     4 |
|     3 | March     |     4 |
|     4 | April     |     4 |
|     5 | May       |     2 |
|     6 | June      |     1 |
|     7 | July      |     3 |
|     8 | August    |     4 |
|     9 | September |     1 |
|    10 | October   |     6 |
|    11 | November  |     5 |
|    12 | December  |     3 |
+-------+-----------+-------+

Using column positions, the query would be written like this:

SELECT MONTH(birth), MONTHNAME(birth), COUNT(*)
FROM president GROUP BY 2 ORDER BY 1;

COUNT() may be combined with ORDER BY and LIMIT to find, for example, the four most well-represented states in the president table:

mysql> SELECT state, COUNT(*) AS count FROM president
    -> GROUP BY state ORDER BY count DESC LIMIT 4;
+-------+-------+
| state | count |
+-------+-------+
| VA    |     8 |
| OH    |     7 |
| MA    |     4 |
| NY    |     4 |
+-------+-------+

If you don't want to limit query output with a LIMIT clause, but rather by looking for particular values of COUNT(), use a HAVING clause. The following query will tell you which states are represented by two or more presidents:

mysql> SELECT state, COUNT(*) AS count FROM president
    -> GROUP BY state HAVING count > 1 ORDER BY count DESC;
+-------+-------+
| state | count |
+-------+-------+
| VA    |     8 |
| OH    |     7 |
| MA    |     4 |
| NY    |     4 |
| NC    |     2 |
| VT    |     2 |
| TX    |     2 |
+-------+-------+

More generally, this is the type of query to run when you want to find duplicated values in a column.

HAVING is similar to WHERE, but it's applied after the query results have been selected and used to narrow down the results actually sent by the server to the client.

There are summary functions other than COUNT(). The MIN(), MAX(),SUM(), and AVG() functions are useful for determining the minimum, maximum, total, and average values in a column. You can even use them all at the same time. The following query shows various numeric characteristics for each quiz and test you've given. It also shows how many scores go into computing each of the values. (Some students may have been absent and are not counted.)

mysql> SELECT
    -> event_id,
    -> MIN(score) AS minimum,
    -> MAX(score) AS maximum,
    -> MAX(score)-MIN(score)+1 AS range,
    -> SUM(score) AS total,
    -> AVG(score) AS average,
    -> COUNT(score) AS count
    -> FROM score
    -> GROUP BY event_id;
+----------+---------+---------+-------+-------+---------+-------+
| event_id | minimum | maximum | range | total | average | count |
+----------+---------+---------+-------+-------+---------+-------+
|        1 |       9 |      20 |    12 |   439 | 15.1379 |    29 |
|        2 |       8 |      19 |    12 |   425 | 14.1667 |    30 |
|        3 |      60 |      97 |    38 |  2425 | 78.2258 |    31 |
|        4 |       7 |      20 |    14 |   379 | 14.0370 |    27 |
|        5 |       8 |      20 |    13 |   383 | 14.1852 |    27 |
|        6 |      62 |     100 |    39 |  2325 | 80.1724 |    29 |
+----------+---------+---------+-------+-------+---------+-------+

This information might be more meaningful if you knew whether the values were from quizzes or tests, of course. However, to produce that information, we need to consult the event table as well; we'll revisit this query in "Retrieving Information from Multiple Tables."

Summary functions are fun to play with because they're so powerful, but it's easy to get carried away with them. Consider this query:

mysql> SELECT
    -> state AS State,
    -> AVG((TO_DAYS(death)-TO_DAYS(birth))/365) AS Age
    -> FROM president WHERE death IS NOT NULL
    -> GROUP BY state ORDER BY Age;
+-------+-----------+
| State | Age       |
+-------+-----------+
| KY    | 56.208219 |
| VT    | 58.852055 |
| NC    | 60.141096 |
| OH    | 62.866145 |
| NH    | 64.917808 |
| NY    | 69.342466 |
| NJ    | 71.315068 |
| TX    | 71.476712 |
| MA    | 72.642009 |
| VA    | 72.822945 |
| PA    | 77.158904 |
| SC    | 78.284932 |
| CA    | 81.336986 |
| MO    | 88.693151 |
| IA    | 90.254795 |
+-------+-----------+

The query selects presidents who have died, groups them by date of birth, figures out their age at time of death, computes the average age (per state), and then sorts the results by average age. In other words, the query determines, for non-living presidents, the average age of death by state of birth.

And what does that demonstrate? It shows only that you can write the query. It certainly doesn't show that the query is worth writing. Not all things you can do with a database are equally meaningful; nevertheless, people sometimes go query-happy when they find out what they can do with their database. This may account for the rise of increasingly esoteric (and pointless) statistics on televised sporting events over the last few years. The sports statisticians can use their databases to figure out everything you'd ever want to know about a team, and also everything you'd never want to know. Do you really care which third-string quarterback holds the record for most interceptions on third down when his team is leading by more than 14 points with the ball inside the 15-yard line and less than three minutes remaining on the clock in the second quarter?

Retrieving Information from Multiple Tables

The queries we've written so far have pulled data from a single table. Now we get to the interesting part. I've mentioned before that the power of a relational DBMS lies in its ability to relate one thing to another because that allows you to combine information from multiple tables to answer questions that can't be answered from individual tables alone. This section describes how to write queries that do that.

When you select information from multiple tables, you're performing an operation called a join. That's because you're producing a result by joining the information from one table to the information in another. This is done by matching up common values in the tables.

Let's work through an example. Earlier, in "Tables for the Grade-Keeping Project," a query to retrieve quiz or test scores for a given date was presented without explanation. Now it's time for the explanation. The query actually involves a three-way join, so we'll work up to it in two steps.

In the first step, we construct a query to select scores for a given date as follows:

mysql> SELECT student_id, date, score, type
    -> FROM event, score
    -> WHERE date = "1999-09-23"
    -> AND event.event_id = score.event_id;
+------------+------------+-------+------+
| student_id | date       | score | type |
+------------+------------+-------+------+
|          1 | 1999-09-23 |    15 | Q    |
|          2 | 1999-09-23 |    12 | Q    |
|          3 | 1999-09-23 |    11 | Q    |
|          5 | 1999-09-23 |    13 | Q    |
|          6 | 1999-09-23 |    18 | Q    |
…

The query works by finding the event record with the given date and then uses the event ID in that record to locate scores with the same event ID. For each matching event record and score record combination, the student ID, score, date, and event type are displayed.

The query differs from others we have written in two important respects:

  • The FROM clause names more than one table because we're retrieving data from more than one table:

    FROM event, score
    
  • The WHERE clause specifies that the event and score tables are joined by matching up the event_id values in each table:

    WHERE … event.event_id = score.event_id
    

Notice how we refer to the columns using tbl_name.col_name syntax so that MySQL knows which tables we're referring to. (event_id occurs in both tables, so it's ambiguous if used without a table name to qualify it.)

The other columns in the query (date, score, type) can be used without a table qualifier because they appear in only one of the tables and thus are unambiguous. However, I generally prefer to qualify every column in a join to make it clearer which table each column is part of. In fully qualified form, the query looks like this:

SELECT score.student_id, event.date, score.score, event.type
FROM event, score
WHERE event.date = "1999-09-23"
AND event.event_id = score.event_id

I will use the fully qualified form from now on.

In the second step, we complete the query by using the student table to display student names. (Output of the query from the first step shows us the student_id field, but names are more meaningful.) Name display is accomplished using the fact that the score and student tables both have student_id columns allowing the records in them to be linked. The resulting query looks like this:

mysql> SELECT student.name, event.date, score.score, event.type
    -> FROM event, score, student
    -> WHERE event.date = "1999-09-23"
    -> AND event.event_id = score.event_id
    -> AND score.student_id = student.student_id;
+-----------+------------+-------+------+
| name      | date       | score | type |
+-----------+------------+-------+------+
| Megan     | 1999-09-23 |    15 | Q    |
| Joseph    | 1999-09-23 |    12 | Q    |
| Kyle      | 1999-09-23 |    11 | Q    |
| Abby      | 1999-09-23 |    13 | Q    |
| Nathan    | 1999-09-23 |    18 | Q    |
…

This query differs from the previous one as follows:

  • The student table is added to the FROM clause because it is used in addition to the event and score tables.

  • The student_id column is ambiguous now, so it must be qualified as score.student_id or student.student_id to make it clear which table to use. (This is true even if you prefer writing joins that don't fully qualify every column reference.)

  • The WHERE clause has an additional term specifying that score table records are matched against student table records based on student ID:

    WHERE … score.student_id = student.student_id
    
  • The query displays the student name rather than the student ID. (You could display both if you wanted to, of course.)

With this query, you can plug in any date and get back the scores for that date, complete with student names and the score type. You don't have to know anything about student IDs or event IDs. MySQL takes care of figuring out the relevant ID values and using them to match up table rows, automatically.

Another task the grade-keeping project involves is summarizing student absences. Absences are recorded by student ID and date in the absence table. To get student names (not just IDs), we need to join the absence table to the student table, based on the student_id value. The following query lists student ID number and name along with a count of absences:

mysql> SELECT student.student_id, student.name,
    -> COUNT(absence.date) as absences
    -> FROM student, absence
    -> WHERE student.student_id = absence.student_id
    -> GROUP BY student.student_id;
+------------+-------+----------+
| student_id | name  | absences |
+------------+-------+----------+
|          3 | Kyle  |        1 |
|          5 | Abby  |        1 |
|         10 | Peter |        2 |
|         17 | Will  |        1 |
|         20 | Avery |        1 |
…

Note: Although I'm supplying a qualifier in the GROUP BY clause, it isn't strictly necessary for this query. GROUP BY refers to columns in the selection list (on the first two lines of the query). There is only one column named student_id there, so MySQL knows which column you mean. This is also true for columns named in ORDER BY clauses.

The output produced by the query is fine if we want to know only which students had absences. But if we turn in this list to the school office, they may say, "What about the other students? We want a value for each student." That's a slightly different question. It means we want to know the number of absences, even for students that had none. Because the question is different, the query is different as well.

To answer the question, we can use LEFT JOIN rather than associating student ID in the WHERE clause. LEFT JOIN tells MySQL to produce a row of output for each row selected from the table named first in the join (that is, the table named to the left of the LEFT JOIN keywords). By naming the student table first, we'll get output for every student, even those who are not represented in the absence table. The query looks like this:

mysql> SELECT student.student_id, student.name,
    -> COUNT(absence.date) as absences
    -> FROM student LEFT JOIN absence
    -> ON student.student_id = absence.student_id
    -> GROUP BY student.student_id;
+------------+-----------+----------+
| student_id | name      | absences |
+------------+-----------+----------+
|          1 | Megan     |        0 |
|          2 | Joseph    |        0 |
|          3 | Kyle      |        1 |
|          4 | Katie     |        0 |
|          5 | Abby      |        1 |
|          6 | Nathan    |        0 |
|          7 | Liesl     |        0 |
…

Earlier, in "Generating Summaries," we ran a query that produced a numeric characterization of the data in the score table. Output from that query listed event ID but did not include score date or type, because we didn't know then how to join the score table to the event table to get score dates and types. Now we do. The following query is similar to one we ran earlier, but shows the date and type for scores rather than simply the numeric event ID:

mysql> SELECT
    -> event.date,event.type,
    -> MIN(score.score) AS minimum,
    -> MAX(score.score) AS maximum,
    -> MAX(score.score)-MIN(score.score)+1 AS range,
    -> SUM(score.score) AS total,
    -> AVG(score.score) AS average,
    -> COUNT(score.score) AS count
    -> FROM score, event
    -> WHERE score.event_id = event.event_id
    -> GROUP BY event.date;
+------------+------+---------+---------+-------+-------+---------+-------+
| date       | type | minimum | maximum | range | total | average | count |
+------------+------+---------+---------+-------+-------+---------+-------+
| 1999-09-03 | Q    |       9 |      20 |    12 |   439 | 15.1379 |    29 |
| 1999-09-06 | Q    |       8 |      19 |    12 |   425 | 14.1667 |    30 |
| 1999-09-09 | T    |      60 |      97 |    38 |  2425 | 78.2258 |    31 |
| 1999-09-16 | Q    |       7 |      20 |    14 |   379 | 14.0370 |    27 |
| 1999-09-23 | Q    |       8 |      20 |    13 |   383 | 14.1852 |    27 |
| 1999-10-01 | T    |      62 |     100 |    39 |  2325 | 80.1724 |    29 |
+------------+------+---------+---------+-------+-------+---------+-------+

You can use functions such as COUNT() and AVG() to produce a summary over multiple columns, even if the columns come from different tables. The following query determines the number of scores and the average score for each combination of event date and student sex:

mysql> SELECT event.date, student.sex,
    -> COUNT(score) AS count, AVG(score) As average
    -> FROM event, score, student
    -> WHERE event.event_id = score.event_id
    -> AND score.student_id = student.student_id
    -> GROUP BY event.date, student.sex;
+------------+-----+-------+---------+
| date       | sex | count | average |
+------------+-----+-------+---------+
| 1999-09-03 | F   |    14 | 14.6429 |
| 1999-09-03 | M   |    15 | 15.6000 |
| 1999-09-06 | F   |    14 | 14.7143 |
| 1999-09-06 | M   |    16 | 13.6875 |
| 1999-09-09 | F   |    15 | 77.4000 |
| 1999-09-09 | M   |    16 | 79.0000 |
| 1999-09-16 | F   |    13 | 15.3077 |
| 1999-09-16 | M   |    14 | 12.8571 |
| 1999-09-23 | F   |    12 | 14.0833 |
| 1999-09-23 | M   |    15 | 14.2667 |
| 1999-10-01 | F   |    14 | 77.7857 |
| 1999-10-01 | M   |    15 | 82.4000 |
+------------+-----+-------+---------+

We can use a similar query to perform one of the grade-keeping project tasks: computing the total score per student at the end of the semester. The query looks like this:

SELECT student.student_id, student.name,
SUM(score.score) AS total, COUNT(score.score) AS n
FROM event, score, student
WHERE event.event_id = score.event_id
AND score.student_id = student.student_id
GROUP BY score.student_id
ORDER BY total

There is no requirement that a join be performed using two different tables. It might seem odd at first, but you can join a table to itself. For example, you can determine whether any presidents were born in the same city by checking each president's birthplace against every other president's birthplace:

mysql> SELECT p1.last_name, p1.first_name, p1.city, p1.state
    -> FROM president AS p1, president AS p2
    -> WHERE p1.city = p2.city AND p1.state = p2.state
    -> AND (p1.last_name != p2.last_name OR p1.first_name != p2.first_name)
    -> ORDER BY state, city, last_name;
+-----------+-------------+-----------+-------+
| last_name | first_name  | city      | state |
+-----------+-------------+-----------+-------+
| Adams     | John Quincy | Braintree | MA    |
| Adams     | John        | Braintree | MA    |
+-----------+-------------+-----------+-------+

There are two tricky things about this query:

  • We need to refer to two instances of the same table, so we create table aliases (p1, p2) and use them to disambiguate references to the table's columns.

  • Every president's record matches itself, but we don't want to see that in the output. The second line of the WHERE clause disallows matches of a record to itself by making sure the records being compared are for different presidents.

A similar query finds presidents who were born on the same day. Birth dates cannot be compared directly because that would miss presidents who were born in different years. Instead, we use MONTH() and DAYOFMONTH() to compare month and day of the birth date:

mysql> SELECT p1.last_name, p1.first_name, p1.birth
    -> FROM president AS p1, president AS p2
    -> WHERE MONTH(p1.birth) = MONTH(p2.birth)
    -> AND DAYOFMONTH(p1.birth) = DAYOFMONTH(p2.birth)
    -> AND (p1.last_name != p2.last_name OR p1.first_name != p2.first_name)
    -> ORDER BY p1.last_name;
+-----------+------------+------------+
| last_name | first_name | birth      |
+-----------+------------+------------+
| Harding   | Warren G.  | 1865-11-02 |
| Polk      | James K.   | 1795-11-02 |
+-----------+------------+------------+

Using DAYOFYEAR() rather than the combination of MONTH() and DAYOFMONTH() would result in a simpler query, but it would produce incorrect results when comparing dates from leap years to dates from non-leap years.

The joins performed thus far combined information from tables that have some meaningful logical relationship, but only you know the meaningfulness of that relationship. MySQL doesn't know (or care) whether or not the joined tables have anything to do with each other. For instance, you can join the event table to the president table to find out whether or not you gave any quizzes or tests on a president's birthday:

mysql> SELECT president.last_name, president.first_name,
    -> president.birth, event.type
    -> FROM president, event
    -> WHERE MONTH(president.birth) = MONTH(event.date)
    -> AND DAYOFMONTH(president.birth) = DAYOFMONTH(event.date);
+-----------+------------+------------+------+
| last_name | first_name | birth      | type |
+-----------+------------+------------+------+
| Carter    | James E.   | 1924-10-01 | T    |
+-----------+------------+------------+------+

It turns out you did. But so what? This illustrates that MySQL will happily crank out results, whether they make any sense or not. Just because you're using a computer, it doesn't automatically mean that results from a query are useful or worthwhile. Fortunately or unfortunately, we still must think about what we're doing.

Deleting or Updating Existing Records

Sometimes you want to get rid of records or change their contents. The DELETE and UPDATE statements let you do this.

The DELETE statement has this form:

DELETE FROM tbl_name WHERE which records to delete
						

The WHERE clause specifies which records should be deleted. It's optional, but if you leave it out, all records are deleted. That means the simplest DELETE statement is also the most dangerous:

DELETE FROM tbl_name
						

This query wipes out the table's contents entirely. Be careful with this!

To delete specific records, use the WHERE clause to select the records in which you're interested. This is similar to the WHERE clause in a SELECT statement. For example, to delete from the president table all presidents born in Ohio, use this query:

mysql> DELETE FROM president WHERE state="OH";
Query OK, 7 rows affected (0.00 sec)

One limitation of the WHERE clause for DELETE is that you can refer only to columns of the table from which you're deleting records.

Before issuing a DELETE statement, it's often a good idea to test the WHERE clause by using it with a SELECT statement to make sure you'll actually delete the records you intend (and only those records). Suppose you want to delete the record for Teddy Roosevelt. Would the following query do the job?

DELETE FROM president WHERE last_name="Roosevelt"

Yes, in the sense that it would delete the record you have in mind. No, in the sense that it also would delete the record for Franklin Roosevelt. It's safer to check the WHERE clause first, like this:

mysql> SELECT last_name, first_name FROM president
    -> WHERE last_name="Roosevelt";
+-----------+-------------+
| last_name | first_name  |
+-----------+-------------+
| Roosevelt | Theodore    |
| Roosevelt | Franklin D. |
+-----------+-------------+

From that you can see the need to be more specific:

mysql> SELECT last_name, first_name FROM president
    -> WHERE last_name="Roosevelt" AND first_name="Theodore";
+-----------+------------+
| last_name | first_name |
+-----------+------------+
| Roosevelt | Theodore   |
+-----------+------------+

Now we know the proper WHERE clause to select the desired record, so the DELETE query can be constructed correctly:

mysql> DELETE FROM president
    -> WHERE last_name="Roosevelt" AND first_name="Theodore";
						

This seems like a lot of work to delete a record, doesn't it? Better safe than sorry! (This is the type of situation in which you'll want to minimize typing through the use of copy and paste or input line-editing techniques. See "Tips for Interacting With mysql" for more information.)

To modify existing records, use UPDATE, which has this form:

UPDATE tbl_name SET which columns to change
WHERE which records to update
						

The WHERE clause is just as for DELETE. It's optional, so if you don't specify one, every record in the table is updated. The following query changes the name of each of your students to "George":

mysql> UPDATE student SET name="George";
						

Obviously, you must be careful with queries like that.

Usually you'll be more specific with the records you're updating. Suppose you recently added a new member to the Historical League but filled in only a few columns of his entry:

mysql> INSERT member (last_name,first_name)
    -> VALUES('York','Jerome'),
						

Then you realize you forgot to set his membership expiration date. You can fix that as follows:

mysql> UPDATE member
    -> SET expiration='2001-7-20'
    -> WHERE last_name='York' AND first_name='Jerome';
						

You can update multiple columns at the same time. The following statement updates Jerome's email and postal addresses:

mysql> UPDATE member
    -> SET email='[email protected]',street='123 Elm St',city='Anytown',
    -> state='NY',zip='01003'
    -> WHERE last_name='York' AND first_name='Jerome';
						

You can also "unset" a column by setting its value to NULL (assuming the column allows NULL values). If at some point in the future Jerome later decides to pay the big membership renewal fee that allows him to become a lifetime member, you can mark his record that way by setting his expiration date to NULL ("never expires"):

mysql> UPDATE member
    -> SET expiration=NULL
    -> WHERE last_name='York' AND first_name='Jerome';
						

With UPDATE, just as for DELETE, it's not a bad idea to test a WHERE clause using a SELECT statement to make sure you're choosing the right records to update. If your selection criteria are too narrow or too broad, you'll update too few or too many records.

If you've tried the queries in this section, you'll have deleted and modified records in the samp_db tables. Before proceeding to the next section you should undo those changes. Do that by reloading the tables using the instructions at the end of the section "Adding New Records."

Altering the Structure of Tables

Remember how we created the Historical League member table without a membership number column so that I'd have an excuse to use the ALTER TABLE statement? It's time to do that. With ALTER TABLE, you can rename tables, add or drop columns, change column types, and more. The only example I'll show here is how to add a new column. See Chapter 3 for more details about what ALTER TABLE is capable of.

The primary consideration when adding a membership number column to the member table is that the values should be unique to avoid confusion between entries. An AUTO_INCREMENT column is useful here because then we can let MySQL generate unique numbers for us automatically when we add new members. In a CREATE TABLE statement, the specification for such a column would look like this:

member_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY

For ALTER TABLE, the syntax is similar. Run this query to add the column:

mysql> ALTER TABLE member
    -> ADD member_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
						

Now that we have a new column to hold membership numbers, how do we assign numbers to the existing records in the member table? That's easy—MySQL has already done it! When you add a column to a table, MySQL initializes column values with the default value. In the case of an AUTO_INCREMENT column, this causes a new sequence number to be generated for each row.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.27.45