Chapter 4. Working with Database Structures

This chapter shows you how to create your own databases, add and remove structures such as tables and indexes, and make choices about column types in your tables. It focuses on the syntax and features of SQL, and not the semantics of conceiving, specifying, and refining a database design; you’ll find an introductory description of database design techniques in Chapter 2. To work through this chapter, you need to understand how to work with an existing database and its tables, as discussed in Chapter 3.

This chapter lists the structures in the sample sakila database; detail on how to load the database is presented in Chapter 1. If you’ve followed those instructions, you’ll already have the database available and know how to restore the database after you’ve modified its structures.

When you finish this chapter, you’ll have all the basics required to create, modify, and delete database structures. Together with the techniques you learned in Chapter 3, you’ll have the skills to carry out a wide range of basic operations. Chapters [Link to Come], [Link to Come], and [Link to Come] cover skills that allow you to do more advanced operations with MySQL.

Creating and Using Databases

When you’ve finished designing a database, the first practical step to take with MySQL is to create it. You do this with the ((("CREATE DATABASE statement")))CREATE DATABASE statement. Suppose you want to create a database with the name lucy. Here’s the statement you’d type in the monitor:

mysql> CREATE DATABASE lucy;
Query OK, 1 row affected (0.10 sec)

We assume here that you know how to connect to and use the monitor, as described in [Link to Come]. We also assume that you’re able to connect as the root user or as another user who can create, delete, and modify structures (you’ll find a detailed discussion on user privileges in [Link to Come]). Note that when you create the database, MySQL says that one row was affected. This isn’t in fact a normal row in any specific database—but a new entry added to the list that you see with SHOW DATABASES.

Prior to 8.0, behind the scenes, MySQL created a new directory under the data directory for the new database and stored the text file db.opt that lists the database options; for example, the file might contain:

default-character-set=latin1
default-collation=latin1_swedish_ci

These particular two lines specify the default character set and collation of the new database. We’ll look at what these mean later, but you generally won’t need to know much about the db.opt file or access it directly.

MySQL 8.0 internalized this and other type of metadata files inside of the data dictionary, decreasing code complexity and performance overhead associated with a files-based approach. The most important feauter of the new approach for a DBA is the fact that DDL is now atomic: either going through or not. This wasn’t always true in earlier releases.

Once you’ve created the database, the next step is to use it—that is, choose it as the database you’re working with. You do this with the MySQL command:

mysql> USE lucy;
Database changed

As discussed previously in Chapter 3, this command must be entered on one line and need not be terminated with a semicolon, though we usually do so automatically through habit. Once you’ve used the database, you can start creating tables, indexes, and other structures using the steps discussed next in “Creating Tables”.

Before we move on to creating other structures, let’s discuss a few features and limitations of creating databases. First, let’s see what happens if you create a database that already exists:

mysql> CREATE DATABASE lucy;
ERROR 1007 (HY000): Can't create database lucy; database exists

You can avoid this error by adding the IF NOT EXISTS keyword phrase to the statement:

mysql> CREATE DATABASE IF NOT EXISTS lucy;
Query OK, 0 rows affected (0.00 sec)

You can see that MySQL didn’t complain, but it didn’t do anything either: the 0 rows affected message indicates that no data was changed. This addition is useful when you’re adding SQL statements to a script: it prevents the script from aborting on error.

Let’s discuss how to choose database names and the use of character case. Database names define physical directory (or folder) names on disk. On some operating systems, directory names are case-sensitive; on others, case doesn’t matter. For example, Unix-like systems such as Linux and Mac OS X are typically case-sensitive, while Windows isn’t. The result is that database names have the same restrictions: when case matters to the operating system, it matters to MySQL. For example, on a Linux machine, LUCY, lucy, and Lucy are different database names; on Windows, they refer to just one database. Using incorrect capitalization under Linux or Mac OS X will cause MySQL to complain:

mysql> select actOr.Actor_id from ACTor;
ERROR 1146 (42S02): Table sakila.ACTor doesn't exist

but under Windows, this will normally work. To make your SQL machine-independent, we recommend that you consistently use lowercase names for databases (and for tables, columns, aliases, and indexes).

There are other restrictions on database names. They can be at most 64 characters in length. You also shouldn’t use MySQL reserved words—such as SELECT, FROM, and USE—as names for structures; these can confuse the MySQL parser, making it impossible to interpret the meaning of your statements. There’s a way around this problem: you can enclose the reserved word with the backtick symbol () on either side, but it’s more trouble remembering to do so than it’s worth. In addition, you can’t use selected characters in the names: specifically, you can’t use the forward slash, backward slash, semicolon, and period characters, and a database name can’t end in whitespace. Again, the use of these characters confuses the MySQL parser and can result in unpredictable behavior. For example, here’s what happens when you insert a semicolon into a database name:

mysql> CREATE DATABASE IF NOT EXISTS lu;cy;
Query OK, 1 row affected (0.00 sec)

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the right syntax to use
near cy at line 1

Since more than one SQL statement can be on a single line, the result is that a database lu is created, and then an error is generated by the very short, unexpected SQL statement cy;.

Creating Tables

This section covers topics on table structures. We show you how to:

  • Create tables, through introductory examples

  • Choose names for tables and table-related structures

  • Understand and choose column types

  • Understand and choose keys and indexes

  • Use the proprietary MySQL AUTO_INCREMENT feature

When you finish this section, you’ll have completed all of the basic material on creating database structures; the remainder of this chapter covers the sample sakila database, and how to alter and remove existing structures.

Basics

For our examples in this section, we’ll assume that the database sakila hasn’t been created. If you want to follow the examples, and you have already loaded the database, you can drop it for this section and reload it later; dropping it removes the database, tables, and all of the data, but the original is easy to restore by following the steps in Chapter 1. Here’s how you drop it temporarily:

mysql> DROP DATABASE sakila;
Query OK, 4 rows affected (0.06 sec)

The DROP statement is discussed further at the end of this chapter in “Deleting Structures”.

To begin, create the database sakila using the statement:

mysql> CREATE DATABASE sakila;
Query OK, 1 row affected (0.00 sec)

Then select the database with:

mysql> USE sakila;
Database changed

We’re now ready to begin creating the tables that’ll hold our data. Let’s create a table to hold actor details. For now, we’re going to have a simplified structure, and talk about more complexity later. Here’s the statement that we use:

mysql> CREATE TABLE actor (
    -> actor_id SMALLINT UNSIGNED NOT NULL DEFAULT 0,
    -> first_name VARCHAR(45) DEFAULT NULL,
    -> last_name VARCHAR(45),
    -> last_update TIMESTAMP,
    -> PRIMARY KEY (actor_id)
    -> );

Query OK, 0 rows affected (0.01 sec)

Don’t panic: even though MySQL reports that zero rows were affected, it’s definitely created the table:

mysql> SHOW TABLES;
------------------
| Tables_in_sakila |
------------------
| actor            |
------------------
1 row in set (0.01 sec)

Let’s consider all this in detail. The CREATE TABLE statement has three major sections:

  1. The CREATE TABLE statement, which is followed by the table name to create. In this example, it’s actor.

  2. A list of one or more columns to add to the table. In this example, we’ve added quite a few: actor_id SMALLINT UNSIGNED NOT NULL DEFAULT 0, first_name VARCHAR(45) DEFAULT NULL, last_name VARCHAR(45), and last_update TIMESTAMP. We’ll discuss these in a moment.

  3. Optional key definitions. In this example, we’ve defined a single key: PRIMARY KEY (actor_id). We’ll discuss keys and indexes in detail later in this section.

Notice that the CREATE TABLE component is followed by an opening parenthesis that’s matched by a closing parenthesis at the end of the statement. Notice also that the other components are separated by commas. There are other elements that you can add to a CREATE TABLE statement, and we’ll discuss some in a moment.

Let’s discuss the column specifications. The basic syntax is as follows: __name type__ [NOT NULL | NULL] [DEFAULT __value__]. The `name` field is the column name, and it has the same limitations as database names, as discussed in the previous section. It can be at most 64 characters in length, backward and forward slashes aren’t allowed, periods aren’t allowed, it can’t end in whitespace, and case sensitivity is dependent on the underlying operating system. The `type` defines how and what is stored in the column; for example, we’ve seen that it can be set to VARCHAR for strings, SMALLINT for numbers, or TIMESTAMP for a date and time.

If you specify NOT NULL, a row isn’t valid without a value for the column; if you specify NULL or omit the clause, a row can exist without a value for the column. If you specify a `value` with the DEFAULT clause, it’ll be used to populate the column when you don’t otherwise provide data; this is particularly useful when you frequently reuse a default value such as a country name. The `value` must be a constant (such as 0, "cat", or 20060812045623), except if the column is of the type TIMESTAMP. Types are discussed in detail later in this section.

The NOT NULL and DEFAULT features can be used together. If you specify NOT NULL and add a DEFAULT value, the default is used when you don’t provide a value for the column. Sometimes, this works fine:

mysql> INSERT INTO actor(first_name) values ("John");
Query OK, 1 row affected (0.01 sec)

And sometimes it doesn’t:

mysql> INSERT INTO actor(first_name) values ("Elisabeth");
ERROR 1062 (23000): Duplicate entry 0 for key actor.PRIMARY

Whether it works or not is dependent on the underlying constraints and conditions of the database: in this example, actor_id has a default value of 0, but it’s also the primary key. Having two rows with the same primary-key value isn’t permitted, and so the second attempt to insert a row with no values (and a resulting primary-key value of 0) fails. We discuss primary keys in detail later in this section.

Column names have fewer restrictions than database and table names. What’s more, they’re not dependent on the operating system: the names are case-insensitive and portable across all platforms. All characters are allowed in column names, though if you want terminate them with whitespace or include periods (or other special characters such as the semicolon), you’ll need to enclose the name with a backtick symbol (```) on either side. We recommend that you consistently choose lowercase names for developer-driven choices (such as database, alias, and table names) and avoid characters that require you to remember to use backticks. Naming the columns, as well as other database objects is something of a personal preference when starting anew, or a matter of following standards when working on an existing codebase. We recommend avoiding repeating yourself. Column name actor_first_name is going to look redundant when table name precedes it (e.g. in a complex join query): actor.actor_first_name or actor.first_name. Usually, one exception is done to that: the ubiquitous id column name should either not be used or have the table name prepended for clarity. You may use sakila, world, or employee example databases to get an inspiration. Another good practice is to use the underscore character to separate words; you could use underscores or dashes, or omit the word-separating formatting altogether. However, “CamelCase” is harder to read. As with database and table names, the longest column name is 64 characters in length.

Collation and Character Sets

Because not everyone wants to store English strings, it’s important that a database server be able to manage non-English characters and different ways of sorting characters. When you’re comparing or sorting strings, how MySQL evaluates the result depends on the character set and collation used. Character sets define what characters can be stored; for example, you may need to store non-English characters such as ٱ or ü. A collation defines how strings are ordered, and there are different collations for different languages: for example, the position of the character ü in the alphabet is different in two German orderings, and different again in Swedish and Finnish.

In our previous string-comparison examples, we ignored the collation and character-set issue, and just let MySQL use its defaults; in versions of MySQL prior to 8.0, the default character set is latin1, and the default collation is latin1_swedish_ci. MySQL 8.0 changed the defaults, and now the default character set is utf8mb4, and the default collation is utf8mb4_0900_ai_ci. MySQL can be configured to use different character sets and collation orders at the connection, database, table, and column levels. Outputs below come from MySQL 8.0.

You can list the character sets available on your server with the SHOW CHARACTER SET command. This shows a short description for each character set, its default collation, and the maximum number of bytes used for each character in that character set:

mysql> SHOW CHARACTER SET;
------------------------------------------------------------------------+
| Charset  | Description                     | Default collation   | Maxlen |
------------------------------------------------------------------------+
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| binary   | Binary pseudo charset           | binary              |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
------------------------------------------------------------------------+
41 rows in set (0.00 sec)

For example, the latin1 character set is actually the Windows code page 1252 that supports West European languages. The default collation for this character set is latin1_swedish_ci, which follows Swedish conventions to sort accented characters (English is handled as you’d expect). This collation is case-insensitive, as indicated by the letters ci. Finally, each character takes up one byte. By comparison, if you use the default utf8mb4 character set, each character would take up to four bytes of storage. Sometimes, it makes sense to change that default. For example, there’s no reason to store base64-encoded data (which, by definition, is ASCII) in utf8mb4. With a 128 character wide column, at million rows you’re looking at approximately 350MiB of overhead on charset alone in the worst case.

Similarly, you can list the collation orders and the character sets they apply to:

mysql> SHOW COLLATION;
-------------------------------------------------------------------------------
| Collation           | Charset  | Id  | Default | Compiled | Sortlen | Pad_attribute |
-------------------------------------------------------------------------------
| armscii8_bin        | armscii8 |  64 |         | Yes      |       1 | PAD SPACE     |
| armscii8_general_ci | armscii8 |  32 | Yes     | Yes      |       1 | PAD SPACE     |
| ascii_bin           | ascii    |  65 |         | Yes      |       1 | PAD SPACE     |
| ascii_general_ci    | ascii    |  11 | Yes     | Yes      |       1 | PAD SPACE     |
...
| greek_bin           | greek    |  70 |         | Yes      |       1 | PAD SPACE     |
| greek_general_ci    | greek    |  25 | Yes     | Yes      |       1 | PAD SPACE     |
| hebrew_bin          | hebrew   |  71 |         | Yes      |       1 | PAD SPACE     |
| hebrew_general_ci   | hebrew   |  16 | Yes     | Yes      |       1 | PAD SPACE     |
...
| utf8_unicode_ci     | utf8     | 192 |         | Yes      |       8 | PAD SPACE     |
| utf8_vietnamese_ci  | utf8     | 215 |         | Yes      |       8 | PAD SPACE     |
-------------------------------------------------------------------------------
272 rows in set (0.02 sec)

You can see the current defaults on your server as follows:

mysql> SHOW VARIABLES LIKE c%;
----------------------------------------------------------+
| Variable_name            | Value                          |
----------------------------------------------------------+
...
| character_set_client     | utf8mb4                        |
| character_set_connection | utf8mb4                        |
| character_set_database   | utf8mb4                        |
| character_set_filesystem | binary                         |
| character_set_results    | utf8mb4                        |
| character_set_server     | utf8mb4                        |
| character_set_system     | utf8                           |
| character_sets_dir       | /usr/share/mysql-8.0/charsets/ |
...
| collation_connection     | utf8mb4_0900_ai_ci             |
| collation_database       | utf8mb4_0900_ai_ci             |
| collation_server         | utf8mb4_0900_ai_ci             |
...
----------------------------------------------------------+
21 rows in set (0.00 sec)

When you’re creating a database, you can set the default character set and sort order for the database and its tables. For example, if you want to use the utf8mb4 character set and the utf8mb4_ru_0900_as_cs (case-sensitive) collation order, you would write:

mysql> CREATE DATABASE rose DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_ru_0900_as_cs;
Query OK, 1 row affected (0.00 sec)

Usually, there’s no need to do this if you’ve installed your MySQL correctly for your language and region, and if you’re not planning on internationalizing your application. With utf8mb4 being the default since MySQL 8.0, there’s even less need to change the charset. You can also control the character set and collation for individual tables or columns, but we won’t go into the detail of how to do that here.

Other Features

This section briefly describes other features of the MySQL CREATE TABLE statement. It includes an example using the IF NOT EXISTS feature, and a list of advanced features and where to find more about them in this book. The statement shown is the full representation of the table taken from sakila database, unlike a previous simplified example.

You can use the IF NOT EXISTS keyword phrase when creating a table, and it works much as it does for databases. Here’s an example that won’t report an error even when the actor table exists:

mysql> CREATE TABLE IF NOT EXISTS actor (
    -> actor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    -> first_name VARCHAR(45) NOT NULL,
    -> last_name VARCHAR(45) NOT NULL,
    -> last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    -> PRIMARY KEY  (actor_id),
    -> KEY idx_actor_last_name (last_name)
    -> );
)
Query OK, 0 rows affected, 1 warning (0.00 sec)

You can notice that 0 rows are affected, and a warning is reported. Let’s take a look:

mysql> SHOW WARNINGS;
-------------------------------------------
| Level | Code | Message                      |
-------------------------------------------
| Note  | 1050 | Table actor already exists |
-------------------------------------------
1 row in set (0.01 sec)

There are a wide range of additional features you can add to a CREATE TABLE statement, only few of which are present in this more full statement. Many of these are advanced and aren’t discussed in this book, but you can find more information in the MySQL manual under the heading “CREATE TABLE Statement.” These additional features include:

The AUTO_INCREMENT feature for numeric columns

This feature allows you to automatically create unique identifiers for a table. We discuss it in detail later in this chapter in “The AUTO_INCREMENT Feature”.

Column comments

You can add a comment to a column; this is displayed when you use the SHOW CREATE TABLE command that we discuss later in this section.

Foreign key constraints

You can tell MySQL to check whether data in one or more columns matches data in another table. For example, sakila database has a foreign key constraint on a city_id column of the address table, referring to the city table’s city_id column. That means, it’s impossible to have an address in a city not present in the city table. We’ll take a deeper look at foreign key constraints in [Link to Come]. Not every storage engine in MySQL supports foreign keys.

Creating temporary tables

If you create a table using the keyword phrase CREATE TEMPORARY TABLE, it’ll be removed (dropped) when the monitor connection is closed. This is useful for copying and reformatting data because you don’t have to remember to clean up. Sometimes, they are also used as an optimization to hold some intermediate data.

Advanced table options

You can control a wide range of features of the table using table options. These include the starting value of AUTO_INCREMENT, the way indexes and rows are stored, and options to override the information that the MySQL query optimizer gathers from the table. It’s also possible to specify generated columns, containing data like sum of two other columns, as well as indexes on such columns.

Control over index structures

Some storage engines in MySQL allow you to specify and control what type of internal structure—such as a B-tree or hash table—MySQL uses for its indexes. You can also tell MySQL that you want a full text or spatial data index on a column, allowing special types of search.

Partitioning

MySQL supports different partitioning strategies, which you can select at the table creation time, as well as at a later time. We will cover partitioning later in [Link to Come].

You can check the CREATE TABLE statement for a table using the SHOW CREATE TABLE statement introduced in Chapter 3. This often shows you output that includes some of the advanced features we’ve just discussed; the output rarely matches what you actually typed to create the table. Here’s an example for the actor table:

mysql> SHOW CREATE TABLE actorG

       Table: actor
Create Table: CREATE TABLE `actor` (
  `actor_id` smallint unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT
        CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
    COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

We’ve reformatted the output slightly to fit better in this book. You’ll notice that the output includes content added by MySQL that wasn’t in our original CREATE TABLE statement:

  • The names of the table and columns are enclosed in backticks. This isn’t necessary, but it does avoid any parsing problems that can occur through using reserved words and special characters, as discussed previously

  • An additional default ENGINE clause is included, which explicitly states the table type that should be used. The setting in a default installation of MySQL is InnoDB, so it has no effect in this example

  • An additional DEFAULT CHARSET=utf8mb4 clause is included, which tells MySQL what character set is used by the columns in the table. Again, this has no effect in a default installation

Column Types

This section describes the column types you can use in MySQL. It explains when each should be used and any limitations it has. The types are grouped by their purpose. We’ll cover widely-used datatypes, and mention more advanced or less used types in passing. That doesn’t mean they have no use, but consider learning about them as an exercise. Most likely, you will not remember each of the data types and its particular intricacies, and that’s okay. It’s worth re-reading this chapter later, and consulting with MySQL documentation on the topic to keep your knowledge up to date.

Integer types

We will start with numeric data types, and more specifically with integer types, or the types holding specific whole number. First, the two most popular integer types.

INT[(__width__)] [UNSIGNED] [ZEROFILL]

The most commonly used numeric type. Stores integer (whole number) values in the range –2,147,483,648 to 2,147,483,647. If the optional UNSIGNED keyword is added, the range is 0 to 4,294,967,295. The keyword INT is short for INTEGER, and they can be used interchangeably. An INT column requires four bytes of storage space.

INT, as well as other integer types, has two properties specific to MySQL: optional `width` and ZEROFILL arguments. They are not part of an SQL standard, and as of MySQL 8.0 are deprecated. Still, you will surely notice them in a lot of codebases, so we will briefly cover both of them.

The `width` parameter specifies the display width, which can be read by applications as part of the column metadata. Contrary to parameters in a similar position for other data types, this parameter has no effect on the storage characteristics of a particular integer type, and does not constrain the usable range of values. INT(4) and INT(32) are same for the purpose of data storage.

ZEROFILL is an additional argument, which is used to left-pad the values with zeros up to the length, specified by the `width`. If you use ZEROFILL, MySQL automatically adds UNSIGNED to the declaration (since zero filling makes sense only in the context of positive numbers).

In a few applications where ZEROFILL and `width` are useful, LPAD() function can be used, or numbers can be stored formatted in CHAR column.

BIGINT[(__width__)] [UNSIGNED] [ZEROFILL]

In the world of growing data sizes, having tables with count of rows in the billions is getting common. Even simple id-type columns might need a wider range than a regular INT provides. BIGINT solves that problem. It is a large integer type with a signed range of -9223372036854775808 to 9223372036854775807. Unsigned BIGINT can store numbers from 0 to 18446744073709551615. Column of this type will require eight bytes of storage.

Internally, all calculations within MySQL are done using signed BIGINT or DOUBLE values. The important consequence of that is that you should be extremely careful when dealing with extremely large numbers. First, unsigned big integers larger than 9223372036854775807 should only be used with bit functions. Second, if a result of a arithmetical operation is larger than 9223372036854775807, unexpected results might be observed.

For example:

mysql> CREATE TABLE test_bigint (id BIGINT UNSIGNED);
mysql> INSERT INTO test_bigint VALUES (18446744073709551615);
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO test_bigint VALUES (18446744073709551615-1);
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO test_bigint VALUES (184467440737095516*100);
ERROR 1690 (22003): BIGINT value is out of range in (184467440737095516 * 100)

Even though 18446744073709551600 is less than 18446744073709551615, since signed BIGINT is used for multiplication internally, the out of range error is observed.

Data type SERIAL can be used as an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE. Unless you must optimize for data size and performance, consider using SERIAL for your id-like columns. Even UNSIGNED INT runs out of range much quicker than you’d expect, and in the worst possible time.

We will only touch other integer types briefly. However, you should keep in mind, that while it’s possible to store every integer as BIGINT, that’s wasteful in terms of storage space. Moreover, since as we discussed the `width` parameter doesn’t constrain the range of values, different integer types can be used for that purpose.

SMALLINT[(__width__)] [UNSIGNED] [ZEROFILL]

As expected from its name, this type holds a small integer, with range from -32768 to 32767 signed, and from 0 to 65535 unsigned. It takes two bytes of storage.

TINYINT[(__width__)] [UNSIGNED] [ZEROFILL]

Even smaller integer, and the smallest numeric data type. Range of this type is -128 to 127 signed and 0 to 255 unsigned. It only takes one byte of storage.

BOOL[(__width__)]

Short for BOOLEAN, and a synonym for TINYINT(1). Usually, boolean types only accept two values: true or false. However, since BOOL in MySQL is an integer type, you can store values from -128 to 127 there. 0 will be treated as false, and all nonzero values as true. It’s also possible to use special true and false aliases for 1 and 0 respectively.

mysql> CREATE TABLE test_bool (i BOOL);
Query OK, 0 rows affected (0.04 sec)
mysql> INSERT INTO test_bool VALUES (true),(false);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql> INSERT INTO test_bool VALUES (1),(0),(-128),(127);
Query OK, 4 rows affected (0.02 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql> SELECT i, IF(i,true,false) FROM test_bool;
----------------------------+
| i    | IF(i,true,false) |
----------------------------+
|    1 | true                 |
|    0 | false                |
|    1 | true                 |
|    0 | false                |
| -128 | true                 |
|  127 | true                 |
----------------------------+
6 rows in set (0.01 sec)
MEDIUMINT[(__width__)] [UNSIGNED] [ZEROFILL]

Another type of integer. Takes 3 bytes of storage space. Stores values in the signed range of -8388608 to 8388607, and unsigned range of 0 to 16777215.

BIT[(__M__)]

This is a special type used to store bit values. `M` specifies number of bits per value and defaults to 1 if omitted. MySQL uses a b'__value__' syntax for the binary values.

Fixed-point types

Both DECIMAL and NUMERIC data types in MySQL are the same. Thus we will only describe DECIMAL here, and everything will apply to NUMERIC. The main difference between fixed-point types (or type, in MySQL case) and floating-point types is precision. For fixed-point types, the value retrieved is identical to the value stored; this isn’t always the case with other types that contain decimal points, such as the FLOAT and DOUBLE types described later. That is the most important property of the DECIMAL data type.

DECIMAL[(__width__[,__decimals__])] [UNSIGNED] [ZEROFILL]

A commonly used numeric type. Stores a fixed-point number such as a salary or distance, with a total of `width` digits of which some smaller number are `decimals` that follow a decimal point. The maximum value of `width` is 255. For example, a column declared as price DECIMAL(6,2) should be used to store values in the range –9999.99 to 9999.99. price DECIMAL(10,4) would allow values like 123456.1234.

Prior to MySQL version 5.7, if you tried to store a value that’s outside this range, it would be stored as the closest value in the allowed range. For example, 100 would be stored as 99.99, and –100 would be stored as –99.99. Starting with 5.7, however, special SQL mode is set, which prohibits this, and other unsafe behaviors: STRICT_TRANS_TABLES. Using old behavior is possible, but could result in a data loss.

SQL modes are special settings that control behavior of MySQL when it comes to queries. For example, as you can see above, they can restrict “unsafe” behavior. Other modes might affect how queries are interpreted. We will discuss SQL modes in a little more depth in [Link to Come].

The `width` is optional, and a value of 10 is assumed when this is omitted. The number of `decimals` is optional and, when omitted, a value of 0 is assumed; the maximum value of `decimals` should be two less than the value of `width`. The maximum value of `width` is 65, and `decimals` is 30.

If you’re storing only positive values, you can use the UNSIGNED keyword as described for INT. If you want zero padding, use the ZEROFILL keyword for the same behavior as described for INT. The keyword DECIMAL has three identical, interchangeable alternatives: DEC, NUMERIC, and FIXED.

Values in DECIMAL column are stored using a binary format. This format uses four bytes for every nine digits.

Floating-point types

In “Fixed-point types”, we discussed the fixed-point DECIMAL type. There are two other types that support decimal points: DOUBLE (also known as REAL) and FLOAT. They’re designed to store approximate numeric values rather than the exact values stored by DECIMAL.

Why would you want approximate values? The answer is that many numbers with a decimal point are approximations of real quantities. For example, suppose you earn $50,000 per annum and you want to store it as a monthly wage. When you convert it to a per-month amount, it’s $4,166 plus 66 and 2/3rds cents. If you store this as $4,166.67, it’s not exact enough to convert to a yearly wage (since 12 multiplied by $4,166.67 is $50,000.04). However, if you store 2/3rds with enough decimal places, it’s a closer approximation. You’ll find that it is accurate enough to correctly multiply to obtain the original value in a high-precision environment such as MySQL, using only a bit of rounding. That’s where DOUBLE and FLOAT are useful: they let you store values such as 2/3rds or pi with a large number of decimal places, allowing accurate approximate representations of exact quantities. You can later use ROUND() function to restore results to a given precision.

Let’s continue the previous example using DOUBLE. Suppose you create a table as follows:

mysql> CREATE TABLE wage (monthly DOUBLE);
Query OK, 0 rows affected (0.09 sec)

You can now insert the monthly wage using:

mysql> INSERT INTO wage VALUES (50000/12);
Query OK, 1 row affected (0.00 sec)

And see what’s stored:

mysql> SELECT * FROM wage;
----------------
| monthly        |
----------------
| 4166.666666666 |
----------------
1 row in set (0.00 sec)

However, when you multiply it to a yearly value, you get a high precision approximation:

mysql> SELECT monthly*12 FROM wage;
--------------------
| monthly*12         |
--------------------
| 49999.999999992004 |
--------------------
1 row in set (0.00 sec)

To get the original value back, you still need to perform a rounding with a desired precision. For example, your business might require precision to five decimal places. In this case, you could restore the original:

mysql> SELECT ROUND(monthly*12,5) FROM wage;
---------------------
| ROUND(monthly*12,5) |
---------------------
|         50000.00000 |
---------------------
1 row in set (0.00 sec)

But precision to eight decimal places would not result in the original value:

mysql> SELECT ROUND(monthly*12,8) FROM wage;
---------------------
| ROUND(monthly*12,8) |
---------------------
|      49999.99999999 |
---------------------
1 row in set (0.00 sec)

It’s important to understand the imprecise and approximate value of floating-point data types.

Here are the details of the DOUBLE and FLOAT types:

FLOAT[(width, decimals)] [UNSIGNED] [ZEROFILL]`or`FLOAT[(precision)] [UNSIGNED] [ZEROFILL]

Stores floating-point numbers. It has two optional syntaxes: the first allows an optional number of `decimals` and an optional display `width`, and the second allows an optional `precision` that controls the accuracy of the approximation measured in bits. Without parameters, the type stores small, four-byte, single-precision floating-point values; usually, you use it without providing any parameters. When `precision` is between 0 and 24, the default behavior occurs. When `precision` is between 25 and 53, the type behaves as for DOUBLE. The `width` has no effect on what is stored, only on what is displayed. The UNSIGNED and ZEROFILL options behave as for INT.

DOUBLE[(__width__, __decimals__)] [UNSIGNED] [ZEROFILL]

Stores floating-point numbers. It has one optional syntax: it allows an optional number of `decimals` and an optional display `width`. Without parameters, the type stores normal, eight-byte, double-precision floating point values; usually, you use it without providing any parameters. The `width` has no effect on what is stored, only on what is displayed. The UNSIGNED and ZEROFILL options behave as for INT. The DOUBLE type has two identical synonyms: REAL and DOUBLE PRECISION.

String types

String data types are used to store text, and, less obviously, binary data. In MySQL, the string data types are CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, and SET.

[NATIONAL] VARCHAR(__width__) [CHARACTER SET __charset_name__] [COLLATE __collation_name__]

Probably, the single most commonly used string type. VARCHAR stores variable-length strings up to a maximum `width`. The maximum value of `width` is 65,535 characters. Most of the information applicable to this type will apply to other string types as well.

CHAR and VARCHAR types are very similar, but have a few important distinctions. VARCHAR incurs one or two extra bytes of overhead to store the value of the string, depending on whether the value is smaller than or larger than 255 bytes. Note that this is different from string length in characters, as certain character might require up to 4 bytes of space. It might seem obvious then, that VARCHAR is less efficient. However, that is not always true. As VARCHAR stores strings of arbitrary length (up to the `width`), shorter strings stored will require less storage than CHAR of similar `width`.

Another difference betwee CHAR and VARCHAR is their handling of trailing spaces. VARCHAR retains trailing spaces up to the specified column length, and will truncate the excess, producing a warning. As will be shown later, CHAR values are right-padded to the column length, and the trailing spaces aren’t preserved. For VARCHAR, trailing spaces are significant, unless they are trimmed, and will count as unique values. ' ' and ' ' are different values.

TEXT and CHAR exhibit the same behavior. You can use BLOB type described below, which will accept the above statement.

Both data types disallow storage of values longer than `width`, unless strict SQL mode is disabled. We don’t recommend enabling legacy behavior, as it might result in an unaccounted data loss.

Sorting and comparison of VARCHAR, CHAR, and TEXT types happens according to the collation of the character set assigned. You can see that it is possible to specify character set, as well as collation for each individual string type column. It’s also possible to specify binary character set, which effectively converts VARCHAR into VARBINARY. Don’t mistake binary charset for a BINARY attribute for a charset. The latter is a MySQL-only shorthand to specify a binary (_bin) collation.

NATIONAL attribute is a standard SQL way to specify that a string type column must use a predefined character set. MySQL uses utf8 as such charset. It’s important to note that MySQL versions 5.7 and 8.0 disagree on what is utf8 exactly: former using it as an alias for utf8mb3, and latter — for utf8mb4. Thus, it is best to not use the NATIONAL attribute, as well as ambiguous aliases. The best practice with any text-related columns and data is to be as unambiguous and specific as possible.

[NATIONAL] CHAR[(__width__)] [CHARACTER SET __charset_name__] [COLLATE __collation_name__]

CHAR stores a fixed-length string (such as a name, address, or city) of length `width`. If a `width` is not provided, CHAR(1) is assumed. The maximum value of `width` is 255. As we discussed in the VARCHAR section above, values in CHAR columns are always stored at the specified length. Single letter stored in a CHAR(255) column will take 255 bytes (in latin1 charset), and will be padded with spaces. The padding is removed when reading the data, unless PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled. Worth mentioning again that it means that strings stored in CHAR will lose all of their trailing spaces.

Note that earlier, `width` of a CHAR column was often associated with bytes. That’s not always the case now, and it’s definitely not the case by default. Multi-byte character sets, such as default utf8mb4 can result in a much larger values. InnoDB will actually encode fixed-length columns as variable-length columns, if their maximum size exceeds 768 bytes. Thus, in MySQL 8, by default, InnoDB will store CHAR(255) as it would VARCHAR. Let’s see a small example:

mysql> CREATE TABLE test_char_length(
    -> *  utf8char CHAR(10) CHARACTER SET utf8mb4*
    -> , asciichar CHAR(10) CHARACTER SET binary
    -> );
Query OK, 0 rows affected (0.04 sec)
mysql> INSERT INTO test_char_length VALUES (Plain text, Plain text);
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO test_char_length VALUES (的開源軟體, Plain text);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT LENGTH(utf8char), LENGTH(asciichar) FROM test_char_length;
-------------------------------------+
| LENGTH(utf8char) | LENGTH(asciichar) |
-------------------------------------+
|               10 |                10 |
|               15 |                10 |
-------------------------------------+
2 rows in set (0.00 sec)

As the values are left-aligned and right-padded with spaces, and any trailing spaces aren’t considered for CHAR at all, it’s impossible to compare strings consisting of spaces alone. If you find yourself in a situation when that’s important, VARCHAR is the data type to use.

Plain BINARY[(__width__)] and VARBINARY(__width__)

These types are very similar to CHAR and VARCHAR but store binary strings. Binary strings have the special binary character set and collaction, and sorting them is dependent on the numeric values of the bytes in values stored. Instead of character strings, byte strings are stored. Remember that in VARCHAR we described the binary charset and BINARY attribute. Only the binary charset “converts” VARCHAR or CHAR into a respective BINARY form. BINARY attribute to a charset will not change the fact that character strings are stored. Unlike VARCHAR and CHAR, `width` here is exactly the number of bytes. When `width` is omitted for BINARY, it defaults to 1.

Similar to CHAR, data in the BINARY column is padded on the right. However, that being a binary data, it’s padded using zero bytes, usually written as 0x00 or . BINARY treats spaces as a significant character, not padding. If you need to store data which might end in zero bytes which are significant to you, VARBINARY or BLOB types should be used. That is similar to CHAR and trailing spaces.

It is important to keep the concept of binary string in mind when working with both of these data types. Even though they’ll accept strings, they aren’t a synonym for data types using text strings. For example, you cannot change the case of the letters stored, as that concept doesn’t really apply to binary data. That becomes quite clear when you consider the actual data stored. Let’s see an example:

mysql> CREATE TABLE test_binary_data (
    -> *  d1 BINARY(16)*
    -> , d2 VARBINARY(16)
    -> , d3 CHAR(16)
    -> , d4 VARCHAR(16)
    -> );
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO test_binary_data VALUES (
    -> *  something*
    -> , something
    -> , something
    -> , something
    -> );
Query OK, 1 row affected (0.00 sec)
mysql> *SELECT d1, d2, d3, d4 FROM test_binary_data;
--------------------------------------------------------------------------------+
| d1                                 | d2                   | d3        | d4        |
--------------------------------------------------------------------------------+
| 0x736F6D657468696E6700000000000000 | 0x736F6D657468696E67 | something | something |
--------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT UPPER(d2), UPPER(d4) FROM test_binary_data;
---------------------------------+
| UPPER(d2)            | UPPER(d4) |
---------------------------------+
| 0x736F6D657468696E67 | SOMETHING |
---------------------------------+
1 row in set (0.00 sec)

Note how MySQL Monitor actually shows binary types in hex format. We believe that this is much better than silent conversions that were performed prior to MySQL 8.0, which might’ve resulted in misunderstanding. To get the actual text data back, you have to explicitly cast the binary data to text:

mysql> SELECT CAST(d1 AS CHAR) d1t, CAST(d2 AS CHAR) d2t FROM test_binary_data;
-----------------------------+
| d1t              | d2t       |
-----------------------------+
| something        | something |
-----------------------------+
1 row in set (0.00 sec)

You can also notice that BINARY padding was converted to spaces when converting.

BLOB[(__width__)] and TEXT[(__width__)] [CHARACTER SET __charset_name__] [COLLATE __collation_name__]

BLOB and TEXT are commonly used data types for storing large data. You may think of BLOB as a VARBINARY holding as many data as you like, and similarly of TEXT for VARCHAR. The BLOB and TEXT types itself can store up to 65,535 bytes or characters respectively. As usual, note that multi-byte charsets do exist. `width` attribute is optional, and when it is specified, MySQL actually will change BLOB or TEXT data type to whatever is the smallest type capable of holding that amount of data. For example, BLOB(128) will result in TINYBLOB being used:

mysql> CREATE TABLE test_blob(data BLOB(128));
Query OK, 0 rows affected (0.07 sec)
mysql> DESC test_blob;
--------------------------------------------+
| Field | Type     | Null | Key | Default | Extra |
--------------------------------------------+
| data  | tinyblob | YES  |     | NULL    |       |
--------------------------------------------+
1 row in set (0.00 sec)

For BLOB type and related types, data is treated exactly as it would be in the case of VARBINARY. That is, no character set is assumed, and comparison and sorting is based on the numeric values of actual bytes stored. For TEXT, you may specify exact desired charset and collation. For both types and their variants, no padding is performed on insert, and no trimming is performed on select, making them ideal for storing data exactly as it is. In addition, a DEFAULT clause is not permitted, and you must take a prefix of the value when using it in an index (this is discussed in the next section).

One difference between BLOB and TEXT is their handling of trailing spaces. Even though they aren’t trimmed, trailing spaces don’t count towards record’s uniqueness in the TEXT type. Thus, if in your system values a and ‘a ' must be two separate and unique records, a BLOB type must be used.

TINYBLOB and TINYTEXT [CHARACTER SET __charset_name__] [COLLATE __collation_name__]

Identical to BLOB and TEXT, respectively, except that a maximum of 255 bytes or characters can be stored.

MEDIUMBLOB and MEDIUMTEXT [CHARACTER SET __charset_name__] [COLLATE __collation_name__]

Identical to BLOB and TEXT, respectively, except that a maximum of 16,777,215 bytes or characters can be stored. Types LONG and LONG VARCHAR map to MEDIUMTEXT data type for compatibility.

LONGBLOB and LONGTEXT [CHARACTER SET __charset_name__] [COLLATE __collation_name__]

Identical to BLOB and TEXT, respectively, except that a maximum of four gigabytes of data can be stored. Note that this is a hard limit even in case of LONGTEXT, and thus number of characters in multi-byte charsets can be less than 4,294,967,295. The effective maximum size of the data that can be stored by a client will be limited by the amount of available memory as well as the value of the max_packet_size variable, which defaults to 64MiB.

ENUM(–0—__value1__–1—[,–2—__value2__–3—[, ...]]) [CHARACTER SET __charset_name__] [COLLATE __collation_name__]

A list, or enumeration of string values. A column of type ENUM can be set to a value from the list `value1`, `value2`, and so on, up to a maximum of 65,535 different values. While the values are stored and retrieved as strings, what’s stored in the database is an integer representation. The enumerated column can contain NULL (stored as NULL), the empty string –0— (stored as 0), or any of the valid elements (stored as 1, 2, 3, and so on). You can prevent NULL values from being accepted by declaring the column as NOT NULL when creating the table.

This type is a compact way of storing values from a list of predefined values, such as state or country names. Consider this example using fruit names; the name can be any one of the predefined values Apple, Orange, or Pear (in addition to NULL and the empty string):

mysql> CREATE TABLE fruits_enum ( fruit_name ENUM(Apple, Orange, Pear) );
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO fruits_enum VALUES (Apple);
Query OK, 1 row affected (0.00 sec)

If you try inserting a value that’s not in the list, MySQL produces an error to tell you that it didn’t store the data you asked:

mysql> INSERT INTO fruits_enum VALUES (Banana);
ERROR 1265 (01000): Data truncated for column fruit_name at row 1

Similarly, a list of several allowed values isn’t accepted either:

mysql> INSERT INTO fruits_enum VALUES (Apple,Orange);
ERROR 1265 (01000): Data truncated for column fruit_name at row 1

Displaying the contents of the table, you can see that no wrong values were stored:

mysql> SELECT * FROM fruits_enum;
------------
| fruit_name |
------------
| Apple      |
------------
1 row in set (0.00 sec)

Earlier, MySQL produced a warning instead of error, and stored empty string in place of a wrong value. That behavior can be used by unsetting the default strict SQL mode. It’s also possible specify a default value other than the empty string:

mysql> CREATE TABLE new_fruits_enum ( fruit_name ENUM(Apple, Orange, Pear)
    -> DEFAULT Pear);
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO new_fruits_enum VALUES();
Query OK, 1 row affected (0.02 sec)

mysql> SELECT * FROM new_fruits_enum;
------------
| fruit_name |
------------
| Pear       |
------------
1 row in set (0.00 sec)

Here, not specifying a value results in the default value Pear being stored.

SET(–0—__value1__–1—[,–2—__value2__–3—[, ...]]) [CHARACTER SET __charset_name__] [COLLATE __collation_name__]

A set of string values. A column of type SET can be set to zero or more values from the list `value1`, `value2`, and so on, up to a maximum of 64 different values. While the values are strings, what’s stored in the database is an integer representation. SET differs from ENUM in that each row can store only one ENUM value in a column, but can store multiple SET values. This type is useful for storing a selection of choices from a list, such as user preferences. Consider this example using fruit names; the name can be any combination of the predefined values:

mysql> CREATE TABLE fruits_set ( fruit_name SET(Apple, Orange, Pear) );
Query OK, 0 rows affected (0.08 sec)

mysql> INSERT INTO fruits_set VALUES (Apple);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO fruits_set VALUES (Banana);
ERROR 1265 (01000): Data truncated for column fruit_name at row 1

mysql> INSERT INTO fruits_set VALUES (Apple,Orange);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM fruits_set;
--------------
| fruit_name   |
--------------
| Apple        |
| Apple,Orange |
--------------
2 rows in set (0.00 sec)

Again, note that we can store multiple values from the set in a single field, and that an empty string is stored for invalid input.

As with numeric types, we recommend that you always choose the smallest possible type to store values. For example, if you’re storing a city name, use CHAR or VARCHAR, rather than, say, the TEXT type. Having shorter columns helps keep your table size down, which in turns helps performance when the server has to search through a table.

Using a fixed size with the CHAR type is often faster than using a variable size with VARCHAR, since the MySQL server knows where each row starts and ends, and can quickly skip over rows to find the one it needs. However, with fixed-length fields, any space that you don’t use is wasted. For example, if you allow up to 40 characters in a city name, then CHAR(40) will always use up 40 characters, no matter how long the city name actually is. If you declare the city name to be VARCHAR(40), then you’ll use up only as much space as you need, plus one byte to store the name length. If the average city name is 10 characters long, this means that using a variable length field will take up 29 fewer bytes per entry; this can make a big difference if you’re storing millions of addresses.

In general, if storage space is at a premium or you expect large variations in the length of strings that are to be stored, use a variable-length field; if performance is a priority, use a fixed length.

Date and time types

These types serve the purpose of storing particular timestamps, dates, or time ranges. Particular care should be taken when dealing with timezones. We will try to explain details, but it’s worth re-reading this section and documentation later when you’ll need to actually work with timezones.

DATE

Stores and displays a date in the format `YYYY-MM-DD` for the range 1000-01-01 to 9999-12-31. Dates must always be input as year, month, and day triples, but the format of the input can vary, as shown in the following examples:

YYYY-MM-DD or YY-MM-DD;

It’s optional whether you provide two-digit or four-digit years. We strongly recommend that you use the four-digit version to avoid confusion about the century. In practice, if you use the two-digit version, you’ll find that 70 to 99 are interpreted as 1970 to 1999, and 00 to 69 are interpreted as 2000 to 2069.

YYYY/MM/DD, YYYY:MM:DD, YY/MM/DD, or other punctuated formats

MySQL allows any punctuation characters to separate the components of a date. We recommend using dashes and, again, avoiding the two-digit years.

YYYY-M-D, YYYY-MM-D, or YYYY-M-DD

When punctuation is used (again, any punctuation character is allowed), single-digit days and months can be specified as such. For example, February 2, 2006, can be specified as 2006-2-2. The two-digit year equivalent is available, but not recommended.

YYYYMMDD or YYMMDD

Punctuation can be omitted in both date styles, but the digit sequences must be six or eight digits in length.

You can also input a date by providing both a date and time in the formats described later for DATETIME and TIMESTAMP, but only the date component is stored in a DATE type column. Regardless of the input type, the storage and display type is always YYYY-MM-DD. The zero date 0000-00-00 is allowed in all versions and can be used to represent an unknown or dummy value. If an input date is out of range, the zero date 0000-00-00 is stored. However, only MySQL versions up to and including 5.6 allow that by default. Both 5.7 and 8.0 set SQL modes that prohibit this behavior: STRICT_TRANS_TABLES, NO_ZERO_DATE, and NO_ZERO_IN_DATE.

Here’s an example from MySQL 8.0 with default settings:

mysql> CREATE TABLE testdate (mydate DATE);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO testdate VALUES (2020/02/0);
ERROR 1292 (22007): Incorrect date value: 2020/02/0 for column mydate at row 1

mysql> INSERT INTO testdate VALUES (2020/02/1);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO testdate VALUES (2020/02/31);
ERROR 1292 (22007): Incorrect date value: 2020/02/31 for column mydate at row 1

mysql> INSERT INTO testdate VALUES (2020/02/100);
ERROR 1292 (22007): Incorrect date value: 2020/02/100 for column mydate at row 1

Once INSERTs are executed, table will have the following data:

mysql> SELECT * FROM testdate;
------------
| mydate     |
------------
| 2020-02-01 |
------------
1 row in set (0.00 sec)

MySQL protected you from having “bad” data stored. Sometimes you may need to preserve the actual input and manually process it later. That is possible by unsetting aforementioned SQL modes. Doing so, you could end up with the following data:

mysql> SELECT * FROM testdate;
------------
| mydate     |
------------
| 2020-02-00 |
| 2020-02-01 |
| 0000-00-00 |
| 0000-00-00 |
------------
4 rows in set (0.01 sec)

Note also that the date is displayed in the YYYY-MM-DD format, regardless of how it was input.

TIME [__fraction__]

Stores a time in the format `HHH:MM:SS` for the range -838:59:59 to 838:59:59. Useful for storing duration of some activity. The values that can be stored are outside the range of the 24-hour clock to allow large differences between time values (up to 34 days, 22 hours, 59 minutes, and 59 seconds) to be computed and stored. `fraction` in TIME and other related data types specifies the fractional seconds precision in the range from 0 to 6. The default value is 0, meaning that no fractional seconds are preserved.

Times must always be input in the order days, hours, minutes, and seconds, using the following formats:

`DD HH:MM:SS[.fraction]`, `HH:MM:SS[.fraction]`, `DD HH:MM`, `HH:MM`, `DD HH`, or `SS[.fraction]`

The `DD` represents a one-digit or two-digit value of days in the range 0 to 34. The `DD` value is separated from the hour value, `HH`, by a space, while the other components are separated by a colon. Note that `MM:SS` is not a valid combination, since it cannot be disambiguated from `HH:MM`. If TIME definition doesn’t specify the `fraction` or sets it to 0, inserting fractional seconds will result in values being rounded to the nearest second.

For example, if you insert 2 13:25:58.999999 into a TIME type column with `fraction` of 0, the value 61:25:59 is stored, since the sum of 2 days (48 hours) and 13 hours is 61 hours. Starting with MySQL 5.7, default SQL mode set prohibits insertion of incorrect values. However, it is possible to enable the older behavior. Then, if you try inserting a value that’s out of bounds, a warning is generated, and the value is limited to the maximum time available. Similarly, if you try inserting an incorrect value, a warning is generated and the value is set to zero. You can use the SHOW WARNINGS command to reports the details of the warning generated by the previous SQL statement. Our recommendation is to stick to the default STRICT SQL mode. Unlike DATE type, there’s seemingly no benefit in allowing incorrect TIME entries, apart from easier error management on application side and maintaiting legacy behaviors.

Let’s try all these out in practice:

mysql> CREATE TABLE test_time(id SMALLINT, mytime TIME);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO test_time VALUES(1, "2 13:25:59");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO test_time VALUES(2, "35 13:25:59");
ERROR 1292 (22007): Incorrect time value: 35 13:25:59 for column mytime at row 1

mysql> INSERT INTO test_time VALUES(3, "900.32");
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM test_time;
----------------+
| id   | mytime   |
----------------+
|    1 | 61:25:59 |
|    3 | 00:09:00 |
----------------+
2 rows in set (0.00 sec)
`H:M:S`, and single-, double-, and triple-digit combinations

You can use different combinations of digits when inserting or updating data; MySQL converts them into the internal time format and displays them consistently. For example, 1:1:3 is equivalent to 01:01:03. Different numbers of digits can be mixed; for example, 1:12:3 is equivalent to 01:12:03. Consider these examples:

mysql> CREATE TABLE mytime (testtime TIME);
Query OK, 0 rows affected (0.12 sec)

mysql> INSERT INTO mytime VALUES
    -> (-1:1:1), (1:1:1),
    -> (1:23:45), (123:4:5),
    -> (123:45:6), (-123:45:6);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM mytime;
------------
| testtime   |
------------
| -01:01:01  |
| 01:01:01   |
| 01:23:45   |
| 123:04:05  |
| 123:45:06  |
| -123:45:06 |
------------
5 rows in set (0.01 sec)

Note that hours are shown with two digits for values within the range –99 to +99.

`HHMMSS`, __MMSS__, and `SS`

Punctuation can be omitted, but the digit sequences must be two, four, or six digits in length. Note that the rightmost pair of digits is always interpreted as a `SS` (seconds) value, the second next rightmost pair (if present) as `MM` (minutes), and the third rightmost pair (if present) as `HH` (hours). The result is that a value such as 1222 is interpreted as 12 minutes and 22 seconds, not 12 hours and 22 minutes.

You can also input a time by providing both a date and time in the formats described for DATETIME and TIMESTAMP, but only the time component is stored in a TIME type column. Regardless of the input type, the storage and display type is always `HH:MM:SS`. The zero time 00:00:00 can be used to represent an unknown or dummy value.

TIMESTAMP[(__fraction__)]

Stores and displays a date and time pair in the format `YYYY-MM-DD HH:MM:SS .fraction time zone offset` for the range 1970-01-01 00:00:01.000000 to 2038-01-19 03:14:07.999999. This type is very similar to DATETIME type, but there are few differences. Both types accept a time zone modifier to the input value in the most recent MySQL version. Both types will store and present the data equally to the client in the same time zone. However, the values in TIMESTAMP columns are internally always stored in the UTC time zone, making it possible to get a local time zone automatically for clients in different time zones. That on its own is a very important distinction to remember. Arguably, TIMESTAMP is more convenient to use when dealing with time zones.

In earlier MySQL versions, preceding 5.6, only TIMESTAMP type supported automatic initialization and update. Moreover, only a single such column per a given table could do that. However, starting with 5.6, both types support the behaviors, and any number of columns can do so.

The value stored always matches the template `YYYY-MM-DD HH:MM:SS[.fraction][time zone offset]`, but the value can be provided in a wide range of formats. Ability to specify the time

`YYYY-MM-DD HH:MM:SS` or `YY-MM-DD HH:MM:SS`

The date and time components follow the same relaxed restrictions as the DATE and TIME components described previously. This includes allowance for any punctuation characters, including (unlike TIME) flexibility in the punctuation used in the time component. For example, 2020–0—02–1—15 12–2—22–3—23 is valid.

`YYYYMMDDHHMMSS` or `YYMMDDHHMMSS`

Punctuation can be omitted, but the string should be either 12 or 14 digits in length. We recommend only the unambiguous 14-digit version, for the reasons discussed for the DATE type. You can specify values with other lengths without providing separators, but we don’t recommend doing so.

Let’s discuss the automatic-update feature in detail. You control them by adding the following attributes to the column definition when creating a table, or later, as we’ll explain in “Altering Structures”:

  1. If you want the timestamp to be set only when a new row is inserted into the table, add DEFAULT CURRENT_TIMESTAMP to the end of the column declaration.

  2. If you don’t want a default timestamp but want the current time to be used whenever the data in a row is updated, add ON UPDATE CURRENT_TIMESTAMP to the end of the column declaration.

  3. If you want both of the above—that is, you want the timestamp to be set to the current time in each new row or whenever an existing row is modified— add DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP to the end of the column declaration.

    If you do not specify DEFAULT NULL or NULL for a TIMESTAMP column, it will have 0 as the default value.

YEAR[(4)]

Stores a four-digit year. The four-digit version stores values in the range 1901 to 2155, as well as the zero year, 0000. Illegal values are converted to the zero date. You can input year values as either strings (such as –0—2005–1—) or integers (such as 2005). The YEAR type requires one byte of storage space. In the earlier versions of MySQL, it was possible to specify the digits parameter. Depending on whether 2 or 4 is passed as the optional digits parameter. The two-digit version stored values from 70 to 69, representing 1970 to 2069. MySQL 8.0 doesn’t support the two-digit YEAR type, and specifying the digits parameter for display purposes is deprecated.

DATETIME[(__fraction__)]

Stores and displays a date and time pair in the format `YYYY-MM-DD HH:MM:SS .fraction time zone offset` for the range 1000-01-01 00:00:00 to 9999-12-31 23:59:59. As for TIMESTAMP, the value stored always matches the template `YYYY-MM-DD HH:MM:SS`, but the value can be input in the same formats listed for the TIMESTAMP description. If you assign only a date to a DATETIME column, the zero time 00:00:00 is assumed. If you assign only a time to a DATETIME column, the zero date 0000-00-00 is assumed. This type has the sa,e automatic update features as TIMESTAMP has. Unless NOT NULL attribute is specified for a DATETIME column, a NULL value is the default, otherwise the default is 0. Unlike TIMESTAMP, DATETIME values aren’t converted to UTC time zone for storage.

Other types

Currently, as of MySQL 8, spacial and JSON data types fall under this broad category. Using both is a quite advanced topic, and we won’t cover them in-depth.

Spatial data types are concerned with storing geometrical objects, and MySQL has types corresponding to OpenGIS classes. Working with these types is a topic worth writing a book on its own.

JSON data type allows a native storage of valid JSON documents. Before MySQL version 5.7, JSON was usually stored in a TEXT or a similar column. However, that has a lot of disadvantages, for example, documents aren’t validated. Moreover, there’s no storage optimization performed, and all JSON is just stored in its text form. With native JSON, it’s stored in binary format. If we were to summarize in one sentence: use JSON data type for JSON, dear reader.

Keys and Indexes

You’ll find that almost all tables you use will have a ((("primary keys", "PRIMARY KEY clause")))PRIMARY KEY clause declared in their CREATE TABLE statement, and some times multiple ((("secondary keys", "KEY clause")))KEY clauses. The reasons why you need a primary key and secondary keys are discussed in Chapter 2. This section discusses how primary keys are declared, what happens behind the scenes when you do so, and why you might want to also create other keys and indexes on your data.

A primary key uniquely identifies each row in a table. Even more importantly, for the default InnoDB storage engine, a primary key is also used as a clustered index. That means that the all of the actual table data is stored in an index structure. That is different to MyISAM, which stores data and indexes separately. Primary keys are generally a recommended part of database design, but for InnoDB they are necessary. In fact, if you do not specify a PRIMARY KEY clause when creating an InnoDB table, MySQL will use the first UNIQUE NOT NULL column as a base for the clustered index. If even that is not available, a hidden clustered index is created, based on ID values assigned by InnoDB to each row.

Given that InnoDB is a default storage engine, and a de-facto standard nowadays, we will concentrate on its behavior in this chapter. Alternative storage engines like MyISAM, MEMORY, or MyRocks will be discussed in the [Link to Come].

When primary key is defined, as we mentioned, it becomes a clustered index, and all data in the table is stored in the leaf blocks of that index. InnoDB uses B-tree indexes (more specifically, B+tree variant), with the exception of indexes on spatial data types, which use R-tree. Other storage engines might implement different index types, however, when table’s storage engine is not specified, you can assume that all indexes are B-tree.

Having a clustered index, or in other words having index-organized tables, speeds up queries and sorts involving the primary key columns. However, a downside is that modifying columns in a primary key is expensive. Thus, a good design will require a primary key based on columns which are frequently used for filtering in queries but are rarely modified. Remember that having no primary key at all will result in InnoDB using an implicit cluster index, thus if you’re not sure what columns to pick for a primary key, consider using a synthetic id-like column. For example, the SERIAL data type might fit well in that case.

<!-- discussion of internals? Should we touch the B+Tree structure? -->

Stepping away from the InnoDB internal details, when you declare a PRIMARY KEY for a table in MySQL, it creates a structure that stores information about where the data from each row in the table is stored. This information is called an index, and its purpose is to speed up searches that use the primary key. For example, when you declare PRIMARY KEY (actor_id) in the actor table in the sakila database, MySQL creates a structure that allows it to find rows that match a specific actor_id (or a range of identifiers) extremely quickly.

This is very useful to match actors to films, or films to categories for example. You can display the indexes available on a table using the SHOW INDEX (or SHOW INDEXES) command:

mysql> SHOW INDEX FROM category;
-----------------------------------------------------------------------...
| Table     | Non_unique | Key_name | Seq_in_index | Column_name | Collation |...
+-----------------------------------------------------------------------...
| category  | 0          | PRIMARY  | 1            | category_id | A         |...
+-----------------------------------------------------------------------...
... +----------------------------------------------------------+ ...
... | Cardinality | Sub_part | Packed | Null | Index_type | Comment | ...
... ----------------------------------------------------------+ ...
... | 16          | NULL     |        |      | BTREE      |         | ...
... ----------------------------------------------------------+ ...
... ------------------------------------ ...
... | Index_comment | Visible | Expression | ...
... ------------------------------------ ...
... |               | YES     | NULL       | ...
... ------------------------------------ ...

1 row in set (0.00 sec)

We’ve wrapped the output here so that it would fit on the page. The cardinality is the number of unique values in the index; for an index on a primary key, this is the same as the number of rows in the table.

Note that all columns that are part of a primary key must be declared as NOT NULL, since they must have a value for the row to be valid. Without the index, the only way to find rows in the table is to read each one from disk and check whether it matches the category_id you’re searching for. For tables with many rows, this exhaustive, sequential searching is extremely slow. However, you can’t just index everything; we’ll come back to this point at the end of this section.

You can create other indexes on the data in a table. You do this so that other searches—on other columns or combinations of columns—are extremely fast and in order to avoid sequential scans. For example, take the table actor again. Apart from having a primary key on actor_id, it also has a secondary key on last_name, to improve searching by actor’s last name.

mysql> SHOW CREATE TABLE actorG

       Table: actor
Create Table: CREATE TABLE `actor` (
  `actor_id` smallint unsigned NOT NULL AUTO_INCREMENT,
  ...
  `last_name` varchar(45) NOT NULL,
  ...
  PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`)
) ...
1 row in set (0.00 sec)

You can see the keyword KEY is used to tell MySQL that an extra index is needed; you can use the word ((("INDEX keyword")))INDEX in place of ((("KEY keyword")))KEY. Following that keyword is an index name, and then the column to index is included in parentheses. You can also add indexes after tables are created—in fact, you can pretty much change anything about a table after its creation—and this is discussed in “Altering Structures”.

You can build an index on more than one column. For example, consider the following customer table, which is a modified table from sakila:

mysql> CREATE TABLE customer (
    *-> customer_id smallint unsigned NOT NULL AUTO_INCREMENT,
    -> first_name varchar(45) NOT NULL,
    -> last_name varchar(45) NOT NULL,
    -> email varchar(50) DEFAULT NULL,
    -> PRIMARY KEY (customer_id),
    -> KEY idx_names_email (first_name, last_name, email));
Query OK, 0 rows affected (0.06 sec)

You can see that we’ve added a primary key index on the customer_id identifier column, and we’ve also added another index—called idx_names_email—that includes the first_name, last_name, and email columns in this order. Let’s now consider how you can use that extra index.

You can use the idx_names_email index for fast searching by combinations of the three name columns. For example, it’s useful in the following query:

mysql> SELECT * FROM customer WHERE
    -> first_name = "Rose" AND
    -> last_name = "Williams" AND
    -> email = "[email protected]";

We know it helps the search, because all columns listed in the index are used in the query. You can use the EXPLAIN statement to check whether what you think should happen is in fact happening:

mysql> EXPLAIN SELECT * FROM customer WHERE
    -> first_name = "Rose" AND
    -> last_name = "Williams" AND
    -> email = "[email protected]";
---------------------------...-----------------------...
| id | select_type | table    |...| type | possible_keys   |...
+---------------------------...-----------------------...
|  1 | SIMPLE      | customer |...| ref  | idx_names_email |...
+---------------------------...-----------------------...
...---------------------------------------------------...-------------
...| key             | key_len | ref               | rows |...| Extra       |
...---------------------------------------------------...-------------+
...| idx_names_email |     567 | const,const,const |    1 |...| Using index |
...--------------------------------------------------...-------------
1 row in set (0.00 sec)

We’ve reformatted the output slightly to fit better in the book. You can see that MySQL reports that the possible_keys are idx_names_email (meaning that the index could be used for this query) and that the key that it’s decided to use is idx_names_email. So, what you expect and what is happening are the same, and that’s good news! You’ll find out more about the EXPLAIN statement in [Link to Come].

The index we’ve created is also useful for queries on only the first_name column. For example, it can be used by the following query:

mysql> SELECT * FROM customer WHERE
    -> first_name = "Rose";

You can use EXPLAIN to check whether the index is being used. The reason it can be used is because the first_name column is the first listed in the index. In practice, this means that the index clusters, or stores together, information about rows for all people with the same first name, and so the index can be used to find anyone with a matching first name.

The index can also be used for searches involving combinations of first name and last name, for exactly the same reasons we’ve just discussed. The index clusters together people with the same first name, and within that it clusters people with identical first names ordered by last name. So, it can be used for this query:

mysql> SELECT * FROM customer WHERE
    -> first_name = "Rose" AND
    -> last_name = "Williams";

However, the index can’t be used for this query because the leftmost column in the index, first_name, does not appear in the query:

mysql> SELECT * FROM customer WHERE
    -> last_name = "Williams" AND
    -> email = "[email protected]";

The index should help narrow down the set of rows to a smaller set of possible answers. For MySQL to be able to use an index, the query needs to meet both the following conditions:

  1. The leftmost column listed in the KEY (or PRIMARY KEY) clause must be in the query.

  2. The query must contain no OR clauses for columns that aren’t indexed.

Again, you can always use the EXPLAIN statement to check whether an index can be used for a particular query.

Before we finish this section, here are a few ideas on how to choose and design indexes. When you’re considering adding an index, think about the following:

  • Indexes cost space on disk, and they need to be updated whenever data changes. If your data changes frequently, or lots of data changes when you do make a change, indexes will slow the process down. However, in practice, since ((("SELECT")))SELECT statements (data reads) are usually much more common than other statements (data modifications), indexes are usually beneficial.

  • Only add an index that’ll be used frequently. Don’t bother indexing columns before you see what queries your users and your applications need. You can always add indexes afterward.

  • If all columns in an index are used in all queries, list the column with the highest number of duplicates at the left of the KEY clause. This minimizes index size.

  • The smaller the index, the faster it’ll be. If you index large columns, you’ll get a larger index. This is a good reason to ensure your columns are as small as possible when you design your tables.

  • For long columns, you can use only a prefix of the values from a column to create the index. You can do this by adding a value in parentheses after the column definition, such as KEY idx_names_email (first_name(3), last_name(2), email(10)). This means that only the first three characters of first_name are indexed, then the first two characters of last_name, and then 10 characters from email. This is a significant saving over indexing 140 characters from the 3 columns! When you do this, your index will be less able to uniquely identify rows, but it’ll be much smaller and still reasonably good at finding matching rows. That is also mandatory for long types like TEXT.

To finish this section, we must also discuss some peculiarities regarding secondary keys in InnoDB. Remember that all the table data is stored in the leafs of the clustered index. That means, using the actor example, that if we need to get the first_name data when filtering by last_name, even though we can use the idx_actor_last_name for quick filterting, we will need to access the data by the primary key. As a consequence, each secondary key in InnoDB has all of the primary key columns appended to its definition implicitly. Having unnecessarily-long primary keys in InnoDB results in significantly bloated secondary keys.

This can also be seen in explain, note the Extra: Using index in the first output:

mysql> EXPLAIN SELECT actor_id, last_name FROM actor WHERE last_name = SmithG
           id: 1
  select_type: SIMPLE
        table: actor
   partitions: NULL
         type: ref
possible_keys: idx_actor_last_name
          key: idx_actor_last_name
      key_len: 182
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT first_name FROM actor WHERE last_name = SmithG
           id: 1
  select_type: SIMPLE
        table: actor
   partitions: NULL
         type: ref
possible_keys: idx_actor_last_name
          key: idx_actor_last_name
      key_len: 182
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

Effectively, idx_actor_last_name is a covering index for the first query, meaning that InnoDB can extract all the required data from that index alone. However, for the second query, InnoDB will have to do an additional lookup of a clustered index to get the value for first_name column.

The AUTO_INCREMENT Feature

MySQL’s proprietary ((("–0—", id="ch04.increm", range="startofrange")))AUTO_INCREMENT feature allows you to create a unique identifier for a row without running a SELECT query. Here’s how it works. Let’s take the simplified actor table again:

mysql> CREATE TABLE actor (
    -> *  actor_id smallint unsigned NOT NULL AUTO_INCREMENT,*
    -> *  first_name varchar(45) NOT NULL,*
    -> *  last_name varchar(45) NOT NULL,*
    -> *  PRIMARY KEY (actor_id)*
    -> );
Query OK, 0 rows affected (0.03 sec)

It’s possible to insert rows into that table without specifying the actor_id:

mysql> INSERT INTO actor VALUES (NULL, "Alexander", "Kaidanovsky");
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO actor VALUES (NULL, "Anatoly", "Solonitsyn");
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO actor VALUES (NULL, "Nikolai", "Grinko");
Query OK, 1 row affected (0.00 sec)

When you view the data in this table you can see that each actor has a meaningful actor_id:

mysql> SELECT * FROM actor;
-----------------------------------
| actor_id | first_name | last_name   |
-----------------------------------
|        1 | Alexander  | Kaidanovsky |
|        2 | Anatoly    | Solonitsyn  |
|        3 | Nikolai    | Grinko      |
-----------------------------------
3 rows in set (0.00 sec)

Each time an actor is inserted, a unique actor_id is created for the new row.

Let’s consider how the new feature works. You can see that the actor_id column is declared as an integer with the clauses NOT NULL AUTO_INCREMENT. The AUTO_INCREMENT keyword tells MySQL that when a value isn’t provided for this column, the value allocated should be one more than the maximum currently stored in the table. The AUTO_INCREMENT sequence begins at 1 for an empty table.

The NOT NULL is required for AUTO_INCREMENT columns; when you insert NULL (or 0, though this isn’t recommended), the MySQL server automatically finds the next available identifier and assigns it to the new row. You can manually insert negative values if the column was not defined as UNSIGNED; however, for the next automatic increment, MySQL will simply use the largest (most positive) value in the column, or start from 1 if there are no positive values.

The AUTO_INCREMENT feature has the following requirements:

  • The column it is used on must be indexed.

  • The column that is it used on cannot have a DEFAULT value.

  • There can be only one AUTO_INCREMENT column per table.

MySQL supports different storage engines; we’ll learn more about these in [Link to Come] in [Link to Come]. When using the non-default MyISAM table type, you can use the AUTO_INCREMENT feature on keys that comprise multiple columns. In effect, you could have multiple independent counters within a single AUTO_INCREMENT column. However, it’s not possible with InnoDB.

While the AUTO_INCREMENT feature is useful, it isn’t portable to other database environments, and it hides the logical steps to creating new identifiers. It can also lead to ambiguity; for example, dropping or truncating a table will reset the counter, but deleting selected rows (with a WHERE clause) doesn’t reset the counter. Moreover, if a row is inserted inside a transaction, but then transaction is rolled back, an identifier would be used up anyway. Consider an example; let’s create the table count that contains an auto-incrementing field counter:

mysql> CREATE TABLE count (counter INT AUTO_INCREMENT KEY);
Query OK, 0 rows affected (0.13 sec)

mysql> INSERT INTO count VALUES (),(),(),(),(),();
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM count;
---------
| counter |
---------
| 1       |
| 2       |
| 3       |
| 4       |
| 5       |
| 6       |
---------
6 rows in set (0.00 sec)

Inserting several values works as expected. Now, let’s delete a few rows and then add six new rows:

mysql> DELETE FROM count WHERE counter > 4;
Query OK, 2 rows affected (0.00 sec)

mysql> INSERT INTO count VALUES (),(),(),(),(),();
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM count;
---------
| counter |
---------
| 1       |
| 2       |
| 3       |
| 4       |
| 7       |
| 8       |
| 9       |
| 10      |
| 11      |
| 12      |
---------
10 rows in set (0.00 sec)

Here, we see that the counter is not reset, and continues from 7. If, however, we truncate the table, thus removing all of the data, the counter is reset to 1:

mysql> TRUNCATE TABLE count;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO count VALUES (),(),(),(),(),();
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM count;
---------
| counter |
---------
| 1       |
| 2       |
| 3       |
| 4       |
| 5       |
| 6       |
---------
6 rows in set (0.00 sec)

To summarize: AUTO_INCREMENT guarantees a sequence of transactional and monotonically-increasing numbers. However, it does not in any way guarantee that each individual identifier provided will exactly follow the previous one. Usually, this behavior of AUTO_INCREMENT is clear enough and should not be a problem. However, if your particular use case requires a counter that guarantees no gaps, you should consider using some kind of a workaround. Unfortunately, it’ll likely be implemented on the application side.

Altering Structures

We’ve shown you all the basics you need for creating databases, tables, indexes, and columns. In this section, you’ll learn how to add, remove, and change columns, databases, tables, and indexes in structures that already exist.

Adding, Removing, and Changing Columns

You can use the ((("ALTER TABLE statement")))ALTER TABLE statement to add new columns to a table, remove existing columns, and change column names, types, and lengths.

Let’s begin by considering how you modify existing columns. Consider an example in which we rename a table column. The language table has a column called last_update that contains the time the record was modified. To change the name of this column to last_updated_time, you would write:

mysql> ALTER TABLE language RENAME COLUMN last_update TO last_updated_time;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

This particular example utilizes online DDL feature of MySQL. What actually happens behind the scenes is that MySQL only modifies metadata, and doesn’t need to actually re-write the table in any way. You can notice that by lack of affected rows. Not all of the DDL statements can be performed online, and that’s not going to be the case with a lot of other changes. You can check the result with the SHOW COLUMNS statement:

mysql> SHOW COLUMNS FROM language;
------------------------------------------------------------------------------------------------------------------+
| Field             | Type             | Null | Key | Default           | Extra                                         |
------------------------------------------------------------------------------------------------------------------+
| language_id       | tinyint unsigned | NO   | PRI | NULL              | auto_increment                                |
| name              | char(20)         | NO   |     | NULL              |                                               |
| last_updated_time | timestamp        | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.01 sec)

In the previos example we used the ALTER TABLE statement with RENAME COLUMN keyword. That is a MySQL 8 feature. We could alternatively use ALTER TABLE with the CHANGE keyword for compatibility.

mysql> ALTER TABLE language CHANGE last_update last_updated_time TIMESTAMP
    -> NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

In this example, you can see that we provided four parameters to the ALTER TABLE statement with the CHANGE keyword:

  1. The table name, language

  2. The original column name, last_update

  3. The new column name, last_updated_time

  4. The column type, TIMESTAMP, with a lot of extra attributes, which are necessary to not change the original definition

You must provide all four; that means you need to respecify the type and any clauses that go with it. In this example, as we’re using MySQL 8 with default settings, TIMESTAMP no longer has explicit defaults. As you can see, using RENAME COLUMN is much easier than CHANGE.

If you want to modify the type and clauses of a column, but not its name, you can use the ((("MODIFY keyword")))MODIFY keyword:

mysql> ALTER TABLE language MODIFY name CHAR(20) DEFAULT "n/a";
Query OK, 0 rows affected (0.14 sec)
Records: 0  Duplicates: 0  Warnings: 0

You can also do this with the ((("CHANGE keyword")))CHANGE keyword, but by specifying the same column name twice:

mysql> ALTER TABLE language CHANGE name name CHAR(20) DEFAULT "n/a";
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

Be careful when you’re modifying types:

  • Don’t use incompatible types, since you’re relying on MySQL to successfully convert data from one format to another (for example, converting an INT column to a DATETIME column isn’t likely to do what you hoped).

  • Don’t truncate the data unless that’s what you want. If you reduce the size of a type, the values will be edited to match the new width, and you can lose data.

Suppose you want to add an extra column to an existing table. Here’s how to do it with the ALTER TABLE statement:

mysql> ALTER TABLE language ADD native_name CHAR(20);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

You must supply the ((("ADD keyword")))ADD keyword, the new column name, and the column type and clauses. This example adds the new column, native_name, as the last column in the table, as shown with the SHOW COLUMNS statement:

mysql> SHOW COLUMNS FROM artist;
------------------------------------------------------------------------------------------------------------------+
| Field             | Type             | Null | Key | Default           | Extra                                         |
------------------------------------------------------------------------------------------------------------------+
| language_id       | tinyint unsigned | NO   | PRI | NULL              | auto_increment                                |
| name              | char(20)         | YES  |     | n/a               |                                               |
| last_updated_time | timestamp        | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
| native_name       | char(20)         | YES  |     | NULL              |                                               |
------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

If you want it to instead be the first column, use the FIRST keyword as follows:

mysql> ALTER TABLE language ADD native_name CHAR(20) FIRST;
Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SHOW COLUMNS FROM language;
------------------------------------------------------------------------------------------------------------------+
| Field             | Type             | Null | Key | Default           | Extra                                         |
------------------------------------------------------------------------------------------------------------------+
| native_name       | char(20)         | YES  |     | NULL              |                                               |
| language_id       | tinyint unsigned | NO   | PRI | NULL              | auto_increment                                |
| name              | char(20)         | YES  |     | n/a               |                                               |
| last_updated_time | timestamp        | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.01 sec)

If you want it added in a specific position, use the AFTER keyword:

mysql> ALTER TABLE language ADD native_name CHAR(20) AFTER name;
Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SHOW COLUMNS FROM language;
------------------------------------------------------------------------------------------------------------------+
| Field             | Type             | Null | Key | Default           | Extra                                         |
------------------------------------------------------------------------------------------------------------------+
| language_id       | tinyint unsigned | NO   | PRI | NULL              | auto_increment                                |
| name              | char(20)         | YES  |     | n/a               |                                               |
| native_name       | char(20)         | YES  |     | NULL              |                                               |
| last_updated_time | timestamp        | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

To remove a column, use the DROP keyword followed by the column name. Here’s how to get rid of the newly added formed column:

mysql> ALTER TABLE language DROP native_name;
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

This removes both the column structure and any data contained in that column. It also removes the column from any index it was in; if it’s the only column in the index, the index is dropped, too. You can’t remove a column if it’s the only one in a table; to do this, you drop the table instead as explained later in “Deleting Structures”. Be careful when dropping columns; you discard both the data and the structure of your table. When the structure of a table changes, you will generally have to modify any INSERT statements that you use to insert values in a particular order. We described INSERT statements in “The INSERT Statement” in Chapter 3.

MySQL allows you to specify multiple alterations in a single ALTER TABLE statement by separating them with commas. Here’s an example that adds a new column and adjusts another:

mysql> ALTER TABLE language ADD native_name CHAR(255), MODIFY name CHAR(255);
Query OK, 6 rows affected (0.06 sec)
Records: 6  Duplicates: 0  Warnings: 0

Note that this time, you can see that 6 records were change. If you haven’t noticed yet, note that we didn’t see any records affected when altering tables previously. The difference is that this time, we’re not performing an online DDL, because changing any column’s type will always result in a table being rebuilt. We recommend reading through documentation section titled “Online DDL Operations” when planning your changes. Combining online and “offline” operations will result in an “offline” operation.

When not using online DDL, or when any of the modifications is “offline”, it’s very efficient to join multiple modifications in a single operation. That potentially saves the cost of creating a new table, copying data from the old table to the new table, dropping the old table, and renaming the new table with the name of the old table for each modification individually.

Adding, Removing, and Changing Indexes

As we discussed previously, it’s often hard to know what indexes are useful before the application you’re building is used. You might find that a particular feature of the application is much more popular than you expected, causing you to evaluate how to improve performance for the associated queries. You’ll therefore find it useful to be able to add, alter, and remove indexes on the fly after your application is deployed. This section shows you how. Modifying indexes does not affect the data stored in the table.

We’ll start with adding a new index. Imagine that the language table is frequently queried using a WHERE clause that specifies the name. To speed this query, you’ve decided to add a new index, which you’ve named idx_name. Here’s how you add it after the table is created:

mysql> ALTER TABLE language ADD INDEX idx_name (name);
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

Again, you can use the terms KEY and INDEX interchangeably. You can check the results with the SHOW CREATE TABLE statement:

mysql> SHOW CREATE TABLE languageG
       Table: language
Create Table: CREATE TABLE `language` (
  `language_id` tinyint unsigned NOT NULL AUTO_INCREMENT,
  `name` char(255) DEFAULT NULL,
  `last_updated_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`language_id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

As expected, the new index forms part of the table structure. You can also specify a primary key for a table after it’s created:

mysql> CREATE TABLE no_pk (id INT);
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO no_pk VALUES (1),(2),(3);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE no_pk ADD PRIMARY KEY (id);
Query OK, 0 rows affected (0.13 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now let’s consider how to remove an index. To remove a non-primary-key index, you do the following:

mysql> ALTER TABLE language DROP INDEX idx_name;
Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0

You can drop a primary-key index as follows:

mysql> ALTER TABLE no_pk DROP PRIMARY KEY;
Query OK, 3 rows affected (0.07 sec)
Records: 3  Duplicates: 0  Warnings: 0

MySQL won’t allow you to have multiple primary keys in a table. If you want to change the primary key, you’ll have to remove the existing index before adding the new one. However, we know that it’s possible to group DDL operations. Consider this example:

mysql> ALTER TABLE language DROP PRIMARY KEY, ADD PRIMARY KEY (language_id, name);
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

You can’t modify an index once it’s been created. However, sometimes you’ll want to; for example, you might want to reduce the number of characters indexed from a column or add another column to the index. The best method to do this is to drop the index and then create it again with the new specification. For example, suppose you decide that you want the idx_name index to include only the first 10 characters of the artist_name. Simply do the following:

mysql> ALTER TABLE language DROP INDEX idx_name, ADD INDEX idx_name (name(10));
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

Renaming Tables and Altering Other Structures

We’ve seen how to modify columns and indexes in a table; now let’s see how to modify tables themselves. It’s easy to rename a table. Suppose you want to rename language to languages. Use the following command:

mysql> ALTER TABLE language RENAME TO languages;
Query OK, 0 rows affected (0.04 sec)

The TO keyword is optional.

There are several other things you can do with ALTER statements:

  • Change the default character set and collation order for a database, a table, or a column.

  • Manage and change constraints. For example, you can add and remove foreign keys.

  • Add partitioning to a table, or alter the current partitioning definition.

  • Change the storage engine of a table.

You can find more about these operations in the MySQL manual under the “ALTER DATABASE” and “ALTER TABLE” headings.

One thing that is not possible to alter is a name of a particular database. However, if you’re using InnoDB engine, you can use RENAME to move tables between databases:

mysql> CREATE DATABASE sakila_new;
Query OK, 1 row affected (0.05 sec)

mysql> RENAME TABLE sakila.languages sakila_new.languages;
Query OK, 0 rows affected (0.05 sec)

mysql> USE sakila;
Database changed

mysql> SHOW TABLES LIKE lang%;
Empty set (0.00 sec)

mysql> USE sakila_new;
Database changed

mysql> SHOW TABLES LIKE lang%;
------------------------------
| Tables_in_sakila_new (lang%) |
------------------------------
| languages                    |
------------------------------
1 row in set (0.00 sec)

Deleting Structures

In the previous section, we showed how you can delete columns and rows from a database; now we’ll describe how to remove databases and tables.

Dropping Databases

Removing, or dropping, a database is straightforward. Here’s how you drop the sakila database:

mysql> DROP DATABASE sakila;
Query OK, 25 rows affected (0.16 sec)

The number of rows returned in the response is the number of tables removed. You should take care when dropping a database, since all its tables, indexes, and columns are deleted, as are all the associated disk-based files and directories that MySQL uses to maintain them.

If a database doesn’t exist, trying to drop it causes MySQL to report an error. Let’s try dropping the sakila database again:

mysql> DROP DATABASE sakila;
ERROR 1008 (HY000): Can't drop database sakila; database doesn't exist

You can avoid the error, which is useful when including the statement in a script, by using the IF EXISTS phrase:

mysql> DROP DATABASE IF EXISTS sakila;
Query OK, 0 rows affected, 1 warning (0.00 sec)

You can see that a warning is reported, since the sakila database has already been dropped. Unfortunately, due to a bug https://bugs.mysql.com/bug.php?id=79684 theh warning itself is empty.

Removing Tables

Removing tables is as easy as removing a database. Let’s create and remove a table from the sakila database:

mysql> CREATE TABLE temp (id SERIAL PRIMARY KEY);
Query OK, 0 rows affected (0.05 sec)

mysql> DROP TABLE temp;
Query OK, 0 rows affected (0.03 sec)

Don’t worry: the 0 rows affected message is misleading. You’ll find the table is definitely gone.

You can use the IF EXISTS phrase to prevent errors. Let’s try dropping the temp table again:

mysql> DROP TABLE IF EXISTS temp;
Query OK, 0 rows affected, 1 warning (0.01 sec)

Again, you can investigate the warning indicates with the SHOW WARNINGS statement:

mysql> SHOW WARNINGS;
------------------------------------------
| Level | Code | Message                     |
------------------------------------------
| Note  | 1051 | Unknown table sakila.temp |
------------------------------------------
1 row in set (0.00 sec)

You can drop more than one table in a single statement by separating table names with commas:

mysql> DROP TABLE IF EXISTS temp, temp1, temp2;
Query OK, 0 rows affected, 3 warnings (0.00 sec)

You can see three warnings because none of these tables existed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
35.172.110.179