6. Database Design

In This Chapter

Normalization 166

Creating Indexes 179

Using Different Table Types 182

Languages and MySQL 184

Time Zones and MySQL 189

Foreign Key Constraints 195

Review and Pursue 202

Now that you have a basic understanding of databases, SQL, and MySQL, this chapter begins the process of taking that knowledge deeper. The focus on this chapter, as the title states, is real-world database design. Like the work done in Chapter 4, “Introduction to MySQL,” much of the effort herein requires paper and pen, and serious thinking about what your applications will need to do.

The chapter begins with thorough coverage of database normalization: a vital approach to the design process. After that, the chapter turns to design-related concepts specific to MySQL: indexes, table types, language support, working with times, and foreign key constraints.

By the end of this chapter, you’ll know the steps involved in proper database design, understand how to make the most of MySQL, and plan a couple of multi-table databases. In the next chapter, you’ll learn more advanced SQL and MySQL, using these new databases as examples.

Normalization

Whenever you are working with a relational database management system such as MySQL, the first step in creating and using a database is to establish the database’s structure (also called the database schema). Database design, aka data modeling, is crucial for successful long-term management of information. Using a process called normalization, you carefully eliminate redundancies and other problems that would undermine the integrity of your database.

The techniques you will learn over the next few pages will help to ensure the viability, usefulness, and reliability of your databases. The primary example to be discussed—a forum where users can post messages—will be more explicitly used in Chapter 17, “Example—Message Board,” but the principles of normalization apply to any database you might create. (The sitename example as created and used in the past two chapters was properly normalized, even though normalization was never discussed.)

Normalization was developed by an IBM researcher named E.F. Codd in the early 1970s (he also invented the relational database). A relational database is merely a collection of data, organized in a particular manner, and Dr. Codd created a series of rules called normal forms that help define that organization. This chapter discusses the first three of the normal forms, which are sufficient for most database designs.

Before you begin normalizing your database, you must define the role of the application being developed. Whether it means that you thoroughly discuss the subject with a client or figure it out for yourself, understanding how the information will be accessed dictates the modeling. Thus, this process will require paper and pen rather than the MySQL software itself (although database design is applicable to any relational database, not just MySQL).

In this example, I want to create a message board where users can post messages and other users can reply. I imagine that users will need to register, then log in with an email address/password combination, in order to post messages. I also expect that there could be multiple forums for different subjects. I have listed a sample row of data in Table 6.1. The database itself will be called forum.

Table 6.1. Sample Forum Data

Image


Tip

One of the best ways to determine what information should be stored in a database is to think about what questions will be asked of the database and what data would be included in the answers.



Tip

Always err on the side of storing more information than you might need. It’s easy to ignore unnecessary data but impossible to later manufacture data that was never stored in the first place.



Tip

Normalization can be hard to learn if you fixate on the little things. Each of the normal forms is defined in a very cryptic way; even when put into layman’s terms, they can still be confounding. My best advice is to focus on the big picture as you follow along. Once you’ve gone through normalization and seen the end result, the overall process should be clear enough.


Keys

As briefly mentioned in Chapter 4, keys are integral to normalized databases. There are two types of keys: primary and foreign. A primary key is a unique identifier that has to abide by certain rules. They must

• Always have a value (they cannot be NULL)

• Have a value that remains the same (never changes)

• Have a unique value for each record in a table

A good real-world example of a primary key is the U.S. Social Security number: each individual has a unique Social Security number, and that number never changes. Just as the Social Security number is an artificial construct used to identify people, you’ll frequently find creating an arbitrary primary key for each table to be the best design practice.

The second type of key is a foreign key. Foreign keys are the representation in Table B of the primary key from Table A. If you have a cinema database with a movies table and a directors table, the primary key from directors would be linked as a foreign key in movies. You’ll see better how this works as the normalization process continues.

The forum database is just a simple table as it stands (Table 6.1), but before beginning the normalization process, identify at least one primary key (the foreign keys will come in later steps).

To assign a primary key

1. Look for any fields that meet the three tests for a primary key.

In this example (Table 6.1), no column really fits all of the criteria for a primary key. The username and email address will be unique for each forum user but will not be unique for each record in the database (because the same user could post multiple messages). The same subject could be used multiple times as well. The message body will likely be unique for each message but could change (if edited), violating one of the rules of primary keys.

2. If no logical primary key exists, invent one (Table 6.2).

Table 6.2. Sample Forum Data

Image

Frequently, you will need to create a primary key because no good solution presents itself. In this example, a message ID is manufactured. When you create a primary key that has no other meaning or purpose, it’s called a surrogate primary key.


Tip

As a rule of thumb, I name my primary keys using at least part of the table’s name (e.g., message) and the word id. Some database developers like to add the abbreviation pk to the name as well. Some developers just use id.



Tip

MySQL allows for only one primary key per table, although you can base a primary key on multiple columns (this means the combination of those columns must be unique and never change).



Tip

Ideally, your primary key should always be an integer, which results in better MySQL performance.


Relationships

Database relationships refer to how the data in one table relates to the data in another. There are three types of relationships between any two tables: one-to-one, one-to-many, or many-to-many. (Two tables in a database may also be unrelated.)

A relationship is one-to-one if one and only one item in Table A applies to one and only one item in Table B. For example, each U.S. citizen has only one Social Security number, and each Social Security number applies to only one U.S. citizen; no citizen can have two Social Security numbers, and no Social Security number can refer to two citizens.

A relationship is one-to-many if one item in Table A can apply to multiple items in Table B. The terms female and male will apply to many people, but each person can be only one or the other (in theory). A one-to-many relationship is the most common one between tables in normalized databases.

Finally, a relationship is many-to-many if multiple items in Table A can apply to multiple items in Table B. A book can be written by multiple authors, and authors can write multiple books. Although many-to-many relationships are common in the real word, you should avoid many-to-many relationships in your design because they lead to data redundancy and integrity problems. Instead of having many-to-many relationships, properly designed databases use intermediary tables that break down one many-to-many relationship into two one-to-many relationships Image.

Image

Image A many-to-many relationship between two tables will be better represented as two one-to-many relationships those tables have with an intermediary table.

Relationships and keys work together in that a key in one table will normally relate to a key in another, as mentioned earlier.


Tip

Database modeling uses certain conventions to represent the structure of the database, which I’ll follow through a series of images in this chapter. The symbols for the three types of relationships are shown in Image.

Image

Image These symbols, or variations on them, are commonly used to represent relationships in database modeling schemes.



Tip

The process of database design results in an ERD (entity-relationship diagram) or ERM (entity-relationship model). This graphical representation of a database uses shapes for tables and columns and the symbols from Image to represent the relationships.



Tip

There are many programs available to help create a database schema, including MySQL Workbench (http://wb.mysql.com). Many of the images in this chapter will come from MySQL Workbench.



Tip

The term “relational” in RDBMS actually stems from the tables, which are technically called relations.


First Normal Form

As already stated, normalizing a database is the process of changing the database’s structure according to several rules, called forms. Your database should adhere to each rule exactly, and the forms must be followed in order.

Every table in a database must have the following two qualities in order to be in First Normal Form (1NF):

• Each column must contain only one value (this is sometimes described as being atomic or indivisible).

• No table can have repeating groups of related data.

A table containing one field for a person’s entire address (street, city, state, zip code, country) would not be 1NF compliant, because it has multiple values in one column, violating the first property above. As for the second, a movies table that had columns such as actor1, actor2, actor3, and so on would fail to be 1NF compliant because of the repeating columns all listing the exact same kind of information.

To begin the normalization process, check the existing structure (Table 6.2) for 1NF compliance. Any columns that are not atomic should be broken into multiple columns. If a table has repeating similar columns, then those should be turned into their own, separate table.

To make a database 1NF compliant

1. Identify any field that contains multiple pieces of information.

Looking at Table 6.2, one field is not 1NF compliant: actual name. The example record contained both the first name and the last name in this one column.

The message date field contains a day, a month, and a year, plus a time, but subdividing past that level of specificity is really not warranted. And, as the end of the last chapter shows, MySQL can handle dates and times quite nicely using the DATETIME type.

Other examples of problems would be if a table used just one column for multiple phone numbers (mobile, home, work), or stored a person’s multiple interests (cooking, dancing, skiing, etc.) in a single column.

2. Break up any fields found in Step 1 into distinct fields (Table 6.3).

Table 6.3. Forum Database, Atomic

Image

To fix this problem for the current example, create separate first name and last name fields, each of which contains only one value.

3. Turn any repeating column groups into their own table.

The forum database doesn’t have this problem currently, so to demonstrate what would be a violation, consider Table 6.4. The repeating columns (the multiple actor fields) introduce two problems. First of all, there’s no getting around the fact that each movie will be limited to a certain number of actors when stored this way. Even if you add columns actor 1 through actor 100, there will still be that limit (of a hundred). Second, any record that doesn’t have the maximum number of actors will have NULL values in those extra columns. You should generally avoid columns with NULL values in your database schema. As another concern, the actor and director columns are not atomic.

Table 6.4. Movies Table

Image

To fix the problems in the movies table, a second table would be created (Table 6.5). This table uses one row for each actor in a movie, which solves the problems mentioned in the last paragraph. The actor names are also broken up to be atomic. Notice as well that a primary-key column should be added to the new table. The notion that each table has a primary key is implicit in the First Normal Form.

Table 6.5. Movies-Actors Table

Image

4. Double-check that all new columns and tables created in Steps 2 and 3 pass the 1NF test.


Tip

The simplest way to think about 1NF is that this rule analyzes a table horizontally: inspect all of the columns within a single row to guarantee specificity and avoid repetition of similar data.



Tip

Various resources will describe the normal forms in somewhat different ways, likely with much more technical jargon. What is most important is the spirit—and end result—of the normalization process, not the technical wording of the rules.


Second Normal Form

For a database to be in Second Normal Form (2NF), the database must first already be in 1NF (you must normalize in order). Then, every column in the table that is not a key (i.e., a foreign key) must be dependent upon the primary key. You can normally identify a column that violates this rule when it has non-key values that are the same in multiple rows. Such values should be stored in their own table and related back to the original table through a key.

Going back to the cinema example, a movies table (Table 6.4) would have the director Martin Scorsese listed 20+ times. This violates the 2NF rule as the column(s) that store the directors’ names would not be keys and would not be dependent upon the primary key (the movie ID). The fix is to create a separate directors table that stores the directors’ information and assigns each director a primary key. To tie the director back to the movies, the director’s primary key would also be a foreign key in the movies table.

Looking at Table 6.5 (for actors in movies), both the movie name and the actor names are also in violation of the 2NF rule (they aren’t keys and they aren’t dependent on the table’s primary key). In the end, the cinema database in this minimal form requires four tables Image. Each director’s name, movie name, and actor’s name will be stored only once, and any non-key column in a table is dependent upon that table’s primary key. In fact, normalization could be summarized as the process of creating more and more tables until potential redundancies have been eliminated.

Image

Image To make the cinema database 2NF compliant (given the information being represented), four tables are necessary. The directors are represented in the movies table through the director ID key; the movies are represented in the movies-actors table through the movie ID key; and the actors are represented in the movies-actors table through the actor ID key.

To make a database 2NF compliant

1. Identify any non-key columns that aren’t dependent upon the table’s primary key.

Looking at Table 6.3, the username, first name, last name, email, and forum values are all non-keys (message ID is the only key column currently), and none are dependent upon the message ID. Conversely, the message subject, body, and date are also non-keys, but these do depend upon the message ID.

2. Create new tables accordingly Image.

Image

Image To make the forum database 2NF compliant, three tables are necessary.

The most logical modification for the forum database is to make three tables: users, forums, and messages.

In a visual representation of the database, create a box for each table, with the table name as a header and all of its columns (also called its attributes) underneath.

3. Assign or create new primary keys Image.

Image

Image Each table needs its own primary key.

Using the techniques described earlier in the chapter, ensure that each new table has a primary key. Here I’ve added a user ID field to the users table and a forum ID field to forums. These are both surrogate primary keys. Because the username field in the users table and the name field in the forums table must be unique for each record and must always have a value, you could have them act as the primary keys for their tables. However, this would mean that these values could never change (per the rules of primary keys) and the database will be a little slower, using text-based keys instead of numeric ones.

4. Create the requisite foreign keys and indicate the relationships Image.

Image

Image To relate the three tables, two foreign keys are added to the messages table, each key representing one of the other two tables.

The final step in achieving 2NF compliance is to incorporate foreign keys to link associated tables. Remember that a primary key in one table will often be a foreign key in another.

With this example, the user ID from the users table links to the user ID column in the messages table. Therefore, users has a one-to-many relationship with messages (because each user can post multiple messages, but each message can only be posted by one user).

Also, the two forum ID columns are linked, creating a one-to-many relationship between messages and forums (each message can only be in one forum, but each forum can have multiple messages).

There is no direct relationship between the users and forums tables.


Tip

Another way to test for 2NF is to look at the relationships between tables. The ideal is to create one-to-one or one-to-many situations. Tables that have a many-to-many relationship may need to be restructured.



Tip

Looking back at Image, the movies-actors table is an intermediary table, which turns the many-to-many relationship between movies and actors into two one-to-many relationships. You can often tell a table is acting as an intermediary when all of its columns are keys. In fact, in that table, the primary key could be the combination of the movie ID and the actor ID.



Tip

A properly normalized database should never have duplicate rows in the same table (two or more rows in which the values in every non–primary key column match).



Tip

To simplify how you conceive of the normalization process, remember that 1NF is a matter of inspecting a table horizontally, and 2NF is a vertical analysis (hunting for repeating values over multiple rows).


Third Normal Form

A database is in Third Normal Form (3NF) if it is in 2NF and every non-key column is mutually independent. If you followed the normalization process properly to this point, you may not have 3NF issues. You would know that you have a 3NF violation if changing the value in one column would require changing the value in another. In the forum example thus far, there aren’t any 3NF problems, but I’ll explain a hypothetical situation where this rule would come into play.

Take, as an example, a database about books. After applying the first two normal forms, you might end up with one table listing the books, another listing the authors, and a third acting as an intermediary table between books and authors, as there’s a many-to-many relationship there. If the books table listed the publisher’s name and address, that table would be in violation of 3NF Image. The publisher’s address isn’t related to the book, but rather to the publisher itself. In other words, that version of the books table has a column that’s dependent upon a non-key column (the publisher’s name).

Image

Image This database as currently designed fails the 3NF test.

As I said, the forum example is fine as is, but I’ll outline the 3NF steps just the same, showing how to fix the books example just mentioned.

To make a database 3NF compliant

1. Identify any fields in any tables that are interdependent.

As just stated, what to look for are columns that depend more upon each other than they do on the record as a whole. In the forum database, this isn’t an issue. Just looking at the messages table, each subject will be specific to a message ID, each body will be specific to that message ID, and so forth.

With a books example, the problematic fields are those in the books table that pertain to the publisher.

2. Create new tables accordingly.

If you found any problematic columns in Step 1, like address1, address2, city, state, and zip in a books example, you would create a separate publishers table. (Addresses would be more complex once you factor international publishers in.)

3. Assign or create new primary keys.

Every table must have a primary key, so add publisher ID to the new tables.

4. Create the requisite foreign keys that link any of the relationships Image.

Image

Image Going with a minimal version of a hypothetical books database, one new table is created for storing the publisher’s information.

Finally, add a publisher ID to the books table. This effectively links each book to its publisher.


Tip

Despite there being set rules for how to normalize a database, two different people could normalize the same example in slightly different ways. Database design does allow for personal preference and interpretations. The important thing is that a database has no clear and obvious NF violations. Any NF violation will likely lead to problems down the road.


Reviewing the Design

After walking through the normalization process, it’s best to review the design one more time. You want to make sure that the database stores all of the data you may ever need. Often the creation of new tables, thanks to normalization, implies additional information to record. For example, although the original focus of the cinema database was on the movies, now that there are separate actors and directors tables, additional facts about those people could be reflected in those tables.

With that in mind, although there are a number of additional columns that could be added to the forum database, particularly regarding the user, one more field should be added to the messages table. Because one message might be a reply to another, some method of indicating that relationship is required. One solution is to add a parent_id column to messages Image. If a message is a reply, its parent_id value will be the message_id of the original message (so message_id is acting as a foreign key to this same table). If a message has a parent_id of 0, then it’s a new thread, not a reply Image.

Image

Image To reflect a message hierarchy, the parent_id column is added to messages.

Image

Image How the parent_id field is used to track message threads.

After making any changes to the tables, you must run through the normal forms one more time to ensure that the database is still normalized. Finally, choose the column types and names, per the steps in Chapter 4 Image. Note that every integer column is UNSIGNED, the three primary key columns are also designated as AUTO_INCREMENT, and every column is set as NOT NULL.

Image

Image The final ERD for the forums database.

Once the schema is fully developed, it can be created in MySQL, using the commands shown in Chapter 5, “Introduction to SQL.” You’ll do that later in the chapter, after learning a few more things.


Tip

When you have a primary key–foreign key link (like forum_id in forums to forum_id in messages), both columns should be of the same type (in this case, TINYINT UNSIGNED NOT NULL).


Creating Indexes

Indexes are a special system that databases use to improve the performance of SELECT queries. Indexes can be placed on one or more columns, of any data type, effectively telling MySQL to pay particular attention to those values.

While the maximum number of indexes that a table can have varies, MySQL always guarantees that you can create at least 16 indexes for each table, and each index can incorporate up to 15 columns. While the need for a multicolumn index may not seem obvious, it will come in handy for searches frequently performed on the same combinations of columns (e.g., first and last name, city and state, etc.).

Although indexes are an integral part of any table, not everything needs to be indexed. While an index does improve the speed of reading from databases, it slows down queries that alter data in a database (because the changes need to be recorded in the index).

Indexes are best used on columns

• That are frequently used in the WHERE part of a query

• That are frequently used in an ORDER BY part of a query

• That are frequently used as the focal point of a JOIN (joins are discussed in the next chapter)

Generally speaking, you should not index columns that:

• Allow for NULL values

• Have a very limited range of values (such as just Y/N or 1/0)

MySQL has four types of indexes: INDEX (the standard), UNIQUE (which requires each row to have a unique value for that column), FULLTEXT (for performing FULLTEXT searches, also discussed in Chapter 7, “Advanced SQL and MySQL”), and PRIMARY KEY (which is just a particular UNIQUE index and one you’ve already been using). Note that a column should only ever have a single index on it, so choose the index type that’s most appropriate.

With this in mind, let’s continue designing the forum database by identifying appropriate indexes. Later in this chapter, the indexes will be defined when the tables are created in the database. To establish an index when creating a table, this clause is added to the CREATE TABLE command:

INDEX_TYPE index_name (columns)

The index name is optional. If no name is provided, the index will take the name of the column, or columns, to which it is applied. When indexing multiple columns, separate them by commas, and put them in the order from most to least important:

INDEX full_name (last_name, first_name)

You’ve already seen the syntax for creating indexes in Chapter 5. This command creates a table with a PRIMARY KEY index on the user_id field:

CREATE TABLE users (
user_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
first_name VARCHAR(20) NOT NULL,
last_name VARCHAR(40) NOT NULL,
email VARCHAR(40) NOT NULL,
pass CHAR(40) NOT NULL,
registration_date DATETIME NOT NULL,
PRIMARY KEY (user_id)
);

The last thing you should know about indexes are the implications of indexing multiple columns. If you add an index on col1, col2, and col3 (in that order), this effectively creates an index for uses of col1, col1 and col2 together, or on all three columns together. It does not provide an index for referencing just col2 or col3 or those two together.

To create indexes

1. Add a PRIMARY KEY index on all primary keys.

Each table should always have a primary key and therefore a PRIMARY KEY index. With the forums database, the specific columns to be indexed as primary keys are: forums.forum_id, messages.message_id, and users.user_id. (The syntax table_name.column_name is a way to refer to a specific column within a specific table.)

2. Add UNIQUE indexes to any columns whose values cannot be duplicated within the table.

The forums database has three columns that should always be unique or else there will be problems: forums.name, users.username, and users.email.

3. Add FULLTEXT indexes, if appropriate.

FULLTEXT indexes and FULLTEXT searching are discussed in the next chapter, so I won’t discuss this topic any more here, but as you’ll discover, there is one FULLTEXT index to be used in this database.

4. Add standard indexes to columns frequently used in a WHERE clause.

It requires some experience to know in advance which columns will often be used in WHERE clauses (and therefore ought to be indexed). With the forums database, one common WHERE stands out: when a user logs in, they’ll provide their email address and password to log in. The query to confirm the user has provided the correct information will be something like:

SELECT * FROM users WHERE pass=SHA1('provided_password') AND email='provided_email_address'

From this query, one can deduce that indexing the combination of the email address and password would be beneficial.

5. Add standard indexes to columns frequently used in ORDER BY clauses.

Again, in time such columns will stand out while designing the database. In the forums example, there’s one column left that would be used in ORDER BY clauses that isn’t already indexed: messages.date_entered. This column will frequently be used in ORDER BY clauses as the site will, by default, show all messages in the order they were entered.

6. Add standard indexes to columns frequently used in JOINs.

You may not know what a JOIN is now, and the topic is thoroughly covered in Chapter 7, but the most obvious candidates are the foreign key columns. Remember that a foreign key in Table B relates to the primary key in Table A. When selecting data from the database, a JOIN will be written based upon this relationship. For that JOIN to be efficient, the foreign key must be indexed (the primary key will already have been indexed). In the forums example, three foreign key fields in the messages table ought to be indexed: forum_id, parent_id, and user_id.

Table 6.6 lists all the indexes identified through these steps.

Table 6.6. The forum Database Indexes

Image


Tip

Indexes can be created after you already have a populated table. However, you’ll get an error and the index will not be created if you attempt to add a UNIQUE index to a column that has duplicate values.



Tip

MySQL uses the term KEY as synonymous for INDEX:

KEY full_name (last_name, first_name)



Tip

You can limit the length of an index to a certain number of characters, such as the first 10:

INDEX index_name (column_name(10))

You might do so in situations where the first X characters will be sufficiently useful in an ORDER BY clause.


Using Different Table Types

A MySQL feature uncommon in other database applications is the ability to use different types of tables (a table’s type is also called its storage engine). Each table type supports different features, has its own limits (in terms of how much data it can store), and even performs better or worse under certain situations. Still, how you interact with any table type—in terms of running queries—is consistent across them all.

Historically, the most important table type was MyISAM. Until version 5.5.5 of MySQL, MyISAM was the default table type on all operating systems (on Windows, the switch to a different default was made in an earlier version of MySQL). MyISAM tables are great for most applications, handling SELECTs and INSERTs very quickly. The MyISAM storage engine cannot handle transactions, though, which is its main drawback (transactions are covered in the next chapter). Between that feature and its lack of row-level locking (the entire table must be locked instead), MyISAM tables are more vulnerable to corruption and data loss should a crash occur.

As of MySQL version 5.5.5, MySQL’s new default storage engine, on all operating systems, is InnoDB. InnoDB tables can be used for transactions and they perform UPDATEs nicely. InnoDB tables also support foreign key constraints (discussed at the end of the chapter) and row-level locking. But the InnoDB storage engine is generally slower than MyISAM and requires more disk space on the server. Also, an InnoDB table does not support FULLTEXT indexes (covered in Chapter 7).

To specify the storage engine when you define a table, add a clause to the end of the creation statement:

CREATE TABLE tablename (
column1name COLUMNTYPE,
column2name COLUMNTYPE...

) ENGINE = type

If you don’t specify a storage engine when creating tables, MySQL will use the default type for that MySQL server.

This feature of MySQL is even more significant because you can mix the table types within the same database. This way you can best customize each table for optimum features and performance. To continue designing the forums database, the next step is to identify the storage engine to be used by each table.

To establish a table’s type

1. Find your MySQL server’s available table types Image:

SHOW ENGINES;

Image

Image To confirm what table types your MySQL installation supports, run this command (in the mysql client, here, or phpMyAdmin).

The SHOW ENGINES command, when executed on the MySQL server, will reveal not only the available storage engines but also the default storage engine. It will help to know this information when it’s time to choose a table type for your database.

2. If any of your tables requires a FULLTEXT index, make it a MyISAM table.

Again, FULLTEXT indexes and searches are discussed in the next chapter, but I’ll say now that the messages table in the forums example will require a FULLTEXT index. Therefore, this table must be MyISAM.

3. If any of your tables requires support for transactions, make it an InnoDB table.

Yes, again, transactions are discussed in the next chapter, but the storage engines ought to be determined now. Neither the forums nor users tables in the forums database will require transactions.

4. If neither of the above applies to a table, use the default storage engine.

Table 6.7 identifies the storage engines to be used by the tables in the forums database.

Table 6.7. The forum Database Table Types

Image


Tip

MySQL has several other table types, but MyISAM and InnoDB are the two most important, by far. The MEMORY type creates the table in memory, making it an extremely fast table but with absolutely no permanence.



Tip

At the time of this writing, the MySQL documentation claims that the InnoDB table type is overall faster than MyISAM. There are some politics involved with the table types, so I take this claim with a grain of salt.


Languages and MySQL

Chapter 1, “Introduction to PHP,” quickly introduced the concept of encodings. An HTML page or PHP script can specify its encoding, which dictates what characters, and therefore languages, are supported. The same is true for a MySQL database: by setting your database’s encoding, you can impact what characters can be stored in it. To see a list of encodings supported by your version of MySQL, run a SHOW CHARACTER SET command Image. Note that the phrase character set is being used in MySQL to mean encoding (which I’ll generally follow in this section to be consistent with MySQL).

Image

Image The list of character sets supported by this MySQL installation.

Each character set in MySQL has one or more collations. Collation refers to the rules used for comparing characters in a set. It’s like alphabetization, but takes into account numbers, spaces, and other characters as well. Collation is tied to the character set being used, reflecting both the kinds of characters present in that language and the cultural habits of people who generally use the language. For example, how text is sorted in English is not the same as it is in Traditional Spanish or in Arabic. Other considerations include: Are upper-and lowercase versions of a character considered to be the same or different (i.e., is it a case-sensitive comparison)? Or, how do accented characters get sorted? Is a space counted or ignored?

To view MySQL’s available collations, run this query Image, replacing charset with the proper value from the result in the last query Image:

SHOW COLLATION LIKE 'charset%'

Image

Image The list of collations available in the UTF-8 character set. The first one, utf_general_ci, is the default.

The results of this query will also indicate the default collation for that character set. The names of collations use a concluding ci to indicate case-insensitivity, cs for case-sensitivity, and bin for binary.

Generally speaking, I recommend using the UTF-8 character set, with its default collation. More importantly, the character set in use by the database should match that of your PHP scripts. If you’re not using UTF-8 in your PHP scripts, use the matching encoding in the database. If the default collation doesn’t adhere to the conventions of the language primarily in use, then adjust the collation accordingly.

In MySQL, the server as a whole, each database, each table, and even every string column can have a defined character set and collation. To set these values when you create a database, use

CREATE DATABASE name CHARACTER SET charset COLLATE collation

To set these values when you create a table, use

CREATE TABLE name (
column definitions
) CHARACTER SET charset COLLATE collation

To establish the character set and collation for a column, add the right clause to the column’s definition (you’d only use this for text types):

CREATE TABLE name (
something TEXT CHARACTER SET charset COLLATE collation
...)

In each of these cases, both clauses are optional. If omitted, a default character set or collation will be used.

Establishing the character set and collation when you define a database affects what data can be stored (e.g., you can’t store a character in a column if its encoding doesn’t support that character). A second issue is the encoding used to communicate with MySQL. If you want to store Chinese characters in a table with a Chinese encoding, those characters will need to be transferred using the same encoding. To do so within the mysql client, set the encoding using just

CHARSET charset

With phpMyAdmin, the encoding to be used is established in the application itself (i.e., written in the configuration file).

At this point in time, every aspect of the database design for the forums example has been covered, so let’s create that database in MySQL, including its indexes, storage engines, character sets, and collations.

To assign character sets and collations

1. Access MySQL using whatever client you prefer.

Like the preceding chapter, this one will also use the mysql client for all of its examples. You are welcome to use phpMyAdmin or other tools as the interface to MySQL.

2. Create the forum database Image:

CREATE DATABASE forum
CHARACTER SET utf8
COLLATE utf8_general_ci;
USE forum;

Image

Image The first steps are to create and select the database.

Depending upon your setup, you may not be allowed to create your own databases. If not, just use the database provided to you and add the following tables to it. Note that in the CREATE DATABASE command, the character set and collation are also defined. By doing so at this point, every table will use those settings.

3. Create the forums table Image:

CREATE TABLE forums (
forum_id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(60) NOT NULL,
PRIMARY KEY (forum_id),
UNIQUE (name)
) ENGINE = INNODB;

Image

Image Creating the first table.

It does not matter in what order you create your tables, but I’ll make the forums table first. Remember that you can enter your SQL queries over multiple lines for convenience.

This table only contains two columns (which will happen frequently in a normalized database). Because I don’t expect there to be many forums, the primary key is a really small type (TINYINT). If you wanted to add descriptions of each forum, a VARCHAR(255) or TINYTEXT column could be added to this table. This table uses the InnoDB storage engine.

4. Create the messages table Image:

CREATE TABLE messages (
message_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
parent_id INT UNSIGNED NOT NULL DEFAULT 0,
forum_id TINYINT UNSIGNED NOT NULL,
user_id MEDIUMINT UNSIGNED NOT NULL,
subject VARCHAR(100) NOT NULL,
body LONGTEXT NOT NULL,
date_entered DATETIME NOT NULL,
PRIMARY KEY (message_id),
INDEX (parent_id),
INDEX (forum_id),
INDEX (user_id),
INDEX (date_entered)
) ENGINE = MYISAM;

Image

Image Creating the second table.

The primary key for this table has to be big, as it could have lots and lots of records. The three foreign key columns—forum_id, parent_id, and user_id—will all be the same size and type as their primary key counterparts. The subject is limited to 100 characters and the body of each message can be a lot of text. The date_entered field is a DATETIME type.

Unlike the other two tables, this one is of the MyISAM type.

5. Create the users table Image:

CREATE TABLE users (
user_id MEDIUMINT UNSIGNED NOT
NULL AUTO_INCREMENT,
username VARCHAR(30) NOT NULL,
pass CHAR(40) NOT NULL,
first_name VARCHAR(20) NOT NULL,
last_name VARCHAR(40) NOT NULL,
email VARCHAR(60) NOT NULL,
PRIMARY KEY (user_id),
UNIQUE (username),
UNIQUE (email),
INDEX login (pass, email)
) ENGINE = INNODB;

Image

Image The database’s third and final table.

Most of the columns here mimic those in the sitename database’s users table, created in the preceding two chapters. The pass column is defined as CHAR(40), because the SHA1( ) function will be used and it always returns a string 40 characters long (see Chapter 5).

This table uses the InnoDB engine.

6. If desired, confirm the database’s structure Image:

SHOW TABLES;
SHOW COLUMNS FROM forums;
SHOW COLUMNS FROM messages;
SHOW COLUMNS FROM users;

Image

Image Check the structure of any database or table using SHOW.

The SHOW command reveals information about a database or a table. This step is optional because MySQL reports on the success of each query as it is entered. Still, it’s always nice to remind yourself of a database’s structure.


Tip

Collations in MySQL can also be specified within a query, to affect the results:

SELECT ... ORDER BY column COLLATE collation
SELECT ... WHERE column LIKE 'value' COLLATE collation



Tip

The CONVERT( ) function can convert text from one character set to another.



Tip

You can change the default character set or collation for a database or table using an ALTER command, discussed in Chapter 7.



Tip

Because different character sets require more space to represent a string, you will likely need to increase the size of a column for UTF-8 characters. Do this before changing a column’s encoding so that no data is lost.


Time Zones and MySQL

Chapter 5 discussed how to use NOW( ) and other date- and time-related functions. That chapter explained that these functions reflect the time on the server. Therefore, values stored in a database using these functions are also storing the server’s time. That may not sound like a problem, but say you move your site from one server to another: you export all the data, import it into the other, and everything’s fine... unless the two servers are in different time zones, in which case all of the dates are now technically off. For some sites, such an alteration wouldn’t be a big deal, but what if your site features paid memberships? That means some people’s membership might expire several hours early and for others, several hours late! At the end of the day, the goal of a database is to reliably store information.

The solution to this particular problem is to store dates and times in a time zone–neutral way. Doing so requires something called UTC (Coordinated Universal Time, and, yes, the abbreviation doesn’t exactly match the term). UTC, like Greenwich Mean Time (GMT), provides a common point of origin, from which all times in the world can be expressed as UTC plus or minus some hours and minutes (Table 6.8).

Table 6.8. UTC Offsets

Image

Fortunately, you don’t have to know these values or perform any calculations in order to determine UTC for your server. Instead, the UTC_DATE( ) function returns the UTC date; UTC_TIME( ) returns the current UTC time; and UTC_TIMESTAMP( ) returns the current date and time.

Once you have stored a UTC time, you can retrieve the time adjusted to reflect the server’s or the user’s location. To change a date and time from any one time zone to another, use CONVERT_TZ( ) Image:

CONVERT_TZ(dt, from, to)

Image

Image A conversion of the current UTC date and time to the American Eastern Daylight Time (EDT).

The first argument is a date and time value, like the result of a function or what’s stored in a column. The second and third arguments are named time zones. In order to use this function, the list of time zones must already be stored in MySQL, which may or may not be the case for your installation (see the sidebar). If you see NULL results Image, check out the MySQL manual for how to install the time zones on your server.

Image

Image The CONVERT_TZ( ) function will return NULL if it references an invalid time zone or if the time zones haven’t been installed in MySQL (which is the case here).

To use this information, let’s start populating the forums database, recording the message posted date and time using UTC.

To work with UTC

1. Access the forums database using whatever client you prefer.

Like the preceding chapter, this one will also use the mysql client for all of its examples. You are welcome to use phpMyAdmin or other tools as the interface to MySQL.

2. If necessary, change the encoding to UTF-8 Image:

CHARSET utf8;

Image

Image The character set used to communicate with MySQL should match that used in the database.

Because the database uses UTF-8 as its character set, the communication with the database should use the same. This line, explained in the previous section of the chapter, does exactly that. Note that you only need to do this when using the mysql client. Also, if you’re not using UTF-8, change the command accordingly.

3. Add some new records to the forums table Image:

INSERT INTO forums (name) VALUES
('MySQL'), ('PHP'), ('Sports'),
('HTML'), ('CSS'), ('Kindling'),

Image

Image Adding records to the forums table.

Since the messages table relies on values retrieved from both the forums and users tables, those two tables need to be populated first. With this INSERT command, only the name column must be provided a value (the table’s forum_id column will be given an automatically incremented integer by MySQL).

4. Add some records to the users table Image:

INSERT INTO users (username, pass,
first_name, last_name, email) VALUES
('troutster', SHA1('mypass'),
'Larry', 'Ullman', '[email protected]'),
('funny man', SHA1('monkey'),
'David', 'Brent', '[email protected]'),
('Gareth', SHA1('asstmgr'), 'Gareth',
'Keenan', '[email protected]'),

Image

Image Adding records to the users table.

If you have any questions on the INSERT syntax or use of the SHA1( ) function here, see Chapter 5.

5. Add new records to the messages table Image:

SELECT * FROM forums;
SELECT user_id, username FROM users;
INSERT INTO messages (parent_id, forum_id, user_id, subject, body, date_entered) VALUES
(0, 1, 1, 'Question about normalization.', 'I''m confused about normalization. For the second normal form (2NF), I read...', UTC_TIMESTAMP( )),
(0, 1, 2, 'Database Design', 'I''m creating a new database and am having problems with the structure. How many tables should I have?...', UTC_TIMESTAMP( )),
(2, 1, 2, 'Database Design', 'The number of tables your database includes...', UTC_TIMESTAMP( )),
(0, 1, 3, 'Database Design', 'Okay, thanks!', UTC_TIMESTAMP( )),
(0, 2, 3, 'PHP Errors', 'I''m using the scripts from Chapter 3 and I can''t get the first calculator example to work. When I submit the form...', UTC_TIMESTAMP( ));

Image

Image Populating the messages table requires knowing foreign key values from users and forums.

Because two of the fields in the messages table (forum_id and user_id) relate to values in other tables, you need to know those values before inserting new records into this table. For example, when the troutster user creates a new message in the MySQL forum, the INSERT will have a forum_id of 1 and a user_id of 1.

This is further complicated by the parent_id column, which should store the message_id to which the new message is a reply. The second message added to the database will have a message_id of 2, so replies to that message need a parent_id of 2.

With your PHP scripts—once you’ve created an interface for this database, this process will be much easier, but it’s important to comprehend the theory in SQL terms first.

For the date_entered field, the value returned by the UTC_TIMESTAMP( ) function will be used. Using the UTC_TIMESTAMP( ) function, the record will store the UTC date and time, not the date and time on the server.

6. Repeat Steps 1 through 3 to populate the database.

The rest of the examples in this chapter and the next will use the populated database. You’ll probably want to download the SQL commands from the book’s corresponding Web site, although you can populate the tables with your own examples and then just change the queries in the rest of the chapter accordingly.

7. View the most recent record in the messages table, using the stored date and time Image:

SELECT message_id, subject, date_entered FROM messages
ORDER BY date_entered DESC LIMIT 1;

Image

Image The record that was just inserted, which reflects a time four hours ahead (the server is UTC-4).

As you can see in the figure and the table definition, UTC times are stored just the same as non-UTC times. What’s not obvious in the figure is that the record just inserted reflects a time four hours ahead of the server (because my particular server is in a time zone four hours behind UTC).

8. Retrieve the same record converting the date_entered to your time zone Image:

SELECT message_id, subject,
CONVERT_TZ(date_entered, 'UTC', 'America/New_York') AS local
FROM messages ORDER BY date_entered DESC LIMIT 1;

Image

Image The UTC-stored date and time converted to my local time.

Using the CONVERT_TZ( ) function, you can convert any date and time to a different time zone. For the from time zone, use UTC. For the to time zone, use yours.

If you get a NULL result Image, either the name of one of your time zones is wrong or MySQL hasn’t had its time zones loaded yet (see the sidebar).


Tip

However you decide to handle dates, the key is to be consistent. If you decide to use UTC, then always use UTC.



Tip

UTC is also known as Zulu time, represented by the letter Z.



Tip

Besides being time zone and daylight saving time agnostic, UTC is also more accurate. It factors in irregular leap seconds that compensate for the inexact movement of the planet.


Foreign Key Constraints

A feature of the InnoDB table type, not supported in other storage engines, is the ability to apply foreign key constraints. When you have related tables, the foreign key in Table B relates to the primary key in Table A (for ease of understanding, it may help to think of Table B as the child to Table A’s parent). For example, in the forums database, the messages.user_id is tied to users.user_id. If the administrator were to delete a user account, the relationship between those tables would be broken (because the messages table would have records with a user_id value that doesn’t exist in users). Foreign key constraints set rules as to what should happen when a break would occur, including preventing that break.

The syntax for creating a foreign key constraint is:

FOREIGN KEY (item_name) REFERENCES table (column)

(This goes within a CREATE TABLE or ALTER TABLE statement.)

The item name is the foreign key column in the current table. The table(column) clause is a reference to the parent table column to which this foreign key should be constrained. If you just use the above, thereby only identifying the relationship, without stating what should happen when the constraint would be broken, MySQL will throw an error if you attempt to delete the parent record while child records exist Image. MySQL will also throw an error if you attempt to create a child record using a parent ID that doesn’t exist Image.

Image

Image This error indicates that MySQL is preventing a query from deleting a parent record because the record is constrained to one or more existing children records.

Image

Image Foreign key constraints also affect INSERT queries.

You can dictate what alternative actions should occur by following the above syntax with one or both of these:

ON DELETE action
ON UPDATE action

There are five action options, but two—RESTRICT and NO ACTION—are synonymous and also the default (i.e., the same as if you don’t specify the action at all). A third action option—SET DEFAULT—doesn’t work (don’t ask me, ask the MySQL manual!). That leaves CASCADE and SET NULL. If the action set is SET NULL, the removal of a parent record will result in setting the corresponding foreign keys in the child table to NULL. If that table defines that column as NOT NULL (which it almost always should), deletion of the parent record will trigger an error.

The CASCADE action is the most useful option. It tells the database to apply the same changes to the related table. With this instruction, if you delete the parent record, MySQL will also delete the child records with that parent ID as its foreign key.

As only the InnoDB table type supports foreign key constraints, both tables in the relationship must be of the InnoDB type. Also, in order for MySQL to be able to compare the foreign key-primary key values, the related columns must be of equitable types. This means that numeric columns must be the same type and size; text columns must use the same character set and collation.

With the forums example, it’s not possible to use foreign key constraints as the sole table related to another—messages relates to both forums and users—must use the MyISAM table type (to support FULLTEXT searches). Instead, let’s take a look at a new hypothetical example for banking Image.

Image

Image The banking database could be used for virtual banking.

The customers table stores all of the information particular to a customer. It would logically also store contact information and so forth. The accounts table stores the accounts for each customer, including the type (Checking or Savings) and balance. Each customer may have more than one account, but each account is associated with only one customer (for a bit of simplicity). In the real world, the table might also store the date the account was opened and use a BIGINT as the balance (thereby representing all transactions in cents instead of dollars with decimals). Finally, the transactions table stores every movement of money from one account to another. Again, to make the example a bit easier to follow, the example assumes that only accounts within this same system will interact. Note that the transactions table has two one-to-many relationships with accounts (not one many-to-many). Each transaction’s to_account_id value will be associated with a single account, but each account could be the “to” account multiple times. The same applies to the “from” account. Finally, foreign key constraints are applied to preserve the integrity of the data.

In this next series of steps, you’ll create and populate this database, paying attention to the constraints. In the next chapter, this same database will be used to demonstrate transactions and encryption.

To create foreign key constraints

1. Access MySQL using whatever client you prefer.

Like the preceding chapter, this one will also use the mysql client for all of its examples. You are welcome to use phpMyAdmin or other tools as the interface to MySQL.

2. Create the banking database Image:

CREATE DATABASE banking
CHARACTER SET utf8
COLLATE utf8_general_ci;
USE banking;

Image

Image A new database is being created for this example.

As always, depending upon your setup, you may not be allowed to create your own databases. If not, just use the database provided to you and add the following tables to it.

3. If necessary, change the communication encoding to UTF-8:

CHARSET utf8;

4. Create the customers table Image:

CREATE TABLE customers (
customer_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
first_name VARCHAR(20) NOT NULL,
last_name VARCHAR(40) NOT NULL,
PRIMARY KEY (customer_id),
INDEX full_name (last_name, first_name)
) ENGINE = INNODB;

Image

Image Creating the customers table.

The customers table just stores the customer’s ID—the primary key—and name (in two columns). An index is also placed on the full name, in case that might be used in ORDER BY and other query clauses. So that the database can use foreign key constraints, every table will use the InnoDB storage engine.

5. Create the accounts table Image:

CREATE TABLE accounts (
account_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
customer_id INT UNSIGNED NOT NULL,
type ENUM('Checking', 'Savings') NOT NULL,
balance DECIMAL(10,2) UNSIGNED NOT NULL DEFAULT 0.0,
PRIMARY KEY (account_id),
INDEX (customer_id),
FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE = INNODB;

Image

Image Creating the accounts table.

The accounts table stores the account ID, customer ID, account type, and balance. The customer_id column has an index on it, as it will be used in JOINs (in Chapter 7). More importantly, the column is constrained to customers.customer_id, thereby protecting both tables. Even though NO ACTION is the default constraint, I’ve included it in the definition for added clarity.

6. Create the transactions table Image:

CREATE TABLE transactions (
transaction_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
to_account_id INT UNSIGNED NOT NULL,
from_account_id INT UNSIGNED NOT NULL,
amount DECIMAL(5,2) UNSIGNED NOT NULL,
date_entered TIMESTAMP NOT NULL,
PRIMARY KEY (transaction_id),
INDEX (to_account_id),
INDEX (from_account_id),
INDEX (date_entered),
FOREIGN KEY (to_account_id) REFERENCES accounts (account_id)
ON DELETE NO ACTION ON UPDATE NO ACTION,
FOREIGN KEY (from_account_id) REFERENCES accounts (account_id)
ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE = INNODB;

Image

Image Creating the third, and final, table: transactions.

The final table will be used to record all movements of monies among the accounts. To do so, it stores both account IDs—the “to” and “from,” the amount, and the date/time. Indexes are added accordingly, and both account IDs are constrained to the accounts table.

7. Populate the customers and accounts tables Image:

INSERT INTO customers (first_name, last_name)
VALUES ('Sarah', 'Vowell'), ('David', 'Sedaris'), ('Kojo', 'Nnamdi'),
INSERT INTO accounts (customer_id, balance)
VALUES (1, 5460.23), (2, 909325.24), (3, 892.00);

Image

Image Three records are added to both the customers and accounts tables.

First, sample data is entered into the first two tables (the third will be used in the next chapter). Note that because the accounts.type column is defined as an ENUM NOT NULL, if no value is provided for that column, the first item in the ENUM definition—Checking—will be used.

8. Attempt to put data into the accounts table for which there is no customer Image:

INSERT INTO accounts (customer_id, type, balance)
VALUES (10, 'Savings', 200.00);

Image

Image Again, as in Image, the constraint denies the INSERT query due to an invalid value from the parent table.

The foreign key constraint present in the accounts table will prevent an account being created without a valid customer ID (a pretty useful check in the real world).

9. Attempt to delete a record from the customers table for which there is an accounts record Image:

DELETE FROM customers
WHERE customer_id=2;

Image

Image Because the customer with an ID of 2 has one or more records in the accounts table, the customers record cannot be deleted.

The constraint will also prevent the deletion of customer records when that customer still has an account.

Despite the constraint, you could still delete a customer record if the customer does not have any records in the accounts table.


Tip

To actually delete constrained records, you must first delete all the children records, and then the parent record.



Tip

Foreign key constraints require that all columns in the constraint be indexed. Normal database design would suggest this is the case, but if the correct indexes do not exist, MySQL will create them when the constraint is defined.



Tip

Similar to constraints are triggers. Simply put, a trigger is a way of telling the database “when X happens to this table, do Y.” For example, when inserting a record in Table A, another record might be created or updated in Table B. See the MySQL manual for more on triggers.


Review and Pursue

If you have any problems with the review questions or the pursue prompts, turn to the book’s supporting forum (www.LarryUllman.com/forums/).

Review

• Why is normalization important?

• What are the two types of keys?

• What are the three types of table relationships?

• How do you fix the problem of a many-to-many relationship between two tables?

• What are the four types of indexes? What general types of columns should be indexed? What general types of columns should not be indexed?

• What are the two most common MySQL table types? What is the default table type for your MySQL installation?

• What is a character set? What is a collation? What impact does the character set have on the database? What impact does the collation have? What character set and collation are you using?

• What is UTC? How do you find the UTC time in MySQL? How do you convert from UTC to another time zone’s time?

• What are foreign key constraints? What table type supports foreign key constraints?

Pursue

• You may want to consider downloading, installing, and learning to use the MySQL Workbench application. It can be quite useful.

• If you don’t fully grasp the process of normalization—and that’s perfectly understandable—search for additional tutorials online or ask a question in my support forums.

• Design your own database using the information presented here.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.8.222