Suppose you're a company that sells food items, like Northwind. You have a Products table with all the items you sell, with one row for each item: ice cream, yogurt, nondairy creamer, and so on. Among the columns, you include a field to store all the flavors for that item: vanilla, blueberry, Irish crème (see Table 2.14).
Products | |||
---|---|---|---|
Product | Supplier | Flavors | <<other fields>> |
Ice cream | Frozen Delights | Vanilla, Chocolate, Strawberry | |
Yogurt | Heavenly Cultures | Vanilla, Blueberry, Asparagus, Strawberry | |
Nondairy Creamer | Foodstuffs Inc. | Vanilla, Asparagus, Hazelnut |
Authors | |||
---|---|---|---|
Last Name | First Name | Major Works | <<other fields>> |
Dostoevsky | Fyodor | The Brothers Karamazov, Crime and Punishment, The Idiot, The Gambler | |
Turgenev | Ivan | Fathers and Sons | |
Tolstoy | Leo | War and Peace, Anna Karenina |
Or consider a table of Russian authors. Each row is dedicated to one author. You have fields for first name, last name, and birthday. You also have a column for the author's major works (see Table 2.14).
The Flavors field in the Products table and the Major Works field in the Authors table are known as multivalue fields. These are fields that store more than one value in each row. They're usually easy to identify because, as in the examples, they use commas to separate the various values. The fields are also usually named in the plural—flavors, characters, works, and so on.
I think multivalue fields are among the most difficult database problems to handle, both conceptually and practically. At first glance, the Flavors field in the Products table and the Major Works field in the Authors table both seem like fine, even elegant, solutions to a common problem. If you know that the number of values for a single characteristic—product flavors, authors' major works, and so on—will vary among records, why try to predict the number of columns you need? Instead of using three—or six, or eight, or who knows how many—columns to store values, wouldn't the best solution be to store all values in a single field and separate them by commas?
But you are designing your database so you can quickly and efficiently retrieve the data you need. Suppose you decide to discontinue an experimental flavor—say, asparagus—from all your products. If Asparagus is always a discrete record, it's easy to delete all occurrences of Asparagus from your database. If Asparagus is but one value in a multivalue field, however, you need to edit each of those records. That task is more difficult and, equally important, more likely to result in error. The threat to data integrity is substantial.
Or suppose you want to retrieve all records on Dostoevsky's Crime and Punishment. Surrounded as it is between The Brothers Karamazov and The Idiot, it will be inefficient to find information on that one book. Or say you want to sort all the books by title. A multivalue field will make it difficult to put all the books in alphabetical order.
Put simply, multivalue fields are difficult to edit, sort, and delete. They violate the essence of good relational database design. You need to get rid of them.
Earlier you saw that the Programs table had a multipart field, Character Names/Character Occupations. To resolve this multipart field, you created separate Character Names and Character Occupations fields (see Table 2.15).
Multivalue Fields | |||
---|---|---|---|
Program Name | Character Names | Character Occupations | <<other fields>> |
The Andy Griffith Show | Andy Taylor, Barney Fife, Gomer Pyle | Sheriff, Deputy Sheriff, Gas Station Attendant | |
Happy Days | Richie Cunningham, Arthur Fonzarelli | Student, Motorcyclist | |
Bewitched | Samantha Stevens, Darrin Stevens | Witch, Ad Executive | |
The Lucy Show | Lucy Carmichael, Vivian Bagley | Widow, Divorcee | |
Married with Children | Al Bundy, Peg Bundy | Shoe Salesman, Homemaker |
Splitting the field into two columns resolved the problem of having two different types of data in the same field. But the Character Names and Character Occupations remain multivalue fields. To repeat, they have the two telltale signs of multivalue fields: The values are separated by commas, and the field name is plural.
So how do you fix these multivalue fields?
First, take both the Character Names and Character Occupations fields out of the Programs table and put them in a new table called Roles. Second, take the primary key from the Programs table and put it in the Roles table as well. Finally, let's split the new Character Name (now singular) field into Last Name and First Name fields to eliminate the multipart field. A separate field is assigned to the character occupation (see Table 2.16).
Multivalue Fields Resolved | |||
---|---|---|---|
ProgramID | Last Name | First Name | Occupation |
1 (Andy Griffith Show) | Taylor | Andy | Sheriff |
1 (Andy Griffith Show) | Fife | Barney | Deputy Sheriff |
1 (Andy Griffith Show) | Pyle | Gomer | Filling station attendant |
2 (Happy Days) | Cunningham | Richie | Student |
2 (Happy Days) | Fonzarelli | Arthur | Motorcyclist |
This solution seems counterintuitive; it provides a complex, even convoluted, solution to what seems to be a simple problem.
But using keys to resolve multivalue fields gives you a small taste of how you are going to integrate data in your database. As you'll see, putting the primary key of the original table into the new table enables you to establish a relationship between the original and new tables. By using primary keys to represent the values in each record, you make it possible to bring together values from different tables and integrate the information in your database.
Q&A
|
18.119.248.149