Chapter 2. The Entity-Relationship Model of a Database

Let us begin our discussion of database design by looking at an informal database model called the entity-relationship model . This model of a relational database provides a useful perspective, especially for the purposes of the initial database design.

I will illustrate the general principles of this model with the LIBRARY database example, which I will carry through the entire book. This example database is designed to hold data about the books in a certain library. The amount of data we will use will be kept artificially small—just enough to illustrate the concepts. (In fact, at this point, you may want to take a look at the example database. For details on downloading it from the Internet, or on using Microsoft Access to create it yourself, see Appendix D .) In the next chapter, we will actually implement the entity-relationship (E/R) model for our LIBRARY database.

What Is a Database?

A database may be defined as a collection of persistent data. The term persistent is somewhat vague, but is intended to imply that the data has a more-or-less independent existence or that it is semipermanent. For instance, data stored on paper in a filing cabinet, or stored magnetically on a hard disk, CD-ROM, or computer tape is persistent, whereas data stored in a computer’s memory is generally not considered to be persistent. (The term permanent is a bit too strong, since very little in life is truly permanent.)

Of course, this is a very general concept. Most real-life databases consist of data that exist for a specific purpose and are thus persistent.

Entities and Their Attributes

The purpose of a database is to store information about certain types of objects. In database language, these objects are called entities . For example, the entities of the LIBRARY database include books, authors, and publishers.

It is very important at the outset to make a distinction between the entities that are contained in a database at a given time and the world of all possible entities that the database might contain. The reason this is important is that the contents of a database are constantly changing and we must make decisions based not just on what is contained in a database at a given time, but on what might be contained in the database in the future.

For example, at a given time, our LIBRARY database might contain 14 book entities. However, as time goes on, new books may be added to the database, and old books may be removed. Thus, the entities in the database are constantly changing. If, for example, based on the fact that the 14 books currently in the database have different titles, we decide to use the title to identify each book uniquely, we may be in for some trouble when, later on, a different book arrives at the library with the same title as a previous book.

The world of all possible entities of a specific type that a database might contain is referred to as an entity class . We will use italics to denote entity classes. Thus, for instance, the world of all possible books is the Books entity class, and the world of all possible authors is the Authors entity class.

We emphasize that an entity class is just an abstract description of something, whereas an entity is a concrete example of that description. The entity classes in our very modest LIBRARY example database are (at least so far):

  • Books

  • Authors

  • Publishers

The set of entities of a given entity class that are in the database at a given time is called an entity set . To clarify the difference between entity set and entity class with an example, consider the BOOKS table in the LIBRARY database, which is shown in Table 2-1.

Table 2-1. The BOOKS table from the LIBRARY database

ISBN

Title

Price

0-12-333433-3

On Liberty

$25.00

0-103-45678-9

Iliad

$25.00

0-91-335678-7

Faerie Queene

$15.00

0-99-999999-9

Emma

$20.00

1-22-233700-0

Visual Basic

$25.00

1-1111-1111-1

C++

$29.95

0-91-045678-5

Hamlet

$20.00

0-555-55555-9

Macbeth

$12.00

0-99-777777-7

King Lear

$49.00

0-123-45678-0

Ulysses

$34.00

0-12-345678-9

Jane Eyre

$49.00

0-11-345678-9

Moby-Dick

$49.00

0-321-32132-1

Balloon

$34.00

0-55-123456-9

Main Street

$22.95

The entities are books, the entity class is the set of all possible books, and the entity set (at this moment) is the specific set of 14 books listed in the BOOKS table. As mentioned, the entity set will change as new books (book entities) are added to the table or old ones are removed. However, the entity class does not change.

Incidentally, if you are familiar with object-oriented programming concepts, you will recognize the concept of a class . In object-oriented circles, we would refer to an entity class simply as a class and an entity as an object.

The entities of an entity class possess certain properties, which are called attributes. We usually refer to these attributes as attributes of the entity class itself. It is up to the database designer to determine which attributes to include for each entity class. It is these attributes that will correspond to the fields in the tables of the database.

The attributes of an entity class serve three main purposes:

  • Attributes are used to include information that we want in the database. For instance, we want the title of each book to be included in the database, so we include a Title attribute for the Books entity class.

  • Attributes are used to help uniquely identify individual entities within an entity class. For instance, we may wish to include a publisher’s ID-number attribute for the Publishers entity class, to uniquely identify each publisher. If combinations of other attributes (such as the publisher’s name and publisher’s address) will serve this purpose, the inclusion of an identifying attribute is not strictly necessary, but it can still be more efficient to include such an attribute, since often we can create a much shorter identifying attribute. For instance, a combination of title, author, publisher, and copyright date would make a very awkward and inefficient identifying attribute for the Books entity class—much more so than the ISBN attribute.

  • Attributes are used to describe relationships between the entities in different entity classes. We will discuss this subject in more detail later.

For now, let us list the attributes for the LIBRARY database that we need to supply information about each entity and to identify each entity uniquely. I will deal with the issue of describing relationships later. Remember that this example is kept deliberately small—in real life we would no doubt include many other attributes.

The attributes of the entity classes in the LIBRARY database are:

Books attributes

Title
ISBN
Price

Authors attributes

AuName
AuPhone
AuID

Publishers attributes

PubName
PubPhone
PubID

Let us make a few remarks about these attributes.

  • From these attributes alone, there is no direct way to tell who is the author of a given book, since there is no author-related attribute in the Books entity class. A similar statement applies to determining the publisher of a book. Thus, we will need to add more attributes in order to describe these relationships.

  • The ISBN (International Standard Book Number) of a book serves to identify the book uniquely, since no two books have the same ISBN (at least in theory). On the other hand, the Title alone does not uniquely identify the book, since many books have the same title. In fact, the sole purpose of ISBNs (here and in the real world) is to identify books uniquely. Put another way, the ISBN is a quintessential identifying attribute!

  • We may reasonably assume that no two publishers in the world have the same name and the same phone number. Hence, these two attributes together uniquely identify the publisher. Nevertheless, we have included a publisher’s ID attribute to make this identification more convenient.

Let us emphasize that an entity class is a description, not a set. For instance, the entity class Books is a description of the attributes of the entities that we identify as books. A Books entity is the “database version” of a book. It is not a physical book, but rather a book as defined by the values of its attributes. For instance, the following is a Books entity:

Title = Gone With the Wind
ISBN = 0-12-345678-9
Price = $24.00

Now, there is certainly more than one physical copy in existence of the book Gone With the Wind, with this ISBN and price, but that is not relevant to our discussion. As far as the database is concerned, there is only one Books entity defined by:

Title = Gone With the Wind
ISBN = 0-12-345678-9
Price = $24.00

If we need to model multiple copies of physical books in our database (as a real library would do), then we must add another attribute to the Books entity class, perhaps called CopyNumber. Even still, a book entity is just a set of attribute values.

These matters emphasize the point that it is up to the database designer to ensure that the set of attributes for an entity uniquely identify the entity from among all other entities that may appear in the database (now and forever, if possible!). For instance, if the Books entity class included only the Title and Price attributes, there would certainly be cause to worry that someday we might want to include two books with the same title and price. While this is allowed in some database-application programs, it can lead to great confusion and is definitely not recommended. Moreover, it is forbidden by definition in a true relational database. In other words, no two entities can agree on all of their attributes. (This is allowed in Microsoft Access, however.)

Keys and Superkeys

A set of attributes that uniquely identifies any entity from among all possible entities in the entity class that may appear in the database is called a superkey for the entity class. Thus, the set {ISBN} is a superkey for the Books entity class, and the sets {PubID} and {PubName, PubPhone} are both superkeys for the Publishers entity class.

Note that there is a bit of subjectivity in this definition of superkey, since it depends ultimately on our decision about which entities may ever appear in the database, and this is probably something of which we cannot be absolutely certain. Consider, for instance, the Books entity class. There is no law that says all books must have an ISBN (and many books do not). Also, there is no law that says that two books cannot have the same ISBN. (The ISBN is assigned, at least in part, by the publisher of the book.) Thus, the set {ISBN} is a superkey only if we are willing to accept the fact that all books that the library purchases have distinct ISBNs or that the librarian will assign a uniqueersatz ISBN to any books that do not have a real ISBN.

It is important to emphasize that the concept of a superkey applies to entity classes, and not entity sets. Although we can define a superkey for an entity set, this is of limited use, since what may serve to identify the entities uniquely in a particular entity set may fail to do so if we add new entities to the set. To illustrate, the Title attribute does serve to identify each of the 14 books uniquely in the BOOKS table. Thus, {Title} is a superkey for the entity set described by the BOOKS table. However, {Title} is not a superkey for the Books entity class, since there are many distinct books with the same title.

We have remarked that {ISBN} is a superkey for the Books entity class. Of course, so is {Title, ISBN}, but it is wasteful and inefficient to include the Title attribute purely for the sake of identification.

Indeed, one of the difficulties with superkeys is that they may contain more attributes than is absolutely necessary to identify any entity uniquely. It is more desirable to work with superkeys that do not have this property. A superkey is called a key when it has the property that no proper subset of it is also a superkey. Thus, if we remove an attribute from a key, the resulting set is no longer a superkey. Put more succinctly, a key is a minimal superkey. Sometimes keys are called candidate keys, since it is usually the case that we want to select one particular key to use as an identifier. This particular choice is referred to as the primary key . The primary keys in the LIBRARY database are ISBN, AuID, and PubID.

I should remark that a key may contain more than one attribute, and different keys may have different numbers of attributes. For instance, it is reasonable to assume that both {SocialSecurityNumber} and {FullName, FullAddress, DateofBirth} are keys for a US Citizens entity class.

Relationships Between Entities

If we are going to model a database as a collection of entity sets (tables), then we also need to describe the relationships between these entity sets. For instance, an author relationship exists between a book and the authors who wrote that book. We might call this relationship WrittenBy. Thus, Hamlet is WrittenBy Shakespeare.

It is possible to draw a diagram, called an entity-relationship diagram, or E/R diagram,to illustrate the entity classes in a database model, along with their attributes and relationships. Figure 2-1 shows the LIBRARY E/R diagram, with an additional entity class called Contributors (a contributor may be someone who contributes to or writes only a very small portion of a book, and thus may not be accorded all of the rights of an author, such as a royalty).

The LIBRARY entity-relationship diagram
Figure 2-1. The LIBRARY entity-relationship diagram

Note that each entity class is denoted by a rectangle, and each attribute by an ellipse. The relations are denoted by diamonds. We have included the Contributors entity class in this model merely to illustrate a special type of relationship. In particular, since a contributor is considered an author, there is an IsA relationship between the two entity classes.

The model represented by an E/R diagram is sometimes referred to as a semantic model since it describes much of the meaning of the database.

Types of Relationships

Referring to Figure 2-1, the symbols 1 and ∞ represent the type of relationship between the corresponding entity classes. (The symbol ∞ is read “many.”) Relationships can be classified into three types. For instance, the relationship between Books and Authors is many-to-many, meaning that a book may have many authors and an author may write many books. On the other hand, the relationship from Publishers to Books is one-to-many , meaning that one publisher may publish many books, but a book is published by at most one publisher (or so we will assume).

One-to-one relationships, where each entity on each side is related to at most one entity on the other side of the relationship, are fairly rare in database design. For instance, consider the Contributors-Authors relationship, which is one-to-one. We could replace the Contributors class by a contributor attribute of the Authors class, thus eliminating the need for a separate class and a separate relationship. On the other hand, if the Contributors class had several attributes that are not shared by the Authors class, then a separate class may be appropriate.

In Chapter 3 we will actually implement the full E/R model for our LIBRARY database.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.168.83