5
Translating User Needs into Data Models

Chapter 4, “Understanding User Needs,” discussed ways that you can work with customers to gain a full understanding of the problem at hand. The result should be a big pile of facts, goals, needs, and requirements that should be part of the new database and its surrounding ecosystem. You might already have made some connections among various parts of this information, but mostly it should be a big heap of requirements that doesn't say too much about the database's design and construction.

This kind of pile of information is sometimes called a contextual list. It's basically just a list of important stuff (although it might be fairly elaborate and include requirements documents, diagrams, charts, and all sorts of other supporting documentation).

The next step in turning the conceptual list into a database is converting it into a more formal model. You can compare the formal model to the contextual list and ensure that the model can handle all of your requirements.

You can also use the model to verify that you're on track. You can explain the model to the customers and see if they think it will handle all of their needs or if they forgot to mention something important while you were following the procedures described in Chapter 4.

After you build a data model (or possibly more than one), you can use it to build a relational model. The relational model is a specific kind of formal model that is structurally very similar to the one used by relational databases. That makes it relatively easy to convert the relational model into an actual database in MySQL, Postgres, or some other relational database product.

In this chapter, you learn how to:

  • Create user interface models.
  • Create semantic object models.
  • Create entity-relationship models.
  • Convert those types of models into relational models.

After you master these techniques, you'll be ready to start pulling the models apart and rearranging the pieces to improve the design by making it robust and flexible.

WHAT ARE DATA MODELS?

Despite what some managers seem to believe, a model isn't a silver bullet or enchanted wand that will magically make a project succeed. A model by itself doesn't do anything. It doesn't build a database, it isn't a piece of software (although there are software tools that can help you build a model), and the final user of your database never sees a model.

A model is a plan. It's a blueprint for building something, in this case a database. The purpose of the model isn't to do anything by itself. Instead it gives you a concrete way to think about the database that you are going to build. By studying the model's pieces, you can decide whether it represents all the data that you need to use to meet your customers' needs.

A model is also useful for ensuring that everyone on the project has the same understanding of what needs to be done. If everyone understands the model, then everyone should have the same ideas about what data should be stored, which tables should contain it, and how the tables are related. They should also agree on the business rules that determine how the data is used and constrained.

It's very important that everyone actually understands the model. I've seen developers build remarkably complicated models and then dump them on hapless end users, expecting those users to understand the models' every subtle nuance. The developers ended up walking the users through the models until the users' heads were spinning, and the developers could have convinced them that up was down and red was blue if they would just stop the meeting. Models are for those who know how to understand them, not necessarily for everyone.

After you build a model, you should look at it and ask questions such as:

  • Where do we store customer information?
  • How many contact names can we store for a customer?
  • Where do we store the contacts' favorite colors?
  • What if we need to store multiple price points for the same product?
  • How do we store the 17 kinds of addresses we need for customers?
  • Where do we store supplier information?
  • If someone asks about an order they placed but haven't received, how can we figure out where it is?
  • Where can we enter special instructions for an order?
  • How do we know when we need to restock left-handed cable stretchers?

You should also work through any use cases to see if the model can handle them. You can't actually fill out insurance claim forms and look in the warehouse for missing orders yet, but you should be able to say, “This table contains the data that we would need to do that.”

The end users can help a lot with this part. Although they may not understand the models, they do understand their business and can ask these sorts of questions while you and the other developers try to figure out if the model can handle them.

If the model cannot handle all of your (and the users') questions, then you need to adjust the model. You might need to add fields or tables, change a field's data type, make new connections between tables, or make other changes to satisfy the requirements. In extreme cases, it may be easiest to start a new model from scratch.

This chapter discusses four kinds of models that grow successively closer to the final database implementation.

First, a user interface model views the database at a very high level as seen from the end user's point of view. Depending on how you are going to use the database, this might be as the user will view the database through forms on a computer screen. This model is very far from the final database implementation, and it doesn't tell much about the database design. This model is useful for understanding what data is needed by the project and how you might use it to navigate through the user interface.

The second and third types of models described in this chapter are semantic object models and entity-relationship models. These are roughly at the same distance from the final database. They are at a slightly lower level than the user interface model and show relationships among data entities more explicitly. They are still at a moderately high logical level, however, and do not provide quite enough detail to build the final database.

The fourth type of model described in this chapter is the relational model. This model mimics the structure of a relational database closely enough that you can actually sit down and start building the database.

In a typical database design project, you might start with a user interface model. I like to start there because I figure if the user is going to see something, we better have a place for it in the database. Conversely, if the user isn't going to see it in some manner, do we really need it in the database? (However, that's just me. I like designing user interfaces. Some people prefer to skip that and let someone else worry about the user interface.)

Next, you use what you learned from the user interface model to build either a semantic object model or an entity-relationship model. These models serve the same purpose, so you generally don't need to build them both. Work through this chapter and the exercises at its end and decide which one you prefer.

Finally, you convert the semantic object model or entity-relationship model into a relational model. Now, you have something that could be turned into a database. There are still some steps to go through as you refine the relational model to improve the final database's reliability and performance, but those are subjects for later chapters.

Remember, these models are intended to improve your understanding of the data and the ways in which different bits are related—so with that in mind, anything that increases understanding is beneficial. Don't be afraid to add notes that clarify confusing issues. Feel free to modify the basic modeling techniques described here. There's some benefit to sticking close to standard notations because it lets others who have studied the same notation understand what you are doing. But if adding a number in a box by each link or a colored triangle helps you and your team gain a better handle on the design, do it. Just be sure to make a note of your additions and changes so that everyone knows what your additions mean.

USER INTERFACE MODELS

A user interface model examines the database at a very high level from the point of view of end users and other systems that manipulate the data. In most database applications, a user will eventually see the data in some form. For example, an order entry and tracking application might use a series of screens where the user can perform such chores as entering orders, tracking orders, marking an order as paid, looking up available inventory, and so forth. Those screens form the database's user interface.

Some databases don't have their own user interfaces, at least not that a human will see. Some databases are designed to store data for other applications to manipulate. In that case, it is the interfaces that those other applications provide that the human user sees. If possible, you should consider what those applications will need to display and plan accordingly. Sometimes, it is useful to build throwaway interfaces to view the data on forms, in spreadsheets such as Google Sheets or Microsoft Excel, or in text files.

You should consider how those other applications will obtain the data from your database. The way in which those applications interact with your database forms a nonhuman interface, and you should plan for that one, too. For example, suppose you know that a dispatch system will need to fetch information about employees from your database and information about pending repair jobs from another system. You should think about the kinds of employee data that the dispatch system will need, such as a repairperson's skills, equipment, assigned vehicle, and so forth. Then you can design your database to make fetching this data easy and efficient.

To build the user interface model, start by making rough sketches of the screens that the user will see. Often these first sketches can be modified versions of paper forms if any exist.

Fill the fields with sample data to make it easier to understand what belongs on each screen. These sketches can be anything from crayon scribbles on bar napkins, to forms drawn with your favorite computerized drawing tool, to full user interface prototypes. Figure 5.1 shows a mocked-up Find Orders screen built with Visual Basic. This form holds only controls and doesn't include any code to do anything more than just sit there and look pretty.

A representation exhibits the normal forms and the rules.

FIGURE 5.1

In addition to the image in Figure 5.1, you should include text explaining what the various parts of the form do. In this case, that text might say:

  • The user enters selection criteria in the upper part of the form and clicks the Search button.
  • The program displays a list of matching order records in the bottom of the form.
  • The user can select an order from the list and click Open to open that order's detail form.

At this level, the user probably thinks of each order as containing all of the information on this form. If you were to fill out an order on a piece of paper, that paper would include blanks for you to fill in customer name, customer ID, contact name, order date, and so forth. The order would also contain a status, although you might represent that by putting the order in boxes on your desk labeled Pending, Open, Closed, Ignored, and so forth, rather than by having a status box on the paper form.

The form and its description also raise some important questions:

  • What fields should be allowed as selection criteria?
  • Should we index the selection criteria fields in the database to make searching faster? Some or all of those fields?
  • When the user selects an order and clicks Open, how does the program open the Orders record? (Searching for the exact combination of fields shown in the list would be slow, and there might even be two entries with the same values if someone placed two orders on the same day. It might be wise to add an order ID field to make finding the record again easier.)

When you select an order from the form shown in Figure 5.1 and click Open, the program displays the form shown in Figure 5.2.

A representation of a relational diagram to model employees and projects.

FIGURE 5.2

This form shows the fields that should be associated with an order. These include:

  • Various dates such as the date the order was placed, the date the products were shipped, and the date the customer paid
  • The order's current status
  • The shipping method (for example, Priority, Overnight, Armored Courier, and so forth)
  • The billing method (credit card, invoice net 30)
  • Various addresses such as the shipping and billing addresses
  • Contact information for when we get confused (or want to send spam to the unsuspecting contact)
  • The order's line items
  • Subtotal, taxes, shipping, and grand total

Both of these forms involve orders and both provide some information about the order data. The Order Detail form includes a lot of the fields that must be stored to represent an order. The Find Orders screen tells which order fields should be allowed as search criteria (and thus may make good keys) and which order fields should be displayed in the result list.

Each of these forms tells a little bit more about the order data. Other mocked-up forms would provide even more information. For example, the application would need an order entry form and a form to update order information (such as changing the addresses or setting order status to Closed). Depending on how the work was divided among employees, there might be special forms for performing a single specific task. For example, an order fulfillment clerk (who puts things in a box and ships them) would need to be able to change an order's status to Shipped but probably doesn't need to be able to change a customer's credit card numbers. In fact, going through the screens and deciding which employees should be able to do which tasks gives you an initial indication of the application's security requirements.

Still other forms would give hints about other parts of the database. A full-fledged database for this application would need to include forms for managing inventory (how many whoopee cushions do we have in stock and when do we order more?), supplier information (who sells us our nose glasses?), employee information (who is assigned to pester delinquent customers this week?), advertising data (which spam campaigns gave us the most new contacts?), and so forth.

A large application might include dozens or even hundreds of forms, each of which gives only a partial glimpse of the information contained in the database. Together, these mocked-up screens form a user interface model that shines spotlights into the data needed to support the application.

With the user interface model in hand, you are now ready to build a more formal model that shows the entities used by the application in greater detail. The first of those models discussed in this chapter is the semantic object model.

SEMANTIC OBJECT MODELS

A semantic object model (SOM) is intended to represent a system at a fairly high level. Although the ideas are somewhat technical, they still relate fairly closely to the way people think about things, so semantic object models are relatively understandable to users.

Classes and Objects

Intuitively, a semantic class is a type of thing that you might want to represent in your system. It can include physical objects such as people, furniture, inventory items, and invoices. It can also include logical abstractions such as report generators, tax years, and work queues.

Technically, a semantic class is a named collection of attributes that are sufficient to identify a particular entity. For example, a PERSON class might have FirstName and LastName attributes. If you can identify members of the PERSON class by using their FirstName and LastName attribute values, then that's good enough. (In real life, many people have the same first and last names, so that's probably not enough, but we'll let it slide for now.)

By convention, the names of semantic classes are written in ALL CAPS as in EMPLOYEE, WORK_ORDER, or PHISHING_ATTACK. Some people prefer to use hyphens instead of underscores, so the last two would be WORK-ORDER and PHISHING-ATTACK.

A semantic object (SO) is an instance of a semantic class. It is an entity instance that has all of the attributes defined by the class filled in. For example, an instance of the PERSON class might have FirstName “Stephen” and LastName “Colbert.”

Traditionally, the attributes that define a semantic class and that distinguish semantic objects are written in mixed case as in LastName, InvoiceDate, and DaysOfConfusion.

Attributes come in three flavors: simple, group, and object.

A simple attribute holds a simple value such as a string, number, or date. For example, LastName holds a string and EmployeeId holds a number.

A group attribute holds a composite value—a value that is composed of other values. For example, an Address attribute might contain Street, Suite, City, State, and ZipCode. You could think of these as separate attributes, but that would ignore the structure built into an address. These values really go together, so, to represent them together, you use a group attribute.

An object attribute represents a relationship with some other semantic object. For example, a relationship may represent logical containment. A COURSE class would have a STUDENT object attribute to represent the students taking the course. Similarly the STUDENT class would have a COURSE object attribute representing the courses that a student was taking. Each of these classes is related to the other, so they are called paired classes. Similarly, their related attributes are called paired attributes.

Cardinality

An attribute's cardinality tells how many values of that attribute an object might have. For example, at the start of some volleyball tournaments each team's roster must contain between 6 and 12 players.

You write the lower and upper bounds beside the attribute to which they apply separated by a period. The volleyball team roster's Players attribute would have cardinality 6.12. (I have no idea why it's a single period and not a dash or ellipsis. At this point it's just a historical quirk. If you decide to use a dash in your diagrams, I doubt the Semantic Object Model Police will break down your door and haul you away.)

Usually the minimum cardinality is 0 if the value is optional or 1 if it is required.

The maximum cardinality is usually 1 if at most one value is allowed or N if any number of values is allowed.

Probably the most common cardinalities are:

  • 1.1—Exactly one value required. For example, suppose you are building a database to track restaurant orders. In the ORDER class, the ServerName attribute would have cardinality 1.1 because every order must have exactly one server.
  • 1.N—Any number of values but at least one required. For example, the ORDER class's Item attribute would hold the items requested by the diners and would have cardinality 1.N. It wouldn't make sense to send an order to the kitchen if it didn't contain any items, but it could contain a potentially infinite number of items. (In practice, though, I might double-check with the server if the kitchen received an order for 13,000 hamburgers.)
  • 0.1—An optional single value. For example, the server might want to record a Comment to go with the order. (“Extra cheese on the milkshake.”)
  • 0.N—Any number of optional values. For example, a collection of Comment attributes. (“Dressing on the side for salad 1.” “No mayo on burger 2.” “Recognize poor tipper, use day-old breadsticks.”)

Those are the most common cardinalities, but you may need to use others for your application. For example, suppose an entrée comes with two side dishes included, and you can add up to two more for an extra charge. In that case, the SideDishes field would have cardinality of 2.4.

Identifiers

An object identifier is a group of one or more attributes that the users will typically use to identify an object in the class.

An object identifier can include a single attribute such as CustomerId or a group of attributes such as FirstName, MiddleName, and LastName.

You indicate an identifier by writing the text ID to the left of its attributes. Often identifiers contain unique values so every item in the class will have different values for the identifier. For example, CustomerId, SocialSecurityNumber, and Isbn are unique identifiers for customers, employees, and books, respectively. You can indicate a unique identifier by underlining the ID notation to its left.

Sometimes, non-unique identifiers are used to find groups of objects. For example, many systems might search for customers by LastName and FirstName, but sometimes those values are not unique, so you shouldn't underline them. That way, your database won't implode if you have two customers named Zaphod Beeblebrox.

Putting It Together

Figure 5.4 shows a simple representation of a CUSTOMER class that demonstrates these notational features.

A representation exhibits the tables holding their original data.

FIGURE 5.4

A big box surrounds the whole class definition. The class name, CUSTOMER, goes at the top.

CustomerId is a simple attribute that is used to identify customers, so it gets the ID notation. CustomerId values are unique so the ID is underlined. This value is required and a customer can have only one ID, so its cardinality is 1.1.

Users sometimes want to search for customers by name, so the Name attribute is also an identifier. It is possible that two customers could have the same name, however, so here ID isn't underlined.

The CUSTOMER class includes address information stored in the Addresses group attribute. Each address has the attributes AddressType (this will be something like Shipping or Billing), Street, Suite, City, State, and ZipCode. All of these except Suite are required and can hold only one value, so they have cardinality 1.1. The Suite attribute is optional, so its cardinality is 0.1. Lines show the attributes contained inside the Addresses value. The 1.N to the lower right of the group indicates that a CUSTOMER object must have one or more Addresses values (each containing a Street, maybe Suite, City, State, and ZipCode).

Finally, the class has two object attributes named CONTACT and ORDER. The CONTACT attribute represents one or more contact people for the customer. The box around the attribute tells you that this is an object attribute. Its cardinality 1.N indicates that the CUSTOMER must have at least one contact.

The ORDER attribute represents the orders placed by this customer. You might think that this should have cardinality 1.N. After all, why would you need a customer who doesn't place any orders? However, when you first create a customer record it will have no associated orders, so we should allow that. You might also want to be able to make a customer record in anticipation of future orders. For both of those reasons, this design sets the cardinality of ORDER to 0.N.

Semantic Views

Sometimes, it is useful to define different views into the same data. For example, consider the kinds of information a company typically tracks for its employees. That information might include:

  • Normal contact information such as name, address, phone number, and next of kin
  • Work-related contact information such as title, office number, extension, pager number, and locker number at the country club (if you're an executive)
  • Confidential salary information, including your complete salary and annual bonus history
  • Other confidential information such as your stock plan and 401K program participation, insurance selections, annual performance reviews, and golf handicap (with and without witnesses)

Some of this information, such as your name and title, is freely available to anyone who wants it.

Other semi-public information is available to anyone within the company but not outside the company. (Many companies worry that executive recruiters with the company phonebook could steal employees away with all of their valuable skills and the proprietary information locked inside their heads.) This information includes your office number, extension, project history, and birth date (excluding the year). It does not include your home address, annual performance reviews, salary history, or other financial data.

Other more sensitive information should be available to your manager and other superiors but not to the general population of coworkers. This information includes such things as your annual performance reviews and work history. However, your manager does not need to know how much you are having deducted for retirement contributions, whether you participate in the company stock plan, whether you are deducting the extra $750 a month for the dental plan, and the amount garnished from your wages to pay off your outstanding Pokémon debt. Those sorts of information should be hidden from your manager. (Depending on the way your company is structured, your manager might not even need to know your salary.)

The people in the Human Resources department are the ones who arrange to siphon money out of your paycheck for such perks as the stock plan and dental insurance, so they obviously need to know that information. However, they probably don't need access to your annual performance reviews.

Figure 5.6 shows an EMPLOYEE class and four views that give access to different parts of the employee data. For simplicity, I've shown each attribute as if it were a simple attribute when actually most of these are group or object attributes. For example, the OfficeData attribute is really a compound attribute including Title, Office, Extension, BirthDate, and so forth.

A representation of an ER diagram.

FIGURE 5.6

Defining these different views allows you to make data available only to those who need it. (This notion of view maps directly to the relational database concept of view, so defining views now will help you later.)

After you finish building a complete semantic object model, you should check each of the views to ensure that they contain all of the information needed for each class of user and nothing else. For example, you should run through all the use cases for managers to see if the EMPLOYEE class's MANAGER_VIEW provides enough information to handle those use cases. You should also check that every piece of data included in the MANAGER_VIEW is actually used. If something isn't used in at least one use case, then managers might not need it and it might not belong in the MANAGER_VIEW.

Class Types

The following sections describe some of the types of classes that you might need to use while building semantic object models. Some of these are little more than names for simple cases. Others, such as association classes and derived classes, introduce new concepts that are useful for building models.

Simple Objects

A simple object or atomic object is one that contains only single-valued simple attributes. For example, an inventory item class might include the attributes Sku, Description, UnitPrice, and QuantityInStock. Each inventory item's data must include exactly one value for each of these attributes.

Figure 5.7 shows a simple INVENTORY_ITEM class.

A representation exhibits the relationship between students and courses.

FIGURE 5.7

Composite Objects

A composite object contains at least one multivalued, non-object attribute. For example, suppose you allow online customers to provide product reviews for inventory items. Then you could add a multivalued Reviews attribute to the class shown in Figure 5.7 to get the composite object shown in Figure 5.8.

Note that the multivalued attribute need not be a simple attribute. For example, suppose you decide not to use a simple attribute to hold customer comments. Instead, for each comment you store the customer's user name, a numeric rating, and comments. Figure 5.9 shows the revised INVENTORY_ITEM class.

A representation of the StudentCourses table exposes multiple sets of records representing different years and semesters.

FIGURE 5.8

A representation exhibits the final relational model.

FIGURE 5.9

Compound Objects

A compound object contains at least one object attribute. For example, consider the CUSTOMER class shown in Figure 5.10. This class contains basic information such as a customer name and shipping and billing addresses. Its CONTACT object attribute stores information about the person we should contact if we have a question about this customer. (This is also the person who gets our junk mail.) The SALES_REPRESENTATIVE object attribute refers to another object representing the sales representative who is charged with keeping this customer happy. (Okay, not too much junk mail.)

A representation of an ER diagram of exposes a multiple-object association which is one where many different kinds of objects are collectively associated to each other.

FIGURE 5.10

Hybrid Objects

A hybrid object contains a combination of the other kinds of attributes. For example, it might contain a multivalued group that contains an object attribute. The ORDER class shown in Figure 5.11 contains a LineItems group attribute to represent the items in the order. Each LineItems entry contains an INVENTORY_ITEM object attribute that refers to an object of the type shown in Figure 5.9.

A representation of a new ER diagram.

FIGURE 5.11

Association Objects

An association object represents a relationship between two other objects and stores extra information about the relationship.

Association objects are particularly useful for many-to-many relationships, where an object of one class can be associated with many objects of a second class, and an object of the second class can be associated with many objects of the first class.

For example, consider the PROJECT and DEVELOPER classes. A PROJECT may include many DEVELOPERs and a DEVELOPER may work on many PROJECTs, so the two classes have a many-to-many relationship. Figure 5.12 shows this relationship modeled with straightforward object attributes.

A representation of a model which uses two association tables to represent the two many-to-many relationships.

FIGURE 5.12

If this is all there is to the relationship, then this model is fine. However, if there is extra information that should be stored with the relationship, this model has no place to store that information.

For example, suppose developers play different roles in a project. A developer might be a technical lead, toolsmith, tester, writer, meeting snack supplier, or even the project's manager. In that case, there's no place to store this information in Figure 5.12. You can't place it in the PROJECT class because data in that class applies to the project as a whole and not to a specific developer on the project. You can't place the information in the DEVELOPER class because a developer might play different roles on different projects.

The solution is to create an association class to connect these classes and store the extra information. Figure 5.13 shows the new design. A PROJECT_ROLE object connects the PROJECT and DEVELOPER classes to represent the relationship that a particular developer has with a particular project. The RoleName attribute stores the information about the type of role that a particular developer plays in the project (technical lead, tester, snack supplier, and so forth).

A representation exposes the PROJECT_ROLE object which connects the PROJECT and DEVELOPER classes to represent the relationship that a particular developer has with a particular project.

FIGURE 5.13

For a concrete example, consider Dr. Frankenstein's famous Build-a-Friend project. The following table shows this PROJECT object's attribute values:

PROJECTIDDESCRIPTIONPROJECT_ROLE
Build-a-FriendMake a friend out of spare parts.Role1
  Role2

The following table shows the attribute values for the two DEVELOPER objects:

DEVELOPERIDFIRSTNAMELASTNAMEPROJECT_ROLE
Dr. FrankensteinTedFrankensteinRole1
IgorIgorJohnsonRole2

Finally, the following table shows the values for PROJECT_ROLE objects:

ROLEIDROLENAMEDEVELOPERPROJECT
Role1Mad ScientistDr. FrankensteinMake-a-Friend
Role2FlunkyIgorMake-a-Friend

From this data, you can figure out which developers play which roles on what projects.

Inherited Objects

Sometimes, one class might share most of the characteristics of another class but with a few differences.

For example, say you've built a CAR class that has typical automobile attributes: Make, Model, Year, NumberOfCupholders, and so forth.

Now suppose you decide you need a RACECAR class. A racecar is a type of car, so it has all of the same attributes that a car has. In addition, it has some racecar-specific attributes such as ZeroTo60Time, ZeroTo100Time, TopSpeed, and QuarterMileTime. You could build a whole new class that duplicates all of the CAR attributes and adds the new ones. That would not only be extra work (something any self-respecting database designer should avoid), but it also wouldn't acknowledge the relationship between the two classes.

Instead you can make RACECAR a subclass or subtype of the CAR class. To denote a subclass in a semantic object model, create a RACECAR class that contains only the new attributes not included in CAR. Include an object attribute in CAR linking to the RACECAR class and using the notation 0.ST in place of the cardinality to indicate that RACECAR forms an optional subtype for CAR. Then place an object attribute in the RACECAR class, linking it back to the CAR class and using the notation p in place of the cardinality to indicate that the link refers to the parent class.

Figure 5.15 shows a CAR class and a RACECAR subclass. In this case, the RACECAR class is said to inherit from the CAR class. CAR is called the parent class, superclass, or supertype.

A representation of a CAR class and a RACECAR subclass.

FIGURE 5.15

In more complicated models, a class can have multiple subclasses, nested subclasses, or multiple parent classes.

For example, suppose you decide that you also want to store information about motorcycles. Motorcycles and cars share some information but one isn't really a special type of the other, so you create a new VEHICLE class to hold the common features. You then pull the common attributes from the CAR class into VEHICLE and make both CAR and MOTORCYCLE subclasses of VEHICLE. In this example, you have multiple classes (CAR and MOTORCYCLE) inheriting from a common parent class (VEHICLE). You also have a nested class RACECAR inheriting from the CAR subclass.

Comments and Notes

Semantic object models are fairly good at capturing the basic classes involved in a project, and through object attributes they do a decent job of showing which classes are related to other classes. However, they don't capture every possible scrap of information about a project.

For example, semantic object models don't indicate an attribute's domain. There's nothing in Figure 5.15 that shows that the CAR class's Make attribute must take values from an enumerated list (Ford, GM, Yugo, DeLorean, and so forth), that Model must come from a list that depends on Make, and that NumberOfCupholders should be an integer between 0 and 99. (Some of the bigger SUVs may need three-digit numbers.)

For an even stranger example, suppose you build a VOLLEYBALL_TEAM class to represent volleyball teams. Depending on the tournament, a volleyball team might have 2, 4, or 6 players, but other values are not allowed. (Although I've seen some really weird formats including the “executive retreat” event where as many 12 people wearing slacks and dress shirts but no shoes squeeze onto the court.) A semantic object model lets you specify a minimum and maximum for the PLAYER object attribute, but it cannot handle allowing the specific values 2, 4, or 6.

A semantic object model also doesn't necessarily capture all of the meaning of the relationships between classes. For example, suppose you build BAND and ARTIST classes to store information about your favorite heavy metal bands. You would like to make separate fields in the BAND class to represent lead vocal, lead guitar, lead trombone, and other key band members, but because these are all object attributes, you need to represent them in the model as ARTIST. You'd really like to make LeadVocal, LeadGuitar, and LeadTrombone attributes that have as their domain ARTIST objects.

Although you cannot make those kinds of attributes, you can jot down notes saying what each of the ARTIST objects in the BAND class represents. You can add them as a footnote to the class, in a separate document, or in any other way that will make it easy for you to remember the meanings of these associations.

Remember, the point of a semantic model (or any model for that matter) is to help you understand the problem. If the model alone doesn't capture the full scope of the problem, then add comments, notes, attachments, video clips, dioramas, and other extras to clarify. A model can only do so much, and if it's missing something, write it down. You may not need this information now to build the initial model, but you'll need it later to build the database.

ENTITY-RELATIONSHIP MODELS

An entity-relationship diagram (ER diagram or ERD) is another form of object model that in many ways is similar to a semantic object model. It also allows you to represent objects and their relationships, although it uses different symbols. ER diagrams also have a different focus, providing a bit more emphasis on relationships and a bit less on class structure (hence, the R in “ER diagram”).

The following sections explain how to build basic ER diagrams to study the entities and relationships that define a project.

Entities, Attributes, and Identifiers

An entity is similar to a semantic object. It represents a specific instance of something that you want to track in the object model. Like semantic objects, an entity can be a physical thing (such as an employee, work order, or espresso maker) or a logical abstraction (such as an appointment, discussion, or excuse for being late to work).

Similar entities are grouped into entity classes or entity sets. For example, the employee entities Bowb, Phrieda, and Gnick belong to the Employee entity set.

Like semantic objects, entities include attributes that describe the objects that they represent.

There are a couple of different methods for drawing entity sets. In the first method, a set is contained within a rectangle. Its attributes are drawn within ellipses and attached to the set with lines. If one of the attributes is an identifier (also called a key or primary key), then its name is underlined. Figure 5.16 shows a simple Employee entity set with three attributes. (Some developers write entity set names in ALL CAPS, whereas others use Mixed Case.)

A representation of a simple Employee entity set with three attributes.

FIGURE 5.16

One problem with this notation is that it takes up a lot of room. If you add all of the attributes to the Employee class (EmployeeId, FirstName, LastName, SocialSecurityNumber, Street, Suite, City, State, ZipCode, HomePhone, CellPhone, Fax, Email, and so forth), you'll get a pretty cluttered picture. If you then try to add Department, Project, Manager, and other classes to the picture with all of their attributes, you can quickly build an incomprehensible mess.

A second approach is to draw entity sets in a manner similar to the one used by semantic object models, and then place only the set's name in the ER diagram. Lines and other symbols, which are described shortly, connect the entity sets to show their relationships. This approach allows you greater room for listing attributes while removing them from the main ER diagram so that it can focus on relationships.

Relationships

An ER diagram indicates a relationship with a diamond containing the relationship's name. The name is usually something very descriptive such as Contains, Works For, or RulesWithAnIronFist, so often the relationship is perfectly understandable on its own. If the name isn't enough, you can add attributes to a relationship just as you can add them to entities: by placing the attribute in an ellipse and attaching it to the relationship with a line.

Normally entity names are nouns such as Voter, Person, Forklift, and Politician. Relationships are verbs such as Elects, Drives, and Deceives. When you see entities and relationships connected in an ER diagram, they appear as easy-to-read caveman or Tarzan phrases such as Voter Elects Politician, Person Drives Forklift, and Politician Deceives Voter.

Figure 5.17 shows the Person Drives Forklift relationship.

A representation exhibits the Person Drives Forklift relationship.

FIGURE 5.17

Note that every relation implicitly defines a reverse relation. The phrase Person Drives Forklift implicitly defines the relation Forklift IsDrivenBy Person. Usually you can figure out the relation's direction from the context. You can help by drawing the relationships from left-to-right and top-to-bottom whenever possible. (If you use a right-to-left language like Aramaic or Urdu, or a top-to-bottom language like Japanese or Korean, use your best judgment.)

I've also seen ER diagrams that include arrows above or beside a relationship to show its direction. For example, Figure 5.18 shows an ER diagram that includes three objects and two relationships. The arrows make it easier to see that Customer Places Order and Shipper Ships Order.

A representation exhibits an ER diagram that includes three objects and two relationships.

FIGURE 5.18

Cardinality

To add cardinality information, ER diagrams add one or more of three symbols to the lines leading in and out of entity sets. The three symbols are:

  • Ring—A ring (or circle or ellipse, but strangely not a 0) means zero.
  • Line—A short line (or dash or bar, but not a 1) means one.
  • Crow's foot—A crow's foot (or teepee or whatever you call it) means many.

These aren't too hard to remember because the number 0 looks like a circle, the number 1 looks a line, and the crow's foot looks like several 1s.

If two of these symbols are present, they give the minimum and maximum number of entities that can be associated with the relation. For example, if the line entering an entity includes a circle and line, then zero or one of those items is associated with the relation.

For a concrete example, consider Figure 5.19. The relationship Swallows connects the classes SwordSwallower and Sword. The two lines beside SwordSwallower mean that the relationship involves between 1 and 1 SwordSwallower. In other words, the relationship requires exactly one SwordSwallower. The circle and crow's foot beside Sword mean that the relationship involves between 0 and many swords. That means this is a one-to-many relationship. One sword swallower swallows many swords.

A representation exhibits the relationship Swallows which connects the classes SwordSwallower and Sword.

FIGURE 5.19

ER diagrams only have three symbols for representing three cardinalities: 0, 1, and many. (It reminds me of those primitive tribes that only have words for the numbers 1, 2, and many. I wonder if they played a role in developing ER diagrams?) This means you cannot specify cardinality as precisely as you can with semantic object models, which let you explicitly give upper and lower bounds.

For example, suppose you want to represent 2 to 4 jugglers juggling 5 or more flaming torches. (It's hardly juggling if two people just stand there holding four torches. Even I could do that, if they're not too heavy.) In a semantic object model, you would give the jugglers the cardinality 2.4 and the torches 5.N. Because ER diagrams don't have symbols for 2, 4, or 5, you're out of luck if you're building an ER diagram.

But wait! The point of these models is to gain an understanding of the system, not to rigidly follow the rules to their ridiculous conclusions, so I see no reason why you shouldn't merge the best of both systems and use ER diagrams that specify cardinality in the semantic object model style. (Yes, I know I may be cast out for such sacrilege.)

Figure 5.20 shows how I would model the jugglers. You won't find many people who use this combined notation on the Internet, so you should understand the normal ER symbols, too, but this version seems easy enough to understand.

A representation of the model of the jugglers.

FIGURE 5.20

Inheritance

Like a semantic object model, an ER diagram can represent inheritance. An ER diagram represents inheritance as a special relationship named IsA (read as “is a”) that's drawn inside a triangle. One point of the triangle points toward the parent class. Other lines leading into the triangle attach on the triangle's sides.

For example, a space shuttle crew contains several different kinds of astronauts, including Commander, Pilot, Mission Specialist, Payload Specialist, and Bartender. (I'm guessing about that last one.) All of these have the common crew member attributes plus additional attributes that relate to their more specialized roles. For example, a Commander, Pilot, and Mission Specialist have special NASA space training (I'll call them “space trained”).

A Payload Specialist is a doctor, physicist, database design book author (hint, hint), or other professional who comes along for the ride to perform some specific mission such as watching spiders spin webs in microgravity.

Figure 5.21 shows one way that you might model this inheritance hierarchy in an ER diagram. The PayloadSpecialist inherits directly from Astronaut. SpaceTrained also inherits from Astronaut, although the relationship diagram probably will include only subclasses of SpaceTrained and not any SpaceTrained entities. Commander, Pilot, and MissionSpecialist inherit from SpaceTrained.

A representation exhibits one way that you might model this inheritance hierarchy in an ER diagram.

FIGURE 5.21

Sometimes, you might see the IsA symbol shared by more than one inherited entity. The result implies a sibling relationship that probably doesn't mean much (for example, SpaceTrained and PayloadSpecialist are related only by the fact that they inherit from a common parent entity) but it does make the diagram less cluttered.

Figure 5.22 shows the same inheritance diagram shown in Figure 5.21 but with this new notation.

A representation exhibits the one way that you might model this inheritance hierarchy in an ER diagram.

FIGURE 5.22

Additional Conventions

ER diagrams use a few other conventions to add fine shades of meaning to a model.

If every entity in an entity set must participate in the relationship, then the diagram includes a thick or double line. This is called a participation constraint because each entity must participate.

For example, consider the Pilot Flies Airplane relationship. During flight, every airplane must have a pilot (otherwise it's called a “smoking pile of metal” instead of an “airplane”). This is a participation constraint on the Airplane entity set because all entities in that set must participate in the relationship (that is, have a pilot).

If an entity can participate in at most one instance of the relationship set, the diagram uses an arrow to connect the entity to the relationship. This is called a key constraint. For example, during flight a pilot can fly at most one airplane, so the Pilot entity set has a key constraint on the Flies relationship. (I suppose, however, a pilot could fly a drone while in the cockpit, and thus, technically fly two planes at the same time.)

If an entity must be involved in exactly one instance of a relationship set, it gets a thick or double arrow to indicate both participation and key constraints. For example, during flight an airplane must have one and only one pilot, so it would get the thick or double arrow.

Figure 5.24 shows the Pilot Flies Airplane relationship. Each Pilot can fly at most one airplane, so Pilot is connected to the relationship with an arrow (key constraint). A Pilot might sometimes be a passenger who's not flying the airplane, so there's no participation constraint on Pilot for this relationship. On the other side of the relationship, the Airplane must have one and only one Pilot, so it gets the double arrow to indicate both key and participation constraints. The cardinalities are between 1 and 1 for both entities because there's a one-to-one relationship between Pilot and Airplane (ignoring copilots) in this relationship.

A representation exhibits the Pilot Flies Airplane relationship.

FIGURE 5.24

A weak entity is one that cannot be identified by its attributes alone. For example, consider a database to store submarine race results. A Race entity holds information about a particular race. A Result entity holds information about how a submarine performed in a race. The Result entity has attributes to store a reference to the Race entity, a reference to a Sub entity, and result information such as Time, FinishPosition, and TorpedoesFired.

Alone, there's no reasonable way to find a specific Result entity. There is no combination of Result attributes that really makes sense as a search key. You could search for a combination of Time and FinishPosition but that doesn't identify a particular Result.

Instead, you would either search for a particular Race and use it to find its associated Results, or search for a particular Sub and use it to find its associated Results.

In an ER diagram, you draw a weak entity with a thick rectangle and connect it to its identifying relationship with a thick arrow. Figure 5.25 shows the Race, Sub, and Result entity sets and their relationships.

A representation exhibits the Race, Sub, and Result entity sets and their relationships.

FIGURE 5.25

Comments and Notes

As is the case with semantic object models, you shouldn't be afraid to add notes, comments, scribbles, and anything else to make an ER diagram easier to understand. Annotate entity set definitions to show the domain and cardinality of an entity's attributes. Add notes to further explain confusing entities and relationships.

The purpose of an ER diagram is to help you understand a project, not to become a technically correct but uninformative doodle.

RELATIONAL MODELS

Chapter 2, “Relational Overview,” explained basic concepts of relational databases such as tables, tuples, rows, and columns. (If you don't remember Chapter 2, go back and skim through it quickly to refresh your memory.)

Converting semantic object models and ER diagrams into a relational version isn't too difficult once you know how the concepts described in Chapter 2 map to those described so far in this chapter. The following table shows how key terms from Chapter 2 map to the terms used in semantic object models and ER diagrams.

THEORYDATABASEFILESOMER
RelationTableFileClassEntity Set
TupleRowRecordObjectEntity
AttributeColumnFieldAttributeAttribute

To convert semantic object models and ER diagrams into relational models, you simply map the classes or entity sets to tables. You then figure out which columns in the tables form the foreign key relationships among the tables.

The following sections work through examples of converting SOM and ER models into relational ones.

Converting Semantic Object Models

Consider the simple semantic object model shown in Figure 5.26. A CUSTOMER object has one or more Addresses, one or more CONTACTs, and one or more ORDERs. The CONTACT class contains only simple attributes. The ORDER class contains a simple Date and a group attribute to hold information about Items ordered.

A representation of a simple semantic object model. This model leads immediately to three relational tables: Customers, Contacts, and Orders.

FIGURE 5.26

This model leads immediately to three relational tables: Customers, Contacts, and Orders.

If the semantic object model includes inheritance relationships, build a table for each of the object sets. Use the parent class's primary key as a foreign key in the child class to connect the two in a one-to-one relationship. For example, if CUSTOMER inherits from PERSON, then add a PersonId field in the Customers table to associate the corresponding records in the two tables.

The CUSTOMER class's CONTACT and ORDER attributes indicate that there should be a link from the Customers table to the Contacts and Orders tables. To do this, you can place foreign key fields in the Contacts and Orders tables to hold the CustomerId values of their corresponding Customer records. To make understanding the relational model easier, call those fields CustomerId so they match the name in the Customers table.

At this point, the relational model is practically finished. Only one little problem remains: a relational record cannot hold a potentially unlimited number of columns. In this example, a row in the Customers table cannot have an unlimited number of columns to hold multiple address values for every row. Similarly, the Orders table cannot have an unlimited number of columns to hold item data.

The solution is similar to the one used to allow a Customers record to correspond to multiple Contacts and Orders records. Create new tables to hold the repeated items. Then use foreign key fields to link those records back to their owning Customers and Orders records.

Figure 5.27 shows the resulting relational model.

A representation exhibits the resulting relational model. Each table’s primary key is underlined.

FIGURE 5.27

Each table's primary key is underlined. (Only the Customers and Orders tables have primary keys.)

Lines connect the fields that form foreign key relationships. The numbers at the ends of these lines give the numbers of items participating in the relationship (the infinity symbol ∞ means “many”). In this example, all of the relationships are one-to-many relationships.

This diagram shows relationships among tables but doesn't show much other detail. In particular, it doesn't show the fields' data types or whether they are required. If you expand each table's representation, you can add some of this information. Figure 5.28 shows the same model with columns to show the fields' data types and whether each is required.

There's only so much information you can add to one of these diagrams, however. Even this relatively simple diagram is pretty big if you add data type and required data. Often, it's better to stick to the simpler version and place additional information in separate documents.

A representation exhibits the data types for each of the tables’ fields, that does not necessarily completely specify the fields’ domains.

FIGURE 5.28

As is the case with all models, you should write down notes to record any information that is not fully captured by the diagram alone. For example, Figure 5.28 does not show which fields are required, their meanings (what does Sku mean, anyway?), more precise cardinalities (what if “one-to-many” should really be “one-to-four”), and so forth.

Though the figure gives data types for each of the tables' fields, that does not necessarily completely specify the fields' domains. For example, the Zip field should contain a 5-digit ZIP Code or a ZIP+4 Code similar to 12345-5678, UnitPrice should be a positive monetary value, and the Email field should hold a properly formatted email address such as [email protected].

You should write down all of these and any other constraints that are not obvious from the diagram. (In case you're curious, Sku stands for “stock keeping unit” and is pronounced “skew.” It's like a serial number that you can use to identify products.)

Converting ER Diagrams

Figure 5.29 shows an ER diagram that covers a situation similar to the one modeled by the semantic object model shown in Figure 5.26.

Each Customer entity has at least one Address, Contact, and Order. Those are all participation constraints, so they are drawn with double lines.

The Address, Contact, and Order entities are accessed through their corresponding Customer entities. That makes them weak entities, so they are drawn with thick rectangles and they have thick arrows pointing to their identifying relationships. (If you want to allow the users to search for orders directly, perhaps by an OrderId, then Order would not be a weak entity.)

The Order entity must be associated with at least one Item, so it has a participation constraint drawn with a double line. The Item entity is weak, so it is drawn with a thick rectangle and uses a thick arrow to connect to its identifying relationship.

The entities in the ER diagram lead directly to the relational tables Customers, Addresses, Contacts, Orders, and Items.

A representation exhibits an ER diagram that covers a situation similar to the one modeled by the semantic object model.

FIGURE 5.29

To connect a weak entity with its owner, make sure the owner's table has a primary key. Then add a foreign key field to the weak entity's table that refers back to the owner's primary key.

The resulting relational model is the same as the one generated by the semantic object model and is shown in Figure 5.27.

You can handle inheritance the same way you did for semantic object models. Build a table for each of the entities. Use the parent class's primary key as a foreign key in the child class to connect the two in a one-to-one relationship. For example, if Politician inherits from Weasel, then add a WeaselId field in the Politicians table to link the corresponding records in the two tables.

As is the case when translating a semantic object model into a relational model, you will need to write down any extra conditions, constraints, or other information that is not completely captured by the model. See the end of the previous section for some examples of things you might want to write down.

SUMMARY

Different kinds of models help define a problem. They identify the entities that are significant to the problem, and they clarify the relationships among those entities. You can then use the models to test your understanding of the problem and to verify that the models provide the data you need to satisfy the problem's use cases and other requirements.

This chapter explained how to build different kinds of models. In this chapter, you learned how to:

  • Build user interface models to learn what kind of data the database will need to store.
  • Build semantic object models to study the objects that will interact while solving the problem.
  • Build entity-relationship diagrams to study the entities that are involved in the problem and to examine their interactions.
  • Convert semantic object models and entity-relationship diagrams into relational models.

After you've built a relational model, you can use it to start building a database. Before you begin, however, there are several techniques that you can use to make the model more robust and efficient. The first of these techniques, extracting business rules, is described in the next chapter.

Before you move on to Chapter 6, however, use the following exercises to test your understanding of the material covered in this chapter. You can find the solutions to these exercises in Appendix A.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.36.72