4.4. External Uniqueness Constraints

The uniqueness constraints discussed so far are called internal (intrapredicate) uniqueness constraints, since each applies to one or more roles inside a single predicate. We now discuss external (interpredicate) uniqueness constraints. These apply to roles from different predicates. To help understand these constraints, and prepare for later discussion on queries, the conceptual join operation is first discussed.

Consider the schema and sample population shown in Figure 4.31. The left-hand binary fact table indicates that employee 15 drives two cars, with registration numbers PKJ123 and ABC000. The right-hand binary fact table tells us that the car PKJ123 is imported from Italy. The ternary fact table indicates that employee 15 drives the car PKJ123 imported from Italy. As the definition reveals, this ternary is derived by combining the binaries, with Car as the connecting or join object type.

Figure 4.31. A compound ternary derived from a conceptual (inner) join of the binaries.


The defining expression “Employee drives Car that was imported from Country” is a formal verbalization of the conceptual path from Employee, through the drives predicate, Car type, and import predicate, to Country. The keyword “that” declares that the driven car must be the same as the imported car. In other words, if an employee drives a car, we must use the very same car to continue the path to Country.

The instance diagram in Figure 4.32 may help clarify things. The three binary facts appear as lines connecting two dots. The ternary fact corresponds to the continuous path from Employee to Country. Starting at the Car type, this path may also be verbalized as the conjunction: Car is driven by Employee and was imported from Country.

Figure 4.32. An instance diagram for the model shown in Figure 4.31


A fact type resulting from a join is a compound fact type, not an elementary fact type, since it is essentially the conjunction of at least two simpler fact types. For this reason, join fact types are normally excluded from ORM schema diagrams. If included for discussion purposes, they may be shaded as shown in Figure 4.31 to indicate that they are just views. Unlike elementary derived fact types, their compound nature can lead to redundancy unless each join role is covered by a simple UC.

For example, if we add the row (16, PKJ123) to the drives table in Figure 4.31, this causes the row (16, PKJ123, IT) to be added to the ternary. The fact that car PKJ123 is imported from Italy is now contained twice in the ternary. Although join fact types are never used as asserted fact types, join paths are often involved in constraints and queries, as discussed later.

In an ORM schema, to navigate from one predicate to another, we must pass through an object type, performing a conceptual join (or object join) on that object type. By default, the join condition is that the object instance remains the same as we pass through. This is called a conceptual inner join. This is similar to a relational natural inner join (see Chapter 12), except that conceptual joins require the conceptual objects to be the same, instead of matching attribute names and values. ORM object types are conceptual domains, not attributes. Conceptual joins still apply if we declare different role names (cf. attribute names) for the join roles.

In terms of the fact tables however, the column value of the role entering the join object type must equal the column value of the role that exits the object type. Figure 4.33 depicts two predicates, R and S, sharing a common object type, B. From a fact table perspective, the conceptual inner join of R and S is obtained by pairing rows of R with rows of S where the B column values match, and arranging for the final B column to appear just once in the result. The dotted lines show how the rows are paired together. For example, row (al, bl) is paired with rows (bl, cl) and (bl, c3) to give rows (al, bl, cl) and rows (al, bl, c3).

Figure 4.33. Joining on B requires the values in the two B columns to match.


Figure 4.34 shows the result of this join operation. The final B column is the intersection of the original B columns. The natural inner join of any tables R and S is denoted by “RS” or “R natural join S”. The “inner” is often omitted and assumed by default.

Figure 4.34. Table resulting from the natural inner join of the tables in Figure 4.33.


Other kinds of joins are sometimes discussed (e.g., joins based on operators other than equality, and outer joins). A left (right, full) outer join is obtained by adding to the inner join those rows, if any, where the join column value occurs in just the left (just the right, just one of) the tables, and padding the missing entries of such rows with nulls. For example, the left outer join of the binary fact tables in Figure 4.31 adds the row (15, ABC000, ?) where “?” denotes a null.

The full outer join of the tables in Figure 4.33 includes rows (a2, b2, ?) and (?, b4, c4). Outer joins are often used in queries at both conceptual and relational levels. Since asserted fact types in ORM are elementary, their fact tables cannot have nulls. We postpone further discussion of outer joins until Chapter 10.

With this background, let’s consider an example with an external uniqueness constraint. An output report concerning high school students is shown in Table 4.2. As an exercise, perform step 1 for the top row of this report before reading on.

The information for row 1 may be verbalized as two elementary facts:

The Student with studentNr ‘001’ has the StudentName ‘Adams J’.
The Student with studentNr ‘001’ is in the Class with code ‘11 A’.
Table 4.2. An output report about high school students.
Student NrNameClass
001Adams J11A
002Brown C12B
003Brown C11A

The first fact relates an entity and a value. The second fact is a relationship between two entities. In this UoD, students are identified by their student numbers. As rows 2 and 3 show, it is possible for two different students to have the same student name (“Brown C”). A populated schema diagram is shown in Figure 4.35.

Figure 4.35. A populated schema for Table 4.2 (draft version).


Let us agree that the population supplied is significant. It follows that each student has at most one name, and that each student is in at most one class. These uniqueness constraints have been captured on the schema diagram. However, there is another uniqueness constraint that is missing. What is it?

The output report is reproduced in Table 4.3, with uniqueness constraints marked above the relevant columns. Notice that the combination of name and class is unique. Let’s suppose that this is significant. Although a student’s name need not be unique in the high school, it is unique in the student’s class. While there are two students named “Brown C”, there can be only one student with this name in class 12B-, and only one student with this name in class 11 A.

Table 4.3. Uniqueness constraints added to original table.
Student NrNameClass
001Adams J11A
002Brown C12B
003Brown C11A

In the rare case where another student named “Brown C” joined one of these classes, at least one of the names would be modified to keep them distinct within that class. For instance, we could add extra initials or numbers (e.g., “Brown CT”, “Brown C2”).

Each row of the report splits into two elementary facts, one for the name and one for the class. So to specify the name-class UC on the schema we need to involve two fact types. The role boxes to which the uniqueness constraint applies are joined by dotted lines to a circled uniqueness bar “0” as shown in Figure 4.36.

Figure 4.36. An external uniqueness constraint has been added.


Because this uniqueness constraint involves roles from different predicates, it is an example of an external constraint or interpredicate constraint. A constraint on a single predicate is an internal or intrapredicate constraint. So there are two kinds of uniqueness constraint: internal uniqueness and external uniqueness.

In this example, the external uniqueness constraint indicates that for each student the combination of student name and class is unique. The constraint verbalizes as: “Given any StudentName and Class, at most one Student has that StudentName and is in that Class”. For instance, given ‘Adams J’ and ‘11A’ there is only one studentNr sharing rows with both (‘001’). Given ‘Adams J’ and ‘12B’ there is no studentNr paired with both.

Perhaps the easiest way to understand this constraint is to say that if we perform the conceptual (inner) join operation on the two fact tables, then the resulting table has a uniqueness constraint across the (name, class) column pair. Note that when we join the two fact tables, we obtain the table in the original report (Table 4.3)

The external uniqueness constraint is equivalent to an internal uniqueness constraint on the first and last roles of the derived, ternary fact type formed from the join path: StudentName is of Student who is in Class. See Figure 4.37. Because the object type Student is personal, we prefer the personal pronoun “who” instead of “that”.

Figure 4.37. Equivalent constraints on derived, join fact type (illegal in base model).


The middle role of the join fact type has a simple UC. This is equivalent to the two simple uniqueness constraints on the binaries. Although join fact types may be included on a schema diagram for discussion purposes, they are illegal in the base ORM model because they are compound, not elementary, fact types.

As discussed shortly, the presence of a simple uniqueness constraint on a ternary disqualifies it from being elementary. So don’t feel that it’s okay to do this in your normal model.

Suppose we added the following facts to the current populations: Student ‘004’ has StudentName ‘Adams J’; Student ‘004’ is in Class ‘11 A’. If this update were accepted, the new populations would be shown in Figure 4.38(a). To test your understanding, explain why this would violate the external uniqueness constraint before reading on.

Figure 4.38. This update would violate the external uniqueness constraint.


The extra row added to each table provides a counterexample to the constraint, since for ‘Adams J’ and ‘11 A’ there are two studentNr entries paired with these entries (001 and 004). This breaks the rule that student names are unique within a given class. In class 11A, two students (001,004) have the same name (Adams J’).

If we join the tables on studentNr (the dotted lines indicate the matches for the join), we obtain the ternary table in Figure 4.38(b), which clearly violates the equivalent compound uniqueness constraint over the StudentName and Class columns.

The general case is summarized in Figure 4.39, where A, B and C are any object types, and R and S are predicates. External uniqueness constraints may apply to roles from reference types, not just fact types. This is the basis for composite reference schemes, as discussed in a later chapter.

Figure 4.39. External uniqueness constraint.


As a more complex case, suppose persons enroll in subjects but are given subject positions (1st, 2nd, 3rd etc.) rather than subject ratings. Each person achieves at most one position in any given subject. Moreover, no ties may occur (i.e., for each position in a subject there is only one student).

Table 4.4. Student ranking within subjects (no ties allowed).
PersonSubjectPosition
Adams JCS1143
Adams JCS10010
Adams JPD1023
Brown CCS11410
Brown CPD1025

Table 4.4 shows a sample report for this UoD. Performing step 1 on the first row, we may express the information as the ternary: Person (name) ‘Adams J’ is placed in Subject (code) ‘CS114’ at Position (nr) 3. This leads to the schema of Figure 4.40. Notice the overlapping uniqueness constraints. Check these with the population for yourself to ensure that you understand them.

Figure 4.40. A schema for Table 4.4.


Now suppose we adopt a nested approach instead. For example we might express the information on row 1 as follows: Person (name) ‘Adams J’ enrolled in Subject (code) ‘CS114’; this Enrollment achieved Position (nr) 3. This leads to the nested version shown in Figure 4.41. Actually, for this to be equivalent to the ternary, the role played by the objectified association must be mandatory (see next chapter).

Figure 4.41. Another schema for Table 4.4 (nested version).


The Enrollment predicate is many:many. With this in mind, the simple uniqueness constraint on the achievement predicate captures the UC spanning the first two roles in the flat (unnested) version (Figure 4.40). The external uniqueness constraint corresponds to the constraint spanning the last two roles in the flattened version—each (Subject, Position) combination is unique. In other words, if we flatten the nested version into a ternary then any (Subject, Position) pair occurs on at most one row of the ternary table. To help understand this, we suggest that you add the fact tables to the diagram. The fact table for the outer predicate effectively matches the output report.

External uniqueness constraints sometimes connect more than two roles. For example, a point in three-dimensional space might be associated with a unique combination of x, y, and z coordinates.

As you may gather from these examples, nesting tends to produce a more complex constraint picture. For this example, the flattened version is preferred. However, if the Position information is optional (e.g., to be added later) then the nested approach is preferred. Such modeling choices are covered in more detail later.

Another reason for nesting is to avoid embedding the same association within more than one fact type. Suppose we have to record both a rating and a unique subject position for each student taking any given subject. A sample report for this situation is shown in Table 4.5.

Table 4.5. Students are assigned both ratings and positions.
PersonSubjectRatingPosition
Adams JCS11473
Adams JCS100610
Adams JPD10273
Brown CCS114610
Brown CPD10275

We might describe this UoD using two ternaries, as in Figure 4.42. Note however that all (Person, Subject) combinations appearing in one fact table must also appear in the other. A later chapter shows how such a constraint may be added.

Figure 4.42. A schema for Table 4.5 (flattened version).


Alternatively, we may objectify the enrollment association between Person and Subject, and attach rating and position predicates to this (see Figure 4.43). The nested approach corresponds to reading the information on row 1 of the table as: Person ‘Adams J’ enrolled in Subject ‘CS114’; this Enrollment scored Rating 7; this Enrollment achieved Position 3. To indicate that rating and position must be recorded, additional constraints are needed (see next chapter).

Figure 4.43. Another schema for Table 4.5 (nested version).


Note that if a role has a simple, internal uniqueness constraint then it should not be included in an external uniqueness constraint. Even if a join path exists to enable an external uniqueness constraint to be declared, the external constraint would be implied by the stronger, simple uniqueness constraint; it is generally preferable to omit implied constraints.

As an exercise to illustrate this point, you may wish to modify Figure 4.36 so that student names are actually unique, and then show that the external constraint is implied.

Before you start the section exercise, there’s one point you should be clear about. When you examine an information sample, you are seeing it at the external level. There are often many aspects of the presentation that may not be relevant to your application. In a tabular report for example, the order of the columns or rows is not normally something that you need to model. If the information is presented in nontabular ways (e.g., diagrammatic) there are always topological and metric aspects, and possibly other aspects (e.g., color), that are represented along with basic data. Whether these additional aspects of the presentation need to be modeled is something that only you and the domain expert can decide. To fully appreciate this point, make sure that you attempt at least Question 2 from the following exercise.

Exercise 4.4

1.Add the uniqueness constraints to the conceptual schema diagrams for:
  1. Exercise 3.4 Question 6

  2. Exercise 3.5 Question 4

2.Many manufactured products contain parts that may themselves be products of even smaller parts. The structure of one such product is shown.

  1. List at least three different kinds of fact that are captured in this diagram.

  2. Assume that we are interested in modeling only containment facts. These facts may also be displayed as a labeled tree (hierarchy) as shown. Draw a conceptual schema diagram for this UoD, including uniqueness constraints, but make no use of nesting.

  3. Draw an equivalent conceptual schema diagram that does make use of nesting.

3.A car dealer maintains a database on all the cars in stock. Each car is identified by the vehicle identification number (VIN) displayed on a plate attached to the car (e.g., on its dashboard). For each car the dealer records the model (e.g., Saturn SW2), the year of manufacture (e.g., 2006), the retail price (e.g., $18,000), and the color (e.g., dark green).

Because of space limitations the dealer will never have in stock more than one car of the same model, year, and color at the same time. The dealer also keeps figures on the number of cars of a particular model and color that are sold in any given year. For example, in 2007, five dark green Saturn SW2s were sold. Draw the conceptual schema diagram, including all uniqueness constraints.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.39.142