Predicate Specialization and Generalization

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

14.2. Predicate Specialization and Generalization

The previous section illustrated how the same UoD structure may be described by different, but equivalent, schemas and discussed the use of transformations to reshape one schema into another. This section considers a class of schema transformations known as predicate specialization, as well as its inverse, predicate generalization.

If two or more predicates may be thought of as special cases of a more general predicate, then we may replace them by the more general predicate, as long as the original distinction can be preserved in some way. For example, if we transform the schema of Figure 14.1(a) into that of Figure 14.1(b), we generalize smoking and drinking into indulging in a vice, where vice has two specific cases. If we transform in the opposite direction, we specialize indulging in a vice into two predicates, one for each case.

Predicate specialization and generalization are similar notions to object type specialization and generalization, except that it is rare to specify any subtype connections between predicates (the predicates must be objectified for this to be legal).

A predicate may be specialized if a value constraint or a frequency constraint indicates that it has a finite number of cases. Examples with value constraints are more common, so we examine these first. The drinker-smoker example provides one illustration, where Vice has the value constraint {’S’, ‘D’}. As another example, recall the Olympic Games schema reproduced in Figure 14.4(a).

Figure 14.4. The ternary is specialized into three binaries by absorbing MedalKind.

Because there are exactly three kinds of medal, the ternary may be specialized into three binaries, one for each medal kind, as in Figure 14.4(b). You may visualize the transformation from schema (a) into schema (b), thus when the object type MedalKind is absorbed into the ternary predicate, it divides it (or specializes it) into the three binaries. Hence this transformation is also known as object type absorption. The reverse transformation from (b) to (a) generalizes the three binaries into the ternary by extracting the object type MedalKind—this may be called object type extraction.

Notice that in the vices example, a binary is specialized into unaries. With the games example, a ternary is specialized into binaries. In general, when an n-valued object type is absorbed into a predicate, the n specialized predicates that result each have one less role than the original (since the object type has been absorbed). This general result is set out in Figure 14.5. Although the equivalence is set out diagram-matically to aid understanding, there is a formal mapping of the diagrams to predicate logic where the actual equivalence proofs were made (Halpin 1989b). In this sense, the results may be called theorems.

Figure 14.5. PSGl: R may be specialized into S₁... S_n by absorbing B.

The schema equivalence in Figure 14.5 is called Predicate Specialization/Generalization theorem 1 (PSGl). The predicates R and S, respectively, have m + 1 roles and m roles, where m≥1. The object types A₁.. A_m and B are not necessarily distinct. The value type for B has n values b₁,.., b_n..If m = 1 we have conversion between a binary and n unaries; if m = 2, the conversion is between a ternary and n binaries; and so on. Transforming from left to right specializes the predicate R into n predicates S₁.. S_n by absorbing the object type B. The reverse transformation from right to left generalizes the shorter predicates into the longer one by extracting B. As an exercise, draw the diagrams for the cases m = 1,2, and 3.

The theorem PSGl holds regardless of whatever additional constraints are added. However, any constraint added to one of the schemas must be translated into an equivalent, additional constraint on the other schema. For example, the uniqueness constraint in Figure 14.4(a) translates into the three shorter uniqueness constraints in Figure 14.4(b). This is an instance of the following corollary to PSGl (using “UC” for “uniqueness constraint” and reading “spans” as “exactly spans”).

If a UC in R spans a combination of B’s role and other roles, a UC spans the specialization of these other roles in S₁.. S_n; and conversely.

Figure 14.6 illustrates the most common case, where R is a ternary with a UC spanning B’s role and one other. However the result applies for longer predicates too. The games example of Figure 14.4 is an instance of the equivalence in Figure 14.6, where B = MedalKind, n = 3, and B’s role is included in a compound uniqueness constraint. What happens, however, if B’s role in the general predicate R is not included in a uniqueness constraint? Since R is elementary, its other roles must be spanned by a uniqueness constraint; this constraint is transformed into a mutual exclusion constraint over the specialized predicates.

Figure 14.6. The UC on the left is equivalent to the UCs on the right.

Exclusive unaries provide the simplest case of this. For example, suppose that workers in a company may hold at most one position (manager, clerk, or secretary). This may be rephrased in terms of exclusive unaries as shown in Figure 14.7, assuming appropriate translations between the predicates. For implementation purposes, the binary version is usually preferred (e.g., its relational schema simplifies both updates to a worker’s position, and schema updates to the list of allowable positions). The larger the number of unaries, the worse the unary solution becomes (why?).

Figure 14.7. The UC on the left is equivalent to the exclusion constraint on the right.

Another example, this time with exclusive binaries, is shown in Figure 14.8. For simplicity, reference schemes are omitted. Here the task codes ‘A’ and ‘R’ denote authoring and reviewing. Note the two alternative ways of saying that a person cannot act both as an author and as a reviewer of the same book.

Figure 14.8. The UC on the left is equivalent to the exclusion constraint on the right.

These two examples illustrate unary (m = 1) and binary (m = 2) cases of the following second corollary to theorem PSG1. This result applies to longer predicates as well (see Figure 14.9). Here the exclusion constraint means that no row of values may appear in more than one of the S, fact tables.

Figure 14.9. The UC on the left is equivalent to the exclusion constraint on the right.

If a UC spans all roles of R except for fBs role, then S₁...S_n are mutually exclusive; and conversely.

In rare cases, when a schema transformation is performed, a graphical constraint in one schema may have no corresponding graphical constraint in the other. In this case, the constraint should be expressed as a textual constraint in the other version. For example, consider the two schemas in Figure 14.10. The codes ‘A’ and ‘S’ indicate “assistant” and “supervisor”, respectively. Reference schemes are omitted for simplicity. A person might supervise one project and assist on another.

Figure 14.10. A textual constraint in schema (a) appears as a graphical constraint in (b).

Assuming appropriate translations between the predicates, the uniqueness constraint in schema (a) translates to the exclusion constraint in schema (b), and the top uniqueness constraint in (b) is implied. However, the lower uniqueness constraint in (b) declares that each project has at most one supervisor. This cannot be expressed as a graphical constraint on the ternary since it is a restricted uniqueness constraint (as discussed in Section 7.4). To ensure equivalence, this is added to the left-hand schema as the textual constraint Each Project uses at most one Employee in Position ‘S’.

If Project plays other functional roles, the binary approach of the right-hand schema would normally be preferred since its relational version simplifies the enforcement of this supervision constraint. If Project has no other functional roles, the ternary approach might be preferred since it maps to a single table.

Since schema transformation theorems are typically applied to subschemas within some global schema, care must be taken to preserve any mandatory role constraints (implicit or explicit). For example, in Figure 14.4, if Country plays no other role, then its role in schema (a) and the disjunction of its roles in (b) are implicitly mandatory. If Country does play another role in the global schema, and at least one medal result must be recorded for it, then these simple and disjunctive mandatory role (inclusive-or) constraints must be explicitly shown (see Figure 14.11).

Figure 14.11. A mandatory constraint in (a) maps to an inclusive-or constraint in (b).

This illustrates our third corollary to theorem PSG1:

If Als role (or role disjunction) in /R is mandatory, then the disjunction of its specialized roles is mandatory, and conversely (1 ≤i≤ m).

The inclusion of (or role disjunction) covers the rare case when A_i plays more than one role in R (recall that A_I .. A_m are not necessarily distinct). Note also how our convention for implicit mandatory role constraints enables theorems such as PSG1 to be specified without any assumption about other roles played in the global schema.

Now consider Figure 14.12(a). Here Country’s medal winning role is optional (this implies Country plays another role, depicted here as an unnamed role) but it has a frequency constraint of 3. So if any medal results are recorded for a country, all three medal results (gold, silver, and bronze) are required. To express this in Figure 14.12(b) we add an equality constraint between the medal winning roles played by Country. This is an instance of the following fourth corollary to theorem PSG:

Figure 14.12. A frequency constraint in (a) maps to an equality constraint in (b).

If R is a ternary with a UC spanning just ffs role and one other role, then adding a frequency constraint of n to this other role is equivalent to adding an equality constraint over the specialized versions of that role.

Now consider Figure 14.13(a), where again Country plays another role in the global schema, but this time its medal winning role is mandatory. The mandatory, frequency, uniqueness, and value constraints together ensure that each country must have its medal tally recorded for each kind of medal. Hence Figure 14.13(b) has three mandatory roles. This example illustrates the following fifth corollary to theorem PSG1:

If R is a ternary with a UC spanning just ffs role and one other role, then adding a mandatory role constraint and frequency constraint of n (the number of possible values for 6) to this other role is equivalent to making each specialized version of that role mandatory.

Figure 14.13. The impact of adding mandatory role and frequency constraints.

This corollary is illustrated in Figure 14.14. It is implied by the previous two corollaries, since an equality constraint across a set of disjunctively mandatory roles implies that each of these roles is mandatory.

Figure 14.14. Fifth corollary to theorem PSG1.

Other Kinds of Predicate Specialization/Generalization

As an introduction to a second equivalence theorem, consider Figure 14.15. The two schemas provide alternative models for a fragment of a car rally application. Each car in the rally has two drivers (a main driver and a backup driver), and each person drives exactly one car. The schema in Figure 14.15(a) is transformed into the schema in Figure 14.15(b) by absorbing the object type DriverStatus into the drives predicate, specializing this into the main driver and backup driver predicates. The reverse transformation generalizes the specific driver predicates into the general one by extracting the object type DriverStatus. Since this object type appears in a different fact type, this equivalence does not fit the pattern of PSG1.

Figure 14.15. The drives predicate is specialized by absorbing DriverStatus.

Note how the constraints are transformed. The external uniqueness constraint in (a) says that each car has at most one main driver and at most one backup driver. This is captured in Figure 14.15(b) by the uniqueness constraints on the roles of Car. The uniqueness constraint on the drives predicate in Figure 14.15(a) corresponds in Figure 14.15(b) to the uniqueness constraints on the roles of Driver. The uniqueness constraint on the driver status predicate in Figure 14.15(a) is captured by the exclusion constraint in Figure 14.15(b). The mandatory and frequency constraints on Car’s role in Figure 14.15(a) require the two mandatory role constraints on Car in Figure 14.15(b). Finally, the mandatory role constraints on Driver in Figure 14.15(a) are catered for in Figure 14.15(b) by the inclusive-or constraint (shown explicitly here as part of the exclusive-or constraint).

This example illustrates our second predicate specialization/generalization theorem (PSG2) as well as four of its corollaries (see Figure 14.16). The terms “LHS” and “RHS” abbreviate “left-hand schema” and “right-hand schema”. In our example, A, B and C correspond to Driver, DriverStatus, and Car; the equality constraint is implied by the two mandatory role constraints on Driver.

Figure 14.16. PSG2: R may be specialized into S₁..S_n by absorbing B.

Theorem PSG2 can be derived from earlier results. For example, adding an exclusion constraint across A’s roles in the RHS of Figure 14.6 makes A’s role unique in the LHS, causing this compound fact type to split into two binaries to agree with the LHS of Figure 14.16. Section 14.4 considers two more general versions of this theorem.

Sometimes we may wish to transform a schema into another that is not quite equivalent. For example, suppose that in our car rally application we limit each car to at most two drivers, but do not classify the drivers in any meaningful way (e.g., as main or backup drivers). Let us also remove the constraint that drivers may drive only one car. This situation is schematized in Figure 14.17.

Figure 14.17. Can the predicate be specialized?

Although no DriverStatus object type is present, the frequency constraint in Figure 14.17 says that each car has at most two drivers. This enables us to introduce an artificial distinction to specialize the predicate into two cases, as shown in Figure 14.18.

Figure 14.18. Two ways of strengthening the schema in Figure 14.17.

Since this distinction is not present in the original schema, the alternatives shown in Figure 14.18 are not equivalent to the original. They are, in fact, stronger—each implies the original schema, but is not implied by it.

Notice the use of hyphens in the predicates of Figure 14.18 to bind adjectival phrases to their object terms. This improves verbalization of the fact types and constraints. The four predicates are Car has Driver-A, Car has Driver-B, Car has first-Driver, and Car has second-Driver. When the NORMA tool generates constraint verbalization, it binds the adjective to the object noun. For example, the upper uniqueness constraint in Figure 14.18(b) is verbalized as “Each Car has at most one first Driver”. Without the hyphen, the verbalization defaults to the awkward “Each Car has first at most one Driver”.

The schema in Figure 14.18(b) is actually stronger than the schema in Figure 14.18(a). If the Car role in Figure 14.17 is mandatory, then the Car roles in Figure 14.18(a) are disjunctively mandatory, and the top Car role in Figure 14.18(b) is mandatory (which then implies the subset constraint).

In a business domain where other facts are stored about cars but not about drivers, one of these alternatives may well be chosen to avoid a separate table being generated for car-driver facts when the schema is mapped (the specialized predicates are functional rather than m:n). In practice, the schema in Figure 14.18(b) would normally be chosen. The UML version of this choice is shown in Figure 14.19(b), using a note to declare the subset and exclusion constraints. Figure 14.19(a) shows the UML version of Figure 14.17.

Figure 14.19. UML versions of the schemas in Figure 14.17 and Figure 14.18(b).

Transforming from the original schema in Figure 14.17 to one of those in Figure 14.18 strengthens the schema by adding information. Transforming in the opposite direction weakens the schema by losing information. Any such transformations that add or lose information should be the result of conscious decisions that are acceptable to the client (for which the business domain is being modeled).

This example illustrates our third specialization/generalization theorem (PSG3), as shown in Figure 14.20. Note the use of “” for “implies”, since this result is a schema implication rather than an equivalence. In practice, the transformation is usually performed from right to left (strengthening the schema), with a subset constraint added from the first role of S₂ to that of S₁ fn = 2. Some corollaries for this result are also specified.

Figure 14.20. PSG3: The left-hand schema implies the right-hand schema.

Well that covers the most important results concerning predicate specialization and generalization. Note that the theorems require that the appropriate translations hold between general and special predicates. Humans are needed to provide natural readings for the new predicate(s) and to confirm that the translation holds. This is not something that can be decided automatically by the system.

Exercise 14.2

Schematize the following table using (a) unaries and (b) a binary.

Male staff	Female staff
Halpin. T	Jones, F
Morgan, T	Smith. A

A company committee has to decide whether to increase its budget on staff training. The following table indicates the current views of the committee members on this issue. Schematize this using (a) unaries and (b) a binary.

For	Against	Undecided
Alan	Betty	Chris
David	Eve	Fred
Gearty

The following table is an extract from an output report indicating quarterly sales figures for software products marketed by a particular company.

Schematize this using a ternary.
Transform your solution into an equivalent one using binaries.

Software	Quarter	Sales ($)
DataModeler	1	200 000
	2	500 000
	3	500 000
	4	700 000
WordLight	1	90 000
	2	150 000
	3	155 000
	4	200 000

An embassy maintains details about how well its employees speak foreign languages. The following table is an extract from this system.

Schematize this using binaries.
Transform this into a ternary.

Language	Expert	Novice
Arabic		Smith,J
Dutch	Bruza,P Proper, HA
French	Rolland.C	Bruza, P

University debating teams are to be selected, with four students in each team, with one student from each year level (1..4). No student may be in more than one team. Students are identified by their student number, but their name is also recorded. Debating teams are identified by codes.

Schematize this UoD using YearLevel as an object type.
Transform your schema by absorbing this object type.

Employee records are kept that show the employee number, name, and up to two phone numbers for each employee.

Schematize this in the natural way.
Map this to a relational schema.
It is now required to map all the information into a single relational table. Transform the conceptual schema to enable this to happen.
Rmap your revised conceptual schema.

Each phone is now to be classified as a work phone or a home phone. At most one work phone and at most one home phone may be recorded for an employee.

Modify your solution to (a) accordingly.
Rmap this.
Modify your solution to (c) accordingly.
Rmap this.

The following conceptual schemas are meant to describe the same UoD, but each fails to capture some constraint in the other. Add a textual constraint to schema (1) and a graphical constraint to schema (2) to obtain equivalence. The codes ‘chr’, ‘sec’, and ‘ord’ abbreviate “chairperson”, “secretary”, and “ordinary member”.

Consider the following academic UoD. Each subject is identified by its code but also has a unique title. Each subject has at most three assignments (possibly none), numbered 1, 2, and 3. Each subject has a second assignment only if it has a first assignment, and it has a third assignment only if it has a first and a second assignment. Each assignment has exactly one due date. Although assignments for different subjects may be due on the same date, no subject has more than one assignment due on the same date.

Although unlikely, it is possible that within a subject the chronological order of due dates differs from the numerical order of the assignments (e.g., because of software problems the due date for CS400 assignment 1 might be postponed until after the due date for CS400 assignment 2).

Model this by adding constraints (graphic, and textual if needed) to the schema:
Reference schemes: Subject(.code)
Assignment (has AssignmentNr(), is for Subject)
Fact types: Subject has SubjectTitle()
Assignment is due on Date(mdy)
Map this to a relational schema, including all constraints.
Specialize the is-due-on predicate in (a) by absorbing AssignmentNr into it.
Map this to a relational schema, including all constraints.
Suppose that within each subject the chronological order of assignment due dates must match their numeric order. How does this affect answers to (c) and (d)?

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Predicate Specialization and Generalization

Create new playlist

Sign In

Sign Up